-=head2 Internal Variable Types
-
-You should by now have had a look at L<perlguts>, which tells you about
-Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
-that now.
-
-These variables are used not only to represent Perl-space variables, but
-also any constants in the code, as well as some structures completely
-internal to Perl. The symbol table, for instance, is an ordinary Perl
-hash. Your code is represented by an SV as it's read into the parser;
-any program files you call are opened via ordinary Perl filehandles, and
-so on.
-
-The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
-Perl program. Let's see, for instance, how Perl treats the constant
-C<"hello">.
-
- % perl -MDevel::Peek -e 'Dump("hello")'
- 1 SV = PV(0xa041450) at 0xa04ecbc
- 2   REFCNT = 1
- 3   FLAGS = (POK,READONLY,pPOK)
- 4   PV = 0xa0484e0 "hello"\0
- 5   CUR = 5
- 6   LEN = 6
-
-Reading C<Devel::Peek> output takes a bit of practice, so let's go
-through it line by line.
-
-Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
-memory. SVs themselves are very simple structures, but they contain a
-pointer to a more complex structure. In this case, it's a PV, a
-structure which holds a string value, at location C<0xa041450>. Line 2
-is the reference count; there are no other references to this data, so
-it's 1.
-
-Line 3 shows the flags for this SV: it's OK to use it as a PV (C<POK>),
-it's a read-only SV (because it's a constant), and the data is a PV
-internally (C<pPOK>). Line 4 gives the contents of the string, starting
-at location C<0xa0484e0>.
-
-Line 5 gives us the current length of the string; note that this does
-B<not> include the null terminator. Line 6 is not the length of the
-string, but the length of the currently allocated buffer; as the string
-grows, the functions that modify it extend the available storage by
-calling the C<SvGROW> macro.
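To make the difference between C<CUR> and C<LEN> concrete, here is a small, self-contained C model. The struct C<toy_pv> and the functions C<toy_pv_init> and C<toy_pv_grow> are hypothetical names invented for this sketch, not part of Perl's API, and Perl's real growth policy may differ; the point is only that C<CUR> counts the string while C<LEN> counts the allocation, and that growing may move the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy model of a PV body, NOT Perl's real struct: just the three
   fields the Devel::Peek dump showed. */
typedef struct {
    char  *pv;   /* buffer: the string followed by a '\0' */
    size_t cur;  /* CUR: string length, excluding the '\0' */
    size_t len;  /* LEN: bytes currently allocated for pv */
} toy_pv;

/* Set up a toy PV holding a copy of s; returns 0 if malloc fails. */
static int toy_pv_init(toy_pv *p, const char *s)
{
    p->cur = strlen(s);
    p->len = p->cur + 1;              /* room for the terminator */
    p->pv  = malloc(p->len);
    if (p->pv == NULL)
        return 0;
    memcpy(p->pv, s, p->len);         /* copies the '\0' as well */
    return 1;
}

/* Rough analogue of SvGROW: make sure at least newlen bytes are
   allocated. It only reallocates when the buffer is too small, and
   when it does, the buffer may move. Returns the current buffer, or
   NULL if realloc fails (the old buffer is then left untouched). */
static char *toy_pv_grow(toy_pv *p, size_t newlen)
{
    if (newlen > p->len) {
        char *nb = realloc(p->pv, newlen);
        if (nb == NULL)
            return NULL;
        p->pv  = nb;
        p->len = newlen;
    }
    return p->pv;
}
```

For C<"hello">, C<toy_pv_init> gives C<cur == 5> and C<len == 6>, matching the dump; asking C<toy_pv_grow> for fewer bytes than C<len> changes nothing, while asking for more raises C<len> and leaves C<cur> alone.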
-
-You can get at any of these quantities from C very easily; just add
-C<Sv> to the name of the field shown in the snippet, and you've got a
-macro which will return the value: C<SvCUR(sv)> returns the current
-length of the string, C<SvREFCNT(sv)> returns the reference count,
-C<SvPV(sv, len)> returns a pointer to the string and stores its length
-in C<len>, and so on. More macros to manipulate these properties can be
-found in L<perlguts>.
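One reason this naming scheme works so smoothly is that an accessor macro which expands to a structure member access is an lvalue: it can be read, and it can also be updated in place, as the C<sv_catpvn> excerpt below does with C<SvCUR(sv) += len>. The C sketch below shows the idea with an invented two-level layout (a small head pointing at a body, echoing C<SV = PV(...)> in the dump); the C<Toy*> macros and structs are hypothetical and do not reproduce Perl's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy layout echoing the dump: a small head that points to a body.
   Field names and layout are illustrative only. */
typedef struct {
    char  *pv;
    size_t cur;
    size_t len;
} toy_body;

typedef struct {
    toy_body     *any;     /* pointer to the body ("SV = PV(...)") */
    unsigned long refcnt;  /* REFCNT */
    unsigned long flags;   /* FLAGS */
} toy_sv;

/* Accessors named after the dump fields. Each expands to a member
   access, so it is an lvalue and can appear on the left of '=' or
   '+='. */
#define ToyREFCNT(sv) ((sv)->refcnt)
#define ToyFLAGS(sv)  ((sv)->flags)
#define ToyPVX(sv)    ((sv)->any->pv)
#define ToyCUR(sv)    ((sv)->any->cur)
#define ToyLEN(sv)    ((sv)->any->len)
```

With a body holding C<"hello"> in a 16-byte buffer, C<ToyCUR(&sv)> reads 5, and C<ToyCUR(&sv) += 3> writes straight through to the body.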
-
-Let's take an example of manipulating a PV, from C<sv_catpvn> in F<sv.c>:
-
-  1 void
-  2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
-  3 {
-  4     STRLEN tlen;
-  5     char *junk;
-
-  6     junk = SvPV_force(sv, tlen);
-  7     SvGROW(sv, tlen + len + 1);
-  8     if (ptr == junk)
-  9         ptr = SvPVX(sv);
- 10     Move(ptr,SvPVX(sv)+tlen,len,char);
- 11     SvCUR(sv) += len;
- 12     *SvEND(sv) = '\0';
- 13     (void)SvPOK_only_UTF8(sv);          /* validate pointer */
- 14     SvTAINT(sv);
- 15 }
-
-This is a function which adds a string, C<ptr>, of length C<len> onto
-the end of the PV stored in C<sv>. The first thing we do in line 6 is
-make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
-macro to force a PV. As a side effect, C<tlen> gets set to the current
-length of the PV, and a pointer to the string buffer is returned in
-C<junk>.
-
-In line 7, we make sure that the SV will have enough room to accommodate
-the old string, the new string and the null terminator. If C<LEN> isn't
-big enough, C<SvGROW> will reallocate space for us.
-
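-Lines 8 and 9 deal with a subtle case: appending an SV's string to
-itself. If C<ptr> is the same as C<junk>, it points into the SV's own
-buffer, which C<SvGROW> may just have reallocated and moved, so we fetch
-the buffer's current address again with C<SvPVX>, which gives the
-address of the PV in the SV.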
-
-Line 10 does the actual concatenation: the C<Move> macro moves a chunk
-of memory around. We move the string C<ptr> to the end of the PV, that
-is, the start of the PV plus its current length. We're moving C<len>
-items of type C<char>, that is, C<len> bytes. After doing so, we need to
-tell Perl we've extended the string, by altering C<CUR> to reflect the
-new length. C<SvEND> is a macro which gives us the end of the string,
-so we store a C<"\0"> there to terminate the string again.
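The same steps can be tried outside perl. Below is a minimal, hypothetical C model of what C<sv_catpvn> does; C<toy_pv> and C<toy_catpvn> are invented names, and the UTF-8, flag and taint handling of lines 13 and 14 is left out. One deliberate difference: the sketch checks whether C<ptr> points at the buffer I<before> reallocating, because in standard C the value of a pointer to memory that C<realloc> has released is indeterminate, so comparing first is the safer order. Like the original, it only recognises the case where C<ptr> is the start of the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *pv;   /* string plus trailing '\0' */
    size_t cur;  /* length without the '\0' */
    size_t len;  /* allocated size */
} toy_pv;

/* Append len bytes from ptr to p, following the shape of sv_catpvn.
   Returns 0 if memory could not be allocated. */
static int toy_catpvn(toy_pv *p, const char *ptr, size_t len)
{
    size_t tlen = p->cur;               /* current length, like tlen */
    size_t need = tlen + len + 1;       /* old + new + terminator */
    int self = (ptr == p->pv);          /* appending our own string? */

    if (need > p->len) {                /* like SvGROW(sv, need) */
        char *nb = realloc(p->pv, need);
        if (nb == NULL)
            return 0;
        p->pv  = nb;
        p->len = need;
    }
    if (self)                           /* the old address may be gone, */
        ptr = p->pv;                    /* so re-fetch it, like SvPVX */
    memmove(p->pv + tlen, ptr, len);    /* like Move(ptr, ..., len, char) */
    p->cur += len;                      /* like SvCUR(sv) += len */
    p->pv[p->cur] = '\0';               /* like *SvEND(sv) = '\0' */
    return 1;
}
```

Appending a 5-character string to itself forces a reallocation, and the re-fetched pointer makes the copy read from the new buffer, giving C<"hellohello"> with C<cur == 10>.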
-
-Line 13 manipulates the flags; since we've changed the PV, any IV or NV
-values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
-want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
-version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
-and turns on POK. The final C<SvTAINT> is a macro which, if taint mode
-is turned on, marks the result as tainted when tainted data was involved
-in the current operation.
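Flag manipulation like C<SvPOK_only> is ordinary bit twiddling. The C sketch below uses made-up flag values (Perl's real ones are defined in F<sv.h> and differ) and a hypothetical C<toy_pok_only> that models only the IOK, NOK and POK bits, to show the effect line 13 is after: once the string changes, only the string value is marked valid, while unrelated flags such as read-only are left alone.

```c
#include <assert.h>

/* Made-up flag bits for the sketch; not Perl's real values. */
enum {
    TOY_IOK      = 0x1,  /* integer value is valid */
    TOY_NOK      = 0x2,  /* floating-point value is valid */
    TOY_POK      = 0x4,  /* string value is valid */
    TOY_READONLY = 0x8   /* unrelated flag that must survive */
};

/* Like SvPOK_only, restricted to the bits above: clear the
   numeric "OK" bits and set POK, keeping every other flag. */
static unsigned toy_pok_only(unsigned flags)
{
    unsigned ok_bits = (unsigned)(TOY_IOK | TOY_NOK | TOY_POK);
    return (flags & ~ok_bits) | (unsigned)TOY_POK;
}
```

In the C<$a=10; $a.="6";> example, C<$a> starts out with a valid IV; after the append, C<toy_pok_only> leaves just the POK bit set, so nothing will mistakenly reuse the stale 10.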
-
-AVs and HVs are more complicated, but SVs are by far the most common
-variable type being thrown around. Having seen something of how we
-manipulate these, let's go on and look at how the op tree is
-constructed.
-
-=head2 Op Trees
-
-First, what is the op tree, anyway? The op tree is the parsed
-representation of your program, as we saw in our section on parsing, and
-it's the sequence of operations that Perl goes through to execute your
-program, as we saw in L</Running>.
-
-An op is a fundamental operation that Perl can perform: all the built-in
-functions and operators are ops, and there is also a series of ops
-which deal with concepts the interpreter needs internally, such as
-entering and leaving a block, ending a statement, fetching a variable,
-and so on.
-
-The op tree is connected in two ways: you can imagine that there are two
-"routes" through it, two orders in which you can traverse the tree.
-First, parse order reflects how the parser understood the code, and
-secondly, execution order tells perl what order to perform the
-operations in.
-
-The easiest way to examine the op tree is to stop Perl after it has
-finished parsing, and get it to dump out the tree. This is exactly what
-the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
-and L<B::Debug|B::Debug> do.
-
-Let's have a look at how Perl sees C<$a = $b + $c>:
-
- % perl -MO=Terse -e '$a=$b+$c'
-  1 LISTOP (0x8179888) leave
-  2     OP (0x81798b0) enter
-  3     COP (0x8179850) nextstate
-  4     BINOP (0x8179828) sassign
-  5         BINOP (0x8179800) add [1]
-  6             UNOP (0x81796e0) null [15]
-  7                 SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
-  8             UNOP (0x81797e0) null [15]
-  9                 SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
- 10         UNOP (0x816b4f0) null [15]
- 11             SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a
-
-Let's start in the middle, at line 4. This is a BINOP, a binary
-operator, which is at location C<0x8179828>. The specific operator in
-question is C<sassign> - scalar assignment - and you can find the code
-which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
-binary operator, it has two children: the add operator, providing the
-result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
-line 10.
-
-Line 10 is the null op: this does exactly nothing. What is that doing
-there? If you see the null op, it's a sign that something has been
-optimized away after parsing. As we mentioned in L</Optimization>,
-the optimization stage sometimes converts two operations into one, for
-example when fetching a scalar variable. When this happens, instead of
-rewriting the op tree and cleaning up the dangling pointers, it's easier
-just to replace the redundant operation with the null op. Originally,
-the tree would have looked like this:
-
- 10         UNOP (0x816b4f0) rv2sv [15]
- 11             SVOP (0x816dcf0) gv  GV (0x80fa460) *a
-
-That is, fetch the C<a> entry from the main symbol table, and then look
-at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
-happens to do both these things.
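Here is a hypothetical C sketch of that trade-off. Rather than unlinking the C<rv2sv> op, the optimizer-like function C<toy_fold_rv2sv> turns its C<gv> child into a C<gvsv> and marks the parent as a null op in place, so nothing that points at the parent needs to be fixed up. All types and names are invented, and keeping the old type in a spare C<targ> field is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up op types for the sketch. */
typedef enum { TOY_NULL, TOY_GV, TOY_RV2SV, TOY_GVSV } toy_type;

typedef struct toy_op toy_op;
struct toy_op {
    toy_type type;
    int      targ;   /* spare field; here it remembers the pre-null type */
    toy_op  *first;  /* only child, or NULL */
};

/* Turn an op into a no-op without unlinking it, so every pointer
   that refers to it stays valid. */
static void toy_op_null(toy_op *o)
{
    o->targ = (int)o->type;
    o->type = TOY_NULL;
}

/* Fold rv2sv(gv) into null(gvsv): the child now does both jobs,
   and the parent stays in the tree but does nothing. */
static void toy_fold_rv2sv(toy_op *o)
{
    if (o->type == TOY_RV2SV && o->first != NULL
        && o->first->type == TOY_GV) {
        o->first->type = TOY_GVSV;
        toy_op_null(o);
    }
}
```

After folding, the parent still points at the same child, so the tree keeps its shape, which is exactly why a nulled op shows up in the B::Terse output.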
-
-The right-hand side, starting at line 5, is similar to what we've just
-seen: the C<add> op (C<pp_add>, also in F<pp_hot.c>) adds together the
-results of two C<gvsv> ops.
-
-Now, what's this about?
-
-  1 LISTOP (0x8179888) leave
-  2     OP (0x81798b0) enter
-  3     COP (0x8179850) nextstate
-
-C<enter> and C<leave> are scoping ops, and their job is to perform any
-housekeeping every time you enter and leave a block: lexical variables
-are tidied up, unreferenced variables are destroyed, and so on. Every
-program will have those first three lines: C<leave> is a list, and its
-children are all the statements in the block. Statements are delimited
-by C<nextstate>: each statement in a block starts with a C<nextstate>
-op, which is followed by the ops that perform that statement (here, the
-C<sassign> subtree). C<enter> is a single op which functions as a
-marker.
-
-That's how Perl parsed the program, from top to bottom:
-
-              Program
-                 |
-             Statement
-                 |
-                 =
-                / \
-               /   \
-              $a    +
-                   / \
-                 $b   $c
-
-However, it's impossible to B<perform> the operations in this order:
-you have to find the values of C<$b> and C<$c> before you add them
-together, for instance. So, the other thread that runs through the op
-tree is the execution order: each op has a field C<op_next> which points
-to the next op to be run, so following these pointers tells us how perl
-executes the code. We can traverse the tree in this order using
-the C<exec> option to C<B::Terse>:
-
- % perl -MO=Terse,exec -e '$a=$b+$c'
- 1 OP (0x8179928) enter
- 2 COP (0x81798c8) nextstate
- 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
- 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c
- 5 BINOP (0x8179878) add [1]
- 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a
- 7 BINOP (0x81798a0) sassign
- 8 LISTOP (0x8179900) leave
-
-This probably makes more sense for a human: enter a block, start a
-statement. Get the values of C<$b> and C<$c>, and add them together.
-Find C<$a>, and assign one to the other. Then leave.
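To see why following C<op_next> is all the interpreter needs, here is a hypothetical C toy: ops chained in the execution order shown above, a tiny value stack, and a loop that keeps running the current op and moving on to its C<op_next>. The names (C<toy_op>, C<toy_runops> and the op functions) are invented, and perl's real ops and run loop carry much more state.

```c
#include <assert.h>
#include <stddef.h>

/* A toy op: something to do, plus op_next, the next op to execute.
   Values live on a tiny stack of int pointers, loosely like SV*s. */
typedef struct toy_op toy_op;
struct toy_op {
    void (*pp)(const toy_op *op);  /* the op's behaviour */
    int *var;                      /* variable fetched by a gvsv-like op */
    const toy_op *op_next;         /* execution-order link */
};

static int *stack[8];
static int  sp;        /* number of items on the stack */
static int  targ;      /* scratch result slot for the add op */

static void toy_gvsv(const toy_op *op)    /* push a variable */
{
    stack[sp++] = op->var;
}

static void toy_add(const toy_op *op)     /* pop two, push the sum */
{
    (void)op;
    int right = *stack[--sp];
    int left  = *stack[--sp];
    targ = left + right;
    stack[sp++] = &targ;
}

static void toy_sassign(const toy_op *op) /* pop target, then value */
{
    (void)op;
    int *target = stack[--sp];            /* $a was pushed last */
    int *value  = stack[--sp];
    *target = *value;
    stack[sp++] = target;
}

/* A runops-style loop: run each op, then follow its op_next. */
static void toy_runops(const toy_op *op)
{
    while (op != NULL) {
        op->pp(op);
        op = op->op_next;
    }
}
```

Linking a C<toy_gvsv> for C<$b>, a C<toy_gvsv> for C<$c>, C<toy_add>, a C<toy_gvsv> for C<$a> and C<toy_sassign> through C<op_next>, then passing the first op to C<toy_runops>, leaves C<$a> holding the sum; the tree shape is never consulted at run time.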
-
-The way Perl builds up these op trees in the parsing process can be
-unravelled by examining F<perly.y>, the YACC grammar. Let's take the
-piece we need to construct the tree for C<$a = $b + $c>:
-
- 1 term : term ASSIGNOP term
- 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
- 3 | term ADDOP term
- 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
-
-If you're not used to reading BNF grammars, this is how it works: you're
-fed certain things by the tokeniser, which generally end up in upper
-case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
-code, and C<ASSIGNOP> is provided when C<=> is used for assigning. These
-are "terminal symbols", because you can't get any simpler than them.
-
-The grammar, lines one and three of the snippet above, tells you how to
-build up more complex forms. These complex forms, "non-terminal symbols"
-are generally placed in lower case. C<term> here is a non-terminal
-symbol, representing a single expression.
-
-The grammar gives you the following rule: you can make the thing on the
-left of the colon if you see all the things on the right in sequence.
-This is called a "reduction", and the aim of parsing is to completely
-reduce the input. There are several different ways you can perform a
-reduction, separated by vertical bars: so, C<term> followed by C<=>
-followed by C<term> makes a C<term>, and C<term> followed by C<+>
-followed by C<term> can also make a C<term>.
-
-So, if you see two terms with an C<=> or C<+> between them, you can
-turn them into a single expression. When you do this, you execute the
-code in the block on the next line: if you see C<=>, you'll do the code
-in line 2. If you see C<+>, you'll do the code in line 4. It's this code
-which contributes to the op tree.
-
- | term ADDOP term
- { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
-
-This creates a new binary op and feeds it a number of variables. The
-variables refer to the symbols matched by the rule: C<$1> is the first
-symbol in the input, C<$2> the second, and so on; think regular
-expression backreferences. C<$$> is the op returned from this reduction.
-So, we call C<newBINOP> to create a new binary operator. The first
-parameter to C<newBINOP>, a function in F<op.c>, is the op type. It's an
-addition operator, so we want the type to be C<OP_ADD>. We could specify
-this directly, but the tokeniser has already stored the op type as the
-value of the C<ADDOP> token, the second symbol in the input, so we use
-C<$2>; that way the same rule also works for subtraction. The second
-parameter is the op's flags: 0 means "nothing special". Then the things
-to add: the left and right hand side of our expression, in scalar
-context.
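The following hypothetical C sketch imitates what these actions do: each reduction allocates one node whose children are the nodes built by earlier reductions, and the op type for C<+> arrives as a value rather than being hard-coded, just as C<$2> does in the grammar. All names are made up, and for brevity the sketch builds the assignment with the same constructor instead of a separate C<newASSIGNOP>, ordering its children as in the B::Terse output (value first, then C<$a>).

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up op types standing in for perl's OP_ADD and friends. */
typedef enum { TOY_OP_GVSV, TOY_OP_ADD, TOY_OP_SUBTRACT, TOY_OP_SASSIGN } toy_optype;

typedef struct toy_node toy_node;
struct toy_node {
    toy_optype type;
    int        flags;
    toy_node  *first;   /* left operand */
    toy_node  *last;    /* right operand */
    char       name;    /* variable name, for gvsv leaves only */
};

static toy_node *toy_alloc(toy_optype type, int flags)
{
    toy_node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->type  = type;
        n->flags = flags;
        n->first = NULL;
        n->last  = NULL;
        n->name  = '\0';
    }
    return n;
}

/* Leaf for a package scalar such as $a. */
static toy_node *toy_newGVSV(char name)
{
    toy_node *n = toy_alloc(TOY_OP_GVSV, 0);
    if (n != NULL)
        n->name = name;
    return n;
}

/* Shaped like newBINOP(type, flags, first, last): one node whose two
   children are the operands. The type is passed in, like $2. */
static toy_node *toy_newBINOP(int type, int flags, toy_node *first, toy_node *last)
{
    toy_node *n = toy_alloc((toy_optype)type, flags);
    if (n != NULL) {
        n->first = first;
        n->last  = last;
    }
    return n;
}

/* Free a whole tree, children first. */
static void toy_free(toy_node *n)
{
    if (n != NULL) {
        toy_free(n->first);
        toy_free(n->last);
        free(n);
    }
}
```

Because the parser reduces bottom-up, C<$b + $c> is reduced first, and its node becomes the first child of the assignment node built by the later reduction.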
-
-=head2 Stacks
-
-When perl executes something like C<add>, how does it pass on its
-results to the next op? The answer is, through the use of stacks. Perl
-has a number of stacks to store things it's currently working on, and
-we'll look at the three most important ones here.
-
-=over 3
-
-=item Argument stack
-
-Arguments are passed to PP code and returned from PP code using the
-argument stack, C<ST>. The typical way to handle arguments is to pop
-them off the stack, deal with them how you wish, and then push the result
-back onto the stack. This is how, for instance, the cosine operator
-works:
-
- NV value;
- value = POPn;
- value = Perl_cos(value);
- XPUSHn(value);
-
-We'll see a trickier example of this when we consider Perl's macros
-below. C<POPn> gives you the NV (floating point value) of the top SV on
-the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
-the result back as an NV. The C<X> in C<XPUSHn> means that the stack
-should be extended if necessary. It can't be necessary here, because we
-know there's room for one more item on the stack, since we've just
-removed one! Still, the C<XPUSH*> macros guarantee safety at little
-cost.
-
-Alternatively, you can fiddle with the stack directly: C<SP> is a
-pointer to the top element of your portion of the stack, and the
-C<TOP*> macros give you the top SV/IV/NV/etc. on the stack. So, for
-instance, to do unary negation of an integer:
-
- SETi(-TOPi);
-
-This replaces the top entry on the stack with the integer negation of
-its value, without a separate pop and push.
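Both styles can be modelled with a toy stack in C. Everything below (C<toy_stack>, C<toy_popn>, C<toy_xpushn> and the rest) is invented for illustration and is far simpler than perl's real argument stack, which holds SV pointers. The sketch stores doubles, and uses negation where the text uses the cosine, so that no maths library is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* A toy argument stack of NVs (doubles). */
typedef struct {
    double *base;
    size_t  n;     /* number of items on the stack */
    size_t  max;   /* slots allocated */
} toy_stack;

/* Like EXTEND: make room for `more` additional items. */
static int toy_extend(toy_stack *s, size_t more)
{
    if (s->n + more > s->max) {
        size_t newmax = (s->n + more) * 2;
        double *nb = realloc(s->base, newmax * sizeof *nb);
        if (nb == NULL)
            return 0;
        s->base = nb;
        s->max  = newmax;
    }
    return 1;
}

/* Like POPn: take the top value off the stack. */
static double toy_popn(toy_stack *s)
{
    assert(s->n > 0);
    return s->base[--s->n];
}

/* Like XPUSHn: extend if needed, then push. */
static int toy_xpushn(toy_stack *s, double v)
{
    if (!toy_extend(s, 1))
        return 0;
    s->base[s->n++] = v;
    return 1;
}

/* Pop, compute, push: the style of the cosine example. */
static int toy_pp_negate(toy_stack *s)
{
    double value = toy_popn(s);
    return toy_xpushn(s, -value);
}

/* The in-place style of SETi(-TOPi): rewrite the top slot directly. */
static void toy_pp_negate_top(toy_stack *s)
{
    assert(s->n > 0);
    s->base[s->n - 1] = -s->base[s->n - 1];
}
```

Starting from an empty C<toy_stack> (all fields zero, so the first C<realloc> behaves like C<malloc>), pushing 2.5 and applying either operation leaves one item, -2.5; the in-place version simply never changes the item count.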
-
-Argument stack manipulation in the core is exactly the same as it is in
-XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
-description of the macros used in stack manipulation.
-
-=item Mark stack
-
-I say "your portion of the stack" above because PP code doesn't
-necessarily get the whole stack to itself: if your function calls
-another function, you'll only want to expose the arguments aimed for the
-called function, and not (necessarily) let it get at your own data. The
-way we do this is to have a "virtual" bottom-of-stack, exposed to each
-function. The mark stack keeps bookmarks to locations in the argument
-stack usable by each function. For instance, when dealing with a tied
-variable (internally, something with "P" magic), Perl has to call
-methods for accesses to the tied variables. However, we need to separate
-the arguments exposed to the method from the arguments exposed to the
-original function (the store or fetch or whatever it may be). Here's
-roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:
-
- 1 PUSHMARK(SP);
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
- 5 PUTBACK;
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
-
-Let's examine the whole implementation, for practice:
-
- 1 PUSHMARK(SP);
-
-Push the current state of the stack pointer onto the mark stack. This is
-so that when we've finished adding items to the argument stack, Perl
-knows how many things we've added recently.
-
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
-
-We're going to add two more items onto the argument stack: when you have
-a tied array, the C<PUSH> subroutine receives the object and the value
-to be pushed, and that's exactly what we have here - the tied object,
-retrieved with C<SvTIED_obj>, and the value, the SV C<val>.
-
- 5 PUTBACK;
-
-Next we tell Perl to update the global stack pointer from our internal
-variable: the C<dSP> declaration at the start of the function only gave
-us a local copy, not a reference to the global.
-
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
-
-C<ENTER> and C<LEAVE> localise a block of code: they make sure that all
-variables are tidied up, everything that has been localised has its
-previous value restored, and so on. Think of them as the C<{> and
-C<}> of a Perl block.
-
-To actually do the magic method call, we have to call a subroutine in
-Perl space: C<call_method> takes care of that, and it's described in
-L<perlcall>. We call the C<PUSH> method in scalar context, and we're
-going to discard its return value. The C<call_method()> function
-removes the top element of the mark stack, so there is nothing for
-the caller to clean up.
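A hypothetical C model of this protocol may help. The caller keeps a local copy of the stack pointer, as C<dSP> provides, records a mark, pushes the callee's arguments, and publishes its pointer before the call, as C<PUTBACK> does; afterwards it refreshes its local copy, which is what perl's C<SPAGAIN> is for. The callee uses the mark to find exactly its own arguments. All names here are invented, and the real stacks hold SVs rather than ints.

```c
#include <assert.h>

/* Toy interpreter state: an argument stack of ints whose slot 0 is
   never used, a global stack pointer that points at the top item,
   and a mark stack of saved positions. */
static int  arg_stack[32];
static int *stack_sp = arg_stack;   /* empty stack: points at slot 0 */
static int *mark_stack[8];
static int  mark_ix;                /* number of marks pushed */

/* Like PUSHMARK(SP): record where the callee's arguments begin. */
static void toy_pushmark(int *sp)
{
    mark_stack[mark_ix++] = sp;
}

/* A callee that sees only the items above its mark. It pops the mark
   and discards its arguments, somewhat like call_method with
   G_DISCARD, so the caller has nothing to clean up. */
static int toy_callee_sum(void)
{
    int *mark = mark_stack[--mark_ix];
    int total = 0;
    for (int *p = mark + 1; p <= stack_sp; p++)
        total += *p;
    stack_sp = mark;
    return total;
}

/* The caller, following the shape of the tied push example. */
static int toy_caller(void)
{
    int *sp = stack_sp;     /* like dSP: a local copy of the pointer */
    *++sp = 99;             /* data that belongs to the caller */
    toy_pushmark(sp);       /* like PUSHMARK(SP) */
    *++sp = 3;              /* like PUSHs(...), twice */
    *++sp = 4;
    stack_sp = sp;          /* like PUTBACK: publish the local copy */
    int result = toy_callee_sum();
    sp = stack_sp;          /* like SPAGAIN: refresh the local copy */
    assert(*sp == 99);      /* the caller's own item is untouched */
    return result;
}
```

If the C<PUTBACK>-style line is removed, C<toy_callee_sum> sees a stale C<stack_sp> that lies below the mark, finds no arguments and returns 0, which is the kind of bug forgetting C<PUTBACK> causes.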
-
-=item Save stack
-
-C doesn't have a concept of dynamic scoping like Perl's C<local>, so
-perl provides one. We've seen that C<ENTER> and C<LEAVE> are used as
-scoping braces; the save stack implements the C equivalent of, for
-example:
-
- {
-     local $foo = 42;
-     ...
- }
-
-See L<perlguts/Localising Changes> for how to use the save stack.
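As a rough illustration of the idea, and not of perl's actual data structures, the C sketch below records the address and old value of each localised variable on a save stack. Each C<ENTER>-like call remembers how deep the save stack is, so the matching C<LEAVE>-like call can undo everything saved since then, newest first. The names are hypothetical, and perl's real save stack can save many more kinds of things than a single C<int>.

```c
#include <assert.h>
#include <stddef.h>

/* One saved entry: where a value lived and what it held before. */
typedef struct {
    int *where;
    int  old;
} toy_save_entry;

static toy_save_entry save_stack[64];
static size_t save_ix;              /* entries on the save stack */
static size_t scope_stack[16];      /* save_ix recorded at each enter */
static size_t scope_ix;             /* number of open scopes */

/* Like ENTER: remember how deep the save stack is right now. */
static void toy_enter(void)
{
    scope_stack[scope_ix++] = save_ix;
}

/* Like `local`: save the old value, then assign the new one. */
static void toy_local_int(int *where, int newval)
{
    save_stack[save_ix].where = where;
    save_stack[save_ix].old   = *where;
    save_ix++;
    *where = newval;
}

/* Like LEAVE: undo every save made since the matching enter,
   newest first, so nested localisations unwind correctly. */
static void toy_leave(void)
{
    size_t base = scope_stack[--scope_ix];
    while (save_ix > base) {
        save_ix--;
        *save_stack[save_ix].where = save_stack[save_ix].old;
    }
}
```

Mirroring the Perl block above, C<toy_enter(); toy_local_int(&foo, 42);> sets C<foo> to 42, and the matching C<toy_leave()> puts back whatever C<foo> held before, even if it was localised again in an inner scope.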