=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
  perl ./Porting/podtidy pod/perlinterp.pod

=head1 NAME

perlinterp - An overview of the Perl interpreter

=head1 DESCRIPTION

This document provides an overview of how the Perl interpreter works at
the level of C code, along with pointers to the relevant C source code
files.

=head1 ELEMENTS OF THE INTERPRETER

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:
=head2 Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

F<perlmain.c> is generated by C<ExtUtils::Miniperl> from
F<miniperlmain.c> at make time, so you should build perl to follow this
along.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter, along these lines:
    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. This tells you whether the
current running program was created with the C<-u> flag to perl and
then F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it looks
like this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.

Next, in line 7, we construct the interpreter using C<perl_construct>,
also in F<perl.c>; this sets up all the special variables that Perl
needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    if (!perl_parse(my_perl, xs_init, argc, argv, (char **)NULL))
        perl_run(my_perl);

    exitstatus = perl_destruct(my_perl);

    perl_free(my_perl);

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

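If you want to play with this startup sequence outside the perl binary
itself, the embedding API described in L<perlembed> exposes the same
calls. Here is a condensed sketch of a minimal embedding program,
adapted from L<perlembed> (no statically linked XS modules, hence the
NULL C<xs_init> argument, and no error handling):

    #include <EXTERN.h>
    #include <perl.h>

    static PerlInterpreter *my_perl;

    int main(int argc, char **argv, char **env)
    {
        PERL_SYS_INIT3(&argc, &argv, &env);
        my_perl = perl_alloc();
        perl_construct(my_perl);
        PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

        /* parse and run the script named on the command line */
        if (!perl_parse(my_perl, NULL, argc, argv, (char **)NULL))
            perl_run(my_perl);

        perl_destruct(my_perl);
        perl_free(my_perl);
        PERL_SYS_TERM();
        return 0;
    }

Compile it with the flags reported by
C<perl -MExtUtils::Embed -e ccopts -e ldopts>, as L<perlembed> explains.
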
=head2 Parsing

The aim of this stage is to take the Perl source, and turn it into an
op tree. We'll see what one of those looks like later. Strictly
speaking, there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so
on. The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and
it can be tricky to work out what sort of token something is, or where
a token ends. As such, there's a lot of interplay between the tokeniser
and the parser, which can get pretty frightening if you're not used to
it.

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The
routines which construct and link together the various operations are
to be found in F<op.c>, and will be examined later.

=head2 Optimization

Now the parsing stage is complete, and the finished tree represents the
operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable
C<$foo>, instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.

126=head2 Running
127
128Now we're finally ready to go: we have compiled Perl byte code, and all
129that's left to do is run it. The actual execution is done by the
130C<runops_standard> function in F<run.c>; more specifically, it's done
131by these three innocent looking lines:
132
133 while ((PL_op = PL_op->op_ppaddr(aTHX))) {
134 PERL_ASYNC_CHECK();
135 }
136
137You may be more comfortable with the Perl version of that:
138
139 PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
140
141Well, maybe not. Anyway, each op contains a function pointer, which
142stipulates the function which will actually carry out the operation.
143This function will return the next op in the sequence - this allows for
144things like C<if> which choose the next op dynamically at run time. The
145C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
146execution if required.
147
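That loop is essentially all there is to C<runops_standard>. Extensions
such as debuggers can install a replacement loop by assigning to
C<PL_runops>; a sketch of a replacement (hypothetical name, equivalent
to the standard loop) looks like this:

    static int
    my_runops(pTHX)
    {
        while ((PL_op = PL_op->op_ppaddr(aTHX))) {
            /* a tracing or profiling hook could go here */
            PERL_ASYNC_CHECK();
        }
        TAINT_NOT;
        return 0;
    }

    /* typically installed from an XS BOOT section:  PL_runops = my_runops; */
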
=for apidoc_section Embedding and Interpreter Cloning
=for apidoc Amh|void|PERL_ASYNC_CHECK

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the
next op. Calls to perl subs (and eval blocks) are handled within the
same runops loop, and do not consume extra space on the C stack. For
example, C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or
C<CxEVAL> block struct onto the context stack which contains the
address of the op following the sub call or eval. They then return the
first op of that sub or eval block, and so execution continues in that
sub or block. Later, a C<pp_leavesub> or C<pp_leavetry> op pops the
C<CxSUB> or C<CxEVAL>, retrieves the return op from it, and returns it.

=head2 Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the
low-level C<setjmp()>/C<longjmp()> C-library functions. These basically
provide a way to capture the current PC and SP registers and later
restore them; i.e. a C<longjmp()> continues at the point in code where
a previous C<setjmp()> was done, with anything further up on the C
stack being lost. This is why code should always save values using
C<SAVE_I<FOO>> rather than in auto variables.

The perl core wraps C<setjmp()> etc. in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

=for apidoc_section Exception Handling (simple) Macros
=for apidoc Amh|void|JMPENV_PUSH|int v
=for apidoc Amh|void|JMPENV_JUMP|int v

Entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, each do a C<JMPENV_PUSH>, then enter a runops
loop or whatever, and handle possible exception returns. For a 2
return, final cleanup is performed, such as popping stacks and calling
C<CHECK> or C<END> blocks. Amongst other things, this is how scope
cleanup still occurs during an C<exit>.

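The guard code at such an entry point follows a common pattern; here is
a rough, simplified sketch of its shape (the helper names are made up,
the macros and return values are the real ones):

    dJMPENV;
    int ret;

    JMPENV_PUSH(ret);
    switch (ret) {
    case 0:                   /* normal return from setjmp: do the work */
        run_the_body();       /* hypothetical: e.g. enter a runops loop */
        break;
    case 2:                   /* exit, or die with no enclosing eval    */
        final_cleanup();      /* hypothetical: pop stacks, END blocks   */
        break;
    case 3:                   /* die inside eval: PL_restartop is set   */
        handle_restartop();   /* hypothetical: re-enter the runops loop */
        break;
    }
    JMPENV_POP;
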
If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is
assigned to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed.
This normally passes control back to the guard. In the case of
C<perl_run> and C<call_sv>, a non-null C<PL_restartop> triggers
re-entry to the runops loop. This is the normal way that C<die> or
C<croak> is handled within an C<eval>.

=for apidoc Amnh|OP *|PL_restartop

Sometimes ops are executed within an inner runops loop, such as tie,
sort or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping
both runops loops, which is clearly incorrect. One way to avoid this is
for the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in
the inner runops loop, but for efficiency reasons, perl in fact just
sets a flag, using C<CATCH_SET(TRUE)>. The C<pp_require>,
C<pp_entereval> and C<pp_entertry> ops check this flag, and if true,
they call C<docatch>, which does a C<JMPENV_PUSH> and starts a new
runops level to execute the code, rather than doing it on the current
loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the
inner loop. When an exception is raised, C<docatch> compares the
C<JMPENV> level of the C<CxEVAL> with C<PL_top_env> and if they differ,
just re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH>
then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops
loop to execute the body of C<TIEARRAY>. When it executes the entertry
op on line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch>
which does a C<JMPENV_PUSH> and starts a third runops loop, which then
executes the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
      retop=(null)
      CX 1: EVAL   => *
      retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns
to the top C<docatch>. This then starts another third-level runops
level, which executes the nextstate, pushmark and die ops on line 4. At
the point that the second C<pp_die> is called, the C call stack looks
exactly like that above, even though we are no longer within an inner
eval; this is because of the optimization mentioned earlier. However,
the context stack now looks like this, i.e. with the top C<CxEVAL>
popped:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
      retop=(null)

The die on line 4 pops the context stack back down to the C<CxEVAL>,
leaving it as:

    STACK 0: MAIN
      CX 0: BLOCK  =>

As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch:

    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
and the C stack unwinds to:

    perl_run
    main

Because C<PL_restartop> is non-null, C<run_body> starts a new runops
loop and execution continues.

=head2 INTERNAL VARIABLE TYPES

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables,
but also any constants in the code, as well as some structures
completely internal to Perl. The symbol table, for instance, is an
ordinary Perl hash. Your code is represented by an SV as it's read into
the parser; any program files you call are opened via ordinary Perl
filehandles, and so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

    % perl -MDevel::Peek -e 'Dump("hello")'
    1 SV = PV(0xa041450) at 0xa04ecbc
    2   REFCNT = 1
    3   FLAGS = (POK,READONLY,pPOK)
    4   PV = 0xa0484e0 "hello"\0
    5   CUR = 5
    6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practice, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Next we've got the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

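For instance, here is a small sketch of C code reading those fields
back through the public macros (assume it runs somewhere with the usual
interpreter context available, such as inside an XS function):

    STRLEN len;
    SV *sv = newSVpvn("hello", 5);   /* a PV-holding SV, REFCNT == 1   */
    const char *s = SvPV(sv, len);   /* s points at "hello", len == 5  */

    PerlIO_printf(Perl_debug_log, "CUR=%d LEN=%d REFCNT=%d\n",
                  (int)SvCUR(sv), (int)SvLEN(sv), (int)SvREFCNT(sv));

    SvREFCNT_dec(sv);                /* release our reference          */
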
Let's take an example of manipulating a PV, from C<sv_catpvn>, in
F<sv.c>:

     1 void
     2 Perl_sv_catpvn(pTHX_ SV *sv, const char *ptr, STRLEN len)
     3 {
     4     STRLEN tlen;
     5     char *junk;

     6     junk = SvPV_force(sv, tlen);
     7     SvGROW(sv, tlen + len + 1);
     8     if (ptr == junk)
     9         ptr = SvPVX(sv);
    10     Move(ptr,SvPVX(sv)+tlen,len,char);
    11     SvCUR(sv) += len;
    12     *SvEND(sv) = '\0';
    13     (void)SvPOK_only_UTF8(sv); /* validate pointer */
    14     SvTAINT(sv);
    15 }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to
accommodate the old string, the new string and the null terminator. If
C<LEN> isn't big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add, we can
grab the string directly from the SV; C<SvPVX> is the address of the PV
in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around: we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended
the string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special
UTF-8-aware version of C<SvPOK_only>, a macro which turns off the IOK
and NOK flags and turns on POK. The final C<SvTAINT> is a macro which
launders tainted data if taint mode is turned on.

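Using the function is, of course, much simpler than implementing it; a
short sketch from the calling side:

    SV *sv = newSVpvn("Hello", 5);
    sv_catpvn(sv, ", world", 7);   /* sv's string is now "Hello, world" */
    SvREFCNT_dec(sv);              /* done with it                      */
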
AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head1 OP TREES

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing,
and it's the sequence of operations that Perl goes through to execute
your program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the
built-in functions and operators are ops, and there are a series of ops
which deal with concepts the interpreter needs internally - entering
and leaving a block, ending a statement, fetching a variable, and so
on.

The op tree is connected in two ways: you can imagine that there are
two "routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
and the CPAN module C<B::Debug> do.

Let's have a look at how Perl sees C<$a = $b + $c>:

    % perl -MO=Terse -e '$a=$b+$c'
    1  LISTOP (0x8179888) leave
    2      OP (0x81798b0) enter
    3      COP (0x8179850) nextstate
    4      BINOP (0x8179828) sassign
    5          BINOP (0x8179800) add [1]
    6              UNOP (0x81796e0) null [15]
    7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
    8              UNOP (0x81797e0) null [15]
    9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
    10         UNOP (0x816b4f0) null [15]
    11             SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is
on line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>, the
optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's
easier just to replace the redundant operation with the null op.
Originally, the tree would have looked like this:

    10         SVOP (0x816b4f0) rv2sv [15]
    11             SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add
together two C<gvsv>s.

Now, what's this about?

    1  LISTOP (0x8179888) leave
    2      OP (0x81798b0) enter
    3      COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, with
the ops to be performed for each statement being the children of
C<nextstate>. C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

        Program
           |
       Statement
           |
           =
          / \
         /   \
       $a     +
             / \
           $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which
points to the next op to be run, so following these pointers tells us
how perl executes the code. We can traverse the tree in this order
using the C<exec> option to C<B::Terse>:

    % perl -MO=Terse,exec -e '$a=$b+$c'
    1  OP (0x8179928) enter
    2  COP (0x81798c8) nextstate
    3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
    4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
    5  BINOP (0x8179878) add [1]
    6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
    7  BINOP (0x81798a0) sassign
    8  LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.

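You can walk the same chain from C. A hedged sketch, using the public
C<OP_NAME> macro and C<PL_main_start> (the first op of the main program
in execution order), would be:

    OP *o;
    for (o = PL_main_start; o; o = o->op_next)
        PerlIO_printf(Perl_debug_log, "%s\n", OP_NAME(o));
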
The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<toke.c>, the lexer, and F<perly.y>, the YACC
grammar. Let's look at the code that constructs the tree for C<$a = $b +
$c>.

First, we'll look at the C<Perl_yylex> function in the lexer. We want to
look for C<case 'x'>, where x is the first character of the operator.
(Incidentally, when looking for the code that handles a keyword, you'll
want to search for C<KEY_foo> where "foo" is the keyword.) Here is the code
that handles assignment (there are quite a few operators beginning with
C<=>, so most of it is omitted for brevity):

    1 case '=':
    2     s++;
          ... code that handles == => etc. and pod ...
    3     pl_yylval.ival = 0;
    4     OPERATOR(ASSIGNOP);

We can see on line 4 that our token type is C<ASSIGNOP> (C<OPERATOR> is a
macro, defined in F<toke.c>, that returns the token type, among other
things). And C<+>:

    1 case '+':
    2 {
    3     const char tmp = *s++;
          ... code for ++ ...
    4     if (PL_expect == XOPERATOR) {
              ...
    5         Aop(OP_ADD);
    6     }
          ...
    7 }

Line 4 checks what type of token we are expecting. C<Aop> returns a token.
If you search for C<Aop> elsewhere in F<toke.c>, you will see that it
returns an C<ADDOP> token.

Now that we know the two token types we want to look for in the parser,
let's take the piece of F<perly.y> we need to construct the tree for
C<$a = $b + $c>:

    1 term    :   term ASSIGNOP term
    2             { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3         |   term ADDOP term
    4             { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works:
You're fed certain things by the tokeniser, which generally end up in
upper case. C<ADDOP> and C<ASSIGNOP> are examples of "terminal
symbols", because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal
symbols", are generally placed in lower case. C<term> here is a
non-terminal symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this
code which contributes to the op tree.

    | term ADDOP term
    { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token
in the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator. The first parameter
to C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<ADDOP>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left and right hand side of our
expression, in scalar context.

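The same constructors are available from C outside the grammar too. As
an illustrative, hedged sketch of the call the C<ADDOP> rule ends up
making, here is how an addition of two constants could be built by
hand:

    /* build the op tree for the expression 3 + 4 */
    OP *left  = newSVOP(OP_CONST, 0, newSViv(3));
    OP *right = newSVOP(OP_CONST, 0, newSViv(4));
    OP *add   = newBINOP(OP_ADD, 0, scalar(left), scalar(right));
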
The functions that create ops, which have names like C<newUNOP> and
C<newBINOP>, call a "check" function associated with each op type, before
returning the op. The check functions can mangle the op as they see fit,
and even replace it with an entirely new one. These functions are defined
in F<op.c>, and have a C<Perl_ck_> prefix. You can find out which
check function is used for a particular op type by looking in
F<regen/opcodes>. Take C<OP_ADD>, for example. (C<OP_ADD> is the token
value from the C<Aop(OP_ADD)> in F<toke.c> which the parser passes to
C<newBINOP> as its first argument.) Here is the relevant line:

    add             addition (+)            ck_null         IfsT2   S S

The check function in this case is C<Perl_ck_null>, which does nothing.
Let's look at a more interesting case:

    readline        <HANDLE>                ck_readline     t%      F?

And here is the function from F<op.c>:

     1 OP *
     2 Perl_ck_readline(pTHX_ OP *o)
     3 {
     4     PERL_ARGS_ASSERT_CK_READLINE;
     5
     6     if (o->op_flags & OPf_KIDS) {
     7         OP *kid = cLISTOPo->op_first;
     8         if (kid->op_type == OP_RV2GV)
     9             kid->op_private |= OPpALLOW_FAKE;
    10     }
    11     else {
    12         OP * const newop
    13             = newUNOP(OP_READLINE, 0, newGVOP(OP_GV, 0,
    14                                               PL_argvgv));
    15         op_free(o);
    16         return newop;
    17     }
    18     return o;
    19 }

One particularly interesting aspect is that if the op has no kids (i.e.,
C<readline()> or C<< <> >>) the op is freed and replaced with an entirely
new one that references C<*ARGV> (lines 12-16).

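All check functions share this shape: they receive the op that has just
been built and must return an op, either the original (possibly
modified) or a replacement. A skeletal, hypothetical example:

    OP *
    my_ck_demo(pTHX_ OP *o)
    {
        /* inspect o, adjust its flags, or build and return a
         * replacement (freeing o), as Perl_ck_readline does above */
        return o;
    }
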
=head1 STACKS

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=head2 Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the
result back onto the stack. This is how, for instance, the cosine
operator works:

    NV value;
    value = POPn;
    value = Perl_cos(value);
    XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and
push the result back as an NV. The C<X> in C<XPUSHn> means that the
stack should be extended if necessary - it can't be necessary here,
because we know there's room for one more item on the stack, since
we've just removed one! The C<XPUSH*> macros at least guarantee safety.

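Putting those macros together, a complete PP-style function for a
hypothetical unary operator that squares its numeric argument might
look like this (a sketch only; a real op would also need entries in
F<regen/opcodes>):

    PP(pp_square)
    {
        dSP;
        NV value = POPn;          /* take one NV off the stack      */
        XPUSHn(value * value);    /* push the result, extending the */
        RETURN;                   /* stack if necessary             */
    }
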
Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

    SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.

=head2 Mark stack

I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for
the called function, and not (necessarily) let it get at your own data.
The way we do this is to have a "virtual" bottom-of-stack, exposed to
each function. The mark stack keeps bookmarks to locations in the
argument stack usable by each function. For instance, when dealing with
a tied variable (internally, something with "P" magic), Perl has to
call methods for accesses to the tied variables. However, we need to
separate the arguments exposed to the method from the arguments exposed
to the original function - the store or fetch or whatever it may be.
Here's roughly how the tied C<push> is implemented; see C<av_push> in
F<av.c>:

    1 PUSHMARK(SP);
    2 EXTEND(SP,2);
    3 PUSHs(SvTIED_obj((SV*)av, mg));
    4 PUSHs(val);
    5 PUTBACK;
    6 ENTER;
    7 call_method("PUSH", G_SCALAR|G_DISCARD);
    8 LEAVE;

Let's examine the whole implementation, for practice:

    1 PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This
is so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

    2 EXTEND(SP,2);
    3 PUSHs(SvTIED_obj((SV*)av, mg));
    4 PUSHs(val);

We're going to add two more items onto the argument stack: when you
have a tied array, the C<PUSH> subroutine receives the object and the
value to be pushed, and that's exactly what we have here - the tied
object, retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

=for apidoc_section Magic
=for apidoc Amh||SvTIED_obj|SV *sv|MAGIC *mg

    5 PUTBACK;

Next we tell Perl to update the global stack pointer from our internal
variable: C<dSP> only gave us a local copy, not a reference to the
global.

    6 ENTER;
    7 call_method("PUSH", G_SCALAR|G_DISCARD);
    8 LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that
all variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value. The C<call_method()> function
removes the top element of the mark stack, so there is nothing for the
caller to clean up.

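The same pattern works for calling any Perl subroutine from C, not just
tie methods; L<perlcall> has the full story. A hedged sketch, calling a
hypothetical sub C<My::hook> with two arguments and discarding the
result:

    dSP;

    ENTER;
    SAVETMPS;

    PUSHMARK(SP);
    EXTEND(SP, 2);
    PUSHs(sv_2mortal(newSViv(42)));
    PUSHs(sv_2mortal(newSVpvs("hello")));
    PUTBACK;

    call_pv("My::hook", G_DISCARD);

    FREETMPS;
    LEAVE;
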
=head2 Save stack

C doesn't have a concept of local scope, so perl provides one. We've
seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save
stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/"Localizing changes"> for how to use the save stack.

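From C, the equivalent effect comes from the C<SAVE*> macros used
between an C<ENTER>/C<LEAVE> pair; whatever was saved is restored
automatically when the scope is left. A small sketch (the variable is
made up):

    static int counter;    /* hypothetical C global                  */

    ENTER;
    SAVEINT(counter);      /* push the old value onto the save stack */
    counter = 42;
    /* ... code that relies on the temporary value ... */
    LEAVE;                 /* counter restored to its old value here */
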
=head1 MILLIONS OF MACROS

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand; others find it adds to clarity. Let's take an example,
a stripped-down version of the code which implements the addition
operator:

    1  PP(pp_add)
    2  {
    3      dSP; dATARGET;
    4      tryAMAGICbin_MG(add_amg, AMGf_assign|AMGf_numeric);
    5      {
    6        dPOPTOPnnrl_ul;
    7        SETn( left + right );
    8        RETURN;
    9      }
    10 }

Every line here (apart from the braces, of course) contains a macro.
The first line sets up the function declaration as Perl expects for PP
code; line 3 sets up variable declarations for the argument stack and
the target, the return value of the operation. Line 4 tries to see
if the addition operation is overloaded; if so, the appropriate
subroutine is called.

Line 6 is another variable declaration - all variable declarations
start with C<d> - which pops from the top of the argument stack two NVs
(hence C<nn>) and puts them into the variables C<right> and C<left>,
hence the C<rl>. These are the two operands to the addition operator.
Next, we call C<SETn> to set the NV of the return value to the result
of adding the two values. This done, we return - the C<RETURN> macro
makes sure that our return value is properly handled, and we pass the
next operator to run back to the main run loop.

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special
attention to L<perlguts/Background and MULTIPLICITY> for
information on the C<[pad]THX_?> macros.

=head1 FURTHER READING

For more information on the Perl internals, please see the documents
listed at L<perl/Internals and C Language Interface>.