perl5.git.perl.org Git - perl5.git/blame_incremental

... / ...

Commit	Line	Data
	1	=encoding utf8
	2
	3	=for comment
	4	Consistent formatting of this file is achieved with:
	5	perl ./Porting/podtidy pod/perlinterp.pod
	6
	7	=head1 NAME
	8
	9	perlinterp - An overview of the Perl interpreter
	10
	11	=head1 DESCRIPTION
	12
	13	This document provides an overview of how the Perl interpreter works at
	14	the level of C code, along with pointers to the relevant C source code
	15	files.
	16
	17	=head1 ELEMENTS OF THE INTERPRETER
	18
	19	The work of the interpreter has two main stages: compiling the code
	20	into the internal representation, or bytecode, and then executing it.
	21	L<perlguts/Compiled code> explains exactly how the compilation stage
	22	happens.
	23
	24	Here is a short breakdown of perl's operation:
	25
	26	=head2 Startup
	27
	28	The action begins in F<perlmain.c>. (or F<miniperlmain.c> for miniperl)
	29	This is very high-level code, enough to fit on a single screen, and it
	30	resembles the code found in L<perlembed>; most of the real action takes
	31	place in F<perl.c>
	32
	33	F<perlmain.c> is generated by C<ExtUtils::Miniperl> from
	34	F<miniperlmain.c> at make time, so you should make perl to follow this
	35	along.
	36
	37	First, F<perlmain.c> allocates some memory and constructs a Perl
	38	interpreter, along these lines:
	39
	40	1 PERL_SYS_INIT3(&argc,&argv,&env);
	41	2
	42	3 if (!PL_do_undump) {
	43	4 my_perl = perl_alloc();
	44	5 if (!my_perl)
	45	6 exit(1);
	46	7 perl_construct(my_perl);
	47	8 PL_perl_destruct_level = 0;
	48	9 }
	49
	50	Line 1 is a macro, and its definition is dependent on your operating
	51	system. Line 3 references C<PL_do_undump>, a global variable - all
	52	global variables in Perl start with C<PL_>. This tells you whether the
	53	current running program was created with the C<-u> flag to perl and
	54	then F<undump>, which means it's going to be false in any sane context.
	55
	56	Line 4 calls a function in F<perl.c> to allocate memory for a Perl
	57	interpreter. It's quite a simple function, and the guts of it looks
	58	like this:
	59
	60	my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
	61
	62	Here you see an example of Perl's system abstraction, which we'll see
	63	later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
	64	own C<malloc> as defined in F<malloc.c> if you selected that option at
	65	configure time.
	66
	67	Next, in line 7, we construct the interpreter using perl_construct,
	68	also in F<perl.c>; this sets up all the special variables that Perl
	69	needs, the stacks, and so on.
	70
	71	Now we pass Perl the command line options, and tell it to go:
	72
	73	exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
	74	if (!exitstatus)
	75	perl_run(my_perl);
	76
	77	exitstatus = perl_destruct(my_perl);
	78
	79	perl_free(my_perl);
	80
	81	C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
	82	in F<perl.c>, which processes the command line options, sets up any
	83	statically linked XS modules, opens the program and calls C<yyparse> to
	84	parse it.
	85
	86	=head2 Parsing
	87
	88	The aim of this stage is to take the Perl source, and turn it into an
	89	op tree. We'll see what one of those looks like later. Strictly
	90	speaking, there's three things going on here.
	91
	92	C<yyparse>, the parser, lives in F<perly.c>, although you're better off
	93	reading the original YACC input in F<perly.y>. (Yes, Virginia, there
	94	B<is> a YACC grammar for Perl!) The job of the parser is to take your
	95	code and "understand" it, splitting it into sentences, deciding which
	96	operands go with which operators and so on.
	97
	98	The parser is nobly assisted by the lexer, which chunks up your input
	99	into tokens, and decides what type of thing each token is: a variable
	100	name, an operator, a bareword, a subroutine, a core function, and so
	101	on. The main point of entry to the lexer is C<yylex>, and that and its
	102	associated routines can be found in F<toke.c>. Perl isn't much like
	103	other computer languages; it's highly context sensitive at times, it
	104	can be tricky to work out what sort of token something is, or where a
	105	token ends. As such, there's a lot of interplay between the tokeniser
	106	and the parser, which can get pretty frightening if you're not used to
	107	it.
	108
	109	As the parser understands a Perl program, it builds up a tree of
	110	operations for the interpreter to perform during execution. The
	111	routines which construct and link together the various operations are
	112	to be found in F<op.c>, and will be examined later.
	113
	114	=head2 Optimization
	115
	116	Now the parsing stage is complete, and the finished tree represents the
	117	operations that the Perl interpreter needs to perform to execute our
	118	program. Next, Perl does a dry run over the tree looking for
	119	optimisations: constant expressions such as C<3 + 4> will be computed
	120	now, and the optimizer will also see if any multiple operations can be
	121	replaced with a single one. For instance, to fetch the variable
	122	C<$foo>, instead of grabbing the glob C<*foo> and looking at the scalar
	123	component, the optimizer fiddles the op tree to use a function which
	124	directly looks up the scalar in question. The main optimizer is C<peep>
	125	in F<op.c>, and many ops have their own optimizing functions.
	126
	127	=head2 Running
	128
	129	Now we're finally ready to go: we have compiled Perl byte code, and all
	130	that's left to do is run it. The actual execution is done by the
	131	C<runops_standard> function in F<run.c>; more specifically, it's done
	132	by these three innocent looking lines:
	133
	134	while ((PL_op = PL_op->op_ppaddr(aTHX))) {
	135	PERL_ASYNC_CHECK();
	136	}
	137
	138	You may be more comfortable with the Perl version of that:
	139
	140	PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
	141
	142	Well, maybe not. Anyway, each op contains a function pointer, which
	143	stipulates the function which will actually carry out the operation.
	144	This function will return the next op in the sequence - this allows for
	145	things like C<if> which choose the next op dynamically at run time. The
	146	C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
	147	execution if required.
	148
	149	The actual functions called are known as PP code, and they're spread
	150	between four files: F<pp_hot.c> contains the "hot" code, which is most
	151	often used and highly optimized, F<pp_sys.c> contains all the
	152	system-specific functions, F<pp_ctl.c> contains the functions which
	153	implement control structures (C<if>, C<while> and the like) and F<pp.c>
	154	contains everything else. These are, if you like, the C code for Perl's
	155	built-in functions and operators.
	156
	157	Note that each C<pp_> function is expected to return a pointer to the
	158	next op. Calls to perl subs (and eval blocks) are handled within the
	159	same runops loop, and do not consume extra space on the C stack. For
	160	example, C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or
	161	C<CxEVAL> block struct onto the context stack which contain the address
	162	of the op following the sub call or eval. They then return the first op
	163	of that sub or eval block, and so execution continues of that sub or
	164	block. Later, a C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB>
	165	or C<CxEVAL>, retrieves the return op from it, and returns it.
	166
	167	=head2 Exception handing
	168
	169	Perl's exception handing (i.e. C<die> etc.) is built on top of the
	170	low-level C<setjmp()>/C<longjmp()> C-library functions. These basically
	171	provide a way to capture the current PC and SP registers and later
	172	restore them; i.e. a C<longjmp()> continues at the point in code where
	173	a previous C<setjmp()> was done, with anything further up on the C
	174	stack being lost. This is why code should always save values using
	175	C<SAVE_FOO> rather than in auto variables.
	176
	177	The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and
	178	C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
	179	C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
	180	C<die> within C<eval> does a C<JMPENV_JUMP(3)>.
	181
	182	At entry points to perl, such as C<perl_parse()>, C<perl_run()> and
	183	C<call_sv(cv, G_EVAL)> each does a C<JMPENV_PUSH>, then enter a runops
	184	loop or whatever, and handle possible exception returns. For a 2
	185	return, final cleanup is performed, such as popping stacks and calling
	186	C<CHECK> or C<END> blocks. Amongst other things, this is how scope
	187	cleanup still occurs during an C<exit>.
	188
	189	If a C<die> can find a C<CxEVAL> block on the context stack, then the
	190	stack is popped to that level and the return op in that block is
	191	assigned to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed.
	192	This normally passes control back to the guard. In the case of
	193	C<perl_run> and C<call_sv>, a non-null C<PL_restartop> triggers
	194	re-entry to the runops loop. The is the normal way that C<die> or
	195	C<croak> is handled within an C<eval>.
	196
	197	Sometimes ops are executed within an inner runops loop, such as tie,
	198	sort or overload code. In this case, something like
	199
	200	sub FETCH { eval { die } }
	201
	202	would cause a longjmp right back to the guard in C<perl_run>, popping
	203	both runops loops, which is clearly incorrect. One way to avoid this is
	204	for the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in
	205	the inner runops loop, but for efficiency reasons, perl in fact just
	206	sets a flag, using C<CATCH_SET(TRUE)>. The C<pp_require>,
	207	C<pp_entereval> and C<pp_entertry> ops check this flag, and if true,
	208	they call C<docatch>, which does a C<JMPENV_PUSH> and starts a new
	209	runops level to execute the code, rather than doing it on the current
	210	loop.
	211
	212	As a further optimisation, on exit from the eval block in the C<FETCH>,
	213	execution of the code following the block is still carried on in the
	214	inner loop. When an exception is raised, C<docatch> compares the
	215	C<JMPENV> level of the C<CxEVAL> with C<PL_top_env> and if they differ,
	216	just re-throws the exception. In this way any inner loops get popped.
	217
	218	Here's an example.
	219
	220	1: eval { tie @a, 'A' };
	221	2: sub A::TIEARRAY {
	222	3: eval { die };
	223	4: die;
	224	5: }
	225
	226	To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH>
	227	then enters a runops loop. This loop executes the eval and tie ops on
	228	line 1, with the eval pushing a C<CxEVAL> onto the context stack.
	229
	230	The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops
	231	loop to execute the body of C<TIEARRAY>. When it executes the entertry
	232	op on line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch>
	233	which does a C<JMPENV_PUSH> and starts a third runops loop, which then
	234	executes the die op. At this point the C call stack looks like this:
	235
	236	Perl_pp_die
	237	Perl_runops # third loop
	238	S_docatch_body
	239	S_docatch
	240	Perl_pp_entertry
	241	Perl_runops # second loop
	242	S_call_body
	243	Perl_call_sv
	244	Perl_pp_tie
	245	Perl_runops # first loop
	246	S_run_body
	247	perl_run
	248	main
	249
	250	and the context and data stacks, as shown by C<-Dstv>, look like:
	251
	252	STACK 0: MAIN
	253	CX 0: BLOCK =>
	254	CX 1: EVAL => AV() PV("A"\0)
	255	retop=leave
	256	STACK 1: MAGIC
	257	CX 0: SUB =>
	258	retop=(null)
	259	CX 1: EVAL => *
	260	retop=nextstate
	261
	262	The die pops the first C<CxEVAL> off the context stack, sets
	263	C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns
	264	to the top C<docatch>. This then starts another third-level runops
	265	level, which executes the nextstate, pushmark and die ops on line 4. At
	266	the point that the second C<pp_die> is called, the C call stack looks
	267	exactly like that above, even though we are no longer within an inner
	268	eval; this is because of the optimization mentioned earlier. However,
	269	the context stack now looks like this, ie with the top CxEVAL popped:
	270
	271	STACK 0: MAIN
	272	CX 0: BLOCK =>
	273	CX 1: EVAL => AV() PV("A"\0)
	274	retop=leave
	275	STACK 1: MAGIC
	276	CX 0: SUB =>
	277	retop=(null)
	278
	279	The die on line 4 pops the context stack back down to the CxEVAL,
	280	leaving it as:
	281
	282	STACK 0: MAIN
	283	CX 0: BLOCK =>
	284
	285	As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
	286	C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch:
	287
	288	S_docatch
	289	Perl_pp_entertry
	290	Perl_runops # second loop
	291	S_call_body
	292	Perl_call_sv
	293	Perl_pp_tie
	294	Perl_runops # first loop
	295	S_run_body
	296	perl_run
	297	main
	298
	299	In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
	300	differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
	301	and the C stack unwinds to:
	302
	303	perl_run
	304	main
	305
	306	Because C<PL_restartop> is non-null, C<run_body> starts a new runops
	307	loop and execution continues.
	308
	309	=head2 INTERNAL VARIABLE TYPES
	310
	311	You should by now have had a look at L<perlguts>, which tells you about
	312	Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
	313	that now.
	314
	315	These variables are used not only to represent Perl-space variables,
	316	but also any constants in the code, as well as some structures
	317	completely internal to Perl. The symbol table, for instance, is an
	318	ordinary Perl hash. Your code is represented by an SV as it's read into
	319	the parser; any program files you call are opened via ordinary Perl
	320	filehandles, and so on.
	321
	322	The core L<Devel::Peek\|Devel::Peek> module lets us examine SVs from a
	323	Perl program. Let's see, for instance, how Perl treats the constant
	324	C<"hello">.
	325
	326	% perl -MDevel::Peek -e 'Dump("hello")'
	327	1 SV = PV(0xa041450) at 0xa04ecbc
	328	2 REFCNT = 1
	329	3 FLAGS = (POK,READONLY,pPOK)
	330	4 PV = 0xa0484e0 "hello"\0
	331	5 CUR = 5
	332	6 LEN = 6
	333
	334	Reading C<Devel::Peek> output takes a bit of practise, so let's go
	335	through it line by line.
	336
	337	Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
	338	memory. SVs themselves are very simple structures, but they contain a
	339	pointer to a more complex structure. In this case, it's a PV, a
	340	structure which holds a string value, at location C<0xa041450>. Line 2
	341	is the reference count; there are no other references to this data, so
	342	it's 1.
	343
	344	Line 3 are the flags for this SV - it's OK to use it as a PV, it's a
	345	read-only SV (because it's a constant) and the data is a PV internally.
	346	Next we've got the contents of the string, starting at location
	347	C<0xa0484e0>.
	348
	349	Line 5 gives us the current length of the string - note that this does
	350	B<not> include the null terminator. Line 6 is not the length of the
	351	string, but the length of the currently allocated buffer; as the string
	352	grows, Perl automatically extends the available storage via a routine
	353	called C<SvGROW>.
	354
	355	You can get at any of these quantities from C very easily; just add
	356	C<Sv> to the name of the field shown in the snippet, and you've got a
	357	macro which will return the value: C<SvCUR(sv)> returns the current
	358	length of the string, C<SvREFCOUNT(sv)> returns the reference count,
	359	C<SvPV(sv, len)> returns the string itself with its length, and so on.
	360	More macros to manipulate these properties can be found in L<perlguts>.
	361
	362	Let's take an example of manipulating a PV, from C<sv_catpvn>, in
	363	F<sv.c>
	364
	365	1 void
	366	2 Perl_sv_catpvn(pTHX_ SV sv, const char ptr, STRLEN len)
	367	3 {
	368	4 STRLEN tlen;
	369	5 char *junk;
	370
	371	6 junk = SvPV_force(sv, tlen);
	372	7 SvGROW(sv, tlen + len + 1);
	373	8 if (ptr == junk)
	374	9 ptr = SvPVX(sv);
	375	10 Move(ptr,SvPVX(sv)+tlen,len,char);
	376	11 SvCUR(sv) += len;
	377	12 *SvEND(sv) = '\0';
	378	13 (void)SvPOK_only_UTF8(sv); /* validate pointer */
	379	14 SvTAINT(sv);
	380	15 }
	381
	382	This is a function which adds a string, C<ptr>, of length C<len> onto
	383	the end of the PV stored in C<sv>. The first thing we do in line 6 is
	384	make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
	385	macro to force a PV. As a side effect, C<tlen> gets set to the current
	386	value of the PV, and the PV itself is returned to C<junk>.
	387
	388	In line 7, we make sure that the SV will have enough room to
	389	accommodate the old string, the new string and the null terminator. If
	390	C<LEN> isn't big enough, C<SvGROW> will reallocate space for us.
	391
	392	Now, if C<junk> is the same as the string we're trying to add, we can
	393	grab the string directly from the SV; C<SvPVX> is the address of the PV
	394	in the SV.
	395
	396	Line 10 does the actual catenation: the C<Move> macro moves a chunk of
	397	memory around: we move the string C<ptr> to the end of the PV - that's
	398	the start of the PV plus its current length. We're moving C<len> bytes
	399	of type C<char>. After doing so, we need to tell Perl we've extended
	400	the string, by altering C<CUR> to reflect the new length. C<SvEND> is a
	401	macro which gives us the end of the string, so that needs to be a
	402	C<"\0">.
	403
	404	Line 13 manipulates the flags; since we've changed the PV, any IV or NV
	405	values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
	406	want to use the old IV of 10. C<SvPOK_only_utf8> is a special
	407	UTF-8-aware version of C<SvPOK_only>, a macro which turns off the IOK
	408	and NOK flags and turns on POK. The final C<SvTAINT> is a macro which
	409	launders tainted data if taint mode is turned on.
	410
	411	AVs and HVs are more complicated, but SVs are by far the most common
	412	variable type being thrown around. Having seen something of how we
	413	manipulate these, let's go on and look at how the op tree is
	414	constructed.
	415
	416	=head1 OP TREES
	417
	418	First, what is the op tree, anyway? The op tree is the parsed
	419	representation of your program, as we saw in our section on parsing,
	420	and it's the sequence of operations that Perl goes through to execute
	421	your program, as we saw in L</Running>.
	422
	423	An op is a fundamental operation that Perl can perform: all the
	424	built-in functions and operators are ops, and there are a series of ops
	425	which deal with concepts the interpreter needs internally - entering
	426	and leaving a block, ending a statement, fetching a variable, and so
	427	on.
	428
	429	The op tree is connected in two ways: you can imagine that there are
	430	two "routes" through it, two orders in which you can traverse the tree.
	431	First, parse order reflects how the parser understood the code, and
	432	secondly, execution order tells perl what order to perform the
	433	operations in.
	434
	435	The easiest way to examine the op tree is to stop Perl after it has
	436	finished parsing, and get it to dump out the tree. This is exactly what
	437	the compiler backends L<B::Terse\|B::Terse>, L<B::Concise\|B::Concise>
	438	and L<B::Debug\|B::Debug> do.
	439
	440	Let's have a look at how Perl sees C<$a = $b + $c>:
	441
	442	% perl -MO=Terse -e '$a=$b+$c'
	443	1 LISTOP (0x8179888) leave
	444	2 OP (0x81798b0) enter
	445	3 COP (0x8179850) nextstate
	446	4 BINOP (0x8179828) sassign
	447	5 BINOP (0x8179800) add [1]
	448	6 UNOP (0x81796e0) null [15]
	449	7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b
	450	8 UNOP (0x81797e0) null [15]
	451	9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c
	452	10 UNOP (0x816b4f0) null [15]
	453	11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a
	454
	455	Let's start in the middle, at line 4. This is a BINOP, a binary
	456	operator, which is at location C<0x8179828>. The specific operator in
	457	question is C<sassign> - scalar assignment - and you can find the code
	458	which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
	459	binary operator, it has two children: the add operator, providing the
	460	result of C<$b+$c>, is uppermost on line 5, and the left hand side is
	461	on line 10.
	462
	463	Line 10 is the null op: this does exactly nothing. What is that doing
	464	there? If you see the null op, it's a sign that something has been
	465	optimized away after parsing. As we mentioned in L</Optimization>, the
	466	optimization stage sometimes converts two operations into one, for
	467	example when fetching a scalar variable. When this happens, instead of
	468	rewriting the op tree and cleaning up the dangling pointers, it's
	469	easier just to replace the redundant operation with the null op.
	470	Originally, the tree would have looked like this:
	471
	472	10 SVOP (0x816b4f0) rv2sv [15]
	473	11 SVOP (0x816dcf0) gv GV (0x80fa460) *a
	474
	475	That is, fetch the C<a> entry from the main symbol table, and then look
	476	at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
	477	happens to do both these things.
	478
	479	The right hand side, starting at line 5 is similar to what we've just
	480	seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add
	481	together two C<gvsv>s.
	482
	483	Now, what's this about?
	484
	485	1 LISTOP (0x8179888) leave
	486	2 OP (0x81798b0) enter
	487	3 COP (0x8179850) nextstate
	488
	489	C<enter> and C<leave> are scoping ops, and their job is to perform any
	490	housekeeping every time you enter and leave a block: lexical variables
	491	are tidied up, unreferenced variables are destroyed, and so on. Every
	492	program will have those first three lines: C<leave> is a list, and its
	493	children are all the statements in the block. Statements are delimited
	494	by C<nextstate>, so a block is a collection of C<nextstate> ops, with
	495	the ops to be performed for each statement being the children of
	496	C<nextstate>. C<enter> is a single op which functions as a marker.
	497
	498	That's how Perl parsed the program, from top to bottom:
	499
	500	Program
	501	\|
	502	Statement
	503	\|
	504	=
	505	/ \
	506	/ \
	507	$a +
	508	/ \
	509	$b $c
	510
	511	However, it's impossible to B<perform> the operations in this order:
	512	you have to find the values of C<$b> and C<$c> before you add them
	513	together, for instance. So, the other thread that runs through the op
	514	tree is the execution order: each op has a field C<op_next> which
	515	points to the next op to be run, so following these pointers tells us
	516	how perl executes the code. We can traverse the tree in this order
	517	using the C<exec> option to C<B::Terse>:
	518
	519	% perl -MO=Terse,exec -e '$a=$b+$c'
	520	1 OP (0x8179928) enter
	521	2 COP (0x81798c8) nextstate
	522	3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
	523	4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c
	524	5 BINOP (0x8179878) add [1]
	525	6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a
	526	7 BINOP (0x81798a0) sassign
	527	8 LISTOP (0x8179900) leave
	528
	529	This probably makes more sense for a human: enter a block, start a
	530	statement. Get the values of C<$b> and C<$c>, and add them together.
	531	Find C<$a>, and assign one to the other. Then leave.
	532
	533	The way Perl builds up these op trees in the parsing process can be
	534	unravelled by examining F<toke.c>, the lexer, and F<perly.y>, the YACC
	535	grammar. Let's look at the code that constructs the tree for C<$a = $b +
	536	$c>.
	537
	538	First, we'll look at the C<Perl_yylex> function in the lexer. We want to
	539	look for C<case 'x'>, where x is the first character of the operator.
	540	(Incidentally, when looking for the code that handles a keyword, you'll
	541	want to search for C<KEY_foo> where "foo" is the keyword.) Here is the code
	542	that handles assignment (there are quite a few operators beginning with
	543	C<=>, so most of it is omitted for brevity):
	544
	545	1 case '=':
	546	2 s++;
	547	... code that handles == => etc. and pod ...
	548	3 pl_yylval.ival = 0;
	549	4 OPERATOR(ASSIGNOP);
	550
	551	We can see on line 4 that our token type is C<ASSIGNOP> (C<OPERATOR> is a
	552	macro, defined in F<toke.c>, that returns the token type, among other
	553	things). And C<+>:
	554
	555	1 case '+':
	556	2 {
	557	3 const char tmp = *s++;
	558	... code for ++ ...
	559	4 if (PL_expect == XOPERATOR) {
	560	...
	561	5 Aop(OP_ADD);
	562	6 }
	563	...
	564	7 }
	565
	566	Line 4 checks what type of token we are expecting. C<Aop> returns a token.
	567	If you search for C<Aop> elsewhere in F<toke.c>, you will see that it
	568	returns an C<ADDOP> token.
	569
	570	Now that we know the two token types we want to look for in the parser,
	571	let's take the piece of F<perly.y> we need to construct the tree for
	572	C<$a = $b + $c>
	573
	574	1 term : term ASSIGNOP term
	575	2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
	576	3 \| term ADDOP term
	577	4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
	578
	579	If you're not used to reading BNF grammars, this is how it works:
	580	You're fed certain things by the tokeniser, which generally end up in
	581	upper case. C<ADDOP> and C<ASSIGNOP> are examples of "terminal symbols",
	582	because you can't get any simpler than
	583	them.
	584
	585	The grammar, lines one and three of the snippet above, tells you how to
	586	build up more complex forms. These complex forms, "non-terminal
	587	symbols" are generally placed in lower case. C<term> here is a
	588	non-terminal symbol, representing a single expression.
	589
	590	The grammar gives you the following rule: you can make the thing on the
	591	left of the colon if you see all the things on the right in sequence.
	592	This is called a "reduction", and the aim of parsing is to completely
	593	reduce the input. There are several different ways you can perform a
	594	reduction, separated by vertical bars: so, C<term> followed by C<=>
	595	followed by C<term> makes a C<term>, and C<term> followed by C<+>
	596	followed by C<term> can also make a C<term>.
	597
	598	So, if you see two terms with an C<=> or C<+>, between them, you can
	599	turn them into a single expression. When you do this, you execute the
	600	code in the block on the next line: if you see C<=>, you'll do the code
	601	in line 2. If you see C<+>, you'll do the code in line 4. It's this
	602	code which contributes to the op tree.
	603
	604	\| term ADDOP term
	605	{ $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
	606
	607	What this does is creates a new binary op, and feeds it a number of
	608	variables. The variables refer to the tokens: C<$1> is the first token
	609	in the input, C<$2> the second, and so on - think regular expression
	610	backreferences. C<$$> is the op returned from this reduction. So, we
	611	call C<newBINOP> to create a new binary operator. The first parameter
	612	to C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
	613	operator, so we want the type to be C<ADDOP>. We could specify this
	614	directly, but it's right there as the second token in the input, so we
	615	use C<$2>. The second parameter is the op's flags: 0 means "nothing
	616	special". Then the things to add: the left and right hand side of our
	617	expression, in scalar context.
	618
	619	The functions that create ops, which have names like C<newUNOP> and
	620	C<newBINOP>, call a "check" function associated with each op type, before
	621	returning the op. The check functions can mangle the op as they see fit,
	622	and even replace it with an entirely new one. These functions are defined
	623	in F<op.c>, and have a C<Perl_ck_> prefix. You can find out which
	624	check function is used for a particular op type by looking in
	625	F<regen/opcodes>. Take C<OP_ADD>, for example. (C<OP_ADD> is the token
	626	value from the C<Aop(OP_ADD)> in F<toke.c> which the parser passes to
	627	C<newBINOP> as its first argument.) Here is the relevant line:
	628
	629	add addition (+) ck_null IfsT2 S S
	630
	631	The check function in this case is C<Perl_ck_null>, which does nothing.
	632	Let's look at a more interesting case:
	633
	634	readline <HANDLE> ck_readline t% F?
	635
	636	And here is the function from F<op.c>:
	637
	638	1 OP *
	639	2 Perl_ck_readline(pTHX_ OP *o)
	640	3 {
	641	4 PERL_ARGS_ASSERT_CK_READLINE;
	642	5
	643	6 if (o->op_flags & OPf_KIDS) {
	644	7 OP *kid = cLISTOPo->op_first;
	645	8 if (kid->op_type == OP_RV2GV)
	646	9 kid->op_private \|= OPpALLOW_FAKE;
	647	10 }
	648	11 else {
	649	12 OP * const newop
	650	13 = newUNOP(OP_READLINE, 0, newGVOP(OP_GV, 0,
	651	14 PL_argvgv));
	652	15 op_free(o);
	653	16 return newop;
	654	17 }
	655	18 return o;
	656	19 }
	657
	658	One particularly interesting aspect is that if the op has no kids (i.e.,
	659	C<readline()> or C<< <> >>) the op is freed and replaced with an entirely
	660	new one that references C<*ARGV> (lines 12-16).
	661
	662	=head1 STACKS
	663
	664	When perl executes something like C<addop>, how does it pass on its
	665	results to the next op? The answer is, through the use of stacks. Perl
	666	has a number of stacks to store things it's currently working on, and
	667	we'll look at the three most important ones here.
	668
	669	=head2 Argument stack
	670
	671	Arguments are passed to PP code and returned from PP code using the
	672	argument stack, C<ST>. The typical way to handle arguments is to pop
	673	them off the stack, deal with them how you wish, and then push the
	674	result back onto the stack. This is how, for instance, the cosine
	675	operator works:
	676
	677	NV value;
	678	value = POPn;
	679	value = Perl_cos(value);
	680	XPUSHn(value);
	681
	682	We'll see a more tricky example of this when we consider Perl's macros
	683	below. C<POPn> gives you the NV (floating point value) of the top SV on
	684	the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and
	685	push the result back as an NV. The C<X> in C<XPUSHn> means that the
	686	stack should be extended if necessary - it can't be necessary here,
	687	because we know there's room for one more item on the stack, since
	688	we've just removed one! The C<XPUSH*> macros at least guarantee safety.
	689
	690	Alternatively, you can fiddle with the stack directly: C<SP> gives you
	691	the first element in your portion of the stack, and C<TOP*> gives you
	692	the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
	693	negation of an integer:
	694
	695	SETi(-TOPi);
	696
	697	Just set the integer value of the top stack entry to its negation.
	698
	699	Argument stack manipulation in the core is exactly the same as it is in
	700	XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
	701	description of the macros used in stack manipulation.
	702
	703	=head2 Mark stack
	704
	705	I say "your portion of the stack" above because PP code doesn't
	706	necessarily get the whole stack to itself: if your function calls
	707	another function, you'll only want to expose the arguments aimed for
	708	the called function, and not (necessarily) let it get at your own data.
	709	The way we do this is to have a "virtual" bottom-of-stack, exposed to
	710	each function. The mark stack keeps bookmarks to locations in the
	711	argument stack usable by each function. For instance, when dealing with
	712	a tied variable, (internally, something with "P" magic) Perl has to
	713	call methods for accesses to the tied variables. However, we need to
	714	separate the arguments exposed to the method to the argument exposed to
	715	the original function - the store or fetch or whatever it may be.
	716	Here's roughly how the tied C<push> is implemented; see C<av_push> in
	717	F<av.c>:
	718
	719	1 PUSHMARK(SP);
	720	2 EXTEND(SP,2);
	721	3 PUSHs(SvTIED_obj((SV*)av, mg));
	722	4 PUSHs(val);
	723	5 PUTBACK;
	724	6 ENTER;
	725	7 call_method("PUSH", G_SCALAR\|G_DISCARD);
	726	8 LEAVE;
	727
	728	Let's examine the whole implementation, for practice:
	729
	730	1 PUSHMARK(SP);
	731
	732	Push the current state of the stack pointer onto the mark stack. This
	733	is so that when we've finished adding items to the argument stack, Perl
	734	knows how many things we've added recently.
	735
	736	2 EXTEND(SP,2);
	737	3 PUSHs(SvTIED_obj((SV*)av, mg));
	738	4 PUSHs(val);
	739
	740	We're going to add two more items onto the argument stack: when you
	741	have a tied array, the C<PUSH> subroutine receives the object and the
	742	value to be pushed, and that's exactly what we have here - the tied
	743	object, retrieved with C<SvTIED_obj>, and the value, the SV C<val>.
	744
	745	5 PUTBACK;
	746
	747	Next we tell Perl to update the global stack pointer from our internal
	748	variable: C<dSP> only gave us a local copy, not a reference to the
	749	global.
	750
	751	6 ENTER;
	752	7 call_method("PUSH", G_SCALAR\|G_DISCARD);
	753	8 LEAVE;
	754
	755	C<ENTER> and C<LEAVE> localise a block of code - they make sure that
	756	all variables are tidied up, everything that has been localised gets
	757	its previous value returned, and so on. Think of them as the C<{> and
	758	C<}> of a Perl block.
	759
	760	To actually do the magic method call, we have to call a subroutine in
	761	Perl space: C<call_method> takes care of that, and it's described in
	762	L<perlcall>. We call the C<PUSH> method in scalar context, and we're
	763	going to discard its return value. The call_method() function removes
	764	the top element of the mark stack, so there is nothing for the caller
	765	to clean up.
	766
	767	=head2 Save stack
	768
	769	C doesn't have a concept of local scope, so perl provides one. We've
	770	seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save
	771	stack implements the C equivalent of, for example:
	772
	773	{
	774	local $foo = 42;
	775	...
	776	}
	777
	778	See L<perlguts/"Localizing changes"> for how to use the save stack.
	779
	780	=head1 MILLIONS OF MACROS
	781
	782	One thing you'll notice about the Perl source is that it's full of
	783	macros. Some have called the pervasive use of macros the hardest thing
	784	to understand, others find it adds to clarity. Let's take an example,
	785	the code which implements the addition operator:
	786
	787	1 PP(pp_add)
	788	2 {
	789	3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
	790	4 {
	791	5 dPOPTOPnnrl_ul;
	792	6 SETn( left + right );
	793	7 RETURN;
	794	8 }
	795	9 }
	796
	797	Every line here (apart from the braces, of course) contains a macro.
	798	The first line sets up the function declaration as Perl expects for PP
	799	code; line 3 sets up variable declarations for the argument stack and
	800	the target, the return value of the operation. Finally, it tries to see
	801	if the addition operation is overloaded; if so, the appropriate
	802	subroutine is called.
	803
	804	Line 5 is another variable declaration - all variable declarations
	805	start with C<d> - which pops from the top of the argument stack two NVs
	806	(hence C<nn>) and puts them into the variables C<right> and C<left>,
	807	hence the C<rl>. These are the two operands to the addition operator.
	808	Next, we call C<SETn> to set the NV of the return value to the result
	809	of adding the two values. This done, we return - the C<RETURN> macro
	810	makes sure that our return value is properly handled, and we pass the
	811	next operator to run back to the main run loop.
	812
	813	Most of these macros are explained in L<perlapi>, and some of the more
	814	important ones are explained in L<perlxs> as well. Pay special
	815	attention to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for
	816	information on the C<[pad]THX_?> macros.
	817
	818	=head1 FURTHER READING
	819
	820	For more information on the Perl internals, please see the documents
	821	listed at L<perl/Internals and C Language Interface>.