=head1 NAME

perlhack - How to hack at the Perl internals

=head1 DESCRIPTION

This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.

=head1 HOW PERL DEVELOPMENT HAPPENS

=head2 Perl 5 Porters

The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed. The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate. Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.

A searchable archive of the list is at either:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

or

    http://archive.develooper.com/perl5-porters@perl.org/

List subscribers (the porters themselves) come in several flavours.
Some are quiet, curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl. Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms. Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain. In other
words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall. He has the final word
in what does and does not change in the Perl language. Various
releases of Perl are shepherded by a "pumpking", a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.

In addition, various people are pumpkings for different things. For
instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
I<Configure> pumpkin up till the 5.8 release. For the 5.10 release
H.Merijn Brand took over.

Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry). The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them. Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch. Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.

You might sometimes see references to Rule 1 and Rule 2. Larry's power
as Supreme Court is expressed in The Rules:

=over 4

=item 1

Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.

=item 2

Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.

=back

Got that? Larry is always right, even when he was wrong. It's rare
to see either Rule exercised, but they are often alluded to.

=head2 What makes for a good patch?

New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages. Instead,
the heuristics are flexible and often difficult to fathom. Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:

=over 4

=item Does the concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation
is:

    1. Keep it fast, simple, and useful.
    2. Keep features/concepts as orthogonal as possible.
    3. No arbitrary limits (platforms, data sizes, cultures).
    4. Keep it open and exciting to use/patch/advocate Perl everywhere.
    5. Either assimilate new technologies, or build bridges to them.

=item Where is the implementation?

All the talk in the world is useless without an implementation. In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.

=item Backwards compatibility

It's a cardinal sin to break existing Perl programs. New warnings are
contentious--some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break programs too.

=item Could it be a module instead?

Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter. You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things. If it can be done in a
module instead of in the core, it's highly unlikely to be added.

=item Is the feature generic enough?

Is this something that only the submitter wants added to the language,
or would it be broadly useful? Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature. For instance, instead of
implementing a "delayed evaluation" feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.

=item Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs. The smaller and more localized the
change, the better.

=item Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of
development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.

=item Is the implementation robust?

Good patches (tight code, complete, correct) stand more chance of
going in. Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix, or might be discarded
altogether without further notice.

=item Is the implementation generic enough to be portable?

The worst patches make use of system-specific features. It's highly
unlikely that non-portable additions to the Perl language will be
accepted.

=item Is the implementation tested?

Patches which change behaviour (fixing bugs or introducing new features)
must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing
perl in the future be sure that they haven't unwittingly broken the behaviour
the patch implements? And without tests, how can the patch's author be
confident that his/her hard work put into the patch won't be accidentally
thrown away by someone in the future?

=item Is there enough documentation?

Patches without documentation are probably ill-thought-out or
incomplete. Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.

=item Is there another way to do it?

Larry said "Although the Perl Slogan is I<There's More Than One Way
to Do It>, I hesitate to make 10 ways to do something". This is a
tricky heuristic to navigate, though--one man's essential addition is
another man's pointless cruft.

=item Does it create too much work?

Work for the pumpking, work for Perl programmers, work for module
authors, ... Perl is supposed to be easy.

=item Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas. A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be. This ties into "Will it be useful?", as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.

=back

If you're on the list, you might hear the word "core" bandied
around. It refers to the standard distribution. "Hacking on the
core" means you're changing the C source code to the Perl
interpreter. "A core module" is one that ships with Perl.

=head2 Getting the Perl source

The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by the git revision control system. The
pumpkings and a few others have write access to the repository to check in
changes.

How to clone and use the git perl repository is described in L<perlrepository>.

You can also choose to use rsync to get a copy of the current source tree
for the bleadperl branch and all maintenance branches:

    $ rsync -avz rsync://perl5.git.perl.org/perl-current .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.12.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.10.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.8.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.6.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.005xx .

(Add the C<--delete> option to remove leftover files.)

To get a full list of the available sync points:

    $ rsync perl5.git.perl.org::

You may also want to subscribe to the perl5-changes mailing list to
receive a copy of each patch that gets submitted to the maintenance
and development "branches" of the perl repository. See
http://lists.perl.org/ for subscription information.

If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already
solved in the most recent available perl development branch, also
known as perl-current, bleeding edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a perpetual
state of evolution. You should expect it to be very buggy. Do B<not> use
it for any purpose other than testing and development.

=head2 Bug tracking with Perlbug

There is a single remote administrative interface for modifying bug status,
category, open issues etc. using the B<RT> bugtracker system, maintained
by Robert Spier. Become an administrator, and close any bugs you can get
your sticky mitts on:

    http://bugs.perl.org/

To email the bug system administrators:

    "perlbug-admin" <perlbug-admin@perl.org>

=head2 Submitting patches

Always submit patches to I<perl5-porters@perl.org>. If you're
patching a core module and there's an author listed, send the author a
copy (see L<Patching a core module>). This lets other porters review
your patch, which catches a surprising number of errors in patches.
Please patch against the latest B<development> version. For example, even if
you're fixing a bug in the 5.8 track, patch against the C<blead> branch in
the git repository.

If changes are accepted, they are applied to the development branch. Then
the maintenance pumpking decides which of those patches is to be
backported to the maint branch. Only patches that survive the heat of the
development branch get applied to maintenance versions.

Your patch should update the documentation and test suite. See
L<TESTING>. If you have added or removed files in the distribution,
edit the MANIFEST file accordingly, sort the MANIFEST file using
C<make manisort>, and include those changes as part of your patch.

Patching documentation also follows the same order: if accepted, a patch
is first applied to B<development>, and if relevant then it's backported
to B<maintenance>. The exception is patches that document behaviour that
only appears in the maintenance branch but has changed in the development
version.

To report a bug in Perl, use the program L<perlbug> which comes with
Perl (if you can't get Perl to work, send mail to the address
I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through
I<perlbug> feeds into the automated bug-tracking system, access to
which is provided through the web at http://rt.perl.org/rt3/ . It
often pays to check the archives of the perl5-porters mailing list to
see whether the bug you're reporting has been reported before, and if
so whether it was considered a bug. See above for the location of
the searchable archives.

The CPAN testers ( http://testers.cpan.org/ ) are a group of
volunteers who test CPAN modules on a variety of platforms. Perl
Smokers ( http://www.nntp.perl.org/group/perl.daily-build and
http://www.nntp.perl.org/group/perl.daily-build.reports/ )
automatically test Perl source releases on platforms with various
configurations. Both efforts welcome volunteers. To get
involved in smoke testing of perl itself, visit
L<http://search.cpan.org/dist/Test-Smoke>. To start smoke
testing CPAN modules, visit L<http://search.cpan.org/dist/CPANPLUS-YACSmoke/>,
L<http://search.cpan.org/dist/minismokebox/> or
L<http://search.cpan.org/dist/CPAN-Reporter/>.

It's a good idea to read and lurk for a while before chipping in.
That way you'll get to see the dynamic of the conversations, learn the
personalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters
mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To
unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.

=head2 Patching a core module

This works just like patching anything else, with an extra
consideration. Many core modules also live on CPAN. If this is so,
patch the CPAN version instead of the core and send the patch off to
the module maintainer (with a copy to p5p). This will help the module
maintainer keep the CPAN version in sync with the core version without
constantly scanning p5p.

The list of maintainers of core modules is usefully documented in
F<Porting/Maintainers.pl>.

=head2 Adding a new function to the core

If, as part of a patch to fix a bug, or just because you have an
especially good idea, you decide to add a new function to the core,
discuss your ideas on p5p well before you start work. It may be that
someone else has already attempted to do what you are considering and
can give lots of good advice or even provide you with bits of code
that they already started (but never finished).

You have to follow all of the advice given above for patching. It is
extremely important to test any addition thoroughly and add new tests
to explore all boundary conditions that your new function is expected
to handle. If your new function is used only by one module (e.g. toke),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions>
for more details.
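
As a minimal illustration of the naming rule, the sketch below pairs a
file-private helper with a function meant to be used from other source
files. Both names (C<S_count_colons>, C<Perl_package_depth>) are
hypothetical and do not exist in the core, and the sketch deliberately
leaves out the interpreter context argument (C<pTHX_>) and the
F<embed.pl> entries that real core functions need; it only shows which
kind of function gets which prefix and which linkage.

```c

    #include <stddef.h>

    /* Hypothetical helper used only inside this .c file:
     * static linkage, so it gets the S_ prefix. */
    static size_t S_count_colons(const char *s)
    {
        size_t n = 0;
        for (; *s; s++)
            if (*s == ':')
                n++;
        return n;
    }

    /* Hypothetical function meant to be called from other source files:
     * external linkage, so it gets the Perl_ prefix.
     * "A::B::C" has two "::" separators (four colons), hence three parts.
     * Assumes a well-formed package name. */
    size_t Perl_package_depth(const char *name)
    {
        return S_count_colons(name) / 2 + 1;
    }

```

Only C<Perl_package_depth> is visible to the linker, so only it could be
called from, say, F<universal.c>; C<S_count_colons> stays private to its
own file.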

The location of any new code is also an important consideration. Don't
just create a new top level .c file and put your code there; you would
have to make changes to Configure (so the Makefile is created properly),
as well as possibly lots of include files. This is strictly pumpking
business.

It is better to add your function to one of the existing top level
source code files, but your choice is complicated by the nature of
the Perl distribution. Only the files that are marked as compiled
static are located in the perl executable. Everything else is located
in the shared library (or DLL if you are running under WIN32). So,
for example, if a function was only used by functions located in
toke.c, then your code can go in toke.c. If, however, you want to call
the function from universal.c, then you should put your code in another
location, for example util.c.

In addition to writing your C code, you will need to create an
appropriate entry in embed.pl describing your function, then run
'make regen_headers' to create the entries in the numerous header
files that perl needs to compile correctly. See L<perlguts/Internal Functions>
for information on the various options that you can set in embed.pl.
You will forget to do this a few (or many) times and you will get
warnings during the compilation phase. Make sure that you mention
this when you post your patch to P5P; the pumpking needs to know this.

When you write your new code, please be conscious of existing code
conventions used in the perl source files. See L<perlstyle> for
details. Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
Also see L<perlrepository> for lots of details about both formatting and
submitting patches of your changes.

Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
Test on as many platforms as you can find. Test as many perl
Configure options as you can (e.g. MULTIPLICITY). If you have
profiling or memory tools, see L<MEMORY DEBUGGERS> and L<PROFILING>
below for how to use them to further test your code. Remember that
most of the people on P5P are doing this on their own time and
don't have the time to debug your code.

=head2 Background reading

To hack on the Perl guts, you'll need to read the following things:

=over 3

=item L<perlguts>

This is of paramount importance, since it's the documentation of what
goes where in the Perl source. Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.

Gisle Aas's "illustrated perlguts", also known as I<illguts>, has very
helpful pictures:

L<http://search.cpan.org/dist/illguts/>

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program. It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html .
It contains hints on reading perl5-porters, information on how
perl5-porters works and how Perl development in general works.

=back

=head1 UNDERSTANDING THE SOURCE

=head2 Finding your way around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area. These areas sometimes
correspond to files or directories in the source kit. Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in various subdirectories, where
two are dedicated to core-only modules, and two are for the dual-life modules
which live on CPAN and may be maintained separately with respect to the Perl
core:

    lib/  is for pure-Perl modules, which exist in the core only.

    ext/  is for XS extensions, and modules with special Makefile.PL
          requirements, which exist in the core only.

    cpan/ is for dual-life modules, where the CPAN module is
          canonical (should be patched first).

    dist/ is for dual-life modules, where the blead source is
          canonical.

For some dual-life modules it has not been discussed whether the CPAN
version or the blead source is canonical. Until that is done, those
modules should be in F<cpan/>.

=item Tests

There are tests for nearly all the modules, built-ins and major bits
of functionality. Test files all have a .t suffix. Module tests live
in the F<lib/> and F<ext/> directories next to the module being
tested. Others live in F<t/>. See L<TESTING>.

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation) and
the documentation for the modules in core.

=item Configure

The Configure process is the way we make Perl portable across the
myriad of operating systems it supports. Responsibility for the
Configure, build and installation process, as well as the overall
portability of the core code rests with the Configure pumpkin -
others help out with individual operating systems.

The three files that fall under his/her responsibility are Configure,
config_h.SH, and Porting/Glossary (and a whole bunch of small related
files that are less important here). The Configure pumpkin decides how
patches to these are dealt with. Currently, the Configure pumpkin will
accept patches in most common formats, even directly to these files.
Other committers are allowed to commit to these files under the strict
condition that they will inform the Configure pumpkin, either on IRC
(if he/she happens to be around) or through (personal) e-mail.

The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>. (metaconfig isn't included in the core distribution.)

See http://perl5.git.perl.org/metaconfig.git/blob/HEAD:/README for a
description of the full process involved.

=item Interpreter

And of course, there's the core of the Perl interpreter itself. Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too. For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

F<perlmain.c> is generated by C<ExtUtils::Miniperl> from F<miniperlmain.c> at
make time, so you should build perl before following along.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter, along these lines:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable - all
global variables in Perl start with C<PL_>. This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it look like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.
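
The same pattern can be sketched in plain C: callers always go through
one macro, and a build-time choice decides what that macro expands to.
This is only an illustration; the names C<toy_mem_malloc>, C<toy_malloc>,
C<toy_alloc> and the C<USE_TOY_MALLOC> switch are hypothetical, and
perl's real definitions (spread across headers such as F<iperlsys.h>
and F<malloc.c>) are considerably more involved.

```c

    #include <stdlib.h>

    /* Hypothetical build-time switch, in the spirit of choosing perl's own
     * malloc at Configure time. Compile with -DUSE_TOY_MALLOC to route
     * allocations through the custom allocator below. */
    #ifdef USE_TOY_MALLOC
    static size_t toy_bytes_requested;   /* simple allocation statistics */

    static void *toy_malloc(size_t size)
    {
        toy_bytes_requested += size;
        return malloc(size);
    }
    #  define toy_mem_malloc(size) toy_malloc(size)
    #else
    #  define toy_mem_malloc(size) malloc(size)
    #endif
    #define toy_mem_free(ptr) free(ptr)

    /* Stand-in for PerlInterpreter. */
    typedef struct {
        int initialized;
    } toy_interpreter;

    /* Mirrors perl_alloc()'s PerlMem_malloc(sizeof(PerlInterpreter)) call:
     * the caller never needs to know which allocator was configured.
     * The cast is redundant in C but keeps the code valid C++ as well. */
    static toy_interpreter *toy_alloc(void)
    {
        toy_interpreter *interp =
            (toy_interpreter *)toy_mem_malloc(sizeof(toy_interpreter));
        if (interp)
            interp->initialized = 0;
        return interp;
    }

```

Switching allocators then means recompiling with a different definition,
without touching any of the call sites.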

Next, in line 7, we construct the interpreter using C<perl_construct>,
also in F<perl.c>; this sets up all the special variables that Perl
needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus)
        perl_run(my_perl);

    exitstatus = perl_destruct(my_perl);

    perl_free(my_perl);

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

=item Parsing

The aim of this stage is to take the Perl source, and turn it into an op
tree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser and
the parser, which can get pretty frightening if you're not used to it.
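
To give a feel for what chunking input into tokens means, here is a toy
tokenizer for a tiny Perl-like subset. It is a hypothetical sketch
(C<toy_yylex> and its token types are made up), not a model of
F<toke.c>: the real lexer keeps far more state, for example whether it
expects a term or an operator next, precisely because Perl's tokens are
so context sensitive. Here every bareword is simply reported as a word,
and any other single character counts as an operator.

```c

    #include <ctype.h>
    #include <stddef.h>

    enum toy_tok { TOK_EOF, TOK_SCALAR, TOK_NUMBER, TOK_WORD, TOK_OP };

    struct toy_lexer {
        const char *p;       /* current position in the source text */
        const char *start;   /* start of the most recent token */
        size_t len;          /* length of the most recent token */
    };

    static int toy_is_word_char(char c)
    {
        return isalnum((unsigned char)c) || c == '_';
    }

    /* Return the type of the next token and record where it lies. */
    static enum toy_tok toy_yylex(struct toy_lexer *lx)
    {
        enum toy_tok type;

        while (isspace((unsigned char)*lx->p))
            lx->p++;
        lx->start = lx->p;

        if (*lx->p == '\0') {
            type = TOK_EOF;
        } else if (*lx->p == '$' && toy_is_word_char(lx->p[1])) {
            lx->p++;                          /* the sigil */
            while (toy_is_word_char(*lx->p))
                lx->p++;
            type = TOK_SCALAR;
        } else if (isdigit((unsigned char)*lx->p)) {
            while (isdigit((unsigned char)*lx->p))
                lx->p++;
            type = TOK_NUMBER;
        } else if (toy_is_word_char(*lx->p)) {
            while (toy_is_word_char(*lx->p))
                lx->p++;
            type = TOK_WORD;   /* bareword, sub or built-in: toke.c must decide */
        } else {
            lx->p++;           /* anything else: a one-character operator */
            type = TOK_OP;
        }
        lx->len = (size_t)(lx->p - lx->start);
        return type;
    }

```

Fed C<print $foo + 42;>, it yields a word, a scalar, an operator, a
number and a final operator. A real Perl lexer could not treat C</> this
simply, since it may start a pattern match or be a division operator
depending on what came before; deciding such cases is exactly what the
interplay between F<toke.c> and the parser is for.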

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable C<$foo>,
instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.
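
Constant folding, the first optimisation mentioned above, can be
sketched on a made-up expression tree. The node layout and C<toy_fold>
are hypothetical; perl does its folding on real OPs in F<op.c>, but the
idea of replacing an operator whose operands are all constants with a
single constant is the same.

```c

    #include <stddef.h>

    /* Hypothetical node kinds for a tiny expression language. */
    enum toy_kind { T_CONST, T_VAR, T_ADD, T_MUL };

    struct toy_node {
        enum toy_kind kind;
        long value;                   /* used when kind == T_CONST */
        struct toy_node *lhs, *rhs;   /* used by T_ADD and T_MUL */
    };

    /* Fold bottom-up: once both operands of an operator are constants,
     * compute the result now and turn the node itself into a constant. */
    static void toy_fold(struct toy_node *n)
    {
        if (n->kind != T_ADD && n->kind != T_MUL)
            return;                   /* leaves: nothing to fold */
        toy_fold(n->lhs);
        toy_fold(n->rhs);
        if (n->lhs->kind == T_CONST && n->rhs->kind == T_CONST) {
            long a = n->lhs->value, b = n->rhs->value;
            n->value = (n->kind == T_ADD) ? a + b : a * b;
            n->kind = T_CONST;
            n->lhs = n->rhs = NULL;   /* the children are no longer needed */
        }
    }

```

With this, C<(3 + 4) * 2> collapses to the single constant 14, while an
expression such as C<$x * 2> is left alone because C<$x> is only known
at run time.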

=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent looking lines:

    while ((PL_op = PL_op->op_ppaddr(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.
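
The shape of that loop can be reproduced with a toy op structure.
Everything here (C<toy_op>, the three C<toy_pp_> functions and the single
accumulator standing in for perl's argument stack) is hypothetical, and
there is no equivalent of C<PERL_ASYNC_CHECK>; the point is only that
each op's function does its work and then returns the next op, which
lets a conditional op pick its successor at run time.

```c

    #include <stddef.h>

    /* Hypothetical op structure: like a perl OP, each op carries a
     * pointer to the function that executes it. */
    typedef struct toy_op toy_op;
    struct toy_op {
        toy_op *(*ppaddr)(toy_op *self);  /* cf. op_ppaddr */
        toy_op *next;                     /* usual successor */
        toy_op *other;                    /* alternative successor for conditionals */
        long arg;                         /* operand for this op */
    };

    static long toy_acc;   /* one-value stand-in for perl's argument stack */

    static toy_op *toy_pp_const(toy_op *self)   /* load a constant */
    {
        toy_acc = self->arg;
        return self->next;
    }

    static toy_op *toy_pp_add(toy_op *self)     /* add a constant */
    {
        toy_acc += self->arg;
        return self->next;
    }

    /* Like the op behind an 'if': the successor is chosen at run time. */
    static toy_op *toy_pp_cond(toy_op *self)
    {
        return toy_acc != 0 ? self->next : self->other;
    }

    /* Same shape as the runops loop: run the current op, continue with
     * whatever it returned, and stop when it returns NULL. */
    static void toy_runops(toy_op *op)
    {
        while (op)
            op = op->ppaddr(op);
    }

```

Starting from a C<toy_pp_const> op that loads 5, the conditional op
continues with its C<next> branch; if the constant is changed to 0, the
very same op tree follows the C<other> branch instead.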
654
655The actual functions called are known as PP code, and they're spread
b432a672 656between four files: F<pp_hot.c> contains the "hot" code, which is most
a422fd2d
SC
657often used and highly optimized, F<pp_sys.c> contains all the
658system-specific functions, F<pp_ctl.c> contains the functions which
659implement control structures (C<if>, C<while> and the like) and F<pp.c>
660contains everything else. These are, if you like, the C code for Perl's
661built-in functions and operators.
662
dfc98234
DM
Note that each C<pp_> function is expected to return a pointer to the next
op. Calls to perl subs (and eval blocks) are handled within the same
runops loop, and do not consume extra space on the C stack. For example,
C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
struct onto the context stack; this struct contains the address of the op
following the sub call or eval. They then return the first op of that sub
or eval block, so execution continues with that sub or block. Later, a
C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
retrieves the return op from it, and returns it.
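
Here is a sketch of how a sub call can live inside the same loop, again
using hypothetical toy types rather than perl's C<PERL_CONTEXT>. The toy
C<pp_entersub> pushes an entry holding the return op and jumps into the
body, and the toy C<pp_leavesub> pops it and resumes there. The C
C<runops> function never recurses, however many calls are made.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy model only: not perl's real OP or PERL_CONTEXT types. */
    struct op;
    typedef struct op *(*ppaddr_t)(struct op *self);

    struct op {
        ppaddr_t   op_ppaddr;
        struct op *op_next;    /* op that follows this one             */
        struct op *op_other;   /* entersub: first op of the sub's body */
        int        arg;
    };

    /* A context entry, loosely like a CxSUB: where to resume afterwards. */
    struct context { struct op *retop; };

    #define CXSTACK_MAX 16
    static struct context cxstack[CXSTACK_MAX];
    static int cxix = -1;      /* index of the top context entry */
    static int acc;

    static struct op *pp_addconst(struct op *self) {
        acc += self->arg;
        return self->op_next;
    }

    /* Record the op after the call, then continue with the sub's body. */
    static struct op *pp_entersub(struct op *self) {
        assert(cxix + 1 < CXSTACK_MAX);
        cxstack[++cxix].retop = self->op_next;
        return self->op_other;
    }

    /* Pop the context and resume at the op recorded by pp_entersub. */
    static struct op *pp_leavesub(struct op *self) {
        (void)self;
        assert(cxix >= 0);
        return cxstack[cxix--].retop;
    }

    /* Same flat loop as before: calls never recurse on the C stack. */
    static void runops(struct op *o) {
        while ((o = o->op_ppaddr(o)) != NULL)
            continue;
    }

    /* Main: acc += 1; call sub; acc += 1000; call sub.
     * Sub:  acc += 10; return.                              */
    static int run_demo(void) {
        struct op leave = { pp_leavesub, NULL,   NULL,  0 };
        struct op body  = { pp_addconst, &leave, NULL,  10 };
        struct op call2 = { pp_entersub, NULL,   &body, 0 };
        struct op add2  = { pp_addconst, &call2, NULL,  1000 };
        struct op call1 = { pp_entersub, &add2,  &body, 0 };
        struct op start = { pp_addconst, &call1, NULL,  1 };
        acc = 0;
        cxix = -1;
        runops(&start);
        return acc;
    }

```

C<run_demo()> calls the same body from two places and ends with C<acc>
equal to 1021 and the toy context stack empty again; the return address
lives on the context stack rather than on the C stack.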

=item Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
way to capture the current PC and SP registers and later restore them; i.e.
a C<longjmp()> continues at the point in code where a previous C<setjmp()>
was done, with anything further up on the C stack being lost. This is why
code should always save values using C<SAVE_FOO> rather than in auto
variables.

The perl core wraps C<setjmp()> etc. in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
C<die> (in the absence of C<eval>), perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

Entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, each do a C<JMPENV_PUSH>, then enter a runops
loop or whatever, and handle possible exception returns. For a 2 return,
final cleanup is performed, such as popping stacks and calling C<CHECK> or
C<END> blocks. Amongst other things, this is how scope cleanup still
occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is assigned
to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally
passes control back to the guard. In the case of C<perl_run> and
C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops
loop. This is the normal way that C<die> or C<croak> is handled within an
C<eval>.
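
The following sketch puts the last few paragraphs together in a toy
interpreter. It assumes hypothetical names throughout (C<toy_perl_run>,
C<top_env>, C<restartop>, C<eval_retops>); it mirrors the return codes
described above (2 for a die with no eval, 3 for a die caught by an eval)
but is not perl's C<JMPENV> implementation. The guard uses
C<switch (setjmp(...))>, one of the forms in which the C standard allows
C<setjmp> to be called.

```c

    #include <assert.h>
    #include <setjmp.h>
    #include <stddef.h>

    /* Toy model only: mimics the shape of JMPENV_PUSH/JMPENV_JUMP and PL_restartop. */
    struct op;
    typedef struct op *(*ppaddr_t)(struct op *self);
    struct op {
        ppaddr_t   op_ppaddr;
        struct op *op_next;
        struct op *op_other;   /* entertry: op to resume at after a die */
        int        arg;
    };

    struct jmpenv { jmp_buf buf; struct jmpenv *prev; };

    static struct jmpenv *top_env;          /* like PL_top_env       */
    static struct op     *restartop;        /* like PL_restartop     */
    static struct op     *eval_retops[16];  /* toy CxEVAL return ops */
    static int            cxix = -1;
    static int            acc, restarts;

    static struct op *pp_addconst(struct op *o) { acc += o->arg; return o->op_next; }

    static struct op *pp_entertry(struct op *o) {
        assert(cxix + 1 < 16);
        eval_retops[++cxix] = o->op_other;  /* where to go after a die */
        return o->op_next;
    }

    static struct op *pp_leavetry(struct op *o) {
        assert(cxix >= 0);
        cxix--;
        return o->op_next;
    }

    static struct op *pp_die(struct op *o) {
        (void)o;
        if (cxix >= 0) {                    /* caught: pop to the eval */
            restartop = eval_retops[cxix--];
            longjmp(top_env->buf, 3);
        }
        longjmp(top_env->buf, 2);           /* uncaught: like exit */
        return NULL;                        /* not reached */
    }

    static void runops(struct op *o) {
        while ((o = o->op_ppaddr(o)) != NULL)
            continue;
    }

    /* Guard in the style of perl_run(): push a jump environment, run the
     * loop, and re-enter the loop at restartop after a caught die. */
    static int toy_perl_run(struct op *start) {
        struct jmpenv env;
        int status = 0;
        env.prev = top_env;
        top_env = &env;
        switch (setjmp(env.buf)) {
        case 0:                             /* first time through */
            runops(start);
            break;
        case 2:                             /* exit or uncaught die */
            status = 2;
            break;
        case 3:                             /* die inside an eval */
            restarts++;
            if (restartop) {
                struct op *o = restartop;
                restartop = NULL;
                runops(o);                  /* a later die lands here again */
            }
            break;
        }
        top_env = env.prev;
        return status;
    }

    /* acc += 1; eval { acc += 10; die; acc += 100 }; acc += 1000; */
    static int demo_eval(void) {
        struct op after = { pp_addconst, NULL,   NULL,   1000 };
        struct op leave = { pp_leavetry, &after, NULL,   0 };
        struct op skip  = { pp_addconst, &leave, NULL,   100 };
        struct op die   = { pp_die,      &skip,  NULL,   0 };
        struct op body  = { pp_addconst, &die,   NULL,   10 };
        struct op enter = { pp_entertry, &body,  &after, 0 };
        struct op start = { pp_addconst, &enter, NULL,   1 };
        acc = 0; restarts = 0; cxix = -1;
        return toy_perl_run(&start);
    }

    /* acc += 1; die; acc += 5;   (no eval around the die) */
    static int demo_no_eval(void) {
        struct op tail  = { pp_addconst, NULL,  NULL, 5 };
        struct op die   = { pp_die,      &tail, NULL, 0 };
        struct op start = { pp_addconst, &die,  NULL, 1 };
        acc = 0; restarts = 0; cxix = -1;
        return toy_perl_run(&start);
    }

```

With these toys, C<demo_eval()> returns 0 with C<acc> equal to 1011: the
op after the die is skipped and execution resumes after the eval in a
fresh pass through the loop. C<demo_no_eval()> returns 2 with C<acc>
equal to 1.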

Sometimes ops are executed within an inner runops loop, such as tie, sort
or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping both
runops loops, which is clearly incorrect. One way to avoid this is for the
tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner
runops loop, but for efficiency reasons, perl in fact just sets a flag,
using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and
C<pp_entertry> ops check this flag, and if true, they call C<docatch>,
which does a C<JMPENV_PUSH> and starts a new runops level to execute the
code, rather than doing it on the current loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the inner
loop. When an exception is raised, C<docatch> compares the C<JMPENV>
level of the C<CxEVAL> with C<PL_top_env> and if they differ, just
re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> then
enters a runops loop. This loop executes the eval and tie ops on line 1,
with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops loop
to execute the body of C<TIEARRAY>. When it executes the entertry op on
line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> which
does a C<JMPENV_PUSH> and starts a third runops loop, which then executes
the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)
      CX 1: EVAL   => *
                      retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to
the top C<docatch>. This then starts another third-level runops loop,
which executes the nextstate, pushmark and die ops on line 4. At the point
that the second C<pp_die> is called, the C call stack looks exactly like
that above, even though we are no longer within an inner eval; this is
because of the optimization mentioned earlier. However, the context stack
now looks like this, i.e. with the top C<CxEVAL> popped:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)

The die on line 4 pops the context stack back down to the C<CxEVAL>,
leaving it as:

    STACK 0: MAIN
      CX 0: BLOCK  =>

As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
C<JMPENV_JUMP(3)> done, which pops the C stack back to the C<docatch>:

    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
and the C stack unwinds to:

    perl_run
    main

Because C<PL_restartop> is non-null, C<run_body> starts a new runops loop
and execution continues.

=back

=head2 Internal Variable Types

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables, but
also any constants in the code, as well as some structures completely
internal to Perl. The symbol table, for instance, is an ordinary Perl
hash. Your code is represented by an SV as it's read into the parser;
any program files you call are opened via ordinary Perl filehandles, and
so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

      % perl -MDevel::Peek -e 'Dump("hello")'
    1 SV = PV(0xa041450) at 0xa04ecbc
    2   REFCNT = 1
    3   FLAGS = (POK,READONLY,pPOK)
    4   PV = 0xa0484e0 "hello"\0
    5   CUR = 5
    6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practice, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV: it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Line 4 gives the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

Let's take an example of manipulating a PV, from C<sv_catpvn> in F<sv.c>:

     1  void
     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3  {
     4      STRLEN tlen;
     5      char *junk;

     6      junk = SvPV_force(sv, tlen);
     7      SvGROW(sv, tlen + len + 1);
     8      if (ptr == junk)
     9          ptr = SvPVX(sv);
    10      Move(ptr,SvPVX(sv)+tlen,len,char);
    11      SvCUR(sv) += len;
    12      *SvEND(sv) = '\0';
    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14      SvTAINT(sv);
    15  }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to accommodate
the old string, the new string and the null terminator. If C<LEN> isn't
big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add (that is,
we're appending the SV's own string to itself), the reallocation in line 7
may have moved it, so we grab the string directly from the SV again;
C<SvPVX> is the address of the PV in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around; we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
tainted if taint mode is turned on and the data going into it was tainted.
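
As a rough illustration of the same steps outside perl, here is a toy
string buffer with C<CUR>/C<LEN>-style fields. The C<struct pv> type and
the C<pv_*> helpers are hypothetical, and C<memmove> plays the part of
C<Move>. Unlike the code above, the sketch checks for the self-append case
I<before> growing, since in strict C the old buffer address may no longer
be compared once C<realloc> has freed it.

```c

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy string body, loosely modelled on an SV's PV, CUR and LEN.
     * Hypothetical names; this is not perl's SV API. */
    struct pv {
        char  *pvx;   /* the buffer, like SvPVX      */
        size_t cur;   /* bytes in use, like SvCUR    */
        size_t len;   /* bytes allocated, like SvLEN */
    };

    /* Like SvGROW: make sure at least newlen bytes are allocated. */
    static char *pv_grow(struct pv *sv, size_t newlen) {
        if (newlen > sv->len) {
            char *p = realloc(sv->pvx, newlen);
            assert(p != NULL);                  /* toy: no error recovery */
            sv->pvx = p;
            sv->len = newlen;
        }
        return sv->pvx;
    }

    static void pv_init(struct pv *sv, const char *s) {
        size_t n = strlen(s);
        sv->pvx = NULL;
        sv->cur = sv->len = 0;
        pv_grow(sv, n + 1);                     /* string plus NUL */
        memcpy(sv->pvx, s, n + 1);
        sv->cur = n;
    }

    /* Follows the steps of sv_catpvn described above. */
    static void pv_catpvn(struct pv *sv, const char *ptr, size_t len) {
        int self_append = (ptr == sv->pvx);     /* test before realloc */
        size_t tlen = sv->cur;
        pv_grow(sv, tlen + len + 1);            /* old + new + NUL */
        if (self_append)
            ptr = sv->pvx;                      /* buffer may have moved */
        memmove(sv->pvx + tlen, ptr, len);      /* like Move(...) */
        sv->cur += len;                         /* like SvCUR(sv) += len */
        sv->pvx[sv->cur] = '\0';                /* like *SvEND(sv) = '\0' */
    }

```

Starting from C<"hello">, the toy gives C<cur> 5 and C<len> 6, matching
the C<CUR> and C<LEN> values in the C<Devel::Peek> dump above. Appending
the string to itself then works even if the buffer moves.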

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the built-in
functions and operators are ops, and there are a series of ops which
deal with concepts the interpreter needs internally - entering and
leaving a block, ending a statement, fetching a variable, and so on.

The op tree is connected in two ways: you can imagine that there are two
"routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
and L<B::Debug|B::Debug> do.

Let's have a look at how Perl sees C<$a = $b + $c>:

    % perl -MO=Terse -e '$a=$b+$c'
    1  LISTOP (0x8179888) leave
    2      OP (0x81798b0) enter
    3      COP (0x8179850) nextstate
    4      BINOP (0x8179828) sassign
    5          BINOP (0x8179800) add [1]
    6              UNOP (0x81796e0) null [15]
    7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
    8              UNOP (0x81797e0) null [15]
    9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
    10         UNOP (0x816b4f0) null [15]
    11             SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

    10          SVOP (0x816b4f0) rv2sv [15]
    11              SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add together
two C<gvsv>s.

Now, what's this about?

    1  LISTOP (0x8179888) leave
    2      OP (0x81798b0) enter
    3      COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, with
the ops to be performed for each statement being the children of
C<nextstate>. C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

                     Program
                        |
                    Statement
                        |
                        =
                      /   \
                     /     \
                    $a     +
                          / \
                        $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which points
to the next op to be run, so following these pointers tells us how perl
executes the code. We can traverse the tree in this order using
the C<exec> option to C<B::Terse>:

    % perl -MO=Terse,exec -e '$a=$b+$c'
    1  OP (0x8179928) enter
    2  COP (0x81798c8) nextstate
    3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
    4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
    5  BINOP (0x8179878) add [1]
    6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
    7  BINOP (0x81798a0) sassign
    8  LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.
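
The two routes through the tree can be modelled with a toy node type that
has child links for parse order and an C<op_next> link for execution
order. This is a simplified sketch with hypothetical names: it leaves out
the null ops, and it threads C<op_next> with a plain post-order walk
(operands before the op that uses them), which is enough for this
expression but is not how perl handles every kind of op.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Toy op node. op_first/op_sibling give parse order and op_next gives
     * execution order. Field names echo perl's OP, but this is not it. */
    struct node {
        const char  *name;
        struct node *op_first;    /* first child                   */
        struct node *op_sibling;  /* next child of the same parent */
        struct node *op_next;     /* next op to execute            */
    };

    /* Parse order: the node itself, then each child (pre-order). */
    static void parse_order(const struct node *o, char *out) {
        strcat(out, o->name);
        strcat(out, " ");
        for (const struct node *k = o->op_first; k; k = k->op_sibling)
            parse_order(k, out);
    }

    /* Thread op_next in post-order, so operands run before the op that
     * uses them. Returns the last op threaded; *start gets the first. */
    static struct node *thread(struct node *o, struct node *prev,
                               struct node **start) {
        for (struct node *k = o->op_first; k; k = k->op_sibling)
            prev = thread(k, prev, start);
        if (prev)
            prev->op_next = o;
        else
            *start = o;
        o->op_next = NULL;
        return o;
    }

    /* Execution order: just follow the op_next pointers. */
    static void exec_order(const struct node *o, char *out) {
        for (; o; o = o->op_next) {
            strcat(out, o->name);
            strcat(out, " ");
        }
    }

    /* Builds $a = $b + $c (null ops left out) and records both orders. */
    static void demo(char *parse_buf, char *exec_buf) {
        struct node a      = { "gvsv(a)", NULL, NULL, NULL };
        struct node c      = { "gvsv(c)", NULL, NULL, NULL };
        struct node b      = { "gvsv(b)", NULL, &c, NULL };
        struct node add    = { "add", &b, &a, NULL };   /* sibling is $a */
        struct node assign = { "sassign", &add, NULL, NULL };
        struct node ns     = { "nextstate", NULL, &assign, NULL };
        struct node enter  = { "enter", NULL, &ns, NULL };
        struct node leave  = { "leave", &enter, NULL, NULL };
        struct node *start = NULL;

        parse_buf[0] = exec_buf[0] = '\0';
        parse_order(&leave, parse_buf);
        thread(&leave, NULL, &start);
        exec_order(start, exec_buf);
    }

```

For C<$a = $b + $c> the sketch records the parse order
C<leave enter nextstate sassign add gvsv(b) gvsv(c) gvsv(a)> and the
execution order C<enter nextstate gvsv(b) gvsv(c) add gvsv(a) sassign leave>,
the same sequences as the two B::Terse listings with their null ops removed.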

The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar. Let's take the
piece we need to construct the tree for C<$a = $b + $c>:

    1 term : term ASSIGNOP term
    2   { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3   | term ADDOP term
    4   { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works: you're
fed certain things by the tokeniser, which generally end up in upper
case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
"terminal symbols", because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal symbols",
are generally placed in lower case. C<term> here is a non-terminal
symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

            | term ADDOP term
            { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op, and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token in
the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator. The first parameter to
C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<ADDOP>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left and right hand side of our
expression, in scalar context.
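
To see how C<$1>, C<$2>, C<$3> and C<$$> relate, here is a toy
yacc-style value stack in C. It is a sketch under stated assumptions: the
op types, C<new_op> and the shift/reduce helpers are hypothetical, both
reductions share one helper (perl calls C<newASSIGNOP> for the
assignment), and the shift/reduce decisions are made by hand in the usage
rather than by a generated parser.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical op types for this sketch only. */
    enum { OP_VAR, OP_ADD, OP_SASSIGN };

    struct toyop {
        int           type;
        int           flags;         /* 0 means "nothing special"     */
        struct toyop *first, *last;  /* left and right operands       */
        int          *var;           /* OP_VAR: where the value lives */
    };

    static struct toyop *new_op(int type, int flags, struct toyop *first,
                                struct toyop *last, int *var) {
        struct toyop *o = malloc(sizeof *o);
        assert(o != NULL);
        o->type = type;
        o->flags = flags;
        o->first = first;
        o->last = last;
        o->var = var;
        return o;
    }

    /* A yacc-style value stack: a slot holds an op (for a term) or an
     * op type (for the ADDOP and ASSIGNOP tokens). */
    union yystype { struct toyop *opval; int ival; };
    static union yystype vstack[8];
    static int vsp;

    static void shift_term(int *var) {
        vstack[vsp++].opval = new_op(OP_VAR, 0, NULL, NULL, var);
    }

    static void shift_token(int optype) { vstack[vsp++].ival = optype; }

    /* Reduce "term OP term" to a term: $1, $2 and $3 are the top three
     * slots, and the new op ($$) replaces them. */
    static void reduce_binary(void) {
        union yystype *d = &vstack[vsp - 3];    /* d[0]=$1 d[1]=$2 d[2]=$3 */
        struct toyop *o = new_op(d[1].ival, 0, d[0].opval, d[2].opval, NULL);
        vsp -= 3;
        vstack[vsp++].opval = o;
    }

    /* Tree-walking evaluator, only to check the tree; perl itself runs
     * ops in op_next order instead. */
    static int run_tree(const struct toyop *o) {
        int v;
        switch (o->type) {
        case OP_VAR:
            return *o->var;
        case OP_ADD:
            return run_tree(o->first) + run_tree(o->last);
        default:                       /* OP_SASSIGN: right into left */
            v = run_tree(o->last);
            *o->first->var = v;
            return v;
        }
    }

    static void free_tree(struct toyop *o) {
        if (o == NULL)
            return;
        free_tree(o->first);
        free_tree(o->last);
        free(o);
    }

```

Shifting C<$a>, C<ASSIGNOP>, C<$b>, C<ADDOP> and C<$c>, then reducing
twice (first the addition, which binds tighter, then the assignment)
leaves a single assignment op on the value stack; evaluating it with
C<$b> set to 2 and C<$c> set to 3 sets C<$a> to 5.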

=head2 Stacks

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=over 3

=item Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the result
back onto the stack. This is how, for instance, the cosine operator
works:

    NV value;
    value = POPn;
    value = Perl_cos(value);
    XPUSHn(value);

We'll see a trickier example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
the result back as an NV. The C<X> in C<XPUSHn> means that the stack
should be extended if necessary - it can't be necessary here, because we
know there's room for one more item on the stack, since we've just
removed one! The C<XPUSH*> macros at least guarantee safety.

Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

    SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.
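
The pop-compute-push pattern can be sketched with a toy growable stack of
C<long>s. The names are hypothetical and this is not perl's stack layout;
squaring stands in for C<Perl_cos> so the sketch needs no maths library,
and C<pp_negate> mirrors C<SETi(-TOPi)> by rewriting the top entry in
place.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Toy integer argument stack; xpush/pop/top loosely play the roles
     * of XPUSHi, POPi and TOPi. Not perl's real macros. */
    struct argstack {
        long   *base;
        size_t  sp;    /* number of items currently on the stack */
        size_t  max;   /* number of slots allocated              */
    };

    /* Push, extending the stack first if it is full (the "X" in XPUSH). */
    static void xpush(struct argstack *st, long v) {
        if (st->sp == st->max) {
            size_t nmax = st->max ? st->max * 2 : 4;
            long *p = realloc(st->base, nmax * sizeof *p);
            assert(p != NULL);
            st->base = p;
            st->max = nmax;
        }
        st->base[st->sp++] = v;
    }

    static long pop(struct argstack *st) {
        assert(st->sp > 0);
        return st->base[--st->sp];
    }

    static long *top(struct argstack *st) {
        assert(st->sp > 0);
        return &st->base[st->sp - 1];
    }

    /* Same shape as the cosine example: pop the operand, push the result. */
    static void pp_square(struct argstack *st) {
        long value = pop(st);
        xpush(st, value * value);
    }

    /* Like SETi(-TOPi): overwrite the top entry in place. */
    static void pp_negate(struct argstack *st) {
        *top(st) = -*top(st);
    }

    /* Binary op: pop the right operand, then update the left one in place. */
    static void pp_add(struct argstack *st) {
        long right = pop(st);
        *top(st) += right;
    }

```

Pushing 7 and running C<pp_square> leaves 49; C<pp_negate> turns it into
-49; pushing 9 and running the toy C<pp_add> leaves a single entry, -40.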

=item Mark stack

I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments intended for
the called function, and not (necessarily) let it get at your own data.
The way we do this is to have a "virtual" bottom-of-stack, exposed to each
function. The mark stack keeps bookmarks to locations in the argument
stack usable by each function. For instance, when dealing with a tied
variable (internally, something with "P" magic), Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method from the arguments exposed to the
original function - the store or fetch or whatever it may be. Here's
roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:

    1  PUSHMARK(SP);
    2  EXTEND(SP,2);
    3  PUSHs(SvTIED_obj((SV*)av, mg));
    4  PUSHs(val);
    5  PUTBACK;
    6  ENTER;
    7  call_method("PUSH", G_SCALAR|G_DISCARD);
    8  LEAVE;

Let's examine the whole implementation, for practice:

    1  PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

    2  EXTEND(SP,2);
    3  PUSHs(SvTIED_obj((SV*)av, mg));
    4  PUSHs(val);

We're going to add two more items onto the argument stack: when you have
a tied array, the C<PUSH> subroutine receives the object and the value
to be pushed, and that's exactly what we have here - the tied object,
retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

    5  PUTBACK;

Next we tell Perl to update the global stack pointer from our internal
variable: C<dSP> only gave us a local copy, not a reference to the global.

    6  ENTER;
    7  call_method("PUSH", G_SCALAR|G_DISCARD);
    8  LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value. The C<call_method()> function
removes the top element of the mark stack, so there is nothing for
the caller to clean up.
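
The bookkeeping can be sketched with two plain arrays: the argument stack
and a stack of saved positions in it. This is a toy with hypothetical
names; the real C<PUSHMARK>, C<MARK> and C<call_method()> do considerably
more. The callee sees only the items above the most recent mark and
removes that mark when it is done.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy argument stack plus mark stack. The names echo PUSHMARK and
     * MARK, but this is a simplified model, not perl's API. */
    static int    stack[32];
    static size_t sp;              /* index of the next free slot    */
    static size_t markstack[8];    /* saved values of sp             */
    static size_t markix;          /* number of marks currently held */

    static void push(int v) {
        assert(sp < 32);
        stack[sp++] = v;
    }

    /* Like PUSHMARK(SP): remember where the callee's arguments begin. */
    static void pushmark(void) {
        assert(markix < 8);
        markstack[markix++] = sp;
    }

    /* A called function: pops the most recent mark, sees only the items
     * above it, removes them and returns their sum. Anything below the
     * mark (the caller's own data) is left alone. */
    static int called_sum(void) {
        size_t mark, items, i;
        int total = 0;
        assert(markix > 0);
        mark = markstack[--markix];
        items = sp - mark;
        for (i = 0; i < items; i++)
            total += stack[mark + i];
        sp = mark;
        return total;
    }

```

If the caller has 100 on the stack and then does C<pushmark()> followed by
pushing 1, 2 and 3, C<called_sum()> returns 6 and leaves only the 100
behind. Nested marks work the same way: each call sees just its own
arguments.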

=item Save stack

C has no equivalent of Perl's dynamic scoping (C<local>), so perl
provides one. We've seen that C<ENTER> and C<LEAVE> are used as scoping
braces; the save stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/Localising Changes> for how to use the save stack.
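
A minimal sketch of the idea, assuming hypothetical names: C<toy_enter>
records how full the save stack is, C<toy_save_int> remembers a
variable's current value before it is changed, and C<toy_leave> restores
everything saved since the matching C<toy_enter>, newest first. Perl's
real save stack stores many kinds of entries; this toy handles only
C<int>s.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy save stack: each entry remembers a variable's address and its
     * old value. toy_enter/toy_leave play the role of ENTER/LEAVE and
     * toy_save_int that of a SAVE_FOO-style macro; all are hypothetical. */
    struct save_entry { int *where; int old; };

    static struct save_entry savestack[64];
    static size_t savestack_ix;
    static size_t scopestack[16];   /* save-stack height at each toy_enter */
    static size_t scopestack_ix;

    static void toy_enter(void) {
        assert(scopestack_ix < 16);
        scopestack[scopestack_ix++] = savestack_ix;
    }

    /* Remember *where so the matching toy_leave can put it back. */
    static void toy_save_int(int *where) {
        assert(savestack_ix < 64);
        savestack[savestack_ix].where = where;
        savestack[savestack_ix].old = *where;
        savestack_ix++;
    }

    /* Undo every save made since the matching toy_enter, newest first. */
    static void toy_leave(void) {
        size_t oldix;
        assert(scopestack_ix > 0);
        oldix = scopestack[--scopestack_ix];
        while (savestack_ix > oldix) {
            savestack_ix--;
            *savestack[savestack_ix].where = savestack[savestack_ix].old;
        }
    }

```

Saving C<foo> and setting it to 42 inside a toy scope, then leaving,
puts the old value back, much like the C<local $foo = 42> block above;
nested scopes unwind one level at a time.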

=back

=head2 Millions of Macros

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand; others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

    1  PP(pp_add)
    2  {
    3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    4      {
    5        dPOPTOPnnrl_ul;
    6        SETn( left + right );
    7        RETURN;
    8      }
    9  }

Every line here (apart from the braces, of course) contains a macro. The
first line sets up the function declaration as Perl expects for PP code;
line 3 sets up variable declarations for the argument stack and the
target, the return value of the operation. Line 3 also checks whether
the addition operation is overloaded; if so, the appropriate subroutine
is called.

Line 5 is another variable declaration - all these declaration macros
start with C<d> - which pops from the top of the argument stack two NVs
(hence C<nn>) and puts them into the variables C<right> and C<left>, hence
the C<rl>. These are the two operands to the addition operator. Next, we
call C<SETn> to set the NV of the return value to the result of adding
the two values. This done, we return - the C<RETURN> macro makes sure
that our return value is properly handled, and we pass the next operator
to run back to the main run loop.

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special attention
to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
the C<[pad]THX_?> macros.

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by saying

    make foo.i

which will expand the macros using cpp. Don't be scared by the results.

=head1 TESTING

Every module and built-in function has an associated test file (or
should...). If you add or change functionality, you have to write a
test. If you fix a bug, you have to write a test so that bug never
comes back. If you alter the docs, it would be nice to test what the
new documentation says.

In short, if you submit a patch you probably also have to patch the
tests.

=head2 Where to find test files

For modules, the test file is right next to the module itself.
F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation,
so there are some snags (and it would be wonderful for you to brush
them out), but it basically works that way. Everything else lives in
F<t/>.

Testing of warning messages is often separately done by using expect scripts in
F<t/lib/warnings>. This is because much of the setup for them is already done
for you.

If you add a new test directory under F<t/>, it is imperative that you
add that directory to F<t/HARNESS> and F<t/TEST>.

=over 3

=item F<t/base/>

Testing of the absolute basic functionality of Perl. Things like
C<if>, basic file reads and writes, simple regexes, etc. These are
run first in the test suite and if any of them fail, something is
I<really> broken.

=item F<t/cmd/>

These test the basic control structures, C<if/else>, C<while>,
subroutines, etc.

=item F<t/comp/>

Tests basic issues of how Perl parses and compiles itself.

=item F<t/io/>

Tests for built-in IO functions, including command line arguments.

=item F<t/lib/>

The old home for the module tests; you shouldn't put anything new in
here. There are still some bits and pieces hanging around in here
that need to be moved. Perhaps you could move them? Thanks!

=item F<t/mro/>

Tests for perl's method resolution order implementations
(see L<mro>).

=item F<t/op/>

Tests for perl's built-in functions that don't fit into any of the
other directories.

=item F<t/re/>

Tests for regex-related functions or behaviour. (These used to live
in F<t/op>.)

=item F<t/run/>

Testing features of how perl actually runs, including exit codes and
handling of PERL* environment variables.

=item F<t/uni/>

Tests for the core support of Unicode.

=item F<t/win32/>

Windows-specific tests.

=item F<t/x2p>

A test suite for the s2p converter.

=back

The core uses the same testing style as the rest of Perl, a simple
"ok/not ok" run through Test::Harness, but there are a few special
considerations.

There are three ways to write a test in the core: Test::More,
F<t/test.pl> and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The
decision of which to use depends on what part of the test suite you're
working on. This is a measure to prevent a high-level failure (such
as Config.pm breaking) from causing basic functionality tests to fail.
If you write your own test, use the L<Test Anything Protocol|TAP>.

=over 4

=item t/base t/comp

Since we don't know if require works, or even subroutines, use ad hoc
tests for these two. Step carefully to avoid using the feature being
tested.

=item t/cmd t/run t/io t/op

Now that basic require() and subroutines are tested, you can use the
F<t/test.pl> library which emulates the important features of Test::More
while using a minimum of core features.

You can also conditionally use certain libraries like Config, but be
sure to skip the test gracefully if it's not there.

=item t/lib ext lib

Now that the core of Perl is tested, Test::More can be used. You can
also use the full suite of core modules in the tests.

=back

When you say "make test" Perl uses the F<t/TEST> program to run the
test suite (except under Win32, where it uses F<t/harness> instead).
All tests are run from the F<t/> directory, B<not> the directory
which contains the test. This causes some problems with the tests
in F<lib/>, so here's an opportunity for some patching.

You must be triply conscious of cross-platform concerns. This usually
boils down to using File::Spec and avoiding things like C<fork()> and
C<system()> unless absolutely necessary.

=head2 Special Make Test Targets

There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all of them
are expected to give a 100% success rate. Many of them have several
aliases, and many of them are not available on certain operating
systems.

cce04beb 1394=over 4
13a2d996 1395
cce04beb 1396=item coretest
a422fd2d 1397
cce04beb 1398Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).
a422fd2d 1399
cce04beb 1400(Not available on Win32)
a422fd2d 1401
cce04beb 1402=item test.deparse
a422fd2d 1403
cce04beb 1404Run all the tests through B::Deparse. Not all tests will succeed.
a422fd2d 1405
cce04beb 1406(Not available on Win32)
a422fd2d 1407
cce04beb 1408=item test.taintwarn
a422fd2d 1409
cce04beb
DG
1410Run all tests with the B<-t> command-line switch. Not all tests
1411are expected to succeed (until they're specifically fixed, of course).
a422fd2d 1412
cce04beb 1413(Not available on Win32)

=item minitest

Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
F<t/op>, F<t/uni> and F<t/mro> tests.

=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind

(Only on Linux) Run all the tests using the memory leak + naughty
memory access tool "valgrind". The log files will be named
F<testname.valgrind>.

=item test.third check.third utest.third ucheck.third

(Only on Tru64) Run all the tests using the memory leak + naughty
memory access tool "Third Degree". The log files will be named
F<perl.3log.testname>.

=item test.torture torturetest

Run all the usual tests and some extra tests. As of Perl 5.8.0 the
only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>.

You can also run the torture test with F<t/harness> by giving the
C<-torture> argument to F<t/harness>.

=item utest ucheck test.utf8 check.utf8

Run all the tests with -Mutf8. Not all tests will succeed.

(Not available on Win32)

=item minitest.utf16 test.utf16

Run the tests with UTF-16 encoded scripts, encoded with different
versions of this encoding.

C<make utest.utf16> runs the test suite with a combination of C<-utf8> and
C<-utf16> arguments to F<t/TEST>.

(Not available on Win32)

=item test_harness

Run the test suite with the F<t/harness> controlling program, instead of
F<t/TEST>. F<t/harness> is more sophisticated, and uses the
L<Test::Harness> module, so using this test target assumes that perl
mostly works. The main advantage for our purposes is that it prints a
detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it
doesn't redirect stderr to stdout.

Note that under Win32 F<t/harness> is always used instead of F<t/TEST>, so
there is no special "test_harness" target.

Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES
environment variables to control the behaviour of F<t/harness>. This means
you can say

    nmake test TEST_FILES="op/*.t"
    nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t"

=item Parallel tests

The core distribution can now run its regression tests in parallel on
Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS> in
your environment to the number of tests to run in parallel, and run
C<make test_harness>. On a Bourne-like shell, this can be done as

    TEST_JOBS=3 make test_harness  # Run 3 tests in parallel

An environment variable is used, rather than parallel make itself, because
L<TAP::Harness> needs to be able to schedule individual non-conflicting test
scripts itself, and there is no standard interface to C<make> utilities to
interact with their job schedulers.

Note that currently some test scripts may fail when run in parallel (most
notably C<ext/IO/t/io_dir.t>). If necessary, run just the failing scripts
again sequentially and see if the failures go away.

=item test-notty test_notty

Sets PERL_SKIP_TTY_TEST to true before running the normal test suite.

=back

=head2 Running tests by hand

You can run part of the test suite by hand by using one of the following
commands from the F<t/> directory:

    ./perl -I../lib TEST list-of-.t-files

or

    ./perl -I../lib harness list-of-.t-files

(If you don't specify test scripts, the whole test suite will be run.)

=head3 Using t/harness for testing

If you use C<harness> for testing, you have several command line options
available to you. The arguments are as follows, and are in the order
that they must appear if used together.

    harness -v -torture -re=pattern LIST OF FILES TO TEST
    harness -v -torture -re LIST OF PATTERNS TO MATCH

If C<LIST OF FILES TO TEST> is omitted, the file list is obtained from
the manifest. The file list may include shell wildcards which will be
expanded out.

=over 4

=item -v

Run the tests in verbose mode, so you can see which tests were run
along with any debug output.

=item -torture

Run the torture tests as well as the normal set.

=item -re=PATTERN

Filter the file list so that all the test files run match PATTERN.
Note that this form is distinct from the B<-re LIST OF PATTERNS> form below
in that it allows the file list to be provided as well.

=item -re LIST OF PATTERNS

Filter the file list so that all the test files run match
/(LIST|OF|PATTERNS)/. Note that with this form the patterns
are joined by '|' and you cannot supply a list of files; instead
the test files are obtained from the MANIFEST.

=back

You can run an individual test by a command similar to

    ./perl -I../lib path/to/foo.t

except that the harnesses set up some environment variables that may
affect the execution of the test:

=over 4

=item PERL_CORE=1

indicates that we're running this test as part of the perl core test suite.
This is useful for modules that have a dual life on CPAN.

=item PERL_DESTRUCT_LEVEL=2

is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>)

=item PERL

(used only by F<t/TEST>) if set, overrides the path to the perl executable
that should be used to run the tests (the default being F<./perl>).

=item PERL_SKIP_TTY_TEST

if set, indicates that tests which need a terminal should be skipped. It's
actually set automatically by the Makefile, but can also be forced
artificially by running 'make test_notty'.

=back

=head3 Other environment variables that may influence tests

=over 4

=item PERL_TEST_Net_Ping

Setting this variable runs all of the Net::Ping module's tests;
otherwise some tests that interact with the outside world are skipped.
See L<perl58delta>.

=item PERL_TEST_NOVREXX

Setting this variable skips the vrexx.t tests for OS2::REXX.

=item PERL_TEST_NUMCONVERTS

This sets a variable in op/numconvert.t.

=back

See also the documentation for the Test and Test::Harness modules,
for more environment variables that affect testing.

=head1 EXAMPLE OF A SIMPLE PATCH

All right, we've now had a look at how to navigate the Perl sources and
some things you'll need to know when fiddling with them. Let's now get
on and create a simple patch. Here's something Larry suggested: if a
C<U> is the first active format during a C<pack> (for example,
C<pack "U3C8", @stuff>), then the resulting string should be treated as
UTF-8 encoded.

If you are working with a git clone of the Perl repository, you will want to
create a branch for your changes. This will make creating a proper patch much
simpler. See L<perlrepository> for details on how to do this.

=head2 Writing the patch

How do we prepare to fix this up? First we locate the code in question:
the C<pack> happens at runtime, so it's going to be in one of the F<pp>
files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
altering this file, let's copy it to F<pp.c~>.

[Well, it was in F<pp.c> when this tutorial was written. It has now been
split off with C<pp_unpack> to its own file, F<pp_pack.c>.]

Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
loop over the pattern, taking each format character in turn into
C<datumtype>. Then for each possible format character, we swallow up
the other arguments in the pattern (a field width, an asterisk, and so
on) and convert the next chunk of input into the specified format, adding
it onto the output SV C<cat>.

How do we know if the C<U> is the first format in the C<pat>? Well, if
we have a pointer to the start of C<pat> then, if we see a C<U> we can
test whether we're still at the start of the string. So, here's where
C<pat> is set up:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

We'll have another string pointer in there:

    STRLEN fromlen;
    register char *pat = SvPVx(*++MARK, fromlen);
    register char *patend = pat + fromlen;
 +  char *patcopy;
    register I32 len;
    I32 datumtype;
    SV *fromstr;

And just before we start the loop, we'll set C<patcopy> to be the start
of C<pat>:

    items = SP - MARK;
    MARK++;
    sv_setpvn(cat, "", 0);
 +  patcopy = pat;
    while (pat < patend) {

Now if we see a C<U> which was at the start of the string, we turn on
the C<UTF8> flag for the output SV, C<cat>:

 +  if (datumtype == 'U' && pat==patcopy+1)
 +      SvUTF8_on(cat);
    if (datumtype == '#') {
        while (pat < patend && *pat != '\n')
            pat++;

Remember that it has to be C<patcopy+1> because the first character of
the string is the C<U> which has been swallowed into C<datumtype>!

Oops, we forgot one thing: what if there are spaces at the start of the
pattern? C<pack(" U*", @stuff)> will have C<U> as the first active
character, even though it's not the first thing in the pattern. In this
case, we have to advance C<patcopy> along with C<pat> when we see spaces:

    if (isSPACE(datumtype))
        continue;

needs to become

    if (isSPACE(datumtype)) {
        patcopy++;
        continue;
    }
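
To see the logic in isolation, here is a minimal standalone sketch of the same check, written as ordinary C outside C<pp_pack>. It assumes a NUL-terminated pattern and skips only whitespace (the real code must also cope with counts, comments and the rest of the pattern syntax). The function name C<first_active_is_U()> is hypothetical and not part of Perl.

```c

    #include <assert.h>
    #include <ctype.h>

    /* Hypothetical helper mirroring the patch's patcopy logic: returns 1
     * if the first non-space format character in pat is 'U', else 0. */
    int first_active_is_U(const char *pat)
    {
        const char *patcopy = pat;  /* advanced past leading spaces only */
        int datumtype;

        while (*pat) {
            datumtype = (unsigned char)*pat++;
            if (isspace(datumtype)) {
                patcopy++;          /* keep patcopy in step with spaces */
                continue;
            }
            /* pat has already moved past datumtype, hence patcopy + 1 */
            return datumtype == 'U' && pat == patcopy + 1;
        }
        return 0;                   /* empty or all-space pattern */
    }

```

With this sketch, C<"U3C8"> and C<" U*"> both yield 1, while C<"C0U*"> yields 0. The sketch stops at the first non-space character, so at that point the comparison with C<patcopy + 1> always holds. In C<pp_pack> the loop keeps going, and that comparison is what rejects a C<U> appearing later in the pattern.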

OK. That's the C part done. Now we must do two additional things before
this patch is ready to go: we've changed the behaviour of Perl, and so
we must document that change. We must also provide some more regression
tests to make sure our patch works and doesn't create a bug somewhere
else along the line.

=head2 Testing the patch

The regression tests for each operator live in F<t/op/>, and so we
make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
tests to the end. First, we'll test that the C<U> does indeed create
Unicode strings.

F<t/op/pack.t> has a sensible ok() function, but if it didn't we could
use the one from F<t/test.pl>.

    require './test.pl';
    plan( tests => 159 );

so instead of this:

    print 'not ' unless "1.20.300.4000" eq sprintf "%vd",
                                              pack("U*",1,20,300,4000);
    print "ok $test\n"; $test++;

we can write the more sensible (see L<Test::More> for a full
explanation of is() and other testing functions):

    is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000),
                                          "U* produces Unicode" );

Now we'll test that we got that space-at-the-beginning business right:

    is( "1.20.300.4000", sprintf "%vd", pack(" U*",1,20,300,4000),
                                          "  with spaces at the beginning" );

And finally we'll test that we don't make Unicode strings if C<U> is B<not>
the first active format:

    isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000),
                                          "U* not first isn't Unicode" );

We mustn't forget to change the number of tests which appears at the top,
or else the automated tester will get confused. This will either look
like this:

    print "1..156\n";

or this:

    plan( tests => 156 );

We now compile up Perl, and run it through the test suite. Our new
tests pass, hooray!

=head2 Documenting the patch

Finally, the documentation. The job is never done until the paperwork is
over, so let's describe the change we've just made. The relevant place
is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
this text in the description of C<pack>:

    =item *

    If the pattern begins with a C<U>, the resulting string will be treated
    as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string
    with an initial C<U0>, and the bytes that follow will be interpreted as
    Unicode characters. If you don't want this to happen, you can begin
    your pattern with C<C0> (or anything else) to force Perl not to UTF-8
    encode your string, and then follow this with a C<U*> somewhere in your
    pattern.

=head1 COMMON PROBLEMS

Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In
some cases we have to take pre-ANSI requirements into consideration.
You don't care about some particular platform having broken Perl?
I hear there is still a strong demand for J2EE programmers.

=head2 Perl environment problems

=over 4

=item *

Not compiling with threading

Compiling with threading (-Duseithreads) completely rewrites
the function prototypes of Perl. You had better try your changes
with that. Related to this is the difference between "Perl_-less"
and "Perl_-ly" APIs, for example:

    Perl_sv_setiv(aTHX_ ...);
    sv_setiv(...);

The first one explicitly passes in the context, which is needed for e.g.
threaded builds. The second one does that implicitly; do not get them
mixed. If you are not passing in an aTHX_, you will need to do a dTHX
(or a dVAR) as the first thing in the function.

See L<perlguts/"How multiple interpreters and concurrency are supported">
for further discussion about context.
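
The difference between the two styles can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical. C<struct interp>, C<my_pTHX_>, C<my_aTHX_> and C<Sketch_counter_bump()> only imitate the shape of Perl's real C<pTHX_>/C<aTHX_> convention described in L<perlguts>; they are not Perl's API.

```c

    #include <assert.h>

    /* Hypothetical per-interpreter state: what would otherwise be a global. */
    struct interp {
        long counter;
    };

    /* Hypothetical imitations of pTHX_ / aTHX_: the context travels as an
     * explicit first argument instead of living in a global variable. */
    #define my_pTHX_ struct interp *my_perl,
    #define my_aTHX_ my_perl,

    /* Explicit-context ("Perl_-ly") form: callers must pass the context. */
    long Sketch_counter_bump(my_pTHX_ long by)
    {
        my_perl->counter += by;
        return my_perl->counter;
    }

    /* Short ("Perl_-less") form: supplies the context implicitly, so it
     * only compiles where a variable named my_perl is in scope. */
    #define counter_bump(by) Sketch_counter_bump(my_aTHX_ (by))

    long demo_two_interpreters(void)
    {
        struct interp a = {0}, b = {0};
        struct interp *my_perl = &a;    /* the "current" interpreter */

        counter_bump(5);                /* Sketch_counter_bump(my_perl, (5)) */
        my_perl = &b;
        counter_bump(1);
        return a.counter * 10 + b.counter;  /* 5 * 10 + 1 == 51 */
    }

```

Note that the short C<counter_bump()> form silently depends on a variable called C<my_perl> being in scope. The same dependency explains why a function using the Perl_-less API needs a C<dTHX> when no context was passed in.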

=item *

Not compiling with -DDEBUGGING

The DEBUGGING define exposes more code to the compiler,
and therefore more ways for things to go wrong. You should try it.

=item *

Introducing (non-read-only) globals

Do not introduce any modifiable globals, truly global or file static.
They are bad form and complicate multithreading and other forms of
concurrency. The right way is to introduce them as new interpreter
variables; see F<intrpvar.h> (add them at the very end, for binary
compatibility).

Introducing read-only (const) globals is okay, as long as you verify
with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
BSD-style output) that the data you added really is read-only.
(If it is, it shouldn't show up in the output of that command.)

If you want to have static strings, make them constant:

    static const char etc[] = "...";

If you want to have arrays of constant strings, note carefully
the right combination of C<const>s:

    static const char * const yippee[] =
        {"hi", "ho", "silver"};

There is a way to completely hide any modifiable globals (they are all
moved to the heap): the compilation setting C<-DPERL_GLOBAL_STRUCT_PRIVATE>.
It is not normally used, but can be used for testing; read more
about it in L<perlguts/"Background and PERL_IMPLICIT_CONTEXT">.

=item *

Not exporting your new function

Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
function that is part of the public API (the shared Perl library)
to be explicitly marked as exported. See the discussion about
F<embed.pl> in L<perlguts>.

=item *

Exporting your new function

The new shiny result of either genuine new functionality or your
arduous refactoring is now ready and correctly exported. So what
could possibly go wrong?

Maybe simply that your function did not need to be exported in the
first place. Perl has a long and not so glorious history of exporting
functions that it should not have.

If the function is used only inside one source code file, make it
static. See the discussion about F<embed.pl> in L<perlguts>.

If the function is used across several files, but intended only for
Perl's internal use (and this should be the common case), do not
export it to the public API. See the discussion about F<embed.pl>
in L<perlguts>.

=back

=head2 Portability problems

The following are common causes of compilation and/or execution
failures, not specific to Perl as such. The C FAQ is good bedtime
reading. Please test your changes with as many C compilers and
platforms as possible; we will, anyway, and it's nice to save
oneself from public embarrassment.

If using gcc, you can add the C<-std=c89> option which will hopefully
catch most of these unportabilities. (However it might also catch
incompatibilities in your system's header files.)

Use the Configure C<-Dgccansipedantic> flag to enable the gcc
C<-ansi -pedantic> flags which enforce stricter ANSI rules.

If using C<gcc -Wall>, note that not all the possible warnings
(like C<-Wuninitialized>) are given unless you also compile with C<-O>.

Note that if using gcc, starting from Perl 5.9.5 the Perl core source
code files (the ones at the top level of the source code distribution,
but not e.g. the extensions under ext/) are automatically compiled
with as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>,
and a selection of C<-W> flags (see cflags.SH).

Also study L<perlport> carefully to avoid any bad assumptions
about the operating system, filesystems, and so forth.

You may once in a while try a "make microperl" to see whether we
can still compile Perl with just the bare minimum of interfaces.
(See README.micro.)

Do not assume an operating system indicates a certain compiler.

=over 4

=item *

Casting pointers to integers or casting integers to pointers

    void castaway(U8* p)
    {
      IV i = p;

or

    void castaway(U8* p)
    {
      IV i = (IV)p;

Both are bad, and broken, and unportable. Use the PTR2IV()
macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(),
INT2PTR(), and NUM2PTR().)
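
As a rough illustration of what such macros have to do, here is a sketch that always converts through an integer type wide enough to hold a pointer. It assumes that C99's C<< <stdint.h> >> provides C<intptr_t>. That type is optional even in C99, and the Perl core cannot rely on it, which is one reason Perl defines its own macros. The C<SKETCH_> macro names are hypothetical.

```c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for PTR2IV()/INT2PTR(): always go through an
     * integer type that is guaranteed to be wide enough for a pointer. */
    #define SKETCH_PTR2IV(p)        ((intptr_t)(void *)(p))
    #define SKETCH_INT2PTR(type, i) ((type)(void *)(intptr_t)(i))

    unsigned char *roundtrip(unsigned char *p)
    {
        intptr_t i = SKETCH_PTR2IV(p);              /* pointer -> integer */
        return SKETCH_INT2PTR(unsigned char *, i);  /* integer -> pointer */
    }

    int roundtrip_ok(void)
    {
        static unsigned char buf[4];
        return roundtrip(buf + 1) == buf + 1;
    }

```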

=item *

Casting between function pointers and data pointers

Technically speaking, casting between function pointers and data
pointers is unportable and undefined. Practically speaking it seems
to work, but you should use the FPTR2DPTR() and DPTR2FPTR()
macros. Sometimes you can also play games with unions.
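
One way to avoid the conversion altogether is a tagged union. This sketch assumes you only need one slot that can hold I<either> kind of pointer. Each member is read back only through the type it was stored as, so no function pointer is ever cast to a data pointer or back. All names are hypothetical.

```c

    #include <assert.h>

    /* Hypothetical tagged slot able to hold a data or a function pointer. */
    typedef int (*sketch_fn)(int);

    struct any_ptr {
        int is_func;                /* which member was stored last */
        union {
            void     *data;
            sketch_fn func;
        } u;
    };

    int twice(int x) { return 2 * x; }

    /* Calls the stored function, or returns -1 if the slot holds data. */
    int call_if_func(const struct any_ptr *slot, int arg)
    {
        if (!slot->is_func)
            return -1;
        return slot->u.func(arg);   /* read back via the type stored */
    }

    int demo_any_ptr(void)
    {
        struct any_ptr f, d;
        static int datum = 7;

        f.is_func = 1;
        f.u.func  = twice;
        d.is_func = 0;
        d.u.data  = &datum;
        return call_if_func(&f, 21) + call_if_func(&d, 0);  /* 42 + -1 */
    }

```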

=item *

Assuming sizeof(int) == sizeof(long)

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits. This is all legal according to the C standard.
(In other words, "long long" is not a portable way to specify 64 bits,
and "long long" is not even guaranteed to be any wider than "long".)

Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
Avoid assuming that things like I32 are I<exactly> 32 bits: they are
only guaranteed to be I<at least> 32 bits, and they are B<not>
guaranteed to be B<int> or B<long> either. If you really explicitly need
64-bit variables, use I64 and U64, but only if guarded by HAS_QUAD.
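
When code genuinely depends on a minimum width, it is safer to check that assumption at compile time than to hope. Below is a minimal C89-compatible sketch that uses only C<< <limits.h> >>. The C<SKETCH_STATIC_ASSERT> macro and the C<my_i32> typedef are hypothetical; inside the core you would use the definitions mentioned above instead.

```c

    #include <assert.h>
    #include <limits.h>

    /* Hypothetical C89-compatible compile-time check: if cond is false the
     * array gets size -1 and compilation fails. */
    #define SKETCH_STATIC_ASSERT(cond, name) typedef char name[(cond) ? 1 : -1]

    /* Hypothetical "at least 32 bits" type; C guarantees long is that wide. */
    typedef long my_i32;

    SKETCH_STATIC_ASSERT(sizeof(my_i32) * CHAR_BIT >= 32, my_i32_is_wide_enough);

    /* Ask instead of assuming: storage widths in bits of int and long. */
    int int_bits(void)  { return (int)(sizeof(int) * CHAR_BIT); }
    int long_bits(void) { return (int)(sizeof(long) * CHAR_BIT); }

```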

=item *

Assuming one can dereference any type of pointer for any type of data

    char *p = ...;
    long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead
of a pony if p happens not to be correctly aligned.
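
The portable way to read a C<long> from a byte buffer of unknown alignment is to copy the bytes into a properly aligned object with C<memcpy()>. This minimal sketch assumes the bytes are already in the host's native byte order. The function names are hypothetical.

```c

    #include <assert.h>
    #include <string.h>

    /* Hypothetical helper: reads a native-order long from any address.
     * memcpy() has no alignment requirement, unlike *(long *)p. */
    long read_long_any_alignment(const unsigned char *p)
    {
        long pony;
        memcpy(&pony, p, sizeof pony);
        return pony;
    }

    /* Stores a long at offset 1 of a buffer, usually a misaligned address. */
    int demo_unaligned(void)
    {
        unsigned char buf[sizeof(long) + 1];
        long in = 123456L;

        memcpy(buf + 1, &in, sizeof in);
        return read_long_any_alignment(buf + 1) == in;
    }

```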

=item *

Lvalue casts

    (int)*p = ...;    /* BAD */

Simply not portable. Get your lvalue to be of the right type,
or maybe use temporary variables, or dirty tricks with unions.

=item *

Assuming B<anything> about structs (especially the ones you
don't control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of a certain signedness, sizeof, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ

=back

=item *

That the sizeof(struct) or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields;
the bytes can be anything

=item *

Structs are required to be aligned to the maximum alignment required
by the fields, which for native types is usually equivalent to
sizeof() of the field

=back

=back
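
Because of padding and layout differences, writing a struct's raw bytes to a file or socket is rarely portable. A common alternative is to serialize field by field in a fixed byte order, as in the sketch below, which uses a hypothetical struct and hypothetical function names. C<offsetof()> is the portable way to ask where a field actually lives.

```c

    #include <assert.h>
    #include <stddef.h>

    struct rec {                /* hypothetical record */
        unsigned char tag;
        unsigned long len;      /* padding may precede this field */
    };

    /* Serializes field by field: tag, then the low 32 bits of len in
     * big-endian order. Always 5 bytes, whatever the struct layout. */
    size_t rec_pack(const struct rec *r, unsigned char out[5])
    {
        out[0] = r->tag;
        out[1] = (unsigned char)((r->len >> 24) & 0xFF);
        out[2] = (unsigned char)((r->len >> 16) & 0xFF);
        out[3] = (unsigned char)((r->len >> 8) & 0xFF);
        out[4] = (unsigned char)(r->len & 0xFF);
        return 5;
    }

    /* offsetof() reports where len really lives: at least 1, often more. */
    size_t rec_len_offset(void)
    {
        return offsetof(struct rec, len);
    }

    int demo_rec(void)
    {
        struct rec r;
        unsigned char buf[5];

        r.tag = 7;
        r.len = 0x01020304UL;
        return rec_pack(&r, buf) == 5 && buf[0] == 7 && buf[1] == 1
            && buf[2] == 2 && buf[3] == 3 && buf[4] == 4;
    }

```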

=item *

Assuming the character set is ASCIIish

Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, nor hex) constants
to refer to characters. You can safely say 'A', but not 0x41.
You can safely say '\n', but not '\012'.
If a character doesn't have a trivial input form, you can
create a #define for it in both C<utfebcdic.h> and C<utf8.h>, so that
it resolves to different values depending on the character set being used.
(There are three different EBCDIC character sets defined in C<utfebcdic.h>,
so it might be best to insert the #define three times in that file.)

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case
alphabetic characters. That is not true in EBCDIC. Nor is it for 'a' to 'z'.
But '0' - '9' is an unbroken range in both systems. Don't assume anything
about other ranges.

Many of the comments in the existing code ignore the possibility of EBCDIC,
and may therefore be wrong, even if the code works.
This is actually a tribute to how transparently the ability to handle
EBCDIC was added, without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode
code points as sequences of bytes. Macros with the same names (but different
definitions) in C<utf8.h> and C<utfebcdic.h> are used to allow the calling
code to think that there is only one such encoding. This is almost always
referred to as C<utf8>, but it means the EBCDIC version as well. Again,
comments in the code may well be wrong even if the code itself is right.
For example, the concept of C<invariant characters> differs between ASCII
and EBCDIC. On ASCII platforms, only characters that do not have the
high-order bit set (i.e. whose ordinals are strict ASCII, 0 - 127) are
invariant, and the documentation and comments in the code may assume that,
often referring to something like, say, C<hibit>. The situation differs and
is not so simple on EBCDIC machines, but as long as the code itself uses the
C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the comments
are wrong.
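
Here is a small sketch of character tests that do not depend on ASCII's layout, using only the C library. The function names are hypothetical; inside the core you would normally reach for the existing character-class macros in F<handy.h>. Letters are checked by membership rather than by range. Digits may use arithmetic, because C itself guarantees that '0' through '9' are contiguous.

```c

    #include <assert.h>
    #include <string.h>

    /* A range test 'A' <= c && c <= 'Z' is wrong on EBCDIC, where the
     * letters are not contiguous; membership works on either set. */
    int is_upper_letter(int c)
    {
        static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        return c != '\0' && strchr(upper, c) != NULL;
    }

    /* '0'..'9' are contiguous in every C character set: portable. */
    int digit_value(int c)
    {
        return (c >= '0' && c <= '9') ? c - '0' : -1;
    }

```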

=item *

Assuming the character set is just ASCII

ASCII is a 7-bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a locale,
currently these extra characters are generally considered to be unassigned,
and this has presented some problems. This is being changed starting in 5.12
so that these characters will be considered to be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

    #define BURGLE(x) ... \
    #ifdef BURGLE_OLD_STYLE        /* BAD */
    ... do it the old way ... \
    #else
    ... do it the new way ... \
    #endif

You cannot portably "stack" cpp directives. For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.
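
Put the complete C<#define> inside each branch instead. In this sketch the macro bodies are hypothetical stand-ins for "the old way" and "the new way":

```c

    #include <assert.h>

    /* Each branch carries its own complete definition; no directive
     * appears inside a macro body. */
    #ifdef BURGLE_OLD_STYLE
    #  define BURGLE(x) ((x) + 1)    /* ... do it the old way ... */
    #else
    #  define BURGLE(x) ((x) * 2)    /* ... do it the new way ... */
    #endif

    int burgle_three(void)
    {
        return BURGLE(3);
    }

```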

=item *

Adding non-comment stuff after #endif or #else

    #ifdef SNOSH
    ...
    #else !SNOSH    /* BAD */
    ...
    #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them. If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

    #ifdef SNOSH
    ...
    #else /* !SNOSH */
    ...
    #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant
(on by default starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

    enum color {
      CERULEAN,
      CHARTREUSE,
      CINNABAR,     /* BAD */
    };

is not portable. Leave out the last comma.

Also note that whether enums are implicitly morphable to ints
varies between compilers; you might need an explicit (int) cast.
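
The portable spelling leaves out the trailing comma and converts explicitly where an C<int> is wanted. C<color_index()> is a hypothetical helper:

```c

    #include <assert.h>

    enum color {
        CERULEAN,
        CHARTREUSE,
        CINNABAR            /* no comma after the last enumerator */
    };

    /* Convert explicitly instead of relying on implicit conversions. */
    int color_index(enum color c)
    {
        return (int)c;
    }

```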

=item *

Using //-comments

    // This function bamfoodles the zorklator.   /* BAD */

That is C99 or C++. Perl is C89. Using the //-comments is silently
allowed by many C compilers, but cranking up the ANSI C89 strictness
(which we like to do) causes the compilation to fail.

=item *

Mixing declarations and code

    void zorklator()
    {
      int n = 3;
      set_zorkmids(n);    /* BAD */
      int q = 4;

That is C99 or C++. Some C compilers allow that, but you shouldn't.

The gcc option C<-Wdeclaration-after-statement> scans for such problems
(on by default starting from Perl 5.9.4).
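
The C89 form puts every declaration at the start of a block, and opens a new block if a declaration is really needed later on. In this sketch C<set_zorkmids()> is a hypothetical stand-in that takes a pointer, so no global is needed:

```c

    #include <assert.h>

    /* Hypothetical helper standing in for the original set_zorkmids(). */
    void set_zorkmids(int *zorkmids, int n)
    {
        *zorkmids = n;
    }

    int zorklator(void)
    {
        int n = 3;                      /* declarations first ... */
        int q = 4;
        int zorkmids;

        set_zorkmids(&zorkmids, n);     /* ... then statements */
        {
            int r = q + zorkmids;       /* a nested block may declare more */
            return r;
        }
    }

```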

=item *

Introducing variables inside for()

    for(int i = ...; ...; ...) {    /* BAD */

That is C99 or C++. While it would indeed be awfully nice to have that
also in C89, to limit the scope of the loop variable, alas, we cannot.
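
In C89 the loop variable is declared before the loop. Wrapping the loop in its own block still limits the variable's scope, which recovers most of the benefit. A sketch with a hypothetical function:

```c

    #include <assert.h>

    /* Hypothetical function: sums 1..limit the C89 way. */
    int sum_to(int limit)
    {
        int total = 0;

        {                           /* extra block limits i's scope */
            int i;
            for (i = 1; i <= limit; i++)
                total += i;
        }
        return total;
    }

```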

=item *

Mixing signed char pointers with unsigned char pointers

    int foo(char *s) { ... }
    ...
    unsigned char *t = ...; /* Or U8* t = ... */
    foo(t);   /* BAD */

While some compilers accept this with at most a warning, it is certainly
dubious, and downright fatal on at least one platform: for example VMS cc
considers this a fatal error. One cause for people often making this
mistake is that a "naked char" and therefore dereferencing a "naked char
pointer" have an implementation-defined signedness: it depends on the
compiler, the flags of the compiler and the underlying platform whether
the result is signed or unsigned. For this very same reason using a 'char'
as an array index is bad.
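
This sketch shows the two usual remedies; all names in it are hypothetical. First, make the signedness change explicit with a cast when a function wants the other pointer type. Second, convert to C<unsigned char> before using a byte as an array index. The same conversion is needed before calling the C<< <ctype.h> >> functions, whose argument must be representable as an C<unsigned char> or be C<EOF>.

```c

    #include <assert.h>
    #include <limits.h>

    int foo(const char *s)          /* hypothetical callee wanting char * */
    {
        return s[0] != '\0';
    }

    /* The explicit cast documents the signedness change. */
    int call_foo_with_bytes(const unsigned char *t)
    {
        return foo((const char *)t);
    }

    /* Index with unsigned char so bytes >= 0x80 never become negative
     * indices on platforms where plain char is signed. */
    int count_distinct_bytes(const char *s)
    {
        char seen[UCHAR_MAX + 1] = {0};
        int n = 0;

        for (; *s; s++) {
            unsigned char c = (unsigned char)*s;
            if (!seen[c]) {
                seen[c] = 1;
                n++;
            }
        }
        return n;
    }

```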

=item *

Macros that have string constants and their arguments as substrings of
the string constants

    #define FOO(n) printf("number = %d\n", n)    /* BAD */
    FOO(10);

Pre-ANSI semantics for that were equivalent to

    printf("10umber = %d\10");

which is probably not what you were expecting. Unfortunately at least
one reasonably common and modern C compiler does "real backward
compatibility" here: in AIX that is what still happens, even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

    IV i = ...;
    printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens
to be an C<int>), in general it cannot. IV might be something larger.
The situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

    Uid_t who = ...;
    printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might not only be something other than
C<int>-wide, but might also be unsigned, in which case large uids would
be printed as negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available as
a macro with either an 'f' or '_f' suffix, for example:

    IVdf /* IV in decimal */
    UVxf /* UV in hexadecimal */

    printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

    Uid_t_f /* Uid_t in decimal */

    printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

    printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

Also remember that the C<%p> format really does require a void pointer:

    U8* p = ...;
    printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.
d7889f52
JH
2231
=item *

Blindly using variadic macros

gcc has had them for a while with its own syntax, and C99 brought
them with a standardized syntax. Don't use the former, and use
the latter only if HAS_C99_VARIADIC_MACROS is defined.

=item *

Blindly passing va_list

Not all platforms support passing a va_list to further varargs (stdarg)
functions. The right thing to do is to copy the va_list using
Perl_va_copy() if NEED_VA_COPY is defined.

=item *

Using gcc statement expressions

    val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable. The Perl code does
admittedly use them if available to gain some extra speed
(essentially as a funky form of inlining), but you shouldn't.

=item *

Binding together several statements in a macro

Use the macros STMT_START and STMT_END.

    STMT_START {
        ...
    } STMT_END

=item *

Testing for operating systems or versions when you should be testing
for features

    #ifdef __FOONIX__    /* BAD */
    foo = quux();
    #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong. This is more correct (though still
not perfect, because the below is a compile-time check):

    #ifdef HAS_QUUX
    foo = quux();
    #endif

How does HAS_QUUX become defined where it needs to be? Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), HAS_QUUX will be correctly defined. On other platforms,
the corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated,
or if you have a good hunch of where quux() might be available,
you can temporarily try the following:

    #if (defined(__FOONIX__) || defined(__BARNIX__))
    # define HAS_QUUX
    #endif

    ...

    #ifdef HAS_QUUX
    foo = quux();
    #endif

But in any case, try to keep the features and operating systems separate.

=back

=head2 Problematic System Interfaces

=over 4

=item *

malloc(0), realloc(ptr, 0), and calloc(0, 0) are non-portable. To be
portable, allocate at least one byte. (In general you should rarely
need to work at this low level, but should instead use the various
malloc wrappers.)

=item *

snprintf() - the return value is unportable. Use my_snprintf() instead.

=back

=head2 Security problems

Last but not least, here are various tips for safer coding.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you. Seriously.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).

=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf()
and my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available. If you want something
fancier than a plain byte string, use SVs and Perl_sv_catpvf().

=back

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -D optimize=-g
    make

C<-g> is a flag to the C compiler to have it produce debugging
information which will allow us to step through a running program,
and to see which C function we are in (without the debugging
information we might see only the numerical addresses of the functions,
which is not very helpful).

F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
enables all the internal debugging code in Perl. There is a whole bunch
of things you can debug with this: L<perlrun> lists them all, and the
best way to find out about them is to play about with them. The most
useful options are probably

    l  Context (loop) stack processing
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

Some of the functionality of the debugging code can be achieved using XS
modules.

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

Or if you have a core dump:

    gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.

=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or
the given line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files
(see L</"The .i Targets">). So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but
in order to use it you'll need to compile perl with macro definitions
included in the debugging information. Using F<gcc> version 3.1, this
means configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions in
F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
that you can't get at from Perl. Let's take an example. We'll use the
C<$a = $b + $c> we used before, but give it a bit of context:
C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
With the breakpoint in place, we can run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
arranges for two C<NV>s to be placed into C<left> and C<right> - let's
slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV either
directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    SV = PV(0xa057cc0) at 0xa0675d0
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0xa06a510 "6XXXX"\0
      CUR = 5
      LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to L<B::Debug|B::Debug>.

    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code.
It is possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and looking at the resulting graph to see what it tells about the
execution and data flows. As a matter of fact, this is exactly
how C compilers know to give warnings about dubious code.

=head2 lint, splint

The good old C code quality inspector, C<lint>, is available on
several platforms, but please be aware that there are several
different implementations of it by different vendors, which means that
the flags are not identical across different platforms.

There is a lint variant called C<splint> (Secure Programming Lint)
available from http://www.splint.org/ that should compile on any
Unix-like platform.

There are C<lint> and C<splint> targets in the Makefile, but you may
have to diddle with the flags (see above).

=head2 Coverity

Coverity (http://www.coverity.com/) is a product similar to lint. As a
testbed for their product, they periodically check several open source
projects, and they give open source developers accounts to the defect
databases.

=head2 cpd (cut-and-paste detector)

The cpd tool detects cut-and-paste coding. If one instance of the
cut-and-pasted code changes, all the other spots should probably be
changed, too. Therefore such code should probably be turned into a
subroutine or a macro.

cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
(http://pmd.sourceforge.net/). pmd was originally written for static
analysis of Java code, but later the cpd part of it was extended to
also parse C and C++.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:

    java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should use the -Xmx option:

    java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the
warnings", or some common portability problems not being covered by
C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
collection of warnings, and so forth), gcc is still a useful tool in
keeping our coding nose clean.

C<-Wall> is on by default.

C<-ansi> (and its sidekick, C<-pedantic>) would be nice to have on
always, but unfortunately they are not safe on all platforms: they can,
for example, cause fatal conflicts with the system headers (Solaris
being a prime example). If Configure C<-Dgccansipedantic> is used,
the C<cflags> frontend selects C<-ansi -pedantic> for the platforms
where they are known to be safe.

Starting from Perl 5.9.4 the following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wdeclaration-after-statement>

=back

The following flags would be nice to have but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wpointer-arith>

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

C<-Wtraditional> is another example of the annoying tendency of
gcc to bundle a lot of warnings under one switch (it would be
impossible to deploy in practice because it would complain a lot), but
it does contain some warnings that would be beneficial to have available
on their own, such as the warning about string constants inside macros
containing the macro arguments: this behaved differently pre-ANSI
than it does in ANSI, and some C compilers are still in transition,
AIX being an example.

=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability extensions"
modes on: for example, Sun Workshop has its C<-Xa> mode on (though
implicitly), and the DEC (these days, HP...) compiler has its C<-std1>
mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
Third Degree greatly slows down the execution: seconds become minutes,
minutes become hours. For example, as of Perl 5.8.1, the
ext/Encode/t/Unicode.t test takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
than six hours, even on a snappy computer. The said test must be
doing something that is quite unfriendly for memory debuggers. If you
don't feel like waiting, you can simply kill the perl process.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable PERL_DESTRUCT_LEVEL to 2.

For csh-like shells:

    setenv PERL_DESTRUCT_LEVEL 2

For Bourne-type shells:

    PERL_DESTRUCT_LEVEL=2
    export PERL_DESTRUCT_LEVEL

In Unixy environments you can also use the C<env> command:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack
is a good sign of these. Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness. Perl must be compiled in a specific way for
optimal testing with Purify. Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

=head3 Purify on Unix

On Unix, Purify creates a new Perl binary. To get the most
benefit out of Purify, you should configure the perl to be
Purify'ed using:

    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
     -Uusemymalloc -Dusemultiplicity

where these arguments mean:

=over 4

=item -Accflags=-DPURIFY

Disables Perl's arena memory allocation functions, as well as
forcing use of memory allocation functions derived from the
system malloc.

=item -Doptimize='-g'

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=item -Uusemymalloc

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item -Dusemultiplicity

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=back

Once you've compiled a perl suitable for Purify'ing, then you
can just:

    make pureperl

which creates a binary named 'pureperl' that has been Purify'ed.
This binary is used in place of the standard 'perl' binary
when you want to debug Perl memory problems.

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run the Purify'ed
perl as:

    make pureperl
    cd t
    ../pureperl -I../lib harness

which would run the test suite under the Purify'ed perl and report
any memory problems.

Purify outputs messages in "Viewer" windows by default. If
you don't have a windowing environment or if you simply
want the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
options to output to the log file "perl.log":

    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
        -log-file=perl.log -append-logfile=yes"

If you plan to use the "Viewer" windows, then you only need this option:

    setenv PURIFYOPTIONS "-chain-length=25"

In Bourne-type shells:

    PURIFYOPTIONS="..."
    export PURIFYOPTIONS

or if you have the "env" utility:

    env PURIFYOPTIONS="..." ../pureperl ...

=head3 Purify on NT

Purify on Windows NT instruments the Perl binary 'perl.exe'
on the fly. There are several options in the makefile you
should change to get the most use out of Purify:

=over 4

=item DEFINES

You should add -DPURIFY to the DEFINES line so the DEFINES
line looks something like:

    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1

to disable Perl's arena memory allocation functions, as
well as to force use of memory allocation functions derived
from the system malloc.

=item USE_MULTI = define

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=item #PERL_MALLOC = define

Disable Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item CFG = Debug

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=back

As an example, to show any memory leaks produced during the
standard Perl testset you would create and run Purify as:

    cd win32
    make
    cd ../t
    purify ../perl -I../lib harness

which would instrument Perl in memory, run the test suite,
and finally report any memory problems.

=head2 valgrind

The excellent valgrind tool can be used to find both memory leaks
and illegal memory accesses. As of version 3.3.0, Valgrind only
supports Linux on x86, x86-64 and PowerPC. The special "test.valgrind"
target can be used to run the tests under valgrind. Found errors
and memory leaks are logged in files named F<testfile.valgrind>.

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) also trigger errors,
valgrind allows you to suppress such errors using suppression files.
The default suppression file that comes with valgrind already catches
a lot of them. Some additional suppressions are defined in
F<t/perl.supp>.

To get valgrind and for more information see

    http://developer.kde.org/~sewardj/

=head2 Compaq's/Digital's/HP's Third Degree

Third Degree is a tool for memory leak detection and memory access checks.
It is one of the many tools in the ATOM toolkit. The toolkit is only
available on Tru64 (formerly known as Digital UNIX, formerly known as
DEC OSF/1).

When building Perl, you must first run Configure with -Doptimize=-g
and -Uusemymalloc flags; after that you can use the make targets
"perl.third" and "test.third". (What is required is that Perl must be
compiled using the C<-g> flag; you may need to re-Configure.)

The short story is that with "atom" you can instrument the Perl
executable to create a new executable called F<perl.third>. When the
instrumented executable is run, it creates a log of dubious memory
traffic in a file called F<perl.3log>. See the manual pages of atom and
third for more information. The most extensive Third Degree
documentation is available in the Compaq "Tru64 UNIX Programmer's
Guide", chapter "Debugging Programs with Third Degree".

The "test.third" target leaves a lot of files named F<foo_bar.3log> in
the t/ subdirectory. There is a problem with these files: Third Degree
is so effective that it also finds problems in the system libraries.
Therefore you should use the Porting/thirdclean script to clean up
the F<*.3log> files.

There are also leaks that, for a given definition of a leak,
aren't. See L</PERL_DESTRUCT_LEVEL> for more information.

=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program
counter, and since the program counter can be correlated with the code
generated for functions, we get a statistical view of which functions
the program is spending its time in. The caveats are that very
small/fast functions have a lower probability of showing up in the
profile, and that periodically interrupting the program (this is
usually done rather frequently, on the scale of milliseconds) imposes
an additional overhead that may skew the results. The first problem
can be alleviated by running the code for longer (in general this is a
good idea for profiling); the second problem is usually kept in check
by the profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends a basic block, and each of its possible targets starts a new one.
Basic block profiling usually works by I<instrumenting> the code by
adding I<enter basic block #nnnn> book-keeping code to the generated
code. During the execution of the code the basic block counters are
then updated appropriately. The caveat is that the added extra code can
skew the results: again, the profiling tools usually try to factor
their own effects out of the results.

=head2 Gprof Profiling

gprof is a profiling tool available on many Unix platforms;
it uses I<statistical time-sampling>.

You can build a profiled version of perl called "perl.gprof" by
invoking the make target "perl.gprof" (what is required is that Perl
must be compiled using the C<-pg> flag; you may need to re-Configure).
Running the profiled version of Perl will create an output file called
F<gmon.out>, which contains the profiling data collected during the
execution.

The gprof tool can then display the collected data in various ways.
Usually gprof understands the following options:

=over 4

=item -a

Suppress statically defined functions from the profile.

=item -b

Suppress the verbose descriptions in the profile.

=item -e routine

Exclude the given routine and its descendants from the profile.

=item -f routine

Display only the given routine and its descendants in the profile.

=item -s

Generate a summary file called F<gmon.sum> which then may be given
to subsequent gprof runs to accumulate data over several runs.

=item -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of gprof.

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-pg' && make perl.gprof
    $ ./perl.gprof someprog # creates gmon.out in current directory
    $ gprof ./perl.gprof > out
    $ view out

51a35ef1
JH
3019=head2 GCC gcov Profiling
3020
10f58044 3021Starting from GCC 3.0 I<basic block profiling> is officially available
51a35ef1
JH
3022for the GNU CC.
3023
3024You can build a profiled version of perl called F<perl.gcov> by
3025invoking the make target "perl.gcov" (what is required that Perl must
3026be compiled using gcc with the flags C<-fprofile-arcs
3027-ftest-coverage>, you may need to re-Configure).
3028
Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.da> file will be
created (newer versions of GCC name it F<.gcda> instead).

To display the results you use the F<gcov> utility (which should be
installed if you have gcc 3.0 or newer installed). F<gcov> is run on
source code files, like this:

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files
contain the source code annotated with execution counts; lines that
were never executed are marked with a row of "#" characters.

Useful options of F<gcov> include C<-b>, which will summarise the
basic block, branch, and function call coverage, and C<-c>, which
reports branch frequencies as actual counts instead of percentages.
For more information on the use of F<gcov> and basic block profiling
with gcc, see the latest GNU CC manual. As of GCC 3.0, see

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html

and its section titled "8. gcov: a Test Coverage Program"

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' \
        -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' && make perl.gcov
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl.gcov
    $ gcov regexec.c
    $ view regexec.c.gcov

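If you want to post-process the F<.gcov> output, for example to list
the lines a test run never reached, the format is easy to pick apart.
The following is a minimal sketch in C. It assumes the layout produced
by current versions of F<gcov>: each line starts with an execution
count, a colon, the source line number, another colon and the source
text, with C<-> for lines that contain no code and C<#####> for lines
that were never executed. Older versions may differ in detail, and the
function name C<gcov_line_count()> is hypothetical.

```c
    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return the execution count recorded on one line of a .gcov file
     * laid out as "count:lineno:source" (assumed format, see above).
     *   -1  line holds no executable code ("-")
     *    0  line was never executed ("#####")
     *   -2  line does not look like gcov output
     * Otherwise the count itself is returned. */
    long gcov_line_count(const char *line)
    {
        const char *colon = strchr(line, ':');
        char *end;
        long count;

        if (colon == NULL)
            return -2;

        /* gcov right-aligns the count, so skip the padding first. */
        while (line < colon && isspace((unsigned char)*line))
            line++;

        if (*line == '-')
            return -1;
        if (strncmp(line, "#####", 5) == 0)
            return 0;

        /* strtol() stops at the ':' (or at any marker after the digits). */
        count = strtol(line, &end, 10);
        if (end == line)
            return -2;
        return count;
    }
```

Feeding every line of F<regexec.c.gcov> through such a function and
printing the ones for which it returns 0 gives a quick list of the code
the run never exercised.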
=head2 Pixie Profiling

Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
I<basic-block counting>.

You can build a profiled version of perl called F<perl.pixie> by
invoking the make target "perl.pixie". (Perl must be compiled using
the C<-g> flag; you may need to re-Configure.)

In Tru64 a file called F<perl.Addrs> will also be silently created;
this file contains the addresses of the basic blocks. Running the
profiled version of Perl will create a new file called F<perl.Counts>,
which contains the counts for the basic blocks for that particular
program execution.

To display the results you use the F<prof> utility. The exact
incantation depends on your operating system: C<prof perl.Counts> in
IRIX, and C<prof -pixie -all -L. perl> in Tru64.

In IRIX the following prof options are available:

=over 4

=item -h

Reports the most heavily used lines in descending order of use.
Useful for finding the hotspot lines.

=item -l

Groups lines by procedure, with procedures sorted in descending order of use.
Within a procedure, lines are listed in source order.
Useful for finding the hotspots of procedures.

=back

In Tru64 the following options are available:

=over 4

=item -p[rocedures]

Procedures sorted in descending order by the number of cycles executed
in each procedure. Useful for finding the hotspot procedures.
(This is the default option.)

=item -h[eavy]

Lines sorted in descending order by the number of cycles executed in
each line. Useful for finding the hotspot lines.

=item -i[nvocations]

The called procedures are sorted in descending order by number of calls
made to the procedures. Useful for finding the most used procedures.

=item -l[ines]

Grouped by procedure, sorted by cycles executed per procedure.
Useful for finding the hotspots of procedures.

=item -testcoverage

The compiler emitted code for these lines, but the code was unexecuted.

=item -z[ero]

Unexecuted procedures.

=back

For further information, see your system's manual pages for pixie and prof.

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, or the pureperl or perl.third executables, please note that
by default perl B<does not> explicitly clean up all the memory it has
allocated (such as global memory arenas) but instead lets the exit()
of the whole program "take care" of such allocations, also known as
"global destruction of objects".

There is a way to tell perl to do complete cleanup: set the
environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
The t/TEST wrapper does set this to 2, and this is what you
need to do too if you don't want to see the "global leaks".
For example, for "third-degreed" Perl:

    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and extends its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)

If, at the end of a run, you get the message I<N scalars leaked>, you can
recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause the addresses
of all those leaked SVs to be dumped along with details as to where each
SV was originally allocated. This information is also displayed by
Devel::Peek. Note that the extra details recorded with each SV increase
memory usage, so it shouldn't be used in production environments. It also
converts C<new_SV()> from a macro into a real function, so you can use
your favourite debugger to discover where those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV allocations
in addition to memory allocations. Each SV allocation has a distinct
serial number that will be written on creation and destruction of the SV.
So if you're executing the leaking code in a loop, you need to look for
SVs that are created but never destroyed within each cycle. If such an
SV is found, set a conditional breakpoint within C<new_SV()> and make it
break only when C<PL_sv_serial> is equal to the serial number of the
leaking SV. Then you will catch the interpreter in exactly the state
where the leaking SV is allocated, which is sufficient in many cases to
find the source of the leak.

Because C<-Dm> uses the PerlIO layer for output, it will itself
allocate quite a few SVs, which are hidden to avoid recursion.
You can bypass the PerlIO layer if you use the SV logging provided
by C<-DPERL_MEM_LOG> instead.

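Spotting SVs that are created but never destroyed is easy to automate
once the serial numbers have been pulled out of the C<-Dm> output for
one iteration of the loop. The following is a minimal sketch in C.
C<struct sv_event> and C<find_leaked_serials()> are hypothetical
helpers, not part of perl, and turning the actual trace text into such
records is left out because its exact format is not described here.

```c
    #include <stddef.h>

    /* One already-parsed allocation event from a single loop iteration.
     * Hypothetical representation; it mirrors no real perl structure. */
    struct sv_event {
        unsigned long serial;  /* serial number of the SV */
        int created;           /* 1 = created, 0 = destroyed */
    };

    /* Store in out[] the serials of SVs created during the traced cycle
     * but not destroyed later in the same trace.  Returns how many were
     * stored (never more than max_out).  The nested loop is quadratic,
     * which is fine for the short trace of one iteration. */
    size_t find_leaked_serials(const struct sv_event *ev, size_t n,
                               unsigned long *out, size_t max_out)
    {
        size_t found = 0;

        for (size_t i = 0; i < n; i++) {
            int destroyed = 0;

            if (!ev[i].created)
                continue;
            /* Look for a matching destruction after the creation. */
            for (size_t j = i + 1; j < n; j++) {
                if (!ev[j].created && ev[j].serial == ev[i].serial) {
                    destroyed = 1;
                    break;
                }
            }
            if (!destroyed && found < max_out)
                out[found++] = ev[i].serial;
        }
        return found;
    }
```

Each serial number it reports is a candidate value for the
C<PL_sv_serial> condition on a breakpoint in C<new_SV()>.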
=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG>, both memory and SV allocations go
through logging functions, which is handy for breakpoint setting.

Unless C<-DPERL_MEM_LOG_NOIMPL> is also compiled, the logging
functions read $ENV{PERL_MEM_LOG} to determine whether to log the
event, and if so how:

    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
    $ENV{PERL_MEM_LOG} =~ /t/           Include timestamp in log
    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      Write to FD given (default is 2)

Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(),
and Safefree() are logged with the caller's source code file and line
number (and C function name, if supported by the C compiler). In
contrast, C<-Dm> logs directly at the point of C<malloc()>. SV logging
is similar.

Since the logging doesn't use PerlIO, all SV allocations are logged
and no extra SV allocations are introduced by enabling the logging.
If compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for
each SV allocation is also logged.

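To make the rules in the $ENV{PERL_MEM_LOG} table concrete, here is a
minimal sketch in C of how such a value could be decoded. It is not
perl's own implementation: C<struct mem_log_opts> and
C<parse_perl_mem_log()> are hypothetical names, and only the four rules
listed in the table are handled.

```c
    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical summary of what a PERL_MEM_LOG value asks for. */
    struct mem_log_opts {
        int log_memory;  /* value matches /m/ */
        int log_svs;     /* value matches /s/ */
        int timestamps;  /* value matches /t/ */
        int fd;          /* value matches /^(\d+)/, else 2 (stderr) */
    };

    /* Decode a PERL_MEM_LOG value.  NULL means the variable is unset,
     * in which case nothing is logged. */
    struct mem_log_opts parse_perl_mem_log(const char *value)
    {
        struct mem_log_opts o = { 0, 0, 0, 2 };

        if (value == NULL)
            return o;

        /* The letter flags may appear anywhere in the value. */
        o.log_memory = strchr(value, 'm') != NULL;
        o.log_svs    = strchr(value, 's') != NULL;
        o.timestamps = strchr(value, 't') != NULL;

        /* Only a numeric prefix selects the file descriptor. */
        if (isdigit((unsigned char)value[0]))
            o.fd = (int)strtol(value, NULL, 10);
        return o;
    }
```

A caller would pass C<getenv("PERL_MEM_LOG")>, so that, for example, a
value of C<3ms> asks for both memory and SV operations to be logged to
file descriptor 3.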
=head2 DDD over gdb

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so that, for example,
you can display an SV's IV value with one click, without doing any
typing. To do that, edit the F<~/.ddd/init> file and add after:

  ! Display shortcuts.
  Ddd*gdbDisplayShortcuts: \
  /t ()   // Convert to Bin\n\
  /d ()   // Convert to Dec\n\
  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\

the following two lines:

  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug the sv_peek
"conversion" in there:

  Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The C<my_perl> argument is for threaded builds.)
Just remember that every line but the last one should end with C<\n\>.

Alternatively, edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb
section.

=head2 Poison

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros;
see L<perlclib>.

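The effect of those macros is easy to reproduce. This minimal sketch in
C fills a buffer with a poison byte and later checks whether anything
has overwritten it. C<poison_with()> and C<is_poisoned()> are
hypothetical helpers, not perl's own macros, and the association of
0xAB with newly allocated memory and 0xEF with freed memory is an
assumption based on perl's F<handy.h>.

```c
    #include <stddef.h>
    #include <string.h>

    /* Poison bytes from the text above (new/freed split assumed). */
    enum { POISON_NEW = 0xAB, POISON_FREE = 0xEF };

    /* Fill n bytes at p with the poison byte b, like a memset()-based
     * poisoning macro would. */
    void poison_with(void *p, size_t n, unsigned char b)
    {
        memset(p, b, n);
    }

    /* Return 1 if all n bytes at p still hold the poison byte b, which
     * suggests nothing has written to the memory since it was poisoned. */
    int is_poisoned(const void *p, size_t n, unsigned char b)
    {
        const unsigned char *c = p;

        for (size_t i = 0; i < n; i++)
            if (c[i] != b)
                return 0;
        return 1;
    }
```

Because every byte holds the same value, a 32-bit word read from such a
buffer shows up in the debugger as 0xEFEFEFEF (or 0xABABABAB)
regardless of byte order, which is what makes the pattern so
recognisable.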
=head2 Read-only optrees

Under ithreads the optree is read-only. If you want to enforce this, to check
for write accesses from buggy code, compile with C<-DPL_OP_SLAB_ALLOC> to
enable the OP slab allocator and C<-DPERL_DEBUG_READONLY_OPS> to enable code
that allocates op memory via C<mmap> and sets it read-only at run time.
Any write access to an op results in a C<SIGBUS> and abort.

This code is intended for development only, and may not be portable even to
all Unix variants. Also, it is an 80% solution, in that it isn't able to make
all ops read-only. Specifically it:

=over

=item 1

Only sets read-only on all slabs of ops at C<CHECK> time, hence ops allocated
later via C<require> or C<eval> will be read-write.

=item 2

Turns an entire slab of ops read-write if the refcount of any op in the slab
needs to be decreased.

=item 3

Turns an entire slab of ops read-write if any op from the slab is freed.

=back

It's not possible to turn the slabs back to read-only after an action
requiring read-write access, as either can happen during op tree building
time, so there may still be legitimate write access.

However, as an 80% solution it is still effective, as currently it catches
a write access during the generation of F<Config.pm>, which means that we
can't yet build F<perl> with this enabled.

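The underlying mechanism is ordinary POSIX memory protection. The
sketch below is not perl's slab allocator: C<slab_alloc()> and
C<slab_set_readonly()> are hypothetical helpers showing how memory
obtained with C<mmap()> can be switched between read-only and
read-write with C<mprotect()>, the kind of switching the items above
describe. Writing to the region while it is read-only makes the kernel
deliver a signal; whether that is C<SIGBUS> or C<SIGSEGV> depends on the
platform (Linux, for instance, raises C<SIGSEGV>).

```c
    #define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Get a page-aligned, read-write "slab" directly from the kernel.
     * Returns NULL on failure. */
    void *slab_alloc(size_t size)
    {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Make the slab read-only (readonly != 0) or writable again.
     * Returns 0 on success, -1 on failure (errno is set). */
    int slab_set_readonly(void *slab, size_t size, int readonly)
    {
        return mprotect(slab, size,
                        readonly ? PROT_READ : PROT_READ | PROT_WRITE);
    }
```

A slab would typically be one or more pages long, for example
C<sysconf(_SC_PAGESIZE)> bytes, and released with C<munmap()>.
Note that protection applies to whole pages, which is why a single
write-needing op forces its entire slab back to read-write.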
=head1 CONCLUSION

We've had a brief look around the Perl source, how to maintain quality
of the source code, an overview of the stages F<perl> goes through
when it's running your code, how to use debuggers to poke at the Perl
guts, and how to analyse the execution of Perl. We took a very
simple problem and demonstrated how to solve it fully, with
documentation, regression tests, and a patch for submission to
p5p. Finally, we talked about how to use external tools to debug and
test Perl.

I'd now suggest you read over those references again, and then, as soon
as possible, get your hands dirty. The best way to learn is by doing,
so:

=over 3

=item *

Subscribe to perl5-porters, follow the patches and try to understand
them; don't be afraid to ask if there's a portion you're not clear on -
who knows, you may unearth a bug in the patch...

=item *

Keep up to date with the bleeding edge Perl distributions and get
familiar with the changes. Try to get an idea of what areas people are
working on and the changes they're making.

=item *

Do read the README associated with your operating system, e.g. README.aix
on the IBM AIX OS. Don't hesitate to supply patches to that README if
you find anything missing or changed over a new OS release.

=item *

Find an area of Perl that seems interesting to you, and see if you can
work out how it works. Scan through the source, and step over it in the
debugger. Play, poke, investigate, fiddle! You'll probably get to
understand not just your chosen area but a much wider range of F<perl>'s
activity as well, and probably sooner than you'd think.

=back

=over 3

=item I<The Road goes ever on and on, down from the door where it began.>

=back

If you can do these things, you've started on the long road to Perl porting.
Thanks for wanting to help make Perl better - and happy hacking!

=head2 Metaphoric Quotations

If you recognized the quote about the Road above, you're in luck.

Most software projects begin each file with a literal description of each
file's purpose. Perl instead begins each with a literary allusion to that
file's purpose.

Like chapters in many books, all top-level Perl source files (along with a
few others here and there) begin with an epigrammatic inscription that
alludes, indirectly and metaphorically, to the material you're about to read.

Quotations are taken from writings of J.R.R. Tolkien pertaining to his
Legendarium, almost always from I<The Lord of the Rings>. Chapters and
page numbers are given using the following editions:

=over 4

=item *

I<The Hobbit>, by J.R.R. Tolkien. The hardcover, 70th-anniversary
edition of 2007 was used, published in the UK by Harper Collins Publishers
and in the US by the Houghton Mifflin Company.

=item *

I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover,
50th-anniversary edition of 2004 was used, published in the UK by Harper
Collins Publishers and in the US by the Houghton Mifflin Company.

=item *

I<The Lays of Beleriand>, by J.R.R. Tolkien and published posthumously by his
son and literary executor, C.J.R. Tolkien, being the 3rd of the 12 volumes
in Christopher's mammoth I<History of Middle-earth>. Page numbers derive
from the hardcover edition, first published in 1983 by George Allen &
Unwin; no page numbers changed for the special 3-volume omnibus edition of
2002 or the various trade-paper editions, all again now by Harper Collins
or Houghton Mifflin.

=back

Other JRRT books fair game for quotes would thus include I<The Adventures of
Tom Bombadil>, I<The Silmarillion>, I<Unfinished Tales>, and I<The Tale of
the Children of Hurin>, all but the first posthumously assembled by CJRT.
But I<The Lord of the Rings> itself is perfectly fine and probably best to
quote from, provided you can find a suitable quote there.

So if you were to supply a new, complete, top-level source file to add to
Perl, you should conform to this peculiar practice by yourself selecting an
appropriate quotation from Tolkien, retaining the original spelling and
punctuation and using the same format the rest of the quotes are in.
Indirect and oblique is just fine; remember, it's a metaphor, so being meta
is, after all, what it's for.

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.

=head1 SEE ALSO

L<perlrepository>