X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/1e54db1a8aea187ba2e790aca2ab81fab24ff92d..de10be12cd3b4d2e91c136c495230f49b31a4511:/pod/perlhack.pod diff --git a/pod/perlhack.pod b/pod/perlhack.pod index c815177..a0e9e02 100644 --- a/pod/perlhack.pod +++ b/pod/perlhack.pod @@ -34,16 +34,17 @@ words, it's your usual mix of technical people. Over this group of porters presides Larry Wall. He has the final word in what does and does not change in the Perl language. Various -releases of Perl are shepherded by a ``pumpking'', a porter -responsible for gathering patches, deciding on a patch-by-patch +releases of Perl are shepherded by a "pumpking", a porter +responsible for gathering patches, deciding on a patch-by-patch, feature-by-feature basis what will and will not go into the release. For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of -Perl, and Jarkko Hietaniemi is the pumpking for the 5.8 release, and -Hugo van der Sanden will be the pumpking for the 5.10 release. +Perl, and Jarkko Hietaniemi was the pumpking for the 5.8 release, and +Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release. In addition, various people are pumpkings for different things. For -instance, Andy Dougherty and Jarkko Hietaniemi share the I -pumpkin. +instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the +I pumpkin up till the 5.8 release. For the 5.10 release +H.Merijn Brand took over. Larry sees Perl development along the lines of the US government: there's the Legislature (the porters), the Executive branch (the @@ -128,7 +129,7 @@ Is this something that only the submitter wants added to the language, or would it be broadly useful? Sometimes, instead of adding a feature with a tight focus, the porters might decide to wait until someone implements the more generalized feature. 
For instance, instead of -implementing a ``delayed evaluation'' feature, the porters are waiting +implementing a "delayed evaluation" feature, the porters are waiting for a macro system that would permit delayed evaluation and much more. =item Does it potentially introduce new bugs? @@ -155,7 +156,7 @@ altogether without further notice. =item Is the implementation generic enough to be portable? The worst patches make use of a system-specific features. It's highly -unlikely that nonportable additions to the Perl language will be +unlikely that non-portable additions to the Perl language will be accepted. =item Is the implementation tested? @@ -177,8 +178,8 @@ always a good idea. =item Is there another way to do it? -Larry said ``Although the Perl Slogan is I, I hesitate to make 10 ways to do something''. This is a +Larry said "Although the Perl Slogan is I, I hesitate to make 10 ways to do something". This is a tricky heuristic to navigate, though--one man's essential addition is another man's pointless cruft. @@ -192,37 +193,36 @@ authors, ... Perl is supposed to be easy. Working code is always preferred to pie-in-the-sky ideas. A patch to add a feature stands a much higher chance of making it to the language than does a random feature request, no matter how fervently argued the -request might be. This ties into ``Will it be useful?'', as the fact +request might be. This ties into "Will it be useful?", as the fact that someone took the time to make the patch demonstrates a strong desire for the feature. =back -If you're on the list, you might hear the word ``core'' bandied -around. It refers to the standard distribution. ``Hacking on the -core'' means you're changing the C source code to the Perl -interpreter. ``A core module'' is one that ships with Perl. +If you're on the list, you might hear the word "core" bandied +around. It refers to the standard distribution. "Hacking on the +core" means you're changing the C source code to the Perl +interpreter. 
"A core module" is one that ships with Perl. =head2 Keeping in sync The source code to the Perl interpreter, in its different versions, is -kept in a repository managed by a revision control system ( which is -currently the Perforce program, see http://perforce.com/ ). The -pumpkings and a few others have access to the repository to check in -changes. Periodically the pumpking for the development version of Perl -will release a new version, so the rest of the porters can see what's -changed. The current state of the main trunk of repository, and patches -that describe the individual changes that have happened since the last -public release are available at this location: +kept in a repository managed by the git revision control system. The +pumpkings and a few others have write access to the repository to check in +changes. - http://public.activestate.com/gsar/APC/ - ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/ +How to clone and use the git perl repository is described in L. -If you're looking for a particular change, or a change that affected -a particular set of files, you may find the B -useful: +You can also choose to use rsync to get a copy of the current source tree +for the bleadperl branch and all maintenance branches : - http://public.activestate.com/cgi-bin/perlbrowse + $ rsync -avz rsync://perl5.git.perl.org/APC/perl-current . + $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.10.x . + $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.8.x . + $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.6.x . + $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.005xx . + +(Add the C<--delete> option to remove leftover files) You may also want to subscribe to the perl5-changes mailing list to receive a copy of each patch that gets submitted to the maintenance @@ -239,290 +239,50 @@ Needless to say, the source code in perl-current is usually in a perpetual state of evolution. You should expect it to be very buggy. 
Do B use it for any purpose other than testing and development. -Keeping in sync with the most recent branch can be done in several ways, -but the most convenient and reliable way is using B, available at -ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent -branch by FTP.) - -If you choose to keep in sync using rsync, there are two approaches -to doing so: - -=over 4 - -=item rsync'ing the source tree - -Presuming you are in the directory where your perl source resides -and you have rsync installed and available, you can `upgrade' to -the bleadperl using: - - # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ . - -This takes care of updating every single item in the source tree to -the latest applied patch level, creating files that are new (to your -distribution) and setting date/time stamps of existing files to -reflect the bleadperl status. - -Note that this will not delete any files that were in '.' before -the rsync. Once you are sure that the rsync is running correctly, -run it with the --delete and the --dry-run options like this: - - # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ . - -This will I an rsync run that also deletes files not -present in the bleadperl master copy. Observe the results from -this run closely. If you are sure that the actual run would delete -no files precious to you, you could remove the '--dry-run' option. - -You can than check what patch was the latest that was applied by -looking in the file B<.patch>, which will show the number of the -latest patch. - -If you have more than one machine to keep in sync, and not all of -them have access to the WAN (so you are not able to rsync all the -source trees to the real source), there are some ways to get around -this problem. - -=over 4 - -=item Using rsync over the LAN - -Set up a local rsync server which makes the rsynced source tree -available to the LAN and sync the other machines against this -directory. 
- -From http://rsync.samba.org/README.html : - - "Rsync uses rsh or ssh for communication. It does not need to be - setuid and requires no special privileges for installation. It - does not require an inetd entry or a daemon. You must, however, - have a working rsh or ssh system. Using ssh is recommended for - its security features." - -=item Using pushing over the NFS - -Having the other systems mounted over the NFS, you can take an -active pushing approach by checking the just updated tree against -the other not-yet synced trees. An example would be - - #!/usr/bin/perl -w - - use strict; - use File::Copy; - - my %MF = map { - m/(\S+)/; - $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime - } `cat MANIFEST`; - - my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2); - - foreach my $host (keys %remote) { - unless (-d $remote{$host}) { - print STDERR "Cannot Xsync for host $host\n"; - next; - } - foreach my $file (keys %MF) { - my $rfile = "$remote{$host}/$file"; - my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9]; - defined $size or ($mode, $size, $mtime) = (0, 0, 0); - $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next; - printf "%4s %-34s %8d %9d %8d %9d\n", - $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime; - unlink $rfile; - copy ($file, $rfile); - utime time, $MF{$file}[2], $rfile; - chmod $MF{$file}[0], $rfile; - } - } - -though this is not perfect. It could be improved with checking -file checksums before updating. Not all NFS systems support -reliable utime support (when used over the NFS). - -=back - -=item rsync'ing the patches - -The source tree is maintained by the pumpking who applies patches to -the files in the tree. These patches are either created by the -pumpking himself using C after updating the file manually or -by applying patches sent in by posters on the perl5-porters list. -These patches are also saved and rsync'able, so you can apply them -yourself to the source files. 
- -Presuming you are in a directory where your patches reside, you can -get them in sync with - - # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . - -This makes sure the latest available patch is downloaded to your -patch directory. - -It's then up to you to apply these patches, using something like - - # last=`ls -t *.gz | sed q` - # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . - # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch - # cd ../perl-current - # patch -p1 -N <../perl-current-diffs/blead.patch - -or, since this is only a hint towards how it works, use CPAN-patchaperl -from Andreas König to have better control over the patching process. - -=back - -=head2 Why rsync the source tree - -=over 4 - -=item It's easier to rsync the source tree - -Since you don't have to apply the patches yourself, you are sure all -files in the source tree are in the right state. - -=item It's more reliable - -While both the rsync-able source and patch areas are automatically -updated every few minutes, keep in mind that applying patches may -sometimes mean careful hand-holding, especially if your version of -the C program does not understand how to deal with new files, -files with 8-bit characters, or files without trailing newlines. - -=back - -=head2 Why rsync the patches - -=over 4 - -=item It's easier to rsync the patches - -If you have more than one machine that you want to keep in track with -bleadperl, it's easier to rsync the patches only once and then apply -them to all the source trees on the different machines. - -In case you try to keep in pace on 5 different machines, for which -only one of them has access to the WAN, rsync'ing all the source -trees should than be done 5 times over the NFS. Having -rsync'ed the patches only once, I can apply them to all the source -trees automatically. 
Need you say more ;-) - -=item It's a good reference - -If you do not only like to have the most recent development branch, -but also like to B bugs, or extend features, you want to dive -into the sources. If you are a seasoned perl core diver, you don't -need no manuals, tips, roadmaps, perlguts.pod or other aids to find -your way around. But if you are a starter, the patches may help you -in finding where you should start and how to change the bits that -bug you. - -The file B is updated on occasions the pumpking sees as his -own little sync points. On those occasions, he releases a tar-ball of -the current source tree (i.e. perl@7582.tar.gz), which will be an -excellent point to start with when choosing to use the 'rsync the -patches' scheme. Starting with perl@7582, which means a set of source -files on which the latest applied patch is number 7582, you apply all -succeeding patches available from then on (7583, 7584, ...). - -You can use the patches later as a kind of search archive. - -=over 4 - -=item Finding a start point - -If you want to fix/change the behaviour of function/feature Foo, just -scan the patches for patches that mention Foo either in the subject, -the comments, or the body of the fix. A good chance the patch shows -you the files that are affected by that patch which are very likely -to be the starting point of your journey into the guts of perl. - -=item Finding how to fix a bug - -If you've found I the function/feature Foo misbehaves, but you -don't know how to fix it (but you do know the change you want to -make), you can, again, peruse the patches for similar changes and -look how others apply the fix. - -=item Finding the source of misbehaviour - -When you keep in sync with bleadperl, the pumpking would love to -I that the community efforts really work. So after each of his -sync points, you are to 'make test' to check if everything is still -in working order. If it is, you do 'make ok', which will send an OK -report to perlbug@perl.org. 
(If you do not have access to a mailer -from the system you just finished successfully 'make test', you can -do 'make okfile', which creates the file C, which you can -than take to your favourite mailer and mail yourself). - -But of course, as always, things will not always lead to a success -path, and one or more test do not pass the 'make test'. Before -sending in a bug report (using 'make nok' or 'make nokfile'), check -the mailing list if someone else has reported the bug already and if -so, confirm it by replying to that message. If not, you might want to -trace the source of that misbehaviour B sending in the bug, -which will help all the other porters in finding the solution. - -Here the saved patches come in very handy. You can check the list of -patches to see which patch changed what file and what change caused -the misbehaviour. If you note that in the bug report, it saves the -one trying to solve it, looking for that point. - -=back - -If searching the patches is too bothersome, you might consider using -perl's bugtron to find more information about discussions and -ramblings on posted bugs. - -If you want to get the best of both worlds, rsync both the source -tree for convenience, reliability and ease and rsync the patches -for reference. - -=back - - =head2 Perlbug administration -There is a single remote administrative interface for modifying bug status, -category, open issues etc. using the B I system, maintained -by I. Become an administrator, and close any bugs you can get +There is a single remote administrative interface for modifying bug status, +category, open issues etc. using the B bugtracker system, maintained +by Robert Spier. 
Become an administrator, and close any bugs you can get your sticky mitts on: - http://rt.perl.org - -The bugtracker mechanism for B bugs in particular is at: - - http://bugs6.perl.org/perlbug + http://bugs.perl.org/ To email the bug system administrators: "perlbug-admin" - =head2 Submitting patches Always submit patches to I. If you're patching a core module and there's an author listed, send the author a copy (see L). This lets other porters review your patch, which catches a surprising number of errors in patches. -Either use the diff program (available in source code form from -ftp://ftp.gnu.org/pub/gnu/ , or use Johan Vromans' I -(available from I). Unified diffs are preferred, -but context diffs are accepted. Do not send RCS-style diffs or diffs -without context lines. More information is given in the -I file in the Perl source distribution. Please -patch against the latest B version (e.g., if you're -fixing a bug in the 5.005 track, patch against the latest 5.005_5x -version). Only patches that survive the heat of the development -branch get applied to maintenance versions. +Please patch against the latest B version. (e.g., even if +you're fixing a bug in the 5.8 track, patch against the C branch in +the git repository.) + +If changes are accepted, they are applied to the development branch. Then +the maintenance pumpking decides which of those patches is to be +backported to the maint branch. Only patches that survive the heat of the +development branch get applied to maintenance versions. Your patch should update the documentation and test suite. See -L. +L. If you have added or removed files in the distribution, +edit the MANIFEST file accordingly, sort the MANIFEST file using +C, and include those changes as part of your patch. + +Patching documentation also follows the same order: if accepted, a patch +is first applied to B, and if relevant then it's backported +to B. 
(With an exception for some patches that document +behaviour that only appears in the maintenance branch, but which has +changed in the development version.) To report a bug in Perl, use the program I which comes with Perl (if you can't get Perl to work, send mail to the address I or I). Reporting bugs through I feeds into the automated bug-tracking system, access to -which is provided through the web at http://bugs.perl.org/ . It +which is provided through the web at http://rt.perl.org/rt3/ . It often pays to check the archives of the perl5-porters mailing list to see whether the bug you're reporting has been reported before, and if so whether it was considered a bug. See above for the location of @@ -530,9 +290,15 @@ the searchable archives. The CPAN testers ( http://testers.cpan.org/ ) are a group of volunteers who test CPAN modules on a variety of platforms. Perl -Smokers ( http://archives.develooper.com/daily-build@perl.org/ ) -automatically tests Perl source releases on platforms with various -configurations. Both efforts welcome volunteers. +Smokers ( http://www.nntp.perl.org/group/perl.daily-build and +http://www.nntp.perl.org/group/perl.daily-build.reports/ ) +automatically test Perl source releases on platforms with various +configurations. Both efforts welcome volunteers. In order to get +involved in smoke testing of the perl itself visit +L. In order to start smoke +testing CPAN modules visit L +or L or +L. It's a good idea to read and lurk for a while before chipping in. That way you'll get to see the dynamic of the conversations, learn the @@ -555,10 +321,12 @@ might start to make sense - don't worry if it doesn't yet, because the best way to study it is to read it in conjunction with poking at Perl source, and we'll do that later on. -You might also want to look at Gisle Aas's illustrated perlguts - -there's no guarantee that this will be absolutely up-to-date with the -latest documentation in the Perl core, but the fundamentals will be -right. 
( http://gisle.aas.no/perl/illguts/ ) +Gisle Aas's illustrated perlguts (aka: illguts) is wonderful, although +a little out of date wrt some size details; the various SV structures +have since been reworked for smaller memory footprint. The +fundamentals are right however, and the pictures are very helpful. + +http://www.perl.org/tpc/1998/Perl_Language_and_Modules/Perl%20Illustrated/ =item L and L @@ -581,11 +349,9 @@ wanting to go about Perl development. =item The perl5-porters FAQ -This should be available from http://simon-cozens.org/writings/p5p-faq ; -alternatively, you can get the FAQ emailed to you by sending mail to -C. It contains hints on reading perl5-porters, -information on how perl5-porters works and how Perl development in general -works. +This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html . +It contains hints on reading perl5-porters, information on how +perl5-porters works and how Perl development in general works. =back @@ -661,8 +427,11 @@ This is very high-level code, enough to fit on a single screen, and it resembles the code found in L; most of the real action takes place in F +F is generated by L from F at +make time, so you should make perl to follow this along. + First, F allocates some memory and constructs a Perl -interpreter: +interpreter, along these lines: 1 PERL_SYS_INIT3(&argc,&argv,&env); 2 @@ -691,16 +460,19 @@ later: C is either your system's C, or Perl's own C as defined in F if you selected that option at configure time. -Next, in line 7, we construct the interpreter; this sets up all the -special variables that Perl needs, the stacks, and so on. +Next, in line 7, we construct the interpreter using perl_construct, +also in F; this sets up all the special variables that Perl +needs, the stacks, and so on. 
Now we pass Perl the command line options, and tell it to go: exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); - if (!exitstatus) { - exitstatus = perl_run(my_perl); - } + if (!exitstatus) + perl_run(my_perl); + + exitstatus = perl_destruct(my_perl); + perl_free(my_perl); C is actually a wrapper around C, as defined in F, which processes the command line options, sets up any @@ -716,7 +488,7 @@ there's three things going on here. C, the parser, lives in F, although you're better off reading the original YACC input in F. (Yes, Virginia, there B a YACC grammar for Perl!) The job of the parser is to take your -code and `understand' it, splitting it into sentences, deciding which +code and "understand" it, splitting it into sentences, deciding which operands go with which operators and so on. The parser is nobly assisted by the lexer, which chunks up your input @@ -770,13 +542,164 @@ The C makes sure that things like signals interrupt execution if required. The actual functions called are known as PP code, and they're spread -between four files: F contains the `hot' code, which is most +between four files: F contains the "hot" code, which is most often used and highly optimized, F contains all the system-specific functions, F contains the functions which implement control structures (C, C and the like) and F contains everything else. These are, if you like, the C code for Perl's built-in functions and operators. +Note that each C function is expected to return a pointer to the next +op. Calls to perl subs (and eval blocks) are handled within the same +runops loop, and do not consume extra space on the C stack. For example, +C and C just push a C or C block +struct onto the context stack which contain the address of the op +following the sub call or eval. They then return the first op of that sub +or eval block, and so execution continues of that sub or block. Later, a +C or C op pops the C or C, +retrieves the return op from it, and returns it. 
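The shape of that dispatch loop is easy to see in miniature. The sketch below is a toy model, not code from the Perl core: the C<OP> layout and the names C<pp_incr> and C<runops> are invented for illustration, but the central idea -- every pp function does one op's worth of work and returns the op to run next -- is the same.

```c
/* Toy model of the runops dispatch loop: each "pp" function performs
 * one operation and returns a pointer to the next op to execute.
 * All names here are illustrative, not the interpreter's real ones. */
#include <stddef.h>

typedef struct op OP;
struct op {
    OP *(*pp)(OP *self);  /* function implementing this op */
    OP *next;             /* op to execute once this one is done */
    int *work;            /* stand-in for the op's real operands */
};

static OP *pp_incr(OP *self) {
    ++*self->work;        /* this op's "work" */
    return self->next;    /* hand the loop its next op */
}

static void runops(OP *o) {
    while (o)
        o = o->pp(o);     /* the entire interpreter loop */
}
```

Chaining three such ops and calling C<runops> on the first executes each once. In real perl, sub entry and exit are themselves just pp functions that return an op from somewhere else, which is why calls consume no extra C stack.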
=item Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
way to capture the current PC and SP registers and later restore them; i.e.
a C<longjmp()> continues at the point in code where a previous C<setjmp()>
was done, with anything further up on the C stack being lost. This is why
code should always save values using C<SAVE_I<FOO>> rather than in auto
variables.

The perl core wraps C<setjmp()> etc in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit>, and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

At entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, each does a C<JMPENV_PUSH>, then enters a runops
loop or whatever, and handles possible exception returns. For a 2 return,
final cleanup is performed, such as popping stacks and calling C<CHECK> or
C<END> blocks. Amongst other things, this is how scope cleanup still
occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is assigned
to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally
passes control back to the guard. In the case of C<perl_run> and
C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops
loop. This is the normal way that C<die> or C<croak> is handled within an
C<eval>.

Sometimes ops are executed within an inner runops loop, such as tie, sort
or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping both
runops loops, which is clearly incorrect. One way to avoid this is for the
tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner
runops loop, but for efficiency reasons, perl in fact just sets a flag,
using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and
C<pp_entertry> ops check this flag, and if true, they call C<docatch()>,
which does a C<JMPENV_PUSH> and starts a new runops level to execute the
code, rather than doing it on the current loop.
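Stripped of the bookkeeping, the guard pattern that C<docatch> relies on is plain C<setjmp>/C<longjmp>. The following sketch is hypothetical -- real perl maintains a chain of C<JMPENV> structs rather than a single static buffer -- but it shows how a C<die>-style jump returns control to the innermost guard:

```c
/* Hypothetical sketch of the setjmp/longjmp guard behind
 * JMPENV_PUSH/JMPENV_JUMP; one static buffer stands in for perl's
 * chain of JMPENV structs. */
#include <setjmp.h>

static jmp_buf guard;

static void do_die(void) {
    longjmp(guard, 3);        /* like die inside eval: JMPENV_JUMP(3) */
}

static void no_die(void) {
    /* completes normally, no jump */
}

static int docatch(void (*body)(void)) {
    int ret = setjmp(guard);  /* like JMPENV_PUSH: arm the guard */
    if (ret == 0) {
        body();               /* run the protected code */
        return 0;             /* body finished without "dying" */
    }
    return ret;               /* a longjmp landed here: exception caught */
}
```

Here C<docatch(no_die)> returns 0, while C<docatch(do_die)> returns 3, just as a C<JMPENV_JUMP(3)> from C<die> returns 3 to the innermost C<JMPENV_PUSH>.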
As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the inner
loop. When an exception is raised, C<docatch()> compares the C<JMPENV>
level of the C<CxEVAL> with C<PL_top_env> and if they differ, just
re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run()> is called, which does a C<JMPENV_PUSH>,
then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie()> does a C<CATCH_SET(TRUE)>, then starts a second runops
loop to execute the body of C<TIEARRAY()>. When it executes the entertry
op on line 3, C<CATCH_GET> is true, so C<pp_entertry()> calls C<docatch()>
which does a C<JMPENV_PUSH> and starts a third runops loop, which then
executes the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK =>
      CX 1: EVAL => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB =>
      retop=(null)
      CX 1: EVAL => *
      retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to
the top C<docatch()>. This then starts another third-level runops level,
which executes the nextstate, pushmark and die ops on line 4. At the point
that the second C<die> is called, the C call stack looks exactly like
that above, even though we are no longer within an inner eval; this is
because of the optimization mentioned earlier.
However, the context stack +now looks like this, ie with the top CxEVAL popped: + + STACK 0: MAIN + CX 0: BLOCK => + CX 1: EVAL => AV() PV("A"\0) + retop=leave + STACK 1: MAGIC + CX 0: SUB => + retop=(null) + +The die on line 4 pops the context stack back down to the CxEVAL, leaving +it as: + + STACK 0: MAIN + CX 0: BLOCK => + +As usual, C is extracted from the C, and a +C done, which pops the C stack back to the docatch: + + S_docatch + Perl_pp_entertry + Perl_runops # second loop + S_call_body + Perl_call_sv + Perl_pp_tie + Perl_runops # first loop + S_run_body + perl_run + main + +In this case, because the C level recorded in the C +differs from the current one, C just does a C +and the C stack unwinds to: + + perl_run + main + +Because C is non-null, C starts a new runops loop +and execution continues. + =back =head2 Internal Variable Types @@ -1014,10 +937,10 @@ If you're not used to reading BNF grammars, this is how it works: You're fed certain things by the tokeniser, which generally end up in upper case. Here, C, is provided when the tokeniser sees C<+> in your code. C is provided when C<=> is used for assigning. These are -`terminal symbols', because you can't get any simpler than them. +"terminal symbols", because you can't get any simpler than them. The grammar, lines one and three of the snippet above, tells you how to -build up more complex forms. These complex forms, `non-terminal symbols' +build up more complex forms. These complex forms, "non-terminal symbols" are generally placed in lower case. C here is a non-terminal symbol, representing a single expression. @@ -1046,8 +969,8 @@ call C to create a new binary operator. The first parameter to C, a function in F, is the op type. It's an addition operator, so we want the type to be C. We could specify this directly, but it's right there as the second token in the input, so we -use C<$2>. The second parameter is the op's flags: 0 means `nothing -special'. 
Then the things to add: the left and right hand side of our +use C<$2>. The second parameter is the op's flags: 0 means "nothing +special". Then the things to add: the left and right hand side of our expression, in scalar context. =head2 Stacks @@ -1095,18 +1018,18 @@ description of the macros used in stack manipulation. =item Mark stack -I say `your portion of the stack' above because PP code doesn't +I say "your portion of the stack" above because PP code doesn't necessarily get the whole stack to itself: if your function calls another function, you'll only want to expose the arguments aimed for the called function, and not (necessarily) let it get at your own data. The -way we do this is to have a `virtual' bottom-of-stack, exposed to each +way we do this is to have a "virtual" bottom-of-stack, exposed to each function. The mark stack keeps bookmarks to locations in the argument stack usable by each function. For instance, when dealing with a tied -variable, (internally, something with `P' magic) Perl has to call +variable, (internally, something with "P" magic) Perl has to call methods for accesses to the tied variables. However, we need to separate the arguments exposed to the method to the argument exposed to the -original function - the store or fetch or whatever it may be. Here's how -the tied C is implemented; see C in F: +original function - the store or fetch or whatever it may be. Here's +roughly how the tied C is implemented; see C in F: 1 PUSHMARK(SP); 2 EXTEND(SP,2); @@ -1116,11 +1039,6 @@ the tied C is implemented; see C in F: 6 ENTER; 7 call_method("PUSH", G_SCALAR|G_DISCARD); 8 LEAVE; - 9 POPSTACK; - -The lines which concern the mark stack are the first, fifth and last -lines: they save away, restore and remove the current position of the -argument stack. Let's examine the whole implementation, for practice: @@ -1141,8 +1059,8 @@ retrieved with C, and the value, the SV C. 
5 PUTBACK; -Next we tell Perl to make the change to the global stack pointer: C -only gave us a local copy, not a reference to the global. +Next we tell Perl to update the global stack pointer from our internal +variable: C only gave us a local copy, not a reference to the global. 6 ENTER; 7 call_method("PUSH", G_SCALAR|G_DISCARD); @@ -1156,12 +1074,9 @@ C<}> of a Perl block. To actually do the magic method call, we have to call a subroutine in Perl space: C takes care of that, and it's described in L. We call the C method in scalar context, and we're -going to discard its return value. - - 9 POPSTACK; - -Finally, we remove the value we placed on the mark stack, since we -don't need it any more. +going to discard its return value. The call_method() function +removes the top element of the mark stack, so there is nothing for +the caller to clean up. =item Save stack @@ -1224,6 +1139,140 @@ You can expand the macros in a F file by saying which will expand the macros using cpp. Don't be scared by the results. +=head1 SOURCE CODE STATIC ANALYSIS + +Various tools exist for analysing C source code B, as +opposed to B, that is, without executing the code. +It is possible to detect resource leaks, undefined behaviour, type +mismatches, portability problems, code paths that would cause illegal +memory accesses, and other similar problems by just parsing the C code +and looking at the resulting graph, what does it tell about the +execution and data flows. As a matter of fact, this is exactly +how C compilers know to give warnings about dubious code. + +=head2 lint, splint + +The good old C code quality inspector, C, is available in +several platforms, but please be aware that there are several +different implementations of it by different vendors, which means that +the flags are not identical across different platforms. + +There is a lint variant called C (Secure Programming Lint) +available from http://www.splint.org/ that should compile on any +Unix-like platform. 
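For a rough idea of usage (the file name and include path below are placeholders, and the exact flags differ between splint versions), a run over a single source file looks something like:

```shell
# Illustrative invocation only -- adjust paths and flags for your setup.
splint -weak -I. sv.c       # -weak: splint's most permissive checking level
splint -standard -I. sv.c   # -standard: the default level, considerably noisier
```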
+
+There are C and targets in Makefile, but you may have
+to diddle with the flags (see above).
+
+=head2 Coverity
+
+Coverity (http://www.coverity.com/) is a product similar to lint. As
+a testbed for their product they periodically check several open
+source projects, and they give open source developers accounts on the
+defect databases.
+
+=head2 cpd (cut-and-paste detector)
+
+The cpd tool detects cut-and-paste coding. If one instance of the
+cut-and-pasted code changes, all the other spots should probably be
+changed, too. Therefore such code should probably be turned into a
+subroutine or a macro.
+
+cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
+(http://pmd.sourceforge.net/). pmd was originally written for static
+analysis of Java code, but later the cpd part of it was extended to
+parse C and C++ as well.
+
+Download the pmd-bin-X.Y.zip () from the SourceForge site, extract the
+pmd-X.Y.jar from it, and then run that on source code like this:
+
+ java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
+
+You may run into memory limits, in which case you should use the -Xmx option:
+
+ java -Xmx512M ...
+
+=head2 gcc warnings
+
+Though much can be written about the inconsistency and coverage
+problems of gcc warnings (like C<-Wall> not meaning "all the
+warnings", or some common portability problems not being covered by
+C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
+collection of warnings, and so forth), gcc is still a useful tool in
+keeping our coding nose clean.
+
+C<-Wall> is on by default.
+
+C<-ansi> (and its sidekick, C<-pedantic>) would be nice to have on
+always, but unfortunately they are not safe on all platforms; they
+can, for example, cause fatal conflicts with the system headers
+(Solaris being a prime example).
If Configure C<-Dgccansipedantic> is used,
+the C frontend selects C<-ansi -pedantic> for the platforms
+where they are known to be safe.
+
+Starting from Perl 5.9.4 the following extra flags are added:
+
+=over 4
+
+=item *
+
+C<-Wendif-labels>
+
+=item *
+
+C<-Wextra>
+
+=item *
+
+C<-Wdeclaration-after-statement>
+
+=back
+
+The following flags would be nice to have but they would first need
+their own Augean stablemaster:
+
+=over 4
+
+=item *
+
+C<-Wpointer-arith>
+
+=item *
+
+C<-Wshadow>
+
+=item *
+
+C<-Wstrict-prototypes>
+
+=back
+
+C<-Wtraditional> is another example of the annoying tendency of
+gcc to bundle a lot of warnings under one switch -- it would be
+impossible to deploy in practice because it would complain a lot -- but
+it does contain some warnings that would be beneficial to have available
+on their own, such as the warning about string constants inside macros
+containing the macro arguments: this behaved differently pre-ANSI
+than it does in ANSI, and some C compilers are still in transition,
+AIX being an example.
+
+=head2 Warnings of other C compilers
+
+Other C compilers (yes, there B other C compilers than gcc) often
+have "strict ANSI" or "strict ANSI with some portability extensions"
+modes; for example, the Sun Workshop has its C<-Xa> mode (though
+implicit), and the DEC (these days, HP...) compiler has its C<-std1>
+mode.
+
+=head2 DEBUGGING
+
+You can compile a special debugging version of Perl, which allows you
+to use Perl's C<-D> option to learn more about what Perl is doing.
+But sometimes there is no alternative but to dive in with a debugger,
+either to see the stack trace of a core dump (very useful in a bug
+report), to figure out what went wrong before the core dump happened,
+or to work out how we ended up with wrong or unexpected results.
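As a sketch of the debugging-build workflow (the exact Configure
invocation below is an assumption for illustration, not taken from this
document; see INSTALL for the options your platform wants):

```shell
# Build a perl with the DEBUGGING compilation symbol enabled
sh Configure -des -Dusedevel -DDEBUGGING
make

# The -D runtime flags then become available, e.g. tracing execution
# (t) and showing stack snapshots (s); output goes to stderr:
./perl -Dts -e '$a = 6 + 2.3'
```

The full list of C<-D> letters is documented in perlrun.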
+
=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

 make

C<-g> is a flag to the C compiler to have it produce debugging
-information which will allow us to step through a running program.
+information which will allow us to step through a running program,
+and to see which C function we are in (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful).
+

F will also turn on the C compilation symbol which
enables all the internal debugging code in Perl. There are a whole bunch
of things you can debug with this: L lists them all, and the
@@ -1260,8 +1313,9 @@ through perl's execution with a source-level debugger.

=item *

-We'll use C for our examples here; the principles will apply to any
-debugger, but check the manual of the one you're using.
+We'll use C for our examples here; the principles will apply to
+any debugger (many vendors call their debugger C), but check the
+manual of the one you're using.

=back

@@ -1269,6 +1323,10 @@ To fire up the debugger, type

 gdb ./perl

+Or if you have a core dump:
+
+ gdb ./perl core
+
You'll want to do that in your Perl source tree so the debugger
can read the source code. You should see the copyright message, followed
by the prompt.

@@ -1333,7 +1391,7 @@ but you have to say

You may find it helpful to have a "macro dictionary", which you can
produce by saying C. Even then, F won't
-recursively apply those macros for you.
+recursively apply those macros for you.

=head2 gdb macro support

@@ -1349,7 +1407,7 @@ One way to get around this macro hell is to use the dumping functions
in F; these work a little like an internal
L, but they also cover OPs and other structures that you
can't get at from Perl. Let's take an example.
We'll use the -C<$a = $b + $c> we used before, but give it a bit of context: +C<$a = $b + $c> we used before, but give it a bit of context: C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around? What about C, the function we examined earlier to implement the @@ -1384,7 +1442,7 @@ C takes the SV from the top of the stack and obtains its NV either directly (if C is set) or by calling the C function. C takes the next SV from the top of the stack - yes, C uses C - but doesn't remove it. We then use C to get the NV from -C in the same way as before - yes, C uses C. +C in the same way as before - yes, C uses C. Since we don't have an NV for C<$b>, we'll have to use C to convert it. If we step again, we'll find ourselves there: @@ -1437,7 +1495,7 @@ similar output to L. All right, we've now had a look at how to navigate the Perl sources and some things you'll need to know when fiddling with them. Let's now get on and create a simple patch. Here's something Larry suggested: if a -C is the first active format during a C, (for example, +C is the first active format during a C, (for example, C) then the resulting string should be treated as UTF-8 encoded. @@ -1523,7 +1581,7 @@ else along the line. The regression tests for each operator live in F, and so we make a copy of F to F. Now we can add our tests to the end. First, we'll test that the C does indeed create -Unicode strings. +Unicode strings. t/op/pack.t has a sensible ok() function, but if it didn't we could use the one from t/test.pl. @@ -1539,8 +1597,8 @@ so instead of this: we can write the more sensible (see L for a full explanation of is() and other testing functions). 
- is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000), - "U* produces unicode" ); + is( "1.20.300.4000", sprintf "%vd", pack("U*",1,20,300,4000), + "U* produces Unicode" ); Now we'll test that we got that space-at-the-beginning business right: @@ -1551,7 +1609,7 @@ And finally we'll test that we don't make Unicode strings if C is B the first active format: isnt( v1.20.300.4000, sprintf "%vd", pack("C0U*",1,20,300,4000), - "U* not first isn't unicode" ); + "U* not first isn't Unicode" ); Mustn't forget to change the number of tests which appears at the top, or else the automated tester will get confused. This will either look @@ -1619,6 +1677,9 @@ the module maintainer (with a copy to p5p). This will help the module maintainer keep the CPAN version in sync with the core version without constantly scanning p5p. +The list of maintainers of core modules is usefully documented in +F. + =head2 Adding a new function to the core If, as part of a patch to fix a bug, or just because you have an @@ -1723,6 +1784,11 @@ The old home for the module tests, you shouldn't put anything new in here. There are still some bits and pieces hanging around in here that need to be moved. Perhaps you could move them? Thanks! +=item F + +Tests for perl's method resolution order implementations +(see L). + =item F Tests for perl's built in functions that don't fit into any of the @@ -1762,7 +1828,7 @@ decision of which to use depends on what part of the test suite you're working on. This is a measure to prevent a high-level failure (such as Config.pm breaking) from causing basic functionality tests to fail. -=over 4 +=over 4 =item t/base t/comp @@ -1787,9 +1853,10 @@ also use the full suite of core modules in the tests. =back When you say "make test" Perl uses the F program to run the -test suite. All tests are run from the F directory, B the -directory which contains the test. This causes some problems with the -tests in F, so here's some opportunity for some patching. 
+test suite (except under Win32, where it uses F instead).
+All tests are run from the F directory, B the directory
+which contains the test. This causes some problems with the tests
+in F, so here's some opportunity for some patching.

You must be triply conscious of cross-platform concerns. This usually
boils down to using File::Spec and avoiding things like C and
C unless absolutely necessary.

@@ -1800,7 +1867,8 @@ C unless absolutely necessary.

There are various special make targets that can be used to test Perl
slightly differently than the standard "test" target. Not all of them
are expected to give a 100% success rate. Many of them have several
-aliases.
+aliases, and many of them are not available on certain operating
+systems.

=over 4

@@ -1808,14 +1876,25 @@ aliases.

Run F on all core tests (F and F pragma tests).

+(Not available on Win32)
+
=item test.deparse

-Run all the tests through the B::Deparse. Not all tests will succeed.
+Run all the tests through B::Deparse. Not all tests will succeed.
+
+(Not available on Win32)
+
+=item test.taintwarn
+
+Run all tests with the B<-t> command-line switch. Not all tests
+are expected to succeed (until they're specifically fixed, of course).
+
+(Not available on Win32)

=item minitest

Run F on F, F, F, F, F,
-F, and F tests.
+F, F and F tests.

=item test.valgrind check.valgrind utest.valgrind ucheck.valgrind

@@ -1827,7 +1906,7 @@ F.

(Only in Tru64) Run all the tests using the memory leak + naughty
memory access tool "Third Degree". The log files will be named
-F.
+F.

=item test.torture torturetest

@@ -1841,6 +1920,18 @@ C<-torture> argument to F.

Run all the tests with -Mutf8. Not all tests will succeed.

+(Not available on Win32)
+
+=item minitest.utf16 test.utf16
+
+Runs the tests with UTF-16 encoded scripts, encoded with different
+versions of this encoding.
+
+C runs the test suite with a combination of C<-utf8> and
+C<-utf16> arguments to F.
+ +(Not available on Win32) + =item test_harness Run the test suite with the F controlling program, instead of @@ -1850,6 +1941,20 @@ mostly works. The main advantage for our purposes is that it prints a detailed summary of failed tests at the end. Also, unlike F, it doesn't redirect stderr to stdout. +Note that under Win32 F is always used instead of F, so +there is no special "test_harness" target. + +Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES +environment variables to control the behaviour of F. This means +you can say + + nmake test TEST_FILES="op/*.t" + nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t" + +=item test-notty test_notty + +Sets PERL_SKIP_TTY_TEST to true before running normal test. + =back =head2 Running tests by hand @@ -1865,6 +1970,45 @@ or (if you don't specify test scripts, the whole test suite will be run.) +=head3 Using t/harness for testing + +If you use C for testing you have several command line options +available to you. The arguments are as follows, and are in the order +that they must appear if used together. + + harness -v -torture -re=pattern LIST OF FILES TO TEST + harness -v -torture -re LIST OF PATTERNS TO MATCH + +If C is omitted the file list is obtained from +the manifest. The file list may include shell wildcards which will be +expanded out. + +=over 4 + +=item -v + +Run the tests under verbose mode so you can see what tests were run, +and debug output. + +=item -torture + +Run the torture tests as well as the normal set. + +=item -re=PATTERN + +Filter the file list so that all the test files run match PATTERN. +Note that this form is distinct from the B<-re LIST OF PATTERNS> form below +in that it allows the file list to be provided as well. + +=item -re LIST OF PATTERNS + +Filter the file list so that all the test files run match +/(LIST|OF|PATTERNS)/. 
Note that with this form the patterns
+are joined by '|' and you cannot supply a list of files; instead,
+the test files are obtained from the MANIFEST.
+
+=back
+
You can run an individual test by a command similar to

 ./perl -I../lib path/to/foo.t

@@ -1872,7 +2016,7 @@ You can run an individual test by a command similar to

except that the harnesses set up some environment variables that may
affect the execution of the test:

-=over 4
+=over 4

=item PERL_CORE=1

@@ -1896,6 +2040,622 @@ running 'make test_notty'.

=back

+=head3 Other environment variables that may influence tests
+
+=over 4
+
+=item PERL_TEST_Net_Ping
+
+Setting this variable runs all the Net::Ping module tests;
+otherwise some tests that interact with the outside world are skipped.
+See L.
+
+=item PERL_TEST_NOVREXX
+
+Setting this variable skips the vrexx.t tests for OS2::REXX.
+
+=item PERL_TEST_NUMCONVERTS
+
+This sets a variable in op/numconvert.t.
+
+=back
+
+See also the documentation for the Test and Test::Harness modules,
+for more environment variables that affect testing.
+
+=head2 Common problems when patching Perl source code
+
+Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In
+some cases we have to take pre-ANSI requirements into consideration.
+You don't care about some particular platform having broken Perl?
+I hear there is still a strong demand for J2EE programmers.
+
+=head2 Perl environment problems
+
+=over 4
+
+=item *
+
+Not compiling with threading
+
+Compiling with threading (-Duseithreads) completely rewrites
+the function prototypes of Perl. You'd better test your changes
+with that. Related to this is the difference between "Perl_-less"
+and "Perl_-ly" APIs, for example:
+
+ Perl_sv_setiv(aTHX_ ...);
+ sv_setiv(...);
+
+The first one explicitly passes in the context, which is needed for e.g.
+threaded builds. The second one does that implicitly; do not get them
+mixed.
If you are not passing in an aTHX_, you will need to do a dTHX
+(or a dVAR) as the first thing in the function.
+
+See L
+for further discussion about context.
+
+=item *
+
+Not compiling with -DDEBUGGING
+
+The DEBUGGING define exposes more code to the compiler, and
+therefore more ways for things to go wrong. You should try it.
+
+=item *
+
+Introducing (non-read-only) globals
+
+Do not introduce any modifiable globals, truly global or file static.
+They are bad form and complicate multithreading and other forms of
+concurrency. The right way is to introduce them as new interpreter
+variables; see F (at the very end for binary compatibility).
+
+Introducing read-only (const) globals is okay, as long as you verify
+with e.g. C (if your C has
+BSD-style output) that the data you added really is read-only.
+(If it is, it shouldn't show up in the output of that command.)
+
+If you want to have static strings, make them constant:
+
+ static const char etc[] = "...";
+
+If you want to have arrays of constant strings, note carefully
+the right combination of Cs:
+
+ static const char * const yippee[] =
+ {"hi", "ho", "silver"};
+
+There is a way to completely hide any modifiable globals (they are all
+moved to the heap): the compilation setting C<-DPERL_GLOBAL_STRUCT_PRIVATE>.
+It is not normally used, but can be used for testing; read more
+about it in L.
+
+=item *
+
+Not exporting your new function
+
+Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
+function that is part of the public API (the shared Perl library)
+to be explicitly marked as exported. See the discussion about
+F in L.
+
+=item *
+
+Exporting your new function
+
+The new shiny result of either genuine new functionality or your
+arduous refactoring is now ready and correctly exported. So what
+could possibly go wrong?
+
+Maybe simply that your function did not need to be exported in the
+first place. Perl has a long and not so glorious history of exporting
+functions that it should not have.
+
+If the function is used only inside one source code file, make it
+static. See the discussion about F in L.
+
+If the function is used across several files, but intended only for
+Perl's internal use (and this should be the common case), do not
+export it to the public API. See the discussion about F
+in L.
+
+=back
+
+=head2 Portability problems
+
+The following are common causes of compilation and/or execution
+failures, none of them specific to Perl as such. The C FAQ is good
+bedtime reading. Please test your changes with as many C compilers
+and platforms as possible -- we will, anyway, and it's nice to save
+oneself from public embarrassment.
+
+If using gcc, you can add the C<-std=c89> option which will hopefully
+catch most of these unportabilities. (However it might also catch
+incompatibilities in your system's header files.)
+
+Use the Configure C<-Dgccansipedantic> flag to enable the gcc
+C<-ansi -pedantic> flags which enforce stricter ANSI rules.
+
+If using the C note that not all the possible warnings
+(like C<-Wuninitialized>) are given unless you also compile with C<-O>.
+
+Note that if using gcc, starting from Perl 5.9.5 the Perl core source
+code files (the ones at the top level of the source code distribution,
+but not e.g. the extensions under ext/) are automatically compiled
+with as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>,
+and a selection of C<-W> flags (see cflags.SH).
+
+Also study L carefully to avoid any bad assumptions
+about the operating system, filesystems, and so forth.
+
+You may once in a while try a "make microperl" to see whether we
+can still compile Perl with just the bare minimum of interfaces.
+(See README.micro.)
+
+Do not assume an operating system indicates a certain compiler.
+
+=over 4
+
+=item *
+
+Casting pointers to integers or casting integers to pointers
+
+ void castaway(U8* p)
+ {
+ IV i = p;
+
+or
+
+ void castaway(U8* p)
+ {
+ IV i = (IV)p;
+
+Both are bad, and broken, and unportable.
Use the PTR2IV()
+macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(),
+INT2PTR(), and NUM2PTR().)
+
+=item *
+
+Casting between function pointers and data pointers
+
+Technically speaking, casting between function pointers and data
+pointers is unportable and undefined, but practically speaking
+it seems to work. You should nevertheless use the FPTR2DPTR() and
+DPTR2FPTR() macros. Sometimes you can also play games with unions.
+
+=item *
+
+Assuming sizeof(int) == sizeof(long)
+
+There are platforms where longs are 64 bits, and platforms where ints
+are 64 bits, and while we are out to shock you, even platforms where
+shorts are 64 bits. This is all legal according to the C standard.
+(In other words, "long long" is not a portable way to specify 64 bits,
+and "long long" is not even guaranteed to be any wider than "long".)
+
+Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
+Avoid things like I32 because they are B guaranteed to be
+I 32 bits, they are I 32 bits, nor are they
+guaranteed to be B or B. If you really explicitly need
+64-bit variables, use I64 and U64, but only if guarded by HAS_QUAD.
+
+=item *
+
+Assuming one can dereference any type of pointer for any type of data
+
+ char *p = ...;
+ long pony = *p; /* BAD */
+
+Many platforms, quite rightly so, will give you a core dump instead
+of a pony if p happens not to be correctly aligned.
+
+=item *
+
+Lvalue casts
+
+ (int)*p = ...; /* BAD */
+
+Simply not portable. Get your lvalue to be of the right type,
+or maybe use temporary variables, or dirty tricks with unions.
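A minimal sketch of the temporary-variable approach (the helper names
here are hypothetical, not Perl API): instead of casting the lvalue,
copy through a correctly typed temporary with memcpy():

```c
#include <string.h>

/* Hypothetical helpers: store/load an int at an arbitrary byte
 * location without an lvalue cast. */
static void put_int(void *dest, int value)
{
    int tmp = value;                 /* an lvalue of the right type */
    memcpy(dest, &tmp, sizeof tmp);  /* byte-wise copy, no (int)*p = ... */
}

static int get_int(const void *src)
{
    int tmp;
    memcpy(&tmp, src, sizeof tmp);
    return tmp;
}

/* Round-trip a value through a raw byte buffer. */
static int roundtrip(int value)
{
    unsigned char buf[sizeof(int)];
    put_int(buf, value);
    return get_int(buf);
}
```

Compilers typically optimize such small memcpy() calls away, so this
costs nothing over the non-portable cast.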
+ +=item * + +Assume B about structs (especially the ones you +don't control, like the ones coming from the system headers) + +=over 8 + +=item * + +That a certain field exists in a struct + +=item * + +That no other fields exist besides the ones you know of + +=item * + +That a field is of certain signedness, sizeof, or type + +=item * + +That the fields are in a certain order + +=over 8 + +=item * + +While C guarantees the ordering specified in the struct definition, +between different platforms the definitions might differ + +=back + +=item * + +That the sizeof(struct) or the alignments are the same everywhere + +=over 8 + +=item * + +There might be padding bytes between the fields to align the fields - +the bytes can be anything + +=item * + +Structs are required to be aligned to the maximum alignment required +by the fields - which for native types is for usually equivalent to +sizeof() of the field + +=back + +=back + +=item * + +Assuming the character set is ASCIIish + +Perl can compile and run under EBCDIC platforms. See L. +This is transparent for the most part, but because the character sets +differ, you shouldn't use numeric (decimal, octal, nor hex) constants +to refer to characters. You can safely say 'A', but not 0x41. +You can safely say '\n', but not \012. +If a character doesn't have a trivial input form, you can +create a #define for it in both C and C, so that +it resolves to different values depending on the character set being used. +(There are three different EBCDIC character sets defined in C, +so it might be best to insert the #define three times in that file.) + +Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case +alphabetic characters. That is not true in EBCDIC. Nor for 'a' to 'z'. +But '0' - '9' is an unbroken range in both systems. Don't assume anything +about other ranges. + +Many of the comments in the existing code ignore the possibility of EBCDIC, +and may be wrong therefore, even if the code works. 
+This is actually a tribute to the successful transparent insertion of being +able to handle EBCDIC without having to change pre-existing code. + +UTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode +code points as sequences of bytes. Macros +with the same names (but different definitions) +in C and C +are used to allow the calling code to think that there is only one such +encoding. +This is almost always referred to as C, but it means the EBCDIC version +as well. Again, comments in the code may well be wrong even if the code itself +is right. +For example, the concept of C differs between ASCII and +EBCDIC. +On ASCII platforms, only characters that do not have the high-order +bit set (i.e. whose ordinals are strict ASCII, 0 - 127) +are invariant, and the documentation and comments in the code +may assume that, +often referring to something like, say, C. +The situation differs and is not so simple on EBCDIC machines, but as long as +the code itself uses the C macro appropriately, it +works, even if the comments are wrong. + +=item * + +Assuming the character set is just ASCII + +ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra +characters have different meanings depending on the locale. Absent a locale, +currently these extra characters are generally considered to be unassigned, +and this has presented some problems. +This is scheduled to be changed in 5.12 so that these characters will +be considered to be Latin-1 (ISO-8859-1). + +=item * + +Mixing #define and #ifdef + + #define BURGLE(x) ... \ + #ifdef BURGLE_OLD_STYLE /* BAD */ + ... do it the old way ... \ + #else + ... do it the new way ... \ + #endif + +You cannot portably "stack" cpp directives. For example in the above +you need two separate BURGLE() #defines, one for each #ifdef branch. + +=item * + +Adding non-comment stuff after #endif or #else + + #ifdef SNOSH + ... + #else !SNOSH /* BAD */ + ... 
+ #endif SNOSH /* BAD */
+
+The #endif and #else cannot portably have anything non-comment after
+them. If you want to document what is going on (which is a good idea
+especially if the branches are long), use (C) comments:
+
+ #ifdef SNOSH
+ ...
+ #else /* !SNOSH */
+ ...
+ #endif /* SNOSH */
+
+The gcc option C<-Wendif-labels> warns about the bad variant
+(on by default starting from Perl 5.9.4).
+
+=item *
+
+Having a comma after the last element of an enum list
+
+ enum color {
+ CERULEAN,
+ CHARTREUSE,
+ CINNABAR, /* BAD */
+ };
+
+is not portable. Leave out the last comma.
+
+Also note that whether enums are implicitly morphable to ints
+varies between compilers; you might need an explicit (int) cast.
+
+=item *
+
+Using //-comments
+
+ // This function bamfoodles the zorklator. /* BAD */
+
+That is C99 or C++. Perl is C89. Using //-comments is silently
+allowed by many C compilers but cranking up the ANSI C89 strictness
+(which we like to do) causes the compilation to fail.
+
+=item *
+
+Mixing declarations and code
+
+ void zorklator()
+ {
+ int n = 3;
+ set_zorkmids(n); /* BAD */
+ int q = 4;
+
+That is C99 or C++. Some C compilers allow that, but you shouldn't.
+
+The gcc option C<-Wdeclaration-after-statement> scans for such problems
+(on by default starting from Perl 5.9.4).
+
+=item *
+
+Introducing variables inside for()
+
+ for(int i = ...; ...; ...) { /* BAD */
+
+That is C99 or C++. While it would indeed be awfully nice to have that
+also in C89, to limit the scope of the loop variable, alas, we cannot.
+
+=item *
+
+Mixing signed char pointers with unsigned char pointers
+
+ int foo(char *s) { ... }
+ ...
+ unsigned char *t = ...; /* Or U8* t = ... */
+ foo(t); /* BAD */
+
+While this is legal practice, it is certainly dubious, and downright
+fatal on at least one platform: for example, VMS cc considers this a
+fatal error.
One cause for people often making this mistake is that a
+"naked char" and therefore dereferencing a "naked char pointer" have
+an implementation-defined signedness: it depends on the compiler, the
+flags of the compiler, and the underlying platform whether the result
+is signed or unsigned. For this very same reason using a 'char' as an
+array index is bad.
+
+=item *
+
+Macros that have string constants and their arguments as substrings of
+the string constants
+
+ #define FOO(n) printf("number = %d\n", n) /* BAD */
+ FOO(10);
+
+Pre-ANSI semantics for that was equivalent to
+
+ printf("10umber = %d\10");
+
+which is probably not what you were expecting. Unfortunately at least
+one reasonably common and modern C compiler does "real backward
+compatibility" here: on AIX that is what still happens even though the
+rest of the AIX compiler is very happily C89.
+
+=item *
+
+Using printf formats for non-basic C types
+
+ IV i = ...;
+ printf("i = %d\n", i); /* BAD */
+
+While this might happen to work on some platforms (where IV happens
+to be an C), in general it cannot. IV might be something larger.
+The situation is even worse with more specific types (defined by Perl's
+configuration step in F):
+
+ Uid_t who = ...;
+ printf("who = %d\n", who); /* BAD */
+
+The problem here is that Uid_t might be not only not C-wide
+but it might also be unsigned, in which case large uids would be
+printed as negative values.
+
+There is no simple solution to this because of printf()'s limited
+intelligence, but for many types the right format is available with
+either an 'f' or '_f' suffix, for example:
+
+ IVdf /* IV in decimal */
+ UVxf /* UV in hexadecimal */
+
+ printf("i = %"IVdf"\n", i); /* The IVdf is a string constant.
*/
+
+ Uid_t_f /* Uid_t in decimal */
+
+ printf("who = %"Uid_t_f"\n", who);
+
+Or you can try casting to a "wide enough" type:
+
+ printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);
+
+Also remember that the C<%p> format really does require a void pointer:
+
+ U8* p = ...;
+ printf("p = %p\n", (void*)p);
+
+The gcc option C<-Wformat> scans for such problems.
+
+=item *
+
+Blindly using variadic macros
+
+gcc has had them for a while with its own syntax, and C99 brought
+them with a standardized syntax. Don't use the former, and use
+the latter only if the HAS_C99_VARIADIC_MACROS is defined.
+
+=item *
+
+Blindly passing va_list
+
+Not all platforms support passing va_list to further varargs (stdarg)
+functions. The right thing to do is to copy the va_list using the
+Perl_va_copy() if the NEED_VA_COPY is defined.
+
+=item *
+
+Using gcc statement expressions
+
+ val = ({...;...;...}); /* BAD */
+
+While a nice extension, it's not portable. The Perl code does
+admittedly use them if available to gain some extra speed
+(essentially as a funky form of inlining), but you shouldn't.
+
+=item *
+
+Binding together several statements in a macro
+
+Use the macros STMT_START and STMT_END.
+
+ STMT_START {
+ ...
+ } STMT_END
+
+=item *
+
+Testing for operating systems or versions when you should be testing
+for features
+
+ #ifdef __FOONIX__ /* BAD */
+ foo = quux();
+ #endif
+
+Unless you know with 100% certainty that quux() is only ever available
+for the "Foonix" operating system B that is available B
+correctly working for B past, present, B future versions of
+"Foonix", the above is very wrong. This is more correct (though still
+not perfect, because the below is a compile-time check):
+
+ #ifdef HAS_QUUX
+ foo = quux();
+ #endif
+
+How does the HAS_QUUX become defined where it needs to be?
Well, if +Foonix happens to be UNIXy enough to be able to run the Configure +script, and Configure has been taught about detecting and testing +quux(), the HAS_QUUX will be correctly defined. In other platforms, +the corresponding configuration step will hopefully do the same. + +In a pinch, if you cannot wait for Configure to be educated, +or if you have a good hunch of where quux() might be available, +you can temporarily try the following: + + #if (defined(__FOONIX__) || defined(__BARNIX__)) + # define HAS_QUUX + #endif + + ... + + #ifdef HAS_QUUX + foo = quux(); + #endif + +But in any case, try to keep the features and operating systems separate. + +=back + +=head2 Problematic System Interfaces + +=over 4 + +=item * + +malloc(0), realloc(0), calloc(0, 0) are non-portable. To be portable +allocate at least one byte. (In general you should rarely need to +work at this low level, but instead use the various malloc wrappers.) + +=item * + +snprintf() - the return type is unportable. Use my_snprintf() instead. + +=back + +=head2 Security problems + +Last but not least, here are various tips for safer coding. + +=over 4 + +=item * + +Do not use gets() + +Or we will publicly ridicule you. Seriously. + +=item * + +Do not use strcpy() or strcat() or strncpy() or strncat() + +Use my_strlcpy() and my_strlcat() instead: they either use the native +implementation, or Perl's own implementation (borrowed from the public +domain implementation of INN). + +=item * + +Do not use sprintf() or vsprintf() + +If you really want just plain byte strings, use my_snprintf() +and my_vsnprintf() instead, which will try to use snprintf() and +vsnprintf() if those safer APIs are available. If you want something +fancier than a plain byte string, use SVs and Perl_sv_catpvf(). + +=back + =head1 EXTERNAL TOOLS FOR DEBUGGING PERL Sometimes it helps to use external tools while debugging and @@ -1936,6 +2696,10 @@ errors within eval or require, seeing C in the call stack is a good sign of these. 
Fixing these leaks is non-trivial, unfortunately, but they must be fixed eventually. +B: L will not clean up after itself completely +unless Perl is built with the Configure option +C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>. + =head2 Rational Software's Purify Purify is a commercial tool that is helpful in identifying @@ -1986,7 +2750,7 @@ number of bogus leak reports from Purify. Once you've compiled a perl suitable for Purify'ing, then you can just: - make pureperl + make pureperl which creates a binary named 'pureperl' that has been Purify'ed. This binary is used in place of the standard 'perl' binary @@ -1998,7 +2762,7 @@ perl as: make pureperl cd t - ../pureperl -I../lib harness + ../pureperl -I../lib harness which would run Perl on test.pl and report any memory problems. @@ -2037,7 +2801,7 @@ should change to get the most use out of Purify: You should add -DPURIFY to the DEFINES line so the DEFINES line looks something like: - DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1 + DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1 to disable Perl's arena memory allocation functions, as well as to force use of memory allocation functions derived @@ -2069,7 +2833,7 @@ standard Perl testset you would create and run Purify as: cd win32 make cd ../t - purify ../perl -I../lib harness + purify ../perl -I../lib harness which would instrument Perl in memory, run Perl on test.pl, then finally report any memory problems. @@ -2077,10 +2841,14 @@ then finally report any memory problems. =head2 valgrind The excellent valgrind tool can be used to find out both memory leaks -and illegal memory accesses. As of August 2003 it unfortunately works -only on x86 (ELF) Linux. The special "test.valgrind" target can be used -to run the tests under valgrind. Found errors and memory leaks are -logged in files named F. +and illegal memory accesses. As of version 3.3.0, Valgrind only +supports Linux on x86, x86-64 and PowerPC. 
The special "test.valgrind" target can be used to run the tests under
valgrind. Found errors and memory leaks are logged in files named
F<testfile.valgrind>.

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) also trigger errors, valgrind
allows you to suppress such errors using suppression files.

(Also, spawned threads do the equivalent of setting this variable to
the value 1.)

If, at the end of a run you get the message I<N scalars leaked>, you
can recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause the
addresses of all those leaked SVs to be dumped along with details as
to where each SV was originally allocated. This information is also
displayed by Devel::Peek. Note that the extra details recorded with
each SV increase memory usage, so it shouldn't be used in production
environments. It also converts C<new_SV()> from a macro into a real
function, so you can use your favourite debugger to discover where
those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV
allocations in addition to memory allocations. Each SV allocation has
a distinct serial number that will be written on creation and
destruction of the SV.
So if you're executing the leaking code in a loop, you need to look
for SVs that are created, but never destroyed between each cycle. If
such an SV is found, set a conditional breakpoint within C<new_SV> and
make it break only when C<PL_sv_serial> is equal to the serial number
of the leaking SV. Then you will catch the interpreter in exactly the
state where the leaking SV is allocated, which is sufficient in many
cases to find the source of the leak.

As C<-Dm> uses the PerlIO layer for output, it will by itself allocate
quite a few SVs, which are hidden to avoid recursion. You can bypass
the PerlIO layer if you use the SV logging provided by
C<-DPERL_MEM_LOG> instead.

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG>, all Newx() and Renew() allocations
and Safefree() in the Perl core go through logging functions, which is
handy for breakpoint setting. If also compiled with
C<-DPERL_MEM_LOG_STDERR>, the allocations and frees are logged to
STDERR (or more precisely, to file descriptor 2) in these logging
functions, with the calling source code file and line number (and C
function name, if supported by the C compiler).

This logging is somewhat similar to C<-Dm> but independent of
C<-DDEBUGGING>, and at a higher level (the C<-Dm> is directly at the
point of C<malloc()>, while the C<PERL_MEM_LOG> is at the level of
C<Newx()>).

In addition to memory allocations, SV allocations will be logged, just
as with C<-Dm>. However, since the logging doesn't use PerlIO, all SV
allocations are logged and no extra SV allocations are introduced by
enabling the logging. If compiled with C<-DDEBUG_LEAKING_SCALARS>, the
serial number for each SV allocation is also logged.

You can control the logging from your environment if you compile with
C<-DPERL_MEM_LOG_ENV>. Then you need to explicitly set
C<PERL_MEM_LOG> and/or C<PERL_SV_LOG> to a non-zero value to enable
logging of memory and/or SV allocations.

=head2 Profiling
For more detailed explanation of the available commands and output
formats, see your own local documentation of gprof.

quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' -Accflags='-pg' -Aldflags='-pg' && make
    $ ./perl someprog # creates gmon.out in current directory
    $ gprof perl > out
    $ view out

=head2 GCC gcov Profiling

Starting from GCC 3.0 I<basic block profiling> is officially available
for the GNU CC.

See the GCC documentation, and its section titled "8. gcov: a Test
Coverage Program":

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132

quick hint:

    $ sh Configure -des -Doptimize='-g' -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' && make perl.gcov
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl.gcov
    $ gcov regexec.c
    $ view regexec.c.gcov

=head2 Pixie Profiling

Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
UNIX aka DEC OSF/1) platforms.

=item *

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros,
see L<perlclib>.

=item *

Under ithreads the optree is read only. If you want to enforce this,
to check for write accesses from buggy code, compile with
C<-DPL_OP_SLAB_ALLOC> to enable the OP slab allocator and
C<-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op memory
via C<mmap>, and sets it read-only at run time. Any write access to an
op results in a C<SIGBUS> and abort.

This code is intended for development only, and may not be portable
even to all Unix variants. Also, it is an 80% solution, in that it
isn't able to make all ops read only.
Specifically it:

=over

=item 1

Only sets read-only on all slabs of ops at C<CHECK> time, hence ops
allocated later via C<require> or C<eval> will be read-write.

=item 2

Turns an entire slab of ops read-write if the refcount of any op in
the slab needs to be decreased.

=item 3

Turns an entire slab of ops read-write if any op from the slab is
freed.

=back

It's not possible to turn the slabs to read-only after an action
requiring read-write access, as either can happen during op tree
building time, so there may still be legitimate write access.

However, as an 80% solution it is still effective, as currently it
catches a write access during the generation of F<Config.pm>, which
means that we can't yet build F<perl> with this enabled.

=back

=head1 CONCLUSION

We've had a brief look around the Perl source, how to maintain quality
of the source code, an overview of the stages F<perl> goes through
when it's running your code, how to use debuggers to poke at the Perl
guts, and finally how to analyse the execution of Perl. We took a very
simple problem and demonstrated how to solve it fully - with
documentation, regression tests, and finally a patch for submission to
p5p. Finally, we talked about how to use external tools to debug and
test Perl.

I'd now suggest you read over those references again, and then, as
soon as possible, get your hands dirty. The best way to learn is by
doing, so:

=over 3

=back
If you can do these things, you've started on the long road to Perl
porting.

Thanks for wanting to help make Perl better - and happy hacking!

=head2 Metaphoric Quotations

If you recognized the quote about the Road above, you're in luck.

Most software projects begin each file with a literal description of
each file's purpose. Perl instead begins each with a literary allusion
to that file's purpose.

Like chapters in many books, all top-level Perl source files (along
with a few others here and there) begin with an epigrammatic
inscription that alludes, indirectly and metaphorically, to the
material you're about to read.

Quotations are taken from writings of J.R.R. Tolkien pertaining to his
Legendarium, almost always from I<The Lord of the Rings>. Chapters and
page numbers are given using the following editions:

=over 4

=item *

I<The Hobbit>, by J.R.R. Tolkien. The hardcover, 70th-anniversary
edition of 2007 was used, published in the UK by Harper Collins
Publishers and in the US by the Houghton Mifflin Company.

=item *

I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover,
50th-anniversary edition of 2004 was used, published in the UK by
Harper Collins Publishers and in the US by the Houghton Mifflin
Company.

=item *

I<The Lays of Beleriand>, by J.R.R. Tolkien and published posthumously
by his son and literary executor, C.J.R. Tolkien, being the 3rd of the
12 volumes in Christopher's mammoth I<History of Middle-earth>. Page
numbers derive from the hardcover edition, first published in 1983 by
George Allen & Unwin; no page numbers changed for the special 3-volume
omnibus edition of 2002 or the various trade-paper editions, all again
now by Harper Collins or Houghton Mifflin.

=back

Other JRRT books fair game for quotes would thus include I<The
Adventures of Tom Bombadil>, I<The Silmarillion>, I<Unfinished Tales>,
and I<The Tale of the Children of Húrin>, all but the first
posthumously assembled by CJRT. But I<The Lord of the Rings> itself is
perfectly fine and probably best to quote from, provided you can find
a suitable quote there.
So if you were to supply a new, complete, top-level source file to add
to Perl, you should conform to this peculiar practice by yourself
selecting an appropriate quotation from Tolkien, retaining the
original spelling and punctuation and using the same format the rest
of the quotes are in. Indirect and oblique is just fine; remember,
it's a metaphor, so being meta is, after all, what it's for.

=head1 AUTHOR

This document was written by Nathan Torkington, and is maintained by
the perl5-porters mailing list.

=head1 SEE ALSO

L<perlrepository>