X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/e10204135b763e864169cd1f19037fc2f8c37385..54e0f05ce4bb904f953dde352028f27b07cb1fdf:/pod/perlhack.pod diff --git a/pod/perlhack.pod b/pod/perlhack.pod index fb9bdb8..9be3952 100644 --- a/pod/perlhack.pod +++ b/pod/perlhack.pod @@ -1,1974 +1,835 @@ -=head1 NAME - -perlhack - How to hack at the Perl internals +=encoding utf8 -=head1 DESCRIPTION +=for comment +Consistent formatting of this file is achieved with: + perl ./Porting/podtidy pod/perlhack.pod -This document attempts to explain how Perl development takes place, -and ends with some suggestions for people wanting to become bona fide -porters. +=head1 NAME -The perl5-porters mailing list is where the Perl standard distribution -is maintained and developed. The list can get anywhere from 10 to 150 -messages a day, depending on the heatedness of the debate. Most days -there are two or three patches, extensions, features, or bugs being -discussed at a time. +perlhack - How to hack on Perl -A searchable archive of the list is at either: +=head1 DESCRIPTION - http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ +This document explains how Perl development works. It includes details +about the Perl 5 Porters email list, the Perl repository, the Perlbug +bug tracker, patch guidelines, and commentary on Perl development +philosophy. -or +=head1 SUPER QUICK PATCH GUIDE - http://archive.develooper.com/perl5-porters@perl.org/ - -List subscribers (the porters themselves) come in several flavours. -Some are quiet curious lurkers, who rarely pitch in and instead watch -the ongoing development to ensure they're forewarned of new changes or -features in Perl. Some are representatives of vendors, who are there -to make sure that Perl continues to compile and work on their -platforms. Some patch any reported bug that they know how to fix, -some are actively patching their pet area (threads, Win32, the regexp -engine), while others seem to do nothing but complain. 
In other -words, it's your usual mix of technical people. - -Over this group of porters presides Larry Wall. He has the final word -in what does and does not change in the Perl language. Various -releases of Perl are shepherded by a "pumpking", a porter -responsible for gathering patches, deciding on a patch-by-patch, -feature-by-feature basis what will and will not go into the release. -For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of -Perl, and Jarkko Hietaniemi was the pumpking for the 5.8 release, and -Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release. - -In addition, various people are pumpkings for different things. For -instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the -I pumpkin up till the 5.8 release. For the 5.10 release -H.Merijn Brand took over. - -Larry sees Perl development along the lines of the US government: -there's the Legislature (the porters), the Executive branch (the -pumpkings), and the Supreme Court (Larry). The legislature can -discuss and submit patches to the executive branch all they like, but -the executive branch is free to veto them. Rarely, the Supreme Court -will side with the executive branch over the legislature, or the -legislature over the executive branch. Mostly, however, the -legislature and the executive branch are supposed to get along and -work out their differences without impeachment or court cases. - -You might sometimes see reference to Rule 1 and Rule 2. Larry's power -as Supreme Court is expressed in The Rules: +If you just want to submit a single small patch like a pod fix, a test +for a bug, comment fixes, etc., it's easy! Here's how: =over 4 -=item 1 +=item * Check out the source repository -Larry is always by definition right about how Perl should behave. -This means he has final veto power on the core functionality. +The perl source is in a git repository. 
You can clone the repository +with the following command: -=item 2 + % git clone git://perl5.git.perl.org/perl.git perl -Larry is allowed to change his mind about any matter at a later date, -regardless of whether he previously invoked Rule 1. - -=back +=item * Make your change -Got that? Larry is always right, even when he was wrong. It's rare -to see either Rule exercised, but they are often alluded to. +Hack, hack, hack. -New features and extensions to the language are contentious, because -the criteria used by the pumpkings, Larry, and other porters to decide -which features should be implemented and incorporated are not codified -in a few small design goals as with some other languages. Instead, -the heuristics are flexible and often difficult to fathom. Here is -one person's list, roughly in decreasing order of importance, of -heuristics that new features have to be weighed against: +=item * Test your change -=over 4 +You can run all the tests with the following commands: -=item Does concept match the general goals of Perl? + % ./Configure -des -Dusedevel + % make test -These haven't been written anywhere in stone, but one approximation -is: +Keep hacking until the tests pass. - 1. Keep it fast, simple, and useful. - 2. Keep features/concepts as orthogonal as possible. - 3. No arbitrary limits (platforms, data sizes, cultures). - 4. Keep it open and exciting to use/patch/advocate Perl everywhere. - 5. Either assimilate new technologies, or build bridges to them. +=item * Commit your change -=item Where is the implementation? +Committing your work will save the change I: -All the talk in the world is useless without an implementation. In -almost every case, the person or people who argue for a new feature -will be expected to be the ones who implement it. Porters capable -of coding new features have their own agendas, and are not available -to implement your (possibly good) idea.
+ % git commit -a -m 'Commit message goes here' -=item Backwards compatibility +Make sure the commit message describes your change in a single +sentence. For example, "Fixed spelling errors in perlhack.pod". -It's a cardinal sin to break existing Perl programs. New warnings are -contentious--some say that a program that emits warnings is not -broken, while others say it is. Adding keywords has the potential to -break programs, changing the meaning of existing token sequences or -functions might break programs. +=item * Send your change to perlbug -=item Could it be a module instead? +The next step is to submit your patch to the Perl core ticket system +via email. -Perl 5 has extension mechanisms, modules and XS, specifically to avoid -the need to keep changing the Perl interpreter. You can write modules -that export functions, you can give those functions prototypes so they -can be called like built-in functions, you can even write XS code to -mess with the runtime data structures of the Perl interpreter if you -want to implement really complicated things. If it can be done in a -module instead of in the core, it's highly unlikely to be added. +Assuming your patch consists of a single git commit, you can send it to +perlbug with this command line: -=item Is the feature generic enough? + % git format-patch HEAD^1..HEAD + % perlbug -s '[PATCH] `git log --pretty=format:%s HEAD^1..HEAD`' -f 0001-*.patch -Is this something that only the submitter wants added to the language, -or would it be broadly useful? Sometimes, instead of adding a feature -with a tight focus, the porters might decide to wait until someone -implements the more generalized feature. For instance, instead of -implementing a "delayed evaluation" feature, the porters are waiting -for a macro system that would permit delayed evaluation and much more. +The perlbug program will ask you a few questions about your email +address and the patch you're submitting. 
Once you've answered them you +can submit your patch. -=item Does it potentially introduce new bugs? +=item * Thank you -Radical rewrites of large chunks of the Perl interpreter have the -potential to introduce new bugs. The smaller and more localized the -change, the better. +The porters appreciate the time you spent helping to make Perl better. +Thank you! -=item Does it preclude other desirable features? +=back -A patch is likely to be rejected if it closes off future avenues of -development. For instance, a patch that placed a true and final -interpretation on prototypes is likely to be rejected because there -are still options for the future of prototypes that haven't been -addressed. +=head1 BUG REPORTING -=item Is the implementation robust? +If you want to report a bug in Perl you must use the F command +line tool. This tool will ensure that your bug report includes all the +relevant system and configuration information. -Good patches (tight code, complete, correct) stand more chance of -going in. Sloppy or incorrect patches might be placed on the back -burner until the pumpking has time to fix, or might be discarded -altogether without further notice. +To browse existing Perl bugs and patches, you can use the web interface +at L. -=item Is the implementation generic enough to be portable? +Please check the archive of the perl5-porters list (see below) and/or +the bug tracking system before submitting a bug report. Often, you'll +find that the bug has been reported already. -The worst patches make use of a system-specific features. It's highly -unlikely that non-portable additions to the Perl language will be -accepted. +You can log in to the bug tracking system and comment on existing bug +reports. If you have additional information regarding an existing bug, +please add it. This will help the porters fix the bug. -=item Is the implementation tested? 
+=head1 PERL 5 PORTERS -Patches which change behaviour (fixing bugs or introducing new features) -must include regression tests to verify that everything works as expected. -Without tests provided by the original author, how can anyone else changing -perl in the future be sure that they haven't unwittingly broken the behaviour -the patch implements? And without tests, how can the patch's author be -confident that his/her hard work put into the patch won't be accidentally -thrown away by someone in the future? +The perl5-porters (p5p) mailing list is where the Perl standard +distribution is maintained and developed. The people who maintain Perl +are also referred to as the "Perl 5 Porters", or just the "porters". -=item Is there enough documentation? +A searchable archive of the list is available at +L. There is +also another archive at +L. -Patches without documentation are probably ill-thought out or -incomplete. Nothing can be added without documentation, so submitting -a patch for the appropriate manpages as well as the source code is -always a good idea. +=head2 perl5-changes mailing list -=item Is there another way to do it? +The perl5-changes mailing list receives a copy of each patch that gets +submitted to the maintenance and development branches of the perl +repository. See L for +subscription and archive information. -Larry said "Although the Perl Slogan is I, I hesitate to make 10 ways to do something". This is a -tricky heuristic to navigate, though--one man's essential addition is -another man's pointless cruft. +=head1 GETTING THE PERL SOURCE -=item Does it create too much work? +All of Perl's source code is kept centrally in a Git repository at +I. The repository contains many Perl revisions from +Perl 1 onwards and all the revisions from Perforce, the previous +version control system. -Work for the pumpking, work for Perl programmers, work for module -authors, ... Perl is supposed to be easy.
+For much more detail on using git with the Perl repository, please see +L. -=item Patches speak louder than words +=head2 Read access via Git -Working code is always preferred to pie-in-the-sky ideas. A patch to -add a feature stands a much higher chance of making it to the language -than does a random feature request, no matter how fervently argued the -request might be. This ties into "Will it be useful?", as the fact -that someone took the time to make the patch demonstrates a strong -desire for the feature. +You will need a copy of Git for your computer. You can fetch a copy of +the repository using the git protocol: -=back + % git clone git://perl5.git.perl.org/perl.git perl -If you're on the list, you might hear the word "core" bandied -around. It refers to the standard distribution. "Hacking on the -core" means you're changing the C source code to the Perl -interpreter. "A core module" is one that ships with Perl. +This clones the repository and makes a local copy in the F +directory. -=head2 Keeping in sync +If you cannot use the git protocol for firewall reasons, you can also +clone via http, though this is much slower: -The source code to the Perl interpreter, in its different versions, is -kept in a repository managed by the git revision control system. The -pumpkings and a few others have write access to the repository to check in -changes. + % git clone http://perl5.git.perl.org/perl.git perl -How to clone and use the git perl repository is described in L. +=head2 Read access via the web -You can also choose to use rsync to get a copy of the current source tree -for the bleadperl branch and all maintenance branches : +You may access the repository over the web. This allows you to browse +the tree, see recent commits, subscribe to RSS feeds for the changes, +search for particular commits and more. You may access it at +L. A mirror of the repository is +found at L. - $ rsync -avz rsync://perl5.git.perl.org/APC/perl-current .
- $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.10.x . - $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.8.x . - $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.6.x . - $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.005xx . +=head2 Read access via rsync -(Add the C<--delete> option to remove leftover files) +You can also choose to use rsync to get a copy of the current source +tree for the bleadperl branch and all maintenance branches: -You may also want to subscribe to the perl5-changes mailing list to -receive a copy of each patch that gets submitted to the maintenance -and development "branches" of the perl repository. See -http://lists.perl.org/ for subscription information. + $ rsync -avz rsync://perl5.git.perl.org/perl-current . + $ rsync -avz rsync://perl5.git.perl.org/perl-5.12.x . + $ rsync -avz rsync://perl5.git.perl.org/perl-5.10.x . + $ rsync -avz rsync://perl5.git.perl.org/perl-5.8.x . + $ rsync -avz rsync://perl5.git.perl.org/perl-5.6.x . + $ rsync -avz rsync://perl5.git.perl.org/perl-5.005xx . -If you are a member of the perl5-porters mailing list, it is a good -thing to keep in touch with the most recent changes. If not only to -verify if what you would have posted as a bug report isn't already -solved in the most recent available perl development branch, also -known as perl-current, bleading edge perl, bleedperl or bleadperl. +(Add the C<--delete> option to remove leftover files) -Needless to say, the source code in perl-current is usually in a perpetual -state of evolution. You should expect it to be very buggy. Do B use -it for any purpose other than testing and development. +To get a full list of the available sync points: -=head2 Perlbug administration + $ rsync perl5.git.perl.org:: -There is a single remote administrative interface for modifying bug status, -category, open issues etc. using the B bugtracker system, maintained -by Robert Spier. 
Become an administrator, and close any bugs you can get -your sticky mitts on: +=head2 Write access via git - http://bugs.perl.org/ +If you have a commit bit, please see L for more details on +using git. -To email the bug system administrators: +=head1 PATCHING PERL - "perlbug-admin" +If you're planning to do more extensive work than a single small fix, +we encourage you to read the documentation below. This will help you +focus your work and make your patches easier to incorporate into the +Perl source. =head2 Submitting patches -Always submit patches to I. If you're -patching a core module and there's an author listed, send the author a -copy (see L). This lets other porters review -your patch, which catches a surprising number of errors in patches. -Please patch against the latest B version. (e.g., even if -you're fixing a bug in the 5.8 track, patch against the C branch in -the git repository.) - -If changes are accepted, they are applied to the development branch. Then -the maintenance pumpking decides which of those patches is to be -backported to the maint branch. Only patches that survive the heat of the -development branch get applied to maintenance versions. - -Your patch should update the documentation and test suite. See -L. If you have added or removed files in the distribution, -edit the MANIFEST file accordingly, sort the MANIFEST file using -C, and include those changes as part of your patch. - -Patching documentation also follows the same order: if accepted, a patch -is first applied to B, and if relevant then it's backported -to B. (With an exception for some patches that document -behaviour that only appears in the maintenance branch, but which has -changed in the development version.) - -To report a bug in Perl, use the program I which comes with -Perl (if you can't get Perl to work, send mail to the address -I or I). 
Reporting bugs through -I feeds into the automated bug-tracking system, access to -which is provided through the web at http://rt.perl.org/rt3/ . It -often pays to check the archives of the perl5-porters mailing list to -see whether the bug you're reporting has been reported before, and if -so whether it was considered a bug. See above for the location of -the searchable archives. - -The CPAN testers ( http://testers.cpan.org/ ) are a group of -volunteers who test CPAN modules on a variety of platforms. Perl -Smokers ( http://www.nntp.perl.org/group/perl.daily-build and -http://www.nntp.perl.org/group/perl.daily-build.reports/ ) -automatically test Perl source releases on platforms with various -configurations. Both efforts welcome volunteers. In order to get -involved in smoke testing of the perl itself visit -L. In order to start smoke -testing CPAN modules visit L -or L or -L. - -It's a good idea to read and lurk for a while before chipping in. -That way you'll get to see the dynamic of the conversations, learn the -personalities of the players, and hopefully be better prepared to make -a useful contribution when do you speak up. - -If after all this you still think you want to join the perl5-porters -mailing list, send mail to I. To -unsubscribe, send mail to I. - -To hack on the Perl guts, you'll need to read the following things: - -=over 3 - -=item L - -This is of paramount importance, since it's the documentation of what -goes where in the Perl source. Read it over a couple of times and it -might start to make sense - don't worry if it doesn't yet, because the -best way to study it is to read it in conjunction with poking at Perl -source, and we'll do that later on. - -Gisle Aas's "illustrated perlguts", also known as I, has very -helpful pictures: - -L +If you have a small patch to submit, please submit it via perlbug. You +can also send email directly to perlbug@perl.org. 
Please note that +messages sent to perlbug may be held in a moderation queue, so you +won't receive a response immediately. -=item L and L +You'll know your submission has been processed when you receive an +email from our ticket tracking system. This email will give you a +ticket number. Once your patch has made it to the ticket tracking +system, it will also be sent to the perl5-porters@perl.org list. -A working knowledge of XSUB programming is incredibly useful for core -hacking; XSUBs use techniques drawn from the PP code, the portion of the -guts that actually executes a Perl program. It's a lot gentler to learn -those techniques from simple examples and explanation than from the core -itself. +Patches are reviewed and discussed on the p5p list. Simple, +uncontroversial patches will usually be applied without any discussion. +When the patch is applied, the ticket will be updated and you will +receive email. In addition, an email will be sent to the p5p list. -=item L +In other cases, the patch will need more work or discussion. That will +happen on the p5p list. -The documentation for the Perl API explains what some of the internal -functions do, as well as the many macros used in the source. +You are encouraged to participate in the discussion and advocate for +your patch. Sometimes your patch may get lost in the shuffle. It's +appropriate to send a reminder email to p5p if no action has been taken +in a month. Please remember that the Perl 5 developers are all +volunteers, and be polite. -=item F +Changes are always applied directly to the main development branch, +called "blead". Some patches may be backported to a maintenance branch. +If you think your patch is appropriate for the maintenance branch, +please explain why when you submit it. -This is a collection of words of wisdom for a Perl porter; some of it is -only useful to the pumpkin holder, but most of it applies to anyone -wanting to go about Perl development. 
+=head2 Getting your patch accepted -=item The perl5-porters FAQ +If you are submitting a code patch, there are several things that you +can do to help the Perl 5 Porters accept your patch. -This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html . -It contains hints on reading perl5-porters, information on how -perl5-porters works and how Perl development in general works. +=head3 Patch style -=back +If you used git to check out the Perl source, then using C will produce a patch in a style suitable for Perl. The +C command produces one patch file for each commit you +made. If you prefer to send a single patch for all commits, you can use +C. -=head2 Finding Your Way Around + % git checkout blead + % git pull + % git diff blead my-branch-name -Perl maintenance can be split into a number of areas, and certain people -(pumpkins) will have responsibility for each area. These areas sometimes -correspond to files or directories in the source kit. Among the areas are: +This produces a patch based on the difference between blead and your +current branch. It's important to make sure that blead is up to date +before producing the diff; that's why we call C first. -=over 3 +We strongly recommend that you use git if possible. It will make your +life easier, and ours as well. -=item Core modules +However, if you're not using git, you can still produce a suitable +patch. You'll need a pristine copy of the Perl source to diff against. +The porters prefer unified diffs. Using GNU C, you can produce a +diff like this: -Modules shipped as part of the Perl core live in various subdirectories, where -two are dedicated to core-only modules, and two are for the dual-life modules -which live on CPAN and may be maintained separately with respect to the Perl -core: + % diff -Npurd perl.pristine perl.mine - lib/ is for pure-Perl modules, which exist in the core only. +Make sure that you C in your copy of Perl to remove any +build artifacts, or you may get a confusing result.
- ext/ is for XS extensions, and modules with special Makefile.PL requirements, which exist in the core only. +=head3 Commit message - cpan/ is for dual-life modules, where the CPAN module is canonical (should be patched first). +As you craft each patch you intend to submit to the Perl core, it's +important to write a good commit message. This is especially important +if your submission will consist of a series of commits. - dist/ is for dual-life modules, where the blead source is canonical. +The first line of the commit message should be a short description +without a period. It should be no longer than the subject line of an +email, 50 characters being a good rule of thumb. -=item Tests +A lot of Git tools (Gitweb, GitHub, git log --pretty=oneline, ...) will +only display the first line (cut off at 50 characters) when presenting +commit summaries. -There are tests for nearly all the modules, built-ins and major bits -of functionality. Test files all have a .t suffix. Module tests live -in the F and F directories next to the module being -tested. Others live in F. See L +The commit message should include a description of the problem that the +patch corrects or new functionality that the patch adds. -=item Documentation +As a general rule of thumb, your commit message should help a +programmer who knows the Perl core quickly understand what you were +trying to do, how you were trying to do it, and why the change matters +to Perl. -Documentation maintenance includes looking after everything in the -F directory, (as well as contributing new documentation) and -the documentation to the modules in core. +=over 4 -=item Configure +=item * Why -The Configure process is the way we make Perl portable across the -myriad of operating systems it supports. Responsibility for the -Configure, build and installation process, as well as the overall -portability of the core code rests with the Configure pumpkin - -others help out with individual operating systems.
+Your commit message should describe why the change you are making is +important. When someone looks at your change in six months or six +years, your intent should be clear. -The three files that fall under his/her responsibility are Configure, -config_h.SH, and Porting/Glossary (and a whole bunch of small related -files that are less important here). The Configure pumpkin decides how -patches to these are dealt with. Currently, the Configure pumpkin will -accept patches in most common formats, even directly to these files. -Other committers are allowed to commit to these files under the strict -condition that they will inform the Configure pumpkin, either on IRC -(if he/she happens to be around) or through (personal) e-mail. +If you're deprecating a feature with the intent of later simplifying +another bit of code, say so. If you're fixing a performance problem or +adding a new feature to support some other bit of the core, mention +that. -The files involved are the operating system directories, (F, -F, F and so on) the shell scripts which generate F -and F, as well as the metaconfig files which generate -F. (metaconfig isn't included in the core distribution.) +=item * What -See http://perl5.git.perl.org/metaconfig.git/blob/HEAD:/README for a -description of the full process involved. +Your commit message should describe what part of the Perl core you're +changing and what you expect your patch to do. -=item Interpreter +=item * How -And of course, there's the core of the Perl interpreter itself. Let's -have a look at that in a little more detail. +While it's not necessary for documentation changes, new tests or +trivial patches, it's often worth explaining how your change works. +Even if it's clear to you today, it may not be clear to a porter next +month or next year. =back -Before we leave looking at the layout, though, don't forget that -F contains not only the file names in the Perl distribution, -but short descriptions of what's in them, too. 
For an overview of the -important files, try this: +A commit message isn't intended to take the place of comments in your +code. Commit messages should describe the change you made, while code +comments should describe the current state of the code. - perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST +If you've just implemented a new feature, complete with doc, tests and +well-commented code, a brief commit message will often suffice. If, +however, you've just changed a single character deep in the parser or +lexer, you might need to write a small novel to ensure that future +readers understand what you did and why you did it. -=head2 Elements of the interpreter +=head3 Comments, Comments, Comments -The work of the interpreter has two main stages: compiling the code -into the internal representation, or bytecode, and then executing it. -L explains exactly how the compilation stage -happens. +Be sure to adequately comment your code. While commenting every line is +unnecessary, anything that takes advantage of side effects of +operators, that creates changes that will be felt outside of the +function being patched, or that others may find confusing should be +documented. If you are going to err, it is better to err on the side of +adding too many comments than too few. -Here is a short breakdown of perl's operation: - -=over 3 +The best comments explain I the code does what it does, not I. -=item Startup +=head3 Style -The action begins in F. (or F for miniperl) -This is very high-level code, enough to fit on a single screen, and it -resembles the code found in L; most of the real action takes -place in F +In general, please follow the particular style of the code you are +patching. -F is generated by L from F at -make time, so you should make perl to follow this along. 
+In particular, follow these general guidelines for patching Perl +sources: -First, F allocates some memory and constructs a Perl -interpreter, along these lines: - - 1 PERL_SYS_INIT3(&argc,&argv,&env); - 2 - 3 if (!PL_do_undump) { - 4 my_perl = perl_alloc(); - 5 if (!my_perl) - 6 exit(1); - 7 perl_construct(my_perl); - 8 PL_perl_destruct_level = 0; - 9 } - -Line 1 is a macro, and its definition is dependent on your operating -system. Line 3 references C, a global variable - all -global variables in Perl start with C. This tells you whether the -current running program was created with the C<-u> flag to perl and then -F, which means it's going to be false in any sane context. - -Line 4 calls a function in F to allocate memory for a Perl -interpreter. It's quite a simple function, and the guts of it looks like -this: - - my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter)); - -Here you see an example of Perl's system abstraction, which we'll see -later: C is either your system's C, or Perl's -own C as defined in F if you selected that option at -configure time. - -Next, in line 7, we construct the interpreter using perl_construct, -also in F; this sets up all the special variables that Perl -needs, the stacks, and so on. - -Now we pass Perl the command line options, and tell it to go: - - exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); - if (!exitstatus) - perl_run(my_perl); - - exitstatus = perl_destruct(my_perl); - - perl_free(my_perl); - -C is actually a wrapper around C, as defined -in F, which processes the command line options, sets up any -statically linked XS modules, opens the program and calls C to -parse it. - -=item Parsing - -The aim of this stage is to take the Perl source, and turn it into an op -tree. We'll see what one of those looks like later. Strictly speaking, -there's three things going on here. - -C, the parser, lives in F, although you're better off -reading the original YACC input in F. 
(Yes, Virginia, there -B a YACC grammar for Perl!) The job of the parser is to take your -code and "understand" it, splitting it into sentences, deciding which -operands go with which operators and so on. - -The parser is nobly assisted by the lexer, which chunks up your input -into tokens, and decides what type of thing each token is: a variable -name, an operator, a bareword, a subroutine, a core function, and so on. -The main point of entry to the lexer is C, and that and its -associated routines can be found in F. Perl isn't much like -other computer languages; it's highly context sensitive at times, it can -be tricky to work out what sort of token something is, or where a token -ends. As such, there's a lot of interplay between the tokeniser and the -parser, which can get pretty frightening if you're not used to it. - -As the parser understands a Perl program, it builds up a tree of -operations for the interpreter to perform during execution. The routines -which construct and link together the various operations are to be found -in F, and will be examined later. - -=item Optimization - -Now the parsing stage is complete, and the finished tree represents -the operations that the Perl interpreter needs to perform to execute our -program. Next, Perl does a dry run over the tree looking for -optimisations: constant expressions such as C<3 + 4> will be computed -now, and the optimizer will also see if any multiple operations can be -replaced with a single one. For instance, to fetch the variable C<$foo>, -instead of grabbing the glob C<*foo> and looking at the scalar -component, the optimizer fiddles the op tree to use a function which -directly looks up the scalar in question. The main optimizer is C -in F, and many ops have their own optimizing functions. - -=item Running - -Now we're finally ready to go: we have compiled Perl byte code, and all -that's left to do is run it. 
The actual execution is done by the -C function in F; more specifically, it's done by -these three innocent looking lines: - - while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) { - PERL_ASYNC_CHECK(); - } - -You may be more comfortable with the Perl version of that: - - PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}}; - -Well, maybe not. Anyway, each op contains a function pointer, which -stipulates the function which will actually carry out the operation. -This function will return the next op in the sequence - this allows for -things like C which choose the next op dynamically at run time. -The C makes sure that things like signals interrupt -execution if required. - -The actual functions called are known as PP code, and they're spread -between four files: F contains the "hot" code, which is most -often used and highly optimized, F contains all the -system-specific functions, F contains the functions which -implement control structures (C, C and the like) and F -contains everything else. These are, if you like, the C code for Perl's -built-in functions and operators. - -Note that each C function is expected to return a pointer to the next -op. Calls to perl subs (and eval blocks) are handled within the same -runops loop, and do not consume extra space on the C stack. For example, -C and C just push a C or C block -struct onto the context stack which contain the address of the op -following the sub call or eval. They then return the first op of that sub -or eval block, and so execution continues of that sub or block. Later, a -C or C op pops the C or C, -retrieves the return op from it, and returns it. - -=item Exception handing - -Perl's exception handing (i.e. C etc.) is built on top of the low-level -C/C C-library functions. These basically provide a -way to capture the current PC and SP registers and later restore them; i.e. -a C continues at the point in code where a previous C -was done, with anything further up on the C stack being lost. 
This is why -code should always save values using C rather than in auto -variables. - -The perl core wraps C etc in the macros C and -C. The basic rule of perl exceptions is that C, and -C (in the absence of C) perform a C, while -C within C does a C. - -At entry points to perl, such as C, C and -C each does a C, then enter a runops -loop or whatever, and handle possible exception returns. For a 2 return, -final cleanup is performed, such as popping stacks and calling C or -C blocks. Amongst other things, this is how scope cleanup still -occurs during an C. - -If a C can find a C block on the context stack, then the -stack is popped to that level and the return op in that block is assigned -to C; then a C is performed. This normally -passes control back to the guard. In the case of C and -C, a non-null C triggers re-entry to the runops -loop. The is the normal way that C or C is handled within an -C. - -Sometimes ops are executed within an inner runops loop, such as tie, sort -or overload code. In this case, something like - - sub FETCH { eval { die } } - -would cause a longjmp right back to the guard in C, popping both -runops loops, which is clearly incorrect. One way to avoid this is for the -tie code to do a C before executing C in the inner -runops loop, but for efficiency reasons, perl in fact just sets a flag, -using C. The C, C and -C ops check this flag, and if true, they call C, -which does a C and starts a new runops level to execute the -code, rather than doing it on the current loop. - -As a further optimisation, on exit from the eval block in the C, -execution of the code following the block is still carried on in the inner -loop. When an exception is raised, C compares the C -level of the C with C and if they differ, just -re-throws the exception. In this way any inner loops get popped. - -Here's an example. 
- - 1: eval { tie @a, 'A' }; - 2: sub A::TIEARRAY { - 3: eval { die }; - 4: die; - 5: } - -To run this code, C is called, which does a C then -enters a runops loop. This loop executes the eval and tie ops on line 1, -with the eval pushing a C onto the context stack. - -The C does a C, then starts a second runops loop -to execute the body of C. When it executes the entertry op on -line 3, C is true, so C calls C which -does a C and starts a third runops loop, which then executes -the die op. At this point the C call stack looks like this: - - Perl_pp_die - Perl_runops # third loop - S_docatch_body - S_docatch - Perl_pp_entertry - Perl_runops # second loop - S_call_body - Perl_call_sv - Perl_pp_tie - Perl_runops # first loop - S_run_body - perl_run - main - -and the context and data stacks, as shown by C<-Dstv>, look like: - - STACK 0: MAIN - CX 0: BLOCK => - CX 1: EVAL => AV() PV("A"\0) - retop=leave - STACK 1: MAGIC - CX 0: SUB => - retop=(null) - CX 1: EVAL => * - retop=nextstate - -The die pops the first C off the context stack, sets -C from it, does a C, and control returns to -the top C. This then starts another third-level runops level, -which executes the nextstate, pushmark and die ops on line 4. At the point -that the second C is called, the C call stack looks exactly like -that above, even though we are no longer within an inner eval; this is -because of the optimization mentioned earlier. 
However, the context stack -now looks like this, ie with the top CxEVAL popped: - - STACK 0: MAIN - CX 0: BLOCK => - CX 1: EVAL => AV() PV("A"\0) - retop=leave - STACK 1: MAGIC - CX 0: SUB => - retop=(null) - -The die on line 4 pops the context stack back down to the CxEVAL, leaving -it as: - - STACK 0: MAIN - CX 0: BLOCK => - -As usual, C is extracted from the C, and a -C done, which pops the C stack back to the docatch: - - S_docatch - Perl_pp_entertry - Perl_runops # second loop - S_call_body - Perl_call_sv - Perl_pp_tie - Perl_runops # first loop - S_run_body - perl_run - main - -In this case, because the C level recorded in the C -differs from the current one, C just does a C -and the C stack unwinds to: - - perl_run - main - -Because C is non-null, C starts a new runops loop -and execution continues. - -=back - -=head2 Internal Variable Types - -You should by now have had a look at L, which tells you about -Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do -that now. - -These variables are used not only to represent Perl-space variables, but -also any constants in the code, as well as some structures completely -internal to Perl. The symbol table, for instance, is an ordinary Perl -hash. Your code is represented by an SV as it's read into the parser; -any program files you call are opened via ordinary Perl filehandles, and -so on. - -The core L module lets us examine SVs from a -Perl program. Let's see, for instance, how Perl treats the constant -C<"hello">. - - % perl -MDevel::Peek -e 'Dump("hello")' - 1 SV = PV(0xa041450) at 0xa04ecbc - 2 REFCNT = 1 - 3 FLAGS = (POK,READONLY,pPOK) - 4 PV = 0xa0484e0 "hello"\0 - 5 CUR = 5 - 6 LEN = 6 - -Reading C output takes a bit of practise, so let's go -through it line by line. - -Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in -memory. SVs themselves are very simple structures, but they contain a -pointer to a more complex structure. 
In this case, it's a PV, a -structure which holds a string value, at location C<0xa041450>. Line 2 -is the reference count; there are no other references to this data, so -it's 1. - -Line 3 are the flags for this SV - it's OK to use it as a PV, it's a -read-only SV (because it's a constant) and the data is a PV internally. -Next we've got the contents of the string, starting at location -C<0xa0484e0>. - -Line 5 gives us the current length of the string - note that this does -B include the null terminator. Line 6 is not the length of the -string, but the length of the currently allocated buffer; as the string -grows, Perl automatically extends the available storage via a routine -called C. - -You can get at any of these quantities from C very easily; just add -C to the name of the field shown in the snippet, and you've got a -macro which will return the value: C returns the current -length of the string, C returns the reference count, -C returns the string itself with its length, and so on. -More macros to manipulate these properties can be found in L. - -Let's take an example of manipulating a PV, from C, in F - - 1 void - 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) - 3 { - 4 STRLEN tlen; - 5 char *junk; - - 6 junk = SvPV_force(sv, tlen); - 7 SvGROW(sv, tlen + len + 1); - 8 if (ptr == junk) - 9 ptr = SvPVX(sv); - 10 Move(ptr,SvPVX(sv)+tlen,len,char); - 11 SvCUR(sv) += len; - 12 *SvEND(sv) = '\0'; - 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */ - 14 SvTAINT(sv); - 15 } - -This is a function which adds a string, C, of length C onto -the end of the PV stored in C. The first thing we do in line 6 is -make sure that the SV B a valid PV, by calling the C -macro to force a PV. As a side effect, C gets set to the current -value of the PV, and the PV itself is returned to C. - -In line 7, we make sure that the SV will have enough room to accommodate -the old string, the new string and the null terminator. 
If C isn't -big enough, C will reallocate space for us. - -Now, if C is the same as the string we're trying to add, we can -grab the string directly from the SV; C is the address of the PV -in the SV. - -Line 10 does the actual catenation: the C macro moves a chunk of -memory around: we move the string C to the end of the PV - that's -the start of the PV plus its current length. We're moving C bytes -of type C. After doing so, we need to tell Perl we've extended the -string, by altering C to reflect the new length. C is a -macro which gives us the end of the string, so that needs to be a -C<"\0">. - -Line 13 manipulates the flags; since we've changed the PV, any IV or NV -values will no longer be valid: if we have C<$a=10; $a.="6";> we don't -want to use the old IV of 10. C is a special UTF-8-aware -version of C, a macro which turns off the IOK and NOK flags -and turns on POK. The final C is a macro which launders tainted -data if taint mode is turned on. - -AVs and HVs are more complicated, but SVs are by far the most common -variable type being thrown around. Having seen something of how we -manipulate these, let's go on and look at how the op tree is -constructed. - -=head2 Op Trees - -First, what is the op tree, anyway? The op tree is the parsed -representation of your program, as we saw in our section on parsing, and -it's the sequence of operations that Perl goes through to execute your -program, as we saw in L. - -An op is a fundamental operation that Perl can perform: all the built-in -functions and operators are ops, and there are a series of ops which -deal with concepts the interpreter needs internally - entering and -leaving a block, ending a statement, fetching a variable, and so on. - -The op tree is connected in two ways: you can imagine that there are two -"routes" through it, two orders in which you can traverse the tree. 
-First, parse order reflects how the parser understood the code, and -secondly, execution order tells perl what order to perform the -operations in. - -The easiest way to examine the op tree is to stop Perl after it has -finished parsing, and get it to dump out the tree. This is exactly what -the compiler backends L, L -and L do. - -Let's have a look at how Perl sees C<$a = $b + $c>: - - % perl -MO=Terse -e '$a=$b+$c' - 1 LISTOP (0x8179888) leave - 2 OP (0x81798b0) enter - 3 COP (0x8179850) nextstate - 4 BINOP (0x8179828) sassign - 5 BINOP (0x8179800) add [1] - 6 UNOP (0x81796e0) null [15] - 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b - 8 UNOP (0x81797e0) null [15] - 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c - 10 UNOP (0x816b4f0) null [15] - 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a - -Let's start in the middle, at line 4. This is a BINOP, a binary -operator, which is at location C<0x8179828>. The specific operator in -question is C - scalar assignment - and you can find the code -which implements it in the function C in F. As a -binary operator, it has two children: the add operator, providing the -result of C<$b+$c>, is uppermost on line 5, and the left hand side is on -line 10. - -Line 10 is the null op: this does exactly nothing. What is that doing -there? If you see the null op, it's a sign that something has been -optimized away after parsing. As we mentioned in L, -the optimization stage sometimes converts two operations into one, for -example when fetching a scalar variable. When this happens, instead of -rewriting the op tree and cleaning up the dangling pointers, it's easier -just to replace the redundant operation with the null op. Originally, -the tree would have looked like this: - - 10 SVOP (0x816b4f0) rv2sv [15] - 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a - -That is, fetch the C entry from the main symbol table, and then look -at the scalar component of it: C (C into F) -happens to do both these things. 
- -The right hand side, starting at line 5 is similar to what we've just -seen: we have the C op (C also in F) add together -two Cs. - -Now, what's this about? - - 1 LISTOP (0x8179888) leave - 2 OP (0x81798b0) enter - 3 COP (0x8179850) nextstate - -C and C are scoping ops, and their job is to perform any -housekeeping every time you enter and leave a block: lexical variables -are tidied up, unreferenced variables are destroyed, and so on. Every -program will have those first three lines: C is a list, and its -children are all the statements in the block. Statements are delimited -by C, so a block is a collection of C ops, with -the ops to be performed for each statement being the children of -C. C is a single op which functions as a marker. - -That's how Perl parsed the program, from top to bottom: - - Program - | - Statement - | - = - / \ - / \ - $a + - / \ - $b $c - -However, it's impossible to B the operations in this order: -you have to find the values of C<$b> and C<$c> before you add them -together, for instance. So, the other thread that runs through the op -tree is the execution order: each op has a field C which points -to the next op to be run, so following these pointers tells us how perl -executes the code. We can traverse the tree in this order using -the C option to C: - - % perl -MO=Terse,exec -e '$a=$b+$c' - 1 OP (0x8179928) enter - 2 COP (0x81798c8) nextstate - 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b - 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c - 5 BINOP (0x8179878) add [1] - 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a - 7 BINOP (0x81798a0) sassign - 8 LISTOP (0x8179900) leave - -This probably makes more sense for a human: enter a block, start a -statement. Get the values of C<$b> and C<$c>, and add them together. -Find C<$a>, and assign one to the other. Then leave. - -The way Perl builds up these op trees in the parsing process can be -unravelled by examining F, the YACC grammar. 
Let's take the -piece we need to construct the tree for C<$a = $b + $c> - - 1 term : term ASSIGNOP term - 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } - 3 | term ADDOP term - 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } - -If you're not used to reading BNF grammars, this is how it works: You're -fed certain things by the tokeniser, which generally end up in upper -case. Here, C, is provided when the tokeniser sees C<+> in your -code. C is provided when C<=> is used for assigning. These are -"terminal symbols", because you can't get any simpler than them. - -The grammar, lines one and three of the snippet above, tells you how to -build up more complex forms. These complex forms, "non-terminal symbols" -are generally placed in lower case. C here is a non-terminal -symbol, representing a single expression. - -The grammar gives you the following rule: you can make the thing on the -left of the colon if you see all the things on the right in sequence. -This is called a "reduction", and the aim of parsing is to completely -reduce the input. There are several different ways you can perform a -reduction, separated by vertical bars: so, C followed by C<=> -followed by C makes a C, and C followed by C<+> -followed by C can also make a C. - -So, if you see two terms with an C<=> or C<+>, between them, you can -turn them into a single expression. When you do this, you execute the -code in the block on the next line: if you see C<=>, you'll do the code -in line 2. If you see C<+>, you'll do the code in line 4. It's this code -which contributes to the op tree. - - | term ADDOP term - { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } - -What this does is creates a new binary op, and feeds it a number of -variables. The variables refer to the tokens: C<$1> is the first token in -the input, C<$2> the second, and so on - think regular expression -backreferences. C<$$> is the op returned from this reduction. So, we -call C to create a new binary operator. 
The first parameter to -C, a function in F, is the op type. It's an addition -operator, so we want the type to be C. We could specify this -directly, but it's right there as the second token in the input, so we -use C<$2>. The second parameter is the op's flags: 0 means "nothing -special". Then the things to add: the left and right hand side of our -expression, in scalar context. - -=head2 Stacks - -When perl executes something like C, how does it pass on its -results to the next op? The answer is, through the use of stacks. Perl -has a number of stacks to store things it's currently working on, and -we'll look at the three most important ones here. - -=over 3 - -=item Argument stack - -Arguments are passed to PP code and returned from PP code using the -argument stack, C. The typical way to handle arguments is to pop -them off the stack, deal with them how you wish, and then push the result -back onto the stack. This is how, for instance, the cosine operator -works: - - NV value; - value = POPn; - value = Perl_cos(value); - XPUSHn(value); - -We'll see a more tricky example of this when we consider Perl's macros -below. C gives you the NV (floating point value) of the top SV on -the stack: the C<$x> in C. Then we compute the cosine, and push -the result back as an NV. The C in C means that the stack -should be extended if necessary - it can't be necessary here, because we -know there's room for one more item on the stack, since we've just -removed one! The C macros at least guarantee safety. - -Alternatively, you can fiddle with the stack directly: C gives you -the first element in your portion of the stack, and C gives you -the top SV/IV/NV/etc. on the stack. So, for instance, to do unary -negation of an integer: - - SETi(-TOPi); - -Just set the integer value of the top stack entry to its negation. - -Argument stack manipulation in the core is exactly the same as it is in -XSUBs - see L, L and L for a longer -description of the macros used in stack manipulation. 
- -=item Mark stack - -I say "your portion of the stack" above because PP code doesn't -necessarily get the whole stack to itself: if your function calls -another function, you'll only want to expose the arguments aimed for the -called function, and not (necessarily) let it get at your own data. The -way we do this is to have a "virtual" bottom-of-stack, exposed to each -function. The mark stack keeps bookmarks to locations in the argument -stack usable by each function. For instance, when dealing with a tied -variable, (internally, something with "P" magic) Perl has to call -methods for accesses to the tied variables. However, we need to separate -the arguments exposed to the method to the argument exposed to the -original function - the store or fetch or whatever it may be. Here's -roughly how the tied C is implemented; see C in F: - - 1 PUSHMARK(SP); - 2 EXTEND(SP,2); - 3 PUSHs(SvTIED_obj((SV*)av, mg)); - 4 PUSHs(val); - 5 PUTBACK; - 6 ENTER; - 7 call_method("PUSH", G_SCALAR|G_DISCARD); - 8 LEAVE; - -Let's examine the whole implementation, for practice: - - 1 PUSHMARK(SP); - -Push the current state of the stack pointer onto the mark stack. This is -so that when we've finished adding items to the argument stack, Perl -knows how many things we've added recently. - - 2 EXTEND(SP,2); - 3 PUSHs(SvTIED_obj((SV*)av, mg)); - 4 PUSHs(val); - -We're going to add two more items onto the argument stack: when you have -a tied array, the C subroutine receives the object and the value -to be pushed, and that's exactly what we have here - the tied object, -retrieved with C, and the value, the SV C. - - 5 PUTBACK; - -Next we tell Perl to update the global stack pointer from our internal -variable: C only gave us a local copy, not a reference to the global. 
- - 6 ENTER; - 7 call_method("PUSH", G_SCALAR|G_DISCARD); - 8 LEAVE; - -C and C localise a block of code - they make sure that all -variables are tidied up, everything that has been localised gets -its previous value returned, and so on. Think of them as the C<{> and -C<}> of a Perl block. - -To actually do the magic method call, we have to call a subroutine in -Perl space: C takes care of that, and it's described in -L. We call the C method in scalar context, and we're -going to discard its return value. The call_method() function -removes the top element of the mark stack, so there is nothing for -the caller to clean up. - -=item Save stack - -C doesn't have a concept of local scope, so perl provides one. We've -seen that C and C are used as scoping braces; the save -stack implements the C equivalent of, for example: - - { - local $foo = 42; - ... - } - -See L for how to use the save stack. - -=back +=over 4 -=head2 Millions of Macros +=item * -One thing you'll notice about the Perl source is that it's full of -macros. Some have called the pervasive use of macros the hardest thing -to understand, others find it adds to clarity. Let's take an example, -the code which implements the addition operator: +8-wide tabs (no exceptions!) - 1 PP(pp_add) - 2 { - 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); - 4 { - 5 dPOPTOPnnrl_ul; - 6 SETn( left + right ); - 7 RETURN; - 8 } - 9 } +=item * -Every line here (apart from the braces, of course) contains a macro. The -first line sets up the function declaration as Perl expects for PP code; -line 3 sets up variable declarations for the argument stack and the -target, the return value of the operation. Finally, it tries to see if -the addition operation is overloaded; if so, the appropriate subroutine -is called. 
+4-wide indents for code, 2-wide indents for nested CPP #defines -Line 5 is another variable declaration - all variable declarations start -with C - which pops from the top of the argument stack two NVs (hence -C) and puts them into the variables C and C, hence the -C. These are the two operands to the addition operator. Next, we -call C to set the NV of the return value to the result of adding -the two values. This done, we return - the C macro makes sure -that our return value is properly handled, and we pass the next operator -to run back to the main run loop. +=item * -Most of these macros are explained in L, and some of the more -important ones are explained in L as well. Pay special attention -to L for information on -the C<[pad]THX_?> macros. +Try hard not to exceed 79-columns -=head2 The .i Targets +=item * -You can expand the macros in a F file by saying +ANSI C prototypes - make foo.i +=item * -which will expand the macros using cpp. Don't be scared by the results. +Uncuddled elses and "K&R" style for indenting control constructs -=head1 SOURCE CODE STATIC ANALYSIS +=item * -Various tools exist for analysing C source code B, as -opposed to B, that is, without executing the code. -It is possible to detect resource leaks, undefined behaviour, type -mismatches, portability problems, code paths that would cause illegal -memory accesses, and other similar problems by just parsing the C code -and looking at the resulting graph, what does it tell about the -execution and data flows. As a matter of fact, this is exactly -how C compilers know to give warnings about dubious code. +No C++ style (//) comments -=head2 lint, splint +=item * -The good old C code quality inspector, C, is available in -several platforms, but please be aware that there are several -different implementations of it by different vendors, which means that -the flags are not identical across different platforms. +Mark places that need to be revisited with XXX (and revisit often!) 
-There is a lint variant called C (Secure Programming Lint) -available from http://www.splint.org/ that should compile on any -Unix-like platform. +=item * -There are C and targets in Makefile, but you may have -to diddle with the flags (see above). +Opening brace lines up with "if" when conditional spans multiple lines; +should be at end-of-line otherwise -=head2 Coverity +=item * -Coverity (http://www.coverity.com/) is a product similar to lint and -as a testbed for their product they periodically check several open -source projects, and they give out accounts to open source developers -to the defect databases. +In function definitions, name starts in column 0 (return value is on +previous line) -=head2 cpd (cut-and-paste detector) +=item * -The cpd tool detects cut-and-paste coding. If one instance of the -cut-and-pasted code changes, all the other spots should probably be -changed, too. Therefore such code should probably be turned into a -subroutine or a macro. +Single space after keywords that are followed by parens, no space +between function name and following paren -cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project -(http://pmd.sourceforge.net/). pmd was originally written for static -analysis of Java code, but later the cpd part of it was extended to -parse also C and C++. +=item * -Download the pmd-bin-X.Y.zip () from the SourceForge site, extract the -pmd-X.Y.jar from it, and then run that on source code thusly: +Avoid assignments in conditionals, but if they're unavoidable, use +extra paren, e.g. "if (a && (b = c)) ..." - java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt +=item * -You may run into memory limits, in which case you should use the -Xmx option: +"return foo;" rather than "return(foo);" - java -Xmx512M ... +=item * -=head2 gcc warnings +"if (!foo) ..." rather than "if (foo == FALSE) ..." etc. 
-Though much can be written about the inconsistency and coverage -problems of gcc warnings (like C<-Wall> not meaning "all the -warnings", or some common portability problems not being covered by -C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined -collection of warnings, and so forth), gcc is still a useful tool in -keeping our coding nose clean. +=back -The C<-Wall> is by default on. +=head3 Test suite -The C<-ansi> (and its sidekick, C<-pedantic>) would be nice to be on -always, but unfortunately they are not safe on all platforms, they can -for example cause fatal conflicts with the system headers (Solaris -being a prime example). If Configure C<-Dgccansipedantic> is used, -the C frontend selects C<-ansi -pedantic> for the platforms -where they are known to be safe. +If your patch changes code (rather than just changing documentation) +you should also include one or more test cases which illustrate the bug +you're fixing or validate the new functionality you're adding. In +general, you should update an existing test file rather than create a +new one. -Starting from Perl 5.9.4 the following extra flags are added: +Your test suite additions should generally follow these guidelines +(courtesy of Gurusamy Sarathy ): =over 4 =item * -C<-Wendif-labels> +Know what you're testing. Read the docs, and the source. =item * -C<-Wextra> +Tend to fail, not succeed. =item * -C<-Wdeclaration-after-statement> - -=back - -The following flags would be nice to have but they would first need -their own Augean stablemaster: - -=over 4 +Interpret results strictly. =item * -C<-Wpointer-arith> +Use unrelated features (this will flush out bizarre interactions). =item * -C<-Wshadow> +Use non-standard idioms (otherwise you are not testing TIMTOWTDI). 
=item * -C<-Wstrict-prototypes> - -=back - -The C<-Wtraditional> is another example of the annoying tendency of -gcc to bundle a lot of warnings under one switch -- it would be -impossible to deploy in practice because it would complain a lot -- but -it does contain some warnings that would be beneficial to have available -on their own, such as the warning about string constants inside macros -containing the macro arguments: this behaved differently pre-ANSI -than it does in ANSI, and some C compilers are still in transition, -AIX being an example. +Avoid using hardcoded test numbers whenever possible (the EXPECTED/GOT +found in t/op/tie.t is much more maintainable, and gives better failure +reports). -=head2 Warnings of other C compilers - -Other C compilers (yes, there B other C compilers than gcc) often -have their "strict ANSI" or "strict ANSI with some portability extensions" -modes on, like for example the Sun Workshop has its C<-Xa> mode on -(though implicitly), or the DEC (these days, HP...) has its C<-std1> -mode on. +=item * -=head2 DEBUGGING +Give meaningful error messages when a test fails. -You can compile a special debugging version of Perl, which allows you -to use the C<-D> option of Perl to tell more about what Perl is doing. -But sometimes there is no alternative than to dive in with a debugger, -either to see the stack trace of a core dump (very useful in a bug -report), or trying to figure out what went wrong before the core dump -happened, or how did we end up having wrong or unexpected results. +=item * -=head2 Poking at Perl +Avoid using qx// and system() unless you are testing for them. If you +do use them, make sure that you cover _all_ perl platforms. -To really poke around with Perl, you'll probably want to build Perl for -debugging, like this: +=item * - ./Configure -d -D optimize=-g - make +Unlink any temporary files you create. 
-C<-g> is a flag to the C compiler to have it produce debugging -information which will allow us to step through a running program, -and to see in which C function we are at (without the debugging -information we might see only the numerical addresses of the functions, -which is not very helpful). +=item * -F will also turn on the C compilation symbol which -enables all the internal debugging code in Perl. There are a whole bunch -of things you can debug with this: L lists them all, and the -best way to find out about them is to play about with them. The most -useful options are probably +Promote unforeseen warnings to errors with $SIG{__WARN__}. - l Context (loop) stack processing - t Trace execution - o Method and overloading resolution - c String/numeric conversions +=item * -Some of the functionality of the debugging code can be achieved using XS -modules. +Be sure to use the libraries and modules shipped with the version being +tested, not those that were already installed. - -Dr => use re 'debug' - -Dx => use O 'Debug' +=item * -=head2 Using a source-level debugger +Add comments to the code explaining what you are testing for. -If the debugging output of C<-D> doesn't help you, it's time to step -through perl's execution with a source-level debugger. +=item * -=over 3 +Make updating the '1..42' string unnecessary. Or make sure that you +update it. =item * -We'll use C for our examples here; the principles will apply to -any debugger (many vendors call their debugger C), but check the -manual of the one you're using. +Test _all_ behaviors of a given operator, library, or function. -=back +Test all optional arguments. -To fire up the debugger, type +Test return values in various contexts (boolean, scalar, list, lvalue). - gdb ./perl +Use both global and lexical variables. -Or if you have a core dump: +Don't forget the exceptional, pathological cases. 
- gdb ./perl core +=back -You'll want to do that in your Perl source tree so the debugger can read -the source code. You should see the copyright message, followed by the -prompt. +=head2 Patching a core module - (gdb) +This works just like patching anything else, with one extra +consideration. -C will get you into the documentation, but here are the most -useful commands: +Some core modules also live on CPAN and are maintained outside of the +Perl core. When the author updates the module, the updates are simply +copied into the core. -=over 3 +Module in the F directory of the source tree are maintained +outside of the Perl core. See that module's listing on documentation or +its listing on L for more information on +reporting bugs and submitting patches. -=item run [args] +In contrast, modules in the F directory are maintained in the +core. -Run the program with the given arguments. +=head2 Updating perldelta -=item break function_name +For changes significant enough to warrant a F entry, +the porters will greatly appreciate it if you submit a delta entry +along with your actual change. Significant changes include, but are not +limited to: -=item break source.c:xxx +=over 4 -Tells the debugger that we'll want to pause execution when we reach -either the named function (but see L!) or the given -line in the named source file. +=item * -=item step +Adding, deprecating, or removing core features -Steps through the program a line at a time. +=item * -=item next +Adding, deprecating, removing, or upgrading core or dual-life modules -Steps through the program a line at a time, without descending into -functions. +=item * -=item continue +Adding new core tests -Run until the next breakpoint. +=item * -=item finish +Fixing security issues and user-visible bugs in the core -Run until the end of the current function, then stop again. 
+=item * -=item 'enter' +Changes that might break existing code, either on the perl or C level -Just pressing Enter will do the most recent operation again - it's a -blessing when stepping through miles of source code. +=item * -=item print +Significant performance improvements -Execute the given C code and print its results. B: Perl makes -heavy use of macros, and F does not necessarily support macros -(see later L). You'll have to substitute them -yourself, or to invoke cpp on the source code files -(see L) -So, for instance, you can't say +=item * - print SvPV_nolen(sv) +Adding, removing, or significantly changing documentation in the +F directory -but you have to say +=item * - print Perl_sv_2pv_nolen(sv) +Important platform-specific changes =back -You may find it helpful to have a "macro dictionary", which you can -produce by saying C. Even then, F won't -recursively apply those macros for you. - -=head2 gdb macro support - -Recent versions of F have fairly good macro support, but -in order to use it you'll need to compile perl with macro definitions -included in the debugging information. Using F version 3.1, this -means configuring with C<-Doptimize=-g3>. Other compilers might use a -different switch (if they support debugging macros at all). - -=head2 Dumping Perl Data Structures - -One way to get around this macro hell is to use the dumping functions in -F; these work a little like an internal -L, but they also cover OPs and other structures -that you can't get at from Perl. Let's take an example. We'll use the -C<$a = $b + $c> we used before, but give it a bit of context: -C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around? - -What about C, the function we examined earlier to implement the -C<+> operator: - - (gdb) break Perl_pp_add - Breakpoint 1 at 0x46249f: file pp_hot.c, line 309. - -Notice we use C and not C - see L. 
-With the breakpoint in place, we can run our program: - - (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c' - -Lots of junk will go past as gdb reads in the relevant source files and -libraries, and then: - - Breakpoint 1, Perl_pp_add () at pp_hot.c:309 - 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); - (gdb) step - 311 dPOPTOPnnrl_ul; - (gdb) - -We looked at this bit of code before, and we said that C -arranges for two Cs to be placed into C and C - let's -slightly expand it: - - #define dPOPTOPnnrl_ul NV right = POPn; \ - SV *leftsv = TOPs; \ - NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0 - -C takes the SV from the top of the stack and obtains its NV either -directly (if C is set) or by calling the C function. -C takes the next SV from the top of the stack - yes, C uses -C - but doesn't remove it. We then use C to get the NV from -C in the same way as before - yes, C uses C. - -Since we don't have an NV for C<$b>, we'll have to use C to -convert it. If we step again, we'll find ourselves there: - - Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669 - 1669 if (!sv) - (gdb) - -We can now use C to investigate the SV: - - SV = PV(0xa057cc0) at 0xa0675d0 - REFCNT = 1 - FLAGS = (POK,pPOK) - PV = 0xa06a510 "6XXXX"\0 - CUR = 5 - LEN = 6 - $1 = void - -We know we're going to get C<6> from this, so let's finish the -subroutine: - - (gdb) finish - Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671 - 0x462669 in Perl_pp_add () at pp_hot.c:311 - 311 dPOPTOPnnrl_ul; - -We can also dump out this op: the current op is always stored in -C, and we can dump it with C. This'll give us -similar output to L. 
- - { - 13 TYPE = add ===> 14 - TARG = 1 - FLAGS = (SCALAR,KIDS) - { - TYPE = null ===> (12) - (was rv2sv) - FLAGS = (SCALAR,KIDS) - { - 11 TYPE = gvsv ===> 12 - FLAGS = (SCALAR) - GV = main::b - } - } - -# finish this later # - -=head2 Patching - -All right, we've now had a look at how to navigate the Perl sources and -some things you'll need to know when fiddling with them. Let's now get -on and create a simple patch. Here's something Larry suggested: if a -C is the first active format during a C, (for example, -C) then the resulting string should be treated as -UTF-8 encoded. - -If you are working with a git clone of the Perl repository, you will want to -create a branch for your changes. This will make creating a proper patch much -simpler. See the L for details on how to do this. - -How do we prepare to fix this up? First we locate the code in question - -the C happens at runtime, so it's going to be in one of the F -files. Sure enough, C is in F. Since we're going to be -altering this file, let's copy it to F. - -[Well, it was in F when this tutorial was written. It has now been -split off with C to its own file, F] - -Now let's look over C: we take a pattern into C, and then -loop over the pattern, taking each format character in turn into -C. Then for each possible format character, we swallow up -the other arguments in the pattern (a field width, an asterisk, and so -on) and convert the next chunk input into the specified format, adding -it onto the output SV C. - -How do we know if the C is the first format in the C? Well, if -we have a pointer to the start of C then, if we see a C we can -test whether we're still at the start of the string. 
So, here's where -C is set up: - - STRLEN fromlen; - register char *pat = SvPVx(*++MARK, fromlen); - register char *patend = pat + fromlen; - register I32 len; - I32 datumtype; - SV *fromstr; - -We'll have another string pointer in there: - - STRLEN fromlen; - register char *pat = SvPVx(*++MARK, fromlen); - register char *patend = pat + fromlen; - + char *patcopy; - register I32 len; - I32 datumtype; - SV *fromstr; - -And just before we start the loop, we'll set C to be the start -of C: - - items = SP - MARK; - MARK++; - sv_setpvn(cat, "", 0); - + patcopy = pat; - while (pat < patend) { - -Now if we see a C which was at the start of the string, we turn on -the C flag for the output SV, C: - - + if (datumtype == 'U' && pat==patcopy+1) - + SvUTF8_on(cat); - if (datumtype == '#') { - while (pat < patend && *pat != '\n') - pat++; - -Remember that it has to be C because the first character of -the string is the C which has been swallowed into C - -Oops, we forgot one thing: what if there are spaces at the start of the -pattern? C will have C as the first active -character, even though it's not the first thing in the pattern. In this -case, we have to advance C along with C when we see spaces: - - if (isSPACE(datumtype)) - continue; - -needs to become - - if (isSPACE(datumtype)) { - patcopy++; - continue; - } - -OK. That's the C part done. Now we must do two additional things before -this patch is ready to go: we've changed the behaviour of Perl, and so -we must document that change. We must also provide some more regression -tests to make sure our patch works and doesn't create a bug somewhere -else along the line. - -The regression tests for each operator live in F, and so we -make a copy of F to F. Now we can add our -tests to the end. First, we'll test that the C does indeed create -Unicode strings. - -t/op/pack.t has a sensible ok() function, but if it didn't we could -use the one from t/test.pl. 
- - require './test.pl'; - plan( tests => 159 ); +Please make sure you add the perldelta entry to the right section +within F. More information on how to write good +perldelta entries is available in the C