=head1 NAME

perlhack - How to hack at the Perl internals

=head1 DESCRIPTION

This document attempts to explain how Perl development takes place,
and ends with some suggestions for people wanting to become bona fide
porters.

=head1 HOW PERL DEVELOPMENT HAPPENS

=head2 Perl 5 Porters

The perl5-porters mailing list is where the Perl standard distribution
is maintained and developed. The list can get anywhere from 10 to 150
messages a day, depending on the heatedness of the debate. Most days
there are two or three patches, extensions, features, or bugs being
discussed at a time.

A searchable archive of the list is at either:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

or

    http://archive.develooper.com/perl5-porters@perl.org/

List subscribers (the porters themselves) come in several flavours.
Some are quiet curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or
features in Perl. Some are representatives of vendors, who are there
to make sure that Perl continues to compile and work on their
platforms. Some patch any reported bug that they know how to fix,
some are actively patching their pet area (threads, Win32, the regexp
engine), while others seem to do nothing but complain. In other
words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall. He has the final word
in what does and does not change in the Perl language. Various
releases of Perl are shepherded by a "pumpking", a porter
responsible for gathering patches and deciding, on a patch-by-patch,
feature-by-feature basis, what will and will not go into the release.
For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and
Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.

In addition, various people are pumpkings for different things. For
instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
I<Configure> pumpkin up till the 5.8 release. For the 5.10 release
H.Merijn Brand took over.

Larry sees Perl development along the lines of the US government:
there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry). The legislature can
discuss and submit patches to the executive branch all they like, but
the executive branch is free to veto them. Rarely, the Supreme Court
will side with the executive branch over the legislature, or the
legislature over the executive branch. Mostly, however, the
legislature and the executive branch are supposed to get along and
work out their differences without impeachment or court cases.

You might sometimes see reference to Rule 1 and Rule 2. Larry's power
as Supreme Court is expressed in The Rules:

=over 4

=item 1

Larry is always by definition right about how Perl should behave.
This means he has final veto power on the core functionality.

=item 2

Larry is allowed to change his mind about any matter at a later date,
regardless of whether he previously invoked Rule 1.

=back

Got that? Larry is always right, even when he was wrong. It's rare
to see either Rule exercised, but they are often alluded to.

=head2 What makes for a good patch?

New features and extensions to the language are contentious, because
the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified
in a few small design goals as with some other languages. Instead,
the heuristics are flexible and often difficult to fathom. Here is
one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:

=over 4

=item Does the concept match the general goals of Perl?

These haven't been written anywhere in stone, but one approximation
is:

    1. Keep it fast, simple, and useful.
    2. Keep features/concepts as orthogonal as possible.
    3. No arbitrary limits (platforms, data sizes, cultures).
    4. Keep it open and exciting to use/patch/advocate Perl everywhere.
    5. Either assimilate new technologies, or build bridges to them.

=item Where is the implementation?

All the talk in the world is useless without an implementation. In
almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable
of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.

=item Backwards compatibility

It's a cardinal sin to break existing Perl programs. New warnings are
contentious: some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to
break programs, and changing the meaning of existing token sequences or
functions might break programs.

=item Could it be a module instead?

Perl 5 has extension mechanisms, modules and XS, specifically to avoid
the need to keep changing the Perl interpreter. You can write modules
that export functions, you can give those functions prototypes so they
can be called like built-in functions, you can even write XS code to
mess with the runtime data structures of the Perl interpreter if you
want to implement really complicated things. If it can be done in a
module instead of in the core, it's highly unlikely to be added.

=item Is the feature generic enough?

Is this something that only the submitter wants added to the language,
or would it be broadly useful? Sometimes, instead of adding a feature
with a tight focus, the porters might decide to wait until someone
implements the more generalized feature. For instance, instead of
implementing a "delayed evaluation" feature, the porters are waiting
for a macro system that would permit delayed evaluation and much more.

=item Does it potentially introduce new bugs?

Radical rewrites of large chunks of the Perl interpreter have the
potential to introduce new bugs. The smaller and more localized the
change, the better.

=item Does it preclude other desirable features?

A patch is likely to be rejected if it closes off future avenues of
development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there
are still options for the future of prototypes that haven't been
addressed.

=item Is the implementation robust?

Good patches (tight code, complete, correct) stand more chance of
going in. Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded
altogether without further notice.

=item Is the implementation generic enough to be portable?

The worst patches make use of system-specific features. It's highly
unlikely that non-portable additions to the Perl language will be
accepted.

=item Is the implementation tested?

Patches which change behaviour (fixing bugs or introducing new features)
must include regression tests to verify that everything works as expected.
Without tests provided by the original author, how can anyone else changing
perl in the future be sure that they haven't unwittingly broken the behaviour
the patch implements? And without tests, how can the patch's author be
confident that his/her hard work put into the patch won't be accidentally
thrown away by someone in the future?

=item Is there enough documentation?

Patches without documentation are probably ill-thought-out or
incomplete. Nothing can be added without documentation, so submitting
a patch for the appropriate manpages as well as the source code is
always a good idea.

=item Is there another way to do it?

Larry said "Although the Perl Slogan is I<There's More Than One Way
to Do It>, I hesitate to make 10 ways to do something". This is a
tricky heuristic to navigate, though: one man's essential addition is
another man's pointless cruft.

=item Does it create too much work?

Work for the pumpking, work for Perl programmers, work for module
authors, ... Perl is supposed to be easy.

=item Patches speak louder than words

Working code is always preferred to pie-in-the-sky ideas. A patch to
add a feature stands a much higher chance of making it to the language
than does a random feature request, no matter how fervently argued the
request might be. This ties into "Will it be useful?", as the fact
that someone took the time to make the patch demonstrates a strong
desire for the feature.

=back

If you're on the list, you might hear the word "core" bandied
around. It refers to the standard distribution. "Hacking on the
core" means you're changing the C source code to the Perl
interpreter. "A core module" is one that ships with Perl.

=head2 Getting the Perl source

The source code to the Perl interpreter, in its different versions, is
kept in a repository managed by the git revision control system. The
pumpkings and a few others have write access to the repository to check in
changes.

How to clone and use the git perl repository is described in L<perlrepository>.

You can also choose to use rsync to get a copy of the current source tree
for the bleadperl branch and all maintenance branches:

    $ rsync -avz rsync://perl5.git.perl.org/perl-current .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.12.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.10.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.8.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.6.x .
    $ rsync -avz rsync://perl5.git.perl.org/perl-5.005xx .

(Add the C<--delete> option to remove leftover files.)

To get a full list of the available sync points:

    $ rsync perl5.git.perl.org::

You may also want to subscribe to the perl5-changes mailing list to
receive a copy of each patch that gets submitted to the maintenance
and development "branches" of the perl repository. See
http://lists.perl.org/ for subscription information.

If you are a member of the perl5-porters mailing list, it is a good
idea to keep in touch with the most recent changes, if only to verify
that what you would have posted as a bug report hasn't already been
solved in the most recent perl development branch, also
known as perl-current, bleeding edge perl, bleedperl or bleadperl.

Needless to say, the source code in perl-current is usually in a perpetual
state of evolution. You should expect it to be very buggy. Do B<not> use
it for any purpose other than testing and development.

=head2 Bug tracking with Perlbug

There is a single remote administrative interface for modifying bug status,
category, open issues etc. using the B<RT> bugtracker system, maintained
by Robert Spier. Become an administrator, and close any bugs you can get
your sticky mitts on:

    http://bugs.perl.org/

To email the bug system administrators:

    "perlbug-admin" <perlbug-admin@perl.org>

=head2 Submitting patches

Always submit patches to I<perl5-porters@perl.org>. If you're
patching a core module and there's an author listed, send the author a
copy (see L<Patching a core module>). This lets other porters review
your patch, which catches a surprising number of errors in patches.
Please patch against the latest B<development> version. (e.g., even if
you're fixing a bug in the 5.8 track, patch against the C<blead> branch in
the git repository.)

If changes are accepted, they are applied to the development branch. Then
the maintenance pumpking decides which of those patches is to be
backported to the maint branch. Only patches that survive the heat of the
development branch get applied to maintenance versions.

Your patch should update the documentation and test suite. See
L<TESTING>. If you have added or removed files in the distribution,
edit the MANIFEST file accordingly, sort the MANIFEST file using
C<make manisort>, and include those changes as part of your patch.

Patching documentation also follows the same order: if accepted, a patch
is first applied to B<development>, and if relevant then it's backported
to B<maintenance>. (With an exception for some patches that document
behaviour that only appears in the maintenance branch, but which has
changed in the development version.)

To report a bug in Perl, use the program L<perlbug> which comes with
Perl (if you can't get Perl to work, send mail to the address
I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through
I<perlbug> feeds into the automated bug-tracking system, access to
which is provided through the web at http://rt.perl.org/rt3/ . It
often pays to check the archives of the perl5-porters mailing list to
see whether the bug you're reporting has been reported before, and if
so whether it was considered a bug. See above for the location of
the searchable archives.

The CPAN testers ( http://testers.cpan.org/ ) are a group of
volunteers who test CPAN modules on a variety of platforms. Perl
Smokers ( http://www.nntp.perl.org/group/perl.daily-build and
http://www.nntp.perl.org/group/perl.daily-build.reports/ )
automatically test Perl source releases on platforms with various
configurations. Both efforts welcome volunteers. In order to get
involved in smoke testing of perl itself, visit
L<http://search.cpan.org/dist/Test-Smoke>. In order to start smoke
testing CPAN modules, visit L<http://search.cpan.org/dist/CPANPLUS-YACSmoke/>
or L<http://search.cpan.org/dist/minismokebox/> or
L<http://search.cpan.org/dist/CPAN-Reporter/>.

It's a good idea to read and lurk for a while before chipping in.
That way you'll get to see the dynamic of the conversations, learn the
personalities of the players, and hopefully be better prepared to make
a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters
mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To
unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.

=head2 Patching a core module

This works just like patching anything else, with an extra
consideration. Many core modules also live on CPAN. If this is so,
patch the CPAN version instead of the core and send the patch off to
the module maintainer (with a copy to p5p). This will help the module
maintainer keep the CPAN version in sync with the core version without
constantly scanning p5p.

The list of maintainers of core modules is usefully documented in
F<Porting/Maintainers.pl>.

=head2 Adding a new function to the core

If, as part of a patch to fix a bug, or just because you have an
especially good idea, you decide to add a new function to the core,
discuss your ideas on p5p well before you start work. It may be that
someone else has already attempted to do what you are considering and
can give lots of good advice or even provide you with bits of code
that they already started (but never finished).

You have to follow all of the advice given above for patching. It is
extremely important to test any addition thoroughly and add new tests
to explore all boundary conditions that your new function is expected
to handle. If your new function is used only by one module (e.g. toke),
then it should probably be named S_your_function (for static); on the
other hand, if you expect it to be accessible from other functions in
Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions>
for more details.

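To make the naming rule concrete, here is a minimal plain-C sketch of what the two prefixes signal, using hypothetical function names. The C<S_> helper is declared C<static>, so only code in the same source file can call it; the C<Perl_> function has external linkage, so other source files can. Real core functions also need an F<embed.pl> entry and usually take the interpreter-context argument C<pTHX_>; both are deliberately left out, so treat this as an illustration of linkage only, not of the core's calling conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file-local helper, in the spirit of an S_ function in
 * toke.c: 'static' gives it internal linkage, so no other .c file can
 * call it and its name cannot clash with symbols elsewhere. */
static size_t S_count_char(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Hypothetical function meant to be called from other source files
 * (e.g. from universal.c), hence external linkage and the Perl_ prefix.
 * It returns the number of "::" separators in a package name. */
size_t Perl_package_depth(const char *name)
{
    /* Assumes well-formed names in which colons only appear in pairs. */
    return S_count_char(name, ':') / 2;
}
```

With these definitions, C<Perl_package_depth("Foo::Bar::Baz")> returns 2, while a call to C<S_count_char> from any other F<.c> file could not be resolved by the linker, which is exactly the protection the C<S_> prefix advertises.
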
The location of any new code is also an important consideration. Don't
just create a new top level .c file and put your code there; you would
have to make changes to Configure (so the Makefile is created properly),
as well as possibly lots of include files. This is strictly pumpking
business.

It is better to add your function to one of the existing top level
source code files, but your choice is complicated by the nature of
the Perl distribution. Only the files that are marked as compiled
static are located in the perl executable. Everything else is located
in the shared library (or DLL if you are running under WIN32). So,
for example, if a function was only used by functions located in
toke.c, then your code can go in toke.c. If, however, you want to call
the function from universal.c, then you should put your code in another
location, for example util.c.

In addition to writing your C code, you will need to create an
appropriate entry in embed.pl describing your function, then run
C<make regen_headers> to create the entries in the numerous header
files that perl needs to compile correctly. See L<perlguts/Internal Functions>
for information on the various options that you can set in embed.pl.
You will forget to do this a few (or many) times and you will get
warnings during the compilation phase. Make sure that you mention
this when you post your patch to P5P; the pumpking needs to know this.

When you write your new code, please be conscious of existing code
conventions used in the perl source files. See L<perlstyle> for
details. Although most of the guidelines discussed seem to focus on
Perl code, rather than C, they all apply (except when they don't ;).
Also see L<perlrepository> for lots of details about both formatting and
submitting patches of your changes.

Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
Test on as many platforms as you can find. Test as many perl
Configure options as you can (e.g. MULTIPLICITY). If you have
profiling or memory tools, see L<MEMORY DEBUGGERS> and L<PROFILING>
below for how to use them to further test your code. Remember that
most of the people on P5P are doing this on their own time and
don't have the time to debug your code.

=head2 Background reading

To hack on the Perl guts, you'll need to read the following things:

=over 3

=item L<perlguts>

This is of paramount importance, since it's the documentation of what
goes where in the Perl source. Read it over a couple of times and it
might start to make sense - don't worry if it doesn't yet, because the
best way to study it is to read it in conjunction with poking at Perl
source, and we'll do that later on.

Gisle Aas's "illustrated perlguts", also known as I<illguts>, has very
helpful pictures:

L<http://search.cpan.org/dist/illguts/>

=item L<perlxstut> and L<perlxs>

A working knowledge of XSUB programming is incredibly useful for core
hacking; XSUBs use techniques drawn from the PP code, the portion of the
guts that actually executes a Perl program. It's a lot gentler to learn
those techniques from simple examples and explanation than from the core
itself.

=item L<perlapi>

The documentation for the Perl API explains what some of the internal
functions do, as well as the many macros used in the source.

=item F<Porting/pumpkin.pod>

This is a collection of words of wisdom for a Perl porter; some of it is
only useful to the pumpkin holder, but most of it applies to anyone
wanting to go about Perl development.

=item The perl5-porters FAQ

This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html .
It contains hints on reading perl5-porters, information on how
perl5-porters works and how Perl development in general works.

=back

=head1 UNDERSTANDING THE SOURCE

=head2 Finding your way around

Perl maintenance can be split into a number of areas, and certain people
(pumpkins) will have responsibility for each area. These areas sometimes
correspond to files or directories in the source kit. Among the areas are:

=over 3

=item Core modules

Modules shipped as part of the Perl core live in various subdirectories, where
two are dedicated to core-only modules, and two are for the dual-life modules
which live on CPAN and may be maintained separately with respect to the Perl
core:

    lib/  is for pure-Perl modules, which exist in the core only.

    ext/  is for XS extensions, and modules with special Makefile.PL
          requirements, which exist in the core only.

    cpan/ is for dual-life modules, where the CPAN module is
          canonical (should be patched first).

    dist/ is for dual-life modules, where the blead source is
          canonical.

For some dual-life modules it has not yet been decided whether the CPAN
version or the blead source is canonical. Until that is done, those modules
should be in F<cpan/>.

=item Tests

There are tests for nearly all the modules, built-ins and major bits
of functionality. Test files all have a .t suffix. Module tests live
in the F<lib/> and F<ext/> directories next to the module being
tested. Others live in F<t/>. See L<TESTING>.

=item Documentation

Documentation maintenance includes looking after everything in the
F<pod/> directory (as well as contributing new documentation) and
the documentation of the modules in core.

=item Configure

The Configure process is the way we make Perl portable across the
myriad of operating systems it supports. Responsibility for the
Configure, build and installation process, as well as the overall
portability of the core code, rests with the Configure pumpkin;
others help out with individual operating systems.

The three files that fall under his/her responsibility are Configure,
config_h.SH, and Porting/Glossary (and a whole bunch of small related
files that are less important here). The Configure pumpkin decides how
patches to these are dealt with. Currently, the Configure pumpkin will
accept patches in most common formats, even directly to these files.
Other committers are allowed to commit to these files under the strict
condition that they will inform the Configure pumpkin, either on IRC
(if he/she happens to be around) or through (personal) e-mail.

The files involved are the operating system directories (F<win32/>,
F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
and F<Makefile>, as well as the metaconfig files which generate
F<Configure>. (metaconfig isn't included in the core distribution.)

See http://perl5.git.perl.org/metaconfig.git/blob/HEAD:/README for a
description of the full process involved.

=item Interpreter

And of course, there's the core of the Perl interpreter itself. Let's
have a look at that in a little more detail.

=back

Before we leave looking at the layout, though, don't forget that
F<MANIFEST> contains not only the file names in the Perl distribution,
but short descriptions of what's in them, too. For an overview of the
important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

=head2 Elements of the interpreter

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=over 3

=item Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

F<perlmain.c> is generated by C<ExtUtils::Miniperl> from F<miniperlmain.c> at
make time, so you should build perl first if you want to follow along.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter, along these lines:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating
system. Line 3 references C<PL_do_undump>, a global variable (all
global variables in Perl start with C<PL_>). This tells you whether the
current running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it look like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.

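The idea behind C<PerlMem_malloc> is a compile-time switch: callers always use one name, and a configure-time choice decides which allocator that name expands to. The sketch below shows the pattern in plain C with hypothetical names (C<MyMem_malloc>, C<USE_MY_OWN_MALLOC>, C<ToyInterpreter>); it is an assumption-laden illustration, not perl's actual definition, which has more build configurations to handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_MY_OWN_MALLOC
/* Hypothetical replacement allocator, standing in for perl's malloc.c.
 * A real one manages its own memory pools; this one only counts calls
 * before handing the request to the system allocator. */
static unsigned long my_own_malloc_calls = 0;

static void *my_own_malloc(size_t size)
{
    my_own_malloc_calls++;
    return malloc(size);
}
#  define MyMem_malloc(size) my_own_malloc(size)
#else
/* Default build: the abstraction is a thin alias for the system malloc. */
#  define MyMem_malloc(size) malloc(size)
#endif

/* Hypothetical stand-in for PerlInterpreter. */
typedef struct {
    int  exit_status;
    char script_name[64];
} ToyInterpreter;

/* Loosely mirrors the perl_alloc() line quoted above: allocate through
 * the abstraction layer. Zeroing the struct is this sketch's own choice,
 * so every field starts out in a known state. */
ToyInterpreter *toy_alloc(void)
{
    ToyInterpreter *interp =
        (ToyInterpreter *)MyMem_malloc(sizeof(ToyInterpreter));

    if (interp != NULL)
        memset(interp, 0, sizeof *interp);
    return interp;
}
```

Compiling with C<-DUSE_MY_OWN_MALLOC> changes which allocator C<toy_alloc> uses without touching its source, which is the point of routing every allocation through a single name.
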
Next, in line 7, we construct the interpreter using C<perl_construct>,
also in F<perl.c>; this sets up all the special variables that Perl
needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus)
        perl_run(my_perl);

    exitstatus = perl_destruct(my_perl);

    perl_free(my_perl);

588 | C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined | |
589 | in F<perl.c>, which processes the command line options, sets up any | |
590 | statically linked XS modules, opens the program and calls C<yyparse> to | |
591 | parse it. | |
592 | ||
=item Parsing

The aim of this stage is to take the Perl source and turn it into an op
tree. We'll see what one of those looks like later. Strictly speaking,
there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on.
The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages: it's highly context sensitive at times, and it
can be tricky to work out what sort of token something is, or where a
token ends. As such, there's a lot of interplay between the tokeniser and
the parser, which can get pretty frightening if you're not used to it.

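To make that context sensitivity concrete, here is a deliberately tiny C sketch of one classic ambiguity. It is not taken from F<toke.c>. The question is whether C</> is the division operator or the start of a match pattern. The sketch assumes the lexer only tracks whether it expects a term or an operator next. F<toke.c> keeps a much richer state of this kind in C<PL_expect>. The names C<Expect> and C<classify_slashes> are hypothetical.

```c

    #include <ctype.h>
    #include <string.h>

    /* Hypothetical toy, not toke.c: what the lexer expects to see next. */
    typedef enum { EXPECT_TERM, EXPECT_OPERATOR } Expect;

    /* Scan s and record, for each '/', whether it was read as division ('D')
       or as the start of a match pattern ('M'). out needs room for one char
       per '/' plus a terminating NUL. */
    static void classify_slashes(const char *s, char *out)
    {
        Expect expect = EXPECT_TERM;   /* a statement starts by expecting a term */
        size_t n = 0;

        while (*s) {
            char ch = *s;
            if (ch == ' ') {
                s++;
            }
            else if (ch == '/') {
                if (expect == EXPECT_OPERATOR) {
                    out[n++] = 'D';               /* after a term: divide */
                    expect = EXPECT_TERM;
                    s++;
                }
                else {
                    const char *end = strchr(s + 1, '/');
                    out[n++] = 'M';               /* where a term belongs: pattern */
                    if (!end)
                        break;                    /* unterminated pattern */
                    s = end + 1;                  /* skip the pattern body */
                    expect = EXPECT_OPERATOR;     /* a match is itself a term */
                }
            }
            else if (isalnum((unsigned char)ch) || ch == '$' || ch == '_' || ch == ')') {
                expect = EXPECT_OPERATOR;         /* we just read (part of) a term */
                s++;
            }
            else {
                expect = EXPECT_TERM;             /* '=', '+', '(', ',' and friends */
                s++;
            }
        }
        out[n] = '\0';
    }

```

With this sketch, C<$x = $y / 2> yields one division and C<$x = /foo/> yields one pattern. It would wrongly read C<split /,/> as a division, because it does not know that C<split> is a function expecting a term. Handling cases like that is exactly what makes the real lexer complicated.
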
As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found
in F<op.c>, and will be examined later.

=item Optimization

Now the parsing stage is complete, and the finished tree represents
the operations that the Perl interpreter needs to perform to execute our
program. Some optimization has already happened along the way: constant
expressions such as C<3 + 4> are computed while the tree is being built
(see C<fold_constants> in F<op.c>). Next, Perl does a dry run over the
tree looking for further optimisations: the optimizer checks whether a
sequence of operations can be replaced with a single one. For instance,
to fetch the variable C<$foo>, instead of grabbing the glob C<*foo> and
looking at the scalar component, the optimizer fiddles the op tree to use
a function which directly looks up the scalar in question. The main
optimizer is C<peep> in F<op.c>, and many ops have their own optimizing
functions.

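Here is a minimal, self-contained C sketch of the constant-folding idea. An addition node whose children are both constants is replaced, bottom-up, by a single constant. The C<Node> type and C<fold> function are hypothetical. Perl's own ops and its folding code in F<op.c> look quite different, and this only mirrors the idea.

```c

    #include <stdlib.h>

    /* Hypothetical expression node: a constant, a variable, or an addition. */
    typedef enum { N_CONST, N_VAR, N_ADD } NodeKind;

    typedef struct Node {
        NodeKind kind;
        int value;                  /* used by N_CONST */
        const char *name;           /* used by N_VAR */
        struct Node *left, *right;  /* used by N_ADD */
    } Node;

    static Node *mk(NodeKind kind, int value, const char *name, Node *l, Node *r)
    {
        Node *n = malloc(sizeof *n);
        if (!n)
            abort();
        n->kind = kind;
        n->value = value;
        n->name = name;
        n->left = l;
        n->right = r;
        return n;
    }

    /* Fold bottom-up: an N_ADD whose operands are both constants is
       rewritten in place into a single N_CONST. */
    static Node *fold(Node *n)
    {
        if (n->kind != N_ADD)
            return n;
        n->left = fold(n->left);
        n->right = fold(n->right);
        if (n->left->kind == N_CONST && n->right->kind == N_CONST) {
            int sum = n->left->value + n->right->value;
            free(n->left);
            free(n->right);
            n->kind = N_CONST;
            n->value = sum;
            n->left = n->right = NULL;
        }
        return n;
    }

```

Folding C<3 + 4> leaves one constant node holding 7. Folding C<$x + 4> leaves the addition alone, because one operand is only known at run time.
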
=item Running

Now we're finally ready to go: we have compiled Perl byte code, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done by
these three innocent-looking lines:

    while ((PL_op = PL_op->op_ppaddr(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
specifies the function that will actually carry out the operation.
This function returns the next op in the sequence; this allows for
things like C<if>, which choose the next op dynamically at run time.
The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.

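The loop above can be modelled in a few lines of standalone C. This is a hypothetical sketch, not perl's code. C<OP>, C<op_other>, the single accumulator standing in for perl's stacks, and C<async_check> (a stand-in for C<PERL_ASYNC_CHECK>) are all assumptions made for illustration. What it shows is that each op's function returns the op to run next, so a conditional op can pick its successor at run time.

```c

    #include <stddef.h>

    /* Hypothetical toy op: a function pointer plus its possible successors. */
    typedef struct op OP;
    typedef OP *(*ppaddr_t)(OP *);

    struct op {
        ppaddr_t op_ppaddr;   /* function that performs this op */
        OP *op_next;          /* normal successor */
        OP *op_other;         /* alternative successor, for conditional ops */
        int arg;              /* operand for this toy */
    };

    static int acc;           /* one accumulator stands in for perl's stacks */
    static int async_checks;  /* counts how often the "signal check" ran */

    static OP *pp_const(OP *o) { acc = o->arg;  return o->op_next; }
    static OP *pp_incr(OP *o)  { acc += o->arg; return o->op_next; }

    /* Like "if": choose the next op dynamically from the current value. */
    static OP *pp_cond(OP *o)  { return acc ? o->op_next : o->op_other; }

    static void async_check(void) { async_checks++; }   /* PERL_ASYNC_CHECK stand-in */

    /* Run ops until one returns NULL, checking for "signals" in between. */
    static void runops(OP *op)
    {
        while ((op = op->op_ppaddr(op)))
            async_check();
    }

```

Consider a program that loads 0 and then reaches a C<pp_cond> op. That op takes its C<op_other> branch, so if that branch adds 100, the run ends with C<acc> equal to 100 after two checks.
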
The actual functions called are known as PP code, and they're spread
across four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the next
op. Calls to perl subs (and eval blocks) are handled within the same
runops loop, and do not consume extra space on the C stack. For example,
C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or C<CxEVAL> block
struct onto the context stack, which contains the address of the op
following the sub call or eval. They then return the first op of that sub
or eval block, so execution continues with that sub or block. Later, a
C<pp_leavesub> or C<pp_leavetry> op pops the C<CxSUB> or C<CxEVAL>,
retrieves the return op from it, and returns it.

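A small standalone C model shows why a sub call costs no C stack. It assumes a context stack that stores nothing but the return op. C<pp_entersub> and C<pp_leavesub> below just push or pop a frame and return an op, and C<runops> never recurses. The names mirror perl's, but the structures are hypothetical simplifications.

```c

    #include <assert.h>
    #include <stddef.h>

    typedef struct op OP;
    struct op {
        OP *(*op_ppaddr)(OP *);
        OP *op_next;     /* next op in the current sub */
        OP *sub_start;   /* entersub only: first op of the called sub */
        int arg;
    };

    #define CXSTACK_MAX 64
    static OP *cx_retop[CXSTACK_MAX];   /* toy context stack: return ops only */
    static int cxix = -1;               /* index of the top frame, -1 if empty */
    static int total;

    static OP *pp_add(OP *o) { total += o->arg; return o->op_next; }

    /* Push a frame remembering where to resume, then jump into the sub. */
    static OP *pp_entersub(OP *o)
    {
        assert(cxix + 1 < CXSTACK_MAX);
        cx_retop[++cxix] = o->op_next;
        return o->sub_start;
    }

    /* Pop the frame and resume the caller at the saved return op. */
    static OP *pp_leavesub(OP *o)
    {
        (void)o;
        assert(cxix >= 0);
        return cx_retop[cxix--];
    }

    /* One flat loop: sub calls never nest C function calls. */
    static void runops(OP *op)
    {
        while ((op = op->op_ppaddr(op)))
            ;
    }

```

A main program that adds 1, calls a sub that adds 5, and then adds 100 ends with a total of 106. The context stack is empty again afterwards.
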
=item Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the low-level
C<setjmp()>/C<longjmp()> C-library functions. These basically provide a
way to capture the current PC and SP registers and later restore them; i.e.
a C<longjmp()> continues at the point in code where a previous C<setjmp()>
was done, with anything further up on the C stack being lost. This is why
code should always save values using C<SAVE_FOO> rather than in auto
variables.

The perl core wraps C<setjmp()> etc. in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit> and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

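The following standalone C sketch models that rule with plain C<setjmp()> and C<longjmp()>. C<JMPENV>, C<top_env>, C<run_guarded>, C<toy_die> and C<toy_exit> are hypothetical simplifications of perl's C<JMPENV_PUSH>/C<JMPENV_JUMP> machinery. A guard pushes a jump buffer, and C<die> or C<exit> jumps back to it with code 2 or 3. Everything on the C stack in between is discarded.

```c

    #include <setjmp.h>
    #include <stddef.h>

    /* Hypothetical toy JMPENV: a chain of jump buffers. */
    typedef struct jmpenv {
        jmp_buf buf;
        struct jmpenv *prev;
    } JMPENV;

    static JMPENV *top_env;      /* stands in for PL_top_env */
    static int in_eval;          /* pretend there is (1) or isn't (0) an eval */
    static int reached_after_die;

    static void jmpenv_jump(int code) { longjmp(top_env->buf, code); }

    static void toy_die(void)  { jmpenv_jump(in_eval ? 3 : 2); }
    static void toy_exit(void) { jmpenv_jump(2); }   /* exit ignores eval */

    static void body_ok(void)   { }
    static void body_die(void)  { toy_die();  reached_after_die = 1; }
    static void body_exit(void) { toy_exit(); reached_after_die = 1; }

    /* Guard like perl_run(): push a JMPENV, run body, report how it ended
       (0 = normally, 2 = exit or uncaught die, 3 = die inside eval). */
    static int run_guarded(void (*body)(void))
    {
        JMPENV env;
        int ret;

        env.prev = top_env;
        top_env = &env;
        switch (setjmp(env.buf)) {
        case 0:  body(); ret = 0; break;  /* first return from setjmp */
        case 2:  ret = 2; break;          /* came back via longjmp(..., 2) */
        case 3:  ret = 3; break;          /* came back via longjmp(..., 3) */
        default: ret = -1; break;
        }
        top_env = env.prev;               /* pop the guard */
        return ret;
    }

```

The code after C<toy_die()> or C<toy_exit()> never runs, because C<longjmp> abandons those C frames. Using C<setjmp> directly as the C<switch> controlling expression keeps the call in a context the C standard permits.
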
At entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, perl does a C<JMPENV_PUSH>, then enters a runops
loop or whatever, and handles possible exception returns. For a 2 return,
final cleanup is performed, such as popping stacks and calling C<CHECK> or
C<END> blocks. Amongst other things, this is how scope cleanup still
occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is assigned
to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed. This normally
passes control back to the guard. In the case of C<perl_run> and
C<call_sv>, a non-null C<PL_restartop> triggers re-entry to the runops
loop. This is the normal way that C<die> or C<croak> is handled within an
C<eval>.

Sometimes ops are executed within an inner runops loop, such as when
running tie, sort or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a C<longjmp> right back to the guard in C<perl_run>, popping
both runops loops, which is clearly incorrect. One way to avoid this is for
the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in the inner
runops loop, but for efficiency reasons, perl in fact just sets a flag,
using C<CATCH_SET(TRUE)>. The C<pp_require>, C<pp_entereval> and
C<pp_entertry> ops check this flag, and if true, they call C<docatch>,
which does a C<JMPENV_PUSH> and starts a new runops level to execute the
code, rather than doing it on the current loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the inner
loop. When an exception is raised, C<docatch> compares the C<JMPENV>
level of the C<CxEVAL> with C<PL_top_env> and if they differ, just
re-throws the exception. In this way any inner loops get popped.

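In the same hypothetical style, this standalone sketch models the C<docatch> check. Each guard pushes its own jump buffer. The innermost eval remembers which guard was current when it was entered, in C<eval_env>; this is an assumption standing in for the level recorded in the C<CxEVAL>. A guard that catches a die meant for an outer level simply re-throws it with another C<longjmp>.

```c

    #include <setjmp.h>
    #include <stddef.h>

    typedef struct jmpenv {
        jmp_buf buf;
        struct jmpenv *prev;
    } JMPENV;

    static JMPENV *top_env;    /* innermost guard, like PL_top_env */
    static JMPENV *eval_env;   /* guard that was current when the eval was entered */
    static int rethrows;

    static void toy_die(void) { longjmp(top_env->buf, 3); }

    /* docatch-like guard: run body under a new JMPENV. A die caught here
       that belongs to an eval at an outer level is passed outwards. */
    static int guard(void (*body)(void))
    {
        JMPENV env;

        env.prev = top_env;
        top_env = &env;
        if (setjmp(env.buf) == 0) {
            body();
            top_env = env.prev;
            return 0;
        }
        top_env = env.prev;              /* pop this level */
        if (eval_env != &env) {          /* the eval lives further out */
            rethrows++;
            longjmp(top_env->buf, 3);    /* keep unwinding */
        }
        return 3;                        /* caught at the right level */
    }

    /* Like line 4 of the example below: a die running in an inner loop
       whose eval was entered at the outermost level. */
    static void inner(void) { toy_die(); }
    static void outer(void)
    {
        eval_env = top_env;              /* eval entered at this level */
        guard(inner);                    /* inner runops level */
    }

```

Running C<guard(outer)> re-throws once from the inner guard and is caught by the outer one. This is the same unwinding that the walk-through below traces through real perl functions.
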
Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH> then
enters a runops loop. This loop executes the eval and tie ops on line 1,
with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops loop
to execute the body of C<TIEARRAY>. When it executes the entertry op on
line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch> which
does a C<JMPENV_PUSH> and starts a third runops loop, which then executes
the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)
      CX 1: EVAL   => *
                      retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns to
the top C<docatch>. This then starts another third-level runops loop,
which executes the nextstate, pushmark and die ops on line 4. At the point
that the second C<pp_die> is called, the C call stack looks exactly like
that above, even though we are no longer within an inner eval; this is
because of the optimization mentioned earlier. However, the context stack
now looks like this, i.e. with the top C<CxEVAL> popped:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
                      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
                      retop=(null)

The die on line 4 pops the context stack back down to the C<CxEVAL>,
leaving it as:

    STACK 0: MAIN
      CX 0: BLOCK  =>

As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
C<JMPENV_JUMP(3)> done, which pops the C stack back to the C<docatch>:

    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
and the C stack unwinds to:

    perl_run
    main

Because C<PL_restartop> is non-null, C<run_body> starts a new runops loop
and execution continues.

=back

=head2 Internal Variable Types

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables, but
also any constants in the code, as well as some structures completely
internal to Perl. The symbol table, for instance, is an ordinary Perl
hash. Your code is represented by an SV as it's read into the parser;
any program files you call are opened via ordinary Perl filehandles, and
so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

      % perl -MDevel::Peek -e 'Dump("hello")'
      1 SV = PV(0xa041450) at 0xa04ecbc
      2   REFCNT = 1
      3   FLAGS = (POK,READONLY,pPOK)
      4   PV = 0xa0484e0 "hello"\0
      5   CUR = 5
      6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practice, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV: it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Line 4 gives the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string; note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

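As a rough model of the C<CUR>/C<LEN> distinction, here is a hypothetical C struct with an C<SvGROW>-like helper. C<ToyPV> is not perl's real SV layout. Perl itself may round allocations up, so a real C<LEN> can be larger than this sketch's exact C<cur + 1>.

```c

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for an SV's string body. */
    typedef struct {
        char  *pv;    /* the buffer, like SvPVX */
        size_t cur;   /* bytes in use, excluding the NUL, like SvCUR */
        size_t len;   /* bytes allocated, like SvLEN */
    } ToyPV;

    /* Like SvGROW: make sure at least newlen bytes are allocated. */
    static char *toy_grow(ToyPV *sv, size_t newlen)
    {
        if (newlen > sv->len) {
            char *p = realloc(sv->pv, newlen);
            if (!p)
                abort();
            sv->pv = p;
            sv->len = newlen;
        }
        return sv->pv;
    }

    static void toy_setpv(ToyPV *sv, const char *s)
    {
        size_t n = strlen(s);
        toy_grow(sv, n + 1);          /* room for the NUL terminator */
        memcpy(sv->pv, s, n + 1);
        sv->cur = n;                  /* CUR excludes the NUL */
    }

```

Setting C<"hello"> gives a C<cur> of 5 and a C<len> of 6, matching the C<Devel::Peek> dump above. Asking to grow to a smaller size leaves the buffer alone.
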
You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:

     1  void
     2  Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3  {
     4      STRLEN tlen;
     5      char *junk;

     6      junk = SvPV_force(sv, tlen);
     7      SvGROW(sv, tlen + len + 1);
     8      if (ptr == junk)
     9          ptr = SvPVX(sv);
    10      Move(ptr,SvPVX(sv)+tlen,len,char);
    11      SvCUR(sv) += len;
    12      *SvEND(sv) = '\0';
    13      (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14      SvTAINT(sv);
    15  }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to accommodate
the old string, the new string and the null terminator. If C<LEN> isn't
big enough, C<SvGROW> will reallocate space for us.

Lines 8 and 9 handle the case where C<ptr> is the same as C<junk>, that
is, where we are appending the SV's own string to itself. Because
C<SvGROW> may have moved the buffer, we fetch its current address again
with C<SvPVX>, which gives the address of the PV in the SV.

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around; we move the string C<ptr> to the end of the PV, which is
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended the
string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF-8-aware
version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
tainted if taint mode is turned on and some input to the current
expression was tainted.

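The same steps can be replayed in standalone C on a hypothetical C<ToyPV> struct, which again is not perl's real SV. This sketch records whether the source is the buffer's own string I<before> growing. After a C<realloc>, the old pointer value is no longer safe to compare. The sketch uses C<memmove> because perl's C<Move> macro is built on it.

```c

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for an SV's string body. */
    typedef struct {
        char  *pv;    /* buffer, like SvPVX */
        size_t cur;   /* bytes in use, excluding the NUL, like SvCUR */
        size_t len;   /* bytes allocated, like SvLEN */
    } ToyPV;

    static void toy_grow(ToyPV *sv, size_t newlen)
    {
        if (newlen > sv->len) {
            char *p = realloc(sv->pv, newlen);
            if (!p)
                abort();
            sv->pv = p;
            sv->len = newlen;
        }
    }

    /* Append len bytes from ptr, following the steps of Perl_sv_catpvn. */
    static void toy_catpvn(ToyPV *sv, const char *ptr, size_t len)
    {
        int self = (sv->pv != NULL && ptr == sv->pv);  /* appending our own string? */

        toy_grow(sv, sv->cur + len + 1);     /* old + new + NUL, as in SvGROW */
        if (self)
            ptr = sv->pv;                    /* the buffer may have moved */
        memmove(sv->pv + sv->cur, ptr, len); /* what Move() does */
        sv->cur += len;                      /* SvCUR(sv) += len */
        sv->pv[sv->cur] = '\0';              /* *SvEND(sv) = '\0' */
    }

```

Appending C<"ab"> to an empty string and then appending the string to itself yields C<"abab">. That works even though the second call passes the SV's own buffer as the source.
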
AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head2 Op Trees

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing, and
it's the sequence of operations that Perl goes through to execute your
program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the built-in
functions and operators are ops, and there is a series of ops which
deal with concepts the interpreter needs internally: entering and
leaving a block, ending a statement, fetching a variable, and so on.

The op tree is connected in two ways: you can imagine that there are two
"routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
and L<B::Debug|B::Debug> do.

Let's have a look at how Perl sees C<$a = $b + $c>:

     % perl -MO=Terse -e '$a=$b+$c'
     1  LISTOP (0x8179888) leave
     2      OP (0x81798b0) enter
     3      COP (0x8179850) nextstate
     4      BINOP (0x8179828) sassign
     5          BINOP (0x8179800) add [1]
     6              UNOP (0x81796e0) null [15]
     7                  SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
     8              UNOP (0x81797e0) null [15]
     9                  SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
    10          UNOP (0x816b4f0) null [15]
    11              SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> (scalar assignment), and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>,
the optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's easier
just to replace the redundant operation with the null op. Originally,
the tree would have looked like this:

    10          UNOP (0x816b4f0) rv2sv [15]
    11              SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add together
two C<gvsv>s.

Now, what's this about?

     1  LISTOP (0x8179888) leave
     2      OP (0x81798b0) enter
     3      COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>: each statement begins with a C<nextstate> op, and the
ops to be performed for that statement follow it as its siblings under
C<leave>, as C<sassign> does here. C<enter> is a single op which
functions as a marker.

That's how Perl parsed the program, from top to bottom:

                        Program
                           |
                       Statement
                           |
                           =
                          / \
                         /   \
                        $a   +
                            / \
                          $b   $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which points
to the next op to be run, so following these pointers tells us how perl
executes the code. We can traverse the tree in this order using
the C<exec> option to C<B::Terse>:

     % perl -MO=Terse,exec -e '$a=$b+$c'
     1  OP (0x8179928) enter
     2  COP (0x81798c8) nextstate
     3  SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
     4  SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
     5  BINOP (0x8179878) add [1]
     6  SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
     7  BINOP (0x81798a0) sassign
     8  LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.

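The two orders can be reproduced with a hypothetical toy op in C that carries both kinds of link. A first-child/next-sibling tree gives parse order. An C<op_next> chain, threaded through the tree children-first, gives execution order. This is only a model; perl's real C<OP> structure and the code in F<op.c> that sets up C<op_next> differ in detail.

```c

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical toy op with both links. */
    typedef struct op {
        const char *name;
        struct op *first;     /* first child (tree) */
        struct op *sibling;   /* next sibling (tree) */
        struct op *op_next;   /* execution order */
    } OP;

    /* Parse order: each op, then its children (pre-order). Appends names to out. */
    static void parse_order(const OP *o, char *out)
    {
        for (; o; o = o->sibling) {
            strcat(out, o->name);
            strcat(out, " ");
            parse_order(o->first, out);
        }
    }

    static OP *exec_start;  /* first op to execute */
    static OP *exec_last;   /* op most recently added to the op_next chain */

    /* Thread op_next children-first (post-order): operands run before the
       op that consumes them. */
    static void thread_exec(OP *o)
    {
        OP *kid;
        for (kid = o->first; kid; kid = kid->sibling)
            thread_exec(kid);
        if (exec_last)
            exec_last->op_next = o;
        else
            exec_start = o;
        exec_last = o;
        o->op_next = NULL;
    }

    /* Execution order: follow op_next from the start. */
    static void exec_order(const OP *o, char *out)
    {
        for (; o; o = o->op_next) {
            strcat(out, o->name);
            strcat(out, " ");
        }
    }

```

Consider the C<sassign> subtree for C<$a = $b + $c>. Parse order gives C<sassign add gvsv_b gvsv_c gvsv_a>. Execution order gives C<gvsv_b gvsv_c add gvsv_a sassign>, which matches the middle of the C<Terse,exec> listing above.
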
The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar. Let's take the
piece we need to construct the tree for C<$a = $b + $c>:

    1 term : term ASSIGNOP term
    2        { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3      | term ADDOP term
    4        { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works: you're
fed certain things by the tokeniser, which generally end up in upper
case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
"terminal symbols", because you can't get any simpler than them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal symbols",
are generally placed in lower case. C<term> here is a non-terminal
symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this code
which contributes to the op tree.

            | term ADDOP term
            { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

This creates a new binary op, and feeds it a number of variables. The
variables refer to the symbols matched by the rule: C<$1> is the value of
the first one (here, the left-hand C<term>), C<$2> the second, and so on;
think regular expression backreferences. C<$$> is the op returned from
this reduction. So, we call C<newBINOP> to create a new binary operator.
The first parameter to C<newBINOP>, a function in F<op.c>, is the op
type. It's an addition operator, so we want the op type for addition. We
could specify this directly, but the tokeniser has already supplied it as
the value of C<ADDOP>, the second symbol in the rule, so we use C<$2>.
The second parameter is the op's flags: 0 means "nothing special". Then
the things to add: the left and right hand side of our expression, in
scalar context.

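To see what C<$1>, C<$2>, C<$3> and C<$$> mean mechanically, here is a hypothetical C sketch of a yacc-style value stack and the action for C<term ADDOP term>. C<OP>, C<new_binop>, C<new_const>, C<shift> and C<reduce_add> are made up for illustration; they are not perl's C<newBINOP>. The C<scalar()> calls are left out.

```c

    #include <stdlib.h>

    enum { TOY_CONST = 1, TOY_ADD = 2 };   /* hypothetical op types */

    typedef struct op {
        int type;
        int flags;
        struct op *left, *right;
        int value;                         /* TOY_CONST only */
    } OP;

    /* A semantic value: an op for a term, or an int for a token such as ADDOP. */
    typedef union { OP *opval; int ival; } YYSTYPE;

    static OP *new_op(int type, int flags, OP *l, OP *r, int value)
    {
        OP *o = malloc(sizeof *o);
        if (!o)
            abort();
        o->type = type;
        o->flags = flags;
        o->left = l;
        o->right = r;
        o->value = value;
        return o;
    }

    static OP *new_const(int v) { return new_op(TOY_CONST, 0, NULL, NULL, v); }

    static OP *new_binop(int type, int flags, OP *l, OP *r)
    {
        return new_op(type, flags, l, r, 0);
    }

    static YYSTYPE vstack[32];   /* semantic values of the symbols matched so far */
    static int vtop = -1;

    static void shift(YYSTYPE v) { vstack[++vtop] = v; }

    /* Reduce "term ADDOP term": the three matched values are $1..$3, and
       the action's result, $$, replaces them on the stack. */
    static void reduce_add(void)
    {
        YYSTYPE s1 = vstack[vtop - 2], s2 = vstack[vtop - 1], s3 = vstack[vtop];
        YYSTYPE result;

        result.opval = new_binop(s2.ival, 0, s1.opval, s3.opval); /* $$ = newBINOP($2, 0, $1, $3) */
        vtop -= 2;
        vstack[vtop] = result;
    }

```

Shifting a constant 3, an C<ADDOP> token carrying the addition type, and a constant 4, then reducing, leaves one value on the stack. That value is an addition op whose children are the two constants.
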
=head2 Stacks

When perl executes something like the C<add> op, how does it pass on its
results to the next op? The answer is: through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=over 3

=item Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the result
back onto the stack. This is how, for instance, the cosine operator
works:

      NV value;
      value = POPn;
      value = Perl_cos(value);
      XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
the result back as an NV. The C<X> in C<XPUSHn> means that the stack
should be extended if necessary; it can't be necessary here, because we
know there's room for one more item on the stack, since we've just
removed one! The C<XPUSH*> macros at least guarantee safety.

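Here is a standalone C model of the pop-compute-push pattern, along with the in-place C<SETi(-TOPi)> style. It is a hypothetical sketch. The stack holds plain C<double>s rather than SV pointers, and C<toy_extend> plays the role of the extension done by C<XPUSHn>. Negation is used instead of cosine so that the example needs no maths library.

```c

    #include <stdlib.h>

    /* Hypothetical toy argument stack of NVs (doubles). */
    typedef struct {
        double *base;
        size_t  sp;    /* number of items currently on the stack */
        size_t  max;   /* allocated slots */
    } ToyStack;

    /* Make room for n more items, like EXTEND. */
    static void toy_extend(ToyStack *st, size_t n)
    {
        if (st->sp + n > st->max) {
            size_t newmax = (st->sp + n) * 2;
            double *p = realloc(st->base, newmax * sizeof *p);
            if (!p)
                abort();
            st->base = p;
            st->max = newmax;
        }
    }

    static double toy_popn(ToyStack *st) { return st->base[--st->sp]; }

    static void toy_xpushn(ToyStack *st, double v)
    {
        toy_extend(st, 1);            /* the "X": extend if necessary */
        st->base[st->sp++] = v;
    }

    /* Pop-compute-push, shaped like the cosine op. */
    static void toy_pp_negate(ToyStack *st)
    {
        double value = toy_popn(st);
        toy_xpushn(st, -value);
    }

    /* In-place version: overwrite the top entry, like SETi(-TOPi). */
    static void toy_negate_top(ToyStack *st)
    {
        st->base[st->sp - 1] = -st->base[st->sp - 1];
    }

```

Both versions leave exactly one item on the stack. The in-place one simply never changes the stack depth.
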
Alternatively, you can fiddle with the stack directly: C<SP> points at
the top of your portion of the stack, and C<TOP*> gives you the top
SV/IV/NV/etc. on the stack. So, for instance, to do unary negation of an
integer:

     SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs; see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.

=item Mark stack

I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments intended for
the called function, and not (necessarily) let it get at your own data.
The way we do this is to have a "virtual" bottom-of-stack, exposed to
each function. The mark stack keeps bookmarks to locations in the
argument stack usable by each function. For instance, when dealing with a
tied variable (internally, something with "P" magic), Perl has to call
methods for accesses to the tied variables. However, we need to separate
the arguments exposed to the method from the arguments exposed to the
original function: the store or fetch or whatever it may be. Here's
roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:

     1  PUSHMARK(SP);
     2  EXTEND(SP,2);
     3  PUSHs(SvTIED_obj((SV*)av, mg));
     4  PUSHs(val);
     5  PUTBACK;
     6  ENTER;
     7  call_method("PUSH", G_SCALAR|G_DISCARD);
     8  LEAVE;

Let's examine the whole implementation, for practice:

     1  PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This is
so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2  EXTEND(SP,2);
     3  PUSHs(SvTIED_obj((SV*)av, mg));
     4  PUSHs(val);

We're going to add two more items onto the argument stack, so C<EXTEND>
first makes sure there is room for them. When you have a tied array, the
C<PUSH> subroutine receives the object and the value to be pushed, and
that's exactly what we have here: the tied object, retrieved with
C<SvTIED_obj>, and the value, the SV C<val>.

     5  PUTBACK;

Next we tell Perl to update the global stack pointer from our internal
variable: C<dSP> only gave us a local copy, not a reference to the global.

     6  ENTER;
     7  call_method("PUSH", G_SCALAR|G_DISCARD);
     8  LEAVE;

C<ENTER> and C<LEAVE> localise a block of code: they make sure that all
variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value. The C<call_method()> function
removes the top element of the mark stack, so there is nothing for
the caller to clean up.

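Here is a standalone C sketch of that bookkeeping. The caller pushes a mark, then its arguments, and the callee uses the mark to see only its own items. The callee then pops the mark, as C<call_method()> does. The arrays and function names are hypothetical; perl's real macros are C<PUSHMARK>, C<POPMARK> and friends.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical toy argument stack and mark stack. */
    static const char *arg_stack[64];
    static size_t sp;                 /* number of items on the argument stack */
    static size_t mark_stack[16];
    static size_t markix;             /* number of marks */

    static void pushmark(void) { assert(markix < 16); mark_stack[markix++] = sp; }
    static void pushs(const char *sv) { assert(sp < 64); arg_stack[sp++] = sv; }

    static size_t callee_items;       /* what the callee saw */
    static const char *callee_first;

    /* Callee: pop the mark, look only at the items above it, then discard
       them (as with G_DISCARD), leaving the caller's data untouched. */
    static void toy_call_method(void)
    {
        size_t mark = mark_stack[--markix];
        callee_items = sp - mark;
        callee_first = callee_items ? arg_stack[mark] : NULL;
        sp = mark;
    }

```

The caller's own item stays below the mark. The callee sees exactly two items, the object and the value, and the mark stack is empty again afterwards.
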
=item Save stack

C doesn't have a concept of dynamic scope like Perl's C<local>, so perl
provides one. We've seen that C<ENTER> and C<LEAVE> are used as scoping
braces; the save stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/Localising Changes> for how to use the save stack.

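The save stack idea fits in a short standalone C sketch. Saving a location pushes its address and old value. C<toy_enter> remembers how deep the save stack was, and C<toy_leave> restores everything saved since then, newest first. The names are hypothetical; in perl the corresponding pieces are C<ENTER>, C<LEAVE> and C<SAVE*> macros such as C<SAVEINT>.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical save stack entry: a location and the value it held. */
    typedef struct { int *where; int old; } SaveEntry;

    static SaveEntry savestack[64];
    static size_t savestack_ix;      /* entries in use */
    static size_t scopestack[16];    /* savestack_ix at each toy_enter */
    static size_t scopestack_ix;

    /* Like ENTER: remember where this scope's saves begin. */
    static void toy_enter(void)
    {
        assert(scopestack_ix < 16);
        scopestack[scopestack_ix++] = savestack_ix;
    }

    /* Loosely like SAVEINT: record the current value before it is changed. */
    static void toy_save_int(int *where)
    {
        assert(savestack_ix < 64);
        savestack[savestack_ix].where = where;
        savestack[savestack_ix].old = *where;
        savestack_ix++;
    }

    /* Like LEAVE: undo every save made since the matching toy_enter,
       newest first. */
    static void toy_leave(void)
    {
        size_t base;
        assert(scopestack_ix > 0);
        base = scopestack[--scopestack_ix];
        while (savestack_ix > base) {
            SaveEntry *e = &savestack[--savestack_ix];
            *e->where = e->old;
        }
    }

```

Nesting two scopes mirrors two nested C<local> blocks. After the inner C<toy_leave>, the variable is back to its outer-scope value, and after the outer one, back to its original value.
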
=back

=head2 Millions of Macros

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand; others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

       1  PP(pp_add)
       2  {
       3      dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
       4      {
       5        dPOPTOPnnrl_ul;
       6        SETn( left + right );
       7        RETURN;
       8      }
       9  }

Every line here (apart from the braces, of course) contains a macro. The
first line sets up the function declaration as Perl expects for PP code;
line 3 sets up variable declarations for the argument stack and the
target, the return value of the operation. Finally, it checks whether
the addition operation is overloaded; if so, the appropriate subroutine
is called.

Line 5 is another variable declaration (all these declaration macros
start with C<d>), which takes the top two NVs off the argument stack
(hence C<nn>) and puts them into the variables C<right> and C<left>,
hence the C<rl>. These are the two operands to the addition operator.
Next, we call C<SETn> to set the NV of the return value to the result of
adding the two values. This done, we return: the C<RETURN> macro makes
sure that our return value is properly handled, and we pass the next
operator to run back to the main run loop.

1240 | Most of these macros are explained in L<perlapi>, and some of the more | |
1241 | important ones are explained in L<perlxs> as well. Pay special attention | |
1242 | to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on | |
1243 | the C<[pad]THX_?> macros. | |
1244 | ||
52d59bef JH |
1245 | =head2 The .i Targets |
1246 | ||
1247 | You can expand the macros in a F<foo.c> file by saying | |
1248 | ||
1249 | make foo.i | |
1250 | ||
1251 | which will expand the macros using cpp. Don't be scared by the results. | |
1252 | ||
cce04beb | 1253 | =head1 TESTING |
955fec6b | 1254 | |
cce04beb DG |
1255 | Every module and built-in function has an associated test file (or |
1256 | should...). If you add or change functionality, you have to write a | |
1257 | test. If you fix a bug, you have to write a test so that bug never | |
1258 | comes back. If you alter the docs, it would be nice to test what the | |
1259 | new documentation says. | |
955fec6b | 1260 | |
cce04beb DG |
1261 | In short, if you submit a patch you probably also have to patch the |
1262 | tests. | |
955fec6b | 1263 | |
cce04beb | 1264 | =head2 Where to find test files |
955fec6b | 1265 | |
cce04beb DG |
1266 | For modules, the test file is right next to the module itself. |
1267 | F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation, | |
1268 | so there are some snags (and it would be wonderful for you to brush | |
1269 | them out), but it basically works that way. Everything else lives in | |
1270 | F<t/>. | |
955fec6b | 1271 | |
cce04beb DG |
1272 | Testing of warning messages is often separately done by using expect scripts in |
1273 | F<t/lib/warnings>. This is because much of the setup for them is already done | |
1274 | for you. | |
955fec6b | 1275 | |
cce04beb DG |
1276 | If you add a new test directory under F<t/>, it is imperative that you |
1277 | add that directory to F<t/harness> and F<t/TEST>. | |
955fec6b | 1278 | |
cce04beb | 1279 | =over 3 |
955fec6b | 1280 | |
cce04beb | 1281 | =item F<t/base/> |
955fec6b | 1282 | |
cce04beb DG |
1283 | Testing of the absolute basic functionality of Perl. Things like |
1284 | C<if>, basic file reads and writes, simple regexes, etc. These are | |
1285 | run first in the test suite and if any of them fail, something is | |
1286 | I<really> broken. | |
955fec6b | 1287 | |
cce04beb | 1288 | =item F<t/cmd/> |
955fec6b | 1289 | |
cce04beb DG |
1290 | These test the basic control structures, C<if/else>, C<while>, |
1291 | subroutines, etc. | |
955fec6b | 1292 | |
cce04beb | 1293 | =item F<t/comp/> |
955fec6b | 1294 | |
cce04beb | 1295 | Tests basic issues of how Perl parses and compiles itself. |
955fec6b | 1296 | |
cce04beb | 1297 | =item F<t/io/> |
955fec6b | 1298 | |
cce04beb | 1299 | Tests for built-in IO functions, including command line arguments. |
955fec6b | 1300 | |
cce04beb | 1301 | =item F<t/lib/> |
955fec6b | 1302 | |
cce04beb DG |
1303 | The old home for the module tests; you shouldn't put anything new in |
1304 | here. There are still some bits and pieces hanging around in here | |
1305 | that need to be moved. Perhaps you could move them? Thanks! | |
955fec6b | 1306 | |
cce04beb | 1307 | =item F<t/mro/> |
955fec6b | 1308 | |
cce04beb DG |
1309 | Tests for perl's method resolution order implementations |
1310 | (see L<mro>). | |
955fec6b | 1311 | |
cce04beb | 1312 | =item F<t/op/> |
955fec6b | 1313 | |
cce04beb DG |
1314 | Tests for perl's built-in functions that don't fit into any of the |
1315 | other directories. | |
955fec6b | 1316 | |
cce04beb | 1317 | =item F<t/re/> |
955fec6b | 1318 | |
cce04beb DG |
1319 | Tests for regex-related functions or behaviour. (These used to live |
1320 | in F<t/op>.) | |
955fec6b | 1321 | |
cce04beb | 1322 | =item F<t/run/> |
955fec6b | 1323 | |
cce04beb DG |
1324 | Testing features of how perl actually runs, including exit codes and |
1325 | handling of PERL* environment variables. | |
955fec6b | 1326 | |
cce04beb | 1327 | =item F<t/uni/> |
955fec6b | 1328 | |
cce04beb | 1329 | Tests for the core support of Unicode. |
955fec6b | 1330 | |
cce04beb | 1331 | =item F<t/win32/> |
955fec6b | 1332 | |
cce04beb | 1333 | Windows-specific tests. |
955fec6b | 1334 | |
cce04beb | 1335 | =item F<t/x2p> |
955fec6b | 1336 | |
cce04beb | 1337 | A test suite for the s2p converter. |
955fec6b | 1338 | |
cce04beb | 1339 | =back |
955fec6b | 1340 | |
cce04beb DG |
1341 | The core uses the same testing style as the rest of Perl, a simple |
1342 | "ok/not ok" run through Test::Harness, but there are a few special | |
1343 | considerations. | |
955fec6b | 1344 | |
cce04beb DG |
1345 | There are three ways to write a test in the core: L<Test::More>, |
1346 | F<t/test.pl> and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The | |
1347 | decision of which to use depends on what part of the test suite you're | |
1348 | working on. This is a measure to prevent a high-level failure (such | |
1349 | as Config.pm breaking) from causing basic functionality tests to fail. | |
1350 | If you write your own test, use the L<Test Anything Protocol|TAP>. | |
955fec6b | 1351 | |
cce04beb | 1352 | =over 4 |
955fec6b | 1353 | |
cce04beb | 1354 | =item t/base t/comp |
955fec6b | 1355 | |
cce04beb DG |
1356 | Since we don't know if require works, or even subroutines, use ad hoc |
1357 | tests for these two. Step carefully to avoid using the feature being | |
1358 | tested. | |
955fec6b | 1359 | |
cce04beb | 1360 | =item t/cmd t/run t/io t/op |
955fec6b | 1361 | |
cce04beb DG |
1362 | Now that basic require() and subroutines are tested, you can use the |
1363 | t/test.pl library which emulates the important features of Test::More | |
1364 | while using a minimum of core features. | |
955fec6b | 1365 | |
cce04beb DG |
1366 | You can also conditionally use certain libraries like Config, but be |
1367 | sure to skip the test gracefully if it's not there. | |
955fec6b | 1368 | |
cce04beb | 1369 | =item t/lib ext lib |
955fec6b | 1370 | |
cce04beb DG |
1371 | Now that the core of Perl is tested, Test::More can be used. You can |
1372 | also use the full suite of core modules in the tests. | |
a422fd2d | 1373 | |
cce04beb | 1374 | =back |
a422fd2d | 1375 | |
cce04beb DG |
1376 | When you say "make test", Perl uses the F<t/TEST> program to run the |
1377 | test suite (except under Win32, where it uses F<t/harness> instead). | |
1378 | All tests are run from the F<t/> directory, B<not> the directory | |
1379 | which contains the test. This causes some problems with the tests | |
1380 | in F<lib/>, so here's an opportunity for some patching. | |
a422fd2d | 1381 | |
cce04beb DG |
1382 | You must be triply conscious of cross-platform concerns. This usually |
1383 | boils down to using File::Spec and avoiding things like C<fork()> and | |
1384 | C<system()> unless absolutely necessary. | |
955fec6b | 1385 | |
cce04beb | 1386 | =head2 Special Make Test Targets |
a422fd2d | 1387 | |
cce04beb DG |
1388 | There are various special make targets that can be used to test Perl |
1389 | slightly differently than the standard "test" target. Not all of them | |
1390 | are expected to give a 100% success rate. Many of them have several | |
1391 | aliases, and many of them are not available on certain operating | |
1392 | systems. | |
a422fd2d | 1393 | |
cce04beb | 1394 | =over 4 |
13a2d996 | 1395 | |
cce04beb | 1396 | =item coretest |
a422fd2d | 1397 | |
cce04beb | 1398 | Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests). |
a422fd2d | 1399 | |
cce04beb | 1400 | (Not available on Win32) |
a422fd2d | 1401 | |
cce04beb | 1402 | =item test.deparse |
a422fd2d | 1403 | |
cce04beb | 1404 | Run all the tests through B::Deparse. Not all tests will succeed. |
a422fd2d | 1405 | |
cce04beb | 1406 | (Not available on Win32) |
a422fd2d | 1407 | |
cce04beb | 1408 | =item test.taintwarn |
a422fd2d | 1409 | |
cce04beb DG |
1410 | Run all tests with the B<-t> command-line switch. Not all tests |
1411 | are expected to succeed (until they're specifically fixed, of course). | |
a422fd2d | 1412 | |
cce04beb | 1413 | (Not available on Win32) |
a422fd2d | 1414 | |
cce04beb | 1415 | =item minitest |
955fec6b | 1416 | |
cce04beb DG |
1417 | Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>, |
1418 | F<t/op>, F<t/uni> and F<t/mro> tests. | |
955fec6b | 1419 | |
cce04beb | 1420 | =item test.valgrind check.valgrind utest.valgrind ucheck.valgrind |
a422fd2d | 1421 | |
cce04beb DG |
1422 | (Only in Linux) Run all the tests using the memory leak + naughty |
1423 | memory access tool "valgrind". The log files will be named | |
1424 | F<testname.valgrind>. | |
a422fd2d | 1425 | |
cce04beb | 1426 | =item test.third check.third utest.third ucheck.third |
a422fd2d | 1427 | |
cce04beb DG |
1428 | (Only in Tru64) Run all the tests using the memory leak + naughty |
1429 | memory access tool "Third Degree". The log files will be named | |
1430 | F<perl.3log.testname>. | |
a422fd2d | 1431 | |
cce04beb | 1432 | =item test.torture torturetest |
a422fd2d | 1433 | |
cce04beb DG |
1434 | Run all the usual tests and some extra tests. As of Perl 5.8.0 the |
1435 | only extra tests are Abigail's JAPHs, F<t/japh/abigail.t>. | |
a422fd2d | 1436 | |
cce04beb DG |
1437 | You can also run the torture test with F<t/harness> by giving |
1438 | the C<-torture> argument to F<t/harness>. | |
a422fd2d | 1439 | |
cce04beb | 1440 | =item utest ucheck test.utf8 check.utf8 |
a422fd2d | 1441 | |
cce04beb | 1442 | Run all the tests with -Mutf8. Not all tests will succeed. |
a422fd2d | 1443 | |
cce04beb | 1444 | (Not available on Win32) |
a422fd2d | 1445 | |
cce04beb | 1446 | =item minitest.utf16 test.utf16 |
a422fd2d | 1447 | |
cce04beb DG |
1448 | Runs the tests with UTF-16 encoded scripts, encoded with different |
1449 | versions of this encoding. | |
a422fd2d | 1450 | |
cce04beb DG |
1451 | C<make utest.utf16> runs the test suite with a combination of C<-utf8> and |
1452 | C<-utf16> arguments to F<t/TEST>. | |
a422fd2d | 1453 | |
cce04beb | 1454 | (Not available on Win32) |
a422fd2d | 1455 | |
cce04beb | 1456 | =item test_harness |
a422fd2d | 1457 | |
cce04beb DG |
1458 | Run the test suite with the F<t/harness> controlling program, instead of |
1459 | F<t/TEST>. F<t/harness> is more sophisticated, and uses the | |
1460 | L<Test::Harness> module, so using this test target assumes that perl | |
1461 | mostly works. The main advantage for our purposes is that it prints a | |
1462 | detailed summary of failed tests at the end. Also, unlike F<t/TEST>, it | |
1463 | doesn't redirect stderr to stdout. | |
a422fd2d | 1464 | |
cce04beb DG |
1465 | Note that under Win32 F<t/harness> is always used instead of F<t/TEST>, so |
1466 | there is no special "test_harness" target. | |
a422fd2d | 1467 | |
cce04beb DG |
1468 | Under Win32's "test" target you may use the TEST_SWITCHES and TEST_FILES |
1469 | environment variables to control the behaviour of F<t/harness>. This means | |
1470 | you can say | |
a422fd2d | 1471 | |
cce04beb DG |
1472 | nmake test TEST_FILES="op/*.t" |
1473 | nmake test TEST_SWITCHES="-torture" TEST_FILES="op/*.t" | |
a422fd2d | 1474 | |
cce04beb | 1475 | =item Parallel tests |
a422fd2d | 1476 | |
cce04beb DG |
1477 | The core distribution can now run its regression tests in parallel on |
1478 | Unix-like platforms. Instead of running C<make test>, set C<TEST_JOBS> in | |
1479 | your environment to the number of tests to run in parallel, and run | |
1480 | C<make test_harness>. On a Bourne-like shell, this can be done as | |
a422fd2d | 1481 | |
cce04beb | 1482 | TEST_JOBS=3 make test_harness # Run 3 tests in parallel |
a422fd2d | 1483 | |
cce04beb DG |
1484 | An environment variable is used, rather than parallel make itself, because |
1485 | L<TAP::Harness> needs to be able to schedule individual non-conflicting test | |
1486 | scripts itself, and there is no standard interface to C<make> utilities to | |
1487 | interact with their job schedulers. | |
a422fd2d | 1488 | |
cce04beb DG |
1489 | Note that currently some test scripts may fail when run in parallel (most |
1490 | notably C<ext/IO/t/io_dir.t>). If necessary, run just the failing scripts | |
1491 | again sequentially and see if the failures go away. | |

1492 | =item test-notty test_notty | |
1493 | ||
1494 | Sets PERL_SKIP_TTY_TEST to true before running the normal test suite. | |
a422fd2d | 1495 | |
ffc145e8 RK |
1496 | =back |
1497 | ||
cce04beb | 1498 | =head2 Running tests by hand |
52d59bef | 1499 | |
cce04beb DG |
1500 | You can run part of the test suite by hand by using one of the following |
1501 | commands from the F<t/> directory: | |
a422fd2d | 1502 | |
cce04beb | 1503 | ./perl -I../lib TEST list-of-.t-files |
ea031e66 | 1504 | |
cce04beb | 1505 | or |
a422fd2d | 1506 | |
cce04beb | 1507 | ./perl -I../lib harness list-of-.t-files |
a422fd2d | 1508 | |
cce04beb | 1509 | (If you don't specify test scripts, the whole test suite will be run.)
a422fd2d | 1510 | |
cce04beb | 1511 | =head3 Using t/harness for testing |
a422fd2d | 1512 | |
cce04beb DG |
1513 | If you use C<harness> for testing you have several command line options |
1514 | available to you. The arguments are as follows, and are in the order | |
1515 | that they must appear if used together. | |
a422fd2d | 1516 | |
cce04beb DG |
1517 | harness -v -torture -re=pattern LIST OF FILES TO TEST |
1518 | harness -v -torture -re LIST OF PATTERNS TO MATCH | |
a422fd2d | 1519 | |
cce04beb DG |
1520 | If C<LIST OF FILES TO TEST> is omitted, the file list is obtained from |
1521 | the manifest. The file list may include shell wildcards which will be | |
1522 | expanded out. | |
a422fd2d | 1523 | |
cce04beb | 1524 | =over 4 |
a422fd2d | 1525 | |
cce04beb | 1526 | =item -v |
a422fd2d | 1527 | |
cce04beb DG |
1528 | Run the tests in verbose mode so you can see which tests were run, |
1529 | along with any debug output. | |
a422fd2d | 1530 | |
cce04beb | 1531 | =item -torture |
a422fd2d | 1532 | |
cce04beb | 1533 | Run the torture tests as well as the normal set. |
a422fd2d | 1534 | |
cce04beb | 1535 | =item -re=PATTERN |
a422fd2d | 1536 | |
cce04beb DG |
1537 | Filter the file list so that all the test files run match PATTERN. |
1538 | Note that this form is distinct from the B<-re LIST OF PATTERNS> form below | |
1539 | in that it allows the file list to be provided as well. | |
a422fd2d | 1540 | |
cce04beb | 1541 | =item -re LIST OF PATTERNS |
a422fd2d | 1542 | |
cce04beb DG |
1543 | Filter the file list so that all the test files run match |
1544 | /(LIST|OF|PATTERNS)/. Note that with this form the patterns | |
1545 | are joined by '|' and you cannot supply a list of files; instead | |
1546 | the test files are obtained from the MANIFEST. | |
a422fd2d | 1547 | |
cce04beb | 1548 | =back |
a422fd2d | 1549 | |
cce04beb | 1550 | You can run an individual test by a command similar to |
a422fd2d | 1551 | |
cce04beb | 1552 | ./perl -I../lib path/to/foo.t
a422fd2d | 1553 | |
cce04beb DG |
1554 | except that the harnesses set up some environment variables that may |
1555 | affect the execution of the test: | |
a422fd2d | 1556 | |
cce04beb | 1557 | =over 4 |
a422fd2d | 1558 | |
cce04beb DG |
1559 | =item PERL_CORE=1 |
1560 | ||
1561 | indicates that we're running this test as part of the perl core test suite. | |
1562 | This is useful for modules that have a dual life on CPAN. | |
1563 | ||
1564 | =item PERL_DESTRUCT_LEVEL=2 | |
1565 | ||
1566 | is set to 2 if it isn't set already (see L</PERL_DESTRUCT_LEVEL>). | |
1567 | ||
1568 | =item PERL | |
1569 | ||
1570 | (used only by F<t/TEST>) if set, overrides the path to the perl executable | |
1571 | that should be used to run the tests (the default being F<./perl>). | |
1572 | ||
1573 | =item PERL_SKIP_TTY_TEST | |
1574 | ||
1575 | if set, tells tests that need a terminal to skip themselves. It's actually set | |
1576 | automatically by the Makefile, but can also be forced artificially by | |
1577 | running C<make test_notty>. | |
1578 | ||
1579 | =back | |
1580 | ||
1581 | =head3 Other environment variables that may influence tests | |
1582 | ||
1583 | =over 4 | |
1584 | ||
1585 | =item PERL_TEST_Net_Ping | |
1586 | ||
1587 | Setting this variable runs all of the Net::Ping module's tests; | |
1588 | otherwise some tests that interact with the outside world are skipped. | |
1589 | See L<perl58delta>. | |
1590 | ||
1591 | =item PERL_TEST_NOVREXX | |
1592 | ||
1593 | Setting this variable skips the vrexx.t tests for OS2::REXX. | |
1594 | ||
1595 | =item PERL_TEST_NUMCONVERTS | |
1596 | ||
1597 | This sets a variable in op/numconvert.t. | |
1598 | ||
1599 | =back | |
1600 | ||
1601 | See also the documentation for the Test and Test::Harness modules, | |
1602 | for more environment variables that affect testing. | |
1603 | ||
1604 | =head1 EXAMPLE OF A SIMPLE PATCH | |
1605 | ||
1606 | All right, we've now had a look at how to navigate the Perl sources and | |
1607 | some things you'll need to know when fiddling with them. Let's now get | |
a422fd2d | 1608 | on and create a simple patch. Here's something Larry suggested: if a |
07aa3531 | 1609 | C<U> is the first active format during a C<pack> (for example,
a422fd2d | 1610 | C<pack "U3C8", @stuff>), then the resulting string should be treated as
1e54db1a | 1611 | UTF-8 encoded. |
a422fd2d | 1612 | |
168a53cc DR |
1613 | If you are working with a git clone of the Perl repository, you will want to |
1614 | create a branch for your changes. This will make creating a proper patch much | |
1615 | simpler. See L<perlrepository> for details on how to do this. | |
1616 | ||
cce04beb DG |
1617 | =head2 Writing the patch |
1618 | ||
a422fd2d SC |
1619 | How do we prepare to fix this up? First we locate the code in question - |
1620 | the C<pack> happens at runtime, so it's going to be in one of the F<pp> | |
1621 | files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be | |
1622 | altering this file, let's copy it to F<pp.c~>. | |
1623 | ||
a6ec74c1 JH |
1624 | [Well, it was in F<pp.c> when this tutorial was written. It has now been |
1625 | split off with C<pp_unpack> to its own file, F<pp_pack.c>] | |
1626 | ||
a422fd2d SC |
1627 | Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then |
1628 | loop over the pattern, taking each format character in turn into | |
1629 | C<datum_type>. Then for each possible format character, we swallow up | |
1630 | the other arguments in the pattern (a field width, an asterisk, and so | |
1631 | on) and convert the next chunk of input into the specified format, adding | |
1632 | it onto the output SV C<cat>. | |
1633 | ||
1634 | How do we know if the C<U> is the first format in the C<pat>? Well, if | |
1635 | we have a pointer to the start of C<pat> then, if we see a C<U> we can | |
1636 | test whether we're still at the start of the string. So, here's where | |
1637 | C<pat> is set up: | |
1638 | ||
1639 | STRLEN fromlen; | |
1640 | register char *pat = SvPVx(*++MARK, fromlen); | |
1641 | register char *patend = pat + fromlen; | |
1642 | register I32 len; | |
1643 | I32 datumtype; | |
1644 | SV *fromstr; | |
1645 | ||
1646 | We'll have another string pointer in there: | |
1647 | ||
1648 | STRLEN fromlen; | |
1649 | register char *pat = SvPVx(*++MARK, fromlen); | |
1650 | register char *patend = pat + fromlen; | |
1651 | + char *patcopy; | |
1652 | register I32 len; | |
1653 | I32 datumtype; | |
1654 | SV *fromstr; | |
1655 | ||
1656 | And just before we start the loop, we'll set C<patcopy> to be the start | |
1657 | of C<pat>: | |
1658 | ||
1659 | items = SP - MARK; | |
1660 | MARK++; | |
1661 | sv_setpvn(cat, "", 0); | |
1662 | + patcopy = pat; | |
1663 | while (pat < patend) { | |
1664 | ||
1665 | Now if we see a C<U> which was at the start of the string, we turn on | |
1e54db1a | 1666 | the C<UTF8> flag for the output SV, C<cat>: |
a422fd2d SC |
1667 | |
1668 | + if (datumtype == 'U' && pat==patcopy+1) | |
1669 | + SvUTF8_on(cat); | |
1670 | if (datumtype == '#') { | |
1671 | while (pat < patend && *pat != '\n') | |
1672 | pat++; | |
1673 | ||
1674 | Remember that it has to be C<patcopy+1> because the first character of | |
1675 | the string is the C<U>, which has been swallowed into C<datumtype>! | |
1676 | ||
1677 | Oops, we forgot one thing: what if there are spaces at the start of the | |
1678 | pattern? C<pack(" U*", @stuff)> will have C<U> as the first active | |
1679 | character, even though it's not the first thing in the pattern. In this | |
1680 | case, we have to advance C<patcopy> along with C<pat> when we see spaces: | |
1681 | ||
1682 | if (isSPACE(datumtype)) | |
1683 | continue; | |
1684 | ||
1685 | needs to become | |
1686 | ||
1687 | if (isSPACE(datumtype)) { | |
1688 | patcopy++; | |
1689 | continue; | |
1690 | } | |
1691 | ||
1692 | OK. That's the C part done. Now we must do two additional things before | |
1693 | this patch is ready to go: we've changed the behaviour of Perl, and so | |
1694 | we must document that change. We must also provide some more regression | |
1695 | tests to make sure our patch works and doesn't create a bug somewhere | |
1696 | else along the line. | |
1697 | ||
cce04beb DG |
1698 | =head2 Testing the patch |
1699 | ||
b23b8711 MS |
1700 | The regression tests for each operator live in F<t/op/>, and so we |
1701 | make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our | |
1702 | tests to the end. First, we'll test that the C<U> does indeed create | |
07aa3531 | 1703 | Unicode strings. |
b23b8711 MS |
1704 | |
1705 | F<t/op/pack.t> has a sensible ok() function, but if it didn't we could | |
35c336e6 | 1706 | use the one from F<t/test.pl>.
b23b8711 | 1707 | |
35c336e6 MS |
1708 | require './test.pl'; |
1709 | plan( tests => 159 ); | |
b23b8711 MS |
1710 | |
1711 | so instead of this: | |
a422fd2d | 1712 | |
195c30ce KW |
1713 | print 'not ' unless "1.20.300.4000" eq sprintf "%vd", |
1714 | pack("U*",1,20,300,4000); | |
a422fd2d SC |
1715 | print "ok $test\n"; $test++; |
1716 | ||
35c336e6 MS |
1717 | we can write the more sensible version (see L<Test::More> for a full |
1718 | explanation of is() and other testing functions): | |
b23b8711 | 1719 | |
07aa3531 | 1720 | is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
38a44b82 | 1721 | "U* produces Unicode" ); |
b23b8711 | 1722 | |
a422fd2d SC |
1723 | Now we'll test that we got that space-at-the-beginning business right: |
1724 | ||
35c336e6 | 1725 | is( "1.20.300.4000", sprintf("%vd", pack(" U*",1,20,300,4000)),
195c30ce | 1726 | " with spaces at the beginning" ); |
a422fd2d SC |
1727 | |
1728 | And finally we'll test that we don't make Unicode strings if C<U> is B<not> | |
1729 | the first active format: | |
1730 | ||
35c336e6 | 1731 | isnt( "1.20.300.4000", sprintf("%vd", pack("C0U*",1,20,300,4000)),
38a44b82 | 1732 | "U* not first isn't Unicode" ); |
a422fd2d | 1733 | |
35c336e6 MS |
1734 | Mustn't forget to change the number of tests which appears at the top, |
1735 | or else the automated tester will get confused. This will either look | |
1736 | like this: | |
a422fd2d | 1737 | |
35c336e6 MS |
1738 | print "1..156\n"; |
1739 | ||
1740 | or this: | |
1741 | ||
1742 | plan( tests => 156 ); | |
a422fd2d SC |
1743 | |
1744 | We now compile up Perl, and run it through the test suite. Our new | |
1745 | tests pass, hooray! | |
1746 | ||
cce04beb DG |
1747 | =head2 Documenting the patch |
1748 | ||
a422fd2d SC |
1749 | Finally, the documentation. The job is never done until the paperwork is |
1750 | over, so let's describe the change we've just made. The relevant place | |
1751 | is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert | |
1752 | this text in the description of C<pack>: | |
1753 | ||
1754 | =item * | |
1755 | ||
1756 | If the pattern begins with a C<U>, the resulting string will be treated | |
1e54db1a JH |
1757 | as UTF-8-encoded Unicode. You can force UTF-8 encoding on in a string |
1758 | with an initial C<U0>, and the bytes that follow will be interpreted as | |
195c30ce KW |
1759 | Unicode characters. If you don't want this to happen, you can begin |
1760 | your pattern with C<C0> (or anything else) to force Perl not to UTF-8 | |
1761 | encode your string, and then follow this with a C<U*> somewhere in your | |
1762 | pattern. | |
a422fd2d | 1763 | |
cce04beb | 1764 | =head1 COMMON PROBLEMS |
f7e1e956 | 1765 | |
cce04beb DG |
1766 | Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In |
1767 | some cases we have to take pre-ANSI requirements into consideration. | |
1768 | You don't care about some particular platform having broken Perl? | |
1769 | I hear there is still a strong demand for J2EE programmers. | |
f7e1e956 | 1770 | |
cce04beb | 1771 | =head2 Perl environment problems |
db300100 | 1772 | |
cce04beb | 1773 | =over 4 |
acbe17fc | 1774 | |
cce04beb | 1775 | =item * |
acbe17fc | 1776 | |
cce04beb | 1777 | Not compiling with threading |
acbe17fc | 1778 | |
cce04beb DG |
1779 | Compiling with threading (-Duseithreads) completely rewrites |
1780 | the function prototypes of Perl. You had better try your changes | |
1781 | with that. Related to this is the difference between "Perl_-less" | |
1782 | and "Perl_-ly" APIs, for example: | |
acbe17fc | 1783 | |
cce04beb DG |
1784 | Perl_sv_setiv(aTHX_ ...); |
1785 | sv_setiv(...); | |
acbe17fc | 1786 | |
cce04beb DG |
1787 | The first one explicitly passes in the context, which is needed for e.g. |
1788 | threaded builds. The second one does that implicitly; do not get them | |
1789 | mixed. If you are not passing in an aTHX_, you will need to do a dTHX | |
1790 | (or a dVAR) as the first thing in the function. | |
acbe17fc | 1791 | |
cce04beb DG |
1792 | See L<perlguts/"How multiple interpreters and concurrency are supported"> |
1793 | for further discussion about context. | |
acbe17fc | 1794 | |
cce04beb | 1795 | =item * |
f7e1e956 | 1796 | |
cce04beb | 1797 | Not compiling with -DDEBUGGING |
f7e1e956 | 1798 | |
cce04beb DG |
1799 | The DEBUGGING define exposes more code to the compiler, |
1800 | and therefore more ways for things to go wrong. You should try it. | |
f7e1e956 | 1801 | |
cce04beb | 1802 | =item * |
f7e1e956 | 1803 | |
cce04beb | 1804 | Introducing (non-read-only) globals |
f7e1e956 | 1805 | |
cce04beb DG |
1806 | Do not introduce any modifiable globals, truly global or file static. |
1807 | They are bad form and complicate multithreading and other forms of | |
1808 | concurrency. The right way is to introduce them as new interpreter | |
1809 | variables; see F<intrpvar.h> (add them at the very end, for binary compatibility). | |
628f0a0a | 1810 | |
cce04beb DG |
1811 | Introducing read-only (const) globals is okay, as long as you verify |
1812 | with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has | |
1813 | BSD-style output) that the data you added really is read-only. | |
1814 | (If it is, it shouldn't show up in the output of that command.) | |
d5f28025 | 1815 | |
cce04beb | 1816 | If you want to have static strings, make them constant: |
f7e1e956 | 1817 | |
cce04beb | 1818 | static const char etc[] = "..."; |
f7e1e956 | 1819 | |
cce04beb DG |
1820 | If you want to have arrays of constant strings, note carefully |
1821 | the right combination of C<const>s: | |
f7e1e956 | 1822 | |
cce04beb DG |
1823 | static const char * const yippee[] = |
1824 | {"hi", "ho", "silver"}; | |
f7e1e956 | 1825 | |
cce04beb DG |
1826 | There is a way to completely hide any modifiable globals (they are all |
1827 | moved to the heap): the compilation setting C<-DPERL_GLOBAL_STRUCT_PRIVATE>. | |
1828 | It is not normally used, but can be used for testing; read more | |
1829 | about it in L<perlguts/"Background and PERL_IMPLICIT_CONTEXT">. | |
f7e1e956 | 1830 | |
cce04beb | 1831 | =item * |
f7e1e956 | 1832 | |
cce04beb | 1833 | Not exporting your new function |
f7e1e956 | 1834 | |
cce04beb DG |
1835 | Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any |
1836 | function that is part of the public API (the shared Perl library) | |
1837 | to be explicitly marked as exported. See the discussion about | |
1838 | F<embed.pl> in L<perlguts>. | |
f7e1e956 | 1839 | |
cce04beb | 1840 | =item * |
f7e1e956 | 1841 | |
cce04beb | 1842 | Exporting your new function |
f7e1e956 | 1843 | |
cce04beb DG |
1844 | The shiny new result of either genuine new functionality or your |
1845 | arduous refactoring is now ready and correctly exported. So what | |
1846 | could possibly go wrong? | |
f7e1e956 | 1847 | |
cce04beb DG |
1848 | Maybe simply that your function did not need to be exported in the |
1849 | first place. Perl has a long and not so glorious history of exporting | |
1850 | functions that it should not have. | |
3c295041 | 1851 | |
cce04beb DG |
1852 | If the function is used only inside one source code file, make it |
1853 | static. See the discussion about F<embed.pl> in L<perlguts>. | |
3c295041 | 1854 | |
cce04beb DG |
1855 | If the function is used across several files, but intended only for |
1856 | Perl's internal use (and this should be the common case), do not | |
1857 | export it to the public API. See the discussion about F<embed.pl> | |
1858 | in L<perlguts>. | |
f7e1e956 | 1859 | |
cce04beb | 1860 | =back |
f7e1e956 | 1861 | |
cce04beb | 1862 | =head2 Portability problems |
a4499558 | 1863 | |
cce04beb DG |
1864 | The following are common causes of compilation and/or execution |
1865 | failures, not specific to Perl as such. The C FAQ is good bedtime | |
1866 | reading. Please test your changes with as many C compilers and | |
1867 | platforms as possible; we will, anyway, and it's nice to save | |
1868 | oneself from public embarrassment. | |
a4499558 | 1869 | |
cce04beb DG |
1870 | If using gcc, you can add the C<-std=c89> option which will hopefully |
1871 | catch most of these unportabilities. (However it might also catch | |
1872 | incompatibilities in your system's header files.) | |
f7e1e956 | 1873 | |
cce04beb DG |
1874 | Use the Configure C<-Dgccansipedantic> flag to enable the gcc |
1875 | C<-ansi -pedantic> flags which enforce stricter ANSI rules. | |
f7e1e956 | 1876 | |
cce04beb DG |
1877 | If using C<gcc -Wall>, note that not all the possible warnings |
1878 | (like C<-Wuninitialized>) are given unless you also compile with C<-O>. | |
244d9cb7 | 1879 | |
cce04beb DG |
1880 | Note that if using gcc, starting from Perl 5.9.5 the Perl core source |
1881 | code files (the ones at the top level of the source code distribution, | |
1882 | but not e.g. the extensions under ext/) are automatically compiled | |
1883 | with as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>, | |
1884 | and a selection of C<-W> flags (see cflags.SH). | |
244d9cb7 | 1885 | |
cce04beb DG |
1886 | Also study L<perlport> carefully to avoid any bad assumptions |
1887 | about the operating system, filesystems, and so forth. | |
244d9cb7 | 1888 | |
cce04beb DG |
1889 | You may once in a while try a "make microperl" to see whether we |
1890 | can still compile Perl with just the bare minimum of interfaces. | |
1891 | (See README.micro.) | |
244d9cb7 | 1892 | |
cce04beb | 1893 | Do not assume an operating system indicates a certain compiler. |
244d9cb7 | 1894 | |
cce04beb | 1895 | =over 4 |
244d9cb7 | 1896 | |
cce04beb | 1897 | =item * |
f7e1e956 | 1898 | |
Casting pointers to integers or casting integers to pointers

    void castaway(U8* p)
    {
      IV i = p;       /* BAD */
    }

or

    void castaway(U8* p)
    {
      IV i = (IV)p;   /* BAD */
    }

Both are bad, and broken, and unportable. Use the PTR2IV()
macro, which does it right. (Likewise, there are PTR2UV(), PTR2NV(),
INT2PTR(), and NUM2PTR().)

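The principle behind such macros can be sketched in standalone C. This minimal sketch assumes a C99 C<< <stdint.h> >> that provides C<intptr_t>. That type is strictly optional in the standard, though almost universally available. The SKETCH_ macros and round_trip() are hypothetical names, not Perl's definitions of PTR2IV() or INT2PTR(), which you should keep using in the core:

```c
#include <stdint.h>

/* Hypothetical stand-ins for Perl's PTR2IV()/INT2PTR(). They assume a
 * C99 <stdint.h> providing intptr_t, an integer type that can hold any
 * object pointer converted to (void *) and back without loss. */
#define SKETCH_PTR2INT(p)    ((intptr_t)(void *)(p))
#define SKETCH_INT2PTR(t, i) ((t)(void *)(i))

/* Convert a byte pointer to an integer and back again. */
unsigned char *round_trip(unsigned char *p)
{
    intptr_t i = SKETCH_PTR2INT(p);
    return SKETCH_INT2PTR(unsigned char *, i);
}
```

The result compares equal to the original pointer. Casting directly to a narrower type such as C<int> offers no such guarantee.
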
=item *

Casting between function pointers and data pointers

Technically speaking, casting between function pointers and data
pointers is unportable and undefined. In practice it usually seems
to work, but you should use the FPTR2DPTR() and DPTR2FPTR() macros.
Sometimes you can also play games with unions.

=item *

Assuming sizeof(int) == sizeof(long)

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits. This is all legal according to the C standard.
(In other words, "long long" is not a portable way to specify 64 bits,
and "long long" is not even guaranteed to be any wider than "long".)

Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
Avoid things like I32 because they are B<not> guaranteed to be
I<exactly> 32 bits (they are I<at least> 32 bits), nor are they
guaranteed to be B<int> or B<long>. If you really explicitly need
64-bit variables, use I64 and U64, but only if guarded by HAS_QUAD.

=item *

Assuming one can dereference any type of pointer for any type of data

    char *p = ...;
    long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead
of a pony if p happens not to be correctly aligned.

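When the bytes really can sit at any address, one portable way to get a C<long> out of them is to copy them into a properly aligned object with memcpy(). This is a minimal sketch. The function name is hypothetical, and the code assumes the bytes are already in the host's native byte order:

```c
#include <string.h>

/* Read a long stored in native byte order at a possibly misaligned
 * address. memcpy() may copy from any address, and the destination
 * is a real long, so it is correctly aligned. */
long read_long_unaligned(const char *p)
{
    long value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers typically turn such a small fixed-size memcpy() into a plain load on platforms where unaligned access is allowed, so the portable form need not cost anything there.
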
=item *

Lvalue casts

    (int)*p = ...;    /* BAD */

Simply not portable. Get your lvalue to be of the right type,
or maybe use temporary variables, or dirty tricks with unions.

=item *

Assuming B<anything> about structs (especially the ones you
don't control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of a certain signedness, sizeof, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ.

=back

=item *

That the sizeof(struct) or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields -
the bytes can be anything.

=item *

Structs are required to be aligned to the maximum alignment required
by the fields - which for native types is usually equivalent to
sizeof() of the field.

=back

=back

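Rather than assuming a layout, ask the compiler. This standalone sketch uses a hypothetical struct and the standard offsetof() and sizeof operators. It relies only on the guarantees listed above: declaration order is kept, and padding may appear after any member.

```c
#include <stddef.h>

/* A hypothetical struct. Most compilers insert padding after 'tag' so
 * that 'value' is suitably aligned, and after 'flag' so that arrays of
 * struct record stay aligned, but the amounts are implementation-defined. */
struct record {
    char   tag;
    double value;
    char   flag;
};

size_t record_value_offset(void) { return offsetof(struct record, value); }
size_t record_flag_offset(void)  { return offsetof(struct record, flag); }
size_t record_size(void)         { return sizeof(struct record); }
```

Code that needs a byte-exact layout is better off (de)serializing field by field than copying whole structs. A file format is one example of such code.
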
=item *

Assuming the character set is ASCIIish

Perl can compile and run on EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, or hex) constants
to refer to characters. You can safely say 'A', but not 0x41.
You can safely say '\n', but not \012.
If a character doesn't have a trivial input form, you can
create a #define for it in both C<utfebcdic.h> and C<utf8.h>, so that
it resolves to different values depending on the character set being used.
(There are three different EBCDIC character sets defined in C<utfebcdic.h>,
so it might be best to insert the #define three times in that file.)

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper case
alphabetic characters. That is not true in EBCDIC, nor is it for 'a' to 'z'.
But '0' - '9' is an unbroken range in both systems. Don't assume anything
about other ranges.

Many of the comments in the existing code ignore the possibility of EBCDIC,
and may therefore be wrong, even if the code works.
This is actually a tribute to how transparently the ability to handle
EBCDIC was added, without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent Unicode
code points as sequences of bytes. Macros with the same names (but
different definitions) in C<utf8.h> and C<utfebcdic.h> are used to allow
the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC version
as well. Again, comments in the code may well be wrong even if the code itself
is right.
For example, the concept of C<invariant characters> differs between ASCII and
EBCDIC.
On ASCII platforms, only characters that do not have the high-order
bit set (i.e. whose ordinals are strict ASCII, 0 - 127)
are invariant, and the documentation and comments in the code
may assume that, often referring to something like, say, C<hibit>.
The situation differs and is not so simple on EBCDIC machines, but as long as
the code itself uses the C<NATIVE_IS_INVARIANT()> macro appropriately, it
works, even if the comments are wrong.

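In standalone C, the standard C<< <ctype.h> >> functions know the execution character set. That makes them one way to avoid range tests like C<< c >= 'A' && c <= 'Z' >>. Only the digits may be handled arithmetically. This is a minimal sketch with hypothetical function names. Inside the core you would normally reach for Perl's own character classification macros instead.

```c
#include <ctype.h>

/* Count upper case letters without assuming that 'A'..'Z' is one
 * contiguous range (in EBCDIC it is not). isupper() knows the
 * platform's character set; the cast to unsigned char avoids passing a
 * negative value for bytes above 127 when plain char is signed. */
int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (isupper((unsigned char)*s))
            n++;
    return n;
}

/* The digits '0'..'9' ARE guaranteed to be contiguous by the C
 * standard, so this subtraction is portable to EBCDIC as well. */
int digit_value(char c)
{
    return c - '0';
}
```
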
=item *

Assuming the character set is just ASCII

ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a locale,
currently these extra characters are generally considered to be unassigned,
and this has presented some problems.
This is being changed starting in 5.12 so that these characters will
be considered to be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

    #define BURGLE(x) ... \
    #ifdef BURGLE_OLD_STYLE        /* BAD */
    ... do it the old way ... \
    #else
    ... do it the new way ... \
    #endif

You cannot portably "stack" cpp directives. For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.

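Here is a compilable version of the portable shape, with two complete definitions. BURGLE_OLD_STYLE and both macro bodies are placeholders made up for this sketch, and burgle_twice() is a hypothetical caller:

```c
/* One complete #define per #ifdef branch; no preprocessor directive
 * ever appears inside a macro body. */
#ifdef BURGLE_OLD_STYLE
#  define BURGLE(x) ((x) * 2)     /* ... do it the old way ... */
#else
#  define BURGLE(x) ((x) + (x))   /* ... do it the new way ... */
#endif

int burgle_twice(int x)
{
    return BURGLE(BURGLE(x));
}
```

Both branches define the same interface, so callers do not need to know which one was chosen.
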
=item *

Adding non-comment stuff after #endif or #else

    #ifdef SNOSH
    ...
    #else !SNOSH    /* BAD */
    ...
    #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them. If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

    #ifdef SNOSH
    ...
    #else /* !SNOSH */
    ...
    #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant
(on by default starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

    enum color {
      CERULEAN,
      CHARTREUSE,
      CINNABAR,     /* BAD */
    };

is not portable. Leave out the last comma.

Also note that whether enums are implicitly morphable to ints
varies between compilers; you might need an explicit C<(int)> cast.

=item *

Using //-comments

    // This function bamfoodles the zorklator.   /* BAD */

That is C99 or C++. Perl is C89. Using the //-comments is silently
allowed by many C compilers but cranking up the ANSI C89 strictness
(which we like to do) causes the compilation to fail.

=item *

Mixing declarations and code

    void zorklator()
    {
      int n = 3;
      set_zorkmids(n);    /* BAD */
      int q = 4;

That is C99 or C++. Some C compilers allow that, but you shouldn't.

The gcc option C<-Wdeclaration-after-statement> scans for such problems
(on by default starting from Perl 5.9.4).

=item *

Introducing variables inside for()

    for(int i = ...; ...; ...) {    /* BAD */

That is C99 or C++. While it would indeed be awfully nice to have that
also in C89, to limit the scope of the loop variable, alas, we cannot.

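If limiting the scope of the loop variable matters, an extra brace block is a C89-compatible approximation, because declarations are allowed at the start of any block. This is a minimal sketch with a hypothetical function:

```c
/* C89 style: declarations only at the start of a block. The inner
 * braces give i the same limited scope that for (int i = ...) would. */
int sum_to(int n)
{
    int total = 0;

    {
        int i;

        for (i = 1; i <= n; i++)
            total += i;
    }
    /* i is not visible here. */
    return total;
}
```
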
=item *

Mixing signed char pointers with unsigned char pointers

    int foo(char *s) { ... }
    ...
    unsigned char *t = ...; /* Or U8* t = ... */
    foo(t);   /* BAD */

While many compilers accept this with at most a warning, it is certainly
dubious, and downright fatal on at least one platform: for example, VMS cc
considers this a fatal error. One reason people often make this mistake
is that a "naked char" and therefore dereferencing a "naked char pointer"
have an implementation-defined signedness: it depends on the compiler and
the flags of the compiler and the underlying platform whether the result
is signed or unsigned. For this very same reason using a 'char' as an
array index is bad.

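Below is a standalone sketch of the safe pattern for the array-index case the text warns about. The function names are hypothetical:

```c
#include <limits.h>
#include <string.h>

/* Count how often each byte value occurs in a NUL-terminated string.
 * Plain char may be signed, so a byte such as 0xE9 could become a
 * negative index; converting to unsigned char first keeps the index in
 * the range 0..UCHAR_MAX. */
void byte_histogram(const char *s, unsigned long counts[UCHAR_MAX + 1])
{
    memset(counts, 0, (UCHAR_MAX + 1) * sizeof counts[0]);
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;
}

/* When the data is held as unsigned char (Perl's U8), convert the
 * pointer explicitly instead of passing it where char * is expected. */
void byte_histogram_u8(const unsigned char *s,
                       unsigned long counts[UCHAR_MAX + 1])
{
    byte_histogram((const char *)s, counts);
}
```
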
=item *

Macros that have string constants and their arguments as substrings of
the string constants

    #define FOO(n) printf("number = %d\n", n)    /* BAD */
    FOO(10);

Pre-ANSI semantics for that were equivalent to

    printf("10umber = %d\10", 10);

which is probably not what you were expecting. Unfortunately, at least
one reasonably common and modern C compiler does "real backward
compatibility" here: in AIX that is what still happens, even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

    IV i = ...;
    printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens
to be an C<int>), in general it cannot. IV might be something larger.
The situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

    Uid_t who = ...;
    printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might be not only not C<int>-wide
but it might also be unsigned, in which case large uids would be
printed as negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available as
a macro with either an 'f' or '_f' suffix, for example:

    IVdf /* IV in decimal */
    UVxf /* UV in hexadecimal */

    printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

    Uid_t_f /* Uid_t in decimal */

    printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

    printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

Also remember that the C<%p> format really does require a void pointer:

    U8* p = ...;
    printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.

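The string-splicing technique behind IVdf and friends can be sketched outside Perl. The typedefs and SKETCH_ format macros are hypothetical stand-ins for what Configure writes to F<config.h>, and the sketch simply assumes IV is a C<long>. Because it is standalone, it calls C99 snprintf() directly. In the core you would use my_snprintf().

```c
#include <stdio.h>

/* Hypothetical stand-ins for config.h: assume IV is a long. */
typedef long          SKETCH_IV;
typedef unsigned long SKETCH_UV;
#define SKETCH_IVdf "ld"
#define SKETCH_UVxf "lx"

/* Adjacent string literals are concatenated at compile time, so the
 * macro splices the matching conversion into the format string. */
int format_iv(char *buf, size_t len, SKETCH_IV i)
{
    return snprintf(buf, len, "i = %" SKETCH_IVdf, i);
}

int format_uv_hex(char *buf, size_t len, SKETCH_UV u)
{
    return snprintf(buf, len, "u = %" SKETCH_UVxf, u);
}

/* A narrower value is widened explicitly before it meets the format. */
int format_short(char *buf, size_t len, short s)
{
    return snprintf(buf, len, "s = %" SKETCH_IVdf, (SKETCH_IV)s);
}
```
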
=item *

Blindly using variadic macros

gcc has had them for a while with its own syntax, and C99 brought
them with a standardized syntax. Don't use the former, and use
the latter only if HAS_C99_VARIADIC_MACROS is defined.

=item *

Blindly passing va_list

Not all platforms support passing va_list to further varargs (stdarg)
functions. The right thing to do is to copy the va_list using
Perl_va_copy() if NEED_VA_COPY is defined.

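Here is a standalone sketch of why the copy matters, with a hypothetical function name. It uses C99 va_copy() directly, which assumes a C99 compiler. In the core, use Perl_va_copy() as described above.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format into buf only if the whole result fits. The variable
 * arguments are consumed twice: once to measure, once to format. A
 * va_list that has been handed to vsnprintf() may not be reused, so a
 * copy is made with C99 va_copy() for the first pass. Returns the
 * length the full output needs (negative on an encoding error); buf is
 * left untouched when that length does not fit. */
int format_if_fits(char *buf, size_t len, const char *fmt, ...)
{
    va_list args, copy;
    int needed;

    va_start(args, fmt);
    va_copy(copy, args);
    needed = vsnprintf(NULL, 0, fmt, copy);   /* pass 1: measure only */
    va_end(copy);
    if (needed >= 0 && (size_t)needed < len)
        vsnprintf(buf, len, fmt, args);       /* pass 2: real output */
    va_end(args);
    return needed;
}
```
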
=item *

Using gcc statement expressions

    val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable. The Perl code does
admittedly use them if available to gain some extra speed
(essentially as a funky form of inlining), but you shouldn't.

=item *

Binding together several statements in a macro

Use the macros STMT_START and STMT_END.

    STMT_START {
       ...
    } STMT_END

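A standalone sketch shows why a bare pair of braces is not enough. The SKETCH_ macros below are hypothetical local equivalents written as the common C<do { ... } while (0)> idiom, which is one portable way to get the same effect. Perl's own STMT_START and STMT_END may expand differently depending on the compiler.

```c
/* Hypothetical local equivalents of STMT_START/STMT_END. The result
 * behaves like a single statement that still needs its semicolon. */
#define SKETCH_STMT_START do
#define SKETCH_STMT_END   while (0)

#define SWAP_INTS(a, b)     \
    SKETCH_STMT_START {     \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } SKETCH_STMT_END

/* Put a pair in order; returns 1 if it swapped, 0 if already ordered.
 * Note the unbraced if/else around the macro call. */
int order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INTS(*lo, *hi);
    else
        return 0;
    return 1;
}
```

With plain braces instead, the semicolon after C<SWAP_INTS(...)> would end the if statement, and the following else would be a syntax error.
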
=item *

Testing for operating systems or versions when you should be testing for features

    #ifdef __FOONIX__    /* BAD */
    foo = quux();
    #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong. This is more correct (though still
not perfect, because the below is a compile-time check):

    #ifdef HAS_QUUX
    foo = quux();
    #endif

How does HAS_QUUX become defined where it needs to be? Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), HAS_QUUX will be correctly defined. On other platforms,
the corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated,
or if you have a good hunch of where quux() might be available,
you can temporarily try the following:

    #if (defined(__FOONIX__) || defined(__BARNIX__))
    # define HAS_QUUX
    #endif

    ...

    #ifdef HAS_QUUX
    foo = quux();
    #endif

But in any case, try to keep the features and operating systems separate.

=back

=head2 Problematic System Interfaces

=over 4

=item *

malloc(0), realloc(p, 0), calloc(0, 0) are non-portable. To be portable,
allocate at least one byte. (In general you should rarely need to
work at this low level, but instead use the various malloc wrappers.)

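Here is a standalone sketch of the "at least one byte" rule, with hypothetical helper names. Inside the core, the existing allocation wrappers remain the first choice.

```c
#include <stdlib.h>

/* malloc(0) may return either NULL or a unique pointer, depending on
 * the platform, so a NULL result would be ambiguous. Rounding the
 * request up to one byte gives the same behaviour everywhere. */
void *malloc_at_least_one(size_t size)
{
    return malloc(size ? size : 1);
}

/* The same idea for resizing: never ask realloc() for zero bytes. */
void *realloc_at_least_one(void *p, size_t size)
{
    return realloc(p, size ? size : 1);
}
```
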
=item *

snprintf() - the return value is unportable. Use my_snprintf() instead.

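The sketch below shows one way to cope with the differing return values in standalone C. The function is hypothetical and calls snprintf() directly for the sake of the sketch. Core code should use my_snprintf(), as the text says.

```c
#include <stdio.h>

/* Build "name#id" in a fixed-size buffer and report truncation instead
 * of silently cutting the output short. C99 snprintf() returns the
 * length the full output would have had; some older implementations
 * return a negative value on truncation instead, and this check treats
 * both cases as failure. */
int format_label(char *buf, size_t len, const char *name, int id)
{
    int needed = snprintf(buf, len, "%s#%d", name, id);

    if (needed < 0 || (size_t)needed >= len)
        return -1;      /* error or truncated */
    return needed;
}
```
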
=back

=head2 Security problems

Last but not least, here are various tips for safer coding.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you. Seriously.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).

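For readers unfamiliar with the strlcpy() contract, here is a minimal standalone sketch of its usual semantics. The name sketch_strlcpy() is hypothetical, and this is not Perl's my_strlcpy() code.

```c
#include <string.h>

/* Copy at most size - 1 bytes of src into dst, always NUL-terminate
 * when size > 0, and return strlen(src) so the caller can detect
 * truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A return value greater than or equal to the buffer size means the copy was truncated. Callers can therefore test C<< sketch_strlcpy(dst, src, sizeof dst) >= sizeof dst >>.
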
=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf()
and my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available. If you want something
fancier than a plain byte string, use SVs and Perl_sv_catpvf().

=back

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -D optimize=-g
    make

C<-g> is a flag to the C compiler to have it produce debugging
information which will allow us to step through a running program,
and to see which C function we are in (without the debugging
information we might see only the numerical addresses of the functions,
which is not very helpful).

F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
enables all the internal debugging code in Perl. There are a whole bunch
of things you can debug with this: L<perlrun> lists them all, and the
best way to find out about them is to play about with them. The most
useful options are probably

    l  Context (loop) stack processing
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

Some of the functionality of the debugging code can be achieved using XS
modules.

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

Or if you have a core dump:

    gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can read
the source code. You should see the copyright message, followed by the
prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item run [args]

Run the program with the given arguments.

=item break function_name

=item break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or the given
line in the named source file.

=item step

Steps through the program a line at a time.

=item next

Steps through the program a line at a time, without descending into
functions.

=item continue

Run until the next breakpoint.

=item finish

Run until the end of the current function, then stop again.

=item 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files
(see L</"The .i Targets">). So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but
in order to use it you'll need to compile perl with macro definitions
included in the debugging information. Using F<gcc> version 3.1, this
means configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions in
F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
that you can't get at from Perl. Let's take an example. We'll use the
C<$a = $b + $c> we used before, but give it a bit of context:
C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
With the breakpoint in place, we can run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
arranges for two C<NV>s to be placed into C<left> and C<right> - let's
slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV either
directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to L<B::Debug|B::Debug>.

    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

# finish this later #

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code.
It is possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and looking at what the resulting graph tells about the execution and
data flows. As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.

=head2 lint, splint

The good old C code quality inspector, C<lint>, is available on
several platforms, but please be aware that there are several
different implementations of it by different vendors, which means that
the flags are not identical across different platforms.

There is a lint variant called C<splint> (Secure Programming Lint)
available from http://www.splint.org/ that should compile on any
Unix-like platform.

There are C<lint> and C<splint> targets in Makefile, but you may have
to diddle with the flags (see above).

=head2 Coverity

Coverity (http://www.coverity.com/) is a product similar to lint. As a
testbed for their product they periodically check several open source
projects, and they give open source developers accounts on the defect
databases.

cce04beb | 2616 | =head2 cpd (cut-and-paste detector) |
ee9468a2 | 2617 | |
cce04beb DG |
2618 | The cpd tool detects cut-and-paste coding. If one instance of the |
2619 | cut-and-pasted code changes, all the other spots should probably be | |
2620 | changed, too. Therefore such code should probably be turned into a | |
2621 | subroutine or a macro. | |
ee9468a2 | 2622 | |
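The refactoring that cpd points you towards can be sketched as follows (all names are hypothetical). Two functions that each used to carry a pasted copy of the same clamping logic now call one C<static> helper, so a future fix only has to be made in one place. A macro would also remove the duplication, but a function evaluates its arguments exactly once and gets type checking.

```c
/* Hypothetical example: both callers below originally contained their
 * own copy-pasted "clamp v to [lo, hi]" sequence. */
static int clamp(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Formerly held an inline copy of the clamping logic. */
int percent_from_ratio(int num, int den)
{
    if (den == 0)
        return 0;
    return clamp(num * 100 / den, 0, 100);
}

/* Formerly held a second, pasted copy of the same logic. */
int byte_from_int(int v)
{
    return clamp(v, 0, 255);
}
```
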
cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
(http://pmd.sourceforge.net/). pmd was originally written for static
analysis of Java code, but later the cpd part of it was extended to
parse C and C++ as well.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on the source code thusly:

    java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
        --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should raise the
Java heap size with the C<-Xmx> option:

    java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the
warnings", or some common portability problems not being covered by
C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
collection of warnings, and so forth), gcc is still a useful tool in
keeping our coding nose clean.

C<-Wall> is on by default.

It would be nice to have C<-ansi> (and its sidekick, C<-pedantic>) on
always, but unfortunately they are not safe on all platforms: they can,
for example, cause fatal conflicts with the system headers (Solaris
being a prime example). If Configure C<-Dgccansipedantic> is used,
the C<cflags> frontend selects C<-ansi -pedantic> for the platforms
where they are known to be safe.

Starting from Perl 5.9.4 the following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wdeclaration-after-statement>

=back

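As a hedged illustration (hypothetical code, assuming a gcc recent enough to accept these switches), the function below compiles quietly with C<-Wall -Wextra -Wdeclaration-after-statement>. All declarations come before the first statement, as C89 requires. The deliberately unused parameter, which C<-Wall -Wextra> would otherwise flag, is cast to C<void>.

```c
/* Compile with, e.g. (file name hypothetical):
 *   gcc -c -Wall -Wextra -Wdeclaration-after-statement sum.c */
#include <stddef.h>

long sum_ints(const int *v, size_t n, void *unused_context)
{
    long total = 0;         /* declarations first ... */
    size_t i;

    (void)unused_context;   /* ... avoids the unused-parameter warning */

    /* Declaring "size_t i" here, after the statement above, is what
     * -Wdeclaration-after-statement would complain about. */
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```
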
The following flags would be nice to have, but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wpointer-arith>

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

C<-Wtraditional> is another example of gcc's annoying tendency to
bundle a lot of warnings under one switch (it would be impossible to
deploy in practice because it would complain a lot), but it does
contain some warnings that would be beneficial to have available on
their own, such as the warning about string constants inside macros
containing the macro arguments: this behaved differently pre-ANSI
than it does in ANSI, and some C compilers are still in transition,
AIX being an example.

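Here is a small sketch of the pre-ANSI versus ANSI difference described above (the macro and function names are hypothetical). An ANSI preprocessor does not substitute macro parameters inside string literals. The portable way to turn a macro argument into a string is therefore the C<#> stringizing operator.

```c
#include <string.h>

/* Pre-ANSI (K&R) preprocessors replaced x inside the string literal;
 * an ANSI C preprocessor leaves the literal alone, so this always
 * expands to "x". This is the construct -Wtraditional warns about. */
#define QUOTE_TRADITIONAL(x) "x"

/* The ANSI way: # turns the macro argument into a string literal. */
#define QUOTE_ANSI(x) #x

int quote_traditional_is_literal(void)
{
    return strcmp(QUOTE_TRADITIONAL(sv_any), "x") == 0;
}

int quote_ansi_is_argument(void)
{
    return strcmp(QUOTE_ANSI(sv_any), "sv_any") == 0;
}
```

Both helper functions return 1 on an ANSI C compiler. A K&R-style preprocessor would instead have expanded C<QUOTE_TRADITIONAL(sv_any)> to C<"sv_any">, which is the silent change in behaviour mentioned above.
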
=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability extensions"
modes on: for example, the Sun Workshop compiler has its C<-Xa> mode on
(though implicitly), and the DEC (these days, HP...) compiler has its
C<-std1> mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under memory debuggers such as Purify, valgrind, or
Third Degree greatly slows down the execution: seconds become minutes,
minutes become hours. For example, as of Perl 5.8.1,
F<ext/Encode/t/Unicode.t> takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
than six hours, even on a snappy computer. That test must be doing
something that is quite unfriendly to memory debuggers. If you don't
feel like waiting, you can simply kill the perl process.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable PERL_DESTRUCT_LEVEL to 2.

For csh-like shells:

    setenv PERL_DESTRUCT_LEVEL 2

For Bourne-type shells:

    PERL_DESTRUCT_LEVEL=2
    export PERL_DESTRUCT_LEVEL

In Unixy environments you can also use the C<env> command:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack
is a good sign of these. Fixing these leaks is non-trivial,
unfortunately, but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 Rational Software's Purify

Purify is a commercial tool that is helpful in identifying
memory overruns, wild pointers, memory leaks and other such
badness. Perl must be compiled in a specific way for
optimal testing with Purify. Purify is available under
Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.

=head3 Purify on Unix

On Unix, Purify creates a new Perl binary. To get the most
benefit out of Purify, you should configure the perl to be
Purify'ed using:

    sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
     -Uusemymalloc -Dusemultiplicity

where these arguments mean:

=over 4

=item -Accflags=-DPURIFY

Disables Perl's arena memory allocation functions, as well as
forcing use of memory allocation functions derived from the
system malloc.

=item -Doptimize='-g'

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=item -Uusemymalloc

Disables Perl's malloc so that Purify can more closely monitor
allocations and leaks. Using Perl's malloc will make Purify
report most leaks in the "potential" leaks category.

=item -Dusemultiplicity

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=back

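To see why C<-DPURIFY> matters, consider a heavily simplified, hypothetical arena allocator. This is not Perl's actual arena code. Objects are carved out of one big C<malloc()>ed block, so a memory debugger only ever sees that single block, and an object that is never released looks like perfectly reachable memory. Compiling the sketch with C<-DPURIFY> turns every object into its own C<malloc()> call. That is what lets Purify, valgrind and friends attribute a leak to the code that allocated it.

```c
#include <stdlib.h>

#define ARENA_SLOTS 64

typedef struct { long payload[4]; } thing;    /* hypothetical object type */

#ifdef PURIFY
/* One system malloc() per object: every leak is visible to the tool. */
thing *new_thing(void)     { return malloc(sizeof(thing)); }
void   del_thing(thing *t) { free(t); }
#else
/* All objects share one block; freed slots go on a free list, so the
 * memory debugger only ever sees the single arena allocation. */
static thing *arena;
static size_t arena_used;
static thing *free_list[ARENA_SLOTS];
static size_t free_count;

thing *new_thing(void)
{
    if (free_count > 0)
        return free_list[--free_count];
    if (arena == NULL && (arena = malloc(ARENA_SLOTS * sizeof(thing))) == NULL)
        return NULL;
    if (arena_used == ARENA_SLOTS)
        return NULL;        /* a real allocator would grab another arena */
    return &arena[arena_used++];
}

void del_thing(thing *t)
{
    free_list[free_count++] = t;
}
#endif
```
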
Once you've compiled a perl suitable for Purify'ing, you
can just run:

    make pureperl

which creates a binary named 'pureperl' that has been Purify'ed.
This binary is used in place of the standard 'perl' binary
when you want to debug Perl memory problems.

As an example, to show any memory leaks produced during the
standard Perl test suite you would create and run the Purify'ed
perl as:

    make pureperl
    cd t
    ../pureperl -I../lib harness

which would run the test suite under the Purify'ed perl and report
any memory problems.

Purify outputs messages in "Viewer" windows by default. If
you don't have a windowing environment or if you simply
want the Purify output to unobtrusively go to a log file
instead of to the interactive window, use the following
options to output to the log file "perl.log":

    setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
        -log-file=perl.log -append-logfile=yes"

If you plan to use the "Viewer" windows, then you only need this option:

    setenv PURIFYOPTIONS "-chain-length=25"

In Bourne-type shells:

    PURIFYOPTIONS="..."
    export PURIFYOPTIONS

or if you have the "env" utility:

    env PURIFYOPTIONS="..." ../pureperl ...

=head3 Purify on NT

Purify on Windows NT instruments the Perl binary 'perl.exe'
on the fly. There are several options in the makefile you
should change to get the most use out of Purify:

=over 4

=item DEFINES

You should add -DPURIFY to the DEFINES line so the DEFINES
line looks something like:

    DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1

to disable Perl's arena memory allocation functions, as
well as to force use of memory allocation functions derived
from the system malloc.

=item USE_MULTI = define

Enabling the multiplicity option allows perl to clean up
thoroughly when the interpreter shuts down, which reduces the
number of bogus leak reports from Purify.

=item #PERL_MALLOC = define

Keeping this line commented out disables Perl's malloc so that Purify
can more closely monitor allocations and leaks. Using Perl's malloc
will make Purify report most leaks in the "potential" leaks category.

=item CFG = Debug

Adds debugging information so that you see the exact source
statements where the problem occurs. Without this flag, all
you will see is the source filename of where the error occurred.

=back

As an example, to show any memory leaks produced during the
standard Perl test suite you would create and run Purify as:

    cd win32
    make
    cd ../t
    purify ../perl -I../lib harness

which would instrument Perl in memory, run the test suite, and
finally report any memory problems.

=head2 valgrind

The excellent valgrind tool can be used to find both memory leaks
and illegal memory accesses. As of version 3.3.0, Valgrind only
supports Linux on x86, x86-64 and PowerPC. The special "test.valgrind"
target can be used to run the tests under valgrind. Found errors
and memory leaks are logged in files named F<testfile.valgrind>.

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) also trigger errors,
valgrind allows you to suppress such errors using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see

    http://developer.kde.org/~sewardj/

=head2 Compaq's/Digital's/HP's Third Degree

Third Degree is a tool for memory leak detection and memory access checks.
It is one of the many tools in the ATOM toolkit. The toolkit is only
available on Tru64 (formerly known as Digital UNIX, and before that as
DEC OSF/1).

When building Perl, you must first run Configure with the -Doptimize=-g
and -Uusemymalloc flags; after that you can use the make targets
"perl.third" and "test.third". (What is required is that Perl must be
compiled using the C<-g> flag, so you may need to re-Configure.)

The short story is that with "atom" you can instrument the Perl
executable to create a new executable called F<perl.third>. When the
instrumented executable is run, it creates a log of dubious memory
traffic in a file called F<perl.3log>. See the manual pages of atom and
third for more information. The most extensive Third Degree
documentation is available in the Compaq "Tru64 UNIX Programmer's
Guide", chapter "Debugging Programs with Third Degree".

The "test.third" target leaves a lot of files named F<foo_bar.3log> in
the t/ subdirectory. There is a problem with these files: Third Degree
is so effective that it also finds problems in the system libraries.
Therefore you should use the Porting/thirdclean script to clean up
the F<*.3log> files.

There are also leaks that, for a certain definition of a leak,
aren't. See L</PERL_DESTRUCT_LEVEL> for more information.

=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically samples the CPU program counter, and
since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the
program is spending its time in. The caveats are that very
small/fast functions have a lower probability of showing up in the
profile, and that periodically interrupting the program (this is
usually done rather frequently, on the scale of milliseconds) imposes
an additional overhead that may skew the results. The first problem
can be alleviated by running the code for longer (in general this is a
good idea for profiling); the second problem is usually kept in check
by the profiling tools themselves.

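The statistical time-sampling method can be demonstrated with a toy in POSIX C. This sketch assumes a Unix-like system with C<setitimer()> and C<SIGPROF>, and its function names are hypothetical. A profiling timer fires after every millisecond of consumed CPU time, and the handler records which of two workloads was running at that moment. Real profilers such as gprof record the interrupted program counter instead of a hand-maintained marker.

```c
#define _XOPEN_SOURCE 700        /* for sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Which toy "function" is running; a real profiler would look at the
 * interrupted program counter in the signal context instead. */
static volatile sig_atomic_t current_fn;      /* 0 = cheap, 1 = costly */
static volatile sig_atomic_t samples[2];
static volatile unsigned long sink;

static void on_sigprof(int sig)
{
    (void)sig;
    samples[current_fn]++;                    /* one statistical sample */
}

static void spin(unsigned long n)             /* burn CPU time */
{
    unsigned long i;
    for (i = 0; i < n; i++)
        sink += i;
}

/* Sample until 'wanted' ticks are collected, giving up after a bounded
 * number of rounds. Returns 0 on success, -1 on error. */
int profile_demo(int wanted, int *cheap, int *costly)
{
    struct sigaction sa;
    struct itimerval tv;
    long rounds;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = 1000;            /* every 1 ms of CPU time */
    tv.it_value.tv_usec = 1000;
    if (setitimer(ITIMER_PROF, &tv, NULL) != 0)
        return -1;

    for (rounds = 0; samples[0] + samples[1] < wanted && rounds < 20000; rounds++) {
        current_fn = 0; spin(10000);          /* cheap: 1 unit of work */
        current_fn = 1; spin(90000);          /* costly: 9 units of work */
    }

    memset(&tv, 0, sizeof tv);
    setitimer(ITIMER_PROF, &tv, NULL);        /* an all-zero value disarms it */
    *cheap = samples[0];
    *costly = samples[1];
    return (*cheap + *costly >= wanted) ? 0 : -1;
}
```

Calling C<profile_demo(100, &cheap, &costly)> should attribute roughly nine out of ten samples to the costly workload. The exact numbers vary from run to run, which is the statistical nature described above.
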
The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends a basic block, and each of its possible targets starts a new one.
Basic block profiling usually works by I<instrumenting> the code by
adding I<enter basic block #nnnn> book-keeping code to the generated
code. During the execution of the code the basic block counters are
then updated appropriately. The caveat is that the added extra code
can skew the results: again, the profiling tools usually try to factor
their own effects out of the results.

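Basic-block counting can be imitated by hand to show what tools such as gcov or pixie automate. This is a toy sketch: the C<BB()> macro and the counter array are hypothetical, and the choice of blocks is simplified. Real tools insert the equivalent book-keeping into the generated code. Each block starts by incrementing its own counter, so after a run the counts are exact rather than statistical.

```c
#include <string.h>

enum { BB_ENTRY, BB_LOOP_BODY, BB_IS_DIGIT, BB_EXIT, BB_COUNT };

static unsigned long bb_counts[BB_COUNT];

/* The "enter basic block #n" book-keeping an instrumenting tool adds. */
#define BB(n) (bb_counts[(n)]++)

/* Count the decimal digits in s, instrumented per basic block. */
int count_digits(const char *s)
{
    int digits = 0;

    BB(BB_ENTRY);
    for (; *s; s++) {
        BB(BB_LOOP_BODY);
        if (*s >= '0' && *s <= '9') {
            BB(BB_IS_DIGIT);
            digits++;
        }
    }
    BB(BB_EXIT);
    return digits;
}

void bb_reset(void)         { memset(bb_counts, 0, sizeof bb_counts); }
unsigned long bb_get(int n) { return bb_counts[n]; }
```

After C<bb_reset()> and a call to C<count_digits("a1b22")>, the counters record 1 entry, 5 loop bodies, 3 digit blocks and 1 exit.
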
=head2 Gprof Profiling

gprof is a profiling tool available on many Unix platforms;
it uses I<statistical time-sampling>.

You can build a profiled version of perl called "perl.gprof" by
invoking the make target "perl.gprof" (what is required is that Perl
must be compiled using the C<-pg> flag, so you may need to re-Configure).
Running the profiled version of Perl will create an output file called
F<gmon.out> which contains the profiling data collected during the
execution.

The gprof tool can then display the collected data in various ways.
Usually gprof understands the following options:

=over 4

=item -a

Suppress statically defined functions from the profile.

=item -b

Suppress the verbose descriptions in the profile.

=item -e routine

Exclude the given routine and its descendants from the profile.

=item -f routine

Display only the given routine and its descendants in the profile.

=item -s

Generate a summary file called F<gmon.sum> which then may be given
to subsequent gprof runs to accumulate data over several runs.

=item -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of gprof.

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-pg' && make perl.gprof
    $ ./perl.gprof someprog # creates gmon.out in current directory
    $ gprof ./perl.gprof > out
    $ view out

=head2 GCC gcov Profiling

Starting from GCC 3.0 I<basic block profiling> is officially available
for the GNU CC.

You can build a profiled version of perl called F<perl.gcov> by
invoking the make target "perl.gcov" (what is required is that Perl must
be compiled using gcc with the flags C<-fprofile-arcs
-ftest-coverage>, so you may need to re-Configure).

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.da> file (F<.gcda>
with newer versions of gcc) will be created.

To display the results you use the "gcov" utility (which should
be installed if you have gcc 3.0 or newer installed). F<gcov> is
run on source code files, like this

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files
contain the source code annotated with relative frequencies of
execution indicated by "#" markers.

Useful options of F<gcov> include C<-b> which will summarise the
basic block, branch, and function call coverage, and C<-c> which
instead of relative frequencies will use the actual counts. For
more information on the use of F<gcov> and basic block profiling
with gcc, see the latest GNU CC manual; as of GCC 3.0 see

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html

and its section titled "8. gcov: a Test Coverage Program"

    http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' \
        -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' && make perl.gcov
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl.gcov
    $ gcov regexec.c
    $ view regexec.c.gcov

=head2 Pixie Profiling

Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
I<basic-block counting>.

You can build a profiled version of perl called F<perl.pixie> by
invoking the make target "perl.pixie" (what is required is that Perl
must be compiled using the C<-g> flag, so you may need to re-Configure).

In Tru64 a file called F<perl.Addrs> will also be silently created;
this file contains the addresses of the basic blocks. Running the
profiled version of Perl will create a new file called "perl.Counts"
which contains the counts for the basic blocks for that particular
program execution.

To display the results you use the F<prof> utility. The exact
incantation depends on your operating system: "prof perl.Counts" in
IRIX, and "prof -pixie -all -L. perl" in Tru64.

In IRIX the following prof options are available:

=over 4

=item -h

Reports the most heavily used lines in descending order of use.
Useful for finding the hotspot lines.

=item -l

Groups lines by procedure, with procedures sorted in descending order of use.
Within a procedure, lines are listed in source order.
Useful for finding the hotspots of procedures.

=back

In Tru64 the following options are available:

=over 4

=item -p[rocedures]

Procedures sorted in descending order by the number of cycles executed
in each procedure. Useful for finding the hotspot procedures.
(This is the default option.)

=item -h[eavy]

Lines sorted in descending order by the number of cycles executed in
each line. Useful for finding the hotspot lines.

=item -i[nvocations]

The called procedures are sorted in descending order by number of calls
made to the procedures. Useful for finding the most used procedures.

=item -l[ines]

Grouped by procedure, sorted by cycles executed per procedure.
Useful for finding the hotspots of procedures.

=item -testcoverage

The compiler emitted code for these lines, but the code was unexecuted.

=item -z[ero]

Unexecuted procedures.

=back

For further information, see your system's manual pages for pixie and prof.

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, or the pureperl or perl.third executables, please note that
by default perl B<does not> explicitly clean up all the memory it has
allocated (such as global memory arenas) but instead lets the exit()
of the whole program "take care" of such allocations, also known as
"global destruction of objects".

There is a way to tell perl to do complete cleanup: set the
environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
The t/TEST wrapper does set this to 2, and this is what you
need to do too if you don't want to see the "global leaks".
For example, for "third-degreed" Perl:

    env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and extends its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)

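The idea behind the variable can be sketched as follows. This is a hypothetical program, not Perl's C<perl_destruct()>. It reads the level from the environment and frees its long-lived global allocations only when the level is non-zero. That explicit freeing is what stops a leak checker from reporting such memory as leaked or "still reachable" at exit.

```c
#include <stdlib.h>

static char *global_arena;   /* hypothetical long-lived allocation */

/* This sketch assumes a missing or non-numeric value means 0,
 * i.e. leave the cleanup to exit(). */
int destruct_level(void)
{
    const char *s = getenv("PERL_DESTRUCT_LEVEL");
    return s ? atoi(s) : 0;
}

void interp_init(void)
{
    global_arena = malloc(4096);
}

/* Returns 1 if the global memory was explicitly released. */
int interp_destruct(void)
{
    if (destruct_level() == 0)
        return 0;            /* fast path: let the OS reclaim everything */
    free(global_arena);
    global_arena = NULL;
    return 1;
}
```
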
If, at the end of a run, you get the message I<N scalars leaked>, you can
recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause the addresses
of all those leaked SVs to be dumped along with details as to where each
SV was originally allocated. This information is also displayed by
Devel::Peek. Note that the extra details recorded with each SV increase
memory usage, so it shouldn't be used in production environments. It also
converts C<new_SV()> from a macro into a real function, so you can use
your favourite debugger to discover where those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV allocations
in addition to memory allocations. Each SV allocation has a distinct
serial number that will be written on creation and destruction of the SV.
So if you're executing the leaking code in a loop, you need to look for
SVs that are created, but never destroyed between each cycle. If such an
SV is found, set a conditional breakpoint within C<new_SV()> and make it
break only when C<PL_sv_serial> is equal to the serial number of the
leaking SV. Then you will catch the interpreter in exactly the state
where the leaking SV is allocated, which is sufficient in many cases to
find the source of the leak.

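The serial-number technique described above is easy to model outside Perl. The sketch below is hypothetical: its names only echo C<new_SV()> and C<PL_sv_serial>, and it is not Perl's implementation. Every allocation gets the next serial number and every free is recorded. After running the suspect code in a loop, you list the serials that were created but never destroyed. Against this toy, a serial found that way could then be used in a gdb conditional breakpoint such as C<break new_thing if next_serial == 7>.

```c
#include <stdlib.h>

#define MAX_TRACKED 1024

typedef struct { unsigned long serial; int value; } thing;

static unsigned long next_serial;          /* cf. PL_sv_serial */
static unsigned char alive[MAX_TRACKED];   /* alive[serial] != 0: not freed */

thing *new_thing(int value)
{
    thing *t;
    if (next_serial >= MAX_TRACKED || (t = malloc(sizeof *t)) == NULL)
        return NULL;
    t->serial = next_serial++;
    t->value = value;
    alive[t->serial] = 1;                  /* "written on creation" */
    return t;
}

void del_thing(thing *t)
{
    alive[t->serial] = 0;                  /* "written on destruction" */
    free(t);
}

/* Store up to max serials of never-freed things in out; return count. */
size_t leaked_serials(unsigned long *out, size_t max)
{
    size_t n = 0;
    unsigned long s;
    for (s = 0; s < next_serial && n < max; s++)
        if (alive[s])
            out[n++] = s;
    return n;
}
```

If each loop cycle creates three things and frees only the first and the last, the leaked serials come out as 1, 4, 7, and so on: one per cycle, just as the text describes.
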
As C<-Dm> is using the PerlIO layer for output, it will by itself
allocate quite a bunch of SVs, which are hidden to avoid recursion.
You can bypass the PerlIO layer if you use the SV logging provided
by C<-DPERL_MEM_LOG> instead.

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG>, both memory and SV allocations go
through logging functions, which is handy for breakpoint setting.

Unless C<-DPERL_MEM_LOG_NOIMPL> is also defined, the logging
functions read $ENV{PERL_MEM_LOG} to determine whether to log the
event, and if so how:

    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
    $ENV{PERL_MEM_LOG} =~ /t/           include timestamp in Log
    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      write to FD given (default is 2)

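Read as code, the table above amounts to something like the following decoder. This is a hypothetical re-implementation for illustration, not Perl's own code. A leading run of digits selects the file descriptor (default 2, standard error). The letters C<m>, C<s> and C<t> anywhere in the value switch on memory logging, SV logging and timestamps respectively.

```c
#include <ctype.h>
#include <string.h>

struct mem_log_opts {
    int fd;           /* file descriptor to write to */
    int log_memory;   /* 'm': Newx()/Renew()/Safefree() */
    int log_sv;       /* 's': SV allocations */
    int timestamp;    /* 't': prefix entries with a timestamp */
};

/* Decode a PERL_MEM_LOG-style value such as "3mt". */
struct mem_log_opts parse_mem_log(const char *val)
{
    struct mem_log_opts o = { 2, 0, 0, 0 };   /* default: fd 2 (stderr) */

    if (val == NULL)
        return o;
    if (isdigit((unsigned char)*val)) {       /* /^(\d+)/ */
        o.fd = 0;
        while (isdigit((unsigned char)*val))
            o.fd = o.fd * 10 + (*val++ - '0');
    }
    o.log_memory = strchr(val, 'm') != NULL;  /* /m/ */
    o.log_sv     = strchr(val, 's') != NULL;  /* /s/ */
    o.timestamp  = strchr(val, 't') != NULL;  /* /t/ */
    return o;
}
```

Under this reading, C<PERL_MEM_LOG=3mt> logs memory operations with timestamps to file descriptor 3. C<PERL_MEM_LOG=s> logs only SV operations to standard error.
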
Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(),
and Safefree() are logged with the caller's source code file and line
number (and C function name, if supported by the C compiler). In
contrast, C<-Dm> operates directly at the point of C<malloc()>. SV
logging is similar.

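Logging of this kind usually attributes each allocation to its caller with a common C trick. The allocator is wrapped in a macro that passes C<__FILE__>, C<__LINE__> and, on C99 compilers, C<__func__>. Here is a hypothetical sketch: C<LOGGED_NEW> and C<logged_malloc> are illustrative names, not Perl's C<Newx()> machinery. It records the last call site instead of printing it, so the behaviour is easy to check.

```c
#include <stdlib.h>

struct call_site { const char *file; int line; const char *func; };

static struct call_site last_alloc;   /* most recent caller, for inspection */

void *logged_malloc(size_t n, const char *file, int line, const char *func)
{
    last_alloc.file = file;
    last_alloc.line = line;
    last_alloc.func = func;
    /* a real logger would write file:line, func and n to its log here */
    return malloc(n);
}

/* Each use of the macro expands to a call carrying its own location. */
#define LOGGED_NEW(n) logged_malloc((n), __FILE__, __LINE__, __func__)

const struct call_site *last_call_site(void) { return &last_alloc; }

char *make_buffer(void)
{
    return LOGGED_NEW(64);   /* logged as coming from make_buffer() */
}
```
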
Since the logging doesn't use PerlIO, all SV allocations are logged
and no extra SV allocations are introduced by enabling the logging.
If compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for
each SV allocation is also logged.

=head2 DDD over gdb

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that simply edit the ~/.ddd/init file and add after:

    ! Display shortcuts.
    Ddd*gdbDisplayShortcuts: \
    /t () // Convert to Bin\n\
    /d () // Convert to Dec\n\
    /x () // Convert to Hex\n\
    /o () // Convert to Oct\n\

the following two lines:

    ((XPV*) (())->sv_any )->xpv_pv // 2pvx\n\
    ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups or you can plug there the
sv_peek "conversion":

    Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The my_perl is for threaded builds.)
Just remember that every line, but the last one, should end with C<\n\>.

Alternatively edit the init file interactively via:
3rd mouse button -> New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb
section.

=head2 Poison

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros;
see L<perlclib>.

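The effect is easy to reproduce. If a buffer is filled with a fixed byte before it is handed out, and with a different one before it is freed, any use of uninitialised or stale memory stands out in a debugger. Below is a minimal sketch with plain C<memset()>. The helper names are hypothetical; see L<perlclib> for Perl's own C<Poison()> family.

```c
#include <stdlib.h>
#include <string.h>

#define POISON_NEW  0xAB   /* the byte patterns mentioned above */
#define POISON_FREE 0xEF

/* Allocate n bytes pre-filled with 0xAB, so reads of never-written
 * memory show up as 0xABABABAB... in a debugger. */
void *poisoned_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, POISON_NEW, n);
    return p;
}

/* Overwrite with 0xEF before releasing, so data read through a
 * dangling pointer shows up as 0xEFEFEFEF. Note that an optimising
 * compiler may drop this store as dead, so debug at -O0. */
void poisoned_free(void *p, size_t n)
{
    if (p != NULL) {
        memset(p, POISON_FREE, n);
        free(p);
    }
}
```
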
=head2 Read-only optrees

Under ithreads the optree is read only. If you want to enforce this, to check
for write accesses from buggy code, compile with C<-DPL_OP_SLAB_ALLOC> to
enable the OP slab allocator and C<-DPERL_DEBUG_READONLY_OPS> to enable code
that allocates op memory via C<mmap>, and sets it read-only at run time.
Any write access to an op results in a C<SIGBUS> and abort.

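The underlying operating system mechanism can be tried in isolation. This hedged POSIX sketch assumes C<mmap()> with C<MAP_ANONYMOUS>, as on Linux and the BSDs, and is not Perl's slab code. It maps a page, writes to it, makes it read-only with C<mprotect()>, and confirms that a stray write now raises a signal. Linux typically delivers C<SIGSEGV> in this case and other systems may use C<SIGBUS>, so the sketch catches both.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS and sigsetjmp on glibc */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_back;

static void on_fault(int sig)
{
    siglongjmp(jump_back, sig);          /* escape from the faulting write */
}

/* Returns the signal caught when writing to the protected page,
 * 0 if the write unexpectedly succeeded, or -1 on setup failure. */
int write_to_readonly_page(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    struct sigaction sa;
    volatile char *page;
    int sig;

    page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    page[0] = 42;                        /* still writable here */
    if (mprotect((void *)page, (size_t)pagesz, PROT_READ) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);

    sig = sigsetjmp(jump_back, 1);
    if (sig == 0)
        page[0] = 43;                    /* should fault; else sig stays 0 */

    munmap((void *)page, (size_t)pagesz);
    return sig;
}
```
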
3268 | This code is intended for development only, and may not be portable even to | |
3269 | all Unix variants. Also, it is an 80% solution, in that it isn't able to make | |
3270 | all ops read only. Specifically it | |
3271 | ||
3272 | =over | |
3273 | ||
3274 | =item 1 | |
3275 | ||
3276 | Only sets read-only on all slabs of ops at C<CHECK> time, hence ops allocated | |
3277 | later via C<require> or C<eval> will be re-write | |
3278 | ||
3279 | =item 2 | |
3280 | ||
3281 | Turns an entire slab of ops read-write if the refcount of any op in the slab | |
3282 | needs to be decreased. | |
3283 | ||
3284 | =item 3 | |
3285 | ||
3286 | Turns an entire slab of ops read-write if any op from the slab is freed. | |
3287 | ||
b8ddf6b3 SB |
3288 | =back |
3289 | ||
f1fac472 NC |
3290 | It's not possible to turn the slabs back to read-only after an action requiring |
3291 | read-write access, as either action can happen while the op tree is still being built, so | |
3292 | there may still be legitimate write access. | |
3293 | ||
3294 | However, as an 80% solution it is still effective, as currently it catches | |
3295 | a write access during the generation of F<Config.pm>, which means that we | |
3296 | can't yet build F<perl> with this enabled. | |
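The three rules above can be modelled in a few lines. This is a toy sketch that tracks protection as a boolean flag instead of calling C<mprotect>; C<toy_op>, C<toy_slab>, C<SLAB_OPS> and the functions are hypothetical names, not Perl's real data structures. It shows why a single refcount decrement or free unlocks a whole slab, and why nothing locks it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_OPS 8  /* hypothetical number of ops per slab */

/* Hypothetical stand-ins for an op and a slab of ops. */
struct toy_op {
    unsigned refcnt;
    bool freed;
};

struct toy_slab {
    struct toy_op ops[SLAB_OPS];
    bool readonly;      /* models the protection of the whole slab */
    unsigned rw_flips;  /* times the slab was made writable again */
};

/* Rule 1: done once, at CHECK time. Slabs created afterwards (for code
 * compiled by require or eval) are simply never locked. */
static void toy_slab_lock(struct toy_slab *s)
{
    s->readonly = true;
}

/* Protection is per slab, so unlocking for one op unlocks its neighbours. */
static void toy_slab_unlock(struct toy_slab *s)
{
    if (s->readonly) {
        s->readonly = false;
        s->rw_flips++;
    }
}

/* Rule 2: decreasing any op's refcount turns the whole slab read-write. */
static void toy_op_refcnt_dec(struct toy_slab *s, size_t i)
{
    assert(i < SLAB_OPS && s->ops[i].refcnt > 0);
    toy_slab_unlock(s);
    s->ops[i].refcnt--;
}

/* Rule 3: freeing any op from the slab also turns the whole slab read-write.
 * Nothing here relocks the slab afterwards: later writes while the op tree
 * is still being built may be legitimate. */
static void toy_op_free(struct toy_slab *s, size_t i)
{
    assert(i < SLAB_OPS);
    toy_slab_unlock(s);
    s->ops[i].freed = true;
}
```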
3297 | ||
955fec6b | 3298 | =head1 CONCLUSION |
a422fd2d | 3299 | |
955fec6b JH |
3300 | We've had a brief look around the Perl source, how to maintain the quality |
3301 | of the source code, an overview of the stages F<perl> goes through | |
3302 | when it's running your code, how to use debuggers to poke at the Perl | |
3303 | guts, and how to analyse the execution of Perl. We took a very | |
3304 | simple problem and demonstrated how to solve it fully, with | |
3305 | documentation, regression tests, and a patch for submission to | |
3306 | p5p. Finally, we talked about how to use external tools to debug and | |
3307 | test Perl. | |
a422fd2d SC |
3308 | |
3309 | I'd now suggest you read over those references again, and then, as soon | |
3310 | as possible, get your hands dirty. The best way to learn is by doing, | |
07aa3531 | 3311 | so: |
a422fd2d SC |
3312 | |
3313 | =over 3 | |
3314 | ||
3315 | =item * | |
3316 | ||
3317 | Subscribe to perl5-porters, follow the patches and try to understand | |
3318 | them; don't be afraid to ask if there's a portion you're not clear on. | |
3319 | Who knows, you may unearth a bug in the patch... | |
3320 | ||
3321 | =item * | |
3322 | ||
3323 | Keep up to date with the bleeding-edge Perl distributions and get | |
3324 | familiar with the changes. Try to get an idea of what areas people are | |
3325 | working on and the changes they're making. | |
3326 | ||
3327 | =item * | |
3328 | ||
3e148164 | 3329 | Do read the README associated with your operating system, e.g. README.aix |
a1f349fd MB |
3330 | on the IBM AIX OS. Don't hesitate to supply patches to that README if |
3331 | you find anything missing or out of date after a new OS release. | |
3332 | ||
3333 | =item * | |
3334 | ||
a422fd2d SC |
3335 | Find an area of Perl that seems interesting to you, and see if you can |
3336 | work out how it works. Scan through the source, and step over it in the | |
3337 | debugger. Play, poke, investigate, fiddle! You'll probably get to | |
3338 | understand not just your chosen area but a much wider range of F<perl>'s | |
3339 | activity as well, and probably sooner than you'd think. | |
3340 | ||
3341 | =back | |
3342 | ||
3343 | =over 3 | |
3344 | ||
3345 | =item I<The Road goes ever on and on, down from the door where it began.> | |
3346 | ||
3347 | =back | |
3348 | ||
64d9b66b | 3349 | If you can do these things, you've started on the long road to Perl porting. |
a422fd2d SC |
3350 | Thanks for wanting to help make Perl better - and happy hacking! |
3351 | ||
4ac71550 TC |
3352 | =head2 Metaphoric Quotations |
3353 | ||
3354 | If you recognized the quote about the Road above, you're in luck. | |
3355 | ||
3356 | Most software projects begin each file with a literal description of each | |
3357 | file's purpose. Perl instead begins each with a literary allusion to that | |
3358 | file's purpose. | |
3359 | ||
3360 | Like chapters in many books, all top-level Perl source files (along with a | |
3361 | few others here and there) begin with an epigrammatic inscription that alludes, | |
3362 | indirectly and metaphorically, to the material you're about to read. | |
3363 | ||
3364 | Quotations are taken from writings of J.R.R. Tolkien pertaining to his | |
3365 | Legendarium, almost always from I<The Lord of the Rings>. Chapters and | |
3366 | page numbers are given using the following editions: | |
3367 | ||
3368 | =over 4 | |
3369 | ||
3370 | =item * | |
3371 | ||
3372 | I<The Hobbit>, by J.R.R. Tolkien. The hardcover, 70th-anniversary | |
3373 | edition of 2007 was used, published in the UK by Harper Collins Publishers | |
3374 | and in the US by the Houghton Mifflin Company. | |
3375 | ||
3376 | =item * | |
3377 | ||
3378 | I<The Lord of the Rings>, by J.R.R. Tolkien. The hardcover, | |
3379 | 50th-anniversary edition of 2004 was used, published in the UK by Harper | |
3380 | Collins Publishers and in the US by the Houghton Mifflin Company. | |
3381 | ||
3382 | =item * | |
3383 | ||
3384 | I<The Lays of Beleriand>, by J.R.R. Tolkien and published posthumously by his | |
3385 | son and literary executor, C.J.R. Tolkien, being the 3rd of the 12 volumes | |
3386 | in Christopher's mammoth I<History of Middle-earth>. Page numbers derive | |
3387 | from the hardcover edition, first published in 1983 by George Allen & | |
3388 | Unwin; no page numbers changed for the special 3-volume omnibus edition of | |
3389 | 2002 or the various trade-paper editions, all again now by Harper Collins | |
3390 | or Houghton Mifflin. | |
3391 | ||
3392 | =back | |
3393 | ||
3394 | Other JRRT books fair game for quotes would thus include I<The Adventures of | |
3395 | Tom Bombadil>, I<The Silmarillion>, I<Unfinished Tales>, and I<The Tale of | |
3396 | the Children of Hurin>, all but the first posthumously assembled by CJRT. | |
3397 | But I<The Lord of the Rings> itself is perfectly fine and probably best to | |
3398 | quote from, provided you can find a suitable quote there. | |
3399 | ||
3400 | So if you were to supply a new, complete, top-level source file to add to | |
3401 | Perl, you should conform to this peculiar practice by yourself selecting an | |
3402 | appropriate quotation from Tolkien, retaining the original spelling and | |
3403 | punctuation and using the same format the rest of the quotes are in. | |
3404 | Indirect and oblique is just fine; remember, it's a metaphor, so being meta | |
3405 | is, after all, what it's for. | |
3406 | ||
e8cd7eae GS |
3407 | =head1 AUTHOR |
3408 | ||
3409 | This document was written by Nathan Torkington, and is maintained by | |
3410 | the perl5-porters mailing list. | |
4ac71550 | 3411 | |
b16c2e4a RGS |
3412 | =head1 SEE ALSO |
3413 | ||
3414 | L<perlrepository> |