This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
 perl ./Porting/podtidy pod/perlhacktips.pod

=head1 NAME

perlhacktips - Tips for Perl core C code hacking

=head1 DESCRIPTION

This document will help you learn the best way to go about hacking on
the Perl core C code. It covers common problems, debugging, profiling,
and more.

If you haven't read L<perlhack> and L<perlhacktut> yet, you might want
to do that first.

=head1 COMMON PROBLEMS

Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.
You don't care about some particular platform having broken Perl? I
hear there is still a strong demand for J2EE programmers.

=head2 Perl environment problems

=over 4

=item *

Not compiling with threading

Compiling with threading (-Duseithreads) completely rewrites the
function prototypes of Perl. You had better try your changes with that.
Related to this is the difference between "Perl_-less" and "Perl_-ly"
APIs, for example:

 Perl_sv_setiv(aTHX_ ...);
 sv_setiv(...);

The first one explicitly passes in the context, which is needed for
e.g. threaded builds. The second one does that implicitly; do not get
them mixed. If you are not passing in an aTHX_, you will need to do a
dTHX as the first thing in the function.

See L<perlguts/"How multiple interpreters and concurrency are
supported"> for further discussion about context.

=item *

Not compiling with -DDEBUGGING

The DEBUGGING define exposes more code to the compiler, therefore more
ways for things to go wrong. You should try it.

=item *

Introducing (non-read-only) globals

Do not introduce any modifiable globals, truly global or file static.
They are bad form and complicate multithreading and other forms of
concurrency. The right way is to introduce them as new interpreter
variables, see F<intrpvar.h> (at the very end for binary
compatibility).

Introducing read-only (const) globals is okay, as long as you verify
with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
BSD-style output) that the data you added really is read-only. (If it
is, it shouldn't show up in the output of that command.)

If you want to have static strings, make them constant:

 static const char etc[] = "...";

If you want to have arrays of constant strings, note carefully the
right combination of C<const>s:

 static const char * const yippee[] =
     {"hi", "ho", "silver"};

=item *

Not exporting your new function

Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
function that is part of the public API (the shared Perl library) to be
explicitly marked as exported. See the discussion about F<embed.pl> in
L<perlguts>.

=item *

Exporting your new function

The new shiny result of either genuine new functionality or your
arduous refactoring is now ready and correctly exported. So what could
possibly go wrong?

Maybe simply that your function did not need to be exported in the
first place. Perl has a long and not so glorious history of exporting
functions that it should not have.

If the function is used only inside one source code file, make it
static. See the discussion about F<embed.pl> in L<perlguts>.

If the function is used across several files, but intended only for
Perl's internal use (and this should be the common case), do not export
it to the public API. See the discussion about F<embed.pl> in
L<perlguts>.

=back

=head2 Portability problems

The following are common causes of compilation and/or execution
failures, not specific to Perl as such. The C FAQ is good bedtime
reading. Please test your changes with as many C compilers and
platforms as possible; we will, anyway, and it's nice to save oneself
from public embarrassment.

If using gcc, you can add the C<-std=c89> option which will hopefully
catch most of these unportabilities. (However it might also catch
incompatibilities in your system's header files.)

Use the Configure C<-Dgccansipedantic> flag to enable the gcc C<-ansi
-pedantic> flags which enforce stricter ANSI rules.

If using C<gcc -Wall>, note that not all the possible warnings (like
C<-Wuninitialized>) are given unless you also compile with C<-O>.

Note that if using gcc, starting from Perl 5.9.5 the Perl core source
code files (the ones at the top level of the source code distribution,
but not e.g. the extensions under ext/) are automatically compiled with
as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>, and a
selection of C<-W> flags (see cflags.SH).

Also study L<perlport> carefully to avoid any bad assumptions about the
operating system, filesystems, character set, and so forth.

You may once in a while try a "make microperl" to see whether we can
still compile Perl with just the bare minimum of interfaces. (See
README.micro.)

Do not assume an operating system indicates a certain compiler.

=over 4

=item *

Casting pointers to integers or casting integers to pointers

 void castaway(U8* p)
 {
     IV i = p;

or

 void castaway(U8* p)
 {
     IV i = (IV)p;

Both are bad, and broken, and unportable. Use the PTR2IV() macro that
does it right. (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and
NUM2PTR().)

=item *

Casting between function pointers and data pointers

Technically speaking, casting between function pointers and data
pointers is unportable and undefined. In practice it seems to work,
but you should still use the FPTR2DPTR() and DPTR2FPTR() macros.
Sometimes you can also play games with unions.

=item *

Assuming sizeof(int) == sizeof(long)

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits. This is all legal according to the C standard. (In
other words, "long long" is not a portable way to specify 64 bits, and
"long long" is not even guaranteed to be any wider than "long".)

Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
Avoid things like I32 because they are B<not> guaranteed to be
I<exactly> 32 bits, they are I<at least> 32 bits, nor are they
guaranteed to be B<int> or B<long>. If you really explicitly need
64-bit variables, use I64 and U64, but only if guarded by HAS_QUAD.

=item *

Assuming one can dereference any type of pointer for any type of data

 char *p = ...;
 long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead of
a pony if the p happens not to be correctly aligned.
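
One portable way around the alignment trap is to copy the bytes into a
properly typed variable instead of dereferencing a cast pointer. The
following is a minimal standalone sketch; C<read_long_at> is a
hypothetical helper name used for illustration, not part of the Perl
API.

```c
#include <string.h>

/* Hypothetical helper: read a long from a possibly misaligned address. */
static long read_long_at(const char *p)
{
    long value;

    /* A byte-wise copy has no alignment requirement on p. */
    memcpy(&value, p, sizeof value);
    return value;
}
```

The copy costs little on most platforms, and it behaves the same
whether or not the hardware tolerates misaligned loads.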

=item *

Lvalue casts

 (int)*p = ...;    /* BAD */

Simply not portable. Get your lvalue to be of the right type, or maybe
use temporary variables, or dirty tricks with unions.

=item *

Assuming B<anything> about structs (especially the ones you don't
control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of certain signedness, sizeof, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ

=back

=item *

That the sizeof(struct) or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields -
the bytes can be anything

=item *

Structs are required to be aligned to the maximum alignment required by
the fields - which for native types is usually equivalent to
sizeof() of the field

=back

=back

=item *

Assuming the character set is ASCIIish

Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, nor hex) constants
to refer to characters. You can safely say C<'A'>, but not C<0x41>.
You can safely say C<'\n'>, but not C<\012>. However, you can use
macros defined in F<utf8.h> to specify any code point portably.
C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
ASCII platforms it compiles without adding any extra code, so there is
zero performance hit on those). The acceptable inputs to
C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>. If your input
isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
direction.

If you need the string representation of a character that doesn't have a
mnemonic name in C, you should add it to the list in
F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for you,
based on the current platform.

Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
properly on native code points and strings.

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to
'z'. But '0' - '9' is an unbroken range in both systems. Don't assume
anything about other ranges. (Note that special handling of ranges in
regular expression patterns and transliterations makes it appear to Perl
code that the aforementioned ranges are all unbroken.)
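
To make the point about ranges concrete, here is a standalone sketch of
a test that does not rely on 'A' - 'Z' being contiguous. The helper
name C<is_upper_az> is hypothetical; inside the Perl core you would
normally reach for the C<isI<FOO>> macros from F<handy.h> instead.

```c
#include <string.h>

/* Hypothetical helper: is c one of the 26 English uppercase letters?
 * Listing the letters avoids assuming that 'A'..'Z' are adjacent. */
static int is_upper_az(int c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* sizeof upper - 1 keeps the terminating NUL out of the search. */
    return memchr(upper, c, sizeof upper - 1) != NULL;
}
```

A comparison such as C<c E<gt>= 'A' && c E<lt>= 'Z'> would also accept
the non-letters that sit between the letter blocks on EBCDIC platforms.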

Many of the comments in the existing code ignore the possibility of
EBCDIC, and may be wrong therefore, even if the code works. This is
actually a tribute to the successful transparent insertion of being
able to handle EBCDIC without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes. Macros with the same names
(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well. Again, comments in the code may well be wrong even if
the code itself is right. For example, the concept of UTF-8 C<invariant
characters> differs between ASCII and EBCDIC. On ASCII platforms, only
characters that do not have the high-order bit set (i.e. whose ordinals
are strict ASCII, 0 - 127) are invariant, and the documentation and
comments in the code may assume that, often referring to something
like, say, C<hibit>. The situation differs and is not so simple on
EBCDIC machines, but as long as the code itself uses the
C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
comments are wrong.

As noted in L<perlhack/TESTING>, when writing test scripts, the file
F<t/charset_tools.pl> contains some helpful functions for writing tests
valid on both ASCII and EBCDIC platforms. Sometimes, though, a test
can't use a function and it's inconvenient to have different test
versions depending on the platform. There are 20 code points that are
the same in all 4 character sets currently recognized by Perl (the 3
EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)). These can be used in
such tests, though there is a small possibility that Perl will become
available in yet another character set, breaking your test. All but one
of these code points are C0 control characters. The most significant
controls that are the same are C<\0>, C<\r>, and C<\N{VT}> (also
specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>). The single
non-control is U+00B6 PILCROW SIGN. The controls that are the same have
the same bit pattern in all 4 character sets, regardless of the UTF8ness
of the string containing them. The bit pattern for U+B6 is the same in
all 4 for non-UTF8 strings, but differs in each when its containing
string is UTF-8 encoded. The only other code points that have some sort
of sameness across all 4 character sets are the pair 0xDC and 0xFC.
Together these represent upper- and lowercase LATIN LETTER U WITH
DIAERESIS, but which is upper and which is lower may be reversed: 0xDC
is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the
capital in EBCDIC and 0xDC is the small one. This factoid may be
exploited in writing case insensitive tests that are the same across all
4 character sets.

=item *

Assuming the character set is just ASCII

ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a
locale, currently these extra characters are generally considered to be
unassigned, and this has presented some problems. This has been
changed starting in 5.12 so that these characters can be considered to
be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

 #define BURGLE(x) ... \
 #ifdef BURGLE_OLD_STYLE        /* BAD */
 ... do it the old way ... \
 #else
 ... do it the new way ... \
 #endif

You cannot portably "stack" cpp directives. For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.
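
The portable arrangement puts the conditional on the outside and one
complete definition in each branch. A sketch, in which both macro
bodies are placeholders invented for illustration:

```c
/* The #ifdef wraps two complete definitions instead of sitting
 * inside one.  Both bodies are placeholders for illustration. */
#ifdef BURGLE_OLD_STYLE
#  define BURGLE(x) ((x) * 2)      /* do it the old way */
#else
#  define BURGLE(x) ((x) << 1)     /* do it the new way */
#endif
```

Callers write C<BURGLE(n)> either way and never see which branch was
selected.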

=item *

Adding non-comment stuff after #endif or #else

 #ifdef SNOSH
 ...
 #else !SNOSH    /* BAD */
 ...
 #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them. If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

 #ifdef SNOSH
 ...
 #else /* !SNOSH */
 ...
 #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant (by
default on starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

 enum color {
     CERULEAN,
     CHARTREUSE,
     CINNABAR,     /* BAD */
 };

is not portable. Leave out the last comma.

Also note that whether enums are implicitly morphable to ints varies
between compilers; you might need an explicit (int) cast.

=item *

Using //-comments

 // This function bamfoodles the zorklator.   /* BAD */

That is C99 or C++. Perl is C89. Using the //-comments is silently
allowed by many C compilers but cranking up the ANSI C89 strictness
(which we like to do) causes the compilation to fail.

=item *

Mixing declarations and code

 void zorklator()
 {
     int n = 3;
     set_zorkmids(n);    /* BAD */
     int q = 4;

That is C99 or C++. Some C compilers allow that, but you shouldn't.

The gcc option C<-Wdeclaration-after-statement> scans for such
problems (by default on starting from Perl 5.9.4).
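
For comparison, a C89-clean version of the same shape hoists every
declaration to the top of the block. In this sketch the body of
C<set_zorkmids> and the return values are invented so that the example
stands alone.

```c
/* Hypothetical stand-in for the set_zorkmids() used above. */
static int set_zorkmids(int n)
{
    return n * 2;
}

static int zorklator(void)
{
    int n = 3;      /* all declarations come first ... */
    int q = 4;
    int z;

    z = set_zorkmids(n);    /* ... and statements follow */
    return z + q;
}
```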

=item *

Introducing variables inside for()

 for(int i = ...; ...; ...) {    /* BAD */

That is C99 or C++. While it would indeed be awfully nice to have that
also in C89, to limit the scope of the loop variable, alas, we cannot.
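
If keeping the loop variable's scope tight matters, one C89-compatible
option is to open an extra block around the loop. A small sketch, with
C<sum_to> as a hypothetical function name:

```c
/* Hypothetical example: add up the integers 1..limit. */
static int sum_to(int limit)
{
    int sum = 0;

    {
        int i;  /* visible only inside this block */

        for (i = 1; i <= limit; i++)
            sum += i;
    }
    return sum;
}
```

Whether the extra nesting is worth it is a judgment call; often the
declaration simply goes at the top of the function.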

=item *

Mixing signed char pointers with unsigned char pointers

 int foo(char *s) { ... }
 ...
 unsigned char *t = ...; /* Or U8* t = ... */
 foo(t);   /* BAD */

While this is legal practice, it is certainly dubious, and downright
fatal on at least one platform: for example VMS cc considers this a
fatal error. One cause for people often making this mistake is that a
"naked char" and therefore dereferencing a "naked char pointer" have an
undefined signedness: it depends on the compiler and the flags of the
compiler and the underlying platform whether the result is signed or
unsigned. For this very same reason using a 'char' as an array index is
bad.
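
The array index problem is usually avoided by converting to
C<unsigned char> before indexing. A standalone sketch, in which
C<count_bytes> is a hypothetical helper:

```c
#include <limits.h>

/* Hypothetical helper: tally how often each byte value occurs in a
 * NUL-terminated string.  counts must have UCHAR_MAX + 1 entries. */
static void count_bytes(const char *s, unsigned int *counts)
{
    for (; *s; s++) {
        /* Cast first: a plain char may be negative for bytes > 127,
         * which would index before the start of the array. */
        counts[(unsigned char)*s]++;
    }
}
```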

=item *

Macros that have string constants and their arguments as substrings of
the string constants

 #define FOO(n) printf("number = %d\n", n)    /* BAD */
 FOO(10);

Pre-ANSI semantics for that was equivalent to

 printf("10umber = %d\10");

which is probably not what you were expecting. Unfortunately at least
one reasonably common and modern C compiler does "real backward
compatibility" here, in AIX that is what still happens even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

 IV i = ...;
 printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens to
be an C<int>), in general it cannot. IV might be something larger. The
situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

 Uid_t who = ...;
 printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might be not only not C<int>-wide but it
might also be unsigned, in which case large uids would be printed as
negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available with
either an 'f' or '_f' suffix, for example:

 IVdf    /* IV in decimal */
 UVxf    /* UV in hexadecimal */

 printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

 Uid_t_f    /* Uid_t in decimal */

 printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

 printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
print those.

Also remember that the C<%p> format really does require a void pointer:

 U8* p = ...;
 printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.
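
The way C<IVdf> slots into a format string relies on the compiler
concatenating adjacent string literals. The standalone sketch below
shows only that mechanism: C<my_iv> and C<MY_IVdf> are hypothetical
stand-ins for the real C<IV> and C<IVdf>, and plain snprintf() is used
solely so that the sketch compiles outside the Perl tree (core code
would use my_snprintf()).

```c
#include <stdio.h>

/* Hypothetical stand-ins; in Perl the real ones come from Configure. */
typedef long my_iv;
#define MY_IVdf "ld"    /* must match the type behind my_iv */

static int format_iv(char *buf, size_t size, my_iv value)
{
    /* The three literals "i = %", "ld" and "\n" become one string. */
    return snprintf(buf, size, "i = %" MY_IVdf "\n", value);
}
```

Because the type and its format letter are defined side by side, a
platform that needs a wider type changes both in one place.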

=item *

Blindly using variadic macros

gcc has had them for a while with its own syntax, and C99 brought them
with a standardized syntax. Don't use the former, and use the latter
only if the HAS_C99_VARIADIC_MACROS is defined.

=item *

Blindly passing va_list

Not all platforms support passing va_list to further varargs (stdarg)
functions. The right thing to do is to copy the va_list using the
Perl_va_copy() if the NEED_VA_COPY is defined.

=item *

Using gcc statement expressions

 val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable. The Perl code does
admittedly use them if available to gain some extra speed (essentially
as a funky form of inlining), but you shouldn't.

=item *

Binding together several statements in a macro

Use the macros STMT_START and STMT_END.

 STMT_START {
    ...
 } STMT_END
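
The reason for the wrapper shows up as soon as a multi-statement macro
is used as the body of an C<if> that has an C<else>. The sketch below
uses hypothetical C<MY_STMT_START> and C<MY_STMT_END> macros built on
the common C<do { ... } while (0)> idiom, on the assumption that the
real macros behave equivalently; C<SWAP_INT> and C<order_desc> are
likewise invented for illustration.

```c
/* Hypothetical stand-ins for STMT_START and STMT_END. */
#define MY_STMT_START do
#define MY_STMT_END   while (0)

#define SWAP_INT(a, b) MY_STMT_START { \
        int tmp_ = (a);                \
        (a) = (b);                     \
        (b) = tmp_;                    \
    } MY_STMT_END

/* Put the larger value first; return 1 if a swap was needed. */
static int order_desc(int *x, int *y)
{
    /* With a bare { ... } macro body, the semicolon after the macro
     * call would end the if statement and orphan the else. */
    if (*x < *y)
        SWAP_INT(*x, *y);
    else
        return 0;
    return 1;
}
```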

=item *

Testing for operating systems or versions when you should be testing
for features

 #ifdef __FOONIX__    /* BAD */
 foo = quux();
 #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong. This is more correct (though still
not perfect, because the below is a compile-time check):

 #ifdef HAS_QUUX
 foo = quux();
 #endif

How does the HAS_QUUX become defined where it needs to be? Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), the HAS_QUUX will be correctly defined. In other platforms, the
corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated, or if you
have a good hunch of where quux() might be available, you can
temporarily try the following:

 #if (defined(__FOONIX__) || defined(__BARNIX__))
 # define HAS_QUUX
 #endif

 ...

 #ifdef HAS_QUUX
 foo = quux();
 #endif

But in any case, try to keep the features and operating systems
separate.

A good resource on the predefined macros for various operating
systems, compilers, and so forth is
L<http://sourceforge.net/p/predef/wiki/Home/>

=item *

Assuming the contents of static memory pointed to by the return values
of Perl wrappers for C library functions doesn't change

Many C library functions return pointers to static storage that can be
overwritten by subsequent calls to the same or related functions. Perl
has light-weight wrappers for some of these functions, which don't make
copies of the static memory. A good example is the interface to the
environment variables that are in effect for the program. Perl has
C<PerlEnv_getenv> to get values from the environment. But the return is
a pointer to static memory in the C library. If you are using the value
to immediately test for something, that's fine. But if you save the
value and expect it to be unchanged by later processing, you would be
wrong, and you might not know it, because different C library
implementations behave differently, and the one on the platform you're
testing on might work for your situation. On some platforms, a
subsequent call to C<PerlEnv_getenv> or related function WILL overwrite
the memory that your first call points to. This has led to some
hard-to-debug problems. Do a L<perlapi/savepv> to make a copy, thus
avoiding these problems. You will have to free the copy when you're
done to avoid memory leaks. If you don't have control over when it gets
freed, you'll need to make the copy in a mortal scalar, like so:

 if ((s = PerlEnv_getenv("foo")) == NULL) {
    ... /* handle NULL case */
 }
 else {
     s = SvPVX(sv_2mortal(newSVpv(s, 0)));
 }

The above example works only if C<"s"> is C<NUL>-terminated; otherwise
you have to pass its length to C<newSVpv>.
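
The same copy-before-you-keep-it pattern can be shown outside the Perl
tree with the plain C library. In this sketch C<copy_env> is a
hypothetical helper, and malloc()/free() stand in for what C<savepv>
and its matching free would do in core code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of an environment value,
 * or NULL if the variable is unset or memory runs out.  The caller
 * must free() the result. */
static char *copy_env(const char *name)
{
    const char *val = getenv(name);
    size_t len;
    char *copy;

    if (val == NULL)
        return NULL;
    len = strlen(val) + 1;      /* include the terminating NUL */
    copy = (char *)malloc(len);
    if (copy != NULL)
        memcpy(copy, val, len);
    return copy;
}
```

The copy stays valid no matter how many further environment lookups
happen before it is freed.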

=back

=head2 Problematic System Interfaces

=over 4

=item *

Perl strings are NOT the same as C strings: They may contain C<NUL>
characters, whereas a C string is terminated by the first C<NUL>.
That is why Perl API functions that deal with strings generally take a
pointer to the first byte and either a length or a pointer to the byte
just beyond the final one.

And this is the reason that many of the C library string handling
functions should not be used. They don't cope with the full generality
of Perl strings. It may be that your test cases don't have embedded
C<NUL>s, and so the tests pass, whereas there may well eventually arise
real-world cases where they fail. A lesson here is to include C<NUL>s
in your tests. Now it's fairly rare in most real world cases to get
C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
along.

Here's an example. It used to be a common paradigm, for decades, in the
perl core to use S<C<strchr("list", c)>> to see if the character C<c> is
any of the ones given in C<"list">, a double-quote-enclosed string of
the set of characters that we are seeing if C<c> is one of. As long as
C<c> isn't a C<NUL>, it works. But when C<c> is a C<NUL>, C<strchr>
returns a pointer to the terminating C<NUL> in C<"list">. This likely
will result in a segfault or a security issue when the caller uses that
end pointer as the starting point to read from.

A solution to this and many similar issues is to use the C<mem>I<-foo> C
library functions instead. In this case C<memchr> can be used to see if
C<c> is in C<"list"> and works even if C<c> is C<NUL>. These functions
need an additional parameter to give the string length.
In the case of literal string parameters, perl has defined macros that
calculate the length for you. See L<perlapi/String Handling>.
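
The difference between the two calls is easy to see in a few lines of
plain C. Both function names below are hypothetical, and the set
C<"+-*"> is an arbitrary example list.

```c
#include <string.h>

/* BAD: strchr() also matches the terminating NUL of the list. */
static int is_operator_unsafe(int c)
{
    return strchr("+-*", c) != NULL;
}

/* Better: memchr() searches exactly the bytes it is told to. */
static int is_operator(int c)
{
    static const char ops[] = "+-*";

    /* sizeof ops - 1 leaves the terminating NUL out of the search. */
    return memchr(ops, c, sizeof ops - 1) != NULL;
}
```

Passing C<'\0'> to the first function reports a match; the second one
correctly reports none.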

=item *

malloc(0), realloc(0), calloc(0, 0) are non-portable. To be portable
allocate at least one byte. (In general you should rarely need to work
at this low level, but instead use the various malloc wrappers.)
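
If you do end up calling the C library allocator directly, a guard of
the following kind keeps zero-sized requests away from it.
C<alloc_at_least_one> is a hypothetical name; this is a sketch of the
idea, not one of Perl's own wrappers.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never ask the C library for zero bytes. */
static void *alloc_at_least_one(size_t size)
{
    return malloc(size ? size : 1);
}
```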

=item *

snprintf() - the return type is unportable. Use my_snprintf() instead.

=back

=head2 Security problems

Last but not least, here are various tips for safer coding.
See also L<perlclib> for libc/stdio replacements one should use.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you. Seriously.

=item *

Do not use tmpfile()

Use mkstemp() instead.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).

=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf() and
my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available. If you want something
fancier than a plain byte string, use
L<C<Perl_form()>|perlapi/form> or SVs and
L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.

Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
version 2.17. They won't allow a C<%.s> format with a precision to
create a string that isn't valid UTF-8 if the current underlying locale
of the program is UTF-8. What happens is that the C<%s> and its operand are
simply skipped without any notice.
L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.

=item *

Do not use atoi()

Use grok_atoUV() instead. atoi() has ill-defined behavior on overflows,
and cannot be used for incremental parsing. It is also affected by locale,
which is bad.
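
To show what overflow-checked, locale-independent, incremental parsing
means in practice, here is a standalone sketch. C<parse_uint> is a
hypothetical helper written for this illustration; it is not the
implementation of grok_atoUV() and makes no claim to match its exact
rules.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parse leading decimal digits of s into *out.
 * Returns 1 on success, 0 if there are no digits or on overflow.
 * On success *end (if non-NULL) points at the first unparsed char. */
static int parse_uint(const char *s, unsigned long *out, const char **end)
{
    unsigned long value = 0;
    const char *p = s;

    if (*p < '0' || *p > '9')
        return 0;                       /* no digits at all */
    /* '0'..'9' is a contiguous range on ASCII and EBCDIC alike. */
    for (; *p >= '0' && *p <= '9'; p++) {
        unsigned long digit = (unsigned long)(*p - '0');

        if (value > (ULONG_MAX - digit) / 10)
            return 0;                   /* next step would overflow */
        value = value * 10 + digit;
    }
    *out = value;
    if (end != NULL)
        *end = p;
    return 1;
}
```

The explicit failure return and the end pointer are the two things
atoi() cannot give you.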

=item *

Do not use strtol() or strtoul()

Use grok_atoUV() instead. strtol() or strtoul() (or their IV/UV-friendly
macro disguises, Strtol() and Strtoul(), or Atol() and Atoul()) are
affected by locale, which is bad.

=back

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

 ./Configure -d -DDEBUGGING
 make

C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
debugging information which will allow us to step through a running
program, and to see which C function we are in (without the debugging
information we might see only the numerical addresses of the functions,
which is not very helpful). It will also turn on the C<DEBUGGING>
compilation symbol which enables all the internal debugging code in Perl.
There are a whole bunch of things you can debug with this:
L<perlrun|perlrun/-Dletters> lists them all, and the best way to find out
about them is to play about with them. The most useful options are
probably

 l  Context (loop) stack processing
 s  Stack snapshots (with v, displays all stacks)
 t  Trace execution
 o  Method and overloading resolution
 c  String/numeric conversions

For example

 $ perl -Dst -e '$a + 1'
 ....
 (-e:1)    gvsv(main::a)
     =>  UNDEF
 (-e:1)    const(IV(1))
     =>  UNDEF  IV(1)
 (-e:1)    add
     =>  NV(1)

Some of the functionality of the debugging code can be achieved with a
non-debugging perl by using XS modules:

 -Dr => use re 'debug'
 -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

 gdb ./perl

Or if you have a core dump:

 gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can
read the source code. You should see the copyright message, followed by
the prompt.

 (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item * run [args]

Run the program with the given arguments.

=item * break function_name

=item * break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or
the given line in the named source file.

=item * step

Steps through the program a line at a time.

=item * next

Steps through the program a line at a time, without descending into
functions.

=item * continue

Run until the next breakpoint.

=item * finish

Run until the end of the current function, then stop again.

=item * 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.
870
=item * ptype

Prints the C definition of the argument given.

  (gdb) ptype PL_op
  type = struct op {
      OP *op_next;
      OP *op_sibparent;
      OP *(*op_ppaddr)(void);
      PADOFFSET op_targ;
      unsigned int op_type : 9;
      unsigned int op_opt : 1;
      unsigned int op_slabbed : 1;
      unsigned int op_savefree : 1;
      unsigned int op_static : 1;
      unsigned int op_folded : 1;
      unsigned int op_spare : 2;
      U8 op_flags;
      U8 op_private;
  } *

=item * print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files (see L</"The .i
Targets">). So, for instance, you can't say

  print SvPV_nolen(sv)

but you have to say

  print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

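To make the macro problem concrete, here is a toy sketch in C. All the
names in it (C<toy_sv>, C<Toy_sv_2pv_nolen>, C<ToyPV_nolen>,
C<toy_describe>) are made up for illustration and are not Perl's real
definitions. The point is only that the macro disappears during
preprocessing, while the function keeps a symbol a debugger can call.

```c
/* Hypothetical stand-ins, NOT Perl's real types, functions or macros. */
typedef struct { const char *pv; } toy_sv;

/* A real function: it has a symbol in the binary, so a debugger can
 * evaluate a call to it. */
const char *Toy_sv_2pv_nolen(const toy_sv *sv)
{
    return sv->pv ? sv->pv : "";
}

/* A macro: the preprocessor replaces it before compilation, so a
 * debugger without macro information has never heard of it. */
#define ToyPV_nolen(sv) Toy_sv_2pv_nolen(sv)

/* Ordinary source code uses the macro form. */
const char *toy_describe(const toy_sv *sv)
{
    return ToyPV_nolen(sv);
}
```

With a program like this loaded in gdb, C<print Toy_sv_2pv_nolen(sv)>
has a symbol to call, whereas C<print ToyPV_nolen(sv)> can only work if
macro information was compiled into the debugging data.
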
=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but in order
to use it you'll need to compile perl with macro definitions included
in the debugging information. Using F<gcc> version 3.1, this means
configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions
in F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other
structures that you can't get at from Perl. Let's take an example.
We'll use the C<$a = $b + $c> we used before, but give it a bit of
context: C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and
poke around?

What about C<pp_add>, the function we examined earlier to implement the
C<+> operator:

  (gdb) break Perl_pp_add
  Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see
L<perlguts/Internal Functions>. With the breakpoint in place, we can
run our program:

  (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

  Breakpoint 1, Perl_pp_add () at pp_hot.c:309
  1396    dSP; dATARGET; bool useleft; SV *svl, *svr;
  (gdb) step
  311     dPOPTOPnnrl_ul;
  (gdb)

We looked at this bit of code before, and we said that
C<dPOPTOPnnrl_ul> arranges for two C<NV>s to be placed into C<left> and
C<right> - let's slightly expand it:

  #define dPOPTOPnnrl_ul  NV right = POPn; \
                          SV *leftsv = TOPs; \
                          NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV
either directly (if C<SvNOK> is set) or by calling the C<sv_2nv>
function. C<TOPs> takes the next SV from the top of the stack - yes,
C<POPn> uses C<TOPs> - but doesn't remove it. We then use C<SvNV> to
get the NV from C<leftsv> in the same way as before - yes, C<POPn> uses
C<SvNV>.

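If the pop-versus-peek distinction is hard to picture, the following
toy model may help. It is a deliberately simplified sketch: the "stack"
holds plain doubles rather than SV pointers, and C<TOY_PUSH>,
C<TOY_POP>, C<TOY_TOP>, C<TOY_SET>, C<toy_add> and C<toy_depth> are
invented names, not the real macros from the perl headers.

```c
/* Toy model only: the real Perl stack holds SV pointers, and POPn
 * converts an SV to an NV.  Here the "stack" holds doubles directly. */
static double toy_stack[8];
static double *toy_sp = toy_stack;   /* points at the top element */

#define TOY_PUSH(v) (*++toy_sp = (v))
#define TOY_TOP     (*toy_sp)        /* look, but do not remove    */
#define TOY_POP     (*toy_sp--)      /* read the top, then drop it */
#define TOY_SET(v)  (*toy_sp = (v))  /* overwrite the top element  */

/* Mirrors the shape of an add op: pop the right operand, peek at the
 * left one, then overwrite the left slot with the result. */
double toy_add(double l, double r)
{
    double right, left;

    toy_sp = toy_stack;              /* start from an empty stack */
    TOY_PUSH(l);
    TOY_PUSH(r);

    right = TOY_POP;                 /* like "NV right = POPn"     */
    left  = TOY_TOP;                 /* like "SV *leftsv = TOPs"   */
    TOY_SET(left + right);
    return TOY_TOP;
}

/* Number of elements currently on the toy stack. */
int toy_depth(void)
{
    return (int)(toy_sp - toy_stack);
}
```

After C<toy_add> returns, exactly one element is left on the toy stack:
the right operand was removed, and the left operand's slot was reused
for the result.
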
Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

  (gdb) step
  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
  1669    if (!sv)
  (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

  (gdb) print Perl_sv_dump(sv)
  SV = PV(0xa057cc0) at 0xa0675d0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0xa06a510 "6XXXX"\0
  CUR = 5
  LEN = 6
  $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

  (gdb) finish
  Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
  0x462669 in Perl_pp_add () at pp_hot.c:311
  311     dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
similar output to the CPAN module B::Debug.

  (gdb) print Perl_op_dump(PL_op)
  {
  13  TYPE = add  ===> 14
      TARG = 1
      FLAGS = (SCALAR,KIDS)
      {
          TYPE = null  ===> (12)
            (was rv2sv)
          FLAGS = (SCALAR,KIDS)
          {
  11          TYPE = gvsv  ===> 12
              FLAGS = (SCALAR)
              GV = main::b
          }
      }

# finish this later #

=head2 Using gdb to look at specific parts of a program

With the example above, you knew to look for C<Perl_pp_add>, but what if
there were multiple calls to it all over the place, or you didn't know
which op you were looking for?

One way to do this is to inject a rare call somewhere near what you're looking
for. For example, you could add C<study> just before the code you want
to examine:

  study;

And in gdb do:

  (gdb) break Perl_pp_study

And then step until you hit what you're looking for. This works well in
a loop if you want to only break at certain iterations:

  for my $c (1..100) {
      study if $c == 50;
  }

=head2 Using gdb to look at what the parser/lexer are doing

If you want to see what perl is doing when parsing/lexing your code, you can
use C<BEGIN {}>:

  print "Before\n";
  BEGIN { study; }
  print "After\n";

And in gdb:

  (gdb) break Perl_pp_study

If you want to see what the parser/lexer is doing inside of C<if> blocks and
the like you need to be a little trickier:

  if ($a && $b && do { BEGIN { study } 1 } && $c) { ... }

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code. It is
possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and examining what the resulting graph says about the execution and
data flows. As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.

=head2 lint

The good old C code quality inspector, C<lint>, is available on several
platforms, but please be aware that there are several different
implementations of it by different vendors, which means that the flags
are not identical across different platforms.

There is a C<lint> target in Makefile, but you may have to
diddle with the flags (see above).

=head2 Coverity

Coverity (L<http://www.coverity.com/>) is a product similar to lint. As
a testbed for their product, they periodically check several open source
projects, and they give open source developers accounts for the defect
databases.

There is a Coverity setup for the perl5 project:
L<https://scan.coverity.com/projects/perl5>

=head2 HP-UX cadvise (Code Advisor)

HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
(Link not given here because the URL is horribly long and seems horribly
unstable; use the search engine of your choice to find it.) The use of
the C<cadvise_cc> recipe with C<Configure ... -Dcc=./cadvise_cc>
(see cadvise "User Guide") is recommended; as is the use of C<+wall>.

=head2 cpd (cut-and-paste detector)

The cpd tool detects cut-and-paste coding. If one instance of the
cut-and-pasted code changes, all the other spots should probably be
changed, too. Therefore such code should probably be turned into a
subroutine or a macro.

cpd (L<http://pmd.sourceforge.net/cpd.html>) is part of the pmd project
(L<http://pmd.sourceforge.net/>). pmd was originally written for static
analysis of Java code, but later the cpd part of it was extended to
also parse C and C++.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:

  java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
   --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should use the -Xmx
option:

  java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the warnings",
or some common portability problems not being covered by C<-Wall>, or
C<-ansi> and C<-pedantic> both being a poorly defined collection of
warnings, and so forth), gcc is still a useful tool in keeping our
coding nose clean.

C<-Wall> is on by default.

C<-ansi> (and its sidekick, C<-pedantic>) would be nice to have on
always, but unfortunately they are not safe on all platforms; they can,
for example, cause fatal conflicts with the system headers (Solaris
being a prime example). If Configure C<-Dgccansipedantic> is used, the
C<cflags> frontend selects C<-ansi -pedantic> for the platforms where
they are known to be safe.

The following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wc++-compat>

=item *

C<-Wwrite-strings>

=item *

C<-Werror=declaration-after-statement>

=item *

C<-Werror=pointer-arith>

=back

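As an illustration of what C<-Werror=declaration-after-statement> is
guarding, here is a small function written in the C89 style that flag
enforces. The function C<count_spaces> is a made-up example, not
something from the perl source.

```c
#include <stddef.h>

/* C89 style: every declaration comes before the first statement of
 * its block, which is what -Werror=declaration-after-statement
 * insists on. */
size_t count_spaces(const char *s)
{
    size_t n = 0;              /* declarations first ...  */
    const char *p;

    if (!s)                    /* ... then statements     */
        return 0;

    for (p = s; *p; p++) {     /* loop variable declared above */
        if (*p == ' ')
            n++;
    }
    return n;
}

/* The rejected form would look like this:
 *
 *     if (!s) return 0;
 *     size_t n = 0;           <- declaration after a statement
 */
```
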
The following flags would be nice to have but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

C<-Wtraditional> is another example of the annoying tendency of gcc
to bundle a lot of warnings under one switch (it would be impossible to
deploy in practice because it would complain a lot) but it does contain
some warnings that would be beneficial to have available on their own,
such as the warning about string constants inside macros containing the
macro arguments: this behaved differently pre-ANSI than it does in
ANSI, and some C compilers are still in transition, AIX being an
example.

=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability
extensions" modes on, like for example the Sun Workshop has its C<-Xa>
mode on (though implicitly), or the DEC (these days, HP...) has its
C<-std1> mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under older memory debuggers such as Purify,
valgrind or Third Degree greatly slows down the execution: seconds
become minutes, minutes become hours. For example as of Perl 5.8.1, the
ext/Encode/t/Unicode.t takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
than six hours, even on a snappy computer. The said test must be doing
something that is quite unfriendly for memory debuggers. If you don't
feel like waiting, you can simply kill the perl process.
Roughly, valgrind slows down execution by a factor of 10,
AddressSanitizer by a factor of 2.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable PERL_DESTRUCT_LEVEL to 2. For example, like this:

  env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack is
a good sign of these. Fixing these leaks is non-trivial, unfortunately,
but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 valgrind

The valgrind tool can be used to find both memory leaks and illegal
heap memory accesses. As of version 3.3.0, Valgrind only supports Linux
on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64. The
special "test.valgrind" target can be used to run the tests under
valgrind. Found errors and memory leaks are logged in files named
F<testfile.valgrind> and by default output is displayed inline.

Example usage:

  make test.valgrind

Since valgrind adds significant overhead, tests will take much longer to
run. The valgrind tests support being run in parallel to help with this:

  TEST_JOBS=9 make test.valgrind

Note that the above two invocations will be very verbose as reachable
memory and leak-checking is enabled by default. If you want to just see
pure errors, try:

  VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
      make test.valgrind

Valgrind also provides a cachegrind tool, invoked on perl as:

  VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) are also triggering errors,
valgrind allows such errors to be suppressed using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see

  http://valgrind.org/

RU
1264=head2 AddressSanitizer
1265
6babf542 1266AddressSanitizer ("ASan") consists of a compiler instrumentation module
53b3ccc9
RL
1267and a run-time C<malloc> library. ASan is available for a variety of
1268architectures, operating systems, and compilers (see project link below).
1269It checks for unsafe memory usage, such as use after free and buffer
1270overflow conditions, and is fast enough that you can easily compile your
6babf542
RL
1271debugging or optimized perl with it. Modern versions of ASan check for
1272memory leaks by default on most platforms, otherwise (e.g. x86_64 OS X)
1273this feature can be enabled via C<ASAN_OPTIONS=detect_leaks=1>.
1274
81c3bbe7 1275
8a64fbaa
VP
1276To build perl with AddressSanitizer, your Configure invocation should
1277look like:
81c3bbe7 1278
e8596d90 1279 sh Configure -des -Dcc=clang \
6babf542
RL
1280 -Accflags=-fsanitize=address -Aldflags=-fsanitize=address \
1281 -Alddlflags=-shared\ -fsanitize=address \
1282 -fsanitize-blacklist=`pwd`/asan_ignore
81c3bbe7
RU
1283
1284where these arguments mean:
1285
1286=over 4
1287
1288=item * -Dcc=clang
1289
8a64fbaa
VP
1290This should be replaced by the full path to your clang executable if it
1291is not in your path.
81c3bbe7 1292
6babf542 1293=item * -Accflags=-fsanitize=address
81c3bbe7 1294
8a64fbaa 1295Compile perl and extensions sources with AddressSanitizer.
81c3bbe7 1296
6babf542 1297=item * -Aldflags=-fsanitize=address
81c3bbe7 1298
8a64fbaa 1299Link the perl executable with AddressSanitizer.
81c3bbe7 1300
6babf542 1301=item * -Alddlflags=-shared\ -fsanitize=address
81c3bbe7 1302
9b22382a 1303Link dynamic extensions with AddressSanitizer. You must manually
e8596d90
VP
1304specify C<-shared> because using C<-Alddlflags=-shared> will prevent
1305Configure from setting a default value for C<lddlflags>, which usually
5dfc6e97 1306contains C<-shared> (at least on Linux).
81c3bbe7 1307
6babf542
RL
1308=item * -fsanitize-blacklist=`pwd`/asan_ignore
1309
1310AddressSanitizer will ignore functions listed in the C<asan_ignore>
1311file. (This file should contain a short explanation of why each of
1312the functions is listed.)
1313
81c3bbe7
RU
1314=back
1315
8a64fbaa 1316See also
a856e9cc 1317L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.
81c3bbe7
RU
1318
1319
=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program counter,
and since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the
program is spending its time in. The caveats are that very small/fast
functions have a lower probability of showing up in the profile, and that
periodically interrupting the program (this is usually done rather
frequently, on the scale of milliseconds) imposes an additional
overhead that may skew the results. The first problem can be alleviated
by running the code for longer (in general this is a good idea for
profiling); the second problem is usually kept in check by the
profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
starts a basic block. Basic block profiling usually works by
I<instrumenting> the code by adding I<enter basic block #nnnn>
book-keeping code to the generated code. During the execution of the
code the basic block counters are then updated appropriately. The
caveat is that the added extra code can skew the results: again, the
profiling tools usually try to factor their own effects out of the
results.

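To give a feel for the second method, here is a hand-written imitation
of the book-keeping that a basic-block profiler adds automatically. The
block numbering, the C<ENTER_BB> macro and the function names are all
invented for this sketch; a real tool inserts the counters itself and
picks its own layout.

```c
/* Hand-written imitation of basic-block instrumentation.
 * Block names and numbering are invented for this example. */
enum { BB_ENTRY, BB_LOOP_BODY, BB_EVEN, BB_ODD, BB_EXIT, BB_COUNT };
static unsigned long bb_hits[BB_COUNT];

#define ENTER_BB(n) (bb_hits[(n)]++)

/* Counts the even numbers in 1..limit, bumping a counter each time
 * one of its basic blocks is entered. */
int count_even(int limit)
{
    int i, evens = 0;

    ENTER_BB(BB_ENTRY);
    for (i = 1; i <= limit; i++) {
        ENTER_BB(BB_LOOP_BODY);
        if (i % 2 == 0) {
            ENTER_BB(BB_EVEN);
            evens++;
        }
        else {
            ENTER_BB(BB_ODD);
        }
    }
    ENTER_BB(BB_EXIT);
    return evens;
}

/* Read back one counter, e.g. to print a report after the run. */
unsigned long bb_hit_count(int block)
{
    return bb_hits[block];
}
```

Unlike time-sampling, the counts here are exact: after
C<count_even(5)> the loop body has been entered five times, the even
branch twice and the odd branch three times. The price is that every
block now does extra work, which is the skew mentioned above.
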
=head2 Gprof Profiling

I<gprof> is a profiling tool available on many Unix platforms which
uses I<statistical time-sampling>. You can build a profiled version of
F<perl> by compiling using gcc with the flag C<-pg>. Either edit
F<config.sh> or re-run F<Configure>. Running the profiled version of
Perl will create an output file called F<gmon.out> which contains the
profiling data collected during the execution.

Quick hint:

  $ sh Configure -des -Dusedevel -Accflags='-pg' \
      -Aldflags='-pg' -Alddlflags='-pg -shared' \
      && make perl
  $ ./perl ... # creates gmon.out in current directory
  $ gprof ./perl > out
  $ less out

(you probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved)

The F<gprof> tool can then display the collected data in various ways.
Usually F<gprof> understands the following options:

=over 4

=item * -a

Suppress statically defined functions from the profile.

=item * -b

Suppress the verbose descriptions in the profile.

=item * -e routine

Exclude the given routine and its descendants from the profile.

=item * -f routine

Display only the given routine and its descendants in the profile.

=item * -s

Generate a summary file called F<gmon.sum> which then may be given to
subsequent gprof runs to accumulate data over several runs.

=item * -z

Display routines that have zero usage.

=back

For more detailed explanation of the available commands and output
formats, see your own local documentation of F<gprof>.

=head2 GCC gcov Profiling

I<Basic block profiling> is officially available in gcc 3.0 and later.
You can build a profiled version of F<perl> by compiling using gcc with
the flags C<-fprofile-arcs -ftest-coverage>. Either edit F<config.sh>
or re-run F<Configure>.

Quick hint:

  $ sh Configure -des -Dusedevel -Doptimize='-g' \
      -Accflags='-fprofile-arcs -ftest-coverage' \
      -Aldflags='-fprofile-arcs -ftest-coverage' \
      -Alddlflags='-fprofile-arcs -ftest-coverage -shared' \
      && make perl
  $ rm -f regexec.c.gcov regexec.gcda
  $ ./perl ...
  $ gcov regexec.c
  $ less regexec.c.gcov

(you probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved)

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.gcda> file will be
created.

To display the results you use the I<gcov> utility (which should be
installed if you have gcc 3.0 or newer installed). F<gcov> is run on
source code files, like this

  gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files contain
the source code annotated with relative frequencies of execution
indicated by "#" markers. If you want to generate F<.gcov> files for
all profiled object files, you can run something like this:

  for file in `find . -name \*.gcno`
  do sh -c "cd `dirname $file` && gcov `basename $file .gcno`"
  done

Useful options of F<gcov> include C<-b> which will summarise the basic
block, branch, and function call coverage, and C<-c> which instead of
relative frequencies will use the actual counts. For more information
on the use of F<gcov> and basic block profiling with gcc, see the
latest GNU CC manual. As of gcc 4.8, this is at
L<http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, please note that by default perl B<does not> explicitly
clean up all the memory it has allocated (such as global memory arenas)
but instead lets the exit() of the whole program "take care" of such
allocations, also known as "global destruction of objects".

There is a way to tell perl to do complete cleanup: set the environment
variable PERL_DESTRUCT_LEVEL to a non-zero value. The t/TEST wrapper
does set this to 2, and this is what you need to do too, if you don't
want to see the "global leaks". For example, to run under valgrind:

  env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and has extended its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)

If, at the end of a run you get the message I<N scalars leaked>, you
can recompile with C<-DDEBUG_LEAKING_SCALARS>
(C<Configure -Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the
addresses of all those leaked SVs to be dumped along with details as to
where each SV was originally allocated. This information is also
displayed by Devel::Peek. Note that the extra details recorded with
each SV increase memory usage, so it shouldn't be used in production
environments. It also converts C<new_SV()> from a macro into a real
function, so you can use your favourite debugger to discover where
those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV
allocations in addition to memory allocations. Each SV allocation has a
distinct serial number that will be written on creation and destruction
of the SV. So if you're executing the leaking code in a loop, you need
to look for SVs that are created, but never destroyed between each
cycle. If such an SV is found, set a conditional breakpoint within
C<new_SV()> and make it break only when C<PL_sv_serial> is equal to the
serial number of the leaking SV. Then you will catch the interpreter in
exactly the state where the leaking SV is allocated, which is
sufficient in many cases to find the source of the leak.

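The serial-number trick is easier to see in miniature. The sketch below
is a toy allocator, not perl's: C<toy_obj>, C<toy_new>, C<toy_watch>
and friends are invented names, and C<toy_serial> merely plays the role
that C<PL_sv_serial> plays in the text above.

```c
#include <stdlib.h>

/* Toy allocator, not Perl's: each object gets a serial number, in the
 * spirit of what -DDEBUG_LEAKING_SCALARS does for SVs. */
typedef struct toy_obj {
    unsigned long serial;
} toy_obj;

static unsigned long toy_serial;        /* stands in for PL_sv_serial */
static unsigned long toy_watch_serial;  /* 0 means "watch nothing"    */
static unsigned long toy_watch_hits;

/* A convenient place to put an ordinary breakpoint. */
static void toy_watch_hit(void)
{
    toy_watch_hits++;
}

/* Choose which serial number to stop on. */
void toy_watch(unsigned long serial)
{
    toy_watch_serial = serial;
}

/* How many times the watched serial number was allocated. */
unsigned long toy_hits(void)
{
    return toy_watch_hits;
}

toy_obj *toy_new(void)
{
    toy_obj *o = (toy_obj *)malloc(sizeof *o);

    if (!o)
        return NULL;
    o->serial = ++toy_serial;
    if (o->serial == toy_watch_serial)
        toy_watch_hit();   /* same idea as a conditional breakpoint */
    return o;
}

void toy_free(toy_obj *o)
{
    free(o);
}
```

Once a log has told you which serial number never gets destroyed, you
re-run with that number watched and land in the allocator at exactly
the interesting call. A debugger's conditional breakpoint gives you the
same thing without editing the code.
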
As C<-Dm> is using the PerlIO layer for output, it will by itself
allocate quite a bunch of SVs, which are hidden to avoid recursion. You
can bypass the PerlIO layer if you use the SV logging provided by
C<-DPERL_MEM_LOG> instead.

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
memory and SV allocations go through logging functions, which is
handy for breakpoint setting.

Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
also compiled, the logging functions read $ENV{PERL_MEM_LOG} to
determine whether to log the event, and if so how:

  $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
  $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
  $ENV{PERL_MEM_LOG} =~ /t/           include timestamp in Log
  $ENV{PERL_MEM_LOG} =~ /^(\d+)/      write to FD given (default is 2)

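Read as C, the table above amounts to a small decoding routine. The
following is only a sketch of that reading, assuming a value such as
C<3ms>; the C<mem_log_opts> struct and C<parse_mem_log> function are
hypothetical and are not the code perl itself uses.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical decoder for a PERL_MEM_LOG-style value such as "3ms".
 * It follows the table above; it is not perl's own implementation. */
typedef struct {
    int log_mem;     /* 'm' present: log memory operations */
    int log_sv;      /* 's' present: log SV operations     */
    int timestamp;   /* 't' present: include a timestamp   */
    int fd;          /* leading digits, default 2 (stderr) */
} mem_log_opts;

mem_log_opts parse_mem_log(const char *value)
{
    mem_log_opts o = { 0, 0, 0, 2 };
    const char *p = value;

    if (!value)
        return o;

    /* /^(\d+)/ : digits at the very start select the descriptor. */
    if (isdigit((unsigned char)*p)) {
        o.fd = 0;
        while (isdigit((unsigned char)*p))
            o.fd = o.fd * 10 + (*p++ - '0');
    }

    /* /m/, /s/, /t/ : the letters may appear anywhere. */
    o.log_mem   = strchr(value, 'm') != NULL;
    o.log_sv    = strchr(value, 's') != NULL;
    o.timestamp = strchr(value, 't') != NULL;
    return o;
}
```

So C<PERL_MEM_LOG=3ms> would mean "log memory and SV operations to file
descriptor 3", while C<PERL_MEM_LOG=t> on its own turns on timestamps
but selects nothing to log.
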
Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(), and
Safefree() are logged with the caller's source code file and line
number (and C function name, if supported by the C compiler). In
contrast, C<-Dm> is directly at the point of C<malloc()>. SV logging is
similar.

Since the logging doesn't use PerlIO, all SV allocations are logged and
no extra SV allocations are introduced by enabling the logging. If
compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for each SV
allocation is also logged.

=head2 DDD over gdb

Those debugging perl with the DDD frontend over gdb may find the
following useful:

You can extend the data conversion shortcuts menu, so for example you
can display an SV's IV value with one click, without doing any typing.
To do that simply edit the F<~/.ddd/init> file and add after:

  ! Display shortcuts.
  Ddd*gdbDisplayShortcuts: \
  /t ()   // Convert to Bin\n\
  /d ()   // Convert to Dec\n\
  /x ()   // Convert to Hex\n\
  /o ()   // Convert to Oct\n\

the following two lines:

  ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
  ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

so now you can do ivx and pvx lookups, or you can plug in the sv_peek
"conversion":

  Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The my_perl is for threaded builds.) Just remember that every line
but the last one should end with C<\n\>.

Alternatively edit the init file interactively via: 3rd mouse button ->
New Display -> Edit Menu

Note: you can define up to 20 conversion shortcuts in the gdb section.

=head2 C backtrace

On some platforms Perl supports retrieving the C level backtrace
(similar to what symbolic debuggers like gdb do).

The backtrace returns the stack trace of the C call frames,
with the symbol names (function names), the object names (like "perl"),
and if it can, also the source code locations (file:line).

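For a rough idea of what a C level backtrace contains, the sketch below
uses the C library's own C<backtrace()> and C<backtrace_symbols()> from
F<execinfo.h> (available with glibc and on OS X). This is a separate
facility from Perl's backtrace API, and C<toy_dump_backtrace> is a
made-up helper; treat it as an illustration of the kind of data
involved rather than of how perl does it.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Prints up to max_frames C call frames of the calling thread.
 * Uses the C library's backtrace(3), not Perl's own API.
 * Returns the number of frames printed, or -1 on failure. */
int toy_dump_backtrace(FILE *out, int max_frames)
{
    void *frames[64];
    char **names;
    int n, i;

    if (max_frames < 1)
        return 0;
    if (max_frames > 64)
        max_frames = 64;

    n = backtrace(frames, max_frames);
    names = backtrace_symbols(frames, n);
    if (!names)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(out, "#%d %s\n", i, names[i]);

    free(names);   /* one free() releases the whole array */
    return n;
}
```

Each line of output names the object file and, where the symbol table
allows it, the function. Source file and line information needs more
than this interface provides.
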
The supported platforms are Linux and OS X (some *BSD might
work at least partly, but they have not yet been tested).

This feature hasn't been tested with multiple threads, but it will
only show the backtrace of the thread doing the backtracing.

The feature needs to be enabled with C<Configure -Dusecbacktrace>.

The C<-Dusecbacktrace> also enables keeping the debug information when
compiling/linking (often: C<-g>). Many compilers/linkers do support
having both optimization and keeping the debug information. The debug
information is needed for the symbol names and the source locations.

Static functions might not be visible for the backtrace.

Source code locations, even if available, can often be missing or
misleading if the compiler has e.g. inlined code. The optimizer can
make matching the source code and the object code quite challenging.

=over 4

=item Linux

You B<must> have the BFD (-lbfd) library installed, otherwise C<perl> will
fail to link. The BFD is usually distributed as part of the GNU binutils.

Summary: C<Configure ... -Dusecbacktrace>
and you need C<-lbfd>.

=item OS X

The source code locations are supported B<only> if you have
the Developer Tools installed. (BFD is B<not> needed.)

Summary: C<Configure ... -Dusecbacktrace>
and installing the Developer Tools would be good.

=back

1617Optionally, for trying out the feature, you may want to enable
0762e42f
JH
1618automatic dumping of the backtrace just before a warning or croak (die)
1619message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
1620for Configure.
470dd224
JH
1621
1622Unless the above additional feature is enabled, nothing about the
1623backtrace functionality is visible, except for the Perl/XS level.
1624
1625Furthermore, even if you have enabled this feature to be compiled,
1626you need to enable it in runtime with an environment variable:
0762e42f
JH
1627C<PERL_C_BACKTRACE_ON_ERROR=10>. It must be an integer higher
1628than zero, telling the desired frame count.
470dd224
JH

Retrieving the backtrace from Perl level (using for example an XS
extension) would be much less exciting than one would hope: normally
you would see C<runops>, C<entersub>, and not much else. This API is
intended to be called B<from within> the Perl implementation, not from
Perl level execution.

The C API for the backtrace is as follows:

=over 4

=item get_c_backtrace

=item free_c_backtrace

=item get_c_backtrace_dump

=item dump_c_backtrace

=back

=head2 Poison

If you see in a debugger a memory area mysteriously full of 0xABABABAB
or 0xEFEFEFEF, you may be seeing the effect of the Poison() macros; see
L<perlclib>.
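
To get a feel for where those patterns come from, here is a minimal
standalone sketch of byte-wise poisoning. It does not use the real
macros: the C<sketch_> names are hypothetical and exist only for this
example, and the fill bytes 0xAB (fresh memory) and 0xEF (freed memory)
are simply the ones named above.

```c
#include <stddef.h>
#include <string.h>

/* Fill bytes matching the patterns mentioned in the text.  These
 * names are local to this sketch, not part of the Perl API. */
#define SKETCH_FILL_NEW  0xAB
#define SKETCH_FILL_FREE 0xEF

/* Overwrite nbytes of buf with a single repeated fill byte. */
void sketch_poison(void *buf, size_t nbytes, unsigned char fill)
{
    memset(buf, fill, nbytes);
}

/* Poison a small buffer and combine its first four bytes into one
 * 32-bit value, roughly what a debugger's word-sized hex dump of
 * that memory would display. */
unsigned long sketch_poisoned_word(unsigned char fill)
{
    unsigned char buf[16];
    unsigned long word = 0;
    size_t i;

    sketch_poison(buf, sizeof buf, fill);
    for (i = 0; i < 4; i++)
        word = (word << 8) | buf[i];
    return word;
}
```

Calling C<sketch_poisoned_word(SKETCH_FILL_NEW)> yields 0xABABABAB, and
C<SKETCH_FILL_FREE> yields 0xEFEFEFEF, which is why any use of such
memory tends to stand out immediately.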

=head2 Read-only optrees

Under ithreads the optree is read only. If you want to enforce this, to
check for write accesses from buggy code, compile with
C<-Accflags=-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op
memory via C<mmap>, and sets it read-only when it is attached to a
subroutine. Any write access to an op results in a C<SIGBUS> and abort.

This code is intended for development only, and may not be portable
even to all Unix variants. Also, it is an 80% solution, in that it
isn't able to make all ops read only. Specifically it does not apply to
op slabs belonging to C<BEGIN> blocks.

However, as an 80% solution it is still effective, as it has caught
bugs in the past.
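
The underlying technique (map a page, fill it, then revoke write
permission) can be tried outside of perl. The following is a rough
standalone sketch, not the code perl uses: the function name is
hypothetical, it assumes a Unix-like system with anonymous C<mmap> and
C<mprotect>, and it catches both C<SIGSEGV> and C<SIGBUS> because which
of the two is delivered for a write to a protected page differs between
platforms.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sketch_fault_env;

/* Signal handler: jump back to the point that attempted the write. */
static void sketch_on_fault(int sig)
{
    siglongjmp(sketch_fault_env, sig);
}

/* Returns 1 if writing to the read-only page faulted, 0 if the write
 * went through, and -1 if the setup itself failed. */
int sketch_write_to_readonly_page_faults(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_segv, old_bus;
    volatile char *page;
    void *mem;
    int result;

    if (pagesize <= 0)
        return -1;

    mem = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    page = (volatile char *)mem;
    page[0] = 'x';              /* fine: the page is still writable */

    /* From here on the page plays the role of an attached op slab. */
    if (mprotect(mem, (size_t)pagesize, PROT_READ) != 0) {
        munmap(mem, (size_t)pagesize);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(sketch_fault_env, 1) == 0) {
        page[0] = 'y';          /* the "buggy" write */
        result = 0;
    }
    else {
        result = 1;             /* we arrived here via the handler */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(mem, (size_t)pagesize);
    return result;
}
```

Unlike this sketch, a perl built with C<-DPERL_DEBUG_READONLY_OPS> does
not recover from the fault; the point there is to abort at the
offending write so that the culprit shows up in the debugger.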

=head2 When is a bool not a bool?

On pre-C99 compilers, C<bool> is defined as equivalent to C<char>.
Consequently assigning a value of any larger type to a C<bool> is unsafe,
as the value may be truncated. The C<cBOOL> macro exists to cast it
correctly; you may also find that using it is shorter and clearer than
writing out the equivalent conditional expression longhand.

On those platforms and compilers where C<bool> really is a boolean (C++,
C99), it is easy to forget the cast. You can force C<bool> to be a C<char>
by compiling with C<-Accflags=-DPERL_BOOL_AS_CHAR>. You may also wish to
run C<Configure> with something like

    -Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'

or your compiler's equivalent to make it easier to spot any unsafe truncations
that show up.

The C<TRUE> and C<FALSE> macros are available for situations where using them
would clarify intent. (But they always just mean the same as the integers 1 and
0 regardless, so using them isn't compulsory.)
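
The truncation hazard is easy to reproduce in isolation. The sketch
below is standalone and deliberately does not include the perl headers:
C<sketch_bool> and C<SKETCH_CBOOL> are hypothetical stand-ins for a
C<char>-based C<bool> and for a C<cBOOL>-style conditional, and the
example assumes the usual 8-bit C<char>.

```c
/* Stand-in for a pre-C99 "bool" that is really a char.  The sketch_
 * names are hypothetical and exist only for this example. */
typedef char sketch_bool;

/* Collapse any nonzero value to 1 before it is narrowed. */
#define SKETCH_CBOOL(x) ((x) ? (sketch_bool)1 : (sketch_bool)0)

/* Naive version: the masked value is narrowed straight to a char,
 * so a flag living above bit 7 is silently lost. */
sketch_bool sketch_has_flag_naive(unsigned int flags, unsigned int mask)
{
    return (sketch_bool)(flags & mask);
}

/* Safe version: test for nonzero first, then narrow. */
sketch_bool sketch_has_flag_safe(unsigned int flags, unsigned int mask)
{
    return SKETCH_CBOOL(flags & mask);
}
```

With both C<flags> and C<mask> set to 0x100, the naive function returns
0 even though the flag is set, while the C<cBOOL>-style one returns 1.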

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by saying

    make foo.i

which will expand the macros using cpp. Don't be scared by the
results.

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.