=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
  perl ./Porting/podtidy pod/perlhacktips.pod

=head1 NAME

perlhacktips - Tips for Perl core C code hacking

=head1 DESCRIPTION

This document will help you learn the best way to go about hacking on
the Perl core C code. It covers common problems, debugging, profiling,
and more.

If you haven't read L<perlhack> and L<perlhacktut> yet, you might want
to do that first.

=head1 COMMON PROBLEMS

Perl source now permits some specific C99 features which we know are
supported by all platforms, but mostly plays by ANSI C89 rules.
You don't care about some particular platform having broken Perl? I
hear there is still a strong demand for J2EE programmers.

=head2 Perl environment problems

=over 4

=item *

Not compiling with threading

Compiling with threading (-Duseithreads) completely rewrites the
function prototypes of Perl. You had better try your changes with that.
Related to this is the difference between "Perl_-less" and "Perl_-ly"
APIs, for example:

    Perl_sv_setiv(aTHX_ ...);
    sv_setiv(...);

The first one explicitly passes in the context, which is needed for
e.g. threaded builds. The second one does that implicitly; do not get
them mixed. If you are not passing in an aTHX_, you will need to do a
dTHX as the first thing in the function.

See L<perlguts/"How multiple interpreters and concurrency are
supported"> for further discussion about context.

=item *

Not compiling with -DDEBUGGING

The DEBUGGING define exposes more code to the compiler, therefore more
ways for things to go wrong. You should try it.

=item *

Introducing (non-read-only) globals

Do not introduce any modifiable globals, truly global or file static.
They are bad form and complicate multithreading and other forms of
concurrency. The right way is to introduce them as new interpreter
variables, see F<intrpvar.h> (at the very end for binary
compatibility).

Introducing read-only (const) globals is okay, as long as you verify
with e.g. C<nm libperl.a|egrep -v ' [TURtr] '> (if your C<nm> has
BSD-style output) that the data you added really is read-only. (If it
is, it shouldn't show up in the output of that command.)

If you want to have static strings, make them constant:

    static const char etc[] = "...";

If you want to have arrays of constant strings, note carefully the
right combination of C<const>s:

    static const char * const yippee[] =
        {"hi", "ho", "silver"};

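To make the C<const> placement concrete, here is a small self-contained sketch in plain C (outside the Perl build; the lookup helper C<yippee_index()> and the C<YIPPEE_COUNT> macro are hypothetical). Because both the array of pointers and the characters are C<const>, the compiler rejects any attempt to modify either of them through the table:

```c

    #include <stddef.h>
    #include <string.h>

    /* Both levels are const: the array elements (the pointers) and the
       characters they point to cannot be modified through this table. */
    static const char * const yippee[] = {"hi", "ho", "silver"};

    #define YIPPEE_COUNT (sizeof(yippee) / sizeof(yippee[0]))

    /* Hypothetical helper: index of word in yippee[], or -1 if absent. */
    static int yippee_index(const char *word)
    {
        size_t i;

        for (i = 0; i < YIPPEE_COUNT; i++) {
            if (strcmp(yippee[i], word) == 0)
                return (int)i;
        }
        return -1;
    }

    /* Either of these would be rejected by the compiler:
           yippee[0] = "hey";       the pointers are const
           yippee[0][0] = 'H';      the characters are const  */

```

Dropping the second C<const> (C<static const char *yippee[]>) would still protect the characters but make the pointers themselves writable, which is exactly the kind of modifiable global the previous paragraphs warn against.
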
=item *

Not exporting your new function

Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
function that is part of the public API (the shared Perl library) to be
explicitly marked as exported. See the discussion about F<embed.pl> in
L<perlguts>.

=item *

Exporting your new function

The new shiny result of either genuine new functionality or your
arduous refactoring is now ready and correctly exported. So what could
possibly go wrong?

Maybe simply that your function did not need to be exported in the
first place. Perl has a long and not so glorious history of exporting
functions that it should not have.

If the function is used only inside one source code file, make it
static. See the discussion about F<embed.pl> in L<perlguts>.

If the function is used across several files, but intended only for
Perl's internal use (and this should be the common case), do not export
it to the public API. See the discussion about F<embed.pl> in
L<perlguts>.

=back

=head2 C99

Starting from 5.35.5 we now permit some C99 features in the core C source.
However, code in dual life extensions still needs to be C89 only, because it
needs to compile against earlier versions of Perl running on older platforms.
Also note that our headers need to also be valid as C++, because XS extensions
written in C++ need to include them, hence I<member structure initialisers>
can't be used in headers.

C99 support is still far from complete on all platforms we currently support.
As a baseline we can only assume C89 semantics with the specific C99 features
described below, which we've verified work everywhere. It's fine to probe for
additional C99 features and use them where available, provided there is also a
fallback for compilers that don't support the feature. For example, we use C11
thread local storage when available, but fall back to POSIX thread specific
APIs otherwise, and we use C<char> for booleans if C<< <stdbool.h> >> isn't
available.

Code can use (and rely on) the following C99 features being present:

=over

=item *

mixed declarations and code

=item *

64 bit integer types

For consistency with the existing source code, use the typedefs C<I64> and
C<U64>, instead of using C<long long> and C<unsigned long long> directly.

=item *

variadic macros

    void greet(const char *file, unsigned int line, const char *format, ...);
    #define logged_greet(...) greet(__FILE__, __LINE__, __VA_ARGS__)

Note that C<__VA_OPT__> is not part of C99 (it was only standardized much
later, in C++20 and C23), so don't rely on it.

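Here is a minimal, compilable sketch of the same pattern. The buffer-filling C<greet()> below is hypothetical (the prototype above only declares one), and it calls the C library's C<snprintf()> and C<vsnprintf()> purely to stay self-contained; core code would use the Perl wrappers described under L</Security problems>. The macro intentionally has no trailing semicolon, so each call supplies its own and reads like an ordinary function call:

```c

    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>   /* strstr()/strncmp(), handy for checking results */

    /* Hypothetical: write "file:line: " followed by the caller's
       printf-style message into buf (at most size bytes, NUL-terminated). */
    static void greet(char *buf, size_t size, const char *file,
                      unsigned int line, const char *format, ...)
    {
        va_list args;
        int used = snprintf(buf, size, "%s:%u: ", file, line);

        if (used < 0 || (size_t)used >= size)
            return;               /* prefix failed or already filled buf */

        va_start(args, format);
        vsnprintf(buf + used, size - (size_t)used, format, args);
        va_end(args);
    }

    /* C99 variadic macro: everything after size is forwarded unchanged.
       No trailing semicolon, so "if (x) logged_greet(...); else ..."
       still parses as intended. */
    #define logged_greet(buf, size, ...) \
        greet((buf), (size), __FILE__, __LINE__, __VA_ARGS__)

```

A call such as C<logged_greet(buf, sizeof buf, "hello %s", "world")> leaves C<buf> holding the source file name and line number of the call, followed by C<hello world>.
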
=item *

declarations in for loops

    for (const char *p = message; *p; ++p) {
        putchar(*p);
    }

=item *

member structure initialisers

But not in headers, as support was only added to C++ relatively recently
(in C++20).

Hence this is fine in C and XS code, but not headers:

    struct message {
        char *action;
        char *target;
    };

    struct message mcguffin = {
        .target = "member structure initialisers",
        .action = "Built"
    };

=item *

flexible array members

This is standards conformant:

    struct greeting {
        unsigned int len;
        char message[];
    };

However, the source code already uses the "unwarranted chumminess with the
compiler" hack in many places:

    struct greeting {
        unsigned int len;
        char message[1];
    };

Strictly it B<is> undefined behaviour accessing beyond C<message[0]>, but this
has been a commonly used hack since K&R times, and using it hasn't been a
practical issue anywhere (in the perl source or any other common C code).
Hence it's unclear what we would gain from actively changing to the C99
approach.

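Where the C99 form is used, such a struct is normally allocated in one piece, with room for the header plus the payload, because C<sizeof(struct greeting)> does not include the flexible member. A minimal sketch, with a hypothetical constructor C<greeting_new()> that calls C<malloc()> directly for self-containment (core code would use Perl's allocation wrappers instead):

```c

    #include <stdlib.h>
    #include <string.h>

    struct greeting {
        unsigned int len;
        char message[];          /* C99 flexible array member */
    };

    /* Hypothetical constructor: a single allocation holds the header and
       the text. sizeof(struct greeting) excludes message[], so the text
       and its terminating NUL are added explicitly. */
    static struct greeting *greeting_new(const char *text)
    {
        size_t len = strlen(text);
        struct greeting *g = malloc(sizeof(struct greeting) + len + 1);

        if (g == NULL)
            return NULL;
        g->len = (unsigned int)len;
        memcpy(g->message, text, len + 1);   /* copy the trailing NUL too */
        return g;
    }

```

With the C<message[1]> variant, one common way to size the allocation exactly is C<offsetof(struct greeting, message) + len + 1>, so that the one-element array is not counted twice.
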
=item *

C<//> comments

All compilers we tested support their use. Not all humans we tested support
their use.

=back

Code explicitly should not use any other C99 features. For example:

=over 4

=item *

variable length arrays

Not supported by B<any> MSVC, and this is not going to change.

Even "variable" length arrays where the variable is a constant expression
are syntax errors under MSVC.

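The portable alternative to a VLA is an allocation whose size is computed at run time. A minimal sketch, assuming plain C<malloc()>/C<free()> for self-containment (core code would normally use C<Newx()> and C<Safefree()>); C<sum_of_squares()> is a hypothetical example function:

```c

    #include <stdlib.h>

    /* Hypothetical: sum of i*i for i in [0, n), computed via a scratch
       array whose length is only known at run time. Returns -1 if the
       allocation fails. */
    static long sum_of_squares(size_t n)
    {
        long total = 0;
        size_t i;
        /* Instead of the VLA  long squares[n];  which MSVC rejects.
           Allocate at least one element, since malloc(0) is not portable. */
        long *squares = malloc((n ? n : 1) * sizeof *squares);

        if (squares == NULL)
            return -1;
        for (i = 0; i < n; i++)
            squares[i] = (long)(i * i);
        for (i = 0; i < n; i++)
            total += squares[i];
        free(squares);
        return total;
    }

```
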
=item *

C99 types in C<< <stdint.h> >>

Use C<PERL_INT_FAST8_T> etc. as defined in F<handy.h>.

=item *

C99 format strings in C<< <inttypes.h> >>

C<snprintf> in the VMS libc only added support for C<PRIdN> etc. very recently,
meaning that there are live supported installations without this, or formats
such as C<%zu>.

(perl's C<sv_catpvf> etc. use parser code in F<sv.c>, which supports the
C<z> modifier, along with perl-specific formats such as C<SVf>.)

=back

If you want to use a C99 feature not listed above then you need to do one of
the following:

=over 4

=item *

Probe for it in F<Configure>, set a variable in F<config.sh>, and add fallback logic in the headers for platforms which don't have it.

=item *

Write test code and verify that it works on platforms we need to support, before relying on it unconditionally.

=back

Likely you want to repeat the same plan as we used to get the current C99
feature set. See the message at https://markmail.org/thread/odr4fjrn72u2fkpz
for the C99 probes we used before. Note that the two most "fussy" compilers
appear to be MSVC and the vendor compiler on VMS. To date all the *nix
compilers have been far more flexible in what they support.

On *nix platforms, F<Configure> attempts to set compiler flags appropriately.
All vendor compilers that we tested defaulted to C99 (or C11) support.
However, older versions of gcc default to C89, or permit I<most> C99 (with
warnings), but forbid I<declarations in for loops> unless C<-std=gnu99> is
added. The alternative C<-std=c99> B<might> seem better, but using it on some
platforms can prevent C<< <unistd.h> >> from declaring some prototypes,
which breaks the build. gcc's C<-ansi> flag implies C<-std=c89> so we
can no longer set that, hence the Configure option C<-gccansipedantic> now only
adds C<-pedantic>.

The Perl core source code files (the ones at the top level of the source code
distribution) are automatically compiled with as many as possible of the
C<-std=gnu99>, C<-pedantic>, and a selection of C<-W> flags (see
F<cflags.SH>). Files in F<ext/> F<dist/> F<cpan/> etc. are compiled with the same
flags as the installed perl would use to compile XS extensions.

Basically, it's safe to assume that F<Configure> and F<cflags.SH> have
picked the best combination of flags for the version of gcc on the platform,
and attempting to add more flags related to enforcing a C dialect will
cause problems either locally, or on other systems that the code is shipped
to.

We believe that the C99 support in gcc 3.1 is good enough for us, but we don't
have a 19 year old gcc handy to check this :-)
If you have ancient vendor compilers that don't default to C99, the flags
you might want to try are:

=over 4

=item AIX

C<-qlanglvl=stdc99>

=item HP/UX

C<-AC99>

=item Solaris

C<-xc99>

=back

=head2 Portability problems

The following are common causes of compilation and/or execution
failures, not common to Perl as such. The C FAQ is good bedtime
reading. Please test your changes with as many C compilers and
platforms as possible; we will, anyway, and it's nice to save oneself
from public embarrassment.

Also study L<perlport> carefully to avoid any bad assumptions about the
operating system, filesystems, character set, and so forth.

Do not assume an operating system indicates a certain compiler.

=over 4

=item *

Casting pointers to integers or casting integers to pointers

    void castaway(U8* p)
    {
      IV i = p;

or

    void castaway(U8* p)
    {
      IV i = (IV)p;

Both are bad, and broken, and unportable. Use the PTR2IV() macro that
does it right. (Likewise, there are PTR2UV(), PTR2NV(), INT2PTR(), and
NUM2PTR().)

=item *

Casting between function pointers and data pointers

Technically speaking, casting between function pointers and data
pointers is unportable and undefined; practically speaking it seems
to work, but you should use the FPTR2DPTR() and DPTR2FPTR() macros.
Sometimes you can also play games with unions.

=item *

Assuming sizeof(int) == sizeof(long)

There are platforms where longs are 64 bits, and platforms where ints
are 64 bits, and while we are out to shock you, even platforms where
shorts are 64 bits. This is all legal according to the C standard. (In
other words, "long long" is not a portable way to specify 64 bits, and
"long long" is not even guaranteed to be any wider than "long".)

Instead, use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.
Avoid things like I32 because they are B<not> guaranteed to be
I<exactly> 32 bits, they are I<at least> 32 bits, nor are they
guaranteed to be B<int> or B<long>. If you explicitly need
64-bit variables, use I64 and U64.

=item *

Assuming one can dereference any type of pointer for any type of data

    char *p = ...;
    long pony = *(long *)p;    /* BAD */

Many platforms, quite rightly so, will give you a core dump instead of
a pony if the p happens not to be correctly aligned.

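A portable way to read a C<long> from a byte buffer that may be misaligned is to copy the bytes into a properly aligned object with C<memcpy()>, which has no alignment requirements; many compilers turn such a fixed-size copy into a plain load where that is safe. A minimal sketch, where C<read_long()> is a hypothetical helper that assumes the bytes are in host byte order:

```c

    #include <string.h>

    /* Hypothetical: fetch a long stored (in host byte order) at p, which
       need not be aligned for long. memcpy() has no alignment requirement,
       and the destination is a properly aligned local variable. */
    static long read_long(const char *p)
    {
        long pony;

        memcpy(&pony, p, sizeof pony);
        return pony;
    }

```
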
=item *

Lvalue casts

    (int)*p = ...;    /* BAD */

Simply not portable. Get your lvalue to be of the right type, or maybe
use temporary variables, or dirty tricks with unions.

=item *

Assuming B<anything> about structs (especially the ones you don't
control, like the ones coming from the system headers)

=over 8

=item *

That a certain field exists in a struct

=item *

That no other fields exist besides the ones you know of

=item *

That a field is of certain signedness, sizeof, or type

=item *

That the fields are in a certain order

=over 8

=item *

While C guarantees the ordering specified in the struct definition,
between different platforms the definitions might differ

=back

=item *

That the sizeof(struct) or the alignments are the same everywhere

=over 8

=item *

There might be padding bytes between the fields to align the fields -
the bytes can be anything

=item *

Structs are required to be aligned to the maximum alignment required by
the fields - which for native types is usually equivalent to
sizeof() of the field

=back

=back

=item *

Assuming the character set is ASCIIish

Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, nor hex) constants
to refer to characters. You can safely say C<'A'>, but not C<0x41>.
You can safely say C<'\n'>, but not C<\012>. However, you can use
macros defined in F<utf8.h> to specify any code point portably.
C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
ASCII platforms it compiles without adding any extra code, so there is
zero performance hit on those). The acceptable inputs to
C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>. If your input
isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
direction.

If you need the string representation of a character that doesn't have a
mnemonic name in C, you should add it to the list in
F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for you,
based on the current platform.

Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
properly on native code points and strings.

Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to
'z'. But '0' - '9' is an unbroken range in both systems. Don't assume
anything about other ranges. (Note that special handling of ranges in
regular expression patterns and transliterations makes it appear to Perl
code that the aforementioned ranges are all unbroken.)

Many of the comments in the existing code ignore the possibility of
EBCDIC, and may be wrong therefore, even if the code works. This is
actually a tribute to the successful transparent insertion of being
able to handle EBCDIC without having to change pre-existing code.

UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes. Macros with the same names
(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well. Again, comments in the code may well be wrong even if
the code itself is right. For example, the concept of UTF-8 C<invariant
characters> differs between ASCII and EBCDIC. On ASCII platforms, only
characters that do not have the high-order bit set (i.e. whose ordinals
are strict ASCII, 0 - 127) are invariant, and the documentation and
comments in the code may assume that, often referring to something
like, say, C<hibit>. The situation differs and is not so simple on
EBCDIC machines, but as long as the code itself uses the
C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
comments are wrong.

As noted in L<perlhack/TESTING>, when writing test scripts, the file
F<t/charset_tools.pl> contains some helpful functions for writing tests
valid on both ASCII and EBCDIC platforms. Sometimes, though, a test
can't use a function and it's inconvenient to have different test
versions depending on the platform. There are 20 code points that are
the same in all 4 character sets currently recognized by Perl (the 3
EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)). These can be used in
such tests, though there is a small possibility that Perl will become
available in yet another character set, breaking your test. All but one
of these code points are C0 control characters. The most significant
controls that are the same are C<\0>, C<\r>, and C<\N{VT}> (also
specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>). The single
non-control is U+00B6 PILCROW SIGN. The controls that are the same have
the same bit pattern in all 4 character sets, regardless of the UTF8ness
of the string containing them. The bit pattern for U+B6 is the same in
all 4 for non-UTF8 strings, but differs in each when its containing
string is UTF-8 encoded. The only other code points that have some sort
of sameness across all 4 character sets are the pair 0xDC and 0xFC.
Together these represent upper- and lowercase LATIN LETTER U WITH
DIAERESIS, but which is upper and which is lower may be reversed: 0xDC
is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the
capital in EBCDIC and 0xDC is the small one. This factoid may be
exploited in writing case insensitive tests that are the same across all
4 character sets.

=item *

Assuming the character set is just ASCII

ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a
locale, currently these extra characters are generally considered to be
unassigned, and this has presented some problems. This has been
changed starting in 5.12 so that these characters can be considered to
be Latin-1 (ISO-8859-1).

=item *

Mixing #define and #ifdef

    #define BURGLE(x) ... \
    #ifdef BURGLE_OLD_STYLE        /* BAD */
    ... do it the old way ... \
    #else
    ... do it the new way ... \
    #endif

You cannot portably "stack" cpp directives. For example in the above
you need two separate BURGLE() #defines, one for each #ifdef branch.

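The portable shape moves the conditional outside the macro, so that each branch contains one complete definition. A compilable sketch; C<BURGLE_OLD_STYLE> comes from the example above, while the two expansion bodies are hypothetical stand-ins for the old and new ways:

```c

    /* Decide once, outside any macro body, which definition to use. */
    #ifdef BURGLE_OLD_STYLE
    #  define BURGLE(x) ((x) + (x))   /* hypothetical "old way" */
    #else
    #  define BURGLE(x) ((x) * 2)     /* hypothetical "new way" */
    #endif

```

Both stand-in bodies compute the same value, so callers see no difference whichever branch the preprocessor selects.
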
=item *

Adding non-comment stuff after #endif or #else

    #ifdef SNOSH
    ...
    #else !SNOSH    /* BAD */
    ...
    #endif SNOSH    /* BAD */

The #endif and #else cannot portably have anything non-comment after
them. If you want to document what is going on (which is a good idea
especially if the branches are long), use (C) comments:

    #ifdef SNOSH
    ...
    #else /* !SNOSH */
    ...
    #endif /* SNOSH */

The gcc option C<-Wendif-labels> warns about the bad variant (by
default on starting from Perl 5.9.4).

=item *

Having a comma after the last element of an enum list

    enum color {
        CERULEAN,
        CHARTREUSE,
        CINNABAR,     /* BAD */
    };

is not portable. Leave out the last comma.

Also note that whether enums are implicitly convertible to ints varies
between compilers; you might need an explicit (int) cast.

=item *

Mixing signed char pointers with unsigned char pointers

    int foo(char *s) { ... }
    ...
    unsigned char *t = ...; /* Or U8* t = ... */
    foo(t);   /* BAD */

While many compilers accept this (often with just a warning), it is
certainly dubious, and downright fatal on at least one platform: for
example VMS cc considers this a fatal error. One cause for people often
making this mistake is that a "naked char" and therefore dereferencing a
"naked char pointer" have an implementation-defined signedness: it
depends on the compiler and the flags of the compiler and the underlying
platform whether the result is signed or unsigned. For this very same
reason using a 'char' as an array index is bad.

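The usual fix, when a character must be used as an array index, is to convert it to C<unsigned char> (Perl's C<U8>) first, so that bytes above 127 cannot turn into negative indices where plain C<char> is signed. A minimal sketch with a hypothetical byte-counting helper, C<count_bytes()>:

```c

    #include <limits.h>
    #include <stddef.h>

    /* Hypothetical: count how often each byte value occurs in s[0..len).
       counts must have room for UCHAR_MAX + 1 entries. */
    static void count_bytes(const char *s, size_t len,
                            unsigned long counts[UCHAR_MAX + 1])
    {
        size_t i;

        for (i = 0; i <= (size_t)UCHAR_MAX; i++)
            counts[i] = 0;
        for (i = 0; i < len; i++) {
            /* counts[s[i]] would be wrong: s[i] may be negative. */
            counts[(unsigned char)s[i]]++;
        }
    }

```
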
=item *

Macros that have string constants and their arguments as substrings of
the string constants

    #define FOO(n) printf("number = %d\n", n)    /* BAD */
    FOO(10);

Under pre-ANSI semantics, that was equivalent to

    printf("10umber = %d\10");

which is probably not what you were expecting. Unfortunately at least
one reasonably common and modern C compiler does "real backward
compatibility" here: in AIX that is what still happens even though the
rest of the AIX compiler is very happily C89.

=item *

Using printf formats for non-basic C types

    IV i = ...;
    printf("i = %d\n", i);    /* BAD */

While this might by accident work on some platform (where IV happens to
be an C<int>), in general it cannot. IV might be something larger. The
situation is even worse with more specific types (defined by Perl's
configuration step in F<config.h>):

    Uid_t who = ...;
    printf("who = %d\n", who);    /* BAD */

The problem here is that Uid_t might be not only not C<int>-wide but it
might also be unsigned, in which case large uids would be printed as
negative values.

There is no simple solution to this because of printf()'s limited
intelligence, but for many types the right format is available as a
macro with either an 'f' or '_f' suffix, for example:

    IVdf /* IV in decimal */
    UVxf /* UV in hexadecimal */

    printf("i = %"IVdf"\n", i); /* The IVdf is a string constant. */

    Uid_t_f /* Uid_t in decimal */

    printf("who = %"Uid_t_f"\n", who);

Or you can try casting to a "wide enough" type:

    printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);

See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
print those.

Also remember that the C<%p> format really does require a void pointer:

    U8* p = ...;
    printf("p = %p\n", (void*)p);

The gcc option C<-Wformat> scans for such problems.

=item *

Blindly passing va_list

Not all platforms support passing va_list to further varargs (stdarg)
functions. The right thing to do is to copy the va_list using
Perl_va_copy() if NEED_VA_COPY is defined.

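The underlying rule is that a C<va_list> that has already been consumed, for instance by a C<v*printf()> call, must not be traversed again; each extra pass needs its own copy. Inside the core that copy is made with C<Perl_va_copy()> as described above; the self-contained sketch below uses standard C99 C<va_copy()> instead, and the helper C<format_twice()> is hypothetical. It also relies on C99 C<vsnprintf()> returning the required length, which L</Problematic System Interfaces> notes is not universally true, so treat it as an illustration of copying the C<va_list> rather than as portable formatting code:

```c

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* strcmp(), handy for checking results */

    /* Hypothetical: measure the formatted length first, then allocate and
       format for real. The measuring pass works on a copy, so the original
       va_list is still unconsumed for the second pass.
       Returns a malloc()ed string, or NULL on failure. */
    static char *format_twice(const char *format, ...)
    {
        va_list args, copy;
        int len;
        char *buf = NULL;

        va_start(args, format);
        va_copy(copy, args);
        len = vsnprintf(NULL, 0, format, copy);          /* pass 1: measure */
        va_end(copy);
        if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
            vsnprintf(buf, (size_t)len + 1, format, args);   /* pass 2 */
        va_end(args);
        return buf;
    }

```
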
=item *

Using gcc statement expressions

    val = ({...;...;...});    /* BAD */

While a nice extension, it's not portable. Historically, Perl used
them in macros if available to gain some extra speed (essentially
as a funky form of inlining), but we now support (or emulate) C99
C<static inline> functions, so use them instead. Declare functions as
C<PERL_STATIC_INLINE> to transparently fall back to emulation where needed.

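For example, a "maximum" macro written as a gcc statement expression (to evaluate each argument only once) can be replaced by a tiny inline function that keeps that property and also type-checks its arguments. A minimal sketch in plain C99; core code would declare it C<PERL_STATIC_INLINE>, and the name C<max_long()> and its C<long> parameters are hypothetical:

```c

    /* gcc-only, avoid:
           #define MAX(a, b) ({ long a_ = (a), b_ = (b); a_ > b_ ? a_ : b_; })
       Portable replacement: each argument is still evaluated exactly once. */
    static inline long max_long(long a, long b)
    {
        return a > b ? a : b;
    }

```
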
=item *

Binding together several statements in a macro

Use the macros STMT_START and STMT_END.

    STMT_START {
        ...
    } STMT_END

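These macros exist so that a multi-statement macro expands to exactly one statement that still needs the caller's semicolon, which keeps it safe inside an unbraced C<if>/C<else>. The sketch below uses hypothetical stand-ins, C<MY_STMT_START> and C<MY_STMT_END>, defined with the classic C<do { ... } while (0)> idiom rather than Perl's actual definitions, plus a hypothetical C<SWAP_INT()> macro:

```c

    /* Stand-ins for STMT_START/STMT_END, using the classic idiom. */
    #define MY_STMT_START do
    #define MY_STMT_END   while (0)

    /* Hypothetical multi-statement macro: swap two ints. */
    #define SWAP_INT(a, b) MY_STMT_START {  \
            int tmp_ = (a);                 \
            (a) = (b);                      \
            (b) = tmp_;                     \
        } MY_STMT_END

    /* With a bare { ... } body, the "; else" below would be a syntax
       error; with the do/while wrapper it compiles as intended. */
    static void order_pair(int *lo, int *hi)
    {
        if (*lo > *hi)
            SWAP_INT(*lo, *hi);
        else
            (void)0;              /* already in order: nothing to do */
    }

```
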
=item *

Testing for operating systems or versions when you should be testing for
features

    #ifdef __FOONIX__    /* BAD */
    foo = quux();
    #endif

Unless you know with 100% certainty that quux() is only ever available
for the "Foonix" operating system B<and> that it is available B<and>
correctly working for B<all> past, present, B<and> future versions of
"Foonix", the above is very wrong. This is more correct (though still
not perfect, because the below is a compile-time check):

    #ifdef HAS_QUUX
    foo = quux();
    #endif

How does the HAS_QUUX become defined where it needs to be? Well, if
Foonix happens to be Unixy enough to be able to run the Configure
script, and Configure has been taught about detecting and testing
quux(), the HAS_QUUX will be correctly defined. On other platforms, the
corresponding configuration step will hopefully do the same.

In a pinch, if you cannot wait for Configure to be educated, or if you
have a good hunch of where quux() might be available, you can
temporarily try the following:

    #if (defined(__FOONIX__) || defined(__BARNIX__))
    # define HAS_QUUX
    #endif

    ...

    #ifdef HAS_QUUX
    foo = quux();
    #endif

But in any case, try to keep the features and operating systems
separate.

A good resource on the predefined macros for various operating
systems, compilers, and so forth is
L<http://sourceforge.net/p/predef/wiki/Home/>

=item *

Assuming the contents of static memory pointed to by the return values
of Perl wrappers for C library functions doesn't change. Many C library
functions return pointers to static storage that can be overwritten by
subsequent calls to the same or related functions. Perl has
light-weight wrappers for some of these functions, which don't make
copies of the static memory. A good example is the interface to the
environment variables that are in effect for the program. Perl has
C<PerlEnv_getenv> to get values from the environment. But the return is
a pointer to static memory in the C library. If you are using the value
to immediately test for something, that's fine, but if you save the
value and expect it to be unchanged by later processing, you would be
wrong, but perhaps you wouldn't know it because different C library
implementations behave differently, and the one on the platform you're
testing on might work for your situation. But on some platforms, a
subsequent call to C<PerlEnv_getenv> or a related function WILL overwrite
the memory that your first call points to. This has led to some
hard-to-debug problems. Do a L<perlapi/savepv> to make a copy, thus
avoiding these problems. You will have to free the copy when you're
done to avoid memory leaks. If you don't have control over when it gets
freed, you'll need to make the copy in a mortal scalar, like so:

    if ((s = PerlEnv_getenv("foo")) == NULL) {
        ... /* handle NULL case */
    }
    else {
        s = SvPVX(sv_2mortal(newSVpv(s, 0)));
    }

The above example works only if C<"s"> is C<NUL>-terminated; otherwise
you have to pass its length to C<newSVpv>.

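Outside the Perl API the same precaution looks like this: copy the string as soon as it is returned and use only the copy afterwards. A minimal sketch, assuming the plain C library C<getenv()> and C<malloc()> for self-containment (core code would use C<PerlEnv_getenv()> together with C<savepv()> or a mortal SV, as above); C<getenv_copy()> is hypothetical:

```c

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical: return a private, malloc()ed copy of the environment
       variable name, or NULL if it is unset or memory runs out. The caller
       frees the result; later calls into the C library cannot change it. */
    static char *getenv_copy(const char *name)
    {
        const char *s = getenv(name);
        char *copy;
        size_t len;

        if (s == NULL)
            return NULL;
        len = strlen(s);
        copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, s, len + 1);   /* include the terminating NUL */
        return copy;
    }

```
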
=back

=head2 Problematic System Interfaces

=over 4

=item *

Perl strings are NOT the same as C strings: They may contain C<NUL>
characters, whereas a C string is terminated by the first C<NUL>.
That is why Perl API functions that deal with strings generally take a
pointer to the first byte and either a length or a pointer to the byte
just beyond the final one.

And this is the reason that many of the C library string handling
functions should not be used. They don't cope with the full generality
of Perl strings. It may be that your test cases don't have embedded
C<NUL>s, and so the tests pass, whereas there may well eventually arise
real-world cases where they fail. A lesson here is to include C<NUL>s
in your tests. Now it's fairly rare in most real world cases to get
C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
along.

Here's an example. It used to be a common paradigm, for decades, in the
perl core to use S<C<strchr("list", c)>> to see if the character C<c> is
any of the ones given in C<"list">, a double-quote-enclosed string of
the characters that C<c> is being tested against. As long as
C<c> isn't a C<NUL>, it works. But when C<c> is a C<NUL>, C<strchr>
returns a pointer to the terminating C<NUL> in C<"list">. This likely
will result in a segfault or a security issue when the caller uses that
end pointer as the starting point to read from.

A solution to this and many similar issues is to use the C<mem>I<-foo> C
library functions instead. In this case C<memchr> can be used to see if
C<c> is in C<"list"> and works even if C<c> is C<NUL>. These functions
need an additional parameter to give the string length.
In the case of literal string parameters, perl has defined macros that
calculate the length for you. See L<perlapi/String Handling>.

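To make the difference concrete, here is a minimal sketch contrasting the two approaches. The helpers C<in_set_strchr()> and C<in_set_memchr()> and the C<VOWELS> macros are hypothetical, and the length is taken from C<sizeof> on the literal (minus its terminating C<NUL>) rather than from Perl's literal-string macros:

```c

    #include <string.h>

    /* Buggy: strchr() also finds the terminating NUL, so c == '\0'
       is reported as being a member of every set. */
    static int in_set_strchr(const char *set, char c)
    {
        return strchr(set, c) != NULL;
    }

    /* Correct: memchr() only looks at the len bytes it is told about. */
    static int in_set_memchr(const char *set, size_t len, char c)
    {
        return memchr(set, c, len) != NULL;
    }

    #define VOWELS     "aeiou"
    #define VOWELS_LEN (sizeof(VOWELS) - 1)   /* excludes the trailing NUL */

```

Only the C<memchr()> version correctly reports that C<'\0'> is not one of the vowels.
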
=item *

malloc(0), realloc(ptr, 0), calloc(0, 0) are non-portable. To be portable
allocate at least one byte. (In general you should rarely need to work
at this low level, but instead use the various malloc wrappers.)

=item *

snprintf() - the return value is unportable. Use my_snprintf() instead.

=back

=head2 Security problems

Last but not least, here are various tips for safer coding.
See also L<perlclib> for libc/stdio replacements one should use.

=over 4

=item *

Do not use gets()

Or we will publicly ridicule you. Seriously.

=item *

Do not use tmpfile()

Use mkstemp() instead.

=item *

Do not use strcpy() or strcat() or strncpy() or strncat()

Use my_strlcpy() and my_strlcat() instead: they either use the native
implementation, or Perl's own implementation (borrowed from the public
domain implementation of INN).

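For reference, here is a minimal sketch of the C<strlcpy()> contract that C<my_strlcpy()> provides. C<sketch_strlcpy()> is a hypothetical re-implementation for illustration, not Perl's code: it never writes more than C<size> bytes, always C<NUL>-terminates when C<size> is non-zero, and returns the full source length, so a result of C<size> or more signals truncation:

```c

    #include <string.h>

    /* Hypothetical re-implementation of strlcpy() semantics, for
       illustration only. Copies at most size - 1 bytes of src into dst,
       NUL-terminates dst if size > 0, and returns strlen(src). */
    static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
    {
        size_t src_len = strlen(src);

        if (size > 0) {
            size_t n = src_len < size - 1 ? src_len : size - 1;

            memcpy(dst, src, n);
            dst[n] = '\0';
        }
        return src_len;
    }

```
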
=item *

Do not use sprintf() or vsprintf()

If you really want just plain byte strings, use my_snprintf() and
my_vsnprintf() instead, which will try to use snprintf() and
vsnprintf() if those safer APIs are available. If you want something
fancier than a plain byte string, use
L<C<Perl_form>()|perlapi/form> or SVs and
L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.

Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
version 2.17. They won't allow a C<%.s> format with a precision to
create a string that isn't valid UTF-8 if the current underlying locale
of the program is UTF-8. What happens is that the C<%s> and its operand are
simply skipped without any notice.
L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.

=item *

Do not use atoi()

Use grok_atoUV() instead. atoi() has ill-defined behavior on overflows,
and cannot be used for incremental parsing. It is also affected by locale,
which is bad.

879=item *
880
881Do not use strtol() or strtoul()
882
22ff3130 883Use grok_atoUV() instead. strtol() or strtoul() (or their IV/UV-friendly
338aa8b0
JH
884macro disguises, Strtol() and Strtoul(), or Atol() and Atoul() are
885affected by locale, which is bad.
c98823ff 886
04c692a8
DR
887=back
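
A small stand-alone parser can illustrate the problems listed above for atoi() and strtol(): silent overflow, no way to continue parsing, and locale dependence. This is a sketch written in the spirit of grok_atoUV(). It is not that function's real implementation or signature, and the name C<parse_uv()> is hypothetical. The parser accepts only ASCII digits and refuses to overflow. It also reports where it stopped, so the caller can carry on parsing.

```c
    #include <stdbool.h>
    #include <limits.h>

    /* Hypothetical locale-independent parser for unsigned decimal numbers.
     * - Accepts only ASCII digits (no sign, no leading whitespace).
     * - Fails on overflow instead of wrapping or clamping silently.
     * - Stores where parsing stopped in *endp (if endp is not NULL). */
    static bool parse_uv(const char *s, unsigned long *valp, const char **endp)
    {
        unsigned long val = 0;
        const char *p = s;

        if (*p < '0' || *p > '9')
            return false;                 /* need at least one digit */
        while (*p >= '0' && *p <= '9') {
            unsigned digit = (unsigned)(*p - '0');

            if (val > (ULONG_MAX - digit) / 10)
                return false;             /* val * 10 + digit would overflow */
            val = val * 10 + digit;
            p++;
        }
        *valp = val;
        if (endp)
            *endp = p;
        return true;
    }
```

For example, parsing C<"123,456"> yields 123 with the end pointer on the comma. A second call starting just after the comma then yields 456.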

=head1 DEBUGGING

You can compile a special debugging version of Perl, which allows you
to use the C<-D> option of Perl to tell more about what Perl is doing.
But sometimes there is no alternative but to dive in with a debugger,
either to see the stack trace of a core dump (very useful in a bug
report), or to figure out what went wrong before the core dump
happened, or how we ended up with wrong or unexpected results.

=head2 Poking at Perl

To really poke around with Perl, you'll probably want to build Perl for
debugging, like this:

    ./Configure -d -DDEBUGGING
    make

C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
debugging information which will allow us to step through a running
program, and to see in which C function we are at (without the debugging
information we might see only the numerical addresses of the functions,
which is not very helpful). It will also turn on the C<DEBUGGING>
compilation symbol which enables all the internal debugging code in Perl.
There are a whole bunch of things you can debug with this:
L<perlrun|perlrun/-Dletters> lists them all, and the best way to find out
about them is to play about with them. The most useful options are
probably

    l  Context (loop) stack processing
    s  Stack snapshots (with v, displays all stacks)
    t  Trace execution
    o  Method and overloading resolution
    c  String/numeric conversions

For example:

    $ perl -Dst -e '$a + 1'
    ....
    (-e:1)  gvsv(main::a)
        =>  UNDEF
    (-e:1)  const(IV(1))
        =>  UNDEF  IV(1)
    (-e:1)  add
        =>  NV(1)

Some of the functionality of the debugging code can be achieved with a
non-debugging perl by using XS modules:

    -Dr => use re 'debug'
    -Dx => use O 'Debug'

=head2 Using a source-level debugger

If the debugging output of C<-D> doesn't help you, it's time to step
through perl's execution with a source-level debugger.

=over 3

=item *

We'll use C<gdb> for our examples here; the principles will apply to
any debugger (many vendors call their debugger C<dbx>), but check the
manual of the one you're using.

=back

To fire up the debugger, type

    gdb ./perl

Or if you have a core dump:

    gdb ./perl core

You'll want to do that in your Perl source tree so the debugger can
read the source code. You should see the copyright message, followed by
the prompt.

    (gdb)

C<help> will get you into the documentation, but here are the most
useful commands:

=over 3

=item * run [args]

Run the program with the given arguments.

=item * break function_name

=item * break source.c:xxx

Tells the debugger that we'll want to pause execution when we reach
either the named function (but see L<perlguts/Internal Functions>!) or
the given line in the named source file.

=item * step

Steps through the program a line at a time.

=item * next

Steps through the program a line at a time, without descending into
functions.

=item * continue

Run until the next breakpoint.

=item * finish

Run until the end of the current function, then stop again.

=item * 'enter'

Just pressing Enter will do the most recent operation again - it's a
blessing when stepping through miles of source code.

=item * ptype

Prints the C definition of the argument given.

    (gdb) ptype PL_op
    type = struct op {
        OP *op_next;
        OP *op_sibparent;
        OP *(*op_ppaddr)(void);
        PADOFFSET op_targ;
        unsigned int op_type : 9;
        unsigned int op_opt : 1;
        unsigned int op_slabbed : 1;
        unsigned int op_savefree : 1;
        unsigned int op_static : 1;
        unsigned int op_folded : 1;
        unsigned int op_spare : 2;
        U8 op_flags;
        U8 op_private;
    } *

=item * print

Execute the given C code and print its results. B<WARNING>: Perl makes
heavy use of macros, and F<gdb> does not necessarily support macros
(see later L</"gdb macro support">). You'll have to substitute them
yourself, or invoke cpp on the source code files (see L</"The .i
Targets">). So, for instance, you can't say

    print SvPV_nolen(sv)

but you have to say

    print Perl_sv_2pv_nolen(sv)

=back

You may find it helpful to have a "macro dictionary", which you can
produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
recursively apply those macros for you.

=head2 gdb macro support

Recent versions of F<gdb> have fairly good macro support, but in order
to use it you'll need to compile perl with macro definitions included
in the debugging information. Using F<gcc> version 3.1 or later, this
means configuring with C<-Doptimize=-g3>. Other compilers might use a
different switch (if they support debugging macros at all).

=head2 Dumping Perl Data Structures

One way to get around this macro hell is to use the dumping functions
in F<dump.c>; these work a little like an internal
L<Devel::Peek|Devel::Peek>, but they also cover OPs and other
structures that you can't get at from Perl. Let's take an example.
We'll use the C<$a = $b + $c> we used before, but give it a bit of
context: C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and
poke around?

What about C<pp_add>, the function we examined earlier that implements
the C<+> operator:

    (gdb) break Perl_pp_add
    Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.

Notice we use C<Perl_pp_add> and not C<pp_add> - see
L<perlguts/Internal Functions>. With the breakpoint in place, we can
run our program:

    (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'

Lots of junk will go past as gdb reads in the relevant source files and
libraries, and then:

    Breakpoint 1, Perl_pp_add () at pp_hot.c:309
    1396        dSP; dATARGET; bool useleft; SV *svl, *svr;
    (gdb) step
    311           dPOPTOPnnrl_ul;
    (gdb)

We looked at this bit of code before, and we said that
C<dPOPTOPnnrl_ul> arranges for two C<NV>s to be placed into C<left> and
C<right> - let's slightly expand it:

    #define dPOPTOPnnrl_ul  NV right = POPn; \
                            SV *leftsv = TOPs; \
                            NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0

C<POPn> takes the SV from the top of the stack and obtains its NV
either directly (if C<SvNOK> is set) or by calling the C<sv_2nv>
function. C<TOPs> takes the next SV from the top of the stack - yes,
C<POPn> uses C<TOPs> - but doesn't remove it. We then use C<SvNV> to
get the NV from C<leftsv> in the same way as before - yes, C<POPn> uses
C<SvNV>.
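
A toy model in plain C may make the stack manipulation more concrete. It assumes a drastically simplified value type (C<ToySV>) and a fixed-size stack, and none of its names exist in perl. C<TOY_POPn> removes the right operand, while C<TOY_TOPs> only peeks at the left one. The result then replaces the left operand on the stack. C<strtod()> stands in for C<sv_2nv> and turns C<"6XXXX"> into 6.

```c
    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical, drastically simplified stand-in for an SV. */
    typedef struct {
        const char *pv;   /* string value, or NULL */
        double nv;        /* cached numeric value */
        bool nok;         /* true once nv is valid (compare SvNOK) */
    } ToySV;

    #define TOY_STACK_MAX 16
    static ToySV *toy_stack[TOY_STACK_MAX];
    static int toy_sp = -1;               /* index of the top element */

    static void toy_push(ToySV *sv)
    {
        if (toy_sp + 1 < TOY_STACK_MAX)
            toy_stack[++toy_sp] = sv;
    }

    /* Stand-in for sv_2nv: convert the string once and cache the result.
     * In the default "C" locale, strtod() stops at the first character
     * that cannot be part of a number, so "6XXXX" gives 6. */
    static double toy_sv_2nv(ToySV *sv)
    {
        sv->nv = sv->pv ? strtod(sv->pv, NULL) : 0.0;
        sv->nok = true;
        return sv->nv;
    }

    /* Stand-in for SvNV: use the cached value if there is one. */
    static double toy_SvNV(ToySV *sv)
    {
        return sv->nok ? sv->nv : toy_sv_2nv(sv);
    }

    #define TOY_POPn (toy_SvNV(toy_stack[toy_sp--]))  /* pop, then numify */
    #define TOY_TOPs (toy_stack[toy_sp])              /* peek, don't pop */

    /* Rough analogue of pp_add: pop the right operand, peek at the left
     * one, then store the sum in targ and put it where the left one was. */
    static double toy_pp_add(ToySV *targ)
    {
        double right = TOY_POPn;
        ToySV *leftsv = TOY_TOPs;
        double left = toy_SvNV(leftsv);

        targ->pv = NULL;
        targ->nv = left + right;
        targ->nok = true;
        toy_stack[toy_sp] = targ;     /* roughly what SETn does */
        return targ->nv;
    }
```

Suppose C<"6XXXX"> and 2.3 are pushed in that order. The call then leaves a single result of about 8.3 on the stack.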

Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
convert it. If we step again, we'll find ourselves there:

    (gdb) step
    Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
    1669        if (!sv)
    (gdb)

We can now use C<Perl_sv_dump> to investigate the SV:

    (gdb) print Perl_sv_dump(sv)
    SV = PV(0xa057cc0) at 0xa0675d0
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0xa06a510 "6XXXX"\0
    CUR = 5
    LEN = 6
    $1 = void

We know we're going to get C<6> from this, so let's finish the
subroutine:

    (gdb) finish
    Run till exit from #0  Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
    0x462669 in Perl_pp_add () at pp_hot.c:311
    311           dPOPTOPnnrl_ul;

We can also dump out this op: the current op is always stored in
C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
output similar to that of the CPAN module B::Debug.

    (gdb) print Perl_op_dump(PL_op)
    {
    13  TYPE = add  ===> 14
        TARG = 1
        FLAGS = (SCALAR,KIDS)
        {
            TYPE = null  ===> (12)
              (was rv2sv)
            FLAGS = (SCALAR,KIDS)
            {
    11          TYPE = gvsv  ===> 12
                FLAGS = (SCALAR)
                GV = main::b
            }
        }

# finish this later #

=head2 Using gdb to look at specific parts of a program

With the example above, you knew to look for C<Perl_pp_add>, but what if
there were multiple calls to it all over the place, or you didn't know what
the op was you were looking for?

One way to do this is to inject a rare call somewhere near what you're looking
for. For example, you could add C<study> just before the code you want to
examine:

    study;

And in gdb do:

    (gdb) break Perl_pp_study

And then step until you hit what you're looking for. This works well in a
loop if you want to only break at certain iterations:

    for my $c (1..100) {
        study if $c == 50;
    }

=head2 Using gdb to look at what the parser/lexer are doing

If you want to see what perl is doing when parsing/lexing your code, you can
use C<BEGIN {}>:

    print "Before\n";
    BEGIN { study; }
    print "After\n";

And in gdb:

    (gdb) break Perl_pp_study

If you want to see what the parser/lexer is doing inside of C<if> blocks and
the like, you need to be a little trickier:

    if ($a && $b && do { BEGIN { study } 1 } && $c) { ... }

=head1 SOURCE CODE STATIC ANALYSIS

Various tools exist for analysing C source code B<statically>, as
opposed to B<dynamically>, that is, without executing the code. It is
possible to detect resource leaks, undefined behaviour, type
mismatches, portability problems, code paths that would cause illegal
memory accesses, and other similar problems by just parsing the C code
and examining what the resulting graph says about the execution and
data flows. As a matter of fact, this is exactly how C compilers know
to give warnings about dubious code.

=head2 lint

The good old C code quality inspector, C<lint>, is available on several
platforms, but please be aware that there are several different
implementations of it by different vendors, which means that the flags
are not identical across different platforms.

There is a C<lint> target in the Makefile, but you may have to
diddle with the flags (see above).

=head2 Coverity

Coverity (L<http://www.coverity.com/>) is a product similar to lint. As
a testbed for their product they periodically check several open source
projects, and they give open source developers accounts to access the
defect databases.

There is a Coverity setup for the perl5 project:
L<https://scan.coverity.com/projects/perl5>

=head2 HP-UX cadvise (Code Advisor)

HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
(Link not given here because the URL is horribly long and seems horribly
unstable; use the search engine of your choice to find it.) The use of
the C<cadvise_cc> recipe with C<Configure ... -Dcc=./cadvise_cc>
(see cadvise "User Guide") is recommended, as is the use of C<+wall>.

=head2 cpd (cut-and-paste detector)

The cpd tool detects cut-and-paste coding. If one instance of the
cut-and-pasted code changes, all the other spots should probably be
changed, too. Therefore such code should probably be turned into a
subroutine or a macro.
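
The sketch below shows the kind of fix cpd points towards. Instead of pasting the same clamp-and-sum loop into two functions, both callers share one static helper, so a future bug fix only has to be made once. All names are hypothetical.

```c
    #include <stddef.h>

    /* Shared helper: sums a[from..to), clamping 'to' to the array length.
     * Before the refactor, this loop (and the clamp) was pasted into both
     * sum_head() and sum_tail(). */
    static long sum_range(const int *a, size_t n, size_t from, size_t to)
    {
        long total = 0;

        if (to > n)
            to = n;
        for (size_t i = from; i < to; i++)
            total += a[i];
        return total;
    }

    /* Sum of the first (up to) three elements. */
    static long sum_head(const int *a, size_t n)
    {
        return sum_range(a, n, 0, 3);
    }

    /* Sum of the last (up to) three elements. */
    static long sum_tail(const int *a, size_t n)
    {
        return sum_range(a, n, n > 3 ? n - 3 : 0, n);
    }
```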

cpd (L<https://pmd.github.io/latest/pmd_userdocs_cpd.html>) is part of the pmd project
(L<https://pmd.github.io/>). pmd was originally written for static
analysis of Java code, but later the cpd part of it was extended to
also parse C and C++.

Download the pmd-bin-X.Y.zip from the SourceForge site, extract the
pmd-X.Y.jar from it, and then run that on source code thusly:

    java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
     --minimum-tokens 100 --files /some/where/src --language c > cpd.txt

You may run into memory limits, in which case you should use the -Xmx
option:

    java -Xmx512M ...

=head2 gcc warnings

Though much can be written about the inconsistency and coverage
problems of gcc warnings (like C<-Wall> not meaning "all the warnings",
or some common portability problems not being covered by C<-Wall>, or
C<-ansi> and C<-pedantic> both being a poorly defined collection of
warnings, and so forth), gcc is still a useful tool in keeping our
coding nose clean.

C<-Wall> is on by default.

It would be nice for C<-pedantic> to be on always, but unfortunately it is not
safe on all platforms; for example, it can cause fatal conflicts with the
system headers (Solaris being a prime example). If Configure
C<-Dgccansipedantic> is used, the C<cflags> frontend selects C<-pedantic> for
the platforms where it is known to be safe.

The following extra flags are added:

=over 4

=item *

C<-Wendif-labels>

=item *

C<-Wextra>

=item *

C<-Wc++-compat>

=item *

C<-Wwrite-strings>

=item *

C<-Werror=pointer-arith>

=item *

C<-Werror=vla>

=back
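
The sketch below illustrates what the two C<-Werror> flags reject and shows a portable alternative for each. The function names are hypothetical. Under C<-Werror=vla>, a declaration such as C<char buf[n];> with a run-time C<n> is a hard error. Under C<-Werror=pointer-arith>, arithmetic on a C<void *> (a GNU extension, not standard C) is an error too.

```c
    #include <stdlib.h>
    #include <string.h>

    /* Rejected under -Werror=vla:
     *     char buf[n];            (array length only known at run time)
     * Portable alternative: allocate on the heap, never asking for 0 bytes. */
    static char *copy_bytes(const void *src, size_t n)
    {
        char *buf = malloc(n ? n : 1);

        if (buf && n)
            memcpy(buf, src, n);
        return buf;
    }

    /* Rejected under -Werror=pointer-arith:
     *     return p + bytes;       (arithmetic on void *, a GNU extension)
     * Portable alternative: do the arithmetic on a char pointer. */
    static const void *skip_bytes(const void *p, size_t bytes)
    {
        return (const char *)p + bytes;
    }
```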

The following flags would be nice to have but they would first need
their own Augean stablemaster:

=over 4

=item *

C<-Wshadow>

=item *

C<-Wstrict-prototypes>

=back

C<-Wtraditional> is another example of the annoying tendency of gcc
to bundle a lot of warnings under one switch (it would be impossible to
deploy in practice because it would complain a lot), but it does contain
some warnings that would be beneficial to have available on their own,
such as the warning about string constants inside macros containing the
macro arguments: this behaved differently pre-ANSI than it does in
ANSI, and some C compilers are still in transition, AIX being an
example.

=head2 Warnings of other C compilers

Other C compilers (yes, there B<are> other C compilers than gcc) often
have their "strict ANSI" or "strict ANSI with some portability
extensions" modes on. For example, the Sun Workshop compiler has its
C<-Xa> mode on (though implicitly), and the DEC (these days, HP...)
compiler has its C<-std1> mode on.

=head1 MEMORY DEBUGGERS

B<NOTE 1>: Running under older memory debuggers such as Purify,
valgrind or Third Degree greatly slows down the execution: seconds
become minutes, minutes become hours. For example, as of Perl 5.8.1,
ext/Encode/t/Unicode.t takes extraordinarily long to complete under
e.g. Purify, Third Degree, and valgrind. Under valgrind it takes more
than six hours, even on a snappy computer. The said test must be doing
something that is quite unfriendly for memory debuggers. If you don't
feel like waiting, you can simply kill the perl process.
Roughly, valgrind slows down execution by a factor of 10, and
AddressSanitizer by a factor of 2.

B<NOTE 2>: To minimize the number of memory leak false alarms (see
L</PERL_DESTRUCT_LEVEL> for more information), you have to set the
environment variable PERL_DESTRUCT_LEVEL to 2. For example, like this:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib ...

B<NOTE 3>: There are known memory leaks when there are compile-time
errors within eval or require; seeing C<S_doeval> in the call stack is
a good sign of these. Fixing these leaks is non-trivial, unfortunately,
but they must be fixed eventually.

B<NOTE 4>: L<DynaLoader> will not clean up after itself completely
unless Perl is built with the Configure option
C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.

=head2 valgrind

The valgrind tool can be used to find both memory leaks and illegal
heap memory accesses. As of version 3.3.0, Valgrind only supports Linux
on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64. The
special "test.valgrind" target can be used to run the tests under
valgrind. Found errors and memory leaks are logged in files named
F<testfile.valgrind> and by default output is displayed inline.

Example usage:

    make test.valgrind

Since valgrind adds significant overhead, tests will take much longer to
run. The valgrind tests support being run in parallel to help with this:

    TEST_JOBS=9 make test.valgrind

Note that the above two invocations will be very verbose, as reachable
memory and leak-checking are enabled by default. If you want to just see
pure errors, try:

    VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
        make test.valgrind

Valgrind also provides a cachegrind tool, invoked on perl as:

    VG_OPTS=--tool=cachegrind make test.valgrind

As system libraries (most notably glibc) also trigger errors,
valgrind allows such errors to be suppressed using suppression files. The
default suppression file that comes with valgrind already catches a lot
of them. Some additional suppressions are defined in F<t/perl.supp>.

To get valgrind and for more information see

    http://valgrind.org/

=head2 AddressSanitizer

AddressSanitizer ("ASan") consists of a compiler instrumentation module
and a run-time C<malloc> library. ASan is available for a variety of
architectures, operating systems, and compilers (see project link below).
It checks for unsafe memory usage, such as use after free and buffer
overflow conditions, and is fast enough that you can easily compile your
debugging or optimized perl with it. Modern versions of ASan check for
memory leaks by default on most platforms; elsewhere (e.g. x86_64 OS X)
this feature can be enabled via C<ASAN_OPTIONS=detect_leaks=1>.

To build perl with AddressSanitizer, your Configure invocation should
look like:

    sh Configure -des -Dcc=clang \
       -Accflags=-fsanitize=address -Aldflags=-fsanitize=address \
       -Alddlflags=-shared\ -fsanitize=address \
       -fsanitize-blacklist=`pwd`/asan_ignore

where these arguments mean:

=over 4

=item * -Dcc=clang

This should be replaced by the full path to your clang executable if it
is not in your path.

=item * -Accflags=-fsanitize=address

Compile perl and extension sources with AddressSanitizer.

=item * -Aldflags=-fsanitize=address

Link the perl executable with AddressSanitizer.

=item * -Alddlflags=-shared\ -fsanitize=address

Link dynamic extensions with AddressSanitizer. You must manually
specify C<-shared> because using C<-Alddlflags=-shared> will prevent
Configure from setting a default value for C<lddlflags>, which usually
contains C<-shared> (at least on Linux).

=item * -fsanitize-blacklist=`pwd`/asan_ignore

AddressSanitizer will ignore functions listed in the C<asan_ignore>
file. (This file should contain a short explanation of why each of
the functions is listed.)

=back

See also
L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.

=head1 PROFILING

Depending on your platform there are various ways of profiling Perl.

There are two commonly used techniques of profiling executables:
I<statistical time-sampling> and I<basic-block counting>.

The first method periodically takes samples of the CPU program counter,
and since the program counter can be correlated with the code generated
for functions, we get a statistical view of which functions the
program is spending its time in. The caveats are that very small/fast
functions have lower probability of showing up in the profile, and that
periodically interrupting the program (this is usually done rather
frequently, on the scale of milliseconds) imposes an additional
overhead that may skew the results. The first problem can be alleviated
by running the code for longer (in general this is a good idea for
profiling); the second problem is usually kept in check by the
profiling tools themselves.

The second method divides up the generated code into I<basic blocks>.
Basic blocks are sections of code that are entered only at the
beginning and exited only at the end. For example, a conditional jump
ends a basic block, and each of its possible targets starts a new one.
Basic block profiling usually works by
I<instrumenting> the code by adding I<enter basic block #nnnn>
book-keeping code to the generated code. During the execution of the
code the basic block counters are then updated appropriately. The
caveat is that the added extra code can skew the results: again, the
profiling tools usually try to factor their own effects out of the
results.
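
To make I<instrumenting> concrete, here is a hand-written imitation of the book-keeping that a profiling compiler inserts automatically. Each basic block bumps its own counter when it is entered. This is a simplified sketch: the loop-condition test is not counted as a separate block, and all names are hypothetical.

```c
    #include <stddef.h>

    /* One counter per basic block of count_negatives(). */
    enum { BB_ENTRY, BB_LOOP_BODY, BB_NEGATIVE, BB_EXIT, BB_COUNT };
    static unsigned long bb_counts[BB_COUNT];

    static int count_negatives(const int *a, size_t n)
    {
        int negatives = 0;

        bb_counts[BB_ENTRY]++;                /* "enter basic block #0" */
        for (size_t i = 0; i < n; i++) {
            bb_counts[BB_LOOP_BODY]++;        /* "enter basic block #1" */
            if (a[i] < 0) {
                bb_counts[BB_NEGATIVE]++;     /* "enter basic block #2" */
                negatives++;
            }
        }
        bb_counts[BB_EXIT]++;                 /* "enter basic block #3" */
        return negatives;
    }
```

Running it once over C<{3, -1, 4, -1, -5}> leaves the counters at 1, 5, 3 and 1. The block inside the C<if> ran once per negative element, which is the kind of per-block frequency a basic-block profiler reports.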

=head2 Gprof Profiling

I<gprof> is a profiling tool available on many Unix platforms which
uses I<statistical time-sampling>. You can build a profiled version of
F<perl> by compiling using gcc with the flag C<-pg>. Either edit
F<config.sh> or re-run F<Configure>. Running the profiled version of
Perl will create an output file called F<gmon.out> which contains the
profiling data collected during the execution.

Quick hint:

    $ sh Configure -des -Dusedevel -Accflags='-pg' \
        -Aldflags='-pg' -Alddlflags='-pg -shared' \
        && make perl
    $ ./perl ... # creates gmon.out in current directory
    $ gprof ./perl > out
    $ less out

(You probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved.)

The F<gprof> tool can then display the collected data in various ways.
Usually F<gprof> understands the following options:

=over 4

=item * -a

Suppress statically defined functions from the profile.

=item * -b

Suppress the verbose descriptions in the profile.

=item * -e routine

Exclude the given routine and its descendants from the profile.

=item * -f routine

Display only the given routine and its descendants in the profile.

=item * -s

Generate a summary file called F<gmon.sum> which then may be given to
subsequent gprof runs to accumulate data over several runs.

=item * -z

Display routines that have zero usage.

=back

For a more detailed explanation of the available commands and output
formats, see your own local documentation of F<gprof>.

=head2 GCC gcov Profiling

I<Basic block profiling> is officially available in gcc 3.0 and later.
You can build a profiled version of F<perl> by compiling using gcc with
the flags C<-fprofile-arcs -ftest-coverage>. Either edit F<config.sh>
or re-run F<Configure>.

Quick hint:

    $ sh Configure -des -Dusedevel -Doptimize='-g' \
        -Accflags='-fprofile-arcs -ftest-coverage' \
        -Aldflags='-fprofile-arcs -ftest-coverage' \
        -Alddlflags='-fprofile-arcs -ftest-coverage -shared' \
        && make perl
    $ rm -f regexec.c.gcov regexec.gcda
    $ ./perl ...
    $ gcov regexec.c
    $ less regexec.c.gcov

(You probably need to add C<-shared> to the C<-Alddlflags> line until RT
#118199 is resolved.)

Running the profiled version of Perl will cause profile output to be
generated. For each source file an accompanying F<.gcda> file will be
created.

To display the results you use the I<gcov> utility (which should be
installed if you have gcc 3.0 or newer installed). F<gcov> is run on
source code files, like this

    gcov sv.c

which will cause F<sv.c.gcov> to be created. The F<.gcov> files contain
the source code annotated with relative frequencies of execution
indicated by "#" markers. If you want to generate F<.gcov> files for
all profiled object files, you can run something like this:

    for file in `find . -name \*.gcno`
    do sh -c "cd `dirname $file` && gcov `basename $file .gcno`"
    done

Useful options of F<gcov> include C<-b> which will summarise the basic
block, branch, and function call coverage, and C<-c> which instead of
relative frequencies will use the actual counts. For more information
on the use of F<gcov> and basic block profiling with gcc, see the
latest GNU CC manual. As of gcc 4.8, this is at
L<http://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro>

=head2 callgrind profiling

callgrind is a valgrind tool for profiling source code. Paired
with kcachegrind (a Qt-based UI), it gives you an overview of
where code is taking up time, as well as the ability
to examine callers, call trees, and more. One of its benefits
is that you can use it on perl and XS modules that have not been
compiled with debugging symbols.

If perl is compiled with debugging symbols (C<-g>), you can view
the annotated source and click around, much like Devel::NYTProf's
HTML output.

For basic usage:

    valgrind --tool=callgrind ./perl ...

By default it will write output to F<callgrind.out.PID>, but you
can change that with C<--callgrind-out-file=...>.

To view the data, do:

    kcachegrind callgrind.out.PID

If you'd prefer to view the data in a terminal, you can use
F<callgrind_annotate>. In its basic form:

    callgrind_annotate callgrind.out.PID | less

Some useful options are:

=over 4

=item * --threshold

Percentage of counts (of primary sort event) we are interested in.
The default is 99%; 100% might show things that otherwise seem to be
missing.

=item * --auto

Annotate all source files containing functions that helped reach
the event count threshold.

=back

=head1 MISCELLANEOUS TRICKS

=head2 PERL_DESTRUCT_LEVEL

If you want to run any of the tests yourself manually using e.g.
valgrind, please note that by default perl B<does not> explicitly
clean up all the memory it has allocated (such as global memory arenas)
but instead lets the exit() of the whole program "take care" of such
allocations, also known as "global destruction of objects".

There is a way to tell perl to do complete cleanup: set the environment
variable PERL_DESTRUCT_LEVEL to a non-zero value. The t/TEST wrapper
does set this to 2, and this is what you need to do too, if you don't
want to see the "global leaks". For example, for running under valgrind:

    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t

(Note: the mod_perl apache module also uses this environment variable
for its own purposes and extends its semantics. Refer to the mod_perl
documentation for more information. Also, spawned threads do the
equivalent of setting this variable to the value 1.)

If, at the end of a run, you get the message I<N scalars leaked>, you
can recompile with C<-DDEBUG_LEAKING_SCALARS>
(C<Configure -Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the
addresses of all those leaked SVs to be dumped along with details as to
where each SV was originally allocated. This information is also
displayed by Devel::Peek. Note that the extra details recorded with
each SV increase memory usage, so it shouldn't be used in production
environments. It also converts C<new_SV()> from a macro into a real
function, so you can use your favourite debugger to discover where
those pesky SVs were allocated.

If you see that you're leaking memory at runtime, but neither valgrind
nor C<-DDEBUG_LEAKING_SCALARS> will find anything, you're probably
leaking SVs that are still reachable and will be properly cleaned up
during destruction of the interpreter. In such cases, using the C<-Dm>
switch can point you to the source of the leak. If the executable was
built with C<-DDEBUG_LEAKING_SCALARS>, C<-Dm> will output SV
allocations in addition to memory allocations. Each SV allocation has a
distinct serial number that will be written on creation and destruction
of the SV. So if you're executing the leaking code in a loop, you need
to look for SVs that are created, but never destroyed between each
cycle. If such an SV is found, set a conditional breakpoint within
C<new_SV()> and make it break only when C<PL_sv_serial> is equal to the
serial number of the leaking SV. Then you will catch the interpreter in
exactly the state where the leaking SV is allocated, which is
sufficient in many cases to find the source of the leak.
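
The serial-number technique can be modelled in a few lines of plain C. The sketch below is hypothetical throughout: C<Obj>, C<obj_new()>, C<first_leaked()> and C<break_on_serial> are not perl names. Every object gets an increasing serial number when it is created. Objects still live at a checkpoint are leak suspects. The C<break_hits++> line marks where you would put the conditional breakpoint, much as you would break in C<new_SV()> when C<PL_sv_serial> matches.

```c
    #include <stdbool.h>
    #include <stdlib.h>

    #define MAX_TRACKED 64   /* serials at or above this are not tracked */

    typedef struct {
        unsigned long serial;
    } Obj;

    static unsigned long next_serial = 1;   /* 0 means "no serial" */
    static bool live[MAX_TRACKED];          /* live[s]: object s not yet freed */
    static unsigned long break_on_serial;   /* 0 disables the "breakpoint" */
    static unsigned long break_hits;

    static Obj *obj_new(void)
    {
        Obj *o = malloc(sizeof *o);

        if (o == NULL)
            return NULL;
        o->serial = next_serial++;
        if (o->serial < MAX_TRACKED)
            live[o->serial] = true;
        if (o->serial == break_on_serial)
            break_hits++;   /* put the conditional breakpoint here */
        return o;
    }

    static void obj_free(Obj *o)
    {
        if (o->serial < MAX_TRACKED)
            live[o->serial] = false;
        free(o);
    }

    /* First serial in [from, to) that was created but never freed, or 0. */
    static unsigned long first_leaked(unsigned long from, unsigned long to)
    {
        for (unsigned long s = from; s < to && s < MAX_TRACKED; s++)
            if (live[s])
                return s;
        return 0;
    }
```

In use, you would note C<next_serial> before one pass of the suspect loop and call C<first_leaked()> after it. You would then rerun with C<break_on_serial> set to the serial it reports.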

Because C<-Dm> uses the PerlIO layer for output, it will itself
allocate quite a few SVs, which are hidden to avoid recursion. You can
bypass the PerlIO layer if you use the SV logging provided by
C<-DPERL_MEM_LOG> instead.
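
The "hidden to avoid recursion" problem arises whenever an allocation
logger needs memory itself. One generic way to handle it uses a
reentrancy flag: allocations made while the logger is running are
simply not reported. The sketch below uses hypothetical names and is
not taken from Perl's source.

```c
    #include <stdbool.h>
    #include <stdlib.h>

    static bool   in_logger     = false; /* true while log_alloc() runs */
    static size_t logged_allocs = 0;     /* allocations actually reported */

    static void log_alloc(size_t size);

    /* Instrumented allocator: reports every allocation except those
       made by the logger itself, which would otherwise recurse forever. */
    void *traced_malloc(size_t size)
    {
        void *p = malloc(size);
        if (!in_logger)
            log_alloc(size);
        return p;
    }

    /* The logger needs scratch memory of its own, much as -Dm's PerlIO
       output needs SVs; the flag hides that allocation. */
    static void log_alloc(size_t size)
    {
        in_logger = true;
        free(traced_malloc(64));   /* hidden: not counted */
        (void)size;                /* a real logger would record `size` */
        logged_allocs++;
        in_logger = false;
    }
```

The cost of this design is that the logger's own allocations become
invisible. A log that bypasses the instrumented path entirely can
therefore be more complete.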

=head2 PERL_MEM_LOG

If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
memory and SV allocations go through logging functions, which is handy
for setting breakpoints.

Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
also defined, the logging functions read C<$ENV{PERL_MEM_LOG}> to
determine whether to log the event, and if so how:

    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
    $ENV{PERL_MEM_LOG} =~ /t/           Include a timestamp in the log
    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      Write to the given FD (default is 2)
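
The rules in that table are easy to mirror in C. The parser below is a
hypothetical sketch, not the code Perl uses. A leading run of digits
selects the file descriptor (default 2, standard error). The letters
C<m>, C<s> and C<t> switch on their options wherever they appear in the
value.

```c
    #include <ctype.h>
    #include <stdbool.h>
    #include <string.h>

    typedef struct {
        bool log_memory;   /* /m/      : log all memory ops  */
        bool log_sv;       /* /s/      : log all SV ops      */
        bool timestamp;    /* /t/      : include timestamps  */
        int  fd;           /* /^(\d+)/ : output fd, default 2 */
    } mem_log_opts;

    /* Interpret a PERL_MEM_LOG-style value; NULL means "not set". */
    mem_log_opts parse_mem_log(const char *value)
    {
        mem_log_opts o = { false, false, false, 2 };
        if (value == NULL)
            return o;

        /* Leading digits, if present, give the file descriptor.  The
           bound keeps this sketch clear of int overflow. */
        if (isdigit((unsigned char)value[0])) {
            int fd = 0;
            for (const char *p = value;
                 isdigit((unsigned char)*p) && fd < 100000; p++)
                fd = fd * 10 + (*p - '0');
            o.fd = fd;
        }
        o.log_memory = strchr(value, 'm') != NULL;
        o.log_sv     = strchr(value, 's') != NULL;
        o.timestamp  = strchr(value, 't') != NULL;
        return o;
    }
```

In a program this would be called as
C<parse_mem_log(getenv("PERL_MEM_LOG"))>.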

Memory logging is somewhat similar to C<-Dm> but is independent of
C<-DDEBUGGING> and works at a higher level: all uses of C<Newx()>,
C<Renew()>, and C<Safefree()> are logged with the caller's source code
file and line number (and C function name, if supported by the C
compiler). In contrast, C<-Dm> logs directly at the point of
C<malloc()>. SV logging works similarly.

Since the logging doesn't use PerlIO, all SV allocations are logged and
no extra SV allocations are introduced by enabling the logging. If
compiled with C<-DDEBUG_LEAKING_SCALARS>, the serial number for each SV
allocation is also logged.
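
A standalone sketch can show two of the ideas above: recording the
caller's location and writing without a buffered I/O layer. The names
are hypothetical and this is not Perl's implementation. A macro passes
C<__FILE__>, C<__LINE__> and C<__func__> to the logging allocator. The
allocator formats the record into a stack buffer and emits it with
C<write(2)> on a raw file descriptor.

```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Format one log record into buf; returns what snprintf() returns. */
    int format_mem_log(char *buf, size_t len, const char *op, size_t bytes,
                       const char *file, int line, const char *func)
    {
        return snprintf(buf, len, "%s: %zu bytes at %s:%d:%s\n",
                        op, bytes, file, line, func);
    }

    /* Log the allocation straight to `fd` with write(2), bypassing
       stdio buffering, then perform it. */
    void *logged_malloc(size_t bytes, int fd,
                        const char *file, int line, const char *func)
    {
        char buf[256];
        int n = format_mem_log(buf, sizeof buf, "malloc", bytes,
                               file, line, func);
        if (n > 0) {
            size_t out = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
            ssize_t w = write(fd, buf, out);
            (void)w;   /* best effort: a failed log write is ignored */
        }
        return malloc(bytes);
    }

    /* Expanded at the call site, so the location is the caller's. */
    #define LOGGED_MALLOC(bytes, fd) \
        logged_malloc((bytes), (fd), __FILE__, __LINE__, __func__)
```

Because the macro expands where it is used, each record names the
calling file, line and function rather than the allocator's own.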

=head2 DDD over gdb

If you debug perl with the DDD frontend over gdb, you may find the
following useful.

You can extend the data conversion shortcuts menu so that, for example,
you can display an SV's IV value with one click, without any typing.
To do that, edit the F<~/.ddd/init> file and, after these lines:

    ! Display shortcuts.
    Ddd*gdbDisplayShortcuts: \
    /t ()   // Convert to Bin\n\
    /d ()   // Convert to Dec\n\
    /x ()   // Convert to Hex\n\
    /o ()   // Convert to Oct\n\

add the following two lines:

    ((XPV*) (())->sv_any )->xpv_pv  // 2pvx\n\
    ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx

Now you can do ivx and pvx lookups, or you can plug in the C<sv_peek>
"conversion" the same way:

    Perl_sv_peek(my_perl, (SV*)()) // sv_peek

(The C<my_perl> argument is for threaded builds.) Just remember that
every line except the last one should end with C<\n\>.

Alternatively, edit the init file interactively via the 3rd mouse
button -> New Display -> Edit Menu.

Note: you can define up to 20 conversion shortcuts in the gdb section.

=head2 C backtrace

On some platforms Perl supports retrieving the C-level backtrace
(similar to what symbolic debuggers like gdb do).

The backtrace returns the stack trace of the C call frames, with the
symbol names (function names), the object names (like "perl"), and, if
it can, also the source code locations (file:line).

The supported platforms are Linux and OS X (some *BSD systems might
work at least partly, but they have not yet been tested).

This feature hasn't been tested with multiple threads, but it will only
show the backtrace of the thread doing the backtracing.

The feature needs to be enabled with C<Configure -Dusecbacktrace>.

The C<-Dusecbacktrace> option also enables keeping the debug
information when compiling/linking (often: C<-g>). Many
compilers/linkers support having both optimization and debug
information. The debug information is needed for the symbol names and
the source locations.

Static functions might not be visible in the backtrace.

Source code locations, even if available, can often be missing or
misleading if the compiler has, for example, inlined code. The
optimizer can make matching the source code and the object code quite
challenging.
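
glibc and macOS offer C<backtrace()> and C<backtrace_symbols_fd()> in
F<execinfo.h>, which show what a C-level backtrace contains. The sketch
below uses them directly. It illustrates the concept and is not Perl's
own implementation. On some BSDs these functions need C<-lexecinfo>.
The limitations described above show up here too: for static functions,
or when symbols are not exported (for example, when not linking with
C<-rdynamic> on GNU toolchains), frames may appear only as addresses.

```c
    #include <execinfo.h>

    #define MAX_FRAMES 64

    /* Capture up to `max` return addresses from the calling thread's C
       call stack and print one line per frame to file descriptor `fd`.
       Returns the number of frames captured. */
    int print_c_backtrace(int fd, int max)
    {
        void *frames[MAX_FRAMES];

        if (max <= 0)
            return 0;
        if (max > MAX_FRAMES)
            max = MAX_FRAMES;

        int depth = backtrace(frames, max);
        /* backtrace_symbols_fd() writes straight to fd and does not
           call malloc(), unlike backtrace_symbols(). */
        backtrace_symbols_fd(frames, depth, fd);
        return depth;
    }
```

Calling C<print_c_backtrace(2, 16)> from deep inside a program prints
the innermost frames to standard error.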

=over 4

=item Linux

You B<must> have the BFD (C<-lbfd>) library installed; otherwise C<perl>
will fail to link. BFD is usually distributed as part of the GNU
binutils.

Summary: C<Configure ... -Dusecbacktrace>, and you need C<-lbfd>.

=item OS X

The source code locations are supported B<only> if you have the
Developer Tools installed. (BFD is B<not> needed.)

Summary: C<Configure ... -Dusecbacktrace>; installing the Developer
Tools is recommended.

=back

Optionally, for trying out the feature, you may want to enable
automatic dumping of the backtrace just before a warning or croak (die)
message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
to the Configure options.

Unless the above additional feature is enabled, nothing about the
backtrace functionality is visible, except at the Perl/XS level.

Furthermore, even if you have compiled this feature in, you need to
enable it at runtime with an environment variable, for example
C<PERL_C_BACKTRACE_ON_ERROR=10>. The value must be an integer greater
than zero, giving the desired frame count.
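
A frame count like this is typically validated before use. The
hypothetical helper below returns the requested count, or 0 meaning
"disabled" for anything that is not a whole positive integer. Its name
and its strictness are assumptions, and Perl's own parsing may accept or
reject different inputs.

```c
    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>

    /* Interpret the variable's value as a frame count.
       Returns 0 ("disabled") unless it is a whole positive integer. */
    int backtrace_frames_from_env(const char *value)
    {
        if (value == NULL || *value == '\0')
            return 0;

        char *end;
        errno = 0;
        long n = strtol(value, &end, 10);
        if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
            return 0;
        return (int)n;
    }
```

It would be called as
C<backtrace_frames_from_env(getenv("PERL_C_BACKTRACE_ON_ERROR"))>.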

Retrieving the backtrace from the Perl level (using, for example, an XS
extension) would be much less exciting than one would hope: normally
you would see C<runops>, C<entersub>, and not much else. This API is
intended to be called B<from within> the Perl implementation, not from
Perl-level execution.

The C API for the backtrace is as follows:

=over 4

=item get_c_backtrace

=item free_c_backtrace

=item get_c_backtrace_dump

=item dump_c_backtrace

=back

=head2 Poison

If you see a memory area in a debugger that is mysteriously full of
0xABABABAB or 0xEFEFEFEF, you may be seeing the effect of the
C<Poison()> macros; see L<perlclib>.
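
Poisoning simply fills memory with a recognizable byte so that stale
reads stand out. The sketch below uses hypothetical C<TOY_> macro names.
It assumes the convention that 0xAB marks newly allocated memory and
0xEF marks freed memory. Check L<perlclib> for Perl's actual macros and
their patterns.

```c
    #include <string.h>

    /* Fill n objects of type t at d with byte b. */
    #define TOY_POISON_WITH(d, n, t, b) \
        ((void)memset((void *)(d), (unsigned char)(b), (n) * sizeof(t)))

    /* Assumed patterns: 0xAB for "new", 0xEF for "freed". */
    #define TOY_POISON_NEW(d, n, t)  TOY_POISON_WITH(d, n, t, 0xAB)
    #define TOY_POISON_FREE(d, n, t) TOY_POISON_WITH(d, n, t, 0xEF)
```

A 32-bit value loaded from memory poisoned this way reads back as
0xEFEFEFEF, which is why that pattern is a strong hint of a
use-after-free.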

=head2 Read-only optrees

Under ithreads the optree is read-only. If you want to enforce this, to
check for write accesses from buggy code, compile with
C<-Accflags=-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op
memory via C<mmap> and sets it read-only when it is attached to a
subroutine. Any write access to an op then results in a C<SIGBUS> and
an abort.

This code is intended for development only, and may not be portable
even to all Unix variants. Also, it is an 80% solution, in that it
isn't able to make all ops read-only. Specifically, it does not apply
to op slabs belonging to C<BEGIN> blocks.

However, as an 80% solution it is still effective, as it has caught
bugs in the past.
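
The underlying technique can be sketched with plain POSIX calls:
C<mmap()> an anonymous region, fill it, then C<mprotect()> it to
C<PROT_READ>. The helper below is hypothetical and only illustrates the
idea; it is not Perl's op slab allocator. The exact signal raised by a
store into such a page depends on the platform (Linux, for instance,
delivers C<SIGSEGV>).

```c
    #ifndef _DEFAULT_SOURCE
    #define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
    #endif
    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Copy `len` bytes into a fresh anonymous mapping, then make the
       mapping read-only so any later store faults instead of silently
       corrupting it.  On success returns the mapping and sets
       *mapped_len (release it with munmap()); returns NULL on failure. */
    void *make_readonly_copy(const void *src, size_t len, size_t *mapped_len)
    {
        long page = sysconf(_SC_PAGESIZE);
        if (page <= 0 || len == 0)
            return NULL;

        /* mprotect() works on whole pages, so round up. */
        size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;
        void *slab = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (slab == MAP_FAILED)
            return NULL;

        memcpy(slab, src, len);
        if (mprotect(slab, size, PROT_READ) != 0) {
            munmap(slab, size);
            return NULL;
        }
        *mapped_len = size;
        return slab;   /* a write such as ((char *)slab)[0] = 0 now faults */
    }
```

Reads from the returned region still work normally. Only stores trap,
which is what makes buggy writers easy to catch in a debugger.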

=head2 When is a bool not a bool?

There wasn't necessarily a standard C<bool> type on compilers prior to
C99, and so some workarounds were created. The C<TRUE> and C<FALSE>
macros are still available as alternatives for C<true> and C<false>.
The C<cBOOL> macro was created to correctly cast an expression to a
true/false value in all circumstances, but it should no longer be
necessary. Using S<C<(bool)> I<expr>> should now always work.

There are no plans to remove any of C<TRUE>, C<FALSE>, or C<cBOOL>.
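
The hazard that C<cBOOL> guarded against can be reproduced without any
Perl headers. Suppose "boolean" is really a narrow integer type. The
sketch assumes an 8-bit C<unsigned char> as a stand-in for the kind of
pre-C99 workaround described above. Casting a flag test to such a type
keeps only the low-order bits and can silently drop a set bit.
Converting to C99 C<bool> instead yields true for any nonzero value. The
function names are hypothetical.

```c
    #include <stdbool.h>

    /* Conversion to bool: any nonzero value becomes true. */
    bool flag_is_set(unsigned long flags, unsigned long mask)
    {
        return (bool)(flags & mask);
    }

    /* Conversion to a narrow "boolean" keeps only the low 8 bits
       (assuming CHAR_BIT == 8), so a flag in bit 8 reads as 0. */
    unsigned char flag_is_set_narrow(unsigned long flags, unsigned long mask)
    {
        return (unsigned char)(flags & mask);
    }
```

With C<flags> and C<mask> both 0x100, the first function returns true
and the second returns 0.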

=head2 Finding unsafe truncations

You may wish to run C<Configure> with something like

    -Accflags='-Wconversion -Wno-sign-conversion -Wno-shorten-64-to-32'

or your compiler's equivalent, to make it easier to spot any unsafe
truncations that show up.
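
The kind of line C<-Wconversion> reports is an implicit narrowing
assignment. One way to silence such a warning honestly is to check the
range before an explicit cast. The function name below is hypothetical.

```c
    #include <stdbool.h>
    #include <stdint.h>

    /* With -Wconversion, gcc warns about an implicit narrowing such as
           int32_t small = big;      (where big is int64_t)
       because values outside INT32_MIN..INT32_MAX would be truncated.
       This checked version makes the range test explicit. */
    bool narrow_i64_to_i32(int64_t big, int32_t *out)
    {
        if (big < INT32_MIN || big > INT32_MAX)
            return false;           /* would not fit: report instead */
        *out = (int32_t)big;        /* explicit cast, value known to fit */
        return true;
    }
```

The caller then decides what an out-of-range value means, instead of
silently receiving a wrapped one.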

=head2 The .i Targets

You can expand the macros in a F<foo.c> file by running

    make foo.i

which will expand the macros using cpp. Don't be scared by the
results.
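
Suppose a hypothetical F<foo.c> contains the macro below. The
corresponding F<foo.i> would show the call site with the macro already
replaced (spacing may differ). This is often the quickest way to see
what a complicated macro really does.

```c
    /* A function-like macro as it might appear in foo.c. */
    #define SQUARE(x) ((x) * (x))

    int square_of_successor(int n)
    {
        /* After preprocessing (what "make foo.i" or "cc -E" shows),
           this statement reads:  return ((n + 1) * (n + 1));  */
        return SQUARE(n + 1);
    }
```

Seeing the expanded form makes clear why the macro parenthesizes its
argument. Without the inner parentheses, C<SQUARE(n + 1)> would expand
to C<n + 1 * n + 1>.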

=head1 AUTHOR

This document was originally written by Nathan Torkington, and is
maintained by the perl5-porters mailing list.