perlhacktips.pod - update ASan section

diff --git a/pod/perlhacktips.pod b/pod/perlhacktips.pod
index f41918c..99caf25 100644
--- a/pod/perlhacktips.pod
+++ b/pod/perlhacktips.pod
@@ -20,8 +20,7 @@ to do that first.
  
  =head1 COMMON PROBLEMS
  
-Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.  In
-some cases we have to take pre-ANSI requirements into consideration.
+Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.
  You don't care about some particular platform having broken Perl? I
  hear there is still a strong demand for J2EE programmers.
  
@@ -79,7 +78,7 @@ If you want to have arrays of constant strings, note carefully the
  right combination of C<const>s:
  
      static const char * const yippee[] =
-       {"hi", "ho", "silver"};
+        {"hi", "ho", "silver"};
  
  There is a way to completely hide any modifiable globals (they are all
  moved to heap), the compilation setting
@@ -134,7 +133,7 @@ Use the Configure C<-Dgccansipedantic> flag to enable the gcc C<-ansi
  -pedantic> flags which enforce stricter ANSI rules.
  
  If using the C<gcc -Wall> note that not all the possible warnings (like
-C<-Wunitialized>) are given unless you also compile with C<-O>.
+C<-Wuninitialized>) are given unless you also compile with C<-O>.
  
  Note that if using gcc, starting from Perl 5.9.5 the Perl core source
  code files (the ones at the top level of the source code distribution,
@@ -143,7 +142,7 @@ as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>, and a
  selection of C<-W> flags (see cflags.SH).
  
  Also study L<perlport> carefully to avoid any bad assumptions about the
-operating system, filesystems, and so forth.
+operating system, filesystems, character set, and so forth.
  
  You may once in a while try a "make microperl" to see whether we can
  still compile Perl with just the bare minimum of interfaces.  (See
@@ -173,7 +172,7 @@ NUM2PTR().)
  
  =item *
  
-Casting between data function pointers and data pointers
+Casting between function pointers and data pointers
  
  Technically speaking casting between function pointers and data
  pointers is unportable and undefined, but practically speaking it seems
@@ -201,7 +200,7 @@ guaranteed to be B<int> or B<long>.  If you really explicitly need
  Assuming one can dereference any type of pointer for any type of data
  
    char *p = ...;
-  long pony = *p;    /* BAD */
+  long pony = *(long *)p;    /* BAD */
  
  Many platforms, quite rightly so, will give you a core dump instead of
  a pony if the p happens not to be correctly aligned.
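One portable way to read a C<long> out of a byte buffer of unknown alignment is to copy the bytes instead of casting the pointer. A minimal sketch in plain C; the helper name C<read_long_unaligned> is hypothetical and not part of the Perl API:

```c
#include <string.h>

/* Hypothetical helper: fetch a long from a possibly misaligned address.
 * memcpy() has no alignment requirement, unlike *(long *)p. */
static long read_long_unaligned(const char *p)
{
    long value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

Calling C<read_long_unaligned(buf + 1)> is then safe even when C<buf + 1> is not suitably aligned for a C<long>.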
@@ -275,16 +274,32 @@ Assuming the character set is ASCIIish
  Perl can compile and run under EBCDIC platforms.  See L<perlebcdic>.
  This is transparent for the most part, but because the character sets
  differ, you shouldn't use numeric (decimal, octal, nor hex) constants
-to refer to characters.  You can safely say 'A', but not 0x41.  You can
-safely say '\n', but not \012.  If a character doesn't have a trivial
-input form, you should add it to the list in
-F<regen/unicode_constants.pl>, and have Perl create #defines for you,
+to refer to characters.  You can safely say C<'A'>, but not C<0x41>.
+You can safely say C<'\n'>, but not C<\012>.  However, you can use
+macros defined in F<utf8.h> to specify any code point portably.
+C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
+LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
+ASCII platforms it compiles without adding any extra code, so there is
+zero performance hit on those).  The acceptable inputs to
+C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>.  If your input
+isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
+C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
+direction.
+
+If you need the string representation of a character that doesn't have a
+mnemonic name in C, you should add it to the list in
+F<regen/unicode_constants.pl>, and have Perl create C<#define>'s for you,
  based on the current platform.
  
+Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
+properly on native code points and strings.
+
  Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
  case alphabetic characters.  That is not true in EBCDIC.  Nor for 'a' to
  'z'.  But '0' - '9' is an unbroken range in both systems.  Don't assume
-anything about other ranges.
+anything about other ranges.  (Note that special handling of ranges in
+regular expression patterns and transliterations makes it appear to Perl
+code that the aforementioned ranges are all unbroken.)
  
  Many of the comments in the existing code ignore the possibility of
  EBCDIC, and may be wrong therefore, even if the code works.  This is
@@ -293,11 +308,11 @@ able to handle EBCDIC without having to change pre-existing code.
  
  UTF-8 and UTF-EBCDIC are two different encodings used to represent
  Unicode code points as sequences of bytes.  Macros  with the same names
-(but different definitions) in C<utf8.h> and C<utfebcdic.h> are used to
+(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
  allow the calling code to think that there is only one such encoding.
  This is almost always referred to as C<utf8>, but it means the EBCDIC
  version as well.  Again, comments in the code may well be wrong even if
-the code itself is right.  For example, the concept of C<invariant
+the code itself is right.  For example, the concept of UTF-8 C<invariant
  characters> differs between ASCII and EBCDIC.  On ASCII platforms, only
  characters that do not have the high-order bit set (i.e.  whose ordinals
  are strict ASCII, 0 - 127) are invariant, and the documentation and
@@ -307,6 +322,31 @@ EBCDIC machines, but as long as the code itself uses the
  C<NATIVE_IS_INVARIANT()> macro appropriately, it works, even if the
  comments are wrong.
  
+As noted in L<perlhack/TESTING>, when writing test scripts, the file
+F<t/charset_tools.pl> contains some helpful functions for writing tests
+valid on both ASCII and EBCDIC platforms.  Sometimes, though, a test
+can't use a function and it's inconvenient to have different test
+versions depending on the platform.  There are 20 code points that are
+the same in all 4 character sets currently recognized by Perl (the 3
+EBCDIC code pages plus ISO 8859-1 (ASCII/Latin1)).  These can be used in
+such tests, though there is a small possibility that Perl will become
+available in yet another character set, breaking your test.  All but one
+of these code points are C0 control characters.  The most significant
+controls that are the same are C<\0>, C<\r>, and C<\N{VT}> (also
+specifiable as C<\cK>, C<\x0B>, C<\N{U+0B}>, or C<\013>).  The single
+non-control is U+00B6 PILCROW SIGN.  The controls that are the same have
+the same bit pattern in all 4 character sets, regardless of the UTF8ness
+of the string containing them.  The bit pattern for U+B6 is the same in
+all 4 for non-UTF8 strings, but differs in each when its containing
+string is UTF-8 encoded.  The only other code points that have some sort
+of sameness across all 4 character sets are the pair 0xDC and 0xFC.
+Together these represent upper- and lowercase LATIN LETTER U WITH
+DIAERESIS, but which is upper and which is lower may be reversed: 0xDC
+is the capital in Latin1 and 0xFC is the small letter, while 0xFC is the
+capital in EBCDIC and 0xDC is the small one.  This factoid may be
+exploited in writing case insensitive tests that are the same across all
+4 character sets.
+
  =item *
  
  Assuming the character set is just ASCII
@@ -314,9 +354,9 @@ Assuming the character set is just ASCII
  ASCII is a 7 bit encoding, but bytes have 8 bits in them.  The 128 extra
  characters have different meanings depending on the locale.  Absent a
  locale, currently these extra characters are generally considered to be
-unassigned, and this has presented some problems.  This is being changed
-starting in 5.12 so that these characters will be considered to be
-Latin-1 (ISO-8859-1).
+unassigned, and this has presented some problems.  This has been
+changed starting in 5.12 so that these characters can be considered to
+be Latin-1 (ISO-8859-1).
  
  =item *
  
@@ -392,7 +432,7 @@ Mixing declarations and code
  
  That is C99 or C++.  Some C compilers allow that, but you shouldn't.
  
-The gcc option C<-Wdeclaration-after-statements> scans for such
+The gcc option C<-Wdeclaration-after-statement> scans for such
  problems (by default on starting from Perl 5.9.4).
  
  =item *
@@ -475,6 +515,9 @@ Or you can try casting to a "wide enough" type:
  
     printf("i = %"IVdf"\n", (IV)something_very_small_and_signed);
  
+See L<perlguts/Formatted Printing of Size_t and SSize_t> for how to
+print those.
+
  Also remember that the C<%p> format really does require a void pointer:
  
     U8* p = ...;
@@ -560,6 +603,43 @@ temporarily try the following:
  But in any case, try to keep the features and operating systems
  separate.
  
+A good resource on the predefined macros for various operating
+systems, compilers, and so forth is
+L<http://sourceforge.net/p/predef/wiki/Home/>
+
+=item *
+
+Assuming the contents of static memory pointed to by the return values
+of Perl wrappers for C library functions doesn't change.  Many C library
+functions return pointers to static storage that can be overwritten by
+subsequent calls to the same or related functions.  Perl has
+light-weight wrappers for some of these functions, which don't make
+copies of the static memory.  A good example is the interface to the
+environment variables that are in effect for the program.  Perl has
+C<PerlEnv_getenv> to get values from the environment.  But the return is
+a pointer to static memory in the C library.  If you are using the value
+to immediately test for something, that's fine, but if you save the
+value and expect it to be unchanged by later processing, you would be
+wrong, but perhaps you wouldn't know it because different C library
+implementations behave differently, and the one on the platform you're
+testing on might work for your situation.  But on some platforms, a
+subsequent call to C<PerlEnv_getenv> or related function WILL overwrite
+the memory that your first call points to.  This has led to some
+hard-to-debug problems.  Do a L<perlapi/savepv> to make a copy, thus
+avoiding these problems.  You will have to free the copy when you're
+done to avoid memory leaks.  If you don't have control over when it gets
+freed, you'll need to make the copy in a mortal scalar, like so:
+
+ if ((s = PerlEnv_getenv("foo")) == NULL) {
+    ... /* handle NULL case */
+ }
+ else {
+     s = SvPVX(sv_2mortal(newSVpv(s, 0)));
+ }
+
+The above example works only if C<"s"> is C<NUL>-terminated; otherwise
+you have to pass its length to C<newSVpv>.
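The same hazard exists with plain C<getenv()> outside of Perl. The following is a sketch in standard C that uses C<malloc()> where core code would use C<savepv>; the helper name C<copy_env_value> is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a private heap copy of an environment
 * value, or NULL if the variable is unset or allocation fails.
 * The caller owns the result and must free() it. */
static char *copy_env_value(const char *name)
{
    const char *val = getenv(name);   /* may point to static storage */
    size_t len;
    char *copy;

    if (val == NULL)
        return NULL;
    len = strlen(val) + 1;            /* include the trailing NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, val, len);
    return copy;
}
```

Because the copy lives in memory the caller controls, later environment lookups cannot change it.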
+
  =back
  
  =head2 Problematic System Interfaces
@@ -568,6 +648,39 @@ separate.
  
  =item *
  
+Perl strings are NOT the same as C strings:  They may contain C<NUL>
+characters, whereas a C string is terminated by the first C<NUL>.
+That is why Perl API functions that deal with strings generally take a
+pointer to the first byte and either a length or a pointer to the byte
+just beyond the final one.
+
+And this is the reason that many of the C library string handling
+functions should not be used.  They don't cope with the full generality
+of Perl strings.  It may be that your test cases don't have embedded
+C<NUL>s, and so the tests pass, whereas there may well eventually arise
+real-world cases where they fail.  A lesson here is to include C<NUL>s
+in your tests.  Now it's fairly rare in most real world cases to get
+C<NUL>s, so your code may seem to work, until one day a C<NUL> comes
+along.
+
+Here's an example.  It used to be a common paradigm, for decades, in the
+perl core to use S<C<strchr("list", c)>> to see if the character C<c> is
+any of the ones given in C<"list">, a double-quote-enclosed string of
+the characters being tested against.  As long as
+C<c> isn't a C<NUL>, it works.  But when C<c> is a C<NUL>, C<strchr>
+returns a pointer to the terminating C<NUL> in C<"list">.   This likely
+will result in a segfault or a security issue when the caller uses that
+end pointer as the starting point to read from.
+
+A solution to this and many similar issues is to use the C<mem>I<-foo> C
+library functions instead.  In this case C<memchr> can be used to see if
+C<c> is in C<"list"> and works even if C<c> is C<NUL>.  These functions
+need an additional parameter to give the string length.
+In the case of literal string parameters, perl has defined macros that
+calculate the length for you.  See L<perlapi/Miscellaneous Functions>.
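The difference between the two lookups can be seen in a small standalone sketch. The function names and the set C<"+-*"> are made up for illustration, and plain C<sizeof> stands in for the perl length macros:

```c
#include <string.h>

/* BAD: strchr() treats the terminating NUL as part of the string,
 * so this reports a match when c is '\0'. */
static int in_set_strchr(int c)
{
    return strchr("+-*", c) != NULL;
}

/* OK: memchr() searches exactly the given number of bytes; the
 * length passed here excludes the terminating NUL. */
static int in_set_memchr(int c)
{
    static const char set[] = "+-*";

    return memchr(set, c, sizeof(set) - 1) != NULL;
}
```

With C<c> equal to C<NUL>, the first function returns true and the second returns false.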
+
+=item *
+
  malloc(0), realloc(0), calloc(0, 0) are non-portable.  To be portable
  allocate at least one byte.  (In general you should rarely need to work
  at this low level, but instead use the various malloc wrappers.)
@@ -581,6 +694,7 @@ snprintf() - the return type is unportable.  Use my_snprintf() instead.
  =head2 Security problems
  
  Last but not least, here are various tips for safer coding.
+See also L<perlclib> for libc/stdio replacements one should use.
  
  =over 4
  
@@ -592,6 +706,12 @@ Or we will publicly ridicule you.  Seriously.
  
  =item *
  
+Do not use tmpfile()
+
+Use mkstemp() instead.
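For illustration, here is a sketch of creating a scratch file with C<mkstemp()> in plain C. The helper name C<open_scratch_file> and the F</tmp> prefix are assumptions made for this example, not something the perl core defines:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* close() and unlink() for the caller's cleanup */

/* Hypothetical helper: create a uniquely named scratch file.
 * 'path' must hold at least 'size' bytes; on success it receives the
 * generated name and an open read/write descriptor is returned.
 * Returns -1 on failure.  The caller should close() the descriptor
 * and unlink() the path when done. */
static int open_scratch_file(char *path, size_t size)
{
    static const char tmpl[] = "/tmp/perlhack-XXXXXX";

    if (size < sizeof tmpl)
        return -1;
    memcpy(path, tmpl, sizeof tmpl);
    return mkstemp(path);   /* replaces the trailing XXXXXX */
}
```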
+
+=item *
+
  Do not use strcpy() or strcat() or strncpy() or strncat()
  
  Use my_strlcpy() and my_strlcat() instead: they either use the native
@@ -616,6 +736,22 @@ of the program is UTF-8.  What happens is that the C<%s> and its operand are
  simply skipped without any notice.
  L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
  
+=item *
+
+Do not use atoi()
+
+Use grok_atoUV() instead.  atoi() has ill-defined behavior on overflows,
+and cannot be used for incremental parsing.  It is also affected by locale,
+which is bad.
+
+=item *
+
+Do not use strtol() or strtoul()
+
+Use grok_atoUV() instead.  strtol() or strtoul() (or their IV/UV-friendly
+macro disguises, Strtol() and Strtoul(), or Atol() and Atoul()) are
+affected by locale, which is bad.
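To show what overflow-checked, locale-independent parsing involves, here is a sketch in plain C. It is not the implementation of C<grok_atoUV()>; the name C<parse_decimal_sketch> and its interface are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch: parse an unsigned decimal number from the start
 * of s without consulting the locale.  On success stores the value in
 * *out, sets *end to the first unparsed byte and returns 1.  Returns 0
 * if there are no digits or the value would exceed ULONG_MAX. */
static int parse_decimal_sketch(const char *s, unsigned long *out,
                                const char **end)
{
    unsigned long val = 0;
    const char *p = s;

    if (*p < '0' || *p > '9')
        return 0;
    for (; *p >= '0' && *p <= '9'; p++) {
        unsigned long digit = (unsigned long)(*p - '0');

        if (val > (ULONG_MAX - digit) / 10)
            return 0;               /* next step would overflow */
        val = val * 10 + digit;
    }
    *out = val;
    *end = p;
    return 1;
}
```

Returning the end pointer lets the caller continue parsing from where the digits stopped, which C<atoi()> cannot offer.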
+
  =back
  
  =head1 DEBUGGING
@@ -632,28 +768,40 @@ happened, or how did we end up having wrong or unexpected results.
  To really poke around with Perl, you'll probably want to build Perl for
  debugging, like this:
  
-    ./Configure -d -D optimize=-g
+    ./Configure -d -DDEBUGGING
      make
  
-C<-g> is a flag to the C compiler to have it produce debugging
-information which will allow us to step through a running program, and
-to see in which C function we are at (without the debugging information
-we might see only the numerical addresses of the functions, which is
-not very helpful).
-
-F<Configure> will also turn on the C<DEBUGGING> compilation symbol
-which enables all the internal debugging code in Perl.  There are a
-whole bunch of things you can debug with this: L<perlrun> lists them
-all, and the best way to find out about them is to play about with
-them.  The most useful options are probably
+C<-DDEBUGGING> turns on the C compiler's C<-g> flag to have it produce
+debugging information which will allow us to step through a running
+program, and to see in which C function we are at (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful). It will also turn on the C<DEBUGGING>
+compilation symbol which enables all the internal debugging code in Perl.
+There are a whole bunch of things you can debug with this:
+L<perlrun|perlrun/-Dletters> lists them all, and the best way to find out
+about them is to play about with them.  The most useful options are
+probably
  
      l  Context (loop) stack processing
+    s  Stack snapshots (with v, displays all stacks)
      t  Trace execution
      o  Method and overloading resolution
      c  String/numeric conversions
  
-Some of the functionality of the debugging code can be achieved using
-XS modules.
+For example
+
+    $ perl -Dst -e '$a + 1'
+    ....
+    (-e:1)     gvsv(main::a)
+        =>  UNDEF
+    (-e:1)     const(IV(1))
+        =>  UNDEF  IV(1)
+    (-e:1)     add
+        =>  NV(1)
+
+
+Some of the functionality of the debugging code can be achieved with a
+non-debugging perl by using XS modules:
  
      -Dr => use re 'debug'
      -Dx => use O 'Debug'
@@ -733,7 +881,7 @@ Prints the C definition of the argument given.
    (gdb) ptype PL_op
    type = struct op {
        OP *op_next;
-      OP *op_sibling;
+      OP *op_sibparent;
        OP *(*op_ppaddr)(void);
        PADOFFSET op_targ;
        unsigned int op_type : 9;
@@ -801,7 +949,7 @@ Lots of junk will go past as gdb reads in the relevant source files and
  libraries, and then:
  
      Breakpoint 1, Perl_pp_add () at pp_hot.c:309
-    309         dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
+    1396    dSP; dATARGET; bool useleft; SV *svl, *svr;
      (gdb) step
      311           dPOPTOPnnrl_ul;
      (gdb)
@@ -850,7 +998,7 @@ subroutine:
  
  We can also dump out this op: the current op is always stored in
  C<PL_op>, and we can dump it with C<Perl_op_dump>.  This'll give us
-similar output to L<B::Debug|B::Debug>.
+similar output to CPAN module B::Debug.
  
      (gdb) print Perl_op_dump(PL_op)
      {
@@ -872,11 +1020,11 @@ similar output to L<B::Debug|B::Debug>.
  
  =head2 Using gdb to look at specific parts of a program
  
-With the example above, you knew to look for C<Perl_pp_add>, but what if 
-there were multiple calls to it all over the place, or you didn't know what 
+With the example above, you knew to look for C<Perl_pp_add>, but what if
+there were multiple calls to it all over the place, or you didn't know what
  the op was you were looking for?
  
-One way to do this is to inject a rare call somewhere near what you're looking 
+One way to do this is to inject a rare call somewhere near what you're looking
  for.  For example, you could add C<study> before your method:
  
      study;
@@ -886,7 +1034,7 @@ And in gdb do:
      (gdb) break Perl_pp_study
  
  And then step until you hit what you're
-looking for.  This works well in a loop 
+looking for.  This works well in a loop
  if you want to only break at certain iterations:
  
      for my $c (1..100) {
@@ -895,7 +1043,7 @@ if you want to only break at certain iterations:
  
  =head2 Using gdb to look at what the parser/lexer are doing
  
-If you want to see what perl is doing when parsing/lexing your code, you can 
+If you want to see what perl is doing when parsing/lexing your code, you can
  use C<BEGIN {}>:
  
      print "Before\n";
@@ -909,7 +1057,7 @@ And in gdb:
  If you want to see what the parser/lexer is doing inside of C<if> blocks and
  the like you need to be a little trickier:
  
-    if ($a && $b && do { BEGIN { study } 1 } && $c) { ... } 
+    if ($a && $b && do { BEGIN { study } 1 } && $c) { ... }
  
  =head1 SOURCE CODE STATIC ANALYSIS
  
@@ -922,27 +1070,34 @@ and looking at the resulting graph, what does it tell about the
  execution and data flows.  As a matter of fact, this is exactly how C
  compilers know to give warnings about dubious code.
  
-=head2 lint, splint
+=head2 lint
  
  The good old C code quality inspector, C<lint>, is available in several
  platforms, but please be aware that there are several different
  implementations of it by different vendors, which means that the flags
  are not identical across different platforms.
  
-There is a lint variant called C<splint> (Secure Programming Lint)
-available from http://www.splint.org/ that should compile on any
-Unix-like platform.
-
-There are C<lint> and <splint> targets in Makefile, but you may have to
+There is a C<lint> target in Makefile, but you may have to
  diddle with the flags (see above).
  
  =head2 Coverity
  
-Coverity (http://www.coverity.com/) is a product similar to lint and as
+Coverity (L<http://www.coverity.com/>) is a product similar to lint and as
  a testbed for their product they periodically check several open source
  projects, and they give out accounts to open source developers to the
  defect databases.
  
+There is a Coverity setup for the perl5 project:
+L<https://scan.coverity.com/projects/perl5>
+
+=head2 HP-UX cadvise (Code Advisor)
+
+HP has a C/C++ static analyzer product for HP-UX called Code Advisor.
+(Link not given here because the URL is horribly long and seems horribly
+unstable; use the search engine of your choice to find it.)  The use of
+the C<cadvise_cc> recipe with C<Configure ... -Dcc=./cadvise_cc>
+(see cadvise "User Guide") is recommended; as is the use of C<+wall>.
+
  =head2 cpd (cut-and-paste detector)
  
  The cpd tool detects cut-and-paste coding.  If one instance of the
@@ -950,8 +1105,8 @@ cut-and-pasted code changes, all the other spots should probably be
  changed, too.  Therefore such code should probably be turned into a
  subroutine or a macro.
  
-cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
-(http://pmd.sourceforge.net/).  pmd was originally written for static
+cpd (L<http://pmd.sourceforge.net/cpd.html>) is part of the pmd project
+(L<http://pmd.sourceforge.net/>).  pmd was originally written for static
  analysis of Java code, but later the cpd part of it was extended to
  parse also C and C++.
  
@@ -984,7 +1139,7 @@ being a prime example).  If Configure C<-Dgccansipedantic> is used, the
  C<cflags> frontend selects C<-ansi -pedantic> for the platforms where
  they are known to be safe.
  
-Starting from Perl 5.9.4 the following extra flags are added:
+The following extra flags are added:
  
  =over 4
  
@@ -998,7 +1153,19 @@ C<-Wextra>
  
  =item *
  
-C<-Wdeclaration-after-statement>
+C<-Wc++-compat>
+
+=item *
+
+C<-Wwrite-strings>
+
+=item *
+
+C<-Werror=declaration-after-statement>
+
+=item *
+
+C<-Werror=pointer-arith>
  
  =back
  
@@ -1009,10 +1176,6 @@ their own Augean stablemaster:
  
  =item *
  
-C<-Wpointer-arith>
-
-=item *
-
  C<-Wshadow>
  
  =item *
@@ -1070,7 +1233,7 @@ C<-Accflags=-DDL_UNLOAD_ALL_AT_EXIT>.
  
  The valgrind tool can be used to find out both memory leaks and illegal
  heap memory accesses.  As of version 3.3.0, Valgrind only supports Linux
-on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64).  The
+on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64.  The
  special "test.valgrind" target can be used to run the tests under
  valgrind.  Found errors and memory leaks are logged in files named
  F<testfile.valgrind> and by default output is displayed inline.
@@ -1087,7 +1250,7 @@ run.  The valgrind tests support being run in parallel to help with this:
  Note that the above two invocations will be very verbose as reachable
  memory and leak-checking is enabled by default.  If you want to just see
  pure errors, try:
-    
+
      VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
          make test.valgrind
  
@@ -1106,19 +1269,24 @@ To get valgrind and for more information see
  
  =head2 AddressSanitizer
  
-AddressSanitizer is a clang and gcc extension, included in clang since
-v3.1 and gcc since v4.8.  It checks illegal heap pointers, global
-pointers, stack pointers and use after free errors, and is fast enough
-that you can easily compile your debugging or optimized perl with it.
-It does not check memory leaks though.  AddressSanitizer is available
-for Linux, Mac OS X and soon on Windows.
+AddressSanitizer ("ASan") consists of a compiler instrumentation module
+and a run-time C<malloc> library. AddressSanitizer is available for
+Linux, Mac OS X and Windows. Specifically, it has been included in clang
+since v3.1, gcc since v4.8, and Visual Studio 2019 since v16.1. It checks
+for unsafe memory usage, such as use after free and buffer overflow
+conditions, and is fast enough that you can easily compile your
+debugging or optimized perl with it. Modern versions of ASan check for
+memory leaks by default on most platforms; elsewhere (e.g. x86_64 OS X)
+this feature can be enabled via C<ASAN_OPTIONS=detect_leaks=1>.
+
  
  To build perl with AddressSanitizer, your Configure invocation should
  look like:
  
      sh Configure -des -Dcc=clang \
-       -Accflags=-faddress-sanitizer -Aldflags=-faddress-sanitizer \
-       -Alddlflags=-shared\ -faddress-sanitizer
+       -Accflags=-fsanitize=address -Aldflags=-fsanitize=address \
+       -Alddlflags=-shared\ -fsanitize=address \
+       -fsanitize-blacklist=`pwd`/asan_ignore
  
  where these arguments mean:
  
@@ -1129,25 +1297,31 @@ where these arguments mean:
  This should be replaced by the full path to your clang executable if it
  is not in your path.
  
-=item * -Accflags=-faddress-sanitizer
+=item * -Accflags=-fsanitize=address
  
  Compile perl and extensions sources with AddressSanitizer.
  
-=item * -Aldflags=-faddress-sanitizer
+=item * -Aldflags=-fsanitize=address
  
  Link the perl executable with AddressSanitizer.
  
-=item * -Alddlflags=-shared\ -faddress-sanitizer
+=item * -Alddlflags=-shared\ -fsanitize=address
  
  Link dynamic extensions with AddressSanitizer.  You must manually
  specify C<-shared> because using C<-Alddlflags=-shared> will prevent
  Configure from setting a default value for C<lddlflags>, which usually
  contains C<-shared> (at least on Linux).
  
+=item * -fsanitize-blacklist=`pwd`/asan_ignore
+
+AddressSanitizer will ignore functions listed in the C<asan_ignore>
+file. (This file should contain a short explanation of why each of
+the functions is listed.)
+
  =back
  
  See also
-L<http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer>.
+L<https://github.com/google/sanitizers/wiki/AddressSanitizer>.
  
  
  =head1 PROFILING
@@ -1299,7 +1473,7 @@ variable PERL_DESTRUCT_LEVEL to a non-zero value.  The t/TEST wrapper
  does set this to 2, and this is what you need to do too, if you don't
  want to see the "global leaks": For example, for running under valgrind
  
-       env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t
+    env PERL_DESTRUCT_LEVEL=2 valgrind ./perl -Ilib t/foo/bar.t
  
  (Note: the mod_perl apache module uses also this environment variable
  for its own purposes and extended its semantics.  Refer to the mod_perl
@@ -1307,7 +1481,8 @@ documentation for more information.  Also, spawned threads do the
  equivalent of setting this variable to the value 1.)
  
  If, at the end of a run you get the message I<N scalars leaked>, you
-can recompile with C<-DDEBUG_LEAKING_SCALARS>, which will cause the
+can recompile with C<-DDEBUG_LEAKING_SCALARS>
+(C<Configure -Accflags=-DDEBUG_LEAKING_SCALARS>), which will cause the
  addresses of all those leaked SVs to be dumped along with details as to
  where each SV was originally allocated.  This information is also
  displayed by Devel::Peek.  Note that the extra details recorded with
@@ -1339,17 +1514,18 @@ C<-DPERL_MEM_LOG> instead.
  
  =head2 PERL_MEM_LOG
  
-If compiled with C<-DPERL_MEM_LOG>, both memory and SV allocations go
-through logging functions, which is handy for breakpoint setting.
+If compiled with C<-DPERL_MEM_LOG> (C<-Accflags=-DPERL_MEM_LOG>), both
+memory and SV allocations go through logging functions, which is
+handy for breakpoint setting.
  
-Unless C<-DPERL_MEM_LOG_NOIMPL> is also compiled, the logging functions
-read $ENV{PERL_MEM_LOG} to determine whether to log the event, and if
-so how:
+Unless C<-DPERL_MEM_LOG_NOIMPL> (C<-Accflags=-DPERL_MEM_LOG_NOIMPL>) is
+also compiled, the logging functions read $ENV{PERL_MEM_LOG} to
+determine whether to log the event, and if so how:
  
-    $ENV{PERL_MEM_LOG} =~ /m/          Log all memory ops
-    $ENV{PERL_MEM_LOG} =~ /s/          Log all SV ops
-    $ENV{PERL_MEM_LOG} =~ /t/          include timestamp in Log
-    $ENV{PERL_MEM_LOG} =~ /^(\d+)/     write to FD given (default is 2)
+    $ENV{PERL_MEM_LOG} =~ /m/           Log all memory ops
+    $ENV{PERL_MEM_LOG} =~ /s/           Log all SV ops
+    $ENV{PERL_MEM_LOG} =~ /t/           include timestamp in Log
+    $ENV{PERL_MEM_LOG} =~ /^(\d+)/      write to FD given (default is 2)
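The four patterns above can be read as a small option parser. The following plain C sketch mirrors them; it is not the code perl uses, and the structure and function names are hypothetical:

```c
#include <string.h>

/* Hypothetical settings structure for this sketch only. */
struct mem_log_opts {
    int fd;         /* descriptor to write to (default 2) */
    int log_mem;    /* 'm' present: log memory ops */
    int log_sv;     /* 's' present: log SV ops */
    int timestamp;  /* 't' present: include timestamp */
};

/* Interpret a PERL_MEM_LOG-style string.  No overflow check is made
 * on the descriptor number; this is a sketch, not production code. */
static struct mem_log_opts parse_mem_log(const char *spec)
{
    struct mem_log_opts o = { 2, 0, 0, 0 };

    if (spec == NULL)
        return o;
    if (*spec >= '0' && *spec <= '9') {     /* like /^(\d+)/ */
        const char *p = spec;

        o.fd = 0;
        for (; *p >= '0' && *p <= '9'; p++)
            o.fd = o.fd * 10 + (*p - '0');
    }
    o.log_mem   = strchr(spec, 'm') != NULL;
    o.log_sv    = strchr(spec, 's') != NULL;
    o.timestamp = strchr(spec, 't') != NULL;
    return o;
}
```

Under this reading, a value such as C<3mt> would mean: write to descriptor 3, log memory operations, and include timestamps.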
  
  Memory logging is somewhat similar to C<-Dm> but is independent of
  C<-DDEBUGGING>, and at a higher level; all uses of Newx(), Renew(), and
@@ -1399,41 +1575,46 @@ Note: you can define up to 20 conversion shortcuts in the gdb section.
  
  =head2 C backtrace
  
-Starting from Perl 5.21.1, on some platforms Perl supports retrieving
-the C level backtrace (similar to what symbolic debuggers like gdb do).
+On some platforms Perl supports retrieving the C level backtrace
+(similar to what symbolic debuggers like gdb do).
  
  The backtrace returns the stack trace of the C call frames,
  with the symbol names (function names), the object names (like "perl"),
  and if it can, also the source code locations (file:line).
  
-The supported platforms are Linux and OS X (some *BSD might work at
-least partly, but they have not yet been tested).
+The supported platforms are Linux and OS X (some *BSD might
+work at least partly, but they have not yet been tested).
+
+This feature hasn't been tested with multiple threads, but it will
+only show the backtrace of the thread doing the backtracing.
  
  The feature needs to be enabled with C<Configure -Dusecbacktrace>.
  
-The C<-Dusecbacktrace> also enables keeping the debug information
-when compiling.  Many compilers/linkers do support having both
-optimization and keeping the debug information.  The debug information
-is needed for the symbol names and the source locations.
+The C<-Dusecbacktrace> also enables keeping the debug information when
+compiling/linking (often: C<-g>).  Many compilers/linkers do support
+having both optimization and keeping the debug information.  The debug
+information is needed for the symbol names and the source locations.
+
+Static functions might not be visible for the backtrace.
  
  Source code locations, even if available, can often be missing or
-misleading if the compiler has e.g. inlined code.
+misleading if the compiler has e.g. inlined code.  The optimizer can
+make matching the source code and the object code quite challenging.
  
  =over 4
  
  =item Linux
  
-You B<must> need to have the BFD (-lbfd) library installed, otherwise
-C<perl> will fail to link.  The BFD is usually distributed as part of
-the binutils.
+You B<must> have the BFD (-lbfd) library installed, otherwise C<perl> will
+fail to link.  The BFD is usually distributed as part of the GNU binutils.
  
  Summary: C<Configure ... -Dusecbacktrace>
  and you need C<-lbfd>.
  
  =item OS X
  
-The source code locations are supported only if you have both C<-g>
-and have the Developer Tools installed.
+The source code locations are supported B<only> if you have
+the Developer Tools installed.  (BFD is B<not> needed.)
  
  Summary: C<Configure ... -Dusecbacktrace>
  and installing the Developer Tools would be good.
@@ -1441,17 +1622,17 @@ and installing the Developer Tools would be good.
  =back
  
  Optionally, for trying out the feature, you may want to enable
-automatic dumping of the backtrace just before a warning message
-is emitted (this includes coincidentally croaking) by adding
-C<-Accflags=-DUSE_C_BACKTRACE_ON_WARN> for Configure.
+automatic dumping of the backtrace just before a warning or croak (die)
+message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
+for Configure.
  
  Unless the above additional feature is enabled, nothing about the
  backtrace functionality is visible, except for the Perl/XS level.
  
  Furthermore, even if you have enabled this feature to be compiled,
  you need to enable it in runtime with an environment variable:
-C<PERL_C_BACKTRACE_ON_WARN=10>.  It must be an integer higher
-than zero, and it tells the desired frame count.
+C<PERL_C_BACKTRACE_ON_ERROR=10>.  It must be an integer higher
+than zero, telling the desired frame count.
  
  Retrieving the backtrace from Perl level (using for example an XS
  extension) would be much less exciting than one would hope: normally
@@ -1459,7 +1640,7 @@ you would see C<runops>, C<entersub>, and not much else.  This API is
  intended to be called B<from within> the Perl implementation, not from
  Perl level execution.
  
-The C API for the backtrace is as follows (see L<perlintern>) for details).
+The C API for the backtrace is as follows:
  
  =over 4
  
@@ -1499,8 +1680,10 @@ bugs in the past.
  =head2 When is a bool not a bool?
  
  On pre-C99 compilers, C<bool> is defined as equivalent to C<char>.
-Consequently assignment of any larger type to a C<bool> is unsafe and may
-be truncated.  The C<cBOOL> macro exists to cast it correctly.
+Consequently assignment of any larger type to a C<bool> is unsafe and may be
+truncated.  The C<cBOOL> macro exists to cast it correctly; you may also find
+that using it is shorter and clearer than writing out the equivalent
+conditional expression longhand.
  
  On those platforms and compilers where C<bool> really is a boolean (C++,
  C99), it is easy to forget the cast.  You can force C<bool> to be a C<char>
@@ -1512,6 +1695,10 @@ run C<Configure> with something like
  or your compiler's equivalent to make it easier to spot any unsafe truncations
  that show up.
  
+The C<TRUE> and C<FALSE> macros are available for situations where using them
+would clarify intent. (But they always just mean the same as the integers 1 and
+0 regardless, so using them isn't compulsory.)
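The truncation hazard can be reproduced in a standalone sketch. The type C<sketch_bool> and the macro C<SKETCH_CBOOL> are stand-ins invented for this example (they are not the definitions in F<handy.h>), and 8-bit bytes are assumed:

```c
/* Stand-in for a pre-C99 "bool" that is really a char-sized integer.
 * unsigned char is used so the narrowing conversion is well defined. */
typedef unsigned char sketch_bool;

/* Hypothetical equivalent of the conditional expression that a
 * cBOOL-style macro saves you from writing out longhand. */
#define SKETCH_CBOOL(x) ((x) ? (sketch_bool)1 : (sketch_bool)0)

static sketch_bool flag_set_truncating(unsigned int flags)
{
    return (sketch_bool)(flags & 0x100);   /* BAD: 0x100 becomes 0 */
}

static sketch_bool flag_set_safe(unsigned int flags)
{
    return SKETCH_CBOOL(flags & 0x100);    /* 1 when the bit is set */
}
```

With C<flags> equal to C<0x100>, the first function returns 0 because only the low eight bits survive the conversion, while the second returns 1.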
+
  =head2 The .i Targets
  
  You can expand the macros in a F<foo.c> file by saying