selection of C<-W> flags (see cflags.SH).
Also study L<perlport> carefully to avoid any bad assumptions about the
-operating system, filesystems, and so forth.
+operating system, filesystems, character set, and so forth.
You may once in a while try a "make microperl" to see whether we can
still compile Perl with just the bare minimum of interfaces. (See
=item *
-Casting between data function pointers and data pointers
+Casting between function pointers and data pointers
Technically speaking casting between function pointers and data
pointers is unportable and undefined, but practically speaking it seems
Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, nor hex) constants
-to refer to characters. You can safely say 'A', but not 0x41. You can
-safely say '\n', but not \012. If a character doesn't have a trivial
-input form, you should add it to the list in
-F<regen/unicode_constants.pl>, and have Perl create #defines for you,
+to refer to characters. You can safely say C<'A'>, but not C<0x41>.
+You can safely say C<'\n'>, but not C<\012>. However, you can use
+macros defined in F<utf8.h> to specify any code point portably.
+C<LATIN1_TO_NATIVE(0xDF)> evaluates to the code point for LATIN SMALL
+LETTER SHARP S on whatever platform you are running on (on
+ASCII platforms it compiles without adding any extra code, so there is
+zero performance hit on those). The acceptable inputs to
+C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>. If your input
+isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
+C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
+direction.
+
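Below is a minimal sketch of the round trip. It assumes an ASCII platform. The two macros are hypothetical stand-ins for the real ones in F<utf8.h>, which on ASCII platforms also reduce to the identity. On EBCDIC platforms, Perl's own definitions translate the value instead. Real code should include F<perl.h> and use Perl's definitions, not these. The helper names C<native_sharp_s()> and C<latin1_to_native_checked()> are likewise illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-ins for the utf8.h macros, valid ONLY on an ASCII
 * platform, where native code points and Latin-1 code points coincide.
 * Real code should include perl.h and use Perl's own definitions. */
#define LATIN1_TO_NATIVE(ch) ((unsigned char)(ch))
#define NATIVE_TO_LATIN1(ch) ((unsigned char)(ch))

/* LATIN SMALL LETTER SHARP S, written portably instead of as a bare 0xDF. */
static unsigned char native_sharp_s(void)
{
    return LATIN1_TO_NATIVE(0xDF);
}

/* Convert a Latin-1 code point to native form, enforcing the documented
 * input range of 0x00 through 0xFF before using the macro. */
static unsigned char latin1_to_native_checked(unsigned int cp)
{
    assert(cp <= 0xFF);   /* larger values need UNICODE_TO_NATIVE instead */
    return LATIN1_TO_NATIVE(cp);
}
```

With Perl's real macros, the same call sites compile unchanged on EBCDIC. That portability is the reason for routing every numeric code point through the macro.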
+If you need the string representation of a character that doesn't have a
+mnemonic name in C, you should add it to the list in
+F<regen/unicode_constants.pl>, and have Perl create C<#define>s for you,
based on the current platform.
+Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
+properly on native code points and strings.
+
Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to
'z'. But '0' - '9' is an unbroken range in both systems. Don't assume
UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes. Macros with the same names
-(but different definitions) in C<utf8.h> and C<utfebcdic.h> are used to
+(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well. Again, comments in the code may well be wrong even if
-the code itself is right. For example, the concept of C<invariant
+the code itself is right. For example, the concept of UTF-8 C<invariant
characters> differs between ASCII and EBCDIC. On ASCII platforms, only
characters that do not have the high-order bit set (i.e. whose ordinals
are strict ASCII, 0 - 127) are invariant, and the documentation and
ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a
locale, currently these extra characters are generally considered to be
-unassigned, and this has presented some problems. This is being changed
-starting in 5.12 so that these characters will be considered to be
-Latin-1 (ISO-8859-1).
+unassigned, and this has presented some problems. This has been
+changed, starting in 5.12, so that these characters can be considered to
+be Latin-1 (ISO-8859-1).
=item *
=head2 Security problems
Last but not least, here are various tips for safer coding.
+See also L<perlclib> for libc/stdio replacements one should use.
=over 4
=item *
+Do not use tmpfile()
+
+Use mkstemp() instead.
+
+=item *
+
Do not use strcpy() or strcat() or strncpy() or strncat()
Use my_strlcpy() and my_strlcat() instead: they either use the native
simply skipped without any notice.
L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
+=item *
+
+Do not use atoi()
+
+Use grok_atou() instead. atoi() has undefined behavior on overflow,
+and cannot be used for incremental parsing. It is also affected by locale,
+which is bad.
+
+=item *
+
+Do not use strtol() or strtoul()
+
+Use grok_atou() instead. strtol() and strtoul() (or their IV/UV-friendly
+macro disguises, Strtol() and Strtoul(), or Atol() and Atoul()) are
+affected by locale, which is bad.
+
=back
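The sketch below shows why an overflow-aware, locale-independent parser is preferable to atoi(). It is B<not> grok_atou() and does not claim to match its signature or behavior. C<parse_decimal_uv()> is a hypothetical name, and plain C<unsigned long> stands in for Perl's UV.

The parser has three properties:

=over 4

=item *

It accepts only the digits '0' to '9', which are contiguous in both ASCII and EBCDIC.

=item *

It reports overflow instead of silently misbehaving.

=item *

It returns an end pointer so the caller can resume parsing.

=back

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not Perl's grok_atou(): parse an unsigned decimal
 * number from s without consulting the locale.  On success it stores the
 * value in *out, sets *endp to the first unconsumed character, and returns
 * true.  It returns false if there are no digits or if the value would
 * overflow unsigned long; *endp is then left pointing at s. */
static bool parse_decimal_uv(const char *s, unsigned long *out,
                             const char **endp)
{
    const char *p = s;
    unsigned long value = 0;

    *endp = s;
    if (*p < '0' || *p > '9')
        return false;                       /* no digits at all */

    /* '0'..'9' are contiguous in ASCII and EBCDIC, so *p - '0' is safe. */
    while (*p >= '0' && *p <= '9') {
        unsigned long digit = (unsigned long)(*p - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return false;                   /* value * 10 + digit overflows */
        value = value * 10 + digit;
        p++;
    }
    *out = value;
    *endp = p;
    return true;
}
```

The end pointer makes incremental parsing possible. Given C<"42,7">, the first call yields 42 and leaves C<*endp> at the comma. The caller can then skip the comma and parse 7. atoi() cannot support this, because it does not report where it stopped.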
=head1 DEBUGGING
=head2 C backtrace
-Starting from Perl 5.21.1, on some platforms Perl supports retrieving
-the C level backtrace (similar to what symbolic debuggers like gdb do).
+On some platforms Perl supports retrieving the C level backtrace
+(similar to what symbolic debuggers like gdb do).
The backtrace returns the stack trace of the C call frames,
with the symbol names (function names), the object names (like "perl"),
and if it can, also the source code locations (file:line).
-The supported platforms are Linux and OS X (some *BSD might work at
-least partly, but they have not yet been tested).
+The supported platforms are Linux and OS X (some *BSD might
+work at least partly, but they have not yet been tested).
+
+This feature hasn't been tested with multiple threads, but it will
+only show the backtrace of the thread doing the backtracing.
The feature needs to be enabled with C<Configure -Dusecbacktrace>.
-The C<-Dusecbacktrace> also enables keeping the debug information
-when compiling. Many compilers/linkers do support having both
-optimization and keeping the debug information. The debug information
-is needed for the symbol names and the source locations.
+The C<-Dusecbacktrace> also enables keeping the debug information when
+compiling/linking (often: C<-g>). Many compilers/linkers do support
+having both optimization and keeping the debug information. The debug
+information is needed for the symbol names and the source locations.
+
+Static functions might not be visible in the backtrace.
Source code locations, even if available, can often be missing or
-misleading if the compiler has e.g. inlined code.
+misleading if the compiler has e.g. inlined code. The optimizer can
+make matching the source code and the object code quite challenging.
=over 4
=item Linux
-You B<must> need to have the BFD (-lbfd) library installed, otherwise
-C<perl> will fail to link. The BFD is usually distributed as part of
-the binutils.
+You B<must> have the BFD (-lbfd) library installed, otherwise C<perl> will
+fail to link. The BFD is usually distributed as part of the GNU binutils.
Summary: C<Configure ... -Dusecbacktrace>
and you need C<-lbfd>.
=item OS X
-The source code locations are supported only if you have both C<-g>
-and have the Developer Tools installed.
+The source code locations are supported B<only> if you have
+the Developer Tools installed. (BFD is B<not> needed.)
Summary: C<Configure ... -Dusecbacktrace>
and installing the Developer Tools would be good.
=back
Optionally, for trying out the feature, you may want to enable
-automatic dumping of the backtrace just before a warning message
-is emitted (this includes coincidentally croaking) by adding
-C<-Accflags=-DUSE_C_BACKTRACE_ON_WARN> for Configure.
+automatic dumping of the backtrace just before a warning or croak (die)
+message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
+for Configure.
Unless the above additional feature is enabled, nothing about the
backtrace functionality is visible, except for the Perl/XS level.
Furthermore, even if you have enabled this feature to be compiled,
you need to enable it in runtime with an environment variable:
-C<PERL_C_BACKTRACE_ON_WARN=10>. It must be an integer higher
-than zero, and it tells the desired frame count.
+C<PERL_C_BACKTRACE_ON_ERROR=10>. It must be an integer higher
+than zero, telling the desired frame count.
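The frame-count idea can be illustrated outside Perl with C<backtrace()> and C<backtrace_symbols_fd()> from F<execinfo.h>, which are available in glibc and on OS X. This is a minimal sketch under that assumption, not Perl's implementation or its C API. C<dump_c_backtrace_to()> and C<MAX_FRAMES> are hypothetical names. The positive-count check mirrors the rule for the environment variable above.

```c
#include <execinfo.h>

#define MAX_FRAMES 64   /* arbitrary upper bound chosen for this sketch */

/* Capture at most `depth` C call frames of the calling thread and write
 * one line per frame to file descriptor fd.  Returns the number of frames
 * captured, or 0 if depth is not a positive number. */
static int dump_c_backtrace_to(int fd, int depth)
{
    void *frames[MAX_FRAMES];
    int n;

    if (depth <= 0)
        return 0;                        /* frame count must be > 0 */
    if (depth > MAX_FRAMES)
        depth = MAX_FRAMES;
    n = backtrace(frames, depth);        /* fills frames[], returns count */
    backtrace_symbols_fd(frames, n, fd); /* writes directly, no malloc() */
    return n;
}
```

C<backtrace_symbols_fd()> resolves names only through the dynamic symbol table. On Linux, this is why a linker option such as C<-rdynamic> may be needed and why static functions often appear as bare addresses. Source locations need something extra, such as the BFD library mentioned for Linux.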
Retrieving the backtrace from Perl level (using for example an XS
extension) would be much less exciting than one would hope: normally
intended to be called B<from within> the Perl implementation, not from
Perl level execution.
-The C API for the backtrace is as follows (see L<perlintern>) for details).
+The C API for the backtrace is as follows:
=over 4