selection of C<-W> flags (see cflags.SH).
Also study L<perlport> carefully to avoid any bad assumptions about the
-operating system, filesystems, and so forth.
+operating system, filesystems, character set, and so forth.
You may once in a while try a "make microperl" to see whether we can
still compile Perl with just the bare minimum of interfaces. (See
=item *
-Casting between data function pointers and data pointers
+Casting between function pointers and data pointers
Technically speaking casting between function pointers and data
pointers is unportable and undefined, but practically speaking it seems
Perl can compile and run under EBCDIC platforms. See L<perlebcdic>.
This is transparent for the most part, but because the character sets
differ, you shouldn't use numeric (decimal, octal, nor hex) constants
-to refer to characters. You can safely say 'A', but not 0x41. You can
-safely say '\n', but not \012. If a character doesn't have a trivial
-input form, you should add it to the list in
-F<regen/unicode_constants.pl>, and have Perl create #defines for you,
+to refer to characters. You can safely say C<'A'>, but not C<0x41>.
+You can safely say C<'\n'>, but not C<\012>. However, you can use
+macros defined in F<utf8.h> to specify any code point portably.
+C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
+LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
+ASCII platforms it compiles without adding any extra code, so there is
+zero performance hit on those). The acceptable inputs to
+C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>. If your input
+isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
+C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
+direction.
+
+If you need the string representation of a character that doesn't have a
+mnemonic name in C, you should add it to the list in
+F<regen/unicode_constants.pl>, and have Perl create C<#define>s for you,
based on the current platform.
+Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
+properly on native code points and strings.
+
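The advice about character literals can be illustrated with a minimal sketch in plain C, outside the Perl source tree. The helper names `count_upper` and `digit_value` are hypothetical and exist only for this example. Inside the Perl core you would use the macros from F<handy.h> rather than the `<ctype.h>` functions used here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count uppercase letters without numeric
 * constants and without 'A'..'Z' range arithmetic, which would
 * break on EBCDIC where the letters are not contiguous. */
static size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}

/* Hypothetical helper: convert a decimal digit character to its
 * value, or return -1 if it is not a digit. '0'..'9' is contiguous
 * in both ASCII and EBCDIC, so the subtraction is safe. */
static int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

Both helpers refer to characters only through literals such as `'0'`, never through values such as `0x30`, so the same source means the same thing on either character set.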
Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
case alphabetic characters. That is not true in EBCDIC. Nor for 'a' to
'z'. But '0' - '9' is an unbroken range in both systems. Don't assume
UTF-8 and UTF-EBCDIC are two different encodings used to represent
Unicode code points as sequences of bytes. Macros with the same names
-(but different definitions) in C<utf8.h> and C<utfebcdic.h> are used to
+(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
allow the calling code to think that there is only one such encoding.
This is almost always referred to as C<utf8>, but it means the EBCDIC
version as well. Again, comments in the code may well be wrong even if
-the code itself is right. For example, the concept of C<invariant
+the code itself is right. For example, the concept of UTF-8 C<invariant
characters> differs between ASCII and EBCDIC. On ASCII platforms, only
characters that do not have the high-order bit set (i.e. whose ordinals
are strict ASCII, 0 - 127) are invariant, and the documentation and
ASCII is a 7 bit encoding, but bytes have 8 bits in them. The 128 extra
characters have different meanings depending on the locale. Absent a
locale, currently these extra characters are generally considered to be
-unassigned, and this has presented some problems. This is being changed
-starting in 5.12 so that these characters will be considered to be
-Latin-1 (ISO-8859-1).
+unassigned, and this has presented some problems. This has been
+changed starting in 5.12 so that these characters can be considered to
+be Latin-1 (ISO-8859-1).
=item *
=head2 Security problems
Last but not least, here are various tips for safer coding.
+See also L<perlclib> for libc/stdio replacements one should use.
=over 4
=item *
+Do not use tmpfile()
+
+Use mkstemp() instead.
+
+=item *
+
Do not use strcpy() or strcat() or strncpy() or strncat()
Use my_strlcpy() and my_strlcat() instead: they either use the native
L<C<Perl_form>()|perlapi/form> or SVs and
L<C<Perl_sv_catpvf()>|perlapi/sv_catpvf>.
-Note that some versions of all the C<sprintf()> forms are buggy in
-glibc as of version 2.17. They won't allow a C<%s> format to create a
-string that isn't valid UTF-8 if the current underlying locale of the
-program is UTF-8. What happens is that the C<%s> and its operand are
+Note that glibc C<printf()>, C<sprintf()>, etc. are buggy before glibc
+version 2.17. They won't allow a C<%.s> format with a precision to
+create a string that isn't valid UTF-8 if the current underlying locale
+of the program is UTF-8. What happens is that the C<%s> and its operand are
simply skipped without any notice.
+See L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
+
+=item *
+
+Do not use atoi()
+
+Use grok_atou() instead. atoi() has ill-defined behavior on overflows,
+and cannot be used for incremental parsing. It is also affected by locale,
+which is bad.
+
+=item *
+
+Do not use strtol() or strtoul()
+
+Use grok_atou() instead. strtol() or strtoul() (or their IV/UV-friendly
+macro disguises, Strtol() and Strtoul(), or Atol() and Atoul()) are
+affected by locale, which is bad.
=back
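To make the complaints about atoi() concrete, here is a minimal sketch of a decimal parser that ignores locale, detects overflow, and reports where it stopped so the caller can continue parsing. The function `parse_decimal` is hypothetical: it is not Perl's grok_atou() and makes no claim about how grok_atou() is implemented.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, NOT Perl's grok_atou().
 * Parses the leading decimal digits of s into *out.
 * Returns 1 on success, 0 if s does not start with a digit or the
 * value would not fit in an unsigned long.
 * If endp is non-NULL it is set to the first unparsed character,
 * which allows incremental parsing. */
static int parse_decimal(const char *s, unsigned long *out,
                         const char **endp)
{
    unsigned long val = 0;
    const char *p = s;

    if (*p < '0' || *p > '9')
        return 0;
    for (; *p >= '0' && *p <= '9'; p++) {
        unsigned long d = (unsigned long)(*p - '0');
        if (val > (ULONG_MAX - d) / 10)
            return 0;               /* val * 10 + d would overflow */
        val = val * 10 + d;
    }
    *out = val;
    if (endp)
        *endp = p;
    return 1;
}
```

The overflow check runs before the multiplication, so the function never relies on wraparound. It only compares against character literals, so locale settings cannot change its result.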
on x86, x86-64 and PowerPC and Darwin (OS X) on x86 and x86-64). The
special "test.valgrind" target can be used to run the tests under
valgrind. Found errors and memory leaks are logged in files named
-F<testfile.valgrind>.
+F<testfile.valgrind>, and by default the output is displayed inline.
+
+Example usage:
+
+ make test.valgrind
+
+Since valgrind adds significant overhead, tests will take much longer to
+run. The valgrind tests support being run in parallel to help with this:
+
+ TEST_JOBS=9 make test.valgrind
+
+Note that the above two invocations will be very verbose, as reporting
+of reachable memory and leak checking are enabled by default. If you
+want to see just the pure errors, try:
+
+ VG_OPTS='-q --leak-check=no --show-reachable=no' TEST_JOBS=9 \
+ make test.valgrind
Valgrind also provides a cachegrind tool, invoked on perl as:
Note: you can define up to 20 conversion shortcuts in the gdb section.
+=head2 C backtrace
+
+On some platforms Perl supports retrieving the C level backtrace
+(similar to what symbolic debuggers like gdb do).
+
+The backtrace returns the stack trace of the C call frames,
+with the symbol names (function names), the object names (like "perl"),
+and if it can, also the source code locations (file:line).
+
+The supported platforms are Linux and OS X (some *BSD might
+work at least partly, but they have not yet been tested).
+
+This feature hasn't been tested with multiple threads, but it will
+only show the backtrace of the thread doing the backtracing.
+
+The feature needs to be enabled with C<Configure -Dusecbacktrace>.
+
+The C<-Dusecbacktrace> also enables keeping the debug information when
+compiling/linking (often: C<-g>). Many compilers/linkers support
+enabling optimization while keeping the debug information. The debug
+information is needed for the symbol names and the source locations.
+
+Static functions might not be visible for the backtrace.
+
+Source code locations, even if available, can often be missing or
+misleading if the compiler has e.g. inlined code. The optimizer can
+make matching the source code and the object code quite challenging.
+
+=over 4
+
+=item Linux
+
+You B<must> have the BFD (-lbfd) library installed, otherwise C<perl> will
+fail to link. The BFD is usually distributed as part of the GNU binutils.
+
+Summary: C<Configure ... -Dusecbacktrace>
+and you need C<-lbfd>.
+
+=item OS X
+
+The source code locations are supported B<only> if you have
+the Developer Tools installed. (BFD is B<not> needed.)
+
+Summary: C<Configure ... -Dusecbacktrace>
+and installing the Developer Tools would be good.
+
+=back
+
+Optionally, for trying out the feature, you may want to enable
+automatic dumping of the backtrace just before a warning or croak (die)
+message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
+for Configure.
+
+Unless the above additional feature is enabled, nothing about the
+backtrace functionality is visible, except for the Perl/XS level.
+
+Furthermore, even if you have enabled this feature to be compiled,
+you need to enable it at runtime with an environment variable:
+C<PERL_C_BACKTRACE_ON_ERROR=10>. The value must be an integer greater
+than zero, giving the desired frame count.
+
+Retrieving the backtrace from Perl level (using for example an XS
+extension) would be much less exciting than one would hope: normally
+you would see C<runops>, C<entersub>, and not much else. This API is
+intended to be called B<from within> the Perl implementation, not from
+Perl level execution.
+
+The C API for the backtrace is as follows:
+
+=over 4
+
+=item get_c_backtrace
+
+=item free_c_backtrace
+
+=item get_c_backtrace_dump
+
+=item dump_c_backtrace
+
+=back
+
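For readers unfamiliar with C-level backtraces, the following standalone sketch shows the general idea using the `backtrace()` and `backtrace_symbols()` functions from `<execinfo.h>`, which are available with glibc on Linux and on OS X. It does not use Perl's backtrace API listed above, and it is an assumption of this example, not a statement from the text, that this resembles what Perl does internally. The name `print_c_backtrace` is hypothetical.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 16

/* Hypothetical helper: write up to MAX_FRAMES C call frames of the
 * calling thread to fp, one per line. Returns the number of frames
 * written, or 0 if the symbol lookup failed. */
static int print_c_backtrace(FILE *fp)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);
    int i;

    if (symbols == NULL)
        return 0;
    for (i = 0; i < n; i++)
        fprintf(fp, "%d\t%s\n", i, symbols[i]);
    free(symbols);      /* one allocation holds all the strings */
    return n;
}
```

As noted above for Perl's own feature, static functions may show up without a name, and the amount of detail depends on how the program was compiled and linked.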
=head2 Poison
If you see in a debugger a memory area mysteriously full of 0xABABABAB
Under ithreads the optree is read only. If you want to enforce this, to
check for write accesses from buggy code, compile with
-C<-DPERL_DEBUG_READONLY_OPS> to enable code that allocates op memory
+C<-Accflags=-DPERL_DEBUG_READONLY_OPS>
+to enable code that allocates op memory
via C<mmap>, and sets it read-only when it is attached to a subroutine.
Any write access to an op results in a C<SIGBUS> and abort.