perlrecharclass: Fix typo

diff --git a/pod/perlhacktips.pod b/pod/perlhacktips.pod

index f41918c..943bdfb 100644
--- a/pod/perlhacktips.pod
+++ b/pod/perlhacktips.pod
@@ -143,7 +143,7 @@ as many as possible of the C<-std=c89>, C<-ansi>, C<-pedantic>, and a
  selection of C<-W> flags (see cflags.SH).
  
  Also study L<perlport> carefully to avoid any bad assumptions about the
-operating system, filesystems, and so forth.
+operating system, filesystems, character set, and so forth.
  
  You may once in a while try a "make microperl" to see whether we can
  still compile Perl with just the bare minimum of interfaces.  (See
@@ -173,7 +173,7 @@ NUM2PTR().)
  
  =item *
  
-Casting between data function pointers and data pointers
+Casting between function pointers and data pointers
  
  Technically speaking casting between function pointers and data
  pointers is unportable and undefined, but practically speaking it seems
@@ -275,12 +275,26 @@ Assuming the character set is ASCIIish
  Perl can compile and run under EBCDIC platforms.  See L<perlebcdic>.
  This is transparent for the most part, but because the character sets
  differ, you shouldn't use numeric (decimal, octal, nor hex) constants
-to refer to characters.  You can safely say 'A', but not 0x41.  You can
-safely say '\n', but not \012.  If a character doesn't have a trivial
-input form, you should add it to the list in
-F<regen/unicode_constants.pl>, and have Perl create #defines for you,
+to refer to characters.  You can safely say C<'A'>, but not C<0x41>.
+You can safely say C<'\n'>, but not C<\012>.  However, you can use
+macros defined in F<utf8.h> to specify any code point portably.
+C<LATIN1_TO_NATIVE(0xDF)> is going to be the code point that means
+LATIN SMALL LETTER SHARP S on whatever platform you are running on (on
+ASCII platforms it compiles without adding any extra code, so there is
+zero performance hit on those).  The acceptable inputs to
+C<LATIN1_TO_NATIVE> are from C<0x00> through C<0xFF>.  If your input
+isn't guaranteed to be in that range, use C<UNICODE_TO_NATIVE> instead.
+C<NATIVE_TO_LATIN1> and C<NATIVE_TO_UNICODE> translate the opposite
+direction.
+
+If you need the string representation of a character that doesn't have a
+mnemonic name in C, you should add it to the list in
+F<regen/unicode_constants.pl>, and have Perl create C<#define>s for you,
  based on the current platform.
  
+Note that the C<isI<FOO>> and C<toI<FOO>> macros in F<handy.h> work
+properly on native code points and strings.
+
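The advice added above (use character literals rather than numeric constants, and rely on classification macros rather than hand-written ranges) can be sketched in plain C. This is only an illustration under stated assumptions: C<is_capital_a> and C<count_upper> are hypothetical helpers, not Perl core functions, and the standard C<isupper()> from F<ctype.h> stands in for Perl's C<isUPPER()> so that the example compiles without the Perl headers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compare against the character literal 'A'.
 * Writing 0x41 instead would only be correct on ASCII platforms. */
static int is_capital_a(int c)
{
    return c == 'A';
}

/* Hypothetical helper: count upper-case letters without assuming that
 * 'A'..'Z' is a contiguous range (it is not in EBCDIC).
 * isupper() is a stand-in here for Perl's isUPPER() macro. */
static int count_upper(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: passing a negative char to isupper()
         * is undefined behavior. */
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

Inside the Perl core itself one would reach for the F<handy.h> macros rather than F<ctype.h>, since the latter is affected by the C locale.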
  Also, the range 'A' - 'Z' in ASCII is an unbroken sequence of 26 upper
  case alphabetic characters.  That is not true in EBCDIC.  Nor for 'a' to
  'z'.  But '0' - '9' is an unbroken range in both systems.  Don't assume
@@ -293,11 +307,11 @@ able to handle EBCDIC without having to change pre-existing code.
  
  UTF-8 and UTF-EBCDIC are two different encodings used to represent
  Unicode code points as sequences of bytes.  Macros  with the same names
-(but different definitions) in C<utf8.h> and C<utfebcdic.h> are used to
+(but different definitions) in F<utf8.h> and F<utfebcdic.h> are used to
  allow the calling code to think that there is only one such encoding.
  This is almost always referred to as C<utf8>, but it means the EBCDIC
  version as well.  Again, comments in the code may well be wrong even if
-the code itself is right.  For example, the concept of C<invariant
+the code itself is right.  For example, the concept of UTF-8 C<invariant
  characters> differs between ASCII and EBCDIC.  On ASCII platforms, only
  characters that do not have the high-order bit set (i.e.  whose ordinals
  are strict ASCII, 0 - 127) are invariant, and the documentation and
@@ -314,9 +328,9 @@ Assuming the character set is just ASCII
  ASCII is a 7 bit encoding, but bytes have 8 bits in them.  The 128 extra
  characters have different meanings depending on the locale.  Absent a
  locale, currently these extra characters are generally considered to be
-unassigned, and this has presented some problems.  This is being changed
-starting in 5.12 so that these characters will be considered to be
-Latin-1 (ISO-8859-1).
+unassigned, and this has presented some problems.  This has been
+changed starting in 5.12 so that these characters can be considered to
+be Latin-1 (ISO-8859-1).
  
  =item *
  
@@ -581,6 +595,7 @@ snprintf() - the return type is unportable.  Use my_snprintf() instead.
  =head2 Security problems
  
  Last but not least, here are various tips for safer coding.
+See also L<perlclib> for libc/stdio replacements one should use.
  
  =over 4
  
@@ -592,6 +607,12 @@ Or we will publicly ridicule you.  Seriously.
  
  =item *
  
+Do not use tmpfile()
+
+Use mkstemp() instead.
+
+=item *
+
  Do not use strcpy() or strcat() or strncpy() or strncat()
  
  Use my_strlcpy() and my_strlcat() instead: they either use the native
@@ -616,6 +637,22 @@ of the program is UTF-8.  What happens is that the C<%s> and its operand are
  simply skipped without any notice.
  L<https://sourceware.org/bugzilla/show_bug.cgi?id=6530>.
  
+=item *
+
+Do not use atoi()
+
+Use grok_atou() instead.  atoi() has ill-defined behavior on overflows,
+and cannot be used for incremental parsing.  It is also affected by locale,
+which is bad.
+
+=item *
+
+Do not use strtol() or strtoul()
+
+Use grok_atou() instead.  strtol() or strtoul() (or their IV/UV-friendly
+macro disguises, Strtol() and Strtoul(), or Atol() and Atoul()) are
+affected by locale, which is bad.
+
  =back
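To make the objections to atoi() and strtoul() concrete, here is a minimal sketch of a strict decimal parser with the properties the items above ask for: well-defined behavior on overflow, an end pointer that allows incremental parsing, and no dependence on locale. The name C<parse_uint_strict> and its signature are assumptions made for this illustration; they are not the grok_atou() interface, whose actual prototype should be taken from the Perl source.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, NOT the real grok_atou() signature.
 * Parses an unsigned decimal number at the start of s.
 * Returns 1 on success and stores the value in *out; if end is not
 * NULL, *end is set to the first unparsed character.
 * Returns 0 (leaving *out untouched) if s does not start with a digit
 * or if the value would not fit in an unsigned long. */
static int parse_uint_strict(const char *s, unsigned long *out,
                             const char **end)
{
    unsigned long value = 0;
    const char *p = s;

    assert(s != NULL && out != NULL);

    /* No leading whitespace, sign, or empty input is accepted.
     * '0'..'9' is a contiguous range in both ASCII and EBCDIC. */
    if (*p < '0' || *p > '9')
        return 0;

    for (; *p >= '0' && *p <= '9'; p++) {
        unsigned long digit = (unsigned long)(*p - '0');

        /* Reject before value * 10 + digit can wrap around. */
        if (value > (ULONG_MAX - digit) / 10)
            return 0;
        value = value * 10 + digit;
    }

    *out = value;
    if (end != NULL)
        *end = p;
    return 1;
}
```

Because the caller gets both a success flag and the position where parsing stopped, input such as C<42abc> can be consumed piecewise, which atoi() cannot do.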
  
  =head1 DEBUGGING
@@ -1399,41 +1436,46 @@ Note: you can define up to 20 conversion shortcuts in the gdb section.
  
  =head2 C backtrace
  
-Starting from Perl 5.21.1, on some platforms Perl supports retrieving
-the C level backtrace (similar to what symbolic debuggers like gdb do).
+On some platforms Perl supports retrieving the C level backtrace
+(similar to what symbolic debuggers like gdb do).
  
  The backtrace returns the stack trace of the C call frames,
  with the symbol names (function names), the object names (like "perl"),
  and if it can, also the source code locations (file:line).
  
-The supported platforms are Linux and OS X (some *BSD might work at
-least partly, but they have not yet been tested).
+The supported platforms are Linux and OS X (some *BSD might
+work at least partly, but they have not yet been tested).
+
+This feature hasn't been tested with multiple threads, but it will
+only show the backtrace of the thread doing the backtracing.
  
  The feature needs to be enabled with C<Configure -Dusecbacktrace>.
  
-The C<-Dusecbacktrace> also enables keeping the debug information
-when compiling.  Many compilers/linkers do support having both
-optimization and keeping the debug information.  The debug information
-is needed for the symbol names and the source locations.
+The C<-Dusecbacktrace> also enables keeping the debug information when
+compiling/linking (often: C<-g>).  Many compilers/linkers do support
+having both optimization and keeping the debug information.  The debug
+information is needed for the symbol names and the source locations.
+
+Static functions might not be visible for the backtrace.
  
  Source code locations, even if available, can often be missing or
-misleading if the compiler has e.g. inlined code.
+misleading if the compiler has e.g. inlined code.  The optimizer can
+make matching the source code and the object code quite challenging.
  
  =over 4
  
  =item Linux
  
-You B<must> need to have the BFD (-lbfd) library installed, otherwise
-C<perl> will fail to link.  The BFD is usually distributed as part of
-the binutils.
+You B<must> have the BFD (-lbfd) library installed, otherwise C<perl> will
+fail to link.  The BFD is usually distributed as part of the GNU binutils.
  
  Summary: C<Configure ... -Dusecbacktrace>
  and you need C<-lbfd>.
  
  =item OS X
  
-The source code locations are supported only if you have both C<-g>
-and have the Developer Tools installed.
+The source code locations are supported B<only> if you have
+the Developer Tools installed.  (BFD is B<not> needed.)
  
  Summary: C<Configure ... -Dusecbacktrace>
  and installing the Developer Tools would be good.
@@ -1441,17 +1483,17 @@ and installing the Developer Tools would be good.
  =back
  
  Optionally, for trying out the feature, you may want to enable
-automatic dumping of the backtrace just before a warning message
-is emitted (this includes coincidentally croaking) by adding
-C<-Accflags=-DUSE_C_BACKTRACE_ON_WARN> for Configure.
+automatic dumping of the backtrace just before a warning or croak (die)
+message is emitted, by adding C<-Accflags=-DUSE_C_BACKTRACE_ON_ERROR>
+for Configure.
  
  Unless the above additional feature is enabled, nothing about the
  backtrace functionality is visible, except for the Perl/XS level.
  
  Furthermore, even if you have enabled this feature to be compiled,
  you need to enable it in runtime with an environment variable:
-C<PERL_C_BACKTRACE_ON_WARN=10>.  It must be an integer higher
-than zero, and it tells the desired frame count.
+C<PERL_C_BACKTRACE_ON_ERROR=10>.  It must be an integer higher
+than zero, telling the desired frame count.
  
  Retrieving the backtrace from Perl level (using for example an XS
  extension) would be much less exciting than one would hope: normally
@@ -1459,7 +1501,7 @@ you would see C<runops>, C<entersub>, and not much else.  This API is
  intended to be called B<from within> the Perl implementation, not from
  Perl level execution.
  
-The C API for the backtrace is as follows (see L<perlintern>) for details).
+The C API for the backtrace is as follows:
  
  =over 4