perldelta for 2f465e08e / #123652

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 40e4965..9dc4cc6 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -390,7 +390,7 @@ other named unary operator.  The operator may be any of:
      -g  File has setgid bit set.
      -k  File has sticky bit set.
  
-    -T  File is an ASCII text file (heuristic guess).
+    -T  File is an ASCII or UTF-8 text file (heuristic guess).
      -B  File is a "binary" file (opposite of -T).
  
      -M  Script start time minus file modification time, in days.
@@ -449,12 +449,18 @@ filehandle won't cache the results of the file tests when this pragma is
  in effect.  Read the documentation for the C<filetest> pragma for more
  information.
  
-The C<-T> and C<-B> switches work as follows.  The first block or so of the
-file is examined for odd characters such as strange control codes or
-characters with the high bit set.  If too many strange characters (>30%)
-are found, it's a C<-B> file; otherwise it's a C<-T> file.  Also, any file
-containing a zero byte in the first block is considered a binary file.  If C<-T>
-or C<-B> is used on a filehandle, the current IO buffer is examined
+The C<-T> and C<-B> switches work as follows.  The first block or so of
+the file is examined to see if it is valid UTF-8 that includes non-ASCII
+characters.  If so, it's a C<-T> file.  Otherwise, that same portion of
+the file is examined for odd characters such as strange control codes or
+characters with the high bit set.  If more than a third of the
+characters are strange, it's a C<-B> file; otherwise it's a C<-T> file.
+Also, any file containing a zero byte in the examined portion is
+considered a binary file.  (If executed within the scope of a L<S<use
+locale>|perllocale> which includes C<LC_CTYPE>, odd characters are
+anything that is neither printable nor a space in the current locale.)  If
+C<-T> or C<-B> is used on a filehandle, the current IO buffer is
+examined
  rather than the first block.  Both C<-T> and C<-B> return true on an empty
  file, or a file at EOF when testing a filehandle.  Because you have to
  read a file to do the C<-T> test, on most occasions you want to use a C<-f>
@@ -738,7 +744,8 @@ Returns the context of the current pure perl subroutine call.  In scalar
  context, returns the caller's package name if there I<is> a caller (that is, if
  we're in a subroutine or C<eval> or C<require>) and the undefined value
  otherwise.  caller never returns XS subs and they are skipped.  The next pure
-perl sub will appear instead of the XS sub in caller's return values. In list
+perl sub will appear instead of the XS
+sub in caller's return values.  In list
  context, caller returns
  
      # 0         1          2
@@ -756,7 +763,7 @@ to go back before the current one.
       = caller($i);
  
  Here, $subroutine is the function that the caller called (rather than the
-function containing the caller). Note that $subroutine may be C<(eval)> if
+function containing the caller).  Note that $subroutine may be C<(eval)> if
  the frame is not a subroutine call, but an C<eval>.  In such a case
  additional elements $evaltext and
  C<$is_require> are set: C<$is_require> is true if the frame is created by a
@@ -1374,11 +1381,12 @@ straightforward.  Although exists() will return false for deleted entries,
  deleting array elements never changes indices of existing values; use shift()
  or splice() for that.  However, if any deleted elements fall at the end of an
  array, the array's size shrinks to the position of the highest element that
-still tests true for exists(), or to 0 if none do. In other words, an
+still tests true for exists(), or to 0 if none do.  In other words, an
  array won't have trailing nonexistent elements after a delete.
  
-B<WARNING:> Calling delete on array values is deprecated and likely to
-be removed in a future version of Perl.
+B<WARNING:> Calling C<delete> on array values is strongly discouraged.  The
+notion of deleting or checking the existence of Perl array elements is not
+conceptually coherent, and can lead to surprising behavior.
  
  Deleting from C<%ENV> modifies the environment.  Deleting from a hash tied to
  a DBM file deletes the entry from the DBM file.  Deleting from a C<tied> hash
@@ -2055,9 +2063,11 @@ corresponding value is undefined.
      print "True\n"      if $hash{$key};
  
  exists may also be called on array elements, but its behavior is much less
-obvious and is strongly tied to the use of L</delete> on arrays.  B<Be aware>
-that calling exists on array values is deprecated and likely to be removed in
-a future version of Perl.
+obvious and is strongly tied to the use of L</delete> on arrays.
+
+B<WARNING:> Calling C<exists> on array values is strongly discouraged.  The
+notion of deleting or checking the existence of Perl array elements is not
+conceptually coherent, and can lead to surprising behavior.
  
      print "Exists\n"    if exists $array[$index];
      print "Defined\n"   if defined $array[$index];
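To make the "surprising behavior" named in the two warnings above concrete, here is a minimal sketch. The array @a and its values are illustrative and are not part of the patch. It relies only on what the delete entry already documents: deleting an interior element leaves a hole without shifting indices, while deleting the last element shrinks the array past any trailing holes.

```perl
use strict;
use warnings;

my @a = (10, 20, 30);

delete $a[1];                                  # interior element: leaves a hole
my $size_after_hole = scalar @a;               # indices do not shift
my $hole_exists     = exists  $a[1] ? 1 : 0;   # the slot no longer "exists"
my $hole_defined    = defined $a[1] ? 1 : 0;   # and reads back as undef

delete $a[2];                                  # last element: array shrinks
my $size_after_tail = scalar @a;               # shrinks past the hole at index 1

print "after hole: size=$size_after_hole exists=$hole_exists defined=$hole_defined\n";
print "after tail: size=$size_after_tail\n";
```

The second delete removes one element but the array loses two slots, which is the kind of result the new wording warns about.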
@@ -2287,6 +2297,12 @@ same underlying descriptor:
              "not have a real file descriptor\n";
      }
  
+The behavior of C<fileno> on a directory handle depends on the operating
+system.  On a system with dirfd(3) or similar, C<fileno> on a directory
+handle returns the underlying file descriptor associated with the
+handle; on systems with no such support, it returns the undefined value,
+and sets C<$!> (errno).
+
  =item flock FILEHANDLE,OPERATION
  X<flock> X<lock> X<locking>
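The fileno paragraph added in this hunk implies that portable code should check for an undefined result instead of assuming a descriptor. A minimal sketch of that check follows; the directory "." and the variable names are illustrative assumptions. On perls that predate this change, or on systems without dirfd(3), the undefined branch is expected to run and $! may not be informative.

```perl
use strict;
use warnings;

# "." is just a directory that is certain to exist; any path would do.
opendir(my $dh, ".") or die "Cannot opendir '.': $!";

my $fd = fileno($dh);
my $status;
if (defined $fd) {
    # dirfd(3) or an equivalent is available: a real descriptor number.
    $status = "descriptor $fd";
}
else {
    # No OS support (or an older perl): undef, with the reason in $!.
    $status = "unsupported: $!";
}
print "fileno on directory handle: $status\n";

closedir($dh);
```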
  
@@ -2706,7 +2722,8 @@ various get routines are as follows:
   $comment,  $gcos,     $dir,       $shell,   $expire ) = getpw*
   # 5        6          7           8         9
  
-(If the entry doesn't exist you get an empty list.)
+(If the entry doesn't exist, the return value is a single meaningless true
+value.)
  
  The exact meaning of the $gcos field varies but it usually contains
  the real name of the user (as opposed to the login name) and other
@@ -3352,7 +3369,7 @@ Respects current C<LC_CTYPE> locale for code points < 256; and uses Unicode
  rules for the remaining code points (this last can only happen if
  the UTF8 flag is also set).  See L<perllocale>.
  
-Starting in v5.20, Perl wil use full Unicode rules if the locale is
+Starting in v5.20, Perl uses full Unicode rules if the locale is
  UTF-8.  Otherwise, there is a deficiency in this scheme, which is that
  case changes that cross the 255/256
  boundary are not well-defined.  For example, the lower case of LATIN CAPITAL
@@ -3362,8 +3379,10 @@ locale), the lower case of U+1E9E is
  itself, because 0xDF may not be LATIN SMALL LETTER SHARP S in the
  current locale, and Perl has no way of knowing if that character even
  exists in the locale, much less what code point it is.  Perl returns
-the input character unchanged, for all instances (and there aren't
-many) where the 255/256 boundary would otherwise be crossed.
+a result that is above 255 (almost always the input character unchanged),
+for all instances (and there aren't many) where the 255/256 boundary
+would otherwise be crossed; and starting in v5.22, it raises a
+L<locale|perldiag/Can't do %s("%s") on non-UTF-8 locale; resolved to "%s".> warning.
  
  =item Otherwise, If EXPR has the UTF8 flag set:
  
@@ -3667,12 +3686,13 @@ C<{>.  Usually it gets it right, but if it
  doesn't it won't realize something is wrong until it gets to the C<}> and
  encounters the missing (or unexpected) comma.  The syntax error will be
  reported close to the C<}>, but you'll need to change something near the C<{>
-such as using a unary C<+> to give Perl some help:
+such as using a unary C<+> or semicolon to give Perl some help:
  
      %hash = map {  "\L$_" => 1  } @array # perl guesses EXPR. wrong
      %hash = map { +"\L$_" => 1  } @array # perl guesses BLOCK. right
-    %hash = map { ("\L$_" => 1) } @array # this also works
-    %hash = map {  lc($_) => 1  } @array # as does this.
+    %hash = map {; "\L$_" => 1  } @array # this also works
+    %hash = map { ("\L$_" => 1) } @array # as does this
+    %hash = map {  lc($_) => 1  } @array # and this.
      %hash = map +( lc($_) => 1 ), @array # this is EXPR and works!
  
      %hash = map  ( lc($_), 1 ),   @array # evaluates to (1, @array)
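The disambiguation forms listed in this hunk can be tried in a small self-contained script. This is a sketch with an illustrative @array; it exercises the newly documented leading-semicolon form and the existing unary-plus EXPR form, and checks that both build the same hash.

```perl
use strict;
use warnings;

my @array = qw(Apple BANANA cherry);   # illustrative input

# A leading semicolon makes the brace unambiguously a BLOCK.
my %by_block = map {; "\L$_" => 1 } @array;

# Unary plus with parentheses makes the argument unambiguously an EXPR;
# note the comma before the list.
my %by_expr = map +( lc($_) => 1 ), @array;

my $block_keys = join ",", sort keys %by_block;
my $expr_keys  = join ",", sort keys %by_expr;
print "$block_keys\n$expr_keys\n";
```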
@@ -4356,7 +4376,9 @@ existing variable: a package variable of the same name.
  
  This means that when C<use strict 'vars'> is in effect, C<our> lets you use
  a package variable without qualifying it with the package name, but only within
-the lexical scope of the C<our> declaration.
+the lexical scope of the C<our>
+declaration.  This applies immediately--even
+within the same statement.
  
      package Foo;
      use strict;
@@ -4382,6 +4404,16 @@ package variables spring into existence when first used.
  
      print $Foo::foo; # prints 23
  
+Because the variable becomes legal immediately under C<use strict 'vars'>, so
+long as no variable with that name is already in scope, you can then
+reference the package variable again even within the same statement.
+
+    package Foo;
+    use strict;
+
+    my  $foo = $foo; # error, undeclared $foo on right-hand side
+    our $foo = $foo; # no errors
+
  If more than one variable is listed, the list must be placed
  in parentheses.
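As a runnable variant of the example added in this hunk, the sketch below gives the package variable a value first, so the effect of the same-statement reference is visible. The initial value 23 and the "+ 1" are illustrative assumptions, not part of the patch.

```perl
package Foo;
use strict;
use warnings;

$Foo::foo = 23;         # fully qualified name: always legal under strict

# The declaration takes effect immediately, so the $foo on the
# right-hand side already refers to $Foo::foo.
our $foo = $foo + 1;

print "$Foo::foo\n";
```

Replacing "our" with "my" on that line would instead fail to compile under strict, as the patch's own example notes.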
  
@@ -4501,7 +4533,8 @@ of values, as follows:
      D  A float of long-double precision in native format.
           (Long doubles are available only if your system supports
            long double values _and_ if Perl has been compiled to
-          support those.  Raises an exception otherwise.)
+          support those.  Raises an exception otherwise.
+          Note that there are different long double formats.)
  
      p  A pointer to a null-terminated string.
      P  A pointer to a structure (fixed-length string).
@@ -4836,6 +4869,8 @@ Some systems may have even weirder byte orders such as
     0x56 0x78 0x12 0x34
     0x34 0x12 0x78 0x56
  
+These are called mid-endian, middle-endian, mixed-endian, or just weird.
+
  You can determine your system endianness with this incantation:
  
     printf("%#02x ", $_) for unpack("W*", pack L=>0x12345678); 
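As a complement to the native-order probe shown in this hunk, the following sketch contrasts the explicit-endianness formats with the native one. The variable names are illustrative. It assumes Perl 5.10 or later for the > and < modifiers.

```perl
use strict;
use warnings;
use Config;

my $value = 0x12345678;

# Explicit-endianness formats give the same bytes on every platform.
my $big    = unpack "H*", pack("N", $value);   # network (big-endian) order
my $little = unpack "H*", pack("V", $value);   # VAX (little-endian) order

# The > and < modifiers force an order on a native format such as L.
my $big_mod    = unpack "H*", pack("L>", $value);
my $little_mod = unpack "H*", pack("L<", $value);

# Native order depends on the machine, so it is only reported here.
my $native = unpack "H*", pack("L", $value);

print "N=$big V=$little L>=$big_mod L<=$little_mod native=$native\n";
print "Config byteorder: $Config{byteorder}\n";
```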
@@ -4851,7 +4886,9 @@ or from the command line:
      $ perl -V:byteorder
  
  Byteorders C<"1234"> and C<"12345678"> are little-endian; C<"4321">
-and C<"87654321"> are big-endian.
+and C<"87654321"> are big-endian.  Systems with multiarchitecture binaries
+will have C<"ffff">, signifying that static information doesn't work;
+one must use runtime probing.
  
  For portably packed integers, either use the formats C<n>, C<N>, C<v>, 
  and C<V> or else use the C<< > >> and C<< < >> modifiers described
@@ -4859,6 +4896,19 @@ immediately below.  See also L<perlport>.
  
  =item *
  
+Floating point numbers also have endianness.  Usually (but not always)
+this agrees with the integer endianness.  Even though most platforms
+these days use the IEEE 754 binary format, there are differences,
+especially if long doubles are involved.  You can see the
+C<Config> variables C<doublekind> and C<longdblkind> (also C<doublesize>,
+C<longdblsize>): the "kind" values are enums, unlike C<byteorder>.
+
+Portability-wise, the best option is probably to keep to IEEE 754
+64-bit doubles of an agreed-upon endianness.  Another possibility
+is the C<"%a"> format of C<printf>.
+
+=item *
+
  Starting with Perl 5.10.0, integer and floating-point formats, along with
  the C<p> and C<P> formats and C<()> groups, may all be followed by the 
  C<< > >> or C<< < >> endianness modifiers to respectively enforce big-
@@ -5010,6 +5060,13 @@ If TEMPLATE requires more arguments than pack() is given, pack()
  assumes additional C<""> arguments.  If TEMPLATE requires fewer arguments
  than given, extra arguments are ignored.
  
+=item *
+
+Attempting to pack the special floating point values C<Inf> and C<NaN>
+(positive or negative infinity, and not-a-number) into packed integer
+values (like C<"L">) is a fatal error.  The reason for this is that there
+simply isn't any sensible mapping for these special values into integers.
+
  =back
  
  Examples:
@@ -5295,11 +5352,14 @@ error prone.
  =item prototype FUNCTION
  X<prototype>
  
+=item prototype
+
  =for Pod::Functions +5.002 get the prototype (if any) of a subroutine
  
  Returns the prototype of a function as a string (or C<undef> if the
  function has no prototype).  FUNCTION is a reference to, or the name of,
-the function whose prototype you want to retrieve.
+the function whose prototype you want to retrieve.  If FUNCTION is omitted,
+$_ is used.
  
  If FUNCTION is a string starting with C<CORE::>, the rest is taken as a
  name for a Perl builtin.  If the builtin's arguments
@@ -6940,7 +7000,8 @@ uses empty string matches as separators to produce the output
  list of its component characters.
  
  As a special case for C<split>, the empty pattern given in
-L<match operator|perlop/"m/PATTERN/msixpodualgc"> syntax (C<//>) specifically matches the empty string, which is contrary to its usual
+L<match operator|perlop/"m/PATTERN/msixpodualngc"> syntax (C<//>)
+specifically matches the empty string, which is contrary to its usual
  interpretation as the last successful match.
  
  If PATTERN is C</^/>, then it is treated as if it used the
@@ -7644,7 +7705,8 @@ list context is currently not possible this would serve no purpose.
  
  C<state> variables are enabled only when the C<use feature "state"> pragma 
  is in effect, unless the keyword is written as C<CORE::state>.
-See also L<feature>.
+See also L<feature>.  Alternatively, include a C<use v5.10> or later in the
+current scope.
  
  =item study SCALAR
  X<study>
@@ -8601,15 +8663,19 @@ This is often useful if you need to check the current Perl version before
  C<use>ing library modules that won't work with older versions of Perl.
  (We try not to do this more than we have to.)
  
-C<use VERSION> also enables all features available in the requested
+C<use VERSION> also lexically enables all features available in the requested
  version as defined by the C<feature> pragma, disabling any features
  not in the requested version's feature bundle.  See L<feature>.
  Similarly, if the specified Perl version is greater than or equal to
  5.12.0, strictures are enabled lexically as
  with C<use strict>.  Any explicit use of
  C<use strict> or C<no strict> overrides C<use VERSION>, even if it comes
-before it.  In both cases, the F<feature.pm> and F<strict.pm> files are
-not actually loaded.
+before it.  Later use of C<use VERSION>
+will override all behavior of a previous
+C<use VERSION>, possibly removing the C<strict> and C<feature> added by
+C<use VERSION>.  C<use VERSION> does not
+load the F<feature.pm> or F<strict.pm>
+files.
  
  The C<BEGIN> forces the C<require> and C<import> to happen at compile time.  The
  C<require> makes sure the module is loaded into memory if it hasn't been
@@ -9044,8 +9110,8 @@ and C<${^CHILD_ERROR_NATIVE}>.
  Note that a return value of C<-1> could mean that child processes are
  being automatically reaped, as described in L<perlipc>.
  
-If you use wait in your handler for $SIG{CHLD} it may accidentally for the
-child created by qx() or system().  See L<perlipc> for details.
+If you use C<wait> in your handler for $SIG{CHLD}, it may accidentally wait
+for the child created by qx() or system().  See L<perlipc> for details.
  
  Portability issues: L<perlport/wait>.
  
@@ -9279,8 +9345,6 @@ This keyword is documented in L<perlsub/"Autoloading">.
  
  =item else
  
-=item elseif
-
  =item elsif
  
  =item for
@@ -9297,6 +9361,15 @@ This keyword is documented in L<perlsub/"Autoloading">.
  
  These flow-control keywords are documented in L<perlsyn/"Compound Statements">.
  
+=item elseif
+
+The "else if" keyword is spelled C<elsif> in Perl.  There's no C<elif>
+or C<else if> either.  It does parse C<elseif>, but only to warn you
+about not using it.
+
+See the documentation for flow-control keywords in L<perlsyn/"Compound
+Statements">.
+
  =back
  
  =over