Attemting to readdir() something that isn't a dirhandle should cause

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index f4e9bb6..c0a78a4 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -158,13 +158,14 @@ that a metacharacter can be matched by putting a backslash before it:
      "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
      "The interval is [0,1)." =~ /[0,1)./     # is a syntax error!
      "The interval is [0,1)." =~ /\[0,1\)\./  # matches
-    "/usr/bin/perl" =~ /\/usr\/local\/bin\/perl/;  # matches
+    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;  # matches
  
  In the last regexp, the forward slash C<'/'> is also backslashed,
  because it is used to delimit the regexp.  This can lead to LTS
  (leaning toothpick syndrome), however, and it is often more readable
  to change delimiters.
  
+    "/usr/bin/perl" =~ m!/usr/bin/perl!;    # easier to read
  
  The backslash character C<'\'> is a metacharacter itself and needs to
  be backslashed:
@@ -689,10 +690,11 @@ inside goes into the special variables C<$1>, C<$2>, etc.  They can be
  used just as ordinary variables:
  
      # extract hours, minutes, seconds
-    $time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
-    $hours = $1;
-    $minutes = $2;
-    $seconds = $3;
+    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
+        $hours = $1;
+        $minutes = $2;
+        $seconds = $3;
+    }
  
  Now, we know that in scalar context,
  S<C<$time =~ /(\d\d):(\d\d):(\d\d)/> > returns a true or false
@@ -1323,9 +1325,9 @@ If you change C<$pattern> after the first substitution happens, perl
  will ignore it.  If you don't want any substitutions at all, use the
  special delimiter C<m''>:
  
-    $pattern = 'Seuss';
+    @pattern = ('Seuss');
      while (<>) {
-        print if m'$pattern';  # matches '$pattern', not 'Seuss'
+        print if m'@pattern';  # matches literal '@pattern', not 'Seuss'
      }
  
  C<m''> acts like single quotes on a regexp; all other C<m> delimiters
@@ -1403,6 +1405,8 @@ off.  C<\G> allows us to easily do context-sensitive matching:
  
  The combination of C<//g> and C<\G> allows us to process the string a
  bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor is only fully supported when it appears at
+the start of the pattern.
  
  C<\G> is also invaluable in processing fixed length records with
  regexps.  Suppose we have a snippet of coding region DNA, encoded as
@@ -1653,12 +1657,11 @@ Unicode characters in the range of 128-255 use two hexadecimal digits
  with braces: C<\x{ab}>.  Note that this is different than C<\xab>,
  which is just a hexadecimal byte with no Unicode significance.
  
-B<NOTE>: in perl 5.6.0 it used to be that one needed to say C<use utf8>
-to use any Unicode features.  This is no more the case: for almost all
-Unicode processing, the explicit C<utf8> pragma is not needed.
-(The only case where it matters is if your Perl script is in Unicode,
-that is, encoded in UTF-8/UTF-16/UTF-EBCDIC: then an explicit C<use utf8>
-is needed.)
+B<NOTE>: in Perl 5.6.0 it used to be that one needed to say C<use
+utf8> to use any Unicode features.  This is no longer the case: for
+almost all Unicode processing, the explicit C<utf8> pragma is not
+needed.  (The only case where it matters is if your Perl script itself
+is written in Unicode and encoded in UTF-8; then C<use utf8> is needed.)
  
  Figuring out the hexadecimal sequence of a Unicode character you want
  or deciphering someone else's hexadecimal Unicode regexp is about as
@@ -1706,7 +1709,7 @@ it matches I<any> byte 0-255.  So
  The last regexp matches, but is dangerous because the string
  I<character> position is no longer synchronized to the string I<byte>
  position.  This generates the warning 'Malformed UTF-8
-character'.  C<\C> is best used for matching the binary data in strings
+character'.  The C<\C> is best used for matching binary data in strings
  with binary data intermixed with Unicode characters.
  
  Let us now discuss the rest of the character classes.  Just as with
@@ -1739,7 +1742,7 @@ traditional Unicode classes:
      IsPrint          /^([LMNPS]|Co|Zs)/
      IsPunct          /^P/
      IsSpace          /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/
-    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D)$/
+    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D|0085|2028|2029)$/
      IsUpper          /^L[ut]/
      IsWord           /^[LMN]/ || $code eq "005F"
      IsXDigit         $code =~ /^00(3[0-9]|[46][1-6])$/
@@ -1751,9 +1754,9 @@ letter, the braces can be dropped.  For instance, C<\pM> is the
  character class of Unicode 'marks', for example accent marks.
  For the full list see L<perlunicode>.
  
-The Unicode has also been separated into various sets of charaters
+Unicode has also been separated into various sets of characters
  which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
  For the full list see L<perlunicode>.
  
  C<\X> is an abbreviation for a character class sequence that includes
@@ -1783,10 +1786,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
  character classes.  To negate a POSIX class, put a C<^> in front of
  the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
  C<utf8>, C<\P{IsDigit}>.  The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, except that the POSIX character classes are
+only valid inside a character class:
  
      /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
-    /^=item\s[:digit:]/;        # match '=item',
+    /^=item\s[[:digit:]]/;      # match '=item',
                                  # followed by a space and a digit
      use charnames ":full";
      /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit
@@ -2002,6 +2006,10 @@ They evaluate true if the regexps do I<not> match:
      $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
      $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
  
+The C<\C> is unsupported in lookbehind, because the already
+treacherous definition of C<\C> would become even more so
+when going backwards.
+
  =head2 Using independent subexpressions to prevent backtracking
  
  The last few extended patterns in this tutorial are experimental as of
@@ -2060,7 +2068,7 @@ the first alternative C<[^()]+> matching a substring with no
  parentheses and the second alternative C<\([^()]*\)>  matching a
  substring delimited by parentheses.  The problem with this regexp is
  that it is pathological: it has nested indeterminate quantifiers
- of the form C<(a+|b)+>.  We discussed in Part 1 how nested quantifiers
+of the form C<(a+|b)+>.  We discussed in Part 1 how nested quantifiers
  like this could take an exponentially long time to execute if there
  was no match possible.  To prevent the exponential blowup, we need to
  prevent useless backtracking at some point.  This can be done by
@@ -2263,7 +2271,7 @@ may surprise you:
      $pat = qr/(?{ $foo = 1 })/;  # precompile code regexp
      /foo${pat}bar/;      # compiles ok
  
-If a regexp has (1) code expressions and interpolating variables,or
+If a regexp has (1) code expressions and interpolating variables, or
  (2) a variable that interpolates a code expression, perl treats the
  regexp as an error. If the code expression is precompiled into a
  variable, however, interpolating is ok. The question is, why is this