Attempting to readdir() something that isn't a dirhandle should cause

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 8f7c8cd..c0a78a4 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -158,13 +158,14 @@ that a metacharacter can be matched by putting a backslash before it:
      "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
      "The interval is [0,1)." =~ /[0,1)./     # is a syntax error!
      "The interval is [0,1)." =~ /\[0,1\)\./  # matches
-    "/usr/bin/perl" =~ /\/usr\/local\/bin\/perl/;  # matches
+    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;  # matches
  
  In the last regexp, the forward slash C<'/'> is also backslashed,
  because it is used to delimit the regexp.  This can lead to LTS
  (leaning toothpick syndrome), however, and it is often more readable
  to change delimiters.
  
+    "/usr/bin/perl" =~ m!/usr/bin/perl!;    # easier to read
  
  The backslash character C<'\'> is a metacharacter itself and needs to
  be backslashed:
@@ -689,10 +690,11 @@ inside goes into the special variables C<$1>, C<$2>, etc.  They can be
  used just as ordinary variables:
  
      # extract hours, minutes, seconds
-    $time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
-    $hours = $1;
-    $minutes = $2;
-    $seconds = $3;
+    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
+       $hours = $1;
+       $minutes = $2;
+       $seconds = $3;
+    }
  
  Now, we know that in scalar context,
  S<C<$time =~ /(\d\d):(\d\d):(\d\d)/> > returns a true or false
@@ -1323,9 +1325,9 @@ If you change C<$pattern> after the first substitution happens, perl
  will ignore it.  If you don't want any substitutions at all, use the
  special delimiter C<m''>:
  
-    $pattern = 'Seuss';
+    @pattern = ('Seuss');
      while (<>) {
-        print if m'$pattern';  # matches '$pattern', not 'Seuss'
+        print if m'@pattern';  # matches literal '@pattern', not 'Seuss'
      }
  
  C<m''> acts like single quotes on a regexp; all other C<m> delimiters
@@ -1403,6 +1405,8 @@ off.  C<\G> allows us to easily do context-sensitive matching:
  
  The combination of C<//g> and C<\G> allows us to process the string a
  bit at a time and use arbitrary Perl logic to decide what to do next.
+Currently, the C<\G> anchor is only fully supported when used to anchor
+to the start of the pattern.
  
  C<\G> is also invaluable in processing fixed length records with
  regexps.  Suppose we have a snippet of coding region DNA, encoded as
@@ -1705,7 +1709,7 @@ it matches I<any> byte 0-255.  So
  The last regexp matches, but is dangerous because the string
  I<character> position is no longer synchronized to the string I<byte>
  position.  This generates the warning 'Malformed UTF-8
-character'.  C<\C> is best used for matching the binary data in strings
+character'.  The C<\C> is best used for matching the binary data in strings
  with binary data intermixed with Unicode characters.
  
  Let us now discuss the rest of the character classes.  Just as with
@@ -1738,7 +1742,7 @@ traditional Unicode classes:
      IsPrint          /^([LMNPS]|Co|Zs)/
      IsPunct          /^P/
      IsSpace          /^Z/ || ($code =~ /^(0009|000A|000B|000C|000D)$/
-    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D)$/
+    IsSpacePerl      /^Z/ || ($code =~ /^(0009|000A|000C|000D|0085|2028|2029)$/
      IsUpper          /^L[ut]/
      IsWord           /^[LMN]/ || $code eq "005F"
      IsXDigit         $code =~ /^00(3[0-9]|[46][1-6])$/
@@ -1750,9 +1754,9 @@ letter, the braces can be dropped.  For instance, C<\pM> is the
  character class of Unicode 'marks', for example accent marks.
  For the full list see L<perlunicode>.
  
-The Unicode has also been separated into various sets of charaters
+The Unicode has also been separated into various sets of characters
  which you can test with C<\p{In...}> (in) and C<\P{In...}> (not in),
-for example C<\p{InLatin}>, C<\p{InGreek}>, or C<\P{InKatakana}>.
+for example C<\p{Latin}>, C<\p{Greek}>, or C<\P{Katakana}>.
  For the full list see L<perlunicode>.
  
  C<\X> is an abbreviation for a character class sequence that includes
@@ -1782,10 +1786,11 @@ C<[:space:]> correspond to the familiar C<\d>, C<\w>, and C<\s>
  character classes.  To negate a POSIX class, put a C<^> in front of
  the name, so that, e.g., C<[:^digit:]> corresponds to C<\D> and under
  C<utf8>, C<\P{IsDigit}>.  The Unicode and POSIX character classes can
-be used just like C<\d>, both inside and outside of character classes:
+be used just like C<\d>, with the exception that POSIX character
+classes can only be used inside of a character class:
  
      /\s+[abc[:digit:]xyz]\s*/;  # match a,b,c,x,y,z, or a digit
-    /^=item\s[:digit:]/;        # match '=item',
+    /^=item\s[[:digit:]]/;      # match '=item',
                                  # followed by a space and a digit
      use charnames ":full";
      /\s+[abc\p{IsDigit}xyz]\s+/;  # match a,b,c,x,y,z, or a digit
@@ -2001,6 +2006,10 @@ They evaluate true if the regexps do I<not> match:
      $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
      $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
  
+The C<\C> is unsupported in lookbehind, because the already
+treacherous definition of C<\C> would become even more so
+when going backwards.
+
  =head2 Using independent subexpressions to prevent backtracking
  
  The last few extended patterns in this tutorial are experimental as of
@@ -2262,7 +2271,7 @@ may surprise you:
      $pat = qr/(?{ $foo = 1 })/;  # precompile code regexp
      /foo${pat}bar/;      # compiles ok
  
-If a regexp has (1) code expressions and interpolating variables,or
+If a regexp has (1) code expressions and interpolating variables, or
  (2) a variable that interpolates a code expression, perl treats the
  regexp as an error. If the code expression is precompiled into a
  variable, however, interpolating is ok. The question is, why is this