Revert "perlinterp: Use 'e.g' not 'i.e.' for 'for example'"

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index c5d8891..734ca5c 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -459,6 +459,11 @@ character C<\w\W> or C<\W\w>:
  Note in the last example, the end of the string is considered a word
  boundary.
  
+For natural language processing (so that, for example, apostrophes are
+included in words), use instead C<\b{wb}>
+
+    "don't" =~ / .+? \b{wb} /x;  # matches the whole string
+
  You might wonder why C<'.'> matches everything but C<"\n"> - why not
  every character? The reason is that often one is matching against
  lines and would like to ignore the newline characters.  For instance,
@@ -1981,14 +1986,18 @@ also listed there.  Some synonyms are a single character.  For these,
  you can drop the braces.  For instance, C<\pM> is the same thing as
  C<\p{Mark}>, meaning things like accent marks.
  
-The Unicode C<\p{Script}> property is used to categorize every Unicode
-character into the language script it is written in.  For example,
+The Unicode C<\p{Script}> and C<\p{Script_Extensions}> properties are
+used to categorize every Unicode character into the language script it
+is written in.  (C<Script_Extensions> is an improved version of
+C<Script>, which is retained for backward compatibility, and so you
+should generally use C<Script_Extensions>.)
+For example,
  English, French, and a bunch of other European languages are written in
  the Latin script.  But there is also the Greek script, the Thai script,
  the Katakana script, etc.  You can test whether a character is in a
-particular script with, for example C<\p{Latin}>, C<\p{Greek}>,
-or C<\p{Katakana}>.  To test if it isn't in the Balinese script, you
-would use C<\P{Balinese}>.
+particular script (based on C<Script_Extensions>) with, for example
+C<\p{Latin}>, C<\p{Greek}>, or C<\p{Katakana}>.  To test if it isn't in
+the Balinese script, you would use C<\P{Balinese}>.
  
  What we have described so far is the single form of the C<\p{...}> character
  classes.  There is also a compound form which you may run into.  These
@@ -1996,8 +2005,8 @@ look like C<\p{name=value}> or C<\p{name:value}> (the equals sign and colon
  can be used interchangeably).  These are more general than the single form,
  and in fact most of the single forms are just Perl-defined shortcuts for common
  compound forms.  For example, the script examples in the previous paragraph
-could be written equivalently as C<\p{Script=Latin}>, C<\p{Script:Greek}>,
-C<\p{script=katakana}>, and C<\P{script=balinese}> (case is irrelevant
+could be written equivalently as C<\p{Script_Extensions=Latin}>, C<\p{Script_Extensions:Greek}>,
+C<\p{script_extensions=katakana}>, and C<\P{script_extensions=balinese}> (case is irrelevant
  between the C<{}> braces).  You may
  never have to use the compound forms, but sometimes it is necessary, and their
  use can make your code easier to understand.
@@ -2236,7 +2245,7 @@ a little background.
  
  In Perl regular expressions, most regexp elements 'eat up' a certain
  amount of string when they match.  For instance, the regexp element
-C<[abc}]> eats up one character of the string when it matches, in the
+C<[abc]> eats up one character of the string when it matches, in the
  sense that Perl moves to the next character position in the string
  after the match.  There are some elements, however, that don't eat up
  characters (advance the character position) if they match.  The examples
@@ -2290,10 +2299,6 @@ They evaluate true if the regexps do I<not> match:
      $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
      $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
  
-The C<\C> is unsupported in lookbehind, because the already
-treacherous definition of C<\C> would become even more so
-when going backwards.
-
  Here is an example where a string containing blank-separated words,
  numbers and single dashes is to be split into its components.
  Using C</\s+/> alone won't work, because spaces are not required between
@@ -2547,7 +2552,7 @@ running it hits the print statement before it discovers that we don't
  have a match.
  
  To take a closer look at how the engine does optimizations, see the
-section L<"Pragmas and debugging"> below.
+section L</"Pragmas and debugging"> below.
  
  More fun with C<?{}>: