Merge branch 'blead' of ssh://perl5.git.perl.org/gitroot/perl into blead

diff --git a/pod/perlrebackslash.pod b/pod/perlrebackslash.pod
index d8cfb6a..3d3a76f 100644
--- a/pod/perlrebackslash.pod
+++ b/pod/perlrebackslash.pod
@@ -83,9 +83,10 @@ quoted constructs>.
   \l                Lowercase next character.
   \L                Lowercase till \E.
   \n                (Logical) newline character.
+ \N                Any character but newline.
   \N{}              Named (Unicode) character.
- \p{}, \pP         Character with a Unicode property.
- \P{}, \PP         Character without a Unicode property.
+ \p{}, \pP         Character with the given Unicode property.
+ \P{}, \PP         Character without the given Unicode property.
   \Q                Quotemeta till \E.
   \r                Return character.
   \R                Generic new line.
@@ -99,7 +100,7 @@ quoted constructs>.
   \w                Character class for word characters.
   \W                Character class for non-word characters.
   \x{}, \x00        Hexadecimal escape sequence.
- \X                Extended Unicode "combining character sequence".
+ \X                Unicode "extended grapheme cluster".
   \z                End of string.
   \Z                End of string.
  
@@ -293,7 +294,7 @@ L<perlrecharclass>.
  C<\w> is a character class that matches any I<word> character (letters,
  digits, underscore). C<\d> is a character class that matches any digit,
  while the character class C<\s> matches any white space character.
-New in perl 5.10 are the classes C<\h> and C<\v> which match horizontal
+New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
  and vertical white space characters.
  
  The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
@@ -340,7 +341,7 @@ as well.
  
  =head3 Relative referencing
  
-New in perl 5.10 is different way of referring to capture buffers: C<\g>.
+New in perl 5.10.0 is a different way of referring to capture buffers: C<\g>.
  C<\g> takes a number as argument, with the number in curly braces (the
  braces are optional). If the number (N) does not have a sign, it's a reference
  to the Nth capture group (so C<\g{2}> is equivalent to C<\2> - except that
@@ -369,7 +370,7 @@ Mnemonic: I<g>roup.
  
  =head3 Named referencing
  
-Also new in perl 5.10 is the use of named capture buffers, which can be
+Also new in perl 5.10.0 is the use of named capture buffers, which can be
  referred to by name. This is done with C<\g{name}>, which is a
  backreference to the capture buffer with the name I<name>.
  
@@ -391,7 +392,7 @@ contain a hyphen, so there is no ambiguity.
  
  =head2 Assertions
  
-Assertions are conditions that have to be true -- they don't actually
+Assertions are conditions that have to be true; they don't actually
  match parts of the substring. There are six assertions that are written as
  backslash sequences.
  
@@ -482,7 +483,7 @@ Mnemonic: oI<C>tet.
  
  =item \K
  
-This is new in perl 5.10. Anything that is matched left of C<\K> is
+This is new in perl 5.10.0. Anything that is matched left of C<\K> is
  not included in C<$&> - and will not be replaced if the pattern is
  used in a substitution. This will allow you to write C<s/PAT1 \K PAT2/REPL/x>
  instead of C<s/(PAT1) PAT2/${1}REPL/x> or C<s/(?<=PAT1) PAT2/REPL/x>.
@@ -498,19 +499,22 @@ a newline by Unicode. This includes all characters matched by C<\v>
  the newline used in Windows text files). C<\R> is equivalent to
  C<< (?>\x0D\x0A|\v) >>. Since C<\R> can match more than one character,
  it cannot be put inside a bracketed character class; C</[\R]/> is an error.
-C<\R> is introduced in perl 5.10.
+C<\R> was introduced in perl 5.10.0.
  
-Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>.
+Mnemonic: none really. C<\R> was picked because PCRE already uses C<\R>,
+and more importantly because Unicode recommends such a regular expression
+metacharacter, and suggests C<\R> as the notation.
  
  =item \X
  
-This matches an extended Unicode I<combining character sequence>, and
-is equivalent to C<< (?>\PM\pM*) >>. C<\PM> matches any character that is
-not considered a Unicode mark character, while C<\pM> matches any character
-that is considered a Unicode mark character; so C<\X> matches any non
-mark character followed by zero or more mark characters. Mark characters
-include (but are not restricted to) I<combining characters> and
-I<vowel signs>.
+This matches a Unicode I<extended grapheme cluster>.
+
+C<\X> matches quite well what normal (non-Unicode-programmer) usage
+would consider a single character.  As an example, consider a G with some sort
+of diacritic mark, such as an arrow.  There is no such single character in
+Unicode, but one can be composed using a G followed by a Unicode "COMBINING
+UPWARDS ARROW BELOW", and would be displayed by Unicode-aware software as if it
+were a single character.
  
  Mnemonic: eI<X>tended Unicode character.