-X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
-X<\g> X<\k> X<\N> X<\K> X<\v> X<\V> X<\h> X<\H>
-X<word> X<whitespace> X<character class> X<backreference>
-
- \w Match a "word" character (alphanumeric plus "_")
- \W Match a non-"word" character
- \s Match a whitespace character
- \S Match a non-whitespace character
- \d Match a digit character
- \D Match a non-digit character
- \pP Match P, named property. Use \p{Prop} for longer names.
- \PP Match non-P
- \X Match eXtended Unicode "combining character sequence",
- equivalent to (?>\PM\pM*)
- \C Match a single C char (octet) even under Unicode.
- NOTE: breaks up characters into their UTF-8 bytes,
- so you may end up with malformed pieces of UTF-8.
- Unsupported in lookbehind.
- \1 Backreference to a specific group.
- '1' may actually be any positive integer.
- \g1 Backreference to a specific or previous group,
- \g{-1} number may be negative indicating a previous buffer and may
- optionally be wrapped in curly brackets for safer parsing.
- \g{name} Named backreference
- \k<name> Named backreference
- \K Keep the stuff left of the \K, don't include it in $&
- \N Any character but \n
- \v Vertical whitespace
- \V Not vertical whitespace
- \h Horizontal whitespace
- \H Not horizontal whitespace
- \R Linebreak
-
-A C<\w> matches a single alphanumeric character (an alphabetic
-character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
-to match a string of Perl-identifier characters (which isn't the same
-as matching an English word). If C<use locale> is in effect, the list
-of alphabetic characters generated by C<\w> is taken from the current
-locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
-C<\d>, and C<\D> within character classes, but they aren't usable
-as either end of a range. If any of them precedes or follows a "-",
-the "-" is understood literally. If Unicode is in effect, C<\s> matches
-also "\x{85}", "\x{2028}", and "\x{2029}". See L<perlunicode> for more
-details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
-your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
-in general.
-X<\w> X<\W> X<word>
-
-C<\R> will atomically match a linebreak, including the network line-ending
-"\x0D\x0A". Specifically, X<\R> is exactly equivalent to
-
- (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
-
-B<Note:> C<\R> has no special meaning inside of a character class;
-use C<\v> instead (vertical whitespace).
-X<\R>
-
-The POSIX character class syntax
-X<character class>
-
- [:class:]
-
-is also available. Note that the C<[> and C<]> brackets are I<literal>;
-they must always be used within a character class expression.
-
- # this is correct:
- $string =~ /[[:alpha:]]/;
-
- # this is not, and will generate a warning:
- $string =~ /[:alpha:]/;
-
-The following table shows the mapping of POSIX character class
-names, common escapes, literal escape sequences and their equivalent
-Unicode style property names.
-X<character class> X<\p> X<\p{}>
-X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
-X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
-
-B<Note:> up to Perl 5.10 the property names used were shared with
-standard Unicode properties, this was changed in Perl 5.11, see
-L<perl5110delta> for details.
-
- POSIX Esc Class Property Note
- --------------------------------------------------------
- alnum [0-9A-Za-z] IsPosixAlnum
- alpha [A-Za-z] IsPosixAlpha
- ascii [\000-\177] IsASCII
- blank [\011 ] IsPosixBlank [1]
- cntrl [\0-\37\177] IsPosixCntrl
- digit \d [0-9] IsPosixDigit
- graph [!-~] IsPosixGraph
- lower [a-z] IsPosixLower
- print [ -~] IsPosixPrint
- punct [!-/:-@[-`{-~] IsPosixPunct
- space [\11-\15 ] IsPosixSpace [2]
- \s [\11\12\14\15 ] IsPerlSpace [2]
- upper [A-Z] IsPosixUpper
- word \w [0-9A-Z_a-z] IsPerlWord [3]
- xdigit [0-9A-Fa-f] IsXDigit
-
-=over
+X<\g> X<\k> X<\K> X<backreference>
+
+ Sequence   Note  Description
+ [...]      [1]   Match a character according to the rules of the bracketed
+                  character class defined by the "...". Example: [a-z]
+                  matches "a" or "b" or "c" ... or "z"
+ [[:...:]]  [2]   Match a character according to the rules of the POSIX
+                  character class "..." within the outer bracketed character
+                  class. Example: [[:upper:]] matches any uppercase
+                  character.
+ \w         [3]   Match a "word" character (alphanumeric plus "_")
+ \W         [3]   Match a non-"word" character
+ \s         [3]   Match a whitespace character
+ \S         [3]   Match a non-whitespace character
+ \d         [3]   Match a decimal digit character
+ \D         [3]   Match a non-digit character
+ \pP        [3]   Match P, named property. Use \p{Prop} for longer names.
+ \PP        [3]   Match non-P
+ \X         [4]   Match Unicode "eXtended grapheme cluster"
+ \C               Match a single C-language char (octet) even if that is part
+                  of a larger UTF-8 character. Thus it breaks up characters
+                  into their UTF-8 bytes, so you may end up with malformed
+                  pieces of UTF-8. Unsupported in lookbehind.
+ \1         [5]   Backreference to a specific capture buffer or group.
+                  '1' may actually be any positive integer.
+ \g1        [5]   Backreference to a specific or previous group,
+ \g{-1}     [5]   The number may be negative indicating a relative previous
+                  buffer and may optionally be wrapped in curly brackets for
+                  safer parsing.
+ \g{name}   [5]   Named backreference
+ \k<name>   [5]   Named backreference
+ \K         [6]   Keep the stuff left of the \K, don't include it in $&
+ \N         [7]   Any character but \n (experimental). Not affected by /s
+                  modifier
+ \v         [3]   Vertical whitespace
+ \V         [3]   Not vertical whitespace
+ \h         [3]   Horizontal whitespace
+ \H         [3]   Not horizontal whitespace
+ \R         [4]   Linebreak
+
+=over 4