- alpha Any alphabetical character.
- alnum Any alphanumerical character.
- ascii Any ASCII character.
- blank A GNU extension, equal to a space or a horizontal tab (C<\t>).
- cntrl Any control character.
- digit Any digit, equivalent to C<\d>.
- graph Any printable character, excluding a space.
- lower Any lowercase character.
- print Any printable character, including a space.
- punct Any punctuation character.
- space Any white space character. C<\s> plus the vertical tab (C<\cK>).
- upper Any uppercase character.
- word Any "word" character, equivalent to C<\w>.
- xdigit Any hexadecimal digit, '0' - '9', 'a' - 'f', 'A' - 'F'.
-
-The exact set of characters matched depends on whether the source string
-is internally in UTF-8 format or not. See L</Locale, Unicode and UTF-8>.
-
-Most POSIX character classes have C<\p> counterparts. The difference
-is that the C<\p> classes will always match according to the Unicode
-properties, regardless whether the string is in UTF-8 format or not.
-
-The following table shows the relation between POSIX character classes
-and the Unicode properties:
-
- [[:...:]]   \p{...}        backslash
-
- alpha       IsAlpha
- alnum       IsAlnum
- ascii       IsASCII
- blank
- cntrl       IsCntrl
- digit       IsDigit        \d
- graph       IsGraph
- lower       IsLower
- print       IsPrint
- punct       IsPunct
- space       IsSpace
-             IsSpacePerl    \s
- upper       IsUpper
- word        IsWord
- xdigit      IsXDigit
-
-Some character classes may have a non-obvious name:
+ alpha Any alphabetical character ("[A-Za-z]").
+ alnum Any alphanumeric character ("[A-Za-z0-9]").
+ ascii Any character in the ASCII character set.
+ blank A GNU extension, equal to a space or a horizontal tab ("\t").
+ cntrl Any control character. See Note [2] below.
+ digit Any decimal digit ("[0-9]"), equivalent to "\d".
+ graph Any printable character, excluding a space. See Note [3] below.
+ lower Any lowercase character ("[a-z]").
+ print Any printable character, including a space. See Note [4] below.
+ punct Any graphical character excluding "word" characters. See Note [5] below.
+ space Any whitespace character. "\s" plus the vertical tab ("\cK").
+ upper Any uppercase character ("[A-Z]").
+ word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
+ xdigit Any hexadecimal digit ("[0-9a-fA-F]").
+
+Most POSIX character classes have two Unicode-style C<\p> property
+counterparts. (They are not official Unicode properties, but Perl extensions
+derived from official Unicode properties.) The table below shows the relation
+between POSIX character classes and these counterparts.
+
+One counterpart, in the column labelled "ASCII-range Unicode" in
+the table, matches only characters in the ASCII character set.
+
+The other counterpart, in the column labelled "Full-range Unicode", matches any
+appropriate characters in the full Unicode character set. For example,
+C<\p{Alpha}> matches not just the ASCII alphabetic characters, but any
+character in the entire Unicode character set considered alphabetic.
+The column labelled "backslash sequence" gives a (short) synonym for
+the Full-range Unicode form, where one exists.
+
+(Each of the counterparts has various synonyms as well.
+L<perluniprops/Properties accessible through \p{} and \P{}> lists all
+synonyms, plus all characters matched by each ASCII-range property.
+For example, C<\p{AHex}> is a synonym for C<\p{ASCII_Hex_Digit}>,
+and any C<\p> property name can be prefixed with "Is", as in C<\p{IsAlpha}>.)
+
+Both C<\p> forms are unaffected by any locale in effect, by whether
+the string is in UTF-8 format, and by whether the platform is EBCDIC.
+In contrast, the POSIX character classes are affected by all three, unless
+the regular expression is compiled with the C</a> modifier. If the C</a>
+modifier is not in effect and the source string is in UTF-8 format, the
+POSIX classes behave like their "Full-range" Unicode counterparts. If the
+C</a> modifier is in effect, or if the source string is not in UTF-8
+format, no locale is in effect, and the platform is not EBCDIC, the
+POSIX classes behave like their ASCII-range counterparts.
+Otherwise, they behave according to the rules of the locale or EBCDIC code
+page.
+
+It is proposed to change this behavior in a future release of Perl so that
+the UTF-8-ness of the source string will be irrelevant to the behavior of the
+POSIX character classes. This means they will always behave in strict
+accordance with the official POSIX standard. That is, if either a locale or
+an EBCDIC code page is present, they will behave in accordance with it; if
+neither is present, the classes will behave like their ASCII-range
+counterparts. If you wish to comment on this proposal, send email to
+C<perl5-porters@perl.org>.
+
+ [[:...:]]  ASCII-range          Full-range         backslash  Note
+            Unicode              Unicode            sequence
+ ------------------------------------------------------------------
+ alpha      \p{PosixAlpha}       \p{XPosixAlpha}
+ alnum      \p{PosixAlnum}       \p{XPosixAlnum}
+ ascii      \p{ASCII}
+ blank      \p{PosixBlank}       \p{XPosixBlank}    \h         [1]
+                                 or \p{HorizSpace}             [1]
+ cntrl      \p{PosixCntrl}       \p{XPosixCntrl}               [2]
+ digit      \p{PosixDigit}       \p{XPosixDigit}    \d
+ graph      \p{PosixGraph}       \p{XPosixGraph}               [3]
+ lower      \p{PosixLower}       \p{XPosixLower}
+ print      \p{PosixPrint}       \p{XPosixPrint}               [4]
+ punct      \p{PosixPunct}       \p{XPosixPunct}               [5]
+            \p{PerlSpace}        \p{XPerlSpace}     \s         [6]
+ space      \p{PosixSpace}       \p{XPosixSpace}               [6]
+ upper      \p{PosixUpper}       \p{XPosixUpper}
+ word       \p{PosixWord}        \p{XPosixWord}     \w
+ xdigit     \p{ASCII_Hex_Digit}  \p{XPosixXDigit}