# No Posix equivalent for vertical space
my $Space = $perl->add_match_table('Space',
- Description => '\s including beyond ASCII plus vertical tab',
+ Description => '\s including beyond ASCII and vertical tab',
Initialize => $Blank + $VertSpace,
);
$Space->add_alias('XPosixSpace');
Initialize => $Space & $ASCII,
);
- # Perl's traditional space doesn't include Vertical Tab
+ # Perl's traditional space didn't include Vertical Tab prior to v5.18
my $XPerlSpace = $perl->add_match_table('XPerlSpace',
Description => '\s, including beyond ASCII',
#Initialize => $Space - 0x000B,
=item If the C</a> modifier is in effect ...
-C<\s> matches the 5 characters [\t\n\f\r ]; that is, the horizontal tab,
-the newline, the form feed, the carriage return, and the space. (Note
-that it doesn't match the vertical tab, C<\cK> on ASCII platforms.)
+In all Perl versions, C<\s> matches the 5 characters [\t\n\f\r ]; that
+is, the horizontal tab,
+the newline, the form feed, the carriage return, and the space.
+Starting in Perl v5.18, experimentally, it also matches the vertical tab, C<\cK>.
+See note C<[1]> below for a discussion of this.
=item otherwise ...
=item if locale rules are in effect ...
-C<\s> matches whatever the locale considers to be whitespace. Note that
-this is likely to include the vertical space, unlike non-locale C<\s>
-matching.
+C<\s> matches whatever the locale considers to be whitespace.
=item if Unicode rules are in effect or if on an EBCDIC platform ...
=item otherwise ...
-C<\s> matches [\t\n\f\r ].
+C<\s> matches [\t\n\f\r ] and, starting experimentally in Perl
+v5.18, the vertical tab, C<\cK>.
+(See note C<[1]> below for a discussion of this.)
Note that this list doesn't include the non-breaking space.
=back
the same characters, without regard to other factors, such as the active
locale or whether the source string is in UTF-8 format.
-One might think that C<\s> is equivalent to C<[\h\v]>. This is not true.
-The difference is that the vertical tab (C<"\x0b">) is not matched by
-C<\s>; it is however considered vertical whitespace.
+One might think that C<\s> is equivalent to C<[\h\v]>. This is indeed true
+starting in Perl v5.18, but prior to that, the sole difference was that the
+vertical tab (C<"\cK">) was not matched by C<\s>.
The following table is a complete listing of characters matched by
C<\s>, C<\h> and C<\v> as of Unicode 6.0.
0x0009 CHARACTER TABULATION h s
0x000a LINE FEED (LF) vs
- 0x000b LINE TABULATION v
+ 0x000b LINE TABULATION vs [1]
0x000c FORM FEED (FF) vs
0x000d CARRIAGE RETURN (CR) vs
0x0020 SPACE h s
- 0x0085 NEXT LINE (NEL) vs [1]
- 0x00a0 NO-BREAK SPACE h s [1]
+ 0x0085 NEXT LINE (NEL) vs [2]
+ 0x00a0 NO-BREAK SPACE h s [2]
0x1680 OGHAM SPACE MARK h s
0x180e MONGOLIAN VOWEL SEPARATOR h s
0x2000 EN QUAD h s
=item [1]
+Prior to Perl v5.18, C<\s> did not match the vertical tab. The change
+in v5.18 is considered an experiment, which means it could be backed out
+in v5.20 or v5.22 if experience indicates that it breaks too much
+existing code. If this change adversely affects you, send email to
+C<perlbug@perl.org>; if it affects you positively, email
+C<perlthanks@perl.org>. In the meantime, C<[^\S\cK]> (obscurely)
+matches what C<\s> traditionally did.
+
+=item [2]
+
NEXT LINE and NO-BREAK SPACE may or may not match C<\s> depending
on the rules in effect. See
L<the beginning of this section|/Whitespace>.
lower Any lowercase character ("[a-z]").
print Any printable character, including a space. See Note [4] below.
punct Any graphical character excluding "word" characters. Note [5].
- space Any whitespace character. "\s" plus the vertical tab ("\cK").
+ space Any whitespace character. "\s" including the vertical tab
+ ("\cK").
upper Any uppercase character ("[A-Z]").
word A Perl extension ("[A-Za-z0-9_]"), equivalent to "\w".
xdigit Any hexadecimal digit ("[0-9a-fA-F]").
=item [6]
-C<\p{SpacePerl}> and C<\p{Space}> differ only in that in non-locale
-matching, C<\p{Space}> additionally
-matches the vertical tab, C<\cK>. Same for the two ASCII-only range forms.
+C<\p{SpacePerl}> and C<\p{Space}> match identically starting with Perl
+v5.18. In earlier versions, these differed only in that in non-locale
+matching, C<\p{SpacePerl}> did not match the vertical tab, C<\cK>.
+Same for the two ASCII-only range forms.
=back