\L Lowercase till \E. Not in [].
\n (Logical) newline character.
\N Any character but newline. Experimental. Not in [].
- \N{} Named or numbered (Unicode) character.
+ \N{} Named or numbered (Unicode) character or sequence.
\o{} Octal escape sequence.
\p{}, \pP Character with the given Unicode property.
\P{}, \PP Character without the given Unicode property.
$str =~ /\cK/; # Matches if $str contains a vertical tab (control-K).
-=head3 Named or numbered characters
+=head3 Named or numbered characters and character sequences
Unicode characters have a Unicode name and numeric ordinal value. Use the
C<\N{}> construct to specify a character by either of these values.
+Certain sequences of characters also have names.
-To specify by name, the name of the character goes between the curly braces.
-In this case, you have to C<use charnames> to load the Unicode names of the
-characters, otherwise Perl will complain.
+To specify by name, put the name of the character or character sequence
+between the curly braces. In this case, you have to C<use charnames> to
+load the Unicode names of the characters; otherwise Perl will complain.
To specify a character by Unicode code point, use the form
C<\N{U+I<wide hex character>}>, where I<wide hex character> is a number in
leading zeros. C<\N{U+0041}> means "A" even on EBCDIC machines (where the
ordinal value of "A" is not 0x41).
-It is even possible to give your own names to characters, and even to short
-sequences of characters. For details, see L<charnames>.
+It is even possible to give your own names to characters and character
+sequences. For details, see L<charnames>.
(There is an expanded internal form that you may see in debug output:
C<\N{U+I<wide hex character>.I<wide hex character>...}>.
Mnemonic: I<N>amed character.
-Note that a character that is expressed as a named or numbered character is
-considered as a character without special meaning by the regex engine, and will
-match "as is".
+Note that a character or character sequence expressed as a named or
+numbered character is treated by the regex engine as having no special
+meaning, and will match "as is".
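+
+As a rough illustration of the points above (a sketch only; the sample
+strings and variable names are made up for this example), the following
+matches the same character by name and by code point, and shows that a
+named character is taken literally:
+
+ use charnames ':full';
+ my $str = "smile: \x{263A}";
+ my $by_name   = $str =~ /\N{WHITE SMILING FACE}/;  # true
+ my $by_number = $str =~ /\N{U+263A}/;              # true
+ # \N{FULL STOP} is a literal dot here, not the "any character"
+ # metacharacter, so it does not match the "x".
+ my $literal   = "axb" =~ /a\N{FULL STOP}b/;        # false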
=head4 Example
discuss those here; full details of character classes can be found in
L<perlrecharclass>.
-C<\w> is a character class that matches any single I<word> character (letters,
-digits, underscore). C<\d> is a character class that matches any decimal digit,
-while the character class C<\s> matches any whitespace character.
+C<\w> is a character class that matches any single I<word> character
+(letters, digits, Unicode marks, and connector punctuation such as the
+underscore). C<\d> is a character class that matches any decimal
+digit, while the character class C<\s> matches any whitespace character.
New in perl 5.10.0 are the classes C<\h> and C<\v> which match horizontal
and vertical whitespace characters.
The uppercase variants (C<\W>, C<\D>, C<\S>, C<\H>, and C<\V>) are
-character classes that match any character that isn't a word character,
-digit, whitespace, horizontal whitespace nor vertical whitespace.
+character classes that match, respectively, any character that isn't a
+word character, digit, whitespace, horizontal whitespace, or vertical
+whitespace.
Mnemonics: I<w>ord, I<d>igit, I<s>pace, I<h>orizontal, I<v>ertical.
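+
+A small sketch of these classes in use (the sample string and variable
+names are assumptions made for illustration; C<\h> and C<\v> need perl
+5.10.0 or later):
+
+ my $line   = "foo_1\t42";
+ my @words  = $line =~ /(\w+)/g;     # ("foo_1", "42")
+ my @digits = $line =~ /(\d+)/g;     # ("1", "42")
+ my $h      = "\t" =~ /\A\h\z/;      # true: tab is horizontal whitespace
+ my $v      = "\n" =~ /\A\v\z/;      # true: newline is vertical whitespace
+ my $W      = "_"  =~ /\W/;          # false: underscore is a word character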
=item \A
C<\A> only matches at the beginning of the string. If the C</m> modifier
-isn't used, then C</\A/> is equivalent with C</^/>. However, if the C</m>
+isn't used, then C</\A/> is equivalent to C</^/>. However, if the C</m>
modifier is used, then C</^/> matches internal newlines, but the meaning
of C</\A/> isn't changed by the C</m> modifier. C<\A> matches at the beginning
of the string regardless whether the C</m> modifier is used.
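+
+The difference under C</m> can be sketched as follows (the sample text and
+variable names are hypothetical):
+
+ my $text   = "one\ntwo\n";
+ my @caret  = $text =~ /^(\w+)/mg;   # ("one", "two")
+ my @anchor = $text =~ /\A(\w+)/mg;  # ("one")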
=item \z, \Z
C<\z> and C<\Z> match at the end of the string. If the C</m> modifier isn't
-used, then C</\Z/> is equivalent with C</$/>, that is, it matches at the
+used, then C</\Z/> is equivalent to C</$/>, that is, it matches at the
end of the string, or before the newline at the end of the string. If the
C</m> modifier is used, then C</$/> matches at internal newlines, but the
meaning of C</\Z/> isn't changed by the C</m> modifier. C<\Z> matches at
the meaning of C<.>, but not C<\N>.
Note that C<\N{...}> can mean a
-L<named or numbered character|/Named or numbered characters>.
+L<named or numbered character|/Named or numbered characters and character sequences>.
Mnemonic: Complement of I<\n>.
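+
+A brief sketch of how C<\N> differs from C<.> when C</s> is in effect (the
+sample string and variable names are made up for illustration, and this
+assumes a perl recent enough to support C<\N> outside of C<\N{...}>):
+
+ my $str    = "a\nb";
+ my $dot_s  = $str =~ /a.b/s;    # true: under /s, "." matches the newline
+ my $bign_s = $str =~ /a\Nb/s;   # false: \N never matches a newline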