multi-character range (not even as one of its endpoints). (L</Character
Ranges> will be explained shortly.) Therefore,
- 'ss' =~ /\A[\0-\x{ff}]\z/i # Doesn't match
- 'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/i # No match
- 'ss' =~ /\A[\xDF-\xDF]\z/i # Matches on ASCII platforms, since
+ 'ss' =~ /\A[\0-\x{ff}]\z/ui # Doesn't match
+ 'ss' =~ /\A[\0-\N{LATIN SMALL LETTER SHARP S}]\z/ui # No match
+ 'ss' =~ /\A[\xDF-\xDF]\z/ui # Matches on ASCII platforms, since
# \XDF is LATIN SMALL LETTER SHARP S,
# and the range is just a single
# element
matches, because C<\N{TAMIL SYLLABLE KAU}> is a named sequence
consisting of the two characters matched against. Like the other
-instance where a bracketed class can match multi characters, and for
+instance where a bracketed class can match multiple characters, and for
similar reasons, the class must not be inverted, and the named sequence
may not appear in a range, even one where it is both endpoints. If
these happen, it is a fatal error if the character class is within an
and
C<\x>
are also special and have the same meanings as they do outside a
-bracketed character class. (However, inside a bracketed character
-class, if C<\N{I<NAME>}> expands to a sequence of characters, only the first
-one in the sequence is used, with a warning.)
+bracketed character class.
Also, a backslash followed by two or three octal digits is considered an octal
number.
that it could be considered part of a range, you must escape that hyphen
with a backslash.
-The classes C<< [A-Z] >> and C<< [a-z] >> are special cased, in the sense
-they always match exactly the 26 upper/lower case letters, regardless
-of the platform (this only effects EBCDIC, which would otherwise include
-some non-letters).
-
Examples:
[a-z] # Matches a character that is a lower case ASCII letter.
# hyphen ('-'), or the letter 'm'.
['-?] # Matches any of the characters '()*+,-./0123456789:;<=>?
# (But not on an EBCDIC platform).
-
+ [\N{APOSTROPHE}-\N{QUESTION MARK}]
+ # Matches any of the characters '()*+,-./0123456789:;<=>?
+ # even on an EBCDIC platform.
+ [\N{U+27}-\N{U+3F}] # Same. (U+27 is "'", and U+3F is "?"
+
+As the final two examples above show, you can achieve portablity to
+non-ASCII platforms by using the C<\N{...}> form for the range
+endpoints. These indicate that the specified range is to be interpreted
+using Unicode values, so C<[\N{U+27}-\N{U+3F}]> means to match
+C<\N{U+27}>, C<\N{U+28}>, C<\N{U+29}>, ..., C<\N{U+3D}>, C<\N{U+3E}>,
+and C<\N{U+3F}>, whatever the native code point versions for those are.
+
+Perl also guarantees that the ranges C<A-Z>, C<a-z>, C<0-9>, and any
+subranges of these match what an English-only speaker would expect them
+to match on any platform. That is, C<[A-Z]> matches the 26 ASCII
+uppercase letters;
+C<[a-z]> matches the 26 lowercase letters; and C<[0-9]> matches the 10
+digits. Subranges, like C<[h-k]>, match correspondingly, in this case
+just the four letters C<"h">, C<"i">, C<"j">, and C<"k">. This is the
+natural behavior on ASCII platforms where the code points (ordinal
+values) for C<"h"> through C<"k"> are consecutive integers (0x68 through
+0x6B). But special handling to achieve this may be needed on platforms
+with a non-ASCII native character set. For example, on EBCDIC
+platforms, the code point for C<"h"> is 0x88, C<"i"> is 0x89, C<"j"> is
+0x91, and C<"k"> is 0x92. Perl specially treats C<[h-k]> to exclude the
+seven code points in the gap: 0x8A through 0x90. This special handling is
+only invoked when the range is a subrange of one of the ASCII uppercase,
+lowercase, and digit ranges, AND each end of the range is expressed
+either as a literal, like C<"A">, or as a named character (C<\N{...}>,
+including the C<\N{U+...> form).
+
+EBCDIC Examples:
+
+ [i-j] # Matches either "i" or "j"
+ [i-\N{LATIN SMALL LETTER J}] # Same
+ [i-\N{U+6A}] # Same
+ [\N{U+69}-\N{U+6A}] # Same
+ [\x{89}-\x{91}] # Matches 0x89 ("i"), 0x8A .. 0x90, 0x91 ("j")
+ [i-\x{91}] # Same
+ [\x{89}-j] # Same
+ [i-J] # Matches, 0x89 ("i") .. 0xC1 ("J"); special
+ # handling doesn't apply because range is mixed
+ # case
=head3 Negation