=item *
Regular expression patterns can be compiled using
-L<qrE<sol>E<sol>|perlop/qrE<sol>STRINGE<sol>msixpodual> with actual
+L<qrE<sol>E<sol>|perlop/qrE<sol>STRINGE<sol>msixpodualn> with actual
matching deferred to later. Again, it is whether or not the compilation
was done within the scope of C<use locale> that determines the match
behavior, not if the matches are done within such a scope or not.
=item *
-The variables L<$!|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
-C<$OS_ERROR>) and L<$^E|perlvar/$EXTENDED_OS_ERROR> (and its synonym
+B<The variables L<$!|perlvar/$ERRNO>> (and its synonyms C<$ERRNO> and
+C<$OS_ERROR>) B<and L<$^E|perlvar/$EXTENDED_OS_ERROR>> (and its synonym
C<$EXTENDED_OS_ERROR>) when used as strings use C<LC_MESSAGES>.
=back
"color" follows "chocolate" in English, what about in traditional Spanish?
The following collations all make sense and you may meet any of them
-if you "use locale".
+if you C<"use locale">.
A B C D E a b c d e
A a B b C c D d E e
dictionary-like ordering that ignores space characters completely and
which folds case.
-Perl only supports single-byte locales for C<LC_COLLATE>. This means
+Perl currently only supports single-byte locales for C<LC_COLLATE>. This means
that a UTF-8 locale likely will just give you machine-native ordering.
Use L<Unicode::Collate> for the full implementation of the Unicode
Collation Algorithm.
Regular expression checks for safe file names or mail addresses using
C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
-characters such as "E<gt>" and "|" are alphanumeric.
+characters such as C<"E<gt>"> and C<"|"> are alphanumeric.
=item *
properly under C<LC_CTYPE>. To see if a character is a particular type
under a locale, Perl uses the functions like C<isalnum()>. Your C
library may not work for UTF-8 locales with those functions, instead
-only working under the newer wide library functions like C<iswalnum()>.
-However, they are treated like single-byte locales, and will have the
-restrictions described below.
+only working under the newer wide library functions like C<iswalnum()>,
+which Perl does not use.
+These multi-byte locales are treated like single-byte locales, and will
+have the restrictions described below. Starting in Perl v5.22, a warning
+message is raised when Perl detects a multi-byte locale that it doesn't
+fully support.
For single-byte locales,
Perl generally takes the tack to use locale rules on code points that can fit
issue occurs with C<\N{...}>. Prior to v5.20, It is therefore a bad
idea to use C<\p{}> or
C<\N{}> under plain C<use locale>--I<unless> you can guarantee that the
-locale will be a ISO8859-1. Use POSIX character classes instead.
+locale will be ISO8859-1. Use POSIX character classes instead.
Another problem with this approach is that operations that cross the
single byte/multiple byte boundary are not well-defined, and so are
points meaning the same character. Thus in a Greek locale, both U+03A7
and U+00D7 are GREEK CAPITAL LETTER CHI.
+Because of all these problems, starting in v5.22, Perl will raise a
+warning if a multi-byte (hence Unicode) code point is used when a
+single-byte locale is in effect. (Although it doesn't check for this if
+doing so would unreasonably slow execution down.)
+
Vendor locales are notoriously buggy, and it is difficult for Perl to test
its locale-handling code because this interacts with code that Perl has no
control over; therefore the locale-handling code in Perl may be buggy as
consistently to regular expression matching except for bracketed
character classes; in v5.14 it was extended to all regex matches; and in
v5.16 to the casing operations such as C<\L> and C<uc()>. For
-collation, in all releases, the system's C<strxfrm()> function is called,
-and whatever it does is what you get.
+collation, in all releases so far, the system's C<strxfrm()> function is
+called, and whatever it does is what you get.
=head1 BUGS