+Unicode support was introduced in Perl v5.6 and was more fully
+implemented in v5.8 and later. See L<perluniintro>. When combining
+Unicode and locale (starting in v5.16), it is strongly recommended that
+you use
+
+ use locale ':not_characters';
+
+When this form of the pragma is used, only the non-character portions of
+locales are used by Perl, for example C<LC_NUMERIC>. Perl assumes that
+you have translated all the characters it is to operate on into Unicode
+(actually the platform's native character set (ASCII or EBCDIC) plus
+Unicode). For data in files, this can conveniently be done by also
+specifying
+
+ use open ':locale';
+
+This pragma arranges for all inputs from files to be translated into
+Unicode from the current locale as specified in the environment (see
+L</ENVIRONMENT>), and all outputs to files to be translated back
+into the locale. (See L<open>.) On a per-filehandle basis, you can
+instead use the L<PerlIO::locale> module, or the L<Encode::Locale>
+module, both available from CPAN. The latter module also has methods to
+ease the handling of C<ARGV> and environment variables, and can be used
+on individual strings. Also, if you know that all your locales will be
+UTF-8, as many are these days, you can use the L<B<-C>|perlrun/-C>
+command line switch.
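+
+As a minimal sketch of the combination described above, a script might
+begin as follows. The file name F<input.txt> and the use of the C<USER>
+environment variable are hypothetical, and L<Encode::Locale> is assumed
+to be installed from CPAN:
+
+ use strict;
+ use warnings;
+ use locale ':not_characters';  # use only the non-character categories
+ use open ':locale';            # file I/O follows the locale's encoding
+ use Encode qw(decode);
+ use Encode::Locale;            # CPAN; registers the "locale" encoding
+
+ # Decode the command-line arguments in place.
+ Encode::Locale::decode_argv();
+
+ # Hypothetical input file, assumed to be in the locale's encoding.
+ open my $in, '<', 'input.txt' or die "Cannot open input.txt: $!";
+ while (my $line = <$in>) {
+     chomp $line;
+     # The line has been decoded, so length() counts characters.
+     printf "%d\n", length $line;
+ }
+ close $in;
+
+ # An individual string can be decoded through the same alias.
+ my $user = decode(locale => $ENV{USER} // '');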
+
+This form of the pragma allows essentially seamless handling of locales
+with Unicode. The collation order will be Unicode's. When you need to
+order and sort strings, it is strongly recommended that you use the
+standard module L<Unicode::Collate>, which gives much better results
+in many instances than you can get with the old-style locale handling.
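+
+For instance, a minimal sketch of sorting with L<Unicode::Collate>,
+using made-up sample data and the module's default collation, could
+look like this:
+
+ use strict;
+ use warnings;
+ use Unicode::Collate;
+
+ # Sample data; "\x{E9}" is LATIN SMALL LETTER E WITH ACUTE.
+ my @words = ("zebra", "\x{E9}clair", "apple", "Zulu");
+
+ my $collator = Unicode::Collate->new;
+
+ # Sorted according to the Unicode Collation Algorithm.
+ my @by_uca = $collator->sort(@words);
+
+ # For comparison, the built-in sort orders by code point.
+ my @by_code_point = sort @words;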
+
+For pre-v5.16 Perls, or if you use the locale pragma without the
+C<:not_characters> parameter, Perl tries to work with both Unicode and
+locales--but there are problems.
+
+In this case, Perl does not handle multi-byte locales such as Big5 or
+Shift JIS, which have been used for various Asian languages. However,
+the increasingly common multi-byte UTF-8 locales, if properly
+implemented, may work reasonably well (depending on your C library
+implementation) in this form of the locale pragma, simply because both
+they and Perl store characters that take up multiple bytes the same way.
+However, some, if not most, C library implementations may not process
+the characters in the upper half of the Latin-1 range (128-255)
+properly under C<LC_CTYPE>. To see if a character is a particular type
+under a locale, Perl uses functions like C<isalnum()>. Your C
+library may not work for UTF-8 locales with those functions, instead
+only working under the newer wide library functions like C<iswalnum()>.
+
+Perl generally takes the tack of using locale rules on code points that
+can fit in a single byte, and Unicode rules for those that can't (though
+this isn't uniformly applied; see the note at the end of this section). This
+prevents many problems in locales that aren't UTF-8. Suppose the locale
+is ISO8859-7, Greek. The character at 0xD7 there is a capital Chi. But
+in the ISO8859-1 locale, Latin1, it is a multiplication sign. The POSIX
+regular expression character class C<[[:alpha:]]> will magically match
+0xD7 in the Greek locale but not in the Latin one.
+
+However, there are places where this breaks down. Certain constructs are
+for Unicode only, such as C<\p{Alpha}>. They assume that 0xD7 always has its
+Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a
+subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
+Unicode, C<\p{Alpha}> will never match it, regardless of locale. A similar
+issue occurs with C<\N{...}>. It is therefore a bad idea to use C<\p{}> or
+C<\N{}> under plain C<use locale>--I<unless> you can guarantee that the
+locale will be ISO8859-1. Use POSIX character classes instead.
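+
+The difference between the two kinds of construct can be probed with a
+sketch like the following. The locale names are assumptions; they vary
+by platform and may not be installed on your system:
+
+ use strict;
+ use warnings;
+ use POSIX qw(setlocale LC_CTYPE);
+ use locale;
+
+ my $char = chr(0xD7);
+
+ # Assumed names for a Greek and a Latin1 locale.
+ for my $name ('el_GR.ISO8859-7', 'en_US.ISO8859-1') {
+     unless (setlocale(LC_CTYPE, $name)) {
+         warn "Locale $name is not available\n";
+         next;
+     }
+     printf "%s: [[:alpha:]] %s, \\p{Alpha} %s\n", $name,
+         ($char =~ /[[:alpha:]]/ ? 'matches' : 'does not match'),
+         ($char =~ /\p{Alpha}/   ? 'matches' : 'does not match');
+ }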
+
+Another problem with this approach is that operations that cross the
+single byte/multiple byte boundary are not well-defined, and so are
+disallowed. (This boundary is between the code points 255 and 256.)
+For example, lowercasing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
+should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the
+Greek locale, for example, there is no character at 0xFF, and Perl
+has no way of knowing what the character at 0xFF is really supposed to
+represent. Thus it disallows the operation. In this mode, the
+lowercase of U+0178 is itself.
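+
+A short sketch for observing this on your own system follows. It runs
+under whatever locale the environment selects, and the result depends
+on that locale and on the Perl version, so no particular output is
+assumed here:
+
+ use strict;
+ use warnings;
+ use locale;
+
+ my $upper = chr(0x178);  # LATIN CAPITAL LETTER Y WITH DIAERESIS
+ my $lower = lc $upper;
+
+ # Report the code point that lc() actually returned.
+ printf "lc(U+%04X) gives U+%04X\n", ord $upper, ord $lower;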
+
+The same problems ensue if you enable automatic UTF-8-ification of your
+standard file handles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
+non-UTF-8 locales (by using either the B<-C> command line switch or the
+C<PERL_UNICODE> environment variable; see L<perlrun>).
+Things are read in as UTF-8, which would normally imply a Unicode
+interpretation, but the presence of a locale causes them to be interpreted
+in that locale instead. For example, a 0xD7 code point in the Unicode
+input, which should mean the multiplication sign, won't be interpreted by
+Perl that way under the Greek locale. This is not a problem
+I<provided> you make certain that every locale will only ever be either
+an ISO8859-1 locale or, if you don't have a deficient C library, a UTF-8
+locale.
+
+Vendor locales are notoriously buggy, and it is difficult for Perl to test
+its locale-handling code because this interacts with code that Perl has no
+control over; therefore the locale-handling code in Perl may be buggy as
+well. (However, the Unicode-supplied locales should be better, and
+there is a feedback mechanism to correct any problems. See
+L</Freely available locale definitions>.)
+
+If you have Perl v5.16, the problems mentioned above go away if you use
+the C<:not_characters> parameter to the locale pragma (except for vendor
+bugs in the non-character portions). If you don't have v5.16, and you
+I<do> have locales that work, using them may be worthwhile for certain
+specific purposes, as long as you keep in mind the gotchas already
+mentioned. For example, if the collation for your locales works, it
+runs faster under locales than under L<Unicode::Collate>; and you gain
+access to such things as the local currency symbol and the names of the
+months and days of the week. (But to hammer home the point, in v5.16,
+you get this access without the downsides of locales by using the
+C<:not_characters> form of the pragma.)
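+
+As an illustration of that access, the following sketch reads the
+currency symbol and the current month and day names through the
+standard L<POSIX> module. It assumes the environment selects a working
+locale and that standard output expects UTF-8:
+
+ use strict;
+ use warnings;
+ use locale ':not_characters';
+ use POSIX qw(setlocale LC_ALL localeconv strftime);
+
+ # Assumption: the terminal expects UTF-8 output.
+ binmode(STDOUT, ':encoding(UTF-8)');
+
+ setlocale(LC_ALL, '');  # adopt the locale from the environment
+
+ # localeconv() omits fields that the locale leaves empty.
+ my $conv   = localeconv();
+ my $symbol = $conv->{currency_symbol} // '(none)';
+
+ my @now   = localtime;
+ my $month = strftime('%B', @now);
+ my $day   = strftime('%A', @now);
+
+ print "Currency symbol: $symbol\n";
+ print "Today: $day, $month\n";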
+
+Note: The policy of using locale rules for code points that can fit in a
+byte, and Unicode rules for those that can't, is not uniformly applied.
+Pre-v5.12, it was somewhat haphazard; in v5.12 it was applied fairly
+consistently to regular expression matching except for bracketed
+character classes; in v5.14 it was extended to all regex matches; and in
+v5.16 to the casing operations such as C<"\L"> and C<uc()>. For
+collation, in all releases, the system's C<strxfrm()> function is called,
+and whatever it does is what you get.