+into bankers, bikers, gamers, and so on.
+
+=head1 Unicode and UTF-8
+
+Unicode support was introduced in Perl v5.6 and more fully
+implemented in v5.8 and later. See L<perluniintro>.
+
+Starting in Perl v5.20, UTF-8 locales are supported, except for
+C<LC_COLLATE> (use L<Unicode::Collate> instead). If you have Perl v5.16
+or v5.18 and can't upgrade, you can use
+
+ use locale ':not_characters';
+
+When this form of the pragma is used, Perl uses only the non-character
+portions of locales, such as C<LC_NUMERIC>. Perl assumes that you have
+translated all the characters it is to operate on into Unicode (strictly
+speaking, the platform's native character set, ASCII or EBCDIC, plus
+Unicode). For data in files, this can conveniently be done by also
+specifying
+
+ use open ':locale';
+
+This pragma arranges for all inputs from files to be translated into
+Unicode from the current locale as specified in the environment (see
+L</ENVIRONMENT>), and all outputs to files to be translated back
+into the locale. (See L<open>.) On a per-filehandle basis, you can
+instead use the L<PerlIO::locale> module, or the L<Encode::Locale>
+module, both available from CPAN. The latter module also has methods to
+ease the handling of C<ARGV> and environment variables, and can be used
+on individual strings. If you know that all your locales will be
+UTF-8, as many are these days, you can use the L<B<-C>|perlrun/-C>
+command line switch.
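+
+As a minimal sketch (assuming Perl v5.16 or later, and a platform where
+the locale's codeset can be determined for C<use open ':locale'>), the
+two pragmas might be combined as below. C<use open ':locale'> only sets
+the default layers for filehandles opened later, so nothing printed here
+depends on it; the only locale-dependent output is the number, whose
+radix character follows C<LC_NUMERIC> and may be "." or ",".
+
```perl
use v5.16;                     # :not_characters needs v5.16 or later (also enables strict)
use warnings;
use locale ':not_characters';  # honor LC_NUMERIC etc., but treat characters as Unicode
use open ':locale';            # default layers for later open() calls follow the locale's codeset

# Perl initializes its locale from the environment (LC_ALL, LC_NUMERIC,
# LANG, ...) at startup, so no explicit setlocale() call is needed here.
# Under 'use locale', sprintf honors LC_NUMERIC, so the radix character
# may be "," rather than "." depending on the environment.
my $formatted = sprintf "%.2f", 3.14159;
print "$formatted\n";
```
+
+Characters read from files opened after these pragmas are decoded from
+the locale's codeset, so string operations on them use Unicode rules.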
+
+This form of the pragma allows essentially seamless handling of locales
+with Unicode. Collation will be in Unicode code point order.
+When you need to order and sort strings, it is strongly recommended that
+you use the standard module L<Unicode::Collate>, which gives much better
+results in many instances than you can get with the old-style locale
+handling.
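+
+For example, the following sketch compares plain C<sort>, which orders
+strings by code point, with the default collation of
+L<Unicode::Collate> (the Unicode Collation Algorithm with the module's
+bundled default table). No particular locale is assumed, and the word
+list is arbitrary:
+
```perl
use strict;
use warnings;
use Unicode::Collate;

my @words = qw(banana Apple apple Banana);

# Plain sort compares code points, so every uppercase ASCII letter
# sorts before every lowercase one.
my @by_code_point = sort @words;

# Unicode::Collate implements the Unicode Collation Algorithm using the
# default collation table bundled with the module.
my $collator = Unicode::Collate->new();
my @collated = $collator->sort(@words);

print "code point order: @by_code_point\n";   # Apple Banana apple banana
print "collated order:   @collated\n";        # apple Apple banana Banana
```
+
+With the default table, case is only a tertiary difference, so the two
+spellings of each word end up next to each other, lowercase first.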
+
+In v5.20 and later, all the modules and switches just described can be
+used with plain C<use locale>. Should the input locales not be UTF-8,
+however, you'll get the less-than-ideal behavior described below, the
+same behavior you get with pre-v5.16 Perls, or when you use the locale
+pragma without the C<:not_characters> parameter in v5.16 and v5.18. If
+you are using exclusively UTF-8 locales in v5.20 and higher, the rest of
+this section does not apply to you.
+
+There are two cases: multi-byte and single-byte locales. First,
+multi-byte locales:
+
+The only multi-byte (or wide character) locale that Perl is ever likely
+to support is UTF-8. This is due to the difficulty of implementation,
+the fact that high quality UTF-8 locales are now published for every
+area of the world (L<http://unicode.org/Public/cldr/latest/>), and the
+fact that, failing all that, you can use the L<Encode> module to
+translate to and from your locale. So if you're using one of the other
+multi-byte locales, such as Big5 or Shift JIS, you'll have to switch to
+a UTF-8 locale or translate with L<Encode>.
+
+UTF-8 locales may work reasonably well even in Perls before v5.20, which
+lack full UTF-8 locale support (depending on your C library
+implementation), simply because both the locales and Perl store
+characters that take up multiple bytes the same way.
+However, some, if not most, C library implementations may not process
+the characters in the upper half of the Latin-1 range (128 - 255)
+properly under C<LC_CTYPE>. To see if a character is of a particular
+type under a locale, Perl uses functions such as C<isalnum()>. Your C
+library may not work for UTF-8 locales with those functions, instead
+only working under the newer wide library functions like C<iswalnum()>,
+which Perl does not use.
+These multi-byte locales are treated like single-byte locales, and will
+have the restrictions described below. Starting in Perl v5.22, a warning
+message is raised when Perl detects a multi-byte locale that it doesn't
+fully support.
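+
+If you must process data from a multi-byte locale such as Shift JIS, a
+minimal sketch of the L<Encode> route is to decode the bytes into Perl's
+Unicode strings as they come in and encode them again on the way out.
+The two-byte sample below is the Shift JIS encoding of HIRAGANA LETTER A;
+C<shiftjis> is the name L<Encode> (via L<Encode::JP>) uses for that
+encoding.
+
```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from a Shift JIS source: 0x82 0xA0 is
# HIRAGANA LETTER A (U+3042).
my $sjis_bytes = "\x82\xA0";

# On input, decode the legacy multi-byte encoding into a Perl string...
my $text = decode('shiftjis', $sjis_bytes);
printf "decoded to U+%04X (%d character)\n", ord $text, length $text;

# ...and on output, encode back to bytes for the same locale.
my $round_trip = encode('shiftjis', $text);
my $status = $round_trip eq $sjis_bytes ? "round trip ok" : "bytes changed";
print "$status\n";
```
+
+Between the two calls, C<$text> is an ordinary Unicode string, so no
+locale is involved in processing it.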
+
+For single-byte locales, Perl generally takes the tack of using locale
+rules on code points that can fit in a single byte, and Unicode rules for
+those that can't (though this isn't uniformly applied; see the note at
+the end of this section). This prevents many problems in locales that
+aren't UTF-8. Suppose the locale is ISO8859-7, Greek. The character at
+0xD7 there is a capital Chi. But in the ISO8859-1 locale, Latin1, it is a
+multiplication sign. The POSIX regular expression character class
+C<[[:alpha:]]> will magically match 0xD7 in the Greek locale but not in
+the Latin one.
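+
+The point that one byte value names different characters in different
+single-byte character sets can be seen without installing any locales,
+by decoding the byte 0xD7 with L<Encode>. This is only an illustration
+using L<Encode>'s character set tables, not Perl's locale machinery:
+
```perl
use strict;
use warnings;
use Encode qw(decode);

my $byte = "\xD7";

# The same byte decodes to different characters in different charsets.
my $greek = decode('iso-8859-7', $byte);   # GREEK CAPITAL LETTER CHI
my $latin = decode('iso-8859-1', $byte);   # MULTIPLICATION SIGN

printf "iso-8859-7: U+%04X\n", ord $greek;   # U+03A7
printf "iso-8859-1: U+%04X\n", ord $latin;   # U+00D7
```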
+
+However, there are places where this breaks down. Certain Perl constructs are
+for Unicode only, such as C<\p{Alpha}>. They assume that 0xD7 always has its
+Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a
+subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
+Unicode, C<\p{Alpha}> will never match it, regardless of locale. A similar
+issue occurs with C<\N{...}>. Prior to v5.20, it is therefore a bad idea
+to use C<\p{}> or C<\N{}> under plain C<use locale>, I<unless> you can
+guarantee that the locale will be ISO8859-1. Use POSIX character classes
+instead.
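+
+A small sketch of the C<\p{Alpha}> behavior follows. It runs outside any
+C<use locale> scope, so it does not depend on which locales are
+installed; the point is that the match is decided by Unicode alone.
+
```perl
use strict;
use warnings;

my $times = chr 0xD7;    # MULTIPLICATION SIGN in both Latin-1 and Unicode
my $chi   = chr 0x3A7;   # GREEK CAPITAL LETTER CHI

# \p{Alpha} always uses the Unicode definition, whatever the locale.
for my $char ($times, $chi) {
    printf "U+%04X %s \\p{Alpha}\n", ord $char,
        $char =~ /\p{Alpha}/ ? "matches" : "does not match";
}
```
+
+Under a Greek locale, C<[[:alpha:]]> inside C<use locale> would instead
+treat the byte 0xD7 as the letter Chi.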
+
+Another problem with this approach is that operations that cross the
+single byte/multiple byte boundary are not well-defined, and so are
+disallowed. (This boundary is between the code points 255 and 256.)
+For example, lowercasing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
+should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the
+Greek locale there is no character at 0xFF, and Perl
+has no way of knowing what the character at 0xFF is really supposed to
+represent. Thus it disallows the operation. In this mode, the
+lowercase of U+0178 is itself.
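+
+Outside the scope of C<use locale>, Perl applies Unicode rules to this
+operation, as this short sketch shows. The locale-restricted behavior
+described above is not reproduced here, because it depends on a
+single-byte locale such as the Greek one being installed.
+
```perl
use strict;
use warnings;

my $capital = "\x{178}";   # LATIN CAPITAL LETTER Y WITH DIAERESIS

# No 'use locale' here, so lc() follows Unicode rules and may cross the
# 255/256 boundary, giving LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF).
my $small = lc $capital;
printf "lc(U+%04X) = U+%04X\n", ord $capital, ord $small;   # lc(U+0178) = U+00FF
```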
+
+The same problems ensue if you enable automatic UTF-8-ification of your
+standard filehandles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
+non-UTF-8 locales (by using either the B<-C> command line switch or the
+C<PERL_UNICODE> environment variable; see L<perlrun>).
+Input is then read as UTF-8, which would normally imply a Unicode
+interpretation, but the presence of a locale causes it to be interpreted
+in that locale instead. For example, a 0xD7 code point in the Unicode
+input, which should mean the multiplication sign, won't be interpreted by
+Perl that way under the Greek locale. This is not a problem
+I<provided> you make certain that all locales will always and only be
+ISO8859-1 locales or, if your C library is not deficient, UTF-8 locales.
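+
+Which of these B<-C> behaviors are in effect is recorded in the
+C<${^UNICODE}> variable as a bit mask. The sketch below decodes it back
+into option letters using the bit values listed for B<-C> in
+L<perlrun>; the helper C<describe_unicode_flags()> is hypothetical,
+written only for this example.
+
```perl
use strict;
use warnings;

# Bit values of ${^UNICODE}, as listed for the -C switch in perlrun.
my @flags = (
    [ I => 1  ],   # STDIN is assumed to be in UTF-8
    [ O => 2  ],   # STDOUT will be in UTF-8
    [ E => 4  ],   # STDERR will be in UTF-8
    [ i => 8  ],   # UTF-8 is the default PerlIO layer for input streams
    [ o => 16 ],   # UTF-8 is the default PerlIO layer for output streams
    [ A => 32 ],   # @ARGV elements are expected to be UTF-8
    [ L => 64 ],   # the flags above apply only if the locale variables say UTF-8
);

# describe_unicode_flags() is a hypothetical helper for this example:
# it turns a ${^UNICODE} value back into -C option letters.
sub describe_unicode_flags {
    my ($value) = @_;
    return join '', map { $_->[0] } grep { $value & $_->[1] } @flags;
}

my $letters = describe_unicode_flags(${^UNICODE});
printf "\${^UNICODE} = %d (%s)\n", ${^UNICODE},
    length $letters ? "-C$letters" : "no -C flags";
```
+
+For example, a script started with C<perl -CSDA> would see 63, decoded as
+C<-CIOEioA>, since B<S> stands for C<IOE> and B<D> for C<io>.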
+
+Still another problem is that this approach can lead to two code
+points meaning the same character. For example, in a Greek locale both
+U+03A7 and U+00D7 are GREEK CAPITAL LETTER CHI.
+
+Because of all these problems, starting in v5.22, Perl will raise a
+warning if a multi-byte (hence Unicode) code point is used when a
+single-byte locale is in effect. (It doesn't check for this where doing
+so would unreasonably slow down execution.)
+
+Vendor locales are notoriously buggy, and it is difficult for Perl to test
+its locale-handling code because this interacts with code that Perl has no
+control over; therefore the locale-handling code in Perl may be buggy as
+well. (However, the Unicode-supplied locales should be better, and
+there is a feedback mechanism to correct any problems. See
+L</Freely available locale definitions>.)
+
+If you have Perl v5.16 or later, the problems mentioned above go away if
+you use the C<:not_characters> parameter to the locale pragma (except for
+vendor bugs in the non-character portions). If you don't have v5.16, and
+you I<do> have locales that work, using them may be worthwhile for certain
+specific purposes, as long as you keep in mind the gotchas already
+mentioned. For example, if the collation for your locales works, it
+runs faster under locales than under L<Unicode::Collate>; and you gain
+access to such things as the local currency symbol and the names of the
+months and days of the week. (But to hammer home the point, in v5.16 and
+later you get this access without the downsides of locales by using the
+C<:not_characters> form of the pragma.)
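+
+As an illustration of that access, the following sketch reads a few
+locale values through the L<POSIX> module. It selects the C<"C"> locale
+so that the output is predictable; passing C<""> to C<setlocale()>
+instead would use the locale named in the environment.
+
```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv strftime LC_ALL);

# Predictable output: the "C" locale. Use "" to take the locale from
# the environment instead.
setlocale(LC_ALL, "C") or die "cannot set the C locale\n";

my $conv     = localeconv();                       # hash ref of numeric/monetary data
my $currency = $conv->{currency_symbol} // '';     # empty (or absent) in the C locale
my $month    = strftime("%B", 0, 0, 0, 1, 0, 100); # LC_TIME month name for 1 January 2000

print "decimal point:   $conv->{decimal_point}\n";
print "currency symbol: '$currency'\n";
print "month name:      $month\n";
```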
+
+Note: The policy of using locale rules for code points that can fit in a
+byte and Unicode rules for those that can't is not uniformly applied.
+Before v5.12, it was somewhat haphazard; in v5.12 it was applied fairly
+consistently to regular expression matching except for bracketed
+character classes; in v5.14 it was extended to all regex matches; and in
+v5.16 to the casing operations such as C<\L> and C<uc()>. For
+collation, in all releases so far, the system's C<strxfrm()> function is
+called, and whatever it does is what you get.
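+
+To make that concrete, here is a sketch of sorting with an explicit call
+to C<POSIX::strxfrm()>, transforming each string once and comparing the
+results with C<cmp>. It pins C<LC_COLLATE> to C<"C">, where the order is
+plain byte order; with another locale the result is whatever that
+system's C<strxfrm()> produces. (Under C<use locale>, Perl's own C<sort>
+and C<cmp> already apply the locale's collation, so the explicit call
+here is only for illustration.)
+
```perl
use strict;
use warnings;
use POSIX qw(setlocale strxfrm LC_COLLATE);

# "C" is always available and collates by byte value.
setlocale(LC_COLLATE, "C") or die "cannot set LC_COLLATE\n";

my @words = qw(pear Apple apple Pear);

# Transform each string once with strxfrm(), compare the transformed keys
# with plain cmp, then map back to the original strings.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ strxfrm($_), $_ ] } @words;

print "@sorted\n";   # Apple Pear apple pear
```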