=head1 NAME

perllocale - Perl locale handling (internationalization and localization)

=head1 DESCRIPTION

Locales these days have mostly been supplanted by Unicode, but Perl
continues to support them. See L</Unicode and UTF-8> below.

Perl supports language-specific notions of data such as "is this
a letter", "what is the uppercase equivalent of this letter", and
"which of these letters comes first". These are important issues,
especially for languages other than English--but also for English: it
would be naE<iuml>ve to imagine that C<A-Za-z> defines all the "letters"
needed to write correct English. Perl is also aware that some character other
than "." may be preferred as a decimal point, and that output date
representations may be language-specific. The process of making an
application take account of its users' preferences in such matters is
called B<internationalization> (often abbreviated as B<i18n>); telling
such an application about a particular set of preferences is known as
B<localization> (B<l10n>).

Perl can understand language-specific data via the standardized (ISO C,
XPG4, POSIX 1.c) method called "the locale system". The locale system is
controlled per application using one pragma, one function call, and
several environment variables.

B<NOTE>: This feature is new in Perl 5.004, and does not apply unless an
application specifically requests it--see L<Backward compatibility>.
The one exception is that write() now B<always> uses the current locale
- see L<"NOTES">.

=head1 PREPARING TO USE LOCALES

If Perl applications are to understand and present your data
correctly according to a locale of your choice, B<all> of the following
must be true:

=over 4

=item *

B<Your operating system must support the locale system>. If it does,
you should find that the setlocale() function is a documented part of
its C library.

=item *

B<Definitions for locales that you use must be installed>. You, or
your system administrator, must make sure that this is the case. The
available locales, the location in which they are kept, and the manner
in which they are installed all vary from system to system. Some systems
provide only a few, hard-wired locales and do not allow more to be
added. Others allow you to add "canned" locales provided by the system
supplier. Still others allow you or the system administrator to define
and add arbitrary locales. (You may have to ask your supplier to
provide canned locales that are not delivered with your operating
system.) Read your system documentation for further illumination.

=item *

B<Perl must believe that the locale system is supported>. If it does,
C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
C<define>.

=back

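The last condition in the list above can also be checked from inside a
program. The following is a minimal sketch, assuming the standard Config
module is available; the variable name C<$has_setlocale> is chosen only
for this illustration.

```perl
use strict;
use warnings;
use Config;

# %Config holds the values recorded when this perl was built;
# d_setlocale is the same key that "perl -V:d_setlocale" reports.
my $has_setlocale = defined $Config{d_setlocale}
                 && $Config{d_setlocale} eq 'define';

print "setlocale() support compiled in: ",
      ($has_setlocale ? "yes" : "no"), "\n";
```

This only tells you what Perl believes about the C library. Whether
particular locale definitions are installed is a separate question.
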
If you want a Perl application to process and present your data
according to a particular locale, the application code should include
the S<C<use locale>> pragma (see L<The use locale pragma>) where
appropriate, and B<at least one> of the following must be true:

=over 4

=item 1

B<The locale-determining environment variables (see L<"ENVIRONMENT">)
must be correctly set up> at the time the application is started, either
by yourself or by whoever set up your system account; or

=item 2

B<The application must set its own locale> using the method described in
L<The setlocale function>.

=back

=head1 USING LOCALES

=head2 The use locale pragma

By default, Perl ignores the current locale. The S<C<use locale>>
pragma tells Perl to use the
current locale for some operations (C</l> for just pattern matching).

The current locale is set at execution time by
L<setlocale()|/The setlocale function> described below. If that function
hasn't yet been called in the course of the program's execution, the
current locale is that which was determined by the L<"ENVIRONMENT"> in
effect at the start of the program, except that
C<L<LC_NUMERIC|/Category LC_NUMERIC: Numeric Formatting>> is always
initialized to the C locale (mentioned under L<Finding locales>).
If there is no valid environment, the current locale is undefined. It
is likely, but not necessarily, the "C" locale.

The operations that are affected by locale are:

=over 4

=item *

B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) and
the POSIX string collation functions strcoll() and strxfrm() use
C<LC_COLLATE>. sort() is also affected if used without an
explicit comparison function, because it uses C<cmp> by default.

B<Note:> C<eq> and C<ne> are unaffected by locale: they always
perform a char-by-char comparison of their scalar operands. What's
more, if C<cmp> finds that its operands are equal according to the
collation sequence specified by the current locale, it goes on to
perform a char-by-char comparison, and only returns I<0> (equal) if the
operands are char-for-char identical. If you really want to know whether
two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
L<Category LC_COLLATE: Collation>.

=item *

B<Regular expressions and case-modification functions> (uc(), lc(),
ucfirst(), and lcfirst()) use C<LC_CTYPE>.

=item *

B<Format declarations> (format()) use C<LC_NUMERIC>.

=item *

B<The POSIX date formatting function> (strftime()) uses C<LC_TIME>.

=back

C<LC_COLLATE>, C<LC_CTYPE>, and so on, are discussed further in
L<LOCALE CATEGORIES>.

The default behavior is restored with the S<C<no locale>> pragma, or
upon reaching the end of the block enclosing C<use locale>.

The string result of any operation that uses locale
information is tainted, as it is possible for a locale to be
untrustworthy. See L<"SECURITY">.

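The block scoping of S<C<use locale>> described above can be sketched as
follows. This is an illustration only: the word list and the variable
names are made up for the example, and the locale-dependent order you
see depends entirely on the C<LC_COLLATE> locale in effect on your
system.

```perl
use strict;
use warnings;

my @words = qw(banana Apple cherry apple Banana);

my @by_locale;
{
    use locale;                  # in effect until the closing brace
    @by_locale = sort @words;    # cmp consults LC_COLLATE here
}

# Outside the block the default applies again: plain char-by-char order.
my @by_codepoint = sort @words;

print "locale order:     @by_locale\n";
print "code point order: @by_codepoint\n";
```

Only the second line of output is predictable across systems, because
it does not depend on any locale.
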
=head2 The setlocale function

You can switch locales as often as you wish at run time with the
POSIX::setlocale() function:

    # This functionality not usable prior to Perl 5.004
    require 5.004;

    # Import locale-handling tool set from POSIX module.
    # This example uses: setlocale -- the function call
    #                    LC_CTYPE -- explained below
    use POSIX qw(locale_h);

    # query and save the old locale
    $old_locale = setlocale(LC_CTYPE);

    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
    # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"

    setlocale(LC_CTYPE, "");
    # LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
    # environment variables. See below for documentation.

    # restore the old locale
    setlocale(LC_CTYPE, $old_locale);

The first argument of setlocale() gives the B<category>, the second the
B<locale>. The category tells in what aspect of data processing you
want to apply locale-specific rules. Category names are discussed in
L<LOCALE CATEGORIES> and L<"ENVIRONMENT">. The locale is the name of a
collection of customization information corresponding to a particular
combination of language, country or territory, and codeset. Read on for
hints on the naming of locales: not all systems name locales as in the
example.

If no second argument is provided and the category is something other
than LC_ALL, the function returns a string naming the current locale
for the category. You can use this value as the second argument in a
subsequent call to setlocale().

If no second argument is provided and the category is LC_ALL, the
result is implementation-dependent. It may be a string of
concatenated locale names (separator also implementation-dependent)
or a single locale name. Please consult your setlocale(3) man page for
details.

If a second argument is given and it corresponds to a valid locale,
the locale for the category is set to that value, and the function
returns the now-current locale value. You can then use this in yet
another call to setlocale(). (In some implementations, the return
value may sometimes differ from the value you gave as the second
argument--think of it as an alias for the value you gave.)

As the example shows, if the second argument is an empty string, the
category's locale is returned to the default specified by the
corresponding environment variables. Generally, this results in a
return to the default that was in force when Perl started up: changes
to the environment made by the application after startup may or may not
be noticed, depending on your system's C library.

If the second argument does not correspond to a valid locale, the locale
for the category is not changed, and the function returns I<undef>.

For further information about the categories, consult setlocale(3).

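Because setlocale() signals failure only through its return value, it
is worth checking that value. The sketch below assumes that the locale
name C<xx_NoSuchPlace.nonexistent> is not installed; that name is
invented for the illustration, and on a system where it happened to
exist the first branch would run instead.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical locale name, assumed not to be installed anywhere.
my $wanted = "xx_NoSuchPlace.nonexistent";

my $before = setlocale(LC_CTYPE);            # query only
my $result = setlocale(LC_CTYPE, $wanted);   # attempt the switch

if (defined $result) {
    print "LC_CTYPE is now $result\n";
} else {
    # On failure the category keeps whatever locale it had before.
    my $after = setlocale(LC_CTYPE);
    print "'$wanted' is not available; LC_CTYPE remains $after\n";
}
```
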
=head2 Finding locales

For locales available in your system, consult also setlocale(3) to
see whether it leads to the list of available locales (search for the
I<SEE ALSO> section). If that fails, try the following command lines:

    locale -a

    nlsinfo

    ls /usr/lib/nls/loc

    ls /usr/lib/locale

    ls /usr/lib/nls

    ls /usr/share/locale

and see whether they list something resembling these

    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
    en_US.iso88591      de_DE.iso88591      ru_RU.iso88595
    en_US               de_DE               ru_RU
    en                  de                  ru
    english             german              russian
    english.iso88591    german.iso88591     russian.iso88595
    english.roman8                          russian.koi8r

Sadly, even though the calling interface for setlocale() has been
standardized, names of locales and the directories where the
configuration resides have not been. The basic form of the name is
I<language_territory>B<.>I<codeset>, but the latter parts after
I<language> are not always present. The I<language> and I<territory>
are usually from the standards B<ISO 639> and B<ISO 3166>, the
two-letter abbreviations for the languages and the countries of the
world, respectively. The I<codeset> part often mentions some B<ISO
8859> character set, the Latin codesets. For example, C<ISO 8859-1>
is the so-called "Western European codeset" that can be used to encode
most Western European languages adequately. Again, there are several
ways to write even the name of that one standard. Lamentably.

Two special locales are worth particular mention: "C" and "POSIX".
Currently these are effectively the same locale: the difference is
mainly that the first one is defined by the C standard, the second by
the POSIX standard. They define the B<default locale> in which
every program starts in the absence of locale information in its
environment. (The I<default> default locale, if you will.) Its language
is (American) English and its character codeset ASCII.
B<Warning>. The C locale delivered by some vendors may not
actually exactly match what the C standard calls for. So beware.

B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
POSIX-conformant), so use "C" when you need explicitly to specify this
default locale.

=head2 LOCALE PROBLEMS

You may encounter the following warning message at Perl startup:

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LC_ALL = "En_US",
            LANG = (unset)
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").

This means that your locale settings had LC_ALL set to "En_US" and
LANG was not set at all. Perl tried to believe you but could not.
Instead, Perl gave up and fell back to the "C" locale, the default locale
that is supposed to work no matter what. This usually means your locale
settings were wrong, they mention locales your system has never heard
of, or the locale installation in your system has problems (for example,
some system files are broken or missing). There are quick and temporary
fixes to these problems, as well as more thorough and lasting fixes.

=head2 Temporarily fixing locale problems

The two quickest fixes are either to render Perl silent about any
locale inconsistencies or to run Perl under the default locale "C".

Perl's moaning about locale problems can be silenced by setting the
environment variable PERL_BADLANG to a zero value, for example "0".
This method really just sweeps the problem under the carpet: you tell
Perl to shut up even when Perl sees that something is wrong. Do not
be surprised if later something locale-dependent misbehaves.

Perl can be run under the "C" locale by setting the environment
variable LC_ALL to "C". This method is perhaps a bit more civilized
than the PERL_BADLANG approach, but setting LC_ALL (or
other locale variables) may affect other programs as well, not just
Perl. In particular, external programs run from within Perl will see
these changes. If you make the new settings permanent (read on), all
programs you run see the changes. See L<"ENVIRONMENT"> for
the full list of relevant environment variables and L<USING LOCALES>
for their effects in Perl. Effects in other programs are
easily deducible. For example, the variable LC_COLLATE may well affect
your B<sort> program (or whatever the program that arranges "records"
alphabetically in your system is called).

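The point that external programs inherit these variables can be put to
deliberate use. The following is a minimal sketch, assuming a Unix-like
system where the list form of a piped open() is available. It sets
LC_ALL to "C" for one child process only, and uses a second copy of
perl (C<$^X>) as the child purely so that the example is self-contained.

```perl
use strict;
use warnings;

my $seen;
{
    # local() restores the previous value when the block exits,
    # so only the child started inside this block sees LC_ALL=C.
    local $ENV{LC_ALL} = 'C';

    # List-form open avoids the shell and its quoting rules.
    open(my $child, '-|', $^X, '-e', 'print $ENV{LC_ALL}')
        or die "cannot start child perl: $!";
    $seen = <$child>;
    close $child;
}

print "child saw LC_ALL=$seen\n";
print "parent LC_ALL is ",
      (defined $ENV{LC_ALL} ? $ENV{LC_ALL} : "(unset)"), "\n";
```
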
You can test out changing these variables temporarily, and if the
new settings seem to help, put those settings into your shell startup
files. Consult your local documentation for the exact details. For
example, in Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):

    LC_ALL=en_US.ISO8859-1
    export LC_ALL

This assumes that we saw the locale "en_US.ISO8859-1" using the commands
discussed above. We decided to try that instead of the faulty
locale "En_US" shown above. In csh-like shells (B<csh>, B<tcsh>), use

    setenv LC_ALL en_US.ISO8859-1

or, if you have the "env" application, you can do the following in any shell:

    env LC_ALL=en_US.ISO8859-1 perl ...

If you do not know what shell you have, consult your local
helpdesk or the equivalent.

=head2 Permanently fixing locale problems

The slower but superior fix is to correct the misconfiguration of your
own environment variables yourself. The
mis(sing)configuration of the whole system's locales usually requires
the help of your friendly system administrator.

First, see earlier in this document about L<Finding locales>. That tells
how to find which locales are really supported--and more importantly,
installed--on your system. In our example error message, environment
variables affecting the locale are listed in the order of decreasing
importance (and unset variables do not matter). Therefore, having
LC_ALL set to "En_US" must have been the bad choice, as shown by the
error message. First try fixing locale settings listed first.

Second, if using the listed commands you see something B<exactly>
(prefix matches do not count and case usually counts) like "En_US"
without the quotes, then you should be okay because you are using a
locale name that should be installed and available in your system.
In this case, see L<Permanently fixing your system's locale configuration>.

=head2 Permanently fixing your system's locale configuration

This is when you see something like:

    perl: warning: Please check that your locale settings:
            LC_ALL = "En_US",
            LANG = (unset)
        are supported and installed on your system.

but then cannot see that "En_US" listed by the above-mentioned
commands. You may see things like "en_US.ISO8859-1", but that isn't
the same. In this case, try running under a locale
that you can list and which somehow matches what you tried. The
rules for matching locale names are a bit vague because
standardization is weak in this area. See L<Finding locales> again
for the general rules.

=head2 Fixing system locale configuration

Contact a system administrator (preferably your own) and report the exact
error message you get, and ask them to read this same documentation you
are now reading. They should be able to check whether there is something
wrong with the locale configuration of the system. The L<Finding locales>
section is unfortunately a bit vague about the exact commands and places
because these things are not that standardized.

=head2 The localeconv function

The POSIX::localeconv() function allows you to get particulars of the
locale-dependent numeric formatting information specified by the current
C<LC_NUMERIC> and C<LC_MONETARY> locales. (If you just want the name of
the current locale for a particular category, use POSIX::setlocale()
with a single parameter--see L<The setlocale function>.)

    use POSIX qw(locale_h);

    # Get a reference to a hash of locale-dependent info
    $locale_values = localeconv();

    # Output sorted list of the values
    for (sort keys %$locale_values) {
        printf "%-20s = %s\n", $_, $locale_values->{$_}
    }

localeconv() takes no arguments, and returns B<a reference to> a hash.
The keys of this hash are variable names for formatting, such as
C<decimal_point> and C<thousands_sep>. The values are the
corresponding, er, values. See L<POSIX/localeconv> for a longer
example listing the categories an implementation might be expected to
provide; some provide more and others fewer. You don't need an
explicit C<use locale>, because localeconv() always observes the
current locale.

Here's a simple-minded example program that rewrites its command-line
parameters as integers correctly formatted in the current locale:

    # See comments in previous example
    require 5.004;
    use POSIX qw(locale_h);

    # Get some of locale's numeric formatting parameters
    my ($thousands_sep, $grouping) =
        @{localeconv()}{'thousands_sep', 'grouping'};

    # Apply defaults if values are missing
    $thousands_sep = ',' unless $thousands_sep;

    # grouping and mon_grouping are packed lists
    # of small integers (characters) telling the
    # grouping (thousands_sep and mon_thousands_sep
    # being the group dividers) of numbers and
    # monetary quantities. The integers' meanings:
    # 255 means no more grouping, 0 means repeat
    # the previous grouping, 1-254 means use that
    # as the current grouping. Grouping goes from
    # right to left (low to high digits). In the
    # below we cheat slightly by never using anything
    # else than the first grouping (whatever that is).
    if ($grouping) {
        @grouping = unpack("C*", $grouping);
    } else {
        @grouping = (3);
    }

    # Format command line params for current locale
    for (@ARGV) {
        $_ = int;    # Chop non-integer part
        # \Q...\E keeps a separator such as "." from acting
        # as a regex metacharacter
        1 while
        s/(\d)(\d{$grouping[0]}($|\Q$thousands_sep\E))/$1$thousands_sep$2/;
        print "$_ ";
    }
    print "\n";

=head2 I18N::Langinfo

Another interface for querying locale-dependent information is the
I18N::Langinfo::langinfo() function, available at least in Unix-like
systems and VMS.

The following example will import the langinfo() function itself and
three constants to be used as arguments to langinfo(): a constant for
the abbreviated first day of the week (the numbering starts from
Sunday = 1) and two more constants for the affirmative and negative
answers for a yes/no question in the current locale.

    use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);

    # Pass the imported constants themselves, not their names as strings
    my ($abday_1, $yesstr, $nostr)
        = map { langinfo($_) } (ABDAY_1, YESSTR, NOSTR);

    print "$abday_1? [$yesstr/$nostr] ";

In other words, in the "C" (or English) locale the above will probably
print something like:

    Sun? [yes/no]

See L<I18N::Langinfo> for more information.

=head1 LOCALE CATEGORIES

The following subsections describe basic locale categories. Beyond these,
some combination categories allow manipulation of more than one
basic category at a time. See L<"ENVIRONMENT"> for a discussion of these.

=head2 Category LC_COLLATE: Collation

In the scope of S<C<use locale>>, Perl looks to the C<LC_COLLATE>
environment variable to determine the application's notions on collation
(ordering) of characters. For example, "b" follows "a" in Latin
alphabets, but where do "E<aacute>" and "E<aring>" belong? And while
"color" follows "chocolate" in English, what about in Spanish?

The following collations all make sense and you may meet any of them
if you "use locale".

    A B C D E a b c d e
    A a B b C c D d E e
    a A b B c C d D e E
    a b c d e A B C D E

Here is a code snippet to tell what "word"
characters are in the current locale, in that locale's order:

    use locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";

Compare this with the characters that you see and their order if you
state explicitly that the locale should be ignored:

    no locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";

This machine-native collation (which is what you get unless S<C<use
locale>> has appeared earlier in the same block) must be used for
sorting raw binary data, whereas the locale-dependent collation of the
first example is useful for natural text.

As noted in L<USING LOCALES>, C<cmp> compares according to the current
collation locale when C<use locale> is in effect, but falls back to a
char-by-char comparison for strings that the locale says are equal. You
can use POSIX::strcoll() if you don't want this fall-back:

    use POSIX qw(strcoll);
    $equal_in_locale =
        !strcoll("space and case ignored", "SpaceAndCaseIgnored");

$equal_in_locale will be true if the collation locale specifies a
dictionary-like ordering that ignores space characters completely and
which folds case.

If you have a single string that you want to check for "equality in
locale" against several others, you might think you could gain a little
efficiency by using POSIX::strxfrm() in conjunction with C<eq>:

    use POSIX qw(strxfrm);
    $xfrm_string = strxfrm("Mixed-case string");
    print "locale collation ignores spaces\n"
        if $xfrm_string eq strxfrm("Mixed-casestring");
    print "locale collation ignores hyphens\n"
        if $xfrm_string eq strxfrm("Mixedcase string");
    print "locale collation ignores case\n"
        if $xfrm_string eq strxfrm("mixed-case string");

strxfrm() takes a string and maps it into a transformed string for use
in char-by-char comparisons against other transformed strings during
collation. "Under the hood", locale-affected Perl comparison operators
call strxfrm() for both operands, then do a char-by-char
comparison of the transformed strings. By calling strxfrm() explicitly
and using a non locale-affected comparison, the example attempts to save
a couple of transformations. But in fact, it doesn't save anything: Perl
magic (see L<perlguts/Magic Variables>) creates the transformed version of a
string the first time it's needed in a comparison, then keeps this version around
in case it's needed again. An example rewritten the easy way with
C<cmp> runs just about as fast. It also copes with null characters
embedded in strings; if you call strxfrm() directly, it treats the first
null it finds as a terminator. Don't expect the transformed strings
it produces to be portable across systems--or even from one revision
of your operating system to the next. In short, don't call strxfrm()
directly: let Perl do it for you.

Note: C<use locale> isn't shown in some of these examples because it isn't
needed: strcoll() and strxfrm() exist only to generate locale-dependent
results, and so always obey the current C<LC_COLLATE> locale.

=head2 Category LC_CTYPE: Character Types

In the scope of S<C<use locale>>, Perl obeys the C<LC_CTYPE> locale
setting. This controls the application's notion of which characters are
alphabetic. This affects Perl's C<\w> regular expression metanotation,
which stands for alphanumeric characters--that is, alphabetic and
numeric characters, and also the underscore.
(Consult L<perlre> for more information about
regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
setting, characters like "E<aelig>", "E<eth>", "E<szlig>", and
"E<oslash>" may be understood as C<\w> characters.

The C<LC_CTYPE> locale also provides the map used in transliterating
characters between lower and uppercase. This affects the case-mapping
functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping
interpolation with C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted strings
and C<s///> substitutions; and case-independent regular expression
pattern matching using the C<i> modifier.

Finally, C<LC_CTYPE> affects the POSIX character-class test
functions--isalpha(), islower(), and so on. For example, if you move
from the "C" locale to a 7-bit Scandinavian one, you may find--possibly
to your surprise--that "|" moves from the ispunct() class to isalpha().
Unfortunately, this creates big problems for regular expressions. "|" still
means alternation even though it matches C<\w>.

B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
in clearly ineligible characters being considered to be alphanumeric by
your application. For strict matching of (mundane) ASCII letters and
digits--for example, in command strings--locale-aware applications
should use C<\w> with the C</a> regular expression modifier. See L<"SECURITY">.

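The advice in the note above can be sketched as follows. This is only an
illustration: it assumes a Perl recent enough to support the C</a>
modifier (5.14 or later), and is_plain_word() is a hypothetical helper
name invented for the example.

```perl
use strict;
use warnings;
use locale;   # \w would normally follow LC_CTYPE in this scope

# is_plain_word() is a hypothetical helper used only for this sketch.
sub is_plain_word {
    my ($text) = @_;
    # /a restricts \w to ASCII letters, digits and underscore,
    # whatever the current LC_CTYPE locale claims.
    return $text =~ /\A\w+\z/a ? 1 : 0;
}

print "report_2024:    ", is_plain_word("report_2024"), "\n";
print "r\\xE9sum\\xE9: ", is_plain_word("r\xE9sum\xE9"), "\n";
print "a|b:            ", is_plain_word("a|b"), "\n";
```

Only the first string is accepted. The second contains the byte 0xE9,
which a Latin-1 locale would treat as a letter, and the third contains
"|", which some 7-bit national locales classify as alphabetic.
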
=head2 Category LC_NUMERIC: Numeric Formatting

After a proper POSIX::setlocale() call, Perl obeys the C<LC_NUMERIC>
locale information, which controls an application's idea of how numbers
should be formatted for human readability by the printf(), sprintf(), and
write() functions. String-to-numeric conversion by the POSIX::strtod()
function is also affected. In most implementations the only effect is to
change the character used for the decimal point--perhaps from "." to ",".
These functions aren't aware of such niceties as thousands separation and
so on. (See L<The localeconv function> if you care about these things.)

Output produced by print() is also affected by the current locale: it
corresponds to what you'd get from printf() under the same C<LC_NUMERIC>
locale. The same is true for Perl's internal conversions between numeric and
string formats:

    use POSIX qw(strtod setlocale LC_NUMERIC);

    setlocale LC_NUMERIC, "";

    $n = 5/2;   # Assign numeric 2.5 to $n

    $a = " $n"; # Locale-dependent conversion to string

    print "half five is $n\n";       # Locale-dependent output

    printf "half five is %g\n", $n;  # Locale-dependent output

    print "DECIMAL POINT IS COMMA\n"
        if $n == (strtod("2,5"))[0]; # Locale-dependent conversion

See also L<I18N::Langinfo> and C<RADIXCHAR>.

=head2 Category LC_MONETARY: Formatting of monetary amounts

The C standard defines the C<LC_MONETARY> category, but not a function
that is affected by its contents. (Those with experience of standards
committees will recognize that the working group decided to punt on the
issue.) Consequently, Perl takes no notice of it. If you really want
to use C<LC_MONETARY>, you can query its contents--see
L<The localeconv function>--and use the information that it returns in your
application's own formatting of currency amounts. However, you may well
find that the information, voluminous and complex though it may be, still
does not quite meet your requirements: currency formatting is a hard nut
to crack.

See also L<I18N::Langinfo> and C<CRNCYSTR>.

5f05dabc 643=head2 LC_TIME
644
5a964f20 645Output produced by POSIX::strftime(), which builds a formatted
5f05dabc 646human-readable date/time string, is affected by the current C<LC_TIME>
647locale. Thus, in a French locale, the output produced by the C<%B>
648format element (full month name) for the first month of the year would
5a964f20 649be "janvier". Here's how to get a list of long month names in the
5f05dabc 650current locale:
651
652 use POSIX qw(strftime);
14280422
DD
653 for (0..11) {
654 $long_month_name[$_] =
655 strftime("%B", 0, 0, 0, 1, $_, 96);
5f05dabc 656 }
657
5a964f20 658Note: C<use locale> isn't needed in this example: as a function that
14280422
DD
659exists only to generate locale-dependent results, strftime() always
660obeys the current C<LC_TIME> locale.
5f05dabc 661
See also L<I18N::Langinfo> and C<ABDAY_1>..C<ABDAY_7>, C<DAY_1>..C<DAY_7>,
C<ABMON_1>..C<ABMON_12>, and C<MON_1>..C<MON_12>.
4bbcc6e8 664
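The same names can also be queried directly through
L<I18N::Langinfo>. The sketch below assumes that your platform
provides C<nl_langinfo()> (or that your Perl emulates it) and that the
environment names a usable C<LC_TIME> locale. What it prints depends
entirely on that locale.

    use strict;
    use warnings;
    use locale;
    use POSIX qw(locale_h);
    use I18N::Langinfo qw(langinfo MON_1 DAY_1 ABDAY_1);

    setlocale(LC_TIME, "");    # take LC_TIME from the environment

    # DAY_1 and ABDAY_1 are the full and abbreviated names of the
    # first day of the week; MON_1 is the first month of the year.
    printf "%s (%s), %s\n",
        langinfo(DAY_1), langinfo(ABDAY_1), langinfo(MON_1);
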
5f05dabc 665=head2 Other categories
666
5a964f20
TC
667The remaining locale category, C<LC_MESSAGES> (possibly supplemented
668by others in particular implementations) is not currently used by
98a6f11e 669Perl--except possibly to affect the behavior of library functions
670called by extensions outside the standard Perl distribution and by the
671operating system and its utilities. Note especially that the string
672value of C<$!> and the error messages given by external utilities may
673be changed by C<LC_MESSAGES>. If you want to have portable error
265f5c4a 674codes, use C<%!>. See L<Errno>.
14280422
DD
675
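A minimal sketch of testing C<%!> instead of the text of C<$!>
follows. The C<classify_open_failure()> helper and the path passed to
it are made up for the example. Only the hash lookups in C<%!> come
from L<Errno>.

    use strict;
    use warnings;
    use Errno;    # provides the %! hash

    # Hypothetical helper: say why a file could not be opened.
    sub classify_open_failure {
        my ($path) = @_;
        open(my $fh, "<", $path) and return "opened";
        # Test the symbolic error code, not the message text, which
        # LC_MESSAGES may have translated.
        return $!{ENOENT} ? "no such file"
             : $!{EACCES} ? "permission denied"
             :              "other error " . ($! + 0);
    }

    print classify_open_failure("/no/such/dir/file.txt"), "\n";
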
676=head1 SECURITY
677
5a964f20 678Although the main discussion of Perl security issues can be found in
14280422
DD
679L<perlsec>, a discussion of Perl's locale handling would be incomplete
680if it did not draw your attention to locale-dependent security issues.
5a964f20
TC
681Locales--particularly on systems that allow unprivileged users to
682build their own locales--are untrustworthy. A malicious (or just plain
14280422
DD
683broken) locale can make a locale-aware application give unexpected
684results. Here are a few possibilities:
685
686=over 4
687
688=item *
689
690Regular expression checks for safe file names or mail addresses using
5a964f20 691C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
14280422
DD
692characters such as "E<gt>" and "|" are alphanumeric.
693
694=item *
695
e38874e2
DD
696String interpolation with case-mapping, as in, say, C<$dest =
697"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
698case-mapping table is in effect.
699
700=item *
701
14280422
DD
702A sneaky C<LC_COLLATE> locale could result in the names of students with
703"D" grades appearing ahead of those with "A"s.
704
705=item *
706
5a964f20 707An application that takes the trouble to use information in
14280422 708C<LC_MONETARY> may format debits as if they were credits and vice versa
5a964f20 709if that locale has been subverted. Or it might make payments in US
14280422
DD
710dollars instead of Hong Kong dollars.
711
712=item *
713
The date and day names in dates formatted by strftime() could be
manipulated to advantage by a malicious user able to subvert the
C<LC_TIME> locale. ("Look--it says I wasn't in the building on
Sunday.")
718
719=back
720
721Such dangers are not peculiar to the locale system: any aspect of an
5a964f20 722application's environment which may be modified maliciously presents
14280422 723similar challenges. Similarly, they are not specific to Perl: any
5a964f20 724programming language that allows you to write programs that take
14280422
DD
725account of their environment exposes you to these issues.
726
5a964f20
TC
727Perl cannot protect you from all possibilities shown in the
728examples--there is no substitute for your own vigilance--but, when
14280422 729C<use locale> is in effect, Perl uses the tainting mechanism (see
5a964f20 730L<perlsec>) to mark string results that become locale-dependent, and
14280422 731which may be untrustworthy in consequence. Here is a summary of the
5a964f20 732tainting behavior of operators and functions that may be affected by
14280422
DD
733the locale:
734
735=over 4
736
551e1d92
RB
737=item *
738
739B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):
14280422
DD
740
741Scalar true/false (or less/equal/greater) result is never tainted.
742
551e1d92
RB
743=item *
744
B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u> or C<\U>):
e38874e2
DD
746
747Result string containing interpolated material is tainted if
748C<use locale> is in effect.
749
551e1d92
RB
750=item *
751
752B<Matching operator> (C<m//>):
14280422
DD
753
Scalar true/false result is never tainted.
755
Subpatterns, either delivered as a list-context result or as $1 etc.,
are tainted if C<use locale> is in effect and the subpattern regular
expression contains C<\w> (to match an alphanumeric character), C<\W>
(non-alphanumeric character), C<\s> (whitespace character), or C<\S>
(non-whitespace character). The matched-pattern variables $&, $`
(pre-match), $' (post-match), and $+ (last match) are also tainted if
C<use locale> is in effect and the regular expression contains C<\w>,
C<\W>, C<\s>, or C<\S>.
14280422 764
551e1d92
RB
765=item *
766
767B<Substitution operator> (C<s///>):
14280422 768
Has the same behavior as the match operator. Also, the left
operand of C<=~> becomes tainted when C<use locale> is in effect
if modified as a result of a substitution based on a regular
expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
case-mapping with C<\l>, C<\L>, C<\u> or C<\U>.
14280422 774
551e1d92
RB
775=item *
776
777B<Output formatting functions> (printf() and write()):
14280422 778
3cf03d68
JH
779Results are never tainted because otherwise even output from print,
780for example C<print(1/7)>, should be tainted if C<use locale> is in
781effect.
14280422 782
551e1d92
RB
783=item *
784
785B<Case-mapping functions> (lc(), lcfirst(), uc(), ucfirst()):
14280422
DD
786
787Results are tainted if C<use locale> is in effect.
788
551e1d92
RB
789=item *
790
791B<POSIX locale-dependent functions> (localeconv(), strcoll(),
14280422
DD
792strftime(), strxfrm()):
793
794Results are never tainted.
795
551e1d92
RB
796=item *
797
798B<POSIX character class tests> (isalnum(), isalpha(), isdigit(),
14280422
DD
799isgraph(), islower(), isprint(), ispunct(), isspace(), isupper(),
800isxdigit()):
801
802True/false results are never tainted.
803
804=back
805
806Three examples illustrate locale-dependent tainting.
807The first program, which ignores its locale, won't run: a value taken
54310121 808directly from the command line may not be used to name an output file
14280422
DD
809when taint checks are enabled.
810
    #!/usr/local/bin/perl -T
    # Run with taint checking

    # Command line sanity check omitted...
    $tainted_output_file = shift;

    open(F, ">$tainted_output_file")
        or warn "Open of $tainted_output_file failed: $!\n";
819
820The program can be made to run by "laundering" the tainted value through
5a964f20
TC
821a regular expression: the second example--which still ignores locale
822information--runs, creating the file named on its command line
14280422
DD
823if it can.
824
    #!/usr/local/bin/perl -T

    $tainted_output_file = shift;
    $tainted_output_file =~ m%[\w/]+%;
    $untainted_output_file = $&;

    open(F, ">$untainted_output_file")
        or warn "Open of $untainted_output_file failed: $!\n";
833
5a964f20 834Compare this with a similar but locale-aware program:
14280422
DD
835
    #!/usr/local/bin/perl -T

    $tainted_output_file = shift;
    use locale;
    $tainted_output_file =~ m%[\w/]+%;
    $localized_output_file = $&;

    open(F, ">$localized_output_file")
        or warn "Open of $localized_output_file failed: $!\n";
845
846This third program fails to run because $& is tainted: it is the result
5a964f20 847of a match involving C<\w> while C<use locale> is in effect.
5f05dabc 848
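One possible way to keep the rest of a program locale-aware while
still laundering a value is to perform the match inside a scope where
C<no locale> is in effect. The following is only a sketch under that
assumption. C<launder_path()> is a hypothetical helper, the pattern is
anchored so that the whole name must pass the check, and the program
is meant to be run as C<perl -T>.

    use strict;
    use warnings;
    use locale;    # the rest of the program is locale-aware

    # Hypothetical helper: return the name if it consists only of
    # word characters and slashes, otherwise undef.
    sub launder_path {
        my ($candidate) = @_;
        no locale;    # match with locale-independent rules here
        return $candidate =~ m%\A([\w/]+)\z% ? $1 : undef;
    }

    my $tainted_output_file = shift;
    if (defined $tainted_output_file) {
        my $output_file = launder_path($tainted_output_file);
        defined $output_file
            or die "Unsafe characters in output file name\n";
        open(my $fh, ">", $output_file)
            or warn "Open of $output_file failed: $!\n";
    }

Whether a pattern like C<[\w/]+> is strict enough for your application
is a separate question. Laundering only tells Perl that you have
checked the value, not that the check was adequate.
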
849=head1 ENVIRONMENT
850
851=over 12
852
853=item PERL_BADLANG
854
A string that can suppress Perl's warning about failed locale settings
at startup. Failure can occur if the locale support in the operating
system is lacking (broken) in some way--or if you mistyped the name of
a locale when you set up your environment. If this environment
variable is absent, or has a value other than one that evaluates to
integer zero (that is, "0" or ""), Perl will complain about locale
setting failures.
5f05dabc 862
14280422
DD
863B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
864The message tells about some problem in your system's locale support,
865and you should investigate what the problem is.
5f05dabc 866
867=back
868
869The following environment variables are not specific to Perl: They are
14280422
DD
870part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale() method
871for controlling an application's opinion on data.
5f05dabc 872
873=over 12
874
875=item LC_ALL
876
5a964f20 877C<LC_ALL> is the "override-all" locale environment variable. If
5f05dabc 878set, it overrides all the rest of the locale environment variables.
879
528d65ad
JH
880=item LANGUAGE
881
B<NOTE>: C<LANGUAGE> is a GNU extension; it affects you only if you
are using the GNU libc, as is the case on, for example, Linux.
If you are using a "commercial" Unix, you are most probably I<not>
using GNU libc and can ignore C<LANGUAGE>.

If you are using C<LANGUAGE>, it affects the
language of informational, warning, and error messages output by
commands (in other words, it's like C<LC_MESSAGES>), but it has higher
priority than C<LC_ALL>. Moreover, it's not a single value but
instead a "path" (":"-separated list) of I<languages> (not locales).
See the GNU C<gettext> library documentation for more information.
528d65ad 893
5f05dabc 894=item LC_CTYPE
895
896In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
897locale. In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
898chooses the character type locale.
899
900=item LC_COLLATE
901
14280422
DD
902In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
903(sorting) locale. In the absence of both C<LC_ALL> and C<LC_COLLATE>,
904C<LANG> chooses the collation locale.
5f05dabc 905
906=item LC_MONETARY
907
14280422
DD
908In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
909formatting locale. In the absence of both C<LC_ALL> and C<LC_MONETARY>,
910C<LANG> chooses the monetary formatting locale.
5f05dabc 911
912=item LC_NUMERIC
913
In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
locale. In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
chooses the numeric format locale.
917
918=item LC_TIME
919
14280422
DD
920In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
921formatting locale. In the absence of both C<LC_ALL> and C<LC_TIME>,
922C<LANG> chooses the date and time formatting locale.
5f05dabc 923
924=item LANG
925
14280422
DD
926C<LANG> is the "catch-all" locale environment variable. If it is set, it
927is used as the last resort after the overall C<LC_ALL> and the
5f05dabc 928category-specific C<LC_...>.
929
930=back
931
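The precedence among C<LC_ALL>, the category-specific variables, and
C<LANG> can be summarized in a few lines of Perl. This sketch only
inspects environment variables and never calls setlocale().
C<locale_name_for()> is a hypothetical helper, and the treatment of
empty values and the final fallback to C<"C"> are assumptions about
typical systems. C<LANGUAGE> is left out because it concerns message
languages only.

    use strict;
    use warnings;

    # Hypothetical helper: report which locale name the environment
    # selects for a category such as "LC_TIME".
    sub locale_name_for {
        my ($category, $env) = @_;
        $env ||= \%ENV;
        for my $name ("LC_ALL", $category, "LANG") {
            my $value = $env->{$name};
            # Assumption: an empty value counts as unset.
            return $value if defined $value && length $value;
        }
        return "C";    # assumed default when nothing is set
    }

    print "LC_TIME would be: ", locale_name_for("LC_TIME"), "\n";
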
7e4353e9
RGS
932=head2 Examples
933
C<LC_NUMERIC> controls numeric output:
935
ef3087ec
KW
936 use locale;
937 use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
938 setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
939 printf "%g\n", 1.23; # If the "fr_FR" succeeded, probably shows 1,23.
7e4353e9
RGS
940
941and also how strings are parsed by POSIX::strtod() as numbers:
942
ef3087ec
KW
943 use locale;
944 use POSIX qw(locale_h strtod);
945 setlocale(LC_NUMERIC, "de_DE") or die "Entschuldigung";
946 my $x = strtod("2,34") + 5;
947 print $x, "\n"; # Probably shows 7,34.
7e4353e9 948
5f05dabc 949=head1 NOTES
950
951=head2 Backward compatibility
952
b0c42ed9 953Versions of Perl prior to 5.004 B<mostly> ignored locale information,
5a964f20
TC
954generally behaving as if something similar to the C<"C"> locale were
955always in force, even if the program environment suggested otherwise
956(see L<The setlocale function>). By default, Perl still behaves this
957way for backward compatibility. If you want a Perl application to pay
958attention to locale information, you B<must> use the S<C<use locale>>
062ca197
KW
959pragma (see L<The use locale pragma>) or, in the unlikely event
960that you want to do so for just pattern matching, the
70709c68
KW
961C</l> regular expression modifier (see L<perlre/Character set
962modifiers>) to instruct it to do so.
b0c42ed9
JH
963
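As a small illustration of the C</l> modifier, the sketch below asks
whether every character of a string is a word character according to
the current C<LC_CTYPE> locale, without placing the whole program
under C<use locale>. It assumes Perl 5.14 or later, and the answer it
prints depends on the locale named by your environment.

    use strict;
    use warnings;
    use POSIX qw(locale_h);

    setlocale(LC_CTYPE, "");    # take LC_CTYPE from the environment

    my $word = "na\xEFve";      # 0xEF is i-diaeresis in ISO8859-1
    if ($word =~ /\A\w+\z/l) {
        print "all word characters in this locale\n";
    } else {
        print "0xEF is not a word character in this locale\n";
    }
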
964Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
5a964f20
TC
965information if available; that is, C<\w> did understand what
966were the letters according to the locale environment variables.
b0c42ed9
JH
967The problem was that the user had no control over the feature:
968if the C library supported locales, Perl used them.
969
=head2 I18N::Collate obsolete
971
5a964f20 972In versions of Perl prior to 5.004, per-locale collation was possible
b0c42ed9
JH
973using the C<I18N::Collate> library module. This module is now mildly
974obsolete and should be avoided in new applications. The C<LC_COLLATE>
975functionality is now integrated into the Perl core language: One can
976use locale-specific scalar data completely normally with C<use locale>,
977so there is no longer any need to juggle with the scalar references of
978C<I18N::Collate>.
5f05dabc 979
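For instance, a sketch along the following lines sorts ordinary
strings by the collation rules of the current locale. The word list is
made up, and the resulting order depends on the C<LC_COLLATE> locale
in your environment.

    use strict;
    use warnings;
    use locale;
    use POSIX qw(locale_h);

    setlocale(LC_COLLATE, "");    # take LC_COLLATE from the environment

    my @names  = qw(zebra apple Orange banana);
    # Under "use locale", cmp compares according to LC_COLLATE.
    my @sorted = sort { $a cmp $b } @names;
    print "@sorted\n";
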
14280422 980=head2 Sort speed and memory use impacts
5f05dabc 981
982Comparing and sorting by locale is usually slower than the default
14280422
DD
983sorting; slow-downs of two to four times have been observed. It will
984also consume more memory: once a Perl scalar variable has participated
985in any string comparison or sorting operation obeying the locale
986collation rules, it will take 3-15 times more memory than before. (The
987exact multiplier depends on the string's contents, the operating system
988and the locale.) These downsides are dictated more by the operating
989system's implementation of the locale system than by Perl.
5f05dabc 990
e38874e2
DD
991=head2 write() and LC_NUMERIC
992
903eb63f
NT
993If a program's environment specifies an LC_NUMERIC locale and C<use
994locale> is in effect when the format is declared, the locale is used
995to specify the decimal point character in formatted output. Formatted
996output cannot be controlled by C<use locale> at the time when write()
997is called.
e38874e2 998
5f05dabc 999=head2 Freely available locale definitions
1000
08d7a6b2
LB
1001There is a large collection of locale definitions at:
1002
1003 http://std.dkuug.dk/i18n/WG15-collection/locales/
1004
1005You should be aware that it is
14280422 1006unsupported, and is not claimed to be fit for any purpose. If your
5a964f20 1007system allows installation of arbitrary locales, you may find the
14280422
DD
1008definitions useful as they are, or as a basis for the development of
1009your own locales.
5f05dabc 1010
14280422 1011=head2 I18n and l10n
5f05dabc 1012
b0c42ed9
JH
1013"Internationalization" is often abbreviated as B<i18n> because its first
1014and last letters are separated by eighteen others. (You may guess why
1015the internalin ... internaliti ... i18n tends to get abbreviated.) In
1016the same way, "localization" is often abbreviated to B<l10n>.
14280422
DD
1017
1018=head2 An imperfect standard
1019
1020Internationalization, as defined in the C and POSIX standards, can be
1021criticized as incomplete, ungainly, and having too large a granularity.
1022(Locales apply to a whole process, when it would arguably be more useful
1023to have them apply to a single thread, window group, or whatever.) They
1024also have a tendency, like standards groups, to divide the world into
1025nations, when we all know that the world can equally well be divided
e199995e 1026into bankers, bikers, gamers, and so on.
5f05dabc 1027
b310b053
JH
1028=head1 Unicode and UTF-8
1029
Unicode support was introduced in Perl version 5.6 and was more fully
implemented in version 5.8 and later. See L<perluniintro>. Perl tries to
work with both Unicode and locales--but of course, there are problems.
e199995e
KW
1033
Perl does not handle multi-byte locales such as Big5 or Shift JIS, which
have been used for various Asian languages. However, the increasingly
common multi-byte UTF-8 locales, if properly implemented, may work
reasonably well (depending on your C library implementation) in this
form of the locale pragma, simply because both
they and Perl store characters that take up multiple bytes the same way.
However, some, if not most, C library implementations may not process
the characters in the upper half of the Latin-1 range (128 - 255)
properly under C<LC_CTYPE>. To see if a character is a particular type
under a locale, Perl uses functions like C<isalnum()>. Your C
library may not work for UTF-8 locales with those functions, instead
only working under the newer wide library functions like C<iswalnum()>.
e199995e
KW
1046
Perl generally takes the tack of using locale rules on code points that can fit
in a single byte, and Unicode rules for those that can't (though this wasn't
uniformly applied prior to Perl 5.14). This prevents many problems in locales
that aren't UTF-8. Suppose the locale is ISO8859-7, Greek. The character at
0xD7 there is a capital Chi. But in the ISO8859-1 locale, Latin1, it is a
multiplication sign. The POSIX regular expression character class
C<[[:alpha:]]> will magically match 0xD7 in the Greek locale but not in the
Latin one, even if the string is encoded in UTF-8, which would normally imply
Unicode semantics. (The "U" in UTF-8 stands for Unicode.)
e199995e
KW
1056
However, there are places where this breaks down. Certain constructs are
for Unicode only, such as C<\p{Alpha}>. They assume that 0xD7 always has its
Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a
subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
Unicode, C<\p{Alpha}> will never match it, regardless of locale. A similar
issue occurs with C<\N{...}>. It is therefore a bad idea to use C<\p{}> or
C<\N{}> under C<use locale>--I<unless> you can guarantee that the locale will
be an ISO8859-1 or UTF-8 one. Use POSIX character classes instead.
1065
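The difference can be observed with a sketch such as the following,
which tests the single character 0xD7 against both kinds of class
under C<use locale>. The first line of output depends on the current
C<LC_CTYPE> locale. The second should report no match on an ASCII
platform, because C<\p{Alpha}> follows Unicode rules.

    use strict;
    use warnings;
    use locale;

    my $char = "\xD7";
    printf "[[:alpha:]] %s 0xD7 in the current locale\n",
        $char =~ /[[:alpha:]]/ ? "matches" : "does not match";
    printf "\\p{Alpha} %s 0xD7\n",
        $char =~ /\p{Alpha}/ ? "matches" : "does not match";
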
e199995e
KW
1066
1067The same problem ensues if you enable automatic UTF-8-ification of your
1068standard file handles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
b4ffc3db
TC
1069non-UTF-8 locales (by using either the B<-C> command line switch or the
1070C<PERL_UNICODE> environment variable; see L<perlrun>).
1071Things are read in as UTF-8, which would normally imply a Unicode
1072interpretation, but the presence of a locale causes them to be interpreted
1073in that locale instead. For example, a 0xD7 code point in the Unicode
1074input, which should mean the multiplication sign, won't be interpreted by
1075Perl that way under the Greek locale. Again, this is not a problem
1076I<provided> you make certain that all locales will always and only be either
1077an ISO8859-1 or a UTF-8 locale.
1078
1079Vendor locales are notoriously buggy, and it is difficult for Perl to test
1080its locale-handling code because this interacts with code that Perl has no
1081control over; therefore the locale-handling code in Perl may be buggy as
1082well. But if you I<do> have locales that work, using them may be
1083worthwhile for certain specific purposes, as long as you keep in mind the
1084gotchas already mentioned. For example, collation runs faster under
1085locales than under L<Unicode::Collate> (albeit with less flexibility), and
1086you gain access to such things as the local currency symbol and the names
1087of the months and days of the week.
b310b053 1088
5f05dabc 1089=head1 BUGS
1090
1091=head2 Broken systems
1092
5a964f20 1093In certain systems, the operating system's locale support
2bdf8add 1094is broken and cannot be fixed or used by Perl. Such deficiencies can
b4ffc3db 1095and will result in mysterious hangs and/or Perl core dumps when
2bdf8add 1096C<use locale> is in effect. When confronted with such a system,
7f2de2d2 1097please report in excruciating detail to <F<perlbug@perl.org>>, and
b4ffc3db 1098also contact your vendor: bug fixes may exist for these problems
2bdf8add
JH
1099in your operating system. Sometimes such bug fixes are called an
1100operating system upgrade.
5f05dabc 1101
1102=head1 SEE ALSO
1103
b310b053
JH
1104L<I18N::Langinfo>, L<perluniintro>, L<perlunicode>, L<open>,
1105L<POSIX/isalnum>, L<POSIX/isalpha>,
4bbcc6e8
JH
1106L<POSIX/isdigit>, L<POSIX/isgraph>, L<POSIX/islower>,
1107L<POSIX/isprint>, L<POSIX/ispunct>, L<POSIX/isspace>,
1108L<POSIX/isupper>, L<POSIX/isxdigit>, L<POSIX/localeconv>,
1109L<POSIX/setlocale>, L<POSIX/strcoll>, L<POSIX/strftime>,
1110L<POSIX/strtod>, L<POSIX/strxfrm>.
5f05dabc 1111
1112=head1 HISTORY
1113
b0c42ed9 1114Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
5a964f20 1115Dunlop, assisted by the perl5-porters. Prose worked over a bit by
c052850d 1116Tom Christiansen, and updated by Perl 5 porters.