pod/perllocale.pod

   1 =encoding utf8
   2
   3 =head1 NAME
   4
   5 perllocale - Perl locale handling (internationalization and localization)
   6
   7 =head1 DESCRIPTION
   8
   9 In the beginning there was ASCII, the "American Standard Code for
  10 Information Interchange", which works quite well for Americans with
  11 their English alphabet and dollar-denominated currency.  But it doesn't
  12 work so well even for other English speakers, who may use different
  13 currencies, such as the pound sterling (as the symbol for that currency
  14 is not in ASCII); and it's hopelessly inadequate for many of the
  15 thousands of the world's other languages.
  16
  17 To address these deficiencies, the concept of locales was invented
  18 (formally the ISO C, XPG4, POSIX 1.c "locale system").  And applications
  19 were and are being written that use the locale mechanism.  The process of
  20 making such an application take account of its users' preferences in
  21 these kinds of matters is called B<internationalization> (often
  22 abbreviated as B<i18n>); telling such an application about a particular
  23 set of preferences is known as B<localization> (B<l10n>).
  24
  25 Perl has been extended to support certain types of locales available in
  26 the locale system.  This is controlled per application by using one
  27 pragma, one function call, and several environment variables.
  28
  29 Perl supports single-byte locales that are supersets of ASCII, such as
  30 the ISO 8859 ones, and one multi-byte-type locale, UTF-8 ones, described
  31 in the next paragraph.  Perl doesn't support any other multi-byte
  32 locales, such as the ones for East Asian languages.
  33
  34 Unfortunately, there are quite a few deficiencies with the design (and
  35 often, the implementations) of locales.  Unicode was invented (see
  36 L<perlunitut> for an introduction to that) in part to address these
  37 design deficiencies, and nowadays, there is a series of "UTF-8
  38 locales", based on Unicode.  These are locales whose character set is
  39 Unicode, encoded in UTF-8.  Starting in v5.20, Perl fully supports
  40 UTF-8 locales, except for sorting and string comparisons like C<lt> and
  41 C<ge>.  Starting in v5.26, Perl can handle these reasonably as well,
  42 depending on the platform's implementation.  However, for earlier
  43 releases or for better control, use L<Unicode::Collate>.  There are
  44 actually two slightly different types of UTF-8 locales: one for Turkic
  45 languages and one for everything else.  Starting in Perl v5.30, Perl
  46 seamlessly handles both types; previously only the non-Turkic one was
  47 supported.
  48
  49 Perl continues to support the old non-UTF-8 locales as well.  There are
  50 currently no UTF-8 locales for EBCDIC platforms.
  51
  52 (Unicode is also creating C<CLDR>, the "Common Locale Data Repository",
  53 L<http://cldr.unicode.org/> which includes more types of information than
  54 are available in the POSIX locale system.  At the time of this writing,
  55 there was no CPAN module that provides access to this XML-encoded data.
  56 However, it is possible to compute the POSIX locale data from them, and
  57 earlier CLDR versions had these already extracted for you as UTF-8 locales
  58 L<http://unicode.org/Public/cldr/2.0.1/>.)
  59
  60 =head1 WHAT IS A LOCALE
  61
  62 A locale is a set of data that describes various aspects of how various
  63 communities in the world categorize their world.  These categories are
  64 broken down into the following types (some of which include a brief
  65 note here):
  66
  67 =over
  68
  69 =item Category C<LC_NUMERIC>: Numeric formatting
  70
  71 This indicates how numbers should be formatted for human readability,
  72 for example the character used as the decimal point.
  73
  74 =item Category C<LC_MONETARY>: Formatting of monetary amounts
  75
  76 Z<>
  77
  78 =item Category C<LC_TIME>: Date/Time formatting
  79
  80 Z<>
  81
  82 =item Category C<LC_MESSAGES>: Error and other messages
  83
  84 This is used by Perl itself only for accessing operating system error
  85 messages via L<$!|perlvar/$ERRNO> and L<$^E|perlvar/$EXTENDED_OS_ERROR>.
  86
  87 =item Category C<LC_COLLATE>: Collation
  88
  89 This indicates the ordering of letters for comparison and sorting.
  90 In Latin alphabets, for example, "b" generally follows "a".
  91
  92 =item Category C<LC_CTYPE>: Character Types
  93
  94 This indicates, for example, whether a character is an uppercase letter.
  95
  96 =item Other categories
  97
  98 Some platforms have other categories, dealing with such things as
  99 measurement units and paper sizes.  None of these are used directly by
 100 Perl, but outside operations that Perl interacts with may use
 101 these.  See L</Not within the scope of "use locale"> below.
 102
 103 =back
 104
 105 More details on the categories used by Perl are given below in L</LOCALE
 106 CATEGORIES>.
 107
 108 Together, these categories go a long way towards being able to customize
 109 a single program to run in many different locations.  But there are
 110 deficiencies, so keep reading.
 111
 112 =head1 PREPARING TO USE LOCALES
 113
 114 Perl itself (outside the L<POSIX> module) will not use locales unless
 115 specifically requested to (but
 116 again note that Perl may interact with code that does use them).  Even
 117 if there is such a request, B<all> of the following must be true
 118 for it to work properly:
 119
 120 =over 4
 121
 122 =item *
 123
 124 B<Your operating system must support the locale system>.  If it does,
 125 you should find that the C<setlocale()> function is a documented part of
 126 its C library.
 127
 128 =item *
 129
 130 B<Definitions for locales that you use must be installed>.  You, or
 131 your system administrator, must make sure that this is the case. The
 132 available locales, the location in which they are kept, and the manner
 133 in which they are installed all vary from system to system.  Some systems
 134 provide only a few, hard-wired locales and do not allow more to be
 135 added.  Others allow you to add "canned" locales provided by the system
 136 supplier.  Still others allow you or the system administrator to define
 137 and add arbitrary locales.  (You may have to ask your supplier to
 138 provide canned locales that are not delivered with your operating
 139 system.)  Read your system documentation for further illumination.
 140
 141 =item *
 142
 143 B<Perl must believe that the locale system is supported>.  If it does,
 144 C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
 145 C<define>.
 146
 147 =back
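
The first and third conditions above can be checked from inside a
program.  The following is a minimal sketch, assuming a perl of at least
v5.10 (for the C<//> operator); the variable names are illustrative only.
It reads the same build-time value that C<perl -V:d_setlocale> reports,
then queries the current locale without changing it.

```perl
use strict;
use warnings;
use Config;                          # build-time answers, as shown by perl -V
use POSIX qw(setlocale LC_ALL);

# True only if Configure found setlocale() when this perl was built.
my $built_with_locales = ($Config{d_setlocale} // '') eq 'define';

# The one-argument form only queries; it never changes the locale.
my $current = $built_with_locales ? setlocale(LC_ALL) : undef;

printf "d_setlocale:    %s\n", $Config{d_setlocale} // '(undef)';
printf "current LC_ALL: %s\n", $current // '(unavailable)';
```

Whether the locale definitions you need are actually installed (the
second condition) cannot be read from C<%Config>; that has to be tested
by trying to switch to them.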
 148
 149 If you want a Perl application to process and present your data
 150 according to a particular locale, the application code should include
 151 the S<C<use locale>> pragma (see L</The "use locale" pragma>) where
 152 appropriate, and B<at least one> of the following must be true:
 153
 154 =over 4
 155
 156 =item 1
 157
 158 B<The locale-determining environment variables (see L</"ENVIRONMENT">)
 159 must be correctly set up> at the time the application is started, either
 160 by yourself or by whoever set up your system account; or
 161
 162 =item 2
 163
 164 B<The application must set its own locale> using the method described in
 165 L</The setlocale function>.
 166
 167 =back
 168
 169 =head1 USING LOCALES
 170
 171 =head2 The C<"use locale"> pragma
 172
 173 Starting in Perl 5.28, this pragma may be used in
 174 L<multi-threaded|threads> applications on systems that have thread-safe
 175 locale ability.  Some caveats apply; see L</Multi-threaded> below.  On
 176 systems without this capability, or in earlier Perls, do NOT use this
 177 pragma in scripts that have multiple L<threads|threads> active.  The
 178 locale in these cases is not local to a single thread.  Another thread
 179 may change the locale at any time, so that, at a minimum, a given
 180 thread may find itself operating in a locale it isn't expecting.  On
 181 some platforms, segfaults can also occur.  The locale change need not be
 182 explicit; some operations cause perl to change the locale itself.  You
 183 are vulnerable simply by having done a S<C<"use locale">>.
 184
 185 By default, Perl itself (outside the L<POSIX> module)
 186 ignores the current locale.  The S<C<use locale>>
 187 pragma tells Perl to use the current locale for some operations.
 188 Starting in v5.16, there are optional parameters to this pragma,
 189 described below, which restrict which operations are affected by it.
 190
 191 The current locale is set at execution time by
 192 L<setlocale()|/The setlocale function> described below.  If that function
 193 hasn't yet been called in the course of the program's execution, the
 194 current locale is that which was determined by the L</"ENVIRONMENT"> in
 195 effect at the start of the program.
 196 If there is no valid environment, the current locale is whatever the
 197 system default has been set to.   On POSIX systems, it is likely, but
 198 not necessarily, the "C" locale.  On Windows, the default is set via the
 199 computer's S<C<Control Panel-E<gt>Regional and Language Options>> (or its
 200 current equivalent).
 201
 202 The operations that are affected by locale are:
 203
 204 =over 4
 205
 206 =item B<Not within the scope of C<"use locale">>
 207
 208 Only certain operations (all originating outside Perl) should be
 209 affected, as follows:
 210
 211 =over 4
 212
 213 =item *
 214
 215 The current locale is used when going outside of Perl with
 216 operations like L<system()|perlfunc/system LIST> or
 217 L<qxE<sol>E<sol>|perlop/qxE<sol>STRINGE<sol>>, if those operations are
 218 locale-sensitive.
 219
 220 =item *
 221
 222 Also Perl gives access to various C library functions through the
 223 L<POSIX> module.  Some of those functions are always affected by the
 224 current locale.  For example, C<POSIX::strftime()> uses C<LC_TIME>;
 225 C<POSIX::strtod()> uses C<LC_NUMERIC>; C<POSIX::strcoll()> and
 226 C<POSIX::strxfrm()> use C<LC_COLLATE>.  All such functions
 227 will behave according to the current underlying locale, even if that
 228 locale isn't exposed to Perl space.
 229
 230 This applies as well to L<I18N::Langinfo>.
 231
 232 =item *
 233
 234 XS modules for all categories but C<LC_NUMERIC> get the underlying
 235 locale, and hence any C library functions they call will use that
 236 underlying locale.  For more discussion, see L<perlxs/CAVEATS>.
 237
 238 =back
 239
 240 Note that all C programs (including the perl interpreter, which is
 241 written in C) always have an underlying locale.  That locale is the "C"
 242 locale unless changed by a call to L<setlocale()|/The setlocale
 243 function>.  When Perl starts up, it changes the underlying locale to the
 244 one which is indicated by the L</ENVIRONMENT>.  When using the L<POSIX>
 245 module or writing XS code, it is important to keep in mind that the
 246 underlying locale may be something other than "C", even if the program
 247 hasn't explicitly changed it.
 248
 249 Z<>
 250
 251 =item B<Lingering effects of C<S<use locale>>>
 252
 253 Certain Perl operations that are set-up within the scope of a
 254 C<use locale> retain that effect even outside the scope.
 255 These include:
 256
 257 =over 4
 258
 259 =item *
 260
 261 The output format of a L<write()|perlfunc/write> is determined by an
 262 earlier format declaration (L<perlfunc/format>), so whether or not the
 263 output is affected by locale depends on whether the format declaration
 264 is within the scope of a C<use locale>, not on whether the C<write()>
 265 is.
 266
 267 =item *
 268
 269 Regular expression patterns can be compiled using
 270 L<qrE<sol>E<sol>|perlop/qrE<sol>STRINGE<sol>msixpodualn> with actual
 271 matching deferred to later.  Again, it is whether or not the compilation
 272 was done within the scope of C<use locale> that determines the match
 273 behavior, not whether the matches are done within such a scope.
 274
 275 =back
 276
 277 Z<>
 278
 279 =item B<Under C<"use locale";>>
 280
 281 =over 4
 282
 283 =item *
 284
 285 All the above operations
 286
 287 =item *
 288
 289 B<Format declarations> (L<perlfunc/format>) and hence any subsequent
 290 C<write()>s use C<LC_NUMERIC>.
 291
 292 =item *
 293
 294 B<stringification and output> use C<LC_NUMERIC>.
 295 These include the results of
 296 C<print()>,
 297 C<printf()>,
 298 C<say()>,
 299 and
 300 C<sprintf()>.
 301
 302 =item *
 303
 304 B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) use
 305 C<LC_COLLATE>.  C<sort()> is also affected if used without an
 306 explicit comparison function, because it uses C<cmp> by default.
 307
 308 B<Note:> C<eq> and C<ne> are unaffected by locale: they always
 309 perform a char-by-char comparison of their scalar operands.  What's
 310 more, if C<cmp> finds that its operands are equal according to the
 311 collation sequence specified by the current locale, it goes on to
 312 perform a char-by-char comparison, and only returns I<0> (equal) if the
 313 operands are char-for-char identical.  If you really want to know whether
 314 two strings--which C<eq> and C<cmp> may consider different--are equal
 315 as far as collation in the locale is concerned, see the discussion in
 316 L</Category C<LC_COLLATE>: Collation>.
 317
 318 =item *
 319
 320 B<Regular expressions and case-modification functions> (C<uc()>, C<lc()>,
 321 C<ucfirst()>, and C<lcfirst()>) use C<LC_CTYPE>
 322
 323 =item *
 324
 325 B<The variables L<$!|perlvar/$ERRNO>> (and its synonyms C<$ERRNO> and
 326 C<$OS_ERROR>) B<and L<$^E|perlvar/$EXTENDED_OS_ERROR>> (and its synonym
 327 C<$EXTENDED_OS_ERROR>) when used as strings use C<LC_MESSAGES>.
 328
 329 =back
 330
 331 =back
 332
 333 The default behavior is restored with the S<C<no locale>> pragma, or
 334 upon reaching the end of the block enclosing C<use locale>.
 335 Note that C<use locale> calls may be
 336 nested, and that what is in effect within an inner scope will revert to
 337 the outer scope's rules at the end of the inner scope.
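
The lexical scoping described above can be sketched as follows.  This is
an illustration under stated assumptions, not a definitive test: the
word list and variable names are made up, and the order produced inside
the block depends entirely on whichever C<LC_COLLATE> locale the
environment selected when the program started.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

my @words = qw(banana Apple cherry apple Banana);

# Outside any "use locale": cmp compares code points, whatever the locale.
my @plain = sort @words;

my @localized;
{
    use locale;                      # lexically scoped to this block
    # sort() without a comparison function uses cmp, hence LC_COLLATE.
    @localized = sort @words;
}

# The block has ended, so the default (code point) rules apply again.
my @plain_again = sort @words;

print "code point order: @plain\n";
print "locale order (", setlocale(LC_COLLATE) // 'unknown', "): @localized\n";
```

In the "C" locale both lines print the same order; in many natural
language locales the second line interleaves upper and lower case.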
 338
 339 The string result of any operation that uses locale
 340 information is tainted, as it is possible for a locale to be
 341 untrustworthy.  See L</"SECURITY">.
 342
 343 Starting in Perl v5.16 in a very limited way, and more generally in
 344 v5.22, you can restrict which category or categories are enabled by this
 345 particular instance of the pragma by adding parameters to it.  For
 346 example,
 347
 348  use locale qw(:ctype :numeric);
 349
 350 enables locale awareness within its scope of only those operations
 351 (listed above) that are affected by C<LC_CTYPE> and C<LC_NUMERIC>.
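
As a rough sketch of restricting the pragma to one category, the
following enables only C<LC_NUMERIC> handling inside a block.  It
assumes v5.22 or later; the variable names are illustrative, and the
radix character you see depends on the locale your environment selects.
C<sprintf()> is used rather than plain interpolation so that both
results are formatted afresh.

```perl
use v5.22;                           # the general category parameters need v5.22
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

my $value = 1234.5;

# No locale in scope: the decimal point is always ".".
my $outside = sprintf '%.1f', $value;

my $inside;
{
    use locale ':numeric';           # only LC_NUMERIC-driven operations change
    # The radix character now comes from the current LC_NUMERIC locale.
    $inside = sprintf '%.1f', $value;
}

say "outside: $outside";
say "inside (", setlocale(LC_NUMERIC) // 'unknown', "): $inside";
```

Collation and case mapping inside that block still follow Perl's
default, locale-independent rules, because C<:collate> and C<:ctype>
were not requested.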
 352
 353 The possible categories are: C<:collate>, C<:ctype>, C<:messages>,
 354 C<:monetary>, C<:numeric>, C<:time>, and the pseudo category
 355 C<:characters> (described below).
 356
 357 Thus you can say
 358
 359  use locale ':messages';
 360
 361 and only L<$!|perlvar/$ERRNO> and L<$^E|perlvar/$EXTENDED_OS_ERROR>
 362 will be locale aware.  Everything else is unaffected.
 363
 364 Since Perl doesn't currently do anything with the C<LC_MONETARY>
 365 category, specifying C<:monetary> does effectively nothing.  Some
 366 systems have other categories, such as C<LC_PAPER>, but Perl
 367 also doesn't do anything with them, and there is no way to specify
 368 them in this pragma's arguments.
 369
 370 You can also easily enable every category but one with, for example,
 371 either of
 372
 373  use locale ':!ctype';
 374  use locale ':not_ctype';
 375
 376 both of which enable locale awareness of all categories but
 377 C<LC_CTYPE>.  Only one category argument may be specified in a
 378 S<C<use locale>> if it is of the negated form.
 379
 380 Prior to v5.22 only one form of the pragma with arguments is available:
 381
 382  use locale ':not_characters';
 383
 384 (and you have to say C<not_>; you can't use the bang C<!> form).  This
 385 pseudo category is a shorthand for specifying both C<:collate> and
 386 C<:ctype>.  Hence, in the negated form, it is nearly the same thing as
 387 saying
 388
 389  use locale qw(:messages :monetary :numeric :time);
 390
 391 We use the term "nearly", because C<:not_characters> also turns on
 392 S<C<use feature 'unicode_strings'>> within its scope.  This form is
 393 less useful in v5.20 and later, and is described fully in
 394 L</Unicode and UTF-8>, but briefly, it tells Perl to not use the
 395 character portions of the locale definition, that is the C<LC_CTYPE> and
 396 C<LC_COLLATE> categories.  Instead it will use the native character set
 397 (extended by Unicode).  When using this parameter, you are responsible
 398 for getting the external character set translated into the
 399 native/Unicode one (which it already will be if it is one of the
 400 increasingly popular UTF-8 locales).  There are convenient ways of doing
 401 this, as described in L</Unicode and UTF-8>.
 402
 403 =head2 The setlocale function
 404
 405 WARNING!  Prior to Perl 5.28 or on a system that does not support
 406 thread-safe locale operations, do NOT use this function in a
 407 L<thread|threads>.  The locale will change in all other threads at the
 408 same time, and should your thread get paused by the operating system,
 409 and another started, that thread will not have the locale it is
 410 expecting.  On some platforms, there can be a race leading to segfaults
 411 if two threads call this function nearly simultaneously.
 412
 413 You can switch locales as often as you wish at run time with the
 414 C<POSIX::setlocale()> function:
 415
 416         # Import locale-handling tool set from POSIX module.
 417         # This example uses: setlocale -- the function call
 418         #                    LC_CTYPE -- explained below
 419         # (Showing the testing for success/failure of operations is
 420         # omitted in these examples to avoid distracting from the main
 421         # point)
 422
 423         use POSIX qw(locale_h);
 424         use locale;
 425         my $old_locale;
 426
 427         # query and save the old locale
 428         $old_locale = setlocale(LC_CTYPE);
 429
 430         setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
 431         # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
 432
 433         setlocale(LC_CTYPE, "");
 434         # LC_CTYPE now reset to the default defined by the
 435         # LC_ALL/LC_CTYPE/LANG environment variables, or to the system
 436         # default.  See below for documentation.
 437
 438         # restore the old locale
 439         setlocale(LC_CTYPE, $old_locale);
 440
 441 The first argument of C<setlocale()> gives the B<category>, the second the
 442 B<locale>.  The category selects the aspect of data processing to which
 443 locale-specific rules apply.  Category names are discussed in
 444 L</LOCALE CATEGORIES> and L</"ENVIRONMENT">.  The locale is the name of a
 445 collection of customization information corresponding to a particular
 446 combination of language, country or territory, and codeset.  Read on for
 447 hints on the naming of locales: not all systems name locales as in the
 448 example.
 449
 450 If no second argument is provided and the category is something other
 451 than C<LC_ALL>, the function returns a string naming the current locale
 452 for the category.  You can use this value as the second argument in a
 453 subsequent call to C<setlocale()>, B<but> on some platforms the string
 454 is opaque, not something that most people would be able to decipher as
 455 to what locale it means.
 456
 457 If no second argument is provided and the category is C<LC_ALL>, the
 458 result is implementation-dependent.  It may be a string of
 459 concatenated locale names (separator also implementation-dependent)
 460 or a single locale name.  Please consult your L<setlocale(3)> man page for
 461 details.
 462
 463 If a second argument is given and it corresponds to a valid locale,
 464 the locale for the category is set to that value, and the function
 465 returns the now-current locale value.  You can then use this in yet
 466 another call to C<setlocale()>.  (In some implementations, the return
 467 value may sometimes differ from the value you gave as the second
 468 argument--think of it as an alias for the value you gave.)
 469
 470 As the example shows, if the second argument is an empty string, the
 471 category's locale is returned to the default specified by the
 472 corresponding environment variables.  Generally, this results in a
 473 return to the default that was in force when Perl started up: changes
 474 to the environment made by the application after startup may or may not
 475 be noticed, depending on your system's C library.
 476
 477 Note that when a form of C<use locale> that doesn't include all
 478 categories is specified, Perl ignores the excluded categories.
 479
 480 If C<setlocale()> fails for some reason (for example, an attempt to set
 481 to a locale unknown to the system), the locale for the category is not
 482 changed, and the function returns C<undef>.
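
Because failure is signalled only by an C<undef> return value, it is
worth checking every call that sets a locale.  The sketch below wraps
the save/switch/restore pattern in a helper; C<with_ctype_locale()> is a
hypothetical name, not part of L<POSIX>, and the bogus locale name is
only assumed to be absent from your system (a few C libraries accept
almost any name).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Run $code with LC_CTYPE switched to $name, restoring the previous
# locale afterwards.  Returns undef, without running $code, if the
# switch fails.  (Hypothetical helper for illustration.)
sub with_ctype_locale {
    my ($name, $code) = @_;
    my $old = setlocale(LC_CTYPE);          # one argument: query only
    my $new = setlocale(LC_CTYPE, $name);   # undef on failure, locale unchanged
    return unless defined $new;
    my $result = eval { $code->($new) };
    my $err = $@;
    setlocale(LC_CTYPE, $old) if defined $old;   # restore even if $code died
    die $err if $err;
    return $result;
}

my $ok  = with_ctype_locale('C', sub { "ran in $_[0]" });
my $bad = with_ctype_locale('no_SUCH.locale-name', sub { 'never runs' });

print defined $ok  ? "$ok\n"  : "could not switch to C\n";
print defined $bad ? "$bad\n" : "bogus locale rejected, locale unchanged\n";
```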
 483
 484 Starting in Perl 5.28, on multi-threaded perls compiled on systems that
 485 implement POSIX 2008 thread-safe locale operations, this function
 486 doesn't actually call the system C<setlocale>.  Instead those
 487 thread-safe operations are used to emulate the C<setlocale> function,
 488 but in a thread-safe manner.
 489
 490 You can force the thread-safe locale operations to always be used (if
 491 available) by recompiling perl with
 492
 493  -Accflags='-DUSE_THREAD_SAFE_LOCALE'
 494
 495 added to your call to F<Configure>.
 496
 497 For further information about the categories, consult L<setlocale(3)>.
 498
 499 =head2 Multi-threaded operation
 500
 501 Beginning in Perl 5.28, multi-threaded locale operation is supported on
 502 systems that implement either the POSIX 2008 or Windows-specific
 503 thread-safe locale operations.  Many modern systems, such as various
 504 Unix variants and Darwin, do have this.
 505
 506 You can tell if using locales is safe on your system by looking at the
 507 read-only boolean variable C<${^SAFE_LOCALES}>.  The value is 1 if the
 508 perl is not threaded, or if it is using thread-safe locale operations.
 509
 510 Thread-safe operations are supported in Windows starting in Visual Studio
 511 2005, and in systems compatible with POSIX 2008.  Some platforms claim
 512 to support POSIX 2008, but have buggy implementations, so that the hints
 513 files for compiling to run on them turn off attempting to use
 514 thread-safety.  C<${^SAFE_LOCALES}> will be 0 on them.
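
A program can consult this variable before deciding whether to touch
locales from threads.  The following is a minimal sketch; it assumes
only that C<${^SAFE_LOCALES}> is undefined on perls that predate it, and
the messages are illustrative.

```perl
use strict;
use warnings;
use Config;

my $threaded = $Config{useithreads} ? 1 : 0;
my $safe     = ${^SAFE_LOCALES};     # undef on perls older than v5.28

if (!defined $safe) {
    print "perl ", $], " predates \${^SAFE_LOCALES}; ",
          "assume locales are not thread-safe\n";
}
elsif ($safe) {
    print $threaded
        ? "threaded perl with thread-safe locale operations\n"
        : "unthreaded perl: locale use is safe\n";
}
else {
    print "threaded perl without thread-safe locales: ",
          "avoid changing the locale in threads\n";
}
```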
 515
 516 Be aware that writing a multi-threaded application will not be portable
 517 to a platform which lacks the native thread-safe locale support.  On
 518 systems that do have it, you automatically get this behavior for
 519 threaded perls, without having to do anything.  If for some reason, you
 520 don't want to use this capability (perhaps the POSIX 2008 support is
 521 buggy on your system), you can manually compile Perl to use the old
 522 non-thread-safe implementation by passing the argument
 523 C<-Accflags='-DNO_THREAD_SAFE_LOCALE'> to F<Configure>.
 524 Except on Windows, this will continue to use certain of the POSIX 2008
 525 functions in some situations.  If these are buggy, you can pass the
 526 following to F<Configure> instead or additionally:
 527 C<-Accflags='-DNO_POSIX_2008_LOCALE'>.  This will also keep the code
 528 from using thread-safe locales.
 529 C<${^SAFE_LOCALES}> will be 0 on systems that turn off the thread-safe
 530 operations.
 531
 532 Normally on unthreaded builds, the traditional C<setlocale()> is used
 533 and not the thread-safe locale functions.  You can force the use of these
 534 on systems that have them by passing
 535 C<-Accflags='-DUSE_THREAD_SAFE_LOCALE'> to F<Configure>.
 536
 537 The initial program is started up using the locale specified by the
 538 environment, as described in L</ENVIRONMENT>.  All newly
 539 created threads start with C<LC_ALL> set to C<"C">.  Each thread may
 540 use C<POSIX::setlocale()> to query or switch its locale at any time,
 541 without affecting any other thread.  All locale-dependent operations
 542 automatically use their thread's locale.
 543
 544 This should be completely transparent to any applications written
 545 entirely in Perl (minus a few rarely encountered caveats given in the
 546 L</Multi-threaded> section).  Information for XS module writers is given
 547 in L<perlxs/Locale-aware XS code>.
 548
 549 =head2 Finding locales
 550
 551 For locales available in your system, consult also L<setlocale(3)> to
 552 see whether it leads to the list of available locales (search for the
 553 I<SEE ALSO> section).  If that fails, try the following command lines:
 554
 555         locale -a
 556
 557         nlsinfo
 558
 559         ls /usr/lib/nls/loc
 560
 561         ls /usr/lib/locale
 562
 563         ls /usr/lib/nls
 564
 565         ls /usr/share/locale
 566
 567 and see whether they list something resembling these
 568
 569         en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
 570         en_US.iso88591      de_DE.iso88591      ru_RU.iso88595
 571         en_US               de_DE               ru_RU
 572         en                  de                  ru
 573         english             german              russian
 574         english.iso88591    german.iso88591     russian.iso88595
 575         english.roman8                          russian.koi8r
 576
 577 Sadly, even though the calling interface for C<setlocale()> has been
 578 standardized, names of locales and the directories where the
 579 configuration resides have not been.  The basic form of the name is
 580 I<language_territory>B<.>I<codeset>, but the latter parts after
 581 I<language> are not always present.  The I<language> and I<territory>
 582 are usually from the standards B<ISO 639> and B<ISO 3166>, the
 583 two-letter abbreviations for the languages and the countries of the
 584 world, respectively.  The I<codeset> part often mentions some B<ISO
 585 8859> character set, the Latin codesets.  For example, C<ISO 8859-1>
 586 is the so-called "Western European codeset" that can be used to encode
 587 most Western European languages adequately.  Again, there are several
 588 ways to write even the name of that one standard.  Lamentably.
 589
 590 Two special locales are worth particular mention: "C" and "POSIX".
 591 Currently these are effectively the same locale: the difference is
 592 mainly that the first one is defined by the C standard, the second by
 593 the POSIX standard.  They define the B<default locale> in which
 594 every program starts in the absence of locale information in its
 595 environment.  (The I<default> default locale, if you will.)  Its language
 596 is (American) English and its character codeset ASCII or, rarely, a
 597 superset thereof (such as the "DEC Multinational Character Set
 598 (DEC-MCS)").  B<Warning>. The C locale delivered by some vendors
 599 may not actually exactly match what the C standard calls for.  So
 600 beware.
 601
 602 B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
 603 POSIX-conformant), so use "C" when you need explicitly to specify this
 604 default locale.
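
Since the same locale may be spelled differently from system to system,
one pragmatic approach is to try a list of candidate names and keep the
first one that C<setlocale()> accepts.  The sketch below does that;
C<first_available_locale()> is a hypothetical helper, and the candidate
names are simply taken from the listing above.  Note that a few C
libraries accept nearly any name, so acceptance is not proof that real
locale data is installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Return the first name in @candidates that setlocale() accepts for
# $category, or undef if none is accepted.  The locale in force before
# the call is restored.  (Hypothetical helper for illustration.)
sub first_available_locale {
    my ($category, @candidates) = @_;
    my $saved = setlocale($category);
    my $found;
    for my $name (@candidates) {
        if (defined setlocale($category, $name)) {
            $found = $name;
            last;
        }
    }
    setlocale($category, $saved) if defined $saved;
    return $found;
}

my $german = first_available_locale(LC_CTYPE,
    qw(de_DE.ISO8859-1 de_DE.iso88591 de_DE de german));
my $fallback = first_available_locale(LC_CTYPE, 'C');

print "German locale: ", $german   // '(none installed)', "\n";
print "Fallback:      ", $fallback // '(none)', "\n";
```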
 605
 606 =head2 LOCALE PROBLEMS
 607
 608 You may encounter the following warning message at Perl startup:
 609
 610         perl: warning: Setting locale failed.
 611         perl: warning: Please check that your locale settings:
 612                 LC_ALL = "En_US",
 613                 LANG = (unset)
 614             are supported and installed on your system.
 615         perl: warning: Falling back to the standard locale ("C").
 616
 617 This means that your locale settings had C<LC_ALL> set to "En_US" and
 618 C<LANG> was not set at all.  Perl tried to believe you but could not.
 619 Instead, Perl gave up and fell back to the "C" locale, the default locale
 620 that is supposed to work no matter what.  (On Windows, it first tries
 621 falling back to the system default locale.)  This usually means either
 622 that your locale settings mention locales your system has never heard
 623 of, or that the locale installation in your system has problems (for
 624 example, some system files are broken or missing).  There are quick and
 625 temporary fixes to these problems, as well as more thorough and lasting
 626 fixes.
 627
 628 =head2 Testing for broken locales
 629
 630 If you are building Perl from source, the Perl test suite file
 631 F<lib/locale.t> can be used to test the locales on your system.
 632 Setting the environment variable C<PERL_DEBUG_FULL_TEST> to 1
 633 will cause it to output detailed results.  For example, on Linux, you
 634 could say
 635
 636  PERL_DEBUG_FULL_TEST=1 ./perl -T -Ilib lib/locale.t > locale.log 2>&1
 637
 638 Besides many other tests, it will test every locale it finds on your
 639 system to see if they conform to the POSIX standard.  If any have
 640 errors, it will include a summary near the end of the output of which
 641 locales passed all its tests, and which failed, and why.
 642
 643 =head2 Temporarily fixing locale problems
 644
 645 The two quickest fixes are either to render Perl silent about any
 646 locale inconsistencies or to run Perl under the default locale "C".
 647
 648 Perl's moaning about locale problems can be silenced by setting the
 649 environment variable C<PERL_BADLANG> to "0" or "".
 650 This method really just sweeps the problem under the carpet: you tell
 651 Perl to shut up even when Perl sees that something is wrong.  Do not
 652 be surprised if later something locale-dependent misbehaves.
 653
 654 Perl can be run under the "C" locale by setting the environment
 655 variable C<LC_ALL> to "C".  This method is perhaps a bit more civilized
 656 than the C<PERL_BADLANG> approach, but setting C<LC_ALL> (or
 657 other locale variables) may affect other programs as well, not just
 658 Perl.  In particular, external programs run from within Perl will see
 659 these changes.  If you make the new settings permanent (read on), all
 660 programs you run see the changes.  See L</"ENVIRONMENT"> for
 661 the full list of relevant environment variables and L</"USING LOCALES">
 662 for their effects in Perl.  Effects in other programs are
 663 easily deducible.  For example, the variable C<LC_COLLATE> may well affect
 664 your B<sort> program (or whatever the program that arranges "records"
 665 alphabetically in your system is called).
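
When only a particular external command needs the "C" locale, the change
can be confined to that command from within Perl.  The following is a
sketch under stated assumptions: C<run_in_c_locale()> is a hypothetical
helper, the list form of pipe C<open> is assumed to be available on your
platform, and the child here is just another perl that echoes what it
inherited.

```perl
use strict;
use warnings;

# Run a command with LC_ALL=C in its environment and return its output
# lines.  (Hypothetical helper for illustration.)
sub run_in_c_locale {
    my (@cmd) = @_;
    local $ENV{LC_ALL} = 'C';        # inherited by the child; undone on scope exit
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

my $before = $ENV{LC_ALL};
my @out = run_in_c_locale($^X, '-e', 'print "LC_ALL=$ENV{LC_ALL}\n"');

print "child saw:  @out";
print "parent has: LC_ALL=", $ENV{LC_ALL} // '(unset)', "\n";
```

Because of C<local>, the parent's own environment is left as it was once
the helper returns.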
 666
 667 You can test out changing these variables temporarily, and if the
 668 new settings seem to help, put those settings into your shell startup
 669 files.  Consult your local documentation for the exact details.  For
 670 Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):
 671
 672         LC_ALL=en_US.ISO8859-1
 673         export LC_ALL
 674
 675 This assumes that we saw the locale "en_US.ISO8859-1" using the commands
 676 discussed above.  We decided to try that instead of the above faulty
 677 locale "En_US"--and in Cshish shells (B<csh>, B<tcsh>)
 678
 679         setenv LC_ALL en_US.ISO8859-1
 680
 681 or if you have the "env" application you can do (in any shell)
 682
 683         env LC_ALL=en_US.ISO8859-1 perl ...
 684
 685 If you do not know what shell you have, consult your local
 686 helpdesk or the equivalent.
 687
 688 =head2 Permanently fixing locale problems
 689
 690 The slower but superior fix is to correct the misconfiguration of your
 691 own environment variables, which you may be able to do yourself.  The
 692 mis(sing)configuration of the whole system's locales usually requires
 693 the help of your friendly system administrator.
 694
 695 First, see L</Finding locales> earlier in this document.  That tells
 696 how to find which locales are really supported--and more importantly,
 697 installed--on your system.  In our example error message, environment
 698 variables affecting the locale are listed in the order of decreasing
 699 importance (and unset variables do not matter).  Therefore, having
 700 LC_ALL set to "En_US" must have been the bad choice, as shown by the
 701 error message.  First try fixing locale settings listed first.
 702
Second, if the listed commands show something that matches "En_US"
(without the quotes) B<exactly>--prefix matches do not count, and case
usually matters--then you should be okay, because you are using a
locale name that should be installed and available on your system.
In this case, see L</Permanently fixing your system's locale configuration>.
 708
 709 =head2 Permanently fixing your system's locale configuration
 710
 711 This is when you see something like:
 712
 713         perl: warning: Please check that your locale settings:
 714                 LC_ALL = "En_US",
 715                 LANG = (unset)
 716             are supported and installed on your system.
 717
but then cannot find "En_US" in the output of the above-mentioned
commands.  You may see names like "en_US.ISO8859-1", but that isn't
the same.  In this case, try running under a locale that you can list
and that somehow matches what you tried.  The rules for matching locale
names are a bit vague because standardization is weak in this area.
See L</Finding locales> again for the general rules.
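
One way to check a candidate name from within Perl is to hand it to
C<POSIX::setlocale()> and see whether the call succeeds.  The following
is a minimal sketch; the helper name C<locale_is_available> is
hypothetical, and some C libraries accept almost any name, so a
"yes" here is a hint rather than proof.

    use strict;
    use warnings;
    use POSIX qw(setlocale LC_ALL);

    # Returns true if the C library accepts $name for all categories.
    # The process locale is put back before returning.
    sub locale_is_available {
        my ($name) = @_;
        my $previous = setlocale(LC_ALL);           # query only
        my $accepted = defined setlocale(LC_ALL, $name);
        setlocale(LC_ALL, $previous) if defined $previous;
        return $accepted;
    }

    for my $name ('En_US', 'en_US.ISO8859-1', 'C') {
        printf "%-16s %s\n", $name,
            locale_is_available($name) ? 'available' : 'not available';
    }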
 725
 726 =head2 Fixing system locale configuration
 727
 728 Contact a system administrator (preferably your own) and report the exact
 729 error message you get, and ask them to read this same documentation you
 730 are now reading.  They should be able to check whether there is something
 731 wrong with the locale configuration of the system.  The L</Finding locales>
 732 section is unfortunately a bit vague about the exact commands and places
 733 because these things are not that standardized.
 734
 735 =head2 The localeconv function
 736
 737 The C<POSIX::localeconv()> function allows you to get particulars of the
 738 locale-dependent numeric formatting information specified by the current
 739 underlying C<LC_NUMERIC> and C<LC_MONETARY> locales (regardless of
 740 whether called from within the scope of C<S<use locale>> or not).  (If
 741 you just want the name of
 742 the current locale for a particular category, use C<POSIX::setlocale()>
 743 with a single parameter--see L</The setlocale function>.)
 744
 745         use POSIX qw(locale_h);
 746
 747         # Get a reference to a hash of locale-dependent info
 748         $locale_values = localeconv();
 749
 750         # Output sorted list of the values
 751         for (sort keys %$locale_values) {
 752             printf "%-20s = %s\n", $_, $locale_values->{$_}
 753         }
 754
 755 C<localeconv()> takes no arguments, and returns B<a reference to> a hash.
 756 The keys of this hash are variable names for formatting, such as
 757 C<decimal_point> and C<thousands_sep>.  The values are the
 758 corresponding, er, values.  See L<POSIX/localeconv> for a longer
 759 example listing the categories an implementation might be expected to
 760 provide; some provide more and others fewer.  You don't need an
 761 explicit C<use locale>, because C<localeconv()> always observes the
 762 current locale.
 763
 764 Here's a simple-minded example program that rewrites its command-line
 765 parameters as integers correctly formatted in the current locale:
 766
 767     use POSIX qw(locale_h);
 768
 769     # Get some of locale's numeric formatting parameters
 770     my ($thousands_sep, $grouping) =
 771             @{localeconv()}{'thousands_sep', 'grouping'};
 772
 773     # Apply defaults if values are missing
 774     $thousands_sep = ',' unless $thousands_sep;
 775
 776     # grouping and mon_grouping are packed lists
 777     # of small integers (characters) telling the
 778     # grouping (thousand_seps and mon_thousand_seps
 779     # being the group dividers) of numbers and
 780     # monetary quantities.  The integers' meanings:
 781     # 255 means no more grouping, 0 means repeat
 782     # the previous grouping, 1-254 means use that
 783     # as the current grouping.  Grouping goes from
 784     # right to left (low to high digits).  In the
 785     # below we cheat slightly by never using anything
 786     # else than the first grouping (whatever that is).
 787     if ($grouping) {
 788         @grouping = unpack("C*", $grouping);
 789     } else {
 790         @grouping = (3);
 791     }
 792
    # Format command line params for current locale
    for (@ARGV) {
        $_ = int;    # Chop non-integer part
        # \Q...\E keeps a separator such as "." from acting as a
        # regular expression metacharacter
        1 while
        s/(\d)(\d{$grouping[0]}($|\Q$thousands_sep\E))/$1$thousands_sep$2/;
        print "$_ ";
    }
    print "\n";
 801
 802 Note that if the platform doesn't have C<LC_NUMERIC> and/or
 803 C<LC_MONETARY> available or enabled, the corresponding elements of the
 804 hash will be missing.
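
Because elements can be missing, it may be prudent to read the hash
through a helper that supplies a default.  This is only a sketch: the
helper name C<locale_item> and the chosen defaults are assumptions for
illustration.

    use strict;
    use warnings;
    use POSIX qw(localeconv);

    # Returns the named element, or $default when the element is
    # absent or empty.
    sub locale_item {
        my ($info, $name, $default) = @_;
        my $value = $info->{$name};
        return defined $value && length $value ? $value : $default;
    }

    my $info          = localeconv();
    my $decimal_point = locale_item($info, 'decimal_point', '.');
    my $currency      = locale_item($info, 'currency_symbol', '');

    print "decimal point: '$decimal_point'\n";
    print "currency symbol: '$currency'\n";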
 805
 806 =head2 I18N::Langinfo
 807
 808 Another interface for querying locale-dependent information is the
 809 C<I18N::Langinfo::langinfo()> function.
 810
 811 The following example will import the C<langinfo()> function itself and
 812 three constants to be used as arguments to C<langinfo()>: a constant for
 813 the abbreviated first day of the week (the numbering starts from
 814 Sunday = 1) and two more constants for the affirmative and negative
 815 answers for a yes/no question in the current locale.
 816
 817     use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);
 818
    my ($abday_1, $yesstr, $nostr)
                = map { langinfo($_) } (ABDAY_1, YESSTR, NOSTR);
 821
 822     print "$abday_1? [$yesstr/$nostr] ";
 823
 824 In other words, in the "C" (or English) locale the above will probably
 825 print something like:
 826
 827     Sun? [yes/no]
 828
 829 See L<I18N::Langinfo> for more information.
 830
 831 =head1 LOCALE CATEGORIES
 832
 833 The following subsections describe basic locale categories.  Beyond these,
 834 some combination categories allow manipulation of more than one
 835 basic category at a time.  See L</"ENVIRONMENT"> for a discussion of these.
 836
 837 =head2 Category C<LC_COLLATE>: Collation: Text Comparisons and Sorting
 838
 839 In the scope of a S<C<use locale>> form that includes collation, Perl
 840 looks to the C<LC_COLLATE>
 841 environment variable to determine the application's notions on collation
 842 (ordering) of characters.  For example, "b" follows "a" in Latin
 843 alphabets, but where do "E<aacute>" and "E<aring>" belong?  And while
 844 "color" follows "chocolate" in English, what about in traditional Spanish?
 845
 846 The following collations all make sense and you may meet any of them
 847 if you C<"use locale">.
 848
 849         A B C D E a b c d e
 850         A a B b C c D d E e
 851         a A b B c C d D e E
 852         a b c d e A B C D E
 853
 854 Here is a code snippet to tell what "word"
 855 characters are in the current locale, in that locale's order:
 856
 857         use locale;
 858         print +(sort grep /\w/, map { chr } 0..255), "\n";
 859
 860 Compare this with the characters that you see and their order if you
 861 state explicitly that the locale should be ignored:
 862
 863         no locale;
 864         print +(sort grep /\w/, map { chr } 0..255), "\n";
 865
 866 This machine-native collation (which is what you get unless S<C<use
 867 locale>> has appeared earlier in the same block) must be used for
 868 sorting raw binary data, whereas the locale-dependent collation of the
 869 first example is useful for natural text.
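
The two orderings can also be compared on a list of ordinary words.
In this sketch the word list is made up for the example; the native
order is fixed, while the locale order depends on the C<LC_COLLATE>
setting in your environment.

    use strict;
    use warnings;

    my @words = qw(banana Apple cherry apple Banana);

    # Native order: plain code point comparison
    my @native = sort @words;

    # Locale order: "use locale" is lexically scoped to the do block
    my @by_locale = do { use locale; sort @words };

    print "native: @native\n";
    print "locale: @by_locale\n";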
 870
 871 As noted in L</USING LOCALES>, C<cmp> compares according to the current
 872 collation locale when C<use locale> is in effect, but falls back to a
 873 char-by-char comparison for strings that the locale says are equal. You
 874 can use C<POSIX::strcoll()> if you don't want this fall-back:
 875
 876         use POSIX qw(strcoll);
 877         $equal_in_locale =
 878             !strcoll("space and case ignored", "SpaceAndCaseIgnored");
 879
 880 C<$equal_in_locale> will be true if the collation locale specifies a
 881 dictionary-like ordering that ignores space characters completely and
 882 which folds case.
 883
 884 Perl uses the platform's C library collation functions C<strcoll()> and
 885 C<strxfrm()>.  That means you get whatever they give.  On some
 886 platforms, these functions work well on UTF-8 locales, giving
 887 a reasonable default collation for the code points that are important in
 888 that locale.  (And if they aren't working well, the problem may only be
 889 that the locale definition is deficient, so can be fixed by using a
 890 better definition file.  Unicode's definitions (see L</Freely available
 891 locale definitions>) provide reasonable UTF-8 locale collation
 892 definitions.)  Starting in Perl v5.26, Perl's use of these functions has
 893 been made more seamless.  This may be sufficient for your needs.  For
 894 more control, and to make sure strings containing any code point (not
 895 just the ones important in the locale) collate properly, the
 896 L<Unicode::Collate> module is suggested.
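
As a brief sketch of that suggestion, the following sorts a made-up
word list with a default L<Unicode::Collate> object.  No tailoring for
a particular language is applied here; see the module's documentation
for the options that provide it.

    use strict;
    use warnings;
    use Unicode::Collate;

    # Default collator: the Unicode Collation Algorithm without
    # any language-specific tailoring
    my $collator = Unicode::Collate->new;

    my @sorted = $collator->sort(qw(banana Apple cherry apple Banana));
    print "@sorted\n";

The result does not depend on the current locale, which makes it
reproducible across systems.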
 897
 898 In non-UTF-8 locales (hence single byte), code points above 0xFF are
 899 technically invalid.  But if present, again starting in v5.26, they will
 900 collate to the same position as the highest valid code point does.  This
 901 generally gives good results, but the collation order may be skewed if
 902 the valid code point gets special treatment when it forms particular
 903 sequences with other characters as defined by the locale.
 904 When two strings collate identically, the code point order is used as a
 905 tie breaker.
 906
 907 If Perl detects that there are problems with the locale collation order,
 908 it reverts to using non-locale collation rules for that locale.
 909
 910 If you have a single string that you want to check for "equality in
 911 locale" against several others, you might think you could gain a little
 912 efficiency by using C<POSIX::strxfrm()> in conjunction with C<eq>:
 913
 914         use POSIX qw(strxfrm);
 915         $xfrm_string = strxfrm("Mixed-case string");
 916         print "locale collation ignores spaces\n"
 917             if $xfrm_string eq strxfrm("Mixed-casestring");
 918         print "locale collation ignores hyphens\n"
 919             if $xfrm_string eq strxfrm("Mixedcase string");
 920         print "locale collation ignores case\n"
 921             if $xfrm_string eq strxfrm("mixed-case string");
 922
 923 C<strxfrm()> takes a string and maps it into a transformed string for use
 924 in char-by-char comparisons against other transformed strings during
 925 collation.  "Under the hood", locale-affected Perl comparison operators
 926 call C<strxfrm()> for both operands, then do a char-by-char
 927 comparison of the transformed strings.  By calling C<strxfrm()> explicitly
 928 and using a non locale-affected comparison, the example attempts to save
 929 a couple of transformations.  But in fact, it doesn't save anything: Perl
 930 magic (see L<perlguts/Magic Variables>) creates the transformed version of a
 931 string the first time it's needed in a comparison, then keeps this version around
 932 in case it's needed again.  An example rewritten the easy way with
 933 C<cmp> runs just about as fast.  It also copes with null characters
 934 embedded in strings; if you call C<strxfrm()> directly, it treats the first
 935 null it finds as a terminator.  Don't expect the transformed strings
 936 it produces to be portable across systems--or even from one revision
 937 of your operating system to the next.  In short, don't call C<strxfrm()>
 938 directly: let Perl do it for you.
 939
 940 Note: C<use locale> isn't shown in some of these examples because it isn't
 941 needed: C<strcoll()> and C<strxfrm()> are POSIX functions
 942 which use the standard system-supplied C<libc> functions that
 943 always obey the current C<LC_COLLATE> locale.
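
For instance, a sort block can call C<strcoll()> directly without any
C<use locale> in scope.  In this sketch the word list is invented, and
the C<cmp> after C<||> is an explicit tie-breaker for strings that the
locale considers equal.

    use strict;
    use warnings;
    use POSIX qw(strcoll);

    my @names = qw(delta Alpha charlie bravo);

    # strcoll() follows LC_COLLATE; cmp settles any ties
    my @sorted = sort { strcoll($a, $b) || $a cmp $b } @names;

    print "@sorted\n";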
 944
 945 =head2 Category C<LC_CTYPE>: Character Types
 946
 947 In the scope of a S<C<use locale>> form that includes C<LC_CTYPE>, Perl
 948 obeys the C<LC_CTYPE> locale
 949 setting.  This controls the application's notion of which characters are
 950 alphabetic, numeric, punctuation, I<etc>.  This affects Perl's C<\w>
 951 regular expression metanotation,
 952 which stands for alphanumeric characters--that is, alphabetic,
 953 numeric, and the platform's native underscore.
 954 (Consult L<perlre> for more information about
 955 regular expressions.)  Thanks to C<LC_CTYPE>, depending on your locale
 956 setting, characters like "E<aelig>", "E<eth>", "E<szlig>", and
 957 "E<oslash>" may be understood as C<\w> characters.
 958 It also affects things like C<\s>, C<\D>, and the POSIX character
 959 classes, like C<[[:graph:]]>.  (See L<perlrecharclass> for more
 960 information on all these.)
 961
 962 The C<LC_CTYPE> locale also provides the map used in transliterating
 963 characters between lower and uppercase.  This affects the case-mapping
 964 functions--C<fc()>, C<lc()>, C<lcfirst()>, C<uc()>, and C<ucfirst()>;
 965 case-mapping
 966 interpolation with C<\F>, C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted
 967 strings and C<s///> substitutions; and case-insensitive regular expression
 968 pattern matching using the C<i> modifier.
 969
 970 Starting in v5.20, Perl supports UTF-8 locales for C<LC_CTYPE>, but
 971 otherwise Perl only supports single-byte locales, such as the ISO 8859
 972 series.  This means that wide character locales, for example for Asian
 973 languages, are not well-supported.  Use of these locales may cause core
 974 dumps.  If the platform has the capability for Perl to detect such a
 975 locale, starting in Perl v5.22, L<Perl will warn, default
 976 enabled|warnings/Category Hierarchy>, using the C<locale> warning
 977 category, whenever such a locale is switched into.  The UTF-8 locale
 978 support is actually a
 979 superset of POSIX locales, because it is really full Unicode behavior
 980 as if no C<LC_CTYPE> locale were in effect at all (except for tainting;
 981 see L</SECURITY>).  POSIX locales, even UTF-8 ones,
 982 are lacking certain concepts in Unicode, such as the idea that changing
 983 the case of a character could expand to be more than one character.
Perl, in a UTF-8 locale, will give you that expansion.  Prior to v5.20,
 985 Perl treated a UTF-8 locale on some platforms like an ISO 8859-1 one,
 986 with some restrictions, and on other platforms more like the "C" locale.
 987 For releases v5.16 and v5.18, C<S<use locale 'not_characters>> could be
 988 used as a workaround for this (see L</Unicode and UTF-8>).
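
The kind of expansion meant here can be seen with the German sharp s.
The sketch below runs outside any locale scope and uses the
C<unicode_strings> feature to request Unicode rules; that choice is an
assumption made so that the result does not depend on which locales are
installed.

    use strict;
    use warnings;
    use feature 'unicode_strings';

    my $lower = "stra\x{DF}e";      # "strasse" spelled with sharp s
    my $upper = uc $lower;          # sharp s expands to "SS"

    printf "%d characters became %d\n",
        length($lower), length($upper);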
 989
 990 Note that there are quite a few things that are unaffected by the
 991 current locale.  Any literal character is the native character for the
 992 given platform.  Hence 'A' means the character at code point 65 on ASCII
 993 platforms, and 193 on EBCDIC.  That may or may not be an 'A' in the
 994 current locale, if that locale even has an 'A'.
 995 Similarly, all the escape sequences for particular characters,
 996 C<\n> for example, always mean the platform's native one.  This means,
 997 for example, that C<\N> in regular expressions (every character
 998 but new-line) works on the platform character set.
 999
1000 Starting in v5.22, Perl will by default warn when switching into a
1001 locale that redefines any ASCII printable character (plus C<\t> and
1002 C<\n>) into a different class than expected.  This is likely to
1003 happen on modern locales only on EBCDIC platforms, where, for example,
1004 a CCSID 0037 locale on a CCSID 1047 machine moves C<"[">, but it can
1005 happen on ASCII platforms with the ISO 646 and other
1006 7-bit locales that are essentially obsolete.  Things may still work,
1007 depending on what features of Perl are used by the program.  For
1008 example, in the example from above where C<"|"> becomes a C<\w>, and
1009 there are no regular expressions where this matters, the program may
1010 still work properly.  The warning lists all the characters that
1011 it can determine could be adversely affected.
1012
1013 B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
1014 in clearly ineligible characters being considered to be alphanumeric by
1015 your application.  For strict matching of (mundane) ASCII letters and
1016 digits--for example, in command strings--locale-aware applications
1017 should use C<\w> with the C</a> regular expression modifier.  See L</"SECURITY">.
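
A minimal sketch of such a strict check follows.  The helper name
C<is_plain_ascii_word> is hypothetical; the C</a> modifier is stated
explicitly in the pattern, so it applies even though C<use locale> is
in effect.

    use strict;
    use warnings;
    use locale;

    # True only for a non-empty string of ASCII letters, digits,
    # and underscores, whatever LC_CTYPE says
    sub is_plain_ascii_word {
        my ($text) = @_;
        return $text =~ /\A\w+\z/a ? 1 : 0;
    }

    for my $candidate ('report_2', 'a|b') {
        printf "%-10s %s\n", $candidate,
            is_plain_ascii_word($candidate) ? 'accepted' : 'rejected';
    }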
1018
1019 =head2 Category C<LC_NUMERIC>: Numeric Formatting
1020
After a proper C<POSIX::setlocale()> call, and within the scope of a
C<use locale> form that includes numerics, Perl obeys the
1023 C<LC_NUMERIC> locale information, which controls an application's idea
1024 of how numbers should be formatted for human readability.
1025 In most implementations the only effect is to
1026 change the character used for the decimal point--perhaps from "."  to ",".
1027 The functions aren't aware of such niceties as thousands separation and
1028 so on. (See L</The localeconv function> if you care about these things.)
1029
1030  use POSIX qw(strtod setlocale LC_NUMERIC);
1031  use locale;
1032
1033  setlocale LC_NUMERIC, "";
1034
1035  $n = 5/2;   # Assign numeric 2.5 to $n
1036
1037  $a = " $n"; # Locale-dependent conversion to string
1038
1039  print "half five is $n\n";       # Locale-dependent output
1040
1041  printf "half five is %g\n", $n;  # Locale-dependent output
1042
1043  print "DECIMAL POINT IS COMMA\n"
1044           if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
1045
1046 See also L<I18N::Langinfo> and C<RADIXCHAR>.
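
As a small sketch, the decimal point character can be queried like
this.  The variable name is arbitrary, and the value you get depends on
your environment.

    use strict;
    use warnings;
    use I18N::Langinfo qw(langinfo RADIXCHAR);

    # RADIXCHAR is the decimal point of the LC_NUMERIC locale
    my $radix = langinfo(RADIXCHAR);
    print "decimal point character: '$radix'\n";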
1047
1048 =head2 Category C<LC_MONETARY>: Formatting of monetary amounts
1049
1050 The C standard defines the C<LC_MONETARY> category, but not a function
1051 that is affected by its contents.  (Those with experience of standards
1052 committees will recognize that the working group decided to punt on the
1053 issue.)  Consequently, Perl essentially takes no notice of it.  If you
1054 really want to use C<LC_MONETARY>, you can query its contents--see
1055 L</The localeconv function>--and use the information that it returns in your
1056 application's own formatting of currency amounts.  However, you may well
1057 find that the information, voluminous and complex though it may be, still
1058 does not quite meet your requirements: currency formatting is a hard nut
1059 to crack.
1060
1061 See also L<I18N::Langinfo> and C<CRNCYSTR>.
1062
=head2 Category C<LC_TIME>: Representation of time
1064
1065 Output produced by C<POSIX::strftime()>, which builds a formatted
1066 human-readable date/time string, is affected by the current C<LC_TIME>
1067 locale.  Thus, in a French locale, the output produced by the C<%B>
1068 format element (full month name) for the first month of the year would
1069 be "janvier".  Here's how to get a list of long month names in the
1070 current locale:
1071
1072         use POSIX qw(strftime);
1073         for (0..11) {
1074             $long_month_name[$_] =
1075                 strftime("%B", 0, 0, 0, 1, $_, 96);
1076         }
1077
1078 Note: C<use locale> isn't needed in this example: C<strftime()> is a POSIX
1079 function which uses the standard system-supplied C<libc> function that
1080 always obeys the current C<LC_TIME> locale.
1081
See also L<I18N::Langinfo> and C<ABDAY_1>..C<ABDAY_7>, C<DAY_1>..C<DAY_7>,
C<ABMON_1>..C<ABMON_12>, and C<MON_1>..C<MON_12>.
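
For example, the full day names might be collected as follows.  This is
a sketch only; the names returned depend on the C<LC_TIME> locale in
your environment.

    use strict;
    use warnings;
    use I18N::Langinfo
        qw(langinfo DAY_1 DAY_2 DAY_3 DAY_4 DAY_5 DAY_6 DAY_7);

    # DAY_1 is Sunday, matching the numbering of ABDAY_1
    my @day_names = map { langinfo($_) }
        (DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7);

    print join(", ", @day_names), "\n";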
1084
1085 =head2 Other categories
1086
1087 The remaining locale categories are not currently used by Perl itself.
1088 But again note that things Perl interacts with may use these, including
1089 extensions outside the standard Perl distribution, and by the
1090 operating system and its utilities.  Note especially that the string
1091 value of C<$!> and the error messages given by external utilities may
1092 be changed by C<LC_MESSAGES>.  If you want to have portable error
1093 codes, use C<%!>.  See L<Errno>.
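
A short sketch of the C<%!> approach: the path below is hypothetical and
chosen so that it should not exist, and the labels stored in C<$reason>
are invented for the example.

    use strict;
    use warnings;
    use Errno;    # provides the %! hash

    # Hypothetical path that should not exist
    my $path = "no-such-dir-$$/no-such-file";

    my $reason = 'opened';
    if (!open(my $fh, '<', $path)) {
        # Test the error code, not the translated text of $!
        $reason = $!{ENOENT} ? 'missing' : 'other failure';
    }
    print "$path: $reason\n";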
1094
1095 =head1 SECURITY
1096
1097 Although the main discussion of Perl security issues can be found in
1098 L<perlsec>, a discussion of Perl's locale handling would be incomplete
1099 if it did not draw your attention to locale-dependent security issues.
1100 Locales--particularly on systems that allow unprivileged users to
1101 build their own locales--are untrustworthy.  A malicious (or just plain
1102 broken) locale can make a locale-aware application give unexpected
1103 results.  Here are a few possibilities:
1104
1105 =over 4
1106
1107 =item *
1108
1109 Regular expression checks for safe file names or mail addresses using
1110 C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
1111 characters such as C<"E<gt>"> and C<"|"> are alphanumeric.
1112
1113 =item *
1114
1115 String interpolation with case-mapping, as in, say, C<$dest =
1116 "C:\U$name.$ext">, may produce dangerous results if a bogus C<LC_CTYPE>
1117 case-mapping table is in effect.
1118
1119 =item *
1120
1121 A sneaky C<LC_COLLATE> locale could result in the names of students with
1122 "D" grades appearing ahead of those with "A"s.
1123
1124 =item *
1125
1126 An application that takes the trouble to use information in
1127 C<LC_MONETARY> may format debits as if they were credits and vice versa
1128 if that locale has been subverted.  Or it might make payments in US
1129 dollars instead of Hong Kong dollars.
1130
1131 =item *
1132
1133 The date and day names in dates formatted by C<strftime()> could be
1134 manipulated to advantage by a malicious user able to subvert the
C<LC_TIME> locale.  ("Look--it says I wasn't in the building on
1136 Sunday.")
1137
1138 =back
1139
1140 Such dangers are not peculiar to the locale system: any aspect of an
1141 application's environment which may be modified maliciously presents
1142 similar challenges.  Similarly, they are not specific to Perl: any
1143 programming language that allows you to write programs that take
1144 account of their environment exposes you to these issues.
1145
1146 Perl cannot protect you from all possibilities shown in the
1147 examples--there is no substitute for your own vigilance--but, when
1148 C<use locale> is in effect, Perl uses the tainting mechanism (see
1149 L<perlsec>) to mark string results that become locale-dependent, and
1150 which may be untrustworthy in consequence.  Here is a summary of the
1151 tainting behavior of operators and functions that may be affected by
1152 the locale:
1153
1154 =over 4
1155
1156 =item  *
1157
1158 B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):
1159
1160 Scalar true/false (or less/equal/greater) result is never tainted.
1161
1162 =item  *
1163
1164 B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>)
1165
1166 The result string containing interpolated material is tainted if
1167 a C<use locale> form that includes C<LC_CTYPE> is in effect.
1168
1169 =item  *
1170
1171 B<Matching operator> (C<m//>):
1172
Scalar true/false result is never tainted.
1174
1175 All subpatterns, either delivered as a list-context result or as C<$1>
1176 I<etc>., are tainted if a C<use locale> form that includes
1177 C<LC_CTYPE> is in effect, and the subpattern
1178 regular expression contains a locale-dependent construct.  These
1179 constructs include C<\w> (to match an alphanumeric character), C<\W>
1180 (non-alphanumeric character), C<\b> and C<\B> (word-boundary and
non-boundary, which depend on what C<\w> and C<\W> match), C<\s>
1182 (whitespace character), C<\S> (non whitespace character), C<\d> and
1183 C<\D> (digits and non-digits), and the POSIX character classes, such as
1184 C<[:alpha:]> (see L<perlrecharclass/POSIX Character Classes>).
1185
1186 Tainting is also likely if the pattern is to be matched
1187 case-insensitively (via C</i>).  The exception is if all the code points
1188 to be matched this way are above 255 and do not have folds under Unicode
1189 rules to below 256.  Tainting is not done for these because Perl
1190 only uses Unicode rules for such code points, and those rules are the
1191 same no matter what the current locale.
1192
1193 The matched-pattern variables, C<$&>, C<$`> (pre-match), C<$'>
1194 (post-match), and C<$+> (last match) also are tainted.
1195
1196 =item  *
1197
1198 B<Substitution operator> (C<s///>):
1199
1200 Has the same behavior as the match operator.  Also, the left
1201 operand of C<=~> becomes tainted when a C<use locale>
1202 form that includes C<LC_CTYPE> is in effect, if modified as
1203 a result of a substitution based on a regular
1204 expression match involving any of the things mentioned in the previous
item, or of case-mapping, such as C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>.
1206
1207 =item *
1208
1209 B<Output formatting functions> (C<printf()> and C<write()>):
1210
1211 Results are never tainted because otherwise even output from print,
1212 for example C<print(1/7)>, should be tainted if C<use locale> is in
1213 effect.
1214
1215 =item *
1216
1217 B<Case-mapping functions> (C<lc()>, C<lcfirst()>, C<uc()>, C<ucfirst()>):
1218
1219 Results are tainted if a C<use locale> form that includes C<LC_CTYPE> is
1220 in effect.
1221
1222 =item *
1223
1224 B<POSIX locale-dependent functions> (C<localeconv()>, C<strcoll()>,
1225 C<strftime()>, C<strxfrm()>):
1226
1227 Results are never tainted.
1228
1229 =back
1230
1231 Three examples illustrate locale-dependent tainting.
1232 The first program, which ignores its locale, won't run: a value taken
1233 directly from the command line may not be used to name an output file
1234 when taint checks are enabled.
1235
        #!/usr/local/bin/perl -T
1237         # Run with taint checking
1238
1239         # Command line sanity check omitted...
1240         $tainted_output_file = shift;
1241
1242         open(F, ">$tainted_output_file")
1243             or warn "Open of $tainted_output_file failed: $!\n";
1244
1245 The program can be made to run by "laundering" the tainted value through
1246 a regular expression: the second example--which still ignores locale
1247 information--runs, creating the file named on its command line
1248 if it can.
1249
        #!/usr/local/bin/perl -T
1251
1252         $tainted_output_file = shift;
1253         $tainted_output_file =~ m%[\w/]+%;
1254         $untainted_output_file = $&;
1255
1256         open(F, ">$untainted_output_file")
1257             or warn "Open of $untainted_output_file failed: $!\n";
1258
1259 Compare this with a similar but locale-aware program:
1260
        #!/usr/local/bin/perl -T
1262
1263         $tainted_output_file = shift;
1264         use locale;
1265         $tainted_output_file =~ m%[\w/]+%;
1266         $localized_output_file = $&;
1267
1268         open(F, ">$localized_output_file")
1269             or warn "Open of $localized_output_file failed: $!\n";
1270
1271 This third program fails to run because C<$&> is tainted: it is the result
1272 of a match involving C<\w> while C<use locale> is in effect.
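
If a locale-aware program still needs to launder a value with C<\w>,
one option (suggested in L<perlsec>) is to put C<no locale> in the block
that does the laundering.  The sketch below assumes that approach; the
helper name C<launder_file_name> is hypothetical, and the pattern is
anchored so that the whole value must match.

    use strict;
    use warnings;
    use locale;     # the surrounding program is locale-aware

    # Returns the name if it consists only of native word characters
    # and slashes, or undef otherwise.
    sub launder_file_name {
        my ($candidate) = @_;
        no locale;  # scoped to this sub: \w uses native rules here
        return $candidate =~ m{\A([\w/]+)\z} ? $1 : undef;
    }

    my $name = launder_file_name('out/report_1');
    print defined $name ? "using $name\n" : "rejected\n";

Whether such a pattern is strict enough for your application is a
separate question; it only avoids the locale-induced tainting.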
1273
1274 =head1 ENVIRONMENT
1275
1276 =over 12
1277
1278 =item PERL_SKIP_LOCALE_INIT
1279
This environment variable, available starting in Perl v5.20, if set
(to any value), tells Perl not to initialize its locale from the rest
of the environment variables.  Instead, Perl uses whatever the current
locale settings are.  This is particularly useful in embedded
environments; see
L<perlembed/Using embedded Perl with POSIX locales>.
1286
1287 =item PERL_BADLANG
1288
1289 A string that can suppress Perl's warning about failed locale settings
1290 at startup.  Failure can occur if the locale support in the operating
1291 system is lacking (broken) in some way--or if you mistyped the name of
1292 a locale when you set up your environment.  If this environment
1293 variable is absent, or has a value other than "0" or "", Perl will
1294 complain about locale setting failures.
1295
1296 B<NOTE>: C<PERL_BADLANG> only gives you a way to hide the warning message.
1297 The message tells about some problem in your system's locale support,
1298 and you should investigate what the problem is.
1299
1300 =back
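
The effect of C<PERL_BADLANG> can be observed by starting a child
B<perl> with a deliberately bad locale name and collecting what it
writes to standard error.  This is a sketch under assumptions: the
helper name C<child_startup_messages> is hypothetical, and on systems
whose C library accepts any locale name no warning appears in either
case.

    use strict;
    use warnings;
    use IPC::Open3 qw(open3);
    use Symbol qw(gensym);

    # Runs a do-nothing child perl under the given environment
    # settings and returns the lines it wrote to standard error.
    sub child_startup_messages {
        my (%settings) = @_;
        local @ENV{keys %settings} = values %settings;

        my $stderr = gensym;    # open3 needs a real glob for stderr
        my $pid = open3(my $stdin, my $stdout, $stderr,
                        $^X, '-e', '1');
        close $stdin;
        my @messages = <$stderr>;
        waitpid $pid, 0;
        return @messages;
    }

    # "En_US" is the faulty name used in the earlier examples
    my @quiet = child_startup_messages(LC_ALL       => 'En_US',
                                       PERL_BADLANG => '0');
    print "messages with PERL_BADLANG=0: ", scalar(@quiet), "\n";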
1301
The following environment variables are not specific to Perl: they are
1303 part of the standardized (ISO C, XPG4, POSIX 1.c) C<setlocale()> method
1304 for controlling an application's opinion on data.  Windows is non-POSIX,
1305 but Perl arranges for the following to work as described anyway.
1306 If the locale given by an environment variable is not valid, Perl tries
1307 the next lower one in priority.  If none are valid, on Windows, the
1308 system default locale is then tried.  If all else fails, the C<"C">
1309 locale is used.  If even that doesn't work, something is badly broken,
1310 but Perl tries to forge ahead with whatever the locale settings might
1311 be.
1312
1313 =over 12
1314
1315 =item C<LC_ALL>
1316
1317 C<LC_ALL> is the "override-all" locale environment variable. If
1318 set, it overrides all the rest of the locale environment variables.
1319
1320 =item C<LANGUAGE>
1321
B<NOTE>: C<LANGUAGE> is a GNU extension; it affects you only if you
are using the GNU libc.  This is the case if you are using, for
example, Linux.  If you are using a "commercial" Unix, you are most
probably I<not> using GNU libc, and you can ignore C<LANGUAGE>.
1326
However, if you are using C<LANGUAGE>, it affects the language of
informational, warning, and error messages output by commands (in
other words, it's like C<LC_MESSAGES>), but it has higher priority
than C<LC_ALL>.  Moreover, it's not a single value but instead a
"path" (":"-separated list) of I<languages> (not locales).
See the GNU C<gettext> library documentation for more information.
1333
1334 =item C<LC_CTYPE>
1335
1336 In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
1337 locale.  In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
1338 chooses the character type locale.
1339
1340 =item C<LC_COLLATE>
1341
1342 In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
1343 (sorting) locale.  In the absence of both C<LC_ALL> and C<LC_COLLATE>,
1344 C<LANG> chooses the collation locale.
1345
1346 =item C<LC_MONETARY>
1347
1348 In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
1349 formatting locale.  In the absence of both C<LC_ALL> and C<LC_MONETARY>,
1350 C<LANG> chooses the monetary formatting locale.
1351
1352 =item C<LC_NUMERIC>
1353
1354 In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
1355 locale.  In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
chooses the numeric format locale.
1357
1358 =item C<LC_TIME>
1359
1360 In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
1361 formatting locale.  In the absence of both C<LC_ALL> and C<LC_TIME>,
1362 C<LANG> chooses the date and time formatting locale.
1363
1364 =item C<LANG>
1365
1366 C<LANG> is the "catch-all" locale environment variable. If it is set, it
1367 is used as the last resort after the overall C<LC_ALL> and the
1368 category-specific C<LC_I<foo>>.
1369
1370 =back
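
The lookup order described by the items above can be summarized in a
few lines of code.  This sketch only mimics the order of precedence
(C<LC_ALL>, then the category's own variable, then C<LANG>, then "C");
it does not check that the chosen name is valid or installed, and it
ignores C<LANGUAGE>.  The function name C<locale_name_for> is
hypothetical.

    use strict;
    use warnings;

    # %env stands in for %ENV, so the function is easy to try out
    sub locale_name_for {
        my ($category, %env) = @_;
        for my $variable ('LC_ALL', $category, 'LANG') {
            my $value = $env{$variable};
            return $value if defined $value && length $value;
        }
        return 'C';
    }

    print "LC_TIME would come from: ",
        locale_name_for('LC_TIME', %ENV), "\n";

For example, with C<LC_TIME> set to "fr_FR" and C<LANG> set to "de_DE",
the function returns "fr_FR" for C<LC_TIME> and "de_DE" for
C<LC_COLLATE>.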
1371
1372 =head2 Examples
1373
The C<LC_NUMERIC> category controls numeric output:
1375
1376    use locale;
1377    use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
1378    setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
1379    printf "%g\n", 1.23; # If the "fr_FR" succeeded, probably shows 1,23.
1380
1381 and also how strings are parsed by C<POSIX::strtod()> as numbers:
1382
1383    use locale;
1384    use POSIX qw(locale_h strtod);
1385    setlocale(LC_NUMERIC, "de_DE") or die "Entschuldigung";
1386    my $x = strtod("2,34") + 5;
1387    print $x, "\n"; # Probably shows 7,34.
1388
1389 =head1 NOTES
1390
1391 =head2 String C<eval> and C<LC_NUMERIC>
1392
1393 A string L<eval|perlfunc/eval EXPR> parses its expression as standard
1394 Perl.  It therefore expects the decimal point to be a dot.  If
1395 C<LC_NUMERIC> is set to have this be a comma instead, the parsing will
1396 be confused, perhaps silently.
1397
1398  use locale;
1399  use POSIX qw(locale_h);
1400  setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
1401  my $a = 1.2;
1402  print eval "$a + 1.5";
1403  print "\n";
1404
1405 prints C<13,5>.  This is because in that locale, the comma is the
1406 decimal point character.  The C<eval> thus expands to:
1407
1408  eval "1,2 + 1.5"
1409
1410 and the result is not what you likely expected.  No warnings are
1411 generated.  If you do string C<eval>s within the scope of
1412 S<C<use locale>>, you should instead change the C<eval> line to do
1413 something like:
1414
1415  print eval "no locale; $a + 1.5";
1416
1417 This prints C<2.7>.
1418
1419 You could also exclude C<LC_NUMERIC>, if you don't need it, with:
1420
1421  use locale ':!numeric';
1422
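Another way to avoid the problem, offered here as a sketch rather than
as the documented remedy, is to keep the number out of the source text
altogether.  With a single-quoted string nothing is interpolated, and
the C<eval> refers to the lexical variable itself, so its value is never
turned into a locale-formatted string before parsing.  The variable name
C<$n> is just for illustration.

```perl
use strict;
use warnings;
use locale;
use POSIX qw(locale_h);

# Use a comma-radix locale if one is installed; the sum below is the
# same either way.
setlocale(LC_NUMERIC, "fr_FR");

my $n = 1.2;
# Single quotes: the eval'd source is literally '$n + 1.5', so no number
# is stringified under the locale before Perl parses the expression.
my $sum = eval '$n + 1.5';
die $@ if $@;

{
    no locale;            # format the result with a dot for display
    print "$sum\n";
}
```
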
1423 =head2 Backward compatibility
1424
1425 Versions of Perl prior to 5.004 B<mostly> ignored locale information,
1426 generally behaving as if something similar to the C<"C"> locale were
1427 always in force, even if the program environment suggested otherwise
1428 (see L</The setlocale function>).  By default, Perl still behaves this
1429 way for backward compatibility.  If you want a Perl application to pay
1430 attention to locale information, you B<must> use the S<C<use locale>>
1431 pragma (see L</The "use locale" pragma>) or, in the unlikely event
1432 that you want to do so for just pattern matching, the
1433 C</l> regular expression modifier (see L<perlre/Character set
1434 modifiers>) to instruct it to do so.
1435
1436 Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
1437 information if available; that is, C<\w> did understand which
1438 characters were letters according to the locale environment variables.
1439 The problem was that the user had no control over the feature:
1440 if the C library supported locales, Perl used them.
1441
1442 =head2 I18N::Collate obsolete
1443
1444 In versions of Perl prior to 5.004, per-locale collation was possible
1445 using the C<I18N::Collate> library module.  This module is now mildly
1446 obsolete and should be avoided in new applications.  The C<LC_COLLATE>
1447 functionality is now integrated into the Perl core language: One can
1448 use locale-specific scalar data completely normally with C<use locale>,
1449 so there is no longer any need to juggle with the scalar references of
1450 C<I18N::Collate>.
1451
1452 =head2 Sort speed and memory use impacts
1453
1454 Comparing and sorting by locale is usually slower than the default
1455 sorting; slow-downs of two to four times have been observed.  It will
1456 also consume more memory: once a Perl scalar variable has participated
1457 in any string comparison or sorting operation obeying the locale
1458 collation rules, it will take 3-15 times more memory than before.  (The
1459 exact multiplier depends on the string's contents, the operating system
1460 and the locale.) These downsides are dictated more by the operating
1461 system's implementation of the locale system than by Perl.
1462
1463 =head2 Freely available locale definitions
1464
1465 The Unicode CLDR project extracts the POSIX portion of many of its
1466 locales, available at
1467
1468   http://unicode.org/Public/cldr/2.0.1/
1469
1470 (Newer versions of CLDR require you to compute the POSIX data yourself.
1471 See L<http://unicode.org/Public/cldr/latest/>.)
1472
1473 There is a large collection of locale definitions at:
1474
1475   http://std.dkuug.dk/i18n/WG15-collection/locales/
1476
1477 You should be aware that it is
1478 unsupported, and is not claimed to be fit for any purpose.  If your
1479 system allows installation of arbitrary locales, you may find the
1480 definitions useful as they are, or as a basis for the development of
1481 your own locales.
1482
1483 =head2 I18n and l10n
1484
1485 "Internationalization" is often abbreviated as B<i18n> because its first
1486 and last letters are separated by eighteen others.  (You may guess why
1487 the internalin ... internaliti ... i18n tends to get abbreviated.)  In
1488 the same way, "localization" is often abbreviated to B<l10n>.
1489
1490 =head2 An imperfect standard
1491
1492 Internationalization, as defined in the C and POSIX standards, can be
1493 criticized as incomplete and ungainly.  The standards also tend, like
1494 standards groups, to divide the world into nations, when we all know
1495 that the world can equally well be divided into bankers, bikers, gamers,
1496 and so on.
1497
1498 =head1 Unicode and UTF-8
1499
1500 Unicode support was introduced in Perl v5.6, and is more fully
1501 implemented in v5.8 and later.  See L<perluniintro>.
1502
1503 Starting in Perl v5.20, UTF-8 locales are supported in Perl, except
1504 C<LC_COLLATE> is only partially supported; collation support is improved
1505 in Perl v5.26 to a level that may be sufficient for your needs
1506 (see L</Category C<LC_COLLATE>: Collation: Text Comparisons and Sorting>).
1507
1508 If you have Perl v5.16 or v5.18 and can't upgrade, you can use
1509
1510     use locale ':not_characters';
1511
1512 When this form of the pragma is used, only the non-character portions of
1513 locales are used by Perl, for example C<LC_NUMERIC>.  Perl assumes that
1514 you have translated all the characters it is to operate on into Unicode
1515 (actually the platform's native character set (ASCII or EBCDIC) plus
1516 Unicode).  For data in files, this can conveniently be done by also
1517 specifying
1518
1519     use open ':locale';
1520
1521 This pragma arranges for all inputs from files to be translated into
1522 Unicode from the current locale as specified in the environment (see
1523 L</ENVIRONMENT>), and all outputs to files to be translated back
1524 into the locale.  (See L<open>.)  On a per-filehandle basis, you can
1525 instead use the L<PerlIO::locale> module, or the L<Encode::Locale>
1526 module, both available from CPAN.  The latter module also has methods to
1527 ease the handling of C<ARGV> and environment variables, and can be used
1528 on individual strings.  If you know that all your locales will be
1529 UTF-8, as many are these days, you can use the L<B<-C>|perlrun/-C>
1530 command line switch.
1531
1532 This form of the pragma allows essentially seamless handling of locales
1533 with Unicode.  The collation order will be by Unicode code point order.
1534 L<Unicode::Collate> can be used to get Unicode rules collation.
1535
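The difference between code point order and Unicode rules collation can
be seen with a short sketch.  It assumes only the L<Unicode::Collate>
module with its default collation table; the word list is made up for
illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;     # bundled with Perl

my @words = qw(banana Apple cherry apple Banana);

# Code point order: every ASCII capital sorts before every lowercase letter.
my @by_code_point = sort @words;

# Unicode Collation Algorithm with the default collation table.
my $collator = Unicode::Collate->new;
my @by_uca   = $collator->sort(@words);

print "code point: @by_code_point\n";
print "UCA:        @by_uca\n";
```

In code point order the capitalized words come first as a group, whereas
the collator keeps C<apple> and C<Apple> next to each other.
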
1536 All the modules and switches just described can be used in v5.20 with
1537 just plain C<use locale>, and, should the input locales not be UTF-8,
1538 you'll get the less than ideal behavior, described below, that you get
1539 with pre-v5.16 Perls, or when you use the locale pragma without the
1540 C<:not_characters> parameter in v5.16 and v5.18.  If you are using
1541 exclusively UTF-8 locales in v5.20 and higher, the rest of this section
1542 does not apply to you.
1543
1544 There are two cases, multi-byte and single-byte locales.  First
1545 multi-byte:
1546
1547 The only multi-byte (or wide character) locale that Perl is ever likely
1548 to support is UTF-8.  This is due to the difficulty of implementation,
1549 the fact that high quality UTF-8 locales are now published for every
1550 area of the world (L<http://unicode.org/Public/cldr/2.0.1/> for
1551 ones that are already set-up, but from an earlier version;
1552 L<http://unicode.org/Public/cldr/latest/> for the most up-to-date, but
1553 you have to extract the POSIX information yourself), and that
1554 failing all that you can use the L<Encode> module to translate to/from
1555 your locale.  So, you'll have to do one of those things if you're using
1556 one of these locales, such as Big5 or Shift JIS.  In Perls before v5.20,
1557 which lack full UTF-8 locale support, UTF-8 locales may still work
1558 reasonably well (depending on your C library implementation) simply
1559 because both they and Perl store characters that take up multiple
1560 bytes the same way.
1561 However, some, if not most, C library implementations may not process
1562 the characters in the upper half of the Latin-1 range (128 - 255)
1563 properly under C<LC_CTYPE>.  To see if a character is a particular type
1564 under a locale, Perl uses functions like C<isalnum()>.  Your C
1565 library may not work for UTF-8 locales with those functions, instead
1566 only working under the newer wide library functions like C<iswalnum()>,
1567 which Perl does not use.
1568 These multi-byte locales are treated like single-byte locales, and will
1569 have the restrictions described below.  Starting in Perl v5.22 a warning
1570 message is raised when Perl detects a multi-byte locale that it doesn't
1571 fully support.
1572
1573 For single-byte locales,
1574 Perl generally takes the tack of using locale rules on code points that can fit
1575 in a single byte, and Unicode rules for those that can't (though this
1576 isn't uniformly applied, see the note at the end of this section).  This
1577 prevents many problems in locales that aren't UTF-8.  Suppose the locale
1578 is ISO8859-7, Greek.  The character at 0xD7 there is a capital Chi. But
1579 in the ISO8859-1 locale, Latin1, it is a multiplication sign.  The POSIX
1580 regular expression character class C<[[:alpha:]]> will magically match
1581 0xD7 in the Greek locale but not in the Latin one.
1582
1583 However, there are places where this breaks down.  Certain Perl constructs are
1584 for Unicode only, such as C<\p{Alpha}>.  They assume that 0xD7 always has its
1585 Unicode meaning (or the equivalent on EBCDIC platforms).  Since Latin1 is a
1586 subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
1587 Unicode, C<\p{Alpha}> will never match it, regardless of locale.  A similar
1588 issue occurs with C<\N{...}>.  Prior to v5.20, it is therefore a bad
1589 idea to use C<\p{}> or
1590 C<\N{}> under plain C<use locale>--I<unless> you can guarantee that the
1591 locale will be ISO8859-1.  Use POSIX character classes instead.
1592
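The Unicode-only side of this can be checked without any locale in
effect.  The sketch below, which assumes nothing beyond the core
L<charnames> module, shows that C<\p{Alpha}> goes by the Unicode meaning
of a code point: 0xD7 is the multiplication sign, and the Greek capital
Chi lives at U+03A7.

```perl
use strict;
use warnings;
use charnames ();         # for charnames::viacode()

for my $cp (0xD7, 0x3A7) {
    my $char = chr $cp;
    printf "U+%04X %-26s \\p{Alpha}: %s\n",
        $cp,
        charnames::viacode($cp),
        ($char =~ /\p{Alpha}/ ? "matches" : "does not match");
}
```
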
1593 Another problem with this approach is that operations that cross the
1594 single byte/multiple byte boundary are not well-defined, and so are
1595 disallowed.  (This boundary is between the codepoints at 255/256.)
1596 For example, lower casing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
1597 should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF).  But in the
1598 Greek locale, for example, there is no character at 0xFF, and Perl
1599 has no way of knowing what the character at 0xFF is really supposed to
1600 represent.  Thus it disallows the operation.  In this mode, the
1601 lowercase of U+0178 is itself.
1602
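A short sketch of the two behaviors follows.  It assumes that C<"C"> is
a single-byte locale on your platform; on Perls that warn about such
operations, the second C<lc> may also emit a warning.

```perl
use strict;
use warnings;

my $capital = "\x{178}";  # LATIN CAPITAL LETTER Y WITH DIAERESIS

# Unicode rules (no locale in effect): the result crosses the 255/256
# boundary and lands on U+00FF.
printf "Unicode rules: U+%04X\n", ord lc $capital;

{
    use locale;
    use POSIX qw(locale_h);
    # Assumption: "C" is a single-byte locale on this platform.
    setlocale(LC_CTYPE, "C");
    # Under a single-byte locale the cross-boundary mapping is not done.
    printf "C locale:      U+%04X\n", ord lc $capital;
}
```
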
1603 The same problems ensue if you enable automatic UTF-8-ification of your
1604 standard file handles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
1605 non-UTF-8 locales (by using either the B<-C> command line switch or the
1606 C<PERL_UNICODE> environment variable; see L<perlrun>).
1607 Things are read in as UTF-8, which would normally imply a Unicode
1608 interpretation, but the presence of a locale causes them to be interpreted
1609 in that locale instead.  For example, a 0xD7 code point in the Unicode
1610 input, which should mean the multiplication sign, won't be interpreted by
1611 Perl that way under the Greek locale.  This is not a problem
1612 I<provided> you make certain that all locales will always and only be either
1613 an ISO8859-1, or, if you don't have a deficient C library, a UTF-8 locale.
1614
1615 Still another problem is that this approach can lead to two code
1616 points meaning the same character.  Thus in a Greek locale, both U+03A7
1617 and U+00D7 are GREEK CAPITAL LETTER CHI.
1618
1619 Because of all these problems, starting in v5.22, Perl will raise a
1620 warning if a multi-byte (hence Unicode) code point is used when a
1621 single-byte locale is in effect.  (Although it doesn't check for this if
1622 doing so would unreasonably slow execution down.)
1623
1624 Vendor locales are notoriously buggy, and it is difficult for Perl to test
1625 its locale-handling code because this interacts with code that Perl has no
1626 control over; therefore the locale-handling code in Perl may be buggy as
1627 well.  (However, the Unicode-supplied locales should be better, and
1628 there is a feedback mechanism to correct any problems.  See
1629 L</Freely available locale definitions>.)
1630
1631 If you have Perl v5.16, the problems mentioned above go away if you use
1632 the C<:not_characters> parameter to the locale pragma (except for vendor
1633 bugs in the non-character portions).  If you don't have v5.16, and you
1634 I<do> have locales that work, using them may be worthwhile for certain
1635 specific purposes, as long as you keep in mind the gotchas already
1636 mentioned.  For example, if the collation for your locales works, it
1637 runs faster under locales than under L<Unicode::Collate>; and you gain
1638 access to such things as the local currency symbol and the names of the
1639 months and days of the week.  (But to hammer home the point, in v5.16,
1640 you get this access without the downsides of locales by using the
1641 C<:not_characters> form of the pragma.)
1642
1643 Note: The policy of using locale rules for code points that can fit in a
1644 byte, and Unicode rules for those that can't, is not uniformly applied.
1645 Pre-v5.12, it was somewhat haphazard; in v5.12 it was applied fairly
1646 consistently to regular expression matching except for bracketed
1647 character classes; in v5.14 it was extended to all regex matches; and in
1648 v5.16 to the casing operations such as C<\L> and C<uc()>.  For
1649 collation, in all releases so far, the system's C<strxfrm()> function is
1650 called, and whatever it does is what you get.  Starting in v5.26, various
1651 bugs are fixed with the way perl uses this function.
1652
1653 =head1 BUGS
1654
1655 =head2 Collation of strings containing embedded C<NUL> characters
1656
1657 C<NUL> characters will sort the same as the lowest collating control
1658 character does, or to C<"\001"> in the unlikely event that there are no
1659 control characters at all in the locale.  In cases where the strings
1660 don't contain this non-C<NUL> control, the results will be correct, and
1661 in many locales, this control, whatever it might be, will rarely be
1662 encountered.  But there are cases where a C<NUL> should sort before this
1663 control, but doesn't.  If two strings do collate identically, the one
1664 containing the C<NUL> will sort earlier.  Prior to 5.26, there were
1665 more bugs.
1666
1667 =head2 Multi-threaded
1668
1669 XS code or C-language libraries called from it that use the system
1670 L<C<setlocale(3)>> function (except on Windows) likely will not work
1671 from a multi-threaded application without changes.  See
1672 L<perlxs/Locale-aware XS code>.
1673
1674 An XS module that is locale-dependent could have been written under the
1675 assumption that it will never be called in a multi-threaded environment,
1676 and so uses other non-locale constructs that aren't multi-thread-safe.
1677 See L<perlxs/Thread-aware system interfaces>.
1678
1679 POSIX does not define a way to get the name of the current per-thread
1680 locale.  Some systems, such as Darwin and NetBSD, do implement a
1681 function, L<querylocale(3)>, to do this.  On non-Windows systems without
1682 it, such as Linux, there are some additional caveats:
1683
1684 =over
1685
1686 =item *
1687
1688 An embedded perl needs to be started up while the global locale is in
1689 effect.  See L<perlembed/Using embedded Perl with POSIX locales>.
1690
1691 =item *
1692
1693 It becomes more important for perl to know about all the possible
1694 locale categories on the platform, even if they aren't apparently used
1695 in your program.  Perl knows all of the Linux ones.  If your platform
1696 has others, you can send email to L<mailto:perlbug@perl.org> to request
1697 their inclusion in the next release.  In the meantime, it is possible to
1698 edit the Perl source to teach it about the category, and then recompile.
1699 Search for instances of, say, C<LC_PAPER> in the source, and use that as
1700 a template to add the omitted one.
1701
1702 =item *
1703
1704 It is possible, though hard to do, to call C<POSIX::setlocale> with a
1705 locale that it doesn't recognize as syntactically legal, but actually is
1706 legal on that system.  This should happen only with embedded perls, or
1707 if you hand-craft a locale name yourself.
1708
1709 =back
1710
1711 =head2 Broken systems
1712
1713 In certain systems, the operating system's locale support
1714 is broken and cannot be fixed or used by Perl.  Such deficiencies can
1715 and will result in mysterious hangs and/or Perl core dumps when
1716 C<use locale> is in effect.  When confronted with such a system,
1717 please report in excruciating detail to <F<perlbug@perl.org>>, and
1718 also contact your vendor: bug fixes may exist for these problems
1719 in your operating system.  Sometimes such bug fixes are called an
1720 operating system upgrade.  If you have the source for Perl, include in
1721 the perlbug email the output of the test described above in L</Testing
1722 for broken locales>.
1723
1724 =head1 SEE ALSO
1725
1726 L<I18N::Langinfo>, L<perluniintro>, L<perlunicode>, L<open>,
1727 L<POSIX/localeconv>,
1728 L<POSIX/setlocale>, L<POSIX/strcoll>, L<POSIX/strftime>,
1729 L<POSIX/strtod>, L<POSIX/strxfrm>.
1730
1731 For special considerations when Perl is embedded in a C program,
1732 see L<perlembed/Using embedded Perl with POSIX locales>.
1733
1734 =head1 HISTORY
1735
1736 Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
1737 Dunlop, assisted by the perl5-porters.  Prose worked over a bit by
1738 Tom Christiansen, and now maintained by Perl 5 porters.