=over
-=item Category LC_NUMERIC: Numeric formatting
+=item Category C<LC_NUMERIC>: Numeric formatting
This indicates how numbers should be formatted for human readability,
for example the character used as the decimal point.
-=item Category LC_MONETARY: Formatting of monetary amounts
+=item Category C<LC_MONETARY>: Formatting of monetary amounts
=for comment
-The nbsp below makes this look better
+The nbsp below makes this look better (though not great)
E<160>
-=item Category LC_TIME: Date/Time formatting
+=item Category C<LC_TIME>: Date/Time formatting
=for comment
-The nbsp below makes this look better
+The nbsp below makes this look better (though not great)
E<160>
-=item Category LC_MESSAGES: Error and other messages
+=item Category C<LC_MESSAGES>: Error and other messages
This is used by Perl itself only for accessing operating system error
messages via L<$!|perlvar/$ERRNO> and L<$^E|perlvar/$EXTENDED_OS_ERROR>.
-=item Category LC_COLLATE: Collation
+=item Category C<LC_COLLATE>: Collation
This indicates the ordering of letters for comparison and sorting.
In Latin alphabets, for example, "b" generally follows "a".
-=item Category LC_CTYPE: Character Types
+=item Category C<LC_CTYPE>: Character Types
This indicates, for example, whether a character is an uppercase letter.
L<setlocale()|/The setlocale function> described below. If that function
hasn't yet been called in the course of the program's execution, the
current locale is that which was determined by the L</"ENVIRONMENT"> in
-effect at the start of the program, except that
-C<L<LC_NUMERIC|/Category LC_NUMERIC: Numeric Formatting>> is always
-initialized to the C locale (the C locale is mentioned under L<Finding
-locales>).
+effect at the start of the program.
If there is no valid environment, the current locale is whatever the
-system default has been set to. It is likely, but not necessarily, the
-"C" locale.
+system default has been set to. On POSIX systems, it is likely, but
+not necessarily, the "C" locale. On Windows, the default is set via the
+computer's S<C<Control Panel-E<gt>Regional and Language Options>> (or its
+current equivalent).
The operations that are affected by locale are:
=item *
-The variable L<$!|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
-C<$OS_ERROR>) when used as strings always are in terms of the current
-locale.
+The variables L<$!|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
+C<$OS_ERROR>) and L<$^E|perlvar/$EXTENDED_OS_ERROR> (and its synonym
+C<$EXTENDED_OS_ERROR>) when used as strings always are in terms of the
+current locale and as if within the scope of L<"use bytes"|bytes>. This is
+likely to change in Perl v5.22.
=item *
=item *
-Perl also provides lite wrappers for XS modules to use some C library
-C<printf> functions. These wrappers don't do anything with the locale,
-and the underlying C library function is affected by the locale in
-effect at the time of the wrapper call.
-The affected functions are
-L<perlapi/my_sprintf>,
-L<perlapi/my_snprintf>,
+XS modules for all categories but C<LC_NUMERIC> get the underlying
+locale, and hence any C library functions they call will use that
+underlying locale. Perl always initializes C<LC_NUMERIC> to C<"C">
+because too many modules are unable to cope with the decimal point in a
+floating point number not being a dot (it's a comma in many locales).
+But note that these modules are vulnerable because C<LC_NUMERIC>
+currently can be changed at any time by a call to the C C<setlocale()>
+by XS code or by something XS code calls, or by C<POSIX::setlocale()> by
+Perl code. This is true also for the Perl-provided lite wrappers for XS
+modules to use some C library C<printf> functions:
+C<Gconvert>,
+L<my_sprintf|perlapi/my_sprintf>,
+L<my_snprintf|perlapi/my_snprintf>,
and
-L<perlapi/my_vsnprintf>.
+L<my_vsnprintf|perlapi/my_vsnprintf>.
=back
-=item Lingering effects of C<S<use locale>>
+=for comment
+The nbsp below makes this look better (though not great)
+
+E<160>
+
+=item B<Lingering effects of C<S<use locale>>>
Certain Perl operations that are set-up within the scope of a
C<use locale> variant retain that effect even outside the scope.
=back
+=for comment
+The nbsp below makes this look better (though not great)
+
+E<160>
+
=item B<Under C<"use locale ':not_characters';">>
=over 4
=back
=for comment
-The nbsp below makes this look better
+The nbsp below makes this look better (though not great)
E<160>
operands are char-for-char identical. If you really want to know whether
two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
-L<Category LC_COLLATE: Collation>.
+L<Category C<LC_COLLATE>: Collation>.
=item *
# LC_CTYPE -- explained below
# (Showing the testing for success/failure of operations is
# omitted in these examples to avoid distracting from the main
- # point
+ # point)
use POSIX qw(locale_h);
use locale;
# LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
setlocale(LC_CTYPE, "");
- # LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
- # environment variables. See below for documentation.
+ # LC_CTYPE now reset to the default defined by the
+ # LC_ALL/LC_CTYPE/LANG environment variables, or to the system
+ # default. See below for documentation.
# restore the old locale
setlocale(LC_CTYPE, $old_locale);
example.
If no second argument is provided and the category is something other
-than LC_ALL, the function returns a string naming the current locale
+than C<LC_ALL>, the function returns a string naming the current locale
for the category. You can use this value as the second argument in a
subsequent call to C<setlocale()>, B<but> on some platforms the string
is opaque, not something that most people would be able to decipher as
to what locale it means.
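
As a hedged illustration of the one-argument (query) form described above, here is a minimal sketch that saves the current C<LC_CTYPE> locale name, switches to "C", and then restores the saved value. It assumes the "C" locale is available and that the platform reports it under that name; the variable names are illustrative only.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);    # imports setlocale() and the LC_* constants

# One-argument form: query only; the locale is not changed.
my $saved = setlocale(LC_CTYPE);

# Switch to the "C" locale, then query again.
setlocale(LC_CTYPE, "C");
my $during = setlocale(LC_CTYPE);

# The saved string may be opaque to a human reader, but it is still
# usable as the second argument of a later call.
setlocale(LC_CTYPE, $saved);
my $restored = setlocale(LC_CTYPE);

print "saved=$saved during=$during restored=$restored\n";
```

Treating the returned string as an opaque token, rather than parsing it, keeps the code portable across platforms.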
-If no second argument is provided and the category is LC_ALL, the
+If no second argument is provided and the category is C<LC_ALL>, the
result is implementation-dependent. It may be a string of
concatenated locale names (separator also implementation-dependent)
or a single locale name. Please consult your L<setlocale(3)> man page for
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
-This means that your locale settings had LC_ALL set to "En_US" and
+This means that your locale settings had C<LC_ALL> set to "En_US" and
LANG exists but has no value. Perl tried to believe you but could not.
Instead, Perl gave up and fell back to the "C" locale, the default locale
-that is supposed to work no matter what. This usually means your locale
-settings were wrong, they mention locales your system has never heard
-of, or the locale installation in your system has problems (for example,
-some system files are broken or missing). There are quick and temporary
-fixes to these problems, as well as more thorough and lasting fixes.
+that is supposed to work no matter what. (On Windows, it first tries
+falling back to the system default locale.) This usually means your
+locale settings were wrong, they mention locales your system has never
+heard of, or the locale installation in your system has problems (for
+example, some system files are broken or missing). There are quick and
+temporary fixes to these problems, as well as more thorough and lasting
+fixes.
=head2 Testing for broken locales
locale inconsistencies or to run Perl under the default locale "C".
Perl's moaning about locale problems can be silenced by setting the
-environment variable PERL_BADLANG to a zero value, for example "0".
+environment variable C<PERL_BADLANG> to a zero value, for example "0".
This method really just sweeps the problem under the carpet: you tell
Perl to shut up even when Perl sees that something is wrong. Do not
be surprised if later something locale-dependent misbehaves.
Perl can be run under the "C" locale by setting the environment
-variable LC_ALL to "C". This method is perhaps a bit more civilized
-than the PERL_BADLANG approach, but setting LC_ALL (or
+variable C<LC_ALL> to "C". This method is perhaps a bit more civilized
+than the C<PERL_BADLANG> approach, but setting C<LC_ALL> (or
other locale variables) may affect other programs as well, not just
Perl. In particular, external programs run from within Perl will see
these changes. If you make the new settings permanent (read on), all
programs you run see the changes. See L<"ENVIRONMENT"> for
the full list of relevant environment variables and L<USING LOCALES>
for their effects in Perl. Effects in other programs are
-easily deducible. For example, the variable LC_COLLATE may well affect
+easily deducible. For example, the variable C<LC_COLLATE> may well affect
your B<sort> program (or whatever the program that arranges "records"
alphabetically in your system is called).
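
The point that external programs see these settings can be sketched as follows. This is only an illustration: the helper name C<child_lc_all> is hypothetical, and the sketch assumes a platform where the list form of a piped C<open> is supported. Using C<local> confines the change to one block, so the rest of the program keeps its original environment.

```perl
use strict;
use warnings;

# Hypothetical helper: run a child perl that reports the LC_ALL it
# inherited. $^X is the path of the currently running perl binary.
sub child_lc_all {
    open(my $fh, '-|', $^X, '-e', 'print $ENV{LC_ALL} // "(unset)"')
        or die "cannot run child perl: $!";
    local $/;                      # slurp mode
    my $out = <$fh>;
    close $fh;
    return $out;
}

my $seen;
{
    local $ENV{LC_ALL} = "C";      # change is confined to this block
    $seen = child_lc_all();        # the child inherits LC_ALL=C
}

print "child saw LC_ALL=$seen\n";
```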
The C<POSIX::localeconv()> function allows you to get particulars of the
locale-dependent numeric formatting information specified by the current
-C<LC_NUMERIC> and C<LC_MONETARY> locales. (If you just want the name of
+underlying C<LC_NUMERIC> and C<LC_MONETARY> locales (regardless of
+whether called from within the scope of C<S<use locale>> or not). (If
+you just want the name of
the current locale for a particular category, use C<POSIX::setlocale()>
with a single parameter--see L<The setlocale function>.)
}
print "\n";
+Note that if the platform doesn't have C<LC_NUMERIC> and/or
+C<LC_MONETARY> available or enabled, the corresponding elements of the
+hash will be missing.
+
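
A minimal sketch of defensive access to the hash follows. It forces the "C" locale purely so the result is predictable (an assumption of this example, not a recommendation), and it uses C<exists> instead of assuming that any given element is present.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Assumption for this sketch: use the "C" locale so the output is stable.
setlocale(LC_ALL, "C");

my $lconv = POSIX::localeconv();

# Elements may be missing, so test before use.
my $radix = exists $lconv->{decimal_point}
    ? $lconv->{decimal_point}
    : '(unavailable)';

my $sep = exists $lconv->{thousands_sep}
    ? $lconv->{thousands_sep}
    : '(none)';

print "decimal point: $radix\n";
print "thousands separator: $sep\n";
```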
=head2 I18N::Langinfo
Another interface for querying locale-dependent information is the
some combination categories allow manipulation of more than one
basic category at a time. See L<"ENVIRONMENT"> for a discussion of these.
-=head2 Category LC_COLLATE: Collation
+=head2 Category C<LC_COLLATE>: Collation
In the scope of S<C<use locale>> (but not a
C<use locale ':not_characters'>), Perl looks to the C<LC_COLLATE>
which use the standard system-supplied C<libc> functions that
always obey the current C<LC_COLLATE> locale.
-=head2 Category LC_CTYPE: Character Types
+=head2 Category C<LC_CTYPE>: Character Types
In the scope of S<C<use locale>> (but not a
C<use locale ':not_characters'>), Perl obeys the C<LC_CTYPE> locale
setting. This controls the application's notion of which characters are
-alphabetic. This affects Perl's C<\w> regular expression metanotation,
+alphabetic, numeric, punctuation, I<etc>. This affects Perl's C<\w>
+regular expression metanotation,
which stands for alphanumeric characters--that is, alphabetic,
-numeric, and including other special characters such as the underscore or
-hyphen. (Consult L<perlre> for more information about
+numeric, and the platform's native underscore.
+(Consult L<perlre> for more information about
regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
setting, characters like "E<aelig>", "E<eth>", "E<szlig>", and
"E<oslash>" may be understood as C<\w> characters.
+It also affects things like C<\s>, C<\D>, and the POSIX character
+classes, like C<[[:graph:]]>. (See L<perlrecharclass> for more
+information on all these.)
The C<LC_CTYPE> locale also provides the map used in transliterating
characters between lower and uppercase. This affects the case-mapping
strings and C<s///> substitutions; and case-independent regular expression
pattern matching using the C<i> modifier.
-Finally, C<LC_CTYPE> affects the POSIX character-class test
+Finally, C<LC_CTYPE> affects the (deprecated) POSIX character-class test
functions--C<POSIX::isalpha()>, C<POSIX::islower()>, and so on. For
example, if you move from the "C" locale to a 7-bit Scandinavian one,
you may find--possibly to your surprise--that "|" moves from the
digits--for example, in command strings--locale-aware applications
should use C<\w> with the C</a> regular expression modifier. See L<"SECURITY">.
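
The effect of the C</a> modifier can be sketched in isolation. The example below does not enable C<use locale>; it only contrasts C</a> with Unicode rules (C</u>) on the same string. The helper name C<is_ascii_word> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the whole string consists of
# ASCII letters, ASCII digits, and underscores.
sub is_ascii_word {
    my ($str) = @_;
    return $str =~ /\A\w+\z/a ? 1 : 0;
}

my $word = "caf\xE9";    # ends in code point 0xE9, e with acute

my $under_u = $word =~ /\A\w+\z/u ? 1 : 0;   # Unicode rules: 0xE9 is a letter
my $under_a = is_ascii_word($word);           # /a: 0xE9 is rejected

my $plain = is_ascii_word("backup_2014");     # ASCII only: accepted
my $shell = is_ascii_word("rm -rf");          # space and hyphen: rejected

print "u=$under_u a=$under_a plain=$plain shell=$shell\n";
```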
-=head2 Category LC_NUMERIC: Numeric Formatting
+=head2 Category C<LC_NUMERIC>: Numeric Formatting
After a proper C<POSIX::setlocale()> call, and within the scope of one
of the C<use locale> variants, Perl obeys the C<LC_NUMERIC>
See also L<I18N::Langinfo> and C<RADIXCHAR>.
-=head2 Category LC_MONETARY: Formatting of monetary amounts
+=head2 Category C<LC_MONETARY>: Formatting of monetary amounts
The C standard defines the C<LC_MONETARY> category, but not a function
that is affected by its contents. (Those with experience of standards
See also L<I18N::Langinfo> and C<CRNCYSTR>.
-=head2 LC_TIME
+=head2 C<LC_TIME>
Output produced by C<POSIX::strftime()>, which builds a formatted
human-readable date/time string, is affected by the current C<LC_TIME>
=item *
String interpolation with case-mapping, as in, say, C<$dest =
-"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
+"C:\U$name.$ext">, may produce dangerous results if a bogus C<LC_CTYPE>
case-mapping table is in effect.
=item *
All subpatterns, either delivered as a list-context result or as C<$1>
I<etc>., are tainted if C<use locale> (but not
S<C<use locale ':not_characters'>>) is in effect, and the subpattern
-regular expression is matched case-insensitively (C</i>) or contains a
-locale-dependent construct. These constructs include C<\w>
-(to match an alphanumeric character), C<\W> (non-alphanumeric
-character), C<\s> (whitespace character), C<\S> (non whitespace
-character), and the POSIX character classes, such as C<[:alpha:]> (see
-L<perlrecharclass/POSIX Character Classes>).
+regular expression contains a locale-dependent construct. These
+constructs include C<\w> (to match an alphanumeric character), C<\W>
+(non-alphanumeric character), C<\b> and C<\B> (word-boundary and
+non-boundary, which depend on what C<\w> and C<\W> match), C<\s>
+(whitespace character), C<\S> (non-whitespace character), C<\d> and
+C<\D> (digits and non-digits), and the POSIX character classes, such as
+C<[:alpha:]> (see L<perlrecharclass/POSIX Character Classes>).
+
+Tainting is also likely if the pattern is to be matched
+case-insensitively (via C</i>). The exception is if all the code points
+to be matched this way are above 255 and do not have folds under Unicode
+rules to below 256. Tainting is not done for these because Perl
+only uses Unicode rules for such code points, and those rules are the
+same no matter what the current locale.
+
The matched-pattern variables, C<$&>, C<$`> (pre-match), C<$'>
(post-match), and C<$+> (last match) also are tainted.
-(Note that currently there are some bugs where not everything that
-should be tainted gets tainted in all circumstances.)
=item *
=over 12
+=item PERL_SKIP_LOCALE_INIT
+
+If this environment variable, available starting in Perl v5.20,
+evaluates to a TRUE value, Perl does not use the rest of the
+environment variables to initialize the locale. Instead, Perl uses
+whatever the current locale settings are. This is particularly useful
+in embedded environments; see
+L<perlembed/Using embedded Perl with POSIX locales>.
+
=item PERL_BADLANG
A string that can suppress Perl's warning about failed locale settings
zero--that is, "0" or ""-- Perl will complain about locale setting
failures.
-B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
+B<NOTE>: C<PERL_BADLANG> only gives you a way to hide the warning message.
The message tells about some problem in your system's locale support,
and you should investigate what the problem is.
part of the standardized (ISO C, XPG4, POSIX 1.c) C<setlocale()> method
for controlling an application's opinion on data. Windows is non-POSIX,
but Perl arranges for the following to work as described anyway.
+If the locale given by an environment variable is not valid, Perl tries
+the next lower one in priority. If none are valid, on Windows, the
+system default locale is then tried. If all else fails, the C<"C">
+locale is used. If even that doesn't work, something is badly broken,
+but Perl tries to forge ahead with whatever the locale settings might
+be.
=over 12
-=item LC_ALL
+=item C<LC_ALL>
C<LC_ALL> is the "override-all" locale environment variable. If
set, it overrides all the rest of the locale environment variables.
-=item LANGUAGE
+=item C<LANGUAGE>
B<NOTE>: C<LANGUAGE> is a GNU extension, it affects you only if you
are using the GNU libc. This is the case if you are using e.g. Linux.
instead a "path" (":"-separated list) of I<languages> (not locales).
See the GNU C<gettext> library documentation for more information.
-=item LC_CTYPE
+=item C<LC_CTYPE>
In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
locale. In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
chooses the character type locale.
-=item LC_COLLATE
+=item C<LC_COLLATE>
In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
(sorting) locale. In the absence of both C<LC_ALL> and C<LC_COLLATE>,
C<LANG> chooses the collation locale.
-=item LC_MONETARY
+=item C<LC_MONETARY>
In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
formatting locale. In the absence of both C<LC_ALL> and C<LC_MONETARY>,
C<LANG> chooses the monetary formatting locale.
-=item LC_NUMERIC
+=item C<LC_NUMERIC>
In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
locale. In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
chooses the numeric format.
-=item LC_TIME
+=item C<LC_TIME>
In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
formatting locale. In the absence of both C<LC_ALL> and C<LC_TIME>,
C<LANG> chooses the date and time formatting locale.
-=item LANG
+=item C<LANG>
C<LANG> is the "catch-all" locale environment variable. If it is set, it
is used as the last resort after the overall C<LC_ALL> and the
-category-specific C<LC_...>.
+category-specific C<LC_I<foo>>.
=back
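
The precedence among these variables can be sketched in a few lines. The helper C<locale_source> below is hypothetical and is not how Perl itself is implemented: it models only the order C<LC_ALL>, then the category-specific variable, then C<LANG>. It ignores C<LANGUAGE> and the fallback that happens when a named locale is invalid, and it treats an empty value as absent (an assumption of this sketch).

```perl
use strict;
use warnings;

# Hypothetical helper: report which environment variable would supply
# the locale name for a category, by precedence only.
sub locale_source {
    my ($env, $category) = @_;
    for my $var ('LC_ALL', $category, 'LANG') {
        return ($var, $env->{$var})
            if defined $env->{$var} && length $env->{$var};
    }
    return;    # nothing set: the system default (or "C") applies
}

my %env = (LC_NUMERIC => 'de_DE.UTF-8', LANG => 'en_US.UTF-8');

# The category-specific variable wins over LANG.
my ($num_var, $num_loc) = locale_source(\%env, 'LC_NUMERIC');

# With no LC_TIME set, LANG is the last resort.
my ($time_var, $time_loc) = locale_source(\%env, 'LC_TIME');

# LC_ALL overrides everything else.
$env{LC_ALL} = 'C';
my ($all_var, $all_loc) = locale_source(\%env, 'LC_NUMERIC');

print "$num_var=$num_loc $time_var=$time_loc $all_var=$all_loc\n";
```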
=head2 Examples
-The LC_NUMERIC controls the numeric output:
+The C<LC_NUMERIC> category controls the numeric output:
use locale;
use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
they and Perl store characters that take up multiple bytes the same way.
However, some, if not most, C library implementations may not process
the characters in the upper half of the Latin-1 range (128 - 255)
-properly under LC_CTYPE. To see if a character is a particular type
+properly under C<LC_CTYPE>. To see if a character is a particular type
under a locale, Perl uses the functions like C<isalnum()>. Your C
library may not work for UTF-8 locales with those functions, instead
only working under the newer wide library functions like C<iswalnum()>.
Another problem with this approach is that operations that cross the
single byte/multiple byte boundary are not well-defined, and so are
-disallowed. (This boundary is between the codepoints at 255/256.).
+disallowed. (This boundary is between the codepoints at 255/256.)
For example, lower casing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the
Greek locale, for example, there is no character at 0xFF, and Perl