perl5.git.perl.org Git - perl5.git/blame_incremental

... / ...

Commit	Line	Data
	1	=encoding utf8
	2
	3	=head1 NAME
	4
	5	perllocale - Perl locale handling (internationalization and localization)
	6
	7	=head1 DESCRIPTION
	8
	9	In the beginning there was ASCII, the "American Standard Code for
	10	Information Interchange", which works quite well for Americans with
	11	their English alphabet and dollar-denominated currency. But it doesn't
	12	work so well even for other English speakers, who may use different
	13	currencies, such as the pound sterling (as the symbol for that currency
	14	is not in ASCII); and it's hopelessly inadequate for many of the
	15	thousands of the world's other languages.
	16
	17	To address these deficiencies, the concept of locales was invented
	18	(formally the ISO C, XPG4, POSIX 1.c "locale system"). And applications
	19	were and are being written that use the locale mechanism. The process of
	20	making such an application take account of its users' preferences in
	21	these kinds of matters is called B<internationalization> (often
	22	abbreviated as B<i18n>); telling such an application about a particular
	23	set of preferences is known as B<localization> (B<l10n>).
	24
	25	Perl has been extended to support the locale system. This
	26	is controlled per application by using one pragma, one function call,
	27	and several environment variables.
	28
	29	Unfortunately, there are quite a few deficiencies with the design (and
	30	often, the implementations) of locales, and their use for character sets
	31	has mostly been supplanted by Unicode (see L<perlunitut> for an
	32	introduction to that, and keep on reading here for how Unicode interacts
	33	with locales in Perl).
	34
	35	Perl continues to support the old locale system, and starting in v5.16,
	36	provides a hybrid way to use the Unicode character set, along with the
	37	other portions of locales that may not be so problematic.
	38	(Unicode is also creating C<CLDR>, the "Common Locale Data Repository",
	39	L<http://cldr.unicode.org/> which includes more types of information than
	40	are available in the POSIX locale system. At the time of this writing,
	41	there was no CPAN module that provides access to this XML-encoded data.
	42	However, many of its locales have the POSIX-only data extracted, and are
	43	available at L<http://unicode.org/Public/cldr/latest/>.)
	44
	45	=head1 WHAT IS A LOCALE
	46
	47	A locale is a set of data that describes various aspects of how various
	48	communities in the world categorize their world. These categories are
	49	broken down into the following types (some of which include a brief
	50	note here):
	51
	52	=over
	53
	54	=item Category LC_NUMERIC: Numeric formatting
	55
	56	This indicates how numbers should be formatted for human readability,
	57	for example the character used as the decimal point.
	58
	59	=item Category LC_MONETARY: Formatting of monetary amounts
	60
	61	=for comment
	62	The nbsp below makes this look better
	63
	64	E<160>
	65
	66	=item Category LC_TIME: Date/Time formatting
	67
	68	=for comment
	69	The nbsp below makes this look better
	70
	71	E<160>
	72
	73	=item Category LC_MESSAGES: Error and other messages
	74
	75	This is used by Perl itself only for accessing operating system error
	76	messages via L<$!\|perlvar/$ERRNO> and L<$^E\|perlvar/$EXTENDED_OS_ERROR>.
	77
	78	=item Category LC_COLLATE: Collation
	79
	80	This indicates the ordering of letters for comparison and sorting.
	81	In Latin alphabets, for example, "b", generally follows "a".
	82
	83	=item Category LC_CTYPE: Character Types
	84
	85	This indicates, for example if a character is an uppercase letter.
	86
	87	=item Other categories
	88
	89	Some platforms have other categories, dealing with such things as
	90	measurement units and paper sizes. None of these are used directly by
	91	Perl, but outside operations that Perl interacts with may use
	92	these. See L</Not within the scope of any "use locale" variant> below.
	93
	94	=back
	95
	96	More details on the categories used by Perl are given below in L</LOCALE
	97	CATEGORIES>.
	98
	99	Together, these categories go a long way towards being able to customize
	100	a single program to run in many different locations. But there are
	101	deficiencies, so keep reading.
	102
	103	=head1 PREPARING TO USE LOCALES
	104
	105	Perl itself will not use locales unless specifically requested to (but
	106	again note that Perl may interact with code that does use them). Even
	107	if there is such a request, B<all> of the following must be true
	108	for it to work properly:
	109
	110	=over 4
	111
	112	=item *
	113
	114	B<Your operating system must support the locale system>. If it does,
	115	you should find that the C<setlocale()> function is a documented part of
	116	its C library.
	117
	118	=item *
	119
	120	B<Definitions for locales that you use must be installed>. You, or
	121	your system administrator, must make sure that this is the case. The
	122	available locales, the location in which they are kept, and the manner
	123	in which they are installed all vary from system to system. Some systems
	124	provide only a few, hard-wired locales and do not allow more to be
	125	added. Others allow you to add "canned" locales provided by the system
	126	supplier. Still others allow you or the system administrator to define
	127	and add arbitrary locales. (You may have to ask your supplier to
	128	provide canned locales that are not delivered with your operating
	129	system.) Read your system documentation for further illumination.
	130
	131	=item *
	132
	133	B<Perl must believe that the locale system is supported>. If it does,
	134	C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
	135	C<define>.
	136
	137	=back
	138
	139	If you want a Perl application to process and present your data
	140	according to a particular locale, the application code should include
	141	the S<C<use locale>> pragma (see L<The use locale pragma>) where
	142	appropriate, and B<at least one> of the following must be true:
	143
	144	=over 4
	145
	146	=item 1
	147
	148	B<The locale-determining environment variables (see L</"ENVIRONMENT">)
	149	must be correctly set up> at the time the application is started, either
	150	by yourself or by whomever set up your system account; or
	151
	152	=item 2
	153
	154	B<The application must set its own locale> using the method described in
	155	L<The setlocale function>.
	156
	157	=back
	158
	159	=head1 USING LOCALES
	160
	161	=head2 The use locale pragma
	162
	163	By default, Perl itself ignores the current locale. The S<C<use locale>>
	164	pragma tells Perl to use the current locale for some operations.
	165	Starting in v5.16, there is an optional parameter to this pragma:
	166
	167	use locale ':not_characters';
	168
	169	This parameter allows better mixing of locales and Unicode, and is
	170	described fully in L</Unicode and UTF-8>, but briefly, it tells Perl to
	171	not use the character portions of the locale definition, that is
	172	the C<LC_CTYPE> and C<LC_COLLATE> categories. Instead it will use the
	173	native character set (extended by Unicode). When using this parameter,
	174	you are responsible for getting the external character set translated
	175	into the native/Unicode one (which it already will be if it is one of
	176	the increasingly popular UTF-8 locales). There are convenient ways of
	177	doing this, as described in L</Unicode and UTF-8>.
	178
	179	The current locale is set at execution time by
	180	L<setlocale()\|/The setlocale function> described below. If that function
	181	hasn't yet been called in the course of the program's execution, the
	182	current locale is that which was determined by the L</"ENVIRONMENT"> in
	183	effect at the start of the program, except that
	184	C<L<LC_NUMERIC\|/Category LC_NUMERIC: Numeric Formatting>> is always
	185	initialized to the C locale (the C locale is mentioned under L<Finding
	186	locales>).
	187	If there is no valid environment, the current locale is whatever the
	188	system default has been set to. It is likely, but not necessarily, the
	189	"C" locale.
	190
	191	The operations that are affected by locale are:
	192
	193	=over 4
	194
	195	=item B<Not within the scope of any C<"use locale"> variant>
	196
	197	Only operations originating outside Perl should be affected, as follows:
	198
	199	=over 4
	200
	201	=item *
	202
	203	The variable L<$!\|perlvar/$ERRNO> (and its synonyms C<$ERRNO> and
	204	C<$OS_ERROR>) when used as strings always are in terms of the current
	205	locale.
	206
	207	=item *
	208
	209	The current locale is also used when going outside of Perl with
	210	operations like L<system()\|perlfunc/system LIST> or
	211	L<qxE<sol>E<sol>\|perlop/qxE<sol>STRINGE<sol>>, if those operations are
	212	locale-sensitive.
	213
	214	=item *
	215
	216	Also Perl gives access to various C library functions through the
	217	L<POSIX> module. Some of those functions are always affected by the
	218	current locale. For example, C<POSIX::strftime()> uses C<LC_TIME>;
	219	C<POSIX::strtod()> uses C<LC_NUMERIC>; C<POSIX::strcoll()> and
	220	C<POSIX::strxfrm()> use C<LC_COLLATE>; and character classification
	221	functions like C<POSIX::isalnum()> use C<LC_CTYPE>. All such functions
	222	will behave according to the current underlying locale, even if that
	223	locale isn't exposed to Perl space.
	224
	225	=item *
	226
	227	Perl also provides lite wrappers for XS modules to use some C library
	228	C<printf> functions. These wrappers don't do anything with the locale,
	229	and the underlying C library function is affected by the locale in
	230	effect at the time of the wrapper call.
	231	The affected functions are
	232	L<perlapi/my_sprintf>,
	233	L<perlapi/my_snprintf>,
	234	and
	235	L<perlapi/my_vsnprintf>.
	236
	237	=back
	238
	239	=item Lingering effects of C<S<use locale>>
	240
	241	Certain Perl operations that are set-up within the scope of a
	242	C<use locale> variant retain that effect even outside the scope.
	243	These include:
	244
	245	=over 4
	246
	247	=item *
	248
	249	The output format of a L<write()\|perlfunc/write> is determined by an
	250	earlier format declaration (L<perlfunc/format>), so whether or not the
	251	output is affected by locale is determined by if the C<format()> is
	252	within the scope of a C<use locale> variant, not whether the C<write()>
	253	is.
	254
	255	=item *
	256
	257	Regular expression patterns can be compiled using
	258	L<qrE<sol>E<sol>\|perlop/qrE<sol>STRINGE<sol>msixpodual> with actual
	259	matching deferred to later. Again, it is whether or not the compilation
	260	was done within the scope of C<use locale> that determines the match
	261	behavior, not if the matches are done within such a scope or not.
	262
	263	=back
	264
	265	=item B<Under C<"use locale ':not_characters';">>
	266
	267	=over 4
	268
	269	=item *
	270
	271	All the non-Perl operations.
	272
	273	=item *
	274
	275	B<Format declarations> (L<perlfunc/format>) and hence any subsequent
	276	C<write()>s use C<LC_NUMERIC>.
	277
	278	=item *
	279
	280	B<stringification and output> use C<LC_NUMERIC>.
	281	These include the results of
	282	C<print()>,
	283	C<printf()>,
	284	C<say()>,
	285	and
	286	C<sprintf()>.
	287
	288	=back
	289
	290	=for comment
	291	The nbsp below makes this look better
	292
	293	E<160>
	294
	295	=item B<Under just plain C<"use locale";>>
	296
	297	=over 4
	298
	299	=item *
	300
	301	All the above operations
	302
	303	=item *
	304
	305	B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) use
	306	C<LC_COLLATE>. C<sort()> is also affected if used without an
	307	explicit comparison function, because it uses C<cmp> by default.
	308
	309	B<Note:> C<eq> and C<ne> are unaffected by locale: they always
	310	perform a char-by-char comparison of their scalar operands. What's
	311	more, if C<cmp> finds that its operands are equal according to the
	312	collation sequence specified by the current locale, it goes on to
	313	perform a char-by-char comparison, and only returns I<0> (equal) if the
	314	operands are char-for-char identical. If you really want to know whether
	315	two strings--which C<eq> and C<cmp> may consider different--are equal
	316	as far as collation in the locale is concerned, see the discussion in
	317	L<Category LC_COLLATE: Collation>.
	318
	319	=item *
	320
	321	B<Regular expressions and case-modification functions> (C<uc()>, C<lc()>,
	322	C<ucfirst()>, and C<lcfirst()>) use C<LC_CTYPE>
	323
	324	=back
	325
	326	=back
	327
	328	The default behavior is restored with the S<C<no locale>> pragma, or
	329	upon reaching the end of the block enclosing C<use locale>.
	330	Note that C<use locale> and C<use locale ':not_characters'> may be
	331	nested, and that what is in effect within an inner scope will revert to
	332	the outer scope's rules at the end of the inner scope.
	333
	334	The string result of any operation that uses locale
	335	information is tainted, as it is possible for a locale to be
	336	untrustworthy. See L<"SECURITY">.
	337
	338	=head2 The setlocale function
	339
	340	You can switch locales as often as you wish at run time with the
	341	C<POSIX::setlocale()> function:
	342
	343	# Import locale-handling tool set from POSIX module.
	344	# This example uses: setlocale -- the function call
	345	# LC_CTYPE -- explained below
	346	# (Showing the testing for success/failure of operations is
	347	# omitted in these examples to avoid distracting from the main
	348	# point
	349
	350	use POSIX qw(locale_h);
	351	use locale;
	352	my $old_locale;
	353
	354	# query and save the old locale
	355	$old_locale = setlocale(LC_CTYPE);
	356
	357	setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
	358	# LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
	359
	360	setlocale(LC_CTYPE, "");
	361	# LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
	362	# environment variables. See below for documentation.
	363
	364	# restore the old locale
	365	setlocale(LC_CTYPE, $old_locale);
	366
	367	The first argument of C<setlocale()> gives the B<category>, the second the
	368	B<locale>. The category tells in what aspect of data processing you
	369	want to apply locale-specific rules. Category names are discussed in
	370	L</LOCALE CATEGORIES> and L</"ENVIRONMENT">. The locale is the name of a
	371	collection of customization information corresponding to a particular
	372	combination of language, country or territory, and codeset. Read on for
	373	hints on the naming of locales: not all systems name locales as in the
	374	example.
	375
	376	If no second argument is provided and the category is something other
	377	than LC_ALL, the function returns a string naming the current locale
	378	for the category. You can use this value as the second argument in a
	379	subsequent call to C<setlocale()>, B<but> on some platforms the string
	380	is opaque, not something that most people would be able to decipher as
	381	to what locale it means.
	382
	383	If no second argument is provided and the category is LC_ALL, the
	384	result is implementation-dependent. It may be a string of
	385	concatenated locale names (separator also implementation-dependent)
	386	or a single locale name. Please consult your L<setlocale(3)> man page for
	387	details.
	388
	389	If a second argument is given and it corresponds to a valid locale,
	390	the locale for the category is set to that value, and the function
	391	returns the now-current locale value. You can then use this in yet
	392	another call to C<setlocale()>. (In some implementations, the return
	393	value may sometimes differ from the value you gave as the second
	394	argument--think of it as an alias for the value you gave.)
	395
	396	As the example shows, if the second argument is an empty string, the
	397	category's locale is returned to the default specified by the
	398	corresponding environment variables. Generally, this results in a
	399	return to the default that was in force when Perl started up: changes
	400	to the environment made by the application after startup may or may not
	401	be noticed, depending on your system's C library.
	402
	403	Note that Perl ignores the current C<LC_CTYPE> and C<LC_COLLATE> locales
	404	within the scope of a C<use locale ':not_characters'>.
	405
	406	If C<set_locale()> fails for some reason (for example, an attempt to set
	407	to a locale unknown to the system), the locale for the category is not
	408	changed, and the function returns C<undef>.
	409
	410
	411	For further information about the categories, consult L<setlocale(3)>.
	412
	413	=head2 Finding locales
	414
	415	For locales available in your system, consult also L<setlocale(3)> to
	416	see whether it leads to the list of available locales (search for the
	417	I<SEE ALSO> section). If that fails, try the following command lines:
	418
	419	locale -a
	420
	421	nlsinfo
	422
	423	ls /usr/lib/nls/loc
	424
	425	ls /usr/lib/locale
	426
	427	ls /usr/lib/nls
	428
	429	ls /usr/share/locale
	430
	431	and see whether they list something resembling these
	432
	433	en_US.ISO8859-1 de_DE.ISO8859-1 ru_RU.ISO8859-5
	434	en_US.iso88591 de_DE.iso88591 ru_RU.iso88595
	435	en_US de_DE ru_RU
	436	en de ru
	437	english german russian
	438	english.iso88591 german.iso88591 russian.iso88595
	439	english.roman8 russian.koi8r
	440
	441	Sadly, even though the calling interface for C<setlocale()> has been
	442	standardized, names of locales and the directories where the
	443	configuration resides have not been. The basic form of the name is
	444	I<language_territory>B<.>I<codeset>, but the latter parts after
	445	I<language> are not always present. The I<language> and I<country>
	446	are usually from the standards B<ISO 3166> and B<ISO 639>, the
	447	two-letter abbreviations for the countries and the languages of the
	448	world, respectively. The I<codeset> part often mentions some B<ISO
	449	8859> character set, the Latin codesets. For example, C<ISO 8859-1>
	450	is the so-called "Western European codeset" that can be used to encode
	451	most Western European languages adequately. Again, there are several
	452	ways to write even the name of that one standard. Lamentably.
	453
	454	Two special locales are worth particular mention: "C" and "POSIX".
	455	Currently these are effectively the same locale: the difference is
	456	mainly that the first one is defined by the C standard, the second by
	457	the POSIX standard. They define the B<default locale> in which
	458	every program starts in the absence of locale information in its
	459	environment. (The I<default> default locale, if you will.) Its language
	460	is (American) English and its character codeset ASCII or, rarely, a
	461	superset thereof (such as the "DEC Multinational Character Set
	462	(DEC-MCS)"). B<Warning>. The C locale delivered by some vendors
	463	may not actually exactly match what the C standard calls for. So
	464	beware.
	465
	466	B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
	467	POSIX-conformant), so use "C" when you need explicitly to specify this
	468	default locale.
	469
	470	=head2 LOCALE PROBLEMS
	471
	472	You may encounter the following warning message at Perl startup:
	473
	474	perl: warning: Setting locale failed.
	475	perl: warning: Please check that your locale settings:
	476	LC_ALL = "En_US",
	477	LANG = (unset)
	478	are supported and installed on your system.
	479	perl: warning: Falling back to the standard locale ("C").
	480
	481	This means that your locale settings had LC_ALL set to "En_US" and
	482	LANG exists but has no value. Perl tried to believe you but could not.
	483	Instead, Perl gave up and fell back to the "C" locale, the default locale
	484	that is supposed to work no matter what. This usually means your locale
	485	settings were wrong, they mention locales your system has never heard
	486	of, or the locale installation in your system has problems (for example,
	487	some system files are broken or missing). There are quick and temporary
	488	fixes to these problems, as well as more thorough and lasting fixes.
	489
	490	=head2 Testing for broken locales
	491
	492	If you are building Perl from source, the Perl test suite file
	493	F<lib/locale.t> can be used to test the locales on your system.
	494	Setting the environment variable C<PERL_DEBUG_FULL_TEST> to 1
	495	will cause it to output detailed results. For example, on Linux, you
	496	could say
	497
	498	PERL_DEBUG_FULL_TEST=1 ./perl -T -Ilib lib/locale.t > locale.log 2>&1
	499
	500	Besides many other tests, it will test every locale it finds on your
	501	system to see if they conform to the POSIX standard. If any have
	502	errors, it will include a summary near the end of the output of which
	503	locales passed all its tests, and which failed, and why.
	504
	505	=head2 Temporarily fixing locale problems
	506
	507	The two quickest fixes are either to render Perl silent about any
	508	locale inconsistencies or to run Perl under the default locale "C".
	509
	510	Perl's moaning about locale problems can be silenced by setting the
	511	environment variable PERL_BADLANG to a zero value, for example "0".
	512	This method really just sweeps the problem under the carpet: you tell
	513	Perl to shut up even when Perl sees that something is wrong. Do not
	514	be surprised if later something locale-dependent misbehaves.
	515
	516	Perl can be run under the "C" locale by setting the environment
	517	variable LC_ALL to "C". This method is perhaps a bit more civilized
	518	than the PERL_BADLANG approach, but setting LC_ALL (or
	519	other locale variables) may affect other programs as well, not just
	520	Perl. In particular, external programs run from within Perl will see
	521	these changes. If you make the new settings permanent (read on), all
	522	programs you run see the changes. See L<"ENVIRONMENT"> for
	523	the full list of relevant environment variables and L<USING LOCALES>
	524	for their effects in Perl. Effects in other programs are
	525	easily deducible. For example, the variable LC_COLLATE may well affect
	526	your B<sort> program (or whatever the program that arranges "records"
	527	alphabetically in your system is called).
	528
	529	You can test out changing these variables temporarily, and if the
	530	new settings seem to help, put those settings into your shell startup
	531	files. Consult your local documentation for the exact details. For in
	532	Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):
	533
	534	LC_ALL=en_US.ISO8859-1
	535	export LC_ALL
	536
	537	This assumes that we saw the locale "en_US.ISO8859-1" using the commands
	538	discussed above. We decided to try that instead of the above faulty
	539	locale "En_US"--and in Cshish shells (B<csh>, B<tcsh>)
	540
	541	setenv LC_ALL en_US.ISO8859-1
	542
	543	or if you have the "env" application you can do in any shell
	544
	545	env LC_ALL=en_US.ISO8859-1 perl ...
	546
	547	If you do not know what shell you have, consult your local
	548	helpdesk or the equivalent.
	549
	550	=head2 Permanently fixing locale problems
	551
	552	The slower but superior fixes are when you may be able to yourself
	553	fix the misconfiguration of your own environment variables. The
	554	mis(sing)configuration of the whole system's locales usually requires
	555	the help of your friendly system administrator.
	556
	557	First, see earlier in this document about L<Finding locales>. That tells
	558	how to find which locales are really supported--and more importantly,
	559	installed--on your system. In our example error message, environment
	560	variables affecting the locale are listed in the order of decreasing
	561	importance (and unset variables do not matter). Therefore, having
	562	LC_ALL set to "En_US" must have been the bad choice, as shown by the
	563	error message. First try fixing locale settings listed first.
	564
	565	Second, if using the listed commands you see something B<exactly>
	566	(prefix matches do not count and case usually counts) like "En_US"
	567	without the quotes, then you should be okay because you are using a
	568	locale name that should be installed and available in your system.
	569	In this case, see L<Permanently fixing your system's locale configuration>.
	570
	571	=head2 Permanently fixing your system's locale configuration
	572
	573	This is when you see something like:
	574
	575	perl: warning: Please check that your locale settings:
	576	LC_ALL = "En_US",
	577	LANG = (unset)
	578	are supported and installed on your system.
	579
	580	but then cannot see that "En_US" listed by the above-mentioned
	581	commands. You may see things like "en_US.ISO8859-1", but that isn't
	582	the same. In this case, try running under a locale
	583	that you can list and which somehow matches what you tried. The
	584	rules for matching locale names are a bit vague because
	585	standardization is weak in this area. See again the
	586	L<Finding locales> about general rules.
	587
	588	=head2 Fixing system locale configuration
	589
	590	Contact a system administrator (preferably your own) and report the exact
	591	error message you get, and ask them to read this same documentation you
	592	are now reading. They should be able to check whether there is something
	593	wrong with the locale configuration of the system. The L<Finding locales>
	594	section is unfortunately a bit vague about the exact commands and places
	595	because these things are not that standardized.
	596
	597	=head2 The localeconv function
	598
	599	The C<POSIX::localeconv()> function allows you to get particulars of the
	600	locale-dependent numeric formatting information specified by the current
	601	C<LC_NUMERIC> and C<LC_MONETARY> locales. (If you just want the name of
	602	the current locale for a particular category, use C<POSIX::setlocale()>
	603	with a single parameter--see L<The setlocale function>.)
	604
	605	use POSIX qw(locale_h);
	606
	607	# Get a reference to a hash of locale-dependent info
	608	$locale_values = localeconv();
	609
	610	# Output sorted list of the values
	611	for (sort keys %$locale_values) {
	612	printf "%-20s = %s\n", $_, $locale_values->{$_}
	613	}
	614
	615	C<localeconv()> takes no arguments, and returns B<a reference to> a hash.
	616	The keys of this hash are variable names for formatting, such as
	617	C<decimal_point> and C<thousands_sep>. The values are the
	618	corresponding, er, values. See L<POSIX/localeconv> for a longer
	619	example listing the categories an implementation might be expected to
	620	provide; some provide more and others fewer. You don't need an
	621	explicit C<use locale>, because C<localeconv()> always observes the
	622	current locale.
	623
	624	Here's a simple-minded example program that rewrites its command-line
	625	parameters as integers correctly formatted in the current locale:
	626
	627	use POSIX qw(locale_h);
	628
	629	# Get some of locale's numeric formatting parameters
	630	my ($thousands_sep, $grouping) =
	631	@{localeconv()}{'thousands_sep', 'grouping'};
	632
	633	# Apply defaults if values are missing
	634	$thousands_sep = ',' unless $thousands_sep;
	635
	636	# grouping and mon_grouping are packed lists
	637	# of small integers (characters) telling the
	638	# grouping (thousand_seps and mon_thousand_seps
	639	# being the group dividers) of numbers and
	640	# monetary quantities. The integers' meanings:
	641	# 255 means no more grouping, 0 means repeat
	642	# the previous grouping, 1-254 means use that
	643	# as the current grouping. Grouping goes from
	644	# right to left (low to high digits). In the
	645	# below we cheat slightly by never using anything
	646	# else than the first grouping (whatever that is).
	647	if ($grouping) {
	648	@grouping = unpack("C*", $grouping);
	649	} else {
	650	@grouping = (3);
	651	}
	652
	653	# Format command line params for current locale
	654	for (@ARGV) {
	655	$_ = int; # Chop non-integer part
	656	1 while
	657	s/(\d)(\d{$grouping[0]}($\|$thousands_sep))/$1$thousands_sep$2/;
	658	print "$_";
	659	}
	660	print "\n";
	661
	662	=head2 I18N::Langinfo
	663
	664	Another interface for querying locale-dependent information is the
	665	C<I18N::Langinfo::langinfo()> function, available at least in Unix-like
	666	systems and VMS.
	667
	668	The following example will import the C<langinfo()> function itself and
	669	three constants to be used as arguments to C<langinfo()>: a constant for
	670	the abbreviated first day of the week (the numbering starts from
	671	Sunday = 1) and two more constants for the affirmative and negative
	672	answers for a yes/no question in the current locale.
	673
	674	use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);
	675
	676	my ($abday_1, $yesstr, $nostr)
	677	= map { langinfo } qw(ABDAY_1 YESSTR NOSTR);
	678
	679	print "$abday_1? [$yesstr/$nostr] ";
	680
	681	In other words, in the "C" (or English) locale the above will probably
	682	print something like:
	683
	684	Sun? [yes/no]
	685
	686	See L<I18N::Langinfo> for more information.
	687
	688	=head1 LOCALE CATEGORIES
	689
	690	The following subsections describe basic locale categories. Beyond these,
	691	some combination categories allow manipulation of more than one
	692	basic category at a time. See L<"ENVIRONMENT"> for a discussion of these.
	693
	694	=head2 Category LC_COLLATE: Collation
	695
	696	In the scope of S<C<use locale>> (but not a
	697	C<use locale ':not_characters'>), Perl looks to the C<LC_COLLATE>
	698	environment variable to determine the application's notions on collation
	699	(ordering) of characters. For example, "b" follows "a" in Latin
	700	alphabets, but where do "E<aacute>" and "E<aring>" belong? And while
	701	"color" follows "chocolate" in English, what about in traditional Spanish?
	702
	703	The following collations all make sense and you may meet any of them
	704	if you "use locale".
	705
	706	A B C D E a b c d e
	707	A a B b C c D d E e
	708	a A b B c C d D e E
	709	a b c d e A B C D E
	710
	711	Here is a code snippet to tell what "word"
	712	characters are in the current locale, in that locale's order:
	713
	714	use locale;
	715	print +(sort grep /\w/, map { chr } 0..255), "\n";
	716
	717	Compare this with the characters that you see and their order if you
	718	state explicitly that the locale should be ignored:
	719
	720	no locale;
	721	print +(sort grep /\w/, map { chr } 0..255), "\n";
	722
	723	This machine-native collation (which is what you get unless S<C<use
	724	locale>> has appeared earlier in the same block) must be used for
	725	sorting raw binary data, whereas the locale-dependent collation of the
	726	first example is useful for natural text.
	727
	728	As noted in L<USING LOCALES>, C<cmp> compares according to the current
	729	collation locale when C<use locale> is in effect, but falls back to a
	730	char-by-char comparison for strings that the locale says are equal. You
	731	can use C<POSIX::strcoll()> if you don't want this fall-back:
	732
	733	use POSIX qw(strcoll);
	734	$equal_in_locale =
	735	!strcoll("space and case ignored", "SpaceAndCaseIgnored");
	736
	737	C<$equal_in_locale> will be true if the collation locale specifies a
	738	dictionary-like ordering that ignores space characters completely and
	739	which folds case.
	740
	741	If you have a single string that you want to check for "equality in
	742	locale" against several others, you might think you could gain a little
	743	efficiency by using C<POSIX::strxfrm()> in conjunction with C<eq>:
	744
	745	use POSIX qw(strxfrm);
	746	$xfrm_string = strxfrm("Mixed-case string");
	747	print "locale collation ignores spaces\n"
	748	if $xfrm_string eq strxfrm("Mixed-casestring");
	749	print "locale collation ignores hyphens\n"
	750	if $xfrm_string eq strxfrm("Mixedcase string");
	751	print "locale collation ignores case\n"
	752	if $xfrm_string eq strxfrm("mixed-case string");
	753
	754	C<strxfrm()> takes a string and maps it into a transformed string for use
	755	in char-by-char comparisons against other transformed strings during
	756	collation. "Under the hood", locale-affected Perl comparison operators
	757	call C<strxfrm()> for both operands, then do a char-by-char
	758	comparison of the transformed strings. By calling C<strxfrm()> explicitly
	759	and using a non locale-affected comparison, the example attempts to save
	760	a couple of transformations. But in fact, it doesn't save anything: Perl
	761	magic (see L<perlguts/Magic Variables>) creates the transformed version of a
	762	string the first time it's needed in a comparison, then keeps this version around
	763	in case it's needed again. An example rewritten the easy way with
	764	C<cmp> runs just about as fast. It also copes with null characters
	765	embedded in strings; if you call C<strxfrm()> directly, it treats the first
	766	null it finds as a terminator. don't expect the transformed strings
	767	it produces to be portable across systems--or even from one revision
	768	of your operating system to the next. In short, don't call C<strxfrm()>
	769	directly: let Perl do it for you.
	770
	771	Note: C<use locale> isn't shown in some of these examples because it isn't
	772	needed: C<strcoll()> and C<strxfrm()> are POSIX functions
	773	which use the standard system-supplied C<libc> functions that
	774	always obey the current C<LC_COLLATE> locale.
	775
	776	=head2 Category LC_CTYPE: Character Types
	777
	778	In the scope of S<C<use locale>> (but not a
	779	C<use locale ':not_characters'>), Perl obeys the C<LC_CTYPE> locale
	780	setting. This controls the application's notion of which characters are
	781	alphabetic. This affects Perl's C<\w> regular expression metanotation,
	782	which stands for alphanumeric characters--that is, alphabetic,
	783	numeric, and including other special characters such as the underscore or
	784	hyphen. (Consult L<perlre> for more information about
	785	regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
	786	setting, characters like "E<aelig>", "E<eth>", "E<szlig>", and
	787	"E<oslash>" may be understood as C<\w> characters.
	788
	789	The C<LC_CTYPE> locale also provides the map used in transliterating
	790	characters between lower and uppercase. This affects the case-mapping
	791	functions--C<fc()>, C<lc()>, C<lcfirst()>, C<uc()>, and C<ucfirst()>; case-mapping
	792	interpolation with C<\F>, C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted
	793	strings and C<s///> substitutions; and case-independent regular expression
	794	pattern matching using the C<i> modifier.
	795
	796	Finally, C<LC_CTYPE> affects the POSIX character-class test
	797	functions--C<POSIX::isalpha()>, C<POSIX::islower()>, and so on. For
	798	example, if you move from the "C" locale to a 7-bit Scandinavian one,
	799	you may find--possibly to your surprise--that "\|" moves from the
	800	C<POSIX::ispunct()> class to C<POSIX::isalpha()>.
	801	Unfortunately, this creates big problems for regular expressions. "\|" still
	802	means alternation even though it matches C<\w>.
	803
	804	Note that there are quite a few things that are unaffected by the
	805	current locale. All the escape sequences for particular characters,
	806	C<\n> for example, always mean the platform's native one. This means,
	807	for example, that C<\N> in regular expressions (every character
	808	but new-line) works on the platform character set.
	809
	810	B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
	811	in clearly ineligible characters being considered to be alphanumeric by
	812	your application. For strict matching of (mundane) ASCII letters and
	813	digits--for example, in command strings--locale-aware applications
	814	should use C<\w> with the C</a> regular expression modifier. See L<"SECURITY">.
	815
	816	=head2 Category LC_NUMERIC: Numeric Formatting
	817
	818	After a proper C<POSIX::setlocale()> call, and within the scope of one
	819	of the C<use locale> variants, Perl obeys the C<LC_NUMERIC>
	820	locale information, which controls an application's idea of how numbers
	821	should be formatted for human readability.
	822	In most implementations the only effect is to
	823	change the character used for the decimal point--perhaps from "." to ",".
	824	The functions aren't aware of such niceties as thousands separation and
	825	so on. (See L<The localeconv function> if you care about these things.)
	826
	827	use POSIX qw(strtod setlocale LC_NUMERIC);
	828	use locale;
	829
	830	setlocale LC_NUMERIC, "";
	831
	832	$n = 5/2; # Assign numeric 2.5 to $n
	833
	834	$a = " $n"; # Locale-dependent conversion to string
	835
	836	print "half five is $n\n"; # Locale-dependent output
	837
	838	printf "half five is %g\n", $n; # Locale-dependent output
	839
	840	print "DECIMAL POINT IS COMMA\n"
	841	if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
	842
	843	See also L<I18N::Langinfo> and C<RADIXCHAR>.
	844
	845	=head2 Category LC_MONETARY: Formatting of monetary amounts
	846
	847	The C standard defines the C<LC_MONETARY> category, but not a function
	848	that is affected by its contents. (Those with experience of standards
	849	committees will recognize that the working group decided to punt on the
	850	issue.) Consequently, Perl essentially takes no notice of it. If you
	851	really want to use C<LC_MONETARY>, you can query its contents--see
	852	L<The localeconv function>--and use the information that it returns in your
	853	application's own formatting of currency amounts. However, you may well
	854	find that the information, voluminous and complex though it may be, still
	855	does not quite meet your requirements: currency formatting is a hard nut
	856	to crack.
	857
	858	See also L<I18N::Langinfo> and C<CRNCYSTR>.
	859
	860	=head2 LC_TIME
	861
	862	Output produced by C<POSIX::strftime()>, which builds a formatted
	863	human-readable date/time string, is affected by the current C<LC_TIME>
	864	locale. Thus, in a French locale, the output produced by the C<%B>
	865	format element (full month name) for the first month of the year would
	866	be "janvier". Here's how to get a list of long month names in the
	867	current locale:
	868
	869	use POSIX qw(strftime);
	870	for (0..11) {
	871	$long_month_name[$_] =
	872	strftime("%B", 0, 0, 0, 1, $_, 96);
	873	}
	874
	875	Note: C<use locale> isn't needed in this example: C<strftime()> is a POSIX
	876	function which uses the standard system-supplied C<libc> function that
	877	always obeys the current C<LC_TIME> locale.
	878
	879	See also L<I18N::Langinfo> and C<ABDAY_1>..C<ABDAY_7>, C<DAY_1>..C<DAY_7>,
	880	C<ABMON_1>..C<ABMON_12>, and C<ABMON_1>..C<ABMON_12>.
	881
	882	=head2 Other categories
	883
	884	The remaining locale categories are not currently used by Perl itself.
	885	But again note that things Perl interacts with may use these, including
	886	extensions outside the standard Perl distribution, and by the
	887	operating system and its utilities. Note especially that the string
	888	value of C<$!> and the error messages given by external utilities may
	889	be changed by C<LC_MESSAGES>. If you want to have portable error
	890	codes, use C<%!>. See L<Errno>.
	891
	892	=head1 SECURITY
	893
	894	Although the main discussion of Perl security issues can be found in
	895	L<perlsec>, a discussion of Perl's locale handling would be incomplete
	896	if it did not draw your attention to locale-dependent security issues.
	897	Locales--particularly on systems that allow unprivileged users to
	898	build their own locales--are untrustworthy. A malicious (or just plain
	899	broken) locale can make a locale-aware application give unexpected
	900	results. Here are a few possibilities:
	901
	902	=over 4
	903
	904	=item *
	905
	906	Regular expression checks for safe file names or mail addresses using
	907	C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
	908	characters such as "E<gt>" and "\|" are alphanumeric.
	909
	910	=item *
	911
	912	String interpolation with case-mapping, as in, say, C<$dest =
	913	"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
	914	case-mapping table is in effect.
	915
	916	=item *
	917
	918	A sneaky C<LC_COLLATE> locale could result in the names of students with
	919	"D" grades appearing ahead of those with "A"s.
	920
	921	=item *
	922
	923	An application that takes the trouble to use information in
	924	C<LC_MONETARY> may format debits as if they were credits and vice versa
	925	if that locale has been subverted. Or it might make payments in US
	926	dollars instead of Hong Kong dollars.
	927
	928	=item *
	929
	930	The date and day names in dates formatted by C<strftime()> could be
	931	manipulated to advantage by a malicious user able to subvert the
	932	C<LC_DATE> locale. ("Look--it says I wasn't in the building on
	933	Sunday.")
	934
	935	=back
	936
	937	Such dangers are not peculiar to the locale system: any aspect of an
	938	application's environment which may be modified maliciously presents
	939	similar challenges. Similarly, they are not specific to Perl: any
	940	programming language that allows you to write programs that take
	941	account of their environment exposes you to these issues.
	942
	943	Perl cannot protect you from all possibilities shown in the
	944	examples--there is no substitute for your own vigilance--but, when
	945	C<use locale> is in effect, Perl uses the tainting mechanism (see
	946	L<perlsec>) to mark string results that become locale-dependent, and
	947	which may be untrustworthy in consequence. Here is a summary of the
	948	tainting behavior of operators and functions that may be affected by
	949	the locale:
	950
	951	=over 4
	952
	953	=item *
	954
	955	B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):
	956
	957	Scalar true/false (or less/equal/greater) result is never tainted.
	958
	959	=item *
	960
	961	B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>)
	962
	963	Result string containing interpolated material is tainted if
	964	C<use locale> (but not S<C<use locale ':not_characters'>>) is in effect.
	965
	966	=item *
	967
	968	B<Matching operator> (C<m//>):
	969
	970	Scalar true/false result never tainted.
	971
	972	All subpatterns, either delivered as a list-context result or as C<$1>
	973	I<etc>., are tainted if C<use locale> (but not
	974	S<C<use locale ':not_characters'>>) is in effect, and the subpattern
	975	regular expression is matched case-insensitively (C</i>) or contains a
	976	locale-dependent construct. These constructs include C<\w>
	977	(to match an alphanumeric character), C<\W> (non-alphanumeric
	978	character), C<\s> (whitespace character), C<\S> (non whitespace
	979	character), and the POSIX character classes, such as C<[:alpha:]> (see
	980	L<perlrecharclass/POSIX Character Classes>).
	981	The matched-pattern variables, C<$&>, C<$`> (pre-match), C<$'>
	982	(post-match), and C<$+> (last match) also are tainted.
	983	(Note that currently there are some bugs where not everything that
	984	should be tainted gets tainted in all circumstances.)
	985
	986	=item *
	987
	988	B<Substitution operator> (C<s///>):
	989
	990	Has the same behavior as the match operator. Also, the left
	991	operand of C<=~> becomes tainted when C<use locale>
	992	(but not S<C<use locale ':not_characters'>>) is in effect if modified as
	993	a result of a substitution based on a regular
	994	expression match involving any of the things mentioned in the previous
	995	item, or of case-mapping, such as C<\l>, C<\L>,C<\u>, C<\U>, or C<\F>.
	996
	997	=item *
	998
	999	B<Output formatting functions> (C<printf()> and C<write()>):
	1000
	1001	Results are never tainted because otherwise even output from print,
	1002	for example C<print(1/7)>, should be tainted if C<use locale> is in
	1003	effect.
	1004
	1005	=item *
	1006
	1007	B<Case-mapping functions> (C<lc()>, C<lcfirst()>, C<uc()>, C<ucfirst()>):
	1008
	1009	Results are tainted if C<use locale> (but not
	1010	S<C<use locale ':not_characters'>>) is in effect.
	1011
	1012	=item *
	1013
	1014	B<POSIX locale-dependent functions> (C<localeconv()>, C<strcoll()>,
	1015	C<strftime()>, C<strxfrm()>):
	1016
	1017	Results are never tainted.
	1018
	1019	=item *
	1020
	1021	B<POSIX character class tests> (C<POSIX::isalnum()>,
	1022	C<POSIX::isalpha()>, C<POSIX::isdigit()>, C<POSIX::isgraph()>,
	1023	C<POSIX::islower()>, C<POSIX::isprint()>, C<POSIX::ispunct()>,
	1024	C<POSIX::isspace()>, C<POSIX::isupper()>, C<POSIX::isxdigit()>):
	1025
	1026	True/false results are never tainted.
	1027
	1028	=back
	1029
	1030	Three examples illustrate locale-dependent tainting.
	1031	The first program, which ignores its locale, won't run: a value taken
	1032	directly from the command line may not be used to name an output file
	1033	when taint checks are enabled.
	1034
	1035	#/usr/local/bin/perl -T
	1036	# Run with taint checking
	1037
	1038	# Command line sanity check omitted...
	1039	$tainted_output_file = shift;
	1040
	1041	open(F, ">$tainted_output_file")
	1042	or warn "Open of $tainted_output_file failed: $!\n";
	1043
	1044	The program can be made to run by "laundering" the tainted value through
	1045	a regular expression: the second example--which still ignores locale
	1046	information--runs, creating the file named on its command line
	1047	if it can.
	1048
	1049	#/usr/local/bin/perl -T
	1050
	1051	$tainted_output_file = shift;
	1052	$tainted_output_file =~ m%[\w/]+%;
	1053	$untainted_output_file = $&;
	1054
	1055	open(F, ">$untainted_output_file")
	1056	or warn "Open of $untainted_output_file failed: $!\n";
	1057
	1058	Compare this with a similar but locale-aware program:
	1059
	1060	#/usr/local/bin/perl -T
	1061
	1062	$tainted_output_file = shift;
	1063	use locale;
	1064	$tainted_output_file =~ m%[\w/]+%;
	1065	$localized_output_file = $&;
	1066
	1067	open(F, ">$localized_output_file")
	1068	or warn "Open of $localized_output_file failed: $!\n";
	1069
	1070	This third program fails to run because C<$&> is tainted: it is the result
	1071	of a match involving C<\w> while C<use locale> is in effect.
	1072
	1073	=head1 ENVIRONMENT
	1074
	1075	=over 12
	1076
	1077	=item PERL_BADLANG
	1078
	1079	A string that can suppress Perl's warning about failed locale settings
	1080	at startup. Failure can occur if the locale support in the operating
	1081	system is lacking (broken) in some way--or if you mistyped the name of
	1082	a locale when you set up your environment. If this environment
	1083	variable is absent, or has a value that does not evaluate to integer
	1084	zero--that is, "0" or ""-- Perl will complain about locale setting
	1085	failures.
	1086
	1087	B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
	1088	The message tells about some problem in your system's locale support,
	1089	and you should investigate what the problem is.
	1090
	1091	=back
	1092
	1093	The following environment variables are not specific to Perl: They are
	1094	part of the standardized (ISO C, XPG4, POSIX 1.c) C<setlocale()> method
	1095	for controlling an application's opinion on data.
	1096
	1097	=over 12
	1098
	1099	=item LC_ALL
	1100
	1101	C<LC_ALL> is the "override-all" locale environment variable. If
	1102	set, it overrides all the rest of the locale environment variables.
	1103
	1104	=item LANGUAGE
	1105
	1106	B<NOTE>: C<LANGUAGE> is a GNU extension, it affects you only if you
	1107	are using the GNU libc. This is the case if you are using e.g. Linux.
	1108	If you are using "commercial" Unixes you are most probably I<not>
	1109	using GNU libc and you can ignore C<LANGUAGE>.
	1110
	1111	However, in the case you are using C<LANGUAGE>: it affects the
	1112	language of informational, warning, and error messages output by
	1113	commands (in other words, it's like C<LC_MESSAGES>) but it has higher
	1114	priority than C<LC_ALL>. Moreover, it's not a single value but
	1115	instead a "path" (":"-separated list) of I<languages> (not locales).
	1116	See the GNU C<gettext> library documentation for more information.
	1117
	1118	=item LC_CTYPE
	1119
	1120	In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
	1121	locale. In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
	1122	chooses the character type locale.
	1123
	1124	=item LC_COLLATE
	1125
	1126	In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
	1127	(sorting) locale. In the absence of both C<LC_ALL> and C<LC_COLLATE>,
	1128	C<LANG> chooses the collation locale.
	1129
	1130	=item LC_MONETARY
	1131
	1132	In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
	1133	formatting locale. In the absence of both C<LC_ALL> and C<LC_MONETARY>,
	1134	C<LANG> chooses the monetary formatting locale.
	1135
	1136	=item LC_NUMERIC
	1137
	1138	In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
	1139	locale. In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
	1140	chooses the numeric format.
	1141
	1142	=item LC_TIME
	1143
	1144	In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
	1145	formatting locale. In the absence of both C<LC_ALL> and C<LC_TIME>,
	1146	C<LANG> chooses the date and time formatting locale.
	1147
	1148	=item LANG
	1149
	1150	C<LANG> is the "catch-all" locale environment variable. If it is set, it
	1151	is used as the last resort after the overall C<LC_ALL> and the
	1152	category-specific C<LC_...>.
	1153
	1154	=back
	1155
	1156	=head2 Examples
	1157
	1158	The LC_NUMERIC controls the numeric output:
	1159
	1160	use locale;
	1161	use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
	1162	setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
	1163	printf "%g\n", 1.23; # If the "fr_FR" succeeded, probably shows 1,23.
	1164
	1165	and also how strings are parsed by C<POSIX::strtod()> as numbers:
	1166
	1167	use locale;
	1168	use POSIX qw(locale_h strtod);
	1169	setlocale(LC_NUMERIC, "de_DE") or die "Entschuldigung";
	1170	my $x = strtod("2,34") + 5;
	1171	print $x, "\n"; # Probably shows 7,34.
	1172
	1173	=head1 NOTES
	1174
	1175	=head2 String C<eval> and C<LC_NUMERIC>
	1176
	1177	A string L<eval\|perlfunc/eval EXPR> parses its expression as standard
	1178	Perl. It is therefore expecting the decimal point to be a dot. If
	1179	C<LC_NUMERIC> is set to have this be a comma instead, the parsing will
	1180	be confused, perhaps silently.
	1181
	1182	use locale;
	1183	use POSIX qw(locale_h);
	1184	setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
	1185	my $a = 1.2;
	1186	print eval "$a + 1.5";
	1187	print "\n";
	1188
	1189	prints C<13,5>. This is because in that locale, the comma is the
	1190	decimal point character. The C<eval> thus expands to:
	1191
	1192	eval "1,2 + 1.5"
	1193
	1194	and the result is not what you likely expected. No warnings are
	1195	generated. If you do string C<eval>'s within the scope of
	1196	S<C<use locale>>, you should instead change the C<eval> line to do
	1197	something like:
	1198
	1199	print eval "no locale; $a + 1.5";
	1200
	1201	This prints C<2.7>.
	1202
	1203	=head2 Backward compatibility
	1204
	1205	Versions of Perl prior to 5.004 B<mostly> ignored locale information,
	1206	generally behaving as if something similar to the C<"C"> locale were
	1207	always in force, even if the program environment suggested otherwise
	1208	(see L<The setlocale function>). By default, Perl still behaves this
	1209	way for backward compatibility. If you want a Perl application to pay
	1210	attention to locale information, you B<must> use the S<C<use locale>>
	1211	pragma (see L<The use locale pragma>) or, in the unlikely event
	1212	that you want to do so for just pattern matching, the
	1213	C</l> regular expression modifier (see L<perlre/Character set
	1214	modifiers>) to instruct it to do so.
	1215
	1216	Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
	1217	information if available; that is, C<\w> did understand what
	1218	were the letters according to the locale environment variables.
	1219	The problem was that the user had no control over the feature:
	1220	if the C library supported locales, Perl used them.
	1221
	1222	=head2 I18N:Collate obsolete
	1223
	1224	In versions of Perl prior to 5.004, per-locale collation was possible
	1225	using the C<I18N::Collate> library module. This module is now mildly
	1226	obsolete and should be avoided in new applications. The C<LC_COLLATE>
	1227	functionality is now integrated into the Perl core language: One can
	1228	use locale-specific scalar data completely normally with C<use locale>,
	1229	so there is no longer any need to juggle with the scalar references of
	1230	C<I18N::Collate>.
	1231
	1232	=head2 Sort speed and memory use impacts
	1233
	1234	Comparing and sorting by locale is usually slower than the default
	1235	sorting; slow-downs of two to four times have been observed. It will
	1236	also consume more memory: once a Perl scalar variable has participated
	1237	in any string comparison or sorting operation obeying the locale
	1238	collation rules, it will take 3-15 times more memory than before. (The
	1239	exact multiplier depends on the string's contents, the operating system
	1240	and the locale.) These downsides are dictated more by the operating
	1241	system's implementation of the locale system than by Perl.
	1242
	1243	=head2 Freely available locale definitions
	1244
	1245	The Unicode CLDR project extracts the POSIX portion of many of its
	1246	locales, available at
	1247
	1248	http://unicode.org/Public/cldr/latest/
	1249
	1250	There is a large collection of locale definitions at:
	1251
	1252	http://std.dkuug.dk/i18n/WG15-collection/locales/
	1253
	1254	You should be aware that it is
	1255	unsupported, and is not claimed to be fit for any purpose. If your
	1256	system allows installation of arbitrary locales, you may find the
	1257	definitions useful as they are, or as a basis for the development of
	1258	your own locales.
	1259
	1260	=head2 I18n and l10n
	1261
	1262	"Internationalization" is often abbreviated as B<i18n> because its first
	1263	and last letters are separated by eighteen others. (You may guess why
	1264	the internalin ... internaliti ... i18n tends to get abbreviated.) In
	1265	the same way, "localization" is often abbreviated to B<l10n>.
	1266
	1267	=head2 An imperfect standard
	1268
	1269	Internationalization, as defined in the C and POSIX standards, can be
	1270	criticized as incomplete, ungainly, and having too large a granularity.
	1271	(Locales apply to a whole process, when it would arguably be more useful
	1272	to have them apply to a single thread, window group, or whatever.) They
	1273	also have a tendency, like standards groups, to divide the world into
	1274	nations, when we all know that the world can equally well be divided
	1275	into bankers, bikers, gamers, and so on.
	1276
	1277	=head1 Unicode and UTF-8
	1278
	1279	The support of Unicode is new starting from Perl version v5.6, and more fully
	1280	implemented in versions v5.8 and later. See L<perluniintro>. It is
	1281	strongly recommended that when combining Unicode and locale (starting in
	1282	v5.16), you use
	1283
	1284	use locale ':not_characters';
	1285
	1286	When this form of the pragma is used, only the non-character portions of
	1287	locales are used by Perl, for example C<LC_NUMERIC>. Perl assumes that
	1288	you have translated all the characters it is to operate on into Unicode
	1289	(actually the platform's native character set (ASCII or EBCDIC) plus
	1290	Unicode). For data in files, this can conveniently be done by also
	1291	specifying
	1292
	1293	use open ':locale';
	1294
	1295	This pragma arranges for all inputs from files to be translated into
	1296	Unicode from the current locale as specified in the environment (see
	1297	L</ENVIRONMENT>), and all outputs to files to be translated back
	1298	into the locale. (See L<open>). On a per-filehandle basis, you can
	1299	instead use the L<PerlIO::locale> module, or the L<Encode::Locale>
	1300	module, both available from CPAN. The latter module also has methods to
	1301	ease the handling of C<ARGV> and environment variables, and can be used
	1302	on individual strings. Also, if you know that all your locales will be
	1303	UTF-8, as many are these days, you can use the L<B<-C>\|perlrun/-C>
	1304	command line switch.
	1305
	1306	This form of the pragma allows essentially seamless handling of locales
	1307	with Unicode. The collation order will be Unicode's. It is strongly
	1308	recommended that when you need to order and sort strings that you use
	1309	the standard module L<Unicode::Collate> which gives much better results
	1310	in many instances than you can get with the old-style locale handling.
	1311
	1312	For pre-v5.16 Perls, or if you use the locale pragma without the
	1313	C<:not_characters> parameter, Perl tries to work with both Unicode and
	1314	locales--but there are problems.
	1315
	1316	Perl does not handle multi-byte locales in this case, such as have been
	1317	used for various
	1318	Asian languages, such as Big5 or Shift JIS. However, the increasingly
	1319	common multi-byte UTF-8 locales, if properly implemented, may work
	1320	reasonably well (depending on your C library implementation) in this
	1321	form of the locale pragma, simply because both
	1322	they and Perl store characters that take up multiple bytes the same way.
	1323	However, some, if not most, C library implementations may not process
	1324	the characters in the upper half of the Latin-1 range (128 - 255)
	1325	properly under LC_CTYPE. To see if a character is a particular type
	1326	under a locale, Perl uses the functions like C<isalnum()>. Your C
	1327	library may not work for UTF-8 locales with those functions, instead
	1328	only working under the newer wide library functions like C<iswalnum()>.
	1329
	1330	Perl generally takes the tack to use locale rules on code points that can fit
	1331	in a single byte, and Unicode rules for those that can't (though this
	1332	isn't uniformly applied, see the note at the end of this section). This
	1333	prevents many problems in locales that aren't UTF-8. Suppose the locale
	1334	is ISO8859-7, Greek. The character at 0xD7 there is a capital Chi. But
	1335	in the ISO8859-1 locale, Latin1, it is a multiplication sign. The POSIX
	1336	regular expression character class C<[[:alpha:]]> will magically match
	1337	0xD7 in the Greek locale but not in the Latin one.
	1338
	1339	However, there are places where this breaks down. Certain Perl constructs are
	1340	for Unicode only, such as C<\p{Alpha}>. They assume that 0xD7 always has its
	1341	Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a
	1342	subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
	1343	Unicode, C<\p{Alpha}> will never match it, regardless of locale. A similar
	1344	issue occurs with C<\N{...}>. It is therefore a bad idea to use C<\p{}> or
	1345	C<\N{}> under plain C<use locale>--I<unless> you can guarantee that the
	1346	locale will be a ISO8859-1. Use POSIX character classes instead.
	1347
	1348	Another problem with this approach is that operations that cross the
	1349	single byte/multiple byte boundary are not well-defined, and so are
	1350	disallowed. (This boundary is between the codepoints at 255/256.).
	1351	For example, lower casing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
	1352	should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the
	1353	Greek locale, for example, there is no character at 0xFF, and Perl
	1354	has no way of knowing what the character at 0xFF is really supposed to
	1355	represent. Thus it disallows the operation. In this mode, the
	1356	lowercase of U+0178 is itself.
	1357
	1358	The same problems ensue if you enable automatic UTF-8-ification of your
	1359	standard file handles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
	1360	non-UTF-8 locales (by using either the B<-C> command line switch or the
	1361	C<PERL_UNICODE> environment variable; see L<perlrun>).
	1362	Things are read in as UTF-8, which would normally imply a Unicode
	1363	interpretation, but the presence of a locale causes them to be interpreted
	1364	in that locale instead. For example, a 0xD7 code point in the Unicode
	1365	input, which should mean the multiplication sign, won't be interpreted by
	1366	Perl that way under the Greek locale. This is not a problem
	1367	I<provided> you make certain that all locales will always and only be either
	1368	an ISO8859-1, or, if you don't have a deficient C library, a UTF-8 locale.
	1369
	1370	Still another problem is that this approach can lead to two code
	1371	points meaning the same character. Thus in a Greek locale, both U+03A7
	1372	and U+00D7 are GREEK CAPITAL LETTER CHI.
	1373
	1374	Vendor locales are notoriously buggy, and it is difficult for Perl to test
	1375	its locale-handling code because this interacts with code that Perl has no
	1376	control over; therefore the locale-handling code in Perl may be buggy as
	1377	well. (However, the Unicode-supplied locales should be better, and
	1378	there is a feed back mechanism to correct any problems. See
	1379	L</Freely available locale definitions>.)
	1380
	1381	If you have Perl v5.16, the problems mentioned above go away if you use
	1382	the C<:not_characters> parameter to the locale pragma (except for vendor
	1383	bugs in the non-character portions). If you don't have v5.16, and you
	1384	I<do> have locales that work, using them may be worthwhile for certain
	1385	specific purposes, as long as you keep in mind the gotchas already
	1386	mentioned. For example, if the collation for your locales works, it
	1387	runs faster under locales than under L<Unicode::Collate>; and you gain
	1388	access to such things as the local currency symbol and the names of the
	1389	months and days of the week. (But to hammer home the point, in v5.16,
	1390	you get this access without the downsides of locales by using the
	1391	C<:not_characters> form of the pragma.)
	1392
	1393	Note: The policy of using locale rules for code points that can fit in a
	1394	byte, and Unicode rules for those that can't is not uniformly applied.
	1395	Pre-v5.12, it was somewhat haphazard; in v5.12 it was applied fairly
	1396	consistently to regular expression matching except for bracketed
	1397	character classes; in v5.14 it was extended to all regex matches; and in
	1398	v5.16 to the casing operations such as C<"\L"> and C<uc()>. For
	1399	collation, in all releases, the system's C<strxfrm()> function is called,
	1400	and whatever it does is what you get.
	1401
	1402	=head1 BUGS
	1403
	1404	=head2 Broken systems
	1405
	1406	In certain systems, the operating system's locale support
	1407	is broken and cannot be fixed or used by Perl. Such deficiencies can
	1408	and will result in mysterious hangs and/or Perl core dumps when
	1409	C<use locale> is in effect. When confronted with such a system,
	1410	please report in excruciating detail to <F<perlbug@perl.org>>, and
	1411	also contact your vendor: bug fixes may exist for these problems
	1412	in your operating system. Sometimes such bug fixes are called an
	1413	operating system upgrade. If you have the source for Perl, include in
	1414	the perlbug email the output of the test described above in L</Testing
	1415	for broken locales>.
	1416
	1417	=head1 SEE ALSO
	1418
	1419	L<I18N::Langinfo>, L<perluniintro>, L<perlunicode>, L<open>,
	1420	L<POSIX/isalnum>, L<POSIX/isalpha>,
	1421	L<POSIX/isdigit>, L<POSIX/isgraph>, L<POSIX/islower>,
	1422	L<POSIX/isprint>, L<POSIX/ispunct>, L<POSIX/isspace>,
	1423	L<POSIX/isupper>, L<POSIX/isxdigit>, L<POSIX/localeconv>,
	1424	L<POSIX/setlocale>, L<POSIX/strcoll>, L<POSIX/strftime>,
	1425	L<POSIX/strtod>, L<POSIX/strxfrm>.
	1426
	1427	For special considerations when Perl is embedded in a C program,
	1428	see L<perlembed/Using embedded Perl with POSIX locales>.
	1429
	1430	=head1 HISTORY
	1431
	1432	Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
	1433	Dunlop, assisted by the perl5-porters. Prose worked over a bit by
	1434	Tom Christiansen, and updated by Perl 5 porters.