=encoding utf8

=head1 NAME

perllocale - Perl locale handling (internationalization and localization)

=head1 DESCRIPTION

In the beginning there was ASCII, the "American Standard Code for
Information Interchange", which works quite well for Americans with
their English alphabet and dollar-denominated currency. But it doesn't
work so well even for other English speakers, who may use different
currencies, such as the pound sterling (as the symbol for that currency
is not in ASCII); and it's hopelessly inadequate for many of the
thousands of the world's other languages.

To address these deficiencies, the concept of locales was invented
(formally the ISO C, XPG4, POSIX 1.c "locale system"). Applications
were, and still are, being written that use the locale mechanism. The
process of making such an application take account of its users'
preferences in these kinds of matters is called B<internationalization>
(often abbreviated as B<i18n>); telling such an application about a
particular set of preferences is known as B<localization> (B<l10n>).

Perl has been extended to support certain types of locales available in
the locale system. This is controlled per application by using one
pragma, one function call, and several environment variables.

Perl supports single-byte locales that are supersets of ASCII, such as
the ISO 8859 ones, and one type of multi-byte locale, UTF-8, described
in the next paragraph. Perl doesn't support any other multi-byte
locales, such as the ones for East Asian languages.

Unfortunately, there are quite a few deficiencies with the design (and
often, the implementations) of locales. Unicode was invented (see
L<perlunitut> for an introduction to that) in part to address these
design deficiencies, and nowadays, there is a series of "UTF-8
locales", based on Unicode. These are locales whose character set is
Unicode, encoded in UTF-8. Starting in v5.20, Perl fully supports
UTF-8 locales, except for sorting and string comparisons like C<lt> and
C<ge>. Starting in v5.26, Perl can handle these reasonably as well,
depending on the platform's implementation. However, for earlier
releases or for better control, use L<Unicode::Collate>. There are
actually two slightly different types of UTF-8 locales: one for Turkic
languages and one for everything else.

Starting in Perl v5.30, Perl detects Turkic locales by their
behaviour, and seamlessly handles both types; previously only the
non-Turkic one was supported. The name of the locale is ignored: if
your system has a C<tr_TR.UTF-8> locale and it doesn't behave like a
Turkic locale, perl will treat it like a non-Turkic locale.

Perl continues to support the old non-UTF-8 locales as well. There are
currently no UTF-8 locales for EBCDIC platforms.

(Unicode is also creating C<CLDR>, the "Common Locale Data Repository",
L<http://cldr.unicode.org/>, which includes more types of information than
are available in the POSIX locale system. At the time of this writing,
there was no CPAN module that provides access to this XML-encoded data.
However, it is possible to compute the POSIX locale data from them, and
earlier CLDR versions had these already extracted for you as UTF-8 locales
L<http://unicode.org/Public/cldr/2.0.1/>.)

=head1 WHAT IS A LOCALE

A locale is a set of data that describes various aspects of how
communities in different parts of the world categorize their world.
These categories are broken down into the following types (some of
which include a brief note here):

=over

=item Category C<LC_NUMERIC>: Numeric formatting

This indicates how numbers should be formatted for human readability,
for example the character used as the decimal point.

=item Category C<LC_MONETARY>: Formatting of monetary amounts

Z<>

=item Category C<LC_TIME>: Date/Time formatting

Z<>

=item Category C<LC_MESSAGES>: Error and other messages

This is used by Perl itself only for accessing operating system error
messages via L<$!|perlvar/$ERRNO> and L<$^E|perlvar/$EXTENDED_OS_ERROR>.

=item Category C<LC_COLLATE>: Collation

This indicates the ordering of letters for comparison and sorting.
In Latin alphabets, for example, "b" generally follows "a".

=item Category C<LC_CTYPE>: Character Types

This indicates, for example, whether a character is an uppercase letter.

=item Other categories

Some platforms have other categories, dealing with such things as
measurement units and paper sizes. None of these are used directly by
Perl, but outside operations that Perl interacts with may use
these. See L</Not within the scope of "use locale"> below.

=back

More details on the categories used by Perl are given below in L</LOCALE
CATEGORIES>.

Together, these categories go a long way towards being able to customize
a single program to run in many different locations. But there are
deficiencies, so keep reading.

=head1 PREPARING TO USE LOCALES

Perl itself (outside the L<POSIX> module) will not use locales unless
specifically requested to (but again note that Perl may interact with
code that does use them). Even if there is such a request, B<all> of
the following must be true for it to work properly:

=over 4

=item *

B<Your operating system must support the locale system>. If it does,
you should find that the C<setlocale()> function is a documented part of
its C library.

=item *

B<Definitions for locales that you use must be installed>. You, or
your system administrator, must make sure that this is the case. The
available locales, the location in which they are kept, and the manner
in which they are installed all vary from system to system. Some systems
provide only a few, hard-wired locales and do not allow more to be
added. Others allow you to add "canned" locales provided by the system
supplier. Still others allow you or the system administrator to define
and add arbitrary locales. (You may have to ask your supplier to
provide canned locales that are not delivered with your operating
system.) Read your system documentation for further illumination.

=item *

B<Perl must believe that the locale system is supported>. If it does,
C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
C<define>.

=back
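
The last condition can also be checked from within a program. This is a
minimal sketch that relies only on the core L<Config> module, whose
C<%Config> hash holds the same values that C<perl -V:I<name>> prints;
C<$has_setlocale> is just an illustrative variable name.

```perl
use strict;
use warnings;
use Config;

# 'define' means perl was built believing the C library provides
# setlocale(); any other value (usually undef) means it was not.
my $has_setlocale = ( $Config{d_setlocale} // '' ) eq 'define' ? 1 : 0;

print "setlocale() support: ", ( $has_setlocale ? "yes" : "no" ), "\n";
```

This only reports what perl was configured with; it says nothing about
which locale definitions are actually installed.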

If you want a Perl application to process and present your data
according to a particular locale, the application code should include
the S<C<use locale>> pragma (see L</The "use locale" pragma>) where
appropriate, and B<at least one> of the following must be true:

=over 4

=item 1

B<The locale-determining environment variables (see L</"ENVIRONMENT">)
must be correctly set up> at the time the application is started, either
by yourself or by whoever set up your system account; or

=item 2

B<The application must set its own locale> using the method described in
L</The setlocale function>.

=back

=head1 USING LOCALES

=head2 The C<"use locale"> pragma

Starting in Perl 5.28, this pragma may be used in
L<multi-threaded|threads> applications on systems that have thread-safe
locale ability. Some caveats apply; see L</Multi-threaded> below. On
systems without this capability, or in earlier Perls, do NOT use this
pragma in scripts that have multiple L<threads|threads> active. The
locale in these cases is not local to a single thread. Another thread
may change the locale at any time, which could, at a minimum, leave a
given thread operating in a locale it isn't expecting to be in. On
some platforms, segfaults can also occur. The locale change need not be
explicit; some operations cause perl to change the locale itself. You
are vulnerable simply by having done a S<C<"use locale">>.

By default, Perl itself (outside the L<POSIX> module)
ignores the current locale. The S<C<use locale>>
pragma tells Perl to use the current locale for some operations.
Starting in v5.16, there are optional parameters to this pragma,
described below, which restrict which operations are affected by it.

The current locale is set at execution time by
L<setlocale()|/The setlocale function> described below. If that function
hasn't yet been called in the course of the program's execution, the
current locale is that which was determined by the L</"ENVIRONMENT"> in
effect at the start of the program.
If there is no valid environment, the current locale is whatever the
system default has been set to. On POSIX systems, it is likely, but
not necessarily, the "C" locale. On Windows, the default is set via the
computer's S<C<Control Panel-E<gt>Regional and Language Options>> (or its
current equivalent).

The operations that are affected by locale are:

=over 4

=item B<Not within the scope of C<"use locale">>

Only certain operations (all originating outside Perl) should be
affected, as follows:

=over 4

=item *

The current locale is used when going outside of Perl with
operations like L<system()|perlfunc/system LIST> or
L<qxE<sol>E<sol>|perlop/qxE<sol>STRINGE<sol>>, if those operations are
locale-sensitive.

=item *

Perl also gives access to various C library functions through the
L<POSIX> module. Some of those functions are always affected by the
current locale. For example, C<POSIX::strftime()> uses C<LC_TIME>;
C<POSIX::strtod()> uses C<LC_NUMERIC>; C<POSIX::strcoll()> and
C<POSIX::strxfrm()> use C<LC_COLLATE>. All such functions
will behave according to the current underlying locale, even if that
locale isn't exposed to Perl space.

This applies as well to L<I18N::Langinfo>.

=item *

XS modules for all categories but C<LC_NUMERIC> get the underlying
locale, and hence any C library functions they call will use that
underlying locale. For more discussion, see L<perlxs/CAVEATS>.

=back

Note that all C programs (including the perl interpreter, which is
written in C) always have an underlying locale. That locale is the "C"
locale unless changed by a call to L<setlocale()|/The setlocale
function>. When Perl starts up, it changes the underlying locale to the
one which is indicated by the L</ENVIRONMENT>. When using the L<POSIX>
module or writing XS code, it is important to keep in mind that the
underlying locale may be something other than "C", even if the program
hasn't explicitly changed it.
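
As an illustration, C<POSIX::strftime()> follows the underlying
C<LC_TIME> even in code that never says C<use locale>. The sketch below
uses only the core L<POSIX> module. It switches the underlying C<LC_TIME>
to "C" (which every system provides), formats a fixed date, and then
restores whatever was there before, since other L<POSIX> functions and
XS code share that same underlying locale.

```perl
use strict;
use warnings;
use POSIX qw(setlocale strftime LC_TIME);

# Note: there is no "use locale" anywhere in this file.

# Query (without changing) the underlying LC_TIME that perl set at startup.
my $saved = setlocale(LC_TIME);

# "C" is the one locale every system must provide.
defined setlocale(LC_TIME, "C") or die "cannot switch LC_TIME to C\n";

# strftime() reads the underlying LC_TIME regardless of "use locale",
# so with LC_TIME set to "C" the day and month names are in English.
# Arguments: sec, min, hour, mday, mon (0-based), year - 1900, wday.
my $date = strftime("%A, %B %d, %Y", 0, 0, 0, 12, 11, 95, 2);
print "$date\n";    # Tuesday, December 12, 1995

# Put the underlying locale back the way it was.
setlocale(LC_TIME, $saved) if defined $saved;
```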

Z<>

=item B<Lingering effects of C<S<use locale>>>

Certain Perl operations that are set up within the scope of a
C<use locale> retain that effect even outside the scope.
These include:

=over 4

=item *

The output format of a L<write()|perlfunc/write> is determined by an
earlier format declaration (L<perlfunc/format>), so whether or not the
output is affected by locale is determined by whether the C<format()> is
within the scope of a C<use locale>, not whether the C<write()>
is.

=item *

Regular expression patterns can be compiled using
L<qrE<sol>E<sol>|perlop/qrE<sol>STRINGE<sol>msixpodualn> with actual
matching deferred to later. Again, it is whether or not the compilation
was done within the scope of C<use locale> that determines the match
behavior, not whether the matches are done within such a scope.

=back
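
The regular-expression case can be observed directly, because a compiled
pattern records the rules it was compiled under. In this sketch (core
Perl only) one pattern is compiled inside a C<use locale> block, and both
are then examined outside any such scope. It assumes the stringified
form used since v5.14, in which a pattern compiled under C<use locale>
carries the C<l> charset flag, as in C<(?^l:...)>.

```perl
use strict;
use warnings;

my $outside = qr/\w+/;    # compiled with no "use locale" in effect

my $inside;
{
    use locale;
    $inside = qr/\w+/;    # compiled within the scope of "use locale"
}

# We are now outside any "use locale" scope, yet each pattern keeps the
# rules of the scope it was compiled in; the /l (locale) charset flag in
# the stringified form shows which one is locale-aware.
for my $re ($outside, $inside) {
    my ($flags) = "$re" =~ /^\(\?\^(\w*):/;
    my $rules   = ( $flags // '' ) =~ /l/ ? "locale" : "default";
    print "$re uses $rules rules\n";
}
```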

Z<>

=item B<Under C<"use locale";>>

=over 4

=item *

All the above operations.

=item *

B<Format declarations> (L<perlfunc/format>) and hence any subsequent
C<write()>s use C<LC_NUMERIC>.

=item *

B<Stringification and output> use C<LC_NUMERIC>.
These include the results of C<print()>, C<printf()>, C<say()>, and
C<sprintf()>.

=item *

B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) use
C<LC_COLLATE>. C<sort()> is also affected if used without an
explicit comparison function, because it uses C<cmp> by default.

B<Note:> C<eq> and C<ne> are unaffected by locale: they always
perform a char-by-char comparison of their scalar operands. What's
more, if C<cmp> finds that its operands are equal according to the
collation sequence specified by the current locale, it goes on to
perform a char-by-char comparison, and only returns I<0> (equal) if the
operands are char-for-char identical. If you really want to know whether
two strings--which C<eq> and C<cmp> may consider different--are equal
as far as collation in the locale is concerned, see the discussion in
L</Category C<LC_COLLATE>: Collation>.

=item *

B<Regular expressions and case-modification functions> (C<uc()>, C<lc()>,
C<ucfirst()>, and C<lcfirst()>) use C<LC_CTYPE>.

=item *

B<The variables L<$!|perlvar/$ERRNO>> (and its synonyms C<$ERRNO> and
C<$OS_ERROR>) B<and L<$^E|perlvar/$EXTENDED_OS_ERROR>> (and its synonym
C<$EXTENDED_OS_ERROR>) when used as strings use C<LC_MESSAGES>.

=back

=back

The default behavior is restored with the S<C<no locale>> pragma, or
upon reaching the end of the block enclosing C<use locale>.
Note that C<use locale> pragmas may be
nested, and that what is in effect within an inner scope will revert to
the outer scope's rules at the end of the inner scope.
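
The scoping rules can be checked with the same kind of probe. In this
sketch, C<rules_of()> is a hypothetical helper (not part of Perl) that
reports whether a compiled pattern carries the C<l> flag, that is,
whether it was compiled under C<use locale>. The nested C<no locale>
switches locale handling off only until the end of its own block.

```perl
use strict;
use warnings;

# Hypothetical helper: 'locale' if the compiled regex carries the /l
# charset flag (i.e. it was compiled under "use locale"), else 'default'.
sub rules_of {
    my ($re) = @_;
    my ($flags) = "$re" =~ /^\(\?\^(\w*):/;
    return ( $flags // '' ) =~ /l/ ? 'locale' : 'default';
}

my @seen;
{
    use locale;
    push @seen, rules_of(qr/a/);        # inside "use locale"
    {
        no locale;
        push @seen, rules_of(qr/a/);    # inner scope switches it off
    }
    push @seen, rules_of(qr/a/);        # outer scope's rules are back
}
push @seen, rules_of(qr/a/);            # past the enclosing block

print join(", ", @seen), "\n";          # locale, default, locale, default
```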

The string result of any operation that uses locale
information is tainted, as it is possible for a locale to be
untrustworthy. See L</"SECURITY">.

Starting in Perl v5.16 in a very limited way, and more generally in
v5.22, you can restrict which category or categories are enabled by this
particular instance of the pragma by adding parameters to it. For
example,

    use locale qw(:ctype :numeric);

enables locale awareness within its scope of only those operations
(listed above) that are affected by C<LC_CTYPE> and C<LC_NUMERIC>.

The possible categories are: C<:collate>, C<:ctype>, C<:messages>,
C<:monetary>, C<:numeric>, C<:time>, and the pseudo category
C<:characters> (described below).

Thus you can say

    use locale ':messages';

and only L<$!|perlvar/$ERRNO> and L<$^E|perlvar/$EXTENDED_OS_ERROR>
will be locale aware. Everything else is unaffected.

Since Perl doesn't currently do anything with the C<LC_MONETARY>
category, specifying C<:monetary> does effectively nothing. Some
systems have other categories, such as C<LC_PAPER>, but Perl
also doesn't do anything with them, and there is no way to specify
them in this pragma's arguments.

You can also easily enable all categories but one, using either of two
forms; for example,

    use locale ':!ctype';
    use locale ':not_ctype';

both of which mean to enable locale awareness of all categories but
C<LC_CTYPE>. Only one category argument may be specified in a
S<C<use locale>> if it is of the negated form.

Prior to v5.22, only one form of the pragma with arguments was available:

    use locale ':not_characters';

(and you have to say C<not_>; you can't use the bang C<!> form). This
pseudo category is a shorthand for specifying both C<:collate> and
C<:ctype>. Hence, in the negated form, it is nearly the same thing as
saying

    use locale qw(:messages :monetary :numeric :time);

We use the term "nearly", because C<:not_characters> also turns on
S<C<use feature 'unicode_strings'>> within its scope. This form is
less useful in v5.20 and later, and is described fully in
L</Unicode and UTF-8>, but briefly, it tells Perl to not use the
character portions of the locale definition, that is the C<LC_CTYPE> and
C<LC_COLLATE> categories. Instead it will use the native character set
(extended by Unicode). When using this parameter, you are responsible
for getting the external character set translated into the
native/Unicode one (which it already will be if it is one of the
increasingly popular UTF-8 locales). There are convenient ways of doing
this, as described in L</Unicode and UTF-8>.

=head2 The setlocale function

WARNING! Prior to Perl 5.28 or on a system that does not support
thread-safe locale operations, do NOT use this function in a
L<thread|threads>. The locale will change in all other threads at the
same time, and should your thread get paused by the operating system,
and another started, the other thread will not have the locale it is
expecting. On some platforms, there can be a race leading to segfaults
if two threads call this function nearly simultaneously. On unthreaded
builds, or on Perl 5.28 and later on thread-safe systems, this warning
does not apply.

You can switch locales as often as you wish at run time with the
C<POSIX::setlocale()> function:

    # Import locale-handling tool set from POSIX module.
    # This example uses: setlocale -- the function call
    #                    LC_CTYPE  -- explained below
    # (Showing the testing for success/failure of operations is
    # omitted in these examples to avoid distracting from the main
    # point)

    use POSIX qw(locale_h);
    use locale;

    # query and save the old locale
    my $old_locale = setlocale(LC_CTYPE);

    setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
    # LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"

    setlocale(LC_CTYPE, "");
    # LC_CTYPE now reset to the default defined by the
    # LC_ALL/LC_CTYPE/LANG environment variables, or to the system
    # default. See below for documentation.

    # restore the old locale
    setlocale(LC_CTYPE, $old_locale);

The first argument of C<setlocale()> gives the B<category>, the second the
B<locale>. The category tells in what aspect of data processing you
want to apply locale-specific rules. Category names are discussed in
L</LOCALE CATEGORIES> and L</"ENVIRONMENT">. The locale is the name of a
collection of customization information corresponding to a particular
combination of language, country or territory, and codeset. Read on for
hints on the naming of locales: not all systems name locales as in the
example.

If no second argument is provided and the category is something other
than C<LC_ALL>, the function returns a string naming the current locale
for the category. You can use this value as the second argument in a
subsequent call to C<setlocale()>, B<but> on some platforms the string
is opaque: most people would not be able to tell from it which locale
it means.

If no second argument is provided and the category is C<LC_ALL>, the
result is implementation-dependent. It may be a string of
concatenated locale names (separator also implementation-dependent)
or a single locale name. Please consult your L<setlocale(3)> man page for
details.

If a second argument is given and it corresponds to a valid locale,
the locale for the category is set to that value, and the function
returns the now-current locale value. You can then use this in yet
another call to C<setlocale()>. (In some implementations, the return
value may sometimes differ from the value you gave as the second
argument--think of it as an alias for the value you gave.)

As the example shows, if the second argument is an empty string, the
category's locale is returned to the default specified by the
corresponding environment variables. Generally, this results in a
return to the default that was in force when Perl started up: changes
to the environment made by the application after startup may or may not
be noticed, depending on your system's C library.

Note that when a form of C<use locale> that doesn't include all
categories is specified, Perl ignores the excluded categories.

If C<setlocale()> fails for some reason (for example, an attempt to set
to a locale unknown to the system), the locale for the category is not
changed, and the function returns C<undef>.
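
Because a failed call leaves the locale unchanged and returns C<undef>,
the return value is worth checking, which the earlier example
deliberately omits. A minimal sketch using only the core L<POSIX> module:
C<with_locale()> is a hypothetical helper, not part of L<POSIX>, that
sets a category temporarily, runs some code, and always restores the
previous setting. The name C<xx_NOWHERE.NO-SUCH-CODESET> is made up and
assumed not to exist on any system. The thread-safety warning at the
start of this section applies to it as well.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper (not part of POSIX): run $code with $category
# temporarily set to $name, then restore the previous setting.
# Returns undef (or an empty list), without running $code, if the
# system rejects $name.
sub with_locale {
    my ($category, $name, $code) = @_;

    my $saved = setlocale($category);               # query only
    defined setlocale($category, $name) or return;  # unknown locale

    my @result = eval { $code->() };
    my $error  = $@;
    setlocale($category, $saved) if defined $saved; # restore even on die
    die $error if $error;

    return wantarray ? @result : $result[0];
}

my $before = setlocale(LC_CTYPE);

# "C" always exists, so the code runs and sees LC_CTYPE set to "C".
my $inside = with_locale(LC_CTYPE, "C", sub { setlocale(LC_CTYPE) });

# A made-up name: setlocale() returns undef and nothing is changed.
my $bogus = with_locale(LC_CTYPE, "xx_NOWHERE.NO-SUCH-CODESET", sub { 1 });

my $after = setlocale(LC_CTYPE);
print "inside: $inside; bogus: ", ( defined $bogus ? $bogus : "undef" ),
      "; restored: ", ( $after eq $before ? "yes" : "no" ), "\n";
```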

Starting in Perl 5.28, on multi-threaded perls compiled on systems that
implement POSIX 2008 thread-safe locale operations, this function
doesn't actually call the system C<setlocale>. Instead those
thread-safe operations are used to emulate the C<setlocale> function,
but in a thread-safe manner.

You can force the thread-safe locale operations to always be used (if
available) by recompiling perl with

    -Accflags='-DUSE_THREAD_SAFE_LOCALE'

added to your call to F<Configure>.

For further information about the categories, consult L<setlocale(3)>.

=head2 Multi-threaded operation

Beginning in Perl 5.28, multi-threaded locale operation is supported on
systems that implement either the POSIX 2008 or Windows-specific
thread-safe locale operations. Many modern systems, such as various
Unix variants and Darwin, do have this.

You can tell if using locales is safe on your system by looking at the
read-only boolean variable C<${^SAFE_LOCALES}>. The value is 1 if the
perl is not threaded, or if it is using thread-safe locale operations.
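
A program can make this check at run time before doing locale work in
threads. This sketch relies only on the core L<Config> module. On perls
older than 5.28 the variable does not exist and simply reads as
C<undef>, so the sketch reports that case as unknown rather than
guessing.

```perl
use strict;
use warnings;
use Config;

# ${^SAFE_LOCALES} exists from perl 5.28 on; on older perls it is undef.
my $safe     = ${^SAFE_LOCALES};
my $threaded = $Config{useithreads} ? 1 : 0;

my $verdict =
    !defined $safe ? "unknown (perl older than 5.28)"
  : $safe          ? "safe"
  :                  "unsafe: avoid changing locales while threads run";

print "threaded perl: ", ( $threaded ? "yes" : "no" ),
      "; locale operations: $verdict\n";
```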

Thread-safe operations are supported in Windows starting in Visual Studio
2005, and in systems compatible with POSIX 2008. Some platforms claim
to support POSIX 2008, but have buggy implementations, so that the hints
files for compiling to run on them turn off attempting to use
thread-safety. C<${^SAFE_LOCALES}> will be 0 on them.

Be aware that writing a multi-threaded application will not be portable
to a platform which lacks the native thread-safe locale support. On
systems that do have it, you automatically get this behavior for
threaded perls, without having to do anything. If, for some reason, you
don't want to use this capability (perhaps the POSIX 2008 support is
buggy on your system), you can manually compile Perl to use the old
non-thread-safe implementation by passing the argument
C<-Accflags='-DNO_THREAD_SAFE_LOCALE'> to F<Configure>.
Except on Windows, this will continue to use certain of the POSIX 2008
functions in some situations. If these are buggy, you can pass the
following to F<Configure> instead or additionally:
C<-Accflags='-DNO_POSIX_2008_LOCALE'>. This will also keep the code
from using thread-safe locales.
C<${^SAFE_LOCALES}> will be 0 on systems that turn off the thread-safe
operations.

Normally on unthreaded builds, the traditional C<setlocale()> is used
and not the thread-safe locale functions. You can force the use of these
on systems that have them by passing
C<-Accflags='-DUSE_THREAD_SAFE_LOCALE'> to F<Configure>.

The initial program is started up using the locale specified from the
environment, as described in L</ENVIRONMENT>. All newly
created threads start with C<LC_ALL> set to C<"C">. Each thread may
use C<POSIX::setlocale()> to query or switch its locale at any time,
without affecting any other thread. All locale-dependent operations
automatically use their thread's locale.

This should be completely transparent to any applications written
entirely in Perl (minus a few rarely encountered caveats given in the
L</Multi-threaded> section). Information for XS module writers is given
in L<perlxs/Locale-aware XS code>.

=head2 Finding locales

For locales available in your system, consult also L<setlocale(3)> to
see whether it leads to the list of available locales (search for the
I<SEE ALSO> section). If that fails, try the following command lines:

    locale -a

    nlsinfo

    ls /usr/lib/nls/loc

    ls /usr/lib/locale

    ls /usr/lib/nls

    ls /usr/share/locale

and see whether they list something resembling these:

    en_US.ISO8859-1     de_DE.ISO8859-1     ru_RU.ISO8859-5
    en_US.iso88591      de_DE.iso88591      ru_RU.iso88595
    en_US               de_DE               ru_RU
    en                  de                  ru
    english             german              russian
    english.iso88591    german.iso88591     russian.iso88595
    english.roman8                          russian.koi8r
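
Since the spelling varies, it can be quicker to ask C<setlocale()>
directly which candidate names the system accepts. This sketch relies
only on the core L<POSIX> module; the candidate list is drawn from the
names above plus C<en_US.UTF-8>, C<en_US.utf8>, "C" and "POSIX", and any
of them may or may not exist on a given system. Probing briefly changes
the process's C<LC_CTYPE>, so the thread caveats under
L</The setlocale function> apply.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Candidate spellings; which of them exist varies from system to system.
my @candidates = qw(
    en_US.ISO8859-1 en_US.iso88591 en_US.UTF-8 en_US.utf8
    en_US en english C POSIX
);

my $saved = setlocale(LC_CTYPE);    # remember the current setting

my @accepted;
for my $name (@candidates) {
    # setlocale() returns undef for names the system does not know.
    push @accepted, $name if defined setlocale(LC_CTYPE, $name);
}

setlocale(LC_CTYPE, $saved) if defined $saved;    # undo the probing

print "accepted here: @accepted\n";
```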

Sadly, even though the calling interface for C<setlocale()> has been
standardized, names of locales and the directories where the
configuration resides have not been. The basic form of the name is
I<language_territory>B<.>I<codeset>, but the latter parts after
I<language> are not always present. The I<language> and I<territory>
parts usually come from the standards B<ISO 639> and B<ISO 3166>, the
two-letter abbreviations for the languages and the countries of the
world, respectively. The I<codeset> part often mentions some B<ISO
8859> character set, the Latin codesets. For example, C<ISO 8859-1>
is the so-called "Western European codeset" that can be used to encode
most Western European languages adequately. Again, there are several
ways to write even the name of that one standard. Lamentably.

Two special locales are worth particular mention: "C" and "POSIX".
Currently these are effectively the same locale: the difference is
mainly that the first one is defined by the C standard, the second by
the POSIX standard. They define the B<default locale> in which
every program starts in the absence of locale information in its
environment. (The I<default> default locale, if you will.) Its language
is (American) English and its character codeset ASCII or, rarely, a
superset thereof (such as the "DEC Multinational Character Set
(DEC-MCS)"). B<Warning>: the C locale delivered by some vendors
may not exactly match what the C standard calls for. So
beware.

B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
POSIX-conformant), so use "C" when you need explicitly to specify this
default locale.

=head2 LOCALE PROBLEMS

You may encounter the following warning message at Perl startup:

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LC_ALL = "En_US",
            LANG = (unset)
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").

This means that your locale settings had C<LC_ALL> set to "En_US" and
C<LANG> unset. Perl tried to believe you but could not.
Instead, Perl gave up and fell back to the "C" locale, the default locale
that is supposed to work no matter what. (On Windows, it first tries
falling back to the system default locale.) This usually means that your
locale settings were wrong, that they mention locales your system has
never heard of, or that the locale installation in your system has
problems (for example, some system files are broken or missing). There
are quick and temporary fixes to these problems, as well as more
thorough and lasting fixes.

=head2 Testing for broken locales

If you are building Perl from source, the Perl test suite file
F<lib/locale.t> can be used to test the locales on your system.
Setting the environment variable C<PERL_DEBUG_FULL_TEST> to 1
will cause it to output detailed results. For example, on Linux, you
could say

    PERL_DEBUG_FULL_TEST=1 ./perl -T -Ilib lib/locale.t > locale.log 2>&1

Besides many other tests, it will test every locale it finds on your
system to see whether it conforms to the POSIX standard. If any have
errors, it will include a summary near the end of the output of which
locales passed all its tests, and which failed, and why.

=head2 Temporarily fixing locale problems

The two quickest fixes are either to render Perl silent about any
locale inconsistencies or to run Perl under the default locale "C".

Perl's moaning about locale problems can be silenced by setting the
environment variable C<PERL_BADLANG> to "0" or "".
This method really just sweeps the problem under the carpet: you tell
Perl to shut up even when Perl sees that something is wrong. Do not
be surprised if later something locale-dependent misbehaves.

Perl can be run under the "C" locale by setting the environment
variable C<LC_ALL> to "C". This method is perhaps a bit more civilized
than the C<PERL_BADLANG> approach, but setting C<LC_ALL> (or
other locale variables) may affect other programs as well, not just
Perl. In particular, external programs run from within Perl will see
these changes. If you make the new settings permanent (read on), all
programs you run see the changes. See L</"ENVIRONMENT"> for
the full list of relevant environment variables and L</"USING LOCALES">
for their effects in Perl. Effects in other programs are
easily deducible. For example, the variable C<LC_COLLATE> may well affect
your B<sort> program (or whatever the program that arranges "records"
alphabetically in your system is called).

You can test out changing these variables temporarily, and if the
new settings seem to help, put those settings into your shell startup
files. Consult your local documentation for the exact details. For
Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):

    LC_ALL=en_US.ISO8859-1
    export LC_ALL

This assumes that we saw the locale "en_US.ISO8859-1" using the commands
discussed above, and decided to try it instead of the faulty locale
"En_US". In Cshish shells (B<csh>, B<tcsh>):

    setenv LC_ALL en_US.ISO8859-1

Or, if you have the B<env> program, you can do this in any shell:

    env LC_ALL=en_US.ISO8859-1 perl ...

If you do not know what shell you have, consult your local
helpdesk or the equivalent.

=head2 Permanently fixing locale problems

The slower but superior fix, if you are able to apply it yourself, is to
correct the misconfiguration of your own environment variables. The
mis(sing)configuration of the whole system's locales usually requires
the help of your friendly system administrator.

First, see L</Finding locales> earlier in this document. That tells
how to find which locales are really supported--and more importantly,
installed--on your system. In our example error message, environment
variables affecting the locale are listed in the order of decreasing
importance (and unset variables do not matter). Therefore, having
C<LC_ALL> set to "En_US" must have been the bad choice, as shown by the
error message. First try fixing locale settings listed first.
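
The precedence behind that ordering can be expressed in a few lines.
C<locale_env_for()> below is a hypothetical helper, not part of Perl. It
assumes the usual POSIX rule that a non-empty C<LC_ALL> overrides the
category's own variable (such as C<LC_CTYPE>), which in turn overrides
C<LANG>, with empty values counting as unset. It ignores the Perl- and
platform-specific variables described in L</"ENVIRONMENT">.

```perl
use strict;
use warnings;

# Hypothetical helper: which environment variable decides the locale for
# a POSIX category such as "LC_CTYPE"?  Precedence: LC_ALL, then the
# category's own variable, then LANG; empty values count as unset.
# Returns (name, value), or an empty list if none is set.
sub locale_env_for {
    my ($category) = @_;
    for my $var ("LC_ALL", $category, "LANG") {
        my $value = $ENV{$var};
        return ($var, $value) if defined $value && length $value;
    }
    return;    # nothing set: the system default applies
}

# Reproduce the LC_ALL setting from the example warning message.
{
    local $ENV{LC_ALL} = "En_US";
    my ($var, $value) = locale_env_for("LC_CTYPE");
    print "$var=\"$value\" decides LC_CTYPE\n";    # LC_ALL="En_US" decides LC_CTYPE
}
```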
	703	installed--on your system. In our example error message, environment
	704	variables affecting the locale are listed in the order of decreasing
	705	importance (and unset variables do not matter). Therefore, having
	706	LC_ALL set to "En_US" must have been the bad choice, as shown by the
	707	error message. First try fixing locale settings listed first.
	708
Second, if the listed commands show something B<exactly> like "En_US"
(without the quotes; prefix matches do not count, and case usually
counts), then you should be okay, because you are using a locale name
that should be installed and available on your system.
In this case, see L</Permanently fixing your system's locale configuration>.
	714
	715	=head2 Permanently fixing your system's locale configuration
	716
	717	This is when you see something like:
	718
    perl: warning: Please check that your locale settings:
            LC_ALL = "En_US",
            LANG = (unset)
        are supported and installed on your system.
	723
but then cannot see "En_US" listed by the above-mentioned
commands. You may see things like "en_US.ISO8859-1", but that isn't
the same. In this case, try running under a locale
that you can list and which somehow matches what you tried. The
rules for matching locale names are a bit vague because
standardization is weak in this area. See L</Finding locales> again
for the general rules.
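The matching advice above can be sketched as a small Perl routine. This is only an illustration under stated assumptions: C<candidate_locales()> is a hypothetical helper, the list of installed names would in practice come from a command such as C<locale -a>, and comparing only the part before any C<.codeset> or C<@modifier>, case-insensitively, is a heuristic rather than a standard rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl or POSIX): given the locale
# name you tried, such as "En_US", and the names your system lists
# (for example from "locale -a"), return the listed names whose
# language_TERRITORY part matches case-insensitively, ignoring any
# ".codeset" or "@modifier" suffix.  This is only a heuristic.
sub candidate_locales {
    my ($wanted, @installed) = @_;
    my $base = sub {
        my ($name) = @_;
        $name =~ s/[.\@].*//s;    # drop ".codeset" and "@modifier"
        return lc $name;
    };
    my $want = $base->($wanted);
    return grep { $base->($_) eq $want } @installed;
}

my @installed = qw(C POSIX en_US.ISO8859-1 en_US.utf8 en_GB.utf8 fr_FR@euro);
print "$_\n" for candidate_locales("En_US", @installed);
```

With the sample list, the sketch suggests C<en_US.ISO8859-1> and C<en_US.utf8> as names worth trying in place of "En_US".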
	731
	732	=head2 Fixing system locale configuration
	733
	734	Contact a system administrator (preferably your own) and report the exact
	735	error message you get, and ask them to read this same documentation you
	736	are now reading. They should be able to check whether there is something
	737	wrong with the locale configuration of the system. The L</Finding locales>
	738	section is unfortunately a bit vague about the exact commands and places
	739	because these things are not that standardized.
	740
	741	=head2 The localeconv function
	742
	743	The C<POSIX::localeconv()> function allows you to get particulars of the
	744	locale-dependent numeric formatting information specified by the current
	745	underlying C<LC_NUMERIC> and C<LC_MONETARY> locales (regardless of
	746	whether called from within the scope of C<S<use locale>> or not). (If
	747	you just want the name of
	748	the current locale for a particular category, use C<POSIX::setlocale()>
	749	with a single parameter--see L</The setlocale function>.)
	750
    use POSIX qw(locale_h);

    # Get a reference to a hash of locale-dependent info
    my $locale_values = localeconv();

    # Output sorted list of the values
    for (sort keys %$locale_values) {
        printf "%-20s = %s\n", $_, $locale_values->{$_};
    }
	760
	761	C<localeconv()> takes no arguments, and returns B<a reference to> a hash.
	762	The keys of this hash are variable names for formatting, such as
	763	C<decimal_point> and C<thousands_sep>. The values are the
	764	corresponding, er, values. See L<POSIX/localeconv> for a longer
	765	example listing the categories an implementation might be expected to
	766	provide; some provide more and others fewer. You don't need an
	767	explicit C<use locale>, because C<localeconv()> always observes the
	768	current locale.
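As mentioned above, C<POSIX::setlocale()> called with only a category returns the name of that category's current locale without changing it. Here is a minimal sketch that collects those names for several categories. The helper name C<current_locales()> is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Query, without changing, the current locale of some categories:
# called with a single argument, setlocale() only returns the name.
# current_locales() is a hypothetical helper name.
sub current_locales {
    my %category = (
        LC_CTYPE    => LC_CTYPE,
        LC_COLLATE  => LC_COLLATE,
        LC_NUMERIC  => LC_NUMERIC,
        LC_MONETARY => LC_MONETARY,
        LC_TIME     => LC_TIME,
    );
    return map { ($_ => setlocale($category{$_})) } sort keys %category;
}

my %now = current_locales();
printf "%-12s %s\n", $_, defined $now{$_} ? $now{$_} : '(unknown)'
    for sort keys %now;
```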
	769
	770	Here's a simple-minded example program that rewrites its command-line
	771	parameters as integers correctly formatted in the current locale:
	772
    use strict;
    use warnings;
    use POSIX qw(locale_h);

    # Get some of locale's numeric formatting parameters
    my ($thousands_sep, $grouping) =
         @{localeconv()}{'thousands_sep', 'grouping'};

    # Apply defaults if values are missing
    $thousands_sep = ',' unless $thousands_sep;

    # grouping and mon_grouping are packed lists
    # of small integers (characters) telling the
    # grouping (thousands_sep and mon_thousands_sep
    # being the group dividers) of numbers and
    # monetary quantities.  The integers' meanings:
    # CHAR_MAX (127 or 255, depending on the
    # platform) means no more grouping, 0 means repeat
    # the previous grouping, and any other value is the
    # size of the current group.  Grouping goes from
    # right to left (low to high digits).  In the
    # below we cheat slightly by never using anything
    # other than the first grouping (whatever that is).
    my @grouping;
    if ($grouping) {
        @grouping = unpack("C*", $grouping);
    } else {
        @grouping = (3);
    }

    # Format command line params for current locale.
    # \Q...\E keeps a separator such as "." from acting
    # as a regular expression metacharacter.
    for (@ARGV) {
        $_ = int;    # Chop non-integer part
        1 while
        s/(\d)(\d{$grouping[0]}($|\Q$thousands_sep\E))/$1$thousands_sep$2/;
    }
    print "@ARGV\n";
	807
	808	Note that if the platform doesn't have C<LC_NUMERIC> and/or
	809	C<LC_MONETARY> available or enabled, the corresponding elements of the
	810	hash will be missing.
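The example above cheats by using only the first grouping value. A sketch of the full rules might look like the following. C<group_digits()> is a hypothetical helper. Treating both 127 and 255 as C<CHAR_MAX> is an assumption meant to cover platforms with signed and unsigned C<char>. Running out of grouping bytes is treated like a 0, matching the NUL terminator of the underlying C string.

```perl
use strict;
use warnings;
use POSIX ();

# Each byte gives the size of the next group of digits, right to
# left; 0, or running out of bytes, repeats the previous size;
# CHAR_MAX (127 or 255, depending on whether char is signed) stops
# grouping.  group_digits() is a hypothetical helper, and it expects
# a plain string of digits (handle any sign separately).
sub group_digits {
    my ($digits, $sep, $grouping) = @_;
    my @sizes = unpack 'C*', $grouping;
    my ($size, @groups) = (0);
    while (1) {
        if (@sizes) {
            my $next = shift @sizes;
            last if $next == 127 || $next == 255;    # CHAR_MAX: stop
            if ($next == 0) { @sizes = () }          # 0: repeat $size
            else            { $size = $next }
        }
        last if $size == 0 || length($digits) <= $size;
        unshift @groups, substr($digits, -$size, $size, '');
    }
    return join $sep, $digits, @groups;
}

my $lconv = POSIX::localeconv();
my $sep   = $lconv->{thousands_sep} || ',';
my $group = $lconv->{grouping}      || "\x03";
print group_digits($_, $sep, $group), "\n" for qw(7 1234 1234567);
```

With a grouping of C<"\x03\x02">, for example, "1234567" becomes "12,34,567", which the first-grouping-only shortcut cannot produce.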
	811
	812	=head2 I18N::Langinfo
	813
	814	Another interface for querying locale-dependent information is the
	815	C<I18N::Langinfo::langinfo()> function.
	816
	817	The following example will import the C<langinfo()> function itself and
	818	three constants to be used as arguments to C<langinfo()>: a constant for
	819	the abbreviated first day of the week (the numbering starts from
	820	Sunday = 1) and two more constants for the affirmative and negative
	821	answers for a yes/no question in the current locale.
	822
    use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);

    my ($abday_1, $yesstr, $nostr)
        = map { langinfo($_) } (ABDAY_1, YESSTR, NOSTR);

    print "$abday_1? [$yesstr/$nostr] ";
	829
In the "C" (or English) locale, for example, the above will probably
print something like:

    Sun? [yes/no]
	834
	835	See L<I18N::Langinfo> for more information.
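C<langinfo()> can also return the full weekday names (C<DAY_1>..C<DAY_7>) and the name of the current character set (C<CODESET>). The following is a short sketch. The exact strings depend on your locale and platform.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo CODESET
                      DAY_1 DAY_2 DAY_3 DAY_4 DAY_5 DAY_6 DAY_7);

# Full weekday names (Sunday first) in the current LC_TIME locale, and
# the name of the character set of the current LC_CTYPE locale.
my @days = map { langinfo($_) }
               (DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7);
my $codeset = langinfo(CODESET);

print "Codeset: $codeset\n";
print "Days:    @days\n";
```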
	836
	837	=head1 LOCALE CATEGORIES
	838
	839	The following subsections describe basic locale categories. Beyond these,
	840	some combination categories allow manipulation of more than one
	841	basic category at a time. See L</"ENVIRONMENT"> for a discussion of these.
	842
	843	=head2 Category C<LC_COLLATE>: Collation: Text Comparisons and Sorting
	844
	845	In the scope of a S<C<use locale>> form that includes collation, Perl
	846	looks to the C<LC_COLLATE>
	847	environment variable to determine the application's notions on collation
	848	(ordering) of characters. For example, "b" follows "a" in Latin
	849	alphabets, but where do "E<aacute>" and "E<aring>" belong? And while
	850	"color" follows "chocolate" in English, what about in traditional Spanish?
	851
	852	The following collations all make sense and you may meet any of them
	853	if you C<"use locale">.
	854
    A B C D E a b c d e
    A a B b C c D d E e
    a A b B c C d D e E
    a b c d e A B C D E
	859
	860	Here is a code snippet to tell what "word"
	861	characters are in the current locale, in that locale's order:
	862
    use locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";
	865
	866	Compare this with the characters that you see and their order if you
	867	state explicitly that the locale should be ignored:
	868
    no locale;
    print +(sort grep /\w/, map { chr } 0..255), "\n";
	871
	872	This machine-native collation (which is what you get unless S<C<use
	873	locale>> has appeared earlier in the same block) must be used for
	874	sorting raw binary data, whereas the locale-dependent collation of the
	875	first example is useful for natural text.
	876
	877	As noted in L</USING LOCALES>, C<cmp> compares according to the current
	878	collation locale when C<use locale> is in effect, but falls back to a
	879	char-by-char comparison for strings that the locale says are equal. You
	880	can use C<POSIX::strcoll()> if you don't want this fall-back:
	881
    use POSIX qw(strcoll);
    my $equal_in_locale =
        !strcoll("space and case ignored", "SpaceAndCaseIgnored");
	885
	886	C<$equal_in_locale> will be true if the collation locale specifies a
	887	dictionary-like ordering that ignores space characters completely and
	888	which folds case.
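Sorting with C<POSIX::strcoll()> called directly looks like the sketch below. C<locale_sort()> is a hypothetical name. The switch to the C<"C"> locale is there only to make the output predictable, because in that locale C<strcoll()> compares bytes just as C<strcmp()> does.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strcoll);

# Sort with POSIX::strcoll() called directly.  Unlike "cmp" under
# "use locale", this has no char-by-char tie-breaker, so strings the
# locale calls equal end up in an unspecified relative order.
# locale_sort() is a hypothetical helper name.
sub locale_sort { return sort { strcoll($a, $b) } @_ }

# In the "C" locale strcoll() compares like strcmp(), byte by byte,
# so every uppercase ASCII letter sorts before every lowercase one.
setlocale(LC_COLLATE, "C");
print join(" ", locale_sort(qw(b a B A))), "\n";    # A B a b
```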
	889
	890	Perl uses the platform's C library collation functions C<strcoll()> and
	891	C<strxfrm()>. That means you get whatever they give. On some
	892	platforms, these functions work well on UTF-8 locales, giving
	893	a reasonable default collation for the code points that are important in
	894	that locale. (And if they aren't working well, the problem may only be
that the locale definition is deficient, so it can be fixed by using a
	896	better definition file. Unicode's definitions (see L</Freely available
	897	locale definitions>) provide reasonable UTF-8 locale collation
	898	definitions.) Starting in Perl v5.26, Perl's use of these functions has
	899	been made more seamless. This may be sufficient for your needs. For
	900	more control, and to make sure strings containing any code point (not
	901	just the ones important in the locale) collate properly, the
	902	L<Unicode::Collate> module is suggested.
	903
	904	In non-UTF-8 locales (hence single byte), code points above 0xFF are
	905	technically invalid. But if present, again starting in v5.26, they will
	906	collate to the same position as the highest valid code point does. This
	907	generally gives good results, but the collation order may be skewed if
	908	the valid code point gets special treatment when it forms particular
	909	sequences with other characters as defined by the locale.
	910	When two strings collate identically, the code point order is used as a
	911	tie breaker.
	912
	913	If Perl detects that there are problems with the locale collation order,
	914	it reverts to using non-locale collation rules for that locale.
	915
	916	If you have a single string that you want to check for "equality in
	917	locale" against several others, you might think you could gain a little
	918	efficiency by using C<POSIX::strxfrm()> in conjunction with C<eq>:
	919
    use POSIX qw(strxfrm);
    my $xfrm_string = strxfrm("Mixed-case string");
    print "locale collation ignores spaces\n"
        if $xfrm_string eq strxfrm("Mixed-casestring");
    print "locale collation ignores hyphens\n"
        if $xfrm_string eq strxfrm("Mixedcase string");
    print "locale collation ignores case\n"
        if $xfrm_string eq strxfrm("mixed-case string");
	928
	929	C<strxfrm()> takes a string and maps it into a transformed string for use
	930	in char-by-char comparisons against other transformed strings during
	931	collation. "Under the hood", locale-affected Perl comparison operators
	932	call C<strxfrm()> for both operands, then do a char-by-char
	933	comparison of the transformed strings. By calling C<strxfrm()> explicitly
	934	and using a non locale-affected comparison, the example attempts to save
	935	a couple of transformations. But in fact, it doesn't save anything: Perl
	936	magic (see L<perlguts/Magic Variables>) creates the transformed version of a
	937	string the first time it's needed in a comparison, then keeps this version around
	938	in case it's needed again. An example rewritten the easy way with
	939	C<cmp> runs just about as fast. It also copes with null characters
	940	embedded in strings; if you call C<strxfrm()> directly, it treats the first
	941	null it finds as a terminator. Don't expect the transformed strings
	942	it produces to be portable across systems--or even from one revision
	943	of your operating system to the next. In short, don't call C<strxfrm()>
	944	directly: let Perl do it for you.
	945
	946	Note: C<use locale> isn't shown in some of these examples because it isn't
	947	needed: C<strcoll()> and C<strxfrm()> are POSIX functions
	948	which use the standard system-supplied C<libc> functions that
	949	always obey the current C<LC_COLLATE> locale.
	950
	951	=head2 Category C<LC_CTYPE>: Character Types
	952
	953	In the scope of a S<C<use locale>> form that includes C<LC_CTYPE>, Perl
	954	obeys the C<LC_CTYPE> locale
	955	setting. This controls the application's notion of which characters are
	956	alphabetic, numeric, punctuation, I<etc>. This affects Perl's C<\w>
	957	regular expression metanotation,
	958	which stands for alphanumeric characters--that is, alphabetic,
	959	numeric, and the platform's native underscore.
	960	(Consult L<perlre> for more information about
	961	regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
	962	setting, characters like "E<aelig>", "E<eth>", "E<szlig>", and
	963	"E<oslash>" may be understood as C<\w> characters.
	964	It also affects things like C<\s>, C<\D>, and the POSIX character
	965	classes, like C<[[:graph:]]>. (See L<perlrecharclass> for more
	966	information on all these.)
	967
	968	The C<LC_CTYPE> locale also provides the map used in transliterating
	969	characters between lower and uppercase. This affects the case-mapping
	970	functions--C<fc()>, C<lc()>, C<lcfirst()>, C<uc()>, and C<ucfirst()>;
	971	case-mapping
	972	interpolation with C<\F>, C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted
	973	strings and C<s///> substitutions; and case-insensitive regular expression
	974	pattern matching using the C<i> modifier.
	975
	976	Starting in v5.20, Perl supports UTF-8 locales for C<LC_CTYPE>, but
	977	otherwise Perl only supports single-byte locales, such as the ISO 8859
	978	series. This means that wide character locales, for example for Asian
	979	languages, are not well-supported. Use of these locales may cause core
	980	dumps. If the platform has the capability for Perl to detect such a
	981	locale, starting in Perl v5.22, L<Perl will warn, default
enabled|warnings/Category Hierarchy>, using the C<locale> warning
	983	category, whenever such a locale is switched into. The UTF-8 locale
	984	support is actually a
	985	superset of POSIX locales, because it is really full Unicode behavior
	986	as if no C<LC_CTYPE> locale were in effect at all (except for tainting;
	987	see L</SECURITY>). POSIX locales, even UTF-8 ones,
	988	are lacking certain concepts in Unicode, such as the idea that changing
	989	the case of a character could expand to be more than one character.
Perl in a UTF-8 locale will give you that expansion. Prior to v5.20,
	991	Perl treated a UTF-8 locale on some platforms like an ISO 8859-1 one,
	992	with some restrictions, and on other platforms more like the "C" locale.
For releases v5.16 and v5.18, C<S<use locale ':not_characters'>> could be
	994	used as a workaround for this (see L</Unicode and UTF-8>).
	995
	996	Note that there are quite a few things that are unaffected by the
	997	current locale. Any literal character is the native character for the
	998	given platform. Hence 'A' means the character at code point 65 on ASCII
	999	platforms, and 193 on EBCDIC. That may or may not be an 'A' in the
	1000	current locale, if that locale even has an 'A'.
	1001	Similarly, all the escape sequences for particular characters,
	1002	C<\n> for example, always mean the platform's native one. This means,
	1003	for example, that C<\N> in regular expressions (every character
	1004	but new-line) works on the platform character set.
	1005
	1006	Starting in v5.22, Perl will by default warn when switching into a
	1007	locale that redefines any ASCII printable character (plus C<\t> and
	1008	C<\n>) into a different class than expected. This is likely to
	1009	happen on modern locales only on EBCDIC platforms, where, for example,
	1010	a CCSID 0037 locale on a CCSID 1047 machine moves C<"[">, but it can
	1011	happen on ASCII platforms with the ISO 646 and other
	1012	7-bit locales that are essentially obsolete. Things may still work,
	1013	depending on what features of Perl are used by the program. For
example, in the example from above where C<"|"> becomes a C<\w>, and
	1015	there are no regular expressions where this matters, the program may
	1016	still work properly. The warning lists all the characters that
	1017	it can determine could be adversely affected.
	1018
	1019	B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
	1020	in clearly ineligible characters being considered to be alphanumeric by
	1021	your application. For strict matching of (mundane) ASCII letters and
	1022	digits--for example, in command strings--locale-aware applications
	1023	should use C<\w> with the C</a> regular expression modifier. See L</"SECURITY">.
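A minimal sketch of that advice uses a hypothetical C<is_safe_command()> check. The C</a> modifier keeps C<\w> ASCII-only even inside the scope of C<use locale>, so the locale cannot widen the set of accepted words.

```perl
use strict;
use warnings;
use locale;

# Hypothetical command-word check.  With /a, \w matches only the
# ASCII characters [A-Za-z0-9_], even within "use locale", so a
# broken or malicious LC_CTYPE locale cannot widen what is accepted.
sub is_safe_command {
    my ($word) = @_;
    return $word =~ /\A\w+\z/a ? 1 : 0;
}

for my $word ('list_all', 'rm;ls', "caf\x{e9}") {
    print is_safe_command($word) ? "accepted\n" : "rejected\n";
}
```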
	1024
	1025	=head2 Category C<LC_NUMERIC>: Numeric Formatting
	1026
	1027	After a proper C<POSIX::setlocale()> call, and within the scope of
a C<use locale> form that includes numerics, Perl obeys the
	1029	C<LC_NUMERIC> locale information, which controls an application's idea
	1030	of how numbers should be formatted for human readability.
	1031	In most implementations the only effect is to
	1032	change the character used for the decimal point--perhaps from "." to ",".
	1033	The functions aren't aware of such niceties as thousands separation and
	1034	so on. (See L</The localeconv function> if you care about these things.)
	1035
    use POSIX qw(strtod setlocale LC_NUMERIC);
    use locale;

    setlocale(LC_NUMERIC, "");

    my $n = 5/2;   # Assign numeric 2.5 to $n

    my $str = " $n"; # Locale-dependent conversion to string

    print "half five is $n\n";       # Locale-dependent output

    printf "half five is %g\n", $n;  # Locale-dependent output

    print "DECIMAL POINT IS COMMA\n"
        if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
	1051
	1052	See also L<I18N::Langinfo> and C<RADIXCHAR>.
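To read numbers that users type in the local convention, one approach is to look up C<decimal_point> with C<localeconv()> and translate it to Perl's C<.> before converting. This is a sketch, not the only way: C<POSIX::strtod()>, shown above, is another option. C<parse_local_number()> is a hypothetical helper, and it assumes that the input contains no thousands separators.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Hypothetical helper: turn a number typed in the current LC_NUMERIC
# convention (for example "2,5" where the decimal point is a comma)
# into a Perl number.  It assumes the text has no thousands
# separators and leaves validation to the caller.
sub parse_local_number {
    my ($text) = @_;
    my $point = localeconv()->{decimal_point};
    $point = '.' unless defined $point && length $point;
    (my $plain = $text) =~ s/\Q$point\E/./;
    return $plain + 0;    # outside "use locale", Perl parses with "."
}

print parse_local_number("2.5") * 2, "\n";
```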
	1053
	1054	=head2 Category C<LC_MONETARY>: Formatting of monetary amounts
	1055
	1056	The C standard defines the C<LC_MONETARY> category, but not a function
	1057	that is affected by its contents. (Those with experience of standards
	1058	committees will recognize that the working group decided to punt on the
	1059	issue.) Consequently, Perl essentially takes no notice of it. If you
	1060	really want to use C<LC_MONETARY>, you can query its contents--see
	1061	L</The localeconv function>--and use the information that it returns in your
	1062	application's own formatting of currency amounts. However, you may well
	1063	find that the information, voluminous and complex though it may be, still
	1064	does not quite meet your requirements: currency formatting is a hard nut
	1065	to crack.
	1066
	1067	See also L<I18N::Langinfo> and C<CRNCYSTR>.
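To give an idea of what using that information in your own formatting involves, here is a deliberately simplified sketch. It uses only a handful of the C<localeconv()> fields: C<currency_symbol>, C<mon_decimal_point>, C<frac_digits>, C<p_cs_precedes>, and C<p_sep_by_space>. The helper C<format_money()> is hypothetical. It ignores grouping, negative amounts, and the international format, which is where much of the difficulty lies.

```perl
use strict;
use warnings;
use POSIX ();

# A deliberately simplified sketch: format a non-negative amount with
# a few fields that localeconv() may return.  It ignores digit
# grouping, negative amounts (n_cs_precedes, p_sign_posn, ...) and
# international formats.  format_money() is a hypothetical helper; pass
# it localeconv()'s hash or one of your own with the same keys.
sub format_money {
    my ($amount, $lc) = @_;
    my $symbol = defined $lc->{currency_symbol} ? $lc->{currency_symbol} : '';
    my $point  = $lc->{mon_decimal_point};
    $point = '.' unless defined $point && length $point;
    my $digits = $lc->{frac_digits};
    # A missing value or CHAR_MAX (127/255) means "not available".
    $digits = 2 unless defined $digits && $digits >= 0 && $digits < 127;
    (my $number = sprintf '%.*f', $digits, $amount) =~ s/\./$point/;
    my $space  = (defined $lc->{p_sep_by_space} && $lc->{p_sep_by_space} == 1
                  && length $symbol) ? ' ' : '';
    my $before = !defined $lc->{p_cs_precedes} || $lc->{p_cs_precedes} != 0;
    return $before ? "$symbol$space$number" : "$number$space$symbol";
}

print format_money(1234.5, POSIX::localeconv()), "\n";
print format_money(1234.5, { currency_symbol => 'EUR', mon_decimal_point => ',',
                             frac_digits => 2, p_cs_precedes => 0,
                             p_sep_by_space => 1 }), "\n";    # 1234,50 EUR
```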
	1068
=head2 Category C<LC_TIME>: Representation of time
	1070
	1071	Output produced by C<POSIX::strftime()>, which builds a formatted
	1072	human-readable date/time string, is affected by the current C<LC_TIME>
	1073	locale. Thus, in a French locale, the output produced by the C<%B>
	1074	format element (full month name) for the first month of the year would
	1075	be "janvier". Here's how to get a list of long month names in the
	1076	current locale:
	1077
    use POSIX qw(strftime);
    my @long_month_name;
    for (0..11) {
        $long_month_name[$_] =
            strftime("%B", 0, 0, 0, 1, $_, 96);
    }
	1083
	1084	Note: C<use locale> isn't needed in this example: C<strftime()> is a POSIX
	1085	function which uses the standard system-supplied C<libc> function that
	1086	always obeys the current C<LC_TIME> locale.
	1087
See also L<I18N::Langinfo> and C<ABDAY_1>..C<ABDAY_7>, C<DAY_1>..C<DAY_7>,
C<ABMON_1>..C<ABMON_12>, and C<MON_1>..C<MON_12>.
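The next sketch extends the example above. It returns month names for either C<%B> or C<%b>, and full weekday names via C<%A>. It relies on C<POSIX::strftime()> normalizing its arguments as C<mktime()> would, so the weekday of each date is computed automatically. The helper names are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strftime);

# Month names (%B full, %b abbreviated) in the current LC_TIME
# locale, using the first day of each month of 1996.
sub month_names {
    my ($format) = @_;
    return map { strftime($format, 0, 0, 0, 1, $_, 96) } 0 .. 11;
}

# Full weekday names (%A): September 1-7, 1996 ran from Sunday to
# Saturday, and strftime() works out the weekday from the date.
sub weekday_names {
    return map { strftime("%A", 0, 0, 0, $_, 8, 96) } 1 .. 7;
}

print join(", ", month_names("%b")), "\n";
print join(", ", weekday_names()), "\n";
```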
	1090
	1091	=head2 Other categories
	1092
The remaining locale categories are not currently used by Perl itself.
But again note that things Perl interacts with may use these, including
extensions outside the standard Perl distribution, as may the
operating system and its utilities. Note especially that the string
value of C<$!> and the error messages given by external utilities may
be changed by C<LC_MESSAGES>. If you want to have portable error
codes, use C<%!>. See L<Errno>.
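For example, the sketch below classifies an C<open()> failure by its error code rather than by the text of C<$!>, which may be translated. C<open_error_kind()> is a hypothetical helper, and the path is made up.

```perl
use strict;
use warnings;
use Errno ();

# $! stringifies to a message that LC_MESSAGES may translate, so test
# the error code through %! instead of matching the text.
# open_error_kind() is a hypothetical helper.
sub open_error_kind {
    my ($file) = @_;
    if (open(my $fh, '<', $file)) {
        close $fh;
        return 'ok';
    }
    return 'missing'   if $!{ENOENT};
    return 'forbidden' if $!{EACCES};
    return 'other';
}

my $file = '/nonexistent/dir/example.txt';    # made-up path
my $kind = open_error_kind($file);
print "$file: $kind ($!)\n";
```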
	1100
	1101	=head1 SECURITY
	1102
	1103	Although the main discussion of Perl security issues can be found in
	1104	L<perlsec>, a discussion of Perl's locale handling would be incomplete
	1105	if it did not draw your attention to locale-dependent security issues.
	1106	Locales--particularly on systems that allow unprivileged users to
	1107	build their own locales--are untrustworthy. A malicious (or just plain
	1108	broken) locale can make a locale-aware application give unexpected
	1109	results. Here are a few possibilities:
	1110
	1111	=over 4
	1112
	1113	=item *
	1114
	1115	Regular expression checks for safe file names or mail addresses using
	1116	C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
characters such as C<"E<gt>"> and C<"|"> are alphanumeric.
	1118
	1119	=item *
	1120
	1121	String interpolation with case-mapping, as in, say, C<$dest =
	1122	"C:\U$name.$ext">, may produce dangerous results if a bogus C<LC_CTYPE>
	1123	case-mapping table is in effect.
	1124
	1125	=item *
	1126
	1127	A sneaky C<LC_COLLATE> locale could result in the names of students with
	1128	"D" grades appearing ahead of those with "A"s.
	1129
	1130	=item *
	1131
	1132	An application that takes the trouble to use information in
	1133	C<LC_MONETARY> may format debits as if they were credits and vice versa
	1134	if that locale has been subverted. Or it might make payments in US
	1135	dollars instead of Hong Kong dollars.
	1136
	1137	=item *
	1138
	1139	The date and day names in dates formatted by C<strftime()> could be
	1140	manipulated to advantage by a malicious user able to subvert the
C<LC_TIME> locale. ("Look--it says I wasn't in the building on
	1142	Sunday.")
	1143
	1144	=back
	1145
	1146	Such dangers are not peculiar to the locale system: any aspect of an
	1147	application's environment which may be modified maliciously presents
	1148	similar challenges. Similarly, they are not specific to Perl: any
	1149	programming language that allows you to write programs that take
	1150	account of their environment exposes you to these issues.
	1151
	1152	Perl cannot protect you from all possibilities shown in the
	1153	examples--there is no substitute for your own vigilance--but, when
	1154	C<use locale> is in effect, Perl uses the tainting mechanism (see
	1155	L<perlsec>) to mark string results that become locale-dependent, and
	1156	which may be untrustworthy in consequence. Here is a summary of the
	1157	tainting behavior of operators and functions that may be affected by
	1158	the locale:
	1159
	1160	=over 4
	1161
	1162	=item *
	1163
	1164	B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):
	1165
	1166	Scalar true/false (or less/equal/greater) result is never tainted.
	1167
	1168	=item *
	1169
B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>):
	1171
	1172	The result string containing interpolated material is tainted if
	1173	a C<use locale> form that includes C<LC_CTYPE> is in effect.
	1174
	1175	=item *
	1176
	1177	B<Matching operator> (C<m//>):
	1178
Scalar true/false result is never tainted.
	1180
	1181	All subpatterns, either delivered as a list-context result or as C<$1>
	1182	I<etc>., are tainted if a C<use locale> form that includes
	1183	C<LC_CTYPE> is in effect, and the subpattern
	1184	regular expression contains a locale-dependent construct. These
	1185	constructs include C<\w> (to match an alphanumeric character), C<\W>
	1186	(non-alphanumeric character), C<\b> and C<\B> (word-boundary and
non-boundary, which depend on what C<\w> and C<\W> match), C<\s>
	1188	(whitespace character), C<\S> (non whitespace character), C<\d> and
	1189	C<\D> (digits and non-digits), and the POSIX character classes, such as
	1190	C<[:alpha:]> (see L<perlrecharclass/POSIX Character Classes>).
	1191
	1192	Tainting is also likely if the pattern is to be matched
	1193	case-insensitively (via C</i>). The exception is if all the code points
	1194	to be matched this way are above 255 and do not have folds under Unicode
	1195	rules to below 256. Tainting is not done for these because Perl
	1196	only uses Unicode rules for such code points, and those rules are the
	1197	same no matter what the current locale.
	1198
	1199	The matched-pattern variables, C<$&>, C<$`> (pre-match), C<$'>
	1200	(post-match), and C<$+> (last match) also are tainted.
	1201
	1202	=item *
	1203
	1204	B<Substitution operator> (C<s///>):
	1205
	1206	Has the same behavior as the match operator. Also, the left
	1207	operand of C<=~> becomes tainted when a C<use locale>
	1208	form that includes C<LC_CTYPE> is in effect, if modified as
	1209	a result of a substitution based on a regular
	1210	expression match involving any of the things mentioned in the previous
item, or of case-mapping, such as C<\l>, C<\L>, C<\u>, C<\U>, or C<\F>.
	1212
	1213	=item *
	1214
	1215	B<Output formatting functions> (C<printf()> and C<write()>):
	1216
	1217	Results are never tainted because otherwise even output from print,
	1218	for example C<print(1/7)>, should be tainted if C<use locale> is in
	1219	effect.
	1220
	1221	=item *
	1222
	1223	B<Case-mapping functions> (C<lc()>, C<lcfirst()>, C<uc()>, C<ucfirst()>):
	1224
	1225	Results are tainted if a C<use locale> form that includes C<LC_CTYPE> is
	1226	in effect.
	1227
	1228	=item *
	1229
	1230	B<POSIX locale-dependent functions> (C<localeconv()>, C<strcoll()>,
	1231	C<strftime()>, C<strxfrm()>):
	1232
	1233	Results are never tainted.
	1234
	1235	=back
	1236
	1237	Three examples illustrate locale-dependent tainting.
	1238	The first program, which ignores its locale, won't run: a value taken
	1239	directly from the command line may not be used to name an output file
	1240	when taint checks are enabled.
	1241
    #!/usr/local/bin/perl -T
    # Run with taint checking

    # Command line sanity check omitted...
    my $tainted_output_file = shift;

    open(my $fh, ">", $tainted_output_file)
        or warn "Open of $tainted_output_file failed: $!\n";
	1250
	1251	The program can be made to run by "laundering" the tainted value through
	1252	a regular expression: the second example--which still ignores locale
	1253	information--runs, creating the file named on its command line
	1254	if it can.
	1255
    #!/usr/local/bin/perl -T

    my $tainted_output_file = shift;
    $tainted_output_file =~ m%[\w/]+%;
    my $untainted_output_file = $&;

    open(my $fh, ">", $untainted_output_file)
        or warn "Open of $untainted_output_file failed: $!\n";
	1264
	1265	Compare this with a similar but locale-aware program:
	1266
    #!/usr/local/bin/perl -T

    my $tainted_output_file = shift;
    use locale;
    $tainted_output_file =~ m%[\w/]+%;
    my $localized_output_file = $&;

    open(my $fh, ">", $localized_output_file)
        or warn "Open of $localized_output_file failed: $!\n";
	1276
	1277	This third program fails to run because C<$&> is tainted: it is the result
	1278	of a match involving C<\w> while C<use locale> is in effect.
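A stricter variant of the laundering step is sketched below. It anchors the pattern so that the whole name must match, not just a substring. It also adds the C</a> modifier, which makes C<\w> ASCII-only. We expect this to make the pattern locale-independent and leave C<$1> untainted even under C<use locale>, but check this with C<perl -T> on your own Perl before relying on it. The helper C<launder_filename()> and the set of accepted names (no directory part) are assumptions.

```perl
use strict;
use warnings;
use locale;

# Meant to be run under "perl -T".  Anchoring the pattern makes the
# whole name match, not just some substring; /a makes \w mean only
# ASCII [A-Za-z0-9_] rather than whatever LC_CTYPE claims.  The
# accepted names (a word character followed by word characters, dots,
# or hyphens, with no directory part) are an assumption.
sub launder_filename {
    my ($name) = @_;
    return $name =~ m{\A(\w[\w.-]*)\z}a ? $1 : undef;
}

for my $candidate ('report.txt', '../etc/passwd') {
    my $clean = launder_filename($candidate);
    print defined $clean ? "accepted: $clean\n" : "rejected: $candidate\n";
}
```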
	1279
	1280	=head1 ENVIRONMENT
	1281
	1282	=over 12
	1283
	1284	=item PERL_SKIP_LOCALE_INIT
	1285
This environment variable, available starting in Perl v5.20, if set
(to any value), tells Perl not to use the rest of the
environment variables for initialization. Instead, Perl uses whatever
	1289	the current locale settings are. This is particularly useful in
	1290	embedded environments, see
	1291	L<perlembed/Using embedded Perl with POSIX locales>.
	1292
	1293	=item PERL_BADLANG
	1294
	1295	A string that can suppress Perl's warning about failed locale settings
	1296	at startup. Failure can occur if the locale support in the operating
	1297	system is lacking (broken) in some way--or if you mistyped the name of
	1298	a locale when you set up your environment. If this environment
	1299	variable is absent, or has a value other than "0" or "", Perl will
	1300	complain about locale setting failures.
	1301
	1302	B<NOTE>: C<PERL_BADLANG> only gives you a way to hide the warning message.
	1303	The message tells about some problem in your system's locale support,
	1304	and you should investigate what the problem is.
	1305
	1306	=back
	1307
	1308	The following environment variables are not specific to Perl: They are
	1309	part of the standardized (ISO C, XPG4, POSIX 1.c) C<setlocale()> method
	1310	for controlling an application's opinion on data. Windows is non-POSIX,
	1311	but Perl arranges for the following to work as described anyway.
	1312	If the locale given by an environment variable is not valid, Perl tries
	1313	the next lower one in priority. If none are valid, on Windows, the
	1314	system default locale is then tried. If all else fails, the C<"C">
	1315	locale is used. If even that doesn't work, something is badly broken,
	1316	but Perl tries to forge ahead with whatever the locale settings might
	1317	be.
	1318
	1319	=over 12
	1320
	1321	=item C<LC_ALL>
	1322
	1323	C<LC_ALL> is the "override-all" locale environment variable. If
	1324	set, it overrides all the rest of the locale environment variables.
	1325
	1326	=item C<LANGUAGE>
	1327
B<NOTE>: C<LANGUAGE> is a GNU extension; it affects you only if you
are using the GNU libc. This is the case on most Linux systems, for example.
If you are using "commercial" Unixes you are most probably I<not>
using GNU libc and you can ignore C<LANGUAGE>.

However, if you are using C<LANGUAGE>, it affects the
	1334	language of informational, warning, and error messages output by
	1335	commands (in other words, it's like C<LC_MESSAGES>) but it has higher
	1336	priority than C<LC_ALL>. Moreover, it's not a single value but
	1337	instead a "path" (":"-separated list) of I<languages> (not locales).
	1338	See the GNU C<gettext> library documentation for more information.
	1339
	1340	=item C<LC_CTYPE>
	1341
	1342	In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
	1343	locale. In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
	1344	chooses the character type locale.
	1345
	1346	=item C<LC_COLLATE>
	1347
	1348	In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
	1349	(sorting) locale. In the absence of both C<LC_ALL> and C<LC_COLLATE>,
	1350	C<LANG> chooses the collation locale.
	1351
	1352	=item C<LC_MONETARY>
	1353
	1354	In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
	1355	formatting locale. In the absence of both C<LC_ALL> and C<LC_MONETARY>,
	1356	C<LANG> chooses the monetary formatting locale.
	1357
	1358	=item C<LC_NUMERIC>
	1359
	1360	In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
	1361	locale. In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
chooses the numeric format locale.
	1363
	1364	=item C<LC_TIME>
	1365
	1366	In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
	1367	formatting locale. In the absence of both C<LC_ALL> and C<LC_TIME>,
	1368	C<LANG> chooses the date and time formatting locale.
	1369
	1370	=item C<LANG>
	1371
	1372	C<LANG> is the "catch-all" locale environment variable. If it is set, it
	1373	is used as the last resort after the overall C<LC_ALL> and the
	1374	category-specific C<LC_I<foo>>.
	1375
	1376	=back
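The precedence rules above can be sketched as a small lookup. This is an illustration only. C<deciding_variable()> is hypothetical and treats empty values as unset. It also ignores both C<LANGUAGE> and Perl's fallback to the next variable when a named locale turns out to be invalid.

```perl
use strict;
use warnings;

# Hypothetical helper: given a category name such as "LC_COLLATE" and
# a hash of environment settings, return the name of the variable that
# decides that category (LC_ALL, then LC_foo, then LANG), or undef if
# none is set.  Empty values count as unset; LANGUAGE is ignored, as
# is the fallback Perl makes when a named locale is not valid.
sub deciding_variable {
    my ($category, $env) = @_;
    for my $var ('LC_ALL', $category, 'LANG') {
        return $var if defined $env->{$var} && length $env->{$var};
    }
    return undef;    # nothing set: typically the "C" locale applies
}

my %env = (LANG => 'en_US.UTF-8', LC_COLLATE => 'C');
for my $cat (qw(LC_CTYPE LC_COLLATE LC_NUMERIC)) {
    my $var = deciding_variable($cat, \%env);
    printf "%-10s <- %s\n", $cat,
        defined $var ? "$var=$env{$var}" : 'nothing set';
}
```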
	1377
	1378	=head2 Examples
	1379
The C<LC_NUMERIC> category controls numeric output:
	1381
    use locale;
    use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
    setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
    printf "%g\n", 1.23; # If the "fr_FR" succeeded, probably shows 1,23.
	1386
	1387	and also how strings are parsed by C<POSIX::strtod()> as numbers:
	1388
    use locale;
    use POSIX qw(locale_h strtod);
    setlocale(LC_NUMERIC, "de_DE") or die "Entschuldigung";
    my $x = strtod("2,34") + 5;
    print $x, "\n"; # Probably shows 7,34.
	1394
	1395	=head1 NOTES
	1396
	1397	=head2 String C<eval> and C<LC_NUMERIC>
	1398
A string L<eval|perlfunc/eval EXPR> parses its expression as standard
	1400	Perl. It is therefore expecting the decimal point to be a dot. If
	1401	C<LC_NUMERIC> is set to have this be a comma instead, the parsing will
	1402	be confused, perhaps silently.
	1403
    use locale;
    use POSIX qw(locale_h);
    setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
    my $a = 1.2;
    print eval "$a + 1.5";
    print "\n";
	1410
	1411	prints C<13,5>. This is because in that locale, the comma is the
	1412	decimal point character. The C<eval> thus expands to:
	1413
    eval "1,2 + 1.5"
	1415
	1416	and the result is not what you likely expected. No warnings are
generated. If you do string C<eval>s within the scope of
	1418	S<C<use locale>>, you should instead change the C<eval> line to do
	1419	something like:
	1420
	1421	print eval "no locale; $a + 1.5";
	1422
	1423	This prints C<2.7>.
	1424
	1425	You could also exclude C<LC_NUMERIC>, if you don't need it, by
	1426
	1427	use locale ':!numeric';
	1428
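
Another way to sidestep the problem is to keep the number out of the
string altogether. With single quotes, the evaluated code refers to the
variable itself, so its value is never converted to locale-formatted
text first. This sketch assumes the same C<"fr_FR"> locale as above and
falls back to C<"C"> if it is not installed.

```perl
    use strict;
    use warnings;
    use locale;
    use POSIX qw(locale_h);

    # Use "fr_FR" if installed, otherwise fall back to "C".
    setlocale(LC_NUMERIC, "fr_FR") or setlocale(LC_NUMERIC, "C");

    my $x = 1.2;
    # Single quotes: the string is the literal code '$x + 1.5', so $x is
    # looked up inside the eval instead of being stringified beforehand.
    my $sum = eval '$x + 1.5';
    die $@ if $@;
    printf "%.1f\n", $sum;    # 2.7, shown as 2,7 under "fr_FR"
```

The computed value is the same in either locale; only the printed form
follows C<LC_NUMERIC>.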

=head2 Backward compatibility

Versions of Perl prior to 5.004 B<mostly> ignored locale information,
generally behaving as if something similar to the C<"C"> locale were
always in force, even if the program environment suggested otherwise
(see L</The setlocale function>). By default, Perl still behaves this
way for backward compatibility. If you want a Perl application to pay
attention to locale information, you B<must> use the S<C<use locale>>
pragma (see L</The "use locale" pragma>) or, in the unlikely event
that you want locale rules only for pattern matching, the
C</l> regular expression modifier (see L<perlre/Character set
modifiers>).
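
The three possibilities (neither, the pragma, or the modifier) can be
placed side by side in a small sketch; the sub names are hypothetical.
Under the C<"C"> locale all three agree on ASCII input. The difference
would show up in a locale such as an ISO8859-1 one, where the
locale-aware versions would likely treat a byte like C<"\xE9"> as a
word character and the default one, in this program, would not.

```perl
    use strict;
    use warnings;
    use POSIX qw(setlocale LC_ALL);

    setlocale(LC_ALL, "C") or die "Cannot select the C locale";

    # No pragma, no /l: the locale is ignored.
    sub is_word_default { return $_[0] =~ /\A\w+\z/ ? 1 : 0 }

    # "use locale" is lexically scoped; here it covers only this sub.
    sub is_word_pragma {
        use locale;
        return $_[0] =~ /\A\w+\z/ ? 1 : 0;
    }

    # The /l modifier applies locale rules to this one match only.
    sub is_word_l { return $_[0] =~ /\A\w+\z/l ? 1 : 0 }

    for my $s ("abc", "a b") {
        printf "%-4s default=%d pragma=%d /l=%d\n", $s,
            is_word_default($s), is_word_pragma($s), is_word_l($s);
    }
```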

Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
information if available; that is, C<\w> did understand which
characters were letters according to the locale environment variables.
The problem was that the user had no control over the feature:
if the C library supported locales, Perl used them.

=head2 I18N::Collate obsolete

In versions of Perl prior to 5.004, per-locale collation was possible
using the C<I18N::Collate> library module. This module is now mildly
obsolete and should be avoided in new applications. The C<LC_COLLATE>
functionality is now integrated into the Perl core language: one can
use locale-specific scalar data completely normally with C<use locale>,
so there is no longer any need to juggle the scalar references of
C<I18N::Collate>.
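
As a minimal sketch of what replaces C<I18N::Collate>, ordinary
C<sort> and C<cmp> inside the scope of C<use locale> follow
C<LC_COLLATE>, and C<POSIX::strcoll()> offers the same comparison
without the pragma. The C<"C"> locale is used here only so that the
output is predictable; with another installed locale the order may
differ.

```perl
    use strict;
    use warnings;
    use POSIX qw(setlocale LC_COLLATE strcoll);

    # Predictable output; substitute an installed locale of your choice.
    setlocale(LC_COLLATE, "C") or die "Cannot select the C locale";

    my @words = qw(banana Apple cherry apple);

    my @sorted;
    {
        use locale;    # sort and cmp follow LC_COLLATE inside this block
        @sorted = sort @words;
    }
    print "@sorted\n";    # Apple apple banana cherry

    # The same comparison without the pragma:
    print strcoll("Apple", "apple") < 0 ? "Apple first\n" : "apple first\n";
```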

=head2 Sort speed and memory use impacts

Comparing and sorting by locale is usually slower than the default
sorting; slowdowns of two to four times have been observed. It will
also consume more memory: once a Perl scalar variable has participated
in any string comparison or sorting operation obeying the locale
collation rules, it will take 3-15 times more memory than before. (The
exact multiplier depends on the string's contents, the operating system
and the locale.) These downsides are dictated more by the operating
system's implementation of the locale system than by Perl.
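
If the cost matters, one technique you might try, and should measure,
is to transform each string once with C<POSIX::strxfrm()> and then sort
the transformed keys with plain, non-locale C<cmp>. Comparing
C<strxfrm()> output byte by byte gives the same order as C<strcoll()>
on the original strings. The helper name C<locale_sort> is
hypothetical.

```perl
    use strict;
    use warnings;
    use POSIX qw(setlocale LC_COLLATE strxfrm);

    setlocale(LC_COLLATE, "C") or die "Cannot select the C locale";

    # Hypothetical helper: compute each strxfrm() key once, sort the keys
    # with ordinary byte-wise cmp (no "use locale" in this file), and
    # return the original strings in that order.
    sub locale_sort {
        return map  { $_->[1] }
               sort { $a->[0] cmp $b->[0] }
               map  { [ strxfrm($_), $_ ] } @_;
    }

    print join(" ", locale_sort(qw(pear Fig apple fig))), "\n";
    # In the "C" locale: Fig apple fig pear
```

Because the comparison happens outside C<use locale>, the original
scalars never take part in a locale-aware comparison, which is what the
text above ties the extra memory to; the keys are temporaries. Whether
this is actually faster depends on your data and platform.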

=head2 Freely available locale definitions

The Unicode CLDR project extracts the POSIX portion of many of its
locales, available at

    https://unicode.org/Public/cldr/2.0.1/

(Newer versions of CLDR require you to compute the POSIX data yourself.
See L<http://unicode.org/Public/cldr/latest/>.)

There is a large collection of locale definitions at:

    http://std.dkuug.dk/i18n/WG15-collection/locales/

You should be aware that this collection is unsupported, and is not
claimed to be fit for any purpose. If your system allows installation
of arbitrary locales, you may find the definitions useful as they are,
or as a basis for the development of your own locales.
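
After installing a locale, you might check that C<setlocale()> accepts
its name before relying on it. The C<locale_available> helper below is
a hypothetical sketch: it tries the name and then restores whatever
locale was in effect. The candidate names are only examples; exact
spellings such as C<"fr_FR.ISO8859-1"> vary between systems.

```perl
    use strict;
    use warnings;
    use POSIX qw(setlocale LC_ALL);

    # Hypothetical helper: does setlocale() accept this name here?
    # The previously active locale is restored before returning.
    sub locale_available {
        my ($name) = @_;
        my $saved = setlocale(LC_ALL);              # query only
        my $ok    = defined setlocale(LC_ALL, $name);
        setlocale(LC_ALL, $saved) if defined $saved;
        return $ok ? 1 : 0;
    }

    for my $name ("C", "en_US.UTF-8", "fr_FR.ISO8859-1") {
        printf "%-16s %s\n", $name, locale_available($name) ? "yes" : "no";
    }
```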

=head2 I18n and l10n

"Internationalization" is often abbreviated as B<i18n> because its first
and last letters are separated by eighteen others. (You may guess why
the internalin ... internaliti ... i18n tends to get abbreviated.) In
the same way, "localization" is often abbreviated to B<l10n>.
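
The abbreviation rule is simple enough to state as code. The
C<numeronym> sub below is a hypothetical illustration that keeps the
first and last letters and counts the ones in between.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: first letter, count of the letters between,
    # last letter. Words shorter than four letters are returned as-is
    # (an arbitrary choice).
    sub numeronym {
        my ($word) = @_;
        return $word if length($word) < 4;
        return substr($word, 0, 1) . (length($word) - 2) . substr($word, -1);
    }

    print numeronym("internationalization"), "\n";   # i18n
    print numeronym("localization"), "\n";           # l10n
```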

=head2 An imperfect standard

Internationalization, as defined in the C and POSIX standards, can be
criticized as incomplete and ungainly. The standards also tend, as
standards groups do, to divide the world into nations, when we all know
that the world can equally well be divided into bankers, bikers, gamers,
and so on.

=head1 Unicode and UTF-8

Unicode support was introduced in Perl v5.6, and more fully implemented
in v5.8 and later. See L<perluniintro>.

Starting in Perl v5.20, UTF-8 locales are supported in Perl, except
that C<LC_COLLATE> is only partially supported; collation support was
improved in Perl v5.26 to a level that may be sufficient for your needs
(see L</Category C<LC_COLLATE>: Collation: Text Comparisons and Sorting>).

If you have Perl v5.16 or v5.18 and can't upgrade, you can use

    use locale ':not_characters';

When this form of the pragma is used, only the non-character portions of
locales are used by Perl, for example C<LC_NUMERIC>. Perl assumes that
you have translated all the characters it is to operate on into Unicode
(actually the platform's native character set (ASCII or EBCDIC) plus
Unicode). For data in files, this can conveniently be done by also
specifying

    use open ':locale';

This pragma arranges for all inputs from files to be translated into
Unicode from the current locale as specified in the environment (see
L</ENVIRONMENT>), and all outputs to files to be translated back
into the locale. (See L<open>.) On a per-filehandle basis, you can
instead use the L<PerlIO::locale> module, or the L<Encode::Locale>
module, both available from CPAN. The latter module also has methods to
ease the handling of C<ARGV> and environment variables, and can be used
on individual strings. If you know that all your locales will be
UTF-8, as many are these days, you can use the L<B<-C>|perlrun/-C>
command line switch.

This form of the pragma allows essentially seamless handling of locales
with Unicode. Collation will be in Unicode code point order;
L<Unicode::Collate> can be used to collate according to Unicode rules.
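
If you would rather not install L<PerlIO::locale> or L<Encode::Locale>,
a rough stand-in for decoding individual strings can be built from core
modules: ask L<I18N::Langinfo> for the locale's C<CODESET> and hand that
name to L<Encode>. The C<decode_from_codeset> helper below is a
hypothetical sketch, not the interface of L<Encode::Locale>, and its
fallback to UTF-8 for names Encode doesn't recognize is an arbitrary
choice.

```perl
    use strict;
    use warnings;
    use locale ':not_characters';     # v5.16 or later
    use Encode qw(find_encoding);
    use I18N::Langinfo qw(langinfo CODESET);

    # Hypothetical helper: decode bytes from the named character set into
    # Perl characters, falling back to UTF-8 for unknown names.
    sub decode_from_codeset {
        my ($bytes, $codeset) = @_;
        my $enc = find_encoding($codeset) || find_encoding("UTF-8");
        return $enc->decode($bytes);
    }

    my $codeset = langinfo(CODESET);  # e.g. "UTF-8" or "ISO-8859-1"
    my $line    = decode_from_codeset("plain ASCII text", $codeset);
    print "codeset $codeset: $line\n";
```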

All the modules and switches just described can be used in v5.20 with
just plain C<use locale>, and, should the input locales not be UTF-8,
you'll get the less-than-ideal behavior, described below, that you get
with pre-v5.16 Perls, or when you use the locale pragma without the
C<:not_characters> parameter in v5.16 and v5.18. If you are using
exclusively UTF-8 locales in v5.20 and higher, the rest of this section
does not apply to you.

There are two cases, multi-byte and single-byte locales. First
multi-byte:

The only multi-byte (or wide character) locale that Perl is ever likely
to support is UTF-8. This is due to the difficulty of implementation,
the fact that high-quality UTF-8 locales are now published for every
area of the world (L<https://unicode.org/Public/cldr/2.0.1/> for
ones that are already set up, but from an earlier version;
L<https://unicode.org/Public/cldr/latest/> for the most up-to-date, but
you have to extract the POSIX information yourself), and the fact that,
failing all that, you can use the L<Encode> module to translate to and
from your locale. So, you'll have to do one of those things if you're
using one of these locales, such as Big5 or Shift JIS.

In Perls before v5.20, which lack full UTF-8 locale support, UTF-8
locales may still work reasonably well (depending on your C library
implementation) simply because both they and Perl store characters that
take up multiple bytes the same way. However, some, if not most, C
library implementations may not process the characters in the upper
half of the Latin-1 range (128 - 255) properly under C<LC_CTYPE>. To
see if a character is a particular type under a locale, Perl uses
functions like C<isalnum()>. Your C library may not work for UTF-8
locales with those functions, instead only working under the newer wide
library functions like C<iswalnum()>, which Perl does not use. These
multi-byte locales are treated like single-byte locales, and will have
the restrictions described below. Starting in Perl v5.22, a warning
message is raised when Perl detects a multi-byte locale that it doesn't
fully support.
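
For a multi-byte locale that Perl doesn't support, such as Shift JIS,
the L<Encode> route mentioned above amounts to decoding bytes into Perl
characters on input and encoding them again on output. A minimal sketch
follows; C<"shiftjis"> is the name used by L<Encode::JP>, and Big5 is
handled by L<Encode::TW> (for example as C<"big5-eten">).

```perl
    use strict;
    use warnings;
    use Encode qw(encode decode);

    # Characters in, Shift JIS bytes out, and back again.
    my $chars = "\x{3042}\x{3044}";          # HIRAGANA LETTER A, HIRAGANA LETTER I
    my $sjis  = encode("shiftjis", $chars);  # bytes as a Shift JIS locale uses them
    my $back  = decode("shiftjis", $sjis);

    printf "%d characters, %d bytes\n", length($back), length($sjis);
    # 2 characters, 4 bytes
```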

For single-byte locales, Perl generally takes the approach of using
locale rules on code points that can fit in a single byte, and Unicode
rules for those that can't (though this isn't uniformly applied; see the
note at the end of this section). This prevents many problems in locales
that aren't UTF-8. Suppose the locale is ISO8859-7, Greek. The character
at 0xD7 there is a capital Chi. But in the ISO8859-1 locale, Latin1, it
is a multiplication sign. The POSIX regular expression character class
C<[[:alpha:]]> will magically match 0xD7 in the Greek locale but not in
the Latin one.
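
One way to avoid the ambiguity is to decode the bytes with an explicit
character set before matching, after which each character has a single
Unicode meaning and C<\p{}> gives the expected answer. This sketch uses
L<Encode> and no locale at all.

```perl
    use strict;
    use warnings;
    use Encode qw(decode);

    my $greek = decode("iso-8859-7", "\xD7");   # GREEK CAPITAL LETTER CHI
    my $latin = decode("iso-8859-1", "\xD7");   # MULTIPLICATION SIGN

    for my $c ($greek, $latin) {
        printf "U+%04X alpha=%s\n", ord($c), ($c =~ /\p{Alpha}/ ? "yes" : "no");
    }
    # U+03A7 alpha=yes
    # U+00D7 alpha=no
```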

However, there are places where this breaks down. Certain Perl constructs are
for Unicode only, such as C<\p{Alpha}>. They assume that 0xD7 always has its
Unicode meaning (or the equivalent on EBCDIC platforms). Since Latin1 is a
subset of Unicode and 0xD7 is the multiplication sign in both Latin1 and
Unicode, C<\p{Alpha}> will never match it, regardless of locale. A similar
issue occurs with C<\N{...}>. Prior to v5.20, it is therefore a bad idea
to use C<\p{}> or C<\N{}> under plain C<use locale>, I<unless> you can
guarantee that the locale will be ISO8859-1. Use POSIX character classes
instead.

Another problem with this approach is that operations that cross the
single byte/multiple byte boundary are not well-defined, and so are
disallowed. (This boundary is between code points 255 and 256.)
For example, lowercasing LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178)
should return LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF). But in the
Greek locale, for example, there is no character at 0xFF, and Perl
has no way of knowing what the character at 0xFF is really supposed to
represent. Thus it disallows the operation. In this mode, the
lowercase of U+0178 is itself.
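
Outside the scope of C<use locale>, Perl applies Unicode rules, so the
same operation does cross the boundary. A minimal check:

```perl
    use strict;
    use warnings;

    my $capital = "\x{178}";    # LATIN CAPITAL LETTER Y WITH DIAERESIS
    my $small   = lc $capital;  # Unicode rules: no "use locale" here
    printf "U+%04X -> U+%04X\n", ord($capital), ord($small);
    # U+0178 -> U+00FF
```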

The same problems ensue if you enable automatic UTF-8-ification of your
standard file handles, default C<open()> layer, and C<@ARGV> on non-ISO8859-1,
non-UTF-8 locales (by using either the B<-C> command line switch or the
C<PERL_UNICODE> environment variable; see L<perlrun>).
Things are read in as UTF-8, which would normally imply a Unicode
interpretation, but the presence of a locale causes them to be interpreted
in that locale instead. For example, a 0xD7 code point in the Unicode
input, which should mean the multiplication sign, won't be interpreted by
Perl that way under the Greek locale. This is not a problem
I<provided> you make certain that all locales will always and only be either
ISO8859-1 or, if you don't have a deficient C library, UTF-8 locales.

Still another problem is that this approach can lead to two code
points meaning the same character. Thus in a Greek locale, both U+03A7
and U+00D7 are GREEK CAPITAL LETTER CHI.

Because of all these problems, starting in v5.22, Perl will raise a
warning if a multi-byte (hence Unicode) code point is used when a
single-byte locale is in effect. (It doesn't check for this, however,
where doing so would unreasonably slow execution down.)

Vendor locales are notoriously buggy, and it is difficult for Perl to test
its locale-handling code because this interacts with code that Perl has no
control over; therefore the locale-handling code in Perl may be buggy as
well. (However, the Unicode-supplied locales should be better, and
there is a feedback mechanism to correct any problems. See
L</Freely available locale definitions>.)

If you have Perl v5.16, the problems mentioned above go away if you use
the C<:not_characters> parameter to the locale pragma (except for vendor
bugs in the non-character portions). If you don't have v5.16, and you
I<do> have locales that work, using them may be worthwhile for certain
specific purposes, as long as you keep in mind the gotchas already
mentioned. For example, if the collation for your locales works, it
runs faster under locales than under L<Unicode::Collate>; and you gain
access to such things as the local currency symbol and the names of the
months and days of the week. (But to hammer home the point, in v5.16,
you get this access without the downsides of locales by using the
C<:not_characters> form of the pragma.)

Note: The policy of using locale rules for code points that can fit in a
byte, and Unicode rules for those that can't, is not uniformly applied.
Pre-v5.12, it was somewhat haphazard; in v5.12 it was applied fairly
consistently to regular expression matching except for bracketed
character classes; in v5.14 it was extended to all regex matches; and in
v5.16 to the casing operations such as C<\L> and C<uc()>. For
collation, in all releases so far, the system's C<strxfrm()> function is
called, and whatever it does is what you get. Starting in v5.26, various
bugs in the way perl uses this function were fixed.

=head1 BUGS

=head2 Collation of strings containing embedded C<NUL> characters

C<NUL> characters will sort the same as the lowest collating control
character does, or as C<"\001"> in the unlikely event that there are no
control characters at all in the locale. In cases where the strings
don't contain this non-C<NUL> control, the results will be correct, and
in many locales, this control, whatever it might be, will rarely be
encountered. But there are cases where a C<NUL> should sort before this
control, but doesn't. If two strings do collate identically, the one
containing the C<NUL> will sort earlier. Prior to 5.26, there were
more bugs.

=head2 Multi-threaded

XS code or C-language libraries called from it that use the system
L<C<setlocale(3)>> function (except on Windows) likely will not work
from a multi-threaded application without changes. See
L<perlxs/Locale-aware XS code>.

An XS module that is locale-dependent could have been written under the
assumption that it will never be called in a multi-threaded environment,
and so uses other non-locale constructs that aren't multi-thread-safe.
See L<perlxs/Thread-aware system interfaces>.

POSIX does not define a way to get the name of the current per-thread
locale. Some systems, such as Darwin and NetBSD, implement a function,
L<querylocale(3)>, that does this. On non-Windows systems without it,
such as Linux, there are some additional caveats:

=over

=item *

An embedded perl needs to be started up while the global locale is in
effect. See L<perlembed/Using embedded Perl with POSIX locales>.

=item *

It becomes more important for perl to know about all the possible
locale categories on the platform, even if they aren't apparently used
in your program. Perl knows all of the Linux ones. If your platform
has others, you can send email to L<mailto:perlbug@perl.org> for
inclusion in the next release. In the meantime, it is possible to
edit the Perl source to teach it about the category, and then recompile.
Search for instances of, say, C<LC_PAPER> in the source, and use that as
a template to add the omitted one.

=item *

It is possible, though hard to do, to call C<POSIX::setlocale> with a
locale that it doesn't recognize as syntactically legal, but actually is
legal on that system. This should happen only with embedded perls, or
if you hand-craft a locale name yourself.

=back

=head2 Broken systems

On certain systems, the operating system's locale support
is broken and cannot be fixed or used by Perl. Such deficiencies can
and will result in mysterious hangs and/or Perl core dumps when
C<use locale> is in effect. When confronted with such a system,
please report in excruciating detail to <F<perlbug@perl.org>>, and
also contact your vendor: bug fixes may exist for these problems
in your operating system. Sometimes such bug fixes are called an
operating system upgrade. If you have the source for Perl, include in
the perlbug email the output of the test described above in L</Testing
for broken locales>.

=head1 SEE ALSO

L<I18N::Langinfo>, L<perluniintro>, L<perlunicode>, L<open>,
L<POSIX/localeconv>,
L<POSIX/setlocale>, L<POSIX/strcoll>, L<POSIX/strftime>,
L<POSIX/strtod>, L<POSIX/strxfrm>.

For special considerations when Perl is embedded in a C program,
see L<perlembed/Using embedded Perl with POSIX locales>.

=head1 HISTORY

Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
Dunlop, assisted by the perl5-porters. Prose worked over a bit by
Tom Christiansen, and now maintained by Perl 5 porters.
	1740	=head1 HISTORY
	1741
	1742	Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
	1743	Dunlop, assisted by the perl5-porters. Prose worked over a bit by
	1744	Tom Christiansen, and now maintained by Perl 5 porters.