	1	=head1 NAME
	2
	3	perllocale - Perl locale handling (internationalization and localization)
	4
	5	=head1 DESCRIPTION
	6
	7	Perl supports language-specific notions of data such as "is this
	8	a letter", "what is the uppercase equivalent of this letter", and
	9	"which of these letters comes first". These are important issues,
	10	especially for languages other than English--but also for English: it
	11	would be naE<iuml>ve to imagine that C<A-Za-z> defines all the "letters"
	12	needed to write in English. Perl is also aware that some character other
	13	than '.' may be preferred as a decimal point, and that output date
	14	representations may be language-specific. The process of making an
	15	application take account of its users' preferences in such matters is
	16	called B<internationalization> (often abbreviated as B<i18n>); telling
	17	such an application about a particular set of preferences is known as
	18	B<localization> (B<l10n>).
	19
	20	Perl can understand language-specific data via the standardized (ISO C,
	21	XPG4, POSIX 1.c) method called "the locale system". The locale system is
	22	controlled per application using one pragma, one function call, and
	23	several environment variables.
	24
	25	B<NOTE>: This feature is new in Perl 5.004, and does not apply unless an
	26	application specifically requests it--see L<Backward compatibility>.
	27	The one exception is that write() now B<always> uses the current locale
	28	- see L<"NOTES">.
	29
	30	=head1 PREPARING TO USE LOCALES
	31
	32	If Perl applications are to understand and present your data
	33	correctly according to a locale of your choice, B<all> of the following
	34	must be true:
	35
	36	=over 4
	37
	38	=item *
	39
	40	B<Your operating system must support the locale system>. If it does,
	41	you should find that the setlocale() function is a documented part of
	42	its C library.
	43
	44	=item *
	45
	46	B<Definitions for locales that you use must be installed>. You, or
	47	your system administrator, must make sure that this is the case. The
	48	available locales, the location in which they are kept, and the manner
	49	in which they are installed all vary from system to system. Some systems
	50	provide only a few, hard-wired locales and do not allow more to be
	51	added. Others allow you to add "canned" locales provided by the system
	52	supplier. Still others allow you or the system administrator to define
	53	and add arbitrary locales. (You may have to ask your supplier to
	54	provide canned locales that are not delivered with your operating
	55	system.) Read your system documentation for further illumination.
	56
	57	=item *
	58
	59	B<Perl must believe that the locale system is supported>. If it does,
	60	C<perl -V:d_setlocale> will say that the value for C<d_setlocale> is
	61	C<define>.
	62
	63	=back
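The third condition can also be checked from inside a program. The following is a minimal sketch, assuming only the core Config module; the variable name C<$has_setlocale> is illustrative and not part of any API.

```perl
use strict;
use warnings;
use Config;

# $Config{d_setlocale} holds the same value that `perl -V:d_setlocale` reports.
my $has_setlocale =
    defined($Config{d_setlocale}) && $Config{d_setlocale} eq 'define';

print "Perl believes setlocale() is available: ",
    ($has_setlocale ? "yes" : "no"), "\n";
```

A "no" here means the remaining steps in this document will have no effect on that particular perl binary.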
	64
	65	If you want a Perl application to process and present your data
	66	according to a particular locale, the application code should include
	67	the S<C<use locale>> pragma (see L<The use locale pragma>) where
	68	appropriate, and B<at least one> of the following must be true:
	69
	70	=over 4
	71
	72	=item *
	73
	74	B<The locale-determining environment variables (see L<"ENVIRONMENT">)
	75	must be correctly set up> at the time the application is started, either
	76	by yourself or by whoever set up your system account.
	77
	78	=item *
	79
	80	B<The application must set its own locale> using the method described in
	81	L<The setlocale function>.
	82
	83	=back
	84
	85	=head1 USING LOCALES
	86
	87	=head2 The use locale pragma
	88
	89	By default, Perl ignores the current locale. The S<C<use locale>>
	90	pragma tells Perl to use the current locale for some operations:
	91
	92	=over 4
	93
	94	=item *
	95
	96	B<The comparison operators> (C<lt>, C<le>, C<cmp>, C<ge>, and C<gt>) and
	97	the POSIX string collation functions strcoll() and strxfrm() use
	98	C<LC_COLLATE>. sort() is also affected if used without an
	99	explicit comparison function, because it uses C<cmp> by default.
	100
	101	B<Note:> C<eq> and C<ne> are unaffected by locale: they always
	102	perform a char-by-char comparison of their scalar operands. What's
	103	more, if C<cmp> finds that its operands are equal according to the
	104	collation sequence specified by the current locale, it goes on to
	105	perform a char-by-char comparison, and only returns I<0> (equal) if the
	106	operands are char-for-char identical. If you really want to know whether
	107	two strings--which C<eq> and C<cmp> may consider different--are equal
	108	as far as collation in the locale is concerned, see the discussion in
	109	L<Category LC_COLLATE: Collation>.
	110
	111	=item *
	112
	113	B<Regular expressions and case-modification functions> (uc(), lc(),
	114	ucfirst(), and lcfirst()) use C<LC_CTYPE>.
	115
	116	=item *
	117
	118	B<The formatting functions> (printf(), sprintf() and write()) use
	119	C<LC_NUMERIC>.
	120
	121	=item *
	122
	123	B<The POSIX date formatting function> (strftime()) uses C<LC_TIME>.
	124
	125	=back
	126
	127	C<LC_COLLATE>, C<LC_CTYPE>, and so on, are discussed further in
	128	L<LOCALE CATEGORIES>.
	129
	130	The default behavior is restored with the S<C<no locale>> pragma, or
	131	upon reaching the end of the block enclosing C<use locale>.
	132
	133	The string result of any operation that uses locale
	134	information is tainted, as it is possible for a locale to be
	135	untrustworthy. See L<"SECURITY">.
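The lexical scoping of the pragma can be illustrated with a short sketch. It assumes nothing beyond core Perl; the word list and variable names are made up for the example, and the locale-aware order depends entirely on the locale in force when the program runs.

```perl
use strict;
use warnings;

my @words = qw(banana Apple cherry apple Banana);

# Outside any "use locale" scope: plain char-by-char ordering.
my @native = sort @words;

my @localized;
{
    use locale;                  # in effect only inside this block
    @localized = sort @words;    # the default cmp now consults LC_COLLATE
}

# Back outside the block, the locale is ignored again.
print "native:    @native\n";
print "localized: @localized\n";
```

In the "C" locale both lines are the same; in a dictionary-ordered locale the second line may interleave upper and lower case.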
	136
	137	=head2 The setlocale function
	138
	139	You can switch locales as often as you wish at run time with the
	140	POSIX::setlocale() function:
	141
	142	# This functionality not usable prior to Perl 5.004
	143	require 5.004;
	144
	145	# Import locale-handling tool set from POSIX module.
	146	# This example uses: setlocale -- the function call
	147	# LC_CTYPE -- explained below
	148	use POSIX qw(locale_h);
	149
	150	# query and save the old locale
	151	$old_locale = setlocale(LC_CTYPE);
	152
	153	setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
	154	# LC_CTYPE now in locale "French, Canada, codeset ISO 8859-1"
	155
	156	setlocale(LC_CTYPE, "");
	157	# LC_CTYPE now reset to default defined by LC_ALL/LC_CTYPE/LANG
	158	# environment variables. See below for documentation.
	159
	160	# restore the old locale
	161	setlocale(LC_CTYPE, $old_locale);
	162
	163	The first argument of setlocale() gives the B<category>, the second the
	164	B<locale>. The category tells in what aspect of data processing you
	165	want to apply locale-specific rules. Category names are discussed in
	166	L<LOCALE CATEGORIES> and L<"ENVIRONMENT">. The locale is the name of a
	167	collection of customization information corresponding to a particular
	168	combination of language, country or territory, and codeset. Read on for
	169	hints on the naming of locales: not all systems name locales as in the
	170	example.
	171
	172	If no second argument is provided and the category is something other
	173	than LC_ALL, the function returns a string naming the current locale
	174	for the category. You can use this value as the second argument in a
	175	subsequent call to setlocale().
	176
	177	If no second argument is provided and the category is LC_ALL, the
	178	result is implementation-dependent. It may be a string of
	179	concatenated locale names (separator also implementation-dependent)
	180	or a single locale name. Please consult your setlocale(3) man page for
	181	details.
	182
	183	If a second argument is given and it corresponds to a valid locale,
	184	the locale for the category is set to that value, and the function
	185	returns the now-current locale value. You can then use this in yet
	186	another call to setlocale(). (In some implementations, the return
	187	value may sometimes differ from the value you gave as the second
	188	argument--think of it as an alias for the value you gave.)
	189
	190	As the example shows, if the second argument is an empty string, the
	191	category's locale is returned to the default specified by the
	192	corresponding environment variables. Generally, this results in a
	193	return to the default that was in force when Perl started up: changes
	194	to the environment made by the application after startup may or may not
	195	be noticed, depending on your system's C library.
	196
	197	If the second argument does not correspond to a valid locale, the locale
	198	for the category is not changed, and the function returns I<undef>.
	199
	200	For further information about the categories, consult setlocale(3).
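Because setlocale() signals failure by returning undef, it is worth checking the result before relying on the new locale. The sketch below wraps the query, set, and restore steps in one place; C<with_ctype_locale> is a hypothetical helper name, not a POSIX function, and "C" is used only because it should exist everywhere.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Hypothetical helper: run $code with LC_CTYPE switched to $name, then
# restore the previous locale. Returns undef, without running $code,
# if the requested locale cannot be set.
sub with_ctype_locale {
    my ($name, $code) = @_;
    my $old = setlocale(LC_CTYPE);                  # query only
    defined(setlocale(LC_CTYPE, $name)) or return;  # undef means failure
    my $result = $code->();
    setlocale(LC_CTYPE, $old) if defined $old;      # restore
    return $result;
}

my $seen = with_ctype_locale("C", sub { setlocale(LC_CTYPE) });
print defined $seen ? "ran under $seen\n" : "locale not available\n";
```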
	201
	202	=head2 Finding locales
	203
	204	For locales available in your system, consult also setlocale(3) to
	205	see whether it leads to the list of available locales (search for the
	206	I<SEE ALSO> section). If that fails, try the following command lines:
	207
	208	locale -a
	209
	210	nlsinfo
	211
	212	ls /usr/lib/nls/loc
	213
	214	ls /usr/lib/locale
	215
	216	ls /usr/lib/nls
	217
	218	ls /usr/share/locale
	219
	220	and see whether they list something resembling these
	221
	222	en_US.ISO8859-1 de_DE.ISO8859-1 ru_RU.ISO8859-5
	223	en_US.iso88591 de_DE.iso88591 ru_RU.iso88595
	224	en_US de_DE ru_RU
	225	en de ru
	226	english german russian
	227	english.iso88591 german.iso88591 russian.iso88595
	228	english.roman8 russian.koi8r
	229
	230	Sadly, even though the calling interface for setlocale() has been
	231	standardized, names of locales and the directories where the
	232	configuration resides have not been. The basic form of the name is
	233	I<language_territory>B<.>I<codeset>, but the latter parts after
	234	I<language> are not always present. The I<language> and I<territory>
	235	parts usually come from the standards B<ISO 639> and B<ISO 3166>, the
	236	two-letter abbreviations for the languages and the countries of the
	237	world, respectively. The I<codeset> part often mentions some B<ISO
	238	8859> character set, the Latin codesets. For example, C<ISO 8859-1>
	239	is the so-called "Western European codeset" that can be used to encode
	240	most Western European languages adequately. Again, there are several
	241	ways to write even the name of that one standard. Lamentably.
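The basic name form can be taken apart with a regular expression. This is only a sketch of the I<language_territory>B<.>I<codeset> pattern described above: C<parse_locale_name> is a hypothetical helper, and real systems may use name forms it does not recognize.

```perl
use strict;
use warnings;

# Hypothetical helper: split a locale name of the basic form
# language[_territory][.codeset] into its parts.
sub parse_locale_name {
    my ($name) = @_;
    return unless $name =~ /\A([A-Za-z]+)(?:_([A-Za-z]+))?(?:\.(.+))?\z/;
    return { language => $1, territory => $2, codeset => $3 };
}

for my $name (qw(fr_CA.ISO8859-1 en_US english.roman8 C)) {
    my $parts = parse_locale_name($name) or next;
    printf "%-16s language=%s territory=%s codeset=%s\n", $name,
        map { defined $_ ? $_ : '(none)' }
            @{$parts}{qw(language territory codeset)};
}
```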
	242
	243	Two special locales are worth particular mention: "C" and "POSIX".
	244	Currently these are effectively the same locale: the difference is
	245	mainly that the first one is defined by the C standard, the second by
	246	the POSIX standard. They define the B<default locale> in which
	247	every program starts in the absence of locale information in its
	248	environment. (The I<default> default locale, if you will.) Its language
	249	is (American) English and its character codeset ASCII.
	250
	251	B<NOTE>: Not all systems have the "POSIX" locale (not all systems are
	252	POSIX-conformant), so use "C" when you need explicitly to specify this
	253	default locale.
	254
	255	=head2 LOCALE PROBLEMS
	256
	257	You may encounter the following warning message at Perl startup:
	258
	259	perl: warning: Setting locale failed.
	260	perl: warning: Please check that your locale settings:
	261	LC_ALL = "En_US",
	262	LANG = (unset)
	263	are supported and installed on your system.
	264	perl: warning: Falling back to the standard locale ("C").
	265
	266	This means that your locale settings had LC_ALL set to "En_US" and
	267	LANG was not set at all. Perl tried to believe you but could not.
	268	Instead, Perl gave up and fell back to the "C" locale, the default locale
	269	that is supposed to work no matter what. This usually means your locale
	270	settings were wrong, they mention locales your system has never heard
	271	of, or the locale installation in your system has problems (for example,
	272	some system files are broken or missing). There are quick and temporary
	273	fixes to these problems, as well as more thorough and lasting fixes.
	274
	275	=head2 Temporarily fixing locale problems
	276
	277	The two quickest fixes are either to render Perl silent about any
	278	locale inconsistencies or to run Perl under the default locale "C".
	279
	280	Perl's moaning about locale problems can be silenced by setting the
	281	environment variable PERL_BADLANG to a zero value, for example "0".
	282	This method really just sweeps the problem under the carpet: you tell
	283	Perl to shut up even when Perl sees that something is wrong. Do not
	284	be surprised if later something locale-dependent misbehaves.
	285
	286	Perl can be run under the "C" locale by setting the environment
	287	variable LC_ALL to "C". This method is perhaps a bit more civilized
	288	than the PERL_BADLANG approach, but setting LC_ALL (or
	289	other locale variables) may affect other programs as well, not just
	290	Perl. In particular, external programs run from within Perl will see
	291	these changes. If you make the new settings permanent (read on), all
	292	programs you run see the changes. See L<"ENVIRONMENT"> for
	293	the full list of relevant environment variables and L<USING LOCALES>
	294	for their effects in Perl. Effects in other programs are
	295	easily deducible. For example, the variable LC_COLLATE may well affect
	296	your B<sort> program (or whatever the program that arranges "records"
	297	alphabetically in your system is called).
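The point that external programs inherit these settings can be seen from within Perl itself. The sketch below is an illustration only: it sets LC_ALL for the duration of one block with C<local>, starts a child perl (via C<$^X>) that simply echoes the variable, and lets the old value come back when the block ends.

```perl
use strict;
use warnings;

my $child_value;
{
    # The change is visible to child processes started inside this block
    # and is undone automatically when the block exits.
    local $ENV{LC_ALL} = "C";

    open(my $child, '-|', $^X, '-e', 'print $ENV{LC_ALL}')
        or die "cannot start child perl: $!";
    $child_value = <$child>;
    close $child;
}

print "child saw LC_ALL=$child_value\n";
```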
	298
	299	You can test out changing these variables temporarily, and if the
	300	new settings seem to help, put those settings into your shell startup
	301	files. Consult your local documentation for the exact details. In
	302	Bourne-like shells (B<sh>, B<ksh>, B<bash>, B<zsh>):
	303
	304	LC_ALL=en_US.ISO8859-1
	305	export LC_ALL
	306
	307	This assumes that we saw the locale "en_US.ISO8859-1" using the commands
	308	discussed above and decided to try it instead of the faulty locale
	309	"En_US". In csh-like shells (B<csh>, B<tcsh>):
	310
	311	setenv LC_ALL en_US.ISO8859-1
	312
	313	or, if you have the "env" application, you can use the following in any shell:
	314
	315	env LC_ALL=en_US.ISO8859-1 perl ...
	316
	317	If you do not know what shell you have, consult your local
	318	helpdesk or the equivalent.
	319
	320	=head2 Permanently fixing locale problems
	321
	322	The slower but superior fix is to correct the misconfiguration of your
	323	own environment variables, which you may be able to do yourself. Fixing
	324	the mis(sing)configuration of the whole system's locales usually requires
	325	the help of your friendly system administrator.
	326
	327	First, see earlier in this document about L<Finding locales>. That tells
	328	how to find which locales are really supported--and more importantly,
	329	installed--on your system. In our example error message, environment
	330	variables affecting the locale are listed in the order of decreasing
	331	importance (and unset variables do not matter). Therefore, having
	332	LC_ALL set to "En_US" must have been the bad choice, as shown by the
	333	error message. First try fixing locale settings listed first.
	334
	335	Second, if using the listed commands you see something B<exactly>
	336	(prefix matches do not count and case usually counts) like "En_US"
	337	without the quotes, then you should be okay because you are using a
	338	locale name that should be installed and available in your system.
	339	In this case, see L<Permanently fixing your system's locale configuration>.
	340
	341	=head2 Permanently fixing your system's locale configuration
	342
	343	This is when you see something like:
	344
	345	perl: warning: Please check that your locale settings:
	346	LC_ALL = "En_US",
	347	LANG = (unset)
	348	are supported and installed on your system.
	349
	350	but then cannot see that "En_US" listed by the above-mentioned
	351	commands. You may see things like "en_US.ISO8859-1", but that isn't
	352	the same. In this case, try running under a locale
	353	that you can list and which somehow matches what you tried. The
	354	rules for matching locale names are a bit vague because
	355	standardization is weak in this area. See L<Finding locales> again for
	356	the general rules.
	357
	358	=head2 Fixing system locale configuration
	359
	360	Contact a system administrator (preferably your own) and report the exact
	361	error message you get, and ask them to read this same documentation you
	362	are now reading. They should be able to check whether there is something
	363	wrong with the locale configuration of the system. The L<Finding locales>
	364	section is unfortunately a bit vague about the exact commands and places
	365	because these things are not that standardized.
	366
	367	=head2 The localeconv function
	368
	369	The POSIX::localeconv() function allows you to get particulars of the
	370	locale-dependent numeric formatting information specified by the current
	371	C<LC_NUMERIC> and C<LC_MONETARY> locales. (If you just want the name of
	372	the current locale for a particular category, use POSIX::setlocale()
	373	with a single parameter--see L<The setlocale function>.)
	374
	375	use POSIX qw(locale_h);
	376
	377	# Get a reference to a hash of locale-dependent info
	378	$locale_values = localeconv();
	379
	380	# Output sorted list of the values
	381	for (sort keys %$locale_values) {
	382	printf "%-20s = %s\n", $_, $locale_values->{$_}
	383	}
	384
	385	localeconv() takes no arguments, and returns B<a reference to> a hash.
	386	The keys of this hash are variable names for formatting, such as
	387	C<decimal_point> and C<thousands_sep>. The values are the
	388	corresponding, er, values. See L<POSIX/localeconv> for a longer
	389	example listing the categories an implementation might be expected to
	390	provide; some provide more and others fewer. You don't need an
	391	explicit C<use locale>, because localeconv() always observes the
	392	current locale.
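Individual entries can be pulled out of the returned hash directly. The following sketch pins C<LC_NUMERIC> to "C" purely so that its output is predictable; the fallback values are an assumption made for the example, since an implementation may leave out entries that are empty.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);

# Pin LC_NUMERIC so the result is predictable; remove this line to
# inspect whatever locale the environment selected.
setlocale(LC_NUMERIC, "C");

my $lconv = localeconv();

# Entries may be missing or empty, so supply explicit fallbacks.
my $dp = $lconv->{decimal_point};
my $decimal_point = (defined($dp) && length($dp)) ? $dp : '.';

my $ts = $lconv->{thousands_sep};
my $thousands_sep = defined($ts) ? $ts : '';

print "decimal point: '$decimal_point', ",
      "thousands separator: '$thousands_sep'\n";
```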
	393
	394	Here's a simple-minded example program that rewrites its command-line
	395	parameters as integers correctly formatted in the current locale:
	396
	397	    # See comments in previous example
	398	    require 5.004;
	399	    use POSIX qw(locale_h);
	400
	401	    # Get some of locale's numeric formatting parameters
	402	    my ($thousands_sep, $grouping) =
	403	        @{localeconv()}{'thousands_sep', 'grouping'};
	404
	405	    # Apply defaults if values are missing
	406	    $thousands_sep = ',' unless $thousands_sep;
	407
	408	    # grouping and mon_grouping are packed lists
	409	    # of small integers (characters) telling the
	410	    # grouping (thousands_sep and mon_thousands_sep
	411	    # being the group dividers) of numbers and
	412	    # monetary quantities. The integers' meanings:
	413	    # 255 means no more grouping, 0 means repeat
	414	    # the previous grouping, 1-254 means use that
	415	    # as the current grouping. Grouping goes from
	416	    # right to left (low to high digits). In the
	417	    # below we cheat slightly by never using anything
	418	    # other than the first grouping (whatever that is).
	419	    if ($grouping) {
	420	        @grouping = unpack("C*", $grouping);
	421	    } else {
	422	        @grouping = (3);
	423	    }
	424
	425	    # Format command line params for current locale
	426	    for (@ARGV) {
	427	        $_ = int; # Chop non-integer part
	428	        1 while
	429	            s/(\d)(\d{$grouping[0]}($|\Q$thousands_sep\E))/$1$thousands_sep$2/;
	430	        print "$_";
	431	    }
	432	    print "\n";
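The example above uses only the first grouping value. For completeness, here is a sketch of how the whole grouping list could be honored, following the meanings given in the comments above (255 stops grouping, 0 repeats the previous size). C<group_digits> is a hypothetical helper written for this illustration; it takes the separator and the unpacked grouping list as explicit arguments so that it does not depend on any installed locale.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $sep into a string of digits, working from
# right to left through the group sizes in @grouping.
sub group_digits {
    my ($digits, $sep, @grouping) = @_;
    my $size = 0;    # current group size; 0 means "no grouping"
    my @chunks;
    while (length $digits) {
        if (@grouping) {
            my $g = shift @grouping;
            if    ($g == 255) { $size = 0; @grouping = () }  # stop grouping
            elsif ($g == 0)   { @grouping = () }             # repeat previous size
            else              { $size = $g }
        }
        # When the list runs out, the last size keeps being reused.
        last if $size == 0 || length($digits) <= $size;
        unshift @chunks, substr($digits, -$size, $size, '');
    }
    unshift @chunks, $digits if length $digits;
    return join $sep, @chunks;
}

print group_digits("1234567", ",", 3), "\n";
print group_digits("1234567", ",", 3, 2), "\n";
```

With a real locale, the arguments would come from localeconv(): the separator from C<thousands_sep> and the list from C<unpack("C*", $grouping)>.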
	433
	434	=head2 I18N::Langinfo
	435
	436	Another interface for querying locale-dependent information is the
	437	I18N::Langinfo::langinfo() function, available at least in UNIX-like
	438	systems and VMS.
	439
	440	The following example will import the langinfo() function itself and
	441	three constants to be used as arguments to langinfo(): a constant for
	442	the abbreviated first day of the week (the numbering starts from
	443	Sunday = 1) and two more constants for the affirmative and negative
	444	answers for a yes/no question in the current locale.
	445
	446	use I18N::Langinfo qw(langinfo ABDAY_1 YESSTR NOSTR);
	447
	448	my ($abday_1, $yesstr, $nostr) = map { langinfo($_) } (ABDAY_1, YESSTR, NOSTR);
	449
	450	print "$abday_1? [$yesstr/$nostr] ";
	451
	452	In other words, in the "C" (or English) locale the above will probably
	453	print something like:
	454
	455	Sun? [yes/no]
	456
	457	See L<I18N::Langinfo> for more information.
	458
	459	=head1 LOCALE CATEGORIES
	460
	461	The following subsections describe basic locale categories. Beyond these,
	462	some combination categories allow manipulation of more than one
	463	basic category at a time. See L<"ENVIRONMENT"> for a discussion of these.
	464
	465	=head2 Category LC_COLLATE: Collation
	466
	467	In the scope of S<C<use locale>>, Perl looks to the C<LC_COLLATE>
	468	environment variable to determine the application's notions on collation
	469	(ordering) of characters. For example, 'b' follows 'a' in Latin
	470	alphabets, but where do 'E<aacute>' and 'E<aring>' belong? And while
	471	'color' follows 'chocolate' in English, what about in Spanish?
	472
	473	The following collations all make sense and you may meet any of them
	474	if you "use locale".
	475
	476	A B C D E a b c d e
	477	A a B b C c D d E e
	478	a A b B c C d D e E
	479	a b c d e A B C D E
	480
	481	Here is a code snippet to tell what "word"
	482	characters are in the current locale, in that locale's order:
	483
	484	use locale;
	485	print +(sort grep /\w/, map { chr } 0..255), "\n";
	486
	487	Compare this with the characters that you see and their order if you
	488	state explicitly that the locale should be ignored:
	489
	490	no locale;
	491	print +(sort grep /\w/, map { chr } 0..255), "\n";
	492
	493	This machine-native collation (which is what you get unless S<C<use
	494	locale>> has appeared earlier in the same block) must be used for
	495	sorting raw binary data, whereas the locale-dependent collation of the
	496	first example is useful for natural text.
	497
	498	As noted in L<USING LOCALES>, C<cmp> compares according to the current
	499	collation locale when C<use locale> is in effect, but falls back to a
	500	char-by-char comparison for strings that the locale says are equal. You
	501	can use POSIX::strcoll() if you don't want this fall-back:
	502
	503	use POSIX qw(strcoll);
	504	$equal_in_locale =
	505	!strcoll("space and case ignored", "SpaceAndCaseIgnored");
	506
	507	$equal_in_locale will be true if the collation locale specifies a
	508	dictionary-like ordering that ignores space characters completely and
	509	which folds case.
	510
	511	If you have a single string that you want to check for "equality in
	512	locale" against several others, you might think you could gain a little
	513	efficiency by using POSIX::strxfrm() in conjunction with C<eq>:
	514
	515	use POSIX qw(strxfrm);
	516	$xfrm_string = strxfrm("Mixed-case string");
	517	print "locale collation ignores spaces\n"
	518	if $xfrm_string eq strxfrm("Mixed-casestring");
	519	print "locale collation ignores hyphens\n"
	520	if $xfrm_string eq strxfrm("Mixedcase string");
	521	print "locale collation ignores case\n"
	522	if $xfrm_string eq strxfrm("mixed-case string");
	523
	524	strxfrm() takes a string and maps it into a transformed string for use
	525	in char-by-char comparisons against other transformed strings during
	526	collation. "Under the hood", locale-affected Perl comparison operators
	527	call strxfrm() for both operands, then do a char-by-char
	528	comparison of the transformed strings. By calling strxfrm() explicitly
	529	and using a non locale-affected comparison, the example attempts to save
	530	a couple of transformations. But in fact, it doesn't save anything: Perl
	531	magic (see L<perlguts/Magic Variables>) creates the transformed version of a
	532	string the first time it's needed in a comparison, then keeps this version around
	533	in case it's needed again. An example rewritten the easy way with
	534	C<cmp> runs just about as fast. It also copes with null characters
	535	embedded in strings; if you call strxfrm() directly, it treats the first
	536	null it finds as a terminator. Don't expect the transformed strings
	537	it produces to be portable across systems--or even from one revision
	538	of your operating system to the next. In short, don't call strxfrm()
	539	directly: let Perl do it for you.
	540
	541	Note: C<use locale> isn't shown in some of these examples because it isn't
	542	needed: strcoll() and strxfrm() exist only to generate locale-dependent
	543	results, and so always obey the current C<LC_COLLATE> locale.
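As a small illustration of the point that strcoll() needs no C<use locale>, the sketch below sorts with an explicit strcoll() comparison. C<LC_COLLATE> is set to "C" only to make the result predictable, and C<equal_in_locale> is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strcoll);

# Pin the collation locale so the result is predictable.
setlocale(LC_COLLATE, "C");

# Hypothetical helper: true if the locale's collation treats the two
# strings as equal (no char-by-char fall-back, unlike cmp).
sub equal_in_locale {
    my ($left, $right) = @_;
    return strcoll($left, $right) == 0;
}

my @sorted = sort { strcoll($a, $b) } qw(banana Apple apple Banana);
print "@sorted\n";
print "apple and Apple collate ",
    (equal_in_locale("apple", "Apple") ? "equal" : "differently"), "\n";
```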
	544
	545	=head2 Category LC_CTYPE: Character Types
	546
	547	In the scope of S<C<use locale>>, Perl obeys the C<LC_CTYPE> locale
	548	setting. This controls the application's notion of which characters are
	549	alphabetic. This affects Perl's C<\w> regular expression metanotation,
	550	which stands for alphanumeric characters--that is, alphabetic and
	551	numeric characters plus the underscore, but not the hyphen.
	552	(Consult L<perlre> for more information about
	553	regular expressions.) Thanks to C<LC_CTYPE>, depending on your locale
	554	setting, characters like 'E<aelig>', 'E<eth>', 'E<szlig>', and
	555	'E<oslash>' may be understood as C<\w> characters.
	556
	557	The C<LC_CTYPE> locale also provides the map used in transliterating
	558	characters between lower and uppercase. This affects the case-mapping
	559	functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping
	560	interpolation with C<\l>, C<\L>, C<\u>, or C<\U> in double-quoted strings
	561	and C<s///> substitutions; and case-independent regular expression
	562	pattern matching using the C<i> modifier.
	563
	564	Finally, C<LC_CTYPE> affects the POSIX character-class test
	565	functions--isalpha(), islower(), and so on. For example, if you move
	566	from the "C" locale to a 7-bit Scandinavian one, you may find--possibly
	567	to your surprise--that "|" moves from the ispunct() class to isalpha().
	568
	569	B<Note:> A broken or malicious C<LC_CTYPE> locale definition may result
	570	in clearly ineligible characters being considered to be alphanumeric by
	571	your application. For strict matching of (mundane) letters and
	572	digits--for example, in command strings--locale-aware applications
	573	should use C<\w> inside a C<no locale> block. See L<"SECURITY">.
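One way to do such strict matching is sketched below. Note that it swaps in an explicit character class instead of C<\w>, as a belt-and-braces alternative to relying on C<no locale> alone; C<is_plain_word> is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only mundane ASCII letters, digits, and
# underscore, whatever the current LC_CTYPE locale claims.
sub is_plain_word {
    my ($text) = @_;
    no locale;    # make sure no enclosing "use locale" applies here
    return $text =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0;
}

for my $candidate ("report_2024", "rm|ls", "") {
    printf "%-12s %s\n", "'$candidate'",
        is_plain_word($candidate) ? "accepted" : "rejected";
}
```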
	574
	575	=head2 Category LC_NUMERIC: Numeric Formatting
	576
	577	In the scope of S<C<use locale>>, Perl obeys the C<LC_NUMERIC> locale
	578	information, which controls an application's idea of how numbers should
	579	be formatted for human readability by the printf(), sprintf(), and
	580	write() functions. String-to-numeric conversion by the POSIX::strtod()
	581	function is also affected. In most implementations the only effect is to
	582	change the character used for the decimal point--perhaps from '.' to ','.
	583	These functions aren't aware of such niceties as thousands separation and
	584	so on. (See L<The localeconv function> if you care about these things.)
	585
	586	Output produced by print() is also affected by the current locale, but
	587	only when C<use locale> is in effect; under C<no locale> it
	588	corresponds to what you'd get from printf() in the "C" locale. The
	589	same is true for Perl's internal conversions between numeric and
	590	string formats:
	591
	592	use POSIX qw(strtod);
	593	use locale;
	594
	595	$n = 5/2; # Assign numeric 2.5 to $n
	596
	597	$a = " $n"; # Locale-dependent conversion to string
	598
	599	print "half five is $n\n"; # Locale-dependent output
	600
	601	printf "half five is %g\n", $n; # Locale-dependent output
	602
	603	print "DECIMAL POINT IS COMMA\n"
	604	if $n == (strtod("2,5"))[0]; # Locale-dependent conversion
	605
	606	See also L<I18N::Langinfo> and C<RADIXCHAR>.
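A minimal sketch of the C<RADIXCHAR> query follows. It assumes that I18N::Langinfo is available on your platform, and it sets C<LC_NUMERIC> to "C" only so that the answer is predictable.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);
use I18N::Langinfo qw(langinfo RADIXCHAR);

# Pin LC_NUMERIC so the result is predictable.
setlocale(LC_NUMERIC, "C");

# RADIXCHAR is the decimal point character of the numeric locale.
my $radix = langinfo(RADIXCHAR);
print "decimal point character: '$radix'\n";
```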
	607
	608	=head2 Category LC_MONETARY: Formatting of monetary amounts
	609
	610	The C standard defines the C<LC_MONETARY> category, but no function
	611	that is affected by its contents. (Those with experience of standards
	612	committees will recognize that the working group decided to punt on the
	613	issue.) Consequently, Perl takes no notice of it. If you really want
	614	to use C<LC_MONETARY>, you can query its contents--see
	615	L<The localeconv function>--and use the information that it returns in your
	616	application's own formatting of currency amounts. However, you may well
	617	find that the information, voluminous and complex though it may be, still
	618	does not quite meet your requirements: currency formatting is a hard nut
	619	to crack.
	620
	621	See also L<I18N::Langinfo> and C<CRNCYSTR>.
	622
	623	=head2 Category LC_TIME: Date Formatting
	624
	625	Output produced by POSIX::strftime(), which builds a formatted
	626	human-readable date/time string, is affected by the current C<LC_TIME>
	627	locale. Thus, in a French locale, the output produced by the C<%B>
	628	format element (full month name) for the first month of the year would
	629	be "janvier". Here's how to get a list of long month names in the
	630	current locale:
	631
	632	use POSIX qw(strftime);
	633	for (0..11) {
	634	$long_month_name[$_] =
	635	strftime("%B", 0, 0, 0, 1, $_, 96);
	636	}
	637
	638	Note: C<use locale> isn't needed in this example: as a function that
	639	exists only to generate locale-dependent results, strftime() always
	640	obeys the current C<LC_TIME> locale.
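To collect the names for a locale other than the current one, the switch and restore steps can be wrapped together. This is a sketch under the assumption that the requested locale is installed; C<month_names_in> is a hypothetical helper, and "C" is used in the demonstration because it should always be present.

```perl
use strict;
use warnings;
use POSIX qw(locale_h strftime);

# Hypothetical helper: return the twelve long month names for $locale,
# or an empty list if that locale cannot be set.
sub month_names_in {
    my ($locale) = @_;
    my $old = setlocale(LC_TIME);                    # remember current locale
    defined(setlocale(LC_TIME, $locale)) or return;
    my @names = map { strftime("%B", 0, 0, 0, 1, $_, 96) } 0 .. 11;
    setlocale(LC_TIME, $old) if defined $old;        # restore
    return @names;
}

my @months = month_names_in("C");
print join(", ", @months), "\n";
```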
	641
	642	See also L<I18N::Langinfo> and C<ABDAY_1>..C<ABDAY_7>, C<DAY_1>..C<DAY_7>,
	643	C<ABMON_1>..C<ABMON_12>, and C<MON_1>..C<MON_12>.
	644
	645	=head2 Other categories
	646
	647	The remaining locale category, C<LC_MESSAGES> (possibly supplemented
	648	by others in particular implementations) is not currently used by
	649	Perl--except possibly to affect the behavior of library functions
	650	called by extensions outside the standard Perl distribution and by the
	651	operating system and its utilities. Note especially that the string
	652	value of C<$!> and the error messages given by external utilities may
	653	be changed by C<LC_MESSAGES>. If you want to have portable error
	654	codes, use C<%!>. See L<Errno>.
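The difference between the message text and the portable code can be sketched as follows. The path is made up for the example and is assumed not to exist.

```perl
use strict;
use warnings;
use Errno;

# A path assumed not to exist, used only to provoke an error.
my $path = '/nonexistent-dir-for-illustration/file.txt';

my $missing = 0;
if (!open(my $fh, '<', $path)) {
    # The text of "$!" may vary with LC_MESSAGES; $!{ENOENT} does not.
    $missing = $!{ENOENT} ? 1 : 0;
    print "open failed; ENOENT is ", ($missing ? "set" : "not set"), "\n";
}
```

Testing C<$!{ENOENT}> keeps the program's logic independent of the language in which the system reports the error.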
	655
	656	=head1 SECURITY
	657
	658	Although the main discussion of Perl security issues can be found in
	659	L<perlsec>, a discussion of Perl's locale handling would be incomplete
	660	if it did not draw your attention to locale-dependent security issues.
	661	Locales--particularly on systems that allow unprivileged users to
	662	build their own locales--are untrustworthy. A malicious (or just plain
	663	broken) locale can make a locale-aware application give unexpected
	664	results. Here are a few possibilities:
	665
	666	=over 4
	667
	668	=item *
	669
	670	Regular expression checks for safe file names or mail addresses using
	671	C<\w> may be spoofed by an C<LC_CTYPE> locale that claims that
	672	characters such as "E<gt>" and "|" are alphanumeric.
	673
	674	=item *
	675
	676	String interpolation with case-mapping, as in, say, C<$dest =
	677	"C:\U$name.$ext">, may produce dangerous results if a bogus LC_CTYPE
	678	case-mapping table is in effect.
	679
	680	=item *
	681
	682	A sneaky C<LC_COLLATE> locale could result in the names of students with
	683	"D" grades appearing ahead of those with "A"s.
	684
	685	=item *
	686
	687	An application that takes the trouble to use information in
	688	C<LC_MONETARY> may format debits as if they were credits and vice versa
	689	if that locale has been subverted. Or it might make payments in US
	690	dollars instead of Hong Kong dollars.
	691
	692	=item *
	693
	694	The date and day names in dates formatted by strftime() could be
	695	manipulated to advantage by a malicious user able to subvert the
	696	C<LC_TIME> locale. ("Look--it says I wasn't in the building on
	697	Sunday.")
	698
	699	=back
	700
	701	Such dangers are not peculiar to the locale system: any aspect of an
	702	application's environment which may be modified maliciously presents
	703	similar challenges. Similarly, they are not specific to Perl: any
	704	programming language that allows you to write programs that take
	705	account of their environment exposes you to these issues.
	706
	707	Perl cannot protect you from all possibilities shown in the
	708	examples--there is no substitute for your own vigilance--but, when
	709	C<use locale> is in effect, Perl uses the tainting mechanism (see
	710	L<perlsec>) to mark string results that become locale-dependent, and
	711	which may be untrustworthy in consequence. Here is a summary of the
	712	tainting behavior of operators and functions that may be affected by
	713	the locale:
	714
	715	=over 4
	716
	717	=item *
	718
	719	B<Comparison operators> (C<lt>, C<le>, C<ge>, C<gt> and C<cmp>):
	720
	721	Scalar true/false (or less/equal/greater) result is never tainted.
	722
	723	=item *
	724
	725	B<Case-mapping interpolation> (with C<\l>, C<\L>, C<\u> or C<\U>)
	726
	727	Result string containing interpolated material is tainted if
	728	C<use locale> is in effect.
	729
	730	=item *
	731
	732	B<Matching operator> (C<m//>):
	733
	734	Scalar true/false result never tainted.
	735
	736	Subpatterns, either delivered as a list-context result or as $1 etc.
	737	are tainted if C<use locale> is in effect, and the subpattern regular
	738	expression contains C<\w> (to match an alphanumeric character), C<\W>
	739	(non-alphanumeric character), C<\s> (whitespace character), or C<\S>
	740	(non-whitespace character). The match variables $& (matched text), $`
	741	(pre-match), $' (post-match), and $+ (last match) are also tainted if
	742	C<use locale> is in effect and the regular expression contains C<\w>,
	743	C<\W>, C<\s>, or C<\S>.
	744
	745	=item *
	746
	747	B<Substitution operator> (C<s///>):
	748
	749	Has the same behavior as the match operator. Also, the left
	750	operand of C<=~> becomes tainted when C<use locale> is in effect
	751	if modified as a result of a substitution based on a regular
	752	expression match involving C<\w>, C<\W>, C<\s>, or C<\S>; or of
	753	case-mapping with C<\l>, C<\L>, C<\u> or C<\U>.
	754
	755	=item *
	756
	757	B<Output formatting functions> (printf() and write()):
	758
	759	Results are never tainted; otherwise even output from print,
	760	for example C<print(1/7)>, would have to be tainted when C<use locale>
	761	is in effect.
	762
	763	=item *
	764
	765	B<Case-mapping functions> (lc(), lcfirst(), uc(), ucfirst()):
	766
	767	Results are tainted if C<use locale> is in effect.
	768
	769	=item *
	770
	771	B<POSIX locale-dependent functions> (localeconv(), strcoll(),
	772	strftime(), strxfrm()):
	773
	774	Results are never tainted.
	775
	776	=item *
	777
	778	B<POSIX character class tests> (isalnum(), isalpha(), isdigit(),
	779	isgraph(), islower(), isprint(), ispunct(), isspace(), isupper(),
	780	isxdigit()):
	781
	782	True/false results are never tainted.
	783
	784	=back
	785
	786	Three examples illustrate locale-dependent tainting.
	787	The first program, which ignores its locale, won't run: a value taken
	788	directly from the command line may not be used to name an output file
	789	when taint checks are enabled.
	790
	791	    #!/usr/local/bin/perl -T
	792	    # Run with taint checking
	793
	794	    # Command line sanity check omitted...
	795	    $tainted_output_file = shift;
	796
	797	    open(F, ">$tainted_output_file")
	798	        or warn "Open of $tainted_output_file failed: $!\n";
	799
	800	The program can be made to run by "laundering" the tainted value through
	801	a regular expression: the second example--which still ignores locale
	802	information--runs, creating the file named on its command line
	803	if it can.
	804
	805	    #!/usr/local/bin/perl -T
	806
	807	    $tainted_output_file = shift;
	808	    $tainted_output_file =~ m%[\w/]+%;
	809	    $untainted_output_file = $&;
	810
	811	    open(F, ">$untainted_output_file")
	812	        or warn "Open of $untainted_output_file failed: $!\n";
	813
	814	Compare this with a similar but locale-aware program:
	815
	816	    #!/usr/local/bin/perl -T
	817
	818	    $tainted_output_file = shift;
	819	    use locale;
	820	    $tainted_output_file =~ m%[\w/]+%;
	821	    $localized_output_file = $&;
	822
	823	    open(F, ">$localized_output_file")
	824	        or warn "Open of $localized_output_file failed: $!\n";
	825
	826	This third program fails to run because $& is tainted: it is the result
	827	of a match involving C<\w> while C<use locale> is in effect.
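
A stricter way to launder a file name is to avoid C<\w> altogether.
The following is only a sketch: the helper name C<launder_filename> and
the set of permitted characters are assumptions made for illustration,
not part of Perl. It anchors the pattern, uses explicit ASCII ranges,
and takes the capture from C<$1> only when the match succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): returns the validated
# name, or undef if the input contains anything outside the explicit
# ASCII character list below.
sub launder_filename {
    my ($candidate) = @_;
    no locale;    # make sure no enclosing "use locale" applies here

    # Explicit ranges instead of \w, anchored so that the whole string
    # must match. The permitted set is an assumption for this sketch;
    # note that it still allows "..", which a real check should reject.
    if (defined $candidate && $candidate =~ m{\A([A-Za-z0-9_./-]+)\z}) {
        return $1;
    }
    return;
}

my $argument = @ARGV ? $ARGV[0] : 'output.txt';
my $name     = launder_filename($argument);
print defined $name ? "Would open: $name\n" : "Rejected file name\n";
```

Because the pattern does not use C<\w>, C<\W>, C<\s>, or C<\S>, the
check does not depend on what the C<LC_CTYPE> locale claims is
alphanumeric.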
	828
	829	=head1 ENVIRONMENT
	830
	831	=over 12
	832
	833	=item PERL_BADLANG
	834
	835	A string that can suppress Perl's warning about failed locale settings
	836	at startup. Failure can occur if the locale support in the operating
	837	system is lacking (broken) in some way--or if you mistyped the name of
	838	a locale when you set up your environment. If this environment
	839	variable is absent, or has a value that does not evaluate to integer
	840	zero--that is, "0" or ""--Perl will complain about locale setting
	841	failures.
	842
	843	B<NOTE>: PERL_BADLANG only gives you a way to hide the warning message.
	844	The message tells about some problem in your system's locale support,
	845	and you should investigate what the problem is.
	846
	847	=back
	848
	849	The following environment variables are not specific to Perl: They are
	850	part of the standardized (ISO C, XPG4, POSIX 1.c) setlocale() method
	851	for controlling an application's opinion on data.
	852
	853	=over 12
	854
	855	=item LC_ALL
	856
	857	C<LC_ALL> is the "override-all" locale environment variable. If
	858	set, it overrides all the rest of the locale environment variables.
	859
	860	=item LANGUAGE
	861
	862	B<NOTE>: C<LANGUAGE> is a GNU extension; it affects you only if you
	863	are using the GNU libc, as is the case on Linux, for example.
	864	If you are using a "commercial" UNIX, you are most probably I<not>
	865	using GNU libc, and you can ignore C<LANGUAGE>.
	866
	867	However, if you are using C<LANGUAGE>: it affects the
	868	language of informational, warning, and error messages output by
	869	commands (in other words, it's like C<LC_MESSAGES>) but it has higher
	870	priority than C<LC_ALL>. Moreover, it's not a single value but
	871	instead a "path" (":"-separated list) of I<languages> (not locales).
	872	See the GNU C<gettext> library documentation for more information.
	873
	874	=item LC_CTYPE
	875
	876	In the absence of C<LC_ALL>, C<LC_CTYPE> chooses the character type
	877	locale. In the absence of both C<LC_ALL> and C<LC_CTYPE>, C<LANG>
	878	chooses the character type locale.
	879
	880	=item LC_COLLATE
	881
	882	In the absence of C<LC_ALL>, C<LC_COLLATE> chooses the collation
	883	(sorting) locale. In the absence of both C<LC_ALL> and C<LC_COLLATE>,
	884	C<LANG> chooses the collation locale.
	885
	886	=item LC_MONETARY
	887
	888	In the absence of C<LC_ALL>, C<LC_MONETARY> chooses the monetary
	889	formatting locale. In the absence of both C<LC_ALL> and C<LC_MONETARY>,
	890	C<LANG> chooses the monetary formatting locale.
	891
	892	=item LC_NUMERIC
	893
	894	In the absence of C<LC_ALL>, C<LC_NUMERIC> chooses the numeric format
	895	locale. In the absence of both C<LC_ALL> and C<LC_NUMERIC>, C<LANG>
	896	chooses the numeric format locale.
	897
	898	=item LC_TIME
	899
	900	In the absence of C<LC_ALL>, C<LC_TIME> chooses the date and time
	901	formatting locale. In the absence of both C<LC_ALL> and C<LC_TIME>,
	902	C<LANG> chooses the date and time formatting locale.
	903
	904	=item LANG
	905
	906	C<LANG> is the "catch-all" locale environment variable. If it is set, it
	907	is used as the last resort after the overall C<LC_ALL> and the
	908	category-specific C<LC_...>.
	909
	910	=back
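
The lookup order described above (C<LC_ALL> first, then the
category-specific variable, then C<LANG>) can be sketched in a few
lines. This is an illustration only: C<effective_locale_name> is a
hypothetical helper, not something Perl or the C library provides, the
real decision is made inside setlocale(), and the sketch deliberately
ignores the GNU C<LANGUAGE> variable. Treating an empty value as unset
and falling back to C<"C"> are assumptions of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: report which locale name the environment
# variables select for one category, following the order
# LC_ALL, LC_<category>, LANG.
sub effective_locale_name {
    my ($category, $env) = @_;
    $env ||= \%ENV;    # default to the real environment

    for my $variable ('LC_ALL', $category, 'LANG') {
        my $value = $env->{$variable};
        # Assumption: an empty string counts as "not set".
        return $value if defined $value && length $value;
    }
    return 'C';        # Assumed fallback when nothing is set.
}

for my $category (qw(LC_CTYPE LC_COLLATE LC_MONETARY LC_NUMERIC LC_TIME)) {
    printf "%-12s %s\n", $category, effective_locale_name($category);
}
```

Passing a hash reference as the second argument makes it possible to
try out combinations without changing the real environment.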
	911
	912	=head2 Examples
	913
	914	The C<LC_NUMERIC> category controls numeric output:
	915
	916	    use locale;
	917	    use POSIX qw(locale_h); # Imports setlocale() and the LC_ constants.
	918	    setlocale(LC_NUMERIC, "fr_FR") or die "Pardon";
	919	    printf "%g\n", 1.23; # If the "fr_FR" succeeded, probably shows 1,23.
	920
	921	and also how strings are parsed by POSIX::strtod() as numbers:
	922
	923	    use locale;
	924	    use POSIX qw(locale_h strtod);
	925	    setlocale(LC_NUMERIC, "de_DE") or die "Entschuldigung";
	926	    my $x = strtod("2,34") + 5;
	927	    print $x, "\n"; # Probably shows 7,34.
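
To see which decimal point a given C<LC_NUMERIC> locale would use
without printing a number, you can ask localeconv(). The sketch below
assumes a hypothetical helper named C<decimal_point_for>. Locale names
such as "fr_FR" and "de_DE" may not be installed on your system, in
which case the helper returns undef.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);    # Imports setlocale() and the LC_ constants.

# Hypothetical helper: temporarily switch LC_NUMERIC to the named
# locale, read its decimal point, then restore the previous setting.
sub decimal_point_for {
    my ($locale_name) = @_;

    my $previous = setlocale(LC_NUMERIC);    # Query the current locale.
    my $decimal_point;
    if (defined setlocale(LC_NUMERIC, $locale_name)) {
        $decimal_point = POSIX::localeconv()->{decimal_point};
    }
    setlocale(LC_NUMERIC, $previous) if defined $previous;

    return $decimal_point;    # undef if the locale could not be set.
}

for my $name ("C", "fr_FR", "de_DE") {
    my $point = decimal_point_for($name);
    print "$name: ", (defined $point ? "'$point'" : "(locale not available)"), "\n";
}
```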
	928
	929	=head1 NOTES
	930
	931	=head2 Backward compatibility
	932
	933	Versions of Perl prior to 5.004 B<mostly> ignored locale information,
	934	generally behaving as if something similar to the C<"C"> locale were
	935	always in force, even if the program environment suggested otherwise
	936	(see L<The setlocale function>). By default, Perl still behaves this
	937	way for backward compatibility. If you want a Perl application to pay
	938	attention to locale information, you B<must> use the S<C<use locale>>
	939	pragma (see L<The use locale pragma>) to instruct it to do so.
	940
	941	Versions of Perl from 5.002 to 5.003 did use the C<LC_CTYPE>
	942	information if available; that is, C<\w> did understand what
	943	were the letters according to the locale environment variables.
	944	The problem was that the user had no control over the feature:
	945	if the C library supported locales, Perl used them.
	946
	947	=head2 I18N::Collate obsolete
	948
	949	In versions of Perl prior to 5.004, per-locale collation was possible
	950	using the C<I18N::Collate> library module. This module is now mildly
	951	obsolete and should be avoided in new applications. The C<LC_COLLATE>
	952	functionality is now integrated into the Perl core language: One can
	953	use locale-specific scalar data completely normally with C<use locale>,
	954	so there is no longer any need to juggle with the scalar references of
	955	C<I18N::Collate>.
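
As a rough sketch of what "completely normally" means, the following
compares an ordinary C<cmp> sort inside and outside the scope of
C<use locale>. The subroutine names are made up for this example, and
the order printed for the locale-aware sort depends on the
C<LC_COLLATE> locale in force when you run it.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);    # Only needed if you call setlocale() yourself.

# Hypothetical helper: cmp obeys the current LC_COLLATE locale here.
sub sort_by_locale {
    my @strings = @_;
    use locale;
    return sort { $a cmp $b } @strings;
}

# Hypothetical helper: cmp ignores the locale here.
sub sort_by_codepoint {
    my @strings = @_;
    no locale;
    return sort { $a cmp $b } @strings;
}

my @words = qw(banana Apple cherry apple);
print "locale order:    @{[ sort_by_locale(@words) ]}\n";
print "codepoint order: @{[ sort_by_codepoint(@words) ]}\n";
```

No wrapper objects or scalar references are involved: the strings are
plain scalars, and only the pragma in scope changes how C<cmp> orders
them.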
	956
	957	=head2 Sort speed and memory use impacts
	958
	959	Comparing and sorting by locale is usually slower than the default
	960	sorting; slow-downs of two to four times have been observed. It will
	961	also consume more memory: once a Perl scalar variable has participated
	962	in any string comparison or sorting operation obeying the locale
	963	collation rules, it will take 3-15 times more memory than before. (The
	964	exact multiplier depends on the string's contents, the operating system
	965	and the locale.) These downsides are dictated more by the operating
	966	system's implementation of the locale system than by Perl.
	967
	968	=head2 write() and LC_NUMERIC
	969
	970	Formats are the only part of Perl that unconditionally uses information
	971	from a program's locale; if a program's environment specifies an
	972	C<LC_NUMERIC> locale, it is always used to specify the decimal point
	973	character in formatted output. Formatted output cannot be controlled by
	974	C<use locale> because the pragma is tied to the block structure of the
	975	program, and, for historical reasons, formats exist outside that block
	976	structure.
	977
	978	=head2 Freely available locale definitions
	979
	980	There is a large collection of locale definitions at
	981	ftp://dkuug.dk/i18n/WG15-collection . You should be aware that it is
	982	unsupported, and is not claimed to be fit for any purpose. If your
	983	system allows installation of arbitrary locales, you may find the
	984	definitions useful as they are, or as a basis for the development of
	985	your own locales.
	986
	987	=head2 I18n and l10n
	988
	989	"Internationalization" is often abbreviated as B<i18n> because its first
	990	and last letters are separated by eighteen others. (You may guess why
	991	the internalin ... internaliti ... i18n tends to get abbreviated.) In
	992	the same way, "localization" is often abbreviated to B<l10n>.
	993
	994	=head2 An imperfect standard
	995
	996	Internationalization, as defined in the C and POSIX standards, can be
	997	criticized as incomplete, ungainly, and having too large a granularity.
	998	(Locales apply to a whole process, when it would arguably be more useful
	999	to have them apply to a single thread, window group, or whatever.) They
	1000	also have a tendency, like standards groups, to divide the world into
	1001	nations, when we all know that the world can equally well be divided
	1002	into bankers, bikers, gamers, and so on. But, for now, it's the only
	1003	standard we've got. This may be construed as a bug.
	1004
	1005	=head1 Unicode and UTF-8
	1006
	1007	Unicode support is new as of Perl version 5.6, and is
	1008	more fully implemented in version 5.8. See L<perluniintro> and
	1009	L<perlunicode> for more details.
	1010
	1011	Usually locale settings and Unicode do not affect each other, but
	1012	there are exceptions; see L<perlunicode/Locales> for examples.
	1013
	1014	=head1 BUGS
	1015
	1016	=head2 Broken systems
	1017
	1018	In certain systems, the operating system's locale support
	1019	is broken and cannot be fixed or used by Perl. Such deficiencies can
	1020	and will result in mysterious hangs and/or Perl core dumps when
	1021	C<use locale> is in effect. When confronted with such a system,
	1022	please report in excruciating detail to <F<perlbug@perl.org>>, and
	1023	complain to your vendor: bug fixes may exist for these problems
	1024	in your operating system. Sometimes such bug fixes are called an
	1025	operating system upgrade.
	1026
	1027	=head1 SEE ALSO
	1028
	1029	L<I18N::Langinfo>, L<perluniintro>, L<perlunicode>, L<open>,
	1030	L<POSIX/isalnum>, L<POSIX/isalpha>,
	1031	L<POSIX/isdigit>, L<POSIX/isgraph>, L<POSIX/islower>,
	1032	L<POSIX/isprint>, L<POSIX/ispunct>, L<POSIX/isspace>,
	1033	L<POSIX/isupper>, L<POSIX/isxdigit>, L<POSIX/localeconv>,
	1034	L<POSIX/setlocale>, L<POSIX/strcoll>, L<POSIX/strftime>,
	1035	L<POSIX/strtod>, L<POSIX/strxfrm>.
	1036
	1037	=head1 HISTORY
	1038
	1039	Jarkko Hietaniemi's original F<perli18n.pod> heavily hacked by Dominic
	1040	Dunlop, assisted by the perl5-porters. Prose worked over a bit by
	1041	Tom Christiansen.
	1042
	1043	Last update: Thu Jun 11 08:44:13 MDT 1998