3 * Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
4 * 2002, 2003, 2005, 2006, 2007, 2008 by Larry Wall and others
6 * You may distribute under the terms of either the GNU General Public
7 * License or the Artistic License, as specified in the README file.
12 * A Elbereth Gilthoniel,
13 * silivren penna míriel
14 * o menel aglar elenath!
15 * Na-chaered palan-díriel
16 * o galadhremmin ennorath,
17 * Fanuilos, le linnathon
18 * nef aear, si nef aearon!
20 * [p.238 of _The Lord of the Rings_, II/i: "Many Meetings"]
23 /* utility functions for handling locale-specific stuff like what
24 * character represents the decimal point.
26 * All C programs have an underlying locale. Perl code generally doesn't pay
27 * any attention to it except within the scope of a 'use locale'. For most
28 * categories, it accomplishes this by just using different operations if it is
29 * in such scope than if not. However, various libc functions called by Perl
30 * are affected by the LC_NUMERIC category, so there are macros in perl.h that
31 * are used to toggle between the current locale and the C locale depending on
32 * the desired behavior of those functions at the moment. And, LC_MESSAGES is
33 * switched to the C locale for outputting the message unless within the scope
36 * There is more than the typical amount of variation between platforms with
37 * regard to locale handling. At the end of these introductory comments, are
38 * listed various relevent Configuration options, including some that can be
39 * used to pretend to some extent that this is being developed on a different
40 * platform than it actually is. This allows you to make changes and catch
41 * some errors without having access to those other platforms.
43 * This code now has multi-thread-safe locale handling on systems that support
44 * that. This is completely transparent to most XS code. On earlier systems,
45 * it would be possible to emulate thread-safe locales, but this likely would
46 * involve a lot of locale switching, and would require XS code changes.
47 * Macros could be written so that the code wouldn't have to know which type of
48 * system is being used.
50 * Table-driven code is used for simplicity and clarity, as many operations
51 * differ only in which category is being worked on. However the system
52 * categories need not be small contiguous integers, so do not lend themselves
53 * to table lookup. Instead we have created our own equivalent values which
54 * are all small contiguous non-negative integers, and translation functions
55 * between the two sets. For category 'LC_foo', the name of our index is
56 * LC_foo_INDEX_. Various parallel tables, indexed by these, are used for the
57 * translation. The tables are generated at compile-time based on platform
58 * characteristics and Configure options. They hide from the code many of the
59 * vagaries of the different locale implementations out there.
61 * On unthreaded perls, most operations expand out to just the basic
62 * setlocale() calls. That sort of is true on threaded perls on modern Windows
63 * systems where the same API, after set up, is used for thread-safe locale
64 * handling. (But there are complications on Windows due to internal character
65 * set issues.) On other systems, there is a completely different API,
66 * specified in POSIX 2008, to do thread-safe locales. On these systems, our
67 * bool_setlocale_2008_i() function is used to hide the different API from the
68 * outside. This makes it completely transparent to most XS code.
70 * A huge complicating factor is that the LC_NUMERIC category is normally held
71 * in the C locale, except during those relatively rare times when it needs to
72 * be in the underlying locale. There is a bunch of code to accomplish this,
73 * and to allow easy switches from one state to the other.
75 * In addition, the setlocale equivalents have versions for the return context,
76 * 'void' and 'bool', besides the full return value. This can present
77 * opportunities for avoiding work. We don't have to necessarily create a safe
78 * copy to return if no return is desired.
80 * There are 3.5 major implementations here; which one chosen depends on what
81 * the platform has available, and Configuration options.
83 * 1) Raw posix_setlocale(). This implementation is basically the libc
84 * setlocale(), with possibly minor tweaks. This is used for startup, and
85 * always for unthreaded perls, and when the API for safe locale threading
86 * is identical to the unsafe API (Windows, currently).
88 * This implementation is composed of two layers:
89 * a) posix_setlocale() implements the libc setlocale(). In most cases,
90 * it is just an alias for the libc version. But Windows doesn't
91 * fully conform to the POSIX standard, and this is a layer on top of
92 * libc to bring it more into conformance. And in Configurations
93 * where perl is to ignore some locale categories that the libc
94 * setlocale() knows about, there is a layer to cope with that.
95 * b) stdized_setlocale() is a layer above a) that fixes some vagaries in
96 * the return value of the libc setlocale(). On most platforms this
97 * layer is empty; it requires perl to be Configured with a parameter
98 * indicating the platform's defect, in order to be activated. The
99 * current ones are listed at the definition of the macro.
101 * 2) An implementation that adds a minimal layer above implementation 1),
102 * making that implementation uninterruptible and returning a
103 * per-thread/per-category value.
105 * 3a and 3b) An implementation of POSIX 2008 thread-safe locale handling,
106 * hiding from the programmer the completely different API for this.
107 * This automatically makes almost all code thread-safe without need for
108 * changes. This implementation is chosen on threaded perls when the
109 * platform properly supports the POSIX 2008 functions, and when there is no
110 * manual override to the contrary passed to Configure.
112 * 3a) is when the platform has a documented reliable querylocale() function
113 * or equivalent that is selected to be used.
114 * 3b) is when we have to emulate that functionality.
116 * Unfortunately, it seems that some platforms that claim to support these
117 * are buggy, in one way or another. There are workarounds encoded here,
118 * where feasible, for platforms where the bugs are amenable to that
119 * (glibc, for example). But other platforms instead don't use this
122 * z/OS (os390) is an outlier. Locales really don't work under threads when
123 * either the radix character isn't a dot, or attempts are made to change
124 * locales after the first thread is created. The reason is that IBM has made
125 * it thread-safe by refusing to change locales (returning failure if
126 * attempted) any time after an application has called pthread_create() to
127 * create another thread. The expectation is that an application will set up
128 * its locale information before the first fork, and be stable thereafter. But
129 * perl toggles LC_NUMERIC if the locale's radix character isn't a dot, as do
130 * the other toggles, which are less common.
132 * Associated with each implementation are three sets of macros that translate
133 * a consistent API into what that implementation needs. Each set consists of
134 * three macros with the suffixes:
135 * _c Means the argument is a locale category number known at compile time.
136 * An example would be LC_TIME. This token is a compile-time constant
137 * and can be passed to a '_c' macro.
138 * _r Means the argument is a locale category number whose value might not be
139 * known until runtime
140 * _i Means the argument is our internal index of a locale category
142 * The three sets are: ('_X' means one of '_c', '_r', '_i')
143 * 1) bool_setlocale_X()
144 * This calls the appropriate setlocale()-equivalent for the
145 * implementation, with the category and new locale. The input locale is
146 * not necessarily valid, so the return is true or false depending on
147 * whether or not the setlocale() succeeded. This is not used for
148 * querying the locale, so the input locale must not be NULL.
150 * This macro is suitable for toggling the locale back and forth during an
151 * operation. For example, the names of days and months under LC_TIME are
152 * strings that are also subject to LC_CTYPE. If the locales of these two
153 * categories differ, mojibake can result on many platforms. The code
154 * here will toggle LC_CTYPE into the locale of LC_TIME temporarily to
157 * Several categories require extra work when their locale is changed.
158 * LC_CTYPE, for example, requires the calculation of the table of which
159 * characters fold to which others under /i pattern matching or fc(), as
160 * folding is not a concept in POSIX. This table isn't needed when the
161 * LC_CTYPE locale gets toggled during an operation, and will be toggled
162 * back before return to the caller. To save work that would be
163 * discarded, the bool_setlocale_X() implementations don't do this extra
164 * work. Instead, there is a separate function for just this purpose to
165 * be done before control is transferred back to the external caller. All
166 * categories that have such requirements have such a function. The
167 * update_functions[] array contains pointers to them (or NULL for
168 * categories which don't need a function).
170 * Care must be taken to remember to call the separate function before
171 * returning to an external caller, and to not use things it updates
172 * before its call. An alternative approach would be to have
173 * bool_setlocale_X() always call the update, which would return
174 * immediately if a flag wasn't set indicating it was time to actually
177 * 2) void_setlocale_X()
178 * This is like bool_setlocale_X(), but it is used only when it is
179 * expected that the call must succeed, or something is seriously wrong.
180 * A panic is issued if it fails. The caller uses this form when it just
181 * wants to assume things worked.
184 * This returns a string that specifies the current locale for the given
185 * category given by the input argument. The string is safe from other
186 * threads zapping it, and the caller need not worry about freeing it, but
187 * it may be mortalized, so must be copied if you need to preserve it
188 * across calls, or long term. This returns the actual current locale,
189 * not the nominal. These differ, for example, when LC_NUMERIC is
190 * supposed to be a locale whose decimal radix character is a comma. As
191 * mentioned above, Perl actually keeps this category set to C in such
192 * circumstances so that XS code can just assume a dot radix character.
193 * querylocale_X() returns the locale that libc has stored at this moment,
194 * so most of the time will return a locale whose radix character is a
195 * dot. The macro query_nominal_locale_i() can be used to get the nominal
196 * locale that an external caller would expect, for all categories except
197 * LC_ALL. For that, you can use the function
198 * S_calculate_LC_ALL_string(). Or S_native_querylocale_i() will operate
201 * The underlying C API that this implements uses category numbers, hence the
202 * code is structured to use '_r' at the API level to convert to indexes, which
203 * are then used internally with the '_i' forms.
205 * The splitting apart into setting vs querying means that the return value of
206 * the bool macros is not subject to potential clashes with other threads,
207 * eliminating any need for the calling code to worry about that and get it
208 * wrong. Whereas, you do have to think about thread interactions when using a
211 * Additionally, for the implementations where there aren't any complications,
212 * a setlocale_i() is defined that is like plain setlocale(), returning the new
213 * locale. Thus it combines a bool_setlocale_X() with a querylocale_X(). It
214 * is used only for performance on implementations that allow it, such as
215 * non-threaded perls.
217 * There are also a few other macros herein that use this naming convention to
218 * describe their category parameter.
220 * Relevant Configure options
222 * -Accflags=-DNO_LOCALE
223 * This compiles perl to always use the C locale, ignoring any
224 * attempts to change it. This could be useful on platforms with a
225 * crippled locale implementation.
227 * -Accflags=-DNO_THREAD_SAFE_LOCALE
228 * Even if thread-safe operations are available on this platform and
229 * would otherwise be used (because this is a perl with multiplicity),
230 * perl is compiled to not use them. This could be useful on
231 * platforms where the libc is buggy.
233 * -Accflags=-DNO_POSIX_2008_LOCALE
234 * Even if the libc locale operations specified by the Posix 2008
235 * Standard are available on this platform and would otherwise be used
236 * (because this is a perl with multiplicity), perl is compiled to not
237 * use them. This could be useful on platforms where the libc is
238 * buggy. This is like NO_THREAD_SAFE_LOCALE, but has no effect on
239 * platforms that don't have these functions.
241 * -Accflags=-DUSE_POSIX_2008_LOCALE
242 * Normally, setlocale() is used for locale operations on perls
243 * compiled without multiplicity. This option causes the locale
244 * operations defined by the Posix 2008 Standard to always be used
245 * instead. This could be useful on platforms where the libc
246 * setlocale() is buggy.
248 * -Accflags=-DNO_THREAD_SAFE_QUERYLOCALE
249 * This applies only to platforms that have a querylocale() libc
250 * function. perl assumes that that function is thread-safe, unless
251 * overridden by this, typically in a hints file. When overridden,
252 * querylocale() is called only while the locale mutex is locked, and
253 * the result is copied to a per-thread place before unlocking.
255 * -Accflags=-DUSE_NL_LOCALE_NAME
256 * glibc has an undocumented equivalent function to querylocale(). It
257 * currently isn't used by default because it is undocumented. But
258 * testing hasn't found any problems with it. Using this Configure
259 * option enables it on systems that have it (with no effect on
260 * systems lacking it). Enabling this removes the need for perl
261 * to keep its own records, hence is more efficient and guaranteed to
264 * -Accflags=-DNO_LOCALE_CTYPE
265 * -Accflags=-DNO_LOCALE_NUMERIC
268 * If the named category(ies) does(do) not exist on this platform,
269 * these have no effect. Otherwise they cause perl to be compiled to
270 * always keep the named category(ies) in the C locale.
272 * -Accflags=-DHAS_BROKEN_SETLOCALE_QUERY_LC_ALL
273 * This would be set in a hints file to tell perl that doing a libc
274 * setlocale(LC_ALL, NULL)
275 * can give erroneous results, and perl will compensate to get the
276 * correct results. This is known to be a problem in earlier AIX
279 * -Accflags=-DHAS_LF_IN_SETLOCALE_RETURN
280 * This would be set in a hints file to tell perl that a libc
281 * setlocale() can return results containing \n characters that need
282 * to be stripped off. khw believes there aren't any such platforms
283 * still in existence.
285 * -Accflags=USE_FAKE_LC_ALL_POSITIONAL_NOTATION
286 * This is used when developing Perl on a platform that uses
287 * 'name=value;' notation to represent LC_ALL when not all categories
288 * are the same. When so compiled, much of the code gets compiled
289 * and exercised that applies to platforms that instead use positional
290 * notation. This allows for finding many bugs in that portion of the
291 * implementation, without having to access such a platform.
293 * -Accflags=-DWIN32_USE_FAKE_OLD_MINGW_LOCALES
294 * This is used when developing Perl on a non-Windows platform to
295 * compile and exercise much of the locale-related code that instead
296 * applies to MingW platforms that don't use the more modern UCRT
297 * library. This allows for finding many bugs in that portion of the
298 * implementation, without having to access such a platform.
301 /* If the environment says to, we can output debugging information during
302 * initialization. This is done before option parsing, and before any thread
303 * creation, so can be a file-level static. (Must come before #including
307 /* Returns the Unix errno portion; ignoring any others. This is a macro here
308 * instead of putting it into perl.h, because unclear to khw what should be
310 #define GET_ERRNO saved_errno
313 static int debug_initialization = 0;
314 # define DEBUG_INITIALIZATION_set(v) (debug_initialization = v)
315 # define DEBUG_LOCALE_INITIALIZATION_ debug_initialization
317 # ifdef HAS_EXTENDED_OS_ERRNO
318 /* Output the non-zero errno and/or the non-zero extended errno */
319 # define DEBUG_ERRNO \
321 int extended = get_extended_os_errno(); \
322 const char * errno_string; \
323 if (GET_ERRNO == 0) { /* Skip output if both errno types are 0 */ \
324 if (LIKELY(extended == 0)) errno_string = ""; \
325 else errno_string = Perl_form(aTHX_ "; $^E=%d", extended); \
327 else if (LIKELY(extended == GET_ERRNO)) \
328 errno_string = Perl_form(aTHX_ "; $!=%d", GET_ERRNO); \
329 else errno_string = Perl_form(aTHX_ "; $!=%d, $^E=%d", \
330 GET_ERRNO, extended);
332 /* Output the errno, if non-zero */
333 # define DEBUG_ERRNO \
335 const char * errno_string = ""; \
336 if (GET_ERRNO != 0) { \
338 errno_string = Perl_form(aTHX_ "; $!=%d", GET_ERRNO); \
342 /* Automatically include the caller's file, and line number in debugging output;
343 * and the errno (and/or extended errno) if non-zero. On threaded perls add
345 # if defined(USE_ITHREADS) && ! defined(NO_LOCALE_THREADS)
346 # define DEBUG_PRE_STMTS \
348 PerlIO_printf(Perl_debug_log, "\n%s: %" LINE_Tf ": 0x%p%s: ", \
349 __FILE__, (line_t)__LINE__, aTHX_ \
352 # define DEBUG_PRE_STMTS \
354 PerlIO_printf(Perl_debug_log, "\n%s: %" LINE_Tf "%s: ", \
355 __FILE__, (line_t)__LINE__, \
358 # define DEBUG_POST_STMTS RESTORE_ERRNO;
360 # define debug_initialization 0
361 # define DEBUG_INITIALIZATION_set(v)
362 # define DEBUG_PRE_STMTS
363 # define DEBUG_POST_STMTS
367 #define PERL_IN_LOCALE_C
370 #ifdef WIN32_USE_FAKE_OLD_MINGW_LOCALES
372 /* Use -Accflags=-DWIN32_USE_FAKE_OLD_MINGW_LOCALES on a POSIX or *nix box
373 * to get a semblance of pretending the locale handling is that of a MingW
374 * that doesn't use UCRT (hence 'OLD' in the name). This exercizes code
375 * paths that are not compiled on non-Windows boxes, and allows for ASAN
376 * and PERL_MEMLOG. This is thus a way to see if locale.c on Windows is
377 * likely going to compile, without having to use a real Win32 box. And
378 * running the test suite will verify to a large extent our logic and memory
379 * allocation handling for such boxes. Of course the underlying calls are
380 * to the POSIX libc, so any differences in implementation between those and
381 * the Windows versions will not be caught by this. */
384 # undef P_CS_PRECEDES
385 # undef CURRENCY_SYMBOL
387 # undef _configthreadlocale
388 # define _configthreadlocale(arg) NOOP
390 # define MultiByteToWideChar(cp, flags, byte_string, m1, wstring, req_size) \
391 (mbsrtowcs(wstring, &(byte_string), req_size, NULL) + 1)
392 # define WideCharToMultiByte(cp, flags, wstring, m1, byte_string, \
393 req_size, default_char, found_default_char) \
394 (wcsrtombs(byte_string, &(wstring), req_size, NULL) + 1)
398 static const wchar_t * wsetlocale_buf = NULL;
399 static Size_t wsetlocale_buf_size = 0;
400 static PerlInterpreter * wsetlocale_buf_aTHX = NULL;
404 S_wsetlocale(const int category, const wchar_t * wlocale)
406 /* Windows uses a setlocale that takes a wchar_t* locale. Other boxes
407 * don't have this, so this Windows replacement converts the wchar_t input
408 * to plain 'char*', calls plain setlocale(), and converts the result back
411 const char * byte_locale = NULL;
413 byte_locale = Win_wstring_to_byte_string(CP_UTF8, wlocale);
416 const char * byte_result = setlocale(category, byte_locale);
417 Safefree(byte_locale);
418 if (byte_result == NULL) {
422 const wchar_t * wresult = Win_byte_string_to_wstring(CP_UTF8, byte_result);
428 /* Emulate a global static memory return from wsetlocale(). This currently
429 * leaks at process end; would require changing LOCALE_TERM to fix that */
430 Size_t string_size = wcslen(wresult) + 1;
432 if (wsetlocale_buf_size == 0) {
433 Newx(wsetlocale_buf, string_size, wchar_t);
434 wsetlocale_buf_size = string_size;
439 wsetlocale_buf_aTHX = aTHX;
444 else if (string_size > wsetlocale_buf_size) {
445 Renew(wsetlocale_buf, string_size, wchar_t);
446 wsetlocale_buf_size = string_size;
449 Copy(wresult, wsetlocale_buf, string_size, wchar_t);
452 return wsetlocale_buf;
455 # define _wsetlocale(category, wlocale) S_wsetlocale(category, wlocale)
457 #endif /* WIN32_USE_FAKE_OLD_MINGW_LOCALES */
459 /* 'for' loop headers to hide the necessary casts */
460 #define for_all_individual_category_indexes(i) \
461 for (locale_category_index i = (locale_category_index) 0; \
463 i = (locale_category_index) ((int) i + 1))
465 #define for_all_but_0th_individual_category_indexes(i) \
466 for (locale_category_index i = (locale_category_index) 1; \
468 i = (locale_category_index) ((int) i + 1))
470 #define for_all_category_indexes(i) \
471 for (locale_category_index i = (locale_category_index) 0; \
472 i <= LC_ALL_INDEX_; \
473 i = (locale_category_index) ((int) i + 1))
476 # if defined(USE_FAKE_LC_ALL_POSITIONAL_NOTATION) && defined(LC_ALL)
478 /* This simulates an underlying positional notation for LC_ALL when compiled on
479 * a system that uses name=value notation. Use this to develop on Linux and
480 * make a quick check that things have some chance of working on a positional
481 * box. Enable by adding to the Congfigure parameters:
482 * -Accflags=USE_FAKE_LC_ALL_POSITIONAL_NOTATION
484 * NOTE it redefines setlocale() and usequerylocale()
488 S_positional_name_value_xlation(const char * locale, bool direction)
489 { /* direction == 1 is from name=value to positional
490 direction == 0 is from positional to name=value */
494 const char * individ_locales[LC_ALL_INDEX_] = { NULL };
496 /* This parses either notation */
497 switch (parse_LC_ALL_string(locale,
498 (const char **) &individ_locales,
499 no_override, /* Handled by other code */
500 false, /* Return only [0] if suffices */
501 false, /* Don't panic on error */
504 default: /* Some compilers don't realize that below is the complete
505 list of the available enum values */
512 SAVEFREEPV(individ_locales[0]);
513 return individ_locales[0];
516 calc_LC_ALL_format format = (direction)
517 ? EXTERNAL_FORMAT_FOR_SET
519 const char * retval = calculate_LC_ALL_string(individ_locales,
524 for_all_individual_category_indexes(i) {
525 Safefree(individ_locales[i]);
534 S_positional_setlocale(int cat, const char * locale)
536 if (cat != LC_ALL) return setlocale(cat, locale);
538 if (locale && strNE(locale, "")) {
539 locale = S_positional_name_value_xlation(locale, 0);
540 if (! locale) return NULL;
543 locale = setlocale(cat, locale);
544 if (locale == NULL) return NULL;
545 return S_positional_name_value_xlation(locale, 1);
549 # define setlocale(a,b) S_positional_setlocale(a,b)
550 # ifdef USE_POSIX_2008_LOCALE
553 S_positional_newlocale(int mask, const char * locale, locale_t base)
557 if (mask != LC_ALL_MASK) return newlocale(mask, locale, base);
559 if (strNE(locale, "")) locale = S_positional_name_value_xlation(locale, 0);
560 if (locale == NULL) return NULL;
561 return newlocale(LC_ALL_MASK, locale, base);
565 # define newlocale(a,b,c) S_positional_newlocale(a,b,c)
568 #endif /* End of fake positional notation */
579 /* The main errno that gets used is this one, on platforms that support it */
581 # define SET_EINVAL SETERRNO(EINVAL, LIB_INVARG)
586 /* This is a starting guess as to when this is true. It definititely isn't
587 * true on *BSD where positional LC_ALL notation is used. Likely this will end
588 * up being defined in hints files. */
589 #ifdef PERL_LC_ALL_USES_NAME_VALUE_PAIRS
590 # define NEWLOCALE_HANDLES_DISPARATE_LC_ALL
593 /* But regardless, we have to look at individual categories if some are
595 #ifdef HAS_IGNORED_LOCALE_CATEGORIES_
596 # undef NEWLOCALE_HANDLES_DISPARATE_LC_ALL
600 /* Not all categories need be set to the same locale. This macro determines if
601 * 'name' which represents LC_ALL is uniform or disparate. There are two
602 * situations: 1) the platform uses unordered name=value pairs; 2) the platform
603 * uses ordered positional values, with a separator string between them */
604 # ifdef PERL_LC_ALL_SEPARATOR /* positional */
605 # define is_disparate_LC_ALL(name) cBOOL(instr(name, PERL_LC_ALL_SEPARATOR))
606 # else /* name=value */
608 /* In the, hopefully never occurring, event that the platform doesn't use
609 * either mechanism for disparate LC_ALL's, assume the name=value pairs
610 * form, rather than taking the extreme step of refusing to compile. Many
611 * programs won't have disparate locales, so will generally work */
612 # define PERL_LC_ALL_SEPARATOR ";"
613 # define is_disparate_LC_ALL(name) cBOOL( strchr(name, ';') \
614 && strchr(name, '='))
617 /* It is possible to compile perl to always keep any individual category in the
618 * C locale. This would be done where the implementation on a platform is
619 * flawed or incomplete. At the time of this writing, for example, OpenBSD has
620 * not implemented LC_COLLATE beyond the C locale. The 'category_available[]'
621 * table is a bool that says whether a category is changeable, or must be kept
622 * in C. This macro substitutes C for the locale appropriately, expanding to
623 * nothing on the more typical case where all possible categories present on
624 * the platform are handled. */
625 # ifdef HAS_IGNORED_LOCALE_CATEGORIES_
626 # define need_to_override_category(i) (! category_available[i])
627 # define override_ignored_category(i, new_locale) \
628 ((need_to_override_category(i)) ? "C" : (new_locale))
630 # define need_to_override_category(i) 0
631 # define override_ignored_category(i, new_locale) (new_locale)
634 PERL_STATIC_INLINE const char *
635 S_mortalized_pv_copy(pTHX_ const char * const pv)
637 PERL_ARGS_ASSERT_MORTALIZED_PV_COPY;
639 /* Copies the input pv, and arranges for it to be freed at an unspecified
646 const char * copy = savepv(pv);
653 /* Default values come from the C locale */
654 #define C_codeset "ANSI_X3.4-1968" /* Only in some Configurations, and usually
655 a single instance, so is a #define */
656 static const char C_decimal_point[] = ".";
658 #if (defined(USE_LOCALE_NUMERIC) && ! defined(TS_W32_BROKEN_LOCALECONV)) \
659 || ! ( defined(USE_LOCALE_NUMERIC) \
660 && (defined(HAS_SOME_LANGINFO) || defined(HAS_LOCALECONV)))
661 static const char C_thousands_sep[] = "";
664 /* Is the C string input 'name' "C" or "POSIX"? If so, and 'name' is the
665 * return of setlocale(), then this is extremely likely to be the C or POSIX
666 * locale. However, the output of setlocale() is documented to be opaque, but
667 * the odds are extremely small that it would return these two strings for some
668 * other locale. Note that VMS includes many non-ASCII characters in these two
669 * locales as controls and punctuation (below are hex bytes):
671 * punct: A1-A3 A5 A7-AB B0-B3 B5-B7 B9-BD BF-CF D1-DD DF-EF F1-FD
672 * Oddly, none there are listed as alphas, though some represent alphabetics
673 * http://www.nntp.perl.org/group/perl.perl5.porters/2013/02/msg198753.html */
674 #define isNAME_C_OR_POSIX(name) \
676 && (( *(name) == 'C' && (*(name + 1)) == '\0') \
677 || strEQ((name), "POSIX")))
679 #if defined(HAS_NL_LANGINFO_L) || defined(HAS_NL_LANGINFO)
680 # define HAS_SOME_LANGINFO
683 #define my_langinfo_c(item, category, locale, retbufp, retbuf_sizep, utf8ness) \
684 my_langinfo_i(item, category##_INDEX_, locale, retbufp, \
685 retbuf_sizep, utf8ness)
688 # define setlocale_debug_string_i(index, locale, result) \
689 my_setlocale_debug_string_i(index, locale, result, __LINE__)
690 # define setlocale_debug_string_c(category, locale, result) \
691 setlocale_debug_string_i(category##_INDEX_, locale, result)
692 # define setlocale_debug_string_r(category, locale, result) \
693 setlocale_debug_string_i(get_category_index(category), \
697 # define toggle_locale_i(index, locale) \
698 S_toggle_locale_i(aTHX_ index, locale, __LINE__)
699 # define toggle_locale_c(cat, locale) toggle_locale_i(cat##_INDEX_, locale)
700 # define restore_toggled_locale_i(index, locale) \
701 S_restore_toggled_locale_i(aTHX_ index, locale, __LINE__)
702 # define restore_toggled_locale_c(cat, locale) \
703 restore_toggled_locale_i(cat##_INDEX_, locale)
705 /* On systems without LC_ALL, pretending it exists anyway simplifies things.
706 * Choose a value for it that is very unlikely to clash with any actual
708 # define FAKE_LC_ALL PERL_INT_MIN
710 /* Below are parallel arrays for locale information indexed by our mapping of
711 * category numbers into small non-negative indexes. locale_table.h contains
712 * an entry like this for each individual category used on this system:
713 * PERL_LOCALE_TABLE_ENTRY(CTYPE, S_new_ctype)
715 * Each array redefines PERL_LOCALE_TABLE_ENTRY to generate the information
716 * needed for that array, and #includes locale_table.h to get the valid
719 * An entry for the conglomerate category LC_ALL is added here, immediately
720 * following the individual categories. (The treatment for it varies, so can't
721 * be in locale_table.h.)
723 * Following this, each array ends with an entry for illegal categories. All
724 * category numbers unknown to perl get mapped to this entry. This is likely
725 * to be a parameter error from the calling program; but it could be that this
726 * platform has a category we don't know about, in which case it needs to be
727 * added, using the paradigm of one of the existing categories. */
729 /* The first array is the locale categories perl uses on this system, used to
730 * map our index back to the system's category number. */
731 STATIC const int categories[] = {
733 # undef PERL_LOCALE_TABLE_ENTRY
734 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) LC_ ## name,
735 # include "locale_table.h"
743 (FAKE_LC_ALL + 1) /* Entry for unknown category; this number is unlikely
744 to clash with a real category */
747 # define GET_NAME_AS_STRING(token) # token
748 # define GET_LC_NAME_AS_STRING(token) GET_NAME_AS_STRING(LC_ ## token)
750 /* The second array is the category names. */
751 STATIC const char * const category_names[] = {
753 # undef PERL_LOCALE_TABLE_ENTRY
754 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) GET_LC_NAME_AS_STRING(name),
755 # include "locale_table.h"
758 # define LC_ALL_STRING "LC_ALL"
760 # define LC_ALL_STRING "If you see this, it is a bug in perl;" \
761 " please report it via perlbug"
766 # define LC_UNKNOWN_STRING "Locale category unknown to Perl; if you see" \
767 " this, it is a bug in perl; please report it" \
772 STATIC const Size_t category_name_lengths[] = {
774 # undef PERL_LOCALE_TABLE_ENTRY
775 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) \
776 STRLENs(GET_LC_NAME_AS_STRING(name)),
777 # include "locale_table.h"
779 STRLENs(LC_ALL_STRING),
780 STRLENs(LC_UNKNOWN_STRING)
783 /* Each entry includes space for the '=' and ';' */
784 # undef PERL_LOCALE_TABLE_ENTRY
785 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) \
786 + STRLENs(GET_LC_NAME_AS_STRING(name)) + 2
788 STATIC const Size_t lc_all_boiler_plate_length = 1 /* space for trailing NUL */
789 # include "locale_table.h"
792 /* A few categories require additional setup when they are changed. This table
793 * points to the functions that do that setup */
794 STATIC void (*update_functions[]) (pTHX_ const char *, bool force) = {
796 # undef PERL_LOCALE_TABLE_ENTRY
797 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) call_back,
798 # include "locale_table.h"
801 NULL, /* No update for unknown category */
804 # if defined(HAS_IGNORED_LOCALE_CATEGORIES_)
806 /* Indicates if each category on this platform is available to use not in
808 STATIC const bool category_available[] = {
810 # undef PERL_LOCALE_TABLE_ENTRY
811 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) LC_ ## name ## _AVAIL_,
812 # include "locale_table.h"
820 false /* LC_UNKNOWN_AVAIL_ */
824 # if defined(USE_POSIX_2008_LOCALE)
826 STATIC const int category_masks[] = {
828 # undef PERL_LOCALE_TABLE_ENTRY
829 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) LC_ ## name ## _MASK,
830 # include "locale_table.h"
832 LC_ALL_MASK, /* Will rightly refuse to compile unless this is defined */
833 0 /* Empty mask for unknown category */
837 # if ! defined(PERL_LC_ALL_USES_NAME_VALUE_PAIRS)
839 /* On platforms that use positional notation for expressing LC_ALL, this maps
840 * the position of each category to our corresponding internal index for it.
841 * This is initialized at run time if needed. LC_ALL_INDEX_ is not legal for
842 * an individual locale, hence marks the elements here as not actually
846 map_LC_ALL_position_to_index[LC_ALL_INDEX_] = { LC_ALL_INDEX_ };
850 #if defined(USE_LOCALE) || defined(DEBUGGING)
853 S_get_displayable_string(pTHX_
854 const char * const s,
855 const char * const e,
858 PERL_ARGS_ASSERT_GET_DISPLAYABLE_STRING;
865 bool prev_was_printable = TRUE;
866 bool first_time = TRUE;
869 /* Worst case scenario: All are non-printable so have a blank between each.
870 * If UTF-8, all are the largest possible code point; otherwise all are a
871 * single byte. '(2 + 1)' is from each byte takes 2 characters to
872 * display, and a blank (or NUL for the final one) after it */
873 const Size_t size = (e - s) * (2 + 1) * ((is_utf8) ? UVSIZE : 1);
874 Newxz(ret, size, char);
879 ? utf8_to_uvchr_buf((U8 *) t, e, NULL)
882 if (! prev_was_printable) {
883 my_strlcat(ret, " ", size);
886 /* Escape these to avoid any ambiguity */
887 if (cp == ' ' || cp == '\\') {
888 my_strlcat(ret, "\\", size);
890 my_strlcat(ret, Perl_form(aTHX_ "%c", (U8) cp), size);
891 prev_was_printable = TRUE;
895 my_strlcat(ret, " ", size);
897 my_strlcat(ret, Perl_form(aTHX_ "%02" UVXf, cp), size);
898 prev_was_printable = FALSE;
900 t += (is_utf8) ? UTF8SKIP(t) : 1;
910 # define get_category_index(cat) get_category_index_helper(cat, NULL, __LINE__)
912 STATIC locale_category_index
913 S_get_category_index_helper(pTHX_ const int category, bool * succeeded,
914 const line_t caller_line)
916 PERL_ARGS_ASSERT_GET_CATEGORY_INDEX_HELPER;
918 /* Given a category, return the equivalent internal index we generally use
919 * instead, warn or panic if not found. */
921 locale_category_index i;
923 # undef PERL_LOCALE_TABLE_ENTRY
924 # define PERL_LOCALE_TABLE_ENTRY(name, call_back) \
925 case LC_ ## name: i = LC_ ## name ## _INDEX_; break;
929 # include "locale_table.h"
931 case LC_ALL: i = LC_ALL_INDEX_; break;
934 default: goto unknown_locale;
937 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
938 "index of category %d (%s) is %d;"
939 " called from %" LINE_Tf "\n",
940 category, category_names[i], i, caller_line));
952 return LC_ALL_INDEX_; /* Arbitrary */
955 locale_panic_via_(Perl_form(aTHX_ "Unknown locale category %d", category),
956 __FILE__, caller_line);
957 NOT_REACHED; /* NOTREACHED */
960 #endif /* ifdef USE_LOCALE */
963 Perl_force_locale_unlock(pTHX)
965 /* Remove any locale mutex, in preperation for an inglorious termination,
966 * typically a panic */
968 #if defined(USE_LOCALE_THREADS)
970 /* If recursively locked, clear all at once */
971 if (PL_locale_mutex_depth > 1) {
972 PL_locale_mutex_depth = 1;
975 if (PL_locale_mutex_depth > 0) {
983 #ifdef USE_POSIX_2008_LOCALE
986 S_use_curlocale_scratch(pTHX)
988 /* This function is used to hide from the caller the case where the current
989 * locale_t object in POSIX 2008 is the global one, which is illegal in
990 * many of the P2008 API calls. This checks for that and, if necessary
991 * creates a proper P2008 object. Any prior object is deleted, as is any
992 * remaining object during global destruction. */
994 locale_t cur = uselocale((locale_t) 0);
996 if (cur != LC_GLOBAL_LOCALE) {
1000 if (PL_scratch_locale_obj) {
1001 freelocale(PL_scratch_locale_obj);
1004 PL_scratch_locale_obj = duplocale(LC_GLOBAL_LOCALE);
1005 return PL_scratch_locale_obj;
1011 Perl_locale_panic(const char * msg,
1012 const line_t immediate_caller_line,
1013 const char * const higher_caller_file,
1014 const line_t higher_caller_line)
1016 PERL_ARGS_ASSERT_LOCALE_PANIC;
1020 force_locale_unlock();
1022 #ifdef USE_C_BACKTRACE
1023 dump_c_backtrace(Perl_debug_log, 20, 1);
1026 const char * called_by = "";
1027 if ( strNE(__FILE__, higher_caller_file)
1028 || immediate_caller_line != higher_caller_line)
1030 called_by = Perl_form(aTHX_ "\nCalled by %s: %" LINE_Tf "\n",
1031 higher_caller_file, higher_caller_line);
1036 const char * errno_text;
1038 #ifdef HAS_EXTENDED_OS_ERRNO
1040 const int extended_errnum = get_extended_os_errno();
1041 if (errno != extended_errnum) {
1042 errno_text = Perl_form(aTHX_ "; errno=%d, $^E=%d",
1043 errno, extended_errnum);
1050 errno_text = Perl_form(aTHX_ "; errno=%d", errno);
1053 /* diag_listed_as: panic: %s */
1054 Perl_croak(aTHX_ "%s: %" LINE_Tf ": panic: %s%s%s\n",
1055 __FILE__, immediate_caller_line,
1056 msg, errno_text, called_by);
1059 /* Macros to report and croak on an unexpected failure to set the locale. The
1060 * via version has more stack trace information */
1061 #define setlocale_failure_panic_i(i, cur, fail, line, higher_line) \
1062 setlocale_failure_panic_via_i(i, cur, fail, __LINE__, line, \
1063 __FILE__, higher_line)
1065 #define setlocale_failure_panic_c(cat, cur, fail, line, higher_line) \
1066 setlocale_failure_panic_i(cat##_INDEX_, cur, fail, line, higher_line)
1068 #if defined(USE_LOCALE)
1070 /* Expands to the code to
1071 * result = savepvn(s, len)
1072 * if the category whose internal index is 'i' doesn't need to be kept in the C
1073 * locale on this system, or if 'action is 'no_override'. Otherwise it expands
1075 * result = savepv("C")
1076 * unless 'action' isn't 'check_that_overridden', in which case if the string
1077 * 's' isn't already "C" it panics */
1078 # ifndef HAS_IGNORED_LOCALE_CATEGORIES_
1079 # define OVERRIDE_AND_SAVEPV(s, len, result, i, action) \
1080 result = savepvn(s, len)
1082 # define OVERRIDE_AND_SAVEPV(s, len, result, i, action) \
1084 if (LIKELY( ! need_to_override_category(i) \
1085 || action == no_override)) { \
1086 result = savepvn(s, len); \
1089 const char * temp = savepvn(s, len); \
1090 result = savepv(override_ignored_category(i, temp)); \
1091 if (action == check_that_overridden && strNE(result, temp)) { \
1092 locale_panic_(Perl_form(aTHX_ \
1093 "%s expected to be '%s', instead is '%s'", \
1094 category_names[i], result, temp)); \
1101 STATIC parse_LC_ALL_string_return
1102 S_parse_LC_ALL_string(pTHX_ const char * string,
1103 const char ** output,
1104 const parse_LC_ALL_STRING_action override,
1105 bool always_use_full_array,
1106 const bool panic_on_error,
1107 const line_t caller_line)
1109 /* This function parses the value of the input 'string' which is expected
1110 * to be the representation of an LC_ALL locale, and splits the result into
1111 * the values for the individual component categories, returning those in
1112 * the 'output' array. Each array value will be a savepv() copy that is
1113 * the responsibility of the caller to make sure gets freed
1115 * The locale for each category is independent of the other categories.
1116 * Often, they are all the same, but certainly not always. Perl, in fact,
1117 * usually keeps LC_NUMERIC in the C locale, regardless of the underlying
1118 * locale. LC_ALL has to be able to represent the case of when not all
1119 * categories have the same locale. Platforms have differing ways of
1120 * representing this. Internally, this file uses the 'name=value;'
1121 * representation found on some platforms, so this function always looks
1122 * for and parses that. Other platforms use a positional notation. On
1123 * those platforms, this function also parses that form. It examines the
1124 * input to see which form is being parsed.
1126 * Often, all categories will have the same locale. This is special cased
1127 * if 'always_use_full_array' is false on input:
1128 * 1) If the input 'string' is a single value, this function doesn't
1129 * store anything into 'output', and returns 'no_array'
1130 * 2) Some platforms will return multiple occurrences of the same
1131 * value rather than coalescing them down to a single one. HP-UX
1132 * is such a one. This function will do that collapsing for you,
1133 * returning 'only_element_0' and saving the single value in
1134 * output[0], which the caller will need to arrange to be freed.
1135 * The rest of output[] is undefined, and does not need to be
1138 * Otherwise, the input 'string' may not be valid. This function looks
1139 * mainly for syntactic errors, and if found, returns 'invalid'. 'output'
1140 * will not be filled in in that case, but the input state of it isn't
1141 * necessarily preserved. Turning on -DL debugging will give details as to
1142 * the error. If 'panic_on_error' is 'true', the function panics instead
1143 * of returning on error, with a message giving the details.
1145 * Otherwise, output[] will be filled with the individual locale names for
1146 * all categories on the system, 'full_array' will be returned, and the
1147 * caller needs to arrange for each to be freed. This means that either at
1148 * least one category differed from the others, or 'always_use_full_array' was
1151 * perl may be configured to ignore changes to a category's locale to
1152 * non-C. The parameter 'override' tells this function what to do when
1153 * encountering such an illegal combination:
1155 * no_override indicates to take no special action
1156 * override_if_ignored, indicates to return 'C' instead of what the
1157 * input string actually says.
1158 * check_that_overridden indicates to panic if the string says the
1159 * category is not 'C'. This is used when
1160 * non-C is very unexpected behavior.
1163 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1164 "Entering parse_LC_ALL_string; called from %" \
1165 LINE_Tf "\nnew='%s'\n", caller_line, string));
1167 # ifdef PERL_LC_ALL_USES_NAME_VALUE_PAIRS
1169 const char separator[] = ";";
1170 const Size_t separator_len = 1;
1171 const bool single_component = (strchr(string, ';') == NULL);
1175 /* It's possible (but quite unlikely) that the separator string is an '='
1176 * or a ';'. Requiring both to be present for using the 'name=value;' form
1177 * properly handles those possibilities */
1178 const bool name_value = strchr(string, '=') && strchr(string, ';');
1179 const char * separator;
1180 Size_t separator_len;
1181 bool single_component;
1185 single_component = false; /* Since has both [;=], must be multi */
1188 separator = PERL_LC_ALL_SEPARATOR;
1189 separator_len = STRLENs(PERL_LC_ALL_SEPARATOR);
1190 single_component = instr(string, separator) == NULL;
1193 Size_t component_number = 0; /* Position in the parsing loop below */
1196 # ifndef HAS_IGNORED_LOCALE_CATEGORIES_
1197 PERL_UNUSED_ARG(override);
1200 /* Any ignored categories are to be set to "C", so if this single-component
1201 * LC_ALL isn't to C, it has both "C" and non-C, so isn't really a single
1202 * component. All the non-ignored categories are set to the input
1203 * component, but the ignored ones are overridden to be C.
1205 * This incidentally handles the case where the string is "". The return
1206 * will be C for each ignored category and "" for the others. Then the
1207 * caller can individually set each category, and get the right answer. */
1208 if (single_component && ! isNAME_C_OR_POSIX(string)) {
1209 for_all_individual_category_indexes(i) {
1210 OVERRIDE_AND_SAVEPV(string, strlen(string), output[i], i, override);
1218 if (single_component) {
1219 if (! always_use_full_array) {
1223 for_all_individual_category_indexes(i) {
1224 output[i] = savepv(string);
1230 /* Here the input is multiple components. Parse through them. (It is
1231 * possible that these components are all the same, so we check, and if so,
1232 * return just the 0th component (unless 'always_use_full_array' is true)
1234 * This enum notes the possible errors findable in parsing */
1239 contains_LC_ALL_element
1242 /* Keep track of the categories we have encountered so far */
1243 bool seen[LC_ALL_INDEX_] = { false };
1245 Size_t index; /* Our internal index for the current category */
1246 const char * s = string;
1247 const char * e = s + strlen(string);
1248 const char * category_end = NULL;
1249 const char * saved_first = NULL;
1251 /* Parse the input locale string */
1254 /* 'separator' has been set up to delimit the components */
1255 const char * next_sep = instr(s, separator);
1256 if (! next_sep) { /* At the end of the input */
1260 # ifndef PERL_LC_ALL_USES_NAME_VALUE_PAIRS
1263 /* Get the index of the category in this position */
1264 index = map_LC_ALL_position_to_index[component_number++];
1270 { /* Get the category part when each component is the
1271 * 'category=locale' form */
1273 category_end = strchr(s, '=');
1275 /* The '=' terminates the category name. If no '=', is improper
1277 if (! category_end) {
1282 /* Find our internal index of the category name; uses a linear
1283 * search. (XXX This could be avoided by various means, but the
1284 * maximum likely search is 6 items, and khw doesn't think the
1285 * added complexity would save very much at all.) */
1286 const unsigned int name_len = (unsigned int) (category_end - s);
1287 for (index = 0; index < C_ARRAY_LENGTH(category_names); index++) {
1288 if ( name_len == category_name_lengths[index]
1289 && memEQ(s, category_names[index], name_len))
1291 goto found_category;
1295 /* Here, the category is not in our list. */
1296 error = unknown_category;
1299 found_category: /* The system knows about this category. */
1301 if (index == LC_ALL_INDEX_) {
1302 error = contains_LC_ALL_element;
1306 /* The locale name starts just beyond the '=' */
1307 s = category_end + 1;
1309 /* Linux (and maybe others) doesn't treat a duplicate category in
1310 * the string as an error. Instead it uses the final occurrence as
1311 * the intended value. So if this is a duplicate, free the former
1312 * value before setting the new one */
1314 Safefree(output[index]);
1321 /* Here, 'index' contains our internal index number for the current
1322 * category, and 's' points to the beginning of the locale name for
1324 OVERRIDE_AND_SAVEPV(s, next_sep - s, output[index], index, override);
1326 if (! always_use_full_array) {
1327 if (! saved_first) {
1328 saved_first = output[index];
1331 if (strNE(saved_first, output[index])) {
1332 always_use_full_array = true;
1337 /* Next time start from the new position */
1338 s = next_sep + separator_len;
1341 /* Finished looping through all the categories
1343 * Check if the input was incomplete. */
1345 # ifndef PERL_LC_ALL_USES_NAME_VALUE_PAIRS
1347 if (! name_value) { /* Positional notation */
1348 if (component_number != LC_ALL_INDEX_) {
1357 { /* Here is the name=value notation */
1358 for_all_individual_category_indexes(i) {
1366 /* In the loop above, we changed 'always_use_full_array' to true iff not all
1367 * categories have the same locale. Hence, if it is still 'false', all of
1368 * them are the same. */
1369 if (always_use_full_array) {
1373 /* Free the dangling ones */
1374 for_all_but_0th_individual_category_indexes(i) {
1375 Safefree(output[i]);
1379 return only_element_0;
1383 /* Don't leave memory dangling that we allocated before the failure */
1384 for_all_individual_category_indexes(i) {
1386 Safefree(output[i]);
1392 const char * display_start = s;
1393 const char * display_end = e;
1397 msg = "doesn't list every locale category";
1398 display_start = string;
1401 msg = "needs an '=' to split name=value";
1403 case unknown_category:
1404 msg = "is an unknown category";
1405 display_end = (category_end && category_end > display_start)
1409 case contains_LC_ALL_element:
1410 msg = "has LC_ALL, which is illegal here";
1414 msg = Perl_form(aTHX_ "'%.*s' %s\n",
1415 (int) (display_end - display_start),
1416 display_start, msg);
1418 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s", msg));
1420 if (panic_on_error) {
1421 locale_panic_via_(msg, __FILE__, caller_line);
1427 # undef OVERRIDE_AND_SAVEPV
1430 /*==========================================================================
1431 * Here starts the code that gives a uniform interface to its callers, hiding
1432 * the differences between platforms.
1434 * base_posix_setlocale_() presents a consistent POSIX-compliant interface to
1435 * setlocale(). Windows requres a customized base-level setlocale(). This
1436 * layer should only be used by the next level up: the plain posix_setlocale
1437 * layer. Any necessary mutex locking needs to be done at a higher level. The
1438 * return may be overwritten by the next call to this function */
1440 # define base_posix_setlocale_(cat, locale) win32_setlocale(cat, locale)
1442 # define base_posix_setlocale_(cat, locale) \
1443 ((const char *) setlocale(cat, locale))
1446 /*==========================================================================
1447 * Here is the main posix layer. It is the same as the base one unless the
1448 * system is lacking LC_ALL, or there are categories that we ignore, but that
1449 * the system libc knows about */
1451 #if ! defined(USE_LOCALE) \
1452 || (defined(LC_ALL) && ! defined(HAS_IGNORED_LOCALE_CATEGORIES_))
1453 # define posix_setlocale(cat, locale) base_posix_setlocale_(cat, locale)
1455 # define posix_setlocale(cat, locale) \
1456 S_posix_setlocale_with_complications(aTHX_ cat, locale, __LINE__)
1459 S_posix_setlocale_with_complications(pTHX_ const int cat,
1460 const char * new_locale,
1461 const line_t caller_line)
1463 /* This implements the posix layer above the base posix layer.
1464 * It is needed to reconcile our internal records that reflect only a
1465 * proper subset of the categories known by the system. */
1467 /* Querying the current locale returns the real value */
1468 if (new_locale == NULL) {
1469 new_locale = base_posix_setlocale_(cat, NULL);
1474 const char * locale_on_entry = NULL;
1476 /* If setting from the environment, actually do the set to get the system's
1477 * idea of what that means; we may have to override later. */
1478 if (strEQ(new_locale, "")) {
1479 locale_on_entry = base_posix_setlocale_(cat, NULL);
1480 assert(locale_on_entry);
1481 new_locale = base_posix_setlocale_(cat, "");
1490 const char * new_locales[LC_ALL_INDEX_] = { NULL };
1492 if (cat == LC_ALL) {
1493 switch (parse_LC_ALL_string(new_locale,
1494 (const char **) &new_locales,
1495 override_if_ignored, /* Override any
1498 false, /* Return only [0] if suffices */
1499 false, /* Don't panic on error */
1509 case only_element_0:
1510 new_locale = new_locales[0];
1511 SAVEFREEPV(new_locale);
1516 /* Turn the array into a string that the libc setlocale() should
1517 * understand. (Another option would be to loop, setting the
1518 * individual locales, and then return base(cat, NULL) */
1519 new_locale = calculate_LC_ALL_string(new_locales,
1520 EXTERNAL_FORMAT_FOR_SET,
1524 for_all_individual_category_indexes(i) {
1525 Safefree(new_locales[i]);
1528 /* And call the libc setlocale. We could avoid this call if
1529 * locale_on_entry is set and eq the new_locale. But that would be
1530 * only for the relatively rare case of the desired locale being
1531 * "", and the time spent in doing the string compare might be more
1532 * than that of just setting it unconditionally */
1533 new_locale = base_posix_setlocale_(cat, new_locale);
1544 /* Here, 'new_locale' is a single value, not an aggregation. Just set it.
1547 base_posix_setlocale_(cat,
1548 override_ignored_category(
1549 get_category_index(cat), new_locale));
1558 /* 'locale_on_entry' being set indicates there has likely been a change in
1559 * locale which needs to be restored */
1560 if (locale_on_entry) {
1561 if (! base_posix_setlocale_(cat, locale_on_entry)) {
1562 setlocale_failure_panic_i(get_category_index(cat),
1563 NULL, locale_on_entry,
1564 __LINE__, caller_line);
1574 /* End of posix layer
1575 *==========================================================================
1577 * The next layer up is to catch vagaries and bugs in the libc setlocale return
1578 * value. The return is not guaranteed to be stable.
1580 * Any necessary mutex locking needs to be done at a higher level.
1582 * On most platforms this layer is empty, expanding to just the layer
1583 * below. To enable it, call Configure with either or both:
1584 * -Accflags=-DHAS_LF_IN_SETLOCALE_RETURN
1585 * to indicate that extraneous \n characters can be returned
1587 * -Accflags=-DHAS_BROKEN_SETLOCALE_QUERY_LC_ALL
1588 * to indicate that setlocale(LC_ALL, NULL) cannot be relied
1592 #define STDIZED_SETLOCALE_LOCK POSIX_SETLOCALE_LOCK
1593 #define STDIZED_SETLOCALE_UNLOCK POSIX_SETLOCALE_UNLOCK
1594 #if ! defined(USE_LOCALE) \
1595 || ! ( defined(HAS_LF_IN_SETLOCALE_RETURN) \
1596 || defined(HAS_BROKEN_SETLOCALE_QUERY_LC_ALL))
1597 # define stdized_setlocale(cat, locale) posix_setlocale(cat, locale)
1598 # define stdize_locale(cat, locale) (locale)
1600 # define stdized_setlocale(cat, locale) \
1601 S_stdize_locale(aTHX_ cat, posix_setlocale(cat, locale), __LINE__)
1604 S_stdize_locale(pTHX_ const int category,
1605 const char *input_locale,
1606 const line_t caller_line)
1608 /* The return value of setlocale() is opaque, but is required to be usable
1609 * as input to a future setlocale() to create the same state.
1610 * Unfortunately not all systems are compliant. This function brings those
1611 * outliers into conformance. It is based on what problems have arisen in
1614 * This has similar constraints as the posix layer. You need to lock
1615 * around it until its return is safely copied or no longer needed. (The
1616 * return may point to a global static buffer or may be mortalized.)
1618 * The current things this corrects are:
1619 * 1) A new-line. This function chops any \n characters
1620 * 2) A broken 'setlocale(LC_ALL, foo)' This constructs a proper returned
1621 * string from the constituent categories
1623 * If no changes were made, the input is returned as-is */
1625 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1626 "Entering stdize_locale(%d, '%s');"
1627 " called from %" LINE_Tf "\n",
1628 category, input_locale, caller_line));
1630 if (input_locale == NULL) {
1635 char * retval = (char *) input_locale;
1637 # if defined(LC_ALL) && defined(HAS_BROKEN_SETLOCALE_QUERY_LC_ALL)
1639 /* If setlocale(LC_ALL, NULL) is broken, compute what the system
1640 * actually thinks it should be from its individual components */
1641 if (category == LC_ALL) {
1642 retval = (char *) calculate_LC_ALL_string(
1643 NULL, /* query each individ locale */
1644 EXTERNAL_FORMAT_FOR_SET,
1650 # ifdef HAS_NL_IN_SETLOCALE_RETURN
1652 char * first_bad = NULL;
1656 PERL_UNUSED_ARG(category);
1657 PERL_UNUSED_ARG(caller_line);
1659 # define INPUT_LOCALE retval
1660 # define MARK_CHANGED
1663 char * individ_locales[LC_ALL_INDEX_] = { NULL };
1664 bool made_changes = false;
1666 if (category != LC_ALL) {
1667 individ_locales[0] = retval;
1672 /* And parse the locale string, splitting into its individual
1674 switch (parse_LC_ALL_string(retval,
1675 (const char **) &individ_locales,
1676 check_that_overridden, /* ignored
1680 false, /* Return only [0] if suffices */
1681 false, /* Don't panic on error */
1688 case full_array: /* Loop below through all the component categories.
1690 upper = LC_ALL_INDEX_ - 1;
1694 /* All categories here are set to the same locale, and the parse
1695 * didn't fill in any of 'individ_locales'. Set the 0th element to
1697 individ_locales[0] = retval;
1700 case only_element_0: /* Element 0 is the only element we need to look
1707 for (unsigned int i = 0; i <= upper; i++)
1709 # define INPUT_LOCALE individ_locales[i]
1710 # define MARK_CHANGED made_changes = true;
1711 # endif /* Has LC_ALL */
1714 first_bad = (char *) strchr(INPUT_LOCALE, '\n');
1716 /* Most likely, there isn't a problem with the input */
1717 if (UNLIKELY(first_bad)) {
1719 /* This element will need to be adjusted. Create a modifiable
1722 retval = savepv(INPUT_LOCALE);
1725 /* Translate the found position into terms of the copy */
1726 first_bad = retval + (first_bad - INPUT_LOCALE);
1728 /* Get rid of the \n and what follows. (Originally, only a
1729 * trailing \n was stripped. Unsure what to do if not trailing) */
1730 *((char *) first_bad) = '\0';
1731 } /* End of needs adjusting */
1732 } /* End of looking for problems */
1736 /* If we had multiple elements, extra work is required */
1739 /* If no changes were made to the input, 'retval' already contains it
1743 /* But if did make changes, need to calculate the new value */
1744 retval = (char *) calculate_LC_ALL_string(
1745 (const char **) &individ_locales,
1746 EXTERNAL_FORMAT_FOR_SET,
1751 /* And free the no-longer needed memory */
1752 for (unsigned int i = 0; i <= upper; i++) {
1753 Safefree(individ_locales[i]);
1758 # undef INPUT_LOCALE
1759 # undef MARK_CHANGED
1760 # endif /* HAS_NL_IN_SETLOCALE_RETURN */
1762 return (const char *) retval;
1765 #endif /* USE_LOCALE */
1767 /* End of stdize_locale layer
1769 * ==========================================================================
1771 * The next many lines form several implementations of a layer above the
1772 * close-to-the-metal 'posix' and 'stdized' macros. They are used to present a
1773 * uniform API to the rest of the code in this file in spite of the disparate
1774 * underlying implementations. Which implementation gets compiled depends on
1775 * the platform capabilities (and some user choice) as determined by Configure.
1777 * As more fully described in the introductory comments in this file, the
1778 * API of each implementation consists of three sets of macros. Each set has
1779 * three variants with suffixes '_c', '_r', and '_i'. In the list below '_X'
1780 * is to be replaced by any of these suffixes.
1782 * 1) bool_setlocale_X attempts to set the given category's locale to the
1783 * given value, returning if it worked or not.
1784 * 2) void_setlocale_X is like the corresponding bool_setlocale, but used when
1785 * success is the only sane outcome, so failure causes it
1787 * 3) querylocale_X to see what the given category's locale is
1789 * 4) setlocale_i() is defined only in those implementations where the bool
1790 * and query forms are essentially the same, and can be
1791 * combined to save CPU time.
1793 * Each implementation below is separated by ==== lines, and includes bool,
1794 * void, and query macros. The query macros are first, followed by any
1795 * functions needed to implement them. Then come the bool, again followed by
1796 * any implementing functions Then are the void macros; next is setlocale_i if
1797 * present on this implementation. Finally are any helper functions. The sets
1798 * in each implementation are separated by ---- lines.
1800 * The returned strings from all the querylocale...() forms in all
1801 * implementations are thread-safe, and the caller should not free them,
1802 * but each may be a mortalized copy. If you need something stable across
1803 * calls, you need to savepv() the result yourself.
1805 *===========================================================================*/
1807 #if (! defined(USE_LOCALE_THREADS) && ! defined(USE_POSIX_2008_LOCALE)) \
1808 || ( defined(WIN32) && defined(USE_THREAD_SAFE_LOCALE))
1810 /* For non-threaded perls, the implementation just expands to the base-level
1811 * functions (except if we are Configured to nonetheless use the POSIX 2008
1812 * interface) This implementation is also used on threaded perls where
1813 * threading is invisible to us. Currently this is only on later Windows
1816 # define querylocale_r(cat) mortalized_pv_copy(stdized_setlocale(cat, NULL))
1817 # define querylocale_c(cat) querylocale_r(cat)
1818 # define querylocale_i(i) querylocale_c(categories[i])
1820 /*---------------------------------------------------------------------------*/
1822 # define bool_setlocale_r(cat, locale) cBOOL(posix_setlocale(cat, locale))
1823 # define bool_setlocale_i(i, locale) \
1824 bool_setlocale_c(categories[i], locale)
1825 # define bool_setlocale_c(cat, locale) bool_setlocale_r(cat, locale)
1827 /*---------------------------------------------------------------------------*/
1829 # define void_setlocale_r_with_caller(cat, locale, file, line) \
1831 if (! bool_setlocale_r(cat, locale)) \
1832 setlocale_failure_panic_via_i(get_category_index(cat), \
1833 NULL, locale, __LINE__, 0, \
1837 # define void_setlocale_c_with_caller(cat, locale, file, line) \
1838 void_setlocale_r_with_caller(cat, locale, file, line)
1840 # define void_setlocale_i_with_caller(i, locale, file, line) \
1841 void_setlocale_r_with_caller(categories[i], locale, file, line)
1843 # define void_setlocale_r(cat, locale) \
1844 void_setlocale_r_with_caller(cat, locale, __FILE__, __LINE__)
1845 # define void_setlocale_c(cat, locale) void_setlocale_r(cat, locale)
1846 # define void_setlocale_i(i, locale) void_setlocale_r(categories[i], locale)
1848 /*---------------------------------------------------------------------------*/
1850 /* setlocale_i is only defined for Configurations where the libc setlocale()
1851 * doesn't need any tweaking. It allows for some shortcuts */
1852 # ifndef USE_LOCALE_THREADS
1853 # define setlocale_i(i, locale) stdized_setlocale(categories[i], locale)
1855 # elif defined(WIN32) && defined(USE_THREAD_SAFE_LOCALE)
1857 /* On Windows, we don't know at compile time if we are in thread-safe mode or
1858 * not. If we are, we can just return the result of the layer below us. If we
1859 * are in unsafe mode, we need to first copy that result to a safe place while
1860 * in a critical section */
1862 # define setlocale_i(i, locale) S_setlocale_i(aTHX_ categories[i], locale)
1865 S_setlocale_i(pTHX_ const int category, const char * locale)
1867 if (LIKELY(_configthreadlocale(0) == _ENABLE_PER_THREAD_LOCALE)) {
1868 return stdized_setlocale(category, locale);
1872 const char * retval = save_to_buffer(stdized_setlocale(category, locale),
1874 &PL_setlocale_bufsize);
1882 /*===========================================================================*/
1883 #elif defined(USE_LOCALE_THREADS) \
1884 && ! defined(USE_THREAD_SAFE_LOCALE)
1886 /* Here, there are threads, and there is no support for thread-safe
1887 * operation. This is a dangerous situation, which perl is documented as
1888 * not supporting, but it arises in practice. We can do a modicum of
1889 * automatic mitigation by making sure there is a per-thread return from
1890 * setlocale(), and that a mutex protects it from races */
1892 # define querylocale_r(cat) \
1893 mortalized_pv_copy(less_dicey_setlocale_r(cat, NULL))
1894 # define querylocale_c(cat) querylocale_r(cat)
1895 # define querylocale_i(i) querylocale_r(categories[i])
1898 S_less_dicey_setlocale_r(pTHX_ const int category, const char * locale)
1900 const char * retval;
1902 PERL_ARGS_ASSERT_LESS_DICEY_SETLOCALE_R;
1904 STDIZED_SETLOCALE_LOCK;
1906 retval = save_to_buffer(stdized_setlocale(category, locale),
1907 &PL_less_dicey_locale_buf,
1908 &PL_less_dicey_locale_bufsize);
1910 STDIZED_SETLOCALE_UNLOCK;
1915 /*---------------------------------------------------------------------------*/
1917 # define bool_setlocale_r(cat, locale) \
1918 less_dicey_bool_setlocale_r(cat, locale)
1919 # define bool_setlocale_i(i, locale) \
1920 bool_setlocale_r(categories[i], locale)
1921 # define bool_setlocale_c(cat, locale) bool_setlocale_r(cat, locale)
1924 S_less_dicey_bool_setlocale_r(pTHX_ const int cat, const char * locale)
1928 PERL_ARGS_ASSERT_LESS_DICEY_BOOL_SETLOCALE_R;
1930 /* Unlikely, but potentially possible that another thread could zap the
1931 * buffer from true to false or vice-versa, so need to lock here */
1932 POSIX_SETLOCALE_LOCK;
1933 retval = cBOOL(posix_setlocale(cat, locale));
1934 POSIX_SETLOCALE_UNLOCK;
1939 /*---------------------------------------------------------------------------*/
1941 # define void_setlocale_r_with_caller(cat, locale, file, line) \
1943 if (! bool_setlocale_r(cat, locale)) \
1944 setlocale_failure_panic_via_i(get_category_index(cat), \
1945 NULL, locale, __LINE__, 0, \
1949 # define void_setlocale_c_with_caller(cat, locale, file, line) \
1950 void_setlocale_r_with_caller(cat, locale, file, line)
1952 # define void_setlocale_i_with_caller(i, locale, file, line) \
1953 void_setlocale_r_with_caller(categories[i], locale, file, line)
1955 # define void_setlocale_r(cat, locale) \
1956 void_setlocale_r_with_caller(cat, locale, __FILE__, __LINE__)
1957 # define void_setlocale_c(cat, locale) void_setlocale_r(cat, locale)
1958 # define void_setlocale_i(i, locale) void_setlocale_r(categories[i], locale)
1960 /*===========================================================================*/
1962 #elif defined(USE_POSIX_2008_LOCALE)
1964 # error This code assumes that LC_ALL is available on a system modern enough to have POSIX 2008
1967 /* Here, there is a completely different API to get thread-safe locales. We
1968 * emulate the setlocale() API with our own function(s). setlocale categories,
1969 * like LC_NUMERIC, are not valid here for the POSIX 2008 API. Instead, there
1970 * are equivalents, like LC_NUMERIC_MASK, which we use instead, which we find
1971 * by table lookup. */
1973 # if defined(__GLIBC__) && defined(USE_LOCALE_MESSAGES)
1974 /* https://sourceware.org/bugzilla/show_bug.cgi?id=24936 */
1975 # define HAS_GLIBC_LC_MESSAGES_BUG
1976 # include <libintl.h>
1979 # define querylocale_i(i) querylocale_2008_i(i, __LINE__)
1980 # define querylocale_c(cat) querylocale_i(cat##_INDEX_)
1981 # define querylocale_r(cat) querylocale_i(get_category_index(cat))
1984 S_querylocale_2008_i(pTHX_ const locale_category_index index,
1985 const line_t caller_line)
1987 PERL_ARGS_ASSERT_QUERYLOCALE_2008_I;
1988 assert(index <= LC_ALL_INDEX_);
1990 /* This function returns the name of the locale category given by the input
1991 * 'index' into our parallel tables of them.
1993 * POSIX 2008, for some sick reason, chose not to provide a method to find
1994 * the category name of a locale, disregarding a basic linguistic tenet
1995 * that for any object, people will create a name for it. (The next
1996 * version of the POSIX standard is proposed to fix this.) Some vendors
1997 * have created a querylocale() function to do this in the meantime. On
1998 * systems without querylocale(), we have to keep track of what the locale
1999 * has been set to, so that we can return its name so as to emulate
2000 * setlocale(). There are potential problems with this:
2002 * 1) We don't know what calling newlocale() with the locale argument ""
2003 * actually does. It gets its values from the program's environment.
2004 * find_locale_from_environment() is used to work around this. But it
2005 * isn't fool-proof. See the comments for that function for details.
2006 * 2) It's possible for C code in some library to change the locale
2007 * without us knowing it, and thus our records become wrong;
2008 * querylocale() would catch this. But as of September 2017, there
2009 * are no occurrences in CPAN of uselocale(). Some libraries do use
2010 * setlocale(), but that changes the global locale, and threads using
2011 * per-thread locales will just ignore those changes.
2012 * 3) Many systems have multiple names for the same locale. Generally,
2013 * there is an underlying base name, with aliases that evaluate to it.
2014 * On some systems, if you set the locale to an alias, and then
2015 * retrieve the name, you get the alias as expected; but on others you
2016 * get the base name, not the alias you used. And sometimes the
2017 * charade is incomplete. See
2018 * https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269375.
2020 * The code is structured so that the returned locale name when the
2021 * locale is changed is whatever the result of querylocale() on the
2022 * new locale is. This effectively gives the result the system
2023 * expects. Without querylocale, the name returned is always the
2024 * input name. Theoretically this could cause problems, but khw knows
2025 * of none so far, but mentions it here in case you are trying to
2026 * debug something. (This could be worked around by messing with the
2027 * global locale temporarily, using setlocale() to get the base name;
2028 * but that could cause a race. The comments for
2029 * find_locale_from_environment() give details on the potential race.)
2032 const locale_t cur_obj = uselocale((locale_t) 0);
2033 const char * retval;
2035 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "querylocale_2008_i(%s) on %p;"
2036 " called from %" LINE_Tf "\n",
2037 category_names[index], cur_obj,
2040 if (UNLIKELY(cur_obj == LC_GLOBAL_LOCALE)) {
2042 /* Even on platforms that have querylocale(), it is unclear if they
2043 * work in the global locale, and we have the means to get the correct
2044 * answer anyway. khw is unsure this situation even comes up these
2045 * days, hence the branch prediction */
2046 POSIX_SETLOCALE_LOCK;
2047 retval = mortalized_pv_copy(posix_setlocale(categories[index], NULL));
2048 POSIX_SETLOCALE_UNLOCK;
2051 /* Here we have handled the case of the the current locale being the global
2052 * one. Below is the 'else' case of that. There are two different
2053 * implementations, depending on USE_PL_CURLOCALES */
2055 # ifdef USE_PL_CURLOCALES
2059 /* PL_curlocales[] is kept up-to-date for all categories except LC_ALL,
2060 * which may have been invalidated by setting it to NULL, and if so,
2061 * should now be calculated. (The called function updates that
2063 if (index == LC_ALL_INDEX_ && PL_curlocales[LC_ALL_INDEX_] == NULL) {
2064 calculate_LC_ALL_string((const char **) &PL_curlocales,
2070 if (cur_obj == PL_C_locale_obj) {
2072 /* If the current locale object is the C object, then the answer is
2073 * "C" or POSIX, regardless of the category. Handling this
2074 * reasonably likely case specially shortcuts extra effort, and
2075 * hides some bugs from us in OS's that alias other locales to C,
2076 * but do so incompletely. If our records say it is POSIX, use
2077 * that; otherwise use C. See
2078 * https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269375 */
2079 retval = mortalized_pv_copy((strEQ(PL_curlocales[index], "POSIX"))
2084 retval = mortalized_pv_copy(PL_curlocales[index]);
2090 /* Below is the implementation of the 'else' clause which handles the case
2091 * of the current locale not being the global one on platforms where
2092 * USE_PL_CURLOCALES is NOT in effect. That means the system must have
2093 * some form of querylocale. But these have varying characteristics, so
2094 * first create some #defines to make the actual 'else' clause uniform.
2096 * First, glibc has a function that implements querylocale(), but is called
2097 * something else, and takes the category number; the others take the mask.
2099 # if defined(USE_QUERYLOCALE) && ( defined(_NL_LOCALE_NAME) \
2100 && defined(HAS_NL_LANGINFO_L))
2101 # define my_querylocale(index, cur_obj) \
2102 nl_langinfo_l(_NL_LOCALE_NAME(categories[index]), cur_obj)
2104 /* Experience so far shows it is thread-safe, as well as glibc's
2105 * nl_langinfo_l(), so unless overridden, mark it so */
2106 # ifdef NO_THREAD_SAFE_QUERYLOCALE
2107 # undef HAS_THREAD_SAFE_QUERYLOCALE
2109 # define HAS_THREAD_SAFE_QUERYLOCALE
2111 # else /* below, ! glibc */
2113 /* Otherwise, use the system's querylocale(). */
2114 # define my_querylocale(index, cur_obj) \
2115 querylocale(category_masks[index], cur_obj)
2117 /* There is no standard for this function, and khw has never seen
2118 * anything beyond minimal vendor documentation, lacking important
2119 * details. Experience has shown that some implementations have race
2120 * condiions, and their returns may not be thread safe. It would be
2121 * unreliable to test for complete thread safety in Configure. What we
2122 * do instead is to assume that it is thread-safe, unless overriden by,
2123 * say, a hints file specifying
2124 * -Accflags='-DNO_THREAD_SAFE_QUERYLOCALE */
2125 # ifdef NO_THREAD_SAFE_QUERYLOCALE
2126 # undef HAS_THREAD_SAFE_QUERYLOCALE
2128 # define HAS_THREAD_SAFE_QUERYLOCALE
2132 /* Here, we have set up enough information to know if this querylocale()
2133 * is thread-safe, or needs to use a mutex */
2134 # ifdef HAS_THREAD_SAFE_QUERYLOCALE
2135 # define QUERYLOCALE_LOCK
2136 # define QUERYLOCALE_UNLOCK
2138 # define QUERYLOCALE_LOCK gwLOCALE_LOCK
2139 # define QUERYLOCALE_UNLOCK gwLOCALE_UNLOCK
2142 /* Finally, everything is ready, so here is the 'else' clause to implement
2143 * the case of the current locale not being the global one on systems that
2144 * have some form of querylocale(). (POSIX will presumably eventually
2145 * publish their next version in their pipeline, which will define a
2146 * precisely specified querylocale equivalent, and there can be a new
2147 * #ifdef to use it without having to guess at its characteristics) */
2150 /* We don't keep records when there is querylocale(), so as to avoid the
2151 * pitfalls mentioned at the beginning of this function.
2153 * That means LC_ALL has to be calculated from all its constituent
2154 * categories each time, since the querylocale() forms on many (if not
2155 * all) platforms only work on individual categories */
2156 if (index == LC_ALL_INDEX_) {
2157 retval = calculate_LC_ALL_string(NULL, INTERNAL_FORMAT,
2164 retval = savepv(my_querylocale(index, cur_obj));
2167 /* querylocale() may conflate the C locale with something that
2168 * isn't exactly the same. See for example
2169 * https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269375
2170 * We know that if the locale object is the C one, we
2171 * are in the C locale, which may go by the name POSIX, as both, by
2172 * definition, are equivalent. But we consider any other name
2173 * spurious, so override with "C". As in the PL_CURLOCALES case
2174 * above, this hides those glitches, for the most part, from the
2175 * rest of our code. (The code is ordered this way so that if the
2176 * system distinugishes "C" from "POSIX", we do too.) */
2177 if (cur_obj == PL_C_locale_obj && ! isNAME_C_OR_POSIX(retval)) {
2179 retval = savepv("C");
2186 # undef QUERYLOCALE_LOCK
2187 # undef QUERYLOCALE_UNLOCK
2190 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2191 "querylocale_2008_i(%s) returning '%s'\n",
2192 category_names[index], retval));
2193 assert(strNE(retval, ""));
2197 /*---------------------------------------------------------------------------*/
2199 # define bool_setlocale_i(i, locale) \
2200 bool_setlocale_2008_i(i, locale, __LINE__)
2201 # define bool_setlocale_c(cat, locale) \
2202 bool_setlocale_i(cat##_INDEX_, locale)
2203 # define bool_setlocale_r(cat, locale) \
2204 bool_setlocale_i(get_category_index(cat), locale)
2206 /* If this doesn't exist on this platform, make it a no-op (to save #ifdefs) */
2207 # ifndef update_PL_curlocales_i
2208 # define update_PL_curlocales_i(index, new_locale, caller_line)
2212 S_bool_setlocale_2008_i(pTHX_
2214 /* Our internal index of the 'category' setlocale is called with */
2215 const locale_category_index index,
2216 const char * new_locale, /* The locale to set the category to */
2217 const line_t caller_line /* Called from this line number */
2220 PERL_ARGS_ASSERT_BOOL_SETLOCALE_2008_I;
2221 assert(index <= LC_ALL_INDEX_);
2223 /* This function effectively performs a setlocale() on just the current
2224 * thread; thus it is thread-safe. It does this by using the POSIX 2008
2225 * locale functions to emulate the behavior of setlocale(). Similar to
2226 * regular setlocale(), the return from this function points to memory that
2227 * can be overwritten by other system calls, so needs to be copied
2228 * immediately if you need to retain it. The difference here is that
2229 * system calls besides another setlocale() can overwrite it.
2231 * By doing this, most locale-sensitive functions become thread-safe. The
2232 * exceptions are mostly those that return a pointer to static memory.
2235 int mask = category_masks[index];
2236 const locale_t entry_obj = uselocale((locale_t) 0);
2237 const char * locale_on_entry = querylocale_i(index);
2239 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2240 "bool_setlocale_2008_i: input=%d (%s), mask=0x%x,"
2241 " new locale=\"%s\", current locale=\"%s\","
2242 " index=%d, entry object=%p;"
2243 " called from %" LINE_Tf "\n",
2244 categories[index], category_names[index], mask,
2245 ((new_locale == NULL) ? "(nil)" : new_locale),
2246 locale_on_entry, index, entry_obj, caller_line));
2248 /* Here, trying to change the locale, but it is a no-op if the new boss is
2249 * the same as the old boss. Except this routine is called when converting
2250 * from the global locale, so in that case we will create a per-thread
2251 * locale below (with the current values). It also seemed that newlocale()
2252 * could free up the basis locale memory if we called it with the new and
2253 * old being the same, but khw now thinks that this was due to some other
2254 * bug, since fixed, as there are other places where newlocale() gets
2255 * similarly called without problems. */
2256 if ( entry_obj != LC_GLOBAL_LOCALE
2258 && strEQ(new_locale, locale_on_entry))
2260 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2261 "bool_setlocale_2008_i: no-op to change to"
2262 " what it already was\n"));
2266 # ifndef USE_QUERYLOCALE
2268 /* Without a querylocale() mechanism, we have to figure out ourselves what
2269 * happens with setting a locale to "" */
2271 if (strEQ(new_locale, "")) {
2272 new_locale = find_locale_from_environment(index);
2280 # ifdef NEWLOCALE_HANDLES_DISPARATE_LC_ALL
2282 const bool need_loop = false;
2286 bool need_loop = false;
2287 const char * new_locales[LC_ALL_INDEX_] = { NULL };
2289 /* If we're going to have to parse the LC_ALL string, might as well do it
2290 * now before we have made changes that we would have to back out of if the
2292 if (index == LC_ALL_INDEX_) {
2293 switch (parse_LC_ALL_string(new_locale,
2294 (const char **) &new_locales,
2295 override_if_ignored,
2296 false, /* Return only [0] if suffices */
2297 false, /* Don't panic on error */
2308 case only_element_0:
2309 SAVEFREEPV(new_locales[0]);
2310 new_locale = new_locales[0];
2321 # ifdef HAS_GLIBC_LC_MESSAGES_BUG
2323 /* For this bug, if the LC_MESSAGES locale changes, we have to do an
2324 * expensive workaround. Save the current value so we can later determine
2326 const char * old_messages_locale = NULL;
2327 if ( (index == LC_MESSAGES_INDEX_ || index == LC_ALL_INDEX_)
2328 && LIKELY(PL_phase != PERL_PHASE_CONSTRUCT))
2330 old_messages_locale = querylocale_c(LC_MESSAGES);
2335 assert(PL_C_locale_obj);
2337 /* Now ready to switch to the input 'new_locale' */
2339 /* Switching locales generally entails freeing the current one's space (at
2340 * the C library's discretion), hence we can't be using that locale at the
2341 * time of the switch (this wasn't obvious to khw from the man pages). So
2342 * switch to a known locale object that we don't otherwise mess with. */
2343 if (! uselocale(PL_C_locale_obj)) {
2345 /* Not being able to change to the C locale is severe; don't keep
2347 setlocale_failure_panic_i(index, locale_on_entry, "C",
2348 __LINE__, caller_line);
2349 NOT_REACHED; /* NOTREACHED */
2352 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2353 "bool_setlocale_2008_i: now using C"
2354 " object=%p\n", PL_C_locale_obj));
2356 /* These two objects are special:
2357 * LC_GLOBAL_LOCALE because it is undefined behavior to call
2358 * newlocale() with it as a parameter.
2359 * PL_C_locale_obj because newlocale() generally destroys its locale
2360 * object parameter when it succeeds; and we don't
2361 * want that happening to this immutable object.
2362 * Copies will be made for them to use instead if we get so far as to call
2364 bool entry_obj_is_special = ( entry_obj == LC_GLOBAL_LOCALE
2365 || entry_obj == PL_C_locale_obj);
2368 /* PL_C_locale_obj is LC_ALL set to the C locale. If this call is to
2369 * switch to LC_ALL => C, simply use that object. But in fact, we already
2370 * have switched to it just above, in preparation for the general case.
2371 * Since we're already there, no need to do further switching. */
2372 if (mask == LC_ALL_MASK && isNAME_C_OR_POSIX(new_locale)) {
2373 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2374 "bool_setlocale_2008_i: will stay in C"
2376 new_obj = PL_C_locale_obj;
2378 /* 'entry_obj' is now dangling, of no further use to anyone (unless it
2379 * is one of the special ones). Free it to avoid a leak */
2380 if (! entry_obj_is_special) {
2381 freelocale(entry_obj);
2384 update_PL_curlocales_i(index, new_locale, caller_line);
2386 else { /* Here is the general case, not to LC_ALL => C */
2388 /* The newlocale() call(s) below take a basis object to build upon to
2389 * create the changed locale, trashing it iff successful.
2391 * For the objects that are not to be modified by this function, we
2392 * create a duplicate that gets trashed instead.
2394 * Also if we will have to loop doing multiple newlocale()s, there is a
2395 * chance we will succeed for the first few, and then fail, having to
2396 * back out. We need to duplicate 'entry_obj' in this case as well, so
2397 * it remains valid as something to back out to. */
2398 locale_t basis_obj = entry_obj;
2400 if (entry_obj_is_special || need_loop) {
2401 basis_obj = duplocale(basis_obj);
2403 locale_panic_via_("duplocale failed", __FILE__, caller_line);
2404 NOT_REACHED; /* NOTREACHED */
2407 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2408 "bool_setlocale_2008_i created %p by"
2409 " duping the input\n", basis_obj));
2412 # define DEBUG_NEW_OBJECT_CREATED(category, locale, new, old, caller_line) \
2413 DEBUG_Lv(PerlIO_printf(Perl_debug_log, \
2414 "bool_setlocale_2008_i(%s, %s): created %p" \
2415 " while freeing %p; called from %" LINE_Tf \
2416 " via %" LINE_Tf "\n", \
2417 category, locale, new, old, \
2418 caller_line, __LINE__))
2419 # define DEBUG_NEW_OBJECT_FAILED(category, locale, basis_obj) \
2420 DEBUG_L(PerlIO_printf(Perl_debug_log, \
2421 "bool_setlocale_2008_i: creating new object" \
2422 " for (%s '%s') from %p failed; called from %" \
2423 LINE_Tf " via %" LINE_Tf "\n", \
2424 category, locale, basis_obj, \
2425 caller_line, __LINE__));
2427 /* Ready to create a new locale by modification of the existing one.
2429 * NOTE: This code may incorrectly show up as a leak under the address
2430 * sanitizer. We do not free this object under normal teardown, however
2431 * you can set PERL_DESTRUCT_LEVEL=2 to cause it to be freed.
2434 # ifdef NEWLOCALE_HANDLES_DISPARATE_LC_ALL
2436 /* Some platforms have a newlocale() that can handle disparate LC_ALL
2437 * input, so on these a single call to newlocale() always works */
2440 /* If a single call to newlocale() will do */
2446 new_obj = newlocale(mask,
2447 override_ignored_category(index, new_locale),
2450 DEBUG_NEW_OBJECT_FAILED(category_names[index], new_locale,
2453 /* Since the call failed, it didn't trash 'basis_obj', which is
2454 * a dup for these objects, and hence would leak if we don't
2455 * free it. XXX However, something is seriously wrong if we
2456 * can't switch to C or the global locale, so maybe should
2458 if (entry_obj_is_special) {
2459 freelocale(basis_obj);
2462 goto must_restore_state;
2465 DEBUG_NEW_OBJECT_CREATED(category_names[index], new_locale,
2466 new_obj, basis_obj, caller_line);
2468 update_PL_curlocales_i(index, new_locale, caller_line);
2471 # ifndef NEWLOCALE_HANDLES_DISPARATE_LC_ALL
2473 else { /* Need multiple newlocale() calls */
2475 /* Loop through the individual categories, setting the locale of
2476 * each to the corresponding name previously populated into
2477 * newlocales[]. Each iteration builds on the previous one, adding
2478 * its category to what's already been calculated, and taking as a
2479 * basis for what's been calculated 'basis_obj', which is updated
2480 * each iteration to be the result of the previous one. Upon
2481 * success, newlocale() trashes the 'basis_obj' parameter to it.
2482 * If any iteration fails, we immediately give up, restore the
2483 * locale to what it was at the time this function was called
2484 * (saved in 'entry_obj'), and return failure. */
2486 /* Loop, using the previous iteration's result as the basis for the
2487 * next one. (The first time we effectively use the locale in
2488 * force upon entry to this function.) */
2489 for_all_individual_category_indexes(i) {
2490 new_obj = newlocale(category_masks[i],
2494 DEBUG_NEW_OBJECT_CREATED(category_names[i],
2498 basis_obj = new_obj;
2502 /* Failed. Likely this is because the proposed new locale
2503 * isn't valid on this system. */
2505 DEBUG_NEW_OBJECT_FAILED(category_names[i],
2509 /* newlocale() didn't trash this, since the function call
2511 freelocale(basis_obj);
2513 for_all_individual_category_indexes(j) {
2514 Safefree(new_locales[j]);
2517 goto must_restore_state;
2520 /* Success for all categories. */
2521 for_all_individual_category_indexes(i) {
2522 update_PL_curlocales_i(i, new_locales[i], caller_line);
2523 Safefree(new_locales[i]);
2526 /* We dup'd entry_obj in case we had to fall back to it. The
2527 * newlocale() above destroyed the dup when it first succeeded, but
2528 * entry_obj itself is left dangling, so free it */
2529 if (! entry_obj_is_special) {
2530 freelocale(entry_obj);
2534 # endif /* End of newlocale can't handle disparate LC_ALL input */
2538 # undef DEBUG_NEW_OBJECT_CREATED
2539 # undef DEBUG_NEW_OBJECT_FAILED
2541 /* Here, successfully created an object representing the desired locale;
2542 * now switch into it */
2543 if (! uselocale(new_obj)) {
2544 freelocale(new_obj);
2545 locale_panic_(Perl_form(aTHX_ "(called from %" LINE_Tf "):"
2546 " bool_setlocale_2008_i: switching"
2547 " into new locale failed",
2551 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2552 "bool_setlocale_2008_i: now using %p\n", new_obj));
2554 # ifdef MULTIPLICITY /* Unlikely, but POSIX 2008 functions could be
2555 Configured to be used on unthreaded perls, in which
2556 case this object doesn't exist */
2558 if (DEBUG_Lv_TEST) {
2559 if (PL_cur_locale_obj != new_obj) {
2560 PerlIO_printf(Perl_debug_log,
2561 "bool_setlocale_2008_i: PL_cur_locale_obj"
2562 " was %p, now is %p\n",
2563 PL_cur_locale_obj, new_obj);
2567 /* Update the current object */
2568 PL_cur_locale_obj = new_obj;
2571 # ifdef HAS_GLIBC_LC_MESSAGES_BUG
2573 /* Invalidate the glibc cache of loaded translations if the locale has
2574 * changed, see [perl #134264] and
2575 * https://sourceware.org/bugzilla/show_bug.cgi?id=24936 */
2576 if (old_messages_locale) {
2577 if (strNE(old_messages_locale, querylocale_c(LC_MESSAGES))) {
2578 textdomain(textdomain(NULL));
2588 /* We earlier switched to the LC_ALL => C locale in anticipation of it
2589 * succeeding, Now have to switch back to the state upon entry. */
2590 if (! uselocale(entry_obj)) {
2591 setlocale_failure_panic_i(index, "switching back to",
2592 locale_on_entry, __LINE__, caller_line);
2598 /*---------------------------------------------------------------------------*/
2600 # define void_setlocale_i_with_caller(i, locale, file, line) \
2602 if (! bool_setlocale_i(i, locale)) \
2603 setlocale_failure_panic_via_i(i, NULL, locale, __LINE__, 0, \
2607 # define void_setlocale_r_with_caller(cat, locale, file, line) \
2608 void_setlocale_i_with_caller(get_category_index(cat), locale, \
2611 # define void_setlocale_c_with_caller(cat, locale, file, line) \
2612 void_setlocale_i_with_caller(cat##_INDEX_, locale, file, line)
2614 # define void_setlocale_i(i, locale) \
2615 void_setlocale_i_with_caller(i, locale, __FILE__, __LINE__)
2616 # define void_setlocale_c(cat, locale) \
2617 void_setlocale_i(cat##_INDEX_, locale)
2618 # define void_setlocale_r(cat, locale) \
2619 void_setlocale_i(get_category_index(cat), locale)
2621 /*===========================================================================*/
2624 # error Unexpected Configuration
2625 #endif /* End of the various implementations of the setlocale and
2626 querylocale macros used in the remainder of this program */
2628 /* query_nominal_locale_i() is used when the caller needs the locale that an
2629 * external caller would be expecting, and not what we're secretly using
2630 * behind the scenes. It deliberately doesn't handle LC_ALL; use
2631 * calculate_LC_ALL_string() for that. */
2632 #ifdef USE_LOCALE_NUMERIC
2633 # define query_nominal_locale_i(i) \
2634 (__ASSERT_(i != LC_ALL_INDEX_) \
2635 ((i == LC_NUMERIC_INDEX_) ? PL_numeric_name : querylocale_i(i)))
2637 # define query_nominal_locale_i(i) \
2638 (__ASSERT_(i != LC_ALL_INDEX_) querylocale_i(i))
2641 #ifdef USE_PL_CURLOCALES
2644 S_update_PL_curlocales_i(pTHX_
2645 const locale_category_index index,
2646 const char * new_locale,
2647 const line_t caller_line)
2649 /* Update PL_curlocales[], which is parallel to the other ones indexed by
2650 * our mapping of libc category number to our internal equivalents. */
2652 PERL_ARGS_ASSERT_UPDATE_PL_CURLOCALES_I;
2653 assert(index <= LC_ALL_INDEX_);
2655 if (index == LC_ALL_INDEX_) {
2657 /* For LC_ALL, we change all individual categories to correspond,
2658 * including the LC_ALL element */
2659 for (unsigned int i = 0; i <= LC_ALL_INDEX_; i++) {
2660 Safefree(PL_curlocales[i]);
2661 PL_curlocales[i] = NULL;
2664 switch (parse_LC_ALL_string(new_locale,
2665 (const char **) &PL_curlocales,
2666 check_that_overridden, /* things should
2670 true, /* Always fill array */
2671 true, /* Panic if fails, as to get here
2672 it earlier had to have succeeded
2678 case only_element_0:
2679 locale_panic_via_("Unexpected return from parse_LC_ALL_string",
2680 __FILE__, caller_line);
2683 /* parse_LC_ALL_string() has already filled PL_curlocales properly,
2684 * except for the LC_ALL element, which should be set to
2686 PL_curlocales[LC_ALL_INDEX_] = savepv(new_locale);
2689 else { /* Not LC_ALL */
2691 /* Update the single category's record */
2692 Safefree(PL_curlocales[index]);
2693 PL_curlocales[index] = savepv(new_locale);
2695 /* Invalidate LC_ALL */
2696 Safefree(PL_curlocales[LC_ALL_INDEX_]);
2697 PL_curlocales[LC_ALL_INDEX_] = NULL;
2701 # endif /* Need PL_curlocales[] */
2703 /*===========================================================================*/
2705 #if defined(USE_LOCALE)
2707 /* This paradigm is needed in several places in the function below. We have to
2708 * substitute the nominal locale for LC_NUMERIC when returning a value for
2709 * external consumption */
2710 # ifndef USE_LOCALE_NUMERIC
2711 # define ENTRY(i, array, format) array[i]
2713 # define ENTRY(i, array, format) \
2714 (UNLIKELY( format == EXTERNAL_FORMAT_FOR_QUERY \
2715 && i == LC_NUMERIC_INDEX_) \
2722 S_calculate_LC_ALL_string(pTHX_ const char ** category_locales_list,
2723 const calc_LC_ALL_format format,
2724 const calc_LC_ALL_return returning,
2725 const line_t caller_line)
2727 PERL_ARGS_ASSERT_CALCULATE_LC_ALL_STRING;
2729 /* NOTE: On Configurations that have PL_curlocales[], this function has the
2730 * side effect of updating the LC_ALL_INDEX_ element with its result.
2732 * This function calculates a string that defines the locale(s) LC_ALL is
2733 * set to, in either:
2734 * 1) Our internal format if 'format' is set to INTERNAL_FORMAT.
2735 * 2) The external format returned by Perl_setlocale() if 'format' is set
2736 * to EXTERNAL_FORMAT_FOR_QUERY or EXTERNAL_FORMAT_FOR_SET.
2738 * These two are distinguished by:
2739 * a) EXTERNAL_FORMAT_FOR_SET returns the actual locale currently in
2741 * b) EXTERNAL_FORMAT_FOR_QUERY returns the nominal locale.
2742 * Currently this can differ only from the actual locale in the
2743 * LC_NUMERIC category when it is set to a locale whose radix is
2744 * not a dot. (The actual locale is kept as a dot to accommodate
2745 * the large corpus of XS code that expects it to be that;
2746 * switched to a non-dot temporarily during certain operations
2747 * that require the actual radix.)
2749 * In both 1) and 2), LC_ALL's values are passed to this function by
2750 * 'category_locales_list' which is either:
2751 * 1) a pointer to an array of strings with up-to-date values of all the
2752 * individual categories; or
2753 * 2) NULL, to indicate to use querylocale_i() to get each individual
2756 * The caller sets 'returning' to
2757 * WANT_TEMP_PV the function returns the calculated string
2758 * as a mortalized temporary, so the caller
2759 * doesn't have to worry about it being
2760 * per-thread, nor needs to arrange for its
2762 * WANT_PL_setlocale_buf the function stores the calculated string
2763 * into the per-thread buffer PL_setlocale_buf
2764 * and returns a pointer to that. The buffer
2765 * is cleaned up automatically in process
2766 * destruction. This return method avoids
2767 * extra copies in some circumstances.
2768 * WANT_VOID NULL is returned. This is used when the
2769 * function is being called only for its side
2770 * effect of updating
2771 * PL_curlocales[LC_ALL_INDEX_]
2773 * querylocale(), on systems that have it, doesn't tend to work for LC_ALL.
2774 * So we have to construct the answer ourselves based on the passed in
2777 * If all individual categories are the same locale, we can just set LC_ALL
2778 * to that locale. But if not, we have to create an aggregation of all the
2779 * categories on the system. Platforms differ as to the syntax they use
2780 * for these non-uniform locales for LC_ALL. Some, like glibc and Windows,
2781 * use an unordered series of name=value pairs, like
2782 * LC_NUMERIC=C;LC_TIME=en_US.UTF-8;...
2783 * to specify LC_ALL; others, like *BSD, use a positional notation with a
2784 * delimitter, typically a single '/' character:
2787 * When the external format is desired, this function returns whatever the
2788 * system expects. The internal format is always name=value pairs.
2790 * For systems that have categories we don't know about, the algorithm
2791 * below won't know about those missing categories, leading to potential
2792 * bugs for code that looks at them. If there is an environment variable
2793 * that sets that category, we won't know to look for it, and so our use of
2794 * LANG or "C" improperly overrides it. On the other hand, if we don't do
2795 * what is done here, and there is no environment variable, the category's
2796 * locale should be set to LANG or "C". So there is no good solution. khw
2797 * thinks the best is to make sure we have a complete list of possible
2798 * categories, adding new ones as they show up on obscure platforms.
2801 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2802 "Entering calculate_LC_ALL_string(%s);"
2803 " called from %" LINE_Tf "\n",
2804 ((format == EXTERNAL_FORMAT_FOR_QUERY)
2805 ? "EXTERNAL_FORMAT_FOR_QUERY"
2806 : ((format == EXTERNAL_FORMAT_FOR_SET)
2807 ? "EXTERNAL_FORMAT_FOR_SET"
2808 : "INTERNAL_FORMAT")),
2811 bool input_list_was_NULL = (category_locales_list == NULL);
2813 /* If there was no input category list, construct a temporary one
2815 const char * my_category_locales_list[LC_ALL_INDEX_];
2816 const char ** locales_list = category_locales_list;
2817 if (locales_list == NULL) {
2818 locales_list = my_category_locales_list;
2820 if (format == EXTERNAL_FORMAT_FOR_QUERY) {
2821 for_all_individual_category_indexes(i) {
2822 locales_list[i] = query_nominal_locale_i(i);
2826 for_all_individual_category_indexes(i) {
2827 locales_list[i] = querylocale_i(i);
2832 /* While we are calculating LC_ALL, we see if every category's locale is
2833 * the same as every other's or not. */
2834 # ifndef HAS_IGNORED_LOCALE_CATEGORIES_
2836 /* When we pay attention to all categories, we assume they are all the same
2837 * until proven different */
2838 bool disparate = false;
2842 /* But if there are ignored categories, those will be set to "C", so try an
2843 * arbitrary category, and if it isn't C, we know immediately that the
2844 * locales are disparate. (The #if conditionals are to handle the case
2845 * where LC_NUMERIC_INDEX_ is 0. We don't want to use LC_NUMERIC to
2846 * compare, as that may be different between external and internal forms.)
2848 # if ! defined(USE_LOCALE_NUMERIC)
2850 bool disparate = ! isNAME_C_OR_POSIX(locales_list[0]);
2852 # elif LC_NUMERIC_INDEX_ != 0
2854 bool disparate = ! isNAME_C_OR_POSIX(locales_list[0]);
2858 /* Would need revision to handle the very unlikely case where only a single
2859 * category, LC_NUMERIC, is defined */
2860 assert(LOCALE_CATEGORIES_COUNT_ > 0);
2862 bool disparate = ! isNAME_C_OR_POSIX(locales_list[1]);
2867 /* Calculate the needed size for the string listing the individual locales.
2868 * Initialize with values known at compile time. */
2870 const char *separator;
2872 # ifdef PERL_LC_ALL_USES_NAME_VALUE_PAIRS /* Positional formatted LC_ALL */
2873 PERL_UNUSED_ARG(format);
2876 if (format != INTERNAL_FORMAT) {
2878 /* Here, we will be using positional notation. it includes n-1
2880 total_len = ( LOCALE_CATEGORIES_COUNT_ - 1)
2881 * STRLENs(PERL_LC_ALL_SEPARATOR)
2882 + 1; /* And a trailing NUL */
2883 separator = PERL_LC_ALL_SEPARATOR;
2890 /* name=value output is always used in internal format, and when
2891 * positional isn't available on the platform. */
2892 total_len = lc_all_boiler_plate_length;
2896 /* The total length then is just the sum of the above boiler-plate plus the
2897 * total strlen()s of the locale name of each individual category. */
2898 for_all_individual_category_indexes(i) {
2899 const char * entry = ENTRY(i, locales_list, format);
2901 total_len += strlen(entry);
2902 if (! disparate && strNE(entry, locales_list[0])) {
2907 bool free_if_void_return = false;
2908 const char * retval;
2910 /* If all categories have the same locale, we already know the answer */
2912 if (returning == WANT_PL_setlocale_buf) {
2913 save_to_buffer(locales_list[0],
2915 &PL_setlocale_bufsize);
2916 retval = PL_setlocale_buf;
2920 retval = locales_list[0];
2922 /* If a temporary is wanted for the return, and we had to create
2923 * the input list ourselves, we created it into such a temporary,
2924 * so no further work is needed; but otherwise, make a mortal copy
2925 * of this passed-in list element */
2926 if (returning == WANT_TEMP_PV && ! input_list_was_NULL) {
2927 retval = savepv(retval);
2931 /* In all cases here, there's nothing we create that needs to be
2932 * freed, so leave 'free_if_void_return' set to the default
2936 else { /* Here, not all categories have the same locale */
2940 /* If returning to PL_setlocale_buf, set up to write directly to it,
2941 * being sure it is resized to be large enough */
2942 if (returning == WANT_PL_setlocale_buf) {
2943 set_save_buffer_min_size(total_len,
2945 &PL_setlocale_bufsize);
2946 constructed = PL_setlocale_buf;
2948 else { /* Otherwise we need new memory to hold the calculated value. */
2950 Newx(constructed, total_len, char);
2952 /* If returning the new memory, it must be set up to be freed
2953 * later; otherwise at the end of this function */
2954 if (returning == WANT_TEMP_PV) {
2955 SAVEFREEPV(constructed);
2958 free_if_void_return = true;
2962 constructed[0] = '\0';
2964 /* Loop through all the categories */
2965 for_all_individual_category_indexes(j) {
2967 /* Add a separator, except before the first one */
2969 my_strlcat(constructed, separator, total_len);
2976 # ifndef PERL_LC_ALL_USES_NAME_VALUE_PAIRS
2978 if (UNLIKELY(format != INTERNAL_FORMAT)) {
2980 /* In positional notation 'j' means the position, and we have
2981 * to convert to the index 'i' */
2982 i = map_LC_ALL_position_to_index[j];
2984 entry = ENTRY(i, locales_list, format);
2985 needed_len = my_strlcat(constructed, entry, total_len);
2991 /* Below, we are to use name=value notation, either because
2992 * that's what the platform uses, or because this is the
2993 * internal format, which uses that notation regardless of the
2996 entry = ENTRY(i, locales_list, format);
2998 /* "name=locale;" */
2999 my_strlcat(constructed, category_names[i], total_len);
3000 my_strlcat(constructed, "=", total_len);
3001 needed_len = my_strlcat(constructed, entry, total_len);
3004 if (LIKELY(needed_len <= total_len)) {
3008 /* If would have overflowed, panic */
3009 locale_panic_via_(Perl_form(aTHX_
3010 "Internal length calculation wrong.\n"
3011 "\"%s\" was not entirely added to"
3012 " \"%.*s\"; needed=%zu, had=%zu",
3013 entry, (int) total_len,
3015 needed_len, total_len),
3018 } /* End of loop through the categories */
3020 retval = constructed;
3021 } /* End of the categories' locales are displarate */
3023 # if defined(USE_PL_CURLOCALES) && defined(LC_ALL)
3025 if (format == INTERNAL_FORMAT) {
3027 /* PL_curlocales[LC_ALL_INDEX_] is updated as a side-effect of this
3028 * function for internal format. */
3029 Safefree(PL_curlocales[LC_ALL_INDEX_]);
3030 PL_curlocales[LC_ALL_INDEX_] = savepv(retval);
3035 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3036 "calculate_LC_ALL_string calculated '%s'\n",
3039 if (returning == WANT_VOID) {
3040 if (free_if_void_return) {
3050 # if defined(WIN32) || ( defined(USE_POSIX_2008_LOCALE) \
3051 && ! defined(USE_QUERYLOCALE))
3054 S_find_locale_from_environment(pTHX_ const locale_category_index index)
3056 /* NB: This function may actually change the locale on Windows. It
3057 * currently is designed to be called only from setting the locale on
3058 * Windows, and POSIX 2008
3060 * This function returns the locale specified by the program's environment
3061 * for the category specified by our internal index number 'index'. It
3062 * therefore simulates:
3063 * setlocale(cat, "")
3064 * but, except for some cases in Windows, doesn't actually change the
3065 * locale; merely returns it.
3067 * The return need not be freed by the caller. This
3068 * promise relies on PerlEnv_getenv() returning a mortalized copy to us.
3070 * The simulation is needed only on certain platforms; otherwise, libc is
3071 * called with "" to get the actual value(s). The simulation is needed
3074 * 1) On Windows systems, the concept of the POSIX ordering of
3075 * environment variables is missing. To increase portability of
3076 * programs across platforms, the POSIX ordering is emulated on
3079 * 2) On POSIX 2008 systems without querylocale(), it is problematic
3080 * getting the results of the POSIX 2008 equivalent of
3082 * setlocale(category, "")
3084 * To ensure that we know exactly what those values are, we do the
3085 * setting ourselves, using the documented algorithm specified by the
3086 * POSIX standard (assuming the platform follows the Standard) rather
3087 * than use "" as the locale. This will lead to results that differ
3088 * from native behavior if the native behavior differs from the
3089 * Standard's documented value, but khw believes it is better to know
3090 * what's going on, even if different from native, than to just guess.
3092 * glibc systems differ from this standard in having a LANGUAGE
3093 * environment variable used for just LC_MESSAGES. This function does
3096 * Another option for the POSIX 2008 case would be, in a critical
3097 * section, to save the global locale's current value, and do a
3098 * straight setlocale(LC_ALL, ""). That would return our desired
3099 * values, destroying the global locale's, which we would then
3100 * restore. But that could cause races with any other thread that is
3101 * using the global locale and isn't using the mutex. And, the only
3102 * reason someone would have done that is because they are calling a
3103 * library function, like in gtk, that calls setlocale(), and which
3104 * can't be changed to use the mutex. That wouldn't be a problem if
3105 * this were to be done before any threads had switched, say during
3106 * perl construction time. But this code would still be needed for
3109 * The Windows and POSIX 2008 differ in that the ultimate fallback is "C"
3110 * in POSIX, and is the system default locale in Windows. To get that
3111 * system default value, we actually have to call setlocale() on Windows.
3114 const char * const lc_all = PerlEnv_getenv("LC_ALL");
3115 const char * locale_names[LC_ALL_INDEX_] = { NULL };
3117 /* Use any "LC_ALL" environment variable, as it overrides everything else.
3119 if (lc_all && strNE(lc_all, "")) {
3123 /* Here, no usable LC_ALL environment variable. We have to handle each
3124 * category separately. If all categories are desired, we loop through
3125 * them all. If only an individual category is desired, to avoid
3126 * duplicating logic, we use the same loop, but set up the limits so it is
3127 * only executed once, for that particular category. */
3128 locale_category_index lower, upper, offset;
3129 if (index == LC_ALL_INDEX_) {
3130 lower = (locale_category_index) 0;
3131 upper = (locale_category_index) ((int) LC_ALL_INDEX_ - 1);
3132 offset = (locale_category_index) 0;
3138 /* 'offset' is used so that the result of the single loop iteration is
3139 * stored into output[0] */
3143 /* When no LC_ALL environment variable, LANG is used as a default, but
3144 * overridden for individual categories that have corresponding environment
3145 * variables. If no LANG exists, the default is "C" on POSIX 2008, or the
3146 * system default for the category on Windows. */
3147 const char * env_lang = NULL;
3149 /* For each desired category, use any corresponding environment variable;
3150 * or the default if none such exists. */
3151 bool is_disparate = false; /* Assume is uniform until proven otherwise */
3152 for (unsigned i = lower; i <= upper; i++) {
3153 const char * const env_override = PerlEnv_getenv(category_names[i]);
3154 unsigned int j = i - offset;
3156 if (env_override && strNE(env_override, "")) {
3157 locale_names[j] = env_override;
3159 else { /* Here, no corresponding environment variable, see if LANG
3160 exists and is usable. Done this way to avoid fetching LANG
3161 unless it is actually needed */
3162 if (env_lang == NULL) {
3163 env_lang = PerlEnv_getenv("LANG");
3165 /* If not usable, set it to a non-NULL illegal value so won't
3166 * try to use it below */
3167 if (env_lang == NULL || strEQ(env_lang, "")) {
3168 env_lang = (const char *) 1;
3172 /* If a usable LANG exists, use it. */
3173 if (env_lang != NULL && env_lang != (const char *) 1) {
3174 locale_names[j] = env_lang;
3179 /* If no LANG, use the system default on Windows. */
3180 locale_names[j] = wrap_wsetlocale(categories[i], ".ACP");
3181 if (locale_names[j]) {
3182 SAVEFREEPV(locale_names[j]);
3186 { /* If nothing was found or worked, use C */
3187 locale_names[j] = "C";
3192 if (j > 0 && ! is_disparate && strNE(locale_names[0], locale_names[j]))
3194 is_disparate = true;
3197 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3198 "find_locale_from_environment i=%u, j=%u, name=%s,"
3199 " locale=%s, locale of 0th category=%s, disparate=%d\n",
3200 i, j, category_names[i],
3201 locale_names[j], locale_names[0], is_disparate));
3204 if (! is_disparate) {
3205 return locale_names[0];
3208 return calculate_LC_ALL_string(locale_names, INTERNAL_FORMAT,
3214 # if defined(DEBUGGING) || defined(USE_PERL_SWITCH_LOCALE_CONTEXT)
3217 S_get_LC_ALL_display(pTHX)
3219 return calculate_LC_ALL_string(NULL, INTERNAL_FORMAT,
3227 S_setlocale_failure_panic_via_i(pTHX_
3228 const locale_category_index cat_index,
3229 const char * current,
3230 const char * failed,
3231 const line_t proxy_caller_line,
3232 const line_t immediate_caller_line,
3233 const char * const higher_caller_file,
3234 const line_t higher_caller_line)
3236 PERL_ARGS_ASSERT_SETLOCALE_FAILURE_PANIC_VIA_I;
3238 /* Called to panic when a setlocale form unexpectedly failed for the
3239 * category determined by 'cat_index', and the locale that was in effect
3240 * (and likely still is) is 'current'. 'current' may be NULL, which causes
3241 * this function to query what it is.
3243 * The extra caller information is used for when a function acts as a
3244 * stand-in for another function, which a typical reader would more likely
3245 * think would be the caller
3247 * If a line number is 0, its stack (sort-of) frame is omitted; same if
3248 * it's the same line number as the next higher caller. */
3250 const int cat = categories[cat_index];
3251 const char * name = category_names[cat_index];
3255 if (current == NULL) {
3256 current = querylocale_i(cat_index);
3259 const char * proxy_text = "";
3260 if (proxy_caller_line != 0 && proxy_caller_line != immediate_caller_line)
3262 proxy_text = Perl_form(aTHX_ "\nCalled via %s: %" LINE_Tf,
3263 __FILE__, proxy_caller_line);
3265 if ( strNE(__FILE__, higher_caller_file)
3266 || ( immediate_caller_line != 0
3267 && immediate_caller_line != higher_caller_line))
3269 proxy_text = Perl_form(aTHX_ "%s\nCalled via %s: %" LINE_Tf,
3270 proxy_text, __FILE__,
3271 immediate_caller_line);
3274 /* 'false' in the get_displayable_string() calls makes it not think the
3275 * locale is UTF-8, so just dumps bytes. Actually figuring it out can be
3276 * too complicated for a panic situation. */
3277 const char * msg = Perl_form(aTHX_
3278 "Can't change locale for %s (%d) from '%s' to '%s'"
3281 get_displayable_string(current,
3282 current + strlen(current),
3284 get_displayable_string(failed,
3285 failed + strlen(failed),
3290 Perl_locale_panic(msg, __LINE__, higher_caller_file, higher_caller_line);
3291 NOT_REACHED; /* NOTREACHED */
3294 /* Any of these will allow us to find the RADIX */
3295 # if defined(USE_LOCALE_NUMERIC) && ( defined(HAS_SOME_LANGINFO) \
3296 || defined(HAS_LOCALECONV) \
3297 || defined(HAS_SNPRINTF))
3298 # define CAN_CALCULATE_RADIX
3300 # ifdef USE_LOCALE_NUMERIC
3303 S_new_numeric(pTHX_ const char *newnum, bool force)
3305 PERL_ARGS_ASSERT_NEW_NUMERIC;
3307 /* Called after each libc setlocale() or uselocale() call affecting
3308 * LC_NUMERIC, to tell core Perl this and that 'newnum' is the name of the
3309 * new locale, and we are switched into it. It installs this locale as the
3310 * current underlying default, and then switches to the C locale, if
3311 * necessary, so that the code that has traditionally expected the radix
3312 * character to be a dot may continue to do so.
3314 * The default locale and the C locale can be toggled between by use of the
3315 * set_numeric_underlying() and set_numeric_standard() functions, which
3316 * should probably not be called directly, but only via macros like
3317 * SET_NUMERIC_STANDARD() in perl.h.
3319 * The toggling is necessary mainly so that a non-dot radix decimal point
3320 * character can be input and output, while allowing internal calculations
3323 * This sets several interpreter-level variables:
3324 * PL_numeric_name The underlying locale's name: a copy of 'newnum'
3325 * PL_numeric_underlying A boolean indicating if the toggled state is
3326 * such that the current locale is the program's
3328 * PL_numeric_standard An int indicating if the toggled state is such
3329 * that the current locale is the C locale or
3330 * indistinguishable from the C locale. If non-zero, it
3331 * is in C; if > 1, it means it may not be toggled away
3333 * PL_numeric_underlying_is_standard A bool kept by this function
3334 * indicating that the underlying locale and the standard
3335 * C locale are indistinguishable for the purposes of
3336 * LC_NUMERIC. This happens when both of the above two
3337 * variables are true at the same time. (Toggling is a
3338 * no-op under these circumstances.) This variable is
3339 * used to avoid having to recalculate.
3340 * PL_numeric_radix_sv Contains the string that code should use for the
3341 * decimal point. It is set to either a dot or the
3342 * program's underlying locale's radix character string,
3343 * depending on the situation.
3344 * PL_underlying_radix_sv Contains the program's underlying locale's
3345 * radix character string. This is copied into
3346 * PL_numeric_radix_sv when the situation warrants. It
3347 * exists to avoid having to recalculate it when toggling.
3350 DEBUG_L( PerlIO_printf(Perl_debug_log,
3351 "Called new_numeric with %s, PL_numeric_name=%s\n",
3352 newnum, PL_numeric_name));
3354 /* If not forcing this procedure, and there isn't actually a change from
3355 * our records, do nothing. (Our records can be wrong when sync'ing to the
3356 * locale set up by an external library, hence the 'force' parameter) */
3357 if (! force && strEQ(PL_numeric_name, newnum)) {
3361 Safefree(PL_numeric_name);
3362 PL_numeric_name = savepv(newnum);
3364 /* Handle the trivial case. Since this is called at process
3365 * initialization, be aware that this bit can't rely on much being
3367 if (isNAME_C_OR_POSIX(PL_numeric_name)) {
3368 PL_numeric_standard = TRUE;
3369 PL_numeric_underlying_is_standard = TRUE;
3370 PL_numeric_underlying = TRUE;
3371 sv_setpv(PL_numeric_radix_sv, C_decimal_point);
3372 SvUTF8_off(PL_numeric_radix_sv);
3373 sv_setpv(PL_underlying_radix_sv, C_decimal_point);
3374 SvUTF8_off(PL_underlying_radix_sv);
3378 /* We are in the underlying locale until changed at the end of this
3380 PL_numeric_underlying = TRUE;
3382 char * radix = NULL;
3383 utf8ness_t utf8ness = UTF8NESS_IMMATERIAL;
3385 /* Find and save this locale's radix character. */
3386 my_langinfo_c(RADIXCHAR, LC_NUMERIC, PL_numeric_name,
3387 &radix, NULL, &utf8ness);
3388 sv_setpv(PL_underlying_radix_sv, radix);
3390 if (utf8ness == UTF8NESS_YES) {
3391 SvUTF8_on(PL_underlying_radix_sv);
3394 SvUTF8_off(PL_underlying_radix_sv);
3397 DEBUG_L(PerlIO_printf(Perl_debug_log,
3398 "Locale radix is '%s', ?UTF-8=%d\n",
3399 SvPVX(PL_underlying_radix_sv),
3400 cBOOL(SvUTF8(PL_underlying_radix_sv))));
3402 /* This locale is indistinguishable from C (for numeric purposes) if both
3403 * the radix character and the thousands separator are the same as C's.
3404 * Start with the radix. */
3405 PL_numeric_underlying_is_standard = strEQ(C_decimal_point, radix);
3408 # ifndef TS_W32_BROKEN_LOCALECONV
3410 /* If the radix isn't the same as C's, we know it is distinguishable from
3411 * C; otherwise check the thousands separator too. Only if both are the
3412 * same as C's is the locale indistinguishable from C.
3414 * But on earlier Windows versions, there is a potential race. This code
3415 * knows that localeconv() (elsewhere in this file) will be used to extract
3416 * the needed value, and localeconv() was buggy for quite a while, and that
3417 * code in this file hence uses a workaround. And that workaround may have
3418 * an (unlikely) race. Gathering the radix uses a different workaround on
3419 * Windows that doesn't involve a race. It might be possible to do the
3420 * same for this (patches welcome).
3422 * Until then khw doesn't think it's worth even the small risk of a race to
3423 * get this value, which doesn't appear to be used in any of the Microsoft
3424 * library routines anyway. */
3426 char * scratch_buffer = NULL;
3427 if (PL_numeric_underlying_is_standard) {
3428 PL_numeric_underlying_is_standard = strEQ(C_thousands_sep,
3429 my_langinfo_c(THOUSEP, LC_NUMERIC,
3434 Safefree(scratch_buffer);
3438 PL_numeric_standard = PL_numeric_underlying_is_standard;
3440 /* Keep LC_NUMERIC so that it has the C locale radix and thousands
3441 * separator. This is for XS modules, so they don't have to worry about
3442 * the radix being a non-dot. (Core operations that need the underlying
3443 * locale change to it temporarily). */
3444 if (! PL_numeric_standard) {
3445 set_numeric_standard(__FILE__, __LINE__);
3452 Perl_set_numeric_standard(pTHX_ const char * const file, const line_t line)
3454 PERL_ARGS_ASSERT_SET_NUMERIC_STANDARD;
3455 PERL_UNUSED_ARG(file); /* Some Configurations ignore these */
3456 PERL_UNUSED_ARG(line);
3458 # ifdef USE_LOCALE_NUMERIC
3460 /* Unconditionally toggle the LC_NUMERIC locale to the C locale
3462 * Most code should use the macro SET_NUMERIC_STANDARD() in perl.h
3463 * instead of calling this directly. The macro avoids calling this routine
3464 * if toggling isn't necessary according to our records (which could be
3465 * wrong if some XS code has changed the locale behind our back) */
3467 DEBUG_L(PerlIO_printf(Perl_debug_log, "Setting LC_NUMERIC locale to"
3468 " standard C; called from %s: %"
3469 LINE_Tf "\n", file, line));
3471 void_setlocale_c_with_caller(LC_NUMERIC, "C", file, line);
3472 PL_numeric_standard = TRUE;
3473 sv_setpv(PL_numeric_radix_sv, C_decimal_point);
3474 SvUTF8_off(PL_numeric_radix_sv);
3476 PL_numeric_underlying = PL_numeric_underlying_is_standard;
3478 # endif /* USE_LOCALE_NUMERIC */
3483 Perl_set_numeric_underlying(pTHX_ const char * const file, const line_t line)
3485 PERL_ARGS_ASSERT_SET_NUMERIC_UNDERLYING;
3486 PERL_UNUSED_ARG(file); /* Some Configurations ignore these */
3487 PERL_UNUSED_ARG(line);
3489 # ifdef USE_LOCALE_NUMERIC
3491 /* Unconditionally toggle the LC_NUMERIC locale to the current underlying
3494 * Most code should use the macro SET_NUMERIC_UNDERLYING() in perl.h
3495 * instead of calling this directly. The macro avoids calling this routine
3496 * if toggling isn't necessary according to our records (which could be
3497 * wrong if some XS code has changed the locale behind our back) */
3499 DEBUG_L(PerlIO_printf(Perl_debug_log, "Setting LC_NUMERIC locale to %s;"
3500 " called from %s: %" LINE_Tf "\n",
3501 PL_numeric_name, file, line));
3502 /* Maybe not in init? assert(PL_locale_mutex_depth > 0);*/
3504 void_setlocale_c_with_caller(LC_NUMERIC, PL_numeric_name, file, line);
3505 PL_numeric_underlying = TRUE;
3506 sv_setsv_nomg(PL_numeric_radix_sv, PL_underlying_radix_sv);
3508 PL_numeric_standard = PL_numeric_underlying_is_standard;
3510 # endif /* USE_LOCALE_NUMERIC */
3514 # ifdef USE_LOCALE_CTYPE
3517 S_new_ctype(pTHX_ const char *newctype, bool force)
3519 PERL_ARGS_ASSERT_NEW_CTYPE;
3520 PERL_UNUSED_ARG(force);
3522 /* Called after each libc setlocale() call affecting LC_CTYPE, to tell
3523 * core Perl this and that 'newctype' is the name of the new locale.
3525 * This function sets up the folding arrays for all 256 bytes, assuming
3526 * that tofold() is tolc() since fold case is not a concept in POSIX,
3529 DEBUG_L(PerlIO_printf(Perl_debug_log, "Entering new_ctype(%s)\n",
3532 /* No change means no-op */
3533 if (strEQ(PL_ctype_name, newctype)) {
3537 /* We will replace any bad locale warning with
3538 * 1) nothing if the new one is ok; or
3539 * 2) a new warning for the bad new locale */
3540 if (PL_warn_locale) {
3541 SvREFCNT_dec_NN(PL_warn_locale);
3542 PL_warn_locale = NULL;
3546 Safefree(PL_ctype_name);
3549 PL_in_utf8_turkic_locale = FALSE;
3551 /* For the C locale, just use the standard folds, and we know there are no
3552 * glitches possible, so return early. Since this is called at process
3553 * initialization, be aware that this bit can't rely on much being
3555 if (isNAME_C_OR_POSIX(newctype)) {
3556 Copy(PL_fold, PL_fold_locale, 256, U8);
3557 PL_ctype_name = savepv(newctype);
3558 PL_in_utf8_CTYPE_locale = FALSE;
3562 /* The cache being cleared signals the called function to compute a new
3564 PL_in_utf8_CTYPE_locale = is_locale_utf8(newctype);
3566 PL_ctype_name = savepv(newctype);
3567 bool maybe_utf8_turkic = FALSE;
3569 /* Don't check for problems if we are suppressing the warnings */
3570 bool check_for_problems = ckWARN_d(WARN_LOCALE) || UNLIKELY(DEBUG_L_TEST);
3572 if (PL_in_utf8_CTYPE_locale) {
3574 /* A UTF-8 locale gets standard rules. But note that code still has to
3575 * handle this specially because of the three problematic code points
3577 Copy(PL_fold_latin1, PL_fold_locale, 256, U8);
3579 /* UTF-8 locales can have special handling for 'I' and 'i' if they are
3580 * Turkic. Make sure these two are the only anomalies. (We don't
3581 * require towupper and towlower because they aren't in C89.) */
3583 # if defined(HAS_TOWUPPER) && defined (HAS_TOWLOWER)
3585 if (towupper('i') == 0x130 && towlower('I') == 0x131)
3589 if (toU8_UPPER_LC('i') == 'i' && toU8_LOWER_LC('I') == 'I')
3594 /* This is how we determine it really is Turkic */
3595 check_for_problems = TRUE;
3596 maybe_utf8_turkic = TRUE;
3599 else { /* Not a canned locale we know the values for. Compute them */
3603 bool has_non_ascii_fold = FALSE;
3604 bool found_unexpected = FALSE;
3606 /* Under -DLv, see if there are any folds outside the ASCII range.
3607 * This factoid is used below */
3608 if (DEBUG_Lv_TEST) {
3609 for (unsigned i = 128; i < 256; i++) {
3610 int j = LATIN1_TO_NATIVE(i);
3611 if (toU8_LOWER_LC(j) != j || toU8_UPPER_LC(j) != j) {
3612 has_non_ascii_fold = TRUE;
3620 for (unsigned i = 0; i < 256; i++) {
3621 if (isU8_UPPER_LC(i))
3622 PL_fold_locale[i] = (U8) toU8_LOWER_LC(i);
3623 else if (isU8_LOWER_LC(i))
3624 PL_fold_locale[i] = (U8) toU8_UPPER_LC(i);
3626 PL_fold_locale[i] = (U8) i;
3630 /* Most locales these days are supersets of ASCII. When debugging
3631 * with -DLv, it is helpful to know what the exceptions to that are
3633 if (DEBUG_Lv_TEST) {
3634 bool unexpected = FALSE;
3636 if (isUPPER_L1(i)) {
3638 if (PL_fold_locale[i] != toLOWER_A(i)) {
3642 else if (has_non_ascii_fold) {
3643 if (PL_fold_locale[i] != toLOWER_L1(i)) {
3647 else if (PL_fold_locale[i] != i) {
3651 else if ( isLOWER_L1(i)
3652 && i != LATIN_SMALL_LETTER_SHARP_S
3656 if (PL_fold_locale[i] != toUPPER_A(i)) {
3660 else if (has_non_ascii_fold) {
3661 if (PL_fold_locale[i] != toUPPER_LATIN1_MOD(i)) {
3665 else if (PL_fold_locale[i] != i) {
3669 else if (PL_fold_locale[i] != i) {
3674 found_unexpected = TRUE;
3675 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3676 "For %s, fold of %02x is %02x\n",
3677 newctype, i, PL_fold_locale[i]));
3682 if (found_unexpected) {
3683 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3684 "All bytes not mentioned above either fold to"
3685 " themselves or are the expected ASCII or"
3689 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3690 "No nonstandard folds were found\n"));
3698 /* We only handle single-byte locales (outside of UTF-8 ones); so if this
3699 * locale requires more than one byte, there are going to be BIG problems.
3702 const int mb_cur_max = MB_CUR_MAX;
3704 if (mb_cur_max > 1 && ! PL_in_utf8_CTYPE_locale
3706 /* Some platforms return MB_CUR_MAX > 1 for even the "C" locale.
3707 * Just assume that the implementation for them (plus for POSIX) is
3708 * correct and the > 1 value is spurious. (Since these are
3709 * specially handled to never be considered UTF-8 locales, as long
3710 * as this is the only problem, everything should work fine */
3711 && ! isNAME_C_OR_POSIX(newctype))
3713 DEBUG_L(PerlIO_printf(Perl_debug_log,
3714 "Unsupported, MB_CUR_MAX=%d\n", mb_cur_max));
3716 Perl_ck_warner_d(aTHX_ packWARN(WARN_LOCALE),
3717 "Locale '%s' is unsupported, and may crash the"
3724 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "check_for_problems=%d\n",
3725 check_for_problems));
3727 /* We don't populate the other lists if a UTF-8 locale, but do check that
3728 * everything works as expected, unless checking turned off */
3729 if (check_for_problems) {
3730 /* Assume enough space for every character being bad. 4 spaces each
3731 * for the 94 printable characters that are output like "'x' "; and 5
3732 * spaces each for "'\\' ", "'\t' ", and "'\n' "; plus a terminating
3734 char bad_chars_list[ (94 * 4) + (3 * 5) + 1 ] = { '\0' };
3735 unsigned int bad_count = 0; /* Count of bad characters */
3737 for (unsigned i = 0; i < 256; i++) {
3739 /* If checking for locale problems, see if the native ASCII-range
3740 * printables plus \n and \t are in their expected categories in
3741 * the new locale. If not, this could mean big trouble, upending
3742 * Perl's and most programs' assumptions, like having a
3743 * metacharacter with special meaning become a \w. Fortunately,
3744 * it's very rare to find locales that aren't supersets of ASCII
3745 * nowadays. It isn't a problem for most controls to be changed
3746 * into something else; we check only \n and \t, though perhaps \r
3747 * could be an issue as well. */
3748 if (isGRAPH_A(i) || isBLANK_A(i) || i == '\n') {
3749 bool is_bad = FALSE;
3750 char name[4] = { '\0' };
3752 /* Convert the name into a string */
3757 else if (i == '\n') {
3758 my_strlcpy(name, "\\n", sizeof(name));
3760 else if (i == '\t') {
3761 my_strlcpy(name, "\\t", sizeof(name));
3765 my_strlcpy(name, "' '", sizeof(name));
3768 /* Check each possibe class */
3769 if (UNLIKELY(cBOOL(isU8_ALPHANUMERIC_LC(i)) !=
3770 cBOOL(isALPHANUMERIC_A(i))))
3773 DEBUG_L(PerlIO_printf(Perl_debug_log,
3774 "isalnum('%s') unexpectedly is %x\n",
3775 name, cBOOL(isU8_ALPHANUMERIC_LC(i))));
3777 if (UNLIKELY(cBOOL(isU8_ALPHA_LC(i)) != cBOOL(isALPHA_A(i)))) {
3779 DEBUG_L(PerlIO_printf(Perl_debug_log,
3780 "isalpha('%s') unexpectedly is %x\n",
3781 name, cBOOL(isU8_ALPHA_LC(i))));
3783 if (UNLIKELY(cBOOL(isU8_DIGIT_LC(i)) != cBOOL(isDIGIT_A(i)))) {
3785 DEBUG_L(PerlIO_printf(Perl_debug_log,
3786 "isdigit('%s') unexpectedly is %x\n",
3787 name, cBOOL(isU8_DIGIT_LC(i))));
3789 if (UNLIKELY(cBOOL(isU8_GRAPH_LC(i)) != cBOOL(isGRAPH_A(i)))) {
3791 DEBUG_L(PerlIO_printf(Perl_debug_log,
3792 "isgraph('%s') unexpectedly is %x\n",
3793 name, cBOOL(isU8_GRAPH_LC(i))));
3795 if (UNLIKELY(cBOOL(isU8_LOWER_LC(i)) != cBOOL(isLOWER_A(i)))) {
3797 DEBUG_L(PerlIO_printf(Perl_debug_log,
3798 "islower('%s') unexpectedly is %x\n",
3799 name, cBOOL(isU8_LOWER_LC(i))));
3801 if (UNLIKELY(cBOOL(isU8_PRINT_LC(i)) != cBOOL(isPRINT_A(i)))) {
3803 DEBUG_L(PerlIO_printf(Perl_debug_log,
3804 "isprint('%s') unexpectedly is %x\n",
3805 name, cBOOL(isU8_PRINT_LC(i))));
3807 if (UNLIKELY(cBOOL(isU8_PUNCT_LC(i)) != cBOOL(isPUNCT_A(i)))) {
3809 DEBUG_L(PerlIO_printf(Perl_debug_log,
3810 "ispunct('%s') unexpectedly is %x\n",
3811 name, cBOOL(isU8_PUNCT_LC(i))));
3813 if (UNLIKELY(cBOOL(isU8_SPACE_LC(i)) != cBOOL(isSPACE_A(i)))) {
3815 DEBUG_L(PerlIO_printf(Perl_debug_log,
3816 "isspace('%s') unexpectedly is %x\n",
3817 name, cBOOL(isU8_SPACE_LC(i))));
3819 if (UNLIKELY(cBOOL(isU8_UPPER_LC(i)) != cBOOL(isUPPER_A(i)))) {
3821 DEBUG_L(PerlIO_printf(Perl_debug_log,
3822 "isupper('%s') unexpectedly is %x\n",
3823 name, cBOOL(isU8_UPPER_LC(i))));
3825 if (UNLIKELY(cBOOL(isU8_XDIGIT_LC(i))!= cBOOL(isXDIGIT_A(i)))) {
3827 DEBUG_L(PerlIO_printf(Perl_debug_log,
3828 "isxdigit('%s') unexpectedly is %x\n",
3829 name, cBOOL(isU8_XDIGIT_LC(i))));
3831 if (UNLIKELY(toU8_LOWER_LC(i) != (int) toLOWER_A(i))) {
3833 DEBUG_L(PerlIO_printf(Perl_debug_log,
3834 "tolower('%s')=0x%x instead of the expected 0x%x\n",
3835 name, toU8_LOWER_LC(i), (int) toLOWER_A(i)));
3837 if (UNLIKELY(toU8_UPPER_LC(i) != (int) toUPPER_A(i))) {
3839 DEBUG_L(PerlIO_printf(Perl_debug_log,
3840 "toupper('%s')=0x%x instead of the expected 0x%x\n",
3841 name, toU8_UPPER_LC(i), (int) toUPPER_A(i)));
3843 if (UNLIKELY((i == '\n' && ! isCNTRL_LC(i)))) {
3845 DEBUG_L(PerlIO_printf(Perl_debug_log,
3846 "'\\n' (=%02X) is not a control\n", (int) i));
3849 /* Add to the list; Separate multiple entries with a blank */
3852 my_strlcat(bad_chars_list, " ", sizeof(bad_chars_list));
3854 my_strlcat(bad_chars_list, name, sizeof(bad_chars_list));
3860 if (bad_count == 2 && maybe_utf8_turkic) {
3862 *bad_chars_list = '\0';
3864 /* The casts are because otherwise some compilers warn:
3865 gcc.gnu.org/bugzilla/show_bug.cgi?id=99950
3866 gcc.gnu.org/bugzilla/show_bug.cgi?id=94182
3868 PL_fold_locale[ (U8) 'I' ] = 'I';
3869 PL_fold_locale[ (U8) 'i' ] = 'i';
3870 PL_in_utf8_turkic_locale = TRUE;
3871 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s is turkic\n", newctype));
3874 /* If we found problems and we want them output, do so */
3875 if ( (UNLIKELY(bad_count))
3876 && (LIKELY(ckWARN_d(WARN_LOCALE)) || UNLIKELY(DEBUG_L_TEST)))
3878 /* WARNING. If you change the wording of these; be sure to update
3879 * t/loc_tools.pl correspondingly */
3881 if (PL_in_utf8_CTYPE_locale) {
3882 PL_warn_locale = Perl_newSVpvf(aTHX_
3883 "Locale '%s' contains (at least) the following characters"
3884 " which have\nunexpected meanings: %s\nThe Perl program"
3885 " will use the expected meanings",
3886 newctype, bad_chars_list);
3891 "\nThe following characters (and maybe"
3892 " others) may not have the same meaning as"
3893 " the Perl program expects: %s\n",
3898 # if defined(HAS_SOME_LANGINFO) || defined(WIN32)
3900 char * scratch_buffer = NULL;
3901 Perl_sv_catpvf(aTHX_ PL_warn_locale, "; codeset=%s",
3902 my_langinfo_c(CODESET, LC_CTYPE,
3904 &scratch_buffer, NULL,
3906 Safefree(scratch_buffer);
3910 Perl_sv_catpvf(aTHX_ PL_warn_locale, "\n");
3912 /* If we are actually in the scope of the locale or are debugging,
3913 * output the message now. If not in that scope, we save the
3914 * message to be output at the first operation using this locale,
3915 * if that actually happens. Most programs don't use locales, so
3916 * they are immune to bad ones. */
3917 if (IN_LC(LC_CTYPE) || UNLIKELY(DEBUG_L_TEST)) {
3919 /* The '0' below suppresses a bogus gcc compiler warning */
3920 Perl_warner(aTHX_ packWARN(WARN_LOCALE), SvPVX(PL_warn_locale),
3922 if (IN_LC(LC_CTYPE)) {
3923 SvREFCNT_dec_NN(PL_warn_locale);
3924 PL_warn_locale = NULL;
3932 Perl_warn_problematic_locale()
3936 /* Core-only function that outputs the message in PL_warn_locale,
3937 * and then NULLS it. Should be called only through the macro
3938 * CHECK_AND_WARN_PROBLEMATIC_LOCALE_ */
3940 if (PL_warn_locale) {
3941 Perl_ck_warner(aTHX_ packWARN(WARN_LOCALE),
3942 SvPVX(PL_warn_locale),
3943 0 /* dummy to avoid compiler warning */ );
3944 SvREFCNT_dec_NN(PL_warn_locale);
3945 PL_warn_locale = NULL;
3949 # endif /* USE_LOCALE_CTYPE */
3952 S_new_LC_ALL(pTHX_ const char *lc_all, bool force)
3954 PERL_ARGS_ASSERT_NEW_LC_ALL;
3956 /* new_LC_ALL() updates all the things we care about. Note that this is
3957 * called just after a change, so uses the actual underlying locale just
3958 * set, and not the nominal one (should they differ, as they may in
3961 const char * individ_locales[LC_ALL_INDEX_] = { NULL };
3963 switch (parse_LC_ALL_string(lc_all,
3965 override_if_ignored, /* Override any ignored
3967 true, /* Always fill array */
3968 true, /* Panic if fails, as to get here it
3969 earlier had to have succeeded */
3974 case only_element_0:
3975 locale_panic_("Unexpected return from parse_LC_ALL_string");
3981 for_all_individual_category_indexes(i) {
3982 if (update_functions[i]) {
3983 const char * this_locale = individ_locales[i];
3984 update_functions[i](aTHX_ this_locale, force);
3987 Safefree(individ_locales[i]);
3991 # ifdef USE_LOCALE_COLLATE
3994 S_new_collate(pTHX_ const char *newcoll, bool force)
3996 PERL_ARGS_ASSERT_NEW_COLLATE;
3997 PERL_UNUSED_ARG(force);
3999 /* Called after each libc setlocale() call affecting LC_COLLATE, to tell
4000 * core Perl this and that 'newcoll' is the name of the new locale.
4002 * The design of locale collation is that every locale change is given an
4003 * index 'PL_collation_ix'. The first time a string participates in an
4004 * operation that requires collation while locale collation is active, it
4005 * is given PERL_MAGIC_collxfrm magic (via sv_collxfrm_flags()). That
4006 * magic includes the collation index, and the transformation of the string
4007 * by strxfrm(), q.v. That transformation is used when doing comparisons,
4008 * instead of the string itself. If a string changes, the magic is
4009 * cleared. The next time the locale changes, the index is incremented,
4010 * and so we know during a comparison that the transformation is not
4011 * necessarily still valid, and so is recomputed. Note that if the locale
4012 * changes enough times, the index could wrap, and it is possible that a
4013 * transformation would improperly be considered valid, leading to an
4014 * unlikely bug. The value is declared to the widest possible type on this
4017 /* Return if the locale isn't changing */
4018 if (strEQ(PL_collation_name, newcoll)) {
4022 Safefree(PL_collation_name);
4023 PL_collation_name = savepv(newcoll);
4026 /* Set the new one up if trivial. Since this is called at process
4027 * initialization, be aware that this bit can't rely on much being
4029 PL_collation_standard = isNAME_C_OR_POSIX(newcoll);
4030 if (PL_collation_standard) {
4031 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
4032 "Setting PL_collation name='%s'\n",
4033 PL_collation_name));
4034 PL_collxfrm_base = 0;
4035 PL_collxfrm_mult = 2;
4036 PL_in_utf8_COLLATE_locale = FALSE;
4037 PL_strxfrm_NUL_replacement = '\0';
4038 PL_strxfrm_max_cp = 0;
4042 /* Flag that the remainder of the set up is being deferred until first
4044 PL_collxfrm_mult = 0;
4045 PL_collxfrm_base = 0;
4049 # endif /* USE_LOCALE_COLLATE */
4054 S_Win_byte_string_to_wstring(const UINT code_page, const char * byte_string)
4056 /* Caller must arrange to free the returned string */
4058 int req_size = MultiByteToWideChar(code_page, 0, byte_string, -1, NULL, 0);
4065 Newx(wstring, req_size, wchar_t);
4067 if (! MultiByteToWideChar(code_page, 0, byte_string, -1, wstring, req_size))
4077 # define Win_utf8_string_to_wstring(s) \
4078 Win_byte_string_to_wstring(CP_UTF8, (s))
4081 S_Win_wstring_to_byte_string(const UINT code_page, const wchar_t * wstring)
4083 /* Caller must arrange to free the returned string */
4086 WideCharToMultiByte(code_page, 0, wstring, -1, NULL, 0, NULL, NULL);
4089 Newx(byte_string, req_size, char);
4091 if (! WideCharToMultiByte(code_page, 0, wstring, -1, byte_string,
4092 req_size, NULL, NULL))
4094 Safefree(byte_string);
4102 # define Win_wstring_to_utf8_string(ws) \
4103 Win_wstring_to_byte_string(CP_UTF8, (ws))
4106 S_wrap_wsetlocale(pTHX_ const int category, const char *locale)
4108 PERL_ARGS_ASSERT_WRAP_WSETLOCALE;
4110 /* Calls _wsetlocale(), converting the parameters/return to/from
4111 * Perl-expected forms as if plain setlocale() were being called instead.
4113 * Caller must arrange for the returned PV to be freed.
4116 const wchar_t * wlocale = NULL;
4119 wlocale = Win_utf8_string_to_wstring(locale);
4126 const wchar_t * wresult = _wsetlocale(category, wlocale);
4134 const char * result = Win_wstring_to_utf8_string(wresult);
4142 S_win32_setlocale(pTHX_ int category, const char* locale)
4144 /* This, for Windows, emulates POSIX setlocale() behavior. There is no
4145 * difference between the two unless the input locale is "", which normally
4146 * means on Windows to get the machine default, which is set via the
4147 * computer's "Regional and Language Options" (or its current equivalent).
4148 * In POSIX, it instead means to find the locale from the user's
4149 * environment. This routine changes the Windows behavior to try the POSIX
4150 * behavior first. Further details are in the called function
4151 * find_locale_from_environment().
4154 if (locale != NULL && strEQ(locale, "")) {
4155 /* Note this function may change the locale, but that's ok because we
4156 * are about to change it anyway */
4157 locale = find_locale_from_environment(get_category_index(category));
4158 if (locale == NULL) {
4164 const char * result = wrap_wsetlocale(category, locale);
4165 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
4166 setlocale_debug_string_r(category, locale, result)));
4173 save_to_buffer(result, &PL_setlocale_buf, &PL_setlocale_bufsize);
4175 # ifndef USE_PL_CUR_LC_ALL
4181 /* Here, we need to keep track of LC_ALL, so store the new value. but if
4182 * the input locale is NULL, we were just querying, so the original value
4184 if (locale == NULL) {
4189 /* If we set LC_ALL directly above, we already know its new value; but
4190 * if we changed just an individual category, find the new LC_ALL */
4191 if (category != LC_ALL) {
4193 result = wrap_wsetlocale(LC_ALL, NULL);
4196 Safefree(PL_cur_LC_ALL);
4197 PL_cur_LC_ALL = result;
4200 DEBUG_L(PerlIO_printf(Perl_debug_log, "new PL_cur_LC_ALL=%s\n",
4204 return PL_setlocale_buf;
4210 S_native_querylocale_i(pTHX_ const locale_category_index cat_index)
4212 /* Determine the current locale and return it in the form the platform's
4213 * native locale handling understands. This is different only from our
4214 * internal form for the LC_ALL category, as platforms differ in how they
4217 * This is only called from Perl_setlocale(). As such it returns in
4218 * PL_setlocale_buf */
4220 # ifdef USE_LOCALE_NUMERIC
4222 /* We have the LC_NUMERIC name saved, because we are normally switched into
4223 * the C locale (or equivalent) for it. */
4224 if (cat_index == LC_NUMERIC_INDEX_) {
4226 /* We don't have to copy this return value, as it is a per-thread
4227 * variable, and won't change until a future setlocale */
4228 return PL_numeric_name;
4234 if (cat_index != LC_ALL_INDEX_)
4239 /* Here, not LC_ALL, and not LC_NUMERIC: the actual and native values
4242 # ifdef setlocale_i /* Can shortcut if this is defined */
4244 return setlocale_i(cat_index, NULL);
4248 return save_to_buffer(querylocale_i(cat_index),
4249 &PL_setlocale_buf, &PL_setlocale_bufsize);
4254 /* Below, querying LC_ALL */
4257 # ifdef USE_PL_CURLOCALES
4258 # define LC_ALL_ARG PL_curlocales
4260 # define LC_ALL_ARG NULL /* Causes calculate_LC_ALL_string() to find the
4261 locale using a querylocale function */
4264 return calculate_LC_ALL_string(LC_ALL_ARG, EXTERNAL_FORMAT_FOR_QUERY,
4265 WANT_PL_setlocale_buf,
4268 # endif /* has LC_ALL */
4272 #endif /* USE_LOCALE */
4275 =for apidoc Perl_setlocale
4277 This is an (almost) drop-in replacement for the system L<C<setlocale(3)>>,
4278 taking the same parameters, and returning the same information, except that it
4279 returns the correct underlying C<LC_NUMERIC> locale. Regular C<setlocale> will
4280 instead return C<C> if the underlying locale has a non-dot decimal point
4281 character, or a non-empty thousands separator for displaying floating point
4282 numbers. This is because perl keeps that locale category such that it has a
4283 dot and empty separator, changing the locale briefly during the operations
4284 where the underlying one is required. C<Perl_setlocale> knows about this, and
4285 compensates; regular C<setlocale> doesn't.
4287 Another reason it isn't completely a drop-in replacement is that it is
4288 declared to return S<C<const char *>>, whereas the system setlocale omits the
4289 C<const> (presumably because its API was specified long ago, and can't be
4290 updated; it is illegal to change the information C<setlocale> returns; doing
4291 so leads to segfaults.)
4293 Finally, C<Perl_setlocale> works under all circumstances, whereas plain
4294 C<setlocale> can be completely ineffective on some platforms under some
4297 Changing the locale is not a good idea when more than one thread is running,
4298 except on systems where the predefined variable C<${^SAFE_LOCALES}> is
4299 non-zero. This is because on such systems the locale is global to the whole
4300 process and not local to just the thread calling the function. So changing it
4301 in one thread instantaneously changes it in all. On some such systems, the
4302 system C<setlocale()> is ineffective, returning the wrong information, and
4303 failing to actually change the locale. z/OS refuses to try to change the
4304 locale once a second thread is created. C<Perl_setlocale>, should give you
4305 accurate results of what actually happened on these problematic platforms,
4306 returning NULL if the system forbade the locale change.
4308 The return points to a per-thread static buffer, which is overwritten the next
4309 time C<Perl_setlocale> is called from the same thread.
4316 Perl_setlocale(const int category, const char * locale)
4318 /* This wraps POSIX::setlocale() */
4322 PERL_UNUSED_ARG(category);
4323 PERL_UNUSED_ARG(locale);
4331 DEBUG_L(PerlIO_printf(Perl_debug_log,
4332 "Entering Perl_setlocale(%d, \"%s\")\n",
4335 bool valid_category;
4336 locale_category_index cat_index = get_category_index_helper(category,
4339 if (! valid_category) {
4340 if (ckWARN(WARN_LOCALE)) {
4341 const char * conditional_warn_text;
4342 if (locale == NULL) {
4343 conditional_warn_text = "";
4347 conditional_warn_text = "; can't set it to ";
4350 /* diag_listed_as: Unknown locale category %d; can't set it to %s */
4352 packWARN(WARN_LOCALE),
4353 "Unknown locale category %d%s%s",
4354 category, conditional_warn_text, locale);
4363 /* setlocale_i() gets defined only on Configurations that use setlocale()
4364 * in a simple manner that adequately handles all cases. If this category
4365 * doesn't have any perl complications, just do that. */
4366 if (! update_functions[cat_index]) {
4367 return setlocale_i(cat_index, locale);
4372 /* Get current locale */
4373 const char * current_locale = native_querylocale_i(cat_index);
4375 /* A NULL locale means only query what the current one is. */
4376 if (locale == NULL) {
4377 return current_locale;
4380 if (strEQ(current_locale, locale)) {
4381 DEBUG_L(PerlIO_printf(Perl_debug_log,
4382 "Already in requested locale: no action taken\n"));
4383 return current_locale;
4386 /* Here, an actual change is being requested. Do it */
4387 if (! bool_setlocale_i(cat_index, locale)) {
4388 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
4389 setlocale_debug_string_i(cat_index, locale, "NULL")));
4393 /* At this point, the locale has been changed based on the requested value,
4394 * and the querylocale_i() will return the actual new value that the system
4395 * has for the category. That may not be the same as the input, as libc
4396 * may have returned a synonymous locale name instead of the input one; or,
4397 * if there are locale categories that we are compiled to ignore, any
4398 * attempt to change them away from "C" is overruled */
4399 current_locale = querylocale_i(cat_index);
4401 /* But certain categories need further work. For example we may need to
4402 * calculate new folding or collation rules. And for LC_NUMERIC, we have
4403 * to switch into a locale that has a dot radix. */
4404 if (update_functions[cat_index]) {
4405 update_functions[cat_index](aTHX_ current_locale,
4406 /* No need to force recalculation, as
4407 * aren't coming from a situation
4408 * where Perl hasn't been controlling
4409 * the locale, so has accurate
4414 /* Make sure the result is in a stable buffer for the caller's use, and is
4415 * in the expected format */
4416 current_locale = native_querylocale_i(cat_index);
4418 DEBUG_L(PerlIO_printf(Perl_debug_log, "returning '%s'\n", current_locale));
4420 return current_locale;
4426 #if defined(USE_LOCALE) || defined(HAS_SOME_LANGINFO) || defined(HAS_LOCALECONV)
4429 S_get_locale_string_utf8ness_i(pTHX_ const char * string,
4430 const locale_utf8ness_t known_utf8,
4431 const char * locale,
4432 const locale_category_index cat_index)
4434 PERL_ARGS_ASSERT_GET_LOCALE_STRING_UTF8NESS_I;
4439 PERL_UNUSED_ARG(string);
4440 PERL_UNUSED_ARG(known_utf8);
4441 PERL_UNUSED_ARG(locale);
4442 PERL_UNUSED_ARG(cat_index);
4446 assert(cat_index <= LC_ALL_INDEX_);
4448 /* Return to indicate if 'string' in the locale given by the input
4449 * arguments should be considered UTF-8 or not.
4451 * If the input 'locale' is not NULL, use that for the locale; otherwise
4452 * use the current locale for the category specified by 'cat_index'.
4455 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
4456 "Entering get_locale_string_utf8ness_i; locale=%s,"
4457 " index=%u(%s), string=%s, known_utf8=%d\n",
4458 locale, cat_index, category_names[cat_index],
4460 ? _byte_dump_string((U8 *) string,
4465 if (string == NULL) {
4466 return UTF8NESS_IMMATERIAL;
4469 if (IN_BYTES) { /* respect 'use bytes' */
4473 Size_t len = strlen(string);
4475 /* UTF8ness is immaterial if the representation doesn't vary */
4476 const U8 * first_variant = NULL;
4477 if (is_utf8_invariant_string_loc((U8 *) string, len, &first_variant)) {
4478 return UTF8NESS_IMMATERIAL;
4481 /* Can't be UTF-8 if invalid */
4482 if (! is_utf8_string((U8 *) first_variant,
4483 len - ((char *) first_variant - string)))
4488 /* Here and below, we know the string is legal UTF-8, containing at least
4489 * one character requiring a sequence of two or more bytes. It is quite
4490 * likely to be UTF-8. But it pays to be paranoid and do further checking.
4492 * If we already know the UTF-8ness of the locale, then we immediately know
4493 * what the string is */
4494 if (UNLIKELY(known_utf8 != LOCALE_UTF8NESS_UNKNOWN)) {
4495 return (known_utf8 == LOCALE_IS_UTF8) ? UTF8NESS_YES : UTF8NESS_NO;
4498 if (locale == NULL) {
4499 locale = querylocale_i(cat_index);
4502 /* If the locale is UTF-8, the string is UTF-8; otherwise it was
4503 * coincidental that the string is legal UTF-8
4505 * However, if the perl is compiled to not pay attention to the category
4506 * being passed in, you might think that that locale is essentially always
4507 * the C locale, so it would make sense to say it isn't UTF-8. But to get
4508 * here, the string has to contain characters unknown in the C locale. And
4509 * in fact, Windows boxes are compiled without LC_MESSAGES, as their
4510 * message catalog isn't really a part of the locale system. But those
4511 * messages really could be UTF-8, and given that the odds are rather small
4512 * of something not being UTF-8 but being syntactically valid UTF-8, khw
4513 * has decided to call such strings as UTF-8. */
4514 return (is_locale_utf8(locale)) ? UTF8NESS_YES : UTF8NESS_NO;
4521 S_is_locale_utf8(pTHX_ const char * locale)
4523 /* Returns TRUE if the locale 'locale' is UTF-8; FALSE otherwise. It uses
4524 * my_langinfo(), which employs various methods to get this information
4525 * if nl_langinfo() isn't available, using heuristics as a last resort, in
4526 * which case, the result will very likely be correct for locales for
4527 * languages that have commonly used non-ASCII characters, but for notably
4528 * English, it comes down to if the locale's name ends in something like
4529 * "UTF-8". It errs on the side of not being a UTF-8 locale.
4531 * Systems conforming to C99 should have the needed libc calls to give us a
4532 * completely reliable result. */
4534 # if ! defined(USE_LOCALE) \
4535 || ! defined(USE_LOCALE_CTYPE) \
4536 || defined(EBCDIC) /* There aren't any real UTF-8 locales at this time */
4538 PERL_UNUSED_ARG(locale);
4544 char * scratch_buffer = NULL;
4545 const char * codeset;
4548 PERL_ARGS_ASSERT_IS_LOCALE_UTF8;
4550 if (strEQ(locale, PL_ctype_name)) {
4551 return PL_in_utf8_CTYPE_locale;
4554 codeset = my_langinfo_c(CODESET, LC_CTYPE, locale,
4555 &scratch_buffer, NULL, NULL);
4556 retval = is_codeset_name_UTF8(codeset);
4558 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
4559 "found codeset=%s, is_utf8=%d\n", codeset, retval));
4561 Safefree(scratch_buffer);
4563 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "is_locale_utf8(%s) returning %d\n",
4575 S_set_save_buffer_min_size(pTHX_ Size_t min_len,
4577 Size_t * buf_cursize)
4579 /* Make sure the buffer pointed to by *buf is at least as large 'min_len';
4580 * *buf_cursize is the size of 'buf' upon entry; it will be updated to the
4581 * new size on exit. 'buf_cursize' being NULL is to be used when this is a
4582 * single use buffer, which will shortly be freed by the caller. */
4584 if (buf_cursize == NULL) {
4585 Newx(*buf, min_len, char);
4587 else if (*buf_cursize == 0) {
4588 Newx(*buf, min_len, char);
4589 *buf_cursize = min_len;
4591 else if (min_len > *buf_cursize) {
4592 Renew(*buf, min_len, char);
4593 *buf_cursize = min_len;
4598 S_save_to_buffer(pTHX_ const char * string, char **buf, Size_t *buf_size)
4600 PERL_ARGS_ASSERT_SAVE_TO_BUFFER;
4602 /* Copy the NUL-terminated 'string' to a buffer whose address before this
4603 * call began at *buf, and whose available length before this call was
4606 * If the length of 'string' is greater than the space available, the
4607 * buffer is grown accordingly, which may mean that it gets relocated.
4608 * *buf and *buf_size will be updated to reflect this.
4610 * Regardless, the function returns a pointer to where 'string' is now
4613 * 'string' may be NULL, which means no action gets taken, and NULL is
4616 * 'buf_size' being NULL is to be used when this is a single use buffer,
4617 * which will shortly be freed by the caller.
4619 * If *buf or 'buf_size' are NULL or *buf_size is 0, the buffer is assumed
4620 * empty, and memory is malloc'd.
4627 /* No-op to copy over oneself */
4628 if (string == *buf) {
4632 Size_t string_size = strlen(string) + 1;
4633 set_save_buffer_min_size(string_size, buf, buf_size);
4637 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
4638 "Copying '%s' to %p\n",
4639 ((is_utf8_string((U8 *) string, 0))
4641 :_byte_dump_string((U8 *) string, strlen(string), 0)),
4644 /* Catch glitches. Usually this is because LC_CTYPE needs to be the same
4645 * locale as whatever is being worked on */
4646 if (UNLIKELY(instr(string, REPLACEMENT_CHARACTER_UTF8))) {
4647 locale_panic_(Perl_form(aTHX_
4648 "Unexpected REPLACEMENT_CHARACTER in '%s'\n%s",
4649 string, get_LC_ALL_display()));
4654 Copy(string, *buf, string_size, char);
4661 Perl_get_win32_message_utf8ness(pTHX_ const char * string)
4663 /* This is because Windows doesn't have LC_MESSAGES. */
4665 # ifdef USE_LOCALE_CTYPE
4667 return get_locale_string_utf8ness_i(string, LOCALE_IS_UTF8,
4668 NULL, LC_CTYPE_INDEX_);
4678 #endif /* USE_LOCALE */
4681 Perl_mbtowc_(pTHX_ const wchar_t * pwc, const char * s, const Size_t len)
4684 #if ! defined(HAS_MBRTOWC) && ! defined(HAS_MBTOWC)
4686 PERL_UNUSED_ARG(pwc);
4688 PERL_UNUSED_ARG(len);
4691 #else /* Below we have some form of mbtowc() */
4692 # if defined(HAS_MBRTOWC) \
4693 && (defined(USE_LOCALE_THREADS) || ! defined(HAS_MBTOWC))
4694 # define USE_MBRTOWC
4701 if (s == NULL) { /* Initialize the shift state to all zeros in
4704 # if defined(USE_MBRTOWC)
4706 memzero(&PL_mbrtowc_ps, sizeof(PL_mbrtowc_ps));
4713 retval = mbtowc(NULL, NULL, 0);
4721 # if defined(USE_MBRTOWC)
4725 retval = (SSize_t) mbrtowc((wchar_t *) pwc, s, len, &PL_mbrtowc_ps);
4730 /* Locking prevents races, but locales can be switched out without locking,
4731 * so this isn't a cure all */
4734 retval = mbtowc((wchar_t *) pwc, s, len);
4746 =for apidoc Perl_localeconv
4748 This is a thread-safe version of the libc L<localeconv(3)>. It is the same as
4749 L<POSIX::localeconv|POSIX/localeconv> (returning a hash of the C<localeconv()>
4750 fields), but directly callable from XS code.
4756 Perl_localeconv(pTHX)
4759 #if ! defined(HAS_LOCALECONV)
4765 return my_localeconv(0);
4771 #if defined(HAS_LOCALECONV)
4774 S_my_localeconv(pTHX_ const int item)
4776 PERL_ARGS_ASSERT_MY_LOCALECONV;
4778 /* This returns a mortalized hash containing all or certain elements
4779 * returned by localeconv(). It is used by Perl_localeconv() and
4780 * POSIX::localeconv() and is thread-safe.
4782 * There are two use cases:
4783 * 1) Called from POSIX::locale_conv(). This returns the lconv structure
4784 * copied to a hash, based on the current underlying locales for
4785 * LC_NUMERIC and LC_MONETARY. An input item==0 signifies this case, or
4786 * on many platforms it is the only use case compiled.
4787 * 2) Certain items that nl_langinfo() provides are also derivable from
4788 * the return of localeconv(). Windows notably doesn't have
4789 * nl_langinfo(), so on that, and actually any platform lacking it,
4790 * my_localeconv() is used also to emulate it for those particular
4791 * items. The code to do this is compiled only on such platforms.
4792 * Rather than going to the expense of creating a full hash when only
4793 * one item is needed, the returned hash has just the desired item in
4796 * To access all the localeconv() struct lconv fields, there is a data
4797 * structure that contains every commonly documented field in it. (Maybe
4798 * some minority platforms have extra fields. Those could be added here
4799 * without harm; they would just be ignored on platforms lacking them.)
4801 * Our structure is compiled to make looping through the fields easier by
4802 * pointing each name to its value's offset within lconv, e.g.,
4803 { "thousands_sep", STRUCT_OFFSET(struct lconv, thousands_sep) }
4805 # define LCONV_ENTRY(name) \
4806 {STRINGIFY(name), STRUCT_OFFSET(struct lconv, name)}
4808 /* These synonyms are just for clarity, and to make it easier in case
4809 * something needs to change in the future */
4810 # define LCONV_NUMERIC_ENTRY(name) LCONV_ENTRY(name)
4811 # define LCONV_MONETARY_ENTRY(name) LCONV_ENTRY(name)
4813 /* There are just a few fields for NUMERIC strings */
4814 const lconv_offset_t lconv_numeric_strings[] = {
4815 # ifndef NO_LOCALECONV_GROUPING
4816 LCONV_NUMERIC_ENTRY(grouping),
4818 LCONV_NUMERIC_ENTRY(thousands_sep),
4819 LCONV_NUMERIC_ENTRY(decimal_point),
4823 /* When used to implement nl_langinfo(), we save time by only populating
4824 * the hash with the field(s) needed. Thus we would need a data structure
4826 * LCONV_NUMERIC_ENTRY(decimal_point),
4829 * By placing the decimal_point field last in the full structure, we can
4830 * use just the tail for this bit of it, saving space. This macro yields
4831 * the address of the sub structure. */
4832 # define DECIMAL_POINT_ADDRESS \
4833 &lconv_numeric_strings[(C_ARRAY_LENGTH(lconv_numeric_strings) - 2)]
4835 /* And the MONETARY string fields */
4836 const lconv_offset_t lconv_monetary_strings[] = {
4837 LCONV_MONETARY_ENTRY(int_curr_symbol),
4838 LCONV_MONETARY_ENTRY(mon_decimal_point),
4839 # ifndef NO_LOCALECONV_MON_THOUSANDS_SEP
4840 LCONV_MONETARY_ENTRY(mon_thousands_sep),
4842 # ifndef NO_LOCALECONV_MON_GROUPING
4843 LCONV_MONETARY_ENTRY(mon_grouping),
4845 LCONV_MONETARY_ENTRY(positive_sign),
4846 LCONV_MONETARY_ENTRY(negative_sign),
4847 LCONV_MONETARY_ENTRY(currency_symbol),
4851 /* Like above, this field being last can be used as a sub structure */
4852 # define CURRENCY_SYMBOL_ADDRESS \
4853 &lconv_monetary_strings[(C_ARRAY_LENGTH(lconv_monetary_strings) - 2)]
4855 /* Finally there are integer fields, all are for monetary purposes */
4856 const lconv_offset_t lconv_integers[] = {
4857 LCONV_ENTRY(int_frac_digits),
4858 LCONV_ENTRY(frac_digits),
4859 LCONV_ENTRY(p_sep_by_space),
4860 LCONV_ENTRY(n_cs_precedes),
4861 LCONV_ENTRY(n_sep_by_space),
4862 LCONV_ENTRY(p_sign_posn),
4863 LCONV_ENTRY(n_sign_posn),
4864 # ifdef HAS_LC_MONETARY_2008
4865 LCONV_ENTRY(int_p_cs_precedes),
4866 LCONV_ENTRY(int_p_sep_by_space),
4867 LCONV_ENTRY(int_n_cs_precedes),
4868 LCONV_ENTRY(int_n_sep_by_space),
4869 LCONV_ENTRY(int_p_sign_posn),
4870 LCONV_ENTRY(int_n_sign_posn),
4872 LCONV_ENTRY(p_cs_precedes),
4876 /* Like above, this field being last can be used as a sub structure */
4877 # define P_CS_PRECEDES_ADDRESS \
4878 &lconv_integers[(C_ARRAY_LENGTH(lconv_integers) - 2)]
4880 /* If we aren't paying attention to a given category, use LC_CTYPE instead;
4881 * If not paying attention to that either, the code below should end up not
4882 * using this. Make sure that things blow up if that avoidance gets lost,
4883 * by setting the category to an out-of-bounds value */
4884 locale_category_index numeric_index;
4885 locale_category_index monetary_index;
4887 # ifdef USE_LOCALE_NUMERIC
4888 numeric_index = LC_NUMERIC_INDEX_;
4889 # elif defined(USE_LOCALE_CTYPE)
4890 numeric_index = LC_CTYPE_INDEX_;
4892 numeric_index = LC_ALL_INDEX_; /* Out-of-bounds */
4894 # ifdef USE_LOCALE_MONETARY
4895 monetary_index = LC_MONETARY_INDEX_;
4896 # elif defined(USE_LOCALE_CTYPE)
4897 monetary_index = LC_CTYPE_INDEX_;
4899 monetary_index = LC_ALL_INDEX_; /* Out-of-bounds */
4902 /* Some platforms, for correct non-mojibake results, require LC_CTYPE's
4903 * locale to match LC_NUMERIC's for the numeric fields, and LC_MONETARY's
4904 * for the monetary ones. What happens if LC_NUMERIC and LC_MONETARY
4905 * aren't compatible? Wrong results. To avoid that, we call localeconv()
4906 * twice, once for each locale, setting LC_CTYPE to match the category.
4907 * But if the locales of both categories are the same, there is no need for
4908 * a second call. Assume this is the case unless overridden below */
4909 bool requires_2nd_localeconv = false;
4911 /* The actual hash populating is done by S_populate_hash_from_localeconv().
4912 * It gets passed an array of length two containing the data structure it
4913 * is supposed to use to get the key names to fill the hash with. One
4914 * element is always for the NUMERIC strings (or NULL if none to use), and
4915 * the other element similarly for the MONETARY ones. */
4916 # define NUMERIC_STRING_OFFSET 0
4917 # define MONETARY_STRING_OFFSET 1
4918 const lconv_offset_t * strings[2] = { NULL, NULL };
4920 /* This is a mask, with one bit to tell S_populate_hash_from_localeconv to
4921 * populate the NUMERIC items; another bit for the MONETARY ones. This way
4922 * it can choose which (or both) to populate from */
4925 /* This converts from a locale index to its bit position in the above mask.
4927 # define INDEX_TO_BIT(i) (1 << (i))
4929 /* The two categories can have disparate locales. Initialize them to C and
4930 * override later whichever one(s) we pay attention to */
4931 const char * numeric_locale = "C";
4932 const char * monetary_locale = "C";
4934 /* This will be either 'numeric_locale' or 'monetary_locale' depending on
4935 * what we are working on at the moment */
4936 const char * locale;
4938 /* The LC_MONETARY category also has some integer-valued fields, whose
4939 * information is kept in a separate list */
4940 const lconv_offset_t * integers;
4942 # ifdef HAS_SOME_LANGINFO
4944 /* If the only use-case for this is the full localeconv(), the 'item'
4945 * parameter is ignored. */
4946 PERL_UNUSED_ARG(item);
4950 /* This only gets compiled for the use-case of using localeconv() to
4951 * emulate an nl_langinfo() missing from the platform. */
4953 # ifdef USE_LOCALE_NUMERIC
4955 /* We need this substructure to only return this field for the THOUSEP
4956 * item. The other items also need substructures, but they were handled
4957 * above by placing the substructure's item at the end of the full one, so
4958 * the data structure could do double duty. However, both this and
4959 * RADIXCHAR would need to be in the final position of the same full
4960 * structure; an impossibility. So make this into a separate structure */
4961 const lconv_offset_t thousands_sep_string[] = {
4962 LCONV_NUMERIC_ENTRY(thousands_sep),
4968 /* End of all the initialization of data structures. Now for actual code.
4970 * Without nl_langinfo(), the call to my_localeconv() could be for just one
4971 * of the following 3 items to emulate nl_langinfo(). This is compiled
4972 * only when using perl_langinfo.h, which we control, and it has been
4973 * constructed so that no item is numbered 0.
4975 * For each, set up the appropriate parameters for the call below to
4976 * S_populate_hash_from_localeconv() */
4977 if (item != 0) switch (item) {
4979 locale_panic_(Perl_form(aTHX_
4980 "Unexpected item passed to my_localeconv: %d", item));
4983 # ifdef USE_LOCALE_NUMERIC
4986 locale = numeric_locale = PL_numeric_name;
4987 index_bits = INDEX_TO_BIT(LC_NUMERIC_INDEX_);
4988 strings[NUMERIC_STRING_OFFSET] = DECIMAL_POINT_ADDRESS;
4993 index_bits = INDEX_TO_BIT(LC_NUMERIC_INDEX_);
4994 locale = numeric_locale = PL_numeric_name;
4995 strings[NUMERIC_STRING_OFFSET] = thousands_sep_string;
5000 # ifdef USE_LOCALE_MONETARY
5003 index_bits = INDEX_TO_BIT(LC_MONETARY_INDEX_);
5004 locale = monetary_locale = querylocale_i(LC_MONETARY_INDEX_);
5006 /* This item needs the values for both the currency symbol, and another
5007 * one used to construct the nl_langino()-compatible return */
5008 strings[MONETARY_STRING_OFFSET] = CURRENCY_SYMBOL_ADDRESS;
5009 integers = P_CS_PRECEDES_ADDRESS;
5014 } /* End of switch() */
5016 else /* End of for just one item to emulate nl_langinfo() */
5020 { /* Here, the call is for all of localeconv(). It has a bunch of
5021 * items. As in the individual item case, set up the parameters for
5022 * S_populate_hash_from_localeconv(); */
5024 # ifdef USE_LOCALE_NUMERIC
5025 numeric_locale = PL_numeric_name;
5026 # elif defined(USE_LOCALE_CTYPE)
5027 numeric_locale = querylocale_i(numeric_index);
5029 # if defined(USE_LOCALE_MONETARY) || defined(USE_LOCALE_CTYPE)
5030 monetary_locale = querylocale_i(monetary_index);
5033 /* The first call to S_populate_hash_from_localeconv() will be for the
5034 * MONETARY values */
5035 index_bits = INDEX_TO_BIT(monetary_index);
5036 locale = monetary_locale;
5038 /* And if the locales for the two categories are the same, we can also
5039 * do the NUMERIC values in the same call */
5040 if (strEQ(numeric_locale, monetary_locale)) {
5041 index_bits |= INDEX_TO_BIT(numeric_index);
5044 requires_2nd_localeconv = true;
5047 /* We always pass both sets of strings. 'index_bits' tells
5048 * S_populate_hash_from_localeconv which to actually look at */
5049 strings[NUMERIC_STRING_OFFSET] = lconv_numeric_strings;
5050 strings[MONETARY_STRING_OFFSET] = lconv_monetary_strings;
5052 /* And pass the integer values to populate; again 'index_bits' will
5053 * say to use them or not */
5054 integers = lconv_integers;
5056 } /* End of call is for localeconv() */
5058 /* The code above has determined the parameters to
5059 S_populate_hash_from_localeconv() for both cases of an individual item
5060 and for the entire structure. Below is code common to both */
5062 HV * hv = newHV(); /* The returned hash, initially empty */
5063 sv_2mortal((SV*)hv);
5065 /* Call localeconv() and copy its results into the hash. All the
5066 * parameters have been initialized above */
5067 populate_hash_from_localeconv(hv,
5074 /* The above call may have done all the hash fields, but not always, as
5075 * already explained. If we need a second call it is always for the
5077 if (requires_2nd_localeconv) {
5078 populate_hash_from_localeconv(hv,
5080 INDEX_TO_BIT(numeric_index),
5082 NULL /* There are no NUMERIC integer
5087 /* Here, the hash has been completely populated.
5089 * Now go through all the items and:
5090 * a) For string items, see if they should be marked as UTF-8 or not.
5091 * This would have been more convenient and faster to do while
5092 * populating the hash in the first place, but that operation has to be
5093 * done within a critical section, keeping other threads from
5094 * executing, so only the minimal amount of work necessary is done at
5096 * b) For integer items, convert the C CHAR_MAX value into -1. Again,
5097 * this could have been done in the critical section, but was deferred
5098 * to here to keep to the bare minimum amount the time spent owning the
5099 * processor. CHAR_MAX is a C concept for an 8-bit character type.
5100 * Perl has no such type; the closest fit is a -1.
5102 * XXX On unthreaded perls, this code could be #ifdef'd out, and the
5103 * corrections determined at hash population time, at an extra maintenance
5104 * cost which khw doesn't think is worth it
5107 # ifndef HAS_SOME_LANGINFO
5109 /* We are done when called with an individual item. There are no integer
5110 * items to adjust, and it's best for the caller to determine if this
5111 * string item is UTF-8 or not. This is because the locale's UTF-8ness is
5112 * calculated below, and in some Configurations, that can lead to a
5113 * recursive call to here, which could recurse infinitely. */
5121 for (unsigned int i = 0; i < 2; i++) { /* Try both types of strings */
5122 if (! strings[i]) { /* Skip if no strings of this type */
5126 locale = (i == NUMERIC_STRING_OFFSET)
5130 if (! is_locale_utf8(locale)) {
5131 continue; /* No string can be UTF-8 if the locale isn't */
5134 /* Examine each string */
5135 for (const lconv_offset_t *strp = strings[i]; strp->name; strp++) {
5136 const char * name = strp->name;
5138 /* 'value' will contain the string that may need to be marked as
5140 SV ** value = hv_fetch(hv, name, strlen(name), true);
5141 if (! value || ! SvPOK(*value)) {
5145 /* Determine if the string should be marked as UTF-8. */
5146 if (UTF8NESS_YES == (get_locale_string_utf8ness_i(SvPVX(*value),
5149 (locale_category_index) 0)))
5154 } /* End of fixing up UTF8ness */
5157 /* Examine each integer */
5158 for (; integers; integers++) {
5159 const char * name = integers->name;
5161 if (! name) { /* Reached the end */
5165 SV ** value = hv_fetch(hv, name, strlen(name), true);
5170 /* Change CHAR_MAX to -1 */
5171 if (SvIV(*value) == CHAR_MAX) {
5172 sv_setiv(*value, -1);
5180 S_populate_hash_from_localeconv(pTHX_ HV * hv,
5182 /* Switch to this locale to run
5183 * localeconv() from */
5184 const char * locale,
5186 /* bit mask of which categories to
5188 const U32 which_mask,
5190 /* strings[0] points to the numeric
5191 * string fields; [1] to the monetary */
5192 const lconv_offset_t * strings[2],
5194 /* And to the monetary integer fields */
5195 const lconv_offset_t * integers)
5197 PERL_ARGS_ASSERT_POPULATE_HASH_FROM_LOCALECONV;
5198 PERL_UNUSED_ARG(which_mask); /* Some configurations don't use this;
5199 complicated to figure out which */
5201 PERL_UNUSED_ARG(locale);
5204 /* Run localeconv() and copy some or all of its results to the input 'hv'
5205 * hash. Most localeconv() implementations return the values in a global
5206 * static buffer, so the operation must be performed in a critical section,
5207 * ending only after the copy is completed. There are so many locks
5208 * because localeconv() deals with two categories, and returns in a single
5209 * global static buffer. Some locks might be no-ops on this platform, but
5210 * not others. We need to lock if any one isn't a no-op. */
5212 # ifdef USE_LOCALE_CTYPE
5214 /* Some platforms require LC_CTYPE to be congruent with the category we are
5216 const char * orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, locale);
5219 # ifdef USE_LOCALE_NUMERIC
5221 /* We need to toggle to the underlying NUMERIC locale if we are getting
5222 * NUMERIC strings */
5223 const char * orig_NUMERIC_locale = NULL;
5224 if (which_mask & INDEX_TO_BIT(LC_NUMERIC_INDEX_)) {
5229 /* There is a bug in Windows in which setting LC_CTYPE after the others
5230 * doesn't actually take effect for localeconv(). See commit
5231 * 418efacd1950763f74ed3cc22f8cf9206661b892 for details. Thus we have
5232 * to make sure that the locale we want is set after LC_CTYPE. We
5233 * unconditionally toggle away from and back to the current locale
5234 * prior to calling localeconv().
5236 * This code will have no effect if we already are in C, but khw
5237 * hasn't seen any cases where this causes problems when we are in the
5239 orig_NUMERIC_locale = toggle_locale_i(LC_NUMERIC_INDEX_, "C");
5240 toggle_locale_i(LC_NUMERIC_INDEX_, locale);
5244 /* No need for the extra toggle when not on Windows */
5245 orig_NUMERIC_locale = toggle_locale_i(LC_NUMERIC_INDEX_, locale);
5252 # if defined(USE_LOCALE_MONETARY) && defined(WIN32)
5254 /* Same Windows bug as described just above for NUMERIC. Otherwise, no
5255 * need to toggle LC_MONETARY, as it is kept in the underlying locale */
5256 const char * orig_MONETARY_locale = NULL;
5257 if (which_mask & INDEX_TO_BIT(LC_MONETARY_INDEX_)) {
5258 orig_MONETARY_locale = toggle_locale_i(LC_MONETARY_INDEX_, "C");
5259 toggle_locale_i(LC_MONETARY_INDEX_, locale);
5264 /* Finally ready to do the actual localeconv(). Lock to prevent other
5265 * accesses until we have made a copy of its returned static buffer */
5268 # if defined(TS_W32_BROKEN_LOCALECONV) && defined(USE_THREAD_SAFE_LOCALE)
5270 /* This is a workaround for another bug in Windows. localeconv() was
5271 * broken with thread-safe locales prior to VS 15. It looks at the global
5272 * locale instead of the thread one. As a work-around, we toggle to the
5273 * global locale; populate the return; then toggle back. We have to use
5274 * LC_ALL instead of the individual categories because of yet another bug
5275 * in Windows. And this all has to be done in a critical section.
5277 * This introduces a potential race with any other thread that has also
5278 * converted to use the global locale, and doesn't protect its locale calls
5279 * with mutexes. khw can't think of any reason for a thread to do so on
5280 * Windows, as the locale API is the same regardless of thread-safety,
5281 * except if the code is ported from working on another platform where
5282 * there might be some reason to do this. But this is typically due to
5283 * some alien-to-Perl library that thinks it owns locale setting. Such a
5284 * library isn't likely to exist on Windows, so such an application is
5285 * unlikely to be run on Windows
5287 bool restore_per_thread = FALSE;
5289 /* Save the per-thread locale state */
5290 const char * save_thread = querylocale_c(LC_ALL);
5292 /* Change to the global locale, and note if we already were there */
5293 int config_return = _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
5294 if (config_return != _DISABLE_PER_THREAD_LOCALE) {
5295 if (config_return == -1) {
5296 locale_panic_("_configthreadlocale returned an error");
5299 restore_per_thread = TRUE;
5302 /* Save the state of the global locale; then convert to our desired
5304 const char * save_global = querylocale_c(LC_ALL);
5305 void_setlocale_c(LC_ALL, save_thread);
5307 # endif /* TS_W32_BROKEN_LOCALECONV */
5309 /* Finally, do the actual localeconv */
5310 const char *lcbuf_as_string = (const char *) localeconv();
5312 /* Fill in the string fields of the HV* */
5313 for (unsigned int i = 0; i < 2; i++) {
5315 /* One iteration is only for the numeric string fields. Skip these
5316 * unless we are compiled to care about those fields and the input
5317 * parameters indicate we want their values */
5318 if ( i == NUMERIC_STRING_OFFSET
5320 # ifdef USE_LOCALE_NUMERIC
5322 && (which_mask & INDEX_TO_BIT(LC_NUMERIC_INDEX_)) == 0
5330 /* The other iteration is only for the monetary string fields. Again
5331 * skip it unless we want those values */
5332 if ( i == MONETARY_STRING_OFFSET
5334 # ifdef USE_LOCALE_MONETARY
5336 && (which_mask & INDEX_TO_BIT(LC_MONETARY_INDEX_)) == 0
5344 /* For each field for the given category ... */
5345 const lconv_offset_t * category_strings = strings[i];
5347 const char * name = category_strings->name;
5348 if (! name) { /* Quit at the end */
5352 /* we have set things up so that we know where in the returned
5353 * structure, when viewed as a string, the corresponding value is.
5355 const char *value = *((const char **)( lcbuf_as_string
5356 + category_strings->offset));
5358 /* Set to get next string on next iteration */
5361 /* Skip if this platform doesn't have this field. */
5366 /* Copy to the hash */
5369 newSVpv(value, strlen(value)),
5373 /* Add any int fields to the HV* */
5374 if (i == MONETARY_STRING_OFFSET && integers) {
5375 while (integers->name) {
5376 const char value = *((const char *)( lcbuf_as_string
5377 + integers->offset));
5378 (void) hv_store(hv, integers->name,
5379 strlen(integers->name), newSViv(value), 0);
5383 } /* End of loop through the fields */
5385 /* Done with copying to the hash. Can unwind the critical section locks */
5387 # if defined(TS_W32_BROKEN_LOCALECONV) && defined(USE_THREAD_SAFE_LOCALE)
5389 /* Restore the global locale's prior state */
5390 void_setlocale_c(LC_ALL, save_global);
5392 /* And back to per-thread locales */
5393 if (restore_per_thread) {
5394 if (_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) == -1) {
5395 locale_panic_("_configthreadlocale returned an error");
5399 /* Restore the per-thread locale state */
5400 void_setlocale_c(LC_ALL, save_thread);
5402 # endif /* TS_W32_BROKEN_LOCALECONV */
5404 gwLOCALE_UNLOCK; /* Finished with the critical section of a
5405 globally-accessible buffer */
5407 # if defined(USE_LOCALE_MONETARY) && defined(WIN32)
5409 restore_toggled_locale_i(LC_MONETARY_INDEX_, orig_MONETARY_locale);
5412 # ifdef USE_LOCALE_NUMERIC
5414 restore_toggled_locale_i(LC_NUMERIC_INDEX_, orig_NUMERIC_locale);
5415 if (which_mask & INDEX_TO_BIT(LC_NUMERIC_INDEX_)) {
5420 # ifdef USE_LOCALE_CTYPE
5422 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
5428 #endif /* defined(HAS_LOCALECONV) */
5429 #ifndef HAS_SOME_LANGINFO
5431 typedef int nl_item; /* Substitute 'int' for emulated nl_langinfo() */
5437 =for apidoc Perl_langinfo
5438 =for apidoc_item Perl_langinfo8
5440 C<Perl_langinfo> is an (almost) drop-in replacement for the system
5441 C<L<nl_langinfo(3)>>, taking the same C<item> parameter values, and returning
5442 the same information. But it is more thread-safe than regular
5443 C<nl_langinfo()>, and hides the quirks of Perl's locale handling from your
5444 code, and can be used on systems that lack a native C<nl_langinfo>.
5446 However, you should instead use the improved version of this:
5447 L</Perl_langinfo8>, which behaves identically except for an additional
5448 parameter, a pointer to a variable declared as L</C<utf8ness_t>>, into which it
5449 returns to you how you should treat the returned string with regards to it
5450 being encoded in UTF-8 or not.
5452 Concerning the differences between these and plain C<nl_langinfo()>:
5458 C<Perl_langinfo8> has an extra parameter, described above. Besides this, the
5459 other reason they aren't quite a drop-in replacement is actually an advantage.
5460 The C<const>ness of the return allows the compiler to catch attempts to write
5461 into the returned buffer, which is illegal and could cause run-time crashes.
5465 They deliver the correct results for the C<RADIXCHAR> and C<THOUSEP> items,
5466 without you having to write extra code. The reason for the extra code would be
5467 because these are from the C<LC_NUMERIC> locale category, which is normally
5468 kept set by Perl so that the radix is a dot, and the separator is the empty
5469 string, no matter what the underlying locale is supposed to be, and so to get
5470 the expected results, you have to temporarily toggle into the underlying
5471 locale, and later toggle back. (You could use plain C<nl_langinfo> and
5472 C<L</STORE_LC_NUMERIC_FORCE_TO_UNDERLYING>> for this but then you wouldn't get
5473 the other advantages of C<Perl_langinfo()>; not keeping C<LC_NUMERIC> in the C
5474 (or equivalent) locale would break a lot of CPAN, which is expecting the radix
5475 (decimal point) character to be a dot.)
5479 The system function they replace can have its static return buffer trashed,
5480 not only by a subsequent call to that function, but by a C<freelocale>,
5481 C<setlocale>, or other locale change. The returned buffer of these functions
5482 is not changed until the next call to one or the other, so the buffer is never
5487 The return buffer is per-thread, so it also is never overwritten by a call to
5488 these functions from another thread; unlike the function it replaces.
5492 But most importantly, they work on systems that don't have C<nl_langinfo>, such
5493 as Windows, hence making your code more portable. Of the fifty-some possible
5494 items specified by the POSIX 2008 standard,
5495 L<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/langinfo.h.html>,
5496 only one is completely unimplemented, though on non-Windows platforms, another
5497 significant one is not fully implemented). They use various techniques to
5498 recover the other items, including calling C<L<localeconv(3)>>, and
5499 C<L<strftime(3)>>, both of which are specified in C89, so should be always be
5500 available. Later C<strftime()> versions have additional capabilities.
5501 If an item is not available on your system, this returns either the value
5502 associated with the C locale, or simply C<"">, whichever is more appropriate.
5504 It is important to note that, when called with an item that is recovered by
5505 using C<localeconv>, the buffer from any previous explicit call to
5506 C<L<localeconv(3)>> will be overwritten. But you shouldn't be using
5507 C<localeconv> anyway because it is is very much not thread-safe, and suffers
5508 from the same problems outlined in item 'b.' above for the fields it returns
5509 that are controlled by the LC_NUMERIC locale category. Instead, avoid all of
5510 those problems by calling L</Perl_localeconv>, which is thread-safe; or by
5511 using the methods given in L<perlcall> to call
5512 L<C<POSIX::localeconv()>|POSIX/localeconv>, which is also thread-safe.
5516 The details for those items which may deviate from what this emulation returns
5517 and what a native C<nl_langinfo()> would return are specified in
5520 When using C<Perl_langinfo8> (or plain C<Perl_langinfo>) on systems that don't
5521 have a native C<nl_langinfo()>, you must
5523 #include "perl_langinfo.h"
5525 before the C<perl.h> C<#include>. You can replace your F<langinfo.h>
5526 C<#include> with this one. (Doing it this way keeps out the symbols that plain
5527 F<langinfo.h> would try to import into the namespace for code that doesn't need
5535 Perl_langinfo(const nl_item item)
5537 return Perl_langinfo8(item, NULL);
5541 Perl_langinfo8(const nl_item item, utf8ness_t * utf8ness)
5544 locale_category_index cat_index;
5546 PERL_ARGS_ASSERT_PERL_LANGINFO8;
5548 if (utf8ness) { /* Assume for now */
5549 *utf8ness = UTF8NESS_IMMATERIAL;
5552 /* Find the locale category that controls the input 'item'. If we are not
5553 * paying attention to that category, instead return a default value. Also
5554 * return the default value if there is no way for us to figure out the
5555 * correct value. If we have some form of nl_langinfo(), we can always
5556 * figure it out, but lacking that, there may be alternative methods that
5557 * can be used to recover most of the possible items. Some of those
5558 * methods need libc functions, which may or may not be available. If
5559 * unavailable, we can't compute the correct value, so must here return the
5565 #ifdef USE_LOCALE_CTYPE
5567 cat_index = LC_CTYPE_INDEX_;
5573 #if defined(USE_LOCALE_MESSAGES) && defined(HAS_SOME_LANGINFO)
5575 case YESEXPR: case YESSTR: case NOEXPR: case NOSTR:
5576 cat_index = LC_MESSAGES_INDEX_;
5579 case YESEXPR: return "^[+1yY]";
5580 case YESSTR: return "yes";
5581 case NOEXPR: return "^[-0nN]";
5582 case NOSTR: return "no";
5587 #if defined(USE_LOCALE_MONETARY) \
5588 && (defined(HAS_SOME_LANGINFO) || defined(HAS_LOCALECONV))
5590 cat_index = LC_MONETARY_INDEX_;
5598 #ifdef CAN_CALCULATE_RADIX
5600 cat_index = LC_NUMERIC_INDEX_;
5603 return C_decimal_point;
5608 #if defined(USE_LOCALE_NUMERIC) \
5609 && (defined(HAS_SOME_LANGINFO) || defined(HAS_LOCALECONV))
5611 cat_index = LC_NUMERIC_INDEX_;
5614 return C_thousands_sep;
5617 /* The other possible items are all in LC_TIME. */
5618 #ifdef USE_LOCALE_TIME
5621 cat_index = LC_TIME_INDEX_;
5625 #if ! defined(USE_LOCALE_TIME) || ! defined(HAS_SOME_LANGINFO)
5627 /* If not using LC_TIME, hard code the rest. Or, if there is no
5628 * nl_langinfo(), we use strftime() as an alternative, and it is missing
5629 * functionality to get every single one, so hard-code those */
5631 case ERA: return ""; /* Unimplemented; for use with strftime() %E
5634 /* These formats are defined by C89, so we assume that strftime supports
5635 * them, and so are returned unconditionally; they may not be what the
5636 * locale actually says, but should give good enough results for someone
5637 * using them as formats (as opposed to trying to parse them to figure
5638 * out what the locale says). The other format items are actually tested
5639 * to verify they work on the platform */
5640 case D_FMT: return "%x";
5641 case T_FMT: return "%X";
5642 case D_T_FMT: return "%c";
5644 # if defined(WIN32) || ! defined(USE_LOCALE_TIME)
5646 /* strftime() on Windows doesn't have the POSIX (beyond C89) extensions
5647 * that would allow it to recover these */
5648 case ERA_D_FMT: return "%x";
5649 case ERA_T_FMT: return "%X";
5650 case ERA_D_T_FMT: return "%c";
5651 case ALT_DIGITS: return "0";
5654 # ifndef USE_LOCALE_TIME
5656 case T_FMT_AMPM: return "%r";
5657 case ABDAY_1: return "Sun";
5658 case ABDAY_2: return "Mon";
5659 case ABDAY_3: return "Tue";
5660 case ABDAY_4: return "Wed";
5661 case ABDAY_5: return "Thu";
5662 case ABDAY_6: return "Fri";
5663 case ABDAY_7: return "Sat";
5664 case AM_STR: return "AM";
5665 case PM_STR: return "PM";
5666 case ABMON_1: return "Jan";
5667 case ABMON_2: return "Feb";
5668 case ABMON_3: return "Mar";
5669 case ABMON_4: return "Apr";
5670 case ABMON_5: return "May";
5671 case ABMON_6: return "Jun";
5672 case ABMON_7: return "Jul";
5673 case ABMON_8: return "Aug";
5674 case ABMON_9: return "Sep";
5675 case ABMON_10: return "Oct";
5676 case ABMON_11: return "Nov";
5677 case ABMON_12: return "Dec";
5678 case DAY_1: return "Sunday";
5679 case DAY_2: return "Monday";
5680 case DAY_3: return "Tuesday";
5681 case DAY_4: return "Wednesday";
5682 case DAY_5: return "Thursday";
5683 case DAY_6: return "Friday";
5684 case DAY_7: return "Saturday";
5685 case MON_1: return "January";
5686 case MON_2: return "February";
5687 case MON_3: return "March";
5688 case MON_4: return "April";
5689 case MON_5: return "May";
5690 case MON_6: return "June";
5691 case MON_7: return "July";
5692 case MON_8: return "August";
5693 case MON_9: return "September";
5694 case MON_10: return "October";
5695 case MON_11: return "November";
5696 case MON_12: return "December";
5701 } /* End of switch on item */
5705 Perl_croak_nocontext("panic: Unexpected nl_langinfo() item %d", item);
5706 NOT_REACHED; /* NOTREACHED */
5707 PERL_UNUSED_VAR(cat_index);
5711 return my_langinfo_i(item, cat_index, query_nominal_locale_i(cat_index),
5712 &PL_langinfo_buf, &PL_langinfo_bufsize, utf8ness);
5718 Perl_my_strftime(pTHX_ const char *fmt, int sec, int min, int hour,
5719 int mday, int mon, int year, int wday, int yday,
5721 { /* Documented above */
5722 PERL_ARGS_ASSERT_MY_STRFTIME;
5725 ints_to_tm(&mytm, sec, min, hour, mday, mon, year, wday, yday, isdst);
5726 char * ret = strftime_tm(fmt, &mytm);
5731 Perl_sv_strftime_tm(pTHX_ SV * fmt, const struct tm * mytm)
5732 { /* Documented above */
5733 PERL_ARGS_ASSERT_SV_STRFTIME_TM;
5735 utf8ness_t fmt_utf8ness = (SvUTF8(fmt) && LIKELY(! IN_BYTES))
5739 utf8ness_t result_utf8ness;
5740 char * retval = strftime8(SvPV_nolen(fmt),
5744 true /* calling from sv_strftime */
5748 STRLEN len = strlen(retval);
5750 sv_usepvn_flags(sv, retval, len, SV_HAS_TRAILING_NUL);
5752 if (result_utf8ness == UTF8NESS_YES) {
5761 Perl_sv_strftime_ints(pTHX_ SV * fmt, int sec, int min, int hour,
5762 int mday, int mon, int year, int wday,
5763 int yday, int isdst)
5764 { /* Documented above */
5765 PERL_ARGS_ASSERT_SV_STRFTIME_INTS;
5768 ints_to_tm(&mytm, sec, min, hour, mday, mon, year, wday, yday, isdst);
5769 SV * ret = sv_strftime_tm(fmt, &mytm);
5775 /* There are several implementations of my_langinfo, depending on the
5776 * Configuration. They all share the same beginning of the function */
5778 S_my_langinfo_i(pTHX_
5779 const nl_item item, /* The item to look up */
5780 const locale_category_index cat_index, /* The locale category
5782 /* The locale to look up 'item' in. */
5783 const char * locale,
5785 /* Where to store the result, and where the size of that buffer
5786 * is stored, updated on exit. retbuf_sizep may be NULL for an
5787 * empty-on-entry, single use buffer whose size we don't need
5788 * to keep track of */
5790 Size_t * retbuf_sizep,
5792 /* If not NULL, the location to store the UTF8-ness of 'item's
5793 * value, as documented */
5794 utf8ness_t * utf8ness)
5796 const char * retval = NULL;
5798 PERL_ARGS_ASSERT_MY_LANGINFO_I;
5799 assert(cat_index < LC_ALL_INDEX_);
5801 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
5802 "Entering my_langinfo item=%ld, using locale %s\n",
5803 (long) item, locale));
5804 /*--------------------------------------------------------------------------*/
5805 /* Above is the common beginning to all the implementations of my_langinfo().
5806 * Below are the various completions.
5808 * Some platforms don't deal well with non-ASCII strings in locale X when
5809 * LC_CTYPE is not in X. (Actually it is probably when X is UTF-8 and LC_CTYPE
5810 * isn't, or vice versa). There is explicit code to bring the categories into
5811 * sync. This doesn't seem to be a problem with nl_langinfo(), so that
5812 * implementation doesn't currently worry about it. But it is a problem on
5813 * Windows boxes, which don't have nl_langinfo(). */
5815 /*--------------------------------------------------------------------------*/
5816 # if defined(HAS_NL_LANGINFO) /* nl_langinfo() is available. */
5817 # ifdef USE_LOCALE_CTYPE
5819 /* This function sorts out if things actually have to be switched or not,
5820 * for both save and restore. */
5821 const char * orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, locale);
5825 const char * orig_switched_locale = toggle_locale_i(cat_index, locale);
5828 retval = save_to_buffer(nl_langinfo(item), retbufp, retbuf_sizep);
5832 *utf8ness = get_locale_string_utf8ness_i(retval,
5833 LOCALE_UTF8NESS_UNKNOWN,
5837 restore_toggled_locale_i(cat_index, orig_switched_locale);
5839 # ifdef USE_LOCALE_CTYPE
5841 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
5846 /*--------------------------------------------------------------------------*/
5847 # else /* Below, emulate nl_langinfo as best we can */
5849 /* The other completion is where we have to emulate nl_langinfo(). There
5850 * are various possibilities depending on the Configuration */
5852 # ifdef USE_LOCALE_CTYPE
5854 const char * orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, locale);
5858 const char * orig_switched_locale = toggle_locale_i(cat_index, locale);
5860 /* Here, we are in the locale we want information about */
5862 /* Almost all the items will have ASCII return values. Set that here, and
5863 * override if necessary */
5864 utf8ness_t is_utf8 = UTF8NESS_IMMATERIAL;
5868 assert(item < 0); /* Make sure using perl_langinfo.h */
5874 # if defined(HAS_SNPRINTF) \
5875 && (! defined(HAS_LOCALECONV) || defined(TS_W32_BROKEN_LOCALECONV))
5878 /* snprintf() can be used to find the radix character by outputting
5879 * a known simple floating point number to a buffer, and parsing
5880 * it, inferring the radix as the bytes separating the integer and
5881 * fractional parts. But localeconv() is more direct, not
5882 * requiring inference, so use it instead of the code just below,
5883 * if (likely) it is available and works ok */
5885 char * floatbuf = NULL;
5886 const Size_t initial_size = 10;
5888 Newx(floatbuf, initial_size, char);
5890 /* 1.5 is exactly representable on binary computers */
5891 Size_t needed_size = snprintf(floatbuf, initial_size, "%.1f", 1.5);
5893 /* If our guess wasn't big enough, increase and try again, based on
5894 * the real number that snprintf() is supposed to return */
5895 if (UNLIKELY(needed_size >= initial_size)) {
5896 needed_size++; /* insurance */
5897 Renew(floatbuf, needed_size, char);
5898 Size_t new_needed = snprintf(floatbuf, needed_size, "%.1f", 1.5);
5899 assert(new_needed <= needed_size);
5900 needed_size = new_needed;
5903 char * s = floatbuf;
5904 char * e = floatbuf + needed_size;
5907 while (s < e && *s != '1') {
5911 if (LIKELY(s < e)) {
5916 char * item_start = s;
5917 while (s < e && *s != '5') {
5921 /* Everything in between is the radix string */
5922 if (LIKELY(s < e)) {
5924 retval = save_to_buffer(item_start, retbufp, retbuf_sizep);
5928 is_utf8 = get_locale_string_utf8ness_i(retval,
5929 LOCALE_UTF8NESS_UNKNOWN,
5939 # ifdef HAS_LOCALECONV /* snprintf() failed; drop down to use
5944 # else /* snprintf() failed and no localeconv() */
5946 retval = C_decimal_point;
5951 # ifdef HAS_LOCALECONV
5953 /* These items are available from localeconv(). (To avoid using
5954 * TS_W32_BROKEN_LOCALECONV, one could use GetNumberFormat and
5955 * GetCurrencyFormat; patches welcome) */
5957 # define P_CS_PRECEDES "p_cs_precedes"
5958 # define CURRENCY_SYMBOL "currency_symbol"
5960 /* case RADIXCHAR: // May drop down to here in some configurations */
5965 /* The hash gets populated with just the field(s) related to 'item'. */
5966 HV * result_hv = my_localeconv(item);
5969 if (item != CRNCYSTR) {
5971 /* These items have been populated with just one key => value */
5972 (void) hv_iterinit(result_hv);
5973 HE * entry = hv_iternext(result_hv);
5974 string = hv_iterval(result_hv, entry);
5978 /* But CRNCYSTR localeconv() returns a slightly different value
5979 * than the nl_langinfo() API calls for, so have to modify this one
5980 * to conform. We need another value from localeconv() to know
5981 * what to change it to. my_localeconv() has populated the hash
5982 * with exactly both fields. Delete this one, leaving just the
5983 * CRNCYSTR one in the hash */
5984 SV* precedes = hv_delete(result_hv,
5985 P_CS_PRECEDES, STRLENs(P_CS_PRECEDES),
5988 locale_panic_("my_localeconv() unexpectedly didn't return"
5989 " a value for " P_CS_PRECEDES);
5992 /* The modification is to prefix the localeconv() return with a
5993 * single byte, calculated as follows: */
5994 char prefix = (LIKELY(SvIV(precedes) != -1))
5995 ? ((precedes != 0) ? '-' : '+')
5997 /* khw couldn't find any documentation that
5998 * CHAR_MAX (which we modify to -1) is the signal,
5999 * but cygwin uses it thusly, and it makes sense
6000 * given that CHAR_MAX indicates the value isn't
6001 * used, so it neither precedes nor succeeds */
6004 /* Now get CRNCYSTR */
6005 (void) hv_iterinit(result_hv);
6006 HE * entry = hv_iternext(result_hv);
6007 string = hv_iterval(result_hv, entry);
6009 /* And perform the modification */
6010 Perl_sv_setpvf(aTHX_ string, "%c%s", prefix, SvPV_nolen(string));
6013 /* Here, 'string' contains the value we want to return */
6014 retval = save_to_buffer(SvPV_nolen(string), retbufp, retbuf_sizep);
6017 is_utf8 = get_locale_string_utf8ness_i(retval,
6018 LOCALE_UTF8NESS_UNKNOWN,
6027 # endif /* Some form of localeconv */
6028 # ifdef HAS_STRFTIME
6030 /* These formats are only available in later strftime's */
6031 case ERA_D_FMT: case ERA_T_FMT: case ERA_D_T_FMT: case T_FMT_AMPM:
6033 /* The rest can be gotten from most versions of strftime(). */
6034 case ABDAY_1: case ABDAY_2: case ABDAY_3:
6035 case ABDAY_4: case ABDAY_5: case ABDAY_6: case ABDAY_7:
6037 case AM_STR: case PM_STR:
6038 case ABMON_1: case ABMON_2: case ABMON_3: case ABMON_4:
6039 case ABMON_5: case ABMON_6: case ABMON_7: case ABMON_8:
6040 case ABMON_9: case ABMON_10: case ABMON_11: case ABMON_12:
6041 case DAY_1: case DAY_2: case DAY_3: case DAY_4:
6042 case DAY_5: case DAY_6: case DAY_7:
6043 case MON_1: case MON_2: case MON_3: case MON_4:
6044 case MON_5: case MON_6: case MON_7: case MON_8:
6045 case MON_9: case MON_10: case MON_11: case MON_12:
6047 const char * format;
6048 bool return_format = FALSE;
6053 GCC_DIAG_IGNORE_STMT(-Wimplicit-fallthrough);
6057 locale_panic_(Perl_form(aTHX_ "switch case: %d problem", item));
6058 NOT_REACHED; /* NOTREACHED */
6060 case PM_STR: hour = 18;
6064 case ABDAY_7: mday++;
6065 case ABDAY_6: mday++;
6066 case ABDAY_5: mday++;
6067 case ABDAY_4: mday++;
6068 case ABDAY_3: mday++;
6069 case ABDAY_2: mday++;
6082 case ABMON_12: mon++;
6083 case ABMON_11: mon++;
6084 case ABMON_10: mon++;
6085 case ABMON_9: mon++;
6086 case ABMON_8: mon++;
6087 case ABMON_7: mon++;
6088 case ABMON_6: mon++;
6089 case ABMON_5: mon++;
6090 case ABMON_4: mon++;
6091 case ABMON_3: mon++;
6092 case ABMON_2: mon++;
6112 return_format = TRUE;
6116 return_format = TRUE;
6120 return_format = TRUE;
6124 return_format = TRUE;
6127 format = "%Ow"; /* Find the alternate digit for 0 */
6131 GCC_DIAG_RESTORE_STMT;
6133 /* The year was deliberately chosen so that January 1 is on the
6134 * first day of the week. Since we're only getting one thing at a
6135 * time, it all works */
6137 ints_to_tm(&mytm, 30, 30, hour, mday, mon, 2011, 0, 0, 0);
6140 temp = strftime8(format,
6142 UTF8NESS_IMMATERIAL, /* All possible formats
6146 false /* not calling from sv_strftime */
6150 temp = strftime_tm(format, &mytm);
6153 retval = save_to_buffer(temp, retbufp, retbuf_sizep);
6156 /* If the item is 'ALT_DIGITS', '*retbuf' contains the alternate
6157 * format for wday 0. If the value is the same as the normal 0,
6158 * there isn't an alternate, so clear the buffer.
6160 * (wday was chosen because its range is all a single digit.
6161 * Things like tm_sec have two digits as the minimum: '00'.) */
6162 if (item == ALT_DIGITS && strEQ(*retbufp, "0")) {
6167 /* ALT_DIGITS is problematic. Experiments on it showed that
6168 * strftime() did not always work properly when going from alt-9 to
6169 * alt-10. Only a few locales have this item defined, and in all
6170 * of them on Linux that khw was able to find, nl_langinfo() merely
6171 * returned the alt-0 character, possibly doubled. Most Unicode
6172 * digits are in blocks of 10 consecutive code points, so that is
6173 * sufficient information for such scripts, as we can infer alt-1,
6174 * alt-2, .... But for a Japanese locale, a CJK ideographic 0 is
6175 * returned, and the CJK digits are not in code point order, so you
6176 * can't really infer anything. The localedef for this locale did
6177 * specify the succeeding digits, so that strftime() works properly
6178 * on them, without needing to infer anything. But the
6179 * nl_langinfo() return did not give sufficient information for the
6180 * caller to understand what's going on. So until there is
6181 * evidence that it should work differently, this returns the alt-0
6182 * string for ALT_DIGITS. */
6184 if (return_format) {
6186 /* If to return the format, not the value, overwrite the buffer
6187 * with it. But some strftime()s will keep the original format
6188 * if illegal, so change those to "" */
6189 if (strEQ(*retbufp, format)) {
6196 /* A format is always in ASCII */
6197 is_utf8 = UTF8NESS_IMMATERIAL;
6204 # ifdef USE_LOCALE_CTYPE
6208 /* The trivial case */
6209 if (isNAME_C_OR_POSIX(locale)) {
6216 /* This function retrieves the code page. It is subject to change, but
6217 * is documented and has been stable for many releases */
6218 UINT ___lc_codepage_func(void);
6220 # ifndef WIN32_USE_FAKE_OLD_MINGW_LOCALES
6222 retval = save_to_buffer(Perl_form(aTHX_ "%d", ___lc_codepage_func()),
6223 retbufp, retbuf_sizep);
6226 retval = save_to_buffer(nl_langinfo(CODESET),
6227 retbufp, retbuf_sizep);
6230 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "locale='%s' cp=%s\n",
6236 /* The codeset is important, but khw did not figure out a way for it to
6237 * be retrieved on non-Windows boxes without nl_langinfo(). But even
6238 * if we can't get it directly, we can usually determine if it is a
6239 * UTF-8 locale or not. If it is UTF-8, we (correctly) use that for
6242 # if defined(HAS_MBTOWC) || defined(HAS_MBRTOWC)
6244 /* If libc mbtowc() evaluates the bytes that form the REPLACEMENT
6245 * CHARACTER as that Unicode code point, this has to be a UTF-8 locale.
6248 (void) Perl_mbtowc_(aTHX_ NULL, NULL, 0);/* Reset shift state */
6249 int mbtowc_ret = Perl_mbtowc_(aTHX_ &wc,
6250 STR_WITH_LEN(REPLACEMENT_CHARACTER_UTF8));
6251 if (mbtowc_ret >= 0 && wc == UNICODE_REPLACEMENT) {
6252 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6253 "mbtowc returned REPLACEMENT\n"));
6258 /* Here, it isn't a UTF-8 locale. After the #else clause is code to
6259 * find the codeset (if any) from the locale name */
6263 /* Here, neither mbtowc() nor mbrtowc() is available. The chances of
6264 * this are very small, as they are C99 required functions, and we are
6265 * now requiring C99; perhaps this is a defective implementation and
6266 * therefore Configure has been set to indicate neither exists.
6268 * Just below we try to calculate the code set from the locale name.
6269 * In all cases but this one, it has already been determined that it is
6270 * not a UTF-8 locale. But for this case, we defer that, calculate the
6271 * code set name, if any, and later use that result as a hint. First
6272 * #define a symbol to later tell us that we need to handle this case.
6274 # define NEED_FURTHER_UTF8NESS_CHECKING
6277 /* Here, the code set has not been found. The only other option khw
6278 * could think of is to see if the codeset is part of the locale name.
6279 * This is very less than ideal; often there is no code set in the
6280 * name; and at other times they even lie.
6282 * But there is an XPG standard syntax, which many locales follow:
6284 * language[_territory[.codeset]][@modifier]
6286 * So we take the part between the dot and any '@' */
6287 retval = strchr(locale, '.');
6289 retval = ""; /* Alas, no dot */
6293 /* Don't include the dot */
6296 /* And stop before any '@' */
6297 const char * modifier = strchr(retval, '@');
6299 char * code_set_name;
6300 const Size_t name_len = modifier - retval;
6301 Newx(code_set_name, name_len + 1, char); /* +1 for NUL */
6302 my_strlcpy(code_set_name, retval, name_len + 1);
6303 SAVEFREEPV(code_set_name);
6304 retval = code_set_name;
6307 /* The code set name is considered to be everything between the dot
6309 retval = save_to_buffer(retval, retbufp, retbuf_sizep);
6312 # ifndef NEED_FURTHER_UTF8NESS_CHECKING
6314 break; /* All done */
6317 # define NAME_INDICATES_UTF8 0x1
6318 # define MB_CUR_MAX_SUGGESTS_UTF8 0x2
6320 /* Here, 'retval' contains whatever code set name is in the locale
6321 * name. In this #else, it being a UTF-8 code set hasn't been
6322 * determined, because this platform is lacking the libc functions
6323 * which would readily return that information. So, we try to infer
6324 * the UTF-8ness by other means, using the code set name just found as
6325 * a hint to help resolve ambiguities. So if that name indicates it is
6326 * UTF-8, we expect it to be so */
6327 unsigned int lean_towards_being_utf8 = 0;
6328 if (is_codeset_name_UTF8(retval)) {
6329 lean_towards_being_utf8 |= NAME_INDICATES_UTF8;
6332 /* The code set is often UTF-8, even when the locale name doesn't so
6333 * indicate. If we discover this is so, we will override whatever the
6334 * locale name said. Conversely (but rarely), "UTF-8" in the locale
6335 * name might be wrong. We return "" as the code set name if we find
6336 * that to be the case.
6338 * For this portion of the file to compile, neither mbtowc() nor
6339 * mbrtowc() are available to us, even though they are required by C99.
6340 * So, something must be wrong with them. The code here should be good
6341 * enough to work around this issue, but should the need arise, you
6342 * could look for other C99 functions that are implemented correctly to
6345 * But MB_CUR_MAX is a C99 construct that helps a lot, is simple for a
6346 * vendor to implement, and our experience with it is that it works
6347 * well on a variety of platforms. We have found that it returns a
6348 * too-large number on some platforms for the C locale, but for no
6349 * others. That locale was already ruled out above. (If MB_CUR_MAX
6350 * returned too small a number, that would break a lot of things, and
6351 * likely would be quickly corrected by the vendor.) khw has some
6352 * confidence that it doesn't return >1 when 1 is meant, as that would
6353 * trigger a Perl warning, and we've had no reports of invalid
6354 * occurrences of such. */
6357 /* If there are fewer bytes available in this locale than are required
6358 * to represent the largest legal UTF-8 code point, this definitely
6359 * isn't a UTF-8 locale, even if the locale name says it is. */
6361 const int mb_cur_max = MB_CUR_MAX;
6363 if (mb_cur_max < (int) UNISKIP(PERL_UNICODE_MAX)) {
6364 if (lean_towards_being_utf8 & NAME_INDICATES_UTF8) {
6365 retval = ""; /* The name is wrong; override */
6371 /* But if the locale could be UTF-8, and also the name corroborates
6372 * this, assume it is so */
6373 if (lean_towards_being_utf8 & NAME_INDICATES_UTF8) {
6377 /* Here, the name doesn't indicate UTF-8, but MB_CUR_MAX indicates it
6378 * could be. khw knows of only two other locales in the world, EUC-TW
6379 * and GB 18030, that legitimately require this many bytes (4). In
6380 * both, the single byte characters are the same as ASCII. No
6381 * multi-byte character in EUC-TW is legal UTF-8 (since the first byte
6382 * of each is a continuation). GB 18030 has no three byte sequences,
6383 * and none of the four byte ones is legal UTF-8 (as the second byte
6384 * for these is a non-continuation). But every legal UTF-8 two byte
6385 * sequence is also legal in GB 18030, though none have the same
6386 * meaning, and no Han code point expressed in UTF-8 is two byte. So
6387 * the further tests below which look for native expressions of
6388 * currency and time will not return two byte sequences, hence they
6389 * will reliably rule out this locale as being UTF-8. So, if we get
6390 * this far, the result is almost certainly UTF-8. But to be really
6391 * sure, also check that there is no illegal UTF-8. */
6392 lean_towards_being_utf8 |= MB_CUR_MAX_SUGGESTS_UTF8;
6394 # endif /* has MB_CUR_MAX */
6396 /* Here, MB_CUR_MAX is not available, or was inconclusive. What we do
6397 * is to look at various strings associated with the locale:
6398 * 1) If any are illegal UTF-8, the locale can't be UTF-8.
6399 * 2) If all are legal UTF-8, and some non-ASCII characters are
6400 * present, it is likely to be UTF-8, because of the strictness of
6401 * UTF-8 syntax. So assume it is UTF-8
6402 * 3) If all are ASCII and the locale name and/or MB_CUR_MAX indicate
6403 * UTF-8, assume the locale is UTF-8.
6404 * 4) Otherwise, assume the locale isn't UTF-8
6406 * To save cycles, if the locale name indicates it is a UTF-8 locale,
6407 * we stop looking at the first instance with legal non-ASCII UTF-8.
6408 * It is very unlikely this combination is coincidental. */
6410 utf8ness_t strings_utf8ness = UTF8NESS_UNKNOWN;
6411 char * scratch_buf = NULL;
6412 Size_t scratch_buf_size = 0;
6414 /* List of strings to look at */
6415 const int trials[] = {
6417 # if defined(USE_LOCALE_MONETARY) && defined(HAS_LOCALECONV)
6419 /* The first string tried is the locale currency name. Often that
6420 * will be in the native script.
6422 * But this is usable only if localeconv() is available, as that's
6423 * the way we find out the currency symbol. */
6428 # ifdef USE_LOCALE_TIME
6430 /* We can also try various strings associated with LC_TIME, like the
6431 * names of months or days of the week */
6433 DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7,
6434 MON_1, MON_2, MON_3, MON_4, MON_5, MON_6, MON_7, MON_8,
6435 MON_9, MON_10, MON_11, MON_12,
6436 ALT_DIGITS, AM_STR, PM_STR,
6437 ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7,
6438 ABMON_1, ABMON_2, ABMON_3, ABMON_4, ABMON_5, ABMON_6,
6439 ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12
6444 # ifdef USE_LOCALE_TIME
6446 /* The code in the recursive call below can handle switching the
6447 * locales, but by doing it now here, that code will check and discover
6448 * that there is no need to switch then restore, avoiding those each
6450 const char * orig_TIME_locale = toggle_locale_c(LC_TIME, locale);
6454 /* The trials array may consist of strings from two different locale
6455 * categories. The call to my_langinfo_i() below needs to pass the
6456 * proper category for each string. There is a max of 1 trial for
6457 * LC_MONETARY; the rest are LC_TIME. So the array is arranged so the
6458 * LC_MONETARY item (if any) is first, and all subsequent iterations
6459 * will use LC_TIME. These #ifdefs set up the values for all possible
6461 # if defined(USE_LOCALE_MONETARY) && defined(HAS_LOCALECONV)
6463 locale_category_index cat_index = LC_MONETARY_INDEX_;
6465 # ifdef USE_LOCALE_TIME
6467 const locale_category_index follow_on_cat_index = LC_TIME_INDEX_;
6468 assert(trials[1] == DAY_1); /* Make sure only a single non-time entry */
6472 /* Effectively out-of-bounds, as there is only the monetary entry */
6473 const locale_category_index follow_on_cat_index = LC_ALL_INDEX_;
6476 # elif defined(USE_LOCALE_TIME)
6478 locale_category_index cat_index = LC_TIME_INDEX_;
6479 const locale_category_index follow_on_cat_index = LC_TIME_INDEX_;
6483 /* Effectively out-of-bounds, as here there are no trial entries at
6484 * all. This allows this code to compile, but there are no strings to
6485 * test, and so the answer will always be non-UTF-8. */
6486 locale_category_index cat_index = LC_ALL_INDEX_;
6487 const locale_category_index follow_on_cat_index = LC_ALL_INDEX_;
6490 /* Everything set up; look through all the strings */
6491 for (PERL_UINT_FAST8_T i = 0; i < C_ARRAY_LENGTH(trials); i++) {
6492 (void) my_langinfo_i(trials[i], cat_index, locale,
6493 &scratch_buf, &scratch_buf_size, NULL);
6494 cat_index = follow_on_cat_index;
6496 /* To prevent infinite recursive calls, we don't ask for the
6497 * UTF-8ness of the string (in 'trials[i]') above. Instead we
6498 * examine the returned string here */
6499 const Size_t len = strlen(scratch_buf);
6500 const U8 * first_variant;
6502 /* If the string is identical whether or not it is encoded as
6503 * UTF-8, it isn't helpful in determining UTF8ness. */
6504 if (is_utf8_invariant_string_loc((U8 *) scratch_buf, len,
6510 /* Here, has non-ASCII. If not legal UTF-8, isn't a UTF-8
6512 if (! is_utf8_string(first_variant,
6513 len - (first_variant - (U8 *) scratch_buf)))
6515 strings_utf8ness = UTF8NESS_NO;
6519 /* Here, is a legal non-ASCII UTF-8 string; tentatively set the
6520 * return to YES; possibly overridden by later iterations */
6521 strings_utf8ness = UTF8NESS_YES;
6523 /* But if this corroborates our expectation, quit now */
6524 if (lean_towards_being_utf8 & NAME_INDICATES_UTF8) {
6529 # ifdef USE_LOCALE_TIME
6531 restore_toggled_locale_c(LC_TIME, orig_TIME_locale);
6535 Safefree(scratch_buf);
6538 if (strings_utf8ness == UTF8NESS_NO) {
6539 /* 'retval' is already loaded with whatever code set we found. */
6543 /* Here all tested strings are legal UTF-8.
6545 * Above we set UTF8NESS_YES if any string wasn't ASCII. But even if
6546 * they are all ascii, and the locale name indicates it is a UTF-8
6547 * locale, assume the locale is UTF-8. */
6548 if (lean_towards_being_utf8) {
6549 strings_utf8ness = UTF8NESS_YES;
6552 if (strings_utf8ness == UTF8NESS_YES) {
6557 /* Here, nothing examined indicates that the codeset is or isn't UTF-8.
6558 * But what is it? The other locale categories are not likely to be of
6561 * LC_NUMERIC Only a few locales in the world have a non-ASCII radix
6562 * or group separator.
6563 * LC_CTYPE This code wouldn't be compiled if mbtowc() existed and
6564 * was reliable. This is unlikely in C99. There are
6565 * other functions that could be used instead, but are
6566 * they going to exist, and be able to distinguish between
6567 * UTF-8 and 8859-1? Deal with this only if it becomes
6569 * LC_MESSAGES The strings returned from strerror() would seem likely
6570 * candidates, but experience has shown that many systems
6571 * don't actually have translations installed for them.
6572 * They are instead always in English, so everything in
6573 * them is ASCII, which is of no help to us. A Configure
6574 * probe could possibly be written to see if this platform
6575 * has non-ASCII error messages. But again, wait until it
6576 * turns out to be an actual problem.
6578 * Things like YESSTR, NOSTR, might not be in ASCII, but
6579 * need nl_langinfo() to access, which we don't have.
6582 /* Otherwise, assume the locale isn't UTF-8. This can be wrong if we
6583 * don't have MB_CUR_MAX, and the locale is English without UTF-8 in
6584 * its name, and with a dollar currency symbol. */
6585 break; /* 'retval' is already loaded with whatever code set we found. */
6587 # endif /* NEED_FURTHER_UTF8NESS_CHECKING */
6588 # endif /* ! WIN32 */
6589 # endif /* USE_LOCALE_CTYPE */
6591 } /* Giant switch() of nl_langinfo() items */
6593 restore_toggled_locale_i(cat_index, orig_switched_locale);
6595 # ifdef USE_LOCALE_CTYPE
6596 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
6600 *utf8ness = is_utf8;
6605 # endif /* All the implementations of my_langinfo() */
6607 /*--------------------------------------------------------------------------*/
6609 } /* my_langinfo() */
6611 #endif /* USE_LOCALE */
6614 =for apidoc_section $time
6615 =for apidoc sv_strftime_tm
6616 =for apidoc_item sv_strftime_ints
6617 =for apidoc_item my_strftime
6619 These implement the libc strftime(), but with a different API so that the return
6620 value is a pointer to the formatted result (which MUST be arranged to be FREED
6621 BY THE CALLER). This allows these functions to increase the buffer size as
6622 needed, so that the caller doesn't have to worry about that.
6624 On failure they return NULL, and set errno to C<EINVAL>.
6626 C<sv_strftime_tm> and C<sv_strftime_ints> are preferred, as they transparently
6627 handle the UTF-8ness of the current locale, the input C<fmt>, and the returned
6628 result. Only if the current C<LC_TIME> locale is a UTF-8 one (and S<C<use
6629 bytes>> is not in effect) will the result be marked as UTF-8. These differ
6630 only in the form of their inputs. C<sv_strftime_tm> takes a filled-in
6631 S<C<struct tm>> parameter. C<sv_strftime_ints> takes a bunch of integer
6632 parameters that together completely define a given time.
6634 C<my_strftime> is kept for backwards compatibility. Knowing if the result
6635 should be considered UTF-8 or not requires significant extra logic.
6637 Note that C<yday> and C<wday> effectively are ignored by C<sv_strftime_ints>
6638 and C<my_strftime>, as mini_mktime() overwrites them
6640 Also note that all three functions are always executed in the underlying
6641 C<LC_TIME> locale of the program, giving results based on that locale.
6647 S_ints_to_tm(pTHX_ struct tm * mytm,
6648 int sec, int min, int hour, int mday, int mon, int year,
6649 int wday, int yday, int isdst)
6651 /* Create a struct tm structure from the input time-related integer
6654 /* Override with the passed-in values */
6655 Zero(mytm, 1, struct tm);
6658 mytm->tm_hour = hour;
6659 mytm->tm_mday = mday;
6661 mytm->tm_year = year;
6662 mytm->tm_wday = wday;
6663 mytm->tm_yday = yday;
6664 mytm->tm_isdst = isdst;
6667 /* use libc to get the values for tm_gmtoff and tm_zone on platforms that
6668 * have them [perl #18238] */
6669 #if defined(HAS_MKTIME) \
6670 && (defined(HAS_TM_TM_GMTOFF) || defined(HAS_TM_TM_ZONE))
6671 struct tm mytm2 = *mytm;
6675 # ifdef HAS_TM_TM_GMTOFF
6676 mytm->tm_gmtoff = mytm2.tm_gmtoff;
6678 # ifdef HAS_TM_TM_ZONE
6679 mytm->tm_zone = mytm2.tm_zone;
6687 S_strftime_tm(pTHX_ const char *fmt, const struct tm *mytm)
6689 PERL_ARGS_ASSERT_STRFTIME_TM;
6691 /* Execute strftime() based on the input struct tm */
6693 /* An empty format yields an empty result */
6694 const int fmtlen = strlen(fmt);
6697 Newxz (ret, 1, char);
6701 #ifndef HAS_STRFTIME
6702 Perl_croak(aTHX_ "panic: no strftime");
6704 # if defined(USE_LOCALE_CTYPE) && defined(USE_LOCALE_TIME)
6706 const char * orig_CTYPE_LOCALE = toggle_locale_c(LC_CTYPE,
6707 querylocale_c(LC_TIME));
6710 /* Guess an initial size for the returned string based on an expansion
6711 * factor of the input format, but with a minimum that should handle most
6712 * common cases. If this guess is too small, we will try again with a
6714 int bufsize = MAX(fmtlen * 2, 64);
6716 char *buf = NULL; /* Makes Renew() act as Newx() on the first iteration */
6718 Renew(buf, bufsize, char);
6720 /* allowing user-supplied (rather than literal) formats is normally
6721 * frowned upon as a potential security risk; but this is part of the
6722 * API so we have to allow it (and the available formats have a much
6723 * lower chance of doing something bad than the ones for printf etc. */
6724 GCC_DIAG_IGNORE_STMT(-Wformat-nonliteral);
6727 int len = strftime(buf, bufsize, fmt, mytm);
6730 GCC_DIAG_RESTORE_STMT;
6732 /* A non-zero return indicates success. But to make sure we're not
6733 * dealing with some rogue strftime that returns how much space it
6734 * needs instead of 0 when there isn't enough, check that the return
6735 * indicates we have at least one byte of spare space (which will be
6736 * used for the terminating NUL). */
6737 if (inRANGE(len, 1, bufsize - 1)) {
6738 goto strftime_return;
6741 /* There are several possible reasons for a 0 return code for a
6742 * non-empty format, and they are not trivial to tease apart. This
6743 * issue is a known bug in the strftime() API. What we do to cope is
6744 * to assume that the reason is not enough space in the buffer, so
6745 * increase it and try again. */
6748 /* But don't just keep increasing the size indefinitely. Stop when it
6749 * becomes obvious that the reason for failure is something besides not
6750 * enough space. The most likely largest expanding format is %c. On
6751 * khw's Linux box, the maximum result of this is 67 characters, in the
6752 * km_KH locale. If a new script comes along that uses 4 UTF-8 bytes
6753 * per character, and with a similar expansion factor, that would be a
6754 * 268:2 byte ratio, or a bit more than 128:1 = 2**7:1. Some strftime
6755 * implementations allow you to say %1000c to pad to 1000 bytes. This
6756 * shows that it is impossible to implement this without a heuristic
6757 * (which can fail). But it indicates we need to be generous in the
6758 * upper limit before failing. The previous heuristic used was too
6759 * stingy. Since the size doubles per iteration, it doesn't take many
6760 * to reach the limit */
6761 } while (bufsize < ((1 << 11) + 1) * fmtlen);
6763 /* Here, strftime() returned 0, and it likely wasn't for lack of space.
6764 * There are two possible reasons:
6766 * First is that the result is legitimately 0 length. This can happen
6767 * when the format is precisely "%p". That is the only documented format
6768 * that can have an empty result. */
6769 if (strEQ(fmt, "%p")) {
6770 Renew(buf, 1, char);
6772 goto strftime_return;
6775 /* The other reason is that the format string is malformed. Probably it is
6776 * that the string is syntactically invalid for the locale. On some
6777 * platforms an invalid conversion specifier '%?' (for all illegal '?') is
6778 * treated as a literal, but others may fail when '?' is illegal */
6785 # if defined(USE_LOCALE_CTYPE) && defined(USE_LOCALE_TIME)
6787 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_LOCALE);
6798 S_strftime8(pTHX_ const char * fmt,
6799 const struct tm * mytm,
6800 const utf8ness_t fmt_utf8ness,
6801 utf8ness_t * result_utf8ness,
6802 const bool came_from_sv)
6804 PERL_ARGS_ASSERT_STRFTIME8;
6806 /* Wrap strftime_tm, taking into account the input and output UTF-8ness */
6808 #ifdef USE_LOCALE_TIME
6809 # define INDEX_TO_USE LC_TIME_INDEX_
6811 const char * locale = querylocale_c(LC_TIME);
6812 locale_utf8ness_t locale_utf8ness = LOCALE_UTF8NESS_UNKNOWN;
6815 # define INDEX_TO_USE LC_ALL_INDEX_ /* Effectively out of bounds */
6817 const char * locale = "C";
6818 locale_utf8ness_t locale_utf8ness = LOCALE_NOT_UTF8;
6822 switch (fmt_utf8ness) {
6823 case UTF8NESS_IMMATERIAL:
6826 case UTF8NESS_NO: /* Known not to be UTF-8; must not be UTF-8 locale */
6827 if (is_locale_utf8(locale)) {
6832 locale_utf8ness = LOCALE_NOT_UTF8;
6835 case UTF8NESS_YES: /* Known to be UTF-8; must be UTF-8 locale if can't
6837 if (! is_locale_utf8(locale)) {
6838 locale_utf8ness = LOCALE_NOT_UTF8;
6840 bool is_utf8 = true;
6841 Size_t fmt_len = strlen(fmt);
6842 fmt = (char *) bytes_from_utf8((U8 *) fmt, &fmt_len, &is_utf8);
6851 locale_utf8ness = LOCALE_IS_UTF8;
6856 case UTF8NESS_UNKNOWN:
6857 if (! is_locale_utf8(locale)) {
6858 locale_utf8ness = LOCALE_NOT_UTF8;
6861 locale_utf8ness = LOCALE_IS_UTF8;
6864 /* Upgrade 'fmt' to UTF-8 for a UTF-8 locale. Otherwise the
6865 * locale would find any UTF-8 variant characters to be
6867 Size_t fmt_len = strlen(fmt);
6868 fmt = (char *) bytes_to_utf8((U8 *) fmt, &fmt_len);
6876 char * retval = strftime_tm(fmt, mytm);
6877 *result_utf8ness = get_locale_string_utf8ness_i(retval,
6881 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6882 "fmt=%s, retval=%s; utf8ness=%d",
6884 ((is_utf8_string((U8 *) retval, 0))
6886 :_byte_dump_string((U8 *) retval, strlen(retval),0)),
6897 S_give_perl_locale_control(pTHX_
6899 const char * lc_all_string,
6901 const char ** locales,
6903 const line_t caller_line)
6905 PERL_UNUSED_ARG(caller_line);
6907 /* This is called when the program is in the global locale and are
6908 * switching to per-thread (if available). And it is called at
6909 * initialization time to do the same.
6912 # if defined(WIN32) && defined(USE_THREAD_SAFE_LOCALE)
6914 /* On Windows, convert to per-thread behavior. This isn't necessary in
6915 * POSIX 2008, as the conversion gets done automatically in the
6916 * void_setlocale_i() calls below. */
6917 if (_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) == -1) {
6918 locale_panic_("_configthreadlocale returned an error");
6922 # if ! defined(USE_THREAD_SAFE_LOCALE) \
6923 && ! defined(USE_POSIX_2008_LOCALE)
6924 # if defined(LC_ALL)
6925 PERL_UNUSED_ARG(lc_all_string);
6927 PERL_UNUSED_ARG(locales);
6931 /* This platform has per-thread locale handling. Do the conversion. */
6933 # if defined(LC_ALL)
6935 void_setlocale_c_with_caller(LC_ALL, lc_all_string, __FILE__, caller_line);
6939 for_all_individual_category_indexes(i) {
6940 void_setlocale_i_with_caller(i, locales[i], __FILE__, caller_line);
6946 /* Finally, update our remaining records. 'true' => force recalculation.
6947 * This is needed because we don't know what's happened while Perl hasn't
6948 * had control, so we need to figure out the current state */
6950 # if defined(LC_ALL)
6952 new_LC_ALL(lc_all_string, true);
6956 new_LC_ALL(calculate_LC_ALL_string(locales,
6966 S_output_check_environment_warning(pTHX_ const char * const language,
6967 const char * const lc_all,
6968 const char * const lang)
6970 PerlIO_printf(Perl_error_log,
6971 "perl: warning: Please check that your locale settings:\n");
6975 PerlIO_printf(Perl_error_log, "\tLANGUAGE = %c%s%c,\n",
6976 language ? '"' : '(',
6977 language ? language : "unset",
6978 language ? '"' : ')');
6980 PERL_UNUSED_ARG(language);
6983 PerlIO_printf(Perl_error_log, "\tLC_ALL = %c%s%c,\n",
6985 lc_all ? lc_all : "unset",
6986 lc_all ? '"' : ')');
6988 for_all_individual_category_indexes(i) {
6989 const char * value = PerlEnv_getenv(category_names[i]);
6990 PerlIO_printf(Perl_error_log,
6994 value ? value : "unset",
6998 PerlIO_printf(Perl_error_log, "\tLANG = %c%s%c\n",
7000 lang ? lang : "unset",
7002 PerlIO_printf(Perl_error_log,
7003 " are supported and installed on your system.\n");
7008 /* A helper macro for the next function. Needed because would be called in two
7009 * places. Knows about the internal workings of the function */
7010 #define GET_DESCRIPTION(trial, name) \
7011 ((isNAME_C_OR_POSIX(name)) \
7012 ? "the standard locale" \
7013 : ((trial == (system_default_trial) \
7014 ? "the system default locale" \
7015 : "a fallback locale")))
7018 * Initialize locale awareness.
7021 Perl_init_i18nl10n(pTHX_ int printwarn)
7024 * 0 if not to output warning when setup locale is bad
7025 * 1 if to output warning based on value of PERL_BADLANG
7026 * >1 if to output regardless of PERL_BADLANG
7029 * 1 = set ok or not applicable,
7030 * 0 = fallback to a locale of lower priority
7031 * -1 = fallback to all locales failed, not even to the C locale
7033 * Under -DDEBUGGING, if the environment variable PERL_DEBUG_LOCALE_INIT is
7034 * set, debugging information is output.
7036 * This routine effectively does the following in most cases:
7038 * basic initialization;
7039 * asserts that the compiled tables are consistent;
7040 * initialize data structures;
7041 * make sure we are in the global locale;
7042 * setlocale(LC_ALL, "");
7043 * switch to per-thread locale if applicable;
7045 * The "" causes the locale to be set to what the environment variables at
7046 * the time say it should be.
7048 * To handle possible failures, the setlocale is expanded to be like:
7050 * trial_locale = pre-first-trial;
7051 * while (has_another_trial()) {
7052 * trial_locale = next_trial();
7053 * if setlocale(LC_ALL, trial_locale) {
7058 * had_failure = true;
7062 * if (had_failure) {
7064 * if (! ok) warn_still_more();
7067 * The first trial is either:
7068 * "" to examine the environment variables for the locale
7069 * NULL to use the values already set for the locale by the program
7070 * embedding this perl instantiation.
7072 * Something is wrong if this trial fails, but there is a sequence of
7073 * fallbacks to try should that happen. They are given in the enum below.
7075 * If there is no LC_ALL defined on the system, the setlocale() above is
7076 * replaced by a loop setting each individual category separately.
7078 * In a non-embeded environment, this code is executed exactly once. It
7079 * sets up the global locale environment. At the end, if some sort of
7080 * thread-safety is in effect, it will turn thread 0 into using that, with
7081 * the same locale as the global initially. thread 0 can then change its
7082 * locale at will without affecting the global one.
7084 * At destruction time, thread 0 will revert to the global locale as the
7085 * other threads die.
7087 * Care must be taken in an embedded environment. This code will be
7088 * executed for each instantiation. Since it changes the global locale, it
7089 * could clash with another running instantiation that isn't using
7090 * per-thread locales. perlembed suggests having the controlling program
7091 * set each instantiation's locale and set PERL_SKIP_LOCALE_INIT so this
7092 * code uses that without actually changing anything. Then the onus is on
7093 * the controlling program to prevent any races. The code below does
7094 * enough locking so as to prevent system calls from overwriting data
7095 * before it is safely copied here, but that isn't a general solution.
7100 PERL_UNUSED_ARG(printwarn);
7103 #else /* USE_LOCALE to near the end of the routine */
7109 const char * const language = PerlEnv_getenv("LANGUAGE");
7112 const char * const language = NULL; /* Unused placeholder */
7115 /* A later getenv() could zap this, so only use here */
7116 const char * const bad_lang_use_once = PerlEnv_getenv("PERL_BADLANG");
7118 const bool locwarn = (printwarn > 1
7120 && ( ! bad_lang_use_once
7122 /* disallow with "" or "0" */
7124 && strNE("0", bad_lang_use_once)))));
7127 # define DEBUG_LOCALE_INIT(a,b,c)
7130 DEBUG_INITIALIZATION_set(cBOOL(PerlEnv_getenv("PERL_DEBUG_LOCALE_INIT")));
7132 # define DEBUG_LOCALE_INIT(cat_index, locale, result) \
7133 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n", \
7134 setlocale_debug_string_i(cat_index, locale, result)));
7137 assert(categories[LC_ALL_INDEX_] == LC_ALL);
7138 assert(strEQ(category_names[LC_ALL_INDEX_], "LC_ALL"));
7139 # ifdef USE_POSIX_2008_LOCALE
7140 assert(category_masks[LC_ALL_INDEX_] == LC_ALL_MASK);
7144 for_all_individual_category_indexes(i) {
7145 assert(category_name_lengths[i] == strlen(category_names[i]));
7148 # endif /* DEBUGGING */
7150 /* Initialize the per-thread mbrFOO() state variables. See POSIX.xs for
7151 * why these particular incantations are used. */
7153 memzero(&PL_mbrlen_ps, sizeof(PL_mbrlen_ps));
7156 memzero(&PL_mbrtowc_ps, sizeof(PL_mbrtowc_ps));
7159 wcrtomb(NULL, L'\0', &PL_wcrtomb_ps);
7161 # ifdef USE_PL_CURLOCALES
7163 for (unsigned int i = 0; i <= LC_ALL_INDEX_; i++) {
7164 PL_curlocales[i] = savepv("C");
7168 # ifdef USE_PL_CUR_LC_ALL
7170 PL_cur_LC_ALL = savepv("C");
7173 # if ! defined(PERL_LC_ALL_USES_NAME_VALUE_PAIRS) && defined(LC_ALL)
7177 /* If we haven't done so already, translate the LC_ALL positions of
7178 * categories into our internal indices. */
7179 if (map_LC_ALL_position_to_index[0] == LC_ALL_INDEX_) {
7181 /* Use this array, initialized by a config.h constant */
7182 int lc_all_category_positions[] = PERL_LC_ALL_CATEGORY_POSITIONS_INIT;
7183 STATIC_ASSERT_STMT( C_ARRAY_LENGTH(lc_all_category_positions)
7186 for (unsigned int i = 0;
7187 i < C_ARRAY_LENGTH(lc_all_category_positions);
7190 map_LC_ALL_position_to_index[i] =
7191 get_category_index(lc_all_category_positions[i]);
7198 # ifdef USE_POSIX_2008_LOCALE
7200 /* This is a global, so be sure to keep another instance from zapping it */
7202 if (PL_C_locale_obj) {
7206 PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
7207 if (! PL_C_locale_obj) {
7209 locale_panic_(Perl_form(aTHX_
7210 "Cannot create POSIX 2008 C locale object"));
7214 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "created C object %p\n",
7218 /* Switch to using the POSIX 2008 interface now. This would happen below
7219 * anyway, but deferring it can lead to leaks of memory that would also get
7220 * malloc'd in the interim. We arbitrarily switch to the C locale,
7221 * overridden below */
7222 if (! uselocale(PL_C_locale_obj)) {
7223 locale_panic_(Perl_form(aTHX_
7224 "Can't uselocale(%p), LC_ALL supposed to"
7229 # ifdef MULTIPLICITY
7231 PL_cur_locale_obj = PL_C_locale_obj;
7236 /* Now initialize some data structures. This is entirely so that
7237 * later-executed code doesn't have to concern itself with things not being
7238 * initialized. Arbitrarily use the C locale (which we know has to exist
7239 * on the system). */
7241 # ifdef USE_LOCALE_NUMERIC
7243 PL_numeric_radix_sv = newSV(1);
7244 PL_underlying_radix_sv = newSV(1);
7245 Newxz(PL_numeric_name, 1, char); /* Single NUL character */
7248 # ifdef USE_LOCALE_COLLATE
7250 Newxz(PL_collation_name, 1, char);
7253 # ifdef USE_LOCALE_CTYPE
7255 Newxz(PL_ctype_name, 1, char);
7259 new_LC_ALL("C", true /* Don't shortcut */);
7261 /*===========================================================================*/
7263 /* Now ready to override the initialization with the values that the user
7264 * wants. This is done in the global locale as explained in the
7265 * introductory comments to this function */
7266 switch_to_global_locale();
7268 const char * const lc_all = PerlEnv_getenv("LC_ALL");
7269 const char * const lang = PerlEnv_getenv("LANG");
7271 /* We try each locale in the enum, in order, until we get one that works,
7272 * or exhaust the list. Normally the loop is executed just once.
7274 * Each enum value is +1 from the previous */
7277 environment_trial = 0, /* "" or NULL; code below assumes value
7278 0 is the first real trial */
7279 LC_ALL_trial, /* ENV{LC_ALL} */
7280 LANG_trial, /* ENV{LANG} */
7281 system_default_trial, /* Windows .ACP */
7282 C_trial, /* C locale */
7287 SSize_t already_checked = 0;
7288 const char * checked[C_trial];
7291 const char * lc_all_string;
7293 const char * curlocales[LC_ALL_INDEX_];
7296 /* Loop through the initial setting and all the possible fallbacks,
7297 * breaking out of the loop on success */
7298 trial = dummy_trial;
7299 while (trial != beyond_final_trial) {
7301 /* Each time through compute the next trial to use based on the one in
7302 * the previous iteration and switch to the new one. This enforces the
7303 * order in which the fallbacks are applied */
7305 trial = (trials) ((int) trial + 1); /* Casts are needed for g++ */
7307 const char * locale = NULL;
7309 /* Set up the parameters for this trial */
7312 locale_panic_("Unexpectedly got 'dummy_trial");
7315 case environment_trial:
7316 /* This is either "" to get the values from the environment, or
7317 * NULL if the calling program has initialized the values already.
7319 locale = (PerlEnv_getenv("PERL_SKIP_LOCALE_INIT"))
7325 if (! lc_all || strEQ(lc_all, "")) {
7326 continue; /* No-op */
7333 if (! lang || strEQ(lang, "")) {
7334 continue; /* No-op */
7340 case system_default_trial:
7342 # if ! defined(WIN32) || ! defined(LC_ALL)
7344 continue; /* No-op */
7347 /* For Windows, we also try the system default locale before "C".
7348 * (If there exists a Windows without LC_ALL we skip this because
7349 * it gets too complicated. For those, "C" is the next fallback
7359 case beyond_final_trial:
7360 continue; /* No-op, causes loop to exit */
7363 /* If the locale is a substantive name, don't try the same locale
7365 if (locale && strNE(locale, "")) {
7366 for (unsigned int i = 0; i < already_checked; i++) {
7367 if (strEQ(checked[i], locale)) {
7372 /* And, for future iterations, indicate we've tried this locale */
7373 checked[already_checked] = savepv(locale);
7374 SAVEFREEPV(checked[already_checked]);
7380 STDIZED_SETLOCALE_LOCK;
7381 lc_all_string = savepv(stdized_setlocale(LC_ALL, locale));
7382 STDIZED_SETLOCALE_UNLOCK;
7384 DEBUG_LOCALE_INIT(LC_ALL_INDEX_, locale, lc_all_string);
7386 if (LIKELY(lc_all_string)) { /* Succeeded */
7391 if (trial == 0 && locwarn) {
7392 PerlIO_printf(Perl_error_log,
7393 "perl: warning: Setting locale failed.\n");
7394 output_check_environment_warning(language, lc_all, lang);
7397 # else /* Below is ! LC_ALL */
7399 bool setlocale_failure = FALSE; /* This trial hasn't failed so far */
7400 bool dowarn = trial == 0 && locwarn;
7402 for_all_individual_category_indexes(j) {
7403 STDIZED_SETLOCALE_LOCK;
7404 curlocales[j] = savepv(stdized_setlocale(categories[j], locale));
7405 STDIZED_SETLOCALE_UNLOCK;
7407 DEBUG_LOCALE_INIT(j, locale, curlocales[j]);
7409 if (UNLIKELY(! curlocales[j])) {
7410 setlocale_failure = TRUE;
7412 /* If are going to warn below, continue to loop so all failures
7413 * are included in the message */
7420 if (LIKELY(! setlocale_failure)) { /* All succeeded */
7422 break; /* Exit trial_locales loop */
7425 /* Here, this trial failed */
7428 PerlIO_printf(Perl_error_log,
7429 "perl: warning: Setting locale failed for the categories:\n");
7431 for_all_individual_category_indexes(j) {
7432 if (! curlocales[j]) {
7433 PerlIO_printf(Perl_error_log, "\t%s\n", category_names[j]);
7437 output_check_environment_warning(language, lc_all, lang);
7438 } /* end of warning on first failure */
7440 # endif /* LC_ALL */
7442 } /* end of looping through the trial locales */
7444 /* If we had to do more than the first trial, it means that one failed, and
7445 * we may need to output a warning, and, if none worked, do more */
7446 if (UNLIKELY(trial != 0)) {
7448 const char * description = "a fallback locale";
7449 const char * name = NULL;;
7451 /* If we didn't find a good fallback, list all we tried */
7452 if (! ok && already_checked > 0) {
7453 PerlIO_printf(Perl_error_log, "perl: warning: Failed to fall"
7455 if (already_checked > 1) { /* more than one was tried */
7456 PerlIO_printf(Perl_error_log, "any of:\n");
7459 while (already_checked > 0) {
7460 name = checked[--already_checked];
7461 description = GET_DESCRIPTION(trial, name);
7462 PerlIO_printf(Perl_error_log, "%s (\"%s\")\n",
7469 /* Here, a fallback worked. So we have saved its name, and the
7470 * trial that succeeded is still valid */
7472 const char * individ_locales[LC_ALL_INDEX_] = { NULL };
7474 /* Even though we know the valid string for LC_ALL that worked,
7475 * translate it into our internal format, which is the
7476 * name=value pairs notation. This is easier for a human to
7477 * decipher than the positional notation. Some platforms
7478 * can return "C C C C C C" for LC_ALL. This code also
7479 * standardizes that result into plain "C". */
7480 switch (parse_LC_ALL_string(lc_all_string,
7481 (const char **) &individ_locales,
7483 false, /* Return only [0] if
7485 false, /* Don't panic on error */
7490 /* Here, the parse failed, which shouldn't happen, but if
7491 * it does, we have an easy fallback that allows us to keep
7493 name = lc_all_string;
7496 case no_array: /* The original is a single locale */
7497 name = lc_all_string;
7500 case only_element_0: /* element[0] is a single locale valid
7501 for all categories */
7502 SAVEFREEPV(individ_locales[0]);
7503 name = individ_locales[0];
7507 name = calculate_LC_ALL_string(individ_locales,
7511 for_all_individual_category_indexes(j) {
7512 Safefree(individ_locales[j]);
7516 name = calculate_LC_ALL_string(curlocales,
7521 description = GET_DESCRIPTION(trial, name);
7525 /* Nothing seems to be working, yet we want to continue
7526 * executing. It may well be that locales are mostly
7527 * irrelevant to this particular program, and there must be
7528 * some locale underlying the program. Figure it out as best
7529 * we can, by querying the system's current locale */
7533 STDIZED_SETLOCALE_LOCK;
7534 name = stdized_setlocale(LC_ALL, NULL);
7535 STDIZED_SETLOCALE_UNLOCK;
7537 if (UNLIKELY(! name)) {
7538 name = "locale name not determinable";
7541 # else /* Below is ! LC_ALL */
7543 const char * system_locales[LC_ALL_INDEX_] = { NULL };
7545 for_all_individual_category_indexes(j) {
7546 STDIZED_SETLOCALE_LOCK;
7547 system_locales[j] = savepv(stdized_setlocale(categories[j],
7549 STDIZED_SETLOCALE_UNLOCK;
7551 if (UNLIKELY(! system_locales[j])) {
7552 system_locales[j] = "not determinable";
7556 /* We use the name=value form for the string, as that is more
7557 * human readable than the positional notation */
7558 name = calculate_LC_ALL_string(system_locales,
7562 description = "what the system says";
7564 for_all_individual_category_indexes(j) {
7565 Safefree(system_locales[j]);
7570 PerlIO_printf(Perl_error_log,
7571 "perl: warning: Falling back to %s (\"%s\").\n",
7574 /* Here, ok being true indicates that the first attempt failed, but
7575 * a fallback succeeded; false => nothing working. Translate to
7576 * API return values. */
7583 give_perl_locale_control(lc_all_string, __LINE__);
7584 Safefree(lc_all_string);
7588 give_perl_locale_control((const char **) &curlocales, __LINE__);
7590 for_all_individual_category_indexes(j) {
7591 Safefree(curlocales[j]);
7595 # if defined(USE_PERLIO) && defined(USE_LOCALE_CTYPE)
7597 /* Set PL_utf8locale to TRUE if using PerlIO _and_ the current LC_CTYPE
7598 * locale is UTF-8. give_perl_locale_control() just above has already
7599 * calculated the latter value and saved it in PL_in_utf8_CTYPE_locale. If
7600 * both PL_utf8locale and PL_unicode (set by -C or by $ENV{PERL_UNICODE})
7601 * are true, perl.c:S_parse_body() will turn on the PerlIO :utf8 layer on
7602 * STDIN, STDOUT, STDERR, _and_ the default open discipline. */
7603 PL_utf8locale = PL_in_utf8_CTYPE_locale;
7605 /* Set PL_unicode to $ENV{PERL_UNICODE} if using PerlIO.
7606 This is an alternative to using the -C command line switch
7607 (the -C if present will override this). */
7609 const char *p = PerlEnv_getenv("PERL_UNICODE");
7610 PL_unicode = p ? parse_unicode_opts(&p) : 0;
7611 if (PL_unicode & PERL_UNICODE_UTF8CACHEASSERT_FLAG)
7616 # if defined(USE_POSIX_2008_LOCALE) && defined(MULTIPLICITY)
7617 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
7618 "finished Perl_init_i18nl10n; actual obj=%p,"
7619 " expected obj=%p, initial=%s\n",
7620 uselocale(0), PL_cur_locale_obj,
7621 get_LC_ALL_display()));
7624 /* So won't continue to output stuff */
7625 DEBUG_INITIALIZATION_set(FALSE);
7627 #endif /* USE_LOCALE */
7632 #undef GET_DESCRIPTION
7633 #ifdef USE_LOCALE_COLLATE
7636 S_compute_collxfrm_coefficients(pTHX)
7639 /* A locale collation definition includes primary, secondary, tertiary,
7640 * etc. weights for each character. To sort, the primary weights are used,
7641 * and only if they compare equal, then the secondary weights are used, and
7642 * only if they compare equal, then the tertiary, etc.
7644 * strxfrm() works by taking the input string, say ABC, and creating an
7645 * output transformed string consisting of first the primary weights,
7646 * A¹B¹C¹ followed by the secondary ones, A²B²C²; and then the tertiary,
7647 * etc, yielding A¹B¹C¹ A²B²C² A³B³C³ .... Some characters may not have
7648 * weights at every level. In our example, let's say B doesn't have a
7649 * tertiary weight, and A doesn't have a secondary weight. The constructed
7650 * string is then going to be
7651 * A¹B¹C¹ B²C² A³C³ ....
7652 * This has the desired effect that strcmp() will look at the secondary or
7653 * tertiary weights only if the strings compare equal at all higher
7654 * priority weights. The spaces shown here, like in
7656 * are not just for readability. In the general case, these must actually
7657 * be bytes, which we will call here 'separator weights'; and they must be
7658 * smaller than any other weight value, but since these are C strings, only
7659 * the terminating one can be a NUL (some implementations may include a
7660 * non-NUL separator weight just before the NUL). Implementations tend to
7661 * reserve 01 for the separator weights. They are needed so that a shorter
7662 * string's secondary weights won't be misconstrued as primary weights of a
7663 * longer string, etc. By making them smaller than any other weight, the
7664 * shorter string will sort first. (Actually, if all secondary weights are
7665 * smaller than all primary ones, there is no need for a separator weight
7666 * between those two levels, etc.)
7668 * The length of the transformed string is roughly a linear function of the
7669 * input string. It's not exactly linear because some characters don't
7670 * have weights at all levels. When we call strxfrm() we have to allocate
7671 * some memory to hold the transformed string. The calculations below try
7672 * to find coefficients 'm' and 'b' for this locale so that m*x + b equals
7673 * how much space we need, given the size of the input string in 'x'. If
7674 * we calculate too small, we increase the size as needed, and call
7675 * strxfrm() again, but it is better to get it right the first time to
7676 * avoid wasted expensive string transformations.
7678 * We use the string below to find how long the transformation of it is.
7679 * Almost all locales are supersets of ASCII, or at least the ASCII
7680 * letters. We use all of them, half upper half lower, because if we used
7681 * fewer, we might hit just the ones that are outliers in a particular
7682 * locale. Most of the strings being collated will contain a preponderance
7683 * of letters, and even if they are above-ASCII, they are likely to have
7684 * the same number of weight levels as the ASCII ones. It turns out that
7685 * digits tend to have fewer levels, and some punctuation has more, but
7686 * those are relatively sparse in text, and khw believes this gives a
7687 * reasonable result, but it could be changed if experience so dictates. */
7688 const char longer[] = "ABCDEFGHIJKLMnopqrstuvwxyz";
7689 char * x_longer; /* Transformed 'longer' */
7690 Size_t x_len_longer; /* Length of 'x_longer' */
7692 char * x_shorter; /* We also transform a substring of 'longer' */
7693 Size_t x_len_shorter;
7695 PL_in_utf8_COLLATE_locale = (PL_collation_standard)
7697 : is_locale_utf8(PL_collation_name);
7698 PL_strxfrm_NUL_replacement = '\0';
7699 PL_strxfrm_max_cp = 0;
7701 /* mem_collxfrm_() is used get the transformation (though here we are
7702 * interested only in its length). It is used because it has the
7703 * intelligence to handle all cases, but to work, it needs some values of
7704 * 'm' and 'b' to get it started. For the purposes of this calculation we
7705 * use a very conservative estimate of 'm' and 'b'. This assumes a weight
7706 * can be multiple bytes, enough to hold any UV on the platform, and there
7707 * are 5 levels, 4 weight bytes, and a trailing NUL. */
7708 PL_collxfrm_base = 5;
7709 PL_collxfrm_mult = 5 * sizeof(UV);
7711 /* Find out how long the transformation really is */
7712 x_longer = mem_collxfrm_(longer,
7716 /* We avoid converting to UTF-8 in the called
7717 * function by telling it the string is in UTF-8
7718 * if the locale is a UTF-8 one. Since the string
7719 * passed here is invariant under UTF-8, we can
7720 * claim it's UTF-8 even if it isn't. */
7721 PL_in_utf8_COLLATE_locale);
7724 /* Find out how long the transformation of a substring of 'longer' is.
7725 * Together the lengths of these transformations are sufficient to
7726 * calculate 'm' and 'b'. The substring is all of 'longer' except the
7727 * first character. This minimizes the chances of being swayed by outliers
7729 x_shorter = mem_collxfrm_(longer + 1,
7732 PL_in_utf8_COLLATE_locale);
7733 Safefree(x_shorter);
7735 /* If the results are nonsensical for this simple test, the whole locale
7736 * definition is suspect. Mark it so that locale collation is not active
7737 * at all for it. XXX Should we warn? */
7738 if ( x_len_shorter == 0
7739 || x_len_longer == 0
7740 || x_len_shorter >= x_len_longer)
7742 PL_collxfrm_mult = 0;
7743 PL_collxfrm_base = 1;
7744 DEBUG_L(PerlIO_printf(Perl_debug_log,
7745 "Disabling locale collation for LC_COLLATE='%s';"
7746 " length for shorter sample=%zu; longer=%zu\n",
7747 PL_collation_name, x_len_shorter, x_len_longer));
7750 SSize_t base; /* Temporary */
7752 /* We have both: m * strlen(longer) + b = x_len_longer
7753 * m * strlen(shorter) + b = x_len_shorter;
7754 * subtracting yields:
7755 * m * (strlen(longer) - strlen(shorter))
7756 * = x_len_longer - x_len_shorter
7757 * But we have set things up so that 'shorter' is 1 byte smaller than
7759 * m = x_len_longer - x_len_shorter
7761 * But if something went wrong, make sure the multiplier is at least 1.
7763 if (x_len_longer > x_len_shorter) {
7764 PL_collxfrm_mult = (STRLEN) x_len_longer - x_len_shorter;
7767 PL_collxfrm_mult = 1;
7772 * but in case something has gone wrong, make sure it is non-negative
7774 base = x_len_longer - PL_collxfrm_mult * (sizeof(longer) - 1);
7779 /* Add 1 for the trailing NUL */
7780 PL_collxfrm_base = base + 1;
7783 DEBUG_L(PerlIO_printf(Perl_debug_log,
7784 "?UTF-8 locale=%d; x_len_shorter=%zu, "
7786 " collate multipler=%zu, collate base=%zu\n",
7787 PL_in_utf8_COLLATE_locale,
7788 x_len_shorter, x_len_longer,
7789 PL_collxfrm_mult, PL_collxfrm_base));
7793 Perl_mem_collxfrm_(pTHX_ const char *input_string,
7794 STRLEN len, /* Length of 'input_string' */
7795 STRLEN *xlen, /* Set to length of returned string
7796 (not including the collation index
7798 bool utf8 /* Is the input in UTF-8? */
7801 /* mem_collxfrm_() is like strxfrm() but with two important differences.
7802 * First, it handles embedded NULs. Second, it allocates a bit more memory
7803 * than needed for the transformed data itself. The real transformed data
7804 * begins at offset COLLXFRM_HDR_LEN. *xlen is set to the length of that,
7805 * and doesn't include the collation index size.
7807 * It is the caller's responsibility to eventually free the memory returned
7810 * Please see sv_collxfrm() to see how this is used. */
7812 # define COLLXFRM_HDR_LEN sizeof(PL_collation_ix)
7814 char * s = (char *) input_string;
7815 STRLEN s_strlen = strlen(input_string);
7817 STRLEN xAlloc; /* xalloc is a reserved word in VC */
7818 STRLEN length_in_chars;
7819 bool first_time = TRUE; /* Cleared after first loop iteration */
7821 # ifdef USE_LOCALE_CTYPE
7822 const char * orig_CTYPE_locale = NULL;
7825 # if defined(USE_POSIX_2008_LOCALE) && defined HAS_STRXFRM_L
7826 locale_t constructed_locale = (locale_t) 0;
7829 PERL_ARGS_ASSERT_MEM_COLLXFRM_;
7831 /* Must be NUL-terminated */
7832 assert(*(input_string + len) == '\0');
7834 if (PL_collxfrm_mult == 0) { /* unknown or bad */
7835 if (PL_collxfrm_base != 0) { /* bad collation => skip */
7836 DEBUG_L(PerlIO_printf(Perl_debug_log,
7837 "mem_collxfrm_: locale's collation is defective\n"));
7841 /* (mult, base) == (0,0) means we need to calculate mult and base
7842 * before proceeding */
7843 S_compute_collxfrm_coefficients(aTHX);
7846 /* Replace any embedded NULs with the control that sorts before any others.
7847 * This will give as good as possible results on strings that don't
7848 * otherwise contain that character, but otherwise there may be
7849 * less-than-perfect results with that character and NUL. This is
7850 * unavoidable unless we replace strxfrm with our own implementation. */
7851 if (UNLIKELY(s_strlen < len)) { /* Only execute if there is an embedded
7855 STRLEN sans_nuls_len;
7856 int try_non_controls;
7857 char this_replacement_char[] = "?\0"; /* Room for a two-byte string,
7858 making sure 2nd byte is NUL.
7860 STRLEN this_replacement_len;
7862 /* If we don't know what non-NUL control character sorts lowest for
7863 * this locale, find it */
7864 if (PL_strxfrm_NUL_replacement == '\0') {
7866 char * cur_min_x = NULL; /* The min_char's xfrm, (except it also
7867 includes the collation index
7870 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "Looking to replace NUL\n"));
7872 /* Unlikely, but it may be that no control will work to replace
7873 * NUL, in which case we instead look for any character. Controls
7874 * are preferred because collation order is, in general, context
7875 * sensitive, with adjoining characters affecting the order, and
7876 * controls are less likely to have such interactions, allowing the
7877 * NUL-replacement to stand on its own. (Another way to look at it
7878 * is to imagine what would happen if the NUL were replaced by a
7879 * combining character; it wouldn't work out all that well.) */
7880 for (try_non_controls = 0;
7881 try_non_controls < 2;
7885 # ifdef USE_LOCALE_CTYPE
7887 /* In this case we use isCNTRL_LC() below, which relies on
7888 * LC_CTYPE, so that must be switched to correspond with the
7889 * LC_COLLATE locale */
7890 if (! try_non_controls && ! PL_in_utf8_COLLATE_locale) {
7891 orig_CTYPE_locale = toggle_locale_c(LC_CTYPE,
7895 /* Look through all legal code points (NUL isn't) */
7896 for (j = 1; j < 256; j++) {
7897 char * x; /* j's xfrm plus collation index */
7898 STRLEN x_len; /* length of 'x' */
7899 STRLEN trial_len = 1;
7900 char cur_source[] = { '\0', '\0' };
7902 /* Skip non-controls the first time through the loop. The
7903 * controls in a UTF-8 locale are the L1 ones */
7904 if (! try_non_controls && (PL_in_utf8_COLLATE_locale)
7911 /* Create a 1-char string of the current code point */
7912 cur_source[0] = (char) j;
7914 /* Then transform it */
7915 x = mem_collxfrm_(cur_source, trial_len, &x_len,
7916 0 /* The string is not in UTF-8 */);
7918 /* Ignore any character that didn't successfully transform.
7924 /* If this character's transformation is lower than
7925 * the current lowest, this one becomes the lowest */
7926 if ( cur_min_x == NULL
7927 || strLT(x + COLLXFRM_HDR_LEN,
7928 cur_min_x + COLLXFRM_HDR_LEN))
7930 PL_strxfrm_NUL_replacement = j;
7931 Safefree(cur_min_x);
7937 } /* end of loop through all 255 characters */
7939 # ifdef USE_LOCALE_CTYPE
7940 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
7943 /* Stop looking if found */
7948 /* Unlikely, but possible, if there aren't any controls that
7949 * work in the locale, repeat the loop, looking for any
7950 * character that works */
7951 DEBUG_L(PerlIO_printf(Perl_debug_log,
7952 "mem_collxfrm_: No control worked. Trying non-controls\n"));
7953 } /* End of loop to try first the controls, then any char */
7956 DEBUG_L(PerlIO_printf(Perl_debug_log,
7957 "mem_collxfrm_: Couldn't find any character to replace"
7958 " embedded NULs in locale %s with", PL_collation_name));
7962 DEBUG_L(PerlIO_printf(Perl_debug_log,
7963 "mem_collxfrm_: Replacing embedded NULs in locale %s with "
7964 "0x%02X\n", PL_collation_name, PL_strxfrm_NUL_replacement));
7966 Safefree(cur_min_x);
7967 } /* End of determining the character that is to replace NULs */
7969 /* If the replacement is variant under UTF-8, it must match the
7970 * UTF8-ness of the original */
7971 if ( ! UVCHR_IS_INVARIANT(PL_strxfrm_NUL_replacement) && utf8) {
7972 this_replacement_char[0] =
7973 UTF8_EIGHT_BIT_HI(PL_strxfrm_NUL_replacement);
7974 this_replacement_char[1] =
7975 UTF8_EIGHT_BIT_LO(PL_strxfrm_NUL_replacement);
7976 this_replacement_len = 2;
7979 this_replacement_char[0] = PL_strxfrm_NUL_replacement;
7980 /* this_replacement_char[1] = '\0' was done at initialization */
7981 this_replacement_len = 1;
7984 /* The worst case length for the replaced string would be if every
7985 * character in it is NUL. Multiply that by the length of each
7986 * replacement, and allow for a trailing NUL */
7987 sans_nuls_len = (len * this_replacement_len) + 1;
7988 Newx(sans_nuls, sans_nuls_len, char);
7991 /* Replace each NUL with the lowest collating control. Loop until have
7992 * exhausted all the NULs */
7993 while (s + s_strlen < e) {
7994 my_strlcat(sans_nuls, s, sans_nuls_len);
7996 /* Do the actual replacement */
7997 my_strlcat(sans_nuls, this_replacement_char, sans_nuls_len);
7999 /* Move past the input NUL */
8001 s_strlen = strlen(s);
8004 /* And add anything that trails the final NUL */
8005 my_strlcat(sans_nuls, s, sans_nuls_len);
8007 /* Switch so below we transform this modified string */
8010 } /* End of replacing NULs */
8012 /* Make sure the UTF8ness of the string and locale match */
8013 if (utf8 != PL_in_utf8_COLLATE_locale) {
8014 /* XXX convert above Unicode to 10FFFF? */
8015 const char * const t = s; /* Temporary so we can later find where the
8018 /* Here they don't match. Change the string's to be what the locale is
8021 if (! utf8) { /* locale is UTF-8, but input isn't; upgrade the input */
8022 s = (char *) bytes_to_utf8((const U8 *) s, &len);
8025 else { /* locale is not UTF-8; but input is; downgrade the input */
8027 s = (char *) bytes_from_utf8((const U8 *) s, &len, &utf8);
8029 /* If the downgrade was successful we are done, but if the input
8030 * contains things that require UTF-8 to represent, have to do
8031 * damage control ... */
8032 if (UNLIKELY(utf8)) {
8034 /* What we do is construct a non-UTF-8 string with
8035 * 1) the characters representable by a single byte converted
8036 * to be so (if necessary);
8037 * 2) and the rest converted to collate the same as the
8038 * highest collating representable character. That makes
8039 * them collate at the end. This is similar to how we
8040 * handle embedded NULs, but we use the highest collating
8041 * code point instead of the smallest. Like the NUL case,
8042 * this isn't perfect, but is the best we can reasonably
8043 * do. Every above-255 code point will sort the same as
8044 * the highest-sorting 0-255 code point. If that code
8045 * point can combine in a sequence with some other code
8046 * points for weight calculations, us changing something to
8047 * be it can adversely affect the results. But in most
8048 * cases, it should work reasonably. And note that this is
8049 * really an illegal situation: using code points above 255
8050 * on a locale where only 0-255 are valid. If two strings
8051 * sort entirely equal, then the sort order for the
8052 * above-255 code points will be in code point order. */
8056 /* If we haven't calculated the code point with the maximum
8057 * collating order for this locale, do so now */
8058 if (! PL_strxfrm_max_cp) {
8061 /* The current transformed string that collates the
8062 * highest (except it also includes the prefixed collation
8064 char * cur_max_x = NULL;
8066 /* Look through all legal code points (NUL isn't) */
8067 for (j = 1; j < 256; j++) {
8070 char cur_source[] = { '\0', '\0' };
8072 /* Create a 1-char string of the current code point */
8073 cur_source[0] = (char) j;
8075 /* Then transform it */
8076 x = mem_collxfrm_(cur_source, 1, &x_len, FALSE);
8078 /* If something went wrong (which it shouldn't), just
8079 * ignore this code point */
8084 /* If this character's transformation is higher than
8085 * the current highest, this one becomes the highest */
8086 if ( cur_max_x == NULL
8087 || strGT(x + COLLXFRM_HDR_LEN,
8088 cur_max_x + COLLXFRM_HDR_LEN))
8090 PL_strxfrm_max_cp = j;
8091 Safefree(cur_max_x);
8100 DEBUG_L(PerlIO_printf(Perl_debug_log,
8101 "mem_collxfrm_: Couldn't find any character to"
8102 " replace above-Latin1 chars in locale %s with",
8103 PL_collation_name));
8107 DEBUG_L(PerlIO_printf(Perl_debug_log,
8108 "mem_collxfrm_: highest 1-byte collating character"
8109 " in locale %s is 0x%02X\n",
8111 PL_strxfrm_max_cp));
8113 Safefree(cur_max_x);
8116 /* Here we know which legal code point collates the highest.
8117 * We are ready to construct the non-UTF-8 string. The length
8118 * will be at least 1 byte smaller than the input string
8119 * (because we changed at least one 2-byte character into a
8120 * single byte), but that is eaten up by the trailing NUL */
8126 char * e = (char *) t + len;
8128 for (i = 0; i < len; i+= UTF8SKIP(t + i)) {
8130 if (UTF8_IS_INVARIANT(cur_char)) {
8133 else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(t + i, e)) {
8134 s[d++] = EIGHT_BIT_UTF8_TO_NATIVE(cur_char, t[i+1]);
8136 else { /* Replace illegal cp with highest collating
8138 s[d++] = PL_strxfrm_max_cp;
8142 Renew(s, d, char); /* Free up unused space */
8147 /* Here, we have constructed a modified version of the input. It could
8148 * be that we already had a modified copy before we did this version.
8149 * If so, that copy is no longer needed */
8150 if (t != input_string) {
8155 length_in_chars = (utf8)
8156 ? utf8_length((U8 *) s, (U8 *) s + len)
8159 /* The first element in the output is the collation id, used by
8160 * sv_collxfrm(); then comes the space for the transformed string. The
8161 * equation should give us a good estimate as to how much is needed */
8162 xAlloc = COLLXFRM_HDR_LEN
8164 + (PL_collxfrm_mult * length_in_chars);
8165 Newx(xbuf, xAlloc, char);
8166 if (UNLIKELY(! xbuf)) {
8167 DEBUG_L(PerlIO_printf(Perl_debug_log,
8168 "mem_collxfrm_: Couldn't malloc %zu bytes\n", xAlloc));
8172 /* Store the collation id */
8173 *(PERL_UINTMAX_T *)xbuf = PL_collation_ix;
8175 # if defined(USE_POSIX_2008_LOCALE) && defined HAS_STRXFRM_L
8176 # ifdef USE_LOCALE_CTYPE
8178 constructed_locale = newlocale(LC_CTYPE_MASK, PL_collation_name,
8179 duplocale(use_curlocale_scratch()));
8182 constructed_locale = duplocale(use_curlocale_scratch());
8185 # define my_strxfrm(dest, src, n) strxfrm_l(dest, src, n, \
8187 # define CLEANUP_STRXFRM \
8189 if (constructed_locale != (locale_t) 0) \
8190 freelocale(constructed_locale); \
8193 # define my_strxfrm(dest, src, n) strxfrm(dest, src, n)
8194 # ifdef USE_LOCALE_CTYPE
8196 orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, PL_collation_name);
8198 # define CLEANUP_STRXFRM \
8199 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale)
8201 # define CLEANUP_STRXFRM NOOP
8205 /* Then the transformation of the input. We loop until successful, or we
8210 *xlen = my_strxfrm(xbuf + COLLXFRM_HDR_LEN,
8212 xAlloc - COLLXFRM_HDR_LEN);
8215 /* If the transformed string occupies less space than we told strxfrm()
8216 * was available, it means it transformed the whole string. */
8217 if (*xlen < xAlloc - COLLXFRM_HDR_LEN) {
8219 /* But there still could have been a problem */
8221 DEBUG_L(PerlIO_printf(Perl_debug_log,
8222 "strxfrm failed for LC_COLLATE=%s; errno=%d, input=%s\n",
8223 PL_collation_name, errno,
8224 _byte_dump_string((U8 *) s, len, 0)));
8228 /* Here, the transformation was successful. Some systems include a
8229 * trailing NUL in the returned length. Ignore it, using a loop in
8230 * case multiple trailing NULs are returned. */
8232 && *(xbuf + COLLXFRM_HDR_LEN + (*xlen) - 1) == '\0')
8237 /* If the first try didn't get it, it means our prediction was low.
8238 * Modify the coefficients so that we predict a larger value in any
8239 * future transformations */
8241 STRLEN needed = *xlen + 1; /* +1 For trailing NUL */
8242 STRLEN computed_guess = PL_collxfrm_base
8243 + (PL_collxfrm_mult * length_in_chars);
8245 /* On zero-length input, just keep current slope instead of
8247 const STRLEN new_m = (length_in_chars != 0)
8248 ? needed / length_in_chars
8251 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8252 "initial size of %zu bytes for a length "
8253 "%zu string was insufficient, %zu needed\n",
8254 computed_guess, length_in_chars, needed));
8256 /* If slope increased, use it, but discard this result for
8257 * length 1 strings, as we can't be sure that it's a real slope
8259 if (length_in_chars > 1 && new_m > PL_collxfrm_mult) {
8263 STRLEN old_m = PL_collxfrm_mult;
8264 STRLEN old_b = PL_collxfrm_base;
8268 PL_collxfrm_mult = new_m;
8269 PL_collxfrm_base = 1; /* +1 For trailing NUL */
8270 computed_guess = PL_collxfrm_base
8271 + (PL_collxfrm_mult * length_in_chars);
8272 if (computed_guess < needed) {
8273 PL_collxfrm_base += needed - computed_guess;
8276 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8277 "slope is now %zu; was %zu, base "
8278 "is now %zu; was %zu\n",
8279 PL_collxfrm_mult, old_m,
8280 PL_collxfrm_base, old_b));
8282 else { /* Slope didn't change, but 'b' did */
8283 const STRLEN new_b = needed
8286 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8287 "base is now %zu; was %zu\n", new_b, PL_collxfrm_base));
8288 PL_collxfrm_base = new_b;
8295 if (UNLIKELY(*xlen >= PERL_INT_MAX)) {
8296 DEBUG_L(PerlIO_printf(Perl_debug_log,
8297 "mem_collxfrm_: Needed %zu bytes, max permissible is %u\n",
8298 *xlen, PERL_INT_MAX));
8302 /* A well-behaved strxfrm() returns exactly how much space it needs
8303 * (usually not including the trailing NUL) when it fails due to not
8304 * enough space being provided. Assume that this is the case unless
8305 * it's been proven otherwise */
8306 if (LIKELY(PL_strxfrm_is_behaved) && first_time) {
8307 xAlloc = *xlen + COLLXFRM_HDR_LEN + 1;
8309 else { /* Here, either:
8310 * 1) The strxfrm() has previously shown bad behavior; or
8311 * 2) It isn't the first time through the loop, which means
8312 * that the strxfrm() is now showing bad behavior, because
8313 * we gave it what it said was needed in the previous
8314 * iteration, and it came back saying it needed still more.
8315 * (Many versions of cygwin fit this. When the buffer size
8316 * isn't sufficient, they return the input size instead of
8317 * how much is needed.)
8318 * Increase the buffer size by a fixed percentage and try again.
8320 xAlloc += (xAlloc / 4) + 1;
8321 PL_strxfrm_is_behaved = FALSE;
8323 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8324 "mem_collxfrm_ required more space than previously"
8325 " calculated for locale %s, trying again with new"
8327 PL_collation_name, COLLXFRM_HDR_LEN,
8328 xAlloc - COLLXFRM_HDR_LEN));
8331 Renew(xbuf, xAlloc, char);
8332 if (UNLIKELY(! xbuf)) {
8333 DEBUG_L(PerlIO_printf(Perl_debug_log,
8334 "mem_collxfrm_: Couldn't realloc %zu bytes\n", xAlloc));
8343 DEBUG_L(print_collxfrm_input_and_return(s, s + len, xbuf, *xlen, utf8));
8345 /* Free up unneeded space; retain enough for trailing NUL */
8346 Renew(xbuf, COLLXFRM_HDR_LEN + *xlen + 1, char);
8348 if (s != input_string) {
8357 DEBUG_L(print_collxfrm_input_and_return(s, s + len, NULL, 0, utf8));
8360 if (s != input_string) {
8371 S_print_collxfrm_input_and_return(pTHX_
8379 PERL_ARGS_ASSERT_PRINT_COLLXFRM_INPUT_AND_RETURN;
8381 PerlIO_printf(Perl_debug_log,
8382 "mem_collxfrm_[ix %" UVuf "] for locale '%s':\n"
8383 " input=%s\n return=%s\n return len=%zu\n",
8384 (UV) PL_collation_ix, PL_collation_name,
8385 get_displayable_string(s, e, is_utf8),
8390 : _byte_dump_string((U8 *) xbuf + COLLXFRM_HDR_LEN,
8395 # endif /* DEBUGGING */
8398 Perl_strxfrm(pTHX_ SV * src)
8400 PERL_ARGS_ASSERT_STRXFRM;
8402 /* For use by POSIX::strxfrm(). If they differ, toggle LC_CTYPE to
8403 * LC_COLLATE to avoid potential mojibake.
8405 * If we can't calculate a collation, 'src' is instead returned, so that
8406 * future comparisons will be by code point order */
8408 # ifdef USE_LOCALE_CTYPE
8410 const char * orig_ctype = toggle_locale_c(LC_CTYPE,
8411 querylocale_c(LC_COLLATE));
8417 const char *p = SvPV_const(src, srclen);
8418 const U32 utf8_flag = SvUTF8(src);
8419 char *d = mem_collxfrm_(p, srclen, &dstlen, cBOOL(utf8_flag));
8421 assert(utf8_flag == 0 || utf8_flag == SVf_UTF8);
8425 dst =newSVpvn_flags(d + COLLXFRM_HDR_LEN,
8426 dstlen, SVs_TEMP|utf8_flag);
8430 # ifdef USE_LOCALE_CTYPE
8432 restore_toggled_locale_c(LC_CTYPE, orig_ctype);
8439 #endif /* USE_LOCALE_COLLATE */
8443 S_toggle_locale_i(pTHX_ const locale_category_index cat_index,
8444 const char * new_locale,
8445 const line_t caller_line)
8447 PERL_ARGS_ASSERT_TOGGLE_LOCALE_I;
8448 assert(cat_index <= LC_ALL_INDEX_);
8450 /* Changes the locale for the category specified by 'index' to 'new_locale,
8451 * if they aren't already the same.
8453 * Returns a copy of the name of the original locale for 'cat_index'
8454 * so can be switched back to with the companion function
8455 * restore_toggled_locale_i(), (NULL if no restoral is necessary.) */
8457 /* Find the original locale of the category we may need to change, so that
8458 * it can be restored to later */
8459 const char * locale_to_restore_to = querylocale_i(cat_index);
8461 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8462 "(%" LINE_Tf "): toggle_locale_i: index=%d(%s), wanted=%s,"
8464 caller_line, cat_index, category_names[cat_index],
8465 new_locale, locale_to_restore_to));
8467 if (! locale_to_restore_to) {
8468 locale_panic_via_(Perl_form(aTHX_
8469 "Could not find current %s locale",
8470 category_names[cat_index]),
8471 __FILE__, caller_line);
8474 /* If the locales are the same, there's nothing to do */
8475 if (strEQ(locale_to_restore_to, new_locale)) {
8476 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8477 "(%" LINE_Tf "): %s locale unchanged as %s\n",
8478 caller_line, category_names[cat_index],
8484 /* Finally, change the locale to the new one */
8485 void_setlocale_i_with_caller(cat_index, new_locale, __FILE__, caller_line);
8487 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8488 "(%" LINE_Tf "): %s locale switched to %s\n",
8489 caller_line, category_names[cat_index], new_locale));
8491 return locale_to_restore_to;
8494 PERL_UNUSED_ARG(caller_line);
8500 S_restore_toggled_locale_i(pTHX_ const locale_category_index cat_index,
8501 const char * restore_locale,
8502 const line_t caller_line)
8504 /* Restores the locale for LC_category corresponding to cat_index to
8505 * 'restore_locale' (which is a copy that will be freed by this function),
8506 * or do nothing if the latter parameter is NULL */
8508 PERL_ARGS_ASSERT_RESTORE_TOGGLED_LOCALE_I;
8509 assert(cat_index <= LC_ALL_INDEX_);
8511 if (restore_locale == NULL) {
8512 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8513 "(%" LINE_Tf "): No need to restore %s\n",
8514 caller_line, category_names[cat_index]));
8518 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
8519 "(%" LINE_Tf "): %s restoring locale to %s\n",
8520 caller_line, category_names[cat_index],
8523 void_setlocale_i_with_caller(cat_index, restore_locale,
8524 __FILE__, caller_line);
8527 PERL_UNUSED_ARG(caller_line);
8532 # ifdef USE_LOCALE_CTYPE
8535 S_is_codeset_name_UTF8(const char * name)
8537 /* Return a boolean as to if the passed-in name indicates it is a UTF-8
8538 * code set. Several variants are possible */
8539 const Size_t len = strlen(name);
8541 PERL_ARGS_ASSERT_IS_CODESET_NAME_UTF8;
8545 /* http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx */
8546 if (memENDs(name, len, "65001")) {
8551 /* 'UTF8' or 'UTF-8' */
8552 return ( inRANGE(len, 4, 5)
8553 && name[len-1] == '8'
8554 && ( memBEGINs(name, len, "UTF")
8555 || memBEGINs(name, len, "utf"))
8556 && (len == 4 || name[3] == '-'));
8560 #endif /* USE_LOCALE */
8563 Perl__is_in_locale_category(pTHX_ const bool compiling, const int category)
8565 /* Internal function which returns if we are in the scope of a pragma that
8566 * enables the locale category 'category'. 'compiling' should indicate if
8567 * this is during the compilation phase (TRUE) or not (FALSE). */
8569 const COP * const cop = (compiling) ? &PL_compiling : PL_curcop;
8571 SV *these_categories = cop_hints_fetch_pvs(cop, "locale", 0);
8572 if (! these_categories || these_categories == &PL_sv_placeholder) {
8576 /* The pseudo-category 'not_characters' is -1, so just add 1 to each to get
8577 * a valid unsigned */
8578 assert(category >= -1);
8579 return cBOOL(SvUV(these_categories) & (1U << (category + 1)));
8582 /* my_strerror() returns a mortalized copy of the text of the error message
8583 * associated with 'errnum'.
8585 * If not called from within the scope of 'use locale', it uses the text from
8586 * the C locale. If Perl is compiled to not pay attention to LC_CTYPE nor
8587 * LC_MESSAGES, it uses whatever strerror() returns. Otherwise the text is
8588 * derived from the locale, LC_MESSAGES if we have that; LC_CTYPE if not.
8590 * It returns in *utf8ness the result's UTF-8ness
8592 * The function just calls strerror(), but temporarily switches locales, if
8593 * needed. Many platforms require LC_CTYPE and LC_MESSAGES to be in the same
8594 * CODESET in order for the return from strerror() to not contain '?' symbols,
8595 * or worse, mojibaked. It's cheaper to just use the stricter criteria of
8596 * being in the same locale. So the code below uses a common locale for both
8597 * categories. Again, that is C if not within 'use locale' scope; or the
8598 * LC_MESSAGES locale if in scope and we have that category; and LC_CTYPE if we
8599 * don't have LC_MESSAGES; and whatever strerror returns if we don't have
8602 * There are two sets of implementations. The first below is if we have
8603 * strerror_l(). This is the simpler. We just use the already-built C locale
8604 * object if not in locale scope, or build up a custom one otherwise.
8606 * When strerror_l() is not available, we may have to swap locales temporarily
8607 * to bring the two categories into sync with each other, and possibly to the C
8610 * Because the prepropessing directives to conditionally compile this function
8611 * would greatly obscure the logic of the various implementations, the whole
8612 * function is repeated for each configuration, with some common macros. */
8614 /* Used to shorten the definitions of the following implementations of
8616 #define DEBUG_STRERROR_ENTER(errnum, in_locale) \
8617 DEBUG_Lv(PerlIO_printf(Perl_debug_log, \
8618 "my_strerror called with errnum %d;" \
8619 " Within locale scope=%d\n", \
8622 #define DEBUG_STRERROR_RETURN(errstr, utf8ness) \
8623 DEBUG_Lv(PerlIO_printf(Perl_debug_log, \
8624 "Strerror returned; saving a copy: '%s';" \
8626 get_displayable_string(errstr, \
8627 errstr + strlen(errstr), \
8631 /* On platforms that have precisely one of these categories (Windows
8632 * qualifies), these yield the correct one */
8633 #if defined(USE_LOCALE_CTYPE)
8634 # define WHICH_LC_INDEX LC_CTYPE_INDEX_
8635 #elif defined(USE_LOCALE_MESSAGES)
8636 # define WHICH_LC_INDEX LC_MESSAGES_INDEX_
8639 /*===========================================================================*/
8640 /* First set of implementations, when have strerror_l() */
8642 #if defined(USE_POSIX_2008_LOCALE) && defined(HAS_STRERROR_L)
8644 # if ! defined(USE_LOCALE_CTYPE) && ! defined(USE_LOCALE_MESSAGES)
8646 /* Here, neither category is defined: use the C locale */
8648 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8650 PERL_ARGS_ASSERT_MY_STRERROR;
8652 DEBUG_STRERROR_ENTER(errnum, 0);
8654 const char *errstr = savepv(strerror_l(errnum, PL_C_locale_obj));
8655 *utf8ness = UTF8NESS_IMMATERIAL;
8657 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8663 # elif ! defined(USE_LOCALE_CTYPE) || ! defined(USE_LOCALE_MESSAGES)
8665 /*--------------------------------------------------------------------------*/
8667 /* Here one or the other of CTYPE or MESSAGES is defined, but not both. If we
8668 * are not within 'use locale' scope of the only one defined, we use the C
8669 * locale; otherwise use the current locale object */
8672 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8674 PERL_ARGS_ASSERT_MY_STRERROR;
8676 DEBUG_STRERROR_ENTER(errnum, IN_LC(categories[WHICH_LC_INDEX]));
8678 /* Use C if not within locale scope; Otherwise, use current locale */
8679 const locale_t which_obj = (IN_LC(categories[WHICH_LC_INDEX]))
8681 : use_curlocale_scratch();
8683 const char *errstr = savepv(strerror_l(errnum, which_obj));
8684 *utf8ness = get_locale_string_utf8ness_i(errstr, LOCALE_UTF8NESS_UNKNOWN,
8685 NULL, WHICH_LC_INDEX);
8686 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8692 /*--------------------------------------------------------------------------*/
8693 # else /* Are using both categories. Place them in the same CODESET,
8694 * either C or the LC_MESSAGES locale */
8697 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8699 PERL_ARGS_ASSERT_MY_STRERROR;
8701 DEBUG_STRERROR_ENTER(errnum, IN_LC(LC_MESSAGES));
8704 if (! IN_LC(LC_MESSAGES)) { /* Use C if not within locale scope */
8705 errstr = savepv(strerror_l(errnum, PL_C_locale_obj));
8706 *utf8ness = UTF8NESS_IMMATERIAL;
8708 else { /* Otherwise, use the LC_MESSAGES locale, making sure LC_CTYPE
8710 locale_t cur = duplocale(use_curlocale_scratch());
8712 cur = newlocale(LC_CTYPE_MASK, querylocale_c(LC_MESSAGES), cur);
8713 errstr = savepv(strerror_l(errnum, cur));
8714 *utf8ness = get_locale_string_utf8ness_i(errstr,
8715 LOCALE_UTF8NESS_UNKNOWN,
8716 NULL, LC_MESSAGES_INDEX_);
8720 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8725 # endif /* Above is using strerror_l */
8726 /*===========================================================================*/
8727 #else /* Below is not using strerror_l */
8728 # if ! defined(USE_LOCALE_CTYPE) && ! defined(USE_LOCALE_MESSAGES)
8730 /* If not using using either of the categories, return plain, unadorned
8734 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8736 PERL_ARGS_ASSERT_MY_STRERROR;
8738 DEBUG_STRERROR_ENTER(errnum, 0);
8740 const char *errstr = savepv(Strerror(errnum));
8741 *utf8ness = UTF8NESS_IMMATERIAL;
8743 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8749 /*--------------------------------------------------------------------------*/
8750 # elif ! defined(USE_LOCALE_CTYPE) || ! defined(USE_LOCALE_MESSAGES)
8752 /* Here one or the other of CTYPE or MESSAGES is defined, but not both. If we
8753 * are not within 'use locale' scope of the only one defined, we use the C
8754 * locale; otherwise use the current locale */
8757 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8759 PERL_ARGS_ASSERT_MY_STRERROR;
8761 DEBUG_STRERROR_ENTER(errnum, IN_LC(categories[WHICH_LC_INDEX]));
8764 if (IN_LC(categories[WHICH_LC_INDEX])) {
8765 errstr = savepv(Strerror(errnum));
8766 *utf8ness = get_locale_string_utf8ness_i(errstr,
8767 LOCALE_UTF8NESS_UNKNOWN,
8768 NULL, WHICH_LC_INDEX);
8774 const char * orig_locale = toggle_locale_i(WHICH_LC_INDEX, "C");
8776 errstr = savepv(Strerror(errnum));
8778 restore_toggled_locale_i(WHICH_LC_INDEX, orig_locale);
8782 *utf8ness = UTF8NESS_IMMATERIAL;
8785 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8791 /*--------------------------------------------------------------------------*/
8794 /* Below, have both LC_CTYPE and LC_MESSAGES. Place them in the same CODESET,
8795 * either C or the LC_MESSAGES locale */
8798 Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
8800 PERL_ARGS_ASSERT_MY_STRERROR;
8802 DEBUG_STRERROR_ENTER(errnum, IN_LC(LC_MESSAGES));
8804 const char * desired_locale = (IN_LC(LC_MESSAGES))
8805 ? querylocale_c(LC_MESSAGES)
8807 /* XXX Can fail on z/OS */
8811 const char* orig_CTYPE_locale = toggle_locale_c(LC_CTYPE,
8813 const char* orig_MESSAGES_locale = toggle_locale_c(LC_MESSAGES,
8815 const char *errstr = savepv(Strerror(errnum));
8817 restore_toggled_locale_c(LC_MESSAGES, orig_MESSAGES_locale);
8818 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
8822 *utf8ness = get_locale_string_utf8ness_i(errstr, LOCALE_UTF8NESS_UNKNOWN,
8823 NULL, LC_MESSAGES_INDEX_);
8824 DEBUG_STRERROR_RETURN(errstr, utf8ness);
8830 /*--------------------------------------------------------------------------*/
8831 # endif /* end of not using strerror_l() */
8832 #endif /* end of all the my_strerror() implementations */
8836 =for apidoc switch_to_global_locale
8838 This function copies the locale state of the calling thread into the program's
8839 global locale, and converts the thread to use that global locale.
8841 It is intended so that Perl can safely be used with C libraries that access the
8842 global locale and which can't be converted to not access it. Effectively, this
8843 means libraries that call C<L<setlocale(3)>> on non-Windows systems. (For
8844 portability, it is a good idea to use it on Windows as well.)
8846 A downside of using it is that it disables the services that Perl provides to
8847 hide locale gotchas from your code. The service you most likely will miss
8848 regards the radix character (decimal point) in floating point numbers. Code
8849 executed after this function is called can no longer just assume that this
8850 character is correct for the current circumstances.
8852 To return to Perl control, and restart the gotcha prevention services, call
8853 C<L</sync_locale>>. Behavior is undefined for any pure Perl code that executes
8854 while the switch is in effect.
8856 The global locale and the per-thread locales are independent. As long as just
8857 one thread converts to the global locale, everything works smoothly. But if
8858 more than one does, they can easily interfere with each other, and races are
8859 likely. On Windows systems prior to Visual Studio 15 (at which point Microsoft
8860 fixed a bug), races can occur (even if only one thread has been converted to
8861 the global locale), but only if you use the following operations:
8865 =item L<POSIX::localeconv|POSIX/localeconv>
8867 =item L<I18N::Langinfo>, items C<CRNCYSTR> and C<THOUSEP>
8869 =item L<perlapi/Perl_langinfo>, items C<CRNCYSTR> and C<THOUSEP>
8873 The first item is not fixable (except by upgrading to a later Visual Studio
8874 release), but it would be possible to work around the latter two items by
8875 having Perl change its algorithm for calculating these to use Windows API
8876 functions (likely C<GetNumberFormat> and C<GetCurrencyFormat>); patches
8879 XS code should never call plain C<setlocale>, but should instead be converted
8880 to either call L<C<Perl_setlocale>|perlapi/Perl_setlocale> (which is a drop-in
8881 for the system C<setlocale>) or use the methods given in L<perlcall> to call
8882 L<C<POSIX::setlocale>|POSIX/setlocale>. Either one will transparently properly
8883 handle all cases of single- vs multi-thread, POSIX 2008-supported or not.
8888 #if defined(WIN32) && defined(USE_THREAD_SAFE_LOCALE)
8889 # define CHANGE_SYSTEM_LOCALE_TO_GLOBAL \
8891 if (_configthreadlocale(_DISABLE_PER_THREAD_LOCALE) == -1) { \
8892 locale_panic_("_configthreadlocale returned an error"); \
8895 #elif defined(USE_POSIX_2008_LOCALE)
8896 # define CHANGE_SYSTEM_LOCALE_TO_GLOBAL \
8898 locale_t old_locale = uselocale(LC_GLOBAL_LOCALE); \
8899 if (! old_locale) { \
8900 locale_panic_("Could not change to global locale"); \
8903 /* Free the per-thread memory */ \
8904 if ( old_locale != LC_GLOBAL_LOCALE \
8905 && old_locale != PL_C_locale_obj) \
8907 freelocale(old_locale); \
8911 # define CHANGE_SYSTEM_LOCALE_TO_GLOBAL
8915 Perl_switch_to_global_locale(pTHX)
8920 DEBUG_L(PerlIO_printf(Perl_debug_log, "Entering switch_to_global; %s\n",
8921 get_LC_ALL_display()));
8923 /* In these cases, we use the system state to determine if we are in the
8924 * global locale or not. */
8925 # ifdef USE_POSIX_2008_LOCALE
8927 const bool perl_controls = (LC_GLOBAL_LOCALE != uselocale((locale_t) 0));
8929 # elif defined(USE_THREAD_SAFE_LOCALE) && defined(WIN32)
8931 int config_return = _configthreadlocale(0);
8932 if (config_return == -1) {
8933 locale_panic_("_configthreadlocale returned an error");
8935 const bool perl_controls = (config_return == _ENABLE_PER_THREAD_LOCALE);
8939 const bool perl_controls = false;
8943 /* No-op if already in global */
8944 if (! perl_controls) {
8950 const char * thread_locale = calculate_LC_ALL_string(NULL,
8951 EXTERNAL_FORMAT_FOR_SET,
8954 CHANGE_SYSTEM_LOCALE_TO_GLOBAL;
8955 posix_setlocale(LC_ALL, thread_locale);
8957 # else /* Must be USE_POSIX_2008_LOCALE) */
8959 const char * cur_thread_locales[LC_ALL_INDEX_];
8961 /* Save each category's current per-thread state */
8962 for_all_individual_category_indexes(i) {
8963 cur_thread_locales[i] = querylocale_i(i);
8966 CHANGE_SYSTEM_LOCALE_TO_GLOBAL;
8968 /* Set the global to what was our per-thread state */
8969 POSIX_SETLOCALE_LOCK;
8970 for_all_individual_category_indexes(i) {
8971 posix_setlocale(categories[i], cur_thread_locales[i]);
8973 POSIX_SETLOCALE_UNLOCK;
8976 # ifdef USE_LOCALE_NUMERIC
8978 /* Switch to the underlying C numeric locale; the application is on its
8980 POSIX_SETLOCALE_LOCK;
8981 posix_setlocale(LC_NUMERIC, PL_numeric_name);
8982 POSIX_SETLOCALE_UNLOCK;
8991 =for apidoc sync_locale
8993 This function copies the state of the program global locale into the calling
8994 thread, and converts that thread to using per-thread locales, if it wasn't
8995 already, and the platform supports them. The LC_NUMERIC locale is toggled into
8996 the standard state (using the C locale's conventions), if not within the
8997 lexical scope of S<C<use locale>>.
8999 Perl will now consider itself to have control of the locale.
9001 Since unthreaded perls have only a global locale, this function is a no-op
9004 This function is intended for use with C libraries that do locale manipulation.
9005 It allows Perl to accommodate the use of them. Call this function before
9006 transferring back to Perl space so that it knows what state the C code has left
9009 XS code should not manipulate the locale on its own. Instead,
9010 L<C<Perl_setlocale>|perlapi/Perl_setlocale> can be used at any time to query or
9011 change the locale (though changing the locale is antisocial and dangerous on
9012 multi-threaded systems that don't have multi-thread safe locale operations.
9013 (See L<perllocale/Multi-threaded operation>).
9015 Using the libc L<C<setlocale(3)>> function should be avoided. Nevertheless,
9016 certain non-Perl libraries called from XS, do call it, and their behavior may
9017 not be able to be changed. This function, along with
9018 C<L</switch_to_global_locale>>, can be used to get seamless behavior in these
9019 circumstances, as long as only one thread is involved.
9021 If the library has an option to turn off its locale manipulation, doing that is
9022 preferable to using this mechanism. C<Gtk> is such a library.
9024 The return value is a boolean: TRUE if the global locale at the time of call
9025 was in effect for the caller; and FALSE if a per-thread locale was in effect.
9031 Perl_sync_locale(pTHX)
9040 bool was_in_global = TRUE;
9042 # ifdef USE_THREAD_SAFE_LOCALE
9045 int config_return = _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
9046 if (config_return == -1) {
9047 locale_panic_("_configthreadlocale returned an error");
9049 was_in_global = (config_return == _DISABLE_PER_THREAD_LOCALE);
9051 # elif defined(USE_POSIX_2008_LOCALE)
9053 was_in_global = (LC_GLOBAL_LOCALE == uselocale(LC_GLOBAL_LOCALE));
9056 # error Unexpected Configuration
9058 # endif /* USE_THREAD_SAFE_LOCALE */
9060 /* Here, we are in the global locale. Get and save the values for each
9061 * category, and convert the current thread to use them */
9065 STDIZED_SETLOCALE_LOCK;
9066 const char * lc_all_string = savepv(stdized_setlocale(LC_ALL, NULL));
9067 STDIZED_SETLOCALE_UNLOCK;
9069 give_perl_locale_control(lc_all_string, __LINE__);
9070 Safefree(lc_all_string);
9074 const char * current_globals[LC_ALL_INDEX_];
9075 for_all_individual_category_indexes(i) {
9076 STDIZED_SETLOCALE_LOCK;
9077 current_globals[i] = savepv(stdized_setlocale(categories[i], NULL));
9078 STDIZED_SETLOCALE_UNLOCK;
9081 give_perl_locale_control((const char **) ¤t_globals, __LINE__);
9083 for_all_individual_category_indexes(i) {
9084 Safefree(current_globals[i]);
9089 return was_in_global;
9095 #if defined(DEBUGGING) && defined(USE_LOCALE)
9098 S_my_setlocale_debug_string_i(pTHX_
9099 const locale_category_index cat_index,
9100 const char* locale, /* Optional locale name */
9102 /* return value from setlocale() when attempting
9103 * to set 'category' to 'locale' */
9108 /* Returns a pointer to a NUL-terminated string in static storage with
9109 * added text about the info passed in. This is not thread safe and will
9110 * be overwritten by the next call, so this should be used just to
9111 * formulate a string to immediately print or savepv() on. */
9113 const char * locale_quote;
9114 const char * retval_quote;
9116 assert(cat_index <= LC_ALL_INDEX_);
9118 if (locale == NULL) {
9123 locale_quote = "\"";
9126 if (retval == NULL) {
9131 retval_quote = "\"";
9134 # ifdef USE_LOCALE_THREADS
9135 # define THREAD_FORMAT "%p:"
9136 # define THREAD_ARGUMENT aTHX_
9138 # define THREAD_FORMAT
9139 # define THREAD_ARGUMENT
9142 return Perl_form(aTHX_
9143 "%s:%" LINE_Tf ": " THREAD_FORMAT
9144 " setlocale(%s[%d], %s%s%s) returned %s%s%s\n",
9146 __FILE__, line, THREAD_ARGUMENT
9147 category_names[cat_index], categories[cat_index],
9148 locale_quote, locale, locale_quote,
9149 retval_quote, retval, retval_quote);
9153 #ifdef USE_PERL_SWITCH_LOCALE_CONTEXT
9156 Perl_switch_locale_context(pTHX)
9158 /* libc keeps per-thread locale status information in some configurations.
9159 * So, we can't just switch out aTHX to switch to a new thread. libc has
9160 * to follow along. This routine does that based on per-interpreter
9161 * variables we keep just for this purpose.
9163 * There are two implementations where this is an issue. For the other
9164 * implementations, it doesn't matter because libc is using global values
9165 * that all threads know about.
9167 * The two implementations are where libc keeps thread-specific information
9168 * on its own. These are
9170 * POSIX 2008: The current locale is kept by libc as an object. We save
9171 * a copy of that in the per-thread PL_cur_locale_obj, and so
9172 * this routine uses that copy to tell the thread it should be
9173 * operating with that object
9174 * Windows thread-safe locales: A given thread in Windows can be being run
9175 * with per-thread locales, or not. When the thread context
9176 * changes, libc doesn't automatically know if the thread is
9177 * using per-thread locales, nor does it know what the new
9178 * thread's locale is. We keep that information in the
9179 * per-thread variables:
9180 * PL_controls_locale indicates if this thread is using
9181 * per-thread locales or not
9182 * PL_cur_LC_ALL indicates what the the locale
9183 * should be if it is a per-thread
9187 if (UNLIKELY( PL_veto_switch_non_tTHX_context
9188 || PL_phase == PERL_PHASE_CONSTRUCT))
9193 # ifdef USE_POSIX_2008_LOCALE
9195 if (! uselocale(PL_cur_locale_obj)) {
9196 locale_panic_(Perl_form(aTHX_
9197 "Can't uselocale(%p), LC_ALL supposed to"
9199 PL_cur_locale_obj, get_LC_ALL_display()));
9202 # elif defined(WIN32)
9204 if (! bool_setlocale_c(LC_ALL, PL_cur_LC_ALL)) {
9205 locale_panic_(Perl_form(aTHX_ "Can't setlocale(%s)", PL_cur_LC_ALL));
9215 Perl_thread_locale_init(pTHX)
9218 #ifdef USE_THREAD_SAFE_LOCALE
9219 # ifdef USE_POSIX_2008_LOCALE
9221 /* Called from a thread on startup.
9223 * The operations here have to be done from within the calling thread, as
9224 * they affect libc's knowledge of the thread; libc has no knowledge of
9227 DEBUG_L(PerlIO_printf(Perl_debug_log,
9228 "new thread, initial locale is %s;"
9229 " calling setlocale(LC_ALL, \"C\")\n",
9230 get_LC_ALL_display()));
9232 if (! uselocale(PL_C_locale_obj)) {
9234 /* Not being able to change to the C locale is severe; don't keep
9236 locale_panic_(Perl_form(aTHX_
9237 "Can't uselocale(%p), 'C'", PL_C_locale_obj));
9238 NOT_REACHED; /* NOTREACHED */
9241 # ifdef MULTIPLICITY
9243 PL_cur_locale_obj = PL_C_locale_obj;
9246 # elif defined(WIN32)
9248 /* On Windows, make sure new thread has per-thread locales enabled */
9249 if (_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) == -1) {
9250 locale_panic_("_configthreadlocale returned an error");
9252 void_setlocale_c(LC_ALL, "C");
9260 Perl_thread_locale_term(pTHX)
9262 /* Called from a thread as it gets ready to terminate.
9264 * The operations here have to be done from within the calling thread, as
9265 * they affect libc's knowledge of the thread; libc has no knowledge of
9268 #if defined(USE_POSIX_2008_LOCALE) && defined(USE_THREADS)
9270 /* Switch to the global locale, so can free up the per-thread object */
9271 locale_t actual_obj = uselocale(LC_GLOBAL_LOCALE);
9272 if (actual_obj != LC_GLOBAL_LOCALE && actual_obj != PL_C_locale_obj) {
9273 freelocale(actual_obj);
9276 /* Prevent leaks even if something has gone wrong */
9277 locale_t expected_obj = PL_cur_locale_obj;
9278 if (UNLIKELY( expected_obj != actual_obj
9279 && expected_obj != LC_GLOBAL_LOCALE
9280 && expected_obj != PL_C_locale_obj))
9282 freelocale(expected_obj);
9285 PL_cur_locale_obj = LC_GLOBAL_LOCALE;
9288 #ifdef WIN32_USE_FAKE_OLD_MINGW_LOCALES
9290 /* When faking the mingw implementation, we coerce this function into doing
9291 * something completely different from its intent -- namely to free up our
9292 * static buffer to avoid a leak. This function gets called for each
9293 * thread that is terminating, so will give us a chance to free the buffer
9294 * from the appropriate pool. On unthreaded systems, it gets called by the
9295 * mutex termination code. */
9297 # ifdef MULTIPLICITY
9299 if (aTHX != wsetlocale_buf_aTHX) {
9305 if (wsetlocale_buf_size > 0) {
9306 Safefree(wsetlocale_buf);
9307 wsetlocale_buf_size = 0;
9315 * ex: set ts=8 sts=4 sw=4 et: