This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
locale.c: Stop compiler warning
[perl5.git] / locale.c
CommitLineData
98994639
HS
1/* locale.c
2 *
1129b882
NC
3 * Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
4 * 2002, 2003, 2005, 2006, 2007, 2008 by Larry Wall and others
98994639
HS
5 *
6 * You may distribute under the terms of either the GNU General Public
7 * License or the Artistic License, as specified in the README file.
8 *
9 */
10
11/*
4ac71550 12 * A Elbereth Gilthoniel,
cdad3b53 13 * silivren penna míriel
4ac71550 14 * o menel aglar elenath!
cdad3b53 15 * Na-chaered palan-díriel
4ac71550
TC
16 * o galadhremmin ennorath,
17 * Fanuilos, le linnathon
18 * nef aear, si nef aearon!
19 *
20 * [p.238 of _The Lord of the Rings_, II/i: "Many Meetings"]
98994639
HS
21 */
22
166f8a29
DM
23/* utility functions for handling locale-specific stuff like what
24 * character represents the decimal point.
0d071d52 25 *
7d4bcc4a
KW
26 * All C programs have an underlying locale. Perl code generally doesn't pay
27 * any attention to it except within the scope of a 'use locale'. For most
0d071d52
KW
28 * categories, it accomplishes this by just using different operations if it is
29 * in such scope than if not. However, various libc functions called by Perl
30 * are affected by the LC_NUMERIC category, so there are macros in perl.h that
31 * are used to toggle between the current locale and the C locale depending on
a9ad02a8
KW
32 * the desired behavior of those functions at the moment. And, LC_MESSAGES is
33 * switched to the C locale for outputting the message unless within the scope
34 * of 'use locale'.
e9bc6d6b
KW
35 *
36 * This code now has multi-thread-safe locale handling on systems that support
37 * that. This is completely transparent to most XS code. On earlier systems,
38 * it would be possible to emulate thread-safe locales, but this likely would
39 * involve a lot of locale switching, and would require XS code changes.
40 * Macros could be written so that the code wouldn't have to know which type of
fe1c1494
KW
41 * system is being used.
42 *
43 * Table-driven code is used for simplicity and clarity, as many operations
44 * differ only in which category is being worked on. However the system
45 * categories need not be small contiguous integers, so do not lend themselves
46 * to table lookup. Instead we have created our own equivalent values which
47 * are all small contiguous non-negative integers, and translation functions
48 * between the two sets. For category 'LC_foo', the name of our index is
49 * LC_foo_INDEX_. Various parallel tables, indexed by these, are used.
50 *
51 * Many of the macros and functions in this file have one of the suffixes '_c',
52 * '_r', or '_i'. khw found these useful in remembering what type of locale
53 * category to use as their parameter. '_r' takes an int category number as
54 * passed to setlocale(), like LC_ALL, LC_CTYPE, etc. The 'r' indicates that
55 * the value isn't known until runtime. '_c' also indicates such a category
56 * number, but its value is known at compile time. These are both converted
57 * into unsigned indexes into various tables of category information, where the
58 * real work is generally done. The tables are generated at compile-time based
59 * on platform characteristics and Configure options. They hide from the code
60 * many of the vagaries of the different locale implementations out there. You
61 * may have already guessed that '_i' indicates the parameter is such an
62 * unsigned index. Converting from '_r' to '_i' requires run-time lookup.
63 * '_c' is used to get cpp to do this at compile time. To avoid the runtime
64 * expense, the code is structured to use '_r' at the API level, and once
65 * converted, everything possible is done using the table indexes.
66 *
67 * On unthreaded perls, most operations expand out to just the basic
68 * setlocale() calls. The same is true on threaded perls on modern Windows
69 * systems where the same API, after set up, is used for thread-safe locale
70 * handling. On other systems, there is a completely different API, specified
71 * in POSIX 2008, to do thread-safe locales. On these systems, our
72 * emulate_setlocale_i() function is used to hide the different API from the
73 * outside. This makes it completely transparent to most XS code.
74 *
75 * A huge complicating factor is that the LC_NUMERIC category is normally held
76 * in the C locale, except during those relatively rare times when it needs to
77 * be in the underlying locale. There is a bunch of code to accomplish this,
78 * and to allow easy switches from one state to the other.
79 *
58b390f2
KW
80 * In addition, the setlocale equivalents have versions for the return context,
81 * 'void' and 'bool', besides the full return value. This can present
82 * opportunities for avoiding work. We don't have to necessarily create a safe
83 * copy to return if no return is desired.
84 *
85 * There are 3.5 major implementations here; which one chosen depends on what
86 * the platform has available, and Configuration options.
87 *
88 * 1) Raw my_setlocale(). Here the layer adds nothing. This is used for
89 * unthreaded perls, and when the API for safe locale threading is identical
90 * to the unsafe API (Windows, currently).
91 *
92 * 2) A minimal layer that makes my_setlocale() uninterruptible and returns a
93 * per-thread/per-category value.
94 *
95 * 3a and 3b) A layer that implements POSIX 2008 thread-safe locale handling,
96 * mapping the setlocale() API to them. This automatically makes almost all
97 * code thread-safe without need for changes. This layer is chosen on
98 * threaded perls when the platform supports the POSIX 2008 functions, and
99 * when there is no manual override in Configure.
100 *
101 * 3a) is when the platform has a reliable querylocale() function or
102 * equivalent that is selected to be used.
103 * 3b) is when we have to emulate that functionality.
104 *
fe1c1494
KW
105 * z/OS (os390) is an outlier. Locales really don't work under threads when
106 * either the radix character isn't a dot, or attempts are made to change
107 * locales after the first thread is created. The reason is that IBM has made
108 * it thread-safe by refusing to change locales (returning failure if
109 * attempted) any time after an application has called pthread_create() to
110 * create another thread. The expectation is that an application will set up
111 * its locale information before the first fork, and be stable thereafter. But
112 * perl toggles LC_NUMERIC if the locale's radix character isn't a dot, as do
113 * the other toggles, which are less common.
166f8a29
DM
114 */
115
4767484d
KW
116/* If the environment says to, we can output debugging information during
117 * initialization. This is done before option parsing, and before any thread
118 * creation, so can be a file-level static. (Must come before #including
119 * perl.h) */
120#ifdef DEBUGGING
121static int debug_initialization = 0;
122# define DEBUG_INITIALIZATION_set(v) (debug_initialization = v)
123# define DEBUG_LOCALE_INITIALIZATION_ debug_initialization
124#else
125# define debug_initialization 0
126# define DEBUG_INITIALIZATION_set(v)
127#endif
128
b26541a9
KW
129#define DEBUG_PRE_STMTS dSAVE_ERRNO; \
130 PerlIO_printf(Perl_debug_log, "%s: %" LINE_Tf ": ", __FILE__, __LINE__);
0494a7e2
KW
131#define DEBUG_POST_STMTS RESTORE_ERRNO;
132
98994639
HS
133#include "EXTERN.h"
134#define PERL_IN_LOCALE_C
135#include "perl.h"
136
a4af207c
JH
137#include "reentr.h"
138
0dec74cd
KW
139#ifdef I_WCHAR
140# include <wchar.h>
141#endif
5b64f24c
KW
142#ifdef I_WCTYPE
143# include <wctype.h>
144#endif
0dec74cd 145
988faa11
KW
146#ifdef USE_LOCALE
147
22392ba2
KW
148PERL_STATIC_INLINE const char *
149S_mortalized_pv_copy(pTHX_ const char * const pv)
150{
151 PERL_ARGS_ASSERT_MORTALIZED_PV_COPY;
152
153 /* Copies the input pv, and arranges for it to be freed at an unspecified
154 * later time. */
155
156 if (pv == NULL) {
157 return NULL;
158 }
159
160 const char * copy = savepv(pv);
161 SAVEFREEPV(copy);
162 return copy;
163}
164
988faa11 165#endif
48015184
KW
166
167/* Returns the Unix errno portion; ignoring any others. This is a macro here
168 * instead of putting it into perl.h, because unclear to khw what should be
169 * done generally. */
170#define GET_ERRNO saved_errno
171
2fbc73dd 172/* Default values come from the C locale */
95e4742c
KW
173#define C_codeset "ANSI_X3.4-1968" /* Only in some Configurations, and usually
174 a single instance, so is a #define */
2fbc73dd
KW
175static const char C_decimal_point[] = ".";
176static const char C_thousands_sep[] = "";
177
63e5b0d7
KW
178/* Is the C string input 'name' "C" or "POSIX"? If so, and 'name' is the
179 * return of setlocale(), then this is extremely likely to be the C or POSIX
180 * locale. However, the output of setlocale() is documented to be opaque, but
181 * the odds are extremely small that it would return these two strings for some
182 * other locale. Note that VMS in these two locales includes many non-ASCII
183 * characters as controls and punctuation (below are hex bytes):
184 * cntrl: 84-97 9B-9F
185 * punct: A1-A3 A5 A7-AB B0-B3 B5-B7 B9-BD BF-CF D1-DD DF-EF F1-FD
186 * Oddly, none there are listed as alphas, though some represent alphabetics
187 * http://www.nntp.perl.org/group/perl.perl5.porters/2013/02/msg198753.html */
188#define isNAME_C_OR_POSIX(name) \
189 ( (name) != NULL \
190 && (( *(name) == 'C' && (*(name + 1)) == '\0') \
191 || strEQ((name), "POSIX")))
192
3871c541
KW
193#if defined(HAS_NL_LANGINFO_L) || defined(HAS_NL_LANGINFO)
194# define HAS_SOME_LANGINFO
195#endif
196#if defined(HAS_LOCALECONV) || defined(HAS_LOCALECONV_L)
197# define HAS_SOME_LOCALECONV
198#endif
199
fffcb78f
KW
200#define my_langinfo_c(item, category, locale, retbufp, retbuf_sizep, utf8ness) \
201 my_langinfo_i(item, category##_INDEX_, locale, retbufp, \
202 retbuf_sizep, utf8ness)
fc46dd0d 203
8ef6e574
KW
204#ifdef USE_LOCALE
205
d188fe6d 206# ifdef DEBUGGING
1e78b5ed
KW
207# define setlocale_debug_string_i(index, locale, result) \
208 my_setlocale_debug_string_i(index, locale, result, __LINE__)
d188fe6d
KW
209# define setlocale_debug_string_c(category, locale, result) \
210 setlocale_debug_string_i(category##_INDEX_, locale, result)
211# define setlocale_debug_string_r(category, locale, result) \
212 setlocale_debug_string_i(get_category_index(category, locale), \
213 locale, result)
214# endif
215
497bfce4
KW
216# define toggle_locale_i(index, locale) \
217 S_toggle_locale_i(aTHX_ index, locale, __LINE__)
218# define toggle_locale_c(cat, locale) toggle_locale_i(cat##_INDEX_, locale)
219# define restore_toggled_locale_i(index, locale) \
220 S_restore_toggled_locale_i(aTHX_ index, locale, __LINE__)
221# define restore_toggled_locale_c(cat, locale) \
222 restore_toggled_locale_i(cat##_INDEX_, locale)
223
fe1c1494
KW
224/* Two parallel arrays indexed by our mapping of category numbers into small
225 * non-negative indexes; first the locale categories Perl uses on this system,
226 * used to do the inverse mapping. The second array is their names. These
227 * arrays are in mostly arbitrary order. */
e5f10d49 228
e129f2eb 229STATIC const int categories[] = {
e5f10d49 230
e5f10d49
KW
231# ifdef USE_LOCALE_CTYPE
232 LC_CTYPE,
233# endif
bfcd7d51
KW
234# ifdef USE_LOCALE_NUMERIC
235 LC_NUMERIC,
236# endif
e5f10d49
KW
237# ifdef USE_LOCALE_COLLATE
238 LC_COLLATE,
239# endif
240# ifdef USE_LOCALE_TIME
241 LC_TIME,
242# endif
243# ifdef USE_LOCALE_MESSAGES
244 LC_MESSAGES,
245# endif
246# ifdef USE_LOCALE_MONETARY
247 LC_MONETARY,
248# endif
9821811f
KW
249# ifdef USE_LOCALE_ADDRESS
250 LC_ADDRESS,
251# endif
252# ifdef USE_LOCALE_IDENTIFICATION
253 LC_IDENTIFICATION,
254# endif
255# ifdef USE_LOCALE_MEASUREMENT
256 LC_MEASUREMENT,
257# endif
258# ifdef USE_LOCALE_PAPER
259 LC_PAPER,
260# endif
261# ifdef USE_LOCALE_TELEPHONE
262 LC_TELEPHONE,
263# endif
36dbc955
KW
264# ifdef USE_LOCALE_SYNTAX
265 LC_SYNTAX,
266# endif
267# ifdef USE_LOCALE_TOD
268 LC_TOD,
269# endif
e5f10d49
KW
270# ifdef LC_ALL
271 LC_ALL,
272# endif
7e1cacef
KW
273
274 /* Placeholder as a precaution if code fails to check the return of
275 * get_category_index(), which returns this element to indicate an error */
276 -1
e5f10d49
KW
277};
278
279/* The top-most real element is LC_ALL */
280
e129f2eb 281STATIC const char * const category_names[] = {
e5f10d49 282
e5f10d49
KW
283# ifdef USE_LOCALE_CTYPE
284 "LC_CTYPE",
285# endif
bfcd7d51
KW
286# ifdef USE_LOCALE_NUMERIC
287 "LC_NUMERIC",
288# endif
e5f10d49
KW
289# ifdef USE_LOCALE_COLLATE
290 "LC_COLLATE",
291# endif
292# ifdef USE_LOCALE_TIME
293 "LC_TIME",
294# endif
295# ifdef USE_LOCALE_MESSAGES
296 "LC_MESSAGES",
297# endif
298# ifdef USE_LOCALE_MONETARY
299 "LC_MONETARY",
300# endif
9821811f
KW
301# ifdef USE_LOCALE_ADDRESS
302 "LC_ADDRESS",
303# endif
304# ifdef USE_LOCALE_IDENTIFICATION
305 "LC_IDENTIFICATION",
306# endif
307# ifdef USE_LOCALE_MEASUREMENT
308 "LC_MEASUREMENT",
309# endif
310# ifdef USE_LOCALE_PAPER
311 "LC_PAPER",
312# endif
313# ifdef USE_LOCALE_TELEPHONE
314 "LC_TELEPHONE",
315# endif
36dbc955
KW
316# ifdef USE_LOCALE_SYNTAX
317 "LC_SYNTAX",
318# endif
319# ifdef USE_LOCALE_TOD
320 "LC_TOD",
321# endif
e5f10d49
KW
322# ifdef LC_ALL
323 "LC_ALL",
324# endif
7e1cacef
KW
325
326 /* Placeholder as a precaution if code fails to check the return of
327 * get_category_index(), which returns this element to indicate an error */
328 NULL
329};
e5f10d49 330
c7dc1ea8
KW
331/* A few categories require additional setup when they are changed. This table
332 * points to the functions that do that setup */
333STATIC void (*update_functions[]) (pTHX_ const char *) = {
c7dc1ea8
KW
334# ifdef USE_LOCALE_CTYPE
335 S_new_ctype,
336# endif
bfcd7d51
KW
337# ifdef USE_LOCALE_NUMERIC
338 S_new_numeric,
339# endif
c7dc1ea8
KW
340# ifdef USE_LOCALE_COLLATE
341 S_new_collate,
342# endif
343# ifdef USE_LOCALE_TIME
344 NULL,
345# endif
346# ifdef USE_LOCALE_MESSAGES
347 NULL,
348# endif
349# ifdef USE_LOCALE_MONETARY
350 NULL,
351# endif
352# ifdef USE_LOCALE_ADDRESS
353 NULL,
354# endif
355# ifdef USE_LOCALE_IDENTIFICATION
356 NULL,
357# endif
358# ifdef USE_LOCALE_MEASUREMENT
359 NULL,
360# endif
361# ifdef USE_LOCALE_PAPER
362 NULL,
363# endif
364# ifdef USE_LOCALE_TELEPHONE
365 NULL,
366# endif
367# ifdef USE_LOCALE_SYNTAX
368 NULL,
369# endif
370# ifdef USE_LOCALE_TOD
371 NULL,
372# endif
373 /* No harm done to have this even without an LC_ALL */
374 S_new_LC_ALL,
375
376 /* Placeholder as a precaution if code fails to check the return of
377 * get_category_index(), which returns this element to indicate an error */
378 NULL
379};
380
e5f10d49
KW
381# ifdef LC_ALL
382
383 /* On systems with LC_ALL, it is kept in the highest index position. (-2
384 * to account for the final unused placeholder element.) */
385# define NOMINAL_LC_ALL_INDEX (C_ARRAY_LENGTH(categories) - 2)
e5f10d49
KW
386# else
387
388 /* On systems without LC_ALL, we pretend it is there, one beyond the real
389 * top element, hence in the unused placeholder element. */
390# define NOMINAL_LC_ALL_INDEX (C_ARRAY_LENGTH(categories) - 1)
e5f10d49
KW
391# endif
392
393/* Pretending there is an LC_ALL element just above allows us to avoid most
394 * special cases. Most loops through these arrays in the code below are
395 * written like 'for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++)'. They will work
396 * on either type of system. But the code must be written to not access the
4fd76d49
KW
397 * element at 'LC_ALL_INDEX_' except on platforms that have it. This can be
398 * checked for at compile time by using the #define LC_ALL_INDEX_ which is only
948523db 399 * defined if we do have LC_ALL. */
e5f10d49 400
7e1cacef
KW
401STATIC unsigned int
402S_get_category_index(const int category, const char * locale)
403{
404 /* Given a category, return the equivalent internal index we generally use
405 * instead.
406 *
407 * 'locale' is for use in any generated diagnostics, and may be NULL
408 *
409 * Some sort of hash could be used instead of this loop, but the number of
410 * elements is so far at most 12 */
411
412 unsigned int i;
fef64ad9 413 const char * conditional_warn_text = "; can't set it to ";
7e1cacef
KW
414
415 PERL_ARGS_ASSERT_GET_CATEGORY_INDEX;
416
417# ifdef LC_ALL
418 for (i = 0; i <= LC_ALL_INDEX_; i++)
419# else
420 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++)
421# endif
422 {
423 if (category == categories[i]) {
424 dTHX_DEBUGGING;
425 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9
KW
426 "index of category %d (%s) is %d\n",
427 category, category_names[i], i));
7e1cacef
KW
428 return i;
429 }
430 }
431
432 /* Here, we don't know about this category, so can't handle it. */
fef64ad9 433
7e1cacef 434 if (! locale) {
fef64ad9
KW
435 locale = "";
436 conditional_warn_text = "";
7e1cacef 437 }
fef64ad9
KW
438
439 /* diag_listed_as: Unknown locale category %d; can't set it to %s */
7e1cacef 440 Perl_warner_nocontext(packWARN(WARN_LOCALE),
fef64ad9
KW
441 "Unknown locale category %d%s%s",
442 category, conditional_warn_text, locale);
7e1cacef 443
01c966a3
KW
444# ifdef EINVAL
445
446 SETERRNO(EINVAL, LIB_INVARG);
447
448# endif
449
7e1cacef
KW
450 /* Return an out-of-bounds value */
451 return NOMINAL_LC_ALL_INDEX + 1;
452}
453
948523db 454#endif /* ifdef USE_LOCALE */
20ad7b23
KW
455
456void
457Perl_force_locale_unlock()
458{
459
460#if defined(USE_LOCALE_THREADS)
461
462 dTHX;
da1b79a9
KW
463
464 /* If recursively locked, clear all at once */
465 if (PL_locale_mutex_depth > 1) {
466 PL_locale_mutex_depth = 1;
467 }
468
469 if (PL_locale_mutex_depth > 0) {
470 LOCALE_UNLOCK_;
471 }
20ad7b23
KW
472
473#endif
474
475}
476
827a4a05
KW
477#ifdef USE_POSIX_2008_LOCALE
478
479STATIC locale_t
480S_use_curlocale_scratch(pTHX)
481{
482 /* This function is used to hide from the caller the case where the current
483 * locale_t object in POSIX 2008 is the global one, which is illegal in
484 * many of the P2008 API calls. This checks for that and, if necessary
485 * creates a proper P2008 object. Any prior object is deleted, as is any
486 * remaining object during global destruction. */
487
488 locale_t cur = uselocale((locale_t) 0);
489
490 if (cur != LC_GLOBAL_LOCALE) {
491 return cur;
492 }
493
494 if (PL_scratch_locale_obj) {
495 freelocale(PL_scratch_locale_obj);
496 }
497
498 PL_scratch_locale_obj = duplocale(LC_GLOBAL_LOCALE);
499 return PL_scratch_locale_obj;
500}
501
502#endif
8ef6e574 503
20ad7b23
KW
504void
505Perl_locale_panic(const char * msg,
506 const char * file_name,
507 const line_t line,
508 const int errnum)
509{
510 dTHX;
511
512 PERL_ARGS_ASSERT_LOCALE_PANIC;
513
514 force_locale_unlock();
515
516#ifdef USE_C_BACKTRACE
517 dump_c_backtrace(Perl_debug_log, 20, 1);
518#endif
519
520 /* diag_listed_as: panic: %s */
521 Perl_croak(aTHX_ "%s: %d: panic: %s; errno=%d\n",
522 file_name, line, msg, errnum);
523}
524
94202fb8
KW
525#define setlocale_failure_panic_c( \
526 cat, current, failed, caller_0_line, caller_1_line) \
527 setlocale_failure_panic_i(cat##_INDEX_, current, failed, \
528 caller_0_line, caller_1_line)
529
952fa04f 530/* posix_setlocale() presents a consistent POSIX-compliant interface to
fe1c1494 531 * setlocale(). Windows requres a customized base-level setlocale() */
e9bc6d6b 532#ifdef WIN32
952fa04f 533# define posix_setlocale(cat, locale) win32_setlocale(cat, locale)
e9bc6d6b 534#else
952fa04f 535# define posix_setlocale(cat, locale) ((const char *) setlocale(cat, locale))
e9bc6d6b
KW
536#endif
537
32899cfb
KW
538/* The next layer up is to catch vagaries and bugs in the libc setlocale return
539 * value */
540#ifdef stdize_locale
541# define stdized_setlocale(cat, locale) \
952fa04f 542 stdize_locale(cat, posix_setlocale(cat, locale), \
32899cfb
KW
543 &PL_stdize_locale_buf, &PL_stdize_locale_bufsize, __LINE__)
544#else
952fa04f 545# define stdized_setlocale(cat, locale) posix_setlocale(cat, locale)
32899cfb
KW
546#endif
547
952fa04f 548/* The next many lines form a layer above the close-to-the-metal 'posix'
32899cfb
KW
549 * and 'stdized' macros. They are used to present a uniform API to the rest of
550 * the code in this file in spite of the disparate underlying implementations.
551 * */
fe1c1494 552
58b390f2
KW
553#if (! defined(USE_LOCALE_THREADS) && ! defined(USE_POSIX_2008_LOCALE)) \
554 || ( defined(WIN32) && defined(USE_THREAD_SAFE_LOCALE))
e9bc6d6b 555
fe1c1494
KW
556/* For non-threaded perls (which we are not to use the POSIX 2008 API on), or a
557 * thread-safe Windows one in which threading is invisible to us, the added
5d4670cb
KW
558 * layer just expands to the base-level functions. See the introductory
559 * comments in this file for the meaning of the suffixes '_c', '_r', '_i'. */
fe1c1494 560
50ffb02b
KW
561# define setlocale_r(cat, locale) stdized_setlocale(cat, locale)
562# define setlocale_i(i, locale) setlocale_r(categories[i], locale)
563# define setlocale_c(cat, locale) setlocale_r(cat, locale)
3980cddb 564
94202fb8
KW
565# define void_setlocale_i(i, locale) \
566 STMT_START { \
952fa04f 567 if (! posix_setlocale(categories[i], locale)) { \
94202fb8
KW
568 setlocale_failure_panic_i(i, NULL, locale, __LINE__, 0); \
569 NOT_REACHED; /* NOTREACHED */ \
570 } \
571 } STMT_END
572# define void_setlocale_c(cat, locale) \
573 void_setlocale_i(cat##_INDEX_, locale)
574# define void_setlocale_r(cat, locale) \
575 void_setlocale_i(get_category_index(cat, locale), locale)
3980cddb 576
952fa04f 577# define bool_setlocale_r(cat, locale) cBOOL(posix_setlocale(cat, locale))
32899cfb
KW
578# define bool_setlocale_i(i, locale) \
579 bool_setlocale_c(categories[i], locale)
50ffb02b 580# define bool_setlocale_c(cat, locale) bool_setlocale_r(cat, locale)
a683a66a 581
22392ba2
KW
582/* All the querylocale...() forms return a mortalized copy. If you need
583 * something stable across calls, you need to savepv() the result yourself */
584
585# define querylocale_r(cat) mortalized_pv_copy(setlocale_r(cat, NULL))
50ffb02b 586# define querylocale_c(cat) querylocale_r(cat)
a683a66a
KW
587# define querylocale_i(i) querylocale_c(categories[i])
588
58b390f2
KW
589#elif defined(USE_LOCALE_THREADS) \
590 && ! defined(USE_THREAD_SAFE_LOCALE)
591
592 /* Here, there are threads, and there is no support for thread-safe
593 * operation. This is a dangerous situation, which perl is documented as
594 * not supporting, but it arises in practice. We can do a modicum of
595 * automatic mitigation by making sure there is a per-thread return from
596 * setlocale(), and that a mutex protects it from races */
597STATIC const char *
598S_less_dicey_setlocale_r(pTHX_ const int category, const char * locale)
599{
600 const char * retval;
601
602 PERL_ARGS_ASSERT_LESS_DICEY_SETLOCALE_R;
603
604 POSIX_SETLOCALE_LOCK;
605
606 retval = stdized_setlocale(category, locale);
607
608 /* We reuse PL_stdize_locale_buf as it doesn't conflict, but the call may
609 * already have used it, in which case we don't have to do anything further
610 * */
611 if (retval != PL_stdize_locale_buf) {
612 retval = save_to_buffer(retval,
613 &PL_stdize_locale_buf, &PL_stdize_locale_bufsize);
614 }
615
616 POSIX_SETLOCALE_UNLOCK;
617
618 return retval;
619}
620
621# define setlocale_r(cat, locale) less_dicey_setlocale_r(cat, locale)
622# define setlocale_c(cat, locale) setlocale_r(cat, locale)
623# define setlocale_i(i, locale) setlocale_r(categories[i], locale)
624
625# define querylocale_r(cat) mortalized_pv_copy(setlocale_r(cat, NULL))
626# define querylocale_c(cat) querylocale_r(cat)
627# define querylocale_i(i) querylocale_r(categories[i])
628
629STATIC void
630S_less_dicey_void_setlocale_i(pTHX_ const unsigned cat_index,
631 const char * locale,
632 const line_t line)
633{
634 PERL_ARGS_ASSERT_LESS_DICEY_VOID_SETLOCALE_I;
635
636 POSIX_SETLOCALE_LOCK;
637 if (! posix_setlocale(categories[cat_index], locale)) {
638 POSIX_SETLOCALE_UNLOCK;
639 setlocale_failure_panic_i(cat_index, NULL, locale, __LINE__, line);
640 }
641 POSIX_SETLOCALE_UNLOCK;
642}
643
644# define void_setlocale_i(i, locale) \
645 less_dicey_void_setlocale_i(i, locale, __LINE__)
646# define void_setlocale_c(cat, locale) \
647 void_setlocale_i(cat##_INDEX_, locale)
648# define void_setlocale_r(cat, locale) \
649 void_setlocale_i(get_category_index(cat, locale), locale)
650
ed4b2b05
KW
651# if 0 /* Not currently used */
652
58b390f2
KW
653STATIC bool
654S_less_dicey_bool_setlocale_r(pTHX_ const int cat, const char * locale)
655{
656 bool retval;
657
658 PERL_ARGS_ASSERT_LESS_DICEY_BOOL_SETLOCALE_R;
659
660 POSIX_SETLOCALE_LOCK;
661 retval = cBOOL(posix_setlocale(cat, locale));
662 POSIX_SETLOCALE_UNLOCK;
663
664 return retval;
665}
666
ed4b2b05 667# endif
58b390f2
KW
668# define bool_setlocale_r(cat, locale) \
669 less_dicey_bool_setlocale_r(cat, locale)
670# define bool_setlocale_i(i, locale) \
671 bool_setlocale_r(categories[i], locale)
672# define bool_setlocale_c(cat, locale) bool_setlocale_r(cat, locale)
673#else
e9bc6d6b 674
fe1c1494
KW
675/* Here, there is a completely different API to get thread-safe locales. We
676 * emulate the setlocale() API with our own function(s). setlocale categories,
677 * like LC_NUMERIC, are not valid here for the POSIX 2008 API. Instead, there
678 * are equivalents, like LC_NUMERIC_MASK, which we use instead, converting to
679 * by using get_category_index() followed by table lookup. */
680
1a63cba7
KW
681# define emulate_setlocale_c(cat, locale, recalc_LC_ALL, line) \
682 emulate_setlocale_i(cat##_INDEX_, locale, recalc_LC_ALL, line)
f2780a3f 683
20be777a 684 /* A wrapper for the macros below. */
11d8dd19 685# define common_emulate_setlocale(i, locale) \
20be777a 686 emulate_setlocale_i(i, locale, YES_RECALC_LC_ALL, __LINE__)
11d8dd19
KW
687
688# define setlocale_i(i, locale) common_emulate_setlocale(i, locale)
3980cddb
KW
689# define setlocale_c(cat, locale) setlocale_i(cat##_INDEX_, locale)
690# define setlocale_r(cat, locale) \
691 setlocale_i(get_category_index(cat, locale), locale)
692
693# define void_setlocale_i(i, locale) ((void) setlocale_i(i, locale))
694# define void_setlocale_c(cat, locale) \
695 void_setlocale_i(cat##_INDEX_, locale)
696# define void_setlocale_r(cat, locale) ((void) setlocale_r(cat, locale))
697
698# define bool_setlocale_i(i, locale) cBOOL(setlocale_i(i, locale))
699# define bool_setlocale_c(cat, locale) \
700 bool_setlocale_i(cat##_INDEX_, locale)
701# define bool_setlocale_r(cat, locale) cBOOL(setlocale_r(cat, locale))
e9bc6d6b 702
22392ba2 703# define querylocale_i(i) mortalized_pv_copy(my_querylocale_i(i))
a683a66a
KW
704# define querylocale_c(cat) querylocale_i(cat##_INDEX_)
705# define querylocale_r(cat) querylocale_i(get_category_index(cat,NULL))
706
5e395a7a 707# ifdef USE_QUERYLOCALE
d92b203c
KW
708# define isSINGLE_BIT_SET(mask) isPOWER_OF_2(mask)
709
710 /* This code used to think querylocale() was valid on LC_ALL. Make sure
711 * all instances of that have been removed */
712# define QUERYLOCALE_ASSERT(index) \
713 __ASSERT_(isSINGLE_BIT_SET(category_masks[index]))
67c13e2e
KW
714# if ! defined(HAS_QUERYLOCALE) && defined(_NL_LOCALE_NAME)
715# define querylocale_l(index, locale_obj) \
d92b203c 716 (QUERYLOCALE_ASSERT(index) \
22392ba2
KW
717 mortalized_pv_copy(nl_langinfo_l( \
718 _NL_LOCALE_NAME(categories[index]), locale_obj)))
67c13e2e
KW
719# else
720# define querylocale_l(index, locale_obj) \
22392ba2
KW
721 (QUERYLOCALE_ASSERT(index) \
722 mortalized_pv_copy(querylocale(category_masks[index], locale_obj)))
67c13e2e 723# endif
434e0069 724# endif
74a90e54
KW
725# if defined(__GLIBC__) && defined(USE_LOCALE_MESSAGES)
726# define HAS_GLIBC_LC_MESSAGES_BUG
45cef8fb 727# include <libintl.h>
45cef8fb
KW
728# endif
729
c7dc1ea8 730/* A fourth array, parallel to the ones above to map from category to its
e9bc6d6b 731 * equivalent mask */
e129f2eb 732STATIC const int category_masks[] = {
e9bc6d6b
KW
733# ifdef USE_LOCALE_CTYPE
734 LC_CTYPE_MASK,
735# endif
bfcd7d51
KW
736# ifdef USE_LOCALE_NUMERIC
737 LC_NUMERIC_MASK,
738# endif
e9bc6d6b
KW
739# ifdef USE_LOCALE_COLLATE
740 LC_COLLATE_MASK,
741# endif
742# ifdef USE_LOCALE_TIME
743 LC_TIME_MASK,
744# endif
745# ifdef USE_LOCALE_MESSAGES
746 LC_MESSAGES_MASK,
747# endif
748# ifdef USE_LOCALE_MONETARY
749 LC_MONETARY_MASK,
750# endif
751# ifdef USE_LOCALE_ADDRESS
752 LC_ADDRESS_MASK,
753# endif
754# ifdef USE_LOCALE_IDENTIFICATION
755 LC_IDENTIFICATION_MASK,
756# endif
757# ifdef USE_LOCALE_MEASUREMENT
758 LC_MEASUREMENT_MASK,
759# endif
760# ifdef USE_LOCALE_PAPER
761 LC_PAPER_MASK,
762# endif
763# ifdef USE_LOCALE_TELEPHONE
764 LC_TELEPHONE_MASK,
765# endif
36dbc955
KW
766# ifdef USE_LOCALE_SYNTAX
767 LC_SYNTAX_MASK,
768# endif
769# ifdef USE_LOCALE_TOD
770 LC_TOD_MASK,
771# endif
e9bc6d6b
KW
772 /* LC_ALL can't be turned off by a Configure
773 * option, and in Posix 2008, should always be
774 * here, so compile it in unconditionally.
775 * This could catch some glitches at compile
776 * time */
7e1cacef
KW
777 LC_ALL_MASK,
778
779 /* Placeholder as a precaution if code fails to check the return of
780 * get_category_index(), which returns this element to indicate an error */
781 0
fe1c1494 782};
e9bc6d6b 783
935c09ec
KW
784# define my_querylocale_c(cat) my_querylocale_i(cat##_INDEX_)
785
e9bc6d6b 786STATIC const char *
935c09ec 787S_my_querylocale_i(pTHX_ const unsigned int index)
e9bc6d6b 788{
935c09ec
KW
789 /* This function returns the name of the locale category given by the input
790 * index into our parallel tables of them.
e9bc6d6b
KW
791 *
792 * POSIX 2008, for some sick reason, chose not to provide a method to find
935c09ec
KW
793 * the category name of a locale, discarding a basic linguistic tenet that
794 * for any object, people will create a name for it. Some vendors have
795 * created a querylocale() function to do just that. This function is a
796 * lot simpler to implement on systems that have this. Otherwise, we have
797 * to keep track of what the locale has been set to, so that we can return
798 * its name so as to emulate setlocale(). It's also possible for C code in
799 * some library to change the locale without us knowing it, though as of
e9bc6d6b
KW
800 * September 2017, there are no occurrences in CPAN of uselocale(). Some
801 * libraries do use setlocale(), but that changes the global locale, and
935c09ec 802 * threads using per-thread locales will just ignore those changes. */
e9bc6d6b 803
4bcc858c 804 int category;
935c09ec 805 const locale_t cur_obj = uselocale((locale_t) 0);
d92b203c 806 const char * retval;
e9bc6d6b 807
935c09ec 808 PERL_ARGS_ASSERT_MY_QUERYLOCALE_I;
4bcc858c 809 assert(index <= NOMINAL_LC_ALL_INDEX);
e9bc6d6b 810
4bcc858c 811 category = categories[index];
e9bc6d6b 812
b26541a9
KW
813 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "my_querylocale_i(%s) on %p\n",
814 category_names[index], cur_obj));
1118ec15
KW
815 if (cur_obj == LC_GLOBAL_LOCALE) {
816 POSIX_SETLOCALE_LOCK;
952fa04f 817 retval = posix_setlocale(category, NULL);
1118ec15
KW
818 POSIX_SETLOCALE_UNLOCK;
819 }
d92b203c 820 else {
e9bc6d6b 821
67c13e2e 822# ifdef USE_QUERYLOCALE
e9bc6d6b 823
d92b203c
KW
824 /* We don't currently keep records when there is querylocale(), so have
825 * to get it anew each time */
826 retval = (index == LC_ALL_INDEX_)
827 ? calculate_LC_ALL(cur_obj)
828 : querylocale_l(index, cur_obj);
e9bc6d6b 829
d2b24094 830# else
ecdda939 831
fe3d11ab
KW
832 /* But we do have up-to-date values when we keep our own records
833 * (except some times in initialization, where we get the value from
834 * the system. */
835 if (PL_curlocales[index] == NULL) {
836 retval = stdized_setlocale(category, NULL);
837 PL_curlocales[index] = savepv(retval);
838 }
839 else {
840 retval = PL_curlocales[index];
841 }
e9bc6d6b 842
d92b203c 843# endif
e9bc6d6b 844
e9bc6d6b
KW
845 }
846
94d521c0 847 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9
KW
848 "my_querylocale_i(%s) returning '%s'\n",
849 category_names[index], retval));
c8aea264 850 assert(strNE(retval, ""));
d92b203c 851 return retval;
935c09ec
KW
852}
853
11d8dd19
KW
854# ifdef USE_PL_CURLOCALES
855
856STATIC const char *
857S_update_PL_curlocales_i(pTHX_
858 const unsigned int index,
859 const char * new_locale,
20be777a 860 recalc_lc_all_t recalc_LC_ALL)
11d8dd19
KW
861{
862 /* This is a helper function for emulate_setlocale_i(), mostly used to
863 * make that function easier to read. */
864
865 PERL_ARGS_ASSERT_UPDATE_PL_CURLOCALES_I;
866 assert(index <= NOMINAL_LC_ALL_INDEX);
867
868 if (index == LC_ALL_INDEX_) {
869 unsigned int i;
870
871 /* For LC_ALL, we change all individual categories to correspond */
872 /* PL_curlocales is a parallel array, so has same
873 * length as 'categories' */
874 for (i = 0; i < LC_ALL_INDEX_; i++) {
875 Safefree(PL_curlocales[i]);
876 PL_curlocales[i] = savepv(new_locale);
877 }
878
20be777a 879 recalc_LC_ALL = YES_RECALC_LC_ALL;
11d8dd19
KW
880 }
881 else {
882
883 /* Update the single category's record */
884 Safefree(PL_curlocales[index]);
885 PL_curlocales[index] = savepv(new_locale);
886
20be777a
KW
887 if (recalc_LC_ALL == RECALCULATE_LC_ALL_ON_FINAL_INTERATION) {
888 recalc_LC_ALL = (index == NOMINAL_LC_ALL_INDEX - 1)
889 ? YES_RECALC_LC_ALL
890 : DONT_RECALC_LC_ALL;
11d8dd19
KW
891 }
892 }
893
20be777a 894 if (recalc_LC_ALL == YES_RECALC_LC_ALL) {
11d8dd19
KW
895 Safefree(PL_curlocales[LC_ALL_INDEX_]);
896 PL_curlocales[LC_ALL_INDEX_] =
897 savepv(calculate_LC_ALL(PL_curlocales));
898 }
899
900 return PL_curlocales[index];
901}
f2780a3f 902
0d17a9c8
KW
903# endif /* Need PL_curlocales[] */
904
f2780a3f 905STATIC const char *
1a63cba7 906S_setlocale_from_aggregate_LC_ALL(pTHX_ const char * locale, const line_t line)
f2780a3f
KW
907{
908 /* This function parses the value of the LC_ALL locale, assuming glibc
909 * syntax, and sets each individual category on the system to the proper
910 * value.
911 *
912 * This is likely to only ever be called from one place, so exists to make
913 * the calling function easier to read by moving this ancillary code out of
914 * the main line.
915 *
916 * The locale for each category is independent of the other categories.
917 * Often, they are all the same, but certainly not always. Perl, in fact,
918 * usually keeps LC_NUMERIC in the C locale, regardless of the underlying
919 * locale. LC_ALL has to be able to represent the case of when there are
920 * varying locales. Platforms have differing ways of representing this.
921 * Because of this, the code in this file goes to lengths to avoid the
922 * issue, generally looping over the component categories instead of
923 * referring to them in the aggregate, wherever possible. However, there
924 * are cases where we have to parse our own constructed aggregates, which use
925 * the glibc syntax. */
926
22392ba2 927 const char * locale_on_entry = querylocale_c(LC_ALL);
f2780a3f
KW
928
929 PERL_ARGS_ASSERT_SETLOCALE_FROM_AGGREGATE_LC_ALL;
930
931 /* If the string that gives what to set doesn't include all categories,
932 * the omitted ones get set to "C". To get this behavior, first set
933 * all the individual categories to "C", and override the furnished
11d8dd19
KW
934 * ones below. FALSE => No need to recalculate LC_ALL, as this is a
935 * temporary state */
20be777a 936 if (! emulate_setlocale_c(LC_ALL, "C", DONT_RECALC_LC_ALL, line)) {
0998f1fd 937 setlocale_failure_panic_c(LC_ALL, locale_on_entry,
1a63cba7 938 "C", __LINE__, line);
0998f1fd 939 NOT_REACHED; /* NOTREACHED */
f2780a3f
KW
940 }
941
2ee51ed7
KW
942 const char * s = locale;
943 const char * e = locale + strlen(locale);
f2780a3f 944 while (s < e) {
2ee51ed7 945 const char * p = s;
f2780a3f
KW
946
947 /* Parse through the category */
948 while (isWORDCHAR(*p)) {
949 p++;
950 }
2ee51ed7
KW
951
952 const char * category_end = p;
f2780a3f
KW
953
954 if (*p++ != '=') {
20ad7b23
KW
955 locale_panic_(Perl_form(aTHX_
956 "Unexpected character in locale category name '%02X", *(p-1)));
f2780a3f
KW
957 }
958
959 /* Parse through the locale name */
2ee51ed7 960 const char * name_start = p;
f2780a3f
KW
961 while (p < e && *p != ';') {
962 if (! isGRAPH(*p)) {
20ad7b23
KW
963 locale_panic_(Perl_form(aTHX_
964 "Unexpected character in locale name '%02X", *p));
f2780a3f
KW
965 }
966 p++;
967 }
2ee51ed7
KW
968
969 const char * name_end = p;
f2780a3f
KW
970
971 /* Space past the semi-colon */
972 if (p < e) {
973 p++;
974 }
975
976 /* Find the index of the category name in our lists */
2ee51ed7 977 for (PERL_UINT_FAST8_T i = 0; i < LC_ALL_INDEX_; i++) {
f2780a3f 978
2ee51ed7
KW
979 /* Keep going if this index doesn't point to the category being
980 * parsed. The strnNE() avoids a Perl_form(), but would fail if
981 * ever a category name could be a substring of another one, e.g.,
982 * if there were a "LC_TIME_DATE" */
f2780a3f
KW
983 if strnNE(s, category_names[i], category_end - s) {
984 continue;
985 }
986
2ee51ed7
KW
987 /* Here i points to the category being parsed. Now isolate the
988 * locale it is being changed to */
989 const char * individ_locale = Perl_form(aTHX_ "%.*s",
f2780a3f 990 (int) (name_end - name_start), name_start);
2ee51ed7 991
11d8dd19
KW
992 /* And do the change. FALSE => Don't recalculate LC_ALL; we'll do
993 * it ourselves after the loop */
20be777a
KW
994 if (! emulate_setlocale_i(i, individ_locale,
995 DONT_RECALC_LC_ALL, line))
996 {
2ee51ed7 997
11d8dd19 998 /* But if we have to back out, do fix up LC_ALL */
20be777a
KW
999 if (! emulate_setlocale_c(LC_ALL, locale_on_entry,
1000 YES_RECALC_LC_ALL, line))
1a63cba7 1001 {
f2780a3f 1002 setlocale_failure_panic_i(i, individ_locale,
1a63cba7 1003 locale, __LINE__, line);
f2780a3f
KW
1004 NOT_REACHED; /* NOTREACHED */
1005 }
2ee51ed7 1006
2ee51ed7
KW
1007 /* Reverting to the entry value succeeded, but the operation
1008 * failed to go to the requested locale. */
f2780a3f
KW
1009 return NULL;
1010 }
1011
2ee51ed7
KW
1012 /* Found and handled the desired category. Quit the inner loop to
1013 * try the next category */
f2780a3f
KW
1014 break;
1015 }
1016
2ee51ed7 1017 /* Finished with this category; iterate to the next one in the input */
f2780a3f
KW
1018 s = p;
1019 }
1020
0d17a9c8
KW
1021# ifdef USE_PL_CURLOCALES
1022
2ee51ed7
KW
1023 /* Here we have set all the individual categories. Update the LC_ALL entry
1024 * as well. We can't just use the input 'locale' as the value may omit
1025 * categories whose locale is 'C'. khw thinks it's better to store a
1026 * complete LC_ALL. So calculate it. */
1027 const char * retval = savepv(calculate_LC_ALL(PL_curlocales));
f2780a3f
KW
1028 Safefree(PL_curlocales[LC_ALL_INDEX_]);
1029 PL_curlocales[LC_ALL_INDEX_] = retval;
1030
0d17a9c8
KW
1031# else
1032
1033 const char * retval = querylocale_c(LC_ALL);
1034
1035# endif
1036
f2780a3f
KW
1037 return retval;
1038}
1039
0d17a9c8
KW
1040# ifndef USE_QUERYLOCALE
1041
9444acab
KW
1042STATIC const char *
1043S_find_locale_from_environment(pTHX_ const unsigned int index)
1044{
1045 /* On systems without querylocale(), it is problematic getting the results
1046 * of the POSIX 2008 equivalent of setlocale(category, "") (which gets the
1047 * locale from the environment).
1048 *
1049 * To ensure that we know exactly what those values are, we do the setting
1050 * ourselves, using the documented algorithm (assuming the documentation is
1051 * correct) rather than use "" as the locale. This will lead to results
1052 * that differ from native behavior if the native behavior differs from the
1053 * standard documented value, but khw believes it is better to know what's
1054 * going on, even if different from native, than to just guess.
1055 *
1056 * Another option would be, in a critical section, to save the global
1057 * locale's current value, and do a straight setlocale(LC_ALL, ""). That
1058 * would return our desired values, destroying the global locale's, which
1059 * we would then restore. But that could cause races with any other thread
1060 * that is using the global locale and isn't using the mutex. And, the
1061 * only reason someone would have done that is because they are calling a
1062 * library function, like in gtk, that calls setlocale(), and which can't
1063 * be changed to use the mutex. That wouldn't be a problem if this were to
1064 * be done before any threads had switched, say during perl construction
1065 * time. But this code would still be needed for the general case. */
1066
1067 const char * default_name;
1068 unsigned int i;
1069 const char * locale_names[LC_ALL_INDEX_];
1070
1071 /* We rely on PerlEnv_getenv() returning a mortalized copy */
1072 const char * const lc_all = PerlEnv_getenv("LC_ALL");
1073
1074 /* Use any "LC_ALL" environment variable, as it overrides everything
1075 * else. */
1076 if (lc_all && strNE(lc_all, "")) {
1077 return lc_all;
1078 }
1079
1080 /* Otherwise, we need to dig deeper. Unless overridden, the default is
1081 * the LANG environment variable; "C" if it doesn't exist. */
1082 default_name = PerlEnv_getenv("LANG");
1083 if (! default_name || strEQ(default_name, "")) {
1084 default_name = "C";
1085 }
1086
1087 /* If setting an individual category, use its corresponding value found in
1088 * the environment, if any; otherwise use the default we already
1089 * calculated. */
1090 if (index != LC_ALL_INDEX_) {
1091 const char * const new_value = PerlEnv_getenv(category_names[index]);
1092
1093 return (new_value && strNE(new_value, ""))
1094 ? new_value
1095 : default_name;
1096 }
1097
1098 /* Here, we are getting LC_ALL. Any categories that don't have a
1099 * corresponding environment variable set should be set to 'default_name'
1100 *
1101 * Simply find the values for all categories, and call the function to
1102 * compute LC_ALL. */
1103 for (i = 0; i < LC_ALL_INDEX_; i++) {
1104 const char * const env_override = PerlEnv_getenv(category_names[i]);
1105
1106 locale_names[i] = (env_override && strNE(env_override, ""))
1107 ? env_override
1108 : default_name;
1109
1110 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9
KW
1111 "find_locale_from_environment i=%d, name=%s, locale=%s\n",
1112 i, category_names[i], locale_names[i]));
9444acab
KW
1113 }
1114
1115 return calculate_LC_ALL(locale_names);
1116}
1117
0d17a9c8 1118# endif
f2780a3f 1119
935c09ec 1120STATIC const char *
11d8dd19 1121S_emulate_setlocale_i(pTHX_
20be777a
KW
1122
1123 /* Our internal index of the 'category' setlocale is
1124 called with */
1125 const unsigned int index,
1126
1127 const char * new_locale, /* The locale to set the category to */
1128 const recalc_lc_all_t recalc_LC_ALL, /* Explained below */
1129 const line_t line /* Called from this line number */
1130 )
935c09ec 1131{
66a28734
KW
1132 PERL_ARGS_ASSERT_EMULATE_SETLOCALE_I;
1133 assert(index <= NOMINAL_LC_ALL_INDEX);
1134
935c09ec
KW
1135 /* This function effectively performs a setlocale() on just the current
1136 * thread; thus it is thread-safe. It does this by using the POSIX 2008
1137 * locale functions to emulate the behavior of setlocale(). Similar to
1138 * regular setlocale(), the return from this function points to memory that
1139 * can be overwritten by other system calls, so needs to be copied
1140 * immediately if you need to retain it. The difference here is that
1141 * system calls besides another setlocale() can overwrite it.
1142 *
1143 * By doing this, most locale-sensitive functions become thread-safe. The
1144 * exceptions are mostly those that return a pointer to static memory.
1145 *
20be777a
KW
1146 * This function may be called in a tight loop that iterates over all
1147 * categories. Because LC_ALL is not a "real" category, but merely the sum
1148 * of all the other ones, such loops don't include LC_ALL. On systems that
1149 * have querylocale() or similar, the current LC_ALL value is immediately
1150 * retrievable; on systems lacking that feature, we have to keep track of
1151 * LC_ALL ourselves. We could do that on each iteration, only to throw it
1152 * away on the next, but the calculation is more than a trivial amount of
1153 * work. Instead, the 'recalc_LC_ALL' parameter is set to
1154 * RECALCULATE_LC_ALL_ON_FINAL_INTERATION to only do the calculation once.
1155 * This function calls itself recursively in such a loop.
11d8dd19 1156 *
20be777a
KW
1157 * When not in such a loop, the parameter is set to the other enum values
1158 * DONT_RECALC_LC_ALL or YES_RECALC_LC_ALL. */
935c09ec 1159
66a28734
KW
1160 int mask = category_masks[index];
1161 const locale_t entry_obj = uselocale((locale_t) 0);
1162 const char * locale_on_entry = querylocale_i(index);
935c09ec
KW
1163
1164 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
66a28734
KW
1165 "emulate_setlocale_i input=%d (%s), mask=0x%x,"
1166 " new locale=\"%s\", current locale=\"%s\","
1167 "index=%d, object=%p\n",
8544fe16 1168 categories[index], category_names[index], mask,
66a28734
KW
1169 ((new_locale == NULL) ? "(nil)" : new_locale),
1170 locale_on_entry, index, entry_obj));
1171
1172 /* Return the already-calculated info if just querying what the existing
1173 * locale is */
9ec822ee 1174 if (new_locale == NULL) {
66a28734 1175 return locale_on_entry;
935c09ec 1176 }
e9bc6d6b 1177
19e550d7
KW
1178 /* Here, trying to change the locale, but it is a no-op if the new boss is
1179 * the same as the old boss. Except this routine is called when converting
1180 * from the global locale, so in that case we will create a per-thread
1181 * locale below (with the current values). Bitter experience also
1182 * indicates that newlocale() can free up the basis locale memory if we
1183 * call it with the new and old being the same. */
1184 if ( entry_obj != LC_GLOBAL_LOCALE
1185 && locale_on_entry
1186 && strEQ(new_locale, locale_on_entry))
1187 {
1188 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1189 "(%" LINE_Tf "): emulate_setlocale_i"
1190 " no-op to change to what it already was\n",
1191 line));
42192b39
KW
1192
1193# ifdef USE_PL_CURLOCALES
1194
1195 /* On the final iteration of a loop that needs to recalculate LC_ALL, do
1196 * so. If no iteration changed anything, LC_ALL also doesn't change,
1197 * but khw believes the complexity needed to keep track of that isn't
1198 * worth it. */
1199 if (UNLIKELY( recalc_LC_ALL == RECALCULATE_LC_ALL_ON_FINAL_INTERATION
1200 && index == NOMINAL_LC_ALL_INDEX - 1))
1201 {
1202 Safefree(PL_curlocales[LC_ALL_INDEX_]);
1203 PL_curlocales[LC_ALL_INDEX_] =
1204 savepv(calculate_LC_ALL(PL_curlocales));
1205 }
1206
1207# endif
1208
19e550d7
KW
1209 return locale_on_entry;
1210 }
1211
67c13e2e 1212# ifndef USE_QUERYLOCALE
e9bc6d6b 1213
0d17a9c8
KW
1214 /* Without a querylocale() mechanism, we have to figure out ourselves what
1215 * happens with setting a locale to "" */
9ec822ee 1216 if (strEQ(new_locale, "")) {
9444acab
KW
1217 new_locale = find_locale_from_environment(index);
1218 }
c61e7ca7 1219
0d17a9c8
KW
1220# endif
1221
1222 /* So far, it has worked that a semi-colon in the locale name means that
1223 * the category is LC_ALL and it subsumes categories which don't all have
1224 * the same locale. This is the glibc syntax. */
c61e7ca7 1225 if (strchr(new_locale, ';')) {
0d17a9c8 1226 assert(index == LC_ALL_INDEX_);
1a63cba7 1227 return setlocale_from_aggregate_LC_ALL(new_locale, line);
f2780a3f 1228 }
e9bc6d6b 1229
74a90e54
KW
1230# ifdef HAS_GLIBC_LC_MESSAGES_BUG
1231
1232 /* For this bug, if the LC_MESSAGES locale changes, we have to do an
1233 * expensive workaround. Save the current value so we can later determine
1234 * if it changed. */
66a28734 1235 const char * old_messages_locale = NULL;
74a90e54
KW
1236 if ( (index == LC_MESSAGES_INDEX_ || index == LC_ALL_INDEX_)
1237 && LIKELY(PL_phase != PERL_PHASE_CONSTRUCT))
1238 {
22392ba2 1239 old_messages_locale = querylocale_c(LC_MESSAGES);
74a90e54
KW
1240 }
1241
74a90e54 1242# endif
e9bc6d6b 1243
f8115d60
KW
1244 assert(PL_C_locale_obj);
1245
2ee51ed7
KW
1246 /* Now ready to switch to the input 'new_locale' */
1247
f8115d60 1248 /* Switching locales generally entails freeing the current one's space (at
fe1c1494
KW
1249 * the C library's discretion), hence we can't be using that locale at the
1250 * time of the switch (this wasn't obvious to khw from the man pages). So
c2c9cf19
KW
1251 * switch to a known locale object that we don't otherwise mess with. */
1252 if (! uselocale(PL_C_locale_obj)) {
f8115d60 1253
947b444c
KW
1254 /* Not being able to change to the C locale is severe; don't keep
1255 * going. */
1256 setlocale_failure_panic_i(index, locale_on_entry, "C", __LINE__, line);
1257 NOT_REACHED; /* NOTREACHED */
f8115d60
KW
1258 }
1259
94d521c0 1260 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9
KW
1261 "(%" LINE_Tf "): emulate_setlocale_i now using C"
1262 " object=%p\n", line, PL_C_locale_obj));
f8115d60 1263
66a28734
KW
1264 locale_t new_obj;
1265
2ee51ed7
KW
1266 /* We created a (never changing) object at start-up for LC_ALL being in the
1267 * C locale. If this call is to switch to LC_ALL=>C, simply use that
1268 * object. But in fact, we already have switched to it just above, in
1269 * preparation for the general case. Since we're already there, no need to
1270 * do further switching. */
9ec822ee 1271 if (mask == LC_ALL_MASK && isNAME_C_OR_POSIX(new_locale)) {
b26541a9
KW
1272 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "(%" LINE_Tf "):"
1273 " emulate_setlocale_i will stay"
1274 " in C object\n", line));
70bd6bc8
KW
1275 new_obj = PL_C_locale_obj;
1276
2ee51ed7 1277 /* And free the old object if it isn't a special one */
c2c9cf19
KW
1278 if (entry_obj != LC_GLOBAL_LOCALE && entry_obj != PL_C_locale_obj) {
1279 freelocale(entry_obj);
70bd6bc8
KW
1280 }
1281 }
2ee51ed7 1282 else { /* Here is the general case, not to LC_ALL=>C */
43c2ca2c 1283 locale_t basis_obj = entry_obj;
5ce9bebe 1284
43c2ca2c
KW
1285 /* Specially handle two objects */
1286 if (basis_obj == LC_GLOBAL_LOCALE || basis_obj == PL_C_locale_obj) {
1287
1288 /* For these two objects, we make duplicates to hand to newlocale()
1289 * below. For LC_GLOBAL_LOCALE, this is because newlocale()
1290 * doesn't necessarily accept it as input (the results are
1291 * undefined). For PL_C_locale_obj, it is so that it never gets
1292 * modified, as otherwise newlocale() is free to do so */
1293 basis_obj = duplocale(entry_obj);
1294 if (! basis_obj) {
b5016ef4
KW
1295 locale_panic_(Perl_form(aTHX_ "(%" LINE_Tf "): duplocale failed",
1296 line));
43c2ca2c 1297 NOT_REACHED; /* NOTREACHED */
5ce9bebe 1298 }
43c2ca2c
KW
1299
1300 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1301 "(%" LINE_Tf "): emulate_setlocale_i"
1302 " created %p by duping the input\n",
1303 line, basis_obj));
b22e9937 1304 }
f8115d60 1305
b22e9937 1306 /* Ready to create a new locale by modification of the exising one */
43c2ca2c 1307 new_obj = newlocale(mask, new_locale, basis_obj);
e9bc6d6b 1308
b22e9937 1309 if (! new_obj) {
94d521c0 1310 DEBUG_L(PerlIO_printf(Perl_debug_log,
b26541a9 1311 " (%" LINE_Tf "): emulate_setlocale_i"
43c2ca2c
KW
1312 " creating new object from %p failed:"
1313 " errno=%d\n",
1314 line, basis_obj, GET_ERRNO));
e9bc6d6b 1315
2ee51ed7
KW
1316 /* Failed. Likely this is because the proposed new locale isn't
1317 * valid on this system. But we earlier switched to the LC_ALL=>C
1318 * locale in anticipation of it succeeding, Now have to switch
1319 * back to the state upon entry */
43c2ca2c 1320 if (! uselocale(entry_obj)) {
947b444c
KW
1321 setlocale_failure_panic_i(index, "switching back to",
1322 locale_on_entry, __LINE__, line);
1323 NOT_REACHED; /* NOTREACHED */
b22e9937 1324 }
5ce9bebe
KW
1325
1326# ifdef USE_PL_CURLOCALES
1327
c2c9cf19 1328 if (entry_obj == LC_GLOBAL_LOCALE) {
1118ec15 1329
5ce9bebe
KW
1330 /* Here, we are back in the global locale. We may never have
1331 * set PL_curlocales. If the locale change had succeeded, the
1332 * code would have then set them up, but since it didn't, do so
1333 * here. khw isn't sure if this prevents some issues or not,
1334 * but tis is defensive coding. The system setlocale() returns
f14a9aff 1335 * the desired information. This will calculate LC_ALL's entry
20be777a 1336 * only on the final iteration */
1118ec15 1337 POSIX_SETLOCALE_LOCK;
f14a9aff 1338 for (PERL_UINT_FAST8_T i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
5ce9bebe 1339 update_PL_curlocales_i(i,
952fa04f 1340 posix_setlocale(categories[i], NULL),
20be777a 1341 RECALCULATE_LC_ALL_ON_FINAL_INTERATION);
5ce9bebe 1342 }
1118ec15 1343 POSIX_SETLOCALE_UNLOCK;
5ce9bebe
KW
1344 }
1345# endif
1346
b22e9937 1347 return NULL;
e9bc6d6b 1348 }
e9bc6d6b 1349
43c2ca2c
KW
1350 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1351 "(%" LINE_Tf "): emulate_setlocale_i created %p"
1352 " while freeing %p\n", line, new_obj, basis_obj));
e9bc6d6b 1353
2ee51ed7
KW
1354 /* Here, successfully created an object representing the desired
1355 * locale; now switch into it */
b22e9937 1356 if (! uselocale(new_obj)) {
b22e9937 1357 freelocale(new_obj);
947b444c
KW
1358 locale_panic_(Perl_form(aTHX_ "(%" LINE_Tf "): emulate_setlocale_i"
1359 " switching into new locale failed",
1360 line));
e9bc6d6b 1361 }
70bd6bc8 1362 }
e9bc6d6b 1363
9ec822ee 1364 /* Here, we are using 'new_obj' which matches the input 'new_locale'. */
94d521c0 1365 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 1366 "(%" LINE_Tf "): emulate_setlocale_i now using %p\n", line, new_obj));
e9bc6d6b
KW
1367
1368 /* We are done, except for updating our records (if the system doesn't keep
1369 * them) and in the case of locale "", we don't actually know what the
1370 * locale that got switched to is, as it came from the environment. So
1371 * have to find it */
1372
67c13e2e 1373# ifdef USE_QUERYLOCALE
e9bc6d6b 1374
9ec822ee
KW
1375 if (strEQ(new_locale, "")) {
1376 new_locale = querylocale_i(index);
e9bc6d6b
KW
1377 }
1378
11d8dd19 1379 PERL_UNUSED_ARG(recalc_LC_ALL);
e9bc6d6b 1380
11d8dd19 1381# else
d92b203c 1382
11d8dd19 1383 new_locale = update_PL_curlocales_i(index, new_locale, recalc_LC_ALL);
74a90e54
KW
1384
1385# endif
1386# ifdef HAS_GLIBC_LC_MESSAGES_BUG
1387
1388 /* Invalidate the glibc cache of loaded translations if the locale has changed,
1389 * see [perl #134264] */
1390 if (old_messages_locale) {
1391 if (strNE(old_messages_locale, my_querylocale_c(LC_MESSAGES))) {
1392 textdomain(textdomain(NULL));
1393 }
e9bc6d6b
KW
1394 }
1395
1396# endif
1397
9ec822ee 1398 return new_locale;
e9bc6d6b
KW
1399}
1400
3980cddb
KW
1401#endif /* End of the various implementations of the setlocale and
1402 querylocale macros used in the remainder of this program */
e9bc6d6b 1403
d36adde0 1404#ifdef USE_LOCALE
837ce802 1405
32899cfb
KW
1406/* So far, the locale strings returned by modern 2008-compliant systems have
1407 * been fine */
1408
1409STATIC const char *
1410S_stdize_locale(pTHX_ const int category,
1411 const char *input_locale,
1412 const char **buf,
1413 Size_t *buf_size,
1414 const line_t caller_line)
54ddc3dd 1415{
32899cfb
KW
1416 /* The return value of setlocale() is opaque, but is required to be usable
1417 * as input to a future setlocale() to create the same state.
1418 * Unfortunately not all systems are compliant. But most often they are of
1419 * a very restricted set of forms that this file has been coded to expect.
54ddc3dd 1420 *
32899cfb 1421 * There are some outliers, though, that this function tries to tame:
54ddc3dd 1422 *
32899cfb
KW
1423 * 1) A new-line. This function chomps any \n characters
1424 * 2) foo=bar. 'bar' is what is generally meant, and the foo= part is
1425 * stripped. This form is legal for LC_ALL. When found in
1426 * that category group, the function calls itself
1427 * recursively on each possible component category to make
1428 * sure the individual categories are ok.
1429 *
1430 * If no changes to the input were made, it is returned; otherwise the
1431 * changed version is stored into memory at *buf, with *buf_size set to its
1432 * new value, and *buf is returned.
1433 */
54ddc3dd 1434
32899cfb
KW
1435 const char * first_bad;
1436 const char * retval;
54ddc3dd
KW
1437
1438 PERL_ARGS_ASSERT_STDIZE_LOCALE;
1439
32899cfb
KW
1440 if (input_locale == NULL) {
1441 return NULL;
1442 }
1443
1444 first_bad = strpbrk(input_locale, "=\n");
1445
1446 /* Most likely, there isn't a problem with the input */
1447 if (LIKELY(! first_bad)) {
1448 return input_locale;
1449 }
1450
1451# ifdef LC_ALL
1452
1453 /* But if there is, and the category is LC_ALL, we have to look at each
1454 * component category */
1455 if (category == LC_ALL) {
1456 const char * individ_locales[LC_ALL_INDEX_];
1457 bool made_changes = FALSE;
1458 unsigned int i;
1459
1460 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
1461 Size_t this_size = 0;
1462 individ_locales[i] = stdize_locale(categories[i],
952fa04f
KW
1463 posix_setlocale(categories[i],
1464 NULL),
32899cfb
KW
1465 &individ_locales[i],
1466 &this_size,
1467 caller_line);
1468
1469 /* If the size didn't change, it means this category did not have
1470 * to be adjusted, and individ_locales[i] points to the buffer
952fa04f 1471 * returned by posix_setlocale(); we have to copy that before
32899cfb
KW
1472 * it's called again in the next iteration */
1473 if (this_size == 0) {
1474 individ_locales[i] = savepv(individ_locales[i]);
54ddc3dd 1475 }
32899cfb
KW
1476 else {
1477 made_changes = TRUE;
1478 }
1479 }
1480
1481 /* If all the individual categories were ok as-is, this was a false
1482 * alarm. We must have seen an '=' which was a legal occurrence in
1483 * this combination locale */
1484 if (! made_changes) {
1485 retval = input_locale; /* The input can be returned unchanged */
1486 }
1487 else {
0b8fa716 1488 retval = save_to_buffer(querylocale_c(LC_ALL), buf, buf_size);
32899cfb
KW
1489 }
1490
1491 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
1492 Safefree(individ_locales[i]);
54ddc3dd 1493 }
32899cfb
KW
1494
1495 return retval;
54ddc3dd
KW
1496 }
1497
c117e8b0
KW
1498# else /* else no LC_ALL */
1499
1500 PERL_UNUSED_ARG(category);
1501 PERL_UNUSED_ARG(caller_line);
1502
32899cfb
KW
1503# endif
1504
1505 /* Here, there was a problem in an individual category. This means that at
1506 * least one adjustment will be necessary. Create a modifiable copy */
0b8fa716 1507 retval = save_to_buffer(input_locale, buf, buf_size);
32899cfb
KW
1508
1509 if (*first_bad != '=') {
54ddc3dd 1510
32899cfb
KW
1511 /* Translate the found position into terms of the copy */
1512 first_bad = retval + (first_bad - input_locale);
1513 }
1514 else { /* An '=' */
1515
1516 /* It is unlikely that the return is so screwed-up that it contains
1517 * multiple equals signs, but handle that case by stripping all of
1518 * them. */
1519 const char * final_equals = strrchr(retval, '=');
1520
1521 /* The length passed here causes the move to include the terminating
1522 * NUL */
1523 Move(final_equals + 1, retval, strlen(final_equals), char);
1524
1525 /* See if there are additional problems; if not, we're good to return.
1526 * */
1527 first_bad = strpbrk(retval, "\n");
1528
1529 if (! first_bad) {
1530 return retval;
1531 }
1532 }
1533
1534 /* Here, the problem must be a \n. Get rid of it and what follows.
1535 * (Originally, only a trailing \n was stripped. Unsure what to do if not
1536 * trailing) */
1537 *((char *) first_bad) = '\0';
1538 return retval;
54ddc3dd
KW
1539}
1540
2855c76a 1541#if defined(USE_POSIX_2008_LOCALE) || ! defined(LC_ALL)
32899cfb 1542
d92b203c
KW
1543STATIC
1544const char *
1545
1546# ifdef USE_QUERYLOCALE
1547S_calculate_LC_ALL(pTHX_ const locale_t cur_obj)
1548# else
1549S_calculate_LC_ALL(pTHX_ const char ** individ_locales)
1550# endif
1551
1552{
1553 /* For POSIX 2008, we have to figure out LC_ALL ourselves when needed.
1554 * querylocale(), on systems that have it, doesn't tend to work for LC_ALL.
1555 * So we have to construct the answer ourselves based on the passed in
1556 * data, which is either a locale_t object, for systems with querylocale(),
1557 * or an array we keep updated to the proper values, otherwise.
1558 *
1559 * This returns a mortalized string containing the locale name(s) of
1560 * LC_ALL.
1561 *
1562 * If all individual categories are the same locale, we can just set LC_ALL
1563 * to that locale. But if not, we have to create an aggregation of all the
1564 * categories on the system. Platforms differ as to the syntax they use
1565 * for these non-uniform locales for LC_ALL. Some use a '/' or other
1566 * delimiter of the locales with a predetermined order of categories; a
1567 * Configure probe would be needed to tell us how to decipher those. glibc
1568 * uses a series of name=value pairs, like
1569 * LC_NUMERIC=C;LC_TIME=en_US.UTF-8;...
1570 * The syntax we use for our aggregation doesn't much matter, as we take
1571 * care not to use the native setlocale() function on whatever style is
1572 * chosen. But, it would be possible for someone to call Perl_setlocale()
1573 * using a native style we don't understand. So far no one has complained.
1574 *
1575 * For systems that have categories we don't know about, the algorithm
1576 * below won't know about those missing categories, leading to potential
1577 * bugs for code that looks at them. If there is an environment variable
1578 * that sets that category, we won't know to look for it, and so our use of
1579 * LANG or "C" improperly overrides it. On the other hand, if we don't do
1580 * what is done here, and there is no environment variable, the category's
1581 * locale should be set to LANG or "C". So there is no good solution. khw
1582 * thinks the best is to make sure we have a complete list of possible
1583 * categories, adding new ones as they show up on obscure platforms.
1584 */
1585
1586 unsigned int i;
1587 Size_t names_len = 0;
1588 bool are_all_categories_the_same_locale = TRUE;
1589 char * aggregate_locale;
1590 char * previous_start = NULL;
36540e72 1591 char * this_start = NULL;
d92b203c
KW
1592 Size_t entry_len = 0;
1593
1594 PERL_ARGS_ASSERT_CALCULATE_LC_ALL;
1595
1596 /* First calculate the needed size for the string listing the categories
1597 * and their locales. */
2855c76a 1598 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
d92b203c
KW
1599
1600# ifdef USE_QUERYLOCALE
1601 const char * entry = querylocale_l(i, cur_obj);
1602# else
1603 const char * entry = individ_locales[i];
1604# endif
1605
d92b203c
KW
1606 names_len += strlen(category_names[i])
1607 + 1 /* '=' */
1608 + strlen(entry)
1609 + 1; /* ';' */
1610 }
1611
1612 names_len++; /* Trailing '\0' */
1613
1614 /* Allocate enough space for the aggregated string */
1615 SAVEFREEPV(Newxz(aggregate_locale, names_len, char));
1616
1617 /* Then fill it in */
2855c76a 1618 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
d92b203c
KW
1619 Size_t new_len;
1620
1621# ifdef USE_QUERYLOCALE
1622 const char * entry = querylocale_l(i, cur_obj);
1623# else
1624 const char * entry = individ_locales[i];
1625# endif
1626
d92b203c
KW
1627 new_len = my_strlcat(aggregate_locale, category_names[i], names_len);
1628 assert(new_len <= names_len);
1629 new_len = my_strlcat(aggregate_locale, "=", names_len);
1630 assert(new_len <= names_len);
1631
1632 this_start = aggregate_locale + strlen(aggregate_locale);
1633 entry_len = strlen(entry);
1634
1635 new_len = my_strlcat(aggregate_locale, entry, names_len);
1636 assert(new_len <= names_len);
1637 new_len = my_strlcat(aggregate_locale, ";", names_len);
1638 assert(new_len <= names_len);
1639 PERL_UNUSED_VAR(new_len); /* Only used in DEBUGGING */
1640
1641 if ( i > 0
1642 && are_all_categories_the_same_locale
1643 && memNE(previous_start, this_start, entry_len + 1))
1644 {
1645 are_all_categories_the_same_locale = FALSE;
1646 }
1647 else {
1648 previous_start = this_start;
1649 }
1650 }
1651
1652 /* If they are all the same, just return any one of them */
1653 if (are_all_categories_the_same_locale) {
1654 aggregate_locale = this_start;
1655 aggregate_locale[entry_len] = '\0';
1656 }
1657
1658 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9
KW
1659 "calculate_LC_ALL returning '%s'\n",
1660 aggregate_locale));
d92b203c
KW
1661
1662 return aggregate_locale;
1663}
2855c76a
KW
1664
1665#endif
1666
1667#if defined(USE_LOCALE) && defined(DEBUGGING)
1668
1669STATIC const char *
1670S_get_LC_ALL_display(pTHX)
1671{
1672
1673# ifdef LC_ALL
1674
1675 return querylocale_c(LC_ALL);
1676
1677# else
1678
1679 const char * curlocales[NOMINAL_LC_ALL_INDEX];
1680
1681 for (unsigned i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
1682 curlocales[i] = querylocale_i(i);
1683 }
1684
1685 return calculate_LC_ALL(curlocales);
1686
1687# endif
1688
1689}
1690
1691#endif
d92b203c 1692
a4f00dcc 1693STATIC void
94202fb8
KW
1694S_setlocale_failure_panic_i(pTHX_
1695 const unsigned int cat_index,
1696 const char * current,
1697 const char * failed,
1698 const line_t caller_0_line,
1699 const line_t caller_1_line)
1700{
20ad7b23 1701 dSAVE_ERRNO;
94202fb8
KW
1702 const int cat = categories[cat_index];
1703 const char * name = category_names[cat_index];
94202fb8
KW
1704
1705 PERL_ARGS_ASSERT_SETLOCALE_FAILURE_PANIC_I;
1706
94202fb8
KW
1707 if (current == NULL) {
1708 current = querylocale_i(cat_index);
1709 }
1710
20ad7b23
KW
1711 Perl_locale_panic(Perl_form(aTHX_ "(%" LINE_Tf
1712 "): Can't change locale for %s(%d)"
1713 " from '%s' to '%s'",
1714 caller_1_line, name, cat,
1715 current, failed),
1716 __FILE__, caller_0_line, GET_ERRNO);
94202fb8
KW
1717 NOT_REACHED; /* NOTREACHED */
1718}
1719
fc76ce84
KW
1720/* Any of these will allow us to find the RADIX */
1721# if defined(USE_LOCALE_NUMERIC) && ( defined(HAS_SOME_LANGINFO) \
1722 || defined(HAS_SOME_LOCALECONV) \
1723 || defined(HAS_SNPRINTF))
1724# define CAN_CALCULATE_RADIX
1725# endif
f4fdec1f 1726# ifdef USE_LOCALE_NUMERIC
fc76ce84 1727
94202fb8 1728STATIC void
398eeea9 1729S_new_numeric(pTHX_ const char *newnum)
98994639 1730{
759f4406 1731 PERL_ARGS_ASSERT_NEW_NUMERIC;
7d4bcc4a 1732
291a84fb 1733 /* Called after each libc setlocale() call affecting LC_NUMERIC, to tell
fe1c1494
KW
1734 * core Perl this and that 'newnum' is the name of the new locale, and we
1735 * are switched into it. It installs this locale as the current underlying
1736 * default, and then switches to the C locale, if necessary, so that the
1737 * code that has traditionally expected the radix character to be a dot may
1738 * continue to do so.
0d071d52
KW
1739 *
1740 * The default locale and the C locale can be toggled between by use of the
5792c642
KW
1741 * set_numeric_underlying() and set_numeric_standard() functions, which
1742 * should probably not be called directly, but only via macros like
0d071d52
KW
1743 * SET_NUMERIC_STANDARD() in perl.h.
1744 *
1745 * The toggling is necessary mainly so that a non-dot radix decimal point
fe1c1494
KW
1746 * character can be input and output, while allowing internal calculations
1747 * to use a dot.
0d071d52
KW
1748 *
1749 * This sets several interpreter-level variables:
bb304765 1750 * PL_numeric_name The underlying locale's name: a copy of 'newnum'
892e6465 1751 * PL_numeric_underlying A boolean indicating if the toggled state is such
7738054c
KW
1752 * that the current locale is the program's underlying
1753 * locale
1754 * PL_numeric_standard An int indicating if the toggled state is such
4c68b815
KW
1755 * that the current locale is the C locale or
1756 * indistinguishable from the C locale. If non-zero, it
1757 * is in C; if > 1, it means it may not be toggled away
7738054c 1758 * from C.
4c68b815
KW
1759 * PL_numeric_underlying_is_standard A bool kept by this function
1760 * indicating that the underlying locale and the standard
1761 * C locale are indistinguishable for the purposes of
1762 * LC_NUMERIC. This happens when both of the above two
1763 * variables are true at the same time. (Toggling is a
1764 * no-op under these circumstances.) This variable is
1765 * used to avoid having to recalculate.
fe1c1494
KW
1766 * PL_numeric_radix_sv Contains the string that code should use for the
1767 * decimal point. It is set to either a dot or the
1768 * program's underlying locale's radix character string,
1769 * depending on the situation.
37b8263e
KW
1770 * PL_underlying_radix_sv Contains the program's underlying locale's radix
1771 * character string. This is copied into
1772 * PL_numeric_radix_sv when the situation warrants. It
1773 * exists to avoid having to recalculate it when toggling.
fe1c1494
KW
1774 * PL_underlying_numeric_obj = (only on POSIX 2008 platforms) An object
1775 * with everything set up properly so as to avoid work on
1776 * such platforms.
398eeea9 1777 */
0d071d52 1778
7506c6d9
KW
1779 DEBUG_L( PerlIO_printf(Perl_debug_log,
1780 "Called new_numeric with %s, PL_numeric_name=%s\n",
1781 newnum, PL_numeric_name));
1782
1783 /* If this isn't actually a change, do nothing */
818cdb7a
KW
1784 if (strEQ(PL_numeric_name, newnum)) {
1785 return;
fcf05094
KW
1786 }
1787
818cdb7a
KW
1788 Safefree(PL_numeric_name);
1789 PL_numeric_name = savepv(newnum);
1790
3014c972
KW
1791 /* Handle the trivial case. Since this is called at process
1792 * initialization, be aware that this bit can't rely on much being
1793 * available. */
22bb2282
KW
1794 if (isNAME_C_OR_POSIX(PL_numeric_name)) {
1795 PL_numeric_standard = TRUE;
1796 PL_numeric_underlying_is_standard = TRUE;
1797 PL_numeric_underlying = TRUE;
1798 sv_setpv(PL_numeric_radix_sv, C_decimal_point);
37b8263e 1799 sv_setpv(PL_underlying_radix_sv, C_decimal_point);
22bb2282
KW
1800 return;
1801 }
1802
3d3a0c7e
KW
1803 /* We are in the underlying locale until changed at the end of this
1804 * function */
1805 PL_numeric_underlying = TRUE;
1806
fcf05094
KW
1807# ifdef USE_POSIX_2008_LOCALE
1808
1809 /* We keep a special object for easy switching to */
1810 PL_underlying_numeric_obj = newlocale(LC_NUMERIC_MASK,
1811 PL_numeric_name,
1812 PL_underlying_numeric_obj);
1813
1814# endif
69c5e0db 1815
60d3cb45
KW
1816 const char * radix = NULL;
1817 utf8ness_t utf8ness = UTF8NESS_IMMATERIAL;
1818
37b8263e
KW
1819 /* Find and save this locale's radix character. */
1820 my_langinfo_c(RADIXCHAR, LC_NUMERIC, PL_numeric_name,
1821 &radix, NULL, &utf8ness);
1822 sv_setpv(PL_underlying_radix_sv, radix);
fcf05094 1823
37b8263e
KW
1824 if (utf8ness == UTF8NESS_YES) {
1825 SvUTF8_on(PL_underlying_radix_sv);
1826 }
1827
6cd93f2a
KW
1828 DEBUG_L(PerlIO_printf(Perl_debug_log,
1829 "Locale radix is '%s', ?UTF-8=%d\n",
1830 SvPVX(PL_underlying_radix_sv),
1831 cBOOL(SvUTF8(PL_underlying_radix_sv))));
1832
37b8263e
KW
1833 /* This locale is indistinguishable from C (for numeric purposes) if both
1834 * the radix character and the thousands separator are the same as C's.
1835 * Start with the radix. */
1836 PL_numeric_underlying_is_standard = strEQ(C_decimal_point, radix);
1837 Safefree(radix);
fcf05094
KW
1838
1839# ifndef TS_W32_BROKEN_LOCALECONV
1840
37b8263e
KW
1841 /* If the radix isn't the same as C's, we know it is distinguishable from
1842 * C; otherwise check the thousands separator too. Only if both are the
1843 * same as C's is the locale indistinguishable from C.
1844 *
1845 * But on earlier Windows versions, there is a potential race. This code
1846 * knows that localeconv() (elsewhere in this file) will be used to extract
1847 * the needed value, and localeconv() was buggy for quite a while, and that
1848 * code in this file hence uses a workaround. And that workaround may have
1849 * an (unlikely) race. Gathering the radix uses a different workaround on
1850 * Windows that doesn't involve a race. It might be possible to do the
1851 * same for this (patches welcome).
1852 *
1853 * Until then khw doesn't think it's worth even the small risk of a race to
1854 * get this value, which in almost all locales is empty, and doesn't appear
1855 * to be used in any of the Micrsoft library routines anyway. */
07492de5 1856
37b8263e
KW
1857 const char * scratch_buffer = NULL;
1858 PL_numeric_underlying_is_standard &= strEQ(C_thousands_sep,
1859 my_langinfo_c(THOUSEP, LC_NUMERIC,
1860 PL_numeric_name,
1861 &scratch_buffer,
1862 NULL, NULL));
1863 Safefree(scratch_buffer);
abe1abcf 1864
fe1c1494 1865# endif
69c5e0db 1866
430ae200 1867 PL_numeric_standard = PL_numeric_underlying_is_standard;
4c68b815 1868
fe1c1494
KW
1869 /* Keep LC_NUMERIC so that it has the C locale radix and thousands
1870 * separator. This is for XS modules, so they don't have to worry about
1871 * the radix being a non-dot. (Core operations that need the underlying
1872 * locale change to it temporarily). */
6cd93f2a 1873 if (! PL_numeric_standard) {
84f1d29f
KW
1874 set_numeric_standard();
1875 }
4c28b29c 1876
98994639
HS
1877}
1878
f4fdec1f
KW
1879# endif
1880
98994639
HS
1881void
1882Perl_set_numeric_standard(pTHX)
1883{
7d4bcc4a 1884
fe1c1494 1885# ifdef USE_LOCALE_NUMERIC
7d4bcc4a 1886
fe1c1494
KW
1887 /* Unconditionally toggle the LC_NUMERIC locale to the current underlying
1888 * default.
1889 *
1890 * Most code should use the macro SET_NUMERIC_STANDARD() in perl.h
1891 * instead of calling this directly. The macro avoids calling this routine
1892 * if toggling isn't necessary according to our records (which could be
1893 * wrong if some XS code has changed the locale behind our back) */
0d071d52 1894
94d521c0
KW
1895 DEBUG_L(PerlIO_printf(Perl_debug_log,
1896 "Setting LC_NUMERIC locale to standard C\n"));
7bb8f702 1897
3980cddb 1898 void_setlocale_c(LC_NUMERIC, "C");
7bb8f702 1899 PL_numeric_standard = TRUE;
6cd93f2a
KW
1900 sv_setpv(PL_numeric_radix_sv, C_decimal_point);
1901
7bb8f702 1902 PL_numeric_underlying = PL_numeric_underlying_is_standard;
7bb8f702 1903
fe1c1494 1904# endif /* USE_LOCALE_NUMERIC */
7d4bcc4a 1905
98994639
HS
1906}
1907
1908void
5792c642 1909Perl_set_numeric_underlying(pTHX)
98994639 1910{
7d4bcc4a 1911
fe1c1494 1912# ifdef USE_LOCALE_NUMERIC
7d4bcc4a 1913
fe1c1494
KW
1914 /* Unconditionally toggle the LC_NUMERIC locale to the current underlying
1915 * default.
1916 *
1917 * Most code should use the macro SET_NUMERIC_UNDERLYING() in perl.h
7d4bcc4a
KW
1918 * instead of calling this directly. The macro avoids calling this routine
1919 * if toggling isn't necessary according to our records (which could be
1920 * wrong if some XS code has changed the locale behind our back) */
a9b8c0d8 1921
94d521c0
KW
1922 DEBUG_L(PerlIO_printf(Perl_debug_log, "Setting LC_NUMERIC locale to %s\n",
1923 PL_numeric_name));
7bb8f702 1924
3980cddb 1925 void_setlocale_c(LC_NUMERIC, PL_numeric_name);
7bb8f702 1926 PL_numeric_underlying = TRUE;
6cd93f2a
KW
1927 sv_setsv_nomg(PL_numeric_radix_sv, PL_underlying_radix_sv);
1928
1929 PL_numeric_standard = PL_numeric_underlying_is_standard;
7bb8f702 1930
fe1c1494 1931# endif /* USE_LOCALE_NUMERIC */
7d4bcc4a 1932
98994639
HS
1933}
1934
057f0a44
KW
1935# ifdef USE_LOCALE_CTYPE
1936
a4f00dcc
KW
1937STATIC void
1938S_new_ctype(pTHX_ const char *newctype)
98994639 1939{
360374e2 1940 PERL_ARGS_ASSERT_NEW_CTYPE;
7d4bcc4a 1941
291a84fb 1942 /* Called after each libc setlocale() call affecting LC_CTYPE, to tell
0d071d52
KW
1943 * core Perl this and that 'newctype' is the name of the new locale.
1944 *
1945 * This function sets up the folding arrays for all 256 bytes, assuming
1946 * that tofold() is tolc() since fold case is not a concept in POSIX,
360374e2 1947 */
7918f24d 1948
1a63cba7
KW
1949 DEBUG_L(PerlIO_printf(Perl_debug_log, "Entering new_ctype(%s)\n", newctype));
1950
b269edc8 1951 /* No change means no-op */
ccbc6a9d 1952 if (strEQ(PL_ctype_name, newctype)) {
b269edc8
KW
1953 return;
1954 }
1955
215c5139
KW
1956 /* We will replace any bad locale warning with 1) nothing if the new one is
1957 * ok; or 2) a new warning for the bad new locale */
1958 if (PL_warn_locale) {
1959 SvREFCNT_dec_NN(PL_warn_locale);
1960 PL_warn_locale = NULL;
1961 }
1962
b269edc8
KW
1963 /* Clear cache */
1964 Safefree(PL_ctype_name);
1965 PL_ctype_name = "";
1966
360374e2
KW
1967 PL_in_utf8_turkic_locale = FALSE;
1968
1969 /* For the C locale, just use the standard folds, and we know there are no
3014c972
KW
1970 * glitches possible, so return early. Since this is called at process
1971 * initialization, be aware that this bit can't rely on much being
1972 * available. */
360374e2
KW
1973 if (isNAME_C_OR_POSIX(newctype)) {
1974 Copy(PL_fold, PL_fold_locale, 256, U8);
1975 PL_ctype_name = savepv(newctype);
1976 PL_in_utf8_CTYPE_locale = FALSE;
1977 return;
1978 }
b269edc8 1979
360374e2 1980 /* The cache being cleared signals this to compute a new value */
a0eeb339 1981 PL_in_utf8_CTYPE_locale = is_locale_utf8(newctype);
31f05a37 1982
360374e2
KW
1983 PL_ctype_name = savepv(newctype);
1984 bool maybe_utf8_turkic = FALSE;
b269edc8 1985
360374e2
KW
1986 /* Don't check for problems if we are suppressing the warnings */
1987 bool check_for_problems = ckWARN_d(WARN_LOCALE) || UNLIKELY(DEBUG_L_TEST);
b269edc8 1988
360374e2 1989 if (PL_in_utf8_CTYPE_locale) {
9dd831d8
KW
1990
1991 /* A UTF-8 locale gets standard rules. But note that code still has to
1992 * handle this specially because of the three problematic code points
1993 * */
31f05a37 1994 Copy(PL_fold_latin1, PL_fold_locale, 256, U8);
30d8090d
KW
1995
1996 /* UTF-8 locales can have special handling for 'I' and 'i' if they are
fe1c1494
KW
1997 * Turkic. Make sure these two are the only anomalies. (We don't
1998 * require towupper and towlower because they aren't in C89.) */
5b64f24c 1999
fe1c1494 2000# if defined(HAS_TOWUPPER) && defined (HAS_TOWLOWER)
5b64f24c 2001
fe1c1494 2002 if (towupper('i') == 0x130 && towlower('I') == 0x131)
5b64f24c 2003
fe1c1494 2004# else
5b64f24c 2005
fe1c1494 2006 if (toU8_UPPER_LC('i') == 'i' && toU8_LOWER_LC('I') == 'I')
5b64f24c 2007
fe1c1494
KW
2008# endif
2009
2010 {
2011 /* This is how we determine it really is Turkic */
30d8090d
KW
2012 check_for_problems = TRUE;
2013 maybe_utf8_turkic = TRUE;
2014 }
31f05a37 2015 }
9dd831d8 2016 else { /* Not a canned locale we know the values for. Compute them */
9200ac36
KW
2017
2018# ifdef DEBUGGING
2019
2020 bool has_non_ascii_fold = FALSE;
2021 bool found_unexpected = FALSE;
2022
2023 if (DEBUG_Lv_TEST) {
360374e2 2024 for (unsigned i = 128; i < 256; i++) {
9200ac36
KW
2025 int j = LATIN1_TO_NATIVE(i);
2026 if (toU8_LOWER_LC(j) != j || toU8_UPPER_LC(j) != j) {
2027 has_non_ascii_fold = TRUE;
2028 break;
2029 }
2030 }
2031 }
2032
2033# endif
2034
360374e2 2035 for (unsigned i = 0; i < 256; i++) {
9dd831d8
KW
2036 if (isU8_UPPER_LC(i))
2037 PL_fold_locale[i] = (U8) toU8_LOWER_LC(i);
2038 else if (isU8_LOWER_LC(i))
2039 PL_fold_locale[i] = (U8) toU8_UPPER_LC(i);
2040 else
2041 PL_fold_locale[i] = (U8) i;
9200ac36
KW
2042
2043# ifdef DEBUGGING
2044
2045 if (DEBUG_Lv_TEST) {
2046 bool unexpected = FALSE;
2047
2048 if (isUPPER_L1(i)) {
2049 if (isUPPER_A(i)) {
2050 if (PL_fold_locale[i] != toLOWER_A(i)) {
2051 unexpected = TRUE;
2052 }
2053 }
2054 else if (has_non_ascii_fold) {
2055 if (PL_fold_locale[i] != toLOWER_L1(i)) {
2056 unexpected = TRUE;
2057 }
2058 }
2059 else if (PL_fold_locale[i] != i) {
2060 unexpected = TRUE;
2061 }
2062 }
2063 else if ( isLOWER_L1(i)
2064 && i != LATIN_SMALL_LETTER_SHARP_S
2065 && i != MICRO_SIGN)
2066 {
2067 if (isLOWER_A(i)) {
2068 if (PL_fold_locale[i] != toUPPER_A(i)) {
2069 unexpected = TRUE;
2070 }
2071 }
2072 else if (has_non_ascii_fold) {
2073 if (PL_fold_locale[i] != toUPPER_LATIN1_MOD(i)) {
2074 unexpected = TRUE;
2075 }
2076 }
2077 else if (PL_fold_locale[i] != i) {
2078 unexpected = TRUE;
2079 }
2080 }
2081 else if (PL_fold_locale[i] != i) {
2082 unexpected = TRUE;
2083 }
2084
2085 if (unexpected) {
2086 found_unexpected = TRUE;
2087 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2088 "For %s, fold of %02x is %02x\n",
2089 newctype, i, PL_fold_locale[i]));
2090 }
2091 }
2092 }
2093
2094 if (found_unexpected) {
2095 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2096 "All bytes not mentioned above either fold to"
2097 " themselves or are the expected ASCII or"
2098 " Latin1 ones\n"));
2099 }
2100 else {
2101 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2102 "No nonstandard folds were found\n"));
2103# endif
2104
9dd831d8
KW
2105 }
2106 }
8b7358b9 2107
667fa5f0
KW
2108# ifdef MB_CUR_MAX
2109
2110 /* We only handle single-byte locales (outside of UTF-8 ones; so if this
2111 * locale requires more than one byte, there are going to be BIG problems.
2112 * */
2113
2114 if (MB_CUR_MAX > 1 && ! PL_in_utf8_CTYPE_locale
2115
2116 /* Some platforms return MB_CUR_MAX > 1 for even the "C" locale.
2117 * Just assume that the implementation for them (plus for POSIX) is
2118 * correct and the > 1 value is spurious. (Since these are
2119 * specially handled to never be considered UTF-8 locales, as long
2120 * as this is the only problem, everything should work fine */
2121 && ! isNAME_C_OR_POSIX(newctype))
2122 {
2123 DEBUG_L(PerlIO_printf(Perl_debug_log,
2124 "Unsupported, MB_CUR_MAX=%d\n", (int) MB_CUR_MAX));
2125
2126 Perl_ck_warner_d(aTHX_ packWARN(WARN_LOCALE),
2127 "Locale '%s' is unsupported, and may crash the"
2128 " interpreter.\n",
2129 newctype);
2130 }
2131
2132# endif
2133
2134 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "check_for_problems=%d\n",
2135 check_for_problems));
2136
8b7358b9
KW
2137 /* We don't populate the other lists if a UTF-8 locale, but do check that
2138 * everything works as expected, unless checking turned off */
9dd831d8 2139 if (check_for_problems) {
8c6180a9
KW
2140 /* Assume enough space for every character being bad. 4 spaces each
2141 * for the 94 printable characters that are output like "'x' "; and 5
2142 * spaces each for "'\\' ", "'\t' ", and "'\n' "; plus a terminating
2143 * NUL */
8b7358b9 2144 char bad_chars_list[ (94 * 4) + (3 * 5) + 1 ] = { '\0' };
8c6180a9
KW
2145 unsigned int bad_count = 0; /* Count of bad characters */
2146
360374e2 2147 for (unsigned i = 0; i < 256; i++) {
8c6180a9
KW
2148
2149 /* If checking for locale problems, see if the native ASCII-range
2150 * printables plus \n and \t are in their expected categories in
2151 * the new locale. If not, this could mean big trouble, upending
2152 * Perl's and most programs' assumptions, like having a
2153 * metacharacter with special meaning become a \w. Fortunately,
2154 * it's very rare to find locales that aren't supersets of ASCII
2155 * nowadays. It isn't a problem for most controls to be changed
2156 * into something else; we check only \n and \t, though perhaps \r
2157 * could be an issue as well. */
9dd831d8 2158 if (isGRAPH_A(i) || isBLANK_A(i) || i == '\n') {
8b7358b9 2159 bool is_bad = FALSE;
6726b4f4 2160 char name[4] = { '\0' };
8b7358b9
KW
2161
2162 /* Convert the name into a string */
6726b4f4 2163 if (isGRAPH_A(i)) {
8b7358b9
KW
2164 name[0] = i;
2165 name[1] = '\0';
2166 }
2167 else if (i == '\n') {
6726b4f4
KW
2168 my_strlcpy(name, "\\n", sizeof(name));
2169 }
2170 else if (i == '\t') {
2171 my_strlcpy(name, "\\t", sizeof(name));
8b7358b9
KW
2172 }
2173 else {
6726b4f4
KW
2174 assert(i == ' ');
2175 my_strlcpy(name, "' '", sizeof(name));
8b7358b9
KW
2176 }
2177
2178 /* Check each possibe class */
dcbeec20 2179 if (UNLIKELY(cBOOL(isU8_ALPHANUMERIC_LC(i)) != cBOOL(isALPHANUMERIC_A(i)))) {
8b7358b9
KW
2180 is_bad = TRUE;
2181 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2182 "isalnum('%s') unexpectedly is %x\n",
dcbeec20 2183 name, cBOOL(isU8_ALPHANUMERIC_LC(i))));
8b7358b9 2184 }
dcbeec20 2185 if (UNLIKELY(cBOOL(isU8_ALPHA_LC(i)) != cBOOL(isALPHA_A(i)))) {
8b7358b9
KW
2186 is_bad = TRUE;
2187 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2188 "isalpha('%s') unexpectedly is %x\n",
dcbeec20 2189 name, cBOOL(isU8_ALPHA_LC(i))));
8b7358b9 2190 }
dcbeec20 2191 if (UNLIKELY(cBOOL(isU8_DIGIT_LC(i)) != cBOOL(isDIGIT_A(i)))) {
8b7358b9
KW
2192 is_bad = TRUE;
2193 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2194 "isdigit('%s') unexpectedly is %x\n",
dcbeec20 2195 name, cBOOL(isU8_DIGIT_LC(i))));
8b7358b9 2196 }
dcbeec20 2197 if (UNLIKELY(cBOOL(isU8_GRAPH_LC(i)) != cBOOL(isGRAPH_A(i)))) {
8b7358b9
KW
2198 is_bad = TRUE;
2199 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2200 "isgraph('%s') unexpectedly is %x\n",
dcbeec20 2201 name, cBOOL(isU8_GRAPH_LC(i))));
8b7358b9 2202 }
dcbeec20 2203 if (UNLIKELY(cBOOL(isU8_LOWER_LC(i)) != cBOOL(isLOWER_A(i)))) {
8b7358b9
KW
2204 is_bad = TRUE;
2205 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2206 "islower('%s') unexpectedly is %x\n",
dcbeec20 2207 name, cBOOL(isU8_LOWER_LC(i))));
8b7358b9 2208 }
dcbeec20 2209 if (UNLIKELY(cBOOL(isU8_PRINT_LC(i)) != cBOOL(isPRINT_A(i)))) {
8b7358b9
KW
2210 is_bad = TRUE;
2211 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2212 "isprint('%s') unexpectedly is %x\n",
dcbeec20 2213 name, cBOOL(isU8_PRINT_LC(i))));
8b7358b9 2214 }
dcbeec20 2215 if (UNLIKELY(cBOOL(isU8_PUNCT_LC(i)) != cBOOL(isPUNCT_A(i)))) {
8b7358b9
KW
2216 is_bad = TRUE;
2217 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2218 "ispunct('%s') unexpectedly is %x\n",
dcbeec20 2219 name, cBOOL(isU8_PUNCT_LC(i))));
8b7358b9 2220 }
dcbeec20 2221 if (UNLIKELY(cBOOL(isU8_SPACE_LC(i)) != cBOOL(isSPACE_A(i)))) {
8b7358b9
KW
2222 is_bad = TRUE;
2223 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2224 "isspace('%s') unexpectedly is %x\n",
dcbeec20 2225 name, cBOOL(isU8_SPACE_LC(i))));
8b7358b9 2226 }
dcbeec20 2227 if (UNLIKELY(cBOOL(isU8_UPPER_LC(i)) != cBOOL(isUPPER_A(i)))) {
8b7358b9
KW
2228 is_bad = TRUE;
2229 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2230 "isupper('%s') unexpectedly is %x\n",
dcbeec20 2231 name, cBOOL(isU8_UPPER_LC(i))));
8b7358b9 2232 }
dcbeec20 2233 if (UNLIKELY(cBOOL(isU8_XDIGIT_LC(i))!= cBOOL(isXDIGIT_A(i)))) {
8b7358b9
KW
2234 is_bad = TRUE;
2235 DEBUG_L(PerlIO_printf(Perl_debug_log,
58236863 2236 "isxdigit('%s') unexpectedly is %x\n",
dcbeec20 2237 name, cBOOL(isU8_XDIGIT_LC(i))));
8b7358b9 2238 }
dcbeec20 2239 if (UNLIKELY(toU8_LOWER_LC(i) != (int) toLOWER_A(i))) {
8b7358b9
KW
2240 is_bad = TRUE;
2241 DEBUG_L(PerlIO_printf(Perl_debug_log,
2242 "tolower('%s')=0x%x instead of the expected 0x%x\n",
dcbeec20 2243 name, toU8_LOWER_LC(i), (int) toLOWER_A(i)));
8b7358b9 2244 }
dcbeec20 2245 if (UNLIKELY(toU8_UPPER_LC(i) != (int) toUPPER_A(i))) {
8b7358b9
KW
2246 is_bad = TRUE;
2247 DEBUG_L(PerlIO_printf(Perl_debug_log,
2248 "toupper('%s')=0x%x instead of the expected 0x%x\n",
dcbeec20 2249 name, toU8_UPPER_LC(i), (int) toUPPER_A(i)));
8b7358b9
KW
2250 }
2251 if (UNLIKELY((i == '\n' && ! isCNTRL_LC(i)))) {
2252 is_bad = TRUE;
2253 DEBUG_L(PerlIO_printf(Perl_debug_log,
2254 "'\\n' (=%02X) is not a control\n", (int) i));
2255 }
2256
2257 /* Add to the list; Separate multiple entries with a blank */
2258 if (is_bad) {
2259 if (bad_count) {
2260 my_strlcat(bad_chars_list, " ", sizeof(bad_chars_list));
8c6180a9 2261 }
8b7358b9
KW
2262 my_strlcat(bad_chars_list, name, sizeof(bad_chars_list));
2263 bad_count++;
8c6180a9
KW
2264 }
2265 }
2266 }
2267
30d8090d
KW
2268 if (bad_count == 2 && maybe_utf8_turkic) {
2269 bad_count = 0;
2270 *bad_chars_list = '\0';
2271 PL_fold_locale['I'] = 'I';
2272 PL_fold_locale['i'] = 'i';
2273 PL_in_utf8_turkic_locale = TRUE;
b26541a9 2274 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s is turkic\n", newctype));
30d8090d 2275 }
8c6180a9 2276
30d8090d 2277 /* If we found problems and we want them output, do so */
667fa5f0 2278 if ( (UNLIKELY(bad_count))
30d8090d
KW
2279 && (LIKELY(ckWARN_d(WARN_LOCALE)) || UNLIKELY(DEBUG_L_TEST)))
2280 {
8b7358b9
KW
2281 if (UNLIKELY(bad_count) && PL_in_utf8_CTYPE_locale) {
2282 PL_warn_locale = Perl_newSVpvf(aTHX_
2283 "Locale '%s' contains (at least) the following characters"
578a6a87
KW
2284 " which have\nunexpected meanings: %s\nThe Perl program"
2285 " will use the expected meanings",
8b7358b9 2286 newctype, bad_chars_list);
8b7358b9
KW
2287 }
2288 else {
667fa5f0
KW
2289 PL_warn_locale =
2290 Perl_newSVpvf(aTHX_
2291 "\nThe following characters (and maybe"
2292 " others) may not have the same meaning as"
2293 " the Perl program expects: %s\n",
2294 bad_chars_list
8c6180a9 2295 );
8b7358b9
KW
2296 }
2297
3871c541 2298# ifdef HAS_SOME_LANGINFO
b119c2be 2299
07492de5 2300 const char * scratch_buffer = NULL;
b119c2be 2301 Perl_sv_catpvf(aTHX_ PL_warn_locale, "; codeset=%s",
fc46dd0d
KW
2302 my_langinfo_c(CODESET, LC_CTYPE,
2303 newctype,
fffcb78f
KW
2304 &scratch_buffer, NULL,
2305 NULL));
07492de5 2306 Safefree(scratch_buffer);
b119c2be
KW
2307
2308# endif
2309
8b7358b9
KW
2310 Perl_sv_catpvf(aTHX_ PL_warn_locale, "\n");
2311
cc9eaeb0 2312 /* If we are actually in the scope of the locale or are debugging,
bddebb56
KW
2313 * output the message now. If not in that scope, we save the
2314 * message to be output at the first operation using this locale,
2315 * if that actually happens. Most programs don't use locales, so
2316 * they are immune to bad ones. */
cc9eaeb0 2317 if (IN_LC(LC_CTYPE) || UNLIKELY(DEBUG_L_TEST)) {
780fcc9f 2318
780fcc9f 2319 /* The '0' below suppresses a bogus gcc compiler warning */
fe1c1494
KW
2320 Perl_warner(aTHX_ packWARN(WARN_LOCALE), SvPVX(PL_warn_locale),
2321 0);
bddebb56 2322
bddebb56
KW
2323 if (IN_LC(LC_CTYPE)) {
2324 SvREFCNT_dec_NN(PL_warn_locale);
2325 PL_warn_locale = NULL;
2326 }
780fcc9f 2327 }
baa60164 2328 }
31f05a37 2329 }
057f0a44 2330}
98994639 2331
fe1c1494 2332# endif /* USE_LOCALE_CTYPE */
7d4bcc4a 2333
98994639 2334void
2726666d
KW
2335Perl__warn_problematic_locale()
2336{
2726666d 2337
fe1c1494 2338# ifdef USE_LOCALE_CTYPE
2726666d 2339
5f04a188
KW
2340 dTHX;
2341
2342 /* Internal-to-core function that outputs the message in PL_warn_locale,
2343 * and then NULLS it. Should be called only through the macro
1629a27e 2344 * CHECK_AND_WARN_PROBLEMATIC_LOCALE_ */
5f04a188 2345
2726666d 2346 if (PL_warn_locale) {
2726666d
KW
2347 Perl_ck_warner(aTHX_ packWARN(WARN_LOCALE),
2348 SvPVX(PL_warn_locale),
2349 0 /* dummy to avoid compiler warning */ );
2726666d
KW
2350 SvREFCNT_dec_NN(PL_warn_locale);
2351 PL_warn_locale = NULL;
2352 }
2353
fe1c1494 2354# endif
2726666d
KW
2355
2356}
2357
a4f00dcc 2358STATIC void
c7dc1ea8
KW
2359S_new_LC_ALL(pTHX_ const char *unused)
2360{
2361 unsigned int i;
2362
2363 /* LC_ALL updates all the things we care about. */
2364
2365 PERL_UNUSED_ARG(unused);
2366
2367 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
2368 if (update_functions[i]) {
22392ba2 2369 const char * this_locale = querylocale_i(i);
1f9a6108 2370 update_functions[i](aTHX_ this_locale);
c7dc1ea8
KW
2371 }
2372 }
2373}
2374
e933b48f
KW
2375# ifdef USE_LOCALE_COLLATE
2376
c7dc1ea8 2377STATIC void
a4f00dcc 2378S_new_collate(pTHX_ const char *newcoll)
98994639 2379{
89713b94 2380 PERL_ARGS_ASSERT_NEW_COLLATE;
7d4bcc4a 2381
0d071d52 2382
291a84fb 2383 /* Called after each libc setlocale() call affecting LC_COLLATE, to tell
0d071d52
KW
2384 * core Perl this and that 'newcoll' is the name of the new locale.
2385 *
d35fca5f
KW
2386 * The design of locale collation is that every locale change is given an
2387 * index 'PL_collation_ix'. The first time a string particpates in an
2388 * operation that requires collation while locale collation is active, it
2389 * is given PERL_MAGIC_collxfrm magic (via sv_collxfrm_flags()). That
2390 * magic includes the collation index, and the transformation of the string
2391 * by strxfrm(), q.v. That transformation is used when doing comparisons,
2392 * instead of the string itself. If a string changes, the magic is
2393 * cleared. The next time the locale changes, the index is incremented,
2394 * and so we know during a comparison that the transformation is not
2395 * necessarily still valid, and so is recomputed. Note that if the locale
2396 * changes enough times, the index could wrap (a U32), and it is possible
2397 * that a transformation would improperly be considered valid, leading to
2398 * an unlikely bug */
0d071d52 2399
5fdda512
KW
2400 /* Return if the locale isn't changing */
2401 if (strEQ(PL_collation_name, newcoll)) {
2402 return;
2403 }
2404
2405 Safefree(PL_collation_name);
2406 PL_collation_name = savepv(newcoll);
2407 ++PL_collation_ix;
2408
3014c972
KW
2409 /* Set the new one up if trivial. Since this is called at process
2410 * initialization, be aware that this bit can't rely on much being
2411 * available. */
5fdda512
KW
2412 PL_collation_standard = isNAME_C_OR_POSIX(newcoll);
2413 if (PL_collation_standard) {
5fdda512 2414 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "Setting PL_collation name='%s'\n", PL_collation_name));
1604cfb0
MS
2415 PL_collxfrm_base = 0;
2416 PL_collxfrm_mult = 2;
165a1c52 2417 PL_in_utf8_COLLATE_locale = FALSE;
f28f4d2a 2418 PL_strxfrm_NUL_replacement = '\0';
a4a439fb 2419 PL_strxfrm_max_cp = 0;
1604cfb0 2420 return;
5fdda512
KW
2421 }
2422
2423 /* Flag that the remainder of the set up is being deferred until first need */
2424 PL_collxfrm_mult = 0;
2425 PL_collxfrm_base = 0;
2426
5fdda512
KW
2427}
2428
e933b48f 2429# endif /* USE_LOCALE_COLLATE */
5fdda512 2430#endif /* USE_LOCALE */
d36adde0 2431
d2b24094 2432#ifdef WIN32
b8cc575c 2433
27edef3b
KW
2434wchar_t *
2435Perl_Win_utf8_string_to_wstring(const char * utf8_string)
2436{
2437 wchar_t *wstring;
2438
2439 int req_size = MultiByteToWideChar(CP_UTF8, 0, utf8_string, -1, NULL, 0);
2440 if (! req_size) {
2441 errno = EINVAL;
2442 return NULL;
2443 }
2444
2445 Newx(wstring, req_size, wchar_t);
2446
2447 if (! MultiByteToWideChar(CP_UTF8, 0, utf8_string, -1, wstring, req_size))
2448 {
2449 Safefree(wstring);
2450 errno = EINVAL;
2451 return NULL;
2452 }
2453
2454 return wstring;
2455}
2456
2457char *
2458Perl_Win_wstring_to_utf8_string(const wchar_t * wstring)
2459{
2460 char *utf8_string;
2461
2462 int req_size =
2463 WideCharToMultiByte(CP_UTF8, 0, wstring, -1, NULL, 0, NULL, NULL);
2464
2465 Newx(utf8_string, req_size, char);
2466
2467 if (! WideCharToMultiByte(CP_UTF8, 0, wstring, -1, utf8_string,
2468 req_size, NULL, NULL))
2469 {
2470 Safefree(utf8_string);
2471 errno = EINVAL;
2472 return NULL;
2473 }
2474
2475 return utf8_string;
2476}
2477
6c332036
TC
2478#define USE_WSETLOCALE
2479
2480#ifdef USE_WSETLOCALE
2481
2482STATIC char *
2483S_wrap_wsetlocale(pTHX_ int category, const char *locale) {
27edef3b 2484 wchar_t *wlocale = NULL;
6c332036
TC
2485 wchar_t *wresult;
2486 char *result;
2487
2488 if (locale) {
27edef3b
KW
2489 wlocale = Win_utf8_string_to_wstring(locale);
2490 if (! wlocale) {
6c332036
TC
2491 return NULL;
2492 }
2493 }
2494 else {
2495 wlocale = NULL;
2496 }
27edef3b 2497
6c332036
TC
2498 wresult = _wsetlocale(category, wlocale);
2499 Safefree(wlocale);
27edef3b
KW
2500
2501 if (! wresult) {
6c332036
TC
2502 return NULL;
2503 }
27edef3b
KW
2504
2505 result = Win_wstring_to_utf8_string(wresult);
2506 SAVEFREEPV(result); /* is there something better we can do here? */
6c332036
TC
2507
2508 return result;
2509}
2510
2511#endif
2512
a4f00dcc 2513STATIC char *
b8cc575c 2514S_win32_setlocale(pTHX_ int category, const char* locale)
b385bb4d
KW
2515{
2516 /* This, for Windows, emulates POSIX setlocale() behavior. There is no
7d4bcc4a
KW
2517 * difference between the two unless the input locale is "", which normally
2518 * means on Windows to get the machine default, which is set via the
2519 * computer's "Regional and Language Options" (or its current equivalent).
2520 * In POSIX, it instead means to find the locale from the user's
2521 * environment. This routine changes the Windows behavior to first look in
2522 * the environment, and, if anything is found, use that instead of going to
2523 * the machine default. If there is no environment override, the machine
2524 * default is used, by calling the real setlocale() with "".
2525 *
2526 * The POSIX behavior is to use the LC_ALL variable if set; otherwise to
2527 * use the particular category's variable if set; otherwise to use the LANG
2528 * variable. */
b385bb4d 2529
175c4cf9 2530 bool override_LC_ALL = FALSE;
89f7b9aa 2531 char * result;
e5f10d49 2532 unsigned int i;
89f7b9aa 2533
b385bb4d 2534 if (locale && strEQ(locale, "")) {
7d4bcc4a
KW
2535
2536# ifdef LC_ALL
2537
b385bb4d
KW
2538 locale = PerlEnv_getenv("LC_ALL");
2539 if (! locale) {
e5f10d49
KW
2540 if (category == LC_ALL) {
2541 override_LC_ALL = TRUE;
2542 }
2543 else {
7d4bcc4a
KW
2544
2545# endif
7d4bcc4a 2546
e5f10d49
KW
2547 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
2548 if (category == categories[i]) {
2549 locale = PerlEnv_getenv(category_names[i]);
2550 goto found_locale;
2551 }
2552 }
7d4bcc4a 2553
b385bb4d 2554 locale = PerlEnv_getenv("LANG");
481465ea 2555 if (! locale) {
b385bb4d
KW
2556 locale = "";
2557 }
e5f10d49
KW
2558
2559 found_locale: ;
7d4bcc4a
KW
2560
2561# ifdef LC_ALL
2562
e5f10d49 2563 }
b385bb4d 2564 }
7d4bcc4a
KW
2565
2566# endif
2567
b385bb4d
KW
2568 }
2569
6c332036
TC
2570#ifdef USE_WSETLOCALE
2571 result = S_wrap_wsetlocale(aTHX_ category, locale);
2572#else
89f7b9aa 2573 result = setlocale(category, locale);
6c332036 2574#endif
cd7ba941
KW
2575 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
2576 setlocale_debug_string_r(category, locale, result)));
89f7b9aa 2577
481465ea 2578 if (! override_LC_ALL) {
89f7b9aa
KW
2579 return result;
2580 }
2581
dfd77d7a 2582 /* Here the input category was LC_ALL, and we have set it to what is in the
481465ea
KW
2583 * LANG variable or the system default if there is no LANG. But these have
2584 * lower priority than the other LC_foo variables, so override it for each
2585 * one that is set. (If they are set to "", it means to use the same thing
2586 * we just set LC_ALL to, so can skip) */
7d4bcc4a 2587
4fd76d49 2588 for (i = 0; i < LC_ALL_INDEX_; i++) {
e5f10d49
KW
2589 result = PerlEnv_getenv(category_names[i]);
2590 if (result && strNE(result, "")) {
6c332036 2591#ifdef USE_WSETLOCALE
17a1e9ac 2592 S_wrap_wsetlocale(aTHX_ categories[i], result);
6c332036 2593#else
17a1e9ac 2594 setlocale(categories[i], result);
6c332036 2595#endif
b26541a9 2596 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "%s\n",
d188fe6d 2597 setlocale_debug_string_i(i, result, "not captured")));
e5f10d49 2598 }
89f7b9aa 2599 }
7d4bcc4a 2600
bbc98134 2601 result = setlocale(LC_ALL, NULL);
cd7ba941
KW
2602 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
2603 setlocale_debug_string_c(LC_ALL, NULL, result)));
89f7b9aa 2604
bbc98134 2605 return result;
b385bb4d
KW
2606}
2607
2608#endif
2609
9aac5db8 2610/*
9aac5db8
KW
2611=for apidoc Perl_setlocale
2612
2613This is an (almost) drop-in replacement for the system L<C<setlocale(3)>>,
2614taking the same parameters, and returning the same information, except that it
9487427b
KW
2615returns the correct underlying C<LC_NUMERIC> locale. Regular C<setlocale> will
2616instead return C<C> if the underlying locale has a non-dot decimal point
2617character, or a non-empty thousands separator for displaying floating point
2618numbers. This is because perl keeps that locale category such that it has a
2619dot and empty separator, changing the locale briefly during the operations
2620where the underlying one is required. C<Perl_setlocale> knows about this, and
2621compensates; regular C<setlocale> doesn't.
9aac5db8 2622
e9bc6d6b 2623Another reason it isn't completely a drop-in replacement is that it is
9aac5db8 2624declared to return S<C<const char *>>, whereas the system setlocale omits the
163deff3
KW
2625C<const> (presumably because its API was specified long ago, and can't be
2626updated; it is illegal to change the information C<setlocale> returns; doing
2627so leads to segfaults.)
9aac5db8 2628
e9bc6d6b
KW
2629Finally, C<Perl_setlocale> works under all circumstances, whereas plain
2630C<setlocale> can be completely ineffective on some platforms under some
2631configurations.
2632
9aac5db8 2633C<Perl_setlocale> should not be used to change the locale except on systems
e9bc6d6b
KW
2634where the predefined variable C<${^SAFE_LOCALES}> is 1. On some such systems,
2635the system C<setlocale()> is ineffective, returning the wrong information, and
2636failing to actually change the locale. C<Perl_setlocale>, however works
2637properly in all circumstances.
9aac5db8
KW
2638
2639The return points to a per-thread static buffer, which is overwritten the next
2640time C<Perl_setlocale> is called from the same thread.
2641
2642=cut
2643
2644*/
2645
d87e68d7
KW
2646#ifndef USE_LOCALE_NUMERIC
2647# define affects_LC_NUMERIC(cat) 0
2648#elif defined(LC_ALL)
2649# define affects_LC_NUMERIC(cat) (cat == LC_NUMERIC || cat == LC_ALL)
2650#else
2651# define affects_LC_NUMERIC(cat) (cat == LC_NUMERIC)
2652#endif
2653
9aac5db8
KW
2654const char *
2655Perl_setlocale(const int category, const char * locale)
a4f00dcc
KW
2656{
2657 /* This wraps POSIX::setlocale() */
2658
52129632 2659#ifndef USE_LOCALE
d36adde0
KW
2660
2661 PERL_UNUSED_ARG(category);
2662 PERL_UNUSED_ARG(locale);
2663
2664 return "C";
2665
2666#else
2667
9aac5db8 2668 const char * retval;
a4f00dcc 2669 dTHX;
0bb37942 2670
1a63cba7
KW
2671 DEBUG_L(PerlIO_printf(Perl_debug_log,
2672 "Entering Perl_setlocale(%d, \"%s\")\n",
2673 category, locale));
2674
0bb37942
KW
2675 /* A NULL locale means only query what the current one is. */
2676 if (locale == NULL) {
2677
6a7aca81 2678# ifndef USE_LOCALE_NUMERIC
0bb37942 2679
6a7aca81 2680 /* Without LC_NUMERIC, it's trivial; we just return the value */
22392ba2 2681 return save_to_buffer(querylocale_r(category),
0b8fa716 2682 &PL_setlocale_buf, &PL_setlocale_bufsize);
6a7aca81 2683# else
a4f00dcc 2684
0bb37942
KW
2685 /* We have the LC_NUMERIC name saved, because we are normally switched
2686 * into the C locale (or equivalent) for it. */
a4f00dcc 2687 if (category == LC_NUMERIC) {
1a63cba7
KW
2688 DEBUG_L(PerlIO_printf(Perl_debug_log,
2689 "Perl_setlocale(LC_NUMERIC, NULL) returning stashed '%s'\n",
2690 PL_numeric_name));
9aac5db8
KW
2691
2692 /* We don't have to copy this return value, as it is a per-thread
2693 * variable, and won't change until a future setlocale */
2694 return PL_numeric_name;
a4f00dcc
KW
2695 }
2696
6a7aca81
KW
2697# ifndef LC_ALL
2698
2699 /* Without LC_ALL, just return the value */
22392ba2 2700 return save_to_buffer(querylocale_r(category),
0b8fa716 2701 &PL_setlocale_buf, &PL_setlocale_bufsize);
6a7aca81
KW
2702
2703# else
2704
2705 /* Here, LC_ALL is available on this platform. It's the one
2706 * complicating category (because it can contain a toggled LC_NUMERIC
2707 * value), for all the remaining ones (we took care of LC_NUMERIC
2708 * above), just return the value */
2709 if (category != LC_ALL) {
22392ba2 2710 return save_to_buffer(querylocale_r(category),
0b8fa716 2711 &PL_setlocale_buf, &PL_setlocale_bufsize);
6a7aca81
KW
2712 }
2713
2714 bool toggled = FALSE;
a4f00dcc 2715
0bb37942
KW
2716 /* For an LC_ALL query, switch back to the underlying numeric locale
2717 * (if we aren't there already) so as to get the correct results. Our
2718 * records for all the other categories are valid without switching */
6a7aca81 2719 if (! PL_numeric_underlying) {
996e6b9e
KW
2720 set_numeric_underlying();
2721 toggled = TRUE;
a4f00dcc
KW
2722 }
2723
6a7aca81 2724 retval = querylocale_c(LC_ALL);
996e6b9e
KW
2725
2726 if (toggled) {
996e6b9e 2727 set_numeric_standard();
6a7aca81 2728 }
0bb37942 2729
1a63cba7
KW
2730 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
2731 setlocale_debug_string_r(category, locale, retval)));
22392ba2 2732
0b8fa716 2733 return save_to_buffer(retval, &PL_setlocale_buf, &PL_setlocale_bufsize);
6a7aca81
KW
2734
2735# endif /* Has LC_ALL */
2736# endif /* Has LC_NUMERIC */
2737
0bb37942 2738 } /* End of querying the current locale */
a4f00dcc 2739
d87e68d7
KW
2740
2741 /* Here, the input has a locale to change to. First find the current
2742 * locale */
1a63cba7 2743 unsigned int cat_index = get_category_index(category, NULL);
d87e68d7
KW
2744 retval = querylocale_i(cat_index);
2745
2746 /* If the new locale is the same as the current one, nothing is actually
2747 * being changed, so do nothing. */
2748 if ( strEQ(retval, locale)
2749 && ( ! affects_LC_NUMERIC(category)
2750
2751# ifdef USE_LOCALE_NUMERIC
2752
2753 || strEQ(locale, PL_numeric_name)
2754
2755# endif
2756
2757 )) {
2758 DEBUG_L(PerlIO_printf(Perl_debug_log,
2759 "Already in requested locale: no action taken\n"));
2760 return save_to_buffer(setlocale_i(cat_index, locale),
0b8fa716 2761 &PL_setlocale_buf, &PL_setlocale_bufsize);
d87e68d7
KW
2762 }
2763
2764 /* Here, an actual change is being requested. Do it */
22392ba2 2765 retval = setlocale_i(cat_index, locale);
c8aea264 2766
9aac5db8 2767 if (! retval) {
1a63cba7
KW
2768 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n",
2769 setlocale_debug_string_i(cat_index, locale, "NULL")));
a4f00dcc
KW
2770 return NULL;
2771 }
2772
c8aea264 2773 assert(strNE(retval, ""));
0b8fa716 2774 retval = save_to_buffer(retval, &PL_setlocale_buf, &PL_setlocale_bufsize);
22392ba2 2775
fe1c1494
KW
2776 /* Now that have changed locales, we have to update our records to
2777 * correspond. Only certain categories have extra work to update. */
c7dc1ea8
KW
2778 if (update_functions[cat_index]) {
2779 update_functions[cat_index](aTHX_ retval);
a4f00dcc
KW
2780 }
2781
b26541a9 2782 DEBUG_L(PerlIO_printf(Perl_debug_log, "returning '%s'\n", retval));
1a63cba7 2783
a4f00dcc
KW
2784 return retval;
2785
d36adde0
KW
2786#endif
2787
f7416781
KW
2788}
2789
faef259e
KW
2790#ifdef USE_LOCALE
2791
1b5d1fe8 2792STATIC const char *
0b8fa716 2793S_save_to_buffer(const char * string, const char **buf, Size_t *buf_size)
f7416781 2794{
0b8fa716
KW
2795 /* Copy the NUL-terminated 'string' to a buffer whose address before this
2796 * call began at *buf, and whose available length before this call was
2797 * *buf_size.
2798 *
2799 * If the length of 'string' is greater than the space available, the
2800 * buffer is grown accordingly, which may mean that it gets relocated.
2801 * *buf and *buf_size will be updated to reflect this.
2802 *
2803 * Regardless, the function returns a pointer to where 'string' is now
2804 * stored.
2805 *
2806 * 'string' may be NULL, which means no action gets taken, and NULL is
2807 * returned.
2808 *
2809 * If *buf or 'buf_size' are NULL or *buf_size is 0, the buffer is assumed
2810 * empty, and memory is malloc'd. 'buf-size' being NULL is to be used
2811 * when this is a single use buffer, which will shortly be freed by the
2812 * caller.
2813 */
f7416781 2814
0b6697c5 2815 Size_t string_size;
f7416781
KW
2816
2817 PERL_ARGS_ASSERT_SAVE_TO_BUFFER;
2818
0b6697c5
KW
2819 if (! string) {
2820 return NULL;
2821 }
2822
0b8fa716 2823 string_size = strlen(string) + 1;
0b6697c5 2824
b5644714
KW
2825 if (buf_size == NULL) {
2826 Newx(*buf, string_size, char);
2827 }
2828 else if (*buf_size == 0) {
f7416781
KW
2829 Newx(*buf, string_size, char);
2830 *buf_size = string_size;
2831 }
2832 else if (string_size > *buf_size) {
2833 Renew(*buf, string_size, char);
2834 *buf_size = string_size;
2835 }
2836
0b8fa716
KW
2837 {
2838 dTHX_DEBUGGING;
2839 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
2840 "Copying '%s' to %p\n",
2841 ((is_utf8_string((U8 *) string, 0))
2842 ? string
2843 :_byte_dump_string((U8 *) string, strlen(string), 0)),
2844 *buf));
2845 }
2846
14580ae6
KW
2847# ifdef DEBUGGING
2848
2849 /* Catch glitches. Usually this is because LC_CTYPE needs to be the same
2850 * locale as whatever is being worked on */
2851 if (UNLIKELY(instr(string, REPLACEMENT_CHARACTER_UTF8))) {
2852 dTHX_DEBUGGING;
2853
2854 locale_panic_(Perl_form(aTHX_
2855 "Unexpected REPLACEMENT_CHARACTER in '%s'\n%s",
2856 string, get_LC_ALL_display()));
2857 }
2858
2859# endif
2860
0b8fa716 2861 Copy(string, *buf, string_size, char);
f7416781
KW
2862 return *buf;
2863}
2864
d5efbc49
KW
2865STATIC utf8ness_t
2866S_get_locale_string_utf8ness_i(pTHX_ const char * locale,
2867 const unsigned cat_index,
2868 const char * string,
2869 const locale_utf8ness_t known_utf8)
2870{
2871 /* Return to indicate if 'string' in the locale given by the input
2872 * arguments should be considered UTF-8 or not.
2873 *
2874 * If the input 'locale' is not NULL, use that for the locale; otherwise
2875 * use the current locale for the category specified by 'cat_index'.
2876 */
2877
2878 Size_t len;
2879 const U8 * first_variant = NULL;
2880
2881 PERL_ARGS_ASSERT_GET_LOCALE_STRING_UTF8NESS_I;
2882 assert(cat_index <= NOMINAL_LC_ALL_INDEX);
2883
2884 if (string == NULL) {
2885 return UTF8NESS_NO;
2886 }
2887
2888 if (IN_BYTES) { /* respect 'use bytes' */
2889 return UTF8NESS_NO;
2890 }
2891
2892 len = strlen(string);
2893
2894 /* UTF8ness is immaterial if the representation doesn't vary */
2895 if (is_utf8_invariant_string_loc((U8 *) string, len, &first_variant)) {
2896 return UTF8NESS_IMMATERIAL;
2897 }
2898
2899 /* Can't be UTF-8 if invalid */
2900 if (! is_utf8_string((U8 *) first_variant,
2901 len - ((char *) first_variant - string)))
2902 {
2903 return UTF8NESS_NO;
2904 }
2905
2906 /* Here and below, we know the string is legal UTF-8, containing at least
2907 * one character requiring a sequence of two or more bytes. It is quite
2908 * likely to be UTF-8. But it pays to be paranoid and do further checking.
2909 *
2910 * If we already know the UTF-8ness of the locale, then we immediately know
2911 * what the string is */
2912 if (UNLIKELY(known_utf8 != LOCALE_UTF8NESS_UNKNOWN)) {
2913 if (known_utf8 == LOCALE_IS_UTF8) {
2914 return UTF8NESS_YES;
2915 }
2916 else {
2917 return UTF8NESS_NO;
2918 }
2919 }
2920
2921# if defined(HAS_SOME_LANGINFO) || defined(HAS_MBTOWC) || defined(HAS_MBRTOWC)
2922
2923 /* Here, we have available the libc functions that can be used to
2924 * accurately determine the UTF8ness of the underlying locale. If it is a
2925 * UTF-8 locale, the string is UTF-8; otherwise it was coincidental that
2926 * the string is legal UTF-8
2927 *
2928 * However, if the perl is compiled to not pay attention to the category
2929 * being passed in, you might think that that locale is essentially always
2930 * the C locale, so it would make sense to say it isn't UTF-8. But to get
2931 * here, the string has to contain characters unknown in the C locale. And
2932 * in fact, Windows boxes are compiled without LC_MESSAGES, as their
2933 * message catalog isn't really a part of the locale system. But those
2934 * messages really could be UTF-8, and given that the odds are rather small
2935 * of something not being UTF-8 but being syntactically valid UTF-8, khw
2936 * has decided to call such strings as UTF-8. */
2937
2938 if (locale == NULL) {
2939 locale = querylocale_i(cat_index);
2940 }
2941 if (is_locale_utf8(locale)) {
2942 return UTF8NESS_YES;
2943 }
2944
2945 return UTF8NESS_NO;
2946
2947# else
2948
2949 /* Here, we have a valid UTF-8 string containing non-ASCII characters, and
2950 * don't have access to functions to check if the locale is UTF-8 or not.
2951 * Assume that it is. khw tried adding a check that the string is entirely
2952 * in a single Unicode script, but discovered the strftime() timezone is
2953 * user-settable through the environment, which may be in a different
2954 * script than the locale-expected value. */
2955 PERL_UNUSED_ARG(locale);
2956 PERL_UNUSED_ARG(cat_index);
2957
2958 return UTF8NESS_YES;
2959
c728a1a6 2960# endif
faef259e 2961
d5efbc49
KW
2962}
2963
c728a1a6
KW
2964# ifdef WIN32
2965
2966bool
2967Perl_get_win32_message_utf8ness(pTHX_ const char * string)
2968{
2969 /* NULL => locale irrelevant, 0 => category irrelevant
2970 * so returns based on the UTF-8 legality of the input string, ignoring the
2971 * locale and category completely.
2972 *
2973 * This is because Windows doesn't have LC_MESSAGES */
2974 return get_locale_string_utf8ness_i(NULL, 0, string, LOCALE_IS_UTF8);
2975}
2976
2977# endif
d5efbc49
KW
2978#endif /* USE_LOCALE */
2979
c728a1a6 2980
d5f728f7
KW
2981int
2982Perl_mbtowc_(pTHX_ const wchar_t * pwc, const char * s, const Size_t len)
2983{
2984
2985#if ! defined(HAS_MBRTOWC) && ! defined(HAS_MBTOWC)
2986
2987 PERL_UNUSED_ARG(pwc);
2988 PERL_UNUSED_ARG(s);
2989 PERL_UNUSED_ARG(len);
2990 return -1;
2991
2992#else /* Below we have some form of mbtowc() */
2993# if defined(HAS_MBRTOWC) \
2994 && (defined(USE_LOCALE_THREADS) || ! defined(HAS_MBTOWC))
2995# define USE_MBRTOWC
2996# else
2997# undef USE_MBRTOWC
2998# endif
2999
3000 int retval = -1;
3001
3002 if (s == NULL) { /* Initialize the shift state to all zeros in
3003 PL_mbrtowc_ps. */
3004
3005# if defined(USE_MBRTOWC)
3006
3007 memzero(&PL_mbrtowc_ps, sizeof(PL_mbrtowc_ps));
3008 return 0;
3009
3010# else
3011
598718e2 3012 MBTOWC_LOCK_;
d5f728f7
KW
3013 SETERRNO(0, 0);
3014 retval = mbtowc(NULL, NULL, 0);
598718e2 3015 MBTOWC_UNLOCK_;
d5f728f7
KW
3016 return retval;
3017
3018# endif
3019
3020 }
3021
3022# if defined(USE_MBRTOWC)
3023
3024 SETERRNO(0, 0);
3025 retval = (SSize_t) mbrtowc((wchar_t *) pwc, s, len, &PL_mbrtowc_ps);
3026
3027# else
3028
3029 /* Locking prevents races, but locales can be switched out without locking,
3030 * so this isn't a cure all */
598718e2 3031 MBTOWC_LOCK_;
d5f728f7
KW
3032 SETERRNO(0, 0);
3033 retval = mbtowc((wchar_t *) pwc, s, len);
598718e2 3034 MBTOWC_UNLOCK_;
d5f728f7
KW
3035
3036# endif
3037
3038 return retval;
3039
3040#endif
3041
3042}
3043
f7416781 3044/*
b7f19487
KW
3045=for apidoc Perl_localeconv
3046
3047This is a thread-safe version of the libc L<localeconv(3)>. It is the same as
3048L<POSIX::localeconv|POSIX/localeconv> (returning a hash of the C<localeconv()>
3049fields), but directly callable from XS code.
3050
3051=cut
3052*/
3053
3054HV *
3055Perl_localeconv(pTHX)
3056{
3057
3058#if ! defined(HAS_SOME_LOCALECONV) \
3059 || (! defined(USE_LOCALE_MONETARY) && ! defined(USE_LOCALE_NUMERIC))
3060
3061 return newHV();
3062
3063#else
3064
0630bfb4 3065 return my_localeconv(0, LOCALE_UTF8NESS_UNKNOWN);
b7f19487
KW
3066
3067#endif
3068
3069}
3070
3071#if defined(HAS_SOME_LOCALECONV) \
3072 && (defined(USE_LOCALE_MONETARY) || defined(USE_LOCALE_NUMERIC))
3073
3074HV *
0630bfb4 3075S_my_localeconv(pTHX_ const int item, const locale_utf8ness_t locale_is_utf8)
b7f19487
KW
3076{
3077 HV * retval;
3078 locale_utf8ness_t numeric_locale_is_utf8 = LOCALE_UTF8NESS_UNKNOWN;
3079 locale_utf8ness_t monetary_locale_is_utf8 = LOCALE_UTF8NESS_UNKNOWN;
3080 HV * (*copy_localeconv)(pTHX_ const struct lconv *,
0630bfb4
KW
3081 const int,
3082 const locale_utf8ness_t,
3083 const locale_utf8ness_t);
b7f19487
KW
3084
3085 /* A thread-safe locale_conv(). The locking mechanisms vary greatly
3086 * depending on platform capabilities. They all share this common set up
3087 * code for the function, and then conditional compilations choose one of
3088 * several terminations.
3089 *
0630bfb4
KW
3090 * There are two use cases:
3091 * 1) Called from POSIX::locale_conv(). This returns lconv() copied to
b7f19487 3092 * a hash, based on the current underlying locale.
0630bfb4
KW
3093 * 2) Certain items that nl_langinfo() provides are also derivable from
3094 * the return of localeconv(). Windows notably doesn't have
3095 * nl_langinfo(), so on that, and actually any platform lacking it,
3096 * my_localeconv() is used to emulate it for those particular items.
3097 * The code to do this is compiled only on such platforms. Rather than
3098 * going to the expense of creating a full hash when only one item is
3099 * needed, just the desired item is returned, in an SV cast to an HV.
b7f19487 3100 *
0630bfb4
KW
3101 * There is a helper function to accomplish each of the two tasks. The
3102 * function pointer just below is set to the appropriate one, and is called
b7f19487 3103 * from each of the various implementations, in the middle of whatever
0630bfb4
KW
3104 * necessary locking/locale swapping have been done. */
3105
3106# ifdef HAS_SOME_LANGINFO
3107
3108 PERL_UNUSED_ARG(item);
3109 PERL_UNUSED_ARG(locale_is_utf8);
3110
3111# ifdef USE_LOCALE_NUMERIC
3112
3113 /* When there is a nl_langinfo, we will only be called for localeconv
3114 * numeric purposes. */
3115 const bool is_localeconv_call = true;
3116
3117# endif
3118
3119# else
3120
3121 /* Note we use this sentinel; this works because this only gets compiled
3122 * when our perl_langinfo.h is used, and that uses negative numbers for all
3123 * the items */
3124 const bool is_localeconv_call = (item == 0);
3125 if (is_localeconv_call)
3126
3127# endif
b7f19487
KW
3128
3129 {
3130 copy_localeconv = S_populate_localeconv;
3131
3132# ifdef USE_LOCALE_NUMERIC
3133
3134 /* Get the UTF8ness of the locales now to avoid repeating this for each
3135 * string returned by localeconv() */
3136 numeric_locale_is_utf8 = (is_locale_utf8(PL_numeric_name))
3137 ? LOCALE_IS_UTF8
3138 : LOCALE_NOT_UTF8;
3139
3140# endif
3141# ifdef USE_LOCALE_MONETARY
3142
3143 monetary_locale_is_utf8 = (is_locale_utf8(querylocale_c(LC_MONETARY)))
3144 ? LOCALE_IS_UTF8
3145 : LOCALE_NOT_UTF8;
3146
3147# endif
3148
3149 }
3150
0630bfb4
KW
3151# ifndef HAS_SOME_LANGINFO
3152
3153 else {
3154 copy_localeconv = S_get_nl_item_from_localeconv;
3155 numeric_locale_is_utf8 = locale_is_utf8;
3156 }
3157
3158# endif
3159
b7f19487
KW
3160 PERL_ARGS_ASSERT_MY_LOCALECONV;
3161/*--------------------------------------------------------------------------*/
3162/* Here, we are done with the common beginning of all the implementations of
3163 * my_localeconv(). Below are the various terminations of the function (except
3164 * the closing '}'. They are separated out because the preprocessor directives
3165 * were making the simple logic hard to follow. Each implementation ends with
3166 * the same few lines. khw decided to keep those separate because he thought
3167 * it was clearer to the reader.
3168 *
3169 * The first distinct termination (of the above common code) are the
3170 * implementations when we have locale_conv_l() and can use it. These are the
3171 * simplest cases, without any locking needed. */
3172# if defined(USE_POSIX_2008_LOCALE) && defined(HAS_LOCALECONV_L)
3173
3174 /* And there are two sub-cases: First (by far the most common) is where we
3175 * are compiled to pay attention to LC_NUMERIC */
3176# ifdef USE_LOCALE_NUMERIC
3177
3178 const locale_t cur = use_curlocale_scratch();
3179 locale_t with_numeric = duplocale(cur);
3180
3181 /* Just create a new locale object with what we've got, but using the
3182 * underlying LC_NUMERIC locale */
3183 with_numeric = newlocale(LC_NUMERIC_MASK, PL_numeric_name, with_numeric);
3184
3185 retval = copy_localeconv(aTHX_ localeconv_l(with_numeric),
0630bfb4 3186 item,
b7f19487
KW
3187 numeric_locale_is_utf8,
3188 monetary_locale_is_utf8);
3189 freelocale(with_numeric);
3190
3191 return retval;
3192
3193/*--------------------------------------------------------------------------*/
3194# else /* Below not paying attention to LC_NUMERIC */
3195
3196 const locale_t cur = use_curlocale_scratch();
3197
3198 retval = copy_localeconv(aTHX_ localeconv_l(cur),
0630bfb4 3199 item,
b7f19487
KW
3200 numeric_locale_is_utf8,
3201 monetary_locale_is_utf8);
3202 return retval;
3203
3204# endif /* Above, using lconv_l(); below plain lconv() */
3205/*--------------------------------------------------------------------------*/
3206# elif ! defined(TS_W32_BROKEN_LOCALECONV) /* Next is regular lconv() */
3207
f5988947
KW
3208 /* There are so many locks because localeconv() deals with two
3209 * categories, and returns in a single global static buffer. Some
3210 * locks might be no-ops on this platform, but not others. We need to
3211 * lock if any one isn't a no-op. */
b7f19487 3212
f5988947
KW
3213# ifdef USE_LOCALE_NUMERIC
3214
3215 LC_NUMERIC_LOCK(0);
0630bfb4
KW
3216 const char * orig_switched_locale = NULL;
3217
3218 /* When called internally, are already switched into the proper numeric
3219 * locale; otherwise must toggle to it */
3220 if (is_localeconv_call) {
3221 orig_switched_locale = toggle_locale_c(LC_NUMERIC, PL_numeric_name);
3222 }
f5988947
KW
3223
3224# endif
b7f19487 3225
a09a3b94 3226 gwLOCALE_LOCK;
0630bfb4
KW
3227 retval = copy_localeconv(aTHX_ localeconv(),
3228 item,
3229 numeric_locale_is_utf8,
3230 monetary_locale_is_utf8);
a09a3b94 3231 gwLOCALE_UNLOCK;
f5988947
KW
3232
3233# ifdef USE_LOCALE_NUMERIC
3234
0630bfb4
KW
3235 if (orig_switched_locale) {
3236 restore_toggled_locale_c(LC_NUMERIC, orig_switched_locale);
3237 }
f5988947
KW
3238 LC_NUMERIC_UNLOCK;
3239
3240# endif
b7f19487
KW
3241
3242 return retval;
3243
3244/*--------------------------------------------------------------------------*/
3245# else /* defined(TS_W32_BROKEN_LOCALECONV) */
3246
3247 /* Last is a workaround for the broken localeconv() on Windows with
3248 * thread-safe locales prior to VS 15. It looks at the global locale
3249 * instead of the thread one. As a work-around, we toggle to the global
3250 * locale; populate the return; then toggle back. We have to use LC_ALL
3251 * instead of the individual categories because of another bug in Windows.
3252 *
3253 * This introduces a potential race with any other thread that has also
3254 * converted to use the global locale, and doesn't protect its locale calls
3255 * with mutexes. khw can't think of any reason for a thread to do so on
3256 * Windows, as the locale API is the same regardless of thread-safety, except
3257 * if the code is ported from working on another platform where there might
3258 * be some reason to do this. But this is typically due to some
3259 * alien-to-Perl library that thinks it owns locale setting. Such a
3260 * library usn't likely to exist on Windows, so such an application is
3261 * unlikely to be run on Windows
3262 */
3263 bool restore_per_thread = FALSE;
b7f19487 3264
f5988947
KW
3265# ifdef USE_LOCALE_NUMERIC
3266
3267 const char * orig_switched_locale = NULL;
3268
3269 LC_NUMERIC_LOCK(0);
0630bfb4
KW
3270
3271 /* When called internally, are already switched into the proper numeric
3272 * locale; otherwise must toggle to it */
3273 if (is_localeconv_call) {
3274 orig_switched_locale = toggle_locale_c(LC_NUMERIC, PL_numeric_name);
3275 }
f5988947
KW
3276
3277# endif
b7f19487
KW
3278
3279 /* Save the per-thread locale state */
3280 const char * save_thread = querylocale_c(LC_ALL);
3281
3282 /* Change to the global locale, and note if we already were there */
3283 if (_configthreadlocale(_DISABLE_PER_THREAD_LOCALE)
3284 != _DISABLE_PER_THREAD_LOCALE)
3285 {
3286 restore_per_thread = TRUE;
3287 }
3288
3289 /* Save the state of the global locale; then convert to our desired
3290 * state. */
3291 const char * save_global = querylocale_c(LC_ALL);
3292 void_setlocale_c(LC_ALL, save_thread);
3293
3294 /* Safely stash the desired data */
a09a3b94 3295 gwLOCALE_LOCK;
0630bfb4
KW
3296 retval = copy_localeconv(aTHX_ localeconv(),
3297 item,
3298 numeric_locale_is_utf8,
3299 monetary_locale_is_utf8);
a09a3b94 3300 gwLOCALE_UNLOCK;
b7f19487
KW
3301
3302 /* Restore the global locale's prior state */
3303 void_setlocale_c(LC_ALL, save_global);
3304
3305 /* And back to per-thread locales */
3306 if (restore_per_thread) {
3307 _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
3308 }
3309
3310 /* Restore the per-thread locale state */
3311 void_setlocale_c(LC_ALL, save_thread);
3312
f5988947
KW
3313# ifdef USE_LOCALE_NUMERIC
3314
0630bfb4
KW
3315 if (orig_switched_locale) {
3316 restore_toggled_locale_c(LC_NUMERIC, orig_switched_locale);
3317 }
f5988947
KW
3318 LC_NUMERIC_UNLOCK;
3319
3320# endif
b7f19487
KW
3321
3322 return retval;
3323
3324# endif
3325/*--------------------------------------------------------------------------*/
3326}
3327
3328STATIC HV *
3329S_populate_localeconv(pTHX_ const struct lconv *lcbuf,
0630bfb4 3330 const int unused,
b7f19487
KW
3331 const locale_utf8ness_t numeric_locale_is_utf8,
3332 const locale_utf8ness_t monetary_locale_is_utf8)
3333{
3334 /* This returns a mortalized hash containing all the elements returned by
3335 * localeconv(). It is used by Perl_localeconv() and POSIX::localeconv()
3336 */
0630bfb4 3337 PERL_UNUSED_ARG(unused);
b7f19487
KW
3338
3339 struct lconv_offset {
3340 const char *name;
3341 size_t offset;
3342 };
3343
3344 /* Create e.g.,
3345 {"thousands_sep", STRUCT_OFFSET(struct lconv, thousands_sep)},
3346 */
3347# define LCONV_ENTRY(name) \
3348 {STRINGIFY(name), STRUCT_OFFSET(struct lconv, name)}
3349
3350 /* Set up structures containing the documented fields. One structure for
3351 * LC_NUMERIC-controlled strings; one for LC_MONETARY ones, and a final one
3352 * of just numerics. */
3353# ifdef USE_LOCALE_NUMERIC
3354
3355 static const struct lconv_offset lconv_numeric_strings[] = {
3356 LCONV_ENTRY(decimal_point),
3357 LCONV_ENTRY(thousands_sep),
3358# ifndef NO_LOCALECONV_GROUPING
3359 LCONV_ENTRY(grouping),
3360# endif
3361 {NULL, 0}
3362 };
3363
3364# endif
3365# ifdef USE_LOCALE_MONETARY
3366
3367 static const struct lconv_offset lconv_monetary_strings[] = {
3368 LCONV_ENTRY(int_curr_symbol),
3369 LCONV_ENTRY(currency_symbol),
3370 LCONV_ENTRY(mon_decimal_point),
3371# ifndef NO_LOCALECONV_MON_THOUSANDS_SEP
3372 LCONV_ENTRY(mon_thousands_sep),
3373# endif
3374# ifndef NO_LOCALECONV_MON_GROUPING
3375 LCONV_ENTRY(mon_grouping),
3376# endif
3377 LCONV_ENTRY(positive_sign),
3378 LCONV_ENTRY(negative_sign),
3379 {NULL, 0}
3380 };
3381
3382# endif
3383
3384 static const struct lconv_offset lconv_integers[] = {
3385# ifdef USE_LOCALE_MONETARY
3386 LCONV_ENTRY(int_frac_digits),
3387 LCONV_ENTRY(frac_digits),
3388 LCONV_ENTRY(p_cs_precedes),
3389 LCONV_ENTRY(p_sep_by_space),
3390 LCONV_ENTRY(n_cs_precedes),
3391 LCONV_ENTRY(n_sep_by_space),
3392 LCONV_ENTRY(p_sign_posn),
3393 LCONV_ENTRY(n_sign_posn),
3394# ifdef HAS_LC_MONETARY_2008
3395 LCONV_ENTRY(int_p_cs_precedes),
3396 LCONV_ENTRY(int_p_sep_by_space),
3397 LCONV_ENTRY(int_n_cs_precedes),
3398 LCONV_ENTRY(int_n_sep_by_space),
3399 LCONV_ENTRY(int_p_sign_posn),
3400 LCONV_ENTRY(int_n_sign_posn),
3401# endif
3402# endif
3403 {NULL, 0}
3404 };
3405
3406 static const unsigned category_indices[] = {
3407# ifdef USE_LOCALE_NUMERIC
3408 LC_NUMERIC_INDEX_,
3409# endif
3410# ifdef USE_LOCALE_MONETARY
3411 LC_MONETARY_INDEX_,
3412# endif
3413 (unsigned) -1 /* Just so the previous element can always end with a
3414 comma => subtract 1 below for the max loop index */
3415 };
3416
3417 const char *ptr = (const char *) lcbuf;
3418 const struct lconv_offset *integers = lconv_integers;
3419
3420 HV * retval = newHV();
3421 sv_2mortal((SV*)retval);
3422
3423 PERL_ARGS_ASSERT_POPULATE_LOCALECONV;
3424
3425 /* For each enabled category ... */
3426 for (PERL_UINT_FAST8_T i = 0; i < C_ARRAY_LENGTH(category_indices) - 1; i++) {
3427 const unsigned cat_index = category_indices[i];
3428 locale_utf8ness_t locale_is_utf8 = LOCALE_UTF8NESS_UNKNOWN;
3429 const char *locale;
3430
3431 /* ( = NULL silences a compiler warning; would segfault if it could
3432 * actually happen.) */
3433 const struct lconv_offset *strings = NULL;
3434
3435# ifdef USE_LOCALE_NUMERIC
3436 if (cat_index == LC_NUMERIC_INDEX_) {
3437 locale_is_utf8 = numeric_locale_is_utf8;
3438 strings = lconv_numeric_strings;
3439 }
0630bfb4
KW
3440# else
3441 PERL_UNUSED_ARG(numeric_locale_is_utf8);
b7f19487
KW
3442# endif
3443# ifdef USE_LOCALE_MONETARY
3444 if (cat_index == LC_MONETARY_INDEX_) {
3445 locale_is_utf8 = monetary_locale_is_utf8;
3446 strings = lconv_monetary_strings;
3447 }
0630bfb4
KW
3448# else
3449 PERL_UNUSED_ARG(monetary_locale_is_utf8);
b7f19487
KW
3450# endif
3451
3452 assert(locale_is_utf8 != LOCALE_UTF8NESS_UNKNOWN);
3453
3454 /* Iterate over the strings structure for this category */
0630bfb4 3455 locale = querylocale_i(cat_index);
b7f19487
KW
3456
3457 while (strings->name) {
3458 const char *value = *((const char **)(ptr + strings->offset));
3459 if (value && *value) {
3460 bool is_utf8 = /* Only make UTF-8 if required to */
3461 (UTF8NESS_YES == (get_locale_string_utf8ness_i(locale,
3462 cat_index,
3463 value,
3464 locale_is_utf8)));
3465 (void) hv_store(retval,
3466 strings->name,
3467 strlen(strings->name),
3468 newSVpvn_utf8(value, strlen(value), is_utf8),
3469 0);
3470 }
3471
3472 strings++;
3473 }
3474 }
3475
3476 while (integers->name) {
3477 const char value = *((const char *)(ptr + integers->offset));
3478
3479 if (value != CHAR_MAX)
3480 (void) hv_store(retval, integers->name,
3481 strlen(integers->name), newSViv(value), 0);
3482 integers++;
3483 }
3484
3485 return retval;
3486}
3487
0630bfb4
KW
3488# ifndef HAS_SOME_LANGINFO
3489
3490STATIC HV *
3491S_get_nl_item_from_localeconv(pTHX_ const struct lconv *lcbuf,
3492 const int item,
3493 const locale_utf8ness_t unused1,
3494 const locale_utf8ness_t unused2)
3495{
3496 /* This is a helper function for my_localeconv(), which is called from
3497 * my_langinfo() to emulate the libc nl_langinfo() function on platforms
3498 * that don't have it available.
3499 *
3500 * This function acts as an extension to my_langinfo(), the intermediate
3501 * my_localeconv() call is to set up the locks and switch into the proper
3502 * locale. That logic exists for other reasons, and by doing it this way,
3503 * it doesn't have to be duplicated.
3504 *
3505 * This function extracts the current value of 'item' in the current locale
3506 * using the localconv() result also passed in, via 'lcbuf'. The other
3507 * parameter is unused, a placeholder so the signature of this function
3508 * matches another that does need it, and so the two functions can be
3509 * referred to by a single function pointer, to simplify the code below */
3510
3511 const char * prefix = "";
3512 const char * temp = NULL;
3513
3514 PERL_ARGS_ASSERT_GET_NL_ITEM_FROM_LOCALECONV;
3515 PERL_UNUSED_ARG(unused1);
3516 PERL_UNUSED_ARG(unused2);
3517
3518 switch (item) {
3519 case CRNCYSTR:
3520 temp = lcbuf->currency_symbol;
3521
3522 if (lcbuf->p_cs_precedes) {
3523
3524 /* khw couldn't find any documentation that CHAR_MAX is the signal,
3525 * but cygwin uses it thusly */
3526 if (lcbuf->p_cs_precedes == CHAR_MAX) {
3527 prefix = ".";
3528 }
3529 else {
3530 prefix = "-";
3531 }
3532 }
3533 else {
3534 prefix = "+";
3535 }
3536
3537 break;
3538
3539 case RADIXCHAR:
3540 temp = lcbuf->decimal_point;
3541 break;
3542
3543 case THOUSEP:
3544 temp = lcbuf->thousands_sep;
3545 break;
3546
3547 default:
3548 locale_panic_(Perl_form(aTHX_
3549 "Unexpected item passed to populate_localeconv: %d", item));
3550 }
3551
3552 return (HV *) Perl_newSVpvf(aTHX_ "%s%s", prefix, temp);
3553}
3554
3555# endif /* ! Has some form of langinfo() */
b7f19487
KW
3556#endif /* Has some form of localeconv() and paying attn to a category it
3557 traffics in */
3558
d0826532
KW
3559#ifndef HAS_SOME_LANGINFO
3560
3561typedef int nl_item; /* Substitute 'int' for emulated nl_langinfo() */
3562
3563#endif
3564
b7f19487 3565/*
f7416781 3566
d0826532
KW
3567=for apidoc Perl_langinfo
3568=for apidoc_item Perl_langinfo8
3569
3570C<Perl_langinfo> is an (almost) drop-in replacement for the system
3571C<L<nl_langinfo(3)>>, taking the same C<item> parameter values, and returning
3572the same information. But it is more thread-safe than regular
3573C<nl_langinfo()>, and hides the quirks of Perl's locale handling from your
3574code, and can be used on systems that lack a native C<nl_langinfo>.
f7416781 3575
d0826532
KW
3576However, you should instead use the improved version of this:
3577L</Perl_langinfo8>, which behaves identically except for an additional
3578parameter, a pointer to a variable declared as L</C<utf8ness_t>>, into which it
3579returns to you how you should treat the returned string with regards to it
3580being encoded in UTF-8 or not.
f7416781 3581
d0826532 3582Concerning the differences between these and plain C<nl_langinfo()>:
f7416781
KW
3583
3584=over
3585
d0826532 3586=item a.
f7416781 3587
d0826532
KW
3588C<Perl_langinfo8> has an extra parameter, described above. Besides this, the
3589other reasons they aren't quite a drop-in replacement is actually an advantage.
3590The C<const>ness of the return allows the compiler to catch attempts to write
3591into the returned buffer, which is illegal and could cause run-time crashes.
ce181bf6 3592
d0826532 3593=item b.
ce181bf6 3594
d0826532 3595They deliver the correct results for the C<RADIXCHAR> and C<THOUSEP> items,
f7416781
KW
3596without you having to write extra code. The reason for the extra code would be
3597because these are from the C<LC_NUMERIC> locale category, which is normally
9487427b
KW
3598kept set by Perl so that the radix is a dot, and the separator is the empty
3599string, no matter what the underlying locale is supposed to be, and so to get
3600the expected results, you have to temporarily toggle into the underlying
3601locale, and later toggle back. (You could use plain C<nl_langinfo> and
3602C<L</STORE_LC_NUMERIC_FORCE_TO_UNDERLYING>> for this but then you wouldn't get
3603the other advantages of C<Perl_langinfo()>; not keeping C<LC_NUMERIC> in the C
3604(or equivalent) locale would break a lot of CPAN, which is expecting the radix
3605(decimal point) character to be a dot.)
ce181bf6 3606
d0826532 3607=item c.
ce181bf6 3608
d0826532 3609The system function they replace can have its static return buffer trashed,
f1460a66 3610not only by a subsequent call to that function, but by a C<freelocale>,
d0826532
KW
3611C<setlocale>, or other locale change. The returned buffer of these functions
3612is not changed until the next call to one or the other, so the buffer is never
3613in a trashed state.
f7416781 3614
d0826532 3615=item d.
f7416781 3616
d0826532
KW
3617The return buffer is per-thread, so it also is never overwritten by a call to
3618these functions from another thread; unlike the function it replaces.
ce181bf6 3619
d0826532 3620=item e.
ce181bf6 3621
d0826532
KW
3622But most importantly, they work on systems that don't have C<nl_langinfo>, such
3623as Windows, hence making your code more portable. Of the fifty-some possible
ce181bf6 3624items specified by the POSIX 2008 standard,
f7416781 3625L<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/langinfo.h.html>,
8d72e74e 3626only one is completely unimplemented, though on non-Windows platforms, another
d0826532 3627significant one is not fully implemented). They use various techniques to
8d72e74e
KW
3628recover the other items, including calling C<L<localeconv(3)>>, and
3629C<L<strftime(3)>>, both of which are specified in C89, so should be always be
3630available. Later C<strftime()> versions have additional capabilities; C<""> is
d0826532 3631returned for any item not available on your system.
ce181bf6 3632
d0826532 3633It is important to note that, when called with an item that is recovered by
ce181bf6 3634using C<localeconv>, the buffer from any previous explicit call to
d0826532
KW
3635C<L<localeconv(3)>> will be overwritten. But you shouldn't be using
3636C<localeconv> anyway because it is is very much not thread-safe, and suffers
3637from the same problems outlined in item 'b.' above for the fields it returns that
3638are controlled by the LC_NUMERIC locale category. Instead, avoid all of those
3639problems by calling L</Perl_localeconv>, which is thread-safe; or by using the
3640methods given in L<perlcall> to call
3641L<C<POSIX::localeconv()>|POSIX/localeconv>, which is also thread-safe.
3642
3643=back
9f0bc911 3644
6201c220
KW
3645The details for those items which may deviate from what this emulation returns
3646and what a native C<nl_langinfo()> would return are specified in
3647L<I18N::Langinfo>.
f7416781 3648
d0826532
KW
3649When using C<Perl_langinfo8> (or plain C<Perl_langinfo>) on systems that don't
3650have a native C<nl_langinfo()>, you must
f7416781
KW
3651
3652 #include "perl_langinfo.h"
3653
3654before the C<perl.h> C<#include>. You can replace your C<langinfo.h>
3655C<#include> with this one. (Doing it this way keeps out the symbols that plain
c3997e39
KW
3656C<langinfo.h> would try to import into the namespace for code that doesn't need
3657it.)
f7416781 3658
f7416781
KW
3659=cut
3660
3661*/
3662
3663const char *
f7416781 3664Perl_langinfo(const nl_item item)
f7416781 3665{
d0826532
KW
3666 return Perl_langinfo8(item, NULL);
3667}
3668
3669const char *
3670Perl_langinfo8(const nl_item item, utf8ness_t * utf8ness)
3671{
07492de5 3672 dTHX;
fc46dd0d
KW
3673 unsigned cat_index;
3674
d0826532
KW
3675 PERL_ARGS_ASSERT_PERL_LANGINFO8;
3676
3677 if (utf8ness) { /* Assume for now */
3678 *utf8ness = UTF8NESS_IMMATERIAL;
3679 }
3680
fc46dd0d
KW
3681 /* Find the locale category that controls the input 'item'. If we are not
3682 * paying attention to that category, instead return a default value. Also
3683 * return the default value if there is no way for us to figure out the
3684 * correct value. If we have some form of nl_langinfo(), we can always
3685 * figure it out, but lacking that, there may be alternative methods that
3686 * can be used to recover most of the possible items. Some of those
3687 * methods need libc functions, which may or may not be available. If
3688 * unavailable, we can't compute the correct value, so must here return the
3689 * default. */
faef259e 3690 switch (item) {
fc46dd0d
KW
3691
3692 case CODESET:
faef259e
KW
3693
3694#ifdef USE_LOCALE_CTYPE
faef259e 3695
fc46dd0d
KW
3696 cat_index = LC_CTYPE_INDEX_;
3697 break;
faef259e 3698
fc46dd0d
KW
3699#else
3700 return C_codeset;
faef259e
KW
3701#endif
3702#if defined(USE_LOCALE_MESSAGES) && defined(HAS_SOME_LANGINFO)
faef259e 3703
fc46dd0d
KW
3704 case YESEXPR: case YESSTR: case NOEXPR: case NOSTR:
3705 cat_index = LC_MESSAGES_INDEX_;
3706 break;
3707#else
faef259e
KW
3708 case YESEXPR: return "^[+1yY]";
3709 case YESSTR: return "yes";
3710 case NOEXPR: return "^[-0nN]";
3711 case NOSTR: return "no";
faef259e 3712#endif
fc46dd0d
KW
3713
3714 case CRNCYSTR:
3715
faef259e
KW
3716#if defined(USE_LOCALE_MONETARY) \
3717 && (defined(HAS_SOME_LANGINFO) || defined(HAS_SOME_LOCALECONV))
faef259e 3718
fc46dd0d
KW
3719 cat_index = LC_MONETARY_INDEX_;
3720 break;
3721#else
faef259e 3722 return "-";
faef259e 3723#endif
fc76ce84 3724
fc46dd0d
KW
3725 case RADIXCHAR:
3726
fc76ce84 3727#ifdef CAN_CALCULATE_RADIX
faef259e 3728
fc46dd0d
KW
3729 cat_index = LC_NUMERIC_INDEX_;
3730 break;
3731#else
faef259e 3732 return C_decimal_point;
fc76ce84 3733#endif
fc46dd0d
KW
3734
3735 case THOUSEP:
3736
fc76ce84
KW
3737#if defined(USE_LOCALE_NUMERIC) \
3738 && (defined(HAS_SOME_LANGINFO) || defined(HAS_SOME_LOCALECONV))
fc46dd0d
KW
3739
3740 cat_index = LC_NUMERIC_INDEX_;
3741 break;
fc76ce84 3742#else
faef259e 3743 return C_thousands_sep;
fc46dd0d
KW
3744#endif
3745
3746/* The other possible items are all in LC_TIME. */
3747#ifdef USE_LOCALE_TIME
3748
3749 default:
3750 cat_index = LC_TIME_INDEX_;
3751 break;
faef259e
KW
3752
3753#endif
3754#if ! defined(USE_LOCALE_TIME) || ! defined(HAS_SOME_LANGINFO)
3755
3756 /* If not using LC_TIME, hard code the rest. Or, if there is no
3757 * nl_langinfo(), we use strftime() as an alternative, and it is missing
3758 * functionality to get every single one, so hard-code those */
3759
3760 case ERA: return ""; /* Unimplemented; for use with strftime() %E
3761 modifier */
3762
3763 /* These formats are defined by C89, so we assume that strftime supports
3764 * them, and so are returned unconditionally; they may not be what the
3765 * locale actually says, but should give good enough results for someone
3766 * using them as formats (as opposed to trying to parse them to figure
3767 * out what the locale says). The other format items are actually tested
3768 * to verify they work on the platform */
3769 case D_FMT: return "%x";
3770 case T_FMT: return "%X";
3771 case D_T_FMT: return "%c";
3772
3773# if defined(WIN32) || ! defined(USE_LOCALE_TIME)
3774
3775 /* strftime() on Windows doesn't have the POSIX (beyond C89) extensions
3776 * that would allow it to recover these */
3777 case ERA_D_FMT: return "%x";
3778 case ERA_T_FMT: return "%X";
3779 case ERA_D_T_FMT: return "%c";
3780 case ALT_DIGITS: return "0";
3781
3782# endif
fc46dd0d 3783# ifndef USE_LOCALE_TIME
faef259e
KW
3784
3785 case T_FMT_AMPM: return "%r";
3786 case ABDAY_1: return "Sun";
3787 case ABDAY_2: return "Mon";
3788 case ABDAY_3: return "Tue";
3789 case ABDAY_4: return "Wed";
3790 case ABDAY_5: return "Thu";
3791 case ABDAY_6: return "Fri";
3792 case ABDAY_7: return "Sat";
3793 case AM_STR: return "AM";
3794 case PM_STR: return "PM";
3795 case ABMON_1: return "Jan";
3796 case ABMON_2: return "Feb";
3797 case ABMON_3: return "Mar";
3798 case ABMON_4: return "Apr";
3799 case ABMON_5: return "May";
3800 case ABMON_6: return "Jun";
3801 case ABMON_7: return "Jul";
3802 case ABMON_8: return "Aug";
3803 case ABMON_9: return "Sep";
3804 case ABMON_10: return "Oct";
3805 case ABMON_11: return "Nov";
3806 case ABMON_12: return "Dec";
3807 case DAY_1: return "Sunday";
3808 case DAY_2: return "Monday";
3809 case DAY_3: return "Tuesday";
3810 case DAY_4: return "Wednesday";
3811 case DAY_5: return "Thursday";
3812 case DAY_6: return "Friday";
3813 case DAY_7: return "Saturday";
3814 case MON_1: return "January";
3815 case MON_2: return "February";
3816 case MON_3: return "March";
3817 case MON_4: return "April";
3818 case MON_5: return "May";
3819 case MON_6: return "June";
3820 case MON_7: return "July";
3821 case MON_8: return "August";
3822 case MON_9: return "September";
3823 case MON_10: return "October";
3824 case MON_11: return "November";
3825 case MON_12: return "December";
3826
fc46dd0d 3827# endif
faef259e
KW
3828#endif
3829
3830 } /* End of switch on item */
3831
3832#ifndef USE_LOCALE
3833
3834 Perl_croak_nocontext("panic: Unexpected nl_langinfo() item %d", item);
3835 NOT_REACHED; /* NOTREACHED */
fc46dd0d 3836 PERL_UNUSED_VAR(cat_index);
faef259e
KW
3837
3838#else
fc46dd0d
KW
3839# ifdef USE_LOCALE_NUMERIC
3840
3841 /* Use either the underlying numeric, or the other underlying categories */
3842 if (cat_index == LC_NUMERIC_INDEX_) {
3843 return my_langinfo_c(item, LC_NUMERIC, PL_numeric_name,
d0826532 3844 &PL_langinfo_buf, &PL_langinfo_bufsize, utf8ness);
fc46dd0d
KW
3845 }
3846 else
3847
3848# endif
faef259e 3849
fc46dd0d
KW
3850 {
3851 return my_langinfo_i(item, cat_index, querylocale_i(cat_index),
d0826532 3852 &PL_langinfo_buf, &PL_langinfo_bufsize, utf8ness);
fc46dd0d 3853 }
faef259e
KW
3854
3855#endif
3856
f61748ac
KW
3857}
3858
faef259e
KW
3859#ifdef USE_LOCALE
3860
07492de5
KW
3861/* There are several implementations of my_langinfo, depending on the
3862 * Configuration. They all share the same beginning of the function */
59a15af0 3863STATIC const char *
fc46dd0d 3864S_my_langinfo_i(pTHX_
07492de5 3865 const nl_item item, /* The item to look up */
fc46dd0d
KW
3866 const unsigned int cat_index, /* The locale category that
3867 controls it */
3868 /* The locale to look up 'item' in. */
3869 const char * locale,
07492de5
KW
3870
3871 /* Where to store the result, and where the size of that buffer
3872 * is stored, updated on exit. retbuf_sizep may be NULL for an
3873 * empty-on-entry, single use buffer whose size we don't need
3874 * to keep track of */
3875 const char ** retbufp,
fffcb78f
KW
3876 Size_t * retbuf_sizep,
3877
3878 /* If not NULL, the location to store the UTF8-ness of 'item's
3879 * value, as documented */
3880 utf8ness_t * utf8ness)
f61748ac 3881{
07492de5 3882 const char * retval = NULL;
faef259e 3883
fc46dd0d
KW
3884 PERL_ARGS_ASSERT_MY_LANGINFO_I;
3885 assert(cat_index <= NOMINAL_LC_ALL_INDEX);
5a854ab3 3886
fc46dd0d
KW
3887 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
3888 "Entering my_langinfo item=%d, using locale %s\n",
3889 item, locale));
fe1c1494
KW
3890/*--------------------------------------------------------------------------*/
3891/* Above is the common beginning to all the implementations of my_langinfo().
332c5a2a
KW
3892 * Below are the various completions.
3893 *
3894 * Some platforms don't deal well with non-ASCII strings in locale X when
3895 * LC_CTYPE is not in X. (Actually it is probably when X is UTF-8 and LC_CTYPE
3896 * isn't, or vice versa). There is explicit code to bring the categories into
3897 * sync. This doesn't seem to be a problem with nl_langinfo(), so that
3898 * implementation doesn't currently worry about it. But it is a problem on
3899 * Windows boxes, which don't have nl_langinfo(). */
3900
fc46dd0d 3901# if defined(HAS_THREAD_SAFE_NL_LANGINFO_L) && defined(USE_POSIX_2008_LOCALE)
f7416781 3902
332c5a2a
KW
3903 /* Simplest is if we can use nl_langinfo_l()
3904 *
3905 * With it, we can change LC_CTYPE in the same call as the other category */
3906# ifdef USE_LOCALE_CTYPE
3907# define CTYPE_SAFETY_MASK LC_CTYPE_MASK
3908# else
3909# define CTYPE_SAFETY_MASK 0
3910# endif
f7416781 3911
332c5a2a
KW
3912 locale_t cur = newlocale((category_masks[cat_index] | CTYPE_SAFETY_MASK),
3913 locale, (locale_t) 0);
ee90eb2d 3914
fc46dd0d 3915 retval = save_to_buffer(nl_langinfo_l(item, cur), retbufp, retbuf_sizep);
fffcb78f
KW
3916 if (utf8ness) {
3917 *utf8ness = get_locale_string_utf8ness_i(locale, cat_index, retval,
3918 LOCALE_UTF8NESS_UNKNOWN);
3919 }
5acc4454 3920
fc46dd0d 3921 freelocale(cur);
07492de5
KW
3922
3923 return retval;
fe1c1494 3924/*--------------------------------------------------------------------------*/
fc46dd0d 3925# elif defined(HAS_NL_LANGINFO) /* nl_langinfo() is available. */
ab340fff 3926
fc46dd0d 3927 /* The second version of my_langinfo() is if we have plain nl_langinfo() */
d36adde0 3928
332c5a2a
KW
3929# ifdef USE_LOCALE_CTYPE
3930
3931 /* Ths function sorts out if things actually have to be switched or not,
3932 * for both calls. */
3933 const char * orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, locale);
3934
3935# endif
3936
fc46dd0d 3937 const char * orig_switched_locale = toggle_locale_i(cat_index, locale);
ab340fff 3938
4bd49302 3939 gwLOCALE_LOCK;
fc46dd0d 3940 retval = save_to_buffer(nl_langinfo(item), retbufp, retbuf_sizep);
4bd49302 3941 gwLOCALE_UNLOCK;
d36adde0 3942
fffcb78f
KW
3943 if (utf8ness) {
3944 *utf8ness = get_locale_string_utf8ness_i(locale, cat_index,
3945 retval, LOCALE_UTF8NESS_UNKNOWN);
3946 }
3947
fc46dd0d 3948 restore_toggled_locale_i(cat_index, orig_switched_locale);
ab340fff 3949
332c5a2a
KW
3950# ifdef USE_LOCALE_CTYPE
3951 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
3952# endif
3953
0b6697c5 3954 return retval;
fe1c1494
KW
3955/*--------------------------------------------------------------------------*/
3956# else /* Below, emulate nl_langinfo as best we can */
43dd6b15 3957
fc46dd0d
KW
3958 /* And the third and final completion is where we have to emulate
3959 * nl_langinfo(). There are various possibilities depending on the
3960 * Configuration */
69c5e0db 3961
332c5a2a
KW
3962# ifdef USE_LOCALE_CTYPE
3963
3964 const char * orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, locale);
3965
3966# endif
f7416781 3967
fc46dd0d
KW
3968 const char * orig_switched_locale = toggle_locale_i(cat_index, locale);
3969
3970 /* Here, we are in the locale we want information about */
3971
fffcb78f
KW
3972 /* Almost all the items will have ASCII return values. Set that here, and
3973 * override if necessary */
3974 utf8ness_t is_utf8 = UTF8NESS_IMMATERIAL;
3975
fc46dd0d
KW
3976 switch (item) {
3977 default:
c8aea264 3978 assert(item < 0); /* Make sure using perl_langinfo.h */
fc46dd0d
KW
3979 retval = "";
3980 break;
3981
fc76ce84
KW
3982 case RADIXCHAR:
3983
3984# if defined(HAS_SNPRINTF) \
3985 && (! defined(HAS_SOME_LOCALECONV) || defined(TS_W32_BROKEN_LOCALECONV))
69c5e0db 3986
ea80ef5d 3987 {
fc76ce84
KW
3988 /* snprintf() can be used to find the radix character by outputting
3989 * a known simple floating point number to a buffer, and parsing
3990 * it, inferring the radix as the bytes separating the integer and
3991 * fractional parts. But localeconv() is more direct, not
3992 * requiring inference, so use it instead of the code just below,
3993 * if (likely) it is available and works ok */
69c5e0db 3994
ea80ef5d
KW
3995 char * floatbuf = NULL;
3996 const Size_t initial_size = 10;
69c5e0db 3997
ea80ef5d 3998 Newx(floatbuf, initial_size, char);
692d08c5
KW
3999
4000 /* 1.5 is exactly representable on binary computers */
ea80ef5d 4001 Size_t needed_size = snprintf(floatbuf, initial_size, "%.1f", 1.5);
692d08c5
KW
4002
4003 /* If our guess wasn't big enough, increase and try again, based on
4004 * the real number that strnprintf() is supposed to return */
4005 if (UNLIKELY(needed_size >= initial_size)) {
ea80ef5d
KW
4006 needed_size++; /* insurance */
4007 Renew(floatbuf, needed_size, char);
4008 Size_t new_needed = snprintf(floatbuf, needed_size, "%.1f", 1.5);
4009 assert(new_needed <= needed_size);
4010 needed_size = new_needed;
4011 }
69c5e0db 4012
ea80ef5d
KW
4013 char * s = floatbuf;
4014 char * e = floatbuf + needed_size;
69c5e0db 4015
ea80ef5d
KW
4016 /* Find the '1' */
4017 while (s < e && *s != '1') {
4018 s++;
4019 }
eb1b97ad 4020
ea80ef5d
KW
4021 if (LIKELY(s < e)) {
4022 s++;
eb1b97ad 4023 }
69c5e0db 4024
ea80ef5d
KW
4025 /* Find the '5' */
4026 char * item_start = s;
4027 while (s < e && *s != '5') {
4028 s++;
4029 }
69c5e0db 4030
ea80ef5d 4031 /* Everything in between is the radix string */
692d08c5 4032 if (LIKELY(s < e)) {
ea80ef5d
KW
4033 *s = '\0';
4034 retval = save_to_buffer(item_start,
4035 (const char **) &PL_langinfo_buf,
4036 &PL_langinfo_bufsize);
4037 Safefree(floatbuf);
fffcb78f
KW
4038
4039 if (utf8ness) {
4040 is_utf8 = get_locale_string_utf8ness_i(locale, cat_index,
4041 retval,
4042 LOCALE_UTF8NESS_UNKNOWN);
4043
4044 }
4045
69c5e0db 4046 break;
ea80ef5d
KW
4047 }
4048
4049 Safefree(floatbuf);
4050 }
69c5e0db 4051
fc76ce84
KW
4052# ifdef HAS_SOME_LOCALECONV /* snprintf() failed; drop down to use
4053 localeconv() */
4054
4055 /* FALLTHROUGH */ \
69c5e0db 4056
fc76ce84 4057# else /* snprintf() failed and no localeconv() */
69c5e0db 4058
fc46dd0d
KW
4059 retval = C_decimal_point;
4060 break;
fc76ce84
KW
4061
4062# endif
69c5e0db 4063# endif
fc76ce84 4064# ifdef HAS_SOME_LOCALECONV
69c5e0db 4065
692d08c5
KW
4066 /* These items are available from localeconv(). (To avoid using
4067 * TS_W32_BROKEN_LOCALECONV, one could use GetNumberFormat and
4068 * GetCurrencyFormat; patches welcome) */
5acc4454 4069
0630bfb4
KW
4070 case CRNCYSTR:
4071 case THOUSEP:
4072 {
4073 SV * string = (SV *) my_localeconv(item, LOCALE_UTF8NESS_UNKNOWN);
69c5e0db 4074
0630bfb4 4075 retval = save_to_buffer(SvPV_nolen(string), retbufp, retbuf_sizep);
5acc4454 4076
fffcb78f
KW
4077 if (utf8ness) {
4078 is_utf8 = get_locale_string_utf8ness_i(locale, cat_index, retval,
4079 LOCALE_UTF8NESS_UNKNOWN);
4080 }
4081
0630bfb4 4082 SvREFCNT_dec_NN(string);
fffcb78f 4083 break;
0630bfb4 4084 }
f7416781 4085
909e71b1
KW
4086# endif /* Some form of localeconv */
4087# ifdef HAS_STRFTIME
4088
4089 /* These formats are only available in later strfmtime's */
4090 case ERA_D_FMT: case ERA_T_FMT: case ERA_D_T_FMT: case T_FMT_AMPM:
4091
4092 /* The rest can be gotten from most versions of strftime(). */
4093 case ABDAY_1: case ABDAY_2: case ABDAY_3:
4094 case ABDAY_4: case ABDAY_5: case ABDAY_6: case ABDAY_7:
4095 case ALT_DIGITS:
4096 case AM_STR: case PM_STR:
4097 case ABMON_1: case ABMON_2: case ABMON_3: case ABMON_4:
4098 case ABMON_5: case ABMON_6: case ABMON_7: case ABMON_8:
4099 case ABMON_9: case ABMON_10: case ABMON_11: case ABMON_12:
4100 case DAY_1: case DAY_2: case DAY_3: case DAY_4:
4101 case DAY_5: case DAY_6: case DAY_7:
4102 case MON_1: case MON_2: case MON_3: case MON_4:
4103 case MON_5: case MON_6: case MON_7: case MON_8:
4104 case MON_9: case MON_10: case MON_11: case MON_12:
dd8a7039
KW
4105 {
4106 const char * format;
4107 bool return_format = FALSE;
4108 int mon = 0;
4109 int mday = 1;
4110 int hour = 6;
639a7c35 4111
909e71b1 4112 GCC_DIAG_IGNORE_STMT(-Wimplicit-fallthrough);
639a7c35 4113
909e71b1
KW
4114 switch (item) {
4115 default:
20ad7b23 4116 locale_panic_(Perl_form(aTHX_ "switch case: %d problem", item));
909e71b1
KW
4117 NOT_REACHED; /* NOTREACHED */
4118
4119 case PM_STR: hour = 18;
4120 case AM_STR:
4121 format = "%p";
4122 break;
4123 case ABDAY_7: mday++;
4124 case ABDAY_6: mday++;
4125 case ABDAY_5: mday++;
4126 case ABDAY_4: mday++;
4127 case ABDAY_3: mday++;
4128 case ABDAY_2: mday++;
4129 case ABDAY_1:
4130 format = "%a";
4131 break;
4132 case DAY_7: mday++;
4133 case DAY_6: mday++;
4134 case DAY_5: mday++;
4135 case DAY_4: mday++;
4136 case DAY_3: mday++;
4137 case DAY_2: mday++;
4138 case DAY_1:
4139 format = "%A";
4140 break;
4141 case ABMON_12: mon++;
4142 case ABMON_11: mon++;
4143 case ABMON_10: mon++;
4144 case ABMON_9: mon++;
4145 case ABMON_8: mon++;
4146 case ABMON_7: mon++;
4147 case ABMON_6: mon++;
4148 case ABMON_5: mon++;
4149 case ABMON_4: mon++;
4150 case ABMON_3: mon++;
4151 case ABMON_2: mon++;
4152 case ABMON_1:
4153 format = "%b";
4154 break;
4155 case MON_12: mon++;
4156 case MON_11: mon++;
4157 case MON_10: mon++;
4158 case MON_9: mon++;
4159 case MON_8: mon++;
4160 case MON_7: mon++;
4161 case MON_6: mon++;
4162 case MON_5: mon++;
4163 case MON_4: mon++;
4164 case MON_3: mon++;
4165 case MON_2: mon++;
4166 case MON_1:
4167 format = "%B";
4168 break;
4169 case T_FMT_AMPM:
4170 format = "%r";
4171 return_format = TRUE;
4172 break;
4173 case ERA_D_FMT:
4174 format = "%Ex";
4175 return_format = TRUE;
4176 break;
4177 case ERA_T_FMT:
4178 format = "%EX";
4179 return_format = TRUE;
4180 break;
4181 case ERA_D_T_FMT:
4182 format = "%Ec";
4183 return_format = TRUE;
4184 break;
4185 case ALT_DIGITS:
4186 format = "%Ow"; /* Find the alternate digit for 0 */
4187 break;
4188 }
f7416781 4189
909e71b1 4190 GCC_DIAG_RESTORE_STMT;
639a7c35 4191
dd8a7039
KW
4192 /* The year was deliberately chosen so that January 1 is on the
4193 * first day of the week. Since we're only getting one thing at a
4194 * time, it all works */
fffcb78f
KW
4195 const char * temp = my_strftime8(format, 30, 30, hour, mday, mon,
4196 2011, 0, 0, 0, &is_utf8);
07492de5 4197 retval = save_to_buffer(temp, retbufp, retbuf_sizep);
dd8a7039 4198 Safefree(temp);
f7416781 4199
07492de5
KW
4200 /* If the item is 'ALT_DIGITS', '*retbuf' contains the alternate
4201 * format for wday 0. If the value is the same as the normal 0,
4202 * there isn't an alternate, so clear the buffer.
dd8a7039
KW
4203 *
4204 * (wday was chosen because its range is all a single digit.
4205 * Things like tm_sec have two digits as the minimum: '00'.) */
07492de5 4206 if (item == ALT_DIGITS && strEQ(*retbufp, "0")) {
fc46dd0d
KW
4207 retval = "";
4208 break;
909e71b1 4209 }
f7416781 4210
909e71b1 4211 /* ALT_DIGITS is problematic. Experiments on it showed that
fe1c1494
KW
4212 * strftime() did not always work properly when going from alt-9 to
4213 * alt-10. Only a few locales have this item defined, and in all
4214 * of them on Linux that khw was able to find, nl_langinfo() merely
4215 * returned the alt-0 character, possibly doubled. Most Unicode
4216 * digits are in blocks of 10 consecutive code points, so that is
4217 * sufficient information for such scripts, as we can infer alt-1,
4218 * alt-2, .... But for a Japanese locale, a CJK ideographic 0 is
4219 * returned, and the CJK digits are not in code point order, so you
4220 * can't really infer anything. The localedef for this locale did
4221 * specify the succeeding digits, so that strftime() works properly
4222 * on them, without needing to infer anything. But the
4223 * nl_langinfo() return did not give sufficient information for the
4224 * caller to understand what's going on. So until there is
4225 * evidence that it should work differently, this returns the alt-0
dd8a7039 4226 * string for ALT_DIGITS. */
f7416781 4227
dd8a7039 4228 if (return_format) {
d1b150d9 4229
909e71b1
KW
4230 /* If to return the format, not the value, overwrite the buffer
4231 * with it. But some strftime()s will keep the original format
4232 * if illegal, so change those to "" */
07492de5 4233 if (strEQ(*retbufp, format)) {
909e71b1
KW
4234 retval = "";
4235 }
4236 else {
4237 retval = format;
4238 }
fffcb78f
KW
4239
4240 /* A format is always in ASCII */
4241 is_utf8 = UTF8NESS_IMMATERIAL;
dd8a7039 4242 }
f7416781 4243
dd8a7039
KW
4244 break;
4245 }
f7416781 4246
909e71b1 4247# endif
f7416781 4248
f206e331
KW
4249 case CODESET:
4250
f6e2ab67
KW
4251 /* The trivial case */
4252 if (isNAME_C_OR_POSIX(locale)) {
fc46dd0d
KW
4253 retval = C_codeset;
4254 break;
f6e2ab67 4255 }
f206e331 4256
f6e2ab67 4257# ifdef WIN32
f206e331 4258
f6e2ab67
KW
4259 /* This function retrieves the code page. It is subject to change, but
4260 * is documented and has been stable for many releases */
f4a34202
KW
4261 UINT ___lc_codepage_func(void);
4262
4263 retval = save_to_buffer(Perl_form(aTHX_ "%d", ___lc_codepage_func()),
4264 retbufp, retbuf_sizep);
f6e2ab67
KW
4265 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "locale='%s' cp=%s\n",
4266 locale, retval));
f4a34202
KW
4267 break;
4268
f6e2ab67 4269# else
f4a34202 4270
f6e2ab67
KW
4271 /* The codeset is important, but khw did not figure out a way for it to
4272 * be retrieved on non-Windows boxes without nl_langinfo(). But even
4273 * if we can't get it directly, we can usually determine if it is a
4274 * UTF-8 locale or not. If it is UTF-8, we (correctly) use that for
4275 * the code set. */
f206e331 4276
f6e2ab67
KW
4277# if defined(HAS_MBTOWC) || defined(HAS_MBRTOWC)
4278
4279 /* If libc mbtowc() evaluates the bytes that form the REPLACEMENT
4280 * CHARACTER as that Unicode code point, this has to be a UTF-8 locale.
4281 * */
4282
4283 wchar_t wc = 0;
4284 (void) Perl_mbtowc_(aTHX_ NULL, NULL, 0);/* Reset shift state */
4285 int mbtowc_ret = Perl_mbtowc_(aTHX_ &wc,
4286 STR_WITH_LEN(REPLACEMENT_CHARACTER_UTF8));
4287 if (mbtowc_ret >= 0 && wc == UNICODE_REPLACEMENT) {
58399f03
KW
4288 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
4289 "mbtowc returned REPLACEMENT\n"));
fc46dd0d
KW
4290 retval = "UTF-8";
4291 break;
f6e2ab67 4292 }
f206e331 4293
f6e2ab67
KW
4294 /* Here, it isn't a UTF-8 locale. */
4295
58399f03
KW
4296# else /* mbtowc() is not available. */
4297
4298 /* Sling together several possibilities, depending on platform
4299 * capabilities and what we found.
4300 *
4301 * For non-English locales or non-dollar currency locales, we likely
4302 * will find out whether a locale is UTF-8 or not */
4303
4304 utf8ness_t is_utf8 = UTF8NESS_UNKNOWN;
4305 const char * scratch_buf = NULL;
4306
4307# if defined(USE_LOCALE_MONETARY) && defined(HAS_SOME_LOCALECONV)
4308
4309 /* Can't use this method unless localeconv() is available, as that's
4310 * the way we find out the currency symbol. */
4311
4312 /* First try looking at the currency symbol (via a recursive call) to
4313 * see if it disambiguates things. Often that will be in the native
4314 * script, and if the symbol isn't legal UTF-8, we know that the locale
4315 * isn't either. */
4316 (void) my_langinfo_c(CRNCYSTR, LC_MONETARY, locale, &scratch_buf, NULL,
4317 &is_utf8);
4318 Safefree(scratch_buf);
4319
f6e2ab67 4320# endif
58399f03
KW
4321# ifdef USE_LOCALE_TIME
4322
4323 /* If we have ruled out being UTF-8, no point in checking further. */
4324 if (is_utf8 != UTF8NESS_NO) {
4325
4326 /* But otherwise do check more. This is done even if the currency
4327 * symbol looks to be UTF-8, just in case that's a false positive.
4328 *
4329 * Look at the LC_TIME entries, like the names of the months or
4330 * weekdays. We quit at the first one that is illegal UTF-8 */
4331
4332 utf8ness_t this_is_utf8 = UTF8NESS_UNKNOWN;
4333 const int times[] = {
4334 DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7,
4335 MON_1, MON_2, MON_3, MON_4, MON_5, MON_6, MON_7, MON_8,
4336 MON_9, MON_10, MON_11, MON_12,
4337 ALT_DIGITS, AM_STR, PM_STR,
4338 ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6,
4339 ABDAY_7,
4340 ABMON_1, ABMON_2, ABMON_3, ABMON_4, ABMON_5, ABMON_6,
4341 ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12
4342 };
4343
4344 /* The code in the recursive call can handle switching the locales,
4345 * but by doing it here, we avoid switching each iteration of the
4346 * loop */
4347 const char * orig_TIME_locale = toggle_locale_c(LC_TIME, locale);
4348
4349 for (PERL_UINT_FAST8_T i = 0; i < C_ARRAY_LENGTH(times); i++) {
4350 scratch_buf = NULL;
4351 (void) my_langinfo_c(times[i], LC_TIME, locale, &scratch_buf,
4352 NULL, &this_is_utf8);
4353 Safefree(scratch_buf);
4354 if (this_is_utf8 == UTF8NESS_NO) {
4355 is_utf8 = UTF8NESS_NO;
4356 break;
4357 }
4358
4359 if (this_is_utf8 == UTF8NESS_YES) {
4360 is_utf8 = UTF8NESS_YES;
4361 }
4362 }
4363
4364 /* Here we have gone through all the LC_TIME elements. is_utf8 has
4365 * been set as follows:
4366 * UTF8NESS_NO If any aren't legal UTF-8
4367 * UTF8NESS_IMMMATERIAL If all are ASCII
4368 * UTF8NESS_YES If all are legal UTF-8 (including
4369 * ASCIIi), and at least one isn't
4370 * ASCII. */
4371
4372 restore_toggled_locale_c(LC_TIME, orig_TIME_locale);
4373 }
f6e2ab67 4374
58399f03
KW
4375# endif /* LC_TIME */
4376
4377 /* If nothing examined above rules out it being UTF-8, and at least one
4378 * thing fits as UTF-8 (and not plain ASCII), assume the codeset is
4379 * UTF-8. */
4380 if (is_utf8 == UTF8NESS_YES) {
4381 retval = "UTF-8";
4382 break;
4383 }
4384
4385 /* Here, nothing examined indicates that the codeset is UTF-8. But
4386 * what is it? The other locale categories are not likely to be of
4387 * further help:
4388 *
4389 * LC_NUMERIC Only a few locales in the world have a non-ASCII radix
4390 * or group separator.
4391 * LC_CTYPE This code wouldn't be compiled if mbtowc() existed and
4392 * was reliable. This is unlikely in C99. There are
4393 * other functions that could be used instead, but are
4394 * they going to exist, and be able to distinguish between
4395 * UTF-8 and 8859-1? Deal with this only if it becomes
4396 * necessary.
4397 * LC_MESSAGES The strings returned from strerror() would seem likely
4398 * candidates, but experience has shown that many systems
4399 * don't actually have translations installed for them.
4400 * They are instead always in English, so everything in
4401 * them is ASCII, which is of no help to us. A Configure
4402 * probe could possibly be written to see if this platform
4403 * has non-ASCII error messages. But again, wait until it
4404 * turns out to be an actual problem. */
4405
4406# endif /* ! mbtowc() */
4407
4408 /* Rejoin the mbtowc available/not-available cases.
4409 *
4410 * We got here only because we haven't been able to find the codeset.
4411 * The only other option khw could think of is to see if the codeset is
4412 * part of the locale name. This is very less than ideal; often there
4413 * is no code set in the name; and at other times they even lie.
f6e2ab67
KW
4414 *
4415 * Find any dot in the locale name */
4416 retval = (const char *) strchr(locale, '.');
58399f03
KW
4417 if (! retval) {
4418 retval = ""; /* Alas, no dot */
4419 break;
4420 }
f206e331 4421
58399f03
KW
4422 /* Use everything past the dot */
4423 retval++;
f206e331 4424
f6e2ab67
KW
4425# if defined(HAS_MBTOWC) || defined(HAS_MBRTOWC)
4426
58399f03
KW
4427 /* When these functions, are available, they were tried earlier and
4428 * indicated that the locale did not act like a proper UTF-8 one. So
4429 * if it claims to be UTF-8, it is a lie */
f6e2ab67
KW
4430 if (is_codeset_name_UTF8(retval)) {
4431 retval = "";
58399f03 4432 break;
f6e2ab67 4433 }
58399f03
KW
4434
4435# endif
4436
4437 /* Otherwise the code set name is considered to be everything past the
4438 * dot. */
4439 retval = save_to_buffer(retval, retbufp, retbuf_sizep);
f206e331
KW
4440
4441 break;
4442
f6e2ab67 4443# endif
f206e331 4444
f6e2ab67 4445 } /* Giant switch() of nl_langinfo() items */
f7416781 4446
fc46dd0d 4447 restore_toggled_locale_i(cat_index, orig_switched_locale);
332c5a2a
KW
4448
4449# ifdef USE_LOCALE_CTYPE
4450 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
4451# endif
4452
fffcb78f
KW
4453 if (utf8ness) {
4454 *utf8ness = is_utf8;
4455 }
4456
0b6697c5 4457 return retval;
f7416781 4458
f6e2ab67 4459# endif /* All the implementations of my_langinfo() */
58399f03 4460
fe1c1494 4461/*--------------------------------------------------------------------------*/
f6e2ab67
KW
4462
4463} /* my_langinfo() */
b385bb4d 4464
faef259e
KW
4465#endif /* USE_LOCALE */
4466
7b12ce91
KW
4467char *
4468Perl_my_strftime8(pTHX_ const char *fmt, int sec, int min, int hour, int mday,
4469 int mon, int year, int wday, int yday, int isdst,
4470 utf8ness_t * utf8ness)
4471{ /* Documented in util.c */
4472 char * retval = my_strftime(fmt, sec, min, hour, mday, mon, year, wday,
4473 yday, isdst);
4474
4475 PERL_ARGS_ASSERT_MY_STRFTIME8;
4476
4477 if (utf8ness) {
4478
4479#ifdef USE_LOCALE_TIME
4480 *utf8ness = get_locale_string_utf8ness_i(NULL, LC_TIME_INDEX_,
fffcb78f 4481 retval, LOCALE_UTF8NESS_UNKNOWN);
7b12ce91
KW
4482#else
4483 *utf8ness = UTF8NESS_IMMATERIAL;
4484#endif
4485
4486 }
4487
4488 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "fmt=%s, retval=%s", fmt,
4489 ((is_utf8_string((U8 *) retval, 0))
4490 ? retval
4491 :_byte_dump_string((U8 *) retval, strlen(retval), 0)));
4492 if (utf8ness) PerlIO_printf(Perl_debug_log, "; utf8ness=%d",
4493 (int) *utf8ness);
4494 PerlIO_printf(Perl_debug_log, "\n");
4495 );
4496
4497 return retval;
4498}
4499
98994639
HS
4500/*
4501 * Initialize locale awareness.
4502 */
4503int
4504Perl_init_i18nl10n(pTHX_ int printwarn)
4505{
0e92a118
KW
4506 /* printwarn is
4507 *
4508 * 0 if not to output warning when setup locale is bad
4509 * 1 if to output warning based on value of PERL_BADLANG
4510 * >1 if to output regardless of PERL_BADLANG
4511 *
4512 * returns
98994639 4513 * 1 = set ok or not applicable,
0e92a118
KW
4514 * 0 = fallback to a locale of lower priority
4515 * -1 = fallback to all locales failed, not even to the C locale
6b058d42
KW
4516 *
4517 * Under -DDEBUGGING, if the environment variable PERL_DEBUG_LOCALE_INIT is
4518 * set, debugging information is output.
4519 *
b85af0ab
KW
4520 * This looks more complicated than it is, mainly due to the #ifdefs and
4521 * error handling.
6b058d42 4522 *
b85af0ab 4523 * Besides some asserts, data structure initialization, and specific
fe1c1494
KW
4524 * platform complications, this routine is effectively represented by this
4525 * pseudo-code:
b85af0ab 4526 *
fe1c1494
KW
4527 * setlocale(LC_ALL, ""); x
4528 * foreach (subcategory) { x
4529 * curlocales[f(subcategory)] = setlocale(subcategory, NULL); x
4530 * } x
4531 * if (platform_so_requires) {
4532 * foreach (subcategory) {
4533 * PL_curlocales[f(subcategory)] = curlocales[f(subcategory)]
4534 * }
4535 * }
4536 * foreach (subcategory) {
4537 * if (needs_special_handling[f(subcategory)] &this_subcat_handler
4538 * }
b85af0ab 4539 *
fe1c1494
KW
4540 * This sets all the categories to the values in the current environment,
4541 * saves them temporarily in curlocales[] until they can be handled and/or
4542 * on some platforms saved in a per-thread array PL_curlocales[].
b85af0ab 4543 *
fe1c1494
KW
4544 * f(foo) is a mapping from the opaque system category numbers to small
4545 * non-negative integers used most everywhere in this file as indices into
4546 * arrays (such as curlocales[]) so the program doesn't have to otherwise
4547 * deal with the opaqueness.
b85af0ab 4548 *
fe1c1494
KW
4549 * If the platform doesn't have LC_ALL, the lines marked 'x' above are
4550 * effectively replaced by:
4551 * foreach (subcategory) { y
4552 * curlocales[f(subcategory)] = setlocale(subcategory, ""); y
4553 * } y
b85af0ab 4554 *
fe1c1494
KW
4555 * The only differences being the lack of an LC_ALL call, and using ""
4556 * instead of NULL in the setlocale calls.
b85af0ab 4557 *
fe1c1494 4558 * But there are, of course, complications.
b85af0ab 4559 *
fe1c1494
KW
4560 * it has to deal with if this is an embedded perl, whose locale doesn't
4561 * come from the environment, but has been set up by the caller. This is
4562 * pretty simply handled: the "" in the setlocale calls is not a string
4563 * constant, but a variable which is set to NULL in the embedded case.
b85af0ab 4564 *
fe1c1494
KW
4565 * But the major complication is handling failure and doing fallback. All
4566 * the code marked 'x' or 'y' above is actually enclosed in an outer loop,
4567 * using the array trial_locales[]. On entry, trial_locales[] is
4568 * initialized to just one entry, containing the NULL or "" locale argument
4569 * shown above. If, as is almost always the case, everything works, it
4570 * exits after just the one iteration, going on to the next step.
b85af0ab 4571 *
fe1c1494
KW
4572 * But if there is a failure, the code tries its best to honor the
4573 * environment as much as possible. It self-modifies trial_locales[] to
4574 * have more elements, one for each of the POSIX-specified settings from
4575 * the environment, such as LANG, ending in the ultimate fallback, the C
4576 * locale. Thus if there is something bogus with a higher priority
4577 * environment variable, it will try with the next highest, until something
4578 * works. If everything fails, it limps along with whatever state it got
4579 * to.
b85af0ab
KW
4580 *
4581 * A further complication is that Windows has an additional fallback, the
4582 * user-default ANSI code page obtained from the operating system. This is
4583 * added as yet another loop iteration, just before the final "C"
6b058d42 4584 *
1aa8e6dd
KW
4585 * A slight complication is that in embedded Perls, the locale may already
4586 * be set-up, and we don't want to get it from the normal environment
4587 * variables. This is handled by having a special environment variable
4588 * indicate we're in this situation. We simply set setlocale's 2nd
4589 * parameter to be a NULL instead of "". That indicates to setlocale that
4590 * it is not to change anything, but to return the current value,
4591 * effectively initializing perl's db to what the locale already is.
4592 *
4593 * We play the same trick with NULL if a LC_ALL succeeds. We call
4594 * setlocale() on the individual categores with NULL to get their existing
4595 * values for our db, instead of trying to change them.
4596 * */
1565c085 4597
0e92a118
KW
4598 int ok = 1;
4599
7d4bcc4a
KW
4600#ifndef USE_LOCALE
4601
4602 PERL_UNUSED_ARG(printwarn);
4603
4604#else /* USE_LOCALE */
7d4bcc4a
KW
4605# ifdef __GLIBC__
4606
24f3e849 4607 const char * const language = PerlEnv_getenv("LANGUAGE");
7d4bcc4a
KW
4608
4609# endif
65ebb059 4610
ccd65d51
KW
4611 /* NULL uses the existing already set up locale */
4612 const char * const setlocale_init = (PerlEnv_getenv("PERL_SKIP_LOCALE_INIT"))
4613 ? NULL
4614 : "";
938a8fef 4615 typedef struct trial_locales_struct_s {
e57aaf75 4616 const char* trial_locale;
6e1b5a64
B
4617 const char* fallback_desc;
4618 const char* fallback_name;
e57aaf75 4619 } trial_locales_struct;
4fe10b51 4620 /* 5 = 1 each for "", LC_ALL, LANG, (Win32) system default locale, C */
938a8fef 4621 trial_locales_struct trial_locales[5];
c3fcd832 4622 unsigned int trial_locales_count;
24f3e849
KW
4623 const char * const lc_all = PerlEnv_getenv("LC_ALL");
4624 const char * const lang = PerlEnv_getenv("LANG");
98994639 4625 bool setlocale_failure = FALSE;
65ebb059 4626 unsigned int i;
175c4cf9
KW
4627
4628 /* A later getenv() could zap this, so only use here */
4629 const char * const bad_lang_use_once = PerlEnv_getenv("PERL_BADLANG");
4630
4631 const bool locwarn = (printwarn > 1
e5f10d49
KW
4632 || ( printwarn
4633 && ( ! bad_lang_use_once
22ff3130 4634 || (
e5f10d49
KW
4635 /* disallow with "" or "0" */
4636 *bad_lang_use_once
4637 && strNE("0", bad_lang_use_once)))));
ea92aad8 4638
291a84fb
KW
4639 /* current locale for given category; should have been copied so aren't
4640 * volatile */
8de4332b 4641 const char * curlocales[NOMINAL_LC_ALL_INDEX + 1];
291a84fb 4642
948523db
KW
4643# ifndef DEBUGGING
4644# define DEBUG_LOCALE_INIT(a,b,c)
4645# else
7d4bcc4a 4646
8298454c 4647 DEBUG_INITIALIZATION_set(cBOOL(PerlEnv_getenv("PERL_DEBUG_LOCALE_INIT")));
7d4bcc4a 4648
d188fe6d 4649# define DEBUG_LOCALE_INIT(cat_index, locale, result) \
b26541a9 4650 DEBUG_L(PerlIO_printf(Perl_debug_log, "%s\n", \
d188fe6d 4651 setlocale_debug_string_i(cat_index, locale, result)));
2fcc0ca9 4652
948523db
KW
4653/* Make sure the parallel arrays are properly set up */
4654# ifdef USE_LOCALE_NUMERIC
4fd76d49
KW
4655 assert(categories[LC_NUMERIC_INDEX_] == LC_NUMERIC);
4656 assert(strEQ(category_names[LC_NUMERIC_INDEX_], "LC_NUMERIC"));
e9bc6d6b 4657# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4658 assert(category_masks[LC_NUMERIC_INDEX_] == LC_NUMERIC_MASK);
e9bc6d6b 4659# endif
948523db
KW
4660# endif
4661# ifdef USE_LOCALE_CTYPE
4fd76d49
KW
4662 assert(categories[LC_CTYPE_INDEX_] == LC_CTYPE);
4663 assert(strEQ(category_names[LC_CTYPE_INDEX_], "LC_CTYPE"));
e9bc6d6b 4664# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4665 assert(category_masks[LC_CTYPE_INDEX_] == LC_CTYPE_MASK);
e9bc6d6b 4666# endif
948523db
KW
4667# endif
4668# ifdef USE_LOCALE_COLLATE
4fd76d49
KW
4669 assert(categories[LC_COLLATE_INDEX_] == LC_COLLATE);
4670 assert(strEQ(category_names[LC_COLLATE_INDEX_], "LC_COLLATE"));
e9bc6d6b 4671# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4672 assert(category_masks[LC_COLLATE_INDEX_] == LC_COLLATE_MASK);
e9bc6d6b 4673# endif
948523db
KW
4674# endif
4675# ifdef USE_LOCALE_TIME
4fd76d49
KW
4676 assert(categories[LC_TIME_INDEX_] == LC_TIME);
4677 assert(strEQ(category_names[LC_TIME_INDEX_], "LC_TIME"));
e9bc6d6b 4678# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4679 assert(category_masks[LC_TIME_INDEX_] == LC_TIME_MASK);
e9bc6d6b 4680# endif
948523db
KW
4681# endif
4682# ifdef USE_LOCALE_MESSAGES
4fd76d49
KW
4683 assert(categories[LC_MESSAGES_INDEX_] == LC_MESSAGES);
4684 assert(strEQ(category_names[LC_MESSAGES_INDEX_], "LC_MESSAGES"));
e9bc6d6b 4685# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4686 assert(category_masks[LC_MESSAGES_INDEX_] == LC_MESSAGES_MASK);
e9bc6d6b 4687# endif
948523db
KW
4688# endif
4689# ifdef USE_LOCALE_MONETARY
4fd76d49
KW
4690 assert(categories[LC_MONETARY_INDEX_] == LC_MONETARY);
4691 assert(strEQ(category_names[LC_MONETARY_INDEX_], "LC_MONETARY"));
e9bc6d6b 4692# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4693 assert(category_masks[LC_MONETARY_INDEX_] == LC_MONETARY_MASK);
e9bc6d6b 4694# endif
948523db 4695# endif
9821811f 4696# ifdef USE_LOCALE_ADDRESS
4fd76d49
KW
4697 assert(categories[LC_ADDRESS_INDEX_] == LC_ADDRESS);
4698 assert(strEQ(category_names[LC_ADDRESS_INDEX_], "LC_ADDRESS"));
e9bc6d6b 4699# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4700 assert(category_masks[LC_ADDRESS_INDEX_] == LC_ADDRESS_MASK);
e9bc6d6b 4701# endif
9821811f
KW
4702# endif
4703# ifdef USE_LOCALE_IDENTIFICATION
4fd76d49
KW
4704 assert(categories[LC_IDENTIFICATION_INDEX_] == LC_IDENTIFICATION);
4705 assert(strEQ(category_names[LC_IDENTIFICATION_INDEX_], "LC_IDENTIFICATION"));
e9bc6d6b 4706# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4707 assert(category_masks[LC_IDENTIFICATION_INDEX_] == LC_IDENTIFICATION_MASK);
e9bc6d6b 4708# endif
9821811f
KW
4709# endif
4710# ifdef USE_LOCALE_MEASUREMENT
4fd76d49
KW
4711 assert(categories[LC_MEASUREMENT_INDEX_] == LC_MEASUREMENT);
4712 assert(strEQ(category_names[LC_MEASUREMENT_INDEX_], "LC_MEASUREMENT"));
e9bc6d6b 4713# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4714 assert(category_masks[LC_MEASUREMENT_INDEX_] == LC_MEASUREMENT_MASK);
e9bc6d6b 4715# endif
9821811f
KW
4716# endif
4717# ifdef USE_LOCALE_PAPER
4fd76d49
KW
4718 assert(categories[LC_PAPER_INDEX_] == LC_PAPER);
4719 assert(strEQ(category_names[LC_PAPER_INDEX_], "LC_PAPER"));
e9bc6d6b 4720# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4721 assert(category_masks[LC_PAPER_INDEX_] == LC_PAPER_MASK);
e9bc6d6b 4722# endif
9821811f
KW
4723# endif
4724# ifdef USE_LOCALE_TELEPHONE
4fd76d49
KW
4725 assert(categories[LC_TELEPHONE_INDEX_] == LC_TELEPHONE);
4726 assert(strEQ(category_names[LC_TELEPHONE_INDEX_], "LC_TELEPHONE"));
e9bc6d6b 4727# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4728 assert(category_masks[LC_TELEPHONE_INDEX_] == LC_TELEPHONE_MASK);
e9bc6d6b 4729# endif
9821811f 4730# endif
36dbc955 4731# ifdef USE_LOCALE_SYNTAX
4fd76d49
KW
4732 assert(categories[LC_SYNTAX_INDEX_] == LC_SYNTAX);
4733 assert(strEQ(category_names[LC_SYNTAX_INDEX_], "LC_SYNTAX"));
36dbc955 4734# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4735 assert(category_masks[LC_SYNTAX_INDEX_] == LC_SYNTAX_MASK);
36dbc955
KW
4736# endif
4737# endif
4738# ifdef USE_LOCALE_TOD
4fd76d49
KW
4739 assert(categories[LC_TOD_INDEX_] == LC_TOD);
4740 assert(strEQ(category_names[LC_TOD_INDEX_], "LC_TOD"));
36dbc955 4741# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4742 assert(category_masks[LC_TOD_INDEX_] == LC_TOD_MASK);
36dbc955
KW
4743# endif
4744# endif
948523db 4745# ifdef LC_ALL
4fd76d49
KW
4746 assert(categories[LC_ALL_INDEX_] == LC_ALL);
4747 assert(strEQ(category_names[LC_ALL_INDEX_], "LC_ALL"));
b9cb1acb 4748 STATIC_ASSERT_STMT(NOMINAL_LC_ALL_INDEX == LC_ALL_INDEX_);
e9bc6d6b 4749# ifdef USE_POSIX_2008_LOCALE
4fd76d49 4750 assert(category_masks[LC_ALL_INDEX_] == LC_ALL_MASK);
e9bc6d6b 4751# endif
948523db
KW
4752# endif
4753# endif /* DEBUGGING */
47280b20 4754
5a6637f0
KW
4755 /* Initialize the per-thread mbrFOO() state variables. See POSIX.xs for
4756 * why these particular incantations are used. */
fe1c1494 4757# ifdef HAS_MBRLEN
d2c9cb53 4758 memzero(&PL_mbrlen_ps, sizeof(PL_mbrlen_ps));
fe1c1494
KW
4759# endif
4760# ifdef HAS_MBRTOWC
5a6637f0 4761 memzero(&PL_mbrtowc_ps, sizeof(PL_mbrtowc_ps));
fe1c1494
KW
4762# endif
4763# ifdef HAS_WCTOMBR
5a6637f0 4764 wcrtomb(NULL, L'\0', &PL_wcrtomb_ps);
fe1c1494 4765# endif
e9bc6d6b
KW
4766# ifdef USE_THREAD_SAFE_LOCALE
4767# ifdef WIN32
4768
4769 _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
4770
4771# endif
4772# endif
39e69e77 4773# ifdef USE_POSIX_2008_LOCALE
e9bc6d6b
KW
4774
4775 PL_C_locale_obj = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
4776 if (! PL_C_locale_obj) {
20ad7b23
KW
4777 locale_panic_(Perl_form(aTHX_
4778 "Cannot create POSIX 2008 C locale object"));
e9bc6d6b 4779 }
e9bc6d6b 4780
b26541a9
KW
4781 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "created C object %p\n",
4782 PL_C_locale_obj));
e464e1ed
KW
4783
4784# ifdef USE_LOCALE_NUMERIC
4785
4786 PL_underlying_numeric_obj = duplocale(PL_C_locale_obj);
4787
4788# endif
e9bc6d6b 4789# endif
78f7c09f
KW
4790# ifdef USE_LOCALE_NUMERIC
4791
3014c972
KW
4792 PL_numeric_radix_sv = newSV(1);
4793 PL_underlying_radix_sv = newSV(1);
4794 Newxz(PL_numeric_name, 1, char); /* Single NUL character */
4795 new_numeric("C");
7b9f2e4d
KW
4796
4797# endif
4798# ifdef USE_LOCALE_COLLATE
4799
3014c972
KW
4800 Newxz(PL_collation_name, 1, char);
4801 new_collate("C");
4802
4803# endif
4804# ifdef USE_LOCALE_CTYPE
4805
4806 Newxz(PL_ctype_name, 1, char);
4807 new_ctype("C");
3ca88433 4808
78f7c09f 4809# endif
434e0069 4810# ifdef USE_PL_CURLOCALES
e9bc6d6b 4811
ec18fac0
KW
4812 /* Initialize our records. */
4813 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
952fa04f 4814 (void) emulate_setlocale_i(i, posix_setlocale(categories[i], NULL),
ec18fac0
KW
4815 RECALCULATE_LC_ALL_ON_FINAL_INTERATION,
4816 __LINE__);
4817 }
e9bc6d6b
KW
4818
4819# endif
98994639 4820
65ebb059 4821 /* We try each locale in the list until we get one that works, or exhaust
20a240df
KW
4822 * the list. Normally the loop is executed just once. But if setting the
4823 * locale fails, inside the loop we add fallback trials to the array and so
4824 * will execute the loop multiple times */
e57aaf75
B
4825 trial_locales[0] = (trial_locales_struct) {
4826 .trial_locale = setlocale_init,
6e1b5a64
B
4827 .fallback_desc = NULL,
4828 .fallback_name = NULL,
e57aaf75 4829 };
c3fcd832 4830 trial_locales_count = 1;
7d4bcc4a 4831
65ebb059 4832 for (i= 0; i < trial_locales_count; i++) {
e57aaf75 4833 const char * trial_locale = trial_locales[i].trial_locale;
503bf960 4834 setlocale_failure = FALSE;
65ebb059 4835
7d4bcc4a
KW
4836# ifdef LC_ALL
4837
c117e8b0
KW
4838 /* setlocale() return vals; not copied so must be looked at
4839 * immediately. */
4840 const char * sl_result[NOMINAL_LC_ALL_INDEX + 1];
a7ff7aca 4841 sl_result[LC_ALL_INDEX_] = stdized_setlocale(LC_ALL, trial_locale);
d188fe6d 4842 DEBUG_LOCALE_INIT(LC_ALL_INDEX_, trial_locale, sl_result[LC_ALL_INDEX_]);
4fd76d49 4843 if (! sl_result[LC_ALL_INDEX_]) {
49c85077 4844 setlocale_failure = TRUE;
7cd8b568
KW
4845 }
4846 else {
4847 /* Since LC_ALL succeeded, it should have changed all the other
4848 * categories it can to its value; so we massage things so that the
4849 * setlocales below just return their category's current values.
4850 * This adequately handles the case in NetBSD where LC_COLLATE may
4851 * not be defined for a locale, and setting it individually will
7d4bcc4a 4852 * fail, whereas setting LC_ALL succeeds, leaving LC_COLLATE set to
7cd8b568
KW
4853 * the POSIX locale. */
4854 trial_locale = NULL;
4855 }
7d4bcc4a
KW
4856
4857# endif /* LC_ALL */
98994639 4858
e5f10d49
KW
4859 if (! setlocale_failure) {
4860 unsigned int j;
4861 for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
a7ff7aca 4862 curlocales[j] = stdized_setlocale(categories[j], trial_locale);
e5f10d49
KW
4863 if (! curlocales[j]) {
4864 setlocale_failure = TRUE;
4865 }
a683a66a 4866 curlocales[j] = savepv(curlocales[j]);
d188fe6d 4867 DEBUG_LOCALE_INIT(j, trial_locale, curlocales[j]);
e5f10d49 4868 }
c835d6be 4869
f6665bcd 4870 if (LIKELY(! setlocale_failure)) { /* All succeeded */
e5f10d49 4871 break; /* Exit trial_locales loop */
49c85077 4872 }
65ebb059 4873 }
98994639 4874
49c85077
KW
4875 /* Here, something failed; will need to try a fallback. */
4876 ok = 0;
65ebb059 4877
49c85077
KW
4878 if (i == 0) {
4879 unsigned int j;
98994639 4880
65ebb059 4881 if (locwarn) { /* Output failure info only on the first one */
7d4bcc4a
KW
4882
4883# ifdef LC_ALL
98994639 4884
49c85077
KW
4885 PerlIO_printf(Perl_error_log,
4886 "perl: warning: Setting locale failed.\n");
98994639 4887
7d4bcc4a 4888# else /* !LC_ALL */
98994639 4889
49c85077 4890 PerlIO_printf(Perl_error_log,
71ce8c74 4891 "perl: warning: Setting locale failed for the categories:\n");
7d4bcc4a 4892
e5f10d49
KW
4893 for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
4894 if (! curlocales[j]) {
71ce8c74 4895 PerlIO_printf(Perl_error_log, "\t%s\n", category_names[j]);
e5f10d49 4896 }
e5f10d49 4897 }
7d4bcc4a 4898
7d4bcc4a 4899# endif /* LC_ALL */
98994639 4900
49c85077
KW
4901 PerlIO_printf(Perl_error_log,
4902 "perl: warning: Please check that your locale settings:\n");
98994639 4903
7d4bcc4a
KW
4904# ifdef __GLIBC__
4905
49c85077
KW
4906 PerlIO_printf(Perl_error_log,
4907 "\tLANGUAGE = %c%s%c,\n",
4908 language ? '"' : '(',
4909 language ? language : "unset",
4910 language ? '"' : ')');
7d4bcc4a 4911# endif
98994639 4912
49c85077
KW
4913 PerlIO_printf(Perl_error_log,
4914 "\tLC_ALL = %c%s%c,\n",
4915 lc_all ? '"' : '(',
4916 lc_all ? lc_all : "unset",
4917 lc_all ? '"' : ')');
98994639 4918
7d4bcc4a
KW
4919# if defined(USE_ENVIRON_ARRAY)
4920
49c85077 4921 {
cd999af9 4922 char **e;
d5e32b93
KW
4923
4924 /* Look through the environment for any variables of the
4925 * form qr/ ^ LC_ [A-Z]+ = /x, except LC_ALL which was
4926 * already handled above. These are assumed to be locale
4927 * settings. Output them and their values. */
cd999af9 4928 for (e = environ; *e; e++) {
d5e32b93
KW
4929 const STRLEN prefix_len = sizeof("LC_") - 1;
4930 STRLEN uppers_len;
4931
cd999af9 4932 if ( strBEGINs(*e, "LC_")
c8b388b0 4933 && ! strBEGINs(*e, "LC_ALL=")
d5e32b93
KW
4934 && (uppers_len = strspn(*e + prefix_len,
4935 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
4936 && ((*e)[prefix_len + uppers_len] == '='))
cd999af9
KW
4937 {
4938 PerlIO_printf(Perl_error_log, "\t%.*s = \"%s\",\n",
d5e32b93
KW
4939 (int) (prefix_len + uppers_len), *e,
4940 *e + prefix_len + uppers_len + 1);
cd999af9
KW
4941 }
4942 }
49c85077 4943 }
7d4bcc4a
KW
4944
4945# else
4946
49c85077
KW
4947 PerlIO_printf(Perl_error_log,
4948 "\t(possibly more locale environment variables)\n");
7d4bcc4a
KW
4949
4950# endif
98994639 4951
49c85077
KW
4952 PerlIO_printf(Perl_error_log,
4953 "\tLANG = %c%s%c\n",
4954 lang ? '"' : '(',
4955 lang ? lang : "unset",
4956 lang ? '"' : ')');
98994639 4957
49c85077
KW
4958 PerlIO_printf(Perl_error_log,
4959 " are supported and installed on your system.\n");
4960 }
98994639 4961
65ebb059 4962 /* Calculate what fallback locales to try. We have avoided this
f6bab5f6 4963 * until we have to, because failure is quite unlikely. This will
65ebb059
KW
4964 * usually change the upper bound of the loop we are in.
4965 *
4966 * Since the system's default way of setting the locale has not
4967 * found one that works, We use Perl's defined ordering: LC_ALL,
4968 * LANG, and the C locale. We don't try the same locale twice, so
4969 * don't add to the list if already there. (On POSIX systems, the
4970 * LC_ALL element will likely be a repeat of the 0th element "",
6b058d42
KW
4971 * but there's no harm done by doing it explicitly.
4972 *
4973 * Note that this tries the LC_ALL environment variable even on
4974 * systems which have no LC_ALL locale setting. This may or may
4975 * not have been originally intentional, but there's no real need
4976 * to change the behavior. */
65ebb059
KW
4977 if (lc_all) {
4978 for (j = 0; j < trial_locales_count; j++) {
e57aaf75 4979 if (strEQ(lc_all, trial_locales[j].trial_locale)) {
65ebb059
KW
4980 goto done_lc_all;
4981 }
4982 }
e57aaf75
B
4983 trial_locales[trial_locales_count++] = (trial_locales_struct) {
4984 .trial_locale = lc_all,
6e1b5a64
B
4985 .fallback_desc = (strEQ(lc_all, "C")
4986 ? "the standard locale"
4987 : "a fallback locale"),
4988 .fallback_name = lc_all,
e57aaf75 4989 };
65ebb059
KW
4990 }
4991 done_lc_all:
98994639 4992
65ebb059
KW
4993 if (lang) {
4994 for (j = 0; j < trial_locales_count; j++) {
e57aaf75 4995 if (strEQ(lang, trial_locales[j].trial_locale)) {
65ebb059
KW
4996 goto done_lang;
4997 }
4998 }
e57aaf75
B
4999 trial_locales[trial_locales_count++] = (trial_locales_struct) {
5000 .trial_locale = lang,
6e1b5a64
B
5001 .fallback_desc = (strEQ(lang, "C")
5002 ? "the standard locale"
5003 : "a fallback locale"),
5004 .fallback_name = lang,
e57aaf75 5005 };
65ebb059
KW
5006 }
5007 done_lang:
5008
7d4bcc4a
KW
5009# if defined(WIN32) && defined(LC_ALL)
5010
65ebb059
KW
5011 /* For Windows, we also try the system default locale before "C".
5012 * (If there exists a Windows without LC_ALL we skip this because
5013 * it gets too complicated. For those, the "C" is the next
4fe10b51
B
5014 * fallback possibility). */
5015 {
5016 /* Note that this may change the locale, but we are going to do
c3b01b79
KW
5017 * that anyway.
5018 *
5019 * Our normal Windows setlocale() implementation ignores the
5020 * system default locale to make things work like POSIX. This
5021 * is the only place where we want to consider it, so have to
5022 * use wrap_wsetlocale(). */
5023 const char *system_default_locale =
5024 stdize_locale(LC_ALL,
5025 S_wrap_wsetlocale(aTHX_ LC_ALL, ""),
5026 &PL_stdize_locale_buf,
5027 &PL_stdize_locale_bufsize,
5028 __LINE__);
4fe10b51
B
5029 DEBUG_LOCALE_INIT(LC_ALL_INDEX_, "", system_default_locale);
5030
5031 /* Skip if invalid or if it's already on the list of locales to
5032 * try */
5033 if (! system_default_locale) {
5034 goto done_system_default;
5035 }
5036 for (j = 0; j < trial_locales_count; j++) {
5037 if (strEQ(system_default_locale, trial_locales[j].trial_locale)) {
5038 goto done_system_default;
5039 }
5040 }
5041
5042 trial_locales[trial_locales_count++] = (trial_locales_struct) {
5043 .trial_locale = system_default_locale,
5044 .fallback_desc = (strEQ(system_default_locale, "C")
5045 ? "the standard locale"
5046 : "the system default locale"),
5047 .fallback_name = system_default_locale,
5048 };
5049 }
5050 done_system_default:
7d4bcc4a
KW
5051
5052# endif
65ebb059
KW
5053
5054 for (j = 0; j < trial_locales_count; j++) {
e57aaf75 5055 if (strEQ("C", trial_locales[j].trial_locale)) {
65ebb059
KW
5056 goto done_C;
5057 }
5058 }
e57aaf75
B
5059 trial_locales[trial_locales_count++] = (trial_locales_struct) {
5060 .trial_locale = "C",
6e1b5a64
B
5061 .fallback_desc = "the standard locale",
5062 .fallback_name = "C",
e57aaf75 5063 };
98994639 5064
65ebb059
KW
5065 done_C: ;
5066 } /* end of first time through the loop */
98994639 5067
7d4bcc4a
KW
5068# ifdef WIN32
5069
65ebb059 5070 next_iteration: ;
7d4bcc4a
KW
5071
5072# endif
65ebb059
KW
5073
5074 } /* end of looping through the trial locales */
5075
5076 if (ok < 1) { /* If we tried to fallback */
5077 const char* msg;
5078 if (! setlocale_failure) { /* fallback succeeded */
5079 msg = "Falling back to";
5080 }
5081 else { /* fallback failed */
e5f10d49 5082 unsigned int j;
98994639 5083
65ebb059
KW
5084 /* We dropped off the end of the loop, so have to decrement i to
5085 * get back to the value the last time through */
5086 i--;
98994639 5087
65ebb059
KW
5088 ok = -1;
5089 msg = "Failed to fall back to";
5090
5091 /* To continue, we should use whatever values we've got */
7d4bcc4a 5092
e5f10d49
KW
5093 for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
5094 Safefree(curlocales[j]);
a7ff7aca 5095 curlocales[j] = savepv(stdized_setlocale(categories[j], NULL));
d188fe6d 5096 DEBUG_LOCALE_INIT(j, NULL, curlocales[j]);
e5f10d49 5097 }
65ebb059
KW
5098 }
5099
5100 if (locwarn) {
6e1b5a64
B
5101 const char * description = trial_locales[i].fallback_desc;
5102 const char * name = trial_locales[i].fallback_name;
7d4bcc4a 5103
65ebb059
KW
5104 if (name && strNE(name, "")) {
5105 PerlIO_printf(Perl_error_log,
5106 "perl: warning: %s %s (\"%s\").\n", msg, description, name);
5107 }
5108 else {
5109 PerlIO_printf(Perl_error_log,
5110 "perl: warning: %s %s.\n", msg, description);
5111 }
5112 }
5113 } /* End of tried to fallback */
98994639 5114
a7ff7aca
KW
5115# ifdef USE_POSIX_2008_LOCALE
5116
5117 /* The stdized setlocales haven't affected the P2008 locales. Initialize
11d8dd19
KW
5118 * them now, calculating LC_ALL only on the final go round, when all have
5119 * been set. */
a7ff7aca 5120 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
20be777a
KW
5121 (void) emulate_setlocale_i(i, curlocales[i],
5122 RECALCULATE_LC_ALL_ON_FINAL_INTERATION,
5123 __LINE__);
a7ff7aca
KW
5124 }
5125
5126# endif
5127
4e10e6a0 5128 /* Done with finding the locales; update the auxiliary records */
c7dc1ea8 5129 new_LC_ALL(NULL);
e5f10d49 5130
ebb1d9ce
KW
5131# if defined(USE_POSIX_2008_LOCALE) && defined(USE_LOCALE_NUMERIC)
5132
5133 /* This is a temporary workaround for #20155, to avoid issues where the
5134 * global locale wants a radix different from the per-thread one. This
5135 * restores behavior for LC_NUMERIC to what it was before a7ff7ac. */
5136 posix_setlocale(LC_NUMERIC, "C");
5137
5138# endif
5139
948523db 5140 for (i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
e5f10d49
KW
5141 Safefree(curlocales[i]);
5142 }
b310b053 5143
7d4bcc4a
KW
5144# if defined(USE_PERLIO) && defined(USE_LOCALE_CTYPE)
5145
49c85077 5146 /* Set PL_utf8locale to TRUE if using PerlIO _and_ the current LC_CTYPE
50bf02bd
KW
5147 * locale is UTF-8. The call to new_ctype() just above has already
5148 * calculated the latter value and saved it in PL_in_utf8_CTYPE_locale. If
5149 * both PL_utf8locale and PL_unicode (set by -C or by $ENV{PERL_UNICODE})
5150 * are true, perl.c:S_parse_body() will turn on the PerlIO :utf8 layer on
5151 * STDIN, STDOUT, STDERR, _and_ the default open discipline. */
5152 PL_utf8locale = PL_in_utf8_CTYPE_locale;
49c85077 5153
a05d7ebb 5154 /* Set PL_unicode to $ENV{PERL_UNICODE} if using PerlIO.
fde18df1
JH
5155 This is an alternative to using the -C command line switch
5156 (the -C if present will override this). */
5157 {
1604cfb0
MS
5158 const char *p = PerlEnv_getenv("PERL_UNICODE");
5159 PL_unicode = p ? parse_unicode_opts(&p) : 0;
5160 if (PL_unicode & PERL_UNICODE_UTF8CACHEASSERT_FLAG)
5161 PL_utf8cache = -1;
b310b053
JH
5162 }
5163
7d4bcc4a 5164# endif
e3305790 5165#endif /* USE_LOCALE */
7d4bcc4a 5166
2fcc0ca9 5167 /* So won't continue to output stuff */
27cdc72e 5168 DEBUG_INITIALIZATION_set(FALSE);
7d4bcc4a 5169
98994639
HS
5170 return ok;
5171}
5172
98994639
HS
5173#ifdef USE_LOCALE_COLLATE
5174
21afce67
KW
5175STATIC void
5176S_compute_collxfrm_coefficients(pTHX)
5177{
5178
5179 PL_in_utf8_COLLATE_locale = (PL_collation_standard)
5180 ? 0
5181 : is_locale_utf8(PL_collation_name);
5182 PL_strxfrm_NUL_replacement = '\0';
5183 PL_strxfrm_max_cp = 0;
5184
5185 /* A locale collation definition includes primary, secondary, tertiary,
5186 * etc. weights for each character. To sort, the primary weights are
5187 * used, and only if they compare equal, then the secondary weights are
5188 * used, and only if they compare equal, then the tertiary, etc.
5189 *
5190 * strxfrm() works by taking the input string, say ABC, and creating an
5191 * output transformed string consisting of first the primary weights,
5192 * A¹B¹C¹ followed by the secondary ones, A²B²C²; and then the
5193 * tertiary, etc, yielding A¹B¹C¹ A²B²C² A³B³C³ .... Some characters
5194 * may not have weights at every level. In our example, let's say B
5195 * doesn't have a tertiary weight, and A doesn't have a secondary
5196 * weight. The constructed string is then going to be
5197 * A¹B¹C¹ B²C² A³C³ ....
5198 * This has the desired effect that strcmp() will look at the secondary
5199 * or tertiary weights only if the strings compare equal at all higher
5200 * priority weights. The spaces shown here, like in
5201 * "A¹B¹C¹ A²B²C² "
5202 * are not just for readability. In the general case, these must
5203 * actually be bytes, which we will call here 'separator weights'; and
5204 * they must be smaller than any other weight value, but since these
5205 * are C strings, only the terminating one can be a NUL (some
5206 * implementations may include a non-NUL separator weight just before
5207 * the NUL). Implementations tend to reserve 01 for the separator
5208 * weights. They are needed so that a shorter string's secondary
5209 * weights won't be misconstrued as primary weights of a longer string,
5210 * etc. By making them smaller than any other weight, the shorter
5211 * string will sort first. (Actually, if all secondary weights are
5212 * smaller than all primary ones, there is no need for a separator
5213 * weight between those two levels, etc.)
5214 *
5215 * The length of the transformed string is roughly a linear function of
5216 * the input string. It's not exactly linear because some characters
5217 * don't have weights at all levels. When we call strxfrm() we have to
5218 * allocate some memory to hold the transformed string. The
5219 * calculations below try to find coefficients 'm' and 'b' for this
5220 * locale so that m*x + b equals how much space we need, given the size
5221 * of the input string in 'x'. If we calculate too small, we increase
5222 * the size as needed, and call strxfrm() again, but it is better to
5223 * get it right the first time to avoid wasted expensive string
5224 * transformations. */
5225
5226 {
5227 /* We use the string below to find how long the tranformation of it
5228 * is. Almost all locales are supersets of ASCII, or at least the
5229 * ASCII letters. We use all of them, half upper half lower,
5230 * because if we used fewer, we might hit just the ones that are
5231 * outliers in a particular locale. Most of the strings being
5232 * collated will contain a preponderance of letters, and even if
5233 * they are above-ASCII, they are likely to have the same number of
5234 * weight levels as the ASCII ones. It turns out that digits tend
5235 * to have fewer levels, and some punctuation has more, but those
5236 * are relatively sparse in text, and khw believes this gives a
5237 * reasonable result, but it could be changed if experience so
5238 * dictates. */
5239 const char longer[] = "ABCDEFGHIJKLMnopqrstuvwxyz";
5240 char * x_longer; /* Transformed 'longer' */
5241 Size_t x_len_longer; /* Length of 'x_longer' */
5242
5243 char * x_shorter; /* We also transform a substring of 'longer' */
5244 Size_t x_len_shorter;
5245
5246 /* mem_collxfrm_() is used get the transformation (though here we
5247 * are interested only in its length). It is used because it has
5248 * the intelligence to handle all cases, but to work, it needs some
5249 * values of 'm' and 'b' to get it started. For the purposes of
5250 * this calculation we use a very conservative estimate of 'm' and
5251 * 'b'. This assumes a weight can be multiple bytes, enough to
5252 * hold any UV on the platform, and there are 5 levels, 4 weight
5253 * bytes, and a trailing NUL. */
5254 PL_collxfrm_base = 5;
5255 PL_collxfrm_mult = 5 * sizeof(UV);
5256
5257 /* Find out how long the transformation really is */
5258 x_longer = mem_collxfrm_(longer,
5259 sizeof(longer) - 1,
5260 &x_len_longer,
5261
5262 /* We avoid converting to UTF-8 in the
5263 * called function by telling it the
5264 * string is in UTF-8 if the locale is a
5265 * UTF-8 one. Since the string passed
5266 * here is invariant under UTF-8, we can
5267 * claim it's UTF-8 even though it isn't.
5268 * */
5269 PL_in_utf8_COLLATE_locale);
5270 Safefree(x_longer);
5271
5272 /* Find out how long the transformation of a substring of 'longer'
5273 * is. Together the lengths of these transformations are
5274 * sufficient to calculate 'm' and 'b'. The substring is all of
5275 * 'longer' except the first character. This minimizes the chances
5276 * of being swayed by outliers */
5277 x_shorter = mem_collxfrm_(longer + 1,
5278 sizeof(longer) - 2,
5279 &x_len_shorter,
5280 PL_in_utf8_COLLATE_locale);
5281 Safefree(x_shorter);
5282
5283 /* If the results are nonsensical for this simple test, the whole
5284 * locale definition is suspect. Mark it so that locale collation
5285 * is not active at all for it. XXX Should we warn? */
5286 if ( x_len_shorter == 0
5287 || x_len_longer == 0
5288 || x_len_shorter >= x_len_longer)
5289 {
5290 PL_collxfrm_mult = 0;
5291 PL_collxfrm_base = 1;
5292 DEBUG_L(PerlIO_printf(Perl_debug_log,
5293 "Disabling locale collation for LC_COLLATE='%s';"
5294 " length for shorter sample=%zu; longer=%zu\n",
5295 PL_collation_name, x_len_shorter, x_len_longer));
5296 }
5297 else {
5298 SSize_t base; /* Temporary */
5299
5300 /* We have both: m * strlen(longer) + b = x_len_longer
5301 * m * strlen(shorter) + b = x_len_shorter;
5302 * subtracting yields:
5303 * m * (strlen(longer) - strlen(shorter))
5304 * = x_len_longer - x_len_shorter
5305 * But we have set things up so that 'shorter' is 1 byte smaller
5306 * than 'longer'. Hence:
5307 * m = x_len_longer - x_len_shorter
5308 *
5309 * But if something went wrong, make sure the multiplier is at
5310 * least 1.
5311 */
5312 if (x_len_longer > x_len_shorter) {
5313 PL_collxfrm_mult = (STRLEN) x_len_longer - x_len_shorter;
5314 }
5315 else {
5316 PL_collxfrm_mult = 1;
5317 }
5318
5319 /* mx + b = len
5320 * so: b = len - mx
5321 * but in case something has gone wrong, make sure it is
5322 * non-negative */
5323 base = x_len_longer - PL_collxfrm_mult * (sizeof(longer) - 1);
5324 if (base < 0) {
5325 base = 0;
5326 }
5327
5328 /* Add 1 for the trailing NUL */
5329 PL_collxfrm_base = base + 1;
5330 }
5331
5332 DEBUG_L(PerlIO_printf(Perl_debug_log,
5333 "?UTF-8 locale=%d; x_len_shorter=%zu, "
5334 "x_len_longer=%zu,"
5335 " collate multipler=%zu, collate base=%zu\n",
5336 PL_in_utf8_COLLATE_locale,
5337 x_len_shorter, x_len_longer,
5338 PL_collxfrm_mult, PL_collxfrm_base));
5339 }
5340}
5341
a4a439fb 5342char *
1362b264 5343Perl_mem_collxfrm_(pTHX_ const char *input_string,
a4a439fb
KW
5344 STRLEN len, /* Length of 'input_string' */
5345 STRLEN *xlen, /* Set to length of returned string
5346 (not including the collation index
5347 prefix) */
5348 bool utf8 /* Is the input in UTF-8? */
6696cfa7 5349 )
98994639 5350{
1362b264 5351 /* mem_collxfrm_() is like strxfrm() but with two important differences.
fe1c1494
KW
5352 * First, it handles embedded NULs. Second, it allocates a bit more memory
5353 * than needed for the transformed data itself. The real transformed data
5354 * begins at offset COLLXFRM_HDR_LEN. *xlen is set to the length of that,
5355 * and doesn't include the collation index size.
5356 *
5357 * It is the caller's responsibility to eventually free the memory returned
5358 * by this function.
5359 *
a4a439fb
KW
5360 * Please see sv_collxfrm() to see how this is used. */
5361
fe1c1494 5362# define COLLXFRM_HDR_LEN sizeof(PL_collation_ix)
55e5378d 5363
6696cfa7
KW
5364 char * s = (char *) input_string;
5365 STRLEN s_strlen = strlen(input_string);
79f120c8 5366 char *xbuf = NULL;
55e5378d 5367 STRLEN xAlloc; /* xalloc is a reserved word in VC */
17f41037 5368 STRLEN length_in_chars;
c664130f 5369 bool first_time = TRUE; /* Cleared after first loop iteration */
057f0a44
KW
5370
5371# ifdef USE_LOCALE_CTYPE
5372 const char * orig_CTYPE_locale = NULL;
5373# endif
98994639 5374
ef054375
KW
5375# if defined(USE_POSIX_2008_LOCALE) && defined HAS_STRXFRM_L
5376 locale_t constructed_locale = (locale_t) 0;
5377# endif
5378
1362b264 5379 PERL_ARGS_ASSERT_MEM_COLLXFRM_;
a4a439fb
KW
5380
5381 /* Must be NUL-terminated */
5382 assert(*(input_string + len) == '\0');
7918f24d 5383
5fdda512
KW
5384 if (PL_collxfrm_mult == 0) { /* unknown or bad */
5385 if (PL_collxfrm_base != 0) { /* bad collation => skip */
5386 DEBUG_L(PerlIO_printf(Perl_debug_log,
5387 "mem_collxfrm_: locale's collation is defective\n"));
5388 goto bad;
5389 }
5390
5391 S_compute_collxfrm_coefficients(aTHX);
79f120c8
KW
5392 }
5393
6696cfa7
KW
5394 /* Replace any embedded NULs with the control that sorts before any others.
5395 * This will give as good as possible results on strings that don't
5396 * otherwise contain that character, but otherwise there may be
5397 * less-than-perfect results with that character and NUL. This is
fdc080f3 5398 * unavoidable unless we replace strxfrm with our own implementation. */
fd43f63c
KW
5399 if (UNLIKELY(s_strlen < len)) { /* Only execute if there is an embedded
5400 NUL */
6696cfa7
KW
5401 char * e = s + len;
5402 char * sans_nuls;
fdc080f3 5403 STRLEN sans_nuls_len;
94762aa0 5404 int try_non_controls;
afc4976f
KW
5405 char this_replacement_char[] = "?\0"; /* Room for a two-byte string,
5406 making sure 2nd byte is NUL.
5407 */
5408 STRLEN this_replacement_len;
5409
1e4c9676
KW
5410 /* If we don't know what non-NUL control character sorts lowest for
5411 * this locale, find it */
f28f4d2a 5412 if (PL_strxfrm_NUL_replacement == '\0') {
6696cfa7 5413 int j;
afc4976f 5414 char * cur_min_x = NULL; /* The min_char's xfrm, (except it also
6696cfa7
KW
5415 includes the collation index
5416 prefixed. */
5417
91c0e2e0 5418 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "Looking to replace NUL\n"));
94762aa0
KW
5419
5420 /* Unlikely, but it may be that no control will work to replace
1e4c9676
KW
5421 * NUL, in which case we instead look for any character. Controls
5422 * are preferred because collation order is, in general, context
5423 * sensitive, with adjoining characters affecting the order, and
5424 * controls are less likely to have such interactions, allowing the
5425 * NUL-replacement to stand on its own. (Another way to look at it
5426 * is to imagine what would happen if the NUL were replaced by a
5427 * combining character; it wouldn't work out all that well.) */
94762aa0
KW
5428 for (try_non_controls = 0;
5429 try_non_controls < 2;
5430 try_non_controls++)
5431 {
07676341
KW
5432
5433# ifdef USE_LOCALE_CTYPE
5434
5435 /* In this case we use isCNTRL_LC() below, which relies on
5436 * LC_CTYPE, so that must be switched to correspond with the
5437 * LC_COLLATE locale */
5438 if (! try_non_controls && ! PL_in_utf8_COLLATE_locale) {
5439 orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, PL_collation_name);
5440 }
5441# endif
d4ff9586
KW
5442 /* Look through all legal code points (NUL isn't) */
5443 for (j = 1; j < 256; j++) {
5444 char * x; /* j's xfrm plus collation index */
5445 STRLEN x_len; /* length of 'x' */
5446 STRLEN trial_len = 1;
736a4fed 5447 char cur_source[] = { '\0', '\0' };
d4ff9586 5448
736a4fed
KW
5449 /* Skip non-controls the first time through the loop. The
5450 * controls in a UTF-8 locale are the L1 ones */
afc4976f
KW
5451 if (! try_non_controls && (PL_in_utf8_COLLATE_locale)
5452 ? ! isCNTRL_L1(j)
5453 : ! isCNTRL_LC(j))
5454 {
d4ff9586 5455 continue;
6696cfa7 5456 }
6696cfa7 5457
736a4fed
KW
5458 /* Create a 1-char string of the current code point */
5459 cur_source[0] = (char) j;
5460
d4ff9586 5461 /* Then transform it */
1362b264 5462 x = mem_collxfrm_(cur_source, trial_len, &x_len,
afc4976f 5463 0 /* The string is not in UTF-8 */);
6696cfa7 5464
1e4c9676 5465 /* Ignore any character that didn't successfully transform.
d4ff9586
KW
5466 * */
5467 if (! x) {
5468 continue;
5469 }
6696cfa7 5470
d4ff9586
KW
5471 /* If this character's transformation is lower than
5472 * the current lowest, this one becomes the lowest */
5473 if ( cur_min_x == NULL
5474 || strLT(x + COLLXFRM_HDR_LEN,
5475 cur_min_x + COLLXFRM_HDR_LEN))
5476 {
f28f4d2a 5477 PL_strxfrm_NUL_replacement = j;
62d66863 5478 Safefree(cur_min_x);
d4ff9586 5479 cur_min_x = x;
d4ff9586
KW
5480 }
5481 else {
5482 Safefree(x);
5483 }
1e4c9676 5484 } /* end of loop through all 255 characters */
6696cfa7 5485
07676341
KW
5486# ifdef USE_LOCALE_CTYPE
5487 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
5488# endif
5489
1e4c9676 5490 /* Stop looking if found */
94762aa0
KW
5491 if (cur_min_x) {
5492 break;
5493 }
5494
5495 /* Unlikely, but possible, if there aren't any controls that
5496 * work in the locale, repeat the loop, looking for any
5497 * character that works */
5498 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5499 "mem_collxfrm_: No control worked. Trying non-controls\n"));
1e4c9676 5500 } /* End of loop to try first the controls, then any char */
6696cfa7 5501
94762aa0
KW
5502 if (! cur_min_x) {
5503 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5504 "mem_collxfrm_: Couldn't find any character to replace"
94762aa0
KW
5505 " embedded NULs in locale %s with", PL_collation_name));
5506 goto bad;
58eebef2
KW
5507 }
5508
94762aa0 5509 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5510 "mem_collxfrm_: Replacing embedded NULs in locale %s with "
f28f4d2a 5511 "0x%02X\n", PL_collation_name, PL_strxfrm_NUL_replacement));
94762aa0 5512
6696cfa7 5513 Safefree(cur_min_x);
1e4c9676 5514 } /* End of determining the character that is to replace NULs */
afc4976f
KW
5515
5516 /* If the replacement is variant under UTF-8, it must match the
291a84fb 5517 * UTF8-ness of the original */
f28f4d2a
KW
5518 if ( ! UVCHR_IS_INVARIANT(PL_strxfrm_NUL_replacement) && utf8) {
5519 this_replacement_char[0] =
5520 UTF8_EIGHT_BIT_HI(PL_strxfrm_NUL_replacement);
5521 this_replacement_char[1] =
5522 UTF8_EIGHT_BIT_LO(PL_strxfrm_NUL_replacement);
afc4976f
KW
5523 this_replacement_len = 2;
5524 }
5525 else {
f28f4d2a 5526 this_replacement_char[0] = PL_strxfrm_NUL_replacement;
afc4976f
KW
5527 /* this_replacement_char[1] = '\0' was done at initialization */
5528 this_replacement_len = 1;
6696cfa7
KW
5529 }
5530
5531 /* The worst case length for the replaced string would be if every
5532 * character in it is NUL. Multiply that by the length of each
5533 * replacement, and allow for a trailing NUL */
afc4976f 5534 sans_nuls_len = (len * this_replacement_len) + 1;
fdc080f3 5535 Newx(sans_nuls, sans_nuls_len, char);
6696cfa7
KW
5536 *sans_nuls = '\0';
5537
6696cfa7
KW
5538 /* Replace each NUL with the lowest collating control. Loop until have
5539 * exhausted all the NULs */
5540 while (s + s_strlen < e) {
6069d6c5 5541 my_strlcat(sans_nuls, s, sans_nuls_len);
6696cfa7
KW
5542
5543 /* Do the actual replacement */
6069d6c5 5544 my_strlcat(sans_nuls, this_replacement_char, sans_nuls_len);
6696cfa7
KW
5545
5546 /* Move past the input NUL */
5547 s += s_strlen + 1;
5548 s_strlen = strlen(s);
5549 }
5550
5551 /* And add anything that trails the final NUL */
6069d6c5 5552 my_strlcat(sans_nuls, s, sans_nuls_len);
6696cfa7
KW
5553
5554 /* Switch so below we transform this modified string */
5555 s = sans_nuls;
5556 len = strlen(s);
1e4c9676 5557 } /* End of replacing NULs */
6696cfa7 5558
a4a439fb
KW
5559 /* Make sure the UTF8ness of the string and locale match */
5560 if (utf8 != PL_in_utf8_COLLATE_locale) {
9de6fe47 5561 /* XXX convert above Unicode to 10FFFF? */
a4a439fb
KW
5562 const char * const t = s; /* Temporary so we can later find where the
5563 input was */
5564
5565 /* Here they don't match. Change the string's to be what the locale is
5566 * expecting */
5567
5568 if (! utf8) { /* locale is UTF-8, but input isn't; upgrade the input */
5569 s = (char *) bytes_to_utf8((const U8 *) s, &len);
5570 utf8 = TRUE;
5571 }
5572 else { /* locale is not UTF-8; but input is; downgrade the input */
5573
5574 s = (char *) bytes_from_utf8((const U8 *) s, &len, &utf8);
5575
5576 /* If the downgrade was successful we are done, but if the input
5577 * contains things that require UTF-8 to represent, have to do
5578 * damage control ... */
5579 if (UNLIKELY(utf8)) {
5580
5581 /* What we do is construct a non-UTF-8 string with
5582 * 1) the characters representable by a single byte converted
5583 * to be so (if necessary);
5584 * 2) and the rest converted to collate the same as the
5585 * highest collating representable character. That makes
5586 * them collate at the end. This is similar to how we
5587 * handle embedded NULs, but we use the highest collating
5588 * code point instead of the smallest. Like the NUL case,
5589 * this isn't perfect, but is the best we can reasonably
5590 * do. Every above-255 code point will sort the same as
5591 * the highest-sorting 0-255 code point. If that code
5592 * point can combine in a sequence with some other code
5593 * points for weight calculations, us changing something to
5594 * be it can adversely affect the results. But in most
5595 * cases, it should work reasonably. And note that this is
5596 * really an illegal situation: using code points above 255
5597 * on a locale where only 0-255 are valid. If two strings
5598 * sort entirely equal, then the sort order for the
5599 * above-255 code points will be in code point order. */
5600
5601 utf8 = FALSE;
5602
5603 /* If we haven't calculated the code point with the maximum
5604 * collating order for this locale, do so now */
5605 if (! PL_strxfrm_max_cp) {
5606 int j;
5607
5608 /* The current transformed string that collates the
5609 * highest (except it also includes the prefixed collation
5610 * index. */
5611 char * cur_max_x = NULL;
5612
5613 /* Look through all legal code points (NUL isn't) */
5614 for (j = 1; j < 256; j++) {
5615 char * x;
5616 STRLEN x_len;
736a4fed 5617 char cur_source[] = { '\0', '\0' };
a4a439fb 5618
736a4fed
KW
5619 /* Create a 1-char string of the current code point */
5620 cur_source[0] = (char) j;
a4a439fb
KW
5621
5622 /* Then transform it */
1362b264 5623 x = mem_collxfrm_(cur_source, 1, &x_len, FALSE);
a4a439fb
KW
5624
5625 /* If something went wrong (which it shouldn't), just
5626 * ignore this code point */
94762aa0 5627 if (! x) {
a4a439fb
KW
5628 continue;
5629 }
5630
5631 /* If this character's transformation is higher than
5632 * the current highest, this one becomes the highest */
5633 if ( cur_max_x == NULL
55e5378d
KW
5634 || strGT(x + COLLXFRM_HDR_LEN,
5635 cur_max_x + COLLXFRM_HDR_LEN))
a4a439fb
KW
5636 {
5637 PL_strxfrm_max_cp = j;
62d66863 5638 Safefree(cur_max_x);
a4a439fb
KW
5639 cur_max_x = x;
5640 }
5641 else {
5642 Safefree(x);
5643 }
5644 }
5645
94762aa0
KW
5646 if (! cur_max_x) {
5647 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5648 "mem_collxfrm_: Couldn't find any character to"
94762aa0
KW
5649 " replace above-Latin1 chars in locale %s with",
5650 PL_collation_name));
5651 goto bad;
5652 }
5653
58eebef2 5654 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5655 "mem_collxfrm_: highest 1-byte collating character"
58eebef2
KW
5656 " in locale %s is 0x%02X\n",
5657 PL_collation_name,
5658 PL_strxfrm_max_cp));
58eebef2 5659
a4a439fb
KW
5660 Safefree(cur_max_x);
5661 }
5662
5663 /* Here we know which legal code point collates the highest.
5664 * We are ready to construct the non-UTF-8 string. The length
5665 * will be at least 1 byte smaller than the input string
5666 * (because we changed at least one 2-byte character into a
5667 * single byte), but that is eaten up by the trailing NUL */
5668 Newx(s, len, char);
5669
5670 {
5671 STRLEN i;
5672 STRLEN d= 0;
042d9e50 5673 char * e = (char *) t + len;
a4a439fb
KW
5674
5675 for (i = 0; i < len; i+= UTF8SKIP(t + i)) {
5676 U8 cur_char = t[i];
5677 if (UTF8_IS_INVARIANT(cur_char)) {
5678 s[d++] = cur_char;
5679 }
042d9e50 5680 else if (UTF8_IS_NEXT_CHAR_DOWNGRADEABLE(t + i, e)) {
a4a439fb
KW
5681 s[d++] = EIGHT_BIT_UTF8_TO_NATIVE(cur_char, t[i+1]);
5682 }
5683 else { /* Replace illegal cp with highest collating
5684 one */
5685 s[d++] = PL_strxfrm_max_cp;
5686 }
5687 }
5688 s[d++] = '\0';
5689 Renew(s, d, char); /* Free up unused space */
5690 }
5691 }
5692 }
5693
5694 /* Here, we have constructed a modified version of the input. It could
5695 * be that we already had a modified copy before we did this version.
5696 * If so, that copy is no longer needed */
5697 if (t != input_string) {
5698 Safefree(t);
5699 }
5700 }
5701
17f41037
KW
5702 length_in_chars = (utf8)
5703 ? utf8_length((U8 *) s, (U8 *) s + len)
5704 : len;
5705
59c018b9
KW
5706 /* The first element in the output is the collation id, used by
5707 * sv_collxfrm(); then comes the space for the transformed string. The
5708 * equation should give us a good estimate as to how much is needed */
55e5378d 5709 xAlloc = COLLXFRM_HDR_LEN
a4a439fb 5710 + PL_collxfrm_base
17f41037 5711 + (PL_collxfrm_mult * length_in_chars);
a02a5408 5712 Newx(xbuf, xAlloc, char);
c7202dee
KW
5713 if (UNLIKELY(! xbuf)) {
5714 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5715 "mem_collxfrm_: Couldn't malloc %zu bytes\n", xAlloc));
1604cfb0 5716 goto bad;
c7202dee 5717 }
98994639 5718
d35fca5f 5719 /* Store the collation id */
98994639 5720 *(U32*)xbuf = PL_collation_ix;
d35fca5f 5721
ef054375
KW
5722# if defined(USE_POSIX_2008_LOCALE) && defined HAS_STRXFRM_L
5723# ifdef USE_LOCALE_CTYPE
5724
5725 constructed_locale = newlocale(LC_CTYPE_MASK, PL_collation_name,
5726 duplocale(use_curlocale_scratch()));
5727# else
5728
5729 constructed_locale = duplocale(use_curlocale_scratch());
5730
5731# endif
5732# define my_strxfrm(dest, src, n) strxfrm_l(dest, src, n, \
5733 constructed_locale)
5734# define CLEANUP_STRXFRM \
5735 STMT_START { \
5736 if (constructed_locale != (locale_t) 0) \
5737 freelocale(constructed_locale); \
5738 } STMT_END
5739# else
5740# define my_strxfrm(dest, src, n) strxfrm(dest, src, n)
1915ba89
KW
5741# ifdef USE_LOCALE_CTYPE
5742
5743 orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, PL_collation_name);
5744
5745# define CLEANUP_STRXFRM \
5746 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale)
5747# else
5748# define CLEANUP_STRXFRM NOOP
5749# endif
ef054375 5750# endif
1915ba89 5751
d35fca5f
KW
5752 /* Then the transformation of the input. We loop until successful, or we
5753 * give up */
4ebeff16 5754 for (;;) {
1adab0a7 5755
3b30ed2b 5756 errno = 0;
ef054375 5757 *xlen = my_strxfrm(xbuf + COLLXFRM_HDR_LEN, s, xAlloc - COLLXFRM_HDR_LEN);
4ebeff16
KW
5758
5759 /* If the transformed string occupies less space than we told strxfrm()
3b30ed2b 5760 * was available, it means it transformed the whole string. */
55e5378d 5761 if (*xlen < xAlloc - COLLXFRM_HDR_LEN) {
17f41037 5762
3b30ed2b
KW
5763 /* But there still could have been a problem */
5764 if (errno != 0) {
5765 DEBUG_L(PerlIO_printf(Perl_debug_log,
5766 "strxfrm failed for LC_COLLATE=%s; errno=%d, input=%s\n",
5767 PL_collation_name, errno,
5768 _byte_dump_string((U8 *) s, len, 0)));
5769 goto bad;
5770 }
5771
5772 /* Here, the transformation was successful. Some systems include a
5773 * trailing NUL in the returned length. Ignore it, using a loop in
5774 * case multiple trailing NULs are returned. */
1adab0a7
KW
5775 while ( (*xlen) > 0
5776 && *(xbuf + COLLXFRM_HDR_LEN + (*xlen) - 1) == '\0')
5777 {
5778 (*xlen)--;
5779 }
5780
17f41037
KW
5781 /* If the first try didn't get it, it means our prediction was low.
5782 * Modify the coefficients so that we predict a larger value in any
5783 * future transformations */
5784 if (! first_time) {
5785 STRLEN needed = *xlen + 1; /* +1 For trailing NUL */
5786 STRLEN computed_guess = PL_collxfrm_base
5787 + (PL_collxfrm_mult * length_in_chars);
e1c30f0c
KW
5788
5789 /* On zero-length input, just keep current slope instead of
5790 * dividing by 0 */
5791 const STRLEN new_m = (length_in_chars != 0)
5792 ? needed / length_in_chars
5793 : PL_collxfrm_mult;
17f41037
KW
5794
5795 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 5796 "initial size of %zu bytes for a length "
b07929e4 5797 "%zu string was insufficient, %zu needed\n",
b07929e4 5798 computed_guess, length_in_chars, needed));
17f41037
KW
5799
5800 /* If slope increased, use it, but discard this result for
5801 * length 1 strings, as we can't be sure that it's a real slope
5802 * change */
5803 if (length_in_chars > 1 && new_m > PL_collxfrm_mult) {
7d4bcc4a
KW
5804
5805# ifdef DEBUGGING
5806
17f41037
KW
5807 STRLEN old_m = PL_collxfrm_mult;
5808 STRLEN old_b = PL_collxfrm_base;
7d4bcc4a
KW
5809
5810# endif
5811
17f41037
KW
5812 PL_collxfrm_mult = new_m;
5813 PL_collxfrm_base = 1; /* +1 For trailing NUL */
5814 computed_guess = PL_collxfrm_base
5815 + (PL_collxfrm_mult * length_in_chars);
5816 if (computed_guess < needed) {
5817 PL_collxfrm_base += needed - computed_guess;
5818 }
5819
5820 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 5821 "slope is now %zu; was %zu, base "
b07929e4 5822 "is now %zu; was %zu\n",
b07929e4
KW
5823 PL_collxfrm_mult, old_m,
5824 PL_collxfrm_base, old_b));
17f41037
KW
5825 }
5826 else { /* Slope didn't change, but 'b' did */
5827 const STRLEN new_b = needed
5828 - computed_guess
5829 + PL_collxfrm_base;
5830 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 5831 "base is now %zu; was %zu\n", new_b, PL_collxfrm_base));
17f41037
KW
5832 PL_collxfrm_base = new_b;
5833 }
5834 }
5835
4ebeff16
KW
5836 break;
5837 }
bb0f664e 5838
c7202dee
KW
5839 if (UNLIKELY(*xlen >= PERL_INT_MAX)) {
5840 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5841 "mem_collxfrm_: Needed %zu bytes, max permissible is %u\n",
c7202dee 5842 *xlen, PERL_INT_MAX));
4ebeff16 5843 goto bad;
c7202dee 5844 }
d35fca5f 5845
c664130f 5846 /* A well-behaved strxfrm() returns exactly how much space it needs
1adab0a7
KW
5847 * (usually not including the trailing NUL) when it fails due to not
5848 * enough space being provided. Assume that this is the case unless
5849 * it's been proven otherwise */
c664130f 5850 if (LIKELY(PL_strxfrm_is_behaved) && first_time) {
55e5378d 5851 xAlloc = *xlen + COLLXFRM_HDR_LEN + 1;
c664130f
KW
5852 }
5853 else { /* Here, either:
5854 * 1) The strxfrm() has previously shown bad behavior; or
5855 * 2) It isn't the first time through the loop, which means
5856 * that the strxfrm() is now showing bad behavior, because
5857 * we gave it what it said was needed in the previous
5858 * iteration, and it came back saying it needed still more.
5859 * (Many versions of cygwin fit this. When the buffer size
5860 * isn't sufficient, they return the input size instead of
5861 * how much is needed.)
d4ff9586
KW
5862 * Increase the buffer size by a fixed percentage and try again.
5863 * */
6ddd902c 5864 xAlloc += (xAlloc / 4) + 1;
c664130f 5865 PL_strxfrm_is_behaved = FALSE;
c664130f 5866
94d521c0 5867 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
1362b264 5868 "mem_collxfrm_ required more space than previously"
94d521c0
KW
5869 " calculated for locale %s, trying again with new"
5870 " guess=%zu+%zu\n",
3e25f6d2 5871 PL_collation_name, COLLXFRM_HDR_LEN,
94d521c0 5872 xAlloc - COLLXFRM_HDR_LEN));
58eebef2 5873 }
c664130f 5874
4ebeff16 5875 Renew(xbuf, xAlloc, char);
c7202dee
KW
5876 if (UNLIKELY(! xbuf)) {
5877 DEBUG_L(PerlIO_printf(Perl_debug_log,
1362b264 5878 "mem_collxfrm_: Couldn't realloc %zu bytes\n", xAlloc));
4ebeff16 5879 goto bad;
c7202dee 5880 }
c664130f
KW
5881
5882 first_time = FALSE;
4ebeff16 5883 }
98994639 5884
1915ba89
KW
5885 CLEANUP_STRXFRM;
5886
467b0d67 5887 DEBUG_L(print_collxfrm_input_and_return(s, s + len, xbuf, *xlen, utf8));
58eebef2 5888
fe1c1494 5889 /* Free up unneeded space; retain enough for trailing NUL */
55e5378d 5890 Renew(xbuf, COLLXFRM_HDR_LEN + *xlen + 1, char);
98994639 5891
6696cfa7
KW
5892 if (s != input_string) {
5893 Safefree(s);
98994639
HS
5894 }
5895
98994639
HS
5896 return xbuf;
5897
5898 bad:
7d4bcc4a 5899
1915ba89 5900 CLEANUP_STRXFRM;
467b0d67 5901 DEBUG_L(print_collxfrm_input_and_return(s, s + len, NULL, 0, utf8));
7d4bcc4a 5902
21dce8f4
KW
5903 Safefree(xbuf);
5904 if (s != input_string) {
5905 Safefree(s);
5906 }
5907 *xlen = 0;
5908
98994639
HS
5909 return NULL;
5910}
5911
7d4bcc4a 5912# ifdef DEBUGGING
c7202dee 5913
4cbaac56 5914STATIC void
c7202dee 5915S_print_collxfrm_input_and_return(pTHX_
467b0d67
KW
5916 const char * s,
5917 const char * e,
5918 const char * xbuf,
5919 const STRLEN xlen,
c7202dee
KW
5920 const bool is_utf8)
5921{
c7202dee
KW
5922
5923 PERL_ARGS_ASSERT_PRINT_COLLXFRM_INPUT_AND_RETURN;
5924
467b0d67
KW
5925 PerlIO_printf(Perl_debug_log,
5926 "mem_collxfrm_[ix %" UVuf "] for locale '%s':\n",
5927 (UV) PL_collation_ix, PL_collation_name);
5928 PerlIO_printf(Perl_debug_log, " input=");
9c8a6dc2 5929 print_bytes_for_locale(s, e, is_utf8);
467b0d67
KW
5930 PerlIO_printf(Perl_debug_log, "\n return=%s\n return len=%zu\n",
5931 ((xbuf == NULL)
5932 ? "(null)"
5933 : _byte_dump_string((U8 *) xbuf + COLLXFRM_HDR_LEN, xlen, 0)),
5934 xlen);
9c8a6dc2
KW
5935}
5936
4d9252fd
KW
5937# endif /* DEBUGGING */
5938#endif /* USE_LOCALE_COLLATE */
fe1c1494 5939
4ab43d2f 5940#ifdef DEBUGGING
4d9252fd 5941
9c8a6dc2
KW
5942STATIC void
5943S_print_bytes_for_locale(pTHX_
5944 const char * const s,
5945 const char * const e,
5946 const bool is_utf8)
5947{
5948 const char * t = s;
5949 bool prev_was_printable = TRUE;
5950 bool first_time = TRUE;
5951
5952 PERL_ARGS_ASSERT_PRINT_BYTES_FOR_LOCALE;
c7202dee
KW
5953
5954 while (t < e) {
5955 UV cp = (is_utf8)
5956 ? utf8_to_uvchr_buf((U8 *) t, e, NULL)
5957 : * (U8 *) t;
5958 if (isPRINT(cp)) {
5959 if (! prev_was_printable) {
5960 PerlIO_printf(Perl_debug_log, " ");
5961 }
5962 PerlIO_printf(Perl_debug_log, "%c", (U8) cp);
5963 prev_was_printable = TRUE;
5964 }
5965 else {
5966 if (! first_time) {
5967 PerlIO_printf(Perl_debug_log, " ");
5968 }
147e3846 5969 PerlIO_printf(Perl_debug_log, "%02" UVXf, cp);
c7202dee
KW
5970 prev_was_printable = FALSE;
5971 }
5972 t += (is_utf8) ? UTF8SKIP(t) : 1;
5973 first_time = FALSE;
5974 }
c7202dee
KW
5975}
5976
4ab43d2f
KW
5977#endif /* #ifdef DEBUGGING */
5978#ifdef USE_LOCALE
8ef6e574 5979
962aa53f 5980STATIC const char *
497bfce4
KW
5981S_toggle_locale_i(pTHX_ const unsigned cat_index,
5982 const char * new_locale,
5983 const line_t caller_line)
5984{
5985 /* Changes the locale for the category specified by 'index' to 'new_locale,
5986 * if they aren't already the same.
5987 *
5988 * Returns a copy of the name of the original locale for 'cat_index'
5989 * so can be switched back to with the companion function
5990 * restore_toggled_locale_i(), (NULL if no restoral is necessary.) */
5991
5992 const char * locale_to_restore_to = NULL;
5993
5994 PERL_ARGS_ASSERT_TOGGLE_LOCALE_I;
5995 assert(cat_index <= NOMINAL_LC_ALL_INDEX);
5996
5997 /* Find the original locale of the category we may need to change, so that
5998 * it can be restored to later */
5999
6000 locale_to_restore_to = querylocale_i(cat_index);
6001
6002 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6003 "(%" LINE_Tf "): toggle_locale_i: index=%d(%s), wanted=%s,"
6004 " actual=%s\n",
6005 caller_line, cat_index, category_names[cat_index],
6006 new_locale, locale_to_restore_to));
6007
6008 if (! locale_to_restore_to) {
6009 locale_panic_(Perl_form(aTHX_ "Could not find current %s locale, errno=%d",
6010 category_names[cat_index], errno));
6011 }
6012
6013 /* If the locales are the same, there's nothing to do */
6014 if (strEQ(locale_to_restore_to, new_locale)) {
6015 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6016 "(%d): %s locale unchanged as %s\n",
6017 caller_line, category_names[cat_index],
6018 new_locale));
6019
6020 return NULL;
6021 }
6022
6023 /* Finally, change the locale to the new one */
6024 void_setlocale_i(cat_index, new_locale);
6025
6026 DEBUG_Lv(PerlIO_printf(Perl_debug_log, "(%" LINE_Tf "): %s locale switched to %s\n",
6027 caller_line, category_names[cat_index], new_locale));
6028
6029 return locale_to_restore_to;
6030
6031#ifndef DEBUGGING
6032 PERL_UNUSED_ARG(caller_line);
6033#endif
6034
6035}
6036
6037STATIC void
6038S_restore_toggled_locale_i(pTHX_ const unsigned int cat_index,
6039 const char * restore_locale,
6040 const line_t caller_line)
6041{
6042 /* Restores the locale for LC_category corresponding to cat_indes to
6043 * 'restore_locale' (which is a copy that will be freed by this function),
6044 * or do nothing if the latter parameter is NULL */
6045
6046 PERL_ARGS_ASSERT_RESTORE_TOGGLED_LOCALE_I;
6047 assert(cat_index <= NOMINAL_LC_ALL_INDEX);
6048
6049 if (restore_locale == NULL) {
6050 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6051 "(%" LINE_Tf "): No need to restore %s\n",
6052 caller_line, category_names[cat_index]));
6053 return;
6054 }
6055
6056 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6057 "(%" LINE_Tf "): %s restoring locale to %s\n",
6058 caller_line, category_names[cat_index],
6059 restore_locale));
6060
6061 void_setlocale_i(cat_index, restore_locale);
6062
6063#ifndef DEBUGGING
6064 PERL_UNUSED_ARG(caller_line);
6065#endif
6066
6067}
6068
057f0a44
KW
6069#ifdef USE_LOCALE_CTYPE
6070
9bbcebb1
KW
6071STATIC bool
6072S_is_codeset_name_UTF8(const char * name)
6073{
6074 /* Return a boolean as to if the passed-in name indicates it is a UTF-8
6075 * code set. Several variants are possible */
6076 const Size_t len = strlen(name);
6077
6078 PERL_ARGS_ASSERT_IS_CODESET_NAME_UTF8;
6079
6080# ifdef WIN32
6081
6082 /* http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx */
6083 if (memENDs(name, len, "65001")) {
6084 return TRUE;
6085 }
6086
6087# endif
6088 /* 'UTF8' or 'UTF-8' */
6089 return ( inRANGE(len, 4, 5)
6090 && name[len-1] == '8'
6091 && ( memBEGINs(name, len, "UTF")
6092 || memBEGINs(name, len, "utf"))
6093 && (len == 4 || name[3] == '-'));
6094}
6095
057f0a44
KW
6096#endif
6097
a0eeb339
KW
6098STATIC bool
6099S_is_locale_utf8(pTHX_ const char * locale)
6100{
6101 /* Returns TRUE if the locale 'locale' is UTF-8; FALSE otherwise. It uses
58399f03
KW
6102 * my_langinfo(), which employs various methods to get this information
6103 * if nl_langinfo() isn't available, using heuristics as a last resort, in
6104 * which case, the result will very likely be correct for locales for
6105 * languages that have commonly used non-ASCII characters, but for notably
6106 * English, it comes down to if the locale's name ends in something like
6107 * "UTF-8". It errs on the side of not being a UTF-8 locale. */
a0eeb339
KW
6108
6109# if ! defined(USE_LOCALE_CTYPE) \
6110 || defined(EBCDIC) /* There aren't any real UTF-8 locales at this time */
6111
6112 PERL_UNUSED_ARG(locale);
6113
6114 return FALSE;
6115
6116# else
6117
6118 const char * scratch_buffer = NULL;
b269edc8
KW
6119 const char * codeset;
6120 bool retval;
a0eeb339
KW
6121
6122 PERL_ARGS_ASSERT_IS_LOCALE_UTF8;
6123
ccbc6a9d 6124 if (strEQ(locale, PL_ctype_name)) {
b269edc8
KW
6125 return PL_in_utf8_CTYPE_locale;
6126 }
6127
6128 codeset = my_langinfo_c(CODESET, LC_CTYPE, locale,
6129 &scratch_buffer, NULL, NULL);
6130 retval = is_codeset_name_UTF8(codeset);
6131
a0eeb339
KW
6132 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
6133 "found codeset=%s, is_utf8=%d\n", codeset, retval));
6134
6135 Safefree(scratch_buffer);
6136 return retval;
6137
6138# endif
6139
6140}
6141
6142#endif /* USE_LOCALE */
7d74bb61 6143
d6ded950
KW
6144bool
6145Perl__is_in_locale_category(pTHX_ const bool compiling, const int category)
6146{
6147 /* Internal function which returns if we are in the scope of a pragma that
6148 * enables the locale category 'category'. 'compiling' should indicate if
6149 * this is during the compilation phase (TRUE) or not (FALSE). */
6150
6151 const COP * const cop = (compiling) ? &PL_compiling : PL_curcop;
6152
363d7d4a
JK
6153 SV *these_categories = cop_hints_fetch_pvs(cop, "locale", 0);
6154 if (! these_categories || these_categories == &PL_sv_placeholder) {
d6ded950
KW
6155 return FALSE;
6156 }
6157
6158 /* The pseudo-category 'not_characters' is -1, so just add 1 to each to get
6159 * a valid unsigned */
6160 assert(category >= -1);
363d7d4a 6161 return cBOOL(SvUV(these_categories) & (1U << (category + 1)));
d6ded950
KW
6162}
6163
091b5ed7
KW
6164/* my_strerror() returns a mortalized copy of the text of the error message
6165 * associated with 'errnum'.
6166 *
6167 * If not called from within the scope of 'use locale', it uses the text from
6168 * the C locale. If Perl is compiled to not pay attention to LC_CTYPE nor
6169 * LC_MESSAGES, it uses whatever strerror() returns. Otherwise the text is
6170 * derived from the locale, LC_MESSAGES if we have that; LC_CTYPE if not.
6171 *
c728a1a6
KW
6172 * It returns in *utf8ness the result's UTF-8ness
6173 *
cb5c690d
KW
6174 * The function just calls strerror(), but temporarily switches locales, if
6175 * needed. Many platforms require LC_CTYPE and LC_MESSAGES to be in the same
6176 * CODESET in order for the return from strerror() to not contain '?' symbols,
6177 * or worse, mojibaked. It's cheaper to just use the stricter criteria of
6178 * being in the same locale. So the code below uses a common locale for both
6179 * categories. Again, that is C if not within 'use locale' scope; or the
6180 * LC_MESSAGES locale if in scope and we have that category; and LC_CTYPE if we
6181 * don't have LC_MESSAGES; and whatever strerror returns if we don't have
6182 * either category.
6183 *
6184 * There are two sets of implementations. The first below is if we have
6185 * strerror_l(). This is the simpler. We just use the already-built C locale
6186 * object if not in locale scope, or build up a custom one otherwise.
6187 *
6188 * When strerror_l() is not available, we may have to swap locales temporarily
6189 * to bring the two categories into sync with each other, and possibly to the C
6190 * locale.
6191 *
6192 * Because the prepropessing directives to conditionally compile this function
6193 * would greatly obscure the logic of the various implementations, the whole
6194 * function is repeated for each configuration, with some common macros. */
091b5ed7
KW
6195
6196/* Used to shorten the definitions of the following implementations of
6197 * my_strerror() */
6198#define DEBUG_STRERROR_ENTER(errnum, in_locale) \
6199 DEBUG_Lv(PerlIO_printf(Perl_debug_log, \
6200 "my_strerror called with errnum %d;" \
6201 " Within locale scope=%d\n", \
6202 errnum, in_locale))
c728a1a6 6203#define DEBUG_STRERROR_RETURN(errstr, utf8ness) \
091b5ed7
KW
6204 DEBUG_Lv(PerlIO_printf(Perl_debug_log, \
6205 "Strerror returned; saving a copy: '"); \
c728a1a6
KW
6206 print_bytes_for_locale(errstr, errstr + strlen(errstr), 0); \
6207 PerlIO_printf(Perl_debug_log, "'; utf8ness=%d\n", (int) *utf8ness);)
091b5ed7 6208
cb5c690d
KW
6209/* On platforms that have precisely one of these categories (Windows
6210 * qualifies), these yield the correct one */
6211#if defined(USE_LOCALE_CTYPE)
6212# define WHICH_LC_INDEX LC_CTYPE_INDEX_
6213#elif defined(USE_LOCALE_MESSAGES)
6214# define WHICH_LC_INDEX LC_MESSAGES_INDEX_
6215#endif
091b5ed7 6216
cb5c690d
KW
6217/*==========================================================================*/
6218/* First set of implementations, when have strerror_l() */
6219
6220#if defined(USE_POSIX_2008_LOCALE) && defined(HAS_STRERROR_L)
6221
6222# if ! defined(USE_LOCALE_CTYPE) && ! defined(USE_LOCALE_MESSAGES)
6223
6224/* Here, neither category is defined: use the C locale */
8fffab58 6225const char *
c728a1a6 6226Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
6ebbc862 6227{
cb5c690d 6228 PERL_ARGS_ASSERT_MY_STRERROR;
6ebbc862 6229
cb5c690d 6230 DEBUG_STRERROR_ENTER(errnum, 0);
52770946 6231
8fffab58 6232 const char *errstr = savepv(strerror_l(errnum, PL_C_locale_obj));
c728a1a6 6233 *utf8ness = UTF8NESS_IMMATERIAL;
52770946 6234
c728a1a6 6235 DEBUG_STRERROR_RETURN(errstr, utf8ness);
091b5ed7
KW
6236
6237 SAVEFREEPV(errstr);
6238 return errstr;
6239}
cb5c690d
KW
6240
6241# elif ! defined(USE_LOCALE_CTYPE) || ! defined(USE_LOCALE_MESSAGES)
6242
091b5ed7 6243/*--------------------------------------------------------------------------*/
091b5ed7 6244
cb5c690d
KW
6245/* Here one or the other of CTYPE or MESSAGES is defined, but not both. If we
6246 * are not within 'use locale' scope of the only one defined, we use the C
6247 * locale; otherwise use the current locale object */
52770946 6248
8fffab58 6249const char *
c728a1a6 6250Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
091b5ed7 6251{
cb5c690d 6252 PERL_ARGS_ASSERT_MY_STRERROR;
2c6ee1a7 6253
cb5c690d 6254 DEBUG_STRERROR_ENTER(errnum, IN_LC(categories[WHICH_LC_INDEX]));
7aaa36b1 6255
cb5c690d
KW
6256 /* Use C if not within locale scope; Otherwise, use current locale */
6257 const locale_t which_obj = (IN_LC(categories[WHICH_LC_INDEX]))
6258 ? PL_C_locale_obj
6259 : use_curlocale_scratch();
39e69e77 6260
8fffab58 6261 const char *errstr = savepv(strerror_l(errnum, which_obj));
c728a1a6
KW
6262 *utf8ness = get_locale_string_utf8ness_i(NULL, WHICH_LC_INDEX, errstr,
6263 LOCALE_UTF8NESS_UNKNOWN);
6264 DEBUG_STRERROR_RETURN(errstr, utf8ness);
091b5ed7
KW
6265
6266 SAVEFREEPV(errstr);
6267 return errstr;
6268}
cb5c690d 6269
091b5ed7 6270/*--------------------------------------------------------------------------*/
cb5c690d
KW
6271# else /* Are using both categories. Place them in the same CODESET,
6272 * either C or the LC_MESSAGES locale */
6273
8fffab58 6274const char *
c728a1a6 6275Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
cb5c690d
KW
6276{
6277 PERL_ARGS_ASSERT_MY_STRERROR;
6278
6279 DEBUG_STRERROR_ENTER(errnum, IN_LC(LC_MESSAGES));
6280
8fffab58 6281 const char *errstr;
cb5c690d 6282 if (! IN_LC(LC_MESSAGES)) { /* Use C if not within locale scope */
4eb27fc5 6283 errstr = savepv(strerror_l(errnum, PL_C_locale_obj));
c728a1a6 6284 *utf8ness = UTF8NESS_IMMATERIAL;
7aaa36b1 6285 }
cb5c690d
KW
6286 else { /* Otherwise, use the LC_MESSAGES locale, making sure LC_CTYPE
6287 matches */
6288 locale_t cur = duplocale(use_curlocale_scratch());
6289
6290 cur = newlocale(LC_CTYPE_MASK, querylocale_c(LC_MESSAGES), cur);
6291 errstr = savepv(strerror_l(errnum, cur));
c728a1a6
KW
6292 *utf8ness = get_locale_string_utf8ness_i(NULL, LC_MESSAGES_INDEX_,
6293 errstr, LOCALE_UTF8NESS_UNKNOWN);
cb5c690d
KW
6294 freelocale(cur);
6295 }
7aaa36b1 6296
c728a1a6 6297 DEBUG_STRERROR_RETURN(errstr, utf8ness);
43cb6651 6298
091b5ed7
KW
6299 SAVEFREEPV(errstr);
6300 return errstr;
6301}
cb5c690d 6302# endif /* Above is using strerror_l */
cb5c690d
KW
6303/*==========================================================================*/
6304#else /* Below is not using strerror_l */
6305# if ! defined(USE_LOCALE_CTYPE) && ! defined(USE_LOCALE_MESSAGES)
43cb6651 6306
cb5c690d
KW
6307/* If not using using either of the categories, return plain, unadorned
6308 * strerror */
43cb6651 6309
8fffab58 6310const char *
c728a1a6 6311Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
cb5c690d
KW
6312{
6313 PERL_ARGS_ASSERT_MY_STRERROR;
6314
6315 DEBUG_STRERROR_ENTER(errnum, 0);
43cb6651 6316
8fffab58 6317 const char *errstr = savepv(Strerror(errnum));
c728a1a6 6318 *utf8ness = UTF8NESS_IMMATERIAL;
43cb6651 6319
c728a1a6 6320 DEBUG_STRERROR_RETURN(errstr, utf8ness);
091b5ed7
KW
6321
6322 SAVEFREEPV(errstr);
6323 return errstr;
6324}
cb5c690d 6325
091b5ed7 6326/*--------------------------------------------------------------------------*/
cb5c690d 6327# elif ! defined(USE_LOCALE_CTYPE) || ! defined(USE_LOCALE_MESSAGES)
091b5ed7 6328
cb5c690d
KW
6329/* Here one or the other of CTYPE or MESSAGES is defined, but not both. If we
6330 * are not within 'use locale' scope of the only one defined, we use the C
6331 * locale; otherwise use the current locale */
7aaa36b1 6332
8fffab58 6333const char *
c728a1a6 6334Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
cb5c690d
KW
6335{
6336 PERL_ARGS_ASSERT_MY_STRERROR;
2c6ee1a7 6337
cb5c690d 6338 DEBUG_STRERROR_ENTER(errnum, IN_LC(categories[WHICH_LC_INDEX]));
6ebbc862 6339
8fffab58 6340 const char *errstr;
cb5c690d
KW
6341 if (IN_LC(categories[WHICH_LC_INDEX])) {
6342 errstr = savepv(Strerror(errnum));
c728a1a6
KW
6343 *utf8ness = get_locale_string_utf8ness_i(NULL, WHICH_LC_INDEX, errstr,
6344 LOCALE_UTF8NESS_UNKNOWN);
cb5c690d
KW
6345 }
6346 else {
d2be42af 6347
cb5c690d 6348 SETLOCALE_LOCK;
d2be42af 6349
cb5c690d 6350 const char * orig_locale = toggle_locale_i(WHICH_LC_INDEX, "C");
d2be42af 6351
cb5c690d 6352 errstr = savepv(Strerror(errnum));
a0b53297 6353
cb5c690d 6354 restore_toggled_locale_i(WHICH_LC_INDEX, orig_locale);
6ebbc862 6355
d2be42af 6356 SETLOCALE_UNLOCK;
c728a1a6
KW
6357
6358 *utf8ness = UTF8NESS_IMMATERIAL;
6359
6ebbc862
KW
6360 }
6361
c728a1a6 6362 DEBUG_STRERROR_RETURN(errstr, utf8ness);
cb5c690d
KW
6363
6364 SAVEFREEPV(errstr);
6365 return errstr;
6366}
6367
6368/*--------------------------------------------------------------------------*/
6369# else
6370
6371/* Below, have both LC_CTYPE and LC_MESSAGES. Place them in the same CODESET,
6372 * either C or the LC_MESSAGES locale */
6373
8fffab58 6374const char *
c728a1a6 6375Perl_my_strerror(pTHX_ const int errnum, utf8ness_t * utf8ness)
cb5c690d
KW
6376{
6377 PERL_ARGS_ASSERT_MY_STRERROR;
6378
6379 DEBUG_STRERROR_ENTER(errnum, IN_LC(LC_MESSAGES));
6380
6381 const char * desired_locale = (IN_LC(LC_MESSAGES))
6382 ? querylocale_c(LC_MESSAGES)
6383 : "C";
6384 /* XXX Can fail on z/OS */
6385
6386 SETLOCALE_LOCK;
6387
6388 const char* orig_CTYPE_locale = toggle_locale_c(LC_CTYPE, desired_locale);
6389 const char* orig_MESSAGES_locale = toggle_locale_c(LC_MESSAGES,
6390 desired_locale);
8fffab58 6391 const char *errstr = savepv(Strerror(errnum));
cb5c690d
KW
6392
6393 restore_toggled_locale_c(LC_MESSAGES, orig_MESSAGES_locale);
6394 restore_toggled_locale_c(LC_CTYPE, orig_CTYPE_locale);
6395
7953f73f 6396 SETLOCALE_UNLOCK;
6ebbc862 6397
c728a1a6
KW
6398 *utf8ness = get_locale_string_utf8ness_i(NULL, LC_MESSAGES_INDEX_, errstr,
6399 LOCALE_UTF8NESS_UNKNOWN);
6400 DEBUG_STRERROR_RETURN(errstr, utf8ness);
2c6ee1a7 6401
52770946 6402 SAVEFREEPV(errstr);
6ebbc862 6403 return errstr;
2c6ee1a7
KW
6404}
6405
cb5c690d
KW
6406/*--------------------------------------------------------------------------*/
6407# endif /* end of not using strerror_l() */
091b5ed7
KW
6408#endif /* end of all the my_strerror() implementations */
6409
66610fdd 6410/*
747c467a 6411
58e641fb
KW
6412=for apidoc switch_to_global_locale
6413
75920755 6414On systems without locale support, or on typical single-threaded builds, or on
58e641fb
KW
6415platforms that do not support per-thread locale operations, this function does
6416nothing. On such systems that do have locale support, only a locale global to
6417the whole program is available.
6418
6419On multi-threaded builds on systems that do have per-thread locale operations,
6420this function converts the thread it is running in to use the global locale.
6421This is for code that has not yet or cannot be updated to handle multi-threaded
6422locale operation. As long as only a single thread is so-converted, everything
6423works fine, as all the other threads continue to ignore the global one, so only
6424this thread looks at it.
6425
69c5e0db
KW
6426However, on Windows systems this isn't quite true prior to Visual Studio 15,
6427at which point Microsoft fixed a bug. A race can occur if you use the
6428following operations on earlier Windows platforms:
6429
6430=over
6431
6432=item L<POSIX::localeconv|POSIX/localeconv>
6433
6434=item L<I18N::Langinfo>, items C<CRNCYSTR> and C<THOUSEP>
6435
6436=item L<perlapi/Perl_langinfo>, items C<CRNCYSTR> and C<THOUSEP>
6437
6438=back
6439
6440The first item is not fixable (except by upgrading to a later Visual Studio
6441release), but it would be possible to work around the latter two items by using
6442the Windows API functions C<GetNumberFormat> and C<GetCurrencyFormat>; patches
6443welcome.
6444
58e641fb
KW
6445Without this function call, threads that use the L<C<setlocale(3)>> system
6446function will not work properly, as all the locale-sensitive functions will
6447look at the per-thread locale, and C<setlocale> will have no effect on this
6448thread.
6449
6450Perl code should convert to either call
6451L<C<Perl_setlocale>|perlapi/Perl_setlocale> (which is a drop-in for the system
6452C<setlocale>) or use the methods given in L<perlcall> to call
6453L<C<POSIX::setlocale>|POSIX/setlocale>. Either one will transparently properly
6454handle all cases of single- vs multi-thread, POSIX 2008-supported or not.
6455
6456Non-Perl libraries, such as C<gtk>, that call the system C<setlocale> can
6457continue to work if this function is called before transferring control to the
6458library.
6459
6460Upon return from the code that needs to use the global locale,
6461L<C<sync_locale()>|perlapi/sync_locale> should be called to restore the safe
6462multi-thread operation.
6463
6464=cut
6465*/
6466
6467void
6468Perl_switch_to_global_locale()
6469{
4bcc858c 6470 dTHX;
58e641fb
KW
6471
6472#ifdef USE_THREAD_SAFE_LOCALE
6473# ifdef WIN32
6474
6475 _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
6476
6477# else
58e641fb
KW
6478
6479 {
6480 unsigned int i;
6481
4fd76d49 6482 for (i = 0; i < LC_ALL_INDEX_; i++) {
a683a66a 6483 setlocale(categories[i], querylocale_i(i));
58e641fb
KW
6484 }
6485 }
6486
58e641fb
KW
6487 uselocale(LC_GLOBAL_LOCALE);
6488
6489# endif
6490#endif
6491
6492}
6493
6494/*
6495
747c467a
KW
6496=for apidoc sync_locale
6497
b2e9ba0c
KW
6498L<C<Perl_setlocale>|perlapi/Perl_setlocale> can be used at any time to query or
6499change the locale (though changing the locale is antisocial and dangerous on
6500multi-threaded systems that don't have multi-thread safe locale operations.
6501(See L<perllocale/Multi-threaded operation>). Using the system
6502L<C<setlocale(3)>> should be avoided. Nevertheless, certain non-Perl libraries
6503called from XS, such as C<Gtk> do so, and this can't be changed. When the
6504locale is changed by XS code that didn't use
6505L<C<Perl_setlocale>|perlapi/Perl_setlocale>, Perl needs to be told that the
6506locale has changed. Use this function to do so, before returning to Perl.
6507
6508The return value is a boolean: TRUE if the global locale at the time of call
6509was in effect; and FALSE if a per-thread locale was in effect. This can be
6510used by the caller that needs to restore things as-they-were to decide whether
6511or not to call
6512L<C<Perl_switch_to_global_locale>|perlapi/switch_to_global_locale>.
747c467a
KW
6513
6514=cut
6515*/
6516
b2e9ba0c
KW
6517bool
6518Perl_sync_locale()
747c467a 6519{
d36adde0
KW
6520
6521#ifndef USE_LOCALE
6522
6523 return TRUE;
6524
6525#else
6526
e9bc6d6b 6527 const char * newlocale;
b2e9ba0c
KW
6528 dTHX;
6529
d36adde0 6530# ifdef USE_POSIX_2008_LOCALE
b2e9ba0c
KW
6531
6532 bool was_in_global_locale = FALSE;
6533 locale_t cur_obj = uselocale((locale_t) 0);
6534
6535 /* On Windows, unless the foreign code has turned off the thread-safe
6536 * locale setting, any plain setlocale() will have affected what we see, so
6537 * no need to worry. Otherwise, If the foreign code has done a plain
6538 * setlocale(), it will only affect the global locale on POSIX systems, but
6539 * will affect the */
6540 if (cur_obj == LC_GLOBAL_LOCALE) {
6541
d36adde0 6542# ifdef HAS_QUERY_LOCALE
b2e9ba0c 6543
a683a66a 6544 void_setlocale_c(LC_ALL, querylocale_c(LC_ALL));
b2e9ba0c 6545
d36adde0 6546# else
b2e9ba0c
KW
6547
6548 unsigned int i;
6549
6550 /* We can't trust that we can read the LC_ALL format on the
6551 * platform, so do them individually */
4fd76d49 6552 for (i = 0; i < LC_ALL_INDEX_; i++) {
3980cddb 6553 void_setlocale_i(i, querylocale_i(i));
b2e9ba0c 6554 }
747c467a 6555
d36adde0 6556# endif
b2e9ba0c
KW
6557
6558 was_in_global_locale = TRUE;
6559 }
6560
d36adde0 6561# else
b2e9ba0c
KW
6562
6563 bool was_in_global_locale = TRUE;
6564
d36adde0
KW
6565# endif
6566# ifdef USE_LOCALE_CTYPE
7d4bcc4a 6567
22392ba2 6568 newlocale = querylocale_c(LC_CTYPE);
9f82ea3e 6569 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 6570 "%s\n", setlocale_debug_string_c(LC_CTYPE, NULL, newlocale)));
9f82ea3e 6571 new_ctype(newlocale);
747c467a 6572
d36adde0
KW
6573# endif /* USE_LOCALE_CTYPE */
6574# ifdef USE_LOCALE_COLLATE
7d4bcc4a 6575
22392ba2 6576 newlocale = querylocale_c(LC_COLLATE);
9f82ea3e 6577 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 6578 "%s\n", setlocale_debug_string_c(LC_COLLATE, NULL, newlocale)));
9f82ea3e 6579 new_collate(newlocale);
747c467a 6580
d36adde0
KW
6581# endif
6582# ifdef USE_LOCALE_NUMERIC
7d4bcc4a 6583
22392ba2 6584 newlocale = querylocale_c(LC_NUMERIC);
9f82ea3e 6585 DEBUG_Lv(PerlIO_printf(Perl_debug_log,
b26541a9 6586 "%s\n", setlocale_debug_string_c(LC_NUMERIC, NULL, newlocale)));
9f82ea3e 6587 new_numeric(newlocale);
7d4bcc4a 6588
d36adde0 6589# endif /* USE_LOCALE_NUMERIC */
747c467a 6590
b2e9ba0c 6591 return was_in_global_locale;
d36adde0
KW
6592
6593#endif
6594
747c467a
KW
6595}
6596
5d1187d1
KW
6597#if defined(DEBUGGING) && defined(USE_LOCALE)
6598
a4f00dcc 6599STATIC char *
1e78b5ed
KW
6600S_my_setlocale_debug_string_i(pTHX_
6601 const unsigned cat_index,
6602 const char* locale, /* Optional locale name */
5d1187d1 6603
1e78b5ed
KW
6604 /* return value from setlocale() when attempting
6605 * to set 'category' to 'locale' */
6606 const char* retval,
6607
6608 const line_t line)
5d1187d1
KW
6609{
6610 /* Returns a pointer to a NUL-terminated string in static storage with
6611 * added text about the info passed in. This is not thread safe and will
6612 * be overwritten by the next call, so this should be used just to
fa07b8e5 6613 * formulate a string to immediately print or savepv() on. */
5d1187d1 6614
1e78b5ed
KW
6615 const char * locale_quote;
6616 const char * retval_quote;
7d4bcc4a 6617
1e78b5ed 6618 assert(cat_index <= NOMINAL_LC_ALL_INDEX);
5d1187d1 6619
1e78b5ed
KW
6620 if (locale == NULL) {
6621 locale_quote = "";
6622 locale = "NULL";
5d1187d1
KW
6623 }
6624 else {
1e78b5ed 6625 locale_quote = "\"";
5d1187d1
KW
6626 }
6627
1e78b5ed
KW
6628 if (retval == NULL) {
6629 retval_quote = "";
6630 retval = "NULL";
5d1187d1
KW
6631 }
6632 else {
1e78b5ed 6633 retval_quote = "\"";
5d1187d1
KW
6634 }
6635
1e78b5ed
KW
6636# ifdef USE_LOCALE_THREADS
6637# define THREAD_FORMAT "%p:"
6638# define THREAD_ARGUMENT aTHX_
6639# else
6640# define THREAD_FORMAT
6641# define THREAD_ARGUMENT
6642# endif
6643
6644 return Perl_form(aTHX_
6645 "%s:%" LINE_Tf ":" THREAD_FORMAT
6646 " setlocale(%s[%d], %s%s%s) returned %s%s%s\n",
5d1187d1 6647
1e78b5ed
KW
6648 __FILE__, line, THREAD_ARGUMENT
6649 category_names[cat_index], categories[cat_index],
6650 locale_quote, locale, locale_quote,
6651 retval_quote, retval, retval_quote);
5d1187d1
KW
6652}
6653
6654#endif
747c467a 6655
e9bc6d6b
KW
6656void
6657Perl_thread_locale_init()
6658{
6659 /* Called from a thread on startup*/
6660
6661#ifdef USE_THREAD_SAFE_LOCALE
6662
6663 dTHX_DEBUGGING;
6664
e9bc6d6b 6665 DEBUG_L(PerlIO_printf(Perl_debug_log,
289e5f30
KW
6666 "new thread, initial locale is %s;"
6667 " calling setlocale(LC_ALL, \"C\")\n",
6668 get_LC_ALL_display()));
e9bc6d6b
KW
6669# ifdef WIN32
6670
fe1c1494 6671 /* On Windows, make sure new thread has per-thread locales enabled */
e9bc6d6b
KW
6672 _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
6673
6b32b503 6674# endif
289e5f30 6675# if defined(LC_ALL)
e9bc6d6b 6676
289e5f30
KW
6677 /* This thread starts off in the C locale. Use the full Perl_setlocale()
6678 * to make sure no ill-advised shortcuts get taken on this new thread, */
e9bc6d6b
KW
6679 Perl_setlocale(LC_ALL, "C");
6680
289e5f30
KW
6681# else
6682
6683 for (unsigned i = 0; i < NOMINAL_LC_ALL_INDEX; i++) {
6684 Perl_setlocale(categories[i], "C");
6685 }
6686
6687# endif
e9bc6d6b
KW
6688#endif
6689
6690}
6691
6692void
6693Perl_thread_locale_term()
6694{
6695 /* Called from a thread as it gets ready to terminate */
6696
4b06b4d0 6697#ifdef USE_POSIX_2008_LOCALE
e9bc6d6b
KW
6698
6699 /* C starts the new thread in the global C locale. If we are thread-safe,
6700 * we want to not be in the global locale */
6701
e9bc6d6b
KW
6702 { /* Free up */
6703 locale_t cur_obj = uselocale(LC_GLOBAL_LOCALE);
e72200e7 6704 if (cur_obj != LC_GLOBAL_LOCALE && cur_obj != PL_C_locale_obj) {
e9bc6d6b
KW
6705 freelocale(cur_obj);
6706 }
6707 }
6708
e9bc6d6b
KW
6709#endif
6710
6711}
747c467a
KW
6712
6713/*
14d04a33 6714 * ex: set ts=8 sts=4 sw=4 et:
37442d52 6715 */