From 9487427ba26d65e7adf5954069fc2fde3bdedf41 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 12 Mar 2018 12:24:04 -0600
Subject: [PATCH] Fix comments/pod for LC_NUMERIC not always C

In recent Perl versions, the underlying locale for LC_NUMERIC has been
kept in C because XS code is expecting a dot radix character. But if
the LC_NUMERIC locale has a dot, that is unnecessary. (There is also
the thousands grouping separator which for safety we verify is empty.)
Thus 5.27 doesn't always keep the underlying locale in C; it does so
only if necessary.

This commit updates various comments and pods to reflect this change.
---
 locale.c | 30 ++++++++++++++++++------------
 perl.h   | 32 +++++++++++++++++---------------
 2 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/locale.c b/locale.c
index 6a4e012..d907e37 100644
--- a/locale.c
+++ b/locale.c
@@ -2081,9 +2081,13 @@ S_win32_setlocale(pTHX_ int category, const char* locale)
 
 This is an (almost) drop-in replacement for the system L>,
 taking the same parameters, and returning the same information, except that it
-returns the correct underlying C locale, instead of C always, as
-perl keeps that locale category as C, changing it briefly during the
-operations where the underlying one is required.
+returns the correct underlying C locale. Regular C will
+instead return C if the underlying locale has a non-dot decimal point
+character, or a non-empty thousands separator for displaying floating point
+numbers. This is because perl keeps that locale category such that it has a
+dot and empty separator, changing the locale briefly during the operations
+where the underlying one is required. C knows about this, and
+compensates; regular C doesn't.
 
 Another reason it isn't completely a drop-in replacement is that it is
 declared to return S>, whereas the system setlocale omits the
@@ -2123,8 +2127,9 @@ Perl_setlocale(const int category, const char * locale)
 
     /* A NULL locale means only query what the current one is. We have the
      * LC_NUMERIC name saved, because we are normally switched into the C
-     * locale for it. For an LC_ALL query, switch back to get the correct
-     * results. All other categories don't require special handling */
+     * (or equivalent) locale for it. For an LC_ALL query, switch back to get
+     * the correct results. All other categories don't require special
+     * handling */
     if (locale == NULL) {
         if (category == LC_NUMERIC) {
 
@@ -2291,13 +2296,14 @@ rather than getting segfaults at runtime.
 It delivers the correct results for the C and C items,
 without you having to write extra code. The reason for the extra code would be
 because these are from the C locale category, which is normally
-kept set to the C locale by Perl, no matter what the underlying locale is
-supposed to be, and so to get the expected results, you have to temporarily
-toggle into the underlying locale, and later toggle back. (You could use plain
-C and C> for this but
-then you wouldn't get the other advantages of C; not keeping
-C in the C locale would break a lot of CPAN, which is expecting the
-radix (decimal point) character to be a dot.)
+kept set by Perl so that the radix is a dot, and the separator is the empty
+string, no matter what the underlying locale is supposed to be, and so to get
+the expected results, you have to temporarily toggle into the underlying
+locale, and later toggle back. (You could use plain C and
+C> for this but then you wouldn't get
+the other advantages of C; not keeping C in the C
+(or equivalent) locale would break a lot of CPAN, which is expecting the radix
+(decimal point) character to be a dot.)
 
 =item *
 
diff --git a/perl.h b/perl.h
index 5462b47..e76b9b8 100644
--- a/perl.h
+++ b/perl.h
@@ -5807,7 +5807,10 @@ typedef struct am_table_short AMTS;
 #ifdef USE_LOCALE_NUMERIC
 
 /* These macros are for toggling between the underlying locale (UNDERLYING or
- * LOCAL) and the C locale (STANDARD).
+ * LOCAL) and the C locale (STANDARD). (Actually we don't have to use the C
+ * locale if the underlying locale is indistinguishable from it in the numeric
+ * operations used by Perl, namely the decimal point, and even the thousands
+ * separator.)
 
 =head1 Locale-related functions and macros
 
@@ -5851,10 +5854,11 @@ close by, and guaranteed to be called.
 
 =for apidoc Am|void|STORE_LC_NUMERIC_SET_TO_NEEDED
 
-This is used to help wrap XS or C code that that is C locale-aware.
-This locale category is generally kept set to the C locale by Perl for
-backwards compatibility, and because most XS code that reads floating point
-values can cope only with the decimal radix character being a dot.
+This is used to help wrap XS or C code that is C locale-aware.
+This locale category is generally kept set to a locale where the decimal radix
+character is a dot, and the separator between groups of digits is empty. This
+is because most XS code that reads floating point numbers is expecting them to
+have this syntax.
 
 This macro makes sure the current C state is set properly, to
 be aware of locale if the call to the XS or C code from the Perl program is
@@ -5906,16 +5910,14 @@ expression, but with an empty argument list, like this:
 
 */
 
-/* The numeric locale is generally kept in the C locale instead of the
- * underlying locale. The current status is known by looking at two words.
- * One is non-zero if the current numeric locale is the standard C/POSIX one.
- * The other is non-zero if the current locale is the underlying locale. Both
- * can be non-zero if, as often happens, the underlying locale is C.
- *
- * Its slightly more complicated than this, as the PL_numeric_standard variable
- * is set if the current numeric locale is indistinguishable from the C locale.
- * This happens when the radix character is a dot, and the thousands separator
- * is the empty string.
+/* If the underlying numeric locale has a non-dot decimal point or has a
+ * non-empty floating point thousands separator, the current locale is instead
+ * generally kept in the C locale instead of that underlying locale. The
+ * current status is known by looking at two words. One is non-zero if the
+ * current numeric locale is the standard C/POSIX one or is indistinguishable
+ * from C. The other is non-zero if the current locale is the underlying
+ * locale. Both can be non-zero if, as often happens, the underlying locale is
+ * C or indistinguishable from it.
  *
  * khw believes the reason for the variables instead of the bits in a single
  * word is to avoid having to have masking instructions. */
-- 
1.8.3.1
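
Reviewer note: the condition the updated comments describe (the radix character is a dot and the thousands separator is the empty string) can be sketched with the standard C localeconv() call. This is only an illustration under stated assumptions: the helper names numeric_looks_like_c and current_numeric_looks_like_c are hypothetical and do not appear in locale.c or perl.h, and the check Perl actually performs may differ.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from the Perl source): returns true when the
 * given numeric formatting is indistinguishable from the C locale's for
 * the two properties the patch mentions, i.e. the decimal point is "."
 * and the thousands separator is the empty string. */
static bool numeric_looks_like_c(const struct lconv *lc)
{
    assert(lc != NULL);
    return strcmp(lc->decimal_point, ".") == 0
        && lc->thousands_sep[0] == '\0';
}

/* Hypothetical helper: apply the same test to whatever LC_NUMERIC
 * locale the process currently has in effect. */
static bool current_numeric_looks_like_c(void)
{
    return numeric_looks_like_c(localeconv());
}
```

A caller could use such a test after setlocale(LC_NUMERIC, "") to decide whether switching to the C locale is needed at all, which is the idea the commit message describes.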