From 8d72e74e3dc5017c5a3fade48e0c74109c297ebc Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Thu, 8 Mar 2018 13:00:40 -0700
Subject: [PATCH] Langinfo: Implement CODESET on Windows

This applies to I18N::Langinfo, and the API function Perl_langinfo.

Windows doesn't have nl_langinfo, so an emulation is used. It turns out
that it is easy to emulate the behavior for the CODESET item on Windows,
as that vendor has kept things consistent.
---
 ext/I18N-Langinfo/Langinfo.pm |  7 ++--
 locale.c                      | 74 ++++++++++++++++++++++++++++++++++++++-----
 pod/perldelta.pod             |  4 +--
 3 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/ext/I18N-Langinfo/Langinfo.pm b/ext/I18N-Langinfo/Langinfo.pm
index bcc1527..e9e84d2 100644
--- a/ext/I18N-Langinfo/Langinfo.pm
+++ b/ext/I18N-Langinfo/Langinfo.pm
@@ -168,12 +168,15 @@ glitches. These are the items that could be different:
 
 =over
 
-=item C
-
 =item C
 
 Unimplemented, so returns C<"">.
 
+=item C
+
+Unimplemented, except on Windows, due to the vagaries of vendor locale
+names; returns C<""> on non-Windows platforms.
+
 =item C
 
 =item C
diff --git a/locale.c b/locale.c
index b90d69f..277e038 100644
--- a/locale.c
+++ b/locale.c
@@ -2312,12 +2312,12 @@ But most importantly, it works on systems that don't have C, such
 as Windows, hence makes your code more portable. Of the fifty-some possible
 items specified by the POSIX 2008 standard,
 L,
-only two are completely unimplemented (though the loss of one of these is
-significant). It uses various techniques to recover the other items, including
-calling C>, and C>, both of which are specified
-in C89, so should be always be available. Later C versions have
-additional capabilities; C<""> is returned for those not available on your
-system.
+only one is completely unimplemented, though on non-Windows platforms another
+significant one is also not implemented. It uses various techniques to
+recover the other items, including calling C>, and
+C>, both of which are specified in C89, so should always be
+available. Later C versions have additional capabilities; C<""> is
+returned for those not available on your system.
 
 It is important to note that when called with an item that is recovered by
 using C, the buffer from any previous explicit call to
@@ -2493,8 +2493,7 @@ S_my_nl_langinfo(const int item, bool toggle)
         switch (item) {
             Size_t len;
 
-            /* These 2 are unimplemented */
-            case CODESET:
+            /* This is unimplemented */
             case ERA: /* For use with strftime() %E modifier */
 
             default:
@@ -2506,7 +2505,66 @@ S_my_nl_langinfo(const int item, bool toggle)
             case NOEXPR: return "^[-0nN]";
             case NOSTR: return "no";
 
+            case CODESET:
+
+# ifndef WIN32
+
+                /* On non-Windows, this is unimplemented, in part because of
+                 * inconsistencies between vendors. The Darwin native
+                 * nl_langinfo() implementation simply looks at everything past
+                 * any dot in the name, but that doesn't work for other
+                 * vendors. Many Linux locales that don't have UTF-8 in their
+                 * names really are UTF-8, for example; z/OS locales that do
+                 * have UTF-8 in their names aren't really UTF-8 */
+                return "";
+
+# else
+
+                {   /* But on Windows, the name does seem to be consistent, so
+                       use that. */
+                    const char * p;
+                    const char * first;
+                    Size_t offset = 0;
+                    const char * name = my_setlocale(LC_CTYPE, NULL);
+
+                    if (isNAME_C_OR_POSIX(name)) {
+                        return "ANSI_X3.4-1968";
+                    }
+
+                    /* Find the dot in the locale name */
+                    first = (const char *) strchr(name, '.');
+                    if (! first) {
+                        first = name;
+                        goto has_nondigit;
+                    }
+                    /* Look at everything past the dot */
+                    first++;
+                    p = first;
+
+                    while (*p) {
+                        if (! isDIGIT(*p)) {
+                            goto has_nondigit;
+                        }
+
+                        p++;
+                    }
+
+                    /* Here everything past the dot is a digit. Treat it as a
+                     * code page */
+                    save_to_buffer("CP", &PL_langinfo_buf,
+                                         &PL_langinfo_bufsize, 0);
+                    offset = STRLENs("CP");
+
+                  has_nondigit:
+
+                    retval = save_to_buffer(first, &PL_langinfo_buf,
+                                            &PL_langinfo_bufsize, offset);
+                }
+
+                break;
+
+# endif
 
 # ifdef HAS_LOCALECONV
 
             case CRNCYSTR:
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 42db70d..9baa05a 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -140,8 +140,8 @@ L has been upgraded from version 0.15 to 0.16.
 This module is now available on all platforms, emulating the system
 L on systems that lack it. Some caveats apply, as
 L, the most severe being
-that the C item is not implemented on those systems, always
-returning C<"">.
+that, except for MS Windows, the C item is not implemented on
+those systems, always returning C<"">.
 
 It now sets the UTF-8 flag in its returned scalar if the string contains
 legal non-ASCII UTF-8, and the locale is UTF-8 ([perl #127288].
-- 
1.8.3.1