This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
locale.c: Revamp finding if locale is UTF-8
This changes how perl determines whether the LC_CTYPE locale is UTF-8.
On systems that have nl_langinfo(), its CODESET item alone gives a
"definitive" answer. Otherwise (or if nl_langinfo() doesn't return a
usable result), mbtowc() can be used to check whether the UTF-8 byte
sequence for the Unicode REPLACEMENT CHARACTER (U+FFFD) decodes to that
code point. This is also "definitive". If the locale's maximum number
of bytes per character is too small to represent every Unicode code
point in UTF-8, we know without further checking that it isn't a UTF-8
locale, and can skip the mbtowc() check.
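
The order of checks described above can be sketched roughly as follows.
This is an illustration only, not the code in locale.c: the function
name sketch_ctype_is_utf8() is hypothetical, the codeset spellings
compared against and the 4-byte threshold are assumptions, and it
presumes a platform where wchar_t holds Unicode code points.

```c
#include <langinfo.h>   /* nl_langinfo(), CODESET: POSIX, not universal */
#include <locale.h>     /* setlocale(), for callers of the sketch */
#include <stdlib.h>     /* mbtowc(), MB_CUR_MAX, wchar_t */
#include <strings.h>    /* strcasecmp(): POSIX */

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale appears
 * to be UTF-8, else 0.  Assumes wchar_t stores Unicode code points. */
static int
sketch_ctype_is_utf8(void)
{
    /* UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER */
    static const char repl[] = "\xEF\xBF\xBD";
    const char *codeset = nl_langinfo(CODESET);
    wchar_t wc = 0;
    int len;

    /* 1. A non-empty codeset name is taken as the answer.  The exact
     *    spellings accepted here are an assumption of this sketch. */
    if (codeset != NULL && *codeset != '\0') {
        return strcasecmp(codeset, "UTF-8") == 0
            || strcasecmp(codeset, "UTF8") == 0;
    }

    /* 2. UTF-8 needs up to 4 bytes per Unicode code point, so a smaller
     *    maximum rules the locale out without calling mbtowc(). */
    if (MB_CUR_MAX < 4) {
        return 0;
    }

    /* 3. Reset the conversion state, then see whether the locale
     *    decodes the three bytes above as U+FFFD. */
    (void) mbtowc(NULL, NULL, 0);
    len = mbtowc(&wc, repl, sizeof(repl) - 1);
    return len == 3 && wc == (wchar_t) 0xFFFD;
}
```

A caller would first select the locale of interest, for example with
setlocale(LC_CTYPE, ""), and then call the helper.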
Testing shows that some locales are reported as UTF-8 by nl_langinfo()
even though they deviate from UTF-8 in places; the same is true of
mbtowc(). Perl assumes that a locale identified as UTF-8 has no such
deviations, and applies its own internal rules, which it knows are
UTF-8. A future commit will warn when a locale does deviate.