For some combinations of base character and modifiers, there are
I<precomposed> characters. There is a single character equivalent, for
-example, to the sequence C<LATIN CAPITAL LETTER A> followed by
+example, for the sequence C<LATIN CAPITAL LETTER A> followed by
C<COMBINING ACUTE ACCENT>. It is called C<LATIN CAPITAL LETTER A WITH
ACUTE>. These precomposed characters are, however, only available for
some combinations, and are mainly meant to support round-trip
displayed as C<\x..>, and the rest of the characters as themselves:
sub nice_string {
- join("",
- map { $_ > 255 ? # if wide character...
- sprintf("\\x{%04X}", $_) : # \x{...}
- chr($_) =~ /[[:cntrl:]]/ ? # else if control character...
- sprintf("\\x%02X", $_) : # \x..
- quotemeta(chr($_)) # else quoted or as themselves
- } unpack("W*", $_[0])); # unpack Unicode characters
+ join("",
+ map { $_ > 255 # if wide character...
+ ? sprintf("\\x{%04X}", $_) # \x{...}
+ : chr($_) =~ /[[:cntrl:]]/ # else if control character...
+ ? sprintf("\\x%02X", $_) # \x..
+ : quotemeta(chr($_)) # else quoted or as themselves
+ } unpack("W*", $_[0])); # unpack Unicode characters
}
For example,
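The routine can be exercised as in the following minimal sketch. It repeats the definition from the hunk above so that it runs on its own; the sample input string is chosen only for illustration.

```perl
use strict;
use warnings;

# Same routine as in the patch: render a string so that wide and
# control characters show up as visible escapes.
sub nice_string {
    join("",
        map { $_ > 255                        # if wide character...
              ? sprintf("\\x{%04X}", $_)      # \x{...}
              : chr($_) =~ /[[:cntrl:]]/      # else if control character...
                ? sprintf("\\x%02X", $_)      # \x..
                : quotemeta(chr($_))          # else quoted or as themselves
        } unpack("W*", $_[0]));               # unpack Unicode characters
}

# U+0100 is above 255, and the trailing newline is a control character.
my $shown = nice_string("foo\x{100}bar\n");
print "$shown\n";    # prints: foo\x{0100}bar\x0A
```

Because the result contains only ASCII characters, it can be printed to any filehandle without a "Wide character" warning.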
As of Perl 5.8.0, the "Full" case-folding of I<Case
Mappings/SpecialCasing> is implemented, but bugs remain in C<qr//i> with them,
-mostly fixed by 5.14.
+mostly fixed by 5.14, and essentially entirely by 5.18.
=item *
Unicode does define several other decimal--and numeric--characters
besides the familiar 0 to 9, such as the Arabic and Indic digits.
Perl does not support string-to-number conversion for digits other
-than ASCII 0 to 9 (and ASCII a to f for hexadecimal).
+than ASCII C<0> to C<9> (and ASCII C<a> to C<f> for hexadecimal).
To get safe conversions from any Unicode string, use
L<Unicode::UCD/num()>.
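As a minimal sketch of the conversion described above, the following assumes a Perl recent enough to ship C<num()> in Unicode::UCD. The variable names and the sample digits are illustrative only.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# ARABIC-INDIC DIGIT FOUR followed by ARABIC-INDIC DIGIT TWO.
my $arabic = "\x{0664}\x{0662}";

# num() returns the numeric value when it considers the whole string
# a safe number, and undef otherwise.
my $value = num($arabic);

print defined $value ? "$value\n" : "not a safe number\n";
```

Checking for C<undef> before using the result avoids silently treating an unconvertible string as zero, which is what ordinary numeric context would do.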
How Does Unicode Work With Traditional Locales?
-Starting in Perl 5.16, you can specify
+If your locale is a UTF-8 locale, starting in Perl v5.20, Perl works
+well for all categories except C<LC_COLLATE>, which deals with sorting
+and the C<cmp> operator.
+
+For other locales, starting in Perl 5.16, you can specify
use locale ':not_characters';
-to get Perl to work well with traditional locales. The catch is that you
+to get Perl to work well with them. The catch is that you
have to translate from the locale character set to/from Unicode
yourself. See L</Unicode IE<sol>O> above for how to