=head2 Why do regex character classes sometimes match only in the ASCII range?
-=head2 Why do some characters not uppercase or lowercase correctly?
-
Starting in Perl 5.14 (and partially in Perl 5.12), just put a
C<use feature 'unicode_strings'> near the beginning of your program.
Within its lexical scope you shouldn't have this problem.
However, on earlier Perls, or if you pass strings to subroutines outside
-the feature's scope, you can force Unicode semantics by changing the
+the feature's scope, you can force Unicode rules by upgrading the
string's internal encoding to UTF-8 with C<utf8::upgrade($string)>. This can be used
safely on any string, as it checks and does not change strings that have
already been upgraded.
For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
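
The behaviour described above can be sketched as follows. This is a minimal
illustration, not a recommended idiom: the variable names are made up for the
example, and the C<unicode_strings> block assumes Perl 5.14 or later. It also
assumes the script does not enable the feature globally (for example through
C<use v5.12>).

```perl
use strict;
use warnings;

# U+00E9 (e with acute) stored as a single native byte, not upgraded.
my $native = "\xE9";

# Outside the feature's scope, \w uses ASCII-range rules for this string.
my $before = $native =~ /\w/ ? 1 : 0;

# Upgrading changes only the internal representation; the characters
# are the same, so the two strings still compare equal.
my $upgraded = $native;
utf8::upgrade($upgraded);
my $after = $upgraded =~ /\w/ ? 1 : 0;
my $same  = $upgraded eq $native ? 1 : 0;

# Inside the feature's scope no upgrade is needed (assumes Perl 5.14+).
my ($scoped, $upper);
{
    use feature 'unicode_strings';
    $scoped = $native =~ /\w/ ? 1 : 0;
    $upper  = uc($native) eq "\xC9" ? 1 : 0;
}

print "before=$before after=$after same=$same scoped=$scoped upper=$upper\n";
```

Only C<$before> should come out false here; the upgraded copy and the code
inside the feature's scope both get Unicode rules.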
+=head2 Why do some characters not uppercase or lowercase correctly?
+
+See the answer to the previous question.
+
=head2 How can I determine if a string is a text string or a binary string?
You can't. Some use the UTF8 flag for this, but that's misuse, and makes well
=head2 What are C<decode_utf8> and C<encode_utf8>?
These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
-...)>.
+...)>. Do not use these functions for data exchange. Instead use
+C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
+L</What's the difference between UTF-8 and utf8?> below.
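
As a rough sketch of the recommended spelling, a round trip through
C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)> might look like this. The
sample bytes and variable names are assumptions chosen for the illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two UTF-8 bytes for U+00E9, as they might arrive from a file or socket.
my $bytes = "\xC3\xA9";

# Bytes to text: the result is a one-character text string.
my $text = decode('UTF-8', $bytes);

# Text back to bytes, ready to be written out.
my $out = encode('UTF-8', $text);

printf "chars=%d ord=U+%04X bytes=%d\n", length($text), ord($text), length($out);
```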
=head2 What is a "wide character"?
-This is a term used both for characters with an ordinal value greater than 127,
-characters with an ordinal value greater than 255, or any character occupying
-more than one byte, depending on the context.
+This is a term used for characters occupying more than one byte.
-The Perl warning "Wide character in ..." is caused by a character with an
-ordinal value greater than 255. With no specified encoding layer, Perl tries to
-fit things in ISO-8859-1 for backward compatibility reasons. When it can't, it
-emits this warning (if warnings are enabled), and outputs UTF-8 encoded data
+The Perl warning "Wide character in ..." is caused by such a character.
+With no specified encoding layer, Perl tries to
+fit things into a single byte. When it can't, it
+emits this warning (if warnings are enabled), and uses UTF-8 encoded data
instead.
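
The following sketch contrasts a filehandle without an encoding layer against
one with an explicit C<:encoding(UTF-8)> layer. It writes to in-memory
filehandles so that it is self-contained. The variable names are hypothetical,
and it assumes no default layers are set through C<-C>, C<PERL_UNICODE> or the
C<open> pragma.

```perl
use strict;
use warnings;

# U+263A WHITE SMILING FACE does not fit into a single byte.
my $text = "\x{263A}";

# No encoding layer: collect the warning instead of printing it to STDERR.
my @warnings;
my $raw = '';
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open(my $fh, '>', \$raw) or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
}

# Explicit layer: the same string is encoded on output, with no warning.
my $encoded = '';
open(my $out, '>:encoding(UTF-8)', \$encoded)
    or die "Cannot open in-memory handle: $!";
print {$out} $text;
close($out) or die "Cannot close in-memory handle: $!";

printf "warnings=%d encoded_bytes=%d\n", scalar(@warnings), length($encoded);
```

The first handle triggers the warning once; the second receives the three
UTF-8 bytes for the character without complaint.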
To avoid this warning and to avoid having different output encodings in a single
what it accepts. If you have to communicate with things that aren't so liberal,
you may want to consider using C<UTF-8>. If you have to communicate with things
that are too liberal, you may have to use C<utf8>. The full explanation is in
-L<Encode>.
+L<Encode/"UTF-8 vs. utf8 vs. UTF8">.
C<UTF-8> is internally known as C<utf-8-strict>. The tutorial uses UTF-8
consistently, even where utf8 is actually used internally, because the