diff --git a/pod/perlunifaq.pod b/pod/perlunifaq.pod
index b291334..19eadd4 100644
--- a/pod/perlunifaq.pod
+++ b/pod/perlunifaq.pod
@@ -11,7 +11,7 @@ read after L<perlunitut>.
  
  No, and this isn't really a Unicode FAQ.
  
-Perl has an abstracted interface for all supported character encodings, so they
+Perl has an abstracted interface for all supported character encodings, so this
  is actually a generic C<Encode> tutorial and C<Encode> FAQ. But many people
  think that Unicode is special and magical, and I didn't want to disappoint
  them, so I decided to call the document a Unicode tutorial.
@@ -25,7 +25,7 @@ To find out which character encodings your Perl supports, run:
  =head2 Which version of perl should I use?
  
  Well, if you can, upgrade to the most recent, but certainly C<5.8.1> or newer.
-The tutorial and FAQ are based on the status quo as of C<5.8.8>.
+The tutorial and FAQ assume the latest release.
  
  You should also check your modules, and upgrade them if necessary. For example,
  HTML::Entities requires version >= 1.32 to function correctly, even though the
@@ -84,12 +84,12 @@ or encode anymore, on things that use the layered handle.
  
  You can provide this layer when C<open>ing the file:
  
-    open my $fh, '>:encoding(UTF-8)', $filename;  # auto encoding on write
-    open my $fh, '<:encoding(UTF-8)', $filename;  # auto decoding on read
+  open my $fh, '>:encoding(UTF-8)', $filename;  # auto encoding on write
+  open my $fh, '<:encoding(UTF-8)', $filename;  # auto decoding on read
  
  Or if you already have an open filehandle:
  
-    binmode $fh, ':encoding(UTF-8)';
+  binmode $fh, ':encoding(UTF-8)';
  
  Some database drivers for DBI can also automatically encode and decode, but
  that is sometimes limited to the UTF-8 encoding.
@@ -136,22 +136,35 @@ concern, and you can just C<eval> dumped data as always.
  
  =head2 Why do regex character classes sometimes match only in the ASCII range?
  
-=head2 Why do some characters not uppercase or lowercase correctly?
-
-It seemed like a good idea at the time, to keep the semantics the same for
-standard strings, when Perl got Unicode support. While it might be repaired
-in the future, we now have to deal with the fact that Perl treats equal
-strings differently, depending on the internal state.
+Starting in Perl 5.14 (and partially in Perl 5.12), just put a
+C<use feature 'unicode_strings'> near the beginning of your program.
+Within its lexical scope you shouldn't have this problem.  It also is
+automatically enabled under C<use feature ':5.12'> or C<use v5.12> or
+using C<-E> on the command line for Perl 5.12 or higher.
+
+The rationale for requiring this is to not break older programs that
+rely on the way things worked before Unicode came along.  Those older
+programs knew only about the ASCII character set, and so may not work
+properly for additional characters.  When a string is encoded in UTF-8,
+Perl assumes that the program is prepared to deal with Unicode, but when
+the string isn't, Perl assumes that only ASCII
+is wanted, and so those characters that are not ASCII
+characters aren't recognized as to what they would be in Unicode.
+C<use feature 'unicode_strings'> tells Perl to treat all characters as
+Unicode, whether the string is encoded in UTF-8 or not, thus avoiding
+the problem.
+
+However, on earlier Perls, or if you pass strings to subroutines outside
+the feature's scope, you can force Unicode rules by changing the
+encoding to UTF-8 by doing C<utf8::upgrade($string)>. This can be used
+safely on any string, as it checks and does not change strings that have
+already been upgraded.
  
-Affected are C<uc>, C<lc>, C<ucfirst>, C<lcfirst>, C<\U>, C<\L>, C<\u>, C<\l>,
-C<\d>, C<\s>, C<\w>, C<\D>, C<\S>, C<\W>, C</.../i>, C<(?i:...)>,
-C</[[:posix:]]/>.
+For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
  
-To force Unicode semantics, you can upgrade the internal representation to
-by doing C<utf8::upgrade($string)>. This does not change strings that were
-already upgraded.
+=head2 Why do some characters not uppercase or lowercase correctly?
  
-For a more detailed discussion, see L<Unicode::Semantics> on CPAN.
+See the answer to the previous question.
  
  =head2 How can I determine if a string is a text string or a binary string?
  
@@ -192,7 +205,7 @@ These are alternate syntaxes for C<decode('utf8', ...)> and C<encode('utf8',
  
  This is a term used both for characters with an ordinal value greater than 127,
  characters with an ordinal value greater than 255, or any character occupying
-than one byte, depending on the context.
+more than one byte, depending on the context.
  
  The Perl warning "Wide character in ..." is caused by a character with an
  ordinal value greater than 255. With no specified encoding layer, Perl tries to
@@ -215,7 +228,9 @@ use C<is_utf8>, C<_utf8_on> or C<_utf8_off> at all.
  
  The UTF8 flag, also called SvUTF8, is an internal flag that indicates that the
  current internal representation is UTF-8. Without the flag, it is assumed to be
-ISO-8859-1. Perl converts between these automatically.
+ISO-8859-1. Perl converts between these automatically.  (Actually Perl usually
+assumes the representation is ASCII; see L</Why do regex character classes
+sometimes match only in the ASCII range?> above.)
  
  One of Perl's internal formats happens to be UTF-8. Unfortunately, Perl can't
  keep a secret, so everyone knows about this. That is the source of much
@@ -261,7 +276,8 @@ Instead of C<decode> and C<encode>, you could use C<_utf8_on> and C<_utf8_off>,
  but this is considered bad style. Especially C<_utf8_on> can be dangerous, for
  the same reason that C<:utf8> can.
  
-There are some shortcuts for oneliners; see C<-C> in L<perlrun>.
+There are some shortcuts for oneliners;
+see L<-C|perlrun/-C [numberE<sol>list]> in L<perlrun>.
  
  =head2 What's the difference between C<UTF-8> and C<utf8>?