perldelta for 127ce1c

diff --git a/pod/perluniintro.pod b/pod/perluniintro.pod
index 8ce4b7b..244cd38 100644
--- a/pod/perluniintro.pod
+++ b/pod/perluniintro.pod
@@ -71,7 +71,7 @@ view: one "character" is one Unicode code point.
  
  For some combinations of base character and modifiers, there are
  I<precomposed> characters.  There is a single character equivalent, for
-example, to the sequence C<LATIN CAPITAL LETTER A> followed by
+example, for the sequence C<LATIN CAPITAL LETTER A> followed by
  C<COMBINING ACUTE ACCENT>.  It is called  C<LATIN CAPITAL LETTER A WITH
  ACUTE>.  These precomposed characters are, however, only available for
  some combinations, and are mainly meant to support round-trip
@@ -109,7 +109,7 @@ C<block> of consecutive unallocated code points for its characters.  So
  far, the number of code points in these blocks has always been evenly
  divisible by 16.  Extras in a block, not currently needed, are left
  unallocated, for future growth.  But there have been occasions when
-a later relase needed more code points than the available extras, and a
+a later release needed more code points than the available extras, and a
  new block had to allocated somewhere else, not contiguous to the initial
  one, to handle the overflow.  Thus, it became apparent early on that
  "block" wasn't an adequate organizing principal, and so the C<Script>
@@ -137,7 +137,7 @@ forms>, of which I<UTF-8> is perhaps the most popular.  UTF-8 is a
  variable length encoding that encodes Unicode characters as 1 to 6
  bytes.  Other encodings
  include UTF-16 and UTF-32 and their big- and little-endian variants
-(UTF-8 is byte-order independent) The ISO/IEC 10646 defines the UCS-2
+(UTF-8 is byte-order independent).  The ISO/IEC 10646 defines the UCS-2
  and UCS-4 encoding forms.
  
  For more information about encodings--for instance, to learn what
@@ -145,12 +145,12 @@ I<surrogates> and I<byte order marks> (BOMs) are--see L<perlunicode>.
  
  =head2 Perl's Unicode Support
  
-Starting from Perl 5.6.0, Perl has had the capacity to handle Unicode
-natively.  Perl 5.8.0, however, is the first recommended release for
+Starting from Perl v5.6.0, Perl has had the capacity to handle Unicode
+natively.  Perl v5.8.0, however, is the first recommended release for
  serious Unicode work.  The maintenance release 5.6.1 fixed many of the
  problems of the initial Unicode implementation, but for example
  regular expressions still do not work with Unicode in 5.6.1.
-Perl 5.14.0 is the first release where Unicode support is
+Perl v5.14.0 is the first release where Unicode support is
  (almost) seamlessly integrable without some gotchas (the exception being
  some differences in L<quotemeta|perlfunc/quotemeta>, which is fixed
  starting in Perl 5.16.0).   To enable this
@@ -159,12 +159,12 @@ automatically selected if you C<use 5.012> or higher).  See L<feature>.
  (5.14 also fixes a number of bugs and departures from the Unicode
  standard.)
  
-Before Perl 5.8.0, the use of C<use utf8> was used to declare
+Before Perl v5.8.0, the use of C<use utf8> was used to declare
  that operations in the current block or file would be Unicode-aware.
  This model was found to be wrong, or at least clumsy: the "Unicodeness"
  is now carried with the data, instead of being attached to the
  operations.
-Starting with Perl 5.8.0, only one case remains where an explicit C<use
+Starting with Perl v5.8.0, only one case remains where an explicit C<use
  utf8> is needed: if your Perl script itself is encoded in UTF-8, you can
  use UTF-8 in your identifier names, and in string and regular expression
  literals, by saying C<use utf8>.  This is not the default because
@@ -176,7 +176,7 @@ Perl supports both pre-5.6 strings of eight-bit native bytes, and
  strings of Unicode characters.  The general principle is that Perl tries
  to keep its data as eight-bit bytes for as long as possible, but as soon
  as Unicodeness cannot be avoided, the data is transparently upgraded
-to Unicode.  Prior to Perl 5.14, the upgrade was not completely
+to Unicode.  Prior to Perl v5.14.0, the upgrade was not completely
  transparent (see L<perlunicode/The "Unicode Bug">), and for backwards
  compatibility, full transparency is not gained unless C<use feature
  'unicode_strings'> (see L<feature>) or C<use 5.012> (or higher) is
@@ -415,7 +415,7 @@ streams, use explicit layers directly in the C<open()> call.
  You can switch encodings on an already opened stream by using
  C<binmode()>; see L<perlfunc/binmode>.
  
-The C<:locale> does not currently (as of Perl 5.8.0) work with
+The C<:locale> does not currently work with
  C<open()> and C<binmode()>, only with the C<open> pragma.  The
  C<:utf8> and C<:encoding(...)> methods do work with all of C<open()>,
  C<binmode()>, and the C<open> pragma.
@@ -461,8 +461,8 @@ UTF-8 encoded.  A C<use open ':encoding(utf8)'> would have avoided the
  bug, or explicitly opening also the F<file> for input as UTF-8.
  
  B<NOTE>: the C<:utf8> and C<:encoding> features work only if your
-Perl has been built with the new PerlIO feature (which is the default
-on most systems).
+Perl has been built with L<PerlIO>, which is the default
+on most systems.
  
  =head2 Displaying Unicode As Text
  
@@ -473,13 +473,13 @@ its argument so that Unicode characters with code points greater than
  displayed as C<\x..>, and the rest of the characters as themselves:
  
   sub nice_string {
-     join("",
-       map { $_ > 255 ?                  # if wide character...
-              sprintf("\\x{%04X}", $_) :  # \x{...}
-              chr($_) =~ /[[:cntrl:]]/ ?  # else if control character...
-              sprintf("\\x%02X", $_) :    # \x..
-              quotemeta(chr($_))          # else quoted or as themselves
-         } unpack("W*", $_[0]));           # unpack Unicode characters
+        join("",
+        map { $_ > 255                    # if wide character...
+              ? sprintf("\\x{%04X}", $_)  # \x{...}
+              : chr($_) =~ /[[:cntrl:]]/  # else if control character...
+                ? sprintf("\\x%02X", $_)  # \x..
+                : quotemeta(chr($_))      # else quoted or as themselves
+        } unpack("W*", $_[0]));           # unpack Unicode characters
     }
  
  For example,
@@ -562,7 +562,7 @@ sections on case mapping in the L<Unicode Standard|http://www.unicode.org>.
  
  As of Perl 5.8.0, the "Full" case-folding of I<Case
  Mappings/SpecialCasing> is implemented, but bugs remain in C<qr//i> with them,
-mostly fixed by 5.14.
+mostly fixed by 5.14, and essentially entirely by 5.18.
  
  =item *
  
@@ -615,7 +615,7 @@ String-To-Number Conversions
  Unicode does define several other decimal--and numeric--characters
  besides the familiar 0 to 9, such as the Arabic and Indic digits.
  Perl does not support string-to-number conversion for digits other
-than ASCII 0 to 9 (and ASCII a to f for hexadecimal).
+than ASCII C<0> to C<9> (and ASCII C<a> to C<f> for hexadecimal).
  To get safe conversions from any Unicode string, use
  L<Unicode::UCD/num()>.
  
@@ -802,18 +802,22 @@ L<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
  
  How Does Unicode Work With Traditional Locales?
  
-Starting in Perl 5.16, you can specify
+If your locale is a UTF-8 locale, starting in Perl v5.20, Perl works
+well for all categories except C<LC_COLLATE> dealing with sorting and
+the C<cmp> operator.
+
+For other locales, starting in Perl 5.16, you can specify
  
      use locale ':not_characters';
  
-to get Perl to work well with tradtional locales.  The catch is that you
+to get Perl to work well with them.  The catch is that you
  have to translate from the locale character set to/from Unicode
  yourself.  See L</Unicode IE<sol>O> above for how to
  
      use open ':locale';
  
  to accomplish this, but full details are in L<perllocale/Unicode and
-UTF-8>, including gotchas that happen if you don't specifiy
+UTF-8>, including gotchas that happen if you don't specify
  C<:not_characters>.
  
  =back