Reword warning for deviations from UTF-8 locales

author Karl Williamson <khw@cpan.org>

Fri, 2 Mar 2018 19:13:55 +0000 (12:13 -0700)

committer Karl Williamson <khw@cpan.org>

Fri, 2 Mar 2018 19:37:16 +0000 (12:37 -0700)
diff --git a/README.hpux b/README.hpux
index ce000dd..e1857e0 100644
--- a/README.hpux
+++ b/README.hpux
@@ -563,15 +563,6 @@ questions about 64-bit numbers when Configure asks you, you may get a
  configuration that cannot be compiled, or that does not function as
  expected.
  
-=head2 Locales on HP-UX
-
-HP-UX installs the locale C<univ.utf8>  and C<en_US.utf8> on all systems.
-Up to and including HP-UX 11.23, this local is defective in that it
-does not thinks that the characters C<< $ + < = > ^ ` | >> and C<~> are
-punctuation, which they are according to the Unicode standards.
-
-This appears to be fixed on HP-UX 11.31.
-
  =head2 Oracle on HP-UX
  
  Using perl to connect to Oracle databases through DBI and DBD::Oracle
diff --git a/locale.c b/locale.c
index ead73e5..d6d91ea 100644
--- a/locale.c
+++ b/locale.c
@@ -1656,10 +1656,9 @@ S_new_ctype(pTHX_ const char *newctype)
              if (UNLIKELY(bad_count) && PL_in_utf8_CTYPE_locale) {
                  PL_warn_locale = Perl_newSVpvf(aTHX_
                       "Locale '%s' contains (at least) the following characters"
-                     " which have\nnon-standard meanings: %s\nThe Perl program"
-                     " will use the standard meanings",
+                     " which have\nunexpected meanings: %s\nThe Perl program"
+                     " will use the expected meanings",
                        newctype, bad_chars_list);
-
              }
              else {
                  PL_warn_locale = Perl_newSVpvf(aTHX_
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index fb56b11..58781e3 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -227,6 +227,14 @@ allow entering the I<first> argument of an operator that takes a fixed
  number of arguments, since this is a case that will not cause stack
  corruption.  [perl #132854]
  
+=item *
+
+The warning added in 5.27.8 concerning UTF-8 locale compatibility was
+misleading.  The new wording and explanation are at
+L<perldiag/Locale '%s' contains (at least) the following characters which
+have unexpected meanings: %s  The Perl program will use the expected
+meanings>
+
  =back
  
  =head1 Utility Changes
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c24be8a..3abc301 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3371,28 +3371,43 @@ said library was compiled against.  Reinstalling the XS module will
  likely fix this error.
  
  =item Locale '%s' contains (at least) the following characters which
-have non-standard meanings: %s  The Perl program will use the standard
+have unexpected meanings: %s  The Perl program will use the expected
  meanings
  
  (W locale) You are using the named UTF-8 locale.  UTF-8 locales are
-expected to adhere to the Unicode standard.  This message arises when
-perl found some anomalies in the locale, and is notifying you that there
-are potential problems.
-
-The most common cause of this warning is that, contrary to the claims,
-Unicode is not completely locale insensitive.  Turkish and some related
-languages have two types of C<"I"> characters.  One is dotted in both
-upper- and lowercase, and the other is dotless in both cases.  Unicode
-allows a locale to use either these rules, or the rules used in all
-other instances, where there is only one type of C<"I">, which is
-dotless in the uppercase, and dotted in the lower.  The perl core does
-not (yet) handle the Turkish case, and this warns you of that.  Instead,
+expected to have very particular behavior, which most do.  This message
+arises when perl found some departures from the expectations, and is
+notifying you that the expected behavior overrides these differences.
+In some cases the differences are caused by the locale definition being
+defective, but the most common causes of this warning are when there are
+ambiguities and conflicts in following the Standard, and the locale has
+chosen an approach that differs from Perl's.
+
+One of these is that, contrary to the claims, Unicode is not
+completely locale insensitive.  Turkish and some related languages have
+two types of C<"I"> characters.  One is dotted in both upper- and
+lowercase, and the other is dotless in both cases.  Unicode allows a
+locale to use either the Turkish rules, or the rules used in all other
+instances, where there is only one type of C<"I">, which is dotless in
+the uppercase, and dotted in the lower.  The perl core does not (yet)
+handle the Turkish case, and this message warns you of that.  Instead,
  the L<Unicode::Casing> module allows you to mostly implement the Turkish
  casing rules.
  
-But there are other locales which are defective in not following the
-Unicode standard, and this message is raised if one of these is
-detected.
+The other common cause is for the characters
+
+ $ + < = > ^ ` | ~
+
+These are problematic.  The C standard says that these should be
+considered punctuation in the C locale (and the POSIX standard defers to
+the C standard), and Unicode is generally considered a superset of the C
+locale.  But Unicode has added an extra category, "Symbol", and
+classifies these particular characters as being symbols.  Most UTF-8
+locales have them treated as punctuation, so that L<ispunct(2)> returns
+non-zero for them.  But a few locales have it return 0.   Perl takes the
+first approach, not using C<ispunct()> at all (see L<Note [5] in
+perlrecharclass|perlrecharclass/[5]>), and this message is raised to
+notify you that you are getting Perl's approach, not the locale's.
  
  =item Locale '%s' may not work well.%s
  
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 78e0ec6..5856f80 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -147,6 +147,7 @@ ioctl(2)
  IPC::Run
  IPC::Shareable
  IPC::Signal
+ispunct(2)
  kill(3)
  langinfo(3)
  LaTeX::Encode
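
The perldiag text above says that a few UTF-8 locales have ispunct() return 0 for the characters $ + < = > ^ ` | ~. A minimal sketch of how one might check a given locale for that behavior follows. The helper name count_non_punct and the array symbol_chars are hypothetical, introduced only for illustration; this is not the check that locale.c performs.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* The nine ASCII characters named in the perldiag entry. */
static const char symbol_chars[] = "$+<=>^`|~";

/* Hypothetical helper (not part of perl): switch LC_CTYPE to locale_name
 * and count how many of the nine characters ispunct() rejects.
 * Returns -1 if the locale cannot be set.  Note that LC_CTYPE is left
 * set to locale_name on return; a real caller may want to restore it. */
int count_non_punct(const char *locale_name)
{
    int rejected = 0;
    size_t i;

    assert(locale_name != NULL);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (i = 0; symbol_chars[i] != '\0'; i++) {
        /* Cast to unsigned char, as the <ctype.h> functions require. */
        if (!ispunct((unsigned char) symbol_chars[i]))
            rejected++;
    }
    return rejected;
}
```

In the "C" locale the C standard classifies all nine characters as punctuation, so count_non_punct("C") should return 0. A positive result for a UTF-8 locale name would indicate a locale of the kind the new warning describes.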