PATCH: [perl #131646] Assertion fail UTF-8 error msg
author: Karl Williamson <khw@cpan.org>
Sat, 24 Jun 2017 17:47:19 +0000 (11:47 -0600)
committer: Karl Williamson <khw@cpan.org>
Sat, 24 Jun 2017 18:03:50 +0000 (12:03 -0600)
Instead of croaking with a proper message, the attempt to create the
message caused an assertion failure.

The cause was that there were two ++ operators on a string, so one
should subtract 2 to get to the string start, but only 1 was being
subtracted.
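
As a rough illustration of the off-by-one, here is a minimal standalone
sketch. It is not the code from utf8.c: the helper name malformed_start
and the simplified two-byte start-byte check are assumptions made for
this example only. It shows why, after two post-increments, the start
byte sits at u - 2 rather than u - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of utf8.c): scan a buffer for a two-byte
 * UTF-8 start byte that is not followed by a continuation byte, and
 * return the offset of that start byte, or -1 if none is found. */
static ptrdiff_t malformed_start(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    const unsigned char *u = s;
    const unsigned char *end = s + len;

    while (u < end) {
        unsigned char c1 = *u++;        /* first ++: steps past the start byte */

        /* Simplified assumption: only two-byte start bytes 0xC2..0xDF. */
        if (c1 >= 0xC2 && c1 <= 0xDF && u < end) {
            unsigned char c2 = *u++;    /* second ++: steps past the next byte */

            if ((c2 & 0xC0) != 0x80) {
                /* u is now two bytes past the start byte, so the
                 * sequence begins at u - 2; u - 1 would point at c2. */
                return (u - 2) - s;
            }
        }
    }
    return -1;
}
```

Called on the three bytes 'a', 0xC2, 0x00, this sketch returns 1, the
offset of the 0xC2 start byte. Using u - 1 instead would report the
0x00 byte as the start of the sequence.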

This is a 5.26 regression. It is not terribly consequential, since the
program is about to die anyway, but the fix is trivial and allows the
reason for the crash to be displayed properly to aid debugging, so I'm
adding my vote for it for 5.26.1.

t/lib/warnings/utf8
utf8.c

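diff --git a/t/lib/warnings/utf8 b/t/lib/warnings/utf8
index a4dfb12..a26bbed 100644
--- a/t/lib/warnings/utf8
+++ b/t/lib/warnings/utf8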
@@ -749,3 +749,16 @@ BEGIN{
 {};$^H=eval'2**400'}Â
 EXPECT
 Malformed UTF-8 character: \xc2\x0a (unexpected non-continuation byte 0x0a, immediately after start byte 0xc2; need 2 bytes, got 1) at - line 11.
+########
+# NAME  [perl #131646]
+BEGIN{
+    if (ord('A') == 193) {
+        print "SKIPPED\n# ebcdic platforms generates different Malformed UTF-8 warnings.";
+        exit 0;
+    }
+}
+no warnings;
+use warnings 'utf8';
+for(uc 0..t){0~~pack"UXp>",exp}
+EXPECT
+Malformed UTF-8 character: \xc2\x00 (unexpected non-continuation byte 0x00, immediately after start byte 0xc2; need 2 bytes, got 1)  in smart match at - line 9.
diff --git a/utf8.c b/utf8.c
index 68ac640..2ee701a 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1875,7 +1875,7 @@ Perl_bytes_cmp_utf8(pTHX_ const U8 *b, STRLEN blen, const U8 *u, STRLEN ulen)
                         /* diag_listed_as: Malformed UTF-8 character%s */
                        Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8),
                                     "%s %s%s",
-                                    unexpected_non_continuation_text(u - 1, 2, 1, 2),
+                                    unexpected_non_continuation_text(u - 2, 2, 1, 2),
                                     PL_op ? " in " : "",
                                     PL_op ? OP_DESC(PL_op) : "");
                        return -2;