This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
utf8.c: Don't calc code point from overflowing UTF8
author Karl Williamson <khw@cpan.org>
Wed, 10 May 2017 02:16:13 +0000 (20:16 -0600)
committer Karl Williamson <khw@cpan.org>
Thu, 13 Jul 2017 03:14:23 +0000 (21:14 -0600)
This avoids calculating a code point from UTF-8 that is known to
overflow.  Doing so could give incorrect results (which are used only
in warning messages), but the calculation is reached only when there
are 3 (or more) malformations at once: overflow, overlong, and UTF-8
terminated early, so it's unlikely to actually happen in the field.
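A minimal C sketch of the idea, under stated assumptions: `toy_uv` is a deliberately 32-bit stand-in for Perl's `UV` (so overflow is easy to reach), and `accumulate_checked()` and `min_uv_for_short()` are hypothetical helpers, not Perl's actual internals. For a sequence that terminates early, the smallest code point it could represent is found by padding the missing continuation bytes with 0x80; once overflow has been flagged, the partial value can no longer be trusted, so the sketch refuses to compute it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "UV": 32 bits, chosen only for illustration.
 * Perl's real UV is typically 64 bits. */
typedef uint32_t toy_uv;

#define CONT_MARK 0x80u  /* continuation byte 10xxxxxx, payload all zero */
#define CONT_MASK 0x3Fu  /* the six payload bits of a continuation byte */

/* Shift six more payload bits into *uv.  Returns false, leaving *uv
 * unchanged, if the shift would push set bits off the top. */
static bool accumulate_checked(toy_uv *uv, unsigned char cont)
{
    if (*uv >> (32 - 6))          /* any of the top 6 bits already set? */
        return false;
    *uv = (*uv << 6) | (cont & CONT_MASK);
    return true;
}

/* Smallest code point a truncated sequence could represent: pad the
 * missing continuation bytes with CONT_MARK.  Like the guard in this
 * commit, skip the calculation when overflow was already detected. */
static bool min_uv_for_short(toy_uv uv_so_far, size_t avail, size_t expect,
                             bool got_overflow, toy_uv *out)
{
    if (got_overflow)
        return false;             /* uv_so_far is unreliable; don't use it */

    toy_uv min_uv = uv_so_far;
    for (size_t i = avail; i < expect; i++) {
        if (!accumulate_checked(&min_uv, CONT_MARK))
            return false;         /* padding itself overflows the toy UV */
    }
    *out = min_uv;
    return true;
}
```

For example, the first three bytes of a six-byte sequence, `0xFD 0xBF 0xBF`, accumulate to `0x1FFF`; padding the three missing bytes gives a minimum of `0x7FFC0000`.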

I am not adding any tests, as I don't know of any existing failures, and
soon there will be a commit that limits code points to be at most
IV_MAX.  That commit will cause existing tests to fail without this
fix, so that is good enough to test it.  I imagine a brute force
generator of UTF-8 would find some string that exposed this problem,
absent the other coming changes, but it's not worth it.

utf8.c

diff --git a/utf8.c b/utf8.c
index a784c54..db10654 100644
--- a/utf8.c
+++ b/utf8.c
@@ -1248,7 +1248,13 @@ Perl_utf8n_to_uvchr_error(pTHX_ const U8 *s,
     {
         possible_problems |= UTF8_GOT_LONG;
 
-        if (UNLIKELY(possible_problems & UTF8_GOT_TOO_SHORT)) {
+        if (   UNLIKELY(   possible_problems & UTF8_GOT_TOO_SHORT)
+                          /* The calculation in the 'true' branch of this 'if'
+                           * below won't work if overflows, and isn't needed
+                           * anyway.  Further below we handle all overflow
+                           * cases */
+            &&   LIKELY(! (possible_problems & UTF8_GOT_OVERFLOW)))
+        {
             UV min_uv = uv_so_far;
             STRLEN i;