This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
utf8.c: Avoid some unnecessary work
author Karl Williamson <khw@cpan.org>
Wed, 25 Jul 2018 01:52:25 +0000 (19:52 -0600)
committer Karl Williamson <khw@cpan.org>
Fri, 3 Aug 2018 19:13:24 +0000 (13:13 -0600)
The code changed by this commit used to check that the input was valid
UTF-8, and if so, calculated the code point, using a fast function
that doesn't do any error checking.

However, changes made earlier in 5.29 mean that checking for validity
now takes hardly less time than checking and calculating the code point
together.  So this commit switches to calculating the code point from
the start, avoiding a second pass through the byte string.
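
The single-pass pattern can be illustrated outside of Perl with a small standalone sketch. Everything in it is hypothetical: `sketch_decode` and `sketch_decode_checked` are illustrative names, not Perl API, and the decoder is a strict stand-in that does not reproduce the behaviour of `utf8n_to_uvchr`. The only property it is assumed to share with the call in the diff is that 0 is returned on malformed input, so a 0 result has to be disambiguated from a genuine NUL by looking at the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checked decoder (NOT Perl's): returns the code point of the
 * first character in [p, e), or 0 if the bytes are not well-formed UTF-8. */
static uint32_t sketch_decode(const unsigned char *p, const unsigned char *e)
{
    size_t avail = (p < e) ? (size_t)(e - p) : 0;
    size_t len, i;
    uint32_t cp, min;

    if (avail == 0) return 0;                 /* empty input */
    if (p[0] < 0x80) return p[0];             /* ASCII, including NUL */

    if ((p[0] & 0xE0) == 0xC0)      { len = 2; cp = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; min = 0x10000; }
    else return 0;                            /* invalid start byte */

    if (avail < len) return 0;                /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    if (cp < min) return 0;                   /* overlong encoding */
    if (cp > 0x10FFFF) return 0;              /* beyond Unicode */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate */
    return cp;
}

/* Decode once, then validate from the result instead of making a separate
 * validity pass. Returns 1 and stores the code point on success, else 0. */
static int sketch_decode_checked(const unsigned char *p,
                                 const unsigned char *e, uint32_t *out)
{
    uint32_t cp;

    assert(p != NULL && e != NULL && out != NULL);
    cp = sketch_decode(p, e);

    /* A result of 0 is ambiguous: it is an error unless the input really
     * starts with a NUL byte. */
    if (cp == 0 && (p >= e || *p != '\0')) return 0;

    *out = cp;
    return 1;
}
```

In this sketch a caller would pass the start and end of the buffer and inspect the return value, for example treating the two bytes `0xC3 0xA9` as success with code point `0xE9` and the lone byte `0xC3` as a failure.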

utf8.c

diff --git a/utf8.c b/utf8.c
index 5b243df..a4ceb35 100644
--- a/utf8.c
+++ b/utf8.c
@@ -3120,15 +3120,17 @@ S_is_utf8_common_with_len(pTHX_ const U8 *const p, const U8 * const e,
      * starts at <p>, and extending no further than <e - 1> is in the inversion
      * list <invlist>. */
 
+    UV cp = utf8n_to_uvchr(p, e - p, NULL, 0);
+
     PERL_ARGS_ASSERT_IS_UTF8_COMMON_WITH_LEN;
 
-    if (! isUTF8_CHAR(p, e)) {
+    if (cp == 0 && (p >= e || *p != '\0')) {
         _force_out_malformed_utf8_message(p, e, 0, 1);
         NOT_REACHED; /* NOTREACHED */
     }
 
     assert(invlist);
-    return _invlist_contains_cp(invlist, valid_utf8_to_uvchr(p, NULL));
+    return _invlist_contains_cp(invlist, cp);
 }
 
 STATIC void