This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
regcomp.c: Safer handling of malformed UTF-8
author Karl Williamson <khw@cpan.org>
Wed, 16 Sep 2015 14:48:29 +0000 (08:48 -0600)
committer Karl Williamson <khw@cpan.org>
Wed, 16 Sep 2015 20:48:38 +0000 (14:48 -0600)
This commit just changes a test to look for UTF-8 invariants instead of
legal UTF-8 start characters.  The effective difference is that now all
non-invariants go to the general utf8 handling function, which is
equipped to find malformed UTF-8.  Previously, this code would
improperly accept malformations that were illegal start characters or
continuation characters.
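
To illustrate which bytes are affected, here is a minimal sketch that compares
the two tests over all 256 byte values. The helper functions are hypothetical
stand-ins, assuming an ASCII platform where an invariant is any byte below
0x80 and a legal start byte is any byte at or above 0xC2. They are not copies
of the macros in utf8.h, and the names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for UTF8_IS_INVARIANT and UTF8_IS_START.
 * Assumption: ASCII platform, simplified definitions for illustration. */
static bool sketch_is_invariant(uint8_t c) { return c < 0x80; }
static bool sketch_is_start(uint8_t c)     { return c >= 0xC2; }

/* Old test: only bytes that look like legal start bytes reach the decoder. */
static bool old_test_uses_decoder(uint8_t c) { return sketch_is_start(c); }

/* New test: every non-invariant byte reaches the decoder. */
static bool new_test_uses_decoder(uint8_t c) { return !sketch_is_invariant(c); }

/* Count the byte values that the two tests route differently. */
static int count_routing_differences(void)
{
    int n = 0;
    for (int c = 0; c < 256; c++) {
        if (old_test_uses_decoder((uint8_t)c) != new_test_uses_decoder((uint8_t)c))
            n++;
    }
    return n;
}
```

Under these assumptions the two tests disagree only on bytes 0x80 through
0xC1: continuation bytes and the start bytes 0xC0 and 0xC1, which can only
begin overlong sequences. The old test treated such a byte as an ordinary
literal; the new test hands it to the decoding function, which can report the
malformation.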

regcomp.c

index fe9b326..4f4bb44 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -12651,7 +12651,7 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
                    /*FALLTHROUGH*/
                default:    /* A literal character */
                  normal_default:
-                   if (UTF8_IS_START(*p) && UTF) {
+                   if (! UTF8_IS_INVARIANT(*p) && UTF) {
                        STRLEN numlen;
                        ender = utf8n_to_uvchr((U8*)p, RExC_end - p,
                                               &numlen, UTF8_ALLOW_DEFAULT);