regcomp.c: Improve dump of ANYOFR patterns
author Karl Williamson <khw@cpan.org>
Wed, 12 Feb 2020 16:26:22 +0000 (09:26 -0700)
committer Sawyer X <xsawyerx@cpan.org>
Wed, 27 May 2020 08:09:54 +0000 (11:09 +0300)
On ASCII platforms, where it's easy to calculate, when dumping a pattern,
don't output the lowest first UTF-8 byte when the entire range is ASCII.
The info about this minimum byte is carried in the node, but is ignored
unless the pattern is UTF-8, and in the case of UTF-8 invariant
characters gives no extra help.  The information is quite useful for
large code points, because it lets us quickly rule out large swaths of
potential matches without having to convert the target UTF-8 string to
code point format.  But for ASCII matches it isn't helpful, and dumping
it is just extra noise.
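
To illustrate why the first UTF-8 byte helps for large code points but adds nothing for ASCII, here is a minimal standalone sketch. It is not code from regcomp.c: the names first_utf8_byte and first_byte_may_match are hypothetical, and it assumes standard UTF-8 on an ASCII platform (Perl's extended UTF-8 and UTF-EBCDIC are not modeled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: first byte of the standard UTF-8 encoding of cp.
 * Assumes an ASCII platform and cp within the Unicode range. */
static uint8_t first_utf8_byte(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return (uint8_t) cp;                   /* 1 byte: invariant */
    if (cp < 0x800)   return (uint8_t) (0xC0 | (cp >> 6));   /* 2-byte sequence */
    if (cp < 0x10000) return (uint8_t) (0xE0 | (cp >> 12));  /* 3-byte sequence */
    return (uint8_t) (0xF0 | (cp >> 18));                    /* 4-byte sequence */
}

/* Hypothetical quick reject: could a well-formed UTF-8 target whose first
 * byte is b match some code point in [lo, hi]?  This is only a necessary
 * condition.  It works because the first byte never decreases as the
 * code point increases. */
static bool first_byte_may_match(uint8_t b, uint32_t lo, uint32_t hi)
{
    return b >= first_utf8_byte(lo) && b <= first_utf8_byte(hi);
}
```

For a range such as U+1F600..U+1F64F, any target character whose first byte is not 0xF0 can be rejected without decoding it. For an all-ASCII range the first byte is the code point itself, so the extra information is redundant.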

regcomp.c

index 01d96ec..203dbdc 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -21421,11 +21421,16 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
                          : (OP(o) == ANYOFH || OP(o) == ANYOFR)
                            ? 0xFF
                            : lowest;
-            Perl_sv_catpvf(aTHX_ sv, " (First UTF-8 byte=%02X", lowest);
-            if (lowest != highest) {
-                Perl_sv_catpvf(aTHX_ sv, "-%02X", highest);
+#ifndef EBCDIC
+            if (OP(o) != ANYOFR || ! isASCII(ANYOFRbase(o) + ANYOFRdelta(o)))
+#endif
+            {
+                Perl_sv_catpvf(aTHX_ sv, " (First UTF-8 byte=%02X", lowest);
+                if (lowest != highest) {
+                    Perl_sv_catpvf(aTHX_ sv, "-%02X", highest);
+                }
+                Perl_sv_catpvf(aTHX_ sv, ")");
             }
-            Perl_sv_catpvf(aTHX_ sv, ")");
         }
 
         SvREFCNT_dec(unresolved);
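
The effect of the new guard can be sketched outside of Perl as follows. This is an illustration only, not the regcomp.c code: format_first_byte_note is a hypothetical function, is_anyofr stands in for OP(o) == ANYOFR, and base and delta stand in for the values returned by ANYOFRbase(o) and ANYOFRdelta(o), on the assumption that base + delta is the highest code point in the range. The sketch models the non-EBCDIC branch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the guarded output in the hunk above.
 * Writes the "First UTF-8 byte" note into buf, or an empty string when
 * the node is an ANYOFR whose highest code point is ASCII.
 * Returns the number of characters written (excluding the NUL). */
static int format_first_byte_note(char *buf, size_t size, bool is_anyofr,
                                  uint32_t base, uint32_t delta,
                                  unsigned lowest, unsigned highest)
{
    assert(buf != NULL && size > 0);

    /* Entire range is ASCII: the note would be noise, so emit nothing. */
    if (is_anyofr && base + delta < 0x80) {
        buf[0] = '\0';
        return 0;
    }

    /* Otherwise emit the complete note, including the closing paren. */
    if (lowest != highest) {
        return snprintf(buf, size, " (First UTF-8 byte=%02X-%02X)",
                        lowest, highest);
    }
    return snprintf(buf, size, " (First UTF-8 byte=%02X)", lowest);
}
```

Note that the closing parenthesis moves inside the guarded block, matching the diff: when the note is suppressed, no stray ")" is emitted.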