This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
regexec: Do less work on quantified UTF-8
Consider the regexes /A*B/ and /A*?B/ where A and B are arbitrary,
except that B begins with an EXACTish node. Prior to this patch, as a
shortcut, the loop for accumulating A* would look for the first character
of B to help it decide whether B could match next. It did
not test for all of B unless testing showed that the next thing could be
the beginning of B. If the target string was UTF-8, it converted each
new sequence of bytes to the code point they represented, and then did
the comparison. This is a relatively expensive process.
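To illustrate the per-position cost described above, here is a minimal C sketch of a decode-then-compare check. It is not Perl's code. The names decode_utf8() and could_start_B_decoded() are hypothetical, and the decoder assumes already-validated UTF-8 rather than using Perl's own conversion routines. c1 and c2 stand for the one or two code points B may begin with (for example, both cases under /i).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s and store
 * its byte length in *len. Assumes the input is already valid UTF-8; the
 * only check is that s does not point at a continuation byte. */
static uint32_t decode_utf8(const unsigned char *s, size_t *len)
{
    assert((s[0] & 0xC0) != 0x80);
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if (s[0] < 0xE0) {
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if (s[0] < 0xF0) {
        *len = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Old-style check, run at every position the A* loop reaches: convert the
 * bytes to a code point first, then compare against B's first character. */
static int could_start_B_decoded(const unsigned char *s, uint32_t c1, uint32_t c2)
{
    size_t len;
    uint32_t cp = decode_utf8(s, &len);
    return cp == c1 || cp == c2;
}
```

Every step through the quantified loop pays for the bit-twiddling in decode_utf8() before it can perform a single integer comparison.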
This commit avoids that conversion by just doing a memEQ at the current
input position. To do this, it revamps S_setup_EXACTISH_ST_c1_c2() to
output the UTF-8 sequences to compare against. The function has also
been tightened up so that it produces fewer false positives.
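The following is a minimal C sketch of the byte-comparison idea, under stated assumptions. It is not the actual S_setup_EXACTISH_ST_c1_c2() interface. The struct first_char_utf8 and the functions encode_utf8(), setup_first_char() and could_start_B_bytes() are hypothetical, and standard memcmp() stands in for Perl's memEQ macro. B's possible first characters are encoded to UTF-8 once, before the loop. Each iteration then needs only a bounded byte comparison at the current input position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for what the setup step precomputes: the UTF-8
 * byte sequences of the character(s) B may begin with. */
typedef struct {
    unsigned char c1[4], c2[4];
    size_t len1, len2;
} first_char_utf8;

/* Encode a code point (assumed <= 0x10FFFF and not a surrogate) as UTF-8. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80)    { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800)   { out[0] = (unsigned char)(0xC0 | (cp >> 6));
                        out[1] = (unsigned char)(0x80 | (cp & 0x3F)); return 2; }
    if (cp < 0x10000) { out[0] = (unsigned char)(0xE0 | (cp >> 12));
                        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                        out[2] = (unsigned char)(0x80 | (cp & 0x3F)); return 3; }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Done once, before the quantifier loop starts. */
static first_char_utf8 setup_first_char(uint32_t cp1, uint32_t cp2)
{
    first_char_utf8 f;
    f.len1 = encode_utf8(cp1, f.c1);
    f.len2 = encode_utf8(cp2, f.c2);
    return f;
}

/* Done at each loop iteration: compare raw bytes at s with no decoding.
 * `end` bounds the read so memcmp() never runs past the target string. */
static int could_start_B_bytes(const unsigned char *s, const unsigned char *end,
                               const first_char_utf8 *f)
{
    assert(s <= end);
    size_t avail = (size_t)(end - s);
    return (avail >= f->len1 && memcmp(s, f->c1, f->len1) == 0)
        || (avail >= f->len2 && memcmp(s, f->c2, f->len2) == 0);
}
```

For example, setup_first_char(0xE9, 0xC9) precomputes the bytes C3 A9 ("é") and C3 89 ("É"). After that, could_start_B_bytes() accepts either spelling at the current position without converting anything to a code point. The encoding cost moves out of the loop and is paid once per match attempt instead of once per character.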