This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
regcomp.c: refactor a static function
nextchar() advances the parse to the next byte beyond any ignorable
bytes, returning the parse pointer before the advancement.
I find this confusing, as
    foo = nextchar();
reads as if foo should point to the next character, instead of to the
character where the parse currently is. Even if the name weren't
misleading, this behavior would be hard for a reader to grok, as the
place the variable gets set in the source is far away from the call.
It's clearer to say
    foo = current;
    nextchar();
This has confused others as well: in one place it took several commits
to get the code working properly, and games have been played to back up
the parse when it turns out it shouldn't have been advanced. It's better
to check first, then advance only if that is the right thing to do.
Ready-Fire-Aim is not a best practice.
This commit makes nextchar() return void, and changes the few places
where the en-passant value was used.
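To illustrate the calling convention described above, here is a minimal
sketch. It is not the regcomp.c code: toy_state, toy_nextchar() and
toy_accept() are hypothetical names, and the "ignorable bytes" are
assumed to be just blanks and tabs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the regex compiler's parse state. */
typedef struct {
    const char *parse;  /* current position in the pattern */
    const char *end;    /* one past the last byte of the pattern */
} toy_state;

/* Advance one byte, then skip any ignorable bytes (assumed here to be
 * blanks and tabs only).  Returns void, so a caller cannot mistake a
 * return value for the new position. */
static void toy_nextchar(toy_state *st)
{
    assert(st->parse < st->end);
    st->parse++;
    while (st->parse < st->end
           && (*st->parse == ' ' || *st->parse == '\t'))
        st->parse++;
}

/* Check first, then advance: consume c only if it is the current byte.
 * Returns 1 if consumed, 0 otherwise (parse position left unchanged),
 * so there is never a need to back up. */
static int toy_accept(toy_state *st, char c)
{
    if (st->parse >= st->end || *st->parse != c)
        return 0;
    toy_nextchar(st);
    return 1;
}
```

A caller that needs the old position saves it explicitly before the
call, e.g. "start = st.parse; toy_nextchar(&st);", which keeps the
assignment next to the point where the value is taken.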
The new scheme is still buggy, as nextchar() only advances a single
byte, which may be the wrong thing to do when the pattern is UTF-8
encoded. More work is needed to be in a position to fix this. We have
only gotten away with this so far because apparently no one is using
non-ASCII white space under /x, our metacharacters are all ASCII, and
other code likely repositions the parse to a character boundary before
any problem arises.
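The difference between advancing one byte and advancing one character
can be sketched as follows. This is only an illustration, not the
eventual fix: toy_utf8_len() and toy_advance_char() are hypothetical
helpers, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character that starts
 * with the given lead byte.  Assumes well-formed input; a continuation
 * or invalid byte is treated as a single byte. */
static size_t toy_utf8_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Return the position one character past p.  When is_utf8 is false this
 * is a single byte; otherwise it is the whole multi-byte sequence,
 * clamped so the result never runs past end. */
static const char *toy_advance_char(const char *p, const char *end,
                                    int is_utf8)
{
    size_t len;

    assert(p < end);
    len = is_utf8 ? toy_utf8_len((unsigned char)*p) : 1;
    if (len > (size_t)(end - p))
        len = (size_t)(end - p);
    return p + len;
}
```

For a pattern beginning with U+0085 (bytes C2 85 in UTF-8), the
byte-wise advance stops in the middle of the character, while the
character-wise advance lands on the byte that follows it.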