The blamed commit simplified some code based on the assumption that
UTF-8 well-formedness had already been verified. It had an assertion to
verify this. The test case shows that there is a path, through 'eval',
that bypasses this usual checking.

The checking was based on the assumption that the program started not in
UTF-8, and that something like a 'use utf8' would be needed to get it
there, at which point a flag would be set to the effect that
well-formedness should be checked. But it turns out that a string eval
(and perhaps other things) gets parsed separately, so the flag wasn't
set and no well-formedness checking was being done.

The solution is a one-word change: initialize the flag to TRUE ("yes,
check") instead of FALSE ("no, don't check") in the initialization
routine run at the beginning of lexing a code unit. This catches eval
and presumably anything else that was being bypassed.

The checking is only actually done if the code being lexed is known to
be in UTF-8. That state continues to be turned on in the same ways as
before, such as by 'use utf8'.

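The gating described above can be sketched in isolation. This is only an
illustrative model, not the code in the lexer: apart from the field name
recheck_utf8_validity, every name here (sketch_parser, source_is_utf8,
sketch_lex_start, sketch_utf8_ok, sketch_chunk_acceptable) is
hypothetical, and the validity check is a simplified structural one that
only looks at lead and continuation bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer state. Only the field name
 * recheck_utf8_validity is taken from the patch. */
typedef struct {
    bool recheck_utf8_validity; /* true means "yes, check" */
    bool source_is_utf8;        /* assumed flag: turned on by e.g. 'use utf8' */
} sketch_parser;

/* Models the initialization routine after the fix: every new code unit,
 * including a string eval, starts out with the flag set to true. */
static void sketch_lex_start(sketch_parser *p, bool source_is_utf8)
{
    p->recheck_utf8_validity = true;
    p->source_is_utf8 = source_is_utf8;
}

/* Simplified structural check: each lead byte must be followed by the
 * right number of continuation bytes (10xxxxxx). It does not reject
 * overlongs, surrogates, or out-of-range code points. */
static bool sketch_utf8_ok(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;
        if (c < 0x80)                need = 0;
        else if ((c & 0xE0) == 0xC0) need = 1;
        else if ((c & 0xF0) == 0xE0) need = 2;
        else if ((c & 0xF8) == 0xF0) need = 3;
        else return false;           /* stray continuation or invalid lead */
        if (need > len - i - 1)
            return false;            /* sequence truncated at end of input */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;        /* non-continuation byte */
        i += need + 1;
    }
    return true;
}

/* The check runs only when the flag is set AND the source is known to
 * be UTF-8; otherwise the bytes are accepted unexamined. */
static bool sketch_chunk_acceptable(const sketch_parser *p,
                                    const unsigned char *s, size_t len)
{
    if (!(p->recheck_utf8_validity && p->source_is_utf8))
        return true;
    return sketch_utf8_ok(s, len);
}
```

Under these assumptions, the bytes of the test case's m'\302' (a lone
0xC2 followed by a quote) are rejected when the source is marked UTF-8,
and pass through unexamined when it is not.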
use utf8;
qw∘foo ∞ ♥ bar∘
EXPECT
+########
+# NAME [perl #134064]
+BEGIN {
+ if (ord('A') == 193) {
+ print "SKIPPED\n# test is ASCII-specific, but could be extended to EBCDIC";
+ exit 0;
+ }
+}
+use utf8;
+$foo="m'\302'";
+eval $foo ;
+print "The eval did not crash the program\n"
+EXPECT
+OPTION regex
+Malformed UTF-8 character: .*non-continuation.*
+The eval did not crash the program
parser->lex_state = LEX_NORMAL;
parser->expect = XSTATE;
parser->rsfp = rsfp;
- parser->recheck_utf8_validity = FALSE;
+ parser->recheck_utf8_validity = TRUE;
parser->rsfp_filters =
!(flags & LEX_START_SAME_FILTER) || !oparser
? NULL