utf8n_to_uvchr(): Note multiple malformations
Some UTF-8 sequences can have multiple malformations. For example, a
sequence can be the start of an overlong representation of a code point,
and still be incomplete. Until this commit, the function generally
stopped looking once it found the first malformation. This was not
correct behavior, as that malformation may be allowed, while another
disallowed one went unnoticed. (But this did not actually create
security holes, as those allowed malformations replaced the input with a
REPLACEMENT CHARACTER.) This commit refactors the error handling of
this function to set a flag and keep going if a malformation is found
that doesn't preclude others. Then each is handled in a loop at the
end, warning if warranted. The result is that there is a warning for
each malformation for which warnings should be generated, and an error
return is made if any one is disallowed.
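
The flag-then-loop scheme described above could be sketched roughly as
follows. This is only an illustration: the flag names, the function
name, and its signature are hypothetical and are not taken from the
Perl source.

```c
#include <stdio.h>

/* Hypothetical flag bits, one per kind of malformation. */
#define MALF_OVERLONG (1u << 0)
#define MALF_SHORT    (1u << 1)
#define MALF_NONCONT  (1u << 2)

/* Handle every malformation recorded in 'found'.
 * Emits one warning for each found malformation whose bit is set in
 * 'warn', and counts them in *n_warnings.
 * Returns 1 if all found malformations are in 'allowed', else 0. */
static int handle_malformations(unsigned found, unsigned allowed,
                                unsigned warn, int *n_warnings)
{
    int ok = 1;

    *n_warnings = 0;
    while (found) {
        unsigned bit = found & (0u - found);  /* isolate lowest set bit */

        found &= ~bit;                        /* mark it as handled */
        if (warn & bit) {
            fprintf(stderr, "malformation 0x%x\n", bit);
            (*n_warnings)++;
        }
        if (!(allowed & bit))
            ok = 0;            /* keep going so later ones still warn */
    }
    return ok;
}
```

With this shape, an overlong sequence that is also too short yields two
warnings, and the call fails if either malformation is disallowed, even
when the other one is allowed.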
Overflow doesn't happen except for very high code points, well above the
Unicode range and too large to fit in 31 bits. Hence whenever overflow
occurs, those two other potential malformations necessarily apply as
well, so only one warning is output: the most dire.
This will speed up the normal case slightly, as the test for overflow is
pulled out of the loop, allowing the UV to overflow. A single test is
then done after the loop to see whether overflow occurred.
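
The idea of moving the overflow check out of the loop can be sketched
with a toy decoder. Everything here is an assumption made for
illustration: the 32-bit accumulator, the toy input format (a start
payload followed by bytes that each contribute their low 6 bits), and
both function names. How the real function decides whether overflow
occurred is not described in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate with no per-iteration overflow check.  The arithmetic is
 * unsigned, so a too-large value simply wraps around. */
static uint32_t toy_accumulate(uint32_t start, const unsigned char *cont,
                               size_t n)
{
    uint32_t uv = start;

    for (size_t i = 0; i < n; i++)
        uv = (uv << 6) | (cont[i] & 0x3Fu);
    return uv;
}

/* Called once, after the loop: decide from the input alone whether
 * the true value needs more than 32 bits. */
static int toy_overflowed(uint32_t start, const unsigned char *cont,
                          size_t n)
{
    uint32_t lead = start;
    size_t i = 0;
    unsigned bits = 0;

    /* Skip leading zero units; they contribute no significant bits. */
    while (lead == 0 && i < n)
        lead = cont[i++] & 0x3Fu;
    if (lead == 0)
        return 0;                     /* the whole value is zero */

    while (lead) {                    /* significant bits in first unit */
        bits++;
        lead >>= 1;
    }
    return bits + 6 * (n - i) > 32;   /* each later unit adds 6 bits */
}
```

A caller would run toy_accumulate() unconditionally and consult
toy_overflowed() only once afterwards, discarding the wrapped value if
it reports overflow.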