This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Split diagnostics for two UTF-8 malformations
author Karl Williamson <khw@cpan.org>
Wed, 23 Nov 2016 00:47:35 +0000 (17:47 -0700)
committer Karl Williamson <khw@cpan.org>
Thu, 24 Nov 2016 15:29:14 +0000 (08:29 -0700)
commit e308b348b63b8c65648ae3d340ce96b3ec19f1a2
tree 2b8b5bf146d199751c85d6a43b9b3dae21b7a3f2
parent 2d0a280183c8525ba909db81b0007830c2f3a118

Some UTF-8 sequences may have multiple malformations.  Commit
2b5e7bc2e60b4c4b5d87aa66e066363d9dce7930 tried to make sure that all
possible ones are raised, instead of abandoning the search after one is
found.  Since then, I have realized that there was yet another case of
two malformations for which it reported only one or the other.

An input buffer may be too short to fully express the code point it
purports to contain.  This can be detected when the first byte of the
UTF-8 sequence indicates that a longer sequence is required than the
space available.  But that shortened sequence can also contain the
premature beginning of another character before the point where the
buffer runs out.  This commit causes both malformations to be raised,
instead of the previous behavior of noting just one.
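
The two conditions described above can be illustrated with a minimal
sketch.  This is not the code in utf8.c: the flag names and the helpers
expected_len() and check_sequence() are hypothetical, and the sketch
assumes standard UTF-8 with sequences of at most 4 bytes, whereas Perl
accepts longer ones.  It only shows how both problems can be recorded
for the same input instead of returning after the first one found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for this sketch; not the identifiers in utf8.c. */
enum {
    MALFORMED_SHORT            = 1 << 0, /* buffer ends before the sequence does   */
    MALFORMED_NON_CONTINUATION = 1 << 1, /* a new character starts too early       */
    MALFORMED_BAD_START        = 1 << 2  /* first byte cannot begin a sequence     */
};

/* Length implied by the first byte (1..4), or 0 if the byte cannot
 * start a sequence.  Overlong and out-of-range forms are not checked. */
static size_t expected_len(uint8_t b)
{
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Examine the sequence starting at s, with avail bytes in the buffer,
 * and return the set of all malformations found (0 if none). */
static unsigned check_sequence(const uint8_t *s, size_t avail)
{
    unsigned flags = 0;
    size_t need, limit, i;

    if (avail == 0)
        return MALFORMED_SHORT;

    need = expected_len(s[0]);
    if (need == 0)
        return MALFORMED_BAD_START;

    /* First condition: the start byte asks for more bytes than exist. */
    if (need > avail)
        flags |= MALFORMED_SHORT;

    /* Second condition: keep looking even if the buffer is short.  Any
     * byte that is not of the form 10xxxxxx within the available part
     * is the premature beginning of another character. */
    limit = need < avail ? need : avail;
    for (i = 1; i < limit; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            flags |= MALFORMED_NON_CONTINUATION;
            break;
        }
    }

    return flags;
}
```

In this sketch, the two-byte buffer 0xE2 0x41 sets both flags: 0xE2
announces a three-byte sequence, only two bytes are available, and the
second byte is not a continuation byte.  The buffer 0xE2 0x82 sets only
the "short" flag.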
ext/XS-APItest/t/utf8.t
t/op/utf8decode.t
utf8.c