This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Allow unquoted UTF-8 HERE-document terminators
author Alex Vandiver <alex@chmrr.net>
Mon, 23 Mar 2015 02:45:54 +0000 (22:45 -0400)
committer Father Chrysostomos <sprout@cpan.org>
Fri, 27 Mar 2015 19:46:46 +0000 (12:46 -0700)
commit 6e59c8626d31f697a2b7b36cf8a200b36d93eac2
tree 6136903f7610533f41254dcb862b4895a2dd1c81
parent b3089e964c0afaf7eb8d54aa5a912e4eb2e6c176

When not explicitly quoted, tokenization of the HERE-document terminator
dealt improperly with multi-byte characters, advancing one byte at a
time instead of one character at a time.  This led to
incomprehensible-to-the-user errors of the form:

    Passing malformed UTF-8 to "XPosixWord" is deprecated
    Malformed UTF-8 character (unexpected continuation byte 0xa7, with
      no preceding start byte)
    Can't find string terminator "EnFra�" anywhere before EOF

If the terminator was enclosed in single or double quotes, parsing
worked correctly: delimcpy advances byte-by-byte, but looks only for
the single-byte closing quote character.
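
The reason a byte-by-byte scan is safe in the quoted case can be sketched in standalone C. This is an illustration only, not the code in toke.c: find_closing_quote is a hypothetical helper, and the real delimcpy does more than this. It relies on one property of UTF-8: every byte of a multi-byte sequence has its high bit set, so an ASCII delimiter such as ' or " can never match in the middle of a character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not perl's delimcpy).
 * Returns the byte offset of the first occurrence of the ASCII
 * delimiter `delim` in src[0..len), or len if it is absent.
 *
 * A plain byte search is sufficient: in UTF-8, lead bytes and
 * continuation bytes are all >= 0x80, so a byte equal to an ASCII
 * delimiter is always a complete character of its own. */
static size_t find_closing_quote(const char *src, size_t len, char delim)
{
    const char *hit = memchr(src, (unsigned char)delim, len);
    return hit ? (size_t)(hit - src) : len;
}
```

Given the bytes of EnFraçais' (with ç encoded as 0xC3 0xA7), the search skips over both bytes of ç without any special handling and stops at offset 10, the closing quote.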

When doing a \w+ match looking for the end of the word, advance
character-by-character instead of byte-by-byte, ensuring that the
copied length does not exceed the space available in PL_tokenbuf.
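
A minimal standalone sketch of the approach described above, for readers without the diff at hand. It is not the code from toke.c. The helpers utf8_seq_len, is_word_lead and copy_word are hypothetical stand-ins, and the word test is deliberately simplified: ASCII alphanumerics, underscore, and any multi-byte lead byte count as word characters, whereas a real \w match consults Unicode properties. Continuation bytes are not validated either.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c.
 * Hypothetical helper; returns 1 for unexpected bytes so that the
 * caller always makes progress. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Simplified stand-in for a \w test on the first byte of a character
 * (assumption: every multi-byte character is a word character). */
static int is_word_lead(unsigned char c)
{
    return c == '_'
        || (c >= '0' && c <= '9')
        || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z')
        || (c >= 0xC2 && c <= 0xF4);
}

/* Copy the leading word of src[0..src_len) into dst, one whole
 * character at a time.  Stops before any character that is truncated
 * in src or that would not fit in dst together with the trailing NUL.
 * Returns the number of bytes copied. */
static size_t copy_word(char *dst, size_t dst_size,
                        const char *src, size_t src_len)
{
    size_t n = 0;

    if (dst_size == 0)
        return 0;
    while (n < src_len && is_word_lead((unsigned char)src[n])) {
        size_t skip = utf8_seq_len((unsigned char)src[n]);
        if (skip > src_len - n)      /* character truncated in input */
            break;
        if (n + skip >= dst_size)    /* no room for it plus the NUL  */
            break;
        memcpy(dst + n, src + n, skip);
        n += skip;
    }
    dst[n] = '\0';
    return n;
}
```

With the terminator EnFraçais followed by a newline, copy_word copies all 10 bytes into a sufficiently large buffer. With a 7-byte buffer it stops after EnFra (5 bytes) rather than splitting the two-byte ç, which is the kind of split that produced the malformed-UTF-8 errors quoted earlier.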
t/lib/warnings/toke
toke.c