During parsing, toke.c checks whether the user is attempting to provide
multiple comma-separated indices inside an array subscript:
$a[ $foo, $bar ];
However, although the word-character check for variable names is aware
of multi-byte characters when "use utf8" is enabled, the loop only
advances one byte at a time, not one character at a time. As a result,
multi-byte variable names in array subscripts incorrectly yield warnings:
Passing malformed UTF-8 to "XPosixWord" is deprecated
Malformed UTF-8 character (unexpected continuation byte 0x9d, with
no preceding start byte)
Switch the loop to advance character-by-character if UTF-8 semantics are
in use.
-a;
;-a;
EXPECT
+########
+# toke.c
+# [perl #124113] Compile-time warning with UTF8 variable in array index
+use warnings;
+use utf8;
+my $𝛃 = 0;
+my @array = (0);
+my $v = $array[ 0 + $𝛃 ];
+ $v = $array[ $𝛃 + 0 ];
+EXPECT
char *t = s+1;
while (isSPACE(*t) || isWORDCHAR_lazy_if(t,UTF) || *t == '$')
- t++;
+ t += UTF ? UTF8SKIP(t) : 1;
if (*t++ == ',') {
PL_bufptr = skipspace(PL_bufptr); /* XXX can realloc */
while (t < PL_bufend && *t != ']')