This was brought up in ticket #114690.
pos checks the length of the string and then its UTF8-ness, but length
magic does not update the UTF8 flag. So pos can get confused if merely
stringifying a match variable flips that flag:
$ perl -le '"\x{100}a" =~ /(..)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
2
1
$ perl -le '"\x{100}a" =~ /(.)/; pos($1) = 2; print pos($1); "$1"; print pos($1)'
1
Malformed UTF-8 character (unexpected end of string) in match position at -e line 1.
0
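Where the numbers come from: U+0100 is encoded in UTF-8 as the two bytes C4 80. That makes "\x{100}a" 2 characters but 3 bytes. Suppose pos stored 2 and 1 as byte offsets while the flag was still off, which is what the output suggests. The standalone C sketch below converts such a byte offset into a character offset by walking whole characters. It is not perl's code, and `utf8_seq_len` and `byte_to_char_offset` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in the UTF-8 sequence starting with lead byte b; 0 if b cannot
   start a character (for example a continuation byte 0x80..0xBF). */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Convert a byte offset into a character offset by walking whole
   characters from the start of s. Returns -1 when the bytes before
   byte_off are not complete UTF-8 characters. */
static long byte_to_char_offset(const unsigned char *s, size_t len,
                                size_t byte_off) {
    assert(byte_off <= len); /* caller passes an offset within the buffer */
    size_t i = 0;
    long chars = 0;
    while (i < byte_off) {
        size_t n = utf8_seq_len(s[i]);
        if (n == 0 || i + n > byte_off)
            return -1; /* invalid lead byte, or byte_off splits a character */
        i += n;
        chars++;
    }
    return chars;
}
```

For the bytes C4 80 61, byte offset 2 is a character boundary and maps to character 1, the `1` printed after stringification in the first example. For the bytes C4 80, byte offset 1 splits U+0100, so there is no valid character offset. That is the condition the "Malformed UTF-8 character" warning in the second example reports.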
As pointed out in that ticket, length magic on scalars cannot work
properly with UTF8, so stop using it.
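The sketch below is a simplified, hypothetical model of the order described above: take the length, then check UTF8-ness, clamp the requested position, and store a byte offset. The clamping and the character-to-byte conversion are assumptions about how pos behaves, not a copy of perl's code. `struct pv_view` and `store_pos` are invented names. The model shows why the length and the UTF8 flag need to come from the same source. In the patch, the length now comes from SvPV_const, which, as we understand it, runs get-magic and therefore brings the UTF8 flag up to date in the same step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a scalar's string buffer as the pos code sees it.
   This is an illustration only, not a perl data structure. */
struct pv_view {
    const unsigned char *pv; /* string bytes */
    size_t len;              /* length reported to the pos code */
    int utf8;                /* UTF8 flag as seen by the pos code */
};

/* Bytes in the UTF-8 sequence starting with lead byte b; 0 if invalid. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Number of characters in len bytes of well-formed UTF-8. */
static size_t count_chars(const unsigned char *s, size_t len) {
    size_t i = 0, chars = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s[i]);
        assert(n != 0 && i + n <= len); /* the model assumes valid input */
        i += n;
        chars++;
    }
    return chars;
}

/* Byte offset reached after skipping `chars` whole characters. */
static size_t char_to_byte_offset(const unsigned char *s, size_t chars) {
    size_t i = 0;
    while (chars-- > 0) {
        size_t n = utf8_seq_len(s[i]);
        assert(n != 0);
        i += n;
    }
    return i;
}

/* Hypothetical model of setting pos: take the length, then check the
   UTF8 flag, clamp the requested position to the length, and return
   the byte offset that would be stored. */
static size_t store_pos(const struct pv_view *v, size_t requested) {
    size_t len = v->len;
    if (v->utf8)
        len = count_chars(v->pv, v->len); /* measure in characters */
    size_t pos = requested < len ? requested : len;
    return v->utf8 ? char_to_byte_offset(v->pv, pos) : pos;
}
```

Consider the bytes C4 80 together with a length of 1, a character count as the second example suggests length magic reported, and a stale, cleared UTF8 flag. In that case `store_pos` turns a requested position of 2 into byte offset 1, which falls inside U+0100. Given the byte length 2 with the flag set, as a single stringification would provide, it stores byte offset 2, a character boundary.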
if (!sv)
return 0;
- if (SvGMAGICAL(sv))
- len = mg_length(sv);
- else
- (void)SvPV_const(sv, len);
+ (void)SvPV_const(sv, len);
return len;
}
require './test.pl';
}
-plan tests => 11;
+plan tests => 12;
$x='banana';
$x=~/.a/g;
'pos refuses %hashes';
eval 'pos *a = 1';
is eval 'pos *a', 1, 'pos *glob works';
+
+# Test that UTF8-ness of $1 changing does not confuse pos
+"f" =~ /(f)/; "$1"; # first make sure UTF8-ness is off
+"\x{100}a" =~ /(..)/; # give PL_curpm a UTF8 string; $1 does not know yet
+pos($1) = 2; # set pos; was ignoring UTF8-ness
+"$1"; # turn on UTF8 flag
+is pos($1), 2, 'pos is not confused about changing UTF8-ness';