RT #122148: In 5.20, commit
25fdce4a165 changed the way pos() was stored
in magic attached to SVs from being a byte offset to a char offset,
*except* that, for efficiency, strings being used for pattern matching
were kept as byte offsets (with a flag indicating thus), *except* where
the SV already had magic attached (such as taint, as in the bug report and
in this commit's test), in which case it kept it as chars.
The code that updated pos() after an iteration of s///e was faulty: the
string buffer it used for converting byte legnths to char lengths (via
utf8_length()) was the wrong buffer: rather than using the src string
being matched against, it was using the destination string being built up
via iterations of s///. Once double-byte utf8 chars were involved, all the
pos() calculations went wrong, and utf8 warnings started mysteriously
appearing.
if (!(mg = mg_find_mglob(sv))) {
mg = sv_magicext_mglob(sv);
}
- assert(SvPOK(dstr));
- MgBYTEPOS_set(mg, sv, SvPVX(dstr), m - orig);
+ assert(SvPOK(sv));
+ MgBYTEPOS_set(mg, sv, SvPVX(sv), m - orig);
}
if (old != rx)
(void)ReREFCNT_inc(rx);
}
require './test.pl';
-plan(tests => 3*10 + 3*8 + 2*16 + 2);
+plan(tests => 3*10 + 3*8 + 2*16 + 3);
my $arg = $ENV{PATH}; # a tainted value
use constant UTF8 => "\x{1234}";
'ok', {switches => ["-T", "-l"]},
"therefore swash_init should be taint agnostic");
}
+
+{
+ # RT #122148: s///e on tainted utf8 strings got pos() messed up in 5.20
+
+ my @p;
+ my $s = "\x{100}\x{100}\x{100}\x{100}". $^X;
+ $s =~ s/\x{100}/push @p, pos($s); "xxxx";/eg;
+ is("@p", "0 1 2 3", "RT #122148");
+}