This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Andy Lester [Sat, 18 Feb 2017 01:46:15 +0000 (19:46 -0600)]
Moving variables to their innermost scope.
Some vars have been tagged as const because they do not change in their
new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char,
so is changed to char.
David Mitchell [Sat, 18 Feb 2017 10:46:53 +0000 (10:46 +0000)]
pp_multideref: tweak an assertion
My recent commit v5.25.9-89-g43dbb3c added an assertion to the effect that
in
@{ local $a[0]{b}[1] } = 1;
the 'local' could only appear at the end of a block and so asserted that the
next op should be OP_LEAVE. However, this:
@{1, local $a[0]{b}[1] } = 1;
inserts an OP_LIST before the OP_LEAVE.
Improve the assert to cope with any number of OP_NULL or OP_LISTs before
the OP_LEAVE.
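The relaxed check can be pictured with a toy model. This is only a sketch, not the code in pp_multideref: the toy_op struct, the TOY_* values and next_is_leave() are hypothetical stand-ins that show the idea of skipping any run of NULL or LIST ops before testing for a LEAVE.

```c
#include <stddef.h>

/* Hypothetical stand-ins for op types; not perl's real OP_* values. */
enum toy_optype { TOY_NULL, TOY_LIST, TOY_LEAVE, TOY_OTHER };

struct toy_op {
    enum toy_optype type;
    const struct toy_op *next;
};

/* Return 1 if, after skipping any number of NULL or LIST ops,
 * the op reached is a LEAVE; return 0 otherwise (including when
 * the chain runs out). */
static int next_is_leave(const struct toy_op *o)
{
    while (o != NULL && (o->type == TOY_NULL || o->type == TOY_LIST))
        o = o->next;
    return o != NULL && o->type == TOY_LEAVE;
}
```

An assertion built on such a helper accepts LEAVE, LIST->LEAVE and NULL->LIST->LEAVE alike, but still rejects any other op appearing before the LEAVE.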
David Mitchell [Sat, 18 Feb 2017 10:20:00 +0000 (10:20 +0000)]
pp_formline(): avoid buffer overrun
RT #130703
My recent commit v5.25.9-77-g90c3aa0 attempted to simplify buffer growth
in pp_formline(), but missed the operators which append data to
PL_formtarget *without* doing 'goto append'. These ops either append a
fieldsize's worth of bytes, or a \n (FF_NEWLINE). So grow by fieldsize
whenever we fetch something new, and for each FF_NEWLINE.
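As a rough illustration of "grow before every append", the sketch below reserves room for a field's worth of bytes, or for a single newline, before writing. It is an assumption-laden toy: struct toy_buf and the toy_* functions are hypothetical and do not correspond to PL_formtarget or to perl's SvGROW machinery.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not perl's SV. */
struct toy_buf {
    char  *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated; always >= len */
};

/* Make sure at least `extra` more bytes can be appended.
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int toy_reserve(struct toy_buf *b, size_t extra)
{
    size_t newcap;
    char *p;

    if (b->cap - b->len >= extra)
        return 0;
    if (extra > (size_t)-1 - b->len)
        return -1;
    newcap = b->len + extra;
    p = realloc(b->data, newcap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = newcap;
    return 0;
}

/* Append one field: reserve fieldsize bytes first, then copy. */
static int toy_append_field(struct toy_buf *b, const char *src,
                            size_t fieldsize)
{
    if (fieldsize == 0)
        return 0;
    if (toy_reserve(b, fieldsize) != 0)
        return -1;
    memcpy(b->data + b->len, src, fieldsize);
    b->len += fieldsize;
    return 0;
}

/* Append a newline: this path reserves its own byte as well. */
static int toy_append_newline(struct toy_buf *b)
{
    if (toy_reserve(b, 1) != 0)
        return -1;
    b->data[b->len++] = '\n';
    return 0;
}
```

The point of the sketch is that no append path writes without its own reservation, so a path that bypasses a shared "append" label cannot overrun the buffer.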
Andreas Koenig [Thu, 16 Feb 2017 10:27:57 +0000 (11:27 +0100)]
Updates CPAN.pm to ANDK/CPAN-2.17-TRIAL2.tar.gz
jdhedden [Wed, 15 Feb 2017 04:56:20 +0000 (23:56 -0500)]
Upgrade to Thread::Queue 3.12
Karl Williamson [Wed, 15 Feb 2017 17:17:06 +0000 (10:17 -0700)]
regexec.c: Fix comment typos
David Mitchell [Wed, 15 Feb 2017 15:58:24 +0000 (15:58 +0000)]
avoid a leak in list assign from/to magic values
RT #130766
A leak in list assignment was introduced by v5.23.6-89-gbeb08a1 and
extended with v5.23.6-90-g5c1db56.
Basically the code in S_aassign_copy_common() which does a mark-and-sweep
looking for common vars by temporarily setting SVf_BREAK on LHS SVs then
seeing if that flag was present on RHS vars, very temporarily removed that
flag from the RHS SV while mortal copying it, then set it again. After
those two commits, the "resetting" code could set SVf_BREAK on the RHS SV
even when it hadn't been present earlier.
This meant that on exit from S_aassign_copy_common(), some SVs could be
left with SVf_BREAK on. When that SV was freed, the SVf_BREAK flag meant
that the SV head wasn't planted back in the arena (but PL_sv_count was
still decremented). This could lead to slow growth of the SV HEAD arenas.
The two circumstances that could trigger the leak were:
1) An SMG var on the LHS and a temporary on the RHS, e.g.
use Tie::Scalar;
my ($s, $t);
tie $s, 'Tie::StdScalar'; # $s has set magic
while (1) {
($s, $t) = ($t, map 1, 1, 2); # the map returns temporaries
}
2) A temporary on the RHS which has GMG, e.g.
my $s = "abc";
pos($s) = 1;
local our ($x, $y);
while (1) {
my $pr = \pos($s); # creates a ref to a TEMP with get magic
($x, $y) = (1, $$pr);
}
Strictly speaking a TEMP isn't required for either case; just a situation
where there's always a fresh SV on the RHS for each iteration that will
soon get freed and thus leaked.
This commit doesn't include any tests since I can't think of a way of
testing it. svleak.t relies on PL_sv_count, which in this case doesn't
show the leak.
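The underlying pattern, temporarily clearing a flag and then restoring only what was there before, can be sketched in isolation. The names below (TOY_BREAK, struct toy_sv, toy_copy_without_break) are hypothetical; this is not the code in S_aassign_copy_common(), only an illustration of the save-then-restore idea under that assumption.

```c
/* Hypothetical flag bit and struct; perl's real SVf_BREAK is
 * defined in sv.h and is not modelled here. */
#define TOY_BREAK 0x04u

struct toy_sv {
    unsigned flags;
    int      value;
};

/* Copy src->value while TOY_BREAK is temporarily cleared, then put
 * the flag back only if it had been set on entry. */
static int toy_copy_without_break(struct toy_sv *src)
{
    const unsigned had_break = src->flags & TOY_BREAK;
    int copy;

    src->flags &= ~TOY_BREAK;
    copy = src->value;          /* stands in for the mortal copy */
    src->flags |= had_break;    /* restores only what was there */
    return copy;
}
```

Setting the flag unconditionally on the way out, rather than OR-ing back the saved bit, is the kind of slip that leaves the flag on a value that never had it.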
Karl Williamson [Tue, 14 Feb 2017 20:22:58 +0000 (13:22 -0700)]
Improve handling pattern compilation errors
Perl tries to continue parsing in the face of errors for the convenience
of the person running the script, so as to batch up as many errors as
possible, and cut down the number of runs. Some errors will, however,
have a cascading effect, resulting in the parser getting confused as to
the intent. Perl currently aborts parsing if 10 errors accumulate.
However, some things are reparsed as compilation continues, in
particular tr///, s///, and qr//. The code that reparses has an
expectation of basic sanity in what it is looking at, and so reparsing
with known errors can lead to segfaults. Recent commits have tightened
this up to avoid reparsing, or substitute valid stuff before reparsing.
This all works, as the code won't execute until all the errors get
fixed.
Commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b changed things so that
if there is an error in parsing a pattern, the whole compilation is
immediately aborted. Since then, I realized it would be relatively
simple instead to skip compilation of that particular pattern, but
continue on with the parsing of the program as a whole, up to the
maximum number of allowed errors. And again the program will refuse to
execute after compilation if there were any errors.
This commit implements that, the benefit being that we don't try to
reparse a pattern that failed the original parse, but can go on to find
errors elsewhere in the program.
James E Keenan [Tue, 14 Feb 2017 19:04:54 +0000 (14:04 -0500)]
Revert "Upgrade to Thread::Queue 3.12"
This reverts commit 57c819f845c985ed9979bfa76b1b8ca1708370f0.
Reverting to give us time to explore possible race condition. See:
https://rt.perl.org/Ticket/Display.html?id=130777
David Mitchell [Tue, 14 Feb 2017 17:50:11 +0000 (17:50 +0000)]
[MERGE] regex (?{...}) and WHILEM scope fixups
David Mitchell [Tue, 14 Feb 2017 17:10:34 +0000 (17:10 +0000)]
S_regmatch: eliminate WHILEM_A_min paren saving
In something like
"a1b2c3d4..." =~ /(?:(\w)(\d))*..../
A WHILEM state is pushed for each iteration of the '*'. Part of this
state saving includes the previous indices for each of the captures within
the body of the thing being iterated over. So we save the following sets of
values for $1,$2:
()()
(a)(1)
(b)(2)
(c)(3)
(d)(4)
Then if at any point we backtrack, we can undo one or more iterations and
restore the older values of $1,$2.
However, when the match is non-greedy, as in A*?B, then on failure of B
and backtracking we attempt *more* A's rather than removing some already
matched A's. So there's never any need to save all the current paren state
for each iteration.
This eliminates a lot of per-iteration overhead for minimal WHILEMs and
makes the following run about 25% faster:
$s = ("a" x 1000);
$s =~ /^(?:(.)(.))*?[XY]/ for 1..10_000;
David Mitchell [Tue, 14 Feb 2017 16:28:31 +0000 (16:28 +0000)]
S_regmatch: eliminate WHILEM_B paren saving
In something like
"a1b2c3d4..." =~ /(?:(\w)(\d))*..../
A WHILEM state is pushed for each iteration of the '*'. Part of this
state saving includes the previous indices for each of the captures within
the body of the thing being iterated over. So we save the following sets of
values for $1,$2:
()()
(a)(1)
(b)(2)
(c)(3)
(d)(4)
Then if at any point we backtrack, we can undo one or more iterations and
restore the older values of $1,$2.
For /A*B/ where A is a complex sub-pattern like (\w)(\d), we currently save
the paren state each time we're about to attempt to iterate another A.
But it turns out that for non-greedy matching, i.e. A*?B, we also
save the paren state before executing B. This is unnecessary, as
B can't alter the capture state of the parens within A. So eliminate it.
If in the future some sneaky regex is found which this commit breaks,
then as well as restoring the old behaviour, you should look carefully
to see whether similar paren-saving behaviour for B should be added to
greedy matches too, i.e. A*B. It was partly the discrepancy between
saving for A*?B but not for A*B which made me suspect it was redundant.
David Mitchell [Tue, 14 Feb 2017 16:21:40 +0000 (16:21 +0000)]
Add a comment on why TRIE.jump does a UNWIND_PAREN
(it wasn't obvious to me)
David Mitchell [Tue, 14 Feb 2017 15:59:57 +0000 (15:59 +0000)]
clear savestack on (?{...}) failure and backtrack
RT #126697
In a regex, after executing a (?{...}) code block, if we fail and
backtrack over the codeblock, we're supposed to unwind the savestack, so
that, for example, any local()s within the code block are undone.
It turns out that a backtracking state isn't pushed for (?{...}), only
for postponed evals (i.e. (??{...})). This means that it relies on one
of the earlier backtracking states to clear the savestack on its behalf.
This can't always be relied upon, and the ticket above contains code where
this falls down; in particular:
'ABC' =~ m{
\A
(?:
(?: AB | A | BC )
(?{
local $count = $count + 1;
print "! count=$count; ; pos=${\pos}\n";
})
)*
\z
}x
Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next
doesn't, since there's nothing it would be responsible for that needs
cleaning up.
The solution to this is to push a backtrack state for every (?{...}) as
well as every (??{...}). The sole job of that state is to do a
LEAVE_SCOPE(ST.lastcp).
The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB
to make it clear it's only used on postponed /(??{A})B/ regexes, and a new
state has been added, EVAL_B, which is only called when backtracking after
failing something in the B in /(?{...})B/.
David Mitchell [Tue, 14 Feb 2017 13:32:16 +0000 (13:32 +0000)]
-Mre=Debug,ALL: indicate regex state stack pushes
At this maximal level of debugging output, it displays the top 3 state
stack entries each time it pushes, but with no obvious indication that
a push is occurring. This commit changes this output:
| 1| Setting an EVAL scope, savestack=9,
| 2| #4 WHILEM_A_max
| 2| #3 WHILEM_A_max
| 2| #2 CURLYX_end yes
0 <abcdef> <g> | 2| 4:POSIXD[\w](5)
to be this (which includes the word "push" and extra indentation for the
stack dump):
| 1| Setting an EVAL scope, savestack=9,
| 2| push #4 WHILEM_A_max
| 2| #3 WHILEM_A_max
| 2| #2 CURLYX_end yes
0 <abcdef> <g> | 2| 4:POSIXD[\w](5)
Also, replace curd (current depth) var with a positive integer offset
(i) var, to avoid signed/unsigned mixing problems.
David Mitchell [Sat, 11 Feb 2017 11:53:41 +0000 (11:53 +0000)]
fix pad/scope issue in re_evals
RT #129881 heap-buffer-overflow Perl_pad_sv
In some circumstances involving a pattern which has embedded code blocks
from more than one source, e.g.
my $r = qr{(?{1;}){2}X};
"" =~ /$r|(?{1;})/;
the wrong PL_comppad could be active while doing a LEAVE_SCOPE() or on
exit from the pattern.
This was mainly due to the big context stack changes in 5.24.0 - in
particular, since POP_MULTICALL() now does CX_LEAVE_SCOPE(cx) *before*
restoring PL_comppad, the (correct) unwinding of any SAVECOMPPAD's was
being followed by C<PL_comppad = cx->blk_sub.prevcomppad>, which wasn't
necessarily a sensible value.
To fix this, record the value of PL_savestack_ix at entry to S_regmatch(),
and set the cx->blk_oldsaveix of the MULTICALL to this value when pushed.
On exit from S_regmatch, we either POP_MULTICALL which will do a
LEAVE_SCOPE(cx->blk_oldsaveix), or in the absence of any EVAL, do the
explicit but equivalent LEAVE_SCOPE(orig_savestack_ix).
Note that this is a change in behaviour to S_regmatch() - formerly it
wouldn't necessarily clear the savestack completely back to the point of
entry - that would be left to its caller, S_regtry(), or indirectly
by Perl_regexec_flags(). This shouldn't make any practical difference, but
is tidier and less likely to introduce bugs later.
jdhedden [Tue, 14 Feb 2017 00:46:37 +0000 (19:46 -0500)]
Upgrade to Thread::Queue 3.12
Karl Williamson [Tue, 14 Feb 2017 02:57:53 +0000 (19:57 -0700)]
Make _byte_dump_string() usable in all of core
I found myself needing this function for development debugging, which
formerly was only usable from utf8.c. This enhances it to allow a
second format type, and makes it core-accessible.
Karl Williamson [Tue, 14 Feb 2017 02:22:59 +0000 (19:22 -0700)]
toke.c: Make sure things are initialized
Commit 3dd4eaeb8ac39e08179145b86aedda36584a3509 fixed a bug wherein the
tr/// operator parsing code could be looking at uninitialized data.
This happens only because we try to carry on when we find errors, so as
to find as many errors as possible in a single run, as a convenience to
the person debugging the script being compiled. And we failed to
initialize stuff upon getting an error; stuff that was later looked at
by tr///.
That commit fixed the ticket by making sure the things mentioned there
got initialized upon error, but didn't handle the various other places
in the loop where the same thing could happen.
At the time, I thought it would be easier to instead change the tr///
handling code to know that its inputs were problematic, and to avoid
looking at them in that case. This is easily done, and would
automatically catch all the cases in the loop, now and any added in the
future.
But then I thought, maybe tr/// isn't the only operator that could be
thrown off by this. It is the most obvious one, to someone who knows
how it goes about getting compiled; but there may be other operators
whose compilation I am not familiar with and that have the same or a similar
problem. The better solution then would be to extend
3dd4eaeb8ac39e08179145b86aedda36584a3509 to make sure everything gets
initialized when there is an error. That is what this current commit
does.
The previous few commits have refactored things so as to minimize the
number of places that need to be handled here, down to three. I kinda
doubt that new constructs will be added, at this stage in the language
development, that would require the same initialization handling. But,
if they were, hopefully those doing it would follow the existing
paradigm that this commit and 3dd4eaeb8ac39e08179145b86aedda36584a3509
establish.
Another way to handle this would have been, instead of doing an
initialize-and-'continue', to jump to a common label at the
bottom of the loop which does the initialization. I think it doesn't
matter much which, so left it as this.
Karl Williamson [Tue, 14 Feb 2017 02:18:38 +0000 (19:18 -0700)]
toke.c: Quit now if error at end of input
In these two cases, we know we are at the end of the input, and that we
have an error. There is no need to try to patch things up so we can
continue to parse looking for other errors; there's nothing left to
parse. So skip having to deal with patching up.
Karl Williamson [Tue, 14 Feb 2017 02:12:31 +0000 (19:12 -0700)]
toke.c: Un-special case something
By refactoring slightly, we make this code in a switch statement
have the same entrance and exit invariants as the other cases, so they
all can be handled uniformly at the end of the switch.
Karl Williamson [Tue, 14 Feb 2017 02:01:46 +0000 (19:01 -0700)]
Don't try to compile a pattern known to be in error
Regular expression patterns are parsed by the lexer/toker, and then
compiled by the regex compiler. It is foolish to try to compile one
that the parser has rejected as syntactically bad; assumptions may be
violated and segfaults ensue. This commit abandons all parsing
immediately if a pattern had errors in it. A better solution would be
to flag this pattern as not to be compiled, and continue parsing other
things so as to find the most errors in a single attempt, but I don't
think it's worth the extra effort.
Making this change caused some misleading error messages in the test
suite to be replaced by better ones.
Karl Williamson [Tue, 14 Feb 2017 01:49:52 +0000 (18:49 -0700)]
toke.c: Add internal function to abort parsing
This is to be called to abort the parsing early, before the required
number of errors have been found. It is used when continuing the parse
would be fruitless, or when we could be looking at garbage.
Karl Williamson [Tue, 14 Feb 2017 01:46:30 +0000 (18:46 -0700)]
toke.c: White-space only
Indent after the previous commit enclosed this code in a new block.
Karl Williamson [Tue, 14 Feb 2017 01:27:02 +0000 (18:27 -0700)]
Relax internal function API
This changes yyerror_pvn so that its first parameter can be NULL. This
indicates no message is to be output, but that parsing is to be
abandoned immediately, without waiting for more errors to build up.
Karl Williamson [Mon, 13 Feb 2017 20:18:38 +0000 (13:18 -0700)]
Extract code into a function
This creates a function in toke.c to output the compilation aborted
message, changing perl.c to call that function. This is in preparation
for this to be called from a 2nd place.
Karl Williamson [Mon, 13 Feb 2017 20:03:32 +0000 (13:03 -0700)]
toke.c: Rmv no longer necessary UTF-8 checks
The previous commit tightened up the checking for well-formed UTF8ness,
so that the ones removed here were redundant.
The test during a string eval may also no longer be necessary, but since
there are many ways to create that string, I'm not confident enough to
remove it.
Karl Williamson [Tue, 14 Feb 2017 03:40:57 +0000 (20:40 -0700)]
Add test for [perl #130675]
Karl Williamson [Tue, 14 Feb 2017 03:37:47 +0000 (20:37 -0700)]
t/lib/croak/toke_l1: Cut down test
The previous commits hardening toke.c against malformed UTF-8 input have
allowed this test case to be cut down substantially.
Karl Williamson [Mon, 13 Feb 2017 19:35:18 +0000 (12:35 -0700)]
toke.c: Fix bugs where UTF-8 is turned on in mid chunk
Previous commits have tightened up the checking of UTF-8 for
well-formedness in the input program or string eval. This is done in
lex_next_chunk and lex_start. But it doesn't handle the case of
use utf8; foo
because 'foo' is checked while UTF-8 is still off. This solves that
problem by noticing when utf8 is turned on, and then rechecking at the
next opportunity.
See thread beginning at
http://nntp.perl.org/group/perl.perl5.porters/242916
This fixes [perl #130675]. A test will be added in a future commit.
This catches some errors earlier than they used to be and aborts, so
some tests in the suite had to be split into multiple parts.
Karl Williamson [Mon, 13 Feb 2017 19:16:45 +0000 (12:16 -0700)]
mg.c: PL_hints is unsigned
Therefore it's dangerous to presume things fit into an IV.
Karl Williamson [Mon, 13 Feb 2017 19:02:54 +0000 (12:02 -0700)]
toke.c: Add branch prediction
The input is far more likely to be well-formed than not.
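For readers unfamiliar with branch-prediction hints, here is a minimal sketch of the general technique. It assumes a GCC- or Clang-compatible compiler for __builtin_expect and falls back to a plain test elsewhere; the TOY_LIKELY/TOY_UNLIKELY macros and toy_first_non_ascii() are hypothetical names, not the ones used in toke.c.

```c
#include <stddef.h>

/* Hypothetical hint macros. With GCC/Clang they tell the compiler
 * which way the branch usually goes; elsewhere they are no-ops. */
#if defined(__GNUC__)
#  define TOY_LIKELY(c)   __builtin_expect(!!(c), 1)
#  define TOY_UNLIKELY(c) __builtin_expect(!!(c), 0)
#else
#  define TOY_LIKELY(c)   (c)
#  define TOY_UNLIKELY(c) (c)
#endif

/* Return the index of the first byte >= 0x80, or len if every
 * byte is ASCII. The non-ASCII case is hinted as the rare one. */
static size_t toy_first_non_ascii(const unsigned char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (TOY_UNLIKELY(s[i] >= 0x80))
            return i;
    }
    return len;
}
```

The hint changes only how the compiler lays out the code, never the result, so a wrong guess costs speed but not correctness.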
Karl Williamson [Mon, 13 Feb 2017 18:54:06 +0000 (11:54 -0700)]
toke.c: Fix comments describing S_tokeq
The comments about what this function does were incorrect.
Karl Williamson [Mon, 13 Feb 2017 18:51:59 +0000 (11:51 -0700)]
toke.c: Slight refactor.
This moves an automatic variable to closer to the only place it is used;
it also adds branch prediction. It is likely that the input will be
well-formed.
Karl Williamson [Mon, 13 Feb 2017 18:29:35 +0000 (11:29 -0700)]
toke.c: White space, comments, braces
I am adding the braces because in one of the areas, the lack of braces
had led to a blead failure.
Karl Williamson [Mon, 13 Feb 2017 18:31:53 +0000 (11:31 -0700)]
toke.c: Don't compare same bytes twice
Before starting this memEQ, we know that the first bytes are the same,
so might as well start the compare with the 2nd bytes.
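The idea can be sketched with plain memcmp(). The helper name toy_rest_equal() is hypothetical, and the sketch assumes the caller has already established that the first bytes match and that len is at least 1.

```c
#include <string.h>

/* Return 1 if the len bytes at a and b are equal, given that the
 * caller already knows a[0] == b[0] and len >= 1. Only the
 * remaining len - 1 bytes are compared. */
static int toy_rest_equal(const char *a, const char *b, size_t len)
{
    return memcmp(a + 1, b + 1, len - 1) == 0;
}
```

The saving is a single byte per comparison, so this is a micro-optimisation that only pays off on a hot path.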
Karl Williamson [Mon, 13 Feb 2017 18:15:07 +0000 (11:15 -0700)]
toke.c: Move declaration
This automatic variable doesn't need such a large scope.
Karl Williamson [Thu, 9 Feb 2017 05:18:27 +0000 (22:18 -0700)]
perly.c: Clarify comment
Karl Williamson [Sat, 11 Feb 2017 04:26:55 +0000 (21:26 -0700)]
sv.h: Add comment
Sawyer X [Sun, 12 Feb 2017 10:52:12 +0000 (12:52 +0200)]
Merge branch 'sawyerx/undeprecate-backslash-c' into blead
Sawyer X [Wed, 8 Feb 2017 13:00:37 +0000 (14:00 +0100)]
Revert "Deprecating the use of C<< \cI<X> >> to specify a printable character."
This reverts commit bfdc8cd3d5a81ab176f7d530d2e692897463c97d.
Sawyer X [Wed, 8 Feb 2017 12:34:31 +0000 (13:34 +0100)]
Revert "Avoid triggering a deprecation warnings."
This reverts commit 5ad2a0b67cdf1d90b67b991ae8708d3b9d57bad9.
We have decided to undeprecate \c. Please see:
http://www.nntp.perl.org/group/perl.perl5.porters/2017/01/msg242693.html
Karl Williamson [Sat, 11 Feb 2017 20:07:33 +0000 (13:07 -0700)]
pp_pack.c: Remove no longer relevant comment
Karl Williamson [Sat, 11 Feb 2017 20:02:46 +0000 (13:02 -0700)]
pp_pack.c: Remove needless branch
This function only sets *retlen to 0 if the input length is 0. In all
but one case, the function was not called with that input. In that
one case, I changed to avoid calling the function with that input.
Hence we can remove checking *retlen for 0.
Karl Williamson [Sat, 11 Feb 2017 20:00:27 +0000 (13:00 -0700)]
pp_pack.c: Remove obsolete code
This code effectively reduced to
if (foo) 0 else 0
because a #define was changed to 0 some releases ago. Just replace by
0
Karl Williamson [Sat, 11 Feb 2017 19:58:16 +0000 (12:58 -0700)]
utf8.c: Move comment a few lines up in the file
Move it to where it makes more sense.
Karl Williamson [Sat, 11 Feb 2017 19:57:33 +0000 (12:57 -0700)]
utf8.h: Clarify comment
Karl Williamson [Sun, 12 Feb 2017 03:59:49 +0000 (20:59 -0700)]
utf8.h: White-space, parens only
Add parens to clarify grouping, white-space for legibility
Karl Williamson [Sun, 12 Feb 2017 03:57:36 +0000 (20:57 -0700)]
utf8.h: Add branch prediction
use bytes;
is unlikely to be the case.
Tony Cook [Sat, 11 Feb 2017 21:53:58 +0000 (08:53 +1100)]
(perl #126203) build issues
helps to test builds with the right options...
Karl Williamson [Fri, 13 Jan 2017 19:35:26 +0000 (12:35 -0700)]
Change av_foo_nomg() name
These names sparked some controversy when created:
http://www.nntp.perl.org/group/perl.perl5.porters/2016/03/msg235216.html
I looked through existing code for paradigms to follow, and found some
occurrences of 'skip_foo_mg'. So this commit changes the names to be
av_top_index_skip_len_mg()
av_tindex_skip_len_mg()
This is explicit about the type of magic that is ignored, and will still
be valid if another type of magic ever gets added.
Jarkko Hietaniemi [Sat, 11 Feb 2017 00:03:29 +0000 (19:03 -0500)]
Coverity #28930: unchecked return value
Strangely, this was apparently found already in 2014, but it now
(rightfully) showed up. Coverity database tweak?
Karl Williamson [Fri, 10 Feb 2017 17:04:16 +0000 (10:04 -0700)]
Net::Ping: Todo a test on EBCDIC
This is just to get it to pass. It appears to be a permissions problem
on our one smoker.
Jarkko Hietaniemi [Sun, 5 Feb 2017 23:50:11 +0000 (18:50 -0500)]
Coverity #155950: pRExC->code_blocks is blindly derefed
Even though code calling S_pat_upgrade_to_utf8 from the
Perl_re_op_compile is testing the code_blocks for NULLness.
Karl Williamson [Wed, 8 Feb 2017 22:48:56 +0000 (15:48 -0700)]
regcomp.c: Fix so will compile on C++11
See 147e38468b8279e26a0ca11e4efd8492016f2702 for complete explanation.
Tony Cook [Thu, 9 Feb 2017 00:19:45 +0000 (11:19 +1100)]
(perl #126203) avoid potential leaks on quadmath_snprintf() failure
In the unlikely case quadmath_snprintf() fails both sv_vcatpvfn_flags()
and my_snprintf() could leak the temp format string returned by
quadmath_format_single() if quadmath_format_single() had to rebuild
the format.
Getting quadmath_snprintf() to fail in this context seems impractical,
but future changes may make it happen, so clean up after ourselves.
Neil Bowers [Sun, 5 Feb 2017 21:18:37 +0000 (21:18 +0000)]
Getopt::Std: Changed pod NAME to follow convention
See thread at http://nntp.perl.org/group/perl.perl5.porters/242374
Karl Williamson [Tue, 7 Feb 2017 17:00:19 +0000 (10:00 -0700)]
locale.c: Use only C89 legal C
An array was being declared and initialized from a non-constant.
Spotted by James Keenan
Karl Williamson [Wed, 8 Feb 2017 04:42:40 +0000 (21:42 -0700)]
Add TODO test for [perl #125493]
Tony Cook [Wed, 8 Feb 2017 03:56:10 +0000 (14:56 +1100)]
perldelta for d6851fe9ee8e
Tony Cook [Wed, 8 Feb 2017 03:38:45 +0000 (14:38 +1100)]
(perl #130705) don't convert match with argument to qr
Code like:
0 =~ qr/1/ ~~ 0
would have the match operator replaced with qr, leaving an op tree
like:
e <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
d <@> print vK ->e
3 <0> pushmark s ->4
c <2> aelem sK/2 ->d
5 <1> rv2av[t1] sKR/1 ->6
4 <$> gv(*0) s ->5
b <2> smartmatch sK/2 ->c
9 </> qr() sKS ->a <=== umm
6 <$> const(IV 0) s ->7
8 <|> regcomp(other->9) sK ->9
7 </> qr(/"1"/) s ->8
a <$> const(IV 0) s ->b
when executed, this would leave an extra value on the stack:
$ ./perl -Dst -e 'print(0->[0 =~ qr/1/ ~~ 0])'
Smartmatch is experimental at -e line 1.
EXECUTING...
=>
(-e:0) enter
=>
(-e:0) nextstate
=>
(-e:1) pushmark
=> *
(-e:1) gv(main::0)
=> * GV()
(-e:1) rv2av
=> * AV()
(-e:1) const(IV(0))
=> * AV() IV(0)
(-e:1) qr
=> * AV() IV(0) \REGEXP()
(-e:1) regcomp
=> * AV() IV(0)
(-e:1) qr
=> * AV() IV(0) \REGEXP()
(-e:1) const(IV(0))
=> * AV() IV(0) \REGEXP() IV(0)
(-e:1) smartmatch
=> * AV() IV(0) SV_NO
(-e:1) aelem
=> * AV() SV_UNDEF <=== trying to print an AV
(-e:1) print
perl: sv.c:2941: Perl_sv_2pv_flags: Assertion `((svtype)((sv)->sv_flags & 0xff)) != SVt_PVAV && ((svtype)((sv)->sv_flags & 0xff)) != SVt_PVHV && ((svtype)((sv)->sv_flags & 0xff)) != SVt_PVFM' failed.
Aborted
Pali [Sun, 18 Sep 2016 15:16:33 +0000 (17:16 +0200)]
pod: Do not suggest to use insecure :utf8 PerlIO layer when reading files
Instead use strict :encoding(UTF-8) PerlIO layer for input files.
Karl Williamson [Tue, 7 Feb 2017 23:21:25 +0000 (16:21 -0700)]
Add test for [perl #129157]
Tony Cook [Tue, 7 Feb 2017 23:16:33 +0000 (10:16 +1100)]
perldelta for dd314e1ca8c3
Tony Cook [Tue, 7 Feb 2017 05:14:53 +0000 (16:14 +1100)]
(perl #130722) don't call SvPVX() on a glob
S_doparseform() called SvPVX() on the format argument, which
produced an assertion failure when the format was supplied as a
glob.
Since S_doparseform() calls SvPV() initially and stores the result,
just use that result.
Karl Williamson [Tue, 7 Feb 2017 20:12:36 +0000 (13:12 -0700)]
Add .t for malformed-UTF-8 toke.c testing
This adds a non-UTF-8 encoded file to make it easier to test malformed
UTF-8 strings. The file is automatically skipped on EBCDIC platforms.
The file is initialized with a test for [perl #129037], which had
already been unknowingly fixed by commit
75219bacf5aacd315b96083de24e82cd8238e99a.
David Mitchell [Tue, 7 Feb 2017 15:45:14 +0000 (15:45 +0000)]
multideref: handle both OPpLVAL_INTRO,OPpDEREF
RT #130727
In a nested dereference like $a[0]{b}[1], all but the last aelem/helem
will normally have a OPpDEREF_AV/HV flag, while the last won't have a deref
but may well have OPpLVAL_INTRO, e.g.
local $a[0]{b}[1] = 1;
The code in S_maybe_multideref() which converts a chain of aelem/helem's
into a single multideref op assumes this - in particular that an op can't
have both OPpLVAL_INTRO and OPpDEREF* at the same time. However, the
following code violates that assumption:
@{ local $a[0]{b}[1] } = 1;
In @{expr} = 1, the array is in lvalue context, which makes expr be done
in ref (autovivify) context. So the final aelem in the above expression
gets both OPpLVAL_INTRO and OPpDEREF_AV flags.
In the old days, pp_aelem (probably more by luck than design) would action
OPpLVAL_INTRO and ignore OPpDEREF_AV. This commit makes pp_multideref
behave in the same way. In particular, there's no point in autovivifying
$a[0]{b}[1] as an array ref since the local() will be undone before it
gets a chance to be used.
The easiest way to achieve this is to turn off the OPpDEREF flag on the
aelem/helem op if the OPpLVAL_INTRO flag is set.
Pali [Sun, 18 Sep 2016 15:52:36 +0000 (17:52 +0200)]
perluniintro: Use uppercase UTF-8 encoding name
Reason is consistency with other documentation files.
Pali [Sun, 18 Sep 2016 15:45:57 +0000 (17:45 +0200)]
perluniintro: Fix comment, Encode::decode does not have to return string with UTF8 flag set
Pali [Sun, 18 Sep 2016 15:44:22 +0000 (17:44 +0200)]
perluniintro: Suggest to use utf8::decode() instead of heavy Encode when sequence of bytes is valid UTF-8
Pali [Sun, 18 Sep 2016 15:19:59 +0000 (17:19 +0200)]
pod: Suggest to use strict :encoding(UTF-8) PerlIO layer over not strict :encoding(utf8)
For data exchange it is better to use strict UTF-8 encoding and not perl's utf8.
Hugo van der Sanden [Wed, 5 Oct 2016 01:20:26 +0000 (02:20 +0100)]
[perl #129061] CURLYX nodes can be studied more than once
study_chunk() for CURLYX is used to set flags on the linked WHILEM
node to say it is the whilem_c'th of whilem_seen. However it assumes
each CURLYX can be studied only once, which is not the case - there
are various cases such as GOSUB which call study_chunk() recursively
on already-visited parts of the program.
Storing the wrong index can cause the super-linear cache handling in
regmatch() to read/write the byte after the end of poscache.
Also reported in [perl #129281].
Tony Cook [Sun, 9 Oct 2016 23:46:46 +0000 (10:46 +1100)]
(perl #129281) test for buffer overflow issue
Hugo van der Sanden [Mon, 6 Feb 2017 11:11:11 +0000 (11:11 +0000)]
vi hints for pat.t
James E Keenan [Sun, 5 Feb 2017 23:04:36 +0000 (18:04 -0500)]
Remove extra terminating semicolon
Detected by compiling with clang under -Weverything, which includes -Wextra-semi.
Reviewed by jhedden++.
Increment $VERSION in threads.pm.
David Mitchell [Mon, 6 Feb 2017 09:20:04 +0000 (09:20 +0000)]
Perl_fbm_instr(): remove dead code.
For the case where littlestr hasn't been FBM compiled (!SvVALID()), it
can't be SvTAIL(), so there's no need for optional \n handling.
Spotted by Coverity.
Tony Cook [Mon, 6 Feb 2017 00:38:10 +0000 (11:38 +1100)]
prevent leak of class name from retrieve_hook() on an exception
If supplied with a large class name, retrieve_hook() allocates a
buffer for the class name and Safefree()s it on the exit path.
Unfortunately this memory leaks if load_module() (or a couple of other
code paths) throw an exception.
So use SAVEFREEPV() to release the memory instead.
==20183== 193 bytes in 1 blocks are definitely lost in loss record 4 of 6
==20183== at 0x4C28C20: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20183== by 0x55F85D: Perl_safesysmalloc (util.c:153)
==20183== by 0x6ACA046: retrieve_hook (Storable.xs:4265)
==20183== by 0x6AD6D19: retrieve (Storable.xs:6217)
==20183== by 0x6AD8144: do_retrieve (Storable.xs:6401)
==20183== by 0x6AD85B7: pretrieve (Storable.xs:6506)
==20183== by 0x6AD8E14: XS_Storable_pretrieve (Storable.xs:6718)
==20183== by 0x5C176D: Perl_pp_entersub (pp_hot.c:4227)
==20183== by 0x55E1C6: Perl_runops_debug (dump.c:2450)
==20183== by 0x461B79: S_run_body (perl.c:2528)
==20183== by 0x46115C: perl_run (perl.c:2451)
==20183== by 0x41F1CD: main (perlmain.c:123)
John Lightsey [Tue, 24 Jan 2017 16:30:18 +0000 (10:30 -0600)]
Fix stack buffer overflow in deserialization of hooks.
The use of signed lengths resulted in a stack overflow in retrieve_hook()
when a negative length was provided in the storable data.
The retrieve_blessed() codepath had a similar problem with the placement
of the trailing null byte when negative lengths were provided.
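The class of bug described here can be illustrated with a small sketch. It is not Storable's code: TOY_STACK_BUF and toy_read_name() are hypothetical, and unlike the real module this toy simply rejects over-long names rather than allocating a larger buffer. What it shows is the check that a signed length taken from untrusted data needs before it is used as a copy size or as the index of the trailing NUL.

```c
#include <string.h>

#define TOY_STACK_BUF 128   /* hypothetical size, not Storable's */

/* Copy a name of claimed length `len` into a fixed buffer and
 * NUL-terminate it. Returns 0 on success, -1 if len is negative
 * or would not leave room for the terminator. */
static int toy_read_name(const char *src, long len,
                         char out[TOY_STACK_BUF])
{
    if (len < 0 || (unsigned long)len >= TOY_STACK_BUF)
        return -1;
    memcpy(out, src, (size_t)len);
    out[len] = '\0';
    return 0;
}
```

Without the `len < 0` test, a negative length would pass a naive "len < size" comparison and then be used both as a huge unsigned copy size and as a negative index for the terminator.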
David Mitchell [Sat, 4 Feb 2017 15:54:09 +0000 (15:54 +0000)]
pp_formline: simplify growing of PL_formtarget
There's some reasonably complex logic to try and second guess how much
space to allocate or reallocate for the output buffer (some of which is
my doing from 2011, 26e935cfa6e7).
This commit removes most of this and now just does:
initially, grow the buffer by the size of the format. If any further
growing is needed later on (e.g. after a utf8 upgrade or due to @*) then
just grow as needed. This may give less optimal growing in edge cases
( i.e. repeated smaller grows rather than one big grow), but the old code
was often guessing wrong anyway.
This commit also makes it *always* check whether PL_formtarget needs growing
when about to append data to it, which is safer.
David Mitchell [Sat, 4 Feb 2017 15:10:49 +0000 (15:10 +0000)]
buffer overrun with format and 'use bytes'
RT #130703
In the scope of 'use bytes', appending a string to a format where the
format is utf8 and the string is non-utf8 but contains lots of chars
with ords >= 128, the buffer could be overrun. This is due to all the
\x80-type chars going from being stored as 1 byte to 2 bytes, without
growing PL_formtarget accordingly.
This commit contains a minimal fix; the next commit will more generally
tidy up the grow code in pp_formline.
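The size arithmetic behind this overrun is easy to sketch. Assuming the input bytes are treated as code points 0..255, each byte >= 0x80 needs two bytes once encoded as UTF-8, so the space required is the original length plus the count of such bytes. The function name toy_upgraded_len() is hypothetical and is not part of perl's API.

```c
#include <stddef.h>

/* Bytes needed to store the len bytes at s, read as code points
 * 0..255, once they are encoded as UTF-8: bytes below 0x80 stay
 * one byte, bytes at or above 0x80 become two. */
static size_t toy_upgraded_len(const unsigned char *s, size_t len)
{
    size_t i, extra = 0;

    for (i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            extra++;
    }
    return len + extra;
}
```

In the worst case every byte is >= 0x80 and the string doubles in size, which is why a buffer sized for the byte count alone is too small.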
Aaron Crane [Sat, 4 Feb 2017 15:28:19 +0000 (15:28 +0000)]
Fix memory leak in generating an exception message
This was my fault; oops.
Aaron Crane [Sat, 4 Feb 2017 14:56:06 +0000 (14:56 +0000)]
Fix outdated note in perldiag
As of 4fa06845e75d453a3101cff32e24c5b743f9819e, the "Odd name/value argument
for subroutine" error has been reported from the callers' perspective.
Steffen Mueller [Fri, 3 Feb 2017 08:06:41 +0000 (09:06 +0100)]
HvTOTALKEYS() takes a HV* as argument
Incidentally, it currently works on SV *'s as well because there's an
explicit cast after an SvANY. Let's not rely on that. This commit also
removes a pointless const in a cast. Again. It takes an HV * as argument.
Let's only change that if we have a strong reason to.
Karl Williamson [Wed, 1 Feb 2017 20:15:00 +0000 (13:15 -0700)]
toke.c: Remove unused param from static function
Commit d2067945159644d284f8064efbd41024f9e8448a reverted commit
b5248d1e210c2a723adae8e9b7f5d17076647431. b5248 removed a parameter
from S_scan_ident, and changed its interior to use PL_bufend instead of
that parameter. The parameter had been used to limit how far into the
string being parsed scan_ident could look. In all calls to scan_ident
but one, the parameter was already PL_bufend. In the one call where it
wasn't, b5248 compensated by temporarily changing PL_bufend around the
call, running afoul, eventually, of the expectation that PL_bufend
points to a NUL.
I would have expected the reversion to add back both the parameter and
the uses of it, but apparently the function interior has changed enough
since the original commit, that it didn't even think there were
conflicts. As a result the parameter got added back, but not the uses
of it.
I tried both approaches to fix this:
1) to change the function to use the parameter;
2) to simply delete the parameter.
Only the latter passed the test suite without error.
I then tried to understand why the parameter existed in the first place, and
why b5248 introduced a kludge to work around removing it. It appears
to me that this is for the benefit of the intuit_more function to enable
it to discern $] from a $ ending a bracketed character class, by ending
the scan before the ']' when in a pattern.
The trouble is that modern scan_ident versions do not view themselves as
constrained by PL_bufend. If that is reached at a point where white
space is allowed, it will try appending the next input line and
continuing, thus changing PL_bufend. Thus the kludge in b5248 wouldn't
necessarily do the expected limiting anyway. The reason the approach
"1)" I tried didn't work was that the function continued to use the
original value, even after it had read in new things, instead of
accounting for those.
Hence approach "2)" is used. I'm a little nervous about this, as it may
lead to intuit_more() (which uses heuristics) having more cases where it
makes the wrong choice about $] vs [...$]. But I don't see a way around
this, and the pre-existing code could fail anyway.
Spotted by Dave Mitchell.
Karl Williamson [Tue, 31 Jan 2017 22:17:56 +0000 (15:17 -0700)]
t/lib/warnings/toke: Fix comment typos
David Mitchell [Wed, 1 Feb 2017 15:50:14 +0000 (15:50 +0000)]
avoid double-freeing regex code blocks
RT #130650 heap-use-after-free in S_free_codeblocks
When compiling qr/(?{...})/, a reg_code_blocks structure is allocated
and various SVs are attached to it. Initially this is set to be freed
via a destructor on the savestack, in case of early dying. Later the
structure is attached to the compiling regex, and a boolean flag in the
structure, 'attached', is set to true to show that the destructor no
longer needs to free the struct.
However, it is possible to get three orders of destruction:
1) allocate, push destructor, die early
2) allocate, push destructor, attach to regex, die
3) allocate, push destructor, attach to regex, succeed
In 2, the regex is freed (via the savestack) before the destructor is
called. In 3, the destructor is called, then later the regex is freed.
It turns out perl can't currently handle case 2:
qr'(?{})\6'
Fix this by turning the 'attached' boolean field into an integer refcount,
then keep a count of whether the struct is referenced from the savestack
and/or the regex. Since it normally has a value of 1 or 2, it's similar
to a boolean flag, but crucially it no longer just indicates that the
regex has a pointer to it ('attached'), but that at least one of the
savestack and regex have a pointer to it. So order of freeing no longer
matters.
I also updated S_free_codeblocks() so that it nulls out SV pointers in
the reg_code_blocks struct before freeing them. This is generally good
practice to avoid double frees, although is probably not needed at the
moment.
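The idea of replacing the 'attached' boolean with a reference count can be sketched as below. This is a simplified stand-in, not the regcomp.c code: CodeBlocks, cb_new(), cb_attach() and cb_release() are hypothetical names, and the payload is a plain int array instead of SVs.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int  refcnt;   /* number of owners: the savestack and/or the regex */
    int *payload;  /* stands in for the attached SVs */
} CodeBlocks;

/* Allocate with one reference, held by the savestack destructor. */
static CodeBlocks *cb_new(void)
{
    CodeBlocks *cb = malloc(sizeof *cb);
    if (cb == NULL)
        return NULL;
    cb->refcnt = 1;
    cb->payload = calloc(4, sizeof *cb->payload);
    return cb;
}

/* The regex takes a second reference when the struct is attached. */
static void cb_attach(CodeBlocks *cb)
{
    assert(cb != NULL && cb->refcnt > 0);
    cb->refcnt++;
}

/* Called by either owner, in any order. Only the last release frees.
 * Returns 1 if the struct was freed, 0 otherwise. */
static int cb_release(CodeBlocks *cb)
{
    int *payload;

    assert(cb != NULL && cb->refcnt > 0);
    if (--cb->refcnt > 0)
        return 0;
    /* Null out the pointer before freeing what it pointed to. */
    payload = cb->payload;
    cb->payload = NULL;
    free(payload);
    free(cb);
    return 1;
}
```

Because each owner only drops its own reference, it no longer matters whether the regex or the savestack destructor runs first.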
Tony Cook [Wed, 1 Feb 2017 03:34:16 +0000 (14:34 +1100)]
(perl #130684) allocate enough space for the extra 'x'
77c8f26370dcc0e added support for a doubled x regexp flag, and ensured
the doubled flag was passed to the qr// created by
S_compile_runtime_code().
Unfortunately it didn't ensure enough space was allocated for that
extra 'x'.
Karl Williamson [Tue, 31 Jan 2017 21:17:14 +0000 (14:17 -0700)]
PATCH: [perl #130655] Unrecognized UTF-8 char
The root cause of this was code like this
    if (a)
        b
which got changed into
    if (a)
        c
        b
thus causing 'b' to be executed unconditionally. The
solution is just to add braces
    if (a) {
        c
        b
    }
This is why I always use braces even if not required at the moment. It
was the coding standard at $work.
It turns out that #130567 doesn't even come up with this fix in place.
Karl Williamson [Tue, 31 Jan 2017 18:15:08 +0000 (11:15 -0700)]
PATCH: [perl #130656] tr// failure with UTF-8 across lines
This bug happened under things like
tr/\x{101}-\x{200}/
\x{201}-\x{301}/
The newline in the middle was crucial. As a result the second line got
parsed already knowing that the result was UTF-8, and as a result
setting a variable got skipped which happens only when we discover we
need to flip into UTF-8.
The solution adopted here is to set the variable under other conditions,
which leads to it getting set multiple times. But this extra branch and
setting is confined to somewhat rare circumstances, leaving the mainline
code untouched.
David Mitchell [Mon, 30 Jan 2017 12:25:55 +0000 (12:25 +0000)]
signature sub (\x80 triggered an assertion
RT #130661
In the presence of 'use feature "signatures"', a char >= 0x80 where a sigil
was expected triggered an assert failure, because the (signed) character
was being promoted to int and ended up getting returned from
yylex() as a negative value.
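The promotion problem can be shown in isolation. This is a sketch, not the toke.c code: promote_signed() and lex_char() are hypothetical names, and signed char is used explicitly because the signedness of plain char is implementation-defined.

```c
#include <assert.h>

/* Integer promotion of a signed char keeps the sign, so a byte with
 * the high bit set turns into a negative int. */
static int promote_signed(signed char c)
{
    return c;
}

/* Going through unsigned char first yields the byte value 0..255,
 * which is what a lexer returning "character as token" would want. */
static int lex_char(signed char c)
{
    int tok = (unsigned char)c;
    assert(tok >= 0);  /* a token value is never negative here */
    return tok;
}
```

With an input byte of 0x80 (stored as -128 in a signed char), the first function returns -128 and the second returns 128.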
Karl Williamson [Mon, 30 Jan 2017 03:59:44 +0000 (20:59 -0700)]
Add test for [perl #129036]
This was fixed by 6cdc5cd8f36f88172b0fcefdcadec75f5b6600b2
(but I didn't check that this was the actual commit).
Karl Williamson [Sun, 29 Jan 2017 22:56:20 +0000 (15:56 -0700)]
PATCH: [perl #130666]: Revert "toke.c, S_scan_ident(): Don't take a "end of buffer" argument, use PL_bufend"
This reverts commit b5248d1e210c2a723adae8e9b7f5d17076647431.
This commit, dating from 2013, was made unnecessary by later removal of
the MAD code. It temporarily changed the PL_bufend variable; doing that
ran afoul of an assertion, added in
fac0f7a38edc4e50a7250b738699165079b852d8, that expects PL_bufend to
point to a terminating NUL.
Beyond the reversion, a test is added here.
Hugo van der Sanden [Sun, 29 Jan 2017 15:10:02 +0000 (15:10 +0000)]
mention PASS2 in reginsert() example
As per bb78386f13.
Yves Orton [Sat, 28 Jan 2017 15:20:35 +0000 (16:20 +0100)]
assert that the RExC_recurse data structure points at a valid GOSUB
This assert will fail if someone adds code that optimises away a GOSUB
call. At which point they will see the comment and know what to do.
Yves Orton [Sat, 28 Jan 2017 14:13:17 +0000 (15:13 +0100)]
silence warnings from tests about impossible quantifiers
thanks to Dave M for noticing....
Zefram [Sat, 28 Jan 2017 06:25:28 +0000 (06:25 +0000)]
in dump_sub() handle CV ref used as GV
dump_sub() can receive a CV ref where it's expecting a GV. Make it
handle that cleanly. Fixes [perl #129126].
Zefram [Sat, 28 Jan 2017 05:51:00 +0000 (05:51 +0000)]
croak on sv_setpvn() on a glob
A real glob cannot be written to as a string scalar, and a sv_setpvn()
call attempting to do so used to hit an assertion. (sv_force_normal()
coerces glob copies to strings, but leaves real globs unchanged.)
This isn't exposed through assignment ops, which have special semantics
for assignments to globs, but it can be reached through XS subs that
mutate arguments, and through "^" formats. Change sv_setpvn() to check
for globs and croak cleanly. Fixes [perl #129147].
Yves Orton [Fri, 27 Jan 2017 15:57:40 +0000 (16:57 +0100)]
only mess with NEXT_OFF() when we are in PASS2
In 31fc93954d1f379c7a49889d91436ce99818e1f6 I added code that would modify
NEXT_OFF() when we were not in PASS2, when we should not do so. Strangely this
did not segfault when I tested, but this fix is required.
Steffen Mueller [Fri, 27 Jan 2017 14:55:51 +0000 (15:55 +0100)]
Reuse previously-computed flag
Yves Orton [Fri, 27 Jan 2017 09:23:05 +0000 (10:23 +0100)]
add some details to the docs for S_reginsert()
Had these docs been here I would have saved some time debugging. So
save the next guy from the same trouble... (with my memory *I* might
even be the /next guy/. Sigh.)