perl5.git.perl.org Git - perl5.git/log

This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5

Hugo van der Sanden [Sat, 25 Feb 2017 10:42:17 +0000 (10:42 +0000)]

fix VMS test fail

d7186add added a runperl() test that breaks command line length limits for
VMS. Switch to fresh_perl() instead, so the prog is put in a file for us.

Aaron Crane [Sat, 25 Feb 2017 17:24:58 +0000 (17:24 +0000)]

Add "default_inc_excludes_dot" to "perl -V" output

As proposed by Andreas Koenig++ in this message:

http://www.nntp.perl.org/group/perl.perl5.porters/2017/02/msg243256.html

Dominic Hargreaves [Tue, 21 Feb 2017 20:30:38 +0000 (20:30 +0000)]

Documentation fixes for '.' possibly no longer being in @INC

Karl Williamson [Tue, 21 Feb 2017 04:18:28 +0000 (21:18 -0700)]

embed.fnc: _byte_dump_string is core-only

This commit, made during the freeze, was approved by the pumpking

Jarkko Hietaniemi [Thu, 23 Feb 2017 14:51:42 +0000 (09:51 -0500)]

Followup on a4570f51 for t/porting/extrefs.t

More functions have appeared that are PERL_STATIC_INLINE, but the
porting/extrefs.t compiles with -DPERL_NO_INLINE_FUNCTIONS, which
means no bodies are visible, but the Tru64 cc takes static inline
seriously, requiring the bodies.

Instead of the manual tweak of adding #ifndef PERL_NO_INLINE_FUNCTIONS
to embed.fnc, fix the problem in embed.pl so that 'i' type inserts the
required ifndef. Remove the manual PERL_NO_INLINE_FUNCTIONS insertions
made in a4570f51 (note that the types of some have diverged).
Now the extrefs.t again works in Tru64 (and no other compiler
has ever tripped on this).

James E Keenan [Thu, 23 Feb 2017 13:24:16 +0000 (08:24 -0500)]

Clean up temporary directories after testing.

Signed-off-by: James E Keenan <jkeenan@cpan.org>

Karl Williamson [Tue, 21 Feb 2017 23:49:28 +0000 (16:49 -0700)]

Forgotten static declarations

Signed-off-by: James E Keenan <jkeenan@cpan.org>

Andy Lester [Wed, 22 Feb 2017 05:22:07 +0000 (23:22 -0600)]

Make Perl_abort_execution flagged as not returning

Craig A. Berry [Wed, 22 Feb 2017 03:09:03 +0000 (21:09 -0600)]

Revert "ext/VMS-Stdio: switch to using macros designed for string constant args"

This reverts commit c0dea56fe487504493d97df5a7a6be57a2d2834d.

The new macros introduced here have now just been rendered invisible
by 8f71649941d02d5bdfe4f. Using macros that we can't see breaks the
build, so revert this for now. It can be reintroduced when the macro
names are settled and no longer hidden.

Tony Cook [Tue, 21 Feb 2017 23:35:03 +0000 (10:35 +1100)]

perldelta for 853eb961c1a3

Hugo van der Sanden [Tue, 21 Feb 2017 15:45:02 +0000 (15:45 +0000)]

update comment in test_bootstrap.pl

James E Keenan [Tue, 21 Feb 2017 15:16:37 +0000 (10:16 -0500)]

Add t/comp/parser_run.t to MANIFEST.

To keep t/porting/test_bootstrap.t happy, we need to declare the new test file
as an exception in that it says 'require test.pl' which tests in t/comp/ are
normally not permitted to do.

Hugo van der Sanden [Sun, 19 Feb 2017 10:46:09 +0000 (10:46 +0000)]

[perl #130814] update pointer into PL_linestr after lookahead

Looking ahead for the "Missing $ on loop variable" diagnostic can reallocate
PL_linestr, invalidating our pointer. Save the offset so we can update it
in that case.
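
The general technique named here, keeping an offset instead of a raw pointer across a call that may reallocate, can be sketched in C as below. This is only an illustration: LineBuf, linebuf_append and lookahead_keep_cursor are hypothetical names standing in for PL_linestr and the lexer's lookahead, not the code in toke.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for PL_linestr. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} LineBuf;

/* Append text; may realloc and therefore move buf->data.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int linebuf_append(LineBuf *buf, const char *text)
{
    size_t add = strlen(text);

    if (buf->len + add + 1 > buf->cap) {
        size_t new_cap = (buf->len + add + 1) * 2;
        char  *p = realloc(buf->data, new_cap);
        if (!p)
            return -1;
        buf->data = p;
        buf->cap  = new_cap;
    }
    memcpy(buf->data + buf->len, text, add + 1);   /* copies the NUL too */
    buf->len += add;
    return 0;
}

/* Look ahead (which may grow the buffer) and return a pointer to the
 * same logical position that `cursor` referred to before the call. */
static char *lookahead_keep_cursor(LineBuf *buf, char *cursor, const char *more)
{
    size_t offset = (size_t)(cursor - buf->data);  /* save offset, not pointer */

    if (linebuf_append(buf, more) != 0)
        return NULL;
    return buf->data + offset;                     /* rebuild from the new base */
}
```

The caller replaces its old cursor with the returned pointer; the old value must not be dereferenced after the append.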

Hugo van der Sanden [Sun, 19 Feb 2017 11:15:38 +0000 (11:15 +0000)]

[perl #130814] Add testcase, and new testfile t/comp/parser_run.t

Sometimes it's useful to have test.pl around, but it seems inappropriate
to pollute the existing t/comp/parser.t with that.

Chris 'BinGOs' Williams [Tue, 21 Feb 2017 10:38:09 +0000 (10:38 +0000)]

Are friends electric?

David Mitchell [Tue, 21 Feb 2017 10:20:44 +0000 (10:20 +0000)]

update Module::CoreList for 5.25.11

David Mitchell [Tue, 21 Feb 2017 10:02:36 +0000 (10:02 +0000)]

bump version number in lib/B/Op_private.pm

this was achieved with 'make regen'

reneeb [Tue, 21 Feb 2017 06:23:54 +0000 (07:23 +0100)]

bump version to 5.25.11

reneeb [Tue, 21 Feb 2017 06:13:21 +0000 (07:13 +0100)]

Merge branch 'blead' of ssh://perl5.git.perl.org/perl into blead

reneeb [Tue, 21 Feb 2017 06:12:59 +0000 (07:12 +0100)]

new perldelta for 5.25.11

reneeb [Tue, 21 Feb 2017 06:10:19 +0000 (07:10 +0100)]

add 5.25.10 epigraph

Tony Cook [Tue, 21 Feb 2017 05:38:36 +0000 (16:38 +1100)]

(perl #130822) fix an AV leak in Perl_reg_named_buff_fetch

Originally noted as a scoping issue by Andy Lester.

reneeb [Mon, 20 Feb 2017 22:06:59 +0000 (23:06 +0100)]

Tick release in the release schedule

reneeb [Mon, 20 Feb 2017 21:57:03 +0000 (22:57 +0100)]

Merge branch 'release-5.25.10' into blead

reneeb [Mon, 20 Feb 2017 16:46:24 +0000 (17:46 +0100)]

finalize perldelta

Karl Williamson [Sat, 18 Feb 2017 20:00:49 +0000 (13:00 -0700)]

perlre, perlrecharclass, Fix overlooked typos

I thought I had committed these nits, pointed out to me by reviewers,
but I hadn't done so properly.

Karl Williamson [Sat, 18 Feb 2017 21:01:05 +0000 (14:01 -0700)]

perlrebackslash: Clarify

"Character class for non vertical whitespace." wasn't meant to mean match
whitespace that isn't vertical.

Karl Williamson [Sat, 18 Feb 2017 20:50:00 +0000 (13:50 -0700)]

perlre: Revamp portions

This commit folds in the after-thought section on Version 8 regexes into
the rest of the document, making most of it part of a gentler "Basics"
section. Some redundancies from the auxiliary pods have been removed
(these being perlrebackslash and perlrecharclass, created, I presume,
to allow this document to be shorter).

Karl Williamson [Sat, 18 Feb 2017 20:46:16 +0000 (13:46 -0700)]

perlre: Some clarifications, small corrections

Karl Williamson [Sat, 18 Feb 2017 20:30:14 +0000 (13:30 -0700)]

perlre: Nits involving C<>, I<>

This standardizes the usage of single characters inside C<> to be
C<"x">, which was the most common usage previously in this pod.

It italicizes e.g., etc.

It removes trailing blanks on a few lines

Karl Williamson [Sat, 18 Feb 2017 20:21:19 +0000 (13:21 -0700)]

perlre: Don't name exact max non-consume depth

In a couple of places, this pod says that 50 is the recursion limit in
patterns without consuming any input, but that it is changeable by
recompiling perl. Therefore, we shouldn't specify the quantity, because
it might not be the correct value. Further, 50 is currently wrong.

Karl Williamson [Sat, 18 Feb 2017 20:00:49 +0000 (13:00 -0700)]

perlrecharclass: A few clarifications

Karl Williamson [Fri, 17 Feb 2017 18:54:07 +0000 (11:54 -0700)]

perlretut: "-" is sometimes a metacharacter

Karl Williamson [Fri, 17 Feb 2017 02:36:11 +0000 (19:36 -0700)]

perlretut: Cleanup, nits

This adds some C<>, I<>, changes non-literal text from C<> to I<>.
It changes some phrases that are enclosed in single quotes to the more
idiomatic double quotes.

It standardizes on single characters within C<> to be C<'x'>. This is
not standardized in our documentation, and people change it back and
forth. I prefer the extra quotes, as it otherwise blends in to the
background on html displays.

It converts the few 'regex' terms to 'regexp'.

It fixes some numbered lists to display not so uglily

It removes the cautions about the features that are no longer experimental

It corrects some grammar

Karl Williamson [Fri, 17 Feb 2017 02:30:08 +0000 (19:30 -0700)]

Pods: Standardize on one pattern mod style

There were about 40 cases in pods where //m is used to represent the
pattern modifier 'm', but nearly 400 where /m is used. Convert to the
most common representation.

Jarkko Hietaniemi [Mon, 20 Feb 2017 14:17:14 +0000 (09:17 -0500)]

Implement --help|--usage.

Jarkko Hietaniemi [Mon, 20 Feb 2017 14:10:16 +0000 (09:10 -0500)]

Also understand the output of "make test_harness".

Jarkko Hietaniemi [Mon, 20 Feb 2017 13:48:54 +0000 (08:48 -0500)]

Be more verbose about what failed and from which input.

reneeb [Mon, 20 Feb 2017 11:51:20 +0000 (12:51 +0100)]

Update Module::CoreList for 5.25.10

Karl Williamson [Mon, 20 Feb 2017 08:27:16 +0000 (01:27 -0700)]

perldelta

Karl Williamson [Mon, 20 Feb 2017 07:46:05 +0000 (00:46 -0700)]

re/fold_grind.t: Allow watchdog timeout to vary

If someone is running on a slow system, and they want fold_grind to
complete, they can now set an environment variable based on the relative
slowness of their system, that will be factored in to the length of the
timer.
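
A rough C sketch of scaling a timeout by an environment-supplied factor follows. The commit message does not name the variable, so DEMO_TIMEOUT_FACTOR is a made-up name, and the fallback rules (ignore junk, ignore results outside a sane range) are assumptions made for the example, not the test's actual behaviour.

```c
#include <limits.h>
#include <stdlib.h>

/* Scale base_seconds by the factor in `raw` (a decimal number as text).
 * A missing, malformed, or non-positive factor, or a result outside
 * 1..UINT_MAX, leaves the base timeout unchanged. */
static unsigned timeout_from_factor(unsigned base_seconds, const char *raw)
{
    char  *end = NULL;
    double factor, scaled;

    if (!raw || !*raw)
        return base_seconds;

    factor = strtod(raw, &end);
    if (end == raw || *end != '\0' || factor <= 0.0)
        return base_seconds;

    scaled = base_seconds * factor;
    if (scaled < 1.0 || scaled > (double)UINT_MAX)
        return base_seconds;
    return (unsigned)scaled;
}

/* Convenience wrapper; DEMO_TIMEOUT_FACTOR is a hypothetical variable name. */
static unsigned watchdog_timeout(unsigned base_seconds)
{
    return timeout_from_factor(base_seconds, getenv("DEMO_TIMEOUT_FACTOR"));
}
```

Keeping the parsing separate from the getenv() call makes the scaling rule easy to exercise without touching the process environment.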

Karl Williamson [Sun, 19 Feb 2017 21:14:35 +0000 (14:14 -0700)]

Split XS-APItest/t/utf8.t

This test file is one of the longest running ones.  It has three main
semi-independent parts.  Two of them are split off into 2 files with a
common file required.  The other part is still long running, so it is
split so that a common file is used to run the tests, but it is called
with a chunk number and it only executes based on that chunk.  The
number of chunks is based on the environment variable TEST_JOBS, up to
10.  Each chunk executes 1/TEST_JOBS of the total test.  If TEST_JOBS is
not set, it reverts to 1 chunk.  The alternative would be to revert to
10, but since there is overhead associated with each new chunk, I chose,
for now, 1.

There may be a better solution later on, but I think this is good enough
for now.
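
The chunking rule described above (TEST_JOBS chunks, capped at 10, defaulting to 1) might look like this in C. How individual tests are assigned to chunks is not stated in the message; the round-robin rule in runs_in_chunk is an assumption, and all function names are hypothetical.

```c
#include <stdlib.h>

#define DEMO_MAX_CHUNKS 10

/* Number of chunks for a given TEST_JOBS string: the value clamped to
 * 1..DEMO_MAX_CHUNKS, or 1 when the string is missing or not a
 * positive number. */
static int chunk_count_from(const char *test_jobs)
{
    long n;

    if (!test_jobs || !*test_jobs)
        return 1;
    n = strtol(test_jobs, NULL, 10);
    if (n < 1)
        return 1;
    return n > DEMO_MAX_CHUNKS ? DEMO_MAX_CHUNKS : (int)n;
}

/* Same, reading the real environment variable. */
static int chunk_count(void)
{
    return chunk_count_from(getenv("TEST_JOBS"));
}

/* Assumed assignment rule: test i runs in chunk (i mod chunks).
 * `chunk` is numbered from 0. */
static int runs_in_chunk(int test_index, int chunk, int chunks)
{
    return test_index % chunks == chunk;
}
```

With this rule every test index belongs to exactly one chunk, so running all chunks covers the whole test once.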

Karl Williamson [Sun, 19 Feb 2017 05:28:58 +0000 (22:28 -0700)]

Split APItest/t/handy.t

This is a very long running test.  This commit splits it into smaller
chunks, based on the environment variable TEST_JOBS, up to 10.   Each
chunk executes 1/TEST_JOBS of the total test.  If TEST_JOBS is not set,
it reverts to 1 chunk.  The alternative would be to revert to 10, but
since there is overhead associated with each new chunk, I chose, for
now, 1.

There may be a better solution later on, but I think this is good enough
for now.

Karl Williamson [Mon, 20 Feb 2017 05:14:53 +0000 (22:14 -0700)]

handy.h: Guard controversial macro name

This is so their use cannot spread easily until we have sorted things
out in 5.27

Karl Williamson [Fri, 17 Feb 2017 18:56:38 +0000 (11:56 -0700)]

perlretut: Note when metacharacters become ordinary

Karl Williamson [Thu, 16 Feb 2017 04:22:59 +0000 (21:22 -0700)]

Revise documentation of eval and evalbytes

Karl Williamson [Tue, 14 Feb 2017 23:59:49 +0000 (16:59 -0700)]

Clarify "User-visible changes"

The pumpking agreed with this wording

Karl Williamson [Mon, 20 Feb 2017 05:03:27 +0000 (22:03 -0700)]

Balance uniprops tests

Commit 5656b1f654bb034c561558968ed3cf87a737b3e1 split the tests
generated by mktables so that 10 separate files each execute 10% of the
tests.  But it turns out that some tests are much more involved than
others, so that some of those 10 files still took much longer than
average.  This commit changes the split so that the amount of time each
file takes is more balanced.  It uses a natural breaking spot for the
tests for the \b{} flavors, except that GCB and SB are each short (so
are combined into being tested from one file), and LB is very long, so
is split into 4 test groups.

Karl Williamson [Mon, 20 Feb 2017 04:48:40 +0000 (21:48 -0700)]

Inline foldEQ, foldEQ_latin1, foldEQ_locale

These short functions are called in inner loops and regex backtracking.

Karl Williamson [Mon, 20 Feb 2017 04:43:40 +0000 (21:43 -0700)]

op.c: Add comment

Karl Williamson [Mon, 20 Feb 2017 04:39:32 +0000 (21:39 -0700)]

perlrecharclass: Simplify by referring to other pod

The (?[...]) construct has 're strict' rules. Slightly reword to more directly
refer to the documentation on that.

Tony Cook [Mon, 20 Feb 2017 00:55:22 +0000 (11:55 +1100)]

perldelta for e7a8a8aac45d

Tony Cook [Mon, 20 Feb 2017 00:54:58 +0000 (11:54 +1100)]

Add another reneeb alias

Tony Cook [Mon, 20 Feb 2017 00:02:21 +0000 (11:02 +1100)]

(perl #129340) copy the source when inside the dest in sv_insert_flags()
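
The hazard named in this subject line is a general one: when the bytes being inserted live inside the buffer being modified, growing or shuffling that buffer can clobber them before they are read. A minimal C sketch of the "copy the source first" remedy follows. It is not the body of sv_insert_flags(); buf_insert and points_inside are hypothetical helpers written for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* True if p points into the len bytes starting at base. */
static int points_inside(const char *p, const char *base, size_t len)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)base;
    return a >= b && a - b < len;
}

/* Replace `remove` bytes at `offset` in *buf with src_len bytes from src.
 * *buf must be a malloc'd, NUL-terminated buffer holding *len bytes
 * (plus the NUL). src may point inside *buf.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
static int buf_insert(char **buf, size_t *len, size_t offset, size_t remove,
                      const char *src, size_t src_len)
{
    char  *copy = NULL;
    char  *dst  = *buf;
    size_t tail, new_len;

    if (dst == NULL || offset > *len || remove > *len - offset)
        return -1;

    /* Source aliases the destination: take a private copy before the
     * destination is reallocated or its contents are moved. */
    if (src_len && points_inside(src, dst, *len)) {
        copy = malloc(src_len);
        if (!copy)
            return -1;
        memcpy(copy, src, src_len);
        src = copy;
    }

    tail    = *len - offset - remove;       /* bytes after the removed span */
    new_len = offset + src_len + tail;
    if (new_len > *len) {                   /* only grow, never shrink here */
        dst = realloc(*buf, new_len + 1);
        if (!dst) {
            free(copy);
            return -1;
        }
    }
    memmove(dst + offset + src_len, dst + offset + remove, tail);
    if (src_len)
        memcpy(dst + offset, src, src_len);
    dst[new_len] = '\0';

    *buf = dst;
    *len = new_len;
    free(copy);
    return 0;
}
```

For example, inserting the two bytes at buf + 2 of "abcdef" at offset 0 yields "cdabcdef"; without the private copy, the memmove would have overwritten those bytes before they were read.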

reneeb [Sun, 19 Feb 2017 22:41:56 +0000 (23:41 +0100)]

Some version numbers in INSTALL were wrong

Steve Hay [Sun, 19 Feb 2017 16:10:13 +0000 (16:10 +0000)]

perldelta for commit 1f664ef5314fb6e438137c44c95cf5ecdbdb5e9b

Steve Hay [Sun, 19 Feb 2017 13:33:37 +0000 (13:33 +0000)]

Add support for VS2015 (VC++ 14.0)

Due to the rewritten CRT in this version of Visual C++ it is no longer
possible (or at least not at all easy) to make use of the ioinfo struct,
which commit b47a847f62 (re-)introduced in order to fix RT#120091/118059.
Therefore, we effectively revert commit b47a847f62 for VS2015 onwards on
the basis that being able to build with VS2015 onwards is more important
than the RT#120091/118059 bug fix. This does unfortunately mean that perls
built with <=VS2013 will not be compatible with perls built with >=VS2015,
but they may well not have been compatible anyway because of the CRT
rewrite, and certainly wouldn't be compatible if perl builds with VS2015
were not supported!

See RT#125714 for more discussion about this.

David Mitchell [Sun, 19 Feb 2017 12:58:37 +0000 (12:58 +0000)]

davem's perldelta entries for 5.25.10

David Mitchell [Sun, 19 Feb 2017 12:36:58 +0000 (12:36 +0000)]

bump test count in t/comp/parser.t

(the previous commit forgot to)

David Mitchell [Sun, 19 Feb 2017 12:21:47 +0000 (12:21 +0000)]

pp_formline(): revert recent buffer growth changes

This commit reverts the following (except for the additions to
t/op/write.t):

    3b1d752 pp_formline(): add empty body to empty while loop
    f62fd06 pp_formline(): avoid buffer overrun
    90c3aa0 pp_formline: simplify growing of PL_formtarget

90c3aa0 was intended to make the code for growing the buffer simpler and
more robust with less possibility of obscure edge cases, while the
follow-up commit fixed an issue introduced by that commit, and the next
was a tweak for a compiler warning. But

    http://nntp.perl.org/group/perl.perl5.porters/243101

shows that there are still issues with the new code and I've decided to
abandon the effort and leave things how they were originally - i.e.
happily working, but probably with some still undiscovered edge cases.

Aaron Crane [Sun, 19 Feb 2017 12:26:54 +0000 (12:26 +0000)]

[perl #130815] fix ck_return null-pointer deref on malformed code

David Mitchell [Sat, 18 Feb 2017 14:00:56 +0000 (14:00 +0000)]

pp_formline(): add empty body to empty while loop

my previous commit in this function added a block that happened
to follow directly after a bodiless while loop, i.e. 'while(...);'.
clang spotted this and warned. So add an empty body '{}' after the
while to visually disambiguate it.
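
The shape being described, a bodiless while immediately followed by an unrelated block, can be reproduced in a few lines of C. The function below is invented for illustration and has nothing to do with pp_formline() itself.

```c
#include <stddef.h>

/* Count the spaces at the start of s. */
static size_t count_leading_spaces(const char *s)
{
    const char *p = s;

    /* All the work happens in the condition. Writing '{}' rather than a
     * bare ';' makes it plain that the block below is not the loop body. */
    while (*p++ == ' ') {}
    {
        /* p has stepped one past the first non-space character */
        size_t n = (size_t)(p - s) - 1;
        return n;
    }
}
```

With 'while (...);' in place of 'while (...) {}' the behaviour would be identical; only the readability (and, per the message above, clang's warning) differs.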

Aaron Crane [Sat, 4 Feb 2017 17:15:28 +0000 (17:15 +0000)]

Show sub name in signature arity-check error messages

Andy Lester [Sat, 18 Feb 2017 01:46:15 +0000 (19:46 -0600)]

Moving variables to their innermost scope.

Some vars have been tagged as const because they do not change in their
new scopes. In pp_reverse in pp.c, I32 tmp is only used to hold a char,
so is changed to char.

David Mitchell [Sat, 18 Feb 2017 10:46:53 +0000 (10:46 +0000)]

pp_multideref: tweak an assertion

My recent commit v5.25.9-89-g43dbb3c added an assertion to the effect that
in

@{ local $a[0]{b}[1] } = 1;

the 'local' could only appear at the end of a block and so asserted that the
next op should be OP_LEAVE. However, this:

@{1, local $a[0]{b}[1] } = 1;

inserts an OP_LIST before the OP_LEAVE.

Improve the assert to cope with any number of OP_NULL or OP_LISTs before
the OP_LEAVE.
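
The relaxed check, "skip any OP_NULL or OP_LIST, then expect OP_LEAVE", can be sketched in C over a simplified op chain. DemoOp and the DEMO_OP_* values are hypothetical stand-ins; perl's real OP structures and the actual assertion in pp_multideref differ.

```c
#include <stddef.h>

/* Simplified op types and op node, for illustration only. */
typedef enum {
    DEMO_OP_NULL,
    DEMO_OP_LIST,
    DEMO_OP_LEAVE,
    DEMO_OP_OTHER
} DemoOpType;

typedef struct DemoOp {
    DemoOpType     type;
    struct DemoOp *next;
} DemoOp;

/* True if, after skipping any run of NULL and LIST ops starting at o,
 * the next op is a LEAVE. */
static int leads_to_leave(const DemoOp *o)
{
    while (o && (o->type == DEMO_OP_NULL || o->type == DEMO_OP_LIST))
        o = o->next;
    return o != NULL && o->type == DEMO_OP_LEAVE;
}
```

An assertion built on such a helper accepts both forms shown above, while still rejecting a chain where some other op follows.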

David Mitchell [Sat, 18 Feb 2017 10:20:00 +0000 (10:20 +0000)]

pp_formline(): avoid buffer overrun

RT #130703

My recent commit v5.25.9-77-g90c3aa0 attempted to simplify buffer growth
in pp_formline(), but missed the operators which append data to
PL_formtarget *without* doing 'goto append'. These ops either append a
fieldsize's worth of bytes, or a \n (FF_NEWLINE). So grow by fieldsize
whenever we fetch something new, and for each FF_NEWLINE.

Andreas Koenig [Thu, 16 Feb 2017 10:27:57 +0000 (11:27 +0100)]

Updates CPAN.pm to ANDK/CPAN-2.17-TRIAL2.tar.gz

jdhedden [Wed, 15 Feb 2017 04:56:20 +0000 (23:56 -0500)]

Upgrade to Thread::Queue 3.12

Karl Williamson [Wed, 15 Feb 2017 17:17:06 +0000 (10:17 -0700)]

regexec.c: Fix comment typos

David Mitchell [Wed, 15 Feb 2017 15:58:24 +0000 (15:58 +0000)]

avoid a leak in list assign from/to magic values

RT #130766

A leak in list assignment was introduced by v5.23.6-89-gbeb08a1 and
extended with v5.23.6-90-g5c1db56.

Basically the code in S_aassign_copy_common() which does a mark-and-sweep
looking for common vars by temporarily setting SVf_BREAK on LHS SVs then
seeing if that flag was present on RHS vars, very temporarily removed that
flag from the RHS SV while mortal copying it, then set it again. After
those two commits, the "resetting" code could set SVf_BREAK on the RHS SV
even when it hadn't been present earlier.

This meant that on exit from S_aassign_copy_common(), some SVs could be
left with SVf_BREAK on. When that SV was freed, the SVf_BREAK flag meant
that the SV head wasn't planted back in the arena (but PL_sv_count was
still decremented). This could lead to slow growth of the SV HEAD arenas.

The two circumstances that could trigger the leak were:

1) An SMG var on the LHS and a temporary on the RHS, e.g.

    use Tie::Scalar;
    my ($s, $t);
    tie $s, 'Tie::StdScalar'; # $s has set magic
    while (1) {
        ($s, $t) = ($t, map 1, 1, 2); # the map returns temporaries
    }

2) A temporary on the RHS which has GMG, e.g.

    my $s = "abc";
    pos($s) = 1;
    local our ($x, $y);
    while (1) {
        my $pr = \pos($s); # creates a ref to a TEMP with get magic
        ($x, $y) = (1, $$pr);
    }

Strictly speaking a TEMP isn't required for either case; just a situation
where there's always a fresh SV on the RHS for each iteration that will
soon get freed and thus leaked.

This commit doesn't include any tests since I can't think of a way of
testing it. svleak.t relies on PL_sv_count, which in this case doesn't
show the leak.
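
The flag discipline at the heart of this fix, "restore the mark only if it was there before", can be shown with a small C sketch. DemoSV and DEMO_FLAG_MARK are hypothetical stand-ins for an SV head and SVf_BREAK; the real S_aassign_copy_common() does considerably more.

```c
#include <stddef.h>

#define DEMO_FLAG_MARK 0x1u   /* stand-in for SVf_BREAK */

typedef struct {
    unsigned flags;
} DemoSV;

/* Mark every LHS element, count RHS elements that carry the mark,
 * and leave no element marked on return. */
static size_t count_common(DemoSV **lhs, size_t nl, DemoSV **rhs, size_t nr)
{
    size_t i, common = 0;

    for (i = 0; i < nl; i++)
        lhs[i]->flags |= DEMO_FLAG_MARK;

    for (i = 0; i < nr; i++) {
        unsigned was_marked = rhs[i]->flags & DEMO_FLAG_MARK;

        rhs[i]->flags &= ~DEMO_FLAG_MARK;   /* hide the mark while copying */
        /* ... the copy of rhs[i] would be made here ... */
        rhs[i]->flags |= was_marked;        /* put back only what was there */

        if (was_marked)
            common++;
    }

    for (i = 0; i < nl; i++)
        lhs[i]->flags &= ~DEMO_FLAG_MARK;   /* sweep: unmark the LHS */

    return common;
}
```

Replacing the restore line with an unconditional 'rhs[i]->flags |= DEMO_FLAG_MARK' reproduces the kind of mistake described above: an RHS-only element would leave the function still marked, because the final sweep only visits the LHS.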

Karl Williamson [Tue, 14 Feb 2017 20:22:58 +0000 (13:22 -0700)]

Improve handling pattern compilation errors

Perl tries to continue parsing in the face of errors for the convenience
of the person running the script, so as to batch up as many errors as
possible, and cut down the number of runs.  Some errors will, however,
have a cascading effect, resulting in the parser getting confused as to
the intent.  Perl currently aborts parsing if 10 errors accumulate.

However, some things are reparsed as compilation continues, in
particular tr///, s///, and qr//.  The code that reparses has an
expectation of basic sanity in what it is looking at, and so reparsing
with known errors can lead to segfaults.  Recent commits have tightened
this up to avoid reparsing, or substitute valid stuff before reparsing.
This all works, as the code won't execute until all the errors get
fixed.

Commit f065e1e68bf6a5541c8ceba8c9fcc6e18f51a32b changed things so that
if there is an error in parsing a pattern, the whole compilation is
immediately aborted.  Since then, I realized it would be relatively
simple instead to skip compilation of that particular pattern, but
continue on with the parsing of the program as a whole, up to the
maximum number of allowed errors.  And again the program will refuse to
execute after compilation if there were any errors.

This commit implements that, the benefit being that we don't try to
reparse a pattern that failed the original parse, but can go on to find
errors elsewhere in the program.
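
As a rough model of the policy described here (skip the broken pattern, keep going until the error limit, then refuse to run), consider the following C sketch. Only the limit of 10 comes from the message; the types and functions are hypothetical and greatly simplified compared with toke.c and regcomp.c.

```c
#include <stddef.h>

#define DEMO_MAX_ERRORS 10   /* the limit quoted in the message above */

typedef struct {
    const char *text;
    int         parse_ok;    /* result of the lexer's first pass */
} DemoPattern;

typedef struct {
    int errors;
    int compiled;
    int stopped_early;
} DemoResult;

/* Walk all patterns. A pattern that failed to parse is counted as an
 * error and is NOT compiled, but processing continues so that later
 * errors are still found, up to DEMO_MAX_ERRORS. */
static DemoResult demo_compile_all(const DemoPattern *pats, size_t n)
{
    DemoResult r = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (!pats[i].parse_ok) {
            r.errors++;
            if (r.errors >= DEMO_MAX_ERRORS) {
                r.stopped_early = 1;
                break;
            }
            continue;        /* skip compiling just this pattern */
        }
        r.compiled++;        /* stands in for the real compile step */
    }
    return r;
}

/* The program may run only if no errors were recorded at all. */
static int demo_may_execute(const DemoResult *r)
{
    return r->errors == 0;
}
```

The point of the 'continue' is that a known-bad pattern never reaches the compile step, while the loop still reports problems in the patterns that follow it.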

James E Keenan [Tue, 14 Feb 2017 19:04:54 +0000 (14:04 -0500)]

Revert "Upgrade to Thread::Queue 3.12"

This reverts commit 57c819f845c985ed9979bfa76b1b8ca1708370f0.

Reverting to give us time to explore possible race condition. See:
https://rt.perl.org/Ticket/Display.html?id=130777

David Mitchell [Tue, 14 Feb 2017 17:50:11 +0000 (17:50 +0000)]

[MERGE] regex (?{...}) and WHILEM scope fixups

David Mitchell [Tue, 14 Feb 2017 17:10:34 +0000 (17:10 +0000)]

S_regmatch: eliminate WHILEM_A_min paren saving

In something like

    "a1b2c3d4..." =~ /(?:(\w)(\d))*..../

A WHILEM state is pushed for each iteration of the '*'. Part of this
state saving includes the previous indices for each of the captures within
the body of the thing being iterated over. So we save the following sets of
values for $1,$2:

    ()()
    (a)(1)
    (b)(2)
    (c)(3)
    (d)(4)

Then if at any point we backtrack, we can undo one or more iterations and
restore the older values of $1,$2.

However, when the match is non-greedy, as in A*?B, then on failure of B
and backtracking we attempt *more* A's rather than removing some already
matched A's. So there's never any need to save all the current paren state
for each iteration.

This eliminates a lot of per-iteration overhead for minimal WHILEMs and
makes the following run about 25% faster:

$s = ("a" x 1000);
$s =~ /^(?:(.)(.))*?[XY]/ for 1..10_000;
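
The per-iteration saving that this commit removes for the non-greedy case can be pictured as a stack of capture snapshots. The C sketch below models only that bookkeeping, under the assumption of two capture groups and a small fixed depth; DemoParens and the demo_* functions are hypothetical and are not the regex engine's real state machinery.

```c
#define DEMO_NPARENS   2     /* capture groups inside the loop body */
#define DEMO_MAX_ITERS 16    /* fixed depth, enough for a demonstration */

typedef struct {
    int start[DEMO_NPARENS]; /* -1 means "not set" */
    int end[DEMO_NPARENS];
} DemoParens;

typedef struct {
    DemoParens saved[DEMO_MAX_ITERS];
    int        depth;
} DemoParenStack;

/* Snapshot the current captures before attempting another iteration.
 * Returns 0 on success, -1 if the stack is full. */
static int demo_save_parens(DemoParenStack *st, const DemoParens *cur)
{
    if (st->depth >= DEMO_MAX_ITERS)
        return -1;
    st->saved[st->depth++] = *cur;
    return 0;
}

/* Backtrack over one iteration: restore the captures that were in
 * force before it began. Returns -1 if nothing was saved. */
static int demo_restore_parens(DemoParenStack *st, DemoParens *cur)
{
    if (st->depth == 0)
        return -1;
    *cur = st->saved[--st->depth];
    return 0;
}
```

A greedy loop needs one snapshot per iteration so that giving an iteration back restores the earlier $1,$2. In the non-greedy case backtracking only ever adds iterations, so, as the message argues, the snapshots are never popped and the cost of taking them can be avoided.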

David Mitchell [Tue, 14 Feb 2017 16:28:31 +0000 (16:28 +0000)]

S_regmatch: eliminate WHILEM_B paren saving

In something like

    "a1b2c3d4..." =~ /(?:(\w)(\d))*..../

A WHILEM state is pushed for each iteration of the '*'. Part of this
state saving includes the previous indices for each of the captures within
the body of the thing being iterated over. So we save the following sets of
values for $1,$2:

    ()()
    (a)(1)
    (b)(2)
    (c)(3)
    (d)(4)

Then if at any point we backtrack, we can undo one or more iterations and
restore the older values of $1,$2.

For /A*B/ where A is a complex sub-pattern like (\w)(\d), we currently save
the paren state each time we're about to attempt to iterate another A.
But it turns out that for non-greedy matching, i.e. A*?B, we also
save the paren state before executing B. This is unnecessary, as
B can't alter the capture state of the parens within A. So eliminate it.

If in the future some sneaky regex is found which this commit breaks,
then as well as restoring the old behaviour, you should look carefully
to see whether similar paren-saving behaviour for B should be added to
greedy matches too, i.e. A*B. It was partly the discrepancy between
saving for A*?B but not for A*B which made me suspect it was redundant.

David Mitchell [Tue, 14 Feb 2017 16:21:40 +0000 (16:21 +0000)]

Add a comment on why TRIE.jump does a UNWIND_PAREN

(it wasn't obvious to me)

David Mitchell [Tue, 14 Feb 2017 15:59:57 +0000 (15:59 +0000)]

clear savestack on (?{...}) failure and backtrack

RT #126697

In a regex, after executing a (?{...}) code block, if we fail and
backtrack over the codeblock, we're supposed to unwind the savestack, so
that, for example, any local()s within the code block are undone.

It turns out that a backtracking state isn't pushed for (?{...}), only
for postponed evals ( i.e.  (??{...})). This means that it relies on one
of the earlier backtracking states to clear the savestack on its behalf.
This can't always be relied upon, and the ticket above contains code where
this falls down; in particular:

    'ABC' =~ m{
        \A
        (?:
            (?: AB | A | BC )
            (?{
                local $count = $count + 1;
                print "! count=$count; ; pos=${\pos}\n";
            })
        )*
        \z
    }x

Here we end up relying on TRIE_next to do the cleaning up, but TRIE_next
doesn't, since there's nothing it would be responsible for that needs
cleaning up.

The solution to this is to push a backtrack state for every (?{...}) as
well as every (??{...}). The sole job of that state is to do a
LEAVE_SCOPE(ST.lastcp).

The existing backtrack state EVAL_AB has been renamed EVAL_postponed_AB
to make it clear it's only used on postponed /(??{A})B/ regexes, and a new
state has been added, EVAL_B, which is only called when backtracking after
failing something in the B in /(?{...})B/.
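
What "unwinding the savestack" buys can be shown with a toy model in C: each local() records the old value, and leaving the scope pops records back to a remembered index. DemoSavestack, demo_local and demo_leave_scope are hypothetical names; perl's real savestack and the LEAVE_SCOPE() macro handle many more kinds of entry.

```c
#define DEMO_SAVESTACK_MAX 32

typedef struct {
    int *where;   /* variable that was localized */
    int  old;     /* its value before localization */
} DemoSaved;

typedef struct {
    DemoSaved entry[DEMO_SAVESTACK_MAX];
    int       ix;  /* next free slot, like a savestack index */
} DemoSavestack;

/* Model of 'local $var = newval': remember the old value, set the new.
 * Returns 0 on success, -1 if the stack is full. */
static int demo_local(DemoSavestack *ss, int *var, int newval)
{
    if (ss->ix >= DEMO_SAVESTACK_MAX)
        return -1;
    ss->entry[ss->ix].where = var;
    ss->entry[ss->ix].old   = *var;
    ss->ix++;
    *var = newval;
    return 0;
}

/* Model of LEAVE_SCOPE(base): undo every localization made since the
 * stack index was `base`, most recent first. */
static void demo_leave_scope(DemoSavestack *ss, int base)
{
    while (ss->ix > base) {
        ss->ix--;
        *ss->entry[ss->ix].where = ss->entry[ss->ix].old;
    }
}
```

In these terms, the new backtrack state remembers the index in force before the code block ran and calls the equivalent of demo_leave_scope() with it when the match backtracks over the block, so the 'local $count = $count + 1' in the example above is undone.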

David Mitchell [Tue, 14 Feb 2017 13:32:16 +0000 (13:32 +0000)]

-Mre=Debug,ALL: indicate regex state stack pushes

At this maximal level of debugging output, it displays the top 3 state
stack entries each time it pushes, but with no obvious indication that
a push is occurring. This commit changes this output:

                             |   1|  Setting an EVAL scope, savestack=9,
                             |   2|   #4   WHILEM_A_max
                             |   2|   #3   WHILEM_A_max
                             |   2|   #2   CURLYX_end yes
   0 <abcdef> <g>            |   2|   4:POSIXD[\w](5)

to be this (which includes the word "push" and extra indentation for the
stack dump):

                             |   1|  Setting an EVAL scope, savestack=9,
                             |   2|   push #4   WHILEM_A_max
                             |   2|        #3   WHILEM_A_max
                             |   2|        #2   CURLYX_end yes
   0 <abcdef> <g>            |   2|   4:POSIXD[\w](5)

Also, replace curd (current depth) var with a positive integer offset
(i) var, to avoid signed/unsigned mixing problems.

David Mitchell [Sat, 11 Feb 2017 11:53:41 +0000 (11:53 +0000)]

fix pad/scope issue in re_evals

RT #129881 heap-buffer-overflow Perl_pad_sv

In some circumstances involving a pattern which has embedded code blocks
from more than one source, e.g.

my $r = qr{(?{1;}){2}X};
"" =~ /$r|(?{1;})/;

the wrong PL_comppad could be active while doing a LEAVE_SCOPE() or on
exit from the pattern.

This was mainly due to the big context stack changes in 5.24.0 - in
particular, since POP_MULTICALL() now does CX_LEAVE_SCOPE(cx) *before*
restoring PL_comppad, the (correct) unwinding of any SAVECOMPPAD's was
being followed by C<PL_comppad = cx->blk_sub.prevcomppad>, which wasn't
necessarily a sensible value.

To fix this, record the value of PL_savestack_ix at entry to S_regmatch(),
and set the cx->blk_oldsaveix of the MULTICALL to this value when pushed.
On exit from S_regmatch, we either POP_MULTICALL which will do a
LEAVE_SCOPE(cx->blk_oldsaveix), or in the absence of any EVAL, do the
explicit but equivalent LEAVE_SCOPE(orig_savestack_ix).

Note that this is a change in behaviour to S_regmatch() - formerly it
wouldn't necessarily clear the savestack completely back to the point of
entry - that would be left to its caller, S_regtry(), or indirectly
by Perl_regexec_flags(). This shouldn't make any practical difference, but
is tidier and less likely to introduce bugs later.

jdhedden [Tue, 14 Feb 2017 00:46:37 +0000 (19:46 -0500)]

Upgrade to Thread::Queue 3.12

Karl Williamson [Tue, 14 Feb 2017 02:57:53 +0000 (19:57 -0700)]

Make _byte_dump_string() usable in all of core

I found myself needing this function for development debugging, which
formerly was only usable from utf8.c. This enhances it to allow a
second format type, and makes it core-accessible.

Karl Williamson [Tue, 14 Feb 2017 02:22:59 +0000 (19:22 -0700)]

toke.c: Make sure things are initialized

Commit 3dd4eaeb8ac39e08179145b86aedda36584a3509 fixed a bug wherein the
tr/// operator parsing code could be looking at uninitialized data.
This happens only because we try to carry on when we find errors, so as
to find as many errors as possible in a single run, as a convenience to
the person debugging the script being compiled.  And we failed to
initialize stuff upon getting an error; stuff that was later looked at
by tr///.

That commit fixed the ticket by making sure the things mentioned there
got initialized upon error, but didn't handle the various other places
in the loop where the same thing could happen.

At the time, I thought it would be easier to instead change the tr///
handling code to know that its inputs were problematic, and to avoid
looking at them in that case.  This is easily done, and would
automatically catch all the cases in the loop, now and any added in the
future.

But then I thought, maybe tr/// isn't the only operator that could be
thrown off by this.  It is the most obvious one, to someone who knows
how it goes about getting compiled; but there may be other operators
that I don't know how they get compiled and have the same or a similar
problem.  The better solution then would be to extend
3dd4eaeb8ac39e08179145b86aedda36584a3509 to make sure everything gets
initialized when there is an error.  That is what this current commit
does.

The previous few commits have refactored things so as to minimize the
number of places that need to be handled here, down to three.    I kinda
doubt that new constructs will be added, at this stage in the language
development, that would require the same initialization handling.  But,
if they were, hopefully those doing it would follow the existing
paradigm that this commit and 3dd4eaeb8ac39e08179145b86aedda36584a3509
establish.

Another way to handle this would have been, instead of doing an
initialize-and-'continue', to jump to a common label at the
bottom of the loop which does the initialization.  I think it doesn't
matter much which, so left it as this.

Karl Williamson [Tue, 14 Feb 2017 02:18:38 +0000 (19:18 -0700)]

toke.c: Quit now if error at end of input

In these two cases, we know we are at the end of the input, and that we
have an error. There is no need to try to patch things up so we can
continue to parse looking for other errors; there's nothing left to
parse. So skip having to deal with patching up.

Karl Williamson [Tue, 14 Feb 2017 02:12:31 +0000 (19:12 -0700)]

toke.c: Un-special case something

By refactoring slightly, we make this code in a switch statement
have the same entrance and exit invariants as the other cases, so they
all can be handled uniformly at the end of the switch.

Karl Williamson [Tue, 14 Feb 2017 02:01:46 +0000 (19:01 -0700)]

Don't try to compile a pattern known to be in error

Regular expression patterns are parsed by the lexer/toker, and then
compiled by the regex compiler.  It is foolish to try to compile one
that the parser has rejected as syntactically bad; assumptions may be
violated and segfaults ensue.  This commit abandons all parsing
immediately if a pattern had errors in it.  A better solution would be
to flag this pattern as not to be compiled, and continue parsing other
things so as to find the most errors in a single attempt, but I don't
think it's worth the extra effort.

Making this change caused some misleading error messages in the test
suite to be replaced by better ones.

commit | commitdiff | tree

Karl Williamson [Tue, 14 Feb 2017 01:49:52 +0000 (18:49 -0700)]

toke.c: Add internal function to abort parsing

This is to be called to abort the parsing early, before the required
number of errors have been found.  It is used when continuing the parse
would be fruitless, or when we could be looking at garbage.

commit | commitdiff | tree

Karl Williamson [Tue, 14 Feb 2017 01:46:30 +0000 (18:46 -0700)]

toke.c: White-space only

Indent after the previous commit enclosed this code in a new block.

commit | commitdiff | tree

Karl Williamson [Tue, 14 Feb 2017 01:27:02 +0000 (18:27 -0700)]

Relax internal function API

This changes yyerror_pvn so that its first parameter can be NULL. This
indicates no message is to be output, but that parsing is to be
abandoned immediately, without waiting for more errors to build up.
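
A rough sketch of the calling convention described above, under stated
assumptions: report_error() and struct parser are hypothetical stand-ins
(this is not the real yyerror_pvn signature), and the error limit of 10
is only an assumed value for the example.

```c
#include <stdio.h>
#include <stddef.h>

#define SKETCH_MAX_ERRORS 10   /* assumed limit, for illustration */

struct parser { int error_count; int aborted; };

/* Hypothetical reporter: a NULL message means "output nothing, but
 * abandon parsing now instead of waiting for more errors". */
void report_error(struct parser *p, const char *msg, size_t len)
{
    if (msg == NULL) {
        p->aborted = 1;
        return;
    }
    fprintf(stderr, "%.*s\n", (int)len, msg);
    if (++p->error_count >= SKETCH_MAX_ERRORS)
        p->aborted = 1;           /* too many errors accumulated */
}
```

A caller that already knows the rest of the input is unusable would
call report_error(&p, NULL, 0) rather than inventing a message.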

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 20:18:38 +0000 (13:18 -0700)]

Extract code into a function

This creates a function in toke.c to output the compilation aborted
message, changing perl.c to call that function. This is in preparation
for this to be called from a second place.

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 20:03:32 +0000 (13:03 -0700)]

toke.c: Rmv no longer necessary UTF-8 checks

The previous commit tightened up the checking for well-formed UTF8ness,
so that the ones removed here were redundant.

The test during a string eval may also no longer be necessary, but since
there are many ways to create that string, I'm not confident enough to
remove it.

commit | commitdiff | tree

Karl Williamson [Tue, 14 Feb 2017 03:40:57 +0000 (20:40 -0700)]

Add test for [perl #130675]

commit | commitdiff | tree

Karl Williamson [Tue, 14 Feb 2017 03:37:47 +0000 (20:37 -0700)]

t/lib/croak/toke_l1: Cut down test

The previous commits hardening toke.c against malformed UTF-8 input have
allowed this test case to be cut down substantially.

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 19:35:18 +0000 (12:35 -0700)]

toke.c: Fix bugs where UTF-8 is turned on in mid chunk

Previous commits have tightened up the checking of UTF-8 for
well-formedness in the input program or string eval.  This is done in
lex_next_chunk and lex_start.  But it doesn't handle the case of

    use utf8; foo

because 'foo' is checked while UTF-8 is still off.  This solves that
problem by noticing when utf8 is turned on, and then rechecking at the
next opportunity.

See thread beginning at
http://nntp.perl.org/group/perl.perl5.porters/242916

This fixes [perl #130675].  A test will be added in a future commit.

This catches some errors earlier than before and aborts, so some tests
in the suite had to be split into multiple parts.
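
The idea of "notice when utf8 is turned on, then recheck" might be
sketched like this.  Everything here is hypothetical (struct lexer,
lexer_enable_utf8, lexer_check_pending); it is not how lex_next_chunk
or lex_start are written, and the validator is a simplified structural
check that does not reject every overlong or surrogate sequence.

```c
#include <stddef.h>
#include <stdbool.h>

/* Simplified check: valid lead byte followed by the right number of
 * continuation bytes. */
bool utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;
        if (c < 0x80)                    need = 0;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else return false;               /* not a valid lead byte */
        if (need > len - i - 1)
            return false;                /* sequence is truncated */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;            /* not a continuation byte */
        i += need + 1;
    }
    return true;
}

struct lexer {
    const unsigned char *buf;
    size_t len;
    size_t pos;       /* current parse position in buf */
    bool utf8;        /* is UTF-8 interpretation on? */
    bool recheck;     /* set when utf8 was switched on mid-chunk */
};

/* Called when the equivalent of "use utf8" takes effect. */
void lexer_enable_utf8(struct lexer *lx)
{
    if (!lx->utf8) {
        lx->utf8 = true;
        lx->recheck = true;   /* the rest of the chunk was never checked */
    }
}

/* Called at the next opportunity; false means malformed input. */
bool lexer_check_pending(struct lexer *lx)
{
    if (!lx->recheck)
        return true;
    lx->recheck = false;
    return utf8_is_well_formed(lx->buf + lx->pos, lx->len - lx->pos);
}
```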

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 19:16:45 +0000 (12:16 -0700)]

mg.c: PL_hints is unsigned

Therefore it's dangerous to presume things fit into an IV.
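
As a generic illustration of the hazard (not the mg.c code): when the
signed type is no wider than the unsigned one, values with the top bit
set do not fit.  uint32_t and int32_t below are stand-ins chosen for
the example, not Perl's own typedefs.

```c
#include <stdint.h>
#include <stdbool.h>

/* Does a 32-bit unsigned value fit in a 32-bit signed integer? */
bool fits_in_int32(uint32_t hints)
{
    return hints <= (uint32_t)INT32_MAX;
}
```

Any flag word that uses bit 31 fails this check, so it has to be kept
in an unsigned type.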

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 19:02:54 +0000 (12:02 -0700)]

toke.c: Add branch prediction

The input is far more likely to be well-formed than not.
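
A minimal sketch of this kind of hint, assuming a GCC- or
Clang-compatible compiler.  SKETCH_UNLIKELY and first_bad_byte are
made-up names for the example; the check for 0xFF stands in for a real
malformation test, since that byte is never valid in standard UTF-8.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_UNLIKELY(x) (x)   /* no hint available */
#endif

/* Return the index of the first 0xFF byte, or len if there is none. */
size_t first_bad_byte(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (SKETCH_UNLIKELY(s[i] == 0xFF))   /* the rare, malformed path */
            return i;
    }
    return len;
}
```

The hint changes only how the compiler lays out the code, not the
result.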

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 18:54:06 +0000 (11:54 -0700)]

toke.c: Fix comments describing S_tokeq

The comments about what this function does were incorrect.

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 18:51:59 +0000 (11:51 -0700)]

toke.c: Slight refactor.

This moves an automatic variable closer to the only place it is used;
it also adds branch prediction. It is likely that the input will be
well-formed.

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 18:29:35 +0000 (11:29 -0700)]

toke.c: White space, comments, braces

I am adding the braces because in one of the areas, the lack of braces
had led to a blead failure.

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 18:31:53 +0000 (11:31 -0700)]

toke.c: Don't compare same bytes twice

Before starting this memEQ, we know that the first bytes are the same,
so might as well start the compare with the 2nd bytes.
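
Sketched with plain memcmp rather than memEQ, and with a hypothetical
helper name, the optimization looks like this.  It assumes the caller
has already established that len is at least 1 and that the first
bytes match.

```c
#include <string.h>
#include <stdbool.h>

/* Precondition (assumed): len >= 1 and a[0] == b[0] already checked. */
bool rest_equal(const char *a, const char *b, size_t len)
{
    /* Skip byte 0; compare only the remaining len - 1 bytes. */
    return memcmp(a + 1, b + 1, len - 1) == 0;
}
```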

commit | commitdiff | tree

Karl Williamson [Mon, 13 Feb 2017 18:15:07 +0000 (11:15 -0700)]

toke.c: Move declaration

This automatic variable doesn't need such a large scope.
