perl5.git.perl.org Git - perl5.git/log

locale.c: Failure to build if not allowing LC_COLLATE

This is part of [perl #133696]. A typo was causing a macro to be
defined in terms of itself, hence an illegal recursive definition.

Karl Williamson [Thu, 29 Nov 2018 18:53:58 +0000 (11:53 -0700)]

locale.c: Don't use numeric unless LC_NUMERIC

This commit wraps in #ifdef a use of a variable that isn't valid unless
the system has LC_NUMERIC.

Karl Williamson [Thu, 29 Nov 2018 18:50:58 +0000 (11:50 -0700)]

locale.c: Fix wrong scope of #if's

The function print_bytes_for_locale() should be defined if DEBUGGING;
prior to this commit it didn't get defined unless LC_COLLATE was
defined on the platform.

Eugen Konkov [Thu, 29 Nov 2018 17:56:07 +0000 (10:56 -0700)]

More removals of $a, $b in perldata for [perl #133700]

Eugen Konkov [Thu, 29 Nov 2018 17:21:20 +0000 (10:21 -0700)]

PATCH: [perl #133700] avoid use of $a and $b in perldata

Tony Cook [Thu, 29 Nov 2018 03:16:03 +0000 (14:16 +1100)]

make boot_Win32CORE extern "C" for C++ builds

Tony Cook [Thu, 29 Nov 2018 03:12:01 +0000 (14:12 +1100)]

stdio.h on Cygwin doesn't expose cuserid() with _GNU_SOURCE

It's probably possible to expose it by setting _XOPEN_SOURCE to
some specific value, but this appears to be a bug.

https://cygwin.com/ml/cygwin/2018-11/msg00230.html

Tony Cook [Wed, 28 Nov 2018 23:50:19 +0000 (23:50 +0000)]

Pass a UV to a format expecting a UV

MAX_LEGAL_CP can end up as int depending on the ranges of the types
involved, causing a type mismatch on the format in cp_above_legal_max.

By adding the cast to the macro definition, we not only prevent the type
mismatch on the format but may also allow some static analysis tools to
detect comparisons against signed types, which are likely errors.
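Here is a minimal sketch of the effect. It assumes a 64-bit UV and uses hypothetical `UV_SKETCH`, `IV_MAX_SKETCH` and `MAX_LEGAL_CP_SKETCH` names in place of perl's real headers. Because the cast sits inside the macro, every use has the unsigned type that the format directive expects.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for perl's UV and IV_MAX, assumed 64-bit here. */
typedef uint64_t UV_SKETCH;
#define IV_MAX_SKETCH INT64_MAX

/* The cast lives in the macro, so every use has type UV_SKETCH. */
#define MAX_LEGAL_CP_SKETCH ((UV_SKETCH)IV_MAX_SKETCH)

/* Prints a diagnostic when cp is above the maximum.  Both arguments are
   UV_SKETCH values, matching the PRIX64 directives in the format. */
bool report_if_above_max(FILE *out, UV_SKETCH cp)
{
    if (cp <= MAX_LEGAL_CP_SKETCH)
        return false;
    fprintf(out, "0x%" PRIX64 " is above the legal maximum 0x%" PRIX64 "\n",
            cp, MAX_LEGAL_CP_SKETCH);
    return true;
}

/* Compile-time type check (C11 _Generic): 1 if x has type UV_SKETCH. */
#define HAS_UV_TYPE(x) _Generic((x), UV_SKETCH: 1, default: 0)
```

The `HAS_UV_TYPE` helper exists only so the macro's type can be checked. It is not part of the technique itself.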

jdhedden [Wed, 28 Nov 2018 03:53:40 +0000 (22:53 -0500)]

Upgrade to threads::shared 1.59

Committer: perldelta entry

Karl Williamson [Tue, 27 Nov 2018 17:07:12 +0000 (10:07 -0700)]

Add USE_THREAD_SAFE_LOCALE to non-bin-compat options list

Spotted by Tux

Karl Williamson [Tue, 27 Nov 2018 16:43:01 +0000 (09:43 -0700)]

regcomp.c: White-space only

Vertical align continuation chars in a macro

Karl Williamson [Tue, 27 Nov 2018 16:42:45 +0000 (09:42 -0700)]

Add regnode EXACTFU_ONLY8

This is a regnode that otherwise would be an EXACTFU except that it
contains a code point that requires UTF-8 to match, including all the
possible folds involving it. Hence if the target string isn't UTF-8, we
know it can't possibly match, without needing to try.

For completeness, there could also be an EXACTFAA_ONLY8 and an
EXACTFL_ONLY8 created, but I think these are unlikely to actually appear
in the wild, since using /aa is mainly about ASCII, and /l mostly will
involve characters that don't require UTF-8.

Karl Williamson [Sat, 17 Nov 2018 22:51:19 +0000 (15:51 -0700)]

Add regnode EXACT_ONLY8

This is a regnode that otherwise would be an EXACT except that it
contains a code point that requires UTF-8 to represent. Hence if the
target string isn't UTF-8, we know it can't possibly match, without
needing to try.
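Below is a rough sketch of the early rejection described above. It assumes, as on ASCII platforms, that a code point "requires UTF-8" when it is above 0xFF, because such a code point cannot occur in a non-UTF-8 byte string. The function names are hypothetical, and this is not regexec.c's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On an ASCII platform, a code point above 0xFF cannot appear in a
   non-UTF-8 (one byte per character) string. */
static bool cp_requires_utf8(uint32_t cp)
{
    return cp > 0xFF;
}

/* Decided once, when the node is built: does the literal contain such a
   code point?  If so, an EXACT_ONLY8-style node would be used. */
bool literal_needs_utf8(const uint32_t *cps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp_requires_utf8(cps[i]))
            return true;
    return false;
}

/* Decided at match time: false means "cannot match", without trying. */
bool worth_trying(bool node_is_only8, bool target_is_utf8)
{
    return target_is_utf8 || !node_is_only8;
}
```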

Karl Williamson [Sat, 17 Nov 2018 22:47:02 +0000 (15:47 -0700)]

regcomp.c: Use common code instead of duplicating it

The common code is about to get more complicated, so use it instead of a
copy.

David Mitchell [Tue, 27 Nov 2018 16:36:35 +0000 (16:36 +0000)]

perlreref.pod: disambiguate "code"

It says:

   (?(cond)yes)      Conditional expression, where "cond" can be:
                     (?=pat)   lookahead
                     ...

A strict reading of that is that there must be two pairs of parens
in each conditional construct, e.g. (?((?=pat))yes).

Make the text clearer.

David Mitchell [Tue, 27 Nov 2018 13:26:39 +0000 (13:26 +0000)]

handle /(?(?{code}))/ mixed compile- and run-time

Where a runtime pattern contains both compile-time and run-time code
blocks, e.g.:

    $re = '(?{ RRR })';
    / $re X(?{ CCC })Y/

The compile-time code-block CCC is parsed at the same time as the
surrounding text. The runtime code RRR is parsed at runtime by
constructing a fake pattern and re-parsing it, but with any compile-time
code-blocks blanked out (so they don't get compiled twice). The compiled
regex is then thrown away, but any optrees just created for the runtime
code blocks are kept.

For example at runtime, the re-parsed pattern looks like:

    / (?{ RRR }) X__________Y/

Unfortunately this was failing for the conditional pattern, e.g.

    / $re X(?(?{ CCC }))Y/

which was getting blanked as

    / (?{ RRR }) X(?_______)Y/

which isn't valid syntax.

This commit blanks (?{...}) into (?=====) instead, which is always legal.
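Here is a small sketch of the new blanking. It assumes the caller already knows the offsets of a compile-time code block of the form `(?{ ... })`, and the function name is hypothetical. The sketch keeps the leading `(?` and the closing `)` and fills everything between them with `=`. That yields a lookahead of the same length, so surrounding offsets don't move and a conditional like `(?(?{ CCC }))` stays well-formed.

```c
#include <stddef.h>
#include <string.h>

/* Overwrites the code block pat[start, end), assumed to look like
   "(?{ ... })", with "(?=...=)" of the same length.  The result is a
   lookahead, legal both on its own and as the condition of (?(cond)yes). */
void blank_code_block(char *pat, size_t start, size_t end)
{
    /* Need room for "(?" before and ")" after the overwritten region. */
    if (end < start + 3)
        return;
    memset(pat + start + 2, '=', end - start - 3);
}
```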

Aaron Crane [Tue, 27 Nov 2018 11:32:11 +0000 (11:32 +0000)]

Rely on C89 "const"

The metaconfig probe for <db.h> previously relied on the d_const symbol set
by the "const" probe, so generating Configure here has been done against
metaconfig commit 1204d4627a06b11f16620188f3fa83159ed35fd9 which changes
that.

Thanks to khw++ for pointing out this oversight in my attempt last year to
make the codebase rely on C89.

Karl Williamson [Sun, 18 Nov 2018 22:46:07 +0000 (15:46 -0700)]

regexec.c: Use ANYOF bitmap lookup in more cases

ANYOFish nodes have a bitmap.  If we know the value is in the bitmap
range, then flags that apply to out-of-range values are irrelevant.
Other flags being set indicate that the desired answer is more
complicated than just using a bitmap lookup.  But exclude this
irrelevant flag from that calculation when we know the value is in the
bitmap.

There are other flags that it is possible to exclude, but not without
further conditionals or unsharing code, and they are either rarely set or
are for node types, like /l and /d, whose performance we don't worry so
much about.  The changes introduced by this commit are determined at .c
compile time except for a runtime mask, and hence don't introduce new
branches that may disrupt the instruction pipeline.

Karl Williamson [Sun, 18 Nov 2018 22:24:45 +0000 (15:24 -0700)]

regexec.c: Refactor expanded macro from prev. commit

Karl Williamson [Sun, 18 Nov 2018 22:12:02 +0000 (15:12 -0700)]

regexec.c: Expand out macro in only remaining use

There is only one call to this macro. It's easier to understand if
expanded out instead of the call. The next commit will refactor.

Karl Williamson [Wed, 21 Nov 2018 03:59:18 +0000 (20:59 -0700)]

regcomp.c: Clarify comment

Karl Williamson [Sun, 18 Nov 2018 21:52:15 +0000 (14:52 -0700)]

regcomp.c: Initialize a variable more conservatively

It doesn't matter currently, but this variable shouldn't be TRUE unless
/i is in effect.

Karl Williamson [Sun, 18 Nov 2018 20:39:49 +0000 (13:39 -0700)]

regcomp.c: Use a weird value in a place where ignored

This way, the value won't be mistaken for a legal one, and should it stop
being ignored in the called function, it will show up as a problem much
sooner.

Karl Williamson [Sat, 17 Nov 2018 23:02:31 +0000 (16:02 -0700)]

regexec.c: Add comment

Karl Williamson [Sat, 17 Nov 2018 22:54:34 +0000 (15:54 -0700)]

regexec.c: Rmv unused macros

Karl Williamson [Sun, 18 Nov 2018 23:06:50 +0000 (16:06 -0700)]

regcomp.h: Clarify comments

Karl Williamson [Thu, 15 Nov 2018 18:09:35 +0000 (11:09 -0700)]

regcomp.c: Consolidate duplicated code into 1 place

Karl Williamson [Wed, 21 Nov 2018 04:21:44 +0000 (21:21 -0700)]

regcomp.c: Use better method for setting debug offsets

This was changing the parse pointer around some code and restoring it
afterwards. The purpose must have been to get the debug offsets
correct. But there is a better way to do that, which doesn't take up
cycles on a non-Debugging build, and that is to set the offsets
directly.

Karl Williamson [Wed, 21 Nov 2018 04:18:51 +0000 (21:18 -0700)]

regcomp.c: Remove another sizing pass relic

We no longer play with the emit ptr in this function, so no need to save
and restore it.

Karl Williamson [Sun, 25 Nov 2018 19:14:49 +0000 (12:14 -0700)]

Move isPOWER_OF_2() macro to handy.h

This is in preparation for it to be used outside of the file which
previously defined it.
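One common way to write such a macro is sketched below. The actual definition in handy.h may differ in detail. The sketch relies on `n & (n - 1)` clearing the lowest set bit, which leaves zero exactly when only one bit was set.

```c
/* Sketch: true iff the unsigned integer n has exactly one bit set.
   n & (n - 1) clears the lowest set bit; the leading (n) excludes 0. */
#define isPOWER_OF_2(n) ((n) && (((n) & ((n) - 1)) == 0))
```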

Karl Williamson [Sun, 25 Nov 2018 19:12:19 +0000 (12:12 -0700)]

regen/mk_invlists.pl: Generate a new value

The new value is the maximum number of code points that fold to any
single code point. It will be used in a future commit.

James E Keenan [Mon, 26 Nov 2018 17:29:03 +0000 (12:29 -0500)]

Rename local variable to prevent confusion with global

Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2157860312

For: RT # 133686 (partial)

James E Keenan [Mon, 26 Nov 2018 17:14:51 +0000 (12:14 -0500)]

Rename global variable to prevent confusion with local

Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2157860312

Tony Cook [Mon, 26 Nov 2018 03:21:13 +0000 (14:21 +1100)]

perldelta for 404395d24bc8, 640e129d0fc4 and 85d2f7cacba4

Tony Cook [Tue, 20 Nov 2018 23:05:27 +0000 (10:05 +1100)]

(perl #133659) make an in-place edit successful if the exit status is zero during global destruction.

This means that code like:

  perl -i -ne '...; last'

will replace the input file with the in-place edit output of the file,
but:

  perl -i -ne '...; die'

or

  perl -i -ne '...; exit 1'

won't.

Tony Cook [Tue, 20 Nov 2018 05:43:43 +0000 (16:43 +1100)]

(perl #133659) tests for global destruction handling of inplace editing

Tony Cook [Tue, 20 Nov 2018 04:30:20 +0000 (15:30 +1100)]

(perl #133659) move argvout cleanup to a new function

James E Keenan [Fri, 23 Nov 2018 16:49:03 +0000 (11:49 -0500)]

Remove 1 comparison whose result is always the same.

Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840804

For: RT 133686 (partial)

James E Keenan [Fri, 23 Nov 2018 18:45:25 +0000 (13:45 -0500)]

Eliminate empty conditional branch

Per: https://lgtm.com/projects/g/Perl/perl5/alerts/?mode=tree&ruleFocus=2154840803

For: RT 133686 (partial)

James E Keenan [Sat, 24 Nov 2018 00:17:38 +0000 (19:17 -0500)]

AltaVista is no more.

Provide a non-Mountain View alternative.

For: RT # 133684

James E Keenan [Fri, 23 Nov 2018 22:13:59 +0000 (17:13 -0500)]

Split NAME line on multiple whitespaces

For:  RT # 133683

pod/perlmodlib.pod is a file generated by pod/perlmodlib.PL, which is
run by 'miniperl' during 'make'.  That program parses the 'NAME' header
of .pod files and fragments of POD found in 'regen/opcode.pl'.  The POD
for B::Op_private is one such fragment.  Removing superfluous
whitespace from that fragment did not suffice to prevent the downstream
formatting error reported in the RT -- an error visible with 'pod2text'
and 'pod2html' as well.  We also had to make the regex which
perlmodlib.PL uses to parse the 'NAME' header more flexible.

Dagfinn Ilmari Mannsåker [Thu, 22 Nov 2018 12:14:38 +0000 (12:14 +0000)]

perlfunc: clarify reset EXPR behaviour

Mention that it only affects variables in the current package, and
that it affects scalars, arrays and hashes ("variables and arrays"
is just confus(ed|ing)).

Karen Etheridge [Wed, 21 Nov 2018 17:40:19 +0000 (09:40 -0800)]

oops, typo

Tomasz Konojacki [Wed, 21 Nov 2018 08:26:31 +0000 (09:26 +0100)]

optimize IV -> UV conversions

This commit replaces all instances of code that looks like this:

    uv = (iv == IV_MIN) ? (UV)iv : (UV)(-iv);

with the simpler and more efficient:

    uv = -(UV)iv;

While -iv indeed results in undefined behaviour when iv == IV_MIN,
-(UV)iv is perfectly well defined and does the right thing.

The C standard guarantees that the result of (UV)iv (for negative iv) is
equal to iv + UV_MAX + 1 (see 6.3.1.3, paragraph 2 in C11). It also
guarantees that the result of -uv is UV_MAX - uv + 1 (6.2.5,
paragraph 9).

That means that the result of -(UV)iv is UV_MAX - (iv + UV_MAX + 1) + 1
which is equal to -iv for *all* possible negative values of iv.

[perl #133677]
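The two forms can be compared directly in a small sketch. It assumes a 64-bit IV/UV, modelled here with hypothetical `IV_SKETCH`/`UV_SKETCH` typedefs. As in the code the commit touches, the functions are only meant for negative arguments. For those, both return the magnitude of `iv`, including for INT64_MIN, but the new form needs no special case.

```c
#include <stdint.h>

/* Stand-ins for perl's IV/UV, assumed 64-bit here. */
typedef int64_t  IV_SKETCH;
typedef uint64_t UV_SKETCH;

/* Old form, for iv < 0: branches around -iv, which would overflow
   (undefined behaviour) when iv == INT64_MIN. */
UV_SKETCH neg_magnitude_old(IV_SKETCH iv)
{
    return (iv == INT64_MIN) ? (UV_SKETCH)iv : (UV_SKETCH)(-iv);
}

/* New form, for iv < 0: convert first (iv + UV_MAX + 1), then negate in
   unsigned arithmetic (UV_MAX - uv + 1), which yields -iv for every
   negative iv with no undefined behaviour. */
UV_SKETCH neg_magnitude_new(IV_SKETCH iv)
{
    return -(UV_SKETCH)iv;
}
```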

David Mitchell [Wed, 21 Nov 2018 12:09:45 +0000 (12:09 +0000)]

S_hv_delete_common(): avoid undefined behaviour

The sanitizer (-fsanitize=undefined) was tripping on the second of these
two lines:

    svp = AvARRAY(isa);
    end = svp + AvFILLp(isa)+1;

In the case where svp is NULL and AvFILLp(isa) is -1, the first addition
is undefined behaviour. Add the 1 first, so that it becomes
svp + (-1+1), which is safe.

Dominic Hargreaves [Wed, 21 Nov 2018 10:49:39 +0000 (10:49 +0000)]

lgtm.yml: fix erroneous inclusion

Karen Etheridge [Tue, 20 Nov 2018 22:46:24 +0000 (14:46 -0800)]

add entries for Module-CoreList 5.0181220

I did this manually; I am not sure if 'perl -Ilib Porting/corelist.pl
cpan' would add these entries on the next blead-point release day.

Karen Etheridge [Tue, 20 Nov 2018 22:46:05 +0000 (14:46 -0800)]

bump version of released Module-CoreList

Karen Etheridge [Tue, 20 Nov 2018 22:24:49 +0000 (14:24 -0800)]

Bump the perl version in various places for 5.29.6

Karen Etheridge [Tue, 20 Nov 2018 22:18:38 +0000 (14:18 -0800)]

new perldelta for 5.29.6

Karen Etheridge [Tue, 20 Nov 2018 22:17:22 +0000 (14:17 -0800)]

tick off 5.29.5 release

Karen Etheridge [Tue, 20 Nov 2018 22:16:44 +0000 (14:16 -0800)]

epigraph for 5.29.5 release

Karen Etheridge [Tue, 20 Nov 2018 20:47:18 +0000 (12:47 -0800)]

add new release to perlhist

Karen Etheridge [Tue, 20 Nov 2018 20:24:47 +0000 (12:24 -0800)]

finalize perldelta for 5.29.5

Karen Etheridge [Tue, 20 Nov 2018 20:24:04 +0000 (12:24 -0800)]

RMG tweaks

Karen Etheridge [Tue, 20 Nov 2018 20:23:54 +0000 (12:23 -0800)]

fix whitespace

Karen Etheridge [Tue, 20 Nov 2018 20:19:07 +0000 (12:19 -0800)]

update list of customized files

Karen Etheridge [Tue, 20 Nov 2018 19:36:53 +0000 (11:36 -0800)]

Update Module::Corelist for 5.29.5

Karen Etheridge [Tue, 20 Nov 2018 18:57:43 +0000 (10:57 -0800)]

5.22 and 5.24 are now outside the regular release window

Karl Williamson [Tue, 20 Nov 2018 15:55:43 +0000 (08:55 -0700)]

regcomp.c: Rmv malformed assert()

This had a plain '=' instead of an '==', thus setting the value in the
process of asserting it, hence would always be true. But changing it to
'==' causes the assertion to fail in various cases.

I need to rethink this, so in the meantime am simply removing it.

Spotted by Dave Mitchell++
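Here is a minimal illustration of this class of bug. It is not the removed assert() itself, and the function names are hypothetical. With `=`, the assertion assigns and then tests the assigned value. It therefore passes for any non-zero constant and changes the variable as a side effect. GCC's -Wparentheses warning targets this pattern in ordinary conditions.

```c
#include <assert.h>

/* Buggy: '=' assigns 5 to x and then asserts the result, 5, which is
   always true, so the check can never fail and x is silently changed.
   (With NDEBUG defined, the assignment disappears as well.) */
int check_with_assignment(int x)
{
    assert(x = 5);
    return x;
}

/* Intended: '==' only compares; x is left unchanged. */
int check_with_comparison(int x)
{
    assert(x == 5);
    return x;
}
```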

David Mitchell [Tue, 20 Nov 2018 10:59:22 +0000 (10:59 +0000)]

t/perf/benchmarks.t: improve do error checks

Make the checks for "do 't/perf/benchmarks'" look more like those
suggested for 'do' in perlfunc.

In particular, this may help track down the issue in RT #133663.

Niko Tyni [Sat, 17 Nov 2018 17:27:42 +0000 (19:27 +0200)]

Make Errno_pm.PL compatible with /usr/include/<ARCH>/errno.h

As seen in <https://bugs.debian.org/798955>, Debian glibc
maintainers intend to move system header files from /usr/include to
/usr/include/<arch> at some point. This would break Errno_pm.PL, which
has logic for asking cpp for the location of errno.h but fails earlier
if errno.h is not on a list of known paths.

Take the cpp fallback instead of dying. The behaviour should stay
identical as long as errno.h is not moved.

This will also enable multiarch (non-sysroot) cross builds.

Original patch by Helmut Grohne.

Bug-Debian: https://bugs.debian.org/875921

Tony Cook [Mon, 19 Nov 2018 23:25:59 +0000 (10:25 +1100)]

James Clarke is now a perl author

James Clarke [Mon, 19 Nov 2018 14:25:56 +0000 (14:25 +0000)]

Also work around renameat() kernel bug on GNU/kFreeBSD

Karl Williamson [Mon, 19 Nov 2018 20:59:56 +0000 (13:59 -0700)]

Allow forcing use of POSIX 2008 locale fcns

On unthreaded builds, these thread-safe functions are normally not used;
the library functions that have long been used are retained instead.
But it is now possible to tell Configure to use them on unthreaded
builds too.
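Below is a minimal sketch of the POSIX 2008 interface in question. It assumes a platform that provides `newlocale()`, `uselocale()` and `freelocale()` in <locale.h>. The locale change applies only to the calling thread, which is what makes these functions the thread-safe choice. This is not perl's locale.c code, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Formats x using the "C" locale for the calling thread only, then
   restores that thread's previous locale.  Returns snprintf's result,
   or -1 if the locale object can't be created. */
int format_in_c_locale(char *buf, size_t len, double x)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t prev = uselocale(c_loc);   /* switch this thread only */
    int n = snprintf(buf, len, "%.2f", x);
    uselocale(prev);                    /* restore the previous locale */

    freelocale(c_loc);
    return n;
}
```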

Karl Williamson [Mon, 19 Nov 2018 20:55:04 +0000 (13:55 -0700)]

t/harness: Run tests for IO::Zlib sequentially

Most of these failed for me inexplicably in one run. This typically
means there was a glitch, and it's likely that the tests somehow
interfered with each other. Rather than take the time to investigate
further, I changed harness to run the tests for this distribution
sequentially.

Karl Williamson [Sun, 18 Nov 2018 20:18:49 +0000 (13:18 -0700)]

PATCH: [perl #133649] fails on OpenBSD-6.4 unthreaded

OpenBSD has a (perfectly legal) different syntax for the string
parameter to setlocale(), so the failing tests are not actually valid on
that platform. Rather than go to the significant effort of creating a
Configure probe to find out which platforms use which syntax, simply skip
these tests on this platform. The tests aren't skipped if built with
threads, as they help make sure that the code for thread-safe handling of
locales works in the face of different syntaxes.

Dominic Hargreaves [Mon, 19 Nov 2018 19:17:17 +0000 (19:17 +0000)]

lgtm.yml: work around some incorrect classification

David Mitchell [Mon, 19 Nov 2018 16:28:03 +0000 (16:28 +0000)]

perlguts: clarify SV types which are scalars

The '< SVt_PVAV' entry looked to one reader like malformed HTML
rather than indicating a numerical range.

http://nntp.perl.org/group/perl.perl5.porters/252585

David Mitchell [Mon, 19 Nov 2018 14:12:05 +0000 (14:12 +0000)]

ext/File-Find: support parallel testing

t/harness was recently modified to run tests under ext/ etc in parallel.
ext/File-Find/t/ has two test scripts which both use the same temporary
directory names.
Make taint.t use different names, so that it can run in parallel with
the other script.

David Mitchell [Mon, 19 Nov 2018 13:52:46 +0000 (13:52 +0000)]

ext/GDBM_File/t/fatal.t: support parallel testing

t/harness was recently modified to run tests under ext/ etc in parallel.
ext/GDBM_File/t/ has two test scripts which both use the same filename.
Make fatal.t use a different name, so that it can run in parallel with
the other script.

David Mitchell [Mon, 19 Nov 2018 12:38:27 +0000 (12:38 +0000)]

autodoc.pl: escape POD

RT #133638

This script generates perlapi.pod, and contains snippets of POD
which it inserts into that file. The metacpan web site was interpreting
this as pod for autodoc.pl and displaying it.

Escape the pod by prefixing each line with '|'.

Chris 'BinGOs' Williams [Mon, 19 Nov 2018 11:12:22 +0000 (11:12 +0000)]

Update Maintainers.pl to match reality

David Mitchell [Mon, 19 Nov 2018 08:34:40 +0000 (08:34 +0000)]

perldelta for davem's commits

Tony Cook [Mon, 19 Nov 2018 04:04:34 +0000 (15:04 +1100)]

perldelta for dda4a47798d6

Tony Cook [Mon, 19 Nov 2018 03:54:46 +0000 (14:54 +1100)]

perldelta for 7d5be4b6, ea9daa76, 109d4d79, 36a4593d, 7f4a9bc7

Tony Cook [Mon, 19 Nov 2018 03:47:14 +0000 (14:47 +1100)]

perldelta for 191f8909fa4e

Tony Cook [Mon, 19 Nov 2018 03:04:20 +0000 (14:04 +1100)]

(perl #132147) improve robustness against corrupt SDBM databases

This merge makes a few changes to the SDBM database handling:

- in a few places, a corrupt page could be loaded, but despite
failing validation, it would still be cached, so a second call
would try to use the corrupt page, causing buffer overflows

- some code didn't validate on page load at all.

- adds three extra checks to the page validator

Tony Cook [Mon, 19 Nov 2018 02:58:33 +0000 (13:58 +1100)]

(perl #132147) only test corrupt dbs on archs that match

The files were generated for little-endian, sizeof(short) == 2 platforms,
so only run those tests on such platforms.
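The condition can be sketched in C as follows. The real check lives in the Perl test script, and these function names are hypothetical. Endianness is probed by looking at the lowest-addressed byte of a 16-bit value of 1.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the host stores the least significant byte first. */
bool host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* copy the lowest-addressed byte */
    return first == 1;
}

/* The pre-built corrupt databases only make sense on hosts whose
   layout matches the one they were generated on. */
bool corrupt_db_tests_apply(void)
{
    return host_is_little_endian() && sizeof(short) == 2;
}
```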

Tony Cook [Wed, 7 Nov 2018 00:16:10 +0000 (11:16 +1100)]

(perl #132147) add extra block validation checks

and a few extra tests that fuzz testing found.

Tony Cook [Tue, 6 Nov 2018 03:23:48 +0000 (14:23 +1100)]

(perl #132147) don't cache invalid pages

When sdbm loads its page buffer from disk, in most cases it validates
the page and doesn't continue processing if it fails validation.

Unfortunately, in a few places it still marked the buffer as loaded
from that page, and later calls would then use that cached page,
causing a variety of problems, including buffer read overflows.

sdbm_firstkey() didn't validate the loaded page at all; it now does.

All places that validate the loaded page now on a failed validation:
  - invalidate the cached page (set pagbno to -1)
  - set the I/O error flag on the database object
  - set errno ($!) to EINVAL

The first ensures that later calls don't end up using an invalid cached
page.

The others allow the caller to check whether an error has occurred.
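Here is a much-simplified sketch of the three steps listed above. Only `pagbno` comes from the commit message. The struct, the placeholder validator and the function names are hypothetical and do not reflect sdbm's real layout or checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, much-simplified database handle. */
struct sdbm_sketch {
    long pagbno;      /* page number held in pagbuf, or -1 for none */
    bool ioerr;       /* I/O error flag visible to callers */
    char pagbuf[64];  /* cached page contents */
};

/* Placeholder validator: treats a page as valid if its first byte is 0. */
static bool page_ok(const char *pag)
{
    return pag[0] == 0;
}

/* Called after pagbuf has been filled from page pagno.  On failure, drop
   the cache entry so later calls can't reuse the bad page, and record
   the error where the caller can see it. */
bool accept_loaded_page(struct sdbm_sketch *db, long pagno)
{
    if (!page_ok(db->pagbuf)) {
        db->pagbno = -1;   /* invalidate the cached page */
        db->ioerr = true;  /* set the I/O error flag */
        errno = EINVAL;    /* surfaces as $! on the Perl side */
        return false;
    }
    db->pagbno = pagno;    /* only now mark the buffer as holding pagno */
    return true;
}
```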

Tony Cook [Tue, 6 Nov 2018 03:12:53 +0000 (14:12 +1100)]

(perl #132147) add tests for corrupt files from tickets

Karl Williamson [Thu, 15 Nov 2018 17:57:24 +0000 (10:57 -0700)]

Add regnode NANYOFM

This matches when the existing node ANYOFM would not match; i.e., they
are complements.

I almost didn't create this node, but it turns out to significantly
speed up various classes of matches.  For example qr/[^g]/, both /i and
not, turn into this node; and something like

    (("a" x $large_number) . "b") =~ /[^a]/

goes through the string a word at a time, instead of previously
byte-by-byte.  Benchmarks are at the end of this message.
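Below is a portable sketch of the word-at-a-time idea for a complemented, ANYOFM-style test. It assumes the class is described by a byte `mask` and `base` such that a byte is in the class iff `(byte & mask) == base`. The function name is hypothetical, and this is not the regexec.c implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first byte b in s[0, len) with
   (b & mask) != base, i.e. the first byte a NANYOFM-style node would
   accept, or len if there is none.  Tests eight bytes per step. */
size_t find_first_unmasked(const unsigned char *s, size_t len,
                           unsigned char mask, unsigned char base)
{
    /* Replicate mask and base into every byte of a 64-bit word. */
    const uint64_t ones = UINT64_C(0x0101010101010101);
    const uint64_t wmask = ones * mask;
    const uint64_t wbase = ones * base;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);            /* alignment-safe load */
        if (((w & wmask) ^ wbase) != 0)  /* some byte in this word differs */
            break;                       /* locate it bytewise below */
    }
    for (; i < len; i++)
        if ((s[i] & mask) != base)
            return i;
    return len;
}
```

For `[^a]` the parameters are mask 0xFF and base 'a'. For `[^Gg]` on ASCII they are mask 0xDF and base 'G', since the two letters differ only in bit 0x20.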

This node gets generated when complementing any single ASCII character
and when complementing any ASCII case pair, like /[^Gg]/.  It never gets
generated if the class includes a character that isn't ASCII (actually
UTF-8 invariant, which matters only on EBCDIC platforms).

The details of when this node gets constructed are complicated.  It
happens when the bit patterns of the characters in the class happen to
have certain very particular characteristics, depending on the vagaries
of the character set.  [BbCc] will do so, but [AaBb] does not.  [^01]
does, but not [^12].  Precisely, look at the bit patterns of all the
characters in the set, and count the number of bit positions in which
they don't all agree, calling it 'n'.  If and only if the number of
characters is 2**n, this node gets generated.  As an example, on both
ASCII and EBCDIC, the last
4 bits of '0' are 0000; of '1' are 0001; of '2' are 0010; and of '3' are
0011.  The other 4 bits are the same for each of these 4 digits.  That
means that only 2 bits differ among the 4 characters, and 2**2==4, so
the NANYOFM node will get generated.  Similarly, 8=1000 and 0=0000
differ only in one bit so 2**1==2, and so [^08] will generate this node.
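The rule just described can be checked mechanically. This sketch assumes the set is given as distinct bytes, and the function name is hypothetical. Bits set in the AND of all members are shared by every member. Bits set in the OR but not the AND differ somewhere. Every member lies among the 2**n combinations of the differing bits, so having exactly 2**n members means the set is exactly `{ b : (b & mask) == base }`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts the set bits in a byte. */
static unsigned popcount8(unsigned char x)
{
    unsigned n = 0;
    for (; x; x &= (unsigned char)(x - 1))  /* clear lowest set bit */
        n++;
    return n;
}

/* For a set of distinct bytes, returns true iff it has 2**n members,
   where n is the number of bit positions in which members differ.  On
   success, b is in the set iff (b & *mask) == *base. */
bool anyofm_params(const unsigned char *set, size_t count,
                   unsigned char *mask, unsigned char *base)
{
    if (count == 0)
        return false;
    unsigned char all_and = 0xFF, all_or = 0x00;
    for (size_t i = 0; i < count; i++) {
        all_and &= set[i];
        all_or |= set[i];
    }
    unsigned char differing = (unsigned char)(all_and ^ all_or);
    if (count != ((size_t)1 << popcount8(differing)))
        return false;
    *mask = (unsigned char)~differing;
    *base = all_and;
    return true;
}
```

On ASCII, `{'0','1','2','3'}` yields mask 0xFC and base 0x30. By contrast, `{'1','2'}` is rejected because 0x31 and 0x32 differ in two bits.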

In the future, we could consider an extension where, if the input
doesn't qualify for this node, we construct the closure of that input
to generate this node instead; that would have false positives that
would have to be tested for.  The speedup of this node is so significant
that this could still be faster than what we have today.

The benchmarks are for a 64-bit word.  32-bit words would not do as well.
Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches

The numbers (except for the final column) represent raw counts per loop
iteration.  The higher the number in the final column, the faster.

(('a' x 1) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir   2782.0   2648.0    105.1
    Dr    845.0    799.0    105.8
    Dw    531.0    500.0    106.2
  COND    431.0    419.0    102.9
   IND     22.0     22.0    100.0

(('a' x 10) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir   3358.0   2671.0    125.7
    Dr    998.0    801.0    124.6
    Dw    630.0    500.0    126.0
  COND    503.0    424.0    118.6
   IND     22.0     22.0    100.0

(('a' x 100) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir   9118.0   2773.0    328.8
    Dr   2528.0    814.0    310.6
    Dw   1620.0    500.0    324.0
  COND   1223.0    450.0    271.8
   IND     22.0     22.0    100.0

(('a' x 1000) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir  66718.0   3650.0   1827.9
    Dr  17828.0    923.0   1931.5
    Dw  11520.0    500.0   2304.0
  COND   8423.0    668.0   1260.9
   IND     22.0     22.0    100.0

(('a' x 10000) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir 642718.0  12650.0   5080.8
    Dr 170828.0   2048.0   8341.2
    Dw 110520.0    500.0  22104.0
  COND  80423.0   2918.0   2756.1
   IND     22.0     22.0    100.0

(('a' x 100000) . 'b') =~ /[^a]/

          blead   nanyof  Ratio %
       -------- -------- --------
    Ir      Inf 102654.8   6237.1
    Dr      Inf  13299.3  12788.9
    Dw      Inf    500.9 219708.7
  COND 800424.1  25419.1   3148.9
   IND     22.0     22.0    100.0

Karl Williamson [Thu, 15 Nov 2018 17:56:31 +0000 (10:56 -0700)]

regexec.c: Fix logic error

The function S_find_next_masked() could return a pointer to something
that wasn't wanted, returning prematurely due to a logic error I made.
This erroneous code is in 5.28.0, but I couldn't figure out any actual
bugs this caused, due to the circumstances it is called under.

The bug is that I should have used 'xor' instead of complement and 'and'.
Thus trying to find 0x2f, with a mask of all F's, also found 0x2e.
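Here is a bytewise sketch of the corrected comparison. The function name is hypothetical, and this is not the actual S_find_next_masked(). XOR leaves exactly the bits in which a byte differs from the target, and masking then discards the bits that don't matter. With a mask of 0xFF, only 0x2f itself matches.

```c
#include <stddef.h>

/* Returns a pointer to the first byte b in [s, send) for which
   (b & mask) == (target & mask), or send if there is none. */
const unsigned char *find_next_masked_sketch(const unsigned char *s,
                                             const unsigned char *send,
                                             unsigned char target,
                                             unsigned char mask)
{
    for (; s < send; s++)
        if (((*s ^ target) & mask) == 0)  /* no differing bit survives */
            return s;
    return send;
}
```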

Karl Williamson [Fri, 16 Nov 2018 17:48:51 +0000 (10:48 -0700)]

Merge branch 'fixup after regcomp sizing pass removal' into blead

Having a sizing pass for compiling regular expression patterns forced
various other design decisions that are now no longer necessary, since
the sizing pass has been eliminated.

This series of commits removes a bunch of them, simplifying the code

Karl Williamson [Tue, 13 Nov 2018 21:17:37 +0000 (14:17 -0700)]

regcomp.c: Simplify early failure returns

Previous commits have removed the need for certain macros and generality
in returning from functions early. Simplify the early returns correspondingly.

Karl Williamson [Wed, 7 Nov 2018 05:49:51 +0000 (22:49 -0700)]

regcomp.c: Remove no longer used parameter, and refactor

This static function is no longer called with a non-NULL final
parameter. That means it no longer returns a list, and its name is
hereby changed to reflect that. It also means the function can be
refactored and made simpler.

Karl Williamson [Wed, 7 Nov 2018 01:44:46 +0000 (18:44 -0700)]

regcomp.c: Remove now always NULL parameter

This parameter is always NULL. No need to have it in this static
function.

Karl Williamson [Wed, 7 Nov 2018 01:26:39 +0000 (18:26 -0700)]

regcomp.c: Don't restart parse for /d to /u if no need to

This commit keeps track of whether any operations have been encountered
that differ under /d from /u. If we switch to /u and haven't so far found
anything that differs, there's no need to reparse.

Karl Williamson [Wed, 7 Nov 2018 01:10:36 +0000 (18:10 -0700)]

regcomp.c: Don't restart parse for /d to /u if reparsing anyway

Prior to this commit, if the rules changed from /d to /u, the parse was
immediately restarted. This commit changes that so that it doesn't do
this if it is known that the parse will be redone anyway, but a full
parse needs to be done first in order to count the parentheses.

Doing this can avoid the need for an almost full extra reparse.

Karl Williamson [Wed, 7 Nov 2018 01:02:07 +0000 (18:02 -0700)]

regcomp.c: Don't restart parse now if doing so later

Prior to this commit, if it became apparent that long branches were
going to be needed, the parse was immediately restarted. This commit
changes that so that it doesn't do this if it is known that the parse
will be redone anyway, but a full parse needs to be done first in order to
count the parentheses.

This can avoid an almost complete reparse in some situations.

Karl Williamson [Wed, 7 Nov 2018 00:41:18 +0000 (17:41 -0700)]

regcomp.c: Swap 'if' branches for readability

It's easier to understand if the simplest case is first in the code.

Karl Williamson [Wed, 7 Nov 2018 00:31:21 +0000 (17:31 -0700)]

regcomp.c: Refactor constructing EXACTish nodes

The previous commits have allowed us to refactor this to eliminate
redundancies.

Previously, the same logic was done separately for UTF-8 and non-UTF-8
patterns. This refactors so the logic is done once. Only the details
differ between UTF-8 and non-UTF-8, so that's where the differences now
lie, without having to duplicate the logic.
