This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Steve Hay [Wed, 6 Jan 2016 08:14:36 +0000 (08:14 +0000)]
Upgrade libnet from version 3.07 to 3.08
Tony Cook [Wed, 6 Jan 2016 05:16:04 +0000 (16:16 +1100)]
fix a typo in perl5233delta.pod
Pointed out by Andrew Rodland (hobbs) on #p5p
Tony Cook [Wed, 6 Jan 2016 03:27:46 +0000 (14:27 +1100)]
[perl #126240] avoid leaking memory when setting $ENV{foo} on darwin
My change in e396210 was incomplete: that change was intended to
force use of setenv()/unsetenv() on Darwin, but ended up using putenv()
instead, which is a leaky mechanism.
Added darwin to the list of many others that work better with setenv()/
unsetenv().
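The difference between the two mechanisms can be sketched as follows. This is a minimal illustration and not the code from the commit; `demo_env_store` and the variable name used in the usage note are hypothetical. It relies only on the POSIX `setenv()`, `unsetenv()` and `getenv()` calls.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not perl's actual code: set or clear one
 * environment variable through setenv()/unsetenv(). Both copy their
 * arguments, so libc owns the storage. With putenv() the caller's
 * string itself becomes part of the environment, and the caller has
 * to keep it alive and work out when it may be released. */
int
demo_env_store(const char *name, const char *value)
{
    if (value == NULL)
        return unsetenv(name);      /* remove the variable entirely */
    return setenv(name, value, 1);  /* 1 = overwrite an existing value */
}
```

Calling `demo_env_store("GR_DEMO_VAR", "abc")` makes `getenv("GR_DEMO_VAR")` return a copy of `"abc"`, and passing `NULL` as the value removes the variable again.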
Lukas Mai [Tue, 5 Jan 2016 23:35:24 +0000 (00:35 +0100)]
perlsyn: change = to == in conditional in do/while example
... also remove unused LOOP label from 'last' example, mention 'redo'
(works like 'next' in this case), add example that combines
'next'/'last' (and requires the label).
Steve Hay [Tue, 5 Jan 2016 17:37:21 +0000 (17:37 +0000)]
Upgrade Unicode-Normalize from version 1.24 to 1.25
Steve Hay [Tue, 5 Jan 2016 13:12:45 +0000 (13:12 +0000)]
Upgrade bignum from version 0.41 to 0.42
Steve Hay [Mon, 4 Jan 2016 14:15:43 +0000 (14:15 +0000)]
Silence t/porting/cmp_version.t after Math-Big* upgrades
Steve Hay [Mon, 4 Jan 2016 13:55:50 +0000 (13:55 +0000)]
Upgrade Math-BigRat from version 0.260801 to 0.260802
(This maintains the one minor divergence between blead and cpan. The blead
version first appeared in 50a54b125c. I haven't examined whether this
difference needs to remain, or whether we can switch to the cpan version.)
Steve Hay [Mon, 4 Jan 2016 13:33:22 +0000 (13:33 +0000)]
Upgrade Math-BigInt-FastCalc from version 0.38 to 0.40
Steve Hay [Mon, 4 Jan 2016 13:32:30 +0000 (13:32 +0000)]
Upgrade Math-BigInt from version 1.999710 to 1.999714
Lukas Mai [Tue, 5 Jan 2016 12:04:24 +0000 (13:04 +0100)]
perlgit: many small changes
- verbatimize a paragraph of sample commands
- grammar: sent -> send
- consistently hyperlink all email addresses
- hyperlink RT tickets
- hyperlink commit hashes
- consistently refer to bisect.pl as F<Porting/bisect.pl>
- add F< > to things that look like filenames
Lukas Mai [Fri, 1 Jan 2016 14:45:47 +0000 (15:45 +0100)]
clarify meaning of waitpid returning 0 [perl #127080]
Lukas Mai [Fri, 1 Jan 2016 14:35:58 +0000 (15:35 +0100)]
explain meaning of negative PIDs in waitpid [perl #127080]
Tony Cook [Thu, 17 Dec 2015 00:15:31 +0000 (11:15 +1100)]
[perl #126922] avoid access to uninitialized memory in win32 crypt()
Previously the Win32 crypt() implementation would access the first
and second characters of the salt, even if the salt was zero length.
Add validation that will detect both a short salt and invalid
characters in the salt.
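A check of this kind might look like the sketch below. It is not the Win32 code from the commit: `demo_salt_is_valid` is a hypothetical name, and the accepted alphabet is assumed to be the traditional DES crypt() salt set (`.`, `/`, digits and ASCII letters).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters accepted in a traditional DES crypt() salt (assumed here). */
static const char DEMO_SALT_CHARS[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Hypothetical check: returns 1 only if the salt has at least two
 * characters and the first two both come from DEMO_SALT_CHARS. */
int
demo_salt_is_valid(const char *salt)
{
    /* Test the length first so salt[1] is never read past the NUL. */
    if (salt == NULL || salt[0] == '\0' || salt[1] == '\0')
        return 0;
    /* strchr() would also match the terminating NUL of the alphabet,
     * but NUL bytes in the salt were already rejected above. */
    return strchr(DEMO_SALT_CHARS, salt[0]) != NULL
        && strchr(DEMO_SALT_CHARS, salt[1]) != NULL;
}
```

The order of the tests matters: the length check comes before any character lookup, so a zero-length salt is rejected without touching memory beyond its terminator.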
Andy Dougherty [Wed, 30 Dec 2015 03:58:51 +0000 (22:58 -0500)]
Add Configure support for fdclose() for [perl #126847].
This patch also adjusts the generated files suggested by
Porting/checkcfgvar.pl.
Andy Dougherty [Wed, 30 Dec 2015 03:47:42 +0000 (22:47 -0500)]
PATCH: Re: [perl #126847] fdclose(3) patch
This patch uses the fdclose() function from FreeBSD if it
is available. It is based on the original patch supplied
by Mariusz Zaborski <oshogbo@FreeBSD.org> in the RT ticket.
The next patch will add Configure support for HAS_FDCLOSE.
David Mitchell [Mon, 4 Jan 2016 13:15:19 +0000 (13:15 +0000)]
Porting/bench.pl: add --compact option
With this, you specify which perl executable you want the results for,
and it will display the result in a much more compact form than when
displaying the results for all perls, with just one line per test.
David Mitchell [Mon, 4 Jan 2016 11:47:18 +0000 (11:47 +0000)]
Porting/bench.pl: preserve test order
In the absence of a --sort option, process and display the tests in the
order they appear in the test file, rather than in alphabetical order.
This is because the layout in the benchmark file usually follows some sort
of logical order.
James E Keenan [Sun, 3 Jan 2016 23:00:17 +0000 (18:00 -0500)]
Remove superfluous entry in checkAUTHORS.pl.
Tony Cook [Sun, 3 Jan 2016 22:43:18 +0000 (09:43 +1100)]
perldelta for 4732711e2548
Andreas Koenig [Sun, 3 Jan 2016 07:40:33 +0000 (08:40 +0100)]
Remove nm from libswanted
Nm stood for the "New Math" library in the context of 1994. In 2014 a
conflicting library, libnm, appeared that has a network manager context.
James E Keenan [Sun, 3 Jan 2016 21:59:46 +0000 (16:59 -0500)]
Provide additional email address for contributor Mattia Barbon.
Mattia Barbon [Sun, 3 Jan 2016 20:54:31 +0000 (21:54 +0100)]
Replace :: with __ in THIS like it's done for parameters/return values
Apart from being more consistent, this simplifies writing XS code
wrapping C++ classes into a nested Perl namespace (it requires only
a typedef for Foo__Bar rather than two, one for Foo_Bar and the other
for Foo::Bar).
Impact is likely to be minimal: it will only affect classes:
- in C++ extensions (there is no way to make Foo::Bar *THIS compile in C)
- that use Foo::Bar only as a receiver (if they use it as a
parameter/return value the typedef is already there)
Given that a class is always used as the return value in a normal
constructor, this case should be relatively rare.
Given this Foo.xs file:
    MODULE=Foo PACKAGE=Foo::Bar
    TYPEMAP: <<EOT
    TYPEMAP
    Foo::Bar * T_PTRREF
    EOT
    Foo::Bar *
    Foo::Bar::moo(Foo::Bar *foo)
the output of
    perl -Ilib lib/ExtUtils/xsubpp -prototypes Foo.xs | grep -A8 moo | head -n 10
changes from:
    XS_EUPXS(XS_Foo__Bar_moo); /* prototype to pass -Wmissing-prototypes */
    XS_EUPXS(XS_Foo__Bar_moo)
    {
        dVAR; dXSARGS;
        if (items != 2)
            croak_xs_usage(cv, "THIS, foo");
        {
            Foo::Bar * THIS;
            Foo__Bar * RETVAL;
            Foo__Bar * foo;
to:
    XS_EUPXS(XS_Foo__Bar_moo); /* prototype to pass -Wmissing-prototypes */
    XS_EUPXS(XS_Foo__Bar_moo)
    {
        dVAR; dXSARGS;
        if (items != 2)
            croak_xs_usage(cv, "THIS, foo");
        {
            Foo__Bar * THIS;
            Foo__Bar * RETVAL;
            Foo__Bar * foo;
James E Keenan [Sun, 3 Jan 2016 19:51:52 +0000 (14:51 -0500)]
perldelta: podlators upgrade to 4.04
Karen Etheridge [Sun, 3 Jan 2016 19:05:17 +0000 (11:05 -0800)]
update podlators to 4.04
David Mitchell [Sun, 3 Jan 2016 19:34:26 +0000 (19:34 +0000)]
fix -DPERL_GLOBAL_STRUCT_PRIVATE builds
t/porting/libperl.t checks that, under -DPERL_GLOBAL_STRUCT_PRIVATE
builds, there are no bss symbols. This line in locale.c was failing that
test:
static char ret[128] = "";
By changing it to
static char ret[128] = "x";
it's no longer BSS data and the test passes.
Bit of a hack, but that function only exists in debugging builds, so it
doesn't matter too much.
Aaron Crane [Sun, 3 Jan 2016 14:29:43 +0000 (14:29 +0000)]
Remove unwarranted assertion in Perl_newATTRSUB_x()
RT #126845: if a stub subroutine definition with a prototype has been seen,
then any subsequent stub (or definition) of the same subroutine with an
attribute was causing an assertion failure because of a null pointer.
This assertion was added in 2eaf799e74b14dc77b90d5484a3fd4ceac12b46a, which
itself would already have triggered this assertion failure, even though all
subsequent uses of the pointer in question were guarded with non-null
conditions. So merely deleting the assertion is the right thing.
Ricardo Signes [Fri, 1 Jan 2016 02:54:49 +0000 (21:54 -0500)]
*glob{FILEHANDLE} is no longer deprecated
We are now trying to use deprecation warnings only when we believe
that a behavior will really be removed or made an error. Since we
don't plan to do that with *glob{FILEHANDLE}, the warning is not
useful and may be harmful.
See discussion at [perl #127060].
Karen Etheridge [Sun, 20 Dec 2015 03:08:24 +0000 (19:08 -0800)]
Update podlators to version 4.03
Karen Etheridge [Fri, 1 Jan 2016 18:46:31 +0000 (10:46 -0800)]
podcheck.t: tell the author where the problems db is located
Karen Etheridge [Mon, 21 Dec 2015 05:22:08 +0000 (21:22 -0800)]
RMG: fix typo, clarify instructions a bit
Andy Dougherty [Thu, 31 Dec 2015 14:01:06 +0000 (09:01 -0500)]
[PATCH] Try more crypt algorithms in the tests, for OpenBSD.
OpenBSD implements the Blowfish algorithm, but not the MD5 one used
by glibc. Enhance the crypt and taint tests to try both algorithms.
If neither works, fall back to no algorithm. The Blowfish salt
is taken from the OpenBSD crypt(3) page.
Ricardo Signes [Tue, 29 Dec 2015 21:01:39 +0000 (16:01 -0500)]
release schedule: add release managers for 2016Q1
Lukas Mai [Mon, 28 Dec 2015 01:03:20 +0000 (02:03 +0100)]
File::Find: update POD/comments
- change double spaces to single spaces
- remove comment that got lost during the POD reshuffling in f4eedc6b8c8
(and probably should have been a commit message in the first place)
- remove use of "EG:" that makes no sense to me
- remove reference to hints/machten.sh (removed in e94c1c0554 6 years ago)
- change L<The wanted function> to L</The wanted function> because
that's what internal links should look like according to perlpod
- change S<_> to C<_> (it was S< _> originally but the space got lost
during a revert, making S<> into a no-op (but why would you write
S< _> in the first place?))
- link "taint-mode" to perlsec (probably only makes a difference in
HTML, not man)
- various typo/grammar fixes
- teach podcheck.t about find(1)
- bump version
Lukas Mai [Mon, 28 Dec 2015 00:58:50 +0000 (01:58 +0100)]
perlpodspec: fix typo
Karl Williamson [Sat, 26 Dec 2015 19:37:00 +0000 (12:37 -0700)]
regcomp.c: Add comment.
This should have been included in commit 285b5ca0145796a915dec03e87e0176fd4681041
Karl Williamson [Sat, 26 Dec 2015 19:35:32 +0000 (12:35 -0700)]
regexec.c: Avoid a function call
Not infrequently, a UTF-8 string will contain ASCII. In this case, by
adding a test for this we can avoid the function call that is needed for
more complicated cases.
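The idea of an inline ASCII test in front of a general routine can be sketched like this. The function names are hypothetical and this is not the regexec.c code; the slow path here is a simplified decoder that assumes well-formed UTF-8 of at most four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the general, more expensive routine. Assumes s points
 * at the lead byte of a well-formed multi-byte UTF-8 sequence. */
unsigned long
demo_decode_multibyte(const unsigned char *s)
{
    unsigned long cp;
    size_t n, i;

    if (s[0] < 0xE0)      { cp = s[0] & 0x1F; n = 2; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; n = 3; }
    else                  { cp = s[0] & 0x07; n = 4; }
    for (i = 1; i < n; i++)
        cp = (cp << 6) | (s[i] & 0x3Fu);   /* append 6 payload bits */
    return cp;
}

/* Returns the first code point of a UTF-8 string. ASCII bytes are
 * their own code points, so one comparison avoids the call. */
unsigned long
demo_code_point_at(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;

    if (s[0] < 0x80)
        return s[0];                       /* fast path: no function call */
    return demo_decode_multibyte(s);       /* slow path: general decoder */
}
```

For text that is mostly ASCII, nearly every call returns from the first branch, which is the saving the commit message describes.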
Karl Williamson [Sat, 26 Dec 2015 19:34:07 +0000 (12:34 -0700)]
regcomp.h: Remove extraneous 'struct's
Better to not have this clutter.
Karl Williamson [Sat, 26 Dec 2015 18:47:26 +0000 (11:47 -0700)]
regcomp.h: Fix shift and mask
The mask removed here was to make sure that right shifting didn't
propagate the sign bit, but is unnecessary as the value shifted is
unsigned. And confining things to a U8 with that mask assumes that the
bit vector being operated on has 256 elements max. This isn't
necessarily true these days, as one can change ANYOF_BITMAP_SIZE.
In fact changing that number was failing until this commit.
It also adds white space to make it easier to read.
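A bit-vector lookup without the extra mask might be sketched as below. The names and the 512-bit size are hypothetical, chosen only to show a vector larger than 256 elements; the actual regcomp.h macros are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BITMAP_BITS 512   /* hypothetical; larger than 256 on purpose */

typedef struct {
    unsigned char bits[DEMO_BITMAP_BITS / 8];
} demo_bitmap;

/* c is unsigned, so c >> 3 cannot propagate a sign bit and needs no
 * mask. Masking the index down to 32 bytes would make bit 300 alias
 * bit 44 in a vector of this size. */
void
demo_bitmap_set(demo_bitmap *bm, unsigned int c)
{
    bm->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

int
demo_bitmap_test(const demo_bitmap *bm, unsigned int c)
{
    return (bm->bits[c >> 3] >> (c & 7)) & 1;
}

/* Sets bit c in an empty bitmap and returns 1 if exactly that bit,
 * and no other, reads back as set. Requires c < DEMO_BITMAP_BITS. */
int
demo_roundtrip(unsigned int c)
{
    demo_bitmap bm;
    unsigned int i;

    memset(&bm, 0, sizeof bm);
    demo_bitmap_set(&bm, c);
    for (i = 0; i < DEMO_BITMAP_BITS; i++)
        if (demo_bitmap_test(&bm, i) != (i == c))
            return 0;
    return 1;
}
```

`demo_roundtrip(300)` only succeeds because the byte index is not confined to 0..31, which mirrors the failure the message reports when the bitmap size was raised.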
Karl Williamson [Sat, 26 Dec 2015 18:28:09 +0000 (11:28 -0700)]
regcomp.h: Use more basic macro in #defines
Instead of having this code repeated in several places, call
the more base macro from the others.
Karl Williamson [Fri, 25 Dec 2015 05:42:08 +0000 (22:42 -0700)]
regcomp.h: Free up bit in ANYOF FLAGS field
I've long been confronted with trying to do things to create a spare bit
to use. I thought it easier now, while it's fresh in my mind, to free
up one for future use, rather than re-learn things when it next becomes
necessary. It would have been a different story if the freed bit had
required a performance penalty.
This commit also updates the comments about how to create even more
spare bits should it become necessary.
Karl Williamson [Wed, 23 Dec 2015 19:43:30 +0000 (12:43 -0700)]
regcomp.h: Shorten, clarify names of internal flags
Some of the names are expanded slightly rather than shortened.
Karl Williamson [Wed, 23 Dec 2015 19:38:23 +0000 (12:38 -0700)]
APItest.xs: Silence compiler warning on 32-bit machines
One warning remains, otherwise things don't work.
Karl Williamson [Wed, 23 Dec 2015 19:30:40 +0000 (12:30 -0700)]
mktables: Free up some memory after final use
This may be enough for some platforms that aren't able to compile the
Unicode tables to work. But it's quite late in the process. The
ultimate solution would be for the tables to all be compiled ahead of
time. That is under consideration for the future.
Karl Williamson [Wed, 23 Dec 2015 18:29:08 +0000 (11:29 -0700)]
t/thread_it.pl: Increase stack size for AIX
This is enough to get the smoker to pass t/re/pat_thr.t
David Golden [Tue, 22 Dec 2015 20:49:17 +0000 (15:49 -0500)]
Update release manager's guide
Karl Williamson [Tue, 22 Dec 2015 03:52:50 +0000 (20:52 -0700)]
PATCH: [perl #126261] Assertion failure on missing [ in qr//
This is the result of the regex compiler creating a temporary buffer to
parse a portion of the input pattern, and then when an error or warning
occurs in that buffer, trying to use addresses both inside it and the
original pattern.
The solution here is a general one, that confines the heavy lifting to
one macro, plus a little setup and tear-down around the use of the
temporary buffer. The comments in the code detail how we relate the
address of the error in the temporary back to the parallel address in
the input pattern.
Karl Williamson [Tue, 22 Dec 2015 03:38:14 +0000 (20:38 -0700)]
regcomp.c: update RExC_start when parsing outside input
I noticed this while code reading. In places, regcomp parses not the
input pattern but a temporary buffer it constructs, based on that input
pattern. RExC_start should be updated so it always is pointing to the
same buffer as the parse pointer; otherwise segfaults can happen.
I have no idea how one currently can get into the situation this
protects against, so there are no tests added.
Karl Williamson [Tue, 22 Dec 2015 01:26:37 +0000 (18:26 -0700)]
regcomp.c: Add a stable pattern end pointer.
RExC_end is set sometimes during pattern compilation to perhaps another
string in memory. Messages are output based on the original string, so
create an end pointer that is in terms of that original string,
otherwise we could get segfaults.
Karl Williamson [Tue, 22 Dec 2015 01:18:36 +0000 (18:18 -0700)]
t/lib/warnings/regcomp: Fix typo in comment
Karl Williamson [Tue, 22 Dec 2015 00:56:13 +0000 (17:56 -0700)]
regcomp.c: Use macro instead of recalculating
There is a macro that does the job that this code does. Use it.
Karl Williamson [Mon, 21 Dec 2015 04:48:04 +0000 (21:48 -0700)]
regcomp.c: Move calculations to common macro
This consolidates identical calculations into a single place, which
makes things easier to maintain.
Probably the reason they previously were dispersed is that the
common macro now has to evaluate the same expression more than once. Since
the macro is used to return a list, it can't be turned into a single
statement.
Any decent optimizing compiler will extract the common subexpressions
and evaluate them just once. But even if not, the macro is called only
in the event of a fatal error (in which case speed is not important), or
to raise a warning, which we expect to be rare, and the extra work is
negligible in comparison with what is needed to output the message.
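A macro that "returns a list" can be sketched as one that expands to several comma-separated function arguments. The macro and function names below are hypothetical and the format string is only an illustration; the point is that each parameter appears more than once in the expansion, so it cannot be wrapped into a single statement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expands to the four arguments consumed by "%.*s <-- HERE %.*s".
 * start and pos are each evaluated twice, so callers should pass
 * expressions without side effects. */
#define DEMO_LOCATION_ARGS(start, pos, end) \
    (int)((pos) - (start)), (start), (int)((end) - (pos)), (pos)

/* Formats pat with a marker after the first `offset` bytes. Returns a
 * pointer to a static buffer, so the result is overwritten by the
 * next call. Offsets past the end are clamped to the end. */
const char *
demo_mark(const char *pat, size_t offset)
{
    static char buf[256];
    size_t len = strlen(pat);
    const char *pos = pat + (offset > len ? len : offset);
    const char *end = pat + len;

    snprintf(buf, sizeof buf, "%.*s <-- HERE %.*s",
             DEMO_LOCATION_ARGS(pat, pos, end));
    return buf;
}
```

Keeping the pointer arithmetic in one macro means every message site computes the two substring lengths the same way, at the cost of the repeated subexpressions the message mentions.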
Karl Williamson [Mon, 21 Dec 2015 20:37:20 +0000 (13:37 -0700)]
regcomp.h: reword some comments
Karl Williamson [Mon, 21 Dec 2015 21:47:05 +0000 (14:47 -0700)]
regcomp.c: Make some params to a static fcn const
This is just acting on the TODO comment.
Karl Williamson [Fri, 20 Nov 2015 03:51:04 +0000 (20:51 -0700)]
regcomp.c: Add 2 basic assertions
These should be true because an SV* should always have a trailing NUL,
but a lot of things in this code depend on it. It's worthwhile to point
that out; I wasn't sure it was true until I investigated. And an
assert() makes sure it is really true.
Karl Williamson [Wed, 21 Oct 2015 04:23:00 +0000 (22:23 -0600)]
pp_hot.c: Add assertion
This will make the cause of any future failures more clear.
Karl Williamson [Wed, 21 Oct 2015 04:21:42 +0000 (22:21 -0600)]
perlapi: Clarify 'string' vs. buffer
A string strictly is NUL terminated, but our terminology is lax.
Karl Williamson [Wed, 21 Oct 2015 04:08:59 +0000 (22:08 -0600)]
utf8.h: Add 2 assertions
This makes sure in DEBUGGING builds that the macro is called correctly.
Chris 'BinGOs' Williams [Tue, 22 Dec 2015 14:29:48 +0000 (14:29 +0000)]
Controlled demolition, CoreList is 5.20151220
Karl Williamson [Tue, 22 Dec 2015 04:29:12 +0000 (21:29 -0700)]
Deprecate to_utf8_case()
See http://nntp.perl.org/group/perl.perl5.porters/233287
David Golden [Mon, 21 Dec 2015 23:17:43 +0000 (18:17 -0500)]
Bump the perl version in various places for 5.23.7
David Golden [Mon, 21 Dec 2015 23:07:32 +0000 (18:07 -0500)]
Create new perldelta.pod for v5.23.7
David Golden [Mon, 21 Dec 2015 22:59:15 +0000 (17:59 -0500)]
Updated release schedule
David Golden [Mon, 21 Dec 2015 22:58:32 +0000 (17:58 -0500)]
Updated Porting/epigraphs.pod for v5.23.6
David Golden [Mon, 21 Dec 2015 18:37:03 +0000 (13:37 -0500)]
add new release to perlhist
David Golden [Mon, 21 Dec 2015 18:31:37 +0000 (13:31 -0500)]
Update perldelta with additional module updates
David Golden [Mon, 21 Dec 2015 18:15:03 +0000 (13:15 -0500)]
Update perldelta with Module::CoreList version bump
David Golden [Mon, 21 Dec 2015 18:14:48 +0000 (13:14 -0500)]
Update Module::CoreList from 5.23.6
David Golden [Mon, 21 Dec 2015 17:01:22 +0000 (12:01 -0500)]
Update perldelta to near-final state
Karl Williamson [Mon, 21 Dec 2015 15:38:38 +0000 (08:38 -0700)]
perldelta for case changing on caseless language
Karl Williamson [Mon, 21 Dec 2015 05:28:38 +0000 (22:28 -0700)]
perldelta for -Dr fix
David Golden [Mon, 21 Dec 2015 04:52:01 +0000 (23:52 -0500)]
Update perldelta
This commit adds various release notes covering:
* module updates
* documentation updates
* some bug fixes and internal changes
David Golden [Mon, 21 Dec 2015 02:16:19 +0000 (21:16 -0500)]
Correct perldelta typo
David Golden [Mon, 21 Dec 2015 02:19:47 +0000 (21:19 -0500)]
Add alternate email address for dagolden to checkAUTHORS.pl
Lukas Mai [Mon, 21 Dec 2015 02:23:05 +0000 (03:23 +0100)]
perldelta for 18371617dfb (B::Deparse)
Craig A. Berry [Sun, 20 Dec 2015 16:12:36 +0000 (10:12 -0600)]
Do not define invlistEQ in the re extension.
Because it's already defined in regcomp.c and the VMS build was
failing with a linker error (multiply-defined symbol).
Karl Williamson [Sat, 19 Dec 2015 18:22:04 +0000 (11:22 -0700)]
regcomp.c: Skip some work
We can optimize ANYOF nodes that are equivalent to POSIX character
classes. Discovering if they are equivalent takes work, which can be
skipped with a simple test that will rule out many run-of-the-mill
character classes.
Karl Williamson [Sat, 19 Dec 2015 18:19:35 +0000 (11:19 -0700)]
regcomp.c: White space only
Indent a section of code in preparation for the next commit which will
make it into a block.
Karl Williamson [Sat, 19 Dec 2015 18:14:07 +0000 (11:14 -0700)]
regcomp.c: Add comments
Karl Williamson [Sat, 19 Dec 2015 16:49:00 +0000 (09:49 -0700)]
mktables: Add "$0:" to its first output
So in a make, it is abundantly clear where the messages are coming from.
Karl Williamson [Sat, 19 Dec 2015 05:59:35 +0000 (22:59 -0700)]
regcomp.c: Silence uninit compiler warning
This shouldn't actually happen, and g++ under -O0 didn't flag it, but
gcc under -O2 does, so initialize to an illegal value.
Karl Williamson [Sat, 19 Dec 2015 05:51:23 +0000 (22:51 -0700)]
regcomp.c: Remove outdated comments
These were invalidated by commit 709be747a32edc503b4645d9c5396bd4b40100d2
Karl Williamson [Sat, 19 Dec 2015 05:04:20 +0000 (22:04 -0700)]
Jarkko Hietaniemi [Fri, 18 Dec 2015 13:36:25 +0000 (08:36 -0500)]
perldelta for 572cd850, 406d5545 (signbit)
Jarkko Hietaniemi [Fri, 18 Dec 2015 13:26:41 +0000 (08:26 -0500)]
perldelta for the hexfp %a fixes.
Jarkko Hietaniemi [Fri, 18 Dec 2015 13:13:39 +0000 (08:13 -0500)]
perldelta for 3118d7d,74c6ce8,1f02ab1 (ppc64el fp)
Jarkko Hietaniemi [Fri, 18 Dec 2015 13:12:57 +0000 (08:12 -0500)]
perldelta for 68bcb86 (openindiana: useshrplib for all solaris)
Jarkko Hietaniemi [Thu, 17 Dec 2015 02:57:31 +0000 (21:57 -0500)]
Configure: notes on the m68881 extended precision format
Jarkko Hietaniemi [Fri, 18 Dec 2015 12:19:12 +0000 (07:19 -0500)]
Double-double implementations differ.
Karl Williamson [Thu, 17 Dec 2015 17:22:44 +0000 (10:22 -0700)]
Optimize some qr/[...]/ classes
Bracketed character classes generally generate an ANYOF-type regnode,
which consists of a bitmap for the lower code points, and an inversion
list or swash to handle ones not in the bitmap. They take up more
memory than other regnode types. There are already some optimizations
that use a smaller and/or faster regnode instead. For example, some
people prefer not to use a backslash to escape metacharacters, instead
writing something like /abc[.]def/. This has for some time generated
the same thing as /abc\.def/ does, namely a single EXACT node, which is
both smaller and faster than an ANYOF node in the middle of two EXACT
nodes.
This commit adds some optimizations that hadn't been done previously.
Now things like /[\p{Word}]/ will optimize to \w, for example. I had
not done this before, because my tests had shown very little performance
difference, but I had added most of the code to regcomp.c so it wouldn't
get lost, #ifdef'd out.
It turns out that I hadn't tested on code points above the bitmap, which
with this commit have a small, but appreciable speed up in matching, so
this commit enables and finishes that code.
Prior to this commit, things like /[[:word:]]/ were optimized to \w, but
things like /[_[:word:]]/ were not. This commit fixes that.
If the following command is run on a perl compiled with -O2 and no
DEBUGGING:
blead Porting/bench.pl --raw --benchfile=charclass_perf --perlargs=-Ilib /path_to_prior_perl="before this commit" /path_to_this_perl=after
and the file 'charclass_perf' contains
    [
        'regex::charclass::ascii' => {
            desc => 'charclass, ascii range',
            setup => 'my $a = qr/[\p{Word}]/',
            code => '"A" =~ $a'
        },
        'regex::charclass::upper_latin1' => {
            desc => 'charclass, upper latin1 range',
            setup => 'my $a = qr/[\p{Word}]/',
            code => '"\x{e0}" =~ $a'
        },
        'regex::charclass::above_latin1' => {
            desc => 'charclass, above latin1 range',
            setup => 'my $a = qr/[\p{Word}]/',
            code => '"\x{100}" =~ $a'
        },
        'regex::charclass::high_Unicode' => {
            desc => 'charclass, high Unicode code point',
            setup => 'my $a = qr/[\p{Word}]/',
            code => '"\x{10FFFF}" =~ $a'
        },
    ];
the following results are obtained:
The numbers represent raw counts per loop iteration.
regex::charclass::above_latin1
charclass, above latin1 range
       before this commit    after
       ------------------ --------
    Ir             3344.0   2888.0
    Dr              971.0    855.0
    Dw              604.0    541.0
  COND              575.0    504.0
   IND               25.0     25.0
COND_m               11.0     10.7
 IND_m               10.0     10.0
 Ir_m1                8.9      6.0
 Dr_m1                3.0      3.2
 Dw_m1                1.5      1.4
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0
regex::charclass::ascii
charclass, ascii range
       before this commit    after
       ------------------ --------
    Ir             2661.0   2649.0
    Dr              798.0    795.0
    Dw              516.0    517.0
  COND              467.0    465.0
   IND               23.0     23.0
COND_m               10.0      8.8
 IND_m               10.0     10.0
 Ir_m1                7.9      0.0
 Dr_m1                2.9      3.1
 Dw_m1                1.3      1.3
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0
regex::charclass::high_Unicode
charclass, high Unicode code point
       before this commit    after
       ------------------ --------
    Ir             3344.0   2888.0
    Dr              971.0    855.0
    Dw              604.0    541.0
  COND              575.0    504.0
   IND               25.0     25.0
COND_m               11.0     10.7
 IND_m               10.0     10.0
 Ir_m1                8.9      6.0
 Dr_m1                3.0      3.2
 Dw_m1                1.5      1.4
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0
regex::charclass::upper_latin1
charclass, upper latin1 range
       before this commit    after
       ------------------ --------
    Ir             2661.0   2651.0
    Dr              798.0    796.0
    Dw              516.0    517.0
  COND              467.0    466.0
   IND               23.0     23.0
COND_m               11.0      8.8
 IND_m               10.0     10.0
 Ir_m1                7.9      0.0
 Dr_m1                2.9      3.3
 Dw_m1                1.5      1.2
 Ir_mm                0.0      0.0
 Dr_mm                0.0      0.0
 Dw_mm                0.0      0.0
Karl Williamson [Wed, 16 Dec 2015 20:24:45 +0000 (13:24 -0700)]
regcomp.h: Add comments
Karl Williamson [Wed, 16 Dec 2015 19:06:46 +0000 (12:06 -0700)]
regex matching: Don't do unnecessary work
This commit sets a flag at pattern compilation time to indicate if
a rare case is present that requires special handling, so that that
handling can be avoided unless necessary.
Karl Williamson [Wed, 16 Dec 2015 18:40:18 +0000 (11:40 -0700)]
regcomp.h: Renumber 2 flag bits
This changes the spare bit to be adjacent to the LOC_FOLD bit, in
preparation for the next commit, which will use that bit for a
LOC_FOLD-related use.
Karl Williamson [Wed, 16 Dec 2015 18:05:17 +0000 (11:05 -0700)]
regex: Free an ANYOF node bit
This is done by combining 2 mutually exclusive bits into one. I hadn't
seen this possibility before because the name of one of them misled me.
It also misled me into turning on that flag unnecessarily, and to
miss opportunities to not have to create a swash at runtime. This
commit corrects those things as well.
Karl Williamson [Wed, 16 Dec 2015 05:42:18 +0000 (22:42 -0700)]
regcomp.c: Move comments adjacent to their object
Karl Williamson [Wed, 16 Dec 2015 05:20:20 +0000 (22:20 -0700)]
regcomp.c: Try simplifications in some qr/[...]/d
Characters in a bracketed character class can come from a bunch of
sources, all bundled together. Some things under /d match only when the
target string is UTF-8; some match only when it isn't UTF-8. Other
sources may introduce ones that match regardless. It may be that some
things are specified as conditionally matching from one source, and as
unconditionally matching from another. We can subtract the
unconditionals from the conditionals, leaving a simpler set of things
that must be conditionally matched. In some cases, the conditional set
may go to zero, allowing other optimizations to happen that otherwise
couldn't. An example is
qr/[\W\xAB]/
which before this commit compiled to:
ANYOFD[^0-9A-Z_a-z\x{80}-\x{AA}\x{AC}-\x{FF}][{non-utf8-latin1-all}
{utf8}0080-00A9 00AC-00B4 00B6-00B9 00BB-00BF 00D7 00F7
02C2-02C5...] (12)
and after it, compiles to
ANYOFD[^0-9A-Z_a-z\x{AA}\x{B5}\x{BA}\x{C0}-\x{D6}\x{D8}-\x{F6}
\x{F8}-\x{FF}][{non-utf8-latin1-all}{utf8}02C2-02C5...] (12)
Notice that the {utf8} component has been stripped of everything below
256. That means no swash has to be created at runtime when matching
code points below 256, unlike the case before this commit.
A starker example, though unlikely in real life except in
machine-generated code, is
qr/[\w\W]/
Before this commit, it would generate:
ANYOFD[\x{00}-\x{7F}][{non-utf8-latin1-all}{above_bitmap_all}
{utf8}0080-00FF]
and afterwards, simply:
SANY
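The subtraction step can be sketched with plain 256-bit bitmaps. This is an illustration only: regcomp.c works with inversion lists, and every name below is hypothetical. The sketch shows the set difference and the "conditional set went to zero" test that opens the door to further optimization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SET_BYTES 32   /* 256 code points, one bit each */

typedef struct { unsigned char bits[DEMO_SET_BYTES]; } demo_set;

/* Removes every member of uncond from cond. Returns 1 if cond is
 * empty afterwards, meaning nothing is left to match conditionally. */
int
demo_subtract(demo_set *cond, const demo_set *uncond)
{
    size_t i;
    int empty = 1;

    for (i = 0; i < DEMO_SET_BYTES; i++) {
        cond->bits[i] &= (unsigned char)~uncond->bits[i];
        if (cond->bits[i])
            empty = 0;
    }
    return empty;
}

/* Adds the inclusive range lo..hi to s. Requires lo <= hi <= 255. */
void
demo_add_range(demo_set *s, unsigned int lo, unsigned int hi)
{
    unsigned int c;

    for (c = lo; c <= hi; c++)
        s->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Builds cond = c_lo..c_hi and uncond = u_lo..u_hi, subtracts, and
 * returns how many code points still need conditional matching. */
int
demo_remaining(unsigned int c_lo, unsigned int c_hi,
               unsigned int u_lo, unsigned int u_hi)
{
    demo_set cond, uncond;
    unsigned int c;
    int n = 0;

    memset(&cond, 0, sizeof cond);
    memset(&uncond, 0, sizeof uncond);
    demo_add_range(&cond, c_lo, c_hi);
    demo_add_range(&uncond, u_lo, u_hi);
    demo_subtract(&cond, &uncond);
    for (c = 0; c < 256; c++)
        n += (cond.bits[c >> 3] >> (c & 7)) & 1;
    return n;
}
```

With a conditional set of 0x80..0xFF, subtracting an unconditional 0xAB leaves 127 members, while subtracting the whole 0x80..0xFF range leaves none.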
Karl Williamson [Wed, 16 Dec 2015 04:46:42 +0000 (21:46 -0700)]
regcomp.c: Change variable name to be clearer
This name confused me, and led to suboptimal code. The new name is more
cumbersome, but won't confuse (at least it won't confuse me).
Jarkko Hietaniemi [Thu, 17 Dec 2015 01:19:03 +0000 (20:19 -0500)]
Configure: grep -q is not portable
It does not work in SysV (solaris) or old BSD greps.
Steve Hay [Thu, 17 Dec 2015 11:08:16 +0000 (11:08 +0000)]
Revert "Upgrade Socket from 2.020 to 2.021"
This reverts commit 0bd66ca801c5fb84ee6a8feeb8114f0d8248029f.
Worked for me, but Jenkins isn't happy :-(
Steve Hay [Thu, 17 Dec 2015 10:55:40 +0000 (10:55 +0000)]
Update META.yml following commit 0d99ea0387