This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Karl Williamson [Tue, 20 Dec 2016 18:58:38 +0000 (11:58 -0700)]
Create inversion list for Assigned code points
This will be used in a future commit.
Karl Williamson [Mon, 19 Dec 2016 20:20:44 +0000 (13:20 -0700)]
regen/mk_invlists.pl: Create list of Assigned code points
This creates a read-only C array to be compiled into the perl source
text segment of an inversion list of the characters that are assigned in
the current Unicode version. This will be used in a future commit.
The difference listing is large because of defects in the diff algorithm.
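For readers unfamiliar with the term, an inversion list is a sorted array in which each element marks where membership in the set flips. The following is a minimal sketch of a lookup, not the generated code: the names `toy_invlist` and `invlist_contains` are hypothetical, the data is a toy set rather than real Unicode data, and any header the real array produced by regen/mk_invlists.pl may carry is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy data, NOT the real Assigned list: the ranges [0x41, 0x5B) and
 * [0x61, 0x7B) are in the set; everything else is out. */
static const uint32_t toy_invlist[] = { 0x41, 0x5B, 0x61, 0x7B };
static const size_t toy_invlist_len =
    sizeof toy_invlist / sizeof toy_invlist[0];

/* Returns 1 if cp is in the set described by the inversion list, else 0.
 * Elements at even indices start a range that is in the set; elements at
 * odd indices start a range that is not. */
static int invlist_contains(const uint32_t *list, size_t len, uint32_t cp)
{
    size_t lo = 0, hi = len;

    assert(list != NULL || len == 0);

    /* Binary search for the number of elements that are <= cp */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count means cp lies at or after the start of an "in" range
     * and before the start of the next "out" range */
    return (int) (lo & 1);
}
```

With the toy data, `invlist_contains(toy_invlist, toy_invlist_len, 0x5A)` is true and the same call with `0x5B` is false, since 0x5B begins an excluded range.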
Karl Williamson [Mon, 19 Dec 2016 18:46:10 +0000 (11:46 -0700)]
Don't assume input UTF-8 is well-formed in to_utf8_case()
This is a deprecated function, but it still should check input validity
as best it can.
This also adds to the pod that it will be removed in 5.28.
Karl Williamson [Mon, 19 Dec 2016 18:23:22 +0000 (11:23 -0700)]
Deprecate toFOO_utf8()
Now that there are _safe versions, deprecate the unsafe ones.
Karl Williamson [Mon, 19 Dec 2016 18:12:48 +0000 (11:12 -0700)]
Convert core to use toFOO_utf8_safe()
Karl Williamson [Mon, 19 Dec 2016 01:05:46 +0000 (18:05 -0700)]
Add toFOO_utf8_safe() macros
Karl Williamson [Fri, 16 Dec 2016 03:11:00 +0000 (20:11 -0700)]
Convert some calls to test for malformations
Code review showed several places in core where a UTF-8 sequence that
was for a code point below 256 could be malformed, and be blindly
accepted. Convert these to use the similar macro that does the check.
One place in regexec.c was not converted because it is working on the
pattern, which perl should have generated itself, so very unlikely to be
be malformed.
I didn't add tests for these, as it would be a pain to figure out
somehow to trigger them, and this is precautionary, based on code
reading rather than any known field experience.
Karl Williamson [Sun, 18 Dec 2016 23:33:08 +0000 (16:33 -0700)]
Don't assume input to case change macros is valid
Experience has shown that they can be invalid, and this commit now checks
for that. Further checking will be done in the next commit.
Karl Williamson [Wed, 14 Dec 2016 20:02:06 +0000 (13:02 -0700)]
For character case changing, create macros and use
This creates several macros that future commits will use to provide a
layer between the caller and the function.
Karl Williamson [Wed, 14 Dec 2016 20:00:45 +0000 (13:00 -0700)]
regcomp.c, mathoms.c: Convert to use preferred macro
Better to use the macro than to directly call the function it wraps
Karl Williamson [Sat, 24 Dec 2016 04:29:15 +0000 (21:29 -0700)]
Scalar::List-Utils/t/tainted.t: Skip failing tests
These randomly fail, often enough that most smokes do not show a
pass; hence there is continual work involved in looking at smoke
summaries and seeing everything failing, and having to do further
investigation on each one to know if the failure is because of this bug,
or something else. The fix for this has been delayed, so I'm creating a
temporary skip. This will start failing again, unless fixed, at the
next 5.25 dot release.
I'm not getting any failures about having customized this cpan module,
though I've tried several different configurations to do so. I fear
that when pushed, these will start appearing, but then it can be easily
remedied.
Karl Williamson [Mon, 12 Dec 2016 04:07:27 +0000 (21:07 -0700)]
perlapi: Italicize some C<text> that isn't as-is
This text appears in the middle of C<>, but is meant to be substituted
for, instead of being typed in as-is.
Karl Williamson [Mon, 12 Dec 2016 04:01:21 +0000 (21:01 -0700)]
handy.h: White-space, comment only
Karl Williamson [Mon, 12 Dec 2016 03:53:54 +0000 (20:53 -0700)]
utf8.c: Add flag to indicate unsure as to end of string to print
When decoding a UTF-8 encoded string, we may have guessed as to how long
it is. This adds a flag so that the base level decode routine knows
that it is a guess, and it minimizes what gets printed, rather than the
normal full information, so as to minimize reading past the end of the
string
Karl Williamson [Fri, 16 Dec 2016 02:51:26 +0000 (19:51 -0700)]
Deprecate isFOO_utf8() macros
These macros are being replaced by a safe version; they now generate a
deprecation message at each call site upon the first use there in each
program run.
Karl Williamson [Mon, 12 Dec 2016 03:35:09 +0000 (20:35 -0700)]
regexec.c: Make isFOO_lc() non-static
This is in preparation for it to be called from outside this file.
Karl Williamson [Fri, 9 Dec 2016 05:01:58 +0000 (22:01 -0700)]
utf8.c: White space, comments only
This indents code because a new block was formed around it. It also
does a few other white-space changes to fit in 79 columns, and removes
an unbalanced '{' in a comment so editors that find matching pairs
aren't fooled, and adds text to another comment
Karl Williamson [Sun, 11 Dec 2016 01:01:39 +0000 (18:01 -0700)]
Allow allowing UTF-8 overflow malformation
perl has never allowed the UTF-8 overflow malformation, for some reason.
But as long as overflows are turned into the REPLACEMENT CHARACTER,
there is no real reason not to. And making it allowable allows code
that wants to carry on in the face of malformed input to do so, without
risk of contaminating things, as the REPLACEMENT is the Unicode
prescribed way of handling malformations.
Karl Williamson [Sat, 10 Dec 2016 22:26:24 +0000 (15:26 -0700)]
Return REPLACEMENT for UTF-8 overlong malformation
When perl decodes UTF-8 into a code point, it must decide what to do if
the input is malformed in some way. When the flags passed to the decode
function indicate that a given malformation type is not acceptable, the
function returns 0 to indicate failure; on success it returns the decoded
code point (unfortunately that may require disambiguation if the
input is validly a NUL). As perl evolved, what happened when various
allowed malformations were encountered got stricter and stricter. This
is the final malformation that was not turned into a REPLACEMENT
CHARACTER when the malformation was allowed, and this commit changes to
return that. Unlike most other malformations, the code point value of
an overlong is well-defined, and that is why it hadn't been changed
heretofore. But it is safer to use the Unicode prescribed behavior on
all malformations, which is to replace them with the REPLACEMENT
CHARACTER. Just in case there is code that requires the old behavior,
it is retained, but you have to search the source for the undocumented
flag that enables it.
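The behavior described above can be illustrated with a small decoder. This is only a sketch under stated assumptions, not the utf8.c routine: `decode_or_replacement` is a hypothetical name, only sequences of up to four bytes are handled (Perl's own decoder accepts more), and every malformation it detects is unconditionally turned into U+FFFD rather than being governed by flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Hypothetical helper, not a Perl API: decode one UTF-8 sequence starting
 * at s, with len bytes available, and return its code point. Returns
 * U+FFFD for the malformations this sketch checks for, including
 * overlongs. */
static uint32_t decode_or_replacement(const unsigned char *s, size_t len)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t cp;

    assert(s != NULL && len > 0);  /* nothing to decode is a caller bug */

    if (s[0] < 0x80)
        return s[0];
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else
        return REPLACEMENT_CHARACTER;  /* not a start byte this sketch knows */

    if (len < need)
        return REPLACEMENT_CHARACTER;  /* sequence is cut short */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return REPLACEMENT_CHARACTER;  /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Overlong: the value would have fit in a shorter sequence */
    if (cp < min_for_len[need])
        return REPLACEMENT_CHARACTER;

    return cp;
}
```

For example, the two bytes 0xC0 0x80 are an overlong encoding of NUL; the sketch returns U+FFFD for them instead of the well-defined but unsafe value 0.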
Karl Williamson [Wed, 14 Dec 2016 18:38:42 +0000 (11:38 -0700)]
Return REPLACEMENT for UTF-8 empty malformation
The previous commit no longer allows this so-called malformation under
DEBUGGING builds, except if code explicitly changes to request it (or
already explicitly does, but there are no instances of this in CPAN).
If it is explicitly allowed, prior to this commit it returned NUL. If
it wasn't allowed, it returned 0. Most code won't treat these as
different. When returning NUL, it basically is making nothing into
something, which might be exploitable some way by an attacker. The
Unicode accepted way of dealing with malformations is to replace them
with the REPLACEMENT CHARACTER, and so this commit changes things to
conform to this.
Karl Williamson [Mon, 19 Dec 2016 20:25:06 +0000 (13:25 -0700)]
utf8.c: Forbid zero-length malformation under DEBUGGING
Karl Williamson [Sat, 10 Dec 2016 19:51:59 +0000 (12:51 -0700)]
utf8.h: Don't allow zero length malformation unless requested
The bottom level Perl routine that decodes UTF-8 into a code point has
long accepted inputs where the length is specified to be 0, returning a
NUL. It considers this a malformation, which is accepted in some
scenarios, but not others. In consultation with Tony Cook, we decided
this really isn't a malformation, but is a bug in the calling program.
Rather than call the decode routine when it has nothing to decode, it
should just not call it.
This commit removes the acceptance of a zero length string from any of
the canned flag combinations passed to the decode function. One can
convert to specify this flag explicitly, if necessary. However the next
commit will cause this to fail under DEBUGGING builds, as a step towards
removing the capability altogether.
Karl Williamson [Sat, 10 Dec 2016 19:27:19 +0000 (12:27 -0700)]
utf8.h: Renumber flag bits
This creates a gap that will be filled by future commits
Karl Williamson [Tue, 13 Dec 2016 02:42:23 +0000 (19:42 -0700)]
toke.c: Replace infinite loop reading input by bounded
It's safer to have an upper limit on how far you look in your input.
Karl Williamson [Tue, 13 Dec 2016 02:36:36 +0000 (19:36 -0700)]
toke.c: Use fewer branches
This code is true for all ASCII space characters except \n. Rather
than enumerating them with a branch each, use a single lookup, and then
exclude \n
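A minimal sketch of the idea, assuming an ASCII platform: one table lookup answers "is this an ASCII space?", and a single extra comparison removes \n. The table `ascii_space` and the function `is_space_not_newline` are hypothetical names, not the toke.c code.

```c
/* Hypothetical lookup table: 1 for the ASCII space characters, 0 for
 * everything else (unlisted entries are zero-initialized). */
static const unsigned char ascii_space[128] = {
    [' '] = 1, ['\t'] = 1, ['\n'] = 1, ['\r'] = 1, ['\f'] = 1, ['\v'] = 1
};

/* True for every ASCII space character except \n: one table lookup plus
 * one exclusion, instead of a separate branch per character. */
static int is_space_not_newline(unsigned char c)
{
    return c < 128 && ascii_space[c] && c != '\n';
}
```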
Karl Williamson [Tue, 6 Dec 2016 17:15:07 +0000 (10:15 -0700)]
toke.c: Use macro instead of repeating code
toke.c has a macro that does this task. Use it.
Karl Williamson [Tue, 6 Dec 2016 04:50:08 +0000 (21:50 -0700)]
toke.c: White-space only
Karl Williamson [Wed, 14 Dec 2016 01:34:12 +0000 (18:34 -0700)]
toke.c: Convert to use isFOO_utf8_safe() macros
Karl Williamson [Wed, 30 Nov 2016 16:53:17 +0000 (09:53 -0700)]
Convert core (except toke.c) to use isFOO_utf8_safe()
The previous commit added this feature; now this commit uses it in core.
toke.c is deferred to the next commit to aid in possible future
bisecting, because some of the changes there seem somewhat more likely
to expose bugs.
Karl Williamson [Thu, 15 Dec 2016 23:30:27 +0000 (16:30 -0700)]
Add isFOO_utf8_safe() macros
The original API does not check that we aren't reading beyond the end of
a buffer, apparently assuming that we could keep malformed UTF-8 out by
use of gatekeepers, but that is currently impossible. This commit adds
"safe" macros for determining if a UTF-8 sequence represents
an alphabetic, a digit, etc. Each new macro has an extra parameter
pointing to the end of the sequence, so that looking beyond the input
string can be avoided.
The macros aren't currently completely safe, as they don't test that
there is at least a single valid byte in the input, except by an
assertion in DEBUGGING builds. This is because typically they are
called in code that makes that assumption, and frequently tests the
current byte for one thing or another.
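The shape of such a "safe" check might look like the sketch below. It is not the real macro: `is_digit_utf8_safe_sketch` is a hypothetical function, it recognizes only ASCII digits and U+0660..U+0669 as a toy classification, and it simply returns false on malformed or truncated input, which is an assumption rather than a description of what the real macros do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a "safe" classifier: p points at the start of
 * a UTF-8 sequence and e points just past the end of the buffer. */
static int is_digit_utf8_safe_sketch(const unsigned char *p,
                                     const unsigned char *e)
{
    size_t need, i;

    /* As in the description above, at least one byte is assumed to be
     * present; that is only checked by an assertion. */
    assert(p < e);

    if (*p < 0x80)
        return *p >= '0' && *p <= '9';

    if      ((*p & 0xE0) == 0xC0) need = 2;
    else if ((*p & 0xF0) == 0xE0) need = 3;
    else if ((*p & 0xF8) == 0xF0) need = 4;
    else
        return 0;  /* not a start byte this sketch knows */

    /* The extra end parameter is what makes this check possible: never
     * look at bytes at or beyond e. */
    if ((size_t) (e - p) < need)
        return 0;

    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;

    /* Toy classification: U+0660..U+0669 are encoded as 0xD9 0xA0..0xA9 */
    return need == 2 && p[0] == 0xD9 && p[1] >= 0xA0 && p[1] <= 0xA9;
}
```

Passing `e` one byte short of a two-byte digit makes the sketch return false instead of reading the byte beyond the buffer.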
Karl Williamson [Sat, 3 Dec 2016 19:14:33 +0000 (12:14 -0700)]
toke.c: Avoid a conversion to/from UTF-8
If the source file is encoded as UTF-8, we don't have to find its code
point equivalent when parsing--we can just copy it unchanged. This
wasn't done before because of the fear the input would be malformed, and
finding the code point had the side effect of checking for
well-formedness. The previous commit added wellformedness checking,
so doing it again here would be redundant.
Karl Williamson [Fri, 2 Dec 2016 16:35:53 +0000 (09:35 -0700)]
PATCH: [perl #126310] single quote UTF-8 malformation detection
This adds UTF-8 wellformedness checking in Perl_lex_next_chunk, which
should get called for all program text, so this makes sure the entire
program is well-formed, not just single- or double-quoted strings.
Karl Williamson [Thu, 8 Dec 2016 04:08:38 +0000 (21:08 -0700)]
Die on malformed isFOO_utf8() input
At the p5p core hackathon in November 2016, it was decided to make the
previous deprecation message fatal for malformed input passed to the
isFOO_utf8() macros and friends.
Karl Williamson [Fri, 9 Dec 2016 15:45:18 +0000 (08:45 -0700)]
Use fnc to force out malformed warnings
The previous commit added a function to do this task. This current
commit changes the several places in the core that have heretofore
done this in an ad-hoc (and not as reliable) manner to use the new
function.
A couple of messages in toke.c are left in so as to avoid changing
diagnostics unnecessarily. If those messages had been created in the
project after the enhanced malformation warnings were created, they
would have been phrased differently.
Some of the methods weren't so reliable because they relied on
fatalizing the warning message. However, if warnings are turned off, it
never gets to the point of outputting, hence doesn't necessarily die.
Karl Williamson [Thu, 8 Dec 2016 03:48:40 +0000 (20:48 -0700)]
Add fnc to force out UTF-8 malform warnings at death
The bottom level UTF-8 decode routine now generates detailed messages
when it encounters malformations. In some instances these should be
treated as croak reasons and output even if warnings are off, just
before dying. This commit adds a function to do this.
John Lightsey [Fri, 23 Dec 2016 17:35:45 +0000 (12:35 -0500)]
Switch most open() calls to three-argument form.
Switch from two-argument form. Filehandle cloning is still done with the two
argument form for backward compatibility.
Committer: Get all porting tests to pass. Increment some $VERSIONs.
Run: ./perl -Ilib regen/mk_invlists.pl; ./perl -Ilib regen/regcharclass.pl
For: RT #130122
Karl Williamson [Fri, 23 Dec 2016 01:36:33 +0000 (18:36 -0700)]
Silence win32 compiler warning
The function's parameter was not declared const in embed.fnc, but was in
the function itself.
Karl Williamson [Fri, 23 Dec 2016 01:24:26 +0000 (18:24 -0700)]
toke.c: Silence win32 compiler warning.
Karl Williamson [Sun, 18 Dec 2016 00:25:29 +0000 (17:25 -0700)]
utf8.c: Extract common code into macros
The 3 case changing functions: to upper, lower, and title case are
essentially identical except for what they call to actually do the
change; those being different macros or functions.
The fourth function, to fold, is identical to the other three for the
first part of its code, but diverges at the end in order to handle some
special cases.
This commit replaces the first part of the bodies of these 4 functions
by a common macro. And it replaces the remainder of the first 3
functions by another common macro.
I'm not a fan of this kind of macro to use in generating code, but it
seems the best way to keep these definitions in sync. (It has to be a
macro instead of a function because one of the parameters is a macro,
which you can't pass to a function. I suppose one could create
functions that just call their macro, and get around it that way, but
it doesn't seem worth it.)
This commit just moved the code to the macro, and I manually verified
that there were no logic changes.
1 of the passed-in functions requires one less argument (the final one)
than the other 3. I originally tried to do something with the C
preprocessor to get around that, but it didn't work with the Win32
version of the preprocessor, so I gave up and added a dummy parameter to
the fourth function, which is static so that's ok to do. Below, for the
record is my original attempt:
/* These two macros are used to make optional a parameter to the
* passed-in function to the macros just above. If the passed-in
* function doesn't take the parameter, use PLACEHOLDER in the macro
* call; otherwise surround the parameter by a PARAM() call */
#define PARAM(parameter) ,parameter
#define PLACEHOLDER /* Something for the preprocessor to grab onto */
And within the macro, it called the function like this:
L1_func(*p, ustrp, lenp/*,*/ L1_func_extra_param)
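The reason given above for using a code-generating macro (one of the parameters is itself a macro) can be shown on a much smaller scale. Everything in this sketch is hypothetical and unrelated to the actual utf8.c macros: `TO_UPPER_A`, `TO_LOWER_A`, and `DEFINE_CASE_CHANGER` are invented names, and only ASCII letters are handled.

```c
#include <stddef.h>

/* Hypothetical per-character changers, deliberately written as macros.
 * A macro has no address, so it cannot be passed to a function. */
#define TO_UPPER_A(c) (((c) >= 'a' && (c) <= 'z') ? (c) - 'a' + 'A' : (c))
#define TO_LOWER_A(c) (((c) >= 'A' && (c) <= 'Z') ? (c) - 'A' + 'a' : (c))

/* Generates a function called 'name' that applies CHANGE to each of the
 * first len bytes of s. CHANGE may be a macro because it is expanded
 * textually here, at definition time. */
#define DEFINE_CASE_CHANGER(name, CHANGE)        \
    static void name(char *s, size_t len)        \
    {                                            \
        size_t i;                                \
        for (i = 0; i < len; i++)                \
            s[i] = (char) CHANGE(s[i]);          \
    }

/* The two generated bodies stay in sync because they share one template */
DEFINE_CASE_CHANGER(upcase_ascii, TO_UPPER_A)
DEFINE_CASE_CHANGER(downcase_ascii, TO_LOWER_A)
```

The alternative mentioned in the message, wrapping each macro in a tiny function and passing a function pointer, would also work, at the cost of an extra function per macro.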
Karl Williamson [Sun, 18 Dec 2016 20:38:01 +0000 (13:38 -0700)]
APItest/t/handy.t: Bring final special case into loop
All the tests in this file are now in two loops, one for the isFOO()
macros, and the other for the toFOO() macros. Thus the main logic
applies to all, and tests can be added or changed easily.
Karl Williamson [Sun, 18 Dec 2016 20:17:45 +0000 (13:17 -0700)]
APItest/t/handy.t: White-space only
Indent newly formed block
Karl Williamson [Sun, 18 Dec 2016 19:40:06 +0000 (12:40 -0700)]
APItest/t/handy.t: Add more tests
Macros with the '_uvchr' suffix were not being tested at all. Instead,
the undocumented backwards-compatibility-only macros with the suffixes
_uni were being tested, but these might diverge, and the tests wouldn't
find that.
Karl Williamson [Sun, 18 Dec 2016 18:55:49 +0000 (11:55 -0700)]
APItest/t/handy.t: Add more tests
The macros like isALPHA() were not getting tested; instead the theory
being that testing isALPHA_A() was good enough because they are #defined
to be the same. But that might change and the tests wouldn't uncover
that. And it turned out that some things weren't getting tested at all
if there was no _A version of the macro, for example isALNUM(). This
commit adds tests for the versions of the isFOO() macros with no suffix.
Karl Williamson [Sun, 18 Dec 2016 02:43:28 +0000 (19:43 -0700)]
APItest/t/handy.t: Use abbrev. char name in test names
I got tired of seeing all these long character names fly by on my screen
while testing, so this changes to use any official Unicode abbreviation
when available. It's kind of silly to do this in this test, but I might
extract and improve this for more general use in tests of characters in
the future.
This also changes some imports so that the full module name need not
always be specified.
Karl Williamson [Sun, 18 Dec 2016 02:22:14 +0000 (19:22 -0700)]
APItest/t/handy.t: White-space only
indent newly formed block.
Karl Williamson [Sun, 18 Dec 2016 02:19:39 +0000 (19:19 -0700)]
APItest/t/handy.t: Fold in another special case
The previous commit revamped this .t to make most things
part of a single loop. This adds another thing that was outside it.
Karl Williamson [Thu, 15 Dec 2016 23:12:30 +0000 (16:12 -0700)]
APItest/t/handy.t: Refactor for maintenance
Over the years code has kept getting copied and modified slightly in
each new place. And a future commit would create still more. This cuts
down the number of slightly different versions to the minimum reasonably
attainable.
Tony Cook [Wed, 14 Dec 2016 03:24:08 +0000 (14:24 +1100)]
(perl #130335) fix numeric comparison for sort's built-in compare
For non-'use integer' this would always compare as NVs, but with
64-bit integers and non-long doubles, integers can have more
significant digits, making the sort <=> replacement less precise
than the <=> operator.
Use the same code to perform the comparison that <=> does, which
happens to be handily broken out into Perl_do_ncmp().
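The precision loss is easy to reproduce outside Perl. The sketch below assumes IEEE 754 doubles with a 53-bit significand and handles only signed 64-bit values; `cmp_as_double` and `cmp_as_int64` are hypothetical names, and the real Perl_do_ncmp() covers more cases than this.

```c
#include <stdint.h>

/* Compare by first converting both values to double, which is roughly
 * what always comparing as NVs amounts to. */
static int cmp_as_double(int64_t a, int64_t b)
{
    double x = (double) a;
    double y = (double) b;

    return (x > y) - (x < y);
}

/* Compare as integers, keeping all 64 bits of precision */
static int cmp_as_int64(int64_t a, int64_t b)
{
    return (a > b) - (a < b);
}
```

With a = 9007199254740992 (2**53) and b = a + 1, the integer comparison reports a < b, while under the stated assumption the double comparison reports them equal, because b is not exactly representable as a double.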
James E Keenan [Tue, 20 Dec 2016 23:27:36 +0000 (18:27 -0500)]
Clarify use of 'continue' keyword after 'given'.
For: RT #130324
Karl Williamson [Thu, 22 Dec 2016 04:31:06 +0000 (21:31 -0700)]
t/uni/variables.t: Test what it purports to test
One of the tests wasn't testing what it thought it was, since evalbytes
downgrades the input if it is UTF-8 encoded. Therefore, this needs to
use unicode_eval, as the other places in the .t that do similar things
use.
Karl Williamson [Tue, 20 Dec 2016 20:07:23 +0000 (13:07 -0700)]
toke.c: Simplify finding mirror-image close delimiter
This is the code that figures out what the closing delimiter is for a
given opening string delimiter. For most, it is the same character,
but for a few, it is a mirror-image character.
I have had to figure out multiple times how these couple of lines of
code work. This time, as I started to comment them so I wouldn't have to
figure it out again, I realized that the cleverness wasn't really saving
anything, and might slow things down. So split it into two parallel
strings, with one string containing the opening delimiters which have
mirror image closing ones, and the other containing those closing
delimiters, in the same order. We find the offset into the first string
of the opening delimiter. If it isn't in that string, it isn't mirrored,
but if it is, the corresponding closing delimiter is found at the same
offset in the other string.
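The parallel-string technique described above can be sketched as follows. This is an illustration, not the toke.c code: `closing_delimiter` is a hypothetical name, and the two strings list the four bracketing pairs that perlop documents.

```c
#include <string.h>

/* Hypothetical helper: return the closing delimiter that corresponds to
 * the given opening delimiter. */
static char closing_delimiter(char open)
{
    /* Parallel strings: closing[i] is the mirror image of opening[i] */
    static const char opening[] = "([{<";
    static const char closing[] = ")]}>";
    const char *pos;

    if (open == '\0')  /* strchr() would match the string terminator */
        return open;

    pos = strchr(opening, open);

    /* Not found: the delimiter is not mirrored, so it closes itself.
     * Found: the closer sits at the same offset in the other string. */
    return pos ? closing[pos - opening] : open;
}
```

So `closing_delimiter('{')` yields `'}'`, while `closing_delimiter('/')` yields `'/'`.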
Karl Williamson [Tue, 20 Dec 2016 18:43:08 +0000 (11:43 -0700)]
toke.c: Skip some work for UTF-8 invariant
Since these chars are the same when encoded in UTF-8 as when not, no
need to do the extra UTF-8 work.
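As a rough illustration of what "invariant" means here, the sketch below assumes an ASCII platform, where the invariant characters are exactly the bytes below 0x80 (on EBCDIC platforms the set differs). Both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, ASCII platforms only: a byte below 0x80 is
 * represented by the same single byte whether or not the string is
 * UTF-8 encoded. */
static int is_utf8_invariant_ascii(unsigned char c)
{
    return c < 0x80;
}

/* Number of bytes needed to store the character with native code 0..255
 * in UTF-8: invariants need one byte, the rest need two. */
static size_t utf8_len_of_byte(unsigned char c)
{
    return is_utf8_invariant_ascii(c) ? 1 : 2;
}
```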
Karl Williamson [Tue, 20 Dec 2016 21:37:11 +0000 (14:37 -0700)]
pod/perlop: Note that need space between op and \w delim
You can't say qqXfooX because it thinks it is all one bareword. Note
this, and that
qq XfooX
works.
Karl Williamson [Thu, 22 Dec 2016 17:53:56 +0000 (10:53 -0700)]
PerlIO-scalar: Bump version to 0.26
David Mitchell [Thu, 22 Dec 2016 10:27:40 +0000 (10:27 +0000)]
PerlIOScalar_eof(): silence compiler warning:
scalar.xs:23:15: warning: variable ‘p’ set but not used
[-Wunused-but-set-variable]
char *p;
I'm not sure why this has only started warning, but this commit shuts it
up anyway.
Chris 'BinGOs' Williams [Wed, 21 Dec 2016 13:59:31 +0000 (13:59 +0000)]
Deck the halls
James E Keenan [Wed, 21 Dec 2016 13:29:25 +0000 (08:29 -0500)]
Correct version number for Module::CoreList.
Sawyer X [Wed, 21 Dec 2016 10:20:54 +0000 (11:20 +0100)]
Update of Module::CoreList
Sawyer X [Wed, 21 Dec 2016 09:39:22 +0000 (10:39 +0100)]
Merge branch 'blead' of ssh://perl5.git.perl.org/perl into blead
Sawyer X [Wed, 21 Dec 2016 09:39:15 +0000 (10:39 +0100)]
Link to epigraph
James E Keenan [Tue, 20 Dec 2016 23:07:55 +0000 (18:07 -0500)]
Bump Module::CoreList version following 5.25.8 release.
Sawyer X [Tue, 20 Dec 2016 20:19:08 +0000 (21:19 +0100)]
Bump the perl version in various places for 5.25.9
Sawyer X [Tue, 20 Dec 2016 19:36:47 +0000 (20:36 +0100)]
New perldelta
Sawyer X [Tue, 20 Dec 2016 19:31:35 +0000 (20:31 +0100)]
Tick off release
Sawyer X [Tue, 20 Dec 2016 19:31:04 +0000 (20:31 +0100)]
Update epigraph. Will add link later
Sawyer X [Tue, 20 Dec 2016 19:29:09 +0000 (20:29 +0100)]
Merge branch 'release-5.25.8' into blead
Karl Williamson [Tue, 20 Dec 2016 17:00:27 +0000 (10:00 -0700)]
perldelta: Fix typo
Sawyer X [Tue, 20 Dec 2016 16:46:10 +0000 (17:46 +0100)]
add new release to perlhist
Sawyer X [Tue, 20 Dec 2016 16:27:02 +0000 (17:27 +0100)]
Pod typos
Sawyer X [Tue, 20 Dec 2016 16:23:29 +0000 (17:23 +0100)]
Update perldelta for 5.25.8
Sawyer X [Tue, 20 Dec 2016 15:55:34 +0000 (16:55 +0100)]
Update Module::CoreList for 5.25.8
Yves Orton [Tue, 20 Dec 2016 01:15:26 +0000 (02:15 +0100)]
document removal of non-standard hash function build options in perldelta
Karl Williamson [Mon, 19 Dec 2016 21:04:10 +0000 (14:04 -0700)]
utfebcdic.h: Fix typo in comment
Spotted by Christian Hansen
Karl Williamson [Mon, 19 Dec 2016 20:25:20 +0000 (13:25 -0700)]
perlapi: Expand on utf8n_to_uvchr_error
Karl Williamson [Sun, 18 Dec 2016 20:57:46 +0000 (13:57 -0700)]
perlapi: Add explanation for why certain macros don't exist.
This also fixes some orphaned references.
Chris 'BinGOs' Williams [Mon, 19 Dec 2016 09:26:24 +0000 (09:26 +0000)]
Update Test-Simple to CPAN version 1.302073
James E Keenan [Sun, 18 Dec 2016 23:20:56 +0000 (18:20 -0500)]
Correct one spelling error.
Tony Cook [Sun, 18 Dec 2016 22:28:18 +0000 (09:28 +1100)]
perldelta for 6b2c7479, dd688536
James E Keenan [Sun, 18 Dec 2016 21:16:41 +0000 (16:16 -0500)]
perldelta for Test-Simple 1.302067 to 1.302071.
James E Keenan [Sun, 18 Dec 2016 14:01:09 +0000 (09:01 -0500)]
Upgrade Test-Simple to 1.302071.
Had to run ./perl -Ilib regen/lib_cleanup.pl.
James E Keenan [Sun, 18 Dec 2016 20:58:36 +0000 (15:58 -0500)]
Revert "Update Socket to CPAN version 2.024."
This reverts commit 3e7b45e4a2b8308f16a5ca9443c3f6b8caafe0a6.
Reason: test failures on Win32 not yet fully addressed.
Chris 'BinGOs' Williams [Sun, 18 Dec 2016 11:50:16 +0000 (11:50 +0000)]
Update Archive-Tar to CPAN version 2.24
[DELTA]
2.24 16/12/2016 (SREZIC)
- Handle tarballs compressed with pbzip2 (RT #119262)
James E Keenan [Sat, 17 Dec 2016 22:43:00 +0000 (17:43 -0500)]
Update Socket to CPAN version 2.024.
'porting/customized.t --regen' for Socket.pm and .xs
perldelta entry for Socket 2.020 to 2.024 upgrade.
James E Keenan [Sat, 17 Dec 2016 22:22:59 +0000 (17:22 -0500)]
perldelta entry for Pod::Simple 3.32 to 3.35 upgrade.
James E Keenan [Sat, 17 Dec 2016 22:14:13 +0000 (17:14 -0500)]
Update Pod-Simple to CPAN version 3.35.
From ChangeLog: Stabilize t/search50.t (per rurban). Turn off utf8 warnings
when trying to see if a file is UTF-8 or not.
David Mitchell [Fri, 16 Dec 2016 13:07:58 +0000 (13:07 +0000)]
regexes: make scanning for ANYOF faster
Given a character class of random chars (like [acgt] say, rather than
predefined ones like [\d], say), speed up the code in:
1) S_find_byclass(), which scans for the first char in the string that's
in that class (e.g. /[acgt]...../),
2) S_regrepeat() which scans past all chars that are in that class
(e.g. /....[acgt]+..../)
by hoisting an unchanging test outside the main while loop. So this:
    while (s < end) {
        if (ANYOF_FLAGS(node))
            match = reginclass(*s, ...);
        else
            match = ANYOF_BITMAP_TEST(*s, ...);
        ...
    }
becomes this:
    if (ANYOF_FLAGS(node)) {
        while (s < end) {
            match = reginclass(*s, ...);
            ...
        }
    }
    else {
        while (s < end) {
            match = ANYOF_BITMAP_TEST(*s, ...);
            ...
        }
    }
The average of the 3 tests added to t/perf/benchmarks by this commit show
this change (raw numbers, lower better):
          before    after
        -------- --------
     Ir   3294.0   2763.0
     Dr    900.7    802.3
     Dw    356.0    390.0
   COND    569.0    436.7
    IND     11.0     11.0
 COND_m      1.2      2.0
  IND_m      7.3      7.3
Chris 'BinGOs' Williams [Fri, 16 Dec 2016 10:50:53 +0000 (10:50 +0000)]
Update Archive-Tar to CPAN version 2.22
[DELTA]
2.22 16/12/2016 (MANWAR)
- Add missing strict/warnings pragma to Constants.pm
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:39:45 +0000 (14:39 +0000)]
Update bignum to CPAN version 0.47
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:38:20 +0000 (14:38 +0000)]
Update Math-BigRat to CPAN version 0.2611
[DELTA]
2016-12-13 v0.2611 pjacklam
* Add more logic to Makefile.PL regarding INSTALLDIRS (CPAN RT #119199
and #119225).
2016-12-11 v0.2610 pjacklam
* Fix Makefile.PL so that this module installs over the core version.
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:37:15 +0000 (14:37 +0000)]
Update Math-BigInt-FastCalc to CPAN version 0.5005
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:35:53 +0000 (14:35 +0000)]
Update Math-BigInt to CPAN version 1.999806
[DELTA]
2016-12-13 v1.999806 pjacklam
* Add more logic to Makefile.PL regarding INSTALLDIRS (CPAN RT #119199
and #119225).
* In the TODO file, remove stuff that has been implemented.
2016-12-11 v1.999805 pjacklam
* Fix Makefile.PL so that this module installs over the core version.
* Add more tests for _nok() (binomial coefficient "n over k"). These new tests
revealed some problems with some of the backend libraries when _nok() was
given very large arguments.
* Remove t/Math/BigFloat/#Subclass.pm#, which is an Emacs temporary file
included by accident.
2016-12-07 v1.999804 pjacklam
* Implement as_bytes(), as requested (CPAN RT 119096). Also implement the
inverse conversion from_bytes(). This applies to Math::BigInt only. (Alas,
these methods will be inherited from Math::BigInt into Math::BigFloat,
Math::BigRat etc. where the methods won't work. Fixing this class
relationship is an issue of its own.)
* Implement _as_bytes() and _from_bytes() in Math::BigInt::Lib. Preferably,
the various backend libraries will implement faster versions of their
own. Add author test files for testing these methods thoroughly.
* Fix from_hex(), from_oct(), and from_bin().
- When called as instance methods, the new value should be assigned to the
invocand unless the invocand is read-only (a constant).
- When called as instance methods, the assigned value was incorrect, if the
invocand was inf or NaN.
- Add tests to t/from_hex-mbf.t, t/from_oct-mbf.t, and t/from_bin-mbf.t
to confirm the fix.
- Add new test files t/from_hex-mbi.t, t/from_oct-mbi.t, and
t/from_bin-mbi.t for better testing of these methods with Math::BigInt.
* Correct typo in Math/BigInt/Lib.pm (otherise -> otherwise) (CPAN RT 118829).
* Add POD coverage testing of Math::BigInt::Lib to t/03podcov.t.
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:32:39 +0000 (14:32 +0000)]
Update B-Debug to CPAN version 1.24
[DELTA]
1.24 2016-12-11 rurban
* add 5.25.6 support: split optimization
Chris 'BinGOs' Williams [Thu, 15 Dec 2016 14:31:00 +0000 (14:31 +0000)]
Update Archive-Tar to CPAN version 2.20
[DELTA]
2.20 15/12/2016 (AGRUNDMA)
- Check for gzip/bzip2 before round tripping gz/bz2 files in tests
Karl Williamson [Wed, 14 Dec 2016 18:38:12 +0000 (11:38 -0700)]
perlapi: Clarify entry for utf8n_to_uvchr()
Karl Williamson [Mon, 12 Dec 2016 03:30:33 +0000 (20:30 -0700)]
perlapi: Clarify the isFOO_A() macros meanings
Karl Williamson [Mon, 12 Dec 2016 00:31:59 +0000 (17:31 -0700)]
APItest/t/utf8.t: Fix test name
This test name gave the wrong function being tested.
Karl Williamson [Sat, 10 Dec 2016 19:18:50 +0000 (12:18 -0700)]
APItest/t/utf8.t: White-space, comment only
Wraps lines to fit in 79 columns, removes a comment for development that
shouldn't have been committed in the first place.
Karl Williamson [Sat, 10 Dec 2016 18:03:58 +0000 (11:03 -0700)]
APItest/t/utf8.t: Use more idiomatic Perl
This replaces
unless(x) { y }
by
x or y
Spotted by ilmari
Karl Williamson [Mon, 12 Dec 2016 16:38:24 +0000 (09:38 -0700)]
embed.fnc: Mark some functions as pure
Some of these were identified by me, and some by Andy Lester
Karl Williamson [Mon, 12 Dec 2016 21:50:34 +0000 (14:50 -0700)]
embed.fnc: Remove pure declaration for fcns that deal with SVs
These aren't actually pure, as dealing with SVs can trigger side
effects, including processing of magic.