This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
David Mitchell [Mon, 11 Nov 2019 10:46:56 +0000 (10:46 +0000)]
add PERL_USE_3ARG_SIGHANDLER macro
There are a bunch of places in core that do
#if defined(HAS_SIGACTION) && defined(SA_SIGINFO)
to decide whether the C signal handler function should be declared with,
and called with, 1 arg or 3 args.
This commit just adds
#if defined(HAS_SIGACTION) && defined(SA_SIGINFO)
# define PERL_USE_3ARG_SIGHANDLER
#endif
Then uses the new macro in all other places rather than checking
HAS_SIGACTION and SA_SIGINFO. Thus there is no functional change; it just
makes the code more readable.
However, it turns out that all is not well with core's use of 1-arg
versus 3-arg, and the next few commits will fix this.
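As a rough illustration of what the macro selects between, the sketch below declares one handler in either its 3-arg or its 1-arg form. It is not code from the commit: demo_handler and demo_install_and_raise are hypothetical names, and HAS_SIGACTION is defined by hand as an assumption, since Perl's Configure normally supplies it.

```c
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L   /* ask libc for sigaction() and siginfo_t */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Assumption: Perl's Configure normally defines HAS_SIGACTION; it is
 * defined by hand here because this sketch targets a POSIX system. */
#ifndef HAS_SIGACTION
#  define HAS_SIGACTION
#endif

#if defined(HAS_SIGACTION) && defined(SA_SIGINFO)
#  define PERL_USE_3ARG_SIGHANDLER
#endif

static volatile sig_atomic_t demo_last_sig = 0;

#ifdef PERL_USE_3ARG_SIGHANDLER
/* 3-arg form, installed through sigaction() with SA_SIGINFO */
static void demo_handler(int sig, siginfo_t *info, void *uap)
{
    (void)info;
    (void)uap;
    demo_last_sig = sig;
}
#else
/* 1-arg form, installed through signal() */
static void demo_handler(int sig)
{
    demo_last_sig = sig;
}
#endif

/* Install the handler for sig, raise sig, and return the signal number
 * the handler recorded (or -1 if installation failed). */
int demo_install_and_raise(int sig)
{
    assert(sig > 0);
#ifdef PERL_USE_3ARG_SIGHANDLER
    struct sigaction sa;
    sa.sa_sigaction = demo_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
#else
    if (signal(sig, demo_handler) == SIG_ERR)
        return -1;
#endif
    raise(sig);
    return (int)demo_last_sig;
}
```

Calling demo_install_and_raise(SIGINT) should return SIGINT on either path; only the handler's declaration and the installation call differ.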
David Mitchell [Thu, 7 Nov 2019 12:30:14 +0000 (12:30 +0000)]
add Siginfo_t
From the code comments:
This is an alias for the OS's siginfo_t, except that where the OS
doesn't support it, declare a dummy version instead. This allows us to
have signal handler functions which always have a Siginfo_t parameter
regardless of platform, (and which will just be passed a NULL value
where the OS doesn't support HAS_SIGACTION).
It doesn't actually do anything useful yet, but will shortly allow
signal handler functions to be rationalised.
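A minimal sketch of the idea in the quoted comment, under stated assumptions: the real definition presumably keys off Perl's own configuration macros, whereas this version only checks SA_SIGINFO, and the names demo_sighandler3, demo_sighandler1, demo_seen_sig, demo_seen_info and the dummy field demo_signo are all hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Assumption: SA_SIGINFO stands in for the real configuration test. */
#if defined(SA_SIGINFO)
typedef siginfo_t Siginfo_t;                    /* alias for the OS type */
#else
typedef struct { int demo_signo; } Siginfo_t;   /* dummy stand-in */
#endif

int demo_seen_sig = 0;    /* last signal number passed to the handler */
int demo_seen_info = 0;   /* 1 if the handler received a non-NULL sip */

/* One signature for every platform: sip may be NULL. */
void demo_sighandler3(int sig, Siginfo_t *sip, void *uap)
{
    (void)uap;
    demo_seen_sig = sig;
    demo_seen_info = (sip != NULL);
}

/* What a platform without siginfo support would call:
 * forward to the common handler with NULL extras. */
void demo_sighandler1(int sig)
{
    assert(sig > 0);
    demo_sighandler3(sig, NULL, NULL);
}
```

The point of the alias is that demo_sighandler3 compiles unchanged on both kinds of platform, so only the thin 1-arg wrapper is platform-specific.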
Karl Williamson [Mon, 18 Nov 2019 03:39:11 +0000 (20:39 -0700)]
regcomp.h: Fix up comment
Karl Williamson [Sat, 21 Sep 2019 15:51:52 +0000 (09:51 -0600)]
Add ANYOFRb regnode
This is like the ANYOFR regnode added in the previous commit, but all
code points in the range it matches are known to have the same first
UTF-8 start byte. That means it can't match UTF-8 invariant characters,
like ASCII, because the "start" byte is different on each one, so it
could only match a range of 1, and the compiler wouldn't generate this
node for that; instead using an EXACT.
Pattern matching can rule out most code points by looking at the first
character of their UTF-8 representation, before having to convert from
UTF-8.
On ASCII this rules out all but 64 2-byte UTF-8 characters from this
simple comparison. 3-byte it's up to 4096, and 4-byte, 2**18, so the
test is less effective for higher code points.
I believe that most UTF-8 patterns that otherwise would compile to
ANYOFR will instead compile to this, as I can't envision real life
applications wanting to match large single ranges. Even the 2048
surrogates all have the same first byte.
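The counts quoted above follow from each UTF-8 continuation byte carrying 6 payload bits. The sketch below works through that arithmetic and the surrogate example. It assumes standard UTF-8 on an ASCII platform (not Perl's extended UTF-8 or UTF-EBCDIC), and the function names are hypothetical, not taken from the regex engine.

```c
#include <assert.h>

/* First byte of the standard UTF-8 encoding of cp. */
unsigned char demo_utf8_start_byte(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return (unsigned char)cp;
    if (cp < 0x800UL)   return (unsigned char)(0xC0 | (cp >> 6));
    if (cp < 0x10000UL) return (unsigned char)(0xE0 | (cp >> 12));
    return (unsigned char)(0xF0 | (cp >> 18));
}

/* Number of code points sharing one start byte in an nbytes-long
 * encoding: 6 payload bits per continuation byte. */
unsigned long demo_cps_per_start_byte(int nbytes)
{
    assert(nbytes >= 2 && nbytes <= 4);
    return 1UL << (6 * (nbytes - 1));
}

/* Nonzero if every code point in [lo, hi] has the same start byte.
 * Checking the two ends suffices because the start byte never
 * decreases as the code point increases. */
int demo_range_shares_start_byte(unsigned long lo, unsigned long hi)
{
    assert(lo <= hi);
    return demo_utf8_start_byte(lo) == demo_utf8_start_byte(hi);
}
```

For instance, U+D800 and U+DFFF both start with byte 0xED, so the whole surrogate range passes the check, while U+0100..U+017F spans two start bytes and does not.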
Karl Williamson [Thu, 19 Sep 2019 22:03:04 +0000 (16:03 -0600)]
Add ANYOFR regnode
This matches a single range of code points. It is both faster and
smaller than other ANYOF-type nodes, requiring, after set-up, a single
subtraction and conditional branch.
The vast majority of Unicode properties match a single range (though
most of the properties likely to be used in real world applications have
more than a single range). But things like [ij] are a single range, and
those are quite commonly encountered. This new regnode matches them more
efficiently than a bitmap would, and doesn't require the space for one
either.
The flags field is used to store the minimum matchable start byte for
UTF-8 strings, and is ignored for non-UTF-8 targets. This, like ANYOFH
nodes which have a similar mechanism, allows for quick weeding out of
many possible matches without having to convert the UTF-8 to its
corresponding code point.
This regnode packs the 32 bit argument with 20 bits for the minimum code
point the node matches, and 12 bits for the maximum range. If the input
is a value outside these, it simply won't compile to this regnode,
instead going to one of the ANYOFH flavors.
ANYOFR is sufficient to match all of Unicode except for the final
(private use) 65K plane.
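To make the packing and the "single subtraction and conditional branch" concrete, here is a small sketch. The 20/12 bit split comes from the message above, but which half of the word holds which field is an assumption made here, and demo_anyofr_pack and demo_anyofr_match are hypothetical names rather than the engine's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BASE_BITS   20   /* bits for the lowest matching code point */
#define DEMO_DELTA_BITS  12   /* bits for (highest - lowest) */

/* Pack the range [lo, hi] into *arg. Returns 0 if it does not fit, in
 * which case a compiler would have to pick some other node type. */
int demo_anyofr_pack(uint32_t lo, uint32_t hi, uint32_t *arg)
{
    assert(arg != NULL);
    if (lo > hi)
        return 0;
    if (lo >= (UINT32_C(1) << DEMO_BASE_BITS))
        return 0;
    if (hi - lo >= (UINT32_C(1) << DEMO_DELTA_BITS))
        return 0;
    *arg = (lo << DEMO_DELTA_BITS) | (hi - lo);
    return 1;
}

/* Match with one unsigned subtraction and one comparison: if cp is
 * below the base, the subtraction wraps to a huge value and fails. */
int demo_anyofr_match(uint32_t arg, uint32_t cp)
{
    uint32_t base  = arg >> DEMO_DELTA_BITS;
    uint32_t delta = arg & ((UINT32_C(1) << DEMO_DELTA_BITS) - 1);
    return (uint32_t)(cp - base) <= delta;
}
```

With [ij] packed as the range 0x69..0x6A, 'i' and 'j' match while 'h' and 'k' do not. A range starting at U+100000 is rejected by the pack step, which corresponds to the final plane being out of reach.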
Karl Williamson [Thu, 19 Sep 2019 21:47:51 +0000 (15:47 -0600)]
regcomp.c: Use variables initialized to macro results
instead of the macros. This is in preparation for the next commit.
Karl Williamson [Thu, 19 Sep 2019 22:04:03 +0000 (16:04 -0600)]
regexec.c: Rmv some unnecessary casts
The called macro does the cast, and this makes it more legible
Karl Williamson [Thu, 19 Sep 2019 20:20:59 +0000 (14:20 -0600)]
regcomp.c: Add parameter to static function
This further decouples this function from knowing details of the calling
structure, by passing this detail in.
Karl Williamson [Wed, 18 Sep 2019 19:20:42 +0000 (13:20 -0600)]
t/re/anyof.t: Add a test
This makes sure a non-folding above-Latin1 character is tested.
Karl Williamson [Wed, 18 Sep 2019 19:12:51 +0000 (13:12 -0600)]
Prefer EXACTish regnodes to ANYOFH nodes
ANYOFH nodes (that match code points above 255) are smaller than regular
ANYOF nodes because they don't have a 256-bit bitmap. But the
disadvantage of them over EXACT nodes is that the characters encountered
must first be converted from UTF-8 to code point to see if they match
the ANYOFH. (The difference is less clearcut with /i, because typically,
currently, the UTF-8 must be converted to code point anyway in order to
fold them.) But the EXACTFish node doesn't have an inversion list to do
lookup in, and occupies less space, because it doesn't have inversion
list data attached to it.
Also there is a bug in using ANYOFH under /l, as wide character warnings
should be emitted if the locale isn't a UTF-8 one.
The reason this change hasn't been made before (by me anyway) is that
the old way avoided upgrading the pattern to UTF-8. But having thought
about this for a long time, to match this node, the target string must
be in UTF-8 anyway, and having a UTF8ness mismatch slows down pattern
matching, as things have to be continually converted, and reconverted
after backtracking.
Karl Williamson [Thu, 7 Nov 2019 17:42:14 +0000 (10:42 -0700)]
Add -Dy debugging of tr///, y///
Karl Williamson [Wed, 18 Sep 2019 18:45:55 +0000 (12:45 -0600)]
t/re/anyof.t: Fix highest range tests
Previously we had infinity minus 1, but infinity should be beyond the
range, and the highest isn't infinity - 1, but the highest legal code
point.
Karl Williamson [Wed, 18 Sep 2019 18:41:41 +0000 (12:41 -0600)]
t/re/anyof.t: Remove duplicate test
These are covered by the single code point tests.
Karl Williamson [Wed, 18 Sep 2019 18:34:23 +0000 (12:34 -0600)]
t/re/anyof.t: Remove invalid test
One shouldn't be able to specify an infinite code point. The tests have
the conceit that one can specify a range's upper limit as infinity, but
that is just shorthand for the range being unbounded.
Karl Williamson [Wed, 18 Sep 2019 18:31:11 +0000 (12:31 -0600)]
re/anyof.t: Clarify failing message
When a test fails, an extra test is run to output debugging info; this
will cause the planned number of tests to be wrong, which will output an
extra, confusing message. This adds an explanation that the number is
expected to be wrong, hence not to worry.
Karl Williamson [Fri, 13 Sep 2019 02:19:07 +0000 (20:19 -0600)]
Allow some optimizations of qr/(?[...])/
Prior to this commit, this construct always returned an ANYOF node, even
if it could be optimized into something else.
Karl Williamson [Thu, 7 Nov 2019 20:54:48 +0000 (13:54 -0700)]
Document SVf format
Tony Cook [Mon, 11 Nov 2019 22:11:34 +0000 (09:11 +1100)]
clean up quadmath_format_*() functions
This includes:
- remove them from the API
- simplify quadmath_format_single()'s interface, and rename it
to match the new interface
fixes #17288
Karl Williamson [Sat, 16 Nov 2019 18:31:23 +0000 (11:31 -0700)]
Remove generation and use of NonFinalFold table
With the revamping done in
cc288b7a2732c37504039083ebb98241954636be, the
table of Unicode case folds that are more than a single character is no
longer used, so there is no need to generate it or to have it available.
Karl Williamson [Fri, 1 Nov 2019 03:30:34 +0000 (21:30 -0600)]
mktables: Fix non-final-fold table
This wasn't generating the correct values. It is no longer used, and
the next commit will remove it, but I wanted to get it right, in case it
is ever needed again.
Karl Williamson [Sat, 16 Nov 2019 18:14:15 +0000 (11:14 -0700)]
Merge branch 'multi-fold' into blead
These few commits fix the code that avoids splitting a multi-character
fold across EXACTFish nodes in regex patterns
Karl Williamson [Thu, 14 Nov 2019 22:26:53 +0000 (15:26 -0700)]
Revamp finding splittable places in /i full node
Commits
3ae8ec479bc65ef004bd856d90b82106186771d9 and
cc1ed6368d665290794d7c24d1dbeb42466e256a didn't actually work.
Tests in pat_advanced.t would have failed, except that optimizations in
the regex engine in the meantime led to the tests not actually testing
what they originally did.
I believe that this finally gets it right for non-/l.
The problem is when an EXACTFish node becomes full, you don't want to
split across a multi-char fold. To use a fairly familiar example, we
can't split between 'ss', as that sequence matches a LATIN SMALL LETTER
SHARP S, and the way the regex engine currently works, it can't see
beyond the current node; it would see one or the other 's' but not the
sequence. So the code backs off one character and checks if it can
split there. If not, it repeats until it finds such a place or gets to
the beginning. If the entire node is all 's'es, for example, there's no
good place to split. So it gives up and takes all of them.
One thing I hadn't realized before is when there are three-character
folds, you can't split if the current position is the beginning of the
three, but also when it is the second of the three.
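The back-off search described above can be sketched as follows. This is a simplified illustration, not the code in regcomp.c: it works on ASCII bytes only, its fold table is a hypothetical two-entry stand-in ("ss" for the sharp s, "ffi" as an example of a three-character fold), and the function names are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if cutting s between s[p-1] and s[p] would fall inside a
 * multi-character fold sequence. */
static int demo_splits_fold(const char *s, size_t len, size_t p)
{
    /* Hypothetical, tiny stand-in for the real fold tables. */
    static const char *const folds[] = { "ss", "ffi" };
    for (size_t i = 0; i < sizeof folds / sizeof folds[0]; i++) {
        size_t n = strlen(folds[i]);
        /* k = how many characters of the sequence lie before the cut */
        for (size_t k = 1; k < n; k++) {
            size_t j;
            if (p < k || p - k + n > len)
                continue;
            for (j = 0; j < n; j++)
                if (tolower((unsigned char)s[p - k + j]) != folds[i][j])
                    break;
            if (j == n)
                return 1;
        }
    }
    return 0;
}

/* The node holds at most max characters of s. Return how many to take:
 * back off one character at a time from max until the cut is safe; if
 * no safe place exists, give up and take all max characters. */
size_t demo_split_point(const char *s, size_t len, size_t max)
{
    assert(s != NULL && max > 0);
    if (len <= max)
        return len;              /* everything fits, nothing to split */
    for (size_t p = max; p > 0; p--)
        if (!demo_splits_fold(s, len, p))
            return p;
    return max;
}
```

For "affib" with room for three characters, the cut after "aff" lands on the second character of "ffi" and the cut after "af" lands on the first, so the search backs off to 1. For "ssss" with room for two, no cut is safe and the result stays at 2.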
Karl Williamson [Thu, 14 Nov 2019 16:55:08 +0000 (09:55 -0700)]
regcharclass.h: Add some macros
These macros will be used in a future commit, and are for
three-character folds. regen/regcharclass*.pl are changed for this
purpose.
Karl Williamson [Thu, 14 Nov 2019 15:43:33 +0000 (08:43 -0700)]
pat_advanced.t: Update test
This test was no longer exercising the code it once had, because ':'
doesn't go in a folded regnode. Change to use a character that does
have a fold.
And doing so shows that something in the past broke this test. This
branch will fix that; in the meantime make some tests TODO
Karl Williamson [Thu, 14 Nov 2019 20:46:53 +0000 (13:46 -0700)]
S_regatom: reinitialize flags if reparsing
Sometimes we have to reparse a node. We need to reset the flags to
avoid contamination from the first parse, where a flag got set by a
character in it that won't actually be in the reparsed version.
Karl Williamson [Thu, 14 Nov 2019 20:30:23 +0000 (13:30 -0700)]
Revamp S_regatom() handling of non-UTF-8 folds
This accomplishes two things. One is that prior to this commit, a
character being added to the node could set some flags before we
determine that the character won't even fit in the node. So the flags
get set inappropriately. This may be harmless except for performance
penalties; I don't know.
The other thing it does is to make sure 'ender' is not changed in the
loop. A future commit will depend on that.
Karl Williamson [Thu, 14 Nov 2019 20:21:40 +0000 (13:21 -0700)]
regcomp.c: Avoid a Copy
By reserving a few more bytes at the beginning of the loop, which will
be given back at the end anyway, we can avoid a temporary variable and a
Copy.
Karl Williamson [Thu, 14 Nov 2019 17:35:15 +0000 (10:35 -0700)]
regcomp.c: White space, comment only
One comment was outdated.
Karl Williamson [Thu, 14 Nov 2019 16:49:50 +0000 (09:49 -0700)]
regen/regcharclass_multi_char_folds.pl: Simplify
This creates a simply named array instead of a more complicated array
ref, so is easier to understand
Karl Williamson [Thu, 14 Nov 2019 16:36:48 +0000 (09:36 -0700)]
regen/regcharclass_multi_char_folds.pl: Use printable char
It makes the result more legible if it uses the printable character
instead of an escape sequence when appropriate.
Although, currently, the value is re-escaped for output. This helped
during debugging.
Karl Williamson [Thu, 14 Nov 2019 16:33:39 +0000 (09:33 -0700)]
regen/regcharclass_multi_char_folds.pl: Fix comments
Max Maischein [Fri, 15 Nov 2019 19:16:32 +0000 (20:16 +0100)]
Initial Windows Github action, adapted from skaji
This supports
* 64bit MSVC 2019 (MSVC142)
* 64bit Mingw64 as supplied by Strawberry Perl
* 64bit Cygwin gcc
* 32bit MSVC 2010 (MSVC100FREE)
Characteristics
* Only clone the repo 10 levels deep (we need only one?)
* Parallel build on the one environment where it works (Cygwin)
* Ready for clcache / ccache, but these need a 100% pass before
Github saves the results to the cache
Karl Williamson [Thu, 31 Oct 2019 19:42:04 +0000 (13:42 -0600)]
Double the number of possible SV types
As per discussion beginning in
http://nntp.perl.org/group/perl.perl5.porters/25656
Karl Williamson [Sat, 16 Nov 2019 12:29:38 +0000 (05:29 -0700)]
regcomp.h: Fix comment
Karl Williamson [Sat, 16 Nov 2019 12:28:41 +0000 (05:28 -0700)]
embed.fnc: Parameter is really a const
Make it so.
Karl Williamson [Sat, 16 Nov 2019 05:23:44 +0000 (22:23 -0700)]
doop.c, op.c: Silence some compiler warnings
Karl Williamson [Fri, 15 Nov 2019 22:01:15 +0000 (15:01 -0700)]
PATCH: gh#17218 memory leak
Indeed, a variable's ref count was not getting decremented.
Dagfinn Ilmari Mannsåker [Mon, 22 Jul 2019 09:44:15 +0000 (10:44 +0100)]
win32: Add more missing wchar.h includes
Dagfinn Ilmari Mannsåker [Mon, 22 Jul 2019 09:37:30 +0000 (10:37 +0100)]
Add missing wchar.h include to Win32API::File
Karl Williamson [Thu, 14 Nov 2019 16:28:14 +0000 (09:28 -0700)]
regcomp.sym: Add detail to some node descriptions
Having this enabled me to more quickly understand what's going on.
A trailing period is removed from some long descriptions to make them
slightly shorter.
Karl Williamson [Thu, 14 Nov 2019 15:33:31 +0000 (08:33 -0700)]
utf8.h: Use MAX() macro instead of its expansion
It makes things a little clearer.
Dagfinn Ilmari Mannsåker [Thu, 14 Nov 2019 12:12:07 +0000 (12:12 +0000)]
Fix FEATURE_${NAME}_IS_ENABLED macro for default features
Commit
9f601cf3bbfa6be3e2ab3468e77a7b79c80ff5cf changed feature checks
from using a hash lookup to a bitmap check, but the macro definition
for enabled-by-default had the wrong macro name for the mask check,
and had `\L` instead of `\U` for the bit macro. Change them all to
use the already-uppercase `$NAME` variable.
We don't actually have any default-enabled features since array_base
was removed, but in converting TonyC's 'noindirect' feature into a
default-enabled 'indirect' feature, I got bitten by this.
Tony Cook [Wed, 13 Nov 2019 22:14:16 +0000 (09:14 +1100)]
perldelta updates
Chris 'BinGOs' Williams [Tue, 12 Nov 2019 21:55:21 +0000 (21:55 +0000)]
James E Keenan [Fri, 8 Nov 2019 15:17:50 +0000 (10:17 -0500)]
Fix: local variable hiding parameter of same name
LGTM provides static code analysis and recommendations for code quality
improvements. Their recent run over the Perl 5 core distribution
identified 12 instances where a local variable hid a parameter of
the same name in an outer scope. The LGTM rule governing this situation
can be found here:
Per: https://lgtm.com/rules/2156240606/
This patch renames local variables in approximately 8 of those instances
to comply with the LGTM recommendation. Suggestions for renamed
variables were made by Tony Cook.
For: https://github.com/Perl/perl5/pull/17281
David Mitchell [Tue, 12 Nov 2019 15:34:55 +0000 (15:34 +0000)]
remove leak in tr/ascii/utf8/
The recent change to use invlists left a bug in S_do_trans_invmap()
whereby it allocated a new temp buf if it knew the resulting string
would be too long, but failed to free the buffer at the end.
Showed up as smokes under ASAN failing these tests:
op/tr_latin1.t
op/tr.t
uni/tr_utf8.t
David Mitchell [Tue, 12 Nov 2019 13:18:43 +0000 (13:18 +0000)]
Memoize: fix test timing
NB: this distro is upstream-CPAN, but there hasn't been a new release in
7 years, so I'm patching this intermittently false positive test
directly in blead.
I reported this issue 4 years ago as
https://rt.cpan.org/Public/Bug/Display.html?id=108382
cpan/Memoize/t/speed.t occasionally fails tests 2 and 5 in bleadperl
smokes.
This is because the time deltas (using "time") have a granularity of 1
sec. The basic test is: run fib() for at least 10 secs, then run again
with memoize and check that it takes less than 1/10th of that time.
However, the first measurement may be exactly 10 secs, while the second
run (no matter how speedy) may be 1 sec if it passes over a tick
boundary. 0.001 is then added to it to avoid division by zero.
The test is ($ELAPSED/$ELAPSED2 > 10). With $ELAPSED = 10 and
$ELAPSED2 = 1.001, the test fails. The easy fix is to run for at least
11 secs rather than 10.
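The arithmetic above can be checked with a few lines of C. This is only a restatement of the test's comparison, assuming whole-second measurements; demo_speed_check is a hypothetical name and the test itself is written in Perl.

```c
#include <assert.h>

/* Mirror of the check ($ELAPSED/$ELAPSED2 > 10), where both times are
 * whole seconds and 0.001 is added to the second one to avoid division
 * by zero. Returns nonzero if the check passes. */
int demo_speed_check(long elapsed_secs, long elapsed2_secs)
{
    double elapsed2;
    assert(elapsed_secs >= 0 && elapsed2_secs >= 0);
    elapsed2 = (double)elapsed2_secs + 0.001;
    return (double)elapsed_secs / elapsed2 > 10.0;
}
```

With 10 and 1 the ratio is about 9.99 and the check fails; with 11 and 1 it is about 10.99 and the check passes, which is why running for at least 11 seconds is enough.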
David Mitchell [Tue, 12 Nov 2019 12:45:29 +0000 (12:45 +0000)]
fix build under PERL_GLOBAL_STRUCT_PRIVATE
sprinkle a few random 'dVAR's at the top of some fns.
Petr Písař [Tue, 12 Nov 2019 08:19:18 +0000 (09:19 +0100)]
Adapt Configure to GCC version 10
I got a notice from Jeff Law <law@redhat.com>:
Your particular package fails its testsuite. This was ultimately
tracked down to a Configure problem. The perl configure script treated
gcc-10 as gcc-1 and turned on -fpcc-struct-return. This is an ABI
changing flag and caused Perl to not be able to interact properly with
the dbm libraries on the system leading to a segfault.
His proposed patch corrected only this one instance of the version
mismatch. Reading the Configure script revealed more issues. This
patch fixes all of them I found.
Please note I do not have GCC 10 available, I tested it by faking the version
with:
--- a/Configure
+++ b/Configure
@@ -4672,7 +4672,7 @@ $cat >try.c <<EOM
int main() {
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#ifdef __VERSION__
- printf("%s\n", __VERSION__);
+ printf("%s\n", "10.0.0");
#else
printf("%s\n", "1");
#endif
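The kind of version mismatch described above can be illustrated in C. This sketch assumes the faulty checks amounted to a prefix match on a leading "1"; the actual tests live in the Configure shell script, and both function names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Buggy idea: any version string beginning with '1' counts as GCC 1,
 * so "10.0.0" is misclassified. */
int demo_naive_is_gcc1(const char *version)
{
    assert(version != NULL);
    return version[0] == '1';
}

/* Safer idea: parse the whole leading number, then compare it. */
int demo_is_gcc1(const char *version)
{
    char *end;
    long major;
    assert(version != NULL);
    major = strtol(version, &end, 10);
    return end != version && major == 1;
}
```

Both functions accept "1.42", but only the naive one also accepts "10.0.0".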
Karl Williamson [Tue, 12 Nov 2019 04:48:46 +0000 (21:48 -0700)]
PATCH: [gh #17185] Improper 'unescaped lbrace' msg
This warning is simply deleted. The possible places where an unescaped
left brace is illegal have been scaled back to avoid breaking more
existing code, and this context will remain legal.
Karl Williamson [Thu, 3 Oct 2019 03:11:14 +0000 (21:11 -0600)]
ext/DynaLoader/dl_aix.xs: Use isDIGIT macro
which is more efficient
Karl Williamson [Sat, 7 Sep 2019 15:18:49 +0000 (09:18 -0600)]
malloc.c: Use isDIGIT macro instead of hand-rolling it
The macro is more efficient
Karl Williamson [Fri, 6 Sep 2019 16:23:26 +0000 (10:23 -0600)]
t/re/regexp.t: Only convert to EBCDIC once
Some tests get added as we go along, and those added tests have already
been converted to EBCDIC if necessary. Don't reconvert, which messes
things up.
Karl Williamson [Fri, 6 Sep 2019 15:49:41 +0000 (09:49 -0600)]
re/regexp.t: Change variable name to be more meaningful
Karl Williamson [Sat, 5 Oct 2019 22:43:10 +0000 (16:43 -0600)]
utf8.h: Use a cast to U8 to avoid an AND
Karl Williamson [Tue, 12 Nov 2019 00:12:45 +0000 (17:12 -0700)]
op.c: Move #endif
Otherwise this fails to compile on EBCDIC
Karl Williamson [Tue, 12 Nov 2019 00:09:13 +0000 (17:09 -0700)]
regen/ebcdic.pl: Allow for declaring table size.
This fixes a bug where xlc requires the size of the array.
Karl Williamson [Thu, 3 Oct 2019 04:04:12 +0000 (22:04 -0600)]
utfebcdic.h: Add comments
Tony Cook [Mon, 11 Nov 2019 03:43:42 +0000 (14:43 +1100)]
handle s being updated without len being updated
fix #17279
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:20:21 +0000 (18:20 +0000)]
Update IO-Compress to CPAN version 2.090
[DELTA]
2.090 9 November 2019
* MANIFEST error for streamzip
https://github.com/pmqs/IO-Compress/issues/6
70dd9bb4d27bd23d47ac9392320f55c124bc347b
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:18:10 +0000 (18:18 +0000)]
Update Compress-Raw-Zlib to CPAN version 2.090
[DELTA]
2.090 9 November 2019
* No Changes
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:17:04 +0000 (18:17 +0000)]
Update Compress-Raw-Bzip2 to CPAN version 2.090
[DELTA]
2.090 9 November 2019
* No Changes
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 17:50:09 +0000 (17:50 +0000)]
Update Module-Load-Conditional to CPAN version 0.70
[DELTA]
0.70 Sun Nov 10 14:28:41 GMT 2019
* Protect ourselves from Module::Metadata parsing problems
[ RT#130939 ]
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 17:48:54 +0000 (17:48 +0000)]
Ten-ten let's do it again
Tomasz Konojacki [Sun, 10 Nov 2019 16:37:50 +0000 (17:37 +0100)]
t/op/fork.t: fix skip condition
'is_miniperl' was being parsed as a bareword so the condition was
always true on Windows.
Tomasz Konojacki [Sun, 10 Nov 2019 06:14:01 +0000 (07:14 +0100)]
win32: fix waitpid(-1, WNOHANG) segfault/panic
waitpid(-1, WNOHANG) would panic or segfault if called when the
thread's message queue is not empty.
Thanks to Erik Jezierski for the report and diagnosis.
[gh #16529]
Steve Hay [Sun, 10 Nov 2019 14:59:55 +0000 (14:59 +0000)]
Import perl5301delta.pod
Steve Hay [Sun, 10 Nov 2019 14:55:51 +0000 (14:55 +0000)]
Update Module-CoreList with data for 5.30.1
Steve Hay [Sun, 10 Nov 2019 14:31:50 +0000 (14:31 +0000)]
Tick off 5.30.1
Steve Hay [Sun, 10 Nov 2019 14:30:53 +0000 (14:30 +0000)]
Add epigraph for 5.30.1
Steve Hay [Sun, 10 Nov 2019 12:20:29 +0000 (12:20 +0000)]
5.30.1 today
Tomasz Konojacki [Sat, 9 Nov 2019 01:26:38 +0000 (02:26 +0100)]
UTF8_CHK_SKIP uses MIN() too
This fixes compilation with Visual C++
Nicolas R [Fri, 8 Nov 2019 17:16:04 +0000 (10:16 -0700)]
sync with cpan release of Devel-PPPort 3.55
Nicolas R [Thu, 7 Nov 2019 18:40:08 +0000 (11:40 -0700)]
Prepare Changelog and version for coming release
Karl Williamson [Sun, 27 Oct 2019 00:53:08 +0000 (18:53 -0600)]
parts/inc/misc: Convert to use ivers()
Doing this showed me a redundant test.
I didn't have to take out the zeros, it just looks better without them.
(cherry picked from commit
9e0e078a1aefa78df3322c87b01323862e05c397)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Sun, 27 Oct 2019 00:51:29 +0000 (18:51 -0600)]
parts/inc/inctools: ivers(): Add version string inputs
(cherry picked from commit
bb54da9a565f7fcf13708a69ed6a33a36bb32745)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Sun, 27 Oct 2019 00:47:58 +0000 (18:47 -0600)]
HACKERS: add more details; use of ivers()
(cherry picked from commit
1f521ca952d70d2a93afa18e7069b162d64949f0)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Thu, 24 Oct 2019 17:26:43 +0000 (11:26 -0600)]
parts/inc/inctools: Add short synonym for int_parse_version
(cherry picked from commit
3df9d356984187e51559b28cd6653cfffef94bff)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Thu, 24 Oct 2019 17:16:24 +0000 (11:16 -0600)]
mktests.PL: Require inctools in .t files
This will allow them to use the functions therein.
(cherry picked from commit
48bb078538a75f644588489b4d59f39d1d3d5711)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Thu, 24 Oct 2019 03:19:47 +0000 (21:19 -0600)]
utf8_to_uvchr_buf() Return proper length
When input UTF-8 is 13 bytes, return 13, even on 32 bit machines where
overflow happens at 7 UTF-8 bytes.
(cherry picked from commit
f379e2ee4277fc855a37b82c6c94294c4e0e8c8d)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Thu, 24 Oct 2019 01:53:32 +0000 (19:53 -0600)]
Regenerate after new backportings
(cherry picked from commit
237f5af008eeb7e48fa94eb14952cc1f37d7807e)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 20:19:16 +0000 (14:19 -0600)]
parts/inc/misc: Change version validity criteria
I'm not sure why I think this is a good idea, but I know Unicode
handling started in 5.6, and am converting to use that criterion
(cherry picked from commit
d69dce5eb7ab3ae39718aad250fcba2189773621)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 20:12:31 +0000 (14:12 -0600)]
Backport isFOO_LC_utf8_safe()
This also involves some test refactoring
(cherry picked from commit
7149d3266cce3561c90c73f57ec932db73105311)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Wed, 23 Oct 2019 00:01:39 +0000 (18:01 -0600)]
parts/inc/misc: Backport some isFOO_LC macros
A few of this class of macros did not go back very far. This makes a
reasonable attempt to get things right, but very early versions may have
some wrong answers, though that is unlikely.
This was complicated by the fact that isascii() and isblank() may not be
available on a given platform. So this just uses the plain non-locale
version for those very early Perl versions.
I didn't add tests. It is hard to portably test locales. The next
commits will backport functions that call these and do have tests.
(cherry picked from commit
a8f88b13766d8f2820f5bba560bb282188124b34)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 19:00:08 +0000 (13:00 -0600)]
parts/inc/misc: White-space only
Mostly indenting a newly formed block
(cherry picked from commit
820b22a12ec9783c819b7f2400201908bceed04d)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 20:03:30 +0000 (14:03 -0600)]
parts/inc/misc: Generalize a test
The result of this commit is a loop that runs once; that will change in
two commits from now, when it runs with different values
(cherry picked from commit
1d9bacdd128c8dc51c3b54b05e9d112a39c25e4a)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 19:28:32 +0000 (13:28 -0600)]
Backport toFOO_uvchr()
(cherry picked from commit
1123d46ee9d608669383de3bf540882072690ad4)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 19:21:03 +0000 (13:21 -0600)]
Backport isFOO_uvchr()
(cherry picked from commit
9ae426cf5b257cb458fcf48427524a6aa4332cad)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 19:10:23 +0000 (13:10 -0600)]
parts/inc/misc: Change internal macro name
This is to distinguish it from a macro with similar intent about to be
added.
(cherry picked from commit
cd875ece2bd9cf79a62016767589f8b5821293d6)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 19:05:44 +0000 (13:05 -0600)]
parts/inc/misc: early toFOLD_utf8_safe() is toLOWER
On early perls, there was no distinction between fold and lowercase, so
just call lower from fold.
(cherry picked from commit
0450a74631276c933399241c46616508ce32c299)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Mon, 21 Oct 2019 21:18:44 +0000 (14:18 -0700)]
parts/inc/misc: Use hash and loop to generalize code
This converts the testing of certain tests that are nearly identical to
use a loop with a hash to store the differences, leading to simpler,
extensible code.
(cherry picked from commit
14ec6258920f199e95c72891c23139f6ff10e511)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Fri, 18 Oct 2019 22:20:17 +0000 (15:20 -0700)]
Backport UTF8_MAXBYTES_CASE
This constant was wrong in earlier perls.
(cherry picked from commit
6e55485b5c4486d7883a50325a42a51dcca42ab8)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Wed, 23 Oct 2019 00:01:05 +0000 (18:01 -0600)]
parts/inc/misc: Fix EBCDIC bug
We were double xlating the underscore
(cherry picked from commit
dc1ee4fcc8bc4d1f627c33228ab30432e57f0a9a)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Fri, 18 Oct 2019 22:19:07 +0000 (15:19 -0700)]
parts/inc/utf8: Refactor a little for clarity
(cherry picked from commit
eddcc8663f05c54bedd22e22242d751af34e61d3)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Fri, 18 Oct 2019 22:09:21 +0000 (15:09 -0700)]
Can test isASCII_utf8_safe to earlier
This doesn't really depend on any UTF-8, so by slightly rewriting it, we
can backport it earlier.
(cherry picked from commit
621fa67cacff3079f35a57baf2bd737b03158601)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Thu, 24 Oct 2019 20:21:06 +0000 (14:21 -0600)]
HACKERS: Update, correct
(cherry picked from commit
c2ab6b2df5c405ba7a4b0e98d14fbd7cc19f70ad)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Karl Williamson [Tue, 22 Oct 2019 18:47:33 +0000 (12:47 -0600)]
Generate latest apidoc.fnc from blead
(cherry picked from commit
535722786ead0041913879fc51ed4eaacc27a693)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Pali [Fri, 25 Oct 2019 13:58:50 +0000 (15:58 +0200)]
Implement G_RETHROW for eval_sv
(cherry picked from commit
73a4fb176de5b198cebeb88d08a57b0ad4bbf1f3)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Pali [Thu, 24 Oct 2019 08:11:10 +0000 (10:11 +0200)]
Partially revert 9f84bc0 which broke generating Makefile for non-blead perl versions
(cherry picked from commit
e85ddb627f9f99461f45ae482506cc276c7e8165)
Signed-off-by: Nicolas R <atoomic@cpan.org>
Nicolas R [Fri, 11 Oct 2019 22:13:28 +0000 (16:13 -0600)]
Make dist generate a fresh PPPort.pm
(cherry picked from commit
d811f7d36e88cdb964f3fece5e2eff8c0d6a252b)
Signed-off-by: Nicolas R <atoomic@cpan.org>