This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
perl5.git
4 years agoFix FEATURE_${NAME}_IS_ENABLED macro for default features
Dagfinn Ilmari Mannsåker [Thu, 14 Nov 2019 12:12:07 +0000 (12:12 +0000)]
Fix FEATURE_${NAME}_IS_ENABLED macro for default features

Commit 9f601cf3bbfa6be3e2ab3468e77a7b79c80ff5cf changed feature checks
from using a hash lookup to a bitmap check, but the macro definition
for enabled-by-default had the wrong macro name for the mask check,
and had `\L` instead of `\U` for the bit macro.  Change them all to
use the already-uppercase `$NAME` variable.

We don't actually have any default-enabled features since array_base
was removed, but in converting TonyC's 'noindirect' feature into a
default-enabled 'indirect' feature, I got bitten by this.
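
As a rough illustration of the case mismatch described above, here is a small Python model of a generator that builds a per-feature bit-macro name. It is not the code in the regen script: the `FEATURE_..._BIT` template, the `bit_macro` helper and the `DEFINED` set are hypothetical, and only the `\L` versus `\U` distinction comes from the commit message.

```python
# Hypothetical set of macros that the generated header actually defines.
DEFINED = {"FEATURE_INDIRECT_BIT"}


def bit_macro(name: str, broken: bool) -> str:
    """Return the bit-macro name a generator might emit for a feature.

    Perl's \\L lowercases interpolated text and \\U uppercases it; the
    'broken' flag models using \\L where \\U was intended.
    """
    cased = name.lower() if broken else name.upper()
    return f"FEATURE_{cased}_BIT"


print(bit_macro("indirect", broken=True))
print(bit_macro("indirect", broken=False))
```

With the lowercased form, the emitted name does not match any defined macro, which is the sort of failure that only shows up once a default-enabled feature exists.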

4 years agoperldelta updates
Tony Cook [Wed, 13 Nov 2019 22:14:16 +0000 (09:14 +1100)]
perldelta updates

4 years agoperldelta for 6bd6308f
Chris 'BinGOs' Williams [Tue, 12 Nov 2019 21:55:21 +0000 (21:55 +0000)]
perldelta for 6bd6308f

4 years agoFix: local variable hiding parameter of same name
James E Keenan [Fri, 8 Nov 2019 15:17:50 +0000 (10:17 -0500)]
Fix: local variable hiding parameter of same name

LGTM provides static code analysis and recommendations for code quality
improvements.  Their recent run over the Perl 5 core distribution
identified 12 instances where a local variable hid a parameter of
the same name in an outer scope.  The LGTM rule governing this situation
can be found here:

Per: https://lgtm.com/rules/2156240606/

This patch renames local variables in approximately 8 of those instances
to comply with the LGTM recommendation.  Suggestions for renamed
variables were made by Tony Cook.

For: https://github.com/Perl/perl5/pull/17281

4 years agoremove leak in tr/ascii/utf8/
David Mitchell [Tue, 12 Nov 2019 15:34:55 +0000 (15:34 +0000)]
remove leak in tr/ascii/utf8/

The recent change to use invlists left a bug in S_do_trans_invmap()
whereby it allocated a new temp buf if it knew the resulting string
would be too long, but failed to free the buffer at the end.

Showed up as smokes under ASAN failing these tests:

    op/tr_latin1.t
    op/tr.t
    uni/tr_utf8.t

4 years agoMemoize: fix test timing
David Mitchell [Tue, 12 Nov 2019 13:18:43 +0000 (13:18 +0000)]
Memoize: fix test timing

NB: this distro is upstream-CPAN, but there hasn't been a new release in
7 years, so I'm patching this intermittently false positive test
directly in blead.

I reported this issue 4 years ago as

    https://rt.cpan.org/Public/Bug/Display.html?id=108382

cpan/Memoize/t/speed.t occasionally fails tests 2 and 5 in bleadperl
smokes.

This is because the time deltas (using "time") have a granularity of 1
sec. The basic test is: run fib() for at least 10 secs, then run again
with memoize and check that it takes less than 1/10th of that time.

However, the first measurement may be exactly 10 secs, while the second
run (no matter how speedy) may be 1 sec if it passes over a tick
boundary. 0.001 is then added to it to avoid division by zero.

The test is ($ELAPSED/$ELAPSED2 > 10).  With $ELAPSED = 10 and
$ELAPSED2 = 1.001, the test fails. The easy fix is to run for at least
11 secs rather than 10.
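
The arithmetic above can be checked with a short Python sketch. The function name is made up for illustration; the 0.001 adjustment and the ratio test are taken from the description.

```python
def speedup_test_passes(elapsed: int, elapsed2_ticks: int) -> bool:
    """Model the check ($ELAPSED / $ELAPSED2 > 10) with 1-second ticks."""
    # 0.001 is added to the second measurement to avoid division by zero.
    elapsed2 = elapsed2_ticks + 0.001
    return elapsed / elapsed2 > 10


# The first run takes exactly 10 ticks; the memoized run crosses one tick.
print(speedup_test_passes(10, 1))
# Running for at least 11 ticks leaves enough margin.
print(speedup_test_passes(11, 1))
```

Note that this only models the boundary case from the message; a memoized run that spans two ticks would still fail, which the fix does not attempt to address.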

4 years agofix build under PERL_GLOBAL_STRUCT_PRIVATE
David Mitchell [Tue, 12 Nov 2019 12:45:29 +0000 (12:45 +0000)]
fix build under PERL_GLOBAL_STRUCT_PRIVATE

sprinkle a few random 'dVAR's at the top of some fns.

4 years agoAdapt Configure to GCC version 10
Petr Písař [Tue, 12 Nov 2019 08:19:18 +0000 (09:19 +0100)]
Adapt Configure to GCC version 10

I got a notice from Jeff Law <law@redhat.com>:

    Your particular package fails its testsuite. This was ultimately
    tracked down to a Configure problem. The perl configure script treated
    gcc-10 as gcc-1 and turned on -fpcc-struct-return. This is an ABI
    changing flag and caused Perl to not be able to interact properly with
    the dbm libraries on the system leading to a segfault.

His proposed patch corrected only this one instance of the version
mismatch. Reading the Configure script revealed more issues. This
patch fixes all of them I found.

Please note I do not have GCC 10 available; I tested it by faking the version
with:

--- a/Configure
+++ b/Configure
@@ -4672,7 +4672,7 @@ $cat >try.c <<EOM
 int main() {
 #if defined(__GNUC__) && !defined(__INTEL_COMPILER)
 #ifdef __VERSION__
-       printf("%s\n", __VERSION__);
+       printf("%s\n", "10.0.0");
 #else
        printf("%s\n", "1");
 #endif
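
A minimal sketch of how a version check can confuse gcc-10 with gcc-1, assuming the check is a shell-style prefix glob such as `1*`. The exact patterns used in Configure are not shown above, so both patterns below are assumptions; Python's `fnmatch` stands in for the shell's `case` matching.

```python
from fnmatch import fnmatchcase


def looks_like_gcc1(version: str, pattern: str) -> bool:
    """Return True if the version string matches the given glob."""
    return fnmatchcase(version, pattern)


# A bare prefix glob matches every version starting with "1".
print(looks_like_gcc1("10.0.0", "1*"))
# Requiring the dot after the major number excludes 10.x.
print(looks_like_gcc1("10.0.0", "1.*"))
print(looks_like_gcc1("1.42", "1.*"))
```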

4 years agoPATCH: [gh #17185] Improper 'unescaped lbrace' msg
Karl Williamson [Tue, 12 Nov 2019 04:48:46 +0000 (21:48 -0700)]
PATCH: [gh #17185] Improper 'unescaped lbrace' msg

This warning is simply deleted.  The possible places where an unescaped
left brace is illegal have been scaled back to avoid breaking more
existing code, and this context will remain legal.

4 years agoext/DynaLoader/dl_aix.xs: Use isDIGIT macro
Karl Williamson [Thu, 3 Oct 2019 03:11:14 +0000 (21:11 -0600)]
ext/DynaLoader/dl_aix.xs: Use isDIGIT macro

which is more efficient

4 years agomalloc.c: Use isDIGIT macro instead of hand-rolling it
Karl Williamson [Sat, 7 Sep 2019 15:18:49 +0000 (09:18 -0600)]
malloc.c: Use isDIGIT macro instead of hand-rolling it

The macro is more efficient

4 years agot/re/regexp.t: Only convert to EBCDIC once
Karl Williamson [Fri, 6 Sep 2019 16:23:26 +0000 (10:23 -0600)]
t/re/regexp.t: Only convert to EBCDIC once

Some tests get added as we go along, and those added tests have already
been converted to EBCDIC if necessary.  Don't reconvert, which messes
things up.

4 years agore/regexp.t: Change variable name to be more meaningful
Karl Williamson [Fri, 6 Sep 2019 15:49:41 +0000 (09:49 -0600)]
re/regexp.t: Change variable name to be more meaningful

4 years agoutf8.h: Use a cast to U8 to avoid an AND
Karl Williamson [Sat, 5 Oct 2019 22:43:10 +0000 (16:43 -0600)]
utf8.h: Use a cast to U8 to avoid an AND

4 years agoop.c: Move #endif
Karl Williamson [Tue, 12 Nov 2019 00:12:45 +0000 (17:12 -0700)]
op.c: Move #endif

Otherwise this fails to compile on EBCDIC

4 years agoregen/ebcdic.pl: Allow for declaring table size.
Karl Williamson [Tue, 12 Nov 2019 00:09:13 +0000 (17:09 -0700)]
regen/ebcdic.pl: Allow for declaring table size.

This fixes a bug where xlc requires the size of the array.

4 years agoutfebcdic.h: Add comments
Karl Williamson [Thu, 3 Oct 2019 04:04:12 +0000 (22:04 -0600)]
utfebcdic.h: Add comments

4 years agohandle s being updated without len being updated
Tony Cook [Mon, 11 Nov 2019 03:43:42 +0000 (14:43 +1100)]
handle s being updated without len being updated

fix #17279

4 years agoUpdate IO-Compress to CPAN version 2.090
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:20:21 +0000 (18:20 +0000)]
Update IO-Compress to CPAN version 2.090

  [DELTA]

  2.090 9 November 2019

      * MANIFEST error for streamzip
        https://github.com/pmqs/IO-Compress/issues/6
        70dd9bb4d27bd23d47ac9392320f55c124bc347b

4 years agoUpdate Compress-Raw-Zlib to CPAN version 2.090
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:18:10 +0000 (18:18 +0000)]
Update Compress-Raw-Zlib to CPAN version 2.090

  [DELTA]

  2.090 9 November 2019

      * No Changes

4 years agoUpdate Compress-Raw-Bzip2 to CPAN version 2.090
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 18:17:04 +0000 (18:17 +0000)]
Update Compress-Raw-Bzip2 to CPAN version 2.090

  [DELTA]

  2.090 9 November 2019

      * No Changes

4 years agoUpdate Module-Load-Conditional to CPAN version 0.70
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 17:50:09 +0000 (17:50 +0000)]
Update Module-Load-Conditional to CPAN version 0.70

  [DELTA]

0.70    Sun Nov 10 14:28:41 GMT 2019

* Protect ourselves from Module::Metadata parsing problems
  [ RT#130939 ]

4 years agoTen-ten let's do it again
Chris 'BinGOs' Williams [Sun, 10 Nov 2019 17:48:54 +0000 (17:48 +0000)]
Ten-ten let's do it again

4 years agot/op/fork.t: fix skip condition
Tomasz Konojacki [Sun, 10 Nov 2019 16:37:50 +0000 (17:37 +0100)]
t/op/fork.t: fix skip condition

'is_miniperl' was being parsed as a bareword so the condition was
always true on Windows.

4 years agowin32: fix waitpid(-1, WNOHANG) segfault/panic
Tomasz Konojacki [Sun, 10 Nov 2019 06:14:01 +0000 (07:14 +0100)]
win32: fix waitpid(-1, WNOHANG) segfault/panic

waitpid(-1, WNOHANG) would panic or segfault if called when the
thread's message queue is not empty.

Thanks to Erik Jezierski for the report and diagnosis.

[gh #16529]

4 years agoImport perl5301delta.pod
Steve Hay [Sun, 10 Nov 2019 14:59:55 +0000 (14:59 +0000)]
Import perl5301delta.pod

4 years agoUpdate Module-CoreList with data for 5.30.1
Steve Hay [Sun, 10 Nov 2019 14:55:51 +0000 (14:55 +0000)]
Update Module-CoreList with data for 5.30.1

4 years agoTick off 5.30.1
Steve Hay [Sun, 10 Nov 2019 14:31:50 +0000 (14:31 +0000)]
Tick off 5.30.1

4 years agoAdd epigraph for 5.30.1
Steve Hay [Sun, 10 Nov 2019 14:30:53 +0000 (14:30 +0000)]
Add epigraph for 5.30.1

4 years ago5.30.1 today
Steve Hay [Sun, 10 Nov 2019 12:20:29 +0000 (12:20 +0000)]
5.30.1 today

4 years agoUTF8_CHK_SKIP uses MIN() too
Tomasz Konojacki [Sat, 9 Nov 2019 01:26:38 +0000 (02:26 +0100)]
UTF8_CHK_SKIP uses MIN() too

This fixes compilation with Visual C++

4 years agosync with cpan release of Devel-PPPort 3.55
Nicolas R [Fri, 8 Nov 2019 17:16:04 +0000 (10:16 -0700)]
sync with cpan release of Devel-PPPort 3.55

4 years agoPrepare Changelog and version for coming release
Nicolas R [Thu, 7 Nov 2019 18:40:08 +0000 (11:40 -0700)]
Prepare Changelog and version for coming release

4 years agoparts/inc/misc: Convert to use ivers()
Karl Williamson [Sun, 27 Oct 2019 00:53:08 +0000 (18:53 -0600)]
parts/inc/misc: Convert to use ivers()

Doing this showed me a redundant test.

I didn't have to take out the zeros, it just looks better without them.

(cherry picked from commit 9e0e078a1aefa78df3322c87b01323862e05c397)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/inctools: ivers(): Add version string inputs
Karl Williamson [Sun, 27 Oct 2019 00:51:29 +0000 (18:51 -0600)]
parts/inc/inctools: ivers(): Add version string inputs

(cherry picked from commit bb54da9a565f7fcf13708a69ed6a33a36bb32745)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoHACKERS: add more details; use of ivers()
Karl Williamson [Sun, 27 Oct 2019 00:47:58 +0000 (18:47 -0600)]
HACKERS: add more details; use of ivers()

(cherry picked from commit 1f521ca952d70d2a93afa18e7069b162d64949f0)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/inctools: Add short synonym for int_parse_version
Karl Williamson [Thu, 24 Oct 2019 17:26:43 +0000 (11:26 -0600)]
parts/inc/inctools: Add short synonym for int_parse_version

(cherry picked from commit 3df9d356984187e51559b28cd6653cfffef94bff)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agomktests.PL: Require inctools in .t files
Karl Williamson [Thu, 24 Oct 2019 17:16:24 +0000 (11:16 -0600)]
mktests.PL: Require inctools in .t files

This will allow them to use the functions therein.

(cherry picked from commit 48bb078538a75f644588489b4d59f39d1d3d5711)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoutf8_to_uvchr_buf() Return proper length
Karl Williamson [Thu, 24 Oct 2019 03:19:47 +0000 (21:19 -0600)]
utf8_to_uvchr_buf() Return proper length

When input UTF-8 is 13 bytes, return 13, even on 32 bit machines where
overflow happens at 7 UTF-8 bytes.

(cherry picked from commit f379e2ee4277fc855a37b82c6c94294c4e0e8c8d)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoRegenerate after new backportings
Karl Williamson [Thu, 24 Oct 2019 01:53:32 +0000 (19:53 -0600)]
Regenerate after new backportings

(cherry picked from commit 237f5af008eeb7e48fa94eb14952cc1f37d7807e)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Change version validity criteria
Karl Williamson [Tue, 22 Oct 2019 20:19:16 +0000 (14:19 -0600)]
parts/inc/misc: Change version validity criteria

I'm not sure why I think this is a good idea, but I know Unicode
handling started in 5.6, and am converting to use that criterion

(cherry picked from commit d69dce5eb7ab3ae39718aad250fcba2189773621)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport isFOO_LC_utf8_safe()
Karl Williamson [Tue, 22 Oct 2019 20:12:31 +0000 (14:12 -0600)]
Backport isFOO_LC_utf8_safe()

This also involves some test refactoring

(cherry picked from commit 7149d3266cce3561c90c73f57ec932db73105311)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Backport some isFOO_LC macros
Karl Williamson [Wed, 23 Oct 2019 00:01:39 +0000 (18:01 -0600)]
parts/inc/misc: Backport some isFOO_LC macros

A few macros of this class did not go back very far.  This makes a
reasonable attempt to get things right; very early versions may still
give some wrong answers, though that is unlikely.

This was complicated by the fact that isascii() and isblank() may not be
available on a given platform.  So this just uses the plain non-locale
version for those very early Perl versions.

I didn't add tests.  It is hard to portably test locales.  The next
commits will backport functions that call these and do have tests.

(cherry picked from commit a8f88b13766d8f2820f5bba560bb282188124b34)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: White-space only
Karl Williamson [Tue, 22 Oct 2019 19:00:08 +0000 (13:00 -0600)]
parts/inc/misc: White-space only

Mostly indenting a newly formed block

(cherry picked from commit 820b22a12ec9783c819b7f2400201908bceed04d)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Generalize a test
Karl Williamson [Tue, 22 Oct 2019 20:03:30 +0000 (14:03 -0600)]
parts/inc/misc: Generalize a test

The result of this commit is a loop that runs once;  that will change in
two commits from now, when it runs with different values

(cherry picked from commit 1d9bacdd128c8dc51c3b54b05e9d112a39c25e4a)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport toFOO_uvchr()
Karl Williamson [Tue, 22 Oct 2019 19:28:32 +0000 (13:28 -0600)]
Backport toFOO_uvchr()

(cherry picked from commit 1123d46ee9d608669383de3bf540882072690ad4)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport isFOO_uvchr()
Karl Williamson [Tue, 22 Oct 2019 19:21:03 +0000 (13:21 -0600)]
Backport isFOO_uvchr()

(cherry picked from commit 9ae426cf5b257cb458fcf48427524a6aa4332cad)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Change internal macro name
Karl Williamson [Tue, 22 Oct 2019 19:10:23 +0000 (13:10 -0600)]
parts/inc/misc: Change internal macro name

This is to distinguish it from a macro with similar intent about to be
added.

(cherry picked from commit cd875ece2bd9cf79a62016767589f8b5821293d6)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: early toFOLD_utf8_safe() is toLOWER
Karl Williamson [Tue, 22 Oct 2019 19:05:44 +0000 (13:05 -0600)]
parts/inc/misc: early toFOLD_utf8_safe() is toLOWER

On early perls, there was no distinction between fold and lowercase, so
just call lower from fold.

(cherry picked from commit 0450a74631276c933399241c46616508ce32c299)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Use hash and loop to generalize code
Karl Williamson [Mon, 21 Oct 2019 21:18:44 +0000 (14:18 -0700)]
parts/inc/misc: Use hash and loop to generalize code

This converts the testing of certain tests that are nearly identical to
use a loop with a hash to store the differences, leading to simpler,
extensible code.

(cherry picked from commit 14ec6258920f199e95c72891c23139f6ff10e511)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport UTF8_MAXBYTES_CASE
Karl Williamson [Fri, 18 Oct 2019 22:20:17 +0000 (15:20 -0700)]
Backport UTF8_MAXBYTES_CASE

This constant was wrong in earlier perls.

(cherry picked from commit 6e55485b5c4486d7883a50325a42a51dcca42ab8)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Fix EBCDIC bug
Karl Williamson [Wed, 23 Oct 2019 00:01:05 +0000 (18:01 -0600)]
parts/inc/misc: Fix EBCDIC bug

We were double xlating the underscore

(cherry picked from commit dc1ee4fcc8bc4d1f627c33228ab30432e57f0a9a)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/utf8: Refactor a little for clarity
Karl Williamson [Fri, 18 Oct 2019 22:19:07 +0000 (15:19 -0700)]
parts/inc/utf8: Refactor a little for clarity

(cherry picked from commit eddcc8663f05c54bedd22e22242d751af34e61d3)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoCan test isASCII_utf8_safe to earlier
Karl Williamson [Fri, 18 Oct 2019 22:09:21 +0000 (15:09 -0700)]
Can test isASCII_utf8_safe to earlier

This doesn't really depend on any UTF-8, so by slightly rewriting it, we
can backport it earlier.

(cherry picked from commit 621fa67cacff3079f35a57baf2bd737b03158601)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoHACKERS: Update, correct
Karl Williamson [Thu, 24 Oct 2019 20:21:06 +0000 (14:21 -0600)]
HACKERS: Update, correct

(cherry picked from commit c2ab6b2df5c405ba7a4b0e98d14fbd7cc19f70ad)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoGenerate latest apidoc.fnc from blead
Karl Williamson [Tue, 22 Oct 2019 18:47:33 +0000 (12:47 -0600)]
Generate latest apidoc.fnc from blead

(cherry picked from commit 535722786ead0041913879fc51ed4eaacc27a693)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoImplement G_RETHROW for eval_sv
Pali [Fri, 25 Oct 2019 13:58:50 +0000 (15:58 +0200)]
Implement G_RETHROW for eval_sv

(cherry picked from commit 73a4fb176de5b198cebeb88d08a57b0ad4bbf1f3)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoPartially revert 9f84bc0 which broke generating Makefile for non-blead perl versions
Pali [Thu, 24 Oct 2019 08:11:10 +0000 (10:11 +0200)]
Partially revert 9f84bc0 which broke generating Makefile for non-blead perl versions

(cherry picked from commit e85ddb627f9f99461f45ae482506cc276c7e8165)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoMake dist generate a fresh PPPort.pm
Nicolas R [Fri, 11 Oct 2019 22:13:28 +0000 (16:13 -0600)]
Make dist generate a fresh PPPort.pm

(cherry picked from commit d811f7d36e88cdb964f3fece5e2eff8c0d6a252b)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport toLOWER_utf8_safe and kin
Karl Williamson [Fri, 11 Oct 2019 17:44:29 +0000 (11:44 -0600)]
Backport toLOWER_utf8_safe and kin

These now are backported to 5.6.0

(cherry picked from commit 3d196ee9ca5e58cd9908fa8f60ab7339bb2f3160)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoRegenerate to latest
Karl Williamson [Sun, 6 Oct 2019 17:51:26 +0000 (11:51 -0600)]
Regenerate to latest

This updates parts/base, parts/todo based on blead and changes to D:P

(cherry picked from commit d391de2e81ef30cc86e1e96c36b94bb9888c3f3c)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoPerl 7.0 had a space as being a graphic char
Karl Williamson [Fri, 11 Oct 2019 10:05:16 +0000 (04:05 -0600)]
Perl 7.0 had a space as being a graphic char

(cherry picked from commit 36f9cc037debdf1d07fa92b2b99fa9b27ba3e8e2)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoChange isUTF8_CHAR to use macro, not expansion
Karl Williamson [Wed, 9 Oct 2019 15:19:58 +0000 (09:19 -0600)]
Change isUTF8_CHAR to use macro, not expansion

Use the macro instead of duplicating its definition ourselves.

(cherry picked from commit a02174d23a38cb41bfc90fcc646dbb8a748b6807)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoAdd warning about UTF-8 unreliable in early perls
Karl Williamson [Sun, 6 Oct 2019 17:59:15 +0000 (11:59 -0600)]
Add warning about UTF-8 unreliable in early perls

(cherry picked from commit f5227a1c2cb19045b6e5a5e36454fdf893f88f8c)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoisPSXSPC() is a synonym for isSPACE
Karl Williamson [Sun, 6 Oct 2019 03:59:53 +0000 (21:59 -0600)]
isPSXSPC() is a synonym for isSPACE

They used to have a slightly different meaning, but that was changed a
long time ago.

(cherry picked from commit baa6a68a10d1cd881d989a5c593d58e04593d0e1)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport isFOO_utf8_safe() macros
Karl Williamson [Sun, 6 Oct 2019 03:51:35 +0000 (21:51 -0600)]
Backport isFOO_utf8_safe() macros

This tests every code point between 0..255.  Doing so caught several
bugs.

(cherry picked from commit f899f86bd25f0fd9a84f892598cedb32d18f20ce)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoAdd tests for NATIVE_TO_LATIN1, vice-versa
Karl Williamson [Sun, 6 Oct 2019 03:45:55 +0000 (21:45 -0600)]
Add tests for NATIVE_TO_LATIN1, vice-versa

(cherry picked from commit be9d71690938fec20465f21e2f57ac206753db99)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoFix isGRAPH_L1() bug
Karl Williamson [Sun, 6 Oct 2019 03:38:43 +0000 (21:38 -0600)]
Fix isGRAPH_L1() bug

This was including NBSP as printable

(cherry picked from commit 63ece791ac466f042e9ebc4eaeab7f476eb4d22a)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Add withinCOUNT and inRANGE
Karl Williamson [Sun, 6 Oct 2019 03:36:27 +0000 (21:36 -0600)]
parts/inc/misc: Add withinCOUNT and inRANGE

These are too new to be in the public API, but their presence here helps
with backporting things, so provide them, but undocumented.
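
For readers unfamiliar with these two helpers, the following Python sketch shows the semantics as I understand them: both are inclusive range tests, one given an upper bound and one given a count above the lower bound. The Python function names are stand-ins, and the exact C definitions should be checked against the headers.

```python
def in_range(c: int, lo: int, hi: int) -> bool:
    """Model of inRANGE(c, lo, hi): true when lo <= c <= hi."""
    return lo <= c <= hi


def within_count(c: int, lo: int, n: int) -> bool:
    """Model of withinCOUNT(c, lo, n): true when lo <= c <= lo + n."""
    return lo <= c <= lo + n


# An ASCII digit test expressed both ways.
print(in_range(ord("7"), ord("0"), ord("9")))
print(within_count(ord("7"), ord("0"), 9))
```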

(cherry picked from commit eb85beedef0dfeca24cb5a011f17a893525cb098)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/utf8: Backport some basic UTF-8 stuff
Karl Williamson [Sun, 6 Oct 2019 03:32:16 +0000 (21:32 -0600)]
parts/inc/utf8: Backport some basic UTF-8 stuff

These are not in the public API because no module writer should be
dealing at this level, but they are needed for backporting some things,
and so they are provided here without publicly announcing their
availability.

Included is an internal helper function

(cherry picked from commit 4ebf864379b0c46d16a19d5546e36062b2545eae)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoFix typos in HACKERS; add clarification
Karl Williamson [Sun, 6 Oct 2019 03:29:07 +0000 (21:29 -0600)]
Fix typos in HACKERS; add clarification

(cherry picked from commit 3f7ae74acac1e4fa34913985e9fc9119d517ca6c)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport UTF8_CHK_SKIP
Karl Williamson [Wed, 9 Oct 2019 15:12:49 +0000 (09:12 -0600)]
Backport UTF8_CHK_SKIP

And revise an existing item to use it.

(cherry picked from commit 45578f404ccd72c0dbfb539cb5c438e8a46106fa)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoBackport UTF8_SKIP
Karl Williamson [Wed, 9 Oct 2019 15:10:42 +0000 (09:10 -0600)]
Backport UTF8_SKIP

which is just a synonym for UTF8SKIP

(cherry picked from commit 393aff27a03ba3dfe30531be2950e3702fc07bd2)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoparts/inc/misc: Backport UNI to/from NATIVE
Karl Williamson [Wed, 9 Oct 2019 15:05:06 +0000 (09:05 -0600)]
parts/inc/misc: Backport UNI to/from NATIVE

And change the way we setup similar defines.  On perls before EBCDIC
existed, these simply return their arguments.  But a module now can
unconditionally include these, and it will do the right thing.

(cherry picked from commit e836478411ddc77f5155d0fd4e4f2a25f28dc47e)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoUpdated apidoc.fnc to latest blead
Karl Williamson [Sun, 6 Oct 2019 04:02:05 +0000 (22:02 -0600)]
Updated apidoc.fnc to latest blead

(cherry picked from commit 89cfe197a6de2c56f9d7ef5e75ce074b6f7c8579)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoGet latest blead embed.fnc
Karl Williamson [Sun, 6 Oct 2019 04:01:46 +0000 (22:01 -0600)]
Get latest blead embed.fnc

(cherry picked from commit 1d506a06fb1782b574eae0c8177b0af23823f1ed)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agodevel/regenerate: Add --yes option
Karl Williamson [Wed, 9 Oct 2019 14:52:19 +0000 (08:52 -0600)]
devel/regenerate: Add --yes option

This answers yes to the standard questions automatically.  It's handy
when you want to run the job nohup, and really know what you're doing.

(cherry picked from commit e8dfa8b9ad3c70c21ed7c7c33d9049c94ea8cbbc)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agocat_file util in Makefile
Nicolas R [Tue, 8 Oct 2019 15:26:36 +0000 (09:26 -0600)]
cat_file util in Makefile

References #137

(cherry picked from commit fb6dad1b13facfbcdf834a6b95db2320410d7c78)
Signed-off-by: Nicolas R <atoomic@cpan.org>
4 years agoPATCH: gh#17227 heap-buffer-overflow
Karl Williamson [Fri, 8 Nov 2019 17:29:05 +0000 (10:29 -0700)]
PATCH: gh#17227 heap-buffer-overflow

There were two problems this uncovered.  One was that a floating point
expression with both operands ints truncated before becoming floating.
One operand needs to be floating.
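
The first problem can be modelled in Python, where `//` plays the role of C integer division. The 3/2 growth factor and the `buffer_size` helper are hypothetical and chosen only to show the effect; the actual expression in the fixed code is not given above.

```python
import math


def buffer_size(length: int, num: int, den: int, truncate_first: bool) -> int:
    """Estimate an output buffer size from a growth factor num/den."""
    if truncate_first:
        # Both operands are ints, so the quotient is truncated before it
        # is ever treated as a floating point value.
        factor = num // den
    else:
        # Making one operand floating keeps the fractional part.
        factor = num / den
    return math.ceil(length * factor)


print(buffer_size(10, 3, 2, truncate_first=True))
print(buffer_size(10, 3, 2, truncate_first=False))
```

An undersized estimate of this kind is one way a heap-buffer-overflow can arise when the output really does grow.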

The second is that the expansion of a non-UTF-8 byte needs to be
considered based on non-UTF-8, rather than its UTF-8 representation.

4 years agoFix tr/// compilation on VMS
Karl Williamson [Fri, 8 Nov 2019 17:14:33 +0000 (10:14 -0700)]
Fix tr/// compilation on VMS

64-bit values on that platform require a long long, and 1UL isn't one.
I should have copied the similar code in utf8.h more carefully.

(reported to me privately by Craig Berry)

4 years agoLink to more useful section of perlop from readpipe
Dan Book [Thu, 7 Nov 2019 23:44:47 +0000 (18:44 -0500)]
Link to more useful section of perlop from readpipe

qx is only briefly mentioned in the "I/O Operators" section of perlop.
It is better to link to the section where it is discussed in detail.

4 years agoperlop - Make "STRING" section heading consistent
Dan Book [Thu, 7 Nov 2019 23:53:14 +0000 (18:53 -0500)]
perlop - Make "STRING" section heading consistent

All of the similar section headings are enclosed in C<>.

4 years agoperlguts: Revise pod of UTF8f
Karl Williamson [Thu, 7 Nov 2019 20:32:18 +0000 (13:32 -0700)]
perlguts: Revise pod of UTF8f

This is really about strings.  SVs are more conveniently printed using
SVf.

4 years agoSilence some compiler warnings
Karl Williamson [Thu, 7 Nov 2019 17:41:28 +0000 (10:41 -0700)]
Silence some compiler warnings

These were introduced in the tr/// changes in the series
merged in 240494d6992696a7a350217c131e1d5dc1444a0c

4 years agoMerge branch 'Remove swashes from core' into blead
Karl Williamson [Thu, 7 Nov 2019 04:23:18 +0000 (21:23 -0700)]
Merge branch 'Remove swashes from core' into blead

This branch reimplements the final use of swashes in core, tr///, and
then proceeds to remove the swash implementation from core.

Swashes are still used in Unicode::UCD, though this can also be changed.
But there are higher priority tasks to do at the moment.

I started work on this more than two releases ago, and it finally is
ready.

4 years agoRemove lib/unicore/Heavy.pl
Karl Williamson [Wed, 6 Nov 2019 17:32:31 +0000 (10:32 -0700)]
Remove lib/unicore/Heavy.pl

This file was for the use of utf8_heavy.pl.  But now that that is
incorporated into Unicode::UCD, move the definitions from Heavy.pl to
lib/unicore/UCD.pl which is used by Unicode::UCD.  This allows removing
package names.

4 years agoUCD.pm: Remove 'none' from swash
Karl Williamson [Wed, 6 Nov 2019 17:02:45 +0000 (10:02 -0700)]
UCD.pm: Remove 'none' from swash

This was only used by tr///, and hence is no longer relevant.  I never
really understood it.

4 years agoRemove utf8_heavy.pl
Karl Williamson [Wed, 6 Nov 2019 16:40:11 +0000 (09:40 -0700)]
Remove utf8_heavy.pl

The only remaining user of this is Unicode::UCD, and so most of the code
from utf8_heavy.pl is moved into that UCD.pm.

It removes a no-longer relevant test (that had been changed into a skip
anyway), and it changes or removes the no-longer relevant references in
comments to utf8_heavy.pl

Later commits will do some simplification as not all the previous
functionality is needed.  This commit removed only the parts that were
preventing compilation and tests passing.

4 years agoRemove swashes from core
Karl Williamson [Tue, 5 Nov 2019 05:27:39 +0000 (22:27 -0700)]
Remove swashes from core

Also references to the term.

4 years agoop.c: Remove no-longer used function
Karl Williamson [Tue, 5 Nov 2019 05:18:05 +0000 (22:18 -0700)]
op.c: Remove no-longer used function

4 years agohandy.h: Change references to swashes
Karl Williamson [Tue, 5 Nov 2019 05:17:08 +0000 (22:17 -0700)]
handy.h: Change references to swashes

As these are no longer used.

4 years agoPorting/todo.pod: Rmv reference to fixing swashes
Karl Williamson [Tue, 5 Nov 2019 05:14:30 +0000 (22:14 -0700)]
Porting/todo.pod: Rmv reference to fixing swashes

4 years agoUnTODO some tests fixed by the previous commit
Karl Williamson [Tue, 5 Nov 2019 05:10:56 +0000 (22:10 -0700)]
UnTODO some tests fixed by the previous commit

4 years agoReimplement tr/// without swashes
Karl Williamson [Tue, 5 Nov 2019 04:30:48 +0000 (21:30 -0700)]
Reimplement tr/// without swashes

This large commit removes the last use of swashes from core.

It replaces swashes by inversion maps.  This data structure is already
in use for some Unicode properties, such as case changing.

The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one.  A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions.  This should be faster than the previous implementation
anyway.  I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges.  These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.

Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster.  This means that there is extra code,
purely for runtime performance.

An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255.  Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it.  That could be
changed if I am wrong.

Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all.  This fixes some false
negatives in the previous implementation.  It also allows for detecting
if the actual transliteration can be done in place.  Previously, the
code mostly punted on that detection for the UTF-8 case.

This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.

A new flag is created, OPpTRANS_CAN_FORCE_UTF8, when the tr/// has a
below 256 character resolving to one that requires UTF-8.  If this isn't
set, the code knows that a non-UTF-8 input won't become UTF-8 in the
process, and so can take short cuts.  The bit representing this flag is
the same as OPpTRANS_FROM_UTF, which is no longer used.  That name is
left in so that the dozen-ish modules in cpan that refer to it can still
compile.  AFAICT none of them actually use the flag, as well they
shouldn't since it is private to the core.

Inversion maps are ideally suited for tr/// implementations.  An issue
with them in general is that for some pathological data, they can become
fragmented requiring more space than you would expect, to represent the
underlying data.  However, the typical tr/// would not have this issue,
requiring only very short inversion maps to represent; in some cases
shorter than the table implementation.

Inversion maps are also easier to deparse than swashes.  A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.

One could implement specialized data structures for specific types of
inputs.  For example, a common tr/// form is a single range, like
tr/A-Z/a-z/.  That could be implemented without a table and be quite
fast.  An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
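
To make the data structure more concrete, here is a simplified Python model of an inversion map for tr/A-Z/a-z/. It is a sketch under my own assumptions, not the representation used in the core: the parallel `STARTS` and `MAPS` lists, the use of `None` for "unchanged", and the function names are all illustrative.

```python
from bisect import bisect_right

# Sorted list of code points at which a new range begins.
STARTS = [0, ord("A"), ord("Z") + 1]
# For each range, what its first code point maps to (None = unchanged).
MAPS = [None, ord("a"), None]


def translate_cp(cp: int) -> int:
    """Map one code point via binary search over the range starts."""
    i = bisect_right(STARTS, cp) - 1
    base = MAPS[i]
    if base is None:
        return cp
    # Within a mapped range, the offset from the range start is preserved.
    return base + (cp - STARTS[i])


def transliterate(s: str) -> str:
    """Apply the map to every character of a string."""
    return "".join(chr(translate_cp(ord(ch))) for ch in s)


print(transliterate("Hello, World"))
```

A single-range transliteration like this one needs only three entries, which is why typical tr/// maps stay short.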

Thanks to Nicholas Rochemagne for his help on B

4 years agointrpvar.h: Add variable for use in tr///
Karl Williamson [Thu, 3 Oct 2019 04:34:37 +0000 (22:34 -0600)]
intrpvar.h: Add variable for use in tr///

This is part of this branch of changes.

4 years agoop.c: Add debugging dump function
Karl Williamson [Tue, 19 Feb 2019 04:14:47 +0000 (21:14 -0700)]
op.c: Add debugging dump function

This function dumps out an inversion map

4 years agoop.h: Add synonyms for some tr/// values
Karl Williamson [Mon, 4 Nov 2019 21:59:02 +0000 (14:59 -0700)]
op.h: Add synonyms for some tr/// values

4 years agoChange names of some OPpTRANS flags
Karl Williamson [Mon, 4 Nov 2019 21:55:16 +0000 (14:55 -0700)]
Change names of some OPpTRANS flags

These two flags will shortly become obsolete, replaced by ones with
different meanings.  This commit makes the new names the normal ones,
and makes the old names synonyms so that code that refers to them can
still compile.

4 years agodoop.c: Refactor do_trans_complex()
Karl Williamson [Mon, 4 Nov 2019 21:38:58 +0000 (14:38 -0700)]
doop.c: Refactor do_trans_complex()

I had trouble understanding how this uncommented routine worked.  And it
turned out to be broken, squeezing the pre-transliterated characters
instead of the post-transliterated ones.  This fixes the TODO test added
in the previous commit.
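
The difference between squeezing on the pre-transliterated and the post-transliterated characters can be shown with a small Python model. This is not the doop.c code; the function, its `squeeze_on_output` switch, and the dictionary-based table are hypothetical and only reproduce the symptom described.

```python
def tr_squeeze(s: str, table: dict, squeeze_on_output: bool = True) -> str:
    """Transliterate s through table, squeezing runs as tr///s would.

    With squeeze_on_output=False, runs are detected on the input
    characters instead, which models the bug described above.
    """
    out = []
    prev_key = None
    for ch in s:
        mapped = table.get(ch, ch)
        translated = ch in table
        key = mapped if squeeze_on_output else ch
        if translated and key == prev_key:
            continue  # same as the previous transliterated char: squeeze
        out.append(mapped)
        # Untransliterated characters interrupt a run.
        prev_key = key if translated else None
    return "".join(out)


table = {"a": "x", "b": "x"}
print(tr_squeeze("ab", table))
print(tr_squeeze("ab", table, squeeze_on_output=False))
```

Here "a" and "b" differ on input but both become "x", so only the output-based comparison collapses them into a single character.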

4 years agot/op/tr.t: Add tests, incl. a TODO
Karl Williamson [Tue, 5 Nov 2019 05:13:43 +0000 (22:13 -0700)]
t/op/tr.t: Add tests, incl. a TODO

This adds a TODO test which demonstrates that the current tr/// is
broken, to be fixed by the next commit.

It adds other tests designed to stress the forthcoming revisions in the
implementation of tr///.