perl5.git.perl.org Git - perl5.git/log

This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5

Yves Orton [Thu, 7 Apr 2022 11:47:15 +0000 (13:47 +0200)]

pod/.gitignore - remove redundant perlmacos.pod entry

perlmacos related materials were deleted in 21c79b49e2f6f7, but
this pod/.gitignore entry was overlooked.

Yves Orton [Thu, 7 Apr 2022 07:02:05 +0000 (09:02 +0200)]

Update Encode to 3.17

This silences the build warnings reported in https://github.com/Perl/perl5/issues/19588
and in https://github.com/Perl/perl5/issues/17014.

It includes some test updates, but no functionality changes.

Paul Marquess [Wed, 6 Apr 2022 12:17:05 +0000 (12:17 +0000)]

Sync Compress-Raw-Zlib-2.103 + 2 others into blead

This commit syncs version 2.103 of 3 CPAN distributions into blead:

Compress-Raw-Zlib
IO-Compress
Compress-Raw-Bzip2

Applying the commits one at a time would have resulted in one test
failure in one of those commits, but applying all three has all tests
passing as expected.

From Changes for Compress-Raw-Zlib

2.103 3 April 2022

* Sync upstream fix for CVE-2018-25032
  https://github.com/advisories/GHSA-jc36-42cf-vqwj

  Update to Zlib 1.2.12
  d507f527768f6cbab5831ed3ec17fe741163785c

  Fix for inflateSync return code change
  f47ea5f36c40fe19efe404dd75fd790b115de596

  Fix for incorrect CRC from zlib 1.2.12.1
  https://github.com/madler/zlib/commit/ec3df00224d4b396e2ac6586ab5d25f673caa4c2
  60104e3a162a116548303861ae0811fb850e65fd

* AUTHOR doesn't contain the stated information
  bf5a03c1b440c8d9e41cffb344bf889794cc532b

From Changes for IO-Compress

2.103 3 April 2022

* Update version to 2.103
  97f1893892eccac69b3a8033378b0b44d7c4f3ab

* Fix for inflateSync return code change
  4843e22285bf8e52c9b5b913d167a1545995c793

* Add constant for ZIP_CM_AES
  91be04dd8dc2848e3c25b87ec498cf8ccc34187a

* Point links to rfcs to ietf.org
  https://github.com/pmqs/IO-Compress/pull/37
  a8f28b36cf4d77df1cfa0516867012425920a62f

* Rename test file to fix manifest warning
  https://github.com/pmqs/IO-Compress/pull/36
  955244f9ac0654d7e8d54115162da53c85d7178c

* Add perl 5.34
  06f41883f62ed1b88b03c246b16e0b5ef72503bc

* Fix for Calling nextStream on an IO::Uncompress::Zip object in Transparent mode dies when input is uncompressed
  https://github.com/pmqs/IO-Compress/issues/34
  b0f93fe62f84b7d4d4bb8d2ea8e6d5432887103f

* IO::Compress: Generalize for EBCDIC
  https://github.com/pmqs/IO-Compress/pull/32
  90b51dbbd785e2c824cb0a93feef3b3dd5d075f2

* IO::Compress: Fix misspelling in 112utf8-zip.t
  c22216b5d3202dce01ef17a271252f82520a6ab9

* Revert "Always have full zip64 entry in central directory"
  7df4c9bc98667bc1afd1b4bc5a27d20f94e3cd9c

* Always have full zip64 entry in central directory
  333648ee1dece6eb220060c7ec09806f6ebb9866

* update cpanm path on MacOS
  33079902934885c515768a08d72e89243a5d01a9

From Changes for Compress-Raw-Bzip2

2.103 3 April 2022

* Silence uninitialized warnings
  https://github.com/pmqs/Compress-Raw-Bzip2/pull/5
  ff3d907325091287ac1525db384b99a968d763d7
  641a440ec6229c1d368b9ead48f4968b955c0115

Dagfinn Ilmari Mannsåker [Tue, 5 Apr 2022 18:13:54 +0000 (19:13 +0100)]

Delete long-obsolete README.macos

Support for Mac OS Classic was removed in 5.12, so there's no need to
keep this obsolete notice around.

In passing, add missing perlmacosx to plan9/mkfile's list of
archpodnames.

Niyas Sait [Tue, 22 Mar 2022 10:57:42 +0000 (10:57 +0000)]

build: add configurations to compile perl for windows/arm64

Yves Orton [Tue, 5 Apr 2022 13:56:20 +0000 (15:56 +0200)]

makedepend_file.SH - add -DPERL_CORE so we pick up all deps

makedepend_file does not find all deps for our code because in many
cases the dependencies are hidden behind a guard clause which checks
if PERL_CORE is defined. This is annoying when working on the regex
engine as after `make regen` is executed `make` does not notice that
files like regnodes.h have been updated. No doubt it is annoying in
other contexts too.

This adds -DPERL_CORE so that we pick up these guarded dependencies.
With this patch, updating regnodes.h is noticed, and regcomp.o,
regexec.o and miniperl will be appropriately rebuilt.

Thanks to Dagfinn Ilmari Mannsåker for figuring this out.

Paul "LeoNerd" Evans [Mon, 4 Apr 2022 15:28:06 +0000 (16:28 +0100)]

Add the new flow-control keywords to perlfunc.pod

Yves Orton [Sun, 13 Mar 2022 12:49:35 +0000 (13:49 +0100)]

runenv_hashseed.t - rework to deal with probabilistic failures

The old version of this test had a 1/128 chance of failing the
RANDOM key change test each time we try it. We run it 50 times, so
there is a non-zero chance of failing it at least once. It is actually
surprising we haven't seen more test fails so far. The key change test
also has a similar problem, albeit with a much lower probability of
failing the test.
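
To put a number on that "non-zero chance", here is a rough
back-of-the-envelope calculation in Python. It assumes the 50 tries are
independent and that each one fails with probability exactly 1/128, the
figure quoted above; the function name is purely illustrative.

```python
def chance_of_any_failure(p_fail: float, runs: int) -> float:
    """Probability that at least one of `runs` independent tries fails."""
    return 1.0 - (1.0 - p_fail) ** runs

# Figures quoted in the commit message: 1/128 per try, 50 tries.
p = chance_of_any_failure(1 / 128, 50)
print(f"{p:.2f}")  # about 0.32
```

Under those assumptions, at least one of the 50 tries fails roughly a
third of the time.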

This version of the test includes a number of changes.

* Test setting PERL_PERTURB_KEYS both via name and number.

* Use more test keys with a mix of different key lengths, some
relatively long.

* Rework various tests that only verify that setting PERL_PERTURB_KEYS
mode works, and that setting the seed works so they only fire once per
mode. There is no point in repeating them over and over.

* Improved comments about what is going on.

* Only use -Dh mode when we test the DETERMINISTIC mode.

* Clean up the temp files more aggressively.

Richard Leach [Sat, 19 Mar 2022 23:01:57 +0000 (23:01 +0000)]

Perl_reg_named_buff_fetch: simplify newSVsv(&PL_sv_undef)

Specifically, newSVsv(&PL_sv_undef) reduces to just newSV_type(SVt_NULL).

Karl Williamson [Thu, 31 Mar 2022 14:40:25 +0000 (08:40 -0600)]

toke.c: Reorder branches for clarity

The trivial case should be handled first.

Karl Williamson [Tue, 29 Mar 2022 10:43:46 +0000 (04:43 -0600)]

toke.c: scan_str(): Rmv special handling for NUL delim

Because we use ninstr(), which can handle NULs, no special handling of
them is required.

Karl Williamson [Tue, 29 Mar 2022 03:31:55 +0000 (21:31 -0600)]

toke.c: C_ARRAY_END() doesn't work on a string

The code was using the macro C_ARRAY_END which doesn't apply to strings,
thus not giving the correct end to the string. But no tests were
failing. No new tests are added here, because the next commit will
change things so that tests would fail all over the place.

Karl Williamson [Tue, 29 Mar 2022 10:38:30 +0000 (04:38 -0600)]

toke.c: Don't assume is UTF-8 when it might not be

This code only works by coincidence on ASCII platforms, due to the
chance ways the underlying UTF-8 is represented. But it definitely
doesn't on EBCDIC. Test before assuming it is UTF-8.

Karl Williamson [Tue, 29 Mar 2022 10:36:15 +0000 (04:36 -0600)]

toke.c: Move variable set to after possible exit

It's just a little bit better to do the warning (which could be made
fatal) before setting something that's only needed later.

Karl Williamson [Tue, 29 Mar 2022 10:32:18 +0000 (04:32 -0600)]

toke.c: Add NUL terminator in both branches

By moving the setting of this to after two branches of a conditional
come together, it gets set always, instead of sometimes.

Karl Williamson [Tue, 29 Mar 2022 10:26:32 +0000 (04:26 -0600)]

toke.c: Move setting of a variable to later

This simplifies things a bit.

Karl Williamson [Tue, 29 Mar 2022 10:17:04 +0000 (04:17 -0600)]

toke.c: Rmv redundant storage

The data contained in this variable is a copy of const data stored
elsewhere. Instead of making a copy, simplify to just point to the
already-stored data.

Karl Williamson [Tue, 29 Mar 2022 09:10:41 +0000 (03:10 -0600)]

toke.c: Rmv redundant variable

This variable doesn't add anything. We can use other variables to
just as conveniently get at the information it contains.

Karl Williamson [Tue, 29 Mar 2022 03:24:33 +0000 (21:24 -0600)]

toke.c: Rmv redundant conditional

To get to the removed conditional, it has already been checked for being
true.

Karl Williamson [Mon, 28 Mar 2022 19:45:59 +0000 (13:45 -0600)]

toke.c: Change constant to sync with comment

It's better if the comment and code mesh.

Paul "LeoNerd" Evans [Fri, 11 Mar 2022 12:22:11 +0000 (12:22 +0000)]

Some initial documentation about the new created_as_{string,number} functions

Paul "LeoNerd" Evans [Mon, 14 Mar 2022 14:31:29 +0000 (14:31 +0000)]

Initial implementation and unit-tests of created_as_{string,number}

James E Keenan [Wed, 30 Mar 2022 17:54:09 +0000 (17:54 +0000)]

Test::More::note() needs version 0.82

For: https://github.com/Perl/perl5/issues/19569, as reported by
kbulgrien.

Karl Williamson [Wed, 30 Mar 2022 16:04:06 +0000 (10:04 -0600)]

podcheck.t: Don't check verbatim length on a few pods

These pods have some very long lines that make sense to keep on a single
line, such as output from a program. That means that someone viewing
them will either enlarge their window to view them unbroken or all is
lost anyway; there may be a few lines that could be shortened, but no
real value to do so.

Karl Williamson [Wed, 30 Mar 2022 16:01:22 +0000 (10:01 -0600)]

podcheck: Add links to known modules

Karl Williamson [Wed, 30 Mar 2022 15:59:41 +0000 (09:59 -0600)]

podcheck.t: Add spaces to an output message

To make it easier to read.

Karl Williamson [Wed, 30 Mar 2022 15:57:10 +0000 (09:57 -0600)]

podcheck.t: Devel::ppport::ppphdoc isn't pod

So don't check it.

Karl Williamson [Wed, 30 Mar 2022 15:56:03 +0000 (09:56 -0600)]

podcheck.t: Update pod about verbatim lines

Karl Williamson [Wed, 30 Mar 2022 15:52:28 +0000 (09:52 -0600)]

perlhacktips: white-space only

Karl Williamson [Thu, 24 Mar 2022 02:21:01 +0000 (20:21 -0600)]

Clarify \p{Decomposition_Type=NonCanonical}

This closes #18458

Karl Williamson [Wed, 30 Mar 2022 03:42:53 +0000 (21:42 -0600)]

ExtUtils::ParseXS/t/002-more.t: Fix skip count

Karl Williamson [Mon, 28 Mar 2022 19:34:32 +0000 (13:34 -0600)]

toke.c: White-space, comments

Karl Williamson [Mon, 28 Mar 2022 19:29:17 +0000 (13:29 -0600)]

t/loc_tools.pl: Skip locale tests on z/OS threaded

setlocale() is a no-op on this system after the first thread is created,
which makes it an outlier among platforms. The tests assume otherwise,
and hence would fail.

Karl Williamson [Mon, 28 Mar 2022 19:25:10 +0000 (13:25 -0600)]

perllocale: Add note about z/OS special behavior

Karl Williamson [Mon, 28 Mar 2022 19:24:12 +0000 (13:24 -0600)]

perllocale: Formatting, grammar

Ricardo Signes [Mon, 28 Mar 2022 19:16:30 +0000 (15:16 -0400)]

release schedule: add the next two people

James E Keenan [Sun, 27 Mar 2022 19:52:44 +0000 (19:52 +0000)]

Bump $VERSION in perl5db.pl

Karl Williamson [Thu, 24 Mar 2022 12:52:51 +0000 (06:52 -0600)]

APItest/t/sv_streq.t: Generalize for EBCDIC

This test fails on EBCDIC systems, because it wants a non-ASCII
character, and the one it chose, E9, is ASCII on EBCDIC ('Z').

perlhacktips suggests B6 as a character to use in such tests, and this
commit changes to use that.
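
The clash can be checked with Python's bundled `cp037` codec, used here
only as a stand-in for one EBCDIC code page. Perl's EBCDIC builds may use
other code pages, so treat this as an illustration under that assumption
and not as the logic of the test itself; the helper name is hypothetical.

```python
def decode_ebcdic_byte(byte: int, codec: str = "cp037") -> str:
    """Decode one byte under an EBCDIC code page (cp037 is an assumed example)."""
    return bytes([byte]).decode(codec)

# E9 is the byte the test used to pick; B6 is the one perlhacktips suggests.
for byte in (0xE9, 0xB6):
    ch = decode_ebcdic_byte(byte)
    kind = "ASCII" if ch.isascii() else "non-ASCII"
    print(f"0x{byte:02X} -> U+{ord(ch):04X} ({kind})")
```

Under cp037, 0xE9 decodes to the ASCII letter 'Z', while 0xB6 stays
outside the ASCII range.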

Karl Williamson [Mon, 21 Mar 2022 20:10:16 +0000 (14:10 -0600)]

Perl5DB: Rmv ASCII dependency

Karl Williamson [Sat, 26 Mar 2022 17:23:04 +0000 (11:23 -0600)]

Devel::Peek::Peek.t: Simplify EBCDIC handling

The prior commit shows what can happen when two branches do the same
thing: they can get out of sync.

Since this test file was originally written, the testing infrastructure
has improved so that there are functions that handle the gory details of
character set differences for you. This test file hadn't been updated
since it wasn't causing a problem, until now.

This commit changes to use the new infrastructure, and as a result one
branch gets removed each from the two tests that varied depending on
character set.

Karl Williamson [Sat, 26 Mar 2022 12:50:28 +0000 (06:50 -0600)]

Devel::Peek::Peek.t: Add missing '\' for EBCDIC

This file was recently changed, and the EBCDIC side of the change had a
typo.

Hugo van der Sanden [Wed, 23 Mar 2022 13:19:00 +0000 (13:19 +0000)]

gh19557: restore match_end on early bailout

After 271c3af797, early bailout from the inner one of a pair of nested
lookbehinds would leave the desired match_end pointing at the wrong
place, so the outer lookbehind could give the wrong answer.

Karl Williamson [Thu, 24 Mar 2022 20:31:44 +0000 (14:31 -0600)]

toke.c: Add missing ptr update

scan_str() calls s=skipspace(s).  It turns out that this function can
actually change the buffer 's' is pointing to, so that the original
'start' passed in to the function is obsolete.  Just update it.  This is
very much like the paradigm already in S_force_word().

This bug previously existed, but commit
32b87797e986f5d99836e16ea6b9d9ff5a56d3be increased the frequency of
occurrence from close to non-existent to relatively often.  It only
happened when the string being delimited had some spaces before it, and
only if the buffer got moved.  This depends on the position the
construct is in the file, and on the buffering of the reading of that
file, hence the symptoms had it occurring much more often using stdio
than PerlIO. (it could just as well have been the reverse, I suppose.)

The mentioned commit collapsed two different loops; one of which didn't
bother with a check it should have been doing.  Without that check, the
likelihood of this being triggered was much less.  (But illegal input
would get by.)

There is a nuance here, which resulted in the need for this commit to
also update the test file, from having two occurrences of an error on a
single line to just one.  This is because, if the buffer moves, we reset
'start' to 's'.  This makes 's' appear to be at the left edge of the
input when it really is just at the left edge of the buffer.  The test
that failed used a combining character (I'll call it 'cc' for short)
after a space, to check that the code accurately catches the illegality
that you can't delimit a string with a character that doesn't stand on
its own, such as a cc.  However when such a character comes at the
beginning of the input, there's nothing for it to combine with, and
Unicode says that is legal, so we do too.  So this moving 'start' makes
something that is illegal look to be legal.  I don't think this is a
problem because the code looks up the cc and discovers there is no
mirror for it, so it must also be the terminator for the string.  If
this cc is just from a single typo in the input, there won't be a
matching terminator, and the compilation will abort.  If the program
intended to use a cc as both fore and aft of a string, the terminating
occurrence of this cc will also be checked for validity, and it will
almost certainly be seen to be an illegal cc in this context, so again
the compilation will fail.  That is indeed what is happening in
t/lib/warnings/toke.  If the buffering were such that the terminating cc
also began a new buffer, it again would be viewed as at the edge and the
string would be parsed as being ok, when it really shouldn't have been.
Should this happen, I don't see a real problem.  An attacker could craft
a string with the precise length to make this happen, but to do so they
would have to control the source code, and the war is already lost.

James E Keenan [Thu, 24 Mar 2022 12:46:56 +0000 (12:46 +0000)]

Correct POD formatting error

Karl Williamson [Thu, 24 Mar 2022 03:33:54 +0000 (21:33 -0600)]

perlunicode: regex sets is no longer experimental

Karl Williamson [Wed, 23 Mar 2022 19:26:26 +0000 (13:26 -0600)]

re/anyof.t: Add debugging info

This is in response to
https://github.com/Perl/perl5/pull/19558#issuecomment-1076659884

Karl Williamson [Wed, 23 Mar 2022 19:31:38 +0000 (13:31 -0600)]

Fix double encoding of UTF-8 on EBCDIC

Commit d1e771d8c533168553df9b2a858d967f707fc9fe broke EBCDIC builds by
doubly encoding some UTF-8 characters.

Hugo van der Sanden [Mon, 21 Mar 2022 21:55:01 +0000 (21:55 +0000)]

gh17746: add missing check on hardcount

Failing to check for max iterations caused an assertion failure.

Graham Knop [Mon, 21 Mar 2022 16:48:51 +0000 (17:48 +0100)]

fix typo in perl53510delta

James E Keenan [Mon, 21 Mar 2022 13:37:35 +0000 (13:37 +0000)]

Correct POD formatting error

Use 'F<>' for strings that are simply filenames.

As reported by Tux on #p5p.

Sawyer X [Mon, 21 Mar 2022 10:32:29 +0000 (11:32 +0100)]

Update RMG on updateAUTHORS.pl

Sawyer X [Mon, 21 Mar 2022 10:21:08 +0000 (11:21 +0100)]

Update Module::CoreList for 5.35.11

Richard Leach [Sun, 20 Mar 2022 19:05:10 +0000 (19:05 +0000)]

Perl_newSViv: simplify by using (inline) newSV_type

Richard Leach [Sun, 23 May 2021 23:33:36 +0000 (00:33 +0100)]

Perl_newSVnv: simplify SV creation and SvNV_set

The function can be simplified by using the now-inlined newSV_type
function, directly using SvNV_set, and twiddling the required flags.

This cuts out any function call overhead, a switch statement leading
to a sv_upgrade(sv, SVt_NV) call, and a touch more bit-twiddling than
is necessary.

Tony Cook [Mon, 11 Jul 2016 00:52:20 +0000 (10:52 +1000)]

(perl #128245) make it obvious :encoding uses PerlIO::encoding

rather than encoding.pm

Sawyer X [Sun, 20 Mar 2022 18:34:19 +0000 (19:34 +0100)]

Bump version for 5.35.11

Sawyer X [Sun, 20 Mar 2022 16:46:29 +0000 (17:46 +0100)]

New perldelta for 5.35.11

Sawyer X [Sun, 20 Mar 2022 16:40:47 +0000 (17:40 +0100)]

Tick release, update epigraph

Sawyer X [Sun, 20 Mar 2022 10:35:36 +0000 (11:35 +0100)]

Add new release to perlhist

Sawyer X [Sun, 20 Mar 2022 10:09:28 +0000 (11:09 +0100)]

Finalize perldelta

Sawyer X [Sun, 20 Mar 2022 09:44:55 +0000 (10:44 +0100)]

Update Module::CoreList for 5.35.10

Sawyer X [Sun, 20 Mar 2022 09:36:13 +0000 (10:36 +0100)]

Update AUTHORS file

Karl Williamson [Thu, 17 Mar 2022 23:38:55 +0000 (17:38 -0600)]

Add arrows to paired string delimiters

Unicode has lots of arrows of various shapes, sizes, and directions.
None of them were of consequence to the Bidirectional algorithm, so none
were specified as being mirrored pairs. This commit uses the
generalizations already in place from previous commits to examine arrow
symbols and choose which are mirrored pairs.

As previously, it rejects arrows with contrary directionality, and ones
without horizontal directionality.

Karl Williamson [Thu, 17 Mar 2022 23:37:32 +0000 (17:37 -0600)]

Add SPEAKERs paired delimiters

The characters with this name look good as mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:35:36 +0000 (17:35 -0600)]

Add TELEPHONE RECEIVER paired delimiters

The characters with this name look good as mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:33:08 +0000 (17:33 -0600)]

Add ERASE paired delimiters

The characters with this name look good as mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:30:47 +0000 (17:30 -0600)]

Add DOUBLE TRIANGLEs paired delimiters

The characters with this name look good as mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:28:31 +0000 (17:28 -0600)]

Add THREE RAYS paired delimiters

The characters with this name look good as mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:17:53 +0000 (17:17 -0600)]

Add musical score paired delimiters

The characters that signify the beginning and ending of Western music
scores serve as good delimiters.

Karl Williamson [Thu, 17 Mar 2022 23:11:00 +0000 (17:11 -0600)]

Add INDEX paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters. The 'index' refers to the index finger
in a hand pointing at the delimited string.

Karl Williamson [Thu, 17 Mar 2022 18:23:33 +0000 (12:23 -0600)]

Add TURNSTILE paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 18:23:12 +0000 (12:23 -0600)]

Add TACK paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.

Karl Williamson [Thu, 17 Mar 2022 17:55:54 +0000 (11:55 -0600)]

Directionality pres/abs-ence can mean paired delimiters

Another way Unicode indicates that a character has horizontal
directionality is by adding LEFT or RIGHT to the name of a base
character. Hence we get RIGHT SPEAKER vs just plain SPEAKER.

Presumably this comes about when they didn't consider directionality at
first, and then realized later it was needed.

This commit makes the script look for these kinds of character pairs.
Because the current Unicode version only has this characteristic for
Symbols, and symbols must be included explicitly, no change in what
gets paired ensues. But if you turn on the outputting of characters not
chosen, that list will now include things meeting this new criterion.
Less than a handful actually are like this.
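
As a rough illustration of the presence/absence idea, the Python sketch
below strips a leading LEFT or RIGHT from a character's Unicode name and
checks whether a character with the remaining name exists. It is not the
logic of the script itself: the function name is hypothetical, it only
handles a leading direction word, and the result depends on the Unicode
version bundled with the Python in use.

```python
import unicodedata

def base_without_direction(ch: str):
    """Return the character whose name is `ch`'s name minus a leading
    LEFT/RIGHT, or None if there is no such character."""
    name = unicodedata.name(ch, "")
    for prefix in ("LEFT ", "RIGHT "):
        if name.startswith(prefix):
            try:
                return unicodedata.lookup(name[len(prefix):])
            except KeyError:
                return None
    return None

# The example pair named in the commit message.
right_speaker = unicodedata.lookup("RIGHT SPEAKER")
base = base_without_direction(right_speaker)
print(unicodedata.name(base) if base else "no undirected counterpart")
```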

Karl Williamson [Thu, 17 Mar 2022 16:35:00 +0000 (10:35 -0600)]

unicode_constants.pl: Prepare for examining Symbols

Heretofore, the code looking for paired string delimiters has looked at
punctuation, and a few symbols that Unicode gives a mirror for.  But
there are many more suitable-for-pairing characters in Unicode.

This commit generalizes things so as to handle the extra complexities of
the way symbols are named beyond the punctuation names.   For example,
RIGHTWARDS is sometimes used; it turns out that it also is used in one
punctuation character, which was previously overlooked by this script.

The generalization introduced by this commit handles almost all current
Unicode symbols properly.

But some symbols are barely distinguishable from their mirrors, such as
a tilde and a reversed tilde.  The scheme adopted here, then, makes the
default for a symbol pair to not be marked as paired delimiters.  The
code explicitly has to specify that a given pair is to be included.

The next few commits are mostly for adding ones that I thought were
good.

Karl Williamson [Thu, 17 Mar 2022 12:27:54 +0000 (06:27 -0600)]

Add 'ELEMENT OF'/CONTAINS to paired string delimiters

This commit adds 8 pairs of symbols that are variants on ELEMENT OF.
These make nice paired delimiters in the vein of < >.

Karl Williamson [Thu, 17 Mar 2022 12:20:09 +0000 (06:20 -0600)]

Add SUBSET/SUPERSET to paired string delimiters

This commit adds 20 pairs of symbols that are variants on SUBSET.
These make nice paired delimiters in the vein of < >.

Karl Williamson [Thu, 17 Mar 2022 12:14:46 +0000 (06:14 -0600)]

Add PRECEDES/SUCCEEDS to paired string delimiters

This commit adds 15 pairs of symbols that are variants on PRECEDES.
These look a lot like <>, so it makes sense to make them paired delimiters.

Karl Williamson [Thu, 17 Mar 2022 12:09:24 +0000 (06:09 -0600)]

Add SMALLER THAN to paired string delimiters

This commit adds 2 pairs of symbols that are variants on SMALLER THAN.
These look a lot like <>, so it makes sense to make them paired delimiters.

Karl Williamson [Wed, 9 Mar 2022 20:13:02 +0000 (13:13 -0700)]

unicode_constants.pl: Consider all \pP for delims

Previously, only the punctuation characters that Unicode had classed as
being opening/closing were considered in looking for suitable paired
delimiters.

This commit looks at all punctuation characters. There are actually
only 7 new pairs found.

This gives us ꧁ ꧂ as string delimiters, if your font allows,
which are Javanese and used to surround an honorific title, according to
Wikipedia.

Karl Williamson [Thu, 10 Mar 2022 16:40:10 +0000 (09:40 -0700)]

Add < > variants to paired delimiters

Perl considers '< >' to be delimiters for strings; this commit adds
most of the Unicode variants of these to also be string delimiters. The
ones that are combinations of both < and >, aren't included, as that
would be visually confusing.

Karl Williamson [Thu, 17 Mar 2022 03:11:07 +0000 (21:11 -0600)]

unicode_constants.pl: Add REVERSED punctuation

Besides LEFT/RIGHT, horizontal directionality can be specified by
Unicode in names by the presence or absence of REVERSED.

Enhancing the algorithm to take this into account adds 2 pairs of
mirrored delimiters that were previously overlooked.

Karl Williamson [Thu, 17 Mar 2022 02:03:01 +0000 (20:03 -0600)]

unicode_constants.pl: Output why chars not chosen

This script now examines all punctuation characters to see if there is a
mirrored character for it, suitable for use as a Perl string delimiter.
Some don't qualify, and some do qualify but the script doesn't catch
them.

This commit adds the ability to output which characters it doesn't think
qualify, and why. This enables a maintainer to easily check and know
what its deficiencies are, or that there is a good reason that a
particular character gets rejected.

Karl Williamson [Thu, 17 Mar 2022 01:11:08 +0000 (19:11 -0600)]

unicode_constants.pl: Refactor to catch more paired delims

Previously, only characters that Unicode included in its bidirectional
algorithm have been eligible to be found by this program to be mirrored
string delimiters.

This commit adds 5 quotation marker character pairs that
are omitted from the bidirectional algorithm, as most quotes are,
because, as the Standard says, their "directionality and pairing status
is less predictable than paired brackets."

But we're not particularly interested in those semantics; most string
delimiters will be selected only for their visual appearance.

Because they aren't in the bidi algorithm, there is no property that
maps one member of a pair to its mate. However, two characters whose
names pair only by LEFT vs RIGHT are almost certainly a mirrored pair.
This doesn't catch all possibilities; future commits will expand the
ones caught.

The commit refactors things so as to make future commits easier which
look at even more delimiter possibilities.
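
The LEFT-versus-RIGHT name heuristic described above can be sketched in a
few lines of Python. This is only an approximation of the idea, not the
script's implementation: the function name is hypothetical, hyphenated
forms such as LEFT-POINTING are deliberately ignored, and results depend
on the Unicode version bundled with the Python in use.

```python
import unicodedata

def mirrored_mate_by_name(ch: str):
    """Swap the word LEFT for RIGHT in `ch`'s name and look the result up.
    Returns the mate, or None when the name has no LEFT or no mate exists."""
    words = unicodedata.name(ch, "").split()
    if "LEFT" not in words:
        return None
    mate_name = " ".join("RIGHT" if w == "LEFT" else w for w in words)
    try:
        return unicodedata.lookup(mate_name)
    except KeyError:
        return None

opening = unicodedata.lookup("LEFT DOUBLE QUOTATION MARK")
closing = mirrored_mate_by_name(opening)
print(unicodedata.name(closing) if closing else "no mate found")
```

A name-based lookup like this needs no bidi mirroring property, which is
why it can also find quotation marks that have none.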

Karl Williamson [Wed, 9 Mar 2022 14:25:46 +0000 (07:25 -0700)]

Allow reversal of some paired delimiters; deprecations

Unicode says certain opening punctuation characters may be used as
closing ones in some languages; and their mirror is instead the opening
one.

This commit changes to allow either one of each such set to be the
opening one.

It also deprecates the use of any of the new mirrored delimiters to be
used outside the feature as an unmirrored delimiter, and the normal
closing delimiter from being used as an unpaired opening one while in
the feature. This gives us the freedom to make some or all of the new
paired delimiters be reversible.

Karl Williamson [Mon, 14 Feb 2022 22:25:40 +0000 (15:25 -0700)]

Add 'extra paired delimiters' feature

When this feature is enabled, one can use many more string delimiters
that have an opening version and a mirrored closing one.

Karl Williamson [Tue, 8 Mar 2022 14:31:09 +0000 (07:31 -0700)]

regen/unicode_constants.pl: List paired delimiters

This adds the capability to temporarily change a scalar to true to cause
this to print on stderr a list of the paired string delimiters, suitable
for pasting into a pod.

Karl Williamson [Mon, 14 Feb 2022 04:08:22 +0000 (21:08 -0700)]

unicode_constants.pl: Generate paired string delimiters

This commit causes several C strings to be generated containing bytes
that match paired string delimiters beyond the four that have
traditionally been used in Perl. This will allow a future commit to
accept more matching delimiters around strings than those four.

The code explains how the added delimiters are chosen.

Karl Williamson [Mon, 14 Feb 2022 02:23:50 +0000 (19:23 -0700)]

regen/unicode_constants.pl: Extract code into a fcn

This is in preparation for it to be used in multiple places in a future
commit.

Karl Williamson [Mon, 14 Feb 2022 02:03:54 +0000 (19:03 -0700)]

regen/unicode_constants.pl: White space only

Align the output of this bit vertically with surrounding output.

Karl Williamson [Sun, 13 Feb 2022 04:47:13 +0000 (21:47 -0700)]

toke.c: merge loops, multi-byte delim

The code in toke.c had two closely related loops; one for unmirrored
delimiters (same on both ends of the string) that could take UTF-8
delimiters, and one that allowed a mirrored closing delimiter, which
could take only a single byte. I found that it was just easiest to
collapse these into one loop in preparation to allow multi-byte
mirroring.

Karl Williamson [Sun, 13 Feb 2022 01:11:27 +0000 (18:11 -0700)]

toke.c: white-space only

This outdents some code from which the next commit will remove an
enclosing block.

Karl Williamson [Sat, 12 Feb 2022 23:13:29 +0000 (16:13 -0700)]

toke.c: Rmv unnecessary conditionals

It is cheaper to just redo a simple assignment than to test whether
you've already done it and skip it if you have.

Karl Williamson [Sat, 12 Feb 2022 21:29:14 +0000 (14:29 -0700)]

toke.c: Rename some variables; terminology in comment

Previously in places, things were called variously delimiter and
terminator, or variants thereof. This makes things consistent, and
clearer.

Karl Williamson [Sat, 12 Feb 2022 19:48:12 +0000 (12:48 -0700)]

toke.c: Split a variable into two for clarity

It can have two different meanings; split and rename to clarify what
meaning it refers to.

Karl Williamson [Sun, 20 Feb 2022 17:26:24 +0000 (10:26 -0700)]

toke.c: Rmv unnecessary SV

This code created an SV simply to get SVfARG() to print it out.  But
nowadays there is UTF8fARG which does the right thing on strings, so the
SV is unnecessary.

The code also has a fix: non-ASCII but Latin1 delimiters could
potentially come out as garbage when 'use utf8' is in effect.  This
commit also adds a clause in a conditional to prevent that.

Karl Williamson [Thu, 17 Feb 2022 02:28:53 +0000 (19:28 -0700)]

Add builtin::trim()

Most of this code came from Paul Evans and Scott Chief Baker.

Karl Williamson [Thu, 17 Feb 2022 00:40:58 +0000 (17:40 -0700)]

Add is_XPERLSPACE_utf8_safe_backwards()

This macro starts from the right side and matches UTF-8 white space
characters.

Karl Williamson [Sun, 6 Jun 2021 19:45:05 +0000 (13:45 -0600)]

regen/regcharclass.pl: Add backwards UTF-8 tries

This adds the ability to generate a trie macro that starts at the right
end of a string and backs up one matching byte at a time until a full
character is matched; bailing immediately if a non-matching byte is
found.

Previously, the way to accomplish this was to call the function to hop
back (which looked at the string byte by byte backwards until it found a
non-continuation byte), and then look forwards for matching bytes.

This new way is more efficient, as only the necessary bytes are
examined.
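
The backwards-matching idea can be sketched in Python with a trie keyed on
reversed UTF-8 byte sequences. This is an illustrative model only: the
real macros are generated C, the whitespace list below is a small assumed
subset and not Perl's actual set, and all names are hypothetical.

```python
# Illustrative subset of whitespace characters (an assumption, not Perl's list).
WHITESPACE = [" ", "\t", "\n", "\u00a0", "\u2003", "\u3000"]

def build_reverse_trie(chars):
    """Build a trie over each character's UTF-8 bytes in reverse order."""
    root = {}
    for ch in chars:
        node = root
        for byte in reversed(ch.encode("utf-8")):
            node = node.setdefault(byte, {})
        node[None] = True  # marks the end of a complete character
    return root

def match_backwards(buf: bytes, trie) -> int:
    """Return the byte length of a trie character ending `buf`, else 0.
    Bails out at the first byte that cannot be part of a match."""
    node = trie
    i = len(buf)
    while i > 0:
        byte = buf[i - 1]
        if byte not in node:
            return 0
        node = node[byte]
        i -= 1
        if None in node:
            return len(buf) - i
    return 0

TRIE = build_reverse_trie(WHITESPACE)
print(match_backwards("word\u2003".encode("utf-8"), TRIE))  # 3
```

Because a UTF-8 lead byte never appears as a continuation byte, reaching
the end marker means a whole character has been matched, so no separate
hop back to the start of the character is needed.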

Karl Williamson [Sun, 6 Mar 2022 20:42:48 +0000 (13:42 -0700)]

Mark regex sets feature as accepted

It is no longer experimental.

Karl Williamson [Sun, 6 Mar 2022 20:10:12 +0000 (13:10 -0700)]

Remove use of experimental regex sets warnings

These warnings are no longer generated; so simplify the core by not
trying to turn them off.

The warning is preserved so that other code need not change, but this
commit also turns the default generation of it off.
