This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Yves Orton [Thu, 7 Apr 2022 11:47:15 +0000 (13:47 +0200)]
pod/.gitignore - remove redundant perlmacos.pod entry
perlmacos-related materials were deleted in 21c79b49e2f6f7, but this
pod/.gitignore entry was overlooked.
Yves Orton [Thu, 7 Apr 2022 07:02:05 +0000 (09:02 +0200)]
Update Encode to 3.17
This silences the build warnings reported in https://github.com/Perl/perl5/issues/19588
and in https://github.com/Perl/perl5/issues/17014.
It includes some test updates, but no functionality changes.
Paul Marquess [Wed, 6 Apr 2022 12:17:05 +0000 (12:17 +0000)]
Sync Compress-Raw-Zlib-2.103 + 2 others into blead
This commit syncs into blead version 2.103 of 3 CPAN distributions:
Compress-Raw-Zlib
IO-Compress
Compress-Raw-Bzip2
Applying the commits one at a time would have resulted in one test
failure in one of those commits, but applying all three has all tests
passing as expected.
From Changes for Compress-Raw-Zlib
2.103 3 April 2022
* Sync upstream fix for CVE-2018-25032
https://github.com/advisories/GHSA-jc36-42cf-vqwj
Update to Zlib 1.2.12
d507f527768f6cbab5831ed3ec17fe741163785c
Fix for inflateSync return code change
f47ea5f36c40fe19efe404dd75fd790b115de596
Fix for incorrect CRC from zlib 1.2.12.1
https://github.com/madler/zlib/commit/ec3df00224d4b396e2ac6586ab5d25f673caa4c2
60104e3a162a116548303861ae0811fb850e65fd
* AUTHOR doesn't contain the stated information
bf5a03c1b440c8d9e41cffb344bf889794cc532b
From Changes for IO-Compress
2.103 3 April 2022
* Update version to 2.103
97f1893892eccac69b3a8033378b0b44d7c4f3ab
* Fix for inflateSync return code change
4843e22285bf8e52c9b5b913d167a1545995c793
* Add constant for ZIP_CM_AES
91be04dd8dc2848e3c25b87ec498cf8ccc34187a
* Point links to rfcs to ietf.org
https://github.com/pmqs/IO-Compress/pull/37
a8f28b36cf4d77df1cfa0516867012425920a62f
* Rename test file to fix manifest warning
https://github.com/pmqs/IO-Compress/pull/36
955244f9ac0654d7e8d54115162da53c85d7178c
* Add perl 5.34
06f41883f62ed1b88b03c246b16e0b5ef72503bc
* Fix for Calling nextStream on an IO::Uncompress::Zip object in Transparent mode dies when input is uncompressed
https://github.com/pmqs/IO-Compress/issues/34
b0f93fe62f84b7d4d4bb8d2ea8e6d5432887103f
* IO::Compress: Generalize for EBCDIC
https://github.com/pmqs/IO-Compress/pull/32
90b51dbbd785e2c824cb0a93feef3b3dd5d075f2
* IO::Compress: Fix misspelling in 112utf8-zip.t
c22216b5d3202dce01ef17a271252f82520a6ab9
* Revert "Always have full zip64 entry in central directory"
7df4c9bc98667bc1afd1b4bc5a27d20f94e3cd9c
* Always have full zip64 entry in central directory
333648ee1dece6eb220060c7ec09806f6ebb9866
* update cpanm path on MacOS
33079902934885c515768a08d72e89243a5d01a9
From Changes for Compress-Raw-Bzip2
2.103 3 April 2022
* Silence uninitialized warnings
https://github.com/pmqs/Compress-Raw-Bzip2/pull/5
ff3d907325091287ac1525db384b99a968d763d7
641a440ec6229c1d368b9ead48f4968b955c0115
Dagfinn Ilmari Mannsåker [Tue, 5 Apr 2022 18:13:54 +0000 (19:13 +0100)]
Delete long-obsolete README.macos
Support for Mac OS Classic was removed in 5.12, so there's no need to
keep this obsolete notice around.
In passing, add missing perlmacosx to plan9/mkfile's list of
archpodnames.
Niyas Sait [Tue, 22 Mar 2022 10:57:42 +0000 (10:57 +0000)]
build: add configurations to compile perl for windows/arm64
Yves Orton [Tue, 5 Apr 2022 13:56:20 +0000 (15:56 +0200)]
makedepend_file.SH - add -DPERL_CORE so we pick up all deps
makedepend_file does not find all deps for our code because in many
cases the dependencies are hidden behind a guard clause which checks
if PERL_CORE is defined. This is annoying when working on the regex
engine as after `make regen` is executed `make` does not notice that
files like regnodes.h have been updated. No doubt it is annoying in
other contexts too.
This adds the -DPERL_CORE so that we pick up these guarded dependencies.
With this patch, an update to regnodes.h is noticed, and regcomp.o,
regexec.o and miniperl will be appropriately rebuilt.
Thanks to Dagfinn Ilmari Mannsåker for figuring this out.
Paul "LeoNerd" Evans [Mon, 4 Apr 2022 15:28:06 +0000 (16:28 +0100)]
Add the new flow-control keywords to perlfunc.pod
Yves Orton [Sun, 13 Mar 2022 12:49:35 +0000 (13:49 +0100)]
runenv_hashseed.t - rework to deal with probabilistic failures
The old version of this test had a 1/128 chance of failing the
RANDOM key change test each time we try it. We run it 50 times, so
there is a non-zero chance of failing it at least once. It is actually
surprising we haven't seen more test failures so far. The key change test
also has a similar problem, albeit with a much lower probability of
failing the test.
This version of the test includes a number of changes.
* Test setting PERL_PERTURB_KEYS both via name and number.
* Use more test keys with a mix of different key lengths, some
relatively long.
* Rework various tests that only verify that setting PERL_PERTURB_KEYS
mode works, and that setting the seed works so they only fire once per
mode. There is no point in repeating them over and over.
* Improved comments about what is going on.
* Only use -Dh mode when we test the DETERMINISTIC mode.
* Clean up the temp files more aggressively.
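To put a number on the failure odds described above, here is a minimal Python sketch (it is not part of the test file). It assumes the 50 runs are independent, which the commit message does not state, and the variable names are illustrative only.

```python
# Chance that a check with a 1/128 per-run failure rate fails at least
# once in 50 runs, assuming (hypothetically) that the runs are independent.
p_single = 1 / 128
runs = 50

p_all_pass = (1 - p_single) ** runs   # every run passes
p_any_fail = 1 - p_all_pass           # at least one run fails

print(f"P(at least one failure in {runs} runs) = {p_any_fail:.3f}")
```

Under that independence assumption the result is roughly one in three, which is consistent with the remark that more failures might have been expected.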
Richard Leach [Sat, 19 Mar 2022 23:01:57 +0000 (23:01 +0000)]
Perl_reg_named_buff_fetch: simplify newSVsv(&PL_sv_undef)
Specifically, newSVsv(&PL_sv_undef) reduces to just newSV_type(SVt_NULL).
Karl Williamson [Thu, 31 Mar 2022 14:40:25 +0000 (08:40 -0600)]
toke.c: Reorder branches for clarity
The trivial case should be handled first.
Karl Williamson [Tue, 29 Mar 2022 10:43:46 +0000 (04:43 -0600)]
toke.c: scan_str(): Rmv special handling for NUL delim
Because we use ninstr(), which can handle NULs, no special handling of
them is required.
Karl Williamson [Tue, 29 Mar 2022 03:31:55 +0000 (21:31 -0600)]
toke.c: C_ARRAY_END() doesn't work on a string
The code was using the macro C_ARRAY_END which doesn't apply to strings,
thus not giving the correct end to the string. But no tests were
failing. No new tests are added here, because the next commit will
change things so that tests would fail all over the place.
Karl Williamson [Tue, 29 Mar 2022 10:38:30 +0000 (04:38 -0600)]
toke.c: Don't assume is UTF-8 when it might not be
This code only works by coincidence on ASCII platforms, due to the way
the underlying UTF-8 happens to be represented. But it definitely
doesn't on EBCDIC. Test before assuming it is UTF-8.
Karl Williamson [Tue, 29 Mar 2022 10:36:15 +0000 (04:36 -0600)]
toke.c: Move variable set to after possible exit
It's just a little bit better to do the warning (which could be made
fatal) before setting something that's only needed later.
Karl Williamson [Tue, 29 Mar 2022 10:32:18 +0000 (04:32 -0600)]
toke.c: Add NUL terminator in both branches
By moving the setting of this to after two branches of a conditional
come together, it gets set always, instead of sometimes.
Karl Williamson [Tue, 29 Mar 2022 10:26:32 +0000 (04:26 -0600)]
toke.c: Move setting of a variable to later
This simplifies things a bit.
Karl Williamson [Tue, 29 Mar 2022 10:17:04 +0000 (04:17 -0600)]
toke.c: Rmv redundant storage
The data contained in this variable is a copy of const data stored
elsewhere. Instead of making a copy, simplify to just point to the
already-stored data.
Karl Williamson [Tue, 29 Mar 2022 09:10:41 +0000 (03:10 -0600)]
toke.c: Rmv redundant variable
This variable doesn't add anything. We can use other variables to
just as conveniently get at the information it contains.
Karl Williamson [Tue, 29 Mar 2022 03:24:33 +0000 (21:24 -0600)]
toke.c: Rmv redundant conditional
To get to the removed conditional, it has already been checked for being
true.
Karl Williamson [Mon, 28 Mar 2022 19:45:59 +0000 (13:45 -0600)]
toke.c: Change constant to sync with comment
It's better if the comment and code mesh.
Paul "LeoNerd" Evans [Fri, 11 Mar 2022 12:22:11 +0000 (12:22 +0000)]
Some initial documentation about the new created_as_{string,number} functions
Paul "LeoNerd" Evans [Mon, 14 Mar 2022 14:31:29 +0000 (14:31 +0000)]
Initial implementation and unit-tests of created_as_{string,number}
James E Keenan [Wed, 30 Mar 2022 17:54:09 +0000 (17:54 +0000)]
Test::More::note() needs version 0.82
For: https://github.com/Perl/perl5/issues/19569, as reported by
kbulgrien.
Karl Williamson [Wed, 30 Mar 2022 16:04:06 +0000 (10:04 -0600)]
podcheck.t: Don't check verbatim length on a few pods
These pods have some very long lines that make sense to keep on a single
line, such as output from a program. That means that someone viewing
them will either enlarge their window to view them unbroken or all is
lost anyway; there may be a few lines that could be shortened, but no
real value to do so.
Karl Williamson [Wed, 30 Mar 2022 16:01:22 +0000 (10:01 -0600)]
podcheck: Add links to known modules
Karl Williamson [Wed, 30 Mar 2022 15:59:41 +0000 (09:59 -0600)]
podcheck.t: Add spaces to an output message
To make it easier to read.
Karl Williamson [Wed, 30 Mar 2022 15:57:10 +0000 (09:57 -0600)]
podcheck.t: Devel::ppport::ppphdoc isn't pod
So don't check it.
Karl Williamson [Wed, 30 Mar 2022 15:56:03 +0000 (09:56 -0600)]
podcheck.t: Update pod about verbatim lines
Karl Williamson [Wed, 30 Mar 2022 15:52:28 +0000 (09:52 -0600)]
perlhacktips: white-space only
Karl Williamson [Thu, 24 Mar 2022 02:21:01 +0000 (20:21 -0600)]
Clarify \p{Decomposition_Type=NonCanonical}
This closes #18458
Karl Williamson [Wed, 30 Mar 2022 03:42:53 +0000 (21:42 -0600)]
ExtUtils::ParseXS/t/002-more.t: Fix skip count
Karl Williamson [Mon, 28 Mar 2022 19:34:32 +0000 (13:34 -0600)]
toke.c: White-space, comments
Karl Williamson [Mon, 28 Mar 2022 19:29:17 +0000 (13:29 -0600)]
t/loc_tools.pl: Skip locale tests on z/OS threaded
setlocale() is a no-op on this system after the first thread is created,
making it an outlier among platforms. The tests assume otherwise, and
hence would fail.
Karl Williamson [Mon, 28 Mar 2022 19:25:10 +0000 (13:25 -0600)]
perllocale: Add note about z/OS special behavior
Karl Williamson [Mon, 28 Mar 2022 19:24:12 +0000 (13:24 -0600)]
perllocale: Formatting, grammar
Ricardo Signes [Mon, 28 Mar 2022 19:16:30 +0000 (15:16 -0400)]
release schedule: add the next two people
James E Keenan [Sun, 27 Mar 2022 19:52:44 +0000 (19:52 +0000)]
Bump $VERSION in perl5db.pl
Karl Williamson [Thu, 24 Mar 2022 12:52:51 +0000 (06:52 -0600)]
APItest/t/sv_streq.t: Generalize for EBCDIC
This test fails on EBCDIC systems, because it wants a non-ASCII
character, and the one it chose, E9, is ASCII on EBCDIC ('Z').
perlhacktips suggests B6 as a character to use in such tests, and this
commit changes to use that.
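The byte values mentioned above can be checked with a short Python sketch. It uses Python's "cp037" codec as a stand-in for an EBCDIC platform, which is an assumption: Perl also runs on other EBCDIC code pages that this sketch does not cover, and `native_char` is a hypothetical helper name.

```python
# Assumption: the "cp037" codec stands in for an EBCDIC platform here.
def native_char(byte_value, codec="cp037"):
    """Decode one byte as the character it represents on the given code page."""
    return bytes([byte_value]).decode(codec)

for byte_value in (0xE9, 0xB6):
    ch = native_char(byte_value)
    kind = "ASCII" if ord(ch) < 128 else "non-ASCII"
    print(f"0x{byte_value:02X} -> {ch!r} ({kind})")
```

On this code page 0xE9 decodes to 'Z', so it cannot serve as a non-ASCII test character, while 0xB6 decodes to a character outside the ASCII range.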
Karl Williamson [Mon, 21 Mar 2022 20:10:16 +0000 (14:10 -0600)]
Perl5DB: Rmv ASCII dependency
Karl Williamson [Sat, 26 Mar 2022 17:23:04 +0000 (11:23 -0600)]
Devel::Peek::Peek.t: Simplify EBCDIC handling
The prior commit shows what can happen when two branches do the same
thing: they can get out of sync.
Since this test file was originally written, the testing infrastructure
has improved so that there are functions that handle the gory details of
character set differences for you. This test file hadn't been updated
since it wasn't causing a problem, until now.
This commit changes to use the new infrastructure, and as a result one
branch gets removed each from the two tests that varied depending on
character set.
Karl Williamson [Sat, 26 Mar 2022 12:50:28 +0000 (06:50 -0600)]
Devel::Peek::Peek.t: Add missing '\' for EBCDIC
This file was recently changed, and the EBCDIC side of the change had a
typo.
Hugo van der Sanden [Wed, 23 Mar 2022 13:19:00 +0000 (13:19 +0000)]
gh19557: restore match_end on early bailout
After 271c3af797, early bailout from the inner one of a pair of nested
lookbehinds would leave the desired match_end pointing at the wrong
place, so the outer lookbehind could give the wrong answer.
Karl Williamson [Thu, 24 Mar 2022 20:31:44 +0000 (14:31 -0600)]
toke.c: Add missing ptr update
scan_str() calls s=skipspace(s). It turns out that this function can
actually change the buffer 's' is pointing to, so that the original
'start' passed in to the function is obsolete. Just update it. This is
very much like the paradigm already in S_force_word().
This bug previously existed, but commit
32b87797e986f5d99836e16ea6b9d9ff5a56d3be increased the frequency of
occurrence from close to non-existent to relatively often. It only
happened when the string being delimited had some spaces before it, and
only if the buffer got moved. This depends on the position of the
construct in the file, and on how the reading of that file is buffered;
hence the symptoms occurred much more often using stdio than PerlIO.
(It could just as well have been the reverse, I suppose.)
The mentioned commit collapsed two different loops; one of which didn't
bother with a check it should have been doing. Without that check, the
likelihood of this being triggered was much less. (But illegal input
would get by.)
There is a nuance here, which resulted in the need for this commit to
also update the test file, from having two occurrences of an error on a
single line to just one. This is because, if the buffer moves, we reset
'start' to 's'. This makes 's' appear to be at the left edge of the
input when it really is just at the left edge of the buffer. The test
that failed used a combining character (I'll call it 'cc' for short)
after a space, to check that the code accurately catches the illegality
that you can't delimit a string with a character that doesn't stand on
its own, such as a cc. However when such a character comes at the
beginning of the input, there's nothing for it to combine with, and
Unicode says that is legal, so we do too. So this moving 'start' makes
something that is illegal look to be legal. I don't think this is a
problem because the code looks up the cc and discovers there is no
mirror for it, so it must also be the terminator for the string. If
this cc is just from a single typo in the input, there won't be a
matching terminator, and the compilation will abort. If the program
intended to use a cc as both fore and aft of a string, the terminating
occurrence of this cc will also be checked for validity, and it will
almost certainly be seen to be an illegal cc in this context, so again
the compilation will fail. That is indeed what is happening in
t/lib/warnings/toke. If the buffering were such that the terminating cc
also began a new buffer, it again would be viewed as at the edge and the
string would be parsed as being ok, when it really shouldn't have been.
Should this happen, I don't see a real problem. An attacker could craft
a string with the precise length to make this happen, but to do so they
would have to control the source code, and the war is already lost.
James E Keenan [Thu, 24 Mar 2022 12:46:56 +0000 (12:46 +0000)]
Correct POD formatting error
Karl Williamson [Thu, 24 Mar 2022 03:33:54 +0000 (21:33 -0600)]
perlunicode: regex sets is no longer experimental
Karl Williamson [Wed, 23 Mar 2022 19:26:26 +0000 (13:26 -0600)]
re/anyof.t: Add debugging info
This is in response to
https://github.com/Perl/perl5/pull/19558#issuecomment-1076659884
Karl Williamson [Wed, 23 Mar 2022 19:31:38 +0000 (13:31 -0600)]
Fix double encoding of UTF-8 on EBCDIC
Commit d1e771d8c533168553df9b2a858d967f707fc9fe broke EBCDIC builds by
doubly encoding some UTF-8 characters.
Hugo van der Sanden [Mon, 21 Mar 2022 21:55:01 +0000 (21:55 +0000)]
gh17746: add missing check on hardcount
Failing to check for max iterations caused an assertion failure.
Graham Knop [Mon, 21 Mar 2022 16:48:51 +0000 (17:48 +0100)]
fix typo in perl53510delta
James E Keenan [Mon, 21 Mar 2022 13:37:35 +0000 (13:37 +0000)]
Correct POD formatting error
Use 'F<>' for strings that are simply filenames.
As reported by Tux on #p5p.
Sawyer X [Mon, 21 Mar 2022 10:32:29 +0000 (11:32 +0100)]
Update RMG on updateAUTHORS.pl
Sawyer X [Mon, 21 Mar 2022 10:21:08 +0000 (11:21 +0100)]
Update Module::CoreList for 5.35.11
Richard Leach [Sun, 20 Mar 2022 19:05:10 +0000 (19:05 +0000)]
Perl_newSViv: simplify by using (inline) newSV_type
Richard Leach [Sun, 23 May 2021 23:33:36 +0000 (00:33 +0100)]
Perl_newSVnv: simplify SV creation and SvNV_set
The function can be simplified by using the now-inlined newSV_type
function, directly using SvNV_set, and twiddling the required flags.
This cuts out any function call overhead, a switch statement leading
to a sv_upgrade(sv, SVt_NV) call, and a touch more bit-twiddling than
is necessary.
Tony Cook [Mon, 11 Jul 2016 00:52:20 +0000 (10:52 +1000)]
(perl #128245) make it obvious :encoding uses PerlIO::encoding
rather than encoding.pm
Sawyer X [Sun, 20 Mar 2022 18:34:19 +0000 (19:34 +0100)]
Bump version for 5.35.11
Sawyer X [Sun, 20 Mar 2022 16:46:29 +0000 (17:46 +0100)]
New perldelta for 5.35.11
Sawyer X [Sun, 20 Mar 2022 16:40:47 +0000 (17:40 +0100)]
Tick release, update epigraph
Sawyer X [Sun, 20 Mar 2022 10:35:36 +0000 (11:35 +0100)]
Add new release to perlhist
Sawyer X [Sun, 20 Mar 2022 10:09:28 +0000 (11:09 +0100)]
Finalize perldelta
Sawyer X [Sun, 20 Mar 2022 09:44:55 +0000 (10:44 +0100)]
Update Module::CoreList for 5.35.10
Sawyer X [Sun, 20 Mar 2022 09:36:13 +0000 (10:36 +0100)]
Update AUTHORS file
Karl Williamson [Thu, 17 Mar 2022 23:38:55 +0000 (17:38 -0600)]
Add arrows to paired string delimiters
Unicode has lots of arrows of various shapes, sizes, and directions.
None of them were of consequence to the Bidirectional algorithm, so none
were specified as being mirrored pairs. This commit uses the
generalizations already in place from previous commits to examine arrow
symbols and choose which are mirrored pairs.
As previously, it rejects arrows with contrary directionality, and ones
without horizontal directionality.
Karl Williamson [Thu, 17 Mar 2022 23:37:32 +0000 (17:37 -0600)]
Add SPEAKERs paired delimiters
The characters with this name look good as mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:35:36 +0000 (17:35 -0600)]
Add TELEPHONE RECEIVER paired delimiters
The characters with this name look good as mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:33:08 +0000 (17:33 -0600)]
Add ERASE paired delimiters
The characters with this name look good as mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:30:47 +0000 (17:30 -0600)]
Add DOUBLE TRIANGLEs paired delimiters
The characters with this name look good as mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:28:31 +0000 (17:28 -0600)]
Add THREE RAYS paired delimiters
The characters with this name look good as mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:17:53 +0000 (17:17 -0600)]
Add musical score paired delimiters
The characters that signify the beginning and ending of Western music
scores serve as good delimiters.
Karl Williamson [Thu, 17 Mar 2022 23:11:00 +0000 (17:11 -0600)]
Add INDEX paired delimiters
The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters. The 'index' refers to the index finger
in a hand pointing at the delimited string.
Karl Williamson [Thu, 17 Mar 2022 18:23:33 +0000 (12:23 -0600)]
Add TURNSTILE paired delimiters
The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 18:23:12 +0000 (12:23 -0600)]
Add TACK paired delimiters
The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.
Karl Williamson [Thu, 17 Mar 2022 17:55:54 +0000 (11:55 -0600)]
Directionality pres/abs-ence can mean paired delimiters
Another way Unicode indicates that a character has horizontal
directionality is by adding LEFT or RIGHT to the name of a base
character. Hence we get RIGHT SPEAKER vs just plain SPEAKER.
Presumably this comes about when they didn't consider directionality at
first, and then realized later it was needed.
This commit makes the script look for these kinds of character pairs.
Because the current Unicode version only has this characteristic for
Symbols, and symbols must be included explicitly, no changes in what
gets paired ensues. But if you turn on the outputting of characters not
chosen, that list will now include things meeting these new criteria.
Fewer than a handful actually are like this.
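The naming pattern described above (RIGHT SPEAKER vs plain SPEAKER) can be illustrated with a small Python sketch. This is not the logic of regen/unicode_constants.pl; `base_of` is a hypothetical helper, and the results depend on the Unicode version bundled with the Python in use.

```python
import unicodedata

def base_of(ch):
    """Return the character whose name is ch's name minus a leading
    LEFT/RIGHT, or None if ch has no such prefix or no such base exists."""
    name = unicodedata.name(ch, "")
    for prefix in ("LEFT ", "RIGHT "):
        if name.startswith(prefix):
            try:
                return unicodedata.lookup(name[len(prefix):])
            except KeyError:
                return None   # no character carries the stripped name
    return None

right_speaker = unicodedata.lookup("RIGHT SPEAKER")
print(unicodedata.name(base_of(right_speaker)))   # name of the base character
print(base_of("("))   # LEFT PARENTHESIS has no plain PARENTHESIS base
```

A character such as LEFT PARENTHESIS pairs with RIGHT PARENTHESIS by name instead, so it has no undirected base and the helper returns None for it.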
Karl Williamson [Thu, 17 Mar 2022 16:35:00 +0000 (10:35 -0600)]
unicode_constants.pl: Prepare for examining Symbols
Heretofore, the code looking for paired string delimiters has looked at
punctuation, and a few symbols that Unicode gives a mirror for. But
there are many more suitable-for-pairing characters in Unicode.
This commit generalizes things so as to handle the extra complexities of
the way symbols are named beyond the punctuation names. For example,
RIGHTWARDS is sometimes used; it turns out that it also is used in one
punctuation character, which was previously overlooked by this script.
The generalization introduced by this commit handles almost all current
Unicode symbols properly.
But some symbols are barely distinguishable from their mirrors, such as
a tilde and a reversed tilde. The scheme adopted here, then, makes the
default for a symbol pair to not be marked as paired delimiters. The
code explicitly has to specify that a given pair is to be included.
The next few commits are mostly for adding ones that I thought were
good.
Karl Williamson [Thu, 17 Mar 2022 12:27:54 +0000 (06:27 -0600)]
Add 'ELEMENT OF'/CONTAINS to paired string delimiters
This commit adds 8 pairs of symbols that are variants on ELEMENT OF.
These make nice paired delimiters in the vein of < >.
Karl Williamson [Thu, 17 Mar 2022 12:20:09 +0000 (06:20 -0600)]
Add SUBSET/SUPERSET to paired string delimiters
This commit adds 20 pairs of symbols that are variants on SUBSET.
These make nice paired delimiters in the vein of < >.
Karl Williamson [Thu, 17 Mar 2022 12:14:46 +0000 (06:14 -0600)]
Add PRECEDES/SUCCEEDS to paired string delimiters
This commit adds 15 pairs of symbols that are variants on PRECEDES.
These look a lot like <>, so it makes sense to make them paired delimiters.
Karl Williamson [Thu, 17 Mar 2022 12:09:24 +0000 (06:09 -0600)]
Add SMALLER THAN to paired string delimiters
This commit adds 2 pairs of symbols that are variants on SMALLER THAN.
These look a lot like <>, so it makes sense to make them paired delimiters.
Karl Williamson [Wed, 9 Mar 2022 20:13:02 +0000 (13:13 -0700)]
unicode_constants.pl: Consider all \pP for delims
Previously, only the punctuation characters that Unicode had classed as
being opening/closing were considered in looking for suitable paired
delimiters.
This commit looks at all punctuation characters. There are actually
only 7 new pairs found.
This gives us ꧁ ꧂ as string delimiters, if your font allows,
which are Javanese and used to surround an honorific title, according to
Wikipedia.
Karl Williamson [Thu, 10 Mar 2022 16:40:10 +0000 (09:40 -0700)]
Add < > variants to paired delimiters
Perl considers '< >' to be delimiters for strings; this commit adds
most of the Unicode variants of these to also be string delimiters. The
ones that are combinations of both < and > aren't included, as that
would be visually confusing.
Karl Williamson [Thu, 17 Mar 2022 03:11:07 +0000 (21:11 -0600)]
unicode_constants.pl: Add REVERSED punctuation
Besides LEFT/RIGHT, horizontal directionality can be specified by
Unicode in names by the presence or absence of REVERSED.
Enhancing the algorithm to take this into account adds 2 pairs of
mirrored delimiters that were previously overlooked.
Karl Williamson [Thu, 17 Mar 2022 02:03:01 +0000 (20:03 -0600)]
unicode_constants.pl: Output why chars not chosen
This script now examines all punctuation characters to see if there is a
mirrored character for it, suitable for use as a Perl string delimiter.
Some don't qualify, and some do qualify but the script doesn't catch
them.
This commit adds the ability to output which characters it doesn't think
qualify, and why. This enables a maintainer to easily check and know
what its deficiencies are, or that there is a good reason that a
particular character gets rejected.
Karl Williamson [Thu, 17 Mar 2022 01:11:08 +0000 (19:11 -0600)]
unicode_constants.pl: Refactor to catch more paired delims
Previously, only characters that Unicode included in its bidirectional
algorithm have been eligible to be found by this program to be mirrored
string delimiters.
This commit adds 5 quotation marker character pairs that
are omitted from the bidirectional algorithm, as most quotes are,
because, as the Standard says, their "directionality and pairing status
is less predictable than paired brackets."
But we're not particularly interested in those semantics; most string
delimiters will be selected only for their visual appearance.
Because they aren't in the bidi algorithm, there is no property that
maps one member of a pair to its mate. However, two characters whose
names differ only by LEFT vs RIGHT are almost certainly a mirrored pair.
This doesn't catch all possibilities; future commits will expand the
ones caught.
The commit also refactors things to make it easier for future commits
to look at even more delimiter possibilities.
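The LEFT vs RIGHT name heuristic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in regen/unicode_constants.pl; `name_mate` and `SWAP` are hypothetical names, and it relies on the character names known to Python's `unicodedata` module.

```python
import re
import unicodedata

SWAP = {"LEFT": "RIGHT", "RIGHT": "LEFT"}

def name_mate(ch):
    """Return the character whose name differs from ch's only by
    LEFT vs RIGHT, or None if there is no such character."""
    name = unicodedata.name(ch, "")
    swapped = re.sub(r"\b(LEFT|RIGHT)\b", lambda m: SWAP[m.group(1)], name)
    if swapped == name:
        return None               # the name carries no direction word
    try:
        return unicodedata.lookup(swapped)
    except KeyError:
        return None               # no character has the swapped name

for opener in ("(", "\u201c", "\u00ab"):
    print(unicodedata.name(opener), "<->", unicodedata.name(name_mate(opener)))
```

A name-only match like this says nothing about whether the two glyphs actually look like mirror images, which is presumably why the commit describes such pairs as "almost certainly" mirrored.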
Karl Williamson [Wed, 9 Mar 2022 14:25:46 +0000 (07:25 -0700)]
Allow reversal of some paired delimiters; deprecations
Unicode says certain opening punctuation characters may be used as
closing ones in some languages; and their mirror is instead the opening
one.
This commit changes to allow either one of each such set to be the
opening one.
It also deprecates the use of any of the new mirrored delimiters to be
used outside the feature as an unmirrored delimiter, and the normal
closing delimiter from being used as an unpaired opening one while in
the feature. This gives us the freedom to make some or all of the new
paired delimiters be reversible.
Karl Williamson [Mon, 14 Feb 2022 22:25:40 +0000 (15:25 -0700)]
Add 'extra paired delimiters' feature
When this feature is enabled, one can use many more string delimiters
that have an opening version and a mirrored closing one.
Karl Williamson [Tue, 8 Mar 2022 14:31:09 +0000 (07:31 -0700)]
regen/unicode_constants.pl: List paired delimiters
This adds the capability to temporarily change a scalar to true to cause
this to print on stderr a list of the paired string delimiters, suitable
for pasting into a pod.
Karl Williamson [Mon, 14 Feb 2022 04:08:22 +0000 (21:08 -0700)]
unicode_constants.pl: Generate paired string delimiters
This commit causes several C strings to be generated containing bytes
that match paired string delimiters beyond the four that have
traditionally been used in Perl. This will allow a future commit to
accept more matching delimiters around strings than those four.
The code explains how the added delimiters are chosen.
Karl Williamson [Mon, 14 Feb 2022 02:23:50 +0000 (19:23 -0700)]
regen/unicode_constants.pl: Extract code into a fcn
This is in preparation for it to be used in multiple places in a future
commit.
Karl Williamson [Mon, 14 Feb 2022 02:03:54 +0000 (19:03 -0700)]
regen/unicode_constants.pl: White space only
Align the output of this bit vertically with surrounding output.
Karl Williamson [Sun, 13 Feb 2022 04:47:13 +0000 (21:47 -0700)]
toke.c: merge loops, multi-byte delim
The code in toke.c had two closely related loops; one for unmirrored
delimiters (same on both ends of the string) that could take UTF-8
delimiters, and one that allowed a mirrored closing delimiter, which
could take only a single byte. I found that it was just easiest to
collapse these into one loop in preparation to allow multi-byte
mirroring.
Karl Williamson [Sun, 13 Feb 2022 01:11:27 +0000 (18:11 -0700)]
toke.c: white-space only
This outdents some code that the next commit will remove an enclosing
block from
Karl Williamson [Sat, 12 Feb 2022 23:13:29 +0000 (16:13 -0700)]
toke.c: Rmv unnecessary conditionals
It is cheaper to just redo a simple assignment than to test whether
you've already done it and skip it if you have.
Karl Williamson [Sat, 12 Feb 2022 21:29:14 +0000 (14:29 -0700)]
toke.c: Rename some variables; terminology in comment
Previously in places, things were called variously delimiter and
terminator, or variants thereof. This makes things consistent, and
clearer.
Karl Williamson [Sat, 12 Feb 2022 19:48:12 +0000 (12:48 -0700)]
toke.c: Split a variable into two for clarity
It can have two different meanings; split and rename to clarify what
meaning it refers to.
Karl Williamson [Sun, 20 Feb 2022 17:26:24 +0000 (10:26 -0700)]
toke.c: Rmv unnecessary SV
This code created an SV simply to get SVfARG() to print it out. But
nowadays there is UTF8fARG which does the right thing on strings, so the
SV is unnecessary.
This commit also fixes a case where non-ASCII but Latin1 delimiters
would potentially come out as garbage when 'use utf8' is in effect, by
adding a clause to a conditional to prevent that.
Karl Williamson [Thu, 17 Feb 2022 02:28:53 +0000 (19:28 -0700)]
Add builtin::trim()
Most of this code came from Paul Evans and Scott Chief Baker.
Karl Williamson [Thu, 17 Feb 2022 00:40:58 +0000 (17:40 -0700)]
Add is_XPERLSPACE_utf8_safe_backwards()
This macro starts from the right side and matches UTF-8 white space
characters.
Karl Williamson [Sun, 6 Jun 2021 19:45:05 +0000 (13:45 -0600)]
regen/regcharclass.pl: Add backwards UTF-8 tries
This adds the ability to generate a trie macro that starts at the right
end of a string and backs up one matching byte at a time until a full
character is matched; bailing immediately if a non-matching byte is
found.
Previously, the way to accomplish this was to call the function to hop
back (which looked at the string byte by byte backwards until it found a
non-continuation byte), and then look forwards for matching bytes.
This new way is more efficient, as only the necessary bytes are
examined.
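The backwards matching described above can be sketched in Python with a reversed trie. This is a rough illustration under stated assumptions, not the generated C macro: the whitespace set in `SPACES` is a small illustrative sample, and `build_reverse_trie` and `match_space_backwards` are hypothetical names.

```python
# Illustrative sample only; the real macro covers Perl's full \s set.
SPACES = [" ", "\t", "\n", "\u00a0", "\u2003", "\u3000"]

def build_reverse_trie(chars):
    """Build a trie keyed by each character's UTF-8 bytes, last byte first."""
    root = {}
    for ch in chars:
        node = root
        for byte in reversed(ch.encode("utf-8")):
            node = node.setdefault(byte, {})
        node[None] = True          # marks a complete character
    return root

TRIE = build_reverse_trie(SPACES)

def match_space_backwards(buf, end):
    """Return the byte length of a listed character ending at buf[:end],
    or 0 if the bytes before 'end' do not form one."""
    node = TRIE
    pos = end
    while pos > 0:
        node = node.get(buf[pos - 1])
        if node is None:
            return 0               # bail on the first non-matching byte
        pos -= 1
        if None in node:
            return end - pos       # a full character has been matched
    return 0

data = "text\u2003".encode("utf-8")
print(match_space_backwards(data, len(data)))
```

Only the bytes that belong to a candidate character are touched, and a mismatch stops the walk immediately, which is the efficiency gain the commit message describes over hopping back and then matching forwards.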
Karl Williamson [Sun, 6 Mar 2022 20:42:48 +0000 (13:42 -0700)]
Mark regex sets feature as accepted
It is no longer experimental.
Karl Williamson [Sun, 6 Mar 2022 20:10:12 +0000 (13:10 -0700)]
Remove use of experimental regex sets warnings
These warnings are no longer generated; so simplify the core by not
trying to turn them off.
The warning is preserved so that other code need not change, but this
commit also turns the default generation of it off.