perl5.git.perl.org Git - perl5.git/log

This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5

commit | commitdiff | tree

Karl Williamson [Thu, 24 Mar 2022 02:21:01 +0000 (20:21 -0600)]

Clarify \p{Decomposition_Type=NonCanonical}

This closes #18458

commit | commitdiff | tree

Karl Williamson [Wed, 30 Mar 2022 03:42:53 +0000 (21:42 -0600)]

ExtUtils::ParseXS/t/002-more.t: Fix skip count

commit | commitdiff | tree

Karl Williamson [Mon, 28 Mar 2022 19:34:32 +0000 (13:34 -0600)]

toke.c: White-space, comments

commit | commitdiff | tree

Karl Williamson [Mon, 28 Mar 2022 19:29:17 +0000 (13:29 -0600)]

t/loc_tools.pl: Skip locale tests on z/OS threaded

setlocale() is a no-op on this system after the first thread is created,
which makes it an outlier among platforms. The tests assume otherwise, and
hence would fail.

commit | commitdiff | tree

Karl Williamson [Mon, 28 Mar 2022 19:25:10 +0000 (13:25 -0600)]

perllocale: Add note about z/OS special behavior

commit | commitdiff | tree

Karl Williamson [Mon, 28 Mar 2022 19:24:12 +0000 (13:24 -0600)]

perllocale: Formatting, grammar

commit | commitdiff | tree

Ricardo Signes [Mon, 28 Mar 2022 19:16:30 +0000 (15:16 -0400)]

release schedule: add the next two people

commit | commitdiff | tree

James E Keenan [Sun, 27 Mar 2022 19:52:44 +0000 (19:52 +0000)]

Bump $VERSION in perl5db.pl

commit | commitdiff | tree

Karl Williamson [Thu, 24 Mar 2022 12:52:51 +0000 (06:52 -0600)]

APItest/t/sv_streq.t: Generalize for EBCDIC

This test fails on EBCDIC systems, because it wants a non-ASCII
character, and the one it chose, E9, is ASCII on EBCDIC ('Z').

perlhacktips suggests B6 as a character to use in such tests, and this
commit changes to use that.
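
The character-set problem described above can be sketched in Python. This is only an illustration: it uses Python's `cp037` codec as a stand-in for an EBCDIC code page (an assumption; Perl's EBCDIC ports may use other code pages), and the helper name `is_ascii_char` is hypothetical.

```python
def is_ascii_char(byte_value, codec):
    """Decode one byte with the given codec and report whether the
    resulting character falls in the ASCII range."""
    ch = bytes([byte_value]).decode(codec)
    return ord(ch) < 128

# 0xE9 is a non-ASCII character in Latin-1, but under cp037 it decodes
# to the ASCII letter 'Z', so it is a poor choice for a portable test.
e9_latin1 = is_ascii_char(0xE9, "latin-1")
e9_ebcdic = is_ascii_char(0xE9, "cp037")

# 0xB6 decodes to a non-ASCII character under both codecs.
b6_latin1 = is_ascii_char(0xB6, "latin-1")
b6_ebcdic = is_ascii_char(0xB6, "cp037")
```

A test that needs a non-ASCII byte on every platform therefore wants a value like B6, for which both checks come out false.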

commit | commitdiff | tree

Karl Williamson [Mon, 21 Mar 2022 20:10:16 +0000 (14:10 -0600)]

Perl5DB: Rmv ASCII dependency

commit | commitdiff | tree

Karl Williamson [Sat, 26 Mar 2022 17:23:04 +0000 (11:23 -0600)]

Devel::Peek::Peek.t: Simplify EBCDIC handling

The prior commit shows what can happen when two branches do the same
thing: they can get out of sync.

Since this test file was originally written, the testing infrastructure
has improved so that there are functions that handle the gory details of
character set differences for you. This test file hadn't been updated
since it wasn't causing a problem, until now.

This commit changes to use the new infrastructure, and as a result one
branch gets removed from each of the two tests that varied depending on
character set.

commit | commitdiff | tree

Karl Williamson [Sat, 26 Mar 2022 12:50:28 +0000 (06:50 -0600)]

Devel::Peek::Peek.t: Add missing '\' for EBCDIC

This file was recently changed, and the EBCDIC side of the change had a
typo.

commit | commitdiff | tree

Hugo van der Sanden [Wed, 23 Mar 2022 13:19:00 +0000 (13:19 +0000)]

gh19557: restore match_end on early bailout

After 271c3af797, early bailout from the inner one of a pair of nested
lookbehinds would leave the desired match_end pointing at the wrong
place, so the outer lookbehind could give the wrong answer.

commit | commitdiff | tree

Karl Williamson [Thu, 24 Mar 2022 20:31:44 +0000 (14:31 -0600)]

toke.c: Add missing ptr update

scan_str() calls s=skipspace(s).  It turns out that this function can
actually change the buffer 's' is pointing to, so that the original
'start' passed in to the function is obsolete.  Just update it.  This is
very much like the paradigm already in S_force_word().

This bug previously existed, but commit
32b87797e986f5d99836e16ea6b9d9ff5a56d3be increased the frequency of
occurrence from close to non-existent to relatively often.  It only
happened when the string being delimited had some spaces before it, and
only if the buffer got moved.  This depends on the position of the
construct in the file, and on the buffering of the reading of that
file, hence the symptoms had it occurring much more often using stdio
than PerlIO. (It could just as well have been the reverse, I suppose.)

The mentioned commit collapsed two different loops; one of which didn't
bother with a check it should have been doing.  Without that check, the
likelihood of this being triggered was much less.  (But illegal input
would get by.)

There is a nuance here, which resulted in the need for this commit to
also update the test file, from having two occurrences of an error on a
single line to just one.  This is because, if the buffer moves, we reset
'start' to 's'.  This makes 's' appear to be at the left edge of the
input when it really is just at the left edge of the buffer.  The test
that failed used a combining character (I'll call it 'cc' for short)
after a space, to check that the code accurately catches the illegality
that you can't delimit a string with a character that doesn't stand on
its own, such as a cc.  However when such a character comes at the
beginning of the input, there's nothing for it to combine with, and
Unicode says that is legal, so we do too.  So this moving 'start' makes
something that is illegal look to be legal.  I don't think this is a
problem because the code looks up the cc and discovers there is no
mirror for it, so it must also be the terminator for the string.  If
this cc is just from a single typo in the input, there won't be a
matching terminator, and the compilation will abort.  If the program
intended to use a cc as both fore and aft of a string, the terminating
occurrence of this cc will also be checked for validity, and it will
almost certainly be seen to be an illegal cc in this context, so again
the compilation will fail.  That is indeed what is happening in
t/lib/warnings/toke.  If the buffering were such that the terminating cc
also began a new buffer, it again would be viewed as at the edge and the
string would be parsed as being ok, when it really shouldn't have been.
Should this happen, I don't see a real problem.  An attacker could craft
a string with the precise length to make this happen, but to do so they
would have to control the source code, and the war is already lost.
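
The stale-pointer hazard can be modelled with a toy buffer in Python. This is not the toke.c code: `LineBuffer`, `scan_start`, and the `generation` counter are hypothetical names, and offsets stand in for C pointers. It only shows why 'start' must be refreshed when skipping whitespace replaces the buffer.

```python
class LineBuffer:
    """Toy input buffer whose contents are replaced when it runs dry."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.data = self.chunks.pop(0)
        self.generation = 0  # bumped whenever the buffer is replaced

    def skipspace(self, pos):
        """Skip spaces from pos, refilling as needed; return the new offset."""
        while True:
            while pos < len(self.data) and self.data[pos] == " ":
                pos += 1
            if pos < len(self.data) or not self.chunks:
                return pos
            # Ran off the end: old offsets into the buffer are now stale.
            self.data = self.chunks.pop(0)
            self.generation += 1
            pos = 0


def scan_start(buf, start):
    """Return (start, s), resetting start if the buffer moved."""
    before = buf.generation
    s = buf.skipspace(start)
    if buf.generation != before:
        start = s  # the old 'start' referred to a buffer that is gone
    return start, s


buf = LineBuffer(["q   ", "  {abc}"])
start, s = scan_start(buf, 1)
```

After the refill, both offsets refer to the opening delimiter in the new buffer. This is also why 's' then appears to sit at the left edge of the input, as the message above discusses.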

commit | commitdiff | tree

James E Keenan [Thu, 24 Mar 2022 12:46:56 +0000 (12:46 +0000)]

Correct POD formatting error

commit | commitdiff | tree

Karl Williamson [Thu, 24 Mar 2022 03:33:54 +0000 (21:33 -0600)]

perlunicode: regex sets is no longer experimental

commit | commitdiff | tree

Karl Williamson [Wed, 23 Mar 2022 19:26:26 +0000 (13:26 -0600)]

re/anyof.t: Add debugging info

This is in response to
https://github.com/Perl/perl5/pull/19558#issuecomment-1076659884

commit | commitdiff | tree

Karl Williamson [Wed, 23 Mar 2022 19:31:38 +0000 (13:31 -0600)]

Fix double encoding of UTF-8 on EBCDIC

Commit d1e771d8c533168553df9b2a858d967f707fc9fe broke EBCDIC builds by
doubly encoding some UTF-8 characters.

commit | commitdiff | tree

Hugo van der Sanden [Mon, 21 Mar 2022 21:55:01 +0000 (21:55 +0000)]

gh17746: add missing check on hardcount

Failing to check for max iterations caused an assertion failure.

commit | commitdiff | tree

Graham Knop [Mon, 21 Mar 2022 16:48:51 +0000 (17:48 +0100)]

fix typo in perl53510delta

commit | commitdiff | tree

James E Keenan [Mon, 21 Mar 2022 13:37:35 +0000 (13:37 +0000)]

Correct POD formatting error

Use 'F<>' for strings that are simply filenames.

As reported by Tux on #p5p.

commit | commitdiff | tree

Sawyer X [Mon, 21 Mar 2022 10:32:29 +0000 (11:32 +0100)]

Update RMG on updateAUTHORS.pl

commit | commitdiff | tree

Sawyer X [Mon, 21 Mar 2022 10:21:08 +0000 (11:21 +0100)]

Update Module::CoreList for 5.35.11

commit | commitdiff | tree

Richard Leach [Sun, 20 Mar 2022 19:05:10 +0000 (19:05 +0000)]

Perl_newSViv: simplify by using (inline) newSV_type

commit | commitdiff | tree

Richard Leach [Sun, 23 May 2021 23:33:36 +0000 (00:33 +0100)]

Perl_newSVnv: simplify SV creation and SvNV_set

The function can be simplified by using the now-inlined newSV_type
function, directly using SvNV_set, and twiddling the required flags.

This cuts out any function call overhead, a switch statement leading
to a sv_upgrade(sv, SVt_NV) call, and a touch more bit-twiddling than
is necessary.

commit | commitdiff | tree

Tony Cook [Mon, 11 Jul 2016 00:52:20 +0000 (10:52 +1000)]

(perl #128245) make it obvious :encoding uses PerlIO::encoding

rather than encoding.pm

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 18:34:19 +0000 (19:34 +0100)]

Bump version for 5.35.11

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 16:46:29 +0000 (17:46 +0100)]

New perldelta for 5.35.11

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 16:40:47 +0000 (17:40 +0100)]

Tick release, update epigraph

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 10:35:36 +0000 (11:35 +0100)]

Add new release to perlhist

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 10:09:28 +0000 (11:09 +0100)]

Finalize perldelta

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 09:44:55 +0000 (10:44 +0100)]

Update Module::CoreList for 5.35.10

commit | commitdiff | tree

Sawyer X [Sun, 20 Mar 2022 09:36:13 +0000 (10:36 +0100)]

Update AUTHORS file

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:38:55 +0000 (17:38 -0600)]

Add arrows to paired string delimiters

Unicode has lots of arrows of various shapes, sizes, and directions.
None of them were of consequence to the Bidirectional algorithm, so none
were specified as being mirrored pairs. This commit uses the
generalizations already in place from previous commits to examine arrow
symbols and choose which are mirrored pairs.

As previously, it rejects arrows with contrary directionality, and ones
without horizontal directionality.
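
The two rejection rules can be sketched with Python's `unicodedata` module. This is an illustrative approximation, not the logic of unicode_constants.pl: the function name `arrow_mate` and the exact word lists are assumptions.

```python
import unicodedata

LEFT_WORDS = {"LEFT", "LEFTWARDS"}
RIGHT_WORDS = {"RIGHT", "RIGHTWARDS"}
SWAP = {"LEFT": "RIGHT", "RIGHT": "LEFT",
        "LEFTWARDS": "RIGHTWARDS", "RIGHTWARDS": "LEFTWARDS"}


def arrow_mate(ch):
    """Return the horizontally mirrored character, or None if rejected."""
    words = unicodedata.name(ch, "").split()
    has_left = any(w in LEFT_WORDS for w in words)
    has_right = any(w in RIGHT_WORDS for w in words)
    if has_left == has_right:
        # Either no horizontal direction at all, or contrary directions.
        return None
    mate_name = " ".join(SWAP.get(w, w) for w in words)
    try:
        return unicodedata.lookup(mate_name)
    except KeyError:
        return None


mate_of_left = arrow_mate("\u2190")   # LEFTWARDS ARROW
mate_of_up = arrow_mate("\u2191")     # UPWARDS ARROW
mate_of_both = arrow_mate("\u2194")   # LEFT RIGHT ARROW
```

Here the leftwards arrow pairs with the rightwards arrow, while the upwards and left-right arrows are rejected.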

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:37:32 +0000 (17:37 -0600)]

Add SPEAKERs paired delimiters

The characters with this name look good as mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:35:36 +0000 (17:35 -0600)]

Add TELEPHONE RECEIVER paired delimiters

The characters with this name look good as mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:33:08 +0000 (17:33 -0600)]

Add ERASE paired delimiters

The characters with this name look good as mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:30:47 +0000 (17:30 -0600)]

Add DOUBLE TRIANGLEs paired delimiters

The characters with this name look good as mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:28:31 +0000 (17:28 -0600)]

Add THREE RAYS paired delimiters

The characters with this name look good as mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:17:53 +0000 (17:17 -0600)]

Add musical score paired delimiters

The characters that signify the beginning and ending of Western music
scores serve as good delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 23:11:00 +0000 (17:11 -0600)]

Add INDEX paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters. The 'index' refers to the index finger
in a hand pointing at the delimited string.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 18:23:33 +0000 (12:23 -0600)]

Add TURNSTILE paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 18:23:12 +0000 (12:23 -0600)]

Add TACK paired delimiters

The bidi-aware characters containing this word are visually suitable for
being mirrored delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 17:55:54 +0000 (11:55 -0600)]

Directionality pres/abs-ence can mean paired delimiters

Another way Unicode indicates that a character has horizontal
directionality is by adding LEFT or RIGHT to the name of a base
character. Hence we get RIGHT SPEAKER vs just plain SPEAKER.

Presumably this comes about when they didn't consider directionality at
first, and then realized later it was needed.

This commit makes the script look for these kinds of character pairs.
Because the current Unicode version only has this characteristic for
Symbols, and symbols must be included explicitly, no change in what
gets paired ensues. But if you turn on the outputting of characters not
chosen, that list will now include things meeting this new criterion.
Fewer than a handful actually are like this.
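
A rough Python sketch of the "presence or absence of LEFT/RIGHT" idea follows. The helper name `directed_variants` is hypothetical and the real script's matching is more involved; this only looks up whether a base name also exists with a direction word prepended.

```python
import unicodedata


def directed_variants(base_name):
    """Map 'LEFT'/'RIGHT' to the character named '<direction> <base_name>',
    for each direction where such a character exists."""
    found = {}
    for direction in ("LEFT", "RIGHT"):
        try:
            found[direction] = unicodedata.lookup(f"{direction} {base_name}")
        except KeyError:
            pass  # no character with that name in this Unicode version
    return found


speaker = unicodedata.lookup("SPEAKER")
variants = directed_variants("SPEAKER")
```

With a reasonably recent Unicode database, `variants` contains a RIGHT entry, which together with plain SPEAKER forms a candidate pair of the kind described above.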

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 16:35:00 +0000 (10:35 -0600)]

unicode_constants.pl: Prepare for examining Symbols

Heretofore, the code looking for paired string delimiters has looked at
punctuation, and a few symbols that Unicode gives a mirror for.  But
there are many more suitable-for-pairing characters in Unicode.

This commit generalizes things so as to handle the extra complexities of
the way symbols are named beyond the punctuation names.   For example,
RIGHTWARDS is sometimes used; it turns out that it also is used in one
punctuation character, which was previously overlooked by this script.

The generalization introduced by this commit handles almost all current
Unicode symbols properly.

But some symbols are barely distinguishable from their mirrors, such as
a tilde and a reversed tilde.  The scheme adopted here, then, makes the
default for a symbol pair to not be marked as paired delimiters.  The
code explicitly has to specify that a given pair is to be included.

The next few commits are mostly for adding ones that I thought were
good.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 12:27:54 +0000 (06:27 -0600)]

Add 'ELEMENT OF'/CONTAINS to paired string delimiters

This commit adds 8 pairs of symbols that are variants on ELEMENT OF.
These make nice paired delimiters in the vein of < >.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 12:20:09 +0000 (06:20 -0600)]

Add SUBSET/SUPERSET to paired string delimiters

This commit adds 20 pairs of symbols that are variants on SUBSET.
These make nice paired delimiters in the vein of < >.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 12:14:46 +0000 (06:14 -0600)]

Add PRECEDES/SUCCEEDS to paired string delimiters

This commit adds 15 pairs of symbols that are variants on PRECEDES.
These look a lot like <>, so it makes sense to make them paired delimiters.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 12:09:24 +0000 (06:09 -0600)]

Add SMALLER THAN to paired string delimiters

This commit adds 2 pairs of symbols that are variants on SMALLER THAN.
These look a lot like <>, so it makes sense to make them paired delimiters.

commit | commitdiff | tree

Karl Williamson [Wed, 9 Mar 2022 20:13:02 +0000 (13:13 -0700)]

unicode_constants.pl: Consider all \pP for delims

Previously, only the punctuation characters that Unicode had classed as
being opening/closing were considered in looking for suitable paired
delimiters.

This commit looks at all punctuation characters. There are actually
only 7 new pairs found.

This gives us ꧁ ꧂ as string delimiters, if your font allows,
which are Javanese and used to surround an honorific title, according to
Wikipedia.
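
The difference between "opening/closing punctuation" and "all punctuation" can be seen through Unicode general categories. The Python sketch below assumes that "classed as opening/closing" corresponds to the Ps and Pe categories (an assumption about the script's earlier behavior); the helper names are hypothetical.

```python
import unicodedata


def is_open_or_close(ch):
    """True for characters in the Ps (open) or Pe (close) categories."""
    return unicodedata.category(ch) in ("Ps", "Pe")


def is_punctuation(ch):
    """True for any character matched by \\pP (category starts with 'P')."""
    return unicodedata.category(ch).startswith("P")


left_rerenggan = "\uA9C1"   # JAVANESE LEFT RERENGGAN
right_rerenggan = "\uA9C2"  # JAVANESE RIGHT RERENGGAN

paren_is_open_close = is_open_or_close("(")
rerenggan_is_open_close = is_open_or_close(left_rerenggan)
rerenggan_is_punct = is_punctuation(left_rerenggan)
```

The Javanese pair is punctuation but not in an opening or closing category, so only the wider \pP scan can find it.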

commit | commitdiff | tree

Karl Williamson [Thu, 10 Mar 2022 16:40:10 +0000 (09:40 -0700)]

Add < > variants to paired delimiters

Perl considers '< >' to be delimiters for strings; this commit adds
most of the Unicode variants of these to also be string delimiters. The
ones that are combinations of both < and > aren't included, as that
would be visually confusing.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 03:11:07 +0000 (21:11 -0600)]

unicode_constants.pl: Add REVERSED punctuation

Besides LEFT/RIGHT, horizontal directionality can be specified by
Unicode in names by the presence or absence of REVERSED.

Enhancing the algorithm to take this into account adds 2 pairs of
mirrored delimiters that were previously overlooked.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 02:03:01 +0000 (20:03 -0600)]

unicode_constants.pl: Output why chars not chosen

This script now examines all punctuation characters to see if there is a
mirrored character for it, suitable for use as a Perl string delimiter.
Some don't qualify, and some do qualify but the script doesn't catch
them.

This commit adds the ability to output which characters it doesn't think
qualify, and why. This enables a maintainer to easily check and know
what its deficiencies are, or that there is a good reason that a
particular character gets rejected.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Mar 2022 01:11:08 +0000 (19:11 -0600)]

unicode_constants.pl: Refactor to catch more paired delims

Previously, only characters that Unicode included in its bidirectional
algorithm have been eligible to be found by this program to be mirrored
string delimiters.

This commit adds 5 quotation mark character pairs that
are omitted from the bidirectional algorithm, as most quotes are,
because, as the Standard says, their "directionality and pairing status
is less predictable than paired brackets."

But we're not particularly interested in those semantics; most string
delimiters will be selected only for their visual appearance.

Because they aren't in the bidi algorithm, there is no property that
maps one member of a pair to its mate. However, two characters whose
names pair only by LEFT vs RIGHT are almost certainly a mirrored pair.
This doesn't catch all possibilities; future commits will expand the
ones caught.

The commit refactors things so as to make future commits easier which
look at even more delimiter possibilities.
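
The LEFT-versus-RIGHT name heuristic can be approximated in Python with the `unicodedata` module. This is a sketch of the idea only, not the code in unicode_constants.pl, and `left_right_mate` is a hypothetical name.

```python
import unicodedata

SWAP = {"LEFT": "RIGHT", "RIGHT": "LEFT"}


def left_right_mate(ch):
    """Return the character whose name differs only by LEFT vs RIGHT,
    or None if there is no such character."""
    words = unicodedata.name(ch, "").split()
    if not any(w in SWAP for w in words):
        return None
    mate_name = " ".join(SWAP.get(w, w) for w in words)
    try:
        return unicodedata.lookup(mate_name)
    except KeyError:
        return None


opening = "\u201C"  # LEFT DOUBLE QUOTATION MARK
closing = left_right_mate(opening)
in_bidi_mirroring = bool(unicodedata.mirrored(opening))
```

The quotation mark is not flagged as mirrored by the bidirectional property, yet the name swap still finds its mate.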

commit | commitdiff | tree

Karl Williamson [Wed, 9 Mar 2022 14:25:46 +0000 (07:25 -0700)]

Allow reversal of some paired delimiters; deprecations

Unicode says certain opening punctuation characters may be used as
closing ones in some languages; and their mirror is instead the opening
one.

This commit changes to allow either one of each such set to be the
opening one.

It also deprecates using any of the new mirrored delimiters outside the
feature as an unmirrored delimiter, and using a normal closing delimiter
as an unpaired opening one while in the feature. This gives us the
freedom to make some or all of the new paired delimiters be reversible.

commit | commitdiff | tree

Karl Williamson [Mon, 14 Feb 2022 22:25:40 +0000 (15:25 -0700)]

Add 'extra paired delimiters' feature

When this feature is enabled, one can use many more string delimiters
that have an opening version and a mirrored closing one.

commit | commitdiff | tree

Karl Williamson [Tue, 8 Mar 2022 14:31:09 +0000 (07:31 -0700)]

regen/unicode_constants.pl: List paired delimiters

This adds the capability to temporarily change a scalar to true to cause
this to print on stderr a list of the paired string delimiters, suitable
for pasting into a pod.

commit | commitdiff | tree

Karl Williamson [Mon, 14 Feb 2022 04:08:22 +0000 (21:08 -0700)]

unicode_constants.pl: Generate paired string delimiters

This commit causes several C strings to be generated containing bytes
that match paired string delimiters beyond the four that have
traditionally been used in Perl. This will allow a future commit to
accept more matching delimiters around strings than those four.

The code explains how the added delimiters are chosen.

commit | commitdiff | tree

Karl Williamson [Mon, 14 Feb 2022 02:23:50 +0000 (19:23 -0700)]

regen/unicode_constants.pl: Extract code into a fcn

This is in preparation for it to be used in multiple places in a future
commit.

commit | commitdiff | tree

Karl Williamson [Mon, 14 Feb 2022 02:03:54 +0000 (19:03 -0700)]

regen/unicode_constants.pl: White space only

Align the output of this bit vertically with surrounding output.

commit | commitdiff | tree

Karl Williamson [Sun, 13 Feb 2022 04:47:13 +0000 (21:47 -0700)]

toke.c: merge loops, multi-byte delim

The code in toke.c had two closely related loops; one for unmirrored
delimiters (same on both ends of the string) that could take UTF-8
delimiters, and one that allowed a mirrored closing delimiter, which
could take only a single byte. I found that it was just easiest to
collapse these into one loop in preparation to allow multi-byte
mirroring.

commit | commitdiff | tree

Karl Williamson [Sun, 13 Feb 2022 01:11:27 +0000 (18:11 -0700)]

toke.c: white-space only

This outdents some code that the next commit will remove an enclosing
block from.

commit | commitdiff | tree

Karl Williamson [Sat, 12 Feb 2022 23:13:29 +0000 (16:13 -0700)]

toke.c: Rmv unnecessary conditionals

Cheaper to just redo a simple assignment than to test if you've already
done it and skip it if you have.

commit | commitdiff | tree

Karl Williamson [Sat, 12 Feb 2022 21:29:14 +0000 (14:29 -0700)]

toke.c: Rename some variables; terminology in comment

Previously in places, things were called variously delimiter and
terminator, or variants thereof. This makes things consistent, and
clearer.

commit | commitdiff | tree

Karl Williamson [Sat, 12 Feb 2022 19:48:12 +0000 (12:48 -0700)]

toke.c: Split a variable into two for clarity

It can have two different meanings; split and rename to clarify what
meaning it refers to.

commit | commitdiff | tree

Karl Williamson [Sun, 20 Feb 2022 17:26:24 +0000 (10:26 -0700)]

toke.c: Rmv unnecessary SV

This code created an SV simply to get SVfARG() to print it out.  But
nowadays there is UTF8fARG which does the right thing on strings, so the
SV is unnecessary.

The code also has a fix for the case where non-ASCII but Latin1
delimiters, when 'use utf8' is in effect, would potentially come out as
garbage: this commit also adds a clause in a conditional to prevent that.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Feb 2022 02:28:53 +0000 (19:28 -0700)]

Add builtin::trim()

Most of this code came from Paul Evans and Scott Chief Baker.

commit | commitdiff | tree

Karl Williamson [Thu, 17 Feb 2022 00:40:58 +0000 (17:40 -0700)]

Add is_XPERLSPACE_utf8_safe_backwards()

This macro starts from the right side and matches UTF-8 white space
characters.

commit | commitdiff | tree

Karl Williamson [Sun, 6 Jun 2021 19:45:05 +0000 (13:45 -0600)]

regen/regcharclass.pl: Add backwards UTF-8 tries

This adds the ability to generate a trie macro that starts at the right
end of a string and backs up one matching byte at a time until a full
character is matched; bailing immediately if a non-matching byte is
found.

Previously, the way to accomplish this was to call the function to hop
back (which looked at the string byte by byte backwards until it found a
non-continuation byte), and then look forwards for matching bytes.

This new way is more efficient, as only the necessary bytes are
examined.
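
The backwards-matching approach can be sketched in Python as a trie keyed on UTF-8 bytes in reverse order. This is a simplified model, not the generated C macro: the function names and the small set of whitespace characters are assumptions chosen for illustration.

```python
def build_reverse_trie(chars):
    """Build a trie over the UTF-8 bytes of each character, last byte first."""
    root = {}
    for ch in chars:
        node = root
        for byte in reversed(ch.encode("utf-8")):
            node = node.setdefault(byte, {})
        node["end"] = True  # a complete character ends (begins) here
    return root


def match_length_backwards(buf, end, trie):
    """Return the byte length of a trie character ending at buf[:end], or 0."""
    node = trie
    pos = end
    while pos > 0:
        node = node.get(buf[pos - 1])
        if node is None:
            return 0  # bail immediately on a non-matching byte
        pos -= 1
        if "end" in node:
            return end - pos
    return 0


# A few whitespace characters: space, tab, newline, NBSP, EM SPACE.
SPACES = build_reverse_trie(" \t\n\u00a0\u2003")

data = "abc\u2003".encode("utf-8")
trailing = match_length_backwards(data, len(data), SPACES)
```

Only the bytes belonging to the candidate character are inspected, and no separate hop back to the start of the character is needed.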

commit | commitdiff | tree

Karl Williamson [Sun, 6 Mar 2022 20:42:48 +0000 (13:42 -0700)]

Mark regex sets feature as accepted

It is no longer experimental.

commit | commitdiff | tree

Karl Williamson [Sun, 6 Mar 2022 20:10:12 +0000 (13:10 -0700)]

Remove use of experimental regex sets warnings

These warnings are no longer generated; so simplify the core by not
trying to turn them off.

The warning is preserved so that other code need not change, but this
commit also turns the default generation of it off.

commit | commitdiff | tree

Karl Williamson [Sun, 6 Mar 2022 19:36:07 +0000 (12:36 -0700)]

Stop emitting the experimental::regex_sets warning

This is in preparation for it becoming non-experimental.

commit | commitdiff | tree

Sawyer X [Sat, 19 Mar 2022 12:57:55 +0000 (13:57 +0100)]

Update Devel::PPPort to 3.68

commit | commitdiff | tree

Sawyer X [Fri, 18 Mar 2022 19:12:10 +0000 (20:12 +0100)]

Fix Atoomic's entries in AUTHORS and .mailmap

commit | commitdiff | tree

Sawyer X [Fri, 18 Mar 2022 18:54:38 +0000 (19:54 +0100)]

Update Scalar-List-Utils from 1.61 to 1.62

commit | commitdiff | tree

Nicholas Clark [Thu, 21 Oct 2021 18:57:50 +0000 (18:57 +0000)]

perldelta entry for the new key behaviour for large hashes

Note that large hashes (that are neither objects nor symbol tables) no
longer use the shared string table, and what the performance implications
might be.

This commit and the related code commits incorporate several improvements
suggested by Hugo during review.

commit | commitdiff | tree

Nicholas Clark [Tue, 19 Oct 2021 10:51:29 +0000 (10:51 +0000)]

Heuristically turn off shared hash keys for larger hashes

The assumption is that large hashes (that are not objects or symbol tables)
have keys that are not repeated in other hashes, hence (also) storing those
keys in the shared string table is creating work without real benefit.

commit | commitdiff | tree

Nicholas Clark [Tue, 19 Oct 2021 09:16:03 +0000 (09:16 +0000)]

Explicitly clear the HVhek_NOTSHARED bit on entry to hv_common

Some callers to hv_common() pass the flags value from an existing HEK, and
if that HEK is not shared, then it has the relevant flag bit set, which
must not be passed into share_hek_flags().

There is an assertion that catches this in share_hek_flags() if assertions
are enabled.

Remove the analogous assertion in save_hek_flags() - to comply with this
assertion he_dup() and new_HVhv() would need to be changed to clear the
flag bit before every call, only for share_hek_flags() to add it right back.
This feels like makework.

commit | commitdiff | tree

Nicholas Clark [Tue, 19 Oct 2021 08:50:29 +0000 (08:50 +0000)]

Eliminate "masked_flags" from functions in hv.c

This was confusing because there are (at least) 3 types of masking needed

*) the bits that we record (HVhek_UTF8 and HVhek_WASUTF8)
*) the bit that flags storage type (HVhek_NOTSHARED)
*) the bit that triggers key freeing (HVhek_FREEKEY)

and at different times we need to mask out different things.

So eliminate the ambiguous term "mask", and instead explicitly test or mask
the bits we need.
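
The three kinds of masking can be illustrated with explicit per-purpose tests, sketched here in Python. Only the NOTSHARED value (0x04) is stated in this log; the other bit values and all helper names are assumptions made for the example.

```python
from enum import IntFlag


class HekFlag(IntFlag):
    UTF8 = 0x01       # assumed value
    WASUTF8 = 0x02    # assumed value
    NOTSHARED = 0x04  # value given in the HVhek_NOTSHARED rename commit
    FREEKEY = 0x100   # assumed value


def recorded_bits(flags):
    """Keep only the bits that get recorded with the key."""
    return flags & (HekFlag.UTF8 | HekFlag.WASUTF8)


def is_shared(flags):
    """The storage-type bit: set means the key is not shared."""
    return not (flags & HekFlag.NOTSHARED)


def must_free_key(flags):
    """The bit that triggers freeing of the key buffer."""
    return bool(flags & HekFlag.FREEKEY)


flags = HekFlag.UTF8 | HekFlag.NOTSHARED | HekFlag.FREEKEY
```

Each helper names the bits it cares about, so a reader never has to guess which of the three maskings a generic "masked_flags" variable was meant to hold.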

commit | commitdiff | tree

Nicholas Clark [Mon, 18 Oct 2021 19:56:09 +0000 (19:56 +0000)]

Rename HVhek_UNSHARED to HVhek_NOTSHARED

"HVhek_UNSHARED" marked unshared HEKs - allocated directly with malloc(),
rather than from the shared string table, and released with free().

But *shared* HEKs (in the shared string table) are released by calling
unshare_hek(), whilst unshared HEKs should never go near this.

So rename them to "not shared", to avoid this confusion. Change their flag
bit from 0x08 to 0x04 to remove a gap. 0x04 had previously been used to
flag "REHASH", which was removed before v5.18.0.

Move the definition of the macro HVhek_MASK from hv.h to hv.c.

commit | commitdiff | tree

Nicholas Clark [Thu, 21 Oct 2021 18:53:01 +0000 (18:53 +0000)]

Drop the unused hv argument from S_hv_free_ent_ret()

In turn, this means that the hv argument to Perl_hv_free_ent() and
Perl_hv_delayfree_ent() is now clearly unused, so mark it as such. Both
functions are deemed to be API, so unlike the static function
S_hv_free_ent_ret we can't simply change their parameters.

However, change all the internal callers to pass NULL instead of the hv, as
this makes it obvious that the function does not read hv, and might cause
the compiler to generate better code.

commit | commitdiff | tree

Nicholas Clark [Sun, 17 Oct 2021 18:52:28 +0000 (18:52 +0000)]

Use each HEK's own flags to decide "shared or not", instead of the HV's

Previously it was assumed that a hash with HvSHAREKEYS() true could only
contain shared HEKs, and a hash with it false always contained only unshared
HEKs. As HEKs all contain a flag bit to indicate "shared or not", instead
use that to take decisions on how to dup or free them.

commit | commitdiff | tree

Daniel Laügt [Fri, 18 Mar 2022 16:21:24 +0000 (17:21 +0100)]

Pass usequadmath to config_sh.PL

commit | commitdiff | tree

Dagfinn Ilmari Mannsåker [Tue, 15 Mar 2022 18:49:17 +0000 (18:49 +0000)]

Fix inttypes.h reference in perlhacktips

commit | commitdiff | tree

Graham Knop [Tue, 15 Mar 2022 14:19:03 +0000 (15:19 +0100)]

always prevent setting POK flag when NV values are used as strings

Since PR #18958, values that start as IVs will not get their POK flags
set when they are used as strings. This is meant to aid in
serialization, allowing the "original" type of a value to be preserved.

For NV values, the POK flag was already usually not being set, because
the string form of a float could change based on the locale changing.
However, for Inf and NaN values, the POK flag would still be enabled.
Also, POK would be set for all floats if USE_LOCALE_NUMERIC was not
defined.

Update Perl_sv_2pv_flags to only enable the POKp flag when storing the
PV for Inf or NaN values, or all NVs when USE_LOCALE_NUMERIC is not
defined.

commit | commitdiff | tree

Paul "LeoNerd" Evans [Wed, 9 Mar 2022 11:43:42 +0000 (11:43 +0000)]

Add mention of new `builtin::indexed` to perldelta

commit | commitdiff | tree

Paul "LeoNerd" Evans [Wed, 9 Mar 2022 11:31:43 +0000 (11:31 +0000)]

No need to document 'useless use of sort in scalar context' separately now there's a general category for it

commit | commitdiff | tree

Paul "LeoNerd" Evans [Mon, 14 Mar 2022 11:13:15 +0000 (11:13 +0000)]

An initial implementation of builtin::indexed

* Implementation, unit tests, documentation

commit | commitdiff | tree

Paul "LeoNerd" Evans [Wed, 9 Mar 2022 11:05:33 +0000 (11:05 +0000)]

Add missing builtin diagnostic to perldiag.pod; rename 'compiletime' to 'compile time' for consistency

commit | commitdiff | tree

Paul "LeoNerd" Evans [Wed, 9 Mar 2022 11:02:12 +0000 (11:02 +0000)]

t/porting/diag.t: Don't skip the bodies of XS functions in builtin.t

commit | commitdiff | tree

Yves Orton [Sun, 13 Mar 2022 13:35:54 +0000 (14:35 +0100)]

hv.c: remove dead function ptr_hash()

ptr_hash() was made obsolete by c3c9d6b15f57fdce79988a553671a1ceb54c0f10

It is a static function only used or exposed in hv.c and is no longer
used; this commit deletes it.

No tests required, it is not used.

commit | commitdiff | tree

Yves Orton [Sat, 12 Mar 2022 01:45:09 +0000 (02:45 +0100)]

Revert "Fix GH Issue #19472: read warnings from open($fh,">",\(my $x))"

This reverts commit 8b03aeb95ab72abdb2fa40f2d1196ce42f34708d.

This is causing BBC breakage, and it's unimportant and grey zone enough that
we can pick it up in 5.37 when we have more time to deal with it.

commit | commitdiff | tree

Richard Leach [Sun, 13 Mar 2022 18:48:12 +0000 (18:48 +0000)]

perl.c: www.perl.org uses https

commit | commitdiff | tree

Leon Timmermans [Fri, 11 Mar 2022 15:04:52 +0000 (16:04 +0100)]

Delete Porting/check83.pl

We stopped caring about 8.3 compliance at least 15 years ago, and
currently ship hundreds of files that aren't compliant with such a
filesystem. This script doesn't serve any purpose anymore.

commit | commitdiff | tree

Steve Hay [Sun, 13 Mar 2022 09:15:09 +0000 (09:15 +0000)]

Import perl5341delta.pod

commit | commitdiff | tree

Steve Hay [Sun, 13 Mar 2022 09:00:28 +0000 (09:00 +0000)]

Update Module-CoreList with data for 5.34.1

commit | commitdiff | tree

Steve Hay [Sat, 12 Mar 2022 17:35:46 +0000 (17:35 +0000)]

Fill in date for 5.34.1

(cherry picked from commit 1181cf72936404b890ef41362f533fe2c2626f9b)

commit | commitdiff | tree

Steve Hay [Sun, 13 Mar 2022 08:49:19 +0000 (08:49 +0000)]

Tick off release