This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5

commit | commitdiff | tree

Tony Cook [Tue, 30 Jan 2018 23:46:49 +0000 (10:46 +1100)]

bump $Devel::PPPort::VERSION to 3.39

commit | commitdiff | tree

Pali [Fri, 26 Jan 2018 18:39:49 +0000 (19:39 +0100)]

Devel::PPPort: Use croak_nocontext() instead of croak() when dTHX is not declared

commit | commitdiff | tree

Pali [Fri, 26 Jan 2018 18:39:14 +0000 (19:39 +0100)]

Devel::PPPort: Declare dTHX in croak_xs_usage()

CvGV() takes aTHX_ as first argument.

commit | commitdiff | tree

Tony Cook [Thu, 25 Jan 2018 03:39:54 +0000 (14:39 +1100)]

(perl #132761) croak_xs_usage() shouldn't accept a THX argument

commit | commitdiff | tree

Pali [Tue, 23 Jan 2018 22:36:06 +0000 (23:36 +0100)]

Devel::PPPort: Do not run tests which use \N{U+XX} on Perl 5.12.0

Perl 5.12.0 has a bug when parsing the \N{U+XX} syntax and throws the error:
Invalid hexadecimal number in \N{U+...} in regex.

commit | commitdiff | tree

Pali [Tue, 23 Jan 2018 22:00:55 +0000 (23:00 +0100)]

Devel::PPPort: Do not define PERL_MAGIC_qr more than once

make regen shows the warning: magic: PERL_MAGIC_qr already provided by misc

Remove it from misc, but because misc depends on it, put magic before misc.

commit | commitdiff | tree

Pali [Tue, 23 Jan 2018 21:50:20 +0000 (22:50 +0100)]

Devel::PPPort: Do not mask Perl_warn_nocontext and Perl_croak_nocontext

This causes compile errors on older threaded Perl versions.

commit | commitdiff | tree

Karl Williamson [Mon, 22 Jan 2018 19:55:31 +0000 (12:55 -0700)]

Use dfa to speed up translating UTF-8 into code point

This dfa is available on the internet and has the reputation of being the
fastest general translator.  This commit changes to use it at the
beginning of our translator, modifying it slightly to accept surrogates
and all 4-byte Perl-extended code points.  If necessary, it drops down into
our translator to handle errors, warnings, and Perl-extended code points.
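
As a rough illustration of the table-driven approach described above, here is a
minimal sketch of a DFA-style UTF-8 decoder.  It is NOT the dfa this commit
adopts: the state names, the byte classes, and the function decode_one() are
all hypothetical and written for this example only.  Like the modified dfa it
accepts surrogates, but it handles nothing above U+10FFFF and simply reports
failure where the real translator would drop down to its slower error path.

```c
#include <stddef.h>

/* Hypothetical states and byte classes, for illustration only. */
enum { ACCEPT, REJECT, NEED1, NEED2, NEED3,
       AFTER_E0, AFTER_F0, AFTER_F4, N_STATES };
enum { ASCII, CONT_LO, CONT_MID, CONT_HI, LEAD2, LEAD_E0,
       LEAD3, LEAD_F0, LEAD4, LEAD_F4, BAD, N_CLASSES };

/* Classify a byte; a real dfa would use a 256-entry table here. */
static int byte_class(unsigned char b)
{
    if (b < 0x80) return ASCII;
    if (b < 0x90) return CONT_LO;    /* 80..8F */
    if (b < 0xA0) return CONT_MID;   /* 90..9F */
    if (b < 0xC0) return CONT_HI;    /* A0..BF */
    if (b < 0xC2) return BAD;        /* C0, C1 are always overlong */
    if (b < 0xE0) return LEAD2;
    if (b == 0xE0) return LEAD_E0;
    if (b < 0xF0) return LEAD3;      /* includes ED, so surrogates pass */
    if (b == 0xF0) return LEAD_F0;
    if (b < 0xF4) return LEAD4;
    if (b == 0xF4) return LEAD_F4;
    return BAD;                      /* F5..FF */
}

#define R REJECT
static const unsigned char transition[N_STATES][N_CLASSES] = {
    /*              ASCII   C_LO   C_MID  C_HI   LEAD2  E0        LEAD3  F0        LEAD4  F4        BAD */
    /* ACCEPT   */ {ACCEPT, R,     R,     R,     NEED1, AFTER_E0, NEED2, AFTER_F0, NEED3, AFTER_F4, R},
    /* REJECT   */ {R,      R,     R,     R,     R,     R,        R,     R,        R,     R,        R},
    /* NEED1    */ {R,      ACCEPT, ACCEPT, ACCEPT, R,  R,        R,     R,        R,     R,        R},
    /* NEED2    */ {R,      NEED1, NEED1, NEED1, R,     R,        R,     R,        R,     R,        R},
    /* NEED3    */ {R,      NEED2, NEED2, NEED2, R,     R,        R,     R,        R,     R,        R},
    /* AFTER_E0 */ {R,      R,     R,     NEED1, R,     R,        R,     R,        R,     R,        R},
    /* AFTER_F0 */ {R,      R,     NEED2, NEED2, R,     R,        R,     R,        R,     R,        R},
    /* AFTER_F4 */ {R,      NEED2, R,     R,     R,     R,        R,     R,        R,     R,        R}
};
#undef R

/* Payload bits carried by a start byte of each class. */
static const unsigned char lead_mask[N_CLASSES] = {
    0x7F, 0, 0, 0, 0x1F, 0x0F, 0x0F, 0x07, 0x07, 0x07, 0
};

/* Decode one code point from s[0..len).  Returns the number of bytes
 * consumed, or 0 if the input is empty, truncated, or malformed. */
static size_t decode_one(const unsigned char *s, size_t len, unsigned long *cp)
{
    int state = ACCEPT;
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        int cls = byte_class(s[i]);
        value = (state == ACCEPT)
              ? (unsigned long)(s[i] & lead_mask[cls])
              : (value << 6) | (unsigned long)(s[i] & 0x3F);
        state = transition[state][cls];
        if (state == ACCEPT) { *cp = value; return i + 1; }
        if (state == REJECT) return 0;
    }
    return 0;
}
```

The inner loop does one classification and one table lookup per byte, which is
where a dfa of this kind saves conditional branches compared with a chain of
range checks per sequence length.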

It shows some improvement over our base translation:

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
    _m   branch predict miss
    -    indeterminate percentage (e.g. 1/0)

The numbers represent raw counts per loop iteration.

unicode::utf8n_to_uvchr_0x007f
ord(X)

       blead   dfa Ratio %
       ----- ----- -------
    Ir 359.0 359.0   100.0
    Dr 111.0 111.0   100.0
    Dw  64.0  64.0   100.0
  COND  42.0  42.0   100.0
   IND   5.0   5.0   100.0

COND_m   2.0   0.0     Inf
 IND_m   5.0   5.0   100.0

unicode::utf8n_to_uvchr_0x07ff
ord(X)

       blead   dfa Ratio %
       ----- ----- -------
    Ir 478.0 467.0   102.4
    Dr 132.0 133.0    99.2
    Dw  79.0  78.0   101.3
  COND  63.0  57.0   110.5
   IND   5.0   5.0   100.0

COND_m   1.0   0.0     Inf
 IND_m   5.0   5.0   100.0

unicode::utf8n_to_uvchr_0xfffd
ord(X)

       blead   dfa Ratio %
       ----- ----- -------
    Ir 494.0 486.0   101.6
    Dr 134.0 136.0    98.5
    Dw  79.0  78.0   101.3
  COND  67.0  61.0   109.8
   IND   5.0   5.0   100.0

COND_m   2.0   0.0     Inf
 IND_m   5.0   5.0   100.0

unicode::utf8n_to_uvchr_0x1fffd
ord(X)

       blead   dfa Ratio %
       ----- ----- -------
    Ir 508.0 505.0   100.6
    Dr 135.0 139.0    97.1
    Dw  79.0  78.0   101.3
  COND  70.0  65.0   107.7
   IND   5.0   5.0   100.0

COND_m   2.0   1.0   200.0
 IND_m   5.0   5.0   100.0

unicode::utf8n_to_uvchr_0x10fffd
ord(X)

       blead   dfa Ratio %
       ----- ----- -------
    Ir 508.0 505.0   100.6
    Dr 135.0 139.0    97.1
    Dw  79.0  78.0   101.3
  COND  70.0  65.0   107.7
   IND   5.0   5.0   100.0

COND_m   2.0   1.0   200.0
 IND_m   5.0   5.0   100.0

Each code point represents an extra byte required in its UTF-8
representation from the previous one.

commit | commitdiff | tree

Karl Williamson [Tue, 30 Jan 2018 19:34:20 +0000 (12:34 -0700)]

regcomp.c: Silence compiler maybe uninit warnings

I don't believe these can actually be used uninitialized, but
initialize them anyway to silence the warnings.

commit | commitdiff | tree

Matthew Horsfall [Tue, 23 Jan 2018 19:45:06 +0000 (14:45 -0500)]

Use the correct path for valgrind logs in make test.valgrind

commit | commitdiff | tree

Karl Williamson [Tue, 30 Jan 2018 03:47:56 +0000 (20:47 -0700)]

Add ANYOFM regnode

This is a specialized ANYOF node for use when the code points in it
have characteristics that allow them to be matched with a mask instead
of a bit map.  When this happens, the speed up is pretty spectacular:

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches

The numbers represent raw counts per loop iteration.

Results of ('b' x 10000) . 'a' =~ /[Aa]/

          blead    mask Ratio %
       -------- ------- -------
    Ir 153132.0 25636.0   597.3
    Dr  40909.0  2155.0  1898.3
    Dw  20593.0   593.0  3472.7
  COND  20529.0  3028.0   678.0
   IND     22.0    22.0   100.0

See the comments in regcomp.c or
http://nntp.perl.org/group/perl.perl5.porters/249001 for a description
of the cases that this new technique can handle.  But several common
ones include the C0 controls (on ASCII platforms), [01], [0-7], [Aa] and
any other ASCII case pair.
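
To make the mask idea concrete, here is a minimal sketch of how a byte can be
tested against such a set with a single AND and compare.  The struct, the
constants, and the function names are hypothetical and are not the regnode
layout used in regcomp.c; the mask/base pairs are worked out for ASCII
platforms only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation: a byte c is in the set
 * exactly when (c & mask) == base. */
typedef struct {
    unsigned char mask;   /* bits that take part in the comparison */
    unsigned char base;   /* value those bits must have */
} byte_mask_class;

/* 'A' (0x41) and 'a' (0x61) differ only in bit 0x20, so ignore it. */
static const byte_mask_class CLASS_Aa  = { 0xDF, 0x41 };
/* '0'..'7' are 0x30..0x37: the low three bits are free. */
static const byte_mask_class CLASS_0_7 = { 0xF8, 0x30 };
/* '0' and '1' are 0x30 and 0x31: only the low bit is free. */
static const byte_mask_class CLASS_01  = { 0xFE, 0x30 };
/* C0 controls are 0x00..0x1F: the low five bits are free. */
static const byte_mask_class CLASS_C0  = { 0xE0, 0x00 };

static bool class_matches(byte_mask_class cls, unsigned char c)
{
    return (c & cls.mask) == cls.base;
}

/* Offset of the first matching byte in s[0..len), or len if none. */
static size_t find_first(byte_mask_class cls,
                         const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (class_matches(cls, s[i]))
            break;
    }
    return i;
}
```

A set qualifies only when its members share every bit outside the "free"
positions and every combination of the free bits is a member, which is why
[0-7] works but a range such as [0-9] would not.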

The set of ASCII characters also could be done with this node instead of
having the special ASCII regnode, reducing code size and complexity.
I haven't investigated the speed loss of doing so.

A NANYOFM node could be created for matching the complement of what this
one matches.

A pattern like /A/i is not affected by this commit, but the regex
optimizer could be changed to take advantage of this commit.  What would
need to be done is for it to look at the first byte of an EXACTFish node
and if it's one of the case pairs this handles, to generate a synthetic
start class for it.  This would automatically invoke the sped up code.

commit | commitdiff | tree

Karl Williamson [Mon, 22 Jan 2018 20:55:03 +0000 (13:55 -0700)]

regcomp.sym: Add ANYOFM regnode

This uses a mask instead of a bitmap, and is restricted to representing
invariant characters under UTF-8 that meet particular bit patterns.

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 20:35:09 +0000 (13:35 -0700)]

regcomp.c: White-space only

Indent code that the previous commit created a block around

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 20:26:16 +0000 (13:26 -0700)]

regcomp.c: Allow a fcn param to be NULL

In which case handling is skipped. This is in preparation for a future
commit which will use this function in a slightly different manner

commit | commitdiff | tree

Karl Williamson [Fri, 29 Dec 2017 22:45:38 +0000 (15:45 -0700)]

regexec.c: Use word-at-a-time to repeat /i single byte pattern

For most of the case folding pairs, like [Aa], it is possible to use a
mask to match them word-at-a-time in regrepeat(), so that long sequences
of them are handled with significantly better performance.

commit | commitdiff | tree

Karl Williamson [Fri, 29 Dec 2017 22:17:41 +0000 (15:17 -0700)]

regexec.c: Use word-at-a-time to repeat a single byte pattern

There is special code in the function regrepeat() to handle instances
where the pattern to repeat is a single byte. These all can be done
word-at-a-time to significantly increase the performance of long
repeats.
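
A minimal sketch of the word-at-a-time idea follows.  It is not the code in
regrepeat(); the function name is hypothetical, a 64-bit word is assumed, and
the word is loaded with memcpy() so the sketch makes no alignment or
endianness assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many leading bytes of s[0..len) are equal to c. */
static size_t count_leading_byte(const unsigned char *s, size_t len,
                                 unsigned char c)
{
    /* Broadcast c into every byte of a word: 0x62 -> 0x6262626262626262. */
    const uint64_t pattern = (uint64_t)c * UINT64_C(0x0101010101010101);
    size_t i = 0;

    /* Compare eight bytes per iteration while whole words still match. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);
        if (word != pattern)
            break;
        i += sizeof word;
    }

    /* Finish the mismatching word, or the short tail, byte by byte. */
    while (i < len && s[i] == c)
        i++;

    return i;
}
```

On a long run such as ('b' x 10000), almost all of the work is done in the
first loop, one comparison per eight bytes.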

commit | commitdiff | tree

Karl Williamson [Wed, 27 Dec 2017 01:25:26 +0000 (18:25 -0700)]

regexec.c: Replace loop by memchr()

This can be called on a potentially long string.
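
For illustration, the kind of change described is sketched below with two
hypothetical helper functions; the actual loop in regexec.c is not shown in
this log.  Both return a pointer to the first occurrence of c in [s, end), or
NULL if there is none.

```c
#include <stddef.h>
#include <string.h>

/* Hand-rolled search: one comparison and one branch per byte. */
static const char *find_byte_loop(const char *s, const char *end, char c)
{
    while (s < end) {
        if (*s == c)
            return s;
        s++;
    }
    return NULL;
}

/* The same search using memchr(), which C libraries typically
 * implement with word-sized or vectorized comparisons. */
static const char *find_byte_memchr(const char *s, const char *end, char c)
{
    return (const char *)memchr(s, (unsigned char)c, (size_t)(end - s));
}
```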

commit | commitdiff | tree

Karl Williamson [Tue, 30 Jan 2018 03:33:14 +0000 (20:33 -0700)]

Compile variant_byte_number() for EBCDIC

Future commits will use this without regard to platform.

commit | commitdiff | tree

Karl Williamson [Tue, 30 Jan 2018 03:07:51 +0000 (20:07 -0700)]

Use different scheme to handle MSVC6

Recent commit 0b08cab0fc46a5f381ca18a451f55cf12c81d966 caused a function
to not be compiled when running on MSVC6, and hence its callers needed
to use an alternative mechanism there. This is easy enough, it turns
out, but it also turns out that there are more opportunities to call
this function. Rather than having each caller have to know about the
MSVC6 problem, this current commit reimplements the function on that
platform to use a slow, dumb method, so knowing about the issue is
confined to just this one function.

commit | commitdiff | tree

Karl Williamson [Sun, 28 Jan 2018 21:48:53 +0000 (14:48 -0700)]

APItest/APItest.xs: Simplify mappings

Instead of using SVs, use the underlying C type, and so the code here
doesn't have to deal with the SV conversions

commit | commitdiff | tree

Karl Williamson [Sun, 28 Jan 2018 21:47:16 +0000 (14:47 -0700)]

APItest/t/utf8_warn_base.pl: White-space only

This outdents a bunch of code to make it a shift width of 2 instead of 4
because the nesting was getting too deep, making the space available on
a line too short.

commit | commitdiff | tree

Karl Williamson [Sun, 28 Jan 2018 21:43:00 +0000 (14:43 -0700)]

APItest/t/utf8_warn_base.pl: Improve diagnostics

commit | commitdiff | tree

Karl Williamson [Sun, 28 Jan 2018 00:43:00 +0000 (17:43 -0700)]

Add utf8n_to_uvchr_msgs()

This UTF-8 to code point translator variant is to meet the needs of
Encode, and provides XS authors with more general capability than
the other decoders.

commit | commitdiff | tree

Karl Williamson [Sun, 28 Jan 2018 17:02:11 +0000 (10:02 -0700)]

Don't use variant_byte_number on MSVC6

See [perl #132766]

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 17:37:04 +0000 (10:37 -0700)]

inline.h: Clarify comment

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 17:25:27 +0000 (10:25 -0700)]

Don't use C99 ULL constant suffix

The suffix ULL in, e.g., 7ULL, is C99, and since perl supports C89, we
can't use it. Change these occurrences, wrapping those that would exceed
32 bits in UINTMAX_C(...).

perl.h has logic to define that macro appropriately if the compiler
doesn't already know it.
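
A small sketch of the resulting style, assuming a compiler that supplies
UINTMAX_C in <stdint.h> (perl.h provides its own fallback where that is not
the case).  The macro and function names below are hypothetical and are not
taken from the perl source.

```c
#include <stdint.h>

/* Instead of 0x8080808080808080ULL, which needs the C99 suffix: */
#define EVERY_BYTE_HIGH_BIT  UINTMAX_C(0x8080808080808080)

/* Non-zero if any of the low eight bytes of word has its high bit set. */
static int has_high_bit_byte(uintmax_t word)
{
    return (word & EVERY_BYTE_HIGH_BIT) != 0;
}
```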

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 17:46:02 +0000 (10:46 -0700)]

Fix bug in new [[:ascii:]] nodes

Commit aff4cafe362e55c7722ba12952e287a7d1770cb9 added new regnodes for
[[:ascii:]] and its complement for a significant performance
improvement.  Looking at the code later, I realized that there was a
bug in find_byclass(): it didn't continue trying when an initial trial
match succeeded but the whole pattern then failed to match.  It's
supposed to try again at the next ASCII character.

This commit fixes that, and adds tests.

I thought that these new changes might lower the performance improvement
of the original, but they don't.  Here's a typical one where we have a
string of a million non-ascii 2-byte characters, followed by a single
ASCII one.

         posixa    ascii  Ratio %
        ------- -------- --------
     Ir     Inf      Inf    665.9
     Dr     Inf 250907.0   1993.1
     Dw     Inf    597.0 167603.7
   COND     Inf 500532.0    399.7
    IND    22.0     22.0    100.0

(posixa is the old way of doing things; Inf just means the number was
too large for the program to want to display it; the ratio is still
valid).

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 16:51:45 +0000 (09:51 -0700)]

regexec.c: Extract some macro code into a submacro

A future commit will reuse this code, so will avoid duplication.

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 04:02:49 +0000 (21:02 -0700)]

regexec.c: Use different method for finding adjacent chars

Commit 3b6c52ce7db772c296d8f10d92dec46af03391dc changed the variable
name and commented what the code was doing. This changes that code to
use a different mechanism that I think is simpler, and is extensible so
that it can be used not just for instances in which the input is
examined character-by-character.

Until this commit, a boolean was used to indicate that we've found
adjacent characters. This commit saves the address of the next
character, so when we find the next match, if it begins at the saved
address, we know it is adjacent.
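
The saved-address technique can be sketched as follows.  This is an
illustration with a hypothetical function, not the macro code in regexec.c: it
counts runs of consecutive occurrences of a byte, and because adjacency is
decided by comparing addresses, it works even though memchr() skips over the
non-matching input rather than examining it character by character.

```c
#include <stddef.h>
#include <string.h>

/* Count the runs of one or more adjacent occurrences of c in s[0..len). */
static size_t count_runs(const char *s, size_t len, char c)
{
    const char *end = s + len;
    const char *after_prev = NULL;  /* address just past the previous match */
    const char *p = s;
    size_t runs = 0;

    while (p < end
           && (p = (const char *)memchr(p, (unsigned char)c,
                                        (size_t)(end - p))) != NULL)
    {
        /* A match that starts exactly where the previous one ended is
         * adjacent to it; anything else begins a new run. */
        if (p != after_prev)
            runs++;
        after_prev = p + 1;
        p++;
    }
    return runs;
}
```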

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 03:33:10 +0000 (20:33 -0700)]

regexec.c: Extract some macro code into a sub-macro

By doing this, it becomes common code with another place in the code, so
the duplication can be removed.

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 02:15:25 +0000 (19:15 -0700)]

regexec.c: Collapse some macros

By adding a utf8ness parameter these 4 macros can be collapsed into 2,
with no increase in run time, as the parameter is always a compile time
constant and modern compilers will avoid the conditional.

commit | commitdiff | tree

Karl Williamson [Mon, 29 Jan 2018 23:05:41 +0000 (16:05 -0700)]

Fix bug in t/re/regex_sets_compat.t

This runs the tests from regexp.t that have bracketed character classes.
It converts those to the regex sets notation, and verifies they still
work. It was adding an extra blank at the end of the pattern in some
cases, causing it to fail.

commit | commitdiff | tree

Karl Williamson [Fri, 26 Jan 2018 19:33:20 +0000 (12:33 -0700)]

regexec.c: Use meaningful variable name; comment

It took me quite a while to figure out what 'tmp' is doing here. So I
renamed it to a more meaningful name, and added comments.

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 20:36:25 +0000 (13:36 -0700)]

regcomp.c: Clarify comment

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 20:20:24 +0000 (13:20 -0700)]

regcomp.c: Use existing function to do task

The function does it better than this code, which looked too deeply into
the internals, and got it wrong sometimes, because it didn't look at the
state of the inversion. The consequences are not a bug, but potentially
forgoing an optimization, or needlessly looking for an optimization that
will turn out to not be there.

commit | commitdiff | tree

Karl Williamson [Tue, 23 Jan 2018 20:38:04 +0000 (13:38 -0700)]

regcomp.c: Fix typo in comment

commit | commitdiff | tree

Steve Hay [Thu, 25 Jan 2018 13:38:36 +0000 (13:38 +0000)]

Define I_STDINT for gcc, and for VC++ 2010 onwards

Fixes the Math-MPFR-4.0.0 build on 5.27.7 onwards.
See: https://www.nntp.perl.org/group/perl.perl5.porters/2018/01/msg248964.html

commit | commitdiff | tree

Karl Williamson [Thu, 25 Jan 2018 01:05:31 +0000 (18:05 -0700)]

embed.fnc: Formal param shouldn't be const

commit | commitdiff | tree

Tony Cook [Thu, 25 Jan 2018 00:43:07 +0000 (11:43 +1100)]

George Hartzell is now a perl author

commit | commitdiff | tree

Tony Cook [Thu, 25 Jan 2018 00:42:51 +0000 (11:42 +1100)]

bump $Errno::VERSION

commit | commitdiff | tree

George Hartzell [Wed, 24 Jan 2018 21:36:10 +0000 (13:36 -0800)]

Typo: 'at alia' should be 'et alia'

commit | commitdiff | tree

Craig A. Berry [Tue, 23 Jan 2018 03:37:08 +0000 (21:37 -0600)]

Fix reversed logic from 1d60dc3fde1056479b.

Fat-fingered this one somehow.

commit | commitdiff | tree

Chris 'BinGOs' Williams [Mon, 22 Jan 2018 20:12:44 +0000 (20:12 +0000)]

Remove Module::CoreList::TieHashDelta

commit | commitdiff | tree

Chris 'BinGOs' Williams [Mon, 22 Jan 2018 19:58:00 +0000 (19:58 +0000)]

Reset Module-CoreList versioning back to 5.20180220

commit | commitdiff | tree

Karl Williamson [Mon, 22 Jan 2018 19:45:14 +0000 (12:45 -0700)]

Allow space for NUL in UTF-8 array decls

In grepping the source, I noticed that several arrays that are for
holding UTF-8 characters did not allow space for a trailing NUL. This
commit adds that.

commit | commitdiff | tree

Pali [Mon, 22 Jan 2018 17:29:11 +0000 (18:29 +0100)]

Devel::PPPort: Skip ASCII tests on non-ASCII platforms

commit | commitdiff | tree

Dagfinn Ilmari Mannsåker [Mon, 22 Jan 2018 18:04:25 +0000 (18:04 +0000)]

Remove obsolete reference to Module::CoreList::TieHashDelta

%Module::CoreList::version is no longer implemented via
Module::CoreList::TieHashDelta, but generated from
%Module::CoreList::delta upfront.

commit | commitdiff | tree

Dagfinn Ilmari Mannsåker [Mon, 22 Jan 2018 13:10:57 +0000 (13:10 +0000)]

Improve handling of broken versions in Module::CoreList::is_core

- Only parse the user-provided version once
- Include the invalid version in the error message
- Ignore broken versions in M:CL's own data

commit | commitdiff | tree

jdhedden [Sat, 20 Jan 2018 20:38:03 +0000 (15:38 -0500)]

Upgrade to threads::shared 1.58

commit | commitdiff | tree

jdhedden [Sat, 20 Jan 2018 20:31:53 +0000 (15:31 -0500)]

Upgrade to threads 2.21

commit | commitdiff | tree

Karl Williamson [Mon, 22 Jan 2018 00:55:23 +0000 (17:55 -0700)]

PATCH: [perl #132750] Silence uninit warning

I inspected the code, and there is no problem here; it's a compiler
mistake. Nevertheless, simply initializing the variable silences it.

commit | commitdiff | tree

Father Chrysostomos [Mon, 22 Jan 2018 00:19:56 +0000 (16:19 -0800)]

Follow-up to fd77b29b3be4

As Zefram pointed out, I left in a piece of code that caused one
branch to continue to behave as before. The change was ineffective
and the tests happened to be written in such a way as to take the
other branch.

commit | commitdiff | tree

Aaron Crane [Sun, 21 Jan 2018 20:28:52 +0000 (20:28 +0000)]

perlpolicy: update policy in accordance with recent moderator discussions

Substantive changes:

Firstly, as promised earlier, we have clarified that, by forwarding a
message to the list, the sender takes responsibility for the content of the
message in question.

Secondly, we have changed the policy regarding ban lengths. Previously,
third or subsequent instances of unacceptable behaviour resulted in a ban
twice the length of the person's previous ban. Under the new policy, a third
instance of unacceptable behaviour results in a further warning, and a
fourth instance results in a ban of indefinite length.

Our rationale is that temporary bans are for the offender: to give them the
opportunity to change their behaviour in a way that aligns with our
community expectations. However, if the person in question fails to take
advantage of that opportunity, our focus must shift to the community: we aim
to protect other list members from having to bear the burden of unacceptable
behaviour.

Finally, we welcome Karen Etheridge and Todd Rinaldo as additional
moderators. I'd like to offer both Karen and Todd my personal thanks for
agreeing to serve.

commit | commitdiff | tree

Craig A. Berry [Sun, 21 Jan 2018 18:47:45 +0000 (12:47 -0600)]

Make VMS CRTL features work for embedders.

The various run-time features of the CRTL that Perl uses were being
fetched at image activation time and stored in static variables
for later reference.  That works ok when Perl is the program, but
not when Perl is the library since in the latter case attempts by
an embedder to alter the feature settings before invoking Perl were
being ignored.

So store the feature index, not its value, and use that index to
get the current value via decc$feature_get_value whenever we need
it.  This means function calls rather than data references, but
there is no measurable impact on performance.

Also fix a bug in the handling of the feature to disable the POSIX
root; we were saying we were disabling it but weren't really doing
so because its current value cannot be set for some reason (only
its default value).  Since the feature only affects the conversion
of filenames between Unix and VMS format and we don't use the CRTL's
functions for that, it's unlikely this bug ever caused trouble.

commit | commitdiff | tree

Tomasz Konojacki [Sat, 20 Jan 2018 22:54:34 +0000 (23:54 +0100)]

loc_tools.pl: untaint _source_location's return value and fix a warning

Some of our tests are running with -T and it turns out that something in
File::Spec::Unix is tainting the return value of _source_location().

Additionally, when warnings are enabled globally (with either $^W or -W),
not passing the last argument (filename) to File::Spec->catpath results in
an undefined variable warning.

commit | commitdiff | tree

Tom Wyant [Sat, 20 Jan 2018 22:09:00 +0000 (15:09 -0700)]

Fix typos in script_run documentation

commit | commitdiff | tree

Chris 'BinGOs' Williams [Sat, 20 Jan 2018 11:06:19 +0000 (11:06 +0000)]

Bibbidi-Bobbidi-Boo

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 04:18:09 +0000 (05:18 +0100)]

Update Module::CoreList for 5.27.9

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 04:12:23 +0000 (05:12 +0100)]

Bump the perl version in various places for 5.27.9

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 03:57:46 +0000 (04:57 +0100)]

New perldelta for 5.27.9

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 03:51:34 +0000 (04:51 +0100)]

Tick release schedule

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 03:49:55 +0000 (04:49 +0100)]

Epigraph for 5.27.8

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 01:35:07 +0000 (02:35 +0100)]

Perlhist entry for 5.27.8

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 01:29:00 +0000 (02:29 +0100)]

Fixup for perldelta.

Removed a XXX section.

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 01:22:33 +0000 (02:22 +0100)]

Updated modules for perldelta.

commit | commitdiff | tree

Abigail [Sat, 20 Jan 2018 00:51:29 +0000 (01:51 +0100)]

Acknowledgements for perldelta

commit | commitdiff | tree

Abigail [Fri, 19 Jan 2018 23:18:05 +0000 (00:18 +0100)]

Update Module::CoreList for 5.27.8

commit | commitdiff | tree

Karl Williamson [Fri, 19 Jan 2018 22:24:31 +0000 (15:24 -0700)]

perldelta: Clarify entry

Spotted by Dan Book

commit | commitdiff | tree

Father Chrysostomos [Fri, 19 Jan 2018 21:47:53 +0000 (13:47 -0800)]

Don’t vivify elems when putting array on stack

6661956a2 was a little too powerful, and, in addition to fixing the
bug that @_ did not properly alias nonexistent elements, also broke
other uses of nonexistent array elements. (See the tests added.)

This commit changes it so that putting @a on the stack does not vivify
all ‘holes’ in @a, but creates defelem (deferred element) scalars, but
only in lvalue context.

commit | commitdiff | tree

Father Chrysostomos [Fri, 19 Jan 2018 20:41:15 +0000 (12:41 -0800)]

Apply the mod flag to @a in \(@a)

The next commit will depend on it.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 21:11:24 +0000 (21:11 +0000)]

perldelta for signatures/attribute order flip

commit | commitdiff | tree

David Mitchell [Thu, 18 Jan 2018 09:44:10 +0000 (09:44 +0000)]

move sub attributes before the signature

RT #132141

Attributes such as :lvalue have to come *before* the signature to ensure
that they're applied to any code block within the signature; e.g.

    sub f :lvalue ($a = do { $x = "abc"; return substr($x,0,1)}) {
        ....
    }

So this commit moves sub attributes to come before the signature.  This is
how they were originally, but they were swapped with v5.21.7-394-gabcf453.
This commit is essentially a revert of that commit (and its followups
v5.21.7-395-g71917f6, v5.21.7-421-g63ccd0d), plus some extra work for
Deparse, and an extra test.

See:
    RT #123069 for why they were originally swapped
    RT #132141 for why that broke :lvalue
    http://nntp.perl.org/group/perl.perl5.porters/247999
               for a general discussion about RT #132141

commit | commitdiff | tree

Karl Williamson [Fri, 19 Jan 2018 20:02:36 +0000 (13:02 -0700)]

newSVpvn(): Fix pod

There is no "buffer" argument; don't refer to one.

Spotted by KES

commit | commitdiff | tree

Karl Williamson [Thu, 4 Jan 2018 19:53:29 +0000 (12:53 -0700)]

Raise deprecation for qr/(?foo})/

An unescaped left brace that is meant to be taken literally is
officially deprecated, though there are no plans to remove it in contexts
where we don't expect to use it to mean something else, and no warning
is raised in those contexts.

reg_mesg.t tests the known set of these contexts, currently (after this
commit):

/^{/
/foo|{/
/foo|^{/
/foo(:?{bar)/
/\s*{/
/a{3,4}{/

This commit deprecates this context:

/foo({bar})/

This probably should have been illegal all along when 'bar' is a valid
quantifier, as we do with the other quantifiers that follow a left
paren whose illegality we haven't already taken advantage of to mean
something else:

qr/(+0)/
Quantifier follows nothing in regex

This deprecation will allow ({...}) to be usable for a possible future
regex extension

commit | commitdiff | tree

Karl Williamson [Tue, 19 Dec 2017 23:14:01 +0000 (16:14 -0700)]

doop.c: White-space only

Indent to correspond with the new block placed by the previous commit.

commit | commitdiff | tree

Karl Williamson [Tue, 19 Dec 2017 23:03:39 +0000 (16:03 -0700)]

Deprecate above \xFF in bitwise string ops

This is already a fatal error for operations whose outcome depends on
them, but in things like

"abc" & "def\x{100}"

the wide character doesn't actually need to participate in the AND, and
so perl doesn't. As a result of the discussion in the thread beginning
with http://nntp.perl.org/group/perl.perl5.porters/244884, it was
decided to deprecate these ones too.

commit | commitdiff | tree

Karl Williamson [Tue, 19 Dec 2017 23:03:57 +0000 (16:03 -0700)]

doop.c: Use MIN()

This is slightly cleaner than hand rolling the min.

commit | commitdiff | tree

Karl Williamson [Tue, 19 Dec 2017 22:49:47 +0000 (15:49 -0700)]

op/bop.t: Fix typo in test name

commit | commitdiff | tree

Abigail [Fri, 19 Jan 2018 15:36:57 +0000 (16:36 +0100)]

Update Copyright years in README and perl.c.

Now, 2018 is included.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 14:27:07 +0000 (14:27 +0000)]

perldelta: add recent tr/// changes

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 14:08:28 +0000 (14:08 +0000)]

[MERGE] various tr/// fixups, esp for /c and /d

This branch does the following:

Fixes an issue with tr/non_utf8/long_non_utf8/c, where
length(long_non_utf8) > 0x7fff.

Fixes an issue with tr/non_utf8/non_utf8/cd: basically, the
implicit \x{100}-\x{7fffffff} added to the searchlist by /c wasn't being
added.

Adds a lot of code comments to the various tr/// functions.

Adds tr///c tests - basically /c was almost completely untested.

Changes the layout of the op_pv transliteration table: it used to be roughly

      256 x short  - basic table
        1 x short  - length of extended table (n)
        n x short  - extended table

where the 2nd and 3rd items were only present under /c. It's now

        1 x Size_t - length of table (256+n)
  (256+n) x short  - table - both basic and extended

where n == 0 apart from under /c.

The new table format also allowed the tr/non_utf8/non_utf8/ code branches
to be considerably simplified.

op_dump() now dumps the contents of the (non-utf8 variant) transliteration
table.

Removes I32's from the tr/non_utf8/non_utf8/ code paths, making it fully
64-bit clean.

Improves the pod for tr///.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 13:33:14 +0000 (13:33 +0000)]

perlop: improve tr/// documentation

Specifically, explain more clearly what the /csd modifiers do.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 12:56:33 +0000 (12:56 +0000)]

tr///: eliminate I32 from the do_trans*() fns

Replace each with a more appropriate type

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 12:45:37 +0000 (12:45 +0000)]

tr///: return Size_t count rather than I32

Change the signature of all the internal do_trans*() functions to return
Size_t rather than I32, so that the count returned by tr/// can cope with
strings longer than 2Gb.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 12:18:39 +0000 (12:18 +0000)]

tr///: remove some I32 from S_pmtrans()

Using I32 to hold char counts etc. is generally a bug. I've replaced them
with Size_t. I've left the swash part of the code alone.

commit | commitdiff | tree

David Mitchell [Fri, 19 Jan 2018 12:01:56 +0000 (12:01 +0000)]

tr/nonutf8/nonutf8/c: simplify GROW calc

When, for each slot, deciding whether to set OPpTRANS_GROWS, the
calculation is only done in one of 4 possible branches. It turns out that
in the other branches, the condition can never be true; but determining
that is subtle, and the assumption might break for future changes. Move
the test outside the if/else tree so it can be seen to always apply.

So in theory this commit makes no functional difference.

commit | commitdiff | tree

David Mitchell [Thu, 4 Jan 2018 16:27:46 +0000 (16:27 +0000)]

op_dump(): dump tr/// translation table

Previously it just displayed its address.
Also, when the table is in fact a swash, don't display its address
on threaded builds, as it's actually just a padix.

commit | commitdiff | tree

David Mitchell [Mon, 15 Jan 2018 15:29:27 +0000 (15:29 +0000)]

tr///: simplify $utf8 =~ tr/nonutf8/nonutf8/

The run-time code to handle a non-utf8 tr/// against a utf8 string
is complex, with many variants of similar code repeated depending on the
presence of the /s and /c flags.

Simplify them all into a single code block by changing how the translation
table is stored. Formerly, the tr struct contained possibly two tables:
the basic 0-255 slot one, plus in the presence of /c, a second one
to map the implicit search range (\x{100}...) against any residual
replacement chars not consumed by the first table.

This commit merges the two tables into a single unified whole. For example

    tr/\x00-\xfe/abcd/c

is equivalent to

    tr/\xff-\x{7fffffff}/abcd/

which generates a 259-entry translation table consisting of:

    0x00  => -1
    0x01  => -1
    ...
    0xfe  => -1
    0xff  =>  a
    0x100 =>  b
    0x101 =>  c
    0x102 =>  d

In addition we store:
    1) the size of the translation table (0x103 in the example above);
    2) an extra 'wildcard' entry stored 1 slot beyond the main table,
       which specifies the action for any codepoints outside the range of
       the table (i.e. chars 0x103..0x7fffffff). This can be either:
        a) a character, when the last replacement char is repeated;
        b) -1 when /c isn't in effect;
        c) -2 when /d is in effect;
        d) -3 identity: when the replacement list is empty but not /d.

       In the example above, this would be
            0x103 =>  d

The addition of -3 as a valid slot value is new.

This makes the main runtime code for the utf8 string with non-utf8 tr//
case look like, at its core:

    size = tbl->size;
    mapped_ch = tbl->map[ch >= size ? size : ch];

which then processes mapped_ch based on whether it's >=0, or -1/-2/-3.

This is a lot simpler than the old scheme, and should generally be faster
too.
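
The unified table for the tr/\x00-\xfe/abcd/c example above can be sketched as
follows.  The struct, enum, and function names are hypothetical; only the
slot values, the table size, and the lookup expression come from the
description in this commit message.

```c
#include <stddef.h>

/* Special slot values, as listed above. */
enum { TR_UNMAPPED = -1, TR_DELETE = -2, TR_IDENTITY = -3 };

/* Hypothetical layout sized for this one example: 0x103 table
 * slots followed by the single wildcard slot. */
typedef struct {
    size_t size;             /* number of slots in the main table */
    short  map[0x103 + 1];   /* main table plus the wildcard slot */
} tr_table;

/* Build the table for tr/\x00-\xfe/abcd/c. */
static void build_example(tr_table *tbl)
{
    static const char repl[] = "abcd";
    size_t i;

    tbl->size = 0x103;
    for (i = 0; i <= 0xfe; i++)
        tbl->map[i] = TR_UNMAPPED;       /* excluded by the complement */
    for (i = 0; i < 4; i++)
        tbl->map[0xff + i] = repl[i];    /* 0xff => a ... 0x102 => d */
    tbl->map[tbl->size] = repl[3];       /* wildcard: repeat the last char */
}

/* The core lookup: anything at or beyond the end of the main
 * table is redirected to the wildcard slot. */
static short tr_lookup(const tr_table *tbl, unsigned long ch)
{
    size_t size = tbl->size;
    return tbl->map[ch >= size ? size : ch];
}
```

With this layout a code point far outside the table, such as 0x7fffffff,
costs exactly the same single comparison and array access as one inside it.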

commit | commitdiff | tree

David Mitchell [Fri, 12 Jan 2018 16:21:48 +0000 (16:21 +0000)]

tr///c: handle len(replacement charlist) > 32767

RT #132608

In the non-utf8 case, the /c (complement) flag to tr adds an implied
\x{100}-\x{7fffffff} range to the search charlist. If the replacement list
contains more chars than are paired with the 0-255 part of the search
list, then the excess chars are stored in an extended part of the table.
The excess char count was being stored as a short, which caused problems
if the replacement list contained more than 32767 excess chars: either
substituting the wrong char, or substituting for a char located up to
0xffff bytes in memory before the real translation table.

So change it to SSize_t.

Note that this is only a problem when the search and replacement charlists
are non-utf8, the replacement list contains around 0x8000+ entries, and
where the string being translated is utf8 with at least one codepoint >=
U+8000.

commit | commitdiff | tree

David Mitchell [Fri, 12 Jan 2018 14:35:03 +0000 (14:35 +0000)]

B, Deparse fixups for tr///c

Recent commits slightly changed the layout of the extended map table: it
now always stores a repeat count, and there are now two structs defined,
rather than treating certain slots, like tbl[0x101], specially.

Update B and Deparse to reflect this.

commit | commitdiff | tree

David Mitchell [Fri, 12 Jan 2018 12:00:30 +0000 (12:00 +0000)]

add two structs for OP_TRANS

Originally, the op_pv of an OP_TRANS op pointed to a 256-slot array of
shorts, which contained the translations. However, in the presence of
tr///c, extra information needs to be stored to handle utf8 strings.
The 256 slot array was extended, with slot 0x100 holding a length,
and slots 0x101 onwards holding some extra chars.

This has made things a bit messy, so this commit adds two structs,
one being an array of 256 shorts, and the other being the same but with
some extra fields. So for example tbl[0x100] has been replaced with
tbl->excess_len.

This commit should make no functional difference, but will allow us
shortly to fix a bug by changing the type of the excess_len field from
short to something bigger, for example.
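
A sketch of what such a pair of structs might look like is given below. The type names and every field other than excess_len are assumptions for illustration, not the definitions from the commit; the repeat-char field reflects the extended table format described elsewhere in this log (slot 0x101).

```c
#include <stddef.h>

/* Basic table: one slot per byte value 0x00..0xff. */
typedef struct {
    short map[256];
} trans_map;

/* Extended table for tr///c: same leading layout, plus extra fields. */
typedef struct {
    short map[256];
    short excess_len;    /* formerly read as slot 0x100 of the array */
    short repeat_char;   /* formerly slot 0x101 */
    short map_ex[1];     /* excess chars; allocated longer in practice */
} trans_map_ex;
```

Because both structs begin with the same 256-slot array, code that only needs the basic mappings can treat either one the same way, while a named field such as excess_len can later change type without disturbing the array.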

commit | commitdiff | tree

David Mitchell [Thu, 11 Jan 2018 14:41:33 +0000 (14:41 +0000)]

S_do_trans_complex(): re-indent

Outdent a code block following the previous commit.

commit | commitdiff | tree

David Mitchell [Thu, 11 Jan 2018 11:45:49 +0000 (11:45 +0000)]

fix "\x{100}..." =~ tr/.../.../cd

In transliterations where the search and replacement charlists are
non-utf8, but where the string being modified contains codepoints >=
0x100, then tr/.../.../cd would always delete all such codepoints, rather
than potentially mapping some of them.

In more detail: in the presence of /c (complement), an implicit
0x100..0x7fffffff is added to a non-utf8 search charlist. If the
replacement list is longer than the < 0x100 part of the search list, then
the last few replacement chars should in principle be paired off against
the first few of (\x{100}, \x{101}, ...). However, this wasn't happening. For
example,

    tr/\x00-\xfd/ABCD/cd

should be equivalent to

    tr/\xfe-\x{7fffffff}/ABCD/d

which should
    map:
        \xfe    => A,
        \xff    => B,
        \x{100} => C,
        \x{101} => D,
    and delete \x{102} onwards.

But instead, it behaved like

    tr/\xfe-\x{7fffffff}/AB/d

and deleted all codepoints >= 0x100.

This commit fixes that by using the extended mapping table format
for all /c variants (formerly it excluded /cd).

I also changed a variable holding the mapped char from being I32 to UV:
principally to avoid a casting mess in the fixed code. This may (or may
not), as a side-effect, have fixed possible issues with very large
codepoints.
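
The intended behaviour of the tr/\x00-\xfd/ABCD/cd example above can be modelled per codepoint with a small C function. This is a standalone sketch of the expected mapping only, not the code from the fix; tr_cd_example() and the TR_KEEP/TR_DROP names are invented for the illustration.

```c
/* Result codes for the sketch; the names are made up for illustration. */
enum { TR_KEEP = -1, TR_DROP = -2 };

/* Expected behaviour of tr/\x00-\xfd/ABCD/cd for one code point:
 * returns the replacement char, TR_KEEP or TR_DROP. */
static long tr_cd_example(unsigned long cp)
{
    static const char repl[] = "ABCD";
    const unsigned long first = 0xfe;   /* first code point in the complement */

    if (cp < first)
        return TR_KEEP;                 /* not in the complemented search list */
    if (cp - first < sizeof(repl) - 1)
        return repl[cp - first];        /* 0xfe..0x101 => A..D */
    return TR_DROP;                     /* /d: no replacement char left */
}
```

The buggy behaviour corresponds to stopping the replacement list at 0xff, so that \x{100} and \x{101} fall through to the delete case.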

commit | commitdiff | tree

David Mitchell [Tue, 9 Jan 2018 10:05:33 +0000 (10:05 +0000)]

OP_TRANS: change extended table format

For non-utf8, OP_TRANS(R) ops have a translation table consisting of an
array of 256 shorts attached. For tr///c, this table is extended to hold
information about chars in the replacement list which aren't paired with
chars in the search list.  For example,

    tr/\x00-AE-\xff/bcdefg/c

is equivalent to

    tr/BCD\x{100}-\x{7fffffff}/bcdefg/

which is equivalent to

    tr/BCD\x{100}-\x{7fffffff}/bcdefggggggggg..../

Only the BCD => bcd mappings can be stored in the basic 256-slot table,
so potentially the following extra information needs recording in an
extended table to handle codepoints > 0xff in the string being modified:

    1) the extra replacement chars ("efg");
    2) the number of extra replacement chars (3);
    3) the "repeat" char ('g').

Currently 2) and 3) are combined: the repeat char is found as the last
extra char, and if there are no extra chars, the repeat char is treated
as an extra char list of length 1.
Similarly, an 'extra chars' length value of 1 can imply either one extra
char, or no extra chars with the repeat char being faked as an extra char.
An 'extra chars' length of 0 implies an empty replacement list, i.e.
tr/....//c.

This commit changes it so that the repeat char is *always* stored (in slot
0x101), with the extra chars stored beginning at slot 0x102.
The 'extra chars' length value (located at slot 0x100) has changed its
meaning slightly: now
    -1 implies tr/....//c
     0  implies no more replacement chars than search chars
    1+ the number of excess replacement chars.

This (should) make no functional difference, but the extra information
stored will make it easier to fix some bugs shortly.
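
To make the new layout concrete, the following C sketch fills in the extended slots for the tr/\x00-AE-\xff/bcdefg/c example and looks up codepoints >= 0x100. The SLOT_* names and both functions are hypothetical; the sketch also ignores /d and assumes that an empty replacement list maps each character to itself.

```c
/* Slot numbers in the flat table of shorts, as described above. */
enum { SLOT_LEN = 0x100, SLOT_REPEAT = 0x101, SLOT_EXTRA = 0x102 };

/* Fill in the extended slots for tr/\x00-AE-\xff/bcdefg/c:
 * extra chars "efg", length 3, repeat char 'g'. */
static void fill_example(short *tbl)
{
    tbl[SLOT_LEN]       = 3;
    tbl[SLOT_REPEAT]    = 'g';
    tbl[SLOT_EXTRA + 0] = 'e';
    tbl[SLOT_EXTRA + 1] = 'f';
    tbl[SLOT_EXTRA + 2] = 'g';
}

/* Map a code point >= 0x100 using the extended slots (no /d handling). */
static long lookup_high(const short *tbl, unsigned long cp)
{
    short len = tbl[SLOT_LEN];
    unsigned long idx = cp - 0x100;

    if (len < 0)
        return (long)cp;                 /* tr/....//c: empty replacement */
    if (idx < (unsigned long)len)
        return tbl[SLOT_EXTRA + idx];    /* one of the excess chars */
    return tbl[SLOT_REPEAT];             /* past the end: repeat char */
}
```

Storing the repeat char unconditionally means a length of 0 no longer needs a faked one-element extra list: the lookup simply falls through to the repeat slot.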

commit | commitdiff | tree

David Mitchell [Mon, 8 Jan 2018 16:14:17 +0000 (16:14 +0000)]

S_pmtrans(): add assert and simplify conditional

In tr/search/replace/c, the number of 'paired' replacement chars
will always be <= length(replace). Assert this, and thus simplify a couple
of conditionals from >= to ==.

It should make no difference to execution, but reduces the cognitive
load.

commit | commitdiff | tree

David Mitchell [Thu, 4 Jan 2018 13:20:59 +0000 (13:20 +0000)]

t/op/tr.t: add tr///c tests

The /c (complement) flag is almost completely untested. Indeed, for the
all-non-utf8 case, nothing in core exercises a plain tr///c.

So this commit adds reasonably comprehensive tests for tr//c and variants
(/cs, /cd, /csd) where the search and replacement ranges are non-utf8, and
the string being matched may or may not be utf8.

A few tests are TODO for now as I've exposed some bugs - to be fixed
shortly.

commit | commitdiff | tree

David Mitchell [Mon, 8 Jan 2018 15:42:23 +0000 (15:42 +0000)]

S_pmtrans(): always use op_private flag variables

Various flag vars are set early on, such as:

    const I32 complement = o->op_private & OPpTRANS_COMPLEMENT;

but sometimes these vars weren't being used, and op_private was being
tested again.

commit | commitdiff | tree

David Mitchell [Tue, 26 Dec 2017 16:43:31 +0000 (16:43 +0000)]

remove fossil debugging statement from do_trans()

This:

    DEBUG_t( Perl_deb(aTHX_ "2.TBL\n"));

has been around in one form or another since perl1, but it makes no sense
since perl 5.000, where -Dt now shows the name of the op being executed.

commit | commitdiff | tree

David Mitchell [Tue, 26 Dec 2017 17:11:01 +0000 (17:11 +0000)]

S_pmtrans(): remove some whitespace

Removal of MAD a long time ago left a couple of lines with very weird
indentation.

commit | commitdiff | tree

David Mitchell [Tue, 26 Dec 2017 16:40:14 +0000 (16:40 +0000)]

tr/// functions: add some basic code comments

For the various C functions which implement the compile-time and
run-time aspects of OP_TRANS, add some basic code comments at the top of
each function explaining what its purpose is.

Also add lots of code comments to the body of S_pmtrans() (which compiles
a tr///).

Also comment what the OPpTRANS_ private flag bits mean.

No functional changes.
