perl.git
9 months agoInline dfa for translating from UTF-8
Karl Williamson [Thu, 28 Jun 2018 03:52:47 +0000 (21:52 -0600)] 
Inline dfa for translating from UTF-8

This commit inlines the simple portion of the dfa that translates from
UTF-8 to code points, used in functions like utf8_to_uvchr_buf.

This dfa has been changed in previous commits so that it is small, and
punts on any problematic input, plus 18% of the Hangul syllable code
points.  (These still come out faster than blead.)  The smallness allows
it to be inlined, adding <2000 total bytes to the perl text space.
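
The idea of a small state machine that returns a code point for well-formed input and declines everything else can be sketched as follows. This is not the table-driven dfa in perl's source: the names (sketch_step, sketch_decode, the state enum) are hypothetical, the transitions are coded as comparisons rather than a lookup table, and unlike the dfa described above this sketch declines only surrogates and above-Unicode code points, not non-characters. The byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states; perl's real dfa uses a generated table instead. */
enum { ACCEPT, REJECT, NEED1, NEED2, NEED3,
       AFTER_E0, AFTER_ED, AFTER_F0, AFTER_F4 };

static int in_range(uint8_t b, uint8_t lo, uint8_t hi)
{
    return b >= lo && b <= hi;
}

/* One transition: consume byte b in the given state, accumulating into *cp. */
static int sketch_step(int state, uint8_t b, uint32_t *cp)
{
    uint8_t lo = 0x80, hi = 0xBF;   /* default continuation-byte range */
    int next;

    if (state == ACCEPT) {          /* start of a new character */
        if (b < 0x80)                { *cp = b;        return ACCEPT; }
        if (in_range(b, 0xC2, 0xDF)) { *cp = b & 0x1F; return NEED1; }
        if (b == 0xE0)               { *cp = b & 0x0F; return AFTER_E0; }
        if (b == 0xED)               { *cp = b & 0x0F; return AFTER_ED; }
        if (in_range(b, 0xE1, 0xEF)) { *cp = b & 0x0F; return NEED2; }
        if (b == 0xF0)               { *cp = b & 0x07; return AFTER_F0; }
        if (in_range(b, 0xF1, 0xF3)) { *cp = b & 0x07; return NEED3; }
        if (b == 0xF4)               { *cp = b & 0x07; return AFTER_F4; }
        return REJECT;              /* C0, C1, F5..FF, stray continuation */
    }
    switch (state) {
    case NEED1:    next = ACCEPT; break;
    case NEED2:    next = NEED1;  break;
    case NEED3:    next = NEED2;  break;
    case AFTER_E0: lo = 0xA0; next = NEED1; break;  /* no overlongs */
    case AFTER_ED: hi = 0x9F; next = NEED1; break;  /* no surrogates */
    case AFTER_F0: lo = 0x90; next = NEED2; break;  /* no overlongs */
    case AFTER_F4: hi = 0x8F; next = NEED2; break;  /* not above U+10FFFF */
    default:       return REJECT;
    }
    if (!in_range(b, lo, hi))
        return REJECT;
    *cp = (*cp << 6) | (b & 0x3F);
    return next;
}

/* Returns the number of bytes consumed, or 0 when the fast path declines
 * the input and the caller should fall back to a slower, complete routine. */
static size_t sketch_decode(const uint8_t *s, const uint8_t *end, uint32_t *cp)
{
    const uint8_t *p = s;
    int state = ACCEPT;

    assert(s <= end);
    while (p < end) {
        state = sketch_step(state, *p++, cp);
        if (state == ACCEPT)
            return (size_t)(p - s);
        if (state == REJECT)
            break;
    }
    return 0;
}
```

A return of 0 plays the role of "punting": the caller would hand the same bytes to the full error-handling routine, so the fast path never needs to produce warnings itself.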

The inlined part never calls anything that needs thread context, so that
parameter can be removed.  I decided to remove it also from the
Perl_utf8_to_uvchr_buf() and Perl_utf8n_to_uvchr_error() functions.
There is a small risk that someone is actually using those functions
instead of the documented macros utf8_to_uvchr_buf() and
utf8n_to_uvchr_error().  If so, this can be added back in.

Perl_utf8_to_uvchr_msgs() is entirely removed, but the macro
utf8_to_uvchr_msgs() which is the normal interface to it is retained
unchanged, and it is marked as unstable anyway.

This change decreases the number of conditional branches in the Perl
statement

    my $a = ord("\x{foo}")

where foo is a non-problematic code point by about 11%, except for
ASCII characters, where it is 4%, and those Hangul syllables mentioned
above, where it is 7%.  Problematic code points fare much worse here
than in blead.  These are the surrogates, non-characters, and
non-Unicode code points.  We don't care very much about the speed of
handling these code points, which are mostly considered illegal by
Unicode anyway.

The percentage decrease is higher for just the function itself, as
the measured Perl statement has unchanged overhead.

Here are the annotated benchmarks:

Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
    _m   branch predict miss
    _m1  level 1 cache miss
    _mm  last cache (e.g. L3) miss
    -    indeterminate percentage (e.g. 1/0)

The numbers represent raw counts per loop iteration.
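
Judging from the figures in the tables in this message, the "Ratio %" column appears to be the blead count divided by the dfa count, times 100, so values above 100 mean the dfa version did less work. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
#include <assert.h>

/* Ratio % as it appears to be computed in the tables: blead / dfa * 100.
 * The function name is hypothetical; it is not part of any benchmark tool. */
static double sketch_ratio_percent(double blead, double dfa)
{
    assert(dfa != 0.0);   /* the key shows such cases as "-" */
    return blead / dfa * 100.0;
}
```

For example, 395.0 instructions in blead against 370.0 in dfa gives roughly 106.8, and 61.0 conditional branches against 75.0 gives roughly 81.3.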

translate_utf8_to_uv_007f
my $a = ord("\x{007f}")

       blead   dfa Ratio %
       ----- ----- -------
    Ir 395.0 370.0   106.8
    Dr 122.0 115.0   106.1
    Dw  71.0  61.0   116.4
  COND  49.0  47.0   104.3
   IND   5.0   5.0   100.0

In all the measurements, the indirect numbers were all zeros and
unchanged, and are omitted in this message.

translate_utf8_to_uv_07ff
my $a = ord("\x{07ff}")

       blead   dfa Ratio %
       ----- ----- -------
    Ir 438.0 390.0   112.3
    Dr 128.0 118.0   108.5
    Dw  71.0  61.0   116.4
  COND  57.0  51.0   111.8
   IND   5.0   5.0   100.0

translate_utf8_to_uv_cfff
my $a = ord("\x{cfff}")

This is the highest Hangul syllable that gets the full reduction.

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 410.0   111.5
    Dr 131.0 121.0   108.3
    Dw  71.0  61.0   116.4
  COND  61.0  55.0   110.9
   IND   5.0   5.0   100.0

translate_utf8_to_uv_d000
my $a = ord("\x{d000}")

This is the lowest affected Hangul syllable.

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 443.0   103.2
    Dr 131.0 132.0    99.2
    Dw  71.0  71.0   100.0
  COND  61.0  57.0   107.0
   IND   5.0   5.0   100.0

translate_utf8_to_uv_d7ff
my $a = ord("\x{d7ff}")

This is the highest affected Hangul syllable.

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 443.0   103.2
    Dr 131.0 132.0    99.2
    Dw  71.0  71.0   100.0
  COND  61.0  57.0   107.0
   IND   5.0   5.0   100.0

translate_utf8_to_uv_d800
my $a = ord("\x{d800}")

This is a surrogate, showing much worse performance, but we don't care

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 515.0    88.7
    Dr 131.0 134.0    97.8
    Dw  71.0  73.0    97.3
  COND  61.0  75.0    81.3
   IND   5.0   5.0   100.0

translate_utf8_to_uv_fdd0
my $a = ord("\x{fdd0}")

This is a non-char, showing much worse performance, but we don't care

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 548.0    83.4
    Dr 131.0 139.0    94.2
    Dw  71.0  73.0    97.3
  COND  61.0  81.0    75.3
   IND   5.0   5.0   100.0

translate_utf8_to_uv_fffd
my $a = ord("\x{fffd}")

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 410.0   111.5
    Dr 131.0 121.0   108.3
    Dw  71.0  61.0   116.4
  COND  61.0  55.0   110.9
   IND   5.0   5.0   100.0

translate_utf8_to_uv_ffff
my $a = ord("\x{ffff}")

This is another non-char, showing much worse performance, but we don't
care

       blead   dfa Ratio %
       ----- ----- -------
    Ir 457.0 548.0    83.4
    Dr 131.0 139.0    94.2
    Dw  71.0  73.0    97.3
  COND  61.0  81.0    75.3
   IND   5.0   5.0   100.0

translate_utf8_to_uv_1fffd
my $a = ord("\x{1fffd}")

       blead   dfa Ratio %
       ----- ----- -------
    Ir 476.0 430.0   110.7
    Dr 134.0 124.0   108.1
    Dw  71.0  61.0   116.4
  COND  65.0  59.0   110.2
   IND   5.0   5.0   100.0

translate_utf8_to_uv_10fffd
my $a = ord("\x{10fffd}")

       blead   dfa Ratio %
       ----- ----- -------
    Ir 476.0 430.0   110.7
    Dr 134.0 124.0   108.1
    Dw  71.0  61.0   116.4
  COND  65.0  59.0   110.2
   IND   5.0   5.0   100.0

translate_utf8_to_uv_110000
my $a = ord("\x{110000}")

This is a non-Unicode code point, showing much worse performance, but we
don't care

       blead   dfa Ratio %
       ----- ----- -------
    Ir 476.0 544.0    87.5
    Dr 134.0 137.0    97.8
    Dw  71.0  73.0    97.3
  COND  65.0  81.0    80.2
   IND   5.0   5.0   100.0

9 months agoutf8.c: Avoid unnecessary work xlating utf8 to uv
Karl Williamson [Thu, 28 Jun 2018 03:28:15 +0000 (21:28 -0600)] 
utf8.c: Avoid unnecessary work xlating utf8 to uv

This moves the code for the dfa that does the translation of
non-problematic characters to earlier in the function to avoid work that
only needs to be done if the dfa rejects the input.  For example,
calculating how long the sequence needs to be is no longer done
unless the dfa rejects.

Since the dfa always accepts an invariant if the allowed length is
non-zero, the code that tests for those specifically can be removed.

9 months agoUse strict dfa to translate from UTF-8 to code point
Karl Williamson [Mon, 2 Jul 2018 01:23:35 +0000 (19:23 -0600)] 
Use strict dfa to translate from UTF-8 to code point

With this commit, if a sequence passes the dfa, the result can be
returned immediately.  Previously some rare potentially problematic
sequences could pass, and these then needed further checking, which
therefore had to be done always.  So this speeds up the general case.

9 months agoAdd dfa for strict translation from UTF-8
Karl Williamson [Thu, 28 Jun 2018 00:08:12 +0000 (18:08 -0600)] 
Add dfa for strict translation from UTF-8

9 months agoFix outdated docs for isUTF8_char()
Karl Williamson [Sun, 1 Jul 2018 19:48:34 +0000 (13:48 -0600)] 
Fix outdated docs for isUTF8_char()

It doesn't accept non-negative code points that don't fit in an IV.

9 months agoMake isUTF8_char() an inline function
Karl Williamson [Tue, 26 Jun 2018 01:11:46 +0000 (19:11 -0600)] 
Make isUTF8_char() an inline function

It was a macro that used a trie.  This changes to use the dfa
constructed in previous commits.  I didn't bother with taking
measurements.  A dfa should require fewer conditionals to be executed
for many code points.

9 months agoExtend dfa for translation of UTF-8 to EBCDIC
Karl Williamson [Mon, 25 Jun 2018 23:01:30 +0000 (17:01 -0600)] 
Extend dfa for translation of UTF-8 to EBCDIC

This commit changes to use a dfa for translating from UTF-8 on EBCDIC
platforms.  This makes for fewer #ifdefs, and I realized while I was
working on the dfa, that it wasn't difficult to do for EBCDIC.

9 months agoMake UTF-8 dfa table an EXTCONST
Karl Williamson [Mon, 25 Jun 2018 22:49:22 +0000 (16:49 -0600)] 
Make UTF-8 dfa table an EXTCONST

This will allow it to be used inline.

9 months agoRename dfa table for UTF-8
Karl Williamson [Mon, 25 Jun 2018 22:16:04 +0000 (16:16 -0600)] 
Rename dfa table for UTF-8

This is in preparation for having additional dfa tables.  This names
this one to reflect its specific purpose.

9 months agoChange dfa table for Perl extended UTF-8
Karl Williamson [Mon, 25 Jun 2018 22:05:39 +0000 (16:05 -0600)] 
Change dfa table for Perl extended UTF-8

This restructures the dfa table for translating UTF-8 into U32 to handle
higher code points.  In doing so, I rationalized the numbering scheme
for nodes and byte types.  This makes it easier to see the patterns in
the table.

9 months agoregen/ebcdic.pl: Add capability to generate a dfa table
Karl Williamson [Sun, 1 Jul 2018 19:12:16 +0000 (13:12 -0600)] 
regen/ebcdic.pl: Add capability to generate a dfa table

This kind of table is used for the dfa for translating or verifying
UTF-8.

9 months agoregen/ebcdic.pl: Add declaration of generated tables
Karl Williamson [Sun, 1 Jul 2018 18:43:59 +0000 (12:43 -0600)] 
regen/ebcdic.pl: Add declaration of generated tables

This adds code to declare and define the tables only under DOINIT, and
otherwise to just declare them.  This allows the includer to not have to
deal with them at all.
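
The declare-or-define arrangement described above can be sketched like this. The macro and table names are hypothetical stand-ins (suffixed with SKETCH or sketch) rather than perl's actual DOINIT and EXTCONST definitions, and the table contents are made up for illustration.

```c
#include <assert.h>

/* In a real build only one translation unit would define this before
 * including the header; it is defined here so the sketch compiles alone. */
#define DOINIT_SKETCH

#ifdef DOINIT_SKETCH
#  define EXTCONST_SKETCH const          /* this unit owns the definition */
#else
#  define EXTCONST_SKETCH extern const   /* everyone else just declares */
#endif

#ifdef DOINIT_SKETCH
EXTCONST_SKETCH unsigned char sketch_table[4] = { 10, 20, 30, 40 };
#else
EXTCONST_SKETCH unsigned char sketch_table[4];
#endif

/* Any includer can use the table without caring where it is defined. */
static unsigned sketch_lookup(unsigned i)
{
    assert(i < 4);
    return sketch_table[i];
}
```

Because the header itself decides between declaring and defining, the including file needs no table-specific #ifdefs of its own.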

9 months agoregen/ebcdic.pl: Always print row headings
Karl Williamson [Thu, 5 Jul 2018 19:27:54 +0000 (13:27 -0600)] 
regen/ebcdic.pl: Always print row headings

Previously, this omitted the headings on tables that just barely fit
into 80 columns.  But future commits will create tables that can't fit
into 80 columns, and these headings are useful, so print them.

9 months agoebcdic_tables.h: Add comments
Karl Williamson [Sun, 10 Jun 2018 18:18:44 +0000 (12:18 -0600)] 
ebcdic_tables.h: Add comments

9 months agoutf8.c: Change expression to be EBCDIC friendly
Karl Williamson [Thu, 14 Jun 2018 19:35:39 +0000 (13:35 -0600)] 
utf8.c: Change expression to be EBCDIC friendly

This actually does two things: 1) it adds macros that evaluate to no
extra code on ASCII platforms, but allow things to work under EBCDIC;
and 2) it changes to use a ternary conditional.  This may not change
anything, or it may cause the compiler to generate slightly smaller
code at the expense of an extra addition instruction.  I am moving to
inlining this code, and want to make it smaller to enable that to
happen.

9 months agoAPItest: Add comprehensive UTF-8 validity tests
Karl Williamson [Mon, 2 Jul 2018 01:20:59 +0000 (19:20 -0600)] 
APItest: Add comprehensive UTF-8 validity tests

These test all combinations of bytes at all likely to have any issues.
They are run only when an environment variable is set to a particular
obscure value, as they take a long time.

9 months agoinline.h: Fix typo in comment
Karl Williamson [Thu, 5 Jul 2018 18:54:17 +0000 (12:54 -0600)] 
inline.h: Fix typo in comment

9 months agoutf8.h: Remove obsolete comment
Karl Williamson [Thu, 5 Jul 2018 18:52:00 +0000 (12:52 -0600)] 
utf8.h: Remove obsolete comment

9 months agoUpgrade Math::BigInt from version 1.999811 to 1.999813
Steve Hay [Wed, 4 Jul 2018 16:54:42 +0000 (17:54 +0100)] 
Upgrade Math::BigInt from version 1.999811 to 1.999813

9 months agoUpgrade bignum from version 0.49 to 0.50
Steve Hay [Wed, 4 Jul 2018 16:51:34 +0000 (17:51 +0100)] 
Upgrade bignum from version 0.49 to 0.50

9 months agoUpdate Config::Perl::V to 0.30
H.Merijn Brand [Wed, 4 Jul 2018 18:23:50 +0000 (20:23 +0200)] 
Update Config::Perl::V to 0.30

9 months agoNote that .appveyor.yml is EXCLUDED from Filter::Util::Call
Steve Hay [Wed, 4 Jul 2018 12:54:41 +0000 (13:54 +0100)] 
Note that .appveyor.yml is EXCLUDED from Filter::Util::Call

9 months agoUpgrade Math::BigRat from version 0.2613 to 0.2614
Steve Hay [Tue, 3 Jul 2018 11:18:02 +0000 (12:18 +0100)] 
Upgrade Math::BigRat from version 0.2613 to 0.2614

9 months agoUpgrade Math::BigInt::FastCalc from version 0.5006 to 0.5007
Steve Hay [Tue, 3 Jul 2018 11:13:44 +0000 (12:13 +0100)] 
Upgrade Math::BigInt::FastCalc from version 0.5006 to 0.5007

9 months agoUpgrade experimental from version 0.019 to 0.020
Steve Hay [Wed, 4 Jul 2018 11:50:46 +0000 (12:50 +0100)] 
Upgrade experimental from version 0.019 to 0.020

(This retains blead customizations from da4e040f42 and 14e4cec412.)

9 months agoSync with version-0.9924
Steve Hay [Wed, 4 Jul 2018 08:01:56 +0000 (09:01 +0100)] 
Sync with version-0.9924

(There are no material changes to any files in core.)

9 months agoversion is in sync with 0.9923; vxs.inc is not customized
Steve Hay [Wed, 4 Jul 2018 07:51:51 +0000 (08:51 +0100)] 
version is in sync with 0.9923; vxs.inc is not customized

9 months agoUpgrade Time::Local from version 1.25 to 1.28
Steve Hay [Tue, 3 Jul 2018 12:49:05 +0000 (13:49 +0100)] 
Upgrade Time::Local from version 1.25 to 1.28

9 months agoUpgrade Test::Simple from version 1.302133 to 1.302136
Steve Hay [Tue, 3 Jul 2018 12:47:10 +0000 (13:47 +0100)] 
Upgrade Test::Simple from version 1.302133 to 1.302136

9 months agoUpgrade podlators from version 4.10 to 4.11
Steve Hay [Tue, 3 Jul 2018 12:36:04 +0000 (13:36 +0100)] 
Upgrade podlators from version 4.10 to 4.11

(This includes the former blead customization.)

9 months agoUpgrade perlfaq from version 5.021011 to 5.20180605
Steve Hay [Tue, 3 Jul 2018 12:24:10 +0000 (13:24 +0100)] 
Upgrade perlfaq from version 5.021011 to 5.20180605

(Existing blead customizations are retained.)

9 months agoUpgrade Locale-Codes from version 3.56 to 3.57
Steve Hay [Tue, 3 Jul 2018 12:16:14 +0000 (13:16 +0100)] 
Upgrade Locale-Codes from version 3.56 to 3.57

9 months agoUpgrade IPC::Cmd from version 1.00 to 1.02
Steve Hay [Tue, 3 Jul 2018 08:07:35 +0000 (09:07 +0100)] 
Upgrade IPC::Cmd from version 1.00 to 1.02

9 months agoUpgrade IO-Compress from version 2.074 to 2.081
Steve Hay [Tue, 3 Jul 2018 08:06:01 +0000 (09:06 +0100)] 
Upgrade IO-Compress from version 2.074 to 2.081

9 months agoUpgrade File::Temp from version 2.034 to 2.036
Steve Hay [Tue, 3 Jul 2018 08:00:48 +0000 (09:00 +0100)] 
Upgrade File::Temp from version 2.034 to 2.036

9 months agoUpgrade ExtUtils::Manifest from version 1.70 to 1.71
Steve Hay [Tue, 3 Jul 2018 07:53:53 +0000 (08:53 +0100)] 
Upgrade ExtUtils::Manifest from version 1.70 to 1.71

9 months agoExtUtils-Constant is synced with version 0.25
Steve Hay [Tue, 3 Jul 2018 07:49:45 +0000 (08:49 +0100)] 
ExtUtils-Constant is synced with version 0.25

9 months agoUpgrade Digest::SHA from version 6.01 to 6.02
Steve Hay [Tue, 3 Jul 2018 07:40:09 +0000 (08:40 +0100)] 
Upgrade Digest::SHA from version 6.01 to 6.02

9 months agoUpgrade DB_File from version 1.840 to 1.841
Steve Hay [Tue, 3 Jul 2018 07:34:44 +0000 (08:34 +0100)] 
Upgrade DB_File from version 1.840 to 1.841

9 months agoUpgrade Compress::Raw::Zlib from 2.076 to 2.081
Steve Hay [Tue, 3 Jul 2018 07:32:45 +0000 (08:32 +0100)] 
Upgrade Compress::Raw::Zlib from 2.076 to 2.081

9 months agoUpgrade Compress::Raw::Bzip2 from version 2.074 to 2.081
Steve Hay [Tue, 3 Jul 2018 07:32:03 +0000 (08:32 +0100)] 
Upgrade Compress::Raw::Bzip2 from version 2.074 to 2.081

9 months agoDisable optimizer on pp_pack for HP C-ANSI-C on HP-UX 11.11
H.Merijn Brand [Mon, 2 Jul 2018 16:25:16 +0000 (18:25 +0200)] 
Disable optimizer on pp_pack for HP C-ANSI-C on HP-UX 11.11

with optimize levels +O1 and higher:

$ ./miniperl -I./lib -wE'$a = pack "Cn4", 1, 3726, 32, 2'
Character in 'C' format wrapped in pack at -e line 1.

with +O0 (or no -O/+O) all goes well.
The chances that this will ever be fixed are too small to care about.

This was found as the Test::Smoke run on this system created a
log-file of over 350 Mb with 4016149 warnings like the above

9 months agoRegen uni_keywords.h
Karl Williamson [Mon, 2 Jul 2018 14:41:37 +0000 (08:41 -0600)] 
Regen uni_keywords.h

This is a result of 72196ef94b98987bb277d8bf6db6efaacd624c3c,
and of the MD5 for the file it changed needing to be recalculated.

9 months agofix issue in regen/mph under 32 bit builds
Yves Orton [Mon, 2 Jul 2018 13:46:54 +0000 (15:46 +0200)] 
fix issue in regen/mph under 32 bit builds

The multiplication overflows, so perl "helpfully" converts to a float,
which then makes things go horribly, horribly wrong.  `use integer` to the rescue.

9 months agoFix to compile under -DNO_LOCALE
Karl Williamson [Wed, 23 May 2018 21:32:47 +0000 (15:32 -0600)] 
Fix to compile under -DNO_LOCALE

Several problems with this compile option were not caught before 5.28
was frozen.

9 months agoutf8.h: Add assert for utf8n_to_uvchr_buf()
Karl Williamson [Mon, 11 Jun 2018 18:58:25 +0000 (12:58 -0600)] 
utf8.h: Add assert for utf8n_to_uvchr_buf()

The Perl_utf8n_to_uvchr_buf() version of this function has an assert;
this adds it as well to the macro that bypasses the function.

9 months agoperl.h: Add parens around macro arguments
Karl Williamson [Mon, 11 Jun 2018 19:26:24 +0000 (13:26 -0600)] 
perl.h: Add parens around macro arguments

Arguments used within macros need to be parenthesized in case they are
called with an expression.  This commit changes
_CHECK_AND_OUTPUT_WIDE_LOCALE_UTF8_MSG() to do that.
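
A minimal illustration of why the parentheses matter, using hypothetical macros rather than the one changed by this commit:

```c
#include <assert.h>

/* Hypothetical macros: the first omits parentheses around its argument. */
#define SKETCH_DOUBLE_BAD(x)   (x * 2)
#define SKETCH_DOUBLE_GOOD(x)  ((x) * 2)

/* Expands to (a + b * 2): only b is doubled. */
static int sketch_bad(int a, int b)  { return SKETCH_DOUBLE_BAD(a + b); }

/* Expands to ((a + b) * 2): the whole expression is doubled. */
static int sketch_good(int a, int b) { return SKETCH_DOUBLE_GOOD(a + b); }

/* Self-check showing the difference for the argument 1 + 2. */
static void sketch_self_check(void)
{
    assert(sketch_bad(1, 2) == 5);
    assert(sketch_good(1, 2) == 6);
}
```

The unparenthesized form is only wrong when the caller passes an expression, which is why such bugs can go unnoticed for a long time.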

9 months agoregexec.c: Call macro with correct args.
Karl Williamson [Mon, 11 Jun 2018 19:28:53 +0000 (13:28 -0600)] 
regexec.c: Call macro with correct args.

The second argument to this macro is a pointer to the end, as opposed to
a length.
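
The distinction between an end pointer and a length is easy to get wrong because both compile when a cast or an implicit conversion is involved. The sketch below uses hypothetical function names, not the macro fixed here, to show the two calling conventions side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Takes a pointer to the end of the buffer, as the macro in question does.
 * Counts UTF-8 continuation bytes (10xxxxxx) in the half-open range [s, e). */
static size_t sketch_count_cont(const unsigned char *s, const unsigned char *e)
{
    size_t n = 0;

    assert(s <= e);
    for (; s < e; s++)
        if ((*s & 0xC0) == 0x80)
            n++;
    return n;
}

/* A caller that only has a length must convert it to an end pointer. */
static size_t sketch_count_cont_len(const unsigned char *s, size_t len)
{
    return sketch_count_cont(s, s + len);
}
```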

9 months agoPATCH: [perl #133311] BBC GRANTM/Encoding-FixLatin
Karl Williamson [Sat, 30 Jun 2018 21:17:51 +0000 (15:17 -0600)] 
PATCH: [perl #133311] BBC GRANTM/Encoding-FixLatin

This effectively reverts a portion of
a74bb78e4469c9f5ea806b57b155df6265d07975.

I got confused when I wrote that commit, and conflated ASCII and POSIX
regnodes.

9 months agofix stren in PERL_MEM_LOG builds
Jim Cromie [Sat, 30 Jun 2018 02:26:21 +0000 (20:26 -0600)] 
fix stren in PERL_MEM_LOG builds

9 months agoperldeprecation: Clean up text about grapheme delims
Karl Williamson [Fri, 29 Jun 2018 14:29:24 +0000 (08:29 -0600)] 
perldeprecation: Clean up text about grapheme delims

This changes the text to make more sense in light of the fact that the
deprecation has changed to illegality.

9 months ago[MERGE] fixups to Perl_my_setenv()
David Mitchell [Fri, 29 Jun 2018 13:38:10 +0000 (14:38 +0100)] 
[MERGE] fixups to Perl_my_setenv()

9 months agoPerl_my_setenv(): re-indent cpp directive lines
David Mitchell [Fri, 29 Jun 2018 13:30:17 +0000 (14:30 +0100)] 
Perl_my_setenv(): re-indent cpp directive lines

The indentation was all over the place.  Whitespace-only changes apart
from fixing code comments at end of '#endif' lines.

9 months agoPerl_my_setenv: move code comment
David Mitchell [Fri, 29 Jun 2018 13:12:18 +0000 (14:12 +0100)] 
Perl_my_setenv: move code comment

This comment about VMS seems to have drifted over time away from the
ifdef it refers to.

9 months agoPerl_my_setenv(); handle integer wrap
David Mitchell [Fri, 29 Jun 2018 12:37:03 +0000 (13:37 +0100)] 
Perl_my_setenv(); handle integer wrap

RT #133204

Wean this function off int/I32 and onto UV/Size_t.
Also, replace all malloc-ish calls with a wrapper that does
overflow checks.

In particular, it was doing (nlen + vlen + 2) which could wrap when
the combined length of the environment variable name and value
exceeded around 0x7fffffff.

The wrapper check function is probably overkill, but belt and braces...
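
An overflow-checked size computation of the kind described above might look like the following sketch. The function names are hypothetical and this is not the wrapper added to Perl_my_setenv(); it only shows the shape of the check for a "name=value" buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Adds two sizes; returns 0 instead of wrapping when the sum won't fit. */
static int sketch_add_size(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b)
        return 0;
    *out = a + b;
    return 1;
}

/* Builds "name=value" in a fresh buffer, or returns NULL if the combined
 * length (nlen + vlen + 2) would overflow or the allocation fails. */
static char *sketch_env_entry(const char *name, const char *val)
{
    size_t nlen, vlen, total;
    char *buf;

    assert(name != NULL && val != NULL);
    nlen = strlen(name);
    vlen = strlen(val);
    if (!sketch_add_size(nlen, vlen, &total) ||
        !sketch_add_size(total, 2, &total))    /* '=' and trailing NUL */
        return NULL;
    buf = malloc(total);
    if (buf == NULL)
        return NULL;
    memcpy(buf, name, nlen);
    buf[nlen] = '=';
    memcpy(buf + nlen + 1, val, vlen);
    buf[nlen + 1 + vlen] = '\0';
    return buf;
}
```

Using size_t throughout means the check compares against SIZE_MAX rather than the much smaller limit of a signed 32-bit int.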

NB this function has several variant parts, #ifdef'ed by platform
type; I have blindly changed the parts that aren't compiled under linux.

9 months agoutf8.c: Use abbreviations consistently
Karl Williamson [Sun, 20 May 2018 23:09:37 +0000 (17:09 -0600)] 
utf8.c: Use abbreviations consistently

Elsewhere in this function some abbreviations were introduced and used.
This is the one area not using them.

9 months agoRemove some deprecated functions from mathoms.c
Karl Williamson [Fri, 29 Jun 2018 03:58:23 +0000 (21:58 -0600)] 
Remove some deprecated functions from mathoms.c

These have been deprecated since 5.18, and have security issues, as they
can try to read beyond the end of the buffer.

9 months agoIO is actually 1.39
Chris 'BinGOs' Williams [Thu, 28 Jun 2018 12:59:06 +0000 (13:59 +0100)] 
IO is actually 1.39

9 months ago`make distclean` now removes dist/Time-HiRes/xdefine if it's still there.
Nicholas Clark [Thu, 28 Jun 2018 08:11:12 +0000 (10:11 +0200)] 
`make distclean` now removes dist/Time-HiRes/xdefine if it's still there.

The Time::HiRes Makefile *should* remove dist/Time-HiRes/xdefine for
'clean', but it's possible to get a rebuilt checkout into a state where it
can't run distclean, and can't recover until the file is gone.

There's no harm in adding it to the top level 'distclean' target - it should
be gone anyway by then, and miniperl is long gone.

9 months agoAdding a few release managers for this cycle
Sawyer X [Thu, 28 Jun 2018 06:50:24 +0000 (09:50 +0300)] 
Adding a few release managers for this cycle

9 months agoadd data for 5.28.0 release in perlhist
Sawyer X [Wed, 27 Jun 2018 11:09:47 +0000 (14:09 +0300)] 
add data for 5.28.0 release in perlhist

9 months agoUpdate epigraphs with links
Sawyer X [Wed, 27 Jun 2018 11:03:15 +0000 (14:03 +0300)] 
Update epigraphs with links

9 months agoUpdate and bump Module-CoreList
Sawyer X [Wed, 27 Jun 2018 06:41:04 +0000 (09:41 +0300)] 
Update and bump Module-CoreList

9 months agoregen opcodes for 5.29.1
Sawyer X [Tue, 26 Jun 2018 22:05:13 +0000 (01:05 +0300)] 
regen opcodes for 5.29.1

9 months agoFix Module::CoreList tests for 5.29.1
Sawyer X [Tue, 26 Jun 2018 22:04:21 +0000 (01:04 +0300)] 
Fix Module::CoreList tests for 5.29.1

9 months agobump version for 5.29.1
Sawyer X [Tue, 26 Jun 2018 21:37:51 +0000 (00:37 +0300)] 
bump version for 5.29.1

9 months agonew perldelta for 5.29.1
Sawyer X [Tue, 26 Jun 2018 21:36:13 +0000 (00:36 +0300)] 
new perldelta for 5.29.1

9 months agotick off
Sawyer X [Tue, 26 Jun 2018 21:30:17 +0000 (00:30 +0300)] 
tick off

9 months agoUpdate epigraph
Sawyer X [Tue, 26 Jun 2018 21:29:49 +0000 (00:29 +0300)] 
Update epigraph

9 months agoadd new release to perlhist v5.29.0
Sawyer X [Tue, 26 Jun 2018 20:36:02 +0000 (23:36 +0300)] 
add new release to perlhist

9 months agoFinalize perldelta
Sawyer X [Tue, 26 Jun 2018 20:26:54 +0000 (23:26 +0300)] 
Finalize perldelta

9 months agoUpdate Module::CoreList for 5.29.0
Sawyer X [Tue, 26 Jun 2018 20:15:59 +0000 (23:15 +0300)] 
Update Module::CoreList for 5.29.0

9 months agoClean up perldelta
Sawyer X [Tue, 26 Jun 2018 20:10:13 +0000 (23:10 +0300)] 
Clean up perldelta

9 months agoRemove 5.27 entries from Porting/release_schedule.
Abigail [Tue, 26 Jun 2018 17:45:17 +0000 (19:45 +0200)] 
Remove 5.27 entries from Porting/release_schedule.

9 months agoI've committed to do two back-to-back releases.
Abigail [Tue, 26 Jun 2018 17:43:14 +0000 (19:43 +0200)] 
I've committed to do two back-to-back releases.

For December 2018 and January 2019.

9 months agoCorrect release schedule for 5.29/5.30.
James E Keenan [Tue, 26 Jun 2018 00:23:19 +0000 (20:23 -0400)] 
Correct release schedule for 5.29/5.30.

Signed-off-by: Abigail <abigail@abigail.be>
9 months agohandy.h: Remove obsolete comment
Karl Williamson [Sun, 20 May 2018 21:23:38 +0000 (15:23 -0600)] 
handy.h: Remove obsolete comment

9 months agoDon't allow non-graphemes as pattern delimiters
Karl Williamson [Sun, 20 May 2018 18:52:33 +0000 (12:52 -0600)] 
Don't allow non-graphemes as pattern delimiters

This has been deprecated, and scheduled for removal in 5.30.

9 months agotoke.c: Move some code into called function
Karl Williamson [Sun, 20 May 2018 18:52:17 +0000 (12:52 -0600)] 
toke.c: Move some code into called function

It makes more sense for this code to be in the function called, rather
than separated out.

9 months agot/test.pl: Add $|=1;
Karl Williamson [Thu, 26 Apr 2018 18:08:18 +0000 (12:08 -0600)] 
t/test.pl: Add $|=1;

This makes the warning/error messages adjacent to the problematic code.
A bunch of tests already do this individually, but when I asked if there
was a problem in doing it globally, I got no response.

http://nntp.perl.org/group/perl.perl5.porters/249600

There were two failures in the test suite as a result of this change, so
for those files, this commit turns it off.

This commit is being done early in the test cycle to see what other ill
effects it might have.

9 months agomktables: Correct L<> for perluniprops; rmv trail space
Karl Williamson [Mon, 25 Jun 2018 13:25:57 +0000 (07:25 -0600)] 
mktables: Correct L<> for perluniprops; rmv trail space

9 months agot/porting/regen.t: Add test for new uni_keywords.h
Karl Williamson [Sun, 6 May 2018 15:08:06 +0000 (09:08 -0600)] 
t/porting/regen.t: Add test for new uni_keywords.h

9 months agoregen/mk_invlists.pl: Fix outdated comments
Karl Williamson [Sun, 6 May 2018 04:07:55 +0000 (22:07 -0600)] 
regen/mk_invlists.pl: Fix outdated comments

9 months agoregen/mk_invlists.pl: use re 'qr/aa'
Karl Williamson [Sun, 6 May 2018 03:21:45 +0000 (21:21 -0600)] 
regen/mk_invlists.pl: use re 'qr/aa'

This makes sure that all patterns in this file are compiled under /aa.
Doing this can catch bugs.  The bug the previous commit fixes would have
been caught if we did this.

9 months agoregen/mk_invlists.pl: Fix chicken and egg problem
Karl Williamson [Sun, 6 May 2018 02:46:21 +0000 (20:46 -0600)] 
regen/mk_invlists.pl: Fix chicken and egg problem

The problem here is that it was using a regular expression pattern to
determine if a code point is the integer 0.  When a new Unicode release
comes along and adds a new block of decimals, this routine should be run
before the interpreter is compiled for real.  And the pattern won't know
about the new block, so this would fail.

Solve the problem by using only Unicode::UCD to discover this info, and
not a pattern.

9 months agomktables: Add, change some comments
Karl Williamson [Sun, 6 May 2018 01:53:18 +0000 (19:53 -0600)] 
mktables: Add, change some comments

9 months agoutf8.c: Use a more generic enum instead of explicit ptr
Karl Williamson [Sat, 5 May 2018 18:13:37 +0000 (12:13 -0600)] 
utf8.c: Use a more generic enum instead of explicit ptr

This changes, where possible, the reference to an inversion list, from
its specific name, to using an enum value (or a #define to an enum
value) which is an offset into a list of inversion lists.

This seems slightly more robust to me, as we don't have to know the
precise name of the table, but can use an enum which may have #define's
for it to create synonyms.  Some versions of Unicode may not have the
precise name, but regen/mk_invlists.pl creates synonyms where possible,
so the chances of it being undefined go down.
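
The approach of referring to a table by an enum offset, with #define synonyms, can be sketched as below. All names and table contents here are hypothetical and much simplified; they are not the identifiers generated by regen/mk_invlists.pl. Each list is a sorted array of code points at which membership toggles, starting with "in".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list identifiers; each is an offset into sketch_lists[]. */
enum sketch_list_id { SKETCH_DIGIT, SKETCH_UPPER, SKETCH_LIST_COUNT };

/* A synonym costs nothing: it is just another name for the same offset. */
#define SKETCH_DECIMAL_NUMBER SKETCH_DIGIT

static const unsigned sketch_digit[] = { 0x30, 0x3A };  /* '0'..'9' */
static const unsigned sketch_upper[] = { 0x41, 0x5B };  /* 'A'..'Z' */

struct sketch_invlist { const unsigned *starts; size_t len; };

static const struct sketch_invlist sketch_lists[SKETCH_LIST_COUNT] = {
    [SKETCH_DIGIT] = { sketch_digit, 2 },
    [SKETCH_UPPER] = { sketch_upper, 2 },
};

/* Membership test: cp is in the set when an odd number of entries are
 * less than or equal to it.  A linear scan keeps the sketch short. */
static int sketch_in_list(enum sketch_list_id id, unsigned cp)
{
    const struct sketch_invlist *l;
    size_t i, toggles = 0;

    assert(id < SKETCH_LIST_COUNT);
    l = &sketch_lists[id];
    for (i = 0; i < l->len && l->starts[i] <= cp; i++)
        toggles++;
    return (toggles & 1) != 0;
}
```

Callers name the property through the enum (or a synonym), so a renamed or differently spelled table only needs a new #define rather than changes at every use.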

Currently there is an inconsistency in the tables' names.  Some recent
ones all begin with 'PL_'.  That was when I thought these tables were
all going to be public.  But then it turned out that they could just be
defined in one file (utf8.c), so the prefix is probably unnecessary.
Older tables didn't have that, and haven't changed.  I'm not sure how it
will or should turn out.

9 months agoutf8.c: Reorder some initialization code
Karl Williamson [Sat, 5 May 2018 18:01:27 +0000 (12:01 -0600)] 
utf8.c: Reorder some initialization code

This puts the code into various related groups.

9 months agoutf8.c: Fix \p{} to work on old Unicodes
Karl Williamson [Sat, 5 May 2018 17:38:18 +0000 (11:38 -0600)] 
utf8.c: Fix \p{} to work on old Unicodes

This change to use one #define instead of a synonym causes the code to
work unchanged on any Unicode version.  The synonym isn't defined in
very old Unicodes, so this wouldn't compile for them.

9 months agoutf8.c: qr/\p{}/ Handle Unihan numeric properties
Karl Williamson [Sat, 5 May 2018 17:28:09 +0000 (11:28 -0600)] 
utf8.c: qr/\p{}/ Handle Unihan numeric properties

The Unihan data base is not shipped with perl due to its size.  But we
allow someone to copy its files into the unicore directory and recompile
perl in order to get access to its properties.  Some of those properties
are numeric, which, like the nv property, require special handling in
utf8.c.  This commit adds that handling.

9 months agomktables: Handle cjkiicore properly
Karl Williamson [Sat, 5 May 2018 04:25:54 +0000 (22:25 -0600)] 
mktables: Handle cjkiicore properly

This property is not normally compiled by perl, but an installation may
choose to use it.  It was failing some tests because this is a special
property that is like a perl dual-var.  It is both binary, and
non-binary, and commit 346f9bfbe12 forgot that.

9 months agoregen/mk_invlists.pl: Fix-ups for early Unicode versions
Karl Williamson [Tue, 1 May 2018 23:26:42 +0000 (17:26 -0600)] 
regen/mk_invlists.pl: Fix-ups for early Unicode versions

In some of these, certain properties aren't defined yet, so have no
entries.  Just add a check for that, and compensate.

9 months agoregcomp.c: Simplify
Karl Williamson [Tue, 1 May 2018 22:42:29 +0000 (16:42 -0600)] 
regcomp.c: Simplify

Under /a pattern matching, the matches of the [:posix:] classes are
restricted to the ASCII range.  Previously, in a time/space trade-off
that favored space, we created the list of matching characters at
pattern compilation time by ANDing the full-range Posix class with the
set of ASCII characters.

But now, the tables for just the ASCII-range classes are generated
anyway, so there's no need to do that compilation-time intersection.
This slightly simplifies the code.

9 months agomktables: Add guard against Unicode breakage
Karl Williamson [Tue, 1 May 2018 21:47:11 +0000 (15:47 -0600)] 
mktables: Add guard against Unicode breakage

This adds a check that a new Unicode version doesn't create a rational
number that is too close to a current rational for our existing
floating point precision.  Should this happen, we can increase the
precision we use.

9 months agoAdd tests for qr/\p{}/
Karl Williamson [Tue, 1 May 2018 21:24:19 +0000 (15:24 -0600)] 
Add tests for qr/\p{}/

This adds tests for nv=integer, where 'integer' is expressed in %e.

9 months agoutf8.c: Handle qr!\p{nv=6/8}!
Karl Williamson [Tue, 1 May 2018 01:05:54 +0000 (19:05 -0600)] 
utf8.c: Handle qr!\p{nv=6/8}!

I thought this worked before, but it turns out it never did.  This
commit allows the rational number specified in looking up the Numeric
Value property to not be in lowest possible terms.  Unicode even
furnishes some of its data in non-lowest form, so we should accept
this.
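
One way to accept a rational that is not in lowest terms is to reduce it before the lookup. The sketch below shows only that reduction, with hypothetical function names; whether utf8.c reduces the fraction or handles it some other way is not stated in this message.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned long sketch_gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduces *num / *den to lowest terms in place, e.g. 6/8 becomes 3/4. */
static void sketch_reduce(unsigned long *num, unsigned long *den)
{
    unsigned long g;

    assert(*den != 0);
    g = sketch_gcd(*num, *den);   /* nonzero because *den is nonzero */
    *num /= g;
    *den /= g;
}
```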

9 months agoutf8.c: Use \p{nv=float}
Karl Williamson [Mon, 30 Apr 2018 16:39:46 +0000 (10:39 -0600)] 
utf8.c: Use \p{nv=float}

Now that the float data is available to us (in the previous commit), we
can take advantage of it, and avoid swash creation.

We just use the perl atof() to convert the input string to an NV, and
then convert back to a string, but in guaranteed canonical form.  Then
we look that up.

9 months agoregen/mk_invlists.pl: Add \p{nv=float} data
Karl Williamson [Thu, 26 Apr 2018 18:29:54 +0000 (12:29 -0600)] 
regen/mk_invlists.pl: Add \p{nv=float} data

The previous commit revised how nv=float is handled.  This commit adds
data for handling that to charclass_invlists.h, so that the next commit
can use that and avoid swash creation.

9 months agoRevise \p{nv=float} lookup
Karl Williamson [Mon, 30 Apr 2018 03:08:37 +0000 (21:08 -0600)] 
Revise \p{nv=float} lookup

The Numeric Value property allows one to find all code points that have
a certain numeric value.  An example would be to match against any
character in any of the world's scripts which is effectively equivalent
to the digit zero.

It is documented that we accept either integers (like \p{nv=9}) or
rationals (like \p{nv=1/2}).  But we also accept floating point
representations in case a conversion to numeric has happened.  I think
it is right that we not document these and their vagaries.  One reason
is that Unicode might someday create a new rational number that, to the
precision we currently accept, is indistinguishable from an existing
one, so that we would have to increase the precision.

But there was a bug I introduced years ago.  I thought that in order for
a float to be considered to match a close rational, 3 significant
digits of precision would be needed, like .667 to match 2/3.  That still
seems reasonable.  But I didn't implement that concept.  Instead, prior
to this commit, it was 3 (not necessarily significant) digits, so that
.001 would match 1/160.

This commit corrects that, and makes the lookup simpler.  mktables will
use sprintf %e to get the number normalized and having the 3 significant
digits required.  At runtime, a floating number is normalized using the
same format, and the result looked up in a hash.  This eliminates the
need to worry about matching within some epsilon.
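
The normalize-then-compare idea can be illustrated in C, whose %e conversion behaves like the one used by sprintf in perl. The "%.2e" precision is an assumption chosen here to give 3 significant digits; the exact format string used by mktables is not given in this message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_KEY_LEN 32

/* Formats value with 3 significant digits in exponent notation, giving a
 * canonical string that could serve as a hash key. */
static void sketch_nv_key(double value, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%.2e", value);

    assert(n > 0 && (size_t)n < buflen);
    (void)n;   /* silence unused-variable warnings when NDEBUG is set */
}

/* Two numbers match when their normalized strings are identical, so no
 * epsilon comparison is needed. */
static int sketch_same_key(double a, double b)
{
    char ka[SKETCH_KEY_LEN], kb[SKETCH_KEY_LEN];

    sketch_nv_key(a, ka, sizeof ka);
    sketch_nv_key(b, kb, sizeof kb);
    return strcmp(ka, kb) == 0;
}
```

With this scheme .667 and 2/3 normalize to the same string, while .001 and 1/160 do not, because 1/160 is 6.25 times ten to the minus 3 once leading zeros stop counting as digits.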

Further simplifications in utf8_heavy.pl are achieved by making a more
precise definition as to what an acceptable number looks like, so we
don't have to check later to see if what matched really was one.

9 months agoregen/mk_invlists.pl: Add to list of props to keep together
Karl Williamson [Thu, 26 Apr 2018 03:18:59 +0000 (21:18 -0600)] 
regen/mk_invlists.pl: Add to list of props to keep together

Using the same idea as pp_hot.c, this attempts to keep the Unicode
properties actually used by perl together, so that paging in one is likely
to page in others.  A few were omitted prior to this commit.