This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Karl Williamson [Fri, 30 Sep 2016 03:03:30 +0000 (21:03 -0600)]
APItest/t/utf8.t: Fix EBCDIC test
Unlike on ASCII platforms, it may take more than one byte of a partial
character to determine if it represents a code point that needs 32 or
more bits to represent. This fixes the test to account for that.
Karl Williamson [Fri, 30 Sep 2016 03:01:36 +0000 (21:01 -0600)]
utf8.c: Add missing type specifier to declaration
This code was missing a STRLEN specifier; only compiled on EBCDIC.
Karl Williamson [Thu, 29 Sep 2016 17:51:41 +0000 (11:51 -0600)]
APItest/t/utf8.t: Skip some tests if major one fails
If the patched test fails, the subsequent ones in the loop are
meaningless, so don't execute them.
Karl Williamson [Thu, 29 Sep 2016 17:50:51 +0000 (11:50 -0600)]
APItest/t/utf8.t: Fix typo
This was a typo in the UTF-EBCDIC for a code point, so it affected only
tests on that platform.
Karl Williamson [Thu, 29 Sep 2016 02:42:30 +0000 (20:42 -0600)]
utf8n_to_uvchr() Fix EBCDIC bug with overlongs
The comment removed in this commit was wrong, and so was the code it
described. On EBCDIC platforms, there are malformations that need to be
converted from Unicode to native. When I wrote that I wasn't thinking
about overlongs, which can evaluate to any code point. The new tests in
d566bd20c27a46aecd668d2f739b9515f46ac74f caught this.
Dan Collins [Thu, 29 Sep 2016 15:23:00 +0000 (11:23 -0400)]
RT #116250: Fail the right number of tests on timeout
Committer: Correct syntax error. Correct RT # in commit message.
David Mitchell [Thu, 29 Sep 2016 15:03:05 +0000 (16:03 +0100)]
OP_SASSIGN: make op_first==op_last for UNOP
Occasionally (e.g. $x ||= 1) an OP_SASSIGN operator only has a single
arg. The previous two commits made OP_SASSIGN always be allocated as a
BINOP, and if necessary, set op_last to NULL when there's only a single
arg.
This commit instead sets op_last equal to op_first for this case (similar
to how a LISTOP with a single arg is handled). This removes the need for
special handling in S_finalize_op().
Reini Urban [Thu, 29 Sep 2016 15:20:52 +0000 (16:20 +0100)]
sassign was used as UNOP, optimize {or,and,dor}assign
[ DAPM:
To clarify: OP_SASSIGN normally has two args, and is allocated as a
BINOP. However, in something like $x ||= 1, the optree looks like:
4 <|> orassign(other->5) vK/1 ->7
- <1> ex-rv2sv sKRM/1 ->4
3 <#> gvsv[*x] s ->4
6 <1> sassign sK/BKWARD,1 ->7
5 <$> const[IV 1] s ->6
Here the sassign only has a single arg, since the other arg is already
left on the stack after orassign has executed.
In this case, perl was allocating the op as a UNOP, which causes
problems with any code which assumes op_last contains a valid pointer.
This commit changes it so that the op is always allocated as a BINOP,
even when it only has one arg. In that case, it sets op_last to NULL
(but see the next commit).
Setting OPpASSIGN_BACKWARDS earlier is just a simplification of the
code.
]
In newASSIGNOP with {or,and,dor}assign, the rhs was wrongly compiled as UNOP sassign.
It caused DEBUGGING corruption in the op finalizer for sassign (first not
pointing to last without sibling) and added a random chunk to the last field.
It was never used though, as only {or,and,dor}assign used this op_other op.
{or,and,dor}assign needs the sassign with OPpASSIGN_BACKWARDS, set it
directly, not later in the LOGOP.
finalize_op needs a special case for it, as the last is empty there.
Reini Urban [Thu, 29 Sep 2016 13:30:27 +0000 (14:30 +0100)]
sassign is wrongly declared as BASEOP, not BINOP.
[ DAPM:
To clarify: OP_SASSIGN is always allocated as a BINOP (or occasionally
as a UNOP - see the next commit), but is listed as a BASEOP in
regen/opcodes. Because of this, various bits of code that rely on e.g.
PL_opargs[] have to be special-cased for OP_SASSIGN. This commit changes
the entry in regen/opcodes to list it as BINOP, and removes the
special-casing.
I've also added a temporary workaround marked by XXX to make the commit
work under PERL_OP_PARENT, which is the default now. This will be
removed in a couple of commits' time.
]
This was wrong from the very beginning:
added with 79072805bf (lwall, perl 5.0 alpha 2, 1993) with class s, not 0,
but missing the 2 S S args, which are present in aassign.
Changed to BASEOP with db173bac9b6de7d by mbeattie in 1997.
The '# sassign is special-cased for op class' comment is suspicious.
Fix it in ck_sassign also, it is created as BINOP in newASSIGNOP.
In 202206897, dapm (2014) complained about it also. Remove some special
cases where it should be a BINOP but was not.
David Mitchell [Wed, 28 Sep 2016 12:40:34 +0000 (13:40 +0100)]
undef $0 shouldn't warn about $0
RT #123910
$ perl -we'undef $0'
Use of uninitialized value $0 in undef operator at -e line 1.
Generally, undef should ignore its arg when determining which var was
undef: only magic will trigger an undef warning.
David Mitchell [Wed, 28 Sep 2016 10:27:38 +0000 (11:27 +0100)]
OP_MULTIDEREF: ignore customised delete/exists
We already skip optimising to a multideref if an aelem op has a customised
PL_check[] routine; extend this skip to OP_EXISTS and OP_DELETE too.
See http://nntp.perl.org/group/perl.perl5.porters/227545
James Raspass [Wed, 15 Jul 2015 22:46:20 +0000 (23:46 +0100)]
Speed up compilation of overload.pm a smidge.
Measured with the following crude perl script calling perf. Perl
is in there to get a rough baseline cost of starting perl:
print 'PERL', (`perf stat -r100 perl -e 1 2>&1`)[10];
print 'OLD ', (`perf stat -r100 perl lib/overload.pm 2>&1`)[10];
print 'NEW ', (`perf stat -r100 perl lib/overload2.pm 2>&1`)[10];
Produced the following results on my machine:
PERL 5,800,051 instructions # 1.05 insns per cycle ( +- 0.06% )
OLD 14,818,995 instructions # 1.16 insns per cycle ( +- 0.03% )
NEW 14,696,974 instructions # 1.16 insns per cycle ( +- 0.03% )
While the numbers did fluctuate between runs, the new code was
consistently faster.
David Mitchell [Tue, 27 Sep 2016 15:43:30 +0000 (16:43 +0100)]
Cwd.xs: avoid blib better while building
RT # 125603
There's an old line in Cwd's Makefile.PL:
BEGIN { @INC = grep {!/blib/} @INC }
This was added 12 years ago to solve a problem with a static perl and
building a newer Cwd (but no-one's quite sure what *exactly* the issue was
any more).
However, this breaks building perl under a directory that has 'blib'
in the pathname. This commit restricts the grep to excluding exactly
blib/lib and blib/arch.
This should hopefully still work around the static build issue, while no
longer breaking perl builds.
David Mitchell [Tue, 27 Sep 2016 14:13:56 +0000 (15:13 +0100)]
Porting/bench.pl: explain what PUT means
'PUT' is used in code comments, a function name, an error message, and
verbose output, without ever saying what it stands for. Rectify this.
David Mitchell [Tue, 27 Sep 2016 13:50:25 +0000 (14:50 +0100)]
Eliminate xpad_cop_seq from _xnvu union
PVNV's used to be used to hold pad names (like '$lex'), but aren't used
for this purpose any more. So eliminate the xpad_cop_seq part of the
union.
Since S_scan_subst() was using xnv_u.xpad_cop_seq.xlow to store a
temporary line count, add a new union member for that.
The main usage of this field on CPAN is to define
COP_SEQ_RANGE_LOW()-style macros, so if the module is still using
xpad_cop_seq for that purpose, it's already broken.
Chris 'BinGOs' Williams [Tue, 27 Sep 2016 12:43:59 +0000 (13:43 +0100)]
Update podlators to CPAN version 4.08
[DELTA]
podlators 4.08 (2016-09-24)
[Pod::Man] Partially revert change in 4.00 to require the name option
(--name to pod2man) when generating man pages from standard input.
Historically, pod2man silently tolerated this, and there turned out to
be a lot of software that depended on this, making the change too
disruptive. Instead, silently set the man page title to STDIN in this
case, but warn about it in the documentation. (#117990)
[Pod::Man] Fix rendering bug for "TRUE (1)", which was recognized as
needing small caps and then erroneously as a man page reference,
resulting in escaped nroff. (Found by Dan Jacobson with the
XML::LibXML::Element man page.) (Debian Bug#836831)
[Pod::Man] Fix rendering bug causing "\s0(1)" to be mistakenly marked
as a man page reference, later confusing backslash escaping.
[Pod::Man] Add new lquote and rquote options (and corresponding
--lquote and --rquote flags to pod2man) to set the left and right
quotes for C<> text independently. (#103298)
Remove test for nested L<> markup, since an upcoming version of
Pod::Simple will drop support for this. (#114075)
Chris 'BinGOs' Williams [Tue, 27 Sep 2016 12:38:24 +0000 (13:38 +0100)]
Update HTTP-Tiny to CPAN version 0.068
[DELTA]
0.068 2016-09-23 16:10:03-04:00 America/New_York
- No changes from 0.067-TRIAL.
0.067 2016-09-14 11:43:14-04:00 America/New_York (TRIAL RELEASE)
[FIXED]
- Includes redirect history when issuing a 599 internal error.
0.065 2016-09-09 22:42:43-04:00 America/New_York (TRIAL RELEASE)
[TESTS]
- Try harder to clean up environment in t/140_proxy.t (needed for VMS)
Chris 'BinGOs' Williams [Tue, 27 Sep 2016 12:35:32 +0000 (13:35 +0100)]
Update Time-HiRes version in Maintainers.pl
David Mitchell [Tue, 27 Sep 2016 11:59:01 +0000 (12:59 +0100)]
S_sv_2iuv_common(): optimise single digit strings
When converting a POK SV to an IOK SV, short-cut the relatively
common case of a string that is only one char long and consists of a
single digit, e.g. "0". Thus skipping all the floating-point, infinity,
whitespace etc complexity.
David Mitchell [Tue, 27 Sep 2016 11:11:50 +0000 (12:11 +0100)]
pp_leaveloop(): rename local vars
For internal consistency and for consistency with other pp_leave()
functions, rename oldsp to base and mark/MARK to oldsp.
Should be no functional difference.
David Mitchell [Tue, 27 Sep 2016 10:52:07 +0000 (11:52 +0100)]
padrange, aelemfast: use label for private bits
Change the output of Concise etc:
$ perl -MO=Concise -e'my (@a,$b,$c); $a[5];'
from:
3 <0> padrange[@a:1,2; $b:1,2; $c:1,2] vM/LVINTRO,3
...
5 <0> aelemfast_lex[@a:1,2] sR/5
to:
3 <0> padrange[@a:1,2; $b:1,2; $c:1,2] vM/LVINTRO,range=3
...
5 <0> aelemfast_lex[@a:1,2] sR/key=5
See http://nntp.perl.org/group/perl.perl5.porters/220208.
David Mitchell [Tue, 27 Sep 2016 10:44:42 +0000 (11:44 +0100)]
OP_AVHVSWITCH: make op_private bits 0..1 symbolic
Add OPpAVHVSWITCH_MASK and make Concise etc display the offset as
/offset=2 rather than /2.
David Mitchell [Tue, 27 Sep 2016 08:51:45 +0000 (09:51 +0100)]
fixup some AV API pod descriptions.
In particular:
* improve some of the "perl equivalent" entries; for example
av_store() is *not* like $myarray[$key] = $val, since it replaces the
stored SV with a different SV, rather than just updating the current
SV's value.
* Also change the "perl equivalent" variable names to match the function
parameter names, e.g. $key rather than $idx.
* Don't use 'delete' as a perl equivalent, since delete is discouraged on
arrays.
* You don't *have* to use av_store() to change undef values inserted by
av_unshift; e.g. you could do av_fetch() then modify the returned
undef SV; so just delete that sentence
David Mitchell [Tue, 27 Sep 2016 08:27:30 +0000 (09:27 +0100)]
perldelta for PADOFFSET changes
David Mitchell [Mon, 26 Sep 2016 14:56:08 +0000 (15:56 +0100)]
make PL_ pad vars be of type PADOFFSET
Now that PADOFFSET is signed, make
PL_comppad_name_fill
PL_comppad_name_floor
PL_padix
PL_constpadix
PL_padix_floor
PL_min_intro_pending
PL_max_intro_pending
be of type PADOFFSET rather than I32, to match the rest of the pad
interface.
At the same time, change various I32 local vars in pad.c functions to be
PADOFFSET.
David Mitchell [Mon, 26 Sep 2016 14:22:25 +0000 (15:22 +0100)]
make PADOFFSET be SSize_t
Currently it's defined as U32 or U64 depending on whether pointers are
32 bit or 64-bit, which is just a long-winded way of doing
typedef Size_t PADOFFSET
Change it to
typedef SSize_t PADOFFSET
Making it signed makes it easier to handle comparisons against PADOFFSET
values that can be -1, such as PL_comppad_name_floor (which will be fixed
in the next commit).
David Mitchell [Mon, 26 Sep 2016 14:04:21 +0000 (15:04 +0100)]
remove a bunch of XXX's from pad.c
When in 2002 I moved a bunch of code from op.c etc into a new file,
pad.c, I left this comment at the top:
/* XXX DAPM
* As of Sept 2002, this file is new and may be in a state of flux for
* a while. I've marked things I intent to come back and look at further
* with an 'XXX DAPM' comment.
*/
Well, 14 years have passed since then, and if I was going to do any of
this stuff I would probably have done it by now, or someone else would.
So this commit removes the XXX's.
David Mitchell [Mon, 26 Sep 2016 13:59:26 +0000 (14:59 +0100)]
pad.c comments: clarify PERL_PADSEQ_INTRO
David Mitchell [Wed, 21 Sep 2016 08:22:13 +0000 (09:22 +0100)]
add a test for gv_try_downgrade()
Previously, making gv_try_downgrade() just immediately return didn't cause
any tests to fail.
David Mitchell [Tue, 20 Sep 2016 08:45:07 +0000 (09:45 +0100)]
fix builds under USE_PAD_RESET
It had suffered some bitrot.
Karl Williamson [Wed, 21 Sep 2016 22:15:08 +0000 (16:15 -0600)]
Centralize definitions of MIN, MAX
Instead of having each file have them, keep them in handy.h, but only
for core compilations.
Karl Williamson [Mon, 26 Sep 2016 04:04:08 +0000 (22:04 -0600)]
Add is_utf8_fixed_width_buf_flags() and use it
This encodes a simple pattern that may not be immediately obvious to
someone needing it. If you have a fixed-size buffer that is full of
purportedly UTF-8 bytes, is it valid or not? It's easy to do, as shown
in this commit. The file test operators -T and -B can be simplified by
using this function.
Karl Williamson [Mon, 19 Sep 2016 15:59:32 +0000 (09:59 -0600)]
Add API Unicode handling functions
These functions are all extensions of the is_utf8_string_foo()
functions, that restrict the UTF-8 recognized as valid in various ways.
There are named ones for the two definitions that Unicode makes, and
foo_flags ones for more custom restrictions.
The named ones are implemented as tries, while the flags ones provide
complete generality
Karl Williamson [Sun, 25 Sep 2016 16:14:50 +0000 (10:14 -0600)]
APItest/t/utf8.t: Rename variable
The new name is clearer, which will matter more in the next commit
Karl Williamson [Tue, 20 Sep 2016 16:12:45 +0000 (10:12 -0600)]
XS-APItest/t/utf8.t: Add some tests
These will help in testing the string functions coming in the next
commit. These add problematic code points to the first testing loop.
As a result some of the tests in the final loop may be redundant, but
since this .t is quick to run, I chose not to investigate and remove any
such.
Karl Williamson [Thu, 15 Sep 2016 01:57:46 +0000 (19:57 -0600)]
Move #define to different header
Instead of having a comment in one header pointing to the #define in the
other, remove the indirection and just have the #define itself where it
is needed.
Karl Williamson [Mon, 19 Sep 2016 15:52:57 +0000 (09:52 -0600)]
perlapi: Clarifications, nits in Unicode support docs
This also does a white space change to inline.h
Karl Williamson [Thu, 15 Sep 2016 15:06:39 +0000 (09:06 -0600)]
perlapi: Minor clarifications to sv_utf8_decode
James E Keenan [Sun, 25 Sep 2016 23:48:52 +0000 (19:48 -0400)]
Time-HiRes: bring up-to-date with CPAN.
The ext3/ext2 filesystems do not have subsecond resolution, therefore skip the
t/utime.t test. [rt.cpan.org #116127]
Karl Williamson [Wed, 21 Sep 2016 22:12:50 +0000 (16:12 -0600)]
podcheck.t: perlepigraphs: don't note too long verbatims
These epigraphs may not be foldable properly. Instead of warning when
new ones are added, ignore this category entirely for this pod.
Jarkko Hietaniemi [Sun, 25 Sep 2016 00:23:36 +0000 (20:23 -0400)]
macos Sierra (10.12) hints comment updates.
Chris 'BinGOs' Williams [Sat, 24 Sep 2016 22:56:18 +0000 (23:56 +0100)]
Update for the Module-CoreList that is on the CPAN
Chris 'BinGOs' Williams [Sat, 24 Sep 2016 22:55:25 +0000 (23:55 +0100)]
Bump Module-CoreList version for bc46539a
Stevan Little [Sat, 24 Sep 2016 20:22:01 +0000 (22:22 +0200)]
update Module::CoreList
Stevan Little [Sat, 24 Sep 2016 19:59:52 +0000 (21:59 +0200)]
updating opcodes (version number mostly)
Stevan Little [Sat, 24 Sep 2016 19:57:41 +0000 (21:57 +0200)]
bumping the version number
Karl Williamson [Wed, 21 Sep 2016 15:46:46 +0000 (09:46 -0600)]
utf8.c: #define MIN if not already defined
This is only used on EBCDIC.
Dagfinn Ilmari Mannsåker [Wed, 21 Sep 2016 14:38:42 +0000 (15:38 +0100)]
Change sv_setpvn(…, "…", …) to sv_setpvs(…, "…")
The dual-life dists affected use Devel::PPPort, so can safely use
sv_setpvs() even though it wasn't added until Perl v5.10.0.
Steven Humphrey [Tue, 20 Sep 2016 11:42:39 +0000 (12:42 +0100)]
Fix typo in perlrun.pod
s/and/any/
perl -c documentation has a typo when talking about BEGIN blocks.
Steven Humphrey is now a Perl author.
For: RT #129313
Stevan Little [Tue, 20 Sep 2016 20:25:11 +0000 (22:25 +0200)]
new perldelta
Stevan Little [Tue, 20 Sep 2016 20:24:16 +0000 (22:24 +0200)]
known pod issues
Stevan Little [Tue, 20 Sep 2016 20:05:42 +0000 (22:05 +0200)]
ticking the release
Stevan Little [Tue, 20 Sep 2016 19:59:20 +0000 (21:59 +0200)]
update epigraphs.pod
Sawyer X [Tue, 20 Sep 2016 20:28:42 +0000 (22:28 +0200)]
typo
Stevan Little [Tue, 20 Sep 2016 12:59:00 +0000 (14:59 +0200)]
add new release to perlhist
Stevan Little [Tue, 20 Sep 2016 12:54:37 +0000 (14:54 +0200)]
finalize the perldelta
Stevan Little [Tue, 20 Sep 2016 12:34:02 +0000 (14:34 +0200)]
Update Module::CoreList for 5.25.5
Karl Williamson [Mon, 19 Sep 2016 21:37:52 +0000 (15:37 -0600)]
utf8.c: Fix bug in new _is_utf8_char_helper() function
This bug was exposed by the tests that I'm still developing
Father Chrysostomos [Sun, 18 Sep 2016 19:11:28 +0000 (12:11 -0700)]
Make regexp_nonull.t test patterns without null
It was only testing matches against strings without a trailing
null byte. Now it also tests compilation of patterns without
a trailing null byte.
Yves Orton [Sat, 17 Sep 2016 18:14:53 +0000 (20:14 +0200)]
regcomp.c: S_concat_pat: guard against missing trailing nulls
The regex engine expects the pattern to have a null byte at
SvEND(pat), but is not guaranteed to receive such a pattern
when it is called, so S_concat_pat should guard against this
case. It turns out this is only an issue when there is exactly
one "argument" to the pattern. (Consider concatenation rules, etc).
Yves Orton [Sat, 17 Sep 2016 18:13:23 +0000 (20:13 +0200)]
sv.c: sv_grow: newlen cannot be smaller than SvCUR()
This expression dates back to about 2003 or so, and as
far as I can tell is no longer necessary.
Yves Orton [Sat, 17 Sep 2016 18:12:26 +0000 (20:12 +0200)]
doop.c: use sv_setpvn() instead of sv_setpvs()
Lukas Mai [Mon, 19 Sep 2016 16:13:12 +0000 (18:13 +0200)]
perldelta: grammar
James E Keenan [Mon, 19 Sep 2016 12:19:38 +0000 (08:19 -0400)]
Correct one formatting error in perldelta.pod.
This was causing a failure in t/porting/podcheck.t.
Stevan Little [Sun, 18 Sep 2016 21:53:27 +0000 (23:53 +0200)]
working on perldelta some more
Father Chrysostomos [Mon, 19 Sep 2016 03:28:58 +0000 (20:28 -0700)]
Father Chrysostomos [Mon, 19 Sep 2016 03:27:11 +0000 (20:27 -0700)]
perldelta for #129287 / b43665
Father Chrysostomos [Mon, 19 Sep 2016 03:24:00 +0000 (20:24 -0700)]
perldelta: Remove duplicate entry; fix typo
I had already documented the perlinterp change.
Father Chrysostomos [Mon, 19 Sep 2016 03:20:06 +0000 (20:20 -0700)]
bop.t: Delete $SIG{__WARN__}
It is only needed for one block of tests. Leaving the handler in
place makes it harder to add temporary diagnostics elsewhere in
the code. (Where did my warning go? Hey, why is ‘warn’ not
working?????!!!! :-)
Father Chrysostomos [Mon, 19 Sep 2016 03:17:08 +0000 (20:17 -0700)]
[perl #129287] Make UTF8 & append null
The & and &. operators were not appending a null byte to the string
in utf8 mode.
(The internal function that they use is the same. I used &. in the
test just because its intent is clearer.)
Father Chrysostomos [Sun, 18 Sep 2016 19:19:13 +0000 (12:19 -0700)]
regexp.t: Update comments about column 1
Years out of date!
Stevan Little [Sun, 18 Sep 2016 15:05:01 +0000 (17:05 +0200)]
working on perldelta
Aristotle Pagaltzis [Sun, 18 Sep 2016 09:53:20 +0000 (11:53 +0200)]
perlfunc: re-document old split() @_ side effect
Lukas Mai [Sun, 18 Sep 2016 07:50:16 +0000 (09:50 +0200)]
perlsub: scalar split no longer clobbers @_ (RT #129297)
Karl Williamson [Sun, 18 Sep 2016 03:07:29 +0000 (21:07 -0600)]
perldelta for new Unicode-handling function.
Karl Williamson [Thu, 15 Sep 2016 01:49:52 +0000 (19:49 -0600)]
perlapi: Clarify docs for some is_utf8_foo functions
Karl Williamson [Thu, 15 Sep 2016 00:54:23 +0000 (18:54 -0600)]
Add isUTF8_CHAR_flags() macro
This is like the previous 2 commits, but the macro takes a flags
parameter so any combination of the disallowed flags may be used. The
others, along with the original isUTF8_CHAR(), are the most commonly
desired strictures, and use an implementation of a, hopefully, inlined
trie for speed. This is for generality and the major portion of its
implementation isn't inlined.
Karl Williamson [Mon, 12 Sep 2016 22:52:41 +0000 (16:52 -0600)]
Add macro for Unicode Corrigendum #9 strict
This macro follows Unicode Corrigendum #9 to allow non-character code
points. These are still discouraged but not completely forbidden.
It's best for code that isn't intended to operate on arbitrary other
code text to use the original definition, but code that does things,
such as source code control, should change to use this definition if it
wants to be Unicode-strict.
Perl can't adopt C9 wholesale, as it might create security holes in
existing applications that rely on Perl keeping non-chars out.
Karl Williamson [Mon, 12 Sep 2016 19:38:22 +0000 (13:38 -0600)]
Add macro for determining if UTF-8 is Unicode-strict
Karl Williamson [Mon, 12 Sep 2016 20:30:15 +0000 (14:30 -0600)]
perlapi: Clarify isUTF8_CHAR()
Karl Williamson [Wed, 14 Sep 2016 23:09:51 +0000 (17:09 -0600)]
inline.h: Add 'const's; avoid hiding outer variable
This changes some formal parameters to be const, and avoids reusing the
same variable name within an inner block, to avoid confusion
Karl Williamson [Thu, 8 Sep 2016 17:34:15 +0000 (11:34 -0600)]
Add tests for is_utf8_valid_partial_char_flags()
Karl Williamson [Mon, 12 Sep 2016 04:18:57 +0000 (22:18 -0600)]
Add is_utf8_valid_partial_char_flags()
This is a generalization of is_utf8_valid_partial_char to allow the
caller to automatically exclude things such as surrogates.
Karl Williamson [Sun, 11 Sep 2016 15:40:37 +0000 (09:40 -0600)]
perlapi: Reword description of is_utf8_valid_partial_char
Karl Williamson [Sun, 11 Sep 2016 04:27:37 +0000 (22:27 -0600)]
Fix off-by-one error in is_utf8_valid_partial_char()
Karl Williamson [Sun, 11 Sep 2016 04:24:48 +0000 (22:24 -0600)]
handy.h: Comment memEQs and memNEs
Karl Williamson [Sun, 11 Sep 2016 04:18:59 +0000 (22:18 -0600)]
utf8.c: Add some UNLIKELYs
Karl Williamson [Sun, 11 Sep 2016 04:18:16 +0000 (22:18 -0600)]
utf8.h: Add comment, white-space changes
Karl Williamson [Sun, 11 Sep 2016 04:09:44 +0000 (22:09 -0600)]
Enhance and rename is_utf8_char_slow()
This changes the name of this helper function and adds a parameter and
functionality to allow it to exclude problematic classes of code
points, the same ones excludable by utf8n_to_uvchr(), like surrogates
or non-character code points.
Karl Williamson [Thu, 8 Sep 2016 04:22:01 +0000 (22:22 -0600)]
APItest/t/utf8.t: Add tests
These fill in gaps in current testing. In particular all the overlong
UTF-8 possible edge cases are now tested.
Karl Williamson [Thu, 8 Sep 2016 04:14:38 +0000 (22:14 -0600)]
APItest/utf8.t: Some clean up
This adds some information to test names, does some white-space
alignments, changes one test to stress things slightly more, and adds a
'use bytes' because in some cases the desired byte-oriented output was
not showing up.
Karl Williamson [Mon, 5 Sep 2016 03:32:08 +0000 (21:32 -0600)]
Test isUTF8_CHAR()
Karl Williamson [Sun, 11 Sep 2016 04:19:42 +0000 (22:19 -0600)]
lib/warnings/utf8: Reinstate warning test
I removed this in
35f8c9bd0ff4f298f8bc09ae9848a14a9667a95a, thinking the
warning was no longer being raised. But in fact, it was showing a bug,
now fixed by the previous commit.
Karl Williamson [Sun, 11 Sep 2016 03:15:04 +0000 (21:15 -0600)]
Revamp overlong handling in is_utf8_char_slow, fixing a bug
This combines EBCDIC and ASCII branches as much as possible, and fixes a
bug that showed up only on EBCDIC platforms, and 64-bit ASCII ones for
the highest overlong, where it could erroneously conclude that a
sequence was an overlong.
Tests are coming in a future commit.
Karl Williamson [Sun, 11 Sep 2016 03:06:39 +0000 (21:06 -0600)]
utf8.c: Fix typo in comment, add some comments
Karl Williamson [Sat, 10 Sep 2016 15:00:03 +0000 (09:00 -0600)]
utf8.c: Extract duplicate code to common fcn
Actually the code isn't quite duplicate, but should be because one
instance is wrong. This failure would only show up on EBCDIC platforms.
Tests are coming in a future commit.
Karl Williamson [Sat, 10 Sep 2016 14:54:36 +0000 (08:54 -0600)]
handy.h: Add memLT, memLE, memGT, memGE
These correspond to strLT, etc. I am deferring documenting them in case
this turns out to be a bad idea for some reason.
Karl Williamson [Sat, 10 Sep 2016 14:46:18 +0000 (08:46 -0600)]
Unconditionally define memcmp() if not sane
Prior to this commit, if there was a #define for memcmp that invoked a
version that Configure deemed to not be sufficient for normal use, it
was retained, so that perl used the defective version. This apparently
hasn't been a problem in the field, but I realized the potential issue
doing code reading, and am correcting it.
Karl Williamson [Sat, 3 Sep 2016 20:12:27 +0000 (14:12 -0600)]
isUTF8_CHAR(): Bring UTF-EBCDIC to parity with ASCII
This changes the macro isUTF8_CHAR to have the same number of code
points built-in for EBCDIC as ASCII. This obsoletes the
IS_UTF8_CHAR_FAST macro, which is removed.
Previously, the code generated by regen/regcharclass.pl for ASCII
platforms was hand copied into utf8.h, and LIKELY's manually added, then
the generating code was commented out. Now this has been done with
EBCDIC platforms as well. This makes regenerating regcharclass.h
faster.
The copied macro in utf8.h is moved by this commit to within the main
code section for non-EBCDIC compiles, cutting the number of #ifdef's
down, and the comments about it are changed somewhat.
Karl Williamson [Sat, 3 Sep 2016 18:15:29 +0000 (12:15 -0600)]
regen/regcharclass.pl: surrogates are code points
They are not "characters"