This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Zefram [Fri, 21 Jul 2017 07:50:12 +0000 (08:50 +0100)]
fix Storable test for pre-5.19.2 threaded perls
The test of Storable's identity-preserving handling of the immortal
scalars ran into trouble with an old bug regarding overloading of the
PADTMP flag, which could prevent the test script getting actual references
to the immortal truth values in order to feed to Storable and to compare
its output against. On the affected perls, compiling code that includes
a const op whose value is an immortal makes subsequent executions of the
refgen operator on that immortal scalar copy it, breaking its identity.
Loading the testing infrastructure modules could easily perform such
compilation, though whether it actually does depends on the versions of
the modules. Work around this by taking the platinum-iridium immortal
references first thing in the test script, before loading anything that
could break it.
Zefram [Fri, 21 Jul 2017 04:48:20 +0000 (05:48 +0100)]
restore Storable's portability to pre-5.25.6 perls
Commit 4f72e1e921be7caffd7029f421f171bad7f485f2 changed Storable.xs to
use SvPVCLEAR(), defined only on 5.25.6 and later, but didn't supply a
reserve definition. Add the obvious reserve definition.
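The fallback pattern described above can be sketched in C roughly as follows. This is an illustration only: toy_sv, toy_set_string and TOY_CLEAR are hypothetical stand-ins so that the example compiles without Perl's headers, and the definition actually added to Storable.xs may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scalar; not Perl's real SV. */
typedef struct {
    char   buf[32];
    size_t len;
} toy_sv;

/* Stand-in for an older call that every supported version provides.
 * Copies at most sizeof(buf) - 1 bytes and always NUL-terminates. */
void toy_set_string(toy_sv *sv, const char *s)
{
    size_t n = strlen(s);
    if (n >= sizeof sv->buf)
        n = sizeof sv->buf - 1;
    memcpy(sv->buf, s, n);
    sv->buf[n] = '\0';
    sv->len = n;
}

/* Newer headers would define TOY_CLEAR themselves.  Supply a fallback
 * in terms of the older call only when the macro is missing. */
#ifndef TOY_CLEAR
#  define TOY_CLEAR(sv) toy_set_string((sv), "")
#endif
```

Because the guard is #ifndef, the fallback stays out of the way on versions whose headers already provide the macro.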
Father Chrysostomos [Fri, 21 Jul 2017 01:55:44 +0000 (18:55 -0700)]
op.c: Confusing comment typo
Aaron Crane [Thu, 20 Jul 2017 19:53:42 +0000 (20:53 +0100)]
Bump Perl version from 5.27.2 to 5.27.3
Including the various pieces of Module::CoreList.
Aaron Crane [Thu, 20 Jul 2017 19:51:47 +0000 (20:51 +0100)]
Release announcement template: next stable is 5.28
Aaron Crane [Thu, 20 Jul 2017 19:49:20 +0000 (20:49 +0100)]
New draft perldelta for 5.27.3
Aaron Crane [Thu, 20 Jul 2017 19:47:37 +0000 (20:47 +0100)]
perldelta template: add a note about module versions
Aaron Crane [Thu, 20 Jul 2017 19:38:43 +0000 (20:38 +0100)]
Tick off 5.27.2 release
Aaron Crane [Thu, 20 Jul 2017 19:38:09 +0000 (20:38 +0100)]
Add epigraph for 5.27.2
Aaron Crane [Thu, 20 Jul 2017 19:33:36 +0000 (20:33 +0100)]
Merge branch 'release-5.27.2' into blead
Aaron Crane [Thu, 20 Jul 2017 18:29:51 +0000 (19:29 +0100)]
Add 5.27.2 to perlhist
Aaron Crane [Thu, 20 Jul 2017 18:19:12 +0000 (19:19 +0100)]
perldelta: delete boilerplate for empty sections
Aaron Crane [Thu, 20 Jul 2017 18:16:22 +0000 (19:16 +0100)]
perldelta: acknowledgements for 5.27.2
Aaron Crane [Thu, 20 Jul 2017 18:16:04 +0000 (19:16 +0100)]
perldelta: module updates for 5.27.2
Aaron Crane [Thu, 20 Jul 2017 17:32:44 +0000 (18:32 +0100)]
Update Module::CoreList for 5.27.2
Steve Hay [Thu, 20 Jul 2017 13:08:33 +0000 (14:08 +0100)]
perlpolicy - Mention the maint-votes branch
The maint-votes branch has been used for some time now to keep track of
which commits should be cherry-picked into maint branches, but this has
never been mentioned in perlpolicy.pod. Document it now to avoid possible
confusion -- especially during long maint branch freeze periods, which
occurred recently.
Steve Hay [Thu, 20 Jul 2017 12:37:26 +0000 (13:37 +0100)]
ExtUtils::CBuilder - Fix link() on Windows, broken in version 0.280226
Zefram [Thu, 20 Jul 2017 06:54:16 +0000 (07:54 +0100)]
perldelta entry for Carp
Zefram [Thu, 20 Jul 2017 06:34:47 +0000 (07:34 +0100)]
perldelta entry for PathTools
Zefram [Thu, 20 Jul 2017 06:20:52 +0000 (07:20 +0100)]
fix PathTools dynamic linking for Perl 5.6
On Perl 5.6, PathTools's use of SvPV_nomg() is satisfied by a definition
in ppport.h. That definition calls out to the function sv_2pv_flags(),
which is also not defined by Perl 5.6. ppport.h can define the function,
but doesn't unless requested to. Cwd.xs wasn't requesting it, resulting
in Cwd.so having a reference to a non-existent function of a decorated
version of that name. Fix this by making Cwd.xs request that function.
Some of the existing test cases already exercise the call out to
sv_2pv_flags(), but in non-obvious ways. Add a more direct test,
requesting canonisation of a number. However, the use of PERL_DL_NONLAZY
for running the test suite hides the problem. The XS loading for
PathTools is optional, so with PERL_DL_NONLAZY making the loading fail
due to the unresolvable symbol, PathTools just falls back to its pure
Perl backup implementation, and passes all the tests. The problem only
really manifests when PERL_DL_NONLAZY is unset, i.e., in real use: the
XS loading succeeds, so PathTools will rely on the XS, but then the code
fails when called. This issue remains, lurking to bite if PathTools
develops another dynamic linking problem in the future.
Zefram [Thu, 20 Jul 2017 05:32:27 +0000 (06:32 +0100)]
fix PathTools taint handling for Perl 5.6
PathTools's reserve code for detecting tainted values on Perl 5.6
generated a warning if the value was undefined, which it commonly was.
Make it allow all undefined values (which get filtered out in the next
step) without warning. Its test script for tainting behaviour also failed
to detect whether tainting was turned on on Perl 5.6, incorrectly skipping
the test script on the basis that the Perl doesn't support tainting.
Switch that check to an empirical arrangement that works on any Perl.
Zefram [Thu, 20 Jul 2017 03:00:30 +0000 (04:00 +0100)]
fix problems from Carp's partial EBCDIC support
Commit 975fe8546427b5f6259103912b13925be148becd introduced partial EBCDIC
support to Carp, but simultaneously introduced some bugs into the module
and the tests. Multiple issues are addressed in this commit:
* The main check for whether a character needs a non-literal
representation when dumping a string or regexp argument, which used
to be a regexp character range [ -~], was expanded to an explicit
character set not using range syntax, but in the expansion the "&"
was omitted. This caused unwanted \x representation of any "&" in an
argument in a stack trace. Add the "&" back in and fix the sorting
of the character set.
* The substitute version of this check for Perls on which Carp can't
safely apply a regexp to an upgraded string, but new enough to have
utf8::native_to_unicode(), was applying that function to some fixed
codepoint values that were already Unicode codepoints. Remove those
calls, and compare the fixed codepoints directly to codepoints correctly
converted through that function.
* That version of the check, by referring to utf8::native_to_unicode()
directly in source that is always compiled, caused the utf8:: stash to
be vivified on Perl 5.6, causing havoc (and failed tests). Hide that
version of the check behind a (compile-time) string eval.
* Another version of the printability check, for EBCDIC on Perl 5.6,
treated as printable any codepoint above 0xff. Change that to correctly
treat all such codepoints as not safely printable.
* Some tests in t/arg_regexp.t which were originally about non-ASCII
characters specified in a regexp by using \x regexp syntax got changed
to use the non-ASCII characters literally at the regexp syntax level
(by interpolating them from a constructed string). Restore these to
using \x syntax, with the appropriate variability of the hex digits.
* Add a couple of "fixme" comments about parts of the EBCDIC support
that are incomplete.
* Some tests involving non-ASCII characters were later made to skip on
any Perl prior to 5.17.1. In practice they work fine on earlier Perls,
and they're fairly important. Suspect that the problem that led to
the skipping being added was dependent on the tests having been broken
as described above, so remove the skipping logic.
* Incidentally, correct a comment about the purpose of t/arg_string.t
and add a similar one to t/arg_regexp.t.
* Incidentally, add Changes entries for versions 1.41 and 1.42, which
were omitted when those changes were made.
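The first item above (an explicit character set that silently lost "&" when it replaced the range [ -~]) is the kind of slip a small check can catch. The following is a minimal sketch in C, not Carp's code; missing_printables is a hypothetical helper name, and it assumes an ASCII platform where the range covers 0x20 through 0x7E.

```c
#include <stddef.h>
#include <string.h>

/* Collects into `out` every printable ASCII character (0x20..0x7E)
 * that does not occur in the NUL-terminated string `set`.
 * `out` must have room for at least 96 bytes.  Returns the number of
 * missing characters; `out` is NUL-terminated. */
size_t missing_printables(const char *set, char *out)
{
    size_t n = 0;
    for (int c = 0x20; c <= 0x7E; c++) {
        if (strchr(set, c) == NULL)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Run against a hand-expanded set, a non-zero return value names exactly the characters that were dropped in the expansion.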
Zefram [Wed, 19 Jul 2017 23:39:33 +0000 (00:39 +0100)]
re-fix do/@INC issue in Time-HiRes's Makefile.PL
Time-HiRes's Makefile.PL loads its hints file using do-file, and so
has needed to be updated for the removal of the implicit "." from @INC.
Commit 8b69401c2ba8d1ced2e17c24d6b51a7ce3882664 attempted to do this,
but put the explicit "." in the input to File::Spec->catfile, which
edits it back out. Commit ba570843add681d44ff50ff219d1ce000a961663
fixed this problem, by moving the explicit "." into a string
concatenation after File::Spec->catfile has done its bit. But commit
5cd155b07ed261125793850e101ebe6fa438c5e3 then reverted that fix,
apparently by mistake during preparation for a CPAN release. This commit
reinstates the fix.
Zefram [Wed, 19 Jul 2017 23:29:19 +0000 (00:29 +0100)]
correct declared min Perl version for Time-HiRes
Version 1.9727_03 of Time-HiRes introduced into Makefile.PL a declaration
of the minimum required Perl version, but got it wrong. It declared a
minimum of 5.8, but the module itself only demands 5.6, and it actually
works at least as far back as 5.6.1. Change the declared minimum to 5.6.
Zefram [Wed, 19 Jul 2017 22:28:00 +0000 (23:28 +0100)]
fix ExtUtils-CBuilder tests for Perl 5.6
Aaron Crane [Tue, 18 Jul 2017 17:06:46 +0000 (18:06 +0100)]
Import Encode-2.92 from CPAN
This also permits removing the local customisation for the previous version.
Aaron Crane [Tue, 18 Jul 2017 11:14:09 +0000 (12:14 +0100)]
Porting/perldelta_template.pod: tiny grammar tweak
James E Keenan [Fri, 7 Jul 2017 23:59:47 +0000 (19:59 -0400)]
Improve documentation of 'map'.
Per discussion in RT # 131652.
Karl Williamson [Fri, 14 Jul 2017 17:26:00 +0000 (11:26 -0600)]
locale.t: Refactor error reporting code
It turns out that there were paths through this code that didn't
generate the correct diagnostics, and the diagnostics came out ahead of
the failing message. This commit fixes both of those, and removes a
no-longer-needed explicit declaration that we are using the postderef
feature.
Karl Williamson [Mon, 17 Jul 2017 19:27:59 +0000 (13:27 -0600)]
perl.h: Move #define to earlier in the file
I don't know when this bug got introduced (and am not taking the time
to find out), but a symbol was defined after other code tested if it was
defined, so that always failed, and the alternative implementation got
compiled. I do know that the intended implementation was used at some
point, as I ended up fixing several bugs in it.
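The class of bug described above is easy to reproduce in isolation. The sketch below is not the perl.h code; FEATURE_AVAILABLE and the two helper functions are hypothetical names chosen for illustration.

```c
/* Tested too early: FEATURE_AVAILABLE is not defined yet, so the
 * preprocessor silently selects the fallback branch. */
#ifdef FEATURE_AVAILABLE
#  define EARLY_USES_INTENDED 1
#else
#  define EARLY_USES_INTENDED 0
#endif

/* The definition arrives only after the test above. */
#define FEATURE_AVAILABLE 1

/* The same test after the definition selects the intended branch. */
#ifdef FEATURE_AVAILABLE
#  define LATE_USES_INTENDED 1
#else
#  define LATE_USES_INTENDED 0
#endif

int early_uses_intended(void) { return EARLY_USES_INTENDED; }
int late_uses_intended(void)  { return LATE_USES_INTENDED; }
```

No diagnostic is issued for the early test, which is why this kind of ordering problem can go unnoticed for a long time.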
Karl Williamson [Mon, 17 Jul 2017 19:25:58 +0000 (13:25 -0600)]
perl.h: Remove extraneous '}'
This would be a syntax error if the code ever got compiled, but another
error prevents that, which will be fixed in the next commit.
David Mitchell [Sun, 16 Jul 2017 19:00:01 +0000 (20:00 +0100)]
PL_curstackinfo->si_stack_hwm: gently restore
RT #131732
With v5.27.1-66-g87058c3, I introduced a DEBUGGING-only mechanism in the
runops loop for checking whether an op extended the stack by as many slots
as values it returned on the stack. It did this by setting a
high-water-mark just before calling each pp function, and checking its
result on return.
It saved and restored the old value of PL_curstackinfo->si_stack_hwm
whenever it entered or left a runops loop or did a JMPENV_PUSH /
JMPENV_POP. However, the restoring could restore to an old value that was
smaller than the current value, leading to false-positive stack-extend
panics. So only restore if the old value was larger.
In particular this was causing false positives in DBI.
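The restore rule amounts to never lowering the mark. A minimal sketch, assuming the mark can be modelled as a signed index; restore_hwm is a hypothetical helper, not the code in the runops loop:

```c
#include <stddef.h>

/* Returns the high-water mark to use after leaving a nested scope:
 * the saved value is reinstated only if it is larger than the current
 * one, so the mark never moves backwards. */
ptrdiff_t restore_hwm(ptrdiff_t current, ptrdiff_t saved)
{
    return saved > current ? saved : current;
}
```

With an unconditional restore, a scope that legitimately raised the mark would have it lowered on exit, and the next check would report a false positive.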
Aaron Crane [Sun, 16 Jul 2017 16:23:05 +0000 (17:23 +0100)]
Add some perldelta entries for 5.27.2
Aaron Crane [Sun, 16 Jul 2017 15:51:53 +0000 (16:51 +0100)]
[perl #131627] extend stack in scalar-context pp_list when no args
In scalar (well, non-list) context, pp_list always yields exactly one stack
element. It must therefore extend the stack for that element, in case there
were no arguments on the stack when it started.
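The reasoning can be sketched with a toy stack in C. Everything here (toy_stack, toy_extend, toy_list_scalar, and the integer slots) is hypothetical and only models the shape of the problem, not pp_list itself; the caller is assumed to pass nargs no larger than the number of used slots.

```c
#include <stdlib.h>

typedef struct {
    int   *slots;
    size_t used;
    size_t cap;
} toy_stack;

/* Ensures room for `extra` more slots.  Returns 0 on success, -1 on
 * allocation failure. */
int toy_extend(toy_stack *st, size_t extra)
{
    if (st->cap - st->used >= extra)
        return 0;
    size_t ncap = st->used + extra;
    int *p = realloc(st->slots, ncap * sizeof *p);
    if (p == NULL)
        return -1;
    st->slots = p;
    st->cap = ncap;
    return 0;
}

/* Scalar-context "list": replaces the top `nargs` slots with exactly
 * one result, the last argument, or `empty_value` if there were none.
 * With arguments, popping frees the slot that the result reuses.
 * Without arguments nothing is freed, so the stack must be extended. */
int toy_list_scalar(toy_stack *st, size_t nargs, int empty_value)
{
    int result = empty_value;
    if (nargs > 0) {
        result = st->slots[st->used - 1];
        st->used -= nargs;
    } else if (toy_extend(st, 1) != 0) {
        return -1;
    }
    st->slots[st->used++] = result;
    return 0;
}
```

The zero-argument branch is the only one that writes to a slot it did not already own, which is the case the commit addresses.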
Aaron Crane [Sun, 16 Jul 2017 12:09:41 +0000 (13:09 +0100)]
[MERGE] release management checklist maker
The Release Manager's Guide is a complicated document that must accurately
describe how to prepare all four possible types of release. This makes it
hard to use as the basis for a checklist: for any given type of release, it
must list some steps in the wrong order, and list some steps that mustn't in
fact be taken at all.
We do have a porting tool that prepares a release checklist from the RMG for
a given release type. This set of changes, largely written by Sawyer++,
modifies that tool so that its output lists only the desired steps.
Aaron Crane [Sun, 9 Jul 2017 13:01:54 +0000 (14:01 +0100)]
Restore Porting/make-rmg-checklist --html option
Aaron Crane [Sun, 9 Jul 2017 12:54:26 +0000 (13:54 +0100)]
Suppress irrelevant "MUST SKIP this step" RMG paragraphs
Sections that aren't relevant to the current release type are suppressed in
their entirety, so the remaining "MUST SKIP" messages are just confusing.
Remove them from the content.
Sawyer X [Sun, 14 May 2017 10:24:07 +0000 (12:24 +0200)]
Replace Release Managers Guide (RMG) with new version:
Many of the mistakes I made during a release have to do with the
confusing instructions in the guide.
* Some steps are mentioned in different order
* Some steps are mentioned (and noted to *NOT* do)
* The confusion between "MAINT" and "BLEAD-FINAL", and "BLEAD-FINAL"
and "BLEAD-POINT".
This generator generates a checklist with only the instructions you
*will* have to perform. Any steps that mention they must be skipped
for the release will not be included in the end result.
Unlike the previous guide, you need not know the type of the release
you do. Instead, you give the version you want to release and it
generates the appropriate one for you.
All the following incantations work:
perl Porting/make-rmg-checklist --version 5.26.0-RC2 # RC
perl Porting/make-rmg-checklist --version 5.26.0 # BLEAD-FINAL
perl Porting/make-rmg-checklist --version 5.27.0 # BLEAD-POINT
perl Porting/make-rmg-checklist --version 5.27.1 # BLEAD-POINT
perl Porting/make-rmg-checklist --version 5.26.1 # MAINT
Extra benefit: Apparently it includes additional checklist steps
at the top that somehow are not included when you currently generate.
Downside: HTML is not yet supported.
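One way the release type could be derived from the version string is sketched below in C. The rule is inferred from the five examples above plus Perl's convention that odd minor versions are development releases; it is an assumption, not a transcription of Porting/make-rmg-checklist, and classify_release and the release_type names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    REL_UNKNOWN,
    REL_RC,
    REL_BLEAD_FINAL,
    REL_BLEAD_POINT,
    REL_MAINT
} release_type;

/* Classifies a version such as "5.26.0-RC2" or "5.27.1". */
release_type classify_release(const char *version)
{
    int major, minor, patch;
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return REL_UNKNOWN;
    if (strstr(version, "-RC") != NULL)
        return REL_RC;                 /* release candidate */
    if (minor % 2 != 0)
        return REL_BLEAD_POINT;        /* odd minor: development series */
    return patch == 0 ? REL_BLEAD_FINAL : REL_MAINT;
}
```

Under these assumptions the five versions listed above map to RC, BLEAD-FINAL, BLEAD-POINT, BLEAD-POINT and MAINT respectively.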
Lukas Mai [Sun, 16 Jul 2017 09:48:41 +0000 (11:48 +0200)]
PerlIO::scalar: check invariant at compile time
Lukas Mai [Sun, 16 Jul 2017 09:48:28 +0000 (11:48 +0200)]
Opcode: check invariant at compile time
Karl Williamson [Sun, 16 Jul 2017 01:36:25 +0000 (19:36 -0600)]
t/lib/warnings/utf8: Fix test
There is some randomness to this test added to fix [perl #131646].
Change what passes to be a pattern that matches the correct template.
Karl Williamson [Sun, 16 Jul 2017 00:46:50 +0000 (18:46 -0600)]
ext/File-Glob/t/rt131211.t: Fix typo
Commit 0887d051f49229ff72dc6fd22105ce922a11003f had an extra backslash.
Karl Williamson [Sat, 15 Jul 2017 18:36:54 +0000 (12:36 -0600)]
embed.fnc: Fix declaration of my_strerror()
This was improperly made public (but the docs indicate it should not be
used by the public).
Karl Williamson [Sat, 15 Jul 2017 18:03:01 +0000 (12:03 -0600)]
embed.fnc Change Some functions only used in macros
The X flag is used for this situation where a function is public only
because it is called from a public macro.
Karl Williamson [Sat, 15 Jul 2017 17:11:41 +0000 (11:11 -0600)]
Move bulk of POSIX::setlocale to locale.c
This cleans up the interface, as it allows several functions to now be
static that used to have to be called from outside locale.c
Karl Williamson [Sat, 15 Jul 2017 21:01:44 +0000 (15:01 -0600)]
Fix File::Glob/t/rt131211.t
The \b boundaries I added in commit
5a993d81c4b1abf13cd3ae4cbc04f26c7516bc37 were wrong. \b{wb} gives a better
result.
Chris 'BinGOs' Williams [Sat, 15 Jul 2017 18:55:05 +0000 (19:55 +0100)]
Carthago delenda est
Aaron Crane [Sat, 4 Mar 2017 12:50:58 +0000 (12:50 +0000)]
RT #130907: Fix the Unicode Bug in split " "
Steve Hay [Sat, 15 Jul 2017 18:17:30 +0000 (19:17 +0100)]
Tick off 5.22.4 and 5.24.2
That was probably the last 5.22. There may or may not be another 5.24.
Steve Hay [Sat, 15 Jul 2017 18:07:31 +0000 (19:07 +0100)]
Add perldeltas for 5.22.4 and 5.24.2
Steve Hay [Sat, 15 Jul 2017 18:00:17 +0000 (19:00 +0100)]
Import Module::CoreList data for 5.24.2
Steve Hay [Sat, 15 Jul 2017 17:56:11 +0000 (18:56 +0100)]
Import Module::CoreList data for 5.22.4
Steve Hay [Sat, 15 Jul 2017 17:43:02 +0000 (18:43 +0100)]
Epigraphs for 5.22.4 and 5.24.2
James E Keenan [Sat, 15 Jul 2017 16:47:07 +0000 (12:47 -0400)]
perldelta for c7ac81d9d79d22d7d1133b804e5f8dc4a641fe39
Signed-off-by: James E Keenan <jkeenan@cpan.org>
Alberto Simões [Fri, 14 Jul 2017 13:14:05 +0000 (14:14 +0100)]
Update to ExtUtils::CBuilder 0.280226
File::Basename::fileparse(), when called with two arguments, is
documented to return a list of three elements:
The non-suffix part of the file's basename.
The file's dirname, plus trailing path separator.
The suffix part of the file's basename.
Thus,
my ($name,$path,$suffix) = fileparse('/tmp/perl/p5p/foo.patch', qr/\.[^.]*/);
returns:
$name: foo
$path: /tmp/perl/p5p/
$suffix: .patch
If we want to take those values and compose a path with
File::Spec->catfile(), we have to bear in mind that File::Spec generally
expects to have directories precede filenames in its arguments. Thus,
the correct way to use the values returned by fileparse() would be:
my $cf = File::Spec->catfile($path, $name . $suffix);
In ExtUtils::CBuilder::Base::new(), however, the return values from
fileparse() were named in a way that suggested that the first value
would be the dirname and the second would be the non-suffix part of the
basename:
my ($ccpath, $ccbase, $ccsfx ) = fileparse($self->{config}{cc}, qr/\.[^.]*/);
$ccpath -- which here is really a basename -- was then used as the first
argument to catfile():
File::Spec->catfile( $ccpath, $cxx, $ccsfx )
In addition, in the above $ccsfx should not have been a separate
argument. Rather, it should have been concatenated without a path
separator to the second argument.
For: RT # 131749. See also:
https://github.com/Perl-Toolchain-Gang/ExtUtils-CBuilder/pull/6 (thanks
to stphnlyd of Perl Toolchain Gang).
Signed-off-by: James E Keenan <jkeenan@cpan.org>
Steve Hay [Sat, 15 Jul 2017 16:16:06 +0000 (17:16 +0100)]
5.22.4 and 5.24.2 today
Karl Williamson [Fri, 14 Jul 2017 21:03:51 +0000 (15:03 -0600)]
locale.c: Add forgotten #if DEBUGGING
I pushed the previous commit without actually amending it to include
this.
Karl Williamson [Fri, 14 Jul 2017 19:56:44 +0000 (13:56 -0600)]
Add debugging to locale handling
These debug statements have proven useful in the past tracking down
problems. I looked them over and kept the ones that I thought might be
useful in the future. This includes extracting some code into a
static function so it can be called from more than one place.
Karl Williamson [Fri, 14 Jul 2017 17:26:44 +0000 (11:26 -0600)]
perllocale: Clarifications, corrections, and nits
Karl Williamson [Fri, 14 Jul 2017 04:32:02 +0000 (22:32 -0600)]
File-Glob/t/rt131211.t: skip when File::Glob not used
File::Glob can be turned off at Configure time, and is on certain
platforms. Thus this .t is not testing File::Glob, but what the
platform's local sort-of-equivalent is. Thus the tests aren't valid.
Karl Williamson [Thu, 13 Jul 2017 03:15:09 +0000 (21:15 -0600)]
Merge branch 'utf8 fixes' into blead
This branch reimplements the forbidding of code points above IV_MAX in
such a way that encountering UTF-8 evaluating to such doesn't kill the
receiving process, but is treated as an ordinary overflow. To do
otherwise can lead to Denial of Service attacks.
It fixes several bugs that occur only on UTF-8 that is malformed or for
very large code points.
And it cleans up and revamps the testing of the XS API for UTF-8 so that
more coverage is done, but in a fraction of the previous time needed.
Karl Williamson [Sat, 1 Jul 2017 17:58:00 +0000 (11:58 -0600)]
Forbid above IV_MAX code points
This implements the restriction of code points to 0..IV_MAX in such a
way that the process doesn't die when presented with input UTF-8 that
evaluates to a larger one. Instead, it is treated as overflow.
The commit reinstates causing the offending process to die if trying to
create a character somehow that is above IV_MAX (like
chr(0xFFFFFFFFFFFFF)) or trying to do certain operations on one if
somehow one did get created.
The long term goal is to use code points above IV_MAX internally, as
Perl6 does. So code and tests are not removed, just commented out.
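Treating a too-large value as ordinary overflow, rather than aborting, can be sketched as a checked accumulation. This is a simplified model in C, not the utf8.c implementation; fold_payloads, decode_status and the `limit` parameter (standing in for IV_MAX) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { DECODE_OK, DECODE_OVERFLOW } decode_status;

/* Folds the low 6 bits of each of the `n` continuation bytes into
 * `start_bits` (the payload of the start byte).  If the value would
 * exceed `limit`, reports DECODE_OVERFLOW to the caller instead of
 * terminating the process; `*out` is then left untouched. */
decode_status fold_payloads(uint64_t start_bits,
                            const unsigned char *cont, size_t n,
                            uint64_t limit, uint64_t *out)
{
    uint64_t cp = start_bits;
    if (cp > limit)
        return DECODE_OVERFLOW;
    for (size_t i = 0; i < n; i++) {
        uint64_t payload = cont[i] & 0x3F;
        /* cp * 64 + payload <= limit, tested without wrapping around */
        if (payload > limit || cp > ((limit - payload) >> 6))
            return DECODE_OVERFLOW;
        cp = (cp << 6) | payload;
    }
    *out = cp;
    return DECODE_OK;
}
```

The check is done before each shift, so the accumulator itself never wraps, and malformed or hostile input only produces an error code.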
Karl Williamson [Thu, 13 Jul 2017 02:28:45 +0000 (20:28 -0600)]
utf8.c: Change 2 static fcns to handle overlongs
This will be used in the following commit.
One function is made more complicated, so we stop asking it to be
inlined.
Karl Williamson [Thu, 13 Jul 2017 02:26:18 +0000 (20:26 -0600)]
utf8.c: Move and slightly change comment block
This is so there are fewer real differences shown in the next commit.
Karl Williamson [Sat, 1 Jul 2017 13:21:09 +0000 (07:21 -0600)]
utf8.c: Generalize static fcn return for indeterminate result
This makes it harder to think that 0 means a definite FALSE.
Karl Williamson [Sat, 1 Jul 2017 12:32:28 +0000 (06:32 -0600)]
utf8.c: Move a fcn within the file
This simply moves a function to later in the file. The next commit will
change it to needing a definition which, until this commit, came after it
in the file, and so was not available to it.
Karl Williamson [Sat, 1 Jul 2017 12:43:34 +0000 (06:43 -0600)]
utf8.c: Generalize static fcn return for indeterminate result
This makes it harder to think that 0 means a definite FALSE.
Karl Williamson [Sat, 1 Jul 2017 12:18:01 +0000 (06:18 -0600)]
utf8.c: Generalize static fcn return for indeterminate result
Prior to this commit, isFF_OVERLONG() returned a boolean, with 0 also
indicating that there wasn't enough information to make a determination.
I realized that I was forgetting that 0 wasn't necessarily definitive
while coding. By changing the API to return 3 values, forgetting that
won't likely happen.
This and the next several commits change several other functions that
have the same predicament.
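The idea of a three-valued return can be illustrated on a simpler case than the FF start byte. The sketch below is hypothetical (the tri type and is_e0_overlong are not names from utf8.c) and uses standard UTF-8, where a 3-byte sequence starting with E0 is overlong exactly when its second byte is in 80..9F.

```c
#include <stddef.h>

/* Three distinct answers, so "cannot tell yet" is never confused
 * with a definite "no". */
typedef enum {
    ANSWER_UNKNOWN = -1,
    ANSWER_NO      = 0,
    ANSWER_YES     = 1
} tri;

/* Is the possibly partial sequence s[0..len-1] an overlong 3-byte
 * form?  With only the E0 start byte available, the answer depends
 * on a byte that has not been seen yet. */
tri is_e0_overlong(const unsigned char *s, size_t len)
{
    if (len == 0)
        return ANSWER_UNKNOWN;
    if (s[0] != 0xE0)
        return ANSWER_NO;
    if (len < 2)
        return ANSWER_UNKNOWN;
    return (s[1] >= 0x80 && s[1] < 0xA0) ? ANSWER_YES : ANSWER_NO;
}
```

A caller written as `if (!is_e0_overlong(s, len))` now treats the unknown case as true rather than false, which makes the forgotten case much more visible in testing.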
Karl Williamson [Fri, 30 Jun 2017 19:21:58 +0000 (13:21 -0600)]
utf8.h: Comments only
An earlier commit had split some comments up. And this adds clarifying
details.
Karl Williamson [Fri, 30 Jun 2017 19:19:10 +0000 (13:19 -0600)]
utf8.c: Reorder two 'if' clauses
This is purely to line the two clauses up vertically, so that the
slightly differently spelled tests are easier to compare.
Karl Williamson [Fri, 30 Jun 2017 17:19:59 +0000 (11:19 -0600)]
utf8.c: Slightly simplify some code
This just does a small refactor, which I think makes things easier to
understand.
Karl Williamson [Sat, 8 Jul 2017 20:54:28 +0000 (14:54 -0600)]
utf8n_to_uvchr(): Properly handle extremely high code points
It turns out that it could incorrectly deem something to be overflowing
or overlong. This fixes that and changes the test to catch this
possibility. This fixes a bug, so now on 32-bit systems, it detects
that if you have a start byte of FE, you need a continuation byte to
determine if the result overflows.
Karl Williamson [Fri, 7 Jul 2017 18:39:33 +0000 (12:39 -0600)]
rm APItest/t/utf8_malformed.t
This file no longer contains any tests. All were either made redundant
with utf8_warn_base.pl or have been moved to it.
Karl Williamson [Fri, 7 Jul 2017 18:37:39 +0000 (12:37 -0600)]
Move test to utf8_warn_base.pl
This is the final test that was in utf8_malformed.t. The next commit
will remove the file.
Karl Williamson [Wed, 5 Jul 2017 16:27:25 +0000 (10:27 -0600)]
APItest/t/utf8_malformed.t: Remove 2 redundant tests
These tests for the malformation where a UTF-8 sequence is interrupted
by the beginning of another character, are already tested in
utf8_warn_base.pl.
Karl Williamson [Fri, 7 Jul 2017 21:20:44 +0000 (15:20 -0600)]
APItest/t/utf8_warn_base.pl: White-space only
This indents properly after the previous commit created a block around
this code, and reflows to fit in 79 columns.
Karl Williamson [Tue, 4 Jul 2017 18:57:40 +0000 (12:57 -0600)]
APItest/t/utf8_warn_base.pl: Add a test
This verifies that we don't mistake an overlong for overflow.
Karl Williamson [Tue, 4 Jul 2017 22:04:26 +0000 (16:04 -0600)]
APItest/t/utf8_malformed.t: move tests to utf8_warn_base.pl
This adds infrastructure to utf8_warn_base.pl to handle the overlong
tests that are now moved to it from utf8_malformed.t.
Karl Williamson [Tue, 4 Jul 2017 18:22:29 +0000 (12:22 -0600)]
APItest/t/utf8_malformed.t: move test to utf8_warn_base.pl
Actually, this test was already in utf8_warn_base, but was executed only
on 64 bit platforms. It is reasonable to make sure it works on 32 bit
ones, as it is an edge case there as well, in the sense that it is the
first 13 byte code point.
This is the first of a series of commits to remove all the tests in
utf8_malformed, so the entire file can be removed.
utf8_warn_base has been heavily cleaned up, and now has better
infrastructure for more completely testing than utf8_malformed. The
two files have much the same logic, and rather than trying to maintain
two versions, it's better to combine them.
Karl Williamson [Tue, 4 Jul 2017 19:23:18 +0000 (13:23 -0600)]
APItest/t/utf8_malformed.t: Remove redundant test
This tests the too short malformation, which is already adequately
tested in utf8_warn_base.pl.
Karl Williamson [Tue, 4 Jul 2017 19:19:33 +0000 (13:19 -0600)]
APItest/t/utf8_malformed.t: Remove 2 redundant tests
These test overflowing, which is already adequately tested in
utf8_warn_base.pl.
Karl Williamson [Tue, 4 Jul 2017 16:06:37 +0000 (10:06 -0600)]
APItest/t/utf8_malformed.t: Remove redundant test
This test already is covered in utf8_warn_base.pl. It tests an overlong
for 2**32.
Karl Williamson [Fri, 7 Jul 2017 16:56:23 +0000 (10:56 -0600)]
APItest/t/utf8_warn_base.pl: Add tests
This test takes its various base tests, and intentionally perturbs them to
create malformations to additionally test. Prior to this commit, only
the function utf8n_to_uvchr_error() was being tested with these
perturbations. Now, the functions whose names start with 'is' also get
tested.
Karl Williamson [Wed, 5 Jul 2017 20:58:43 +0000 (14:58 -0600)]
APItest/t/utf8_warn_base.pl: Move some tests
This just moves a block and indents and reflows it. It is moved to
within the loops that set up various malformations in the input. The
next commit will change these tests to actually use the perturbed
inputs.
Karl Williamson [Wed, 5 Jul 2017 19:09:27 +0000 (13:09 -0600)]
APItest/t/utf8_warn_base.pl: Move some setup code
We don't need this code until we've determined we're actually going to
go through with a test.
Karl Williamson [Fri, 7 Jul 2017 16:34:01 +0000 (10:34 -0600)]
APItest/t/utf8_warn_base.pl: Clean up test name
This name was confusing, as there are two types of things that can be
(dis)allowed, and in the case of an overflow, the first type is not
being tested but has the adjective (dis)allowed present. Add the term
only when appropriate.
Karl Williamson [Wed, 5 Jul 2017 19:00:03 +0000 (13:00 -0600)]
APItest/t/utf8_warn_base.pl: Skip inappropriate tests
If we don't have enough information for the test to be meaningful, don't
bother doing it.
Karl Williamson [Sat, 1 Jul 2017 04:29:36 +0000 (22:29 -0600)]
APItest/t/utf8_warn_base.pl: Use a default value
This adds a default number of bytes needed to detect overflows, like
previous commits have added defaults for other categories.
Karl Williamson [Tue, 27 Jun 2017 20:46:26 +0000 (14:46 -0600)]
utf8n_to_uvchr() Properly test for extended UTF-8
It somehow dawned on me that the code is incorrect for
warning/disallowing very high code points. What is really wanted in the
API is to catch UTF-8 that is not necessarily portable. There are
several classes of this, but I'm referring here to just the code points
that are above the Unicode-defined maximum of 0x10FFFF. These can be
considered non-portable, and there is a mechanism in the API to
warn/disallow these.
However an earlier standard defined UTF-8 to handle code points up to
2**31-1. Anything above that is using an extension to UTF-8 that has
never been officially recognized. Perl does use such an extension, and
the API is supposed to have a different mechanism to warn/disallow on
this.
Thus there are two classes of warning/disallowing for above-Unicode code
points. One for things that have some non-Unicode official recognition,
and the other for things that have never had official recognition.
UTF-EBCDIC differs somewhat in this, and since Perl 5.24, we have had a
Perl extension that allows it to handle any code point that fits in a
64-bit word. This kicks in at code points above 2**30-1, a different
threshold from the one at which extended UTF-8 kicks in on ASCII platforms.
Things are also complicated by the fact that the API has provisions for
accepting the overlong UTF-8 malformation. It is possible to use
extended UTF-8 to represent code points smaller than 31-bit ones.
Until this commit, the extended warning/disallowing was based on the
resultant code point, and only when that code point did not fit into 31
bits.
But what is really wanted is whether extended UTF-8 was used to represent a
code point, no matter how large the resultant code point is. This
differs from the previous definition, but only for EBCDIC platforms, or
when the overlong malformation was also present. So it does not affect
very many real-world cases.
This commit fixes that. It turns out that it is easier to tell if
something is using extended-UTF8. One just looks at the first byte of a
sequence.
The trailing part of the warning message that gets raised is slightly
changed to be clearer. It's not significant enough to affect perldiag.
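The two classes of above-Unicode input can be contrasted in a few lines of C. This is a sketch under a stated assumption: on ASCII platforms, sequences of up to six bytes (start bytes through 0xFD) cover code points up to 2**31-1, so only the start bytes 0xFE and 0xFF begin extended sequences. The function names are hypothetical and EBCDIC platforms are not modelled.

```c
#include <stdint.h>

/* Class 1: above the Unicode maximum, judged from the decoded value. */
int is_above_unicode(uint64_t code_point)
{
    return code_point > 0x10FFFF;
}

/* Class 2: uses the never-standardized extension, judged from the
 * first byte of the sequence alone (ASCII-platform assumption). */
int starts_extended_utf8(unsigned char start_byte)
{
    return start_byte >= 0xFE;
}
```

Because the second test inspects the encoding rather than the resulting value, it still fires for an overlong sequence that uses an extended start byte to represent a small code point.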
Karl Williamson [Mon, 26 Jun 2017 17:43:21 +0000 (11:43 -0600)]
utf8.h: Add synonyms for flag names
The next commit will fix the detection of using Perl's extended UTF-8 to
be more accurate. The current name for various flags in the API is
somewhat misleading. What we really want to know is whether extended
UTF-8 was used, not the value of the resultant code point.
This commit basically does
s/ABOVE_31_BIT/PERL_EXTENDED/g
It also similarly changes the name of a hash key in APItest/t/utf8.t.
This intermediary step makes the next commit easier to read.
Karl Williamson [Tue, 27 Jun 2017 04:22:32 +0000 (22:22 -0600)]
APItest/t/utf8_warn_base.pl: Generate smaller overlongs
This file generates overlongs for testing that that malformation is
handled properly. This commit changes it to avoid generating an
overlong that uses Perl's extended UTF-8. This will come in handy a
couple of commits from now, when a bug dealing with that gets fixed.
It also moves setting a variable to outside the loop.
Karl Williamson [Fri, 30 Jun 2017 18:57:49 +0000 (12:57 -0600)]
APItest/t/utf8_warn_base.pl: Data::Dumper isn't needed
Karl Williamson [Fri, 30 Jun 2017 19:14:57 +0000 (13:14 -0600)]
APItest/t/utf8_warn_base.pl: Move some tests from loop
These test if any warnings are generated. None are ever likely to be,
given the way things work. We can test after the loop that none of the
iterations generated warnings, as any would accumulate.
Karl Williamson [Mon, 26 Jun 2017 03:35:05 +0000 (21:35 -0600)]
APItest/t/utf8_warn_base.pl: Extract code into a fcn
This uses a function to test for a common paradigm. The next couple of
commits will change that paradigm, and now the code will only have to
change in one place.
Karl Williamson [Mon, 19 Jun 2017 18:58:19 +0000 (12:58 -0600)]
utf8.c: Fix bugs with overlongs combined with other malformations.
The code handling the UTF-8 overlong malformation must come after
handling all the other malformations. This is because it may change the
code point represented to the REPLACEMENT CHARACTER. The other
malformation code is expecting the code point to be the original one.
This may cause failure to catch and report other malformations, or
report the wrong value of the erroneous code point.
What was needed was simply to move the 'if else' branch for overlongs to
after the branches for the other malformations.
Karl Williamson [Sun, 25 Jun 2017 04:55:10 +0000 (22:55 -0600)]
APItest/t/utf8_warn_base.pl: Add some tests
This adds testing for having some malformations allowed. These had not
been checked for, and there were some bugs. It's easiest to TODO all
ones that might fail, creating many passing TODOs. The TODO will be
removed in the next commit.
Karl Williamson [Sun, 25 Jun 2017 04:42:25 +0000 (22:42 -0600)]
APItest/t/utf8_warn_base.pl: Move things out of inner loop
The most expensive stuff in this set of nested loops can actually be
done several nests up (even higher for some things, but it's not worth
the trouble). Given that this test file has been running too long, I
moved things to an outer loop context.
Karl Williamson [Sun, 25 Jun 2017 03:32:41 +0000 (21:32 -0600)]
APItest/t/utf8_warn_base.pl: Reorder loop nesting
This is in preparation for the next commit. It also changes some of the
loop variables to 1 to indicate truth, rather than a string. This will
make some things easier later.
Karl Williamson [Wed, 21 Jun 2017 19:38:55 +0000 (13:38 -0600)]
APItest/t/utf8_warn_base.pl: Revamp testing isFOO
Several commits ago, the loop that handles testing the functions that
convert from/to UTF-8 was revamped. This commit does a similar thing
for the portion of the code that handles the isFOO functions, and
partial character recognition.
It reorders the nesting of loops so that more tests can be done than
previously in the outer loop. Among these, it now doesn't skip overflow
and deals with using Perl's extended UTF-8 better.
Karl Williamson [Mon, 19 Jun 2017 18:56:38 +0000 (12:56 -0600)]
utf8n_to_uvchr: U+ should be for only Unicode code points
For above-Unicode code points, we should use the "0x" prefix (0xDEADBEEF)
instead of the "U+" prefix (U+DEADBEEF).
This is because U+ only applies to Unicode. This only affects a warning
message for overlongs.