2 years agoperldelta: fix bad references to "unicode_strings" v5.16.0-RC2
Tom Hukins [Wed, 16 May 2012 01:42:48 +0000]
perldelta: fix bad references to "unicode_strings"

The documentation written for 2e2b2571 erroneously mentions
"unicode_semantics" instead of "unicode_strings".

2 years agoprevent PERL_UNICODE from affecting t/mro/package_aliases_utf8.t
Ricardo Signes [Wed, 16 May 2012 01:34:00 +0000]
prevent PERL_UNICODE from affecting t/mro/package_aliases_utf8.t

2 years agoperldelta: known issue: t/op/filetest.t
Ricardo Signes [Wed, 16 May 2012 01:22:21 +0000]
perldelta: known issue: t/op/filetest.t

2 years agonote the gcc -O2 and link-time-optimization problem
Andy Dougherty [Wed, 16 May 2012 01:16:45 +0000]
note the gcc -O2 and link-time-optimization problem

2 years agoour next release is RC2
Ricardo Signes [Wed, 16 May 2012 01:13:08 +0000]
our next release is RC2

2 years agoperldelta: Americanise spellings
Ricardo Signes [Tue, 15 May 2012 21:59:48 +0000]
perldelta: Americanise spellings

2 years agoreflect Socket update in Module::CoreList
Ricardo Signes [Tue, 15 May 2012 11:41:36 +0000]
reflect Socket update in Module::CoreList

2 years agoUpdate Socket to CPAN version 2.001
Tony Cook [Tue, 15 May 2012 09:22:30 +0000]
Update Socket to CPAN version 2.001

2.001   CHANGES:
         * Apply (modified) patch from ppisar@redhat.com to fix memory
           addressing bug with Zero() - RT76067
         * Document that inet_pton() doesn't work on hostnames, only textual
           addresses - RT76010
         * Ignore any existing-but-undefined hints hash members to
           getaddrinfo()

Done for the critical RT76067 fix.

2 years agoperldelta typo fixes (from mauke)
Ricardo Signes [Tue, 15 May 2012 11:27:17 +0000]
perldelta typo fixes (from mauke)

2 years agoRevert part of 34d9f36f9
Father Chrysostomos [Tue, 15 May 2012 20:53:29 +0000]
Revert part of 34d9f36f9

I was going to apply this after code freeze, but I made a mistake
when switching branches locally and ended up combining it with
another commit.

2 years agoAUTHORS: Shirataka -> Shirakata
Father Chrysostomos [Tue, 15 May 2012 20:51:47 +0000]
AUTHORS: Shirataka -> Shirakata

2 years agoperldelta: extraneous double spaces
Father Chrysostomos [Tue, 15 May 2012 20:39:22 +0000]
perldelta: extraneous double spaces

2 years agov5.16 RC0 perldelta cleanup
Tom Christiansen [Tue, 15 May 2012 20:38:09 +0000]
v5.16 RC0 perldelta cleanup

Below is a patch with some simple typo and verbosity cleanup in
the current pod/perldelta.pod in blead as of ~30 minutes ago.

2 years ago[perl #112944] perldelta: typo
Shirakata Kentaro [Tue, 15 May 2012 20:02:50 +0000]
[perl #112944] perldelta: typo

2 years agoAdd Shirataka Kentaro to AUTHORS
Father Chrysostomos [Tue, 15 May 2012 19:58:42 +0000]
Add Shirataka Kentaro to AUTHORS

2 years agoadd 5.16.0-RC0 and -RC1 to perlhist
Ricardo Signes [Tue, 15 May 2012 02:59:38 +0000]
add 5.16.0-RC0 and -RC1 to perlhist

2 years agominor grammar correction v5.16.0-RC1
Ricardo Signes [Tue, 15 May 2012 01:52:47 +0000]
minor grammar correction

thanks, Jim Keenan!

2 years agoadd Daniel Kahn Gillmor to AUTHORS
Ricardo Signes [Tue, 15 May 2012 01:49:01 +0000]
add Daniel Kahn Gillmor to AUTHORS

2 years agodocument the as-yet-unexplained Win32 test hanging
Ricardo Signes [Tue, 15 May 2012 01:22:06 +0000]
document the as-yet-unexplained Win32 test hanging

We will ship with this unfixed unless someone comes up with the
cure in the next week.

2 years agoperldelta: fix a noun/verb number agreement
Ricardo Signes [Tue, 15 May 2012 00:53:50 +0000]
perldelta: fix a noun/verb number agreement

reported by mauke

2 years agoskip t/win32/runenv.t unless -DPERL_IMPLICIT_SYS
Ricardo Signes [Tue, 15 May 2012 00:15:59 +0000]
skip t/win32/runenv.t unless -DPERL_IMPLICIT_SYS

this test fails without PERL_IMPLICIT_SYS, as reported by Steve
Hay in <CADED=K4EqXkJa2uC13wVYY_=uGDCx=uQ_rXu3Me4+3FvVM8D+g@mail.gmail.com>

2 years agoRevert fixes for [rt.cpan.org #61577]
Ricardo Signes [Mon, 14 May 2012 19:49:27 +0000]
Revert fixes for [rt.cpan.org #61577]

These changes introduced some test failures on AIX and other platforms,
and rather than dig around for more failing platforms during the RCx
period, we will revert this to reapply later when it is more tested.

This reverts commit 01b71c89216c9f447494638a5d108e13c45c3863.

This reverts commit b6903614db213f07401367249dc84c896eb099b7.

This reverts commit 271d04eee1933df0971f54f7bf9a5ca3575e7e6a.

2 years agonext release will be RC1
Ricardo Signes [Mon, 14 May 2012 16:26:36 +0000]
next release will be RC1

2 years agoperldelta: fix version named in acknowledgements
Ricardo Signes [Mon, 14 May 2012 16:26:24 +0000]
perldelta: fix version named in acknowledgements

2 years agoIn the Linux hints, invoke gcc with LANG and LC_ALL set to "C".
Nicholas Clark [Mon, 14 May 2012 09:17:06 +0000]
In the Linux hints, invoke gcc with LANG and LC_ALL set to "C".

The output of gcc -print-search-dirs is subject to localisation, which means
that the literal text "libraries" will not be present if the user has a
non-English locale, and we won't determine the correct path for libraries
such as -lm, breaking the build. Problem diagnosed by Alexander Hartmaier.
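
As an illustration of why the literal label matters, here is a minimal C sketch of a scanner that looks for the "libraries: =" line in the text printed by gcc -print-search-dirs. The function name find_library_dirs is hypothetical and this is not the code in the Linux hints file; it only shows that a fixed-string match finds nothing once the label is translated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the path list on the
 * "libraries: =" line of `gcc -print-search-dirs` output, or NULL if the
 * label is absent (as it would be if the label were translated). */
const char *find_library_dirs(const char *output)
{
    static const char label[] = "libraries: =";
    assert(output != NULL);
    const char *line = output;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, label, sizeof label - 1) == 0)
            return line + (sizeof label - 1);
        line = strchr(line, '\n');
        if (line != NULL)
            line++;               /* step past the newline */
    }
    return NULL;
}
```

Running gcc with LANG and LC_ALL set to "C" keeps the label in the form such a match expects.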

2 years agoDon't test that errno is still 0 after POSIX::f?pathconf
Paul Johnson [Mon, 14 May 2012 08:45:10 +0000]
Don't test that errno is still 0 after POSIX::f?pathconf

I think the best we can do with respect to the f?pathconf tests is to
make sure that the perl call doesn't die, and that the system call
doesn't fail.  And it's arguable we should only be testing the former.
But since we've been testing more than this anyway, it's probably safe
to test both.

With respect to the sysconf call, I think we shouldn't test more than
that perl doesn't die.  Any further testing would require different
tests based on the argument being passed in.  Before doing that, it's
probably worth considering the purpose of the tests.  I don't think we
really want to test that POSIX has been implemented correctly, only that
our layer over it is correctly implemented.

This fixes RT #112866.
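
A minimal C sketch of the distinction drawn above, assuming a POSIX system: errno is cleared before the call and looked at only when pathconf() returns -1, never after a successful call. The names query_limit and limit_status are hypothetical and are not part of the POSIX module or its tests.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <unistd.h>

enum limit_status { LIMIT_OK, LIMIT_NONE, LIMIT_ERROR };

/* Hypothetical wrapper: errno is cleared before the call and consulted
 * only when pathconf() returns -1, since its value after a successful
 * call is not meaningful. */
enum limit_status query_limit(const char *path, int name, long *value)
{
    assert(path != NULL && value != NULL);
    errno = 0;
    long r = pathconf(path, name);
    if (r != -1) {
        *value = r;               /* success: errno is deliberately ignored */
        return LIMIT_OK;
    }
    /* -1 with errno untouched means "no limit"; otherwise a real error. */
    return errno == 0 ? LIMIT_NONE : LIMIT_ERROR;
}
```

A test built on this would assert only that the status is not LIMIT_ERROR, rather than asserting anything about errno after the call.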

2 years agoperldelta: Remove duplicate paragraph
Karl Williamson [Mon, 14 May 2012 15:47:36 +0000]
perldelta: Remove duplicate paragraph

2 years agostudy as no-op is a bugfix, not performance enhancement
Ricardo Signes [Fri, 11 May 2012 22:00:03 +0000]
study as no-op is a bugfix, not performance enhancement

2 years agoperldelta: Add ‘(5.14.2)’ markers
Father Chrysostomos [Fri, 11 May 2012 16:55:09 +0000]
perldelta: Add ‘(5.14.2)’ markers

2 years agoperldelta: Explain the ‘(5.14.1)’ markers
Father Chrysostomos [Fri, 11 May 2012 16:50:20 +0000]
perldelta: Explain the ‘(5.14.1)’ markers

2 years agoperldelta: Use single quotes in C<>
Father Chrysostomos [Fri, 11 May 2012 16:48:49 +0000]
perldelta: Use single quotes in C<>

C<> renders as "..." in nroff, so C<... "..." ...> ends up looking weird.

2 years agoperldelta: Use L<> to link to changed module pods
Karl Williamson [Fri, 11 May 2012 16:44:10 +0000]
perldelta: Use L<> to link to changed module pods

Spotted by Vincent Pit

2 years agoperldelta: Reorder to avoid pronoun confusion
Karl Williamson [Fri, 11 May 2012 16:35:13 +0000]
perldelta: Reorder to avoid pronoun confusion

Spotted by Zsbán Ambrus

2 years agoperldelta: typo
Karl Williamson [Fri, 11 May 2012 16:29:31 +0000]
perldelta: typo

Spotted by Zsbán Ambrus

2 years agoperldelta: Add future deprecation text about \Q
Karl Williamson [Fri, 11 May 2012 16:25:15 +0000]
perldelta: Add future deprecation text about \Q

2 years agoperldelta: misuse of commas
Father Chrysostomos [Fri, 11 May 2012 16:30:25 +0000]
perldelta: misuse of commas

2 years agoperldelta: typo
Father Chrysostomos [Fri, 11 May 2012 16:27:08 +0000]
perldelta: typo

2 years agoperldelta: [rt.cpan.org #0], not RT 0
Father Chrysostomos [Fri, 11 May 2012 16:26:43 +0000]
perldelta: [rt.cpan.org #0], not RT 0

2 years agoRmv second ‘version’ in upgrade notices
Father Chrysostomos [Fri, 11 May 2012 16:24:14 +0000]
Rmv second ‘version’ in upgrade notices

Some of these were like this:

...from version 123 to version 456.

and some like this:

...from version 123 to 456.

Since the former is wordy, I’ve used the latter throughout.

2 years agoperldelta: Consistent fullstops for ‘upgraded from x to x’
Father Chrysostomos [Fri, 11 May 2012 16:20:46 +0000]
perldelta: Consistent fullstops for ‘upgraded from x to x’

2 years agoperldelta: consistent spaces after dots
Father Chrysostomos [Fri, 11 May 2012 16:18:44 +0000]
perldelta: consistent spaces after dots

2 years agoperldelta: consistent semicolons in CGI example
Father Chrysostomos [Fri, 11 May 2012 16:17:10 +0000]
perldelta: consistent semicolons in CGI example

2 years agoperldelta: grammar
Father Chrysostomos [Fri, 11 May 2012 16:16:43 +0000]
perldelta: grammar

2 years agoperldelta: fix capitalisation
Father Chrysostomos [Fri, 11 May 2012 16:16:11 +0000]
perldelta: fix capitalisation

2 years agoperldelta: Mention 5.14.0, not 5.13.6
Karl Williamson [Fri, 11 May 2012 15:40:20 +0000]
perldelta: Mention 5.14.0, not 5.13.6

2 years agoperldelta: Correct statement
Karl Williamson [Fri, 11 May 2012 15:37:37 +0000]
perldelta: Correct statement

After I wrote this text for an earlier perldelta, from which this one is
extracted, it was pointed out to me that running out of memory is
extremely unlikely; I had not bothered to really do the math.

2 years agoperldelta: correct statement
Karl Williamson [Fri, 11 May 2012 15:36:45 +0000]
perldelta: correct statement

2 years agoperldelta: grammar
Karl Williamson [Fri, 11 May 2012 15:33:04 +0000]
perldelta: grammar

2 years agoperldelta: slightly expand and clarify policy note
Ricardo Signes [Fri, 11 May 2012 14:06:39 +0000]
perldelta: slightly expand and clarify policy note

2 years agoperldelta: break Pod:: deprecations onto two items
Ricardo Signes [Fri, 11 May 2012 12:18:05 +0000]
perldelta: break Pod:: deprecations onto two items

2 years agoRevert "perl5160delta: The coreargs opcode is undeserving of mention"
Ricardo Signes [Fri, 11 May 2012 12:07:25 +0000]
Revert "perl5160delta: The coreargs opcode is undeserving of mention"

This reverts commit 1061b56a7b2cc84a8ac96a405e5b8c185936605c.

This is a reversion of a reversion.  The reversion in 1061b56a7b2cc was
a bizarre mistake made during merging some blead/release conflicts, and
rjbs sincerely apologizes!

2 years agoadd long-form keys for newer versions in CoreList
Ricardo Signes [Fri, 11 May 2012 12:04:55 +0000]
add long-form keys for newer versions in CoreList

2 years agoVarious small grammar fixes in perldelta
Dave Rolsky [Fri, 11 May 2012 07:28:33 +0000]
Various small grammar fixes in perldelta

2 years agoperldelta: update "Updated Modules" with highlights
Ricardo Signes [Fri, 11 May 2012 02:36:37 +0000]
perldelta: update "Updated Modules" with highlights

2 years agobump the CoreList version in CoreList for 5.16
Ricardo Signes [Fri, 11 May 2012 01:24:47 +0000]
bump the CoreList version in CoreList for 5.16

2 years agoskip the porting/utils.t unless in a git checkout
Ricardo Signes [Thu, 10 May 2012 20:56:40 +0000]
skip the porting/utils.t unless in a git checkout

Today I tried to build 5.16.0-RC0 on my Linode and I got this:

  ok 78 # skip utils/cpanp-run-perl executes code in a BEGIN block which fails for empty @ARGV
  not ok 79 - utils/cpanp compiles
  # Failed test 79 - utils/cpanp compiles at porting/utils.t line 81
  #      got "defined(%hash) is deprecated at /usr/local/lib/perl5/site_perl/5.10.0/Locale/Maketext/Lexicon.pm line 307.\n\t(Maybe you should just omit the defined()?)\nutils/cpanp syntax OK\n"
  # expected "utils/cpanp syntax OK\n"

Ugh.  We really don't want this to happen to somebody else, because this
test is "do not let the developer break stuff" not "make sure the install
works."

2 years agoadd a changes_between function in Module::CoreList
Ricardo Signes [Thu, 10 May 2012 19:08:21 +0000]
add a changes_between function in Module::CoreList

2 years agopoint out "corelist --diff" in perldelta
Ricardo Signes [Thu, 10 May 2012 18:47:18 +0000]
point out "corelist --diff" in perldelta

2 years agoadd the --diff option to corelist
Ricardo Signes [Tue, 1 May 2012 22:28:43 +0000]
add the --diff option to corelist

2 years agoupdate Module::CoreList for 5.16.0
Ricardo Signes [Thu, 10 May 2012 18:37:22 +0000]
update Module::CoreList for 5.16.0

2 years agoallow for .tgz dists in the CoreList updater
Ricardo Signes [Thu, 10 May 2012 18:34:27 +0000]
allow for .tgz dists in the CoreList updater

2 years agoperldelta: the acknowledgements section!
Ricardo Signes [Thu, 10 May 2012 18:17:04 +0000]
perldelta: the acknowledgements section!

2 years agoperl5160delta: The coreargs opcode is undeserving of mention
Father Chrysostomos [Wed, 25 Apr 2012 05:23:34 +0000]
perl5160delta: The coreargs opcode is undeserving of mention

2 years agoimport perldelta from eb83ed8 into release branch
Ricardo Signes [Thu, 10 May 2012 13:38:02 +0000]
import perldelta from eb83ed8 into release branch

2 years agoperldelta: Explain stdio/sfio future deprecation.
Leon Timmermans [Thu, 3 May 2012 12:19:31 +0000]
perldelta: Explain stdio/sfio future deprecation.

2 years agookay the links to CPAN modules in the perldelta
Ricardo Signes [Wed, 2 May 2012 12:27:24 +0000]
okay the links to CPAN modules in the perldelta

2 years agoupdate .gitignore: we generate 5160delta now
Ricardo Signes [Wed, 2 May 2012 12:27:06 +0000]
update .gitignore: we generate 5160delta now

2 years agoregenerate uconfig.h
Ricardo Signes [Wed, 2 May 2012 12:17:38 +0000]
regenerate uconfig.h

2 years agoremove perl515*delta, add perl5160delta
Ricardo Signes [Wed, 2 May 2012 02:33:19 +0000]
remove perl515*delta, add perl5160delta

2 years agobump version to 5.16.0 RC0
Ricardo Signes [Wed, 2 May 2012 01:18:37 +0000]
bump version to 5.16.0 RC0

Done with:

  ./perl -Ilib Porting/bump-perl-version -i 5.15.9 5.16.0

...followed by a small edit to INSTALL and patchlevel.h.

2 years agoadd Test::More as a prereq to Makefile.PL
Dominic Hargreaves [Wed, 9 May 2012 18:09:18 +0000]
add Test::More as a prereq to Makefile.PL

2 years agosometimes fork() isn't available
Tony Cook [Wed, 9 May 2012 18:04:28 +0000]
sometimes fork() isn't available

This was amended from the original that Tony prepared in a parallel branch.

2 years ago[rt.cpan.org #61577] sockdomain and socktype undef on newly accepted sockets
Daniel Kahn Gillmor [Fri, 17 Feb 2012 22:29:14 +0000]
[rt.cpan.org #61577] sockdomain and socktype undef on newly accepted sockets

There appears to be a flaw in IO::Socket where some IO::Socket objects
are unable to properly report their socktype, sockdomain, or protocol
(they return undef, even when the underlying socket is sufficiently
initialized to have these properties).

The attached patch should cover IO::Socket objects created via accept(),
new_from_fd(), new(), and anywhere else whose details haven't been
properly cached.

No new code should be executed on IO::Socket objects whose details are
already cached and present.
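
For illustration, a minimal C sketch of one way the operating system can be asked for these properties when nothing has been cached, assuming a POSIX sockets API. The helper names fd_socktype and fd_sockdomain are hypothetical; this is not the IO::Socket patch itself, which is written in Perl.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helpers: ask the OS about a descriptor instead of relying
 * on values cached at construction time.  Both return -1 on failure. */
int fd_socktype(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;
    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    return type;                  /* e.g. SOCK_STREAM or SOCK_DGRAM */
}

int fd_sockdomain(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;
    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&ss, &len) != 0)
        return -1;
    return ss.ss_family;          /* e.g. AF_INET or AF_UNIX */
}
```

A descriptor returned by accept() can be queried this way just like one created with socket().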

2 years agoSkip Carp tests on VMS.
Craig A. Berry [Wed, 9 May 2012 23:41:05 +0000]
Skip Carp tests on VMS.

They want IPC::Open3::open3, which is not currently working.

2 years agoperl5160delta: tweaks
Father Chrysostomos [Wed, 9 May 2012 19:49:49 +0000]
perl5160delta: tweaks

sdio -> stdio
two spaces after dots

2 years agoperldelta: Explain stdio/sfio future deprecation.
Leon Timmermans [Thu, 3 May 2012 12:19:31 +0000]
perldelta: Explain stdio/sfio future deprecation.

2 years agoadd a missing blank line above =item to s2p.PL
Ricardo Signes [Wed, 9 May 2012 01:11:12 +0000]
add a missing blank line above =item to s2p.PL

2 years agoFix test failure
Father Chrysostomos [Tue, 8 May 2012 15:26:54 +0000]
Fix test failure

Lesson learnt: After switching from threaded to unthreaded and fixing
the test, switch back again and re-run the test. :-)

2 years ago[perl #112780] Don’t set cloned in-memory handles to ""
Father Chrysostomos [Tue, 8 May 2012 03:43:18 +0000]
[perl #112780] Don’t set cloned in-memory handles to ""

PerlIO::scalar’s dup function (PerlIOScalar_dup) calls the base
implementation (PerlIOBase_dup), which pushes the scalar layer on to the
new file handle.

When the scalar layer is pushed, if the mode is ">" then
PerlIOScalar_pushed sets the scalar to the empty string.  If it is
already a string, it does this simply by setting SvCUR to 0, without
touching the string buffer.

The upshot of this is that newly-cloned in-memory handles turn into
the empty string, as in this example:

use threads;
my $str = '';
open my $fh, ">", \$str;
$str = 'a';
async {
    warn $str;  # something's wrong
}->join;

This has probably always been this way.

The test suite for MSCHWERN/Test-Simple-1.005000_005.tar.gz does
something similar to this:

use threads;
my $str = '';
open my $fh, ">", \$str;
print $fh "a";
async {
    print $fh "b";
    warn $str;  # "ab" expected, but 5.15.7-9 gives "\0b"
}->join;

What was happening before commit b6597275 was that two bugs were
cancelling each other out: $str would be "" when the new thread started,
but with a string buffer containing "a" beyond the end of the string
and $fh remembering 1 as its position.  The bug fixed by b6597275 was
that writing past the end of a string through a filehandle was leaving
junk (whatever was in memory already) in the intervening space between
the old end of string and the beginning of what was being written to
the string.  This allowed "" to turn magically into "ab" when "b" was
written one character past the end of the string.  Commit b6597275
started zeroing out the intervening space in that case, causing the
cloning bug to rear its head.
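
The interaction of the two bugs can be modelled with a toy C structure. This is only a sketch: struct toy_str and its functions are hypothetical stand-ins for a scalar's buffer and SvCUR, not PerlIO::scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a string scalar: `buf` is the allocated buffer and `cur`
 * the logical length.  All names here are hypothetical. */
struct toy_str {
    char   buf[16];
    size_t cur;
};

/* Truncate the way the message describes: reset the length only. */
void toy_truncate(struct toy_str *s)
{
    assert(s != NULL);
    s->cur = 0;                   /* old bytes remain in buf */
}

/* Write `len` bytes at `pos`.  When `zero_gap` is set, the bytes between
 * the old end of string and `pos` are cleared first. */
void toy_write_at(struct toy_str *s, size_t pos, const char *data,
                  size_t len, int zero_gap)
{
    assert(s != NULL && pos + len <= sizeof s->buf);
    if (zero_gap && pos > s->cur)
        memset(s->buf + s->cur, 0, pos - s->cur);
    memcpy(s->buf + pos, data, len);
    if (pos + len > s->cur)
        s->cur = pos + len;
}
```

Starting from "a", truncating, and then writing "b" at position 1 gives "ab" when the gap is left alone (the stale "a" reappears) and "\0b" when the gap is zeroed, which matches the behaviour described for 5.15.7-9.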

This commit solves the problem by hiding the scalar temporarily
in PerlIOScalar_dup so that PerlIOScalar_pushed won’t be able to
modify it.

Should PerlIOScalar_pushed stop clobbering the string and should
PerlIOScalar_open do it instead?  Perhaps.  But that would be a bigger
change, and we are supposed to be in code freeze right now.

2 years agoIncrease $PerlIO::scalar::VERSION to 0.14
Father Chrysostomos [Mon, 7 May 2012 21:53:20 +0000]
Increase $PerlIO::scalar::VERSION to 0.14

2 years agowith 5.16.0, 5.12.x is security-only
Ricardo Signes [Mon, 7 May 2012 15:03:37 +0000]
with 5.16.0, 5.12.x is security-only

2 years agocheck for PA* in both branches of case
H.Merijn Brand [Sun, 6 May 2012 11:11:03 +0000]
check for PA* in both branches of case

(thanks ilmari for spotting)

2 years agoDisable optimizer for 32bit PA-RISC builds on HP-UX
H.Merijn Brand [Sun, 6 May 2012 11:03:08 +0000]
Disable optimizer for 32bit PA-RISC builds on HP-UX

The (ANSI) C compiler fails to compile precompiled (.i) files when both
-g and -O (all +O1 and above) are given. When -g is requested, -O, +O,
and +Onolimit are removed from the optimize flags.

This failure does not occur with the newer aCC compiler B3910B, which is
also used on HP-UX on Itanium.

The check/modification has to be done as late as possible, as the other
options, like -Duse64bitall and -DDEBUGGING, will modify the variables
that need to be checked after hints/hpux.sh has been dealt with.

2 years agoAdd --libpods back as a non-functional option to pod2html.
Steve Peters [Fri, 4 May 2012 15:51:06 +0000]
Add --libpods back as a non-functional option to pod2html.

When --libpods was removed, this broke backward compatibility with
existing uses.  This change adds back the option, but warns that
--libpods is no longer supported.

2 years agodelete PERL_YAML_BACKEND and PERL_JSON_BACKEND in t/TEST
David Golden [Fri, 4 May 2012 15:02:26 +0000]
delete PERL_YAML_BACKEND and PERL_JSON_BACKEND in t/TEST

If these are set, Parse-CPAN-Meta and other things that depend
on it may fail.

2 years agoCorrect variable name in example.
Paul Johnson [Sun, 29 Apr 2012 18:27:37 +0000]
Correct variable name in example.

As noticed by Lawrence Statton <lawrence@cluon.com>

2 years agoBump the version of perl5db since the porting scripts care
Jesse Vincent [Tue, 24 Apr 2012 23:02:34 +0000]
Bump the version of perl5db since the porting scripts care

2 years agowe no longer have in-file changelogs, since we have a version control system
Jesse Vincent [Tue, 24 Apr 2012 19:35:39 +0000]
we no longer have in-file changelogs, since we have a version control system

2 years agoWe now have version control and no longer need a changelog in perl5db
Jesse Vincent [Tue, 24 Apr 2012 19:05:55 +0000]
We now have version control and no longer need a changelog in perl5db

2 years agoutf8n_to_uvuni(): Fix broken malformation interactions
Karl Williamson [Fri, 27 Apr 2012 17:09:14 +0000]
utf8n_to_uvuni(): Fix broken malformation interactions

All code points whose UTF-8 representations start with a byte containing
either \xFE or \xFF are considered problematic because they are not
portable.  There are many such code points that are too large to
represent on a 32 or even a 64 bit platform.  Commit
eb83ed87110e41de6a4cd4463f75df60798a9243 failed to properly catch
overflow when the input flags to this function say to warn on, but
otherwise accept FE and FF sequences.  Now overflow is checked for
unconditionally.

2 years agoReally increase $File::DosGlob::VERSION to 1.07
Father Chrysostomos [Fri, 27 Apr 2012 20:31:20 +0000]
Really increase $File::DosGlob::VERSION to 1.07

I honestly thought I had run the tests, but I suppose not.

2 years agoIncrease $version::VERSION to 0.99
Father Chrysostomos [Fri, 27 Apr 2012 16:43:07 +0000]
Increase $version::VERSION to 0.99

What we have in blead right now matches that CPAN release, so this
version bump *must* happen before 5.16.

2 years agodisable codes_in_verbatim for Pod::Html
Ricardo Signes [Fri, 27 Apr 2012 01:39:33 +0000]
disable codes_in_verbatim for Pod::Html

...otherwise all our verbatim blocks will change radically!

2 years agois_utf8_char_slow(): Avoid accepting overlongs
Karl Williamson [Thu, 19 Apr 2012 04:14:15 +0000]
is_utf8_char_slow(): Avoid accepting overlongs

There are possible overlong sequences that this function blindly
accepts.  Instead of developing the code to figure this out, turn this
function into a wrapper for utf8n_to_uvuni() which already has this
check.
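
To make "overlong" concrete, here is a minimal C sketch that flags a sequence whose value would have fit in a shorter one. It assumes standard UTF-8 of 2 to 4 bytes with valid continuation bytes; is_overlong is a hypothetical name, and the commit itself avoids writing such a check by delegating to utf8n_to_uvuni().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for standard UTF-8 sequences of 2 to 4 bytes whose
 * continuation bytes are assumed valid.  A sequence is overlong when its
 * value would have fit in a shorter sequence. */
int is_overlong(const unsigned char *s, size_t len)
{
    /* Smallest code point that genuinely needs `len` bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    /* Payload bits carried by the start byte of a `len`-byte sequence. */
    static const unsigned char start_mask[5] = { 0, 0, 0x1F, 0x0F, 0x07 };

    assert(s != NULL && len >= 2 && len <= 4);
    uint32_t uv = s[0] & start_mask[len];
    for (size_t i = 1; i < len; i++)
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
    return uv < min_for_len[len];
}
```

For example, the two bytes C0 80 encode the value 0, which fits in one byte, so the sequence is overlong; C2 80 encodes 0x80 and is not.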

2 years agoperlapi: Update for changes in utf8 decoding
Karl Williamson [Thu, 19 Apr 2012 00:32:57 +0000]
perlapi: Update for changes in utf8 decoding

2 years agoutf8.c: White-space only
Karl Williamson [Mon, 23 Apr 2012 19:28:32 +0000]
utf8.c: White-space only

This outdents to account for the removal of a surrounding block.

2 years agoutf8.c: refactor utf8n_to_uvuni()
Karl Williamson [Wed, 18 Apr 2012 23:36:01 +0000]
utf8.c: refactor utf8n_to_uvuni()

The prior version had a number of issues, some of which have been taken
care of in previous commits.

The goal when presented with malformed input is to consume as few bytes
as possible, so as to position the input for the next try to the first
possible byte that could be the beginning of a character.  We don't want
to consume too few bytes, so that the next call has us thinking that
what is the middle of a character is really the beginning; nor do we
want to consume too many, so as to skip valid input characters.  (This
is forbidden by the Unicode standard because of security
considerations.)  The previous code could do both of these under various
circumstances.

In some cases it took as a given that the first byte in a character is
correct, and skipped looking at the rest of the bytes in the sequence.
This is wrong when just that first byte is garbled.  We have to look at
all bytes in the expected sequence to make sure it hasn't been
prematurely terminated from what we were led to expect by that first
byte.

Likewise when we get an overflow: we have to keep looking at each byte
in the sequence.  It may be that the initial byte was garbled, so that
it appeared that there was going to be overflow, but in reality, the
input was supposed to be a shorter sequence that doesn't overflow.  We
want to have an error on that shorter sequence, and advance the pointer
to just beyond it, which is the first position where a valid character
could start.
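
The "consume as few bytes as possible, but not too few" rule can be sketched in C as follows. This is a simplified illustration under stated assumptions (standard UTF-8 lengths only, no overlong, overflow or range checks); bytes_to_consume is a hypothetical name, not a function in utf8.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes to consume from s[0..avail) so that
 * the next decode attempt starts at the first byte that could begin a
 * character.  Standard UTF-8 lengths only; no overlong or range checks. */
size_t bytes_to_consume(const unsigned char *s, size_t avail)
{
    assert(s != NULL && avail > 0);
    unsigned char b = s[0];
    size_t want;

    if (b < 0x80)                    return 1;  /* ASCII */
    else if (b >= 0xC0 && b < 0xE0)  want = 2;
    else if (b >= 0xE0 && b < 0xF0)  want = 3;
    else if (b >= 0xF0 && b < 0xF8)  want = 4;
    else                             return 1;  /* stray continuation or unsupported byte */

    /* Walk the continuation bytes; stop at the first byte that is not one,
     * because that byte may itself start a valid character. */
    size_t i = 1;
    while (i < want && i < avail && (s[i] & 0xC0) == 0x80)
        i++;
    return i;
}
```

Given E2 82 41, the start byte promises three bytes but the third is not a continuation, so two bytes are consumed and the next attempt starts at 41.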

This fixes a long-standing TODO from an externally supplied utf8 decode
test suite.

And, the old algorithm for finding overflow failed to detect it on some
inputs.  This was spotted by Hugo van der Sanden, who suggested the new
algorithm that this commit uses, and which should work in all instances.
For example, on a 32-bit machine, any string beginning with "\xFE" and
having the next byte be either "\x86" or "\x87" overflows, but this was
missed by the old algorithm.
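
The "\xFE" example can be reproduced with a short C sketch using a 32-bit accumulator. Both functions are hypothetical: overflows_by_compare is an assumed stand-in for a wrap-around comparison, not a reconstruction of the old code, and overflows_by_mask illustrates testing the high bits before each shift, which may differ in detail from the algorithm this commit adopts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch with a 32-bit accumulator.  `s` points just past the start byte
 * and holds `n` continuation bytes (assumed well formed: 10xxxxxx).
 * Each function returns 1 if it decides the value overflows 32 bits. */

/* Wrap-around comparison: an assumed stand-in for an "old style" check. */
int overflows_by_compare(const unsigned char *s, size_t n)
{
    uint32_t uv = 0;
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t prev = uv;
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
        if (uv < prev) return 1;  /* only notices some wrap-arounds */
    }
    return 0;
}

/* Test the bits that the next shift would discard, before shifting. */
int overflows_by_mask(const unsigned char *s, size_t n)
{
    const uint32_t top6 = 0xFC000000u;
    uint32_t uv = 0;
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        if (uv & top6) return 1;
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
    }
    return 0;
}
```

For the six continuation bytes 86 80 80 80 80 80 that follow "\xFE", the accumulator reaches 0x06000000 and the final shift wraps it to 0x80000000, which is larger than the previous value, so the comparison sees nothing wrong while the mask test reports overflow.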

Another bug was that the code was careless about what happens when a
malformation occurs that the input flags allow. For example, a sequence
should not start with a continuation byte.  If that malformation is
allowed, the code pretended it is a start byte and extracts the "length"
of the sequence from it.  But pretending it is a start byte is not the
same thing as it actually being a start byte, and so there is no
extractable length in it, so the number that this code thought was
"length" was bogus.

Yet another bug fixed is that if only the warning subcategories of the
utf8 category were turned on, and not the entire utf8 category itself,
warnings were not raised that should have been.

And yet another change is that given malformed input with warnings
turned off, this function used to return whatever it had computed so
far, which is incomplete or erroneous garbage.  This commit changes to
return the REPLACEMENT CHARACTER instead.

Thanks to Hugo van der Sanden for reviewing and finding problems with an
earlier version of these commits.

2 years agoutf8n_to_uvuni: Avoid reading outside of buffer
Karl Williamson [Wed, 18 Apr 2012 22:48:29 +0000]
utf8n_to_uvuni: Avoid reading outside of buffer

Prior to this patch, if the first byte of a UTF-8 sequence indicated
that the sequence occupied n bytes, but the input parameters indicated
that fewer were available, the function still attempted to read all n.
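
A minimal C sketch of the guard this implies: the length announced by the first byte is capped by the number of bytes the caller says are available. The names announced_len and readable_len are hypothetical, and only standard UTF-8 lengths are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length announced by a start byte (standard UTF-8 only). */
size_t announced_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC0 && b < 0xE0) return 2;
    if (b >= 0xE0 && b < 0xF0) return 3;
    if (b >= 0xF0 && b < 0xF8) return 4;
    return 1; /* continuation or unsupported byte: treat as one unit */
}

/* Number of bytes that may safely be examined: never more than `avail`. */
size_t readable_len(const unsigned char *s, size_t avail)
{
    assert(s != NULL);
    if (avail == 0) return 0;
    size_t want = announced_len(s[0]);
    return want <= avail ? want : avail;
}
```

A caller that receives fewer bytes than announced_len() reports can then treat the input as a truncated sequence instead of reading past the buffer.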

2 years agoutf8.c: Clarify and correct pod
Karl Williamson [Wed, 18 Apr 2012 22:35:39 +0000]
utf8.c: Clarify and correct pod

Some of these were spotted by Hugo van der Sanden

2 years agoutf8.c: Use macros instead of if..else.. sequence
Karl Williamson [Wed, 18 Apr 2012 22:20:22 +0000]
utf8.c: Use macros instead of if..else.. sequence

There are two existing macros that do the job that this longish sequence
does.  One, UTF8SKIP(), does an array lookup and is very likely to be in
the machine's cache as it is used ubiquitously when processing UTF-8.
The other is a simple test and shift.  These simplify the code and
should speed things up as well.
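
As a rough picture of the array-lookup approach, here is a C sketch of a 256-entry table indexed by the first byte of a sequence. It is a simplified stand-in, not Perl's UTF8SKIP(): the names skip_table, init_skip_table and seq_len are hypothetical, and only standard UTF-8 lengths up to 4 bytes are covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table, not Perl's UTF8SKIP: length implied by a start byte
 * under standard (4-byte maximum) UTF-8.  Continuation and unsupported
 * bytes map to 1 so that a scanner always advances. */
static unsigned char skip_table[256];
static int skip_table_ready = 0;

void init_skip_table(void)
{
    for (int b = 0; b < 256; b++) {
        if (b < 0x80)       skip_table[b] = 1;  /* ASCII */
        else if (b < 0xC0)  skip_table[b] = 1;  /* continuation byte */
        else if (b < 0xE0)  skip_table[b] = 2;
        else if (b < 0xF0)  skip_table[b] = 3;
        else if (b < 0xF8)  skip_table[b] = 4;
        else                skip_table[b] = 1;  /* outside this simplified range */
    }
    skip_table_ready = 1;
}

/* One array lookup replaces a chain of range comparisons. */
size_t seq_len(const unsigned char *p)
{
    assert(p != NULL);
    if (!skip_table_ready)
        init_skip_table();
    return skip_table[*p];
}
```

The range comparisons run once, when the table is filled; every later call is a single indexed load.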