tag: v5.16.0-RC2
Tom Hukins [Wed, 16 May 2012 01:42:48 +0000]
perldelta: fix bad references to "unicode_strings"

The documentation written for 2e2b2571 erroneously mentions
"unicode_semantics" instead of "unicode_strings".

Ricardo Signes [Wed, 16 May 2012 01:34:00 +0000]
prevent PERL_UNICODE from affecting t/mro/package_aliases_utf8.t

Ricardo Signes [Wed, 16 May 2012 01:22:21 +0000]
perldelta: known issue: t/op/filetest.t

Andy Dougherty [Wed, 16 May 2012 01:16:45 +0000]
note the gcc -O2 and link-time-optimization problem

Ricardo Signes [Wed, 16 May 2012 01:13:08 +0000]
our next release is RC2

Ricardo Signes [Tue, 15 May 2012 21:59:48 +0000]
perldelta: Americanise spellings

Ricardo Signes [Tue, 15 May 2012 11:41:36 +0000]
reflect Socket update in Module::CoreList

Tony Cook [Tue, 15 May 2012 09:22:30 +0000]
Update Socket to CPAN version 2.001

2.001   CHANGES:
         * Apply (modified) patch from ppisar@redhat.com to fix memory
           addressing bug with Zero() - RT76067
         * Document that inet_pton() doesn't work on hostnames, only textual
           addresses - RT76010
         * Ignore any existing-but-undefined hints hash members to
           getaddrinfo()

Done for the critical RT76067 fix.

Ricardo Signes [Tue, 15 May 2012 11:27:17 +0000]
perldelta typo fixes (from mauke)

Father Chrysostomos [Tue, 15 May 2012 20:53:29 +0000]
Revert part of 34d9f36f9

I was going to apply this after code freeze, but I made a mistake
when switching branches locally and ended up combining it with
another commit.

Father Chrysostomos [Tue, 15 May 2012 20:51:47 +0000]
AUTHORS: Shirataka -> Shirakata

Father Chrysostomos [Tue, 15 May 2012 20:39:22 +0000]
perldelta: extraneous double spaces

Tom Christiansen [Tue, 15 May 2012 20:38:09 +0000]
v5.16 RC0 perldelta cleanup

Below is a patch with some simple typo and verbosity cleanup in
the current pod/perldelta.pod in blead as of ~30 minutes ago.

Shirakata Kentaro [Tue, 15 May 2012 20:02:50 +0000]
[perl #112944] perldelta: typo

Father Chrysostomos [Tue, 15 May 2012 19:58:42 +0000]
Add Shirataka Kentaro to AUTHORS

Ricardo Signes [Tue, 15 May 2012 02:59:38 +0000]
add 5.16.0-RC0 and -RC1 to perlhist

tag: v5.16.0-RC1
Ricardo Signes [Tue, 15 May 2012 01:52:47 +0000]
minor grammar correction

thanks, Jim Keenan!

Ricardo Signes [Tue, 15 May 2012 01:49:01 +0000]
add Daniel Kahn Gillmor to AUTHORS

Ricardo Signes [Tue, 15 May 2012 01:22:06 +0000]
document the as-yet-unexplained Win32 test hanging

We will ship with this unfixed unless someone comes up with the
cure in the next week.

Ricardo Signes [Tue, 15 May 2012 00:53:50 +0000]
perldelta: fix a noun/verb number agreement

reported by mauke

Ricardo Signes [Tue, 15 May 2012 00:15:59 +0000]
skip t/win32/runenv.t unless -DPERL_IMPLICIT_SYS

this test fails without PERL_IMPLICIT_SYS, as reported by Steve
Hay in <CADED=K4EqXkJa2uC13wVYY_=uGDCx=uQ_rXu3Me4+3FvVM8D+g@mail.gmail.com>

Ricardo Signes [Mon, 14 May 2012 19:49:27 +0000]
Revert fixes for [rt.cpan.org #61577]

These changes introduced some test failures on AIX and other platforms,
and rather than dig around for more failing platforms during the RCx
period, we will revert this to reapply later when it is more tested.

This reverts commit 01b71c89216c9f447494638a5d108e13c45c3863.

This reverts commit b6903614db213f07401367249dc84c896eb099b7.

This reverts commit 271d04eee1933df0971f54f7bf9a5ca3575e7e6a.

Ricardo Signes [Mon, 14 May 2012 16:26:36 +0000]
next release will be RC1

Ricardo Signes [Mon, 14 May 2012 16:26:24 +0000]
perldelta: fix version named in acknowledgements

Nicholas Clark [Mon, 14 May 2012 09:17:06 +0000]
In the Linux hints, invoke gcc with LANG and LC_ALL set to "C".

The output of gcc -print-search-dirs is subject to localisation, which means
that the literal text "libraries" will not be present if the user has a
non-English locale, and we won't determine the correct path for libraries
such as -lm, breaking the build. Problem diagnosed by Alexander Hartmaier.

Paul Johnson [Mon, 14 May 2012 08:45:10 +0000]
Don't test that errno is still 0 after POSIX::f?pathconf

I think the best we can do with respect to the f?pathconf tests is to
make sure that the perl call doesn't die, and that the system call
doesn't fail.  And it's arguable we should only be testing the former.
But since we've been testing more than this anyway, it's probably safe
to test both.

With respect to the sysconf call, I think we shouldn't test more than
that perl doesn't die.  Any further testing would require different
tests based on the argument being passed in.  Before doing that, it's
probably worth considering the purpose of the tests.  I don't think we
really want to test that POSIX has been implemented correctly, only that
our layer over it is correctly implemented.
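The reasoning here matches POSIX, which gives pathconf() three possible outcomes:

- It returns -1 without touching errno when there is no limit.
- It returns -1 with errno set on failure.
- After a successful call, errno's value is unspecified.

So asserting that errno is still 0 is not a portable check. The C sketch below shows how a caller can tell the three outcomes apart. `query_pathconf` and `enum pc_result` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Outcome of a pathconf() query, following POSIX semantics. */
enum pc_result { PC_LIMIT, PC_NO_LIMIT, PC_ERROR };

/* errno is cleared first and consulted only when pathconf() returns -1:
 * unchanged errno means "no limit", a non-zero errno means failure.
 * After a successful call errno is not inspected at all. */
enum pc_result query_pathconf(const char *path, int name, long *value)
{
    assert(path != NULL && value != NULL);
    errno = 0;
    *value = pathconf(path, name);
    if (*value != -1)
        return PC_LIMIT;
    return errno == 0 ? PC_NO_LIMIT : PC_ERROR;
}
```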

This fixes RT #112866.

Karl Williamson [Mon, 14 May 2012 15:47:36 +0000]
perldelta: Remove duplicate paragraph

Ricardo Signes [Fri, 11 May 2012 22:00:03 +0000]
study as no-op is a bugfix, not performance enhancement

Father Chrysostomos [Fri, 11 May 2012 16:55:09 +0000]
perldelta: Add ‘(5.14.2)’ markers

Father Chrysostomos [Fri, 11 May 2012 16:50:20 +0000]
perldelta: Explain the ‘(5.14.1)’ markers

Father Chrysostomos [Fri, 11 May 2012 16:48:49 +0000]
perldelta: Use single quotes in C<>

C<> renders as "..." in nroff, so C<... "..." ...> ends up looking weird.

Karl Williamson [Fri, 11 May 2012 16:44:10 +0000]
perldelta: Use L<> to link to changed module pods

Spotted by Vincent Pit

Karl Williamson [Fri, 11 May 2012 16:35:13 +0000]
perldelta: Reorder to avoid pronoun confusion

Spotted by Zsbán Ambrus

Karl Williamson [Fri, 11 May 2012 16:29:31 +0000]
perldelta: typo

Spotted by Zsbán Ambrus

Karl Williamson [Fri, 11 May 2012 16:25:15 +0000]
perldelta: Add future deprecation text about \Q

Father Chrysostomos [Fri, 11 May 2012 16:30:25 +0000]
perldelta: misuse of commas

Father Chrysostomos [Fri, 11 May 2012 16:27:08 +0000]
perldelta: typo

Father Chrysostomos [Fri, 11 May 2012 16:26:43 +0000]
perldelta: [rt.cpan.org #0], not RT 0

Father Chrysostomos [Fri, 11 May 2012 16:24:14 +0000]
Rmv second ‘version’ in upgrade notices

Some of these were like this:

...from version 123 to version 456.

and some like this:

...from version 123 to 456.

Since the former is wordy, I’ve used the latter throughout.

Father Chrysostomos [Fri, 11 May 2012 16:20:46 +0000]
perldelta: Consistent fullstops for ‘upgraded from x to x’

Father Chrysostomos [Fri, 11 May 2012 16:18:44 +0000]
perldelta: consistent spaces after dots

Father Chrysostomos [Fri, 11 May 2012 16:17:10 +0000]
perldelta: consistent semicolons in CGI example

Father Chrysostomos [Fri, 11 May 2012 16:16:43 +0000]
perldelta: grammar

Father Chrysostomos [Fri, 11 May 2012 16:16:11 +0000]
perldelta: fix capitalisation

Karl Williamson [Fri, 11 May 2012 15:40:20 +0000]
perldelta: Mention 5.14.0, not 5.13.6

Karl Williamson [Fri, 11 May 2012 15:37:37 +0000]
perldelta: Correct statement

After I wrote the text in an earlier perldelta, from which this one is
extracted, it was pointed out to me that running out of memory is
extremely unlikely; I had not bothered to really do the math.

Karl Williamson [Fri, 11 May 2012 15:36:45 +0000]
perldelta: correct statement

Karl Williamson [Fri, 11 May 2012 15:33:04 +0000]
perldelta: grammar

Ricardo Signes [Fri, 11 May 2012 14:06:39 +0000]
perldelta: slightly expand and clarify policy note

Ricardo Signes [Fri, 11 May 2012 12:18:05 +0000]
perldelta: break Pod:: deprecations onto two items

Ricardo Signes [Fri, 11 May 2012 12:07:25 +0000]
Revert "perl5160delta: The coreargs opcode is undeserving of mention"

This reverts commit 1061b56a7b2cc84a8ac96a405e5b8c185936605c.

This is a reversion of a reversion.  The reversion in 1061b56a7b2cc was
a bizarre mistake made during merging some blead/release conflicts, and
rjbs sincerely apologizes!

Ricardo Signes [Fri, 11 May 2012 12:04:55 +0000]
add long-form keys for newer versions in CoreList

Dave Rolsky [Fri, 11 May 2012 07:28:33 +0000]
Various small grammar fixes in perldelta

Ricardo Signes [Fri, 11 May 2012 02:36:37 +0000]
perldelta: update "Updated Modules" with highlights

Ricardo Signes [Fri, 11 May 2012 01:24:47 +0000]
bump the CoreList version in CoreList for 5.16

Ricardo Signes [Thu, 10 May 2012 20:56:40 +0000]
skip the porting/utils.t unless in a git checkout

Today I tried to build 5.16.0-RC0 on my Linode and I got this:

  ok 78 # skip utils/cpanp-run-perl executes code in a BEGIN block which fails for empty @ARGV
  not ok 79 - utils/cpanp compiles
  # Failed test 79 - utils/cpanp compiles at porting/utils.t line 81
  #      got "defined(%hash) is deprecated at /usr/local/lib/perl5/site_perl/5.10.0/Locale/Maketext/Lexicon.pm line 307.\n\t(Maybe you should just omit the defined()?)\nutils/cpanp syntax OK\n"
  # expected "utils/cpanp syntax OK\n"

Ugh.  We really don't want this to happen to somebody else, because this
test is "do not let the developer break stuff" not "make sure the install
works."

Ricardo Signes [Thu, 10 May 2012 19:08:21 +0000]
add a changes_between function in Module::CoreList

Ricardo Signes [Thu, 10 May 2012 18:47:18 +0000]
point out "corelist --diff" in perldelta

Ricardo Signes [Tue, 1 May 2012 22:28:43 +0000]
add the --diff option to corelist

Ricardo Signes [Thu, 10 May 2012 18:37:22 +0000]
update Module::CoreList for 5.16.0

Ricardo Signes [Thu, 10 May 2012 18:34:27 +0000]
allow for .tgz dists in the CoreList updater

Ricardo Signes [Thu, 10 May 2012 18:17:04 +0000]
perldelta: the acknowledgements section!

Father Chrysostomos [Wed, 25 Apr 2012 05:23:34 +0000]
perl5160delta: The coreargs opcode is undeserving of mention

Ricardo Signes [Thu, 10 May 2012 13:38:02 +0000]
import perldelta from eb83ed8 into release branch

Leon Timmermans [Thu, 3 May 2012 12:19:31 +0000]
perldelta: Explain stdio/sfio future deprecation.

Ricardo Signes [Wed, 2 May 2012 12:27:24 +0000]
okay the links to CPAN modules in the perldelta

Ricardo Signes [Wed, 2 May 2012 12:27:06 +0000]
update .gitignore: we generate 5160delta now

Ricardo Signes [Wed, 2 May 2012 12:17:38 +0000]
regenerate uconfig.h

Ricardo Signes [Wed, 2 May 2012 02:33:19 +0000]
remove perl515*delta, add perl5160delta

Ricardo Signes [Wed, 2 May 2012 01:18:37 +0000]
bump version to 5.16.0 RC0

Done with:

  ./perl -Ilib Porting/bump-perl-version -i 5.15.9 5.16.0

...followed by a small edit to INSTALL and patchlevel.h.

Dominic Hargreaves [Wed, 9 May 2012 18:09:18 +0000]
add Test::More as a prereq to Makefile.PL

Tony Cook [Wed, 9 May 2012 18:04:28 +0000]
sometimes fork() isn't available

This was amended from the original Tony prepared in a parallel branch

Daniel Kahn Gillmor [Fri, 17 Feb 2012 22:29:14 +0000]
[rt.cpan.org #61577] sockdomain and socktype undef on newly accepted sockets

There appears to be a flaw in IO::Socket where some IO::Socket objects
are unable to properly report their socktype, sockdomain, or protocol
(they return undef, even when the underlying socket is sufficiently
initialized to have these properties).

The attached patch should cover IO::Socket objects created via accept(),
new_from_fd(), new(), and anywhere else whose details haven't been
properly cached.

No new code should be executed on IO::Socket objects whose details are
already cached and present.
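A socket's type can be read from the descriptor itself, rather than from state cached when the object was created, by calling getsockopt() with SO_TYPE, which is a portable option. The C sketch below only illustrates that idea; it is not the IO::Socket patch, and `socket_type_of` is a hypothetical name. Querying the domain and protocol the same way (SO_DOMAIN, SO_PROTOCOL) is platform-specific, so those are omitted.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel for a socket's type instead of
 * trusting a cached value. This works equally for descriptors returned by
 * accept() or obtained from elsewhere. Returns a SOCK_* value, or -1 on
 * error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    assert(len == sizeof type);
    return type;
}
```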

Craig A. Berry [Wed, 9 May 2012 23:41:05 +0000]
Skip Carp tests on VMS.

They want IPC::Open3::open3, which is not currently working.

Father Chrysostomos [Wed, 9 May 2012 19:49:49 +0000]
perl5160delta: tweaks

sdio -> stdio
two spaces after dots

Leon Timmermans [Thu, 3 May 2012 12:19:31 +0000]
perldelta: Explain stdio/sfio future deprecation.

Ricardo Signes [Wed, 9 May 2012 01:11:12 +0000]
add a missing blink above =item to s2p.PL

Father Chrysostomos [Tue, 8 May 2012 15:26:54 +0000]
Fix test failure

Lesson learnt: After switching from threaded to unthreaded and fixing
the test, switch back again and re-run the test. :-)

Father Chrysostomos [Tue, 8 May 2012 03:43:18 +0000]
[perl #112780] Don’t set cloned in-memory handles to ""

PerlIO::scalar’s dup function (PerlIOScalar_dup) calls the base
implementation (PerlIOBase_dup), which pushes the scalar layer on to the
new file handle.

When the scalar layer is pushed, if the mode is ">" then
PerlIOScalar_pushed sets the scalar to the empty string.  If it is
already a string, it does this simply by setting SvCUR to 0, without
touching the string buffer.

The upshot of this is that newly-cloned in-memory handles turn into
the empty string, as in this example:

use threads;
my $str = '';
open my $fh, ">", \$str;
$str = 'a';
async {
    warn $str;  # something's wrong
}->join;

This has probably always been this way.

The test suite for MSCHWERN/Test-Simple-1.005000_005.tar.gz does
something similar to this:

use threads;
my $str = '';
open my $fh, ">", \$str;
print $fh "a";
async {
    print $fh "b";
    warn $str;  # "ab" expected, but 5.15.7-9 gives "\0b"
}->join;

What was happening before commit b6597275 was that two bugs were
cancelling each other out: $str would be "" when the new thread started,
but with a string buffer containing "a" beyond the end of the string
and $fh remembering 1 as its position.  The bug fixed by b6597275 was
that writing past the end of a string through a filehandle was leaving
junk (whatever was in memory already) in the intervening space between
the old end of string and the beginning of what was being written to
the string.  This allowed "" to turn magically into "ab" when "b" was
written one character past the end of the string.  Commit b6597275
started zeroing out the intervening space in that case, causing the
cloning bug to rear its head.
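The interaction between the two bugs can be modelled with a toy buffer whose logical length can shrink without the bytes being erased, much as setting SvCUR to 0 leaves the old contents in place. In this C sketch, `struct mem_str` and `mem_write_at` are hypothetical. Zero-filling the gap reproduces the "\0b" seen with 5.15.7-9. Without the memset, the stale "a" would resurface and the result would read "ab".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of an in-memory filehandle target: a buffer whose
 * logical length can be shorter than the bytes physically present. */
struct mem_str {
    char   buf[64];
    size_t len;       /* logical string length, like SvCUR */
};

/* Write n bytes at offset pos. Any gap between the current logical end
 * and pos is zero-filled instead of exposing leftover buffer contents. */
void mem_write_at(struct mem_str *s, size_t pos, const char *data, size_t n)
{
    assert(pos + n <= sizeof s->buf);
    if (pos > s->len)
        memset(s->buf + s->len, '\0', pos - s->len);
    memcpy(s->buf + pos, data, n);
    if (pos + n > s->len)
        s->len = pos + n;
}
```

Usage mirroring the thread case: start from `{ "a", 1 }`, set `len` to 0 (the clone-time truncation), then write "b" at position 1.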

This commit solves the problem by hiding the scalar temporarily
in PerlIOScalar_dup so that PerlIOScalar_pushed won’t be able to
modify it.
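Reduced to a toy model, the fix works in three steps. The dup routine clears its reference to the target, then delegates to the base dup, which runs the push hook. Finally it restores the reference. Every name in this C sketch is a hypothetical stand-in, not PerlIO's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the scalar and handle. */
struct target { size_t len; };
struct handle { struct target *sv; char mode; size_t pos; };

/* Like PerlIOScalar_pushed: opening for ">" truncates the target. */
void layer_pushed(struct handle *h)
{
    if (h->sv != NULL && h->mode == '>')
        h->sv->len = 0;
}

/* Like PerlIOBase_dup: copy the state, then push the layer on the copy. */
void base_dup(struct handle *dst, const struct handle *src)
{
    *dst = *src;
    layer_pushed(dst);
}

/* The fix: hide the target during the base dup, then restore it, so the
 * push hook cannot clobber the string. */
void scalar_dup(struct handle *dst, const struct handle *src)
{
    assert(dst != NULL && src != NULL);
    struct handle tmp = *src;
    struct target *saved = tmp.sv;
    tmp.sv = NULL;
    base_dup(dst, &tmp);
    dst->sv = saved;
}
```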

Should PerlIOScalar_pushed stop clobbering the string and should
PerlIOScalar_open do it instead?  Perhaps.  But that would be a bigger
change, and we are supposed to be in code freeze right now.

Father Chrysostomos [Mon, 7 May 2012 21:53:20 +0000]
Increase $PerlIO::scalar::VERSION to 0.14

Ricardo Signes [Mon, 7 May 2012 15:03:37 +0000]
with 5.16.0, 5.12.x is security-only

H.Merijn Brand [Sun, 6 May 2012 11:11:03 +0000]
check for PA* in both branches of case

(thanks ilmari for spotting)

H.Merijn Brand [Sun, 6 May 2012 11:03:08 +0000]
Disable optimizer for 32bit PA-RISC builds on HP-UX

The (ANSI) C compiler fails to compile precompiled (.i) files when both
-g and -O (all +O1 and above) are given. When -g is requested, -O, +O,
and +Onolimit are removed from the optimize flags.

This failure does not occur with the newer aCC compiler B3910B, which is
also used on HP-UX on Itanium.

The check/modification has to be done as late as possible, because other
options, such as -Duse64bitall and -DDEBUGGING, modify the variables
that need to be checked after hints/hpux.sh has been dealt with.
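The filtering rule is simple enough to state as code. The real change is shell code in the HP-UX hints. This C sketch only shows the rule: drop -O... and +O... tokens (which covers +Onolimit) when debugging is requested. `strip_optimize` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy the space-separated flags in `in` to `out`, dropping tokens that
 * start with -O or +O when `debugging` is true. Illustrative only. */
void strip_optimize(const char *in, bool debugging, char *out, size_t outsz)
{
    char tmp[256];
    size_t used = 0;
    assert(strlen(in) < sizeof tmp && outsz > 0);
    strcpy(tmp, in);
    out[0] = '\0';
    for (char *tok = strtok(tmp, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (debugging && (strncmp(tok, "-O", 2) == 0 || strncmp(tok, "+O", 2) == 0))
            continue;
        size_t n = strlen(tok);
        if (used + n + 2 > outsz)      /* room for separator and NUL */
            break;
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, tok, n + 1); /* copies the terminating NUL */
        used += n;
    }
}
```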

Steve Peters [Fri, 4 May 2012 15:51:06 +0000]
Add --libpods back as a non-functional option to pod2html.

When --libpods was removed, this broke backward compatibility with
existing uses.  This change adds back the option, but warns that
--libpods is no longer supported.

David Golden [Fri, 4 May 2012 15:02:26 +0000]
delete PERL_YAML_BACKEND and PERL_JSON_BACKEND in t/TEST

If these are set, Parse-CPAN-Meta and other things that depend
on it may fail.

Paul Johnson [Sun, 29 Apr 2012 18:27:37 +0000]
Correct variable name in example.

As noticed by Lawrence Statton <lawrence@cluon.com>

Jesse Vincent [Tue, 24 Apr 2012 23:02:34 +0000]
Bump the version of perl5db since the porting scripts care

Jesse Vincent [Tue, 24 Apr 2012 19:35:39 +0000]
we no longer have in-file changelogs, since we have a version control system

Jesse Vincent [Tue, 24 Apr 2012 19:05:55 +0000]
We now have version control and no longer need a changelog in perl5db

Karl Williamson [Fri, 27 Apr 2012 17:09:14 +0000]
utf8n_to_uvuni(): Fix broken malformation interactions

All code points whose UTF-8 representations start with a byte containing
either \xFE or \xFF are considered problematic because they are not
portable.  There are many such code points that are too large to
represent on a 32-bit or even a 64-bit platform.  Commit
eb83ed87110e41de6a4cd4463f75df60798a9243 failed to properly catch
overflow when the input flags to this function say to warn on, but
otherwise accept FE and FF sequences.  Now overflow is checked for
unconditionally.

Father Chrysostomos [Fri, 27 Apr 2012 20:31:20 +0000]
Really increase $File::DosGlob::VERSION to 1.07

I honestly thought I had run the tests, but I suppose not.

Father Chrysostomos [Fri, 27 Apr 2012 16:43:07 +0000]
Increase $version::VERSION to 0.99

What we have in blead right now matches that CPAN release, so this
version bump *must* happen before 5.16.

Ricardo Signes [Fri, 27 Apr 2012 01:39:33 +0000]
disable codes_in_verbatim for Pod::Html

...otherwise all our verbatim blocks will change radically!

Karl Williamson [Thu, 19 Apr 2012 04:14:15 +0000]
is_utf8_char_slow(): Avoid accepting overlongs

There are possible overlong sequences that this function blindly
accepts.  Instead of developing the code to figure this out, turn this
function into a wrapper for utf8n_to_uvuni() which already has this
check.
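An overlong sequence is one whose decoded value would fit in fewer bytes. A decoder that already computes the value can therefore detect it by comparing against the smallest code point that needs that length. The C sketch below shows that comparison, assuming lengths up to 7 bytes as in Perl's extended UTF-8. `is_overlong` is a hypothetical name, not a perl function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if code point cp, decoded from a len-byte sequence, could have been
 * encoded in fewer bytes. Thresholds for len 1..4 are standard UTF-8;
 * 5..7 continue the same 5-bits-per-extra-byte pattern. */
bool is_overlong(uint64_t cp, int len)
{
    static const uint64_t min_for_len[8] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000
    };
    assert(len >= 1 && len <= 7);
    return cp < min_for_len[len];
}
```

For example, "\xC0\x80" decodes to 0 with length 2, which is overlong.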

Karl Williamson [Thu, 19 Apr 2012 00:32:57 +0000]
perlapi: Update for changes in utf8 decoding

Karl Williamson [Mon, 23 Apr 2012 19:28:32 +0000]
utf8.c: White-space only

This outdents to account for the removal of a surrounding block.

Karl Williamson [Wed, 18 Apr 2012 23:36:01 +0000]
utf8.c: refactor utf8n_to_uvuni()

The prior version had a number of issues, some of which have been taken
care of in previous commits.

The goal when presented with malformed input is to consume as few bytes
as possible, so as to position the input for the next try to the first
possible byte that could be the beginning of a character.  We don't want
to consume too few bytes, so that the next call has us thinking that
what is the middle of a character is really the beginning; nor do we
want to consume too many, so as to skip valid input characters.  (This
is forbidden by the Unicode standard because of security
considerations.)  The previous code could do both of these under various
circumstances.

In some cases it took as a given that the first byte in a character is
correct, and skipped looking at the rest of the bytes in the sequence.
This is wrong when just that first byte is garbled.  We have to look at
all bytes in the expected sequence to make sure it hasn't been
prematurely terminated from what we were led to expect by that first
byte.

Likewise when we get an overflow: we have to keep looking at each byte
in the sequence.  It may be that the initial byte was garbled, so that
it appeared that there was going to be overflow, but in reality, the
input was supposed to be a shorter sequence that doesn't overflow.  We
want to have an error on that shorter sequence, and advance the pointer
to just beyond it, which is the first position where a valid character
could start.
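The "consume as few bytes as possible" rule works like this: examine every byte the start byte promises, and stop at the first one that is not a continuation byte, or at the end of the available input. That byte may begin the next character. This C sketch handles only standard sequences of up to four bytes, and `bytes_to_consume` and `expected_len` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Expected sequence length from a start byte; 0 for bytes that cannot
 * start a sequence (continuation bytes 0x80..0xBF, and 0xF8..0xFF, which
 * this standard-UTF-8 sketch does not handle). */
static int expected_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c < 0xC0) return 0;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    return 0;
}

/* How many bytes to consume at s (avail > 0 bytes available). Every byte
 * of the expected sequence is checked, and reading never goes past avail;
 * on malformed input, consumption stops at the first byte that could start
 * a new character. */
size_t bytes_to_consume(const unsigned char *s, size_t avail, int *malformed)
{
    assert(s != NULL && avail > 0 && malformed != NULL);
    int want = expected_len(s[0]);
    if (want == 0) {                 /* cannot start a character */
        *malformed = 1;
        return 1;
    }
    size_t i = 1;
    while (i < (size_t)want && i < avail && (s[i] & 0xC0) == 0x80)
        i++;
    *malformed = (i != (size_t)want);
    return i;
}
```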

This fixes a long-standing TODO from an externally supplied utf8 decode
test suite.

And, the old algorithm for finding overflow failed to detect it on some
inputs.  This was spotted by Hugo van der Sanden, who suggested the new
algorithm that this commit uses, and which should work in all instances.
For example, on a 32-bit machine, any string beginning with "\xFE" and
having the next byte be either "\x86" or "\x87" overflows, but this was
missed by the old algorithm.
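A general way to catch overflow is to check, before each 6-bit shift, whether the accumulated value still has room. The C sketch below models a 32-bit UV with uint32_t and assumes the continuation bytes were already validated. It shows one standard technique, not necessarily the exact algorithm the commit adopted. `decode32_overflows` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate the payload of an n-byte sequence (2 <= n <= 7; 0xFE starts
 * a 7-byte sequence as in Perl's extended UTF-8) into 32 bits. The start
 * byte carries (7 - n) payload bits and each continuation byte carries 6.
 * Returns true on overflow; otherwise stores the value in *out. */
bool decode32_overflows(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(n >= 2 && n <= 7);
    uint32_t uv = s[0] & ((1u << (7 - n)) - 1);
    for (size_t i = 1; i < n; i++) {
        if (uv > (UINT32_MAX >> 6))  /* next shift would drop high bits */
            return true;
        uv = (uv << 6) | (s[i] & 0x3F);
    }
    *out = uv;
    return false;
}
```

After 0xFE, which contributes no payload bits, the second byte's low six bits become bits 30 to 35 of the result. Any second byte from 0x84 upwards, including 0x86 and 0x87, therefore sets a bit above bit 31 and overflows.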

Another bug was that the code was careless about what happens when a
malformation occurs that the input flags allow. For example, a sequence
should not start with a continuation byte.  If that malformation is
allowed, the code pretended it was a start byte and extracted the "length"
of the sequence from it.  But pretending it is a start byte is not the
same thing as it actually being a start byte, and so there is no
extractable length in it, so the number that this code thought was
"length" was bogus.

Yet another bug fixed is that if only the warning subcategories of the
utf8 category were turned on, and not the entire utf8 category itself,
warnings were not raised that should have been.

And yet another change is that given malformed input with warnings
turned off, this function used to return whatever it had computed so
far, which is incomplete or erroneous garbage.  This commit changes it to
return the REPLACEMENT CHARACTER instead.
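Two of these behaviours fit in a tiny decoder. It consumes a single byte when a continuation byte appears where a start byte belongs, and it returns U+FFFD instead of a partially computed value. This C sketch covers only one- and two-byte sequences and treats everything else as malformed. `decode_small` is a hypothetical name, and the real function's flags and warnings are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Decode one character from s (avail > 0). On any malformation, including
 * a stray continuation byte, it consumes exactly one byte and returns
 * U+FFFD; no length is extracted from a byte that is not a start byte. */
uint32_t decode_small(const unsigned char *s, size_t avail, size_t *consumed)
{
    assert(s != NULL && avail > 0 && consumed != NULL);
    *consumed = 1;
    if (s[0] < 0x80)
        return s[0];
    if (s[0] >= 0xC2 && s[0] <= 0xDF && avail >= 2 && (s[1] & 0xC0) == 0x80) {
        *consumed = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    return REPLACEMENT_CHARACTER;
}
```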

Thanks to Hugo van der Sanden for reviewing and finding problems with an
earlier version of these commits.

Karl Williamson [Wed, 18 Apr 2012 22:48:29 +0000]
utf8n_to_uvuni: Avoid reading outside of buffer

Prior to this patch, if the first byte of a UTF-8 sequence indicated
that the sequence occupied n bytes, but the input parameters indicated
that fewer were available, the code still attempted to read all n bytes.

Karl Williamson [Wed, 18 Apr 2012 22:35:39 +0000]
utf8.c: Clarify and correct pod

Some of these were spotted by Hugo van der Sanden

Karl Williamson [Wed, 18 Apr 2012 22:20:22 +0000]
utf8.c: Use macros instead of if..else.. sequence

There are two existing macros that do the job that this longish sequence
does.  One, UTF8SKIP(), does an array lookup and is very likely to be in
the machine's cache as it is used ubiquitously when processing UTF-8.
The other is a simple test and shift.  These simplify the code and
should speed things up as well.
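The table-lookup idea behind a UTF8SKIP()-style macro replaces an if/else chain with a single array index. This C sketch builds its table from the chain for brevity and lumps 0xFF in with 0xFE. It does not reproduce perl's actual table or macro definitions, and `seq_len_chain`, `init_skip` and `SEQ_LEN` are hypothetical names.

```c
#include <assert.h>

/* The long-hand form: an if/else chain on the start byte. */
int seq_len_chain(unsigned char c)
{
    if (c < 0xC0) return 1;   /* ASCII, or a stray continuation byte */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;
    if (c < 0xFE) return 6;
    return 7;                 /* 0xFE; 0xFF is not modelled separately */
}

/* The table form: one array index per byte, which stays hot in the cache
 * when every UTF-8 operation uses it. */
static unsigned char skip[256];

void init_skip(void)
{
    for (int c = 0; c < 256; c++)
        skip[c] = (unsigned char)seq_len_chain((unsigned char)c);
    assert(skip['A'] == 1 && skip[0xC3] == 2);  /* sanity check */
}

#define SEQ_LEN(c) (skip[(unsigned char)(c)])
```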