Karl Williamson [Tue, 7 May 2013 16:04:40 +0000]
perlapi: Move 'experimental' warning to front of entries
In a long multi-paragraph entry, the fact that the described function is
considered experimental may be lost, as it comes at the end. This just
moves it to the front.
Karl Williamson [Sat, 4 May 2013 19:29:15 +0000]
pp.c, regexec.c: Declare buffers large enough
These three buffers are not declared with the proper size. There is a
#define to use in these declarations, so use it. This matters only on
EBCDIC platforms, where, prior to this commit, the one in pp.c could
cause a buffer overrun. The others shouldn't, because the data stored in
them is of a known, smaller size.
Karl Williamson [Sat, 4 May 2013 19:27:19 +0000]
pp.c: Don't declare array too large
There is an existing #define that gives the correct size for this
buffer. There is no need to calculate it (which actually gives a larger
value than needed).
Karl Williamson [Wed, 24 Apr 2013 00:58:54 +0000]
XXX experimental pp_pack.c: 'u'
Karl Williamson [Tue, 26 Feb 2013 00:25:08 +0000]
XXX CPAN Normalize
This converts Unicode::Normalize to use the native tables that are used
by Perl starting in XXX, while using the Unicode-ordered ones that were
used before then.
Another alternative would be to have mktables generate just these tables
in Unicode ordering.
Karl Williamson [Tue, 26 Feb 2013 00:22:55 +0000]
XXX CPAN prob wrong Collate
This changes the module to implicitly use native code points. This is
likely wrong, as the module comes with its own data, which is probably
in terms of Unicode.
Karl Williamson [Sun, 28 Apr 2013 04:14:02 +0000]
utf8.c: Remove wrapper functions.
Now that the Unicode data is stored in native character set order, it is
rare to need to work with the Unicode order. Traditionally, the real
work was done in functions that worked with the Unicode order, and
wrapper functions (or macros) were used to translate to/from native.
There are two groups of functions: one that translates from code point
to UTF-8, and the other group goes the opposite direction.
This commit changes the base function that translates from UTF-8 to code
point to output native instead of Unicode. Those extremely rare
instances where Unicode output is needed instead will have to hand-wrap
calls to this function with a translation macro, as now described in the
API pod. Prior to this, it was the other way around: the native version
was wrapped, and the rare, strictly-Unicode version wasn't. This
eliminates a layer of function call overhead for a common case.
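A minimal sketch of the calling convention described above, using hypothetical names (decode_native(), decode_unicode(), NATIVE_TO_UNI_SKETCH()) rather than Perl's real API: the base decoder returns the native code point directly, and only the rare caller that needs Unicode order wraps it in a translation macro. The macro is an identity here because native and Unicode code points coincide on ASCII platforms; an EBCDIC build would need a table lookup instead.

```c
#include <stdint.h>

/* Hypothetical translation macro: identity on ASCII platforms, where
 * native and Unicode code points coincide. An EBCDIC build would
 * substitute a table lookup here. */
#define NATIVE_TO_UNI_SKETCH(cp) (cp)

/* Hypothetical base decoder returning the native code point of a
 * well-formed 1- or 2-byte UTF-8 sequence (longer forms omitted). */
static uint32_t decode_native(const unsigned char *s)
{
    if (s[0] < 0x80)
        return s[0];
    return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
}

/* The rare caller that needs Unicode order wraps the base call itself,
 * so the common native case pays for no extra layer. */
static uint32_t decode_unicode(const unsigned char *s)
{
    return NATIVE_TO_UNI_SKETCH(decode_native(s));
}
```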
The base function that translates from code point to UTF-8 retains its
Unicode input, as that is more natural to process. However, it is
de-emphasized in the pod, with the functionality description moved to
the pod for a native input wrapper function. And, those wrappers are
now macros in all cases; previously there was function call overhead
sometimes. (Equivalent exported functions are retained, however, for XS
code that uses the Perl_foo() form.)
I had hoped to rebase this commit, squashing it with an earlier commit
in this series, eliminating the use of a temporary function name change,
but the work involved turns out to be large, with no real payoff.
Karl Williamson [Tue, 30 Apr 2013 15:13:35 +0000]
perlapi via utf8.c: Nits
Karl Williamson [Tue, 30 Apr 2013 14:04:45 +0000]
utf8.c: Move 2 functions to earlier in file
This moves these two functions to be adjacent to the function they each
call, thus keeping like things together.
Karl Williamson [Sat, 27 Apr 2013 14:59:19 +0000]
embed.fnc: Slight clarification in comments
Karl Williamson [Mon, 22 Apr 2013 20:44:08 +0000]
mg.c: White-space only
I found this multi-line 'if' easier to understand after reformatting it.
Karl Williamson [Mon, 22 Apr 2013 20:34:47 +0000]
toke.c: Remove redundant test
This checks that something is both not-printable and not a word
character, but all word characters are printable, so just the
non-printable test suffices.
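The redundancy can be checked mechanically. This is a sketch that uses the standard C <ctype.h> classifiers in the default "C" locale as stand-ins for Perl's isPRINT() and isWORDCHAR() (an assumption; Perl's own macros are not used). It confirms that, for every byte value, the combined test gives the same answer as the non-printable test alone.

```c
#include <ctype.h>
#include <stdbool.h>

/* Stand-in for a word-character test: alphanumerics plus underscore. */
static bool is_word_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Returns true if, for every byte value, "!printable && !word" equals
 * "!printable" alone, i.e. the word-character test adds nothing. */
static bool word_test_is_redundant(void)
{
    for (int c = 0; c < 256; c++) {
        bool both = !isprint(c) && !is_word_char(c);
        bool print_only = !isprint(c);
        if (both != print_only)
            return false;
    }
    return true;
}
```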
Karl Williamson [Sat, 20 Apr 2013 23:04:08 +0000]
gv.c: Add comment
Karl Williamson [Fri, 19 Apr 2013 23:02:25 +0000]
XXX rebase, finish up: reenable fold_grind.t
Karl Williamson [Fri, 19 Apr 2013 19:58:12 +0000]
t/op/coreamp.t: Generalize for non-ASCII platforms
Karl Williamson [Fri, 19 Apr 2013 19:19:44 +0000]
XXX temporary lib/warnings.pm: Add debugging info
Karl Williamson [Fri, 19 Apr 2013 19:18:20 +0000]
regcomp.c: Add missing (parens) to expression
A missing pair of parentheses caused this 'if' not to act as
intended.
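The commit does not show the expression itself, so the following is only a hypothetical example of the classic kind of precedence bug that missing parentheses cause in C: `==` binds more tightly than `&`, so the mask is compared before it is applied. FLAG_MASK, FLAG_VALUE and both functions are invented for illustration.

```c
#include <stdbool.h>

#define FLAG_MASK  0x0F   /* hypothetical mask */
#define FLAG_VALUE 0x03   /* hypothetical expected value */

/* Buggy: == binds tighter than &, so this computes
 * flags & (FLAG_MASK == FLAG_VALUE), i.e. flags & 0, which is always 0. */
static bool matches_buggy(unsigned flags)
{
    return flags & FLAG_MASK == FLAG_VALUE;
}

/* Fixed: the parentheses make the mask apply before the comparison. */
static bool matches_fixed(unsigned flags)
{
    return (flags & FLAG_MASK) == FLAG_VALUE;
}
```

Compilers such as GCC and Clang flag this pattern under -Wparentheses.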
Karl Williamson [Thu, 18 Apr 2013 03:49:10 +0000]
t/re/re_tests: Some tests are platform-specific
Karl Williamson [Thu, 18 Apr 2013 03:47:41 +0000]
t/re/regexp.t: Add ability to skip depending on platform
This adds the capability to specify that a test is to be run only on an
ASCII platform, or only on an EBCDIC one.
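The change itself is in a Perl test script; as a rough C analogue, the platform can be detected from the code point of 'A' (0x41 in ASCII, 0xC1 in EBCDIC code pages), and a per-test restriction checked against it. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-test platform restriction. */
enum platform_req { RUN_ANYWHERE, ASCII_ONLY, EBCDIC_ONLY };

/* 'A' is 0x41 in ASCII and 0xC1 in EBCDIC code pages. */
static bool platform_is_ebcdic(void)
{
    return 'A' == 0xC1;
}

/* Decide whether a test with the given restriction runs on this host. */
static bool should_run(enum platform_req req)
{
    switch (req) {
    case ASCII_ONLY:  return !platform_is_ebcdic();
    case EBCDIC_ONLY: return platform_is_ebcdic();
    default:          return true;
    }
}
```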
Karl Williamson [Wed, 17 Apr 2013 14:22:36 +0000]
t/io/crlf.t: Generalize for non-ASCII platforms
Karl Williamson [Wed, 17 Apr 2013 02:15:08 +0000]
unicode_constants.h: regened for ebcdic
Karl Williamson [Tue, 16 Apr 2013 21:49:06 +0000]
XXX finish up t/re/regexp.t: Generalize for non-ASCII platforms
This adds code to the processing of the tests in t/re/re_tests to
automatically convert from Unicode to native character sets.
Add comment about circular tests
XXX better commit message
Karl Williamson [Tue, 16 Apr 2013 18:13:07 +0000]
ext/B/t/b.t: Generalize for non-ASCII platforms
Karl Williamson [Tue, 16 Apr 2013 18:02:26 +0000]
dist/Safe/t/safeutf8.t: Generalize to non-ASCII platform
Karl Williamson [Tue, 16 Apr 2013 17:50:04 +0000]
t/op/warn.t: Generalize for non-ASCII platforms
Karl Williamson [Tue, 16 Apr 2013 16:18:02 +0000]
re/reg_email.t: Generalize for non-ASCII platforms
This replaces all the hard-coded hex character values. It uses the new
(?[ ]) notation. I checked that the compiled regex matches the exact
same code points as before these changes.
Karl Williamson [Tue, 16 Apr 2013 15:04:50 +0000]
t/porting/regen.t: Add file to check
Karl Williamson [Tue, 16 Apr 2013 15:03:47 +0000]
dist/ExtUtils-Install/t/InstallWithMM.t: Skip if EBCDIC
Because it uses JSON.
Karl Williamson [Mon, 15 Apr 2013 03:31:04 +0000]
XXX: t/lib/warnings/utf8: Experiment with malformed utf8
Karl Williamson [Sun, 14 Apr 2013 04:04:50 +0000]
XXX skip cpan tests
Karl Williamson [Sat, 13 Apr 2013 22:19:20 +0000]
ext/XS-APItest/t/svpeek.t: Generalize for non-ASCII platforms
Karl Williamson [Sat, 13 Apr 2013 22:14:35 +0000]
ext/XS-APItest/t/svpv_magic.t: Generalize for non-ASCII platforms
Karl Williamson [Sat, 13 Apr 2013 21:54:37 +0000]
lib/DBM_Filter/t/encode.t: Generalize for non-ASCII platforms
Karl Williamson [Sat, 13 Apr 2013 21:48:06 +0000]
XXX finish up lib/dumpvar.pl: Generalize for EBCDIC
It has octal constants.
Karl Williamson [Sat, 13 Apr 2013 21:35:52 +0000]
XXX finish up lib/utf8.t: Generalize for non-ASCII platforms
This includes choosing a different code point that has 3 bytes in both
UTF-8 and UTF-EBCDIC, so that the pos numbers work for both.
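To see which code points qualify, here is a sketch of the encoded length in each scheme. The UTF-EBCDIC thresholds follow Unicode Technical Report #16 (160 single-byte invariants, then trail bytes carrying 5 payload bits each); the function names are made up for this example.

```c
#include <stdint.h>

/* Bytes needed to encode cp in standard UTF-8 (up to U+10FFFF). */
static int utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Bytes needed in UTF-EBCDIC: 160 invariants, then a lead byte followed
 * by trail bytes that each carry 5 payload bits. */
static int utf_ebcdic_len(uint32_t cp)
{
    if (cp < 0xA0)    return 1;
    if (cp < 0x400)   return 2;
    if (cp < 0x4000)  return 3;
    if (cp < 0x40000) return 4;
    return 5;
}
```

By these thresholds, code points from U+0800 through U+3FFF take exactly 3 bytes in both encodings.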
Karl Williamson [Sat, 13 Apr 2013 21:16:44 +0000]
t/uni/parser.t: Generalize for non-ASCII platforms
Karl Williamson [Sat, 13 Apr 2013 20:41:46 +0000]
t/uni/method.t: Generalize for non-ASCII platforms
I couldn't find a way to avoid using the hard-coded values.
Karl Williamson [Sat, 13 Apr 2013 20:26:09 +0000]
t/op/magic.t: Generalize for non-ASCII platforms
Karl Williamson [Sat, 13 Apr 2013 19:36:41 +0000]
t/io/through.t: Generalize for non-ASCII platforms
This uses hard-coded values for EBCDIC because of shell issues.
Karl Williamson [Sat, 13 Apr 2013 19:16:00 +0000]
toke.c: Fix EBCDIC bugs with single char variable names
Single-character variable names consisting of a Latin-1 character
should all be legal, but the test was for variant characters rather than
for non-ASCII ones. On EBCDIC platforms, these two sets are not the same.
The legal control-character variable names are not simply the C0 and
DEL controls, but are \001 through \037, minus those that traditionally
match \s on ASCII platforms, plus \c?.
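The rule above, sketched for an ASCII platform only. Two assumptions are labeled here: "traditionally match \s" is taken to mean \t, \n, \f and \r (traditional \s did not include \v), and \c? is taken to be DEL (0x7F), as it is on ASCII. The function name is hypothetical and is not toke.c's actual code.

```c
#include <stdbool.h>

/* Sketch of the legal control-character variable-name rule on ASCII.
 * Assumes "traditionally matched \s" means \t, \n, \f and \r. */
static bool is_legal_ctrl_var_name(unsigned char c)
{
    if (c == 0x7F)                 /* \c? is DEL on ASCII */
        return true;
    if (c < 0x01 || c > 0x1F)      /* outside \001 through \037 */
        return false;
    switch (c) {
    case '\t': case '\n': case '\f': case '\r':
        return false;              /* traditionally matched \s */
    default:
        return true;
    }
}
```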
Karl Williamson [Sat, 13 Apr 2013 18:55:09 +0000]
toke.c: An EBCDIC fix
toCTRL(0..31) yields a printing character. This is different from
toCTRL(control) on EBCDIC machines.
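On an ASCII platform, a toCTRL-style mapping can be written as "uppercase, then flip bit 6" (XOR with 64). This definition is an assumption about the ASCII case, and the function name is made up. It shows why feeding 0..31 in yields printing characters: each input lands in the range '@' through '_'. EBCDIC control and printing characters are not laid out this way, so the same shortcut does not apply there.

```c
#include <ctype.h>

/* ASCII-platform sketch of a toCTRL-style mapping: uppercase, then flip
 * bit 6 (0x40). Maps '@'..'_' to 0..31, '?' to DEL (127), and 0..31
 * back to '@'..'_'. */
static int to_ctrl_ascii(int c)
{
    return toupper(c) ^ 64;
}
```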
Karl Williamson [Sat, 13 Apr 2013 18:52:17 +0000]
XXX \c must be followed by printable
This should be revised and included in 5.18 or 5.19, depending on the RFC outcome.
Karl Williamson [Sat, 13 Apr 2013 17:41:04 +0000]
XXX temp toCTRL
Karl Williamson [Sat, 13 Apr 2013 15:18:41 +0000]
perlio.c: Generalize for EBCDIC
This code had the hex constants for CARRIAGE RETURN and LINE FEED
hard-coded in. It appears to me from the comments that '\r' and '\n'
are not suitable substitutes. This commit changes the constants to the
native values.
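A sketch of the idea, assuming code page 1047 for the EBCDIC case, where CR is 0x0D and LINE FEED is 0x25 (the C compiler there commonly maps '\n' to NEL, 0x15, which is one reason '\n' cannot stand in for LINE FEED). NATIVE_CR, NATIVE_LF and crlf_to_lf() are hypothetical names, not the identifiers perlio.c actually uses.

```c
#include <stddef.h>

/* Native CR and LF. Code page 1047 values are assumed for EBCDIC. */
#if 'A' == 0xC1
#  define NATIVE_CR 0x0D
#  define NATIVE_LF 0x25
#else
#  define NATIVE_CR 0x0D
#  define NATIVE_LF 0x0A
#endif

/* Collapse each CR LF pair in buf[0..len) to a single LF in place and
 * return the new length. A lone CR is kept. */
static size_t crlf_to_lf(unsigned char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == NATIVE_CR && i + 1 < len && buf[i + 1] == NATIVE_LF)
            continue;               /* drop the CR; the LF is copied next */
        buf[out++] = buf[i];
    }
    return out;
}
```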
Karl Williamson [Sat, 13 Apr 2013 15:51:34 +0000]
unicode_constants.h: Add #defines for CR, LF
Karl Williamson [Sun, 7 Apr 2013 16:45:14 +0000]
t/op/goto.t: Generalize for EBCDIC
Karl Williamson [Sun, 7 Apr 2013 03:03:44 +0000]
regcomp.c: White-space only, wrap comment to fit
Karl Williamson [Thu, 4 Apr 2013 02:15:17 +0000]
t/re/pat.t: Generalize for EBCDIC
Karl Williamson [Thu, 4 Apr 2013 03:56:02 +0000]
XXX t/op/pack.t: Generalize for EBCDIC
One open question: what to do about uuencode.
Karl Williamson [Sat, 6 Apr 2013 18:56:52 +0000]
regcomp.c: In EBCDIC [i-j] exclude also ASCII
i and j are not adjacent in EBCDIC. The code excluded any alphabetic
characters between them, but allowed other ASCII ones.
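In code page 1047 the lowercase letters occupy three separate runs (a-i at 0x81-0x89, j-r at 0x91-0x99, s-z at 0xA2-0xA9), so a native range from 'i' to 'j' also spans 0x8A-0x90. Below is a sketch of range matching that treats such a range as covering only the letters between its endpoints. The table covers code page 1047 only, and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Lowercase letters in EBCDIC code page 1047: three separate runs with
 * other characters in the gaps. */
static const unsigned char ebcdic1047_lower[26] = {
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,  /* a-i */
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99,  /* j-r */
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9         /* s-z */
};

/* Position of c in the alphabet, or -1 if c is not a lowercase letter. */
static int lower_index(unsigned char c)
{
    const unsigned char *p = memchr(ebcdic1047_lower, c, sizeof ebcdic1047_lower);
    return p ? (int)(p - ebcdic1047_lower) : -1;
}

/* [lo-hi] where lo and hi are both lowercase letters (assumed): match
 * only letters between them, excluding every other code point that
 * falls in the native gaps. */
static bool letter_range_matches(unsigned char lo, unsigned char hi, unsigned char c)
{
    int i = lower_index(c);
    return i >= 0 && i >= lower_index(lo) && i <= lower_index(hi);
}
```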
Karl Williamson [Sat, 6 Apr 2013 18:54:42 +0000]
utf8.c: Don't use slower general-purpose function
There is a macro that accomplishes the same task for a two-byte UTF-8
encoded character, and avoids the overhead of the general-purpose
function call.
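For standard UTF-8, decoding a known two-byte sequence takes only a couple of masks and a shift, which is why a macro beats a general-purpose call here. This is a sketch with a hypothetical macro name, not the macro the commit uses. It also ignores UTF-EBCDIC, whose trail bytes carry 5 bits rather than 6.

```c
#include <stdint.h>

/* Hypothetical macro: decode a two-byte UTF-8 sequence (lead byte
 * 110xxxxx, continuation byte 10xxxxxx) without a function call.
 * Assumes the caller already knows the sequence is well formed. */
#define TWO_BYTE_UTF8_DECODE(b0, b1) \
    ((uint32_t)((((b0) & 0x1F) << 6) | ((b1) & 0x3F)))
```

For example, the bytes 0xC3 0xA9 decode to U+00E9.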
Karl Williamson [Sat, 6 Apr 2013 18:53:07 +0000]
utf8.c: Don't do ++ in macro parameter
On EBCDIC platforms the macro evaluates its argument more than once, so
a ++ inside the argument increments more than the intended once.
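The hazard in miniature, using a hypothetical macro that mentions its argument twice (standing in for translation macros that do so only on EBCDIC builds). Passing `*s++` expands into two increments whenever the first comparison is true. The fix is to do the increment outside the macro call.

```c
/* Hypothetical macro that mentions its argument twice. */
#define IS_HIGH_BYTE(c) ((c) >= 0x80 && (c) <= 0xFF)

/* Buggy: *s++ is expanded twice, so s advances by two whenever the
 * first comparison succeeds, and the second comparison reads the
 * following byte. */
static const unsigned char *skip_one_buggy(const unsigned char *s, int *high)
{
    *high = IS_HIGH_BYTE(*s++);
    return s;
}

/* Fixed: evaluate the macro on *s, then increment exactly once. */
static const unsigned char *skip_one_fixed(const unsigned char *s, int *high)
{
    *high = IS_HIGH_BYTE(*s);
    s++;
    return s;
}
```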
Karl Williamson [Sat, 6 Apr 2013 18:50:48 +0000]
utf8.c: Use macro instead of duplicating code
There is a macro that accomplishes this task, and is easier to read.
Karl Williamson [Sat, 6 Apr 2013 16:15:05 +0000]
t/io/bom.t: Fix to run under EBCDIC
Karl Williamson [Sat, 6 Apr 2013 05:34:50 +0000]
t/uni/overload.t: EBCDIC fixes
Karl Williamson [Sat, 6 Apr 2013 05:34:13 +0000]
t/uni/method.t: EBCDIC fixes
Karl Williamson [Sat, 6 Apr 2013 05:33:28 +0000]
t/op/utf8magic.t: EBCDIC fixes
Karl Williamson [Sat, 6 Apr 2013 05:32:57 +0000]
t/op/evalbytes.t: EBCDIC fixes
Karl Williamson [Fri, 5 Apr 2013 22:20:20 +0000]
lib/utf8.pm: Fix pod verbatim line wrap
Karl Williamson [Fri, 5 Apr 2013 19:27:42 +0000]
t/op/length.t: EBCDIC fixes
Karl Williamson [Sat, 6 Apr 2013 19:01:54 +0000]
t/op/utfhash.t: XXX Add debug
Karl Williamson [Fri, 5 Apr 2013 18:21:21 +0000]
Data-Dumper/Dumper.pm: Fix for EBCDIC
Karl Williamson [Fri, 5 Apr 2013 18:15:58 +0000]
Dumper.xs: Don't translate character twice
utf8_to_uvchr() already returns the native code point; there is no need
to convert it again. This code is only executed on Perls before 5.15.
Karl Williamson [Sun, 7 Apr 2013 02:39:22 +0000]
dist/IO/t/io_utf8argv.t: Generalize and enable EBCDIC
Infrastructure now exists to have this test run on EBCDIC platforms.
Karl Williamson [Thu, 4 Apr 2013 03:59:16 +0000]
utf8.h: Clarify comments
Karl Williamson [Thu, 4 Apr 2013 01:06:52 +0000]
XXX CPAN cpan/Test/lib/Test.pm: Fixes for EBCDIC
Karl Williamson [Tue, 2 Apr 2013 04:29:16 +0000]
t/re/pat_re_eval.t: Some EBCDIC fixes
Karl Williamson [Tue, 2 Apr 2013 13:11:19 +0000]
t/test.pl: Add fcn for UTF-EBCDIC conversion
This adds the function byte_utf8a_to_utf8n(). It takes the bytes that
form a UTF-8 string and converts them to the bytes that form that string
on the native platform.
Karl Williamson [Tue, 2 Apr 2013 04:28:43 +0000]
dist/Storable/t/utf8.t: Fix to run under EBCDIC
Karl Williamson [Tue, 2 Apr 2013 04:28:08 +0000]
t/uni/variables.t: Fix to run under EBCDIC
Karl Williamson [Tue, 2 Apr 2013 03:08:20 +0000]
t/op/split.t: EBCDIC fixes
Karl Williamson [Tue, 2 Apr 2013 02:43:03 +0000]
re/pat_advanced.t: EBCDIC fixes
This includes no longer skipping some EBCDIC tests that formerly were
skipped, since we now have testing infrastructure that makes running
them easy.
Karl Williamson [Tue, 2 Apr 2013 02:01:04 +0000]
t/io/utf8.t: EBCDIC fixes
Karl Williamson [Sun, 31 Mar 2013 03:13:38 +0000]
Unicode::UCD.pm: Nits
Karl Williamson [Sat, 30 Mar 2013 18:32:09 +0000]
t/uni/fold.t: Generalize for non-ASCII platforms
Karl Williamson [Fri, 29 Mar 2013 21:22:28 +0000]
XXX t/op/tiehandle.t: skip for now; deep recursion
Karl Williamson [Fri, 29 Mar 2013 20:56:16 +0000]
XXX better commit msg utf8.c: Avoid unnecessary UTF-8 conversions
This changes the code so that converting to UTF-8 is avoided unless
necessary. For inputs that don't need the conversion, the conversion
back from UTF-8 is avoided as well. The cost of doing this is that the
first swatches are combined into one that contains the values for all
characters 0-255, instead of having multiple swatches. That means that
when the swatch is first calculated, all 256 values are computed,
instead of 128 (160 on EBCDIC).
This also fixes an EBCDIC bug in which characters in this range were
being translated twice.
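A toy model of the trade-off, not Perl's actual swatch structure: one 256-bit bitmap is filled for all of 0-255 in a single pass, after which any code point below 256 is answered by a bit lookup with no UTF-8 encoding or decoding. The type, functions and the example predicate are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical precomputed swatch: one bit per code point 0..255. */
typedef struct {
    uint8_t bits[256 / 8];
} latin1_swatch;

/* Example predicate: ASCII digits. */
static bool is_digit_cp(uint32_t cp)
{
    return cp >= 0x30 && cp <= 0x39;
}

/* Compute all 256 entries up front (the cost mentioned above). Every
 * bit is explicitly set or cleared, so sw needs no prior initialization. */
static void swatch_build(latin1_swatch *sw, bool (*pred)(uint32_t))
{
    for (uint32_t cp = 0; cp < 256; cp++) {
        if (pred(cp))
            sw->bits[cp >> 3] |= (uint8_t)(1u << (cp & 7));
        else
            sw->bits[cp >> 3] &= (uint8_t)~(1u << (cp & 7));
    }
}

/* Lookup for cp < 256: no conversion to or from UTF-8 is needed. */
static bool swatch_test(const latin1_swatch *sw, uint32_t cp)
{
    return (sw->bits[cp >> 3] >> (cp & 7)) & 1;
}
```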
Karl Williamson [Fri, 29 Mar 2013 19:34:59 +0000]
utf8.c: No need to check for UTF-8 malformations
This function assumes that the input is well-formed UTF-8, even though
until this commit, the prefatory comments didn't say so. The API does
not pass the buffer length, so there is no way it could check for
reading off the end of the buffer. One code path already calls
valid_utf8_to_uvchr(); this changes the remaining code path to do the same.
Karl Williamson [Fri, 29 Mar 2013 01:56:39 +0000]
utf8.c: Remove redundant assignment.
This variable is always set just below.
Karl Williamson [Thu, 28 Mar 2013 23:19:16 +0000]
XXX enable _invlist_dump;
Karl Williamson [Fri, 8 Mar 2013 18:01:32 +0000]
XXX EBCDIC header files
Karl Williamson [Fri, 15 Mar 2013 18:26:15 +0000]
hints/os390.sh: Suppress bogus compiler message
John Goodyear [Sat, 2 Mar 2013 19:31:25 +0000]
XXX Temporary for z/OS long long support
Karl Williamson [Thu, 28 Mar 2013 00:17:28 +0000]
Add test that to/from native character set works
For non-ASCII systems, there are character set translation tables. This
makes sure the two accessible ones are inverses of each other. If not,
nothing can be expected to work right.
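A sketch of such a check for two 256-entry byte tables. The function name is hypothetical, and the tables in the usage are synthetic, not Perl's real translation tables. If from[to[i]] == i holds for every i, then `to` is one-to-one and therefore a permutation of 0-255, and `from` must be its exact inverse, so checking one direction suffices.

```c
#include <stdbool.h>

/* True if from[to[i]] == i for every byte, which for 256-entry tables
 * implies both are permutations and each is the other's inverse. */
static bool tables_are_inverses(const unsigned char to[256],
                                const unsigned char from[256])
{
    for (int i = 0; i < 256; i++) {
        if (from[to[i]] != i)
            return false;
    }
    return true;
}
```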
Karl Williamson [Wed, 27 Mar 2013 22:55:55 +0000]
lib/feature/bundle: Fix some things to pass under EBCDIC
Karl Williamson [Wed, 27 Mar 2013 22:08:04 +0000]
XS-APItest/t/fetch_pad_names.t: Skip if EBCDIC
This could be ported, but there's a lot of stuff to convert; it would
need a function to convert byte strings that form legal UTF-8 into legal
UTF-EBCDIC.
Karl Williamson [Wed, 27 Mar 2013 18:05:53 +0000]
XXX ext/XS-APItest/t/utf8.t: Fix so passes EBCDIC
This involves skipping many of the tests. Reexamine later.
Karl Williamson [Wed, 27 Mar 2013 17:27:06 +0000]
ext/re/t/re_funcs_u.t: Fix to work under EBCDIC
Karl Williamson [Wed, 27 Mar 2013 17:11:22 +0000]
XXX dist/IO/t/io_utf8argv.t: Temporarily skip if EBCDIC
Karl Williamson [Wed, 27 Mar 2013 16:33:44 +0000]
t/op/print.t: Skip an EBCDIC test
This could be written (the values would probably change depending on the
code page), but the code that would get exercised is unlikely to vary
depending on character set.
Karl Williamson [Tue, 26 Mar 2013 21:44:59 +0000]
XXX t/TEST: Avoid SIGPIPEs
Karl Williamson [Tue, 26 Mar 2013 21:49:08 +0000]
XXX Temporarily test normalization
Karl Williamson [Tue, 26 Mar 2013 20:06:50 +0000]
op/index.t: Fix tests for EBCDIC
Commit 8a38a836 erroneously translates literals into the native
encoding, causing a double translation, which yields garbage.
Karl Williamson [Tue, 26 Mar 2013 02:43:38 +0000]
op/chop.t: Fix for EBCDIC
One test is skipped because the code point is not representable on
EBCDIC platforms. Another test is modified to work on EBCDIC.
Karl Williamson [Tue, 26 Mar 2013 01:56:50 +0000]
t/op/lc.t: Fix to work under EBCDIC
This test already had code that attempted this, but it was wrong: the
conversion to EBCDIC must be done before applying \U or a similar
case-changing operation.
Karl Williamson [Mon, 25 Mar 2013 21:33:55 +0000]
Skip some tests under EBCDIC
EBCDIC won't work on these because of inherent differences from ASCII.
Karl Williamson [Mon, 25 Mar 2013 21:04:14 +0000]
porting/bincompat.t: Skip under EBCDIC
Skip because the sort order is different.
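A small illustration of why code-point sort order differs. In ASCII, digits sort before uppercase, which sort before lowercase. In EBCDIC (code page 1047 values are assumed here), lowercase comes first and digits last. The struct, table and function are made up for this example.

```c
#include <stddef.h>

/* A character with its ASCII and EBCDIC code page 1047 code points. */
struct cp_pair { char ch; unsigned char ascii; unsigned char cp1047; };

static const struct cp_pair samples[3] = {
    { 'A', 0x41, 0xC1 },
    { '0', 0x30, 0xF0 },
    { 'a', 0x61, 0x81 },
};

static unsigned key(const struct cp_pair *p, int ebcdic)
{
    return ebcdic ? p->cp1047 : p->ascii;
}

/* Write the sample characters into out (NUL-terminated) in code point
 * order for the chosen encoding, using a small insertion sort. */
static void sort_samples(int ebcdic, char out[4])
{
    struct cp_pair v[3] = { samples[0], samples[1], samples[2] };
    for (size_t i = 1; i < 3; i++) {
        struct cp_pair cur = v[i];
        size_t j = i;
        while (j > 0 && key(&v[j - 1], ebcdic) > key(&cur, ebcdic)) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = cur;
    }
    for (size_t i = 0; i < 3; i++)
        out[i] = v[i].ch;
    out[3] = '\0';
}
```

Under ASCII ordering the result is "0Aa"; under code page 1047 it is "aA0".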
Karl Williamson [Mon, 25 Mar 2013 20:59:50 +0000]
t/re/regex_sets.t: Fix so it will pass under EBCDIC
Karl Williamson [Mon, 25 Mar 2013 20:59:26 +0000]
t/porting/bincompat.t: Typo in comment
Karl Williamson [Mon, 25 Mar 2013 19:09:09 +0000]
XXX fix \x{too large}