This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
David Mitchell [Fri, 2 Jun 2017 12:08:12 +0000 (13:08 +0100)]
Perl_sv_vcatpvfn_flags: re-indent a code block
whitespace only
David Mitchell [Fri, 2 Jun 2017 12:00:52 +0000 (13:00 +0100)]
Perl_sv_vcatpvfn_flags: eliminate p var
It has 1500-line scope, and is equal to fmtstart-1 for most of the
time.
This also allows us to 'const'ify some variables better.
David Mitchell [Fri, 2 Jun 2017 11:23:32 +0000 (12:23 +0100)]
Perl_sv_vcatpvfn_flags: clarify GCC bug comments
In particular it wasn't clear what bug was being worked around, nor that
'#13488' referred to a GNU ticket rather than a perl ticket.
This bug was fixed back in 2004, but the workaround is fairly harmless, so
I've left it as-is.
David Mitchell [Fri, 2 Jun 2017 10:57:11 +0000 (11:57 +0100)]
Perl_sv_vcatpvfn_flags: simplify alt handling
only do calculations for alt (#) formatting in the branches which use it
David Mitchell [Fri, 2 Jun 2017 10:41:41 +0000 (11:41 +0100)]
Perl_sv_vcatpvfn_flags: rename 'p' var 's'
In the 'append' block of code at the end of the loop, don't re-use the
widely-scoped 'p' pointer; instead use a tightly-scoped var
(named 's' so it doesn't clash with p, which is still valid in an outer
scope).
David Mitchell [Fri, 2 Jun 2017 08:51:40 +0000 (09:51 +0100)]
Perl_sv_vcatpvfn_flags: simplify format appending
The bit at the end of the main loop has a whole bunch of conditionals
along the lines of
    if (gap && !left)
        append gap
    if (esignlen && !fill)
        append esignbuf
    if (zeros)
        append zeroes
    if (elen)
        append ebuf
    if (gap && left)
        append gap
This involves many tests along the main code path to cope with all the
possibilities (e.g. if left, gap is output before ebuf, otherwise after)
Instead split it into a couple of major branches with duplication between
the branches, but requiring few tests along any one code path.
For example, sprintf("%5d", -1) formerly required 9 branches, 1 for loop,
and 1 memset(). It now requires 2 branches and 3 for loops.
I've removed memset()s and replaced them with for loops. For the short
padding typically used (e.g. "%9d" rather than "%8192d") a loop is faster.
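As a rough illustration of the split (a sketch only: append_element() and
its parameters are hypothetical names, not the code in sv.c), each major
branch emits its pieces in a fixed order and pads with plain for loops:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the code in sv.c.
 *   esign/esignlen: sign or prefix bytes ("-", "0x", ...)
 *   zeros:          number of '0's required by the precision
 *   e/elen:         the converted digits or string
 *   gap:            padding needed to reach the field width
 *   left:           non-zero for the '-' flag; fill: non-zero for '0'
 * Pointers must be valid even when their length is 0.
 * Returns the number of bytes written; out is not NUL-terminated. */
size_t append_element(char *out,
                      const char *esign, size_t esignlen,
                      size_t zeros,
                      const char *e, size_t elen,
                      size_t gap, int left, int fill)
{
    char *s = out;
    size_t i;

    if (left) {
        /* left-justified: sign, zeros, body, then the gap as spaces */
        memcpy(s, esign, esignlen);  s += esignlen;
        for (i = 0; i < zeros; i++)  *s++ = '0';
        memcpy(s, e, elen);          s += elen;
        for (i = 0; i < gap; i++)    *s++ = ' ';
    }
    else if (fill) {
        /* zero-filled: sign first, then the gap and zeros as '0's */
        memcpy(s, esign, esignlen);  s += esignlen;
        for (i = 0; i < gap + zeros; i++)  *s++ = '0';
        memcpy(s, e, elen);          s += elen;
    }
    else {
        /* right-justified: the gap as spaces comes first */
        for (i = 0; i < gap; i++)    *s++ = ' ';
        memcpy(s, esign, esignlen);  s += esignlen;
        for (i = 0; i < zeros; i++)  *s++ = '0';
        memcpy(s, e, elen);          s += elen;
    }
    return (size_t)(s - out);
}
```

Under these assumptions, sprintf("%5d", -1) corresponds to
append_element(buf, "-", 1, 0, "1", 1, 3, 0, 0), which takes only the
final branch.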
David Mitchell [Thu, 1 Jun 2017 15:05:59 +0000 (16:05 +0100)]
Perl_sv_vcatpvfn_flags: eliminate a wrap check
This is one case where it can never wrap, so don't check.
David Mitchell [Thu, 1 Jun 2017 11:46:23 +0000 (12:46 +0100)]
Perl_sv_vcatpvfn_flags: simpler special formats
At the top of Perl_sv_vcatpvfn_flags(), certain fixed formats are
special-cased: "", "%s", "%-p", "%.0f".
Simplify the code which handles these. In particular, don't try to issue
"missing" or "redundant" arg warnings there. Instead, check for the
correct number of args as part of the test for whether this can be
special-cased, and if not, fall through to the general code in the main
body of the function to handle that format and issue any warnings.
This makes the code a lot simpler. It also now detects the redundant arg
in printf("%.0f",1,2).
The code is now also more efficient - it tries to check for things like
pat[0] == '%' only once, rather than re-checking for every special-case
variant it's trying.
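A minimal sketch of the idea, assuming a hypothetical classify_format()
helper (the real code is inline in Perl_sv_vcatpvfn_flags() and also
covers "" and "%-p", which are omitted here). The arg count is part of
the test, so anything unusual simply falls through to the general path:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only */
enum special_fmt { FMT_GENERAL = 0, FMT_S, FMT_0F };

/* Decide whether a whole pattern can take a fast path.
 * nargs is the number of args supplied by the caller. */
enum special_fmt classify_format(const char *pat, size_t patlen, size_t nargs)
{
    /* test for the leading '%' once, not once per variant */
    if (patlen < 2 || pat[0] != '%')
        return FMT_GENERAL;

    /* wrong arg count: let the general code format and warn */
    if (nargs != 1)
        return FMT_GENERAL;

    if (patlen == 2 && pat[1] == 's')
        return FMT_S;
    if (patlen == 4 && memcmp(pat + 1, ".0f", 3) == 0)
        return FMT_0F;
    return FMT_GENERAL;
}
```

With this shape, a call equivalent to printf("%.0f",1,2) is classified as
FMT_GENERAL, so the main body gets the chance to report the redundant arg.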
David Mitchell [Thu, 1 Jun 2017 10:55:47 +0000 (11:55 +0100)]
Perl_sv_vcatpvfn_flags: simpler redundant arg test
5.24.0 added a new warning:
Redundant argument in printf at ....
That warning is issued if there are more args than format elements.
However, it may also warn for invalid format - e.g. for something like
printf("%Z%d", 1,2) you get both
Invalid conversion in printf: "%Z" at ...
Redundant argument in printf at ...
Personally I think once part of the format has been determined to be
invalid, it's hard for perl to second-guess in what way the format was
invalid, and thus to be able to conclude that there is in fact a redundant
arg.
So this commit suppresses any "redundant" warning once an "invalid"
warning has been issued.
Doing this makes it possible to simplify the code and remove the
used_explicit_ix variable.
Apart from warnings, used_explicit_ix was only used in %p to check for
'simple' special forms - but that code checks for a trailing '$' character
anyway, so that test was redundant.
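The rule can be sketched with a toy scanner. This is an assumption-laden
illustration, not perl's parser: check_format() and struct fmt_warnings
are hypothetical, and only %% and the bare conversions s, d, f and c are
understood (no flags, widths or explicit indexes).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of which warnings would be issued */
struct fmt_warnings { int invalid; int missing; int redundant; };

struct fmt_warnings check_format(const char *fmt, size_t nargs)
{
    struct fmt_warnings w = {0, 0, 0};
    size_t used = 0;
    const char *p;

    for (p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;               /* literal percent sign */
        if (*p == '\0') {
            w.invalid = 1;          /* pattern ends in a lone '%' */
            break;
        }
        if (strchr("sdfc", *p)) {
            if (used < nargs)
                used++;
            else
                w.missing = 1;
        }
        else {
            w.invalid = 1;
        }
    }

    /* once the format is known to be invalid, don't guess about
     * redundant args */
    if (!w.invalid && used < nargs)
        w.redundant = 1;
    return w;
}
```

In this sketch check_format("%Z%d", 2) reports only the invalid
conversion, while check_format("%d", 2) still reports a redundant arg.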
David Mitchell [Thu, 1 Jun 2017 10:29:35 +0000 (11:29 +0100)]
Perl_sv_vcatpvfn_flags: fix comment typo
David Mitchell [Thu, 1 Jun 2017 10:27:20 +0000 (11:27 +0100)]
Perl_sv_vcatpvfn_flags: add comment about wrap
David Mitchell [Thu, 1 Jun 2017 10:08:27 +0000 (11:08 +0100)]
Perl_sv_vcatpvfn_flags: only do utf8 in radix code
For floating point formats, the output can only be utf8 if the radix point
is utf8. Currently the radix point code sets the is_utf8 variable, then
later, in the main floating-point code path, it tests is_utf8 and
upgrades the output string to utf8.
Instead, just do the upgrade directly in the radix code block.
David Mitchell [Thu, 1 Jun 2017 10:00:26 +0000 (11:00 +0100)]
Perl_sv_vcatpvfn_flags: simplify radix len adding
Assume the length of the radix point is a constant 1 (i.e. length('.'))
and only increment float_need further if we're in a locale.
David Mitchell [Thu, 1 Jun 2017 09:52:12 +0000 (10:52 +0100)]
sprintf %a/%A more sanity checks
For the code which generates hexadecimal floating-point formats,
add extra sanity checks against buffer overruns.
David Mitchell [Thu, 1 Jun 2017 09:32:36 +0000 (10:32 +0100)]
S_hextract(): fix #if indentation
a complex set of nested #if/#else/#endif's had incorrect and confusing
indentation.
whitespace-only change
David Mitchell [Wed, 31 May 2017 11:35:34 +0000 (12:35 +0100)]
Perl_sv_vcatpvfn_flags: simplify some wrap checks
Skip doing some overflow checks when we know it can't overflow.
David Mitchell [Wed, 31 May 2017 10:59:48 +0000 (11:59 +0100)]
Perl_sv_vcatpvfn_flags: simplify float_need calc
Include another constant addition in the initial assignment, to eliminate
a later wrap check.
David Mitchell [Wed, 31 May 2017 10:15:15 +0000 (11:15 +0100)]
S_format_hexfp(): s/int/STRLEN/
In the helper function that sprintf's %a/%A hex floating point values,
the calculation of the number of zeros to pad with should be in terms of
STRLEN rather than int.
A bit academic unless someone ever tries to print a hex f/p value with a
precision > 2Gb digits.
David Mitchell [Wed, 31 May 2017 08:47:27 +0000 (09:47 +0100)]
op/infnam.t: skip unportable tests
sprintf size modifiers L and q aren't available on all platform sizes,
so skip them.
David Mitchell [Tue, 30 May 2017 15:11:37 +0000 (16:11 +0100)]
Perl_sv_vcatpvfn_flags: add inits to silence gcc
Add a couple of unnecessary variable initialisers, to keep gcc's "this
variable might be used uninitialised - then again it might not - in fact I
don't really know what I'm talking about, but I've decided to annoy you
with it anyway" warning at bay.
David Mitchell [Tue, 30 May 2017 14:55:29 +0000 (15:55 +0100)]
Perl_sv_vcatpvfn_flags: avoid wrap on precision
Where the precision is specified literally in the format string,
the integer precision value could wrap. Instead, make it croak with
Integer overflow in format string
As in other recent commits, the upper limit is set at 1/4 of STRLEN.
David Mitchell [Tue, 30 May 2017 14:27:00 +0000 (15:27 +0100)]
Perl_sv_vcatpvfn_flags: s/int/STRLEN/g
There were a few residual places that used int loop counters, e.g. to
prepend N '0's to a number. Since the N's are of type STRLEN, make the
loop counters STRLEN too.
It's a bit academic since you're unlikely to have a number needing >2Gb
worth of zero padding, but it makes things consistent and easier to audit.
At this point I believe that any remaining usage of int / I32 / U32 in
Perl_sv_vcatpvfn_flags() is legitimate.
David Mitchell [Tue, 30 May 2017 14:11:24 +0000 (15:11 +0100)]
Perl_sv_vcatpvfn_flags: %n: avoid wrap
It's a bit academic, but in principle if a string was longer than 2Gb
chars, the length as set by %n could wrap. So use the correct type(s).
David Mitchell [Tue, 30 May 2017 12:45:35 +0000 (13:45 +0100)]
Perl_sv_vcatpvfn_flags: width/precis arg wrap
When the width or precision is specified via an argument rather than
literally, check whether the value wraps.
Formerly, something like
$w = 0x100000005;
printf "%*s", $w, "abc";
might print "  abc" or similar, depending on platform.
Now it croaks with "Integer overflow in format string".
I did wonder whether it should just warn instead, but:
1) over-large literal widths/precisions already croak.
2) Code that has wild field specifiers like that is already likely
to crash with an out-of-memory error.
3) At least this croak is trappable via eval - OOM isn't.
I also set the maximum allowed value to be 1/4 of the largest value that
fits in a pointer-sized integer, to give a safety margin for possible
wrapping later.
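To make the wrap concrete, here is a hedged sketch. width_from_arg(),
truncated_width() and WIDTH_LIMIT are hypothetical names, and returning
-1 stands in for the croak; the real check lives inline in sv.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed limit: a quarter of the size_t range */
#define WIDTH_LIMIT (SIZE_MAX / 4)

/* What a 32-bit width variable would keep of a larger value:
 * the conversion is modulo 2**32, so 0x100000005 becomes 5. */
uint32_t truncated_width(int64_t iv)
{
    return (uint32_t)iv;
}

/* Convert a '*' width argument. A negative value means left-justify.
 * Returns 0 on success, or -1 where perl would croak with
 * "Integer overflow in format string". */
int width_from_arg(int64_t iv, size_t *width, int *left)
{
    uint64_t mag;

    if (iv < 0) {
        *left = 1;
        mag = (uint64_t)0 - (uint64_t)iv;   /* safe even for INT64_MIN */
    }
    else {
        *left = 0;
        mag = (uint64_t)iv;
    }
    if (mag > WIDTH_LIMIT)
        return -1;
    *width = (size_t)mag;
    return 0;
}
```

Whether a particular value such as 0x100000005 exceeds WIDTH_LIMIT in this
sketch depends on the width of size_t on the build platform.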
David Mitchell [Mon, 29 May 2017 16:06:06 +0000 (17:06 +0100)]
Perl_sv_vcatpvfn_flags: move vector initialisation
Move the generation of vecstr/veclen/vec_utf8 into the
vector-initialisation block, rather than being part of the general
'get next arg' block.
Also, stop vecsv being in scope for the whole of the loop block, and make
it two separate tightly-scoped vars (with different purposes).
David Mitchell [Mon, 29 May 2017 15:53:06 +0000 (16:53 +0100)]
Perl_sv_vcatpvfn_flags: warn on missing %v arg
The explicit arg variant, e.g. %3$vd, didn't give 'missing arg' warning.
David Mitchell [Mon, 29 May 2017 15:20:17 +0000 (16:20 +0100)]
Perl_sv_vcatpvfn_flags: warn on missing width arg
Formerly it didn't warn when the width value was obtained from the next or
specified arg, and there wasn't such an arg.
David Mitchell [Mon, 29 May 2017 15:11:01 +0000 (16:11 +0100)]
Eliminate FETCH_VCATPVFN_ARGUMENT macro
This can be simplified so much now that it might as well just be expanded
in situ for its 3 uses.
David Mitchell [Mon, 29 May 2017 15:01:26 +0000 (16:01 +0100)]
Perl_sv_vcatpvfn_flags: re-indent block
whitespace-only
David Mitchell [Mon, 29 May 2017 14:27:18 +0000 (15:27 +0100)]
Perl_sv_vcatpvfn_flags: unify %v vers obj handling
Currently sv_vcatpvfn_flags() has special handling of the arg under %v
when the arg is a version object, but only via the perlish interface
(argsv and svmax). This commit extends that handling to the C-ish
interface (args).
There seems no good reason not to, and it simplifies the code.
David Mitchell [Mon, 29 May 2017 12:49:42 +0000 (13:49 +0100)]
Perl_sv_vcatpvfn_flags: unify args handling
Several places do something along the lines of:
if (explicit arg index)
FETCH_VCATPVFN_ARGUMENT(...., svargs[ix-1])
else
FETCH_VCATPVFN_ARGUMENT(...., svargs[svix++])
For each of these, reduce the duplicate code by changing the above to
(approximately)
ix = ix ? ix - 1 : svix++;
FETCH_VCATPVFN_ARGUMENT(...., svargs[ix])
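The index calculation on its own, as a tiny sketch (arg_index() is a
hypothetical wrapper; in sv.c the expression is written inline):

```c
#include <stddef.h>

/* explicit_ix is the 1-based index from a "%N$..." format, or 0 if the
 * format element has no explicit index. svix is the running count of
 * implicitly consumed args; it only advances in the implicit case. */
size_t arg_index(size_t explicit_ix, size_t *svix)
{
    return explicit_ix ? explicit_ix - 1 : (*svix)++;
}
```

So an explicit "%3$s" selects element 2 without disturbing svix, while a
plain "%s" takes the next unused element.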
David Mitchell [Mon, 29 May 2017 10:16:49 +0000 (11:16 +0100)]
sv_vcatpvfn() family: make svmax arg Size_t
It was formerly I32. It should be unsigned since you can't have a negative
number of args. And although you're unlikely to call sprintf with more
than 0x7fffffff args, it makes it more consistent with other APIs which
we've been gradually expanding to 64-bit/ptrsize. It also makes the
code internal to Perl_sv_vcatpvfn_flags more consistent, when
dealing with explicit arg index formats like "%10$s". This function still
has a mix of STRLEN (for string lengths) and Size_t (for arg indexes)
but they are aliases for each other.
I made Perl_do_sprintf()'s len arg SSize_t rather than Size_t, since
it typically gets called with ptr diff arithmetic. Not sure if this is
being overly cautious.
David Mitchell [Mon, 29 May 2017 08:59:16 +0000 (09:59 +0100)]
S_expect_number(): return STRLEN not I32
This static function is used by Perl_sv_vcatpvfn_flags() to read in
a width or explicit argument number. It currently returns an I32 result
(and croaks if the number exceeds the maximum possible I32 value).
Change it to return STRLEN, and to croak on the value being greater than
max(STRLEN) / 4.
This doesn't make a lot of difference in practice, since no code is ever
going to be able to successfully create a formatted string that large
without running out of memory anyway. But by making it unsigned and of the
same type used elsewhere in sv_vcatpvfn_flags(), it simplifies auditing
the code for possible wrapping/truncating etc.
The limit at which it croaks with "Integer overflow in format
string" has changed as follows:
                       previously    now
    32-bit system      0x7fffffff    0x3fffffff
    32/64bit system    0x7fffffff    0x3fffffff
    64bit system       0x7fffffff    0x3fffffffffffffff
Setting the limit as 1/4 max rather than 1/2 max is just a safety
net to help avoid wraps/overflows elsewhere.
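A minimal sketch of a number parser with that limit, assuming size_t in
place of STRLEN and a -1 return in place of the croak. expect_number()
here is a hypothetical stand-in, not the static function in sv.c:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed limit: max(size_t) / 4 */
#define FMT_NUM_LIMIT (SIZE_MAX / 4)

/* Parse a run of decimal digits starting at *pat.
 * On success stores the value in *result, advances *pat past the
 * digits and returns 0. Returns -1 if the value would exceed
 * FMT_NUM_LIMIT (where perl croaks with "Integer overflow in format
 * string"); *pat is left unchanged in that case. */
int expect_number(const char **pat, size_t *result)
{
    const char *p = *pat;
    size_t var = 0;

    while (*p >= '0' && *p <= '9') {
        size_t digit = (size_t)(*p - '0');

        /* var * 10 + digit <= LIMIT, tested without overflowing */
        if (var > (FMT_NUM_LIMIT - digit) / 10)
            return -1;
        var = var * 10 + digit;
        p++;
    }
    *pat = p;
    *result = var;
    return 0;
}
```

Because the test is done before each multiply, the accumulator itself can
never wrap, whatever the length of the digit string.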
David Mitchell [Sun, 28 May 2017 17:07:14 +0000 (18:07 +0100)]
Perl_sv_vcatpvfn_flags: simplify 'c' var
Make it so that it's now *always* the format type ('s', 'd' etc).
Don't bother initialising it, and *don't* use it as a temporary
buffer (eptr = &c), so it can be stored in a register.
David Mitchell [Sun, 28 May 2017 16:59:47 +0000 (17:59 +0100)]
Perl_sv_vcatpvfn_flags: reduce scope of 'iv' var
David Mitchell [Sun, 28 May 2017 16:52:25 +0000 (17:52 +0100)]
Perl_sv_vcatpvfn_flags: eliminate 'epix' var
Or rather, reduce its scope to a small block and rename to 'ix'.
David Mitchell [Sun, 28 May 2017 16:49:10 +0000 (17:49 +0100)]
S_expect_number() re-indent code
.. following previous commit. Whitespace only.
David Mitchell [Sun, 28 May 2017 16:43:36 +0000 (17:43 +0100)]
sprintf: move 1..9 test out of S_expect_number()
Currently Perl_sv_vcatpvfn_flags() does several checks for "is the next
part of the format a number starting with a '1'..'9'?" It does this by
calling S_expect_number(), which returns 0 if not, or the value of the
number otherwise. For a simple format specifier, this results in multiple
fruitless calls to S_expect_number.
This commit makes it so that the caller of S_expect_number is responsible
for checking for the presence of 1..9.
David Mitchell [Fri, 26 May 2017 23:57:47 +0000 (00:57 +0100)]
Perl_sv_vcatpvfn_flags: more %v optimisation
Only do the code for appending the vector separator in the vector branch.
In particular, don't size the SvGROW for dotstrlen outside of %v.
This makes the %v code a bit slower but everything else a bit faster.
David Mitchell [Fri, 26 May 2017 23:17:35 +0000 (00:17 +0100)]
Perl_sv_vcatpvfn_flags: test for valid %vX once
Rather than testing for !vectorize in every conversion case which doesn't
support %v, test once for supported types in the if (vectorize) branch.
That way code which doesn't use %v never has to test for it.
David Mitchell [Fri, 26 May 2017 23:07:48 +0000 (00:07 +0100)]
Perl_sv_vcatpvfn_flags: join two if blocks
convert if (x); if (!x); into a single if/else
David Mitchell [Fri, 26 May 2017 23:00:10 +0000 (00:00 +0100)]
Perl_sv_vcatpvfn_flags: delay vector arg get
Move the block of code which retrieves the SV which the %v will iterate
over, from just before the /* SIZE */ block to just after. Since that
block doesn't do anything with args or svargs, this should make no
functional difference - but it will allow the next commit to coalesce
if (x); if (!x); into a single if/else.
Apart from cutting and pasting the code block, no other changes have been
made to it.
David Mitchell [Fri, 26 May 2017 22:49:58 +0000 (23:49 +0100)]
Perl_sv_vcatpvfn_flags: eliminate VECTORIZE_ARGS
This macro is only used once. Just expand it.
David Mitchell [Fri, 26 May 2017 22:42:07 +0000 (23:42 +0100)]
Perl_sv_vcatpvfn_flags: eliminate ewix local var
It's now only used within one code block.
David Mitchell [Fri, 26 May 2017 22:34:25 +0000 (23:34 +0100)]
Perl_sv_vcatpvfn_flags: remove 'asterisk' var
There was only one remaining use of this local var: in %p, to distinguish
between explicit and implicit width specifier, e.g. %*p or %1$p, vs %2p.
This can be done by just checking whether the char before the p was a '*'
or '$'.
David Mitchell [Fri, 26 May 2017 22:20:22 +0000 (23:20 +0100)]
Perl_sv_vcatpvfn_flags: further simplify %v logic
For the common case with no * or v, there now are only 2 test-and-branch
(! '*', ! 'v') rather than 3 (! '*', ! 'v', !asterisk)
This works by putting the *v handling code in the * branch
David Mitchell [Fri, 26 May 2017 21:45:02 +0000 (22:45 +0100)]
Perl_sv_vcatpvfn_flags: eliminate evix local var
David Mitchell [Fri, 26 May 2017 21:26:58 +0000 (22:26 +0100)]
Perl_sv_vcatpvfn_flags: simplify v/asterisk code
The previous commit's rearrangement of the v and * code now allows us to:
1) eliminate the 'vectorarg' bool variable, which is set but no longer
used;
2) join two adjacent "if (asterisk)" and "if (!asterisk)" blocks into a
single if/else.
David Mitchell [Fri, 26 May 2017 21:19:44 +0000 (22:19 +0100)]
Perl_sv_vcatpvfn_flags: move %*v handling earlier
Where the v flag appears, and it has non-default separator, i.e.
*v or *NNN$v, retrieve the next or NNNth arg (which defines the separator)
earlier - as soon as we encounter the v flag. This should in theory make
no functional difference since no args are processed between those two
points (so no chance of us stealing something else's arg).
Doing it earlier makes the conditions simpler (we don't have to check for
(vectorize && vectorarg) later).
The whole code block has been moved as-is with no changes apart from
whitespace.
David Mitchell [Fri, 26 May 2017 17:19:11 +0000 (18:19 +0100)]
Perl_sv_vcatpvfn_flags: move Inf handling for ints
Integer-like format types handle Inf/Nan specially. Currently the code to
handle this is in the main execution path, guarded by
if (strchr("BbcDdiOouUXx", c)) ...
After the previous few commits reorganised the int-arg getting code, this
block can now be moved into an int-only section, so not slowing down
other format types.
There should be no functional changes.
I've added some comments to the %c branch explaining why it's a special
case.
David Mitchell [Fri, 26 May 2017 16:23:09 +0000 (17:23 +0100)]
Perl_sv_vcatpvfn_flags: unify int arg fetching
There are two big blocks of code that do signed and unsigned 'get next int
arg' processing. Combine them (sort of).
Previously it was a bit like
    case 'd':
    case 'i':
        base = 10;
        if (vectorize)
            uv = ...
        else if (arg)
            iv = ...
        else
            iv = SvIV_nomg(argsv);
        if (!vectorize)
            uv = f(iv) for some f.
        goto integer;
    case 'x' base = 16; goto uns_integer;
    case 'u' base = 10; goto uns_integer;
    ...
  uns_integer:
    if (vectorize)
        uv = ...
    else if (arg)
        uv = ...
    else
        uv = SvUV_nomg(argsv);
  integer:
    ... do stuff with base and uv ...
Now it's more like
    case 'd': base = -10; goto get_int_arg_val;
    case 'i': base = -10; goto get_int_arg_val;
    case 'x': base = 16; goto get_int_arg_val;
    case 'u': base = 10; goto get_int_arg_val;
  get_int_arg_val:
    if (vectorize)
        uv = ...
    else if (base < 0) {
        /* signed int type */
        base = -base;
        if (arg)
            iv = ...
        else
            iv = SvIV_nomg(argsv);
        uv = f(iv) for some f.
    }
    else {
        /* unsigned int type */
        if (arg)
            uv = ...
        else
            uv = SvUV_nomg(argsv);
    }
  integer:
    ... do stuff with base and uv ...
Note that in particular the vectorize block of code is no longer
duplicated. This will also allow the next commit to handle Inf/overload
just after the 'get_int_arg_val' label rather than doing it before the
main switch and slowing down the non-integer format types.
Should be no functional changes
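A self-contained sketch of the "negative base means signed" trick, for
illustration only. format_int() is a hypothetical function that takes a
plain long long rather than an SV or va_list arg, and it ignores
vectors, flags, width and precision:

```c
#include <stddef.h>
#include <string.h>

/* Format arg according to conversion c into out (NUL-terminated).
 * Returns the length written, or -1 for an unsupported conversion or
 * a too-small buffer. */
int format_int(char c, long long arg, char *out, size_t outlen)
{
    char tmp[80];
    char *t = tmp + sizeof tmp;
    unsigned long long uv;
    int base, negative = 0;
    size_t len;

    switch (c) {
    case 'd': base = -10; break;    /* negative base: signed type */
    case 'i': base = -10; break;
    case 'x': base = 16;  break;
    case 'o': base = 8;   break;
    case 'u': base = 10;  break;
    default:  return -1;
    }

    /* the single shared "get the value" block */
    if (base < 0) {
        /* signed int type */
        base = -base;
        negative = arg < 0;
        uv = negative ? 0ULL - (unsigned long long)arg
                      : (unsigned long long)arg;
    }
    else {
        /* unsigned int type */
        uv = (unsigned long long)arg;
    }

    /* common digit generation, working backwards from the end of tmp */
    do {
        *--t = "0123456789abcdef"[uv % (unsigned)base];
        uv /= (unsigned)base;
    } while (uv);
    if (negative)
        *--t = '-';

    len = (size_t)(tmp + sizeof tmp - t);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, t, len);
    out[len] = '\0';
    return (int)len;
}
```

The sign of base carries the signedness through the switch, so no second
label or flag variable is needed to choose between the two fetch paths.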
David Mitchell [Fri, 26 May 2017 15:39:30 +0000 (16:39 +0100)]
Perl_sv_vcatpvfn_flags: move %c handling to ints
%c is in some ways like integer formats - we treat the arg as an integer
(with '0+' overloading and Inf/Nan handling), but then at the end convert
it into a 1 char string rather than sequence of 0..9's.
Move the %c code partially into the main integer handling block of
code; this will shortly allow us to unify the SV-as-integer handling code.
David Mitchell [Fri, 26 May 2017 15:05:18 +0000 (16:05 +0100)]
Perl_sv_vcatpvfn_flags: %p and Inf/Nan
sprintf("%p", 0+Inf) should print the address of an SV, not the literal
string "Inf". Ditto NaN.
Similarly, sprintf("%p", $x) should print the address of the $x SV,
not triggering a tie fetch or overload method call, nor using the address
of any SV returned by such calls.
David Mitchell [Thu, 25 May 2017 11:09:52 +0000 (12:09 +0100)]
Perl_sv_vcatpvfn_flags: make 'fill' var a boolean
Currently the 'fill' local variable is a char, but it only ever holds the
values ' ' or '0'. Make it into a boolean flag instead.
David Mitchell [Thu, 25 May 2017 10:56:44 +0000 (11:56 +0100)]
Perl_sv_vcatpvfn_flags: do %p specials in %p case
There are currently a few special-cased %p variants (but only when called
from C, not from perl) such as %-p, %2p etc. Currently these are handled
specially at the top of main format-element loop, which penalises every
format type. Instead move the handling into the "case 'p'" branch of the
main switch. Which seems more logical, as well as more efficient.
I've also heavily rewritten the big comment block about all the special %p
formats.
David Mitchell [Thu, 25 May 2017 09:29:04 +0000 (10:29 +0100)]
Perl_sv_vcatpvfn_flags: move UTF8f handling code
The special UTF8f format (which is usually defined as something like
"%d%lu%4p") is currently handled as a special case at the top of the main
format-element loop.
Instead move it into the "case 'd'" branch so that it doesn't slow down
everything.
David Mitchell [Wed, 24 May 2017 15:29:16 +0000 (16:29 +0100)]
Perl_sv_vcatpvfn_flags: add %n code comment
point out things like "%-4.5n" don't currently warn
David Mitchell [Wed, 24 May 2017 15:09:25 +0000 (16:09 +0100)]
Perl_sv_vcatpvfn_flags: make %n missing arg fatal
Normally sprintf et al just warn if there aren't enough args; but since %n
wants to write the current string length to the next arg, make it fatal.
Formerly it would croak anyway, but with a spurious "Modification of a
read-only value" error as it tried to set &PL_sv_no.
David Mitchell [Wed, 24 May 2017 14:58:06 +0000 (15:58 +0100)]
Perl_sv_vcatpvfn_flags: comment %n deficiency
This should be fixed sometime:
/* XXX if sv was originally non-utf8 with a char in the
* range 0x80-0xff, then if it got upgraded, we should
* calculate char len rather than byte len here */
David Mitchell [Sat, 20 May 2017 15:01:26 +0000 (16:01 +0100)]
Perl_sv_vcatpvfn_flags: skip IN_LC(LC_NUMERIC)
In a couple of places it does
if (PL_numeric_radix_sv && IN_LC(LC_NUMERIC)) { ... }
But PL_numeric_radix_sv is set to NULL unless we have a non-standard
radix point (i.e. not "."), and this can only happen when we're in the
scope of 'use locale'. So the IN_LC() should be a redundant (and
expensive) test. Replace it with an assert.
David Mitchell [Sat, 20 May 2017 14:51:31 +0000 (15:51 +0100)]
Perl_sv_vcatpvfn_flags: set locale at most once
Calls to external snprintf-ish functions or that directly access
PL_numeric_radix_sv are supposed to sandwich this access within
STORE_LC_NUMERIC_SET_TO_NEEDED();
....
RESTORE_LC_NUMERIC();
The code in Perl_sv_vcatpvfn_flags() seems to have gotten a bit confused
as to whether it's trying to only set STORE_LC_NUMERIC_SET_TO_NEEDED()
once, then handle one or more %[aefh] format elements, then only
restore on exit. There is code at the end of the function which says:
RESTORE_LC_NUMERIC(); /* Done outside loop, so don't have to save/restore
each iteration. */
but in practice various places within this function (and its helper
function S_format_hexfp()) inconsistently and repeatedly do
STORE_LC_NUMERIC_SET_TO_NEEDED(); and sometimes do RESTORE_LC_NUMERIC().
This commit changes it so that STORE_LC_NUMERIC_SET_TO_NEEDED() is called
at most once, the first time a % format involving a radix point is
encountered, and does RESTORE_LC_NUMERIC(); exactly once at the end of the
function.
Note that while calling STORE_LC_NUMERIC_SET_TO_NEEDED() multiple times
is harmless, it's quite expensive, as each time it has to check whether
it's in the scope of 'use locale'. RESTORE_LC_NUMERIC() is cheap if
STORE_LC_NUMERIC_SET_TO_NEEDED() earlier determined that there was nothing
to do.
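The control flow can be sketched as below. Everything here is a
stand-in: store_lc_numeric() and restore_lc_numeric() are hypothetical
counters replacing the real macros, and format_all() walks a string of
conversion letters rather than a real format.

```c
#include <stddef.h>
#include <string.h>

/* Call counters so the effect is observable in this sketch */
int store_calls, restore_calls;

/* Hypothetical stand-ins for the expensive store and cheap restore */
void store_lc_numeric(void)   { store_calls++; }
void restore_lc_numeric(void) { restore_calls++; }

/* types holds one conversion letter per format element.
 * Returns the number of elements that involve a radix point. */
size_t format_all(const char *types)
{
    int lc_numeric_set = 0;     /* has the store been done yet? */
    size_t n = 0;
    const char *p;

    for (p = types; *p; p++) {
        if (strchr("aAeEfFgG", *p)) {
            if (!lc_numeric_set) {
                /* first radix-using element: pay the cost once */
                store_lc_numeric();
                lc_numeric_set = 1;
            }
            n++;
        }
    }

    /* exactly one restore, on the way out of the function */
    restore_lc_numeric();
    return n;
}
```

A format with no floating-point elements never calls the store function
at all, which is where the saving comes from.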
David Mitchell [Sat, 20 May 2017 12:01:02 +0000 (13:01 +0100)]
Perl_sv_vcatpvfn_flags: remove redundant code
At the start of the function, it marks the output as being utf8 if the
first arg is utf8. But this should be taken care of when the individual
args (including the first one) are processed. So it's redundant code.
In fact it would sometimes cause the resultant string to be unnecessarily
upgraded to utf8, e.g.:
my $precis = "9";
utf8::upgrade($precis);
my $s = sprintf "%.*f\n", $precis, 1.1;
# whoops, $s is now utf8
David Mitchell [Sat, 20 May 2017 11:07:23 +0000 (12:07 +0100)]
Perl_sv_vcatpvfn_flags: remove "%.Ng" special-case
This function has special-case handling for the formats "%.0f" and
"%.NNg", to speed things up. This special-casing appears twice,
once near the top of the function for where the format matches exactly
"%.0f" or "%.Ng" (N is 1..99), and once again in the main loop of the
function, where it handles those format elements embedded in the larger
format: "....%.0f..." and "....%.Ng..." (N > 0).
The problem with the "%.Ng" code is that it isn't as robust as the more
general "....%.Ng..." code - in particular the latter checks for a
locale-dependent radix-point when determining needed buffer size.
This commit removes the "%.Ng" special-cased code but leaves the
"....%.Ng..." special-cased code. It makes the former about 7% slower
compared to the situation at the start of this branch. (Part of the effort
in this branch has been to make the "....%.Ng..." code faster, so that
there's less of an overall performance hit by removing "%.Ng").
David Mitchell [Fri, 19 May 2017 15:15:31 +0000 (16:15 +0100)]
Perl_sv_vcatpvfn_flags: handle %.NNNg case earlier
In the main loop, we look for %.NNNg and handle it specially.
Change it so that the special-case is only used when precis is small
enough that it fits in the local ebuf[] rather than the malloced
PL_efloatbuf. This allows the check for this special case to be done
earlier with less redundant calculations.
David Mitchell [Fri, 19 May 2017 14:45:51 +0000 (15:45 +0100)]
Perl_sv_vcatpvfn_flags: use quick concat for %.0f
Most floating-point formats now use the quick concat path. But the
"%.0f" shortcut was accidentally bypassing that path. This commit fixes
that.
David Mitchell [Thu, 18 May 2017 11:47:51 +0000 (12:47 +0100)]
Perl_sv_vcatpvfn_flags: simplify concat of f/p str
Since floating-point formats do their own formatting and padding, skip the
block of code at the end of the main loop which handles appending eptr to
sv, and do our own stripped-down version.
David Mitchell [Thu, 18 May 2017 10:44:17 +0000 (11:44 +0100)]
Perl_sv_vcatpvfn_flags: s/gconverts/Gconvert's/
fix a comment, so that a search for the word 'Gconvert' gets a match.
So that a later comment 'See earlier comment about buggy Gconvert' makes
sense.
David Mitchell [Thu, 18 May 2017 10:32:27 +0000 (11:32 +0100)]
Perl_sv_vcatpvfn_flags: tighten hexfp var scope
Only have the 'hexfp' var declared within the innermost scope it is
actually needed for.
David Mitchell [Thu, 18 May 2017 10:17:32 +0000 (11:17 +0100)]
Perl_sv_vcatpvfn_flags: rename 'is_simple' var
the definition of 'simple' required the format to have a precision.
David Mitchell [Thu, 18 May 2017 10:03:28 +0000 (11:03 +0100)]
Perl_sv_vcatpvfn_flags: move pod closer
Several static functions etc had been added between the pod and the
main function. Move the pod to be just above it.
Also incorporate a comment into the pod about utf8ness of pattern and SV
needing to match.
David Mitchell [Thu, 18 May 2017 09:45:56 +0000 (10:45 +0100)]
Perl_sv_vcatpvfn_flags: eliminate utf8buf[] var
%c for a >255 char generates its utf8 byte representation and stores it in
this temporary buffer:
U8 utf8buf[UTF8_MAXBYTES+1]
But we already have another temporary buffer, ebuf, for creating floating
point strings, which is big enough. So use that instead.
David Mitchell [Thu, 18 May 2017 09:37:42 +0000 (10:37 +0100)]
Perl_sv_vcatpvfn_flags: reorganise loop vars
There is a big chunk of local vars declared at the top of the main loop.
Reorder the declarations to group similar vars together, and add a comment
to each var explaining what it's for.
No functional changes.
David Mitchell [Thu, 18 May 2017 08:49:08 +0000 (09:49 +0100)]
Perl_sv_vcatpvfn_flags: move vars to inner scope
Add a new scope around the floating-point code, then move some
local var declarations into that scope.
David Mitchell [Thu, 18 May 2017 08:41:15 +0000 (09:41 +0100)]
Perl_sv_vcatpvfn_flags: extract hex f/p code
There is a large block of code (nearly 300 lines) in
Perl_sv_vcatpvfn_flags(), which handles the %a/%A hexadecimal
floating-point format. Move it into new static function,
S_format_hexfp().
No functional changes.
David Mitchell [Thu, 18 May 2017 08:03:20 +0000 (09:03 +0100)]
Perl_sv_vcatpvfn_flags: move some macros earlier
There are some macro definitions in the body of Perl_sv_vcatpvfn_flags()
which handle some possible differences between double and long double.
Move these to before the function as they will shortly need to be visible
to a new helper function. At the same time, prefix their names with
VCATPVFN_ to make clear what they're for.
For the same reason I've also added a new typedef, vcatpvfn_long_double_t.
I also eliminated the FV_ISFINITE macro definition as it's no longer used.
David Mitchell [Wed, 17 May 2017 12:36:27 +0000 (13:36 +0100)]
remove HAS_LDBL_SPRINTF_BUG code
This code was added in 2002 to work round an Irix 6 rounding bug in
long double sprintfs.
I strongly suspect that any such OS bug has long been fixed and/or such
machines have been retired or are unlikely to have new perls installed on
them.
Part of the motivation for removing this code is that following the
previous commit, that block of code's use of the float_need variable
is likely to be wrong (since it now includes exponent etc), but I have no
way of testing it.
I've left the probe code in hints/irix_6.sh, so if anyone ever reports
sprintf.t failures on an old Irix platform, perl -V should show if their
system still has the bug. At that point someone brave could resurrect this
block of code.
David Mitchell [Wed, 17 May 2017 11:27:18 +0000 (12:27 +0100)]
Perl_sv_vcatpvfn_flags: better calc f/p buf size
How it works out the needed buffer size for the various floating point
formats is a bit opaque. This commit extensively documents and
rationalises the process. In particular it will no longer allocate a very
large buffer for %g printing a large number (%g switches to %e style
format rather than %f in cases like this). Also it no longer relies on a
+40 fudge factor to accommodate exponents - this is now factored in
properly.
It still includes a +20 safety fudge factor for production builds, but
this is disabled under DEBUGGING so that ASAN and the like are likely to
more quickly spot issues during development.
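To give a feel for the kind of accounting involved, here is a hedged
sketch that sizes a buffer for %f and %e style output of a plain double.
It is not the calculation in sv.c: float_need() and estimate_covers()
are hypothetical, the integer digits are counted by repeated division
rather than from the binary exponent, and width and flags are ignored.

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes needed (including the trailing NUL) to format nv with
 * conversion c ('f' or 'e'), precision precis and a radix string of
 * radix_len bytes. Returns 0 for Inf/NaN, which are assumed to be
 * handled elsewhere. Deliberately errs on the large side. */
size_t float_need(char c, double nv, size_t precis, size_t radix_len)
{
    double v = nv < 0 ? -nv : nv;
    size_t need = 1;                    /* possible leading '-' */

    if (!(v <= DBL_MAX))
        return 0;                       /* Inf or NaN */

    if (c == 'f') {
        size_t digits = 0;
        while (v >= 1.0) {              /* count integer digits */
            v /= 10.0;
            digits++;
        }
        if (digits == 0)
            digits = 1;                 /* "0.xxx" still has one digit */
        need += digits + 2;             /* +1 rounding carry, +1 slack */
    }
    else {
        need += 1;                      /* single digit before radix */
        need += 2 + 4;                  /* "e+" and up to 4 exp digits */
    }
    need += radix_len + precis;
    return need + 1;                    /* trailing NUL */
}

/* Check the estimate against what snprintf really produces,
 * assuming a one-byte radix point. */
int estimate_covers(char c, double nv, size_t precis)
{
    int len = snprintf(NULL, 0, c == 'f' ? "%.*f" : "%.*e",
                       (int)precis, nv);
    return len >= 0 && (size_t)len < float_need(c, nv, precis, 1);
}
```

The point of the sketch is the contrast between the two branches: the %e
estimate stays small however large the number is, while the %f estimate
grows with the number of integer digits.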
David Mitchell [Tue, 16 May 2017 15:30:13 +0000 (16:30 +0100)]
sprintf: handle sized int-ish formats with Inf/Nan
The code path taken when int-ish formats saw an Inf/Nan was to jump to the
floating-point handler, but then that would warn about (valid) size
qualifiers. For example before:
$ perl -we'printf "[%hi]\n", Inf'
Invalid conversion in printf: "%hi" at -e line 1.
Redundant argument in printf at -e line 1.
[%hi]
$
After this commit:
$ perl -we'printf "[%hi]\n", Inf'
[Inf]
$
It also makes the code simpler.
David Mitchell [Tue, 16 May 2017 07:53:19 +0000 (08:53 +0100)]
Perl_sv_vcatpvfn_flags: handle Inf/Nan in 1 place
At the start of the float section, check whether the value is Inf/Nan
and handle directly. This stops later blocks of code having to test for it
too. Also simplify the formatting of Inf/Nan - let the general code at the
end of the block do any pre/post padding.
David Mitchell [Mon, 15 May 2017 17:59:54 +0000 (18:59 +0100)]
Perl_sv_vcatpvfn_flags: sort PL_numeric_radix_sv
Under locales the radix point may not be just a simple '.' but a Unicode
string like "\N{ARABIC DECIMAL SEPARATOR}". Currently the hex f/p code
explicitly takes account of the length of this string when calculating the
buffer length, but the other branches don't - they just rely on the
"add 40 fudge factor" to protect them.
Instead, handle its length for all branches, and simplify utf8 handling.
Currently it checks post-format whether the radix point was utf8, and if
so marks the resulting buffer as utf8. Instead, check for utf8-ness at the
same time we check for length.
This new approach doesn't check whether the resulting string actually
contains the radix point string, so in principle the string could be
marked utf8 but not have any >127 chars. I think this is harmless.
David Mitchell [Mon, 15 May 2017 19:42:12 +0000 (20:42 +0100)]
Perl_sv_vcatpvfn_flags() split %.0f and %.Ng
The format elements "%.0f" and "%.NNNg" are handled specially in the main
loop. Split the code block which handles them and process %.0f earlier. It
doesn't need to allocate a variable-length buffer or worry about the
length of the radix string.
David Mitchell [Mon, 15 May 2017 13:49:50 +0000 (14:49 +0100)]
S_F0convert(): remove Nan/Inf handling
This function handles sprintf "%.0f". It also handles Inf/Nan, but neither
of its callers will call it with such an nv. Its code for handling them is
also broken - it returns the \0 following the "Inf" or "Nan" string.
So just remove this unneeded and broken functionality.
At the same time document what S_F0convert() does.
David Mitchell [Mon, 15 May 2017 12:54:17 +0000 (13:54 +0100)]
Perl_sv_vcatpvfn_flags: fix arg to SNPRINTF_G()
One of the callers of SNPRINTF_G() passes 'size' as its third arg - but
there is no such variable. This code happens only to be used in the
!USE_QUADMATH branch, and the SNPRINTF_G macro only uses that arg under
USE_QUADMATH. So it doesn't matter. But replace 'size' with 'sizeof(ebuf)'
in case that changes in future.
David Mitchell [Mon, 15 May 2017 11:51:56 +0000 (12:51 +0100)]
Perl_sv_vcatpvfn_flags: reduce scope of local var
fix_ldbl_sprintf_bug is only used in one block of code so declare it in
that block.
Given that that block is only compiled under HAS_LDBL_SPRINTF_BUG,
which seems only to be for some obscure Irix issues from 2002,
I haven't actually tested this.
David Mitchell [Mon, 15 May 2017 10:59:49 +0000 (11:59 +0100)]
use SvCUR(PL_numeric_radix_sv) not SvLEN()
When determining the length of buffer needed to output the decimal point
in the current locale, use SvCUR(PL_numeric_radix_sv) rather than
SvLEN(PL_numeric_radix_sv). I presume this was a thinko in the original
commit. Using SvLEN currently seems harmless, since typically SvCUR <
SvLEN, but one could conceive a future scenario where locale info is set
using alien string buffers with SvLEN(sv) == 0.
David Mitchell [Thu, 11 May 2017 08:06:05 +0000 (09:06 +0100)]
Perl_sv_vcatpvfn_flags: reindent block
whitespace only
David Mitchell [Thu, 11 May 2017 08:00:30 +0000 (09:00 +0100)]
Perl_sv_vcatpvfn_flags: reduce scope of 'int i'
Declare an 'i' var wherever needed for local use, rather than being in
scope for 1600 lines.
David Mitchell [Wed, 10 May 2017 16:23:51 +0000 (17:23 +0100)]
Perl_sv_vcatpvfn_flags: get rid of an (int) cast
harmless in this case, but there really shouldn't be (int) casts
on string length and ptr diff calculations
David Mitchell [Wed, 10 May 2017 15:58:58 +0000 (16:58 +0100)]
Perl_sv_vcatpvfn_flags: calc (width - elen) once
There's a couple of blocks of code which repeat the expression
(width - elen). Calculate this once at the top. This makes it slightly
easier to audit the code for signed/unsigned wrap etc.
Should be no functional change.
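A rough sketch of the pattern, assuming unsigned lengths: the gap is computed once, guarded so that the subtraction can never wrap, and then reused by both the left- and right-justified paths. The function pad_field() and its parameters are hypothetical and are not taken from sv.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy ebuf (elen bytes) into out, space-padded
 * to width on the left or right. Returns the number of bytes written,
 * not counting the trailing NUL. */
size_t pad_field(char *out, size_t outsize,
                 const char *ebuf, size_t elen,
                 size_t width, bool left)
{
    /* Computed once; width - elen is never evaluated when elen > width. */
    const size_t gap = (width > elen) ? width - elen : 0;
    char *p = out;

    /* Need room for elen + gap bytes plus the NUL. */
    assert(outsize > elen && outsize - elen > gap);

    if (!left) {                /* right-justified: padding first */
        memset(p, ' ', gap);
        p += gap;
    }
    memcpy(p, ebuf, elen);
    p += elen;
    if (left) {                 /* left-justified: padding last */
        memset(p, ' ', gap);
        p += gap;
    }
    *p = '\0';
    return (size_t)(p - out);
}
```

With a single guarded calculation there is only one place to audit for unsigned wrap, rather than one per use of the expression.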
David Mitchell [Wed, 10 May 2017 15:17:18 +0000 (16:17 +0100)]
Perl_sv_vcatpvfn_flags: avoid 1-byte buf overrun
This only occurs on the "%a" (hex) format, and only happens when
processing a denormalised value whose bit pattern is 0xf....f or similar,
and when rounding up it needs to insert a '1' at the head of the number
and shift the rest of the digits down one.
In practice this never seems to happen - the top nybble of a denormalised
float value always seems to be 0x1 (presumably because that's implicit) so
there's never any carry to a higher digit. Maybe other platforms do it
differently.
Also VHEX_SIZE seems to be rounded up, so in practice there's no overrun.
But better safe than sorry.
David Mitchell [Wed, 10 May 2017 14:27:49 +0000 (15:27 +0100)]
Perl_sv_vcatpvfn_flags: avoid a potential wrap
In the floating-point hex (%a) code, it checks whether the requested
precision is smaller than the hex buf size. It does this by casting
(precis + 1) to signed. Since precis can be any user-supplied value,
this can wrap. Instead, cast the (buffer_length - 1) to unsigned, since
this is bounded to a small constant value > 1.
In practice this makes no difference currently, as a large precis will
have caused a malloc panic earlier anyway. But that might change in
future.
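The comparison can be sketched as follows. This is an illustration under assumptions, not the sv.c code: precis_truncates() and ndigits are hypothetical names, with ndigits standing in for the small, bounded buffer length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the user-supplied precision asks for
 * fewer hex digits than are available. ndigits is a small constant
 * greater than 1; precis may be arbitrarily large. */
bool precis_truncates(size_t precis, int ndigits)
{
    assert(ndigits > 1);
    /* Convert the small, bounded side to unsigned instead of casting
     * the unbounded (precis + 1) to a signed type, which could wrap. */
    return precis < (size_t)(ndigits - 1);
}
```

A precision of SIZE_MAX is then correctly reported as not truncating, whereas (precis + 1) would have wrapped to 0 before any signed comparison took place.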
David Mitchell [Wed, 10 May 2017 13:03:25 +0000 (14:03 +0100)]
Perl_sv_vcatpvfn_flags: simplify an expression
In the hex floating-point code, (subnormal ? vfnz : vhex) is equivalent to
v0, which we just set to the same value.
So keep things simple.
David Mitchell [Wed, 10 May 2017 10:19:38 +0000 (11:19 +0100)]
sprintf(): handle mangled formats better with utf8
Currently if sprintf() detects an error in the format while processing
a %.... entry, it copies the bytes as-is from the % to the point the
error was detected, then continues. If the output string and format string
don't have the same utf8-ness, this can result in badly-formed utf8
output.
This commit changes the code so that it just appends a '%' then restarts
processing from the character following the %. Most of the time this just
again results in the characters following the % being output as-is,
except this time the 'normal' character-copying code path is taken, which
handles utf8 mismatches correctly.
By doing this, it also removes a block of code which contained a "roll
your own" string appender which used SvGROW() and Copy(). This was one
further place which was potentially open to wrapping and block overrun
bugs.
This commit may cause occasional changes in behaviour, depending on
whether there are any further '%' characters within the bad section of the
format. Now these will be reprocessed, possibly triggering further
'Invalid conversion' type warnings.
David Mitchell [Tue, 9 May 2017 14:55:07 +0000 (15:55 +0100)]
Perl_sv_vcatpvfn_flags: simplify wrap checking
The main SvGROW() has a new-length arg roughly equivalent to
(SvCUR(sv) + elen + zeros + esignlen + dotstrlen + 1);
Rationalise the overflow/wrap checking by doing each individual addition
separately with its own check. This is slightly redundant as some of the
values are interdependent, but this way it's easier to see whether all
possible overflows are being checked for.
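A minimal sketch of checking each addition separately is shown below. It is not the sv.c code: add_checked() and needed_len() are hypothetical names, and where perl would call croak_memory_wrap() this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add b to *total; return false rather than wrapping past SIZE_MAX. */
static bool add_checked(size_t *total, size_t b)
{
    if (b > SIZE_MAX - *total)
        return false;
    *total += b;
    return true;
}

/* Hypothetical stand-in for the new-length calculation:
 * cur + elen + zeros + esignlen + dotstrlen + 1,
 * with every addition checked on its own. */
bool needed_len(size_t cur, size_t elen, size_t zeros,
                size_t esignlen, size_t dotstrlen, size_t *out)
{
    size_t need = cur;

    assert(out != NULL);
    if (!add_checked(&need, elen))      return false;
    if (!add_checked(&need, zeros))     return false;
    if (!add_checked(&need, esignlen))  return false;
    if (!add_checked(&need, dotstrlen)) return false;
    if (!add_checked(&need, 1))         return false;  /* trailing NUL */
    *out = need;
    return true;
}
```

Some of these checks can never fail in combination, but having one check per term makes it easy to confirm by inspection that none is missing.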
David Mitchell [Tue, 9 May 2017 14:32:49 +0000 (15:32 +0100)]
Perl_sv_vcatpvfn_flags: reduce scope of 'gap' var
shouldn't make any functional difference
David Mitchell [Tue, 9 May 2017 14:29:25 +0000 (15:29 +0100)]
Perl_sv_vcatpvfn_flags: reindent a block of code
(whitespace-only change)
indent a chunk of code ready for the next commit.
David Mitchell [Tue, 9 May 2017 13:48:59 +0000 (14:48 +0100)]
Perl_sv_vcatpvfn_flags: reduce scope of 'have' var
Just declare this var in the small block where it's needed, rather than
being in scope for 500+ lines.
Should be no functional changes.
David Mitchell [Tue, 9 May 2017 13:36:40 +0000 (14:36 +0100)]
Perl_sv_vcatpvfn_flags: split the 'need' local var
The 'need' local var has a wide scope (over 500 lines), and is used for
two separate purposes. Split it into two separate vars. One remains wide
scope, but is just used to calculate the new value of PL_efloatsize. Rename
that one to 'float_need'.
For the second use, introduce a new scope of just 6 lines with its own
'need' variable.
This should make no functional difference but makes the code slightly
easier to understand and analyse.
David Mitchell [Tue, 9 May 2017 13:29:11 +0000 (14:29 +0100)]
sprintf(): add memory wrap tests
In various places Perl_sv_vcatpvfn_flags() does croak_memory_wrap()
(including a couple added by the previous commit to fix RT #131260),
but there don't appear to be any tests for them.
So this commit adds some tests.
Steve Hay [Wed, 7 Jun 2017 07:39:20 +0000 (08:39 +0100)]
Fix dmake build breakage when using Visual C++
This was introduced by commit
1f664ef531. dmake with VC++ is not a common
combination, but I should have tested it :-(