This is a live mirror of the Perl 5 development currently hosted at
10 years agoMerge support for UTF8 symbols
Father Chrysostomos [Thu, 6 Oct 2011 20:02:12 +0000 (13:02 -0700)]
Merge support for UTF8 symbols

This branch makes symbols support UTF8 internally, which means that
Unicode is supported properly at the perl level.  So ${"\xff"} will
give you the same scalar, regardless of the internal encoding of the
string.  Also, many parts of the core are now nul-clean, too, as a
result of the UTF8 changes, which means that ‘$m = "a\0b"; foo->$m’
will try to call the method named "a\0b", instead of just "a".
Details follow.

• New API functions:
  Many of these take a _flags parameter, which accepts the
  SVf_UTF8 flag.
  • gv_init_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv_autoload
  • gv_fetchmethod_pv(n)/sv_flags — may change
  • gv_autoload_pv(n)/sv
  • newGVgen_flags
  • sv_derived_from_pv(n)/sv
  • sv_does_pv(n)/sv
  • whichsig_pv(n)/sv
• New internal functions:
  • CopSTASH_flags
  • CopSTASH_flags_set
  • PmopSTASH_flags
  • PmopSTASH_flags_set
  • sv_sethek
• Parts of Perl that handle Unicode symbol names correctly:
  • Method names (including those passed to ‘use overload’)
  • Typeglob names (including variable and filehandle names)
  • Package names
  • Constant subroutine names (not nul-clean yet)
  • goto
  • Symbolic dereferencing
  • Second argument to bless() and tie()
  • Return value of ref()
  • Package names returned by caller()
  • Subroutine prototypes
  • Attributes
  • Warnings and error messages that mention filehandles, packages,
    methods, variables, constant values, subroutines, symbolic refer-
    ences, format names and subroutine prototypes
• Parts of Perl that now handle embedded nuls correctly:
  • Method names
  • Typeglob names (including filehandle names)
  • Package names
  • Autoloading
  • Return value of ref()
  • Package names returned by caller()
  • Filehandle warnings
  • Typeglob elements (*foo{"THING\0stuff"})
  • Signal names
  • Warnings and error messages that mention (yes, it’s the same list
    as above) filehandles, packages, methods, variables, constant
    values, subroutines, symbolic references, format names and
    subroutine prototypes
• Other bug fixes
  • *{é} now treats é as the name of the glob (the usual implicit
    quoting), instead of treating it as a bareword (strict-unsafe)
    or function call.  *{é} used to be equivalent to *{+é}, in
    other words.
• Modified modules:
  • constant has been modified not to apply the workaround for the bug
    that this branch fixes, if that workaround does not apply.
  • attributes has been modified as part of making Unicode attri-
    butes work.
  • XS::APItest
  • mro, as part of making method lookup account for Unicode.
• Side effects
  • Blessing into "\0" no longer causes ref() to return false.
  • *{"*é::..."} is now equivalent to *{"é::..."}, just as
    *{"*e::..."} is equivalent to *{"e::..."}.  Previously, the * was
    only stripped if followed by [A-Za-z].
  • $é is now subject to ‘Used only once’ warnings.  It used to be
    exempt, as the code that checked the name considered it a
    punctuation variable.

10 years agoCorrect skip counts for miniperl
Father Chrysostomos [Thu, 6 Oct 2011 15:46:35 +0000 (08:46 -0700)]
Correct skip counts for miniperl

10 years agoIncrease $mro::VERSION from 1.08 to 1.09
Father Chrysostomos [Thu, 6 Oct 2011 07:10:06 +0000 (00:10 -0700)]
Increase $mro::VERSION from 1.08 to 1.09

10 years agoIncrease $attributes::VERSION from 0.16 to 0.17
Father Chrysostomos [Thu, 6 Oct 2011 07:09:30 +0000 (00:09 -0700)]
Increase $attributes::VERSION from 0.16 to 0.17

10 years agoIncrease $XS::APItest::VERSION from 0.31 to 0.32
Father Chrysostomos [Thu, 6 Oct 2011 07:08:59 +0000 (00:08 -0700)]
Increase $XS::APItest::VERSION from 0.31 to 0.32

10 years agoIncrease $constant::VERSION from 1.22 to 1.23
Father Chrysostomos [Thu, 6 Oct 2011 07:08:28 +0000 (00:08 -0700)]
Increase $constant::VERSION from 1.22 to 1.23

10 years agouni/universal.t tests passing
Father Chrysostomos [Thu, 6 Oct 2011 06:32:16 +0000 (23:32 -0700)]
uni/universal.t tests passing

10 years agoRewrite -l warning test to account for 433644eed
Father Chrysostomos [Thu, 6 Oct 2011 06:31:52 +0000 (23:31 -0700)]
Rewrite -l warning test to account for 433644eed

10 years agoRestore whichsig to the list in perlapi
Father Chrysostomos [Thu, 6 Oct 2011 05:26:00 +0000 (22:26 -0700)]
Restore whichsig to the list in perlapi

10 years agowhichsig nul-cleanup.
Brian Fraser [Fri, 30 Sep 2011 20:48:58 +0000 (13:48 -0700)]
whichsig nul-cleanup.

This adds _sv, _pvn, and _pv versions of whichsig() in mg.c, which
get both kill "NAME" and %SIG lookup nul-clean.
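
What nul-clean means for a name lookup can be sketched in plain C,
outside the Perl core.  The table and both function names below are
hypothetical, chosen for illustration only; they are not the code in
mg.c.  The first lookup treats the name as a C string and so stops at
the first nul; the second takes an explicit length, in the spirit of a
_pvn variant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name table, for illustration only. */
static const char *const sig_names[] = { "ZERO", "HUP", "INT", "QUIT" };
#define SIG_COUNT (sizeof sig_names / sizeof sig_names[0])

/* C-string lookup: strcmp stops at the first nul byte, so a name
 * such as "INT\0junk" is silently treated as "INT". */
static int sig_lookup_cstr(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < SIG_COUNT; i++)
        if (strcmp(sig_names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Length-aware lookup: the caller passes the full length, so bytes
 * after an embedded nul take part in the comparison. */
static int sig_lookup_pvn(const char *name, size_t len)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < SIG_COUNT; i++)
        if (strlen(sig_names[i]) == len
            && memcmp(sig_names[i], name, len) == 0)
            return (int)i;
    return -1;
}
```

With these definitions, sig_lookup_cstr("INT\0junk") finds the INT
entry, while sig_lookup_pvn("INT\0junk", 8) reports no match.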

10 years agotoke.c, ext/attributes/attributes.xs: Make attributes UTF-8 clean.
Brian Fraser [Fri, 22 Jul 2011 17:16:42 +0000 (18:16 +0100)]
toke.c, ext/attributes/attributes.xs: Make attributes UTF-8 clean.

10 years agoop.c: Scalar filehandles in errors UTF8 cleanup.
Brian Fraser [Fri, 30 Sep 2011 20:26:26 +0000 (13:26 -0700)]
op.c: Scalar filehandles in errors UTF8 cleanup.

10 years agoModify S_pending_ident to use sv_catpvn_flags
Father Chrysostomos [Fri, 30 Sep 2011 19:15:09 +0000 (12:15 -0700)]
Modify S_pending_ident to use sv_catpvn_flags

with the new SV_CAT* constants, since that’s faster than creating an
SV to pass to sv_catsv.

10 years agoTODO tests for parsing our() now pass
Brian Fraser [Thu, 6 Oct 2011 05:16:32 +0000 (22:16 -0700)]
TODO tests for parsing our() now pass

10 years agoOust cv_ckproto_len
Father Chrysostomos [Fri, 30 Sep 2011 19:10:51 +0000 (12:10 -0700)]
Oust cv_ckproto_len

It is no longer used in core (having been superseded by
cv_ckproto_len_flags), is unused on CPAN, and is not part of the API.

The cv_ckproto ‘public’ macro is modified to use the _flags version.
I put ‘public’ in quotes because, even before this commit, cv_ckproto
was using a non-exported function, and hence could never have worked
on a strict linker (or whatever you call it).

10 years agotoke.c, op.c, sv.c: Prototype parsing and checking are nul-and-UTF8 clean.
Brian Fraser [Fri, 30 Sep 2011 13:25:45 +0000 (06:25 -0700)]
toke.c, op.c, sv.c: Prototype parsing and checking are nul-and-UTF8 clean.

This means that eval "sub foo ($;\0whoops) { say @_  }" will correctly
include \0whoops in the CV's prototype (while complaining about illegal
characters), and that

use utf8;
BEGIN { $::{"foo"} = "\$\0L\351on" }
BEGIN { eval "sub foo (\$\0L\x{c3}\x{a9}on) {};"; }

will not warn about a mismatched prototype.

10 years agogv.c, op.c, pp.c: Stash-injected prototypes and prototype() are UTF-8 clean.
Brian Fraser [Mon, 11 Jul 2011 17:50:10 +0000 (18:50 +0100)]
gv.c, op.c, pp.c: Stash-injected prototypes and prototype() are UTF-8 clean.

This makes perl -E '$::{example} = "\x{30cb}"; say prototype example;'
store and fetch the correctly flagged prototype.

With this, all TODO tests in gv.t pass.  The next commit will deal
with making the parsing of prototypes nul-clean.

10 years agopp.c: Got pp_gelem nul-clean.
Brian Fraser [Mon, 11 Jul 2011 17:17:32 +0000 (18:17 +0100)]
pp.c: Got pp_gelem nul-clean.

10 years agotoke.c: Some simple mending to get readline() working with UTF-8 filehandles
Brian Fraser [Sun, 10 Jul 2011 11:20:26 +0000 (08:20 -0300)]
toke.c: Some simple mending to get readline() working with UTF-8 filehandles

10 years agopp_sys.c: pp_select UTF8 cleanup.
Brian Fraser [Sun, 10 Jul 2011 11:18:57 +0000 (08:18 -0300)]
pp_sys.c: pp_select UTF8 cleanup.

10 years agoop.c: Malformed prototype warning on UTF8 sub name
Brian Fraser [Thu, 7 Jul 2011 12:25:18 +0000 (09:25 -0300)]
op.c: Malformed prototype warning on UTF8 sub name

10 years agogv.c: Use name_end to avoid compiler warning
Father Chrysostomos [Thu, 6 Oct 2011 05:13:35 +0000 (22:13 -0700)]
gv.c: Use name_end to avoid compiler warning

In this code path, name_cursor could be uninitialised if
gv_fetchpvn_flags is called with GV_NOTQUAL|GV_ADDWARN.  Whenever it
is initialised, it is the same as name_end by the time this part
of the function is reached.

10 years agoglobvar.t: Skip PL_warn_uninit_sv
Father Chrysostomos [Thu, 6 Oct 2011 05:07:07 +0000 (22:07 -0700)]
globvar.t: Skip PL_warn_uninit_sv

Until someone can explain to me why these sorts of things are exported,
I’m skipping the test.  Nothing is failing for me (yet), and it is
not clear that we want to support this name for ever.

10 years agoFix diag.t failure with diag_listed_as comment
Father Chrysostomos [Thu, 6 Oct 2011 04:45:05 +0000 (21:45 -0700)]
Fix diag.t failure with diag_listed_as comment

10 years agoSeveral TODO tests that now pass.
Brian Fraser [Thu, 6 Oct 2011 03:45:21 +0000 (20:45 -0700)]
Several TODO tests that now pass.

10 years agoutil.c UTF8 cleanup
Brian Fraser [Thu, 7 Jul 2011 09:09:12 +0000 (06:09 -0300)]
util.c UTF8 cleanup

10 years agoMore warnings tests.
Brian Fraser [Thu, 7 Jul 2011 09:01:33 +0000 (06:01 -0300)]
More warnings tests.

10 years agouniversal.c: VERSION UTF8 cleanup
Brian Fraser [Thu, 7 Jul 2011 08:57:57 +0000 (05:57 -0300)]
universal.c: VERSION UTF8 cleanup

10 years agouniversal.c: Make croak_xs_usage account for UTF8
Brian Fraser [Thu, 7 Jul 2011 08:43:43 +0000 (05:43 -0300)]
universal.c: Make croak_xs_usage account for UTF8

10 years ago"Use of uninitialized value..." UTF8 cleanup
Brian Fraser [Thu, 7 Jul 2011 08:36:34 +0000 (05:36 -0300)]
"Use of uninitialized value..." UTF8 cleanup

10 years agogv.c: Make more warnings utf8-clean
Brian Fraser [Thu, 6 Oct 2011 03:42:53 +0000 (20:42 -0700)]
gv.c: Make more warnings utf8-clean

10 years agomro.(c|xs): Make warnings utf8-clean
Brian Fraser [Thu, 7 Jul 2011 07:35:35 +0000 (04:35 -0300)]
mro.(c|xs): Make warnings utf8-clean

10 years agot/uni/gv.t, stringify is clean, remove the TODO
Brian Fraser [Fri, 22 Jul 2011 13:05:11 +0000 (10:05 -0300)]
t/uni/gv.t, stringify is clean, remove the TODO

10 years agoTests for DATA handle in UTF8 packages
Brian Fraser [Fri, 30 Sep 2011 01:27:10 +0000 (18:27 -0700)]
Tests for DATA handle in UTF8 packages

10 years agotoke.c: Take utf8 into account when creating DATA handle
Father Chrysostomos [Fri, 30 Sep 2011 01:23:27 +0000 (18:23 -0700)]
toke.c: Take utf8 into account when creating DATA handle

This is based on work from Brian Fraser, but differs from his original
in that it does not require an intermediate SV.

10 years agoTests for UTF-8 stashes.
Brian Fraser [Fri, 22 Jul 2011 13:10:48 +0000 (10:10 -0300)]
Tests for UTF-8 stashes.

10 years agoTests for package; declarations in UTF-8
Brian Fraser [Fri, 22 Jul 2011 13:10:34 +0000 (10:10 -0300)]
Tests for package; declarations in UTF-8

10 years agoMore tests for t/uni/method.t
Brian Fraser [Fri, 22 Jul 2011 13:10:57 +0000 (10:10 -0300)]
More tests for t/uni/method.t

10 years agosv.c: Make most warnings utf8-clean
Brian Fraser [Thu, 6 Oct 2011 00:57:20 +0000 (17:57 -0700)]
sv.c: Make most warnings utf8-clean

10 years agosv.c: Make cloning account for UTF8 stash names
Brian Fraser [Thu, 29 Sep 2011 21:46:35 +0000 (14:46 -0700)]
sv.c: Make cloning account for UTF8 stash names

10 years agoMake sv.c:sv_clear account for UTF8 keys in PL_stashcache
Brian Fraser [Thu, 29 Sep 2011 21:44:55 +0000 (14:44 -0700)]
Make sv.c:sv_clear account for UTF8 keys in PL_stashcache

10 years agosv.c: Pass in UNI_DISPLAY_ISPRINT in S_not_a_number
Brian Fraser [Wed, 6 Jul 2011 17:44:11 +0000 (14:44 -0300)]
sv.c: Pass in UNI_DISPLAY_ISPRINT in S_not_a_number

10 years agopp_sys.c: Make warnings utf8-clean
Brian Fraser [Thu, 29 Sep 2011 21:39:35 +0000 (14:39 -0700)]
pp_sys.c: Make warnings utf8-clean

10 years agopp_hot.c: Make warnings utf8-clean
Brian Fraser [Wed, 6 Jul 2011 16:45:07 +0000 (13:45 -0300)]
pp_hot.c: Make warnings utf8-clean

10 years agoTeach porting/diag.t about SVf32 and SVf256
Father Chrysostomos [Wed, 5 Oct 2011 20:33:36 +0000 (13:33 -0700)]
Teach porting/diag.t about SVf32 and SVf256

10 years agopp.c: Make warnings utf8-clean
Brian Fraser [Wed, 6 Jul 2011 16:08:37 +0000 (13:08 -0300)]
pp.c: Make warnings utf8-clean

10 years agoMake op.c warnings UTF8-clean
Brian Fraser [Wed, 6 Jul 2011 15:50:59 +0000 (12:50 -0300)]
Make op.c warnings UTF8-clean

10 years agoMake gv.c and pp_ctl.c warnings utf8-clean
Brian Fraser [Wed, 5 Oct 2011 19:48:07 +0000 (12:48 -0700)]
Make gv.c and pp_ctl.c warnings utf8-clean

10 years agodoio.c: Make warnings UTF8- and nul-clean
Brian Fraser [Wed, 28 Sep 2011 03:33:02 +0000 (20:33 -0700)]
doio.c: Make warnings UTF8- and nul-clean

10 years agoutil.c for threads: stashpv_hvname_match UTF8 cleanup.
Brian Fraser [Sat, 23 Jul 2011 21:48:51 +0000 (18:48 -0300)]
util.c for threads: stashpv_hvname_match UTF8 cleanup.

10 years agoTests for DOES/isa/can with UTF8 and embedded nuls
Brian Fraser [Tue, 4 Oct 2011 21:53:12 +0000 (14:53 -0700)]
Tests for DOES/isa/can with UTF8 and embedded nuls

10 years agoDocument sv_does_pvn
Father Chrysostomos [Thu, 6 Oct 2011 07:02:36 +0000 (00:02 -0700)]
Document sv_does_pvn

10 years agoCorrect name of sv_does_sv apidoc entry
Father Chrysostomos [Fri, 30 Sep 2011 20:44:22 +0000 (13:44 -0700)]
Correct name of sv_does_sv apidoc entry

plus other tweaks

10 years agouniversal.c: sv_does() UTF8 cleanup.
Brian Fraser [Fri, 30 Sep 2011 20:42:31 +0000 (13:42 -0700)]
universal.c: sv_does() UTF8 cleanup.

This adds _sv, _pv, and _pvn forms to sv_does, and changes it to use
sv_ref() instead of sv_reftype().

10 years agomro.c: Correct utf8 and bytes concatenation
Father Chrysostomos [Thu, 29 Sep 2011 15:48:38 +0000 (08:48 -0700)]
mro.c: Correct utf8 and bytes concatenation

The previous commit introduced some code that concatenates a pv on to
an sv and then does SvUTF8_on on the sv if the pv was utf8.

That can’t work if the sv was in Latin-1 (or single-byte) encoding
and contained extra-ASCII characters.  Nor can it work if bytes are
appended to a utf8 sv.  Both produce mangled utf8.

There is apparently no function apart from sv_catsv that handles
this.  So I’ve modified sv_catpvn_flags to handle this if passed the
SV_CATUTF8 (concatenating a utf8 pv) or SV_CATBYTES (concatenating a
byte pv) flag.

This avoids the overhead of creating a new sv (in fact, sv_catsv
even copies its rhs in some cases, so that would mean creating two
new svs).  It might even be worthwhile to redefine sv_catsv in terms
of this....
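
The mangling described above, and the way to avoid it, can be sketched
in plain C.  This is an illustration under stated assumptions, not the
code in sv.c: struct strbuf is a hypothetical stand-in for an SV, and
latin1_to_utf8 and strbuf_cat are made-up names.  The point is that
whichever side is still in single-byte encoding is re-encoded as UTF-8
before the bytes are joined, instead of just setting the flag
afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SV: a fixed buffer plus a UTF-8 flag. */
struct strbuf {
    unsigned char data[256];
    size_t len;
    int is_utf8;
};

/* Re-encode Latin-1 bytes as UTF-8.  `out` must have room for
 * 2 * len bytes.  Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}

/* Append src to dst.  If the encodings differ, the side that is
 * still in bytes is upgraded first, so the result is valid UTF-8. */
static void strbuf_cat(struct strbuf *dst, const unsigned char *src,
                       size_t len, int src_is_utf8)
{
    unsigned char tmp[256];

    if (src_is_utf8 && !dst->is_utf8) {
        /* utf8 appended to bytes: upgrade the existing content */
        assert(dst->len * 2 <= sizeof tmp);
        dst->len = latin1_to_utf8(dst->data, dst->len, tmp);
        memcpy(dst->data, tmp, dst->len);
        dst->is_utf8 = 1;
    } else if (!src_is_utf8 && dst->is_utf8) {
        /* bytes appended to utf8: upgrade the incoming bytes */
        assert(len * 2 <= sizeof tmp);
        len = latin1_to_utf8(src, len, tmp);
        src = tmp;
    }
    assert(dst->len + len <= sizeof dst->data);
    memcpy(dst->data + dst->len, src, len);
    dst->len += len;
}
```

The two upgrade branches correspond loosely to the SV_CATUTF8 and
SV_CATBYTES cases; no temporary string object is allocated on the
heap.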

10 years agomro UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 13:41:10 +0000 (10:41 -0300)]
mro UTF8 cleanup.

This patch also duplicates existing mro tests with copies that use
Unicode in identifiers, to test the mro code.

Since those tests trigger it, it also fixes a bug in the parsing
of *{...}: If the first character inside the braces is a non-ASCII
Unicode identifier character, the inside is now implicitly quoted
if it is just an identifier (just as it is with ASCII identifiers),
instead of being parsed as a bareword that would violate strict subs.

10 years agouniversal.c: ->can UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 11:54:11 +0000 (08:54 -0300)]
universal.c: ->can UTF8 cleanup.

10 years agouniversal.c: ->isa, sv_derived_from UTF8 cleanup.
Brian Fraser [Tue, 27 Sep 2011 00:35:50 +0000 (17:35 -0700)]
universal.c: ->isa, sv_derived_from UTF8 cleanup.

This makes them both nul-and-UTF8 clean, although the latter
is somewhat superficial, as mro isn't clean yet.

(Tests coming once ->can and ->DOES are clean)

10 years agopp_sys.c: pp_tie and untie UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 10:57:20 +0000 (07:57 -0300)]
pp_sys.c: pp_tie and untie UTF8 cleanup.

10 years agopp.c: pp_substr for UTF-8 globs.
Brian Fraser [Tue, 27 Sep 2011 00:24:44 +0000 (17:24 -0700)]
pp.c: pp_substr for UTF-8 globs.

Since typeglobs may have the UTF8 flag set now, we need to avoid
testing SvCUR on a potential glob, as that would trip an assertion.

10 years agopp_ctl.c: pp_caller UTF8 cleanup.
Brian Fraser [Mon, 26 Sep 2011 22:32:45 +0000 (15:32 -0700)]
pp_ctl.c: pp_caller UTF8 cleanup.

10 years agosv.c: S_anonymise_cv_maybe UTF8 cleanup.
Brian Fraser [Mon, 26 Sep 2011 20:48:52 +0000 (13:48 -0700)]
sv.c: S_anonymise_cv_maybe UTF8 cleanup.

10 years agopp.c & sv.c: pp_ref UTF8 and null cleanup.
Brian Fraser [Mon, 26 Sep 2011 19:56:47 +0000 (12:56 -0700)]
pp.c & sv.c: pp_ref UTF8 and null cleanup.

This adds a new function to sv.c, sv_ref, which is a nul-and-UTF8
clean version of sv_reftype. pp_ref now uses that.

sv_ref() not only returns the SV, but also takes in an SV
to modify, so we can say both sv_ref(TARG, obj, TRUE); and
sv = sv_ref(NULL, obj, TRUE);
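
The calling convention described here (fill in a destination if one is
given, otherwise hand back a new value) can be sketched in plain C.
struct demo_sv and demo_ref are hypothetical names for illustration;
this is not the sv_ref implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string holder standing in for an SV. */
struct demo_sv {
    char buf[64];
    size_t len;
};

/* Store `name` (len bytes, which may include nuls) in dst and return
 * dst.  If dst is NULL, a new holder is allocated and returned; the
 * caller is then responsible for freeing it.  Returns NULL only if
 * that allocation fails. */
static struct demo_sv *demo_ref(struct demo_sv *dst,
                                const char *name, size_t len)
{
    assert(name != NULL && len <= sizeof dst->buf);
    if (dst == NULL) {
        dst = malloc(sizeof *dst);
        if (dst == NULL)
            return NULL;
    }
    memcpy(dst->buf, name, len);
    dst->len = len;
    return dst;
}
```

Both call shapes then work: demo_ref(&targ, name, len) reuses an
existing holder, and demo_ref(NULL, name, len) creates one.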

10 years agoAdd a sv_sethek() function to sv.c
Brian Fraser [Thu, 6 Oct 2011 06:56:03 +0000 (23:56 -0700)]
Add a sv_sethek() function to sv.c

This is exported so that attributes.xs can use it.

10 years agopp.c: pp_bless UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 09:16:30 +0000 (06:16 -0300)]
pp.c: pp_bless UTF8 cleanup.

Some tests in t/uni/bless.t are TODO, as ref() isn't
clean yet.

10 years agoop.c: Flag named methods if they are in UTF-8.
Brian Fraser [Mon, 26 Sep 2011 16:21:23 +0000 (09:21 -0700)]
op.c: Flag named methods if they are in UTF-8.

10 years agopp_hot.c: method_common is UTF-8 aware.
Brian Fraser [Mon, 26 Sep 2011 15:27:59 +0000 (08:27 -0700)]
pp_hot.c: method_common is UTF-8 aware.

Not really useful yet, since named methods aren't correctly
flagged; that is, to access a \x{30cb} method, you'd need
to do something like Obj->${\"\x{30cb}"}.

Committer’s note: I’m also including one piece of the ‘gv.c and
pp_ctl.c warnings’ patch so that the newly-added tests in this
commit pass.

10 years agogv.c: gv_fetchmethod_(flags|autoload) UTF8 cleanup.
Brian Fraser [Tue, 4 Oct 2011 01:16:03 +0000 (18:16 -0700)]
gv.c: gv_fetchmethod_(flags|autoload) UTF8 cleanup.

10 years agogv.c: S_gv_get_super_pkg UTF8 cleanup.
Brian Fraser [Mon, 26 Sep 2011 05:32:52 +0000 (22:32 -0700)]
gv.c: S_gv_get_super_pkg UTF8 cleanup.

10 years agogv.c: gv_fetchmeth_pvn_autoload UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 07:31:08 +0000 (04:31 -0300)]
gv.c: gv_fetchmeth_pvn_autoload UTF8 cleanup.

As with the previous commit, no Perl-level visible changes.

10 years agogv.c: gv_fetchmeth_pvn UTF8 cleanup.
Brian Fraser [Mon, 26 Sep 2011 05:15:55 +0000 (22:15 -0700)]
gv.c: gv_fetchmeth_pvn UTF8 cleanup.

Since gv_fetchmeth_pvn is primarily used from within gv.c,
and not much of anything is passing in the flag yet, this has
no visible changes on the Perl level, so tests remain
entirely in XS::APItest for the time being.

10 years agogv.c: gv_init_pvn now uses newCONSTSUB_flags.
Brian Fraser [Wed, 6 Jul 2011 06:03:15 +0000 (03:03 -0300)]
gv.c: gv_init_pvn now uses newCONSTSUB_flags.

10 years agopp.c: Make pp_rv2cv use gv_autoload_pvn()
Brian Fraser [Fri, 22 Jul 2011 12:52:28 +0000 (09:52 -0300)]
pp.c: Make pp_rv2cv use gv_autoload_pvn()

10 years agopp_hot.c: pp_entersub UTF8 cleanup.
Brian Fraser [Fri, 22 Jul 2011 12:51:52 +0000 (09:51 -0300)]
pp_hot.c: pp_entersub UTF8 cleanup.

10 years agopp_ctl.c: pp_goto UTF8 cleanup.
Brian Fraser [Fri, 22 Jul 2011 12:51:03 +0000 (09:51 -0300)]
pp_ctl.c: pp_goto UTF8 cleanup.

10 years agogv.c: gv_autoload4 is now UTF-8 clean.
Brian Fraser [Fri, 22 Jul 2011 12:49:51 +0000 (09:49 -0300)]
gv.c: gv_autoload4 is now UTF-8 clean.

This also uncomments the UTF-8 tests in XS::APItest.

10 years agogv.c: gp_free UTF8 cleanup
Brian Fraser [Wed, 6 Jul 2011 05:36:37 +0000 (02:36 -0300)]
gv.c: gp_free UTF8 cleanup

10 years agoTests for UTF-8 GVs.
Brian Fraser [Wed, 6 Jul 2011 05:20:04 +0000 (02:20 -0300)]
Tests for UTF-8 GVs.

Basically t/op/gv.t with UTF-8 names. A vast majority of
the tests currently fail and are marked as TODO.  Apart from
failures related to prototypes, these will start working
in the following commits.

10 years agoop.c: newCONSTSUB and newXS UTF8 cleanup.
Brian Fraser [Wed, 6 Jul 2011 04:50:31 +0000 (01:50 -0300)]
op.c: newCONSTSUB and newXS UTF8 cleanup.

newXS was merged into newXS_flags; added a line in the docs
recommending using that instead.

newCONSTSUB got a _flags version, which generates the CV in
the right glob if passed the UTF-8 flag.

10 years agosv.c: glob_assign_glob is now UTF-8 aware.
Brian Fraser [Wed, 6 Jul 2011 04:43:51 +0000 (01:43 -0300)]
sv.c: glob_assign_glob is now UTF-8 aware.

This means that
is($t = sub { *\x{30cb} }->(), "*main::\x{30cb}");
won't fail, as $t will get the right glob.
(Though possibly not the right stash, if that also has
UTF-8 in it. That will be done later.)

10 years agoBasic tests for UTF-8 vars.
Brian Fraser [Tue, 5 Jul 2011 22:24:41 +0000 (19:24 -0300)]
Basic tests for UTF-8 vars.

10 years agotoke.c: S_scan_inputsymbol, initial GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 21:34:05 +0000 (18:34 -0300)]
toke.c: S_scan_inputsymbol, initial GV-related UTF8 cleanup

10 years agotoke.c: S_checkcomma, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 21:32:19 +0000 (18:32 -0300)]
toke.c: S_checkcomma, GV-related UTF8 cleanup

10 years agotoke.c: yylex, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 21:26:51 +0000 (18:26 -0300)]
toke.c: yylex, GV-related UTF8 cleanup

10 years agotoke.c: S_find_in_my_stash, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 21:09:03 +0000 (18:09 -0300)]
toke.c: S_find_in_my_stash, GV-related UTF8 cleanup

10 years agotoke.c: S_intuit_method, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 20:29:44 +0000 (17:29 -0300)]
toke.c: S_intuit_method, GV-related UTF8 cleanup

10 years agotoke.c: S_intuit_more, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 20:16:15 +0000 (17:16 -0300)]
toke.c: S_intuit_more, GV-related UTF8 cleanup

10 years agotoke.c: S_force_ident, GV-related UTF8 cleanup
Brian Fraser [Sat, 23 Jul 2011 20:03:18 +0000 (17:03 -0300)]
toke.c: S_force_ident, GV-related UTF8 cleanup

10 years agopp.c: pp_rv2gv UTF8 cleanup.
Brian Fraser [Sat, 23 Jul 2011 19:54:00 +0000 (16:54 -0300)]
pp.c: pp_rv2gv UTF8 cleanup.

10 years agoMerge multi and flags params to gv_init_*
Father Chrysostomos [Sun, 2 Oct 2011 20:57:19 +0000 (13:57 -0700)]
Merge multi and flags params to gv_init_*

Since multi is a boolean (even though it’s typed as an int), there is
no need to have a separate parameter.  We can just use a flag bit.
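
The shape of this change can be sketched in plain C.  The flag values,
struct and function names below are hypothetical and chosen for
illustration; they are not the constants or signatures used in gv.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_ADDMULTI 0x01u  /* takes over the job of the old boolean */
#define DEMO_UTF8     0x02u  /* the name is UTF-8 encoded */

struct demo_gv {
    int multi;
    int utf8;
};

/* Old shape: a boolean typed as int, next to a flags word. */
static void demo_init_old(struct demo_gv *gv, int multi, unsigned flags)
{
    assert(gv != NULL);
    gv->multi = multi != 0;
    gv->utf8  = (flags & DEMO_UTF8) != 0;
}

/* New shape: the boolean is just one more bit in flags. */
static void demo_init(struct demo_gv *gv, unsigned flags)
{
    assert(gv != NULL);
    gv->multi = (flags & DEMO_ADDMULTI) != 0;
    gv->utf8  = (flags & DEMO_UTF8) != 0;
}
```

A call such as demo_init_old(&gv, 1, DEMO_UTF8) becomes
demo_init(&gv, DEMO_ADDMULTI | DEMO_UTF8), with one parameter fewer.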

10 years agogv.c: Initial gv_fetchpvn_flags and gv_stashpvn UTF8 cleanup
Brian Fraser [Sat, 24 Sep 2011 18:57:27 +0000 (11:57 -0700)]
gv.c: Initial gv_fetchpvn_flags and gv_stashpvn UTF8 cleanup

Now that a glob can be initialized and fetched in UTF-8,
the next commit will introduce some changes in toke.c to
actually test this.

Committer’s note: To keep tests passing I had to incorporate
the toke.c:S_pending_ident changes in the same patch.

10 years agoDisable the UTF8 downgrade when unnecessary
Father Chrysostomos [Sun, 25 Sep 2011 00:58:16 +0000 (17:58 -0700)]
Disable the UTF8 downgrade when unnecessary

The downgrade bug that this code has to imitate is about to be fixed
in the next commit.  The bug workaround is itself a bug if the bug it
is trying to work around is not present.

10 years agoFix thinko in hek_eq_pvn_flags
Father Chrysostomos [Sat, 24 Sep 2011 13:29:10 +0000 (06:29 -0700)]
Fix thinko in hek_eq_pvn_flags

Doing memEQ(str1, str2, len2) without checking the length
first will cause memEQ("forth","fort"...) to compare equal and
memEQ("fort","forth"...) to read unallocated memory.

This was only a potential future problem, as none of the callers reach
this branch.
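
The thinko can be reproduced in plain C.  A minimal sketch, using
memcmp in place of the core’s memEQ macro; bytes_eq_unchecked and
bytes_eq are hypothetical names, not the functions in hv.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy shape: compares len2 bytes without looking at len1.
 * If len1 > len2, a prefix match is reported as equality.
 * If len1 < len2, memcmp may read past the end of s1. */
static int bytes_eq_unchecked(const char *s1, size_t len1,
                              const char *s2, size_t len2)
{
    (void)len1;
    return memcmp(s1, s2, len2) == 0;
}

/* Fixed shape: compare the lengths first, then the bytes. */
static int bytes_eq(const char *s1, size_t len1,
                    const char *s2, size_t len2)
{
    assert(s1 != NULL && s2 != NULL);
    return len1 == len2 && memcmp(s1, s2, len1) == 0;
}
```

bytes_eq_unchecked("forth", 5, "fort", 4) reports the two strings as
equal, whereas bytes_eq returns false for that pair in either order
and never reads beyond the shorter string.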

10 years agohv.c: Stash-related UTF-8 cleanup.
Brian Fraser [Sat, 23 Jul 2011 19:51:54 +0000 (16:51 -0300)]
hv.c: Stash-related UTF-8 cleanup.

This adds a new static function to hv.c, hek_eq_pvn_flags,
which replaces several memEQs.

It also cleans up hv_name_set and has the relevant calls
to hv_common and friends made UTF-8 aware.

Finally, it changes share_hek() to modify the hash passed
in if the pv was modified when downgrading.

10 years agogv.c: gv_name_set and gv_init_(etc) now initialize the GV's name as UTF-8 if passed...
Brian Fraser [Tue, 5 Jul 2011 10:00:02 +0000 (07:00 -0300)]
gv.c: gv_name_set and gv_init_(etc) now initialize the GV's name as UTF-8 if passed the UTF8 flag.

newCONSTSUB is still unclean however, so constant subs are
still generated under a wrong name.

gv_fullname4 is also UTF-8 aware now; while that should've gotten
its own commit and tests, it's not possible to test the
UTF-8 part without the gv_init changes, and it's not possible
to test the gv_init changes without gv_fullname4.
Chicken and egg, as it were. So let's compromise and
wait for the relevant tests once globs can be initialized as
UTF-8 from the Perl level without XS magic.

10 years agoSvUTF8() for globs.
Brian Fraser [Mon, 18 Jul 2011 16:36:09 +0000 (17:36 +0100)]
SvUTF8() for globs.

This turns on the GV's UTF8 flag in sv.c when the GV is stringified.
This works the same way overloading and references work, in that the
SvUTF8 flag is only valid immediately after SvPV.

For Nick's much more detailed explanation, see

10 years agoRestore newGVgen to perlapi.pod
Father Chrysostomos [Sat, 24 Sep 2011 12:40:41 +0000 (05:40 -0700)]
Restore newGVgen to perlapi.pod

10 years agogv.c: newGVgen_flags and a flags parameter for gv_get_super_pkg.
Brian Fraser [Sun, 2 Oct 2011 05:14:50 +0000 (22:14 -0700)]
gv.c: newGVgen_flags and a flags parameter for gv_get_super_pkg.

10 years agoRemove method param from gv_autoload_*
Father Chrysostomos [Sun, 2 Oct 2011 05:14:19 +0000 (22:14 -0700)]
Remove method param from gv_autoload_*

method is a boolean flag (typed I32, but used as a boolean) added by
commit 54310121b442.

These new gv_autoload_* functions have a flags parameter, so there’s
no reason for this extra effective bool.  We can just use a flag bit.

10 years agoRemove 4 from new gv_autoload4_(sv|pvn?) functions
Father Chrysostomos [Sun, 2 Oct 2011 05:13:26 +0000 (22:13 -0700)]
Remove 4 from new gv_autoload4_(sv|pvn?) functions

The 4 was added in commit 54310121b442 (inseparable changes during
5.003/4 development), presumably the ‘Don't look up &AUTOLOAD in @ISA
when calling plain function’ part.

Before that, gv_autoload had three arguments, so the 4 indicated the
new version (with the method argument).

Since these new functions don’t all have four arguments, and since
they have a new naming convention, there is no reason for the 4.