This branch makes symbol names support UTF-8 internally, which means that
Unicode is handled properly at the Perl level: ${"\xff"} gives you the
same scalar regardless of the internal encoding of the string. As a
result of the UTF-8 changes, many parts of the core are now nul-clean as
well, so ‘$m = "a\0b"; foo->$m’ will try to call the method named
"a\0b", instead of just "a".
Details follow.
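Here is a minimal sketch of both claims. It assumes a perl that includes this branch, which shipped in perl 5.16. First, a symbolic reference finds the same scalar whether the name "\xff" is stored as a Latin-1 byte or upgraded to UTF-8. Second, a method name with an embedded nul is looked up in full. The class Foo and its two methods are hypothetical, made up for the example.

```perl
use strict;
use warnings;

# Two copies of the name "\xff": one stored as a single Latin-1 byte,
# one upgraded so the same character is stored internally as UTF-8.
my $latin1 = "\xff";
my $utf8   = "\xff";
utf8::upgrade($utf8);

# Symbolic dereferencing should find the same package variable either way,
# so the two references point at the same scalar.
my $same_scalar = do {
    no strict 'refs';
    \${$latin1} == \${$utf8};
};

# Hypothetical class with a method "a" and a method whose name contains a nul.
package Foo;
sub a { return "a" }
{
    no strict 'refs';
    *{"Foo::a\0b"} = sub { return "a\\0b" };
}

package main;

# A nul-clean lookup calls "a\0b" rather than stopping at the nul and calling "a".
my $m      = "a\0b";
my $called = Foo->$m;

print "same scalar: ", ($same_scalar ? "yes" : "no"), "\n";
print "called: $called\n";
```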
• New API functions:
  Many of these take a flags parameter, which accepts the SVf_UTF8
  flag. An entry such as gv_init_pv(n)/sv stands for three variants:
  gv_init_pv, gv_init_pvn and gv_init_sv.
  • HvNAMELEN
  • HvNAMEUTF8
  • HvENAMELEN
  • HvENAMEUTF8
  • gv_init_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv_autoload
  • gv_fetchmethod_pv(n)/sv_flags (may change)
  • gv_autoload_pv(n)/sv
  • newGVgen_flags
  • sv_derived_from_pv(n)/sv
  • sv_does_pv(n)/sv
  • whichsig_pv(n)/sv
• New internal functions:
  • GvNAMEUTF8
  • GvENAMELEN
  • GvENAME_HEK
  • CopSTASH_flags
  • CopSTASH_flags_set
  • PmopSTASH_flags
  • PmopSTASH_flags_set
  • sv_sethek
• Parts of Perl that handle Unicode symbol names correctly:
  • Method names (including those passed to ‘use overload’)
  • Typeglob names (including variable and filehandle names)
  • Package names
  • Constant subroutine names (not nul-clean yet)
  • goto
  • Symbolic dereferencing
  • Second argument to bless() and tie()
  • Return value of ref()
  • Package names returned by caller()
  • Subroutine prototypes
  • Attributes
  • Warnings and error messages that mention filehandles, packages,
    methods, variables, constant values, subroutines, symbolic
    references, format names and subroutine prototypes
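The following sketch illustrates a few items from this list: package names, method names, the second argument to bless() and the return value of ref(). It assumes perl 5.16 or later and a source file saved as UTF-8. The package Café and the method salué are hypothetical names, chosen only because they contain a non-ASCII letter.

```perl
use strict;
use warnings;
use utf8;    # identifiers and string literals in this file are UTF-8

# Hypothetical class whose package and method names contain é.
package Café;
sub new   { my ($class) = @_; return bless {}, $class }
sub salué { return "salut" }

package main;

my $obj   = Café->new;         # Unicode package name
my $class = ref $obj;          # ref() returns the character string "Café"

my $method   = "salué";
my $greeting = $obj->$method;  # dynamic method name containing é

my $other    = bless [], "Café";   # Unicode second argument to bless()
my $same_pkg = ref($other) eq $class;

binmode STDOUT, ':encoding(UTF-8)';
print "$class says $greeting\n";
```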
• Parts of Perl that now handle embedded nuls correctly:
  • Method names
  • Typeglob names (including filehandle names)
  • Package names
  • Autoloading
  • Return value of ref()
  • Package names returned by caller()
  • Filehandle warnings
  • Typeglob elements (*foo{"THING\0stuff"})
  • Signal names
  • Warnings and error messages that mention (yes, it’s the same list
    as above) filehandles, packages, methods, variables, constant
    values, subroutines, symbolic references, format names and
    subroutine prototypes
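You can also check nul-clean typeglob and package names from Perl code. This sketch assumes perl 5.16 or later. The variable names x and "x\0y" and the package "Foo\0Bar" are hypothetical.

```perl
use strict;
use warnings;

my ($plain, $with_nul);
{
    no strict 'refs';

    # Two package variables whose names differ only after an embedded nul
    # (hypothetical names).
    ${"main::x"}    = "plain";
    ${"main::x\0y"} = "with nul";
    ($plain, $with_nul) = (${"main::x"}, ${"main::x\0y"});
}

# Bless into a hypothetical package whose name contains a nul;
# ref() should return the whole name, nul included.
my $class = ref(bless({}, "Foo\0Bar"));

printf "%s / %s / ref length %d\n", $plain, $with_nul, length $class;
```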
• Other bug fixes:
  • *{é} now treats é as the name of the glob (the usual implicit
    quoting), instead of treating it as a bareword (strict-unsafe) or
    a function call. In other words, *{é} used to be equivalent to
    *{+é}.
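Here is a sketch of the fixed parsing. It assumes perl 5.16 or later and a UTF-8 source file. With implicit quoting, *{é} behaves like *é, just as *{foo} behaves like *foo, so it names the glob *main::é under use strict. The identifier é is a hypothetical example.

```perl
use strict;
use warnings;
use utf8;    # é below is a UTF-8 identifier

# *{é} is implicitly quoted, so it refers to the glob *é
# rather than being parsed as a bareword or a function call.
my $glob_name = "" . *{é};        # stringified glob name
my $same_glob = \*{é} == \*é;     # both expressions yield the same glob

binmode STDOUT, ':encoding(UTF-8)';
print "$glob_name\n";
```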
• Modified modules:
  • constant has been modified to skip its workaround for the bug that
    this branch fixes when that workaround does not apply.
  • attributes has been modified as part of making Unicode attributes
    work.
  • XS::APItest
  • mro, as part of making method lookup account for Unicode.
• Side effects:
  • Blessing into "\0" no longer causes ref() to return false.
  • *{"*é::..."} is now equivalent to *{"é::..."}, just as
    *{"*e::..."} is equivalent to *{"e::..."}. Previously, the * was
    stripped only if it was followed by [A-Za-z].
  • $é is now subject to ‘Used only once’ warnings. It used to be
    exempt, because the code that checked the name considered it a
    punctuation variable.
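This sketch shows two of these side effects. It assumes perl 5.16 or later and a UTF-8 source file. The package name café and the glob name thing are hypothetical. When a symbolic glob name starts with * followed by a non-ASCII identifier character, the * is now stripped. A class name of "\0" is now a true value.

```perl
use strict;
use warnings;
use utf8;    # the literals below containing é are UTF-8 strings

my $star_stripped;
{
    no strict 'refs';

    # The leading * is dropped even though é is not in [A-Za-z],
    # so both names resolve to the same (hypothetical) glob.
    $star_stripped = \*{"*café::thing"} == \*{"café::thing"};
}

# ref() on an object blessed into "\0" returns "\0", which is true.
my $nul_class_true = ref(bless {}, "\0") ? 1 : 0;

print "star stripped: ", ($star_stripped ? "yes" : "no"),
      ", ref true: $nul_class_true\n";
```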