X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/91acaa3b6f1aa6870ee6aeb2cc73546a9c2c36f9..b16ca463545a377fa963a7dd744ac3107716a142:/pod/perldelta.pod
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index f81a01b..c7a0436 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -2,4240 +2,495 @@
 =head1 NAME
-perldelta - what is new for perl v5.16.0
+[ this is a template for a new perldelta file. Any text flagged as
+XXX needs to be processed before release. ]
-=head1 DESCRIPTION
-
-This document describes differences between the 5.14.0 release and
-the 5.16.0 release.
-
-If you are upgrading from an earlier release such as 5.12.0, first read
-L, which describes differences between 5.12.0 and
-5.14.0.
-
-=head1 Notice
-
-With the release of Perl 5.16.0, the 5.12.x series of releases are now out of
-their support period. There may be future 5.12.x releases, but only in the
-event of a critical security issue. Users of Perl 5.12 or earlier should
-consider upgrading to a more recent release of Perl.
-
-This policy is described in greater detail in
-L.
-
-=head1 Core Enhancements
-
-=head2 C>
-
-As of this release, version declarations like C now disable
-all features before enabling the new feature bundle. This means that
-the following holds true:
-
-    use 5.016;
-    # only 5.16 features enabled here
-    use 5.014;
-    # only 5.14 features enabled here (not 5.16)
-
-C and higher continue to enable strict, but explicit C and C now override the version declaration, even
-when they come first:
-
-    no strict;
-    use 5.012;
-    # no strict here
-
-There is a new ":default" feature bundle that represents the set of
-features enabled before any version declaration or C has
-been seen. Version declarations below 5.10 now enable the ":default"
-feature set. This does not actually change the behaviour of C, because features added to the ":default" set are those that were
-traditionally enabled by default, before they could be turned off.
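The following minimal sketch shows what a single version declaration switches on. It assumes perl 5.16 or later. Only C<use 5.016> is used, with no C<use feature> line, and that alone makes C<say>, C<fc> and C<__SUB__> available. The C<$factorial> sub and the other variable names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use 5.016;    # loads the :5.16 feature bundle (and strict) for this scope

# Illustrative names only. __SUB__ (current_sub feature) refers to the sub
# currently running, so an anonymous sub can recurse without a named variable.
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};

# fc (fc feature) applies Unicode full case folding; because the bundle also
# enables unicode_strings, "\x{DF}" (LATIN SMALL LETTER SHARP S) folds to "ss".
my $caseless_equal = fc("\x{DF}") eq fc("SS");

say "5! = ", $factorial->(5);                                  # say feature
say "sharp s folds like SS: ", ($caseless_equal ? "yes" : "no");
```

The first line printed should be C<5! = 120>. Removing the C<use 5.016> line would make C<say>, C<fc> and C<__SUB__> fail to compile as keywords.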
-
-C<< no feature >> now resets to the default feature set. To disable all
-features (which is likely to be a pretty special-purpose request, since
-it presumably won't match any named set of semantics) you can now
-write C<< no feature ':all' >>.
-
-C<$[> is now disabled under C. It is part of the default
-feature set and can be turned on or off explicitly with C.
-
-=head2 C<__SUB__>
-
-The new C<__SUB__> token, available under the C feature
-(see L) or C, returns a reference to the current
-subroutine, making it easier to write recursive closures.
-
-=head2 New and Improved Built-ins
-
-=head3 More consistent C
-
-The C operator sometimes treats a string argument as a sequence of
-characters and sometimes as a sequence of bytes, depending on the
-internal encoding. The internal encoding is not supposed to make any
-difference, but there is code that relies on this inconsistency.
-
-The new C and C features (enabled under C) resolve this. The C feature causes C to treat the string always as Unicode. The C
-features provides a function, itself called C, which
-evaluates its argument always as a string of bytes.
-
-These features also fix oddities with source filters leaking to outer
-dynamic scopes.
-
-See L for more detail.
-
-=head3 C lvalue revamp
-
-=for comment Does this belong here, or under Incomptable Changes?
-
-When C is called in lvalue or potential lvalue context with two
-or three arguments, a special lvalue scalar is returned that modifies
-the original string (the first argument) when assigned to.
-
-Previously, the offsets (the second and third arguments) passed to
-C would be converted immediately to match the string, negative
-offsets being translated to positive and offsets beyond the end of the
-string being truncated.
-
-Now, the offsets are recorded without modification in the special
-lvalue scalar that is returned, and the original string is not even
-looked at by C itself, but only when the returned lvalue is
-read or modified.
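Here is a minimal sketch of the lvalue behaviour described above, using only core Perl. Each assignment goes through the lvalue scalar that C<substr> returns and ends up in the original string. The string contents and variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "Hello, world";    # illustrative value

# Assigning to substr() rewrites that slice of the original string.
substr($str, 0, 5) = "HELLO";       # $str is now "HELLO, world"

# Two-argument form with a negative offset: from 5 chars before the end.
substr($str, -5) = "there";         # $str is now "HELLO, there"

# foreach aliases $_ to the lvalue that substr() returns, so writing
# to $_ writes through to $str.
for (substr($str, 7, 5)) {
    $_ = uc $_;                     # $str is now "HELLO, THERE"
}

print "$str\n";
```

The line printed should be C<HELLO, THERE>.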
-
-These changes result in an incompatible change:
-
-If the original string changes length after the call to C but
-before assignment to its return value, negative offsets will remember
-their position from the end of the string, affecting code like this:
-
-    my $string = "string";
-    my $lvalue = \substr $string, -4, 2;
-    print $$lvalue, "\n"; # prints "ri"
-    $string = "bailing twine";
-    print $$lvalue, "\n"; # prints "wi"; used to print "il"
-
-The same thing happens with an omitted third argument. The returned
-lvalue will always extend to the end of the string, even if the string
-becomes longer.
-
-Since this change also allowed many bugs to be fixed (see
-L operator>), and since the behaviour
-of negative offsets has never been specified, the
-change was deemed acceptable.
-
-=head3 Return value of C
-
-The value returned by C on a tied variable is now the actual
-scalar that holds the object to which the variable is tied. This
-allows ties to be weakened with C.
-
-=head2 Unicode Support
-
-=head3 Supports (I) Unicode 6.1
-
-Besides the addition of whole new scripts, and new characters in
-existing scripts, this new version of Unicode, as always, makes some
-changes to existing characters. One change that may trip up some
-applications is that the General Category of two characters in the
-Latin-1 range, PILCROW SIGN and SECTION SIGN, has been changed from
-Other_Symbol to Other_Punctuation. The same change has been made for
-a character in each of Tibetan, Ethiopic, and Aegean.
-The code points U+3248..U+324F (CIRCLED NUMBER TEN ON BLACK SQUARE
-through CIRCLED NUMBER EIGHTY ON BLACK SQUARE) have had their General
-Category changed from Other_Symbol to Other_Numeric. The Line Break
-property has changes for Hebrew and Japanese; and as a consequence of
-other changes in 6.1, the Perl regular expression construct C<\X> now
-works differently for some characters in Thai and Lao.
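The General Category changes in the paragraph above can be checked with the core C<Unicode::UCD> module. This is a sketch that assumes a perl built with Unicode 6.1 or later (5.16 onward). The list of code points is taken from the paragraph above.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Code points named in the text: SECTION SIGN and PILCROW SIGN (moved to
# Other_Punctuation, "Po") and U+3248..U+324F, the circled numbers on
# black squares (moved to Other_Numeric, "No").
my @code_points = (0x00A7, 0x00B6, 0x3248 .. 0x324F);

for my $cp (@code_points) {
    # charinfo() returns a hash reference of UCD data for the code point.
    my $info = charinfo($cp);
    printf "U+%04X  %-2s  %s\n", $cp, $info->{category}, $info->{name};
}
```

On such a perl, the two Latin-1 signs report C<Po> and the eight circled numbers report C<No>.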
-
-New aliases (synonyms) have been defined for many property values;
-these, along with the previously existing ones, are all cross-indexed in
-L.
-
-The return value of C is affected by other
-changes:
-
-    Code point  Old Name                  New Name
-    U+000A      LINE FEED (LF)            LINE FEED
-    U+000C      FORM FEED (FF)            FORM FEED
-    U+000D      CARRIAGE RETURN (CR)      CARRIAGE RETURN
-    U+0085      NEXT LINE (NEL)           NEXT LINE
-    U+008E      SINGLE-SHIFT 2            SINGLE-SHIFT-2
-    U+008F      SINGLE-SHIFT 3            SINGLE-SHIFT-3
-    U+0091      PRIVATE USE 1             PRIVATE USE-1
-    U+0092      PRIVATE USE 2             PRIVATE USE-2
-    U+2118      SCRIPT CAPITAL P          WEIERSTRASS ELLIPTIC FUNCTION
-
-Perl will accept any of these names as input, but
-C now returns the new name of each pair. The
-change for U+2118 is considered by Unicode to be a correction, that is
-the original name was a mistake (but again, it will remain forever valid
-to use it to refer to U+2118). But most of these changes are the
-fallout of the mistake Unicode 6.0 made in naming a character used in
-Japanese cell phones to be "BELL", which conflicts with the longstanding
-industry use of (and Unicode's recommendation to use) that name
-to mean the ASCII control character at U+0007. As a result, that name
-has been deprecated in Perl since v5.14; and any use of it will raise a
-warning message (unless turned off). The name "ALERT" is now the
-preferred name for this code point, with "BEL" being an acceptable short
-form. The name for the new cell phone character, at code point U+1F514,
-remains undefined in this version of Perl (hence we don't quite
-implement all of Unicode 6.1), but starting in v5.18, BELL will mean
-this character, and not U+0007.
-
-Unicode has taken steps to make sure that this sort of mistake does not
-happen again. The Standard now includes all the generally accepted
-names and abbreviations for control characters, whereas previously it
-didn't (though there were recommended names for most of them, which Perl
-used). This means that most of those recommended names are now
-officially in the Standard. Unicode did not recommend names for the
-four code points listed above between U+008E and U+008F, and in
-standardizing them Unicode subtly changed the names that Perl had
-previously given them, by replacing the final blank in each name by a
-hyphen. Unicode also officially accepts names that Perl had deprecated,
-such as FILE SEPARATOR. Now the only deprecated name is BELL.
-Finally, Perl now uses the new official names instead of the old
-(now considered obsolete) names for the first four code points in the
-list above (the ones which have the parentheses in them).
-
-Now that the names have been placed in the Unicode standard, these kinds
-of changes should not happen again, though corrections, such as to
-U+2118, are still possible.
-
-Unicode also added some name abbreviations, which Perl now accepts:
-SP for SPACE;
-TAB for CHARACTER TABULATION;
-NEW LINE, END OF LINE, NL, and EOL for LINE FEED;
-LOCKING-SHIFT ONE for SHIFT OUT;
-LOCKING-SHIFT ZERO for SHIFT IN;
-and ZWNBSP for ZERO WIDTH NO-BREAK SPACE.
-
-More details on this version of Unicode are provided in
-L.
-
-=head3 C is no longer needed for C<\N{I}>
-
-When C<\N{I}> is encountered, the C module is now
-automatically loaded when needed as if the C<:full> and C<:short>
-options had been specified. See L for more information.
-
-=head3 C<\N{...}> can now have Unicode loose name matching
-
-This is described in the C item in
-L below.
-
-=head3 Unicode Symbol Names
-
-Perl now has proper support for Unicode in symbol names. It used to be
-that C<*{$foo}> would ignore the internal UTF8 flag and use the bytes of
-the underlying representation to look up the symbol. That meant that
-C<*{"\x{100}"}> and C<*{"\xc4\x80"}> would return the same thing. All
-these parts of Perl have been fixed to account for Unicode:
-
-=over
-
-=item *
-
-Method names (including those passed to C)
-
-=item *
-
-Typeglob names (including names of variables, subroutines and filehandles)
-
-=item *
-
-Package names
-
-=item *
-
-C
-
-=item *
-
-Symbolic dereferencing
-
-=item *
-
-Second argument to C and C
-
-=item *
-
-Return value of C
-
-=item *
-
-Subroutine prototypes
-
-=item *
-
-Attributes
-
-=item *
-
-Various warnings and error messages that mention variable names or values,
-methods, etc.
-
-=back
-
-In addition, a parsing bug has been fixed that prevented C<*{é}> from
-implicitly quoting the name, but instead interpreted it as C<*{+é}>, which
-would cause a strict violation.
-
-C<*{"*a::b"}> automatically strips off the * if it is followed by an ASCII
-letter. That has been extended to all Unicode identifier characters.
-
-One-character non-ASCII non-punctuation variables (like C<$é>) are now
-subject to "Used only once" warnings. They used to be exempt, as they
-were treated as punctuation variables.
-
-Also, single-character Unicode punctuation variables (like C<$‰>) are now
-supported [perl #69032].
-
-=head3 Improved ability to mix locales and Unicode, including UTF-8 locales
-
-An optional parameter has been added to C
-
-    use locale ':not_characters';
-
-which tells Perl to use all but the C and C
-portions of the current locale. Instead, the character set is assumed
-to be Unicode. This allows locales and Unicode to be seamlessly mixed,
-including the increasingly frequent UTF-8 locales. When using this
-hybrid form of locales, the C<:locale> layer to the L pragma can
-be used to interface with the file system, and there are CPAN modules
-available for ARGV and environment variable conversions.
-
-Full details are in L.
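This sketch returns to the renamed code points in the table under the Unicode 6.1 heading and checks a few of them with the core C<charnames> module. It assumes perl 5.16 or later and uses only C<charnames::viacode> and C<\N{...}> escapes. The variable names are hypothetical.

```perl
use strict;
use warnings;
use charnames qw(:full :short);    # same options \N{...} is auto-loaded with

# Illustrative variable names. Names reported by viacode() for two code
# points from the table.
my $lf_name = charnames::viacode(0x000A);
my $p_name  = charnames::viacode(0x2118);

# The original name of U+2118 and its correction both remain valid input.
my $both_names_work =
    "\N{SCRIPT CAPITAL P}" eq "\N{WEIERSTRASS ELLIPTIC FUNCTION}";

# EOL is one of the abbreviations for LINE FEED that Perl now accepts.
my $eol_is_lf = "\N{EOL}" eq "\n";

print "U+000A is named: $lf_name\n";
print "U+2118 is named: $p_name\n";
print "old and corrected names agree: ", ($both_names_work ? "yes" : "no"), "\n";
print "\\N{EOL} is a line feed: ", ($eol_is_lf ? "yes" : "no"), "\n";
```

The table lists the names that C<viacode> returned in 5.16. Later perls ship newer Unicode data, so re-checking the reported names on the perl in use is prudent.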
-
-=head3 New function C and corresponding escape sequence C<\F> for Unicode foldcase
-
-Unicode foldcase is an extension to lowercase that gives better results
-when comparing two strings case-insensitively. It has long been used
-internally in regular expression C matching. Now it is available
-explicitly through the new C function call (enabled by
-S>, or C, or explicitly callable via
-C) or through the new C<\F> sequence in double-quotish
-strings.
-
-Full details are in L.
-
-=head3 The Unicode C property is now supported.
-
-New in Unicode 6.0, this is an improved C