TODO: perl591delta and further

=head1 NAME

perldelta - what is new for perl 5.10.0

=head1 DESCRIPTION

This document describes the differences between the 5.8.8 release and
the 5.10.0 release.

Many of the bug fixes in 5.10.0 were already seen in the 5.8.X maintenance
releases; they are not duplicated here and are documented in the set of
man pages named perl58[1-8]?delta.

=head1 Incompatible Changes

=head2 Packing and UTF-8 strings

=for XXX update this

The semantics of pack() and unpack() regarding UTF-8-encoded data have been
changed. Processing is now by default character per character instead of
byte per byte on the underlying encoding. Notably, code that used things
like C<pack("a*", $string)> to see through the encoding of a string will now
simply get back the original $string. Packed strings can also get upgraded
during processing when you store upgraded characters. You can get the old
behaviour by using C<use bytes>.

To be consistent with pack(), the C<C0> in unpack() templates indicates
that the data is to be processed in character mode, i.e. character by
character; conversely, C<U0> in unpack() indicates UTF-8 mode, where
the packed string is processed in its UTF-8-encoded Unicode form on a byte
by byte basis. This is reversed with regard to perl 5.8.X.

Moreover, C<C0> and C<U0> can also be used in pack() templates to specify
respectively character and byte modes.

C<C0> and C<U0> in the middle of a pack or unpack format now switch to the
specified encoding mode, honoring parens grouping. Previously, parens were
ignored.

Also, there is a new pack() character format, C<W>, which is intended to
replace the old C<C>. C<C> is kept for unsigned chars coded as bytes in
the string's internal representation. C<W> represents unsigned (logical)
character values, which can be greater than 255. It is therefore more
robust when dealing with potentially UTF-8-encoded data (as C<C> will wrap
values outside the range 0..255, and not respect the string encoding).

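As a rough illustration of the difference between C<W> and C<C>, the
following sketch packs the same out-of-range value with both formats. The
value 300 is an arbitrary choice for this example.

```perl
use strict;
use warnings;

# "W" keeps the full (logical) character value, even above 255.
my $wide  = pack("W", 300);
my $ord_w = ord($wide);

# "C" is limited to a single byte, so 300 wraps around. Perl warns about
# the wrap, hence the locally disabled 'pack' warning category.
my $byte  = do { no warnings 'pack'; pack("C", 300) };
my $ord_c = ord($byte);

print "W: $ord_w, C: $ord_c\n";
```

This should print C<W: 300, C: 44>, since 300 modulo 256 is 44.
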
In practice, that means that pack formats are now encoding-neutral, except
C<C>.

For consistency, C<A> in unpack() format now trims all Unicode whitespace
from the end of the string. Before perl 5.9.2, it used to strip only the
classical ASCII space characters.

=head2 The C<$*> and C<$#> variables have been removed

C<$*>, which was deprecated in favor of the C</s> and C</m> regexp
modifiers, has been removed.

The deprecated C<$#> variable (output format for numbers) has been
removed.

Two new warnings, C<$#/$* is no longer supported>, have been added.

=head2 substr() lvalues are no longer fixed-length

The lvalues returned by the three argument form of substr() used to be a
"fixed length window" on the original string. In some cases this could
cause surprising action at a distance or other undefined behaviour. Now the
length of the window adjusts itself to the length of the string assigned to
it.

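A minimal sketch of the new behaviour, using C<foreach> aliasing to keep
hold of the lvalue returned by substr(). The variable names are chosen for
this example only.

```perl
use strict;
use warnings;

my $x = '1234';
my @seen;

for my $window (substr($x, 1, 2)) {   # $window aliases the "23" part of $x
    $window = 'a';                    # assign a shorter string
    push @seen, $x;
    $window = 'xyz';                  # the window now covers just the "a"
    push @seen, $x;
}

print "@seen\n";
```

With an adjusting window this prints C<1a4 1xyz4>: the second assignment
replaces only the C<a> stored by the first one, not a fixed two-character
span.
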
=head2 Parsing of C<-f _>

The identifier C<_> is now forced to be a bareword after a filetest
operator. This solves a number of misparsing issues when a global C<_>
subroutine is defined.

=head2 C<:unique>

The C<:unique> attribute has been made a no-op, since its current
implementation was fundamentally flawed and not threadsafe.

=head2 Scoping of the C<sort> pragma

The C<sort> pragma is now lexically scoped. Its effect used to be global.

=head2 Scoping of C<bignum>, C<bigint>, C<bigrat>

The three numeric pragmas C<bignum>, C<bigint> and C<bigrat> are now
lexically scoped. (Tels)

=head2 Effect of pragmas in eval

The compile-time value of the C<%^H> hint variable can now propagate into
eval("")uated code. This makes it more useful to implement lexical
pragmas.

As a side-effect of this, the overloaded-ness of constants now propagates
into eval("").

=head2 chdir FOO

A bareword argument to chdir() is now recognized as a file handle.
Earlier releases interpreted the bareword as a directory name.
(Gisle Aas)

=head2 Handling of .pmc files

An old feature of perl is that before C<require> or C<use> look for a
file with a F<.pm> extension, they first look for a similar filename
with a F<.pmc> extension. If this file is found, it is loaded in
place of any potentially existing file ending in a F<.pm> extension.

Previously, F<.pmc> files were loaded only if more recent than the
matching F<.pm> file. Starting with 5.9.4, they will always be loaded if
they exist.

=head2 @- and @+ in patterns

The special arrays C<@-> and C<@+> are no longer interpolated in regular
expressions. (Sadahiro Tomoyuki)

=head2 $AUTOLOAD can now be tainted

If you call a subroutine by a tainted name, and if it defers to an
AUTOLOAD function, then $AUTOLOAD will be (correctly) tainted.
(Rick Delaney)

=head2 Tainting and printf

When perl is run under taint mode, C<printf()> and C<sprintf()> will now
reject any tainted format argument. (Rafael Garcia-Suarez)

=head2 undef and signal handlers

Undefining or deleting a signal handler via C<undef $SIG{FOO}> is now
equivalent to setting it to C<'DEFAULT'>. (Rafael Garcia-Suarez)

=head2 strictures and array/hash dereferencing in defined()

C<defined @$foo> and C<defined %$bar> are now subject to C<strict 'refs'>
(that is, C<$foo> and C<$bar> must be proper references there).
(Nicholas Clark)

(However, C<defined(@foo)> and C<defined(%bar)> are discouraged constructs
anyway.)

=head2 C<(?p{})> has been removed

The regular expression construct C<(?p{})>, which was deprecated in perl
5.8, has been removed. Use C<(??{})> instead. (Rafael Garcia-Suarez)

=head2 Pseudo-hashes have been removed

Support for pseudo-hashes has been removed from Perl 5.9. (The C<fields>
pragma remains, but uses an alternate implementation.)

=head2 Removal of the bytecode compiler and of perlcc

C<perlcc>, the byteloader and the supporting modules (B::C, B::CC,
B::Bytecode, etc.) are no longer distributed with the perl sources. Those
experimental tools have never worked reliably, and, due to the lack of
volunteers to keep them in line with the perl interpreter developments, it
was decided to remove them instead of shipping a broken version of those.
The last version of those modules can be found with perl 5.9.4.

However the B compiler framework stays supported in the perl core, along
with the more useful modules it has permitted (among others, B::Deparse and
B::Concise).

=head2 Removal of the JPL

The JPL (Java-Perl Lingo) has been removed from the perl sources tarball.

=head2 Recursive inheritance detected earlier

Perl will now immediately throw an exception if you modify any package's
C<@ISA> in such a way that it would cause recursive inheritance.

Previously, the exception would not occur until Perl attempted to make
use of the recursive inheritance while resolving a method or doing a
C<$foo-E<gt>isa($bar)> lookup.

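The following sketch shows where the exception is expected to surface.
C<Chicken> and C<Egg> are hypothetical package names used only for this
example, and the exact wording of the error message is an assumption based
on current perls.

```perl
use strict;
use warnings;

# Chicken inherits from Egg: no problem so far.
@Chicken::ISA = ('Egg');

# Making Egg inherit from Chicken closes the loop. The exception should be
# raised by this assignment itself, not by a later method call.
my $ok  = eval { @Egg::ISA = ('Chicken'); 1 };
my $err = $@;

print $ok ? "no error raised\n" : "caught: $err";
```
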
=head1 Core Enhancements

=head2 The C<feature> pragma

The C<feature> pragma is used to enable new syntax that would break Perl's
backwards-compatibility with older releases of the language. It's a lexical
pragma, like C<strict> or C<warnings>.

Currently the following new features are available: C<switch> (adds a
switch statement), C<say> (adds a C<say> built-in function), and C<state>
(adds a C<state> keyword for declaring "static" variables). Those
features are described in their own sections of this document.

The C<feature> pragma is also implicitly loaded when you require a minimal
perl version (with the C<use VERSION> construct) greater than, or equal
to, 5.9.5. See L<feature> for details.

=head2 New B<-E> command-line switch

B<-E> is equivalent to B<-e>, but it implicitly enables all
optional features (like C<use feature ":5.10">).

=head2 Defined-or operator

A new operator C<//> (defined-or) has been implemented.
The following statement:

    $a // $b

is merely equivalent to

    defined $a ? $a : $b

and

    $c //= $d;

can now be used instead of

    $c = $d unless defined $c;

The C<//> operator has the same precedence and associativity as C<||>.
Special care has been taken to ensure that this operator Does What You Mean
while not breaking old code, but some edge cases involving the empty
regular expression may now parse differently. See L<perlop> for
details.

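The practical difference from C<||> shows up with values that are defined
but false. A small sketch, where the C<%opt> hash and its keys are made up
for the example:

```perl
use strict;
use warnings;

my %opt = (verbose => 0);            # hypothetical option hash

my $with_dor = $opt{verbose} // 1;   # 0 is defined, so it is kept
my $with_or  = $opt{verbose} || 1;   # 0 is false, so || replaces it

my $timeout = $opt{timeout};         # undef: there is no such key
$timeout //= 30;                     # assigned only because it was undef

print "$with_dor $with_or $timeout\n";
```

This prints C<0 1 30>.
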
=head2 Switch and Smart Match operator

Perl 5 now has a switch statement. It's available when C<use feature
'switch'> is in effect. This feature introduces three new keywords,
C<given>, C<when>, and C<default>:

    given ($foo) {
        when (/^abc/) { $abc = 1; }
        when (/^def/) { $def = 1; }
        when (/^xyz/) { $xyz = 1; }
        default { $nothing = 1; }
    }

A more complete description of how Perl matches the switch variable
against the C<when> conditions is given in L<perlsyn/"Switch statements">.

This kind of match is called I<smart match>, and it's also possible to use
it outside of switch statements, via the new C<~~> operator. See
L<perlsyn/"Smart matching in detail">.

This feature was contributed by Robin Houston.

=head2 Regular expressions

=over 4

=item Recursive Patterns

It is now possible to write recursive patterns without using the C<(??{})>
construct. This new way is more efficient, and in many cases easier to
read.

Each capturing parenthesis can now be treated as an independent pattern
that can be entered by using the C<(?PARNO)> syntax (C<PARNO> standing for
"parenthesis number"). For example, the following pattern will match
nested balanced angle brackets:

    /
     ^                      # start of line
     (                      # start capture buffer 1
        <                   #   match an opening angle bracket
        (?:                 #   match one of:
            (?>             #     don't backtrack over the inside of this group
                [^<>]+      #       one or more non angle brackets
            )               #     end non backtracking group
        |                   #     ... or ...
            (?1)            #     recurse to bracket 1 and try it again
        )*                  #   0 or more times.
        >                   #   match a closing angle bracket
     )                      # end capture buffer one
     $                      # end of line
    /x

Note, users experienced with PCRE will find that the Perl implementation
of this feature differs from the PCRE one in that it is possible to
backtrack into a recursed pattern, whereas in PCRE the recursion is
atomic or "possessive" in nature. (Yves Orton)

=item Named Capture Buffers

It is now possible to name capturing parentheses in a pattern and refer to
the captured contents by name. The naming syntax is C<< (?<NAME>....) >>.
It's possible to backreference to a named buffer with the C<< \k<NAME> >>
syntax. In code, the new magical hashes C<%+> and C<%-> can be used to
access the contents of the capture buffers.

Thus, to replace all doubled chars, one could write

    s/(?<letter>.)\k<letter>/$+{letter}/g

Only buffers with defined contents will be "visible" in the C<%+> hash, so
it's possible to do something like

    foreach my $name (keys %+) {
        print "content of buffer '$name' is $+{$name}\n";
    }

The C<%-> hash is a bit more complete, since it will contain array refs
holding values from all capture buffers similarly named, if there should
be many of them.

C<%+> and C<%-> are implemented as tied hashes through the new module
C<Tie::Hash::NamedCapture>.

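A short sketch of how C<%+> and C<%-> differ when two buffers share a
name. The date string and the buffer names are arbitrary examples.

```perl
use strict;
use warnings;

my ($year, $month, $first_x, @all_x);

# Distinct names: read each buffer through %+.
if ("2007-12-18" =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
    ($year, $month) = ($+{year}, $+{month});
}

# Two buffers named "x": %+ reports the leftmost defined one, while %-
# holds an array ref with the value of every buffer of that name.
if ("ab" =~ /(?<x>a)(?<x>b)/) {
    $first_x = $+{x};
    @all_x   = @{ $-{x} };
}

print "$year $month $first_x @all_x\n";
```

This prints C<2007 12 a a b>.
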
Users exposed to the .NET regex engine will find that the perl
implementation differs in that the numerical ordering of the buffers
is sequential, and not "unnamed first, then named". Thus in the pattern

    /(A)(?<B>B)(C)(?<D>D)/

C<$1> will be 'A', C<$2> will be 'B', C<$3> will be 'C' and C<$4> will be
'D', and not C<$1> is 'A', C<$2> is 'C', C<$3> is 'B' and C<$4> is 'D' as
a .NET programmer would expect. This is considered a feature. :-)
(Yves Orton)

=item Possessive Quantifiers

Perl now supports the "possessive quantifier" syntax of the "atomic match"
pattern. Basically a possessive quantifier matches as much as it can and never
gives any back. Thus it can be used to control backtracking. The syntax is
similar to non-greedy matching, except instead of using a '?' as the modifier
the '+' is used. Thus C<?+>, C<*+>, C<++>, C<{min,max}+> are now legal
quantifiers. (Yves Orton)

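To see "never gives any back" in action, compare a greedy and a possessive
quantifier on the same input (the test string is an arbitrary example):

```perl
use strict;
use warnings;

# Greedy: a+ first takes all three "a"s, then gives one back so that the
# trailing "a" in the pattern can match.
my $greedy = "aaa" =~ /^a+a$/ ? 1 : 0;

# Possessive: a++ takes all three "a"s and refuses to give any back, so
# the trailing "a" has nothing left to match.
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;

print "greedy: $greedy, possessive: $possessive\n";
```

This prints C<greedy: 1, possessive: 0>.
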
=item Backtracking control verbs

The regex engine now supports a number of special-purpose backtrack
control verbs: (*THEN), (*PRUNE), (*MARK), (*SKIP), (*COMMIT), (*FAIL)
and (*ACCEPT). See L<perlre> for their descriptions. (Yves Orton)

=item Relative backreferences

A new syntax C<\g{N}> or C<\gN> where "N" is a decimal integer allows a
safer form of back-reference notation as well as allowing relative
backreferences. This should make it easier to generate and embed patterns
that contain backreferences. See L<perlre/"Capture buffers">. (Yves Orton)

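The sketch below tries to show why relative backreferences help when a
pattern fragment is embedded in a larger one. The fragment names
C<$doubled_rel> and C<$doubled_abs> are invented for the example.

```perl
use strict;
use warnings;

# Both fragments mean "a word character followed by itself" on their own.
my $doubled_rel = qr/(\w)\g{-1}/;   # refers to the group just before it
my $doubled_abs = qr/(\w)\1/;       # refers to group number 1, whatever it is

# Embedded after another group, \1 now points at (x), while \g{-1} still
# points at the (\w) group of the fragment.
my $rel_ok = "xoo" =~ /(x)$doubled_rel/ ? 1 : 0;
my $abs_ok = "xoo" =~ /(x)$doubled_abs/ ? 1 : 0;

# \g{1} can be followed by a digit without being read as group 10.
my $safe = "aa0" =~ /(a)\g{1}0/ ? 1 : 0;

print "$rel_ok $abs_ok $safe\n";
```

This prints C<1 0 1>.
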
=item C<\K> escape

The functionality of Jeff Pinyan's module Regexp::Keep has been added to
the core. You can now use the special escape C<\K> in regular expressions
as a way to do something like floating length positive lookbehind. It is
also useful in substitutions like:

    s/(foo)bar/$1/g

that can now be converted to

    s/foo\Kbar//g

which is much more efficient. (Yves Orton)

=item Vertical and horizontal whitespace, and linebreak

Regular expressions now recognize the C<\v> and C<\h> escapes, which match
vertical and horizontal whitespace, respectively. C<\V> and C<\H>
logically match their complements.

C<\R> matches a generic linebreak, that is, vertical whitespace, plus
the multi-character sequence C<"\x0D\x0A">.

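For instance, C<\R> makes it easy to split text with mixed line endings,
as in this small sketch (the sample strings are arbitrary):

```perl
use strict;
use warnings;

# CRLF, LF and bare CR are each treated as a single linebreak by \R.
my $text  = "one\r\ntwo\nthree\rfour";
my @lines = split /\R/, $text;

# \h matches spaces and tabs, \v matches newlines and similar characters.
my $horizontal = "a \t b" =~ /^a\h+b$/ ? 1 : 0;
my $vertical   = "a\n\nb" =~ /^a\v+b$/ ? 1 : 0;

print scalar(@lines), " lines; h=$horizontal v=$vertical\n";
```

This prints C<4 lines; h=1 v=1>.
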
=back

=head2 C<say()>

say() is a new built-in, only available when C<use feature 'say'> is in
effect, that is similar to print(), but that implicitly appends a newline
to the printed string. See L<perlfunc/say>. (Robin Houston)

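A minimal sketch of say(), writing to an in-memory filehandle so that the
appended newline can be inspected. The C<$buffer> variable exists only for
this example.

```perl
use strict;
use warnings;
use feature 'say';

my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";

say {$out} "hello";       # same as: print {$out} "hello\n"
say {$out} "a", "b";      # one newline, appended after the whole list

close $out;
print $buffer;
```

After this runs, C<$buffer> holds C<"hello\nab\n">.
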
=head2 Lexical C<$_>

The default variable C<$_> can now be lexicalized, by declaring it like
any other lexical variable, with a simple

    my $_;

The operations that default on C<$_> will use the lexically-scoped
version of C<$_> when it exists, instead of the global C<$_>.

In a C<map> or a C<grep> block, if C<$_> was previously my'ed, then the
C<$_> inside the block is lexical as well (and scoped to the block).

In a scope where C<$_> has been lexicalized, you can still have access to
the global version of C<$_> by using C<$::_>, or, more simply, by
overriding the lexical declaration with C<our $_>.

=head2 The C<_> prototype

A new prototype character has been added. C<_> is equivalent to C<$> (it
denotes a scalar), but defaults to C<$_> if the corresponding argument
isn't supplied. Due to the optional nature of the argument, you can only
use it at the end of a prototype, or before a semicolon.

This has a small incompatible consequence: the prototype() function has
been adjusted to return C<_> for some built-ins in appropriate cases (for
example, C<prototype('CORE::rmdir')>). (Rafael Garcia-Suarez)

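A small sketch of a user-defined subroutine using the C<_> prototype.
C<shout()> is a hypothetical helper invented for this example.

```perl
use strict;
use warnings;

# With the "_" prototype, a missing argument is replaced by $_.
sub shout(_) { return uc $_[0] }

my @out;
for ("quiet", "loud") {
    push @out, shout;            # no argument: falls back to $_
}
push @out, shout("explicit");    # an explicit argument still works

print "@out\n";
```

This prints C<QUIET LOUD EXPLICIT>.
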
=head2 UNITCHECK blocks

C<UNITCHECK>, a new special code block, has been introduced, in addition to
C<BEGIN>, C<CHECK>, C<INIT> and C<END>.

C<CHECK> and C<INIT> blocks, while useful for some specialized purposes,
are always executed at the transition between the compilation and the
execution of the main program, and thus are useless whenever code is
loaded at runtime. On the other hand, C<UNITCHECK> blocks are executed
just after the unit which defined them has been compiled. See L<perlmod>
for more information. (Alex Gough)

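One way to observe this is with a string eval, which is a compilation unit
of its own that gets compiled at run time. In this sketch the C<@order>
array is just a log kept for the example.

```perl
use strict;
use warnings;

our @order;

# The UNITCHECK block fires as soon as the eval'd unit has been compiled,
# that is, before any statement of the eval body is executed.
eval q{
    push @order, 'eval body';
    UNITCHECK { push @order, 'UNITCHECK' }
    1;
} or die $@;

print "@order\n";
```

This prints C<UNITCHECK eval body>, even though the block appears after
the C<push> in the source.
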
=head2 New Pragma, C<mro>

A new pragma, C<mro> (for Method Resolution Order) has been added. It
lets you switch, on a per-class basis, the algorithm that perl uses to
find inherited methods in case of a multiple inheritance hierarchy. The
default MRO hasn't changed (DFS, for Depth First Search). Another MRO is
available: the C3 algorithm. See L<mro> for more information.
(Brandon Black)

Note that, due to changes in the implementation of class hierarchy search,
code that used to undef the C<*ISA> glob will most probably break. Anyway,
undef'ing C<*ISA> had the side-effect of removing the magic on the @ISA
array and should not have been done in the first place.

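The difference between the two orders is easiest to see on a diamond
hierarchy. The classes below (C<Base>, C<Left>, C<Right>, C<Diamond>) are
hypothetical and exist only for this sketch.

```perl
use strict;
use warnings;

{ package Base;    sub hello { 'Base' } }
{ package Left;    our @ISA = ('Base'); }
{ package Right;   our @ISA = ('Base'); sub hello { 'Right' } }
{ package Diamond; use mro 'c3'; our @ISA = ('Left', 'Right'); }

# Linearization actually used by Diamond (C3, as requested above) ...
my @c3  = @{ mro::get_linear_isa('Diamond') };
# ... and what the default depth-first search would have produced.
my @dfs = @{ mro::get_linear_isa('Diamond', 'dfs') };

my $who = Diamond->hello;

print "c3:  @c3\ndfs: @dfs\nhello from $who\n";
```

Under C3, C<Right> is searched before C<Base>, so C<< Diamond->hello >>
returns C<Right>. With the default DFS order it would have found
C<Base::hello> first.
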
=head2 readpipe() is now overridable

The built-in function readpipe() is now overridable. Overriding it also
overrides its operator counterpart, C<qx//> (a.k.a. C<``>).
Moreover, it now defaults to C<$_> if no argument is provided. (Rafael
Garcia-Suarez)

=head2 default argument for readline()

readline() now defaults to C<*ARGV> if no argument is provided. (Rafael
Garcia-Suarez)

=head2 state() variables

A new class of variables has been introduced. State variables are similar
to C<my> variables, but are declared with the C<state> keyword in place of
C<my>. They're visible only in their lexical scope, but their value is
persistent: unlike C<my> variables, they're not undefined at scope entry,
but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)

To use state variables, one needs to enable them by using

    use feature "state";

or by using the C<-E> command-line switch in one-liners.
See L<perlsub/"Persistent variables via state()">.

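A typical use is a counter that survives between calls. C<next_id()> is a
made-up name for this sketch.

```perl
use strict;
use warnings;
use feature 'state';

sub next_id {
    state $n = 0;      # initialised only the first time this line runs
    return ++$n;       # the value is kept between calls
}

my @ids = (next_id(), next_id(), next_id());

print "@ids\n";
```

This prints C<1 2 3>. With C<my $n = 0> instead, every call would return 1.
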
=head2 Stacked filetest operators

As a new form of syntactic sugar, it's now possible to stack up filetest
operators. You can now write C<-f -w -x $file> in a row to mean
C<-x $file && -w _ && -f _>. See L<perlfunc/-X>.

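The equivalence can be checked on a scratch file. This sketch assumes the
standard C<File::Temp> module is available and uses C<-r> rather than
C<-x>, since a freshly created temporary file is normally not executable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary regular file that the current user can read and write.
my ($fh, $path) = tempfile(UNLINK => 1);

my $stacked  = (-f -w -r $path)             ? 1 : 0;
my $longhand = (-r $path && -w _ && -f _)   ? 1 : 0;

# Any failing test makes the whole stack false.
my $missing  = (-f -r "$path.does-not-exist") ? 1 : 0;

print "$stacked $longhand $missing\n";
```
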
=head2 UNIVERSAL::DOES()

The C<UNIVERSAL> class has a new method, C<DOES()>. It has been added to
solve semantic problems with the C<isa()> method. C<isa()> checks for
inheritance, while C<DOES()> has been designed to be overridden when
module authors use other types of relations between classes (in addition
to inheritance). (chromatic)

See L<< UNIVERSAL/"$obj->DOES( ROLE )" >>.

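A sketch of a class that overrides C<DOES()> to advertise a relation that
is not inheritance. The C<Duck> class and the C<Quacker> role name are
hypothetical.

```perl
use strict;
use warnings;

package Duck;

sub new { return bless {}, shift }

sub DOES {
    my ($self, $role) = @_;
    return 1 if $role eq 'Quacker';       # a relation other than inheritance
    return $self->SUPER::DOES($role);     # otherwise defer to UNIVERSAL::DOES
}

package main;

my $duck    = Duck->new;
my @answers = map { $_ ? 1 : 0 }
    $duck->DOES('Quacker'), $duck->isa('Quacker'), $duck->DOES('Duck');

print "@answers\n";
```

This prints C<1 0 1>: the object does the role without inheriting from
anything called C<Quacker>, and still does its own class.
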
=head2 C<CLONE_SKIP()>

Perl now has support for the C<CLONE_SKIP> special subroutine. Like
C<CLONE>, C<CLONE_SKIP> is called once per package; however, it is called
just before cloning starts, and in the context of the parent thread. If it
returns a true value, then no objects of that class will be cloned. See
L<perlmod> for details. (Contributed by Dave Mitchell.)

=head2 Formats

Formats were improved in several ways. A new field, C<^*>, can be used for
variable-width, one-line-at-a-time text. Null characters are now handled
correctly in picture lines. Using C<@#> and C<~~> together will now
produce a compile-time error, as those format fields are incompatible.
L<perlform> has been improved, and miscellaneous bugs fixed.

=head2 Byte-order modifiers for pack() and unpack()

There are two new byte-order modifiers, C<E<gt>> (big-endian) and C<E<lt>>
(little-endian), that can be appended to most pack() and unpack() template
characters and groups to force a certain byte-order for that type or group.
See L<perlfunc/pack> and L<perlpacktut> for details.

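For example, the same 16-bit value can be laid out in either byte order,
regardless of the native order of the machine. The values below are
arbitrary.

```perl
use strict;
use warnings;

# Hex dumps of the packed bytes make the ordering visible.
my $big    = unpack("H*", pack("s>", 0x0102));
my $little = unpack("H*", pack("s<", 0x0102));

# "L>" produces the same bytes as the older, fixed big-endian "N" format.
my $same_as_N = pack("L>", 1) eq pack("N", 1) ? 1 : 0;

# The modifiers work with unpack() and with signed types too.
my ($signed) = unpack("l<", "\xFE\xFF\xFF\xFF");

print "$big $little $same_as_N $signed\n";
```

This prints C<0102 0201 1 -2>.
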
=head2 Byte count feature in pack()

A new pack() template character, C<".">, returns the number of characters
read so far.

=head2 C<no VERSION>

You can now use C<no> followed by a version number to specify that you
want to use a version of perl older than the specified one.

=head2 C<chdir>, C<chmod> and C<chown> on filehandles

C<chdir>, C<chmod> and C<chown> can now work on filehandles as well as
filenames, if the system supports respectively C<fchdir>, C<fchmod> and
C<fchown>, thanks to a patch provided by Gisle Aas.

=head2 OS groups

C<$(> and C<$)> now return groups in the order in which the OS returns
them, thanks to Gisle Aas. This wasn't previously the case.

=head2 Recursive sort subs

You can now use recursive subroutines with sort(), thanks to Robin Houston.

=head2 Exceptions in constant folding

The constant folding routine is now wrapped in an exception handler, and
if folding throws an exception (such as attempting to evaluate 0/0), perl
now retains the current optree, rather than aborting the whole program.
(Nicholas Clark, Dave Mitchell)

=head2 Source filters in @INC

It's possible to enhance the mechanism of subroutine hooks in @INC by
adding a source filter on top of the filehandle opened and returned by the
hook. This feature was planned a long time ago, but wasn't quite working
until now. See L<perlfunc/require> for details. (Nicholas Clark)

=head2 New internal variables

=over 4

=item C<${^RE_DEBUG_FLAGS}>

This variable controls what debug flags are in effect for the regular
expression engine when running under C<use re "debug">. See L<re> for
details.

=item C<${^CHILD_ERROR_NATIVE}>

This variable gives the native status returned by the last pipe close,
backtick command, successful call to wait() or waitpid(), or from the
system() operator. See L<perlrun> for details. (Contributed by Gisle Aas.)

=back

=head2 Miscellaneous

C<unpack()> now defaults to unpacking the C<$_> variable.

C<mkdir()> without arguments now defaults to C<$_>.

The internal dump output has been improved, so that non-printable characters
such as newline and backspace are output in C<\x> notation, rather than
octal.

The B<-C> option can no longer be used on the C<#!> line. It wasn't
working there anyway.

=head2 UCD 5.0.0

The copy of the Unicode Character Database included in Perl 5 has
been updated to version 5.0.0.

=head2 MAD

MAD, which stands for I<Misc Attribute Decoration>, is a
still-in-development work leading to a Perl 5 to Perl 6 converter. To
enable it, it's necessary to pass the argument C<-Dmad> to Configure. The
obtained perl isn't binary compatible with a regular perl 5.9.4, and has
space and speed penalties; moreover not all regression tests still pass
with it. (Larry Wall, Nicholas Clark)

=head1 Modules and Pragmata

=head1 Utility Changes

=head1 New Documentation

=head1 Performance Enhancements

=head1 Installation and Configuration Improvements

=head1 Selected Bug Fixes

=head1 New or Changed Diagnostics

=head1 Changed Internals

=head1 New Tests

=head1 Known Problems

=head1 Platform Specific Problems

=head1 Reporting Bugs

=head1 SEE ALSO

The F<Changes> file and the perl590delta to perl595delta man pages for
exhaustive details on what changed.

The F<INSTALL> file for how to build Perl.

The F<README> file for general stuff.

The F<Artistic> and F<Copying> files for copyright information.

=cut