| 1 | =head1 NAME |
| 2 | X<subroutine> X<function> |
| 3 | |
| 4 | perlsub - Perl subroutines |
| 5 | |
| 6 | =head1 SYNOPSIS |
| 7 | |
| 8 | To declare subroutines: |
| 9 | X<subroutine, declaration> X<sub> |
| 10 | |
| 11 | sub NAME; # A "forward" declaration. |
| 12 | sub NAME(PROTO); # ditto, but with prototypes |
| 13 | sub NAME : ATTRS; # with attributes |
| 14 | sub NAME(PROTO) : ATTRS; # with attributes and prototypes |
| 15 | |
| 16 | sub NAME BLOCK # A declaration and a definition. |
| 17 | sub NAME(PROTO) BLOCK # ditto, but with prototypes |
| 18 | sub NAME SIG BLOCK # with signature |
| 19 | sub NAME : ATTRS BLOCK # with attributes |
| 20 | sub NAME(PROTO) : ATTRS BLOCK # with prototypes and attributes |
| 21 | sub NAME : ATTRS SIG BLOCK # with attributes and signature |
| 22 | |
| 23 | To define an anonymous subroutine at runtime: |
| 24 | X<subroutine, anonymous> |
| 25 | |
| 26 | $subref = sub BLOCK; # no proto |
| 27 | $subref = sub (PROTO) BLOCK; # with proto |
| 28 | $subref = sub SIG BLOCK; # with signature |
| 29 | $subref = sub : ATTRS BLOCK; # with attributes |
| 30 | $subref = sub (PROTO) : ATTRS BLOCK; # with proto and attributes |
| 31 | $subref = sub : ATTRS SIG BLOCK; # with attribs and signature |
| 32 | |
| 33 | To import subroutines: |
| 34 | X<import> |
| 35 | |
| 36 | use MODULE qw(NAME1 NAME2 NAME3); |
| 37 | |
| 38 | To call subroutines: |
| 39 | X<subroutine, call> X<call> |
| 40 | |
| 41 | NAME(LIST); # & is optional with parentheses. |
| 42 | NAME LIST; # Parentheses optional if predeclared/imported. |
| 43 | &NAME(LIST); # Circumvent prototypes. |
| 44 | &NAME; # Makes current @_ visible to called subroutine. |
| 45 | |
| 46 | =head1 DESCRIPTION |
| 47 | |
| 48 | Like many languages, Perl provides for user-defined subroutines. |
| 49 | These may be located anywhere in the main program, loaded in from |
| 50 | other files via the C<do>, C<require>, or C<use> keywords, or |
| 51 | generated on the fly using C<eval> or anonymous subroutines. |
| 52 | You can even call a function indirectly using a variable containing |
| 53 | its name or a CODE reference. |
| 54 | |
| 55 | The Perl model for function call and return values is simple: all |
| 56 | functions are passed as parameters one single flat list of scalars, and |
| 57 | all functions likewise return to their caller one single flat list of |
| 58 | scalars. Any arrays or hashes in these call and return lists will |
| 59 | collapse, losing their identities--but you may always use |
| 60 | pass-by-reference instead to avoid this. Both call and return lists may |
| 61 | contain as many or as few scalar elements as you'd like. (Often a |
| 62 | function without an explicit return statement is called a subroutine, but |
| 63 | there's really no difference from Perl's perspective.) |
| 64 | X<subroutine, parameter> X<parameter> |
| 65 | |
| 66 | Any arguments passed in show up in the array C<@_>. |
| 67 | (They may also show up in lexical variables introduced by a signature; |
| 68 | see L</Signatures> below.) Therefore, if |
| 69 | you called a function with two arguments, those would be stored in |
| 70 | C<$_[0]> and C<$_[1]>. The array C<@_> is a local array, but its |
| 71 | elements are aliases for the actual scalar parameters. In particular, |
| 72 | if an element C<$_[0]> is updated, the corresponding argument is |
| 73 | updated (or an error occurs if it is not updatable). If an argument |
| 74 | is an array or hash element which did not exist when the function |
| 75 | was called, that element is created only when (and if) it is modified |
| 76 | or a reference to it is taken. (Some earlier versions of Perl |
| 77 | created the element whether or not the element was assigned to.) |
| 78 | Assigning to the whole array C<@_> removes that aliasing, and does |
| 79 | not update any arguments. |
| 80 | X<subroutine, argument> X<argument> X<@_> |
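
The aliasing rules above can be seen in a minimal sketch. The helper names (C<bump_first>, C<replace_args>, C<ignore>, C<touch>) are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub bump_first   { $_[0]++; return }     # writes through the alias
sub replace_args { @_ = (99); return }   # assigning to all of @_ drops the aliasing
sub ignore       { return }              # never touches its argument
sub touch        { $_[0] = 1; return }   # modifies its argument

my $n = 1;
bump_first($n);      # $n is now 2
replace_args($n);    # $n is still 2

my %h;
ignore($h{absent});                                   # element is not created
my $created_by_ignore = exists $h{absent} ? 1 : 0;
touch($h{present});                                   # element is created on assignment
my $created_by_touch = exists $h{present} ? 1 : 0;

print "n=$n ignore=$created_by_ignore touch=$created_by_touch\n";
# prints "n=2 ignore=0 touch=1"
```

Only writes to individual elements of C<@_> reach the caller. Replacing the array as a whole leaves the caller's variables alone.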
| 81 | |
| 82 | A C<return> statement may be used to exit a subroutine, optionally |
| 83 | specifying the returned value, which will be evaluated in the |
| 84 | appropriate context (list, scalar, or void) depending on the context of |
| 85 | the subroutine call. If you specify no return value, the subroutine |
| 86 | returns an empty list in list context, the undefined value in scalar |
| 87 | context, or nothing in void context. If you return one or more |
| 88 | aggregates (arrays and hashes), these will be flattened together into |
| 89 | one large indistinguishable list. |
| 90 | |
| 91 | If no C<return> is found and if the last statement is an expression, its |
| 92 | value is returned. If the last statement is a loop control structure |
| 93 | like a C<foreach> or a C<while>, the returned value is unspecified. The |
| 94 | empty sub returns the empty list. |
| 95 | X<subroutine, return value> X<return value> X<return> |
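
A short sketch of the return rules just described. The subroutine names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical subroutines, named only for this illustration.
sub nothing  { return }                    # bare return
sub implicit { my $x = shift; $x * 2 }     # value of the last expression
sub both {
    my @a = (1, 2);
    my %h = (k => 'v');
    return (@a, %h);                       # flattened into one list
}

my @list   = nothing();     # empty list in list context
my $scalar = nothing();     # undef in scalar context
my $double = implicit(21);  # 42
my @flat   = both();        # four elements: 1, 2, 'k', 'v'

printf "%d %s %d %d\n",
    scalar(@list), defined $scalar ? 'defined' : 'undef', $double, scalar(@flat);
# prints "0 undef 42 4"
```

The caller of C<both()> cannot tell where the array ended and the hash began.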
| 96 | |
| 97 | Aside from an experimental facility (see L</Signatures> below), |
| 98 | Perl does not have named formal parameters. In practice all you |
| 99 | do is copy C<@_> into a C<my()> list of variables. Variables that aren't |
| 100 | declared to be private are global variables. For gory details |
| 101 | on creating private variables, see L<"Private Variables via my()"> |
| 102 | and L<"Temporary Values via local()">. To create protected |
| 103 | environments for a set of functions in a separate package (and |
| 104 | probably a separate file), see L<perlmod/"Packages">. |
| 105 | X<formal parameter> X<parameter, formal> |
| 106 | |
| 107 | Example: |
| 108 | |
| 109 | sub max { |
| 110 | my $max = shift(@_); |
| 111 | foreach $foo (@_) { |
| 112 | $max = $foo if $max < $foo; |
| 113 | } |
| 114 | return $max; |
| 115 | } |
| 116 | $bestday = max($mon,$tue,$wed,$thu,$fri); |
| 117 | |
| 118 | Example: |
| 119 | |
| 120 | # get a line, combining continuation lines |
| 121 | # that start with whitespace |
| 122 | |
| 123 | sub get_line { |
| 124 | $thisline = $lookahead; # global variables! |
| 125 | LINE: while (defined($lookahead = <STDIN>)) { |
| 126 | if ($lookahead =~ /^[ \t]/) { |
| 127 | $thisline .= $lookahead; |
| 128 | } |
| 129 | else { |
| 130 | last LINE; |
| 131 | } |
| 132 | } |
| 133 | return $thisline; |
| 134 | } |
| 135 | |
| 136 | $lookahead = <STDIN>; # get first line |
| 137 | while (defined($line = get_line())) { |
| 138 | ... |
| 139 | } |
| 140 | |
| 141 | Assigning to a list of private variables to name your arguments: |
| 142 | |
| 143 | sub maybeset { |
| 144 | my($key, $value) = @_; |
| 145 | $Foo{$key} = $value unless $Foo{$key}; |
| 146 | } |
| 147 | |
| 148 | Because the assignment copies the values, this also has the effect |
| 149 | of turning call-by-reference into call-by-value. Otherwise a |
| 150 | function is free to do in-place modifications of C<@_> and change |
| 151 | its caller's values. |
| 152 | X<call-by-reference> X<call-by-value> |
| 153 | |
| 154 | upcase_in($v1, $v2); # this changes $v1 and $v2 |
| 155 | sub upcase_in { |
| 156 | for (@_) { tr/a-z/A-Z/ } |
| 157 | } |
| 158 | |
| 159 | You aren't allowed to modify constants in this way, of course. If an |
| 160 | argument were actually literal and you tried to change it, you'd take a |
| 161 | (presumably fatal) exception. For example, this won't work: |
| 162 | X<call-by-reference> X<call-by-value> |
| 163 | |
| 164 | upcase_in("frederick"); |
| 165 | |
| 166 | It would be much safer if the C<upcase_in()> function |
| 167 | were written to return a copy of its parameters instead |
| 168 | of changing them in place: |
| 169 | |
| 170 | ($v3, $v4) = upcase($v1, $v2); # this doesn't change $v1 and $v2 |
| 171 | sub upcase { |
| 172 | return unless defined wantarray; # void context, do nothing |
| 173 | my @parms = @_; |
| 174 | for (@parms) { tr/a-z/A-Z/ } |
| 175 | return wantarray ? @parms : $parms[0]; |
| 176 | } |
| 177 | |
| 178 | Notice how this (unprototyped) function doesn't care whether it was |
| 179 | passed real scalars or arrays. Perl sees all arguments as one big, |
| 180 | long, flat parameter list in C<@_>. This is one area where |
| 181 | Perl's simple argument-passing style shines. The C<upcase()> |
| 182 | function would work perfectly well without changing the C<upcase()> |
| 183 | definition even if we fed it things like this: |
| 184 | |
| 185 | @newlist = upcase(@list1, @list2); |
| 186 | @newlist = upcase( split /:/, $var ); |
| 187 | |
| 188 | Do not, however, be tempted to do this: |
| 189 | |
| 190 | (@a, @b) = upcase(@list1, @list2); |
| 191 | |
| 192 | Like the flattened incoming parameter list, the return list is also |
| 193 | flattened on return. So all you have managed to do here is store |
| 194 | everything in C<@a> and make C<@b> empty. See |
| 195 | L</Pass by Reference> for alternatives. |
| 196 | |
| 197 | A subroutine may be called using an explicit C<&> prefix. The |
| 198 | C<&> is optional in modern Perl, as are parentheses if the |
| 199 | subroutine has been predeclared. The C<&> is I<not> optional |
| 200 | when just naming the subroutine, such as when it's used as |
| 201 | an argument to defined() or undef(). Nor is it optional when you |
| 202 | want to do an indirect subroutine call with a subroutine name or |
| 203 | reference using the C<&$subref()> or C<&{$subref}()> constructs, |
| 204 | although the C<< $subref->() >> notation solves that problem. |
| 205 | See L<perlref> for more about all that. |
| 206 | X<&> |
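
As a sketch of the cases where the C<&> sigil is required, the following uses a hypothetical C<greet> subroutine and shows the three equivalent ways of calling through a reference.

```perl
use strict;
use warnings;

# Hypothetical subroutine, named only for this illustration.
sub greet { return "hello, $_[0]" }

# Naming a subroutine without calling it needs the & sigil.
my $exists  = defined(&greet)       ? 1 : 0;   # 1
my $missing = defined(&no_such_sub) ? 1 : 0;   # 0

# Three equivalent indirect calls through a code reference.
my $ref = \&greet;
my @results = ( &$ref('a'), &{$ref}('b'), $ref->('c') );

print "$exists $missing @results\n";
# prints "1 0 hello, a hello, b hello, c"
```

The arrow form is usually the easiest to read, and it is the only one of the three that needs no C<&>.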
| 207 | |
| 208 | Subroutines may be called recursively. If a subroutine is called |
| 209 | using the C<&> form, the argument list is optional, and if omitted, |
| 210 | no C<@_> array is set up for the subroutine: the C<@_> array at the |
| 211 | time of the call is visible to the subroutine instead. This is an |
| 212 | efficiency mechanism that new users may wish to avoid. |
| 213 | X<recursion> |
| 214 | |
| 215 | &foo(1,2,3); # pass three arguments |
| 216 | foo(1,2,3); # the same |
| 217 | |
| 218 | foo(); # pass a null list |
| 219 | &foo(); # the same |
| 220 | |
| 221 | &foo; # foo() gets current args, like foo(@_) !! |
| 222 | foo; # like foo() IFF sub foo predeclared, else "foo" |
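
A small sketch of what sharing C<@_> means in practice. The names C<count_args>, C<take_first> and C<outer> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical subroutines, named only for this illustration.
sub count_args { return scalar @_ }
sub take_first { return shift @_ }   # shift alters whichever @_ is current

sub outer {
    my $shared = &count_args;    # no new @_: sees outer's three arguments
    my $fresh  = count_args();   # new, empty @_
    my $taken  = &take_first;    # shifts outer's own @_
    return ($shared, $fresh, $taken, scalar @_);
}

my @r = outer('a', 'b', 'c');
print "@r\n";
# prints "3 0 a 2"
```

The callee does not receive a copy: C<take_first> removed an element from the C<@_> belonging to C<outer>, which is why this form can surprise new users.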
| 223 | |
| 224 | Not only does the C<&> form make the argument list optional, it also |
| 225 | disables any prototype checking on arguments you do provide. This |
| 226 | is partly for historical reasons, and partly for having a convenient way |
| 227 | to cheat if you know what you're doing. See L</Prototypes> below. |
| 228 | X<&> |
| 229 | |
| 230 | Since Perl 5.16.0, the C<__SUB__> token is available under C<use feature |
| 231 | 'current_sub'> and C<use 5.16.0>. It will evaluate to a reference to the |
| 232 | currently-running sub, which allows for recursive calls without knowing |
| 233 | your subroutine's name. |
| 234 | |
| 235 | use 5.16.0; |
| 236 | my $factorial = sub { |
| 237 | my ($x) = @_; |
| 238 | return 1 if $x <= 1; # also terminates for 0 |
| 239 | return($x * __SUB__->( $x - 1 ) ); |
| 240 | }; |
| 241 | |
| 242 | The behaviour of C<__SUB__> within a regex code block (such as C</(?{...})/>) |
| 243 | is subject to change. |
| 244 | |
| 245 | Subroutines whose names are in all upper case are reserved to the Perl |
| 246 | core, as are modules whose names are in all lower case. A subroutine in |
| 247 | all capitals is a loosely-held convention meaning it will be called |
| 248 | indirectly by the run-time system itself, usually due to a triggered event. |
| 249 | Subroutines whose names start with a left parenthesis are also reserved the |
| 250 | same way. The following is a list of some subroutines that currently do |
| 251 | special, pre-defined things. |
| 252 | |
| 253 | =over |
| 254 | |
| 255 | =item documented later in this document |
| 256 | |
| 257 | C<AUTOLOAD> |
| 258 | |
| 259 | =item documented in L<perlmod> |
| 260 | |
| 261 | C<CLONE>, C<CLONE_SKIP> |
| 262 | |
| 263 | =item documented in L<perlobj> |
| 264 | |
| 265 | C<DESTROY> |
| 266 | |
| 267 | =item documented in L<perltie> |
| 268 | |
| 269 | C<BINMODE>, C<CLEAR>, C<CLOSE>, C<DELETE>, C<DESTROY>, C<EOF>, C<EXISTS>, |
| 270 | C<EXTEND>, C<FETCH>, C<FETCHSIZE>, C<FILENO>, C<FIRSTKEY>, C<GETC>, |
| 271 | C<NEXTKEY>, C<OPEN>, C<POP>, C<PRINT>, C<PRINTF>, C<PUSH>, C<READ>, |
| 272 | C<READLINE>, C<SCALAR>, C<SEEK>, C<SHIFT>, C<SPLICE>, C<STORE>, |
| 273 | C<STORESIZE>, C<TELL>, C<TIEARRAY>, C<TIEHANDLE>, C<TIEHASH>, |
| 274 | C<TIESCALAR>, C<UNSHIFT>, C<UNTIE>, C<WRITE> |
| 275 | |
| 276 | =item documented in L<PerlIO::via> |
| 277 | |
| 278 | C<BINMODE>, C<CLEARERR>, C<CLOSE>, C<EOF>, C<ERROR>, C<FDOPEN>, C<FILENO>, |
| 279 | C<FILL>, C<FLUSH>, C<OPEN>, C<POPPED>, C<PUSHED>, C<READ>, C<SEEK>, |
| 280 | C<SETLINEBUF>, C<SYSOPEN>, C<TELL>, C<UNREAD>, C<UTF8>, C<WRITE> |
| 281 | |
| 282 | =item documented in L<perlfunc> |
| 283 | |
| 284 | L<< C<import> | perlfunc/use >>, L<< C<unimport> | perlfunc/use >>, |
| 285 | L<< C<INC> | perlfunc/require >> |
| 286 | |
| 287 | =item documented in L<UNIVERSAL> |
| 288 | |
| 289 | C<VERSION> |
| 290 | |
| 291 | =item documented in L<perldebguts> |
| 292 | |
| 293 | C<DB::DB>, C<DB::sub>, C<DB::lsub>, C<DB::goto>, C<DB::postponed> |
| 294 | |
| 295 | =item undocumented, used internally by the L<overload> feature |
| 296 | |
| 297 | any starting with C<(> |
| 298 | |
| 299 | =back |
| 300 | |
| 301 | The C<BEGIN>, C<UNITCHECK>, C<CHECK>, C<INIT> and C<END> subroutines |
| 302 | are not so much subroutines as named special code blocks, of which you |
| 303 | can have more than one in a package, and which you can B<not> call |
| 304 | explicitly. See L<perlmod/"BEGIN, UNITCHECK, CHECK, INIT and END">. |
| 305 | |
| 306 | =head2 Signatures |
| 307 | |
| 308 | B<WARNING>: Subroutine signatures are experimental. The feature may be |
| 309 | modified or removed in future versions of Perl. |
| 310 | |
| 311 | Perl has an experimental facility to allow a subroutine's formal |
| 312 | parameters to be introduced by special syntax, separate from the |
| 313 | procedural code of the subroutine body. The formal parameter list |
| 314 | is known as a I<signature>. The facility must be enabled first by a |
| 315 | pragmatic declaration, C<use feature 'signatures'>, and it will produce |
| 316 | a warning unless the "experimental::signatures" warnings category is |
| 317 | disabled. |
| 318 | |
| 319 | The signature is part of a subroutine's body. Normally the body of a |
| 320 | subroutine is simply a braced block of code. When using a signature, |
| 321 | the signature is a parenthesised list that goes immediately before |
| 322 | the braced block. The signature declares lexical variables that are |
| 323 | in scope for the block. When the subroutine is called, the signature |
| 324 | takes control first. It populates the signature variables from the |
| 325 | list of arguments that were passed. If the argument list doesn't meet |
| 326 | the requirements of the signature, then it will throw an exception. |
| 327 | When the signature processing is complete, control passes to the block. |
| 328 | |
| 329 | Positional parameters are handled by simply naming scalar variables in |
| 330 | the signature. For example, |
| 331 | |
| 332 | sub foo ($left, $right) { |
| 333 | return $left + $right; |
| 334 | } |
| 335 | |
| 336 | takes two positional parameters, which must be filled at runtime by |
| 337 | two arguments. By default the parameters are mandatory, and it is |
| 338 | not permitted to pass more arguments than expected. So the above is |
| 339 | equivalent to |
| 340 | |
| 341 | sub foo { |
| 342 | die "Too many arguments for subroutine" unless @_ <= 2; |
| 343 | die "Too few arguments for subroutine" unless @_ >= 2; |
| 344 | my $left = $_[0]; |
| 345 | my $right = $_[1]; |
| 346 | return $left + $right; |
| 347 | } |
| 348 | |
| 349 | An argument can be ignored by omitting the main part of the name from |
| 350 | a parameter declaration, leaving just a bare C<$> sigil. For example, |
| 351 | |
| 352 | sub foo ($first, $, $third) { |
| 353 | return "first=$first, third=$third"; |
| 354 | } |
| 355 | |
| 356 | Although the ignored argument doesn't go into a variable, it is still |
| 357 | mandatory for the caller to pass it. |
| 358 | |
| 359 | A positional parameter is made optional by giving a default value, |
| 360 | separated from the parameter name by C<=>: |
| 361 | |
| 362 | sub foo ($left, $right = 0) { |
| 363 | return $left + $right; |
| 364 | } |
| 365 | |
| 366 | The above subroutine may be called with either one or two arguments. |
| 367 | The default value expression is evaluated when the subroutine is called, |
| 368 | so it may provide different default values for different calls. It is |
| 369 | only evaluated if the argument was actually omitted from the call. |
| 370 | For example, |
| 371 | |
| 372 | my $auto_id = 0; |
| 373 | sub foo ($thing, $id = $auto_id++) { |
| 374 | print "$thing has ID $id"; |
| 375 | } |
| 376 | |
| 377 | automatically assigns distinct sequential IDs to things for which no |
| 378 | ID was supplied by the caller. A default value expression may also |
| 379 | refer to parameters earlier in the signature, making the default for |
| 380 | one parameter vary according to the earlier parameters. For example, |
| 381 | |
| 382 | sub foo ($first_name, $surname, $nickname = $first_name) { |
| 383 | print "$first_name $surname is known as \"$nickname\""; |
| 384 | } |
| 385 | |
| 386 | An optional parameter can be nameless just like a mandatory parameter. |
| 387 | For example, |
| 388 | |
| 389 | sub foo ($thing, $ = 1) { |
| 390 | print $thing; |
| 391 | } |
| 392 | |
| 393 | The parameter's default value will still be evaluated if the corresponding |
| 394 | argument isn't supplied, even though the value won't be stored anywhere. |
| 395 | This is in case evaluating it has important side effects. However, it |
| 396 | will be evaluated in void context, so if it doesn't have side effects |
| 397 | and is not trivial it will generate a warning if the "void" warning |
| 398 | category is enabled. If a nameless optional parameter's default value |
| 399 | is not important, it may be omitted just as the parameter's name was: |
| 400 | |
| 401 | sub foo ($thing, $=) { |
| 402 | print $thing; |
| 403 | } |
| 404 | |
| 405 | Optional positional parameters must come after all mandatory positional |
| 406 | parameters. (If there are no mandatory positional parameters then an |
| 407 | optional positional parameter can be the first thing in the signature.) |
| 408 | If there are multiple optional positional parameters and not enough |
| 409 | arguments are supplied to fill them all, they will be filled from left |
| 410 | to right. |
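
The left-to-right filling of optional parameters might be sketched as follows. The C<describe> subroutine is hypothetical, and the example assumes a perl recent enough to offer the C<signatures> feature (5.20 or later).

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical subroutine: one mandatory and two optional parameters.
sub describe ($name, $colour = 'grey', $size = 'medium') {
    return "$name: $colour, $size";
}

my $one   = describe('cat');                    # both defaults used
my $two   = describe('cat', 'black');           # $colour filled first
my $three = describe('cat', 'black', 'large');  # no defaults used

# Omitting the mandatory parameter throws an exception at call time.
my $too_few = eval { describe(); 1 } ? 0 : 1;

print "$one\n$two\n$three\ntoo_few=$too_few\n";
```

A lone second argument always lands in C<$colour>. There is no way to supply C<$size> positionally while leaving C<$colour> to its default.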
| 411 | |
| 412 | After positional parameters, additional arguments may be captured in a |
| 413 | slurpy parameter. The simplest form of this is just an array variable: |
| 414 | |
| 415 | sub foo ($filter, @inputs) { |
| 416 | print $filter->($_) foreach @inputs; |
| 417 | } |
| 418 | |
| 419 | With a slurpy parameter in the signature, there is no upper limit on how |
| 420 | many arguments may be passed. A slurpy array parameter may be nameless |
| 421 | just like a positional parameter, in which case its only effect is to |
| 422 | turn off the argument limit that would otherwise apply: |
| 423 | |
| 424 | sub foo ($thing, @) { |
| 425 | print $thing; |
| 426 | } |
| 427 | |
| 428 | A slurpy parameter may instead be a hash, in which case the arguments |
| 429 | available to it are interpreted as alternating keys and values. |
| 430 | There must be as many keys as values: if there is an odd argument then |
| 431 | an exception will be thrown. Keys will be stringified, and if there are |
| 432 | duplicates then the later instance takes precedence over the earlier, |
| 433 | as with standard hash construction. |
| 434 | |
| 435 | sub foo ($filter, %inputs) { |
| 436 | print $filter->($_, $inputs{$_}) foreach sort keys %inputs; |
| 437 | } |
| 438 | |
| 439 | A slurpy hash parameter may be nameless just like other kinds of |
| 440 | parameter. It still insists that the number of arguments available to |
| 441 | it be even, even though they're not being put into a variable. |
| 442 | |
| 443 | sub foo ($thing, %) { |
| 444 | print $thing; |
| 445 | } |
| 446 | |
| 447 | A slurpy parameter, either array or hash, must be the last thing in the |
| 448 | signature. It may follow mandatory and optional positional parameters; |
| 449 | it may also be the only thing in the signature. Slurpy parameters cannot |
| 450 | have default values: if no arguments are supplied for them then you get |
| 451 | an empty array or empty hash. |
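
A sketch that combines a mandatory parameter with a slurpy hash. The C<tag> subroutine is hypothetical, and a perl with the C<signatures> feature (5.20 or later) is assumed.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical subroutine: a name followed by any number of key/value pairs.
sub tag ($name, %attr) {
    my @parts = ($name);
    push @parts, "$_=$attr{$_}" for sort keys %attr;
    return join ' ', @parts;
}

my $plain = tag('br');                           # %attr is empty
my $full  = tag('a', href => '/x', class => 'y');
my $dup   = tag('a', id => 1, id => 2);          # the later duplicate wins

# An odd number of slurpy arguments throws an exception.
my $odd_ok = eval { tag('a', 'href'); 1 } ? 1 : 0;

print "$plain\n$full\n$dup\nodd_ok=$odd_ok\n";
```

Because the slurpy hash takes everything that remains, nothing can follow it in the signature.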
| 452 | |
| 453 | A signature may be entirely empty, in which case all it does is check |
| 454 | that the caller passed no arguments: |
| 455 | |
| 456 | sub foo () { |
| 457 | return 123; |
| 458 | } |
| 459 | |
| 460 | When using a signature, the arguments are still available in the special |
| 461 | array variable C<@_>, in addition to the lexical variables of the |
| 462 | signature. There is a difference between the two ways of accessing the |
| 463 | arguments: C<@_> I<aliases> the arguments, but the signature variables |
| 464 | get I<copies> of the arguments. So writing to a signature variable |
| 465 | only changes that variable, and has no effect on the caller's variables, |
| 466 | but writing to an element of C<@_> modifies whatever the caller used to |
| 467 | supply that argument. |
| 468 | |
| 469 | There is a potential syntactic ambiguity between signatures and prototypes |
| 470 | (see L</Prototypes>), because both start with an opening parenthesis and |
| 471 | both can appear in some of the same places, such as just after the name |
| 472 | in a subroutine declaration. For historical reasons, when signatures |
| 473 | are not enabled, any opening parenthesis in such a context will trigger |
| 474 | very forgiving prototype parsing. Most signatures will be interpreted |
| 475 | as prototypes in those circumstances, but won't be valid prototypes. |
| 476 | (A valid prototype cannot contain any alphabetic character.) This will |
| 477 | lead to somewhat confusing error messages. |
| 478 | |
| 479 | To avoid ambiguity, when signatures are enabled the special syntax |
| 480 | for prototypes is disabled. There is no attempt to guess whether a |
| 481 | parenthesised group was intended to be a prototype or a signature. |
| 482 | To give a subroutine a prototype under these circumstances, use a |
| 483 | L<prototype attribute|attributes/Built-in Attributes>. For example, |
| 484 | |
| 485 | sub foo :prototype($) { $_[0] } |
| 486 | |
| 487 | It is entirely possible for a subroutine to have both a prototype and |
| 488 | a signature. They do different jobs: the prototype affects compilation |
| 489 | of calls to the subroutine, and the signature puts argument values into |
| 490 | lexical variables at runtime. You can therefore write |
| 491 | |
| 492 | sub foo :prototype($$) ($left, $right) { |
| 493 | return $left + $right; |
| 494 | } |
| 495 | |
| 496 | The prototype attribute, and any other attributes, must come before |
| 497 | the signature. The signature always immediately precedes the block of |
| 498 | the subroutine's body. |
| 499 | |
| 500 | =head2 Private Variables via my() |
| 501 | X<my> X<variable, lexical> X<lexical> X<lexical variable> X<scope, lexical> |
| 502 | X<lexical scope> X<attributes, my> |
| 503 | |
| 504 | Synopsis: |
| 505 | |
| 506 | my $foo; # declare $foo lexically local |
| 507 | my (@wid, %get); # declare list of variables local |
| 508 | my $foo = "flurp"; # declare $foo lexical, and init it |
| 509 | my @oof = @bar; # declare @oof lexical, and init it |
| 510 | my $x : Foo = $y; # similar, with an attribute applied |
| 511 | |
| 512 | B<WARNING>: The use of attribute lists on C<my> declarations is still |
| 513 | evolving. The current semantics and interface are subject to change. |
| 514 | See L<attributes> and L<Attribute::Handlers>. |
| 515 | |
| 516 | The C<my> operator declares the listed variables to be lexically |
| 517 | confined to the enclosing block, conditional (C<if/unless/elsif/else>), |
| 518 | loop (C<for/foreach/while/until/continue>), subroutine, C<eval>, |
| 519 | or C<do/require/use>'d file. If more than one value is listed, the |
| 520 | list must be placed in parentheses. All listed elements must be |
| 521 | legal lvalues. Only alphanumeric identifiers may be lexically |
| 522 | scoped--magical built-ins like C<$/> must currently be C<local>ized |
| 523 | with C<local> instead. |
| 524 | |
| 525 | Unlike dynamic variables created by the C<local> operator, lexical |
| 526 | variables declared with C<my> are totally hidden from the outside |
| 527 | world, including any called subroutines. This is true if it's the |
| 528 | same subroutine called from itself or elsewhere--every call gets |
| 529 | its own copy. |
| 530 | X<local> |
| 531 | |
| 532 | This doesn't mean that a C<my> variable declared in a statically |
| 533 | enclosing lexical scope would be invisible. Only dynamic scopes |
| 534 | are cut off. For example, the C<bumpx()> function below has access |
| 535 | to the lexical $x variable because both the C<my> and the C<sub> |
| 536 | occurred at the same scope, presumably file scope. |
| 537 | |
| 538 | my $x = 10; |
| 539 | sub bumpx { $x++ } |
| 540 | |
| 541 | An C<eval()>, however, can see lexical variables of the scope it is |
| 542 | being evaluated in, so long as the names aren't hidden by declarations within |
| 543 | the C<eval()> itself. See L<perlref>. |
| 544 | X<eval, scope of> |
| 545 | |
| 546 | The parameter list to my() may be assigned to if desired, which allows you |
| 547 | to initialize your variables. (If no initializer is given for a |
| 548 | particular variable, it is created with the undefined value.) Commonly |
| 549 | this is used to name input parameters to a subroutine. Examples: |
| 550 | |
| 551 | $arg = "fred"; # "global" variable |
| 552 | $n = cube_root(27); |
| 553 | print "$arg thinks the root is $n\n"; |
| 554 | # prints "fred thinks the root is 3" |
| 555 | |
| 556 | sub cube_root { |
| 557 | my $arg = shift; # name doesn't matter |
| 558 | $arg **= 1/3; |
| 559 | return $arg; |
| 560 | } |
| 561 | |
| 562 | The C<my> is simply a modifier on something you might assign to. So when |
| 563 | you do assign to variables in its argument list, C<my> doesn't |
| 564 | change whether those variables are viewed as a scalar or an array. So |
| 565 | |
| 566 | my ($foo) = <STDIN>; # WRONG? |
| 567 | my @FOO = <STDIN>; |
| 568 | |
| 569 | both supply a list context to the right-hand side, while |
| 570 | |
| 571 | my $foo = <STDIN>; |
| 572 | |
| 573 | supplies a scalar context. But the following declares only one variable: |
| 574 | |
| 575 | my $foo, $bar = 1; # WRONG |
| 576 | |
| 577 | That has the same effect as |
| 578 | |
| 579 | my $foo; |
| 580 | $bar = 1; |
| 581 | |
| 582 | The declared variable is not introduced (is not visible) until after |
| 583 | the current statement. Thus, |
| 584 | |
| 585 | my $x = $x; |
| 586 | |
| 587 | can be used to initialize a new $x with the value of the old $x, and |
| 588 | the expression |
| 589 | |
| 590 | my $x = 123 and $x == 123 |
| 591 | |
| 592 | is false unless the old $x happened to have the value C<123>. |
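
A minimal sketch of that visibility rule, using a nested block so that both variables named C<$x> are lexicals. The variable C<$inner> is introduced only to carry the value out of the block.

```perl
use strict;
use warnings;

my $x = 10;
my $inner;   # hypothetical helper variable for this illustration
{
    # The $x on the right-hand side is still the outer one, because
    # the new $x is not visible until the statement has finished.
    my $x = $x + 1;
    $inner = $x;
}

print "$x $inner\n";
# prints "10 11"
```

The outer C<$x> is unchanged once the block ends.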
| 593 | |
| 594 | Lexical scopes of control structures are not bounded precisely by the |
| 595 | braces that delimit their controlled blocks; control expressions are |
| 596 | part of that scope, too. Thus in the loop |
| 597 | |
| 598 | while (my $line = <>) { |
| 599 | $line = lc $line; |
| 600 | } continue { |
| 601 | print $line; |
| 602 | } |
| 603 | |
| 604 | the scope of $line extends from its declaration throughout the rest of |
| 605 | the loop construct (including the C<continue> clause), but not beyond |
| 606 | it. Similarly, in the conditional |
| 607 | |
| 608 | if ((my $answer = <STDIN>) =~ /^yes$/i) { |
| 609 | user_agrees(); |
| 610 | } elsif ($answer =~ /^no$/i) { |
| 611 | user_disagrees(); |
| 612 | } else { |
| 613 | chomp $answer; |
| 614 | die "'$answer' is neither 'yes' nor 'no'"; |
| 615 | } |
| 616 | |
| 617 | the scope of $answer extends from its declaration through the rest |
| 618 | of that conditional, including any C<elsif> and C<else> clauses, |
| 619 | but not beyond it. See L<perlsyn/"Simple Statements"> for information |
| 620 | on the scope of variables in statements with modifiers. |
| 621 | |
| 622 | The C<foreach> loop defaults to scoping its index variable dynamically |
| 623 | in the manner of C<local>. However, if the index variable is |
| 624 | prefixed with the keyword C<my>, or if there is already a lexical |
| 625 | by that name in scope, then a new lexical is created instead. Thus |
| 626 | in the loop |
| 627 | X<foreach> X<for> |
| 628 | |
| 629 | for my $i (1, 2, 3) { |
| 630 | some_function(); |
| 631 | } |
| 632 | |
| 633 | the scope of $i extends to the end of the loop, but not beyond it, |
| 634 | rendering the value of $i inaccessible within C<some_function()>. |
| 635 | X<foreach> X<for> |
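
The case where a lexical of the same name is already in scope can be sketched like this. The C<peek> subroutine and the C<@seen> and C<@peeked> arrays are hypothetical names for the example.

```perl
use strict;
use warnings;

my $i = 'before';
sub peek { return $i }   # refers to the file-scoped $i

my (@seen, @peeked);
for $i (1, 2, 3) {       # a new lexical $i is created for the loop
    push @seen,   $i;
    push @peeked, peek();   # the called sub does not see the loop's $i
}

print "@seen | @peeked | $i\n";
# prints "1 2 3 | before before before | before"
```

The loop variable is private to the loop body, and the original C<$i> is intact after the loop finishes.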
| 636 | |
| 637 | Some users may wish to encourage the use of lexically scoped variables. |
| 638 | As an aid to catching implicit uses of package variables, |
| 639 | which are always global, if you say |
| 640 | |
| 641 | use strict 'vars'; |
| 642 | |
| 643 | then any variable mentioned from there to the end of the enclosing |
| 644 | block must either refer to a lexical variable, be predeclared via |
| 645 | C<our> or C<use vars>, or else must be fully qualified with the package name. |
| 646 | A compilation error results otherwise. An inner block may countermand |
| 647 | this with C<no strict 'vars'>. |
| 648 | |
| 649 | A C<my> has both a compile-time and a run-time effect. At compile |
| 650 | time, the compiler takes notice of it. The principal usefulness |
| 651 | of this is to quiet C<use strict 'vars'>, but it is also essential |
| 652 | for generation of closures as detailed in L<perlref>. Actual |
| 653 | initialization is delayed until run time, though, so it gets executed |
| 654 | at the appropriate time, such as each time through a loop, for |
| 655 | example. |
| 656 | |
| 657 | Variables declared with C<my> are not part of any package and are therefore |
| 658 | never fully qualified with the package name. In particular, you're not |
| 659 | allowed to try to make a package variable (or other global) lexical: |
| 660 | |
| 661 | my $pack::var; # ERROR! Illegal syntax |
| 662 | |
| 663 | In fact, dynamic variables (also known as package or global variables) |
| 664 | are still accessible using the fully qualified C<::> notation even while a |
| 665 | lexical of the same name is also visible: |
| 666 | |
| 667 | package main; |
| 668 | local $x = 10; |
| 669 | my $x = 20; |
| 670 | print "$x and $::x\n"; |
| 671 | |
| 672 | That will print out C<20> and C<10>. |
| 673 | |
| 674 | You may declare C<my> variables at the outermost scope of a file |
| 675 | to hide any such identifiers from the world outside that file. This |
| 676 | is similar in spirit to C's static variables when they are used at |
| 677 | the file level. To do this with a subroutine requires the use of |
| 678 | a closure (an anonymous function that accesses enclosing lexicals). |
| 679 | If you want to create a private subroutine that cannot be called |
| 680 | from outside that block, you can declare a lexical variable containing |
| 681 | an anonymous sub reference: |
| 682 | |
| 683 | my $secret_version = '1.001-beta'; |
| 684 | my $secret_sub = sub { print $secret_version }; |
| 685 | &$secret_sub(); |
| 686 | |
| 687 | As long as the reference is never returned by any function within the |
| 688 | module, no outside module can see the subroutine, because its name is not in |
| 689 | any package's symbol table. Remember that it's not I<REALLY> called |
| 690 | C<$some_pack::secret_version> or anything; it's just $secret_version, |
| 691 | unqualified and unqualifiable. |
| 692 | |
| 693 | This does not work with object methods, however; all object methods |
| 694 | have to be in the symbol table of some package to be found. See |
| 695 | L<perlref/"Function Templates"> for something of a work-around to |
| 696 | this. |
| 697 | |
| 698 | =head2 Persistent Private Variables |
| 699 | X<state> X<state variable> X<static> X<variable, persistent> X<variable, static> X<closure> |
| 700 | |
| 701 | There are two ways to build persistent private variables as of Perl 5.10. |
| 702 | First, you can simply use the C<state> feature. Or, you can use closures, |
| 703 | if you want to stay compatible with releases older than 5.10. |
| 704 | |
| 705 | =head3 Persistent variables via state() |
| 706 | |
| 707 | Beginning with Perl 5.10.0, you can declare variables with the C<state> |
| 708 | keyword in place of C<my>. For that to work, though, you must have |
| 709 | enabled that feature beforehand, either by using the C<feature> pragma, or |
| 710 | by using C<-E> on one-liners (see L<feature>). Beginning with Perl 5.16, |
| 711 | the C<CORE::state> form does not require the |
| 712 | C<feature> pragma. |
| 713 | |
| 714 | The C<state> keyword creates a lexical variable (following the same scoping |
| 715 | rules as C<my>) that persists from one subroutine call to the next. If a |
| 716 | state variable resides inside an anonymous subroutine, then each copy of |
| 717 | the subroutine has its own copy of the state variable. However, the value |
| 718 | of the state variable will still persist between calls to the same copy of |
| 719 | the anonymous subroutine. (Don't forget that C<sub { ... }> creates a new |
| 720 | subroutine each time it is executed.) |
| 721 | |
| 722 | For example, the following code maintains a private counter, incremented |
| 723 | each time the gimme_another() function is called: |
| 724 | |
| 725 | use feature 'state'; |
| 726 | sub gimme_another { state $x; return ++$x } |
| 727 | |
| 728 | And this example uses anonymous subroutines to create separate counters: |
| 729 | |
| 730 | use feature 'state'; |
| 731 | sub create_counter { |
| 732 | return sub { state $x; return ++$x } |
| 733 | } |
| 734 | |
| 735 | Also, since C<$x> is lexical, it can't be reached or modified by any Perl |
| 736 | code outside. |
| 737 | |
| 738 | When combined with variable declaration, simple scalar assignment to C<state> |
| 739 | variables (as in C<state $x = 42>) is executed only the first time. When such |
| 740 | statements are evaluated subsequent times, the assignment is ignored. The |
| 741 | behavior of this sort of assignment to non-scalar variables is undefined. |
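
The once-only initialization and the per-closure copies described above can be sketched as follows. This is a minimal illustration, not part of the original text: the names C<next_id> and C<make_counter> and the C<$step> parameter are hypothetical.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical helper: the initializer runs on the first call only.
sub next_id {
    state $id = 100;    # assignment executed once, then ignored
    return $id++;
}

# Hypothetical factory: each closure it returns has its own $n.
sub make_counter {
    my $step = shift;
    return sub { state $n = 0; $n += $step; return $n };
}

my @ids = (next_id(), next_id(), next_id());

my $by_one = make_counter(1);
my $by_ten = make_counter(10);
my @counts = ($by_one->(), $by_one->(), $by_ten->());

print "@ids\n";       # 100 101 102
print "@counts\n";    # 1 2 10
```

The second call to C<$by_one> continues from its own previous value, while C<$by_ten> starts from a fresh copy of the state variable.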
| 742 | |
| 743 | =head3 Persistent variables with closures |
| 744 | |
| 745 | Just because a lexical variable is lexically (also called statically) |
| 746 | scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that |
| 747 | within a function it works like a C static. It normally works more |
| 748 | like a C auto, but with implicit garbage collection. |
| 749 | |
| 750 | Unlike local variables in C or C++, Perl's lexical variables don't |
| 751 | necessarily get recycled just because their scope has exited. |
| 752 | If something more permanent is still aware of the lexical, it will |
| 753 | stick around. So long as something else references a lexical, that |
| 754 | lexical won't be freed--which is as it should be. You wouldn't want |
| 755 | memory being freed until you were done using it, or kept around once you |
| 756 | were done. Automatic garbage collection takes care of this for you. |
| 757 | |
| 758 | This means that you can pass back or save away references to lexical |
| 759 | variables, whereas to return a pointer to a C auto is a grave error. |
| 760 | It also gives us a way to simulate C's function statics. Here's a |
| 761 | mechanism for giving a function private variables with both lexical |
| 762 | scoping and a static lifetime. If you do want to create something like |
| 763 | C's static variables, just enclose the whole function in an extra block, |
| 764 | and put the static variable outside the function but in the block. |
| 765 | |
| 766 | { |
| 767 | my $secret_val = 0; |
| 768 | sub gimme_another { |
| 769 | return ++$secret_val; |
| 770 | } |
| 771 | } |
| 772 | # $secret_val now becomes unreachable by the outside |
| 773 | # world, but retains its value between calls to gimme_another |
| 774 | |
| 775 | If this function is being sourced in from a separate file |
| 776 | via C<require> or C<use>, then this is probably just fine. If it's |
| 777 | all in the main program, you'll need to arrange for the C<my> |
| 778 | to be executed early, either by putting the whole block above |
| 779 | your main program, or more likely, placing merely a C<BEGIN> |
| 780 | code block around it to make sure it gets executed before your program |
| 781 | starts to run: |
| 782 | |
| 783 | BEGIN { |
| 784 | my $secret_val = 0; |
| 785 | sub gimme_another { |
| 786 | return ++$secret_val; |
| 787 | } |
| 788 | } |
| 789 | |
| 790 | See L<perlmod/"BEGIN, UNITCHECK, CHECK, INIT and END"> about the |
| 791 | special triggered code blocks, C<BEGIN>, C<UNITCHECK>, C<CHECK>, |
| 792 | C<INIT> and C<END>. |
| 793 | |
| 794 | If declared at the outermost scope (the file scope), then lexicals |
| 795 | work somewhat like C's file statics. They are available to all |
| 796 | functions in that same file declared below them, but are inaccessible |
| 797 | from outside that file. This strategy is sometimes used in modules |
| 798 | to create private variables that the whole module can see. |
| 799 | |
| 800 | =head2 Temporary Values via local() |
| 801 | X<local> X<scope, dynamic> X<dynamic scope> X<variable, local> |
| 802 | X<variable, temporary> |
| 803 | |
| 804 | B<WARNING>: In general, you should be using C<my> instead of C<local>, because |
| 805 | it's faster and safer. Exceptions to this include the global punctuation |
| 806 | variables, global filehandles and formats, and direct manipulation of the |
| 807 | Perl symbol table itself. C<local> is mostly used when the current value |
| 808 | of a variable must be visible to called subroutines. |
| 809 | |
| 810 | Synopsis: |
| 811 | |
| 812 | # localization of values |
| 813 | |
| 814 | local $foo; # make $foo dynamically local |
| 815 | local (@wid, %get); # make list of variables local |
| 816 | local $foo = "flurp"; # make $foo dynamic, and init it |
| 817 | local @oof = @bar; # make @oof dynamic, and init it |
| 818 | |
| 819 | local $hash{key} = "val"; # sets a local value for this hash entry |
| 820 | delete local $hash{key}; # delete this entry for the current block |
| 821 | local ($cond ? $v1 : $v2); # several types of lvalues support |
| 822 | # localization |
| 823 | |
| 824 | # localization of symbols |
| 825 | |
| 826 | local *FH; # localize $FH, @FH, %FH, &FH ... |
| 827 | local *merlyn = *randal; # now $merlyn is really $randal, plus |
| 828 | # @merlyn is really @randal, etc |
| 829 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal |
| 830 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc |
| 831 | |
| 832 | A C<local> modifies its listed variables to be "local" to the |
| 833 | enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine |
| 834 | called from within that block>. A C<local> just gives temporary |
| 835 | values to global (meaning package) variables. It does I<not> create |
| 836 | a local variable. This is known as dynamic scoping. Lexical scoping |
| 837 | is done with C<my>, which works more like C's auto declarations. |
| 838 | |
| 839 | Some types of lvalues can be localized as well: hash and array elements |
| 840 | and slices, conditionals (provided that their result is always |
| 841 | localizable), and symbolic references. As for simple variables, this |
| 842 | creates new, dynamically scoped values. |
| 843 | |
| 844 | If more than one variable or expression is given to C<local>, they must be |
| 845 | placed in parentheses. This operator works |
| 846 | by saving the current values of those variables in its argument list on a |
| 847 | hidden stack and restoring them upon exiting the block, subroutine, or |
| 848 | eval. This means that called subroutines can also reference the local |
| 849 | variable, but not the global one. The argument list may be assigned to if |
| 850 | desired, which allows you to initialize your local variables. (If no |
| 851 | initializer is given for a particular variable, it is created with an |
| 852 | undefined value.) |
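
A minimal sketch of the save-and-restore behaviour just described, assuming a hypothetical package variable C<$level> and hypothetical subroutines C<show_level> and C<with_local>:

```perl
use strict;
use warnings;

our $level = 'global';

# Reads the package variable; sees whatever value is current at call time.
sub show_level { return $level }

sub with_local {
    local $level = 'temporary';   # old value is saved on a hidden stack
    return show_level();          # the called subroutine sees 'temporary'
}                                 # old value is restored on scope exit

my @seen = (show_level(), with_local(), show_level());
print join(', ', @seen), "\n";    # global, temporary, global
```

Had C<$level> been declared with C<my> inside C<with_local>, C<show_level> would have kept seeing the package variable, because lexical scoping does not extend into called subroutines.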
| 853 | |
| 854 | Because C<local> is a run-time operator, it gets executed each time |
| 855 | through a loop. Consequently, it's more efficient to localize your |
| 856 | variables outside the loop. |
| 857 | |
| 858 | =head3 Grammatical note on local() |
| 859 | X<local, context> |
| 860 | |
| 861 | A C<local> is simply a modifier on an lvalue expression. When you assign to |
| 862 | a C<local>ized variable, the C<local> doesn't change whether its list is viewed |
| 863 | as a scalar or an array. So |
| 864 | |
| 865 | local($foo) = <STDIN>; |
| 866 | local @FOO = <STDIN>; |
| 867 | |
| 868 | both supply a list context to the right-hand side, while |
| 869 | |
| 870 | local $foo = <STDIN>; |
| 871 | |
| 872 | supplies a scalar context. |
| 873 | |
| 874 | =head3 Localization of special variables |
| 875 | X<local, special variable> |
| 876 | |
| 877 | If you localize a special variable, you'll be giving a new value to it, |
| 878 | but its magic won't go away. That means that all side-effects related |
| 879 | to this magic still work with the localized value. |
| 880 | |
| 881 | This feature allows code like this to work: |
| 882 | |
| 883 | # Read the whole contents of FILE in $slurp |
| 884 | { local $/ = undef; $slurp = <FILE>; } |
| 885 | |
| 886 | Note, however, that this restricts localization of some values; for |
| 887 | example, the following statement dies, as of perl 5.10.0, with an error |
| 888 | I<Modification of a read-only value attempted>, because the $1 variable is |
| 889 | magical and read-only: |
| 890 | |
| 891 | local $1 = 2; |
| 892 | |
| 893 | One exception is the default scalar variable: starting with perl 5.14 |
| 894 | C<local($_)> will always strip all magic from $_, to make it possible |
| 895 | to safely reuse $_ in a subroutine. |
| 896 | |
| 897 | B<WARNING>: Localization of tied arrays and hashes does not currently |
| 898 | work as described. |
| 899 | This will be fixed in a future release of Perl; in the meantime, avoid |
| 900 | code that relies on any particular behaviour of localising tied arrays |
| 901 | or hashes (localising individual elements is still okay). |
| 902 | See L<perl58delta/"Localising Tied Arrays and Hashes Is Broken"> for more |
| 903 | details. |
| 904 | X<local, tie> |
| 905 | |
| 906 | =head3 Localization of globs |
| 907 | X<local, glob> X<glob> |
| 908 | |
| 909 | The construct |
| 910 | |
| 911 | local *name; |
| 912 | |
| 913 | creates a whole new symbol table entry for the glob C<name> in the |
| 914 | current package. That means that all variables in its glob slot ($name, |
| 915 | @name, %name, &name, and the C<name> filehandle) are dynamically reset. |
| 916 | |
| 917 | This implies, among other things, that any magic eventually carried by |
| 918 | those variables is locally lost. In other words, saying C<local */> |
| 919 | will not have any effect on the internal value of the input record |
| 920 | separator. |
| 921 | |
| 922 | =head3 Localization of elements of composite types |
| 923 | X<local, composite type element> X<local, array element> X<local, hash element> |
| 924 | |
| 925 | It's also worth taking a moment to explain what happens when you |
| 926 | C<local>ize a member of a composite type (i.e. an array or hash element). |
| 927 | In this case, the element is C<local>ized I<by name>. This means that |
| 928 | when the scope of the C<local()> ends, the saved value will be |
| 929 | restored to the hash element whose key was named in the C<local()>, or |
| 930 | the array element whose index was named in the C<local()>. If that |
| 931 | element was deleted while the C<local()> was in effect (e.g. by a |
| 932 | C<delete()> from a hash or a C<shift()> of an array), it will spring |
| 933 | back into existence, possibly extending an array and filling in the |
| 934 | skipped elements with C<undef>. For instance, if you say |
| 935 | |
| 936 | %hash = ( 'This' => 'is', 'a' => 'test' ); |
| 937 | @ary = ( 0..5 ); |
| 938 | { |
| 939 | local($ary[5]) = 6; |
| 940 | local($hash{'a'}) = 'drill'; |
| 941 | while (my $e = pop(@ary)) { |
| 942 | print "$e . . .\n"; |
| 943 | last unless $e > 3; |
| 944 | } |
| 945 | if (@ary) { |
| 946 | $hash{'only a'} = 'test'; |
| 947 | delete $hash{'a'}; |
| 948 | } |
| 949 | } |
| 950 | print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n"; |
| 951 | print "The array has ",scalar(@ary)," elements: ", |
| 952 | join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n"; |
| 953 | |
| 954 | Perl will print |
| 955 | |
| 956 | 6 . . . |
| 957 | 4 . . . |
| 958 | 3 . . . |
| 959 | This is a test only a test. |
| 960 | The array has 6 elements: 0, 1, 2, undef, undef, 5 |
| 961 | |
| 962 | The behavior of local() on non-existent members of composite |
| 963 | types is subject to change in the future. |
| 964 | |
| 965 | =head3 Localized deletion of elements of composite types |
| 966 | X<delete> X<local, composite type element> X<local, array element> X<local, hash element> |
| 967 | |
| 968 | You can use the C<delete local $array[$idx]> and C<delete local $hash{key}> |
| 969 | constructs to delete a composite type entry for the current block and restore |
| 970 | it when it ends. They return the array/hash value before the localization, |
| 971 | which means that they are respectively equivalent to |
| 972 | |
| 973 | do { |
| 974 | my $val = $array[$idx]; |
| 975 | local $array[$idx]; |
| 976 | delete $array[$idx]; |
| 977 | $val |
| 978 | } |
| 979 | |
| 980 | and |
| 981 | |
| 982 | do { |
| 983 | my $val = $hash{key}; |
| 984 | local $hash{key}; |
| 985 | delete $hash{key}; |
| 986 | $val |
| 987 | } |
| 988 | |
| 989 | except that for those the C<local> is scoped to the C<do> block. Slices are |
| 990 | also accepted. |
| 991 | |
| 992 | my %hash = ( |
| 993 | a => [ 7, 8, 9 ], |
| 994 | b => 1, |
| 995 | ); |
| 996 | |
| 997 | { |
| 998 | my $a = delete local $hash{a}; |
| 999 | # $a is [ 7, 8, 9 ] |
| 1000 | # %hash is (b => 1) |
| 1001 | |
| 1002 | { |
| 1003 | my @nums = delete local @$a[0, 2]; |
| 1004 | # @nums is (7, 9) |
| 1005 | # $a is [ undef, 8 ] |
| 1006 | |
| 1007 | $a->[0] = 999; # will be erased when the scope ends |
| 1008 | } |
| 1009 | # $a is back to [ 7, 8, 9 ] |
| 1010 | |
| 1011 | } |
| 1012 | # %hash is back to its original state |
| 1013 | |
| 1014 | =head2 Lvalue subroutines |
| 1015 | X<lvalue> X<subroutine, lvalue> |
| 1016 | |
| 1017 | It is possible to return a modifiable value from a subroutine. |
| 1018 | To do this, you have to declare the subroutine to return an lvalue. |
| 1019 | |
| 1020 | my $val; |
| 1021 | sub canmod : lvalue { |
| 1022 | $val; # or: return $val; |
| 1023 | } |
| 1024 | sub nomod { |
| 1025 | $val; |
| 1026 | } |
| 1027 | |
| 1028 | canmod() = 5; # assigns to $val |
| 1029 | nomod() = 5; # ERROR |
| 1030 | |
| 1031 | The scalar/list context for the subroutine and for the right-hand |
| 1032 | side of assignment is determined as if the subroutine call were replaced |
| 1033 | by a scalar. For example, consider: |
| 1034 | |
| 1035 | data(2,3) = get_data(3,4); |
| 1036 | |
| 1037 | Both subroutines here are called in a scalar context, while in: |
| 1038 | |
| 1039 | (data(2,3)) = get_data(3,4); |
| 1040 | |
| 1041 | and in: |
| 1042 | |
| 1043 | (data(2),data(3)) = get_data(3,4); |
| 1044 | |
| 1045 | all the subroutines are called in a list context. |
| 1046 | |
| 1047 | Lvalue subroutines are convenient, but you have to keep in mind that, |
| 1048 | when used with objects, they may violate encapsulation. A normal |
| 1049 | mutator can check the supplied argument before setting the attribute |
| 1050 | it is protecting; an lvalue subroutine cannot. If you require any |
| 1051 | special processing when storing and retrieving the values, consider |
| 1052 | using the CPAN module Sentinel or something similar. |
| 1053 | |
| 1054 | =head2 Lexical Subroutines |
| 1055 | X<my sub> X<state sub> X<our sub> X<subroutine, lexical> |
| 1056 | |
| 1057 | B<WARNING>: Lexical subroutines are still experimental. The feature may be |
| 1058 | modified or removed in future versions of Perl. |
| 1059 | |
| 1060 | Lexical subroutines are only available under the C<use feature |
| 1061 | 'lexical_subs'> pragma, which produces a warning unless the |
| 1062 | "experimental::lexical_subs" warnings category is disabled. |
| 1063 | |
| 1064 | Beginning with Perl 5.18, you can declare a private subroutine with C<my> |
| 1065 | or C<state>. As with state variables, the C<state> keyword is only |
| 1066 | available under C<use feature 'state'> or C<use 5.010> or higher. |
| 1067 | |
| 1068 | These subroutines are only visible within the block in which they are |
| 1069 | declared, and only after that declaration: |
| 1070 | |
| 1071 | no warnings "experimental::lexical_subs"; |
| 1072 | use feature 'lexical_subs'; |
| 1073 | |
| 1074 | foo(); # calls the package/global subroutine |
| 1075 | state sub foo { |
| 1076 | foo(); # also calls the package subroutine |
| 1077 | } |
| 1078 | foo(); # calls "state" sub |
| 1079 | my $ref = \&foo; # take a reference to "state" sub |
| 1080 | |
| 1081 | my sub bar { ... } |
| 1082 | bar(); # calls "my" sub |
| 1083 | |
| 1084 | To use a lexical subroutine from inside the subroutine itself, you must |
| 1085 | predeclare it. The C<sub foo {...}> subroutine definition syntax respects |
| 1086 | any previous C<my sub;> or C<state sub;> declaration. |
| 1087 | |
| 1088 | my sub baz; # predeclaration |
| 1089 | sub baz { # define the "my" sub |
| 1090 | baz(); # recursive call |
| 1091 | } |
| 1092 | |
| 1093 | =head3 C<state sub> vs C<my sub> |
| 1094 | |
| 1095 | What is the difference between "state" subs and "my" subs? Each time that |
| 1096 | execution enters a block in which "my" subs are declared, a new copy of each |
| 1097 | sub is created. "State" subroutines persist from one execution of the |
| 1098 | containing block to the next. |
| 1099 | |
| 1100 | So, in general, "state" subroutines are faster. But "my" subs are |
| 1101 | necessary if you want to create closures: |
| 1102 | |
| 1103 | no warnings "experimental::lexical_subs"; |
| 1104 | use feature 'lexical_subs'; |
| 1105 | |
| 1106 | sub whatever { |
| 1107 | my $x = shift; |
| 1108 | my sub inner { |
| 1109 | ... do something with $x ... |
| 1110 | } |
| 1111 | inner(); |
| 1112 | } |
| 1113 | |
| 1114 | In this example, a new C<$x> is created when C<whatever> is called, and |
| 1115 | also a new C<inner>, which can see the new C<$x>. A "state" sub will only |
| 1116 | see the C<$x> from the first call to C<whatever>. |
| 1117 | |
| 1118 | =head3 C<our> subroutines |
| 1119 | |
| 1120 | Like C<our $variable>, C<our sub> creates a lexical alias to the package |
| 1121 | subroutine of the same name. |
| 1122 | |
| 1123 | The two main uses for this are to switch back to using the package sub |
| 1124 | inside an inner scope: |
| 1125 | |
| 1126 | no warnings "experimental::lexical_subs"; |
| 1127 | use feature 'lexical_subs'; |
| 1128 | |
| 1129 | sub foo { ... } |
| 1130 | |
| 1131 | sub bar { |
| 1132 | my sub foo { ... } |
| 1133 | { |
| 1134 | # need to use the outer foo here |
| 1135 | our sub foo; |
| 1136 | foo(); |
| 1137 | } |
| 1138 | } |
| 1139 | |
| 1140 | and to make a subroutine visible to other packages in the same scope: |
| 1141 | |
| 1142 | package MySneakyModule; |
| 1143 | |
| 1144 | no warnings "experimental::lexical_subs"; |
| 1145 | use feature 'lexical_subs'; |
| 1146 | |
| 1147 | our sub do_something { ... } |
| 1148 | |
| 1149 | sub do_something_with_caller { |
| 1150 | package DB; |
| 1151 | () = caller 1; # sets @DB::args |
| 1152 | do_something(@args); # uses MySneakyModule::do_something |
| 1153 | } |
| 1154 | |
| 1155 | =head2 Passing Symbol Table Entries (typeglobs) |
| 1156 | X<typeglob> X<*> |
| 1157 | |
| 1158 | B<WARNING>: The mechanism described in this section was originally |
| 1159 | the only way to simulate pass-by-reference in older versions of |
| 1160 | Perl. While it still works fine in modern versions, the new reference |
| 1161 | mechanism is generally easier to work with. See L</Pass by Reference> below. |
| 1162 | |
| 1163 | Sometimes you don't want to pass the value of an array to a subroutine |
| 1164 | but rather the name of it, so that the subroutine can modify the global |
| 1165 | copy of it rather than working with a local copy. In perl you can |
| 1166 | refer to all objects of a particular name by prefixing the name |
| 1167 | with a star: C<*foo>. This is often known as a "typeglob", because the |
| 1168 | star on the front can be thought of as a wildcard match for all the |
| 1169 | funny prefix characters on variables and subroutines and such. |
| 1170 | |
| 1171 | When evaluated, the typeglob produces a scalar value that represents |
| 1172 | all the objects of that name, including any filehandle, format, or |
| 1173 | subroutine. When assigned to, it causes the name mentioned to refer to |
| 1174 | whatever C<*> value was assigned to it. Example: |
| 1175 | |
| 1176 | sub doubleary { |
| 1177 | local(*someary) = @_; |
| 1178 | foreach $elem (@someary) { |
| 1179 | $elem *= 2; |
| 1180 | } |
| 1181 | } |
| 1182 | doubleary(*foo); |
| 1183 | doubleary(*bar); |
| 1184 | |
| 1185 | Scalars are already passed by reference, so you can modify |
| 1186 | scalar arguments without using this mechanism by referring explicitly |
| 1187 | to C<$_[0]> etc. You can modify all the elements of an array by passing |
| 1188 | all the elements as scalars, but you have to use the C<*> mechanism (or |
| 1189 | the equivalent reference mechanism) to C<push>, C<pop>, or change the size of |
| 1190 | an array. It will certainly be faster to pass the typeglob (or reference). |
| 1191 | |
| 1192 | Even if you don't want to modify an array, this mechanism is useful for |
| 1193 | passing multiple arrays in a single LIST, because normally the LIST |
| 1194 | mechanism will merge all the array values so that you can't extract out |
| 1195 | the individual arrays. For more on typeglobs, see |
| 1196 | L<perldata/"Typeglobs and Filehandles">. |
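
For comparison, here is a sketch of the equivalent reference mechanism mentioned above. The name C<double_array> is hypothetical; it mirrors C<doubleary> but takes an array reference, so it also works on C<my> variables, which have no typeglob.

```perl
use strict;
use warnings;

# Hypothetical reference-based counterpart of doubleary().
sub double_array {
    my ($aref) = @_;
    $_ *= 2 for @$aref;    # modifies the caller's array in place
    return;
}

my @foo = (1, 2, 3);
double_array(\@foo);
print "@foo\n";            # 2 4 6
```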
| 1197 | |
| 1198 | =head2 When to Still Use local() |
| 1199 | X<local> X<variable, local> |
| 1200 | |
| 1201 | Despite the existence of C<my>, there are three places where the |
| 1202 | C<local> operator still shines. In fact, in these three places, you |
| 1203 | I<must> use C<local> instead of C<my>. |
| 1204 | |
| 1205 | =over 4 |
| 1206 | |
| 1207 | =item 1. |
| 1208 | |
| 1209 | You need to give a global variable a temporary value, especially $_. |
| 1210 | |
| 1211 | The global variables, like C<@ARGV> or the punctuation variables, must be |
| 1212 | C<local>ized with C<local()>. This block reads in F</etc/motd>, and splits |
| 1213 | it up into chunks separated by lines of equal signs, which are placed |
| 1214 | in C<@Fields>. |
| 1215 | |
| 1216 | { |
| 1217 | local @ARGV = ("/etc/motd"); |
| 1218 | local $/ = undef; |
| 1219 | local $_ = <>; |
| 1220 | @Fields = split /^\s*=+\s*$/; |
| 1221 | } |
| 1222 | |
| 1223 | In particular, it's important to C<local>ize $_ in any routine that assigns |
| 1224 | to it. Look out for implicit assignments in C<while> conditionals. |
| 1225 | |
| 1226 | =item 2. |
| 1227 | |
| 1228 | You need to create a local file or directory handle or a local function. |
| 1229 | |
| 1230 | A function that needs a filehandle of its own must use |
| 1231 | C<local()> on a complete typeglob. This can be used to create new symbol |
| 1232 | table entries: |
| 1233 | |
| 1234 | sub ioqueue { |
| 1235 | local (*READER, *WRITER); # not my! |
| 1236 | pipe (READER, WRITER) or die "pipe: $!"; |
| 1237 | return (*READER, *WRITER); |
| 1238 | } |
| 1239 | ($head, $tail) = ioqueue(); |
| 1240 | |
| 1241 | See the Symbol module for a way to create anonymous symbol table |
| 1242 | entries. |
| 1243 | |
| 1244 | Because assignment of a reference to a typeglob creates an alias, this |
| 1245 | can be used to create what is effectively a local function, or at least, |
| 1246 | a local alias. |
| 1247 | |
| 1248 | { |
| 1249 | local *grow = \&shrink; # only until this block exits |
| 1250 | grow(); # really calls shrink() |
| 1251 | move(); # if move() grow()s, it shrink()s too |
| 1252 | } |
| 1253 | grow(); # get the real grow() again |
| 1254 | |
| 1255 | See L<perlref/"Function Templates"> for more about manipulating |
| 1256 | functions by name in this way. |
| 1257 | |
| 1258 | =item 3. |
| 1259 | |
| 1260 | You want to temporarily change just one element of an array or hash. |
| 1261 | |
| 1262 | You can C<local>ize just one element of an aggregate. Usually this |
| 1263 | is done on dynamics: |
| 1264 | |
| 1265 | { |
| 1266 | local $SIG{INT} = 'IGNORE'; |
| 1267 | funct(); # uninterruptible |
| 1268 | } |
| 1269 | # interruptibility automatically restored here |
| 1270 | |
| 1271 | But it also works on lexically declared aggregates. |
| 1272 | |
| 1273 | =back |
| 1274 | |
| 1275 | =head2 Pass by Reference |
| 1276 | X<pass by reference> X<pass-by-reference> X<reference> |
| 1277 | |
| 1278 | If you want to pass more than one array or hash into a function--or |
| 1279 | return them from it--and have them maintain their integrity, then |
| 1280 | you're going to have to use an explicit pass-by-reference. Before you |
| 1281 | do that, you need to understand references as detailed in L<perlref>. |
| 1282 | This section may not make much sense to you otherwise. |
| 1283 | |
| 1284 | Here are a few simple examples. First, let's pass in several arrays |
| 1285 | to a function and have it C<pop> all of them, returning a new list |
| 1286 | of all their former last elements: |
| 1287 | |
| 1288 | @tailings = popmany ( \@a, \@b, \@c, \@d ); |
| 1289 | |
| 1290 | sub popmany { |
| 1291 | my $aref; |
| 1292 | my @retlist = (); |
| 1293 | foreach $aref ( @_ ) { |
| 1294 | push @retlist, pop @$aref; |
| 1295 | } |
| 1296 | return @retlist; |
| 1297 | } |
| 1298 | |
| 1299 | Here's how you might write a function that returns a |
| 1300 | list of keys occurring in all the hashes passed to it: |
| 1301 | |
| 1302 | @common = inter( \%foo, \%bar, \%joe ); |
| 1303 | sub inter { |
| 1304 | my ($k, $href, %seen); # locals |
| 1305 | foreach $href (@_) { |
| 1306 | while ( $k = each %$href ) { |
| 1307 | $seen{$k}++; |
| 1308 | } |
| 1309 | } |
| 1310 | return grep { $seen{$_} == @_ } keys %seen; |
| 1311 | } |
| 1312 | |
| 1313 | So far, we're using just the normal list return mechanism. |
| 1314 | What happens if you want to pass or return a hash? Well, |
| 1315 | if you're using only one of them, or you don't mind them |
| 1316 | concatenating, then the normal calling convention is ok, although |
| 1317 | a little expensive. |
| 1318 | |
| 1319 | Where people get into trouble is here: |
| 1320 | |
| 1321 | (@a, @b) = func(@c, @d); |
| 1322 | or |
| 1323 | (%a, %b) = func(%c, %d); |
| 1324 | |
| 1325 | That syntax simply won't work. It sets just C<@a> or C<%a> and |
| 1326 | clears C<@b> or C<%b>. Plus, the function wasn't passed |
| 1327 | two separate arrays or hashes: it got one long list in C<@_>, |
| 1328 | as always. |
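
The flattening can be observed directly. In this sketch, C<passthrough> is a hypothetical function that just returns its arguments:

```perl
use strict;
use warnings;

# Hypothetical function that simply hands back whatever it received.
sub passthrough { return @_ }

my @c = (1, 2);
my @d = (3, 4, 5);
my (@a, @b);

# Both arrays are flattened into one list on the way in and on the
# way out, so the first array on the left swallows everything.
(@a, @b) = passthrough(@c, @d);

print scalar(@a), " ", scalar(@b), "\n";    # 5 0
```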
| 1329 | |
| 1330 | If you can arrange for everyone to deal with this through references, it's |
| 1331 | cleaner code, although not so nice to look at. Here's a function that |
| 1332 | takes two array references as arguments, returning the two references |
| 1333 | ordered by how many elements their arrays contain, larger first: |
| 1334 | |
| 1335 | ($aref, $bref) = func(\@c, \@d); |
| 1336 | print "@$aref has more than @$bref\n"; |
| 1337 | sub func { |
| 1338 | my ($cref, $dref) = @_; |
| 1339 | if (@$cref > @$dref) { |
| 1340 | return ($cref, $dref); |
| 1341 | } else { |
| 1342 | return ($dref, $cref); |
| 1343 | } |
| 1344 | } |
| 1345 | |
| 1346 | It turns out that you can actually do this also: |
| 1347 | |
| 1348 | (*a, *b) = func(\@c, \@d); |
| 1349 | print "@a has more than @b\n"; |
| 1350 | sub func { |
| 1351 | local (*c, *d) = @_; |
| 1352 | if (@c > @d) { |
| 1353 | return (\@c, \@d); |
| 1354 | } else { |
| 1355 | return (\@d, \@c); |
| 1356 | } |
| 1357 | } |
| 1358 | |
| 1359 | Here we're using the typeglobs to do symbol table aliasing. It's |
| 1360 | a tad subtle, though, and also won't work if you're using C<my> |
| 1361 | variables, because only globals (even in disguise as C<local>s) |
| 1362 | are in the symbol table. |
| 1363 | |
| 1364 | If you're passing around filehandles, you could usually just use the bare |
| 1365 | typeglob, like C<*STDOUT>, but typeglob references work, too. |
| 1366 | For example: |
| 1367 | |
| 1368 | splutter(\*STDOUT); |
| 1369 | sub splutter { |
| 1370 | my $fh = shift; |
| 1371 | print $fh "her um well a hmmm\n"; |
| 1372 | } |
| 1373 | |
| 1374 | $rec = get_rec(\*STDIN); |
| 1375 | sub get_rec { |
| 1376 | my $fh = shift; |
| 1377 | return scalar <$fh>; |
| 1378 | } |
| 1379 | |
| 1380 | If you're planning on generating new filehandles, you could do this. |
| 1381 | Note that it passes back just the bare *FH, not a reference to it. |
| 1382 | |
| 1383 | sub openit { |
| 1384 | my $path = shift; |
| 1385 | local *FH; |
| 1386 | return open (FH, $path) ? *FH : undef; |
| 1387 | } |
| 1388 | |
| 1389 | =head2 Prototypes |
| 1390 | X<prototype> X<subroutine, prototype> |
| 1391 | |
| 1392 | Perl supports a very limited kind of compile-time argument checking |
| 1393 | using function prototyping. This can be declared in either the PROTO |
| 1394 | section or with a L<prototype attribute|attributes/Built-in Attributes>. |
| 1395 | If you declare either of |
| 1396 | |
| 1397 | sub mypush (+@) |
| 1398 | sub mypush :prototype(+@) |
| 1399 | |
| 1400 | then C<mypush()> takes arguments exactly like C<push()> does. |
| 1401 | |
| 1402 | If subroutine signatures are enabled (see L</Signatures>), then |
| 1403 | the shorter PROTO syntax is unavailable, because it would clash with |
| 1404 | signatures. In that case, a prototype can only be declared in the form |
| 1405 | of an attribute. |
| 1406 | |
| 1407 | The |
| 1408 | function declaration must be visible at compile time. The prototype |
| 1409 | affects only interpretation of new-style calls to the function, |
| 1410 | where new-style is defined as not using the C<&> character. In |
| 1411 | other words, if you call it like a built-in function, then it behaves |
| 1412 | like a built-in function. If you call it like an old-fashioned |
| 1413 | subroutine, then it behaves like an old-fashioned subroutine. It |
| 1414 | naturally falls out from this rule that prototypes have no influence |
| 1415 | on subroutine references like C<\&foo> or on indirect subroutine |
| 1416 | calls like C<&{$subref}> or C<< $subref->() >>. |
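
A small sketch of that rule, using a hypothetical function C<first_arg> declared with a C<($)> prototype, which imposes scalar context on its argument in new-style calls only:

```perl
use strict;
use warnings;

# Hypothetical function; the ($) prototype affects only new-style calls.
sub first_arg ($) { return $_[0] }

my @list = (10, 20, 30);

my $new_style = first_arg(@list);     # @list in scalar context: its length
my $old_style = &first_arg(@list);    # prototype ignored: list is flattened

my $ref     = \&first_arg;
my $via_ref = $ref->(@list);          # prototype ignored here as well

print "$new_style $old_style $via_ref\n";    # 3 10 10
```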
| 1417 | |
| 1418 | Method calls are not influenced by prototypes either, because the |
| 1419 | function to be called is indeterminate at compile time, since |
| 1420 | the exact code called depends on inheritance. |
| 1421 | |
| 1422 | Because the intent of this feature is primarily to let you define |
| 1423 | subroutines that work like built-in functions, here are prototypes |
| 1424 | for some other functions that parse almost exactly like the |
| 1425 | corresponding built-in. |
| 1426 | |
| 1427 | Declared as             Called as |
| 1428 | |
| 1429 | sub mylink ($$)         mylink $old, $new |
| 1430 | sub myvec ($$$)         myvec $var, $offset, 1 |
| 1431 | sub myindex ($$;$)      myindex &getstring, "substr" |
| 1432 | sub mysyswrite ($$$;$)  mysyswrite $buf, 0, length($buf) - $off, $off |
| 1433 | sub myreverse (@)       myreverse $a, $b, $c |
| 1434 | sub myjoin ($@)         myjoin ":", $a, $b, $c |
| 1435 | sub mypop (+)           mypop @array |
| 1436 | sub mysplice (+$$@)     mysplice @array, 0, 2, @pushme |
| 1437 | sub mykeys (+)          mykeys %{$hashref} |
| 1438 | sub myopen (*;$)        myopen HANDLE, $name |
| 1439 | sub mypipe (**)         mypipe READHANDLE, WRITEHANDLE |
| 1440 | sub mygrep (&@)         mygrep { /foo/ } $a, $b, $c |
| 1441 | sub myrand (;$)         myrand 42 |
| 1442 | sub mytime ()           mytime |
| 1443 | |
| 1444 | Any backslashed prototype character represents an actual argument |
| 1445 | that must start with that character (optionally preceded by C<my>, |
| 1446 | C<our> or C<local>), with the exception of C<$>, which will |
| 1447 | accept any scalar lvalue expression, such as C<$foo = 7> or |
| 1448 | C<< my_function()->[0] >>. The value passed as part of C<@_> will be a |
| 1449 | reference to the actual argument given in the subroutine call, |
| 1450 | obtained by applying C<\> to that argument. |
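
For illustration, here is a small sketch with a hypothetical helper, C<sum_pairs>, whose C<(\@\@)> prototype makes the caller pass two literal arrays while the subroutine receives a reference to each:

```perl
use strict;
use warnings;

# Hypothetical helper: each \@ slot must be written as a real array
# by the caller and arrives in @_ as an ARRAY reference.
sub sum_pairs (\@\@) {
    my ($left, $right) = @_;
    return map { $left->[$_] + $right->[$_] } 0 .. $#{$left};
}

my @x = (1, 2, 3);
my @y = (10, 20, 30);

my @sums = sum_pairs(@x, @y);   # the arrays are not flattened together
print "@sums\n";                # 11 22 33
```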
| 1451 | |
| 1452 | You can use the C<\[]> backslash group notation to specify more than one |
| 1453 | allowed argument type. For example: |
| 1454 | |
| 1455 | sub myref (\[$@%&*]) |
| 1456 | |
| 1457 | will allow calling myref() as |
| 1458 | |
| 1459 | myref $var |
| 1460 | myref @array |
| 1461 | myref %hash |
| 1462 | myref &sub |
| 1463 | myref *glob |
| 1464 | |
| 1465 | and the first argument of myref() will be a reference to |
| 1466 | a scalar, an array, a hash, a subroutine, or a glob. |
| 1467 | |
| 1468 | Unbackslashed prototype characters have special meanings. Any |
| 1469 | unbackslashed C<@> or C<%> eats all remaining arguments, and forces |
| 1470 | list context. An argument represented by C<$> forces scalar context. An |
| 1471 | C<&> requires an anonymous subroutine, which, if passed as the first |
| 1472 | argument, does not require the C<sub> keyword or a subsequent comma. |
| 1473 | |
| 1474 | A C<*> allows the subroutine to accept a bareword, constant, scalar expression, |
| 1475 | typeglob, or a reference to a typeglob in that slot. The value will be |
| 1476 | available to the subroutine either as a simple scalar, or (in the latter |
| 1477 | two cases) as a reference to the typeglob. If you wish to always convert |
| 1478 | such arguments to a typeglob reference, use Symbol::qualify_to_ref() as |
| 1479 | follows: |
| 1480 | |
| 1481 | use Symbol 'qualify_to_ref'; |
| 1482 | |
| 1483 | sub foo (*) { |
| 1484 | my $fh = qualify_to_ref(shift, caller); |
| 1485 | ... |
| 1486 | } |
| 1487 | |
| 1488 | The C<+> prototype is a special alternative to C<$> that will act like |
| 1489 | C<\[@%]> when given a literal array or hash variable, but will otherwise |
| 1490 | force scalar context on the argument. This is useful for functions which |
| 1491 | should accept either a literal array or an array reference as the argument: |
| 1492 | |
| 1493 | sub mypush (+@) { |
| 1494 | my $aref = shift; |
| 1495 | die "Not an array or arrayref" unless ref $aref eq 'ARRAY'; |
| 1496 | push @$aref, @_; |
| 1497 | } |
| 1498 | |
| 1499 | When using the C<+> prototype, your function must check that the argument |
| 1500 | is of an acceptable type. |
| 1501 | |
| 1502 | A semicolon (C<;>) separates mandatory arguments from optional arguments. |
| 1503 | It is redundant before C<@> or C<%>, which gobble up everything else. |
| 1504 | |
| 1505 | As the last character of a prototype, or just before a semicolon, a C<@> |
| 1506 | or a C<%>, you can use C<_> in place of C<$>: if this argument is not |
| 1507 | provided, C<$_> will be used instead. |
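
A brief sketch of the C<_> prototype, assuming Perl 5.10 or later; C<shout> is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical function: with the _ prototype, a missing argument
# is replaced by the current value of $_.
sub shout (_) { return uc $_[0] }

my @out;
push @out, shout("hello");      # explicit argument
for (qw(foo bar)) {
    push @out, shout();         # no argument: $_ is used instead
}

print "@out\n";                 # HELLO FOO BAR
```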
| 1508 | |
| 1509 | Note how the last three examples in the table above are treated |
| 1510 | specially by the parser. C<mygrep()> is parsed as a true list |
| 1511 | operator, C<myrand()> is parsed as a true unary operator with unary |
| 1512 | precedence the same as C<rand()>, and C<mytime()> is truly without |
| 1513 | arguments, just like C<time()>. That is, if you say |
| 1514 | |
| 1515 | mytime +2; |
| 1516 | |
| 1517 | you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed |
| 1518 | without a prototype. If you want to force a unary function to have the |
| 1519 | same precedence as a list operator, add C<;> to the end of the prototype: |
| 1520 | |
| 1521 | sub mygetprotobynumber($;); |
| 1522 | mygetprotobynumber $a > $b; # parsed as mygetprotobynumber($a > $b) |
| 1523 | |
| 1524 | The interesting thing about C<&> is that you can generate new syntax with it, |
| 1525 | provided it's in the initial position: |
| 1526 | X<&> |
| 1527 | |
| 1528 | sub try (&@) { |
| 1529 | my($try,$catch) = @_; |
| 1530 | eval { &$try }; |
| 1531 | if ($@) { |
| 1532 | local $_ = $@; |
| 1533 | &$catch; |
| 1534 | } |
| 1535 | } |
| 1536 | sub catch (&) { $_[0] } |
| 1537 | |
| 1538 | try { |
| 1539 | die "phooey"; |
| 1540 | } catch { |
| 1541 | /phooey/ and print "unphooey\n"; |
| 1542 | }; |
| 1543 | |
| 1544 | That prints C<"unphooey">. (Yes, there are still unresolved |
| 1545 | issues having to do with visibility of C<@_>. I'm ignoring that |
| 1546 | question for the moment. (But note that if we make C<@_> lexically |
| 1547 | scoped, those anonymous subroutines can act like closures... (Gee, |
| 1548 | is this sounding a little Lispish? (Never mind.)))) |
| 1549 | |
| 1550 | And here's a reimplementation of the Perl C<grep> operator: |
| 1551 | X<grep> |
| 1552 | |
| 1553 | sub mygrep (&@) { |
| 1554 | my $code = shift; |
| 1555 | my @result; |
| 1556 | foreach $_ (@_) { |
| 1557 | push(@result, $_) if &$code; |
| 1558 | } |
| 1559 | @result; |
| 1560 | } |
| 1561 | |
| 1562 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have |
| 1563 | been intentionally left out of prototypes for the express purpose of |
| 1564 | someday in the future adding named, formal parameters. The current |
| 1565 | mechanism's main goal is to let module writers provide better diagnostics |
| 1566 | for module users. Larry feels the notation quite understandable to Perl |
| 1567 | programmers, and that it will not intrude greatly upon the meat of the |
| 1568 | module, nor make it harder to read. The line noise is visually |
| 1569 | encapsulated into a small pill that's easy to swallow. |
| 1570 | |
| 1571 | If you try to use an alphanumeric sequence in a prototype you will |
| 1572 | generate an optional warning - "Illegal character in prototype...". |
| 1573 | Unfortunately earlier versions of Perl allowed the prototype to be |
| 1574 | used as long as its prefix was a valid prototype. The warning may be |
| 1575 | upgraded to a fatal error in a future version of Perl once the |
| 1576 | majority of offending code is fixed. |
| 1577 | |
| 1578 | It's probably best to prototype new functions, not retrofit prototyping |
| 1579 | into older ones. That's because you must be especially careful about |
| 1580 | silent impositions of differing list versus scalar contexts. For example, |
| 1581 | if you decide that a function should take just one parameter, like this: |
| 1582 | |
| 1583 | sub func ($) { |
| 1584 | my $n = shift; |
| 1585 | print "you gave me $n\n"; |
| 1586 | } |
| 1587 | |
| 1588 | and someone has been calling it with an array or expression |
| 1589 | returning a list: |
| 1590 | |
| 1591 | func(@foo); |
| 1592 | func( split /:/ ); |
| 1593 | |
| 1594 | Then you've just supplied an automatic C<scalar> in front of their |
| 1595 | argument, which can be more than a bit surprising. The old C<@foo> |
| 1596 | which used to hold one thing doesn't get passed in. Instead, |
| 1597 | C<func()> now gets passed in a C<1>; that is, the number of elements |
| 1598 | in C<@foo>. And the C<split> gets called in scalar context so it |
| 1599 | starts scribbling on your C<@_> parameter list. Ouch! |
| 1600 | |
| 1601 | If a sub has both a PROTO and a BLOCK, the prototype is not applied |
| 1602 | until after the BLOCK is completely defined. This means that a recursive |
| 1603 | function with a prototype has to be predeclared for the prototype to take |
| 1604 | effect, like so: |
| 1605 | |
| 1606 | sub foo($$); |
| 1607 | sub foo($$) { |
| 1608 | foo 1, 2; |
| 1609 | } |
| 1610 | |
| 1611 | This is all very powerful, of course, and should be used only in moderation |
| 1612 | to make the world a better place. |
| 1613 | |
| 1614 | =head2 Constant Functions |
| 1615 | X<constant> |
| 1616 | |
| 1617 | Functions with a prototype of C<()> are potential candidates for |
| 1618 | inlining. If the result after optimization and constant folding |
| 1619 | is either a constant or a lexically-scoped scalar which has no other |
| 1620 | references, then it will be used in place of function calls made |
| 1621 | without C<&>. Calls made using C<&> are never inlined. (See |
| 1622 | F<constant.pm> for an easy way to declare most constants.) |
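
As a rough sketch, the two usual ways of declaring such a constant look like this. The names C<DEBUG> and C<MAX_TRIES> are made up for the example:

```perl
use strict;
use warnings;

use constant DEBUG => 0;          # declared through constant.pm
sub MAX_TRIES () { 3 }            # hand-written sub with an empty prototype

my @log;
push @log, "debugging" if DEBUG;  # DEBUG is false, so nothing is pushed
push @log, "tries=" . MAX_TRIES;  # called without &, so it may be inlined

print "@log\n";                   # tries=3
```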
| 1623 | |
| 1624 | The following functions would all be inlined: |
| 1625 | |
| 1626 | sub pi () { 3.14159 } # Not exact, but close. |
| 1627 | sub PI () { 4 * atan2 1, 1 } # As good as it gets, |
| 1628 | # and it's inlined, too! |
| 1629 | sub ST_DEV () { 0 } |
| 1630 | sub ST_INO () { 1 } |
| 1631 | |
| 1632 | sub FLAG_FOO () { 1 << 8 } |
| 1633 | sub FLAG_BAR () { 1 << 9 } |
| 1634 | sub FLAG_MASK () { FLAG_FOO | FLAG_BAR } |
| 1635 | |
| 1636 | sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) } |
| 1637 | |
| 1638 | sub N () { int(OPT_BAZ) / 3 } |
| 1639 | |
| 1640 | sub FOO_SET () { 1 if FLAG_MASK & FLAG_FOO } |
| 1641 | |
| 1642 | Be aware that these will not be inlined; as they contain inner scopes, |
| 1643 | the constant folding doesn't reduce them to a single constant: |
| 1644 | |
| 1645 | sub foo_set () { if (FLAG_MASK & FLAG_FOO) { 1 } } |
| 1646 | |
| 1647 | sub baz_val () { |
| 1648 | if (OPT_BAZ) { |
| 1649 | return 23; |
| 1650 | } |
| 1651 | else { |
| 1652 | return 42; |
| 1653 | } |
| 1654 | } |
| 1655 | |
| 1656 | If you redefine a subroutine that was eligible for inlining, you'll get |
| 1657 | a warning by default. (You can use this warning to tell whether or not a |
| 1658 | particular subroutine is considered inlinable.) The warning is |
| 1659 | considered severe enough not to be affected by the B<-w> |
| 1660 | switch (or its absence) because previously compiled |
| 1661 | invocations of the function will still be using the old value of the |
| 1662 | function. If you need to be able to redefine the subroutine, you need to |
| 1663 | ensure that it isn't inlined, either by dropping the C<()> prototype |
| 1664 | (which changes calling semantics, so beware) or by thwarting the |
| 1665 | inlining mechanism in some other way, such as |
| 1666 | |
| 1667 | sub not_inlined () { |
| 1668 | 23 if $]; |
| 1669 | } |
| 1670 | |
| 1671 | =head2 Overriding Built-in Functions |
| 1672 | X<built-in> X<override> X<CORE> X<CORE::GLOBAL> |
| 1673 | |
| 1674 | Many built-in functions may be overridden, though this should be tried |
| 1675 | only occasionally and for good reason. Typically this might be |
| 1676 | done by a package attempting to emulate missing built-in functionality |
| 1677 | on a non-Unix system. |
| 1678 | |
| 1679 | Overriding may be done only by importing the name from a module at |
| 1680 | compile time--ordinary predeclaration isn't good enough. However, the |
| 1681 | C<use subs> pragma lets you, in effect, predeclare subs |
| 1682 | via the import syntax, and these names may then override built-in ones: |
| 1683 | |
| 1684 | use subs 'chdir', 'chroot', 'chmod', 'chown'; |
| 1685 | chdir $somewhere; |
| 1686 | sub chdir { ... } |
| 1687 | |
| 1688 | To unambiguously refer to the built-in form, precede the |
| 1689 | built-in name with the special package qualifier C<CORE::>. For example, |
| 1690 | saying C<CORE::open()> always refers to the built-in C<open()>, even |
| 1691 | if the current package has imported some other subroutine called |
| 1692 | C<&open()> from elsewhere. Even though it looks like a regular |
| 1693 | function call, it isn't: the CORE:: prefix in that case is part of Perl's |
| 1694 | syntax, and works for any keyword, regardless of what is in the CORE |
| 1695 | package. Taking a reference to it, that is, C<\&CORE::open>, only works |
| 1696 | for some keywords. See L<CORE>. |
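
The following sketch shows an override and the C<CORE::> form side by side. The override of C<hex> is a hypothetical one that rejects malformed input, and it assumes the name has been predeclared with C<use subs> as described above:

```perl
use strict;
use warnings;
use subs 'hex';                       # import-style predeclaration

# Hypothetical override: return nothing for strings that are not hex.
sub hex {
    my ($str) = @_;
    return unless defined $str && $str =~ /\A(?:0[xX])?[0-9a-fA-F]+\z/;
    return CORE::hex($str);           # always refers to the built-in
}

my $accepted = hex("ff");             # goes through the override
my $rejected = hex("xyz");            # override returns undef here
my $builtin  = CORE::hex("ff");       # bypasses the override

printf "%s %s %s\n",
    $accepted, (defined $rejected ? "defined" : "undef"), $builtin;
```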
| 1697 | |
| 1698 | Library modules should not in general export built-in names like C<open> |
| 1699 | or C<chdir> as part of their default C<@EXPORT> list, because these may |
| 1700 | sneak into someone else's namespace and change the semantics unexpectedly. |
| 1701 | Instead, if the module adds that name to C<@EXPORT_OK>, then it's |
| 1702 | possible for a user to import the name explicitly, but not implicitly. |
| 1703 | That is, they could say |
| 1704 | |
| 1705 | use Module 'open'; |
| 1706 | |
| 1707 | and it would import the C<open> override. But if they said |
| 1708 | |
| 1709 | use Module; |
| 1710 | |
| 1711 | they would get the default imports without overrides. |
| 1712 | |
| 1713 | The foregoing mechanism for overriding built-ins is restricted, quite |
| 1714 | deliberately, to the package that requests the import. There is a second |
| 1715 | method that is sometimes applicable when you wish to override a built-in |
| 1716 | everywhere, without regard to namespace boundaries. This is achieved by |
| 1717 | importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an |
| 1718 | example that quite brazenly replaces the C<glob> operator with something |
| 1719 | that understands regular expressions. |
| 1720 | |
| 1721 | package REGlob; |
| 1722 | require Exporter; |
| 1723 | @ISA = 'Exporter'; |
| 1724 | @EXPORT_OK = 'glob'; |
| 1725 | |
| 1726 | sub import { |
| 1727 | my $pkg = shift; |
| 1728 | return unless @_; |
| 1729 | my $sym = shift; |
| 1730 | my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0)); |
| 1731 | $pkg->export($where, $sym, @_); |
| 1732 | } |
| 1733 | |
| 1734 | sub glob { |
| 1735 | my $pat = shift; |
| 1736 | my @got; |
| 1737 | if (opendir my $d, '.') { |
| 1738 | @got = grep /$pat/, readdir $d; |
| 1739 | closedir $d; |
| 1740 | } |
| 1741 | return @got; |
| 1742 | } |
| 1743 | 1; |
| 1744 | |
| 1745 | And here's how it could be (ab)used: |
| 1746 | |
| 1747 | #use REGlob 'GLOBAL_glob'; # override glob() in ALL namespaces |
| 1748 | package Foo; |
| 1749 | use REGlob 'glob'; # override glob() in Foo:: only |
| 1750 | print for <^[a-z_]+\.pm\$>; # show all pragmatic modules |
| 1751 | |
| 1752 | The initial comment shows a contrived, even dangerous example. |
| 1753 | By overriding C<glob> globally, you would be forcing the new (and |
| 1754 | subversive) behavior for the C<glob> operator for I<every> namespace, |
| 1755 | without the complete cognizance or cooperation of the modules that own |
| 1756 | those namespaces. Naturally, this should be done with extreme caution--if |
| 1757 | it must be done at all. |
| 1758 | |
| 1759 | The C<REGlob> example above does not implement all the support needed to |
| 1760 | cleanly override perl's C<glob> operator. The built-in C<glob> has |
| 1761 | different behaviors depending on whether it appears in a scalar or list |
| 1762 | context, but our C<REGlob> doesn't. Indeed, many Perl built-ins have such |
| 1763 | context-sensitive behaviors, and these must be adequately supported by |
| 1764 | a properly written override. For a fully functional example of overriding |
| 1765 | C<glob>, study the implementation of C<File::DosGlob> in the standard |
| 1766 | library. |
| 1767 | |
| 1768 | When you override a built-in, your replacement should be consistent (if |
| 1769 | possible) with the built-in's native syntax. You can achieve this by using |
| 1770 | a suitable prototype. To get the prototype of an overridable built-in, |
| 1771 | use the C<prototype> function with an argument of C<"CORE::builtin_name"> |
| 1772 | (see L<perlfunc/prototype>). |
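
For example, a short loop such as the one below prints the prototype of a few built-ins. The selection of names is arbitrary. An undefined result means the built-in's syntax cannot be expressed as a prototype:

```perl
use strict;
use warnings;

my %proto;
for my $name (qw(rename time system)) {
    # prototype() accepts the name of a built-in prefixed with "CORE::".
    $proto{$name} = prototype("CORE::$name");
    printf "%-8s %s\n", $name,
        defined $proto{$name} ? "($proto{$name})" : "no prototype";
}
```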
| 1773 | |
| 1774 | Note however that some built-ins can't have their syntax expressed by a |
| 1775 | prototype (such as C<system> or C<chomp>). If you override them you won't |
| 1776 | be able to fully mimic their original syntax. |
| 1777 | |
| 1778 | The built-ins C<do>, C<require> and C<glob> can also be overridden, but due |
| 1779 | to special magic, their original syntax is preserved, and you don't have |
| 1780 | to define a prototype for their replacements. (You can't override the |
| 1781 | C<do BLOCK> syntax, though). |
| 1782 | |
| 1783 | C<require> has special additional dark magic: if you invoke your |
| 1784 | C<require> replacement as C<require Foo::Bar>, it will actually receive |
| 1785 | the argument C<"Foo/Bar.pm"> in @_. See L<perlfunc/require>. |
| 1786 | |
| 1787 | And, as you'll have noticed from the previous example, if you override |
| 1788 | C<glob>, the C<< <*> >> glob operator is overridden as well. |
| 1789 | |
| 1790 | In a similar fashion, overriding the C<readline> function also overrides |
| 1791 | the equivalent I/O operator C<< <FILEHANDLE> >>. Likewise, overriding |
| 1792 | C<readpipe> overrides the operators C<``> and C<qx//>. |
| 1793 | |
| 1794 | Finally, some built-ins (e.g. C<exists> or C<grep>) can't be overridden. |
| 1795 | |
| 1796 | =head2 Autoloading |
| 1797 | X<autoloading> X<AUTOLOAD> |
| 1798 | |
| 1799 | If you call a subroutine that is undefined, you would ordinarily |
| 1800 | get an immediate, fatal error complaining that the subroutine doesn't |
| 1801 | exist. (Likewise for subroutines being used as methods, when the |
| 1802 | method doesn't exist in any base class of the class's package.) |
| 1803 | However, if an C<AUTOLOAD> subroutine is defined in the package or |
| 1804 | packages used to locate the original subroutine, then that |
| 1805 | C<AUTOLOAD> subroutine is called with the arguments that would have |
| 1806 | been passed to the original subroutine. The fully qualified name |
| 1807 | of the original subroutine magically appears in the global $AUTOLOAD |
| 1808 | variable of the same package as the C<AUTOLOAD> routine. The name |
| 1809 | is not passed as an ordinary argument because, er, well, just |
| 1810 | because, that's why. (As an exception, a method call to a nonexistent |
| 1811 | C<import> or C<unimport> method is just skipped instead. Also, if |
| 1812 | the AUTOLOAD subroutine is an XSUB, there are other ways to retrieve the |
| 1813 | subroutine name. See L<perlguts/Autoloading with XSUBs> for details.) |
| 1814 | |
| 1815 | |
| 1816 | Many C<AUTOLOAD> routines load in a definition for the requested |
| 1817 | subroutine using eval(), then execute that subroutine using a special |
| 1818 | form of goto() that erases the stack frame of the C<AUTOLOAD> routine |
| 1819 | without a trace. (See the source to the standard module documented |
| 1820 | in L<AutoLoader>, for example.) But an C<AUTOLOAD> routine can |
| 1821 | also just emulate the routine and never define it. For example, |
| 1822 | let's pretend that a function that wasn't defined should just invoke |
| 1823 | C<system> with those arguments. All you'd do is: |
| 1824 | |
| 1825 | sub AUTOLOAD { |
| 1826 | my $program = $AUTOLOAD; |
| 1827 | $program =~ s/.*:://; |
| 1828 | system($program, @_); |
| 1829 | } |
| 1830 | date(); |
| 1831 | who('am', 'i'); |
| 1832 | ls('-l'); |
| 1833 | |
| 1834 | In fact, if you predeclare functions you want to call that way, you don't |
| 1835 | even need parentheses: |
| 1836 | |
| 1837 | use subs qw(date who ls); |
| 1838 | date; |
| 1839 | who "am", "i"; |
| 1840 | ls '-l'; |
| 1841 | |
| 1842 | A more complete example of this is the Shell module on CPAN, which |
| 1843 | can treat undefined subroutine calls as calls to external programs. |
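
The other approach mentioned earlier, compiling the missing subroutine on first use and then jumping to it with C<goto>, can be sketched roughly as follows. The package name C<Lazy> and the C<%body> table of source strings are hypothetical stand-ins for wherever the real definitions would come from:

```perl
package Lazy;
use strict;
use warnings;

our $AUTOLOAD;

# Hypothetical table of subroutine bodies that are compiled on demand.
my %body = (
    double => 'sub { 2 * $_[0] }',
    square => 'sub { $_[0] ** 2 }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    my $src  = $body{$name} or die "Undefined subroutine $AUTOLOAD called";
    my $code = eval $src    or die $@;
    no strict 'refs';
    *{$AUTOLOAD} = $code;   # install it, so AUTOLOAD is not needed next time
    goto &$code;            # replace this stack frame with the real sub
}

package main;

my $doubled = Lazy::double(21);
my $squared = Lazy::square(5);
my $status  = defined &Lazy::double ? "now defined" : "still missing";

print "$doubled $squared $status\n";
```

After the first call, C<Lazy::double> exists as an ordinary subroutine, so later calls no longer pass through C<AUTOLOAD>.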
| 1844 | |
| 1845 | Mechanisms are available to help module writers split their modules |
| 1846 | into autoloadable files. See the standard AutoLoader module |
| 1847 | described in L<AutoLoader> and in L<AutoSplit>, the standard |
| 1848 | SelfLoader module in L<SelfLoader>, and the document on adding C |
| 1849 | functions to Perl code in L<perlxs>. |
| 1850 | |
| 1851 | =head2 Subroutine Attributes |
| 1852 | X<attribute> X<subroutine, attribute> X<attrs> |
| 1853 | |
| 1854 | A subroutine declaration or definition may have a list of attributes |
| 1855 | associated with it. If such an attribute list is present, it is |
| 1856 | broken up at space or colon boundaries and treated as though a |
| 1857 | C<use attributes> had been seen. See L<attributes> for details |
| 1858 | about what attributes are currently supported. |
| 1859 | Unlike the limitation with the obsolescent C<use attrs>, the |
| 1860 | C<sub : ATTRLIST> syntax works to associate the attributes with |
| 1861 | a pre-declaration, and not just with a subroutine definition. |
| 1862 | |
| 1863 | The attributes must be valid as simple identifier names (without any |
| 1864 | punctuation other than the '_' character). They may have a parameter |
| 1865 | list appended, which is only checked for whether its parentheses ('(',')') |
| 1866 | nest properly. |
| 1867 | |
| 1868 | Examples of valid syntax (even though the attributes are unknown): |
| 1869 | |
| 1870 | sub fnord (&\%) : switch(10,foo(7,3)) : expensive; |
| 1871 | sub plugh () : Ugly('\(") :Bad; |
| 1872 | sub xyzzy : _5x5 { ... } |
| 1873 | |
| 1874 | Examples of invalid syntax: |
| 1875 | |
| 1876 | sub fnord : switch(10,foo(); # ()-string not balanced |
| 1877 | sub snoid : Ugly('('); # ()-string not balanced |
| 1878 | sub xyzzy : 5x5; # "5x5" not a valid identifier |
| 1879 | sub plugh : Y2::north; # "Y2::north" not a simple identifier |
| 1880 | sub snurt : foo + bar; # "+" not a colon or space |
| 1881 | |
| 1882 | The attribute list is passed as a list of constant strings to the code |
| 1883 | which associates them with the subroutine. In particular, the second example |
| 1884 | of valid syntax above currently looks like this in terms of how it's |
| 1885 | parsed and invoked: |
| 1886 | |
| 1887 | use attributes __PACKAGE__, \&plugh, q[Ugly('\(")], 'Bad'; |
| 1888 | |
| 1889 | For further details on attribute lists and their manipulation, |
| 1890 | see L<attributes> and L<Attribute::Handlers>. |
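
As a small illustration using a built-in attribute, the sketch below declares a hypothetical C<counter> subroutine with C<:lvalue> and then asks the C<attributes> module which attributes are attached to it:

```perl
use strict;
use warnings;
use attributes ();               # load the module without importing anything

my $count = 0;

# :lvalue is a built-in attribute; counter is a hypothetical name.
sub counter :lvalue { $count }

counter() = 5;                   # assigns through the lvalue subroutine

my @attrs = attributes::get(\&counter);
print "count=$count attrs=@attrs\n";
```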
| 1891 | |
| 1892 | =head1 SEE ALSO |
| 1893 | |
| 1894 | See L<perlref/"Function Templates"> for more about references and closures. |
| 1895 | See L<perlxs> if you'd like to learn about calling C subroutines from Perl. |
| 1896 | See L<perlembed> if you'd like to learn about calling Perl subroutines from C. |
| 1897 | See L<perlmod> to learn about bundling up your functions in separate files. |
| 1898 | See L<perlmodlib> to learn what library modules come standard on your system. |
| 1899 | See L<perlootut> to learn how to make object method calls. |