pod/perlsub.pod

   1 =head1 NAME
   2
   3 perlsub - Perl subroutines
   4
   5 =head1 SYNOPSIS
   6
   7 To declare subroutines:
   8
   9     sub NAME;             # A "forward" declaration.
  10     sub NAME(PROTO);      #  ditto, but with prototypes
  11
  12     sub NAME BLOCK        # A declaration and a definition.
  13     sub NAME(PROTO) BLOCK #  ditto, but with prototypes
  14
  15 To define an anonymous subroutine at runtime:
  16
  17     $subref = sub BLOCK;            # no proto
  18     $subref = sub (PROTO) BLOCK;    # with proto
  19
  20 To import subroutines:
  21
  22     use MODULE qw(NAME1 NAME2 NAME3);
  23
  24 To call subroutines:
  25
  26     NAME(LIST);    # & is optional with parentheses.
  27     NAME LIST;     # Parentheses optional if predeclared/imported.
  28     &NAME(LIST);   # Circumvent prototypes.
  29     &NAME;         # Makes current @_ visible to called subroutine.
  30
  31 =head1 DESCRIPTION
  32
  33 Like many languages, Perl provides for user-defined subroutines.
  34 These may be located anywhere in the main program, loaded in from
  35 other files via the C<do>, C<require>, or C<use> keywords, or
  36 generated on the fly using C<eval> or anonymous subroutines (closures).
  37 You can even call a function indirectly using a variable containing
  38 its name or a CODE reference.
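
As a minimal sketch of the two indirect call styles just mentioned, the
following calls one subroutine through a CODE reference and through a
variable holding its name.  The subroutine C<greet()> and the variable
names are invented for this illustration.

```perl
use strict;
use warnings;

# greet() is a made-up example subroutine.
sub greet { return "hello, $_[0]" }

# Call through a CODE reference.
my $code   = \&greet;
my $by_ref = $code->("world");

# Call through a variable holding the name (a symbolic reference).
# This is refused under "use strict 'refs'", so it is relaxed locally.
my $name    = "greet";
my $by_name = do { no strict 'refs'; &$name("world") };

print "$by_ref\n$by_name\n";
```

Both calls reach the same subroutine.  The CODE reference form is usually
the safer choice, since it needs no relaxation of C<strict>.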
  39
  40 The Perl model for function call and return values is simple: all
  41 functions are passed as parameters one single flat list of scalars, and
  42 all functions likewise return to their caller one single flat list of
  43 scalars.  Any arrays or hashes in these call and return lists will
  44 collapse, losing their identities--but you may always use
  45 pass-by-reference instead to avoid this.  Both call and return lists may
  46 contain as many or as few scalar elements as you'd like.  (Often a
  47 function without an explicit return statement is called a subroutine, but
  48 there's really no difference from Perl's perspective.)
  49
  50 Any arguments passed in show up in the array C<@_>.  Therefore, if
  51 you called a function with two arguments, those would be stored in
  52 C<$_[0]> and C<$_[1]>.  The array C<@_> is a local array, but its
  53 elements are aliases for the actual scalar parameters.  In particular,
  54 if an element C<$_[0]> is updated, the corresponding argument is
  55 updated (or an error occurs if it is not updatable).  If an argument
  56 is an array or hash element which did not exist when the function
  57 was called, that element is created only when (and if) it is modified
  58 or a reference to it is taken.  (Some earlier versions of Perl
  59 created the element whether or not the element was assigned to.)
  60 Assigning to the whole array C<@_> removes that aliasing, and does
  61 not update any arguments.
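
The aliasing rules above can be sketched as follows.  The helper names
(C<add_one>, C<no_effect>, C<peek>, C<fill>) are hypothetical and exist
only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub add_one   { $_[0]++ }                # modifies the caller's variable
sub no_effect { @_ = (42); $_[0]++ }     # @_ was replaced: aliasing is gone
sub peek      { return defined $_[0] }   # only reads its argument
sub fill      { $_[0] = "set" }          # assigns through the alias

my $n = 1;
add_one($n);        # $n becomes 2
no_effect($n);      # $n stays 2

my %h;
peek($h{absent});                        # does not create $h{absent}
my $created_by_read = exists $h{absent};
fill($h{absent});                        # assignment creates the element
my $created_by_write = exists $h{absent};
```

Here C<$created_by_read> is false and C<$created_by_write> is true: the
hash element comes into existence only once the subroutine assigns to it.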
  62
  63 The return value of a subroutine is the value of the last expression
  64 evaluated.  More explicitly, a C<return> statement may be used to exit the
  65 subroutine, optionally specifying the returned value, which will be
  66 evaluated in the appropriate context (list, scalar, or void) depending
  67 on the context of the subroutine call.  If you specify no return value,
  68 the subroutine returns an empty list in list context, the undefined
  69 value in scalar context, or nothing in void context.  If you return
  70 one or more aggregates (arrays and hashes), these will be flattened
  71 together into one large indistinguishable list.
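
A short sketch of these return rules, using two made-up subroutines,
C<nothing()> and C<pair()>:

```perl
use strict;
use warnings;

sub nothing { return }                 # bare return, no value given

my @list   = nothing();                # list context: empty list
my $scalar = nothing();                # scalar context: undef

sub pair {
    my @nums = (1, 2);
    my %opts = (depth => 3);
    return (@nums, %opts);             # flattened into one list
}
my @flat = pair();                     # four scalars, boundaries lost
```

The caller of C<pair()> cannot tell where the array ended and the hash
began; it simply receives four scalars.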
  72
  73 Perl does not have named formal parameters.  In practice all you
  74 do is assign to a C<my()> list of these.  Variables that aren't
  75 declared to be private are global variables.  For gory details
  76 on creating private variables, see L<"Private Variables via my()">
  77 and L<"Temporary Values via local()">.  To create protected
  78 environments for a set of functions in a separate package (and
  79 probably a separate file), see L<perlmod/"Packages">.
  80
  81 Example:
  82
  83     sub max {
  84         my $max = shift(@_);
  85         foreach my $foo (@_) {
  86             $max = $foo if $max < $foo;
  87         }
  88         return $max;
  89     }
  90     $bestday = max($mon,$tue,$wed,$thu,$fri);
  91
  92 Example:
  93
  94     # get a line, combining continuation lines
  95     #  that start with whitespace
  96
  97     sub get_line {
  98         $thisline = $lookahead;  # global variables!
  99         LINE: while (defined($lookahead = <STDIN>)) {
 100             if ($lookahead =~ /^[ \t]/) {
 101                 $thisline .= $lookahead;
 102             }
 103             else {
 104                 last LINE;
 105             }
 106         }
 107         return $thisline;
 108     }
 109
 110     $lookahead = <STDIN>;       # get first line
 111     while (defined($line = get_line())) {
 112         ...
 113     }
 114
 115 Assigning to a list of private variables to name your arguments:
 116
 117     sub maybeset {
 118         my($key, $value) = @_;
 119         $Foo{$key} = $value unless $Foo{$key};
 120     }
 121
 122 Because the assignment copies the values, this also has the effect
 123 of turning call-by-reference into call-by-value.  Otherwise a
 124 function is free to do in-place modifications of C<@_> and change
 125 its caller's values.
 126
 127     upcase_in($v1, $v2);  # this changes $v1 and $v2
 128     sub upcase_in {
 129         for (@_) { tr/a-z/A-Z/ }
 130     }
 131
 132 You aren't allowed to modify constants in this way, of course.  If an
 133 argument were actually literal and you tried to change it, you'd take a
 134 (presumably fatal) exception.   For example, this won't work:
 135
 136     upcase_in("frederick");
 137
 138 It would be much safer if the C<upcase_in()> function
 139 were written to return a copy of its parameters instead
 140 of changing them in place:
 141
 142     ($v3, $v4) = upcase($v1, $v2);  # this doesn't change $v1 and $v2
 143     sub upcase {
 144         return unless defined wantarray;  # void context, do nothing
 145         my @parms = @_;
 146         for (@parms) { tr/a-z/A-Z/ }
 147         return wantarray ? @parms : $parms[0];
 148     }
 149
 150 Notice how this (unprototyped) function doesn't care whether it was
 151 passed real scalars or arrays.  Perl sees all arguments as one big,
 152 long, flat parameter list in C<@_>.  This is one area where
 153 Perl's simple argument-passing style shines.  The C<upcase()>
 154 function would work perfectly well without changing the C<upcase()>
 155 definition even if we fed it things like this:
 156
 157     @newlist   = upcase(@list1, @list2);
 158     @newlist   = upcase( split /:/, $var );
 159
 160 Do not, however, be tempted to do this:
 161
 162     (@a, @b)   = upcase(@list1, @list2);
 163
 164 Like the flattened incoming parameter list, the return list is also
 165 flattened on return.  So all you have managed to do here is store
 166 everything in C<@a> and make C<@b> an empty list.  See L<Pass by
 167 Reference> for alternatives.
 168
 169 A subroutine may be called using an explicit C<&> prefix.  The
 170 C<&> is optional in modern Perl, as are parentheses if the
 171 subroutine has been predeclared.  The C<&> is I<not> optional
 172 when just naming the subroutine, such as when it's used as
 173 an argument to defined() or undef().  Nor is it optional when you
 174 want to do an indirect subroutine call with a subroutine name or
 175 reference using the C<&$subref()> or C<&{$subref}()> constructs,
 176 although the C<$subref-E<gt>()> notation solves that problem.
 177 See L<perlref> for more about all that.
 178
 179 Subroutines may be called recursively.  If a subroutine is called
 180 using the C<&> form, the argument list is optional, and if omitted,
 181 no C<@_> array is set up for the subroutine: the C<@_> array at the
 182 time of the call is visible to the subroutine instead.  This is an
 183 efficiency mechanism that new users may wish to avoid.
 184
 185     &foo(1,2,3);        # pass three arguments
 186     foo(1,2,3);         # the same
 187
 188     foo();              # pass a null list
 189     &foo();             # the same
 190
 191     &foo;               # foo() gets current args, like foo(@_) !!
 192     foo;                # like foo() IFF sub foo predeclared, else "foo"
 193
 194 Not only does the C<&> form make the argument list optional, it also
 195 disables any prototype checking on arguments you do provide.  This
 196 is partly for historical reasons, and partly for having a convenient way
 197 to cheat if you know what you're doing.  See L<Prototypes> below.
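
The sharing of C<@_> by the bare C<&foo;> form can be seen in a small
sketch.  The three subroutine names below are invented for the example.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub via_amp    { return &count_args; }    # count_args sees via_amp's @_
sub via_parens { return count_args(); }   # count_args gets an empty @_

my $shared = via_amp(1, 2, 3);      # 3
my $fresh  = via_parens(1, 2, 3);   # 0
```

Because the callee sees (and could modify) the caller's C<@_>, the bare
C<&> form is easy to misuse, which is one reason to prefer an explicit
argument list.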
 198
 199 Functions whose names are in all upper case are reserved to the Perl
 200 core, as are modules whose names are in all lower case.  A
 201 function in all capitals is a loosely-held convention meaning it
 202 will be called indirectly by the run-time system itself, usually
 203 due to a triggered event.  Functions that do special, pre-defined
 204 things include C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus
 205 all functions mentioned in L<perltie>.  The 5.005 release adds
 206 C<INIT> to this list.
 207
 208 =head2 Private Variables via my()
 209
 210 Synopsis:
 211
 212     my $foo;            # declare $foo lexically local
 213     my (@wid, %get);    # declare list of variables local
 214     my $foo = "flurp";  # declare $foo lexical, and init it
 215     my @oof = @bar;     # declare @oof lexical, and init it
 216
 217 The C<my> operator declares the listed variables to be lexically
 218 confined to the enclosing block, conditional (C<if/unless/elsif/else>),
 219 loop (C<for/foreach/while/until/continue>), subroutine, C<eval>,
 220 or C<do/require/use>'d file.  If more than one value is listed, the
 221 list must be placed in parentheses.  All listed elements must be
 222 legal lvalues.  Only alphanumeric identifiers may be lexically
 223 scoped--magical built-ins like C<$/> must currently be C<local>ized
 224 with C<local> instead.
 225
 226 Unlike dynamic variables created by the C<local> operator, lexical
 227 variables declared with C<my> are totally hidden from the outside
 228 world, including any called subroutines.  This is true if it's the
 229 same subroutine called from itself or elsewhere--every call gets
 230 its own copy.
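
To illustrate that every call gets its own copy, here is a sketch using a
hypothetical recursive subroutine, C<countdown()>.  Each level of
recursion has a separate C<$n> and C<@seen>, so inner calls never disturb
the values held by outer ones.

```perl
use strict;
use warnings;

sub countdown {
    my ($n) = @_;
    my @seen = ($n);                          # fresh @seen on every call
    push @seen, countdown($n - 1) if $n > 0;  # inner call has its own $n
    return @seen;
}

my @order = countdown(3);   # (3, 2, 1, 0)
```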
 231
 232 This doesn't mean that a C<my> variable declared in a statically
 233 enclosing lexical scope would be invisible.  Only dynamic scopes
 234 are cut off.   For example, the C<bumpx()> function below has access
 235 to the lexical $x variable because both the C<my> and the C<sub>
 236 occurred at the same scope, presumably file scope.
 237
 238     my $x = 10;
 239     sub bumpx { $x++ }
 240
 241 An C<eval()>, however, can see lexical variables of the scope it is
 242 being evaluated in, so long as the names aren't hidden by declarations within
 243 the C<eval()> itself.  See L<perlref>.
 244
 245 The parameter list to my() may be assigned to if desired, which allows you
 246 to initialize your variables.  (If no initializer is given for a
 247 particular variable, it is created with the undefined value.)  Commonly
 248 this is used to name input parameters to a subroutine.  Examples:
 249
 250     $arg = "fred";        # "global" variable
 251     $n = cube_root(27);
 252     print "$arg thinks the root is $n\n";
 253     # prints: fred thinks the root is 3
 254
 255     sub cube_root {
 256         my $arg = shift;  # name doesn't matter
 257         $arg **= 1/3;
 258         return $arg;
 259     }
 260
 261 The C<my> is simply a modifier on something you might assign to.  So when
 262 you do assign to variables in its argument list, C<my> doesn't
 263 change whether those variables are viewed as a scalar or an array.  So
 264
 265     my ($foo) = <STDIN>;                # WRONG?
 266     my @FOO = <STDIN>;
 267
 268 both supply a list context to the right-hand side, while
 269
 270     my $foo = <STDIN>;
 271
 272 supplies a scalar context.  But the following declares only one variable:
 273
 274     my $foo, $bar = 1;                  # WRONG
 275
 276 That has the same effect as
 277
 278     my $foo;
 279     $bar = 1;
 280
 281 The declared variable is not introduced (is not visible) until after
 282 the current statement.  Thus,
 283
 284     my $x = $x;
 285
 286 can be used to initialize a new $x with the value of the old $x, and
 287 the expression
 288
 289     my $x = 123 and $x == 123
 290
 291 is false unless the old $x happened to have the value C<123>.
 292
 293 Lexical scopes of control structures are not bounded precisely by the
 294 braces that delimit their controlled blocks; control expressions are
 295 part of that scope, too.  Thus in the loop
 296
 297     while (my $line = <>) {
 298         $line = lc $line;
 299     } continue {
 300         print $line;
 301     }
 302
 303 the scope of $line extends from its declaration throughout the rest of
 304 the loop construct (including the C<continue> clause), but not beyond
 305 it.  Similarly, in the conditional
 306
 307     if ((my $answer = <STDIN>) =~ /^yes$/i) {
 308         user_agrees();
 309     } elsif ($answer =~ /^no$/i) {
 310         user_disagrees();
 311     } else {
 312         chomp $answer;
 313         die "'$answer' is neither 'yes' nor 'no'";
 314     }
 315
 316 the scope of $answer extends from its declaration through the rest
 317 of that conditional, including any C<elsif> and C<else> clauses,
 318 but not beyond it.
 319
 320 None of the foregoing text applies to C<if/unless> or C<while/until>
 321 modifiers appended to simple statements.  Such modifiers are not
 322 control structures and have no effect on scoping.
 323
 324 The C<foreach> loop defaults to scoping its index variable dynamically
 325 in the manner of C<local>.  However, if the index variable is
 326 prefixed with the keyword C<my>, or if there is already a lexical
 327 by that name in scope, then a new lexical is created instead.  Thus
 328 in the loop
 329
 330     for my $i (1, 2, 3) {
 331         some_function();
 332     }
 333
 334 the scope of $i extends to the end of the loop, but not beyond it,
 335 rendering the value of $i inaccessible within C<some_function()>.
 336
 337 Some users may wish to encourage the use of lexically scoped variables.
 338 As an aid to catching implicit uses of package variables,
 339 which are always global, if you say
 340
 341     use strict 'vars';
 342
 343 then any variable mentioned from there to the end of the enclosing
 344 block must either refer to a lexical variable, be predeclared via
 345 C<use vars>, or else must be fully qualified with the package name.
 346 A compilation error results otherwise.  An inner block may countermand
 347 this with C<no strict 'vars'>.
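
A brief sketch of what C<use strict 'vars'> accepts and rejects.  The
variable names are made up, and the string C<eval> is used here only so
that the compile-time failure can be caught and displayed rather than
aborting the program.

```perl
use strict 'vars';
use vars qw($count);          # predeclared package variable

my $total = 0;                # lexical: fine
$count = 3;                   # predeclared via "use vars": fine
$main::label = "items";       # fully qualified: fine

# An undeclared, unqualified name is rejected when the code is compiled.
my $ok  = eval q{ $undeclared = 1; 1 };
my $err = $@;
print "compile error: $err" unless $ok;
```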
 348
 349 A C<my> has both a compile-time and a run-time effect.  At compile
 350 time, the compiler takes notice of it.  The principal usefulness
 351 of this is to quiet C<use strict 'vars'>, but it is also essential
 352 for generation of closures as detailed in L<perlref>.  Actual
 353 initialization is delayed until run time, though, so it gets executed
 354 at the appropriate time, such as each time through a loop, for
 355 example.
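
The run-time half of C<my> is what lets closures made in a loop each
capture a distinct variable.  This is a small sketch with invented names:

```perl
use strict;
use warnings;

my @subs;
for my $i (1 .. 3) {
    my $square = $i * $i;                # a new $square each iteration
    push @subs, sub { return $square };  # each closure keeps its own one
}

my @results = map { $_->() } @subs;      # (1, 4, 9)
```

Had C<$square> been a single package variable, all three closures would
have reported whatever value it held last.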
 356
 357 Variables declared with C<my> are not part of any package and are therefore
 358 never fully qualified with the package name.  In particular, you're not
 359 allowed to try to make a package variable (or other global) lexical:
 360
 361     my $pack::var;      # ERROR!  Illegal syntax
 362     my $_;              # also illegal (currently)
 363
 364 In fact, dynamic variables (also known as package or global variables)
 365 are still accessible using the fully qualified C<::> notation even while a
 366 lexical of the same name is also visible:
 367
 368     package main;
 369     local $x = 10;
 370     my    $x = 20;
 371     print "$x and $::x\n";
 372
 373 That will print out C<20> and C<10>.
 374
 375 You may declare C<my> variables at the outermost scope of a file
 376 to hide any such identifiers from the world outside that file.  This
 377 is similar in spirit to C's static variables when they are used at
 378 the file level.  To do this with a subroutine requires the use of
 379 a closure (an anonymous function that accesses enclosing lexicals).
 380 If you want to create a private subroutine that cannot be called
 381 from outside that block, you can declare a lexical variable containing
 382 an anonymous sub reference:
 383
 384     my $secret_version = '1.001-beta';
 385     my $secret_sub = sub { print $secret_version };
 386     &$secret_sub();
 387
 388 As long as the reference is never returned by any function within the
 389 module, no outside module can see the subroutine, because its name is not in
 390 any package's symbol table.  Remember that it's not I<REALLY> called
 391 C<$some_pack::secret_version> or anything; it's just $secret_version,
 392 unqualified and unqualifiable.
 393
 394 This does not work with object methods, however; all object methods
 395 have to be in the symbol table of some package to be found.  See
 396 L<perlref/"Function Templates"> for something of a work-around to
 397 this.
 398
 399 =head2 Persistent Private Variables
 400
 401 Just because a lexical variable is lexically (also called statically)
 402 scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that
 403 within a function it works like a C static.  It normally works more
 404 like a C auto, but with implicit garbage collection.
 405
 406 Unlike local variables in C or C++, Perl's lexical variables don't
 407 necessarily get recycled just because their scope has exited.
 408 If something more permanent is still aware of the lexical, it will
 409 stick around.  So long as something else references a lexical, that
 410 lexical won't be freed--which is as it should be.  You wouldn't want
 411 memory being freed until you were done using it, or kept around once you
 412 were done.  Automatic garbage collection takes care of this for you.
 413
 414 This means that you can pass back or save away references to lexical
 415 variables, whereas to return a pointer to a C auto is a grave error.
 416 It also gives us a way to simulate C's function statics.  Here's a
 417 mechanism for giving a function private variables with both lexical
 418 scoping and a static lifetime.  If you do want to create something like
 419 C's static variables, just enclose the whole function in an extra block,
 420 and put the static variable outside the function but in the block.
 421
 422     {
 423         my $secret_val = 0;
 424         sub gimme_another {
 425             return ++$secret_val;
 426         }
 427     }
 428     # $secret_val now becomes unreachable by the outside
 429     # world, but retains its value between calls to gimme_another
 430
 431 If this function is being sourced in from a separate file
 432 via C<require> or C<use>, then this is probably just fine.  If it's
 433 all in the main program, you'll need to arrange for the C<my>
 434 to be executed early, either by putting the whole block above
 435 your main program, or more likely, placing merely a C<BEGIN>
 436 sub around it to make sure it gets executed before your program
 437 starts to run:
 438
 439     sub BEGIN {
 440         my $secret_val = 0;
 441         sub gimme_another {
 442             return ++$secret_val;
 443         }
 444     }
 445
 446 See L<perlmod/"Package Constructors and Destructors"> about the
 447 special triggered functions, C<BEGIN> and C<INIT>.
 448
 449 If declared at the outermost scope (the file scope), then lexicals
 450 work somewhat like C's file statics.  They are available to all
 451 functions in that same file declared below them, but are inaccessible
 452 from outside that file.  This strategy is sometimes used in modules
 453 to create private variables that the whole module can see.
 454
 455 =head2 Temporary Values via local()
 456
 457 B<WARNING>: In general, you should be using C<my> instead of C<local>, because
 458 it's faster and safer.  Exceptions to this include the global punctuation
 459 variables, filehandles and formats, and direct manipulation of the Perl
 460 symbol table itself.  Format variables often use C<local> though, as do
 461 other variables whose current value must be visible to called
 462 subroutines.
 463
 464 Synopsis:
 465
 466     local $foo;                 # declare $foo dynamically local
 467     local (@wid, %get);         # declare list of variables local
 468     local $foo = "flurp";       # declare $foo dynamic, and init it
 469     local @oof = @bar;          # declare @oof dynamic, and init it
 470
 471     local *FH;                  # localize $FH, @FH, %FH, &FH  ...
 472     local *merlyn = *randal;    # now $merlyn is really $randal, plus
 473                                 #     @merlyn is really @randal, etc
 474     local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
 475     local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc
 476
 477 A C<local> modifies its listed variables to be "local" to the
 478 enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine
 479 called from within that block>.  A C<local> just gives temporary
 480 values to global (meaning package) variables.  It does I<not> create
 481 a local variable.  This is known as dynamic scoping.  Lexical scoping
 482 is done with C<my>, which works more like C's auto declarations.
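
The difference between the two kinds of scoping can be sketched with a
package variable that a called subroutine reads.  The names C<$mode>,
C<show_mode>, C<with_local>, and C<with_my> are assumptions made up for
this example.

```perl
use strict;
use warnings;
use vars qw($mode);

$mode = "normal";

sub show_mode  { return $mode }    # always reads the package variable

# local: the temporary value is seen by subroutines called from here.
sub with_local { local $mode = "temporary"; return show_mode() }

# my: a new lexical that show_mode() knows nothing about.
sub with_my    { my $mode = "private"; return show_mode() }

my $dynamic = with_local();   # "temporary"
my $lexical = with_my();      # "normal"
my $after   = $mode;          # "normal": the old value was restored
```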
 483
 484 If more than one variable is given to C<local>, they must be placed in
 485 parentheses.  All listed elements must be legal lvalues.  This operator works
 486 by saving the current values of those variables in its argument list on a
 487 hidden stack and restoring them upon exiting the block, subroutine, or
 488 eval.  This means that called subroutines can also reference the local
 489 variable, but not the global one.  The argument list may be assigned to if
 490 desired, which allows you to initialize your local variables.  (If no
 491 initializer is given for a particular variable, it is created with an
 492 undefined value.)  Commonly this is used to name the parameters to a
 493 subroutine.  Examples:
 494
 495     for $i ( 0 .. 9 ) {
 496         $digits{$i} = $i;
 497     }
 498     # assume this function uses global %digits hash
 499     parse_num();
 500
 501     # now temporarily add to %digits hash
 502     if ($base12) {
 503         # (NOTE: not claiming this is efficient!)
 504         local %digits  = (%digits, 't' => 10, 'e' => 11);
 505         parse_num();  # parse_num gets this new %digits!
 506     }
 507     # old %digits restored here
 508
 509 Because C<local> is a run-time operator, it gets executed each time
 510 through a loop.  In releases of Perl previous to 5.0, this used more stack
 511 storage each time until the loop was exited.  Perl now reclaims the space
 512 each time through, but it's still more efficient to declare your variables
 513 outside the loop.
 514
 515 A C<local> is simply a modifier on an lvalue expression.  When you assign to
 516 a C<local>ized variable, the C<local> doesn't change whether its list is viewed
 517 as a scalar or an array.  So
 518
 519     local($foo) = <STDIN>;
 520     local @FOO = <STDIN>;
 521
 522 both supply a list context to the right-hand side, while
 523
 524     local $foo = <STDIN>;
 525
 526 supplies a scalar context.
 527
 528 A note about C<local()> and composite types is in order.  Something
 529 like C<local(%foo)> works by temporarily placing a brand new hash in
 530 the symbol table.  The old hash is left alone, but is hidden "behind"
 531 the new one.
 532
 533 This means the old variable is completely invisible via the symbol
 534 table (i.e. the hash entry in the C<*foo> typeglob) for the duration
 535 of the dynamic scope within which the C<local()> was seen.  This
 536 has the effect of allowing one to temporarily occlude any magic on
 537 composite types.  For instance, this will briefly alter a tied
 538 hash to some other implementation:
 539
 540     tie %ahash, 'APackage';
 541     [...]
 542     {
 543        local %ahash;
 544        tie %ahash, 'BPackage';
 545        [..called code will see %ahash tied to 'BPackage'..]
 546        {
 547           local %ahash;
 548           [..%ahash is a normal (untied) hash here..]
 549        }
 550     }
 551     [..%ahash back to its initial tied self again..]
 552
 553 As another example, a custom implementation of C<%ENV> might look
 554 like this:
 555
 556     {
 557         local %ENV;
 558         tie %ENV, 'MyOwnEnv';
 559         [..do your own fancy %ENV manipulation here..]
 560     }
 561     [..normal %ENV behavior here..]
 562
 563 It's also worth taking a moment to explain what happens when you
 564 C<local>ize a member of a composite type (i.e. an array or hash element).
 565 In this case, the element is C<local>ized I<by name>. This means that
 566 when the scope of the C<local()> ends, the saved value will be
 567 restored to the hash element whose key was named in the C<local()>, or
 568 the array element whose index was named in the C<local()>.  If that
 569 element was deleted while the C<local()> was in effect (e.g. by a
 570 C<delete()> from a hash or a C<shift()> of an array), it will spring
 571 back into existence, possibly extending an array and filling in the
 572 skipped elements with C<undef>.  For instance, if you say
 573
 574     %hash = ( 'This' => 'is', 'a' => 'test' );
 575     @ary  = ( 0..5 );
 576     {
 577          local($ary[5]) = 6;
 578          local($hash{'a'}) = 'drill';
 579          while (my $e = pop(@ary)) {
 580              print "$e . . .\n";
 581              last unless $e > 3;
 582          }
 583          if (@ary) {
 584              $hash{'only a'} = 'test';
 585              delete $hash{'a'};
 586          }
 587     }
 588     print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
 589     print "The array has ",scalar(@ary)," elements: ",
 590           join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";
 591
 592 Perl will print
 593
 594     6 . . .
 595     4 . . .
 596     3 . . .
 597     This is a test only a test.
 598     The array has 6 elements: 0, 1, 2, undef, undef, 5
 599
 600 The behavior of local() on non-existent members of composite
 601 types is subject to change in the future.
 602
 603 =head2 Passing Symbol Table Entries (typeglobs)
 604
 605 B<WARNING>: The mechanism described in this section was originally
 606 the only way to simulate pass-by-reference in older versions of
 607 Perl.  While it still works fine in modern versions, the new reference
 608 mechanism is generally easier to work with.  See below.
 609
 610 Sometimes you don't want to pass the value of an array to a subroutine
 611 but rather the name of it, so that the subroutine can modify the global
 612 copy of it rather than working with a local copy.  In Perl you can
 613 refer to all objects of a particular name by prefixing the name
 614 with a star: C<*foo>.  This is often known as a "typeglob", because the
 615 star on the front can be thought of as a wildcard match for all the
 616 funny prefix characters on variables and subroutines and such.
 617
 618 When evaluated, the typeglob produces a scalar value that represents
 619 all the objects of that name, including any filehandle, format, or
 620 subroutine.  When assigned to, it causes the name mentioned to refer to
 621 whatever C<*> value was assigned to it.  Example:
 622
 623     sub doubleary {
 624         local(*someary) = @_;
 625         foreach $elem (@someary) {
 626             $elem *= 2;
 627         }
 628     }
 629     doubleary(*foo);
 630     doubleary(*bar);
 631
 632 Scalars are already passed by reference, so you can modify
 633 scalar arguments without using this mechanism by referring explicitly
 634 to C<$_[0]> etc.  You can modify all the elements of an array by passing
 635 all the elements as scalars, but you have to use the C<*> mechanism (or
 636 the equivalent reference mechanism) to C<push>, C<pop>, or change the size of
 637 an array.  It will certainly be faster to pass the typeglob (or reference).
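
For comparison, here is a sketch of the equivalent reference mechanism
mentioned above.  The subroutine name C<double_and_grow> is hypothetical;
it doubles each element and also changes the size of the caller's array,
which cannot be done by passing the elements alone.

```perl
use strict;
use warnings;

sub double_and_grow {
    my ($aref) = @_;
    $_ *= 2 for @$aref;     # modify the elements in place
    push @$aref, 0;         # resize the caller's array
}

my @nums = (1, 2, 3);
double_and_grow(\@nums);    # @nums is now (2, 4, 6, 0)
```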
 638
 639 Even if you don't want to modify an array, this mechanism is useful for
 640 passing multiple arrays in a single LIST, because normally the LIST
 641 mechanism will merge all the array values so that you can't extract out
 642 the individual arrays.  For more on typeglobs, see
 643 L<perldata/"Typeglobs and Filehandles">.
 644
 645 =head2 When to Still Use local()
 646
 647 Despite the existence of C<my>, there are three places where the
 648 C<local> operator still shines.  In fact, in these three places, you
 649 I<must> use C<local> instead of C<my>.
 650
 651 =over
 652
 653 =item 1. You need to give a global variable a temporary value, especially $_.
 654
 655 The global variables, like C<@ARGV> or the punctuation variables, must be
 656 C<local>ized with C<local()>.  This block reads in F</etc/motd>, and splits
 657 it up into chunks separated by lines of equal signs, which are placed
 658 in C<@Fields>.
 659
 660     {
 661         local @ARGV = ("/etc/motd");
 662         local $/ = undef;
 663         local $_ = <>;
 664         @Fields = split /^\s*=+\s*$/;
 665     }
 666
 667 In particular, it's important to C<local>ize $_ in any routine that assigns
 668 to it.  Look out for implicit assignments in C<while> conditionals.
 669
 670 =item 2. You need to create a local file or directory handle or a local function.
 671
 672 A function that needs a filehandle of its own must use
 673 C<local()> on a complete typeglob.   This can be used to create new symbol
 674 table entries:
 675
 676     sub ioqueue {
 677         local  (*READER, *WRITER);    # not my!
 678         pipe    (READER,  WRITER)     or die "pipe: $!";
 679         return (*READER, *WRITER);
 680     }
 681     ($head, $tail) = ioqueue();
 682
 683 See the Symbol module for a way to create anonymous symbol table
 684 entries.
 685
 686 Because assignment of a reference to a typeglob creates an alias, this
 687 can be used to create what is effectively a local function, or at least,
 688 a local alias.
 689
 690     {
 691         local *grow = \&shrink; # only until this block exits
 692         grow();                 # really calls shrink()
 693         move();                 # if move() grow()s, it shrink()s too
 694     }
 695     grow();                     # get the real grow() again
 696
 697 See L<perlref/"Function Templates"> for more about manipulating
 698 functions by name in this way.
 699
 700 =item 3. You want to temporarily change just one element of an array or hash.
 701
 702 You can C<local>ize just one element of an aggregate.  Usually this
 703 is done on dynamics:
 704
 705     {
 706         local $SIG{INT} = 'IGNORE';
 707         funct();                            # uninterruptible
 708     }
 709     # interruptibility automatically restored here
 710
 711 But it also works on lexically declared aggregates.  Prior to 5.005,
 712 this operation could on occasion misbehave.
 713
 714 =back
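
Returning to the first item in the list above, here is a sketch of a
routine that C<local>izes $_ before a C<while> loop assigns to it
implicitly.  The subroutine C<count_lines> is invented for the example,
and it reads from an in-memory filehandle (available in reasonably recent
versions of Perl) so that the sketch needs no external file.

```perl
use strict;
use warnings;

sub count_lines {
    my ($text) = @_;
    local $_;                                     # protect the caller's $_
    open(my $fh, '<', \$text) or die "open: $!";  # in-memory filehandle
    my $n = 0;
    while (<$fh>) {                               # implicitly assigns to $_
        $n++;
    }
    close $fh;
    return $n;
}

$_ = "caller's value";
my $lines = count_lines("a\nb\nc\n");   # 3
my $kept  = $_;                         # still "caller's value"
```

Without the C<local>, the loop would leave the caller's $_ undefined once
the input was exhausted.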
 715
 716 =head2 Pass by Reference
 717
 718 If you want to pass more than one array or hash into a function--or
 719 return them from it--and have them maintain their integrity, then
 720 you're going to have to use an explicit pass-by-reference.  Before you
 721 do that, you need to understand references as detailed in L<perlref>.
 722 This section may not make much sense to you otherwise.
 723
 724 Here are a few simple examples.  First, let's pass in several arrays
 725 to a function and have it C<pop> all of them, returning a new list
 726 of all their former last elements:
 727
 728     @tailings = popmany ( \@a, \@b, \@c, \@d );
 729
 730     sub popmany {
 731         my $aref;
 732         my @retlist = ();
 733         foreach $aref ( @_ ) {
 734             push @retlist, pop @$aref;
 735         }
 736         return @retlist;
 737     }
 738
 739 Here's how you might write a function that returns a
 740 list of keys occurring in all the hashes passed to it:
 741
 742     @common = inter( \%foo, \%bar, \%joe );
 743     sub inter {
 744         my ($k, $href, %seen); # locals
 745         foreach $href (@_) {
 746             while ( $k = each %$href ) {
 747                 $seen{$k}++;
 748             }
 749         }
 750         return grep { $seen{$_} == @_ } keys %seen;
 751     }
 752
 753 So far, we're using just the normal list return mechanism.
 754 What happens if you want to pass or return a hash?  Well,
 755 if you're using only one of them, or you don't mind them
 756 concatenating, then the normal calling convention is ok, although
 757 a little expensive.
 758
 759 Where people get into trouble is here:
 760
 761     (@a, @b) = func(@c, @d);
 762 or
 763     (%a, %b) = func(%c, %d);
 764
 765 That syntax simply won't work.  It sets just C<@a> or C<%a> and
 766 clears the C<@b> or C<%b>.  Plus the function didn't get passed
 767 into two separate arrays or hashes: it got one long list in C<@_>,
 768 as always.
 769
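To see the flattening for yourself, here is a small sketch.  The helper
C<echo_args> is hypothetical; it simply hands back whatever it was given.

```perl
use strict;
use warnings;

# echo_args is a hypothetical helper that returns its arguments unchanged.
sub echo_args { return @_ }

my @c = (1, 2);
my @d = (3, 4, 5);

my (@a, @b);
(@a, @b) = echo_args(@c, @d);    # one flat list in, one flat list out

my $got_a = scalar @a;           # every element lands in @a
my $got_b = scalar @b;           # nothing is left for @b
print "a has $got_a, b has $got_b\n";
```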
 770 If you can arrange for everyone to deal with this through references, it's
 771 cleaner code, although not so nice to look at.  Here's a function that
 772 takes two array references as arguments, returning the same two
 773 references ordered so that the array with more elements comes first:
 774
 775     ($aref, $bref) = func(\@c, \@d);
 776     print "@$aref has more than @$bref\n";
 777     sub func {
 778         my ($cref, $dref) = @_;
 779         if (@$cref > @$dref) {
 780             return ($cref, $dref);
 781         } else {
 782             return ($dref, $cref);
 783         }
 784     }
 785
 786 It turns out that you can actually do this also:
 787
 788     (*a, *b) = func(\@c, \@d);
 789     print "@a has more than @b\n";
 790     sub func {
 791         local (*c, *d) = @_;
 792         if (@c > @d) {
 793             return (\@c, \@d);
 794         } else {
 795             return (\@d, \@c);
 796         }
 797     }
 798
 799 Here we're using the typeglobs to do symbol table aliasing.  It's
 800 a tad subtle, though, and also won't work if you're using C<my>
 801 variables, because only globals (even in disguise as C<local>s)
 802 are in the symbol table.
 803
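The reference-based convention works for hashes in the same way, which is
one answer to the C<(%a, %b) = func(%c, %d)> problem described earlier.
The following is only a sketch; C<partition_by_size> and its arguments are
invented for this example.

```perl
use strict;
use warnings;

# partition_by_size is a hypothetical name.  It takes one hash reference
# and returns two new hash references, so neither result swallows the other.
sub partition_by_size {
    my ($href, $limit) = @_;
    my (%small, %large);
    while (my ($key, $value) = each %$href) {
        if ($value < $limit) { $small{$key} = $value }
        else                 { $large{$key} = $value }
    }
    return (\%small, \%large);
}

my %sizes = (ant => 1, cat => 40, horse => 900);
my ($small, $large) = partition_by_size(\%sizes, 50);
print join(",", sort keys %$small), " | ", join(",", sort keys %$large), "\n";
```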
 804 If you're passing around filehandles, you could usually just use the bare
 805 typeglob, like C<*STDOUT>, but typeglob references work, too.
 806 For example:
 807
 808     splutter(\*STDOUT);
 809     sub splutter {
 810         my $fh = shift;
 811         print $fh "her um well a hmmm\n";
 812     }
 813
 814     $rec = get_rec(\*STDIN);
 815     sub get_rec {
 816         my $fh = shift;
 817         return scalar <$fh>;
 818     }
 819
 820 If you're planning on generating new filehandles, you could do this.
 821 Notice that it passes back just the bare C<*FH>, not a reference to it.
 822
 823     sub openit {
 824         my $path = shift;
 825         local *FH;
 826         return open (FH, $path) ? *FH : undef;
 827     }
 828
 829 =head2 Prototypes
 830
 831 Perl supports a very limited kind of compile-time argument checking
 832 using function prototyping.  If you declare
 833
 834     sub mypush (\@@)
 835
 836 then C<mypush()> takes arguments exactly like C<push()> does.  The
 837 function declaration must be visible at compile time.  The prototype
 838 affects only interpretation of new-style calls to the function,
 839 where new-style is defined as not using the C<&> character.  In
 840 other words, if you call it like a built-in function, then it behaves
 841 like a built-in function.  If you call it like an old-fashioned
 842 subroutine, then it behaves like an old-fashioned subroutine.  It
 843 naturally falls out from this rule that prototypes have no influence
 844 on subroutine references like C<\&foo> or on indirect subroutine
 845 calls like C<&{$subref}> or C<$subref-E<gt>()>.
 846
 847 Method calls are not influenced by prototypes either, because the
 848 function to be called is indeterminate at compile time, since
 849 the exact code called depends on inheritance.
 850
 851 Because the intent of this feature is primarily to let you define
 852 subroutines that work like built-in functions, here are prototypes
 853 for some other functions that parse almost exactly like the
 854 corresponding built-in.
 855
 856     Declared as              Called as
 857
 858     sub mylink ($$)          mylink $old, $new
 859     sub myvec ($$$)          myvec $var, $offset, 1
 860     sub myindex ($$;$)       myindex &getstring, "substr"
 861     sub mysyswrite ($$$;$)   mysyswrite $buf, 0, length($buf) - $off, $off
 862     sub myreverse (@)        myreverse $a, $b, $c
 863     sub myjoin ($@)          myjoin ":", $a, $b, $c
 864     sub mypop (\@)           mypop @array
 865     sub mysplice (\@$$@)     mysplice @array, @array, 0, @pushme
 866     sub mykeys (\%)          mykeys %{$hashref}
 867     sub myopen (*;$)         myopen HANDLE, $name
 868     sub mypipe (**)          mypipe READHANDLE, WRITEHANDLE
 869     sub mygrep (&@)          mygrep { /foo/ } $a, $b, $c
 870     sub myrand ($)           myrand 42
 871     sub mytime ()            mytime
 872
 873 Any backslashed prototype character represents an actual argument
 874 that absolutely must start with that character.  The value passed
 875 as part of C<@_> will be a reference to the actual argument given
 876 in the subroutine call, obtained by applying C<\> to that argument.
 877
 878 Unbackslashed prototype characters have special meanings.  Any
 879 unbackslashed C<@> or C<%> eats all remaining arguments, and forces
 880 list context.  An argument represented by C<$> forces scalar context.  An
 881 C<&> requires an anonymous subroutine, which, if passed as the first
 882 argument, does not require the C<sub> keyword or a subsequent comma.  A
 883 C<*> allows the subroutine to accept a bareword, constant, scalar expression,
 884 typeglob, or a reference to a typeglob in that slot.  The value will be
 885 available to the subroutine either as a simple scalar, or (in the latter
 886 two cases) as a reference to the typeglob.
 887
 888 A semicolon separates mandatory arguments from optional arguments.
 889 It is redundant before C<@> or C<%>, which gobble up everything else.
 890
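The backslash and semicolon rules can be sketched as follows.  The body of
C<mypush> is an assumption about how one might implement it, and C<first_n>
is a hypothetical function invented to show an optional argument.

```perl
use strict;
use warnings;

# mypush mirrors the (\@@) declaration discussed above; the body is an
# assumption about how one might implement it.
sub mypush (\@@) {
    my ($aref, @items) = @_;     # the first argument arrives as a reference
    push @$aref, @items;
    return scalar @$aref;
}

# first_n is hypothetical: one mandatory array, one optional count.
sub first_n (\@;$) {
    my ($aref, $n) = @_;
    $n = 1 unless defined $n;
    return @{$aref}[0 .. $n - 1];
}

my @stack = (1);
mypush @stack, 2, 3;             # parsed like push: @stack is passed as \@stack
my @one = first_n @stack;        # optional argument omitted
my @two = first_n @stack, 2;     # optional argument supplied
&mypush(\@stack, 4);             # & skips the prototype, so pass the reference yourself
print "@stack | @one | @two\n";
```

Both subroutines are declared before they are called, since the prototype
must be visible at compile time to have any effect.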
 891 Note how the last three examples in the table above are treated
 892 specially by the parser.  C<mygrep()> is parsed as a true list
 893 operator, C<myrand()> is parsed as a true unary operator with unary
 894 precedence the same as C<rand()>, and C<mytime()> is truly without
 895 arguments, just like C<time()>.  That is, if you say
 896
 897     mytime +2;
 898
 899 you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed
 900 without a prototype.
 901
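A short sketch makes the parsing difference visible.  Both subroutines are
hypothetical and differ only in whether they carry the empty prototype.

```perl
use strict;
use warnings;

# Both subs are hypothetical; they differ only in the empty prototype.
sub forty ()   { 40 }            # takes no arguments, like time()
sub count_args { scalar @_ }     # no prototype: an ordinary list operator

my $sum   = forty +2;            # parsed as forty() + 2
my $count = count_args +2;       # parsed as count_args(+2)
print "$sum $count\n";
```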
 902 The interesting thing about C<&> is that you can generate new syntax with it,
 903 provided it's in the initial position:
 904
 905     sub try (&@) {
 906         my($try,$catch) = @_;
 907         eval { &$try };
 908         if ($@) {
 909             local $_ = $@;
 910             &$catch;
 911         }
 912     }
 913     sub catch (&) { $_[0] }
 914
 915     try {
 916         die "phooey";
 917     } catch {
 918         /phooey/ and print "unphooey\n";
 919     };
 920
 921 That prints C<"unphooey">.  (Yes, there are still unresolved
 922 issues having to do with visibility of C<@_>.  I'm ignoring that
 923 question for the moment.  (But note that if we make C<@_> lexically
 924 scoped, those anonymous subroutines can act like closures... (Gee,
 925 is this sounding a little Lispish?  (Never mind.))))
 926
 927 And here's a reimplementation of the Perl C<grep> operator:
 928
 929     sub mygrep (&@) {
 930         my $code = shift;
 931         my @result;
 932         foreach $_ (@_) {
 933             push(@result, $_) if &$code;
 934         }
 935         @result;
 936     }
 937
 938 Some folks would prefer full alphanumeric prototypes.  Alphanumerics have
 939 been intentionally left out of prototypes for the express purpose of
 940 someday in the future adding named, formal parameters.  The current
 941 mechanism's main goal is to let module writers provide better diagnostics
 942 for module users.  Larry feels the notation quite understandable to Perl
 943 programmers, and that it will not intrude greatly upon the meat of the
 944 module, nor make it harder to read.  The line noise is visually
 945 encapsulated into a small pill that's easy to swallow.
 946
 947 It's probably best to prototype new functions, not retrofit prototyping
 948 into older ones.  That's because you must be especially careful about
 949 silent impositions of differing list versus scalar contexts.  For example,
 950 if you decide that a function should take just one parameter, like this:
 951
 952     sub func ($) {
 953         my $n = shift;
 954         print "you gave me $n\n";
 955     }
 956
 957 and someone has been calling it with an array or expression
 958 returning a list:
 959
 960     func(@foo);
 961     func( split /:/ );
 962
 963 Then you've just supplied an automatic C<scalar> in front of their
 964 argument, which can be more than a bit surprising.  The old C<@foo>
 965 which used to hold one thing doesn't get passed in.  Instead,
 966 C<func()> now gets passed in a C<1>; that is, the number of elements
 967 in C<@foo>.  And the C<split> gets called in scalar context so it
 968 starts scribbling on your C<@_> parameter list.  Ouch!
 969
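The surprise is easy to reproduce.  In this sketch C<describe> is a
hypothetical stand-in for C<func()>, written to return its message rather
than print it; the second call uses C<&> to bypass the prototype and so
shows the old behavior.

```perl
use strict;
use warnings;

# describe is a hypothetical stand-in for the func() above.
sub describe ($) {
    my $n = shift;
    return "you gave me $n";
}

my @foo = ('apple');
my $with_proto    = describe(@foo);     # @foo is evaluated in scalar context
my $without_proto = &describe(@foo);    # & bypasses the prototype
print "$with_proto / $without_proto\n";
```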
 970 This is all very powerful, of course, and should be used only in moderation
 971 to make the world a better place.
 972
 973 =head2 Constant Functions
 974
 975 Functions with a prototype of C<()> are potential candidates for
 976 inlining.  If the result after optimization and constant folding
 977 is either a constant or a lexically-scoped scalar which has no other
 978 references, then it will be used in place of function calls made
 979 without C<&>.  Calls made using C<&> are never inlined.  (See
 980 F<constant.pm> for an easy way to declare most constants.)
 981
 982 The following functions would all be inlined:
 983
 984     sub pi ()           { 3.14159 }             # Not exact, but close.
 985     sub PI ()           { 4 * atan2 1, 1 }      # As good as it gets,
 986                                                 # and it's inlined, too!
 987     sub ST_DEV ()       { 0 }
 988     sub ST_INO ()       { 1 }
 989
 990     sub FLAG_FOO ()     { 1 << 8 }
 991     sub FLAG_BAR ()     { 1 << 9 }
 992     sub FLAG_MASK ()    { FLAG_FOO | FLAG_BAR }
 993
 994     sub OPT_BAZ ()      { not (0x1B58 & FLAG_MASK) }
 995     sub BAZ_VAL () {
 996         if (OPT_BAZ) {
 997             return 23;
 998         }
 999         else {
1000             return 42;
1001         }
1002     }
1003
1004     sub N () { int(BAZ_VAL) / 3 }
1005     BEGIN {
1006         my $prod = 1;
1007         for (1..N) { $prod *= $_ }
1008         sub N_FACTORIAL () { $prod }
1009     }
1010
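For comparison, here is how the flag constants might be declared with the
C<constant> pragma mentioned above.  This is a sketch of one reasonable
spelling, not the only one.

```perl
use strict;
use warnings;

# Each use constant line creates a sub with an empty prototype.
use constant FLAG_FOO  => 1 << 8;
use constant FLAG_BAR  => 1 << 9;
use constant FLAG_MASK => FLAG_FOO | FLAG_BAR;

my $mask = FLAG_MASK;
print "mask is $mask\n";
```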
1011 If you redefine a subroutine that was eligible for inlining, you'll get
1012 a mandatory warning.  (You can use this warning to tell whether or not a
1013 particular subroutine is considered constant.)  The warning is
1014 considered severe enough not to be optional because previously compiled
1015 invocations of the function will still be using the old value of the
1016 function.  If you need to be able to redefine the subroutine, you need to
1017 ensure that it isn't inlined, either by dropping the C<()> prototype
1018 (which changes calling semantics, so beware) or by thwarting the
1019 inlining mechanism in some other way, such as
1020
1021     sub not_inlined () {
1022         23 if $];
1023     }
1024
1025 =head2 Overriding Built-in Functions
1026
1027 Many built-in functions may be overridden, though this should be tried
1028 only occasionally and for good reason.  Typically this might be
1029 done by a package attempting to emulate missing built-in functionality
1030 on a non-Unix system.
1031
1032 Overriding may be done only by importing the name from a
1033 module--ordinary predeclaration isn't good enough.  However, the
1034 C<use subs> pragma lets you, in effect, predeclare subs
1035 via the import syntax, and these names may then override built-in ones:
1036
1037     use subs 'chdir', 'chroot', 'chmod', 'chown';
1038     chdir $somewhere;
1039     sub chdir { ... }
1040
1041 To unambiguously refer to the built-in form, precede the
1042 built-in name with the special package qualifier C<CORE::>.  For example,
1043 saying C<CORE::open()> always refers to the built-in C<open()>, even
1044 if the current package has imported some other subroutine called
1045 C<&open()> from elsewhere.  Even though it looks like a regular
1046 function call, it isn't: you can't take a reference to it, such as
1047 the incorrect C<\&CORE::open> might appear to produce.
1048
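Here is a minimal sketch that puts C<use subs> and C<CORE::> side by side.
It assumes C<time> is among the overridable built-ins, and the fixed
timestamp is a made-up value chosen only so the two results are easy to
tell apart.

```perl
use strict;
use warnings;
use subs 'time';                 # import-style predeclaration enables the override

# A fixed, made-up timestamp so the override is easy to recognize.
my $frozen = 1_000_000;

sub time { return $frozen }

my $fake = time;                 # calls our sub time
my $real = CORE::time;           # always the built-in
print "override is ", ($fake == $frozen ? "active" : "inactive"), "\n";
```

Something along these lines can be handy in test code that needs a
predictable clock, while C<CORE::time> stays available for the real one.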
1049 Library modules should not in general export built-in names like C<open>
1050 or C<chdir> as part of their default C<@EXPORT> list, because these may
1051 sneak into someone else's namespace and change the semantics unexpectedly.
1052 Instead, if the module adds that name to C<@EXPORT_OK>, then it's
1053 possible for a user to import the name explicitly, but not implicitly.
1054 That is, they could say
1055
1056     use Module 'open';
1057
1058 and it would import the C<open> override.  But if they said
1059
1060     use Module;
1061
1062 they would get the default imports without overrides.
1063
1064 The foregoing mechanism for overriding built-ins is restricted, quite
1065 deliberately, to the package that requests the import.  There is a second
1066 method that is sometimes applicable when you wish to override a built-in
1067 everywhere, without regard to namespace boundaries.  This is achieved by
1068 importing a sub into the special namespace C<CORE::GLOBAL::>.  Here is an
1069 example that quite brazenly replaces the C<glob> operator with something
1070 that understands regular expressions.
1071
1072     package REGlob;
1073     require Exporter;
1074     @ISA = 'Exporter';
1075     @EXPORT_OK = 'glob';
1076
1077     sub import {
1078         my $pkg = shift;
1079         return unless @_;
1080         my $sym = shift;
1081         my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
1082         $pkg->export($where, $sym, @_);
1083     }
1084
1085     sub glob {
1086         my $pat = shift;
1087         my @got;
1088         local *D;
1089         if (opendir D, '.') {
1090             @got = grep /$pat/, readdir D;
1091             closedir D;
1092         }
1093         return @got;
1094     }
1095     1;
1096
1097 And here's how it could be (ab)used:
1098
1099     #use REGlob 'GLOBAL_glob';      # override glob() in ALL namespaces
1100     package Foo;
1101     use REGlob 'glob';              # override glob() in Foo:: only
1102     print for <^[a-z_]+\.pm\$>;     # show all pragmatic modules
1103
1104 The initial comment shows a contrived, even dangerous example.
1105 By overriding C<glob> globally, you would be forcing the new (and
1106 subversive) behavior for the C<glob> operator for I<every> namespace,
1107 without the complete cognizance or cooperation of the modules that own
1108 those namespaces.  Naturally, this should be done with extreme caution--if
1109 it must be done at all.
1110
1111 The C<REGlob> example above does not implement all the support needed to
1112 cleanly override perl's C<glob> operator.  The built-in C<glob> has
1113 different behaviors depending on whether it appears in a scalar or list
1114 context, but our C<REGlob> doesn't.  Indeed, many perl built-ins have such
1115 context-sensitive behaviors, and these must be adequately supported by
1116 a properly written override.  For a fully functional example of overriding
1117 C<glob>, study the implementation of C<File::DosGlob> in the standard
1118 library.
1119
1120 =head2 Autoloading
1121
1122 If you call a subroutine that is undefined, you would ordinarily
1123 get an immediate, fatal error complaining that the subroutine doesn't
1124 exist.  (Likewise for subroutines being used as methods, when the
1125 method doesn't exist in any base class of the class's package.)
1126 However, if an C<AUTOLOAD> subroutine is defined in the package or
1127 packages used to locate the original subroutine, then that
1128 C<AUTOLOAD> subroutine is called with the arguments that would have
1129 been passed to the original subroutine.  The fully qualified name
1130 of the original subroutine magically appears in the global C<$AUTOLOAD>
1131 variable of the same package as the C<AUTOLOAD> routine.  The name
1132 is not passed as an ordinary argument because, er, well, just
1133 because, that's why...
1134
1135 Many C<AUTOLOAD> routines load in a definition for the requested
1136 subroutine using eval(), then execute that subroutine using a special
1137 form of goto() that erases the stack frame of the C<AUTOLOAD> routine
1138 without a trace.  (See the source to the standard module documented
1139 in L<AutoLoader>, for example.)  But an C<AUTOLOAD> routine can
1140 also just emulate the routine and never define it.  For example,
1141 let's pretend that a function that wasn't defined should just invoke
1142 C<system> with those arguments.  All you'd do is:
1143
1144     sub AUTOLOAD {
1145         my $program = $AUTOLOAD;
1146         $program =~ s/.*:://;
1147         system($program, @_);
1148     }
1149     date();
1150     who('am', 'i');
1151     ls('-l');
1152
1153 In fact, if you predeclare functions you want to call that way, you don't
1154 even need parentheses:
1155
1156     use subs qw(date who ls);
1157     date;
1158     who "am", "i";
1159     ls -l;
1160
1161 A more complete example of this is the standard Shell module, which
1162 can treat undefined subroutine calls as calls to external programs.
1163
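The define-then-C<goto> style mentioned earlier can be sketched like this.
It is an illustration under stated assumptions: the package C<Greeter>, the
C<say_*> naming scheme, and the C<times_autoloaded> helper are all invented
here, and the definition is built from a closure rather than loaded with
C<eval>.

```perl
use strict;
use warnings;

package Greeter;                 # hypothetical package for this sketch

our $AUTOLOAD;
my %autoloaded;                  # how many times AUTOLOAD built each name

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    die "Undefined subroutine $AUTOLOAD" unless $name =~ /^say_(\w+)$/;
    my $word = $1;

    $autoloaded{$name}++;
    my $code = sub { return "$word, @_" };
    {
        no strict 'refs';
        *{$AUTOLOAD} = $code;    # install it so later calls skip AUTOLOAD
    }
    goto &$code;                 # replace this frame, keeping the current @_
}

sub times_autoloaded { return $autoloaded{ $_[0] } || 0 }

package main;

my $first  = Greeter::say_hello('world');
my $second = Greeter::say_hello('again');
print "$first / $second / ", Greeter::times_autoloaded('say_hello'), "\n";
```

Because the generated subroutine is installed in the symbol table, only the
first call pays the cost of going through C<AUTOLOAD>.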
1164 Mechanisms are available to help module writers split their modules
1165 into autoloadable files.  See the standard AutoLoader module
1166 described in L<AutoLoader> and in L<AutoSplit>, the standard
1167 SelfLoader module in L<SelfLoader>, and the document on adding C
1168 functions to Perl code in L<perlxs>.
1169
1170 =head1 SEE ALSO
1171
1172 See L<perlref/"Function Templates"> for more about references and closures.
1173 See L<perlxs> if you'd like to learn about calling C subroutines from Perl.
1174 See L<perlembed> if you'd like to learn about calling Perl subroutines from C.
1175 See L<perlmod> to learn about bundling up your functions in separate files.
1176 See L<perlmodlib> to learn what library modules come standard on your system.
1177 See L<perltoot> to learn how to make object method calls.