=head1 NAME

perlsub - Perl subroutines

=head1 SYNOPSIS

To declare subroutines:

    sub NAME;                     # A "forward" declaration.
    sub NAME(PROTO);              #  ditto, but with prototypes

    sub NAME BLOCK                # A declaration and a definition.
    sub NAME(PROTO) BLOCK         #  ditto, but with prototypes

To define an anonymous subroutine at runtime:

    $subref = sub BLOCK;          # no proto
    $subref = sub (PROTO) BLOCK;  # with proto

To import subroutines:

    use MODULE qw(NAME1 NAME2 NAME3);

To call subroutines:

    NAME(LIST);    # & is optional with parentheses.
    NAME LIST;     # Parentheses optional if predeclared/imported.
    &NAME(LIST);   # Circumvent prototypes.
    &NAME;         # Makes current @_ visible to called subroutine.

=head1 DESCRIPTION

Like many languages, Perl provides for user-defined subroutines.
These may be located anywhere in the main program, loaded in from
other files via the C<do>, C<require>, or C<use> keywords, or
generated on the fly using C<eval> or anonymous subroutines (closures).
You can even call a function indirectly using a variable containing
its name or a CODE reference.

The Perl model for function call and return values is simple: all
functions are passed as parameters one single flat list of scalars, and
all functions likewise return to their caller one single flat list of
scalars. Any arrays or hashes in these call and return lists will
collapse, losing their identities--but you may always use
pass-by-reference instead to avoid this. Both call and return lists may
contain as many or as few scalar elements as you'd like. (Often a
function without an explicit return statement is called a subroutine, but
there's really no difference from Perl's perspective.)

Any arguments passed in show up in the array C<@_>. Therefore, if
you called a function with two arguments, those would be stored in
C<$_[0]> and C<$_[1]>. The array C<@_> is a local array, but its
elements are aliases for the actual scalar parameters. In particular,
if an element C<$_[0]> is updated, the corresponding argument is
updated (or an error occurs if it is not updatable). If an argument
is an array or hash element which did not exist when the function
was called, that element is created only when (and if) it is modified
or a reference to it is taken. (Some earlier versions of Perl
created the element whether or not the element was assigned to.)
Assigning to the whole array C<@_> removes that aliasing, and does
not update any arguments.
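
The following minimal sketch makes the aliasing concrete. C<bump_first> and C<bump_copy> are hypothetical names used only for illustration:

```perl

    use strict;
    use warnings;

    # $_[0] is an alias for the caller's variable, so this changes it.
    sub bump_first { $_[0]++ }

    # Assigning to the whole of @_ replaces the aliases with copies,
    # so the increment below touches only the subroutine's own data.
    sub bump_copy {
        @_ = @_;
        $_[0]++;
    }

    my $n = 1;
    bump_first($n);    # $n is now 2
    bump_copy($n);     # $n is still 2
    print "n = $n\n";

```

Because C<bump_copy> replaces C<@_> before modifying it, it behaves like call-by-value even though it never names its arguments.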

The return value of a subroutine is the value of the last expression
evaluated. More explicitly, a C<return> statement may be used to exit the
subroutine, optionally specifying the returned value, which will be
evaluated in the appropriate context (list, scalar, or void) depending
on the context of the subroutine call. If you specify no return value,
the subroutine returns an empty list in list context, the undefined
value in scalar context, or nothing in void context. If you return
one or more aggregates (arrays and hashes), these will be flattened
together into one large indistinguishable list.
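
The following minimal sketch shows these rules in action. C<nothing> and C<two_lists> are hypothetical names used only for illustration:

```perl

    use strict;
    use warnings;

    # A bare "return" gives () in list context and undef in scalar context.
    sub nothing { return }

    # Both arrays are flattened into a single four-element list.
    sub two_lists {
        my @x = (1, 2);
        my @y = (3, 4);
        return (@x, @y);
    }

    my @list   = nothing();     # empty list
    my $scalar = nothing();     # undef
    my @flat   = two_lists();   # (1, 2, 3, 4)

    print scalar(@list), " elements, scalar is ",
        (defined $scalar ? "defined" : "undef"), ", flat is (@flat)\n";
    # prints: 0 elements, scalar is undef, flat is (1 2 3 4)

```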

Perl does not have named formal parameters. In practice, all you
do is assign C<@_> to a C<my()> list of variables. Variables that aren't
declared to be private are global variables. For gory details
on creating private variables, see L<"Private Variables via my()">
and L<"Temporary Values via local()">. To create protected
environments for a set of functions in a separate package (and
probably a separate file), see L<perlmod/"Packages">.

Example:

    sub max {
        my $max = shift(@_);
        foreach my $foo (@_) {
            $max = $foo if $max < $foo;
        }
        return $max;
    }
    $bestday = max($mon,$tue,$wed,$thu,$fri);

Example:

    # get a line, combining continuation lines
    #  that start with whitespace

    sub get_line {
        $thisline = $lookahead;  # global variables!
        LINE: while (defined($lookahead = <STDIN>)) {
            if ($lookahead =~ /^[ \t]/) {
                $thisline .= $lookahead;
            }
            else {
                last LINE;
            }
        }
        return $thisline;
    }

    $lookahead = <STDIN>;       # get first line
    while (defined($line = get_line())) {
        ...
    }

Assigning to a list of private variables to name your arguments:

    sub maybeset {
        my($key, $value) = @_;
        $Foo{$key} = $value unless $Foo{$key};
    }

Because the assignment copies the values, this also has the effect
of turning call-by-reference into call-by-value. Otherwise a
function is free to do in-place modifications of C<@_> and change
its caller's values.

    upcase_in($v1, $v2);  # this changes $v1 and $v2
    sub upcase_in {
        for (@_) { tr/a-z/A-Z/ }
    }

You aren't allowed to modify constants in this way, of course. If an
argument were actually literal and you tried to change it, you'd take a
(presumably fatal) exception. For example, this won't work:

    upcase_in("frederick");

It would be much safer if the C<upcase_in()> function
were written to return a copy of its parameters instead
of changing them in place:

    ($v3, $v4) = upcase($v1, $v2);  # this doesn't change $v1 and $v2
    sub upcase {
        return unless defined wantarray;  # void context, do nothing
        my @parms = @_;
        for (@parms) { tr/a-z/A-Z/ }
        return wantarray ? @parms : $parms[0];
    }

Notice how this (unprototyped) function doesn't care whether it was
passed real scalars or arrays. Perl sees all arguments as one big,
long, flat parameter list in C<@_>. This is one area where
Perl's simple argument-passing style shines. The C<upcase()>
function would work perfectly well without changing the C<upcase()>
definition even if we fed it things like this:

    @newlist   = upcase(@list1, @list2);
    @newlist   = upcase( split /:/, $var );

Do not, however, be tempted to do this:

    (@a, @b)   = upcase(@list1, @list2);

Like the flattened incoming parameter list, the return list is also
flattened on return. So all you have managed to do here is store
everything in C<@a> and make C<@b> an empty list. See
L<"Pass by Reference"> for alternatives.
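
Here is a minimal sketch of one such alternative. The function returns references to two separate arrays, and these arrive back intact. C<split_by_parity> is a hypothetical name:

```perl

    use strict;
    use warnings;

    # Returns two array references instead of two flattened arrays.
    sub split_by_parity {
        my (@even, @odd);
        for my $n (@_) {
            if ($n % 2) { push @odd, $n } else { push @even, $n }
        }
        return (\@even, \@odd);
    }

    my ($even, $odd) = split_by_parity(1 .. 6);
    print "even: @$even\n";   # even: 2 4 6
    print "odd:  @$odd\n";    # odd:  1 3 5

```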

A subroutine may be called using an explicit C<&> prefix. The
C<&> is optional in modern Perl, as are parentheses if the
subroutine has been predeclared. The C<&> is I<not> optional
when just naming the subroutine, such as when it's used as
an argument to defined() or undef(). Nor is it optional when you
want to do an indirect subroutine call with a subroutine name or
reference using the C<&$subref()> or C<&{$subref}()> constructs,
although the C<$subref-E<gt>()> notation solves that problem.
See L<perlref> for more about all that.

Subroutines may be called recursively. If a subroutine is called
using the C<&> form, the argument list is optional, and if omitted,
no C<@_> array is set up for the subroutine: the C<@_> array at the
time of the call is visible to the subroutine instead. This is an
efficiency mechanism that new users may wish to avoid.

    &foo(1,2,3);    # pass three arguments
    foo(1,2,3);     # the same

    foo();          # pass a null list
    &foo();         # the same

    &foo;           # foo() gets current args, like foo(@_) !!
    foo;            # like foo() IFF sub foo predeclared, else "foo"

Not only does the C<&> form make the argument list optional, it also
disables any prototype checking on arguments you do provide. This
is partly for historical reasons, and partly for having a convenient way
to cheat if you know what you're doing. See L<"Prototypes"> below.
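
The following minimal sketch shows the difference between C<&name;> and C<name()>. C<count_args> and C<wrapper> are hypothetical names:

```perl

    use strict;
    use warnings;

    sub count_args { return scalar @_ }

    sub wrapper {
        my $shared = &count_args;    # no parens: reuses wrapper's own @_
        my $empty  = count_args();   # explicit empty argument list
        return ($shared, $empty);
    }

    my ($shared, $empty) = wrapper('a', 'b', 'c');
    print "shared=$shared empty=$empty\n";   # shared=3 empty=0

```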

Functions whose names are in all upper case are reserved to the Perl
core, as are modules whose names are in all lower case. A
function in all capitals is a loosely-held convention meaning it
will be called indirectly by the run-time system itself, usually
due to a triggered event. Functions that do special, pre-defined
things include C<BEGIN>, C<END>, C<AUTOLOAD>, and C<DESTROY>--plus
all functions mentioned in L<perltie>. The 5.005 release adds
C<INIT> to this list.

=head2 Private Variables via my()

Synopsis:

    my $foo;            # declare $foo lexically local
    my (@wid, %get);    # declare list of variables local
    my $foo = "flurp";  # declare $foo lexical, and init it
    my @oof = @bar;     # declare @oof lexical, and init it

The C<my> operator declares the listed variables to be lexically
confined to the enclosing block, conditional (C<if/unless/elsif/else>),
loop (C<for/foreach/while/until/continue>), subroutine, C<eval>,
or C<do/require/use>'d file. If more than one value is listed, the
list must be placed in parentheses. All listed elements must be
legal lvalues. Only alphanumeric identifiers may be lexically
scoped--magical built-ins like C<$/> must currently be C<local>ized
with C<local> instead.

Unlike dynamic variables created by the C<local> operator, lexical
variables declared with C<my> are totally hidden from the outside
world, including any called subroutines. This is true even if it's the
same subroutine called from itself or elsewhere--every call gets
its own copy.

This doesn't mean that a C<my> variable declared in a statically
enclosing lexical scope would be invisible. Only dynamic scopes
are cut off. For example, the C<bumpx()> function below has access
to the lexical $x variable because both the C<my> and the C<sub>
occurred at the same scope, presumably file scope.

    my $x = 10;
    sub bumpx { $x++ }

An C<eval()>, however, can see lexical variables of the scope it is
being evaluated in, so long as the names aren't hidden by declarations within
the C<eval()> itself. See L<perlref>.
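
The following minimal sketch shows a string C<eval> seeing an enclosing lexical, and also being shielded from it by its own C<my>:

```perl

    use strict;
    use warnings;

    my $y = 5;

    # The string is compiled in the current lexical scope,
    # so it can see the enclosing lexical $y.
    my $sum = eval '$y + 1';
    die $@ if $@;

    # A "my" inside the eval hides the outer $y for the eval only.
    my $inner = eval 'my $y = 100; $y';

    print "$sum $inner $y\n";   # 6 100 5

```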

The parameter list to my() may be assigned to if desired, which allows you
to initialize your variables. (If no initializer is given for a
particular variable, it is created with the undefined value.) Commonly
this is used to name input parameters to a subroutine. Examples:

    $arg = "fred";        # "global" variable
    $n = cube_root(27);
    print "$arg thinks the root is $n\n";
 fred thinks the root is 3

    sub cube_root {
        my $arg = shift;  # name doesn't matter
        $arg **= 1/3;
        return $arg;
    }

The C<my> is simply a modifier on something you might assign to. So when
you do assign to variables in its argument list, C<my> doesn't
change whether those variables are viewed as a scalar or an array. So

    my ($foo) = <STDIN>;        # WRONG?
    my @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    my $foo = <STDIN>;

supplies a scalar context. But the following declares only one variable:

    my $foo, $bar = 1;          # WRONG

That has the same effect as

    my $foo;
    $bar = 1;
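
The context difference is easy to see with an ordinary array on the right-hand side in place of C<< <STDIN> >>. This minimal sketch uses made-up values and also shows the parenthesized form that correctly declares two variables:

```perl

    use strict;
    use warnings;

    my @data = ('alpha', 'beta', 'gamma');

    my ($first) = @data;    # list assignment: takes the first element
    my $count   = @data;    # scalar assignment: takes the element count

    # Declaring and initializing two lexicals needs the parentheses.
    my ($foo, $bar) = (0, 1);

    print "$first $count $foo $bar\n";   # alpha 3 0 1

```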

The declared variable is not introduced (is not visible) until after
the current statement. Thus,

    my $x = $x;

can be used to initialize a new $x with the value of the old $x, and
the expression

    my $x = 123 and $x == 123

is false unless the old $x happened to have the value C<123>.
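
The following minimal sketch shows C<my $x = $x> initializing an inner variable from an outer one of the same name:

```perl

    use strict;
    use warnings;

    my $x = 10;
    my $inner;
    {
        # The $x on the right still refers to the outer variable,
        # because the new $x is not visible until the next statement.
        my $x = $x * 2;
        $inner = $x;
    }
    print "inner $inner, outer $x\n";   # inner 20, outer 10

```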

Lexical scopes of control structures are not bounded precisely by the
braces that delimit their controlled blocks; control expressions are
part of that scope, too. Thus in the loop

    while (my $line = <>) {
        $line = lc $line;
    } continue {
        print $line;
    }

the scope of $line extends from its declaration throughout the rest of
the loop construct (including the C<continue> clause), but not beyond
it. Similarly, in the conditional

    if ((my $answer = <STDIN>) =~ /^yes$/i) {
        user_agrees();
    } elsif ($answer =~ /^no$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "'$answer' is neither 'yes' nor 'no'";
    }

the scope of $answer extends from its declaration through the rest
of that conditional, including any C<elsif> and C<else> clauses,
but not beyond it.

None of the foregoing text applies to C<if/unless> or C<while/until>
modifiers appended to simple statements. Such modifiers are not
control structures and have no effect on scoping.

The C<foreach> loop defaults to scoping its index variable dynamically
in the manner of C<local>. However, if the index variable is
prefixed with the keyword C<my>, or if there is already a lexical
by that name in scope, then a new lexical is created instead. Thus
in the loop

    for my $i (1, 2, 3) {
        some_function();
    }

the scope of $i extends to the end of the loop, but not beyond it,
rendering the value of $i inaccessible within C<some_function()>.
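
The following minimal sketch contrasts the two behaviors. It assumes a Perl new enough to support C<our> (5.6 or later). C<peek> is a hypothetical helper that reads the package variable C<$main::i>:

```perl

    use strict;
    use warnings;

    our $i = 'outer';    # package variable $main::i
    my @seen;

    # Hypothetical helper: records the current value of the package variable.
    sub peek { push @seen, $main::i }

    for $i (1, 2)    { peek() }   # global $i is localized: peek sees 1, 2
    for my $i (3, 4) { peek() }   # new lexical $i: peek still sees 'outer'

    print "@seen\n";   # 1 2 outer outer
    print "$i\n";      # outer

```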

Some users may wish to encourage the use of lexically scoped variables.
As an aid to catching implicit uses of package variables,
which are always global, if you say

    use strict 'vars';

then any variable mentioned from there to the end of the enclosing
block must either refer to a lexical variable, be predeclared via
C<use vars>, or else must be fully qualified with the package name.
A compilation error results otherwise. An inner block may countermand
this with C<no strict 'vars'>.
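
The following minimal sketch demonstrates the compile-time check. It uses a string C<eval> so that the failure is caught instead of aborting the whole program:

```perl

    use strict;
    use warnings;

    # Under strict 'vars' the undeclared global is a compile-time error,
    # so the eval returns undef and leaves the message in $@.
    my $bad = eval q{ use strict 'vars'; $undeclared = 1; 1 };
    my $err = $@;    # save it before the next eval clears $@

    # A fully qualified package variable passes the same check.
    my $good = eval q{ use strict 'vars'; $main::declared = 1; 1 };

    print defined $bad ? "compiled\n" : "rejected: $err";
    print "qualified name: ", ($good ? "ok" : "failed"), "\n";

```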

A C<my> has both a compile-time and a run-time effect. At compile
time, the compiler takes notice of it. The principal usefulness
of this is to quiet C<use strict 'vars'>, but it is also essential
for generation of closures as detailed in L<perlref>. Actual
initialization is delayed until run time, though, so it gets executed
at the appropriate time, such as each time through a loop, for
example.

Variables declared with C<my> are not part of any package and are therefore
never fully qualified with the package name. In particular, you're not
allowed to try to make a package variable (or other global) lexical:

    my $pack::var;      # ERROR!  Illegal syntax
    my $_;              # also illegal (currently)

In fact, dynamic variables (also known as package or global variables)
are still accessible using the fully qualified C<::> notation even while a
lexical of the same name is also visible:

    package main;
    local $x = 10;
    my    $x = 20;
    print "$x and $::x\n";

That will print out C<20> and C<10>.

You may declare C<my> variables at the outermost scope of a file
to hide any such identifiers from the world outside that file. This
is similar in spirit to C's static variables when they are used at
the file level. To do this with a subroutine requires the use of
a closure (an anonymous function that accesses enclosing lexicals).
If you want to create a private subroutine that cannot be called
from outside that block, you can declare a lexical variable containing
an anonymous sub reference:

    my $secret_version = '1.001-beta';
    my $secret_sub = sub { print $secret_version };
    &$secret_sub();

As long as the reference is never returned by any function within the
module, no outside module can see the subroutine, because its name is not in
any package's symbol table. Remember that it's not I<REALLY> called
C<$some_pack::secret_version> or anything; it's just $secret_version,
unqualified and unqualifiable.

This does not work with object methods, however; all object methods
have to be in the symbol table of some package to be found. See
L<perlref/"Function Templates"> for something of a work-around to
this.

=head2 Persistent Private Variables

Just because a lexical variable is lexically (also called statically)
scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that
within a function it works like a C static. It normally works more
like a C auto, but with implicit garbage collection.

Unlike local variables in C or C++, Perl's lexical variables don't
necessarily get recycled just because their scope has exited.
If something more permanent is still aware of the lexical, it will
stick around. So long as something else references a lexical, that
lexical won't be freed--which is as it should be. You wouldn't want
memory being freed until you were done using it, or kept around once you
were done. Automatic garbage collection takes care of this for you.
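
The following minimal sketch shows a lexical that outlives its scope because a closure still refers to it. C<make_counter> is a hypothetical name:

```perl

    use strict;
    use warnings;

    # Each call creates a fresh lexical $count; the returned closure keeps
    # a reference to it, so it survives after make_counter() has returned.
    sub make_counter {
        my $count = 0;
        return sub { return ++$count };
    }

    my $c1 = make_counter();
    my $c2 = make_counter();

    $c1->() for 1 .. 3;      # $c1's private $count is now 3
    my $n1 = $c1->();        # 4
    my $n2 = $c2->();        # 1: $c2 has its own, independent $count

    print "$n1 $n2\n";       # 4 1

```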

This means that you can pass back or save away references to lexical
variables, whereas to return a pointer to a C auto is a grave error.
It also gives us a way to simulate C's function statics. Here's a
mechanism for giving a function private variables with both lexical
scoping and a static lifetime. If you do want to create something like
C's static variables, just enclose the whole function in an extra block,
and put the static variable outside the function but in the block.

    {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }
    # $secret_val now becomes unreachable by the outside
    # world, but retains its value between calls to gimme_another

If this function is being sourced in from a separate file
via C<require> or C<use>, then this is probably just fine. If it's
all in the main program, you'll need to arrange for the C<my>
to be executed early, either by putting the whole block above
your main program, or more likely, placing merely a C<BEGIN>
sub around it to make sure it gets executed before your program
starts to run:

    sub BEGIN {
        my $secret_val = 0;
        sub gimme_another {
            return ++$secret_val;
        }
    }

See L<perlmod/"Package Constructors and Destructors"> about the
special triggered functions, C<BEGIN> and C<INIT>.

If declared at the outermost scope (the file scope), then lexicals
work somewhat like C's file statics. They are available to all
functions in that same file declared below them, but are inaccessible
from outside that file. This strategy is sometimes used in modules
to create private variables that the whole module can see.

=head2 Temporary Values via local()

B<WARNING>: In general, you should be using C<my> instead of C<local>, because
it's faster and safer. Exceptions to this include the global punctuation
variables, filehandles and formats, and direct manipulation of the Perl
symbol table itself. Format variables often use C<local> though, as do
other variables whose current value must be visible to called
subroutines.

Synopsis:

    local $foo;                 # declare $foo dynamically local
    local (@wid, %get);         # declare list of variables local
    local $foo = "flurp";       # declare $foo dynamic, and init it
    local @oof = @bar;          # declare @oof dynamic, and init it

    local *FH;                  # localize $FH, @FH, %FH, &FH  ...
    local *merlyn = *randal;    # now $merlyn is really $randal, plus
                                #     @merlyn is really @randal, etc
    local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
    local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc

A C<local> modifies its listed variables to be "local" to the
enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine
called from within that block>. A C<local> just gives temporary
values to global (meaning package) variables. It does I<not> create
a local variable. This is known as dynamic scoping. Lexical scoping
is done with C<my>, which works more like C's auto declarations.
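
The following minimal sketch shows the difference between the two. It assumes Perl 5.6 or later for C<our>. C<show>, C<with_local>, and C<with_my> are hypothetical names:

```perl

    use strict;
    use warnings;

    our $level = 'global';   # package variable

    # Reports whatever value the package variable $level has right now.
    sub show { return $level }

    sub with_local {
        local $level = 'local';   # visible to show() until this sub returns
        return show();
    }

    sub with_my {
        my $level = 'lexical';    # a new lexical, invisible to show()
        return show();
    }

    print join(' ', with_local(), with_my(), show()), "\n";   # local global global

```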

If more than one variable is given to C<local>, they must be placed in
parentheses. All listed elements must be legal lvalues. This operator works
by saving the current values of those variables in its argument list on a
hidden stack and restoring them upon exiting the block, subroutine, or
eval. This means that called subroutines can also reference the local
variable, but not the global one. The argument list may be assigned to if
desired, which allows you to initialize your local variables. (If no
initializer is given for a particular variable, it is created with an
undefined value.) Commonly this is used to name the parameters to a
subroutine. Examples:

    for $i ( 0 .. 9 ) {
        $digits{$i} = $i;
    }
    # assume this function uses global %digits hash
    parse_num();

    # now temporarily add to %digits hash
    if ($base12) {
        # (NOTE: not claiming this is efficient!)
        local %digits  = (%digits, 't' => 10, 'e' => 11);
        parse_num();  # parse_num gets this new %digits!
    }
    # old %digits restored here

Because C<local> is a run-time operator, it gets executed each time
through a loop. In releases of Perl previous to 5.0, this used more stack
storage each time until the loop was exited. Perl now reclaims the space
each time through, but it's still more efficient to declare your variables
outside the loop.

A C<local> is simply a modifier on an lvalue expression. When you assign to
a C<local>ized variable, the C<local> doesn't change whether its list is viewed
as a scalar or an array. So

    local($foo) = <STDIN>;
    local @FOO = <STDIN>;

both supply a list context to the right-hand side, while

    local $foo = <STDIN>;

supplies a scalar context.

A note about C<local()> and composite types is in order. Something
like C<local(%foo)> works by temporarily placing a brand new hash in
the symbol table. The old hash is left alone, but is hidden "behind"
the new one.

This means the old variable is completely invisible via the symbol
table (i.e. the hash entry in the C<*foo> typeglob) for the duration
of the dynamic scope within which the C<local()> was seen. This
has the effect of allowing one to temporarily occlude any magic on
composite types. For instance, this will briefly alter a tied
hash to some other implementation:

    tie %ahash, 'APackage';
    [...]
    {
       local %ahash;
       tie %ahash, 'BPackage';
       [..called code will see %ahash tied to 'BPackage'..]
       {
          local %ahash;
          [..%ahash is a normal (untied) hash here..]
       }
    }
    [..%ahash back to its initial tied self again..]

As another example, a custom implementation of C<%ENV> might look
like this:

    {
        local %ENV;
        tie %ENV, 'MyOwnEnv';
        [..do your own fancy %ENV manipulation here..]
    }
    [..normal %ENV behavior here..]

It's also worth taking a moment to explain what happens when you
C<local>ize a member of a composite type (i.e. an array or hash element).
In this case, the element is C<local>ized I<by name>. This means that
when the scope of the C<local()> ends, the saved value will be
restored to the hash element whose key was named in the C<local()>, or
the array element whose index was named in the C<local()>. If that
element was deleted while the C<local()> was in effect (e.g. by a
C<delete()> from a hash or a C<shift()> of an array), it will spring
back into existence, possibly extending an array and filling in the
skipped elements with C<undef>. For instance, if you say

    %hash = ( 'This' => 'is', 'a' => 'test' );
    @ary  = ( 0..5 );
    {
         local($ary[5])  = 6;
         local($hash{'a'}) = 'drill';
         while (my $e = pop(@ary)) {
             print "$e . . .\n";
             last unless $e > 3;
         }
         if (@ary) {
             $hash{'only a'} = 'test';
             delete $hash{'a'};
         }
    }
    print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
    print "The array has ",scalar(@ary)," elements: ",
          join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";

Perl will print

    6 . . .
    4 . . .
    3 . . .
    This is a test only a test.
    The array has 6 elements: 0, 1, 2, undef, undef, 5

The behavior of local() on non-existent members of composite
types is subject to change in future.

=head2 Passing Symbol Table Entries (typeglobs)

B<WARNING>: The mechanism described in this section was originally
the only way to simulate pass-by-reference in older versions of
Perl. While it still works fine in modern versions, the new reference
mechanism is generally easier to work with. See below.

Sometimes you don't want to pass the value of an array to a subroutine
but rather the name of it, so that the subroutine can modify the global
copy of it rather than working with a local copy. In Perl you can
refer to all objects of a particular name by prefixing the name
with a star: C<*foo>. This is often known as a "typeglob", because the
star on the front can be thought of as a wildcard match for all the
funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents
all the objects of that name, including any filehandle, format, or
subroutine. When assigned to, it causes the name mentioned to refer to
whatever C<*> value was assigned to it. Example:

    sub doubleary {
        local(*someary) = @_;
        foreach $elem (@someary) {
            $elem *= 2;
        }
    }
    doubleary(*foo);
    doubleary(*bar);

Scalars are already passed by reference, so you can modify
scalar arguments without using this mechanism by referring explicitly
to C<$_[0]> etc. You can modify all the elements of an array by passing
all the elements as scalars, but you have to use the C<*> mechanism (or
the equivalent reference mechanism) to C<push>, C<pop>, or change the size of
an array. It will certainly be faster to pass the typeglob (or reference).
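
For comparison, here is a minimal sketch of the equivalent reference mechanism. C<doubleary_ref> is a hypothetical name. The C<push> is included only to show that a resize also reaches the caller's array:

```perl

    use strict;
    use warnings;

    # Reference-based counterpart of doubleary: the caller passes \@array,
    # and the sub works on the caller's array through the reference.
    sub doubleary_ref {
        my $aref = shift;
        $_ *= 2 for @$aref;     # modify each element in place
        push @$aref, 'end';     # resizing also affects the caller's array
    }

    my @foo = (1, 2, 3);
    doubleary_ref(\@foo);
    print "@foo\n";   # 2 4 6 end

```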
638 | ||
639 | Even if you don't want to modify an array, this mechanism is useful for | |
5f05dabc | 640 | passing multiple arrays in a single LIST, because normally the LIST |
a0d0e21e | 641 | mechanism will merge all the array values so that you can't extract out |
55497cff | 642 | the individual arrays. For more on typeglobs, see |
2ae324a7 | 643 | L<perldata/"Typeglobs and Filehandles">. |
cb1a09d0 | 644 | |
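Here is a minimal, self-contained sketch of the same idea that also runs under C<use strict>.  The array name C<@foo> is only illustrative; typeglobs are exempt from C<strict vars>, and C<our> declares the alias name.  For comparison, the hypothetical C<doubleary_ref> does the same job with an array reference instead.

```perl
    use strict;
    use warnings;

    our @foo = (1, 2, 3);   # a package array: only globals live in the symbol table
    our @someary;           # the name the sub will alias; declared for strict

    sub doubleary {
        local (*someary) = @_;      # *someary now refers to the caller's glob
        $_ *= 2 for @someary;       # so this changes the caller's array in place
    }

    sub doubleary_ref {
        my ($aref) = @_;            # the reference-based equivalent
        $_ *= 2 for @$aref;
    }

    doubleary(*foo);
    print "@foo\n";                 # 2 4 6
    doubleary_ref(\@foo);
    print "@foo\n";                 # 4 8 12
```

Both calls modify the caller's array in place.  The reference version also works on C<my> arrays, which have no symbol table entry to alias.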

=head2 When to Still Use local()

Despite the existence of C<my>, there are still three places where the
C<local> operator shines.  In fact, in these three places, you
I<must> use C<local> instead of C<my>.

=over

=item 1. You need to give a global variable a temporary value, especially $_.

The global variables, like C<@ARGV> or the punctuation variables, must be
C<local>ized with C<local()>.  This block reads in F</etc/motd>, and splits
it up into chunks separated by lines of equal signs, which are placed
in C<@Fields>.

    {
        local @ARGV = ("/etc/motd");
        local $/ = undef;
        local $_ = <>;
        @Fields = split /^\s*=+\s*$/m;   # /m: ^ and $ match at every line
    }

In particular, it's important to C<local>ize C<$_> in any routine that assigns
to it.  Look out for implicit assignments in C<while> conditionals.

=item 2. You need to create a local file or directory handle or a local function.

A function that needs a filehandle of its own must use C<local()> on a
complete typeglob.  This can be used to create new symbol table entries:

    sub ioqueue {
        local (*READER, *WRITER);    # not my!
        pipe (READER, WRITER)        or die "pipe: $!";
        return (*READER, *WRITER);
    }
    ($head, $tail) = ioqueue();

See the Symbol module for a way to create anonymous symbol table
entries.

Because assignment of a reference to a typeglob creates an alias, this
can be used to create what is effectively a local function, or at least,
a local alias.

    {
        local *grow = \&shrink; # only until this block exits
        grow();                 # really calls shrink()
        move();                 # if move() grow()s, it shrink()s too
    }
    grow();                     # get the real grow() again

See L<perlref/"Function Templates"> for more about manipulating
functions by name in this way.

=item 3. You want to temporarily change just one element of an array or hash.

You can C<local>ize just one element of an aggregate.  Usually this
is done on dynamic (package) variables such as C<%SIG>:

    {
        local $SIG{INT} = 'IGNORE';
        funct();                            # uninterruptible
    }
    # interruptibility automatically restored here

But it also works on lexically declared aggregates.  Prior to 5.005,
this operation could on occasion misbehave.

=back

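The following sketch, which uses a hypothetical C<%config> hash and helper names, exercises items 1 and 3.  It localizes a single hash element so that callees see the temporary value, and it uses C<local $_> to keep a C<< while (<$fh>) >> loop from clobbering the caller's C<$_>.  It reads from an in-memory filehandle, which assumes Perl 5.8 or later.

```perl
    use strict;
    use warnings;

    our %config = (mode => 'normal');   # hypothetical package-level settings

    sub current_mode { return $config{mode} }

    sub with_debug {
        local $config{mode} = 'debug';  # only this one element is localized
        return current_mode();          # code called from here sees 'debug'
    }

    sub count_lines {
        my ($text) = @_;
        local $_;                       # protect the caller's $_ ...
        open my $fh, '<', \$text or die "open: $!";
        my $n = 0;
        while (<$fh>) { $n++ }          # ... from this implicit assignment
        return $n;
    }

    my $inside  = with_debug();         # 'debug'
    my $outside = current_mode();       # 'normal' again
    $_ = 'caller value';
    my $lines = count_lines("one\ntwo\n");
    print "$inside $outside $lines $_\n";    # debug normal 2 caller value
```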

=head2 Pass by Reference

If you want to pass more than one array or hash into a function--or
return them from it--and have them maintain their integrity, then
you're going to have to use an explicit pass-by-reference.  Before you
do that, you need to understand references as detailed in L<perlref>.
This section may not make much sense to you otherwise.

Here are a few simple examples.  First, let's pass in several arrays
to a function and have it C<pop> all of them, returning a new list
of all their former last elements:

    @tailings = popmany ( \@a, \@b, \@c, \@d );

    sub popmany {
        my $aref;
        my @retlist = ();
        foreach $aref ( @_ ) {
            push @retlist, pop @$aref;
        }
        return @retlist;
    }

Here's how you might write a function that returns a
list of keys occurring in all the hashes passed to it:

    @common = inter( \%foo, \%bar, \%joe );
    sub inter {
        my ($k, $href, %seen); # lexicals
        foreach $href (@_) {
            while ( defined($k = each %$href) ) {   # a key of "0" must not end the loop
                $seen{$k}++;
            }
        }
        return grep { $seen{$_} == @_ } keys %seen;
    }

So far, we're using just the normal list return mechanism.
What happens if you want to pass or return a hash?  Well,
if you're using only one of them, or you don't mind them
concatenating, then the normal calling convention is ok, although
a little expensive.

Where people get into trouble is here:

    (@a, @b) = func(@c, @d);
or
    (%a, %b) = func(%c, %d);

That syntax simply won't work.  It sets just C<@a> or C<%a> and
clears the C<@b> or C<%b>.  Plus the function didn't get passed
into two separate arrays or hashes: it got one long list in C<@_>,
as always.

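To see the flattening for yourself, here is a small sketch with a hypothetical pass-through C<func> that returns its arguments unchanged.  The function receives one list of five scalars, and in the list assignment the first array takes all of them.

```perl
    use strict;
    use warnings;

    sub func { return @_ }              # hypothetical: hands back whatever it received

    my @c = (1, 2);
    my @d = (3, 4, 5);

    my $count = () = func(@c, @d);      # func saw one flat list of 5 scalars
    my (@a, @b);
    (@a, @b) = func(@c, @d);            # the first array swallows everything
    print scalar(@a), " ", scalar(@b), "\n";    # 5 0
```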

If you can arrange for everyone to deal with this through references, it's
cleaner code, although not so nice to look at.  Here's a function that
takes two array references as arguments, returning the two array references
ordered by how many elements the arrays contain, larger first:

    ($aref, $bref) = func(\@c, \@d);
    print "@$aref has more than @$bref\n";
    sub func {
        my ($cref, $dref) = @_;
        if (@$cref > @$dref) {
            return ($cref, $dref);
        } else {
            return ($dref, $cref);
        }
    }

It turns out that you can actually do this also:

    (*a, *b) = func(\@c, \@d);
    print "@a has more than @b\n";
    sub func {
        local (*c, *d) = @_;
        if (@c > @d) {
            return (\@c, \@d);
        } else {
            return (\@d, \@c);
        }
    }

Here we're using the typeglobs to do symbol table aliasing.  It's
a tad subtle, though, and also won't work if you're using C<my>
variables, because only globals (even in disguise as C<local>s)
are in the symbol table.

If you're passing around filehandles, you could usually just use the bare
typeglob, like C<*STDOUT>, but typeglob references work, too.
For example:

    splutter(\*STDOUT);
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(\*STDIN);
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

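A lexical filehandle from C<open my $fh> already holds a typeglob reference, so it can be handed to C<splutter> and C<get_rec> exactly like C<\*STDOUT>.  This sketch writes to and then reads from an in-memory filehandle (a scalar opened as a file, available since Perl 5.8), which keeps it self-contained.

```perl
    use strict;
    use warnings;

    sub splutter {
        my $fh = shift;                 # a glob reference or a lexical filehandle
        print $fh "her um well a hmmm\n";
    }

    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;            # read one record in scalar context
    }

    my $buffer = '';
    open my $out, '>', \$buffer or die "open: $!";   # writes land in $buffer
    splutter($out);
    close $out;

    open my $in, '<', \$buffer or die "open: $!";
    my $rec = get_rec($in);
    print $rec;                         # her um well a hmmm
```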

If you're planning on generating new filehandles, you could do this.
Notice that it passes back just the bare C<*FH>, not a reference to it.

    sub openit {
        my $path = shift;
        local *FH;
        # three-argument open: $path is never parsed for a mode like ">" or "|"
        return open (FH, '<', $path) ? *FH : undef;
    }

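A usage sketch for C<openit>.  It assumes a three-argument C<open>, so the path is never interpreted as a mode such as C<< > >> or C<|>.  It also uses the standard File::Temp module, but only to create a scratch file for the demonstration.

```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    sub openit {
        my $path = shift;
        local *FH;
        # three-argument open: $path is never parsed for a mode like ">" or "|"
        return open(FH, '<', $path) ? *FH : undef;
    }

    my ($tmp, $name) = tempfile(UNLINK => 1);   # scratch file, removed at exit
    print $tmp "first line\n";
    close $tmp;

    my $fh = openit($name) or die "cannot open $name: $!";
    my $line = <$fh>;                           # reads through the returned glob
    my $missing = openit("$name.does-not-exist");   # undef: open failed
```

Each call gets its own handle, because the returned glob keeps the handle opened inside the call alive after C<local> restores C<*FH>.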

=head2 Prototypes

Perl supports a very limited kind of compile-time argument checking
using function prototyping.  If you declare

    sub mypush (\@@)

then C<mypush()> takes arguments exactly like C<push()> does.  The
function declaration must be visible at compile time.  The prototype
affects only interpretation of new-style calls to the function,
where new-style is defined as not using the C<&> character.  In
other words, if you call it like a built-in function, then it behaves
like a built-in function.  If you call it like an old-fashioned
subroutine, then it behaves like an old-fashioned subroutine.  It
naturally falls out from this rule that prototypes have no influence
on subroutine references like C<\&foo> or on indirect subroutine
calls like C<&{$subref}> or C<$subref-E<gt>()>.

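Here is one way C<mypush> might be written, as a minimal sketch.  Because of the C<\@> in the prototype, a new-style call receives a reference to the array as its first argument.  A call through C<&> bypasses the prototype, so the caller has to supply that reference explicitly.

```perl
    use strict;
    use warnings;

    # The prototype must be seen before any call to mypush is compiled.
    sub mypush (\@@) {
        my $aref = shift;           # \@ delivers a reference to the array argument
        push @$aref, @_;
        return scalar @$aref;       # new length, like push()
    }

    my @stack = (1, 2);
    mypush @stack, 3, 4;            # new-style call: looks just like push
    &mypush(\@stack, 5);            # & bypasses the prototype: pass the reference yourself
    print "@stack\n";               # 1 2 3 4 5
```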

Method calls are not influenced by prototypes either, because the
function to be called is indeterminate at compile time, since
the exact code called depends on inheritance.

Because the intent of this feature is primarily to let you define
subroutines that work like built-in functions, here are prototypes
for some other functions that parse almost exactly like the
corresponding built-in.

    Declared as             Called as

    sub mylink ($$)         mylink $old, $new
    sub myvec ($$$)         myvec $var, $offset, 1
    sub myindex ($$;$)      myindex &getstring, "substr"
    sub mysyswrite ($$$;$)  mysyswrite $buf, 0, length($buf) - $off, $off
    sub myreverse (@)       myreverse $a, $b, $c
    sub myjoin ($@)         myjoin ":", $a, $b, $c
    sub mypop (\@)          mypop @array
    sub mysplice (\@$$@)    mysplice @array, @array, 0, @pushme
    sub mykeys (\%)         mykeys %{$hashref}
    sub myopen (*;$)        myopen HANDLE, $name
    sub mypipe (**)         mypipe READHANDLE, WRITEHANDLE
    sub mygrep (&@)         mygrep { /foo/ } $a, $b, $c
    sub myrand ($)          myrand 42
    sub mytime ()           mytime

Any backslashed prototype character represents an actual argument
that absolutely must start with that character.  The value passed
as part of C<@_> will be a reference to the actual argument given
in the subroutine call, obtained by applying C<\> to that argument.

Unbackslashed prototype characters have special meanings.  Any
unbackslashed C<@> or C<%> eats all remaining arguments, and forces
list context.  An argument represented by C<$> forces scalar context.  An
C<&> requires an anonymous subroutine, which, if passed as the first
argument, does not require the C<sub> keyword or a subsequent comma.  A
C<*> allows the subroutine to accept a bareword, constant, scalar expression,
typeglob, or a reference to a typeglob in that slot.  The value will be
available to the subroutine either as a simple scalar, or (in the latter
two cases) as a reference to the typeglob.

A semicolon separates mandatory arguments from optional arguments.
It is redundant before C<@> or C<%>, which gobble up everything else.

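The following sketch puts the semicolon and C<$> rules to work.  C<myindex> follows the table entry above, with a body that simply delegates to the built-in C<index>.  C<count_it> is a hypothetical one-argument function; because C<$> forces scalar context, an array argument arrives as its element count.

```perl
    use strict;
    use warnings;

    # Two mandatory arguments, one optional.
    sub myindex ($$;$) {
        my ($str, $sub, $pos) = @_;
        return defined $pos ? index($str, $sub, $pos) : index($str, $sub);
    }

    # A lone $ imposes scalar context on whatever is passed.
    sub count_it ($) { return $_[0] }

    my @letters = qw(a b c);
    my $first  = myindex "banana", "an";        # 1
    my $second = myindex "banana", "an", 2;     # 3
    my $count  = count_it @letters;             # 3: the array in scalar context
```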

Note how the last three examples in the table above are treated
specially by the parser.  C<mygrep()> is parsed as a true list
operator, C<myrand()> is parsed as a true unary operator with unary
precedence the same as C<rand()>, and C<mytime()> is truly without
arguments, just like C<time()>.  That is, if you say

    mytime +2;

you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed
without a prototype.

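A small sketch of the parsing rules just described.  It uses hypothetical stand-ins for C<time> and C<rand> that return fixed values, so the result is predictable:

```perl
    use strict;
    use warnings;

    sub mytime () { return 1000 }         # hypothetical fixed clock
    sub myrand ($) { return $_[0] / 2 }   # hypothetical stand-in for rand()

    my $t    = mytime +2;                 # parsed as mytime() + 2
    my @list = (myrand 10, 20);           # unary: (myrand(10), 20)
    print "$t @list\n";                   # 1002 5 20
```

Without the C<($)> prototype, C<myrand 10, 20> would pass both numbers to C<myrand>.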

The interesting thing about C<&> is that you can generate new syntax with it,
provided it's in the initial position:

    sub try (&@) {
        my($try,$catch) = @_;
        eval { &$try };
        if ($@) {
            local $_ = $@;
            &$catch;
        }
    }
    sub catch (&) { $_[0] }

    try {
        die "phooey";
    } catch {
        /phooey/ and print "unphooey\n";
    };

That prints C<"unphooey">.  (Yes, there are still unresolved
issues having to do with visibility of C<@_>.  I'm ignoring that
question for the moment.  (But note that if we make C<@_> lexically
scoped, those anonymous subroutines can act like closures... (Gee,
is this sounding a little Lispish?  (Never mind.))))

And here's a reimplementation of the Perl C<grep> operator:

    sub mygrep (&@) {
        my $code = shift;
        my @result;
        foreach $_ (@_) {
            push(@result, $_) if &$code;
        }
        @result;
    }

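Calling C<mygrep> then looks just like calling C<grep>: the block comes first, with no C<sub> keyword and no comma after it.  This version matches C<mygrep> above, except that it uses C<foreach (@_)>, which is equivalent to C<foreach $_ (@_)>, and an explicit C<return>.

```perl
    use strict;
    use warnings;

    sub mygrep (&@) {
        my $code = shift;
        my @result;
        foreach (@_) {                        # aliases the global $_ to each item
            push(@result, $_) if &$code;      # the block sees the current $_
        }
        return @result;
    }

    my @hits = mygrep { /foo/ } qw(food bar afoot);
    print "@hits\n";                          # food afoot
```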

Some folks would prefer full alphanumeric prototypes.  Alphanumerics have
been intentionally left out of prototypes for the express purpose of
someday in the future adding named, formal parameters.  The current
mechanism's main goal is to let module writers provide better diagnostics
for module users.  Larry feels the notation is quite understandable to Perl
programmers, and that it will not intrude greatly upon the meat of the
module, nor make it harder to read.  The line noise is visually
encapsulated into a small pill that's easy to swallow.

It's probably best to prototype new functions, not retrofit prototyping
into older ones.  That's because you must be especially careful about
silent impositions of differing list versus scalar contexts.  For example,
if you decide that a function should take just one parameter, like this:

    sub func ($) {
        my $n = shift;
        print "you gave me $n\n";
    }

and someone has been calling it with an array or expression
returning a list:

    func(@foo);
    func( split /:/ );

Then you've just supplied an automatic C<scalar> in front of their
argument, which can be more than a bit surprising.  The old C<@foo>
which used to hold one thing doesn't get passed in.  Instead,
C<func()> now gets passed in a C<1>; that is, the number of elements
in C<@foo>.  And the C<split> gets called in scalar context, so in Perls
before 5.12 it starts scribbling on your C<@_> parameter list.  Ouch!

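A sketch of the surprise, with C<func> changed to return its message instead of printing it (only so that the result can be checked).  The array arrives as its element count, and C<split> in scalar context returns the number of fields it found.

```perl
    use strict;
    use warnings;

    sub func ($) {
        my $n = shift;
        return "you gave me $n";
    }

    my @foo = ('one thing');
    my $from_array = func(@foo);                  # "you gave me 1"
    my $from_split = func(split /:/, 'a:b:c');    # "you gave me 3"
```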

This is all very powerful, of course, and should be used only in moderation
to make the world a better place.

=head2 Constant Functions

Functions with a prototype of C<()> are potential candidates for
inlining.  If the result after optimization and constant folding
is either a constant or a lexically-scoped scalar which has no other
references, then it will be used in place of function calls made
without C<&>.  Calls made using C<&> are never inlined.  (See
F<constant.pm> for an easy way to declare most constants.)


The following functions would all be inlined:

    sub pi ()           { 3.14159 }             # Not exact, but close.
    sub PI ()           { 4 * atan2 1, 1 }      # As good as it gets,
                                                # and it's inlined, too!
    sub ST_DEV ()       { 0 }
    sub ST_INO ()       { 1 }

    sub FLAG_FOO ()     { 1 << 8 }
    sub FLAG_BAR ()     { 1 << 9 }
    sub FLAG_MASK ()    { FLAG_FOO | FLAG_BAR }

    sub OPT_BAZ ()      { not (0x1B58 & FLAG_MASK) }
    sub BAZ_VAL () {
        if (OPT_BAZ) {
            return 23;
        }
        else {
            return 42;
        }
    }

    sub N () { int(BAZ_VAL) / 3 }
    BEGIN {
        my $prod = 1;
        for (1..N) { $prod *= $_ }
        sub N_FACTORIAL () { $prod }
    }

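The same kinds of constants can be declared with the standard C<constant> pragma, which creates C<()>-prototyped subroutines for you.  This sketch assumes Perl 5.8 or later for the hash-reference form of C<use constant>.  A call through C<&> still works, but it is never inlined.

```perl
    use strict;
    use warnings;

    # constant.pm creates ()-prototyped subs that are candidates for inlining.
    use constant PI => 4 * atan2(1, 1);
    use constant {
        FLAG_FOO => 1 << 8,
        FLAG_BAR => 1 << 9,
    };
    use constant FLAG_MASK => FLAG_FOO | FLAG_BAR;

    my $mask = FLAG_MASK;              # 768
    my $also = &FLAG_MASK;             # same value, but never inlined
    printf "%d %d %.5f\n", $mask, $also, PI;    # 768 768 3.14159
```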

If you redefine a subroutine that was eligible for inlining, you'll get
a mandatory warning.  (You can use this warning to tell whether or not a
particular subroutine is considered constant.)  The warning is
considered severe enough not to be optional because previously compiled
invocations of the function will still be using the old value of the
function.  If you need to be able to redefine the subroutine, you need to
ensure that it isn't inlined, either by dropping the C<()> prototype
(which changes calling semantics, so beware) or by thwarting the
inlining mechanism in some other way, such as

    sub not_inlined () {
        23 if $];
    }

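The following sketch shows why the warning matters.  C<old_caller> is compiled while C<ST_DEV> is still the constant C<0>, so that value is folded into it.  Redefining C<ST_DEV> later (here through a string C<eval>, so the new definition is compiled at run time) triggers the mandatory warning but does not change C<old_caller>.  The warning text is only checked loosely, for the word "redefined".

```perl
    use strict;
    use warnings;

    sub ST_DEV () { 0 }
    sub old_caller { return ST_DEV }    # ST_DEV is inlined as 0 right here

    my @warnings;
    $SIG{__WARN__} = sub { push @warnings, @_ };

    # Redefine at run time; the string is compiled only now.
    eval 'sub ST_DEV () { 1 } 1' or die $@;

    my $inlined = old_caller();         # 0: compiled before the redefinition
    my $current = &ST_DEV;              # 1: & always calls the current sub
    print "$inlined $current\n";        # 0 1
    print for @warnings;                # the "Constant subroutine ... redefined" warning
```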

=head2 Overriding Built-in Functions

Many built-in functions may be overridden, though this should be tried
only occasionally and for good reason.  Typically this might be
done by a package attempting to emulate missing built-in functionality
on a non-Unix system.

Overriding may be done only by importing the name from a
module--ordinary predeclaration isn't good enough.  However, the
C<use subs> pragma lets you, in effect, predeclare subs
via the import syntax, and these names may then override built-in ones:

    use subs 'chdir', 'chroot', 'chmod', 'chown';
    chdir $somewhere;
    sub chdir { ... }

To unambiguously refer to the built-in form, precede the
built-in name with the special package qualifier C<CORE::>.  For example,
saying C<CORE::open()> always refers to the built-in C<open()>, even
if the current package has imported some other subroutine called
C<&open()> from elsewhere.  Even though it looks like a regular
function call, it isn't: you can't take a reference to it, such as
the incorrect C<\&CORE::open> might appear to produce.

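A minimal, self-contained sketch that combines both ideas.  C<use subs> lets a locally defined C<chdir> override the built-in, and the override itself reaches the real built-in through C<CORE::chdir>.  The logging array C<@visited> is hypothetical.

```perl
    use strict;
    use warnings;
    use subs 'chdir';                   # predeclare, so our chdir overrides the built-in

    my @visited;                        # hypothetical log of directories

    sub chdir {
        my $dir = shift;
        push @visited, $dir;
        return CORE::chdir($dir);       # CORE:: always means the real built-in
    }

    chdir '.' or die "chdir: $!";       # calls our chdir, not the built-in
    print "@visited\n";                 # .
```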

Library modules should not in general export built-in names like C<open>
or C<chdir> as part of their default C<@EXPORT> list, because these may
sneak into someone else's namespace and change the semantics unexpectedly.
Instead, if the module adds that name to C<@EXPORT_OK>, then it's
possible for a user to import the name explicitly, but not implicitly.
That is, they could say

    use Module 'open';

and it would import the C<open> override.  But if they said

    use Module;

they would get the default imports without overrides.

The foregoing mechanism for overriding built-ins is restricted, quite
deliberately, to the package that requests the import.  There is a second
method that is sometimes applicable when you wish to override a built-in
everywhere, without regard to namespace boundaries.  This is achieved by
importing a sub into the special namespace C<CORE::GLOBAL::>.  Here is an
example that quite brazenly replaces the C<glob> operator with something
that understands regular expressions.

    package REGlob;
    require Exporter;
    @ISA = 'Exporter';
    @EXPORT_OK = 'glob';

    sub import {
        my $pkg = shift;
        return unless @_;
        my $sym = shift;
        my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
        $pkg->export($where, $sym, @_);
    }

    sub glob {
        my $pat = shift;
        my @got;
        local *D;
        if (opendir D, '.') {
            @got = grep /$pat/, readdir D;
            closedir D;
        }
        return @got;
    }
    1;


And here's how it could be (ab)used:

    #use REGlob 'GLOBAL_glob';      # override glob() in ALL namespaces
    package Foo;
    use REGlob 'glob';              # override glob() in Foo:: only
    print for <^[a-z_]+\.pm\$>;     # show all pragmatic modules

The initial comment shows a contrived, even dangerous example.
By overriding C<glob> globally, you would be forcing the new (and
subversive) behavior for the C<glob> operator for I<every> namespace,
without the complete cognizance or cooperation of the modules that own
those namespaces.  Naturally, this should be done with extreme caution--if
it must be done at all.

The C<REGlob> example above does not implement all the support needed to
cleanly override perl's C<glob> operator.  The built-in C<glob> has
different behaviors depending on whether it appears in a scalar or list
context, but our C<REGlob> doesn't.  Indeed, many perl built-ins have such
context-sensitive behaviors, and these must be adequately supported by
a properly written override.  For a fully functional example of overriding
C<glob>, study the implementation of C<File::DosGlob> in the standard
library.

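As a sketch of a context-aware global override, here is a hypothetical frozen clock installed as C<CORE::GLOBAL::localtime>.  Unlike C<REGlob>, it checks C<wantarray> and passes the matching context on to C<CORE::localtime>, so it returns nine fields in list context and a date string in scalar context.  It has to be installed in a C<BEGIN> block, before any code that calls C<localtime> is compiled.  The exact string depends on your time zone.

```perl
    use strict;
    use warnings;

    BEGIN {
        no warnings 'once';
        # Installed at compile time, before any call to localtime() is compiled.
        # Hypothetical override: a clock frozen at the epoch.
        *CORE::GLOBAL::localtime = sub (;$) {
            my $when = @_ ? $_[0] : 0;
            return wantarray ? CORE::localtime($when)          # list: 9 fields
                             : scalar CORE::localtime($when);  # scalar: a string
        };
    }

    my @fields = localtime;             # list context
    my $stamp  = localtime;             # scalar context
    print scalar(@fields), " $stamp\n";
```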

=head2 Autoloading

If you call a subroutine that is undefined, you would ordinarily
get an immediate, fatal error complaining that the subroutine doesn't
exist.  (Likewise for subroutines being used as methods, when the
method doesn't exist in any base class of the class's package.)
However, if an C<AUTOLOAD> subroutine is defined in the package or
packages used to locate the original subroutine, then that
C<AUTOLOAD> subroutine is called with the arguments that would have
been passed to the original subroutine.  The fully qualified name
of the original subroutine magically appears in the global C<$AUTOLOAD>
variable of the same package as the C<AUTOLOAD> routine.  The name
is not passed as an ordinary argument because, er, well, just
because, that's why...

Many C<AUTOLOAD> routines load in a definition for the requested
subroutine using C<eval()>, then execute that subroutine using a special
form of C<goto()> that erases the stack frame of the C<AUTOLOAD> routine
without a trace.  (See the source to the standard module documented
in L<AutoLoader>, for example.)  But an C<AUTOLOAD> routine can
also just emulate the routine and never define it.  For example,
let's pretend that a function that wasn't defined should just invoke
C<system> with those arguments.  All you'd do is:

    sub AUTOLOAD {
        my $program = $AUTOLOAD;
        $program =~ s/.*:://;
        system($program, @_);
    }
    date();
    who('am', 'i');
    ls('-l');

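The define-then-C<goto> pattern can be sketched as follows.  This hypothetical C<AUTOLOAD> only answers to names of the form C<hello_SOMETHING>.  The first call installs a real subroutine under that name and jumps to it with C<goto &NAME>; later calls find the subroutine already defined and never reach C<AUTOLOAD>.  It builds a closure rather than calling C<eval> on a string, as a simpler stand-in for loading a stored definition.

```perl
    use strict;
    use warnings;

    our $AUTOLOAD;
    my @generated;                          # names AUTOLOAD had to define

    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;                  # strip the package qualifier
        return if $name eq 'DESTROY';       # never autoload destructors
        my ($who) = $name =~ /^hello_(\w+)$/
            or die "Undefined subroutine &$AUTOLOAD called";
        push @generated, $name;
        no strict 'refs';
        *$AUTOLOAD = sub { return "hello, $who" };   # install a real definition
        goto &$AUTOLOAD;                    # jump to it, leaving no AUTOLOAD frame
    }

    my $first  = hello_world();             # goes through AUTOLOAD
    my $second = hello_world();             # calls the installed sub directly
    print "$first / $second / @generated\n";    # hello, world / hello, world / hello_world
```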

In fact, if you predeclare functions you want to call that way, you don't
even need parentheses:

    use subs qw(date who ls);
    date;
    who "am", "i";
    ls '-l';

A more complete example of this is the standard Shell module, which
can treat undefined subroutine calls as calls to external programs.

Mechanisms are available to help module writers split their modules
into autoloadable files.  See the standard AutoLoader module
described in L<AutoLoader> and in L<AutoSplit>, the standard
SelfLoader module in L<SelfLoader>, and the document on adding C
functions to Perl code in L<perlxs>.

=head1 SEE ALSO

See L<perlref/"Function Templates"> for more about references and closures.
See L<perlxs> if you'd like to learn about calling C subroutines from Perl.
See L<perlembed> if you'd like to learn about calling Perl subroutines from C.
See L<perlmod> to learn about bundling up your functions in separate files.
See L<perlmodlib> to learn what library modules come standard on your system.
See L<perltoot> to learn how to make object method calls.