Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
3 | perlsub - Perl subroutines | |
4 | ||
5 | =head1 SYNOPSIS | |
6 | ||
7 | To declare subroutines: | |
8 | ||
cb1a09d0 AD |
9 | sub NAME; # A "forward" declaration. |
10 | sub NAME(PROTO); # ditto, but with prototypes | |
11 | ||
12 | sub NAME BLOCK # A declaration and a definition. | |
13 | sub NAME(PROTO) BLOCK # ditto, but with prototypes | |
a0d0e21e | 14 | |
748a9306 LW |
15 | To define an anonymous subroutine at runtime: |
16 | ||
5a964f20 TC |
17 | $subref = sub BLOCK; # no proto |
18 | $subref = sub (PROTO) BLOCK; # with proto | |
748a9306 | 19 | |
a0d0e21e LW |
20 | To import subroutines: |
21 | ||
22 | use PACKAGE qw(NAME1 NAME2 NAME3); | |
23 | ||
24 | To call subroutines: | |
25 | ||
5f05dabc | 26 | NAME(LIST); # & is optional with parentheses. |
54310121 | 27 | NAME LIST; # Parentheses optional if predeclared/imported. |
5a964f20 | 28 | &NAME; # Makes current @_ visible to called subroutine. |
a0d0e21e LW |
29 | |
30 | =head1 DESCRIPTION | |
31 | ||
cb1a09d0 AD |
32 | Like many languages, Perl provides for user-defined subroutines. These |
33 | may be located anywhere in the main program, loaded in from other files | |
34 | via the C<do>, C<require>, or C<use> keywords, or even generated on the | |
35 | fly using C<eval> or anonymous subroutines (closures). You can even call | |
c07a80fd | 36 | a function indirectly using a variable containing its name or a CODE reference |
5a964f20 | 37 | to it. |
cb1a09d0 AD |
38 | |
39 | The Perl model for function call and return values is simple: all | |
40 | functions are passed as parameters one single flat list of scalars, and | |
41 | all functions likewise return to their caller one single flat list of | |
42 | scalars. Any arrays or hashes in these call and return lists will | |
43 | collapse, losing their identities--but you may always use | |
44 | pass-by-reference instead to avoid this. Both call and return lists may | |
45 | contain as many or as few scalar elements as you'd like. (Often a | |
46 | function without an explicit return statement is called a subroutine, but | |
47 | there's really no difference from the language's perspective.) | |
48 | ||
49 | Any arguments passed to the routine come in as the array @_. Thus if you | |
50 | called a function with two arguments, those would be stored in C<$_[0]> | |
3fe9a6f1 | 51 | and C<$_[1]>. The array @_ is a local array, but its elements are |
52 | aliases for the actual scalar parameters. In particular, if an element | |
53 | C<$_[0]> is updated, the corresponding argument is updated (or an error | |
54 | occurs if it is not updatable). If an argument is an array or hash | |
55 | element which did not exist when the function was called, that element is | |
56 | created only when (and if) it is modified or if a reference to it is | |
57 | taken. (Some earlier versions of Perl created the element whether or not | |
58 | it was assigned to.) Note that assigning to the whole array @_ removes | |
59 | the aliasing, and does not update any arguments. | |
60 | ||
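The aliasing rules above can be seen in a minimal sketch. The helper names bump_first() and reset_args() are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: changes its first argument in place,
# because $_[0] is an alias for the caller's scalar.
sub bump_first { $_[0]++; return }

# Hypothetical helper: assigning to the whole of @_ replaces the
# aliases with fresh values, so the caller's variable is untouched.
sub reset_args { @_ = (99); return }

my $n = 1;
bump_first($n);      # $n is now 2
reset_args($n);      # $n is still 2
print "n = $n\n";    # prints "n = 2"
```

Copying @_ into my() variables, as shown later in this section, is the usual way to avoid the in-place behavior altogether.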
61 | The return value of the subroutine is the value of the last expression | |
3e3baf6d | 62 | evaluated. Alternatively, a return statement may be used to exit the |
54310121 | 63 | subroutine, optionally specifying the returned value, which will be |
64 | evaluated in the appropriate context (list, scalar, or void) depending | |
65 | on the context of the subroutine call. If you specify no return value, | |
66 | the subroutine will return an empty list in a list context, an undefined | |
67 | value in a scalar context, or nothing in a void context. If you return | |
68 | one or more arrays and/or hashes, these will be flattened together into | |
69 | one large indistinguishable list. | |
cb1a09d0 AD |
70 | |
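As a rough illustration of the return rules just described, the sketch below uses two hypothetical subroutines, double() and nothing(), to show an implicit return value and a bare return in list and scalar context.

```perl
use strict;
use warnings;

sub double  { my ($n) = @_; $n * 2 }   # returns the last expression evaluated
sub nothing { return }                 # bare return, no value specified

my $twice  = double(21);   # 42
my @list   = nothing();    # empty list in list context
my $scalar = nothing();    # undefined value in scalar context

print "$twice\n";                                    # prints 42
print scalar(@list), "\n";                           # prints 0
print defined($scalar) ? "defined\n" : "undef\n";    # prints undef
```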
71 | Perl does not have named formal parameters, but in practice all you do is | |
72 | assign to a my() list of these. Any variables you use in the function | |
73 | that aren't declared private are global variables. For the gory details | |
1fef88e7 | 74 | on creating private variables, see |
6d28dffb | 75 | L<"Private Variables via my()"> and L<"Temporary Values via local()">. |
76 | To create protected environments for a set of functions in a separate | |
77 | package (and probably a separate file), see L<perlmod/"Packages">. | |
a0d0e21e LW |
78 | |
79 | Example: | |
80 | ||
cb1a09d0 AD |
81 | sub max { |
82 | my $max = shift(@_); | |
a0d0e21e LW |
83 | foreach $foo (@_) { |
84 | $max = $foo if $max < $foo; | |
85 | } | |
cb1a09d0 | 86 | return $max; |
a0d0e21e | 87 | } |
cb1a09d0 | 88 | $bestday = max($mon,$tue,$wed,$thu,$fri); |
a0d0e21e LW |
89 | |
90 | Example: | |
91 | ||
92 | # get a line, combining continuation lines | |
93 | # that start with whitespace | |
94 | ||
95 | sub get_line { | |
cb1a09d0 | 96 | $thisline = $lookahead; # GLOBAL VARIABLES!! |
54310121 | 97 | LINE: while (defined($lookahead = <STDIN>)) { |
a0d0e21e LW |
98 | if ($lookahead =~ /^[ \t]/) { |
99 | $thisline .= $lookahead; | |
100 | } | |
101 | else { | |
102 | last LINE; | |
103 | } | |
104 | } | |
105 | $thisline; | |
106 | } | |
107 | ||
108 | $lookahead = <STDIN>; # get first line | |
109 | while ($_ = get_line()) { | |
110 | ... | |
111 | } | |
112 | ||
113 | Use array assignment to a local list to name your formal arguments: | |
114 | ||
115 | sub maybeset { | |
116 | my($key, $value) = @_; | |
cb1a09d0 | 117 | $Foo{$key} = $value unless $Foo{$key}; |
a0d0e21e LW |
118 | } |
119 | ||
cb1a09d0 | 120 | This also has the effect of turning call-by-reference into call-by-value, |
5f05dabc | 121 | because the assignment copies the values. Otherwise a function is free to |
1fef88e7 | 122 | do in-place modifications of @_ and change its caller's values. |
cb1a09d0 AD |
123 | |
124 | upcase_in($v1, $v2); # this changes $v1 and $v2 | |
125 | sub upcase_in { | |
54310121 | 126 | for (@_) { tr/a-z/A-Z/ } |
127 | } | |
cb1a09d0 AD |
128 | |
129 | You aren't allowed to modify constants in this way, of course. If an | |
130 | argument were actually literal and you tried to change it, you'd take a | |
131 | (presumably fatal) exception. For example, this won't work: | |
132 | ||
133 | upcase_in("frederick"); | |
134 | ||
54310121 | 135 | It would be much safer if the upcase_in() function |
cb1a09d0 AD |
136 | were written to return a copy of its parameters instead |
137 | of changing them in place: | |
138 | ||
139 | ($v3, $v4) = upcase($v1, $v2); # this doesn't change $v1 and $v2 | |
140 | sub upcase { | |
54310121 | 141 | return unless defined wantarray; # void context, do nothing |
cb1a09d0 | 142 | my @parms = @_; |
54310121 | 143 | for (@parms) { tr/a-z/A-Z/ } |
c07a80fd | 144 | return wantarray ? @parms : $parms[0]; |
54310121 | 145 | } |
cb1a09d0 AD |
146 | |
147 | Notice how this (unprototyped) function doesn't care whether it was passed | |
148 | real scalars or arrays. Perl will see everything as one big long flat @_ | |
149 | parameter list. This is one area where Perl's simple | |
150 | argument-passing style shines. The upcase() function would work perfectly | |
151 | well without changing the upcase() definition even if we fed it things | |
152 | like this: | |
153 | ||
154 | @newlist = upcase(@list1, @list2); | |
155 | @newlist = upcase( split /:/, $var ); | |
156 | ||
157 | Do not, however, be tempted to do this: | |
158 | ||
159 | (@a, @b) = upcase(@list1, @list2); | |
160 | ||
161 | This fails because, like the flat incoming parameter list, the return list is also | |
162 | flat. All you have managed to do here is store everything in @a and | |
7b8d334a | 163 | make @b an empty list. See L<Pass by Reference> for alternatives. |
cb1a09d0 | 164 | |
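The pitfall can be checked with a small sketch. The function upcase_copy() and the array names here are hypothetical stand-ins, not the upcase() defined above.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns upper-cased copies of its arguments.
sub upcase_copy { return map { uc $_ } @_ }

my @list1 = qw(a b);
my @list2 = qw(c);

my (@first, @second);
(@first, @second) = upcase_copy(@list1, @list2);

# @first swallows the whole flat return list; @second gets nothing.
print scalar(@first), " ", scalar(@second), "\n";   # prints "3 0"
```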
5f05dabc | 165 | A subroutine may be called using the "&" prefix. The "&" is optional |
166 | in modern Perls, and so are the parentheses if the subroutine has been | |
54310121 | 167 | predeclared. (Note, however, that the "&" is I<NOT> optional when |
5f05dabc | 168 | you're just naming the subroutine, such as when it's used as an |
169 | argument to defined() or undef(). Nor is it optional when you want to | |
170 | do an indirect subroutine call with a subroutine name or reference | |
171 | using the C<&$subref()> or C<&{$subref}()> constructs. See L<perlref> | |
172 | for more on that.) | |
a0d0e21e LW |
173 | |
174 | Subroutines may be called recursively. If a subroutine is called using | |
cb1a09d0 AD |
175 | the "&" form, the argument list is optional, and if omitted, no @_ array is |
176 | set up for the subroutine: the @_ array at the time of the call is | |
177 | visible to the subroutine instead. This is an efficiency mechanism that | |
178 | new users may wish to avoid. | |
a0d0e21e LW |
179 | |
180 | &foo(1,2,3); # pass three arguments | |
181 | foo(1,2,3); # the same | |
182 | ||
183 | foo(); # pass a null list | |
184 | &foo(); # the same | |
a0d0e21e | 185 | |
cb1a09d0 | 186 | &foo; # foo() gets current args, like foo(@_) !! |
54310121 | 187 | foo; # like foo() IFF sub foo predeclared, else "foo" |
cb1a09d0 | 188 | |
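To make the difference between the call forms above concrete, here is a minimal sketch. The names count_args() and wrapper() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it can see.
sub count_args { return scalar @_ }

sub wrapper {
    my $shared = &count_args;     # no argument list: callee sees wrapper's @_
    my $fresh  = &count_args();   # empty parentheses: callee gets an empty @_
    return ($shared, $fresh);
}

my ($shared, $fresh) = wrapper(1, 2, 3);
print "$shared $fresh\n";   # prints "3 0"
```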
c07a80fd | 189 | Not only does the "&" form make the argument list optional, but it also |
190 | disables any prototype checking on the arguments you do provide. This | |
191 | is partly for historical reasons, and partly for having a convenient way | |
192 | to cheat if you know what you're doing. See the section on Prototypes below. | |
193 | ||
5a964f20 TC |
194 | Functions whose names are in all upper case are reserved to the Perl core, |
195 | just as are modules whose names are in all lower case. Naming a function in | |
196 | all capitals is a loosely-held convention meaning it will be called | |
197 | indirectly by the run-time system itself. Functions that do special, | |
198 | pre-defined things include BEGIN, END, AUTOLOAD, and DESTROY--plus all the | |
199 | functions mentioned in L<perltie>. The 5.005 release adds INIT | |
200 | to this list. | |
201 | ||
cb1a09d0 AD |
202 | =head2 Private Variables via my() |
203 | ||
204 | Synopsis: | |
205 | ||
206 | my $foo; # declare $foo lexically local | |
207 | my (@wid, %get); # declare list of variables local | |
208 | my $foo = "flurp"; # declare $foo lexical, and init it | |
209 | my @oof = @bar; # declare @oof lexical, and init it | |
210 | ||
211 | A "my" declares the listed variables to be confined (lexically) to the | |
55497cff | 212 | enclosing block, conditional (C<if/unless/elsif/else>), loop |
213 | (C<for/foreach/while/until/continue>), subroutine, C<eval>, or | |
214 | C<do/require/use>'d file. If more than one value is listed, the list | |
5f05dabc | 215 | must be placed in parentheses. All listed elements must be legal lvalues. |
55497cff | 216 | Only alphanumeric identifiers may be lexically scoped--magical |
217 | builtins like $/ must currently be localized with "local" instead. | |
cb1a09d0 | 218 | |
5a964f20 | 219 | Unlike dynamic variables created by the "local" operator, lexical |
cb1a09d0 AD |
220 | variables declared with "my" are totally hidden from the outside world, |
221 | including any called subroutines (even if it's the same subroutine called | |
222 | from itself or elsewhere--every call gets its own copy). | |
223 | ||
5a964f20 TC |
224 | This doesn't mean that a my() variable declared in a statically |
225 | I<enclosing> lexical scope would be invisible. Only the dynamic scopes | |
226 | are cut off. For example, the bumpx() function below has access to the | |
227 | lexical $x variable because both the my and the sub occurred at the same | |
228 | scope, presumably the file scope. | |
229 | ||
230 | my $x = 10; | |
231 | sub bumpx { $x++ } | |
232 | ||
cb1a09d0 AD |
233 | (An eval(), however, can see the lexical variables of the scope it is |
234 | being evaluated in so long as the names aren't hidden by declarations within | |
235 | the eval() itself. See L<perlref>.) | |
236 | ||
237 | The parameter list to my() may be assigned to if desired, which allows you | |
238 | to initialize your variables. (If no initializer is given for a | |
239 | particular variable, it is created with the undefined value.) Commonly | |
240 | this is used to name the parameters to a subroutine. Examples: | |
241 | ||
242 | $arg = "fred"; # "global" variable | |
243 | $n = cube_root(27); | |
244 | print "$arg thinks the root is $n\n"; | |
245 | fred thinks the root is 3 | |
246 | ||
247 | sub cube_root { | |
248 | my $arg = shift; # name doesn't matter | |
249 | $arg **= 1/3; | |
250 | return $arg; | |
54310121 | 251 | } |
cb1a09d0 AD |
252 | |
253 | The "my" is simply a modifier on something you might assign to. So when | |
254 | you do assign to the variables in its argument list, the "my" doesn't | |
6cc33c6d | 255 | change whether those variables are viewed as a scalar or an array. So |
cb1a09d0 | 256 | |
5a964f20 | 257 | my ($foo) = <STDIN>; # WRONG? |
cb1a09d0 AD |
258 | my @FOO = <STDIN>; |
259 | ||
5f05dabc | 260 | both supply a list context to the right-hand side, while |
cb1a09d0 AD |
261 | |
262 | my $foo = <STDIN>; | |
263 | ||
5f05dabc | 264 | supplies a scalar context. But the following declares only one variable: |
748a9306 | 265 | |
5a964f20 | 266 | my $foo, $bar = 1; # WRONG |
748a9306 | 267 | |
cb1a09d0 | 268 | That has the same effect as |
748a9306 | 269 | |
cb1a09d0 AD |
270 | my $foo; |
271 | $bar = 1; | |
a0d0e21e | 272 | |
cb1a09d0 AD |
273 | The declared variable is not introduced (is not visible) until after |
274 | the current statement. Thus, | |
275 | ||
276 | my $x = $x; | |
277 | ||
54310121 | 278 | can be used to initialize the new $x with the value of the old $x, and |
cb1a09d0 AD |
279 | the expression |
280 | ||
281 | my $x = 123 and $x == 123 | |
282 | ||
283 | is false unless the old $x happened to have the value 123. | |
284 | ||
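A short sketch of this rule, assuming an outer lexical $x and a hypothetical $inner_value used only to carry the result out of the block:

```perl
use strict;
use warnings;

my $x = 10;
my $inner_value;
{
    # The new $x is not visible until the next statement, so the
    # $x on the right-hand side is still the outer one.
    my $x = $x + 1;
    $inner_value = $x;
}
print "$inner_value $x\n";   # prints "11 10"
```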
55497cff | 285 | Lexical scopes of control structures are not bounded precisely by the |
286 | braces that delimit their controlled blocks; control expressions are | |
287 | part of the scope, too. Thus in the loop | |
288 | ||
54310121 | 289 | while (defined(my $line = <>)) { |
55497cff | 290 | $line = lc $line; |
291 | } continue { | |
292 | print $line; | |
293 | } | |
294 | ||
295 | the scope of $line extends from its declaration throughout the rest of | |
296 | the loop construct (including the C<continue> clause), but not beyond | |
297 | it. Similarly, in the conditional | |
298 | ||
299 | if ((my $answer = <STDIN>) =~ /^yes$/i) { | |
300 | user_agrees(); | |
301 | } elsif ($answer =~ /^no$/i) { | |
302 | user_disagrees(); | |
303 | } else { | |
304 | chomp $answer; | |
305 | die "'$answer' is neither 'yes' nor 'no'"; | |
306 | } | |
307 | ||
308 | the scope of $answer extends from its declaration throughout the rest | |
309 | of the conditional (including C<elsif> and C<else> clauses, if any), | |
310 | but not beyond it. | |
311 | ||
312 | (None of the foregoing applies to C<if/unless> or C<while/until> | |
313 | modifiers appended to simple statements. Such modifiers are not | |
314 | control structures and have no effect on scoping.) | |
315 | ||
5f05dabc | 316 | The C<foreach> loop defaults to scoping its index variable dynamically |
55497cff | 317 | (in the manner of C<local>; see below). However, if the index |
318 | variable is prefixed with the keyword "my", then it is lexically | |
319 | scoped instead. Thus in the loop | |
320 | ||
321 | for my $i (1, 2, 3) { | |
322 | some_function(); | |
323 | } | |
324 | ||
325 | the scope of $i extends to the end of the loop, but not beyond it, and | |
326 | so the value of $i is unavailable in some_function(). | |
327 | ||
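The two kinds of loop index can be compared in a small sketch. The package variable $i, the helper peek(), and the array @seen are hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

our $i = 'before';
my @seen;

# Hypothetical helper: reads the package variable $i.
sub peek { return $i }

# Global index: dynamically scoped, so peek() sees each loop value.
for $i (1, 2) { push @seen, peek() }

# Lexical index: peek() cannot see it and reads the package $i instead.
for my $i (3, 4) { push @seen, peek() }

print "@seen\n";   # prints "1 2 before before"
print "$i\n";      # prints "before": the package $i was restored
```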
cb1a09d0 AD |
328 | Some users may wish to encourage the use of lexically scoped variables. |
329 | As an aid to catching implicit references to package variables, | |
330 | if you say | |
331 | ||
332 | use strict 'vars'; | |
333 | ||
334 | then any variable reference from there to the end of the enclosing | |
335 | block must either refer to a lexical variable, or must be fully | |
336 | qualified with the package name. A compilation error results | |
337 | otherwise. An inner block may countermand this with S<"no strict 'vars'">. | |
338 | ||
339 | A my() has both a compile-time and a run-time effect. At compile time, | |
340 | the compiler takes notice of it; the principal usefulness of this is to | |
7bac28a0 | 341 | quiet C<use strict 'vars'>. The actual initialization is delayed until |
342 | run time, so it gets executed appropriately; every time through a loop, | |
343 | for example. | |
cb1a09d0 AD |
344 | |
345 | Variables declared with "my" are not part of any package and are therefore | |
346 | never fully qualified with the package name. In particular, you're not | |
347 | allowed to try to make a package variable (or other global) lexical: | |
348 | ||
349 | my $pack::var; # ERROR! Illegal syntax | |
350 | my $_; # also illegal (currently) | |
351 | ||
352 | In fact, dynamic variables (also known as package or global variables) | |
353 | are still accessible using the fully qualified :: notation even while a | |
354 | lexical of the same name is also visible: | |
355 | ||
356 | package main; | |
357 | local $x = 10; | |
358 | my $x = 20; | |
359 | print "$x and $::x\n"; | |
360 | ||
361 | That will print out 20 and 10. | |
362 | ||
5a964f20 TC |
363 | You may declare "my" variables at the outermost scope of a file to hide |
364 | any such identifiers totally from the outside world. This is similar | |
6d28dffb | 365 | to C's static variables at the file level. To do this with a subroutine |
5a964f20 TC |
366 | requires the use of a closure (anonymous function with lexical access). |
367 | If a block (such as an eval(), function, or C<package>) wants to create | |
368 | a private subroutine that cannot be called from outside that block, | |
369 | it can declare a lexical variable containing an anonymous sub reference: | |
cb1a09d0 AD |
370 | |
371 | my $secret_version = '1.001-beta'; | |
372 | my $secret_sub = sub { print $secret_version }; | |
373 | &$secret_sub(); | |
374 | ||
375 | As long as the reference is never returned by any function within the | |
5f05dabc | 376 | module, no outside module can see the subroutine, because its name is not in |
cb1a09d0 AD |
377 | any package's symbol table. Remember that it's not I<REALLY> called |
378 | $some_pack::secret_version or anything; it's just $secret_version, | |
379 | unqualified and unqualifiable. | |
380 | ||
381 | This does not work with object methods, however; all object methods have | |
382 | to be in the symbol table of some package to be found. | |
383 | ||
5a964f20 TC |
384 | =head2 Persistent Private Variables |
385 | ||
386 | Just because a lexical variable is lexically (also called statically) | |
387 | scoped to its enclosing block, eval, or do FILE, this doesn't mean that | |
388 | within a function it works like a C static. It normally works more | |
389 | like a C auto, but with implicit garbage collection. | |
390 | ||
391 | Unlike local variables in C or C++, Perl's lexical variables don't | |
392 | necessarily get recycled just because their scope has exited. | |
393 | If something more permanent is still aware of the lexical, it will | |
394 | stick around. So long as something else references a lexical, that | |
395 | lexical won't be freed--which is as it should be. You wouldn't want | |
396 | memory being freed until you were done using it, or kept around once you | |
397 | were done. Automatic garbage collection takes care of this for you. | |
398 | ||
399 | This means that you can pass back or save away references to lexical | |
400 | variables, whereas to return a pointer to a C auto is a grave error. | |
401 | It also gives us a way to simulate C's function statics. Here's a | |
402 | mechanism for giving a function private variables with both lexical | |
403 | scoping and a static lifetime. If you do want to create something like | |
404 | C's static variables, just enclose the whole function in an extra block, | |
405 | and put the static variable outside the function but in the block. | |
cb1a09d0 AD |
406 | |
407 | { | |
54310121 | 408 | my $secret_val = 0; |
cb1a09d0 AD |
409 | sub gimme_another { |
410 | return ++$secret_val; | |
54310121 | 411 | } |
412 | } | |
cb1a09d0 AD |
413 | # $secret_val now becomes unreachable by the outside |
414 | # world, but retains its value between calls to gimme_another | |
415 | ||
54310121 | 416 | If this function is being sourced in from a separate file |
cb1a09d0 | 417 | via C<require> or C<use>, then this is probably just fine. If it's |
54310121 | 418 | all in the main program, you'll need to arrange for the my() |
cb1a09d0 | 419 | to be executed early, either by putting the whole block above |
93e318e6 | 420 | your main program, or more likely, merely placing a BEGIN |
cb1a09d0 AD |
421 | sub around it to make sure it gets executed before your program |
422 | starts to run: | |
423 | ||
424 | sub BEGIN { | |
54310121 | 425 | my $secret_val = 0; |
cb1a09d0 AD |
426 | sub gimme_another { |
427 | return ++$secret_val; | |
54310121 | 428 | } |
429 | } | |
cb1a09d0 AD |
430 | |
431 | See L<perlmod> about the BEGIN function. | |
432 | ||
5a964f20 TC |
433 | If declared at the outermost scope, the file scope, then lexicals work |
434 | somewhat like C's file statics. They are available to all functions in | |
435 | that same file declared below them, but are inaccessible from outside of | |
436 | the file. This is sometimes used in modules to create private variables | |
437 | for the whole module. | |
438 | ||
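Because a lexical stays alive for as long as something still refers to it, a subroutine can hand back a closure that carries its own private state. The following is only a sketch of that idea; make_counter() and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical generator: each call creates a fresh lexical $count,
# and the returned closure keeps that particular $count alive.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_five = make_counter(5);
my $from_ten  = make_counter(10);

my @values = ($from_five->(), $from_five->(), $from_ten->());
print "@values\n";   # prints "5 6 10"
```

Unlike the gimme_another() approach, each closure here has an independent counter, which may or may not be what you want.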
cb1a09d0 AD |
439 | =head2 Temporary Values via local() |
440 | ||
441 | B<NOTE>: In general, you should be using "my" instead of "local", because | |
6d28dffb | 442 | it's faster and safer. Exceptions to this include the global punctuation |
cb1a09d0 AD |
443 | variables, filehandles and formats, and direct manipulation of the Perl |
444 | symbol table itself. Format variables often use "local" though, as do | |
445 | other variables whose current value must be visible to called | |
446 | subroutines. | |
447 | ||
448 | Synopsis: | |
449 | ||
450 | local $foo; # declare $foo dynamically local | |
451 | local (@wid, %get); # declare list of variables local | |
452 | local $foo = "flurp"; # declare $foo dynamic, and init it | |
453 | local @oof = @bar; # declare @oof dynamic, and init it | |
454 | ||
455 | local *FH; # localize $FH, @FH, %FH, &FH ... | |
456 | local *merlyn = *randal; # now $merlyn is really $randal, plus | |
457 | # @merlyn is really @randal, etc | |
458 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal | |
54310121 | 459 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc |
cb1a09d0 | 460 | |
5a964f20 TC |
461 | A local() modifies its listed variables to be "local" to the enclosing |
462 | block, eval, or C<do FILE>--and to I<any subroutine called from within that block>. | |
463 | A local() just gives temporary values to global (meaning package) | |
464 | variables. It does not create a local variable. This is known as | |
465 | dynamic scoping. Lexical scoping is done with "my", which works more | |
466 | like C's auto declarations. | |
cb1a09d0 AD |
467 | |
468 | If more than one variable is given to local(), they must be placed in | |
5f05dabc | 469 | parentheses. All listed elements must be legal lvalues. This operator works |
cb1a09d0 | 470 | by saving the current values of those variables in its argument list on a |
5f05dabc | 471 | hidden stack and restoring them upon exiting the block, subroutine, or |
cb1a09d0 AD |
472 | eval. This means that called subroutines can also reference the local |
473 | variable, but not the global one. The argument list may be assigned to if | |
474 | desired, which allows you to initialize your local variables. (If no | |
475 | initializer is given for a particular variable, it is created with an | |
476 | undefined value.) Commonly this is used to name the parameters to a | |
477 | subroutine. Examples: | |
478 | ||
479 | for $i ( 0 .. 9 ) { | |
480 | $digits{$i} = $i; | |
54310121 | 481 | } |
cb1a09d0 | 482 | # assume this function uses global %digits hash |
54310121 | 483 | parse_num(); |
cb1a09d0 AD |
484 | |
485 | # now temporarily add to %digits hash | |
486 | if ($base12) { | |
487 | # (NOTE: not claiming this is efficient!) | |
488 | local %digits = (%digits, 't' => 10, 'e' => 11); | |
489 | parse_num(); # parse_num gets this new %digits! | |
490 | } | |
491 | # old %digits restored here | |
492 | ||
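The same save-and-restore behavior can be seen in a smaller, self-contained sketch. The package variable $depth and the subroutines current_depth() and nested() are hypothetical.

```perl
use strict;
use warnings;

our $depth = 0;

# Hypothetical helper: reads whatever value $depth has right now.
sub current_depth { return $depth }

sub nested {
    local $depth = $depth + 1;   # temporary value, visible to called subs
    return current_depth();
}

my $inside = nested();          # 1: the callee saw the localized value
my $after  = current_depth();   # 0: the old value was restored on exit
print "$inside $after\n";       # prints "1 0"
```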
1fef88e7 | 493 | Because local() is a run-time command, it gets executed every time |
cb1a09d0 AD |
494 | through a loop. In releases of Perl previous to 5.0, this used more stack |
495 | storage each time until the loop was exited. Perl now reclaims the space | |
496 | each time through, but it's still more efficient to declare your variables | |
497 | outside the loop. | |
498 | ||
499 | A local is simply a modifier on an lvalue expression. When you assign to | |
500 | a localized variable, the local doesn't change whether its list is viewed | |
501 | as a scalar or an array. So | |
502 | ||
503 | local($foo) = <STDIN>; | |
504 | local @FOO = <STDIN>; | |
505 | ||
5f05dabc | 506 | both supply a list context to the right-hand side, while |
cb1a09d0 AD |
507 | |
508 | local $foo = <STDIN>; | |
509 | ||
510 | supplies a scalar context. | |
511 | ||
3e3baf6d TB |
512 | A note about C<local()> and composite types is in order. Something |
513 | like C<local(%foo)> works by temporarily placing a brand new hash in | |
514 | the symbol table. The old hash is left alone, but is hidden "behind" | |
515 | the new one. | |
516 | ||
517 | This means the old variable is completely invisible via the symbol | |
518 | table (i.e. the hash entry in the C<*foo> typeglob) for the duration | |
519 | of the dynamic scope within which the C<local()> was seen. This | |
520 | has the effect of allowing one to temporarily occlude any magic on | |
521 | composite types. For instance, this will briefly alter a tied | |
522 | hash to some other implementation: | |
523 | ||
524 | tie %ahash, 'APackage'; | |
525 | [...] | |
526 | { | |
527 | local %ahash; | |
528 | tie %ahash, 'BPackage'; | |
529 | [..called code will see %ahash tied to 'BPackage'..] | |
530 | { | |
531 | local %ahash; | |
532 | [..%ahash is a normal (untied) hash here..] | |
533 | } | |
534 | } | |
535 | [..%ahash back to its initial tied self again..] | |
536 | ||
537 | As another example, a custom implementation of C<%ENV> might look | |
538 | like this: | |
539 | ||
540 | { | |
541 | local %ENV; | |
542 | tie %ENV, 'MyOwnEnv'; | |
543 | [..do your own fancy %ENV manipulation here..] | |
544 | } | |
545 | [..normal %ENV behavior here..] | |
546 | ||
6ee623d5 GS |
547 | It's also worth taking a moment to explain what happens when you |
548 | localize a member of a composite type (i.e. an array or hash element). | |
549 | In this case, the element is localized I<by name>. This means that | |
550 | when the scope of the C<local()> ends, the saved value will be | |
551 | restored to the hash element whose key was named in the C<local()>, or | |
552 | the array element whose index was named in the C<local()>. If that | |
553 | element was deleted while the C<local()> was in effect (e.g. by a | |
554 | C<delete()> from a hash or a C<shift()> of an array), it will spring | |
555 | back into existence, possibly extending an array and filling in the | |
556 | skipped elements with C<undef>. For instance, if you say | |
557 | ||
558 | %hash = ( 'This' => 'is', 'a' => 'test' ); | |
559 | @ary = ( 0..5 ); | |
560 | { | |
561 | local($ary[5]) = 6; | |
562 | local($hash{'a'}) = 'drill'; | |
563 | while (my $e = pop(@ary)) { | |
564 | print "$e . . .\n"; | |
565 | last unless $e > 3; | |
566 | } | |
567 | if (@ary) { | |
568 | $hash{'only a'} = 'test'; | |
569 | delete $hash{'a'}; | |
570 | } | |
571 | } | |
572 | print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n"; | |
573 | print "The array has ",scalar(@ary)," elements: ", | |
574 | join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n"; | |
575 | ||
576 | Perl will print | |
577 | ||
578 | 6 . . . | |
579 | 4 . . . | |
580 | 3 . . . | |
581 | This is a test only a test. | |
582 | The array has 6 elements: 0, 1, 2, undef, undef, 5 | |
583 | ||
cb1a09d0 AD |
584 | =head2 Passing Symbol Table Entries (typeglobs) |
585 | ||
586 | [Note: The mechanism described in this section was originally the only | |
587 | way to simulate pass-by-reference in older versions of Perl. While it | |
588 | still works fine in modern versions, the new reference mechanism is | |
589 | generally easier to work with. See below.] | |
a0d0e21e LW |
590 | |
591 | Sometimes you don't want to pass the value of an array to a subroutine | |
592 | but rather the name of it, so that the subroutine can modify the global | |
593 | copy of it rather than working with a local copy. In perl you can | |
cb1a09d0 | 594 | refer to all objects of a particular name by prefixing the name |
5f05dabc | 595 | with a star: C<*foo>. This is often known as a "typeglob", because the |
a0d0e21e LW |
596 | star on the front can be thought of as a wildcard match for all the |
597 | funny prefix characters on variables and subroutines and such. | |
598 | ||
55497cff | 599 | When evaluated, the typeglob produces a scalar value that represents |
5f05dabc | 600 | all the objects of that name, including any filehandle, format, or |
a0d0e21e LW |
601 | subroutine. When assigned to, it causes the name mentioned to refer to |
602 | whatever "*" value was assigned to it. Example: | |
603 | ||
604 | sub doubleary { | |
605 | local(*someary) = @_; | |
606 | foreach $elem (@someary) { | |
607 | $elem *= 2; | |
608 | } | |
609 | } | |
610 | doubleary(*foo); | |
611 | doubleary(*bar); | |
612 | ||
613 | Note that scalars are already passed by reference, so you can modify | |
614 | scalar arguments without using this mechanism by referring explicitly | |
1fef88e7 | 615 | to C<$_[0]> etc. You can modify all the elements of an array by passing |
a0d0e21e | 616 | all the elements as scalars, but you have to use the * mechanism (or |
5f05dabc | 617 | the equivalent reference mechanism) to push, pop, or change the size of |
a0d0e21e LW |
618 | an array. It will certainly be faster to pass the typeglob (or reference). |
619 | ||
620 | Even if you don't want to modify an array, this mechanism is useful for | |
5f05dabc | 621 | passing multiple arrays in a single LIST, because normally the LIST |
a0d0e21e | 622 | mechanism will merge all the array values so that you can't extract out |
55497cff | 623 | the individual arrays. For more on typeglobs, see |
2ae324a7 | 624 | L<perldata/"Typeglobs and Filehandles">. |
cb1a09d0 | 625 | |
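As a sketch of passing more than one array in a single LIST with this mechanism, the example below aliases two package arrays inside a subroutine. All names (@x, @y, @left, @right, sizes()) are hypothetical, and the arrays must be package variables rather than my() variables for typeglobs to reach them.

```perl
use strict;
use warnings;

our (@left, @right);     # package arrays that the typeglobs will alias
our @x = (1, 2, 3);
our @y = (4, 5);

sub sizes {
    local (*left, *right) = @_;   # @left is now @x, @right is now @y
    push @left, 0;                # changes the size of the caller's @x
    return (scalar @left, scalar @right);
}

my @sizes = sizes(*x, *y);
print "@sizes\n";          # prints "4 2"
print scalar(@x), "\n";    # prints 4
```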
5a964f20 TC |
626 | =head2 When to Still Use local() |
627 | ||
628 | Despite the existence of my(), there are three places where the | |
629 | local() operator still shines. In fact, in these three places, you | |
630 | I<must> use C<local> instead of C<my>. | |
631 | ||
632 | =over | |
633 | ||
634 | =item 1. You need to give a global variable a temporary value, especially $_. | |
635 | ||
636 | The global variables, like @ARGV or the punctuation variables, must be | |
637 | localized with local(). This block reads in I</etc/motd>, and splits | |
638 | it up into chunks separated by lines of equal signs, which are placed | |
639 | in @Fields. | |
640 | ||
641 | { | |
642 | local @ARGV = ("/etc/motd"); | |
643 | local $/ = undef; | |
644 | local $_ = <>; | |
645 | @Fields = split /^\s*=+\s*$/; | |
646 | } | |
647 | ||
648 | In particular, it's important to localize $_ in any routine that assigns | |
649 | to it. Look out for implicit assignments in C<while> conditionals. | |
650 | ||
651 | =item 2. You need to create a local file or directory handle or a local function. | |
652 | ||
653 | A function that needs a filehandle of its own must use | |
654 | local() on a complete typeglob. This can be used to create new symbol | |
655 | table entries: | |
656 | ||
657 | sub ioqueue { | |
658 | local (*READER, *WRITER); # not my! | |
659 | pipe (READER, WRITER) or die "pipe: $!"; | |
660 | return (*READER, *WRITER); | |
661 | } | |
662 | ($head, $tail) = ioqueue(); | |
663 | ||
664 | See the Symbol module for a way to create anonymous symbol table | |
665 | entries. | |
666 | ||
667 | Because assignment of a reference to a typeglob creates an alias, this | |
668 | can be used to create what is effectively a local function, or at least, | |
669 | a local alias. | |
670 | ||
671 | { | |
672 | local *grow = \&shrink; # only until this block exits | |
673 | grow(); # really calls shrink() | |
674 | move(); # if move() grow()s, it shrink()s too | |
675 | } | |
676 | grow(); # get the real grow() again | |
677 | ||
678 | See L<perlref/"Function Templates"> for more about manipulating | |
679 | functions by name in this way. | |
680 | ||
681 | =item 3. You want to temporarily change just one element of an array or hash. | |
682 | ||
683 | You can localize just one element of an aggregate. Usually this | |
684 | is done on dynamics: | |
685 | ||
686 | { | |
687 | local $SIG{INT} = 'IGNORE'; | |
688 | funct(); # uninterruptible | |
689 | } | |
690 | # interruptibility automatically restored here | |
691 | ||
692 | But it also works on lexically declared aggregates. Prior to 5.005, | |
693 | this operation could on occasion misbehave. | |
694 | ||
695 | =back | |
696 | ||
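As a minimal sketch of localizing a single hash element, consider the following. The names C<%Config> and show() are hypothetical and exist only for this illustration; the point is that only the one element is overridden, and its old value comes back when the block exits.

```perl
use strict;
use warnings;

# %Config and show() are hypothetical names used only for this sketch.
our %Config = (verbose => 0, color => 1);

sub show { return "verbose=$Config{verbose} color=$Config{color}" }

my @seen;
{
    local $Config{verbose} = 1;   # only this element is overridden
    push @seen, show();           # the callee sees the temporary value
}
push @seen, show();               # the old value is back after the block

print "$_\n" for @seen;
```

The other elements of the hash are untouched throughout, which is what distinguishes this from localizing the whole aggregate.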
cb1a09d0 AD |
697 | =head2 Pass by Reference |
698 | ||
55497cff | 699 | If you want to pass more than one array or hash into a function--or |
700 | return them from it--and have them maintain their integrity, then | |
701 | you're going to have to use an explicit pass-by-reference. Before you | |
702 | do that, you need to understand references as detailed in L<perlref>. | |
c07a80fd | 703 | This section may not make much sense to you otherwise. |
cb1a09d0 AD |
704 | |
705 | Here are a few simple examples. First, let's pass in several | |
706 | arrays to a function and have it pop all of them, returning a new | |
707 | list of all their former last elements: | |
708 | ||
709 | @tailings = popmany ( \@a, \@b, \@c, \@d ); | |
710 | ||
711 | sub popmany { | |
712 | my $aref; | |
713 | my @retlist = (); | |
714 | foreach $aref ( @_ ) { | |
715 | push @retlist, pop @$aref; | |
54310121 | 716 | } |
cb1a09d0 | 717 | return @retlist; |
54310121 | 718 | } |
cb1a09d0 | 719 | |
54310121 | 720 | Here's how you might write a function that returns a |
cb1a09d0 AD |
721 | list of keys occurring in all the hashes passed to it: |
722 | ||
54310121 | 723 | @common = inter( \%foo, \%bar, \%joe ); |
cb1a09d0 AD |
724 | sub inter { |
725 | my ($k, $href, %seen); # locals | |
726 | foreach $href (@_) { | |
727 | while ( $k = each %$href ) { | |
728 | $seen{$k}++; | |
54310121 | 729 | } |
730 | } | |
cb1a09d0 | 731 | return grep { $seen{$_} == @_ } keys %seen; |
54310121 | 732 | } |
cb1a09d0 | 733 | |
5f05dabc | 734 | So far, we're using just the normal list return mechanism. |
54310121 | 735 | What happens if you want to pass or return a hash? Well, |
736 | if you're using only one of them, or you don't mind them | |
cb1a09d0 | 737 | concatenating, then the normal calling convention is ok, although |
54310121 | 738 | a little expensive. |
cb1a09d0 AD |
739 | |
740 | Where people get into trouble is here: | |
741 | ||
742 | (@a, @b) = func(@c, @d); | |
743 | or | |
744 | (%a, %b) = func(%c, %d); | |
745 | ||
5f05dabc | 746 | That syntax simply won't work. It sets just @a or %a and clears the @b or |
cb1a09d0 AD |
747 | %b. Plus, the function didn't get passed two separate arrays or | |
748 | hashes: it got one long list in @_, as always. | |
749 | ||
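The flattening described here is easy to see in a small sketch. The function passthru() is a hypothetical helper that simply returns its argument list; it is not part of any module.

```perl
use strict;
use warnings;

# passthru() is a hypothetical function that just returns its arguments.
sub passthru { return @_ }

my @c = (1, 2);
my @d = (3, 4, 5);

my (@a, @b);
(@a, @b) = passthru(@c, @d);    # @_ was one flat list of five scalars

# The first array on the left swallows everything; @b is left empty.
print "a has ", scalar(@a), " elements, b has ", scalar(@b), "\n";
```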
750 | If you can arrange for everyone to deal with this through references, it's | |
751 | cleaner code, although not so nice to look at. Here's a function that | |
752 | takes two array references as arguments, returning the two array references | |
753 | in order of how many elements they have in them: | |
754 | ||
755 | ($aref, $bref) = func(\@c, \@d); | |
756 | print "@$aref has more than @$bref\n"; | |
757 | sub func { | |
758 | my ($cref, $dref) = @_; | |
759 | if (@$cref > @$dref) { | |
760 | return ($cref, $dref); | |
761 | } else { | |
c07a80fd | 762 | return ($dref, $cref); |
54310121 | 763 | } |
764 | } | |
cb1a09d0 AD |
765 | |
766 | It turns out that you can actually do this also: | |
767 | ||
768 | (*a, *b) = func(\@c, \@d); | |
769 | print "@a has more than @b\n"; | |
770 | sub func { | |
771 | local (*c, *d) = @_; | |
772 | if (@c > @d) { | |
773 | return (\@c, \@d); | |
774 | } else { | |
775 | return (\@d, \@c); | |
54310121 | 776 | } |
777 | } | |
cb1a09d0 AD |
778 | |
779 | Here we're using the typeglobs to do symbol table aliasing. It's | |
780 | a tad subtle, though, and also won't work if you're using my() | |
5f05dabc | 781 | variables, because only globals (well, and local()s) are in the symbol table. |
782 | ||
783 | If you're passing around filehandles, you could usually just use the bare | |
784 | typeglob, like *STDOUT, but typeglob references would be better because | |
785 | they'll still work properly under C<use strict 'refs'>. For example: | |
786 | ||
787 | splutter(\*STDOUT); | |
788 | sub splutter { | |
789 | my $fh = shift; | |
790 | print $fh "her um well a hmmm\n"; | |
791 | } | |
792 | ||
793 | $rec = get_rec(\*STDIN); | |
794 | sub get_rec { | |
795 | my $fh = shift; | |
796 | return scalar <$fh>; | |
797 | } | |
798 | ||
799 | Another way to do this is to use *HANDLE{IO}; see L<perlref> for usage | |
800 | and caveats. | |
801 | ||
802 | If you're planning on generating new filehandles, you could do this: | |
803 | ||
804 | sub openit { | |
805 | my $path = shift; | |
806 | local *FH; | |
e05a3a1e | 807 | return open (FH, $path) ? *FH : undef; |
54310121 | 808 | } |
5f05dabc | 809 | |
810 | Although that will actually produce a small memory leak. See the bottom | |
811 | of L<perlfunc/open()> for a somewhat cleaner way using the IO::Handle | |
812 | package. | |
cb1a09d0 | 813 | |
cb1a09d0 AD |
814 | =head2 Prototypes |
815 | ||
816 | As of the 5.002 release of perl, if you declare | |
817 | ||
818 | sub mypush (\@@) | |
819 | ||
c07a80fd | 820 | then mypush() takes arguments exactly like push() does. The declaration |
821 | of the function to be called must be visible at compile time. The prototype | |
5f05dabc | 822 | affects only the interpretation of new-style calls to the function, where |
c07a80fd | 823 | new-style is defined as not using the C<&> character. In other words, |
824 | if you call it like a builtin function, then it behaves like a builtin | |
825 | function. If you call it like an old-fashioned subroutine, then it | |
826 | behaves like an old-fashioned subroutine. It naturally falls out from | |
827 | this rule that prototypes have no influence on subroutine references | |
828 | like C<\&foo> or on indirect subroutine calls like C<&{$subref}>. | |
829 | ||
830 | Method calls are not influenced by prototypes either, because the | |
5f05dabc | 831 | function to be called is indeterminate at compile time, since it depends |
c07a80fd | 832 | on inheritance. |
cb1a09d0 | 833 | |
5f05dabc | 834 | Because the intent is primarily to let you define subroutines that work |
c07a80fd | 835 | like builtin commands, here are the prototypes for some other functions |
836 | that parse almost exactly like the corresponding builtins. | |
cb1a09d0 AD |
837 | |
838 | Declared as Called as | |
839 | ||
840 | sub mylink ($$) mylink $old, $new | |
841 | sub myvec ($$$) myvec $var, $offset, 1 | |
842 | sub myindex ($$;$) myindex &getstring, "substr" | |
843 | sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off | |
844 | sub myreverse (@) myreverse $a,$b,$c | |
845 | sub myjoin ($@) myjoin ":",$a,$b,$c | |
846 | sub mypop (\@) mypop @array | |
847 | sub mysplice (\@$$@) mysplice @array,@array,0,@pushme | |
848 | sub mykeys (\%) mykeys %{$hashref} | |
849 | sub myopen (*;$) myopen HANDLE, $name | |
850 | sub mypipe (**) mypipe READHANDLE, WRITEHANDLE | |
851 | sub mygrep (&@) mygrep { /foo/ } $a,$b,$c | |
852 | sub myrand ($) myrand 42 | |
853 | sub mytime () mytime | |
854 | ||
c07a80fd | 855 | Any backslashed prototype character represents an actual argument |
6e47f808 | 856 | that absolutely must start with that character. The value passed |
857 | to the subroutine (as part of C<@_>) will be a reference to the | |
858 | actual argument given in the subroutine call, obtained by applying | |
859 | C<\> to that argument. | |
c07a80fd | 860 | |
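Here is one possible body for the mypush() declared at the start of this section, offered as a sketch rather than as the definitive implementation. Because of the C<\@> in the prototype, the first element of C<@_> is a reference to the caller's array, so the function can change that array's size.

```perl
use strict;
use warnings;

# One possible body for mypush(); the declaration must be seen
# before the call so that the prototype takes effect.
sub mypush (\@@) {
    my $aref = shift;        # reference to the caller's array
    push @$aref, @_;         # the rest of the arguments, as a flat list
    return scalar @$aref;    # new length, as push() would return
}

my @stack = (1);
my $count = mypush @stack, 2, 3;   # called like a builtin, no \ needed

print "stack is now (@stack), $count elements\n";
```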
861 | Unbackslashed prototype characters have special meanings. Any | |
862 | unbackslashed @ or % eats all the rest of the arguments, and forces | |
863 | list context. An argument represented by $ forces scalar context. An | |
864 | & requires an anonymous subroutine, which, if passed as the first | |
865 | argument, does not require the "sub" keyword or a subsequent comma. A | |
866 | * does whatever it has to do to turn the argument into a reference to a | |
867 | symbol table entry. | |
868 | ||
869 | A semicolon separates mandatory arguments from optional arguments. | |
870 | (It is redundant before @ or %.) | |
cb1a09d0 | 871 | |
c07a80fd | 872 | Note how the last three examples above are treated specially by the parser. |
cb1a09d0 AD |
873 | mygrep() is parsed as a true list operator, myrand() is parsed as a |
874 | true unary operator with unary precedence the same as rand(), and | |
5f05dabc | 875 | mytime() is truly without arguments, just like time(). That is, if you |
cb1a09d0 AD |
876 | say |
877 | ||
878 | mytime +2; | |
879 | ||
880 | you'll get mytime() + 2, not mytime(2), which is how it would be parsed | |
881 | without the prototype. | |
882 | ||
883 | The interesting thing about & is that you can generate new syntax with it: | |
884 | ||
6d28dffb | 885 | sub try (&@) { |
cb1a09d0 AD |
886 | my($try,$catch) = @_; |
887 | eval { &$try }; | |
888 | if ($@) { | |
889 | local $_ = $@; | |
890 | &$catch; | |
891 | } | |
892 | } | |
55497cff | 893 | sub catch (&) { $_[0] } |
cb1a09d0 AD |
894 | |
895 | try { | |
896 | die "phooey"; | |
897 | } catch { | |
898 | /phooey/ and print "unphooey\n"; | |
899 | }; | |
900 | ||
901 | That prints "unphooey". (Yes, there are still unresolved | |
902 | issues having to do with the visibility of @_. I'm ignoring that | |
903 | question for the moment. (But note that if we make @_ lexically | |
904 | scoped, those anonymous subroutines can act like closures... (Gee, | |
5f05dabc | 905 | is this sounding a little Lispish? (Never mind.)))) |
cb1a09d0 AD |
906 | |
907 | And here's a reimplementation of grep: | |
908 | ||
909 | sub mygrep (&@) { | |
910 | my $code = shift; | |
911 | my @result; | |
912 | foreach $_ (@_) { | |
6e47f808 | 913 | push(@result, $_) if &$code; |
cb1a09d0 AD |
914 | } |
915 | @result; | |
916 | } | |
a0d0e21e | 917 | |
cb1a09d0 AD |
918 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have |
919 | been intentionally left out of prototypes for the express purpose of | |
920 | someday in the future adding named, formal parameters. The current | |
921 | mechanism's main goal is to let module writers provide better diagnostics | |
922 | for module users. Larry feels the notation quite understandable to Perl | |
923 | programmers, and that it will not intrude greatly upon the meat of the | |
924 | module, nor make it harder to read. The line noise is visually | |
925 | encapsulated into a small pill that's easy to swallow. | |
926 | ||
927 | It's probably best to prototype new functions, not retrofit prototyping | |
928 | into older ones. That's because you must be especially careful about | |
929 | silent impositions of differing list versus scalar contexts. For example, | |
930 | if you decide that a function should take just one parameter, like this: | |
931 | ||
932 | sub func ($) { | |
933 | my $n = shift; | |
934 | print "you gave me $n\n"; | |
54310121 | 935 | } |
cb1a09d0 AD |
936 | |
937 | and someone has been calling it with an array or expression | |
938 | returning a list: | |
939 | ||
940 | func(@foo); | |
941 | func( split /:/ ); | |
942 | ||
943 | Then you've just supplied an automatic scalar() in front of their | |
944 | argument, which can be more than a bit surprising. The old @foo | |
945 | which used to hold one thing doesn't get passed in. Instead, | |
5f05dabc | 946 | the func() now gets passed in 1, that is, the number of elements |
cb1a09d0 AD |
947 | in @foo. And the split() gets called in a scalar context and |
948 | starts scribbling on your @_ parameter list. | |
949 | ||
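The difference can be seen side by side in the following sketch. The name first_arg() is hypothetical; the function just returns whatever arrives in C<$_[0]>. A call made with C<&> ignores the prototype, so it shows the old behavior.

```perl
use strict;
use warnings;

# first_arg() is a hypothetical name used only for this sketch.
sub first_arg ($) { return $_[0] }

my @foo = ('x', 'y');

my $new_style = first_arg(@foo);    # prototype applies: gets scalar(@foo)
my $old_style = &first_arg(@foo);   # & bypasses the prototype: gets 'x'

print "new style got $new_style, old style got $old_style\n";
```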
5f05dabc | 950 | This is all very powerful, of course, and should be used only in moderation |
54310121 | 951 | to make the world a better place. |
44a8e56a | 952 | |
953 | =head2 Constant Functions | |
954 | ||
955 | Functions with a prototype of C<()> are potential candidates for | |
54310121 | 956 | inlining. If the result after optimization and constant folding is |
957 | either a constant or a lexically-scoped scalar which has no other | |
958 | references, then it will be used in place of function calls made | |
959 | without C<&> or C<do>. Calls made using C<&> or C<do> are never | |
960 | inlined. (See constant.pm for an easy way to declare most | |
961 | constants.) | |
44a8e56a | 962 | |
5a964f20 | 963 | The following functions would all be inlined: |
44a8e56a | 964 | |
699e6cd4 TP |
965 | sub pi () { 3.14159 } # Not exact, but close. |
966 | sub PI () { 4 * atan2 1, 1 } # As good as it gets, | |
967 | # and it's inlined, too! | |
44a8e56a | 968 | sub ST_DEV () { 0 } |
969 | sub ST_INO () { 1 } | |
970 | ||
971 | sub FLAG_FOO () { 1 << 8 } | |
972 | sub FLAG_BAR () { 1 << 9 } | |
973 | sub FLAG_MASK () { FLAG_FOO | FLAG_BAR } | |
54310121 | 974 | |
975 | sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) } | |
44a8e56a | 976 | sub BAZ_VAL () { |
977 | if (OPT_BAZ) { | |
978 | return 23; | |
979 | } | |
980 | else { | |
981 | return 42; | |
982 | } | |
983 | } | |
cb1a09d0 | 984 | |
54310121 | 985 | sub N () { int(BAZ_VAL) / 3 } |
986 | BEGIN { | |
987 | my $prod = 1; | |
988 | for (1..N) { $prod *= $_ } | |
989 | sub N_FACTORIAL () { $prod } | |
990 | } | |
991 | ||
5a964f20 | 992 | If you redefine a subroutine that was eligible for inlining, you'll get |
4cee8e80 CS |
993 | a mandatory warning. (You can use this warning to tell whether or not a |
994 | particular subroutine is considered constant.) The warning is | |
995 | considered severe enough not to be optional because previously compiled | |
996 | invocations of the function will still be using the old value of the | |
997 | function. If you need to be able to redefine the subroutine you need to | |
998 | ensure that it isn't inlined, either by dropping the C<()> prototype | |
999 | (which changes the calling semantics, so beware) or by thwarting the | |
1000 | inlining mechanism in some other way, such as | |
1001 | ||
4cee8e80 | 1002 | sub not_inlined () { |
54310121 | 1003 | 23 if $]; |
4cee8e80 CS |
1004 | } |
1005 | ||
cb1a09d0 | 1006 | =head2 Overriding Builtin Functions |
a0d0e21e | 1007 | |
5f05dabc | 1008 | Many builtin functions may be overridden, though this should be tried |
1009 | only occasionally and for good reason. Typically this might be | |
a0d0e21e LW |
1010 | done by a package attempting to emulate missing builtin functionality |
1011 | on a non-Unix system. | |
1012 | ||
5f05dabc | 1013 | Overriding may be done only by importing the name from a |
a0d0e21e | 1014 | module--ordinary predeclaration isn't good enough. However, the |
54310121 | 1015 | C<subs> pragma (compiler directive) lets you, in effect, predeclare subs |
a0d0e21e LW |
1016 | via the import syntax, and these names may then override the builtin ones: |
1017 | ||
1018 | use subs 'chdir', 'chroot', 'chmod', 'chown'; | |
1019 | chdir $somewhere; | |
1020 | sub chdir { ... } | |
1021 | ||
fb73857a | 1022 | To unambiguously refer to the builtin form, one may precede the |
1023 | builtin name with the special package qualifier C<CORE::>. For example, | |
1024 | saying C<CORE::open()> will always refer to the builtin C<open()>, even | |
1025 | if the current package has imported some other subroutine called | |
1026 | C<&open()> from elsewhere. | |
1027 | ||
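A small sketch of the two forms together, assuming a deliberately silly replacement for time() that exists only for illustration:

```perl
use strict;
use warnings;
use subs 'time';          # import-style predeclaration, so the override takes

sub time { return 42 }    # hypothetical replacement, for illustration only

my $fake = time();        # calls the override in this package
my $real = CORE::time();  # always the builtin

print "override: $fake, builtin: $real\n";
```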
a0d0e21e | 1028 | Library modules should not in general export builtin names like "open" |
5f05dabc | 1029 | or "chdir" as part of their default @EXPORT list, because these may |
a0d0e21e LW |
1030 | sneak into someone else's namespace and change the semantics unexpectedly. |
1031 | Instead, if the module adds the name to the @EXPORT_OK list, then it's | |
1032 | possible for a user to import the name explicitly, but not implicitly. | |
1033 | That is, they could say | |
1034 | ||
1035 | use Module 'open'; | |
1036 | ||
1037 | and it would import the open override, but if they said | |
1038 | ||
1039 | use Module; | |
1040 | ||
1041 | they would get the default imports without the overrides. | |
1042 | ||
95d94a4f GS |
1043 | The foregoing mechanism for overriding builtins is restricted, quite |
1044 | deliberately, to the package that requests the import. There is a second | |
1045 | method that is sometimes applicable when you wish to override a builtin | |
1046 | everywhere, without regard to namespace boundaries. This is achieved by | |
1047 | importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an | |
1048 | example that quite brazenly replaces the C<glob> operator with something | |
1049 | that understands regular expressions. | |
1050 | ||
1051 | package REGlob; | |
1052 | require Exporter; | |
1053 | @ISA = 'Exporter'; | |
1054 | @EXPORT_OK = 'glob'; | |
1055 | ||
1056 | sub import { | |
1057 | my $pkg = shift; | |
1058 | return unless @_; | |
1059 | my $sym = shift; | |
1060 | my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0)); | |
1061 | $pkg->export($where, $sym, @_); | |
1062 | } | |
1063 | ||
1064 | sub glob { | |
1065 | my $pat = shift; | |
1066 | my @got; | |
1067 | local(*D); | |
227a8b4b | 1068 | if (opendir D, '.') { @got = grep /$pat/, readdir D; closedir D; } |
95d94a4f GS |
1069 | @got; |
1070 | } | |
1071 | 1; | |
1072 | ||
1073 | And here's how it could be (ab)used: | |
1074 | ||
1075 | #use REGlob 'GLOBAL_glob'; # override glob() in ALL namespaces | |
1076 | package Foo; | |
1077 | use REGlob 'glob'; # override glob() in Foo:: only | |
1078 | print for <^[a-z_]+\.pm\$>; # show all pragmatic modules | |
1079 | ||
1080 | Note that the initial comment shows a contrived, even dangerous example. | |
1081 | By overriding C<glob> globally, you would be forcing the new (and | |
1082 | subversive) behavior for the C<glob> operator for B<every> namespace, | |
1083 | without the complete cognizance or cooperation of the modules that own | |
1084 | those namespaces. Naturally, this should be done with extreme caution--if | |
1085 | it must be done at all. | |
1086 | ||
1087 | The C<REGlob> example above does not implement all the support needed to | |
1088 | cleanly override perl's C<glob> operator. The builtin C<glob> has | |
1089 | different behaviors depending on whether it appears in a scalar or list | |
1090 | context, but our C<REGlob> doesn't. Indeed, many perl builtins have such | |
1091 | context-sensitive behaviors, and these must be adequately supported by | |
1092 | a properly written override. For a fully functional example of overriding | |
1093 | C<glob>, study the implementation of C<File::DosGlob> in the standard | |
1094 | library. | |
1095 | ||
fb73857a | 1096 | |
a0d0e21e LW |
1097 | =head2 Autoloading |
1098 | ||
1099 | If you call a subroutine that is undefined, you would ordinarily get an | |
1100 | immediate fatal error complaining that the subroutine doesn't exist. | |
1101 | (Likewise for subroutines being used as methods, when the method | |
5a964f20 | 1102 | doesn't exist in any base class of the class package.) If, |
a0d0e21e LW |
1103 | however, there is an C<AUTOLOAD> subroutine defined in the package or |
1104 | packages that were searched for the original subroutine, then that | |
1105 | C<AUTOLOAD> subroutine is called with the arguments that would have been | |
1106 | passed to the original subroutine. The fully qualified name of the | |
1107 | original subroutine magically appears in the $AUTOLOAD variable in the | |
1108 | same package as the C<AUTOLOAD> routine. The name is not passed as an | |
1109 | ordinary argument because, er, well, just because, that's why... | |
1110 | ||
1111 | Most C<AUTOLOAD> routines will load in a definition for the subroutine in | |
1112 | question using eval, and then execute that subroutine using a special | |
1113 | form of "goto" that erases the stack frame of the C<AUTOLOAD> routine | |
1114 | without a trace. (See the standard C<AutoLoader> module, for example.) | |
1115 | But an C<AUTOLOAD> routine can also just emulate the routine and never | |
cb1a09d0 AD |
1116 | define it. For example, let's pretend that a function that wasn't defined |
1117 | should just call system() with those arguments. All you'd do is this: | |
1118 | ||
1119 | sub AUTOLOAD { | |
1120 | my $program = $AUTOLOAD; | |
1121 | $program =~ s/.*:://; | |
1122 | system($program, @_); | |
54310121 | 1123 | } |
cb1a09d0 | 1124 | date(); |
6d28dffb | 1125 | who('am', 'i'); |
cb1a09d0 AD |
1126 | ls('-l'); |
1127 | ||
54310121 | 1128 | In fact, if you predeclare the functions you want to call that way, you don't |
cb1a09d0 AD |
1129 | even need the parentheses: |
1130 | ||
1131 | use subs qw(date who ls); | |
1132 | date; | |
1133 | who "am", "i"; | |
1134 | ls '-l'; | |
1135 | ||
1136 | A more complete example of this is the standard Shell module, which | |
a0d0e21e LW |
1137 | can treat undefined subroutine calls as calls to Unix programs. |
1138 | ||
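The other style, in which C<AUTOLOAD> defines the missing routine and then jumps to it, might look like the sketch below. It installs a closure through a typeglob assignment instead of using eval, which is a substitution made here for brevity. The package name C<Greeter> and the counter C<$autoload_calls> are hypothetical.

```perl
package Greeter;   # hypothetical package for this sketch
use strict;
use warnings;

our $AUTOLOAD;
our $autoload_calls = 0;

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    $autoload_calls++;
    no strict 'refs';
    *{$AUTOLOAD} = sub { return "$name: @_" };   # install a real sub
    goto &{$AUTOLOAD};    # replace AUTOLOAD's frame; @_ passes through
}

package main;

my $first  = Greeter::hello('world');   # goes through AUTOLOAD
my $second = Greeter::hello('again');   # calls the installed sub directly

print "$first\n$second\nAUTOLOAD ran $Greeter::autoload_calls time(s)\n";
```

Because the sub is installed on the first call, later calls never reach C<AUTOLOAD> at all, so the lookup cost is paid only once per name.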
cb1a09d0 | 1139 | Mechanisms are available for module writers to help split the modules |
6d28dffb | 1140 | up into autoloadable files. See the standard AutoLoader module |
1141 | described in L<AutoLoader> and in L<AutoSplit>, the standard | |
1142 | SelfLoader module in L<SelfLoader>, and the document on adding C | |
1143 | functions to perl code in L<perlxs>. | |
cb1a09d0 AD |
1144 | |
1145 | =head1 SEE ALSO | |
a0d0e21e | 1146 | |
5a964f20 TC |
1147 | See L<perlref> for more about references and closures. See L<perlxs> if |
1148 | you'd like to learn about calling C subroutines from perl. See L<perlmod> | |
1149 | to learn about bundling up your functions in separate files. |