Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
3 | perlsub - Perl subroutines | |
4 | ||
5 | =head1 SYNOPSIS | |
6 | ||
7 | To declare subroutines: | |
8 | ||
cb1a09d0 AD |
9 | sub NAME; # A "forward" declaration. |
10 | sub NAME(PROTO); # ditto, but with prototypes | |
11 | ||
12 | sub NAME BLOCK # A declaration and a definition. | |
13 | sub NAME(PROTO) BLOCK # ditto, but with prototypes | |
a0d0e21e | 14 | |
748a9306 LW |
15 | To define an anonymous subroutine at runtime: |
16 | ||
17 | $subref = sub BLOCK; | |
18 | ||
a0d0e21e LW |
19 | To import subroutines: |
20 | ||
21 | use PACKAGE qw(NAME1 NAME2 NAME3); | |
22 | ||
23 | To call subroutines: | |
24 | ||
a0d0e21e LW |
25 | NAME(LIST); # & is optional with parens. |
26 | NAME LIST; # Parens optional if predeclared/imported. | |
cb1a09d0 | 27 | &NAME; # Passes current @_ to subroutine. |
a0d0e21e LW |
28 | |
29 | =head1 DESCRIPTION | |
30 | ||
cb1a09d0 AD |
31 | Like many languages, Perl provides for user-defined subroutines. These |
32 | may be located anywhere in the main program, loaded in from other files | |
33 | via the C<do>, C<require>, or C<use> keywords, or even generated on the | |
34 | fly using C<eval> or anonymous subroutines (closures). You can even call | |
c07a80fd | 35 | a function indirectly using a variable containing its name or a CODE reference |
36 | to it, as in C<$var = \&function>. | |
cb1a09d0 AD |
37 | |
38 | The Perl model for function call and return values is simple: all | |
39 | functions are passed as parameters one single flat list of scalars, and | |
40 | all functions likewise return to their caller one single flat list of | |
41 | scalars. Any arrays or hashes in these call and return lists will | |
42 | collapse, losing their identities--but you may always use | |
43 | pass-by-reference instead to avoid this. Both call and return lists may | |
44 | contain as many or as few scalar elements as you'd like. (Often a | |
45 | function without an explicit return statement is called a subroutine, but | |
46 | there's really no difference from the language's perspective.) | |
47 | ||
48 | Any arguments passed to the routine come in as the array @_. Thus if you | |
49 | called a function with two arguments, those would be stored in C<$_[0]> | |
50 | and C<$_[1]>. The array @_ is a local array, but its values are implicit | |
51 | references (predating L<perlref>) to the actual scalar parameters. The | |
52 | return value of the subroutine is the value of the last expression | |
53 | evaluated. Alternatively, a return statement may be used to specify the | |
54 | returned value and exit the subroutine. If you return one or more arrays | |
55 | and/or hashes, these will be flattened together into one large | |
56 | indistinguishable list. | |
57 | ||
58 | Perl does not have named formal parameters, but in practice all you do is | |
59 | assign to a my() list of these. Any variables you use in the function | |
60 | that aren't declared private are global variables. For the gory details | |
1fef88e7 | 61 | on creating private variables, see |
6d28dffb | 62 | L<"Private Variables via my()"> and L<"Temporary Values via local()">. |
63 | To create protected environments for a set of functions in a separate | |
64 | package (and probably a separate file), see L<perlmod/"Packages">. | |
a0d0e21e LW |
65 | |
66 | Example: | |
67 | ||
cb1a09d0 AD |
68 | sub max { |
69 | my $max = shift(@_); | |
a0d0e21e LW |
70 | foreach $foo (@_) { |
71 | $max = $foo if $max < $foo; | |
72 | } | |
cb1a09d0 | 73 | return $max; |
a0d0e21e | 74 | } |
cb1a09d0 | 75 | $bestday = max($mon,$tue,$wed,$thu,$fri); |
a0d0e21e LW |
76 | |
77 | Example: | |
78 | ||
79 | # get a line, combining continuation lines | |
80 | # that start with whitespace | |
81 | ||
82 | sub get_line { | |
cb1a09d0 | 83 | $thisline = $lookahead; # GLOBAL VARIABLES!! |
a0d0e21e LW |
84 | LINE: while ($lookahead = <STDIN>) { |
85 | if ($lookahead =~ /^[ \t]/) { | |
86 | $thisline .= $lookahead; | |
87 | } | |
88 | else { | |
89 | last LINE; | |
90 | } | |
91 | } | |
92 | $thisline; | |
93 | } | |
94 | ||
95 | $lookahead = <STDIN>; # get first line | |
96 | while ($_ = get_line()) { | |
97 | ... | |
98 | } | |
99 | ||
100 | Use array assignment to a local list to name your formal arguments: | |
101 | ||
102 | sub maybeset { | |
103 | my($key, $value) = @_; | |
cb1a09d0 | 104 | $Foo{$key} = $value unless $Foo{$key}; |
a0d0e21e LW |
105 | } |
106 | ||
cb1a09d0 AD |
107 | This also has the effect of turning call-by-reference into call-by-value, |
108 | since the assignment copies the values. Otherwise a function is free to | |
1fef88e7 | 109 | do in-place modifications of @_ and change its caller's values. |
cb1a09d0 AD |
110 | |
111 | upcase_in($v1, $v2); # this changes $v1 and $v2 | |
112 | sub upcase_in { | |
113 | for (@_) { tr/a-z/A-Z/ } | |
114 | } | |
115 | ||
116 | You aren't allowed to modify constants in this way, of course. If an | |
117 | argument were actually literal and you tried to change it, you'd take a | |
118 | (presumably fatal) exception. For example, this won't work: | |
119 | ||
120 | upcase_in("frederick"); | |
121 | ||
122 | It would be much safer if the upcase_in() function | |
123 | were written to return a copy of its parameters instead | |
124 | of changing them in place: | |
125 | ||
126 | ($v3, $v4) = upcase($v1, $v2); # this doesn't | |
127 | sub upcase { | |
128 | my @parms = @_; | |
129 | for (@parms) { tr/a-z/A-Z/ } | |
c07a80fd | 130 | # wantarray checks if we were called in list context |
131 | return wantarray ? @parms : $parms[0]; | |
cb1a09d0 AD |
132 | } |
133 | ||
134 | Notice how this (unprototyped) function doesn't care whether it was passed | |
135 | real scalars or arrays. Perl will see everything as one big long flat @_ | |
136 | parameter list. This is one of the ways where Perl's simple | |
137 | argument-passing style shines. The upcase() function would work perfectly | |
138 | well without changing the upcase() definition even if we fed it things | |
139 | like this: | |
140 | ||
141 | @newlist = upcase(@list1, @list2); | |
142 | @newlist = upcase( split /:/, $var ); | |
143 | ||
144 | Do not, however, be tempted to do this: | |
145 | ||
146 | (@a, @b) = upcase(@list1, @list2); | |
147 | ||
148 | Because like its flat incoming parameter list, the return list is also | |
149 | flat. So all you have managed to do here is store everything in @a and | |
150 | make @b an empty list. See L</"Pass by Reference"> for alternatives. | |
151 | ||
152 | A subroutine may be called using the "&" prefix. The "&" is optional in | |
153 | Perl 5, and so are the parens if the subroutine has been predeclared. | |
154 | (Note, however, that the "&" is I<NOT> optional when you're just naming | |
155 | the subroutine, such as when it's used as an argument to defined() or | |
156 | undef(). Nor is it optional when you want to do an indirect subroutine | |
157 | call with a subroutine name or reference using the C<&$subref()> or | |
158 | C<&{$subref}()> constructs. See L<perlref> for more on that.) | |
a0d0e21e LW |
159 | |
160 | Subroutines may be called recursively. If a subroutine is called using | |
cb1a09d0 AD |
161 | the "&" form, the argument list is optional, and if omitted, no @_ array is |
162 | set up for the subroutine: the @_ array at the time of the call is | |
163 | visible to the subroutine instead. This is an efficiency mechanism that | |
164 | new users may wish to avoid. | |
a0d0e21e LW |
165 | |
166 | &foo(1,2,3); # pass three arguments | |
167 | foo(1,2,3); # the same | |
168 | ||
169 | foo(); # pass a null list | |
170 | &foo(); # the same | |
a0d0e21e | 171 | |
cb1a09d0 AD |
172 | &foo; # foo() gets current args, like foo(@_) !! |
173 | foo; # like foo() IFF sub foo pre-declared, else "foo" | |
174 | ||
c07a80fd | 175 | Not only does the "&" form make the argument list optional, but it also |
176 | disables any prototype checking on the arguments you do provide. This | |
177 | is partly for historical reasons, and partly for having a convenient way | |
178 | to cheat if you know what you're doing. See the section on Prototypes below. | |
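The difference between C<&foo;> and C<foo();> described above can be seen in a minimal sketch. The names count_args(), with_ampersand() and with_parens() are hypothetical helpers made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it can see.
sub count_args { return scalar @_ }

# "&count_args;" sets up no new @_, so the callee sees the caller's @_.
sub with_ampersand { return &count_args; }

# "count_args()" passes an explicit empty list, so the callee's @_ is empty.
sub with_parens { return count_args(); }

print with_ampersand(1, 2, 3), "\n";   # 3
print with_parens(1, 2, 3), "\n";      # 0
```

Because the callee shares the caller's @_ in the first form, anything it shifts off or modifies is also changed for the caller, which is one reason new users may prefer the explicit forms.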
179 | ||
cb1a09d0 AD |
180 | =head2 Private Variables via my() |
181 | ||
182 | Synopsis: | |
183 | ||
184 | my $foo; # declare $foo lexically local | |
185 | my (@wid, %get); # declare list of variables local | |
186 | my $foo = "flurp"; # declare $foo lexical, and init it | |
187 | my @oof = @bar; # declare @oof lexical, and init it | |
188 | ||
189 | A "my" declares the listed variables to be confined (lexically) to the | |
190 | enclosing block, subroutine, C<eval>, or C<do/require/use>'d file. If | |
191 | more than one value is listed, the list must be placed in parens. All | |
192 | listed elements must be legal lvalues. Only alphanumeric identifiers may | |
193 | be lexically scoped--magical builtins like $/ must currently be localized with | |
194 | "local" instead. | |
195 | ||
196 | Unlike dynamic variables created by the "local" statement, lexical | |
197 | variables declared with "my" are totally hidden from the outside world, | |
198 | including any called subroutines (even if it's the same subroutine called | |
199 | from itself or elsewhere--every call gets its own copy). | |
200 | ||
201 | (An eval(), however, can see the lexical variables of the scope it is | |
202 | being evaluated in so long as the names aren't hidden by declarations within | |
203 | the eval() itself. See L<perlref>.) | |
204 | ||
205 | The parameter list to my() may be assigned to if desired, which allows you | |
206 | to initialize your variables. (If no initializer is given for a | |
207 | particular variable, it is created with the undefined value.) Commonly | |
208 | this is used to name the parameters to a subroutine. Examples: | |
209 | ||
210 | $arg = "fred"; # "global" variable | |
211 | $n = cube_root(27); | |
212 | print "$arg thinks the root is $n\n"; | |
213 | fred thinks the root is 3 | |
214 | ||
215 | sub cube_root { | |
216 | my $arg = shift; # name doesn't matter | |
217 | $arg **= 1/3; | |
218 | return $arg; | |
219 | } | |
220 | ||
221 | The "my" is simply a modifier on something you might assign to. So when | |
222 | you do assign to the variables in its argument list, the "my" doesn't | |
223 | change whether those variables are viewed as a scalar or an array. So | |
224 | ||
225 | my ($foo) = <STDIN>; | |
226 | my @FOO = <STDIN>; | |
227 | ||
228 | both supply a list context to the righthand side, while | |
229 | ||
230 | my $foo = <STDIN>; | |
231 | ||
232 | supplies a scalar context. But the following only declares one variable: | |
748a9306 | 233 | |
cb1a09d0 | 234 | my $foo, $bar = 1; |
748a9306 | 235 | |
cb1a09d0 | 236 | That has the same effect as |
748a9306 | 237 | |
cb1a09d0 AD |
238 | my $foo; |
239 | $bar = 1; | |
a0d0e21e | 240 | |
cb1a09d0 AD |
241 | The declared variable is not introduced (is not visible) until after |
242 | the current statement. Thus, | |
243 | ||
244 | my $x = $x; | |
245 | ||
246 | can be used to initialize the new $x with the value of the old $x, and | |
247 | the expression | |
248 | ||
249 | my $x = 123 and $x == 123 | |
250 | ||
251 | is false unless the old $x happened to have the value 123. | |
252 | ||
253 | Some users may wish to encourage the use of lexically scoped variables. | |
254 | As an aid to catching implicit references to package variables, | |
255 | if you say | |
256 | ||
257 | use strict 'vars'; | |
258 | ||
259 | then any variable reference from there to the end of the enclosing | |
260 | block must either refer to a lexical variable, or must be fully | |
261 | qualified with the package name. A compilation error results | |
262 | otherwise. An inner block may countermand this with S<"no strict 'vars'">. | |
263 | ||
264 | A my() has both a compile-time and a run-time effect. At compile time, | |
265 | the compiler takes notice of it; the principal usefulness of this is to | |
266 | quiet C<use strict 'vars'>. The actual initialization doesn't happen | |
267 | until run time, so gets executed every time through a loop. | |
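As a rough illustration of the run-time effect, the following sketch (the variable names are invented for the example, and it uses the C<for my> loop syntax of later Perl releases) shows that each pass through a loop creates a fresh lexical, which a closure can then capture:

```perl
use strict;
use warnings;

my @subs;
for my $i (1 .. 3) {
    my $n = $i * 10;                 # a new $n is created on every pass
    push @subs, sub { return $n };   # each closure captures its own $n
}

# Each closure still remembers the $n from its own iteration.
print join(",", map { $_->() } @subs), "\n";   # 10,20,30
```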
268 | ||
269 | Variables declared with "my" are not part of any package and are therefore | |
270 | never fully qualified with the package name. In particular, you're not | |
271 | allowed to try to make a package variable (or other global) lexical: | |
272 | ||
273 | my $pack::var; # ERROR! Illegal syntax | |
274 | my $_; # also illegal (currently) | |
275 | ||
276 | In fact, dynamic variables (also known as package or global variables) | |
277 | are still accessible using the fully qualified :: notation even while a | |
278 | lexical of the same name is also visible: | |
279 | ||
280 | package main; | |
281 | local $x = 10; | |
282 | my $x = 20; | |
283 | print "$x and $::x\n"; | |
284 | ||
285 | That will print out 20 and 10. | |
286 | ||
287 | You may declare "my" variables at the outermost scope of a file to | |
288 | totally hide any such identifiers from the outside world. This is similar | |
6d28dffb | 289 | to C's static variables at the file level. To do this with a subroutine |
cb1a09d0 AD |
290 | requires the use of a closure (anonymous function). If a block (such as |
291 | an eval(), function, or C<package>) wants to create a private subroutine | |
292 | that cannot be called from outside that block, it can declare a lexical | |
293 | variable containing an anonymous sub reference: | |
294 | ||
295 | my $secret_version = '1.001-beta'; | |
296 | my $secret_sub = sub { print $secret_version }; | |
297 | &$secret_sub(); | |
298 | ||
299 | As long as the reference is never returned by any function within the | |
300 | module, no outside module can see the subroutine, since its name is not in | |
301 | any package's symbol table. Remember that it's not I<REALLY> called | |
302 | $some_pack::secret_version or anything; it's just $secret_version, | |
303 | unqualified and unqualifiable. | |
304 | ||
305 | This does not work with object methods, however; all object methods have | |
306 | to be in the symbol table of some package to be found. | |
307 | ||
308 | Just because the lexical variable is lexically (also called statically) | |
309 | scoped doesn't mean that within a function it works like a C static. It | |
310 | normally works more like a C auto. But here's a mechanism for giving a | |
311 | function private variables with both lexical scoping and a static | |
312 | lifetime. If you do want to create something like C's static variables, | |
313 | just enclose the whole function in an extra block, and put the | |
314 | static variable outside the function but in the block. | |
315 | ||
316 | { | |
317 | my $secret_val = 0; | |
318 | sub gimme_another { | |
319 | return ++$secret_val; | |
320 | } | |
321 | } | |
322 | # $secret_val now becomes unreachable by the outside | |
323 | # world, but retains its value between calls to gimme_another | |
324 | ||
325 | If this function is being sourced in from a separate file | |
326 | via C<require> or C<use>, then this is probably just fine. If it's | |
327 | all in the main program, you'll need to arrange for the my() | |
328 | to be executed early, either by putting the whole block above | |
329 | your main program, or more likely, merely placing a BEGIN | |
330 | sub around it to make sure it gets executed before your program | |
331 | starts to run: | |
332 | ||
333 | sub BEGIN { | |
334 | my $secret_val = 0; | |
335 | sub gimme_another { | |
336 | return ++$secret_val; | |
337 | } | |
338 | } | |
339 | ||
340 | See L<perlmod> about the BEGIN function. | |
341 | ||
342 | =head2 Temporary Values via local() | |
343 | ||
344 | B<NOTE>: In general, you should be using "my" instead of "local", because | |
6d28dffb | 345 | it's faster and safer. Exceptions to this include the global punctuation |
cb1a09d0 AD |
346 | variables, filehandles and formats, and direct manipulation of the Perl |
347 | symbol table itself. Format variables often use "local" though, as do | |
348 | other variables whose current value must be visible to called | |
349 | subroutines. | |
350 | ||
351 | Synopsis: | |
352 | ||
353 | local $foo; # declare $foo dynamically local | |
354 | local (@wid, %get); # declare list of variables local | |
355 | local $foo = "flurp"; # declare $foo dynamic, and init it | |
356 | local @oof = @bar; # declare @oof dynamic, and init it | |
357 | ||
358 | local *FH; # localize $FH, @FH, %FH, &FH ... | |
359 | local *merlyn = *randal; # now $merlyn is really $randal, plus | |
360 | # @merlyn is really @randal, etc | |
361 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal | |
362 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc | |
363 | ||
364 | A local() modifies its listed variables to be local to the enclosing | |
1fef88e7 | 365 | block (or subroutine, C<eval{}>, or C<do>) and I<any subroutine called from |
cb1a09d0 AD |
366 | within that block>. A local() just gives temporary values to global |
367 | (meaning package) variables. This is known as dynamic scoping. Lexical | |
368 | scoping is done with "my", which works more like C's auto declarations. | |
369 | ||
370 | If more than one variable is given to local(), they must be placed in | |
371 | parens. All listed elements must be legal lvalues. This operator works | |
372 | by saving the current values of those variables in its argument list on a | |
373 | hidden stack and restoring them upon exiting the block, subroutine or | |
374 | eval. This means that called subroutines can also reference the local | |
375 | variable, but not the global one. The argument list may be assigned to if | |
376 | desired, which allows you to initialize your local variables. (If no | |
377 | initializer is given for a particular variable, it is created with an | |
378 | undefined value.) Commonly this is used to name the parameters to a | |
379 | subroutine. Examples: | |
380 | ||
381 | for $i ( 0 .. 9 ) { | |
382 | $digits{$i} = $i; | |
383 | } | |
384 | # assume this function uses global %digits hash | |
385 | parse_num(); | |
386 | ||
387 | # now temporarily add to %digits hash | |
388 | if ($base12) { | |
389 | # (NOTE: not claiming this is efficient!) | |
390 | local %digits = (%digits, 't' => 10, 'e' => 11); | |
391 | parse_num(); # parse_num gets this new %digits! | |
392 | } | |
393 | # old %digits restored here | |
394 | ||
1fef88e7 | 395 | Because local() is a run-time command, it gets executed every time |
cb1a09d0 AD |
396 | through a loop. In releases of Perl previous to 5.0, this used more stack |
397 | storage each time until the loop was exited. Perl now reclaims the space | |
398 | each time through, but it's still more efficient to declare your variables | |
399 | outside the loop. | |
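To make the contrast between dynamic and lexical scoping concrete, here is a small sketch. The variable $level and the subroutines show(), with_local() and with_my() are hypothetical names chosen for this example:

```perl
use strict;
use warnings;

our $level = "global";        # a package (dynamic) variable

# show() always reads the package variable $level.
sub show { return $level }

# local() gives the package variable a temporary value that
# called subroutines can see.
sub with_local { local $level = "localized"; return show(); }

# my() creates a separate lexical that show() cannot see.
sub with_my { my $level = "lexical"; return show(); }

print with_local(), "\n";   # localized
print with_my(), "\n";      # global
print show(), "\n";         # global (the old value was restored)
```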
400 | ||
401 | A local is simply a modifier on an lvalue expression. When you assign to | |
402 | a localized variable, the local doesn't change whether its list is viewed | |
403 | as a scalar or an array. So | |
404 | ||
405 | local($foo) = <STDIN>; | |
406 | local @FOO = <STDIN>; | |
407 | ||
408 | both supply a list context to the righthand side, while | |
409 | ||
410 | local $foo = <STDIN>; | |
411 | ||
412 | supplies a scalar context. | |
413 | ||
414 | =head2 Passing Symbol Table Entries (typeglobs) | |
415 | ||
416 | [Note: The mechanism described in this section was originally the only | |
417 | way to simulate pass-by-reference in older versions of Perl. While it | |
418 | still works fine in modern versions, the new reference mechanism is | |
419 | generally easier to work with. See below.] | |
a0d0e21e LW |
420 | |
421 | Sometimes you don't want to pass the value of an array to a subroutine | |
422 | but rather the name of it, so that the subroutine can modify the global | |
423 | copy of it rather than working with a local copy. In perl you can | |
cb1a09d0 | 424 | refer to all objects of a particular name by prefixing the name |
a0d0e21e LW |
425 | with a star: C<*foo>. This is often known as a "type glob", since the |
426 | star on the front can be thought of as a wildcard match for all the | |
427 | funny prefix characters on variables and subroutines and such. | |
428 | ||
429 | When evaluated, the type glob produces a scalar value that represents | |
430 | all the objects of that name, including any filehandle, format or | |
431 | subroutine. When assigned to, it causes the name mentioned to refer to | |
432 | whatever "*" value was assigned to it. Example: | |
433 | ||
434 | sub doubleary { | |
435 | local(*someary) = @_; | |
436 | foreach $elem (@someary) { | |
437 | $elem *= 2; | |
438 | } | |
439 | } | |
440 | doubleary(*foo); | |
441 | doubleary(*bar); | |
442 | ||
443 | Note that scalars are already passed by reference, so you can modify | |
444 | scalar arguments without using this mechanism by referring explicitly | |
1fef88e7 | 445 | to C<$_[0]> etc. You can modify all the elements of an array by passing |
a0d0e21e LW |
446 | all the elements as scalars, but you have to use the * mechanism (or |
447 | the equivalent reference mechanism) to push, pop or change the size of | |
448 | an array. It will certainly be faster to pass the typeglob (or reference). | |
449 | ||
450 | Even if you don't want to modify an array, this mechanism is useful for | |
451 | passing multiple arrays in a single LIST, since normally the LIST | |
452 | mechanism will merge all the array values so that you can't extract out | |
cb1a09d0 AD |
453 | the individual arrays. For more on typeglobs, see L<perldata/"Typeglobs">. |
454 | ||
455 | =head2 Pass by Reference | |
456 | ||
457 | If you want to pass more than one array or hash into a function--or | |
458 | return them from it--and have them maintain their integrity, | |
459 | then you're going to have to use an explicit pass-by-reference. | |
c07a80fd | 460 | Before you do that, you need to understand references as detailed in L<perlref>. |
461 | This section may not make much sense to you otherwise. | |
cb1a09d0 AD |
462 | |
463 | Here are a few simple examples. First, let's pass in several | |
464 | arrays to a function and have it pop all of them, returning a new | |
465 | list of all their former last elements: | |
466 | ||
467 | @tailings = popmany ( \@a, \@b, \@c, \@d ); | |
468 | ||
469 | sub popmany { | |
470 | my $aref; | |
471 | my @retlist = (); | |
472 | foreach $aref ( @_ ) { | |
473 | push @retlist, pop @$aref; | |
474 | } | |
475 | return @retlist; | |
476 | } | |
477 | ||
478 | Here's how you might write a function that returns a | |
479 | list of keys occurring in all the hashes passed to it: | |
480 | ||
481 | @common = inter( \%foo, \%bar, \%joe ); | |
482 | sub inter { | |
483 | my ($k, $href, %seen); # locals | |
484 | foreach $href (@_) { | |
485 | while ( defined($k = each %$href) ) { | |
486 | $seen{$k}++; | |
487 | } | |
488 | } | |
489 | return grep { $seen{$_} == @_ } keys %seen; | |
490 | } | |
491 | ||
492 | So far, we're just using the normal list return mechanism. | |
493 | What happens if you want to pass or return a hash? Well, | |
494 | if you're only using one of them, or you don't mind them | |
495 | concatenating, then the normal calling convention is ok, although | |
496 | a little expensive. | |
497 | ||
498 | Where people get into trouble is here: | |
499 | ||
500 | (@a, @b) = func(@c, @d); | |
501 | or | |
502 | (%a, %b) = func(%c, %d); | |
503 | ||
504 | That syntax simply won't work. It just sets @a or %a and clears the @b or | |
505 | %b. Plus the function didn't get passed two separate arrays or | |
506 | hashes: it got one long list in @_, as always. | |
507 | ||
508 | If you can arrange for everyone to deal with this through references, it's | |
509 | cleaner code, although not so nice to look at. Here's a function that | |
510 | takes two array references as arguments, returning the two array references | |
511 | in order of how many elements they have in them: | |
512 | ||
513 | ($aref, $bref) = func(\@c, \@d); | |
514 | print "@$aref has more than @$bref\n"; | |
515 | sub func { | |
516 | my ($cref, $dref) = @_; | |
517 | if (@$cref > @$dref) { | |
518 | return ($cref, $dref); | |
519 | } else { | |
c07a80fd | 520 | return ($dref, $cref); |
cb1a09d0 AD |
521 | } |
522 | } | |
523 | ||
524 | It turns out that you can actually do this also: | |
525 | ||
526 | (*a, *b) = func(\@c, \@d); | |
527 | print "@a has more than @b\n"; | |
528 | sub func { | |
529 | local (*c, *d) = @_; | |
530 | if (@c > @d) { | |
531 | return (\@c, \@d); | |
532 | } else { | |
533 | return (\@d, \@c); | |
534 | } | |
535 | } | |
536 | ||
537 | Here we're using the typeglobs to do symbol table aliasing. It's | |
538 | a tad subtle, though, and also won't work if you're using my() | |
539 | variables, since only globals (well, and local()s) are in the symbol table. | |
540 | ||
541 | If you're passing around filehandles, you could usually just use the bare | |
542 | typeglob, like *STDOUT, but typeglob references would be better because | |
543 | they'll still work properly under C<use strict 'refs'>. For example: | |
544 | ||
545 | splutter(\*STDOUT); | |
546 | sub splutter { | |
547 | my $fh = shift; | |
548 | print $fh "her um well a hmmm\n"; | |
549 | } | |
550 | ||
551 | $rec = get_rec(\*STDIN); | |
552 | sub get_rec { | |
553 | my $fh = shift; | |
554 | return scalar <$fh>; | |
555 | } | |
556 | ||
557 | If you're planning on generating new filehandles, you could do this: | |
558 | ||
559 | sub openit { | |
560 | my $path = shift; | |
561 | local *FH; | |
562 | return open (FH, $path) ? \*FH : undef; | |
563 | } | |
564 | ||
565 | Although that will actually produce a small memory leak. See the bottom | |
566 | of L<perlfunc/open()> for a somewhat cleaner way using the FileHandle | |
567 | functions supplied with the POSIX package. | |
568 | ||
569 | =head2 Prototypes | |
570 | ||
571 | As of the 5.002 release of perl, if you declare | |
572 | ||
573 | sub mypush (\@@) | |
574 | ||
c07a80fd | 575 | then mypush() takes arguments exactly like push() does. The declaration |
576 | of the function to be called must be visible at compile time. The prototype | |
577 | only affects the interpretation of new-style calls to the function, where | |
578 | new-style is defined as not using the C<&> character. In other words, | |
579 | if you call it like a builtin function, then it behaves like a builtin | |
580 | function. If you call it like an old-fashioned subroutine, then it | |
581 | behaves like an old-fashioned subroutine. It naturally falls out from | |
582 | this rule that prototypes have no influence on subroutine references | |
583 | like C<\&foo> or on indirect subroutine calls like C<&{$subref}>. | |
584 | ||
585 | Method calls are not influenced by prototypes either, because the | |
586 | function to be called is indeterminate at compile time, since it depends | |
587 | on inheritance. | |
cb1a09d0 | 588 | |
c07a80fd | 589 | Since the intent is primarily to let you define subroutines that work |
590 | like builtin commands, here are the prototypes for some other functions | |
591 | that parse almost exactly like the corresponding builtins. | |
cb1a09d0 AD |
592 | |
593 | Declared as Called as | |
594 | ||
595 | sub mylink ($$) mylink $old, $new | |
596 | sub myvec ($$$) myvec $var, $offset, 1 | |
597 | sub myindex ($$;$) myindex &getstring, "substr" | |
598 | sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off | |
599 | sub myreverse (@) myreverse $a,$b,$c | |
600 | sub myjoin ($@) myjoin ":",$a,$b,$c | |
601 | sub mypop (\@) mypop @array | |
602 | sub mysplice (\@$$@) mysplice @array,@array,0,@pushme | |
603 | sub mykeys (\%) mykeys %{$hashref} | |
604 | sub myopen (*;$) myopen HANDLE, $name | |
605 | sub mypipe (**) mypipe READHANDLE, WRITEHANDLE | |
606 | sub mygrep (&@) mygrep { /foo/ } $a,$b,$c | |
607 | sub myrand ($) myrand 42 | |
608 | sub mytime () mytime | |
609 | ||
c07a80fd | 610 | Any backslashed prototype character represents an actual argument |
6e47f808 | 611 | that absolutely must start with that character. The value passed |
612 | to the subroutine (as part of C<@_>) will be a reference to the | |
613 | actual argument given in the subroutine call, obtained by applying | |
614 | C<\> to that argument. | |
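A backslashed prototype character can be sketched as follows. This mypush() body is only one possible illustration of the C<(\@@)> prototype mentioned earlier, not a definitive implementation, and the @stack array is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical reimplementation of push(), for illustration only.
# The \@ in the prototype means the first argument must be a real
# array; it arrives in @_ as a reference to that array.
sub mypush (\@@) {
    my ($aref, @new) = @_;
    push @$aref, @new;
    return scalar @$aref;     # new number of elements, like push()
}

my @stack = (1, 2);
mypush @stack, 3, 4;          # no backslash needed at the call site
print "@stack\n";             # 1 2 3 4
```

Note that the definition appears before the call, since the prototype must be visible at compile time to have any effect.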
c07a80fd | 615 | |
616 | Unbackslashed prototype characters have special meanings. Any | |
617 | unbackslashed @ or % eats all the rest of the arguments, and forces | |
618 | list context. An argument represented by $ forces scalar context. An | |
619 | & requires an anonymous subroutine, which, if passed as the first | |
620 | argument, does not require the "sub" keyword or a subsequent comma. A | |
621 | * does whatever it has to do to turn the argument into a reference to a | |
622 | symbol table entry. | |
623 | ||
624 | A semicolon separates mandatory arguments from optional arguments. | |
625 | (It is redundant before @ or %.) | |
cb1a09d0 | 626 | |
c07a80fd | 627 | Note how the last three examples above are treated specially by the parser. |
cb1a09d0 AD |
628 | mygrep() is parsed as a true list operator, myrand() is parsed as a |
629 | true unary operator with unary precedence the same as rand(), and | |
630 | mytime() is truly argumentless, just like time(). That is, if you | |
631 | say | |
632 | ||
633 | mytime +2; | |
634 | ||
635 | you'll get mytime() + 2, not mytime(2), which is how it would be parsed | |
636 | without the prototype. | |
637 | ||
638 | The interesting thing about & is that you can generate new syntax with it: | |
639 | ||
6d28dffb | 640 | sub try (&@) { |
cb1a09d0 AD |
641 | my($try,$catch) = @_; |
642 | eval { &$try }; | |
643 | if ($@) { | |
644 | local $_ = $@; | |
645 | &$catch; | |
646 | } | |
647 | } | |
648 | sub catch (&) { @_ } | |
649 | ||
650 | try { | |
651 | die "phooey"; | |
652 | } catch { | |
653 | /phooey/ and print "unphooey\n"; | |
654 | }; | |
655 | ||
656 | That prints "unphooey". (Yes, there are still unresolved | |
657 | issues having to do with the visibility of @_. I'm ignoring that | |
658 | question for the moment. (But note that if we make @_ lexically | |
659 | scoped, those anonymous subroutines can act like closures... (Gee, | |
660 | is this sounding a little Lispish? (Nevermind.)))) | |
661 | ||
662 | And here's a reimplementation of grep: | |
663 | ||
664 | sub mygrep (&@) { | |
665 | my $code = shift; | |
666 | my @result; | |
667 | foreach $_ (@_) { | |
6e47f808 | 668 | push(@result, $_) if &$code; |
cb1a09d0 AD |
669 | } |
670 | @result; | |
671 | } | |
a0d0e21e | 672 | |
cb1a09d0 AD |
673 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have |
674 | been intentionally left out of prototypes for the express purpose of | |
675 | someday in the future adding named, formal parameters. The current | |
676 | mechanism's main goal is to let module writers provide better diagnostics | |
677 | for module users. Larry feels the notation quite understandable to Perl | |
678 | programmers, and that it will not intrude greatly upon the meat of the | |
679 | module, nor make it harder to read. The line noise is visually | |
680 | encapsulated into a small pill that's easy to swallow. | |
681 | ||
682 | It's probably best to prototype new functions, not retrofit prototyping | |
683 | into older ones. That's because you must be especially careful about | |
684 | silent impositions of differing list versus scalar contexts. For example, | |
685 | if you decide that a function should take just one parameter, like this: | |
686 | ||
687 | sub func ($) { | |
688 | my $n = shift; | |
689 | print "you gave me $n\n"; | |
690 | } | |
691 | ||
692 | and someone has been calling it with an array or expression | |
693 | returning a list: | |
694 | ||
695 | func(@foo); | |
696 | func( split /:/ ); | |
697 | ||
698 | Then you've just supplied an automatic scalar() in front of their | |
699 | argument, which can be more than a bit surprising. The old @foo | |
700 | which used to hold one thing doesn't get passed in. Instead, | |
701 | the func() now gets passed in a 1; that is, the number of elements | |
702 | in @foo. And the split() gets called in a scalar context and | |
703 | starts scribbling on your @_ parameter list. | |
704 | ||
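The surprise can be seen in a small sketch. Here func() returns its argument rather than printing it, purely so the two results can be compared; the contents of C<@foo> are made up for the example. Calling with a leading C<&> bypasses the prototype, which is used here only to show what the old, unprototyped behavior would have been.

```perl
use strict;
use warnings;

# Returns its single argument so the effect of the prototype is visible.
sub func ($) {
    my $n = shift;
    return $n;
}

# Hypothetical one-element array, for illustration only.
my @foo = ('only element');

my $with_proto    = func(@foo);     # prototype applies: @foo in scalar context
my $without_proto = &func(@foo);    # & bypasses the prototype: list is passed

print "with prototype:    $with_proto\n";
print "without prototype: $without_proto\n";
```

The first call receives the element count of C<@foo>, the second receives the element itself.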
705 | This is all very powerful, of course, and should only be used in moderation | |
706 | to make the world a better place. | |
707 | ||
708 | =head2 Overriding Builtin Functions | |
a0d0e21e LW |
709 | |
710 | Many builtin functions may be overridden, though this should only be | |
711 | tried occasionally and for good reason. Typically this might be | |
712 | done by a package attempting to emulate missing builtin functionality | |
713 | on a non-Unix system. | |
714 | ||
715 | Overriding may only be done by importing the name from a | |
716 | module--ordinary predeclaration isn't good enough. However, the | |
717 | C<subs> pragma (compiler directive) lets you, in effect, predeclare subs | |
718 | via the import syntax, and these names may then override the builtin ones: | |
719 | ||
720 | use subs 'chdir', 'chroot', 'chmod', 'chown'; | |
721 | chdir $somewhere; | |
722 | sub chdir { ... } | |
723 | ||
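A minimal sketch of such an override follows. It assumes a perl recent enough to provide the C<CORE::> prefix for reaching the real builtin, which this document does not itself describe; the C<@visited> log is hypothetical and exists only to show that the override was called.

```perl
use strict;
use warnings;
use subs 'chdir';        # import-style predeclaration, so the override applies

my @visited;             # hypothetical log, for illustration only

sub chdir {
    my $dir = shift;
    push @visited, $dir;
    return CORE::chdir($dir);    # assumption: CORE:: reaches the real builtin
}

chdir '.';               # resolves to our chdir, not the builtin
print "visited: @visited\n";
```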
724 | Library modules should not in general export builtin names like "open" | |
725 | or "chdir" as part of their default @EXPORT list, since these may | |
726 | sneak into someone else's namespace and change the semantics unexpectedly. | |
727 | Instead, if the module adds the name to the @EXPORT_OK list, then it's | |
728 | possible for a user to import the name explicitly, but not implicitly. | |
729 | That is, they could say | |
730 | ||
731 | use Module 'open'; | |
732 | ||
733 | and it would import the open override, but if they said | |
734 | ||
735 | use Module; | |
736 | ||
737 | they would get the default imports without the overrides. | |
738 | ||
739 | =head2 Autoloading | |
740 | ||
741 | If you call a subroutine that is undefined, you would ordinarily get an | |
742 | immediate fatal error complaining that the subroutine doesn't exist. | |
743 | (Likewise for subroutines being used as methods, when the method | |
744 | doesn't exist in any of the base classes of the class package.) If, | |
745 | however, there is an C<AUTOLOAD> subroutine defined in the package or | |
746 | packages that were searched for the original subroutine, then that | |
747 | C<AUTOLOAD> subroutine is called with the arguments that would have been | |
748 | passed to the original subroutine. The fully qualified name of the | |
749 | original subroutine magically appears in the $AUTOLOAD variable in the | |
750 | same package as the C<AUTOLOAD> routine. The name is not passed as an | |
751 | ordinary argument because, er, well, just because, that's why... | |
752 | ||
753 | Most C<AUTOLOAD> routines will load in a definition for the subroutine in | |
754 | question using eval, and then execute that subroutine using a special | |
755 | form of "goto" that erases the stack frame of the C<AUTOLOAD> routine | |
756 | without a trace. (See the standard C<AutoLoader> module, for example.) | |
757 | But an C<AUTOLOAD> routine can also just emulate the routine and never | |
cb1a09d0 AD |
758 | define it. For example, let's pretend that a function that wasn't defined |
759 | should just call system() with those arguments. All you'd do is this: | |
760 | ||
761 | sub AUTOLOAD { | |
762 | my $program = $AUTOLOAD; | |
763 | $program =~ s/.*:://; | |
764 | system($program, @_); | |
765 | } | |
766 | date(); | |
6d28dffb | 767 | who('am', 'i'); |
cb1a09d0 AD |
768 | ls('-l'); |
769 | ||
770 | In fact, if you predeclare the functions you want to call that way, you don't | |
771 | even need the parentheses: | |
772 | ||
773 | use subs qw(date who ls); | |
774 | date; | |
775 | who "am", "i"; | |
776 | ls '-l'; | |
777 | ||
778 | A more complete example of this is the standard Shell module, which | |
a0d0e21e LW |
779 | can treat undefined subroutine calls as calls to Unix programs. |
780 | ||
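The other style mentioned earlier in this section, where C<AUTOLOAD> defines the missing routine with eval and then jumps to it with "goto", might look roughly like this. The C<%source> table is a hypothetical stand-in for wherever a real module would fetch its definitions; this is a sketch, not how the C<AutoLoader> module itself is written.

```perl
use strict;
use warnings;

our $AUTOLOAD;

# Hypothetical table of subroutine source text, standing in for
# definitions that a real module would load from disk.
my %source = (
    double => 'sub double { return 2 * $_[0] }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';

    my $code = $source{$name}
        or die "Undefined subroutine $AUTOLOAD called";

    eval $code;          # compile the definition into the current package
    die $@ if $@;

    no strict 'refs';    # $AUTOLOAD holds a name, not a reference
    goto &$AUTOLOAD;     # replace this frame with the new sub; @_ is kept
}

my $first  = double(21); # goes through AUTOLOAD, which defines double()
my $second = double(4);  # calls double() directly
print "$first $second\n";
```

Because the routine is now really defined, only the first call pays the cost of going through C<AUTOLOAD>.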
cb1a09d0 | 781 | Mechanisms are available for module writers to help split the modules |
6d28dffb | 782 | up into autoloadable files. See the standard AutoLoader module |
783 | described in L<AutoLoader> and in L<AutoSplit>, the standard | |
784 | SelfLoader module in L<SelfLoader>, and the document on adding C | |
785 | functions to perl code in L<perlxs>. | |
cb1a09d0 AD |
786 | |
787 | =head1 SEE ALSO | |
a0d0e21e | 788 | |
cb1a09d0 AD |
789 | See L<perlref> for more on references. See L<perlxs> if you'd |
790 | like to learn about calling C subroutines from perl. See | |
791 | L<perlmod> to learn about bundling up your functions in | |
792 | separate files. |