=head1 NAME
X<reference> X<pointer> X<data structure> X<structure> X<struct>

perlref - Perl references and nested data structures

=head1 NOTE

This is complete documentation about all aspects of references.
For a shorter, tutorial introduction to just the essential features,
see L<perlreftut>.

=head1 DESCRIPTION

Before release 5 of Perl it was difficult to represent complex data
structures, because all references had to be symbolic--and even then
it was difficult to refer to a variable instead of a symbol table entry.
Perl now not only makes it easier to use symbolic references to variables,
but also lets you have "hard" references to any piece of data or code.
Any scalar may hold a hard reference. Because arrays and hashes contain
scalars, you can now easily build arrays of arrays, arrays of hashes,
hashes of arrays, arrays of hashes of functions, and so on.

Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count goes
to zero. (Reference counts for values in self-referential or
cyclic data structures may not go to zero without a little help; see
L</"Circular References"> for a detailed explanation.)
If that thing happens to be an object, the object is destructed. See
L<perlobj> for more about objects. (In a sense, everything in Perl is an
object, but we usually reserve the word for references to objects that
have been officially "blessed" into a class package.)

Symbolic references are names of variables or other objects, just as a
symbolic link in a Unix filesystem contains merely the name of a file.
The C<*glob> notation is something of a symbolic reference. (Symbolic
references are sometimes called "soft references", but please don't call
them that; references are confusing enough without useless synonyms.)
X<reference, symbolic> X<reference, soft>
X<symbolic reference> X<soft reference>

In contrast, hard references are more like hard links in a Unix file
system: They are used to access an underlying object without concern for
what its (other) name is. When the word "reference" is used without an
adjective, as in the following paragraph, it is usually talking about a
hard reference.
X<reference, hard> X<hard reference>

References are easy to use in Perl. There is just one overriding
principle: in general, Perl does no implicit referencing or dereferencing.
When a scalar is holding a reference, it always behaves as a simple scalar.
It doesn't magically start being an array or hash or subroutine; you have to
tell it explicitly to do so, by dereferencing it.

That said, be aware that Perl version 5.14 introduced an exception
to the rule, for syntactic convenience. Experimental array and hash container
function behavior allowed array and hash references to be handled by Perl as
if they had been explicitly syntactically dereferenced. See
L<perl5140delta/"Syntactical Enhancements">
and L<perlfunc> for details. This experimental feature was removed in
Perl 5.24, so current versions of Perl require the explicit dereference.

=head2 Making References
X<reference, creation> X<referencing>

References can be created in several ways.

=over 4

=item 1.
X<\> X<backslash>

By using the backslash operator on a variable, subroutine, or value.
(This works much like the & (address-of) operator in C.)
This typically creates I<another> reference to a variable, because
there's already a reference to the variable in the symbol table. But
the symbol table reference might go away, and you'll still have the
reference that the backslash returned. Here are some examples:

    $scalarref = \$foo;
    $arrayref  = \@ARGV;
    $hashref   = \%ENV;
    $coderef   = \&handler;
    $globref   = \*foo;

It isn't possible to create a true reference to an IO handle (filehandle
or dirhandle) using the backslash operator. The most you can get is a
reference to a typeglob, which is actually a complete symbol table entry.
But see the explanation of the C<*foo{THING}> syntax below. However,
you can still use type globs and globrefs as though they were IO handles.

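The following is a small runnable sketch of the backslash operator. The
variable and subroutine names (C<$foo>, C<@list>, C<%map>, C<handler>) are
illustrative only. C<ref> reports what kind of thing each reference points
to, and a change made through a reference shows up in the original variable.

```perl

    use strict;
    use warnings;

    our $foo = 'a scalar';        # package variable, so the glob *foo exists
    my @list = (1, 2, 3);
    my %map  = (key => 'value');
    sub handler { return "handled @_" }

    my $scalarref = \$foo;
    my $arrayref  = \@list;
    my $hashref   = \%map;
    my $coderef   = \&handler;
    my $globref   = \*foo;

    # ref() names the type of each referent
    print join(' ', ref($scalarref), ref($arrayref), ref($hashref),
                    ref($coderef), ref($globref)), "\n";
    # prints: SCALAR ARRAY HASH CODE GLOB

    # Modifying through the reference modifies @list itself
    push @$arrayref, 4;
    print scalar(@list), "\n";    # prints: 4

```
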
=item 2.
X<array, anonymous> X<[> X<[]> X<square bracket>
X<bracket, square> X<arrayref> X<array reference> X<reference, array>

A reference to an anonymous array can be created using square
brackets:

    $arrayref = [1, 2, ['a', 'b', 'c']];

Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, C<< $arrayref->[2][1] >> would have
the value "b".)

Taking a reference to an enumerated list is not the same
as using square brackets--instead it's the same as creating
a list of references!

    @list = (\$a, \@b, \%c);
    @list = \($a, @b, %c);      # same thing!

As a special case, C<\(@foo)> returns a list of references to the contents
of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>,
except that the key references are to copies (since the keys are just
strings rather than full-fledged scalars).

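Here is a short sketch of the points above, runnable under C<use strict>. It
indexes into the nested anonymous array, builds a list of references with
C<\(...)>, and exercises the C<\(@foo)> special case: writing through one of
the returned references changes the corresponding element of C<@foo>. The
variable names are placeholders.

```perl

    use strict;
    use warnings;

    # Anonymous array whose last element is another anonymous array
    my $arrayref = [1, 2, ['a', 'b', 'c']];
    print $arrayref->[2][1], "\n";          # prints: b

    # A backslash in front of a parenthesized list distributes over it
    my ($x, @y, %z);
    my @refs = \($x, @y, %z);               # same as (\$x, \@y, \%z)
    print join(' ', map { ref($_) } @refs), "\n";   # prints: SCALAR ARRAY HASH

    # Special case: \(@foo) gives one reference per element of @foo
    my @foo = (10, 20, 30);
    my @elem_refs = \(@foo);
    ${ $elem_refs[0] } = 99;                # writes through to $foo[0]
    print scalar(@elem_refs), " $foo[0]\n"; # prints: 3 99

```
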
=item 3.
X<hash, anonymous> X<{> X<{}> X<curly bracket>
X<bracket, curly> X<brace> X<hashref> X<hash reference> X<reference, hash>

A reference to an anonymous hash can be created using curly
brackets:

    $hashref = {
        'Adam'  => 'Eve',
        'Clyde' => 'Bonnie',
    };

Anonymous hash and array composers like these can be intermixed freely to
produce as complicated a structure as you want. The multidimensional
syntax described below works for these too. The values above are
literals, but variables and expressions would work just as well, because
assignment operators in Perl (even within local() or my()) are executable
statements, not compile-time declarations.

Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a C<+> or a C<return> in front so
that Perl realizes the opening brace isn't starting a BLOCK. The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.

For example, if you wanted a function to make a new hash and return a
reference to it, you have these options:

    sub hashem {        { @_ } }   # silently wrong
    sub hashem {       +{ @_ } }   # ok
    sub hashem { return { @_ } }   # ok

On the other hand, if you want the other meaning, you can do this:

    sub showem {        { @_ } }   # ambiguous (currently ok, but may change)
    sub showem {       {; @_ } }   # ok
    sub showem { { return @_ } }   # ok

The leading C<+{> and C<{;> always serve to disambiguate
the expression to mean either the HASH reference, or the BLOCK.

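As a quick check of the two working forms of C<hashem>, here is a sketch in
which the variants are given distinct, hypothetical names so that both can be
defined in one file:

```perl

    use strict;
    use warnings;

    sub hashem_plus   {       +{ @_ } }   # leading + forces an anonymous hash
    sub hashem_return { return { @_ } }   # term after return is an anonymous hash

    my $h1 = hashem_plus(Adam => 'Eve');
    my $h2 = hashem_return(Clyde => 'Bonnie');
    print ref($h1), " $h1->{Adam}\n";     # prints: HASH Eve
    print ref($h2), " $h2->{Clyde}\n";    # prints: HASH Bonnie

```
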
=item 4.
X<subroutine, anonymous> X<subroutine, reference> X<reference, subroutine>
X<scope, lexical> X<closure> X<lexical> X<lexical scope>

A reference to an anonymous subroutine can be created by using
C<sub> without a subname:

    $coderef = sub { print "Boink!\n" };

Note the semicolon. Except for the code
inside not being immediately executed, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
matter how many times you execute that particular line (unless you're in an
C<eval("...")>), $coderef will still have a reference to the I<same>
anonymous subroutine, provided that the subroutine captures no lexical
variables; closures, described next, are the exception.)

Anonymous subroutines act as closures with respect to my() variables,
that is, variables lexically visible within the current scope. Closure
is a notion out of the Lisp world that says if you define an anonymous
function in a particular lexical context, it pretends to run in that
context even when it's called outside the context.

In human terms, it's a funny way of passing arguments to a subroutine when
you define it as well as when you call it. It's useful for setting up
little bits of code to run later, such as callbacks. You can even
do object-oriented stuff with it, though Perl already provides a different
mechanism to do that--see L<perlobj>.

You might also think of closure as a way to write a subroutine
template without using eval(). Here's a small example of how
closures work:

    sub newprint {
        my $x = shift;
        return sub { my $y = shift; print "$x, $y!\n"; };
    }
    $h = newprint("Howdy");
    $g = newprint("Greetings");

    # Time passes...

    &$h("world");
    &$g("earthlings");

This prints

    Howdy, world!
    Greetings, earthlings!

Note particularly that $x continues to refer to the value passed
into newprint() I<despite> "my $x" having gone out of scope by the
time the anonymous subroutine runs. That's what a closure is all
about.

This applies only to lexical variables, by the way. Dynamic variables
continue to work as they have always worked. Closure is not something
that most Perl programmers need to trouble themselves about to begin with.

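The following sketch, built around a hypothetical C<make_counter> helper,
shows the same idea with state that changes. Each call to C<make_counter>
creates a fresh C<my $count>, and each returned closure keeps its own copy
alive between calls, so two counters never interfere.

```perl

    use strict;
    use warnings;

    sub make_counter {
        my $count = shift;            # a new lexical on every call
        return sub { return $count++ };
    }

    my $from_zero = make_counter(0);
    my $from_ten  = make_counter(10);

    $from_zero->() for 1 .. 2;        # returns 0, then 1
    print $from_zero->(), ' ', $from_ten->(), "\n";   # prints: 2 10

```

Because the two closures were created by separate executions of the same
C<sub> operator, they are distinct code references.
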
=item 5.
X<constructor> X<new>

References are often returned by special subroutines called constructors. Perl
objects are just references to a special type of object that happens to know
which package it's associated with. Constructors are just special subroutines
that know how to create that association. They do so by starting with an
ordinary reference, and it remains an ordinary reference even while it's also
being an object. Constructors are often named C<new()>. You I<can> call them
indirectly:

    $objref = new Doggie( Tail => 'short', Ears => 'long' );

But that can produce ambiguous syntax in certain cases, so it's often
better to use the direct method invocation approach:

    $objref   = Doggie->new(Tail => 'short', Ears => 'long');

    use Term::Cap;
    $terminal = Term::Cap->Tgetent( { OSPEED => 9600 });

    use Tk;
    $main    = MainWindow->new();
    $menubar = $main->Frame(-relief      => "raised",
                            -borderwidth => 2);

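Here is a minimal constructor sketch for the hypothetical C<Doggie> class used
above. It assumes the common hash-based object layout: the object starts as an
ordinary hash reference, and C<bless> records its package. C<blessed> and
C<reftype> come from the core L<Scalar::Util> module.

```perl

    use strict;
    use warnings;
    use Scalar::Util qw(blessed reftype);

    {
        package Doggie;                    # hypothetical class from the text
        sub new {
            my ($class, %args) = @_;
            my $self = { %args };          # an ordinary hash reference...
            return bless $self, $class;    # ...now associated with Doggie
        }
    }

    my $objref = Doggie->new(Tail => 'short', Ears => 'long');
    print ref($objref), "\n";              # prints: Doggie
    print reftype($objref), "\n";          # prints: HASH
    print $objref->{Tail}, "\n";           # prints: short

```

C<reftype> confirms that the object is still an ordinary hash reference
underneath, as the text says.
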
=item 6.
X<autovivification>

References of the appropriate type can spring into existence if you
dereference them in a context that assumes they exist. Because we haven't
talked about dereferencing yet, we can't show you any examples yet.

=item 7.
X<*foo{THING}> X<*>

A reference can be created by using a special syntax, lovingly known as
the C<*foo{THING}> syntax. C<*foo{THING}> returns a reference to the THING
slot in C<*foo> (which is the symbol table entry which holds everything
known as foo).

    $scalarref = *foo{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handler{CODE};
    $ioref     = *STDIN{IO};
    $globref   = *foo{GLOB};
    $formatref = *foo{FORMAT};

All of these are self-explanatory except for C<*foo{IO}>. It returns
the IO handle, used for file handles (L<perlfunc/open>), sockets
(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory
handles (L<perlfunc/opendir>). For compatibility with previous
versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it
is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn
of its use.

C<*foo{THING}> returns undef if that particular THING hasn't been used yet,
except in the case of scalars. C<*foo{SCALAR}> returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.

C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in
L<perldata/"Typeglobs and Filehandles"> for passing filehandles
into or out of subroutines, or storing into larger data structures.
Its disadvantage is that it won't create a new filehandle for you.
Its advantage is that you have less risk of clobbering more than
you want to with a typeglob assignment. (It still conflates file
and directory handles, though.) However, if you assign the incoming
value to a scalar instead of a typeglob as we do in the examples
below, there's no risk of that happening.

    splutter(*STDOUT);          # pass the whole glob
    splutter(*STDOUT{IO});      # pass both file and dir handles

    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(*STDIN);     # pass the whole glob
    $rec = get_rec(*STDIN{IO}); # pass both file and dir handles

    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

=back

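Here is a runnable sketch of C<*foo{THING}>. It uses only package variables
and subroutines, because lexicals have no symbol table entry and therefore no
glob to look into. The names C<@stuff> and C<greet> are illustrative.

```perl

    use strict;
    use warnings;

    our @stuff = (1, 2);
    sub greet { return "hello, $_[0]" }

    my $arrayref = *stuff{ARRAY};     # the same array as \@stuff
    my $coderef  = *greet{CODE};      # the same subroutine as \&greet
    my $ioref    = *STDOUT{IO};       # the IO handle behind STDOUT

    push @$arrayref, 3;
    print scalar(@stuff), "\n";       # prints: 3
    print $coderef->('world'), "\n";  # prints: hello, world
    print {$ioref} "written through *STDOUT{IO}\n";

```
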
=head2 Using References
X<reference, use> X<dereferencing> X<dereference>

That's it for creating references. By now you're probably dying to
know how to use references to get back to your long-lost data. There
are several basic methods.

=over 4

=item 1.

Anywhere you'd put an identifier (or chain of identifiers) as part
of a variable or subroutine name, you can replace the identifier with
a simple scalar variable containing a reference of the correct type:

    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";
    $$hashref{"KEY"} = "VALUE";
    &$coderef(1,2,3);
    print $globref "output\n";

It's important to understand that we are specifically I<not> dereferencing
C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the
scalar variable happens I<before> it does any key lookups. Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively. Therefore, the following prints "howdy".

    $refrefref = \\\"howdy";
    print $$$$refrefref;

=item 2.

Anywhere you'd put an identifier (or chain of identifiers) as part of a
variable or subroutine name, you can replace the identifier with a
BLOCK returning a reference of the correct type. In other words, the
previous examples could be written like this:

    $bar = ${$scalarref};
    push(@{$arrayref}, $filename);
    ${$arrayref}[0] = "January";
    ${$hashref}{"KEY"} = "VALUE";
    &{$coderef}(1,2,3);
    $globref->print("output\n");  # iff IO::Handle is loaded

Admittedly, it's a little silly to use the curlies in this case, but
the BLOCK can contain any arbitrary expression, in particular,
subscripted expressions:

    &{ $dispatch{$index} }(1,2,3);      # call correct routine

Because of being able to omit the curlies for the simple case of C<$$x>,
people often make the mistake of viewing the dereferencing symbols as
proper operators, and wonder about their precedence. If they were,
though, you could use parentheses instead of braces. That's not the case.
Consider the difference below; case 0 is a short-hand version of case 1,
I<not> case 2:

    $$hashref{"KEY"}   = "VALUE";       # CASE 0
    ${$hashref}{"KEY"} = "VALUE";       # CASE 1
    ${$hashref{"KEY"}} = "VALUE";       # CASE 2
    ${$hashref->{"KEY"}} = "VALUE";     # CASE 3

Case 2 is also deceptive in that you're accessing a variable
called %hashref, not dereferencing through $hashref to the hash
it's presumably referencing. That would be case 3.

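Here is a runnable sketch of cases 0, 1, and 3. It assumes a hash whose
C<KEY> entry holds a scalar reference; the names C<$slot> and C<PLAIN> are
made up for the illustration. Case 2 is left out on purpose: under
C<use strict> it refers to an undeclared C<%hashref> and does not compile.

```perl

    use strict;
    use warnings;

    my $slot    = 'old';
    my $hashref = { PLAIN => 'start', KEY => \$slot };

    $$hashref{PLAIN}      = 'case 0';      # CASE 0: short for CASE 1
    ${$hashref}{PLAIN}   .= ' = case 1';   # CASE 1: same hash element
    ${ $hashref->{KEY} }  = 'VALUE';       # CASE 3: through the stored ref

    print "$hashref->{PLAIN}\n";           # prints: case 0 = case 1
    print "$slot\n";                       # prints: VALUE

```
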
=item 3.

Subroutine calls and lookups of individual array elements arise often
enough that it gets cumbersome to use method 2. As a form of
syntactic sugar, the examples for method 2 may be written:

    $arrayref->[0] = "January";   # Array element
    $hashref->{"KEY"} = "VALUE";  # Hash element
    $coderef->(1,2,3);            # Subroutine call

The left side of the arrow can be any expression returning a reference,
including a previous dereference. Note that C<$array[$x]> is I<not> the
same thing as C<< $array->[$x] >> here:

    $array[$x]->{"foo"}->[0] = "January";

This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context. Before this
statement, C<$array[$x]> may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get
defined with an array reference so that we can look up C<[0]> in it.
This process is called I<autovivification>.

One more thing here. The arrow is optional I<between> bracketed
subscripts, so you can shrink the above down to

    $array[$x]{"foo"}[0] = "January";

Which, in the degenerate case of using only ordinary arrays, gives you
multidimensional arrays just like C's:

    $score[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays, actually. C doesn't know how
to grow its arrays on demand. Perl does.

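You can check the autovivification and optional-arrow rules with a short
sketch like this one. The variable names follow the text; the index values
are arbitrary.

```perl

    use strict;
    use warnings;

    my @array;
    my $x = 2;
    $array[$x]{"foo"}[0] = "January";     # arrows between subscripts omitted

    print ref($array[$x]), "\n";          # prints: HASH  (autovivified)
    print ref($array[$x]{"foo"}), "\n";   # prints: ARRAY (autovivified)
    print scalar(@array), "\n";           # prints: 3     (@array grew to fit)

    my @score;
    $score[1][2][3] += 42;                # grows on demand, unlike C
    print $score[1]->[2]->[3], "\n";      # prints: 42 (same element, arrows written out)

```
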
=item 4.

If a reference happens to be a reference to an object, then there are
probably methods to access the things referred to, and you should probably
stick to those methods unless you're in the class package that defines the
object's methods. In other words, be nice, and don't violate the object's
encapsulation without a very good reason. Perl does not enforce
encapsulation. We are not totalitarians here. We do expect some basic
civility though.

=back

Using a string or number as a reference produces a symbolic reference,
as explained above. Using a reference as a number produces an
integer representing its storage location in memory. The only
useful thing to be done with this is to compare two references
numerically to see whether they refer to the same location.
X<reference, numeric context>

    if ($ref1 == $ref2) {  # cheap numeric compare of references
        print "refs 1 and 2 refer to the same thing\n";
    }

Using a reference as a string produces both its referent's type,
including any package blessing as described in L<perlobj>, as well
as the numeric address expressed in hex. The ref() operator returns
just the type of thing the reference is pointing to, without the
address. See L<perlfunc/ref> for details and examples of its use.
X<reference, string context>

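Here is a sketch of both contexts. The numeric comparison tells whether two
references share a referent, and the string form looks like C<ARRAY(0x...)>.
The hex address differs from run to run, so it is not shown. C<Doggie> is
only a placeholder package name for the blessed case.

```perl

    use strict;
    use warnings;

    my @data  = (1, 2, 3);
    my $ref1  = \@data;
    my $ref2  = \@data;
    my $other = [1, 2, 3];               # equal contents, different array

    print(($ref1 == $ref2)  ? "same\n" : "different\n");   # prints: same
    print(($ref1 == $other) ? "same\n" : "different\n");   # prints: different

    print "$ref1\n";                     # e.g. ARRAY(0x...), address varies
    print ref($ref1), "\n";              # prints: ARRAY

    my $obj = bless {}, 'Doggie';
    print "$obj\n";                      # e.g. Doggie=HASH(0x...)
    print ref($obj), "\n";               # prints: Doggie

```
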
The bless() operator may be used to associate the object a reference
points to with a package functioning as an object class. See L<perlobj>.

A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the type of reference desired.
So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.

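To see that the two spellings name one variable, here is a minimal sketch
using a package variable C<$foo>. A typeglob exists only for package
variables, not for C<my> variables.

```perl

    use strict;
    use warnings;

    our $foo = 'shared';             # package variable, so *foo exists
    my $via_glob = ${*foo};          # dereference the typeglob
    my $via_ref  = ${\$foo};         # dereference a hard reference
    my $same     = \${*foo} == \$foo;

    print "$via_glob $via_ref\n";    # prints: shared shared
    print( $same ? "same variable\n" : "different\n" );   # prints: same variable

```
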
Here's a trick for interpolating a subroutine call into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

The way it works is that when the C<@{...}> is seen in the double-quoted
string, it's evaluated as a block. The block creates a reference to an
anonymous array containing the results of the call to C<mysub(1,2,3)>. So
the whole block returns a reference to an array, which is then
dereferenced by C<@{...}> and stuck into the double-quoted string. This
chicanery is also useful for arbitrary expressions:

    print "That yields @{[$n + 5]} widgets\n";

Similarly, an expression that returns a reference to a scalar can be
dereferenced via C<${...}>. Thus, the above expression may be written
as:

    print "That yields ${\($n + 5)} widgets\n";

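You can check the three interpolation tricks above with a short sketch. Here
C<mysub> is a hypothetical helper that doubles its arguments. The list result
is joined with spaces because that is the default value of C<$">.

```perl

    use strict;
    use warnings;

    sub mysub { return map { $_ * 2 } @_ }   # hypothetical helper
    my $n = 3;

    my $list_msg    = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";
    my $array_form  = "That yields @{[ $n + 5 ]} widgets";
    my $scalar_form = "That yields ${\($n + 5)} widgets";

    print "$list_msg\n";      # prints: My sub returned 2 4 6 that time.
    print "$array_form\n";    # prints: That yields 8 widgets
    print "$scalar_form\n";   # prints: That yields 8 widgets

```
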
=head2 Circular References
X<circular reference> X<reference, circular>

It is possible to create a "circular reference" in Perl, which can lead
to memory leaks. A circular reference occurs when two data structures
contain references to each other, like this:

    my $foo = {};
    my $bar = { foo => $foo };
    $foo->{bar} = $bar;

You can also create a circular reference with a single variable:

    my $foo;
    $foo = \$foo;

In this case, the reference count for the variables will never reach 0,
and the references will never be garbage-collected. This can lead to
memory leaks.

Because objects in Perl are implemented as references, it's possible to
have circular references with objects as well. Imagine a TreeNode class
where each node references its parent and child nodes. Any node with a
parent will be part of a circular reference.

You can break circular references by creating a "weak reference". A
weak reference does not increment the reference count for a variable,
which means that the object can go out of scope and be destroyed. You
can weaken a reference with the C<weaken> function exported by the
L<Scalar::Util> module.

Here's how we can make the first example safer:

    use Scalar::Util 'weaken';

    my $foo = {};
    my $bar = { foo => $foo };
    $foo->{bar} = $bar;

    weaken $foo->{bar};

The reference from C<$foo> to C<$bar> has been weakened. When the
C<$bar> variable goes out of scope, it will be garbage-collected. The
next time you look at the value of the C<< $foo->{bar} >> key, it will
be C<undef>.

This action at a distance can be confusing, so you should be careful
with your use of weaken. You should weaken the reference in the
variable that will go out of scope I<first>. That way, the longer-lived
variable will contain the expected reference until it goes out of
scope.

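The following sketch mirrors the parent/child situation described above, with
plain hashes standing in for the hypothetical TreeNode objects. The child
points strongly at its parent, and the parent's link to the child is weakened.
Once the child's last strong reference (the lexical C<$child>) goes out of
scope, the weakened entry becomes C<undef>. C<weaken> and C<isweak> come from
L<Scalar::Util>.

```perl

    use strict;
    use warnings;
    use Scalar::Util qw(weaken isweak);

    my $parent = { name => 'root' };
    {
        my $child = { name => 'leaf', parent => $parent };  # child -> parent (strong)
        $parent->{child} = $child;                           # parent -> child (strong)
        weaken($parent->{child});                            # parent -> child (weak)
        print( isweak($parent->{child}) ? "weak\n" : "strong\n" );   # prints: weak
    }
    # $child is out of scope; only a weak reference pointed at that hash,
    # so it has been freed and the weak reference is now undef.
    print( defined $parent->{child} ? "still alive\n" : "freed\n" ); # prints: freed

```
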
=head2 Symbolic references
X<reference, symbolic> X<reference, soft>
X<symbolic reference> X<soft reference>

We said that references spring into existence as necessary if they are
undefined, but we didn't say what happens if a value used as a
reference is already defined, but I<isn't> a hard reference. If you
use it as a reference, it'll be treated as a symbolic
reference. That is, the value of the scalar is taken to be the I<name>
of a variable, rather than a direct link to a (possibly) anonymous
value.

People frequently expect it to work like this. So it does.

    $name = "foo";
    $$name = 1;                 # Sets $foo
    ${$name} = 2;               # Sets $foo
    ${$name x 2} = 3;           # Sets $foofoo
    $name->[0] = 4;             # Sets $foo[0]
    @$name = ();                # Clears @foo
    &$name();                   # Calls &foo()
    $pack = "THAT";
    ${"${pack}::$name"} = 5;    # Sets $THAT::foo without eval

This is powerful, and slightly dangerous, in that it's possible
to intend (with the utmost sincerity) to use a hard reference, and
accidentally use a symbolic reference instead. To protect against
that, you can say

    use strict 'refs';

and then only hard references will be allowed for the rest of the enclosing
block. An inner block may countermand that with

    no strict 'refs';

Only package variables (globals, even if localized) are visible to
symbolic references. Lexical variables (declared with my()) aren't in
a symbol table, and thus are invisible to this mechanism. For example:

    local $value = 10;
    $ref = "value";
    {
        my $value = 20;
        print $$ref;
    }

This will still print 10, not 20. Remember that local() affects package
variables, which are all "global" to the package.

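Here is a runnable version of the example above, assuming it sits in package
C<main>. C<no strict 'refs'> is limited to the inner block, so the symbolic
lookup there finds the package variable rather than the lexical. The same
lookup dies once strict refs is back in force.

```perl

    use strict;
    use warnings;

    our $value = 10;                 # package variable: lives in the symbol table
    my  $name  = 'value';
    my  $seen;
    {
        my $value = 20;              # lexical: invisible to symbolic references
        no strict 'refs';            # symbolic references allowed in this block only
        $seen = $$name;              # looks up $main::value by name
    }
    print "$seen\n";                 # prints: 10

    # Outside that block strict refs is in force again, so this lookup dies
    my $ok = eval { my $v = $$name; 1 };
    print( $ok ? "allowed\n" : "refused: $@" );

```
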
=head2 Not-so-symbolic references

Brackets around a symbolic reference can simply
serve to isolate an identifier or variable name from the rest of an
expression, just as they always have within a string. For example,

    $push = "pop on ";
    print "${push}over";

has always meant to print "pop on over", even though push is
a reserved word. This is generalized to work the same
without the enclosing double quotes, so that

    print ${push} . "over";

and even

    print ${ push } . "over";

will have the same effect. This
construct is I<not> considered to be a symbolic reference when you're
using strict refs:

    use strict 'refs';
    ${ bareword };      # Okay, means $bareword.
    ${ "bareword" };    # Error, symbolic reference.

Similarly, because of all the subscripting that is done using single words,
the same rule applies to any bareword that is used for subscripting a hash.
So now, instead of writing

    $array{ "aaa" }{ "bbb" }{ "ccc" }

you can write just

    $array{ aaa }{ bbb }{ ccc }

and not worry about whether the subscripts are reserved words. In the
rare event that you do wish to do something like

    $array{ shift }

you can force interpretation as a reserved word by adding anything that
makes it more than a bareword:

    $array{ shift() }
    $array{ +shift }
    $array{ shift @_ }

The C<use warnings> pragma or the B<-w> switch will warn you if it
interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.

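Here is a small sketch of the quoting rules for hash subscripts. The hash
name C<%array> follows the text above. C<@queue> is a made-up array that
shows a subscript that is more than a bareword.

```perl

    use strict;
    use warnings;

    my %array;
    $array{ aaa }{ bbb }{ ccc } = 1;   # same as $array{"aaa"}{"bbb"}{"ccc"}
    $array{ shift } = 'bareword key';  # the key is the string "shift"

    my @queue = ('first');
    $array{ shift @queue } = 'computed key';   # more than a bareword: calls shift

    print join(',', sort keys %array), "\n";   # prints: aaa,first,shift

```
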
=head2 Pseudo-hashes: Using an array as a hash
X<pseudo-hash> X<pseudo hash> X<pseudohash>

Pseudo-hashes have been removed from Perl. The 'fields' pragma
remains available.

=head2 Function Templates
X<scope, lexical> X<closure> X<lexical> X<lexical scope>
X<subroutine, nested> X<sub, nested> X<subroutine, local> X<sub, local>

As explained above, an anonymous function with access to the lexical
variables visible when that function was compiled creates a closure. It
retains access to those variables even though it doesn't get run until
later, such as in a signal handler or a Tk callback.

Using a closure as a function template allows us to generate many functions
that act similarly. Suppose you wanted functions named after the colors
that generated HTML font changes for the various colors:

    print "Be ", red("careful"), " with that ", green("light");

The red() and green() functions would be similar. To create these,
we'll assign a closure to a typeglob of the name of the function we're
trying to build.

    @colors = qw(red blue green yellow orange purple violet);
    for my $name (@colors) {
        no strict 'refs';       # allow symbol table manipulation
        *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
    }

Now all those different functions appear to exist independently. You can
call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on
both compile time and memory use, and is less error-prone as well, since
syntax checks happen at compile time. It's critical that any variables in
the anonymous subroutine be lexicals in order to create a proper closure.
That's the reason for the C<my> on the loop iteration variable.

This is one of the only places where giving a prototype to a closure makes
much sense. If you wanted to impose scalar context on the arguments of
these functions (probably not a wise idea for this particular example),
you could have written it this way instead:

    *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };

However, since prototype checking happens at compile time, the assignment
above happens too late to be of much use. You could address this by
putting the whole loop of assignments within a BEGIN block, forcing it
to occur during compilation.

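Here is a sketch of the C<BEGIN> approach. It uses two of the colors and the
C<($)> prototype from the variant above. Because the loop runs at compile
time, the prototype is already known when the later calls are compiled.
Passing an array therefore imposes scalar context, and the function receives
the element count.

```perl

    use strict;
    use warnings;

    # The loop runs inside BEGIN, so red(), RED(), green() and GREEN() exist,
    # with their ($) prototype, before the statements below are compiled.
    BEGIN {
        for my $name (qw(red green)) {
            no strict 'refs';               # allow symbol table manipulation
            *$name = *{uc $name} = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
        }
    }

    print "Be ", red("careful"), " with that ", GREEN("light"), "\n";

    my @words = ('one', 'two');
    my $counted = red(@words);      # ($) imposes scalar context: @words -> 2
    print "$counted\n";             # prints: <FONT COLOR='red'>2</FONT>

```
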
Access to lexicals that change over time (like those in the C<for> loop
above, which are basically aliases to elements from the surrounding lexical
scopes) only works with anonymous subs, not with named subroutines. Generally
speaking, named subroutines do not nest properly and should only be declared
in the main package scope.

This is because named subroutines are created at compile time so their
lexical variables get assigned to the parent lexicals from the first
execution of the parent block. If a parent scope is entered a second
time, its lexicals are created again, while the nested subs still
reference the old ones.

Anonymous subroutines get to capture each time you execute the C<sub>
operator, as they are created on the fly. If you are accustomed to using
nested subroutines in other programming languages with their own private
variables, you'll have to work at it a bit in Perl. The intuitive coding
of this type of thing incurs mysterious warnings about "will not stay
shared" for the reasons explained above.
For example, this won't work:

    sub outer {
        my $x = $_[0] + 35;
        sub inner { return $x * 19 }   # WRONG
        return $x + inner();
    }

A work-around is the following:

    sub outer {
        my $x = $_[0] + 35;
        local *inner = sub { return $x * 19 };
        return $x + inner();
    }

Now inner() can only be called from within outer(), because of the
temporary assignment of the anonymous subroutine. But when it is called,
it has normal access to the lexical variable $x from the scope of
outer() at the time outer is invoked.

This has the interesting effect of creating a function local to another
function, something not normally supported in Perl.

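An alternative to the C<local *inner> work-around, not taken from the text
above, keeps the closure in a lexical variable so that no symbol table entry
is involved. Because the anonymous sub is created afresh on each call, it
always sees that call's C<$x>:

```perl

    use strict;
    use warnings;

    sub outer {
        my $x = $_[0] + 35;
        # A lexical code reference instead of a named inner sub: it is
        # created anew on every call, so it closes over this call's $x.
        my $inner = sub { return $x * 19 };
        return $x + $inner->();
    }

    print outer(1), "\n";   # prints: 720  (36 + 36 * 19)
    print outer(2), "\n";   # prints: 740  (37 + 37 * 19)

```

On Perl 5.26 and later, where lexical subroutines are no longer experimental,
declaring C<my sub inner { ... }> inside C<outer> is another way to get the
same effect.
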
=head1 WARNING
X<reference, string context> X<reference, use as hash key>

You may not (usefully) use a reference as the key to a hash. It will be
converted into a string:

    $x{ \$a } = $a;

If you try to dereference the key, it won't do a hard dereference, and
you won't accomplish what you're attempting. You might want to do something
more like

    $r = \@a;
    $x{ $r } = $r;

And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.

The standard Tie::RefHash module provides a convenient workaround to this.

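The sketch below contrasts a plain hash, whose keys come back as strings, with
one tied to the core L<Tie::RefHash> module, whose keys come back as real
references. The names C<@a> and C<$r> follow the example above.

```perl

    use strict;
    use warnings;
    use Tie::RefHash;

    my @a = (1, 2, 3);
    my $r = \@a;

    # Plain hash: the key is stringified, but the value is still a real ref
    my %plain;
    $plain{$r} = $r;
    my ($plain_key) = keys %plain;
    print ref($plain_key) || 'not a reference', "\n";   # prints: not a reference
    print ref($plain{$r}), "\n";                        # prints: ARRAY

    # Tied hash: Tie::RefHash hands back the original references as keys
    tie my %refhash, 'Tie::RefHash';
    $refhash{$r} = 'three elements';
    my ($ref_key) = keys %refhash;
    push @$ref_key, 4;                # the key really is \@a
    print scalar(@a), "\n";           # prints: 4

```
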
=head1 SEE ALSO

Besides the obvious documents, source code can be instructive.
Some pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.

See also L<perldsc> and L<perllol> for how to use references to create
complex data structures, and L<perlootut> and L<perlobj>
for how to use them to create objects.