| 1 | =head1 NAME |
| 2 | X<reference> X<pointer> X<data structure> X<structure> X<struct> |
| 3 | |
| 4 | perlref - Perl references and nested data structures |
| 5 | |
| 6 | =head1 NOTE |
| 7 | |
| 8 | This is complete documentation about all aspects of references. |
| 9 | For a shorter, tutorial introduction to just the essential features, |
| 10 | see L<perlreftut>. |
| 11 | |
| 12 | =head1 DESCRIPTION |
| 13 | |
| 14 | Before release 5 of Perl it was difficult to represent complex data |
| 15 | structures, because all references had to be symbolic--and even then |
| 16 | it was difficult to refer to a variable instead of a symbol table entry. |
| 17 | Perl now not only makes it easier to use symbolic references to variables, |
| 18 | but also lets you have "hard" references to any piece of data or code. |
| 19 | Any scalar may hold a hard reference. Because arrays and hashes contain |
| 20 | scalars, you can now easily build arrays of arrays, arrays of hashes, |
| 21 | hashes of arrays, arrays of hashes of functions, and so on. |
| 22 | |
| 23 | Hard references are smart--they keep track of reference counts for you, |
| 24 | automatically freeing the thing referred to when its reference count goes |
| 25 | to zero. (Reference counts for values in self-referential or |
| 26 | cyclic data structures may not go to zero without a little help; see |
| 27 | L</"Circular References"> for a detailed explanation.) |
| 28 | If that thing happens to be an object, the object is destructed. See |
| 29 | L<perlobj> for more about objects. (In a sense, everything in Perl is an |
| 30 | object, but we usually reserve the word for references to objects that |
| 31 | have been officially "blessed" into a class package.) |
| 32 | |
| 33 | Symbolic references are names of variables or other objects, just as a |
| 34 | symbolic link in a Unix filesystem contains merely the name of a file. |
| 35 | The C<*glob> notation is something of a symbolic reference. (Symbolic |
| 36 | references are sometimes called "soft references", but please don't call |
| 37 | them that; references are confusing enough without useless synonyms.) |
| 38 | X<reference, symbolic> X<reference, soft> |
| 39 | X<symbolic reference> X<soft reference> |
| 40 | |
| 41 | In contrast, hard references are more like hard links in a Unix file |
| 42 | system: They are used to access an underlying object without concern for |
| 43 | what its (other) name is. When the word "reference" is used without an |
| 44 | adjective, as in the following paragraph, it is usually talking about a |
| 45 | hard reference. |
| 46 | X<reference, hard> X<hard reference> |
| 47 | |
| 48 | References are easy to use in Perl. There is just one overriding |
| 49 | principle: in general, Perl does no implicit referencing or dereferencing. |
| 50 | When a scalar is holding a reference, it always behaves as a simple scalar. |
| 51 | It doesn't magically start being an array or hash or subroutine; you have to |
| 52 | tell it explicitly to do so, by dereferencing it. |
| 53 | |
| 54 | That said, be aware that Perl version 5.14 introduced an exception |
| 55 | to the rule, for syntactic convenience. Experimental array and hash container |
| 56 | function behavior allowed array and hash references to be handled by Perl as |
| 57 | if they had been explicitly syntactically dereferenced; that experiment was |
| 58 | removed again in Perl 5.24. See L<perl5140delta/"Syntactical Enhancements"> |
| 59 | and L<perlfunc> for details. |
| 60 | |
| 61 | =head2 Making References |
| 62 | X<reference, creation> X<referencing> |
| 63 | |
| 64 | References can be created in several ways. |
| 65 | |
| 66 | =over 4 |
| 67 | |
| 68 | =item 1. |
| 69 | X<\> X<backslash> |
| 70 | |
| 71 | By using the backslash operator on a variable, subroutine, or value. |
| 72 | (This works much like the & (address-of) operator in C.) |
| 73 | This typically creates I<another> reference to a variable, because |
| 74 | there's already a reference to the variable in the symbol table. But |
| 75 | the symbol table reference might go away, and you'll still have the |
| 76 | reference that the backslash returned. Here are some examples: |
| 77 | |
| 78 | $scalarref = \$foo; |
| 79 | $arrayref = \@ARGV; |
| 80 | $hashref = \%ENV; |
| 81 | $coderef = \&handler; |
| 82 | $globref = \*foo; |
| 83 | |
| 84 | It isn't possible to create a true reference to an IO handle (filehandle |
| 85 | or dirhandle) using the backslash operator. The most you can get is a |
| 86 | reference to a typeglob, which is actually a complete symbol table entry. |
| 87 | But see the explanation of the C<*foo{THING}> syntax below. However, |
| 88 | you can still use type globs and globrefs as though they were IO handles. |
| 89 | |
| 90 | =item 2. |
| 91 | X<array, anonymous> X<[> X<[]> X<square bracket> |
| 92 | X<bracket, square> X<arrayref> X<array reference> X<reference, array> |
| 93 | |
| 94 | A reference to an anonymous array can be created using square |
| 95 | brackets: |
| 96 | |
| 97 | $arrayref = [1, 2, ['a', 'b', 'c']]; |
| 98 | |
| 99 | Here we've created a reference to an anonymous array of three elements |
| 100 | whose final element is itself a reference to another anonymous array of three |
| 101 | elements. (The multidimensional syntax described later can be used to |
| 102 | access this. For example, after the above, C<< $arrayref->[2][1] >> would have |
| 103 | the value "b".) |
| 104 | |
| 105 | Taking a reference to an enumerated list is not the same |
| 106 | as using square brackets--instead it's the same as creating |
| 107 | a list of references! |
| 108 | |
| 109 | @list = (\$a, \@b, \%c); |
| 110 | @list = \($a, @b, %c); # same thing! |
| 111 | |
| 112 | As a special case, C<\(@foo)> returns a list of references to the contents |
| 113 | of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>, |
| 114 | except that the key references are to copies (since the keys are just |
| 115 | strings rather than full-fledged scalars). |
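
As a rough illustration of that special case (the variable names are made up
for this sketch), compare C<\(@foo)> with C<\@foo>:

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);

my @elem_refs = \(@foo);   # one scalar reference per element of @foo
my $array_ref = \@foo;     # a single reference to @foo itself

${ $elem_refs[0] } = 10;   # writes through to $foo[0]

print scalar(@elem_refs), " element refs, first element is now $foo[0]\n";
print "array ref sees ", scalar(@$array_ref), " elements\n";
```

The element references point at the array's own elements, so assigning through
one of them changes C<@foo>.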
| 116 | |
| 117 | =item 3. |
| 118 | X<hash, anonymous> X<{> X<{}> X<curly bracket> |
| 119 | X<bracket, curly> X<brace> X<hashref> X<hash reference> X<reference, hash> |
| 120 | |
| 121 | A reference to an anonymous hash can be created using curly |
| 122 | brackets: |
| 123 | |
| 124 | $hashref = { |
| 125 | 'Adam' => 'Eve', |
| 126 | 'Clyde' => 'Bonnie', |
| 127 | }; |
| 128 | |
| 129 | Anonymous hash and array composers like these can be intermixed freely to |
| 130 | produce as complicated a structure as you want. The multidimensional |
| 131 | syntax described below works for these too. The values above are |
| 132 | literals, but variables and expressions would work just as well, because |
| 133 | assignment operators in Perl (even within local() or my()) are executable |
| 134 | statements, not compile-time declarations. |
| 135 | |
| 136 | Because curly brackets (braces) are used for several other things |
| 137 | including BLOCKs, you may occasionally have to disambiguate braces at the |
| 138 | beginning of a statement by putting a C<+> or a C<return> in front so |
| 139 | that Perl realizes the opening brace isn't starting a BLOCK. The economy and |
| 140 | mnemonic value of using curlies is deemed worth this occasional extra |
| 141 | hassle. |
| 142 | |
| 143 | For example, if you wanted a function to make a new hash and return a |
| 144 | reference to it, you have these options: |
| 145 | |
| 146 | sub hashem { { @_ } } # silently wrong |
| 147 | sub hashem { +{ @_ } } # ok |
| 148 | sub hashem { return { @_ } } # ok |
| 149 | |
| 150 | On the other hand, if you want the other meaning, you can do this: |
| 151 | |
| 152 | sub showem { { @_ } } # ambiguous (currently ok, |
| 153 | # but may change) |
| 154 | sub showem { {; @_ } } # ok |
| 155 | sub showem { { return @_ } } # ok |
| 156 | |
| 157 | The leading C<+{> and C<{;> always serve to disambiguate |
| 158 | the expression to mean either the HASH reference, or the BLOCK. |
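
A minimal sketch of the two unambiguous spellings in use, reusing the
C<hashem> and C<showem> names from above (the arguments are arbitrary):

```perl
use strict;
use warnings;

sub hashem { +{ @_ } }          # unary plus: anonymous hash composer
sub showem { { return @_ } }    # explicit return: inner braces are a BLOCK

my $href = hashem(Adam => 'Eve');
my @list = showem(Adam => 'Eve');

print ref($href), " with value $href->{Adam}\n";
print scalar(@list), " list elements\n";
```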
| 159 | |
| 160 | =item 4. |
| 161 | X<subroutine, anonymous> X<subroutine, reference> X<reference, subroutine> |
| 162 | X<scope, lexical> X<closure> X<lexical> X<lexical scope> |
| 163 | |
| 164 | A reference to an anonymous subroutine can be created by using |
| 165 | C<sub> without a subname: |
| 166 | |
| 167 | $coderef = sub { print "Boink!\n" }; |
| 168 | |
| 169 | Note the semicolon. Except for the code |
| 170 | inside not being immediately executed, a C<sub {}> is not so much a |
| 171 | declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no |
| 172 | matter how many times you execute that particular line (unless you're in an |
| 173 | C<eval("...")>), $coderef will still have a reference to the I<same> |
| 174 | anonymous subroutine.) |
| 175 | |
| 176 | Anonymous subroutines act as closures with respect to my() variables, |
| 177 | that is, variables lexically visible within the current scope. Closure |
| 178 | is a notion out of the Lisp world that says if you define an anonymous |
| 179 | function in a particular lexical context, it pretends to run in that |
| 180 | context even when it's called outside the context. |
| 181 | |
| 182 | In human terms, it's a funny way of passing arguments to a subroutine when |
| 183 | you define it as well as when you call it. It's useful for setting up |
| 184 | little bits of code to run later, such as callbacks. You can even |
| 185 | do object-oriented stuff with it, though Perl already provides a different |
| 186 | mechanism to do that--see L<perlobj>. |
| 187 | |
| 188 | You might also think of closure as a way to write a subroutine |
| 189 | template without using eval(). Here's a small example of how |
| 190 | closures work: |
| 191 | |
| 192 | sub newprint { |
| 193 | my $x = shift; |
| 194 | return sub { my $y = shift; print "$x, $y!\n"; }; |
| 195 | } |
| 196 | $h = newprint("Howdy"); |
| 197 | $g = newprint("Greetings"); |
| 198 | |
| 199 | # Time passes... |
| 200 | |
| 201 | &$h("world"); |
| 202 | &$g("earthlings"); |
| 203 | |
| 204 | This prints |
| 205 | |
| 206 | Howdy, world! |
| 207 | Greetings, earthlings! |
| 208 | |
| 209 | Note particularly that $x continues to refer to the value passed |
| 210 | into newprint() I<despite> "my $x" having gone out of scope by the |
| 211 | time the anonymous subroutine runs. That's what a closure is all |
| 212 | about. |
| 213 | |
| 214 | This applies only to lexical variables, by the way. Dynamic variables |
| 215 | continue to work as they have always worked. Closure is not something |
| 216 | that most Perl programmers need trouble themselves about to begin with. |
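
Another small sketch, assuming a hypothetical C<make_counter()> helper, shows
that each call to the enclosing subroutine gives the closure its own private
copy of the lexical:

```perl
use strict;
use warnings;

# make_counter() is a hypothetical helper, not part of Perl itself.
sub make_counter {
    my $count = shift;               # a fresh lexical on every call
    return sub { return $count++ };  # the closure keeps this $count alive
}

my $from_ten  = make_counter(10);
my $from_zero = make_counter(0);

# The two counters advance independently of each other.
my @seen = ($from_ten->(), $from_ten->(), $from_zero->(), $from_ten->());
print "@seen\n";
```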
| 217 | |
| 218 | =item 5. |
| 219 | X<constructor> X<new> |
| 220 | |
| 221 | References are often returned by special subroutines called constructors. Perl |
| 222 | objects are just references to a special type of object that happens to know |
| 223 | which package it's associated with. Constructors are just special subroutines |
| 224 | that know how to create that association. They do so by starting with an |
| 225 | ordinary reference, and it remains an ordinary reference even while it's also |
| 226 | being an object. Constructors are often named C<new()>. You I<can> call them |
| 227 | indirectly: |
| 228 | |
| 229 | $objref = new Doggie( Tail => 'short', Ears => 'long' ); |
| 230 | |
| 231 | But that can produce ambiguous syntax in certain cases, so it's often |
| 232 | better to use the direct method invocation approach: |
| 233 | |
| 234 | $objref = Doggie->new(Tail => 'short', Ears => 'long'); |
| 235 | |
| 236 | use Term::Cap; |
| 237 | $terminal = Term::Cap->Tgetent( { OSPEED => 9600 }); |
| 238 | |
| 239 | use Tk; |
| 240 | $main = MainWindow->new(); |
| 241 | $menubar = $main->Frame(-relief => "raised", |
| 242 | -borderwidth => 2); |
| 243 | |
| 244 | =item 6. |
| 245 | X<autovivification> |
| 246 | |
| 247 | References of the appropriate type can spring into existence if you |
| 248 | dereference them in a context that assumes they exist. Because we haven't |
| 249 | talked about dereferencing yet, we can't show you any examples yet. |
| 250 | |
| 251 | =item 7. |
| 252 | X<*foo{THING}> X<*> |
| 253 | |
| 254 | A reference can be created by using a special syntax, lovingly known as |
| 255 | the *foo{THING} syntax. *foo{THING} returns a reference to the THING |
| 256 | slot in *foo (which is the symbol table entry which holds everything |
| 257 | known as foo). |
| 258 | |
| 259 | $scalarref = *foo{SCALAR}; |
| 260 | $arrayref = *ARGV{ARRAY}; |
| 261 | $hashref = *ENV{HASH}; |
| 262 | $coderef = *handler{CODE}; |
| 263 | $ioref = *STDIN{IO}; |
| 264 | $globref = *foo{GLOB}; |
| 265 | $formatref = *foo{FORMAT}; |
| 266 | $globname = *foo{NAME}; # "foo" |
| 267 | $pkgname = *foo{PACKAGE}; # "main" |
| 268 | |
| 269 | Most of these are self-explanatory, but C<*foo{IO}> |
| 270 | deserves special attention. It returns |
| 271 | the IO handle, used for file handles (L<perlfunc/open>), sockets |
| 272 | (L<perlfunc/socket> and L<perlfunc/socketpair>), and directory |
| 273 | handles (L<perlfunc/opendir>). For compatibility with previous |
| 274 | versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it |
| 275 | is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn |
| 276 | of its use. |
| 277 | |
| 278 | C<*foo{THING}> returns undef if that particular THING hasn't been used yet, |
| 279 | except in the case of scalars. C<*foo{SCALAR}> returns a reference to an |
| 280 | anonymous scalar if $foo hasn't been used yet. This might change in a |
| 281 | future release. |
| 282 | |
| 283 | C<*foo{NAME}> and C<*foo{PACKAGE}> are the exception, in that they return |
| 284 | strings, rather than references. These return the package and name of the |
| 285 | typeglob itself, rather than one that has been assigned to it. So, after |
| 286 | C<*foo=*Foo::bar>, C<*foo> will become "*Foo::bar" when used as a string, |
| 287 | but C<*foo{PACKAGE}> and C<*foo{NAME}> will continue to produce "main" and |
| 288 | "foo", respectively. |
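
The following sketch, which assumes it runs as a plain script and uses a
package array C<@foo> invented for the purpose, pokes at a few of these slots:

```perl
use strict;
use warnings;

our @foo = (1, 2, 3);            # package variable, stored in the foo glob

my $arrayref = *foo{ARRAY};      # the same array that \@foo refers to
my $coderef  = *foo{CODE};       # no sub named foo has been defined
my $pkg      = *foo{PACKAGE};    # a plain string, not a reference
my $name     = *foo{NAME};       # likewise a plain string

print "same array\n"        if $arrayref == \@foo;
print "no CODE slot yet\n"  unless defined $coderef;
print $pkg, "::", $name, "\n";
```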
| 289 | |
| 290 | C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in |
| 291 | L<perldata/"Typeglobs and Filehandles"> for passing filehandles |
| 292 | into or out of subroutines, or storing into larger data structures. |
| 293 | Its disadvantage is that it won't create a new filehandle for you. |
| 294 | Its advantage is that you have less risk of clobbering more than |
| 295 | you want to with a typeglob assignment. (It still conflates file |
| 296 | and directory handles, though.) However, if you assign the incoming |
| 297 | value to a scalar instead of a typeglob as we do in the examples |
| 298 | below, there's no risk of that happening. |
| 299 | |
| 300 | splutter(*STDOUT); # pass the whole glob |
| 301 | splutter(*STDOUT{IO}); # pass both file and dir handles |
| 302 | |
| 303 | sub splutter { |
| 304 | my $fh = shift; |
| 305 | print $fh "her um well a hmmm\n"; |
| 306 | } |
| 307 | |
| 308 | $rec = get_rec(*STDIN); # pass the whole glob |
| 309 | $rec = get_rec(*STDIN{IO}); # pass both file and dir handles |
| 310 | |
| 311 | sub get_rec { |
| 312 | my $fh = shift; |
| 313 | return scalar <$fh>; |
| 314 | } |
| 315 | |
| 316 | =back |
| 317 | |
| 318 | =head2 Using References |
| 319 | X<reference, use> X<dereferencing> X<dereference> |
| 320 | |
| 321 | That's it for creating references. By now you're probably dying to |
| 322 | know how to use references to get back to your long-lost data. There |
| 323 | are several basic methods. |
| 324 | |
| 325 | =over 4 |
| 326 | |
| 327 | =item 1. |
| 328 | |
| 329 | Anywhere you'd put an identifier (or chain of identifiers) as part |
| 330 | of a variable or subroutine name, you can replace the identifier with |
| 331 | a simple scalar variable containing a reference of the correct type: |
| 332 | |
| 333 | $bar = $$scalarref; |
| 334 | push(@$arrayref, $filename); |
| 335 | $$arrayref[0] = "January"; |
| 336 | $$hashref{"KEY"} = "VALUE"; |
| 337 | &$coderef(1,2,3); |
| 338 | print $globref "output\n"; |
| 339 | |
| 340 | It's important to understand that we are specifically I<not> dereferencing |
| 341 | C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the |
| 342 | scalar variable happens I<before> it does any key lookups. Anything more |
| 343 | complicated than a simple scalar variable must use methods 2 or 3 below. |
| 344 | However, a "simple scalar" includes an identifier that itself uses method |
| 345 | 1 recursively. Therefore, the following prints "howdy". |
| 346 | |
| 347 | $refrefref = \\\"howdy"; |
| 348 | print $$$$refrefref; |
| 349 | |
| 350 | =item 2. |
| 351 | |
| 352 | Anywhere you'd put an identifier (or chain of identifiers) as part of a |
| 353 | variable or subroutine name, you can replace the identifier with a |
| 354 | BLOCK returning a reference of the correct type. In other words, the |
| 355 | previous examples could be written like this: |
| 356 | |
| 357 | $bar = ${$scalarref}; |
| 358 | push(@{$arrayref}, $filename); |
| 359 | ${$arrayref}[0] = "January"; |
| 360 | ${$hashref}{"KEY"} = "VALUE"; |
| 361 | &{$coderef}(1,2,3); |
| 362 | $globref->print("output\n"); # iff IO::Handle is loaded |
| 363 | |
| 364 | Admittedly, it's a little silly to use the curlies in this case, but |
| 365 | the BLOCK can contain any arbitrary expression, in particular, |
| 366 | subscripted expressions: |
| 367 | |
| 368 | &{ $dispatch{$index} }(1,2,3); # call correct routine |
| 369 | |
| 370 | Because of being able to omit the curlies for the simple case of C<$$x>, |
| 371 | people often make the mistake of viewing the dereferencing symbols as |
| 372 | proper operators, and wonder about their precedence. If they were, |
| 373 | though, you could use parentheses instead of braces. That's not the case. |
| 374 | Consider the difference below; case 0 is a short-hand version of case 1, |
| 375 | I<not> case 2: |
| 376 | |
| 377 | $$hashref{"KEY"} = "VALUE"; # CASE 0 |
| 378 | ${$hashref}{"KEY"} = "VALUE"; # CASE 1 |
| 379 | ${$hashref{"KEY"}} = "VALUE"; # CASE 2 |
| 380 | ${$hashref->{"KEY"}} = "VALUE"; # CASE 3 |
| 381 | |
| 382 | Case 2 is also deceptive in that you're accessing a variable |
| 383 | called %hashref, not dereferencing through $hashref to the hash |
| 384 | it's presumably referencing. That would be case 3. |
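
To make the difference between case 2 and case 3 concrete, here is a sketch
that declares both a hash C<%hashref> and a scalar C<$hashref>; the target
variables C<$via_hash> and C<$via_ref> exist only for this illustration:

```perl
use strict;
use warnings;

my ($via_hash, $via_ref);

my %hashref = (KEY => \$via_hash);    # a hash that happens to be named %hashref
my $hashref = { KEY => \$via_ref };   # a scalar holding a hash reference

${ $hashref{"KEY"} }   = "case 2";    # element of %hashref, then scalar deref
${ $hashref->{"KEY"} } = "case 3";    # through $hashref, then scalar deref

print "$via_hash / $via_ref\n";
```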
| 385 | |
| 386 | =item 3. |
| 387 | |
| 388 | Subroutine calls and lookups of individual array elements arise often |
| 389 | enough that it gets cumbersome to use method 2. As a form of |
| 390 | syntactic sugar, the examples for method 2 may be written: |
| 391 | |
| 392 | $arrayref->[0] = "January"; # Array element |
| 393 | $hashref->{"KEY"} = "VALUE"; # Hash element |
| 394 | $coderef->(1,2,3); # Subroutine call |
| 395 | |
| 396 | The left side of the arrow can be any expression returning a reference, |
| 397 | including a previous dereference. Note that C<$array[$x]> is I<not> the |
| 398 | same thing as C<< $array->[$x] >> here: |
| 399 | |
| 400 | $array[$x]->{"foo"}->[0] = "January"; |
| 401 | |
| 402 | This is one of the cases we mentioned earlier in which references could |
| 403 | spring into existence when in an lvalue context. Before this |
| 404 | statement, C<$array[$x]> may have been undefined. If so, it's |
| 405 | automatically defined with a hash reference so that we can look up |
| 406 | C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get |
| 407 | defined with an array reference so that we can look up C<[0]> in it. |
| 408 | This process is called I<autovivification>. |
| 409 | |
| 410 | One more thing here. The arrow is optional I<between> bracket |
| 411 | subscripts, so you can shrink the above down to |
| 412 | |
| 413 | $array[$x]{"foo"}[0] = "January"; |
| 414 | |
| 415 | Which, in the degenerate case of using only ordinary arrays, gives you |
| 416 | multidimensional arrays just like C's: |
| 417 | |
| 418 | $score[$x][$y][$z] += 42; |
| 419 | |
| 420 | Well, okay, not entirely like C's arrays, actually. C doesn't know how |
| 421 | to grow its arrays on demand. Perl does. |
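
A short sketch of autovivification and on-demand growth, starting from an
empty array (the index and key are arbitrary):

```perl
use strict;
use warnings;

my @array;                          # empty to begin with
$array[2]{foo}[0] = "January";      # arrows between subscripts omitted

print "array now has ", scalar(@array), " slots\n";
print "slot 2 holds a ", ref($array[2]), " reference\n";
print "its foo entry holds an ", ref($array[2]{foo}), " reference\n";
```

The slots before index 2 are created but left undefined; only the path that
was actually assigned through gets vivified.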
| 422 | |
| 423 | =item 4. |
| 424 | |
| 425 | If a reference happens to be a reference to an object, then there are |
| 426 | probably methods to access the things referred to, and you should probably |
| 427 | stick to those methods unless you're in the class package that defines the |
| 428 | object's methods. In other words, be nice, and don't violate the object's |
| 429 | encapsulation without a very good reason. Perl does not enforce |
| 430 | encapsulation. We are not totalitarians here. We do expect some basic |
| 431 | civility though. |
| 432 | |
| 433 | =back |
| 434 | |
| 435 | Using a string or number as a reference produces a symbolic reference, |
| 436 | as explained above. Using a reference as a number produces an |
| 437 | integer representing its storage location in memory. The only |
| 438 | useful thing to be done with this is to compare two references |
| 439 | numerically to see whether they refer to the same location. |
| 440 | X<reference, numeric context> |
| 441 | |
| 442 | if ($ref1 == $ref2) { # cheap numeric compare of references |
| 443 | print "refs 1 and 2 refer to the same thing\n"; |
| 444 | } |
| 445 | |
| 446 | Using a reference as a string produces both its referent's type, |
| 447 | including any package blessing as described in L<perlobj>, as well |
| 448 | as the numeric address expressed in hex. The ref() operator returns |
| 449 | just the type of thing the reference is pointing to, without the |
| 450 | address. See L<perlfunc/ref> for details and examples of its use. |
| 451 | X<reference, string context> |
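
The numeric, string, and C<ref()> views of a reference can be compared side by
side. This is only a sketch, and the address printed will differ from run to
run:

```perl
use strict;
use warnings;

my $ref1 = [1, 2, 3];
my $ref2 = $ref1;          # a second reference to the same array
my $ref3 = [1, 2, 3];      # a different array with equal contents

print "same referent\n"      if $ref1 == $ref2;
print "different referent\n" if $ref1 != $ref3;

my $type = ref($ref1);     # just the type, no address
my $text = "$ref1";        # type plus hex address
print "$type vs $text\n";
```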
| 452 | |
| 453 | The bless() operator may be used to associate the object a reference |
| 454 | points to with a package functioning as an object class. See L<perlobj>. |
| 455 | |
| 456 | A typeglob may be dereferenced the same way a reference can, because |
| 457 | the dereference syntax always indicates the type of reference desired. |
| 458 | So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable. |
| 459 | |
| 460 | Here's a trick for interpolating a subroutine call into a string: |
| 461 | |
| 462 | print "My sub returned @{[mysub(1,2,3)]} that time.\n"; |
| 463 | |
| 464 | The way it works is that when the C<@{...}> is seen in the double-quoted |
| 465 | string, it's evaluated as a block. The block creates a reference to an |
| 466 | anonymous array containing the results of the call to C<mysub(1,2,3)>. So |
| 467 | the whole block returns a reference to an array, which is then |
| 468 | dereferenced by C<@{...}> and stuck into the double-quoted string. This |
| 469 | chicanery is also useful for arbitrary expressions: |
| 470 | |
| 471 | print "That yields @{[$n + 5]} widgets\n"; |
| 472 | |
| 473 | Similarly, an expression that returns a reference to a scalar can be |
| 474 | dereferenced via C<${...}>. Thus, the above expression may be written |
| 475 | as: |
| 476 | |
| 477 | print "That yields ${\($n + 5)} widgets\n"; |
| 478 | |
| 479 | =head2 Circular References |
| 480 | X<circular reference> X<reference, circular> |
| 481 | |
| 482 | It is possible to create a "circular reference" in Perl, which can lead |
| 483 | to memory leaks. A circular reference occurs when two data structures |
| 484 | each hold a reference to the other, like this: |
| 485 | |
| 486 | my $foo = {}; |
| 487 | my $bar = { foo => $foo }; |
| 488 | $foo->{bar} = $bar; |
| 489 | |
| 490 | You can also create a circular reference with a single variable: |
| 491 | |
| 492 | my $foo; |
| 493 | $foo = \$foo; |
| 494 | |
| 495 | In this case, the reference count for the variables will never reach 0, |
| 496 | and the references will never be garbage-collected. This can lead to |
| 497 | memory leaks. |
| 498 | |
| 499 | Because objects in Perl are implemented as references, it's possible to |
| 500 | have circular references with objects as well. Imagine a TreeNode class |
| 501 | where each node references its parent and child nodes. Any node with a |
| 502 | parent will be part of a circular reference. |
| 503 | |
| 504 | You can break circular references by creating a "weak reference". A |
| 505 | weak reference does not increment the reference count for a variable, |
| 506 | which means that the object can go out of scope and be destroyed. You |
| 507 | can weaken a reference with the C<weaken> function exported by the |
| 508 | L<Scalar::Util> module. |
| 509 | |
| 510 | Here's how we can make the first example safer: |
| 511 | |
| 512 | use Scalar::Util 'weaken'; |
| 513 | |
| 514 | my $foo = {}; |
| 515 | my $bar = { foo => $foo }; |
| 516 | $foo->{bar} = $bar; |
| 517 | |
| 518 | weaken $foo->{bar}; |
| 519 | |
| 520 | The reference from C<$foo> to C<$bar> has been weakened. When the |
| 521 | C<$bar> variable goes out of scope, it will be garbage-collected. The |
| 522 | next time you look at the value of the C<< $foo->{bar} >> key, it will |
| 523 | be C<undef>. |
| 524 | |
| 525 | This action at a distance can be confusing, so you should be careful |
| 526 | with your use of weaken. You should weaken the reference in the |
| 527 | variable that will go out of scope I<first>. That way, the longer-lived |
| 528 | variable will contain the expected reference until it goes out of |
| 529 | scope. |
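
The effect can be observed directly. The sketch below puts C<$bar> in an inner
block so that it goes out of scope first, and uses C<isweak> (also from
L<Scalar::Util>) to confirm which link was weakened:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $foo = {};
{
    my $bar = { foo => $foo };
    $foo->{bar} = $bar;
    weaken $foo->{bar};            # $foo's link to $bar no longer counts

    print "weak link in place\n" if isweak($foo->{bar});
}   # $bar goes out of scope here, so its hash is freed

print "link is now undef\n" unless defined $foo->{bar};
```

Note that the C<bar> key itself still exists in C<%$foo>; only its value has
become C<undef>.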
| 530 | |
| 531 | =head2 Symbolic references |
| 532 | X<reference, symbolic> X<reference, soft> |
| 533 | X<symbolic reference> X<soft reference> |
| 534 | |
| 535 | We said that references spring into existence as necessary if they are |
| 536 | undefined, but we didn't say what happens if a value used as a |
| 537 | reference is already defined, but I<isn't> a hard reference. If you |
| 538 | use it as a reference, it'll be treated as a symbolic |
| 539 | reference. That is, the value of the scalar is taken to be the I<name> |
| 540 | of a variable, rather than a direct link to a (possibly) anonymous |
| 541 | value. |
| 542 | |
| 543 | People frequently expect it to work like this. So it does. |
| 544 | |
| 545 | $name = "foo"; |
| 546 | $$name = 1; # Sets $foo |
| 547 | ${$name} = 2; # Sets $foo |
| 548 | ${$name x 2} = 3; # Sets $foofoo |
| 549 | $name->[0] = 4; # Sets $foo[0] |
| 550 | @$name = (); # Clears @foo |
| 551 | &$name(); # Calls &foo() |
| 552 | $pack = "THAT"; |
| 553 | ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval |
| 554 | |
| 555 | This is powerful, and slightly dangerous, in that it's possible |
| 556 | to intend (with the utmost sincerity) to use a hard reference, and |
| 557 | accidentally use a symbolic reference instead. To protect against |
| 558 | that, you can say |
| 559 | |
| 560 | use strict 'refs'; |
| 561 | |
| 562 | and then only hard references will be allowed for the rest of the enclosing |
| 563 | block. An inner block may countermand that with |
| 564 | |
| 565 | no strict 'refs'; |
| 566 | |
| 567 | Only package variables (globals, even if localized) are visible to |
| 568 | symbolic references. Lexical variables (declared with my()) aren't in |
| 569 | a symbol table, and thus are invisible to this mechanism. For example: |
| 570 | |
| 571 | local $value = 10; |
| 572 | $ref = "value"; |
| 573 | { |
| 574 | my $value = 20; |
| 575 | print $$ref; |
| 576 | } |
| 577 | |
| 578 | This will still print 10, not 20. Remember that local() affects package |
| 579 | variables, which are all "global" to the package. |
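
As a sketch of how C<strict 'refs'> changes the outcome, the following
performs the same dereference twice, once with the stricture relaxed and once
with it in force (the variable names are illustrative):

```perl
use strict;
use warnings;

our $foo = "package variable";
my $name = "foo";

# With strict refs relaxed, the string is used as a variable name.
my $value = do { no strict 'refs'; ${$name} };

# With strict refs in effect, the same dereference dies at run time.
my $ok    = eval { my $v = ${$name}; 1 };
my $error = $@;

print "symbolic lookup found: $value\n";
print "strict refs refused it\n" unless $ok;
```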
| 580 | |
| 581 | =head2 Not-so-symbolic references |
| 582 | |
| 583 | Brackets around a symbolic reference can simply |
| 584 | serve to isolate an identifier or variable name from the rest of an |
| 585 | expression, just as they always have within a string. For example, |
| 586 | |
| 587 | $push = "pop on "; |
| 588 | print "${push}over"; |
| 589 | |
| 590 | has always meant to print "pop on over", even though push is |
| 591 | a reserved word. This is generalized to work the same |
| 592 | without the enclosing double quotes, so that |
| 593 | |
| 594 | print ${push} . "over"; |
| 595 | |
| 596 | and even |
| 597 | |
| 598 | print ${ push } . "over"; |
| 599 | |
| 600 | will have the same effect. This |
| 601 | construct is I<not> considered to be a symbolic reference when you're |
| 602 | using strict refs: |
| 603 | |
| 604 | use strict 'refs'; |
| 605 | ${ bareword }; # Okay, means $bareword. |
| 606 | ${ "bareword" }; # Error, symbolic reference. |
| 607 | |
| 608 | Similarly, because of all the subscripting that is done using single words, |
| 609 | the same rule applies to any bareword that is used for subscripting a hash. |
| 610 | So now, instead of writing |
| 611 | |
| 612 | $array{ "aaa" }{ "bbb" }{ "ccc" } |
| 613 | |
| 614 | you can write just |
| 615 | |
| 616 | $array{ aaa }{ bbb }{ ccc } |
| 617 | |
| 618 | and not worry about whether the subscripts are reserved words. In the |
| 619 | rare event that you do wish to do something like |
| 620 | |
| 621 | $array{ shift } |
| 622 | |
| 623 | you can force interpretation as a reserved word by adding anything that |
| 624 | makes it more than a bareword: |
| 625 | |
| 626 | $array{ shift() } |
| 627 | $array{ +shift } |
| 628 | $array{ shift @_ } |
| 629 | |
| 630 | The C<use warnings> pragma or the B<-w> switch will warn you if it |
| 631 | interprets a reserved word as a string. |
| 632 | But it will no longer warn you about using lowercase words, because the |
| 633 | string is effectively quoted. |
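
A small sketch of the difference, using a hypothetical C<lookup()> helper and
a hash that deliberately contains the key C<shift>:

```perl
use strict;
use warnings;

my %label = (
    shift => "literal key",
    first => "key taken from the argument list",
);

# lookup() is a hypothetical helper for this sketch.
sub lookup {
    my $quoted = $label{shift};     # bareword: the string "shift"
    my $called = $label{+shift};    # unary plus: calls shift on @_
    return ($quoted, $called);
}

my ($quoted, $called) = lookup("first");
print "$quoted\n$called\n";
```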
| 634 | |
| 635 | =head2 Pseudo-hashes: Using an array as a hash |
| 636 | X<pseudo-hash> X<pseudo hash> X<pseudohash> |
| 637 | |
| 638 | Pseudo-hashes have been removed from Perl. The 'fields' pragma |
| 639 | remains available. |
| 640 | |
| 641 | =head2 Function Templates |
| 642 | X<scope, lexical> X<closure> X<lexical> X<lexical scope> |
| 643 | X<subroutine, nested> X<sub, nested> X<subroutine, local> X<sub, local> |
| 644 | |
| 645 | As explained above, an anonymous function with access to the lexical |
| 646 | variables visible when that function was compiled, creates a closure. It |
| 647 | retains access to those variables even though it doesn't get run until |
| 648 | later, such as in a signal handler or a Tk callback. |
| 649 | |
| 650 | Using a closure as a function template allows us to generate many functions |
| 651 | that act similarly. Suppose you wanted functions named after the colors |
| 652 | that generated HTML font changes for the various colors: |
| 653 | |
| 654 | print "Be ", red("careful"), " with that ", green("light"); |
| 655 | |
| 656 | The red() and green() functions would be similar. To create these, |
| 657 | we'll assign a closure to a typeglob of the name of the function we're |
| 658 | trying to build. |
| 659 | |
| 660 | @colors = qw(red blue green yellow orange purple violet); |
| 661 | for my $name (@colors) { |
| 662 | no strict 'refs'; # allow symbol table manipulation |
| 663 | *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" }; |
| 664 | } |
| 665 | |
| 666 | Now all those different functions appear to exist independently. You can |
| 667 | call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on |
| 668 | both compile time and memory use, and is less error-prone as well, since |
| 669 | syntax checks happen at compile time. It's critical that any variables in |
| 670 | the anonymous subroutine be lexicals in order to create a proper closure. |
| 671 | That's the reason for the C<my> on the loop iteration variable. |
| 672 | |
| 673 | This is one of the few places where giving a prototype to a closure makes |
| 674 | much sense. If you wanted to impose scalar context on the arguments of |
| 675 | these functions (probably not a wise idea for this particular example), |
| 676 | you could have written it this way instead: |
| 677 | |
| 678 | *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" }; |
| 679 | |
| 680 | However, since prototype checking happens at compile time, the assignment |
| 681 | above happens too late to be of much use. You could address this by |
| 682 | putting the whole loop of assignments within a BEGIN block, forcing it |
| 683 | to occur during compilation. |
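A sketch of the BEGIN-block variant might look like this. The color list and the C<@words> array are assumptions for illustration; the point is only that the glob assignment runs before the later call is compiled, so the C<($)> prototype is seen in time.

```perl
use strict;
use warnings;

BEGIN {
    for my $name (qw(red green)) {
        no strict 'refs';    # allow symbol table manipulation
        *{"main::$name"} = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

my @words = ("a", "b", "c");

# The prototype imposes scalar context on the argument,
# so red() receives the element count of @words, not its elements.
my $out = red(@words);
print "$out\n";
```

This also shows why the text calls the prototype "probably not a wise idea" here: the array is silently reduced to its length.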
| 684 | |
| 685 | Access to lexicals that change over time--like those in the C<for> loop |
| 686 | above, basically aliases to elements from the surrounding lexical scopes-- |
| 687 | only works with anonymous subs, not with named subroutines. Generally |
| 688 | speaking, named subroutines do not nest properly and should only be declared |
| 689 | in the main package scope. |
| 690 | |
| 691 | This is because named subroutines are created at compile time so their |
| 692 | lexical variables get assigned to the parent lexicals from the first |
| 693 | execution of the parent block. If a parent scope is entered a second |
| 694 | time, its lexicals are created again, while the nested subs still |
| 695 | reference the old ones. |
| 696 | |
| 697 | Anonymous subroutines get to capture each time you execute the C<sub> |
| 698 | operator, as they are created on the fly. If you are accustomed to using |
| 699 | nested subroutines in other programming languages with their own private |
| 700 | variables, you'll have to work at it a bit in Perl. The intuitive coding |
| 701 | of this type of thing incurs mysterious warnings about "will not stay |
| 702 | shared" due to the reasons explained above. |
| 703 | For example, this won't work: |
| 704 | |
| 705 | sub outer { |
| 706 | my $x = $_[0] + 35; |
| 707 | sub inner { return $x * 19 } # WRONG |
| 708 | return $x + inner(); |
| 709 | } |
| 710 | |
| 711 | A work-around is the following: |
| 712 | |
| 713 | sub outer { |
| 714 | my $x = $_[0] + 35; |
| 715 | local *inner = sub { return $x * 19 }; |
| 716 | return $x + inner(); |
| 717 | } |
| 718 | |
| 719 | Now inner() can only be called from within outer(), because of the |
| 720 | temporary assignment of the anonymous subroutine. But when it is called, |
| 721 | it has normal access to the lexical variable $x from the scope of |
| 722 | outer() at the time outer() is invoked. |
| 723 | |
| 724 | This has the interesting effect of creating a function local to another |
| 725 | function, something not normally supported in Perl. |
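The following sketch exercises the work-around twice to illustrate that each call of outer() installs a fresh closure over the current C<$x>, and that the temporary inner() is gone once outer() returns. The argument values are arbitrary.

```perl
use strict;
use warnings;

sub outer {
    my $x = $_[0] + 35;
    # Install the closure only for the dynamic extent of this call.
    local *inner = sub { return $x * 19 };
    return $x + inner();
}

my $first  = outer(1);    # $x is 36: 36 + 36 * 19
my $second = outer(2);    # $x is 37: 37 + 37 * 19

print "$first $second\n";
print defined(&inner) ? "inner still defined\n" : "inner is gone\n";
```

Where the named call syntax is not needed, storing the closure in a lexical (C<< my $inner = sub { ... } >>) and calling C<< $inner->() >> is a simpler alternative.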
| 726 | |
| 727 | =head1 WARNING |
| 728 | X<reference, string context> X<reference, use as hash key> |
| 729 | |
| 730 | You may not (usefully) use a reference as the key to a hash. It will be |
| 731 | converted into a string: |
| 732 | |
| 733 | $x{ \$a } = $a; |
| 734 | |
| 735 | If you try to dereference the key, it won't do a hard dereference, and |
| 736 | you won't accomplish what you're attempting. You might want to do something |
| 737 | more like |
| 738 | |
| 739 | $r = \@a; |
| 740 | $x{ $r } = $r; |
| 741 | |
| 742 | And then at least you can use the values(), which will be |
| 743 | real refs, instead of the keys(), which won't. |
| 744 | |
| 745 | The standard Tie::RefHash module provides a convenient workaround for this. |
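A short sketch contrasting the two behaviors, with hypothetical variable names: a plain hash stringifies the reference used as a key, while a hash tied to Tie::RefHash hands back a usable reference from keys().

```perl
use strict;
use warnings;
use Tie::RefHash;

my @list = (1, 2, 3);
my $r    = \@list;

# Plain hash: the key is only the string form of the reference.
my %plain = ($r => 'payload');
my ($key) = keys %plain;
print ref($key) ? "plain key is a ref\n" : "plain key is a string\n";

# Tied hash: keys() returns the original reference.
tie my %by_ref, 'Tie::RefHash';
$by_ref{$r} = 'payload';
my ($real) = keys %by_ref;
print "elements via tied key: ", scalar(@$real), "\n";
```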
| 746 | |
| 747 | =head1 Postfix Dereference Syntax |
| 748 | |
| 749 | Beginning in v5.20.0, a postfix syntax for using references is |
| 750 | available. It behaves as described in L</Using References>, but instead |
| 751 | of a prefixed sigil, a postfixed sigil-and-star is used. |
| 752 | |
| 753 | For example: |
| 754 | |
| 755 | $r = \@a; |
| 756 | @b = $r->@*; # equivalent to @$r or @{ $r } |
| 757 | |
| 758 | $r = [ 1, [ 2, 3 ], 4 ]; |
| 759 | $r->[1]->@*; # equivalent to @{ $r->[1] } |
| 760 | |
| 761 | In v5.20 and v5.22 this syntax must be enabled with C<use feature 'postderef'>; as of v5.24 it is always available. |
| 762 | |
| 763 | Postfix dereference should work in all circumstances where block |
| 764 | (circumfix) dereference worked, and should be entirely equivalent. This |
| 765 | syntax allows dereferencing to be written and read entirely |
| 766 | left-to-right. The following equivalencies are defined: |
| 767 | |
| 768 | $sref->$*; # same as ${ $sref } |
| 769 | $aref->@*; # same as @{ $aref } |
| 770 | $aref->$#*; # same as $#{ $aref } |
| 771 | $href->%*; # same as %{ $href } |
| 772 | $cref->&*; # same as &{ $cref } |
| 773 | $gref->**; # same as *{ $gref } |
| 774 | |
| 775 | Note especially that C<< $cref->&* >> is I<not> equivalent to C<< |
| 776 | $cref->() >>, and can serve different purposes. |
| 777 | |
| 778 | Glob elements can be extracted through the postfix dereferencing feature: |
| 779 | |
| 780 | $gref->*{SCALAR}; # same as *{ $gref }{SCALAR} |
| 781 | |
| 782 | Postfix array and scalar dereferencing I<can> be used in interpolating |
| 783 | strings (double quotes or the C<qq> operator), but only if the |
| 784 | additional C<postderef_qq> feature is enabled. |
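As an illustrative sketch (the data and variable names are made up), the forms above can be combined as follows. The C<use feature> line is harmless on newer perls; on v5.20 and v5.22 the postfix forms additionally emit "experimental" warnings under C<use warnings>.

```perl
use strict;
use warnings;
use feature qw(postderef postderef_qq);

my $r = [ 1, [ 2, 3 ], 4 ];
my $h = { b => 2, a => 1 };

my @inner = $r->[1]->@*;            # same as @{ $r->[1] }
my $last  = $r->$#*;                # same as $#{ $r }
my @keys  = sort keys $h->%*;       # same as keys %{ $h }

# Interpolation needs the postderef_qq feature.
my $str = "inner: $r->[1]->@*";

print "@inner | $last | @keys | $str\n";
```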
| 785 | |
| 786 | =head2 Postfix Reference Slicing |
| 787 | |
| 788 | Value slices of arrays and hashes may also be taken with postfix |
| 789 | dereferencing notation, with the following equivalencies: |
| 790 | |
| 791 | $aref->@[ ... ]; # same as @$aref[ ... ] |
| 792 | $href->@{ ... }; # same as @$href{ ... } |
| 793 | |
| 794 | Postfix key/value pair slicing, added in 5.20.0 and documented in |
| 795 | L<the KeyE<sol>Value Hash Slices section of perldata|perldata/"Key/Value Hash |
| 796 | Slices">, also behaves as expected: |
| 797 | |
| 798 | $aref->%[ ... ]; # same as %$aref[ ... ] |
| 799 | $href->%{ ... }; # same as %$href{ ... } |
| 800 | |
| 801 | As with postfix array, postfix value slice dereferencing I<can> be used |
| 802 | in interpolating strings (double quotes or the C<qq> operator), but only |
| 803 | if the additional C<postderef_qq> L<feature> is enabled. |
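A small sketch of the four slice forms, using made-up data; the comments show the equivalent circumfix spelling.

```perl
use strict;
use warnings;
use feature 'postderef';

my $aref = [ qw(a b c d) ];
my $href = { x => 1, y => 2, z => 3 };

my @elems = $aref->@[ 1, 2 ];        # same as @$aref[ 1, 2 ]
my @vals  = $href->@{ qw(x z) };     # same as @$href{ qw(x z) }

# Key/value slices return index/value or key/value pairs.
my %by_index = $aref->%[ 0, 1 ];     # same as %$aref[ 0, 1 ]
my %subset   = $href->%{ qw(x z) };  # same as %$href{ qw(x z) }

print "@elems | @vals\n";
print join(",", map { "$_=$subset{$_}" } sort keys %subset), "\n";
```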
| 804 | |
| 805 | =head1 Assigning to References |
| 806 | |
| 807 | Beginning in v5.22.0, the referencing operator can be assigned to. It |
| 808 | performs an aliasing operation, so that the variable name referenced on the |
| 809 | left-hand side becomes an alias for the thing referenced on the right-hand |
| 810 | side: |
| 811 | |
| 812 | \$a = \$b; # $a and $b now point to the same scalar |
| 813 | \&foo = \&bar; # foo() now means bar() |
| 814 | |
| 815 | This syntax must be enabled with C<use feature 'refaliasing'>. It is |
| 816 | experimental, and will warn by default unless C<no warnings |
| 817 | 'experimental::refaliasing'> is in effect. |
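A minimal sketch of scalar aliasing with hypothetical lexical variables; it assumes a perl of v5.22 or later. After the aliasing assignment both names refer to one scalar, so a change through either name is visible through the other.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my $x = 1;
my $y = 2;

\$x = \$y;      # $x is now another name for $y's scalar
$y = 10;
print "$x\n";   # reads the shared scalar

$x++;           # modifies the shared scalar through the alias
print "$y\n";
```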
| 818 | |
| 819 | These forms may be assigned to, and cause the right-hand side to be |
| 820 | evaluated in scalar context: |
| 821 | |
| 822 | \$scalar |
| 823 | \@array |
| 824 | \%hash |
| 825 | \&sub |
| 826 | \my $scalar |
| 827 | \my @array |
| 828 | \my %hash |
| 829 | \state $scalar # or @array, etc. |
| 830 | \our $scalar # etc. |
| 831 | \local $scalar # etc. |
| 832 | \local our $scalar # etc. |
| 833 | \$some_array[$index] |
| 834 | \$some_hash{$key} |
| 835 | \local $some_array[$index] |
| 836 | \local $some_hash{$key} |
| 837 | condition ? \$this : \$that[0] # etc. |
| 838 | |
| 839 | Slicing operations and parentheses cause |
| 840 | the right-hand side to be evaluated in |
| 841 | list context: |
| 842 | |
| 843 | \@array[5..7] |
| 844 | (\@array[5..7]) |
| 845 | \(@array[5..7]) |
| 846 | \@hash{'foo','bar'} |
| 847 | (\@hash{'foo','bar'}) |
| 848 | \(@hash{'foo','bar'}) |
| 849 | (\$scalar) |
| 850 | \($scalar) |
| 851 | \(my $scalar) |
| 852 | \my($scalar) |
| 853 | (\@array) |
| 854 | (\%hash) |
| 855 | (\&sub) |
| 856 | \(&sub) |
| 857 | \($foo, @bar, %baz) |
| 858 | (\$foo, \@bar, \%baz) |
| 859 | |
| 860 | Each element on the right-hand side must be a reference to a datum of the |
| 861 | right type. Parentheses immediately surrounding an array (and possibly |
| 862 | also C<my>/C<state>/C<our>/C<local>) will make each element of the array an |
| 863 | alias to the corresponding scalar referenced on the right-hand side: |
| 864 | |
| 865 | \(@a) = \(@b); # @a and @b now have the same elements |
| 866 | \my(@a) = \(@b); # likewise |
| 867 | \(my @a) = \(@b); # likewise |
| 868 | push @a, 3; # but now @a has an extra element that @b lacks |
| 869 | \(@a) = (\$a, \$b, \$c); # @a now contains $a, $b, and $c |
| 870 | |
| 871 | Combining that form with C<local> and putting parentheses immediately |
| 872 | around a hash are forbidden (because it is not clear what they should do): |
| 873 | |
| 874 | \local(@array) = foo(); # WRONG |
| 875 | \(%hash) = bar(); # WRONG |
| 876 | |
| 877 | Assignment to references and non-references may be combined in lists and |
| 878 | conditional ternary expressions, as long as the values on the right-hand |
| 879 | side are the right type for each element on the left, though this may make |
| 880 | for obfuscated code: |
| 881 | |
| 882 | (my $tom, \my $dick, \my @harry) = (\1, \2, [1..3]); |
| 883 | # $tom is now \1 |
| 884 | # $dick is now 2 (read-only) |
| 885 | # @harry is (1,2,3) |
| 886 | |
| 887 | my $type = ref $thingy; |
| 888 | ($type ? $type eq 'ARRAY' ? \@foo : \$bar : $baz) = $thingy; |
| 889 | |
| 890 | The C<foreach> loop can also take a reference constructor for its loop |
| 891 | variable, though the syntax is limited to one of the following, with an |
| 892 | optional C<my>, C<state>, or C<our> after the backslash: |
| 893 | |
| 894 | \$s |
| 895 | \@a |
| 896 | \%h |
| 897 | \&c |
| 898 | |
| 899 | No parentheses are permitted. This feature is particularly useful for |
| 900 | arrays-of-arrays, or arrays-of-hashes: |
| 901 | |
| 902 | foreach \my @a (@array_of_arrays) { |
| 903 | frobnicate($a[0], $a[-1]); |
| 904 | } |
| 905 | |
| 906 | foreach \my %h (@array_of_hashes) { |
| 907 | $h{gelastic}++ if $h{type} eq 'funny'; |
| 908 | } |
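A runnable sketch of the array-of-arrays case, with made-up data and a simple sum standing in for frobnicate(). Because C<@row> is an alias rather than a copy, changes made to it inside the loop reach the original inner arrays.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my @array_of_arrays = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

my $sum = 0;
foreach \my @row (@array_of_arrays) {
    $sum += $row[0] + $row[-1];   # first and last element of each row
    push @row, 0;                 # modifies the original inner array
}

print "$sum\n";
print scalar(@{ $array_of_arrays[0] }), "\n";
```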
| 909 | |
| 910 | B<CAVEAT:> Aliasing does not work correctly with closures. If you try to |
| 911 | alias lexical variables from an inner subroutine or C<eval>, the aliasing |
| 912 | will only be visible within that inner sub, and will not affect the outer |
| 913 | subroutine where the variables are declared. This bizarre behavior is |
| 914 | subject to change. |
| 915 | |
| 916 | =head1 SEE ALSO |
| 917 | |
| 918 | Besides the obvious documents, source code can be instructive. |
| 919 | Some pathological examples of the use of references can be found |
| 920 | in the F<t/op/ref.t> regression test in the Perl source directory. |
| 921 | |
| 922 | See also L<perldsc> and L<perllol> for how to use references to create |
| 923 | complex data structures, and L<perlootut> and L<perlobj> |
| 924 | for how to use them to create objects. |