=head1 NAME

perlref - Perl references and nested data structures

=head1 NOTE

This is complete documentation about all aspects of references.
For a shorter, tutorial introduction to just the essential features,
see L<perlreftut>.

=head1 DESCRIPTION

Before release 5 of Perl it was difficult to represent complex data
structures, because all references had to be symbolic--and even then
it was difficult to refer to a variable instead of a symbol table entry.
Perl now not only makes it easier to use symbolic references to variables,
but also lets you have "hard" references to any piece of data or code.
Any scalar may hold a hard reference. Because arrays and hashes contain
scalars, you can now easily build arrays of arrays, arrays of hashes,
hashes of arrays, arrays of hashes of functions, and so on.

Hard references are smart--they keep track of reference counts for you,
automatically freeing the thing referred to when its reference count goes
to zero. (Reference counts for values in self-referential or
cyclic data structures may not go to zero without a little help; see
L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.)
If that thing happens to be an object, the object is destroyed. See
L<perlobj> for more about objects. (In a sense, everything in Perl is an
object, but we usually reserve the word for references to objects that
have been officially "blessed" into a class package.)

Symbolic references are names of variables or other objects, just as a
symbolic link in a Unix filesystem contains merely the name of a file.
The C<*glob> notation is something of a symbolic reference. (Symbolic
references are sometimes called "soft references", but please don't call
them that; references are confusing enough without useless synonyms.)

In contrast, hard references are more like hard links in a Unix file
system: They are used to access an underlying object without concern for
what its (other) name is. When the word "reference" is used without an
adjective, as in the following paragraph, it is usually talking about a
hard reference.

References are easy to use in Perl. There is just one overriding
principle: Perl does no implicit referencing or dereferencing. When a
scalar is holding a reference, it always behaves as a simple scalar. It
doesn't magically start being an array or hash or subroutine; you have to
tell it explicitly to do so, by dereferencing it.
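
Here is a minimal sketch of that principle (the variable names are
illustrative). A scalar holding an array reference interpolates as an
ordinary string until it is explicitly dereferenced:

```perl
    use strict;
    use warnings;

    my @colors = ('red', 'green');
    my $ref    = \@colors;     # $ref is an ordinary scalar holding a reference

    my $as_string = "$ref";    # no implicit dereference: a string like "ARRAY(0x...)"
    my $count     = @$ref;     # explicit dereference, array in scalar context
    my $first     = $ref->[0]; # explicit dereference of a single element

    print "$as_string has $count elements; the first is $first\n";
```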

=head2 Making References

References can be created in several ways.

=over 4

=item 1.

By using the backslash operator on a variable, subroutine, or value.
(This works much like the & (address-of) operator in C.)
This typically creates I<another> reference to a variable, because
there's already a reference to the variable in the symbol table. But
the symbol table reference might go away, and you'll still have the
reference that the backslash returned. Here are some examples:

    $scalarref = \$foo;
    $arrayref  = \@ARGV;
    $hashref   = \%ENV;
    $coderef   = \&handler;
    $globref   = \*foo;

It isn't possible to create a true reference to an IO handle (filehandle
or dirhandle) using the backslash operator. The most you can get is a
reference to a typeglob, which is actually a complete symbol table entry.
But see the explanation of the C<*foo{THING}> syntax below. However,
you can still use typeglobs and globrefs as though they were IO handles.

=item 2.

A reference to an anonymous array can be created using square
brackets:

    $arrayref = [1, 2, ['a', 'b', 'c']];

Here we've created a reference to an anonymous array of three elements
whose final element is itself a reference to another anonymous array of three
elements. (The multidimensional syntax described later can be used to
access this. For example, after the above, C<< $arrayref->[2][1] >> would have
the value "b".)

Taking a reference to an enumerated list is not the same
as using square brackets--instead it's the same as creating
a list of references!

    @list = (\$a, \@b, \%c);
    @list = \($a, @b, %c);      # same thing!

As a special case, C<\(@foo)> returns a list of references to the contents
of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>,
except that the key references are to copies (since the keys are just
strings rather than full-fledged scalars).

=item 3.

A reference to an anonymous hash can be created using curly
brackets:

    $hashref = {
        'Adam'  => 'Eve',
        'Clyde' => 'Bonnie',
    };

Anonymous hash and array composers like these can be intermixed freely to
produce as complicated a structure as you want. The multidimensional
syntax described below works for these too. The values above are
literals, but variables and expressions would work just as well, because
assignment operators in Perl (even within local() or my()) are executable
statements, not compile-time declarations.

Because curly brackets (braces) are used for several other things
including BLOCKs, you may occasionally have to disambiguate braces at the
beginning of a statement by putting a C<+> or a C<return> in front so
that Perl realizes the opening brace isn't starting a BLOCK. The economy and
mnemonic value of using curlies is deemed worth this occasional extra
hassle.

For example, if you wanted a function to make a new hash and return a
reference to it, you have these options:

    sub hashem {        { @_ } }   # silently wrong
    sub hashem {       +{ @_ } }   # ok
    sub hashem { return { @_ } }   # ok

On the other hand, if you want the other meaning, you can do this:

    sub showem {        { @_ } }   # ambiguous (currently ok, but may change)
    sub showem {       {; @_ } }   # ok
    sub showem { { return @_ } }   # ok

The leading C<+{> and C<{;> always serve to disambiguate
the expression to mean either the HASH reference, or the BLOCK.

=item 4.

A reference to an anonymous subroutine can be created by using
C<sub> without a subname:

    $coderef = sub { print "Boink!\n" };

Note the semicolon. Except for the code
inside not being immediately executed, a C<sub {}> is not so much a
declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no
matter how many times you execute that particular line (unless you're in an
C<eval("...")>), $coderef will still have a reference to the I<same>
anonymous subroutine.)

Anonymous subroutines act as closures with respect to my() variables,
that is, variables lexically visible within the current scope. Closure
is a notion out of the Lisp world that says if you define an anonymous
function in a particular lexical context, it pretends to run in that
context even when it's called outside the context.

In human terms, it's a funny way of passing arguments to a subroutine when
you define it as well as when you call it. It's useful for setting up
little bits of code to run later, such as callbacks. You can even
do object-oriented stuff with it, though Perl already provides a different
mechanism to do that--see L<perlobj>.

You might also think of closure as a way to write a subroutine
template without using eval(). Here's a small example of how
closures work:

    sub newprint {
        my $x = shift;
        return sub { my $y = shift; print "$x, $y!\n"; };
    }
    $h = newprint("Howdy");
    $g = newprint("Greetings");

    # Time passes...

    &$h("world");
    &$g("earthlings");

This prints

    Howdy, world!
    Greetings, earthlings!

Note particularly that $x continues to refer to the value passed
into newprint() I<despite> "my $x" having gone out of scope by the
time the anonymous subroutine runs. That's what a closure is all
about.

This applies only to lexical variables, by the way. Dynamic variables
continue to work as they have always worked. Closure is not something
that most Perl programmers need to trouble themselves about to begin with.

=item 5.

References are often returned by special subroutines called constructors.
Perl objects are just references to a special type of object that happens to know
which package it's associated with. Constructors are just special
subroutines that know how to create that association. They do so by
starting with an ordinary reference, and it remains an ordinary reference
even while it's also being an object. Constructors are often
named new() and called indirectly:

    $objref = new Doggie (Tail => 'short', Ears => 'long');

But they don't have to be:

    $objref = Doggie->new(Tail => 'short', Ears => 'long');

    use Term::Cap;
    $terminal = Term::Cap->Tgetent( { OSPEED => 9600 });

    use Tk;
    $main = MainWindow->new();
    $menubar = $main->Frame(-relief      => "raised",
                            -borderwidth => 2);

=item 6.

References of the appropriate type can spring into existence if you
dereference them in a context that assumes they exist. Because we haven't
talked about dereferencing yet, we can't show you any examples yet.

=item 7.

A reference can be created by using a special syntax, lovingly known as
the C<*foo{THING}> syntax. C<*foo{THING}> returns a reference to the THING
slot in C<*foo> (which is the symbol table entry which holds everything
known as foo).

    $scalarref = *foo{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handler{CODE};
    $ioref     = *STDIN{IO};
    $globref   = *foo{GLOB};

All of these are self-explanatory except for C<*foo{IO}>. It returns
the IO handle, used for file handles (L<perlfunc/open>), sockets
(L<perlfunc/socket> and L<perlfunc/socketpair>), and directory
handles (L<perlfunc/opendir>). For compatibility with previous
versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it
is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn
of its use.

C<*foo{THING}> returns undef if that particular THING hasn't been used yet,
except in the case of scalars. C<*foo{SCALAR}> returns a reference to an
anonymous scalar if $foo hasn't been used yet. This might change in a
future release.

C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in
L<perldata/"Typeglobs and Filehandles"> for passing filehandles
into or out of subroutines, or storing into larger data structures.
Its disadvantage is that it won't create a new filehandle for you.
Its advantage is that you have less risk of clobbering more than
you want to with a typeglob assignment. (It still conflates file
and directory handles, though.) However, if you assign the incoming
value to a scalar instead of a typeglob as we do in the examples
below, there's no risk of that happening.

    splutter(*STDOUT);          # pass the whole glob
    splutter(*STDOUT{IO});      # pass both file and dir handles

    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(*STDIN);     # pass the whole glob
    $rec = get_rec(*STDIN{IO}); # pass both file and dir handles

    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

=back
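
The creation methods above can be exercised together in one small script.
This is only a sketch: the variable names are illustrative, and C<$foo> is
declared with C<our> so that it is a package variable that C<*foo{SCALAR}>
can find.

```perl
    use strict;
    use warnings;

    our $foo  = 'scalar value';   # package variable, so *foo{SCALAR} can see it
    my  @list = (1, 2, 3);

    my $scalarref = \$foo;                                # 1. backslash operator
    my $arrayref  = [1, 2, ['a', 'b', 'c']];              # 2. anonymous array
    my $hashref   = { Adam => 'Eve', Clyde => 'Bonnie' }; # 3. anonymous hash
    my $coderef   = sub { return "Boink!" };              # 4. anonymous subroutine
    my $globslot  = *foo{SCALAR};                         # 7. *foo{THING} syntax

    # The special case: \(@list) is a list of references to the elements.
    my @refs = \(@list);

    print join(' ', map { ref($_) } $scalarref, $arrayref, $hashref, $coderef), "\n";
    print $arrayref->[2][1], "\n";
    print scalar(@refs), " element refs\n";
```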

=head2 Using References

That's it for creating references. By now you're probably dying to
know how to use references to get back to your long-lost data. There
are several basic methods.

=over 4

=item 1.

Anywhere you'd put an identifier (or chain of identifiers) as part
of a variable or subroutine name, you can replace the identifier with
a simple scalar variable containing a reference of the correct type:

    $bar = $$scalarref;
    push(@$arrayref, $filename);
    $$arrayref[0] = "January";
    $$hashref{"KEY"} = "VALUE";
    &$coderef(1,2,3);
    print $globref "output\n";

It's important to understand that we are specifically I<not> dereferencing
C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the
scalar variable happens I<before> it does any key lookups. Anything more
complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method
1 recursively. Therefore, the following prints "howdy".

    $refrefref = \\\"howdy";
    print $$$$refrefref;

=item 2.

Anywhere you'd put an identifier (or chain of identifiers) as part of a
variable or subroutine name, you can replace the identifier with a
BLOCK returning a reference of the correct type. In other words, the
previous examples could be written like this:

    $bar = ${$scalarref};
    push(@{$arrayref}, $filename);
    ${$arrayref}[0] = "January";
    ${$hashref}{"KEY"} = "VALUE";
    &{$coderef}(1,2,3);
    $globref->print("output\n");  # iff IO::Handle is loaded

Admittedly, it's a little silly to use the curlies in this case, but
the BLOCK can contain any arbitrary expression, in particular,
subscripted expressions:

    &{ $dispatch{$index} }(1,2,3);      # call correct routine

Because of being able to omit the curlies for the simple case of C<$$x>,
people often make the mistake of viewing the dereferencing symbols as
proper operators, and wonder about their precedence. If they were,
though, you could use parentheses instead of braces. That's not the case.
Consider the difference below; case 0 is a short-hand version of case 1,
I<not> case 2:

    $$hashref{"KEY"}   = "VALUE";       # CASE 0
    ${$hashref}{"KEY"} = "VALUE";       # CASE 1
    ${$hashref{"KEY"}} = "VALUE";       # CASE 2
    ${$hashref->{"KEY"}} = "VALUE";     # CASE 3

Case 2 is also deceptive in that you're accessing a variable
called %hashref, not dereferencing through $hashref to the hash
it's presumably referencing. That would be case 3.

=item 3.

Subroutine calls and lookups of individual array elements arise often
enough that it gets cumbersome to use method 2. As a form of
syntactic sugar, the examples for method 2 may be written:

    $arrayref->[0] = "January";   # Array element
    $hashref->{"KEY"} = "VALUE";  # Hash element
    $coderef->(1,2,3);            # Subroutine call

The left side of the arrow can be any expression returning a reference,
including a previous dereference. Note that C<$array[$x]> is I<not> the
same thing as C<< $array->[$x] >> here:

    $array[$x]->{"foo"}->[0] = "January";

This is one of the cases we mentioned earlier in which references could
spring into existence when in an lvalue context. Before this
statement, C<$array[$x]> may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up
C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get
defined with an array reference so that we can look up C<[0]> in it.
This process is called I<autovivification>.

One more thing here. The arrow is optional I<between> bracket
subscripts, so you can shrink the above down to

    $array[$x]{"foo"}[0] = "January";

Which, in the degenerate case of using only ordinary arrays, gives you
multidimensional arrays just like C's:

    $score[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays, actually. C doesn't know how
to grow its arrays on demand. Perl does.

=item 4.

If a reference happens to be a reference to an object, then there are
probably methods to access the things referred to, and you should probably
stick to those methods unless you're in the class package that defines the
object's methods. In other words, be nice, and don't violate the object's
encapsulation without a very good reason. Perl does not enforce
encapsulation. We are not totalitarians here. We do expect some basic
civility though.

=back
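
A runnable sketch of the three dereferencing syntaxes and of
autovivification. The names C<%dispatch>, C<double>, and C<square> are
hypothetical, chosen only for the illustration.

```perl
    use strict;
    use warnings;

    # Method 3 with the arrows omitted between subscripts. Both the hash
    # reference at $array[$x] and the array reference inside it autovivify.
    my @array;
    my $x = 3;
    $array[$x]{"foo"}[0] = "January";

    # Methods 1, 2, and 3 all address the same hash element.
    my $hashref = {};
    $$hashref{"KEY"}   = "v1";    # method 1 (CASE 0)
    ${$hashref}{"KEY"} = "v2";    # method 2 (CASE 1)
    $hashref->{"KEY"}  = "v3";    # method 3

    # A BLOCK may contain any expression, such as a subscripted lookup.
    my %dispatch = (
        double => sub { return 2 * $_[0] },
        square => sub { return $_[0] ** 2 },
    );
    my $doubled = &{ $dispatch{double} }(5);   # method 2
    my $squared = $dispatch{square}->(7);      # method 3
```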

Using a string or number as a reference produces a symbolic reference,
as explained above. Using a reference as a number produces an
integer representing its storage location in memory. The only
useful thing to be done with this is to compare two references
numerically to see whether they refer to the same location.

    if ($ref1 == $ref2) {  # cheap numeric compare of references
        print "refs 1 and 2 refer to the same thing\n";
    }

Using a reference as a string produces both its referent's type,
including any package blessing as described in L<perlobj>, as well
as the numeric address expressed in hex. The ref() operator returns
just the type of thing the reference is pointing to, without the
address. See L<perlfunc/ref> for details and examples of its use.
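
A sketch of the numeric, string, and ref() views of a reference.
C<Doggie> is a hypothetical class name. C<refaddr()> comes from the core
Scalar::Util module rather than from the text above; for a reference
without overloading it returns the same number as numeric use.

```perl
    use strict;
    use warnings;
    use Scalar::Util qw(refaddr);

    my @data  = (1, 2);
    my $ref1  = \@data;
    my $ref2  = \@data;
    my $other = [1, 2];                  # same contents, different array

    my $same      = ($ref1 == $ref2);    # true: same storage location
    my $different = ($ref1 == $other);   # false: distinct arrays
    my $type      = ref($ref1);          # "ARRAY", no address
    my $as_string = "$ref1";             # type plus hex address

    my $obj = bless {}, 'Doggie';        # hypothetical class name
    print ref($obj), " / $obj\n";        # e.g. Doggie / Doggie=HASH(0x...)
```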

The bless() operator may be used to associate the object a reference
points to with a package functioning as an object class. See L<perlobj>.

A typeglob may be dereferenced the same way a reference can, because
the dereference syntax always indicates the type of reference desired.
So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable.

Here's a trick for interpolating a subroutine call into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

The way it works is that when the C<@{...}> is seen in the double-quoted
string, it's evaluated as a block. The block creates a reference to an
anonymous array containing the results of the call to C<mysub(1,2,3)>. So
the whole block returns a reference to an array, which is then
dereferenced by C<@{...}> and stuck into the double-quoted string. This
chicanery is also useful for arbitrary expressions:

    print "That yields @{[$n + 5]} widgets\n";
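
Both interpolation tricks in runnable form. C<mysub()> is a hypothetical
helper that multiplies each argument by ten. The elements of the anonymous
array are joined with C<$"> (a single space by default), just as an
ordinary interpolated array would be.

```perl
    use strict;
    use warnings;

    sub mysub { return map { $_ * 10 } @_ }   # hypothetical helper

    my $n = 3;
    my $line1 = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";
    my $line2 = "That yields @{[ $n + 5 ]} widgets";

    print "$line1\n$line2\n";
```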

=head2 Symbolic references

We said that references spring into existence as necessary if they are
undefined, but we didn't say what happens if a value used as a
reference is already defined, but I<isn't> a hard reference. If you
use it as a reference, it'll be treated as a symbolic
reference. That is, the value of the scalar is taken to be the I<name>
of a variable, rather than a direct link to a (possibly) anonymous
value.

People frequently expect it to work like this. So it does.

    $name = "foo";
    $$name = 1;                 # Sets $foo
    ${$name} = 2;               # Sets $foo
    ${$name x 2} = 3;           # Sets $foofoo
    $name->[0] = 4;             # Sets $foo[0]
    @$name = ();                # Clears @foo
    &$name();                   # Calls &foo() (as in Perl 4)
    $pack = "THAT";
    ${"${pack}::$name"} = 5;    # Sets $THAT::foo without eval

This is powerful, and slightly dangerous, in that it's possible
to intend (with the utmost sincerity) to use a hard reference, and
accidentally use a symbolic reference instead. To protect against
that, you can say

    use strict 'refs';

and then only hard references will be allowed for the rest of the enclosing
block. An inner block may countermand that with

    no strict 'refs';

Only package variables (globals, even if localized) are visible to
symbolic references. Lexical variables (declared with my()) aren't in
a symbol table, and thus are invisible to this mechanism. For example:

    local $value = 10;
    $ref = "value";
    {
        my $value = 20;
        print $$ref;
    }

This will still print 10, not 20. Remember that local() affects package
variables, which are all "global" to the package.
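
The sketch below (with illustrative variable names) shows
C<no strict 'refs'> permitting symbolic references within one block, the
run-time error that C<strict refs> raises for the same operation outside
it, and a lexical C<$value> being ignored by a symbolic lookup.

```perl
    use strict;
    use warnings;

    our ($foo, $foobar, $value) = (0, 0, 10);   # package variables
    my $name = "foo";

    {
        no strict 'refs';               # symbolic references allowed in this block only
        ${$name} = 2;                   # sets $main::foo
        ${"main::${name}bar"} = 3;      # sets $main::foobar
    }

    # Outside that block strict refs is back in force, so this dies at run time.
    my $ok    = eval { ${$name} = 99; 1 };
    my $error = $@;

    # A lexical $value is invisible to a symbolic lookup of "value".
    my $seen;
    {
        my $value = 20;                 # lexical, not in any symbol table
        my $ref   = "value";
        no strict 'refs';
        $seen = $$ref;                  # finds the package variable $main::value
    }
```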

=head2 Not-so-symbolic references

A new feature contributing to readability in perl version 5.001 is that the
brackets around a symbolic reference behave more like quotes, just as they
always have within a string. That is,

    $push = "pop on ";
    print "${push}over";

has always meant to print "pop on over", even though push is
a reserved word. This has been generalized to work the same outside
of quotes, so that

    print ${push} . "over";

and even

    print ${ push } . "over";

will have the same effect. (This would have been a syntax error in
Perl 5.000, though Perl 4 allowed it in the spaceless form.) This
construct is I<not> considered to be a symbolic reference when you're
using strict refs:

    use strict 'refs';
    ${ bareword };      # Okay, means $bareword.
    ${ "bareword" };    # Error, symbolic reference.

Similarly, because of all the subscripting that is done using single
words, we've applied the same rule to any bareword that is used for
subscripting a hash. So now, instead of writing

    $array{ "aaa" }{ "bbb" }{ "ccc" }

you can write just

    $array{ aaa }{ bbb }{ ccc }

and not worry about whether the subscripts are reserved words. In the
rare event that you do wish to do something like

    $array{ shift }

you can force interpretation as a reserved word by adding anything that
makes it more than a bareword:

    $array{ shift() }
    $array{ +shift }
    $array{ shift @_ }

The C<use warnings> pragma or the B<-w> switch will warn you if it
interprets a reserved word as a string.
But it will no longer warn you about using lowercase words, because the
string is effectively quoted.
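
A sketch of bareword hash subscripts, using a hypothetical hash
C<%array> and sub C<lookup()>. Inside the braces, C<shift> alone is the
literal key C<"shift">, while C<+shift> calls the function on C<@_>.

```perl
    use strict;
    use warnings;

    my %array = (
        aaa   => { bbb => { ccc => 'deep' } },
        shift => 'literal key',
    );

    my $quoted   = $array{ "aaa" }{ "bbb" }{ "ccc" };
    my $bareword = $array{ aaa }{ bbb }{ ccc };    # same lookup, no quotes needed

    sub lookup {
        my $as_string = $array{ shift };    # bareword: the key is the string "shift"
        my $as_call   = $array{ +shift };   # unary plus: calls shift() on @_
        return ($as_string, $as_call);
    }
    my ($s, $c) = lookup('aaa');
```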

=head2 Pseudo-hashes: Using an array as a hash

B<WARNING>: This section describes an experimental feature. Details may
change without notice in future versions.

B<NOTE>: The current user-visible implementation of pseudo-hashes
(the weird use of the first array element) is deprecated starting from
Perl 5.8.0 and will be removed in Perl 5.10.0, and the feature will be
implemented differently. Not only is the current interface rather ugly,
but the current implementation slows down normal array and hash use quite
noticeably. The 'fields' pragma interface will remain available.

Beginning with release 5.005 of Perl, you may use an array reference
in some contexts that would normally require a hash reference. This
allows you to access array elements using symbolic names, as if they
were fields in a structure.

For this to work, the array must contain extra information. The first
element of the array has to be a hash reference that maps field names
to array indices. Here is an example:

    $struct = [{foo => 1, bar => 2}, "FOO", "BAR"];

    $struct->{foo};  # same as $struct->[1], i.e. "FOO"
    $struct->{bar};  # same as $struct->[2], i.e. "BAR"

    keys %$struct;   # will return ("foo", "bar") in some order
    values %$struct; # will return ("FOO", "BAR") in the same order

    while (my($k,$v) = each %$struct) {
        print "$k => $v\n";
    }

Perl will raise an exception if you try to access nonexistent fields.
To avoid inconsistencies, always use the fields::phash() function
provided by the C<fields> pragma.

    use fields;
    $pseudohash = fields::phash(foo => "FOO", bar => "BAR");

For better performance, Perl can also do the translation from field
names to array indices at compile time for typed object references.
See L<fields>.

There are two ways to check for the existence of a key in a
pseudo-hash. The first is to use exists(). This checks to see if the
given field has ever been set. It acts this way to match the behavior
of a regular hash. For instance:

    use fields;
    $phash = fields::phash([qw(foo bar pants)], ['FOO']);
    $phash->{pants} = undef;

    print exists $phash->{foo};    # true, 'foo' was set in the declaration
    print exists $phash->{bar};    # false, 'bar' has not been used.
    print exists $phash->{pants};  # true, your 'pants' have been touched

The second is to use exists() on the hash reference sitting in the
first array element. This checks to see if the given key is a valid
field in the pseudo-hash.

    print exists $phash->[0]{bar};   # true, 'bar' is a valid field
    print exists $phash->[0]{shoes}; # false, 'shoes' can't be used

delete() on a pseudo-hash element only deletes the value corresponding
to the key, not the key itself. To delete the key, you'll have to
explicitly delete it from the first hash element.

    print delete $phash->{foo};     # prints $phash->[1], "FOO"
    print exists $phash->{foo};     # false
    print exists $phash->[0]{foo};  # true, key still exists
    print delete $phash->[0]{foo};  # now key is gone
    print $phash->{foo};            # runtime exception

=head2 Function Templates

As explained above, a closure is an anonymous function with access to the
lexical variables visible when that function was compiled. It retains
access to those variables even though it doesn't get run until later,
such as in a signal handler or a Tk callback.

Using a closure as a function template allows us to generate many functions
that act similarly. Suppose you wanted functions named after the colors
that generated HTML font changes for the various colors:

    print "Be ", red("careful"), " with that ", green("light");

The red() and green() functions would be similar. To create these,
we'll assign a closure to a typeglob of the name of the function we're
trying to build.

    @colors = qw(red blue green yellow orange purple violet);
    for my $name (@colors) {
        no strict 'refs';       # allow symbol table manipulation
        *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
    }

Now all those different functions appear to exist independently. You can
call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on
both compile time and memory use, and is less error-prone as well, since
syntax checks happen at compile time. It's critical that any variables in
the anonymous subroutine be lexicals in order to create a proper closure.
That's the reason for the C<my> on the loop iteration variable.
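
A runnable cut-down version of the loop above, limited to three colors. It
also checks that the lowercase and uppercase names end up calling the same
subroutine, since the chained assignment aliases C<*red> to C<*RED>.

```perl
    use strict;
    use warnings;

    my @colors = qw(red blue green);
    for my $name (@colors) {
        no strict 'refs';    # allow symbol table manipulation
        # The closure is assigned to *RED, then *red is aliased to *RED,
        # so both names call the same subroutine.
        *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
    }

    my $html = join '', "Be ", red("careful"), " with that ", GREEN("light");
    print "$html\n";
```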

This is one of the only places where giving a prototype to a closure makes
much sense. If you wanted to impose scalar context on the arguments of
these functions (probably not a wise idea for this particular example),
you could have written it this way instead:

    *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };

However, since prototype checking happens at compile time, the assignment
above happens too late to be of much use. You could address this by
putting the whole loop of assignments within a BEGIN block, forcing it
to occur during compilation.
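
A sketch of the BEGIN-block approach just described. Because the loop
runs during compilation, the C<($)> prototype is already known when the
later call C<red @words> is compiled, so C<@words> is evaluated in scalar
context and contributes its length.

```perl
    use strict;
    use warnings;

    BEGIN {
        for my $name (qw(red green)) {
            no strict 'refs';    # allow symbol table manipulation
            # ($) forces a single argument evaluated in scalar context.
            *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
        }
    }

    my @words = ('a', 'b', 'c');
    my $tag = red @words;    # parsed as red(scalar @words), i.e. red(3)
    print "$tag\n";
```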

Access to lexicals that change over time (like those in the C<for> loop
above) only works with closures, not general subroutines. In the general
case, then, named subroutines do not nest properly, although anonymous
ones do. If you are accustomed to using nested subroutines in other
programming languages with their own private variables, you'll have to
work at it a bit in Perl. The intuitive coding of this type of thing
incurs mysterious warnings about "will not stay shared". For example,
this won't work:

    sub outer {
        my $x = $_[0] + 35;
        sub inner { return $x * 19 }   # WRONG
        return $x + inner();
    }

A work-around is the following:

    sub outer {
        my $x = $_[0] + 35;
        local *inner = sub { return $x * 19 };
        return $x + inner();
    }

Now inner() can only be called from within outer(), because of the
temporary assignment of the closure (anonymous subroutine) to C<*inner>.
But when it is called, it has normal access to the lexical variable $x
from the scope of outer().

This has the interesting effect of creating a function local to another
function, something not normally supported in Perl.
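
The work-around above in runnable form. The second function,
C<outer_lexical()>, is an alternative not taken from the text: it keeps
the closure in a lexical variable instead of localizing a typeglob, so no
symbol table entry is touched.

```perl
    use strict;
    use warnings;

    # The work-around from the text: inner() exists only while outer() runs.
    sub outer {
        my $x = $_[0] + 35;
        local *inner = sub { return $x * 19 };
        return $x + inner();
    }

    # An alternative (not from the text): keep the closure in a lexical
    # variable, so the symbol table is never modified.
    sub outer_lexical {
        my $x = $_[0] + 35;
        my $inner = sub { return $x * 19 };
        return $x + $inner->();
    }

    print outer(1), " ", outer_lexical(1), "\n";
```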

=head1 WARNING

You may not (usefully) use a reference as the key to a hash. It will be
converted into a string:

    $x{ \$a } = $a;

If you try to dereference the key, it won't do a hard dereference, and
you won't accomplish what you're attempting. You might want to do something
more like

    $r = \@a;
    $x{ $r } = $r;

And then at least you can use the values(), which will be
real refs, instead of the keys(), which won't.

The standard Tie::RefHash module provides a convenient workaround to this.
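
A short sketch contrasting a plain hash with one tied to the core
Tie::RefHash module (the variable names are illustrative). The plain
hash's key comes back as a string such as C<ARRAY(0x...)>, while the tied
hash returns the original reference from keys().

```perl
    use strict;
    use warnings;
    use Tie::RefHash;

    my @a = (1, 2, 3);
    my $r = \@a;

    my %plain;
    $plain{$r} = $r;                  # key is stringified; the value stays a real ref
    my ($plain_key) = keys %plain;

    tie my %refhash, 'Tie::RefHash';  # keys keep their reference-ness
    $refhash{$r} = 'payload';
    my ($ref_key) = keys %refhash;

    printf "plain key: %s, tied key type: %s\n", $plain_key, ref($ref_key);
```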

=head1 SEE ALSO

Besides the obvious documents, source code can be instructive.
Some pathological examples of the use of references can be found
in the F<t/op/ref.t> regression test in the Perl source directory.

See also L<perldsc> and L<perllol> for how to use references to create
complex data structures, and L<perltoot>, L<perlobj>, and L<perlbot>
for how to use them to create objects.