Commit | Line | Data |
---|---|---|
a0d0e21e | 1 | =head1 NAME |
d74e8afc | 2 | X<reference> X<pointer> X<data structure> X<structure> X<struct> |
a0d0e21e LW |
3 | |
4 | perlref - Perl references and nested data structures | |
5 | ||
a1e2a320 GS |
6 | =head1 NOTE |
7 | ||
8 | This is complete documentation about all aspects of references. | |
9 | For a shorter, tutorial introduction to just the essential features, | |
10 | see L<perlreftut>. | |
11 | ||
a0d0e21e LW |
12 | =head1 DESCRIPTION |
13 | ||
cb1a09d0 | 14 | Before release 5 of Perl it was difficult to represent complex data |
5a964f20 TC |
15 | structures, because all references had to be symbolic--and even then |
16 | it was difficult to refer to a variable instead of a symbol table entry. | |
17 | Perl now not only makes it easier to use symbolic references to variables, | |
18 | but also lets you have "hard" references to any piece of data or code. | |
19 | Any scalar may hold a hard reference. Because arrays and hashes contain | |
20 | scalars, you can now easily build arrays of arrays, arrays of hashes, | |
21 | hashes of arrays, arrays of hashes of functions, and so on. | |
a0d0e21e LW |
22 | |
23 | Hard references are smart--they keep track of reference counts for you, | |
2d24ed35 | 24 | automatically freeing the thing referred to when its reference count goes |
7c2ea1c7 | 25 | to zero. (Reference counts for values in self-referential or |
2d24ed35 | 26 | cyclic data structures may not go to zero without a little help; see |
2b4f771d | 27 | L</"Circular References"> for a detailed explanation.) |
2d24ed35 CS |
28 | If that thing happens to be an object, the object is destructed. See |
29 | L<perlobj> for more about objects. (In a sense, everything in Perl is an | |
30 | object, but we usually reserve the word for references to objects that | |
31 | have been officially "blessed" into a class package.) | |
32 | ||
33 | Symbolic references are names of variables or other objects, just as a | |
54310121 | 34 | symbolic link in a Unix filesystem contains merely the name of a file. |
d1be9408 | 35 | The C<*glob> notation is something of a symbolic reference. (Symbolic |
2d24ed35 CS |
36 | references are sometimes called "soft references", but please don't call |
37 | them that; references are confusing enough without useless synonyms.) | |
d74e8afc ITB |
38 | X<reference, symbolic> X<reference, soft> |
39 | X<symbolic reference> X<soft reference> | |
2d24ed35 | 40 | |
54310121 | 41 | In contrast, hard references are more like hard links in a Unix file |
2d24ed35 CS |
42 | system: They are used to access an underlying object without concern for |
43 | what its (other) name is. When the word "reference" is used without an | |
5a964f20 | 44 | adjective, as in the following paragraph, it is usually talking about a |
2d24ed35 | 45 | hard reference. |
d74e8afc | 46 | X<reference, hard> X<hard reference> |
2d24ed35 CS |
47 | |
48 | References are easy to use in Perl. There is just one overriding | |
903c0e71 PM |
49 | principle: in general, Perl does no implicit referencing or dereferencing. |
50 | When a scalar is holding a reference, it always behaves as a simple scalar. | |
51 | It doesn't magically start being an array or hash or subroutine; you have to | |
52 | tell it explicitly to do so, by dereferencing it. | |
53 | ||
5a964f20 | 54 | =head2 Making References |
d74e8afc | 55 | X<reference, creation> X<referencing> |
5a964f20 TC |
56 | |
57 | References can be created in several ways. | |
a0d0e21e | 58 | |
3e3ae962 | 59 | =head3 Backslash Operator |
d74e8afc | 60 | X<\> X<backslash> |
a0d0e21e LW |
61 | |
62 | By using the backslash operator on a variable, subroutine, or value. | |
d962e436 | 63 | (This works much like the & (address-of) operator in C.) |
7c2ea1c7 | 64 | This typically creates I<another> reference to a variable, because |
a0d0e21e LW |
65 | there's already a reference to the variable in the symbol table. But |
66 | the symbol table reference might go away, and you'll still have the | |
67 | reference that the backslash returned. Here are some examples: | |
68 | ||
69 | $scalarref = \$foo; | |
70 | $arrayref = \@ARGV; | |
71 | $hashref = \%ENV; | |
72 | $coderef = \&handler; | |
55497cff | 73 | $globref = \*foo; |
cb1a09d0 | 74 | |
5a964f20 TC |
75 | It isn't possible to create a true reference to an IO handle (filehandle |
76 | or dirhandle) using the backslash operator. The most you can get is a | |
77 | reference to a typeglob, which is actually a complete symbol table entry. | |
78 | But see the explanation of the C<*foo{THING}> syntax below. However, | |
79 | you can still use type globs and globrefs as though they were IO handles. | |
a0d0e21e | 80 | |
3e3ae962 | 81 | =head3 Square Brackets |
d74e8afc ITB |
82 | X<array, anonymous> X<[> X<[]> X<square bracket> |
83 | X<bracket, square> X<arrayref> X<array reference> X<reference, array> | |
a0d0e21e | 84 | |
5a964f20 | 85 | A reference to an anonymous array can be created using square |
a0d0e21e LW |
86 | brackets: |
87 | ||
88 | $arrayref = [1, 2, ['a', 'b', 'c']]; | |
89 | ||
5a964f20 | 90 | Here we've created a reference to an anonymous array of three elements |
54310121 | 91 | whose final element is itself a reference to another anonymous array of three |
a0d0e21e | 92 | elements. (The multidimensional syntax described later can be used to |
c47ff5f1 | 93 | access this. For example, after the above, C<< $arrayref->[2][1] >> would have |
a0d0e21e LW |
94 | the value "b".) |
95 | ||
7c2ea1c7 | 96 | Taking a reference to an enumerated list is not the same |
cb1a09d0 AD |
97 | as using square brackets--instead it's the same as creating |
98 | a list of references! | |
99 | ||
54310121 | 100 | @list = (\$a, \@b, \%c); |
5566fa15 | 101 | @list = \($a, @b, %c); # same thing! |
58e0a6ae | 102 | |
54310121 | 103 | As a special case, C<\(@foo)> returns a list of references to the contents |
b6429b1b GS |
104 | of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>, |
105 | except that the key references are to copies (since the keys are just | |
106 | strings rather than full-fledged scalars). | |
cb1a09d0 | 107 | |
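The difference between C<\(@foo)> and C<\@foo> is easy to check for yourself. The following is only a small sketch; the variable names C<@elem_refs> and C<$array_ref> are made up for illustration.

```perl
use strict;
use warnings;

my @foo = (10, 20, 30);

# \(@foo) yields one reference per element, not a reference to @foo.
my @elem_refs = \(@foo);

# \@foo yields a single reference to the array itself.
my $array_ref = \@foo;

# Each element reference points at the original element, so
# assigning through it changes @foo.
${ $elem_refs[1] } = 99;

print scalar(@elem_refs), "\n";   # 3
print ref($array_ref), "\n";      # ARRAY
print "$foo[1]\n";                # 99
```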
3e3ae962 | 108 | =head3 Curly Brackets |
d74e8afc ITB |
109 | X<hash, anonymous> X<{> X<{}> X<curly bracket> |
110 | X<bracket, curly> X<brace> X<hashref> X<hash reference> X<reference, hash> | |
a0d0e21e | 111 | |
5a964f20 | 112 | A reference to an anonymous hash can be created using curly |
a0d0e21e LW |
113 | brackets: |
114 | ||
115 | $hashref = { | |
5566fa15 SF |
116 | 'Adam' => 'Eve', |
117 | 'Clyde' => 'Bonnie', | |
a0d0e21e LW |
118 | }; |
119 | ||
5a964f20 | 120 | Anonymous hash and array composers like these can be intermixed freely to |
a0d0e21e LW |
121 | produce as complicated a structure as you want. The multidimensional |
122 | syntax described below works for these too. The values above are | |
123 | literals, but variables and expressions would work just as well, because | |
124 | assignment operators in Perl (even within local() or my()) are executable | |
125 | statements, not compile-time declarations. | |
126 | ||
127 | Because curly brackets (braces) are used for several other things | |
128 | including BLOCKs, you may occasionally have to disambiguate braces at the | |
129 | beginning of a statement by putting a C<+> or a C<return> in front so | |
130 | that Perl realizes the opening brace isn't starting a BLOCK. The economy and | |
131 | mnemonic value of using curlies is deemed worth this occasional extra | |
132 | hassle. | |
133 | ||
134 | For example, if you wanted a function to make a new hash and return a | |
135 | reference to it, you have these options: | |
136 | ||
137 | sub hashem { { @_ } } # silently wrong | |
138 | sub hashem { +{ @_ } } # ok | |
139 | sub hashem { return { @_ } } # ok | |
140 | ||
ebc58f1a GS |
141 | On the other hand, if you want the other meaning, you can do this: |
142 | ||
555bd962 BG |
143 | sub showem { { @_ } } # ambiguous (currently ok, |
144 | # but may change) | |
ebc58f1a GS |
145 | sub showem { {; @_ } } # ok |
146 | sub showem { { return @_ } } # ok | |
147 | ||
7c2ea1c7 | 148 | The leading C<+{> and C<{;> always serve to disambiguate |
ebc58f1a GS |
149 | the expression to mean either the HASH reference, or the BLOCK. |
150 | ||
3e3ae962 | 151 | =head3 Anonymous Subroutines |
d74e8afc ITB |
152 | X<subroutine, anonymous> X<subroutine, reference> X<reference, subroutine> |
153 | X<scope, lexical> X<closure> X<lexical> X<lexical scope> | |
a0d0e21e | 154 | |
5a964f20 | 155 | A reference to an anonymous subroutine can be created by using |
a0d0e21e LW |
156 | C<sub> without a subname: |
157 | ||
158 | $coderef = sub { print "Boink!\n" }; | |
159 | ||
7c2ea1c7 GS |
160 | Note the semicolon. Except for the code |
161 | inside not being immediately executed, a C<sub {}> is not so much a | |
a0d0e21e | 162 | declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no |
5a964f20 | 163 | matter how many times you execute that particular line (unless you're in an |
19799a22 | 164 | C<eval("...")>), $coderef will still have a reference to the I<same> |
a0d0e21e LW |
165 | anonymous subroutine.) |
166 | ||
748a9306 | 167 | Anonymous subroutines act as closures with respect to my() variables, |
7c2ea1c7 | 168 | that is, variables lexically visible within the current scope. Closure |
748a9306 LW |
169 | is a notion out of the Lisp world that says if you define an anonymous |
170 | function in a particular lexical context, it pretends to run in that | |
7c2ea1c7 | 171 | context even when it's called outside the context. |
748a9306 LW |
172 | |
173 | In human terms, it's a funny way of passing arguments to a subroutine when | |
174 | you define it as well as when you call it. It's useful for setting up | |
175 | little bits of code to run later, such as callbacks. You can even | |
54310121 | 176 | do object-oriented stuff with it, though Perl already provides a different |
177 | mechanism to do that--see L<perlobj>. | |
748a9306 | 178 | |
7c2ea1c7 GS |
179 | You might also think of closure as a way to write a subroutine |
180 | template without using eval(). Here's a small example of how | |
181 | closures work: | |
748a9306 LW |
182 | |
183 | sub newprint { | |
5566fa15 SF |
184 | my $x = shift; |
185 | return sub { my $y = shift; print "$x, $y!\n"; }; | |
a0d0e21e | 186 | } |
748a9306 LW |
187 | $h = newprint("Howdy"); |
188 | $g = newprint("Greetings"); | |
189 | ||
190 | # Time passes... | |
191 | ||
192 | &$h("world"); | |
193 | &$g("earthlings"); | |
a0d0e21e | 194 | |
748a9306 LW |
195 | This prints |
196 | ||
197 | Howdy, world! | |
198 | Greetings, earthlings! | |
199 | ||
7c2ea1c7 GS |
200 | Note particularly that $x continues to refer to the value passed |
201 | into newprint() I<despite> "my $x" having gone out of scope by the | |
202 | time the anonymous subroutine runs. That's what a closure is all | |
203 | about. | |
748a9306 | 204 | |
5a964f20 | 205 | This applies only to lexical variables, by the way. Dynamic variables |
748a9306 LW |
206 | continue to work as they have always worked. Closure is not something |
207 | that most Perl programmers need trouble themselves about to begin with. | |
a0d0e21e | 208 | |
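If closures still seem abstract, a counter generator may help. This is a sketch only: C<make_counter> is a hypothetical helper invented for this example, not something Perl provides. Each returned subroutine keeps its own private copy of C<$count>.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper, not part of Perl itself.
sub make_counter {
    my $count = shift;                 # private to each returned closure
    return sub { return $count++ };    # returns the old value, then increments
}

my $c1 = make_counter(5);
my $c2 = make_counter(100);

# The two closures do not share state.
my @seen = ($c1->(), $c1->(), $c2->(), $c1->());
print "@seen\n";    # 5 6 100 7
```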
3e3ae962 | 209 | =head3 Constructors |
d74e8afc | 210 | X<constructor> X<new> |
a0d0e21e | 211 | |
63acfd00 | 212 | References are often returned by special subroutines called constructors. Perl |
213 | objects are just references to a special type of object that happens to know | |
214 | which package it's associated with. Constructors are just special subroutines | |
215 | that know how to create that association. They do so by starting with an | |
216 | ordinary reference, and it remains an ordinary reference even while it's also | |
217 | being an object. Constructors are often named C<new()>. You I<can> call them | |
218 | indirectly: | |
219 | ||
220 | $objref = new Doggie( Tail => 'short', Ears => 'long' ); | |
221 | ||
222 | But that can produce ambiguous syntax in certain cases, so it's often | |
223 | better to use the direct method invocation approach: | |
5a964f20 TC |
224 | |
225 | $objref = Doggie->new(Tail => 'short', Ears => 'long'); | |
226 | ||
227 | use Term::Cap; | |
228 | $terminal = Term::Cap->Tgetent( { OSPEED => 9600 }); | |
229 | ||
230 | use Tk; | |
231 | $main = MainWindow->new(); | |
232 | $menubar = $main->Frame(-relief => "raised", | |
233 | -borderwidth => 2); | |
234 | ||
3e3ae962 | 235 | =head3 Autovivification |
d74e8afc | 236 | X<autovivification> |
a0d0e21e LW |
237 | |
238 | References of the appropriate type can spring into existence if you | |
5f05dabc | 239 | dereference them in a context that assumes they exist. Because we haven't |
a0d0e21e LW |
240 | talked about dereferencing yet, we can't show you any examples yet. |
241 | ||
3e3ae962 | 242 | =head3 Typeglob Slots |
d74e8afc | 243 | X<*foo{THING}> X<*> |
cb1a09d0 | 244 | |
55497cff | 245 | A reference can be created by using a special syntax, lovingly known as |
246 | the *foo{THING} syntax. *foo{THING} returns a reference to the THING | |
247 | slot in *foo (which is the symbol table entry which holds everything | |
248 | known as foo). | |
cb1a09d0 | 249 | |
55497cff | 250 | $scalarref = *foo{SCALAR}; |
251 | $arrayref = *ARGV{ARRAY}; | |
252 | $hashref = *ENV{HASH}; | |
253 | $coderef = *handler{CODE}; | |
36477c24 | 254 | $ioref = *STDIN{IO}; |
55497cff | 255 | $globref = *foo{GLOB}; |
c0bd1adc | 256 | $formatref = *foo{FORMAT}; |
171e2879 FC |
257 | $globname = *foo{NAME}; # "foo" |
258 | $pkgname = *foo{PACKAGE}; # "main" | |
55497cff | 259 | |
171e2879 FC |
260 | Most of these are self-explanatory, but C<*foo{IO}> |
261 | deserves special attention. It returns | |
7c2ea1c7 GS |
262 | the IO handle, used for file handles (L<perlfunc/open>), sockets |
263 | (L<perlfunc/socket> and L<perlfunc/socketpair>), and directory | |
264 | handles (L<perlfunc/opendir>). For compatibility with previous | |
39b99f21 | 265 | versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it |
83677dc5 RS |
266 | is discouraged, to encourage a consistent use of one name: IO. On perls |
267 | between v5.8 and v5.22, it will issue a deprecation warning, but this | |
268 | deprecation has since been rescinded. | |
55497cff | 269 | |
7c2ea1c7 GS |
270 | C<*foo{THING}> returns undef if that particular THING hasn't been used yet, |
271 | except in the case of scalars. C<*foo{SCALAR}> returns a reference to an | |
5f05dabc | 272 | anonymous scalar if $foo hasn't been used yet. This might change in a |
273 | future release. | |
274 | ||
171e2879 FC |
275 | C<*foo{NAME}> and C<*foo{PACKAGE}> are the exception, in that they return |
276 | strings, rather than references. These return the package and name of the | |
277 | typeglob itself, rather than one that has been assigned to it. So, after | |
278 | C<*foo=*Foo::bar>, C<*foo> will become "*Foo::bar" when used as a string, | |
279 | but C<*foo{PACKAGE}> and C<*foo{NAME}> will continue to produce "main" and | |
280 | "foo", respectively. | |
281 | ||
7c2ea1c7 | 282 | C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in |
5a964f20 TC |
283 | L<perldata/"Typeglobs and Filehandles"> for passing filehandles |
284 | into or out of subroutines, or storing into larger data structures. | |
285 | Its disadvantage is that it won't create a new filehandle for you. | |
7c2ea1c7 GS |
286 | Its advantage is that you have less risk of clobbering more than |
287 | you want to with a typeglob assignment. (It still conflates file | |
288 | and directory handles, though.) However, if you assign the incoming | |
289 | value to a scalar instead of a typeglob as we do in the examples | |
290 | below, there's no risk of that happening. | |
36477c24 | 291 | |
5566fa15 SF |
292 | splutter(*STDOUT); # pass the whole glob |
293 | splutter(*STDOUT{IO}); # pass both file and dir handles | |
5a964f20 | 294 | |
cb1a09d0 | 295 | sub splutter { |
5566fa15 SF |
296 | my $fh = shift; |
297 | print $fh "her um well a hmmm\n"; | |
cb1a09d0 AD |
298 | } |
299 | ||
5566fa15 | 300 | $rec = get_rec(*STDIN); # pass the whole glob |
7c2ea1c7 | 301 | $rec = get_rec(*STDIN{IO}); # pass both file and dir handles |
5a964f20 | 302 | |
cb1a09d0 | 303 | sub get_rec { |
5566fa15 SF |
304 | my $fh = shift; |
305 | return scalar <$fh>; | |
cb1a09d0 AD |
306 | } |
307 | ||
5a964f20 | 308 | =head2 Using References |
d74e8afc | 309 | X<reference, use> X<dereferencing> X<dereference> |
5a964f20 | 310 | |
a0d0e21e LW |
311 | That's it for creating references. By now you're probably dying to |
312 | know how to use references to get back to your long-lost data. There | |
313 | are several basic methods. | |
314 | ||
3e3ae962 | 315 | =head3 Simple Scalar |
a0d0e21e | 316 | |
6309d9d9 | 317 | Anywhere you'd put an identifier (or chain of identifiers) as part |
318 | of a variable or subroutine name, you can replace the identifier with | |
319 | a simple scalar variable containing a reference of the correct type: | |
a0d0e21e LW |
320 | |
321 | $bar = $$scalarref; | |
322 | push(@$arrayref, $filename); | |
323 | $$arrayref[0] = "January"; | |
324 | $$hashref{"KEY"} = "VALUE"; | |
325 | &$coderef(1,2,3); | |
cb1a09d0 | 326 | print $globref "output\n"; |
a0d0e21e | 327 | |
19799a22 | 328 | It's important to understand that we are specifically I<not> dereferencing |
a0d0e21e | 329 | C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the |
19799a22 | 330 | scalar variable happens I<before> it does any key lookups. Anything more |
a0d0e21e LW |
331 | complicated than a simple scalar variable must use the block or arrow forms below. |
332 | However, a "simple scalar" includes an identifier that itself uses the simple | |
333 | scalar form recursively. Therefore, the following prints "howdy". | |
334 | ||
335 | $refrefref = \\\"howdy"; | |
336 | print $$$$refrefref; | |
337 | ||
3e3ae962 | 338 | =head3 Block |
a0d0e21e | 339 | |
6309d9d9 | 340 | Anywhere you'd put an identifier (or chain of identifiers) as part of a |
341 | variable or subroutine name, you can replace the identifier with a | |
342 | BLOCK returning a reference of the correct type. In other words, the | |
343 | previous examples could be written like this: | |
a0d0e21e LW |
344 | |
345 | $bar = ${$scalarref}; | |
346 | push(@{$arrayref}, $filename); | |
347 | ${$arrayref}[0] = "January"; | |
348 | ${$hashref}{"KEY"} = "VALUE"; | |
349 | &{$coderef}(1,2,3); | |
36477c24 | 350 | $globref->print("output\n"); # iff IO::Handle is loaded |
a0d0e21e LW |
351 | |
352 | Admittedly, it's a little silly to use the curlies in this case, but | |
353 | the BLOCK can contain any arbitrary expression, in particular, | |
354 | subscripted expressions: | |
355 | ||
5566fa15 | 356 | &{ $dispatch{$index} }(1,2,3); # call correct routine |
a0d0e21e LW |
357 | |
358 | Because of being able to omit the curlies for the simple case of C<$$x>, | |
359 | people often make the mistake of viewing the dereferencing symbols as | |
360 | proper operators, and wonder about their precedence. If they were, | |
5f05dabc | 361 | though, you could use parentheses instead of braces. That's not the case. |
a0d0e21e | 362 | Consider the difference below; case 0 is a short-hand version of case 1, |
19799a22 | 363 | I<not> case 2: |
a0d0e21e | 364 | |
5566fa15 SF |
365 | $$hashref{"KEY"} = "VALUE"; # CASE 0 |
366 | ${$hashref}{"KEY"} = "VALUE"; # CASE 1 | |
367 | ${$hashref{"KEY"}} = "VALUE"; # CASE 2 | |
368 | ${$hashref->{"KEY"}} = "VALUE"; # CASE 3 | |
a0d0e21e LW |
369 | |
370 | Case 2 is also deceptive in that you're accessing a variable | |
371 | called %hashref, not dereferencing through $hashref to the hash | |
372 | it's presumably referencing. That would be case 3. | |
373 | ||
3e3ae962 | 374 | =head3 Arrow Notation |
a0d0e21e | 375 | |
6da72b64 CS |
376 | Subroutine calls and lookups of individual array elements arise often |
377 | enough that it gets cumbersome to use the block form. As a form of | |
378 | syntactic sugar, the block-form examples may be written: | |
a0d0e21e | 379 | |
6da72b64 CS |
380 | $arrayref->[0] = "January"; # Array element |
381 | $hashref->{"KEY"} = "VALUE"; # Hash element | |
382 | $coderef->(1,2,3); # Subroutine call | |
a0d0e21e | 383 | |
6da72b64 | 384 | The left side of the arrow can be any expression returning a reference, |
19799a22 | 385 | including a previous dereference. Note that C<$array[$x]> is I<not> the |
c47ff5f1 | 386 | same thing as C<< $array->[$x] >> here: |
a0d0e21e LW |
387 | |
388 | $array[$x]->{"foo"}->[0] = "January"; | |
389 | ||
390 | This is one of the cases we mentioned earlier in which references could | |
391 | spring into existence when in an lvalue context. Before this | |
392 | statement, C<$array[$x]> may have been undefined. If so, it's | |
393 | automatically defined with a hash reference so that we can look up | |
c47ff5f1 | 394 | C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get |
a0d0e21e | 395 | defined with an array reference so that we can look up C<[0]> in it. |
5a964f20 | 396 | This process is called I<autovivification>. |
a0d0e21e | 397 | |
19799a22 | 398 | One more thing here. The arrow is optional I<between> bracket
a0d0e21e LW |
399 | subscripts, so you can shrink the above down to |
400 | ||
401 | $array[$x]{"foo"}[0] = "January"; | |
402 | ||
403 | Which, in the degenerate case of using only ordinary arrays, gives you | |
404 | multidimensional arrays just like C's: | |
405 | ||
406 | $score[$x][$y][$z] += 42; | |
407 | ||
408 | Well, okay, not entirely like C's arrays, actually. C doesn't know how | |
409 | to grow its arrays on demand. Perl does. | |
410 | ||
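Autovivification and growing on demand can be seen in a few lines. This is a sketch, with C<@array>, C<%h>, and C<$probe> chosen only for illustration. The last part shows a fact not spelled out above: merely reading a nested element also creates the intermediate level.

```perl
use strict;
use warnings;

my @array;
my $x = 2;

# Nothing exists yet; the assignment creates each missing level.
$array[$x]->{"foo"}->[0] = "January";

print ref($array[$x]), "\n";           # HASH
print ref($array[$x]{"foo"}), "\n";    # ARRAY
print scalar(@array), "\n";            # 3, the array grew to hold index 2

# Reading through a missing level vivifies that level too.
my %h;
my $probe = $h{outer}{inner};
print exists $h{outer} ? "vivified\n" : "absent\n";   # vivified
```

If that last behavior surprises you, test with C<exists> one level at a time before reaching deeper.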
3e3ae962 | 411 | =head3 Objects |
a0d0e21e LW |
412 | |
413 | If a reference happens to be a reference to an object, then there are | |
414 | probably methods to access the things referred to, and you should probably | |
415 | stick to those methods unless you're in the class package that defines the | |
416 | object's methods. In other words, be nice, and don't violate the object's | |
417 | encapsulation without a very good reason. Perl does not enforce | |
418 | encapsulation. We are not totalitarians here. We do expect some basic | |
419 | civility though. | |
420 | ||
3e3ae962 | 421 | =head3 Miscellaneous Usage |
a0d0e21e | 422 | |
7c2ea1c7 GS |
423 | Using a string or number as a reference produces a symbolic reference, |
424 | as explained above. Using a reference as a number produces an | |
425 | integer representing its storage location in memory. The only | |
426 | useful thing to be done with this is to compare two references | |
427 | numerically to see whether they refer to the same location. | |
d74e8afc | 428 | X<reference, numeric context> |
7c2ea1c7 GS |
429 | |
430 | if ($ref1 == $ref2) { # cheap numeric compare of references | |
5566fa15 | 431 | print "refs 1 and 2 refer to the same thing\n"; |
7c2ea1c7 GS |
432 | } |
433 | ||
434 | Using a reference as a string produces both its referent's type, | |
435 | including any package blessing as described in L<perlobj>, as well | |
436 | as the numeric address expressed in hex. The ref() operator returns | |
437 | just the type of thing the reference is pointing to, without the | |
438 | address. See L<perlfunc/ref> for details and examples of its use. | |
d74e8afc | 439 | X<reference, string context> |
a0d0e21e | 440 | |
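The numeric, string, and C<ref()> views of a reference can be compared side by side. This is a minimal sketch with made-up variable names. The address in the string form differs from run to run, so the example only checks its shape.

```perl
use strict;
use warnings;

my @data  = (1, 2, 3);
my $ref1  = \@data;
my $ref2  = \@data;
my $other = [1, 2, 3];

print "same\n"      if $ref1 == $ref2;    # both point at @data
print "different\n" if $ref1 != $other;   # equal contents, separate array

print ref($ref1), "\n";                   # ARRAY
print ref(sub { }), "\n";                 # CODE
print ref(\my $s), "\n";                  # SCALAR

# String form: type plus hex address, such as ARRAY(0x...).
print "looks right\n" if "$ref1" =~ /^ARRAY\(0x[0-9a-f]+\)$/;
```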
5a964f20 TC |
441 | The bless() operator may be used to associate the object a reference |
442 | points to with a package functioning as an object class. See L<perlobj>. | |
a0d0e21e | 443 | |
5f05dabc | 444 | A typeglob may be dereferenced the same way a reference can, because |
7c2ea1c7 | 445 | the dereference syntax always indicates the type of reference desired. |
a0d0e21e LW |
446 | So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable. |
447 | ||
448 | Here's a trick for interpolating a subroutine call into a string: | |
449 | ||
cb1a09d0 AD |
450 | print "My sub returned @{[mysub(1,2,3)]} that time.\n"; |
451 | ||
452 | The way it works is that when the C<@{...}> is seen in the double-quoted | |
453 | string, it's evaluated as a block. The block creates a reference to an | |
454 | anonymous array containing the results of the call to C<mysub(1,2,3)>. So | |
455 | the whole block returns a reference to an array, which is then | |
456 | dereferenced by C<@{...}> and stuck into the double-quoted string. This | |
457 | chicanery is also useful for arbitrary expressions: | |
a0d0e21e | 458 | |
184e9718 | 459 | print "That yields @{[$n + 5]} widgets\n"; |
a0d0e21e | 460 | |
35efdb20 DL |
461 | Similarly, an expression that returns a reference to a scalar can be |
462 | dereferenced via C<${...}>. Thus, the above expression may be written | |
463 | as: | |
464 | ||
465 | print "That yields ${\($n + 5)} widgets\n"; | |
466 | ||
0a044a7c DR |
467 | =head2 Circular References |
468 | X<circular reference> X<reference, circular> | |
469 | ||
470 | It is possible to create a "circular reference" in Perl, which can lead | |
471 | to memory leaks. A circular reference occurs when two data structures | |
472 | each hold a reference to the other, like this: | |
473 | ||
474 | my $foo = {}; | |
475 | my $bar = { foo => $foo }; | |
476 | $foo->{bar} = $bar; | |
477 | ||
478 | You can also create a circular reference with a single variable: | |
479 | ||
480 | my $foo; | |
481 | $foo = \$foo; | |
482 | ||
483 | In both cases, the reference counts of the variables will never reach 0, | |
484 | and the variables will never be freed automatically. This can lead to | |
485 | memory leaks. | |
486 | ||
487 | Because objects in Perl are implemented as references, it's possible to | |
488 | have circular references with objects as well. Imagine a TreeNode class | |
489 | where each node references its parent and child nodes. Any node with a | |
490 | parent will be part of a circular reference. | |
491 | ||
492 | You can break circular references by creating a "weak reference". A | |
493 | weak reference does not increment the reference count for a variable, | |
494 | which means that the object can go out of scope and be destroyed. You | |
495 | can weaken a reference with the C<weaken> function exported by the | |
6ac93b49 PE |
496 | L<Scalar::Util> module, or available as C<builtin::weaken> directly in |
497 | Perl version 5.35.7 or later. | |
0a044a7c DR |
498 | |
499 | Here's how we can make the first example safer: | |
500 | ||
501 | use Scalar::Util 'weaken'; | |
502 | ||
503 | my $foo = {}; | |
504 | my $bar = { foo => $foo }; | |
505 | $foo->{bar} = $bar; | |
506 | ||
507 | weaken $foo->{bar}; | |
508 | ||
509 | The reference from C<$foo> to C<$bar> has been weakened. When the | |
510 | C<$bar> variable goes out of scope, it will be garbage-collected. The | |
511 | next time you look at the value of the C<< $foo->{bar} >> key, it will | |
512 | be C<undef>. | |
513 | ||
514 | This action at a distance can be confusing, so you should be careful | |
515 | with your use of weaken. You should weaken the reference in the | |
516 | variable that will go out of scope I<first>. That way, the longer-lived | |
517 | variable will contain the expected reference until it goes out of | |
518 | scope. | |
519 | ||
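To watch a weak reference turn into C<undef>, you can put the shorter-lived variable in its own block. This sketch assumes only the C<weaken> and C<isweak> functions from L<Scalar::Util>; the inner block is there just to make C<$bar> go out of scope at a known point.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $foo = {};
{
    my $bar = { foo => $foo };
    $foo->{bar} = $bar;
    weaken $foo->{bar};    # $foo no longer keeps $bar's hash alive

    print isweak($foo->{bar}) ? "weak\n" : "strong\n";    # weak
    print defined $foo->{bar} ? "alive\n" : "gone\n";     # alive
}

# $bar has left scope, and its hash had no other strong reference.
print defined $foo->{bar} ? "alive\n" : "gone\n";         # gone
```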
a0d0e21e | 520 | =head2 Symbolic references |
d74e8afc ITB |
521 | X<reference, symbolic> X<reference, soft> |
522 | X<symbolic reference> X<soft reference> | |
a0d0e21e LW |
523 | |
524 | We said that references spring into existence as necessary if they are | |
525 | undefined, but we didn't say what happens if a value used as a | |
19799a22 | 526 | reference is already defined, but I<isn't> a hard reference. If you |
7c2ea1c7 | 527 | use it as a reference, it'll be treated as a symbolic |
19799a22 | 528 | reference. That is, the value of the scalar is taken to be the I<name> |
a0d0e21e LW |
529 | of a variable, rather than a direct link to a (possibly) anonymous |
530 | value. | |
531 | ||
532 | People frequently expect it to work like this. So it does. | |
533 | ||
534 | $name = "foo"; | |
5566fa15 SF |
535 | $$name = 1; # Sets $foo |
536 | ${$name} = 2; # Sets $foo | |
537 | ${$name x 2} = 3; # Sets $foofoo | |
538 | $name->[0] = 4; # Sets $foo[0] | |
539 | @$name = (); # Clears @foo | |
540 | &$name(); # Calls &foo() | |
a0d0e21e | 541 | $pack = "THAT"; |
5566fa15 | 542 | ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval |
a0d0e21e | 543 | |
7c2ea1c7 | 544 | This is powerful, and slightly dangerous, in that it's possible |
a0d0e21e LW |
545 | to intend (with the utmost sincerity) to use a hard reference, and |
546 | accidentally use a symbolic reference instead. To protect against | |
547 | that, you can say | |
548 | ||
549 | use strict 'refs'; | |
550 | ||
551 | and then only hard references will be allowed for the rest of the enclosing | |
54310121 | 552 | block. An inner block may countermand that with |
a0d0e21e LW |
553 | |
554 | no strict 'refs'; | |
555 | ||
5a964f20 TC |
556 | Only package variables (globals, even if localized) are visible to |
557 | symbolic references. Lexical variables (declared with my()) aren't in | |
558 | a symbol table, and thus are invisible to this mechanism. For example: | |
a0d0e21e | 559 | |
5a964f20 | 560 | local $value = 10; |
b0c35547 | 561 | $ref = "value"; |
a0d0e21e | 562 | { |
5566fa15 SF |
563 | my $value = 20; |
564 | print $$ref; | |
54310121 | 565 | } |
a0d0e21e LW |
566 | |
567 | This will still print 10, not 20. Remember that local() affects package | |
568 | variables, which are all "global" to the package. | |
569 | ||
748a9306 LW |
570 | =head2 Not-so-symbolic references |
571 | ||
0480bf32 | 572 | Brackets around a symbolic reference can simply |
903c0e71 PM |
573 | serve to isolate an identifier or variable name from the rest of an |
574 | expression, just as they always have within a string. For example, | |
748a9306 LW |
575 | |
576 | $push = "pop on "; | |
577 | print "${push}over"; | |
578 | ||
7c2ea1c7 | 579 | has always meant to print "pop on over", even though push is |
0480bf32 | 580 | a reserved word. This is generalized to work the same |
903c0e71 | 581 | without the enclosing double quotes, so that |
748a9306 LW |
582 | |
583 | print ${push} . "over"; | |
584 | ||
585 | and even | |
586 | ||
587 | print ${ push } . "over"; | |
588 | ||
0480bf32 | 589 | will have the same effect. This |
748a9306 LW |
590 | construct is I<not> considered to be a symbolic reference when you're |
591 | using strict refs: | |
592 | ||
593 | use strict 'refs'; | |
5566fa15 SF |
594 | ${ bareword }; # Okay, means $bareword. |
595 | ${ "bareword" }; # Error, symbolic reference. | |
748a9306 | 596 | |
903c0e71 PM |
597 | Similarly, because of all the subscripting that is done using single words, |
598 | the same rule applies to any bareword that is used for subscripting a hash. | |
599 | So now, instead of writing | |
748a9306 | 600 | |
f06965b9 | 601 | $hash{ "aaa" }{ "bbb" }{ "ccc" } |
748a9306 | 602 | |
5f05dabc | 603 | you can write just |
748a9306 | 604 | |
f06965b9 | 605 | $hash{ aaa }{ bbb }{ ccc } |
748a9306 LW |
606 | |
607 | and not worry about whether the subscripts are reserved words. In the | |
608 | rare event that you do wish to do something like | |
609 | ||
f06965b9 | 610 | $hash{ shift } |
748a9306 LW |
611 | |
612 | you can force interpretation as a reserved word by adding anything that | |
613 | makes it more than a bareword: | |
614 | ||
f06965b9 C |
615 | $hash{ shift() } |
616 | $hash{ +shift } | |
617 | $hash{ shift @_ } | |
748a9306 | 618 | |
9f1b1f2d GS |
619 | The C<use warnings> pragma or the B<-w> switch will warn you if it |
620 | interprets a reserved word as a string. | |
5f05dabc | 621 | But it will no longer warn you about using lowercase words, because the |
748a9306 LW |
622 | string is effectively quoted. |
623 | ||
49399b3f | 624 | =head2 Pseudo-hashes: Using an array as a hash |
d74e8afc | 625 | X<pseudo-hash> X<pseudo hash> X<pseudohash> |
49399b3f | 626 | |
6d822dc4 MS |
627 | Pseudo-hashes have been removed from Perl. The 'fields' pragma |
628 | remains available. | |
e0478e5a | 629 | |
5a964f20 | 630 | =head2 Function Templates |
d74e8afc ITB |
631 | X<scope, lexical> X<closure> X<lexical> X<lexical scope> |
632 | X<subroutine, nested> X<sub, nested> X<subroutine, local> X<sub, local> | |
5a964f20 | 633 | |
b5c19bd7 DM |
634 | As explained above, an anonymous function with access to the lexical |
635 | variables visible when that function was compiled, creates a closure. It | |
636 | retains access to those variables even though it doesn't get run until | |
637 | later, such as in a signal handler or a Tk callback. | |
5a964f20 TC |
638 | |
639 | Using a closure as a function template allows us to generate many functions | |
c2611fb3 | 640 | that act similarly. Suppose you wanted functions named after the colors |
5a964f20 TC |
641 | that generated HTML font changes for the various colors: |
642 | ||
643 | print "Be ", red("careful"), " with that ", green("light"); |
644 | ||
7c2ea1c7 | 645 | The red() and green() functions would be similar. To create these, |
5a964f20 | 646 | we'll assign a closure to a typeglob of the name of the function we're |
d962e436 | 647 | trying to build. |
5a964f20 TC |
648 | |
649 | @colors = qw(red blue green yellow orange purple violet); | |
650 | for my $name (@colors) { | |
5566fa15 | 651 | no strict 'refs'; # allow symbol table manipulation |
5a964f20 | 652 | *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" }; |
d962e436 | 653 | } |
5a964f20 TC |
654 | |
655 | Now all those different functions appear to exist independently. You can | |
656 | call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on | |
657 | both compile time and memory use, and is less error-prone as well, since | |
658 | syntax checks happen at compile time. It's critical that any variables in | |
659 | the anonymous subroutine be lexicals in order to create a proper closure. | |
660 | That's the reason for the C<my> on the loop iteration variable. |
661 | ||
662 | This is one of the only places where giving a prototype to a closure makes | |
663 | much sense. If you wanted to impose scalar context on the arguments of | |
664 | these functions (probably not a wise idea for this particular example), | |
665 | you could have written it this way instead: | |
666 | ||
667 | *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" }; | |
668 | ||
669 | However, since prototype checking happens at compile time, the assignment | |
670 | above happens too late to be of much use. You could address this by | |
671 | putting the whole loop of assignments within a BEGIN block, forcing it | |
672 | to occur during compilation. | |
673 | ||
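As a sketch of the BEGIN-block approach just described, the following generates prototyped color functions at compile time. The shortened color list and the C<@words> array are assumptions made for this example.

```perl
use strict;
use warnings;

my @colors;
BEGIN {
    # Runs during compilation, so the prototype is known to later code.
    @colors = qw(red blue green);
    for my $name (@colors) {
        no strict 'refs';    # allow symbol table manipulation
        *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

my @words = ('careful', 'ignored');
# The ($) prototype imposes scalar context, so @words is passed as its count.
my $html = red(@words);
print "$html\n";
```

Because the prototype forces scalar context, the function receives the number of elements in C<@words> rather than its first element, which illustrates why the text calls this unwise for this particular example.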
58e2a187 CW |
674 | Access to lexicals that change over time--like those in the C<for> loop |
675 | above, basically aliases to elements from the surrounding lexical scopes-- | |
676 | only works with anonymous subs, not with named subroutines. Generally | |
677 | speaking, named subroutines do not nest properly and should only be declared |
678 | in the main package scope. | |
679 | ||
680 | This is because named subroutines are created at compile time so their | |
681 | lexical variables get assigned to the parent lexicals from the first | |
682 | execution of the parent block. If a parent scope is entered a second | |
683 | time, its lexicals are created again, while the nested subs still | |
684 | reference the old ones. | |
685 | ||
686 | Anonymous subroutines get to capture each time you execute the C<sub> | |
687 | operator, as they are created on the fly. If you are accustomed to using | |
688 | nested subroutines in other programming languages with their own private | |
689 | variables, you'll have to work at it a bit in Perl. The intuitive coding | |
690 | of this type of thing incurs mysterious warnings about "will not stay | |
d962e436 | 691 | shared" due to the reasons explained above. |
58e2a187 | 692 | For example, this won't work: |
5a964f20 TC |
693 | |
694 | sub outer { | |
695 | my $x = $_[0] + 35; | |
696 | sub inner { return $x * 19 } # WRONG | |
697 | return $x + inner(); | |
b432a672 | 698 | } |
5a964f20 TC |
699 | |
700 | A work-around is the following: | |
701 | ||
702 | sub outer { | |
703 | my $x = $_[0] + 35; | |
704 | local *inner = sub { return $x * 19 }; | |
705 | return $x + inner(); | |
b432a672 | 706 | } |
5a964f20 TC |
707 | |
708 | Now inner() can only be called from within outer(), because of the | |
58e2a187 CW |
709 | temporary assignment of the anonymous subroutine. But when it is called, |
710 | it has normal access to the lexical variable $x from the scope of |
711 | outer() at the time outer() is invoked. |
5a964f20 TC |
712 | |
713 | This has the interesting effect of creating a function local to another | |
714 | function, something not normally supported in Perl. | |
715 | ||
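To see that the work-around picks up a fresh C<$x> on every call, the same outer() can be called twice. The variables C<$first> and C<$second> are introduced here only for the demonstration.

```perl
use strict;
use warnings;

sub outer {
    my $x = $_[0] + 35;
    # The closure is created anew on each call and captures this call's $x.
    local *inner = sub { return $x * 19 };
    return $x + inner();
}

my $first  = outer(1);   # $x is 36: 36 + 36 * 19
my $second = outer(2);   # $x is 37: 37 + 37 * 19

print "$first $second\n";
print "inner() is gone\n" unless defined &inner;
```

Once outer() returns, C<local> restores the glob, so inner() is no longer defined outside it.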
f0d99131 | 716 | =head2 Postfix Dereference Syntax |
821361b6 RS |
717 | |
718 | Beginning in v5.20.0, a postfix syntax for using references is | |
719 | available. It behaves as described in L</Using References>, but instead | |
720 | of a prefixed sigil, a postfixed sigil-and-star is used. | |
721 | ||
722 | For example: | |
723 | ||
724 | $r = \@a; | |
725 | @b = $r->@*; # equivalent to @$r or @{ $r } | |
726 | ||
727 | $r = [ 1, [ 2, 3 ], 4 ]; | |
728 | $r->[1]->@*; # equivalent to @{ $r->[1] } | |
729 | ||
1c2511e0 AC |
730 | In Perl 5.20 and 5.22, this syntax must be enabled with C<use feature |
731 | 'postderef'>. As of Perl 5.24, no feature declarations are required to make | |
732 | it available. | |
821361b6 RS |
733 | |
734 | Postfix dereference should work in all circumstances where block | |
735 | (circumfix) dereference worked, and should be entirely equivalent. This | |
736 | syntax allows dereferencing to be written and read entirely | |
737 | left-to-right. The following equivalencies are defined: | |
738 | ||
2bcaee20 FC |
739 | $sref->$*; # same as ${ $sref } |
740 | $aref->@*; # same as @{ $aref } | |
741 | $aref->$#*; # same as $#{ $aref } | |
742 | $href->%*; # same as %{ $href } | |
743 | $cref->&*; # same as &{ $cref } | |
744 | $gref->**; # same as *{ $gref } | |
821361b6 RS |
745 | |
746 | Note especially that C<< $cref->&* >> is I<not> equivalent to C<< | |
864eb29a RS |
747 | $cref->() >>, and can serve different purposes. |
748 | ||
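The equivalences above, including the difference between C<< $cref->&* >> and C<< $cref->() >>, can be tried with a small sketch. It assumes Perl 5.24 or later; all variable and sub names are hypothetical.

```perl
use v5.24;       # from 5.24 on, postfix dereference needs no feature declaration
use warnings;

my $aref = [ 1, [ 2, 3 ], 4 ];
my $href = { a => 1, b => 2 };
my $sref = \"text";
my $cref = sub { return scalar @_ };    # reports how many arguments it saw

my @inner      = $aref->[1]->@*;        # same as @{ $aref->[1] }
my $last_index = $aref->$#*;            # same as $#{ $aref }
my @keys       = sort keys $href->%*;   # same as keys %{ $href }
my $string     = $sref->$*;             # same as ${ $sref }

sub via_amp   { return $cref->&*; }     # like &$cref: reuses the caller's @_
sub via_arrow { return $cref->(); }     # called with an empty argument list

print via_amp(1, 2, 3), " ", via_arrow(1, 2, 3), "\n";
```

Here via_amp() passes its own arguments through to the code reference, while via_arrow() calls it with none.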
749 | Glob elements can be extracted through the postfix dereferencing feature: | |
750 | ||
751 | $gref->*{SCALAR}; # same as *{ $gref }{SCALAR} | |
821361b6 RS |
752 | |
753 | Postfix array and scalar dereferencing I<can> be used in interpolating | |
754 | strings (double quotes or the C<qq> operator), but only if the | |
1c2511e0 | 755 | C<postderef_qq> feature is enabled. |
821361b6 RS |
756 | |
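A minimal sketch of interpolation with C<postderef_qq>, assuming Perl 5.24 or later and using hypothetical variable names:

```perl
use v5.24;
use warnings;
use feature 'postderef_qq';   # stated explicitly for clarity

my $aref   = [ 1, 2, 3 ];
my $nested = { list => [ 4, 5 ] };
my $sref   = \"world";

my $list  = "items: $aref->@*";              # array elements joined by $"
my $inner = "nested: $nested->{list}->@*";   # works at the end of a chain too
my $word  = "hello $sref->$*";               # scalar dereference

print "$list\n$inner\n$word\n";
```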
757 | =head2 Postfix Reference Slicing | |
758 | ||
759 | Value slices of arrays and hashes may also be taken with postfix | |
760 | dereferencing notation, with the following equivalencies: | |
761 | ||
762 | $aref->@[ ... ]; # same as @$aref[ ... ] | |
763 | $href->@{ ... }; # same as @$href{ ... } | |
764 | ||
864eb29a RS |
765 | Postfix key/value pair slicing, added in 5.20.0 and documented in |
766 | L<the KeyE<sol>Value Hash Slices section of perldata|perldata/"Key/Value Hash | |
767 | Slices">, also behaves as expected: | |
821361b6 RS |
768 | |
769 | $aref->%[ ... ]; # same as %$aref[ ... ] | |
770 | $href->%{ ... }; # same as %$href{ ... } | |
771 | ||
772 | As with postfix array dereferencing, postfix value slice dereferencing I<can> be used |
773 | in interpolating strings (double quotes or the C<qq> operator), but only | |
1c2511e0 | 774 | if the C<postderef_qq> L<feature> is enabled. |
821361b6 | 775 | |
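The four slice forms can be compared side by side in a short sketch. It assumes Perl 5.24 or later; the data and variable names are made up for illustration.

```perl
use v5.24;
use warnings;

my $aref = [ qw(a b c d) ];
my $href = { one => 1, two => 2, three => 3 };

my @array_values = $aref->@[ 1, 2 ];             # same as @$aref[ 1, 2 ]
my @hash_values  = $href->@{ qw(one three) };    # same as @$href{ ... }
my %index_pairs  = $aref->%[ 0, 3 ];             # same as %$aref[ 0, 3 ]
my %key_pairs    = $href->%{ qw(one two) };      # same as %$href{ ... }

print "@array_values | @hash_values\n";
print join(",", map { "$_=$index_pairs{$_}" } sort keys %index_pairs), "\n";
```

The value slices return only values, while the key/value slices return index/value or key/value pairs that can be assigned straight into a hash.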
f0d99131 | 776 | =head2 Assigning to References |
82848c10 FC |
777 | |
778 | Beginning in v5.22.0, the referencing operator can be assigned to. It | |
779 | performs an aliasing operation, so that the variable name referenced on the | |
780 | left-hand side becomes an alias for the thing referenced on the right-hand | |
781 | side: | |
782 | ||
783 | \$a = \$b; # $a and $b now point to the same scalar | |
784 | \&foo = \&bar; # foo() now means bar() | |
785 | ||
baabe3fb | 786 | This syntax must be enabled with C<use feature 'refaliasing'>. It is |
82848c10 | 787 | experimental, and will warn by default unless C<no warnings |
baabe3fb | 788 | 'experimental::refaliasing'> is in effect. |
82848c10 FC |
789 | |
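As a minimal sketch of the aliasing described above, assuming Perl 5.22 or later and hypothetical variable names:

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my $original = 10;
\my $alias = \$original;    # $alias is now another name for $original
$alias++;                   # so this increments $original as well

my @source = (1, 2, 3);
\my @view = \@source;       # @view and @source are the same array
push @view, 4;

print "$original ", scalar(@source), "\n";
```

Unlike ordinary assignment, nothing is copied here: a change made through either name is visible through the other.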
790 | These forms may be assigned to, and cause the right-hand side to be | |
791 | evaluated in scalar context: | |
792 | ||
793 | \$scalar | |
794 | \@array | |
795 | \%hash | |
796 | \&sub | |
797 | \my $scalar | |
798 | \my @array | |
799 | \my %hash | |
800 | \state $scalar # or @array, etc. | |
801 | \our $scalar # etc. | |
802 | \local $scalar # etc. | |
803 | \local our $scalar # etc. | |
804 | \$some_array[$index] | |
805 | \$some_hash{$key} | |
806 | \local $some_array[$index] | |
807 | \local $some_hash{$key} | |
808 | condition ? \$this : \$that[0] # etc. | |
809 | ||
df706e5b FC |
810 | Slicing operations and parentheses cause |
811 | the right-hand side to be evaluated in | |
e05542ee | 812 | list context: |
82848c10 | 813 | |
e05542ee FC |
814 | \@array[5..7] |
815 | (\@array[5..7]) | |
816 | \(@array[5..7]) | |
df706e5b FC |
817 | \@hash{'foo','bar'} |
818 | (\@hash{'foo','bar'}) | |
819 | \(@hash{'foo','bar'}) | |
82848c10 FC |
820 | (\$scalar) |
821 | \($scalar) | |
822 | \(my $scalar) | |
823 | \my($scalar) | |
824 | (\@array) | |
825 | (\%hash) | |
826 | (\&sub) | |
827 | \(&sub) | |
828 | \($foo, @bar, %baz) | |
829 | (\$foo, \@bar, \%baz) | |
830 | ||
831 | Each element on the right-hand side must be a reference to a datum of the | |
832 | right type. Parentheses immediately surrounding an array (and possibly | |
833 | also C<my>/C<state>/C<our>/C<local>) will make each element of the array an | |
834 | alias to the corresponding scalar referenced on the right-hand side: | |
835 | ||
836 | \(@a) = \(@b); # @a and @b now have the same elements | |
837 | \my(@a) = \(@b); # likewise | |
838 | \(my @a) = \(@b); # likewise | |
839 | push @a, 3; # but now @a has an extra element that @b lacks | |
840 | \(@a) = (\$a, \$b, \$c); # @a now contains $a, $b, and $c | |
841 | ||
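The element-wise form behaves differently from aliasing the whole array, as this sketch suggests. It assumes Perl 5.22 or later, and the array contents are chosen for illustration.

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my @b = (1, 2, 3);
my @a;
\(@a) = \(@b);    # each element of @a aliases the matching element of @b
$a[0] = 100;      # also changes $b[0]
push @a, 4;       # @a grows; @b does not, since the arrays are distinct

print "@a | @b\n";
```

The two arrays share their scalars but remain separate containers, which is why the push affects only C<@a>.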
842 | Combining that form with C<local> and putting parentheses immediately | |
843 | around a hash are forbidden (because it is not clear what they should do): | |
844 | ||
845 | \local(@array) = foo(); # WRONG | |
dabde021 | 846 | \(%hash) = bar(); # WRONG |
82848c10 FC |
847 | |
848 | Assignment to references and non-references may be combined in lists and | |
849 | conditional ternary expressions, as long as the values on the right-hand | |
850 | side are the right type for each element on the left, though this may make | |
851 | for obfuscated code: | |
852 | ||
853 | (my $tom, \my $dick, \my @harry) = (\1, \2, [1..3]); | |
854 | # $tom is now \1 | |
855 | # $dick is now 2 (read-only) | |
856 | # @harry is (1,2,3) | |
857 | ||
858 | my $type = ref $thingy; | |
74bfae27 | 859 | ($type ? $type eq 'ARRAY' ? \@foo : \$bar : $baz) = $thingy; |
82848c10 FC |
860 | |
861 | The C<foreach> loop can also take a reference constructor for its loop | |
862 | variable, though the syntax is limited to one of the following, with an | |
863 | optional C<my>, C<state>, or C<our> after the backslash: | |
864 | ||
865 | \$s | |
866 | \@a | |
867 | \%h | |
868 | \&c | |
869 | ||
870 | No parentheses are permitted. This feature is particularly useful for | |
871 | arrays-of-arrays, or arrays-of-hashes: | |
872 | ||
873 | foreach \my @a (@array_of_arrays) { | |
874 | frobnicate($a[0], $a[-1]); | |
875 | } | |
876 | ||
877 | foreach \my %h (@array_of_hashes) { | |
74bfae27 | 878 | $h{gelastic}++ if $h{type} eq 'funny'; |
82848c10 FC |
879 | } |
880 | ||
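A runnable variant of the second loop, with a hypothetical two-element C<@array_of_hashes> and assuming Perl 5.22 or later, might look like this:

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hypothetical sample data.
my @array_of_hashes = ( { type => 'funny' }, { type => 'sad' } );

foreach \my %h (@array_of_hashes) {
    # %h is an alias for the hash behind the current reference,
    # so the update is stored in the original data structure.
    $h{gelastic}++ if $h{type} eq 'funny';
}

print $array_of_hashes[0]{gelastic}, "\n";
```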
881 | B<CAVEAT:> Aliasing does not work correctly with closures. If you try to | |
882 | alias lexical variables from an inner subroutine or C<eval>, the aliasing | |
883 | will only be visible within that inner sub, and will not affect the outer | |
884 | subroutine where the variables are declared. This bizarre behavior is | |
885 | subject to change. | |
886 | ||
415e4667 | 887 | =head2 Declaring a Reference to a Variable |
5c703779 | 888 | |
d4062d50 FC |
889 | Beginning in v5.26.0, the referencing operator can come after C<my>, |
890 | C<state>, C<our>, or C<local>. This syntax must be enabled with C<use | |
891 | feature 'declared_refs'>. It is experimental, and will warn by default | |
892 | unless C<no warnings 'experimental::declared_refs'> is in effect. |
5c703779 FC |
893 | |
894 | This feature makes these: | |
895 | ||
896 | my \$x; | |
897 | our \$y; | |
898 | ||
899 | equivalent to: | |
900 | ||
901 | \my $x; | |
902 | \our $y; |
903 | ||
904 | It is intended mainly for use in assignments to references (see | |
905 | L</Assigning to References>, above). It also allows the backslash to be | |
906 | used on just some items in a list of declared variables: | |
907 | ||
908 | my ($foo, \@bar, \%baz); # equivalent to: my $foo, \my(@bar, %baz); | |
909 | ||
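A minimal sketch of declared references used in assignments, assuming Perl 5.26 or later. Both experimental features are enabled and their warnings silenced; the variable names are hypothetical.

```perl
use v5.26;
use warnings;
use feature qw(refaliasing declared_refs);
no warnings qw(experimental::refaliasing experimental::declared_refs);

my @data = (1, 2, 3);
my \@alias = \@data;     # same as: \my @alias = \@data;
push @alias, 4;          # modifies @data, because both names share one array

my $total = 10;
my \$same = \$total;     # same as: \my $same = \$total;
$same += 5;

print scalar(@data), " $total\n";
```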
415e4667 DIM |
910 | =head1 WARNING: Don't use references as hash keys |
911 | X<reference, string context> X<reference, use as hash key> | |
912 | ||
913 | You may not (usefully) use a reference as the key to a hash. It will be | |
914 | converted into a string: | |
915 | ||
916 | $x{ \$a } = $a; | |
917 | ||
918 | If you try to dereference the key, it won't do a hard dereference, and | |
919 | you won't accomplish what you're attempting. You might want to do something | |
920 | more like | |
921 | ||
922 | $r = \@a; | |
923 | $x{ $r } = $r; | |
924 | ||
925 | And then at least you can use the values(), which will be | |
926 | real refs, instead of the keys(), which won't. | |
927 | ||
928 | The standard Tie::RefHash module provides a convenient workaround to this. | |
929 | ||
cb1a09d0 | 930 | =head1 SEE ALSO |
a0d0e21e LW |
931 | |
932 | Besides the obvious documents, source code can be instructive. | |
7c2ea1c7 | 933 | Some pathological examples of the use of references can be found |
a0d0e21e | 934 | in the F<t/op/ref.t> regression test in the Perl source directory. |
cb1a09d0 AD |
935 | |
936 | See also L<perldsc> and L<perllol> for how to use references to create | |
82e1c0d9 | 937 | complex data structures, and L<perlootut> and L<perlobj> |
5a964f20 | 938 | for how to use them to create objects. |