Commit | Line | Data |
---|---|---|
a1e2a320 GS |
1 | |
2 | =head1 NAME | |
3 | ||
4 | perlreftut - Mark's very short tutorial about references | |
5 | ||
6 | =head1 DESCRIPTION | |
7 | ||
8 | One of the most important new features in Perl 5 was the capability to | |
9 | manage complicated data structures like multidimensional arrays and | |
10 | nested hashes. To enable these, Perl 5 introduced a feature called | |
11 | `references', and using references is the key to managing complicated, | |
12 | structured data in Perl. Unfortunately, there's a lot of funny syntax | |
13 | to learn, and the main manual page can be hard to follow. The manual | |
1da6492a GS |
14 | is quite complete, and sometimes people find that a problem, because |
15 | it can be hard to tell what is important and what isn't. | |
a1e2a320 GS |
16 | |
17 | Fortunately, you only need to know 10% of what's in the main page to get | |
18 | 90% of the benefit. This page will show you that 10%. | |
19 | ||
20 | =head1 Who Needs Complicated Data Structures? | |
21 | ||
22 | One problem that came up all the time in Perl 4 was how to represent a | |
23 | hash whose values were lists. Perl 4 had hashes, of course, but the | |
24 | values had to be scalars; they couldn't be lists. | |
25 | ||
26 | Why would you want a hash of lists? Let's take a simple example: You | |
1da6492a | 27 | have a file of city and country names, like this: |
a1e2a320 | 28 | |
1da6492a GS |
29 | Chicago, USA |
30 | Frankfurt, Germany | |
31 | Berlin, Germany | |
32 | Washington, USA | |
33 | Helsinki, Finland | |
34 | New York, USA | |
a1e2a320 | 35 | |
1da6492a GS |
36 | and you want to produce an output like this, with each country mentioned |
37 | once, and then an alphabetical list of the cities in that country: | |
a1e2a320 | 38 | |
1da6492a GS |
39 | Finland: Helsinki. |
40 | Germany: Berlin, Frankfurt. | |
41 | USA: Chicago, New York, Washington. | |
a1e2a320 | 42 | |
1da6492a GS |
43 | The natural way to do this is to have a hash whose keys are country |
44 | names. Associated with each country name key is a list of the cities in | |
45 | that country. Each time you read a line of input, split it into a country | |
a1e2a320 | 46 | and a city, look up the list of cities already known to be in that |
1da6492a | 47 | country, and append the new city to the list. When you're done reading |
a1e2a320 GS |
48 | the input, iterate over the hash as usual, sorting each list of cities |
49 | before you print it out. | |
50 | ||
51 | If hash values can't be lists, you lose. In Perl 4, hash values can't | |
52 | be lists; they can only be strings. You lose. You'd probably have to | |
53 | combine all the cities into a single string somehow, and then when | |
54 | time came to write the output, you'd have to break the string into a | |
55 | list, sort the list, and turn it back into a string. This is messy | |
56 | and error-prone. And it's frustrating, because Perl already has | |
57 | perfectly good lists that would solve the problem if only you could | |
58 | use them. | |
59 | ||
60 | =head1 The Solution | |
61 | ||
1da6492a GS |
62 | By the time Perl 5 rolled around, we were already stuck with this |
63 | design: Hash values must be scalars. The solution to this is | |
a1e2a320 GS |
64 | references. |
65 | ||
66 | A reference is a scalar value that I<refers to> an entire array or an | |
1da6492a | 67 | entire hash (or to just about anything else). Names are one kind of |
e937c8c3 MJD |
68 | reference that you're already familiar with. Think of the President |
69 | of the United States: a messy, inconvenient bag of blood and bones. | |
70 | But to talk about him, or to represent him in a computer program, all | |
71 | you need is the easy, convenient scalar string "George Bush". | |
a1e2a320 GS |
72 | |
73 | References in Perl are like names for arrays and hashes. They're | |
74 | Perl's private, internal names, so you can be sure they're | |
e937c8c3 | 75 | unambiguous. Unlike "George Bush", a reference only refers to one |
a1e2a320 GS |
76 | thing, and you always know what it refers to. If you have a reference |
77 | to an array, you can recover the entire array from it. If you have a | |
78 | reference to a hash, you can recover the entire hash. But the | |
79 | reference is still an easy, compact scalar value. | |
80 | ||
81 | You can't have a hash whose values are arrays; hash values can only be | |
82 | scalars. We're stuck with that. But a single reference can refer to | |
83 | an entire array, and references are scalars, so you can have a hash of | |
84 | references to arrays, and it'll act a lot like a hash of arrays, and | |
85 | it'll be just as useful as a hash of arrays. | |
86 | ||
1da6492a | 87 | We'll come back to this city-country problem later, after we've seen |
a1e2a320 GS |
88 | some syntax for managing references. |
89 | ||
90 | ||
91 | =head1 Syntax | |
92 | ||
93 | There are just two ways to make a reference, and just two ways to use | |
94 | it once you have it. | |
95 | ||
96 | =head2 Making References | |
97 | ||
a29d1a25 | 98 | =head3 B<Make Rule 1> |
a1e2a320 GS |
99 | |
100 | If you put a C<\> in front of a variable, you get a | |
101 | reference to that variable. | |
102 | ||
103 | $aref = \@array; # $aref now holds a reference to @array | |
104 | $href = \%hash; # $href now holds a reference to %hash | |
105 | ||
106 | Once the reference is stored in a variable like $aref or $href, you | |
107 | can copy it or store it just the same as any other scalar value: | |
108 | ||
109 | $xy = $aref; # $xy now holds a reference to @array | |
110 | $p[3] = $href; # $p[3] now holds a reference to %hash | |
111 | $z = $p[3]; # $z now holds a reference to %hash | |
112 | ||
113 | ||
114 | These examples show how to make references to variables with names. | |
115 | Sometimes you want to make an array or a hash that doesn't have a | |
116 | name. This is analogous to the way you like to be able to use the | |
117 | string C<"\n"> or the number 80 without having to store it in a named | |
118 | variable first. | |
119 | ||
120 | =head3 B<Make Rule 2> | |
121 | ||
122 | C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to | |
123 | that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a | |
124 | reference to that hash. | |
125 | ||
126 | $aref = [ 1, "foo", undef, 13 ]; | |
127 | # $aref now holds a reference to an array | |
128 | ||
129 | $href = { APR => 4, AUG => 8 }; | |
130 | # $href now holds a reference to a hash | |
131 | ||
132 | ||
133 | The references you get from rule 2 are the same kind of | |
134 | references that you get from rule 1: | |
135 | ||
136 | # This: | |
137 | $aref = [ 1, 2, 3 ]; | |
138 | ||
139 | # Does the same as this: | |
140 | @array = (1, 2, 3); | |
141 | $aref = \@array; | |
142 | ||
143 | ||
144 | The first line is an abbreviation for the following two lines, except | |
145 | that it doesn't create the superfluous array variable C<@array>. | |
146 | ||
a29d1a25 JH |
147 | If you write just C<[]>, you get a new, empty anonymous array. |
148 | If you write just C<{}>, you get a new, empty anonymous hash. | |
149 | ||
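As a minimal sketch of the two Make Rules side by side, the following script builds one reference with C<\> and two with the anonymous constructors. The variable names (C<$named_count>, C<$anon_aref>, and so on) are illustrative only, and the C<@{...}> and C<${...}> forms used to inspect the results are the ones explained under B<Use Rule 1>.

```perl
use strict;
use warnings;

# Make Rule 1: a backslash in front of a named variable gives a reference to it.
my @array = (1, 2, 3);
my $aref  = \@array;

# The reference and the name denote the same array, so a change made
# through the reference is visible through the name.
push @{$aref}, 4;
my $named_count = scalar @array;          # 4

# Make Rule 2: [ ITEMS ] and { ITEMS } build anonymous structures
# and return references to them.
my $anon_aref  = [ 1, "foo", undef, 13 ];
my $anon_href  = { APR => 4, AUG => 8 };
my $anon_count = scalar @{$anon_aref};    # 4
my $aug        = ${$anon_href}{AUG};      # 8

print "named array has $named_count elements\n";
print "anonymous array has $anon_count elements; AUG => $aug\n";
```

Note that C<@array> grew even though the C<push> never mentioned it by name; that is what "a reference to that variable" means in practice.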
a1e2a320 GS |
150 | |
151 | =head2 Using References | |
152 | ||
153 | What can you do with a reference once you have it? It's a scalar | |
154 | value, and we've seen that you can store it as a scalar and get it back | |
155 | again just like any scalar. There are just two more ways to use it: | |
156 | ||
a29d1a25 | 157 | =head3 B<Use Rule 1> |
a1e2a320 | 158 | |
a29d1a25 JH |
159 | You can always use an array reference, in curly braces, in place of |
160 | the name of an array. For example, C<@{$aref}> instead of C<@array>. | |
a1e2a320 GS |
161 | |
162 | Here are some examples of that: | |
163 | ||
164 | Arrays: | |
165 | ||
166 | ||
167 | @a @{$aref} An array | |
168 | reverse @a reverse @{$aref} Reverse the array | |
169 | $a[3] ${$aref}[3] An element of the array | |
170 | $a[3] = 17; ${$aref}[3] = 17 Assigning an element | |
171 | ||
172 | ||
173 | On each line are two expressions that do the same thing. The | |
174 | left-hand versions operate on the array C<@a>, and the right-hand | |
175 | versions operate on the array that is referred to by C<$aref>, but | |
176 | once they find the array they're operating on, they do the same things | |
177 | to the arrays. | |
178 | ||
179 | Using a hash reference is I<exactly> the same: | |
180 | ||
181 | %h %{$href} A hash | |
182 | keys %h keys %{$href} Get the keys from the hash | |
183 | $h{'red'} ${$href}{'red'} An element of the hash | |
184 | $h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element | |
185 | ||
a29d1a25 JH |
186 | Whatever you want to do with a reference, B<Use Rule 1> tells you how |
187 | to do it. You just write the Perl code that you would have written | |
188 | for doing the same thing to a regular array or hash, and then replace | |
189 | the array or hash name with C<{$reference}>. "How do I loop over an | |
190 | array when all I have is a reference?" Well, to loop over an array, you | |
191 | would write | |
192 | ||
193 | for my $element (@array) { | |
194 | ... | |
195 | } | |
196 | ||
197 | so replace the array name, C<@array>, with the reference: | |
198 | ||
199 | for my $element (@{$aref}) { | |
200 | ... | |
201 | } | |
202 | ||
203 | "How do I print out the contents of a hash when all I have is a | |
204 | reference?" First write the code for printing out a hash: | |
205 | ||
206 | for my $key (keys %hash) { | |
207 | print "$key => $hash{$key}\n"; | |
208 | } | |
209 | ||
210 | And then replace the hash name with the reference: | |
211 | ||
212 | for my $key (keys %{$href}) { | |
213 | print "$key => ${$href}{$key}\n"; | |
214 | } | |
215 | ||
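The two loops above can be put together into a small runnable sketch. The data and the names C<@doubled> and C<@lines> are made up for illustration; the only technique shown is the B<Use Rule 1> substitution of C<{$aref}> or C<{$href}> for a variable name.

```perl
use strict;
use warnings;

my $aref = [ 10, 20, 30 ];
my $href = { red => 1, green => 2 };

# Wherever you would write the array name, write {$aref} instead:
# @array becomes @{$aref}.
my @doubled;
for my $element (@{$aref}) {
    push @doubled, 2 * $element;
}

# Same substitution for a hash: %hash becomes %{$href},
# and $hash{$key} becomes ${$href}{$key}.
my @lines;
for my $key (sort keys %{$href}) {
    push @lines, "$key => ${$href}{$key}";
}

print "@doubled\n";        # 20 40 60
print "$_\n" for @lines;   # green => 2, then red => 1
```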
216 | =head3 B<Use Rule 2> | |
a1e2a320 | 217 | |
a29d1a25 JH |
218 | B<Use Rule 1> is all you really need, because it tells you how to do |
219 | absolutely everything you ever need to do with references. But the | |
220 | most common thing to do with an array or a hash is to extract a single | |
221 | element, and the B<Use Rule 1> notation is cumbersome. So there is an | |
222 | abbreviation. | |
a1e2a320 | 223 | |
c47ff5f1 | 224 | C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >> |
a1e2a320 GS |
225 | instead. |
226 | ||
227 | C<${$href}{red}> is too hard to read, so you can write | |
c47ff5f1 | 228 | C<< $href->{red} >> instead. |
a1e2a320 | 229 | |
c47ff5f1 | 230 | If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is |
a1e2a320 GS |
231 | the fourth element of the array. Don't confuse this with C<$aref[3]>, |
232 | which is the fourth element of a totally different array, one | |
233 | deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the | |
234 | same way that C<$item> and C<@item> are. | |
235 | ||
c47ff5f1 | 236 | Similarly, C<< $href->{'red'} >> is part of the hash referred to by |
a1e2a320 GS |
237 | the scalar variable C<$href>, perhaps even one with no name. |
238 | C<$href{'red'}> is part of the deceptively named C<%href> hash. It's | |
c47ff5f1 | 239 | easy to leave out the C<< -> >>, and if you do, you'll get |
a1e2a320 GS |
240 | bizarre results when your program gets array and hash elements out of |
241 | totally unexpected hashes and arrays that weren't the ones you wanted | |
242 | to use. | |
243 | ||
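Here is a short sketch of the difference between C<< $aref->[3] >> and C<$aref[3]>. It deliberately declares both a scalar C<$aref> and an unrelated array C<@aref>, with made-up contents, so that the two expressions visibly return different things.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];   # a reference to an anonymous array
my @aref = ( 'W', 'X', 'Y', 'Z' );   # an unrelated array that shares the name

my $via_rule_1 = ${$aref}[3];   # 'd': Use Rule 1, long form
my $via_rule_2 = $aref->[3];    # 'd': Use Rule 2, arrow form of the same thing
my $other      = $aref[3];      # 'Z': an element of @aref, not of the referenced array

my $href = { red => 17 };
my $red  = $href->{red};        # 17: same as ${$href}{red}

print "$via_rule_1 $via_rule_2 $other $red\n";
```

Under C<use strict>, writing C<$aref[3]> when no C<@aref> has been declared is a compile-time error, which catches many missing arrows before the program runs.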
244 | ||
a29d1a25 | 245 | =head2 An Example |
a1e2a320 GS |
246 | |
247 | Let's see a quick example of how all this is useful. | |
248 | ||
249 | First, remember that C<[1, 2, 3]> makes an anonymous array containing | |
250 | C<(1, 2, 3)>, and gives you a reference to that array. | |
251 | ||
252 | Now think about | |
253 | ||
254 | @a = ( [1, 2, 3], | |
255 | [4, 5, 6], | |
256 | [7, 8, 9] | |
257 | ); | |
258 | ||
259 | @a is an array with three elements, and each one is a reference to | |
260 | another array. | |
261 | ||
262 | C<$a[1]> is one of these references. It refers to an array, the array | |
263 | containing C<(4, 5, 6)>, and because it is a reference to an array, | |
a29d1a25 | 264 | B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the |
c47ff5f1 GS |
265 | third element from that array. C<< $a[1]->[2] >> is the 6. |
266 | Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a | |
267 | two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get | |
a1e2a320 GS |
268 | or set the element in any row and any column of the array. |
269 | ||
270 | The notation still looks a little cumbersome, so there's one more | |
271 | abbreviation: | |
272 | ||
a29d1a25 | 273 | =head2 Arrow Rule |
a1e2a320 GS |
274 | |
275 | In between two B<subscripts>, the arrow is optional. | |
276 | ||
c47ff5f1 | 277 | Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the |
a29d1a25 JH |
278 | same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write |
279 | C<$a[0][1] = 23>; it means the same thing. | |
a1e2a320 GS |
280 | |
281 | Now it really looks like two-dimensional arrays! | |
282 | ||
283 | You can see why the arrows are important. Without them, we would have | |
284 | had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For | |
285 | three-dimensional arrays, they let us write C<$x[2][3][5]> instead of | |
286 | the unreadable C<${${$x[2]}[3]}[5]>. | |
287 | ||
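The three notations can be compared directly in a short sketch built on the C<@a> from the example above. The scalar C<$grid> is a hypothetical extra, added to show that the arrow is optional only I<between> subscripts, not between a scalar variable and its first subscript.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9] );

my $long  = ${$a[1]}[2];   # Use Rule 1
my $arrow = $a[1]->[2];    # Use Rule 2
my $short = $a[1][2];      # Arrow Rule: the arrow between subscripts is dropped

$a[0][1] = 23;             # assignment works the same way

# With a reference to the whole structure, the first arrow must stay;
# $grid[2][0] would mean an element of a different array, @grid.
my $grid      = \@a;
my $from_grid = $grid->[2][0];

print "$long $arrow $short $a[0][1] $from_grid\n";   # 6 6 6 23 7
```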
a1e2a320 GS |
288 | =head1 Solution |
289 | ||
1da6492a GS |
290 | Here's the answer to the problem I posed earlier, of reformatting a |
291 | file of city and country names. | |
a1e2a320 | 292 | |
a29d1a25 JH |
293 | 1 my %table; |
294 | ||
295 | 2 while (<>) { | |
296 | 3 chomp; | |
297 | 4 my ($city, $country) = split /, /; | |
298 | 5 $table{$country} = [] unless exists $table{$country}; | |
299 | 6 push @{$table{$country}}, $city; | |
300 | 7 } | |
301 | ||
302 | 8 foreach my $country (sort keys %table) { | |
303 | 9 print "$country: "; | |
304 | 10 my @cities = @{$table{$country}}; | |
305 | 11 print join ', ', sort @cities; | |
306 | 12 print ".\n"; | |
307 | 13 } | |
308 | ||
309 | ||
310 | The program has two pieces: Lines 2-7 read the input and build a data | |
311 | structure, and lines 8-13 analyze the data and print out the report. | |
312 | We're going to have a hash, C<%table>, whose keys are country names, | |
313 | and whose values are references to arrays of city names. The data | |
314 | structure will look like this: | |
315 | ||
316 | ||
317 | %table | |
318 | +-------+---+ | |
319 | | | | +-----------+--------+ | |
320 | |Germany| *---->| Frankfurt | Berlin | | |
321 | | | | +-----------+--------+ | |
322 | +-------+---+ | |
323 | | | | +----------+ | |
324 | |Finland| *---->| Helsinki | | |
325 | | | | +----------+ | |
326 | +-------+---+ | |
327 | | | | +---------+------------+----------+ | |
328 | | USA | *---->| Chicago | Washington | New York | | |
329 | | | | +---------+------------+----------+ | |
330 | +-------+---+ | |
331 | ||
332 | We'll look at output first. Supposing we already have this structure, | |
333 | how do we print it out? | |
334 | ||
335 | C<%table> is an | |
336 | ordinary hash, and we get a list of keys from it, sort the keys, and | |
337 | loop over the keys as usual. The only use of references is in line 10. | |
338 | C<$table{$country}> looks up the key C<$country> in the hash | |
339 | and gets the value, which is a reference to an array of cities in that country. | |
340 | B<Use Rule 1> says that | |
341 | we can recover the array by saying | |
342 | C<@{$table{$country}}>. Line 10 is just like | |
a1e2a320 | 343 | |
a29d1a25 | 344 | @cities = @array; |
a1e2a320 GS |
345 | |
346 | except that the name C<array> has been replaced by the reference | |
a29d1a25 JH |
347 | C<{$table{$country}}>. The C<@> tells Perl to get the entire array. |
348 | Having gotten the list of cities, we sort it, join it, and print it | |
349 | out as usual. | |
a1e2a320 | 350 | |
a29d1a25 JH |
351 | Lines 2-7 are responsible for building the structure in the first |
352 | place; here they are again: | |
a1e2a320 | 353 | |
a29d1a25 JH |
354 | 2 while (<>) { |
355 | 3 chomp; | |
356 | 4 my ($city, $country) = split /, /; | |
357 | 5 $table{$country} = [] unless exists $table{$country}; | |
358 | 6 push @{$table{$country}}, $city; | |
359 | 7 } | |
a1e2a320 | 360 | |
a29d1a25 JH |
361 | Lines 2-4 acquire a city and country name. Line 5 looks to see if the |
362 | country is already present as a key in the hash. If it's not, the | |
363 | program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new, | |
364 | empty anonymous array of cities, and installs a reference to it into | |
365 | the hash under the appropriate key. | |
a1e2a320 | 366 | |
a29d1a25 JH |
367 | Line 6 installs the city name into the appropriate array. |
368 | C<$table{$country}> now holds a reference to the array of cities seen | |
369 | in that country so far. Line 6 is exactly like | |
a1e2a320 | 370 | |
a29d1a25 | 371 | push @array, $city; |
a1e2a320 | 372 | |
a29d1a25 JH |
373 | except that the name C<array> has been replaced by the reference |
374 | C<{$table{$country}}>. The C<push> adds a city name to the end of the | |
375 | referred-to array. | |
a1e2a320 | 376 | |
a29d1a25 JH |
377 | There's one fine point I skipped. Line 5 is unnecessary, and we can |
378 | get rid of it. | |
379 | ||
380 | 2 while (<>) { | |
381 | 3 chomp; | |
382 | 4 my ($city, $country) = split /, /; | |
383 | 5 #### $table{$country} = [] unless exists $table{$country}; | |
384 | 6 push @{$table{$country}}, $city; | |
385 | 7 } | |
386 | ||
387 | If there's already an entry in C<%table> for the current C<$country>, | |
388 | then nothing is different. Line 6 will locate the value in | |
389 | C<$table{$country}>, which is a reference to an array, and push | |
390 | C<$city> into the array. But | |
391 | what does it do when | |
392 | C<$country> holds a key, say C<Greece>, that is not yet in C<%table>? | |
a1e2a320 GS |
393 | |
394 | This is Perl, so it does the exact right thing. It sees that you want | |
1da6492a | 395 | to push C<Athens> onto an array that doesn't exist, so it helpfully |
a29d1a25 JH |
396 | makes a new, empty, anonymous array for you, installs it into |
397 | C<%table>, and then pushes C<Athens> onto it. This is called | |
398 | `autovivification'--bringing things to life automatically. Perl saw | |
399 | that the key wasn't in the hash, so it created a new hash entry | |
400 | automatically. Perl saw that you wanted to use the hash value as an | |
401 | array, so it created a new empty array and installed a reference to it | |
402 | in the hash automatically. And as usual, Perl made the array one | |
403 | element longer to hold the new city name. | |
a1e2a320 GS |
404 | |
405 | =head1 The Rest | |
406 | ||
407 | I promised to give you 90% of the benefit with 10% of the details, and | |
408 | that means I left out 90% of the details. Now that you have an | |
409 | overview of the important parts, it should be easier to read the | |
410 | L<perlref> manual page, which discusses 100% of the details. | |
411 | ||
412 | Some of the highlights of L<perlref>: | |
413 | ||
414 | =over 4 | |
415 | ||
416 | =item * | |
417 | ||
418 | You can make references to anything, including scalars, functions, and | |
419 | other references. | |
420 | ||
421 | =item * | |
422 | ||
d98d5fff | 423 | In B<Use Rule 1>, you can omit the curly brackets whenever the thing |
1da6492a | 424 | inside them is an atomic scalar variable like C<$aref>. For example, |
a1e2a320 | 425 | C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as |
1da6492a | 426 | C<${$aref}[1]>. If you're just starting out, you may want to adopt |
d98d5fff | 427 | the habit of always including the curly brackets. |
a1e2a320 | 428 | |
a29d1a25 JH |
429 | =item * |
430 | ||
431 | This doesn't copy the underlying array: | |
432 | ||
433 | $aref2 = $aref1; | |
434 | ||
435 | You get two references to the same array. If you modify | |
436 | C<< $aref1->[23] >> and then look at | |
437 | C<< $aref2->[23] >> you'll see the change. | |
438 | ||
439 | To copy the array, use | |
440 | ||
441 | $aref2 = [@{$aref1}]; | |
442 | ||
443 | This uses C<[...]> notation to create a new anonymous array, and | |
444 | C<$aref2> is assigned a reference to the new array. The new array is | |
445 | initialized with the contents of the array referred to by C<$aref1>. | |
446 | ||
447 | Similarly, to copy an anonymous hash, you can use | |
448 | ||
449 | $href2 = {%{$href1}}; | |
450 | ||
a1e2a320 GS |
451 | =item * |
452 | ||
a29d1a25 JH |
453 | To see if a variable contains a reference, use the `ref' function. It |
454 | returns true if its argument is a reference. Actually it's a little | |
455 | better than that: It returns C<HASH> for hash references and C<ARRAY> | |
456 | for array references. | |
a1e2a320 GS |
457 | |
458 | =item * | |
459 | ||
460 | If you try to use a reference like a string, you get strings like | |
461 | ||
462 | ARRAY(0x80f5dec) or HASH(0x826afc0) | |
463 | ||
464 | If you ever see a string that looks like this, you'll know you | |
465 | printed out a reference by mistake. | |
466 | ||
467 | A side effect of this representation is that you can use C<eq> to see | |
468 | if two references refer to the same thing. (But you should usually use | |
469 | C<==> instead because it's much faster.) | |
470 | ||
471 | =item * | |
472 | ||
473 | You can use a string as if it were a reference. If you use the string | |
474 | C<"foo"> as an array reference, it's taken to be a reference to the | |
475 | array C<@foo>. This is called a I<soft reference> or I<symbolic reference>. | |
476 | ||
477 | =back | |
478 | ||
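Several of the highlights above (copying versus sharing, C<ref>, and comparing references with C<==>) can be checked in one short sketch. All variable names here are illustrative. The copy made with C<[ @{...} ]> is only one level deep: if the elements were themselves references, the copy would share them with the original.

```perl
use strict;
use warnings;

my $aref1 = [ 1, 2, 3 ];
my $alias = $aref1;            # a second reference to the same array
my $copy  = [ @{$aref1} ];     # a reference to a new array with the same contents

$aref1->[0] = 'changed';

my $alias_sees = $alias->[0];  # 'changed': same underlying array
my $copy_sees  = $copy->[0];   # 1: the copy is unaffected

# == compares references numerically: true only for the same underlying thing.
my $same      = ($aref1 == $alias) ? 1 : 0;   # 1
my $different = ($aref1 == $copy)  ? 1 : 0;   # 0

# ref returns a type name for references and an empty string otherwise.
my $href   = { APR => 4 };
my $kind_a = ref $aref1;       # 'ARRAY'
my $kind_h = ref $href;        # 'HASH'
my $kind_s = ref 'plain';      # ''

print "$alias_sees $copy_sees $same $different $kind_a $kind_h\n";
```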
479 | You might prefer to go on to L<perllol> instead of L<perlref>; it | |
480 | discusses lists of lists and multidimensional arrays in detail. After | |
481 | that, you should move on to L<perldsc>; it's a Data Structure Cookbook | |
482 | that shows recipes for using and printing out arrays of hashes, hashes | |
483 | of arrays, and other kinds of data. | |
484 | ||
485 | =head1 Summary | |
486 | ||
487 | Everyone needs compound data structures, and in Perl the way you get | |
488 | them is with references. There are four important rules for managing | |
489 | references: Two for making references and two for using them. Once | |
490 | you know these rules you can do most of the important things you need | |
491 | to do with references. | |
492 | ||
493 | =head1 Credits | |
494 | ||
fd97da5a | 495 | Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>) |
a1e2a320 | 496 | |
1da6492a | 497 | This article originally appeared in I<The Perl Journal> |
f224927c | 498 | ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission. |
a1e2a320 GS |
499 | |
500 | The original title was I<Understand References Today>. | |
501 | ||
1da6492a GS |
502 | =head2 Distribution Conditions |
503 | ||
504 | Copyright 1998 The Perl Journal. | |
505 | ||
506 | When included as part of the Standard Version of Perl, or as part of | |
507 | its complete documentation whether printed or otherwise, this work may | |
508 | be distributed only under the terms of Perl's Artistic License. Any | |
509 | distribution of this file or derivatives thereof outside of that | |
510 | package require that special arrangements be made with copyright | |
511 | holder. | |
512 | ||
513 | Irrespective of its distribution, all code examples in these files are | |
514 | hereby placed into the public domain. You are permitted and | |
515 | encouraged to use this code in your own programs for fun or for profit | |
516 | as you see fit. A simple comment in the code giving credit would be | |
517 | courteous but is not required. | |
a1e2a320 | 518 | |
a1e2a320 | 519 | |
1da6492a GS |
520 | |
521 | ||
522 | =cut |