Commit | Line | Data |
---|---|---|
cb1a09d0 | 1 | =head1 NAME |
4633a7c4 | 2 | |
cb1a09d0 | 3 | perldsc - Perl Data Structures Cookbook |
4633a7c4 | 4 | |
cb1a09d0 | 5 | =head1 DESCRIPTION |
4633a7c4 LW |
6 | |
7 | The single feature most sorely lacking in the Perl programming language | |
8 | prior to its 5.0 release was complex data structures. Even without direct | |
9 | language support, some valiant programmers did manage to emulate them, but | |
10 | it was hard work and not for the faint of heart. You could occasionally | |
19799a22 GS |
11 | get away with the C<$m{$x,$y}> notation borrowed from B<awk>, in which the |
12 | key is really a single string joined with C<$;>, that is C<join($;, $x, $y)>, but | |
4633a7c4 LW |
13 | traversal and sorting were difficult. More desperate programmers even |
14 | hacked Perl's internal symbol table directly, a strategy that proved hard | |
15 | to develop and maintain--to put it mildly. | |
16 | ||
17 | The 5.0 release of Perl let us have complex data structures. You | |
18 | may now write something like this and, all of a sudden, you have an array | |
19 | with three dimensions! | |
20 | ||
21 | for $x (1 .. 10) { | |
22 | for $y (1 .. 10) { | |
23 | for $z (1 .. 10) { | |
19799a22 | 24 | $AoA[$x][$y][$z] = |
4633a7c4 LW |
25 | $x ** $y + $z; |
26 | } | |
27 | } | |
28 | } | |
29 | ||
30 | Alas, however simple this may appear, underneath it's a much more | |
31 | elaborate construct than meets the eye! | |
32 | ||
19799a22 | 33 | How do you print it out? Why can't you say just C<print @AoA>? How do |
4633a7c4 LW |
34 | you sort it? How can you pass it to a function or get one of these back |
35 | from a function? Is it an object? Can you save it to disk to read | |
36 | back later? How do you access whole rows or columns of that matrix? Do | |
4973169d | 37 | all the values have to be numeric? |
4633a7c4 LW |
38 | |
39 | As you see, it's quite easy to become confused. While some small portion | |
40 | of the blame for this can be attributed to the reference-based | |
41 | implementation, it's really more due to a lack of existing documentation with | |
42 | examples designed for the beginner. | |
43 | ||
5f05dabc | 44 | This document is meant to be a detailed but understandable treatment of the |
45 | many different sorts of data structures you might want to develop. It | |
46 | should also serve as a cookbook of examples. That way, when you need to | |
47 | create one of these complex data structures, you can just pinch, pilfer, or | |
48 | purloin a drop-in example from here. | |
4633a7c4 LW |
49 | |
50 | Let's look at each of these possible constructs in detail. There are separate | |
28757baa | 51 | sections on each of the following: |
4633a7c4 LW |
52 | |
53 | =over 5 | |
54 | ||
55 | =item * arrays of arrays | |
56 | ||
57 | =item * hashes of arrays | |
58 | ||
59 | =item * arrays of hashes | |
60 | ||
61 | =item * hashes of hashes | |
62 | ||
63 | =item * more elaborate constructs | |
64 | ||
4633a7c4 LW |
65 | =back |
66 | ||
5a964f20 TC |
67 | But for now, let's look at general issues common to all |
68 | these types of data structures. | |
4633a7c4 LW |
69 | |
70 | =head1 REFERENCES | |
71 | ||
72 | The most important thing to understand about all data structures in Perl | |
73 | --including multidimensional arrays--is that even though they might | |
74 | appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally | |
5f05dabc | 75 | one-dimensional. They can hold only scalar values (meaning a string, |
4633a7c4 LW |
76 | number, or a reference). They cannot directly contain other arrays or |
77 | hashes, but instead contain I<references> to other arrays or hashes. | |
78 | ||
5f05dabc | 79 | You can't use a reference to an array or hash in quite the same way that you |
80 | would a real array or hash. For C or C++ programmers unused to | |
81 | distinguishing between arrays and pointers to the same, this can be | |
82 | confusing. If so, just think of it as the difference between a structure | |
83 | and a pointer to a structure. | |
4633a7c4 LW |
84 | |
85 | You can (and should) read more about references in the perlref(1) man | |
86 | page. Briefly, references are rather like pointers that know what they | |
87 | point to. (Objects are also a kind of reference, but we won't be needing | |
4973169d | 88 | them right away--if ever.) This means that when you have something which |
89 | looks to you like an access to a two-or-more-dimensional array and/or hash, | |
90 | what's really going on is that the base type is | |
4633a7c4 LW |
91 | merely a one-dimensional entity that contains references to the next |
92 | level. It's just that you can I<use> it as though it were a | |
93 | two-dimensional one. This is much like the way a C array of pointers | |
94 | to arrays, such as C<argv>, works. | |
95 | ||
19799a22 GS |
96 | $array[7][12] # array of arrays |
97 | $array[7]{string} # array of hashes | |
4633a7c4 LW |
98 | $hash{string}[7] # hash of arrays |
99 | $hash{string}{'another string'} # hash of hashes | |
100 | ||
5f05dabc | 101 | Now, because the top level contains only references, if you try to print |
4633a7c4 LW |
102 | out your array with a simple print() function, you'll get something |
103 | that doesn't look very nice, like this: | |
104 | ||
19799a22 GS |
105 | @AoA = ( [2, 3], [4, 5, 7], [0] ); |
106 | print $AoA[1][2]; | |
4633a7c4 | 107 | 7 |
19799a22 | 108 | print @AoA; |
4633a7c4 LW |
109 | ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) |
110 | ||
111 | ||
112 | That's because Perl doesn't (ever) implicitly dereference your variables. | |
113 | If you want to get at the thing a reference is referring to, then you have | |
114 | to do this yourself using either prefix typing indicators, like | |
115 | C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows, | |
116 | like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>. | |
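
To make the two dereferencing styles concrete, here is a minimal sketch that reuses the C<@AoA> data from above. The variable names C<@row>, C<$row_ref>, C<$elem>, C<$same>, and C<$text> are illustrative only and do not come from the original text.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Prefix form: dereference the reference stored in $AoA[1] as a whole array.
my @row = @{ $AoA[1] };          # (4, 5, 7)

# Postfix arrow form: reach one element through the reference.
my $row_ref = $AoA[1];
my $elem    = $row_ref->[2];     # 7

# Between subscripts the arrow is optional, so these name the same element.
my $same = $AoA[1]->[2] == $AoA[1][2];

# Printing needs an explicit dereference of every row.
my $text = join "; ", map { "@$_" } @AoA;
print "$text\n";                 # 2 3; 4 5 7; 0
```

Nothing here is dereferenced implicitly: each row is reached through C<@{...}>, C<@$_>, or an arrow.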
117 | ||
118 | =head1 COMMON MISTAKES | |
119 | ||
120 | The two most common mistakes made in constructing something like | |
121 | an array of arrays are either accidentally counting the number of | |
122 | elements or else taking a reference to the same memory location | |
123 | repeatedly. Here's the case where you just get the count instead | |
124 | of a nested array: | |
125 | ||
126 | for $i (1..10) { | |
19799a22 GS |
127 | @array = somefunc($i); |
128 | $AoA[$i] = @array; # WRONG! | |
4973169d | 129 | } |
4633a7c4 | 130 | |
19799a22 | 131 | That's just the simple case of assigning an array to a scalar and getting |
4633a7c4 LW |
132 | its element count. If that's what you really and truly want, then you |
133 | might do well to consider being a tad more explicit about it, like this: | |
134 | ||
135 | for $i (1..10) { | |
19799a22 GS |
136 | @array = somefunc($i); |
137 | $counts[$i] = scalar @array; | |
4973169d | 138 | } |
4633a7c4 LW |
139 | |
140 | Here's the case of taking a reference to the same memory location | |
141 | again and again: | |
142 | ||
143 | for $i (1..10) { | |
19799a22 GS |
144 | @array = somefunc($i); |
145 | $AoA[$i] = \@array; # WRONG! | |
4973169d | 146 | } |
4633a7c4 | 147 | |
5f05dabc | 148 | So, what's the big problem with that? It looks right, doesn't it? |
4633a7c4 LW |
149 | After all, I just told you that you need an array of references, so by |
150 | golly, you've made me one! | |
151 | ||
152 | Unfortunately, while this is true, it's still broken. All the references | |
19799a22 GS |
153 | in @AoA refer to the I<very same place>, and they will therefore all hold |
154 | whatever was last in @array! It's similar to the problem demonstrated in | |
4633a7c4 LW |
155 | the following C program: |
156 | ||
157 | #include <pwd.h> | |
158 | main() { | |
159 | struct passwd *getpwnam(), *rp, *dp; | |
160 | rp = getpwnam("root"); | |
161 | dp = getpwnam("daemon"); | |
162 | ||
4973169d | 163 | printf("daemon name is %s\nroot name is %s\n", |
4633a7c4 LW |
164 | dp->pw_name, rp->pw_name); |
165 | } | |
166 | ||
167 | Which will print | |
168 | ||
169 | daemon name is daemon | |
4973169d | 170 | root name is daemon |
4633a7c4 LW |
171 | |
172 | The problem is that both C<rp> and C<dp> are pointers to the same location | |
173 | in memory! In C, you'd have to remember to malloc() yourself some new | |
174 | memory. In Perl, you'll want to use the array constructor C<[]> or the | |
175 | hash constructor C<{}> instead. Here's the right way to do the preceding | |
4973169d | 176 | broken code fragments: |
4633a7c4 LW |
177 | |
178 | for $i (1..10) { | |
19799a22 GS |
179 | @array = somefunc($i); |
180 | $AoA[$i] = [ @array ]; | |
4973169d | 181 | } |
4633a7c4 LW |
182 | |
183 | The square brackets make a reference to a new array with a I<copy> | |
19799a22 | 184 | of what's in @array at the time of the assignment. This is what |
4973169d | 185 | you want. |
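
The difference between the broken and the corrected loop is easy to see side by side. The following is a small sketch, not part of the original examples: C<somefunc()> is a hypothetical stand-in that returns a short list, and C<@shared> and C<@copied> are names invented for the comparison.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in that returns a short list.
sub somefunc { my ($i) = @_; return ($i, $i * 10) }

my (@array, @shared, @copied);
for my $i (1 .. 3) {
    @array      = somefunc($i);
    $shared[$i] = \@array;       # every slot refers to the one @array
    $copied[$i] = [ @array ];    # every slot gets its own fresh copy
}

print "shared: $shared[1][0] $shared[2][0] $shared[3][0]\n";   # shared: 3 3 3
print "copied: $copied[1][0] $copied[2][0] $copied[3][0]\n";   # copied: 1 2 3
```

All three entries of C<@shared> show whatever was last assigned to C<@array>, while C<@copied> keeps each iteration's values.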
4633a7c4 LW |
186 | |
187 | Note that this will produce something similar, but it's | |
188 | much harder to read: | |
189 | ||
190 | for $i (1..10) { | |
19799a22 GS |
191 | @array = 0 .. $i; |
192 | @{$AoA[$i]} = @array; | |
4973169d | 193 | } |
4633a7c4 LW |
194 | |
195 | Is it the same? Well, maybe so--and maybe not. The subtle difference | |
196 | is that when you assign something in square brackets, you know for sure | |
197 | it's always a brand new reference with a new I<copy> of the data. | |
19799a22 | 198 | Something else could be going on in this new case with the C<@{$AoA[$i]}> |
4633a7c4 | 199 | dereference on the left-hand-side of the assignment. It all depends on |
19799a22 GS |
200 | whether C<$AoA[$i]> had been undefined to start with, or whether it |
201 | already contained a reference. If you had already populated @AoA with | |
4633a7c4 LW |
202 | references, as in |
203 | ||
19799a22 | 204 | $AoA[3] = \@another_array; |
4633a7c4 LW |
205 | |
206 | Then the assignment with the indirection on the left-hand-side would | |
207 | use the existing reference that was already there: | |
208 | ||
19799a22 | 209 | @{$AoA[3]} = @array; |
4633a7c4 LW |
210 | |
211 | Of course, this I<would> have the "interesting" effect of clobbering | |
19799a22 | 212 | @another_array. (Have you ever noticed how when a programmer says |
4633a7c4 LW |
213 | something is "interesting", that rather than meaning "intriguing", |
214 | they're disturbingly more apt to mean that it's "annoying", | |
215 | "difficult", or both? :-) | |
216 | ||
5f05dabc | 217 | So just remember always to use the array or hash constructors with C<[]> |
4633a7c4 | 218 | or C<{}>, and you'll be fine, although it's not always optimally |
4973169d | 219 | efficient. |
4633a7c4 LW |
220 | |
221 | Surprisingly, the following dangerous-looking construct will | |
222 | actually work out fine: | |
223 | ||
224 | for $i (1..10) { | |
19799a22 GS |
225 | my @array = somefunc($i); |
226 | $AoA[$i] = \@array; | |
4973169d | 227 | } |
4633a7c4 LW |
228 | |
229 | That's because my() is more of a run-time statement than it is a | |
230 | compile-time declaration I<per se>. This means that the my() variable is | |
231 | remade afresh each time through the loop. So even though it I<looks> as | |
232 | though you stored the same variable reference each time, you actually did | |
233 | not! This is a subtle distinction that can produce more efficient code at | |
234 | the risk of misleading all but the most experienced of programmers. So I | |
235 | usually advise against teaching it to beginners. In fact, except for | |
236 | passing arguments to functions, I seldom like to see the gimme-a-reference | |
237 | operator (backslash) used much at all in code. Instead, I advise | |
238 | beginners that they (and most of the rest of us) should try to use the | |
239 | much more easily understood constructors C<[]> and C<{}> instead of | |
240 | relying upon lexical (or dynamic) scoping and hidden reference-counting to | |
241 | do the right thing behind the scenes. | |
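
As a quick check of the claim about my(), this sketch (with made-up data in place of C<somefunc()>) counts how many distinct arrays end up stored. The C<%seen> hash is only a device for the demonstration.

```perl
use strict;
use warnings;

my @AoA;
for my $i (1 .. 3) {
    my @array = ($i) x $i;       # a new @array each time through the loop
    $AoA[$i] = \@array;          # safe only because @array is lexical here
}

# Each stored reference stringifies to a different address.
my %seen;
$seen{$_} = 1 for @AoA[1 .. 3];
print scalar(keys %seen), " distinct arrays\n";   # 3 distinct arrays
print "@{ $AoA[2] }\n";                           # 2 2
```

Drop the C<my> inside the loop (and declare C<@array> outside it) and the count falls to one, which is the broken case described earlier.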
242 | ||
243 | In summary: | |
244 | ||
19799a22 GS |
245 | $AoA[$i] = [ @array ]; # usually best |
246 | $AoA[$i] = \@array; # perilous; just how my() was that array? | |
247 | @{ $AoA[$i] } = @array; # way too tricky for most programmers | |
4633a7c4 LW |
248 | |
249 | ||
4973169d | 250 | =head1 CAVEAT ON PRECEDENCE |
4633a7c4 | 251 | |
19799a22 | 252 | Speaking of things like C<@{$AoA[$i]}>, the following are actually the |
4633a7c4 LW |
253 | same thing: |
254 | ||
19799a22 GS |
255 | $aref->[2][2] # clear |
256 | $$aref[2][2] # confusing | |
4633a7c4 LW |
257 | |
258 | That's because Perl's precedence rules on its five prefix dereferencers | |
259 | (which look like someone swearing: C<$ @ * % &>) make them bind more | |
260 | tightly than the postfix subscripting brackets or braces! This will no | |
261 | doubt come as a great shock to the C or C++ programmer, who is quite | |
262 | accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> | |
263 | element of C<a>. That is, they first take the subscript, and only then | |
264 | dereference the thing at that subscript. That's fine in C, but this isn't C. | |
265 | ||
19799a22 GS |
266 | The seemingly equivalent construct in Perl, C<$$aref[$i]>, first |
267 | dereferences $aref, treating it as a reference to an | |
4633a7c4 | 268 | array, and only then gives you the I<i'th> value |
19799a22 GS |
269 | of the array pointed to by $aref. If you wanted the C notion, you'd have to |
270 | write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first | |
4633a7c4 LW |
271 | before the leading C<$> dereferencer. |
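
A short sketch may help keep the two readings apart. It assumes a small C<@AoA> and a reference C<$aref> to it, both invented for this illustration.

```perl
use strict;
use warnings;

my @AoA  = ( [ "fred", "barney" ], [ "homer", "marge" ] );
my $aref = \@AoA;

# $$aref[1] dereferences $aref first, then subscripts: element 1 of @AoA.
my $row = $$aref[1];             # same as $aref->[1]

# ${ $AoA[1] }[0] subscripts @AoA first, then dereferences that element.
my $name = ${ $AoA[1] }[0];      # same as $AoA[1][0]

print ref($row), " $name\n";     # ARRAY homer
```

The braces in the second form are what force the subscript to happen before the dereference.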
272 | ||
273 | =head1 WHY YOU SHOULD ALWAYS C<use strict> | |
274 | ||
275 | If this is starting to sound scarier than it's worth, relax. Perl has | |
276 | some features to help you avoid its most common pitfalls. The best | |
277 | way to avoid getting confused is to start every program like this: | |
278 | ||
279 | #!/usr/bin/perl -w | |
280 | use strict; | |
281 | ||
282 | This way, you'll be forced to declare all your variables with my(), and | |
283 | accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done | |
284 | this: | |
285 | ||
19799a22 | 286 | my $aref = [ |
4633a7c4 LW |
287 | [ "fred", "barney", "pebbles", "bambam", "dino", ], |
288 | [ "homer", "bart", "marge", "maggie", ], | |
5f05dabc | 289 | [ "george", "jane", "elroy", "judy", ], |
4633a7c4 LW |
290 | ]; |
291 | ||
19799a22 | 292 | print $aref[2][2]; |
4633a7c4 LW |
293 | |
294 | The compiler would immediately flag that as an error I<at compile time>, | |
19799a22 | 295 | because you were accidentally accessing C<@aref>, an undeclared |
5f05dabc | 296 | variable, and it would thereby remind you to write instead: |
4633a7c4 | 297 | |
19799a22 | 298 | print $aref->[2][2]; |
4633a7c4 LW |
299 | |
300 | =head1 DEBUGGING | |
301 | ||
a6006777 | 302 | Before version 5.002, the standard Perl debugger didn't do a very nice job of |
303 | printing out complex data structures. With 5.002 or above, the | |
4973169d | 304 | debugger includes several new features, including command line editing as |
305 | well as the C<x> command to dump out complex data structures. For | |
19799a22 | 306 | example, given the assignment to $aref above, here's the debugger output: |
4633a7c4 | 307 |
19799a22 GS |
308 | DB<1> x $aref |
309 | $aref = ARRAY(0x13b5a0) | |
4633a7c4 LW |
310 | 0 ARRAY(0x1f0a24) |
311 | 0 'fred' | |
312 | 1 'barney' | |
313 | 2 'pebbles' | |
314 | 3 'bambam' | |
315 | 4 'dino' | |
316 | 1 ARRAY(0x13b558) | |
317 | 0 'homer' | |
318 | 1 'bart' | |
319 | 2 'marge' | |
320 | 3 'maggie' | |
321 | 2 ARRAY(0x13b540) | |
322 | 0 'george' | |
323 | 1 'jane' | |
5f05dabc | 324 | 2 'elroy' |
4633a7c4 LW |
325 | 3 'judy' |
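
Outside the debugger, a similar dump can be produced from within a program. The sketch below assumes a Perl installation that ships the Data::Dumper module, which this document does not otherwise rely on. The shortened C<$aref> data and the C<$dump> variable are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# A shortened version of the nested array used above.
my $aref = [
    [ "fred",  "barney" ],
    [ "homer", "bart"   ],
];

local $Data::Dumper::Indent   = 1;   # compact, fixed indentation
local $Data::Dumper::Sortkeys = 1;   # stable key order for any hashes

# Dumper() returns the structure as a string of Perl source.
my $dump = Dumper($aref);
print $dump;
```

The exact layout of the output depends on the module's settings, so treat it as a debugging aid rather than a stable format.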
326 | ||
cb1a09d0 AD |
327 | =head1 CODE EXAMPLES |
328 | ||
54310121 | 329 | Presented with little comment (these will get their own manpages someday) |
4973169d | 330 | here are short code examples illustrating access of various |
cb1a09d0 AD |
331 | types of data structures. |
332 | ||
19799a22 | 333 | =head1 ARRAYS OF ARRAYS |
cb1a09d0 | 334 | |
19799a22 | 335 | =head2 Declaration of an ARRAY OF ARRAYS |
cb1a09d0 | 336 | |
19799a22 | 337 | @AoA = ( |
cb1a09d0 AD |
338 | [ "fred", "barney" ], |
339 | [ "george", "jane", "elroy" ], | |
340 | [ "homer", "marge", "bart" ], | |
341 | ); | |
342 | ||
19799a22 | 343 | =head2 Generation of an ARRAY OF ARRAYS |
cb1a09d0 AD |
344 | |
345 | # reading from file | |
346 | while ( <> ) { | |
19799a22 | 347 | push @AoA, [ split ]; |
4973169d | 348 | } |
cb1a09d0 AD |
349 | |
350 | # calling a function | |
351 | for $i ( 1 .. 10 ) { | |
19799a22 | 352 | $AoA[$i] = [ somefunc($i) ]; |
4973169d | 353 | } |
cb1a09d0 AD |
354 | |
355 | # using temp vars | |
356 | for $i ( 1 .. 10 ) { | |
357 | @tmp = somefunc($i); | |
19799a22 | 358 | $AoA[$i] = [ @tmp ]; |
4973169d | 359 | } |
cb1a09d0 AD |
360 | |
361 | # add to an existing row | |
19799a22 | 362 | push @{ $AoA[0] }, "wilma", "betty"; |
cb1a09d0 | 363 | |
19799a22 | 364 | =head2 Access and Printing of an ARRAY OF ARRAYS |
cb1a09d0 AD |
365 | |
366 | # one element | |
19799a22 | 367 | $AoA[0][0] = "Fred"; |
cb1a09d0 AD |
368 | |
369 | # another element | |
19799a22 | 370 | $AoA[1][1] =~ s/(\w)/\u$1/; |
cb1a09d0 AD |
371 | |
372 | # print the whole thing with refs | |
19799a22 | 373 | for $aref ( @AoA ) { |
cb1a09d0 | 374 | print "\t [ @$aref ],\n"; |
4973169d | 375 | } |
cb1a09d0 AD |
376 | |
377 | # print the whole thing with indices | |
19799a22 GS |
378 | for $i ( 0 .. $#AoA ) { |
379 | print "\t [ @{$AoA[$i]} ],\n"; | |
4973169d | 380 | } |
cb1a09d0 AD |
381 | |
382 | # print the whole thing one at a time | |
19799a22 GS |
383 | for $i ( 0 .. $#AoA ) { |
384 | for $j ( 0 .. $#{ $AoA[$i] } ) { | |
385 | print "elt $i $j is $AoA[$i][$j]\n"; | |
cb1a09d0 | 386 | } |
4973169d | 387 | } |
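
The introduction asked how to get at whole rows or columns and how to sort such a structure. The following is one possible sketch, not the only way to do it. The names C<@row>, C<@column>, and C<@sorted> are illustrative.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney", "wilma" ],
    [ "george", "jane",   "elroy" ],
    [ "homer",  "marge",  "bart"  ],
);

# A whole row is just one dereferenced element.
my @row = @{ $AoA[1] };

# A column has to be collected one row at a time.
my @column = map { $_->[0] } @AoA;

# Sort the rows by their first element without disturbing @AoA.
my @sorted = sort { $a->[0] cmp $b->[0] } @AoA;

print "row:    @row\n";          # row:    george jane elroy
print "column: @column\n";       # column: fred george homer
print "first:  $sorted[0][0]\n"; # first:  fred
```

Note that C<@sorted> holds the same row references as C<@AoA>, only in a different order, so changing a row through one changes it in the other.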
cb1a09d0 | 388 | |
19799a22 | 389 | =head1 HASHES OF ARRAYS |
cb1a09d0 | 390 | |
19799a22 | 391 | =head2 Declaration of a HASH OF ARRAYS |
cb1a09d0 | 392 | |
19799a22 | 393 | %HoA = ( |
28757baa | 394 | flintstones => [ "fred", "barney" ], |
395 | jetsons => [ "george", "jane", "elroy" ], | |
396 | simpsons => [ "homer", "marge", "bart" ], | |
cb1a09d0 AD |
397 | ); |
398 | ||
19799a22 | 399 | =head2 Generation of a HASH OF ARRAYS |
cb1a09d0 AD |
400 | |
401 | # reading from file | |
402 | # flintstones: fred barney wilma dino | |
403 | while ( <> ) { | |
404 | next unless s/^(.*?):\s*//; | |
19799a22 | 405 | $HoA{$1} = [ split ]; |
4973169d | 406 | } |
cb1a09d0 AD |
407 | |
408 | # reading from file; more temps | |
409 | # flintstones: fred barney wilma dino | |
410 | while ( $line = <> ) { | |
411 | ($who, $rest) = split /:\s*/, $line, 2; | |
412 | @fields = split ' ', $rest; | |
19799a22 | 413 | $HoA{$who} = [ @fields ]; |
4973169d | 414 | } |
cb1a09d0 AD |
415 | |
416 | # calling a function that returns a list | |
417 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
19799a22 | 418 | $HoA{$group} = [ get_family($group) ]; |
4973169d | 419 | } |
cb1a09d0 AD |
420 | |
421 | # likewise, but using temps | |
422 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
423 | @members = get_family($group); | |
19799a22 | 424 | $HoA{$group} = [ @members ]; |
4973169d | 425 | } |
cb1a09d0 AD |
426 | |
427 | # append new members to an existing family | |
19799a22 | 428 | push @{ $HoA{"flintstones"} }, "wilma", "betty"; |
cb1a09d0 | 429 | |
19799a22 | 430 | =head2 Access and Printing of a HASH OF ARRAYS |
cb1a09d0 AD |
431 | |
432 | # one element | |
19799a22 | 433 | $HoA{flintstones}[0] = "Fred"; |
cb1a09d0 AD |
434 | |
435 | # another element | |
19799a22 | 436 | $HoA{simpsons}[1] =~ s/(\w)/\u$1/; |
cb1a09d0 AD |
437 | |
438 | # print the whole thing | |
19799a22 GS |
439 | foreach $family ( keys %HoA ) { |
440 | print "$family: @{ $HoA{$family} }\n" | |
4973169d | 441 | } |
cb1a09d0 AD |
442 | |
443 | # print the whole thing with indices | |
19799a22 | 444 | foreach $family ( keys %HoA ) { |
cb1a09d0 | 445 | print "$family: "; |
19799a22 GS |
446 | foreach $i ( 0 .. $#{ $HoA{$family} } ) { |
447 | print " $i = $HoA{$family}[$i]"; | |
cb1a09d0 AD |
448 | } |
449 | print "\n"; | |
4973169d | 450 | } |
cb1a09d0 AD |
451 | |
452 | # print the whole thing sorted by number of members | |
19799a22 GS |
453 | foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) { |
454 | print "$family: @{ $HoA{$family} }\n" | |
4973169d | 455 | } |
cb1a09d0 AD |
456 | |
457 | # print the whole thing sorted by number of members and name | |
54310121 | 458 | foreach $family ( sort { |
19799a22 | 459 | @{$HoA{$b}} <=> @{$HoA{$a}} |
28757baa | 460 | || |
461 | $a cmp $b | |
19799a22 | 462 | } keys %HoA ) |
28757baa | 463 | { |
19799a22 | 464 | print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n"; |
4973169d | 465 | } |
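
When the members of a family arrive one record at a time, the push idiom shown earlier can build the whole hash of arrays, because Perl creates the inner array the first time it is dereferenced. Here is a minimal sketch. The C<@lines> input format is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical input records of the form "family: member".
my @lines = ( "flintstones: fred", "simpsons: homer", "flintstones: barney" );

my %HoA;
for my $line (@lines) {
    my ($family, $member) = split /:\s*/, $line, 2;
    # push creates the anonymous array the first time a family is seen.
    push @{ $HoA{$family} }, $member;
}

for my $family (sort keys %HoA) {
    print "$family: @{ $HoA{$family} }\n";
}
# flintstones: fred barney
# simpsons: homer
```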
cb1a09d0 | 466 | |
19799a22 | 467 | =head1 ARRAYS OF HASHES |
cb1a09d0 | 468 | |
19799a22 | 469 | =head2 Declaration of an ARRAY OF HASHES |
cb1a09d0 | 470 | |
19799a22 | 471 | @AoH = ( |
cb1a09d0 | 472 | { |
4973169d | 473 | Lead => "fred", |
474 | Friend => "barney", | |
cb1a09d0 AD |
475 | }, |
476 | { | |
477 | Lead => "george", | |
478 | Wife => "jane", | |
479 | Son => "elroy", | |
480 | }, | |
481 | { | |
482 | Lead => "homer", | |
483 | Wife => "marge", | |
484 | Son => "bart", | |
485 | } | |
486 | ); | |
487 | ||
19799a22 | 488 | =head2 Generation of an ARRAY OF HASHES |
cb1a09d0 AD |
489 | |
490 | # reading from file | |
491 | # format: LEAD=fred FRIEND=barney | |
492 | while ( <> ) { | |
493 | $rec = {}; | |
494 | for $field ( split ) { | |
495 | ($key, $value) = split /=/, $field; | |
496 | $rec->{$key} = $value; | |
497 | } | |
19799a22 | 498 | push @AoH, $rec; |
4973169d | 499 | } |
cb1a09d0 AD |
500 | |
501 | ||
502 | # reading from file | |
503 | # format: LEAD=fred FRIEND=barney | |
504 | # no temp | |
505 | while ( <> ) { | |
19799a22 | 506 | push @AoH, { split /[\s=]+/ }; |
4973169d | 507 | } |
cb1a09d0 | 508 | |
19799a22 | 509 | # calling a function that returns a key/value pair list, like |
cb1a09d0 | 510 | # "lead","fred","daughter","pebbles" |
1fef88e7 | 511 | while ( %fields = getnextpairset() ) { |
19799a22 | 512 | push @AoH, { %fields }; |
4973169d | 513 | } |
cb1a09d0 AD |
514 | |
515 | # likewise, but using no temp vars | |
516 | while (<>) { | |
19799a22 | 517 | push @AoH, { parsepairs($_) }; |
4973169d | 518 | } |
cb1a09d0 AD |
519 | |
520 | # add key/value to an element | |
19799a22 GS |
521 | $AoH[0]{pet} = "dino"; |
522 | $AoH[2]{pet} = "santa's little helper"; | |
cb1a09d0 | 523 | |
19799a22 | 524 | =head2 Access and Printing of an ARRAY OF HASHES |
cb1a09d0 AD |
525 | |
526 | # one element | |
19799a22 | 527 | $AoH[0]{Lead} = "fred"; |
cb1a09d0 AD |
528 | |
529 | # another element | |
19799a22 | 530 | $AoH[1]{Lead} =~ s/(\w)/\u$1/; |
cb1a09d0 AD |
531 | |
532 | # print the whole thing with refs | |
19799a22 | 533 | for $href ( @AoH ) { |
cb1a09d0 AD |
534 | print "{ "; |
535 | for $role ( keys %$href ) { | |
536 | print "$role=$href->{$role} "; | |
537 | } | |
538 | print "}\n"; | |
4973169d | 539 | } |
cb1a09d0 AD |
540 | |
541 | # print the whole thing with indices | |
19799a22 | 542 | for $i ( 0 .. $#AoH ) { |
cb1a09d0 | 543 | print "$i is { "; |
19799a22 GS |
544 | for $role ( keys %{ $AoH[$i] } ) { |
545 | print "$role=$AoH[$i]{$role} "; | |
cb1a09d0 AD |
546 | } |
547 | print "}\n"; | |
4973169d | 548 | } |
cb1a09d0 AD |
549 | |
550 | # print the whole thing one at a time | |
19799a22 GS |
551 | for $i ( 0 .. $#AoH ) { |
552 | for $role ( keys %{ $AoH[$i] } ) { | |
553 | print "elt $i $role is $AoH[$i]{$role}\n"; | |
cb1a09d0 | 554 | } |
4973169d | 555 | } |
cb1a09d0 AD |
556 | |
557 | =head1 HASHES OF HASHES | |
558 | ||
559 | =head2 Declaration of a HASH OF HASHES | |
560 | ||
561 | %HoH = ( | |
28757baa | 562 | flintstones => { |
563 | lead => "fred", | |
564 | pal => "barney", | |
cb1a09d0 | 565 | }, |
28757baa | 566 | jetsons => { |
567 | lead => "george", | |
568 | wife => "jane", | |
569 | "his boy" => "elroy", | |
4973169d | 570 | }, |
28757baa | 571 | simpsons => { |
572 | lead => "homer", | |
573 | wife => "marge", | |
574 | kid => "bart", | |
4973169d | 575 | }, |
576 | ); | |
cb1a09d0 AD |
577 | |
578 | =head2 Generation of a HASH OF HASHES | |
579 | ||
580 | # reading from file | |
581 | # flintstones: lead=fred pal=barney wife=wilma pet=dino | |
582 | while ( <> ) { | |
583 | next unless s/^(.*?):\s*//; | |
584 | $who = $1; | |
585 | for $field ( split ) { | |
586 | ($key, $value) = split /=/, $field; | |
587 | $HoH{$who}{$key} = $value; | |
588 | } | |
589 | } | |
590 | ||
591 | # reading from file; more temps | |
592 | while ( <> ) { | |
593 | next unless s/^(.*?):\s*//; | |
594 | $who = $1; | |
595 | $rec = {}; | |
596 | $HoH{$who} = $rec; | |
597 | for $field ( split ) { | |
598 | ($key, $value) = split /=/, $field; | |
599 | $rec->{$key} = $value; | |
600 | } | |
4973169d | 601 | } |
cb1a09d0 | 602 | |
cb1a09d0 AD |
603 | # calling a function that returns a key,value hash |
604 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
605 | $HoH{$group} = { get_family($group) }; | |
4973169d | 606 | } |
cb1a09d0 AD |
607 | |
608 | # likewise, but using temps | |
609 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
610 | %members = get_family($group); | |
611 | $HoH{$group} = { %members }; | |
4973169d | 612 | } |
cb1a09d0 AD |
613 | |
614 | # append new members to an existing family | |
615 | %new_folks = ( | |
28757baa | 616 | wife => "wilma", |
5a964f20 | 617 | pet => "dino", |
cb1a09d0 | 618 | ); |
4973169d | 619 | |
cb1a09d0 AD |
620 | for $what (keys %new_folks) { |
621 | $HoH{flintstones}{$what} = $new_folks{$what}; | |
4973169d | 622 | } |
cb1a09d0 AD |
623 | |
624 | =head2 Access and Printing of a HASH OF HASHES | |
625 | ||
626 | # one element | |
4973169d | 627 | $HoH{flintstones}{wife} = "wilma"; |
cb1a09d0 AD |
628 | |
629 | # another element | |
630 | $HoH{simpsons}{lead} =~ s/(\w)/\u$1/; | |
631 | ||
632 | # print the whole thing | |
633 | foreach $family ( keys %HoH ) { | |
1fef88e7 | 634 | print "$family: { "; |
4973169d | 635 | for $role ( keys %{ $HoH{$family} } ) { |
cb1a09d0 AD |
636 | print "$role=$HoH{$family}{$role} "; |
637 | } | |
638 | print "}\n"; | |
4973169d | 639 | } |
cb1a09d0 AD |
640 | |
641 | # print the whole thing somewhat sorted | |
642 | foreach $family ( sort keys %HoH ) { | |
1fef88e7 | 643 | print "$family: { "; |
4973169d | 644 | for $role ( sort keys %{ $HoH{$family} } ) { |
cb1a09d0 AD |
645 | print "$role=$HoH{$family}{$role} "; |
646 | } | |
647 | print "}\n"; | |
4973169d | 648 | } |
cb1a09d0 AD |
649 | |
650 | ||
651 | # print the whole thing sorted by number of members | |
28757baa | 652 | foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) { |
1fef88e7 | 653 | print "$family: { "; |
4973169d | 654 | for $role ( sort keys %{ $HoH{$family} } ) { |
cb1a09d0 AD |
655 | print "$role=$HoH{$family}{$role} "; |
656 | } | |
657 | print "}\n"; | |
4973169d | 658 | } |
cb1a09d0 AD |
659 | |
660 | # establish a sort order (rank) for each role | |
661 | $i = 0; | |
662 | for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i } | |
663 | ||
664 | # now print the whole thing sorted by number of members | |
28757baa | 665 | foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) { |
1fef88e7 | 666 | print "$family: { "; |
cb1a09d0 | 667 | # and print these according to rank order |
28757baa | 668 | for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) { |
cb1a09d0 AD |
669 | print "$role=$HoH{$family}{$role} "; |
670 | } | |
671 | print "}\n"; | |
4973169d | 672 | } |
cb1a09d0 AD |
673 | |
674 | ||
675 | =head1 MORE ELABORATE RECORDS | |
676 | ||
677 | =head2 Declaration of MORE ELABORATE RECORDS | |
678 | ||
679 | Here's a sample showing how to create and use a record whose fields are of | |
680 | many different sorts: | |
681 | ||
682 | $rec = { | |
4973169d | 683 | TEXT => $string, |
684 | SEQUENCE => [ @old_values ], | |
685 | LOOKUP => { %some_table }, | |
686 | THATCODE => \&some_function, | |
687 | THISCODE => sub { $_[0] ** $_[1] }, | |
688 | HANDLE => \*STDOUT, | |
cb1a09d0 AD |
689 | }; |
690 | ||
4973169d | 691 | print $rec->{TEXT}; |
cb1a09d0 | 692 | |
5b2220f5 | 693 | print $rec->{SEQUENCE}[0]; |
4973169d | 694 | $last = pop @{ $rec->{SEQUENCE} }; |
cb1a09d0 AD |
695 | |
696 | print $rec->{LOOKUP}{"key"}; | |
697 | ($first_k, $first_v) = each %{ $rec->{LOOKUP} }; | |
698 | ||
6da72b64 CS |
699 | $answer = $rec->{THATCODE}->($arg); |
700 | $answer = $rec->{THISCODE}->($arg1, $arg2); | |
cb1a09d0 AD |
701 | |
702 | # careful of extra block braces on fh ref | |
4973169d | 703 | print { $rec->{HANDLE} } "a string\n"; |
cb1a09d0 AD |
704 | |
705 | use FileHandle; | |
4973169d | 706 | $rec->{HANDLE}->autoflush(1); |
707 | $rec->{HANDLE}->print(" a string\n"); | |
cb1a09d0 AD |
708 | |
709 | =head2 Declaration of a HASH OF COMPLEX RECORDS | |
710 | ||
711 | %TV = ( | |
28757baa | 712 | flintstones => { |
cb1a09d0 | 713 | series => "flintstones", |
4973169d | 714 | nights => [ qw(monday thursday friday) ], |
cb1a09d0 AD |
715 | members => [ |
716 | { name => "fred", role => "lead", age => 36, }, | |
717 | { name => "wilma", role => "wife", age => 31, }, | |
4973169d | 718 | { name => "pebbles", role => "kid", age => 4, }, |
cb1a09d0 AD |
719 | ], |
720 | }, | |
721 | ||
28757baa | 722 | jetsons => { |
cb1a09d0 | 723 | series => "jetsons", |
4973169d | 724 | nights => [ qw(wednesday saturday) ], |
cb1a09d0 AD |
725 | members => [ |
726 | { name => "george", role => "lead", age => 41, }, | |
727 | { name => "jane", role => "wife", age => 39, }, | |
728 | { name => "elroy", role => "kid", age => 9, }, | |
729 | ], | |
730 | }, | |
731 | ||
28757baa | 732 | simpsons => { |
cb1a09d0 | 733 | series => "simpsons", |
4973169d | 734 | nights => [ qw(monday) ], |
cb1a09d0 AD |
735 | members => [ |
736 | { name => "homer", role => "lead", age => 34, }, | |
737 | { name => "marge", role => "wife", age => 37, }, | |
738 | { name => "bart", role => "kid", age => 11, }, | |
739 | ], | |
740 | }, | |
741 | ); | |
742 | ||
743 | =head2 Generation of a HASH OF COMPLEX RECORDS | |
744 | ||
745 | # reading from file | |
746 | # this is most easily done by having the file itself be | |
747 | # in the raw data format as shown above. perl is happy | |
5f05dabc | 748 | # to parse complex data structures if declared as data, so |
cb1a09d0 AD |
749 | # sometimes it's easiest to do that |
750 | ||
751 | # here's a piece by piece build up | |
752 | $rec = {}; | |
753 | $rec->{series} = "flintstones"; | |
754 | $rec->{nights} = [ find_days() ]; | |
755 | ||
756 | @members = (); | |
757 | # assume this file in field=value syntax | |
1fef88e7 | 758 | while (<>) { |
cb1a09d0 AD |
759 | %fields = split /[\s=]+/; |
760 | push @members, { %fields }; | |
761 | } | |
762 | $rec->{members} = [ @members ]; | |
763 | ||
764 | # now remember the whole thing | |
765 | $TV{ $rec->{series} } = $rec; | |
766 | ||
767 | ########################################################### | |
768 | # now, you might want to make interesting extra fields that | |
769 | # include pointers back into the same data structure so if you | |
19799a22 GS |
770 | # change one piece, it changes everywhere, like for example |
771 | # if you wanted a {kids} field that was a reference | |
772 | # to an array of the kids' records without having duplicate | |
cb1a09d0 AD |
773 | # records and thus update problems. |
774 | ########################################################### | |
775 | foreach $family (keys %TV) { | |
776 | $rec = $TV{$family}; # temp pointer | |
777 | @kids = (); | |
28757baa | 778 | for $person ( @{ $rec->{members} } ) { |
cb1a09d0 AD |
779 | if ($person->{role} =~ /kid|son|daughter/) { |
780 | push @kids, $person; | |
781 | } | |
782 | } | |
783 | # REMEMBER: $rec and $TV{$family} point to same data!! | |
784 | $rec->{kids} = [ @kids ]; | |
785 | } | |
786 | ||
19799a22 | 787 | # you copied the array, but the array itself contains pointers |
cb1a09d0 AD |
788 | # to uncopied objects. this means that if you make bart get |
789 | # older via | |
790 | ||
791 | $TV{simpsons}{kids}[0]{age}++; | |
792 | ||
793 | # then this would also change in | |
794 | print $TV{simpsons}{members}[2]{age}; | |
795 | ||
796 | # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] | |
797 | # both point to the same underlying anonymous hash table | |
798 | ||
799 | # print the whole thing | |
800 | foreach $family ( keys %TV ) { | |
801 | print "the $family"; | |
802 | print " is on during @{ $TV{$family}{nights} }\n"; | |
803 | print "its members are:\n"; | |
804 | for $who ( @{ $TV{$family}{members} } ) { | |
805 | print " $who->{name} ($who->{role}), age $who->{age}\n"; | |
806 | } | |
28757baa | 807 | print "it turns out that ", (map { $_->{name} } grep { $_->{role} eq "lead" } @{ $TV{$family}{members} }), " has "; |
cb1a09d0 AD |
808 | print scalar ( @{ $TV{$family}{kids} } ), " kids named "; |
809 | print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } ); | |
810 | print "\n"; | |
811 | } | |
812 | ||
c07a80fd | 813 | =head1 Database Ties |
814 | ||
815 | You cannot easily tie a multilevel data structure (such as a hash of | |
816 | hashes) to a dbm file. The first problem is that all but GDBM and | |
817 | Berkeley DB have size limitations, but beyond that, you also have problems | |
818 | with how references are to be represented on disk. One experimental | |
5f05dabc | 819 | module that does partially attempt to address this need is the MLDBM |
f102b883 | 820 | module. Check your nearest CPAN site as described in L<perlmodlib> for |
c07a80fd | 821 | source code to MLDBM. |
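
If all you need is to save a nested structure to disk and read it back later, rather than a live tie to a dbm file, a snapshot may be enough. The sketch below swaps in a different technique from MLDBM: it assumes the Storable and File::Temp modules are available in your Perl installation, and the data and variable names are illustrative.

```perl
use strict;
use warnings;
use Storable   qw(store retrieve);
use File::Temp qw(tempfile);

my %HoH = (
    flintstones => { lead => "fred",  pal  => "barney" },
    simpsons    => { lead => "homer", wife => "marge"  },
);

# Write the whole structure to a temporary file, then read it back.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
store(\%HoH, $file);

my $copy = retrieve($file);      # a reference to an independent copy
print $copy->{simpsons}{wife}, "\n";    # marge
```

Unlike a tied hash, the copy is not kept in sync with the file; you have to call store() again after making changes.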
822 | ||
4633a7c4 LW |
823 | =head1 SEE ALSO |
824 | ||
1fef88e7 | 825 | perlref(1), perllol(1), perldata(1), perlobj(1) |
4633a7c4 LW |
826 | |
827 | =head1 AUTHOR | |
828 | ||
9607fc9c | 829 | Tom Christiansen <F<tchrist@perl.com>> |
4633a7c4 | 830 | |
4973169d | 831 | Last update: |
28757baa | 832 | Wed Oct 23 04:57:50 MET DST 1996 |