Commit | Line | Data |
---|---|---|
cb1a09d0 | 1 | =head1 NAME |
d74e8afc | 2 | X<data structure> X<complex data structure> X<struct> |
4633a7c4 | 3 | |
cb1a09d0 | 4 | perldsc - Perl Data Structures Cookbook |
4633a7c4 | 5 | |
cb1a09d0 | 6 | =head1 DESCRIPTION |
4633a7c4 | 7 | |
cb1e035e BF |
8 | Perl lets us have complex data structures. You can write something like |
9 | this and all of a sudden, you'd have an array with three dimensions! | |
4633a7c4 | 10 | |
5939083a SF |
11 | for my $x (1 .. 10) { |
12 | for my $y (1 .. 10) { | |
13 | for my $z (1 .. 10) { | |
6a40a726 SF |
14 | $AoA[$x][$y][$z] = |
15 | $x ** $y + $z; | |
16 | } | |
17 | } | |
4633a7c4 LW |
18 | } |
19 | ||
20 | Alas, however simple this may appear, underneath it's a much more | |
21 | elaborate construct than meets the eye! | |
22 | ||
19799a22 | 23 | How do you print it out? Why can't you say just C<print @AoA>? How do |
4633a7c4 | 24 | you sort it? How can you pass it to a function or get one of these back |
d1be9408 | 25 | from a function? Is it an object? Can you save it to disk to read |
4633a7c4 | 26 | back later? How do you access whole rows or columns of that matrix? Do |
4973169d | 27 | all the values have to be numeric? |
4633a7c4 LW |
28 | |
29 | As you see, it's quite easy to become confused. While some small portion | |
30 | of the blame for this can be attributed to the reference-based | |
31 | implementation, it's really more due to a lack of existing documentation with | |
32 | examples designed for the beginner. | |
33 | ||
5f05dabc | 34 | This document is meant to be a detailed but understandable treatment of the |
35 | many different sorts of data structures you might want to develop. It | |
36 | should also serve as a cookbook of examples. That way, when you need to | |
37 | create one of these complex data structures, you can just pinch, pilfer, or | |
38 | purloin a drop-in example from here. | |
4633a7c4 LW |
39 | |
40 | Let's look at each of these possible constructs in detail. There are separate | |
28757baa | 41 | sections on each of the following: |
4633a7c4 LW |
42 | |
43 | =over 5 | |
44 | ||
45 | =item * arrays of arrays | |
46 | ||
47 | =item * hashes of arrays | |
48 | ||
49 | =item * arrays of hashes | |
50 | ||
51 | =item * hashes of hashes | |
52 | ||
53 | =item * more elaborate constructs | |
54 | ||
4633a7c4 LW |
55 | =back |
56 | ||
5a964f20 TC |
57 | But for now, let's look at general issues common to all |
58 | these types of data structures. | |
4633a7c4 LW |
59 | |
60 | =head1 REFERENCES | |
d74e8afc | 61 | X<reference> X<dereference> X<dereferencing> X<pointer> |
4633a7c4 | 62 | |
1f025261 ML |
63 | The most important thing to understand about all data structures in |
64 | Perl--including multidimensional arrays--is that even though they might | |
4633a7c4 | 65 | appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally |
5f05dabc | 66 | one-dimensional. They can hold only scalar values (meaning a string, |
4633a7c4 LW |
67 | number, or a reference). They cannot directly contain other arrays or |
68 | hashes, but instead contain I<references> to other arrays or hashes. | |
d74e8afc | 69 | X<multidimensional array> X<array, multidimensional> |
4633a7c4 | 70 | |
d1be9408 | 71 | You can't use a reference to an array or hash in quite the same way that you |
5f05dabc | 72 | would a real array or hash. For C or C++ programmers unused to |
73 | distinguishing between arrays and pointers to the same, this can be | |
74 | confusing. If so, just think of it as the difference between a structure | |
75 | and a pointer to a structure. | |
4633a7c4 | 76 | |
ba555bf5 TH |
77 | You can (and should) read more about references in L<perlref>. |
78 | Briefly, references are rather like pointers that know what they | |
4633a7c4 | 79 | point to. (Objects are also a kind of reference, but we won't be needing |
4973169d | 80 | them right away--if ever.) This means that when you have something which |
81 | looks to you like an access to a two-or-more-dimensional array and/or hash, | |
82 | what's really going on is that the base type is | |
4633a7c4 LW |
83 | merely a one-dimensional entity that contains references to the next |
84 | level. It's just that you can I<use> it as though it were a | |
85 | two-dimensional one. This is actually the way almost all C | |
86 | multidimensional arrays work as well. | |
87 | ||
6a40a726 SF |
88 | $array[7][12] # array of arrays |
89 | $array[7]{string} # array of hashes | |
90 | $hash{string}[7] # hash of arrays | |
91 | $hash{string}{'another string'} # hash of hashes | |
4633a7c4 | 92 | |
5f05dabc | 93 | Now, because the top level contains only references, if you try to print |
4633a7c4 LW |
94 | out your array with a simple print() function, you'll get something |
95 | that doesn't look very nice, like this: | |
96 | ||
5939083a | 97 | my @AoA = ( [2, 3], [4, 5, 7], [0] ); |
19799a22 | 98 | print $AoA[1][2]; |
4633a7c4 | 99 | 7 |
19799a22 | 100 | print @AoA; |
4633a7c4 LW |
101 | ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) |
102 | ||
103 | ||
104 | That's because Perl doesn't (ever) implicitly dereference your variables. | |
105 | If you want to get at the thing a reference is referring to, then you have | |
106 | to do this yourself using either prefix typing indicators, like | |
107 | C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows, | |
108 | like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>. | |
109 | ||
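
As a minimal sketch of doing that dereferencing yourself, here is one way to print the C<@AoA> shown above a row at a time. The names C<$row>, C<@lines>, and C<$elem> are only illustrative; nothing else in this document depends on them.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Each element of @AoA is a reference, so dereference it
# explicitly before interpolating it into a string.
my @lines;
for my $row (@AoA) {
    push @lines, "[ @$row ]";       # prefix form: @$row
}
print "$_\n" for @lines;

# The arrow form reaches a single element.
my $elem = $AoA[1]->[2];            # same element as $AoA[1][2]
print "$elem\n";                    # 7
```

The arrow between adjacent subscripts is optional, which is why C<$AoA[1][2]> and C<< $AoA[1]->[2] >> name the same element.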
110 | =head1 COMMON MISTAKES | |
111 | ||
112 | The two most common mistakes made in constructing something like | |
113 | an array of arrays are either accidentally counting the number of | |
114 | elements or else taking a reference to the same memory location | |
115 | repeatedly. Here's the case where you just get the count instead | |
116 | of a nested array: | |
117 | ||
5939083a SF |
118 | for my $i (1..10) { |
119 | my @array = somefunc($i); | |
6a40a726 | 120 | $AoA[$i] = @array; # WRONG! |
4973169d | 121 | } |
4633a7c4 | 122 | |
19799a22 | 123 | That's just the simple case of assigning an array to a scalar and getting |
4633a7c4 LW |
124 | its element count. If that's what you really and truly want, then you |
125 | might do well to consider being a tad more explicit about it, like this: | |
126 | ||
5939083a SF |
127 | for my $i (1..10) { |
128 | my @array = somefunc($i); | |
6a40a726 | 129 | $counts[$i] = scalar @array; |
4973169d | 130 | } |
4633a7c4 | 131 | |
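
If you want to convince yourself of what the first mistake stores, a small sketch along these lines should do. Here somefunc() is a hypothetical stand-in that merely returns a list of C<$n> items; substitute whatever your real function is.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns the list 1 .. $n.
sub somefunc { my ($n) = @_; return (1 .. $n) }

my (@AoA, @counts);
for my $i (1 .. 3) {
    my @array = somefunc($i);
    $AoA[$i]    = @array;           # scalar context: stores the count
    $counts[$i] = scalar @array;    # same value, but it says so
}

print "$AoA[3]\n";                                      # 3
print ref($AoA[3]) ? "reference\n" : "plain scalar\n";  # plain scalar
```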
84f709e7 JH |
132 | Here's the case of taking a reference to the same memory location |
133 | again and again: | |
4633a7c4 | 134 | |
bd45a9fb KW |
135 | # Either without strict or having an outer-scope my @array; |
136 | # declaration. | |
5939083a SF |
137 | |
138 | for my $i (1..10) { | |
6a40a726 SF |
139 | @array = somefunc($i); |
140 | $AoA[$i] = \@array; # WRONG! | |
84f709e7 JH |
141 | } |
142 | ||
143 | So, what's the big problem with that? It looks right, doesn't it? | |
144 | After all, I just told you that you need an array of references, so by | |
145 | golly, you've made me one! | |
146 | ||
147 | Unfortunately, while this is true, it's still broken. All the references | |
148 | in @AoA refer to the I<very same place>, and they will therefore all hold | |
149 | whatever was last in @array! It's similar to the problem demonstrated in | |
150 | the following C program: | |
151 | ||
152 | #include <pwd.h> | |
153 | main() { | |
6a40a726 SF |
154 | struct passwd *getpwnam(), *rp, *dp; |
155 | rp = getpwnam("root"); | |
156 | dp = getpwnam("daemon"); | |
84f709e7 | 157 | |
6a40a726 SF |
158 | printf("daemon name is %s\nroot name is %s\n", |
159 | dp->pw_name, rp->pw_name); | |
84f709e7 JH |
160 | } |
161 | ||
162 | Which will print | |
163 | ||
164 | daemon name is daemon | |
165 | root name is daemon | |
166 | ||
167 | The problem is that both C<rp> and C<dp> are pointers to the same location | |
168 | in memory! In C, you'd have to remember to malloc() yourself some new | |
169 | memory. In Perl, you'll want to use the array constructor C<[]> or the | |
170 | hash constructor C<{}> instead. Here's the right way to do the preceding | |
171 | broken code fragments: | |
d74e8afc | 172 | X<[]> X<{}> |
84f709e7 | 173 | |
bd45a9fb KW |
174 | # Either without strict or having an outer-scope my @array; |
175 | # declaration. | |
5939083a SF |
176 | |
177 | for my $i (1..10) { | |
6a40a726 SF |
178 | @array = somefunc($i); |
179 | $AoA[$i] = [ @array ]; | |
4973169d | 180 | } |
4633a7c4 LW |
181 | |
182 | The square brackets make a reference to a new array with a I<copy> | |
84f709e7 JH |
183 | of what's in @array at the time of the assignment. This is what |
184 | you want. | |
4633a7c4 | 185 | |
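
To watch the broken and the repaired loops side by side, you might try a sketch like the following. It assumes a hypothetical somefunc() that returns a two-element list, and it keeps the results in two illustrative arrays, C<@shared> and C<@copied>, rather than in C<@AoA>.

```perl
use strict;
use warnings;

# Hypothetical stand-in that returns a short list.
sub somefunc { my ($n) = @_; return ($n, $n * 2) }

my @array;                          # one outer-scope array, reused below
my (@shared, @copied);

for my $i (1 .. 3) {
    @array = somefunc($i);
    $shared[$i] = \@array;          # WRONG: every slot refers to @array
    $copied[$i] = [ @array ];       # right: a fresh copy each time
}

print "shared: @{ $shared[1] }\n";  # shared: 3 6  (whatever was last)
print "copied: @{ $copied[1] }\n";  # copied: 1 2
```

Comparing two references numerically, as in C<$shared[1] == $shared[3]>, is a quick way to check whether they point at the same array.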
53e62bf8 | 186 | Note that this will produce something similar: |
4633a7c4 | 187 | |
bd45a9fb KW |
188 | # Either without strict or having an outer-scope my @array; |
189 | # declaration. | |
5939083a | 190 | for my $i (1..10) { |
6a40a726 | 191 | @array = 0 .. $i; |
53e62bf8 | 192 | $AoA[$i]->@* = @array; |
4973169d | 193 | } |
4633a7c4 LW |
194 | |
195 | Is it the same? Well, maybe so--and maybe not. The subtle difference | |
196 | is that when you assign something in square brackets, you know for sure | |
197 | it's always a brand new reference with a new I<copy> of the data. | |
53e62bf8 RS |
198 | Something else could be going on in this new case with the |
199 | C<< $AoA[$i]->@* >> dereference on the left-hand-side of the assignment. | |
200 | It all depends on whether C<$AoA[$i]> had been undefined to start with, | |
201 | or whether it already contained a reference. If you had already | |
202 | populated @AoA with references, as in | |
4633a7c4 | 203 | |
19799a22 | 204 | $AoA[3] = \@another_array; |
4633a7c4 LW |
205 | |
206 | Then the assignment with the indirection on the left-hand-side would | |
207 | use the existing reference that was already there: | |
208 | ||
53e62bf8 | 209 | $AoA[3]->@* = @array; |
4633a7c4 LW |
210 | |
211 | Of course, this I<would> have the "interesting" effect of clobbering | |
19799a22 | 212 | @another_array. (Have you ever noticed how when a programmer says |
4633a7c4 LW |
213 | something is "interesting", that rather than meaning "intriguing", |
214 | they're disturbingly more apt to mean that it's "annoying", | |
215 | "difficult", or both? :-) | |
216 | ||
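
Here is a small sketch of both outcomes, written with the circumfix form so that it also runs on older perls. The array contents are made up for the illustration.

```perl
use strict;
use warnings;

my @another_array = ("keep", "me");
my @array         = (1, 2, 3);
my @AoA;

$AoA[3] = \@another_array;      # slot 3 already holds a reference
@{ $AoA[3] } = @array;          # assigns through that reference

print "@another_array\n";       # 1 2 3: the old contents are gone

@{ $AoA[4] } = @array;          # slot 4 was undefined, so Perl
                                # makes a new array for it
print ref($AoA[4]), "\n";       # ARRAY
```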
5f05dabc | 217 | So just remember always to use the array or hash constructors with C<[]> |
4633a7c4 | 218 | or C<{}>, and you'll be fine, although it's not always optimally |
4973169d | 219 | efficient. |
4633a7c4 LW |
220 | |
221 | Surprisingly, the following dangerous-looking construct will | |
222 | actually work out fine: | |
223 | ||
5939083a | 224 | for my $i (1..10) { |
84f709e7 JH |
225 | my @array = somefunc($i); |
226 | $AoA[$i] = \@array; | |
4973169d | 227 | } |
4633a7c4 LW |
228 | |
229 | That's because my() is more of a run-time statement than it is a | |
230 | compile-time declaration I<per se>. This means that the my() variable is | |
231 | remade afresh each time through the loop. So even though it I<looks> as | |
232 | though you stored the same variable reference each time, you actually did | |
233 | not! This is a subtle distinction that can produce more efficient code at | |
234 | the risk of misleading all but the most experienced of programmers. So I | |
235 | usually advise against teaching it to beginners. In fact, except for | |
236 | passing arguments to functions, I seldom like to see the gimme-a-reference | |
237 | operator (backslash) used much at all in code. Instead, I advise | |
238 | beginners that they (and most of the rest of us) should try to use the | |
239 | much more easily understood constructors C<[]> and C<{}> instead of | |
240 | relying upon lexical (or dynamic) scoping and hidden reference-counting to | |
241 | do the right thing behind the scenes. | |
242 | ||
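
If you'd rather see that for yourself than take my word for it, a sketch such as this one shows each pass through the loop getting its own array. As before, somefunc() is a hypothetical stand-in.

```perl
use strict;
use warnings;

# Hypothetical stand-in that returns its argument twice.
sub somefunc { my ($n) = @_; return ($n, $n) }

my @AoA;
for my $i (1 .. 3) {
    my @array = somefunc($i);   # a new @array on every pass
    $AoA[$i] = \@array;
}

print "@{ $AoA[1] } | @{ $AoA[3] }\n";                  # 1 1 | 3 3
print $AoA[1] == $AoA[3] ? "same\n" : "different\n";    # different
```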
53e62bf8 RS |
243 | Note also that there exists another way to write a dereference! These |
244 | two lines are equivalent: | |
245 | ||
246 | $AoA[$i]->@* = @array; | |
247 | @{ $AoA[$i] } = @array; | |
248 | ||
249 | The first form, called I<postfix dereference>, is generally easier to | |
250 | read, because the expression can be read from left to right, and there | |
251 | are no enclosing braces to balance. On the other hand, it is also | |
252 | newer. It was added to the language in 2014, so you will often | |
253 | encounter the other form, I<circumfix dereference>, in older code. | |
254 | ||
4633a7c4 LW |
255 | In summary: |
256 | ||
bd45a9fb KW |
257 | $AoA[$i] = [ @array ]; # usually best |
258 | $AoA[$i] = \@array; # perilous; just how my() was that array? | |
53e62bf8 RS |
259 | $AoA[$i]->@* = @array; # way too tricky for most programmers |
260 | @{ $AoA[$i] } = @array; # just as tricky, and also harder to read | |
4633a7c4 | 261 | |
4973169d | 262 | =head1 CAVEAT ON PRECEDENCE |
d74e8afc | 263 | X<dereference, precedence> X<dereferencing, precedence> |
4633a7c4 | 264 | |
84f709e7 | 265 | Speaking of things like C<@{$AoA[$i]}>, the following are actually the |
4633a7c4 | 266 | same thing: |
d74e8afc | 267 | X<< -> >> |
4633a7c4 | 268 | |
6a40a726 SF |
269 | $aref->[2][2] # clear |
270 | $$aref[2][2] # confusing | |
4633a7c4 LW |
271 | |
272 | That's because Perl's precedence rules on its five prefix dereferencers | |
273 | (which look like someone swearing: C<$ @ * % &>) make them bind more | |
274 | tightly than the postfix subscripting brackets or braces! This will no | |
275 | doubt come as a great shock to the C or C++ programmer, who is quite | |
276 | accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> | |
277 | element of C<a>. That is, they first take the subscript, and only then | |
278 | dereference the thing at that subscript. That's fine in C, but this isn't C. | |
279 | ||
19799a22 GS |
280 | The seemingly equivalent construct in Perl, C<$$aref[$i]>, first |
281 | dereferences $aref, treating it as a reference to an | |
4633a7c4 | 282 | array, and only then gives you the I<i'th> value |
53e62bf8 RS |
283 | of the array pointed to by $aref. If you wanted the C notion, you could |
284 | write C<< $AoA[$i]->$* >> to explicitly dereference the I<i'th> item, | |
285 | reading left to right. | |
4633a7c4 LW |
286 | |
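
A quick sketch, using a made-up three-by-three C<$aref>, shows that the clear and the confusing spellings really do reach the same element:

```perl
use strict;
use warnings;

my $aref = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ];

my $clear     = $aref->[2][2];
my $confusing = $$aref[2][2];       # $aref is dereferenced first
my $braced    = ${$aref}[2][2];     # what the line above really means

print "$clear $confusing $braced\n";    # 9 9 9
```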
287 | =head1 WHY YOU SHOULD ALWAYS C<use strict> | |
288 | ||
289 | If this is starting to sound scarier than it's worth, relax. Perl has | |
290 | some features to help you avoid its most common pitfalls. The best | |
494ca1ca | 291 | way to avoid getting confused is to start every program with: |
4633a7c4 | 292 | |
4633a7c4 LW |
293 | use strict; |
294 | ||
295 | This way, you'll be forced to declare all your variables with my(), and | |
296 | accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done | |
297 | this: | |
298 | ||
19799a22 | 299 | my $aref = [ |
6a40a726 SF |
300 | [ "fred", "barney", "pebbles", "bambam", "dino", ], |
301 | [ "homer", "bart", "marge", "maggie", ], | |
302 | [ "george", "jane", "elroy", "judy", ], | |
4633a7c4 LW |
303 | ]; |
304 | ||
19799a22 | 305 | print $aref[2][2]; |
4633a7c4 LW |
306 | |
307 | The compiler would immediately flag that as an error I<at compile time>, | |
19799a22 | 308 | because you were accidentally accessing C<@aref>, an undeclared |
5f05dabc | 309 | variable, and it would thereby remind you to write instead: |
4633a7c4 | 310 | |
19799a22 | 311 | print $aref->[2][2];
4633a7c4 LW |
312 | |
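
One way to see the complaint without killing your program is to compile the faulty line inside a string eval, as in this sketch. The variable names C<$ok> and C<$error> are illustrative, and the exact wording of the message may differ between perl versions.

```perl
use strict;
use warnings;

# Compile the faulty snippet at run time so that this script can
# go on to report the error instead of dying on it.
my $ok = eval q{
    use strict;
    my $aref = [ [ "fred" ], [ "homer" ] ];
    my $name = $aref[1][0];     # oops: @aref was never declared
    1;
};
my $error = $ok ? "" : $@;

print "strict says: $error";
```

In a real program you would not wrap the code like this; the compiler would simply refuse to run the file until you fixed the line.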
313 | =head1 DEBUGGING | |
d74e8afc ITB |
314 | X<data structure, debugging> X<complex data structure, debugging> |
315 | X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging> | |
316 | X<array of arrays, debugging> X<hash of arrays, debugging> | |
317 | X<array of hashes, debugging> X<hash of hashes, debugging> | |
4633a7c4 | 318 | |
cb1e035e BF |
319 | You can use the debugger's C<x> command to dump out complex data structures. |
320 | For example, given the same data as above held in $AoA, here's the debugger output: | |
4633a7c4 | 321 | |
19799a22 GS |
322 | DB<1> x $AoA |
323 | $AoA = ARRAY(0x13b5a0) | |
4633a7c4 | 324 | 0 ARRAY(0x1f0a24) |
6a40a726 SF |
325 | 0 'fred' |
326 | 1 'barney' | |
327 | 2 'pebbles' | |
328 | 3 'bambam' | |
329 | 4 'dino' | |
4633a7c4 | 330 | 1 ARRAY(0x13b558) |
6a40a726 SF |
331 | 0 'homer' |
332 | 1 'bart' | |
333 | 2 'marge' | |
334 | 3 'maggie' | |
4633a7c4 | 335 | 2 ARRAY(0x13b540) |
6a40a726 SF |
336 | 0 'george' |
337 | 1 'jane' | |
338 | 2 'elroy' | |
339 | 3 'judy' | |
4633a7c4 | 340 | |
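
Outside the debugger, the core Data::Dumper module gives a similar view. This is a different tool from the C<x> command, offered here only as a sketch; the shortened data and the two settings chosen are just one reasonable setup.

```perl
use strict;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred",  "barney" ],
    [ "homer", "bart"   ],
];

local $Data::Dumper::Indent   = 1;  # compact, fixed indentation
local $Data::Dumper::Sortkeys = 1;  # stable order for any hash keys

my $dump = Dumper($AoA);
print $dump;
```

Because Dumper() emits Perl source, its output also suits the "declare it as data" approach to storing a structure in a file.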
cb1a09d0 AD |
341 | =head1 CODE EXAMPLES |
342 | ||
247c9d46 RS |
343 | Presented here, with little comment, are short code examples illustrating |
344 | access to various types of data structures. | |
cb1a09d0 | 345 | |
19799a22 | 346 | =head1 ARRAYS OF ARRAYS |
d74e8afc | 347 | X<array of arrays> X<AoA> |
cb1a09d0 | 348 | |
d1be9408 | 349 | =head2 Declaration of an ARRAY OF ARRAYS |
cb1a09d0 | 350 | |
84f709e7 JH |
351 | @AoA = ( |
352 | [ "fred", "barney" ], | |
353 | [ "george", "jane", "elroy" ], | |
354 | [ "homer", "marge", "bart" ], | |
cb1a09d0 AD |
355 | ); |
356 | ||
d1be9408 | 357 | =head2 Generation of an ARRAY OF ARRAYS |
cb1a09d0 AD |
358 | |
359 | # reading from file | |
360 | while ( <> ) { | |
19799a22 | 361 | push @AoA, [ split ]; |
4973169d | 362 | } |
cb1a09d0 AD |
363 | |
364 | # calling a function | |
84f709e7 | 365 | for $i ( 1 .. 10 ) { |
19799a22 | 366 | $AoA[$i] = [ somefunc($i) ]; |
4973169d | 367 | } |
cb1a09d0 AD |
368 | |
369 | # using temp vars | |
84f709e7 JH |
370 | for $i ( 1 .. 10 ) { |
371 | @tmp = somefunc($i); | |
372 | $AoA[$i] = [ @tmp ]; | |
4973169d | 373 | } |
cb1a09d0 AD |
374 | |
375 | # add to an existing row | |
53e62bf8 | 376 | push $AoA[0]->@*, "wilma", "betty"; |
cb1a09d0 | 377 | |
d1be9408 | 378 | =head2 Access and Printing of an ARRAY OF ARRAYS |
cb1a09d0 AD |
379 | |
380 | # one element | |
84f709e7 | 381 | $AoA[0][0] = "Fred"; |
cb1a09d0 AD |
382 | |
383 | # another element | |
19799a22 | 384 | $AoA[1][1] =~ s/(\w)/\u$1/; |
cb1a09d0 AD |
385 | |
386 | # print the whole thing with refs | |
84f709e7 | 387 | for $aref ( @AoA ) { |
cb1a09d0 | 388 | print "\t [ @$aref ],\n"; |
4973169d | 389 | } |
cb1a09d0 AD |
390 | |
391 | # print the whole thing with indices | |
84f709e7 | 392 | for $i ( 0 .. $#AoA ) { |
53e62bf8 | 393 | print "\t [ @{ $AoA[$i] } ],\n";
4973169d | 394 | } |
cb1a09d0 AD |
395 | |
396 | # print the whole thing one at a time | |
84f709e7 | 397 | for $i ( 0 .. $#AoA ) { |
8084d6ce RS |
398 | for $j ( 0 .. $AoA[$i]->$#* ) { |
399 | print "elem at ($i, $j) is $AoA[$i][$j]\n"; | |
cb1a09d0 | 400 | } |
4973169d | 401 | } |
cb1a09d0 | 402 | |
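
Whole rows and columns come up often enough to deserve a sketch of their own. The variable names C<@row>, C<@column>, and C<@slice> below are illustrative only.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart"  ],
);

my @row    = @{ $AoA[1] };              # a whole row (copied)
my @column = map { $_->[1] } @AoA;      # element 1 of every row
my @slice  = @{ $AoA[2] }[0, 2];        # elements 0 and 2 of one row

print "row: @row\n";                    # row: george jane elroy
print "column: @column\n";              # column: barney jane marge
print "slice: @slice\n";                # slice: homer bart
```

A row is cheap because it is already an array; a column has to be gathered one element at a time, since no single array holds it.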
19799a22 | 403 | =head1 HASHES OF ARRAYS |
d74e8afc | 404 | X<hash of arrays> X<HoA> |
cb1a09d0 | 405 | |
19799a22 | 406 | =head2 Declaration of a HASH OF ARRAYS |
cb1a09d0 | 407 | |
84f709e7 JH |
408 | %HoA = ( |
409 | flintstones => [ "fred", "barney" ], | |
410 | jetsons => [ "george", "jane", "elroy" ], | |
411 | simpsons => [ "homer", "marge", "bart" ], | |
cb1a09d0 AD |
412 | ); |
413 | ||
19799a22 | 414 | =head2 Generation of a HASH OF ARRAYS |
cb1a09d0 AD |
415 | |
416 | # reading from file | |
417 | # flintstones: fred barney wilma dino | |
418 | while ( <> ) { | |
84f709e7 | 419 | next unless s/^(.*?):\s*//; |
19799a22 | 420 | $HoA{$1} = [ split ]; |
4973169d | 421 | } |
cb1a09d0 AD |
422 | |
423 | # reading from file; more temps | |
424 | # flintstones: fred barney wilma dino | |
84f709e7 JH |
425 | while ( $line = <> ) { |
426 | ($who, $rest) = split /:\s*/, $line, 2; | |
427 | @fields = split ' ', $rest; | |
428 | $HoA{$who} = [ @fields ]; | |
4973169d | 429 | } |
cb1a09d0 AD |
430 | |
431 | # calling a function that returns a list | |
84f709e7 | 432 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
19799a22 | 433 | $HoA{$group} = [ get_family($group) ]; |
4973169d | 434 | } |
cb1a09d0 AD |
435 | |
436 | # likewise, but using temps | |
84f709e7 JH |
437 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
438 | @members = get_family($group); | |
439 | $HoA{$group} = [ @members ]; | |
4973169d | 440 | } |
cb1a09d0 AD |
441 | |
442 | # append new members to an existing family | |
53e62bf8 | 443 | push $HoA{flintstones}->@*, "wilma", "betty"; |
cb1a09d0 | 444 | |
19799a22 | 445 | =head2 Access and Printing of a HASH OF ARRAYS |
cb1a09d0 AD |
446 | |
447 | # one element | |
84f709e7 | 448 | $HoA{flintstones}[0] = "Fred"; |
cb1a09d0 AD |
449 | |
450 | # another element | |
19799a22 | 451 | $HoA{simpsons}[1] =~ s/(\w)/\u$1/; |
cb1a09d0 AD |
452 | |
453 | # print the whole thing | |
84f709e7 | 454 | foreach $family ( keys %HoA ) { |
53e62bf8 | 455 | print "$family: @{ $HoA{$family} }\n";
4973169d | 456 | } |
cb1a09d0 AD |
457 | |
458 | # print the whole thing with indices | |
84f709e7 JH |
459 | foreach $family ( keys %HoA ) { |
460 | print "family: "; | |
53e62bf8 | 461 | foreach $i ( 0 .. $HoA{$family}->$#* ) { |
19799a22 | 462 | print " $i = $HoA{$family}[$i]"; |
cb1a09d0 AD |
463 | } |
464 | print "\n"; | |
4973169d | 465 | } |
cb1a09d0 AD |
466 | |
467 | # print the whole thing sorted by number of members | |
53e62bf8 RS |
468 | foreach $family ( sort { $HoA{$b}->@* <=> $HoA{$a}->@* } keys %HoA ) { |
469 | print "$family: @{ $HoA{$family} }\n"; | |
4973169d | 470 | } |
cb1a09d0 AD |
471 | |
472 | # print the whole thing sorted by number of members and name | |
84f709e7 | 473 | foreach $family ( sort { |
53e62bf8 RS |
474 | $HoA{$b}->@* <=> $HoA{$a}->@* |
475 | || | |
476 | $a cmp $b | |
6a40a726 | 477 | } keys %HoA ) |
84f709e7 | 478 | { |
53e62bf8 | 479 | print "$family: ", join(", ", sort $HoA{$family}->@* ), "\n"; |
4973169d | 480 | } |
cb1a09d0 | 481 | |
19799a22 | 482 | =head1 ARRAYS OF HASHES |
d74e8afc | 483 | X<array of hashes> X<AoH> |
cb1a09d0 | 484 | |
d1be9408 | 485 | =head2 Declaration of an ARRAY OF HASHES |
cb1a09d0 | 486 | |
84f709e7 | 487 | @AoH = ( |
cb1a09d0 | 488 | { |
84f709e7 JH |
489 | Lead => "fred", |
490 | Friend => "barney", | |
cb1a09d0 AD |
491 | }, |
492 | { | |
84f709e7 JH |
493 | Lead => "george", |
494 | Wife => "jane", | |
495 | Son => "elroy", | |
cb1a09d0 AD |
496 | }, |
497 | { | |
84f709e7 JH |
498 | Lead => "homer", |
499 | Wife => "marge", | |
500 | Son => "bart", | |
cb1a09d0 AD |
501 | } |
502 | ); | |
503 | ||
d1be9408 | 504 | =head2 Generation of an ARRAY OF HASHES |
cb1a09d0 AD |
505 | |
506 | # reading from file | |
507 | # format: LEAD=fred FRIEND=barney | |
508 | while ( <> ) { | |
84f709e7 JH |
509 | $rec = {}; |
510 | for $field ( split ) { | |
511 | ($key, $value) = split /=/, $field; | |
512 | $rec->{$key} = $value; | |
cb1a09d0 | 513 | } |
19799a22 | 514 | push @AoH, $rec; |
4973169d | 515 | } |
cb1a09d0 AD |
516 | |
517 | ||
518 | # reading from file | |
519 | # format: LEAD=fred FRIEND=barney | |
520 | # no temp | |
521 | while ( <> ) { | |
19799a22 | 522 | push @AoH, { split /[\s=]+/ };
4973169d | 523 | } |
cb1a09d0 | 524 | |
19799a22 | 525 | # calling a function that returns a key/value pair list, like |
84f709e7 JH |
526 | # "lead","fred","daughter","pebbles" |
527 | while ( %fields = getnextpairset() ) { | |
19799a22 | 528 | push @AoH, { %fields }; |
4973169d | 529 | } |
cb1a09d0 AD |
530 | |
531 | # likewise, but using no temp vars | |
532 | while (<>) { | |
19799a22 | 533 | push @AoH, { parsepairs($_) }; |
4973169d | 534 | } |
cb1a09d0 AD |
535 | |
536 | # add key/value to an element | |
84f709e7 | 537 | $AoH[0]{pet} = "dino"; |
19799a22 | 538 | $AoH[2]{pet} = "santa's little helper"; |
cb1a09d0 | 539 | |
d1be9408 | 540 | =head2 Access and Printing of an ARRAY OF HASHES |
cb1a09d0 AD |
541 | |
542 | # one element | |
84f709e7 | 543 | $AoH[0]{Lead} = "fred";
cb1a09d0 AD |
544 | |
545 | # another element | |
19799a22 | 546 | $AoH[1]{Lead} =~ s/(\w)/\u$1/;
cb1a09d0 AD |
547 | |
548 | # print the whole thing with refs | |
84f709e7 JH |
549 | for $href ( @AoH ) { |
550 | print "{ "; | |
551 | for $role ( keys %$href ) { | |
552 | print "$role=$href->{$role} "; | |
cb1a09d0 AD |
553 | } |
554 | print "}\n"; | |
4973169d | 555 | } |
cb1a09d0 AD |
556 | |
557 | # print the whole thing with indices | |
84f709e7 | 558 | for $i ( 0 .. $#AoH ) { |
cb1a09d0 | 559 | print "$i is { "; |
53e62bf8 | 560 | for $role ( keys $AoH[$i]->%* ) { |
84f709e7 | 561 | print "$role=$AoH[$i]{$role} "; |
cb1a09d0 AD |
562 | } |
563 | print "}\n"; | |
4973169d | 564 | } |
cb1a09d0 AD |
565 | |
566 | # print the whole thing one at a time | |
84f709e7 | 567 | for $i ( 0 .. $#AoH ) { |
8084d6ce RS |
568 | for $role ( keys $AoH[$i]->%* ) { |
569 | print "elem at ($i, $role) is $AoH[$i]{$role}\n"; | |
cb1a09d0 | 570 | } |
4973169d | 571 | } |
cb1a09d0 AD |
572 | |
573 | =head1 HASHES OF HASHES | |
8e0aa7ce | 574 | X<hash of hashes> X<HoH> |
cb1a09d0 AD |
575 | |
576 | =head2 Declaration of a HASH OF HASHES | |
577 | ||
84f709e7 | 578 | %HoH = ( |
28757baa | 579 | flintstones => { |
6a40a726 SF |
580 | lead => "fred", |
581 | pal => "barney", | |
cb1a09d0 | 582 | }, |
28757baa | 583 | jetsons => { |
6a40a726 SF |
584 | lead => "george", |
585 | wife => "jane", | |
586 | "his boy" => "elroy", | |
4973169d | 587 | }, |
28757baa | 588 | simpsons => { |
6a40a726 SF |
589 | lead => "homer", |
590 | wife => "marge", | |
591 | kid => "bart", | |
592 | }, | |
4973169d | 593 | ); |
cb1a09d0 AD |
594 | |
595 | =head2 Generation of a HASH OF HASHES | |
596 | ||
597 | # reading from file | |
598 | # flintstones: lead=fred pal=barney wife=wilma pet=dino | |
599 | while ( <> ) { | |
84f709e7 JH |
600 | next unless s/^(.*?):\s*//; |
601 | $who = $1; | |
602 | for $field ( split ) { | |
603 | ($key, $value) = split /=/, $field; | |
cb1a09d0 AD |
604 | $HoH{$who}{$key} = $value; |
605 | } | |
606 | } |
607 | ||
608 | # reading from file; more temps | |
609 | while ( <> ) { | |
84f709e7 JH |
610 | next unless s/^(.*?):\s*//; |
611 | $who = $1; | |
612 | $rec = {}; | |
cb1a09d0 | 613 | $HoH{$who} = $rec; |
84f709e7 JH |
614 | for $field ( split ) { |
615 | ($key, $value) = split /=/, $field; | |
616 | $rec->{$key} = $value; | |
cb1a09d0 | 617 | } |
4973169d | 618 | } |
cb1a09d0 | 619 | |
cb1a09d0 | 620 | # calling a function that returns a key,value hash |
84f709e7 | 621 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
cb1a09d0 | 622 | $HoH{$group} = { get_family($group) }; |
4973169d | 623 | } |
cb1a09d0 AD |
624 | |
625 | # likewise, but using temps | |
84f709e7 JH |
626 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
627 | %members = get_family($group); | |
cb1a09d0 | 628 | $HoH{$group} = { %members }; |
4973169d | 629 | } |
cb1a09d0 AD |
630 | |
631 | # append new members to an existing family | |
84f709e7 JH |
632 | %new_folks = ( |
633 | wife => "wilma", | |
634 | pet => "dino", | |
cb1a09d0 | 635 | ); |
4973169d | 636 | |
84f709e7 | 637 | for $what (keys %new_folks) { |
cb1a09d0 | 638 | $HoH{flintstones}{$what} = $new_folks{$what}; |
4973169d | 639 | } |
cb1a09d0 AD |
640 | |
641 | =head2 Access and Printing of a HASH OF HASHES | |
642 | ||
643 | # one element | |
84f709e7 | 644 | $HoH{flintstones}{wife} = "wilma"; |
cb1a09d0 AD |
645 | |
646 | # another element | |
647 | $HoH{simpsons}{lead} =~ s/(\w)/\u$1/; | |
648 | ||
649 | # print the whole thing | |
84f709e7 | 650 | foreach $family ( keys %HoH ) { |
1fef88e7 | 651 | print "$family: { "; |
53e62bf8 | 652 | for $role ( keys $HoH{$family}->%* ) { |
84f709e7 | 653 | print "$role=$HoH{$family}{$role} "; |
cb1a09d0 AD |
654 | } |
655 | print "}\n"; | |
4973169d | 656 | } |
cb1a09d0 AD |
657 | |
658 | # print the whole thing somewhat sorted | |
84f709e7 | 659 | foreach $family ( sort keys %HoH ) { |
1fef88e7 | 660 | print "$family: { "; |
53e62bf8 | 661 | for $role ( sort keys $HoH{$family}->%* ) { |
84f709e7 | 662 | print "$role=$HoH{$family}{$role} "; |
cb1a09d0 AD |
663 | } |
664 | print "}\n"; | |
4973169d | 665 | } |
cb1a09d0 | 666 | |
84f709e7 | 667 | |
cb1a09d0 | 668 | # print the whole thing sorted by number of members |
6e30921b | 669 | foreach $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) { |
1fef88e7 | 670 | print "$family: { "; |
53e62bf8 | 671 | for $role ( sort keys $HoH{$family}->%* ) { |
84f709e7 | 672 | print "$role=$HoH{$family}{$role} "; |
cb1a09d0 AD |
673 | } |
674 | print "}\n"; | |
4973169d | 675 | } |
cb1a09d0 AD |
676 | |
677 | # establish a sort order (rank) for each role | |
84f709e7 JH |
678 | $i = 0; |
679 | for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i } | |
cb1a09d0 AD |
680 | |
681 | # now print the whole thing sorted by number of members | |
6e30921b | 682 | foreach $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) { |
1fef88e7 | 683 | print "$family: { "; |
cb1a09d0 | 684 | # and print these according to rank order |
bd45a9fb | 685 | for $role ( sort { $rank{$a} <=> $rank{$b} } |
53e62bf8 | 686 | keys $HoH{$family}->%* ) |
bd45a9fb | 687 | { |
84f709e7 | 688 | print "$role=$HoH{$family}{$role} "; |
cb1a09d0 AD |
689 | } |
690 | print "}\n"; | |
4973169d | 691 | } |
cb1a09d0 AD |
692 | |
693 | ||
694 | =head1 MORE ELABORATE RECORDS | |
d74e8afc | 695 | X<record> X<structure> X<struct> |
cb1a09d0 AD |
696 | |
697 | =head2 Declaration of MORE ELABORATE RECORDS | |
698 | ||
699 | Here's a sample showing how to create and use a record whose fields are of | |
700 | many different sorts: | |
701 | ||
84f709e7 | 702 | $rec = { |
6a40a726 SF |
703 | TEXT => $string, |
704 | SEQUENCE => [ @old_values ], | |
705 | LOOKUP => { %some_table }, | |
706 | THATCODE => \&some_function, | |
707 | THISCODE => sub { $_[0] ** $_[1] }, | |
708 | HANDLE => \*STDOUT, | |
cb1a09d0 AD |
709 | }; |
710 | ||
4973169d | 711 | print $rec->{TEXT}; |
cb1a09d0 | 712 | |
84f709e7 | 713 | print $rec->{SEQUENCE}[0]; |
a07c6b4e | 714 | $last = pop $rec->{SEQUENCE}->@*; |
cb1a09d0 | 715 | |
84f709e7 | 716 | print $rec->{LOOKUP}{"key"}; |
53e62bf8 | 717 | ($first_k, $first_v) = each $rec->{LOOKUP}->%*; |
cb1a09d0 | 718 | |
84f709e7 JH |
719 | $answer = $rec->{THATCODE}->($arg); |
720 | $answer = $rec->{THISCODE}->($arg1, $arg2); | |
cb1a09d0 AD |
721 | |
722 | # careful of extra block braces on fh ref | |
4973169d | 723 | print { $rec->{HANDLE} } "a string\n"; |
cb1a09d0 AD |
724 | |
725 | use FileHandle; | |
4973169d | 726 | $rec->{HANDLE}->autoflush(1); |
727 | $rec->{HANDLE}->print(" a string\n"); | |
cb1a09d0 AD |
728 | |
729 | =head2 Declaration of a HASH OF COMPLEX RECORDS | |
730 | ||
84f709e7 | 731 | %TV = ( |
28757baa | 732 | flintstones => { |
84f709e7 | 733 | series => "flintstones", |
4973169d | 734 | nights => [ qw(monday thursday friday) ], |
cb1a09d0 | 735 | members => [ |
84f709e7 JH |
736 | { name => "fred", role => "lead", age => 36, }, |
737 | { name => "wilma", role => "wife", age => 31, }, | |
738 | { name => "pebbles", role => "kid", age => 4, }, | |
cb1a09d0 AD |
739 | ], |
740 | }, | |
741 | ||
28757baa | 742 | jetsons => { |
84f709e7 | 743 | series => "jetsons", |
4973169d | 744 | nights => [ qw(wednesday saturday) ], |
cb1a09d0 | 745 | members => [ |
84f709e7 JH |
746 | { name => "george", role => "lead", age => 41, }, |
747 | { name => "jane", role => "wife", age => 39, }, | |
748 | { name => "elroy", role => "kid", age => 9, }, | |
cb1a09d0 AD |
749 | ], |
750 | }, | |
751 | ||
28757baa | 752 | simpsons => { |
84f709e7 | 753 | series => "simpsons", |
4973169d | 754 | nights => [ qw(monday) ], |
cb1a09d0 | 755 | members => [ |
84f709e7 JH |
756 | { name => "homer", role => "lead", age => 34, }, |
757 | { name => "marge", role => "wife", age => 37, }, | |
758 | { name => "bart", role => "kid", age => 11, }, | |
cb1a09d0 AD |
759 | ], |
760 | }, | |
761 | ); | |
762 | ||
763 | =head2 Generation of a HASH OF COMPLEX RECORDS | |
764 | ||
84f709e7 JH |
765 | # reading from file |
766 | # this is most easily done by having the file itself be | |
767 | # in the raw data format as shown above. perl is happy | |
768 | # to parse complex data structures if declared as data, so | |
769 | # sometimes it's easiest to do that | |
cb1a09d0 | 770 | |
84f709e7 JH |
771 | # here's a piece by piece build up |
772 | $rec = {}; | |
773 | $rec->{series} = "flintstones"; | |
cb1a09d0 AD |
774 | $rec->{nights} = [ find_days() ]; |
775 | ||
84f709e7 | 776 | @members = (); |
cb1a09d0 | 777 | # assume this file in field=value syntax |
84f709e7 JH |
778 | while (<>) { |
779 | %fields = split /[\s=]+/; | |
cb1a09d0 AD |
780 | push @members, { %fields }; |
781 | } | |
782 | $rec->{members} = [ @members ]; | |
783 | ||
784 | # now remember the whole thing | |
785 | $TV{ $rec->{series} } = $rec; | |
786 | ||
84f709e7 JH |
787 | ########################################################### |
788 | # now, you might want to make interesting extra fields that | |
789 | # include pointers back into the same data structure so if you | |
790 | # change one piece, it changes everywhere, like for example | |
791 | # if you wanted a {kids} field that was a reference | |
792 | # to an array of the kids' records without having duplicate | |
793 | # records and thus update problems. | |
794 | ########################################################### | |
795 | foreach $family (keys %TV) { | |
796 | $rec = $TV{$family}; # temp pointer | |
797 | @kids = (); | |
53e62bf8 | 798 | for $person ( $rec->{members}->@* ) { |
84f709e7 | 799 | if ($person->{role} =~ /kid|son|daughter/) { |
cb1a09d0 AD |
800 | push @kids, $person; |
801 | } | |
802 | } | |
803 | # REMEMBER: $rec and $TV{$family} point to same data!! | |
804 | $rec->{kids} = [ @kids ]; | |
805 | } | |
806 | ||
84f709e7 JH |
807 | # you copied the array, but the array itself contains pointers |
808 | # to uncopied objects. this means that if you make bart get | |
809 | # older via | |
cb1a09d0 AD |
810 | |
811 | $TV{simpsons}{kids}[0]{age}++; | |
812 | ||
84f709e7 JH |
813 | # then this would also change in |
814 | print $TV{simpsons}{members}[2]{age}; | |
cb1a09d0 | 815 | |
84f709e7 JH |
816 | # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] |
817 | # both point to the same underlying anonymous hash table | |
6ba6f0ec | 818 | |
84f709e7 JH |
819 | # print the whole thing |
820 | foreach $family ( keys %TV ) { | |
821 | print "the $family"; | |
53e62bf8 | 822 | print " is on during @{ $TV{$family}{nights} }\n";
84f709e7 | 823 | print "its members are:\n"; |
53e62bf8 | 824 | for $who ( $TV{$family}{members}->@* ) { |
cb1a09d0 AD |
825 | print " $who->{name} ($who->{role}), age $who->{age}\n"; |
826 | } | |
84f709e7 | 827 | print "it turns out that $TV{$family}{members}[0]{name} has "; # assumes the lead is listed first
53e62bf8 RS |
828 | print scalar ( $TV{$family}{kids}->@* ), " kids named "; |
829 | print join (", ", map { $_->{name} } $TV{$family}{kids}->@* ); | |
84f709e7 | 830 | print "\n"; |
cb1a09d0 AD |
831 | } |
832 | ||
c07a80fd | 833 | =head1 Database Ties |
834 | ||
835 | You cannot easily tie a multilevel data structure (such as a hash of | |
836 | hashes) to a dbm file. The first problem is that all but GDBM and | |
837 | Berkeley DB have size limitations, but beyond that, you also have problems | |
838 | with how references are to be represented on disk. One experimental | |
5f05dabc | 839 | module that does partially attempt to address this need is the MLDBM |
f102b883 | 840 | module. Check your nearest CPAN site as described in L<perlmodlib> for |
c07a80fd | 841 | source code to MLDBM. |
842 | ||
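
If all you need is to save a nested structure and read it back later, rather than tie it to a dbm file, the core Storable module is one option. The sketch below is not MLDBM and does not use a dbm file; it serializes to a string in memory with freeze() and thaw(), and the sample data is made up. Storable's store() and retrieve() do the same job against a named file.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %HoH = (
    flintstones => { lead => "fred",  pal  => "barney" },
    simpsons    => { lead => "homer", wife => "marge"  },
);

my $frozen = freeze(\%HoH);     # one string holding the whole structure
my $copy   = thaw($frozen);     # rebuild an independent copy

print "$copy->{simpsons}{wife}\n";      # marge
```

The thawed copy shares nothing with the original, so changing one does not change the other.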
4633a7c4 LW |
843 | =head1 SEE ALSO |
844 | ||
ba555bf5 | 845 | L<perlref>, L<perllol>, L<perldata>, L<perlobj> |
4633a7c4 LW |
846 | |
847 | =head1 AUTHOR | |
848 | ||
9607fc9c | 849 | Tom Christiansen <F<tchrist@perl.com>> |