| 1 | =head1 NAME |
| 2 | X<data structure> X<complex data structure> X<struct> |
| 3 | |
| 4 | perldsc - Perl Data Structures Cookbook |
| 5 | |
| 6 | =head1 DESCRIPTION |
| 7 | |
| 8 | The single feature most sorely lacking in the Perl programming language |
| 9 | prior to its 5.0 release was complex data structures. Even without direct |
| 10 | language support, some valiant programmers did manage to emulate them, but |
| 11 | it was hard work and not for the faint of heart. You could occasionally |
| 12 | get away with the C<$m{$AoA,$b}> notation borrowed from B<awk>, in which the |
| 13 | keys are really a single string joined with C<$;>, as in C<join($;, $AoA, $b)>, but |
| 14 | traversal and sorting were difficult. More desperate programmers even |
| 15 | hacked Perl's internal symbol table directly, a strategy that proved hard |
| 16 | to develop and maintain--to put it mildly. |
| 17 | |
| 18 | The 5.0 release of Perl let us have complex data structures. You |
| 19 | can now write something like this and, all of a sudden, you have an array |
| 20 | with three dimensions! |
| 21 | |
| 22 | for $x (1 .. 10) { |
| 23 | for $y (1 .. 10) { |
| 24 | for $z (1 .. 10) { |
| 25 | $AoA[$x][$y][$z] = |
| 26 | $x ** $y + $z; |
| 27 | } |
| 28 | } |
| 29 | } |
| 30 | |
| 31 | Alas, however simple this may appear, underneath it's a much more |
| 32 | elaborate construct than meets the eye! |
| 33 | |
| 34 | How do you print it out? Why can't you say just C<print @AoA>? How do |
| 35 | you sort it? How can you pass it to a function or get one of these back |
| 36 | from a function? Is it an object? Can you save it to disk to read |
| 37 | back later? How do you access whole rows or columns of that matrix? Do |
| 38 | all the values have to be numeric? |
| 39 | |
| 40 | As you see, it's quite easy to become confused. While some small portion |
| 41 | of the blame for this can be attributed to the reference-based |
| 42 | implementation, it's really more due to a lack of existing documentation with |
| 43 | examples designed for the beginner. |
| 44 | |
| 45 | This document is meant to be a detailed but understandable treatment of the |
| 46 | many different sorts of data structures you might want to develop. It |
| 47 | should also serve as a cookbook of examples. That way, when you need to |
| 48 | create one of these complex data structures, you can just pinch, pilfer, or |
| 49 | purloin a drop-in example from here. |
| 50 | |
| 51 | Let's look at each of these possible constructs in detail. There are separate |
| 52 | sections on each of the following: |
| 53 | |
| 54 | =over 5 |
| 55 | |
| 56 | =item * arrays of arrays |
| 57 | |
| 58 | =item * hashes of arrays |
| 59 | |
| 60 | =item * arrays of hashes |
| 61 | |
| 62 | =item * hashes of hashes |
| 63 | |
| 64 | =item * more elaborate constructs |
| 65 | |
| 66 | =back |
| 67 | |
| 68 | But for now, let's look at general issues common to all |
| 69 | these types of data structures. |
| 70 | |
| 71 | =head1 REFERENCES |
| 72 | X<reference> X<dereference> X<dereferencing> X<pointer> |
| 73 | |
| 74 | The most important thing to understand about all data structures in Perl |
| 75 | --including multidimensional arrays--is that even though they might |
| 76 | appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally |
| 77 | one-dimensional. They can hold only scalar values (meaning a string, |
| 78 | number, or a reference). They cannot directly contain other arrays or |
| 79 | hashes, but instead contain I<references> to other arrays or hashes. |
| 80 | X<multidimensional array> X<array, multidimensional> |
| 81 | |
| 82 | You can't use a reference to an array or hash in quite the same way that you |
| 83 | would a real array or hash. For C or C++ programmers unused to |
| 84 | distinguishing between arrays and pointers to the same, this can be |
| 85 | confusing. If so, just think of it as the difference between a structure |
| 86 | and a pointer to a structure. |
| 87 | |
| 88 | You can (and should) read more about references in the perlref(1) man |
| 89 | page. Briefly, references are rather like pointers that know what they |
| 90 | point to. (Objects are also a kind of reference, but we won't be needing |
| 91 | them right away--if ever.) This means that when you have something which |
| 92 | looks to you like an access to a two-or-more-dimensional array and/or hash, |
| 93 | what's really going on is that the base type is |
| 94 | merely a one-dimensional entity that contains references to the next |
| 95 | level. It's just that you can I<use> it as though it were a |
| 96 | two-dimensional one. This is much the way C programs build |
| 97 | multidimensional arrays out of arrays of pointers. |
| 98 | |
| 99 | $array[7][12] # array of arrays |
| 100 | $array[7]{string} # array of hashes |
| 101 | $hash{string}[7] # hash of arrays |
| 102 | $hash{string}{'another string'} # hash of hashes |
| 103 | |
| 104 | Now, because the top level contains only references, if you try to print |
| 105 | out your array with a simple print() function, you'll get something |
| 106 | that doesn't look very nice, like this: |
| 107 | |
| 108 | @AoA = ( [2, 3], [4, 5, 7], [0] ); |
| 109 | print $AoA[1][2]; |
| 110 | 7 |
| 111 | print @AoA; |
| 112 | ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) |
| 113 | |
| 114 | |
| 115 | That's because Perl doesn't (ever) implicitly dereference your variables. |
| 116 | If you want to get at the thing a reference is referring to, then you have |
| 117 | to do this yourself using either prefix typing indicators, like |
| 118 | C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows, |
| 119 | like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>. |
| 120 | |
| 121 | =head1 COMMON MISTAKES |
| 122 | |
| 123 | The two most common mistakes made in constructing something like |
| 124 | an array of arrays are accidentally counting the number of |
| 125 | elements and taking a reference to the same memory location |
| 126 | repeatedly. Here's the case where you just get the count instead |
| 127 | of a nested array: |
| 128 | |
| 129 | for $i (1..10) { |
| 130 | @array = somefunc($i); |
| 131 | $AoA[$i] = @array; # WRONG! |
| 132 | } |
| 133 | |
| 134 | That's just the simple case of assigning an array to a scalar and getting |
| 135 | its element count. If that's what you really and truly want, then you |
| 136 | might do well to consider being a tad more explicit about it, like this: |
| 137 | |
| 138 | for $i (1..10) { |
| 139 | @array = somefunc($i); |
| 140 | $counts[$i] = scalar @array; |
| 141 | } |
| 142 | |
| 143 | Here's the case of taking a reference to the same memory location |
| 144 | again and again: |
| 145 | |
| 146 | for $i (1..10) { |
| 147 | @array = somefunc($i); |
| 148 | $AoA[$i] = \@array; # WRONG! |
| 149 | } |
| 150 | |
| 151 | So, what's the big problem with that? It looks right, doesn't it? |
| 152 | After all, I just told you that you need an array of references, so by |
| 153 | golly, you've made me one! |
| 154 | |
| 155 | Unfortunately, while this is true, it's still broken. All the references |
| 156 | in @AoA refer to the I<very same place>, and they will therefore all hold |
| 157 | whatever was last in @array! It's similar to the problem demonstrated in |
| 158 | the following C program: |
| 159 | |
| 160 | #include <pwd.h> |
| 161 | #include <stdio.h> |
| 162 | int main(void) { |
| 163 | struct passwd *rp, *dp; |
| 164 | rp = getpwnam("root"); /* may point into static storage */ |
| 165 | dp = getpwnam("daemon"); /* may overwrite what rp points to */ |
| 166 | printf("daemon name is %s\nroot name is %s\n", dp->pw_name, rp->pw_name); |
| 167 | return 0; |
| 168 | } |
| 169 | |
| 170 | On systems where getpwnam() returns a pointer to static storage, this will print |
| 171 | |
| 172 | daemon name is daemon |
| 173 | root name is daemon |
| 174 | |
| 175 | The problem is that both C<rp> and C<dp> are pointers to the same location |
| 176 | in memory! In C, you'd have to remember to malloc() yourself some new |
| 177 | memory. In Perl, you'll want to use the array constructor C<[]> or the |
| 178 | hash constructor C<{}> instead. Here's the right way to do the preceding |
| 179 | broken code fragments: |
| 180 | X<[]> X<{}> |
| 181 | |
| 182 | for $i (1..10) { |
| 183 | @array = somefunc($i); |
| 184 | $AoA[$i] = [ @array ]; |
| 185 | } |
| 186 | |
| 187 | The square brackets make a reference to a new array with a I<copy> |
| 188 | of what's in @array at the time of the assignment. This is what |
| 189 | you want. |
| 190 | |
| 191 | Note that this will produce something similar, but it's |
| 192 | much harder to read: |
| 193 | |
| 194 | for $i (1..10) { |
| 195 | @array = 0 .. $i; |
| 196 | @{$AoA[$i]} = @array; |
| 197 | } |
| 198 | |
| 199 | Is it the same? Well, maybe so--and maybe not. The subtle difference |
| 200 | is that when you assign something in square brackets, you know for sure |
| 201 | it's always a brand new reference with a new I<copy> of the data. |
| 202 | Something else could be going on in this new case with the C<@{$AoA[$i]}> |
| 203 | dereference on the left-hand-side of the assignment. It all depends on |
| 204 | whether C<$AoA[$i]> had been undefined to start with, or whether it |
| 205 | already contained a reference. If you had already populated @AoA with |
| 206 | references, as in |
| 207 | |
| 208 | $AoA[3] = \@another_array; |
| 209 | |
| 210 | Then the assignment with the indirection on the left-hand-side would |
| 211 | use the existing reference that was already there: |
| 212 | |
| 213 | @{$AoA[3]} = @array; |
| 214 | |
| 215 | Of course, this I<would> have the "interesting" effect of clobbering |
| 216 | @another_array. (Have you ever noticed how when a programmer says |
| 217 | something is "interesting", that rather than meaning "intriguing", |
| 218 | they're disturbingly more apt to mean that it's "annoying", |
| 219 | "difficult", or both? :-) |
| 220 | |
| 221 | So just remember always to use the array or hash constructors with C<[]> |
| 222 | or C<{}>, and you'll be fine, although it's not always optimally |
| 223 | efficient. |
| 224 | |
| 225 | Surprisingly, the following dangerous-looking construct will |
| 226 | actually work out fine: |
| 227 | |
| 228 | for $i (1..10) { |
| 229 | my @array = somefunc($i); |
| 230 | $AoA[$i] = \@array; |
| 231 | } |
| 232 | |
| 233 | That's because my() is more of a run-time statement than it is a |
| 234 | compile-time declaration I<per se>. This means that the my() variable is |
| 235 | remade afresh each time through the loop. So even though it I<looks> as |
| 236 | though you stored the same variable reference each time, you actually did |
| 237 | not! This is a subtle distinction that can produce more efficient code at |
| 238 | the risk of misleading all but the most experienced of programmers. So I |
| 239 | usually advise against teaching it to beginners. In fact, except for |
| 240 | passing arguments to functions, I seldom like to see the gimme-a-reference |
| 241 | operator (backslash) used much at all in code. Instead, I advise |
| 242 | beginners that they (and most of the rest of us) should try to use the |
| 243 | much more easily understood constructors C<[]> and C<{}> instead of |
| 244 | relying upon lexical (or dynamic) scoping and hidden reference-counting to |
| 245 | do the right thing behind the scenes. |
| 246 | |
| 247 | In summary: |
| 248 | |
| 249 | $AoA[$i] = [ @array ]; # usually best |
| 250 | $AoA[$i] = \@array; # perilous; just how my() was that array? |
| 251 | @{ $AoA[$i] } = @array; # way too tricky for most programmers |
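To make the difference between the first two forms concrete, here is a minimal sketch that fills two arrays from the same reused variable. The names C<@shared> and C<@copied> are invented for this illustration.

```perl
use strict;
use warnings;

my (@shared, @copied);
my @array;                     # one variable, reused on every pass

for my $i (0 .. 2) {
    @array = ($i, $i * 10);
    $shared[$i] = \@array;     # every slot refers to the same @array
    $copied[$i] = [ @array ];  # every slot gets its own fresh copy
}

# All rows of @shared show whatever was stored last.
print "shared: @{ $shared[0] } | @{ $shared[2] }\n";
# Rows of @copied kept their own values.
print "copied: @{ $copied[0] } | @{ $copied[2] }\n";
```

Had C<@array> been declared with my() inside the loop body, the backslash form would have produced distinct arrays too, which is exactly the subtlety discussed above.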
| 252 | |
| 253 | |
| 254 | =head1 CAVEAT ON PRECEDENCE |
| 255 | X<dereference, precedence> X<dereferencing, precedence> |
| 256 | |
| 257 | Speaking of things like C<@{$AoA[$i]}>, the following are actually the |
| 258 | same thing: |
| 259 | X<< -> >> |
| 260 | |
| 261 | $aref->[2][2] # clear |
| 262 | $$aref[2][2] # confusing |
| 263 | |
| 264 | That's because Perl's precedence rules on its five prefix dereferencers |
| 265 | (which look like someone swearing: C<$ @ * % &>) make them bind more |
| 266 | tightly than the postfix subscripting brackets or braces! This will no |
| 267 | doubt come as a great shock to the C or C++ programmer, who is quite |
| 268 | accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> |
| 269 | element of C<a>. That is, they first take the subscript, and only then |
| 270 | dereference the thing at that subscript. That's fine in C, but this isn't C. |
| 271 | |
| 272 | The seemingly equivalent construct in Perl, C<$$aref[$i]>, first |
| 273 | dereferences $aref, treating it as a reference to an |
| 274 | array, and only then subscripts the result, giving you the I<i'th> value |
| 275 | of the array pointed to by $aref. If you wanted the C notion, you'd have to |
| 276 | write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first |
| 277 | before the leading C<$> dereferencer. |
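The following sketch, using a small made-up C<@AoA> and a reference to it, shows the two readings next to their arrow equivalents.

```perl
use strict;
use warnings;

my @AoA  = ( [ "a", "b" ], [ "c", "d" ] );
my $aref = \@AoA;

# $$aref[1] dereferences $aref first, then subscripts: same as $aref->[1].
my $row = $$aref[1];

# ${ $AoA[1] }[0] subscripts @AoA first, then dereferences that element.
my $elt = ${ $AoA[1] }[0];

# The arrow forms say the same things with less squinting.
print "same row\n"     if $row == $aref->[1];
print "same element\n" if $elt eq $AoA[1]->[0];
```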
| 278 | |
| 279 | =head1 WHY YOU SHOULD ALWAYS C<use strict> |
| 280 | |
| 281 | If this is starting to sound scarier than it's worth, relax. Perl has |
| 282 | some features to help you avoid its most common pitfalls. The best |
| 283 | way to avoid getting confused is to start every program like this: |
| 284 | |
| 285 | #!/usr/bin/perl -w |
| 286 | use strict; |
| 287 | |
| 288 | This way, you'll be forced to declare all your variables with my(), and |
| 289 | accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done |
| 290 | this: |
| 291 | |
| 292 | my $aref = [ |
| 293 | [ "fred", "barney", "pebbles", "bambam", "dino", ], |
| 294 | [ "homer", "bart", "marge", "maggie", ], |
| 295 | [ "george", "jane", "elroy", "judy", ], |
| 296 | ]; |
| 297 | |
| 298 | print $aref[2][2]; |
| 299 | |
| 300 | The compiler would immediately flag that as an error I<at compile time>, |
| 301 | because you were accidentally accessing C<@aref>, an undeclared |
| 302 | variable, and it would thereby remind you to write instead: |
| 303 | |
| 304 | print $aref->[2][2]; |
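If you would like to see that complaint without breaking a real program, one possible approach is to compile the faulty snippet from a string. This is only a sketch: the snippet lives in a string so that the surrounding script still compiles, and the exact wording of the message can vary between Perl versions.

```perl
use strict;
use warnings;

# The faulty code is kept in a string so that this sketch itself compiles;
# eval then compiles it under strict and leaves the failure in $@.
my $code = q{
    use strict;
    my $aref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
    print $aref[1][1];     # oops: this means @aref, not $aref
    1;
};

my $ok    = eval $code;
my $error = $@;
print "strict refused to compile it: $error" unless $ok;
```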
| 305 | |
| 306 | =head1 DEBUGGING |
| 307 | X<data structure, debugging> X<complex data structure, debugging> |
| 308 | X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging> |
| 309 | X<array of arrays, debugging> X<hash of arrays, debugging> |
| 310 | X<array of hashes, debugging> X<hash of hashes, debugging> |
| 311 | |
| 312 | Before version 5.002, the standard Perl debugger didn't do a very nice job of |
| 313 | printing out complex data structures. With 5.002 or above, the |
| 314 | debugger includes several new features, including command line editing as |
| 315 | well as the C<x> command to dump out complex data structures. For |
| 316 | example, if $AoA held the same structure as $aref above, here's the debugger output: |
| 317 | |
| 318 | DB<1> x $AoA |
| 319 | $AoA = ARRAY(0x13b5a0) |
| 320 | 0 ARRAY(0x1f0a24) |
| 321 | 0 'fred' |
| 322 | 1 'barney' |
| 323 | 2 'pebbles' |
| 324 | 3 'bambam' |
| 325 | 4 'dino' |
| 326 | 1 ARRAY(0x13b558) |
| 327 | 0 'homer' |
| 328 | 1 'bart' |
| 329 | 2 'marge' |
| 330 | 3 'maggie' |
| 331 | 2 ARRAY(0x13b540) |
| 332 | 0 'george' |
| 333 | 1 'jane' |
| 334 | 2 'elroy' |
| 335 | 3 'judy' |
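Outside the debugger, the standard Data::Dumper module gives a comparable view. The sketch below is not the debugger's own mechanism, just a commonly used alternative; the data is a shortened, made-up version of the structure above.

```perl
use strict;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred", "barney" ],
    [ "homer", "bart" ],
];

# Optional tidying: sorted hash keys and tighter indentation.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

my $dump = Dumper($AoA);
print $dump;
```

Dumper() returns a string of Perl code, so the result can be printed, logged, or compared as you see fit.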
| 336 | |
| 337 | =head1 CODE EXAMPLES |
| 338 | |
| 339 | Presented with little comment (these will get their own manpages someday) |
| 340 | here are short code examples illustrating access of various |
| 341 | types of data structures. |
| 342 | |
| 343 | =head1 ARRAYS OF ARRAYS |
| 344 | X<array of arrays> X<AoA> |
| 345 | |
| 346 | =head2 Declaration of an ARRAY OF ARRAYS |
| 347 | |
| 348 | @AoA = ( |
| 349 | [ "fred", "barney" ], |
| 350 | [ "george", "jane", "elroy" ], |
| 351 | [ "homer", "marge", "bart" ], |
| 352 | ); |
| 353 | |
| 354 | =head2 Generation of an ARRAY OF ARRAYS |
| 355 | |
| 356 | # reading from file |
| 357 | while ( <> ) { |
| 358 | push @AoA, [ split ]; |
| 359 | } |
| 360 | |
| 361 | # calling a function |
| 362 | for $i ( 1 .. 10 ) { |
| 363 | $AoA[$i] = [ somefunc($i) ]; |
| 364 | } |
| 365 | |
| 366 | # using temp vars |
| 367 | for $i ( 1 .. 10 ) { |
| 368 | @tmp = somefunc($i); |
| 369 | $AoA[$i] = [ @tmp ]; |
| 370 | } |
| 371 | |
| 372 | # add to an existing row |
| 373 | push @{ $AoA[0] }, "wilma", "betty"; |
| 374 | |
| 375 | =head2 Access and Printing of an ARRAY OF ARRAYS |
| 376 | |
| 377 | # one element |
| 378 | $AoA[0][0] = "Fred"; |
| 379 | |
| 380 | # another element |
| 381 | $AoA[1][1] =~ s/(\w)/\u$1/; |
| 382 | |
| 383 | # print the whole thing with refs |
| 384 | for $aref ( @AoA ) { |
| 385 | print "\t [ @$aref ],\n"; |
| 386 | } |
| 387 | |
| 388 | # print the whole thing with indices |
| 389 | for $i ( 0 .. $#AoA ) { |
| 390 | print "\t [ @{$AoA[$i]} ],\n"; |
| 391 | } |
| 392 | |
| 393 | # print the whole thing one at a time |
| 394 | for $i ( 0 .. $#AoA ) { |
| 395 | for $j ( 0 .. $#{ $AoA[$i] } ) { |
| 396 | print "elt $i $j is $AoA[$i][$j]\n"; |
| 397 | } |
| 398 | } |
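The introduction asked how to get at whole rows or columns. Here is one way it might be done, assuming the same sort of C<@AoA> as declared above. A row is simply the array behind one reference, while a column has to be gathered row by row.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer",  "marge", "bart" ],
);

# A whole row is just the array behind one reference.
my @row = @{ $AoA[1] };

# A column has to be collected one row at a time.
my @column = map { $_->[1] } @AoA;

# A slice of one row: elements 0 and 2 of the last row.
my @some = @{ $AoA[2] }[ 0, 2 ];

print "row: @row\ncolumn: @column\nslice: @some\n";
```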
| 399 | |
| 400 | =head1 HASHES OF ARRAYS |
| 401 | X<hash of arrays> X<HoA> |
| 402 | |
| 403 | =head2 Declaration of a HASH OF ARRAYS |
| 404 | |
| 405 | %HoA = ( |
| 406 | flintstones => [ "fred", "barney" ], |
| 407 | jetsons => [ "george", "jane", "elroy" ], |
| 408 | simpsons => [ "homer", "marge", "bart" ], |
| 409 | ); |
| 410 | |
| 411 | =head2 Generation of a HASH OF ARRAYS |
| 412 | |
| 413 | # reading from file |
| 414 | # flintstones: fred barney wilma dino |
| 415 | while ( <> ) { |
| 416 | next unless s/^(.*?):\s*//; |
| 417 | $HoA{$1} = [ split ]; |
| 418 | } |
| 419 | |
| 420 | # reading from file; more temps |
| 421 | # flintstones: fred barney wilma dino |
| 422 | while ( $line = <> ) { |
| 423 | ($who, $rest) = split /:\s*/, $line, 2; |
| 424 | @fields = split ' ', $rest; |
| 425 | $HoA{$who} = [ @fields ]; |
| 426 | } |
| 427 | |
| 428 | # calling a function that returns a list |
| 429 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
| 430 | $HoA{$group} = [ get_family($group) ]; |
| 431 | } |
| 432 | |
| 433 | # likewise, but using temps |
| 434 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
| 435 | @members = get_family($group); |
| 436 | $HoA{$group} = [ @members ]; |
| 437 | } |
| 438 | |
| 439 | # append new members to an existing family |
| 440 | push @{ $HoA{"flintstones"} }, "wilma", "betty"; |
| 441 | |
| 442 | =head2 Access and Printing of a HASH OF ARRAYS |
| 443 | |
| 444 | # one element |
| 445 | $HoA{flintstones}[0] = "Fred"; |
| 446 | |
| 447 | # another element |
| 448 | $HoA{simpsons}[1] =~ s/(\w)/\u$1/; |
| 449 | |
| 450 | # print the whole thing |
| 451 | foreach $family ( keys %HoA ) { |
| 452 | print "$family: @{ $HoA{$family} }\n" |
| 453 | } |
| 454 | |
| 455 | # print the whole thing with indices |
| 456 | foreach $family ( keys %HoA ) { |
| 457 | print "$family: "; |
| 458 | foreach $i ( 0 .. $#{ $HoA{$family} } ) { |
| 459 | print " $i = $HoA{$family}[$i]"; |
| 460 | } |
| 461 | print "\n"; |
| 462 | } |
| 463 | |
| 464 | # print the whole thing sorted by number of members |
| 465 | foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) { |
| 466 | print "$family: @{ $HoA{$family} }\n" |
| 467 | } |
| 468 | |
| 469 | # print the whole thing sorted by number of members and name |
| 470 | foreach $family ( sort { |
| 471 | @{$HoA{$b}} <=> @{$HoA{$a}} |
| 472 | || |
| 473 | $a cmp $b |
| 474 | } keys %HoA ) |
| 475 | { |
| 476 | print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n"; |
| 477 | } |
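When the members of a family arrive one at a time rather than a line at a time, the push idiom shown earlier can build the whole hash. A minimal sketch follows, with a hypothetical list of "family member" pairs standing in for real input.

```perl
use strict;
use warnings;

# Hypothetical input: "family member" pairs, one per entry.
my @pairs = (
    "flintstones fred", "jetsons george",
    "flintstones barney", "jetsons jane",
);

my %HoA;
for my $pair (@pairs) {
    my ($family, $member) = split ' ', $pair;
    # The inner array springs into existence on first use.
    push @{ $HoA{$family} }, $member;
}

for my $family (sort keys %HoA) {
    print "$family: @{ $HoA{$family} }\n";
}
```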
| 478 | |
| 479 | =head1 ARRAYS OF HASHES |
| 480 | X<array of hashes> X<AoH> |
| 481 | |
| 482 | =head2 Declaration of an ARRAY OF HASHES |
| 483 | |
| 484 | @AoH = ( |
| 485 | { |
| 486 | Lead => "fred", |
| 487 | Friend => "barney", |
| 488 | }, |
| 489 | { |
| 490 | Lead => "george", |
| 491 | Wife => "jane", |
| 492 | Son => "elroy", |
| 493 | }, |
| 494 | { |
| 495 | Lead => "homer", |
| 496 | Wife => "marge", |
| 497 | Son => "bart", |
| 498 | } |
| 499 | ); |
| 500 | |
| 501 | =head2 Generation of an ARRAY OF HASHES |
| 502 | |
| 503 | # reading from file |
| 504 | # format: LEAD=fred FRIEND=barney |
| 505 | while ( <> ) { |
| 506 | $rec = {}; |
| 507 | for $field ( split ) { |
| 508 | ($key, $value) = split /=/, $field; |
| 509 | $rec->{$key} = $value; |
| 510 | } |
| 511 | push @AoH, $rec; |
| 512 | } |
| 513 | |
| 514 | |
| 515 | # reading from file |
| 516 | # format: LEAD=fred FRIEND=barney |
| 517 | # no temp |
| 518 | while ( <> ) { |
| 519 | push @AoH, { split /[\s=]+/ }; |
| 520 | } |
| 521 | |
| 522 | # calling a function that returns a key/value pair list, like |
| 523 | # "lead","fred","daughter","pebbles" |
| 524 | while ( %fields = getnextpairset() ) { |
| 525 | push @AoH, { %fields }; |
| 526 | } |
| 527 | |
| 528 | # likewise, but using no temp vars |
| 529 | while (<>) { |
| 530 | push @AoH, { parsepairs($_) }; |
| 531 | } |
| 532 | |
| 533 | # add key/value to an element |
| 534 | $AoH[0]{pet} = "dino"; |
| 535 | $AoH[2]{pet} = "santa's little helper"; |
| 536 | |
| 537 | =head2 Access and Printing of an ARRAY OF HASHES |
| 538 | |
| 539 | # one element |
| 540 | $AoH[0]{Lead} = "fred"; |
| 541 | |
| 542 | # another element |
| 543 | $AoH[1]{Lead} =~ s/(\w)/\u$1/; |
| 544 | |
| 545 | # print the whole thing with refs |
| 546 | for $href ( @AoH ) { |
| 547 | print "{ "; |
| 548 | for $role ( keys %$href ) { |
| 549 | print "$role=$href->{$role} "; |
| 550 | } |
| 551 | print "}\n"; |
| 552 | } |
| 553 | |
| 554 | # print the whole thing with indices |
| 555 | for $i ( 0 .. $#AoH ) { |
| 556 | print "$i is { "; |
| 557 | for $role ( keys %{ $AoH[$i] } ) { |
| 558 | print "$role=$AoH[$i]{$role} "; |
| 559 | } |
| 560 | print "}\n"; |
| 561 | } |
| 562 | |
| 563 | # print the whole thing one at a time |
| 564 | for $i ( 0 .. $#AoH ) { |
| 565 | for $role ( keys %{ $AoH[$i] } ) { |
| 566 | print "elt $i $role is $AoH[$i]{$role}\n"; |
| 567 | } |
| 568 | } |
| 569 | |
| 570 | =head1 HASHES OF HASHES |
| 571 | X<hash of hashes> X<HoH> |
| 572 | |
| 573 | =head2 Declaration of a HASH OF HASHES |
| 574 | |
| 575 | %HoH = ( |
| 576 | flintstones => { |
| 577 | lead => "fred", |
| 578 | pal => "barney", |
| 579 | }, |
| 580 | jetsons => { |
| 581 | lead => "george", |
| 582 | wife => "jane", |
| 583 | "his boy" => "elroy", |
| 584 | }, |
| 585 | simpsons => { |
| 586 | lead => "homer", |
| 587 | wife => "marge", |
| 588 | kid => "bart", |
| 589 | }, |
| 590 | ); |
| 591 | |
| 592 | =head2 Generation of a HASH OF HASHES |
| 593 | |
| 594 | # reading from file |
| 595 | # flintstones: lead=fred pal=barney wife=wilma pet=dino |
| 596 | while ( <> ) { |
| 597 | next unless s/^(.*?):\s*//; |
| 598 | $who = $1; |
| 599 | for $field ( split ) { |
| 600 | ($key, $value) = split /=/, $field; |
| 601 | $HoH{$who}{$key} = $value; |
| 602 | } |
| 603 | } |
| 604 | |
| 605 | # reading from file; more temps |
| 606 | while ( <> ) { |
| 607 | next unless s/^(.*?):\s*//; |
| 608 | $who = $1; |
| 609 | $rec = {}; |
| 610 | $HoH{$who} = $rec; |
| 611 | for $field ( split ) { |
| 612 | ($key, $value) = split /=/, $field; |
| 613 | $rec->{$key} = $value; |
| 614 | } |
| 615 | } |
| 616 | |
| 617 | # calling a function that returns a key,value hash |
| 618 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
| 619 | $HoH{$group} = { get_family($group) }; |
| 620 | } |
| 621 | |
| 622 | # likewise, but using temps |
| 623 | for $group ( "simpsons", "jetsons", "flintstones" ) { |
| 624 | %members = get_family($group); |
| 625 | $HoH{$group} = { %members }; |
| 626 | } |
| 627 | |
| 628 | # append new members to an existing family |
| 629 | %new_folks = ( |
| 630 | wife => "wilma", |
| 631 | pet => "dino", |
| 632 | ); |
| 633 | |
| 634 | for $what (keys %new_folks) { |
| 635 | $HoH{flintstones}{$what} = $new_folks{$what}; |
| 636 | } |
| 637 | |
| 638 | =head2 Access and Printing of a HASH OF HASHES |
| 639 | |
| 640 | # one element |
| 641 | $HoH{flintstones}{wife} = "wilma"; |
| 642 | |
| 643 | # another element |
| 644 | $HoH{simpsons}{lead} =~ s/(\w)/\u$1/; |
| 645 | |
| 646 | # print the whole thing |
| 647 | foreach $family ( keys %HoH ) { |
| 648 | print "$family: { "; |
| 649 | for $role ( keys %{ $HoH{$family} } ) { |
| 650 | print "$role=$HoH{$family}{$role} "; |
| 651 | } |
| 652 | print "}\n"; |
| 653 | } |
| 654 | |
| 655 | # print the whole thing somewhat sorted |
| 656 | foreach $family ( sort keys %HoH ) { |
| 657 | print "$family: { "; |
| 658 | for $role ( sort keys %{ $HoH{$family} } ) { |
| 659 | print "$role=$HoH{$family}{$role} "; |
| 660 | } |
| 661 | print "}\n"; |
| 662 | } |
| 663 | |
| 664 | |
| 665 | # print the whole thing sorted by number of members |
| 666 | foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) { |
| 667 | print "$family: { "; |
| 668 | for $role ( sort keys %{ $HoH{$family} } ) { |
| 669 | print "$role=$HoH{$family}{$role} "; |
| 670 | } |
| 671 | print "}\n"; |
| 672 | } |
| 673 | |
| 674 | # establish a sort order (rank) for each role |
| 675 | $i = 0; |
| 676 | for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i } |
| 677 | |
| 678 | # now print the whole thing sorted by number of members |
| 679 | foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) { |
| 680 | print "$family: { "; |
| 681 | # and print these according to rank order |
| 682 | for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) { |
| 683 | print "$role=$HoH{$family}{$role} "; |
| 684 | } |
| 685 | print "}\n"; |
| 686 | } |
| 687 | |
| 688 | |
| 689 | =head1 MORE ELABORATE RECORDS |
| 690 | X<record> X<structure> X<struct> |
| 691 | |
| 692 | =head2 Declaration of MORE ELABORATE RECORDS |
| 693 | |
| 694 | Here's a sample showing how to create and use a record whose fields are of |
| 695 | many different sorts: |
| 696 | |
| 697 | $rec = { |
| 698 | TEXT => $string, |
| 699 | SEQUENCE => [ @old_values ], |
| 700 | LOOKUP => { %some_table }, |
| 701 | THATCODE => \&some_function, |
| 702 | THISCODE => sub { $_[0] ** $_[1] }, |
| 703 | HANDLE => \*STDOUT, |
| 704 | }; |
| 705 | |
| 706 | print $rec->{TEXT}; |
| 707 | |
| 708 | print $rec->{SEQUENCE}[0]; |
| 709 | $last = pop @{ $rec->{SEQUENCE} }; |
| 710 | |
| 711 | print $rec->{LOOKUP}{"key"}; |
| 712 | ($first_k, $first_v) = each %{ $rec->{LOOKUP} }; |
| 713 | |
| 714 | $answer = $rec->{THATCODE}->($arg); |
| 715 | $answer = $rec->{THISCODE}->($arg1, $arg2); |
| 716 | |
| 717 | # careful of extra block braces on fh ref |
| 718 | print { $rec->{HANDLE} } "a string\n"; |
| 719 | |
| 720 | use FileHandle; |
| 721 | $rec->{HANDLE}->autoflush(1); |
| 722 | $rec->{HANDLE}->print(" a string\n"); |
| 723 | |
| 724 | =head2 Declaration of a HASH OF COMPLEX RECORDS |
| 725 | |
| 726 | %TV = ( |
| 727 | flintstones => { |
| 728 | series => "flintstones", |
| 729 | nights => [ qw(monday thursday friday) ], |
| 730 | members => [ |
| 731 | { name => "fred", role => "lead", age => 36, }, |
| 732 | { name => "wilma", role => "wife", age => 31, }, |
| 733 | { name => "pebbles", role => "kid", age => 4, }, |
| 734 | ], |
| 735 | }, |
| 736 | |
| 737 | jetsons => { |
| 738 | series => "jetsons", |
| 739 | nights => [ qw(wednesday saturday) ], |
| 740 | members => [ |
| 741 | { name => "george", role => "lead", age => 41, }, |
| 742 | { name => "jane", role => "wife", age => 39, }, |
| 743 | { name => "elroy", role => "kid", age => 9, }, |
| 744 | ], |
| 745 | }, |
| 746 | |
| 747 | simpsons => { |
| 748 | series => "simpsons", |
| 749 | nights => [ qw(monday) ], |
| 750 | members => [ |
| 751 | { name => "homer", role => "lead", age => 34, }, |
| 752 | { name => "marge", role => "wife", age => 37, }, |
| 753 | { name => "bart", role => "kid", age => 11, }, |
| 754 | ], |
| 755 | }, |
| 756 | ); |
| 757 | |
| 758 | =head2 Generation of a HASH OF COMPLEX RECORDS |
| 759 | |
| 760 | # reading from file |
| 761 | # this is most easily done by having the file itself be |
| 762 | # in the raw data format as shown above. perl is happy |
| 763 | # to parse complex data structures if declared as data, so |
| 764 | # sometimes it's easiest to do that |
| 765 | |
| 766 | # here's a piece by piece build up |
| 767 | $rec = {}; |
| 768 | $rec->{series} = "flintstones"; |
| 769 | $rec->{nights} = [ find_days() ]; |
| 770 | |
| 771 | @members = (); |
| 772 | # assume this file in field=value syntax |
| 773 | while (<>) { |
| 774 | %fields = split /[\s=]+/; |
| 775 | push @members, { %fields }; |
| 776 | } |
| 777 | $rec->{members} = [ @members ]; |
| 778 | |
| 779 | # now remember the whole thing |
| 780 | $TV{ $rec->{series} } = $rec; |
| 781 | |
| 782 | ########################################################### |
| 783 | # now, you might want to make interesting extra fields that |
| 784 | # include pointers back into the same data structure so if |
| 785 | # you change one piece, it changes everywhere, like for example |
| 786 | # if you wanted a {kids} field that was a reference |
| 787 | # to an array of the kids' records without having duplicate |
| 788 | # records and thus update problems. |
| 789 | ########################################################### |
| 790 | foreach $family (keys %TV) { |
| 791 | $rec = $TV{$family}; # temp pointer |
| 792 | @kids = (); |
| 793 | for $person ( @{ $rec->{members} } ) { |
| 794 | if ($person->{role} =~ /kid|son|daughter/) { |
| 795 | push @kids, $person; |
| 796 | } |
| 797 | } |
| 798 | # REMEMBER: $rec and $TV{$family} point to same data!! |
| 799 | $rec->{kids} = [ @kids ]; |
| 800 | } |
| 801 | |
| 802 | # you copied the array, but the array itself contains pointers |
| 803 | # to uncopied objects. this means that if you make bart get |
| 804 | # older via |
| 805 | |
| 806 | $TV{simpsons}{kids}[0]{age}++; |
| 807 | |
| 808 | # then this would also change in |
| 809 | print $TV{simpsons}{members}[2]{age}; |
| 810 | |
| 811 | # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] |
| 812 | # both point to the same underlying anonymous hash table |
| 813 | |
| 814 | # print the whole thing |
| 815 | foreach $family ( keys %TV ) { |
| 816 | print "the $family"; |
| 817 | print " is on during @{ $TV{$family}{nights} }\n"; |
| 818 | print "its members are:\n"; |
| 819 | for $who ( @{ $TV{$family}{members} } ) { |
| 820 | print " $who->{name} ($who->{role}), age $who->{age}\n"; |
| 821 | } |
| 822 | ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} }; |
| 823 | print "it turns out that $lead->{name} has ", scalar ( @{ $TV{$family}{kids} } ), " kids named "; |
| 824 | print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } ); |
| 825 | print "\n"; |
| 826 | } |
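The sharing described in the comments above can be checked with a cut-down record. This sketch assumes a much smaller C<%TV> than the one declared earlier, with only the fields it needs.

```perl
use strict;
use warnings;

my %TV = (
    simpsons => {
        members => [
            { name => "homer", role => "lead", age => 34 },
            { name => "bart",  role => "kid",  age => 11 },
        ],
    },
);

# Collect references to the existing kid records, not copies of them.
my @kids = grep { $_->{role} eq "kid" } @{ $TV{simpsons}{members} };
$TV{simpsons}{kids} = [ @kids ];

# Ageing bart through one path is visible through the other.
$TV{simpsons}{kids}[0]{age}++;
print "bart is now $TV{simpsons}{members}[1]{age}\n";
```

Copying the array with C<[ @kids ]> copies only the references it holds, so both paths still lead to the same anonymous hash.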
| 827 | |
| 828 | =head1 Database Ties |
| 829 | |
| 830 | You cannot easily tie a multilevel data structure (such as a hash of |
| 831 | hashes) to a dbm file. The first problem is that all but GDBM and |
| 832 | Berkeley DB have size limitations, but beyond that, you also have problems |
| 833 | with how references are to be represented on disk. One experimental |
| 834 | module that does partially attempt to address this need is the MLDBM |
| 835 | module. Check your nearest CPAN site as described in L<perlmodlib> for |
| 836 | source code to MLDBM. |
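For simply saving a nested structure and reading it back later, the standard Storable module is another option. Note that this is a different technique from MLDBM, shown here only as a sketch; the C<%HoH> data is made up for the example.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane"   },
);

# freeze() flattens the whole nested structure into one string, which
# is the sort of plain scalar a dbm file is able to hold.
my $frozen = freeze( \%HoH );

# thaw() rebuilds an independent copy from that string.
my $copy = thaw($frozen);

print "lead of the jetsons: $copy->{jetsons}{lead}\n";
```

Storable also provides store() and retrieve(), which write the frozen form to a named file and read it back.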
| 837 | |
| 838 | =head1 SEE ALSO |
| 839 | |
| 840 | perlref(1), perllol(1), perldata(1), perlobj(1) |
| 841 | |
| 842 | =head1 AUTHOR |
| 843 | |
| 844 | Tom Christiansen <F<tchrist@perl.com>> |
| 845 | |
| 846 | Last update: |
| 847 | Wed Oct 23 04:57:50 MET DST 1996 |