2 X<data structure> X<complex data structure> X<struct>
4 perldsc - Perl Data Structures Cookbook
8 Perl lets us have complex data structures. You can write something like
9 this and all of a sudden, you'd have an array with three dimensions!
20 Alas, however simple this may appear, underneath it's a much more
21 elaborate construct than meets the eye!
23 How do you print it out? Why can't you say just C<print @AoA>? How do
24 you sort it? How can you pass it to a function or get one of these back
25 from a function? Is it an object? Can you save it to disk to read
26 back later? How do you access whole rows or columns of that matrix? Do
27 all the values have to be numeric?
29 As you see, it's quite easy to become confused. While some small portion
30 of the blame for this can be attributed to the reference-based
31 implementation, it's really more due to a lack of existing documentation with
32 examples designed for the beginner.
34 This document is meant to be a detailed but understandable treatment of the
35 many different sorts of data structures you might want to develop. It
36 should also serve as a cookbook of examples. That way, when you need to
37 create one of these complex data structures, you can just pinch, pilfer, or
38 purloin a drop-in example from here.
40 Let's look at each of these possible constructs in detail. There are separate
41 sections on each of the following:
45 =item * arrays of arrays
47 =item * hashes of arrays
49 =item * arrays of hashes
51 =item * hashes of hashes
53 =item * more elaborate constructs
57 But for now, let's look at general issues common to all
58 these types of data structures.
61 X<reference> X<dereference> X<dereferencing> X<pointer>
63 The most important thing to understand about all data structures in
64 Perl--including multidimensional arrays--is that even though they might
65 appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
66 one-dimensional. They can hold only scalar values (meaning a string,
67 number, or a reference). They cannot directly contain other arrays or
68 hashes, but instead contain I<references> to other arrays or hashes.
69 X<multidimensional array> X<array, multidimensional>
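A minimal sketch of that point, reusing the C<@AoA> example from later in this document. The variable names C<$top_level>, C<$kind>, and C<$row_size> are introduced here only for illustration; C<ref> reports what kind of thing a scalar refers to.

```perl
use strict;
use warnings;

# Each element of @AoA is a scalar holding a reference to an anonymous array.
my @AoA = ( [2, 3], [4, 5, 7], [0] );

my $top_level = scalar @AoA;           # number of scalars at the top level
my $kind      = ref $AoA[1];           # what the second element refers to
my $row_size  = scalar @{ $AoA[1] };   # number of elements in the second row

print "top level holds $top_level scalars\n";
print "element 1 is a reference to: $kind\n";
print "row 1 holds $row_size elements\n";
```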
71 You can't use a reference to an array or hash in quite the same way that you
72 would a real array or hash. For C or C++ programmers unused to
73 distinguishing between arrays and pointers to the same, this can be
74 confusing. If so, just think of it as the difference between a structure
75 and a pointer to a structure.
77 You can (and should) read more about references in L<perlref>.
78 Briefly, references are rather like pointers that know what they
79 point to. (Objects are also a kind of reference, but we won't be needing
80 them right away--if ever.) This means that when you have something which
81 looks to you like an access to a two-or-more-dimensional array and/or hash,
82 what's really going on is that the base type is
83 merely a one-dimensional entity that contains references to the next
84 level. It's just that you can I<use> it as though it were a
two-dimensional one. This resembles the way C programs commonly build
dynamically allocated multidimensional arrays: as an array of pointers to rows.
88 $array[7][12] # array of arrays
89 $array[7]{string} # array of hashes
90 $hash{string}[7] # hash of arrays
91 $hash{string}{'another string'} # hash of hashes
93 Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
95 that doesn't look very nice, like this:
97 my @AoA = ( [2, 3], [4, 5, 7], [0] );
101 ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
104 That's because Perl doesn't (ever) implicitly dereference your variables.
105 If you want to get at the thing a reference is referring to, then you have
106 to do this yourself using either prefix typing indicators, like
107 C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows,
108 like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>.
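As a rough sketch of explicit dereferencing, the following prints the same C<@AoA> one row at a time. The names C<@lines> and C<$elt> are made up for this example.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Dereference each row explicitly before printing it.
my @lines;
for my $aref (@AoA) {
    push @lines, "[ @{$aref} ]";    # prefix form: @{...}
}
print "$_\n" for @lines;

# The arrow form reaches a single element.
my $elt = $AoA[1]->[2];             # same as $AoA[1][2]
print "row 1, column 2 is $elt\n";
```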
110 =head1 COMMON MISTAKES
The two most common mistakes made in constructing something like
an array of arrays are accidentally counting the number of
elements and taking a reference to the same memory location
repeatedly. Here's the case where you just get the count instead
of a nested array:
119 my @array = somefunc($i);
120 $AoA[$i] = @array; # WRONG!
123 That's just the simple case of assigning an array to a scalar and getting
124 its element count. If that's what you really and truly want, then you
125 might do well to consider being a tad more explicit about it, like this:
128 my @array = somefunc($i);
129 $counts[$i] = scalar @array;
Here's the case of taking a reference to the same memory location
again and again:
135 # Either without strict or having an outer-scope my @array;
139 @array = somefunc($i);
140 $AoA[$i] = \@array; # WRONG!
143 So, what's the big problem with that? It looks right, doesn't it?
144 After all, I just told you that you need an array of references, so by
145 golly, you've made me one!
147 Unfortunately, while this is true, it's still broken. All the references
148 in @AoA refer to the I<very same place>, and they will therefore all hold
149 whatever was last in @array! It's similar to the problem demonstrated in
150 the following C program:
154 struct passwd *getpwnam(), *rp, *dp;
155 rp = getpwnam("root");
156 dp = getpwnam("daemon");
158 printf("daemon name is %s\nroot name is %s\n",
159 dp->pw_name, rp->pw_name);
164 daemon name is daemon
167 The problem is that both C<rp> and C<dp> are pointers to the same location
168 in memory! In C, you'd have to remember to malloc() yourself some new
169 memory. In Perl, you'll want to use the array constructor C<[]> or the
170 hash constructor C<{}> instead. Here's the right way to do the preceding
171 broken code fragments:
174 # Either without strict or having an outer-scope my @array;
178 @array = somefunc($i);
179 $AoA[$i] = [ @array ];
The square brackets make a reference to a new array with a I<copy>
of what's in @array at the time of the assignment. This is what
you want.
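The difference between the broken and the corrected assignment can be seen side by side in a small sketch. Here C<somefunc()> is a hypothetical stand-in that returns a two-element list, and C<@shared> and C<@copied> are names invented for the comparison.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in that returns a small list.
sub somefunc { my ($i) = @_; return ( $i, $i * 10 ) }

my @array;                       # one outer-scope array, reused every pass
my ( @shared, @copied );
for my $i ( 0 .. 2 ) {
    @array      = somefunc($i);
    $shared[$i] = \@array;       # WRONG: every slot refers to @array
    $copied[$i] = [ @array ];    # right: a fresh copy each time
}

print "shared row 0: @{ $shared[0] }\n";   # shows whatever was stored last
print "copied row 0: @{ $copied[0] }\n";   # shows what was stored first
```

Every element of C<@shared> holds the same reference, so all of its rows change together; each element of C<@copied> owns its data.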
Note that this will produce something similar, but it's
much harder to read:
189 # Either without strict or having an outer-scope my @array;
193 @{$AoA[$i]} = @array;
196 Is it the same? Well, maybe so--and maybe not. The subtle difference
197 is that when you assign something in square brackets, you know for sure
198 it's always a brand new reference with a new I<copy> of the data.
199 Something else could be going on in this new case with the C<@{$AoA[$i]}>
200 dereference on the left-hand-side of the assignment. It all depends on
201 whether C<$AoA[$i]> had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
references, as in
205 $AoA[3] = \@another_array;
207 Then the assignment with the indirection on the left-hand-side would
208 use the existing reference that was already there:
212 Of course, this I<would> have the "interesting" effect of clobbering
213 @another_array. (Have you ever noticed how when a programmer says
214 something is "interesting", that rather than meaning "intriguing",
215 they're disturbingly more apt to mean that it's "annoying",
216 "difficult", or both? :-)
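A short sketch of both outcomes, assuming C<@another_array> and C<@array> hold some sample values chosen for this example:

```perl
use strict;
use warnings;

my @another_array = ( "a", "b", "c" );
my @array         = ( 1, 2 );
my @AoA;

$AoA[3] = \@another_array;   # slot 3 already holds a reference
@{ $AoA[3] } = @array;       # assigns through that existing reference
@{ $AoA[0] } = @array;       # slot 0 was undefined: a new array springs up

print "another_array is now: @another_array\n";
print "row 0 is: @{ $AoA[0] }\n";
```

The first assignment overwrites C<@another_array> itself, while the second creates a new anonymous array that nothing else refers to.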
So just remember always to use the array or hash constructors with C<[]>
or C<{}>, and you'll be fine, although it's not always optimally
efficient.
222 Surprisingly, the following dangerous-looking construct will
223 actually work out fine:
226 my @array = somefunc($i);
230 That's because my() is more of a run-time statement than it is a
231 compile-time declaration I<per se>. This means that the my() variable is
232 remade afresh each time through the loop. So even though it I<looks> as
233 though you stored the same variable reference each time, you actually did
234 not! This is a subtle distinction that can produce more efficient code at
235 the risk of misleading all but the most experienced of programmers. So I
236 usually advise against teaching it to beginners. In fact, except for
237 passing arguments to functions, I seldom like to see the gimme-a-reference
238 operator (backslash) used much at all in code. Instead, I advise
239 beginners that they (and most of the rest of us) should try to use the
240 much more easily understood constructors C<[]> and C<{}> instead of
241 relying upon lexical (or dynamic) scoping and hidden reference-counting to
242 do the right thing behind the scenes.
246 $AoA[$i] = [ @array ]; # usually best
247 $AoA[$i] = \@array; # perilous; just how my() was that array?
248 @{ $AoA[$i] } = @array; # way too tricky for most programmers
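To see why a my() declared inside the loop makes the backslash form safe, here is a small sketch. C<somefunc()> is again a hypothetical stand-in.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in that returns a small list.
sub somefunc { my ($i) = @_; return ( $i, $i + 1 ) }

my @AoA;
for my $i ( 0 .. 2 ) {
    my @array = somefunc($i);   # a new @array each time through
    $AoA[$i] = \@array;         # so each reference is distinct
}

print "row $_: @{ $AoA[$_] }\n" for 0 .. $#AoA;
```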
251 =head1 CAVEAT ON PRECEDENCE
252 X<dereference, precedence> X<dereferencing, precedence>
Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:
258 $aref->[2][2] # clear
259 $$aref[2][2] # confusing
261 That's because Perl's precedence rules on its five prefix dereferencers
262 (which look like someone swearing: C<$ @ * % &>) make them bind more
263 tightly than the postfix subscripting brackets or braces! This will no
264 doubt come as a great shock to the C or C++ programmer, who is quite
265 accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th>
266 element of C<a>. That is, they first take the subscript, and only then
267 dereference the thing at that subscript. That's fine in C, but this isn't C.
The seemingly equivalent construct in Perl, C<$$aref[$i]>, first
dereferences $aref, treating it as a reference to an
array, and only then applies the subscript, giving you the I<i'th> value
of the array pointed to by $aref. If you wanted the C notion, you'd have to
write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first
before the leading C<$> dereferencer.
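The binding order can be checked with a small sketch. The 3x3 contents of C<@AoA> and the result names are chosen only for this illustration.

```perl
use strict;
use warnings;

my @AoA  = ( [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] );
my $aref = \@AoA;

# All three spellings reach the same element through $aref.
my $arrow  = $aref->[2][2];
my $braces = ${$aref}[2][2];
my $bare   = $$aref[2][2];      # $ binds to $aref first, then subscripts

# Braces force $AoA[1] to be evaluated before the leading $ applies.
my $forced = ${ $AoA[1] }[0];   # same as $AoA[1][0]

print "$arrow $braces $bare $forced\n";
```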
276 =head1 WHY YOU SHOULD ALWAYS C<use strict>
278 If this is starting to sound scarier than it's worth, relax. Perl has
279 some features to help you avoid its most common pitfalls. The best
280 way to avoid getting confused is to start every program like this:
This way, you'll be forced to declare all your variables with my(), and
accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done
this:
290 [ "fred", "barney", "pebbles", "bambam", "dino", ],
291 [ "homer", "bart", "marge", "maggie", ],
292 [ "george", "jane", "elroy", "judy", ],
297 The compiler would immediately flag that as an error I<at compile time>,
298 because you were accidentally accessing C<@aref>, an undeclared
299 variable, and it would thereby remind you to write instead:
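The sketch below is only an illustration of that check: it traps the mistaken form in a string C<eval> so that the compile-time complaint can be inspected, then uses the arrow form. The shortened data and the names C<$ok>, C<$error>, and C<$name> are assumptions made for this example.

```perl
use strict;
use warnings;

# Compile the mistaken form inside a string eval so the error can be caught.
my $ok = eval q{
    use strict;
    my $aref = [ [ "fred" ], [ "homer" ], [ "george", "jane", "elroy" ] ];
    my $name = $aref[2][2];     # oops: this subscripts @aref, not $aref
    1;
};
my $error = $ok ? "" : $@;
print "strict refused it: $error" unless $ok;

# The arrow form dereferences the scalar $aref as intended.
my $aref = [ [ "fred" ], [ "homer" ], [ "george", "jane", "elroy" ] ];
my $name = $aref->[2][2];
print "name is $name\n";
```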
304 X<data structure, debugging> X<complex data structure, debugging>
305 X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
306 X<array of arrays, debugging> X<hash of arrays, debugging>
307 X<array of hashes, debugging> X<hash of hashes, debugging>
309 You can use the debugger's C<x> command to dump out complex data structures.
310 For example, given the assignment to $AoA above, here's the debugger output:
313 $AoA = ARRAY(0x13b5a0)
Presented with little comment (these will get their own manpages someday),
here are short code examples illustrating access to various
types of data structures.
337 =head1 ARRAYS OF ARRAYS
338 X<array of arrays> X<AoA>
340 =head2 Declaration of an ARRAY OF ARRAYS
343 [ "fred", "barney" ],
344 [ "george", "jane", "elroy" ],
345 [ "homer", "marge", "bart" ],
348 =head2 Generation of an ARRAY OF ARRAYS
352 push @AoA, [ split ];
357 $AoA[$i] = [ somefunc($i) ];
366 # add to an existing row
367 push @{ $AoA[0] }, "wilma", "betty";
369 =head2 Access and Printing of an ARRAY OF ARRAYS
375 $AoA[1][1] =~ s/(\w)/\u$1/;
377 # print the whole thing with refs
379 print "\t [ @$aref ],\n";
382 # print the whole thing with indices
383 for $i ( 0 .. $#AoA ) {
384 print "\t [ @{$AoA[$i]} ],\n";
387 # print the whole thing one at a time
388 for $i ( 0 .. $#AoA ) {
389 for $j ( 0 .. $#{ $AoA[$i] } ) {
390 print "elt $i $j is $AoA[$i][$j]\n";
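Whole rows and columns can be pulled out along the same lines. This is a sketch using the sample data from the declaration above; C<@row>, C<@column>, and C<@pair> are names invented here.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart" ],
);

# A whole row is just the dereferenced inner array.
my @row = @{ $AoA[1] };

# A column has to be gathered one row at a time.
my @column = map { $_->[0] } @AoA;

# A slice of one row.
my @pair = @{ $AoA[2] }[ 1, 2 ];

print "row 1: @row\n";
print "column 0: @column\n";
print "slice: @pair\n";
```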
394 =head1 HASHES OF ARRAYS
395 X<hash of arrays> X<HoA>
397 =head2 Declaration of a HASH OF ARRAYS
400 flintstones => [ "fred", "barney" ],
401 jetsons => [ "george", "jane", "elroy" ],
402 simpsons => [ "homer", "marge", "bart" ],
405 =head2 Generation of a HASH OF ARRAYS
408 # flintstones: fred barney wilma dino
410 next unless s/^(.*?):\s*//;
411 $HoA{$1} = [ split ];
414 # reading from file; more temps
415 # flintstones: fred barney wilma dino
416 while ( $line = <> ) {
417 ($who, $rest) = split /:\s*/, $line, 2;
418 @fields = split ' ', $rest;
419 $HoA{$who} = [ @fields ];
422 # calling a function that returns a list
423 for $group ( "simpsons", "jetsons", "flintstones" ) {
424 $HoA{$group} = [ get_family($group) ];
427 # likewise, but using temps
428 for $group ( "simpsons", "jetsons", "flintstones" ) {
429 @members = get_family($group);
430 $HoA{$group} = [ @members ];
433 # append new members to an existing family
434 push @{ $HoA{"flintstones"} }, "wilma", "betty";
436 =head2 Access and Printing of a HASH OF ARRAYS
439 $HoA{flintstones}[0] = "Fred";
442 $HoA{simpsons}[1] =~ s/(\w)/\u$1/;
444 # print the whole thing
445 foreach $family ( keys %HoA ) {
446 print "$family: @{ $HoA{$family} }\n"
449 # print the whole thing with indices
450 foreach $family ( keys %HoA ) {
452 foreach $i ( 0 .. $#{ $HoA{$family} } ) {
453 print " $i = $HoA{$family}[$i]";
458 # print the whole thing sorted by number of members
459 foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) {
460 print "$family: @{ $HoA{$family} }\n"
463 # print the whole thing sorted by number of members and name
464 foreach $family ( sort {
465 @{$HoA{$b}} <=> @{$HoA{$a}}
470 print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
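Put together, the hash-of-arrays fragments above might look like the following self-contained sketch. The input lines are held in a hypothetical C<@lines> array instead of being read from a file, and ties in family size are broken by name, which is an assumption of this example.

```perl
use strict;
use warnings;

# Sample input lines in the "family: member member ..." format.
my @lines = (
    "flintstones: fred barney wilma dino",
    "jetsons: george jane elroy",
    "simpsons: homer marge bart",
);

my %HoA;
for my $line (@lines) {
    my ( $who, $rest ) = split /:\s*/, $line, 2;
    $HoA{$who} = [ split ' ', $rest ];
}
push @{ $HoA{jetsons} }, "judy";   # append to an existing family

# Largest family first; ties broken by family name.
my @order = sort { @{ $HoA{$b} } <=> @{ $HoA{$a} } or $a cmp $b } keys %HoA;
for my $family (@order) {
    print "$family: ", join( ", ", sort @{ $HoA{$family} } ), "\n";
}
```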
473 =head1 ARRAYS OF HASHES
474 X<array of hashes> X<AoH>
476 =head2 Declaration of an ARRAY OF HASHES
495 =head2 Generation of an ARRAY OF HASHES
498 # format: LEAD=fred FRIEND=barney
501 for $field ( split ) {
502 ($key, $value) = split /=/, $field;
503 $rec->{$key} = $value;
510 # format: LEAD=fred FRIEND=barney
        push @AoH, { split /[\s=]+/ };
516 # calling a function that returns a key/value pair list, like
517 # "lead","fred","daughter","pebbles"
518 while ( %fields = getnextpairset() ) {
519 push @AoH, { %fields };
522 # likewise, but using no temp vars
524 push @AoH, { parsepairs($_) };
527 # add key/value to an element
528 $AoH[0]{pet} = "dino";
529 $AoH[2]{pet} = "santa's little helper";
531 =head2 Access and Printing of an ARRAY OF HASHES
534 $AoH[0]{lead} = "fred";
537 $AoH[1]{lead} =~ s/(\w)/\u$1/;
539 # print the whole thing with refs
542 for $role ( keys %$href ) {
543 print "$role=$href->{$role} ";
548 # print the whole thing with indices
549 for $i ( 0 .. $#AoH ) {
551 for $role ( keys %{ $AoH[$i] } ) {
552 print "$role=$AoH[$i]{$role} ";
557 # print the whole thing one at a time
558 for $i ( 0 .. $#AoH ) {
559 for $role ( keys %{ $AoH[$i] } ) {
560 print "elt $i $role is $AoH[$i]{$role}\n";
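A self-contained sketch of building and printing an array of hashes follows. The records live in a hypothetical C<@lines> array, and lowercasing the keys is an arbitrary choice made for this example.

```perl
use strict;
use warnings;

# Sample records in the "KEY=value KEY=value" format.
my @lines = ( "LEAD=fred FRIEND=barney", "LEAD=homer WIFE=marge" );

my @AoH;
for my $line (@lines) {
    my $rec = {};
    for my $field ( split ' ', $line ) {
        my ( $key, $value ) = split /=/, $field;
        $rec->{ lc $key } = $value;     # lowercase keys, an arbitrary choice
    }
    push @AoH, $rec;
}
$AoH[0]{pet} = "dino";                  # add a key/value to one element

for my $i ( 0 .. $#AoH ) {
    my $href = $AoH[$i];
    my @pairs = map { $_ . "=" . $href->{$_} } sort keys %$href;
    print "$i: @pairs\n";
}
```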
564 =head1 HASHES OF HASHES
565 X<hash of hashes> X<HoH>
567 =head2 Declaration of a HASH OF HASHES
577 "his boy" => "elroy",
586 =head2 Generation of a HASH OF HASHES
589 # flintstones: lead=fred pal=barney wife=wilma pet=dino
591 next unless s/^(.*?):\s*//;
593 for $field ( split ) {
594 ($key, $value) = split /=/, $field;
595 $HoH{$who}{$key} = $value;
599 # reading from file; more temps
601 next unless s/^(.*?):\s*//;
605 for $field ( split ) {
606 ($key, $value) = split /=/, $field;
607 $rec->{$key} = $value;
611 # calling a function that returns a key,value hash
612 for $group ( "simpsons", "jetsons", "flintstones" ) {
613 $HoH{$group} = { get_family($group) };
616 # likewise, but using temps
617 for $group ( "simpsons", "jetsons", "flintstones" ) {
618 %members = get_family($group);
619 $HoH{$group} = { %members };
622 # append new members to an existing family
628 for $what (keys %new_folks) {
629 $HoH{flintstones}{$what} = $new_folks{$what};
632 =head2 Access and Printing of a HASH OF HASHES
635 $HoH{flintstones}{wife} = "wilma";
638 $HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
640 # print the whole thing
641 foreach $family ( keys %HoH ) {
643 for $role ( keys %{ $HoH{$family} } ) {
644 print "$role=$HoH{$family}{$role} ";
649 # print the whole thing somewhat sorted
650 foreach $family ( sort keys %HoH ) {
652 for $role ( sort keys %{ $HoH{$family} } ) {
653 print "$role=$HoH{$family}{$role} ";
659 # print the whole thing sorted by number of members
660 foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} }
664 for $role ( sort keys %{ $HoH{$family} } ) {
665 print "$role=$HoH{$family}{$role} ";
670 # establish a sort order (rank) for each role
672 for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
674 # now print the whole thing sorted by number of members
675 foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } }
679 # and print these according to rank order
680 for $role ( sort { $rank{$a} <=> $rank{$b} }
681 keys %{ $HoH{$family} } )
683 print "$role=$HoH{$family}{$role} ";
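The ranked printing above can be tried out with a small self-contained sketch. The sample C<%HoH> contents are chosen for this example, and breaking ties by family name is an added assumption.

```perl
use strict;
use warnings;

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney", pet => "dino" },
    jetsons     => { lead => "george", wife => "jane" },
    simpsons    => {
        lead => "homer", wife => "marge", son => "bart",
        pet  => "santa's little helper",
    },
);

# Establish a rank for each role so roles print in a fixed order.
my ( %rank, $i );
$rank{$_} = ++$i for qw(lead wife son daughter pal pet);

# Families with the most members first, names breaking ties.
my @families = sort {
    keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } or $a cmp $b
} keys %HoH;

for my $family (@families) {
    my @roles = sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} };
    print "$family: ", join( " ", map { $_ . "=" . $HoH{$family}{$_} } @roles ), "\n";
}
```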
689 =head1 MORE ELABORATE RECORDS
690 X<record> X<structure> X<struct>
692 =head2 Declaration of MORE ELABORATE RECORDS
694 Here's a sample showing how to create and use a record whose fields are of
695 many different sorts:
699 SEQUENCE => [ @old_values ],
700 LOOKUP => { %some_table },
701 THATCODE => \&some_function,
702 THISCODE => sub { $_[0] ** $_[1] },
708 print $rec->{SEQUENCE}[0];
    $last = pop @{ $rec->{SEQUENCE} };
711 print $rec->{LOOKUP}{"key"};
712 ($first_k, $first_v) = each %{ $rec->{LOOKUP} };
714 $answer = $rec->{THATCODE}->($arg);
715 $answer = $rec->{THISCODE}->($arg1, $arg2);
717 # careful of extra block braces on fh ref
718 print { $rec->{HANDLE} } "a string\n";
721 $rec->{HANDLE}->autoflush(1);
722 $rec->{HANDLE}->print(" a string\n");
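For a version that runs on its own, here is a sketch with stand-ins filled in. C<@old_values>, C<%some_table>, and C<some_function()> are given hypothetical contents, and an in-memory filehandle is assumed for the C<HANDLE> field so that no real file is needed.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the names used in the record above.
my @old_values = ( 10, 20, 30 );
my %some_table = ( key => "value" );
sub some_function { return "got: @_" }

# An in-memory filehandle keeps the sketch free of real files.
my $buffer = "";
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";

my $rec = {
    SEQUENCE => [ @old_values ],
    LOOKUP   => { %some_table },
    THATCODE => \&some_function,
    THISCODE => sub { $_[0] ** $_[1] },
    HANDLE   => $fh,
};

my $last   = pop @{ $rec->{SEQUENCE} };
my $found  = $rec->{LOOKUP}{key};
my $answer = $rec->{THISCODE}->( 2, 5 );
my $reply  = $rec->{THATCODE}->("hello");

# The block braces around the handle expression are required here.
print { $rec->{HANDLE} } "a string\n";
close $fh;

print "$last $found $answer $reply\n";
print "handle received: $buffer";
```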
724 =head2 Declaration of a HASH OF COMPLEX RECORDS
728 series => "flintstones",
729 nights => [ qw(monday thursday friday) ],
731 { name => "fred", role => "lead", age => 36, },
732 { name => "wilma", role => "wife", age => 31, },
733 { name => "pebbles", role => "kid", age => 4, },
739 nights => [ qw(wednesday saturday) ],
741 { name => "george", role => "lead", age => 41, },
742 { name => "jane", role => "wife", age => 39, },
743 { name => "elroy", role => "kid", age => 9, },
748 series => "simpsons",
749 nights => [ qw(monday) ],
751 { name => "homer", role => "lead", age => 34, },
752 { name => "marge", role => "wife", age => 37, },
753 { name => "bart", role => "kid", age => 11, },
758 =head2 Generation of a HASH OF COMPLEX RECORDS
761 # this is most easily done by having the file itself be
762 # in the raw data format as shown above. perl is happy
763 # to parse complex data structures if declared as data, so
764 # sometimes it's easiest to do that
766 # here's a piece by piece build up
768 $rec->{series} = "flintstones";
769 $rec->{nights} = [ find_days() ];
772 # assume this file in field=value syntax
774 %fields = split /[\s=]+/;
775 push @members, { %fields };
777 $rec->{members} = [ @members ];
779 # now remember the whole thing
780 $TV{ $rec->{series} } = $rec;
782 ###########################################################
783 # now, you might want to make interesting extra fields that
    # include pointers back into the same data structure so that if
    # you change one piece, it changes everywhere, like for example
786 # if you wanted a {kids} field that was a reference
787 # to an array of the kids' records without having duplicate
788 # records and thus update problems.
789 ###########################################################
790 foreach $family (keys %TV) {
791 $rec = $TV{$family}; # temp pointer
793 for $person ( @{ $rec->{members} } ) {
794 if ($person->{role} =~ /kid|son|daughter/) {
798 # REMEMBER: $rec and $TV{$family} point to same data!!
799 $rec->{kids} = [ @kids ];
802 # you copied the array, but the array itself contains pointers
    # to uncopied objects. this means that if you make bart get
    # older via
806 $TV{simpsons}{kids}[0]{age}++;
808 # then this would also change in
809 print $TV{simpsons}{members}[2]{age};
811 # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
812 # both point to the same underlying anonymous hash table
814 # print the whole thing
815 foreach $family ( keys %TV ) {
817 print " is on during @{ $TV{$family}{nights} }\n";
818 print "its members are:\n";
819 for $who ( @{ $TV{$family}{members} } ) {
820 print " $who->{name} ($who->{role}), age $who->{age}\n";
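        # the lead is one of the member records, not a key of its own
        my ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";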
823 print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
824 print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
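The sharing between the C<members> and C<kids> fields can be confirmed with a reduced sketch. Only the simpsons record is included, and C<$age_via_members> is a name made up for the check.

```perl
use strict;
use warnings;

my %TV = (
    simpsons => {
        series  => "simpsons",
        members => [
            { name => "homer", role => "lead", age => 34 },
            { name => "marge", role => "wife", age => 37 },
            { name => "bart",  role => "kid",  age => 11 },
        ],
    },
);

# Collect references to the kids' existing records; nothing is duplicated.
for my $family ( keys %TV ) {
    my $rec  = $TV{$family};
    my @kids = grep { $_->{role} =~ /kid|son|daughter/ } @{ $rec->{members} };
    $rec->{kids} = [ @kids ];    # copies the list of references only
}

$TV{simpsons}{kids}[0]{age}++;   # bart gets older through the kids view

my $age_via_members = $TV{simpsons}{members}[2]{age};
print "bart's age via members: $age_via_members\n";
```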
830 You cannot easily tie a multilevel data structure (such as a hash of
831 hashes) to a dbm file. The first problem is that all but GDBM and
832 Berkeley DB have size limitations, but beyond that, you also have problems
833 with how references are to be represented on disk. One experimental
834 module that does partially attempt to address this need is the MLDBM
835 module. Check your nearest CPAN site as described in L<perlmodlib> for
836 source code to MLDBM.
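If tying is not required and you only need to save a structure and read it back later, one alternative is the Storable module. This is a different technique from the MLDBM tie described above, sketched here under the assumption that a whole-structure dump to a single file is acceptable. The sample C<%HoH> and the temporary file are stand-ins for this example.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my %HoH = (
    flintstones => { lead => "fred",  pal  => "barney" },
    simpsons    => { lead => "homer", wife => "marge" },
);

# A temporary file stands in for wherever the data would really live.
my ( $tmp_fh, $filename ) = tempfile( UNLINK => 1 );
close $tmp_fh;

store( \%HoH, $filename );            # serialize the whole nested structure
my $restored = retrieve($filename);   # returns a reference to a deep copy

print "restored lead: $restored->{simpsons}{lead}\n";
```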
840 L<perlref>, L<perllol>, L<perldata>, L<perlobj>
844 Tom Christiansen <F<tchrist@perl.com>>