=head1 NAME
+X<data structure> X<complex data structure> X<struct>
perldsc - Perl Data Structures Cookbook
=head1 DESCRIPTION
-The single feature most sorely lacking in the Perl programming language
-prior to its 5.0 release was complex data structures. Even without direct
-language support, some valiant programmers did manage to emulate them, but
-it was hard work and not for the faint of heart. You could occasionally
-get away with the C<$m{$AoA,$b}> notation borrowed from B<awk> in which the
-keys are actually more like a single concatenated string C<"$AoA$b">, but
-traversal and sorting were difficult. More desperate programmers even
-hacked Perl's internal symbol table directly, a strategy that proved hard
-to develop and maintain--to put it mildly.
-
-The 5.0 release of Perl let us have complex data structures. You
-may now write something like this and all of a sudden, you'd have a array
-with three dimensions!
-
- for $x (1 .. 10) {
- for $y (1 .. 10) {
- for $z (1 .. 10) {
- $AoA[$x][$y][$z] =
- $x ** $y + $z;
- }
- }
+Perl lets us have complex data structures. You can write something like
+this and all of a sudden, you have an array with three dimensions!
+
+ for my $x (1 .. 10) {
+ for my $y (1 .. 10) {
+ for my $z (1 .. 10) {
+ $AoA[$x][$y][$z] =
+ $x ** $y + $z;
+ }
+ }
}
Alas, however simple this may appear, underneath it's a much more
How do you print it out? Why can't you say just C<print @AoA>? How do
you sort it? How can you pass it to a function or get one of these back
-from a function? Is is an object? Can you save it to disk to read
+from a function? Is it an object? Can you save it to disk to read
back later? How do you access whole rows or columns of that matrix? Do
all the values have to be numeric?
these types of data structures.
=head1 REFERENCES
+X<reference> X<dereference> X<dereferencing> X<pointer>
-The most important thing to understand about all data structures in Perl
--- including multidimensional arrays--is that even though they might
+The most important thing to understand about all data structures in
+Perl--including multidimensional arrays--is that even though they might
appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally
one-dimensional. They can hold only scalar values (meaning a string,
number, or a reference). They cannot directly contain other arrays or
hashes, but instead contain I<references> to other arrays or hashes.
+X<multidimensional array> X<array, multidimensional>
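
A minimal sketch of the point above, assuming a small two-row C<@AoA> made up for illustration: each top-level element is a scalar holding a reference, and the inner array is only reached by dereferencing it.

```perl
use strict;
use warnings;

# A two-row array of arrays; each element of @AoA is a scalar
# holding a reference to an anonymous array.
my @AoA = ( [2, 3], [4, 5, 7] );

my $kind  = ref $AoA[1];          # what kind of thing the element refers to
my @row   = @{ $AoA[1] };         # dereference to copy out the inner array
my $count = scalar @{ $AoA[1] };  # number of elements in that row

print "element type: $kind\n";
print "row 1 has $count elements: @row\n";
```

The C<ref> builtin reports C<ARRAY> here, which is the same tag that shows up when a reference is printed directly.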
-You can't use a reference to a array or hash in quite the same way that you
+You can't use a reference to an array or hash in quite the same way that you
would a real array or hash. For C or C++ programmers unused to
distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure
and a pointer to a structure.
-You can (and should) read more about references in the perlref(1) man
-page. Briefly, references are rather like pointers that know what they
+You can (and should) read more about references in L<perlref>.
+Briefly, references are rather like pointers that know what they
point to. (Objects are also a kind of reference, but we won't be needing
them right away--if ever.) This means that when you have something which
looks to you like an access to a two-or-more-dimensional array and/or hash,
two-dimensional one. This is actually the way almost all C
multidimensional arrays work as well.
- $array[7][12] # array of arrays
- $array[7]{string} # array of hashes
- $hash{string}[7] # hash of arrays
- $hash{string}{'another string'} # hash of hashes
+ $array[7][12] # array of arrays
+ $array[7]{string} # array of hashes
+ $hash{string}[7] # hash of arrays
+ $hash{string}{'another string'} # hash of hashes
Now, because the top level contains only references, if you try to print
out your array with a simple print() function, you'll get something
that doesn't look very nice, like this:
- @AoA = ( [2, 3], [4, 5, 7], [0] );
+ my @AoA = ( [2, 3], [4, 5, 7], [0] );
print $AoA[1][2];
7
print @AoA;
repeatedly. Here's the case where you just get the count instead
of a nested array:
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = @array; # WRONG!
+ for my $i (1..10) {
+ my @array = somefunc($i);
+ $AoA[$i] = @array; # WRONG!
}
That's just the simple case of assigning an array to a scalar and getting
its element count. If that's what you really and truly want, then you
might do well to consider being a tad more explicit about it, like this:
- for $i (1..10) {
- @array = somefunc($i);
- $counts[$i] = scalar @array;
+ for my $i (1..10) {
+ my @array = somefunc($i);
+ $counts[$i] = scalar @array;
}
Here's the case of taking a reference to the same memory location
again and again:
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = \@array; # WRONG!
+ # Either without strict or having an outer-scope my @array;
+ # declaration.
+
+ for my $i (1..10) {
+ @array = somefunc($i);
+ $AoA[$i] = \@array; # WRONG!
}
So, what's the big problem with that? It looks right, doesn't it?
#include <pwd.h>
main() {
- struct passwd *getpwnam(), *rp, *dp;
- rp = getpwnam("root");
- dp = getpwnam("daemon");
+ struct passwd *getpwnam(), *rp, *dp;
+ rp = getpwnam("root");
+ dp = getpwnam("daemon");
- printf("daemon name is %s\nroot name is %s\n",
- dp->pw_name, rp->pw_name);
+ printf("daemon name is %s\nroot name is %s\n",
+ dp->pw_name, rp->pw_name);
}
Which will print
memory. In Perl, you'll want to use the array constructor C<[]> or the
hash constructor C<{}> instead. Here's the right way to do the preceding
broken code fragments:
+X<[]> X<{}>
- for $i (1..10) {
- @array = somefunc($i);
- $AoA[$i] = [ @array ];
+ # Either without strict or having an outer-scope my @array;
+ # declaration.
+
+ for my $i (1..10) {
+ @array = somefunc($i);
+ $AoA[$i] = [ @array ];
}
The square brackets make a reference to a new array with a I<copy>
Note that this will produce something similar, but it's
much harder to read:
- for $i (1..10) {
- @array = 0 .. $i;
- @{$AoA[$i]} = @array;
+ # Either without strict or having an outer-scope my @array;
+ # declaration.
+ for my $i (1..10) {
+ @array = 0 .. $i;
+ @{$AoA[$i]} = @array;
}
Is it the same? Well, maybe so--and maybe not. The subtle difference
is that when you assign something in square brackets, you know for sure
it's always a brand new reference with a new I<copy> of the data.
-Something else could be going on in this new case with the C<@{$AoA[$i]}}>
+Something else could be going on in this new case with the C<@{$AoA[$i]}>
dereference on the left-hand-side of the assignment. It all depends on
whether C<$AoA[$i]> had been undefined to start with, or whether it
already contained a reference. If you had already populated @AoA with
Surprisingly, the following dangerous-looking construct will
actually work out fine:
- for $i (1..10) {
+ for my $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = \@array;
}
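
The loop above can be checked with a small sketch. Here C<somefunc> is a hypothetical stand-in (the text never defines it) that simply returns C<0 .. $n>, and the comparison at the end confirms that every stored reference points to a different array.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the somefunc() used in the text.
sub somefunc { my ($n) = @_; return (0 .. $n) }

my @AoA;
for my $i (1 .. 3) {
    my @array = somefunc($i);   # a new @array on every pass
    $AoA[$i] = \@array;         # keep a reference to this pass's array
}

# References compare numerically by address, so != means
# "not the same underlying array".
my $distinct = $AoA[1] != $AoA[2] && $AoA[2] != $AoA[3];

print "row $_: @{ $AoA[$_] }\n" for 1 .. 3;
print "distinct rows: ", ($distinct ? "yes" : "no"), "\n";
```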
In summary:
- $AoA[$i] = [ @array ]; # usually best
- $AoA[$i] = \@array; # perilous; just how my() was that array?
- @{ $AoA[$i] } = @array; # way too tricky for most programmers
+ $AoA[$i] = [ @array ]; # usually best
+ $AoA[$i] = \@array; # perilous; just how my() was that array?
+ @{ $AoA[$i] } = @array; # way too tricky for most programmers
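
To make the first two lines of that summary concrete, here is a hedged sketch that runs both forms side by side. The names C<@copies> and C<@shared> are invented for this illustration, and C<@array> is deliberately declared once, outside the loop.

```perl
use strict;
use warnings;

my @array;                      # declared once, outside the loop
my (@copies, @shared);

for my $i (1 .. 3) {
    @array = (0 .. $i);
    $copies[$i] = [ @array ];   # new anonymous array holding a copy
    $shared[$i] = \@array;      # reference to the same @array each time
}

print "copies[1]: @{ $copies[1] }\n";   # 0 1
print "shared[1]: @{ $shared[1] }\n";   # 0 1 2 3
```

Every element of C<@shared> refers to the one outer C<@array>, so each of them shows whatever was assigned last, while each element of C<@copies> keeps the values from its own pass.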
=head1 CAVEAT ON PRECEDENCE
+X<dereference, precedence> X<dereferencing, precedence>
Speaking of things like C<@{$AoA[$i]}>, the following are actually the
same thing:
+X<< -> >>
- $aref->[2][2] # clear
- $$aref[2][2] # confusing
+ $aref->[2][2] # clear
+ $$aref[2][2] # confusing
That's because Perl's precedence rules on its five prefix dereferencers
(which look like someone swearing: C<$ @ * % &>) make them bind more
this:
my $aref = [
- [ "fred", "barney", "pebbles", "bambam", "dino", ],
- [ "homer", "bart", "marge", "maggie", ],
- [ "george", "jane", "elroy", "judy", ],
+ [ "fred", "barney", "pebbles", "bambam", "dino", ],
+ [ "homer", "bart", "marge", "maggie", ],
+ [ "george", "jane", "elroy", "judy", ],
];
print $aref[2][2];
print $aref->[2][2]
=head1 DEBUGGING
+X<data structure, debugging> X<complex data structure, debugging>
+X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging>
+X<array of arrays, debugging> X<hash of arrays, debugging>
+X<array of hashes, debugging> X<hash of hashes, debugging>
-Before version 5.002, the standard Perl debugger didn't do a very nice job of
-printing out complex data structures. With 5.002 or above, the
-debugger includes several new features, including command line editing as
-well as the C<x> command to dump out complex data structures. For
-example, given the assignment to $AoA above, here's the debugger output:
+You can use the debugger's C<x> command to dump out complex data structures.
+For example, given the assignment to $AoA above, here's the debugger output:
DB<1> x $AoA
$AoA = ARRAY(0x13b5a0)
0 ARRAY(0x1f0a24)
- 0 'fred'
- 1 'barney'
- 2 'pebbles'
- 3 'bambam'
- 4 'dino'
+ 0 'fred'
+ 1 'barney'
+ 2 'pebbles'
+ 3 'bambam'
+ 4 'dino'
1 ARRAY(0x13b558)
- 0 'homer'
- 1 'bart'
- 2 'marge'
- 3 'maggie'
+ 0 'homer'
+ 1 'bart'
+ 2 'marge'
+ 3 'maggie'
2 ARRAY(0x13b540)
- 0 'george'
- 1 'jane'
- 2 'elroy'
- 3 'judy'
+ 0 'george'
+ 1 'jane'
+ 2 'elroy'
+ 3 'judy'
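
Outside the debugger, a comparable dump can be produced with the core C<Data::Dumper> module. This is an addition to the text rather than something it covers; the sketch below assumes the same three-row structure shown in the precedence section.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;   # fixed, compact indentation

my $aref = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
    [ "george", "jane", "elroy", "judy" ],
];

# Dumper() returns the structure as a string of Perl source.
my $dump = Dumper($aref);
print $dump;
```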
=head1 CODE EXAMPLES
types of data structures.
=head1 ARRAYS OF ARRAYS
+X<array of arrays> X<AoA>
-=head2 Declaration of a ARRAY OF ARRAYS
+=head2 Declaration of an ARRAY OF ARRAYS
@AoA = (
[ "fred", "barney" ],
[ "homer", "marge", "bart" ],
);
-=head2 Generation of a ARRAY OF ARRAYS
+=head2 Generation of an ARRAY OF ARRAYS
# reading from file
while ( <> ) {
# add to an existing row
push @{ $AoA[0] }, "wilma", "betty";
-=head2 Access and Printing of a ARRAY OF ARRAYS
+=head2 Access and Printing of an ARRAY OF ARRAYS
# one element
$AoA[0][0] = "Fred";
}
=head1 HASHES OF ARRAYS
+X<hash of arrays> X<HoA>
=head2 Declaration of a HASH OF ARRAYS
# print the whole thing sorted by number of members and name
foreach $family ( sort {
- @{$HoA{$b}} <=> @{$HoA{$a}}
- ||
- $a cmp $b
- } keys %HoA )
+ @{$HoA{$b}} <=> @{$HoA{$a}}
+ ||
+ $a cmp $b
+ } keys %HoA )
{
print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n";
}
=head1 ARRAYS OF HASHES
+X<array of hashes> X<AoH>
-=head2 Declaration of a ARRAY OF HASHES
+=head2 Declaration of an ARRAY OF HASHES
@AoH = (
{
}
);
-=head2 Generation of a ARRAY OF HASHES
+=head2 Generation of an ARRAY OF HASHES
# reading from file
# format: LEAD=fred FRIEND=barney
$AoH[0]{pet} = "dino";
$AoH[2]{pet} = "santa's little helper";
-=head2 Access and Printing of a ARRAY OF HASHES
+=head2 Access and Printing of an ARRAY OF HASHES
# one element
$AoH[0]{lead} = "fred";
}
=head1 HASHES OF HASHES
+X<hash of hashes> X<HoH>
=head2 Declaration of a HASH OF HASHES
%HoH = (
flintstones => {
- lead => "fred",
- pal => "barney",
+ lead => "fred",
+ pal => "barney",
},
jetsons => {
- lead => "george",
- wife => "jane",
- "his boy" => "elroy",
+ lead => "george",
+ wife => "jane",
+ "his boy" => "elroy",
},
simpsons => {
- lead => "homer",
- wife => "marge",
- kid => "bart",
- },
+ lead => "homer",
+ wife => "marge",
+ kid => "bart",
+ },
);
=head2 Generation of a HASH OF HASHES
# print the whole thing sorted by number of members
- foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
+ foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} }
+ keys %HoH )
+ {
print "$family: { ";
for $role ( sort keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
# now print the whole thing sorted by number of members
- foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
+ foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } }
+ keys %HoH )
+ {
print "$family: { ";
# and print these according to rank order
- for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
+ for $role ( sort { $rank{$a} <=> $rank{$b} }
+ keys %{ $HoH{$family} } )
+ {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
=head1 MORE ELABORATE RECORDS
+X<record> X<structure> X<struct>
=head2 Declaration of MORE ELABORATE RECORDS
many different sorts:
$rec = {
- TEXT => $string,
- SEQUENCE => [ @old_values ],
- LOOKUP => { %some_table },
- THATCODE => \&some_function,
- THISCODE => sub { $_[0] ** $_[1] },
- HANDLE => \*STDOUT,
+ TEXT => $string,
+ SEQUENCE => [ @old_values ],
+ LOOKUP => { %some_table },
+ THATCODE => \&some_function,
+ THISCODE => sub { $_[0] ** $_[1] },
+ HANDLE => \*STDOUT,
};
print $rec->{TEXT};
=head1 SEE ALSO
-perlref(1), perllol(1), perldata(1), perlobj(1)
+L<perlref>, L<perllol>, L<perldata>, L<perlobj>
=head1 AUTHOR
Tom Christiansen <F<tchrist@perl.com>>
-
-Last update:
-Wed Oct 23 04:57:50 MET DST 1996