=head1 NAME

perlreftut - Mark's very short tutorial about references

=head1 DESCRIPTION

One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes.  To enable these, Perl 5 introduced a feature called
`references', and using references is the key to managing complicated,
structured data in Perl.  Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow.  The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.

Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit.  This page will show you that 10%.

=head1 Who Needs Complicated Data Structures?

One problem that came up all the time in Perl 4 was how to represent a
hash whose values were lists.  Perl 4 had hashes, of course, but the
values had to be scalars; they couldn't be lists.  

Why would you want a hash of lists?  Let's take a simple example: You
have a file of city and country names, like this:

	Chicago, USA
	Frankfurt, Germany
	Berlin, Germany
	Washington, USA
	Helsinki, Finland
	New York, USA

and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:

	Finland: Helsinki.
	Germany: Berlin, Frankfurt.
	USA: Chicago, New York, Washington.

The natural way to do this is to have a hash whose keys are country
names.  Associated with each country name key is a list of the cities in
that country.  Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list.  When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.

If hash values can't be lists, you lose.  In Perl 4, hash values can't
be lists; they can only be strings.  You lose.  You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string.  This is messy
and error-prone.  And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.
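
To make the pain concrete, here is a rough sketch of that workaround.
It is written with modern syntax for readability, but it restricts
itself to string hash values.  The sample data in C<@lines> and the
choice of C<|> as a separator are my own assumptions for illustration,
not anything Perl 4 required:

    use strict;
    use warnings;

    # Hypothetical sample data, standing in for the input file.
    my @lines = ("Chicago, USA", "Frankfurt, Germany", "Berlin, Germany",
                 "Washington, USA", "Helsinki, Finland", "New York, USA");

    my %cities_of;      # country => "city1|city2|..." (one flat string)
    for my $line (@lines) {
        my ($city, $country) = split /, /, $line;
        # Append to the string, adding a separator if one is already there.
        $cities_of{$country} = exists($cities_of{$country})
            ? "$cities_of{$country}|$city"
            : $city;
    }

    my @report;
    for my $country (sort keys %cities_of) {
        # Break the string back into a list, sort it, and rejoin it.
        my @cities = split /\|/, $cities_of{$country};
        @cities = sort @cities;
        push @report, "$country: " . join(', ', @cities) . ".";
    }
    print "$_\n" for @report;

It works for this data, but it breaks as soon as a city name contains
the separator character, which is exactly the kind of trap that makes
this approach error-prone.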

=head1 The Solution

By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars.  The solution to this is
references.

A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else).  Names are one kind of
reference that you're already familiar with.  Think of the President
of the United States: a messy, inconvenient bag of blood and bones.
But to talk about him, or to represent him in a computer program, all
you need is the easy, convenient scalar string "George Bush".

References in Perl are like names for arrays and hashes.  They're
Perl's private, internal names, so you can be sure they're
unambiguous.  Unlike "George Bush", a reference only refers to one
thing, and you always know what it refers to.  If you have a reference
to an array, you can recover the entire array from it.  If you have a
reference to a hash, you can recover the entire hash.  But the
reference is still an easy, compact scalar value.

You can't have a hash whose values are arrays; hash values can only be
scalars.  We're stuck with that.  But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.

We'll come back to this city-country problem later, after we've seen
some syntax for managing references.


=head1 Syntax

There are just two ways to make a reference, and just two ways to use
it once you have it.

=head2 Making References

=head3 B<Make Rule 1>

If you put a C<\> in front of a variable, you get a
reference to that variable.

    $aref = \@array;         # $aref now holds a reference to @array
    $href = \%hash;          # $href now holds a reference to %hash

Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:

    $xy = $aref;             # $xy now holds a reference to @array
    $p[3] = $href;           # $p[3] now holds a reference to %hash
    $z = $p[3];              # $z now holds a reference to %hash


These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name.  This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.

=head3 B<Make Rule 2>

C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array.  C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.

    $aref = [ 1, "foo", undef, 13 ];  
    # $aref now holds a reference to an array

    $href = { APR => 4, AUG => 8 };   
    # $href now holds a reference to a hash


The references you get from B<Make Rule 2> are the same kind of
references that you get from B<Make Rule 1>:

	# This:
	$aref = [ 1, 2, 3 ];

	# Does the same as this:
	@array = (1, 2, 3);
	$aref = \@array;


The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.

If you write just C<[]>, you get a new, empty anonymous array.
If you write just C<{}>, you get a new, empty anonymous hash.


=head2 Using References

What can you do with a reference once you have it?  It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar.  There are just two more ways to use it:

=head3 B<Use Rule 1>

You can always use an array reference, in curly braces, in place of
the name of an array.  For example, C<@{$aref}> instead of C<@array>.

Here are some examples of that:

Arrays:


	@a		@{$aref}		An array
	reverse @a	reverse @{$aref}	Reverse the array
	$a[3]		${$aref}[3]		An element of the array
	$a[3] = 17;	${$aref}[3] = 17	Assigning an element


On each line are two expressions that do the same thing.  The
left-hand versions operate on the array C<@a>, and the right-hand
versions operate on the array that is referred to by C<$aref>, but
once they find the array they're operating on, they do the same things
to the arrays.

Using a hash reference is I<exactly> the same:

	%h		%{$href}	      A hash
	keys %h		keys %{$href}	      Get the keys from the hash
	$h{'red'}	${$href}{'red'}	      An element of the hash
	$h{'red'} = 17	${$href}{'red'} = 17  Assigning an element

Whatever you want to do with a reference, B<Use Rule 1> tells you how
to do it.  You just write the Perl code that you would have written
for doing the same thing to a regular array or hash, and then replace
the array or hash name with C<{$reference}>.  "How do I loop over an
array when all I have is a reference?"  Well, to loop over an array, you
would write

        for my $element (@array) {
           ...
        }

so replace the array name, C<@array>, with the reference:

        for my $element (@{$aref}) {
           ...
        }

"How do I print out the contents of a hash when all I have is a
reference?"  First write the code for printing out a hash:

        for my $key (keys %hash) {
          print "$key => $hash{$key}\n";
        }

And then replace the hash name with the reference:

        for my $key (keys %{$href}) {
          print "$key => ${$href}{$key}\n";
        }
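
Here is a small self-contained sketch that puts several of these
substitutions together.  The variable names and sample values are made
up for illustration:

    use strict;
    use warnings;

    my $aref = [ 3, 1, 2 ];
    my $href = { APR => 4, AUG => 8 };

    # Anything you can do to @array you can do to @{$aref}.
    my @sorted = sort { $a <=> $b } @{$aref};
    my $count  = scalar @{$aref};
    push @{$aref}, 4;

    # Likewise for %hash and %{$href}.
    ${$href}{DEC} = 12;
    my @months = sort keys %{$href};

    print "sorted: @sorted\n";
    print "count before push: $count\n";
    print "last element: ${$aref}[-1]\n";
    print "months: @months\n";

In every line, the only change from ordinary array or hash code is
that the name has been replaced by the reference in curly braces.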

=head3 B<Use Rule 2>

B<Use Rule 1> is all you really need, because it tells you how to do
absolutely everything you ever need to do with references.  But the
most common thing to do with an array or a hash is to extract a single
element, and the B<Use Rule 1> notation is cumbersome.  So there is an
abbreviation.

C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.

C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.

If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array.  Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>.  C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.

Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash.  It's
easy to leave out the C<< -> >> by accident, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.
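
The following sketch shows the mix-up in action.  The array C<@aref>
is declared here on purpose, just to demonstrate the confusion; the
names and values are illustrative only:

    use strict;
    use warnings;

    my $aref = [ 'a', 'b', 'c', 'd' ];
    my @aref = ( 'w', 'x', 'y', 'z' );   # unrelated array, similar name

    my $via_reference = $aref->[3];      # element of the anonymous array
    my $via_array     = $aref[3];        # element of @aref

    print "\$aref->[3] is $via_reference\n";
    print "\$aref[3]   is $via_array\n";

    # With 'use strict' in effect and no @r declared, the same slip
    # is caught when the code is compiled.
    my $compiled = eval q{ use strict; my $r = [ 1, 2 ]; my $oops = $r[0]; 1 };
    my $error    = $@;
    print "strict says: $error" unless $compiled;

If you habitually write C<use strict>, a forgotten arrow usually turns
into a compile-time complaint about an undeclared array or hash,
rather than a silent wrong answer.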


=head2 An Example

Let's see a quick example of how all this is useful.

First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.

Now think about

	@a = ( [1, 2, 3],
               [4, 5, 6],
	       [7, 8, 9]
             );

C<@a> is an array with three elements, and each one is a reference to
another array.

C<$a[1]> is one of these references.  It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the
third element from that array.  C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2.  What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
or set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there's one more
abbreviation:  

=head2 Arrow Rule

In between two B<subscripts>, the arrow is optional.

Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing.  Instead of C<< $a[0]->[1] = 23 >>, we can write
C<$a[0][1] = 23>; it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important.  Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>.  For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.
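
As a quick check of the arrow rule, here is a sketch that updates one
cell of the C<@a> array from above and then walks the whole grid.  The
C<@rows> array is just a helper I've introduced to collect the output:

    use strict;
    use warnings;

    my @a = ( [1, 2, 3],
              [4, 5, 6],
              [7, 8, 9] );

    $a[0][1] = 23;                   # same as $a[0]->[1] = 23

    my @rows;
    for my $row (0 .. $#a) {
        # $#{ $a[$row] } is the last index of this row (Use Rule 1).
        my @cells = map { $a[$row][$_] } 0 .. $#{ $a[$row] };
        push @rows, join ' ', @cells;
    }
    print "$_\n" for @rows;

All three spellings, C<$a[1][2]>, C<< $a[1]->[2] >>, and
C<${$a[1]}[2]>, name the same element.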

=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   my %table;

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

    8   foreach my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }


The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, C<%table>, whose keys are country names,
and whose values are references to arrays of city names.  The data
structure will look like this:


           %table
        +-------+---+   
        |       |   |   +-----------+--------+
        |Germany| *---->| Frankfurt | Berlin |
        |       |   |   +-----------+--------+
        +-------+---+
        |       |   |   +----------+
        |Finland| *---->| Helsinki |
        |       |   |   +----------+
        +-------+---+
        |       |   |   +---------+------------+----------+
        |  USA  | *---->| Chicago | Washington | New York |
        |       |   |   +---------+------------+----------+
        +-------+---+

We'll look at output first.  Supposing we already have this structure,
how do we print it out?

C<%table> is an ordinary hash, and we get a list of keys from it, sort
the keys, and loop over the keys as usual.  The only use of references
is in line 10.  C<$table{$country}> looks up the key C<$country> in the
hash and gets the value, which is a reference to an array of cities in
that country.  B<Use Rule 1> says that we can recover the array by
saying C<@{$table{$country}}>.  Line 10 is just like

	@cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<@> tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.

Lines 2-7 are responsible for building the structure in the first
place; here they are again:

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

Lines 2-4 acquire a city and country name.  Line 5 looks to see if the
country is already present as a key in the hash.  If it's not, the
program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new,
empty anonymous array of cities, and installs a reference to it into
the hash under the appropriate key.

Line 6 installs the city name into the appropriate array.
C<$table{$country}> now holds a reference to the array of cities seen
in that country so far.  Line 6 is exactly like

	push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<push> adds a city name to the end of the
referred-to array.

There's one fine point I skipped.  Line 5 is unnecessary, and we can
get rid of it.  

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5   ####  $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

If there's already an entry in C<%table> for the current C<$country>,
then nothing is different.  Line 6 will locate the value in
C<$table{$country}>, which is a reference to an array, and push
C<$city> into the array.  But what does it do when C<$country> holds a
key, say C<Greece>, that is not yet in C<%table>?

This is Perl, so it does the exact right thing.  It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
C<%table>, and then pushes C<Athens> onto it.  This is called
`autovivification'--bringing things to life automatically.  Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically. Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically.  And as usual, Perl made the array one
element longer to hold the new city name.
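
You can watch autovivification happen with a short experiment.  This
is only a sketch; it borrows the C<ref> function, which reports what
kind of thing a reference points to, to inspect the result:

    use strict;
    use warnings;

    my %table;

    my $before = exists($table{Greece}) ? "yes" : "no";
    push @{ $table{Greece} }, "Athens";     # no [] assigned first
    my $after  = exists($table{Greece}) ? "yes" : "no";

    print "before: $before, after: $after\n";
    print "type: ", ref($table{Greece}), "\n";
    print "cities: @{ $table{Greece} }\n";

The key is absent before the C<push> and present afterwards, and the
value that appeared is an array reference holding the one city.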

=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details.  Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In B<Use Rule 1>, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like C<$aref>.  For example,
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
C<${$aref}[1]>.  If you're just starting out, you may want to adopt
the habit of always including the curly brackets.

=item *

This doesn't copy the underlying array:

        $aref2 = $aref1;        

You get two references to the same array.  If you modify 
C<< $aref1->[23] >> and then look at
C<< $aref2->[23] >> you'll see the change.   

To copy the array, use

        $aref2 = [@{$aref1}];

This uses C<[...]> notation to create a new anonymous array, and
C<$aref2> is assigned a reference to the new array.  The new array is
initialized with the contents of the array referred to by C<$aref1>.

Similarly, to copy an anonymous hash, you can use

        $href2 = {%{$href1}};

=item * 

To see if a variable contains a reference, use the C<ref> function.  It
returns true if its argument is a reference.  Actually it's a little
better than that: It returns C<HASH> for hash references and C<ARRAY>
for array references.

=item * 

If you try to use a reference like a string, you get strings like

	ARRAY(0x80f5dec)   or    HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use C<eq> to see
if two references refer to the same thing.  (But you should usually use
C<==> instead because it's much faster.)

=item *

You can use a string as if it were a reference.  If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>.  This is called a I<soft reference> or I<symbolic reference>.

=back
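
A few of these highlights are easy to try for yourself.  The sketch
below, with made-up variable names, contrasts copying a reference with
copying the array it refers to, and then uses C<==> and C<ref> to
inspect the results:

    use strict;
    use warnings;

    my $aref1 = [ 1, 2, 3 ];
    my $alias = $aref1;             # second reference, same array
    my $copy  = [ @{$aref1} ];      # new array, same contents

    $aref1->[0] = 99;

    print "alias sees: $alias->[0]\n";   # the change is visible
    print "copy sees:  $copy->[0]\n";    # the copy is unaffected

    # == compares references by identity, not by contents.
    my $same_as_alias = ($aref1 == $alias) ? 1 : 0;
    my $same_as_copy  = ($aref1 == $copy)  ? 1 : 0;

    # ref gives the kind of reference, or an empty string for a
    # value that is not a reference at all.
    my $href  = { APR => 4 };
    my @kinds = ( ref($aref1), ref($href), ref("plain string") );

Keep in mind that C<[ @{$aref1} ]> copies only the top level.  If the
array holds references to other arrays or hashes, the copy holds the
same references, so those inner structures are still shared.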

You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail.  After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references.  There are four important rules for managing
references: Two for making references and two for using them.  Once
you know these rules you can do most of the important things you need
to do with references.

=head1 Credits

Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
( http://www.tpj.com/ ) volume 3, #2.  Reprinted with permission.  

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work may
be distributed only under the terms of Perl's Artistic License.  Any
distribution of this file or derivatives thereof outside of that
package require that special arrangements be made with copyright
holder.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.  A simple comment in the code giving credit would be
courteous but is not required.


=cut
Commit	Line	Data
a1e2a320 GS	1
	2	=head1 NAME
	3
	4	perlreftut - Mark's very short tutorial about references
	5
	6	=head1 DESCRIPTION
	7
	8	One of the most important new features in Perl 5 was the capability to
	9	manage complicated data structures like multidimensional arrays and
	10	nested hashes. To enable these, Perl 5 introduced a feature called
	11	`references', and using references is the key to managing complicated,
	12	structured data in Perl. Unfortunately, there's a lot of funny syntax
	13	to learn, and the main manual page can be hard to follow. The manual
1da6492a GS	14	is quite complete, and sometimes people find that a problem, because
1da6492a GS	15	it can be hard to tell what is important and what isn't.
a1e2a320 GS	16
	17	Fortunately, you only need to know 10% of what's in the main page to get
	18	90% of the benefit. This page will show you that 10%.
	19
	20	=head1 Who Needs Complicated Data Structures?
	21
	22	One problem that came up all the time in Perl 4 was how to represent a
	23	hash whose values were lists. Perl 4 had hashes, of course, but the
	24	values had to be scalars; they couldn't be lists.
	25
	26	Why would you want a hash of lists? Let's take a simple example: You
1da6492a	27	have a file of city and country names, like this:
a1e2a320	28
1da6492a GS	29	Chicago, USA
	30	Frankfurt, Germany
	31	Berlin, Germany
	32	Washington, USA
	33	Helsinki, Finland
	34	New York, USA
a1e2a320	35
1da6492a GS	36	and you want to produce an output like this, with each country mentioned
1da6492a GS	37	once, and then an alphabetical list of the cities in that country:
a1e2a320	38
1da6492a GS	39	Finland: Helsinki.
	40	Germany: Berlin, Frankfurt.
	41	USA: Chicago, New York, Washington.
a1e2a320	42
1da6492a GS	43	The natural way to do this is to have a hash whose keys are country
	44	names. Associated with each country name key is a list of the cities in
	45	that country. Each time you read a line of input, split it into a country
a1e2a320	46	and a city, look up the list of cities already known to be in that
1da6492a	47	country, and append the new city to the list. When you're done reading
a1e2a320 GS	48	the input, iterate over the hash as usual, sorting each list of cities
	49	before you print it out.
	50
	51	If hash values can't be lists, you lose. In Perl 4, hash values can't
	52	be lists; they can only be strings. You lose. You'd probably have to
	53	combine all the cities into a single string somehow, and then when
	54	time came to write the output, you'd have to break the string into a
	55	list, sort the list, and turn it back into a string. This is messy
	56	and error-prone. And it's frustrating, because Perl already has
	57	perfectly good lists that would solve the problem if only you could
	58	use them.
	59
	60	=head1 The Solution
	61
1da6492a GS	62	By the time Perl 5 rolled around, we were already stuck with this
1da6492a GS	63	design: Hash values must be scalars. The solution to this is
a1e2a320 GS	64	references.
	65
	66	A reference is a scalar value that I<refers to> an entire array or an
1da6492a	67	entire hash (or to just about anything else). Names are one kind of
e937c8c3 MJD	68	reference that you're already familiar with. Think of the President
	69	of the United States: a messy, inconvenient bag of blood and bones.
	70	But to talk about him, or to represent him in a computer program, all
	71	you need is the easy, convenient scalar string "George Bush".
a1e2a320 GS	72
	73	References in Perl are like names for arrays and hashes. They're
	74	Perl's private, internal names, so you can be sure they're
e937c8c3	75	unambiguous. Unlike "George Bush", a reference only refers to one
a1e2a320 GS	76	thing, and you always know what it refers to. If you have a reference
	77	to an array, you can recover the entire array from it. If you have a
	78	reference to a hash, you can recover the entire hash. But the
	79	reference is still an easy, compact scalar value.
	80
	81	You can't have a hash whose values are arrays; hash values can only be
	82	scalars. We're stuck with that. But a single reference can refer to
	83	an entire array, and references are scalars, so you can have a hash of
	84	references to arrays, and it'll act a lot like a hash of arrays, and
	85	it'll be just as useful as a hash of arrays.
	86
1da6492a	87	We'll come back to this city-country problem later, after we've seen
a1e2a320 GS	88	some syntax for managing references.
	89
	90
	91	=head1 Syntax
	92
	93	There are just two ways to make a reference, and just two ways to use
	94	it once you have it.
	95
	96	=head2 Making References
	97
a29d1a25	98	=head3 B<Make Rule 1>
a1e2a320 GS	99
	100	If you put a C<\> in front of a variable, you get a
	101	reference to that variable.
	102
	103	$aref = \@array; # $aref now holds a reference to @array
	104	$href = \%hash; # $href now holds a reference to %hash
	105
	106	Once the reference is stored in a variable like $aref or $href, you
	107	can copy it or store it just the same as any other scalar value:
	108
	109	$xy = $aref; # $xy now holds a reference to @array
	110	$p[3] = $href; # $p[3] now holds a reference to %hash
	111	$z = $p[3]; # $z now holds a reference to %hash
	112
	113
	114	These examples show how to make references to variables with names.
	115	Sometimes you want to make an array or a hash that doesn't have a
	116	name. This is analogous to the way you like to be able to use the
	117	string C<"\n"> or the number 80 without having to store it in a named
	118	variable first.
	119
	120	B<Make Rule 2>
	121
	122	C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
	123	that array. C<{ ITEMS }> makes a new, anonymous hash. and returns a
	124	reference to that hash.
	125
	126	$aref = [ 1, "foo", undef, 13 ];
	127	# $aref now holds a reference to an array
	128
	129	$href = { APR => 4, AUG => 8 };
	130	# $href now holds a reference to a hash
	131
	132
	133	The references you get from rule 2 are the same kind of
	134	references that you get from rule 1:
	135
	136	# This:
	137	$aref = [ 1, 2, 3 ];
	138
	139	# Does the same as this:
	140	@array = (1, 2, 3);
	141	$aref = \@array;
	142
	143
	144	The first line is an abbreviation for the following two lines, except
	145	that it doesn't create the superfluous array variable C<@array>.
	146
a29d1a25 JH	147	If you write just C<[]>, you get a new, empty anonymous array.
	148	If you write just C<{}>, you get a new, empty anonymous hash.
	149
a1e2a320 GS	150
	151	=head2 Using References
	152
	153	What can you do with a reference once you have it? It's a scalar
	154	value, and we've seen that you can store it as a scalar and get it back
	155	again just like any scalar. There are just two more ways to use it:
	156
a29d1a25	157	=head3 B<Use Rule 1>
a1e2a320	158
a29d1a25 JH	159	You can always use an array reference, in curly braces, in place of
a29d1a25 JH	160	the name of an array. For example, C<@{$aref}> instead of C<@array>.
a1e2a320 GS	161
	162	Here are some examples of that:
	163
	164	Arrays:
	165
	166
	167	@a @{$aref} An array
	168	reverse @a reverse @{$aref} Reverse the array
	169	$a[3] ${$aref}[3] An element of the array
	170	$a[3] = 17; ${$aref}[3] = 17 Assigning an element
	171
	172
	173	On each line are two expressions that do the same thing. The
	174	left-hand versions operate on the array C<@a>, and the right-hand
	175	versions operate on the array that is referred to by C<$aref>, but
	176	once they find the array they're operating on, they do the same things
	177	to the arrays.
	178
	179	Using a hash reference is I<exactly> the same:
	180
	181	%h %{$href} A hash
	182	keys %h keys %{$href} Get the keys from the hash
	183	$h{'red'} ${$href}{'red'} An element of the hash
	184	$h{'red'} = 17 ${$href}{'red'} = 17 Assigning an element
	185
a29d1a25 JH	186	Whatever you want to do with a reference, B<Use Rule 1> tells you how
	187	to do it. You just write the Perl code that you would have written
	188	for doing the same thing to a regular array or hash, and then replace
	189	the array or hash name with C<{$reference}>. "How do I loop over an
	190	array when all I have is a reference?" Well, to loop over an array, you
	191	would write
	192
	193	for my $element (@array) {
	194	...
	195	}
	196
	197	so replace the array name, C<@array>, with the reference:
	198
	199	for my $element (@{$aref}) {
	200	...
	201	}
	202
	203	"How do I print out the contents of a hash when all I have is a
	204	reference?" First write the code for printing out a hash:
	205
	206	for my $key (keys %hash) {
	207	print "$key => $hash{$key}\n";
	208	}
	209
	210	And then replace the hash name with the reference:
	211
	212	for my $key (keys %{$href}) {
	213	print "$key => ${$href}{$key}\n";
	214	}
	215
	216	=head3 B<Use Rule 2>
a1e2a320	217
a29d1a25 JH	218	B<Use Rule 1> is all you really need, because it tells you how to to
	219	absolutely everything you ever need to do with references. But the
	220	most common thing to do with an array or a hash is to extract a single
	221	element, and the B<Use Rule 1> notation is cumbersome. So there is an
	222	abbreviation.
a1e2a320	223
c47ff5f1	224	C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
a1e2a320 GS	225	instead.
	226
	227	C<${$href}{red}> is too hard to read, so you can write
c47ff5f1	228	C<< $href->{red} >> instead.
a1e2a320	229
c47ff5f1	230	If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
a1e2a320 GS	231	the fourth element of the array. Don't confuse this with C<$aref[3]>,
	232	which is the fourth element of a totally different array, one
	233	deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
	234	same way that C<$item> and C<@item> are.
	235
c47ff5f1	236	Similarly, C<< $href->{'red'} >> is part of the hash referred to by
a1e2a320 GS	237	the scalar variable C<$href>, perhaps even one with no name.
a1e2a320 GS	238	C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
c47ff5f1	239	easy to forget to leave out the C<< -> >>, and if you do, you'll get
a1e2a320 GS	240	bizarre results when your program gets array and hash elements out of
	241	totally unexpected hashes and arrays that weren't the ones you wanted
	242	to use.
	243
	244
a29d1a25	245	=head2 An Example
a1e2a320 GS	246
	247	Let's see a quick example of how all this is useful.
	248
	249	First, remember that C<[1, 2, 3]> makes an anonymous array containing
	250	C<(1, 2, 3)>, and gives you a reference to that array.
	251
	252	Now think about
	253
	254	@a = ( [1, 2, 3],
	255	[4, 5, 6],
	256	[7, 8, 9]
	257	);
	258
	259	@a is an array with three elements, and each one is a reference to
	260	another array.
	261
	262	C<$a[1]> is one of these references. It refers to an array, the array
	263	containing C<(4, 5, 6)>, and because it is a reference to an array,
a29d1a25	264	B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the
c47ff5f1 GS	265	third element from that array. C<< $a[1]->[2] >> is the 6.
	266	Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a
	267	two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
a1e2a320 GS	268	or set the element in any row and any column of the array.
	269
	270	The notation still looks a little cumbersome, so there's one more
	271	abbreviation:
	272
a29d1a25	273	=head2 Arrow Rule
a1e2a320 GS	274
	275	In between two B<subscripts>, the arrow is optional.
	276
c47ff5f1	277	Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
a29d1a25 JH	278	same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write
a29d1a25 JH	279	C<$a[0][1] = 23>; it means the same thing.
a1e2a320 GS	280
	281	Now it really looks like two-dimensional arrays!
	282
	283	You can see why the arrows are important. Without them, we would have
	284	had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
	285	three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
	286	the unreadable C<${${$x[2]}[3]}[5]>.
	287
=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   my %table;

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

    8   foreach my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }

The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, C<%table>, whose keys are country names,
and whose values are references to arrays of city names.  The data
structure will look like this:

           %table
          +-------+---+
          |       |   |   +-----------+--------+
          |Germany| *---->| Frankfurt | Berlin |
          |       |   |   +-----------+--------+
          +-------+---+
          |       |   |   +----------+
          |Finland| *---->| Helsinki |
          |       |   |   +----------+
          +-------+---+
          |       |   |   +---------+------------+----------+
          |  USA  | *---->| Chicago | Washington | New York |
          |       |   |   +---------+------------+----------+
          +-------+---+

We'll look at output first.  Supposing we already have this structure,
how do we print it out?

C<%table> is an ordinary hash, and we get a list of keys from it, sort
the keys, and loop over the keys as usual.  The only use of references
is in line 10.  C<$table{$country}> looks up the key C<$country> in the
hash and gets the value, which is a reference to an array of cities in
that country.  B<Use Rule 1> says that we can recover the array by
saying C<@{$table{$country}}>.  Line 10 is just like

    @cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<@> tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.
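The output half can be tried on its own.  The sketch below assumes a
hand-built C<%table> that stands in for whatever the input loop would
have produced, and it collects the report in a hypothetical array,
C<@report>, instead of printing piece by piece.

```perl
use strict;
use warnings;

# A hand-built %table, standing in for the result of the input loop.
my %table = (
    Germany => [ 'Frankfurt', 'Berlin' ],
    Finland => [ 'Helsinki' ],
    USA     => [ 'Chicago', 'Washington', 'New York' ],
);

my @report;   # hypothetical: holds the finished report lines
foreach my $country (sort keys %table) {
    # Use Rule 1: wrap the reference in @{...} to get the whole array.
    my @cities = @{$table{$country}};
    push @report, "$country: " . join(', ', sort @cities) . ".\n";
}

print @report;
```

With this data the report lists Finland, Germany, and USA in that
order, each followed by its cities in alphabetical order.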
a1e2a320	350
Lines 2-7 are responsible for building the structure in the first
place; here they are again:

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

Lines 2-4 acquire a city and country name.  Line 5 looks to see if the
country is already present as a key in the hash.  If it's not, the
program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new,
empty anonymous array of cities, and installs a reference to it into
the hash under the appropriate key.

Line 6 installs the city name into the appropriate array.
C<$table{$country}> now holds a reference to the array of cities seen
in that country so far.  Line 6 is exactly like

    push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<push> adds a city name to the end of the
referred-to array.
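The input half can be sketched the same way.  This version assumes the
input lines sit in a made-up array, C<@lines>, rather than arriving
through C<< <> >>, so that it runs without a data file; the loop body
is otherwise the one discussed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines that <> would read.
my @lines = ("Chicago, USA\n", "Frankfurt, Germany\n", "Berlin, Germany\n");

my %table;
foreach my $line (@lines) {
    chomp(my $record = $line);
    my ($city, $country) = split /, /, $record;

    # Make Rule 2: [] makes a new, empty anonymous array.
    $table{$country} = [] unless exists $table{$country};

    # Use Rule 1: @{...} around the reference acts like an array name.
    push @{$table{$country}}, $city;
}

print scalar(@{$table{Germany}}), " German cities\n";   # prints "2 German cities"
```

The cities are stored in the order they were read; sorting is left to
the output loop.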
a1e2a320	376
There's one fine point I skipped.  Line 5 is unnecessary, and we can
get rid of it.

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5   ####  $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

If there's already an entry in C<%table> for the current C<$country>,
then nothing is different.  Line 6 will locate the value in
C<$table{$country}>, which is a reference to an array, and push
C<$city> into the array.  But what does it do when C<$country> holds a
key, say C<Greece>, that is not yet in C<%table>?

This is Perl, so it does the exact right thing.  It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
C<%table>, and then pushes C<Athens> onto it.  This is called
`autovivification'--bringing things to life automatically.  Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically.  Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically.  And as usual, Perl made the array one
element longer to hold the new city name.
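Autovivification is easy to watch happen.  The following is a minimal
sketch using the C<Greece> and C<Athens> example from the text; nothing
is ever stored under the key before the C<push>.

```perl
use strict;
use warnings;

my %table;
print( (exists $table{Greece}) ? "present\n" : "absent\n" );   # prints "absent"

# No [] was ever stored under 'Greece'; push brings the array to life.
push @{$table{Greece}}, 'Athens';

print ref($table{Greece}), "\n";     # prints "ARRAY"
print "@{$table{Greece}}\n";         # prints "Athens"
```

After the C<push>, the hash has a C<Greece> entry whose value is a
reference to a one-element array.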
a1e2a320 GS	404
=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details.  Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In B<Use Rule 1>, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like C<$aref>.  For example,
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
C<${$aref}[1]>.  If you're just starting out, you may want to adopt
the habit of always including the curly brackets.

=item *

This doesn't copy the underlying array:

    $aref2 = $aref1;

You get two references to the same array.  If you modify
C<< $aref1->[23] >> and then look at
C<< $aref2->[23] >> you'll see the change.

To copy the array, use

    $aref2 = [@{$aref1}];

This uses C<[...]> notation to create a new anonymous array, and
C<$aref2> is assigned a reference to the new array.  The new array is
initialized with the contents of the array referred to by C<$aref1>.

Similarly, to copy an anonymous hash, you can use

    $href2 = {%{$href1}};

=item *

To see if a variable contains a reference, use the C<ref> function.  It
returns true if its argument is a reference.  Actually it's a little
better than that: It returns C<HASH> for hash references and C<ARRAY>
for array references.

=item *

If you try to use a reference like a string, you get strings like

    ARRAY(0x80f5dec)   or   HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use C<eq> to see
if two references refer to the same thing.  (But you should usually use
C<==> instead because it's much faster.)

=item *

You can use a string as if it were a reference.  If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>.  This is called a I<soft reference> or I<symbolic
reference>.  The declaration C<use strict 'refs'> disables this
feature, which can cause all sorts of trouble if you use it by
accident.

=back
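Several of these highlights can be checked in a few lines.  The sketch
below uses made-up variable names (C<$alias>, C<$copy>) and sample data
to contrast a second reference with a real copy, and then tries C<ref>
and C<==> on the results.

```perl
use strict;
use warnings;

my $aref1 = [1, 2, 3];

my $alias = $aref1;          # a second reference to the SAME array
my $copy  = [@{$aref1}];     # a new anonymous array with the same contents

$aref1->[0] = 'changed';
print "$alias->[0] $copy->[0]\n";      # prints "changed 1"

# With a simple scalar variable the curly brackets may be omitted.
print scalar(@$aref1), " ", scalar(@{$aref1}), "\n";   # prints "3 3"

# ref reports the kind of thing referred to.
my $href = { red => 1 };
print ref($aref1), " ", ref($href), "\n";   # prints "ARRAY HASH"

# == is true only for two references to the same thing.
print( ($alias == $aref1) ? "same\n" : "different\n" );   # prints "same"
print( ($copy  == $aref1) ? "same\n" : "different\n" );   # prints "different"
```

Note that C<[@{$aref1}]> copies only the top level; if the elements
are themselves references, the copy shares the things they refer to.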
	478
You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail.  After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references.  There are four important rules for managing
references: Two for making references and two for using them.  Once
you know these rules you can do most of the important things you need
to do with references.
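As a compact reminder, here is one line for each rule.  This is only a
sketch: the data is made up, and the comments assume the rule numbering
used earlier in this tutorial (Make Rules 1 and 2, Use Rules 1 and 2).

```perl
use strict;
use warnings;

my @cities = ('Berlin', 'Frankfurt');

# Make Rule 1: a backslash in front of a variable gives a reference to it.
my $aref = \@cities;

# Make Rule 2: [ ] and { } build anonymous arrays and hashes.
my $href = { Finland => ['Helsinki'] };

# Use Rule 1: {$aref} can stand wherever an array name would go.
my @copy = @{$aref};

# Use Rule 2: the arrow is shorthand for getting at one element.
my $first = $aref->[0];

print "$first, $copy[1], $href->{Finland}[0]\n";   # prints "Berlin, Frankfurt, Helsinki"
```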
	492
=head1 Credits

Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
( http://www.tpj.com/ ) volume 3, #2.  Reprinted with permission.

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work may
be distributed only under the terms of Perl's Artistic License.  Any
distribution of this file or derivatives thereof outside of that
package require that special arrangements be made with copyright
holder.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.  A simple comment in the code giving credit would be
courteous but is not required.

=cut