=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See L<"Floating Point
Arithmetic"|perlop> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

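As a rough illustration of the advice above, the following sketch rounds
only when formatting for display and compares floating-point results with
a small tolerance instead of C<==>. The variable names and the tolerance
C<1e-9> are assumptions made for this example, not recommendations.

```perl
use strict;
use warnings;

my $price = 19.95;

# Round only when formatting for display.
my $shown = sprintf "%.2f", $price;

# Compare floating-point results with a tolerance, not with ==.
# The tolerance 1e-9 is an arbitrary choice for this sketch.
my $sum          = 0.1 + 0.2;
my $close_enough = abs( $sum - 0.3 ) < 1e-9;

print "$shown\n";
print $close_enough ? "close enough\n" : "different\n";
```
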
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.

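One possible way around the truncation, sketched here with made-up
variable names, is to round to the nearest integer with C<sprintf>
rather than truncating with C<int()>. Whether rounding is the right
behavior depends on your data.

```perl
use strict;
use warnings;

my $quotient = 0.6 / 0.2 - 2;    # a hair under 1 on typical hardware

# Round to the nearest integer instead of truncating toward zero.
my $rounded = sprintf "%.0f", $quotient;

print "$rounded\n";              # prints 1
```
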
=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $file );   # right, has leading zero
    chmod( oct(644), $file );   # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0],      $file );   # wrong, even if "0644"

    chmod( oct($ARGV[0]), $file );   # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;

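The difference between the two conversions can be checked directly. In
this small sketch, C<$mode_string> is a stand-in for a value that would
normally come from C<@ARGV> or a configuration file.

```perl
use strict;
use warnings;

my $mode_string = '0644';            # e.g. taken from @ARGV

my $as_decimal = $mode_string + 0;   # 644, leading zero ignored
my $as_octal   = oct $mode_string;   # 420, which is 0644 in octal

printf "decimal: %d  octal: 0%o (%d)\n", $as_decimal, $as_octal, $as_octal;
```
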
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil  = ceil(3.5);   # 4
    $floor = floor(3.5);  # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

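If you do write your own rounding function, it might look something like
the sketch below. The name C<round_half_away> and the choice of rounding
halves away from zero are assumptions made for illustration; a financial
application should follow whatever rule its specification requires, and
is often better served by doing the arithmetic in integer units such as
cents.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round to the nearest integer, with exact halves
# going away from zero.
sub round_half_away {
    my ($x) = @_;
    return $x >= 0 ? floor( $x + 0.5 ) : ceil( $x - 0.5 );
}

print join( " ", map { round_half_away($_) } 2.4, 2.5, -2.5 ), "\n";   # 2 3 -3
```
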
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and that for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built-in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    # substr() is used to left pad a 32 character string with zeros.
    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

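As a hint for that exercise, the remaining conversions can be composed
from the built-ins already shown: parse the input into a number, then
format the number in the target base. This is only a sketch using the
same example value as above.

```perl
use strict;
use warnings;

# hex -> oct: parse with hex(), format with %o
my $oct = sprintf "%o", hex "DEADBEEF";

# bin -> hex: parse with oct() and a "0b" prefix, format with %X
my $hex = sprintf "%X", oct "0b11011110101011011011111011101111";

print "$oct $hex\n";    # 33653337357 DEADBEEF
```
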
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators (such as C<&> and C<|>) depends
on whether they're used on numbers or strings. Given strings, the
operators treat each string as a series of bits and work with that
(the string C<"3"> is the bit pattern C<00110011>). Given numbers, the
operators work with the binary form of the number (the number C<3> is
treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

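When the operands arrive as strings but you want numeric behavior, one
common approach is to force numeric context, for instance by adding
zero. The sketch below uses made-up variable names and assumes the
classic operator semantics (that is, without the newer C<bitwise>
feature enabled).

```perl
use strict;
use warnings;

my ( $x, $y ) = ( "11", "3" );    # strings, e.g. read from input

my $string_and  = $x & $y;                   # AND of the characters
my $numeric_and = ( $x + 0 ) & ( $y + 0 );   # AND of the numbers

print "string: $string_and  numeric: $numeric_and\n";
```
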
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of some 500,000 integers.

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

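The same shifting idea works for non-integers. The function below is a
hypothetical companion to C<random_int_between>, not part of the
original answer; it returns a floating-point number that is at least
C<$min> and below C<$max> (up to floating-point rounding).

```perl
use strict;
use warnings;

# Hypothetical companion to random_int_between(): a floating-point
# number at least $min and below $max.
sub random_float_between {
    my ( $min, $max ) = @_;
    return $min if $min == $max;    # rand(0) would behave like rand(1)
    ( $min, $max ) = ( $max, $min ) if $min > $max;
    return $min + rand( $max - $min );
}

my $sample = random_float_between( 1.5, 2.5 );
print "$sample\n";
```
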
=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year, counting from 0
for January 1. Without an argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

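The two built-in approaches count differently, which is easy to trip
over. As a small sketch for the same example date (18 December 1987),
the C<localtime> element starts at 0 while C<%j> starts at 001. Noon is
used here only as a precaution against clock changes around midnight.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# 18 December 1987, at noon to stay clear of any midnight clock changes.
# mktime() takes the month as 0..11 and the year as years since 1900.
my $epoch = mktime( 0, 0, 12, 18, 11, 87 );

my $yday_zero_based = ( localtime $epoch )[7];           # counts from 0
my $day_of_year     = strftime "%j", localtime $epoch;   # counts from 001

print "$yday_zero_based $day_of_year\n";    # 351 352
```
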
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

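To see what these functions do at a boundary, the sketch below repeats
them and feeds in two fixed epoch values chosen for illustration: a
couple of days into the year 2000 and a couple of days into 2001. The
year 2000 still counts as the 20th century and 2nd millennium.

```perl
use strict;
use warnings;

sub get_century {
    return int( ( ( localtime( shift || time ) )[5] + 1999 ) / 100 );
}

sub get_millennium {
    return 1 + int( ( ( localtime( shift || time ) )[5] + 1899 ) / 1000 );
}

my $early_2000 = 946_684_800 + 2 * 86_400;    # 3 January 2000 UTC
my $early_2001 = 978_307_200 + 2 * 86_400;    # 3 January 2001 UTC

printf "2000: century %d, millennium %d\n",
    get_century($early_2000), get_millennium($early_2000);    # 20, 2
printf "2001: century %d, millennium %d\n",
    get_century($early_2001), get_millennium($early_2001);    # 21, 3
```
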
=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

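For the simple "store as a number and subtract" case, a minimal sketch
using the standard C<Time::Local> module might look like this. Working
in UTC via C<timegm> is an assumption made here to sidestep daylight
saving time; the dates are arbitrary examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm() takes (sec, min, hour, mday, month 0..11, year).
my $start = timegm( 0, 0, 0, 1, 0, 2000 );    # 1 January 2000, UTC
my $end   = timegm( 0, 0, 0, 1, 0, 2001 );    # 1 January 2001, UTC

my $days = ( $end - $start ) / 86_400;
print "$days days\n";    # 366 days, since 2000 was a leap year
```
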
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

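As an illustration of the "split it up" approach, the sketch below
assumes strings shaped like C<YYYY-MM-DD hh:mm:ss> and interprets them
as UTC with C<timegm>; swap in C<timelocal> if the strings are in local
time. The helper name C<epoch_from_string> is made up for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper for strings shaped like "YYYY-MM-DD hh:mm:ss",
# interpreted as UTC. Returns nothing if the string does not match.
sub epoch_from_string {
    my ($string) = @_;
    my ( $year, $mon, $mday, $hour, $min, $sec ) =
        $string =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or return;
    return timegm( $sec, $min, $hour, $mday, $mon - 1, $year );
}

print epoch_from_string("2001-11-13 01:00:00"), "\n";    # 1005613200
```
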
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you measure the day starting at noon, and subtract 24
hours. Even if the length of the calendar day is 23 or 25 hours,
you'll still end up on the previous calendar day, although not at
noon. Since you don't care about the time, the one hour difference
doesn't matter and you end up with the previous date.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question betrays a misunderstanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (C<gmtime> and C<localtime>)
supply adequate information to determine the year well beyond 2000 and
2038. The year returned by these functions when used in a list
context is the year minus 1900. For years between 1910 and 1999 this
I<happens> to be a 2-digit decimal number. To avoid the year 2000
problem simply do not treat the year as a 2-digit number. It isn't.

When C<gmtime()> and C<localtime()> are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets C<$timestamp> to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

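The two contexts described above can be compared side by side. This
short sketch reuses the epoch value from the text; the variable names
are chosen for illustration.

```perl
use strict;
use warnings;

my $epoch = 1005613200;

my $year      = ( gmtime $epoch )[5] + 1900;   # list context: years since 1900
my $timestamp = gmtime $epoch;                 # scalar context: full string

print "$year\n$timestamp\n";
```
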
=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

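Applied to a sample string (made up for this sketch), the substitution
drops each backslash and keeps the character that followed it:

```perl
use strict;
use warnings;

my $escaped = 'foo\ bar\$baz';    # single quotes keep the backslashes

( my $plain = $escaped ) =~ s/\\(.)/$1/g;

print "$plain\n";    # foo bar$baz
```
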
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
it replaces each character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

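Note that the substitution collapses pairs, not whole runs, so a run of
three identical characters still leaves two behind. If you want runs
reduced to a single character with C<s///>, one option is to add a C<+>
after the back-reference, as in this sketch:

```perl
use strict;
use warnings;

my $pairs = my $runs = 'aaab';

$pairs =~ s/(.)\1/$1/g;     # collapses each pair once
$runs  =~ s/(.)\1+/$1/g;    # collapses a whole run to one character

print "$pairs $runs\n";     # aab ab
```
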
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

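The same tricks work for your own functions, not only for built-ins.
In this sketch, C<add> is a hypothetical function used purely to make
the result predictable:

```perl
use strict;
use warnings;

# Hypothetical function used only for this illustration.
sub add { return $_[0] + $_[1] }

my $list_style   = "Sum: @{[ add( 2, 3 ) ]}";            # anonymous array
my $scalar_style = "Sum: ${\( scalar add( 2, 3 ) )}";    # scalar reference

print "$list_style\n$scalar_style\n";    # both print "Sum: 5"
```
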
=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in C<$1>. For multiple-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy of Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

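For the common case of balanced brackets, C<Text::Balanced> can do the
work without any of the above. The following is a minimal sketch that
assumes the text starts with the opening bracket; the sample string is
made up.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b) c) tail";

# In list context, extract_bracketed() returns the balanced substring
# followed by the remainder of the input.
my ( $extracted, $remainder ) = extract_bracketed( $text, "()" );

print "[$extracted] [$remainder]\n";
```
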
=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) = "Perl 5.8.0";

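Putting the pieces together in one runnable sketch: the four-argument
form changes the string in place and returns the part it replaced.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

my $first_char = substr( $string, 0, 1 );
my $replaced   = substr( $string, 13, 4, "Perl 5.8.0" );  # returns the old part

print "$first_char | $replaced | $string\n";
# J | Perl | Just another Perl 5.8.0 Hacker
```
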
68dc0745 755=head2 How do I change the Nth occurrence of something?
756
92c2ed05
GS
757You have to keep track of N yourself. For example, let's say you want
758to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0
GS
759C<"whosoever"> or C<"whomsoever">, case insensitively. These
760all assume that $_ contains the string to be altered.
68dc0745 761
ac9dac7f
RGS
762 $count = 0;
763 s{((whom?)ever)}{
764 ++$count == 5 # is it the 5th?
765 ? "${2}soever" # yes, swap
766 : $1 # renege and leave it there
767 }ige;
68dc0745 768
5a964f20
TC
769In the more general case, you can use the C</g> modifier in a C<while>
770loop, keeping count of matches.
771
ac9dac7f
RGS
772 $WANT = 3;
773 $count = 0;
774 $_ = "One fish two fish red fish blue fish";
775 while (/(\w+)\s+fish\b/gi) {
776 if (++$count == $WANT) {
777 print "The third fish is a $1 one.\n";
778 }
779 }
5a964f20 780
92c2ed05 781That prints out: C<"The third fish is a red one."> You can also use a
5a964f20
TC
782repetition count and repeated pattern like this:
783
ac9dac7f 784 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
5a964f20 785
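If you do this often, the counting trick can be wrapped in a subroutine.
This is only a sketch: C<replace_nth> is a made-up name, and it assumes
you want a modified copy rather than an in-place change.

```perl
use strict;
use warnings;

# replace_nth is a hypothetical helper name, not a built-in.
# It replaces only the $n-th match of $pattern in a copy of $text.
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $fish = "One fish two fish red fish blue fish";
print replace_nth( $fish, qr/fish/, 3, "herring" ), "\n";
```

Because the whole pattern is wrapped in one extra pair of parentheses,
C<$1> is always the complete match, even if C<$pattern> has groups of its own.
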
68dc0745 786=head2 How can I count the number of occurrences of a substring within a string?
787
a6dd486b 788There are a number of ways, with varying efficiency. If you want a
68dc0745 789count of a certain single character (X) within a string, you can use the
790C<tr///> function like so:
791
ac9dac7f
RGS
792 $string = "ThisXlineXhasXsomeXx'sXinXit";
793 $count = ($string =~ tr/X//);
794 print "There are $count X characters in the string";
68dc0745 795
796This is fine if you are just looking for a single character. However,
797if you are trying to count multiple character substrings within a
798larger string, C<tr///> won't work. What you can do is wrap a while()
799loop around a global pattern match. For example, let's count negative
800integers:
801
ac9dac7f
RGS
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
68dc0745 805
881bdbd4
JH
806Another version uses a global match in list context, then assigns the
807result to a scalar, producing a count of the number of matches.
808
809 $count = () = $string =~ /-\d+/g;
810
109f0441 811=head2 Does Perl have a Year 2038 problem?
68dc0745 812
109f0441
S
No, all of Perl's built-in date and time functions and modules will
work to about 2 billion years before and after 1970.
3fe9a6f1 815
109f0441
S
816Many systems cannot count time past the year 2038. Older versions of
817Perl were dependent on the system to do date calculation and thus
818shared their 2038 bug.
68dc0745 819
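A quick way to check your own perl is to ask for a time just past the
signed 32-bit limit. This sketch assumes a perl whose time functions no
longer depend on the system's 32-bit C<time_t> (5.12 or later, as far as
I know).

```perl
use strict;
use warnings;

# 2**31 seconds after the epoch is one second past the signed 32-bit limit.
my $past_limit = gmtime( 2**31 );    # scalar context gives a date string

print "$past_limit\n";
```

If the year that comes back is 2038, your perl is counting past the limit on its own.
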
109f0441
S
820=head2 How do I capitalize all the words on one line?
821X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
5a964f20 822
109f0441 823(contributed by brian d foy)
65acb1b1 824
109f0441
S
825Damian Conway's L<Text::Autoformat> handles all of the thinking
826for you.
369b44b4 827
ac9dac7f
RGS
828 use Text::Autoformat;
829 my $x = "Dr. Strangelove or: How I Learned to Stop ".
830 "Worrying and Love the Bomb";
369b44b4 831
ac9dac7f
RGS
832 print $x, "\n";
833 for my $style (qw( sentence title highlight )) {
834 print autoformat($x, { case => $style }), "\n";
835 }
369b44b4 836
109f0441
S
837How do you want to capitalize those words?
838
839 FRED AND BARNEY'S LODGE # all uppercase
840 Fred And Barney's Lodge # title case
841 Fred and Barney's Lodge # highlight case
842
843It's not as easy a problem as it looks. How many words do you think
844are in there? Wait for it... wait for it.... If you answered 5
845you're right. Perl words are groups of C<\w+>, but that's not what
846you want to capitalize. How is Perl supposed to know not to capitalize
847that C<s> after the apostrophe? You could try a regular expression:
848
849 $string =~ s/ (
850 (^\w) #at the beginning of the line
851 | # or
852 (\s\w) #preceded by whitespace
853 )
854 /\U$1/xg;
855
856 $string =~ s/([\w']+)/\u\L$1/g;
857
858Now, what if you don't want to capitalize that "and"? Just use
859L<Text::Autoformat> and get on with the next problem. :)
860
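To see what the second regular expression above actually does, here is a
minimal run on a sample string of my own choosing. It capitalizes every
word, "and" included, which is exactly the limitation just mentioned.

```perl
use strict;
use warnings;

my $string = "fred and barney's lodge";

# Treat apostrophes as word characters so the "s" stays lowercase.
( my $title = $string ) =~ s/([\w']+)/\u\L$1/g;

print "$title\n";    # Fred And Barney's Lodge
```
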
49d635f9 861=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 862
ac9dac7f
RGS
863Several modules can handle this sort of parsing--C<Text::Balanced>,
864C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9
RGS
865
866Take the example case of trying to split a string that is
867comma-separated into its different fields. You can't use C<split(/,/)>
868because you shouldn't split if the comma is inside quotes. For
869example, take a data line like this:
68dc0745 870
ac9dac7f 871 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 872
873Due to the restriction of the quotes, this is a fairly complex
197aec24 874problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 875I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 876suggests (assuming your string is contained in C<$text>):
68dc0745 877
ac9dac7f
RGS
878 @new = ();
879 push(@new, $+) while $text =~ m{
880 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
881 | ([^,]+),?
882 | ,
883 }gx;
884 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 885
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
46fc3d4c 889
ac9dac7f
RGS
890Alternatively, the C<Text::ParseWords> module (part of the standard
891Perl distribution) lets you say:
68dc0745 892
ac9dac7f
RGS
893 use Text::ParseWords;
894 @new = quotewords(",", 0, $text);
65acb1b1 895
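Here is a runnable sketch of the C<Text::ParseWords> approach applied to
the sample data line from above. The C<printf> loop is only there to show
the fields; nothing else is assumed beyond the standard module.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument tells quotewords to drop the quotes themselves.
my @new = quotewords( ",", 0, $text );

printf "%2d: %s\n", $_, $new[$_] for 0 .. $#new;
```

The commas inside the quoted fields stay put, and the empty quoted field
comes back as an empty string.
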
68dc0745 896=head2 How do I strip blank space from the beginning/end of a string?
897
6670e5e7 898(contributed by brian d foy)
68dc0745 899
6670e5e7
RGS
900A substitution can do this for you. For a single line, you want to
901replace all the leading or trailing whitespace with nothing. You
902can do that with a pair of substitutions.
68dc0745 903
6670e5e7
RGS
904 s/^\s+//;
905 s/\s+$//;
68dc0745 906
6670e5e7
RGS
907You can also write that as a single substitution, although it turns
908out the combined statement is slower than the separate ones. That
909might not matter to you, though.
68dc0745 910
6670e5e7 911 s/^\s+|\s+$//g;
68dc0745 912
6670e5e7
RGS
In this regular expression, the alternation matches either at the
beginning or the end of the string since the alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the C<\s+>, and the C<$> anchor can match to the
physical end of the string, so the newline disappears too. Just add
the newline to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines which the C<^\s+>
would remove all by itself.
68dc0745 922
6670e5e7
RGS
923 while( <> )
924 {
925 s/^\s+|\s+$//g;
926 print "$_\n";
927 }
5a964f20 928
6670e5e7
RGS
929For a multi-line string, you can apply the regular expression
930to each logical line in the string by adding the C</m> flag (for
931"multi-line"). With the C</m> flag, the C<$> matches I<before> an
932embedded newline, so it doesn't remove it. It still removes the
933newline at the end of the string.
934
ac9dac7f 935 $string =~ s/^\s+|\s+$//gm;
6670e5e7
RGS
936
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.
942
943 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
5a964f20 944
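If you strip strings in many places, the pair of substitutions fits
nicely in a small subroutine. This is a sketch, and C<trim> is just a
name I picked; Perl has no built-in by that name in the versions this
answer covers.

```perl
use strict;
use warnings;

# trim is a hypothetical helper name; it works on a copy of its argument.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;    # leading whitespace
    $string =~ s/\s+$//;    # trailing whitespace, including a final newline
    return $string;
}

print "[", trim("  \t hello world \n"), "]\n";    # [hello world]
```
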
65acb1b1
TC
945=head2 How do I pad a string with blanks or pad a number with zeroes?
946
65acb1b1 947In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0
GS
948to pad the string, C<$text> or C<$num> contains the string to be padded,
949and C<$pad_char> contains the padding character. You can use a single
950character string constant instead of the C<$pad_char> variable if you
951know what it is in advance. And in the same way you can use an integer in
952place of C<$pad_len> if you know the pad length in advance.
65acb1b1 953
d92eb7b0
GS
954The simplest method uses the C<sprintf> function. It can pad on the left
955or right with blanks and on the left with zeroes and it will not
956truncate the result. The C<pack> function can only pad strings on the
957right with blanks and it will truncate the result to a maximum length of
958C<$pad_len>.
65acb1b1 959
ac9dac7f 960 # Left padding a string with blanks (no truncation):
04d666b1
RGS
961 $padded = sprintf("%${pad_len}s", $text);
962 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 963
ac9dac7f 964 # Right padding a string with blanks (no truncation):
04d666b1
RGS
965 $padded = sprintf("%-${pad_len}s", $text);
966 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 967
ac9dac7f 968 # Left padding a number with 0 (no truncation):
04d666b1
RGS
969 $padded = sprintf("%0${pad_len}d", $num);
970 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 971
ac9dac7f
RGS
972 # Right padding a string with blanks using pack (will truncate):
973 $padded = pack("A$pad_len",$text);
65acb1b1 974
d92eb7b0
GS
975If you need to pad with a character other than blank or zero you can use
976one of the following methods. They all generate a pad string with the
977C<x> operator and combine that with C<$text>. These methods do
978not truncate C<$text>.
65acb1b1 979
d92eb7b0 980Left and right padding with any character, creating a new string:
65acb1b1 981
ac9dac7f
RGS
982 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
983 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 984
d92eb7b0 985Left and right padding with any character, modifying C<$text> directly:
65acb1b1 986
ac9dac7f
RGS
987 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
988 $text .= $pad_char x ( $pad_len - length( $text ) );
65acb1b1 989
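To make the differences concrete, here are the methods above run on
sample values. The strings and the pad length of 8 are arbitrary choices
for illustration.

```perl
use strict;
use warnings;

my $pad_len = 8;

my $left   = sprintf( "%*s",  $pad_len, "abc" );    # '     abc'
my $right  = sprintf( "%-*s", $pad_len, "abc" );    # 'abc     '
my $zeroes = sprintf( "%0*d", $pad_len, 42 );       # '00000042'
my $packed = pack( "A$pad_len", "abcdefghijkl" );   # 'abcdefgh' (truncated)
my $stars  = '*' x ( $pad_len - length("abc") ) . "abc";    # '*****abc'

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```

Note that only the C<pack> version shortens a string that is already
longer than C<$pad_len>.
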
68dc0745 990=head2 How do I extract selected columns from a string?
991
e573f903
RGS
992(contributed by brian d foy)
993
If you know the columns that contain the data, you can
use C<substr> to extract a single column.
996
997 my $column = substr( $line, $start_column, $length );
998
999You can use C<split> if the columns are separated by whitespace or
1000some other delimiter, as long as whitespace or the delimiter cannot
1001appear as part of the data.
1002
1003 my $line = ' fred barney betty ';
1004 my @columns = split /\s+/, $line;
1005 # ( '', 'fred', 'barney', 'betty' );
1006
1007 my $line = 'fred||barney||betty';
1008 my @columns = split /\|/, $line;
1009 # ( 'fred', '', 'barney', '', 'betty' );
1010
1011If you want to work with comma-separated values, don't do this since
1012that format is a bit more complicated. Use one of the modules that
109f0441 1013handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
e573f903
RGS
1014C<Text::CSV_PP>.
1015
If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.

    my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1022
1023Note that spaces in the format argument to C<unpack> do not denote literal
1024spaces. If you have space separated data, you may want C<split> instead.
68dc0745 1025
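A short, self-contained run of the C<unpack> approach might look like
this. The sample line is made up and built with C<sprintf> so that the
column widths are unambiguous.

```perl
use strict;
use warnings;

# Build a sample line of three 8-character columns (the data is made up).
my $line = sprintf "%-8s%-8s%-8s", qw(fred barney betty);

# "A8" reads 8 bytes and strips trailing spaces from the field.
my @fields = unpack( "A8 A8 A8", $line );

print join( "|", @fields ), "\n";    # fred|barney|betty
```

The template comes first and the data second, the same order as C<pack>.
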
1026=head2 How do I find the soundex value of a string?
1027
7678cced
RGS
1028(contributed by brian d foy)
1029
You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1033
1034=head2 How can I expand variables in text strings?
1035
e573f903 1036(contributed by brian d foy)
5a964f20 1037
322be77c 1038If you can avoid it, don't, or if you can use a templating system,
c195e131
RGS
1039such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1040might even be able to get the job done with C<sprintf> or C<printf>:
1041
1042 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
322be77c
RGS
1043
1044However, for the one-off simple case where I don't want to pull out a
1045full templating system, I'll use a string that has two Perl scalar
1046variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:
e573f903
RGS
1048
1049 my $foo = 'Fred';
1050 my $bar = 'Barney';
1051 $string = 'Say hello to $foo and $bar';
1052
1053One way I can do this involves the substitution operator and a double
1054C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1056it with its value. C<$foo>, then, turns into 'Fred', and that's finally
c195e131 1057what's left in the string:
e573f903
RGS
1058
1059 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
322be77c 1060
e573f903 1061The C</e> will also silently ignore violations of strict, replacing
c195e131 1062undefined variable names with the empty string. Since I'm using the
109f0441 1063C</e> flag (twice even!), I have all of the same security problems I
c195e131
RGS
1064have with C<eval> in its string form. If there's something odd in
1065C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1066I could get myself in trouble.
1067
1068To get around the security problem, I could also pull the values from
1069a hash instead of evaluating variable names. Using a single C</e>, I
1070can check the hash to ensure the value exists, and if it doesn't, I
1071can replace the missing value with a marker, in this case C<???> to
1072signal that I missed something:
e573f903
RGS
1073
1074 my $string = 'This has $foo and $bar';
109f0441 1075
e573f903
RGS
1076 my %Replacements = (
1077 foo => 'Fred',
ac9dac7f 1078 );
322be77c 1079
e573f903
RGS
1080 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1081 $string =~ s/\$(\w+)/
1082 exists $Replacements{$1} ? $Replacements{$1} : '???'
1083 /eg;
322be77c 1084
e573f903 1085 print $string;
322be77c 1086
68dc0745 1087=head2 What's wrong with always quoting "$vars"?
1088
ac9dac7f 1089The problem is that those double-quotes force
e573f903
RGS
1090stringification--coercing numbers and references into strings--even
1091when you don't want them to be strings. Think of it this way:
1092double-quote expansion is used to produce new strings. If you already
1093have a string, why do you need more?
68dc0745 1094
1095If you get used to writing odd things like these:
1096
ac9dac7f
RGS
1097 print "$var"; # BAD
1098 $new = "$old"; # BAD
1099 somefunc("$var"); # BAD
68dc0745 1100
1101You'll be in trouble. Those should (in 99.8% of the cases) be
1102the simpler and more direct:
1103
ac9dac7f
RGS
1104 print $var;
1105 $new = $old;
1106 somefunc($var);
68dc0745 1107
1108Otherwise, besides slowing you down, you're going to break code when
1109the thing in the scalar is actually neither a string nor a number, but
1110a reference:
1111
ac9dac7f
RGS
1112 func(\@array);
1113 sub func {
1114 my $aref = shift;
1115 my $oref = "$aref"; # WRONG
1116 }
68dc0745 1117
1118You can also get into subtle problems on those few operations in Perl
1119that actually do care about the difference between a string and a
1120number, such as the magical C<++> autoincrement operator or the
1121syscall() function.
1122
197aec24 1123Stringification also destroys arrays.
5a964f20 1124
ac9dac7f
RGS
1125 @lines = `command`;
1126 print "@lines"; # WRONG - extra blanks
1127 print @lines; # right
5a964f20 1128
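To see the reference problem in action, here is a minimal sketch. The
variable names are mine, and the address in the comment is only an
example of what the string might look like.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # now just a string such as "ARRAY(0x55d0c8)"

print "aref is a ", ref($aref), " reference\n";          # ARRAY
print "copy is ", ( ref($copy) ? "a" : "not a" ), " reference\n";

# Under strict refs, trying to dereference the string is a fatal error.
my $ok = eval { my @elements = @$copy; 1 };
print "dereferencing the copy failed\n" unless $ok;
```

Once a reference has been stringified there is no supported way to turn
the string back into the reference.
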
04d666b1 1129=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1130
1131Check for these three things:
1132
1133=over 4
1134
04d666b1 1135=item There must be no space after the E<lt>E<lt> part.
68dc0745 1136
197aec24 1137=item There (probably) should be a semicolon at the end.
68dc0745 1138
197aec24 1139=item You can't (easily) have any space in front of the tag.
68dc0745 1140
1141=back
1142
197aec24 1143If you want to indent the text in the here document, you
5a964f20
TC
1144can do this:
1145
1146 # all in one
1147 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1148 your text
1149 goes here
1150 HERE_TARGET
1151
1152But the HERE_TARGET must still be flush against the margin.
197aec24 1153If you want that indented also, you'll have to quote
5a964f20
TC
1154in the indentation.
1155
1156 ($quote = <<' FINIS') =~ s/^\s+//gm;
1157 ...we will have peace, when you and all your works have
1158 perished--and the works of your dark master to whom you
1159 would deliver us. You are a liar, Saruman, and a corrupter
1160 of men's hearts. --Theoden in /usr/src/perl/taint.c
1161 FINIS
83ded9ee 1162 $quote =~ s/\s+--/\n--/;
5a964f20
TC
1163
1164A nice general-purpose fixer-upper function for indented here documents
1165follows. It expects to be called with a here document as its argument.
1166It looks to see whether each line begins with a common substring, and
a6dd486b
JB
1167if so, strips that substring off. Otherwise, it takes the amount of leading
1168whitespace found on the first line and removes that much off each
5a964f20
TC
1169subsequent line.
1170
1171 sub fix {
1172 local $_ = shift;
a6dd486b 1173 my ($white, $leader); # common whitespace and common leading string
5a964f20
TC
1174 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1175 ($white, $leader) = ($2, quotemeta($1));
1176 } else {
1177 ($white, $leader) = (/^(\s+)/, '');
1178 }
1179 s/^\s*?$leader(?:$white)?//gm;
1180 return $_;
1181 }
1182
c8db1d39 1183This works with leading special strings, dynamically determined:
5a964f20 1184
ac9dac7f 1185 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20
TC
1186 @@@ int
1187 @@@ runops() {
1188 @@@ SAVEI32(runlevel);
1189 @@@ runlevel++;
d92eb7b0 1190 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20
TC
1191 @@@ TAINT_NOT;
1192 @@@ return 0;
1193 @@@ }
ac9dac7f 1194 MAIN_INTERPRETER_LOOP
5a964f20 1195
a6dd486b 1196Or with a fixed amount of leading whitespace, with remaining
5a964f20
TC
1197indentation correctly preserved:
1198
ac9dac7f 1199 $poem = fix<<EVER_ON_AND_ON;
5a964f20
TC
1200 Now far ahead the Road has gone,
1201 And I must follow, if I can,
1202 Pursuing it with eager feet,
1203 Until it joins some larger way
1204 Where many paths and errands meet.
1205 And whither then? I cannot say.
1206 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1207 EVER_ON_AND_ON
5a964f20 1208
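If you can rely on Perl 5.26 or later, the indented here-document form
C<E<lt>E<lt>~> does much of this stripping for you. The following is a
sketch under that version assumption; the C<poem> subroutine and the
C<EVER_ON> tag are names chosen for the example.

```perl
use strict;
use warnings;
use v5.26;    # the <<~ form needs Perl 5.26 or later

sub poem {
    # The terminator's indentation is removed from every line of the body.
    return <<~EVER_ON;
        Now far ahead the Road has gone,
          And I must follow, if I can,
        EVER_ON
}

print poem();
```

Indentation beyond that of the terminator is kept, so the second line
stays indented by two spaces.
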
68dc0745 1209=head1 Data: Arrays
1210
65acb1b1
TC
1211=head2 What is the difference between a list and an array?
1212
ac9dac7f
RGS
1213An array has a changeable length. A list does not. An array is
1214something you can push or pop, while a list is a set of values. Some
1215people make the distinction that a list is a value while an array is a
1216variable. Subroutines are passed and return lists, you put things into
1217list context, you initialize arrays with lists, and you C<foreach()>
1218across a list. C<@> variables are arrays, anonymous arrays are
1219arrays, arrays in scalar context behave like the number of elements in
1220them, subroutines access their arguments through the array C<@_>, and
1221C<push>/C<pop>/C<shift> only work on arrays.
65acb1b1
TC
1222
1223As a side note, there's no such thing as a list in scalar context.
1224When you say
1225
ac9dac7f 1226 $scalar = (2, 5, 7, 9);
65acb1b1 1227
d92eb7b0 1228you're using the comma operator in scalar context, so it uses the scalar
ac9dac7f 1229comma operator. There never was a list there at all! This causes the
d92eb7b0 1230last value to be returned: 9.
65acb1b1 1231
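For comparison, here are some forms that do behave the way people tend
to expect. This is just a sketch with the same four values; the variable
names are arbitrary.

```perl
use strict;
use warnings;

my @array = ( 2, 5, 7, 9 );

my $count   = @array;                # array in scalar context: 4
my ($first) = ( 2, 5, 7, 9 );        # list assignment: 2
my $last    = ( 2, 5, 7, 9 )[-1];    # list slice: 9

print "$count $first $last\n";
```

Only the array gives you a count; the parenthesized values never do.
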
68dc0745 1232=head2 What is the difference between $array[1] and @array[1]?
1233
a6dd486b 1234The former is a scalar value; the latter an array slice, making
68dc0745 1235it a list with one (scalar) value. You should use $ when you want a
1236scalar value (most of the time) and @ when you want a list with one
1237scalar value in it (very, very rarely; nearly never, in fact).
1238
1239Sometimes it doesn't make a difference, but sometimes it does.
1240For example, compare:
1241
ac9dac7f 1242 $good[0] = `some program that outputs several lines`;
68dc0745 1243
1244with
1245
ac9dac7f 1246 @bad[0] = `same program that outputs several lines`;
68dc0745 1247
197aec24 1248The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1249matters.
68dc0745 1250
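Since the backtick example depends on an external program, here is a
sketch that shows the same difference with C<wantarray>. The C<context>
subroutine is a made-up helper that simply reports how it was called.

```perl
use strict;
use warnings;

# context() is a made-up helper that reports how it was called.
sub context { return wantarray ? "list" : "scalar" }

my ( @good, @bad );
$good[0] = context();    # scalar assignment: right side in scalar context
@bad[0]  = context();    # slice assignment: right side in list context
                         # (warns: "Scalar value @bad[0] better written as $bad[0]")

print "good: $good[0]\nbad: $bad[0]\n";
```

With backticks, scalar context returns all the output as one string,
while list context returns one element per line, so the slice keeps
only the first line.
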
d92eb7b0 1251=head2 How can I remove duplicate elements from a list or array?
68dc0745 1252
6670e5e7 1253(contributed by brian d foy)
68dc0745 1254
6670e5e7
RGS
1255Use a hash. When you think the words "unique" or "duplicated", think
1256"hash keys".
68dc0745 1257
6670e5e7
RGS
1258If you don't care about the order of the elements, you could just
1259create the hash then extract the keys. It's not important how you
1260create that hash: just that you use C<keys> to get the unique
1261elements.
551e1d92 1262
ac9dac7f
RGS
1263 my %hash = map { $_, 1 } @array;
1264 # or a hash slice: @hash{ @array } = ();
1265 # or a foreach: $hash{$_} = 1 foreach ( @array );
1266
1267 my @unique = keys %hash;
68dc0745 1268
ac9dac7f
RGS
1269If you want to use a module, try the C<uniq> function from
1270C<List::MoreUtils>. In list context it returns the unique elements,
1271preserving their order in the list. In scalar context, it returns the
1272number of unique elements.
1273
1274 use List::MoreUtils qw(uniq);
1275
1276 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1277 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1278
6670e5e7
RGS
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its value, which is C<undef>, so
the loop continues to the C<push> and increments the value for that
key. The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
next element.
551e1d92 1288
6670e5e7
RGS
1289 my @unique = ();
1290 my %seen = ();
68dc0745 1291
6670e5e7
RGS
1292 foreach my $elem ( @array )
1293 {
1294 next if $seen{ $elem }++;
1295 push @unique, $elem;
1296 }
68dc0745 1297
6670e5e7
RGS
1298You can write this more briefly using a grep, which does the
1299same thing.
68dc0745 1300
ac9dac7f
RGS
1301 my %seen = ();
1302 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1303
ddbc1f16 1304=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1305
109f0441 1306(portions of this answer contributed by Anno Siegel and brian d foy)
9e72e4c6 1307
5a964f20
TC
1308Hearing the word "in" is an I<in>dication that you probably should have
1309used a hash, not a list or array, to store your data. Hashes are
1310designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1311
109f0441
S
1312That being said, there are several ways to approach this. In Perl 5.10
1313and later, you can use the smart match operator to check that an item is
1314contained in an array or a hash:
1315
1316 use 5.010;
1317
1318 if( $item ~~ @array )
1319 {
1320 say "The array contains $item"
1321 }
1322
1323 if( $item ~~ %hash )
1324 {
1325 say "The hash contains $item"
1326 }
1327
1328With earlier versions of Perl, you have to do a bit more work. If you
5a964f20 1329are going to make this query many times over arbitrary string values,
881bdbd4 1330the fastest way is probably to invert the original array and maintain a
109f0441 1331hash whose keys are the first array's values:
68dc0745 1332
ac9dac7f
RGS
1333 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1334 %is_blue = ();
1335 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1336
ac9dac7f
RGS
1337Now you can check whether C<$is_blue{$some_color}>. It might have
1338been a good idea to keep the blues all in a hash in the first place.
68dc0745 1339
1340If the values are all small integers, you could use a simple indexed
1341array. This kind of an array will take up less space:
1342
ac9dac7f
RGS
1343 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1344 @is_tiny_prime = ();
1345 for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether C<$is_tiny_prime[$some_number]>.
1349
1350If the values in question are integers instead of strings, you can save
1351quite a lot of space by using bit strings instead:
1352
ac9dac7f
RGS
1353 @articles = ( 1..10, 150..2000, 2017 );
1354 undef $read;
1355 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1356
1357Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1358
9e72e4c6
RGS
1359These methods guarantee fast individual tests but require a re-organization
1360of the original list or array. They only pay off if you have to test
1361multiple values against the same array.
68dc0745 1362
ac9dac7f 1363If you are testing only once, the standard module C<List::Util> exports
9e72e4c6 1364the function C<first> for this purpose. It works by stopping once it
c195e131 1365finds the element. It's written in C for speed, and its Perl equivalent
9e72e4c6 1366looks like this subroutine:
68dc0745 1367
9e72e4c6
RGS
1368 sub first (&@) {
1369 my $code = shift;
1370 foreach (@_) {
1371 return $_ if &{$code}();
1372 }
1373 undef;
1374 }
68dc0745 1375
9e72e4c6
RGS
1376If speed is of little concern, the common idiom uses grep in scalar context
1377(which returns the number of items that passed its condition) to traverse the
1378entire list. This does have the benefit of telling you how many matches it
1379found, though.
68dc0745 1380
9e72e4c6 1381 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1382
9e72e4c6
RGS
1383If you want to actually extract the matching elements, simply use grep in
1384list context.
68dc0745 1385
9e72e4c6 1386 my @matches = grep $_ eq $whatever, @array;
58103a2e 1387
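The one-off test with C<first> from C<List::Util> could be sketched like
this. The color names reuse the C<@blues> list from earlier, and
C<crimson> is just a value chosen to miss.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;

# first returns the matching element itself, or undef if nothing matched,
# so test the result with defined() in case the element is 0 or ''.
my $hit  = first { $_ eq 'teal' } @blues;
my $miss = first { $_ eq 'crimson' } @blues;

print "teal is ",    ( defined $hit  ? "" : "not " ), "a blue\n";
print "crimson is ", ( defined $miss ? "" : "not " ), "a blue\n";
```
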
68dc0745 1388=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1389
ac9dac7f
RGS
1390Use a hash. Here's code to do both and more. It assumes that each
1391element is unique in a given array:
68dc0745 1392
ac9dac7f
RGS
1393 @union = @intersection = @difference = ();
1394 %count = ();
1395 foreach $element (@array1, @array2) { $count{$element}++ }
1396 foreach $element (keys %count) {
1397 push @union, $element;
1398 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1399 }
68dc0745 1400
ac9dac7f
RGS
1401Note that this is the I<symmetric difference>, that is, all elements
1402in either A or in B but not in both. Think of it as an xor operation.
d92eb7b0 1403
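If what you want is the one-sided difference instead (elements of the
first array that are not in the second), a lookup hash plus C<grep> is
one way to sketch it. The sample arrays and variable names here are my
own.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e f);

# Mark everything in the second array, then keep what is unmarked.
my %in_second = map { $_ => 1 } @array2;
my @only_in_first = grep { !$in_second{$_} } @array1;    # a b

print "@only_in_first\n";
```

Unlike looping over C<keys>, this keeps the elements in their original order.
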
65acb1b1
TC
1404=head2 How do I test whether two arrays or hashes are equal?
1405
109f0441
S
1406With Perl 5.10 and later, the smart match operator can give you the answer
1407with the least amount of work:
1408
1409 use 5.010;
1410
1411 if( @array1 ~~ @array2 )
1412 {
1413 say "The arrays are the same";
1414 }
1415
1416 if( %hash1 ~~ %hash2 ) # doesn't check values!
1417 {
1418 say "The hash keys are the same";
1419 }
1420
ac9dac7f
RGS
1421The following code works for single-level arrays. It uses a
1422stringwise comparison, and does not distinguish defined versus
1423undefined empty strings. Modify if you have other needs.
65acb1b1 1424
ac9dac7f 1425 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1426
ac9dac7f
RGS
1427 sub compare_arrays {
1428 my ($first, $second) = @_;
1429 no warnings; # silence spurious -w undef complaints
1430 return 0 unless @$first == @$second;
1431 for (my $i = 0; $i < @$first; $i++) {
1432 return 0 if $first->[$i] ne $second->[$i];
1433 }
1434 return 1;
1435 }
65acb1b1
TC
1436
1437For multilevel structures, you may wish to use an approach more
ac9dac7f 1438like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1439
ac9dac7f
RGS
1440 use FreezeThaw qw(cmpStr);
1441 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1442
ac9dac7f
RGS
1443 printf "a and b contain %s arrays\n",
1444 cmpStr(\@a, \@b) == 0
1445 ? "the same"
1446 : "different";
65acb1b1 1447
ac9dac7f
RGS
1448This approach also works for comparing hashes. Here we'll demonstrate
1449two different answers:
65acb1b1 1450
ac9dac7f 1451 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1452
ac9dac7f
RGS
1453 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1454 $a{EXTRA} = \%b;
1455 $b{EXTRA} = \%a;
65acb1b1 1456
ac9dac7f 1457 printf "a and b contain %s hashes\n",
65acb1b1
TC
1458 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1459
ac9dac7f 1460 printf "a and b contain %s hashes\n",
65acb1b1
TC
1461 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1462
1463
The first reports that both of the hashes contain the same data,
1465while the second reports that they do not. Which you prefer is left as
1466an exercise to the reader.
1467
68dc0745 1468=head2 How do I find the first array element for which a condition is true?
1469
49d635f9 1470To find the first array element which satisfies a condition, you can
ac9dac7f
RGS
1471use the C<first()> function in the C<List::Util> module, which comes
1472with Perl 5.8. This example finds the first element that contains
1473"Perl".
49d635f9
RGS
1474
1475 use List::Util qw(first);
197aec24 1476
49d635f9 1477 my $element = first { /Perl/ } @array;
197aec24 1478
ac9dac7f 1479If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9
RGS
1480same thing. Once you find the element, you stop the loop with last.
1481
1482 my $found;
ac9dac7f 1483 foreach ( @array ) {
6670e5e7 1484 if( /Perl/ ) { $found = $_; last }
49d635f9
RGS
1485 }
1486
1487If you want the array index, you can iterate through the indices
1488and check the array element at each index until you find one
1489that satisfies the condition.
1490
197aec24 1491 my( $found, $index ) = ( undef, -1 );
ac9dac7f
RGS
    for( my $i = 0; $i < @array; $i++ ) {
1493 if( $array[$i] =~ /Perl/ ) {
6670e5e7
RGS
1494 $found = $array[$i];
1495 $index = $i;
1496 last;
1497 }
1498 }
68dc0745 1499
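You can also get the index with C<first> by searching the indices
instead of the elements. This is a sketch with sample data of my own;
it assumes C<List::Util> is available.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "Python", "Perl 5", "Raku", "Perl 6" );

# Search the indices rather than the elements; undef means no match.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index ? "found at index $index\n" : "not found\n";
```

Check the result with C<defined>, since a match at index 0 is a false value.
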
1500=head2 How do I handle linked lists?
1501
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1)
operations on Perl's dynamic arrays. In the absence of shifts and
pops, push in general needs to reallocate only on the order of log(N)
times, and unshift will need to copy pointers each time.
68dc0745 1509
1510If you really, really wanted, you could use structures as described in
ac9dac7f
RGS
1511L<perldsc> or L<perltoot> and do just what the algorithm book tells
1512you to do. For example, imagine a list node like this:
65acb1b1 1513
ac9dac7f
RGS
1514 $node = {
1515 VALUE => 42,
1516 LINK => undef,
1517 };
65acb1b1
TC
1518
1519You could walk the list this way:
1520
ac9dac7f
RGS
1521 print "List: ";
1522 for ($node = $head; $node; $node = $node->{LINK}) {
1523 print $node->{VALUE}, " ";
1524 }
1525 print "\n";
65acb1b1 1526
a6dd486b 1527You could add to the list this way:
65acb1b1 1528
ac9dac7f
RGS
1529 my ($head, $tail);
1530 $tail = append($head, 1); # grow a new head
1531 for $value ( 2 .. 10 ) {
1532 $tail = append($tail, $value);
1533 }
65acb1b1 1534
ac9dac7f
RGS
1535 sub append {
1536 my($list, $value) = @_;
1537 my $node = { VALUE => $value };
1538 if ($list) {
1539 $node->{LINK} = $list->{LINK};
1540 $list->{LINK} = $node;
1541 }
1542 else {
1543 $_[0] = $node; # replace caller's version
1544 }
1545 return $node;
1546 }
65acb1b1
TC
1547
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1549
1550=head2 How do I handle circular lists?
109f0441
S
1551X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1552X<cycle> X<modulus>
68dc0745 1553
109f0441
S
1554(contributed by brian d foy)
1555
If you want to cycle through an array endlessly, you can increment the
1557index modulo the number of elements in the array:
68dc0745 1558
109f0441
S
1559 my @array = qw( a b c );
1560 my $i = 0;
1561
1562 while( 1 ) {
1563 print $array[ $i++ % @array ], "\n";
1564 last if $i > 20;
1565 }
ac9dac7f 1566
109f0441
S
1567You can also use C<Tie::Cycle> to use a scalar that always has the
1568next element of the circular array:
ac9dac7f
RGS
1569
1570 use Tie::Cycle;
1571
1572 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1573
1574 print $cycle; # FFFFFF
1575 print $cycle; # 000000
1576 print $cycle; # FFFF00
68dc0745 1577
109f0441
S
The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:
1580
1581 use Array::Iterator::Circular;
1582
1583 my $color_iterator = Array::Iterator::Circular->new(
1584 qw(red green blue orange)
1585 );
1586
1587 foreach ( 1 .. 20 ) {
1588 print $color_iterator->next, "\n";
1589 }
1590
68dc0745 1591=head2 How do I shuffle an array randomly?
1592
45bbf655
JH
1593If you either have Perl 5.8.0 or later installed, or if you have
1594Scalar-List-Utils 1.03 or later installed, you can say:
1595
ac9dac7f 1596 use List::Util 'shuffle';
45bbf655
JH
1597
1598 @shuffled = shuffle(@list);
1599
f05bbc40 1600If not, you can use a Fisher-Yates shuffle.
5a964f20 1601
ac9dac7f
RGS
1602 sub fisher_yates_shuffle {
1603 my $deck = shift; # $deck is a reference to an array
109f0441
S
1604 return unless @$deck; # must not be empty!
1605
ac9dac7f
RGS
1606 my $i = @$deck;
1607 while (--$i) {
1608 my $j = int rand ($i+1);
1609 @$deck[$i,$j] = @$deck[$j,$i];
1610 }
1611 }
5a964f20 1612
ac9dac7f
RGS
1613 # shuffle my mpeg collection
1614 #
1615 my @mpeg = <audio/*/*.mp3>;
1616 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1617 print @mpeg;
5a964f20 1618
45bbf655 1619Note that the above implementation shuffles an array in place,
ac9dac7f 1620unlike the C<List::Util::shuffle()> which takes a list and returns
1621a new shuffled list.
1622
You've probably seen shuffling algorithms that work using splice,
randomly picking an element to pull out of the old list and push onto
the new one:
68dc0745 1625
ac9dac7f
RGS
1626 srand;
1627 @new = ();
1628 @old = 1 .. 10; # just a demo
1629 while (@old) {
1630 push(@new, splice(@old, rand @old, 1));
1631 }
68dc0745 1632
ac9dac7f
RGS
1633This is bad because splice is already O(N), and since you do it N
1634times, you just invented a quadratic algorithm; that is, O(N**2).
1635This does not scale, although Perl is so efficient that you probably
1636won't notice this until you have rather largish arrays.
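
If you want the linear-time behavior of the Fisher-Yates shuffle but,
like C<List::Util::shuffle()>, would rather get a new list back than
shuffle in place, one option is to copy the list and shuffle the copy.
This is only a sketch; the name C<shuffled_copy> is made up for this
example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a shuffled copy and leaves the input alone.
sub shuffled_copy {
    my @deck = @_;                        # copy the caller's list
    for ( my $i = $#deck; $i > 0; $i-- ) {
        my $j = int rand( $i + 1 );       # 0 <= $j <= $i
        @deck[ $i, $j ] = @deck[ $j, $i ];
    }
    return @deck;
}

my @old = 1 .. 10;                        # just a demo
my @new = shuffled_copy(@old);
print "@new\n";
```

Each element is swapped at most once per pass, so the work grows
linearly with the size of the list.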
68dc0745 1637
1638=head2 How do I process/modify each element of an array?
1639
1640Use C<for>/C<foreach>:
1641
ac9dac7f 1642 for (@lines) {
6670e5e7
RGS
1643 s/foo/bar/; # change that word
1644 tr/XZ/ZX/; # swap those letters
ac9dac7f 1645 }
68dc0745 1646
1647Here's another; let's compute spherical volumes:
1648
ac9dac7f 1649 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7
RGS
1650 $_ **= 3;
1651 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1652 }
197aec24 1653
ac9dac7f 1654which can also be done with C<map()> which is made to transform
1655one list into another:
1656
1657 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1658
76817d6d
JH
If you want to do the same thing to modify the values of a
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.
5a964f20 1663
ac9dac7f 1664 for $orbit ( values %orbits ) {
6670e5e7 1665 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1666 }
818c4caa 1667
76817d6d
JH
1668Prior to perl 5.6 C<values> returned copies of the values,
1669so older perl code often contains constructions such as
1670C<@orbits{keys %orbits}> instead of C<values %orbits> where
1671the hash is to be modified.
818c4caa 1672
68dc0745 1673=head2 How do I select a random element from an array?
1674
ac9dac7f 1675Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1676
ac9dac7f
RGS
1677 $index = rand @array;
1678 $element = $array[$index];
68dc0745 1679
793f5136 1680Or, simply:
ac9dac7f
RGS
1681
1682 my $element = $array[ rand @array ];
5a964f20 1683
68dc0745 1684=head2 How do I permute N elements of a list?
c195e131
RGS
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1686X<The Art of Computer Programming> X<Fischer-Krause>
68dc0745 1687
c195e131 1688Use the C<List::Permutor> module on CPAN. If the list is actually an
ac9dac7f 1689array, try the C<Algorithm::Permute> module (also on CPAN). It's
c195e131 1690written in XS code and is very efficient:
49d635f9
RGS
1691
1692 use Algorithm::Permute;
c195e131 1693
49d635f9
RGS
1694 my @array = 'a'..'d';
1695 my $p_iterator = Algorithm::Permute->new ( \@array );
c195e131 1696
49d635f9
RGS
1697 while (my @perm = $p_iterator->next) {
1698 print "next permutation: (@perm)\n";
ac9dac7f 1699 }
49d635f9 1700
197aec24
RGS
1701For even faster execution, you could do:
1702
ac9dac7f 1703 use Algorithm::Permute;
c195e131 1704
ac9dac7f 1705 my @array = 'a'..'d';
c195e131 1706
ac9dac7f
RGS
1707 Algorithm::Permute::permute {
1708 print "next permutation: (@array)\n";
1709 } @array;
197aec24 1710
c195e131
RGS
1711Here's a little program that generates all permutations of all the
1712words on each line of input. The algorithm embodied in the
1713C<permute()> function is discussed in Volume 4 (still unpublished) of
1714Knuth's I<The Art of Computer Programming> and will work on any list:
49d635f9
RGS
1715
1716 #!/usr/bin/perl -n
ac003c96 1717 # Fischer-Krause ordered permutation generator
49d635f9
RGS
1718
1719 sub permute (&@) {
1720 my $code = shift;
1721 my @idx = 0..$#_;
1722 while ( $code->(@_[@idx]) ) {
1723 my $p = $#idx;
1724 --$p while $idx[$p-1] > $idx[$p];
1725 my $q = $p or return;
1726 push @idx, reverse splice @idx, $p;
1727 ++$q while $idx[$p-1] > $idx[$q];
1728 @idx[$p-1,$q]=@idx[$q,$p-1];
1729 }
68dc0745 1730 }
68dc0745 1731
c195e131
RGS
1732 permute { print "@_\n" } split;
1733
1734The C<Algorithm::Loops> module also provides the C<NextPermute> and
1735C<NextPermuteNum> functions which efficiently find all unique permutations
1736of an array, even if it contains duplicate values, modifying it in-place:
1737if its elements are in reverse-sorted order then the array is reversed,
1738making it sorted, and it returns false; otherwise the next
1739permutation is returned.
1740
1741C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1742you can enumerate all the permutations of C<0..9> like this:
1743
1744 use Algorithm::Loops qw(NextPermuteNum);
109f0441 1745
c195e131
RGS
1746 my @list= 0..9;
1747 do { print "@list\n" } while NextPermuteNum @list;
b8d2732a 1748
68dc0745 1749=head2 How do I sort an array by (anything)?
1750
1751Supply a comparison function to sort() (described in L<perlfunc/sort>):
1752
ac9dac7f 1753 @list = sort { $a <=> $b } @list;
68dc0745 1754
1755The default sort function is cmp, string comparison, which would
c47ff5f1 1756sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1757the numerical comparison operator.
1758
1759If you have a complicated function needed to pull out the part you
1760want to sort on, then don't do it inside the sort function. Pull it
1761out first, because the sort BLOCK can be called many times for the
1762same element. Here's an example of how to pull out the first word
1763after the first number on each item, and then sort those words
1764case-insensitively.
1765
ac9dac7f
RGS
1766 @idx = ();
1767 for (@data) {
1768 ($item) = /\d+\s*(\S+)/;
1769 push @idx, uc($item);
1770 }
1771 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1772
a6dd486b 1773which could also be written this way, using a trick
68dc0745 1774that's come to be known as the Schwartzian Transform:
1775
ac9dac7f
RGS
1776 @sorted = map { $_->[0] }
1777 sort { $a->[1] cmp $b->[1] }
1778 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1779
1780If you need to sort on several fields, the following paradigm is useful.
1781
ac9dac7f
RGS
1782 @sorted = sort {
1783 field1($a) <=> field1($b) ||
1784 field2($a) cmp field2($b) ||
1785 field3($a) cmp field3($b)
1786 } @data;
68dc0745 1787
1788This can be conveniently combined with precalculation of keys as given
1789above.
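
As a rough illustration of that combination, here is a sketch that
computes every sort key once and then compares on several fields. The
record layout (C<dept:name:age>) and the sample data are invented for
the example:

```perl
use strict;
use warnings;

# Hypothetical records of the form "dept:name:age".
my @data = ( 'ops:Zed:41', 'dev:amy:29', 'dev:Bob:29', 'ops:al:35' );

my @sorted =
    map  { $_->[0] }                  # 3. recover the original item
    sort {                            # 2. compare the precomputed keys
        $a->[1] cmp $b->[1]           #    department, string order
            ||
        $a->[3] <=> $b->[3]           #    then age, numeric order
            ||
        $a->[2] cmp $b->[2]           #    then name, case-insensitively
    }
    map  {                            # 1. compute each key only once
        my ( $dept, $name, $age ) = split /:/;
        [ $_, $dept, lc $name, $age ];
    } @data;

print "$_\n" for @sorted;
```

The C<split> and C<lc> calls run once per item rather than once per
comparison, which is the point of precalculating the keys.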
1790
379e39d7 1791See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1792To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1793more about this approach.
68dc0745 1794
ac9dac7f 1795See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1796
1797=head2 How do I manipulate arrays of bits?
1798
ac9dac7f
RGS
1799Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1800operations.
1801
109f0441
S
1802For example, you don't have to store individual bits in an array
1803(which would mean that you're wasting a lot of space). To convert an
1804array of bits to a string, use C<vec()> to set the right bits. This
1805sets C<$vec> to have bit N set only if C<$ints[N]> was set:
ac9dac7f 1806
109f0441 1807 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
ac9dac7f 1808 $vec = '';
109f0441
S
1809 foreach( 0 .. $#ints ) {
1810 vec($vec,$_,1) = 1 if $ints[$_];
1811 }
ac9dac7f 1812
109f0441
S
1813The string C<$vec> only takes up as many bits as it needs. For
1814instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1815bytes to store them (not counting the scalar variable overhead).
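
To see the space saving for yourself, you might try something like the
following sketch. The flag values are arbitrary; note that the string
grows only as far as the highest bit you actually set, so the last flag
here is deliberately 1:

```perl
use strict;
use warnings;

# 16 arbitrary flags; the last one is set so the string spans 16 bits.
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 );

my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

printf "%d flags packed into %d bytes\n", scalar @ints, length $vec;

# Read the flags back out, one bit position at a time.
my @back = map { vec( $vec, $_, 1 ) } 0 .. $#ints;
print "@back\n";
```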
1816
1817Here's how, given a vector in C<$vec>, you can get those bits into
1818your C<@ints> array:
ac9dac7f
RGS
1819
1820 sub bitvec_to_list {
1821 my $vec = shift;
1822 my @ints;
1823 # Find null-byte density then select best algorithm
1824 if ($vec =~ tr/\0// / length $vec > 0.95) {
1825 use integer;
1826 my $i;
1827
1828 # This method is faster with mostly null-bytes
1829 while($vec =~ /[^\0]/g ) {
1830 $i = -9 + 8 * pos $vec;
1831 push @ints, $i if vec($vec, ++$i, 1);
1832 push @ints, $i if vec($vec, ++$i, 1);
1833 push @ints, $i if vec($vec, ++$i, 1);
1834 push @ints, $i if vec($vec, ++$i, 1);
1835 push @ints, $i if vec($vec, ++$i, 1);
1836 push @ints, $i if vec($vec, ++$i, 1);
1837 push @ints, $i if vec($vec, ++$i, 1);
1838 push @ints, $i if vec($vec, ++$i, 1);
1839 }
1840 }
1841 else {
1842 # This method is a fast general algorithm
1843 use integer;
1844 my $bits = unpack "b*", $vec;
1845 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1846 push @ints, pos $bits while($bits =~ /1/g);
1847 }
1848
1849 return \@ints;
1850 }
68dc0745 1851
1852This method gets faster the more sparse the bit vector is.
1853(Courtesy of Tim Bunce and Winfried Koenig.)
1854
76817d6d
JH
1855You can make the while loop a lot shorter with this suggestion
1856from Benjamin Goldberg:
1857
1858 while($vec =~ /[^\0]+/g ) {
ac9dac7f
RGS
1859 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1860 }
76817d6d 1861
ac9dac7f 1862Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1863
ac9dac7f
RGS
    use Bit::Vector;

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();
cc30d1a7 1867
ac9dac7f
RGS
C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers and "big int" math.
cc30d1a7
JH
1870
1871Here's a more extensive illustration using vec():
65acb1b1 1872
ac9dac7f
RGS
1873 # vec demo
1874 $vector = "\xff\x0f\xef\xfe";
1875 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1876 unpack("N", $vector), "\n";
ac9dac7f
RGS
1877 $is_set = vec($vector, 23, 1);
1878 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1879 pvec($vector);
65acb1b1 1880
ac9dac7f
RGS
1881 set_vec(1,1,1);
1882 set_vec(3,1,1);
1883 set_vec(23,1,1);
1884
1885 set_vec(3,1,3);
1886 set_vec(3,2,3);
1887 set_vec(3,4,3);
1888 set_vec(3,4,7);
1889 set_vec(3,8,3);
1890 set_vec(3,8,7);
1891
1892 set_vec(0,32,17);
1893 set_vec(1,32,17);
1894
1895 sub set_vec {
1896 my ($offset, $width, $value) = @_;
1897 my $vector = '';
1898 vec($vector, $offset, $width) = $value;
1899 print "offset=$offset width=$width value=$value\n";
1900 pvec($vector);
1901 }
65acb1b1 1902
ac9dac7f
RGS
1903 sub pvec {
1904 my $vector = shift;
1905 my $bits = unpack("b*", $vector);
1906 my $i = 0;
1907 my $BASE = 8;
1908
1909 print "vector length in bytes: ", length($vector), "\n";
1910 @bytes = unpack("A8" x length($vector), $bits);
1911 print "bits are: @bytes\n\n";
1912 }
65acb1b1 1913
68dc0745 1914=head2 Why does defined() return true on empty arrays and hashes?
1915
65acb1b1
TC
1916The short story is that you should probably only use defined on scalars or
1917functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1918in the 5.004 release or later of Perl for more detail.
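
What you usually want to know is whether the aggregate has anything in
it, and for that you can simply use the array or hash in a boolean or
scalar context. A small sketch:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array or hash is true in boolean context only when it has
# at least one element.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, undef;     # one element, even though it is undef
$hash{key} = undef;     # one key, even though its value is undef

my $array_count = @array;        # number of elements
my $hash_count  = keys %hash;    # number of keys
print "array has $array_count element(s)\n";
print "hash has $hash_count key(s)\n";
```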
68dc0745 1919
1920=head1 Data: Hashes (Associative Arrays)
1921
1922=head2 How do I process an entire hash?
1923
ee891a00
RGS
1924(contributed by brian d foy)
1925
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
68dc0745 1929
ee891a00
RGS
1930To go through all of the keys, use the C<keys> function. This extracts
1931all of the keys of the hash and gives them back to you as a list. You
1932can then get the value through the particular key you're processing:
1933
    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }
68dc0745 1938
ee891a00 1939Once you have the list of keys, you can process that list before you
109f0441 1940process the hash elements. For instance, you can sort the keys so you
1941can process them in lexical order:
1942
    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1947
1948Or, you might want to only process some of the items. If you only want
1949to deal with the keys that start with C<text:>, you can select just
1950those using C<grep>:
1951
    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1956
1957If the hash is very large, you might not want to create a long list of
109f0441 1958keys. To save some memory, you can grab one key-value pair at a time using
1959C<each()>, which returns a pair you haven't seen yet:
1960
1961 while( my( $key, $value ) = each( %hash ) ) {
1962 ...
1963 }
1964
1965The C<each> operator returns the pairs in apparently random order, so if
1966ordering matters to you, you'll have to stick with the C<keys> method.
1967
1968The C<each()> operator can be a bit tricky though. You can't add or
1969delete keys of the hash while you're using it without possibly
1970skipping or re-processing some pairs after Perl internally rehashes
1971all of the elements. Additionally, a hash has only one iterator, so if
1972you use C<keys>, C<values>, or C<each> on the same hash, you can reset
1973the iterator and mess up your processing. See the C<each> entry in
1974L<perlfunc> for more details.
68dc0745 1975
109f0441
S
1976=head2 How do I merge two hashes?
1977X<hash> X<merge> X<slice, hash>
1978
1979(contributed by brian d foy)
1980
1981Before you decide to merge two hashes, you have to decide what to do
1982if both hashes contain keys that are the same and if you want to leave
1983the original hashes as they were.
1984
1985If you want to preserve the original hashes, copy one hash (C<%hash1>)
1986to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
1988C<%new_hash> gives you a chance to decide what to do with the
1989duplicates:
1990
1991 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1992
1993 foreach my $key2 ( keys %hash2 )
1994 {
1995 if( exists $new_hash{$key2} )
1996 {
1997 warn "Key [$key2] is in both hashes!";
1998 # handle the duplicate (perhaps only warning)
1999 ...
2000 next;
2001 }
2002 else
2003 {
2004 $new_hash{$key2} = $hash2{$key2};
2005 }
2006 }
2007
2008If you don't want to create a new hash, you can still use this looping
2009technique; just change the C<%new_hash> to C<%hash1>.
2010
2011 foreach my $key2 ( keys %hash2 )
2012 {
2013 if( exists $hash1{$key2} )
2014 {
2015 warn "Key [$key2] is in both hashes!";
2016 # handle the duplicate (perhaps only warning)
2017 ...
2018 next;
2019 }
2020 else
2021 {
2022 $hash1{$key2} = $hash2{$key2};
2023 }
2024 }
2025
2026If you don't care that one hash overwrites keys and values from the other, you
2027could just use a hash slice to add one hash to another. In this case, values
2028from C<%hash2> replace values from C<%hash1> when they have keys in common:
2029
2030 @hash1{ keys %hash2 } = values %hash2;
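
If you want the same "last one wins" behavior but would rather leave
both original hashes alone, one possibility is to flatten them into a
list and assign that to a new hash. The hash contents below are made up
for the example:

```perl
use strict;
use warnings;

my %hash1 = ( apple  => 1,  banana => 2 );
my %hash2 = ( banana => 20, cherry => 30 );

# Later pairs win, so values from %hash2 replace those from %hash1.
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```

This builds a temporary list of all the keys and values, so it may not
be the best choice for very large hashes.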
2031
68dc0745 2032=head2 What happens if I add or remove keys from a hash while iterating over it?
2033
28b41a80 2034(contributed by brian d foy)
d92eb7b0 2035
28b41a80 2036The easy answer is "Don't do that!"
d92eb7b0 2037
28b41a80
RGS
2038If you iterate through the hash with each(), you can delete the key
2039most recently returned without worrying about it. If you delete or add
2040other keys, the iterator may skip or double up on them since perl
2041may rearrange the hash table. See the
2042entry for C<each()> in L<perlfunc>.
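
If you do need to change the hash as you go, here are two approaches
that should stay out of trouble. This is only a sketch with made-up
data: the first loop deletes nothing but the key C<each()> just
returned, and the second takes a snapshot of the keys before it adds
anything.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => -2, c => 3, d => -4 );

# Safe: delete only the key that each() returned most recently.
while ( my ( $key, $value ) = each %hash ) {
    delete $hash{$key} if $value < 0;
}

# Also safe: keys() builds its list before the loop starts, so the
# loop walks a snapshot while the hash itself changes.
for my $key ( keys %hash ) {
    $hash{"${key}_copy"} = $hash{$key};
}

print join( ", ", sort keys %hash ), "\n";
```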
68dc0745 2043
2044=head2 How do I look up a hash element by value?
2045
2046Create a reverse hash:
2047
ac9dac7f
RGS
2048 %by_value = reverse %by_key;
2049 $key = $by_value{$value};
68dc0745 2050
2051That's not particularly efficient. It would be more space-efficient
2052to use:
2053
ac9dac7f
RGS
2054 while (($key, $value) = each %by_key) {
2055 $by_value{$value} = $key;
2056 }
68dc0745 2057
d92eb7b0
GS
2058If your hash could have repeated values, the methods above will only find
2059one of the associated keys. This may or may not worry you. If it does
2060worry you, you can always reverse the hash into a hash of arrays instead:
2061
ac9dac7f
RGS
2062 while (($key, $value) = each %by_key) {
2063 push @{$key_list_by_value{$value}}, $key;
2064 }
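
To make the hash-of-arrays version concrete, here is a small sketch
with invented data, showing how you might look up every key that shares
a value:

```perl
use strict;
use warnings;

my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

my %key_list_by_value;
while ( my ( $key, $value ) = each %by_key ) {
    push @{ $key_list_by_value{$value} }, $key;
}

# Each value now maps to a reference to an array of keys.
for my $value ( sort keys %key_list_by_value ) {
    my @keys = sort @{ $key_list_by_value{$value} };
    print "$value: @keys\n";
}
```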
68dc0745 2065
2066=head2 How can I know how many entries are in a hash?
2067
109f0441
S
2068(contributed by brian d foy)
2069
2070This is very similar to "How do I process an entire hash?", also in
2071L<perlfaq4>, but a bit simpler in the common cases.
2072
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
68dc0745 2075
109f0441
S
2076 my $key_count = keys %hash; # must be scalar context!
2077
2078If you want to find out how many entries have a defined value, that's
2079a bit different. You have to check each value. A C<grep> is handy:
2080
2081 my $defined_value_count = grep { defined } values %hash;
68dc0745 2082
109f0441
S
2083You can use that same structure to count the entries any way that
2084you like. If you want the count of the keys with vowels in them,
2085you just test for that instead:
2086
2087 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2088
2089The C<grep> in scalar context returns the count. If you want the list
2090of matching items, just use it in list context instead:
2091
2092 my @defined_values = grep { defined } values %hash;
2093
2094The C<keys()> function also resets the iterator, which means that you may
197aec24 2095see strange results if you use this between uses of other hash operators
109f0441 2096such as C<each()>.
68dc0745 2097
2098=head2 How do I sort a hash (optionally by value instead of key)?
2099
a05e4845
RGS
2100(contributed by brian d foy)
2101
2102To sort a hash, start with the keys. In this example, we give the list of
2103keys to the sort function which then compares them ASCIIbetically (which
2104might be affected by your locale settings). The output list has the keys
2105in ASCIIbetical order. Once we have the keys, we can go through them to
2106create a report which lists the keys in ASCIIbetical order.
2107
2108 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 2109
a05e4845
RGS
2110 foreach my $key ( @keys )
2111 {
109f0441 2112 printf "%-20s %6d\n", $key, $hash{$key};
a05e4845
RGS
2113 }
2114
58103a2e 2115We could get more fancy in the C<sort()> block though. Instead of
a05e4845 2116comparing the keys, we can compute a value with them and use that
58103a2e 2117value as the comparison.
a05e4845
RGS
2118
2119For instance, to make our report order case-insensitive, we use
58103a2e 2120the C<\L> sequence in a double-quoted string to make everything
a05e4845
RGS
2121lowercase. The C<sort()> block then compares the lowercased
2122values to determine in which order to put the keys.
2123
2124 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 2125
a05e4845 2126Note: if the computation is expensive or the hash has many elements,
58103a2e 2127you may want to look at the Schwartzian Transform to cache the
a05e4845
RGS
2128computation results.
2129
2130If we want to sort by the hash value instead, we use the hash key
2131to look it up. We still get out a list of keys, but this time they
2132are ordered by their value.
2133
2134 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2135
2136From there we can get more complex. If the hash values are the same,
2137we can provide a secondary sort on the hash key.
2138
58103a2e
RGS
2139 my @keys = sort {
2140 $hash{$a} <=> $hash{$b}
a05e4845
RGS
2141 or
2142 "\L$a" cmp "\L$b"
2143 } keys %hash;
68dc0745 2144
2145=head2 How can I always keep my hash sorted?
ac9dac7f 2146X<hash tie sort DB_File Tie::IxHash>
68dc0745 2147
ac9dac7f
RGS
2148You can look into using the C<DB_File> module and C<tie()> using the
2149C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
2150Databases">. The C<Tie::IxHash> module from CPAN might also be
2151instructive. Although this does keep your hash sorted, you might not
2152like the slow down you suffer from the tie interface. Are you sure you
2153need to do this? :)
68dc0745 2154
2155=head2 What's the difference between "delete" and "undef" with hashes?
2156
92993692
JH
2157Hashes contain pairs of scalars: the first is the key, the
2158second is the value. The key will be coerced to a string,
2159although the value can be any kind of scalar: string,
ac9dac7f 2160number, or reference. If a key C<$key> is present in
92993692
JH
2161%hash, C<exists($hash{$key})> will return true. The value
2162for a given key can be C<undef>, in which case
2163C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2164will return true. This corresponds to (C<$key>, C<undef>)
2165being in the hash.
68dc0745 2166
ac9dac7f 2167Pictures help... here's the C<%hash> table:
68dc0745 2168
2169 keys values
2170 +------+------+
2171 | a | 3 |
2172 | x | 7 |
2173 | d | 0 |
2174 | e | 2 |
2175 +------+------+
2176
2177And these conditions hold
2178
92993692
JH
2179 $hash{'a'} is true
2180 $hash{'d'} is false
2181 defined $hash{'d'} is true
2182 defined $hash{'a'} is true
e9d185f8 2183 exists $hash{'a'} is true (Perl 5 only)
92993692 2184 grep ($_ eq 'a', keys %hash) is true
68dc0745 2185
2186If you now say
2187
92993692 2188 undef $hash{'a'}
68dc0745 2189
2190your table now reads:
2191
2192
2193 keys values
2194 +------+------+
2195 | a | undef|
2196 | x | 7 |
2197 | d | 0 |
2198 | e | 2 |
2199 +------+------+
2200
2201and these conditions now hold; changes in caps:
2202
92993692
JH
2203 $hash{'a'} is FALSE
2204 $hash{'d'} is false
2205 defined $hash{'d'} is true
2206 defined $hash{'a'} is FALSE
e9d185f8 2207 exists $hash{'a'} is true (Perl 5 only)
92993692 2208 grep ($_ eq 'a', keys %hash) is true
68dc0745 2209
2210Notice the last two: you have an undef value, but a defined key!
2211
2212Now, consider this:
2213
92993692 2214 delete $hash{'a'}
68dc0745 2215
2216your table now reads:
2217
2218 keys values
2219 +------+------+
2220 | x | 7 |
2221 | d | 0 |
2222 | e | 2 |
2223 +------+------+
2224
2225and these conditions now hold; changes in caps:
2226
92993692
JH
2227 $hash{'a'} is false
2228 $hash{'d'} is false
2229 defined $hash{'d'} is true
2230 defined $hash{'a'} is false
e9d185f8 2231 exists $hash{'a'} is FALSE (Perl 5 only)
92993692 2232 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 2233
2234See, the whole entry is gone!
2235
2236=head2 Why don't my tied hashes make the defined/exists distinction?
2237
92993692
JH
2238This depends on the tied hash's implementation of EXISTS().
2239For example, there isn't the concept of undef with hashes
2240that are tied to DBM* files. It also means that exists() and
2241defined() do the same thing with a DBM* file, and what they
2242end up doing is not what they do with ordinary hashes.
68dc0745 2243
2244=head2 How do I reset an each() operation part-way through?
2245
fb2fe781
RGS
2246(contributed by brian d foy)
2247
2248You can use the C<keys> or C<values> functions to reset C<each>. To
2249simply reset the iterator used by C<each> without doing anything else,
2250use one of them in void context:
2251
2252 keys %hash; # resets iterator, nothing else.
2253 values %hash; # resets iterator, nothing else.
2254
2255See the documentation for C<each> in L<perlfunc>.
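
Here is a small sketch of the reset in action, using a made-up hash. The
first loop is abandoned after one pair, which leaves the iterator parked
part-way through; after the reset, the second loop sees every pair
again:

```perl
use strict;
use warnings;

my %hash = ( one => 1, two => 2, three => 3 );

# Abandon an each() loop after the first pair.
while ( my ( $key, $value ) = each %hash ) {
    last;
}

keys %hash;    # void context: resets the iterator, nothing else

# This loop starts from the beginning and sees every pair.
my $seen = 0;
while ( my ( $key, $value ) = each %hash ) {
    $seen++;
}
print "saw $seen pairs\n";
```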
68dc0745 2256
2257=head2 How can I get the unique keys from two hashes?
2258
d92eb7b0
GS
2259First you extract the keys from the hashes into lists, then solve
2260the "removing duplicates" problem described above. For example:
68dc0745 2261
ac9dac7f
RGS
2262 %seen = ();
2263 for $element (keys(%foo), keys(%bar)) {
2264 $seen{$element}++;
2265 }
2266 @uniq = keys %seen;
68dc0745 2267
2268Or more succinctly:
2269
ac9dac7f 2270 @uniq = keys %{{%foo,%bar}};
68dc0745 2271
2272Or if you really want to save space:
2273
ac9dac7f
RGS
2274 %seen = ();
2275 while (defined ($key = each %foo)) {
2276 $seen{$key}++;
2277 }
2278 while (defined ($key = each %bar)) {
2279 $seen{$key}++;
2280 }
2281 @uniq = keys %seen;
68dc0745 2282
2283=head2 How can I store a multidimensional array in a DBM file?
2284
2285Either stringify the structure yourself (no fun), or else
2286get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2287it on top of either DB_File or GDBM_File.
2288
2289=head2 How can I make my hash remember the order I put elements into it?
2290
Use the C<Tie::IxHash> module from CPAN.
68dc0745 2292
ac9dac7f
RGS
2293 use Tie::IxHash;
2294
2295 tie my %myhash, 'Tie::IxHash';
2296
2297 for (my $i=0; $i<20; $i++) {
2298 $myhash{$i} = 2*$i;
2299 }
2300
2301 my @keys = keys %myhash;
2302 # @keys = (0,1,2,3,...)
46fc3d4c 2303
68dc0745 2304=head2 Why does passing a subroutine an undefined element in a hash create it?
2305
109f0441
S
2306(contributed by brian d foy)
2307
2308Are you using a really old version of Perl?
2309
2310Normally, accessing a hash key's value for a nonexistent key will
2311I<not> create the key.
2312
2313 my %hash = ();
2314 my $value = $hash{ 'foo' };
2315 print "This won't print\n" if exists $hash{ 'foo' };
2316
2317Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2318Since you could assign directly to C<$_[0]>, Perl had to be ready to
2319make that assignment so it created the hash key ahead of time:
2320
2321 my_sub( $hash{ 'foo' } );
2322 print "This will print before 5.004\n" if exists $hash{ 'foo' };
68dc0745 2323
109f0441
S
2324 sub my_sub {
2325 # $_[0] = 'bar'; # create hash key in case you do this
2326 1;
2327 }
2328
2329Since Perl 5.004, however, this situation is a special case and Perl
2330creates the hash key only when you make the assignment:
68dc0745 2331
109f0441
S
2332 my_sub( $hash{ 'foo' } );
2333 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2334
2335 sub my_sub {
2336 $_[0] = 'bar';
2337 }
68dc0745 2338
109f0441
S
2339However, if you want the old behavior (and think carefully about that
2340because it's a weird side effect), you can pass a hash slice instead.
2341Perl 5.004 didn't make this a special case:
68dc0745 2342
109f0441 2343 my_sub( @hash{ qw/foo/ } );
68dc0745 2344
fc36a67e 2345=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2346
65acb1b1
TC
2347Usually a hash ref, perhaps like this:
2348
ac9dac7f
RGS
2349 $record = {
2350 NAME => "Jason",
2351 EMPNO => 132,
2352 TITLE => "deputy peon",
2353 AGE => 23,
2354 SALARY => 37_000,
2355 PALS => [ "Norbert", "Rhys", "Phineas"],
2356 };
65acb1b1
TC
2357
References are documented in L<perlref> and L<perlreftut>.
2359Examples of complex data structures are given in L<perldsc> and
2360L<perllol>. Examples of structures and object-oriented classes are
2361in L<perltoot>.
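
As a quick sketch of how you might work with such a record, the
following reads and updates a few fields and then collects records into
an array. The extra names (C<Ferb>, C<Ann>) are invented for the
example:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

print "$record->{NAME} is $record->{AGE}\n";      # scalar fields
print "First pal: $record->{PALS}[0]\n";         # element of nested array
print "Pal count: ", scalar @{ $record->{PALS} }, "\n";

$record->{AGE}++;                                 # update a field
push @{ $record->{PALS} }, "Ferb";                # grow the nested array

# An "array of structures" is just an array of such references.
my @staff = ( $record, { NAME => "Ann", AGE => 31, PALS => [] } );
print "$_->{NAME}\n" for @staff;
```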
68dc0745 2362
2363=head2 How can I use a reference as a hash key?
2364
109f0441 2365(contributed by brian d foy and Ben Morrow)
9e72e4c6
RGS
2366
2367Hash keys are strings, so you can't really use a reference as the key.
2368When you try to do that, perl turns the reference into its stringified
2369form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2370back the reference from the stringified form, at least without doing
2371some extra work on your own.
2372
2373Remember that the entry in the hash will still be there even if
2374the referenced variable goes out of scope, and that it is entirely
2375possible for Perl to subsequently allocate a different variable at
2376the same address. This will mean a new variable might accidentally
2377be associated with the value for an old.
2378
2379If you have Perl 5.10 or later, and you just want to store a value
2380against the reference for lookup later, you can use the core
Hash::Util::FieldHash module. This will also handle renaming the
2382keys if you use multiple threads (which causes all variables to be
2383reallocated at new addresses, changing their stringification), and
2384garbage-collecting the entries when the referenced variable goes out
2385of scope.
2386
2387If you actually need to be able to get a real reference back from
2388each hash entry, you can use the Tie::RefHash module, which does the
2389required work for you.
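
A minimal sketch of the C<Tie::RefHash> approach might look like this.
The fruit data is invented; the point is that the keys come back as
real references that you can dereference:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $apple  = { name => 'apple' };
my $banana = { name => 'banana' };

$color_of{$apple}  = 'red';
$color_of{$banana} = 'yellow';

# With an ordinary hash these keys would be plain strings such as
# "HASH(0x...)", and dereferencing them would fail under strict.
for my $fruit ( keys %color_of ) {
    print "$fruit->{name} is $color_of{$fruit}\n";
}
```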
68dc0745 2390
2391=head1 Data: Misc
2392
2393=head2 How do I handle binary data correctly?
2394
ac9dac7f 2395Perl is binary clean, so it can handle binary data just fine.
e573f903 2396On Windows or DOS, however, you have to use C<binmode> for binary
ac9dac7f
RGS
2397files to avoid conversions for line endings. In general, you should
2398use C<binmode> any time you want to work with binary data.
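
For instance, here is a sketch that writes a few awkward bytes to a
temporary file and reads them back, calling C<binmode> on both
filehandles. The byte string is arbitrary, and the use of C<File::Temp>
is just a convenience for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $bytes = "\x00\x0D\x0A\x1A\xFF";      # includes CR, LF and Ctrl-Z

my ( $out, $filename ) = tempfile( UNLINK => 1 );
binmode $out;                            # no newline translation on write
print {$out} $bytes;
close $out or die "Can't close $filename: $!";

open my $in, '<', $filename or die "Can't open $filename: $!";
binmode $in;                             # no translation on read either
my $read = do { local $/; <$in> };       # slurp the whole file
close $in;

print "round trip ", ( $read eq $bytes ? "ok" : "FAILED" ), "\n";
```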
68dc0745 2399
ac9dac7f 2400Also see L<perlfunc/"binmode"> or L<perlopentut>.
68dc0745 2401
ac9dac7f 2402If you're concerned about 8-bit textual data then see L<perllocale>.
54310121 2403If you want to deal with multibyte characters, however, there are
68dc0745 2404some gotchas. See the section on Regular Expressions.
2405
2406=head2 How do I determine whether a scalar is a number/whole/integer/float?
2407
2408Assuming that you don't care about IEEE notations like "NaN" or
2409"Infinity", you probably just want to use a regular expression.
2410
ac9dac7f
RGS
2411 if (/\D/) { print "has nondigits\n" }
2412 if (/^\d+$/) { print "is a whole number\n" }
2413 if (/^-?\d+$/) { print "is an integer\n" }
2414 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2415 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2416 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2417 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2418 { print "a C float\n" }
68dc0745 2419
f0d19b68
RGS
2420There are also some commonly used modules for the task.
2421L<Scalar::Util> (distributed with 5.8) provides access to perl's
ac9dac7f
RGS
2422internal function C<looks_like_number> for determining whether a
2423variable looks like a number. L<Data::Types> exports functions that
2424validate data types using both the above and other regular
2425expressions. Thirdly, there is C<Regexp::Common> which has regular
2426expressions to match various types of numbers. Those three modules are
2427available from the CPAN.
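
As a quick sketch of the C<Scalar::Util> route, you could run a few
candidate strings (chosen arbitrarily here) through
C<looks_like_number>:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( '42', '-3.14', '1e5', '.5', 'abc', '12abc', '' ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate)
            ? 'looks like a number'
            : 'does not look like a number';
}
```

Unlike the simpler regular expressions above, this accepts exponent
notation such as C<1e5>.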
f0d19b68
RGS
2428
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;       # trim leading whitespace
        $str =~ s/\s+$//;       # trim trailing whitespace
        $! = 0;                 # clear errno so a range error is visible
        # strtod returns the value and the count of unparsed characters
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.

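The same wrapper idea carries over to C<strtol>. Here is a minimal
sketch; the C<getint> name and its default base of 10 are assumptions
made for this example, not part of C<POSIX>.

    use POSIX qw(strtol);

    # Hypothetical helper: return the integer value of $str in the
    # given base, or undef if any part of the string was not parsed.
    sub getint {
        my ($str, $base) = @_;
        $base = 10 unless defined $base;
        $! = 0;
        my ($num, $unparsed) = strtol($str, $base);
        return undef if $str eq '' || $unparsed != 0 || $!;
        return $num;
    }

    print getint("1234"), "\n";         # 1234
    print getint("ff", 16), "\n";       # 255
    print defined getint("12abc") ? "ok" : "not an integer", "\n";
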
=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generally, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting with Perl 5.8, C<Storable> is
part of the standard distribution. Here's one example using C<Storable>'s
C<store> and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

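For the DBM route, a minimal sketch might look like the following. The
C<bump_counter> name and the F<visits> file name are made up for
illustration. Keep in mind that a DBM file holds only flat strings, so
nested data structures still call for something like C<Storable>.

    use Fcntl;                      # O_RDWR, O_CREAT
    use AnyDBM_File;

    # Hypothetical helper: increment and return a counter kept on disk.
    sub bump_counter {
        my ($file, $key) = @_;
        tie(my %count, 'AnyDBM_File', $file, O_RDWR|O_CREAT, 0666)
            or die "Can't open $file: $!";
        my $n = ++$count{$key};     # the new value is written to disk
        untie %count;
        return $n;
    }

    # The DBM library may append its own suffix to the file name.
    print "run number ", bump_counter("visits", $0), "\n";
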
=head2 How do I print out or copy a recursive data structure?

The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd like;
it will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };

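To see both modules at work on a structure that really is recursive,
consider this small sketch. The data is invented for the example; the
point is that the copy is independent of the original and keeps its own
internal cycle.

    use Data::Dumper;
    use Storable qw(dclone);

    my $r1 = { name => "root", kids => [ 1, 2, 3 ] };
    $r1->{self} = $r1;              # make the structure refer to itself

    my $r2 = dclone($r1);
    push @{ $r2->{kids} }, 4;       # changes the copy only

    print scalar @{ $r1->{kids} }, " kids in the original\n";
    print scalar @{ $r2->{kids} }, " kids in the copy\n";
    print Dumper($r1);              # the cycle is shown as a reference
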
=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.

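The two approaches can be contrasted in a short sketch. All package and
method names here (C<My::Base>, C<My::Widget>, C<describe>,
C<describe_any>) are hypothetical.

    # A common base class: only classes that opt in get the method.
    package My::Base;
    sub new      { my $class = shift; return bless {@_}, $class }
    sub describe { my $self = shift; return "I am a " . ref($self) }

    package My::Widget;
    our @ISA = ('My::Base');

    package main;
    print My::Widget->new->describe, "\n";      # I am a My::Widget

    # The UNIVERSAL route: every object, including those from modules
    # you did not write, now responds to describe_any.
    sub UNIVERSAL::describe_any { return "I am a " . ref($_[0]) }
    print My::Widget->new->describe_any, "\n";  # I am a My::Widget
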
=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.

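If you are curious what such a check involves, the checksum on card
numbers is the Luhn algorithm. The following is a hand-rolled sketch of
it, not the module's code, and C<luhn_ok> is a made-up name. A passing
checksum says only that the number is well formed, not that the card
exists.

    # Return true if the digits in $number pass the Luhn checksum.
    sub luhn_ok {
        my $number = shift;
        $number =~ tr/0-9//cd;          # drop spaces, dashes, and so on
        return 0 unless length $number;
        my ($sum, $double) = (0, 0);
        # Walk the digits right to left, doubling every second one.
        for my $digit (reverse split //, $number) {
            my $value = $double ? $digit * 2 : $digit;
            $value -= 9 if $value > 9;  # same as adding the two digits
            $sum += $value;
            $double = !$double;
        }
        return $sum % 10 == 0;
    }

    print luhn_ok("4111 1111 1111 1111") ? "ok" : "bad", "\n";
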
=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead--it makes number-crunching easy.

See L<http://search.cpan.org/dist/PGPLOT> for the code.

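For simple cases, one approach that needs no extra module is the
built-in C<pack>. This sketch is not the PGPLOT code; it assumes your
XS function accepts a single string argument and reads its buffer as a
C array.

    my @values = (1.5, -2.25, 3.0);

    # "d*" packs each element as a native double; use "f*" for floats.
    my $packed = pack("d*", @values);

    # On the C side, XS code could treat the string's buffer, e.g.
    # (double *)SvPV_nolen(sv), as length/sizeof(double) elements.
    printf "%d doubles in %d bytes\n", scalar(@values), length $packed;

    # unpack reverses the process for data coming back from C.
    my @back = unpack("d*", $packed);
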
=head1 REVISION

Revision: $Revision$

Date: $Date$

See L<perlfaq> for source control details and availability.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.