This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

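Rounding with C<sprintf> also gives a practical way to compare two
floating-point results. The following is a minimal sketch, assuming
IEEE double-precision numbers; C<nearly_equal> is a hypothetical helper
name, not a Perl built-in.

```perl
use strict;
use warnings;

# 0.1 + 0.2 is usually not stored as exactly 0.3 in binary floating point.
my $sum = 0.1 + 0.2;
printf "%.17f\n", $sum;

# Hypothetical helper: two numbers count as equal when they agree
# after rounding to the given number of decimal places.
sub nearly_equal {
    my ($x, $y, $places) = @_;
    return sprintf("%.${places}f", $x) eq sprintf("%.${places}f", $y);
}

print "equal to 10 places\n" if nearly_equal($sum, 0.3, 10);
```

How many places to compare is your decision and depends on the
precision your data really has.
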
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think in the above as 'three' is really more like
2.9999999999999995559.

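If what you actually want is the nearest integer rather than
truncation, C<sprintf> with a C<%.0f> format is one way to get it. This
is only a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $value     = 0.6/0.2 - 2;              # mathematically 1
my $truncated = int($value);              # drops the fraction, often 0
my $rounded   = sprintf("%.0f", $value);  # rounds to the nearest integer

printf "stored value: %.17g\n", $value;
print  "int: $truncated, sprintf: $rounded\n";
```
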
=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $file);   # right, has leading zero
    chmod( oct(644), $file );  # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0],      $file);   # wrong, even if "0644"

    chmod( oct($ARGV[0]), $file );  # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;

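When the digits come from outside the program, it can help to check
them before handing them to C<oct>, which would otherwise also accept
hexadecimal and binary prefixes. A minimal sketch, where C<parse_mode>
is a hypothetical helper and the three-or-four-digit rule is an
assumption about what a permission string looks like:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only 3 or 4 octal digits, then convert.
sub parse_mode {
    my ($text) = @_;
    return unless defined $text && $text =~ /\A[0-7]{3,4}\z/;
    return oct($text);
}

my $mode = parse_mode('0644');
printf "0%o %d\n", $mode, $mode;   # prints "0644 420"
```
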
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil   = ceil(3.5);   # 4
    $floor  = floor(3.5);  # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

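One way to write your own rounding, since integers are the numbers that
behave, is to keep amounts in whole cents and round with integer
arithmetic only. The following sketch assumes non-negative amounts and a
"round half up" rule; C<div_round_half_up> is a hypothetical name, and
your application may well require a different rule.

```perl
use strict;
use warnings;

# Hypothetical helper: divide a non-negative integer amount (in cents)
# by a positive integer divisor, rounding halves up.
sub div_round_half_up {
    my ($cents, $divisor) = @_;
    use integer;    # integer division within this sub
    return (2 * $cents + $divisor) / (2 * $divisor);
}

print div_round_half_up(1005, 10), "\n";   # 100.5 becomes 101
print div_round_half_up(1004, 10), "\n";   # 100.4 becomes 100
```
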
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 and later let you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

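For the less inclined reader, the remaining transformations can be
sketched by chaining a parsing built-in (C<hex> or C<oct>) with a
C<sprintf> format. This assumes the values fit in a native integer; the
variable names are illustrative.

```perl
use strict;
use warnings;

my $oct = sprintf("%o", hex("DEADBEEF"));     # hex -> oct
my $hex = sprintf("%X", oct("0b10110110"));   # bin -> hex
my $bin = sprintf("%b", oct("0755"));         # oct -> bin

print "$oct $hex $bin\n";   # 33653337357 B6 111101101
```
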
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators (such as C<&> and C<|>) depends
on whether they're used on numbers or strings. The operators treat a
string as a series of bits and work with that (the string C<"3"> is the
bit pattern C<00110011>). The operators work with the binary form of a
number (the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
        # ... runs only when at least one bit is set in the result
    }

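The difference between the two forms is easy to see side by side. This
sketch assumes the classic operator behavior described above (newer
perls with the C<bitwise> feature enabled treat C<&> as always
numeric); adding 0 is one way to force a string into a number first.

```perl
use strict;
use warnings;

my $numeric = 11 & 3;        # numbers: 1011 & 0011 is 3
my $stringy = "11" & "3";    # strings: "1" & "3", bytewise, is "1"

# Values read from input are strings; force them numeric first.
my ($left, $right) = ("11", "3");
my $forced = (0 + $left) & (0 + $right);

print "$numeric $stringy $forced\n";   # 3 1 3
```
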
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

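If installing a module is not an option and your matrices are small,
the textbook algorithm is short enough to write by hand. This is only a
sketch over arrays of arrays; C<mat_mul> is a hypothetical name and is
not the interface of any of the modules above.

```perl
use strict;
use warnings;

# Hypothetical helper: multiply two matrices given as references
# to arrays of rows. Dies if the inner dimensions do not agree.
sub mat_mul {
    my ($m1, $m2) = @_;
    my $inner = @{ $m1->[0] };
    die "dimension mismatch\n" unless $inner == @$m2;
    my $cols = @{ $m2->[0] };
    my @product;
    for my $i (0 .. $#$m1) {
        for my $j (0 .. $cols - 1) {
            my $sum = 0;
            $sum += $m1->[$i][$_] * $m2->[$_][$j] for 0 .. $inner - 1;
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $p = mat_mul( [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] );
print "@$_\n" for @$p;   # prints "19 22" then "43 50"
```
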
=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_004) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the C<Roman> module from CPAN
(http://www.cpan.org/modules/by-module/Roman).

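If you cannot install a module, the conversion is simple enough to
sketch by hand. The code below is an illustration only: C<to_roman> is
a hypothetical name, not the CPAN module's interface, and the 1 to 3999
limit is an assumption about which numbers you need.

```perl
use strict;
use warnings;

# Value/glyph pairs, largest first, including the subtractive forms.
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'], [1, 'I'],
);

# Hypothetical helper: returns nothing for input outside 1..3999.
sub to_roman {
    my ($n) = @_;
    return unless defined $n && $n =~ /\A[0-9]+\z/ && $n >= 1 && $n <= 3999;
    my $out = '';
    for my $pair (@ROMAN) {
        my ($value, $glyph) = @$pair;
        while ($n >= $value) {
            $out .= $glyph;
            $n   -= $value;
        }
    }
    return $out;
}

print to_roman(1987), "\n";   # MCMLXXXVII
```
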
=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

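The predictability is easy to demonstrate: seeding with the same value
replays the same sequence. This is a small sketch with an arbitrary
seed of 42; it is useful for repeatable tests, and it is also exactly
why reseeding in the middle of a program does not add randomness.

```perl
use strict;
use warnings;

srand(42);
my @first  = map { int rand 100 } 1 .. 5;

srand(42);    # same seed again
my @second = map { int rand 100 } 1 .. 5;

print "@first\n@second\n";   # the two lines are identical
```
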
=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min)  if  $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year, counting from 0
for January 1. Without an argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

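One detail that trips people up is that the two core approaches count
differently: element 7 of C<localtime> starts at 0, while C<strftime>'s
C<%j> starts at 1. A short sketch for 18 December 1987, using noon so
that a summer time change cannot shift the date:

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

my @when   = localtime( mktime( 0, 0, 12, 18, 11, 87 ) );
my $yday   = $when[7];                 # 351, counting from 0
my $julian = strftime "%j", @when;     # "352", counting from 1

print "$yday $julian\n";
```
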
=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century    {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

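The arithmetic in those functions is easier to follow when written
against a plain four-digit year, since C<localtime> returns the year
minus 1900. The helpers below are hypothetical restatements of the same
formulas, shown to make the boundary cases visible: the year 2000
belongs to the 20th century and the 2nd millennium.

```perl
use strict;
use warnings;

# Same formulas as above, with 1900 already added to the year.
sub century_of    { my ($year) = @_; return int( ($year + 99) / 100 ) }
sub millennium_of { my ($year) = @_; return 1 + int( ($year - 1) / 1000 ) }

printf "%d: century %d, millennium %d\n",
    $_, century_of($_), millennium_of($_) for 1999, 2000, 2001;
```
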
=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as numbers and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

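For the simple case, the "store as numbers and subtract" idea can be
sketched with the core C<Time::Local> module. Using C<timegm> (UTC) is
a choice made here so that every day is exactly 86,400 seconds long;
the dates are arbitrary examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Arguments: sec, min, hour, day of month, month (0-11), year.
my $start = timegm( 0, 0, 0, 1, 1, 2004 );   # 1 February 2004
my $end   = timegm( 0, 0, 0, 1, 2, 2004 );   # 1 March 2004

my $days = ( $end - $start ) / 86_400;
print "$days days\n";   # 29 days, because 2004 is a leap year
```
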
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

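As a sketch of the "split it up" approach, assume timestamps of the
form C<YYYY-MM-DD hh:mm:ss> expressed in UTC. That format, the helper
name C<epoch_from_stamp>, and the use of C<timegm> instead of
C<timelocal> are all assumptions made for this illustration; use
C<timelocal> if your strings are in local time.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: returns nothing if the string does not match.
sub epoch_from_stamp {
    my ($stamp) = @_;
    my ($year, $mon, $day, $hour, $min, $sec) =
        $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
        or return;
    return timegm( $sec, $min, $hour, $day, $mon - 1, $year );
}

print epoch_from_stamp("2001-11-13 01:00:00"), "\n";   # 1005613200
```
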
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day:

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day):

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you measure the day starting at noon, and subtract 24
hours. Even if the length of the calendar day is 23 or 25 hours,
you'll still end up on the previous calendar day, although not at
noon. Since you don't care about the time, the one hour difference
doesn't matter and you end up with the previous date.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

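The "year minus 1900" rule is the whole story in code. This sketch
reuses the timestamp from the example above and shows the right way to
get a four-digit year next to the classic mistake of gluing a century
prefix onto the returned value.

```perl
use strict;
use warnings;

my @parts = gmtime(1005613200);     # list context
my $right = $parts[5] + 1900;       # 2001
my $wrong = "20" . $parts[5];       # "20101", the pencil's user at work
my $stamp = gmtime(1005613200);     # scalar context

print "$right $wrong\n$stamp\n";
```
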
=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

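For simple cases you may not need a module from CPAN at all. The sketch
below shows two common checks: an anchored pattern for whole numbers,
and C<looks_like_number> from the core C<Scalar::Util> module for
anything Perl itself would treat as a number. C<is_integer> is a
hypothetical helper name.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: optional sign followed by ASCII digits only.
# \A and \z anchor the whole string, so a trailing newline fails.
sub is_integer {
    my ($input) = @_;
    return 0 unless defined $input;
    return $input =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

for my $candidate ("42", "-7", "4.2", "1e5", "abc") {
    printf "%-4s integer: %d, number: %d\n", $candidate,
        is_integer($candidate), looks_like_number($candidate) ? 1 : 0;
}
```
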
=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

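A short sketch of that substitution at work, including what happens to
a C<\n> sequence: the backslash is dropped and the C<n> stays an
ordinary letter. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $escaped = 'foo\ bar\$baz';            # single quotes keep the backslashes
(my $plain = $escaped) =~ s/\\(.)/$1/g;   # foo bar$baz

my $newline = 'one\ntwo';
(my $still = $newline) =~ s/\\(.)/$1/g;   # onentwo, not two lines

print "$plain\n$still\n";
```
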
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

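The two techniques differ when a character repeats more than twice: the
pair substitution collapses each pair once, while squashing removes the
whole run. A sketch of the difference, using a C<+> quantifier on the
back-reference as a regex equivalent of the squashing behavior:

```perl
use strict;
use warnings;

my $text = 'aaab';

(my $pairs = $text) =~ s/(.)\1/$1/g;    # "aab": only the first pair goes
(my $runs  = $text) =~ s/(.)\1+/$1/g;   # "ab": the whole run is collapsed

print "$pairs $runs\n";
```
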
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting with perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

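For plain bracket nesting, C<Text::Balanced> is usually the least
painful route. A minimal sketch with its C<extract_bracketed> function,
which in list context returns the balanced text and the remainder; the
sample string is made up.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '(a (b) c) rest';

# The second argument names the delimiters to balance.
my ($extracted, $remainder) = extract_bracketed( $text, '()' );

print "extracted: $extracted\n";    # (a (b) c)
print "remainder:$remainder\n";     # " rest", leading space included
```
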
=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

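The context matters more than it looks. In list context C<reverse>
reverses the order of its arguments, so a single string comes back
unchanged. A short sketch of both behaviors:

```perl
use strict;
use warnings;

my $string = 'hello';

my $reversed = reverse $string;          # scalar context: "olleh"
my @list     = reverse $string;          # list context: ("hello")
my $forced   = scalar reverse $string;   # explicit scalar context

print "$reversed $list[0] $forced\n";    # olleh hello olleh
```

This is why C<print reverse $string> appears to do nothing: C<print>
supplies list context, so wrap the call in C<scalar> there.
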
=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

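As a quick check that the two approaches agree, here is a sketch that
expands the same one-line string both ways, assuming the default tab
stop of 8 columns.

```perl
use strict;
use warnings;
use Text::Tabs;

my ($expanded) = expand("a\tb");    # "a", seven spaces, "b"

my $manual = "a\tb";
1 while $manual =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

print length($expanded), "\n";      # 9
print "same result\n" if $manual eq $expanded;
```
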
=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with substr().
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use substr() as an lvalue.

    substr( $string, 13, 4 ) = "Perl 5.8.0";

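To make the effect concrete, this sketch runs the four-argument form on
the sample string and also shows a negative offset, which counts from
the end of the string.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# The four-argument form returns the text it replaced.
my $old  = substr( $string, 13, 4, "Perl 5.8.0" );
my $tail = substr( $string, -6 );    # last six characters

print "$old\n";       # Perl
print "$string\n";    # Just another Perl 5.8.0 Hacker
print "$tail\n";      # Hacker
```
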
=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

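
The counting idea can be wrapped in a small subroutine. This is only a
sketch: C<replace_nth> is a hypothetical helper name, not a built-in, and it
assumes the replacement is a plain string rather than something to
interpolate.

```perl
use strict;
use warnings;

# Replace only the $n-th match of $pattern in $text (hypothetical helper).
sub replace_nth {
    my ( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;

    # The outer parentheses are always group 1, so $1 is the whole match.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $result = replace_nth( "One fish two fish red fish blue fish",
    qr/fish/, 3, "herring" );
print "$result\n";    # One fish two fish red herring blue fish
```

Every match other than the chosen one is put back unchanged, so the
subroutine still scans the whole string.
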
=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    $count = () = $string =~ /-\d+/g;

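
The same list-assignment trick works for a literal substring if you quote
it with C<\Q...\E>. A global match normally skips past each match, so
overlapping occurrences are not counted; wrapping the pattern in a lookahead
counts them too. The sample string and variable names here are
illustrations only.

```perl
use strict;
use warnings;

my $string = "banana bandana";
my $needle = "ana";

# Non-overlapping occurrences: each match consumes its text.
my $count = () = $string =~ /\Q$needle\E/g;

# Overlapping occurrences: the lookahead matches without consuming.
my $overlapping = () = $string =~ /(?=\Q$needle\E)/g;

print "$count non-overlapping, $overlapping overlapping\n";
# 2 non-overlapping, 3 overlapping
```
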
=head2 How do I capitalize all the words on one line?
X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>

(contributed by brian d foy)

Damian Conway's L<Text::Autoformat> handles all of the thinking
for you.

    use Text::Autoformat;
    my $x = "Dr. Strangelove or: How I Learned to Stop ".
      "Worrying and Love the Bomb";

    print $x, "\n";
    for my $style (qw( sentence title highlight )) {
        print autoformat($x, { case => $style }), "\n";
    }

How do you want to capitalize those words?

    FRED AND BARNEY'S LODGE        # all uppercase
    Fred And Barney's Lodge        # title case
    Fred and Barney's Lodge        # highlight case

It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you're right. Perl words are groups of C<\w+>, but that's not what
you want to capitalize. How is Perl supposed to know not to capitalize
that C<s> after the apostrophe? You could try a regular expression:

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;

    $string =~ s/([\w']+)/\u\L$1/g;

Now, what if you don't want to capitalize that "and"? Just use
L<Text::Autoformat> and get on with the next problem. :)

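
To see what the second substitution does, here is a small sketch applied to
the all-uppercase sample. The C<\L> lowercases each matched word and the
C<\u> then uppercases its first character; including the apostrophe in the
character class keeps C<'s> attached to its word.

```perl
use strict;
use warnings;

my $string = "FRED AND BARNEY'S LODGE";

# Copy the string, then title-case every run of word characters
# and apostrophes in the copy.
( my $title = $string ) =~ s/([\w']+)/\u\L$1/g;

print "$title\n";    # Fred And Barney's Lodge
```
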
=head2 How can I split a [character] delimited string except when inside [character]?

Several modules can handle this sort of parsing--C<Text::Balanced>,
C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.

Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in C<$text>):

    @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
        | ([^,]+),?
        | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).

Alternatively, the C<Text::ParseWords> module (part of the standard
Perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

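
As a rough illustration, here is C<quotewords> applied to the sample data
line above. One thing to keep in mind is that C<quotewords> treats single
quotes as quoting characters too, so data containing apostrophes may need
one of the CSV modules instead.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

# Second argument 0: drop the quotes and backslashes from the fields.
my @new = quotewords( ",", 0, $text );

print scalar(@new), " fields\n";
print "$_\n" for @new[ 2, 3, 10 ];
```

The commas inside the quoted fields survive, so C<$new[2]> holds
C<Cimetrix, Inc> as a single field.
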
=head2 How do I strip blank space from the beginning/end of a string?

(contributed by brian d foy)

A substitution can do this for you. For a single line, you want to
replace all the leading or trailing whitespace with nothing. You
can do that with a pair of substitutions.

    s/^\s+//;
    s/\s+$//;

You can also write that as a single substitution, although it turns
out the combined statement is slower than the separate ones. That
might not matter to you, though.

    s/^\s+|\s+$//g;

In this regular expression, the alternation matches either at the
beginning or the end of the string, since the alternation has a lower
precedence than the anchors: the pattern reads as C<^\s+> or C<\s+$>.
With the C</g> flag, the substitution makes all possible matches, so it
gets both. Remember, the trailing newline matches the C<\s+>, and the
C<$> anchor can match at the physical end of the string, so the newline
disappears too. Just add the newline to the output, which has the added
benefit of preserving "blank" (consisting entirely of whitespace) lines
which the C<^\s+> would remove all by itself.

    while( <> )
        {
        s/^\s+|\s+$//g;
        print "$_\n";
        }

For a multi-line string, you can apply the regular expression
to each logical line in the string by adding the C</m> flag (for
"multi-line"). With the C</m> flag, the C<$> matches I<before> an
embedded newline, so it doesn't remove it. It still removes the
newline at the end of the string.

    $string =~ s/^\s+|\s+$//gm;

Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.

    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;

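
If you do this often, the pair of substitutions fits naturally in a small
subroutine. C<trim> here is a hypothetical helper name; it works on a copy,
so the caller's string is left alone.

```perl
use strict;
use warnings;

# Return a copy of the argument without leading or trailing whitespace.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $raw   = "  \t hello world \n";
my $clean = trim($raw);

print "[$clean]\n";    # [hello world]
```
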
=head2 How do I pad a string with blanks or pad a number with zeroes?

In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
character string constant instead of the C<$pad_char> variable if you
know what it is in advance. And in the same way you can use an integer in
place of C<$pad_len> if you know the pad length in advance.

The simplest method uses the C<sprintf> function. It can pad on the left
or right with blanks and on the left with zeroes and it will not
truncate the result. The C<pack> function can only pad strings on the
right with blanks and it will truncate the result to a maximum length of
C<$pad_len>.

    # Left padding a string with blanks (no truncation):
    $padded = sprintf("%${pad_len}s", $text);
    $padded = sprintf("%*s", $pad_len, $text);  # same thing

    # Right padding a string with blanks (no truncation):
    $padded = sprintf("%-${pad_len}s", $text);
    $padded = sprintf("%-*s", $pad_len, $text); # same thing

    # Left padding a number with 0 (no truncation):
    $padded = sprintf("%0${pad_len}d", $num);
    $padded = sprintf("%0*d", $pad_len, $num); # same thing

    # Right padding a string with blanks using pack (will truncate):
    $padded = pack("A$pad_len",$text);

If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
C<x> operator and combine that with C<$text>. These methods do
not truncate C<$text>.

Left and right padding with any character, creating a new string:

    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying C<$text> directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );

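
Here are the same techniques with concrete values filled in, as a quick
sketch. The sample values (C<"abc">, 42, a pad length of 6, and C<*> as the
pad character) are arbitrary.

```perl
use strict;
use warnings;

my ( $text, $num, $pad_len, $pad_char ) = ( "abc", 42, 6, "*" );

# The x operator binds more tightly than the . operator.
my $left  = $pad_char x ( $pad_len - length($text) ) . $text;
my $right = $text . $pad_char x ( $pad_len - length($text) );

my $blanks = sprintf( "%*s",  $pad_len, $text );
my $zeroes = sprintf( "%0*d", $pad_len, $num );

# pack pads short strings but also cuts long ones down to $pad_len.
my $cut = pack( "A$pad_len", "abcdefghij" );

print "[$left] [$right] [$blanks] [$zeroes] [$cut]\n";
# [***abc] [abc***] [   abc] [000042] [abcdef]
```
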
=head2 How do I extract selected columns from a string?

(contributed by brian d foy)

If you know the columns that contain the data, you can
use C<substr> to extract a single column.

    my $column = substr( $line, $start_column, $length );

You can use C<split> if the columns are separated by whitespace or
some other delimiter, as long as whitespace or the delimiter cannot
appear as part of the data.

    my $line    = ' fred barney   betty   ';
    my @columns = split /\s+/, $line;
        # ( '', 'fred', 'barney', 'betty' );

    my $line    = 'fred||barney||betty';
    my @columns = split /\|/, $line;
        # ( 'fred', '', 'barney', '', 'betty' );

If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
C<Text::CSV_PP>.

If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.

    my @fields = unpack( "A8 A8 A8 A16 A4", $line );

Note that spaces in the format argument to C<unpack> do not denote literal
spaces. If you have space separated data, you may want C<split> instead.

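
As a small sketch of the fixed-column case, here is a line made of three
8-character columns. The template comes first and the data second; the C<A>
format also strips the trailing spaces from each field. The sample line is
made up for the example.

```perl
use strict;
use warnings;

# Three columns, each exactly 8 characters wide.
my $line = "fred    barney  betty   ";

my @fields = unpack( "A8 A8 A8", $line );

# substr pulls out one column by position: offset 8, width 8.
my $second = substr( $line, 8, 8 );

print join( "|", @fields ), "\n";    # fred|barney|betty
print "[$second]\n";                 # [barney  ]
```

Unlike C<unpack> with C<A>, C<substr> hands back the column with its padding
intact.
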
=head2 How do I find the soundex value of a string?

(contributed by brian d foy)

You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.

=head2 How can I expand variables in text strings?

(contributed by brian d foy)

If you can avoid it, don't, or if you can use a templating system,
such as C<Text::Template> or C<Template> Toolkit, do that instead. You
might even be able to get the job done with C<sprintf> or C<printf>:

    my $string = sprintf 'Say hello to %s and %s', $foo, $bar;

However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:

    my $foo = 'Fred';
    my $bar = 'Barney';
    $string = 'Say hello to $foo and $bar';

One way I can do this involves the substitution operator and a double
C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
it with its value. C<$foo>, then, turns into 'Fred', and that's finally
what's left in the string:

    $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'

The C</e> will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
C</e> flag (twice even!), I have all of the same security problems I
have with C<eval> in its string form. If there's something odd in
C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
I could get myself in trouble.

To get around the security problem, I could also pull the values from
a hash instead of evaluating variable names. Using a single C</e>, I
can check the hash to ensure the value exists, and if it doesn't, I
can replace the missing value with a marker, in this case C<???> to
signal that I missed something:

    my $string = 'This has $foo and $bar';

    my %Replacements = (
        foo  => 'Fred',
        );

    # $string =~ s/\$(\w+)/$Replacements{$1}/g;
    $string =~ s/\$(\w+)/
        exists $Replacements{$1} ? $Replacements{$1} : '???'
        /eg;

    print $string;

=head2 What's wrong with always quoting "$vars"?

The problem is that those double-quotes force
stringification--coercing numbers and references into strings--even
when you don't want them to be strings. Think of it this way:
double-quote expansion is used to produce new strings. If you already
have a string, why do you need more?

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.

Stringification also destroys arrays.

    @lines = `command`;
    print "@lines";     # WRONG - extra blanks
    print @lines;       # right

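
To see what goes wrong with a quoted reference, here is a short sketch. The
quoted copy is just a string that looks something like C<ARRAY(0x...)>; it
no longer leads back to the array, and under C<use strict> trying to
dereference it is a run-time error.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # stringified: no longer a reference

print ref($aref) ? "reference\n" : "plain string\n";    # reference
print ref($copy) ? "reference\n" : "plain string\n";    # plain string

# Dereferencing the string dies under strict refs, so trap it.
my $ok = eval { my @elements = @$copy; 1 };
print "could not dereference the copy\n" unless $ok;
```
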
=head2 Why don't my E<lt>E<lt>HERE documents work?

Check for these three things:

=over 4

=item There must be no space after the E<lt>E<lt> part.

=item There (probably) should be a semicolon at the end.

=item You can't (easily) have any space in front of the tag.

=back

If you want to indent the text in the here document, you
can do this:

    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET

But the HERE_TARGET must still be flush against the margin.
If you want that indented also, you'll have to quote
in the indentation.

    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts.  --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s+--/\n--/;

A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
It looks to see whether each line begins with a common substring, and
if so, strips that substring off. Otherwise, it takes the amount of leading
whitespace found on the first line and removes that much off each
subsequent line.

    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common whitespace and common leading string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }

This works with leading special strings, dynamically determined:

    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() );
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:

    $poem = fix<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
            --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON

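
If you can rely on Perl 5.26 or later (an assumption; that is much newer
than the Perl versions the techniques above were written for), the indented
here document syntax C<E<lt>E<lt>~> does the stripping for you. It removes
the smallest common leading whitespace from the body and lets the terminator
be indented too. A minimal sketch, with C<indented_text> as a made-up
subroutine name:

```perl
use 5.026;
use strict;
use warnings;

sub indented_text {
    # <<~ strips the common indentation; extra indentation is kept.
    my $text = <<~EOT;
        your text
          goes here
        EOT
    return $text;
}

print indented_text();
```

The first line comes out flush left and the second keeps its two extra
spaces, much like the fixed-whitespace case that C<fix> handles.
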
=head1 Data: Arrays

=head2 What is the difference between a list and an array?

An array has a changeable length. A list does not. An array is
something you can push or pop, while a list is a set of values. Some
people make the distinction that a list is a value while an array is a
variable. Subroutines are passed and return lists, you put things into
list context, you initialize arrays with lists, and you C<foreach()>
across a list. C<@> variables are arrays, anonymous arrays are
arrays, arrays in scalar context behave like the number of elements in
them, subroutines access their arguments through the array C<@_>, and
C<push>/C<pop>/C<shift> only work on arrays.

As a side note, there's no such thing as a list in scalar context.
When you say

    $scalar = (2, 5, 7, 9);

you're using the comma operator in scalar context, so it uses the scalar
comma operator. There never was a list there at all! This causes the
last value to be returned: 9.

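
The difference is easy to check for yourself. This sketch puts the three
cases side by side; the variable names are only for illustration, and the
C<void> warnings are switched off because the comma operator throws away
the first three constants on purpose.

```perl
use strict;
use warnings;
no warnings 'void';    # the comma operator discards 2, 5, and 7

my @array = ( 2, 5, 7, 9 );

my $count   = @array;            # array in scalar context
my $last    = ( 2, 5, 7, 9 );    # comma operator in scalar context
my ($first) = ( 2, 5, 7, 9 );    # list assignment to a one-element list

print "$count $last $first\n";   # 4 9 2
```
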
=head2 What is the difference between $array[1] and @array[1]?

The former is a scalar value; the latter an array slice, making
it a list with one (scalar) value. You should use $ when you want a
scalar value (most of the time) and @ when you want a list with one
scalar value in it (very, very rarely; nearly never, in fact).

Sometimes it doesn't make a difference, but sometimes it does.
For example, compare:

    $good[0] = `some program that outputs several lines`;

with

    @bad[0]  = `same program that outputs several lines`;

The C<use warnings> pragma and the B<-w> flag will warn you about these
matters.

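
You can see the difference without running an external program. In this
sketch the made-up subroutine C<several_lines> stands in for the backticks:
like them, it returns all of its output as one string in scalar context and
as a list of lines in list context.

```perl
use strict;
use warnings;

# Stand-in for backticks: one string in scalar context, lines in list context.
sub several_lines {
    my @lines = ( "one\n", "two\n", "three\n" );
    return wantarray ? @lines : join '', @lines;
}

my ( @good, @bad );

$good[0] = several_lines();    # scalar context: all three lines

{
    no warnings 'syntax';      # silence "better written as $bad[0]"
    @bad[0] = several_lines(); # list context: only the first line is kept
}

print length( $good[0] ), " characters in \$good[0]\n";
print length( $bad[0] ),  " characters in \$bad[0]\n";
```

The slice assignment quietly throws away every line after the first.
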
=head2 How can I remove duplicate elements from a list or array?

(contributed by brian d foy)

Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".

If you don't care about the order of the elements, you could just
create the hash then extract the keys. It's not important how you
create that hash: just that you use C<keys> to get the unique
elements.

    my %hash   = map { $_, 1 } @array;
    # or a hash slice: @hash{ @array } = ();
    # or a foreach: $hash{$_} = 1 foreach ( @array );

    my @unique = keys %hash;

If you want to use a module, try the C<uniq> function from
C<List::MoreUtils>. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.

    use List::MoreUtils qw(uniq);

    my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
    my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its value, which is C<undef>, so
the post-increment bumps the value for that key and the loop continues
to the C<push>. The next time the loop sees that same element, its key
exists in the hash I<and> the value for that key is true (since it's
not 0 or C<undef>), so the C<next> skips that iteration and the loop
goes to the next element.

    my @unique = ();
    my %seen   = ();

    foreach my $elem ( @array )
        {
        next if $seen{ $elem }++;
        push @unique, $elem;
        }

You can write this more briefly using a grep, which does the
same thing.

    my %seen = ();
    my @unique = grep { ! $seen{ $_ }++ } @array;

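
The C<grep> version drops neatly into a subroutine so that C<%seen> starts
out empty on every call. C<uniq_ordered> is a hypothetical name chosen for
this sketch.

```perl
use strict;
use warnings;

# Return the arguments with duplicates removed, keeping first-seen order.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @array  = qw( a b a c b d );
my @unique = uniq_ordered(@array);

print "@unique\n";    # a b c d
```
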
=head2 How can I tell whether a certain element is contained in a list or array?

(portions of this answer contributed by Anno Siegel and brian d foy)

Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
designed to answer this question quickly and efficiently. Arrays aren't.

That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash:

    use 5.010;

    if( $item ~~ @array )
        {
        say "The array contains $item"
        }

    if( $item ~~ %hash )
        {
        say "The hash contains $item"
        }

With earlier versions of Perl, you have to do a bit more work. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values:

    @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    %is_blue = ();
    for (@blues) { $is_blue{$_} = 1 }

Now you can check whether C<$is_blue{$some_color}>. It might have
been a good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:

    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you check whether C<$is_tiny_prime[$some_number]>.

If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:

    @articles = ( 1..10, 150..2000, 2017 );
    undef $read;
    for (@articles) { vec($read,$_,1) = 1 }

Now check whether C<vec($read,$n,1)> is true for some C<$n>.

These methods guarantee fast individual tests but require a re-organization
of the original list or array. They only pay off if you have to test
multiple values against the same array.

If you are testing only once, the standard module C<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:

    sub first (&@) {
        my $code = shift;
        foreach (@_) {
            return $_ if &{$code}();
        }
        undef;
    }

If speed is of little concern, the common idiom uses grep in scalar context
(which returns the number of items that passed its condition) to traverse the
entire list. This does have the benefit of telling you how many matches it
found, though.

    my $is_there = grep $_ eq $whatever, @array;

If you want to actually extract the matching elements, simply use grep in
list context.

    my @matches = grep $_ eq $whatever, @array;

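
One thing to watch with C<first>: it returns the element itself, and the
element you are looking for might be false (0 or the empty string). Testing
the result with C<defined> avoids that trap, as this sketch shows. It
assumes the array holds no C<undef> elements.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 3, 0, 7 );

# first returns the matching element, which here is the false value 0.
my $zero    = first { $_ == 0 } @array;
my $missing = first { $_ == 42 } @array;

print "found a zero\n" if defined $zero;
print "no 42 here\n"   unless defined $missing;
```
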
=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:

    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

Note that this is the I<symmetric difference>, that is, all elements
in either A or in B but not in both. Think of it as an xor operation.

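
If what you want is the one-sided difference instead (the elements of the
first array that are missing from the second), a lookup hash built from the
second array is enough. The array names in this sketch are made up, and the
order of the first array is preserved.

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Lookup table of everything in the second array.
my %in_array2 = map { $_ => 1 } @array2;

my @only_in_array1 = grep { !$in_array2{$_} } @array1;
my @in_both        = grep {  $in_array2{$_} } @array1;

print "only in first: @only_in_array1\n";    # only in first: a b
print "in both: @in_both\n";                 # in both: c d
```
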
=head2 How do I test whether two arrays or hashes are equal?

With Perl 5.10 and later, the smart match operator can give you the answer
with the least amount of work:

    use 5.010;

    if( @array1 ~~ @array2 )
        {
        say "The arrays are the same";
        }

    if( %hash1 ~~ %hash2 ) # doesn't check values!
        {
        say "The hash keys are the same";
        }

The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.

    $are_equal = compare_arrays(\@frogs, \@toads);

    sub compare_arrays {
        my ($first, $second) = @_;
        no warnings;  # silence spurious -w undef complaints
        return 0 unless @$first == @$second;
        for (my $i = 0; $i < @$first; $i++) {
            return 0 if $first->[$i] ne $second->[$i];
        }
        return 1;
    }

For multilevel structures, you may wish to use an approach more
like this one. It uses the CPAN module C<FreezeThaw>:

    use FreezeThaw qw(cmpStr);
    @a = @b = ( "this", "that", [ "more", "stuff" ] );

    printf "a and b contain %s arrays\n",
        cmpStr(\@a, \@b) == 0
        ? "the same"
        : "different";

This approach also works for comparing hashes. Here we'll demonstrate
two different answers:

    use FreezeThaw qw(cmpStr cmpStrHard);

    %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
    $a{EXTRA} = \%b;
    $b{EXTRA} = \%a;

    printf "a and b contain %s hashes\n",
        cmpStr(\%a, \%b) == 0 ? "the same" : "different";

    printf "a and b contain %s hashes\n",
        cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";

The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.

=head2 How do I find the first array element for which a condition is true?

To find the first array element which satisfies a condition, you can
use the C<first()> function in the C<List::Util> module, which comes
with Perl 5.8. This example finds the first element that contains
"Perl".

    use List::Util qw(first);

    my $element = first { /Perl/ } @array;

If you cannot use C<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with C<last>.

    my $found;
    foreach ( @array ) {
        if( /Perl/ ) { $found = $_; last }
    }

If you want the array index, you can iterate through the indices
and check the array element at each index until you find one
that satisfies the condition.

    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
        if( $array[$i] =~ /Perl/ ) {
            $found = $array[$i];
            $index = $i;
            last;
        }
    }

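
If C<List::Util> is available, you can also get the index by running
C<first> over the list of indices rather than over the elements. This is
just a sketch with made-up sample data; C<$index> is C<undef> when nothing
matches.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "Ruby", "Perl 5", "Perl 6", "Python" );

# Search the indices; the block looks up the element for each one.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if ( defined $index ) {
    print "Found '$array[$index]' at index $index\n";
}
```
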
=head2 How do I handle linked lists?

In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either
end, or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1)
operations on Perl's dynamic arrays. In the absence of shifts and
pops, push in general needs to reallocate on the order of every log(N)
times, and unshift will need to copy pointers each time.

If you really, really wanted, you could use structures as described in
L<perldsc> or L<perltoot> and do just what the algorithm book tells
you to do. For example, imagine a list node like this:

    $node = {
        VALUE => 42,
        LINK  => undef,
    };

You could walk the list this way:

    print "List: ";
    for ($node = $head;  $node; $node = $node->{LINK}) {
        print $node->{VALUE}, " ";
    }
    print "\n";

You could add to the list this way:

    my ($head, $tail);
    $tail = append($head, 1);       # grow a new head
    for $value ( 2 .. 10 ) {
        $tail = append($tail, $value);
    }

    sub append {
        my($list, $value) = @_;
        my $node = { VALUE => $value };
        if ($list) {
            $node->{LINK} = $list->{LINK};
            $list->{LINK} = $node;
        }
        else {
            $_[0] = $node;      # replace caller's version
        }
        return $node;
    }

But again, Perl's built-in arrays are virtually always good enough.

=head2 How do I handle circular lists?
X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
X<cycle> X<modulus>

(contributed by brian d foy)

If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:

    my @array = qw( a b c );
    my $i = 0;

    while( 1 ) {
        print $array[ $i++ % @array ], "\n";
        last if $i > 20;
    }

You can also use C<Tie::Cycle> to use a scalar that always has the
next element of the circular array:

    use Tie::Cycle;

    tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];

    print $cycle; # FFFFFF
    print $cycle; # 000000
    print $cycle; # FFFF00

The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:

    use Array::Iterator::Circular;

    my $color_iterator = Array::Iterator::Circular->new(
        qw(red green blue orange)
        );

    foreach ( 1 .. 20 ) {
        print $color_iterator->next, "\n";
    }

68dc0745 1583=head2 How do I shuffle an array randomly?
1584
1585If you either have Perl 5.8.0 or later installed, or if you have
1586Scalar-List-Utils 1.03 or later installed, you can say:
1587
ac9dac7f 1588 use List::Util 'shuffle';
1589
1590 @shuffled = shuffle(@list);
1591
f05bbc40 1592If not, you can use a Fisher-Yates shuffle.
5a964f20 1593
1594 sub fisher_yates_shuffle {
1595 my $deck = shift; # $deck is a reference to an array
1596 return unless @$deck; # must not be empty!
1597
1598 my $i = @$deck;
1599 while (--$i) {
1600 my $j = int rand ($i+1);
1601 @$deck[$i,$j] = @$deck[$j,$i];
1602 }
1603 }
5a964f20 1604
1605 # shuffle my mpeg collection
1606 #
1607 my @mpeg = <audio/*/*.mp3>;
1608 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1609 print @mpeg;
5a964f20 1610
Note that the above implementation shuffles an array in place,
unlike C<List::Util::shuffle()>, which takes a list and returns
a new shuffled list.

You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
68dc0745 1617
1618 srand;
1619 @new = ();
1620 @old = 1 .. 10; # just a demo
1621 while (@old) {
1622 push(@new, splice(@old, rand @old, 1));
1623 }
68dc0745 1624
1625This is bad because splice is already O(N), and since you do it N
1626times, you just invented a quadratic algorithm; that is, O(N**2).
1627This does not scale, although Perl is so efficient that you probably
1628won't notice this until you have rather largish arrays.
68dc0745 1629
1630=head2 How do I process/modify each element of an array?
1631
1632Use C<for>/C<foreach>:
1633
ac9dac7f 1634 for (@lines) {
1635 s/foo/bar/; # change that word
1636 tr/XZ/ZX/; # swap those letters
ac9dac7f 1637 }
68dc0745 1638
1639Here's another; let's compute spherical volumes:
1640
ac9dac7f 1641 for (@volumes = @radii) { # @volumes has changed parts
1642 $_ **= 3;
1643 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1644 }
197aec24 1645
ac9dac7f 1646which can also be done with C<map()> which is made to transform
1647one list into another:
1648
1649 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1650
If you want to do the same thing to modify the values of a
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.
5a964f20 1655
ac9dac7f 1656 for $orbit ( values %orbits ) {
6670e5e7 1657 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1658 }
818c4caa 1659
1660Prior to perl 5.6 C<values> returned copies of the values,
1661so older perl code often contains constructions such as
1662C<@orbits{keys %orbits}> instead of C<values %orbits> where
1663the hash is to be modified.
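
For illustration, here is a minimal sketch of that older idiom. The
contents of C<%orbits> are made up for the example; the point is only
that looping over a hash slice aliases the loop variable to the stored
values, just as C<values> does on newer perls.

```perl
use strict;
use warnings;

# Hypothetical data: orbital radii keyed by planet name.
my %orbits = ( mercury => 1, venus => 2, earth => 3 );

# Looping over a hash slice aliases $orbit to the stored values,
# so assigning to it changes the hash itself.
for my $orbit ( @orbits{ keys %orbits } ) {
    $orbit **= 3;
}

print "$_ => $orbits{$_}\n" for sort keys %orbits;
```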
818c4caa 1664
68dc0745 1665=head2 How do I select a random element from an array?
1666
ac9dac7f 1667Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1668
1669 $index = rand @array;
1670 $element = $array[$index];
68dc0745 1671
793f5136 1672Or, simply:
1673
1674 my $element = $array[ rand @array ];
5a964f20 1675
68dc0745 1676=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
X<The Art of Computer Programming> X<Fischer-Krause>
68dc0745 1679
c195e131 1680Use the C<List::Permutor> module on CPAN. If the list is actually an
ac9dac7f 1681array, try the C<Algorithm::Permute> module (also on CPAN). It's
c195e131 1682written in XS code and is very efficient:
1683
1684 use Algorithm::Permute;
c195e131 1685
1686 my @array = 'a'..'d';
1687 my $p_iterator = Algorithm::Permute->new ( \@array );
c195e131 1688
1689 while (my @perm = $p_iterator->next) {
1690 print "next permutation: (@perm)\n";
ac9dac7f 1691 }
49d635f9 1692
1693For even faster execution, you could do:
1694
ac9dac7f 1695 use Algorithm::Permute;
c195e131 1696
ac9dac7f 1697 my @array = 'a'..'d';
c195e131 1698
1699 Algorithm::Permute::permute {
1700 print "next permutation: (@array)\n";
1701 } @array;
197aec24 1702
Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 of
Knuth's I<The Art of Computer Programming> and will work on any list:
1707
1708 #!/usr/bin/perl -n
ac003c96 1709 # Fischer-Krause ordered permutation generator
1710
1711 sub permute (&@) {
1712 my $code = shift;
1713 my @idx = 0..$#_;
1714 while ( $code->(@_[@idx]) ) {
1715 my $p = $#idx;
1716 --$p while $idx[$p-1] > $idx[$p];
1717 my $q = $p or return;
1718 push @idx, reverse splice @idx, $p;
1719 ++$q while $idx[$p-1] > $idx[$q];
1720 @idx[$p-1,$q]=@idx[$q,$p-1];
1721 }
68dc0745 1722 }
68dc0745 1723
1724 permute { print "@_\n" } split;
1725
1726The C<Algorithm::Loops> module also provides the C<NextPermute> and
1727C<NextPermuteNum> functions which efficiently find all unique permutations
1728of an array, even if it contains duplicate values, modifying it in-place:
1729if its elements are in reverse-sorted order then the array is reversed,
1730making it sorted, and it returns false; otherwise the next
1731permutation is returned.
1732
1733C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1734you can enumerate all the permutations of C<0..9> like this:
1735
1736 use Algorithm::Loops qw(NextPermuteNum);
109f0441 1737
1738 my @list= 0..9;
1739 do { print "@list\n" } while NextPermuteNum @list;
b8d2732a 1740
68dc0745 1741=head2 How do I sort an array by (anything)?
1742
1743Supply a comparison function to sort() (described in L<perlfunc/sort>):
1744
ac9dac7f 1745 @list = sort { $a <=> $b } @list;
68dc0745 1746
1747The default sort function is cmp, string comparison, which would
c47ff5f1 1748sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1749the numerical comparison operator.
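
A short sketch of the difference, using the three numbers mentioned
above:

```perl
use strict;
use warnings;

my @numbers = ( 10, 2, 1 );

my @as_strings = sort @numbers;                  # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;    # numeric comparison

print "string order:  @as_strings\n";    # 1 10 2
print "numeric order: @as_numbers\n";    # 1 2 10
```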
1750
1751If you have a complicated function needed to pull out the part you
1752want to sort on, then don't do it inside the sort function. Pull it
1753out first, because the sort BLOCK can be called many times for the
1754same element. Here's an example of how to pull out the first word
1755after the first number on each item, and then sort those words
1756case-insensitively.
1757
1758 @idx = ();
1759 for (@data) {
1760 ($item) = /\d+\s*(\S+)/;
1761 push @idx, uc($item);
1762 }
1763 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1764
a6dd486b 1765which could also be written this way, using a trick
68dc0745 1766that's come to be known as the Schwartzian Transform:
1767
1768 @sorted = map { $_->[0] }
1769 sort { $a->[1] cmp $b->[1] }
1770 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1771
1772If you need to sort on several fields, the following paradigm is useful.
1773
1774 @sorted = sort {
1775 field1($a) <=> field1($b) ||
1776 field2($a) cmp field2($b) ||
1777 field3($a) cmp field3($b)
1778 } @data;
68dc0745 1779
1780This can be conveniently combined with precalculation of keys as given
1781above.
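
As a sketch of that combination, assume hypothetical records of the
form C<last,first,age> (the record layout and the sort order are
inventions for this example). Each record is split once, and the sort
block then compares only the precomputed fields:

```perl
use strict;
use warnings;

# Hypothetical records of the form "last_name,first_name,age".
my @data = (
    "Smith,John,42",
    "Jones,Mary,37",
    "Smith,Anna,42",
    "Brown,Carl,42",
);

my @sorted =
    map  { $_->[0] }                      # 3. recover the original record
    sort {
           $b->[3] <=> $a->[3]            # 2. age, oldest first
        || $a->[1] cmp $b->[1]            #    then last name
        || $a->[2] cmp $b->[2]            #    then first name
    }
    map  { [ $_, split /,/ ] } @data;     # 1. split each record once

print "$_\n" for @sorted;
```

Because the fields are extracted in the first C<map>, the comparison
block does no parsing of its own, however many times it is called.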
1782
379e39d7 1783See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1784To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1785more about this approach.
68dc0745 1786
ac9dac7f 1787See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1788
1789=head2 How do I manipulate arrays of bits?
1790
1791Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1792operations.
1793
1794For example, you don't have to store individual bits in an array
1795(which would mean that you're wasting a lot of space). To convert an
1796array of bits to a string, use C<vec()> to set the right bits. This
1797sets C<$vec> to have bit N set only if C<$ints[N]> was set:
ac9dac7f 1798
109f0441 1799 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
ac9dac7f 1800 $vec = '';
1801 foreach( 0 .. $#ints ) {
1802 vec($vec,$_,1) = 1 if $ints[$_];
1803 }
ac9dac7f 1804
1805The string C<$vec> only takes up as many bits as it needs. For
1806instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1807bytes to store them (not counting the scalar variable overhead).
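
To check that claim, here is a small sketch with sixteen made-up
flags. It builds the vector with C<vec()>, builds the same string with
C<pack()> and the C<b*> template, and converts it back with
C<unpack()>:

```perl
use strict;
use warnings;

# Sixteen hypothetical flags; bits 0, 3, 4 and 15 are set.
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0,   0, 0, 0, 0, 0, 0, 0, 1 );

# Build the vector one bit at a time with vec().
my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

# Build the same string in one step with pack(); the "b" template
# takes a bit string in ascending bit order within each byte.
my $packed = pack "b*", join '', @ints;

printf "bytes used: %d\n", length $vec;
print  "same string\n" if $vec eq $packed;

# unpack() reverses the conversion.
my @round_trip = split //, unpack "b*", $vec;
print "@round_trip\n";
```

The two strings match here because the last flag is set; C<vec()> only
extends the string as far as the highest bit it assigns.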
1808
1809Here's how, given a vector in C<$vec>, you can get those bits into
1810your C<@ints> array:
1811
1812 sub bitvec_to_list {
1813 my $vec = shift;
1814 my @ints;
1815 # Find null-byte density then select best algorithm
1816 if ($vec =~ tr/\0// / length $vec > 0.95) {
1817 use integer;
1818 my $i;
1819
1820 # This method is faster with mostly null-bytes
1821 while($vec =~ /[^\0]/g ) {
1822 $i = -9 + 8 * pos $vec;
1823 push @ints, $i if vec($vec, ++$i, 1);
1824 push @ints, $i if vec($vec, ++$i, 1);
1825 push @ints, $i if vec($vec, ++$i, 1);
1826 push @ints, $i if vec($vec, ++$i, 1);
1827 push @ints, $i if vec($vec, ++$i, 1);
1828 push @ints, $i if vec($vec, ++$i, 1);
1829 push @ints, $i if vec($vec, ++$i, 1);
1830 push @ints, $i if vec($vec, ++$i, 1);
1831 }
1832 }
1833 else {
1834 # This method is a fast general algorithm
1835 use integer;
1836 my $bits = unpack "b*", $vec;
1837 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1838 push @ints, pos $bits while($bits =~ /1/g);
1839 }
1840
1841 return \@ints;
1842 }
68dc0745 1843
1844This method gets faster the more sparse the bit vector is.
1845(Courtesy of Tim Bunce and Winfried Koenig.)
1846
1847You can make the while loop a lot shorter with this suggestion
1848from Benjamin Goldberg:
1849
1850 while($vec =~ /[^\0]+/g ) {
1851 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1852 }
76817d6d 1853
ac9dac7f 1854Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1855
    use Bit::Vector;

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();
cc30d1a7 1859
C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.
1862
1863Here's a more extensive illustration using vec():
65acb1b1 1864
1865 # vec demo
1866 $vector = "\xff\x0f\xef\xfe";
1867 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1868 unpack("N", $vector), "\n";
1869 $is_set = vec($vector, 23, 1);
1870 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1871 pvec($vector);
65acb1b1 1872
1873 set_vec(1,1,1);
1874 set_vec(3,1,1);
1875 set_vec(23,1,1);
1876
1877 set_vec(3,1,3);
1878 set_vec(3,2,3);
1879 set_vec(3,4,3);
1880 set_vec(3,4,7);
1881 set_vec(3,8,3);
1882 set_vec(3,8,7);
1883
1884 set_vec(0,32,17);
1885 set_vec(1,32,17);
1886
1887 sub set_vec {
1888 my ($offset, $width, $value) = @_;
1889 my $vector = '';
1890 vec($vector, $offset, $width) = $value;
1891 print "offset=$offset width=$width value=$value\n";
1892 pvec($vector);
1893 }
65acb1b1 1894
1895 sub pvec {
1896 my $vector = shift;
1897 my $bits = unpack("b*", $vector);
1898 my $i = 0;
1899 my $BASE = 8;
1900
1901 print "vector length in bytes: ", length($vector), "\n";
1902 @bytes = unpack("A8" x length($vector), $bits);
1903 print "bits are: @bytes\n\n";
1904 }
65acb1b1 1905
68dc0745 1906=head2 Why does defined() return true on empty arrays and hashes?
1907
1908The short story is that you should probably only use defined on scalars or
1909functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1910in the 5.004 release or later of Perl for more detail.
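
What you usually want to know is whether the aggregate has any
elements, and for that you can test it directly in boolean context. A
minimal sketch:

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An array or hash in boolean context is true only when it holds
# at least one element, which is usually the question being asked.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, undef;          # one element, whose value is undef
$hash{key} = undef;          # one key, whose value is undef

print "array has ", scalar(@array), " element(s)\n"      if @array;
print "hash has ",  scalar( keys %hash ), " key(s)\n"    if %hash;

# defined() is still the right test for the individual scalars.
print "first element is undefined\n" unless defined $array[0];
```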
68dc0745 1911
1912=head1 Data: Hashes (Associative Arrays)
1913
1914=head2 How do I process an entire hash?
1915
1916(contributed by brian d foy)
1917
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
68dc0745 1921
1922To go through all of the keys, use the C<keys> function. This extracts
1923all of the keys of the hash and gives them back to you as a list. You
1924can then get the value through the particular key you're processing:
1925
    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }
68dc0745 1930
ee891a00 1931Once you have the list of keys, you can process that list before you
109f0441 1932process the hash elements. For instance, you can sort the keys so you
1933can process them in lexical order:
1934
    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1939
1940Or, you might want to only process some of the items. If you only want
1941to deal with the keys that start with C<text:>, you can select just
1942those using C<grep>:
1943
    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }
1948
1949If the hash is very large, you might not want to create a long list of
109f0441 1950keys. To save some memory, you can grab one key-value pair at a time using
1951C<each()>, which returns a pair you haven't seen yet:
1952
1953 while( my( $key, $value ) = each( %hash ) ) {
1954 ...
1955 }
1956
1957The C<each> operator returns the pairs in apparently random order, so if
1958ordering matters to you, you'll have to stick with the C<keys> method.
1959
1960The C<each()> operator can be a bit tricky though. You can't add or
1961delete keys of the hash while you're using it without possibly
1962skipping or re-processing some pairs after Perl internally rehashes
1963all of the elements. Additionally, a hash has only one iterator, so if
1964you use C<keys>, C<values>, or C<each> on the same hash, you can reset
1965the iterator and mess up your processing. See the C<each> entry in
1966L<perlfunc> for more details.
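
The iterator problem is easy to reproduce. In this sketch (the hash
contents are arbitrary), the first loop exits early, and C<keys> in
void context rewinds the iterator so that the second loop sees every
pair:

```perl
use strict;
use warnings;

my %hash = ( apple => 1, banana => 2, cherry => 3 );

# Leave a loop early: the iterator stays where it stopped.
while ( my ( $key, $value ) = each %hash ) {
    last;                    # only the first pair has been consumed
}

# Without a reset, the next each() loop would see just the remaining
# pairs. Calling keys() in void context rewinds the iterator first.
keys %hash;

my $seen = 0;
while ( my ( $key, $value ) = each %hash ) {
    $seen++;
}

print "visited $seen of ", scalar( keys %hash ), " pairs\n";
```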
68dc0745 1967
1968=head2 How do I merge two hashes?
1969X<hash> X<merge> X<slice, hash>
1970
1971(contributed by brian d foy)
1972
1973Before you decide to merge two hashes, you have to decide what to do
1974if both hashes contain keys that are the same and if you want to leave
1975the original hashes as they were.
1976
If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking whether the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:
1982
1983 my %new_hash = %hash1; # make a copy; leave %hash1 alone
1984
1985 foreach my $key2 ( keys %hash2 )
1986 {
1987 if( exists $new_hash{$key2} )
1988 {
1989 warn "Key [$key2] is in both hashes!";
1990 # handle the duplicate (perhaps only warning)
1991 ...
1992 next;
1993 }
1994 else
1995 {
1996 $new_hash{$key2} = $hash2{$key2};
1997 }
1998 }
1999
2000If you don't want to create a new hash, you can still use this looping
2001technique; just change the C<%new_hash> to C<%hash1>.
2002
2003 foreach my $key2 ( keys %hash2 )
2004 {
2005 if( exists $hash1{$key2} )
2006 {
2007 warn "Key [$key2] is in both hashes!";
2008 # handle the duplicate (perhaps only warning)
2009 ...
2010 next;
2011 }
2012 else
2013 {
2014 $hash1{$key2} = $hash2{$key2};
2015 }
2016 }
2017
2018If you don't care that one hash overwrites keys and values from the other, you
2019could just use a hash slice to add one hash to another. In this case, values
2020from C<%hash2> replace values from C<%hash1> when they have keys in common:
2021
2022 @hash1{ keys %hash2 } = values %hash2;
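
If you want the same overwrite behavior but would rather leave both
originals alone, one common idiom (shown here as a sketch with made-up
data) is to flatten both hashes into a new one. Later pairs win, so
values from C<%hash2> again take precedence:

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Both hashes flatten into one key/value list; for duplicate keys the
# later pair wins, so %hash2 takes precedence. The originals are untouched.
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```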
2023
68dc0745 2024=head2 What happens if I add or remove keys from a hash while iterating over it?
2025
28b41a80 2026(contributed by brian d foy)
d92eb7b0 2027
28b41a80 2028The easy answer is "Don't do that!"
d92eb7b0 2029
2030If you iterate through the hash with each(), you can delete the key
2031most recently returned without worrying about it. If you delete or add
2032other keys, the iterator may skip or double up on them since perl
2033may rearrange the hash table. See the
2034entry for C<each()> in L<perlfunc>.
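
Here is a sketch of the one safe case, deleting the pair that
C<each()> has just returned. The data is invented for the example:

```perl
use strict;
use warnings;

# Hypothetical data: keep only the entries with even values.
my %hash = map { $_ => $_ } 1 .. 10;

while ( my ( $key, $value ) = each %hash ) {
    # Deleting the pair that each() just returned is the one
    # modification documented as safe during iteration.
    delete $hash{$key} if $value % 2;
}

print join( " ", sort { $a <=> $b } keys %hash ), "\n";
```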
68dc0745 2035
2036=head2 How do I look up a hash element by value?
2037
2038Create a reverse hash:
2039
2040 %by_value = reverse %by_key;
2041 $key = $by_value{$value};
68dc0745 2042
2043That's not particularly efficient. It would be more space-efficient
2044to use:
2045
2046 while (($key, $value) = each %by_key) {
2047 $by_value{$value} = $key;
2048 }
68dc0745 2049
2050If your hash could have repeated values, the methods above will only find
2051one of the associated keys. This may or may not worry you. If it does
2052worry you, you can always reverse the hash into a hash of arrays instead:
2053
2054 while (($key, $value) = each %by_key) {
2055 push @{$key_list_by_value{$value}}, $key;
2056 }
68dc0745 2057
2058=head2 How can I know how many entries are in a hash?
2059
2060(contributed by brian d foy)
2061
2062This is very similar to "How do I process an entire hash?", also in
2063L<perlfaq4>, but a bit simpler in the common cases.
2064
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
68dc0745 2067
2068 my $key_count = keys %hash; # must be scalar context!
2069
2070If you want to find out how many entries have a defined value, that's
2071a bit different. You have to check each value. A C<grep> is handy:
2072
2073 my $defined_value_count = grep { defined } values %hash;
68dc0745 2074
2075You can use that same structure to count the entries any way that
2076you like. If you want the count of the keys with vowels in them,
2077you just test for that instead:
2078
2079 my $vowel_count = grep { /[aeiou]/ } keys %hash;
2080
2081The C<grep> in scalar context returns the count. If you want the list
2082of matching items, just use it in list context instead:
2083
2084 my @defined_values = grep { defined } values %hash;
2085
2086The C<keys()> function also resets the iterator, which means that you may
197aec24 2087see strange results if you use this between uses of other hash operators
109f0441 2088such as C<each()>.
68dc0745 2089
2090=head2 How do I sort a hash (optionally by value instead of key)?
2091
2092(contributed by brian d foy)
2093
2094To sort a hash, start with the keys. In this example, we give the list of
2095keys to the sort function which then compares them ASCIIbetically (which
2096might be affected by your locale settings). The output list has the keys
2097in ASCIIbetical order. Once we have the keys, we can go through them to
2098create a report which lists the keys in ASCIIbetical order.
2099
2100 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 2101
2102 foreach my $key ( @keys )
2103 {
109f0441 2104 printf "%-20s %6d\n", $key, $hash{$key};
2105 }
2106
58103a2e 2107We could get more fancy in the C<sort()> block though. Instead of
a05e4845 2108comparing the keys, we can compute a value with them and use that
58103a2e 2109value as the comparison.
2110
2111For instance, to make our report order case-insensitive, we use
58103a2e 2112the C<\L> sequence in a double-quoted string to make everything
2113lowercase. The C<sort()> block then compares the lowercased
2114values to determine in which order to put the keys.
2115
2116 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 2117
a05e4845 2118Note: if the computation is expensive or the hash has many elements,
58103a2e 2119you may want to look at the Schwartzian Transform to cache the
2120computation results.
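
A sketch of that caching for hash keys follows. C<expensive_key> is a
hypothetical stand-in for whatever costly computation you need; here
it merely lowercases the key:

```perl
use strict;
use warnings;

# Hypothetical data and a stand-in for an expensive key computation.
my %hash = ( Banana => 3, apple => 7, Cherry => 5 );
sub expensive_key { return lc $_[0] }

my @keys =
    map  { $_->[0] }                        # keep only the original key
    sort { $a->[1] cmp $b->[1] }            # compare the cached values
    map  { [ $_, expensive_key($_) ] }      # compute each sort key once
    keys %hash;

print "@keys\n";
```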
2121
2122If we want to sort by the hash value instead, we use the hash key
2123to look it up. We still get out a list of keys, but this time they
2124are ordered by their value.
2125
2126 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2127
2128From there we can get more complex. If the hash values are the same,
2129we can provide a secondary sort on the hash key.
2130
2131 my @keys = sort {
2132 $hash{$a} <=> $hash{$b}
2133 or
2134 "\L$a" cmp "\L$b"
2135 } keys %hash;
68dc0745 2136
2137=head2 How can I always keep my hash sorted?
ac9dac7f 2138X<hash tie sort DB_File Tie::IxHash>
68dc0745 2139
You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)
68dc0745 2146
2147=head2 What's the difference between "delete" and "undef" with hashes?
2148
Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.
68dc0745 2158
ac9dac7f 2159Pictures help... here's the C<%hash> table:
68dc0745 2160
2161 keys values
2162 +------+------+
2163 | a | 3 |
2164 | x | 7 |
2165 | d | 0 |
2166 | e | 2 |
2167 +------+------+
2168
2169And these conditions hold
2170
2171 $hash{'a'} is true
2172 $hash{'d'} is false
2173 defined $hash{'d'} is true
2174 defined $hash{'a'} is true
e9d185f8 2175 exists $hash{'a'} is true (Perl 5 only)
92993692 2176 grep ($_ eq 'a', keys %hash) is true
68dc0745 2177
2178If you now say
2179
92993692 2180 undef $hash{'a'}
68dc0745 2181
2182your table now reads:
2183
2184
2185 keys values
2186 +------+------+
2187 | a | undef|
2188 | x | 7 |
2189 | d | 0 |
2190 | e | 2 |
2191 +------+------+
2192
2193and these conditions now hold; changes in caps:
2194
2195 $hash{'a'} is FALSE
2196 $hash{'d'} is false
2197 defined $hash{'d'} is true
2198 defined $hash{'a'} is FALSE
e9d185f8 2199 exists $hash{'a'} is true (Perl 5 only)
92993692 2200 grep ($_ eq 'a', keys %hash) is true
68dc0745 2201
2202Notice the last two: you have an undef value, but a defined key!
2203
2204Now, consider this:
2205
92993692 2206 delete $hash{'a'}
68dc0745 2207
2208your table now reads:
2209
2210 keys values
2211 +------+------+
2212 | x | 7 |
2213 | d | 0 |
2214 | e | 2 |
2215 +------+------+
2216
2217and these conditions now hold; changes in caps:
2218
2219 $hash{'a'} is false
2220 $hash{'d'} is false
2221 defined $hash{'d'} is true
2222 defined $hash{'a'} is false
e9d185f8 2223 exists $hash{'a'} is FALSE (Perl 5 only)
92993692 2224 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 2225
2226See, the whole entry is gone!
2227
2228=head2 Why don't my tied hashes make the defined/exists distinction?
2229
2230This depends on the tied hash's implementation of EXISTS().
2231For example, there isn't the concept of undef with hashes
2232that are tied to DBM* files. It also means that exists() and
2233defined() do the same thing with a DBM* file, and what they
2234end up doing is not what they do with ordinary hashes.
68dc0745 2235
2236=head2 How do I reset an each() operation part-way through?
2237
2238(contributed by brian d foy)
2239
2240You can use the C<keys> or C<values> functions to reset C<each>. To
2241simply reset the iterator used by C<each> without doing anything else,
2242use one of them in void context:
2243
2244 keys %hash; # resets iterator, nothing else.
2245 values %hash; # resets iterator, nothing else.
2246
2247See the documentation for C<each> in L<perlfunc>.
68dc0745 2248
2249=head2 How can I get the unique keys from two hashes?
2250
2251First you extract the keys from the hashes into lists, then solve
2252the "removing duplicates" problem described above. For example:
68dc0745 2253
2254 %seen = ();
2255 for $element (keys(%foo), keys(%bar)) {
2256 $seen{$element}++;
2257 }
2258 @uniq = keys %seen;
68dc0745 2259
2260Or more succinctly:
2261
ac9dac7f 2262 @uniq = keys %{{%foo,%bar}};
68dc0745 2263
2264Or if you really want to save space:
2265
ac9dac7f
RGS
2266 %seen = ();
2267 while (defined ($key = each %foo)) {
2268 $seen{$key}++;
2269 }
2270 while (defined ($key = each %bar)) {
2271 $seen{$key}++;
2272 }
2273 @uniq = keys %seen;
68dc0745 2274
2275=head2 How can I store a multidimensional array in a DBM file?
2276
2277Either stringify the structure yourself (no fun), or else
2278get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2279it on top of either DB_File or GDBM_File.
2280
2281=head2 How can I make my hash remember the order I put elements into it?
2282
Use the C<Tie::IxHash> module from CPAN.
68dc0745 2284
2285 use Tie::IxHash;
2286
2287 tie my %myhash, 'Tie::IxHash';
2288
2289 for (my $i=0; $i<20; $i++) {
2290 $myhash{$i} = 2*$i;
2291 }
2292
2293 my @keys = keys %myhash;
2294 # @keys = (0,1,2,3,...)
46fc3d4c 2295
68dc0745 2296=head2 Why does passing a subroutine an undefined element in a hash create it?
2297
2298(contributed by brian d foy)
2299
2300Are you using a really old version of Perl?
2301
2302Normally, accessing a hash key's value for a nonexistent key will
2303I<not> create the key.
2304
2305 my %hash = ();
2306 my $value = $hash{ 'foo' };
2307 print "This won't print\n" if exists $hash{ 'foo' };
2308
2309Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2310Since you could assign directly to C<$_[0]>, Perl had to be ready to
2311make that assignment so it created the hash key ahead of time:
2312
2313 my_sub( $hash{ 'foo' } );
2314 print "This will print before 5.004\n" if exists $hash{ 'foo' };
68dc0745 2315
2316 sub my_sub {
2317 # $_[0] = 'bar'; # create hash key in case you do this
2318 1;
2319 }
2320
2321Since Perl 5.004, however, this situation is a special case and Perl
2322creates the hash key only when you make the assignment:
68dc0745 2323
2324 my_sub( $hash{ 'foo' } );
2325 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2326
2327 sub my_sub {
2328 $_[0] = 'bar';
2329 }
68dc0745 2330
2331However, if you want the old behavior (and think carefully about that
2332because it's a weird side effect), you can pass a hash slice instead.
2333Perl 5.004 didn't make this a special case:
68dc0745 2334
109f0441 2335 my_sub( @hash{ qw/foo/ } );
68dc0745 2336
fc36a67e 2337=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2338
2339Usually a hash ref, perhaps like this:
2340
2341 $record = {
2342 NAME => "Jason",
2343 EMPNO => 132,
2344 TITLE => "deputy peon",
2345 AGE => 23,
2346 SALARY => 37_000,
2347 PALS => [ "Norbert", "Rhys", "Phineas"],
2348 };
2349
References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.
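
As a brief sketch of how such a record is used, the following repeats
the structure above and reads it back. The C<@staff> array and
C<%byname> index are additions for illustration only:

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    TITLE  => "deputy peon",
    AGE    => 23,
    SALARY => 37_000,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

# Scalar fields are reached with the arrow operator.
print "$record->{NAME} is a $record->{TITLE}\n";

# Nested structures chain further subscripts or dereference as a whole.
print "First pal: $record->{PALS}[0]\n";
print "All pals:  @{ $record->{PALS} }\n";

# An array of such records behaves like an array of structs.
# %byname is a hypothetical index keyed on the NAME field.
my @staff = ($record);
my %byname;
$byname{ $_->{NAME} } = $_ for @staff;
print "Salary: $byname{Jason}{SALARY}\n";
```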
68dc0745 2354
2355=head2 How can I use a reference as a hash key?
2356
109f0441 2357(contributed by brian d foy and Ben Morrow)
2358
2359Hash keys are strings, so you can't really use a reference as the key.
2360When you try to do that, perl turns the reference into its stringified
2361form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2362back the reference from the stringified form, at least without doing
2363some extra work on your own.
2364
Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This will mean a new variable might accidentally
be associated with the value for an old one.
2370
If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
C<Hash::Util::FieldHash> module. This will also handle renaming the
keys if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.
2378
2379If you actually need to be able to get a real reference back from
2380each hash entry, you can use the Tie::RefHash module, which does the
2381required work for you.
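
The following sketch contrasts an ordinary hash with one tied to
C<Tie::RefHash>; the C<$point> array reference is just sample data:

```perl
use strict;
use warnings;
use Tie::RefHash;

my $point = [ 3, 4 ];        # hypothetical data used as a key

# In an ordinary hash the key is only the stringified address.
my %plain = ( $point => 'plain' );
my ($plain_key) = keys %plain;
print "plain key is ", ( ref $plain_key ? "a reference" : "a string" ), "\n";

# With Tie::RefHash the keys come back as real references.
tie my %by_ref, 'Tie::RefHash';
$by_ref{$point} = 'tied';
my ($ref_key) = keys %by_ref;
print "tied key is ", ( ref $ref_key ? "a reference" : "a string" ), "\n";
print "first coordinate: $ref_key->[0]\n";
```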
68dc0745 2382
2383=head1 Data: Misc
2384
2385=head2 How do I handle binary data correctly?
2386
ac9dac7f 2387Perl is binary clean, so it can handle binary data just fine.
e573f903 2388On Windows or DOS, however, you have to use C<binmode> for binary
2389files to avoid conversions for line endings. In general, you should
2390use C<binmode> any time you want to work with binary data.
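
As a sketch, this writes a few awkward bytes to a temporary file and
reads them back with C<binmode> set on both handles. The payload and
the use of C<File::Temp> are choices made for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical payload containing bytes that text-mode I/O could alter.
my $payload = "\x00\x0d\x0a\xff\x1a binary data";

# tempfile() is used only to keep the sketch self-contained.
my ( $out, $filename ) = tempfile( UNLINK => 1 );
binmode $out;                          # no newline translation on write
print {$out} $payload;
close $out or die "close failed: $!";

open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;                           # no newline translation on read
my $data = do { local $/; <$in> };     # slurp the whole file
close $in;

print "round trip ", ( $data eq $payload ? "preserved" : "changed" ),
      " the bytes\n";
```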
68dc0745 2391
ac9dac7f 2392Also see L<perlfunc/"binmode"> or L<perlopentut>.
68dc0745 2393
ac9dac7f 2394If you're concerned about 8-bit textual data then see L<perllocale>.
54310121 2395If you want to deal with multibyte characters, however, there are
68dc0745 2396some gotchas. See the section on Regular Expressions.
2397
2398=head2 How do I determine whether a scalar is a number/whole/integer/float?
2399
2400Assuming that you don't care about IEEE notations like "NaN" or
2401"Infinity", you probably just want to use a regular expression.
2402
2403 if (/\D/) { print "has nondigits\n" }
2404 if (/^\d+$/) { print "is a whole number\n" }
2405 if (/^-?\d+$/) { print "is an integer\n" }
2406 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2407 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2408 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2409 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2410 { print "a C float\n" }
68dc0745 2411
2412There are also some commonly used modules for the task.
2413L<Scalar::Util> (distributed with 5.8) provides access to perl's
2414internal function C<looks_like_number> for determining whether a
2415variable looks like a number. L<Data::Types> exports functions that
2416validate data types using both the above and other regular
2417expressions. Thirdly, there is C<Regexp::Common> which has regular
2418expressions to match various types of numbers. Those three modules are
2419available from the CPAN.
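
For example, a minimal use of C<looks_like_number> might look like
this (the candidate strings are arbitrary):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical inputs for illustration.
for my $candidate ( "42", "-3.14", "1e5", "abc", "" ) {
    my $verdict = looks_like_number($candidate)
        ? "looks like"
        : "does not look like";
    print "'$candidate' $verdict a number\n";
}
```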
2420
2421If you're on a POSIX system, Perl supports the C<POSIX::strtod>
2422function. Its semantics are somewhat cumbersome, so here's a
2423C<getnum> wrapper function for more convenient access. This function
2424takes a string and returns the number it found, or C<undef> for input
2425that isn't a C float. The C<is_numeric> function is a front end to
2426C<getnum> if you just want to say, "Is this a float?"
2427
2428 sub getnum {
2429 use POSIX qw(strtod);
2430 my $str = shift;
2431 $str =~ s/^\s+//;
2432 $str =~ s/\s+$//;
2433 $! = 0;
2434 my($num, $unparsed) = strtod($str);
2435 if (($str eq '') || ($unparsed != 0) || $!) {
2436 return undef;
2437 }
2438 else {
2439 return $num;
2440 }
2441 }
5a964f20 2442
ac9dac7f 2443 sub is_numeric { defined getnum($_[0]) }
5a964f20 2444
Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings to
doubles and longs, respectively.
68dc0745 2449
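C<strtol> can be used in much the same way as C<strtod>. Here is a
small sketch, assuming base 10 and a made-up input string. In list
context the function returns the parsed number and the count of
characters it could not parse.

    use POSIX qw(strtol);

    my $str = "123abc";                       # illustrative input
    my ($num, $unparsed) = strtol($str, 10);  # parse as base 10

    if ($unparsed == 0) {
        print "$str is an integer: $num\n";
    }
    else {
        print "parsed $num, but $unparsed characters were left over\n";
    }
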
=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generally, consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting with Perl 5.8, C<Storable> is part
of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

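For a version you can run as-is, here is a sketch of a full round trip.
The sample data is made up, and the temporary file from C<File::Temp>
stands in for whatever filename your program would really use.

    use Storable qw(store retrieve);
    use File::Temp qw(tempfile);

    # Illustrative data, including a nested array.
    my %hash = ( name => 'camel', humps => [ 1, 2 ] );

    # A temporary file that is removed when the program exits.
    my (undef, $filename) = tempfile(UNLINK => 1);

    store(\%hash, $filename) or die "Can't store to $filename";

    # later on...
    my $href = retrieve($filename);
    print "$href->{name} has ", scalar @{ $href->{humps} }, " humps\n";
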
=head2 How do I print out or copy a recursive data structure?

The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };

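The sketch below puts both modules together on a small made-up
structure that refers to itself. It copies the structure with
C<dclone>, changes only the copy, and then prints the original with
C<Data::Dumper>.

    use Data::Dumper;
    use Storable qw(dclone);

    # Illustrative structure that contains a reference to itself.
    my $r1 = { name => 'loop', list => [ 1, 2, 3 ] };
    $r1->{self} = $r1;

    my $r2 = dclone($r1);
    push @{ $r2->{list} }, 4;    # changes the copy, not the original

    print "original has ", scalar @{ $r1->{list} }, " elements\n";
    print "copy has ",     scalar @{ $r2->{list} }, " elements\n";
    print "copy refers to itself\n" if $r2->{self} == $r2;

    print Dumper($r1);
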
=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.

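To make the difference concrete, here is a sketch of both approaches.
The package names C<My::Base> and C<My::Widget> and the method names
C<describe> and C<describe_any> are hypothetical, chosen only for this
example.

    # Preferred: a common base class that your own classes inherit from.
    package My::Base;
    sub new      { my $class = shift; return bless {@_}, $class }
    sub describe { my $self = shift; return "I am a " . ref($self) }

    package My::Widget;
    our @ISA = ('My::Base');

    package main;
    print My::Widget->new->describe, "\n";

    # Global: a method in UNIVERSAL is visible to every class and
    # object in the program, including ones you did not write.
    sub UNIVERSAL::describe_any {
        return "I am a " . (ref($_[0]) || $_[0]);
    }
    print My::Widget->describe_any, "\n";
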
=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.

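If you are curious what such a checksum involves, card numbers carry a
Luhn check digit. The following is only an illustrative sketch of that
algorithm, not the module's code, and the function name C<luhn_ok> is
made up for this example. For real use, prefer the module.

    # Returns true if the digits in $number pass the Luhn check.
    sub luhn_ok {
        my $number = shift;
        $number =~ tr/ -//d;                  # drop spaces and dashes
        return 0 unless $number =~ /^\d+$/;

        my ($sum, $double) = (0, 0);
        # Walk the digits right to left, doubling every second one.
        for my $digit (reverse split //, $number) {
            my $value = $double ? $digit * 2 : $digit;
            $value -= 9 if $value > 9;        # same as adding the two digits
            $sum   += $value;
            $double = !$double;
        }
        return $sum % 10 == 0;
    }

    # 4111 1111 1111 1111 is a widely published test number.
    print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
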
=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead--it makes number-crunching easy.

See L<http://search.cpan.org/dist/PGPLOT> for the code.

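On the Perl side, one common way to prepare such data is the built-in
C<pack> function with the C<d> (native double) or C<f> (native float)
template. This is a sketch of that general technique with made-up
values, not a description of what C<PGPLOT> itself does. XS code that
receives the packed scalar could treat its string buffer as a C array
of doubles.

    my @values = (1.5, -2.25, 3.0);           # illustrative data

    # One native double per element, laid out contiguously.
    my $packed = pack("d*", @values);
    printf "%d doubles packed into %d bytes\n",
        scalar @values, length $packed;

    # unpack reverses the operation.
    my @back = unpack("d*", $packed);
    print "@back\n";
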
=head1 REVISION

Revision: $Revision$

Date: $Date$

See L<perlfaq> for source control details and availability.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.