This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Update perlfaq4 examples to use Time::Piece
[perl5.git] / pod / perlfaq4.pod
Commit Line Data
68dc0745 1=head1 NAME
2
109f0441 3perlfaq4 - Data Manipulation
68dc0745 4
5=head1 DESCRIPTION
6
ae3d0b9f
JH
7This section of the FAQ answers questions related to manipulating
8numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
68dc0745 9
10=head1 Data: Numbers
11
46fc3d4c 12=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
13
d12d61cf 14For the long explanation, see David Goldberg's "What Every Computer
15Scientist Should Know About Floating-Point Arithmetic"
16(http://docs.sun.com/source/806-3568/ncg_goldberg.html).
17
ac9dac7f
RGS
18Internally, your computer represents floating-point numbers in binary.
19Digital (as in powers of two) computers cannot store all numbers
20exactly. Some real numbers lose precision in the process. This is a
21problem with how computers store numbers and affects all computer
22languages, not just Perl.
46fc3d4c 23
ee891a00 24L<perlnumber> shows the gory details of number representations and
ac9dac7f 25conversions.
49d635f9 26
ac9dac7f 27To limit the number of decimal places in your numbers, you can use the
3bc3c5be 28C<printf> or C<sprintf> function. See the L<"Floating Point
ac9dac7f 29Arithmetic"|perlop> for more details.
49d635f9
RGS
30
31 printf "%.2f", 10/3;
197aec24 32
49d635f9 33 my $number = sprintf "%.2f", 10/3;
197aec24 34
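As a small illustration of the point above, here is a sketch (the variable names are ours, and the exact digits printed by C<%.17g> can differ between platforms) that prints the same sum at full precision and rounded to two places:

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;

# Show the stored value with 17 significant digits, then a rounded copy.
my $full    = sprintf "%.17g", $sum;
my $rounded = sprintf "%.2f",  $sum;

print "stored:  $full\n";
print "rounded: $rounded\n";

# Compare floating-point values with a tolerance rather than with ==.
my $close_enough = abs( $sum - 0.3 ) < 1e-9;
print "close to 0.3\n" if $close_enough;
```

The tolerance of C<1e-9> is an arbitrary choice for this example; pick one that suits the size of the numbers you work with.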
32969b6e
BB
35=head2 Why is int() broken?
36
ac9dac7f 37Your C<int()> is most probably working just fine. It's the numbers that
32969b6e
BB
38aren't quite what you think.
39
ac9dac7f 40First, see the answer to "Why am I getting long decimals
32969b6e
BB
41(eg, 19.9499999999999) instead of the numbers I should be getting
42(eg, 19.95)?".
43
44For example, this
45
ac9dac7f 46 print int(0.6/0.2-2), "\n";
32969b6e
BB
47
48will on most computers print 0, not 1, because even such simple
49numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
50numbers. What you think of in the above as 'three' is really more like
512.9999999999999995559.
52
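If what you actually want is the nearest integer, one possible approach (a sketch, not the only way) is to round with C<sprintf> instead of truncating with C<int()>:

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;    # slightly less than 1 on most machines

my $truncated = int($value);              # truncates toward zero
my $rounded   = sprintf "%.0f", $value;   # rounds to the nearest integer

print "truncated: $truncated, rounded: $rounded\n";
```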
68dc0745 53=head2 Why isn't my octal data interpreted correctly?
54
109f0441
S
55(contributed by brian d foy)
56
57You're probably trying to convert a string to a number, which Perl only
58converts as a decimal number. When Perl converts a string to a number, it
59ignores leading spaces and zeroes, then assumes the rest of the digits
60are in base 10:
61
62 my $string = '0644';
63
64 print $string + 0; # prints 644
65
66 print $string + 44; # prints 688, certainly not octal!
67
68This problem usually involves one of the Perl built-ins that has the
23bec515 69same name as a Unix command that uses octal numbers as arguments on the
109f0441
S
70command line. In this example, C<chmod> on the command line knows that
71its first argument is octal because that's what it does:
72
73 %prompt> chmod 644 file
74
75If you want to use the same literal digits (644) in Perl, you have to tell
76Perl to treat them as octal numbers either by prefixing the digits with
77a C<0> or using C<oct>:
78
79 chmod( 0644, $file); # right, has leading zero
80 chmod( oct(644), $file ); # also correct
68dc0745 81
109f0441
S
82The problem comes in when you take your numbers from something that Perl
83thinks is a string, such as a command line argument in C<@ARGV>:
68dc0745 84
109f0441 85 chmod( $ARGV[0], $file); # wrong, even if "0644"
68dc0745 86
109f0441 87 chmod( oct($ARGV[0]), $file ); # correct, treat string as octal
33ce146f 88
109f0441
S
89You can always check the value you're using by printing it in octal
90notation to ensure it matches what you think it should be. Print it
91in octal and decimal format:
33ce146f 92
109f0441 93 printf "0%o %d", $number, $number;
33ce146f 94
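Putting those pieces together, a minimal sketch might look like this. The helper name C<mode_from_string> and its rule of accepting only three or four octal digits are assumptions made for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only three or four octal digits.
sub mode_from_string {
    my ($text) = @_;
    return undef unless defined $text && $text =~ /\A[0-7]{3,4}\z/;
    return oct($text);    # treat the digits as octal
}

my $mode = mode_from_string("0644");
printf "0%o %d\n", $mode, $mode;    # prints "0644 420"
```

Rejecting anything that is not plain octal digits keeps a stray decimal or hex string from quietly turning into the wrong permissions.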
65acb1b1 95=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 96
ac9dac7f
RGS
97Remember that C<int()> merely truncates toward 0. For rounding to a
98certain number of digits, C<sprintf()> or C<printf()> is usually the
99easiest route.
92c2ed05 100
ac9dac7f 101 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 102
ac9dac7f
RGS
103The C<POSIX> module (part of the standard Perl distribution)
104implements C<ceil()>, C<floor()>, and a number of other mathematical
105and trigonometric functions.
68dc0745 106
ac9dac7f
RGS
107 use POSIX;
108 $ceil = ceil(3.5); # 4
109 $floor = floor(3.5); # 3
92c2ed05 110
ac9dac7f
RGS
111In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
112module. With 5.004, the C<Math::Trig> module (part of the standard Perl
46fc3d4c 113distribution) implements the trigonometric functions. Internally it
ac9dac7f 114uses the C<Math::Complex> module and some functions can break out from
46fc3d4c 115the real axis into the complex plane, for example the inverse sine of
1162.
68dc0745 117
118Rounding in financial applications can have serious implications, and
119the rounding method used should be specified precisely. In these
120cases, it probably pays not to trust whichever system rounding is
121being used by Perl, but to instead implement the rounding function you
122need yourself.
123
65acb1b1
TC
124To see why, notice how you'll still have an issue on half-way-point
125alternation:
126
ac9dac7f 127 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
65acb1b1 128
ac9dac7f
RGS
129 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
130 0.8 0.8 0.9 0.9 1.0 1.0
65acb1b1 131
ac9dac7f
RGS
132Don't blame Perl. It's the same as in C. IEEE says we have to do
133this. Perl numbers whose absolute values are integers under 2**31 (on
13432 bit machines) will work pretty much like mathematical integers.
135Other numbers are not guaranteed.
65acb1b1 136
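As one example of writing the rounding rule yourself, here is a sketch that rounds to the nearest integer with halves going away from zero. The function name and the choice of rule are ours; your application may call for a different rule, such as round-half-to-even:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Round to the nearest integer, with halves going away from zero.
sub round_half_away {
    my ($number) = @_;
    return $number >= 0 ? floor( $number + 0.5 ) : ceil( $number - 0.5 );
}

print join( " ", map { round_half_away($_) } 2.4, 2.5, -2.5 ), "\n";    # prints "2 3 -3"
```

For money, it usually helps to keep amounts as integer counts of the smallest unit (cents, say) so that the values being rounded are exact to begin with.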
6f0efb17 137=head2 How do I convert between numeric representations/bases/radixes?
68dc0745 138
ac9dac7f
RGS
139As always with Perl there is more than one way to do it. Below are a
140few examples of approaches to making common conversions between number
141representations. This is intended to be representative rather than
142exhaustive.
68dc0745 143
ac9dac7f
RGS
144Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
145module from CPAN. The reason you might choose C<Bit::Vector> over the
146perl built in functions is that it works with numbers of ANY size,
147that it is optimized for speed on some operations, and for at least
148some programmers the notation might be familiar.
d92eb7b0 149
818c4caa
JH
150=over 4
151
152=item How do I convert hexadecimal into decimal
d92eb7b0 153
ac9dac7f 154Using perl's built in conversion of C<0x> notation:
6761e064 155
ac9dac7f 156 $dec = 0xDEADBEEF;
7207e29d 157
ac9dac7f 158Using the C<hex> function:
6761e064 159
ac9dac7f 160 $dec = hex("DEADBEEF");
6761e064 161
ac9dac7f 162Using C<pack>:
6761e064 163
ac9dac7f 164 $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));
6761e064 165
ac9dac7f 166Using the CPAN module C<Bit::Vector>:
6761e064 167
ac9dac7f
RGS
168 use Bit::Vector;
169 $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
170 $dec = $vec->to_Dec();
6761e064 171
818c4caa 172=item How do I convert from decimal to hexadecimal
6761e064 173
ac9dac7f 174Using C<sprintf>:
6761e064 175
ac9dac7f
RGS
176 $hex = sprintf("%X", 3735928559); # upper case A-F
177 $hex = sprintf("%x", 3735928559); # lower case a-f
6761e064 178
ac9dac7f 179Using C<unpack>:
6761e064 180
ac9dac7f 181 $hex = unpack("H*", pack("N", 3735928559));
6761e064 182
ac9dac7f 183Using C<Bit::Vector>:
6761e064 184
ac9dac7f
RGS
185 use Bit::Vector;
186 $vec = Bit::Vector->new_Dec(32, -559038737);
187 $hex = $vec->to_Hex();
6761e064 188
ac9dac7f 189And C<Bit::Vector> supports odd bit counts:
6761e064 190
ac9dac7f
RGS
191 use Bit::Vector;
192 $vec = Bit::Vector->new_Dec(33, 3735928559);
193 $vec->Resize(32); # suppress leading 0 if unwanted
194 $hex = $vec->to_Hex();
6761e064 195
818c4caa 196=item How do I convert from octal to decimal
6761e064
JH
197
198Using Perl's built in conversion of numbers with leading zeros:
199
ac9dac7f 200 $dec = 033653337357; # note the leading 0!
6761e064 201
ac9dac7f 202Using the C<oct> function:
6761e064 203
ac9dac7f 204 $dec = oct("33653337357");
6761e064 205
ac9dac7f 206Using C<Bit::Vector>:
6761e064 207
ac9dac7f
RGS
208 use Bit::Vector;
209 $vec = Bit::Vector->new(32);
210 $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
211 $dec = $vec->to_Dec();
6761e064 212
818c4caa 213=item How do I convert from decimal to octal
6761e064 214
ac9dac7f 215Using C<sprintf>:
6761e064 216
ac9dac7f 217 $oct = sprintf("%o", 3735928559);
6761e064 218
ac9dac7f 219Using C<Bit::Vector>:
6761e064 220
ac9dac7f
RGS
221 use Bit::Vector;
222 $vec = Bit::Vector->new_Dec(32, -559038737);
223 $oct = reverse join('', $vec->Chunk_List_Read(3));
6761e064 224
818c4caa 225=item How do I convert from binary to decimal
6761e064 226
2c646907 227Perl 5.6 and later let you write binary numbers directly with
ac9dac7f 228the C<0b> notation:
2c646907 229
ac9dac7f 230 $number = 0b10110110;
6f0efb17 231
ac9dac7f 232Using C<oct>:
6f0efb17 233
ac9dac7f
RGS
234 my $input = "10110110";
235 $decimal = oct( "0b$input" );
2c646907 236
ac9dac7f 237Using C<pack> and C<ord>:
d92eb7b0 238
ac9dac7f 239 $decimal = ord(pack('B8', '10110110'));
68dc0745 240
ac9dac7f 241Using C<pack> and C<unpack> for larger strings:
6761e064 242
ac9dac7f 243 $int = unpack("N", pack("B32",
6761e064 244 substr("0" x 32 . "11110101011011011111011101111", -32)));
ac9dac7f 245 $dec = sprintf("%d", $int);
6761e064 246
ac9dac7f 247 # substr() is used to left pad a 32 character string with zeros.
6761e064 248
ac9dac7f 249Using C<Bit::Vector>:
6761e064 250
ac9dac7f
RGS
251 $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
252 $dec = $vec->to_Dec();
6761e064 253
818c4caa 254=item How do I convert from decimal to binary
6761e064 255
ac9dac7f 256Using C<sprintf> (perl 5.6+):
4dfcc30b 257
ac9dac7f 258 $bin = sprintf("%b", 3735928559);
4dfcc30b 259
ac9dac7f 260Using C<unpack>:
6761e064 261
ac9dac7f 262 $bin = unpack("B*", pack("N", 3735928559));
6761e064 263
ac9dac7f 264Using C<Bit::Vector>:
6761e064 265
ac9dac7f
RGS
266 use Bit::Vector;
267 $vec = Bit::Vector->new_Dec(32, -559038737);
268 $bin = $vec->to_Bin();
6761e064
JH
269
270The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
271are left as an exercise to the inclined reader.
68dc0745 272
818c4caa 273=back
68dc0745 274
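For the remaining transformations, one way to go (a sketch using only built-ins) is to convert the input to a number with C<hex> or C<oct> and then format that number with C<sprintf>:

```perl
use strict;
use warnings;

my $number = hex("DEADBEEF");              # hex string -> number

my $bin = sprintf "%b", $number;           # hex -> binary
my $oct = sprintf "%o", $number;           # hex -> octal
my $hex = sprintf "%X", oct("0b$bin");     # binary -> hex

print "$bin\n$oct\n$hex\n";
```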
65acb1b1
TC
275=head2 Why doesn't & work the way I want it to?
276
277The behavior of the bitwise operators depends on whether they're
278used on numbers or strings. The operators treat a string as a series
279of bits and work with that (the string C<"3"> is the bit pattern
280C<00110011>). The operators work with the binary form of a number
281(the number C<3> is treated as the bit pattern C<00000011>).
282
283So, saying C<11 & 3> performs the "and" operation on numbers (yielding
49d635f9 284C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
65acb1b1
TC
285(yielding C<"1">).
286
287Most problems with C<&> and C<|> arise because the programmer thinks
288they have a number but really it's a string. The rest arise because
289the programmer says:
290
ac9dac7f
RGS
291 if ("\020\020" & "\101\101") {
292 # ...
293 }
65acb1b1
TC
294
295but a string consisting of two null bytes (the result of C<"\020\020"
296& "\101\101">) is not a false value in Perl. You need:
297
ac9dac7f
RGS
298 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
299 # ...
300 }
65acb1b1 301
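The difference is easy to see side by side. In this sketch, adding zero is one common way to make sure both operands are numbers; the variable names are ours, and the example assumes the traditional behavior of C<&> described above:

```perl
use strict;
use warnings;

my $numeric = 11 & 3;         # numbers: 1011 & 0011 gives 3
my $string  = "11" & "3";     # strings: works character by character, gives "1"

# Values read from a file or the command line arrive as strings.
my ( $left, $right ) = ( "11", "3" );
my $forced = ( $left + 0 ) & ( $right + 0 );    # numeric again, gives 3

print "$numeric $string $forced\n";    # prints "3 1 3"
```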
68dc0745 302=head2 How do I multiply matrices?
303
d12d61cf 304Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
305or the C<PDL> extension (also available from CPAN).
68dc0745 306
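If you only need to multiply a couple of small matrices and would rather not install a module, a plain-Perl sketch along these lines may be enough. This is our own helper working on arrays of array references, not the interface of any of the modules named above:

```perl
use strict;
use warnings;

# Multiply two matrices stored as references to arrays of rows.
sub matrix_multiply {
    my ( $left, $right ) = @_;
    my $rows  = @$left;
    my $inner = @$right;
    my $cols  = @{ $right->[0] };
    die "dimension mismatch\n" unless @{ $left->[0] } == $inner;

    my @product;
    for my $i ( 0 .. $rows - 1 ) {
        for my $j ( 0 .. $cols - 1 ) {
            my $sum = 0;
            $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $inner - 1;
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $product = matrix_multiply( [ [ 1, 2 ], [ 3, 4 ] ], [ [ 5, 6 ], [ 7, 8 ] ] );
print "@$_\n" for @$product;    # prints "19 22" then "43 50"
```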
307=head2 How do I perform an operation on a series of integers?
308
309To call a function on each element in an array, and collect the
310results, use:
311
ac9dac7f 312 @results = map { my_func($_) } @array;
68dc0745 313
314For example:
315
ac9dac7f 316 @triple = map { 3 * $_ } @single;
68dc0745 317
318To call a function on each element of an array, but ignore the
319results:
320
ac9dac7f
RGS
321 foreach $iterator (@array) {
322 some_func($iterator);
323 }
68dc0745 324
325To call a function on each integer in a (small) range, you B<can> use:
326
ac9dac7f 327 @results = map { some_func($_) } (5 .. 25);
68dc0745 328
329but you should be aware that the C<..> operator creates a list of
330all integers in the range. This can take a lot of memory for large
331ranges. Instead use:
332
ac9dac7f 333 @results = ();
eaffe51e 334 for ($i=5; $i <= 500_005; $i++) {
ac9dac7f
RGS
335 push(@results, some_func($i));
336 }
68dc0745 337
87275199
GS
338This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
339loop will iterate over the range, without creating the entire range.
340
ac9dac7f
RGS
341 for my $i (5 .. 500_005) {
342 push(@results, some_func($i));
343 }
87275199
GS
344
345will not create a list of 500,000 integers.
346
68dc0745 347=head2 How can I output Roman numerals?
348
d12d61cf 349Get the C<Roman> module from CPAN: L<http://www.cpan.org/modules/by-module/Roman>.
68dc0745 350
351=head2 Why aren't my random numbers random?
352
65acb1b1
TC
353If you're using a version of Perl before 5.004, you must call C<srand>
354once at the start of your program to seed the random number generator.
49d635f9 355
5cd0b561 356 BEGIN { srand() if $] < 5.004 }
49d635f9 357
65acb1b1 3585.004 and later automatically call C<srand> at the beginning. Don't
ac9dac7f
RGS
359call C<srand> more than once--you make your numbers less random,
360rather than more.
92c2ed05 361
65acb1b1 362Computers are good at being predictable and bad at being random
06a5f41f 363(despite appearances caused by bugs in your programs :-). The
49d635f9 364F<random> article in the "Far More Than You Ever Wanted To Know"
d12d61cf 365collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
ac9dac7f 366of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
06a5f41f 367who attempts to generate random numbers by deterministic means is, of
b432a672 368course, living in a state of sin."
65acb1b1
TC
369
370If you want numbers that are more random than C<rand> with C<srand>
ac9dac7f 371provides, you should also check out the C<Math::TrulyRandom> module from
65acb1b1
TC
372CPAN. It uses the imperfections in your system's timer to generate
373random numbers, but this takes quite a while. If you want a better
92c2ed05 374pseudorandom generator than comes with your operating system, look at
d12d61cf 375"Numerical Recipes in C" at L<http://www.nr.com/>.
68dc0745 376
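To see just how predictable the generator is, you can seed it twice with the same value. This sketch (the seed 42 is arbitrary) replays the same sequence both times, which is handy for repeatable tests and exactly what you do not want in production:

```perl
use strict;
use warnings;

srand(42);
my @first = map { int rand 100 } 1 .. 5;

srand(42);    # reseeding with the same value replays the same sequence
my @second = map { int rand 100 } 1 .. 5;

print "first:  @first\n";
print "second: @second\n";
```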
881bdbd4
JH
377=head2 How do I get a random number between X and Y?
378
ee891a00 379To get a random number between two values, you can use the C<rand()>
109f0441 380built-in to get a random number between 0 and 1. From there, you shift
ee891a00 381that into the range that you want.
500071f4 382
ee891a00
RGS
383C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
384what you want to have perl figure out is a random number in the range
385from 0 to the difference between your I<X> and I<Y>.
793f5136 386
ee891a00
RGS
387That is, to get a number between 10 and 15, inclusive, you want a
388random number between 0 and 5 that you can then add to 10.
793f5136 389
109f0441 390 my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )
793f5136
RGS
391
392Hence you derive the following simple function to abstract
393that. It selects a random integer between the two given
500071f4
RGS
394integers (inclusive). For example: C<random_int_between(50,120)>.
395
ac9dac7f 396 sub random_int_between {
500071f4
RGS
397 my($min, $max) = @_;
398 # Assumes that the two arguments are integers themselves!
399 return $min if $min == $max;
400 ($min, $max) = ($max, $min) if $min > $max;
401 return $min + int rand(1 + $max - $min);
402 }
881bdbd4 403
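If you want a floating-point number rather than an integer, the same idea works without C<int>. The function below is a hypothetical companion to C<random_int_between>; it returns a value that is at least the lower bound and strictly less than the upper bound:

```perl
use strict;
use warnings;

sub random_number_between {
    my ( $min, $max ) = @_;
    return $min if $min == $max;    # rand(0) would behave like rand(1)
    ( $min, $max ) = ( $max, $min ) if $min > $max;
    return $min + rand( $max - $min );
}

printf "%.3f\n", random_number_between( 2.5, 7.5 );
```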
68dc0745 404=head1 Data: Dates
405
5cd0b561 406=head2 How do I find the day or week of the year?
68dc0745 407
d12d61cf 408The C<localtime> function returns the day of the year. Without an
409argument C<localtime> uses the current time.
68dc0745 410
92435912 411 my $day_of_year = (localtime)[7];
ffc145e8 412
ac9dac7f 413The C<POSIX> module can also format a date as the day of the year or
5cd0b561 414week of the year.
68dc0745 415
5cd0b561
RGS
416 use POSIX qw/strftime/;
417 my $day_of_year = strftime "%j", localtime;
418 my $week_of_year = strftime "%W", localtime;
419
ac9dac7f 420To get the day or week of the year for any date, use C<POSIX>'s C<mktime> to get
d12d61cf 421a time in epoch seconds for the argument to C<localtime>.
ffc145e8 422
ac9dac7f 423 use POSIX qw/mktime strftime/;
6670e5e7 424 my $week_of_year = strftime "%W",
ac9dac7f 425 localtime( mktime( 0, 0, 0, 18, 11, 87 ) );
5cd0b561 426
92435912 427You can also use C<Time::Piece>, which comes with Perl and provides a
428C<localtime> that returns an object:
429
430 use Time::Piece;
431 my $day_of_year = localtime->yday;
432 my $week_of_year = localtime->week;
433
434The C<Date::Calc> module provides two functions to calculate these too:
5cd0b561
RGS
435
436 use Date::Calc qw( Day_of_Year Week_of_Year );
437 my $day_of_year = Day_of_Year( 1987, 12, 18 );
438 my $week_of_year = Week_of_Year( 1987, 12, 18 );
ffc145e8 439
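C<Time::Piece> can also handle an arbitrary date if you parse it with C<strptime> first. In this sketch the date string and its format are only examples. Note that C<yday> counts from 0 for January 1 and that C<week> is the ISO 8601 week, so the numbers need not agree with C<strftime>'s C<%j> and C<%W>:

```perl
use strict;
use warnings;
use Time::Piece;

my $date = Time::Piece->strptime( "1987-12-18", "%Y-%m-%d" );

my $day_of_year  = $date->yday;    # counts from 0 for January 1
my $week_of_year = $date->week;    # ISO 8601 week number

print "day $day_of_year, week $week_of_year\n";
```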
d92eb7b0
GS
440=head2 How do I find the current century or millennium?
441
442Use the following simple functions:
443
ac9dac7f
RGS
444 sub get_century {
445 return int((((localtime(shift || time))[5] + 1999))/100);
446 }
6670e5e7 447
ac9dac7f
RGS
448 sub get_millennium {
449 return 1+int((((localtime(shift || time))[5] + 1899))/1000);
450 }
d92eb7b0 451
ac9dac7f
RGS
452On some systems, the C<POSIX> module's C<strftime()> function has been
453extended in a non-standard way to use a C<%C> format, which they
454sometimes claim is the "century". It isn't, because on most such
455systems, this is only the first two digits of the four-digit year, and
456thus cannot be used to reliably determine the current century or
457millennium.
d92eb7b0 458
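If you already have a four-digit year in hand, the same arithmetic can be written directly. The function names here are ours, and the sketch follows the convention used above, in which the year 2000 still belongs to the 20th century and the second millennium:

```perl
use strict;
use warnings;

sub century_of {
    my ($year) = @_;
    return int( ( $year + 99 ) / 100 );
}

sub millennium_of {
    my ($year) = @_;
    return int( ( $year + 999 ) / 1000 );
}

printf "%d: century %d, millennium %d\n", $_, century_of($_), millennium_of($_)
    for 2000, 2001;
```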
92c2ed05 459=head2 How can I compare two dates and find the difference?
68dc0745 460
b68463f7
RGS
461(contributed by brian d foy)
462
ac9dac7f 463You could just store all your dates as a number and then subtract.
92435912 464Life isn't always that simple though.
465
466The C<Time::Piece> module, which comes with Perl, replaces C<localtime>
467with a version that returns an object. It also overloads the comparison
468operators so you can compare them directly:
469
470 use Time::Piece;
471 my $date1 = localtime( $some_time );
472 my $date2 = localtime( $some_other_time );
473
474 if( $date1 < $date2 ) {
475 print "The date was in the past\n";
476 }
477
478You can also get differences with a subtraction, which returns a
479C<Time::Seconds> object:
480
481 my $diff = $date1 - $date2;
482 print "The difference is ", $diff->days, " days\n";
483
484If you want to work with formatted dates, the C<Date::Manip>,
485C<Date::Calc>, or C<DateTime> modules can help you.
68dc0745 486
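Here is a complete sketch of the C<Time::Piece> approach with two fixed dates, so that you can run it as it stands. The dates and the format string are just examples:

```perl
use strict;
use warnings;
use Time::Piece;

my $format = "%Y-%m-%d";
my $start  = Time::Piece->strptime( "2011-02-01", $format );
my $end    = Time::Piece->strptime( "2011-02-15", $format );

print "start is earlier\n" if $start < $end;

my $diff = $end - $start;            # a Time::Seconds object
printf "%d days\n", $diff->days;     # prints "14 days"
```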
487=head2 How can I take a string and turn it into epoch seconds?
488
489If it's a regular enough string that it always has the same format,
92c2ed05 490you can split it up and pass the parts to C<timelocal> in the standard
ac9dac7f 491C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>,
92435912 492C<Date::Parse>, and C<Date::Manip> modules from CPAN.
68dc0745 493
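For a fixed format, the split-and-convert step might look like the sketch below. It uses C<timegm>, also from C<Time::Local>, so that the result does not depend on your time zone; substitute C<timelocal> if the string is in local time. The timestamp layout is an assumption made for the example:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $stamp = "1987-12-18 00:00:00";

my ( $year, $month, $day, $hour, $min, $sec ) =
    $stamp =~ /\A(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)\z/
    or die "unexpected format: $stamp\n";

# Months are numbered from 0, as with localtime.
my $epoch = timegm( $sec, $min, $hour, $day, $month - 1, $year );

print "$epoch\n";    # prints 566784000
```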
494=head2 How can I find the Julian Day?
495
7678cced
RGS
496(contributed by brian d foy and Dave Cross)
497
92435912 498You can use the C<Time::Piece> module, part of the Standard Library,
499which can convert a date/time to a Julian Day:
7678cced 500
92435912 501 $ perl -MTime::Piece -le 'print localtime->julian_day'
502 2455607.7959375
7678cced 503
92435912 504Or the modified Julian Day:
7678cced 505
92435912 506 $ perl -MTime::Piece -le 'print localtime->mjd'
507 55607.2961226851
7678cced
RGS
508
509Or even the day of the year (which is what some people think of as a
92435912 510Julian day):
511
512 $ perl -MTime::Piece -le 'print localtime->yday'
513 45
514
515You can also do the same things with the C<DateTime> module:
7678cced 516
92435912 517 $ perl -MDateTime -le'print DateTime->today->jd'
518 2453401.5
519 $ perl -MDateTime -le'print DateTime->today->mjd'
520 53401
ac9dac7f
RGS
521 $ perl -MDateTime -le'print DateTime->today->doy'
522 31
be94a901 523
92435912 524You can also use the C<Time::JulianDay> module available on CPAN. Ensure
525that you really want to find a Julian day, though, as many people have
526different ideas about Julian days (see http://www.hermetic.ch/cal_stud/jdn.htm
527for instance):
528
529 $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
530 55608
531
65acb1b1 532=head2 How do I find yesterday's date?
109f0441
S
533X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
534X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
535X<timelocal>
65acb1b1 536
6670e5e7 537(contributed by brian d foy)
49d635f9 538
92435912 539To do it correctly, you can use one of the C<Date> modules since they
540work with calendars instead of times. The C<DateTime> module makes it
541simple, and gives you the same time of day, only the day before,
542despite daylight saving time changes:
49d635f9 543
6670e5e7 544 use DateTime;
58103a2e 545
6670e5e7 546 my $yesterday = DateTime->now->subtract( days => 1 );
58103a2e 547
6670e5e7 548 print "Yesterday was $yesterday\n";
49d635f9 549
ee891a00 550You can also use the C<Date::Calc> module using its C<Today_and_Now>
6670e5e7 551function.
49d635f9 552
6670e5e7 553 use Date::Calc qw( Today_and_Now Add_Delta_DHMS );
58103a2e 554
6670e5e7 555 my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );
58103a2e 556
ee891a00 557 print "@date_time\n";
58103a2e 558
6670e5e7
RGS
559Most people try to use the time rather than the calendar to figure out
560dates, but that assumes that days are twenty-four hours each. For
561most people, there are two days a year when they aren't: the switch to
92435912 562and from summer time throws this off. For example, the rest of the
563suggestions will be wrong sometimes:
564
565Starting with Perl 5.10, C<Time::Piece> and C<Time::Seconds> are part
566of the standard distribution, so you might think that you could do
567something like this:
568
569 use Time::Piece;
570 use Time::Seconds;
571
572 my $yesterday = localtime() - ONE_DAY; # WRONG
573 print "Yesterday was $yesterday\n";
574
575The C<Time::Piece> module exports a new C<localtime> that returns an
576object, and C<Time::Seconds> exports the C<ONE_DAY> constant that is a
577set number of seconds. This means that it always gives the time 24
578hours ago, which is not always yesterday. This can cause problems
579around the end of daylight saving time when there's one day that is 25
580hours long.
d92eb7b0 581
92435912 582You have the same problem with C<Time::Local>, which will give the wrong
583answer for those same special cases:
109f0441
S
584
585 # contributed by Gunnar Hjalmarsson
586 use Time::Local;
587 my $today = timelocal 0, 0, 12, ( localtime )[3..5];
92435912 588 my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
109f0441
S
589 printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
590
3bc3c5be 591=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?
592
593(contributed by brian d foy)
594
23bec515 595Perl itself never had a Y2K problem, although that never stopped people
3bc3c5be 596from creating Y2K problems on their own. See the documentation for
597C<localtime> for its proper use.
598
92435912 599Starting with Perl 5.12, C<localtime> and C<gmtime> can handle dates past
3bc3c5be 60003:14:08 January 19, 2038, when a 32-bit based time would overflow. You
601still might get a warning on a 32-bit C<perl>:
602
92435912 603 % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
3bc3c5be 604 Integer overflow in hexadecimal number at -e line 1.
605 Wed Nov 1 19:42:39 5576711
606
607On a 64-bit C<perl>, you can get even larger dates for those really long
608running projects:
609
92435912 610 % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
3bc3c5be 611 Thu Nov 2 00:42:39 5576711
612
701f2f01 613You're still out of luck if you need to keep track of decaying protons
3bc3c5be 614though.
5a964f20 615
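A quick way to check your own C<perl> is to ask for the first second that a signed 32-bit counter cannot hold. This sketch assumes Perl 5.12 or later:

```perl
use strict;
use warnings;

my $rollover = 2**31;               # one second past the 32-bit limit
my $stamp    = gmtime($rollover);   # scalar context gives a string

print "$stamp\n";    # prints "Tue Jan 19 03:14:08 2038"
```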
68dc0745 616=head1 Data: Strings
617
618=head2 How do I validate input?
619
6670e5e7
RGS
620(contributed by brian d foy)
621
622There are many ways to ensure that values are what you expect or
623want to accept. Besides the specific examples that we cover in the
624perlfaq, you can also look at the modules with "Assert" and "Validate"
625in their names, along with other modules such as C<Regexp::Common>.
626
627Some modules have validation for particular types of input, such
628as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
629and C<Data::Validate::IP>.
68dc0745 630
631=head2 How do I unescape a string?
632
b432a672 633It depends just what you mean by "escape". URL escapes are dealt
92c2ed05 634with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
a6dd486b 635character are removed with
68dc0745 636
ac9dac7f 637 s/\\(.)/$1/g;
68dc0745 638
92c2ed05 639This won't expand C<"\n"> or C<"\t"> or any other special escapes.
68dc0745 640
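For example, applied to a made-up file name with shell-style escapes, the substitution behaves like this:

```perl
use strict;
use warnings;

my $escaped = 'my\ file\*.txt';    # single quotes keep the backslashes

( my $plain = $escaped ) =~ s/\\(.)/$1/g;

print "$plain\n";    # prints "my file*.txt"
```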
641=head2 How do I remove consecutive pairs of characters?
642
6670e5e7
RGS
643(contributed by brian d foy)
644
645You can use the substitution operator to find pairs of characters (or
646runs of characters) and replace them with a single instance. In this
647substitution, we find a character in C<(.)>. The memory parentheses
d8b950dc 648store the matched character in the back-reference C<\g1> and we use
6670e5e7
RGS
649that to require that the same thing immediately follow it. We replace
650that part of the string with the character in C<$1>.
68dc0745 651
d8b950dc 652 s/(.)\g1/$1/g;
d92eb7b0 653
6670e5e7
RGS
654We can also use the transliteration operator, C<tr///>. In this
655example, the search list side of our C<tr///> contains nothing, but
656the C<c> option complements that so it contains everything. The
657replacement list also contains nothing, so the transliteration is
658almost a no-op since it won't do any replacements (or more exactly,
659replace the character with itself). However, the C<s> option squashes
660duplicated and consecutive characters in the string so a character
661does not show up next to itself:
d92eb7b0 662
6670e5e7 663 my $str = 'Haarlem'; # in the Netherlands
ac9dac7f 664 $str =~ tr///cs; # Now Harlem, like in New York
68dc0745 665
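The two approaches differ once a character appears three or more times in a row. In this sketch (the sample words are ours, and C<\g1> needs Perl 5.10 or later), the pair-wise substitution leaves one extra copy behind, while adding a C<+> or using C<tr///> with C<s> collapses the whole run:

```perl
use strict;
use warnings;

( my $pairs = "bookkeeper" ) =~ s/(.)\g1/$1/g;     # "bokeper"

( my $triple_pairs = "aaa" ) =~ s/(.)\g1/$1/g;     # "aa": only one pair is collapsed
( my $triple_runs  = "aaa" ) =~ s/(.)\g1+/$1/g;    # "a": the whole run is collapsed

( my $squashed = "bookkeeper" ) =~ tr/a-z//s;      # "bokeper"

print "$pairs $triple_pairs $triple_runs $squashed\n";
```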
666=head2 How do I expand function calls in a string?
667
6670e5e7
RGS
668(contributed by brian d foy)
669
670This is documented in L<perlref>, and although it's not the easiest
671thing to read, it does work. In each of these examples, we call the
58103a2e 672function inside the braces used to dereference a reference. If we
5ae37c3f 673have more than one return value, we can construct and dereference an
6670e5e7
RGS
674anonymous array. In this case, we call the function in list context.
675
58103a2e 676 print "The time values are @{ [localtime] }.\n";
6670e5e7
RGS
677
678If we want to call the function in scalar context, we have to do a bit
679more work. We can really have any code we like inside the braces, so
680we simply have to end with the scalar reference, although how you do
e573f903
RGS
681that is up to you, and you can use code inside the braces. Note that
682the use of parens creates a list context, so we need C<scalar> to
683force the scalar context on the function:
68dc0745 684
6670e5e7 685 print "The time is ${\(scalar localtime)}.\n";
58103a2e 686
6670e5e7 687 print "The time is ${ my $x = localtime; \$x }.\n";
58103a2e 688
6670e5e7
RGS
689If your function already returns a reference, you don't need to create
690the reference yourself.
691
692 sub timestamp { my $t = localtime; \$t }
58103a2e 693
6670e5e7 694 print "The time is ${ timestamp() }.\n";
58103a2e
RGS
695
696The C<Interpolation> module can also do a lot of magic for you. You can
697specify a variable name, in this case C<E>, to set up a tied hash that
698does the interpolation for you. It has several other methods to do this
699as well.
700
701 use Interpolation E => 'eval';
702 print "The time values are $E{localtime()}.\n";
703
704In most cases, it is probably easier to simply use string concatenation,
705which also forces scalar context.
6670e5e7 706
ac9dac7f 707 print "The time is " . localtime() . ".\n";
68dc0745 708
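The same array-dereference trick works with your own functions. Here C<double> is a hypothetical function standing in for whatever you want to call:

```perl
use strict;
use warnings;

sub double {
    my ($n) = @_;
    return 2 * $n;
}

# The function runs in list context inside the anonymous array.
my $text = "Twice 21 is @{[ double(21) ]}.";

print "$text\n";    # prints "Twice 21 is 42."
```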
68dc0745 709=head2 How do I find matching/nesting anything?
710
92c2ed05
GS
711This isn't something that can be done in one regular expression, no
712matter how complicated. To find something between two single
713characters, a pattern like C</x([^x]*)x/> will get the intervening
714bits in C<$1>. For multiple ones, then something more like
ac9dac7f 715C</alpha(.*?)omega/> would be needed. But none of these deals with
6670e5e7
RGS
716nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
717C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
718L<perlre/(??{ code })>. For other cases, you'll have to write a
719parser.
92c2ed05
GS
720
721If you are serious about writing a parser, there are a number of
6a2af475 722modules or oddities that will make your life a lot easier. There are
ac9dac7f
RGS
723the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
724C<Text::Balanced>; and the C<byacc> program. Starting with Perl 5.8,
725C<Text::Balanced> is part of the standard distribution.
68dc0745 726
92c2ed05
GS
727One simple destructive, inside-out approach that you might try is to
728pull out the smallest nesting parts one at a time:
5a964f20 729
ac9dac7f
RGS
730 while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
731 # do something with $1
732 }
5a964f20 733
65acb1b1
TC
734A more complicated and sneaky approach is to make Perl's regular
735expression engine do it for you. This is courtesy Dean Inada, and
736rather has the nature of an Obfuscated Perl Contest entry, but it
737really does work:
738
ac9dac7f
RGS
739 # $_ contains the string to parse
740 # BEGIN and END are the opening and closing markers for the
741 # nested text.
c47ff5f1 742
ac9dac7f
RGS
743 @( = ('(','');
744 @) = (')','');
745 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
746 @$ = (eval{/$re/},$@!~/unmatched/i);
747 print join("\n",@$[0..$#$]) if( $$[-1] );
65acb1b1 748
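If you can rely on Perl 5.10 or later, recursive patterns offer another route for simple balanced delimiters. The following is only a sketch for parentheses, and the sample text is ours; C<(?1)> re-enters the first capture group, so nested pairs are matched to any depth:

```perl
use strict;
use warnings;

my $text = "f(a, g(b, c), d) + h(e)";

# Group 1 matches "(", then any mix of non-parenthesis text and nested
# groups (via the recursive call (?1)), then ")".
my @groups = $text =~ / ( \( (?: [^()]++ | (?1) )* \) ) /xg;

print "$_\n" for @groups;    # prints "(a, g(b, c), d)" then "(e)"
```

This finds only the outermost groups; to look inside one, run the match again on its contents.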
68dc0745 749=head2 How do I reverse a string?
750
ac9dac7f 751Use C<reverse()> in scalar context, as documented in
68dc0745 752L<perlfunc/reverse>.
753
ac9dac7f 754 $reversed = reverse $string;
68dc0745 755
756=head2 How do I expand tabs in a string?
757
5a964f20 758You can do it yourself:
68dc0745 759
ac9dac7f 760 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
68dc0745 761
ac9dac7f 762Or you can just use the C<Text::Tabs> module (part of the standard Perl
68dc0745 763distribution).
764
ac9dac7f
RGS
765 use Text::Tabs;
766 @expanded_lines = expand(@lines_with_tabs);
68dc0745 767
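With the default tab stop of 8 columns, each tab is replaced by however many spaces it takes to reach the next multiple of 8. A short sketch with made-up input:

```perl
use strict;
use warnings;
use Text::Tabs;

my @expanded = expand( "a\tb", "ab\tc" );

# Both lines now place the second field at column 8.
print "[$_]\n" for @expanded;
```

You can change the spacing by setting C<$tabstop>, which C<Text::Tabs> exports.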
768=head2 How do I reformat a paragraph?
769
ac9dac7f 770Use C<Text::Wrap> (part of the standard Perl distribution):
68dc0745 771
ac9dac7f
RGS
772 use Text::Wrap;
773 print wrap("\t", ' ', @paragraphs);
68dc0745 774
ac9dac7f
RGS
775The paragraphs you give to C<Text::Wrap> should not contain embedded
776newlines. C<Text::Wrap> doesn't justify the lines (flush-right).
46fc3d4c 777
ac9dac7f
RGS
778Or use the CPAN module C<Text::Autoformat>. Formatting files can be
779easily done by making a shell alias, like so:
bc06af74 780
ac9dac7f
RGS
781 alias fmt="perl -i -MText::Autoformat -n0777 \
782 -e 'print autoformat $_, {all=>1}' $*"
bc06af74 783
ac9dac7f 784See the documentation for C<Text::Autoformat> to appreciate its many
bc06af74
JH
785capabilities.
786
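To control the line width with C<Text::Wrap>, set C<$Text::Wrap::columns> before calling C<wrap>. In this sketch the width of 20 and the sample sentence are arbitrary; lines come out no longer than one less than the column setting:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns = 20;

my $sentence = "The quick brown fox jumps over the lazy dog";
my $wrapped  = wrap( '', '', $sentence );    # no indent on any line

print "$wrapped\n";
```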
49d635f9 787=head2 How can I access or change N characters of a string?
68dc0745 788
49d635f9
RGS
789You can access the first characters of a string with C<substr()>.
790To get the first character, for example, start at position 0
197aec24 791and grab the string of length 1.
68dc0745 792
68dc0745 793
49d635f9 794 $string = "Just another Perl Hacker";
ac9dac7f 795 $first_char = substr( $string, 0, 1 ); # 'J'
68dc0745 796
49d635f9
RGS
797To change part of a string, you can use the optional fourth
798argument which is the replacement string.
68dc0745 799
ac9dac7f 800 substr( $string, 13, 4, "Perl 5.8.0" );
197aec24 801
49d635f9 802You can also use substr() as an lvalue.
68dc0745 803
ac9dac7f 804 substr( $string, 13, 4 ) = "Perl 5.8.0";
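
Here is a short sketch of two more C<substr> behaviors, using the same sample string: a negative offset counts from the end, and the four-argument form returns the text it replaced. The replacement word is only an illustration.

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# A negative offset counts back from the end of the string.
my $last_word = substr( $string, -6 );

# The four-argument form changes $string in place and returns the
# part that was replaced.
my $old = substr( $string, 13, 4, "Raku" );

print "$last_word\n";
print "$old\n";
print "$string\n";
```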
197aec24 805
68dc0745 806=head2 How do I change the Nth occurrence of something?
807
92c2ed05
GS
808You have to keep track of N yourself. For example, let's say you want
809to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
d92eb7b0
GS
810C<"whosoever"> or C<"whomsoever">, case insensitively. These
811all assume that $_ contains the string to be altered.
68dc0745 812
ac9dac7f
RGS
813 $count = 0;
814 s{((whom?)ever)}{
815 ++$count == 5 # is it the 5th?
816 ? "${2}soever" # yes, swap
817 : $1 # renege and leave it there
818 }ige;
68dc0745 819
5a964f20
TC
820In the more general case, you can use the C</g> modifier in a C<while>
821loop, keeping count of matches.
822
ac9dac7f
RGS
823 $WANT = 3;
824 $count = 0;
825 $_ = "One fish two fish red fish blue fish";
826 while (/(\w+)\s+fish\b/gi) {
827 if (++$count == $WANT) {
828 print "The third fish is a $1 one.\n";
829 }
830 }
5a964f20 831
92c2ed05 832That prints out: C<"The third fish is a red one."> You can also use a
5a964f20
TC
833repetition count and repeated pattern like this:
834
ac9dac7f 835 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
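
If you do this often, you could wrap the counting substitution in a subroutine. This is only a sketch: C<replace_nth> is a made-up name, not a built-in, and it treats the replacement as a literal string.

```perl
use strict;
use warnings;

# replace_nth() is a hypothetical helper. It works on a copy of the
# text and returns the changed copy.
sub replace_nth {
    my( $text, $pattern, $n, $replacement ) = @_;
    my $count = 0;
    $text =~ s{($pattern)}{
        ++$count == $n ? $replacement : $1   # swap only the Nth match
    }ge;
    return $text;
}

my $fish    = "One fish two fish red fish blue fish";
my $changed = replace_nth( $fish, qr/fish/, 3, "herring" );

print "$changed\n";
```

If the pattern matches fewer than N times, the text comes back unchanged.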
5a964f20 836
68dc0745 837=head2 How can I count the number of occurrences of a substring within a string?
838
a6dd486b 839There are a number of ways, with varying efficiency. If you want a
68dc0745 840count of a certain single character (X) within a string, you can use the
841C<tr///> function like so:
842
ac9dac7f
RGS
843 $string = "ThisXlineXhasXsomeXx'sXinXit";
844 $count = ($string =~ tr/X//);
845 print "There are $count X characters in the string";
68dc0745 846
847This is fine if you are just looking for a single character. However,
848if you are trying to count multiple character substrings within a
849larger string, C<tr///> won't work. What you can do is wrap a while()
850loop around a global pattern match. For example, let's count negative
851integers:
852
ac9dac7f
RGS
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
855 print "There are $count negative numbers in the string";
68dc0745 856
881bdbd4
JH
857Another version uses a global match in list context, then assigns the
858result to a scalar, producing a count of the number of matches.
859
860 $count = () = $string =~ /-\d+/g;
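
The three approaches can be put side by side in one small sketch. The C<count_substr> helper is hypothetical; it counts non-overlapping occurrences of a literal substring with C<index>, which avoids having to quote regex metacharacters.

```perl
use strict;
use warnings;

# Single characters: tr/// returns how many characters it matched.
my $x_string = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count  = ( $x_string =~ tr/X// );

# Patterns: a list-context match assigned through an empty list.
my $numbers  = "-9 55 48 -2 23 -76 4 14 -44";
my $negative = () = $numbers =~ /-\d+/g;

# Literal substrings: step through the string with index().
sub count_substr {
    my( $haystack, $needle ) = @_;
    return 0 unless length $needle;      # avoid looping forever
    my( $count, $pos ) = ( 0, 0 );
    while( ( $pos = index( $haystack, $needle, $pos ) ) != -1 ) {
        $count++;
        $pos += length $needle;          # skip past this occurrence
    }
    return $count;
}

my $abab = count_substr( "abababab", "abab" );
print "$x_count $negative $abab\n";
```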
861
109f0441
S
862=head2 How do I capitalize all the words on one line?
863X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
5a964f20 864
109f0441 865(contributed by brian d foy)
65acb1b1 866
109f0441
S
867Damian Conway's L<Text::Autoformat> handles all of the thinking
868for you.
369b44b4 869
ac9dac7f
RGS
870 use Text::Autoformat;
871 my $x = "Dr. Strangelove or: How I Learned to Stop ".
872 "Worrying and Love the Bomb";
369b44b4 873
ac9dac7f
RGS
874 print $x, "\n";
875 for my $style (qw( sentence title highlight )) {
876 print autoformat($x, { case => $style }), "\n";
877 }
369b44b4 878
109f0441
S
879How do you want to capitalize those words?
880
881 FRED AND BARNEY'S LODGE # all uppercase
882 Fred And Barney's Lodge # title case
883 Fred and Barney's Lodge # highlight case
884
885It's not as easy a problem as it looks. How many words do you think
886are in there? Wait for it... wait for it.... If you answered 5
887you're right. Perl words are groups of C<\w+>, but that's not what
888you want to capitalize. How is Perl supposed to know not to capitalize
889that C<s> after the apostrophe? You could try a regular expression:
890
891 $string =~ s/ (
892 (^\w) #at the beginning of the line
893 | # or
894 (\s\w) #preceded by whitespace
895 )
896 /\U$1/xg;
897
898 $string =~ s/([\w']+)/\u\L$1/g;
899
900Now, what if you don't want to capitalize that "and"? Just use
901L<Text::Autoformat> and get on with the next problem. :)
902
49d635f9 903=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 904
ac9dac7f
RGS
905Several modules can handle this sort of parsing--C<Text::Balanced>,
906C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9
RGS
907
908Take the example case of trying to split a string that is
909comma-separated into its different fields. You can't use C<split(/,/)>
910because you shouldn't split if the comma is inside quotes. For
911example, take a data line like this:
68dc0745 912
ac9dac7f 913 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 914
915Due to the restriction of the quotes, this is a fairly complex
197aec24 916problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 917I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 918suggests (assuming your string is contained in C<$text>):
68dc0745 919
ac9dac7f
RGS
920 @new = ();
921 push(@new, $+) while $text =~ m{
922 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
923 | ([^,]+),?
924 | ,
925 }gx;
926 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 927
46fc3d4c 928If you want to represent quotation marks inside a
929quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
46fc3d4c 931
ac9dac7f
RGS
932Alternatively, the C<Text::ParseWords> module (part of the standard
933Perl distribution) lets you say:
68dc0745 934
ac9dac7f
RGS
935 use Text::ParseWords;
936 @new = quotewords(",", 0, $text);
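
As a quick check, here is a sketch that runs C<quotewords> over the sample line from above. Bear in mind that C<Text::ParseWords> follows shell-like quoting rules, so an apostrophe in the data also counts as a quote character; for real CSV files one of the C<Text::CSV> modules is probably the safer choice.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A false second argument tells quotewords() to strip the quotes.
my @new = quotewords( ",", 0, $text );

printf "%d fields\n", scalar @new;
print "$new[2]\n";
```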
65acb1b1 937
68dc0745 938=head2 How do I strip blank space from the beginning/end of a string?
939
6670e5e7 940(contributed by brian d foy)
68dc0745 941
6670e5e7
RGS
942A substitution can do this for you. For a single line, you want to
943replace all the leading or trailing whitespace with nothing. You
960c6898 944can do that with a pair of substitutions:
68dc0745 945
6670e5e7
RGS
946 s/^\s+//;
947 s/\s+$//;
68dc0745 948
6670e5e7
RGS
949You can also write that as a single substitution, although it turns
950out the combined statement is slower than the separate ones. That
960c6898 951might not matter to you, though:
68dc0745 952
6670e5e7 953 s/^\s+|\s+$//g;
68dc0745 954
6670e5e7
RGS
In this regular expression, the alternation matches either at the
beginning or the end of the string, since the alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
958makes all possible matches, so it gets both. Remember, the trailing
959newline matches the C<\s+>, and the C<$> anchor can match to the
960c6898 960absolute end of the string, so the newline disappears too. Just add
6670e5e7
RGS
961the newline to the output, which has the added benefit of preserving
962"blank" (consisting entirely of whitespace) lines which the C<^\s+>
960c6898 963would remove all by itself:
68dc0745 964
960c6898 965 while( <> ) {
6670e5e7
RGS
966 s/^\s+|\s+$//g;
967 print "$_\n";
968 }
5a964f20 969
960c6898 970For a multi-line string, you can apply the regular expression to each
971logical line in the string by adding the C</m> flag (for
6670e5e7 972"multi-line"). With the C</m> flag, the C<$> matches I<before> an
960c6898 973embedded newline, so it doesn't remove it. This pattern still removes
974the newline at the end of the string:
6670e5e7 975
ac9dac7f 976 $string =~ s/^\s+|\s+$//gm;
6670e5e7
RGS
977
978Remember that lines consisting entirely of whitespace will disappear,
979since the first part of the alternation can match the entire string
960c6898 980and replace it with nothing. If you need to keep embedded blank lines,
6670e5e7 981you have to do a little more work. Instead of matching any whitespace
960c6898 982(since that includes a newline), just match the other whitespace:
6670e5e7
RGS
983
984 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
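
If you need this in several places, the pair of substitutions fits neatly into a subroutine. This is only a sketch; C<trim> is a hypothetical name, and it works on a copy so the caller's string is left alone.

```perl
use strict;
use warnings;

# trim() is a hypothetical helper; it returns a trimmed copy.
sub trim {
    my( $string ) = @_;
    $string =~ s/^\s+//;    # leading whitespace
    $string =~ s/\s+$//;    # trailing whitespace, including a newline
    return $string;
}

my $trimmed = trim( "  \t hello world \n" );
print "[$trimmed]\n";
```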
5a964f20 985
65acb1b1
TC
986=head2 How do I pad a string with blanks or pad a number with zeroes?
987
65acb1b1 988In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0
GS
989to pad the string, C<$text> or C<$num> contains the string to be padded,
990and C<$pad_char> contains the padding character. You can use a single
991character string constant instead of the C<$pad_char> variable if you
992know what it is in advance. And in the same way you can use an integer in
993place of C<$pad_len> if you know the pad length in advance.
65acb1b1 994
d92eb7b0
GS
995The simplest method uses the C<sprintf> function. It can pad on the left
996or right with blanks and on the left with zeroes and it will not
997truncate the result. The C<pack> function can only pad strings on the
998right with blanks and it will truncate the result to a maximum length of
999C<$pad_len>.
65acb1b1 1000
ac9dac7f 1001 # Left padding a string with blanks (no truncation):
04d666b1
RGS
1002 $padded = sprintf("%${pad_len}s", $text);
1003 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 1004
ac9dac7f 1005 # Right padding a string with blanks (no truncation):
04d666b1
RGS
1006 $padded = sprintf("%-${pad_len}s", $text);
1007 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 1008
ac9dac7f 1009 # Left padding a number with 0 (no truncation):
04d666b1
RGS
1010 $padded = sprintf("%0${pad_len}d", $num);
1011 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 1012
ac9dac7f
RGS
1013 # Right padding a string with blanks using pack (will truncate):
1014 $padded = pack("A$pad_len",$text);
65acb1b1 1015
d92eb7b0
GS
1016If you need to pad with a character other than blank or zero you can use
1017one of the following methods. They all generate a pad string with the
1018C<x> operator and combine that with C<$text>. These methods do
1019not truncate C<$text>.
65acb1b1 1020
d92eb7b0 1021Left and right padding with any character, creating a new string:
65acb1b1 1022
ac9dac7f
RGS
1023 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
1024 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 1025
d92eb7b0 1026Left and right padding with any character, modifying C<$text> directly:
65acb1b1 1027
ac9dac7f
RGS
1028 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
1029 $text .= $pad_char x ( $pad_len - length( $text ) );
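
One thing to watch with the C<x> approach is text that is already longer than C<$pad_len>: the repeat count goes negative, which yields an empty string but also triggers a warning on newer perls when warnings are enabled. A minimal sketch that guards against that, using the hypothetical helper C<pad_left>:

```perl
use strict;
use warnings;

# pad_left() is a hypothetical helper; it never truncates.
sub pad_left {
    my( $text, $pad_len, $pad_char ) = @_;
    my $need = $pad_len - length $text;
    $need = 0 if $need < 0;    # already long enough, add nothing
    return $pad_char x $need . $text;
}

print pad_left( "42", 6, "*" ), "\n";
print pad_left( "1234567", 6, "*" ), "\n";

# For comparison, the sprintf forms for zeroes and blanks:
my $zeroes = sprintf( "%05d", 42 );
my $blanks = sprintf( "%-6s|", "ab" );
print "$zeroes\n$blanks\n";
```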
65acb1b1 1030
68dc0745 1031=head2 How do I extract selected columns from a string?
1032
e573f903
RGS
1033(contributed by brian d foy)
1034
d12d61cf 1035If you know the columns that contain the data, you can
e573f903
RGS
1036use C<substr> to extract a single column.
1037
1038 my $column = substr( $line, $start_column, $length );
1039
1040You can use C<split> if the columns are separated by whitespace or
1041some other delimiter, as long as whitespace or the delimiter cannot
1042appear as part of the data.
1043
1044 my $line = ' fred barney betty ';
1045 my @columns = split /\s+/, $line;
1046 # ( '', 'fred', 'barney', 'betty' );
1047
1048 my $line = 'fred||barney||betty';
1049 my @columns = split /\|/, $line;
1050 # ( 'fred', '', 'barney', '', 'betty' );
1051
1052If you want to work with comma-separated values, don't do this since
1053that format is a bit more complicated. Use one of the modules that
109f0441 1054handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
e573f903
RGS
1055C<Text::CSV_PP>.
1056
1057If you want to break apart an entire line of fixed columns, you can use
589a5df2 1058C<unpack> with the A (ASCII) format. By using a number after the format
e573f903
RGS
1059specifier, you can denote the column width. See the C<pack> and C<unpack>
1060entries in L<perlfunc> for more details.
1061
    my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1063
1064Note that spaces in the format argument to C<unpack> do not denote literal
1065spaces. If you have space separated data, you may want C<split> instead.
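
Here is a small runnable sketch of the fixed-width case. The sample line and its three eight-character columns are made up for illustration. Note that the template comes first and the data second.

```perl
use strict;
use warnings;

# Three columns, each exactly eight characters wide.
my $line = "fred    barney  betty   ";

# Each A8 takes eight characters and strips the trailing spaces
# from the field it returns.
my @fields = unpack( "A8 A8 A8", $line );

print join( "|", @fields ), "\n";
```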
68dc0745 1066
1067=head2 How do I find the soundex value of a string?
1068
7678cced
RGS
1069(contributed by brian d foy)
1070
You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1074
1075=head2 How can I expand variables in text strings?
1076
e573f903 1077(contributed by brian d foy)
5a964f20 1078
322be77c 1079If you can avoid it, don't, or if you can use a templating system,
c195e131
RGS
1080such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1081might even be able to get the job done with C<sprintf> or C<printf>:
1082
1083 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
322be77c
RGS
1084
1085However, for the one-off simple case where I don't want to pull out a
1086full templating system, I'll use a string that has two Perl scalar
1087variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:
e573f903
RGS
1089
1090 my $foo = 'Fred';
1091 my $bar = 'Barney';
1092 $string = 'Say hello to $foo and $bar';
1093
1094One way I can do this involves the substitution operator and a double
1095C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1097it with its value. C<$foo>, then, turns into 'Fred', and that's finally
c195e131 1098what's left in the string:
e573f903
RGS
1099
1100 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
322be77c 1101
e573f903 1102The C</e> will also silently ignore violations of strict, replacing
c195e131 1103undefined variable names with the empty string. Since I'm using the
109f0441 1104C</e> flag (twice even!), I have all of the same security problems I
c195e131
RGS
1105have with C<eval> in its string form. If there's something odd in
1106C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1107I could get myself in trouble.
1108
1109To get around the security problem, I could also pull the values from
1110a hash instead of evaluating variable names. Using a single C</e>, I
1111can check the hash to ensure the value exists, and if it doesn't, I
1112can replace the missing value with a marker, in this case C<???> to
1113signal that I missed something:
e573f903
RGS
1114
1115 my $string = 'This has $foo and $bar';
109f0441 1116
e573f903
RGS
1117 my %Replacements = (
1118 foo => 'Fred',
ac9dac7f 1119 );
322be77c 1120
e573f903
RGS
1121 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1122 $string =~ s/\$(\w+)/
1123 exists $Replacements{$1} ? $Replacements{$1} : '???'
1124 /eg;
322be77c 1125
e573f903 1126 print $string;
322be77c 1127
68dc0745 1128=head2 What's wrong with always quoting "$vars"?
1129
ac9dac7f 1130The problem is that those double-quotes force
e573f903
RGS
1131stringification--coercing numbers and references into strings--even
1132when you don't want them to be strings. Think of it this way:
1133double-quote expansion is used to produce new strings. If you already
1134have a string, why do you need more?
68dc0745 1135
1136If you get used to writing odd things like these:
1137
ac9dac7f
RGS
1138 print "$var"; # BAD
1139 $new = "$old"; # BAD
1140 somefunc("$var"); # BAD
68dc0745 1141
1142You'll be in trouble. Those should (in 99.8% of the cases) be
1143the simpler and more direct:
1144
ac9dac7f
RGS
1145 print $var;
1146 $new = $old;
1147 somefunc($var);
68dc0745 1148
1149Otherwise, besides slowing you down, you're going to break code when
1150the thing in the scalar is actually neither a string nor a number, but
1151a reference:
1152
ac9dac7f
RGS
1153 func(\@array);
1154 sub func {
1155 my $aref = shift;
1156 my $oref = "$aref"; # WRONG
1157 }
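
To see what goes wrong, here is a minimal sketch. Once the reference has been through double quotes it is only a string that looks like C<ARRAY(0x...)>, and under C<use strict> trying to dereference it is a fatal error.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $copy  = "$aref";    # now a plain string, no longer a reference

print ref( $aref ) ? "reference\n" : "plain string\n";
print ref( $copy ) ? "reference\n" : "plain string\n";

# Under strict refs, dereferencing the string dies, so trap it.
my $ok = eval { my @again = @$copy; 1 };
print "dereferencing the copy failed\n" unless $ok;
```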
68dc0745 1158
1159You can also get into subtle problems on those few operations in Perl
1160that actually do care about the difference between a string and a
1161number, such as the magical C<++> autoincrement operator or the
1162syscall() function.
1163
197aec24 1164Stringification also destroys arrays.
5a964f20 1165
ac9dac7f
RGS
1166 @lines = `command`;
1167 print "@lines"; # WRONG - extra blanks
1168 print @lines; # right
5a964f20 1169
04d666b1 1170=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1171
1172Check for these three things:
1173
1174=over 4
1175
04d666b1 1176=item There must be no space after the E<lt>E<lt> part.
68dc0745 1177
197aec24 1178=item There (probably) should be a semicolon at the end.
68dc0745 1179
197aec24 1180=item You can't (easily) have any space in front of the tag.
68dc0745 1181
1182=back
1183
197aec24 1184If you want to indent the text in the here document, you
5a964f20
TC
1185can do this:
1186
1187 # all in one
1188 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1189 your text
1190 goes here
1191 HERE_TARGET
1192
1193But the HERE_TARGET must still be flush against the margin.
197aec24 1194If you want that indented also, you'll have to quote
5a964f20
TC
1195in the indentation.
1196
1197 ($quote = <<' FINIS') =~ s/^\s+//gm;
1198 ...we will have peace, when you and all your works have
1199 perished--and the works of your dark master to whom you
1200 would deliver us. You are a liar, Saruman, and a corrupter
1201 of men's hearts. --Theoden in /usr/src/perl/taint.c
1202 FINIS
83ded9ee 1203 $quote =~ s/\s+--/\n--/;
5a964f20
TC
1204
1205A nice general-purpose fixer-upper function for indented here documents
1206follows. It expects to be called with a here document as its argument.
1207It looks to see whether each line begins with a common substring, and
a6dd486b
JB
1208if so, strips that substring off. Otherwise, it takes the amount of leading
1209whitespace found on the first line and removes that much off each
5a964f20
TC
1210subsequent line.
1211
1212 sub fix {
1213 local $_ = shift;
a6dd486b 1214 my ($white, $leader); # common whitespace and common leading string
d8b950dc 1215 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
5a964f20
TC
1216 ($white, $leader) = ($2, quotemeta($1));
1217 } else {
1218 ($white, $leader) = (/^(\s+)/, '');
1219 }
1220 s/^\s*?$leader(?:$white)?//gm;
1221 return $_;
1222 }
1223
c8db1d39 1224This works with leading special strings, dynamically determined:
5a964f20 1225
ac9dac7f 1226 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20
TC
1227 @@@ int
1228 @@@ runops() {
1229 @@@ SAVEI32(runlevel);
1230 @@@ runlevel++;
d92eb7b0 1231 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20
TC
1232 @@@ TAINT_NOT;
1233 @@@ return 0;
1234 @@@ }
ac9dac7f 1235 MAIN_INTERPRETER_LOOP
5a964f20 1236
a6dd486b 1237Or with a fixed amount of leading whitespace, with remaining
5a964f20
TC
1238indentation correctly preserved:
1239
ac9dac7f 1240 $poem = fix<<EVER_ON_AND_ON;
5a964f20
TC
1241 Now far ahead the Road has gone,
1242 And I must follow, if I can,
1243 Pursuing it with eager feet,
1244 Until it joins some larger way
1245 Where many paths and errands meet.
1246 And whither then? I cannot say.
1247 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1248 EVER_ON_AND_ON
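
If you can rely on Perl 5.26 or later, the indented here-document form (a C<~> after the C<E<lt>E<lt>>) strips the leading whitespace for you and lets the terminator be indented too, so a helper such as C<fix> may not be needed. A minimal sketch, with a made-up subroutine name:

```perl
use v5.26;

sub poem {
    # The ~ removes the indentation found in front of the terminator
    # from every line; any extra indentation is kept.
    my $text = <<~EOT;
        Now far ahead the Road has gone,
          And I must follow, if I can,
        EOT
    return $text;
}

print poem();
```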
5a964f20 1249
68dc0745 1250=head1 Data: Arrays
1251
65acb1b1
TC
1252=head2 What is the difference between a list and an array?
1253
8d2e243f 1254(contributed by brian d foy)
1255
1256A list is a fixed collection of scalars. An array is a variable that
1257holds a variable collection of scalars. An array can supply its collection
1258for list operations, so list operations also work on arrays:
1259
1260 # slices
    ( 'dog', 'cat', 'bird' )[1,2];
    @animals[1,2];
1263
1264 # iteration
1265 foreach ( qw( dog cat bird ) ) { ... }
1266 foreach ( @animals ) { ... }
1267
1268 my @three = grep { length == 3 } qw( dog cat bird );
1269 my @three = grep { length == 3 } @animals;
d12d61cf 1270
8d2e243f 1271 # supply an argument list
1272 wash_animals( qw( dog cat bird ) );
1273 wash_animals( @animals );
1274
Array operations, which change the scalars, rearrange them, or add
or remove some scalars, only work on arrays. These can't work on a
1277list, which is fixed. Array operations include C<shift>, C<unshift>,
1278C<push>, C<pop>, and C<splice>.
1279
1280An array can also change its length:
1281
1282 $#animals = 1; # truncate to two elements
1283 $#animals = 10000; # pre-extend to 10,001 elements
1284
1285You can change an array element, but you can't change a list element:
1286
1287 $animals[0] = 'Rottweiler';
1288 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1289
1290 foreach ( @animals ) {
1291 s/^d/fr/; # works fine
1292 }
d12d61cf 1293
8d2e243f 1294 foreach ( qw( dog cat bird ) ) {
1295 s/^d/fr/; # Error! Modification of read only value!
1296 }
1297
If the list element is itself a variable, it may appear that you
can change a list element. However, the list element is the variable, not
1300the data. You're not changing the list element, but something the list
d12d61cf 1301element refers to. The list element itself doesn't change: it's still
8d2e243f 1302the same variable.
65acb1b1 1303
8d2e243f 1304You also have to be careful about context. You can assign an array to
1305a scalar to get the number of elements in the array. This only works
1306for arrays, though:
1307
1308 my $count = @animals; # only works with arrays
d12d61cf 1309
8d2e243f 1310If you try to do the same thing with what you think is a list, you
1311get a quite different result. Although it looks like you have a list
1312on the righthand side, Perl actually sees a bunch of scalars separated
1313by a comma:
65acb1b1 1314
8d2e243f 1315 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
65acb1b1 1316
8d2e243f 1317Since you're assigning to a scalar, the righthand side is in scalar
1318context. The comma operator (yes, it's an operator!) in scalar
1319context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to C<$scalar> its rightmost value. Many
c69ca1d4 1322people mess this up because they choose a list-lookalike whose
8d2e243f 1323last element is also the count they expect:
1324
1325 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
65acb1b1 1326
68dc0745 1327=head2 What is the difference between $array[1] and @array[1]?
1328
8d2e243f 1329(contributed by brian d foy)
1330
1331The difference is the sigil, that special character in front of the
1332array name. The C<$> sigil means "exactly one item", while the C<@>
1333sigil means "zero or more items". The C<$> gets you a single scalar,
1334while the C<@> gets you a list.
68dc0745 1335
8d2e243f 1336The confusion arises because people incorrectly assume that the sigil
1337denotes the variable type.
68dc0745 1338
8d2e243f 1339The C<$array[1]> is a single-element access to the array. It's going
1340to return the item in index 1 (or undef if there is no item there).
1341If you intend to get exactly one element from the array, this is the
1342form you should use.
68dc0745 1343
8d2e243f 1344The C<@array[1]> is an array slice, although it has only one index.
1345You can pull out multiple elements simultaneously by specifying
1346additional indices as a list, like C<@array[1,4,3,0]>.
68dc0745 1347
8d2e243f 1348Using a slice on the lefthand side of the assignment supplies list
d12d61cf 1349context to the righthand side. This can lead to unexpected results.
1350For instance, if you want to read a single line from a filehandle,
8d2e243f 1351assigning to a scalar value is fine:
68dc0745 1352
8d2e243f 1353 $array[1] = <STDIN>;
1354
1355However, in list context, the line input operator returns all of the
1356lines as a list. The first line goes into C<@array[1]> and the rest
1357of the lines mysteriously disappear:
1358
1359 @array[1] = <STDIN>; # most likely not what you want
1360
1361Either the C<use warnings> pragma or the B<-w> flag will warn you when
1362you use an array slice with a single index.
68dc0745 1363
d92eb7b0 1364=head2 How can I remove duplicate elements from a list or array?
68dc0745 1365
6670e5e7 1366(contributed by brian d foy)
68dc0745 1367
6670e5e7
RGS
1368Use a hash. When you think the words "unique" or "duplicated", think
1369"hash keys".
68dc0745 1370
6670e5e7
RGS
1371If you don't care about the order of the elements, you could just
1372create the hash then extract the keys. It's not important how you
1373create that hash: just that you use C<keys> to get the unique
1374elements.
551e1d92 1375
ac9dac7f
RGS
1376 my %hash = map { $_, 1 } @array;
1377 # or a hash slice: @hash{ @array } = ();
1378 # or a foreach: $hash{$_} = 1 foreach ( @array );
1379
1380 my @unique = keys %hash;
68dc0745 1381
ac9dac7f
RGS
1382If you want to use a module, try the C<uniq> function from
1383C<List::MoreUtils>. In list context it returns the unique elements,
1384preserving their order in the list. In scalar context, it returns the
1385number of unique elements.
1386
1387 use List::MoreUtils qw(uniq);
1388
1389 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1390 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1391
6670e5e7
RGS
1392You can also go through each element and skip the ones you've seen
1393before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
1395creates the key and immediately uses its value, which is C<undef>, so
1396the loop continues to the C<push> and increments the value for that
1397key. The next time the loop sees that same element, its key exists in
1398the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
1400next element.
551e1d92 1401
6670e5e7
RGS
1402 my @unique = ();
1403 my %seen = ();
68dc0745 1404
6670e5e7
RGS
1405 foreach my $elem ( @array )
1406 {
1407 next if $seen{ $elem }++;
1408 push @unique, $elem;
1409 }
68dc0745 1410
6670e5e7
RGS
1411You can write this more briefly using a grep, which does the
1412same thing.
68dc0745 1413
ac9dac7f
RGS
1414 my %seen = ();
1415 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1416
ddbc1f16 1417=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1418
109f0441 1419(portions of this answer contributed by Anno Siegel and brian d foy)
9e72e4c6 1420
5a964f20
TC
1421Hearing the word "in" is an I<in>dication that you probably should have
1422used a hash, not a list or array, to store your data. Hashes are
1423designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1424
109f0441
S
1425That being said, there are several ways to approach this. In Perl 5.10
1426and later, you can use the smart match operator to check that an item is
1427contained in an array or a hash:
1428
1429 use 5.010;
1430
1431 if( $item ~~ @array )
1432 {
1433 say "The array contains $item"
1434 }
1435
1436 if( $item ~~ %hash )
1437 {
1438 say "The hash contains $item"
1439 }
1440
1441With earlier versions of Perl, you have to do a bit more work. If you
5a964f20 1442are going to make this query many times over arbitrary string values,
881bdbd4 1443the fastest way is probably to invert the original array and maintain a
109f0441 1444hash whose keys are the first array's values:
68dc0745 1445
ac9dac7f
RGS
1446 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1447 %is_blue = ();
1448 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1449
ac9dac7f
RGS
1450Now you can check whether C<$is_blue{$some_color}>. It might have
1451been a good idea to keep the blues all in a hash in the first place.
68dc0745 1452
1453If the values are all small integers, you could use a simple indexed
1454array. This kind of an array will take up less space:
1455
ac9dac7f
RGS
1456 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1457 @is_tiny_prime = ();
1458 for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1460
Now you can check whether C<$is_tiny_prime[$some_number]> is true.
1462
1463If the values in question are integers instead of strings, you can save
1464quite a lot of space by using bit strings instead:
1465
ac9dac7f
RGS
1466 @articles = ( 1..10, 150..2000, 2017 );
1467 undef $read;
1468 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1469
1470Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1471
9e72e4c6
RGS
1472These methods guarantee fast individual tests but require a re-organization
1473of the original list or array. They only pay off if you have to test
1474multiple values against the same array.
68dc0745 1475
ac9dac7f 1476If you are testing only once, the standard module C<List::Util> exports
9e72e4c6 1477the function C<first> for this purpose. It works by stopping once it
c195e131 1478finds the element. It's written in C for speed, and its Perl equivalent
9e72e4c6 1479looks like this subroutine:
68dc0745 1480
9e72e4c6
RGS
1481 sub first (&@) {
1482 my $code = shift;
1483 foreach (@_) {
1484 return $_ if &{$code}();
1485 }
1486 undef;
1487 }
68dc0745 1488
9e72e4c6
RGS
1489If speed is of little concern, the common idiom uses grep in scalar context
1490(which returns the number of items that passed its condition) to traverse the
1491entire list. This does have the benefit of telling you how many matches it
1492found, though.
68dc0745 1493
9e72e4c6 1494 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1495
9e72e4c6
RGS
1496If you want to actually extract the matching elements, simply use grep in
1497list context.
68dc0745 1498
9e72e4c6 1499 my @matches = grep $_ eq $whatever, @array;
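
One caution with C<first>: it returns the matching element itself, and that element may be false (0 or the empty string), so test the result with C<defined>. If you only want a yes-or-no answer, C<List::Util> also provides C<any>. The sketch below assumes List::Util 1.33 or later for C<any>; the sample data is made up.

```perl
use strict;
use warnings;
use List::Util qw(first any);

my @array = ( 3, 0, 7 );

# first() returns the matching element, which here is a false value.
my $found = first { $_ == 0 } @array;
print "first() found a zero\n" if defined $found;

# any() stops at the first match and returns only true or false.
my $has_zero = any { $_ == 0 } @array;
my $has_nine = any { $_ == 9 } @array;
print "any() agrees\n" if $has_zero and not $has_nine;
```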
58103a2e 1500
68dc0745 1501=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1502
ac9dac7f
RGS
1503Use a hash. Here's code to do both and more. It assumes that each
1504element is unique in a given array:
68dc0745 1505
ac9dac7f
RGS
1506 @union = @intersection = @difference = ();
1507 %count = ();
1508 foreach $element (@array1, @array2) { $count{$element}++ }
1509 foreach $element (keys %count) {
1510 push @union, $element;
1511 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1512 }
68dc0745 1513
ac9dac7f
RGS
1514Note that this is the I<symmetric difference>, that is, all elements
1515in either A or in B but not in both. Think of it as an xor operation.
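
If you want the one-way difference instead (the elements of the first array that are missing from the second), a lookup hash built from the second array does the job. A minimal sketch with made-up data, which also keeps the order of the first array:

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

# Build a lookup table from the second array.
my %in_second = map { $_ => 1 } @array2;

my @only_in_first = grep { !$in_second{$_} } @array1;
my @intersection  = grep {  $in_second{$_} } @array1;

print "@only_in_first\n";
print "@intersection\n";
```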
d92eb7b0 1516
65acb1b1
TC
1517=head2 How do I test whether two arrays or hashes are equal?
1518
109f0441
S
1519With Perl 5.10 and later, the smart match operator can give you the answer
1520with the least amount of work:
1521
1522 use 5.010;
1523
1524 if( @array1 ~~ @array2 )
1525 {
1526 say "The arrays are the same";
1527 }
1528
1529 if( %hash1 ~~ %hash2 ) # doesn't check values!
1530 {
1531 say "The hash keys are the same";
1532 }
1533
ac9dac7f
RGS
1534The following code works for single-level arrays. It uses a
1535stringwise comparison, and does not distinguish defined versus
1536undefined empty strings. Modify if you have other needs.
65acb1b1 1537
ac9dac7f 1538 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1539
ac9dac7f
RGS
1540 sub compare_arrays {
1541 my ($first, $second) = @_;
1542 no warnings; # silence spurious -w undef complaints
1543 return 0 unless @$first == @$second;
1544 for (my $i = 0; $i < @$first; $i++) {
1545 return 0 if $first->[$i] ne $second->[$i];
1546 }
1547 return 1;
1548 }
65acb1b1
TC
1549
1550For multilevel structures, you may wish to use an approach more
ac9dac7f 1551like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1552
ac9dac7f
RGS
1553 use FreezeThaw qw(cmpStr);
1554 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1555
ac9dac7f
RGS
1556 printf "a and b contain %s arrays\n",
1557 cmpStr(\@a, \@b) == 0
1558 ? "the same"
1559 : "different";
65acb1b1 1560
ac9dac7f
RGS
1561This approach also works for comparing hashes. Here we'll demonstrate
1562two different answers:
65acb1b1 1563
ac9dac7f 1564 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1565
ac9dac7f
RGS
1566 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1567 $a{EXTRA} = \%b;
1568 $b{EXTRA} = \%a;
65acb1b1 1569
ac9dac7f 1570 printf "a and b contain %s hashes\n",
65acb1b1
TC
1571 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1572
ac9dac7f 1573 printf "a and b contain %s hashes\n",
65acb1b1
TC
1574 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1575
1576
The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
1580
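If you would rather not depend on a CPAN module, a single-level hash
comparison that also checks the values might be sketched as follows. The
C<compare_hashes> name is made up for this illustration. It compares
values as strings and, as a design choice of this sketch, treats C<undef>
as different from the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two single-level hashes have the same
# keys and stringwise-equal values.
sub compare_hashes {
    my ( $first, $second ) = @_;
    return 0 unless scalar( keys %$first ) == scalar( keys %$second );
    for my $key ( keys %$first ) {
        return 0 unless exists $second->{$key};
        my ( $x, $y ) = ( $first->{$key}, $second->{$key} );
        return 0 if defined($x) xor defined($y);    # undef matches only undef
        next unless defined $x;
        return 0 if $x ne $y;
    }
    return 1;
}

my %h1 = ( apple => 1, pear => 2 );
my %h2 = ( pear  => 2, apple => 1 );
my %h3 = ( apple => 1, pear => 3 );

print compare_hashes( \%h1, \%h2 ) ? "same\n" : "different\n";    # same
print compare_hashes( \%h1, \%h3 ) ? "same\n" : "different\n";    # different
```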
68dc0745 1581=head2 How do I find the first array element for which a condition is true?
1582
49d635f9 1583To find the first array element which satisfies a condition, you can
1584use the C<first()> function in the C<List::Util> module, which comes
1585with Perl 5.8. This example finds the first element that contains
1586"Perl".
49d635f9
RGS
1587
1588 use List::Util qw(first);
197aec24 1589
49d635f9 1590 my $element = first { /Perl/ } @array;
197aec24 1591
If you cannot use C<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with C<last>.
1594
1595 my $found;
ac9dac7f 1596 foreach ( @array ) {
6670e5e7 1597 if( /Perl/ ) { $found = $_; last }
49d635f9
RGS
1598 }
1599
1600If you want the array index, you can iterate through the indices
1601and check the array element at each index until you find one
1602that satisfies the condition.
1603
    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
        if( $array[$i] =~ /Perl/ ) {
            $found = $array[$i];
            $index = $i;
            last;
        }
    }
68dc0745 1612
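If C<List::Util> is available, the index search can also be written with
C<first()> applied to the indices rather than to the elements. This is
only a sketch with made-up sample data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( 'Python', 'Ruby', 'Perl 5', 'Perl 6' );    # hypothetical data

# first() returns the first index whose element matches, or undef
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

if ( defined $index ) {
    print "Found '$array[$index]' at index $index\n";
}
```

Test the result with C<defined>, since a match at index 0 is a false
value.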
1613=head2 How do I handle linked lists?
1614
159235ed 1615(contributed by brian d foy)
65acb1b1 1616
159235ed 1617Perl's arrays do not have a fixed size, so you don't need linked lists
1618if you just want to add or remove items. You can use array operations
1619such as C<push>, C<pop>, C<shift>, C<unshift>, or C<splice> to do
1620that.
1621
Sometimes, however, linked lists can be useful in situations where you
want to "shard" an array so you have many small arrays instead of
a single big array. You can keep arrays longer than Perl's largest
array index, lock smaller arrays separately in threaded programs,
reallocate less memory, or quickly insert elements in the middle of
the chain.
1628
1629Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
84adb724 1630Linked Lists" ( http://www.slideshare.net/lembark/perly-linked-lists ),
159235ed 1631although you can just use his C<LinkedList::Single> module.
68dc0745 1632
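For illustration only, a singly linked list can be sketched with nested
hash references. The C<value> and C<next> key names are assumptions of
this example and are not the interface of C<LinkedList::Single>:

```perl
use strict;
use warnings;

# Build the list back to front so each node can point at the one after it.
my $head;
for my $value ( reverse qw(a b c) ) {
    $head = { value => $value, next => $head };
}

# Insert 'x' after the first node without moving any other element.
$head->{next} = { value => 'x', next => $head->{next} };

# Walk the chain until we fall off the end.
my @seen;
for ( my $node = $head; $node; $node = $node->{next} ) {
    push @seen, $node->{value};
}
print "@seen\n";    # a x b c
```

The insertion touches only two references, which is the property that
makes a linked list attractive when you insert into the middle often.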
1633=head2 How do I handle circular lists?
1634X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1635X<cycle> X<modulus>
68dc0745 1636
1637(contributed by brian d foy)
1638
589a5df2 1639If you want to cycle through an array endlessly, you can increment the
109f0441 1640index modulo the number of elements in the array:
68dc0745 1641
109f0441
S
1642 my @array = qw( a b c );
1643 my $i = 0;
1644
1645 while( 1 ) {
1646 print $array[ $i++ % @array ], "\n";
1647 last if $i > 20;
1648 }
ac9dac7f 1649
109f0441
S
1650You can also use C<Tie::Cycle> to use a scalar that always has the
1651next element of the circular array:
ac9dac7f
RGS
1652
1653 use Tie::Cycle;
1654
1655 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1656
1657 print $cycle; # FFFFFF
1658 print $cycle; # 000000
1659 print $cycle; # FFFF00
68dc0745 1660
The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:
1663
1664 use Array::Iterator::Circular;
1665
1666 my $color_iterator = Array::Iterator::Circular->new(
1667 qw(red green blue orange)
1668 );
1669
1670 foreach ( 1 .. 20 ) {
1671 print $color_iterator->next, "\n";
1672 }
1673
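If you would rather not install a module, a closure can hide the index
arithmetic. This is a minimal sketch; the C<make_cycle> name is
hypothetical, and it assumes the list you pass in is not empty (an empty
list would make the modulus zero):

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that cycles through its arguments.
sub make_cycle {
    my @items = @_;
    my $i     = 0;
    return sub {
        my $item = $items[$i];
        $i = ( $i + 1 ) % @items;    # wrap around at the end
        return $item;
    };
}

my $next_color = make_cycle( qw(red green blue) );
my @picked     = map { $next_color->() } 1 .. 5;
print "@picked\n";    # red green blue red green
```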
68dc0745 1674=head2 How do I shuffle an array randomly?
1675
If you have Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or later,
installed, you can say:
1678
ac9dac7f 1679 use List::Util 'shuffle';
45bbf655
JH
1680
1681 @shuffled = shuffle(@list);
1682
f05bbc40 1683If not, you can use a Fisher-Yates shuffle.
5a964f20 1684
ac9dac7f
RGS
1685 sub fisher_yates_shuffle {
1686 my $deck = shift; # $deck is a reference to an array
109f0441
S
1687 return unless @$deck; # must not be empty!
1688
ac9dac7f
RGS
1689 my $i = @$deck;
1690 while (--$i) {
1691 my $j = int rand ($i+1);
1692 @$deck[$i,$j] = @$deck[$j,$i];
1693 }
1694 }
5a964f20 1695
ac9dac7f
RGS
1696 # shuffle my mpeg collection
1697 #
1698 my @mpeg = <audio/*/*.mp3>;
1699 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1700 print @mpeg;
5a964f20 1701
45bbf655 1702Note that the above implementation shuffles an array in place,
ac9dac7f 1703unlike the C<List::Util::shuffle()> which takes a list and returns
45bbf655
JH
1704a new shuffled list.
1705
You've probably seen shuffling algorithms that work using C<splice>,
randomly picking an element to move from the old list to the new one:
68dc0745 1708
ac9dac7f
RGS
1709 srand;
1710 @new = ();
1711 @old = 1 .. 10; # just a demo
1712 while (@old) {
1713 push(@new, splice(@old, rand @old, 1));
1714 }
68dc0745 1715
ac9dac7f
RGS
1716This is bad because splice is already O(N), and since you do it N
1717times, you just invented a quadratic algorithm; that is, O(N**2).
1718This does not scale, although Perl is so efficient that you probably
1719won't notice this until you have rather largish arrays.
68dc0745 1720
1721=head2 How do I process/modify each element of an array?
1722
1723Use C<for>/C<foreach>:
1724
ac9dac7f 1725 for (@lines) {
6670e5e7
RGS
1726 s/foo/bar/; # change that word
1727 tr/XZ/ZX/; # swap those letters
ac9dac7f 1728 }
68dc0745 1729
1730Here's another; let's compute spherical volumes:
1731
ac9dac7f 1732 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7
RGS
1733 $_ **= 3;
1734 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1735 }
197aec24 1736
ac9dac7f 1737which can also be done with C<map()> which is made to transform
49d635f9
RGS
1738one list into another:
1739
1740 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1741
76817d6d
JH
1742If you want to do the same thing to modify the values of the
1743hash, you can use the C<values> function. As of Perl 5.6
1744the values are not copied, so if you modify $orbit (in this
1745case), you modify the value.
5a964f20 1746
ac9dac7f 1747 for $orbit ( values %orbits ) {
6670e5e7 1748 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1749 }
818c4caa 1750
76817d6d
JH
1751Prior to perl 5.6 C<values> returned copies of the values,
1752so older perl code often contains constructions such as
1753C<@orbits{keys %orbits}> instead of C<values %orbits> where
1754the hash is to be modified.
818c4caa 1755
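A small sketch, using made-up data, shows that the loop variable really
is an alias for the stored value rather than a copy:

```perl
use strict;
use warnings;

my %orbits = ( mercury => 1, venus => 2 );    # hypothetical data

# Each $orbit is an alias for the value in the hash, not a copy
for my $orbit ( values %orbits ) {
    $orbit *= 10;
}

print "$_ => $orbits{$_}\n" for sort keys %orbits;
# mercury => 10
# venus => 20
```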
68dc0745 1756=head2 How do I select a random element from an array?
1757
ac9dac7f 1758Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1759
ac9dac7f
RGS
1760 $index = rand @array;
1761 $element = $array[$index];
68dc0745 1762
793f5136 1763Or, simply:
ac9dac7f
RGS
1764
1765 my $element = $array[ rand @array ];
5a964f20 1766
68dc0745 1767=head2 How do I permute N elements of a list?
c69ca1d4 1768X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
c195e131 1769X<The Art of Computer Programming> X<Fischer-Krause>
68dc0745 1770
c195e131 1771Use the C<List::Permutor> module on CPAN. If the list is actually an
ac9dac7f 1772array, try the C<Algorithm::Permute> module (also on CPAN). It's
c195e131 1773written in XS code and is very efficient:
49d635f9
RGS
1774
1775 use Algorithm::Permute;
c195e131 1776
49d635f9
RGS
1777 my @array = 'a'..'d';
1778 my $p_iterator = Algorithm::Permute->new ( \@array );
c195e131 1779
49d635f9
RGS
1780 while (my @perm = $p_iterator->next) {
1781 print "next permutation: (@perm)\n";
ac9dac7f 1782 }
49d635f9 1783
197aec24
RGS
1784For even faster execution, you could do:
1785
ac9dac7f 1786 use Algorithm::Permute;
c195e131 1787
ac9dac7f 1788 my @array = 'a'..'d';
c195e131 1789
ac9dac7f
RGS
1790 Algorithm::Permute::permute {
1791 print "next permutation: (@array)\n";
1792 } @array;
197aec24 1793
c195e131
RGS
1794Here's a little program that generates all permutations of all the
1795words on each line of input. The algorithm embodied in the
1796C<permute()> function is discussed in Volume 4 (still unpublished) of
1797Knuth's I<The Art of Computer Programming> and will work on any list:
49d635f9
RGS
1798
1799 #!/usr/bin/perl -n
ac003c96 1800 # Fischer-Krause ordered permutation generator
49d635f9
RGS
1801
1802 sub permute (&@) {
1803 my $code = shift;
1804 my @idx = 0..$#_;
1805 while ( $code->(@_[@idx]) ) {
1806 my $p = $#idx;
1807 --$p while $idx[$p-1] > $idx[$p];
1808 my $q = $p or return;
1809 push @idx, reverse splice @idx, $p;
1810 ++$q while $idx[$p-1] > $idx[$q];
1811 @idx[$p-1,$q]=@idx[$q,$p-1];
1812 }
68dc0745 1813 }
68dc0745 1814
c195e131
RGS
1815 permute { print "@_\n" } split;
1816
1817The C<Algorithm::Loops> module also provides the C<NextPermute> and
1818C<NextPermuteNum> functions which efficiently find all unique permutations
1819of an array, even if it contains duplicate values, modifying it in-place:
1820if its elements are in reverse-sorted order then the array is reversed,
1821making it sorted, and it returns false; otherwise the next
1822permutation is returned.
1823
1824C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1825you can enumerate all the permutations of C<0..9> like this:
1826
1827 use Algorithm::Loops qw(NextPermuteNum);
109f0441 1828
c195e131
RGS
1829 my @list= 0..9;
1830 do { print "@list\n" } while NextPermuteNum @list;
b8d2732a 1831
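When none of these modules is available and the list is short, a plain
recursive generator is easy to write. The sketch below is not the
algorithm used by any of the modules above, and the C<permute_recursive>
name is made up. It picks each remaining element in turn and recurses on
the rest:

```perl
use strict;
use warnings;

# Hypothetical helper: calls $code once for every ordering of @rest,
# building each permutation up in @$done.
sub permute_recursive {
    my ( $code, $done, @rest ) = @_;
    return $code->(@$done) unless @rest;
    for my $i ( 0 .. $#rest ) {
        my @remaining = @rest;
        my ($picked) = splice @remaining, $i, 1;
        permute_recursive( $code, [ @$done, $picked ], @remaining );
    }
}

my @results;
permute_recursive( sub { push @results, "@_" }, [], qw(a b c) );
print "$_\n" for @results;
```

It copies the remaining list at every level, so expect it to be much
slower than the XS-based C<Algorithm::Permute> on larger inputs.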
68dc0745 1832=head2 How do I sort an array by (anything)?
1833
1834Supply a comparison function to sort() (described in L<perlfunc/sort>):
1835
ac9dac7f 1836 @list = sort { $a <=> $b } @list;
68dc0745 1837
1838The default sort function is cmp, string comparison, which would
c47ff5f1 1839sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1840the numerical comparison operator.
1841
1842If you have a complicated function needed to pull out the part you
1843want to sort on, then don't do it inside the sort function. Pull it
1844out first, because the sort BLOCK can be called many times for the
1845same element. Here's an example of how to pull out the first word
1846after the first number on each item, and then sort those words
1847case-insensitively.
1848
ac9dac7f
RGS
1849 @idx = ();
1850 for (@data) {
1851 ($item) = /\d+\s*(\S+)/;
1852 push @idx, uc($item);
1853 }
1854 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1855
a6dd486b 1856which could also be written this way, using a trick
68dc0745 1857that's come to be known as the Schwartzian Transform:
1858
ac9dac7f
RGS
1859 @sorted = map { $_->[0] }
1860 sort { $a->[1] cmp $b->[1] }
1861 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1862
1863If you need to sort on several fields, the following paradigm is useful.
1864
ac9dac7f
RGS
1865 @sorted = sort {
1866 field1($a) <=> field1($b) ||
1867 field2($a) cmp field2($b) ||
1868 field3($a) cmp field3($b)
1869 } @data;
68dc0745 1870
1871This can be conveniently combined with precalculation of keys as given
1872above.
1873
379e39d7 1874See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1875To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1876more about this approach.
68dc0745 1877
ac9dac7f 1878See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1879
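To make the combination of several sort fields with precalculated keys
concrete, here is a sketch. The C<name:age> record format is an
assumption of this example. It sorts by age numerically, then by name
without regard to case:

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age"
my @data = ( 'carol:30', 'alice:25', 'Bob:30', 'dave:25' );

my @sorted =
    map  { $_->[0] }                        # 3. recover the original item
    sort { $a->[1] <=> $b->[1]              # 2. age, numerically,
                    or
           $a->[2] cmp $b->[2] }            #    then name, ignoring case
    map  { my ( $name, $age ) = split /:/;  # 1. compute each key once
           [ $_, $age, lc $name ] } @data;

print "@sorted\n";    # alice:25 dave:25 Bob:30 carol:30
```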
1880=head2 How do I manipulate arrays of bits?
1881
ac9dac7f
RGS
1882Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1883operations.
1884
109f0441
S
1885For example, you don't have to store individual bits in an array
1886(which would mean that you're wasting a lot of space). To convert an
1887array of bits to a string, use C<vec()> to set the right bits. This
1888sets C<$vec> to have bit N set only if C<$ints[N]> was set:
ac9dac7f 1889
109f0441 1890 @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
ac9dac7f 1891 $vec = '';
109f0441
S
1892 foreach( 0 .. $#ints ) {
1893 vec($vec,$_,1) = 1 if $ints[$_];
1894 }
ac9dac7f 1895
109f0441
S
1896The string C<$vec> only takes up as many bits as it needs. For
1897instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1898bytes to store them (not counting the scalar variable overhead).
1899
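A quick sketch with 16 made-up bits confirms the size claim and reads the
set positions back with C<vec()>:

```perl
use strict;
use warnings;

# 16 sample bits; positions 0, 3, 4 and 15 are set
my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );

my $vec = '';
for my $n ( 0 .. $#ints ) {
    vec( $vec, $n, 1 ) = 1 if $ints[$n];
}

print "bytes used: ", length($vec), "\n";    # 2

my @set = grep { vec( $vec, $_, 1 ) } 0 .. 8 * length($vec) - 1;
print "bits set:   @set\n";                  # 0 3 4 15
```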
1900Here's how, given a vector in C<$vec>, you can get those bits into
1901your C<@ints> array:
ac9dac7f
RGS
1902
1903 sub bitvec_to_list {
1904 my $vec = shift;
1905 my @ints;
1906 # Find null-byte density then select best algorithm
1907 if ($vec =~ tr/\0// / length $vec > 0.95) {
1908 use integer;
1909 my $i;
1910
1911 # This method is faster with mostly null-bytes
1912 while($vec =~ /[^\0]/g ) {
1913 $i = -9 + 8 * pos $vec;
1914 push @ints, $i if vec($vec, ++$i, 1);
1915 push @ints, $i if vec($vec, ++$i, 1);
1916 push @ints, $i if vec($vec, ++$i, 1);
1917 push @ints, $i if vec($vec, ++$i, 1);
1918 push @ints, $i if vec($vec, ++$i, 1);
1919 push @ints, $i if vec($vec, ++$i, 1);
1920 push @ints, $i if vec($vec, ++$i, 1);
1921 push @ints, $i if vec($vec, ++$i, 1);
1922 }
1923 }
1924 else {
1925 # This method is a fast general algorithm
1926 use integer;
1927 my $bits = unpack "b*", $vec;
1928 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1929 push @ints, pos $bits while($bits =~ /1/g);
1930 }
1931
1932 return \@ints;
1933 }
68dc0745 1934
1935This method gets faster the more sparse the bit vector is.
1936(Courtesy of Tim Bunce and Winfried Koenig.)
1937
76817d6d
JH
1938You can make the while loop a lot shorter with this suggestion
1939from Benjamin Goldberg:
1940
1941 while($vec =~ /[^\0]+/g ) {
ac9dac7f
RGS
1942 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1943 }
76817d6d 1944
ac9dac7f 1945Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1946
ac9dac7f
RGS
1947 $vector = Bit::Vector->new($num_of_bits);
1948 $vector->Index_List_Store(@ints);
1949 @ints = $vector->Index_List_Read();
cc30d1a7 1950
C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.
cc30d1a7
JH
1953
1954Here's a more extensive illustration using vec():
65acb1b1 1955
ac9dac7f
RGS
1956 # vec demo
1957 $vector = "\xff\x0f\xef\xfe";
1958 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1959 unpack("N", $vector), "\n";
ac9dac7f
RGS
1960 $is_set = vec($vector, 23, 1);
1961 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1962 pvec($vector);
65acb1b1 1963
ac9dac7f
RGS
1964 set_vec(1,1,1);
1965 set_vec(3,1,1);
1966 set_vec(23,1,1);
1967
1968 set_vec(3,1,3);
1969 set_vec(3,2,3);
1970 set_vec(3,4,3);
1971 set_vec(3,4,7);
1972 set_vec(3,8,3);
1973 set_vec(3,8,7);
1974
1975 set_vec(0,32,17);
1976 set_vec(1,32,17);
1977
1978 sub set_vec {
1979 my ($offset, $width, $value) = @_;
1980 my $vector = '';
1981 vec($vector, $offset, $width) = $value;
1982 print "offset=$offset width=$width value=$value\n";
1983 pvec($vector);
1984 }
65acb1b1 1985
    sub pvec {
        my $vector = shift;
        my $bits   = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }
65acb1b1 1996
68dc0745 1997=head2 Why does defined() return true on empty arrays and hashes?
1998
65acb1b1
TC
1999The short story is that you should probably only use defined on scalars or
2000functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
2001in the 5.004 release or later of Perl for more detail.
68dc0745 2002
2003=head1 Data: Hashes (Associative Arrays)
2004
2005=head2 How do I process an entire hash?
2006
2007(contributed by brian d foy)
2008
There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.
68dc0745 2012
ee891a00
RGS
2013To go through all of the keys, use the C<keys> function. This extracts
2014all of the keys of the hash and gives them back to you as a list. You
2015can then get the value through the particular key you're processing:
2016
    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }
68dc0745 2021
ee891a00 2022Once you have the list of keys, you can process that list before you
109f0441 2023process the hash elements. For instance, you can sort the keys so you
ee891a00
RGS
2024can process them in lexical order:
2025
    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }
2030
2031Or, you might want to only process some of the items. If you only want
2032to deal with the keys that start with C<text:>, you can select just
2033those using C<grep>:
2034
    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }
2039
2040If the hash is very large, you might not want to create a long list of
109f0441 2041keys. To save some memory, you can grab one key-value pair at a time using
ee891a00
RGS
2042C<each()>, which returns a pair you haven't seen yet:
2043
2044 while( my( $key, $value ) = each( %hash ) ) {
2045 ...
2046 }
2047
2048The C<each> operator returns the pairs in apparently random order, so if
2049ordering matters to you, you'll have to stick with the C<keys> method.
2050
2051The C<each()> operator can be a bit tricky though. You can't add or
2052delete keys of the hash while you're using it without possibly
2053skipping or re-processing some pairs after Perl internally rehashes
2054all of the elements. Additionally, a hash has only one iterator, so if
2055you use C<keys>, C<values>, or C<each> on the same hash, you can reset
2056the iterator and mess up your processing. See the C<each> entry in
2057L<perlfunc> for more details.
68dc0745 2058
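If you do need to delete entries as you go, one way to sidestep the
iterator problems is to loop over the list from C<keys> instead of using
C<each>. This is a sketch with made-up data:

```perl
use strict;
use warnings;

my %hash = ( 'text:a' => 1, 'text:b' => 2, 'data:c' => 3 );    # sample data

# keys() builds the whole list before the loop starts, so deleting
# inside the loop cannot disturb the iteration.
for my $key ( keys %hash ) {
    delete $hash{$key} if $key =~ /^text:/;
}

print join( ', ', sort keys %hash ), "\n";    # data:c
```

The cost is the memory for the full key list, which is exactly what
C<each> was meant to avoid, so this suits small and medium hashes best.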
2059=head2 How do I merge two hashes?
2060X<hash> X<merge> X<slice, hash>
2061
2062(contributed by brian d foy)
2063
2064Before you decide to merge two hashes, you have to decide what to do
2065if both hashes contain keys that are the same and if you want to leave
2066the original hashes as they were.
2067
If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:
2073
2074 my %new_hash = %hash1; # make a copy; leave %hash1 alone
2075
2076 foreach my $key2 ( keys %hash2 )
2077 {
2078 if( exists $new_hash{$key2} )
2079 {
2080 warn "Key [$key2] is in both hashes!";
2081 # handle the duplicate (perhaps only warning)
2082 ...
2083 next;
2084 }
2085 else
2086 {
2087 $new_hash{$key2} = $hash2{$key2};
2088 }
2089 }
2090
2091If you don't want to create a new hash, you can still use this looping
2092technique; just change the C<%new_hash> to C<%hash1>.
2093
2094 foreach my $key2 ( keys %hash2 )
2095 {
2096 if( exists $hash1{$key2} )
2097 {
2098 warn "Key [$key2] is in both hashes!";
2099 # handle the duplicate (perhaps only warning)
2100 ...
2101 next;
2102 }
2103 else
2104 {
2105 $hash1{$key2} = $hash2{$key2};
2106 }
2107 }
2108
2109If you don't care that one hash overwrites keys and values from the other, you
2110could just use a hash slice to add one hash to another. In this case, values
2111from C<%hash2> replace values from C<%hash1> when they have keys in common:
2112
2113 @hash1{ keys %hash2 } = values %hash2;
2114
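If you want the same overwrite behavior but also want to leave both
originals untouched, you can flatten the two hashes into one list and
assign that to a new hash. In this sketch (the sample data is made up),
later pairs win:

```perl
use strict;
use warnings;

my %hash1 = ( a => 1,  b => 2 );
my %hash2 = ( b => 20, c => 30 );

# Later pairs win, so %hash2 overrides %hash1 for the shared key 'b'
my %merged = ( %hash1, %hash2 );

print join( ', ', map { "$_=$merged{$_}" } sort keys %merged ), "\n";
# a=1, b=20, c=30
```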
68dc0745 2115=head2 What happens if I add or remove keys from a hash while iterating over it?
2116
28b41a80 2117(contributed by brian d foy)
d92eb7b0 2118
28b41a80 2119The easy answer is "Don't do that!"
d92eb7b0 2120
2121If you iterate through the hash with each(), you can delete the key
2122most recently returned without worrying about it. If you delete or add
2123other keys, the iterator may skip or double up on them since perl
2124may rearrange the hash table. See the
2125entry for C<each()> in L<perlfunc>.
68dc0745 2126
2127=head2 How do I look up a hash element by value?
2128
2129Create a reverse hash:
2130
ac9dac7f
RGS
2131 %by_value = reverse %by_key;
2132 $key = $by_value{$value};
68dc0745 2133
2134That's not particularly efficient. It would be more space-efficient
2135to use:
2136
ac9dac7f
RGS
2137 while (($key, $value) = each %by_key) {
2138 $by_value{$value} = $key;
2139 }
68dc0745 2140
d92eb7b0
GS
2141If your hash could have repeated values, the methods above will only find
2142one of the associated keys. This may or may not worry you. If it does
2143worry you, you can always reverse the hash into a hash of arrays instead:
2144
ac9dac7f
RGS
2145 while (($key, $value) = each %by_key) {
2146 push @{$key_list_by_value{$value}}, $key;
2147 }
68dc0745 2148
2149=head2 How can I know how many entries are in a hash?
2150
109f0441
S
2151(contributed by brian d foy)
2152
2153This is very similar to "How do I process an entire hash?", also in
2154L<perlfaq4>, but a bit simpler in the common cases.
2155
You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:
68dc0745 2158
109f0441 2159 my $key_count = keys %hash; # must be scalar context!
d12d61cf 2160
109f0441 2161If you want to find out how many entries have a defined value, that's
d12d61cf 2162a bit different. You have to check each value. A C<grep> is handy:
109f0441
S
2163
2164 my $defined_value_count = grep { defined } values %hash;
68dc0745 2165
109f0441
S
2166You can use that same structure to count the entries any way that
2167you like. If you want the count of the keys with vowels in them,
2168you just test for that instead:
2169
2170 my $vowel_count = grep { /[aeiou]/ } keys %hash;
d12d61cf 2171
109f0441
S
2172The C<grep> in scalar context returns the count. If you want the list
2173of matching items, just use it in list context instead:
2174
2175 my @defined_values = grep { defined } values %hash;
2176
2177The C<keys()> function also resets the iterator, which means that you may
197aec24 2178see strange results if you use this between uses of other hash operators
109f0441 2179such as C<each()>.
68dc0745 2180
2181=head2 How do I sort a hash (optionally by value instead of key)?
2182
a05e4845
RGS
2183(contributed by brian d foy)
2184
2185To sort a hash, start with the keys. In this example, we give the list of
2186keys to the sort function which then compares them ASCIIbetically (which
2187might be affected by your locale settings). The output list has the keys
2188in ASCIIbetical order. Once we have the keys, we can go through them to
2189create a report which lists the keys in ASCIIbetical order.
2190
2191 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 2192
a05e4845
RGS
2193 foreach my $key ( @keys )
2194 {
109f0441 2195 printf "%-20s %6d\n", $key, $hash{$key};
a05e4845
RGS
2196 }
2197
58103a2e 2198We could get more fancy in the C<sort()> block though. Instead of
a05e4845 2199comparing the keys, we can compute a value with them and use that
58103a2e 2200value as the comparison.
a05e4845
RGS
2201
2202For instance, to make our report order case-insensitive, we use
58103a2e 2203the C<\L> sequence in a double-quoted string to make everything
a05e4845
RGS
2204lowercase. The C<sort()> block then compares the lowercased
2205values to determine in which order to put the keys.
2206
2207 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 2208
a05e4845 2209Note: if the computation is expensive or the hash has many elements,
58103a2e 2210you may want to look at the Schwartzian Transform to cache the
a05e4845
RGS
2211computation results.
2212
2213If we want to sort by the hash value instead, we use the hash key
2214to look it up. We still get out a list of keys, but this time they
2215are ordered by their value.
2216
2217 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2218
2219From there we can get more complex. If the hash values are the same,
2220we can provide a secondary sort on the hash key.
2221
58103a2e
RGS
2222 my @keys = sort {
2223 $hash{$a} <=> $hash{$b}
a05e4845
RGS
2224 or
2225 "\L$a" cmp "\L$b"
2226 } keys %hash;
68dc0745 2227
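To reverse the order, for instance to list the largest values first, swap
C<$a> and C<$b> in the comparison you want reversed. A sketch with made-up
counts:

```perl
use strict;
use warnings;

my %hash = ( apple => 3, pear => 7, fig => 3, kiwi => 1 );    # sample counts

# Swap $a and $b to reverse the numeric order; ties fall back to the key
my @keys = sort { $hash{$b} <=> $hash{$a} or $a cmp $b } keys %hash;

printf "%-10s %3d\n", $_, $hash{$_} for @keys;
```

Here C<pear> comes first, and C<apple> sorts ahead of C<fig> because the
tie on the value 3 is broken by the key.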
2228=head2 How can I always keep my hash sorted?
ac9dac7f 2229X<hash tie sort DB_File Tie::IxHash>
68dc0745 2230
You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)
68dc0745 2237
2238=head2 What's the difference between "delete" and "undef" with hashes?
2239
92993692
JH
2240Hashes contain pairs of scalars: the first is the key, the
2241second is the value. The key will be coerced to a string,
2242although the value can be any kind of scalar: string,
ac9dac7f 2243number, or reference. If a key C<$key> is present in
92993692
JH
2244%hash, C<exists($hash{$key})> will return true. The value
2245for a given key can be C<undef>, in which case
2246C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2247will return true. This corresponds to (C<$key>, C<undef>)
2248being in the hash.
68dc0745 2249
589a5df2 2250Pictures help... Here's the C<%hash> table:
68dc0745 2251
2252 keys values
2253 +------+------+
2254 | a | 3 |
2255 | x | 7 |
2256 | d | 0 |
2257 | e | 2 |
2258 +------+------+
2259
2260And these conditions hold
2261
92993692
JH
2262 $hash{'a'} is true
2263 $hash{'d'} is false
2264 defined $hash{'d'} is true
2265 defined $hash{'a'} is true
e9d185f8 2266 exists $hash{'a'} is true (Perl 5 only)
92993692 2267 grep ($_ eq 'a', keys %hash) is true
68dc0745 2268
2269If you now say
2270
92993692 2271 undef $hash{'a'}
68dc0745 2272
2273your table now reads:
2274
2275
2276 keys values
2277 +------+------+
2278 | a | undef|
2279 | x | 7 |
2280 | d | 0 |
2281 | e | 2 |
2282 +------+------+
2283
2284and these conditions now hold; changes in caps:
2285
92993692
JH
2286 $hash{'a'} is FALSE
2287 $hash{'d'} is false
2288 defined $hash{'d'} is true
2289 defined $hash{'a'} is FALSE
e9d185f8 2290 exists $hash{'a'} is true (Perl 5 only)
92993692 2291 grep ($_ eq 'a', keys %hash) is true
68dc0745 2292
2293Notice the last two: you have an undef value, but a defined key!
2294
2295Now, consider this:
2296
92993692 2297 delete $hash{'a'}
68dc0745 2298
2299your table now reads:
2300
2301 keys values
2302 +------+------+
2303 | x | 7 |
2304 | d | 0 |
2305 | e | 2 |
2306 +------+------+
2307
2308and these conditions now hold; changes in caps:
2309
92993692
JH
2310 $hash{'a'} is false
2311 $hash{'d'} is false
2312 defined $hash{'d'} is true
2313 defined $hash{'a'} is false
e9d185f8 2314 exists $hash{'a'} is FALSE (Perl 5 only)
92993692 2315 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 2316
2317See, the whole entry is gone!
2318
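The tables above can be checked with a few lines of code. This sketch
records what C<exists> and C<defined> report after each step:

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

undef $hash{a};    # the key stays, the value becomes undef
my @after_undef = ( exists $hash{a} ? 1 : 0, defined $hash{a} ? 1 : 0 );

delete $hash{a};   # the whole entry goes away
my @after_delete = ( exists $hash{a} ? 1 : 0, defined $hash{a} ? 1 : 0 );

print "after undef:  exists=$after_undef[0] defined=$after_undef[1]\n";
print "after delete: exists=$after_delete[0] defined=$after_delete[1]\n";
# after undef:  exists=1 defined=0
# after delete: exists=0 defined=0
```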
2319=head2 Why don't my tied hashes make the defined/exists distinction?
2320
92993692
JH
2321This depends on the tied hash's implementation of EXISTS().
2322For example, there isn't the concept of undef with hashes
2323that are tied to DBM* files. It also means that exists() and
2324defined() do the same thing with a DBM* file, and what they
2325end up doing is not what they do with ordinary hashes.
68dc0745 2326
2327=head2 How do I reset an each() operation part-way through?
2328
fb2fe781
RGS
2329(contributed by brian d foy)
2330
2331You can use the C<keys> or C<values> functions to reset C<each>. To
2332simply reset the iterator used by C<each> without doing anything else,
2333use one of them in void context:
2334
2335 keys %hash; # resets iterator, nothing else.
2336 values %hash; # resets iterator, nothing else.
2337
2338See the documentation for C<each> in L<perlfunc>.
68dc0745 2339
2340=head2 How can I get the unique keys from two hashes?
2341
d92eb7b0
GS
2342First you extract the keys from the hashes into lists, then solve
2343the "removing duplicates" problem described above. For example:
68dc0745 2344
ac9dac7f
RGS
2345 %seen = ();
2346 for $element (keys(%foo), keys(%bar)) {
2347 $seen{$element}++;
2348 }
2349 @uniq = keys %seen;
68dc0745 2350
2351Or more succinctly:
2352
ac9dac7f 2353 @uniq = keys %{{%foo,%bar}};
68dc0745 2354
2355Or if you really want to save space:
2356
ac9dac7f
RGS
2357 %seen = ();
2358 while (defined ($key = each %foo)) {
2359 $seen{$key}++;
2360 }
2361 while (defined ($key = each %bar)) {
2362 $seen{$key}++;
2363 }
2364 @uniq = keys %seen;
68dc0745 2365
2366=head2 How can I store a multidimensional array in a DBM file?
2367
2368Either stringify the structure yourself (no fun), or else
2369get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2370it on top of either DB_File or GDBM_File.
2371
2372=head2 How can I make my hash remember the order I put elements into it?
2373
Use the C<Tie::IxHash> module from CPAN.
68dc0745 2375
ac9dac7f
RGS
2376 use Tie::IxHash;
2377
2378 tie my %myhash, 'Tie::IxHash';
2379
2380 for (my $i=0; $i<20; $i++) {
2381 $myhash{$i} = 2*$i;
2382 }
2383
2384 my @keys = keys %myhash;
2385 # @keys = (0,1,2,3,...)
46fc3d4c 2386
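If installing a module is not an option, a common workaround is to keep
the insertion order in a separate array. This is only a sketch; the
C<%value_for> and C<@order> names are made up, and if you delete a key you
must also remove it from the array yourself.

```perl
use strict;
use warnings;

my ( %value_for, @order );

for my $key ( qw(zebra apple mango) ) {
    push @order, $key unless exists $value_for{$key};    # remember first sight
    $value_for{$key} = length $key;
}

# Walk the array, not keys(), to get the elements back in insertion order
print "$_ => $value_for{$_}\n" for @order;
```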
68dc0745 2387=head2 Why does passing a subroutine an undefined element in a hash create it?
2388
109f0441
S
2389(contributed by brian d foy)
2390
2391Are you using a really old version of Perl?
2392
2393Normally, accessing a hash key's value for a nonexistent key will
2394I<not> create the key.
2395
2396 my %hash = ();
2397 my $value = $hash{ 'foo' };
2398 print "This won't print\n" if exists $hash{ 'foo' };
2399
2400Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2401Since you could assign directly to C<$_[0]>, Perl had to be ready to
2402make that assignment so it created the hash key ahead of time:
2403
2404 my_sub( $hash{ 'foo' } );
2405 print "This will print before 5.004\n" if exists $hash{ 'foo' };
68dc0745 2406
109f0441
S
2407 sub my_sub {
2408 # $_[0] = 'bar'; # create hash key in case you do this
2409 1;
2410 }
2411
2412Since Perl 5.004, however, this situation is a special case and Perl
2413creates the hash key only when you make the assignment:
68dc0745 2414
109f0441
S
2415 my_sub( $hash{ 'foo' } );
2416 print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2417
2418 sub my_sub {
2419 $_[0] = 'bar';
2420 }
68dc0745 2421
109f0441
S
2422However, if you want the old behavior (and think carefully about that
2423because it's a weird side effect), you can pass a hash slice instead.
2424Perl 5.004 didn't make this a special case:
68dc0745 2425
109f0441 2426 my_sub( @hash{ qw/foo/ } );
68dc0745 2427
fc36a67e 2428=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2429
65acb1b1
TC
2430Usually a hash ref, perhaps like this:
2431
ac9dac7f
RGS
2432 $record = {
2433 NAME => "Jason",
2434 EMPNO => 132,
2435 TITLE => "deputy peon",
2436 AGE => 23,
2437 SALARY => 37_000,
2438 PALS => [ "Norbert", "Rhys", "Phineas"],
2439 };
65acb1b1 2440
ab093f19 2441References are documented in L<perlref> and L<perlreftut>.
65acb1b1
TC
2442Examples of complex data structures are given in L<perldsc> and
2443L<perllol>. Examples of structures and object-oriented classes are
2444in L<perltoot>.
68dc0745 2445
=head2 How can I use a reference as a hash key?

(contributed by brian d foy and Ben Morrow)

Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
back the reference from the stringified form, at least without doing
some extra work on your own.

Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This means a new variable might accidentally
be associated with the value for an old one.

If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
C<Hash::Util::FieldHash> module. This will also handle renaming the
keys if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.

If you actually need to be able to get a real reference back from
each hash entry, you can use the C<Tie::RefHash> module, which does the
required work for you.

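A minimal sketch of the difference, assuming the C<Tie::RefHash> module
is installed (it ships with Perl): a plain hash hands back a string as
the key, while the tied hash hands back the reference itself. The
variable names are only for illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my @fruits = qw(apple banana);
my %colors = ( sky => 'blue' );

# Plain hash: the key is stringified, e.g. "ARRAY(0x...)"
my %plain = ( \@fruits => 'array' );
my ($plain_key) = keys %plain;

# Tied hash: keys remain real references
tie my %by_ref, 'Tie::RefHash';
$by_ref{ \@fruits } = 'array of fruits';
$by_ref{ \%colors } = 'hash of colors';

for my $key ( keys %by_ref ) {
    # ref($key) is "ARRAY" or "HASH" here, so $key can be dereferenced
    print ref($key), " => $by_ref{$key}\n";
}
```
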
=head2 How can I check if a key exists in a multilevel hash?

(contributed by brian d foy)

The trick to this problem is avoiding accidental autovivification. If
you want to check three keys deep, you might naïvely try this:

    my %hash;
    if( exists $hash{key1}{key2}{key3} ) {
        ...;
    }

Even though you started with a completely empty hash, after that call to
C<exists> you've created the structure you needed to check for C<key3>:

    %hash = (
        'key1' => {
            'key2' => {}
        }
    );

That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical C<autovivification>
pragma is available on CPAN. Now you don't add to the hash:

    {
        no autovivification;
        my %hash;
        if( exists $hash{key1}{key2}{key3} ) {
            ...;
        }
    }

The C<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
subroutine can tell you not only if the keys exist but also get the
value:

    use Data::Diver qw(Dive);

    my @exists = Dive( \%hash, qw(key1 key2 key3) );
    if( ! @exists ) {
        ...; # keys do not exist
    }
    elsif( ! defined $exists[0] ) {
        ...; # keys exist but value is undef
    }

You can easily do this yourself too by checking each level of the hash
before you move onto the next level. This is essentially what
C<Data::Diver> does for you:

    if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
        ...;
    }

    sub check_hash {
        my( $hash, @keys ) = @_;

        return unless @keys;

        foreach my $key ( @keys ) {
            return unless eval { exists $hash->{$key} };
            $hash = $hash->{$key};
        }

        return 1;
    }

=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.

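As a small sketch of that advice, the following writes a few bytes to a
temporary file and reads them back, calling C<binmode> on both handles.
The payload and the use of C<File::Temp> are choices made for this
example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Example payload containing bytes that text-mode I/O may alter
my $bytes = join '', map { chr } 0x00, 0x0D, 0x0A, 0x1A, 0xFF;

my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;                       # no line-ending translation on write
print {$out} $bytes;
close $out or die "Cannot close $path: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;                        # no translation on read either
my $copy = do { local $/; <$in> };  # slurp the whole file
close $in;

print length($copy), " bytes read back\n";
```
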
If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression:

    use 5.010;

    given( $number ) {
        when( /\D/ )
            { say "\thas nondigits"; continue }
        when( /^\d+\z/ )
            { say "\tis a whole number"; continue }
        when( /^-?\d+\z/ )
            { say "\tis an integer"; continue }
        when( /^[+-]?\d+\z/ )
            { say "\tis a +/- integer"; continue }
        when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
            { say "\tis a real number"; continue }
        when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
            { say "\tis a C float" }
    }

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions. Thirdly, there is C<Regexp::Common> which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.

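For instance, a short sketch using C<looks_like_number> might look like
this. The list of candidate strings is invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sample inputs chosen for this example
my @candidates = ( '42', '-3.14', '1e5', 'abc', '12abc', '' );

for my $candidate (@candidates) {
    my $verdict = looks_like_number($candidate) ? 'yes' : 'no';
    print "'$candidate': $verdict\n";
}
```
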
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting from Perl 5.8, C<Storable> is part
of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

=head2 How do I print out or copy a recursive data structure?

The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };

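To see why the deep copy matters, here is a small sketch comparing a
plain hash assignment with C<dclone> on a hash of arrays. The data and
variable names are invented for the example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( fruits => [ 'apple', 'banana' ] );

my %shallow = %oldhash;                  # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };  # copies the nested arrays too

# Change the original after both copies were made
push @{ $oldhash{fruits} }, 'cherry';

my $shallow_count = @{ $shallow{fruits} };  # inner array shared with %oldhash
my $deep_count    = @{ $deep{fruits} };     # inner array is independent

print "shallow sees $shallow_count items, deep sees $deep_count\n";
```
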
=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.

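A minimal sketch of the common-base-class approach follows. The package
names C<My::Base>, C<My::Widget>, and C<My::Gadget> and the C<describe>
method are hypothetical; only classes that opt in by inheriting from the
base class get the method.

```perl
use strict;
use warnings;

# Hypothetical base class holding the shared method
package My::Base;
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub describe { my $self = shift; return ref($self) . " object" }

# Two hypothetical classes that opt in to the shared behavior
package My::Widget;
our @ISA = ('My::Base');

package My::Gadget;
our @ISA = ('My::Base');

package main;

print My::Widget->new->describe, "\n";
print My::Gadget->new->describe, "\n";
```
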
=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.

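If you only want to see what such a check involves, card-number check
digits are conventionally verified with the Luhn (mod 10) algorithm.
The following is a rough sketch of that algorithm, not the module's own
code; the C<luhn_ok> name is hypothetical, and a passing checksum says
nothing about whether the card actually exists.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the digits pass the Luhn check, else 0
sub luhn_ok {
    my ($number) = @_;
    ( my $digits = $number ) =~ s/[\s-]//g;   # allow spaces and dashes
    return 0 unless $digits =~ /^\d+\z/;

    my $sum    = 0;
    my $double = 0;                            # rightmost digit is not doubled
    for my $d ( reverse split //, $digits ) {
        if ($double) {
            $d *= 2;
            $d -= 9 if $d > 9;                 # same as adding the two digits
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok('4111 1111 1111 1111') ? "passes\n" : "fails\n";
```
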
=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead--it makes number-crunching easy.

See L<http://search.cpan.org/dist/PGPLOT> for the code.

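On the Perl side, one common way to prepare such data is the built-in
C<pack> function with the C<d> (native double) or C<f> (native float)
template. The sketch below shows only the packing and unpacking; how the
C code receives the resulting string is left to your XS glue, and the
sample values are made up.

```perl
use strict;
use warnings;

# Sample values chosen for this example
my @values = ( 1.5, -2.25, 3.0, 1e10 );

# "d*" packs every element as a native double; "f*" would use native floats
my $packed = pack 'd*', @values;

# The string is count * sizeof(double) bytes long
my $size_of_double = length pack 'd', 0;
printf "%d doubles in %d bytes\n", scalar @values, length $packed;

# unpack reverses the operation
my @round_trip = unpack 'd*', $packed;
```
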
=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.