perlfaq4 - Data Manipulation

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

For the long explanation, see David Goldberg's "What Every Computer
Scientist Should Know About Floating-Point Arithmetic"
(L<http://web.cse.msu.edu/~cse320/Documents/FloatingPoint.pdf>).

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Many decimal fractions, such as 0.1, have no exact binary
representation and lose precision when stored. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating-point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;
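
As a small sketch of the same effect, assuming a perl built with ordinary IEEE 754 double-precision numbers (the common case): C<0.1 + 0.2> is not exactly C<0.3>, but C<sprintf> still displays it as you would expect, and comparing against a small tolerance treats the two as equal. The tolerance C<1e-9> is an arbitrary choice for illustration; pick one that suits your data.

```perl
    use strict;
    use warnings;

    my $sum = 0.1 + 0.2;

    # With IEEE 754 doubles this is false: $sum is 0.30000000000000004...
    my $is_exact = ( $sum == 0.3 ) ? 1 : 0;

    # Formatting hides the representation error for display purposes
    my $shown = sprintf "%.2f", $sum;

    # Compare with a tolerance instead of == (1e-9 is an arbitrary choice)
    my $is_close = abs( $sum - 0.3 ) < 1e-9 ? 1 : 0;

    print "exact: $is_exact, shown: $shown, close: $is_close\n";
```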

=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will in most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.
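
One way around this, sketched below, is to round the intermediate result to the precision you actually care about before truncating. Here C<sprintf "%.0f"> rounds the quotient to the nearest integer first, so C<int()> then sees a value that really is 3. Whether rounding to whole numbers is appropriate depends on your data; this is only an illustration.

```perl
    use strict;
    use warnings;

    my $quotient = 0.6 / 0.2;                      # slightly less than 3 in binary
    my $naive    = int( $quotient - 2 );           # truncates, typically 0
    my $rounded  = sprintf( "%.0f", $quotient );   # "3"
    my $fixed    = int( $rounded - 2 );            # 1

    printf "quotient=%.20f naive=%d fixed=%d\n", $quotient, $naive, $fixed;
```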

=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $filename );  # right, has leading zero
    chmod( oct(644), $filename );  # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0],      $filename );  # wrong, even if "0644"

    chmod( oct($ARGV[0]), $filename );  # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;
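
The difference between numeric conversion and C<oct> is easy to check without touching any files. This sketch compares the two interpretations of the same string and prints the C<oct> result back in octal; note that C<oct> gives the same answer whether or not the string carries a leading C<0>.

```perl
    use strict;
    use warnings;

    my $string = '0644';

    my $as_decimal = $string + 0;      # 644: leading zero ignored
    my $as_octal   = oct($string);     # 420: interpreted as base 8
    my $no_zero    = oct('644');       # 420: oct() assumes octal anyway

    # Print the octal value back out to confirm it is what you meant
    my $check = sprintf "0%o %d", $as_octal, $as_octal;
    print "$as_decimal $as_octal $check\n";
```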

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The L<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    my $ceil   = ceil(3.5);   # 4
    my $floor  = floor(3.5);  # 3
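
The difference between truncation and C<floor()> only shows up for negative numbers. This sketch compares C<int()>, C<POSIX::floor()>, and C<POSIX::ceil()> on C<-3.5>:

```perl
    use strict;
    use warnings;
    use POSIX qw( floor ceil );

    my $x = -3.5;

    my $truncated = int($x);     # -3: toward zero
    my $floored   = floor($x);   # -4: toward negative infinity
    my $ceiled    = ceil($x);    # -3: toward positive infinity

    print "int=$truncated floor=$floored ceil=$ceiled\n";
```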

In 5.000 to 5.003 perls, trigonometry was done in the L<Math::Complex>
module. With 5.004, the L<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the L<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system of rounding is
being used by Perl, but instead to implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for (my $i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32-bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representational rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the L<Bit::Vector>
module from CPAN. The reason you might choose L<Bit::Vector> over the
perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built-in conversion of C<0x> notation:

    my $dec = 0xDEADBEEF;

Using the C<hex> function:

    my $dec = hex("DEADBEEF");

Using C<pack>:

    my $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    my $hex = sprintf("%X", 3735928559); # upper case A-F
    my $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    my $hex = unpack("H*", pack("N", 3735928559));

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $hex = $vec->to_Hex();

And L<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    my $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    my $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    my $dec = oct("33653337357");

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    my $oct = sprintf("%o", 3735928559);

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    my $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    my $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    my $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    my $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    my $dec = sprintf("%d", $int);

    # substr() left-pads the bit string with zeros to 32 characters.

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    my $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    my $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    my $bin = unpack("B*", pack("N", 3735928559));

Using L<Bit::Vector>:

    use Bit::Vector;
    my $vec = Bit::Vector->new_Dec(32, -559038737);
    my $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back
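
As one possible answer to that exercise, here is a sketch using only built-ins: convert to a number first with C<hex> or C<oct>, then format that number in the target base with C<sprintf>. It assumes the values fit in a native integer (32 bits here); for larger numbers, L<Bit::Vector> or the core L<Math::BigInt> module are better choices.

```perl
    use strict;
    use warnings;

    my $hex = "DEADBEEF";

    my $dec = hex($hex);               # 3735928559
    my $oct = sprintf( "%o", $dec );   # hex -> oct: "33653337357"
    my $bin = sprintf( "%b", $dec );   # hex -> bin

    # Going back: oct() understands a "0b" (and "0x") prefix as well
    my $hex_from_bin = sprintf( "%X", oct("0b$bin") );   # bin -> hex
    my $hex_from_oct = sprintf( "%X", oct($oct) );       # oct -> hex

    print "$dec $oct $bin $hex_from_bin $hex_from_oct\n";
```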

=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string or vice versa. To avoid this,
stringify the arguments explicitly (using C<""> or C<qq()>) or convert them
to numbers explicitly (using C<0+$arg>). The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }
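
A short sketch that shows both behaviors side by side, and how C<0+> forces numeric interpretation of values that started out as strings (for example, values read from input). It assumes the default operator behavior; under C<use feature 'bitwise'> (Perl 5.22 and later, experimental before 5.28) C<&> is always numeric and C<&.> is the string form.

```perl
    use strict;
    use warnings;

    my $numeric = 11 & 3;          # 3: binary 1011 & 0011
    my $string  = "11" & "3";      # "1": '1' & '3' is '1'; the longer operand is truncated

    # Values from input are strings; 0+ makes the operation numeric
    my ( $x, $y ) = ( "11", "3" );
    my $forced = ( 0 + $x ) & ( 0 + $y );   # 3

    # A string of null bytes is true, so test for non-null bytes instead
    my $bits    = "\020\020" & "\101\101";
    my $all_off = ( $bits !~ /[^\000]/ ) ? 1 : 0;   # 1

    print "$numeric '$string' $forced $all_off\n";
```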

=head2 How do I multiply matrices?

Use the L<Math::Matrix> or L<Math::MatrixReal> modules (available from CPAN)
or the L<PDL> extension (also available from CPAN).
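
For small matrices, or just to see what those modules do for you, a plain-Perl sketch is short. The C<matrix_multiply> function below is hypothetical: it stores each matrix as an array of row array references and does no error checking beyond the dimension test. For anything large or performance-sensitive, use the modules above.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: multiply two matrices stored as arrays of rows
    sub matrix_multiply {
        my ( $m1, $m2 ) = @_;
        my $inner = @{ $m1->[0] };          # columns in $m1
        die "Incompatible dimensions\n" unless $inner == @$m2;

        my $cols = @{ $m2->[0] };           # columns in $m2
        my @product;
        for my $i ( 0 .. $#$m1 ) {
            for my $j ( 0 .. $cols - 1 ) {
                my $sum = 0;
                $sum += $m1->[$i][$_] * $m2->[$_][$j] for 0 .. $inner - 1;
                $product[$i][$j] = $sum;
            }
        }
        return \@product;
    }

    my $p = matrix_multiply( [ [ 1, 2 ], [ 3, 4 ] ], [ [ 5, 6 ], [ 7, 8 ] ] );
    print join( " ", @$_ ), "\n" for @$p;   # 19 22, then 43 50
```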

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    my @results = map { my_func($_) } @array;

For example:

    my @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach my $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    my @results = map { some_func($_) } (5 .. 25);

but you should be aware that in this form, the C<..> operator
creates a list of all integers in the range, which can take a lot of
memory for large ranges. However, the problem does not occur when
using C<..> within a C<for> loop, because in that case the range
operator is optimized to I<iterate> over the range, without creating
the entire list. So

    my @results = ();
    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

or even

    push(@results, some_func($_)) for 5 .. 500_005;

will not create an intermediate list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the L<http://www.cpan.org/modules/by-module/Roman> module.
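
If you would rather not install a module, the usual greedy algorithm is only a few lines: repeatedly subtract the largest value from a table that includes the subtractive pairs (C<CM>, C<IV>, and so on). The C<to_roman> function below is a hypothetical sketch that handles only 1 through 3999, the range standard Roman numerals cover.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: integer (1..3999) to Roman numerals, greedy method
    sub to_roman {
        my ($n) = @_;
        die "Out of range: $n\n" unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;

        my @table = (
            [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
            [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
            [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
            [ 1,    'I' ],
        );

        my $roman = '';
        for my $pair (@table) {
            my ( $value, $letters ) = @$pair;
            while ( $n >= $value ) {
                $roman .= $letters;
                $n -= $value;
            }
        }
        return $roman;
    }

    print to_roman(1987), "\n";   # MCMLXXXVII
```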

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once: you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

Perl relies on the underlying system for the implementation of
C<rand> and C<srand>; on some systems, the generated numbers are
not random enough (especially on Windows: see
L<http://www.perlmonks.org/?node_id=803632>). Since Perl 5.20, perl
uses its own internal C<drand48()> implementation on all platforms,
so this mostly concerns older perls.
Several CPAN modules in the C<Math> namespace implement better
pseudorandom generators; see for example
L<Math::Random::MT> ("Mersenne Twister", fast), or
L<Math::TrulyRandom> (uses the imperfections in the system's
timer to generate random numbers, which is rather slow).
More algorithms for random numbers are described in
"Numerical Recipes in C" at L<http://www.nr.com/>.

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random integer from 0 to 5 (inclusive) that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min)  if  $min > $max;
        return $min + int rand(1 + $max - $min);
    }
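
If you need a floating-point value rather than an integer, drop the C<int> and the C<+1>: C<rand($range)> already returns values from 0 up to (but not including) C<$range>. This is a sketch with a hypothetical C<random_between> name; the result is at least C<$min> and, apart from floating-point rounding, less than C<$max>.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: random floating-point number in [$min, $max)
    sub random_between {
        my ( $min, $max ) = @_;
        ( $min, $max ) = ( $max, $min ) if $min > $max;
        return $min + rand( $max - $min );
    }

    # Argument order doesn't matter, as with random_int_between
    my @samples = map { random_between( 2.5, 1.5 ) } 1 .. 1000;
    my ( $low, $high ) = ( sort { $a <=> $b } @samples )[ 0, -1 ];
    printf "lowest %.3f, highest %.3f\n", $low, $high;
```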

=head2 How do I find the day or week of the year?

The day of the year is in the list returned
by the C<localtime> function, at index 7 (counting from 0 for
January 1). Without an argument C<localtime> uses the current time.

    my $day_of_year = (localtime)[7];

The L<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use L<POSIX>'s C<mktime>
to get a time in epoch seconds for the argument to C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

You can also use L<Time::Piece>, which comes with Perl and provides a
C<localtime> that returns an object:

    use Time::Piece;
    my $day_of_year  = localtime->yday;
    my $week_of_year = localtime->week;

The L<Date::Calc> module provides two functions to calculate these, too:

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );
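
Watch out for an off-by-one difference between these approaches: the C<yday> element of C<localtime> counts from 0, while C<strftime>'s C<%j> counts from 1. This sketch checks both for December 18, 1987, the date used above; it builds the time at noon so that a daylight saving time change cannot push it onto a different day.

```perl
    use strict;
    use warnings;
    use POSIX qw( mktime strftime );

    # Noon on December 18, 1987 (month is 0-based, year is years since 1900)
    my $epoch = mktime( 0, 0, 12, 18, 11, 87 );
    my @parts = localtime($epoch);

    my $yday_zero_based = $parts[7];                  # 351
    my $yday_one_based  = strftime( "%j", @parts );   # "352"

    print "$yday_zero_based $yday_one_based\n";
```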

=head2 How do I find the current century or millennium?

Use the following simple functions, which take an optional time in
epoch seconds and default to the current time:

    sub get_century    {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the L<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to determine reliably the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though.

The L<Time::Piece> module, which comes with Perl, replaces C<localtime>
with a version that returns an object. It also overloads the comparison
operators so you can compare them directly:

    use Time::Piece;
    my $date1 = localtime( $some_time );
    my $date2 = localtime( $some_other_time );

    if( $date1 < $date2 ) {
        print "The date was in the past\n";
    }

You can also get differences with a subtraction, which returns a
L<Time::Seconds> object:

    my $date_diff = $date1 - $date2;
    print "The difference is ", $date_diff->days, " days\n";

If you want to work with formatted dates, the L<Date::Manip>,
L<Date::Calc>, or L<DateTime> modules can help you.
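
Here is a self-contained sketch of the same idea, using fixed dates parsed with C<< Time::Piece->strptime >> so that the result is repeatable. Called as a class method, C<strptime> builds a UTC-based object, which keeps daylight saving time out of this comparison. The two dates are arbitrary examples.

```perl
    use strict;
    use warnings;
    use Time::Piece;

    my $start = Time::Piece->strptime( "2024-02-01", "%Y-%m-%d" );
    my $end   = Time::Piece->strptime( "2024-03-01", "%Y-%m-%d" );

    print "The end date is later\n" if $end > $start;

    my $diff = $end - $start;    # a Time::Seconds object
    print "The difference is ", $diff->days, " days\n";   # 29 (2024 is a leap year)
```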

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
L<Time::Local> module. Otherwise, you should look into the L<Date::Calc>,
L<Date::Parse>, and L<Date::Manip> modules from CPAN.
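
For the fixed-format case, a minimal sketch: match the fields with a regular expression and hand them to C<timegm> (or C<timelocal> if the string is in local time). The C<YYYY-MM-DD HH:MM:SS> format is an assumption for illustration; note that L<Time::Local> wants a 0-based month.

```perl
    use strict;
    use warnings;
    use Time::Local qw( timegm );

    my $string = "2024-03-01 12:34:56";   # assumed format: YYYY-MM-DD HH:MM:SS

    my ( $year, $mon, $mday, $hour, $min, $sec ) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "Unrecognized date: $string\n";

    # timegm takes the fields in localtime() order, with a 0-based month
    my $epoch = timegm( $sec, $min, $hour, $mday, $mon - 1, $year );

    print "$epoch\n";
    print scalar gmtime($epoch), "\n";    # Fri Mar  1 12:34:56 2024
```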

=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the L<Time::Piece> module, part of the Standard Library,
which can convert a date/time to a Julian Day:

    $ perl -MTime::Piece -le 'print localtime->julian_day'

Or the modified Julian Day:

    $ perl -MTime::Piece -le 'print localtime->mjd'

Or even the day of the year (which is what some people think of as a
Julian day):

    $ perl -MTime::Piece -le 'print localtime->yday'

You can also do the same things with the L<DateTime> module:

    $ perl -MDateTime -le'print DateTime->today->jd'
    $ perl -MDateTime -le'print DateTime->today->mjd'
    $ perl -MDateTime -le'print DateTime->today->doy'

You can use the L<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days (see L<http://www.hermetic.ch/cal_stud/jdn.htm>
for instance):

    $ perl -MTime::JulianDay -le 'print local_julian_day( time )'
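
If you only need the astronomical Julian Date of a Unix timestamp, the arithmetic is simple enough to do by hand: the Unix epoch (1970-01-01 00:00:00 UTC) is Julian Date 2440587.5, and the Modified Julian Date is the Julian Date minus 2400000.5. This sketch ignores leap seconds, just as Unix time does; the function and constant names are hypothetical.

```perl
    use strict;
    use warnings;

    use constant UNIX_EPOCH_JD => 2440587.5;    # JD of 1970-01-01T00:00:00Z
    use constant MJD_OFFSET    => 2400000.5;    # MJD = JD - 2400000.5

    sub julian_date {
        my ($epoch) = @_;
        return $epoch / 86400 + UNIX_EPOCH_JD;
    }

    sub modified_julian_date {
        my ($epoch) = @_;
        return julian_date($epoch) - MJD_OFFSET;
    }

    my $now = time;
    printf "JD %.5f  MJD %.5f\n", julian_date($now), modified_julian_date($now);
```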

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>

(contributed by brian d foy)

To do it correctly, you can use one of the C<Date> modules since they
work with calendars instead of times. The L<DateTime> module makes it
simple, and gives you the same time of day, only the day before,
despite daylight saving time changes:

    use DateTime;

    my $yesterday = DateTime->now( time_zone => 'local' )->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the L<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. For example, the rest of the
suggestions will be wrong sometimes.

Starting with Perl 5.10, L<Time::Piece> and L<Time::Seconds> are part
of the standard distribution, so you might think that you could do
something like this:

    use Time::Piece;
    use Time::Seconds;

    my $yesterday = localtime() - ONE_DAY; # WRONG
    print "Yesterday was $yesterday\n";

The L<Time::Piece> module exports a new C<localtime> that returns an
object, and L<Time::Seconds> exports the C<ONE_DAY> constant that is a
set number of seconds. This means that it always gives the time 24
hours ago, which is not always yesterday. This can cause problems
around the end of daylight saving time when there's one day that is 25
hours long.

You have the same problem with L<Time::Local>, which will give the wrong
answer for those same special cases:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5]; # WRONG
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;
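
If you want to stay with core modules, one calendar-based approach is to let C<POSIX::mktime> normalize an out-of-range day of the month: ask for day C<$mday - 1> at noon, and C<mktime> rolls day 0 back to the last day of the previous month (this normalization is standard C C<mktime> behavior). This sketch fixes the starting date so that the month and leap-year rollover is visible; for the real current date, take C<$mday>, C<$mon>, and C<$year> from C<(localtime)[3..5]> instead.

```perl
    use strict;
    use warnings;
    use POSIX qw( mktime strftime );

    # Start from March 1, 2024 (fields as localtime returns them)
    my ( $mday, $mon, $year ) = ( 1, 2, 124 );

    # Day 0 of March is normalized to the last day of February; using
    # noon keeps a daylight saving change from moving us to another day
    my $yesterday = mktime( 0, 0, 12, $mday - 1, $mon, $year );

    print strftime( "Yesterday: %Y-%m-%d", localtime($yesterday) ), "\n";
    # Yesterday: 2024-02-29
```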

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.12, C<localtime> and C<gmtime> can handle dates past
03:14:08 UTC on January 19, 2038, when a signed 32-bit time value
overflows. You still might get a warning on a 32-bit C<perl>:

    % perl5.12 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
    Integer overflow in hexadecimal number at -e line 1.
    Wed Nov  1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

    % perl5.12 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
    Thu Nov  2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as L<Regexp::Common>.

Some modules have validation for particular types of input, such
as L<Business::ISBN>, L<Business::CreditCard>, L<Email::Valid>,
and L<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.
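
If you do want a few of those special escapes expanded, one approach is to map them explicitly and fall back to plain backslash removal for everything else. The C<%escape> table and the C<unescape> name below are hypothetical and handle only C<\n>, C<\t>, and C<\\>; extend the table as needed rather than reaching for C<eval> on untrusted input.

```perl
    use strict;
    use warnings;

    # Hypothetical table of escapes to expand; anything else just loses
    # its backslash, as with s/\\(.)/$1/g
    my %escape = ( n => "\n", t => "\t", '\\' => '\\' );

    sub unescape {
        my ($string) = @_;
        $string =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ge;
        return $string;
    }

    # Single quotes: the backslashes below are literal characters
    my $raw = 'one\tTWO\nthree\\\\four\q';
    print unescape($raw), "\n";
```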

=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\g1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\g1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str    =~ tr///cs;    # Now Harlem, like in New York
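
The two techniques differ on runs of three or more: the substitution removes one character per I<pair>, while C<tr///cs> squashes a whole I<run> down to one character. Adding a C<+> quantifier to the back-reference makes the substitution squash runs as well. A quick comparison, using a made-up misspelling as input:

```perl
    use strict;
    use warnings;

    my ( $pairs, $runs, $squash ) = ('Haaarlem') x 3;

    $pairs  =~ s/(.)\g1/$1/g;     # 'Haarlem': only the first "aa" pair shrinks
    $runs   =~ s/(.)\g1+/$1/g;    # 'Harlem':  the whole run shrinks
    $squash =~ tr///cs;           # 'Harlem':  tr squashes runs too

    print "$pairs $runs $squash\n";
```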

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can put any code we like inside the braces, as long as
the last expression evaluates to a scalar reference. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in C<$1>. For multi-character delimiters, something more like
C</alpha(.*?)omega/> is needed. For nested patterns
and/or balanced expressions, see the so-called
L<< (?PARNO)|perlre/C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)> >>
construct (available since perl 5.10).
The CPAN module L<Regexp::Common> can help to build such
regular expressions (see in particular
L<Regexp::Common::balanced> and L<Regexp::Common::delimited>).

More complex cases will require writing a parser, probably
using a parsing module from CPAN, such as
L<Regexp::Grammars>, L<Parse::RecDescent>, L<Parse::Yapp>,
L<Text::Balanced>, or L<Marpa::XS>.
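
As an illustration of the recursive construct, this sketch matches a balanced, possibly nested, parenthesized group. C<(?1)> re-enters capture group 1, and the possessive C<[^()]++> keeps the engine from backtracking into runs of ordinary characters. It requires perl 5.10 or later; the sample text is made up.

```perl
    use 5.010;
    use strict;
    use warnings;

    # Group 1: "(", then any mix of non-parens or nested groups, then ")"
    my $balanced = qr/( \( (?: [^()]++ | (?1) )* \) )/x;

    my $text = 'f(a(b)c)(d';
    my ($group) = $text =~ $balanced;

    say $group;    # (a(b)c)
```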

=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    my $reversed = reverse $string;
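
The scalar context matters. In list context, C<reverse> reverses the I<list>, and a one-element list comes back unchanged, which is a common surprise with C<print>. A short comparison:

```perl
    use strict;
    use warnings;

    my $string = "stressed";

    my $reversed = reverse $string;           # scalar context: "desserts"
    my @list     = reverse $string;           # list context: ("stressed")
    my $joined   = join '', reverse $string;  # still list context: "stressed"
    my $forced   = scalar reverse $string;    # "desserts"

    print reverse($string), "\n";             # prints "stressed"
    print scalar reverse($string), "\n";      # prints "desserts"
```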

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the L<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    my @expanded_lines = expand(@lines_with_tabs);
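
Both approaches assume tab stops every 8 columns. With L<Text::Tabs> you can change that through the package variable C<$tabstop>; this sketch localizes it by its fully qualified name so the module actually sees the new value. It also checks the hand-written substitution against the module on a made-up line.

```perl
    use strict;
    use warnings;
    use Text::Tabs;    # exports expand() and unexpand()

    my $line = "a\tbc\td";

    # The do-it-yourself version (8-column tab stops)
    my $by_hand = $line;
    1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

    my $by_module = expand($line);

    my $narrow;
    {
        local $Text::Tabs::tabstop = 4;    # tab stops every 4 columns
        $narrow = expand($line);
    }

    print "[$by_hand]\n[$by_module]\n[$narrow]\n";
```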

=head2 How do I reformat a paragraph?

Use L<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to L<Text::Wrap> should not contain embedded
newlines. L<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module L<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so (the C<\$_> is escaped so
that the shell does not expand it when defining the alias):

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat \$_, {all=>1}'"

See the documentation for L<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

    my $string = "Just another Perl Hacker";
    my $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) =  "Perl 5.8.0";
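
C<substr> also accepts negative offsets and lengths, counted from the end of the string, and C<index> can supply the offset when you know the text but not its position. A short sketch with the same sample string:

```perl
    use strict;
    use warnings;

    my $string = "Just another Perl Hacker";

    my $last_word  = substr( $string, -6 );           # 'Hacker'
    my $no_last    = substr( $string, 0, -7 );        # 'Just another Perl'
    my $offset     = index( $string, "Perl" );        # 13
    my $four_chars = substr( $string, $offset, 4 );   # 'Perl'

    # The four-argument form returns the text it replaced
    my $replaced = substr( $string, $offset, 4, "Perl 5.8.0" );

    print "$last_word|$no_last|$four_chars|$replaced|$string\n";
```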

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that C<$_> contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
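
The counting substitution generalizes to a small helper. The C<replace_nth> function below is a hypothetical sketch: it replaces only the C<$n>th match of a pattern and leaves every other match as it was.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: replace only the $n-th match of $pattern
    sub replace_nth {
        my ( $string, $pattern, $n, $replacement ) = @_;
        my $count = 0;
        $string =~ s/($pattern)/++$count == $n ? $replacement : $1/ge;
        return $string;
    }

    my $fish = "One fish two fish red fish blue fish";
    print replace_nth( $fish, qr/fish/, 3, "FISH" ), "\n";
    # One fish two fish red FISH blue fish
```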

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:

    my $string = "ThisXlineXhasXsomeXx'sXinXit";
    my $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a C<while>
loop around a global pattern match. For example, let's count negative
integers:

    my $string = "-9 55 48 -2 23 -76 4 14 -44";
    my $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    my $count = () = $string =~ /-\d+/g;
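
Both of those count non-overlapping matches, because each match resumes where the previous one ended. To count overlapping occurrences, match a zero-width lookahead instead, so the engine only advances one character at a time:

```perl
    use strict;
    use warnings;

    my $string = "aaaa";

    my $non_overlapping = () = $string =~ /aa/g;       # 2
    my $overlapping     = () = $string =~ /(?=aa)/g;   # 3

    print "$non_overlapping $overlapping\n";
```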

=head2 How do I capitalize all the words on one line?
X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>

(contributed by brian d foy)

Damian Conway's L<Text::Autoformat> handles all of the thinking
for you.

    use Text::Autoformat;
    my $x = "Dr. Strangelove or: How I Learned to Stop ".
      "Worrying and Love the Bomb";

    print $x, "\n";
    for my $style (qw( sentence title highlight )) {
        print autoformat($x, { case => $style }), "\n";
    }

How do you want to capitalize those words?

    FRED AND BARNEY'S LODGE        # all uppercase
    Fred And Barney's Lodge        # title case
    Fred and Barney's Lodge        # highlight case

It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you're right. Perl words are groups of C<\w+>, but that's not what
you want to capitalize. How is Perl supposed to know not to capitalize
that C<s> after the apostrophe? You could try a regular expression:

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;

    $string =~ s/([\w']+)/\u\L$1/g;

Now, what if you don't want to capitalize that "and"? Just use
L<Text::Autoformat> and get on with the next problem. :)
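
If you do want to roll your own, here is a sketch of "highlight" case: capitalize every word except a list of minor words, unless the minor word comes first. The C<%minor> list and the C<highlight_case> name are hypothetical; real title-casing rules vary by style guide, which is exactly why the module is attractive.

```perl
    use strict;
    use warnings;

    # Hypothetical list of words to leave in lower case
    my %minor = map { $_ => 1 } qw( a an and as at but by for in of on or the to );

    sub highlight_case {
        my ($string) = @_;
        my $first = 1;
        $string =~ s{([\w']+)}{
            my $word = lc $1;
            my $out  = ( $first || !$minor{$word} ) ? ucfirst $word : $word;
            $first = 0;
            $out;
        }ge;
        return $string;
    }

    print highlight_case("FRED AND BARNEY'S LODGE"), "\n";
    # Fred and Barney's Lodge
```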

=head2 How can I split a [character]-delimited string except when inside [character]?

Several modules can handle this sort of parsing: L<Text::Balanced>,
L<Text::CSV>, L<Text::CSV_XS>, and L<Text::ParseWords>, among others.

Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in C<$text>):

    my @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
        | ([^,]+),?
        | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).

Alternatively, the L<Text::ParseWords> module (part of the standard
Perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

For parsing or generating CSV, though, using L<Text::CSV> rather than
implementing it yourself is highly recommended; you'll save yourself odd bugs
popping up later by just using code which has already been tried and tested in
production for years.
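
For instance, here is a sketch that feeds the sample line from this answer to C<parse_line> from L<Text::ParseWords>. With a C<$keep> argument of 0 the quotes are removed, and the commas inside quoted fields stay part of their field. For real CSV input, L<Text::CSV> is still the better tool.

```perl
    use strict;
    use warnings;
    use Text::ParseWords qw( parse_line );

    my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

    my @fields = parse_line( ",", 0, $text );

    print scalar(@fields), " fields\n";     # 11 fields
    print "$_\n" for @fields[ 0, 2, -1 ];   # SAR001, Cimetrix, Inc, Error, Core Dumped
```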
930 =head2 How do I strip blank space from the beginning/end of a string?
932 (contributed by brian d foy)
934 A substitution can do this for you. For a single line, you want to
935 replace all the leading or trailing whitespace with nothing. You
936 can do that with a pair of substitutions:
941 You can also write that as a single substitution, although it turns
942 out the combined statement is slower than the separate ones. That
943 might not matter to you, though:
947 In this regular expression, the alternation matches either at the
948 beginning or the end of the string since the anchors have a lower
949 precedence than the alternation. With the C</g> flag, the substitution
950 makes all possible matches, so it gets both. Remember, the trailing
951 newline matches the C<\s+>, and the C<$> anchor can match to the
952 absolute end of the string, so the newline disappears too. Just add
953 the newline to the output, which has the added benefit of preserving
954 "blank" (consisting entirely of whitespace) lines which the C<^\s+>
955 would remove all by itself:
962 For a multi-line string, you can apply the regular expression to each
963 logical line in the string by adding the C</m> flag (for
964 "multi-line"). With the C</m> flag, the C<$> matches I<before> an
965 embedded newline, so it doesn't remove it. This pattern still removes
966 the newline at the end of the string:
968 $string =~ s/^\s+|\s+$//gm;
970 Remember that lines consisting entirely of whitespace will disappear,
971 since the first part of the alternation can match the entire string
972 and replace it with nothing. If you need to keep embedded blank lines,
973 you have to do a little more work. Instead of matching any whitespace
974 (since that includes a newline), just match the other whitespace:
976 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
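Here is an example with a made-up multi-line string. The C<[\t\f ]> version trims each line but keeps both the empty line in the middle and the final newline:

```perl
use strict;
use warnings;

# Made-up text: padded lines around an empty line.
my $text = "  alpha  \n\n   beta\t\n";

# Only tabs, form feeds, and spaces are removed, never newlines.
( my $trimmed = $text ) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print $trimmed;   # "alpha\n\nbeta\n"
```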
978 =head2 How do I pad a string with blanks or pad a number with zeroes?
980 In the following examples, C<$pad_len> is the length to which you wish
981 to pad the string, C<$text> or C<$num> contains the string to be padded,
982 and C<$pad_char> contains the padding character. You can use a single
983 character string constant instead of the C<$pad_char> variable if you
984 know what it is in advance. And in the same way you can use an integer in
985 place of C<$pad_len> if you know the pad length in advance.
987 The simplest method uses the C<sprintf> function. It can pad on the left
or right with blanks and on the left with zeroes, and it will not
truncate the result. The C<pack> function can only pad strings on the
right with blanks, and it will truncate the result to a maximum length of
C<$pad_len>.
993 # Left padding a string with blanks (no truncation):
994 my $padded = sprintf("%${pad_len}s", $text);
995 my $padded = sprintf("%*s", $pad_len, $text); # same thing
997 # Right padding a string with blanks (no truncation):
998 my $padded = sprintf("%-${pad_len}s", $text);
999 my $padded = sprintf("%-*s", $pad_len, $text); # same thing
1001 # Left padding a number with 0 (no truncation):
1002 my $padded = sprintf("%0${pad_len}d", $num);
1003 my $padded = sprintf("%0*d", $pad_len, $num); # same thing
1005 # Right padding a string with blanks using pack (will truncate):
1006 my $padded = pack("A$pad_len",$text);
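To make those formats concrete, here they are with arbitrary sample values:

```perl
use strict;
use warnings;

my $text    = 'ab';
my $pad_len = 5;

my $left   = sprintf( "%*s",  $pad_len, $text );   # '   ab'
my $right  = sprintf( "%-*s", $pad_len, $text );   # 'ab   '
my $zeroes = sprintf( "%0*d", $pad_len, 42 );      # '00042'
my $packed = pack( "A$pad_len", $text );           # 'ab   '
my $cut    = pack( "A3", 'abcdef' );               # 'abc': pack truncates
```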
1008 If you need to pad with a character other than blank or zero you can use
1009 one of the following methods. They all generate a pad string with the
1010 C<x> operator and combine that with C<$text>. These methods do
1011 not truncate C<$text>.
1013 Left and right padding with any character, creating a new string:
1015 my $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
1016 my $padded = $text . $pad_char x ( $pad_len - length( $text ) );
1018 Left and right padding with any character, modifying C<$text> directly:
1020 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
1021 $text .= $pad_char x ( $pad_len - length( $text ) );
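If C<$text> is already longer than C<$pad_len>, the repeat count above goes negative. In that case C<x> produces an empty string, and newer perls may warn about it under C<use warnings>. A small wrapper can clamp the count first. This sketch uses a hypothetical helper name, C<pad_left>:

```perl
use strict;
use warnings;

# Hypothetical helper: left-pads $text with $pad_char up to $pad_len
# characters and never truncates.
sub pad_left {
    my ( $text, $pad_len, $pad_char ) = @_;
    my $count = $pad_len - length $text;
    $count = 0 if $count < 0;    # avoid a negative repeat count
    return $pad_char x $count . $text;
}

print pad_left( 'abc', 6, '*' ), "\n";      # ***abc
print pad_left( 'abcdef', 3, '*' ), "\n";   # abcdef
```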
1023 =head2 How do I extract selected columns from a string?
1025 (contributed by brian d foy)
1027 If you know the columns that contain the data, you can
1028 use C<substr> to extract a single column.
1030 my $column = substr( $line, $start_column, $length );
1032 You can use C<split> if the columns are separated by whitespace or
1033 some other delimiter, as long as whitespace or the delimiter cannot
1034 appear as part of the data.
1036 my $line = ' fred barney betty ';
1037 my @columns = split /\s+/, $line;
1038 # ( '', 'fred', 'barney', 'betty' );
1040 my $line = 'fred||barney||betty';
1041 my @columns = split /\|/, $line;
1042 # ( 'fred', '', 'barney', '', 'betty' );
1044 If you want to work with comma-separated values, don't do this since
1045 that format is a bit more complicated. Use one of the modules that
handle that format, such as L<Text::CSV>, L<Text::CSV_XS>, or
L<Text::CSV_PP>.
1049 If you want to break apart an entire line of fixed columns, you can use
1050 C<unpack> with the A (ASCII) format. By using a number after the format
1051 specifier, you can denote the column width. See the C<pack> and C<unpack>
1052 entries in L<perlfunc> for more details.
my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1056 Note that spaces in the format argument to C<unpack> do not denote literal
1057 spaces. If you have space separated data, you may want C<split> instead.
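Here is a minimal sketch using a made-up record of three 8-character columns. It builds the record with C<sprintf> so the widths are exact, then splits it with C<unpack>. Note that the template comes first. The C<A> format strips the trailing spaces from each field:

```perl
use strict;
use warnings;

# Made-up record: three left-justified, 8-character columns.
my $line = sprintf '%-8s%-8s%-8s', qw( fred barney betty );

my @fields = unpack( 'A8 A8 A8', $line );   # template first, then the string
print join( '|', @fields ), "\n";           # fred|barney|betty
```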
1059 =head2 How do I find the soundex value of a string?
1061 (contributed by brian d foy)
You can use the L<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the L<String::Approx>,
L<Text::Metaphone>, and L<Text::DoubleMetaphone> modules.
1067 =head2 How can I expand variables in text strings?
1069 (contributed by brian d foy)
If you can avoid it, don't. If you can use a templating system,
such as L<Text::Template> or L<Template> Toolkit, do that instead. You
might even be able to get the job done with C<sprintf> or C<printf>:
1075 my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1077 However, for the one-off simple case where I don't want to pull out a
1078 full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their values:
1084 $string = 'Say hello to $foo and $bar';
1086 One way I can do this involves the substitution operator and a double
1087 C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1089 it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1090 what's left in the string:
1092 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1094 The C</e> will also silently ignore violations of strict, replacing
1095 undefined variable names with the empty string. Since I'm using the
1096 C</e> flag (twice even!), I have all of the same security problems I
1097 have with C<eval> in its string form. If there's something odd in
1098 C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1099 I could get myself in trouble.
1101 To get around the security problem, I could also pull the values from
1102 a hash instead of evaluating variable names. Using a single C</e>, I
1103 can check the hash to ensure the value exists, and if it doesn't, I
1104 can replace the missing value with a marker, in this case C<???> to
1105 signal that I missed something:
1107 my $string = 'This has $foo and $bar';
1109 my %Replacements = (
1113 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1114 $string =~ s/\$(\w+)/
1115 exists $Replacements{$1} ? $Replacements{$1} : '???'
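Put together as a complete program, the hash approach might look like the following sketch. The keys C<foo> and C<bar> and their values are assumptions for illustration. C<$baz> has no entry, so it becomes C<???>:

```perl
use strict;
use warnings;

my $string = 'This has $foo and $bar and $baz';

my %Replacements = (
    foo => 'Fred',
    bar => 'Barney',
);

# A single /e runs the replacement side as code, but it only looks up
# $1 as a hash key, so text in $string is never evaluated.
$string =~ s/\$(\w+)/
    exists $Replacements{$1} ? $Replacements{$1} : '???'
    /xeg;

print "$string\n";   # This has Fred and Barney and ???
```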
1120 =head2 What's wrong with always quoting "$vars"?
1122 The problem is that those double-quotes force
1123 stringification--coercing numbers and references into strings--even
1124 when you don't want them to be strings. Think of it this way:
1125 double-quote expansion is used to produce new strings. If you already
1126 have a string, why do you need more?
1128 If you get used to writing odd things like these:
1131 my $new = "$old"; # BAD
1132 somefunc("$var"); # BAD
1134 You'll be in trouble. Those should (in 99.8% of the cases) be
1135 the simpler and more direct:
1141 Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:
1148 my $oref = "$aref"; # WRONG
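The following demonstration uses a hypothetical array. The copied reference still works. The quoted one is only a string that looks like C<ARRAY(0x...)>, so dereferencing it fails under C<use strict>:

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );
my $aref  = \@array;

my $copy = $aref;     # still a real array reference
my $oref = "$aref";   # only a string such as "ARRAY(0x...)"

print scalar(@$copy), "\n";                            # 3
print ref($oref) ? "reference\n" : "plain string\n";   # plain string

# Dereferencing the string dies because of strict refs.
my $ok = eval { my @elements = @$oref; 1 };
print $ok ? "dereferenced\n" : "failed: $@";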
1151 You can also get into subtle problems on those few operations in Perl
1152 that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
C<syscall()> function.
1156 Stringification also destroys arrays.
1158 my @lines = `command`;
1159 print "@lines"; # WRONG - extra blanks
1160 print @lines; # right
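The extra blanks come from interpolation, which joins the elements with C<$"> (a single space by default). Here is a quick check with made-up lines:

```perl
use strict;
use warnings;

my @lines = ( "one\n", "two\n" );

my $quoted = "@lines";          # "one\n two\n": joined with $" (a space)
my $plain  = join '', @lines;   # "one\ntwo\n": what print @lines outputs
```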
1162 =head2 Why don't my E<lt>E<lt>HERE documents work?
Here documents are found in L<perlop>. Check for these four things:
1168 =item There must be no space after the E<lt>E<lt> part.
1170 =item There (probably) should be a semicolon at the end of the opening token
1172 =item You can't (easily) have any space in front of the tag.
1174 =item There needs to be at least a line separator after the end token.
If you want to indent the text in the here document, you
can do this:
1182 (my $VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1187 But the HERE_TARGET must still be flush against the margin.
If you want that indented also, you'll have to quote
in the indentation.
1191 (my $quote = <<' FINIS') =~ s/^\s+//gm;
1192 ...we will have peace, when you and all your works have
1193 perished--and the works of your dark master to whom you
1194 would deliver us. You are a liar, Saruman, and a corrupter
1195 of men's hearts. --Theoden in /usr/src/perl/taint.c
1197 $quote =~ s/\s+--/\n--/;
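On Perl 5.26 or later, the C<< <<~ >> form of the here document removes the terminator line's indentation from every line, so the substitution trick is often unnecessary. Here is a minimal sketch with made-up text:

```perl
use v5.26;    # the <<~ here document needs Perl 5.26 or later

my $text = <<~END_OF_TEXT;
    This line loses the common indentation,
      while this one keeps two extra spaces.
    END_OF_TEXT

print $text;
```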
1199 A nice general-purpose fixer-upper function for indented here documents
1200 follows. It expects to be called with a here document as its argument.
1201 It looks to see whether each line begins with a common substring, and
1202 if so, strips that substring off. Otherwise, it takes the amount of leading
whitespace found on the first line and removes that much off each
subsequent line.
1208 my ($white, $leader); # common whitespace and common leading string
1209 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
1210 ($white, $leader) = ($2, quotemeta($1));
1212 ($white, $leader) = (/^(\s+)/, '');
1214 s/^\s*?$leader(?:$white)?//gm;
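Only part of C<fix> appears above. Assembled into one runnable piece, it might look like the sketch below. The C<sub> wrapper, the C<local $_ = shift>, the C<else> branch, and the final C<return> are assumptions about the lines not shown:

```perl
use strict;
use warnings;

# Wrapper, else branch, and return are assumed, not shown in the source.
sub fix {
    local $_ = shift;
    my ( $white, $leader );   # common whitespace and common leading string
    if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
        ( $white, $leader ) = ( $2, quotemeta($1) );
    }
    else {
        ( $white, $leader ) = ( /^(\s+)/, '' );
    }
    s/^\s*?$leader(?:$white)?//gm;
    return $_;
}

print fix("    >> first\n    >> second\n");   # "first\nsecond\n"
print fix("    one\n      two\n");            # "one\n  two\n"
```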
1218 This works with leading special strings, dynamically determined:
1220 my $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
1223 @@@ SAVEI32(runlevel);
1225 @@@ while ( op = (*op->op_ppaddr)() );
1229 MAIN_INTERPRETER_LOOP
1231 Or with a fixed amount of leading whitespace, with remaining
1232 indentation correctly preserved:
1234 my $poem = fix<<EVER_ON_AND_ON;
1235 Now far ahead the Road has gone,
1236 And I must follow, if I can,
1237 Pursuing it with eager feet,
1238 Until it joins some larger way
1239 Where many paths and errands meet.
1240 And whither then? I cannot say.
1241 --Bilbo in /usr/src/perl/pp_ctl.c
1246 =head2 What is the difference between a list and an array?
1248 (contributed by brian d foy)
1250 A list is a fixed collection of scalars. An array is a variable that
1251 holds a variable collection of scalars. An array can supply its collection
1252 for list operations, so list operations also work on arrays:
1255 ( 'dog', 'cat', 'bird' )[2,3];
1259 foreach ( qw( dog cat bird ) ) { ... }
1260 foreach ( @animals ) { ... }
1262 my @three = grep { length == 3 } qw( dog cat bird );
1263 my @three = grep { length == 3 } @animals;
1265 # supply an argument list
1266 wash_animals( qw( dog cat bird ) );
1267 wash_animals( @animals );
1269 Array operations, which change the scalars, rearrange them, or add
1270 or subtract some scalars, only work on arrays. These can't work on a
1271 list, which is fixed. Array operations include C<shift>, C<unshift>,
1272 C<push>, C<pop>, and C<splice>.
1274 An array can also change its length:
1276 $#animals = 1; # truncate to two elements
1277 $#animals = 10000; # pre-extend to 10,001 elements
1279 You can change an array element, but you can't change a list element:
1281 $animals[0] = 'Rottweiler';
1282 qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!
1284 foreach ( @animals ) {
1285 s/^d/fr/; # works fine
1288 foreach ( qw( dog cat bird ) ) {
1289 s/^d/fr/; # Error! Modification of read only value!
If the list element is itself a variable, it appears that you can
change a list element. However, the list element is the variable, not
the data. You're not changing the list element, but something the list
element refers to. The list element itself doesn't change: it's still
the same variable.
1298 You also have to be careful about context. You can assign an array to
a scalar to get the number of elements in the array. This only works
for arrays, though:
1302 my $count = @animals; # only works with arrays
1304 If you try to do the same thing with what you think is a list, you
1305 get a quite different result. Although it looks like you have a list
on the righthand side, Perl actually sees a bunch of scalars separated
by a comma:
1309 my $scalar = ( 'dog', 'cat', 'bird' ); # $scalar gets bird
1311 Since you're assigning to a scalar, the righthand side is in scalar
1312 context. The comma operator (yes, it's an operator!) in scalar
1313 context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to C<$scalar> its rightmost value. Many
1316 people mess this up because they choose a list-lookalike whose
1317 last element is also the count they expect:
1319 my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
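The following sketch shows the difference using the same animal names. Running it under C<use warnings> would also report the discarded constants on the comma-operator line. The last line shows a common idiom for counting a list: assigning it to an empty list in scalar context.

```perl
use strict;

my @animals = qw( dog cat bird );

my $count = @animals;                        # 3: an array in scalar context
my $last  = ( 'dog', 'cat', 'bird' );        # 'bird': the comma operator
my $total = () = ( 'dog', 'cat', 'bird' );   # 3: counts the list elements
```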
1321 =head2 What is the difference between $array[1] and @array[1]?
1323 (contributed by brian d foy)
1325 The difference is the sigil, that special character in front of the
1326 array name. The C<$> sigil means "exactly one item", while the C<@>
1327 sigil means "zero or more items". The C<$> gets you a single scalar,
1328 while the C<@> gets you a list.
1330 The confusion arises because people incorrectly assume that the sigil
1331 denotes the variable type.
1333 The C<$array[1]> is a single-element access to the array. It's going
1334 to return the item in index 1 (or undef if there is no item there).
1335 If you intend to get exactly one element from the array, this is the
1336 form you should use.
1338 The C<@array[1]> is an array slice, although it has only one index.
1339 You can pull out multiple elements simultaneously by specifying
1340 additional indices as a list, like C<@array[1,4,3,0]>.
1342 Using a slice on the lefthand side of the assignment supplies list
1343 context to the righthand side. This can lead to unexpected results.
1344 For instance, if you want to read a single line from a filehandle,
1345 assigning to a scalar value is fine:
1347 $array[1] = <STDIN>;
1349 However, in list context, the line input operator returns all of the
1350 lines as a list. The first line goes into C<@array[1]> and the rest
1351 of the lines mysteriously disappear:
1353 @array[1] = <STDIN>; # most likely not what you want
1355 Either the C<use warnings> pragma or the B<-w> flag will warn you when
1356 you use an array slice with a single index.
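To see the context for yourself, use a small helper that reports what C<wantarray> says. C<context> is a hypothetical name used only for this sketch. Under C<use warnings>, perl also flags the slice line with a "better written as" warning.

```perl
use strict;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? 'list' : 'scalar' }

my @array;
$array[1] = context();   # single-element access: scalar context
@array[2] = context();   # one-element slice: list context

print "$array[1] $array[2]\n";   # scalar list
```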
1358 =head2 How can I remove duplicate elements from a list or array?
1360 (contributed by brian d foy)
Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".
1365 If you don't care about the order of the elements, you could just
1366 create the hash then extract the keys. It's not important how you
create that hash: just that you use C<keys> to get the unique
elements.
1370 my %hash = map { $_, 1 } @array;
1371 # or a hash slice: @hash{ @array } = ();
1372 # or a foreach: $hash{$_} = 1 foreach ( @array );
1374 my @unique = keys %hash;
1376 If you want to use a module, try the C<uniq> function from
1377 L<List::MoreUtils>. In list context it returns the unique elements,
1378 preserving their order in the list. In scalar context, it returns the
1379 number of unique elements.
1381 use List::MoreUtils qw(uniq);
1383 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1384 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1386 You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The C<next> statement
creates the key and immediately uses its old value, which is C<undef>, so
the loop continues to the C<push>; the postincrement has already set the
value for that key to 1. The next time the loop sees that same element,
its key exists in the hash I<and> the value for that key is true (since
it's not 0 or C<undef>), so the C<next> skips that iteration and the loop
goes to the next element.
1400 next if $seen{ $elem }++;
1401 push @unique, $elem;
You can write this more briefly using a grep, which does the
same thing.
1408 my @unique = grep { ! $seen{ $_ }++ } @array;
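For example, with a made-up list, the C<grep> version keeps the first occurrence of each element in its original order:

```perl
use strict;
use warnings;

my @array = qw( b a b c a d );

my %seen;
my @unique = grep { ! $seen{ $_ }++ } @array;   # b a c d, first-seen order
```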
1410 =head2 How can I tell whether a certain element is contained in a list or array?
1412 (portions of this answer contributed by Anno Siegel and brian d foy)
1414 Hearing the word "in" is an I<in>dication that you probably should have
1415 used a hash, not a list or array, to store your data. Hashes are
1416 designed to answer this question quickly and efficiently. Arrays aren't.
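That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash. Note that smart match has been
experimental since Perl 5.18 and has since been deprecated, so new code
may prefer one of the other approaches below: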
1424 if( $item ~~ @array ) {
1425 say "The array contains $item"
1428 if( $item ~~ %hash ) {
1429 say "The hash contains $item"
1432 With earlier versions of Perl, you have to do a bit more work. If you
1433 are going to make this query many times over arbitrary string values,
1434 the fastest way is probably to invert the original array and maintain a
1435 hash whose keys are the first array's values:
1437 my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1439 for (@blues) { $is_blue{$_} = 1 }
1441 Now you can check whether C<$is_blue{$some_color}>. It might have
1442 been a good idea to keep the blues all in a hash in the first place.
1444 If the values are all small integers, you could use a simple indexed
1445 array. This kind of an array will take up less space:
1447 my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1448 my @is_tiny_prime = ();
1449 for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @is_tiny_prime[@primes] = (1) x @primes;
Now you check whether C<$is_tiny_prime[$some_number]>.
1454 If the values in question are integers instead of strings, you can save
1455 quite a lot of space by using bit strings instead:
1457 my @articles = ( 1..10, 150..2000, 2017 );
1459 for (@articles) { vec($read,$_,1) = 1 }
1461 Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1463 These methods guarantee fast individual tests but require a re-organization
1464 of the original list or array. They only pay off if you have to test
1465 multiple values against the same array.
1467 If you are testing only once, the standard module L<List::Util> exports
1468 the function C<first> for this purpose. It works by stopping once it
1469 finds the element. It's written in C for speed, and its Perl equivalent
1470 looks like this subroutine:
1475 return $_ if &{$code}();
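Only one line of that subroutine appears above. A complete pure-Perl sketch might look like the following. It is named C<my_first> (a hypothetical name) so it cannot be confused with the real C<List::Util::first>. The C<(&@)> prototype lets it take a bare block in the same way:

```perl
use strict;
use warnings;

# Hypothetical name my_first, so it doesn't clash with List::Util::first.
# The (&@) prototype lets callers pass a bare block: my_first { ... } @list
sub my_first (&@) {
    my $code = shift;
    foreach (@_) {
        return $_ if $code->();   # stop at the first element that passes
    }
    return undef;                 # nothing matched
}

my $found = my_first { $_ > 2 } 1, 2, 3, 4;   # 3
my $none  = my_first { $_ > 9 } 1, 2, 3, 4;   # undef
```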
1480 If speed is of little concern, the common idiom uses grep in scalar context
1481 (which returns the number of items that passed its condition) to traverse the
entire list. This does have the benefit of telling you how many matches it
found.
1485 my $is_there = grep $_ eq $whatever, @array;
If you want to actually extract the matching elements, simply use grep in
list context:
1490 my @matches = grep $_ eq $whatever, @array;
1492 =head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1494 Use a hash. Here's code to do both and more. It assumes that each
1495 element is unique in a given array:
1497 my (@union, @intersection, @difference);
1499 foreach my $element (@array1, @array2) { $count{$element}++ }
1500 foreach my $element (keys %count) {
1501 push @union, $element;
1502 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
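Here is a complete, runnable version of the same idea, using made-up arrays and filling in the C<%count> declaration. Sorting the keys just makes the output order predictable:

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e );

my ( @union, @intersection, @difference );
my %count = ();
foreach my $element ( @array1, @array2 ) { $count{$element}++ }
foreach my $element ( sort keys %count ) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union: @union\n";                 # a b c d e
print "intersection: @intersection\n";   # c d
print "difference: @difference\n";       # a b e
```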
1505 Note that this is the I<symmetric difference>, that is, all elements
1506 in either A or in B but not in both. Think of it as an xor operation.
1508 =head2 How do I test whether two arrays or hashes are equal?
1510 With Perl 5.10 and later, the smart match operator can give you the answer
1511 with the least amount of work:
1515 if( @array1 ~~ @array2 ) {
1516 say "The arrays are the same";
if( %hash1 ~~ %hash2 ) { # doesn't check values!
1520 say "The hash keys are the same";
1523 The following code works for single-level arrays. It uses a
1524 stringwise comparison, and does not distinguish defined versus
1525 undefined empty strings. Modify if you have other needs.
1527 $are_equal = compare_arrays(\@frogs, \@toads);
1529 sub compare_arrays {
1530 my ($first, $second) = @_;
1531 no warnings; # silence spurious -w undef complaints
1532 return 0 unless @$first == @$second;
1533 for (my $i = 0; $i < @$first; $i++) {
1534 return 0 if $first->[$i] ne $second->[$i];
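Assembled with its closing lines, C<compare_arrays> might look like the sketch below. The trailing C<return 1> is an assumption about the part not shown. The last check demonstrates the caveat mentioned above: C<undef> and the empty string compare as equal.

```perl
use strict;
use warnings;

sub compare_arrays {
    my ( $first, $second ) = @_;
    no warnings;    # silence spurious undef complaints
    return 0 unless @$first == @$second;
    for ( my $i = 0; $i < @$first; $i++ ) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;       # assumed: same length and every element matched
}

my @frogs = qw( bull tree );
my @toads = qw( bull tree );
print compare_arrays( \@frogs, \@toads ), "\n";        # 1
print compare_arrays( [ 1, 2 ], [ 1, 2, 3 ] ), "\n";   # 0
print compare_arrays( [undef], [''] ), "\n";           # 1: undef eq ''
```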
1539 For multilevel structures, you may wish to use an approach more
1540 like this one. It uses the CPAN module L<FreezeThaw>:
1542 use FreezeThaw qw(cmpStr);
1543 my @a = my @b = ( "this", "that", [ "more", "stuff" ] );
1545 printf "a and b contain %s arrays\n",
1546 cmpStr(\@a, \@b) == 0
1550 This approach also works for comparing hashes. Here we'll demonstrate
1551 two different answers:
1553 use FreezeThaw qw(cmpStr cmpStrHard);
1555 my %a = my %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1559 printf "a and b contain %s hashes\n",
1560 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1562 printf "a and b contain %s hashes\n",
1563 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
The first reports that both hashes contain the same data,
1567 while the second reports that they do not. Which you prefer is left as
1568 an exercise to the reader.
1570 =head2 How do I find the first array element for which a condition is true?
1572 To find the first array element which satisfies a condition, you can
1573 use the C<first()> function in the L<List::Util> module, which comes
with Perl 5.8. This example finds the first element that contains
"Perl":
1577 use List::Util qw(first);
1579 my $element = first { /Perl/ } @array;
1581 If you cannot use L<List::Util>, you can make your own loop to do the
same thing. Once you find the element, you stop the loop with C<last>.
1585 foreach ( @array ) {
1586 if( /Perl/ ) { $found = $_; last }
If you want the array index, use the C<firstidx()> function from
L<List::MoreUtils>:
1592 use List::MoreUtils qw(firstidx);
1593 my $index = firstidx { /Perl/ } @array;
1595 Or write it yourself, iterating through the indices
1596 and checking the array element at each index until you find one
1597 that satisfies the condition:
1599 my( $found, $index ) = ( undef, -1 );
for( my $i = 0; $i < @array; $i++ ) {
1601 if( $array[$i] =~ /Perl/ ) {
1602 $found = $array[$i];
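Here is a runnable version of that loop with a made-up array. The C<$index = $i> and C<last> inside the C<if> are filled in as the obvious completion:

```perl
use strict;
use warnings;

my @array = ( 'C', 'Perl 5', 'Python', 'Perl 4' );

my ( $found, $index ) = ( undef, -1 );
for ( my $i = 0; $i < @array; $i++ ) {
    if ( $array[$i] =~ /Perl/ ) {
        $found = $array[$i];
        $index = $i;
        last;    # stop at the first match
    }
}

print "found '$found' at index $index\n";   # found 'Perl 5' at index 1
```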
1608 =head2 How do I handle linked lists?
1610 (contributed by brian d foy)
1612 Perl's arrays do not have a fixed size, so you don't need linked lists
1613 if you just want to add or remove items. You can use array operations
such as C<push>, C<pop>, C<shift>, C<unshift>, or C<splice> to do
that.
1617 Sometimes, however, linked lists can be useful in situations where you
1618 want to "shard" an array so you have many small arrays instead of
1619 a single big array. You can keep arrays longer than Perl's largest
1620 array index, lock smaller arrays separately in threaded programs,
reallocate less memory, or quickly insert elements in the middle of
the chain.
1624 Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
1625 Linked Lists" ( L<http://www.slideshare.net/lembark/perly-linked-lists> ),
1626 although you can just use his L<LinkedList::Single> module.
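If you do want a linked list, a plain hash reference per node is enough for a sketch. This is not how L<LinkedList::Single> is implemented. The node layout and the C<insert_after> helper are hypothetical:

```perl
use strict;
use warnings;

# Each node is a hash reference holding a value and a link to the next node.
my $head = { value => 'a', next => { value => 'c', next => undef } };

# Hypothetical helper: link a new node in after $node; nothing else moves.
sub insert_after {
    my ( $node, $value ) = @_;
    $node->{next} = { value => $value, next => $node->{next} };
    return $node->{next};
}

insert_after( $head, 'b' );

# Walk the chain from the head until the link is undef.
my @values;
for ( my $node = $head; $node; $node = $node->{next} ) {
    push @values, $node->{value};
}
print "@values\n";   # a b c
```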
1628 =head2 How do I handle circular lists?
1629 X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1632 (contributed by brian d foy)
1634 If you want to cycle through an array endlessly, you can increment the
1635 index modulo the number of elements in the array:
1637 my @array = qw( a b c );
1641 print $array[ $i++ % @array ], "\n";
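Here is a bounded version of that idea, which takes seven items from a three-element array so the loop ends. The fixed count is an assumption for the sketch, since the original cycles endlessly:

```perl
use strict;
use warnings;

my @array = qw( a b c );
my $i     = 0;

my @output;
push @output, $array[ $i++ % @array ] for 1 .. 7;   # wraps around

print "@output\n";   # a b c a b c a
```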
1645 You can also use L<Tie::Cycle> to use a scalar that always has the
1646 next element of the circular array:
1650 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1652 print $cycle; # FFFFFF
1653 print $cycle; # 000000
1654 print $cycle; # FFFF00
The L<Array::Iterator::Circular> module creates an iterator object for
circular arrays:
1659 use Array::Iterator::Circular;
1661 my $color_iterator = Array::Iterator::Circular->new(
1662 qw(red green blue orange)
1665 foreach ( 1 .. 20 ) {
1666 print $color_iterator->next, "\n";
1669 =head2 How do I shuffle an array randomly?
1671 If you either have Perl 5.8.0 or later installed, or if you have
1672 Scalar-List-Utils 1.03 or later installed, you can say:
1674 use List::Util 'shuffle';
1676 @shuffled = shuffle(@list);
1678 If not, you can use a Fisher-Yates shuffle.
1680 sub fisher_yates_shuffle {
1681 my $deck = shift; # $deck is a reference to an array
1682 return unless @$deck; # must not be empty!
1686 my $j = int rand ($i+1);
1687 @$deck[$i,$j] = @$deck[$j,$i];
1691 # shuffle my mpeg collection
1693 my @mpeg = <audio/*/*.mp3>;
1694 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
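The following is a complete sketch of the Fisher-Yates shuffle. The loop that walks C<$i> down from the end of the array is written out here; those lines are reconstructed as an assumption:

```perl
use strict;
use warnings;

sub fisher_yates_shuffle {
    my $deck = shift;        # $deck is a reference to an array
    return unless @$deck;    # must not be empty!
    my $i = @$deck;
    while ( --$i ) {
        my $j = int rand( $i + 1 );            # 0 <= $j <= $i
        @$deck[ $i, $j ] = @$deck[ $j, $i ];   # swap
    }
}

my @cards = 1 .. 10;
fisher_yates_shuffle( \@cards );   # shuffles @cards in place
print "@cards\n";
```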
1697 Note that the above implementation shuffles an array in place,
1698 unlike the C<List::Util::shuffle()> which takes a list and returns
1699 a new shuffled list.
1701 You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1706 @old = 1 .. 10; # just a demo
1708 push(@new, splice(@old, rand @old, 1));
1711 This is bad because splice is already O(N), and since you do it N
1712 times, you just invented a quadratic algorithm; that is, O(N**2).
1713 This does not scale, although Perl is so efficient that you probably
1714 won't notice this until you have rather largish arrays.
1716 =head2 How do I process/modify each element of an array?
1718 Use C<for>/C<foreach>:
1721 s/foo/bar/; # change that word
1722 tr/XZ/ZX/; # swap those letters
1725 Here's another; let's compute spherical volumes:
1727 my @volumes = @radii;
1728 for (@volumes) { # @volumes has changed parts
1730 $_ *= (4/3) * 3.14159; # this will be constant folded
1733 which can also be done with C<map()> which is made to transform
1734 one list into another:
1736 my @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1738 If you want to do the same thing to modify the values of the
1739 hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
1741 case), you modify the value.
1743 for my $orbit ( values %orbits ) {
1744 ($orbit **= 3) *= (4/3) * 3.14159;
1747 Prior to perl 5.6 C<values> returned copies of the values,
1748 so older perl code often contains constructions such as
1749 C<@orbits{keys %orbits}> instead of C<values %orbits> where
1750 the hash is to be modified.
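Here is a minimal check of that aliasing with a made-up hash:

```perl
use strict;
use warnings;

my %orbits = ( mercury => 1, venus => 2, earth => 3 );

$_ *= 10 for values %orbits;   # each $_ is an alias to a hash value

print "$_ => $orbits{$_}\n" for sort keys %orbits;   # earth => 30, ...
```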
1752 =head2 How do I select a random element from an array?
1754 Use the C<rand()> function (see L<perlfunc/rand>):
1756 my $index = rand @array;
1757 my $element = $array[$index];
1761 my $element = $array[ rand @array ];
1763 =head2 How do I permute N elements of a list?
1764 X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1765 X<The Art of Computer Programming> X<Fischer-Krause>
1767 Use the L<List::Permutor> module on CPAN. If the list is actually an
1768 array, try the L<Algorithm::Permute> module (also on CPAN). It's
1769 written in XS code and is very efficient:
1771 use Algorithm::Permute;
1773 my @array = 'a'..'d';
1774 my $p_iterator = Algorithm::Permute->new ( \@array );
1776 while (my @perm = $p_iterator->next) {
1777 print "next permutation: (@perm)\n";
1780 For even faster execution, you could do:
1782 use Algorithm::Permute;
1784 my @array = 'a'..'d';
1786 Algorithm::Permute::permute {
1787 print "next permutation: (@array)\n";
1790 Here's a little program that generates all permutations of all the
1791 words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4A of
1793 Knuth's I<The Art of Computer Programming> and will work on any list:
1796 # Fischer-Krause ordered permutation generator
1801 while ( $code->(@_[@idx]) ) {
1803 --$p while $idx[$p-1] > $idx[$p];
1804 my $q = $p or return;
1805 push @idx, reverse splice @idx, $p;
1806 ++$q while $idx[$p-1] > $idx[$q];
1807 @idx[$p-1,$q]=@idx[$q,$p-1];
1811 permute { print "@_\n" } split;
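The C<permute> subroutine appears above only in part. The following runnable sketch fills in the sub declaration, the C<@idx> setup, and C<my $p = $#idx>, all reconstructed as assumptions. It collects the permutations of a fixed list instead of reading input:

```perl
use strict;
use warnings;

# Fischer-Krause ordered permutation generator (partly reconstructed).
sub permute (&@) {
    my $code = shift;
    my @idx  = 0 .. $#_;
    while ( $code->( @_[@idx] ) ) {
        my $p = $#idx;
        --$p while $idx[ $p - 1 ] > $idx[$p];     # find the descending suffix
        my $q = $p or return;                     # all descending: finished
        push @idx, reverse splice @idx, $p;       # reverse that suffix
        ++$q while $idx[ $p - 1 ] > $idx[$q];     # smallest larger element
        @idx[ $p - 1, $q ] = @idx[ $q, $p - 1 ];  # swap it into place
    }
}

my @perms;
permute { push @perms, "@_"; 1 } qw( a b c );
print "$_\n" for @perms;   # a b c, a c b, b a c, b c a, c a b, c b a
```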
1813 The L<Algorithm::Loops> module also provides the C<NextPermute> and
1814 C<NextPermuteNum> functions which efficiently find all unique permutations
1815 of an array, even if it contains duplicate values, modifying it in-place:
1816 if its elements are in reverse-sorted order then the array is reversed,
making it sorted, and it returns false; otherwise it rearranges the
array into the next permutation and returns true.
1820 C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1821 you can enumerate all the permutations of C<0..9> like this:
1823 use Algorithm::Loops qw(NextPermuteNum);
1826 do { print "@list\n" } while NextPermuteNum @list;
1828 =head2 How do I sort an array by (anything)?
1830 Supply a comparison function to sort() (described in L<perlfunc/sort>):
1832 @list = sort { $a <=> $b } @list;
1834 The default sort function is cmp, string comparison, which would
1835 sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
1836 the numerical comparison operator.
1838 If you have a complicated function needed to pull out the part you
1839 want to sort on, then don't do it inside the sort function. Pull it
1840 out first, because the sort BLOCK can be called many times for the
1841 same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively:
1848 ($item) = /\d+\s*(\S+)/;
1849 push @idx, uc($item);
1851 my @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1853 which could also be written this way, using a trick
1854 that's come to be known as the Schwartzian Transform:
1856 my @sorted = map { $_->[0] }
1857 sort { $a->[1] cmp $b->[1] }
1858 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
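Here is a sketch with made-up data showing the transform sorting on the uppercased word after the number:

```perl
use strict;
use warnings;

my @data = ( '3 pears', '1 Apple', '2 banana' );   # made-up items

my @sorted = map  { $_->[0] }                          # 3. original item
             sort { $a->[1] cmp $b->[1] }              # 2. sort on the key
             map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } # 1. item plus key
             @data;

print "$_\n" for @sorted;   # 1 Apple, 2 banana, 3 pears
```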
1860 If you need to sort on several fields, the following paradigm is useful.
1863 field1($a) <=> field1($b) ||
1864 field2($a) cmp field2($b) ||
1865 field3($a) cmp field3($b)
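Here is a complete example of that paradigm with made-up records. The C<||> falls through to the next comparison only when the previous one returns 0:

```perl
use strict;
use warnings;

# Made-up records: sort by surname, then given name, then age (numeric).
my @people = (
    { surname => 'Smith', given => 'Ann', age => 30 },
    { surname => 'Jones', given => 'Bob', age => 25 },
    { surname => 'Smith', given => 'Ann', age => 22 },
    { surname => 'Smith', given => 'Al',  age => 40 },
);

my @sorted = sort {
           $a->{surname} cmp $b->{surname}
        || $a->{given}   cmp $b->{given}
        || $a->{age}     <=> $b->{age}
} @people;

print "$_->{surname} $_->{given} $_->{age}\n" for @sorted;
```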
This can be conveniently combined with precalculation of keys as given
above.
1871 See the F<sort> article in the "Far More Than You Ever Wanted
1872 To Know" collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz> for
1873 more about this approach.
1875 See also the question later in L<perlfaq4> on sorting hashes.
1877 =head2 How do I manipulate arrays of bits?
Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.
1882 For example, you don't have to store individual bits in an array
1883 (which would mean that you're wasting a lot of space). To convert an
1884 array of bits to a string, use C<vec()> to set the right bits. This
1885 sets C<$vec> to have bit N set only if C<$ints[N]> was set:
1887 my @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1889 foreach( 0 .. $#ints ) {
1890 vec($vec,$_,1) = 1 if $ints[$_];
1893 The string C<$vec> only takes up as many bits as it needs. For
1894 instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1895 bytes to store them (not counting the scalar variable overhead).
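The following runnable sketch uses a made-up list of ten bits. Storing them takes two bytes, and checking each bit position with C<vec> recovers the indices that were set:

```perl
use strict;
use warnings;

my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 1 );   # made-up bits

# Pack the bits into a string: bit N is set only if $ints[N] is true.
my $vec = '';
foreach ( 0 .. $#ints ) {
    vec( $vec, $_, 1 ) = 1 if $ints[$_];
}

# Read back the positions of the set bits.
my @set = grep { vec( $vec, $_, 1 ) } 0 .. 8 * length($vec) - 1;

print length($vec), " bytes; set bits: @set\n";   # 2 bytes; set bits: 0 3 4 9
```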
1897 Here's how, given a vector in C<$vec>, you can get those bits into
1898 your C<@ints> array:
1900 sub bitvec_to_list {
1903 # Find null-byte density then select best algorithm
1904 if ($vec =~ tr/\0// / length $vec > 0.95) {
1908 # This method is faster with mostly null-bytes
1909 while($vec =~ /[^\0]/g ) {
1910 $i = -9 + 8 * pos $vec;
1911 push @ints, $i if vec($vec, ++$i, 1);
1912 push @ints, $i if vec($vec, ++$i, 1);
1913 push @ints, $i if vec($vec, ++$i, 1);
1914 push @ints, $i if vec($vec, ++$i, 1);
1915 push @ints, $i if vec($vec, ++$i, 1);
1916 push @ints, $i if vec($vec, ++$i, 1);
1917 push @ints, $i if vec($vec, ++$i, 1);
1918 push @ints, $i if vec($vec, ++$i, 1);
1922 # This method is a fast general algorithm
1924 my $bits = unpack "b*", $vec;
1925 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1926 push @ints, pos $bits while($bits =~ /1/g);
1932 This method gets faster the more sparse the bit vector is.
1933 (Courtesy of Tim Bunce and Winfried Koenig.)
1935 You can make the while loop a lot shorter with this suggestion
1936 from Benjamin Goldberg:
1938 while($vec =~ /[^\0]+/g ) {
1939 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1942 Or use the CPAN module L<Bit::Vector>:
1944 my $vector = Bit::Vector->new($num_of_bits);
1945 $vector->Index_List_Store(@ints);
1946 my @ints = $vector->Index_List_Read();
1948 L<Bit::Vector> provides efficient methods for bit vector, sets of
1949 small integers and "big int" math.
1951 Here's a more extensive illustration using vec():
    # vec demo
    my $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    my $is_set = vec($vector, 23, 1);
    print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use defined on scalars or
functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
in the 5.004 release or later of Perl for more detail.

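What you usually want to know is whether an aggregate has any elements, and the aggregate itself tells you that in boolean context. A minimal sketch (perl 5.22 and later reject C<defined(@array)> and C<defined(%hash)> as fatal errors, so this is also the portable form):

    use strict;
    use warnings;

    my @array;
    my %hash;

    # An aggregate in boolean context is true when it has elements.
    print "array is empty\n" unless @array;
    print "hash is empty\n"  unless %hash;

    @array = (0);          # one element, even though it is false
    $hash{key} = undef;    # one key, even though its value is undef

    print "array has elements\n" if @array;
    print "hash has keys\n"      if %hash;
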
=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

(contributed by brian d foy)

There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.

To go through all of the keys, use the C<keys> function. This extracts
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:

    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Once you have the list of keys, you can process that list before you
process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:

    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Or, you might want to only process some of the items. If you only want
to deal with the keys that start with C<text:>, you can select just
those using C<grep>:

    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }

If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:

    while( my( $key, $value ) = each( %hash ) ) {
        ...
    }

The C<each> operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the C<keys> method.

The C<each()> operator can be a bit tricky though. You can't add or
delete keys of the hash while you're using it without possibly
skipping or re-processing some pairs after Perl internally rehashes
all of the elements. Additionally, a hash has only one iterator, so if
you mix C<keys>, C<values>, or C<each> on the same hash, you risk resetting
the iterator and messing up your processing. See the C<each> entry in
L<perlfunc> for more details.

=head2 How do I merge two hashes?
X<hash> X<merge> X<slice, hash>

(contributed by brian d foy)

Before you decide to merge two hashes, you have to decide what to do
if both hashes contain keys that are the same and if you want to leave
the original hashes as they were.

If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:

    my %new_hash = %hash1; # make a copy; leave %hash1 alone

    foreach my $key2 ( keys %hash2 ) {
        if( exists $new_hash{$key2} ) {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else {
            $new_hash{$key2} = $hash2{$key2};
        }
    }

If you don't want to create a new hash, you can still use this looping
technique; just change the C<%new_hash> to C<%hash1>.

    foreach my $key2 ( keys %hash2 ) {
        if( exists $hash1{$key2} ) {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else {
            $hash1{$key2} = $hash2{$key2};
        }
    }

If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from C<%hash2> replace values from C<%hash1> when they have keys in common:

    @hash1{ keys %hash2 } = values %hash2;

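If you merge hashes often, the copy-and-check loop can be wrapped in a sub that takes the conflict policy as a code reference. This is a sketch: C<merge_hashes> and its argument order are hypothetical, not a core or CPAN function, and the sample data is made up.

    use strict;
    use warnings;

    # Hypothetical helper: merge %$right into a copy of %$left, calling
    # $on_conflict->($key, $left_value, $right_value) for shared keys
    # and storing whatever it returns.
    sub merge_hashes {
        my ( $left, $right, $on_conflict ) = @_;
        my %merged = %$left;    # copy; the caller's hash is left alone
        foreach my $key ( keys %$right ) {
            $merged{$key} = exists $merged{$key}
                ? $on_conflict->( $key, $merged{$key}, $right->{$key} )
                : $right->{$key};
        }
        return \%merged;
    }

    my %hash1 = ( apple => 1, pear => 2 );
    my %hash2 = ( pear  => 5, plum => 3 );

    # On a conflict, keep the sum of both values.
    my $merged = merge_hashes( \%hash1, \%hash2, sub { $_[1] + $_[2] } );
    # $merged is { apple => 1, pear => 7, plum => 3 }

Passing the policy in keeps the loop in one place; a "last one wins" merge is just C<sub { $_[2] }>.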
=head2 What happens if I add or remove keys from a hash while iterating over it?

(contributed by brian d foy)

The easy answer is "Don't do that!"

If you iterate through the hash with each(), you can delete the key
most recently returned without worrying about it. If you delete or add
other keys, the iterator may skip or double up on them since perl
may rearrange the hash table. See the entry for C<each()> in
L<perlfunc>.

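The safe exception can be sketched like this: deleting the key that C<each> just returned does not disturb the rest of the iteration. The squares hash is only sample data.

    use strict;
    use warnings;

    my %hash = map { $_ => $_ * $_ } 1 .. 10;

    # Deleting the key that each() just returned is safe.
    while ( my ( $key, $value ) = each %hash ) {
        delete $hash{$key} if $key % 2;    # drop the odd keys
    }

    print join( ' ', sort { $a <=> $b } keys %hash ), "\n";    # 2 4 6 8 10
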
=head2 How do I look up a hash element by value?

Create a reverse hash:

    my %by_value = reverse %by_key;
    my $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (my ($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (my ($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }

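A self-contained version of the hash-of-arrays inversion, using made-up fruit data. Because C<each> hands back pairs in no particular order, the key lists are sorted before they are compared or shown.

    use strict;
    use warnings;

    my %by_key = ( apple => 'red', cherry => 'red', banana => 'yellow' );

    # Invert the hash, keeping every key that shares a value.
    my %key_list_by_value;
    while ( my ( $key, $value ) = each %by_key ) {
        push @{ $key_list_by_value{$value} }, $key;
    }

    my @red = sort @{ $key_list_by_value{red} };
    print "red: @red\n";    # red: apple cherry
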
=head2 How can I know how many entries are in a hash?

(contributed by brian d foy)

This is very similar to "How do I process an entire hash?", also in
L<perlfaq4>, but a bit simpler in the common cases.

You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:

    my $key_count = keys %hash; # must be scalar context!

If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A C<grep> is handy:

    my $defined_value_count = grep { defined } values %hash;

You can use that same structure to count the entries any way that
you like. If you want the count of the keys with vowels in them,
you just test for that instead:

    my $vowel_count = grep { /[aeiou]/ } keys %hash;

The C<grep> in scalar context returns the count. If you want the list
of matching items, just use it in list context instead:

    my @defined_values = grep { defined } values %hash;

The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as C<each()>.

=head2 How do I sort a hash (optionally by value instead of key)?

(contributed by brian d foy)

To sort a hash, start with the keys. In this example, we give the list of
keys to the sort function which then compares them ASCIIbetically (which
might be affected by your locale settings). The output list has the keys
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.

    my @keys = sort { $a cmp $b } keys %hash;

    foreach my $key ( @keys ) {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

We could get more fancy in the C<sort()> block though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.

For instance, to make our report order case-insensitive, we use
C<lc> to lowercase the keys before comparing them:

    my @keys = sort { lc $a cmp lc $b } keys %hash;

Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
computation results.

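A sketch of the Schwartzian Transform applied to the case-insensitive sort: each key's C<lc> value is computed once, instead of on every comparison. With C<lc> the saving is small; the pattern pays off when the computed value is expensive. The sample hash is made up.

    use strict;
    use warnings;

    my %hash = ( Banana => 3, apple => 1, Cherry => 2 );

    # Read from the bottom up: pair each key with lc(key), sort on the
    # cached value, then keep only the original key.
    my @keys =
        map  { $_->[1] }                  # 3. keep just the original key
        sort { $a->[0] cmp $b->[0] }      # 2. compare the cached values
        map  { [ lc $_, $_ ] }            # 1. pair each key with lc(key)
        keys %hash;

    print "@keys\n";    # apple Banana Cherry
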
If we want to sort by the hash value instead, we use the hash key
to look it up. We still get out a list of keys, but this time they
are ordered by their value.

    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;

From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        $a cmp $b
    } keys %hash;

=head2 How can I always keep my hash sorted?
X<hash tie sort DB_File Tie::IxHash>

You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The L<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this?

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help... Here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!

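The same transitions can be checked in code. This sketch uses a two-entry hash, with C<a> true and C<d> defined but false, and records C<exists> and C<defined> for C<a> after each step:

    use strict;
    use warnings;

    my %hash = ( a => 3, d => 0 );

    undef $hash{'a'};    # the key stays, the value becomes undef
    my $after_undef = join ',', ( exists $hash{'a'} ? 1 : 0 ),
                               ( defined $hash{'a'} ? 1 : 0 );

    delete $hash{'a'};   # the whole entry is removed
    my $after_delete = join ',', ( exists $hash{'a'} ? 1 : 0 ),
                                ( defined $hash{'a'} ? 1 : 0 );

    print "after undef:  exists,defined = $after_undef\n";     # 1,0
    print "after delete: exists,defined = $after_delete\n";    # 0,0
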
=head2 Why don't my tied hashes make the defined/exists distinction?

This depends on the tied hash's implementation of EXISTS().
For example, there isn't the concept of undef with hashes
that are tied to DBM* files. It also means that exists() and
defined() do the same thing with a DBM* file, and what they
end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

(contributed by brian d foy)

You can use the C<keys> or C<values> functions to reset C<each>. To
simply reset the iterator used by C<each> without doing anything else,
use one of them in void context:

    keys %hash;   # resets iterator, nothing else.
    values %hash; # resets iterator, nothing else.

See the documentation for C<each> in L<perlfunc>.

=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    my %seen;
    for my $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    my @uniq = keys %seen;

Or more succinctly:

    my @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    my %seen;
    my $key;
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    my @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the MLDBM (which uses Data::Dumper) module from CPAN and layer
it on top of either DB_File or GDBM_File. You might also try DBM::Deep, but
it can be a bit slow.

=head2 How can I make my hash remember the order I put elements into it?

Use the L<Tie::IxHash> module from CPAN.

    use Tie::IxHash;

    tie my %myhash, 'Tie::IxHash';

    for my $i ( 0 .. 19 ) {
        $myhash{$i} = 2 * $i;
    }

    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

=head2 Why does passing a subroutine an undefined element in a hash create it?

(contributed by brian d foy)

Are you using a really old version of Perl?

Normally, accessing a hash key's value for a nonexistent key will
I<not> create the key.

    my %hash  = ();
    my $value = $hash{ 'foo' };
    print "This won't print\n" if exists $hash{ 'foo' };

Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
Since you could assign directly to C<$_[0]>, Perl had to be ready to
make that assignment so it created the hash key ahead of time:

    my_sub( $hash{ 'foo' } );
    print "This will print before 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        # $_[0] = 'bar'; # create hash key in case you do this
    }

Since Perl 5.004, however, this situation is a special case and Perl
creates the hash key only when you make the assignment:

    my_sub( $hash{ 'foo' } );
    print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        $_[0] = 'bar';
    }

However, if you want the old behavior (and think carefully about that
because it's a weird side effect), you can pass a hash slice instead.
Perl 5.004 didn't make this a special case:

    my_sub( @hash{ qw/foo/ } );

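A runnable sketch of the modern behavior. The subs C<reads_only> and C<writes> are hypothetical; only the one that assigns through C<$_[0]> ends up creating the key.

    use strict;
    use warnings;

    sub reads_only { my $copy = $_[0]; return }    # never assigns to $_[0]
    sub writes     { $_[0] = 'bar' }               # assigns through the alias

    my %hash;

    reads_only( $hash{'foo'} );
    my $created_by_read = exists $hash{'foo'} ? 1 : 0;     # 0 on 5.004+

    writes( $hash{'foo'} );
    my $created_by_write = exists $hash{'foo'} ? 1 : 0;    # 1
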
=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.

=head2 How can I use a reference as a hash key?

(contributed by brian d foy and Ben Morrow)

Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
back the reference from the stringified form, at least without doing
some extra work on your own.

Remember that the entry in the hash will still be there even if
the referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at
the same address. This will mean a new variable might accidentally
be associated with the value for an old one.

If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
L<Hash::Util::FieldHash> module. This will also handle renaming the
keys if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.

If you actually need to be able to get a real reference back from
each hash entry, you can use the L<Tie::RefHash> module, which does the
required work for you.

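A minimal L<Tie::RefHash> sketch (the module ships with Perl). The keys you get back from the tied hash are real references that you can dereference, not their stringified forms:

    use strict;
    use warnings;
    use Tie::RefHash;

    tie my %seen, 'Tie::RefHash';

    my $list = [ 1, 2, 3 ];
    $seen{$list} = 'first list';

    # The keys come back as real references, not strings.
    foreach my $key ( keys %seen ) {
        print "key is an ", ref $key, " with ", scalar @$key, " elements\n";
    }
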
=head2 How can I check if a key exists in a multilevel hash?

(contributed by brian d foy)

The trick to this problem is avoiding accidental autovivification. If
you want to check three keys deep, you might naE<0xEF>vely try this:

    my %hash;
    if( exists $hash{key1}{key2}{key3} ) {
        ...;
    }

Even though you started with a completely empty hash, after that call to
C<exists> you've created the structure you needed to check for C<key3>:

    %hash = (
              'key1' => {
                          'key2' => {}
                        }
            );

That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical C<autovivification>
pragma is available on CPAN. Now you don't add to the hash:

    {
        no autovivification;
        my %hash;
        if( exists $hash{key1}{key2}{key3} ) {
            ...;
        }
    }

The L<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
subroutine can tell you not only if the keys exist but also get the
value:

    use Data::Diver qw(Dive);

    my @exists = Dive( \%hash, qw(key1 key2 key3) );
    if( ! @exists ) {
        ...; # keys do not exist
    }
    elsif( ! defined $exists[0] ) {
        ...; # keys exist but value is undef
    }

You can easily do this yourself too by checking each level of the hash
before you move onto the next level. This is essentially what
L<Data::Diver> does for you:

    if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
        ...;
    }

    sub check_hash {
        my( $hash, @keys ) = @_;

        return unless @keys;

        foreach my $key ( @keys ) {
            return unless eval { exists $hash->{$key} };
            $hash = $hash->{$key};
        }

        return 1;
    }

=head2 How can I prevent addition of unwanted keys into a hash?

Since version 5.8.0, hashes can be I<restricted> to a fixed number
of given keys. Methods for creating and dealing with restricted hashes
are exported by the L<Hash::Util> module.

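A short sketch with C<lock_keys> and C<unlock_keys> from L<Hash::Util>; the C<%config> hash and its keys are made up for illustration. Once the keys are locked, storing into an unknown key dies, which is why that assignment is wrapped in C<eval>:

    use strict;
    use warnings;
    use Hash::Util qw(lock_keys unlock_keys);

    my %config = ( host => 'localhost', port => 80 );
    lock_keys(%config);          # only 'host' and 'port' are allowed now

    $config{port} = 8080;        # changing an allowed key is fine

    # Adding a new key (here, a typo) dies at run time.
    my $ok = eval { $config{prot} = 443; 1 };
    print "refused: $@" unless $ok;

    unlock_keys(%config);        # back to an ordinary hash
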
=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary-clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.

If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression:

    use 5.010;

    # Each test is independent, so one string can match several of them.
    say "\thas nondigits"     if $number =~ /\D/;
    say "\tis a whole number" if $number =~ /^\d+\z/;
    say "\tis an integer"     if $number =~ /^-?\d+\z/;
    say "\tis a +/- integer"  if $number =~ /^[+-]?\d+\z/;
    say "\tis a real number"  if $number =~ /^-?(?:\d+\.?|\.\d)\d*\z/;
    say "\tis a C float"
        if $number =~ /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i;

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions. Thirdly, there is L<Regexp::Common> which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.

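For example, with C<looks_like_number> from L<Scalar::Util> (the candidate strings are arbitrary). Unlike the stricter regular expressions above, it accepts forms such as C<1e10>, and since it follows perl's own idea of a number it may also accept strings like "Inf" or "NaN", so filter those separately if they matter to you.

    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    foreach my $candidate ( '42', '-3.5', '1e10', 'abc', '12abc', '' ) {
        printf "%-8s %s\n", "'$candidate'",
            looks_like_number($candidate) ? 'looks numeric' : 'does not';
    }
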
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function for converting strings to doubles (and also C<POSIX::strtol>
for longs). Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the L<FreezeThaw>
or L<Storable> modules from CPAN. Starting from Perl 5.8, L<Storable> is part
of the standard distribution. Here's one example using L<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

=head2 How do I print out or copy a recursive data structure?

The L<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The L<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Where C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };

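A sketch showing why the deep copy matters: a plain hash assignment copies only the top level, so both hashes share the inner array, while C<dclone> copies every level. The fruit data is only an example.

    use strict;
    use warnings;
    use Storable qw(dclone);

    my %oldhash = ( fruits => [ 'apple', 'pear' ] );

    my %shallow = %oldhash;                    # copies only the top level
    my %newhash = %{ dclone( \%oldhash ) };    # copies every level

    push @{ $oldhash{fruits} }, 'plum';

    # %shallow shares the inner array, so it sees 'plum'; %newhash does not.
    print scalar @{ $shallow{fruits} }, "\n";    # 3
    print scalar @{ $newhash{fruits} }, "\n";    # 2
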
=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.

=head2 How do I verify a credit card checksum?

Get the L<Business::CreditCard> module from CPAN.

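If you only need the checksum itself, the Luhn (mod 10) check that card numbers carry is short enough to sketch. C<luhn_ok> is a hypothetical name, and unlike L<Business::CreditCard> it knows nothing about card types, lengths, or prefixes:

    use strict;
    use warnings;

    # Hypothetical helper: true if the digits pass the Luhn (mod 10) check.
    sub luhn_ok {
        my $number = shift;
        $number =~ s/[\s-]//g;                 # allow spaces and dashes
        return 0 unless $number =~ /\A\d+\z/;

        my ( $sum, $double ) = ( 0, 0 );
        foreach my $char ( reverse split //, $number ) {
            my $digit = $char;
            if ($double) {                     # every second digit from the right
                $digit *= 2;
                $digit -= 9 if $digit > 9;     # same as adding its two digits
            }
            $sum   += $digit;
            $double = !$double;
        }
        return $sum % 10 == 0 ? 1 : 0;
    }

    print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
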
=head2 How do I pack arrays of doubles or floats for XS code?

The arrays.h/arrays.c code in the L<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the L<PDL> module from CPAN instead; it makes number-crunching easy.

See L<http://search.cpan.org/dist/PGPLOT> for the code.

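If you only need a flat buffer of doubles, perl's own C<pack> can build one: the C<d> template packs native doubles and C<f> native single-precision floats, so the resulting string should have the same byte layout as a C C<double[]> on the same machine. A minimal sketch; how your XS code reads the buffer (and any alignment it needs) is up to you:

    use strict;
    use warnings;

    my @doubles = ( 1.5, -2.25, 3e10 );

    # 'd*' packs every element as a native C double, back to back.
    my $packed = pack 'd*', @doubles;
    my @back   = unpack 'd*', $packed;

    printf "%d doubles in %d bytes\n", scalar @back, length $packed;
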
=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.