pod/perlfaq4.pod

=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly.  Some real numbers lose precision in the process.  This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function.  See
L<perlop/"Floating Point Arithmetic"> for more details.

        printf "%.2f", 10/3;

        my $number = sprintf "%.2f", 10/3;

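As a small illustration of the point above, the following sketch
shows a sum that is not exactly what it looks like, and how rounding
with C<sprintf> before comparing hides the difference.  It assumes a
platform with IEEE 754 doubles, which is typical but not guaranteed,
and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;

# Show more digits than the default output format does.
printf "%.17f\n", $sum;

# A direct comparison fails on IEEE 754 doubles...
print "0.1 + 0.2 is not exactly 0.3\n" if $sum != 0.3;

# ...but the values agree once rounded to two decimal places.
my $rounded = sprintf "%.2f", $sum;
print "rounded: $rounded\n";
```
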
=head2 Why is int() broken?

Your C<int()> is most probably working just fine.  It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

        print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers.  What you think of in the above as 'three' is really more like
2.9999999999999995559.

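If what you want is the nearest integer rather than truncation, one
option is to round before converting.  This sketch reuses the
expression from the example above and rounds it with C<sprintf>; the
variable names are only for illustration, and the truncated value
depends on your platform's floating-point format.

```perl
use strict;
use warnings;

my $value = 0.6/0.2 - 2;                  # usually slightly less than 1

my $truncated = int($value);              # drops the fraction (typically 0)
my $rounded   = sprintf "%.0f", $value;   # rounds to the nearest integer

print "truncated: $truncated, rounded: $rounded\n";
```
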
=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

        my $string = '0644';

        print $string + 0;  # prints 644

        print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

        %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

        chmod(     0644, $file );  # right, has leading zero
        chmod( oct(644), $file );  # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

        chmod( $ARGV[0],      $file );  # wrong, even if "0644"

        chmod( oct($ARGV[0]), $file );  # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

        printf "0%o %d", $number, $number;

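To see both readings of the same string side by side, here is a
minimal sketch.  The variable C<$mode_string> is a stand-in for a
value that would normally come from C<@ARGV>.

```perl
use strict;
use warnings;

my $mode_string = '0644';              # stand-in for a value from @ARGV

my $as_decimal = $mode_string + 0;     # leading zero ignored: 644
my $as_octal   = oct($mode_string);    # digits read as octal: 420

printf "decimal reading: %d\n", $as_decimal;
printf "octal reading:   0%o (%d in decimal)\n", $as_octal, $as_octal;
```
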
=head2 Does Perl have a round() function?  What about ceil() and floor()?  Trig functions?

Remember that C<int()> merely truncates toward 0.  For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

        printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

        use POSIX;
        $ceil   = ceil(3.5);   # 4
        $floor  = floor(3.5);  # 3

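The difference between truncating and flooring only shows up for
negative numbers.  A short sketch, with a value chosen just for the
demonstration:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my $n = -3.5;

# int() truncates toward zero, floor() rounds down, ceil() rounds up.
my @results = ( int($n), floor($n), ceil($n) );

print "int: $results[0], floor: $results[1], ceil: $results[2]\n";
# int: -3, floor: -4, ceil: -3
```
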
In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module.  With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely.  In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

        for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

        0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
        0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl.  It's the same as in C.  IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it.  Below are a
few examples of approaches to making common conversions between number
representations.  This is intended to be representational rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using Perl's built-in conversion of C<0x> notation:

        $dec = 0xDEADBEEF;

Using the C<hex> function:

        $dec = hex("DEADBEEF");

Using C<pack>:

        $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
        $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

        $hex = sprintf("%X", 3735928559); # upper case A-F
        $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

        $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new_Dec(32, -559038737);
        $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

        use Bit::Vector;
        $vec = Bit::Vector->new_Dec(33, 3735928559);
        $vec->Resize(32); # suppress leading 0 if unwanted
        $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

        $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

        $dec = oct("33653337357");

Using C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new(32);
        $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
        $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

        $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new_Dec(32, -559038737);
        $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

        $number = 0b10110110;

Using C<oct>:

        my $input = "10110110";
        $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

        $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

        $int = unpack("N", pack("B32",
                substr("0" x 32 . "11110101011011011111011101111", -32)));
        $dec = sprintf("%d", $int);

        # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
        $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

        $bin = sprintf("%b", 3735928559);

Using C<unpack>:

        $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

        use Bit::Vector;
        $vec = Bit::Vector->new_Dec(32, -559038737);
        $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

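For readers who would rather not do the exercise, one possible
approach to the remaining transformations is to go through a plain
number: parse with C<hex> or C<oct>, then format with C<sprintf>.
This is only a sketch using the built-ins shown above, with sample
values picked for the example.

```perl
use strict;
use warnings;

# Parse the source notation into a number, then format it again.
my $oct = sprintf "%o", hex "DEADBEEF";      # hex -> oct
my $hex = sprintf "%X", oct "0b10110110";    # bin -> hex
my $bin = sprintf "%b", oct "0755";          # oct -> bin

print "oct: $oct\n";    # 33653337357
print "hex: $hex\n";    # B6
print "bin: $bin\n";    # 111101101
```
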
=head2 Why doesn't & work the way I want it to?

The behavior of binary arithmetic operators depends on whether they're
used on numbers or strings.  The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>).  The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>).  Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string.  The rest arise because
the programmer says:

        if ("\020\020" & "\101\101") {
                # ...
                }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl.  You have to check the
bytes explicitly.  This test succeeds when the result contains nothing
but null bytes, that is, when the operands have no bits in common:

        if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
                # ...
                }

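The number-versus-string distinction can be sketched as follows.  The
variables are hypothetical stand-ins for values read as text, and
adding 0 is one common way to force the numeric form of the operator.
The sketch assumes the default behavior of C<&>, without the
C<bitwise> feature enabled.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");              # strings, as if read from input

# Compute the string form first: once a variable has been used as a
# number, Perl may treat it as a number in later bitwise operations.
my $string_and  = $x & $y;              # bytewise AND of "1" and "3": "1"
my $numeric_and = ($x + 0) & ($y + 0);  # numeric AND of 11 and 3: 3

print "string: $string_and, numeric: $numeric_and\n";
```
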
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from
CPAN) or the C<PDL> extension (also available from CPAN).

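For small matrices, a plain-Perl version is also possible.  The
following is a minimal sketch and not the interface of any of the
modules above: C<matrix_multiply> is a hypothetical helper that takes
two references to arrays of rows and returns a reference to the
product.

```perl
use strict;
use warnings;

# Hypothetical helper: multiply two matrices given as arrays of rows.
sub matrix_multiply {
    my ($left, $right) = @_;
    my $rows  = @$left;
    my $inner = @{ $left->[0] };
    my $cols  = @{ $right->[0] };
    die "dimension mismatch\n" unless $inner == @$right;

    my @product;
    for my $i (0 .. $rows - 1) {
        for my $j (0 .. $cols - 1) {
            my $sum = 0;
            $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $inner - 1;
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $product = matrix_multiply( [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] );
print join(" ", @$_), "\n" for @$product;
# 19 22
# 43 50
```

For anything larger or performance-sensitive, the CPAN modules are
likely the better choice.
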
=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

        @results = map { my_func($_) } @array;

For example:

        @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

        foreach $iterator (@array) {
                some_func($iterator);
                }

To call a function on each integer in a (small) range, you B<can> use:

        @results = map { some_func($_) } (5 .. 25);

but you should be aware that in this form the C<..> operator creates a
list of all integers in the range.  This can take a lot of memory for
large ranges.  Instead use:

        @results = ();
        for ($i=5; $i < 500_005; $i++) {
                push(@results, some_func($i));
                }

Since Perl 5.005, using C<..> in a C<for> loop iterates over the range
without creating the entire list first, so

        for my $i (5 .. 500_005) {
                push(@results, some_func($i));
                }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the C<Roman> module from CPAN:
http://www.cpan.org/modules/by-module/Roman

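If installing a module is not an option, the conversion is short
enough to write by hand.  This is a sketch of the usual
subtract-the-largest-value approach, not the C<Roman> module's code;
C<to_roman> is a hypothetical name, and the helper only accepts
integers from 1 to 3999.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an integer (1..3999) to a Roman numeral.
sub to_roman {
    my ($n) = @_;
    die "out of range\n" unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;

    my @table = (
        [ 1000, 'M' ], [ 900, 'CM' ], [ 500, 'D' ], [ 400, 'CD' ],
        [ 100,  'C' ], [ 90,  'XC' ], [ 50,  'L' ], [ 40,  'XL' ],
        [ 10,   'X' ], [ 9,   'IX' ], [ 5,   'V' ], [ 4,   'IV' ],
        [ 1,    'I' ],
    );

    my $roman = '';
    for my $pair (@table) {
        my ($value, $glyph) = @$pair;
        # Append the glyph as long as its value still fits.
        while ($n >= $value) {
            $roman .= $glyph;
            $n     -= $value;
        }
    }
    return $roman;
}

print to_roman(1987), "\n";   # MCMLXXXVII
```
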
=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

        BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning.  Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).  The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection at http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this.  John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN.  It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while.  If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

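The predictability mentioned above is easy to demonstrate: seeding
the generator with the same value twice gives the same sequence both
times.  The seed 42 and the array names in this sketch are arbitrary.

```perl
use strict;
use warnings;

# The same explicit seed produces the same "random" sequence.
srand(42);
my @first = map { int rand 100 } 1 .. 5;

srand(42);
my @second = map { int rand 100 } 1 .. 5;

print "first:  @first\n";
print "second: @second\n";
print "Same seed, same sequence\n" if "@first" eq "@second";
```

This is also why reseeding repeatedly inside a program tends to make
the output less random rather than more.
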
=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number between 0 and 1. From there, you shift
that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

        my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

        sub random_int_between {
                my($min, $max) = @_;
                # Assumes that the two arguments are integers themselves!
                return $min if $min == $max;
                ($min, $max) = ($max, $min)  if  $min > $max;
                return $min + int rand(1 + $max - $min);
                }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year (counting from 0
for January 1).  Without an argument C<localtime> uses the current time.

        $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

        use POSIX qw/strftime/;
        my $day_of_year  = strftime "%j", localtime;
        my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

        use POSIX qw/mktime strftime/;
        my $week_of_year = strftime "%W",
                localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

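The two ways of getting the day of the year count differently, which
is easy to overlook.  This sketch uses the same date as above
(December 18, 1987, here at noon) and compares them; the variable
names are only for illustration.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# December 18, 1987 at noon; mktime takes a zero-based month and a
# year counted from 1900.
my $epoch = mktime( 0, 0, 12, 18, 11, 87 );

my $yday        = ( localtime $epoch )[7];           # counts from 0
my $day_of_year = strftime "%j", localtime $epoch;   # counts from 001

print "localtime: $yday, strftime: $day_of_year\n";
# localtime: 351, strftime: 352
```
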
The C<Date::Calc> module provides two functions to calculate these.

        use Date::Calc qw( Day_of_Year Week_of_Year );
        my $day_of_year  = Day_of_Year(  1987, 12, 18 );
        my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

        sub get_century    {
                return int((((localtime(shift || time))[5] + 1999))/100);
                }

        sub get_millennium {
                return 1+int((((localtime(shift || time))[5] + 1899))/1000);
                }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

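As a sketch of the "store dates as a number and subtract" idea, the
following uses C<timegm> from the standard C<Time::Local> module.
The helper C<date_to_epoch> and the sample dates are made up for the
example, and the approach only covers whole calendar dates.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert a calendar date to epoch seconds at
# midnight UTC, so daylight saving time cannot skew the difference.
sub date_to_epoch {
    my ($year, $month, $day) = @_;
    return timegm( 0, 0, 0, $day, $month - 1, $year );
}

my $start = date_to_epoch( 1987, 12, 18 );
my $end   = date_to_epoch( 1988,  1, 17 );

# Numbers compare and subtract directly.
print "start is earlier\n" if $start < $end;

my $days = ( $end - $start ) / 86_400;
print "The dates are $days days apart\n";   # 30 days
```
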
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module.  Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

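For a fixed format, the split-and-convert step might look like the
following sketch.  The C<YYYY-MM-DD hh:mm:ss> layout and the variable
names are assumptions for the example; adapt the pattern to whatever
format your strings actually use.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my $stamp = '1987-12-18 14:30:00';   # assumed format: YYYY-MM-DD hh:mm:ss

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized date format: $stamp\n";

# timelocal expects a zero-based month, like localtime returns.
my $epoch = timelocal( $sec, $min, $hour, $mday, $mon - 1, $year );

print "$stamp is $epoch in epoch seconds\n";
```

The result depends on the local time zone, so no particular number is
shown here.
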
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN.  Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days.  See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

        $ perl -MDateTime -le'print DateTime->today->jd'
        2453401.5

Or the modified Julian Day

        $ perl -MDateTime -le'print DateTime->today->mjd'
        53401

Or even the day of the year (which is what some people think of as a
Julian day)

        $ perl -MDateTime -le'print DateTime->today->doy'
        31

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

        use DateTime;

        my $yesterday = DateTime->now->subtract( days => 1 );

        print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

        use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

        my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

        print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each.  For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

        # contributed by Gunnar Hjalmarsson
        use Time::Local;
        my $today = timelocal 0, 0, 12, ( localtime )[3..5];
        my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
        printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you measure the day starting at noon, and subtract 24
hours. Even if the length of the calendar day is 23 or 25 hours,
you'll still end up on the previous calendar day, although not at
noon. Since you don't care about the time, the one hour difference
doesn't matter and you end up with the previous date.

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.11, C<localtime> and C<gmtime> can handle dates past
03:14:08 January 19, 2038, when a 32-bit based time would overflow. You
still might get a warning on a 32-bit C<perl>:

        % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
        Integer overflow in hexadecimal number at -e line 1.
        Wed Nov  1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

        % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
        Thu Nov  2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape".  URL escapes are dealt
with in L<perlfaq9>.  Shell escapes with the backslash (C<\>)
character are removed with

        s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

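A short sketch of the substitution at work, including the case it
does not handle.  The sample strings and variable names are invented
for the example.

```perl
use strict;
use warnings;

# In single quotes these backslashes are literal characters.
my $escaped = 'file\ name\*.txt';
(my $unescaped = $escaped) =~ s/\\(.)/$1/g;
print "$unescaped\n";                  # file name*.txt

# A backslash followed by "n" just loses the backslash.
my $with_n = 'line\nbreak';
(my $plain = $with_n) =~ s/\\(.)/$1/g;
print "$plain\n";                      # linenbreak, not a newline
```
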
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

        s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

        my $str = 'Haarlem';   # in the Netherlands
        $str =~ tr///cs;       # Now Harlem, like in New York

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

        print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

        print "The time is ${\(scalar localtime)}.\n";

        print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

        sub timestamp { my $t = localtime; \$t }

        print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

        use Interpolation E => 'eval';
        print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

        print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated.  To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in C<$1>. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns.  For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>.  For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier.  There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting with perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

        while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
                # do something with $1
                }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you.  This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

        # $_ contains the string to parse
        # BEGIN and END are the opening and closing markers for the
        # nested text.

        @( = ('(','');
        @) = (')','');
        ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
        @$ = (eval{/$re/},$@!~/unmatched/i);
        print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

        $reversed = reverse $string;

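The scalar context matters.  In list context C<reverse()> reverses
the order of its arguments instead, so a single string comes back
unchanged.  A small sketch with an invented sample string:

```perl
use strict;
use warnings;

my $string = "Just another Perl hacker";

# Assignment to a scalar provides scalar context.
my $reversed = reverse $string;
print "$reversed\n";                    # rekcah lreP rehtona tsuJ

# print supplies list context, so the one-element list is "reversed"
# and the string is printed as it was.
print reverse($string), "\n";           # Just another Perl hacker

# Force scalar context explicitly when printing directly.
print scalar reverse($string), "\n";    # rekcah lreP rehtona tsuJ
```
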
=head2 How do I expand tabs in a string?

You can do it yourself:

        1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

        use Text::Tabs;
        @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

        use Text::Wrap;
        print wrap("\t", '  ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines.  C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>.  Formatting files can be
easily done by making a shell alias, like so:

        alias fmt="perl -i -MText::Autoformat -n0777 \
                -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

        $string = "Just another Perl Hacker";
        $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

        substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

        substr( $string, 13, 4 ) =  "Perl 5.8.0";

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself.  For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively.  These
all assume that C<$_> contains the string to be altered.

        $count = 0;
        s{((whom?)ever)}{
                ++$count == 5       # is it the 5th?
                    ? "${2}soever"  # yes, swap
                    : $1            # renege and leave it there
                }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

        $WANT = 3;
        $count = 0;
        $_ = "One fish two fish red fish blue fish";
        while (/(\w+)\s+fish\b/gi) {
                if (++$count == $WANT) {
                        print "The third fish is a $1 one.\n";
                        }
                }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

        /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency.  If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> function like so:

        $string = "ThisXlineXhasXsomeXx'sXinXit";
        $count = ($string =~ tr/X//);
        print "There are $count X characters in the string";

This is fine if you are just looking for a single character.  However,
if you are trying to count multiple character substrings within a
larger string, C<tr///> won't work.  What you can do is wrap a C<while>
loop around a global pattern match.  For example, let's count negative
integers:

        $string = "-9 55 48 -2 23 -76 4 14 -44";
        $count = 0;
        while ($string =~ /-\d+/g) { $count++ }
        print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

        $count = () = $string =~ /-\d+/g;

 807 =head2 How do I capitalize all the words on one line?
 808 X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>
 809
 810 (contributed by brian d foy)
 811
 812 Damian Conway's L<Text::Autoformat> handles all of the thinking
 813 for you.
 814
 815         use Text::Autoformat;
 816         my $x = "Dr. Strangelove or: How I Learned to Stop ".
 817           "Worrying and Love the Bomb";
 818
 819         print $x, "\n";
 820         for my $style (qw( sentence title highlight )) {
 821                 print autoformat($x, { case => $style }), "\n";
 822                 }
 823
 824 How do you want to capitalize those words?
 825
 826         FRED AND BARNEY'S LODGE        # all uppercase
 827         Fred And Barney's Lodge        # title case
 828         Fred and Barney's Lodge        # highlight case
 829
 830 It's not as easy a problem as it looks. How many words do you think
 831 are in there? Wait for it... wait for it.... If you answered 5
 832 you're right. Perl words are groups of C<\w+>, but that's not what
 833 you want to capitalize. How is Perl supposed to know not to capitalize
 834 that C<s> after the apostrophe? You could try a regular expression:
 835
 836         $string =~ s/ (
 837                                  (^\w)    #at the beginning of the line
 838                                    |      # or
 839                                  (\s\w)   #preceded by whitespace
 840                                    )
 841                                 /\U$1/xg;
 842
 843         $string =~ s/([\w']+)/\u\L$1/g;
 844
 845 Now, what if you don't want to capitalize that "and"? Just use
 846 L<Text::Autoformat> and get on with the next problem. :)
 847
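If installing a module is not an option, you can approximate the
"highlight" style by combining the C<[\w']+> substitution with a list of
words to leave in lowercase. This is only a sketch: the C<title_case>
name and the contents of C<%small_words> are assumptions for the example,
and real titles have many more special cases than this handles.

```perl
use strict;
use warnings;

# Assumed list of words that stay lowercase unless they start the title.
my %small_words = map { $_ => 1 } qw( a an and of or the to );

# Hypothetical helper: capitalize each word except the small ones.
sub title_case {
    my ($text) = @_;
    my $is_first = 1;

    # [\w']+ keeps "barney's" together as one word.
    $text =~ s{([\w']+)}{
        my $word       = lc $1;
        my $keep_lower = !$is_first && $small_words{$word};
        $is_first = 0;
        $keep_lower ? $word : ucfirst $word;
    }ge;

    return $text;
}

print title_case("fred and barney's lodge"), "\n";
```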
 848 =head2 How can I split a [character] delimited string except when inside [character]?
 849
 850 Several modules can handle this sort of parsing--C<Text::Balanced>,
 851 C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
 852
 853 Take the example case of trying to split a string that is
 854 comma-separated into its different fields. You can't use C<split(/,/)>
 855 because you shouldn't split if the comma is inside quotes.  For
 856 example, take a data line like this:
 857
 858         SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
 859
 860 Due to the restriction of the quotes, this is a fairly complex
 861 problem.  Thankfully, we have Jeffrey Friedl, author of
 862 I<Mastering Regular Expressions>, to handle these for us.  He
 863 suggests (assuming your string is contained in C<$text>):
 864
 865          @new = ();
 866          push(@new, $+) while $text =~ m{
 867                  "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
 868                 | ([^,]+),?
 869                 | ,
 870                 }gx;
 871          push(@new, undef) if substr($text,-1,1) eq ',';
 872
 873 If you want to represent quotation marks inside a
 874 quotation-mark-delimited field, escape them with backslashes (eg,
 875 C<"like \"this\"">).
 876
 877 Alternatively, the C<Text::ParseWords> module (part of the standard
 878 Perl distribution) lets you say:
 879
 880         use Text::ParseWords;
 881         @new = quotewords(",", 0, $text);
 882
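As a worked example, here is the C<Text::ParseWords> approach applied to
the sample data line shown earlier. The variable names are the ones used
in this answer; the printing code is only there to show the result.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

# A false second argument tells quotewords to strip the quotes
# from the returned fields.
my @new = quotewords(",", 0, $text);

printf "%d fields\n", scalar @new;
print "[$_]\n" for @new;
```

Note that C<quotewords> also treats single quotes as quoting characters,
so data containing apostrophes is probably better served by one of the
CSV modules.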
 883 =head2 How do I strip blank space from the beginning/end of a string?
 884
 885 (contributed by brian d foy)
 886
 887 A substitution can do this for you. For a single line, you want to
 888 replace all the leading or trailing whitespace with nothing. You
 889 can do that with a pair of substitutions.
 890
 891         s/^\s+//;
 892         s/\s+$//;
 893
 894 You can also write that as a single substitution, although it turns
 895 out the combined statement is slower than the separate ones. That
 896 might not matter to you, though.
 897
 898         s/^\s+|\s+$//g;
 899
 900 In this regular expression, the alternation matches either at the
 901 beginning or the end of the string since the alternation has a lower
 902 precedence than the anchors. With the C</g> flag, the substitution
 903 makes all possible matches, so it gets both. Remember, the trailing
 904 newline matches the C<\s+>, and the C<$> anchor can match to the
 905 physical end of the string, so the newline disappears too. Just add
 906 the newline to the output, which has the added benefit of preserving
 907 "blank" (consisting entirely of whitespace) lines which the C<^\s+>
 908 would remove all by itself.
 909
 910         while( <> )
 911                 {
 912                 s/^\s+|\s+$//g;
 913                 print "$_\n";
 914                 }
 915
 916 For a multi-line string, you can apply the regular expression
 917 to each logical line in the string by adding the C</m> flag (for
 918 "multi-line"). With the C</m> flag, the C<$> matches I<before> an
 919 embedded newline, so it doesn't remove it. It still removes the
 920 newline at the end of the string.
 921
 922         $string =~ s/^\s+|\s+$//gm;
 923
 924 Remember that lines consisting entirely of whitespace will disappear,
 925 since the first part of the alternation can match the entire string
 926 and replace it with nothing. If you need to keep embedded blank lines,
 927 you have to do a little more work. Instead of matching any whitespace
 928 (since that includes a newline), just match the other whitespace.
 929
 930         $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
 931
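If you do this in several places, you might wrap the pair of
substitutions in a subroutine. This is a minimal sketch; the name
C<trim> is an assumption for the example, and it returns a trimmed copy
rather than changing its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without leading
# or trailing whitespace, leaving the original untouched.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $raw = "   fred and barney \t\n";
print "[", trim($raw), "]\n";
```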
 932 =head2 How do I pad a string with blanks or pad a number with zeroes?
 933
 934 In the following examples, C<$pad_len> is the length to which you wish
 935 to pad the string, C<$text> or C<$num> contains the string to be padded,
 936 and C<$pad_char> contains the padding character. You can use a single
 937 character string constant instead of the C<$pad_char> variable if you
 938 know what it is in advance. And in the same way you can use an integer in
 939 place of C<$pad_len> if you know the pad length in advance.
 940
 941 The simplest method uses the C<sprintf> function. It can pad on the left
 942 or right with blanks and on the left with zeroes and it will not
 943 truncate the result. The C<pack> function can only pad strings on the
 944 right with blanks and it will truncate the result to a maximum length of
 945 C<$pad_len>.
 946
 947         # Left padding a string with blanks (no truncation):
 948         $padded = sprintf("%${pad_len}s", $text);
 949         $padded = sprintf("%*s", $pad_len, $text);  # same thing
 950
 951         # Right padding a string with blanks (no truncation):
 952         $padded = sprintf("%-${pad_len}s", $text);
 953         $padded = sprintf("%-*s", $pad_len, $text); # same thing
 954
 955         # Left padding a number with 0 (no truncation):
 956         $padded = sprintf("%0${pad_len}d", $num);
 957         $padded = sprintf("%0*d", $pad_len, $num); # same thing
 958
 959         # Right padding a string with blanks using pack (will truncate):
 960         $padded = pack("A$pad_len",$text);
 961
 962 If you need to pad with a character other than blank or zero you can use
 963 one of the following methods.  They all generate a pad string with the
 964 C<x> operator and combine that with C<$text>. These methods do
 965 not truncate C<$text>.
 966
 967 Left and right padding with any character, creating a new string:
 968
 969         $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
 970         $padded = $text . $pad_char x ( $pad_len - length( $text ) );
 971
 972 Left and right padding with any character, modifying C<$text> directly:
 973
 974         substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
 975         $text .= $pad_char x ( $pad_len - length( $text ) );
 976
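To see the difference between padding and truncation, it can help to run
the methods side by side. The values below are made up for this
illustration; C<$long> is deliberately longer than C<$pad_len>.

```perl
use strict;
use warnings;

my $pad_len  = 8;
my $text     = "Perl";
my $long     = "Programming";    # longer than $pad_len
my $pad_char = "*";

my $left   = sprintf( "%*s",  $pad_len, $text );    # blanks on the left
my $right  = sprintf( "%-*s", $pad_len, $text );    # blanks on the right
my $zeroes = sprintf( "%0*d", $pad_len, 42 );       # zeroes on the left
my $kept   = sprintf( "%*s",  $pad_len, $long );    # sprintf does not truncate
my $cut    = pack( "A$pad_len", $long );            # pack truncates to $pad_len
my $stars  = $pad_char x ( $pad_len - length( $text ) ) . $text;

print "[$_]\n" for $left, $right, $zeroes, $kept, $cut, $stars;
```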
 977 =head2 How do I extract selected columns from a string?
 978
 979 (contributed by brian d foy)
 980
 981 If you know the columns that contain the data, you can
 982 use C<substr> to extract a single column.
 983
 984         my $column = substr( $line, $start_column, $length );
 985
 986 You can use C<split> if the columns are separated by whitespace or
 987 some other delimiter, as long as whitespace or the delimiter cannot
 988 appear as part of the data.
 989
 990         my $line    = ' fred barney   betty   ';
 991         my @columns = split /\s+/, $line;
 992                 # ( '', 'fred', 'barney', 'betty' );
 993
 994         $line    = 'fred||barney||betty';
 995         @columns = split /\|/, $line;
 996                 # ( 'fred', '', 'barney', '', 'betty' );
 997
 998 If you want to work with comma-separated values, don't do this since
 999 that format is a bit more complicated. Use one of the modules that
1000 handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
1001 C<Text::CSV_PP>.
1002
1003 If you want to break apart an entire line of fixed columns, you can use
1004 C<unpack> with the A (ASCII) format. By using a number after the format
1005 specifier, you can denote the column width. See the C<pack> and C<unpack>
1006 entries in L<perlfunc> for more details.
1007
1008         my @fields = unpack( "A8 A8 A8 A16 A4", $line );
1009
1010 Note that spaces in the format argument to C<unpack> do not denote literal
1011 spaces. If you have space separated data, you may want C<split> instead.
1012
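Here is a small self-contained illustration of C<unpack> on a
fixed-width record. The record layout (two 8-character columns and one
4-character column) is invented for the example.

```perl
use strict;
use warnings;

# Build a fixed-width record: two 8-character columns and one
# 4-character column, each padded on the right with spaces.
my $line = sprintf "%-8s%-8s%-4s", "fred", "barney", "42";

# The template comes first, then the string to take apart.
# "A" strips the trailing padding from each field.
my @fields = unpack( "A8 A8 A4", $line );

print join( "|", @fields ), "\n";
```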
1013 =head2 How do I find the soundex value of a string?
1014
1015 (contributed by brian d foy)
1016
1017 You can use the C<Text::Soundex> module. If you want to do fuzzy or close
1018 matching, you might also try the C<String::Approx>,
1019 C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
1020
1021 =head2 How can I expand variables in text strings?
1022
1023 (contributed by brian d foy)
1024
1025 If you can avoid it, don't, or if you can use a templating system,
1026 such as C<Text::Template> or C<Template> Toolkit, do that instead. You
1027 might even be able to get the job done with C<sprintf> or C<printf>:
1028
1029         my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
1030
1031 However, for the one-off simple case where I don't want to pull out a
1032 full templating system, I'll use a string that has two Perl scalar
1033 variables in it. In this example, I want to expand C<$foo> and C<$bar>
1034 to their variables' values:
1035
1036         my $foo = 'Fred';
1037         my $bar = 'Barney';
1038         $string = 'Say hello to $foo and $bar';
1039
1040 One way I can do this involves the substitution operator and a double
1041 C</e> flag.  The first C</e> evaluates C<$1> on the replacement side and
1042 turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
1043 it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1044 what's left in the string:
1045
1046         $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
1047
1048 The C</e> will also silently ignore violations of strict, replacing
1049 undefined variable names with the empty string. Since I'm using the
1050 C</e> flag (twice even!), I have all of the same security problems I
1051 have with C<eval> in its string form. If there's something odd in
1052 C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
1053 I could get myself in trouble.
1054
1055 To get around the security problem, I could also pull the values from
1056 a hash instead of evaluating variable names. Using a single C</e>, I
1057 can check the hash to ensure the value exists, and if it doesn't, I
1058 can replace the missing value with a marker, in this case C<???> to
1059 signal that I missed something:
1060
1061         my $string = 'This has $foo and $bar';
1062
1063         my %Replacements = (
1064                 foo  => 'Fred',
1065                 );
1066
1067         # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1068         $string =~ s/\$(\w+)/
1069                 exists $Replacements{$1} ? $Replacements{$1} : '???'
1070                 /eg;
1071
1072         print $string;
1073
1074 =head2 What's wrong with always quoting "$vars"?
1075
1076 The problem is that those double-quotes force
1077 stringification--coercing numbers and references into strings--even
1078 when you don't want them to be strings.  Think of it this way:
1079 double-quote expansion is used to produce new strings.  If you already
1080 have a string, why do you need more?
1081
1082 If you get used to writing odd things like these:
1083
1084         print "$var";           # BAD
1085         $new = "$old";          # BAD
1086         somefunc("$var");       # BAD
1087
1088 You'll be in trouble.  Those should (in 99.8% of the cases) be
1089 the simpler and more direct:
1090
1091         print $var;
1092         $new = $old;
1093         somefunc($var);
1094
1095 Otherwise, besides slowing you down, you're going to break code when
1096 the thing in the scalar is actually neither a string nor a number, but
1097 a reference:
1098
1099         func(\@array);
1100         sub func {
1101                 my $aref = shift;
1102                 my $oref = "$aref";  # WRONG
1103                 }
1104
1105 You can also get into subtle problems on those few operations in Perl
1106 that actually do care about the difference between a string and a
1107 number, such as the magical C<++> autoincrement operator or the
1108 syscall() function.
1109
1110 Stringification also destroys arrays.
1111
1112         @lines = `command`;
1113         print "@lines";     # WRONG - extra blanks
1114         print @lines;       # right
1115
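To see why quoting a reference is a problem, consider this short
demonstration. The variable names are arbitrary; the point is that the
quoted copy is an ordinary string and can no longer be dereferenced.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3 );

my $aref = \@array;    # a real reference
my $copy = $aref;      # still a reference to the same array
my $oref = "$aref";    # just a string that looks like ARRAY(0x...)

print "copy is a reference\n"    if ref $copy;
print "oref is a plain string\n" unless ref $oref;

# Dereferencing the string dies under "use strict 'refs'".
my $ok = eval { my @elements = @$oref; 1 };
print "dereferencing the string failed\n" unless $ok;
```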
1116 =head2 Why don't my E<lt>E<lt>HERE documents work?
1117
1118 Check for these three things:
1119
1120 =over 4
1121
1122 =item There must be no space after the E<lt>E<lt> part.
1123
1124 =item There (probably) should be a semicolon at the end.
1125
1126 =item You can't (easily) have any space in front of the tag.
1127
1128 =back
1129
1130 If you want to indent the text in the here document, you
1131 can do this:
1132
1133     # all in one
1134     ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1135         your text
1136         goes here
1137     HERE_TARGET
1138
1139 But the HERE_TARGET must still be flush against the margin.
1140 If you want that indented also, you'll have to quote
1141 in the indentation.
1142
1143     ($quote = <<'    FINIS') =~ s/^\s+//gm;
1144             ...we will have peace, when you and all your works have
1145             perished--and the works of your dark master to whom you
1146             would deliver us. You are a liar, Saruman, and a corrupter
1147             of men's hearts.  --Theoden in /usr/src/perl/taint.c
1148         FINIS
1149     $quote =~ s/\s+--/\n--/;
1150
1151 A nice general-purpose fixer-upper function for indented here documents
1152 follows.  It expects to be called with a here document as its argument.
1153 It looks to see whether each line begins with a common substring, and
1154 if so, strips that substring off.  Otherwise, it takes the amount of leading
1155 whitespace found on the first line and removes that much off each
1156 subsequent line.
1157
1158     sub fix {
1159         local $_ = shift;
1160         my ($white, $leader);  # common whitespace and common leading string
1161         if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1162             ($white, $leader) = ($2, quotemeta($1));
1163         } else {
1164             ($white, $leader) = (/^(\s+)/, '');
1165         }
1166         s/^\s*?$leader(?:$white)?//gm;
1167         return $_;
1168     }
1169
1170 This works with leading special strings, dynamically determined:
1171
1172         $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
1173         @@@ int
1174         @@@ runops() {
1175         @@@     SAVEI32(runlevel);
1176         @@@     runlevel++;
1177         @@@     while ( op = (*op->op_ppaddr)() );
1178         @@@     TAINT_NOT;
1179         @@@     return 0;
1180         @@@ }
1181         MAIN_INTERPRETER_LOOP
1182
1183 Or with a fixed amount of leading whitespace, with remaining
1184 indentation correctly preserved:
1185
1186         $poem = fix<<EVER_ON_AND_ON;
1187        Now far ahead the Road has gone,
1188           And I must follow, if I can,
1189        Pursuing it with eager feet,
1190           Until it joins some larger way
1191        Where many paths and errands meet.
1192           And whither then? I cannot say.
1193                 --Bilbo in /usr/src/perl/pp_ctl.c
1194         EVER_ON_AND_ON
1195
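If you can rely on Perl 5.26 or later, the indented here-document
syntax C<< <<~ >> strips the indentation for you, so neither the
substitution nor a helper like C<fix> is needed. A minimal sketch,
wrapped in a made-up C<poem> subroutine so that the indentation is
visible:

```perl
use strict;
use warnings;
use 5.026;    # indented here documents need Perl 5.26 or later

sub poem {
    # The ~ removes the terminator's indentation from every line,
    # so any extra indentation on a line is kept.
    my $text = <<~EOT;
        Now far ahead the Road has gone,
           And I must follow, if I can,
        EOT
    return $text;
}

print poem();
```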
1196 =head1 Data: Arrays
1197
1198 =head2 What is the difference between a list and an array?
1199
1200 An array has a changeable length.  A list does not.  An array is
1201 something you can push or pop, while a list is a set of values.  Some
1202 people make the distinction that a list is a value while an array is a
1203 variable. Subroutines are passed and return lists, you put things into
1204 list context, you initialize arrays with lists, and you C<foreach()>
1205 across a list.  C<@> variables are arrays, anonymous arrays are
1206 arrays, arrays in scalar context behave like the number of elements in
1207 them, subroutines access their arguments through the array C<@_>, and
1208 C<push>/C<pop>/C<shift> only work on arrays.
1209
1210 As a side note, there's no such thing as a list in scalar context.
1211 When you say
1212
1213         $scalar = (2, 5, 7, 9);
1214
1215 you're using the comma operator in scalar context, so it uses the scalar
1216 comma operator.  There never was a list there at all! This causes the
1217 last value to be returned: 9.
1218
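The following sketch puts an array and the comma operator side by side.
The variable names are arbitrary, and the C<no warnings 'void'> is there
only because Perl otherwise warns about the constants that the comma
operator throws away.

```perl
use strict;
use warnings;

my @array = ( 2, 5, 7, 9 );

my $count   = @array;    # array in scalar context: number of elements
my ($first) = @array;    # list assignment: first element

my $last;
{
    # Perl warns about the discarded constants, so silence that here.
    no warnings 'void';
    $last = ( 2, 5, 7, 9 );    # comma operator: last value
}

print "count=$count first=$first last=$last\n";
```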
1219 =head2 What is the difference between $array[1] and @array[1]?
1220
1221 The former is a scalar value; the latter an array slice, making
1222 it a list with one (scalar) value.  You should use $ when you want a
1223 scalar value (most of the time) and @ when you want a list with one
1224 scalar value in it (very, very rarely; nearly never, in fact).
1225
1226 Sometimes it doesn't make a difference, but sometimes it does.
1227 For example, compare:
1228
1229         $good[0] = `some program that outputs several lines`;
1230
1231 with
1232
1233         @bad[0]  = `same program that outputs several lines`;
1234
1235 The C<use warnings> pragma and the B<-w> flag will warn you about these
1236 matters.
1237
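The difference is one of context: assigning to C<$good[0]> calls the
right-hand side in scalar context, while assigning to the slice
C<@bad[0]> calls it in list context, where backticks return a list of
lines and only the first one is stored. You can see the context directly
with a subroutine that reports it. The C<context> helper below is made
up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my ( @good, @bad );

$good[0] = context();    # element assignment: scalar context

{
    # This is exactly the construct that warnings complain about,
    # so the warning is switched off for the demonstration.
    no warnings 'syntax';
    @bad[0] = context();    # slice assignment: list context
}

print "good: $good[0], bad: $bad[0]\n";
```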
1238 =head2 How can I remove duplicate elements from a list or array?
1239
1240 (contributed by brian d foy)
1241
1242 Use a hash. When you think the words "unique" or "duplicated", think
1243 "hash keys".
1244
1245 If you don't care about the order of the elements, you could just
1246 create the hash then extract the keys. It's not important how you
1247 create that hash: just that you use C<keys> to get the unique
1248 elements.
1249
1250         my %hash   = map { $_, 1 } @array;
1251         # or a hash slice: @hash{ @array } = ();
1252         # or a foreach: $hash{$_} = 1 foreach ( @array );
1253
1254         my @unique = keys %hash;
1255
1256 If you want to use a module, try the C<uniq> function from
1257 C<List::MoreUtils>. In list context it returns the unique elements,
1258 preserving their order in the list. In scalar context, it returns the
1259 number of unique elements.
1260
1261         use List::MoreUtils qw(uniq);
1262
1263         my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1264         my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
1265
1266 You can also go through each element and skip the ones you've seen
1267 before. Use a hash to keep track. The first time the loop sees an
1268 element, that element has no key in C<%seen>. The C<next> statement
1269 creates the key and immediately uses its value, which is C<undef>, so
1270 the loop continues to the C<push> and increments the value for that
1271 key. The next time the loop sees that same element, its key exists in
1272 the hash I<and> the value for that key is true (since it's not 0 or
1273 C<undef>), so the C<next> skips that iteration and the loop goes to the
1274 next element.
1275
1276         my @unique = ();
1277         my %seen   = ();
1278
1279         foreach my $elem ( @array )
1280                 {
1281                 next if $seen{ $elem }++;
1282                 push @unique, $elem;
1283                 }
1284
1285 You can write this more briefly using a grep, which does the
1286 same thing.
1287
1288         my %seen = ();
1289         my @unique = grep { ! $seen{ $_ }++ } @array;
1290
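The same C<grep> idiom adapts to other notions of "duplicate" if you
change what goes into the hash key. As a sketch, this variation treats
strings that differ only in case as duplicates and keeps the first
spelling it sees; the sample data is invented.

```perl
use strict;
use warnings;

my @array = qw( Fred fred Barney FRED barney Betty );

# Key the hash on the lowercased value; the first spelling seen wins.
my %seen;
my @unique = grep { !$seen{ lc $_ }++ } @array;

print "@unique\n";
```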
1291 =head2 How can I tell whether a certain element is contained in a list or array?
1292
1293 (portions of this answer contributed by Anno Siegel and brian d foy)
1294
1295 Hearing the word "in" is an I<in>dication that you probably should have
1296 used a hash, not a list or array, to store your data.  Hashes are
1297 designed to answer this question quickly and efficiently.  Arrays aren't.
1298
1299 That being said, there are several ways to approach this.  In Perl 5.10
1300 and later, you can use the smart match operator to check that an item is
1301 contained in an array or a hash:
1302
1303         use 5.010;
1304
1305         if( $item ~~ @array )
1306                 {
1307                 say "The array contains $item"
1308                 }
1309
1310         if( $item ~~ %hash )
1311                 {
1312                 say "The hash contains $item"
1313                 }
1314
1315 With earlier versions of Perl, you have to do a bit more work. If you
1316 are going to make this query many times over arbitrary string values,
1317 the fastest way is probably to invert the original array and maintain a
1318 hash whose keys are the first array's values:
1319
1320         @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1321         %is_blue = ();
1322         for (@blues) { $is_blue{$_} = 1 }
1323
1324 Now you can check whether C<$is_blue{$some_color}>.  It might have
1325 been a good idea to keep the blues all in a hash in the first place.
1326
1327 If the values are all small integers, you could use a simple indexed
1328 array.  This kind of an array will take up less space:
1329
1330         @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1331         @is_tiny_prime = ();
1332         for (@primes) { $is_tiny_prime[$_] = 1 }
1333         # or simply  @is_tiny_prime[@primes] = (1) x @primes;
1334
1335 Now you check whether C<$is_tiny_prime[$some_number]>.
1336
1337 If the values in question are integers instead of strings, you can save
1338 quite a lot of space by using bit strings instead:
1339
1340         @articles = ( 1..10, 150..2000, 2017 );
1341         undef $read;
1342         for (@articles) { vec($read,$_,1) = 1 }
1343
1344 Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1345
1346 These methods guarantee fast individual tests but require a re-organization
1347 of the original list or array.  They only pay off if you have to test
1348 multiple values against the same array.
1349
1350 If you are testing only once, the standard module C<List::Util> exports
1351 the function C<first> for this purpose.  It works by stopping once it
1352 finds the element. It's written in C for speed, and its Perl equivalent
1353 looks like this subroutine:
1354
1355         sub first (&@) {
1356                 my $code = shift;
1357                 foreach (@_) {
1358                         return $_ if &{$code}();
1359                 }
1360                 undef;
1361         }
1362
1363 If speed is of little concern, the common idiom uses grep in scalar context
1364 (which returns the number of items that passed its condition) to traverse the
1365 entire list. This does have the benefit of telling you how many matches it
1366 found, though.
1367
1368         my $is_there = grep $_ eq $whatever, @array;
1369
1370 If you want to actually extract the matching elements, simply use grep in
1371 list context.
1372
1373         my @matches = grep $_ eq $whatever, @array;
1374
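For a one-off membership test, C<first> from C<List::Util> can be used
along these lines. This is a sketch, and the C<contains> wrapper is a
made-up name. Because C<first> returns the matching element itself, the
result is checked with C<defined> so that a false element such as C<0>
still counts as found.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper: true if $wanted is string-equal to any element.
sub contains {
    my ( $wanted, @list ) = @_;

    # first() stops at the first match and returns undef if none is found.
    return defined( first { $_ eq $wanted } @list ) ? 1 : 0;
}

my @blues = qw( azure cerulean teal );

print "teal is there\n"      if contains( 'teal', @blues );
print "crimson is missing\n" unless contains( 'crimson', @blues );
```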
1375 =head2 How do I compute the difference of two arrays?  How do I compute the intersection of two arrays?
1376
1377 Use a hash.  Here's code to do both and more.  It assumes that each
1378 element is unique in a given array:
1379
1380         @union = @intersection = @difference = ();
1381         %count = ();
1382         foreach $element (@array1, @array2) { $count{$element}++ }
1383         foreach $element (keys %count) {
1384                 push @union, $element;
1385                 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1386                 }
1387
1388 Note that this is the I<symmetric difference>, that is, all elements
1389 in either A or in B but not in both.  Think of it as an xor operation.
1390
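If you want the one-sided difference instead, that is, the elements of
the first array that do not appear in the second, a lookup hash built
from the second array is enough. A minimal sketch with invented sample
data:

```perl
use strict;
use warnings;

my @array1 = qw( a b c d );
my @array2 = qw( c d e f );

# Build a lookup table from the second array, then keep the elements
# of the first array that are not in it.
my %in_array2      = map { $_ => 1 } @array2;
my @only_in_array1 = grep { !$in_array2{$_} } @array1;

print "@only_in_array1\n";
```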
1391 =head2 How do I test whether two arrays or hashes are equal?
1392
1393 With Perl 5.10 and later, the smart match operator can give you the answer
1394 with the least amount of work:
1395
1396         use 5.010;
1397
1398         if( @array1 ~~ @array2 )
1399                 {
1400                 say "The arrays are the same";
1401                 }
1402
1403         if( %hash1 ~~ %hash2 ) # doesn't check values!
1404                 {
1405                 say "The hash keys are the same";
1406                 }
1407
1408 The following code works for single-level arrays.  It uses a
1409 stringwise comparison, and does not distinguish defined versus
1410 undefined empty strings.  Modify if you have other needs.
1411
1412         $are_equal = compare_arrays(\@frogs, \@toads);
1413
1414         sub compare_arrays {
1415                 my ($first, $second) = @_;
1416                 no warnings;  # silence spurious -w undef complaints
1417                 return 0 unless @$first == @$second;
1418                 for (my $i = 0; $i < @$first; $i++) {
1419                         return 0 if $first->[$i] ne $second->[$i];
1420                         }
1421                 return 1;
1422                 }
1423
1424 For multilevel structures, you may wish to use an approach more
1425 like this one.  It uses the CPAN module C<FreezeThaw>:
1426
1427         use FreezeThaw qw(cmpStr);
1428         @a = @b = ( "this", "that", [ "more", "stuff" ] );
1429
1430         printf "a and b contain %s arrays\n",
1431                 cmpStr(\@a, \@b) == 0
1432                 ? "the same"
1433                 : "different";
1434
1435 This approach also works for comparing hashes.  Here we'll demonstrate
1436 two different answers:
1437
1438         use FreezeThaw qw(cmpStr cmpStrHard);
1439
1440         %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1441         $a{EXTRA} = \%b;
1442         $b{EXTRA} = \%a;
1443
1444         printf "a and b contain %s hashes\n",
1445         cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1446
1447         printf "a and b contain %s hashes\n",
1448         cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1449
1450
1451 The first reports that both of the hashes contain the same data,
1452 while the second reports that they do not.  Which you prefer is left as
1453 an exercise to the reader.
1454
1455 =head2 How do I find the first array element for which a condition is true?
1456
1457 To find the first array element which satisfies a condition, you can
1458 use the C<first()> function in the C<List::Util> module, which comes
1459 with Perl 5.8. This example finds the first element that contains
1460 "Perl".
1461
1462         use List::Util qw(first);
1463
1464         my $element = first { /Perl/ } @array;
1465
1466 If you cannot use C<List::Util>, you can make your own loop to do the
1467 same thing.  Once you find the element, you stop the loop with C<last>.
1468
1469         my $found;
1470         foreach ( @array ) {
1471                 if( /Perl/ ) { $found = $_; last }
1472                 }
1473
1474 If you want the array index, you can iterate through the indices
1475 and check the array element at each index until you find one
1476 that satisfies the condition.
1477
1478         my( $found, $index ) = ( undef, -1 );
1479         for( my $i = 0; $i < @array; $i++ ) {
1480                 if( $array[$i] =~ /Perl/ ) {
1481                         $found = $array[$i];
1482                         $index = $i;
1483                         last;
1484                         }
1485                 }
1486
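If C<List::Util> is available, you can also get the index by running
C<first> over the indices instead of the elements. This is a sketch with
made-up sample data. Test the result with C<defined>, since a match at
index 0 is a false value.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ( "Python", "Learning Perl", "Ruby", "Programming Perl" );

# Search the indices rather than the elements; first() returns undef
# if no element matches.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

print defined $index
    ? "Found '$array[$index]' at index $index\n"
    : "No match\n";
```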
1487 =head2 How do I handle linked lists?
1488
1489 In general, you usually don't need a linked list in Perl, since with
1490 regular arrays, you can push and pop or shift and unshift at either
1491 end, or you can use splice to add and/or remove an arbitrary number of
1492 elements at arbitrary points.  Both pop and shift are O(1)
1493 operations on Perl's dynamic arrays.  In the absence of shifts and
1494 pops, push in general needs to reallocate on the order of every log(N)
1495 times, and unshift will need to copy pointers each time.
1496
1497 If you really, really wanted, you could use structures as described in
1498 L<perldsc> or L<perltoot> and do just what the algorithm book tells
1499 you to do.  For example, imagine a list node like this:
1500
1501         $node = {
1502                 VALUE => 42,
1503                 LINK  => undef,
1504                 };
1505
1506 You could walk the list this way:
1507
1508         print "List: ";
1509         for ($node = $head;  $node; $node = $node->{LINK}) {
1510                 print $node->{VALUE}, " ";
1511                 }
1512         print "\n";
1513
1514 You could add to the list this way:
1515
1516         my ($head, $tail);
1517         $tail = append($head, 1);       # grow a new head
1518         for $value ( 2 .. 10 ) {
1519                 $tail = append($tail, $value);
1520                 }
1521
1522         sub append {
1523                 my($list, $value) = @_;
1524                 my $node = { VALUE => $value };
1525                 if ($list) {
1526                         $node->{LINK} = $list->{LINK};
1527                         $list->{LINK} = $node;
1528                         }
1529                 else {
1530                         $_[0] = $node;      # replace caller's version
1531                         }
1532                 return $node;
1533                 }
1534
1535 But again, Perl's built-in arrays are virtually always good enough.
1536
1537 =head2 How do I handle circular lists?
1538 X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
1539 X<cycle> X<modulus>
1540
1541 (contributed by brian d foy)
1542
1543 If you want to cycle through an array endlessly, you can increment the
1544 index modulo the number of elements in the array:
1545
1546         my @array = qw( a b c );
1547         my $i = 0;
1548
1549         while( 1 ) {
1550                 print $array[ $i++ % @array ], "\n";
1551                 last if $i > 20;
1552                 }
1553
1554 You can also use C<Tie::Cycle> to use a scalar that always has the
1555 next element of the circular array:
1556
1557         use Tie::Cycle;
1558
1559         tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1560
1561         print $cycle; # FFFFFF
1562         print $cycle; # 000000
1563         print $cycle; # FFFF00
1564
1565 The C<Array::Iterator::Circular> module creates an iterator object for
1566 circular arrays:
1567
1568         use Array::Iterator::Circular;
1569
1570         my $color_iterator = Array::Iterator::Circular->new(
1571                 qw(red green blue orange)
1572                 );
1573
1574         foreach ( 1 .. 20 ) {
1575                 print $color_iterator->next, "\n";
1576                 }
1577
1578 =head2 How do I shuffle an array randomly?
1579
1580 If you have Perl 5.8.0 or later installed, or if you have
1581 Scalar-List-Utils 1.03 or later installed, you can say:
1582
1583         use List::Util 'shuffle';
1584
1585         @shuffled = shuffle(@list);
1586
1587 If not, you can use a Fisher-Yates shuffle.
1588
1589         sub fisher_yates_shuffle {
1590                 my $deck = shift;  # $deck is a reference to an array
1591                 return unless @$deck; # must not be empty!
1592
1593                 my $i = @$deck;
1594                 while (--$i) {
1595                         my $j = int rand ($i+1);
1596                         @$deck[$i,$j] = @$deck[$j,$i];
1597                         }
1598         }
1599
1600         # shuffle my mpeg collection
1601         #
1602         my @mpeg = <audio/*/*.mp3>;
1603         fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
1604         print @mpeg;
1605
1606 Note that the above implementation shuffles an array in place,
1607 unlike the C<List::Util::shuffle()> which takes a list and returns
1608 a new shuffled list.
1609
1610 You've probably seen shuffling algorithms that work using splice,
1611 randomly picking another element to swap the current element with:
1612
1613         srand;
1614         @new = ();
1615         @old = 1 .. 10;  # just a demo
1616         while (@old) {
1617                 push(@new, splice(@old, rand @old, 1));
1618                 }
1619
1620 This is bad because splice is already O(N), and since you do it N
1621 times, you just invented a quadratic algorithm; that is, O(N**2).
1622 This does not scale, although Perl is so efficient that you probably
1623 won't notice this until you have rather largish arrays.
1624
1625 =head2 How do I process/modify each element of an array?
1626
1627 Use C<for>/C<foreach>:
1628
1629         for (@lines) {
1630                 s/foo/bar/;     # change that word
1631                 tr/XZ/ZX/;      # swap those letters
1632                 }
1633
1634 Here's another; let's compute spherical volumes:
1635
1636         for (@volumes = @radii) {   # @volumes has changed parts
1637                 $_ **= 3;
1638                 $_ *= (4/3) * 3.14159;  # this will be constant folded
1639                 }
1640
1641 which can also be done with C<map()> which is made to transform
1642 one list into another:
1643
1644         @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
1645
1646 If you want to do the same thing to modify the values of the
1647 hash, you can use the C<values> function.  As of Perl 5.6
1648 the values are not copied, so if you modify C<$orbit> (in this
1649 case), you modify the value.
1650
1651         for $orbit ( values %orbits ) {
1652                 ($orbit **= 3) *= (4/3) * 3.14159;
1653                 }
1654
1655 Prior to perl 5.6 C<values> returned copies of the values,
1656 so older perl code often contains constructions such as
1657 C<@orbits{keys %orbits}> instead of C<values %orbits> where
1658 the hash is to be modified.
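
The aliasing described above can be seen in a small sketch. The
C<%orbits> contents here are made-up numbers, and the code assumes
Perl 5.6 or later:

```perl
use strict;
use warnings;

my %orbits = ( inner => 1, outer => 2 );    # hypothetical values

# Each $orbit is an alias to the stored value, so the hash itself changes.
for my $orbit ( values %orbits ) {
    $orbit *= 10;
}
print "$_ => $orbits{$_}\n" for sort keys %orbits;

# The older idiom: the elements of a hash slice are aliases too.
for my $orbit ( @orbits{ keys %orbits } ) {
    $orbit += 1;
}
print "$_ => $orbits{$_}\n" for sort keys %orbits;
```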
1659
1660 =head2 How do I select a random element from an array?
1661
1662 Use the C<rand()> function (see L<perlfunc/rand>):
1663
1664         $index   = rand @array;
1665         $element = $array[$index];
1666
1667 Or, simply:
1668
1669         my $element = $array[ rand @array ];
1670
1671 =head2 How do I permute N elements of a list?
1672 X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
1673 X<The Art of Computer Programming> X<Fischer-Krause>
1674
1675 Use the C<List::Permutor> module on CPAN. If the list is actually an
1676 array, try the C<Algorithm::Permute> module (also on CPAN). It's
1677 written in XS code and is very efficient:
1678
1679         use Algorithm::Permute;
1680
1681         my @array = 'a'..'d';
1682         my $p_iterator = Algorithm::Permute->new ( \@array );
1683
1684         while (my @perm = $p_iterator->next) {
1685                 print "next permutation: (@perm)\n";
1686                 }
1687
1688 For even faster execution, you could do:
1689
1690         use Algorithm::Permute;
1691
1692         my @array = 'a'..'d';
1693
1694         Algorithm::Permute::permute {
1695                 print "next permutation: (@array)\n";
1696                 } @array;
1697
1698 Here's a little program that generates all permutations of all the
1699 words on each line of input. The algorithm embodied in the
1700 C<permute()> function is discussed in Volume 4 of
1701 Knuth's I<The Art of Computer Programming> and will work on any list:
1702
1703         #!/usr/bin/perl -n
1704         # Fischer-Krause ordered permutation generator
1705
1706         sub permute (&@) {
1707                 my $code = shift;
1708                 my @idx = 0..$#_;
1709                 while ( $code->(@_[@idx]) ) {
1710                         my $p = $#idx;
1711                         --$p while $idx[$p-1] > $idx[$p];
1712                         my $q = $p or return;
1713                         push @idx, reverse splice @idx, $p;
1714                         ++$q while $idx[$p-1] > $idx[$q];
1715                         @idx[$p-1,$q]=@idx[$q,$p-1];
1716                 }
1717         }
1718
1719         permute { print "@_\n" } split;
1720
1721 The C<Algorithm::Loops> module also provides the C<NextPermute> and
1722 C<NextPermuteNum> functions which efficiently find all unique permutations
1723 of an array, even if it contains duplicate values, modifying it in-place:
1724 if its elements are in reverse-sorted order then the array is reversed,
1725 making it sorted, and it returns false; otherwise the next
1726 permutation is returned.
1727
1728 C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
1729 you can enumerate all the permutations of C<0..9> like this:
1730
1731         use Algorithm::Loops qw(NextPermuteNum);
1732
1733         my @list = 0..9;
1734         do { print "@list\n" } while NextPermuteNum @list;
1735
1736 =head2 How do I sort an array by (anything)?
1737
1738 Supply a comparison function to sort() (described in L<perlfunc/sort>):
1739
1740         @list = sort { $a <=> $b } @list;
1741
1742 The default sort function is cmp, string comparison, which would
1743 sort C<(1, 2, 10)> into C<(1, 10, 2)>.  C<< <=> >>, used above, is
1744 the numerical comparison operator.
1745
1746 If you have a complicated function needed to pull out the part you
1747 want to sort on, then don't do it inside the sort function.  Pull it
1748 out first, because the sort BLOCK can be called many times for the
1749 same element.  Here's an example of how to pull out the first word
1750 after the first number on each item, and then sort those words
1751 case-insensitively.
1752
1753         @idx = ();
1754         for (@data) {
1755                 ($item) = /\d+\s*(\S+)/;
1756                 push @idx, uc($item);
1757                 }
1758         @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1759
1760 which could also be written this way, using a trick
1761 that's come to be known as the Schwartzian Transform:
1762
1763         @sorted = map  { $_->[0] }
1764                 sort { $a->[1] cmp $b->[1] }
1765                 map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
1766
1767 If you need to sort on several fields, the following paradigm is useful.
1768
1769         @sorted = sort {
1770                 field1($a) <=> field1($b) ||
1771                 field2($a) cmp field2($b) ||
1772                 field3($a) cmp field3($b)
1773                 } @data;
1774
1775 This can be conveniently combined with precalculation of keys as given
1776 above.
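
As a rough sketch of that combination, the following computes each
key once and then sorts on two fields. The C<"name:age"> record
format and the sample data are assumptions made purely for
illustration:

```perl
use strict;
use warnings;

# Made-up records of the form "name:age".
my @data = ( 'bob:25', 'Alice:30', 'carol:25', 'Dave:30' );

my @sorted =
    map  { $_->[0] }                    # 3. recover the original item
    sort {
        $a->[1] <=> $b->[1]             # 2a. age, numerically
            ||
        $a->[2] cmp $b->[2]             # 2b. then name, ignoring case
    }
    map  {
        my ( $name, $age ) = split /:/;
        [ $_, $age, lc $name ];         # 1. compute each key only once
    } @data;

print "$_\n" for @sorted;
```

Each record is split exactly once, however many times the sort block
ends up comparing it.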
1777
1778 See the F<sort> article in the "Far More Than You Ever Wanted
1779 To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
1780 more about this approach.
1781
1782 See also the question later in L<perlfaq4> on sorting hashes.
1783
1784 =head2 How do I manipulate arrays of bits?
1785
1786 Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1787 operations.
1788
1789 For example, you don't have to store individual bits in an array
1790 (which would mean that you're wasting a lot of space). To convert an
1791 array of bits to a string, use C<vec()> to set the right bits. This
1792 sets C<$vec> to have bit N set only if C<$ints[N]> was set:
1793
1794         @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
1795         $vec = '';
1796         foreach( 0 .. $#ints ) {
1797                 vec($vec,$_,1) = 1 if $ints[$_];
1798                 }
1799
1800 The string C<$vec> only takes up as many bits as it needs. For
1801 instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
1802 bytes to store them (not counting the scalar variable overhead).
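
A quick way to convince yourself of the space saving is to check the
string length directly. This sketch uses a made-up set of 16 flags and
assigns every bit explicitly, so the string always covers the full
range even when the trailing flags are zero:

```perl
use strict;
use warnings;

my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );  # 16 flags

my $vec = '';
vec( $vec, $_, 1 ) = $ints[$_] ? 1 : 0 for 0 .. $#ints;

printf "%d flags packed into %d bytes\n", scalar @ints, length $vec;
print  "as a bit string: ", unpack( "b*", $vec ), "\n";
```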
1803
1804 Here's how, given a vector in C<$vec>, you can get those bits into
1805 your C<@ints> array:
1806
1807         sub bitvec_to_list {
1808                 my $vec = shift;
1809                 my @ints;
1810                 # Find null-byte density then select best algorithm
1811                 if ($vec =~ tr/\0// / length $vec > 0.95) {
1812                         use integer;
1813                         my $i;
1814
1815                         # This method is faster with mostly null-bytes
1816                         while($vec =~ /[^\0]/g ) {
1817                                 $i = -9 + 8 * pos $vec;
1818                                 push @ints, $i if vec($vec, ++$i, 1);
1819                                 push @ints, $i if vec($vec, ++$i, 1);
1820                                 push @ints, $i if vec($vec, ++$i, 1);
1821                                 push @ints, $i if vec($vec, ++$i, 1);
1822                                 push @ints, $i if vec($vec, ++$i, 1);
1823                                 push @ints, $i if vec($vec, ++$i, 1);
1824                                 push @ints, $i if vec($vec, ++$i, 1);
1825                                 push @ints, $i if vec($vec, ++$i, 1);
1826                                 }
1827                         }
1828                 else {
1829                         # This method is a fast general algorithm
1830                         use integer;
1831                         my $bits = unpack "b*", $vec;
1832                         push @ints, 0 if $bits =~ s/^(\d)// && $1;
1833                         push @ints, pos $bits while($bits =~ /1/g);
1834                         }
1835
1836                 return \@ints;
1837                 }
1838
1839 This method gets faster the more sparse the bit vector is.
1840 (Courtesy of Tim Bunce and Winfried Koenig.)
1841
1842 You can make the while loop a lot shorter with this suggestion
1843 from Benjamin Goldberg:
1844
1845         while($vec =~ /[^\0]+/g ) {
1846                 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1847                 }
1848
1849 Or use the CPAN module C<Bit::Vector>:
1850
1851         $vector = Bit::Vector->new($num_of_bits);
1852         $vector->Index_List_Store(@ints);
1853         @ints = $vector->Index_List_Read();
1854
1855 C<Bit::Vector> provides efficient methods for bit vectors, sets of
1856 small integers, and "big int" math.
1857
1858 Here's a more extensive illustration using C<vec()>:
1859
1860         # vec demo
1861         $vector = "\xff\x0f\xef\xfe";
1862         print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1863         unpack("N", $vector), "\n";
1864         $is_set = vec($vector, 23, 1);
1865         print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1866         pvec($vector);
1867
1868         set_vec(1,1,1);
1869         set_vec(3,1,1);
1870         set_vec(23,1,1);
1871
1872         set_vec(3,1,3);
1873         set_vec(3,2,3);
1874         set_vec(3,4,3);
1875         set_vec(3,4,7);
1876         set_vec(3,8,3);
1877         set_vec(3,8,7);
1878
1879         set_vec(0,32,17);
1880         set_vec(1,32,17);
1881
1882         sub set_vec {
1883                 my ($offset, $width, $value) = @_;
1884                 my $vector = '';
1885                 vec($vector, $offset, $width) = $value;
1886                 print "offset=$offset width=$width value=$value\n";
1887                 pvec($vector);
1888                 }
1889
1890         sub pvec {
1891                 my $vector = shift;
1892                 my $bits = unpack("b*", $vector);
1893                 my $i = 0;
1894                 my $BASE = 8;
1895
1896                 print "vector length in bytes: ", length($vector), "\n";
1897                 @bytes = unpack("A8" x length($vector), $bits);
1898                 print "bits are: @bytes\n\n";
1899                 }
1900
1901 =head2 Why does defined() return true on empty arrays and hashes?
1902
1903 The short story is that you should probably only use defined on scalars or
1904 functions, not on aggregates (arrays and hashes).  See L<perlfunc/defined>
1905 in the 5.004 release or later of Perl for more detail.
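
If what you really want to know is whether an aggregate holds
anything, the usual idiom is to evaluate it in boolean context rather
than to call C<defined>. A brief sketch, with made-up variable names:

```perl
use strict;
use warnings;

my @queue = ();
my %seen  = ();

# An array in boolean context yields its element count.
print "queue is empty\n" unless @queue;

# A hash in boolean context is true only if it has at least one key.
print "nothing seen yet\n" unless %seen;

push @queue, 'job1';
$seen{job1} = 1;

print "queue has ", scalar(@queue), " item(s)\n" if @queue;
print "seen has ", scalar( keys %seen ), " key(s)\n" if %seen;
```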
1906
1907 =head1 Data: Hashes (Associative Arrays)
1908
1909 =head2 How do I process an entire hash?
1910
1911 (contributed by brian d foy)
1912
1913 There are a couple of ways that you can process an entire hash. You
1914 can get a list of keys, then go through each key, or grab one
1915 key-value pair at a time.
1916
1917 To go through all of the keys, use the C<keys> function. This extracts
1918 all of the keys of the hash and gives them back to you as a list. You
1919 can then get the value through the particular key you're processing:
1920
1921         foreach my $key ( keys %hash ) {
1922                 my $value = $hash{$key};
1923                 ...
1924                 }
1925
1926 Once you have the list of keys, you can process that list before you
1927 process the hash elements. For instance, you can sort the keys so you
1928 can process them in lexical order:
1929
1930         foreach my $key ( sort keys %hash ) {
1931                 my $value = $hash{$key};
1932                 ...
1933                 }
1934
1935 Or, you might want to only process some of the items. If you only want
1936 to deal with the keys that start with C<text:>, you can select just
1937 those using C<grep>:
1938
1939         foreach my $key ( grep /^text:/, keys %hash ) {
1940                 my $value = $hash{$key};
1941                 ...
1942                 }
1943
1944 If the hash is very large, you might not want to create a long list of
1945 keys. To save some memory, you can grab one key-value pair at a time using
1946 C<each()>, which returns a pair you haven't seen yet:
1947
1948         while( my( $key, $value ) = each( %hash ) ) {
1949                 ...
1950                 }
1951
1952 The C<each> operator returns the pairs in apparently random order, so if
1953 ordering matters to you, you'll have to stick with the C<keys> method.
1954
1955 The C<each()> operator can be a bit tricky though. You can't add or
1956 delete keys of the hash while you're using it without possibly
1957 skipping or re-processing some pairs after Perl internally rehashes
1958 all of the elements. Additionally, a hash has only one iterator, so if
1959 you use C<keys>, C<values>, or C<each> on the same hash, you can reset
1960 the iterator and mess up your processing. See the C<each> entry in
1961 L<perlfunc> for more details.
1962
1963 =head2 How do I merge two hashes?
1964 X<hash> X<merge> X<slice, hash>
1965
1966 (contributed by brian d foy)
1967
1968 Before you decide to merge two hashes, you have to decide what to do
1969 if both hashes contain keys that are the same and if you want to leave
1970 the original hashes as they were.
1971
1972 If you want to preserve the original hashes, copy one hash (C<%hash1>)
1973 to a new hash (C<%new_hash>), then add the keys from the other hash
1974 (C<%hash2>) to the new hash. Checking that the key already exists in
1975 C<%new_hash> gives you a chance to decide what to do with the
1976 duplicates:
1977
1978         my %new_hash = %hash1; # make a copy; leave %hash1 alone
1979
1980         foreach my $key2 ( keys %hash2 )
1981                 {
1982                 if( exists $new_hash{$key2} )
1983                         {
1984                         warn "Key [$key2] is in both hashes!";
1985                         # handle the duplicate (perhaps only warning)
1986                         ...
1987                         next;
1988                         }
1989                 else
1990                         {
1991                         $new_hash{$key2} = $hash2{$key2};
1992                         }
1993                 }
1994
1995 If you don't want to create a new hash, you can still use this looping
1996 technique; just change the C<%new_hash> to C<%hash1>.
1997
1998         foreach my $key2 ( keys %hash2 )
1999                 {
2000                 if( exists $hash1{$key2} )
2001                         {
2002                         warn "Key [$key2] is in both hashes!";
2003                         # handle the duplicate (perhaps only warning)
2004                         ...
2005                         next;
2006                         }
2007                 else
2008                         {
2009                         $hash1{$key2} = $hash2{$key2};
2010                         }
2011                 }
2012
2013 If you don't care that one hash overwrites keys and values from the other, you
2014 could just use a hash slice to add one hash to another. In this case, values
2015 from C<%hash2> replace values from C<%hash1> when they have keys in common:
2016
2017         @hash1{ keys %hash2 } = values %hash2;
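
If you want the same "last one wins" behaviour but would rather leave
both originals untouched, flattening the two hashes into a new one is a
common alternative. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my %hash1 = ( apple  => 1,  banana => 2 );
my %hash2 = ( banana => 20, cherry => 30 );

# Later pairs win, so %hash2's values replace %hash1's on shared keys.
my %merged = ( %hash1, %hash2 );

print "$_ => $merged{$_}\n" for sort keys %merged;
```

Swap the order inside the parentheses if C<%hash1> should take
precedence instead.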
2018
2019 =head2 What happens if I add or remove keys from a hash while iterating over it?
2020
2021 (contributed by brian d foy)
2022
2023 The easy answer is "Don't do that!"
2024
2025 If you iterate through the hash with each(), you can delete the key
2026 most recently returned without worrying about it.  If you delete or add
2027 other keys, the iterator may skip or double up on them since perl
2028 may rearrange the hash table.  See the
2029 entry for C<each()> in L<perlfunc>.
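
One way to sidestep the problem is to loop over a list of keys taken
up front instead of using C<each()>. This is a sketch with a made-up
C<%stock> hash, not the only possible approach:

```perl
use strict;
use warnings;

my %stock = ( apples => 0, pears => 4, plums => 0, figs => 7 );

# keys() builds the complete list before the loop body runs, so
# deleting from (or adding to) %stock inside the loop is safe.
for my $item ( keys %stock ) {
    delete $stock{$item} if $stock{$item} == 0;
}

print join( ", ", sort keys %stock ), "\n";
```

The cost is the memory for the temporary key list, which is the
trade-off against C<each()>.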
2030
2031 =head2 How do I look up a hash element by value?
2032
2033 Create a reverse hash:
2034
2035         %by_value = reverse %by_key;
2036         $key = $by_value{$value};
2037
2038 That's not particularly efficient.  It would be more space-efficient
2039 to use:
2040
2041         while (($key, $value) = each %by_key) {
2042                 $by_value{$value} = $key;
2043                 }
2044
2045 If your hash could have repeated values, the methods above will only find
2046 one of the associated keys.   This may or may not worry you.  If it does
2047 worry you, you can always reverse the hash into a hash of arrays instead:
2048
2049         while (($key, $value) = each %by_key) {
2050                 push @{$key_list_by_value{$value}}, $key;
2051                 }
2052
2053 =head2 How can I know how many entries are in a hash?
2054
2055 (contributed by brian d foy)
2056
2057 This is very similar to "How do I process an entire hash?", also in
2058 L<perlfaq4>, but a bit simpler in the common cases.
2059
2060 You can use the C<keys()> built-in function in scalar context to find out
2061 how many entries you have in a hash:
2062
2063         my $key_count = keys %hash; # must be scalar context!
2064
2065 If you want to find out how many entries have a defined value, that's
2066 a bit different. You have to check each value. A C<grep> is handy:
2067
2068         my $defined_value_count = grep { defined } values %hash;
2069
2070 You can use that same structure to count the entries any way that
2071 you like. If you want the count of the keys with vowels in them,
2072 you just test for that instead:
2073
2074         my $vowel_count = grep { /[aeiou]/ } keys %hash;
2075
2076 The C<grep> in scalar context returns the count. If you want the list
2077 of matching items, just use it in list context instead:
2078
2079         my @defined_values = grep { defined } values %hash;
2080
2081 The C<keys()> function also resets the iterator, which means that you may
2082 see strange results if you use this between uses of other hash operators
2083 such as C<each()>.
2084
2085 =head2 How do I sort a hash (optionally by value instead of key)?
2086
2087 (contributed by brian d foy)
2088
2089 To sort a hash, start with the keys. In this example, we give the list of
2090 keys to the sort function which then compares them ASCIIbetically (which
2091 might be affected by your locale settings). The output list has the keys
2092 in ASCIIbetical order. Once we have the keys, we can go through them to
2093 create a report which lists the keys in ASCIIbetical order.
2094
2095         my @keys = sort { $a cmp $b } keys %hash;
2096
2097         foreach my $key ( @keys )
2098                 {
2099                 printf "%-20s %6d\n", $key, $hash{$key};
2100                 }
2101
2102 We could get more fancy in the C<sort()> block though. Instead of
2103 comparing the keys, we can compute a value with them and use that
2104 value as the comparison.
2105
2106 For instance, to make our report order case-insensitive, we use
2107 the C<\L> sequence in a double-quoted string to make everything
2108 lowercase. The C<sort()> block then compares the lowercased
2109 values to determine in which order to put the keys.
2110
2111         my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
2112
2113 Note: if the computation is expensive or the hash has many elements,
2114 you may want to look at the Schwartzian Transform to cache the
2115 computation results.
2116
2117 If we want to sort by the hash value instead, we use the hash key
2118 to look it up. We still get out a list of keys, but this time they
2119 are ordered by their value.
2120
2121         my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
2122
2123 From there we can get more complex. If the hash values are the same,
2124 we can provide a secondary sort on the hash key.
2125
2126         my @keys = sort {
2127                 $hash{$a} <=> $hash{$b}
2128                         or
2129                 "\L$a" cmp "\L$b"
2130                 } keys %hash;
2131
2132 =head2 How can I always keep my hash sorted?
2133 X<hash tie sort DB_File Tie::IxHash>
2134
2135 You can look into using the C<DB_File> module and C<tie()> using the
2136 C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
2137 Databases">. The C<Tie::IxHash> module from CPAN might also be
2138 instructive. Although this does keep your hash sorted, you might not
2139 like the slowdown you suffer from the tie interface. Are you sure you
2140 need to do this? :)
2141
2142 =head2 What's the difference between "delete" and "undef" with hashes?
2143
2144 Hashes contain pairs of scalars: the first is the key, the
2145 second is the value.  The key will be coerced to a string,
2146 although the value can be any kind of scalar: string,
2147 number, or reference.  If a key C<$key> is present in
2148 C<%hash>, C<exists($hash{$key})> will return true.  The value
2149 for a given key can be C<undef>, in which case
2150 C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
2151 will return true.  This corresponds to (C<$key>, C<undef>)
2152 being in the hash.
2153
2154 Pictures help...  Here's the C<%hash> table:
2155
2156           keys  values
2157         +------+------+
2158         |  a   |  3   |
2159         |  x   |  7   |
2160         |  d   |  0   |
2161         |  e   |  2   |
2162         +------+------+
2163
2164 And these conditions hold
2165
2166         $hash{'a'}                       is true
2167         $hash{'d'}                       is false
2168         defined $hash{'d'}               is true
2169         defined $hash{'a'}               is true
2170         exists $hash{'a'}                is true (Perl 5 only)
2171         grep ($_ eq 'a', keys %hash)     is true
2172
2173 If you now say
2174
2175         undef $hash{'a'}
2176
2177 your table now reads:
2178
2179
2180           keys  values
2181         +------+------+
2182         |  a   | undef|
2183         |  x   |  7   |
2184         |  d   |  0   |
2185         |  e   |  2   |
2186         +------+------+
2187
2188 and these conditions now hold; changes in caps:
2189
2190         $hash{'a'}                       is FALSE
2191         $hash{'d'}                       is false
2192         defined $hash{'d'}               is true
2193         defined $hash{'a'}               is FALSE
2194         exists $hash{'a'}                is true (Perl 5 only)
2195         grep ($_ eq 'a', keys %hash)     is true
2196
2197 Notice the last two: you have an undef value, but a defined key!
2198
2199 Now, consider this:
2200
2201         delete $hash{'a'}
2202
2203 your table now reads:
2204
2205           keys  values
2206         +------+------+
2207         |  x   |  7   |
2208         |  d   |  0   |
2209         |  e   |  2   |
2210         +------+------+
2211
2212 and these conditions now hold; changes in caps:
2213
2214         $hash{'a'}                       is false
2215         $hash{'d'}                       is false
2216         defined $hash{'d'}               is true
2217         defined $hash{'a'}               is false
2218         exists $hash{'a'}                is FALSE (Perl 5 only)
2219         grep ($_ eq 'a', keys %hash)     is FALSE
2220
2221 See, the whole entry is gone!
2222
2223 =head2 Why don't my tied hashes make the defined/exists distinction?
2224
2225 This depends on the tied hash's implementation of EXISTS().
2226 For example, there isn't the concept of undef with hashes
2227 that are tied to DBM* files. It also means that exists() and
2228 defined() do the same thing with a DBM* file, and what they
2229 end up doing is not what they do with ordinary hashes.
2230
2231 =head2 How do I reset an each() operation part-way through?
2232
2233 (contributed by brian d foy)
2234
2235 You can use the C<keys> or C<values> functions to reset C<each>. To
2236 simply reset the iterator used by C<each> without doing anything else,
2237 use one of them in void context:
2238
2239         keys %hash; # resets iterator, nothing else.
2240         values %hash; # resets iterator, nothing else.
2241
2242 See the documentation for C<each> in L<perlfunc>.
2243
2244 =head2 How can I get the unique keys from two hashes?
2245
2246 First you extract the keys from the hashes into lists, then solve
2247 the "removing duplicates" problem described above.  For example:
2248
2249         %seen = ();
2250         for $element (keys(%foo), keys(%bar)) {
2251                 $seen{$element}++;
2252                 }
2253         @uniq = keys %seen;
2254
2255 Or more succinctly:
2256
2257         @uniq = keys %{{%foo,%bar}};
2258
2259 Or if you really want to save space:
2260
2261         %seen = ();
2262         while (defined ($key = each %foo)) {
2263                 $seen{$key}++;
2264         }
2265         while (defined ($key = each %bar)) {
2266                 $seen{$key}++;
2267         }
2268         @uniq = keys %seen;
2269
2270 =head2 How can I store a multidimensional array in a DBM file?
2271
2272 Either stringify the structure yourself (no fun), or else
2273 get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2274 it on top of either DB_File or GDBM_File.
2275
2276 =head2 How can I make my hash remember the order I put elements into it?
2277
2278 Use the C<Tie::IxHash> module from CPAN.
2279
2280         use Tie::IxHash;
2281
2282         tie my %myhash, 'Tie::IxHash';
2283
2284         for (my $i=0; $i<20; $i++) {
2285                 $myhash{$i} = 2*$i;
2286                 }
2287
2288         my @keys = keys %myhash;
2289         # @keys = (0,1,2,3,...)
2290
2291 =head2 Why does passing a subroutine an undefined element in a hash create it?
2292
2293 (contributed by brian d foy)
2294
2295 Are you using a really old version of Perl?
2296
2297 Normally, accessing a hash key's value for a nonexistent key will
2298 I<not> create the key.
2299
2300         my %hash  = ();
2301         my $value = $hash{ 'foo' };
2302         print "This won't print\n" if exists $hash{ 'foo' };
2303
2304 Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
2305 Since you could assign directly to C<$_[0]>, Perl had to be ready to
2306 make that assignment so it created the hash key ahead of time:
2307
2308         my_sub( $hash{ 'foo' } );
2309         print "This will print before 5.004\n" if exists $hash{ 'foo' };
2310
2311         sub my_sub {
2312                 # $_[0] = 'bar'; # create hash key in case you do this
2313                 1;
2314                 }
2315
2316 Since Perl 5.004, however, this situation is a special case and Perl
2317 creates the hash key only when you make the assignment:
2318
2319         my_sub( $hash{ 'foo' } );
2320         print "This will print, even after 5.004\n" if exists $hash{ 'foo' };
2321
2322         sub my_sub {
2323                 $_[0] = 'bar';
2324                 }
2325
2326 However, if you want the old behavior (and think carefully about that
2327 because it's a weird side effect), you can pass a hash slice instead.
2328 Perl 5.004 didn't make this a special case:
2329
2330         my_sub( @hash{ qw/foo/ } );
2331
2332 =head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
2333
2334 Usually a hash ref, perhaps like this:
2335
2336         $record = {
2337                 NAME   => "Jason",
2338                 EMPNO  => 132,
2339                 TITLE  => "deputy peon",
2340                 AGE    => 23,
2341                 SALARY => 37_000,
2342                 PALS   => [ "Norbert", "Rhys", "Phineas"],
2343         };
2344
2345 References are documented in L<perlref> and L<perlreftut>.
2346 Examples of complex data structures are given in L<perldsc> and
2347 L<perllol>.  Examples of structures and object-oriented classes are
2348 in L<perltoot>.
2349
2350 =head2 How can I use a reference as a hash key?
2351
2352 (contributed by brian d foy and Ben Morrow)
2353
2354 Hash keys are strings, so you can't really use a reference as the key.
2355 When you try to do that, perl turns the reference into its stringified
2356 form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2357 back the reference from the stringified form, at least without doing
2358 some extra work on your own.
2359
2360 Remember that the entry in the hash will still be there even if
2361 the referenced variable goes out of scope, and that it is entirely
2362 possible for Perl to subsequently allocate a different variable at
2363 the same address. This will mean a new variable might accidentally
2364 be associated with the value for an old one.
2365
2366 If you have Perl 5.10 or later, and you just want to store a value
2367 against the reference for lookup later, you can use the core
2368 C<Hash::Util::FieldHash> module. This will also handle renaming the
2369 keys if you use multiple threads (which causes all variables to be
2370 reallocated at new addresses, changing their stringification), and
2371 garbage-collecting the entries when the referenced variable goes out
2372 of scope.
2373
2374 If you actually need to be able to get a real reference back from
2375 each hash entry, you can use the Tie::RefHash module, which does the
2376 required work for you.
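
As a small illustration of the C<Tie::RefHash> approach, the keys
that come back out of the tied hash are still usable references. The
hash name and data below are made up for the example:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point_a = [ 1, 2 ];     # made-up data
my $point_b = [ 3, 4 ];

$color_of{$point_a} = 'red';
$color_of{$point_b} = 'blue';

for my $point ( keys %color_of ) {
    # $point is still a real array reference here, not a string.
    print "(@$point) is $color_of{$point}\n";
}
```

As with any hash, the order in which the keys come back is not
something to rely on.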
2377
2378 =head1 Data: Misc
2379
2380 =head2 How do I handle binary data correctly?
2381
2382 Perl is binary clean, so it can handle binary data just fine.
2383 On Windows or DOS, however, you have to use C<binmode> for binary
2384 files to avoid conversions for line endings. In general, you should
2385 use C<binmode> any time you want to work with binary data.
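
A minimal round trip might look like the sketch below. It writes a
few bytes to a temporary file (the byte values are arbitrary and
chosen only because they include a CR LF pair) and reads them back
with C<binmode> set on both handles:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write some bytes that a text-mode layer could mangle on some platforms.
my ( $out, $filename ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "\x00\x0D\x0A\xFF";
close $out or die "close failed: $!";

open my $in, '<', $filename or die "Cannot open $filename: $!";
binmode $in;                            # no line-ending translation
my $bytes = do { local $/; <$in> };     # slurp the whole file
close $in;

printf "read %d bytes: %s\n", length $bytes, unpack( "H*", $bytes );
```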
2386
2387 Also see L<perlfunc/"binmode"> or L<perlopentut>.
2388
2389 If you're concerned about 8-bit textual data then see L<perllocale>.
2390 If you want to deal with multibyte characters, however, there are
2391 some gotchas.  See the section on Regular Expressions.
2392
2393 =head2 How do I determine whether a scalar is a number/whole/integer/float?
2394
2395 Assuming that you don't care about IEEE notations like "NaN" or
2396 "Infinity", you probably just want to use a regular expression.
2397
2398         if (/\D/)            { print "has nondigits\n" }
2399         if (/^\d+$/)         { print "is a whole number\n" }
2400         if (/^-?\d+$/)       { print "is an integer\n" }
2401         if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
2402         if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2403         if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2404         if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
2405                         { print "a C float\n" }
2406
2407 There are also some commonly used modules for the task.
2408 L<Scalar::Util> (distributed with 5.8) provides access to perl's
2409 internal function C<looks_like_number> for determining whether a
2410 variable looks like a number.  L<Data::Types> exports functions that
2411 validate data types using both the above and other regular
2412 expressions. Thirdly, there is C<Regexp::Common> which has regular
2413 expressions to match various types of numbers. Those three modules are
2414 available from the CPAN.
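
For instance, a quick check with C<looks_like_number> could be
sketched like this. The candidate strings are arbitrary samples; note
that the function follows Perl's own idea of a number, so it also
accepts strings such as C<"Inf"> and C<"NaN">:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $candidate ( '42', '-3.14', '1e5', '0x1F', 'abc', '' ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate) ? "looks like a number" : "does not";
}
```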
2415
2416 If you're on a POSIX system, Perl supports the C<POSIX::strtod>
2417 function.  Its semantics are somewhat cumbersome, so here's a
2418 C<getnum> wrapper function for more convenient access.  This function
2419 takes a string and returns the number it found, or C<undef> for input
2420 that isn't a C float.  The C<is_numeric> function is a front end to
2421 C<getnum> if you just want to say, "Is this a float?"
2422
        sub getnum {
                use POSIX qw(strtod);
                my $str = shift;
                $str =~ s/^\s+//;       # strip leading whitespace
                $str =~ s/\s+$//;       # strip trailing whitespace
                $! = 0;                 # clear any stale error
                # strtod returns the number and the count of
                # characters it could not parse
                my($num, $unparsed) = strtod($str);
                if (($str eq '') || ($unparsed != 0) || $!) {
                        return undef;   # empty, trailing junk, or error
                }
                else {
                        return $num;
                }
        }
2437
2438         sub is_numeric { defined getnum($_[0]) }
2439
Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.
2444
2445 =head2 How do I keep persistent data across program calls?
2446
For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>.  More generically, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN.  Starting with Perl 5.8, C<Storable> is part
of the standard distribution.  Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:
2452
2453         use Storable;
2454         store(\%hash, "filename");
2455
2456         # later on...
2457         $href = retrieve("filename");        # by ref
2458         %hash = %{ retrieve("filename") };   # direct to hash
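
A self-contained version of the same idea might look like the
following sketch. The C<%config> data and the temporary file are
hypothetical stand-ins for your own data and filename.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical data to persist, including a nested array.
my %config = (name => 'example', sizes => [1, 2, 3]);

# A temporary file stands in for a real, long-lived filename.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

store(\%config, $file) or die "cannot store to $file";

# A later run of the program would start here.
my $restored = retrieve($file);
print "name:  $restored->{name}\n";
print "sizes: @{ $restored->{sizes} }\n";
```

Nested structures survive the round trip, which is the main advantage
over a plain DBM file that holds only flat key/value strings.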
2459
2460 =head2 How do I print out or copy a recursive data structure?
2461
The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures.  The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.
2466
2467         use Storable qw(dclone);
2468         $r2 = dclone($r1);
2469
Here C<$r1> can be a reference to any kind of data structure you'd like;
it will be deeply copied.  Because C<dclone> takes and returns references,
you have to add extra punctuation if you have a hash of arrays that
you want to copy.
2474
2475         %newhash = %{ dclone(\%oldhash) };
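
To see why the deep copy matters, this sketch compares a plain hash
assignment with C<dclone>. The hash contents are made up for the
example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical hash of arrays.
my %oldhash = (fruits => ['apple', 'pear'], nuts => ['almond']);

my %shallow = %oldhash;                 # copies only the references
my %deep    = %{ dclone(\%oldhash) };   # copies the arrays as well

push @{ $oldhash{fruits} }, 'plum';     # change the original

my $shallow_count = scalar @{ $shallow{fruits} };
my $deep_count    = scalar @{ $deep{fruits} };
print "shallow copy sees $shallow_count fruits\n";
print "deep copy sees $deep_count fruits\n";
```

The shallow copy still points at the original arrays and so sees the
new element, while the copy made with C<dclone> does not.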
2476
2477 =head2 How do I define methods for every class/object?
2478
2479 (contributed by Ben Morrow)
2480
You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
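
The common base class approach can be sketched as follows. The
package names C<My::Base> and C<My::Widget> and the C<describe>
method are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

package My::Base;      # hypothetical common base class

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# A method every subclass inherits.
sub describe {
    my ($self) = @_;
    return ref($self) . ' object';
}

package My::Widget;    # hypothetical subclass
our @ISA = ('My::Base');

package main;

my $widget = My::Widget->new;
my $text   = $widget->describe;
print "$text\n";
```

Only classes that opt in by inheriting from C<My::Base> get the
method, so unrelated classes and third-party objects are unaffected.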
2487
2488 =head2 How do I verify a credit card checksum?
2489
2490 Get the C<Business::CreditCard> module from CPAN.
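
If you only want to understand the check itself, card numbers are
commonly validated with the Luhn checksum. The following is a
plain-Perl sketch of that algorithm, not the module's code; the
C<luhn_ok> name is made up here, and the sample numbers are test
values rather than real cards.

```perl
use strict;
use warnings;

# Returns true if the digits in $number pass the Luhn checksum.
sub luhn_ok {
    my ($number) = @_;
    $number =~ tr/0-9//cd;              # keep digits only
    return 0 unless length $number;

    my $sum    = 0;
    my @digits = reverse split //, $number;
    for my $i (0 .. $#digits) {
        my $d = $digits[$i];
        if ($i % 2) {                   # every second digit from the right
            $d *= 2;
            $d -= 9 if $d > 9;          # same as adding the two digits
        }
        $sum += $d;
    }
    return $sum % 10 == 0;
}

print luhn_ok('4111 1111 1111 1111') ? "valid\n" : "invalid\n";
print luhn_ok('4111 1111 1111 1112') ? "valid\n" : "invalid\n";
```

A passing checksum only shows that the number is well formed; it says
nothing about whether the card exists or can be charged.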
2491
2492 =head2 How do I pack arrays of doubles or floats for XS code?
2493
2494 The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
2495 If you're doing a lot of float or double processing, consider using
2496 the C<PDL> module from CPAN instead--it makes number-crunching easy.
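
On the Perl side, one simple alternative to the C<PGPLOT> code is to
pack the numbers into a string with C<pack>. This sketch shows only
the Perl half, with made-up values; how your XS code reads the
resulting buffer is up to you.

```perl
use strict;
use warnings;

# Hypothetical input values.
my @values = (1.5, -2.25, 3.0);

# 'd*' packs native doubles back to back; use 'f*' for floats.
my $packed = pack 'd*', @values;

# Unpacking with the same template recovers the list.
my @back = unpack 'd*', $packed;

printf "%d bytes for %d doubles\n", length($packed), scalar(@values);
print "@back\n";
```

The packed string holds the values in the machine's native format, so
it is suitable for handing to C code on the same machine but not for
exchanging data between different architectures.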
2497
2498 See L<http://search.cpan.org/dist/PGPLOT> for the code.
2499
2500 =head1 REVISION
2501
2502 Revision: $Revision$
2503
2504 Date: $Date$
2505
2506 See L<perlfaq> for source control details and availability.
2507
2508 =head1 AUTHOR AND COPYRIGHT
2509
2510 Copyright (c) 1997-2009 Tom Christiansen, Nathan Torkington, and
2511 other authors as noted. All rights reserved.
2512
2513 This documentation is free; you can redistribute it and/or modify it
2514 under the same terms as Perl itself.
2515
2516 Irrespective of its distribution, all code examples in this file
2517 are hereby placed into the public domain.  You are permitted and
2518 encouraged to use this code in your own programs for fun
2519 or for profit as you see fit.  A simple comment in the code giving
2520 credit would be courteous but is not required.