=head1 NAME

perlfaq4 - Data Manipulation

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

For the long explanation, see David Goldberg's "What Every Computer
Scientist Should Know About Floating-Point Arithmetic"
(http://docs.sun.com/source/806-3568/ncg_goldberg.html).

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Many decimal fractions, such as 0.1, have no exact binary
representation, so they lose precision when stored. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See L<perlop/"Floating-point Arithmetic">
for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

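Here is a minimal sketch that makes the problem visible using only built-ins. It prints more digits of C<0.1 + 0.2> than Perl normally shows, then compares floats with a tolerance instead of C<==>. The helper name C<nearly_equal> and its default tolerance of C<1e-9> are assumptions chosen for illustration. They are not part of Perl.

```perl
use strict;
use warnings;

# 0.1 and 0.2 have no exact binary form, so their sum is not exactly 0.3.
my $sum = 0.1 + 0.2;

printf "%.17g\n", $sum;    # shows more digits of the stored value
printf "%.2f\n",  $sum;    # rounded for display: 0.30

# Hypothetical helper: compare two floats within a tolerance.
sub nearly_equal {
    my ($x, $y, $eps) = @_;
    $eps = 1e-9 unless defined $eps;
    return abs($x - $y) < $eps;
}

my $exact = $sum == 0.3 ? "equal" : "different";
my $close = nearly_equal($sum, 0.3) ? "close enough" : "too far apart";
print "== says $exact; nearly_equal says $close\n";
```

A fixed absolute tolerance works for values of ordinary magnitude. For very large or very small numbers, a tolerance relative to the values being compared is usually more appropriate.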

=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of above as 'three' is really more like
2.9999999999999995559.

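This sketch truncates and rounds the same expression side by side. It relies only on the core C<int>, C<printf>, and C<sprintf> functions:

```perl
use strict;
use warnings;

my $x = 0.6 / 0.2 - 2;    # mathematically 1, stored slightly below 1

printf "%.20f\n", $x;     # reveals the value actually stored
print int($x), "\n";      # int() truncates toward zero: 0
printf "%.0f\n", $x;      # sprintf() rounds to nearest: 1
```

When you want the nearest integer rather than just dropping the fraction, round with C<sprintf("%.0f", ...)> instead of calling C<int()>.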

=head2 Why isn't my octal data interpreted correctly?

(contributed by brian d foy)

You're probably trying to convert a string to a number, which Perl only
converts as a decimal number. When Perl converts a string to a number, it
ignores leading spaces and zeroes, then assumes the rest of the digits
are in base 10:

    my $string = '0644';

    print $string + 0;  # prints 644

    print $string + 44; # prints 688, certainly not octal!

This problem usually involves one of the Perl built-ins that has the
same name as a Unix command that uses octal numbers as arguments on the
command line. In this example, C<chmod> on the command line knows that
its first argument is octal because that's what it does:

    %prompt> chmod 644 file

If you want to use the same literal digits (644) in Perl, you have to tell
Perl to treat them as octal numbers either by prefixing the digits with
a C<0> or using C<oct>:

    chmod(     0644, $file);   # right, has leading zero
    chmod( oct(644), $file );  # also correct

The problem comes in when you take your numbers from something that Perl
thinks is a string, such as a command line argument in C<@ARGV>:

    chmod( $ARGV[0], $file);       # wrong, even if "0644"

    chmod( oct($ARGV[0]), $file ); # correct, treat string as octal

You can always check the value you're using by printing it in octal
notation to ensure it matches what you think it should be. Print it
in octal and decimal format:

    printf "0%o %d", $number, $number;

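C<oct> also honours the C<0x> and C<0b> prefixes. That means a single call can normalise permission strings written in octal, hex, or binary, such as values read from a config file. Here is a small sketch that uses only the built-in. The sample strings are illustrative:

```perl
use strict;
use warnings;

# The same permission bits (rw-r--r--) written four different ways.
for my $s ('644', '0644', '0x1A4', '0b110100100') {
    my $mode = oct($s);    # octal by default; 0x and 0b prefixes honoured
    printf "%-12s -> decimal %d, octal %o\n", $s, $mode, $mode;
}
```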

=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535); # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil   = ceil(3.5);   # 4
    $floor  = floor(3.5);  # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

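Here is a short sketch of C<Math::Trig> in use. It relies on C<pi>, C<tan>, C<acos>, C<deg2rad>, and C<rad2deg>, which the module exports by default. The angles are arbitrary examples:

```perl
use strict;
use warnings;
use Math::Trig;    # exports pi, tan, acos, deg2rad, rad2deg, among others

my $angle = deg2rad(60);    # degrees to radians

printf "pi          = %.6f\n", pi;
printf "tan(60 deg) = %.6f\n", tan($angle);
printf "acos(0.5)   = %.1f degrees\n", rad2deg( acos(0.5) );
```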

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32-bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

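One way to implement your own rounding is to keep money as integers, for example tenths of a cent, and round explicitly. The sketch below uses a hypothetical helper, C<div_round_half_even>, that divides integers and rounds exact halves to the nearest even result ("banker's rounding"). Whether your application needs that rule or a different one is a policy decision that this code does not settle:

```perl
use strict;
use warnings;

# Hypothetical helper: divide integer $num by positive integer $den,
# rounding exact halves to the nearest even result (banker's rounding).
# Integer inputs keep binary fractions out of the calculation.
sub div_round_half_even {
    my ($num, $den) = @_;
    die "divisor must be a positive integer\n" unless $den > 0;

    my $sign = $num < 0 ? -1 : 1;
    $num = abs $num;

    my $rem = $num % $den;
    my $q   = ($num - $rem) / $den;    # exact integer quotient

    # Round up past the halfway point, or at exactly half when $q is odd.
    $q++ if 2 * $rem > $den or (2 * $rem == $den and $q % 2);

    return $sign * $q;
}

# Amounts in tenths of a cent, rounded to whole cents.
for my $tenths (12345, 12355, 12346, -12345) {
    printf "%6d tenths -> %5d cents\n", $tenths, div_round_half_even($tenths, 10);
}
```

To switch to truncation or to rounding half away from zero, you only need to change the final adjustment line. The rule then lives in your own code rather than depending on the platform.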

=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representational rather than
exhaustive.

Some of the examples below use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of any size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using Perl's built-in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();   # signed, so this gives -559038737

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

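For the "exercise" conversions, one approach is to go through an ordinary Perl number. The sketch below defines two hypothetical helpers, C<to_base> and C<from_base>, that work with any base from 2 to 36. They assume non-negative integers that are small enough for Perl to hold exactly. As a rule of thumb, that means values below 2**53.

```perl
use strict;
use warnings;

my @DIGITS = (0 .. 9, 'A' .. 'Z');
my %VALUE;
@VALUE{@DIGITS} = 0 .. $#DIGITS;

# Hypothetical helper: render a non-negative integer in base 2..36.
sub to_base {
    my ($n, $base) = @_;
    die "base must be 2..36\n" unless $base >= 2 && $base <= 36;
    return '0' if $n == 0;
    my $out = '';
    while ($n > 0) {
        my $digit = $n % $base;
        $out = $DIGITS[$digit] . $out;
        $n   = ($n - $digit) / $base;    # exact, no fractional part
    }
    return $out;
}

# Hypothetical helper: parse base-N digits (case-insensitive) into a number.
sub from_base {
    my ($str, $base) = @_;
    die "base must be 2..36\n" unless $base >= 2 && $base <= 36;
    my $n = 0;
    for my $d (split //, uc $str) {
        die "bad digit '$d' for base $base\n"
            unless defined $VALUE{$d} && $VALUE{$d} < $base;
        $n = $n * $base + $VALUE{$d};
    }
    return $n;
}

# hex -> oct and bin -> hex, via an ordinary number in between
print to_base( from_base('DEADBEEF', 16), 8 ), "\n";    # 33653337357
print to_base( from_base('10110110', 2), 16 ), "\n";    # B6
```

For the common bases, C<hex>, C<oct>, and C<sprintf> are quicker. A helper like this is mainly useful for unusual bases such as 36.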

=head2 Why doesn't & work the way I want it to?

The behavior of bitwise operators such as C<&> and C<|> depends on
whether they're used on numbers or strings. If both operands are
strings, the operators treat each string as a series of bits and work
with that (the string C<"3"> is the bit pattern C<00110011>). Otherwise
they work with the binary form of a number (the number C<3> is treated
as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") =~ /[^\000]/) {
        # ... reached only if at least one bit is set
    }

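Values that arrive as strings (from input, a file, or C<@ARGV>) can be given numeric or string semantics explicitly. This sketch forces numeric mode by adding zero. It then counts non-null bytes with C<tr///> to decide whether a string result has any bits set. The sample values are illustrative:

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");    # e.g. values read from input: strings

# String AND first, while $x and $y have only ever been used as strings.
my $as_strings = $x & $y;

# Adding zero produces numbers, so & does a numeric AND.
my $as_numbers = (0 + $x) & (0 + $y);

print "string AND:  '$as_strings'\n";    # '1'
print "numeric AND: $as_numbers\n";      # 3

# A string of null bytes is true, so count the non-null bytes instead.
my $mask         = "\020\020" & "\101\101";
my $any_bits_set = ($mask =~ tr/\0//c);  # 0 here: no bits in common
printf "%s bits set\n", $any_bits_set ? "some" : "no";
```

If you need this often, the C<bitwise> feature (added in Perl 5.22, stable since 5.28) provides separate string operators such as C<&.>. See L<perlop> for details.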

=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

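For small matrices, or to see what those modules are doing, a plain-Perl version is short. This sketch assumes each matrix is stored as an array of row array references. C<mmult> is a hypothetical name. For anything large or performance-sensitive, the modules above are the better choice.

```perl
use strict;
use warnings;

# Hypothetical helper: multiply two matrices stored as arrays of row
# array refs. Returns a new matrix; dies if the shapes don't match.
sub mmult {
    my ($A, $B) = @_;
    my $n = @$A;             # rows of A
    my $m = @{ $A->[0] };    # columns of A
    my $p = @{ $B->[0] };    # columns of B
    die "shape mismatch\n" unless @$B == $m;

    my @C;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $p - 1) {
            my $sum = 0;
            $sum += $A->[$i][$_] * $B->[$_][$j] for 0 .. $m - 1;
            $C[$i][$j] = $sum;
        }
    }
    return \@C;
}

my $C = mmult( [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] );
print join(' ', @$_), "\n" for @$C;    # 19 22, then 43 50
```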

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i <= 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,001 integers.

=head2 How can I output Roman numerals?

Get the C<Roman> module from CPAN
(L<http://www.cpan.org/modules/by-module/Roman>).

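If you would rather not install a module, the standard greedy algorithm fits in a few lines. This sketch handles 1 to 3999 with the usual subtractive forms such as IV and CM. C<to_roman> is a hypothetical name.

```perl
use strict;
use warnings;

# Value/letter pairs, largest first, including the subtractive forms.
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'],
    [1,    'I'],
);

# Hypothetical helper: convert an integer in 1..3999 to Roman numerals.
sub to_roman {
    my ($n) = @_;
    die "out of range: $n\n" unless $n =~ /^\d+$/ && $n >= 1 && $n <= 3999;
    my $out = '';
    for my $pair (@ROMAN) {
        my ($value, $letters) = @$pair;
        while ($n >= $value) {
            $out .= $letters;
            $n   -= $value;
        }
    }
    return $out;
}

print to_roman($_), "\n" for 4, 1994, 2024;    # IV, MCMXCIV, MMXXIV
```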

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once; you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in L<http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz>, courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than what C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at L<http://www.nr.com/>.

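Seeding C<rand> twice with the same value shows that it is deterministic. Here C<srand> is called twice on purpose, for illustration only, and the seed 42 is arbitrary. In real programs, follow the advice above and let Perl seed once.

```perl
use strict;
use warnings;

# The same seed replays the same "random" sequence.
srand(42);
my @first = map { int rand 100 } 1 .. 5;

srand(42);
my @second = map { int rand 100 } 1 .. 5;

print "@first\n@second\n";    # the two lines are identical
```

This repeatability is handy for tests. It is also why C<rand> should never be used for passwords or keys.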

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the C<rand()>
built-in to get a random number from 0 up to (but not including) 1.
From there, you shift that into the range that you want.

C<rand($x)> returns a number such that C<< 0 <= rand($x) < $x >>. Thus
what you want to have perl figure out is a random number in the range
from 0 to the difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you want a
random number between 0 and 5 that you can then add to 10.

    my $number = 10 + int rand( 15-10+1 ); # ( 10,11,12,13,14, or 15 )

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year as element 7 of
the list it returns, counting from 0 for January 1. Without an
argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];    # 0 .. 365

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;    # "001" .. "366"
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s C<mktime>
to get a time in epoch seconds for the argument to C<localtime>.

    use POSIX qw/mktime strftime/;
    # mktime( sec, min, hour, mday, mon, year ): mon counts from 0 and
    # year from 1900, so this is December 18, 1987.
    my $time = mktime( 0, 0, 0, 18, 11, 87 );
    my $day_of_year  = strftime "%j", localtime $time;
    my $week_of_year = strftime "%W", localtime $time;

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year( 1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

Note that C<%W> numbers weeks from the first Monday of the year, while
C<Date::Calc>'s C<Week_of_Year> follows ISO 8601, so the two can disagree.

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

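Here is one way to store dates as numbers and subtract, using only the core C<Time::Local> module. The sketch assumes dates arrive as C<YYYY-MM-DD> strings, and C<days_between> is a hypothetical helper. Each date is converted to seconds at noon UTC, so daylight saving time cannot skew the count.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: whole days from $from to $to (YYYY-MM-DD strings).
sub days_between {
    my ($from, $to) = @_;
    my @epochs = map {
        my ($y, $m, $d) = /^(\d{4})-(\d\d)-(\d\d)$/
            or die "bad date '$_'\n";
        timegm(0, 0, 12, $d, $m - 1, $y);    # noon UTC; months count from 0
    } $from, $to;
    my $seconds = $epochs[1] - $epochs[0];
    return $seconds / 86400;
}

print days_between('2024-02-01', '2024-03-01'), "\n";    # 29 (leap year)
```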

=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

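This sketch shows the "split it up" approach for one fixed format, C<YYYY-MM-DD HH:MM:SS>. The format is an assumption, so adjust the regex to match your data. The hypothetical C<string_to_epoch> helper treats the time as local by default. When asked for UTC, it calls C<timegm> from the same module instead.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal timegm);

# Hypothetical helper for strings like "2000-01-01 00:00:00".
# Pass a true second argument to interpret the time as UTC.
sub string_to_epoch {
    my ($string, $utc) = @_;
    my ($y, $mon, $d, $h, $min, $s) =
        $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognised date '$string'\n";
    my @parts = ($s, $min, $h, $d, $mon - 1, $y);    # months count from 0
    return $utc ? timegm(@parts) : timelocal(@parts);
}

print string_to_epoch('2000-01-01 00:00:00', 1), "\n";    # 946684800
```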

=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day:

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day):

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

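If you only need the astronomical Julian Day for a Unix timestamp, the arithmetic is simple. The Unix epoch, 1970-01-01 00:00:00 UTC, is Julian Day 2440587.5, and the modified Julian Day is the Julian Day minus 2400000.5. The sketch below uses hypothetical helper names. It assumes the timestamp is UTC and, like Unix time itself, ignores leap seconds.

```perl
use strict;
use warnings;

# Hypothetical helpers: a day is 86400 seconds of Unix time.
sub epoch_to_jd  { return $_[0] / 86400 + 2440587.5 }
sub epoch_to_mjd { return epoch_to_jd($_[0]) - 2400000.5 }

printf "JD %.5f  MJD %.5f\n", epoch_to_jd(time), epoch_to_mjd(time);
```

For midnight UTC on 31 January 2005 (epoch 1107129600), these helpers return 2453401.5 and 53401. Those are the same values as in the C<DateTime> output above.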

=head2 How do I find yesterday's date?
X<date> X<yesterday> X<DateTime> X<Date::Calc> X<Time::Local>
X<daylight saving time> X<day> X<Today_and_Now> X<localtime>
X<timelocal>

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

If you absolutely must do it yourself (or can't use one of the
modules), here's a solution using C<Time::Local>, which comes with
Perl:

    # contributed by Gunnar Hjalmarsson
    use Time::Local;
    my $today = timelocal 0, 0, 12, ( localtime )[3..5];
    my ($d, $m, $y) = ( localtime $today-86400 )[3..5];
    printf "Yesterday: %d-%02d-%02d\n", $y+1900, $m+1, $d;

In this case, you measure the day starting at noon, and subtract 24
hours. Even if the length of the calendar day is 23 or 25 hours,
you'll still end up on the previous calendar day, although not at
noon. Since you don't care about the time, the one hour difference
doesn't matter and you end up with the previous date.

=head2 Does Perl have a Year 2000 or 2038 problem? Is Perl Y2K compliant?

(contributed by brian d foy)

Perl itself never had a Y2K problem, although that never stopped people
from creating Y2K problems on their own. See the documentation for
C<localtime> for its proper use.

Starting with Perl 5.11, C<localtime> and C<gmtime> can handle dates past
03:14:08 UTC on January 19, 2038, when a 32-bit based time would overflow.
You might still get a warning on a 32-bit C<perl>:

    % perl5.11.2 -E 'say scalar localtime( 0x9FFF_FFFFFFFF )'
    Integer overflow in hexadecimal number at -e line 1.
    Wed Nov 1 19:42:39 5576711

On a 64-bit C<perl>, you can get even larger dates for those really long
running projects:

    % perl5.11.2 -E 'say scalar gmtime( 0x9FFF_FFFFFFFF )'
    Thu Nov 2 00:42:39 5576711

You're still out of luck if you need to keep track of decaying protons
though.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with:

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\g1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\g1/$1/g;

This collapses pairs; to collapse longer runs as well, add a quantifier:
C<s/(.)\g1+/$1/g>. (C<\g1> needs Perl 5.10 or later; older perls spell
it C<\1>.)

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself.

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can put any code we like inside the braces, as long as
it ends with a scalar reference; how you build that reference is up to
you. Note that the use of parens creates a list context, so we need
C<scalar> to force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in C<$1>. For multiple-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting with Perl 5.8,
C<Text::Balanced> is part of the standard distribution.

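C<Text::Balanced> ships with Perl, so it is often the quickest option. Here is a short sketch of its C<extract_bracketed> function taking one nested, balanced group off the front of a string. The sample text is made up.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(outer (inner) text) and the rest";

# In list context extract_bracketed() returns the balanced group, the
# unprocessed remainder, and any skipped prefix; $text is left unchanged.
my ($group, $remainder) = extract_bracketed($text, '()');

print "group:     $group\n";        # (outer (inner) text)
print "remainder: $remainder\n";    #  and the rest
```

To walk through successive groups, call it again on C<$remainder>. The module also provides other extractors, such as C<extract_delimited> and C<extract_quotelike>.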

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

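A common surprise is that C<reverse> in list context reverses the list, not the string, so a single string comes back unchanged. This sketch shows both contexts. The sample word is arbitrary.

```perl
use strict;
use warnings;

my $string = "stressed";

my $reversed = reverse $string;    # scalar assignment: "desserts"
my @as_list  = reverse $string;    # list context: ("stressed"), unchanged

# print supplies list context, so ask for scalar context explicitly.
print scalar reverse($string), "\n";
```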

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

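C<Text::Wrap> wraps at the width in its package variable C<$columns>, which you can import and set. Here is a minimal sketch. The sample paragraph and the width of 40 are arbitrary.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 40;    # target width for the wrapped lines

my $para = "Text::Wrap breaks a long paragraph into lines that fit "
         . "within a fixed width, indenting the first line as asked.";

# First argument indents the first line, second indents the rest.
my $wrapped = wrap("    ", "", $para);
print "$wrapped\n";
```

According to the module's documentation, each resulting line is at most C<$columns - 1> characters long.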

=head2 How can I access or change N characters of a string?

You can access any run of characters in a string with C<substr()>,
given a starting position and a length. To get the first character,
for example, start at position 0 and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) =  "Perl 5.8.0";

=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that C<$_> contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one.">  You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

68dc0745
PP
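To reuse the counting trick, you might wrap it in a subroutine. This is
only a sketch. C<replace_nth> is a hypothetical name, not a standard
function. It also assumes the replacement is a plain string rather than
something that needs C<$1> and friends.

    # Hypothetical helper: replace only the $n-th match of $pattern.
    sub replace_nth {
        my ( $string, $pattern, $n, $replacement ) = @_;
        my $count = 0;
        # /e runs the replacement as code; every match except the
        # $n-th is put back unchanged.
        $string =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
        return $string;
    }

    my $result = replace_nth( "a-b-c-d", qr/-/, 2, "+" );   # "a-b+c-d"
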
=head2 How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use the
C<tr///> operator like so:

    my $string = "ThisXlineXhasXsomeXx'sXinXit";
    my $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However,
if you are trying to count multi-character substrings within a
larger string, C<tr///> won't work. What you can do is wrap a while()
loop around a global pattern match. For example, let's count negative
integers:

    my $string = "-9 55 48 -2 23 -76 4 14 -44";
    my $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";

Another version uses a global match in list context, then assigns the
result to a scalar, producing a count of the number of matches.

    my $count = () = $string =~ /-\d+/g;

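If the thing you are counting is a fixed string rather than a pattern,
C<index> can do the searching without the regex engine. Here is a minimal
sketch. C<count_substr> is a hypothetical name. Like the C</g> loop, it
counts non-overlapping occurrences.

    # Hypothetical helper: count non-overlapping occurrences of $substr.
    sub count_substr {
        my ( $string, $substr ) = @_;
        return 0 unless length $substr;   # an empty string would loop forever
        my ( $count, $pos ) = ( 0, 0 );
        while ( ( $pos = index( $string, $substr, $pos ) ) >= 0 ) {
            $count++;
            $pos += length $substr;        # skip past this occurrence
        }
        return $count;
    }

    my $count = count_substr( "banana", "an" );   # 2
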
=head2 How do I capitalize all the words on one line?
X<Text::Autoformat> X<capitalize> X<case, title> X<case, sentence>

(contributed by brian d foy)

Damian Conway's L<Text::Autoformat> handles all of the thinking
for you.

    use Text::Autoformat;
    my $x = "Dr. Strangelove or: How I Learned to Stop ".
      "Worrying and Love the Bomb";

    print $x, "\n";
    for my $style (qw( sentence title highlight )) {
        print autoformat($x, { case => $style }), "\n";
    }

How do you want to capitalize those words?

    FRED AND BARNEY'S LODGE        # all uppercase
    Fred And Barney's Lodge        # title case
    Fred and Barney's Lodge        # highlight case

It's not as easy a problem as it looks. How many words do you think
are in there? Wait for it... wait for it.... If you answered 5
you're right. Perl words are groups of C<\w+>, but that's not what
you want to capitalize. How is Perl supposed to know not to capitalize
that C<s> after the apostrophe? You could try a regular expression:

    $string =~ s/ (
                     (^\w)    #at the beginning of the line
                       |      # or
                     (\s\w)   #preceded by whitespace
                   )
                  /\U$1/xg;

    $string =~ s/([\w']+)/\u\L$1/g;

Now, what if you don't want to capitalize that "and"? Just use
L<Text::Autoformat> and get on with the next problem. :)

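If you only need to keep a handful of short words lowercase and can't
install a module, a rough sketch is to keep those words in a hash and
decide word by word. The word list below is an assumption, so pick your
own. This still has the rough edges that make C<Text::Autoformat>
attractive. For instance, it lowercases a small word even at the start
of the string.

    # An assumed list of words to leave lowercase
    my %small = map { $_ => 1 } qw( a an and at of or the to );

    my $string = "FRED AND BARNEY'S LODGE";
    # Treat apostrophes as part of a word so "BARNEY'S" stays one word
    $string =~ s/([\w']+)/ $small{ lc $1 } ? lc $1 : ucfirst lc $1 /ge;
    # $string is now "Fred and Barney's Lodge"
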
=head2 How can I split a [character] delimited string except when inside [character]?

Several modules can handle this sort of parsing: C<Text::Balanced>,
C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.

Take the example case of trying to split a string that is
comma-separated into its different fields. You can't use C<split(/,/)>
because you shouldn't split if the comma is inside quotes. For
example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex
problem. Thankfully, we have Jeffrey Friedl, author of
I<Mastering Regular Expressions>, to handle these for us. He
suggests (assuming your string is contained in C<$text>):

    my @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
      | ([^,]+),?
      | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).

Alternatively, the C<Text::ParseWords> module (part of the standard
Perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

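C<Text::ParseWords> also provides C<parse_line>, which splits a single
line on a delimiter pattern. Here it is applied to the sample record from
above. With a false second argument, the surrounding quotes are removed
from each field.

    use Text::ParseWords qw(parse_line);

    my $text = q{SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"};

    # The first argument is a regex pattern for the delimiter;
    # a false second argument strips the quotes from each field.
    my @fields = parse_line( ',', 0, $text );

    print scalar(@fields), "\n";   # 11
    print "$fields[2]\n";          # Cimetrix, Inc
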
=head2 How do I strip blank space from the beginning/end of a string?

(contributed by brian d foy)

A substitution can do this for you. For a single line, you want to
replace all the leading or trailing whitespace with nothing. You
can do that with a pair of substitutions:

    s/^\s+//;
    s/\s+$//;

You can also write that as a single substitution, although it turns
out the combined statement is slower than the separate ones. That
might not matter to you, though:

    s/^\s+|\s+$//g;

In this regular expression, the alternation matches either at the
beginning or the end of the string, since alternation has a lower
precedence than the anchors. The pattern means "C<^\s+> or C<\s+$>".
With the C</g> flag, the substitution makes all possible matches, so it
gets both. Remember, the trailing newline matches the C<\s+>, and the
C<$> anchor can match to the absolute end of the string, so the newline
disappears too. Just add the newline to the output, which has the added
benefit of preserving "blank" (consisting entirely of whitespace) lines
which the C<^\s+> would remove all by itself:

    while( <> ) {
        s/^\s+|\s+$//g;
        print "$_\n";
    }

For a multi-line string, you can apply the regular expression to each
logical line in the string by adding the C</m> flag (for
"multi-line"). With the C</m> flag, the C<$> matches I<before> an
embedded newline, so it doesn't remove it. This pattern still removes
the newline at the end of the string:

    $string =~ s/^\s+|\s+$//gm;

Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace:

    $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;

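If you trim strings in many places, you might wrap the two substitutions
in a small subroutine. C<trim> here is a hypothetical name, not a Perl
built-in. It works on a copy, so the caller's variable is left alone.

    # Hypothetical helper: return a copy of the string without
    # leading or trailing whitespace.
    sub trim {
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

    my $clean = trim( "  hello world \n" );   # "hello world"
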
=head2 How do I pad a string with blanks or pad a number with zeroes?

In the following examples, C<$pad_len> is the length to which you wish
to pad the string, C<$text> or C<$num> contains the string to be padded,
and C<$pad_char> contains the padding character. You can use a single
character string constant instead of the C<$pad_char> variable if you
know what it is in advance. And in the same way you can use an integer in
place of C<$pad_len> if you know the pad length in advance.

The simplest method uses the C<sprintf> function. It can pad on the left
or right with blanks and on the left with zeroes, and it will not
truncate the result. The C<pack> function can only pad strings on the
right with blanks, and it will truncate the result to a maximum length of
C<$pad_len>.

    # Left padding a string with blanks (no truncation):
    $padded = sprintf("%${pad_len}s", $text);
    $padded = sprintf("%*s", $pad_len, $text);  # same thing

    # Right padding a string with blanks (no truncation):
    $padded = sprintf("%-${pad_len}s", $text);
    $padded = sprintf("%-*s", $pad_len, $text); # same thing

    # Left padding a number with 0 (no truncation):
    $padded = sprintf("%0${pad_len}d", $num);
    $padded = sprintf("%0*d", $pad_len, $num);  # same thing

    # Right padding a string with blanks using pack (will truncate):
    $padded = pack("A$pad_len",$text);

If you need to pad with a character other than blank or zero you can use
one of the following methods. They all generate a pad string with the
C<x> operator and combine that with C<$text>. These methods do
not truncate C<$text>.

Left and right padding with any character, creating a new string:

    $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
    $padded = $text . $pad_char x ( $pad_len - length( $text ) );

Left and right padding with any character, modifying C<$text> directly:

    substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
    $text .= $pad_char x ( $pad_len - length( $text ) );

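The C<x> expressions above assume C<$text> is no longer than C<$pad_len>.
If it is longer, the repeat count goes negative. C<x> then produces an
empty string, but recent versions of Perl may warn about a negative repeat
count under C<use warnings>. Here is a sketch of a helper that guards
against that. C<pad_left> is a hypothetical name.

    # Hypothetical helper: left-pad $text to $pad_len with $pad_char,
    # never truncating and never using a negative repeat count.
    sub pad_left {
        my ( $text, $pad_len, $pad_char ) = @_;
        my $missing = $pad_len - length $text;
        $missing = 0 if $missing < 0;
        return $pad_char x $missing . $text;   # x binds tighter than .
    }

    my $short = pad_left( "42", 5, "*" );        # "***42"
    my $long  = pad_left( "toolong", 3, "*" );   # "toolong", unchanged
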
=head2 How do I extract selected columns from a string?

(contributed by brian d foy)

If you know the columns that contain the data, you can
use C<substr> to extract a single column.

    my $column = substr( $line, $start_column, $length );

You can use C<split> if the columns are separated by whitespace or
some other delimiter, as long as whitespace or the delimiter cannot
appear as part of the data.

    my $line    = ' fred barney betty ';
    my @columns = split /\s+/, $line;
    # ( '', 'fred', 'barney', 'betty' );

    my $line    = 'fred||barney||betty';
    my @columns = split /\|/, $line;
    # ( 'fred', '', 'barney', '', 'betty' );

If you want to work with comma-separated values, don't do this since
that format is a bit more complicated. Use one of the modules that
handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
C<Text::CSV_PP>.

If you want to break apart an entire line of fixed columns, you can use
C<unpack> with the A (ASCII) format. By using a number after the format
specifier, you can denote the column width. See the C<pack> and C<unpack>
entries in L<perlfunc> for more details.

    my @fields = unpack( "A8 A8 A8 A16 A4", $line );

Note that spaces in the format argument to C<unpack> do not denote literal
spaces. If you have space separated data, you may want C<split> instead.

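Here is a small, self-contained sketch of C<unpack> on a fixed-width
record. The data is made up for illustration. The template is the first
argument and the string is the second, and when unpacking, C<A> strips
trailing spaces from each field.

    # Build a fixed-width record: three 8-character columns
    my $line = sprintf '%-8s%-8s%-8s', 'fred', 'barney', 'betty';

    # Template first, then the string to unpack
    my @fields = unpack( 'A8 A8 A8', $line );   # ( 'fred', 'barney', 'betty' )
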
=head2 How do I find the soundex value of a string?

(contributed by brian d foy)

You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.

=head2 How can I expand variables in text strings?

(contributed by brian d foy)

If you can avoid it, don't, or if you can use a templating system,
such as C<Text::Template> or C<Template> Toolkit, do that instead. You
might even be able to get the job done with C<sprintf> or C<printf>:

    my $string = sprintf 'Say hello to %s and %s', $foo, $bar;

However, for the one-off simple case where I don't want to pull out a
full templating system, I'll use a string that has two Perl scalar
variables in it. In this example, I want to expand C<$foo> and C<$bar>
to their variables' values:

    my $foo = 'Fred';
    my $bar = 'Barney';
    $string = 'Say hello to $foo and $bar';

One way I can do this involves the substitution operator and a double
C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
turns it into C<$foo>. The second C</e> starts with C<$foo> and replaces
it with its value. C<$foo>, then, turns into 'Fred', and that's finally
what's left in the string:

    $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'

The C</e> will also silently ignore violations of strict, replacing
undefined variable names with the empty string. Since I'm using the
C</e> flag (twice even!), I have all of the same security problems I
have with C<eval> in its string form. If there's something odd in
C<$foo>, perhaps something like C<@{[ system "rm -rf /" ]}>, then
I could get myself in trouble.

To get around the security problem, I could also pull the values from
a hash instead of evaluating variable names. Using a single C</e>, I
can check the hash to ensure the value exists, and if it doesn't, I
can replace the missing value with a marker, in this case C<???> to
signal that I missed something:

    my $string = 'This has $foo and $bar';

    my %Replacements = (
        foo  => 'Fred',
        );

    # $string =~ s/\$(\w+)/$Replacements{$1}/g;
    $string =~ s/\$(\w+)/
        exists $Replacements{$1} ? $Replacements{$1} : '???'
        /eg;

    print $string;  # This has Fred and ???

=head2 What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification (coercing
numbers and references into strings) even when you don't want them to
be strings. Think of it this way: double-quote expansion is used to
produce new strings. If you already have a string, why do you need more?

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be
the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when
the thing in the scalar is actually neither a string nor a number, but
a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl
that actually do care about the difference between a string and a
number, such as the magical C<++> autoincrement operator or the
syscall() function.

Stringification also destroys arrays: interpolating an array joins its
elements with a space (the value of C<$">).

    @lines = `command`;
    print "@lines";     # WRONG - extra blanks
    print @lines;       # right

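Here is a short sketch of what goes wrong when a reference is quoted.
C<ref> reports that the quoted copy is no longer a reference, and under
C<use strict> trying to dereference that string would die.

    my @array = ( 1, 2, 3 );
    my $aref  = \@array;
    my $oref  = "$aref";      # a string that merely looks like "ARRAY(0x...)"

    print ref($aref) ? "reference\n" : "plain string\n";   # reference
    print ref($oref) ? "reference\n" : "plain string\n";   # plain string

    # Dereferencing works through the real reference:
    print scalar( @$aref ), "\n";                          # 3
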
=head2 Why don't my E<lt>E<lt>HERE documents work?

Check for these three things:

=over 4

=item There must be no space after the E<lt>E<lt> part.

=item There (probably) should be a semicolon at the end.

=item You can't (easily) have any space in front of the tag.

=back

If you want to indent the text in the here document, you
can do this:

    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET

But the HERE_TARGET must still be flush against the margin.
If you want that indented also, you'll have to quote
in the indentation.

    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts. --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s+--/\n--/;

A nice general-purpose fixer-upper function for indented here documents
follows. It expects to be called with a here document as its argument.
It looks to see whether each line begins with a common substring, and
if so, strips that substring off. Otherwise, it takes the amount of leading
whitespace found on the first line and removes that much off each
subsequent line.

    sub fix {
        local $_ = shift;
        my ($white, $leader);  # common whitespace and common leading string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\g1\g2?.*\n)+$/) {
            ($white, $leader) = ($2, quotemeta($1));
        } else {
            ($white, $leader) = (/^(\s+)/, '');
        }
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    }

This works with leading special strings, dynamically determined:

    $remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() );
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
        MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading whitespace, with remaining
indentation correctly preserved:

    $poem = fix<<EVER_ON_AND_ON;
        Now far ahead the Road has gone,
           And I must follow, if I can,
        Pursuing it with eager feet,
           Until it joins some larger way
        Where many paths and errands meet.
           And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
    EVER_ON_AND_ON

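If your perl is 5.26 or later, the indented here-document syntax,
C<< <<~ >>, handles the common case directly. The terminator may be
indented, and the whitespace in front of it is removed from the start of
every line of the body. Here is a minimal sketch:

    use v5.26;

    my $text = <<~EOT;
        your text
          goes here
        EOT

    # $text is "your text\n  goes here\n"
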
=head1 Data: Arrays

=head2 What is the difference between a list and an array?

(contributed by brian d foy)

A list is a fixed collection of scalars. An array is a variable that
holds a variable collection of scalars. An array can supply its collection
for list operations, so list operations also work on arrays:

    # slices
    ( 'dog', 'cat', 'bird' )[2,3];
    @animals[2,3];

    # iteration
    foreach ( qw( dog cat bird ) ) { ... }
    foreach ( @animals ) { ... }

    my @three = grep { length == 3 } qw( dog cat bird );
    my @three = grep { length == 3 } @animals;

    # supply an argument list
    wash_animals( qw( dog cat bird ) );
    wash_animals( @animals );

Array operations, which change the scalars, rearrange them, or add
or subtract some scalars, only work on arrays. These can't work on a
list, which is fixed. Array operations include C<shift>, C<unshift>,
C<push>, C<pop>, and C<splice>.

An array can also change its length:

    $#animals = 1;      # truncate to two elements
    $#animals = 10000;  # pre-extend to 10,001 elements

You can change an array element, but you can't change a list element:

    $animals[0] = 'Rottweiler';
    qw( dog cat bird )[0] = 'Rottweiler'; # syntax error!

    foreach ( @animals ) {
        s/^d/fr/;  # works fine
    }

    foreach ( qw( dog cat bird ) ) {
        s/^d/fr/;  # Error! Modification of a read-only value attempted!
    }

If the list element is itself a variable, it appears that you can
change a list element. However, the list element is the variable, not
the data. You're not changing the list element, but something the list
element refers to. The list element itself doesn't change: it's still
the same variable.

You also have to be careful about context. You can assign an array to
a scalar to get the number of elements in the array. This only works
for arrays, though:

    my $count = @animals;  # only works with arrays

If you try to do the same thing with what you think is a list, you
get a quite different result. Although it looks like you have a list
on the righthand side, Perl actually sees a bunch of scalars separated
by a comma:

    my $scalar = ( 'dog', 'cat', 'bird' );  # $scalar gets bird

Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!) in scalar
context evaluates its lefthand side, throws away the result, and
evaluates its righthand side and returns the result. In effect,
that list-lookalike assigns to C<$scalar> its rightmost value. Many
people mess this up because they choose a list-lookalike whose
last element is also the count they expect:

    my $scalar = ( 1, 2, 3 );  # $scalar gets 3, accidentally

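A quick way to see the difference between an array in scalar context and
the comma operator is to try both side by side. This is just an
illustration with made-up data. The empty-list assignment on the last
line is the same trick used earlier in this FAQ to count matches.

    my @animals = qw( dog cat bird );

    my $count   = @animals;                       # 3, the number of elements
    my $last    = ( 'dog', 'cat', 'bird' );       # 'bird', from the comma operator
    my ($first) = ( 'dog', 'cat', 'bird' );       # 'dog', a list assignment
    my $n       = () = ( 'dog', 'cat', 'bird' );  # 3, elements on the right
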
=head2 What is the difference between $array[1] and @array[1]?

(contributed by brian d foy)

The difference is the sigil, that special character in front of the
array name. The C<$> sigil means "exactly one item", while the C<@>
sigil means "zero or more items". The C<$> gets you a single scalar,
while the C<@> gets you a list.

The confusion arises because people incorrectly assume that the sigil
denotes the variable type.

The C<$array[1]> is a single-element access to the array. It's going
to return the item in index 1 (or undef if there is no item there).
If you intend to get exactly one element from the array, this is the
form you should use.

The C<@array[1]> is an array slice, although it has only one index.
You can pull out multiple elements simultaneously by specifying
additional indices as a list, like C<@array[1,4,3,0]>.

Using a slice on the lefthand side of the assignment supplies list
context to the righthand side. This can lead to unexpected results.
For instance, if you want to read a single line from a filehandle,
assigning to a scalar value is fine:

    $array[1] = <STDIN>;

However, in list context, the line input operator returns all of the
lines as a list. The first line goes into C<@array[1]> and the rest
of the lines mysteriously disappear:

    @array[1] = <STDIN>;  # most likely not what you want

Either the C<use warnings> pragma or the B<-w> flag will warn you when
you use an array slice with a single index.

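You can watch the slice swallow the extra lines with an in-memory
filehandle (opening a reference to a string needs Perl 5.8 or later).
The sample text is invented for the demonstration. Warnings are left off
here because the single-index slice is exactly what they would flag.

    my $data = "one\ntwo\nthree\n";
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";

    my @array;
    @array[1] = <$fh>;    # list context: reads every remaining line
    print $array[1];      # one
    print eof($fh) ? "nothing left to read\n" : "more lines left\n";
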
=head2 How can I remove duplicate elements from a list or array?

(contributed by brian d foy)

Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".

If you don't care about the order of the elements, you could just
create the hash then extract the keys. It's not important how you
create that hash: just that you use C<keys> to get the unique
elements.

    my %hash   = map { $_, 1 } @array;
    # or a hash slice: @hash{ @array } = ();
    # or a foreach: $hash{$_} = 1 foreach ( @array );

    my @unique = keys %hash;

If you want to use a module, try the C<uniq> function from
C<List::MoreUtils>. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.

    use List::MoreUtils qw(uniq);

    my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
    my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The postincrement in the
C<next> statement creates the key and returns its old value, C<undef>,
which is false, so the loop continues to the C<push>. By then the value
for that key is 1. The next time the loop sees that same element, its
key exists in the hash I<and> the value for that key is true (since
it's not 0 or C<undef>), so the C<next> skips that iteration and the
loop goes to the next element.

    my @unique = ();
    my %seen   = ();

    foreach my $elem ( @array )
    {
        next if $seen{ $elem }++;
        push @unique, $elem;
    }

You can write this more briefly using a grep, which does the
same thing.

    my %seen = ();
    my @unique = grep { ! $seen{ $_ }++ } @array;

=head2 How can I tell whether a certain element is contained in a list or array?

(portions of this answer contributed by Anno Siegel and brian d foy)

Hearing the word "in" is an I<in>dication that you probably should have
used a hash, not a list or array, to store your data. Hashes are
designed to answer this question quickly and efficiently. Arrays aren't.

That being said, there are several ways to approach this. In Perl 5.10
and later, you can use the smart match operator to check that an item is
contained in an array or a hash (note that Perl 5.18 marked smart match
as experimental, so newer perls warn when you use it):

    use 5.010;

    if( $item ~~ @array )
    {
        say "The array contains $item"
    }

    if( $item ~~ %hash )
    {
        say "The hash contains $item"
    }

With earlier versions of Perl, you have to do a bit more work. If you
are going to make this query many times over arbitrary string values,
the fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values:

    my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    my %is_blue = ();
    for (@blues) { $is_blue{$_} = 1 }

Now you can check whether C<$is_blue{$some_color}>. It might have
been a good idea to keep the blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed
array. This kind of an array will take up less space:

    my @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    my @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you can check whether C<$is_tiny_prime[$some_number]>.

If the values in question are integers instead of strings, you can save
quite a lot of space by using bit strings instead:

    my @articles = ( 1..10, 150..2000, 2017 );
    my $read = '';
    for (@articles) { vec($read,$_,1) = 1 }

Now check whether C<vec($read,$n,1)> is true for some C<$n>.

These methods guarantee fast individual tests but require a re-organization
of the original list or array. They only pay off if you have to test
multiple values against the same array.

If you are testing only once, the standard module C<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:

    sub first (&@) {
        my $code = shift;
        foreach (@_) {
            return $_ if &{$code}();
        }
        undef;
    }

If speed is of little concern, the common idiom uses grep in scalar context
(which returns the number of items that passed its condition) to traverse the
entire list. This does have the benefit of telling you how many matches it
found, though.

    my $is_there = grep $_ eq $whatever, @array;

If you want to actually extract the matching elements, simply use grep in
list context.

    my @matches = grep $_ eq $whatever, @array;

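One caveat with C<first>: it returns the element it found, so a false
element such as C<0> or the empty string looks the same as "not found".
When you only need a yes-or-no answer, the C<any> function from
C<List::Util> (available since List::Util 1.33) avoids that problem. Here
is a small sketch with made-up data.

    use List::Util qw(first any);

    my @numbers = ( 0, 1, 2 );

    # first() returns the matching element itself, and 0 is false
    my $found = first { $_ == 0 } @numbers;
    print "first: ", ( $found ? "found" : "not found" ), "\n";   # not found (wrongly)

    # any() returns a plain true or false answer
    my $has_zero = any { $_ == 0 } @numbers;
    print "any: ", ( $has_zero ? "found" : "not found" ), "\n";  # found
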
=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:

    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

Note that this is the I<symmetric difference>, that is, all elements
in either A or in B but not in both. Think of it as an xor operation.

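If the arrays may contain repeated elements, counting fails: an element
that appears twice in one array would land in the intersection. A
variation that tracks membership per array instead works in both cases.
It also gives you the one-sided differences. The sample arrays are made
up.

    my @array1 = qw( a b b c );
    my @array2 = qw( b c d );

    # Record which elements appear in each array, ignoring repeats
    my %in_1 = map { $_ => 1 } @array1;
    my %in_2 = map { $_ => 1 } @array2;

    my @intersection = grep {  $in_2{$_} } sort keys %in_1;   # b c
    my @only_in_1    = grep { !$in_2{$_} } sort keys %in_1;   # a
    my @only_in_2    = grep { !$in_1{$_} } sort keys %in_2;   # d
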
=head2 How do I test whether two arrays or hashes are equal?

With Perl 5.10 and later, the smart match operator can give you the answer
with the least amount of work (note that Perl 5.18 marked smart match as
experimental, so newer perls warn when you use it):

    use 5.010;

    if( @array1 ~~ @array2 )
    {
        say "The arrays are the same";
    }

    if( %hash1 ~~ %hash2 ) # doesn't check values!
    {
        say "The hash keys are the same";
    }

The following code works for single-level arrays. It uses a
stringwise comparison, and does not distinguish defined versus
undefined empty strings. Modify if you have other needs.

    $are_equal = compare_arrays(\@frogs, \@toads);

    sub compare_arrays {
        my ($first, $second) = @_;
        no warnings;  # silence spurious -w undef complaints
        return 0 unless @$first == @$second;
        for (my $i = 0; $i < @$first; $i++) {
            return 0 if $first->[$i] ne $second->[$i];
        }
        return 1;
    }

For multilevel structures, you may wish to use an approach more
like this one. It uses the CPAN module C<FreezeThaw>:

    use FreezeThaw qw(cmpStr);
    @a = @b = ( "this", "that", [ "more", "stuff" ] );

    printf "a and b contain %s arrays\n",
        cmpStr(\@a, \@b) == 0
        ? "the same"
        : "different";

This approach also works for comparing hashes. Here we'll demonstrate
two different answers:

    use FreezeThaw qw(cmpStr cmpStrHard);

    %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
    $a{EXTRA} = \%b;
    $b{EXTRA} = \%a;

    printf "a and b contain %s hashes\n",
        cmpStr(\%a, \%b) == 0 ? "the same" : "different";

    printf "a and b contain %s hashes\n",
        cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";

The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise for the reader.

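For nested data you can also compare serialized copies with the core
C<Storable> module. Setting C<$Storable::canonical> makes hashes
serialize with their keys in sorted order, so equal hashes produce equal
strings. Treat this as a sketch. C<deep_equal> is a hypothetical name.
Also, because C<Storable> records how each value is stored, it may report
that values such as C<1> and C<"1"> differ.

    use Storable qw(freeze);

    # Hypothetical helper: compare two references by serializing them.
    sub deep_equal {
        my ( $x, $y ) = @_;
        local $Storable::canonical = 1;   # sort hash keys while freezing
        return freeze($x) eq freeze($y);
    }

    my %first  = ( name => 'Fred', pets => [ 'Dino' ] );
    my %second = ( pets => [ 'Dino' ], name => 'Fred' );

    print deep_equal( \%first, \%second ) ? "same\n" : "different\n";   # same
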
1530=head2 How do I find the first array element for which a condition is true?
1531
49d635f9 1532To find the first array element which satisfies a condition, you can
ac9dac7f
RGS
1533use the C<first()> function in the C<List::Util> module, which comes
1534with Perl 5.8. This example finds the first element that contains
1535"Perl".
49d635f9
RGS
1536
1537 use List::Util qw(first);
197aec24 1538
49d635f9 1539 my \$element = first { /Perl/ } @array;
197aec24 1540
ac9dac7f 1541If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9
RGS
1542same thing. Once you find the element, you stop the loop with last.
1543
1544 my \$found;
ac9dac7f 1545 foreach ( @array ) {
6670e5e7 1546 if( /Perl/ ) { \$found = \$_; last }
49d635f9
RGS
1547 }
1548
1549If you want the array index, you can iterate through the indices
1550and check the array element at each index until you find one
1551that satisfies the condition.
1552
197aec24 1553 my( \$found, \$index ) = ( undef, -1 );
ac9dac7f
RGS
1554 for( \$i = 0; \$i < @array; \$i++ ) {
1555 if( \$array[\$i] =~ /Perl/ ) {
6670e5e7
RGS
1556 \$found = \$array[\$i];
1557 \$index = \$i;
1558 last;
1559 }
1560 }
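
If you already use C<List::Util>, you can get the index without writing
the loop yourself: give C<first()> the list of indices instead of the
elements. This is a sketch, and the contents of C<@array> are made up
for the example.

```perl

    use strict;
    use warnings;
    use List::Util qw(first);

    my @array = ( 'Python', 'Ruby', 'Perl 5', 'Perl 6' );

    # Search the indices, not the elements: first() returns the first
    # index whose element matches, or undef if nothing matches.
    my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;
    my $found = defined $index ? $array[$index] : undef;

    print defined $index ? "found $found at index $index\n" : "no match\n";

```

Unlike the loop above, this leaves C<$index> undefined rather than
C<-1> when no element matches.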

=head2 How do I handle linked lists?

(contributed by brian d foy)

Perl's arrays do not have a fixed size, so you don't need linked lists
if you just want to add or remove items. You can use array operations
such as C<push>, C<pop>, C<shift>, C<unshift>, or C<splice> to do
that.

Sometimes, however, linked lists can be useful in situations where you
want to "shard" an array so you have many small arrays instead of
a single big array. You can keep arrays longer than Perl's largest
array index, lock smaller arrays separately in threaded programs,
reallocate less memory, or quickly insert elements in the middle of
the chain.

Steve Lembark goes through the details in his YAPC::NA 2009 talk "Perly
Linked Lists" ( http://www.slideshare.net/lembark/perly-linked-lists ),
although you can just use his C<LinkedList::Single> module.
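
Inserting in the middle of a chain is cheap, and a small example shows
why. This is a minimal sketch of a singly linked list built from hash
references. The node layout (C<value> and C<next> keys) is an
assumption made for this illustration. It is not the interface of
C<LinkedList::Single>.

```perl

    use strict;
    use warnings;

    # Build a three-node list: a -> b -> d. Each node is a hash reference
    # holding a value and a reference to the next node (undef at the end).
    my $head;
    foreach my $value ( reverse qw( a b d ) ) {
        $head = { value => $value, next => $head };
    }

    # Insert "c" after the node holding "b". Only two references change;
    # nothing is copied or shifted, unlike splice() on an array.
    my $node = $head;
    $node = $node->{next} until $node->{value} eq 'b';
    $node->{next} = { value => 'c', next => $node->{next} };

    # Walk the chain from the head, collecting the values in order.
    my @values;
    for ( my $n = $head; $n; $n = $n->{next} ) {
        push @values, $n->{value};
    }
    print "@values\n";    # a b c d

```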

=head2 How do I handle circular lists?
X<circular> X<array> X<Tie::Cycle> X<Array::Iterator::Circular>
X<cycle> X<modulus>

(contributed by brian d foy)

If you want to cycle through an array endlessly, you can increment the
index modulo the number of elements in the array:

    my @array = qw( a b c );
    my $i = 0;

    while( 1 ) {
        print $array[ $i++ % @array ], "\n";
        last if $i > 20;
    }

You can also use C<Tie::Cycle> to use a scalar that always has the
next element of the circular array:

    use Tie::Cycle;

    tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];

    print $cycle; # FFFFFF
    print $cycle; # 000000
    print $cycle; # FFFF00

The C<Array::Iterator::Circular> module creates an iterator object for
circular arrays:

    use Array::Iterator::Circular;

    my $color_iterator = Array::Iterator::Circular->new(
        qw(red green blue orange)
    );

    foreach ( 1 .. 20 ) {
        print $color_iterator->next, "\n";
    }

=head2 How do I shuffle an array randomly?

If you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03
or later, you can say:

    use List::Util 'shuffle';

    @shuffled = shuffle(@list);

If not, you can use a Fisher-Yates shuffle.

    sub fisher_yates_shuffle {
        my $deck = shift;  # $deck is a reference to an array
        return unless @$deck; # must not be empty!

        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }

    # shuffle my mpeg collection
    #
    my @mpeg = <audio/*/*.mp3>;
    fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
    print @mpeg;

Note that the above implementation shuffles an array in place,
unlike C<List::Util::shuffle()>, which takes a list and returns
a new shuffled list.
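
You may want the C<List::Util> calling style but be unable to load the
module. In that case you can wrap the same algorithm so that it copies
its arguments first and leaves the caller's array alone. This is a
sketch, and the name C<shuffled_copy> is hypothetical.

```perl

    use strict;
    use warnings;

    # Non-destructive Fisher-Yates shuffle: copy the arguments, shuffle
    # the copy, and return it, like List::Util::shuffle() does.
    sub shuffled_copy {
        my @deck = @_;                    # the caller's data is untouched
        for ( my $i = $#deck; $i > 0; $i-- ) {
            my $j = int rand( $i + 1 );   # 0 <= $j <= $i
            @deck[ $i, $j ] = @deck[ $j, $i ];
        }
        return @deck;
    }

    my @original = ( 1 .. 10 );
    my @mixed    = shuffled_copy(@original);

    print "original: @original\n";        # still 1 through 10 in order
    print "shuffled: @mixed\n";

```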

You've probably seen shuffling algorithms that work using C<splice>,
randomly removing an element from the old list and pushing it onto
a new one:

    srand;
    @new = ();
    @old = 1 .. 10;  # just a demo
    while (@old) {
        push(@new, splice(@old, rand @old, 1));
    }

This is bad because C<splice> is already O(N), and since you do it N
times, you just invented a quadratic algorithm; that is, O(N**2).
This does not scale, although Perl is so efficient that you probably
won't notice this until you have rather large arrays.

=head2 How do I process/modify each element of an array?

Use C<for>/C<foreach>:

    for (@lines) {
        s/foo/bar/;    # change that word
        tr/XZ/ZX/;     # swap those letters
    }

Here's another; let's compute spherical volumes:

    for (@volumes = @radii) {   # @volumes has changed parts
        $_ **= 3;
        $_ *= (4/3) * 3.14159;  # this will be constant folded
    }

which can also be done with C<map()>, which is made to transform
one list into another:

    @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;

If you want to do the same thing to modify the values of a
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.

    for my $orbit ( values %orbits ) {
        ($orbit **= 3) *= (4/3) * 3.14159;
    }
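
The aliasing also works with the statement-modifier form of C<for>.
The following sketch uses made-up orbital data. It doubles every value
in place, and then, for contrast, builds a separate hash with C<map>
so that the source hash is not modified by that step.

```perl

    use strict;
    use warnings;

    # Hypothetical data: orbital radii in astronomical units.
    my %orbits = ( mercury => 0.39, earth => 1.0, mars => 1.52 );

    # values() returns aliases (Perl 5.6 and later), so changing $_
    # changes the value stored in %orbits itself.
    $_ *= 2 for values %orbits;

    # map builds a new hash instead; %orbits is not modified here.
    my %squared = map { $_ => $orbits{$_} ** 2 } keys %orbits;

    printf "%-8s %5.2f %6.3f\n", $_, $orbits{$_}, $squared{$_}
        for sort keys %orbits;

```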

Prior to perl 5.6 C<values> returned copies of the values,
so older perl code often contains constructions such as
C<@orbits{keys %orbits}> instead of C<values %orbits> where
the hash is to be modified.

=head2 How do I select a random element from an array?

Use the C<rand()> function (see L<perlfunc/rand>). Given an array,
C<rand> returns a fractional number from 0 up to, but not including,
the number of elements, and Perl truncates it to an integer when you
use it as an array index:

    $index   = rand @array;
    $element = $array[$index];

Or, simply:

    my $element = $array[ rand @array ];

=head2 How do I permute N elements of a list?
X<List::Permutor> X<permute> X<Algorithm::Loops> X<Knuth>
X<The Art of Computer Programming> X<Fischer-Krause>

Use the C<List::Permutor> module on CPAN. If the list is actually an
array, try the C<Algorithm::Permute> module (also on CPAN). It's
written in XS code and is very efficient:

    use Algorithm::Permute;

    my @array = 'a'..'d';
    my $p_iterator = Algorithm::Permute->new ( \@array );

    while (my @perm = $p_iterator->next) {
        print "next permutation: (@perm)\n";
    }

For even faster execution, you could do:

    use Algorithm::Permute;

    my @array = 'a'..'d';

    Algorithm::Permute::permute {
        print "next permutation: (@array)\n";
    } @array;

Here's a little program that generates all permutations of all the
words on each line of input. The algorithm embodied in the
C<permute()> function is discussed in Volume 4 of Knuth's
I<The Art of Computer Programming> and will work on any list:

    #!/usr/bin/perl -n
    # Fischer-Krause ordered permutation generator

    sub permute (&@) {
        my $code = shift;
        my @idx = 0..$#_;
        while ( $code->(@_[@idx]) ) {
            my $p = $#idx;
            --$p while $idx[$p-1] > $idx[$p];
            my $q = $p or return;
            push @idx, reverse splice @idx, $p;
            ++$q while $idx[$p-1] > $idx[$q];
            @idx[$p-1,$q]=@idx[$q,$p-1];
        }
    }

    permute { print "@_\n" } split;
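
You can also reuse C<permute()> in an ordinary script rather than a
C<-n> one-liner. Call it with a block that does whatever you need and
returns a true value, because the loop stops as soon as the block
returns false. The sketch below repeats the function unchanged and
collects every ordering of three words.

```perl

    use strict;
    use warnings;

    # Fischer-Krause ordered permutation generator, as shown above.
    sub permute (&@) {
        my $code = shift;
        my @idx = 0..$#_;
        while ( $code->(@_[@idx]) ) {
            my $p = $#idx;
            --$p while $idx[$p-1] > $idx[$p];
            my $q = $p or return;
            push @idx, reverse splice @idx, $p;
            ++$q while $idx[$p-1] > $idx[$q];
            @idx[$p-1,$q]=@idx[$q,$p-1];
        }
    }

    # The block receives each permutation in @_; returning 1 keeps going.
    my @all;
    permute { push @all, "@_"; 1 } qw(a b c);

    print "$_\n" for @all;   # a b c, a c b, b a c, b c a, c a b, c b a

```

The permutations come out in lexicographic order of the original
positions, which is what "ordered" means in the comment.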

The C<Algorithm::Loops> module also provides the C<NextPermute> and
C<NextPermuteNum> functions, which efficiently find all unique
permutations of an array, even if it contains duplicate values. Each
call rearranges the array in place into its next permutation and
returns a true value. Once the elements are in reverse-sorted order,
the call reverses the array, leaving it sorted, and returns false.

C<NextPermute> uses string order and C<NextPermuteNum> numeric order, so
you can enumerate all the permutations of C<0..9> like this:

    use Algorithm::Loops qw(NextPermuteNum);

    my @list= 0..9;
    do { print "@list\n" } while NextPermuteNum @list;

=head2 How do I sort an array by (anything)?

Supply a comparison function to C<sort()> (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is C<cmp>, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function. Pull it
out first, because the sort BLOCK can be called many times for the
same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    @idx = ();
    for (@data) {
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    @sorted = sort {
        field1($a) <=> field1($b) ||
        field2($a) cmp field2($b) ||
        field3($a) cmp field3($b)
    } @data;
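
Here is the several-fields paradigm made concrete. It is combined with
the key precalculation described above, in Schwartzian Transform form.
The C<"name age city"> record layout and the data are assumptions made
for this example. The C<field1>/C<field2> helpers above stand for
whatever extraction your own records need.

```perl

    use strict;
    use warnings;

    # Hypothetical records: "name age city" strings.
    my @data = (
        'carol 31 Boston',
        'alice 25 Denver',
        'bob 31 Austin',
        'dave 25 Chicago',
    );

    # Split each record once into [ original, name, age, city ], then
    # sort by age numerically, breaking ties by city as a string.
    my @sorted = map  { $_->[0] }
                 sort { $a->[2] <=> $b->[2] || $a->[3] cmp $b->[3] }
                 map  { [ $_, split ' ' ] } @data;

    print "$_\n" for @sorted;

```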

This can be conveniently combined with precalculation of keys as given
above.

See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.

See also the question later in L<perlfaq4> on sorting hashes.

=head2 How do I manipulate arrays of bits?

Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.

For example, you don't have to store individual bits in an array
(which would mean that you're wasting a lot of space). To convert an
array of bits to a string, use C<vec()> to set the right bits. This
sets C<$vec> to have bit N set only if C<$ints[N]> was set:

    @ints = (...); # array of bits, e.g. ( 1, 0, 0, 1, 1, 0 ... )
    $vec = '';
    foreach( 0 .. $#ints ) {
        vec($vec,$_,1) = 1 if $ints[$_];
    }

The string C<$vec> only takes up as many bits as it needs. For
instance, if you had 16 entries in C<@ints>, C<$vec> only needs two
bytes to store them (not counting the scalar variable overhead).
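
To check the two-byte claim and read the flags back out, you can pack
sixteen sample flags and inspect the result. The flag values are made
up for this sketch. It relies on the fact that C<unpack "b*"> lists
bits in the same low-to-high order that C<vec()> uses with a width
of 1.

```perl

    use strict;
    use warnings;

    # Sixteen hypothetical flags, packed one bit per entry.
    my @ints = ( 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 );
    my $vec  = '';
    foreach ( 0 .. $#ints ) {
        vec( $vec, $_, 1 ) = 1 if $ints[$_];
    }

    # Two bytes hold all sixteen bits.
    print "bytes used: ", length($vec), "\n";

    # Read the flags back: vec() returns 0 or 1 for each bit position.
    my @back = map { vec( $vec, $_, 1 ) } 0 .. $#ints;

    # unpack "b*" shows the bits in vec()'s order: bit 0 first.
    print "bits: ", unpack( "b*", $vec ), "\n";

```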

Here's how, given a vector in C<$vec>, you can get those bits into
your C<@ints> array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;

            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        }
        else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }

        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

You can make the while loop a lot shorter with this suggestion
from Benjamin Goldberg:

    while($vec =~ /[^\0]+/g ) {
        push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
    }

Or use the CPAN module C<Bit::Vector>:

    use Bit::Vector;

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();

C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers and "big int" math.

Here's a more extensive illustration using C<vec()>:

    # vec demo
    $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    $is_set = vec($vector, 23, 1);
    print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);
        my $i = 0;
        my $BASE = 8;

        print "vector length in bytes: ", length($vector), "\n";
        my @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use C<defined> on
scalars or functions, not on aggregates (arrays and hashes). See
L<perlfunc/defined> in the 5.004 release or later of Perl for more
detail. To find out whether an array or hash has any elements, use it
in boolean context instead. Perl 5.22 and later reject
C<defined(@array)> and C<defined(%hash)> as fatal errors.
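
This sketch, with made-up data, shows the boolean-context tests for
aggregates. It also shows that C<defined> remains the right tool for a
single scalar, including a hash value that exists but is C<undef>.

```perl

    use strict;
    use warnings;

    my @array = ();
    my %hash  = ( key => undef );

    # An array or hash in boolean context is true when it has elements,
    # so test that directly instead of calling defined() on it.
    print "array is empty\n"   unless @array;
    print "hash has entries\n" if %hash;

    # defined() still applies to individual scalars, such as a hash
    # value that exists but holds undef.
    print "value is undefined\n" unless defined $hash{key};

```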

=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

(contributed by brian d foy)

There are a couple of ways that you can process an entire hash. You
can get a list of keys, then go through each key, or grab one
key-value pair at a time.

To go through all of the keys, use the C<keys> function. This extracts
all of the keys of the hash and gives them back to you as a list. You
can then get the value through the particular key you're processing:

    foreach my $key ( keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Once you have the list of keys, you can process that list before you
process the hash elements. For instance, you can sort the keys so you
can process them in lexical order:

    foreach my $key ( sort keys %hash ) {
        my $value = $hash{$key};
        ...
    }

Or, you might want to only process some of the items. If you only want
to deal with the keys that start with C<text:>, you can select just
those using C<grep>:

    foreach my $key ( grep /^text:/, keys %hash ) {
        my $value = $hash{$key};
        ...
    }

If the hash is very large, you might not want to create a long list of
keys. To save some memory, you can grab one key-value pair at a time using
C<each()>, which returns a pair you haven't seen yet:

    while( my( $key, $value ) = each( %hash ) ) {
        ...
    }

The C<each> operator returns the pairs in apparently random order, so if
ordering matters to you, you'll have to stick with the C<keys> method.

The C<each()> operator can be a bit tricky though. You can't add or
delete keys of the hash while you're using it without possibly
skipping or re-processing some pairs after Perl internally rehashes
all of the elements. Additionally, a hash has only one iterator, so if
you use C<keys>, C<values>, or C<each> on the same hash, you can reset
the iterator and mess up your processing. See the C<each> entry in
L<perlfunc> for more details.

=head2 How do I merge two hashes?
X<hash> X<merge> X<slice, hash>

(contributed by brian d foy)

Before you decide to merge two hashes, you have to decide what to do
if both hashes contain keys that are the same and if you want to leave
the original hashes as they were.

If you want to preserve the original hashes, copy one hash (C<%hash1>)
to a new hash (C<%new_hash>), then add the keys from the other hash
(C<%hash2>) to the new hash. Checking that the key already exists in
C<%new_hash> gives you a chance to decide what to do with the
duplicates:

    my %new_hash = %hash1; # make a copy; leave %hash1 alone

    foreach my $key2 ( keys %hash2 )
    {
        if( exists $new_hash{$key2} )
        {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else
        {
            $new_hash{$key2} = $hash2{$key2};
        }
    }

If you don't want to create a new hash, you can still use this looping
technique; just change the C<%new_hash> to C<%hash1>.

    foreach my $key2 ( keys %hash2 )
    {
        if( exists $hash1{$key2} )
        {
            warn "Key [$key2] is in both hashes!";
            # handle the duplicate (perhaps only warning)
            ...
            next;
        }
        else
        {
            $hash1{$key2} = $hash2{$key2};
        }
    }

If you don't care that one hash overwrites keys and values from the other, you
could just use a hash slice to add one hash to another. In this case, values
from C<%hash2> replace values from C<%hash1> when they have keys in common:

    @hash1{ keys %hash2 } = values %hash2;
2062 @hash1{ keys %hash2 } = values %hash2;
2063
68dc0745
PP
2064=head2 What happens if I add or remove keys from a hash while iterating over it?
2065
28b41a80 2066(contributed by brian d foy)
d92eb7b0 2067
28b41a80 2068The easy answer is "Don't do that!"
d92eb7b0 2069
28b41a80
RGS
2070If you iterate through the hash with each(), you can delete the key
2071most recently returned without worrying about it. If you delete or add
2072other keys, the iterator may skip or double up on them since perl
2073may rearrange the hash table. See the
2074entry for C<each()> in L<perlfunc>.
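
Here is a sketch of both safe patterns, using made-up inventory data.
The first deletes only the key that C<each()> just returned. The
second takes a snapshot of the keys with C<keys> before changing the
hash. The C<_reserved> keys are purely illustrative.

```perl

    use strict;
    use warnings;

    my %stock = ( apples => 0, pears => 4, plums => 0, figs => 7 );

    # Deleting the key that each() just returned is safe mid-loop.
    while ( my ( $fruit, $count ) = each %stock ) {
        delete $stock{$fruit} if $count == 0;
    }

    # For other changes, iterate over a snapshot: keys() builds the
    # whole list before the loop body runs, so adding keys is fine.
    foreach my $fruit ( keys %stock ) {
        $stock{"${fruit}_reserved"} = 1;
    }

    print join( ", ", sort keys %stock ), "\n";

```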

=head2 How do I look up a hash element by value?

Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }
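
This runnable version of the hash-of-arrays approach uses sample data
in which two keys share a value. Because C<each()> returns pairs in no
particular order, the sketch sorts before printing.

```perl

    use strict;
    use warnings;

    my %by_key = ( apple => 'red', cherry => 'red', lime => 'green' );

    # Reverse into a hash of arrays so every key sharing a value is kept.
    my %key_list_by_value;
    while ( my ( $key, $value ) = each %by_key ) {
        push @{ $key_list_by_value{$value} }, $key;
    }

    foreach my $value ( sort keys %key_list_by_value ) {
        my @keys = sort @{ $key_list_by_value{$value} };
        print "$value: @keys\n";    # green: lime, then red: apple cherry
    }

```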

=head2 How can I know how many entries are in a hash?

(contributed by brian d foy)

This is very similar to "How do I process an entire hash?", also in
L<perlfaq4>, but a bit simpler in the common cases.

You can use the C<keys()> built-in function in scalar context to find out
how many entries you have in a hash:

    my $key_count = keys %hash; # must be scalar context!

If you want to find out how many entries have a defined value, that's
a bit different. You have to check each value. A C<grep> is handy:

    my $defined_value_count = grep { defined } values %hash;

You can use that same structure to count the entries any way that
you like. If you want the count of the keys with vowels in them,
you just test for that instead:

    my $vowel_count = grep { /[aeiou]/ } keys %hash;

The C<grep> in scalar context returns the count. If you want the list
of matching items, just use it in list context instead:

    my @defined_values = grep { defined } values %hash;

The C<keys()> function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as C<each()>.

=head2 How do I sort a hash (optionally by value instead of key)?

(contributed by brian d foy)

To sort a hash, start with the keys. In this example, we give the list of
keys to the sort function which then compares them ASCIIbetically (which
might be affected by your locale settings). The output list has the keys
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.

    my @keys = sort { $a cmp $b } keys %hash;

    foreach my $key ( @keys )
    {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

We could get more fancy in the C<sort()> block though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.

For instance, to make our report order case-insensitive, we use
the C<\L> sequence in a double-quoted string to make everything
lowercase. The C<sort()> block then compares the lowercased
values to determine in which order to put the keys.

    my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;

Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
computation results.

If we want to sort by the hash value instead, we use the hash key
to look it up. We still get out a list of keys, but this time they
are ordered by their value.

    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;

From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        "\L$a" cmp "\L$b"
    } keys %hash;
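
A common variation lists the largest values first. Swapping C<$a> and
C<$b> in the value comparison reverses that part of the order, while
the tie-breaker still runs in ascending order. The word counts below
are made up. C<lc> stands in for the C<\L> escape and does the same job.

```perl

    use strict;
    use warnings;

    # Hypothetical word counts.
    my %hash = ( Perl => 3, awk => 5, sed => 3, C => 5 );

    # Highest count first ($b before $a), then case-insensitive by key.
    my @keys = sort {
        $hash{$b} <=> $hash{$a}
            or
        lc($a) cmp lc($b)
    } keys %hash;

    foreach my $key (@keys) {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

```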

=head2 How can I always keep my hash sorted?
X<hash tie sort DB_File Tie::IxHash>

You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
C<%hash>, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help... Here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl 5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!
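
You can check the tables above by running the two operations
yourself. This sketch rebuilds the same C<%hash> and records what
C<exists>, C<defined>, and the key count report after each step.

```perl

    use strict;
    use warnings;

    my %hash = ( a => 3, x => 7, d => 0, e => 2 );

    undef $hash{'a'};    # the key stays; its value becomes undef
    my $after_undef = sprintf "exists=%d defined=%d keys=%d",
        exists $hash{'a'} ? 1 : 0, defined $hash{'a'} ? 1 : 0, scalar keys %hash;

    delete $hash{'a'};   # the key and its value are both removed
    my $after_delete = sprintf "exists=%d defined=%d keys=%d",
        exists $hash{'a'} ? 1 : 0, defined $hash{'a'} ? 1 : 0, scalar keys %hash;

    print "after undef:  $after_undef\n";    # exists=1 defined=0 keys=4
    print "after delete: $after_delete\n";   # exists=0 defined=0 keys=3

```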

=head2 Why don't my tied hashes make the defined/exists distinction?

This depends on the tied hash's implementation of C<EXISTS()>.
For example, there isn't the concept of undef with hashes
that are tied to DBM* files. It also means that C<exists()> and
C<defined()> do the same thing with a DBM* file, and what they
end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

(contributed by brian d foy)

You can use the C<keys> or C<values> functions to reset C<each>. To
simply reset the iterator used by C<each> without doing anything else,
use one of them in void context:

    keys %hash;   # resets iterator, nothing else.
    values %hash; # resets iterator, nothing else.

See the documentation for C<each> in L<perlfunc>.
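
This sketch shows the reset at work on a small sample hash. One
C<each()> call advances the iterator. After C<keys> in void context
resets it, the next C<each()> starts over at the same first key. That
holds as long as the hash is not modified in between.

```perl

    use strict;
    use warnings;

    my %hash = ( one => 1, two => 2, three => 3 );

    # Take one pair, then abandon the loop part-way through.
    my ($first_key) = each %hash;

    # keys in void context resets the iterator...
    keys %hash;

    # ...so the next each() starts over from the beginning.
    my ($again) = each %hash;

    print "first: $first_key, after reset: $again\n";

```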

=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    %seen = ();
    for $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    @uniq = keys %seen;

Or more succinctly:

    @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    %seen = ();
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the C<MLDBM> module (which uses C<Data::Dumper>) from CPAN and layer
it on top of either C<DB_File> or C<GDBM_File>.

=head2 How can I make my hash remember the order I put elements into it?

Use the C<Tie::IxHash> module from CPAN.

    use Tie::IxHash;

    tie my %myhash, 'Tie::IxHash';

    for (my $i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }

    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)
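
If you can't install a CPAN module, a lightweight alternative is to
record each new key in an array as you add it. This is a sketch under
that assumption and not a replacement for C<Tie::IxHash>. In
particular, you must also remove a key from C<@order> yourself if you
C<delete> it from the hash. The data is made up.

```perl

    use strict;
    use warnings;

    my ( %myhash, @order );

    # Hypothetical key/value pairs, including a repeated key.
    foreach my $pair ( [ zebra => 1 ], [ apple => 2 ], [ mango => 3 ], [ apple => 4 ] ) {
        my ( $key, $value ) = @$pair;
        push @order, $key unless exists $myhash{$key};   # first insertion only
        $myhash{$key} = $value;                          # later values overwrite
    }

    print "$_ => $myhash{$_}\n" for @order;

```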

=head2 Why does passing a subroutine an undefined element in a hash create it?

(contributed by brian d foy)

Are you using a really old version of Perl?

Normally, accessing a hash key's value for a nonexistent key will
I<not> create the key.

    my %hash  = ();
    my $value = $hash{ 'foo' };
    print "This won't print\n" if exists $hash{ 'foo' };

Passing C<$hash{ 'foo' }> to a subroutine used to be a special case, though.
Since you could assign directly to C<$_[0]>, Perl had to be ready to
make that assignment so it created the hash key ahead of time:

    my_sub( $hash{ 'foo' } );
    print "This will print before 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        # $_[0] = 'bar'; # create hash key in case you do this
        1;
    }

Since Perl 5.004, however, this situation is a special case and Perl
creates the hash key only when you make the assignment:

    my_sub( $hash{ 'foo' } );
    print "This will print, even after 5.004\n" if exists $hash{ 'foo' };

    sub my_sub {
        $_[0] = 'bar';
    }

However, if you want the old behavior (and think carefully about that
because it's a weird side effect), you can pass a hash slice instead.
Perl 5.004 didn't make this a special case:

    my_sub( @hash{ qw/foo/ } );
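
This sketch shows the modern behavior in one program. The subroutine
names C<only_reads> and C<assigns> are hypothetical. Reading the
argument leaves the key absent, and only the assignment through
C<$_[0]> creates it.

```perl

    use strict;
    use warnings;

    my %hash;

    sub only_reads { my ($value) = @_; return defined $value }
    sub assigns    { $_[0] = 'bar' }

    only_reads( $hash{'foo'} );
    my $after_read = exists $hash{'foo'} ? 'exists' : 'missing';

    assigns( $hash{'foo'} );
    my $after_assign = exists $hash{'foo'} ? 'exists' : 'missing';

    print "after only_reads: $after_read\n";                  # missing
    print "after assigns:    $after_assign ($hash{'foo'})\n";  # exists (bar)

```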

=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };
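
Once you have such a record, you reach its fields with the arrow
operator. An "array of structs" is just an array of these references.
This sketch reuses the record above and adds a second, made-up record.

```perl

    use strict;
    use warnings;

    my $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas" ],
    };

    # Reach into the structure with the arrow operator.
    my $name      = $record->{NAME};
    my $first_pal = $record->{PALS}[0];
    my $pal_count = @{ $record->{PALS} };

    # An array of records is simply an array of hash references.
    my @staff = ( $record, { NAME => "Alice", EMPNO => 7, PALS => [] } );
    my @names = map { $_->{NAME} } @staff;

    print "First pal of $name: $first_pal ($pal_count pals)\n";
    print "staff: @names\n";

```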

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.
68dc0745
PP
2394
2395=head2 How can I use a reference as a hash key?
2396
109f0441 2397(contributed by brian d foy and Ben Morrow)
9e72e4c6
RGS
2398
2399Hash keys are strings, so you can't really use a reference as the key.
2400When you try to do that, perl turns the reference into its stringified
ac9dac7f
RGS
2401form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2402back the reference from the stringified form, at least without doing
109f0441
SM
2403some extra work on your own.
2404
2405Remember that the entry in the hash will still be there even if
2406the referenced variable goes out of scope, and that it is entirely
2407possible for Perl to subsequently allocate a different variable at
2408the same address. This will mean a new variable might accidentally
2409be associated with the value for an old.
2410
2411If you have Perl 5.10 or later, and you just want to store a value
2412against the reference for lookup later, you can use the core
2413Hash::Util::Fieldhash module. This will also handle renaming the
2414keys if you use multiple threads (which causes all variables to be
2415reallocated at new addresses, changing their stringification), and
2416garbage-collecting the entries when the referenced variable goes out
2417of scope.

If you actually need to be able to get a real reference back from
each hash entry, you can use the C<Tie::RefHash> module, which does the
required work for you.
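
A minimal sketch with C<Tie::RefHash>, which ships with Perl. C<%count_for> and C<$list> are illustrative names. The key that comes back from C<keys> is the original array reference, so you can still dereference it:

    use strict;
    use warnings;
    use Tie::RefHash;

    tie my %count_for, 'Tie::RefHash';

    my $list = [ 1, 2, 3 ];
    $count_for{$list} = scalar @$list;

    foreach my $key ( keys %count_for ) {
        # $key is the original array reference, not a string
        print ref($key), ": @$key => $count_for{$key}\n";    # ARRAY: 1 2 3 => 3
    }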

=head2 How can I check if a key exists in a multilevel hash?

(contributed by brian d foy)

The trick to this problem is avoiding accidental autovivification. If
you want to check three keys deep, you might naïvely try this:

    my %hash;
    if( exists $hash{key1}{key2}{key3} ) {
        ...;
    }

Even though you started with a completely empty hash, after that call to
C<exists> you've created the structure you needed to check for C<key3>:

    %hash = (
        'key1' => {
            'key2' => {}
        }
    );

That's autovivification. You can get around this in a few ways. The
easiest way is to just turn it off. The lexical C<autovivification>
pragma is available on CPAN. With it in effect, you don't add to the hash:

    {
        no autovivification;
        my %hash;
        if( exists $hash{key1}{key2}{key3} ) {
            ...;
        }
    }

The C<Data::Diver> module on CPAN can do it for you too. Its C<Dive>
subroutine can tell you not only whether the keys exist but also what
the value is:

    use Data::Diver qw(Dive);

    my @exists = Dive( \%hash, qw(key1 key2 key3) );
    if( ! @exists ) {
        ...; # keys do not exist
    }
    elsif( ! defined $exists[0] ) {
        ...; # keys exist but value is undef
    }

You can easily do this yourself too by checking each level of the hash
before you move on to the next level. This is essentially what
C<Data::Diver> does for you:

    if( check_hash( \%hash, qw(key1 key2 key3) ) ) {
        ...;
    }

    sub check_hash {
        my( $hash, @keys ) = @_;

        return unless @keys;

        foreach my $key ( @keys ) {
            # eval traps the error if this level isn't a hash reference
            return unless eval { exists $hash->{$key} };
            $hash = $hash->{$key};    # descend one level
        }

        return 1;
    }
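
Here is a sketch of a variation on the same idea. The hypothetical C<lookup_path> helper also returns the value, and it uses C<ref> instead of C<eval> to check that each intermediate level is a hash reference. The last lines compare it with a plain C<exists> chain on a fresh hash:

    use strict;
    use warnings;

    # Hypothetical helper: walk one level at a time; return ( 1, $value )
    # when every key exists, or an empty list as soon as one is missing
    sub lookup_path {
        my( $node, @keys ) = @_;
        return unless @keys;

        foreach my $key ( @keys ) {
            return unless ref $node eq 'HASH' && exists $node->{$key};
            $node = $node->{$key};
        }

        return ( 1, $node );
    }

    my %safe;
    my @found = lookup_path( \%safe, qw(key1 key2 key3) );
    print "found: ", ( @found ? "yes" : "no" ), "; keys now: ",
        scalar( keys %safe ), "\n";     # found: no; keys now: 0

    my %naive;
    my $exists = exists $naive{key1}{key2}{key3};
    print "found: ", ( $exists ? "yes" : "no" ), "; keys now: ",
        scalar( keys %naive ), "\n";    # found: no; keys now: 1

C<ref> returns the class name for a blessed hash, so this sketch treats objects as leaves and will not descend into them.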

=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.
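
Here is a minimal round trip through a temporary file. The core C<File::Temp> module is used only to get a scratch file for the example. A few awkward bytes are written and read with C<binmode> on both handles, so they come back unchanged:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Bytes that line-ending translation or character decoding could alter
    my $bytes = "\x00\x0D\x0A\xFF";

    my( $out, $filename ) = tempfile( UNLINK => 1 );
    binmode $out;
    print {$out} $bytes;
    close $out or die "Can't close $filename: $!";

    open my $in, '<', $filename or die "Can't open $filename: $!";
    binmode $in;
    my $data = do { local $/; <$in> };    # slurp the whole file
    close $in;

    printf "%d bytes: %s\n", length $data, unpack( 'H*', $data );
    # 4 bytes: 000d0aff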

If you're concerned about 8-bit textual data, see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression:

    use 5.010;

    given( $number ) {
        when( /\D/ )
            { say "\thas nondigits"; continue }
        when( /^\d+\z/ )
            { say "\tis a whole number"; continue }
        when( /^-?\d+\z/ )
            { say "\tis an integer"; continue }
        when( /^[+-]?\d+\z/ )
            { say "\tis a +/- integer"; continue }
        when( /^-?(?:\d+\.?|\.\d)\d*\z/ )
            { say "\tis a real number"; continue }
        when( /^[+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)?\z/i)
            { say "\tis a C float" }
    }

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions. There is also C<Regexp::Common>, which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.
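
For example, you can call C<looks_like_number> from the core L<Scalar::Util> directly. Note that a string such as C<0x1F> is not treated as a number, because perl does not convert hexadecimal strings automatically. The candidate strings below are arbitrary:

    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    foreach my $candidate ( '42', '-3.5', '1e10', 'abc', '0x1F', '12abc' ) {
        printf "%-6s %s\n", $candidate,
            looks_like_number($candidate) ? 'looks like a number' : 'does not';
    }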

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        # in list context, strtod also returns the count of unparsed characters
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting from Perl 5.8, C<Storable> is part
of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
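
Here is a self-contained round trip using C<nstore>, which writes in network byte order so that machines with a different byte order can read the file. The file itself is a throwaway from the core C<File::Temp> module, and C<%config> and its contents are made up:

    use strict;
    use warnings;
    use Storable qw(nstore retrieve);
    use File::Temp qw(tempfile);

    my %config = ( runs => 3, users => [ 'alice', 'bob' ] );

    # Temporary file for the example only; a real program would use a
    # fixed path so a later run can find it
    my( undef, $file ) = tempfile( UNLINK => 1 );

    nstore( \%config, $file );
    my %loaded = %{ retrieve($file) };

    print "runs=$loaded{runs} users=@{ $loaded{users} }\n";
    # runs=3 users=alice bob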

=head2 How do I print out or copy a recursive data structure?

The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
for printing out data structures. The C<Storable> module on CPAN (or the
5.8 release of Perl) provides a function called C<dclone> that recursively
copies its argument.

    use Storable qw(dclone);
    $r2 = dclone($r1);

Here C<$r1> can be a reference to any kind of data structure you'd like.
It will be deeply copied. Because C<dclone> takes and returns references,
you'd have to add extra punctuation if you had a hash of arrays that
you wanted to copy.

    %newhash = %{ dclone(\%oldhash) };
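
This small sketch shows that the copy really is independent and that C<dclone> copes with a structure that refers to itself. The data is made up, and C<$Data::Dumper::Sortkeys> is set only to make the dump order predictable:

    use strict;
    use warnings;
    use Storable qw(dclone);
    use Data::Dumper;

    $Data::Dumper::Sortkeys = 1;    # stable key order in the output

    my %oldhash = ( fruit => [ 'apple', 'pear' ], veg => [ 'leek' ] );
    my %newhash = %{ dclone( \%oldhash ) };
    push @{ $newhash{fruit} }, 'plum';    # changes only the copy
    print Dumper( \%oldhash, \%newhash );

    # A structure that refers to itself: dclone copies the cycle too
    my $node = { name => 'loop' };
    $node->{self} = $node;
    my $copy = dclone($node);
    print "cycle preserved: ", ( $copy->{self} == $copy ? "yes" : "no" ), "\n";
    print "distinct from original: ", ( $copy != $node ? "yes" : "no" ), "\n";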

=head2 How do I define methods for every class/object?

(contributed by Ben Morrow)

You can use the C<UNIVERSAL> class (see L<UNIVERSAL>). However, please
be very careful to consider the consequences of doing this: adding
methods to every object is very likely to have unintended
consequences. If possible, it would be better to have all your objects
inherit from some common base class, or to use an object system like
Moose that supports roles.
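
Here is a minimal sketch of the common-base-class alternative. C<My::Base>, C<My::Dog> and C<My::Cat> are hypothetical class names. Setting C<@ISA> directly is only one way to declare inheritance; the C<parent> pragma is another:

    use strict;
    use warnings;

    package My::Base;    # hypothetical common base class
    sub new      { my( $class, %args ) = @_; return bless { %args }, $class }
    sub describe { my $self = shift; return ref($self) . " object" }

    package My::Dog;
    our @ISA = ('My::Base');    # inherits new and describe

    package My::Cat;
    our @ISA = ('My::Base');
    sub describe { my $self = shift; return "a cat named $self->{name}" }

    package main;
    my $dog = My::Dog->new;
    my $cat = My::Cat->new( name => 'Tom' );
    print "dog: ", $dog->describe, "\n";    # dog: My::Dog object
    print "cat: ", $cat->describe, "\n";    # cat: a cat named Tom

A method added to C<My::Base> reaches only these classes, not every object in the program.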

=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.
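
The module is the better choice for real code. The checksum itself is the Luhn (mod 10) algorithm, which is short enough to sketch. C<luhn_ok> is a hypothetical helper written for this example. It checks only the checksum, not the card type or whether the number was ever issued, and C<4111 1111 1111 1111> is a widely used test number:

    use strict;
    use warnings;

    # Hypothetical helper: the Luhn (mod 10) checksum used by card numbers
    sub luhn_ok {
        my( $number ) = @_;
        $number =~ s/[\s-]//g;                    # allow spaces and dashes
        return 0 unless $number =~ /\A\d{2,}\z/;

        my( $sum, $double ) = ( 0, 0 );
        foreach my $char ( reverse split //, $number ) {
            my $digit = $char;
            if( $double ) {
                $digit *= 2;                      # double every second digit
                $digit -= 9 if $digit > 9;        # same as adding its two digits
            }
            $sum += $digit;
            $double = !$double;
        }

        return $sum % 10 == 0 ? 1 : 0;
    }

    print "check: ", ( luhn_ok('4111 1111 1111 1111') ? "valid" : "invalid" ), "\n";
    # check: valid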
68dc0745 2612
65acb1b1
TC
2613=head2 How do I pack arrays of doubles or floats for XS code?
2614
109f0441 2615The arrays.h/arrays.c code in the C<PGPLOT> module on CPAN does just this.
65acb1b1 2616If you're doing a lot of float or double processing, consider using
ac9dac7f 2617the C<PDL> module from CPAN instead--it makes number-crunching easy.
65acb1b1 2618
109f0441
SM
2619See L<http://search.cpan.org/dist/PGPLOT> for the code.
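
On the Perl side, the core C<pack> function can already build the contiguous buffer that a C C<double *> or C<float *> expects. The C<d> template packs native doubles and C<f> packs native floats. In this minimal sketch, the XS code that receives the string is not shown, and the byte count depends on the size of the platform's C<double>:

    use strict;
    use warnings;

    my @doubles = ( 1.5, -2.25, 3 );

    # 'd*' packs every element as a native C double, back to back,
    # which is the layout a C function reading a double array expects
    my $buffer = pack 'd*', @doubles;
    my $size   = length pack( 'd', 0 );    # bytes per double on this perl

    printf "%d doubles in %d bytes\n", length($buffer) / $size, length $buffer;

    # 'f*' would do the same for C floats; unpack reverses the process
    my @back = unpack 'd*', $buffer;
    print "@back\n";    # 1.5 -2.25 3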

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.