This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 7996 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<perlop/"Floating Point Arithmetic"> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

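The same imprecision affects comparisons, not just output. The
following is a minimal sketch, assuming a tolerance of C<1e-9> is
acceptable for your data; C<nearly_equal> is a hypothetical helper
name, not a Perl builtin.

```perl
use strict;
use warnings;

# 0.1 and 0.2 have no exact binary representation, so their sum
# is usually not exactly equal to 0.3.
my $sum = 0.1 + 0.2;

# Hypothetical helper: treat two numbers as equal when they differ
# by less than a caller-chosen tolerance.
sub nearly_equal {
    my ($x, $y, $epsilon) = @_;
    $epsilon = 1e-9 unless defined $epsilon;
    return abs($x - $y) < $epsilon;
}

printf "exact match:  %s\n", ($sum == 0.3 ? "yes" : "no");
printf "nearly equal: %s\n", (nearly_equal($sum, 0.3) ? "yes" : "no");
printf "rounded:      %s\n", sprintf("%.2f", $sum);
```

Pick the tolerance to suit the magnitude of the numbers you compare.
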
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think of in the above as 'three' is really more like
2.9999999999999995559.

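If what you want is the nearest integer rather than truncation, one
option is to round with C<sprintf> instead of calling C<int()>. This
is only a sketch; whether rounding is the right behavior depends on
your application.

```perl
use strict;
use warnings;

my $value = 0.6 / 0.2 - 2;              # mathematically 1

my $truncated = int($value);            # drops the fractional part
my $rounded   = sprintf("%.0f", $value); # rounds to the nearest integer

print "truncated: $truncated, rounded: $rounded\n";
```
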
=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as
literals in your program. Octal literals in Perl must start with a
leading C<0> and hexadecimal literals must start with a leading C<0x>.
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use C<oct()> or C<hex()> if you
want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>),
octal (C<0350> or even without the leading C<0>, like C<377>) and binary
(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with
or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>.
The inverse mapping from decimal to octal can be done with either the
C<%o> or C<%O> C<sprintf()> formats.

This problem shows up most often when people try using C<chmod()>,
C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition
typically take permissions in octal.

    chmod(644, $file);   # WRONG
    chmod(0644, $file);  # right

Note the mistake in the first line was specifying the decimal literal
C<644>, rather than the intended octal literal C<0644>. The problem can
be seen with:

    printf("%#o",644);   # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to chmod() et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set C<0..7>.

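When the permission arrives as a string (from a configuration file or
the command line, say), the conversion has to be explicit. A short
sketch, where C<$input> stands in for whatever your program actually
reads:

```perl
use strict;
use warnings;

my $input = "0644";          # hypothetical value read from input

my $wrong = $input + 0;      # numeric context: decimal 644
my $mode  = oct($input);     # octal interpretation: decimal 420

printf "%s as a number is %d; via oct() it is %d (%#o)\n",
    $input, $wrong, $mode, $mode;
```

The value in C<$mode> is what you would pass to C<chmod()>.
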
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil  = ceil(3.5);    # 4
    $floor = floor(3.5);   # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

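As one illustration of writing the rounding rule yourself, here is a
sketch of round-half-away-from-zero built on C<POSIX>'s C<floor()> and
C<ceil()>. The name C<round_half_away> is made up for this example, and
the rule is only one of several you might need. Inputs that are not
exactly representable in binary can still land on the "wrong" side of
a half.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical helper: round to the nearest integer, with exact
# halves going away from zero (2.5 -> 3, -2.5 -> -3).
sub round_half_away {
    my ($n) = @_;
    return $n >= 0 ? floor($n + 0.5) : ceil($n - 0.5);
}

print round_half_away($_), "\n" for (2.4, 2.5, -2.5);
```
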
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
Perl built-in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using Perl's built-in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built-in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

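For the inclined reader in a hurry, the remaining transformations can
usually be had by chaining a parsing function (C<hex> or C<oct>) with a
C<sprintf> format. A brief sketch using only builtins:

```perl
use strict;
use warnings;

my $oct_from_hex = sprintf("%o", hex("DEADBEEF"));     # hex -> oct
my $hex_from_bin = sprintf("%x", oct("0b10110110"));   # bin -> hex
my $bin_from_oct = sprintf("%b", oct("0755"));         # oct -> bin

print "$oct_from_hex $hex_from_bin $bin_from_oct\n";
```

Each conversion passes through Perl's native number, so the value must
fit in an integer on your platform.
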
=head2 Why doesn't & work the way I want it to?

The behavior of binary arithmetic operators depends on whether they're
used on numbers or strings. The operators treat a string as a series
of bits and work with that (the string C<"3"> is the bit pattern
C<00110011>). The operators work with the binary form of a number
(the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

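If your values arrive as strings but you mean them as numbers, adding
zero is one common way to force the numeric form. The sketch below
computes the string result first, since a scalar that has been used as
a number may afterwards be treated as one.

```perl
use strict;
use warnings;

my ($x, $y) = ("11", "3");      # e.g. values read from input

# Both operands are strings, so & works character by character.
my $as_strings = $x & $y;

# Adding 0 forces numeric context, so & works on the numbers 11 and 3.
my $as_numbers = ($x + 0) & ($y + 0);

print "strings: $as_strings, numbers: $as_numbers\n";
```
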
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates an array of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the
C<rand()> builtin to get a random number between 0 and 1.

C<rand($x)> returns a number such that
C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
figure out is a random number in the range from 0 to the
difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you
want a random number between 0 and 5 that you can then add
to 10.

    my $number = 10 + int rand( 15-10+1 );

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year. Without an
argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day of year for any date, use C<POSIX>'s C<mktime> to get
a time in epoch seconds for the argument to C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

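For the simple case, where both dates can be turned into epoch
seconds, the subtraction might look like the following sketch. It
assumes the dates are meant as UTC midnights, which sidesteps
daylight-saving shifts; the particular dates are only examples.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# timegm() takes (sec, min, hour, mday, month 0-11, year).
my $start = timegm(0, 0, 0,  1, 11, 1987);   # 1 Dec 1987, 00:00 UTC
my $end   = timegm(0, 0, 0, 18, 11, 1987);   # 18 Dec 1987, 00:00 UTC

my $days_apart = ($end - $start) / 86_400;   # 86,400 seconds per day
print "The dates are $days_apart days apart\n";
```
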
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

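As a sketch of the first approach, suppose every timestamp looks like
C<YYYY-MM-DD HH:MM:SS>. That format and the sample value are
assumptions made for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my $stamp = "1987-12-18 10:30:00";   # hypothetical fixed-format input

my ($year, $mon, $mday, $hour, $min, $sec) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized timestamp: $stamp\n";

# timelocal() expects months numbered from 0.
my $epoch = timelocal($sec, $min, $hour, $mday, $mon - 1, $year);
print "$stamp is $epoch in epoch seconds (local time zone)\n";
```

Use C<timegm> from the same module if the string represents UTC
rather than local time.
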
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes.

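If you do want a few special escapes expanded, one approach is to look
each escaped character up in a table and fall back to the character
itself. This is a sketch only; the C<%expand> table is hypothetical and
covers just C<\n> and C<\t>.

```perl
use strict;
use warnings;

# Hypothetical table of escapes to expand; extend as needed.
my %expand = ( n => "\n", t => "\t" );

# Single quotes keep the backslashes literal in this sample.
my $text = 'one\ttwo\nthree\\\\four\!';

# Replace each backslash-character pair with its expansion, or with
# the bare character when the table has no entry for it.
(my $unescaped = $text) =~
    s/\\(.)/exists $expand{$1} ? $expand{$1} : $1/ge;

print $unescaped, "\n";
```
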
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself:

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces. Note that
the use of parens creates a list context, so we need C<scalar> to
force the scalar context on the function:

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multiple ones, then something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module C<Regexp::Common>, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

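For the common case of balanced brackets, C<Text::Balanced> saves you
from writing any of this by hand. A minimal sketch using its
C<extract_bracketed> function; the sample text is made up for the
example.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(outer (inner) text) and the rest";

# In list context, extract_bracketed returns the balanced chunk
# found at the start of the string and whatever follows it.
my ($balanced, $remainder) = extract_bracketed($text, "()");

print "balanced:  $balanced\n";
print "remainder: $remainder\n";
```
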
=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

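With C<Text::Wrap>, the line width is controlled through the package
variable C<$Text::Wrap::columns>. The sketch below assumes a narrow
30-column target and a made-up paragraph, just to make the wrapping
visible.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# $Text::Wrap::columns sets the target width (the default is 76).
local $Text::Wrap::columns = 30;

my $paragraph = "The quick brown fox jumps over the lazy dog, "
              . "then stops to reformat a paragraph of text.";

# First argument: indent for the first line; second: for the rest.
my $wrapped = wrap("", "  ", $paragraph);
print $wrapped, "\n";
```
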
=head2 How can I access or change N characters of a string?

You can access the first characters of a string with substr().
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );  #  'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use substr() as an lvalue.

    substr( $string, 13, 4 ) = "Perl 5.8.0";

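Two details of C<substr()> are worth seeing in action: the
four-argument form returns the text it replaced, and a negative offset
counts from the end of the string. A short sketch:

```perl
use strict;
use warnings;

my $string = "Just another Perl Hacker";

# Four-argument substr() returns the text it replaced.
my $old = substr($string, 13, 4, "Perl 5.8.0");

# A negative offset counts from the end of the string.
my $last_word = substr($string, -6);

print "$old -> $string (last word: $last_word)\n";
```
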
=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

68dc0745 759=head2 How can I count the number of occurrences of a substring within a string?
760
a6dd486b 761There are a number of ways, with varying efficiency. If you want a
68dc0745 762count of a certain single character (X) within a string, you can use the
763C<tr///> function like so:
764
ac9dac7f
RGS
765 $string = "ThisXlineXhasXsomeXx'sXinXit";
766 $count = ($string =~ tr/X//);
767 print "There are $count X characters in the string";
68dc0745 768
769This is fine if you are just looking for a single character. However,
770if you are trying to count multiple character substrings within a
771larger string, C<tr///> won't work. What you can do is wrap a while()
772loop around a global pattern match. For example, let's count negative
773integers:
774
ac9dac7f
RGS
775 $string = "-9 55 48 -2 23 -76 4 14 -44";
776 while ($string =~ /-\d+/g) { $count++ }
777 print "There are $count negative numbers in the string";
68dc0745 778
881bdbd4
JH
779Another version uses a global match in list context, then assigns the
780result to a scalar, producing a count of the number of matches.
781
782 $count = () = $string =~ /-\d+/g;
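
As a quick check, here are the three counting techniques side by side on the sample string. The variable names are chosen just for this sketch; each counter starts from its own fresh variable so that an earlier count cannot leak into a later one.

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# Count single characters with tr///; an empty replacement list leaves
# $string unchanged.
my $minus_signs = ($string =~ tr/-//);

# Count pattern matches with a loop around a global match.
my $loop_count = 0;
$loop_count++ while $string =~ /-\d+/g;

# Count pattern matches by assigning the list of matches to an empty
# list, then taking that assignment in scalar context.
my $list_count = () = $string =~ /-\d+/g;

print "$minus_signs $loop_count $list_count\n";
```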
783
68dc0745 784=head2 How do I capitalize all the words on one line?
785
786To make the first letter of each word upper case:
3fe9a6f1 787
ac9dac7f 788 $line =~ s/\b(\w)/\U$1/g;
68dc0745 789
46fc3d4c 790This has the strange effect of turning "C<don't do it>" into "C<Don'T
a6dd486b 791Do It>". Sometimes you might want this. Other times you might need a
24f1ba9b 792more thorough solution (suggested by brian d foy):
46fc3d4c 793
ac9dac7f
RGS
794 $string =~ s/ (
795 (^\w) #at the beginning of the line
796 | # or
797 (\s\w) #preceded by whitespace
798 )
799 /\U$1/xg;
800
801 $string =~ s/([\w']+)/\u\L$1/g;
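
To see the difference, you can run both approaches on the same input. This is a small sketch; the names C<$naive> and C<$better> are only for illustration.

```perl
use strict;
use warnings;

my $line = "don't do it";

# Word-boundary version: \b also matches between the apostrophe and "t".
(my $naive = $line) =~ s/\b(\w)/\U$1/g;

# Treating the apostrophe as part of the word avoids that.
(my $better = $line) =~ s/([\w']+)/\u\L$1/g;

print "$naive\n";    # Don'T Do It
print "$better\n";   # Don't Do It
```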
46fc3d4c 802
68dc0745 803To make the whole line upper case:
3fe9a6f1 804
ac9dac7f 805 $line = uc($line);
68dc0745 806
807To force each word to be lower case, with the first letter upper case:
3fe9a6f1 808
ac9dac7f 809 $line =~ s/(\w+)/\u\L$1/g;
68dc0745 810
5a964f20
TC
811You can (and probably should) enable locale awareness of those
812characters by placing a C<use locale> pragma in your program.
92c2ed05 813See L<perllocale> for endless details on locales.
5a964f20 814
65acb1b1 815This is sometimes referred to as putting something into "title
d92eb7b0 816case", but that's not quite accurate. Consider the proper
65acb1b1
TC
817capitalization of the movie I<Dr. Strangelove or: How I Learned to
818Stop Worrying and Love the Bomb>, for example.
819
369b44b4
RGS
820Damian Conway's L<Text::Autoformat> module provides some smart
821case transformations:
822
ac9dac7f
RGS
823 use Text::Autoformat;
824 my $x = "Dr. Strangelove or: How I Learned to Stop ".
825 "Worrying and Love the Bomb";
369b44b4 826
ac9dac7f
RGS
827 print $x, "\n";
828 for my $style (qw( sentence title highlight )) {
829 print autoformat($x, { case => $style }), "\n";
830 }
369b44b4 831
49d635f9 832=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 833
ac9dac7f
RGS
834Several modules can handle this sort of parsing--C<Text::Balanced>,
835C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9
RGS
836
837Take the example case of trying to split a string that is
838comma-separated into its different fields. You can't use C<split(/,/)>
839because you shouldn't split if the comma is inside quotes. For
840example, take a data line like this:
68dc0745 841
ac9dac7f 842 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 843
844Due to the restriction of the quotes, this is a fairly complex
197aec24 845problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 846I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 847suggests (assuming your string is contained in C<$text>):
68dc0745 848
ac9dac7f
RGS
849 @new = ();
850 push(@new, $+) while $text =~ m{
851 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
852 | ([^,]+),?
853 | ,
854 }gx;
855 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 856
46fc3d4c 857If you want to represent quotation marks inside a
858quotation-mark-delimited field, escape them with backslashes (eg,
49d635f9 859C<"like \"this\"">).
46fc3d4c 860
ac9dac7f
RGS
861Alternatively, the C<Text::ParseWords> module (part of the standard
862Perl distribution) lets you say:
68dc0745 863
ac9dac7f
RGS
864 use Text::ParseWords;
865 @new = quotewords(",", 0, $text);
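
Here is how that might look on the sample data line from above. It is a sketch rather than a full CSV parser; for real-world CSV files one of the C<Text::CSV> modules is probably the safer choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,'
         . '"Error, Core Dumped"';

# A false second argument asks quotewords() to drop the quotes themselves.
my @fields = quotewords(',', 0, $text);

printf "%d fields\n", scalar @fields;
print "[$_]\n" for @fields;
```

The commas inside C<"Cimetrix, Inc"> and C<"Error, Core Dumped"> stay with their fields instead of splitting them.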
65acb1b1 866
68dc0745 867=head2 How do I strip blank space from the beginning/end of a string?
868
6670e5e7 869(contributed by brian d foy)
68dc0745 870
6670e5e7
RGS
871A substitution can do this for you. For a single line, you want to
872replace all the leading or trailing whitespace with nothing. You
873can do that with a pair of substitutions.
68dc0745 874
6670e5e7
RGS
875 s/^\s+//;
876 s/\s+$//;
68dc0745 877
6670e5e7
RGS
878You can also write that as a single substitution, although it turns
879out the combined statement is slower than the separate ones. That
880might not matter to you, though.
68dc0745 881
6670e5e7 882 s/^\s+|\s+$//g;
68dc0745 883
6670e5e7
RGS
884In this regular expression, the alternation matches either at the
885beginning or the end of the string since the alternation has a lower
886precedence than the anchors. With the C</g> flag, the substitution
887makes all possible matches, so it gets both. Remember, the trailing
888newline matches the C<\s+>, and the C<$> anchor can match to the
889physical end of the string, so the newline disappears too. Just add
890the newline to the output, which has the added benefit of preserving
891"blank" (consisting entirely of whitespace) lines which the C<^\s+>
892would remove all by itself.
68dc0745 893
6670e5e7
RGS
894 while( <> )
895 {
896 s/^\s+|\s+$//g;
897 print "$_\n";
898 }
5a964f20 899
6670e5e7
RGS
900For a multi-line string, you can apply the regular expression
901to each logical line in the string by adding the C</m> flag (for
902"multi-line"). With the C</m> flag, the C<$> matches I<before> an
903embedded newline, so it doesn't remove it. It still removes the
904newline at the end of the string.
905
ac9dac7f 906 $string =~ s/^\s+|\s+$//gm;
6670e5e7
RGS
907
908Remember that lines consisting entirely of whitespace will disappear,
909since the first part of the alternation can match the entire string
910and replace it with nothing. If you need to keep embedded blank lines,
911you have to do a little more work. Instead of matching any whitespace
912(since that includes a newline), just match the other whitespace.
913
914 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
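
If you strip whitespace often, you might wrap the pair of substitutions in a subroutine. The name C<trim> is just a convention used in this sketch; it is not a built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the argument without leading or
# trailing whitespace (including a trailing newline).
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $trimmed = trim("  \t hello world  \n");
print "[$trimmed]\n";   # [hello world]
```

Working on a copy leaves the caller's string untouched, which is usually less surprising than modifying it in place.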
5a964f20 915
65acb1b1
TC
916=head2 How do I pad a string with blanks or pad a number with zeroes?
917
65acb1b1 918In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0
GS
919to pad the string, C<$text> or C<$num> contains the string to be padded,
920and C<$pad_char> contains the padding character. You can use a single
921character string constant instead of the C<$pad_char> variable if you
922know what it is in advance. And in the same way you can use an integer in
923place of C<$pad_len> if you know the pad length in advance.
65acb1b1 924
d92eb7b0
GS
925The simplest method uses the C<sprintf> function. It can pad on the left
926or right with blanks and on the left with zeroes and it will not
927truncate the result. The C<pack> function can only pad strings on the
928right with blanks and it will truncate the result to a maximum length of
929C<$pad_len>.
65acb1b1 930
ac9dac7f 931 # Left padding a string with blanks (no truncation):
04d666b1
RGS
932 $padded = sprintf("%${pad_len}s", $text);
933 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 934
ac9dac7f 935 # Right padding a string with blanks (no truncation):
04d666b1
RGS
936 $padded = sprintf("%-${pad_len}s", $text);
937 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 938
ac9dac7f 939 # Left padding a number with 0 (no truncation):
04d666b1
RGS
940 $padded = sprintf("%0${pad_len}d", $num);
941 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 942
ac9dac7f
RGS
943 # Right padding a string with blanks using pack (will truncate):
944 $padded = pack("A$pad_len",$text);
65acb1b1 945
d92eb7b0
GS
946If you need to pad with a character other than blank or zero you can use
947one of the following methods. They all generate a pad string with the
948C<x> operator and combine that with C<$text>. These methods do
949not truncate C<$text>.
65acb1b1 950
d92eb7b0 951Left and right padding with any character, creating a new string:
65acb1b1 952
ac9dac7f
RGS
953 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
954 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 955
d92eb7b0 956Left and right padding with any character, modifying C<$text> directly:
65acb1b1 957
ac9dac7f
RGS
958 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
959 $text .= $pad_char x ( $pad_len - length( $text ) );
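
With some concrete values filled in, the methods above behave like this. The values are arbitrary and chosen only for this sketch.

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ('abc', 42, 6, '*');

my $left   = sprintf("%*s",  $pad_len, $text);   # '   abc'
my $right  = sprintf("%-*s", $pad_len, $text);   # 'abc   '
my $zeroes = sprintf("%0*d", $pad_len, $num);    # '000042'
my $packed = pack("A2", $text);                  # 'ab' (truncated)

# Padding with an arbitrary character using the x operator.
my $stars  = $pad_char x ( $pad_len - length($text) ) . $text;   # '***abc'

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```

If C<$text> is already longer than C<$pad_len>, the repetition count is negative, C<x> produces an empty string, and C<$text> comes through unchanged.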
65acb1b1 960
68dc0745 961=head2 How do I extract selected columns from a string?
962
e573f903
RGS
963(contributed by brian d foy)
964
965If you know the columns that contain the data, you can
966use C<substr> to extract a single column.
967
968 my $column = substr( $line, $start_column, $length );
969
970You can use C<split> if the columns are separated by whitespace or
971some other delimiter, as long as whitespace or the delimiter cannot
972appear as part of the data.
973
974 my $line = ' fred barney betty ';
975 my @columns = split /\s+/, $line;
976 # ( '', 'fred', 'barney', 'betty' );
977
978 my $line = 'fred||barney||betty';
979 my @columns = split /\|/, $line;
980 # ( 'fred', '', 'barney', '', 'betty' );
981
982If you want to work with comma-separated values, don't do this since
983that format is a bit more complicated. Use one of the modules that
984handle that format, such as C<Text::CSV>, C<Text::CSV_XS>, or
985C<Text::CSV_PP>.
986
987If you want to break apart an entire line of fixed columns, you can use
988C<unpack> with the A (ASCII) format. By using a number after the format
989specifier, you can denote the column width. See the C<pack> and C<unpack>
990entries in L<perlfunc> for more details.
991
992 my @fields = unpack( "A8 A8 A8 A16 A4", $line );
993
994Note that spaces in the format argument to C<unpack> do not denote literal
995spaces. If you have space separated data, you may want C<split> instead.
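
Here is a small self-contained sketch of fixed-width extraction. The record layout (two 8-column fields and one 4-column field) is invented for the example. Note that C<unpack> takes the template as its first argument and the data as its second.

```perl
use strict;
use warnings;

# Build a fixed-width record: two 8-column fields and one 4-column field.
my $line = sprintf("%-8s%-8s%-4s", 'fred', 'barney', '1234');

# The template comes first, the data second. The A format strips
# trailing spaces from each field.
my ($first, $second, $number) = unpack("A8 A8 A4", $line);

# substr() pulls out a single column by offset and length, padding and all.
my $middle = substr($line, 8, 8);

print "[$first] [$second] [$number] [$middle]\n";
```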
68dc0745 996
997=head2 How do I find the soundex value of a string?
998
7678cced
RGS
999(contributed by brian d foy)
1000
1001You can use the Text::Soundex module. If you want to do fuzzy or close
ac9dac7f
RGS
1002matching, you might also try the C<String::Approx>,
1003C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 1004
1005=head2 How can I expand variables in text strings?
1006
e573f903 1007(contributed by brian d foy)
5a964f20 1008
322be77c
RGS
1009If you can avoid it, don't, or if you can use a templating system,
1010such as C<Text::Template> or C<Template> Toolkit, do that instead.
1011
1012However, for the one-off simple case where I don't want to pull out a
1013full templating system, I'll use a string that has two Perl scalar
1014variables in it. In this example, I want to expand C<$foo> and C<$bar>
1015to their variables' values.
e573f903
RGS
1016
1017 my $foo = 'Fred';
1018 my $bar = 'Barney';
1019 $string = 'Say hello to $foo and $bar';
1020
1021One way I can do this involves the substitution operator and a double
1022C</e> flag. The first C</e> evaluates C<$1> on the replacement side and
1023turns it into C<$foo>. The second /e starts with C<$foo> and replaces
1024it with its value. C<$foo>, then, turns into 'Fred', and that's finally
1025what's left in the string.
1026
1027 $string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
322be77c 1028
e573f903
RGS
1029The C</e> will also silently ignore violations of strict, replacing
1030undefined variable names with the empty string.
1031
1032I could also pull the values from a hash instead of evaluating
1033variable names. Using a single C</e>, I can check the hash to ensure
1034the value exists, and if it doesn't, I can replace the missing value
1035with a marker, in this case C<???> to signal that I missed something:
1036
1037 my $string = 'This has $foo and $bar';
1038
1039 my %Replacements = (
1040 foo => 'Fred',
ac9dac7f 1041 );
322be77c 1042
e573f903
RGS
1043 # $string =~ s/\$(\w+)/$Replacements{$1}/g;
1044 $string =~ s/\$(\w+)/
1045 exists $Replacements{$1} ? $Replacements{$1} : '???'
1046 /eg;
322be77c 1047
e573f903 1048 print $string;
322be77c 1049
68dc0745 1050=head2 What's wrong with always quoting "$vars"?
1051
ac9dac7f 1052The problem is that those double-quotes force
e573f903
RGS
1053stringification--coercing numbers and references into strings--even
1054when you don't want them to be strings. Think of it this way:
1055double-quote expansion is used to produce new strings. If you already
1056have a string, why do you need more?
68dc0745 1057
1058If you get used to writing odd things like these:
1059
ac9dac7f
RGS
1060 print "$var"; # BAD
1061 $new = "$old"; # BAD
1062 somefunc("$var"); # BAD
68dc0745 1063
1064You'll be in trouble. Those should (in 99.8% of the cases) be
1065the simpler and more direct:
1066
ac9dac7f
RGS
1067 print $var;
1068 $new = $old;
1069 somefunc($var);
68dc0745 1070
1071Otherwise, besides slowing you down, you're going to break code when
1072the thing in the scalar is actually neither a string nor a number, but
1073a reference:
1074
ac9dac7f
RGS
1075 func(\@array);
1076 sub func {
1077 my $aref = shift;
1078 my $oref = "$aref"; # WRONG
1079 }
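
You can watch the reference turn into a plain string with a few lines of code. This is only a sketch, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";     # stringified: now just text like "ARRAY(0x...)"

print ref($aref), "\n";                                   # ARRAY
print ref($copy) eq '' ? "plain string\n" : "reference\n";

# Under strict refs, trying to dereference the string dies.
my $ok = eval { my @elements = @$copy; 1 };
print "dereference failed: $@" unless $ok;
```

Without C<use strict>, the dereference would not die; it would quietly treat the text as a symbolic reference to an oddly named, empty array, which is arguably worse.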
68dc0745 1080
1081You can also get into subtle problems on those few operations in Perl
1082that actually do care about the difference between a string and a
1083number, such as the magical C<++> autoincrement operator or the
1084syscall() function.
1085
197aec24 1086Stringification also destroys arrays.
5a964f20 1087
ac9dac7f
RGS
1088 @lines = `command`;
1089 print "@lines"; # WRONG - extra blanks
1090 print @lines; # right
5a964f20 1091
04d666b1 1092=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1093
1094Check for these three things:
1095
1096=over 4
1097
04d666b1 1098=item There must be no space after the E<lt>E<lt> part.
68dc0745 1099
197aec24 1100=item There (probably) should be a semicolon at the end.
68dc0745 1101
197aec24 1102=item You can't (easily) have any space in front of the tag.
68dc0745 1103
1104=back
1105
197aec24 1106If you want to indent the text in the here document, you
5a964f20
TC
1107can do this:
1108
1109 # all in one
1110 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
1111 your text
1112 goes here
1113 HERE_TARGET
1114
1115But the HERE_TARGET must still be flush against the margin.
197aec24 1116If you want that indented also, you'll have to quote
5a964f20
TC
1117in the indentation.
1118
1119 ($quote = <<' FINIS') =~ s/^\s+//gm;
1120 ...we will have peace, when you and all your works have
1121 perished--and the works of your dark master to whom you
1122 would deliver us. You are a liar, Saruman, and a corrupter
1123 of men's hearts. --Theoden in /usr/src/perl/taint.c
1124 FINIS
83ded9ee 1125 $quote =~ s/\s+--/\n--/;
5a964f20
TC
1126
1127A nice general-purpose fixer-upper function for indented here documents
1128follows. It expects to be called with a here document as its argument.
1129It looks to see whether each line begins with a common substring, and
a6dd486b
JB
1130if so, strips that substring off. Otherwise, it takes the amount of leading
1131whitespace found on the first line and removes that much off each
5a964f20
TC
1132subsequent line.
1133
1134 sub fix {
1135 local $_ = shift;
a6dd486b 1136 my ($white, $leader); # common whitespace and common leading string
5a964f20
TC
1137 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1138 ($white, $leader) = ($2, quotemeta($1));
1139 } else {
1140 ($white, $leader) = (/^(\s+)/, '');
1141 }
1142 s/^\s*?$leader(?:$white)?//gm;
1143 return $_;
1144 }
1145
c8db1d39 1146This works with leading special strings, dynamically determined:
5a964f20 1147
ac9dac7f 1148 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20
TC
1149 @@@ int
1150 @@@ runops() {
1151 @@@ SAVEI32(runlevel);
1152 @@@ runlevel++;
d92eb7b0 1153 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20
TC
1154 @@@ TAINT_NOT;
1155 @@@ return 0;
1156 @@@ }
ac9dac7f 1157 MAIN_INTERPRETER_LOOP
5a964f20 1158
a6dd486b 1159Or with a fixed amount of leading whitespace, with remaining
5a964f20
TC
1160indentation correctly preserved:
1161
ac9dac7f 1162 $poem = fix<<EVER_ON_AND_ON;
5a964f20
TC
1163 Now far ahead the Road has gone,
1164 And I must follow, if I can,
1165 Pursuing it with eager feet,
1166 Until it joins some larger way
1167 Where many paths and errands meet.
1168 And whither then? I cannot say.
1169 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1170 EVER_ON_AND_ON
5a964f20 1171
68dc0745 1172=head1 Data: Arrays
1173
65acb1b1
TC
1174=head2 What is the difference between a list and an array?
1175
ac9dac7f
RGS
1176An array has a changeable length. A list does not. An array is
1177something you can push or pop, while a list is a set of values. Some
1178people make the distinction that a list is a value while an array is a
1179variable. Subroutines are passed and return lists, you put things into
1180list context, you initialize arrays with lists, and you C<foreach()>
1181across a list. C<@> variables are arrays, anonymous arrays are
1182arrays, arrays in scalar context behave like the number of elements in
1183them, subroutines access their arguments through the array C<@_>, and
1184C<push>/C<pop>/C<shift> only work on arrays.
65acb1b1
TC
1185
1186As a side note, there's no such thing as a list in scalar context.
1187When you say
1188
ac9dac7f 1189 $scalar = (2, 5, 7, 9);
65acb1b1 1190
d92eb7b0 1191you're using the comma operator in scalar context, so it uses the scalar
ac9dac7f 1192comma operator. There never was a list there at all! This causes the
d92eb7b0 1193last value to be returned: 9.
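
The three cases are easy to confuse, so here they are next to each other. This is a sketch; the C<no warnings 'void'> line is there only because Perl would otherwise warn about the constants that the comma operator throws away.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $count   = @array;      # array in scalar context: 4 elements
my ($first) = @array;      # list assignment: first element, 2
my $last    = do {
    no warnings 'void';    # the discarded constants would otherwise warn
    (2, 5, 7, 9);          # comma operator in scalar context: 9
};

print "$count $first $last\n";
```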
65acb1b1 1194
68dc0745 1195=head2 What is the difference between $array[1] and @array[1]?
1196
a6dd486b 1197The former is a scalar value; the latter an array slice, making
68dc0745 1198it a list with one (scalar) value. You should use $ when you want a
1199scalar value (most of the time) and @ when you want a list with one
1200scalar value in it (very, very rarely; nearly never, in fact).
1201
1202Sometimes it doesn't make a difference, but sometimes it does.
1203For example, compare:
1204
ac9dac7f 1205 $good[0] = `some program that outputs several lines`;
68dc0745 1206
1207with
1208
ac9dac7f 1209 @bad[0] = `same program that outputs several lines`;
68dc0745 1210
197aec24 1211The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1212matters.
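
Since the output of an external program varies from system to system, the following sketch uses a stand-in subroutine, C<output>, that behaves like backticks: it returns one string in scalar context and a list of lines in list context. The subroutine is invented for this example.

```perl
use strict;
use warnings;

# Stand-in for backticks: one string in scalar context, a list of lines
# in list context.
sub output {
    my @lines = ("first\n", "second\n");
    return wantarray ? @lines : join '', @lines;
}

my (@good, @bad);
$good[0] = output();       # scalar assignment: the whole output
{
    no warnings 'syntax';  # silence "better written as $bad[0]"
    @bad[0] = output();    # slice assignment: list context, first line only
}

print "good: $good[0]";
print "bad:  $bad[0]";
```

The slice assignment puts the right-hand side in list context, so only the first line lands in C<$bad[0]> and the rest is silently dropped.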
68dc0745 1213
d92eb7b0 1214=head2 How can I remove duplicate elements from a list or array?
68dc0745 1215
6670e5e7 1216(contributed by brian d foy)
68dc0745 1217
6670e5e7
RGS
1218Use a hash. When you think the words "unique" or "duplicated", think
1219"hash keys".
68dc0745 1220
6670e5e7
RGS
1221If you don't care about the order of the elements, you could just
1222create the hash then extract the keys. It's not important how you
1223create that hash: just that you use C<keys> to get the unique
1224elements.
551e1d92 1225
ac9dac7f
RGS
1226 my %hash = map { $_, 1 } @array;
1227 # or a hash slice: @hash{ @array } = ();
1228 # or a foreach: $hash{$_} = 1 foreach ( @array );
1229
1230 my @unique = keys %hash;
68dc0745 1231
ac9dac7f
RGS
1232If you want to use a module, try the C<uniq> function from
1233C<List::MoreUtils>. In list context it returns the unique elements,
1234preserving their order in the list. In scalar context, it returns the
1235number of unique elements.
1236
1237 use List::MoreUtils qw(uniq);
1238
1239 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1240 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1241
6670e5e7
RGS
1242You can also go through each element and skip the ones you've seen
1243before. Use a hash to keep track. The first time the loop sees an
1244element, that element has no key in C<%seen>. The C<next> statement
1245creates the key and immediately uses its value, which is C<undef>, so
1246the loop continues to the C<push> and increments the value for that
1247key. The next time the loop sees that same element, its key exists in
1248the hash I<and> the value for that key is true (since it's not 0 or
ac9dac7f
RGS
1249C<undef>), so the C<next> skips that iteration and the loop goes to the
1250next element.
551e1d92 1251
6670e5e7
RGS
1252 my @unique = ();
1253 my %seen = ();
68dc0745 1254
6670e5e7
RGS
1255 foreach my $elem ( @array )
1256 {
1257 next if $seen{ $elem }++;
1258 push @unique, $elem;
1259 }
68dc0745 1260
6670e5e7
RGS
1261You can write this more briefly using a grep, which does the
1262same thing.
68dc0745 1263
ac9dac7f
RGS
1264 my %seen = ();
1265 my @unique = grep { ! $seen{ $_ }++ } @array;
65acb1b1 1266
ddbc1f16 1267=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1268
9e72e4c6
RGS
1269(portions of this answer contributed by Anno Siegel)
1270
5a964f20
TC
1271Hearing the word "in" is an I<in>dication that you probably should have
1272used a hash, not a list or array, to store your data. Hashes are
1273designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1274
5a964f20
TC
1275That being said, there are several ways to approach this. If you
1276are going to make this query many times over arbitrary string values,
881bdbd4
JH
1277the fastest way is probably to invert the original array and maintain a
1278hash whose keys are the first array's values.
68dc0745 1279
ac9dac7f
RGS
1280 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1281 %is_blue = ();
1282 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1283
ac9dac7f
RGS
1284Now you can check whether C<$is_blue{$some_color}>. It might have
1285been a good idea to keep the blues all in a hash in the first place.
68dc0745 1286
1287If the values are all small integers, you could use a simple indexed
1288array. This kind of an array will take up less space:
1289
ac9dac7f
RGS
1290 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
1291 @is_tiny_prime = ();
1292 for (@primes) { $is_tiny_prime[$_] = 1 }
1293 # or simply @is_tiny_prime[@primes] = (1) x @primes;
68dc0745 1294
1295Now you check whether $is_tiny_prime[$some_number].
1296
1297If the values in question are integers instead of strings, you can save
1298quite a lot of space by using bit strings instead:
1299
ac9dac7f
RGS
1300 @articles = ( 1..10, 150..2000, 2017 );
1301 undef $read;
1302 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1303
1304Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1305
9e72e4c6
RGS
1306These methods guarantee fast individual tests but require a re-organization
1307of the original list or array. They only pay off if you have to test
1308multiple values against the same array.
68dc0745 1309
ac9dac7f 1310If you are testing only once, the standard module C<List::Util> exports
9e72e4c6
RGS
1311the function C<first> for this purpose. It works by stopping once it
1312finds the element. It's written in C for speed, and its Perl equivalent
1313looks like this subroutine:
68dc0745 1314
9e72e4c6
RGS
1315 sub first (&@) {
1316 my $code = shift;
1317 foreach (@_) {
1318 return $_ if &{$code}();
1319 }
1320 undef;
1321 }
68dc0745 1322
9e72e4c6
RGS
1323If speed is of little concern, the common idiom uses grep in scalar context
1324(which returns the number of items that passed its condition) to traverse the
1325entire list. This does have the benefit of telling you how many matches it
1326found, though.
68dc0745 1327
9e72e4c6 1328 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1329
9e72e4c6
RGS
1330If you want to actually extract the matching elements, simply use grep in
1331list context.
68dc0745 1332
9e72e4c6 1333 my @matches = grep $_ eq $whatever, @array;
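
Putting the one-off test and the repeated test side by side might look like this. It is a sketch using the list of blues from above; note that C<first> returns the matching element itself, so check the result with C<defined> if a false value such as C<0> could be a legitimate element.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw(azure cerulean teal turquoise lapis-lazuli);

# One-off test: first() stops at the first match.
my $found = first { $_ eq 'teal' } @blues;

# Repeated tests: build the lookup hash once, then each test is a single
# hash access.
my %is_blue = map { $_ => 1 } @blues;

print "teal is blue\n"      if defined $found;
print "azure is blue\n"     if $is_blue{azure};
print "crimson is not\n"    unless $is_blue{crimson};
```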
58103a2e 1334
68dc0745 1335=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1336
ac9dac7f
RGS
1337Use a hash. Here's code to do both and more. It assumes that each
1338element is unique in a given array:
68dc0745 1339
ac9dac7f
RGS
1340 @union = @intersection = @difference = ();
1341 %count = ();
1342 foreach $element (@array1, @array2) { $count{$element}++ }
1343 foreach $element (keys %count) {
1344 push @union, $element;
1345 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1346 }
68dc0745 1347
ac9dac7f
RGS
1348Note that this is the I<symmetric difference>, that is, all elements
1349in either A or in B but not in both. Think of it as an xor operation.
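
Here is the same technique run on two small arrays. The sample data is made up, and the keys are sorted only so that the output comes out in a predictable order.

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5, 6);

my (%count, @union, @intersection, @difference);
$count{$_}++ for @array1, @array2;

for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union:        @union\n";          # 1 2 3 4 5 6
print "intersection: @intersection\n";   # 3 4
print "difference:   @difference\n";     # 1 2 5 6
```

If an element can appear more than once within a single array, remove the duplicates from each array first; otherwise the counts no longer tell you which array an element came from.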
d92eb7b0 1350
65acb1b1
TC
1351=head2 How do I test whether two arrays or hashes are equal?
1352
ac9dac7f
RGS
1353The following code works for single-level arrays. It uses a
1354stringwise comparison, and does not distinguish defined versus
1355undefined empty strings. Modify if you have other needs.
65acb1b1 1356
ac9dac7f 1357 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1358
ac9dac7f
RGS
1359 sub compare_arrays {
1360 my ($first, $second) = @_;
1361 no warnings; # silence spurious -w undef complaints
1362 return 0 unless @$first == @$second;
1363 for (my $i = 0; $i < @$first; $i++) {
1364 return 0 if $first->[$i] ne $second->[$i];
1365 }
1366 return 1;
1367 }
65acb1b1
TC
1368
1369For multilevel structures, you may wish to use an approach more
ac9dac7f 1370like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1371
ac9dac7f
RGS
1372 use FreezeThaw qw(cmpStr);
1373 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1374
ac9dac7f
RGS
1375 printf "a and b contain %s arrays\n",
1376 cmpStr(\@a, \@b) == 0
1377 ? "the same"
1378 : "different";
65acb1b1 1379
ac9dac7f
RGS
1380This approach also works for comparing hashes. Here we'll demonstrate
1381two different answers:
65acb1b1 1382
ac9dac7f 1383 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1384
ac9dac7f
RGS
1385 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1386 $a{EXTRA} = \%b;
1387 $b{EXTRA} = \%a;
65acb1b1 1388
ac9dac7f 1389 printf "a and b contain %s hashes\n",
65acb1b1
TC
1390 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1391
ac9dac7f 1392 printf "a and b contain %s hashes\n",
65acb1b1
TC
1393 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1394
1395
1396The first reports that both hashes contain the same data,
1397while the second reports that they do not. Which you prefer is left as
1398an exercise to the reader.
1399
68dc0745 1400=head2 How do I find the first array element for which a condition is true?
1401
49d635f9 1402To find the first array element which satisfies a condition, you can
ac9dac7f
RGS
1403use the C<first()> function in the C<List::Util> module, which comes
1404with Perl 5.8. This example finds the first element that contains
1405"Perl".
49d635f9
RGS
1406
1407 use List::Util qw(first);
197aec24 1408
49d635f9 1409 my $element = first { /Perl/ } @array;
197aec24 1410
ac9dac7f 1411If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9
RGS
1412same thing. Once you find the element, you stop the loop with last.
1413
1414 my $found;
ac9dac7f 1415 foreach ( @array ) {
6670e5e7 1416 if( /Perl/ ) { $found = $_; last }
49d635f9
RGS
1417 }
1418
1419If you want the array index, you can iterate through the indices
1420and check the array element at each index until you find one
1421that satisfies the condition.
1422
197aec24 1423 my( $found, $index ) = ( undef, -1 );
ac9dac7f
RGS
1424 for( my $i = 0; $i < @array; $i++ ) {
1425 if( $array[$i] =~ /Perl/ ) {
6670e5e7
RGS
1426 $found = $array[$i];
1427 $index = $i;
1428 last;
1429 }
1430 }
68dc0745 1431
1432=head2 How do I handle linked lists?
1433
1434In general, you usually don't need a linked list in Perl, since with
ac9dac7f
RGS
1435regular arrays, you can push and pop or shift and unshift at either
1436end, or you can use splice to add and/or remove an arbitrary number of
1437elements at arbitrary points. Both pop and shift are O(1)
1438operations on Perl's dynamic arrays. In the absence of shifts and
1439pops, push in general needs to reallocate on the order of every log(N)
1440times, and unshift will need to copy pointers each time.
68dc0745 1441
1442If you really, really wanted, you could use structures as described in
ac9dac7f
RGS
1443L<perldsc> or L<perltoot> and do just what the algorithm book tells
1444you to do. For example, imagine a list node like this:
65acb1b1 1445
ac9dac7f
RGS
1446 $node = {
1447 VALUE => 42,
1448 LINK => undef,
1449 };
65acb1b1
TC
1450
1451You could walk the list this way:
1452
ac9dac7f
RGS
1453 print "List: ";
1454 for ($node = $head; $node; $node = $node->{LINK}) {
1455 print $node->{VALUE}, " ";
1456 }
1457 print "\n";
65acb1b1 1458
a6dd486b 1459You could add to the list this way:
65acb1b1 1460
ac9dac7f
RGS
1461 my ($head, $tail);
1462 $tail = append($head, 1); # grow a new head
1463 for $value ( 2 .. 10 ) {
1464 $tail = append($tail, $value);
1465 }
65acb1b1 1466
ac9dac7f
RGS
1467 sub append {
1468 my($list, $value) = @_;
1469 my $node = { VALUE => $value };
1470 if ($list) {
1471 $node->{LINK} = $list->{LINK};
1472 $list->{LINK} = $node;
1473 }
1474 else {
1475 $_[0] = $node; # replace caller's version
1476 }
1477 return $node;
1478 }
65acb1b1
TC
1479
1480But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1481
1482=head2 How do I handle circular lists?
1483
1484Circular lists could be handled in the traditional fashion with linked
1485lists, or you could just do something like this with an array:
1486
ac9dac7f
RGS
1487 unshift(@array, pop(@array)); # the last shall be first
1488 push(@array, shift(@array)); # and vice versa
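
A short sketch shows what the rotation does to a small array. It also shows another option that this answer does not otherwise cover: leaving the array alone and wrapping the index with the modulus operator.

```perl
use strict;
use warnings;

my @array = (1 .. 5);

push @array, shift @array;      # rotate left:  2 3 4 5 1
my @left = @array;

unshift @array, pop @array;     # rotate right: back to 1 2 3 4 5
my @right = @array;

# Alternatively, keep the array as it is and wrap the index instead.
my $i = 7;
my $wrapped = $array[ $i % @array ];   # index 7 wraps around to index 2

print "@left\n@right\n$wrapped\n";
```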
1489
1490You can also use C<Tie::Cycle>:
1491
1492 use Tie::Cycle;
1493
1494 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1495
1496 print $cycle; # FFFFFF
1497 print $cycle; # 000000
1498 print $cycle; # FFFF00
68dc0745 1499
1500=head2 How do I shuffle an array randomly?
1501
45bbf655
JH
1502If you have Perl 5.8.0 or later installed, or if you have
1503Scalar-List-Utils 1.03 or later installed, you can say:
1504
ac9dac7f 1505 use List::Util 'shuffle';
45bbf655
JH
1506
1507 @shuffled = shuffle(@list);
1508
f05bbc40 1509If not, you can use a Fisher-Yates shuffle.
5a964f20 1510
ac9dac7f
RGS
1511 sub fisher_yates_shuffle {
1512 my $deck = shift; # $deck is a reference to an array
1513 my $i = @$deck or return; # an empty array has nothing to shuffle
1514 while (--$i) {
1515 my $j = int rand ($i+1);
1516 @$deck[$i,$j] = @$deck[$j,$i];
1517 }
1518 }
5a964f20 1519
ac9dac7f
RGS
1520 # shuffle my mpeg collection
1521 #
1522 my @mpeg = <audio/*/*.mp3>;
1523 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1524 print @mpeg;
5a964f20 1525
45bbf655 1526Note that the above implementation shuffles an array in place,
ac9dac7f 1527unlike the C<List::Util::shuffle()> which takes a list and returns
45bbf655
JH
1528a new shuffled list.
1529
d92eb7b0 1530You've probably seen shuffling algorithms that work using splice,
a6dd486b 1531randomly picking another element to swap the current element with:
68dc0745 1532
ac9dac7f
RGS
1533 srand;
1534 @new = ();
1535 @old = 1 .. 10; # just a demo
1536 while (@old) {
1537 push(@new, splice(@old, rand @old, 1));
1538 }
68dc0745 1539
ac9dac7f
RGS
1540This is bad because splice is already O(N), and since you do it N
1541times, you just invented a quadratic algorithm; that is, O(N**2).
1542This does not scale, although Perl is so efficient that you probably
1543won't notice this until you have rather largish arrays.
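
Whichever shuffle you use, a cheap sanity check is that it only reorders the elements and never loses or duplicates any. A sketch with C<List::Util::shuffle>:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @list     = (1 .. 10);
my @shuffled = shuffle(@list);

# A shuffle only reorders: sorting the result gives back the original list.
my @sorted = sort { $a <=> $b } @shuffled;
print "same elements\n" if "@sorted" eq "@list";
print "@shuffled\n";
```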
68dc0745 1544
1545=head2 How do I process/modify each element of an array?
1546
1547Use C<for>/C<foreach>:
1548
ac9dac7f 1549 for (@lines) {
6670e5e7
RGS
1550 s/foo/bar/; # change that word
1551 tr/XZ/ZX/; # swap those letters
ac9dac7f 1552 }
68dc0745 1553
1554Here's another; let's compute spherical volumes:
1555
ac9dac7f 1556 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7
RGS
1557 $_ **= 3;
1558 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1559 }
197aec24 1560
ac9dac7f 1561which can also be done with C<map()> which is made to transform
49d635f9
RGS
1562one list into another:
1563
1564 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1565
76817d6d
JH
1566If you want to do the same thing to modify the values of the
1567hash, you can use the C<values> function. As of Perl 5.6
1568the values are not copied, so if you modify $orbit (in this
1569case), you modify the value.
5a964f20 1570
ac9dac7f 1571 for $orbit ( values %orbits ) {
6670e5e7 1572 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1573 }
818c4caa 1574
76817d6d
JH
1575Prior to perl 5.6 C<values> returned copies of the values,
1576so older perl code often contains constructions such as
1577C<@orbits{keys %orbits}> instead of C<values %orbits> where
1578the hash is to be modified.
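
A short sketch of that older hash-slice idiom, with made-up orbit values.
It works because C<foreach> aliases the elements of the slice rather than
copying them:

```perl
use strict;
use warnings;

my %orbits = ( inner => 1, middle => 2, outer => 3 );   # hypothetical radii

# A hash slice yields the real elements, so foreach aliases them
# and the assignment below changes the hash itself.
for my $orbit ( @orbits{ keys %orbits } ) {
    $orbit **= 3;
}

print "$_ => $orbits{$_}\n" for sort keys %orbits;
```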
818c4caa 1579
68dc0745 1580=head2 How do I select a random element from an array?
1581
ac9dac7f 1582Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1583
ac9dac7f
RGS
1584 $index = rand @array;
1585 $element = $array[$index];
68dc0745 1586
793f5136 1587Or, simply:
ac9dac7f
RGS
1588
1589 my $element = $array[ rand @array ];
5a964f20 1590
68dc0745 1591=head2 How do I permute N elements of a list?
1592
ac9dac7f
RGS
1593Use the C<List::Permutor> module on CPAN. If the list is actually an
1594array, try the C<Algorithm::Permute> module (also on CPAN). It's
1595written in XS code and is very efficient.
49d635f9
RGS
1596
1597 use Algorithm::Permute;
1598 my @array = 'a'..'d';
1599 my $p_iterator = Algorithm::Permute->new ( \@array );
1600 while (my @perm = $p_iterator->next) {
1601 print "next permutation: (@perm)\n";
ac9dac7f 1602 }
49d635f9 1603
197aec24
RGS
1604For even faster execution, you could do:
1605
ac9dac7f
RGS
1606 use Algorithm::Permute;
1607 my @array = 'a'..'d';
1608 Algorithm::Permute::permute {
1609 print "next permutation: (@array)\n";
1610 } @array;
197aec24 1611
49d635f9
RGS
1612Here's a little program that generates all permutations of
1613all the words on each line of input. The algorithm embodied
ac9dac7f 1614in the C<permute()> function is discussed in Volume 4 (still
49d635f9
RGS
1615unpublished) of Knuth's I<The Art of Computer Programming>
1616and will work on any list:
1617
1618 #!/usr/bin/perl -n
1619 # Fischer-Krause ordered permutation generator
1620
1621 sub permute (&@) {
1622 my $code = shift;
1623 my @idx = 0..$#_;
1624 while ( $code->(@_[@idx]) ) {
1625 my $p = $#idx;
1626 --$p while $idx[$p-1] > $idx[$p];
1627 my $q = $p or return;
1628 push @idx, reverse splice @idx, $p;
1629 ++$q while $idx[$p-1] > $idx[$q];
1630 @idx[$p-1,$q]=@idx[$q,$p-1];
1631 }
68dc0745 1632 }
68dc0745 1633
49d635f9 1634 permute {print"@_\n"} split;
b8d2732a 1635
68dc0745 1636=head2 How do I sort an array by (anything)?
1637
1638Supply a comparison function to sort() (described in L<perlfunc/sort>):
1639
ac9dac7f 1640 @list = sort { $a <=> $b } @list;
68dc0745 1641
1642The default sort function is cmp, string comparison, which would
c47ff5f1 1643sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
68dc0745 1644the numerical comparison operator.
1645
1646If you have a complicated function needed to pull out the part you
1647want to sort on, then don't do it inside the sort function. Pull it
1648out first, because the sort BLOCK can be called many times for the
1649same element. Here's an example of how to pull out the first word
1650after the first number on each item, and then sort those words
1651case-insensitively.
1652
ac9dac7f
RGS
1653 @idx = ();
1654 for (@data) {
1655 ($item) = /\d+\s*(\S+)/;
1656 push @idx, uc($item);
1657 }
1658 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
68dc0745 1659
a6dd486b 1660which could also be written this way, using a trick
68dc0745 1661that's come to be known as the Schwartzian Transform:
1662
ac9dac7f
RGS
1663 @sorted = map { $_->[0] }
1664 sort { $a->[1] cmp $b->[1] }
1665 map { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
68dc0745 1666
1667If you need to sort on several fields, the following paradigm is useful.
1668
ac9dac7f
RGS
1669 @sorted = sort {
1670 field1($a) <=> field1($b) ||
1671 field2($a) cmp field2($b) ||
1672 field3($a) cmp field3($b)
1673 } @data;
68dc0745 1674
1675This can be conveniently combined with precalculation of keys as given
1676above.
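
As a sketch of that combination, the following precalculates three sort keys
per record and then compares them field by field. The C<name:age:city>
record layout and the data are assumptions made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age:city"
my @data = ( "carol:30:Oslo", "alice:25:Rome", "bob:30:Bern", "dave:25:Rome" );

my @sorted =
    map  { $_->[0] }                       # 3. recover the original record
    sort {
        $a->[1] <=> $b->[1]  ||            # 2a. age, numerically
        $a->[2] cmp $b->[2]  ||            # 2b. then city
        $a->[3] cmp $b->[3]                # 2c. then name
    }
    map  { my ($name, $age, $city) = split /:/;
           [ $_, $age, $city, $name ] }    # 1. extract each key only once
    @data;

print "$_\n" for @sorted;
```

Each record is split exactly once, however many times the sort block
compares it.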
1677
379e39d7 1678See the F<sort> article in the "Far More Than You Ever Wanted
49d635f9 1679To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
06a5f41f 1680more about this approach.
68dc0745 1681
ac9dac7f 1682See also the question later in L<perlfaq4> on sorting hashes.
68dc0745 1683
1684=head2 How do I manipulate arrays of bits?
1685
ac9dac7f
RGS
1686Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
1687operations.
1688
1689For example, this sets bit N of C<$vec> for each integer N listed in
1690C<@ints>:
1691
1692 $vec = '';
1693 foreach(@ints) { vec($vec,$_,1) = 1 }
1694
1695Here's how, given a vector in C<$vec>, you can get those bits into your
1696C<@ints> array:
1697
1698 sub bitvec_to_list {
1699 my $vec = shift;
1700 my @ints;
1701 # Find null-byte density then select best algorithm
1702 if ($vec =~ tr/\0// / length $vec > 0.95) {
1703 use integer;
1704 my $i;
1705
1706 # This method is faster with mostly null-bytes
1707 while($vec =~ /[^\0]/g ) {
1708 $i = -9 + 8 * pos $vec;
1709 push @ints, $i if vec($vec, ++$i, 1);
1710 push @ints, $i if vec($vec, ++$i, 1);
1711 push @ints, $i if vec($vec, ++$i, 1);
1712 push @ints, $i if vec($vec, ++$i, 1);
1713 push @ints, $i if vec($vec, ++$i, 1);
1714 push @ints, $i if vec($vec, ++$i, 1);
1715 push @ints, $i if vec($vec, ++$i, 1);
1716 push @ints, $i if vec($vec, ++$i, 1);
1717 }
1718 }
1719 else {
1720 # This method is a fast general algorithm
1721 use integer;
1722 my $bits = unpack "b*", $vec;
1723 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1724 push @ints, pos $bits while($bits =~ /1/g);
1725 }
1726
1727 return \@ints;
1728 }
68dc0745 1729
1730This method gets faster the more sparse the bit vector is.
1731(Courtesy of Tim Bunce and Winfried Koenig.)
1732
76817d6d
JH
1733You can make the while loop a lot shorter with this suggestion
1734from Benjamin Goldberg:
1735
1736 while($vec =~ /[^\0]+/g ) {
ac9dac7f
RGS
1737 push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
1738 }
76817d6d 1739
ac9dac7f 1740Or use the CPAN module C<Bit::Vector>:
cc30d1a7 1741
ac9dac7f
RGS
1742 $vector = Bit::Vector->new($num_of_bits);
1743 $vector->Index_List_Store(@ints);
1744 @ints = $vector->Index_List_Read();
cc30d1a7 1745
ac9dac7f
RGS
1746C<Bit::Vector> provides efficient methods for bit vectors, sets of
1747small integers, and "big int" math.
cc30d1a7
JH
1748
1749Here's a more extensive illustration using vec():
65acb1b1 1750
ac9dac7f
RGS
1751 # vec demo
1752 $vector = "\xff\x0f\xef\xfe";
1753 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
65acb1b1 1754 unpack("N", $vector), "\n";
ac9dac7f
RGS
1755 $is_set = vec($vector, 23, 1);
1756 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
65acb1b1 1757 pvec($vector);
65acb1b1 1758
ac9dac7f
RGS
1759 set_vec(1,1,1);
1760 set_vec(3,1,1);
1761 set_vec(23,1,1);
1762
1763 set_vec(3,1,3);
1764 set_vec(3,2,3);
1765 set_vec(3,4,3);
1766 set_vec(3,4,7);
1767 set_vec(3,8,3);
1768 set_vec(3,8,7);
1769
1770 set_vec(0,32,17);
1771 set_vec(1,32,17);
1772
1773 sub set_vec {
1774 my ($offset, $width, $value) = @_;
1775 my $vector = '';
1776 vec($vector, $offset, $width) = $value;
1777 print "offset=$offset width=$width value=$value\n";
1778 pvec($vector);
1779 }
65acb1b1 1780
ac9dac7f
RGS
1781 sub pvec {
1782 my $vector = shift;
1783 my $bits = unpack("b*", $vector);
1784 my $i = 0;
1785 my $BASE = 8;
1786
1787 print "vector length in bytes: ", length($vector), "\n";
1788 @bytes = unpack("A8" x length($vector), $bits);
1789 print "bits are: @bytes\n\n";
1790 }
65acb1b1 1791
68dc0745 1792=head2 Why does defined() return true on empty arrays and hashes?
1793
65acb1b1
TC
1794The short story is that you should probably only use defined on scalars or
1795functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1796in the 5.004 release or later of Perl for more detail.
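
What you usually want to know is whether the aggregate has any elements.
A minimal sketch of that test, using throwaway variables:

```perl
use strict;
use warnings;

my @array;
my %hash;

# An aggregate in boolean context is true only when it has elements.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, "x";
$hash{key} = "value";

print "array has ", scalar(@array), " element(s)\n" if @array;
print "hash has ", scalar(keys %hash), " key(s)\n"  if %hash;
```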
68dc0745 1797
1798=head1 Data: Hashes (Associative Arrays)
1799
1800=head2 How do I process an entire hash?
1801
1802Use the each() function (see L<perlfunc/each>) if you don't care
1803whether it's sorted:
1804
ac9dac7f
RGS
1805 while ( ($key, $value) = each %hash) {
1806 print "$key = $value\n";
1807 }
68dc0745 1808
1809If you want it sorted, you'll have to use foreach() on the result of
1810sorting the keys as shown in an earlier question.
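
A minimal sketch of the sorted traversal, with made-up data:

```perl
use strict;
use warnings;

my %hash = ( pear => 3, apple => 7, fig => 1 );   # demo data

# sort() with no block orders the keys by string comparison
foreach my $key ( sort keys %hash ) {
    print "$key = $hash{$key}\n";
}
```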
1811
1812=head2 What happens if I add or remove keys from a hash while iterating over it?
1813
28b41a80 1814(contributed by brian d foy)
d92eb7b0 1815
28b41a80 1816The easy answer is "Don't do that!"
d92eb7b0 1817
28b41a80
RGS
1818If you iterate through the hash with each(), you can delete the key
1819most recently returned without worrying about it. If you delete or add
1820other keys, the iterator may skip or double up on them since perl
1821may rearrange the hash table. See the
1822entry for C<each()> in L<perlfunc>.
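
The one safe case described above can be sketched like this. The data and
the "drop negative values" rule are only for illustration:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => -2, c => 3, d => -4 );   # demo data

while ( my ($key, $value) = each %hash ) {
    # Deleting the key that each() just returned is the safe modification.
    delete $hash{$key} if $value < 0;
}

print join(", ", map { "$_ = $hash{$_}" } sort keys %hash), "\n";
```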
68dc0745 1823
1824=head2 How do I look up a hash element by value?
1825
1826Create a reverse hash:
1827
ac9dac7f
RGS
1828 %by_value = reverse %by_key;
1829 $key = $by_value{$value};
68dc0745 1830
1831That's not particularly efficient. It would be more space-efficient
1832to use:
1833
ac9dac7f
RGS
1834 while (($key, $value) = each %by_key) {
1835 $by_value{$value} = $key;
1836 }
68dc0745 1837
d92eb7b0
GS
1838If your hash could have repeated values, the methods above will only find
1839one of the associated keys. This may or may not worry you. If it does
1840worry you, you can always reverse the hash into a hash of arrays instead:
1841
ac9dac7f
RGS
1842 while (($key, $value) = each %by_key) {
1843 push @{$key_list_by_value{$value}}, $key;
1844 }
68dc0745 1845
1846=head2 How can I know how many entries are in a hash?
1847
1848If you mean how many keys, then all you have to do is
875e5c2f 1849use the keys() function in a scalar context:
68dc0745 1850
875e5c2f 1851 $num_keys = keys %hash;
68dc0745 1852
197aec24
RGS
1853The keys() function also resets the iterator, which means that you may
1854see strange results if you use this between uses of other hash operators
875e5c2f 1855such as each().
68dc0745 1856
1857=head2 How do I sort a hash (optionally by value instead of key)?
1858
a05e4845
RGS
1859(contributed by brian d foy)
1860
1861To sort a hash, start with the keys. In this example, we give the list of
1862keys to the sort function which then compares them ASCIIbetically (which
1863might be affected by your locale settings). The output list has the keys
1864in ASCIIbetical order. Once we have the keys, we can go through them to
1865create a report which lists the keys in ASCIIbetical order.
1866
1867 my @keys = sort { $a cmp $b } keys %hash;
58103a2e 1868
a05e4845
RGS
1869 foreach my $key ( @keys )
1870 {
1871 printf "%-20s %6d\n", $key, $hash{$key};
1872 }
1873
58103a2e 1874We could get more fancy in the C<sort()> block though. Instead of
a05e4845 1875comparing the keys, we can compute a value with them and use that
58103a2e 1876value as the comparison.
a05e4845
RGS
1877
1878For instance, to make our report order case-insensitive, we use
58103a2e 1879the C<\L> sequence in a double-quoted string to make everything
a05e4845
RGS
1880lowercase. The C<sort()> block then compares the lowercased
1881values to determine in which order to put the keys.
1882
1883 my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;
58103a2e 1884
a05e4845 1885Note: if the computation is expensive or the hash has many elements,
58103a2e 1886you may want to look at the Schwartzian Transform to cache the
a05e4845
RGS
1887computation results.
1888
1889If we want to sort by the hash value instead, we use the hash key
1890to look it up. We still get out a list of keys, but this time they
1891are ordered by their value.
1892
1893 my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
1894
1895From there we can get more complex. If the hash values are the same,
1896we can provide a secondary sort on the hash key.
1897
58103a2e
RGS
1898 my @keys = sort {
1899 $hash{$a} <=> $hash{$b}
a05e4845
RGS
1900 or
1901 "\L$a" cmp "\L$b"
1902 } keys %hash;
68dc0745 1903
1904=head2 How can I always keep my hash sorted?
ac9dac7f 1905X<hash tie sort DB_File Tie::IxHash>
68dc0745 1906
ac9dac7f
RGS
1907You can look into using the C<DB_File> module and C<tie()> using the
1908C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
1909Databases">. The C<Tie::IxHash> module from CPAN might also be
1910instructive. Although this does keep your hash sorted, you might not
1911like the slowdown you suffer from the tie interface. Are you sure you
1912need to do this? :)
68dc0745 1913
1914=head2 What's the difference between "delete" and "undef" with hashes?
1915
92993692
JH
1916Hashes contain pairs of scalars: the first is the key, the
1917second is the value. The key will be coerced to a string,
1918although the value can be any kind of scalar: string,
ac9dac7f 1919number, or reference. If a key C<$key> is present in
92993692
JH
1920%hash, C<exists($hash{$key})> will return true. The value
1921for a given key can be C<undef>, in which case
1922C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
1923will return true. This corresponds to (C<$key>, C<undef>)
1924being in the hash.
68dc0745 1925
ac9dac7f 1926Pictures help... here's the C<%hash> table:
68dc0745 1927
1928 keys values
1929 +------+------+
1930 | a | 3 |
1931 | x | 7 |
1932 | d | 0 |
1933 | e | 2 |
1934 +------+------+
1935
1936And these conditions hold
1937
92993692
JH
1938 $hash{'a'} is true
1939 $hash{'d'} is false
1940 defined $hash{'d'} is true
1941 defined $hash{'a'} is true
1942 exists $hash{'a'} is true (Perl5 only)
1943 grep ($_ eq 'a', keys %hash) is true
68dc0745 1944
1945If you now say
1946
92993692 1947 undef $hash{'a'}
68dc0745 1948
1949your table now reads:
1950
1951
1952 keys values
1953 +------+------+
1954 | a | undef|
1955 | x | 7 |
1956 | d | 0 |
1957 | e | 2 |
1958 +------+------+
1959
1960and these conditions now hold; changes in caps:
1961
92993692
JH
1962 $hash{'a'} is FALSE
1963 $hash{'d'} is false
1964 defined $hash{'d'} is true
1965 defined $hash{'a'} is FALSE
1966 exists $hash{'a'} is true (Perl5 only)
1967 grep ($_ eq 'a', keys %hash) is true
68dc0745 1968
1969Notice the last two: you have an undef value, but a defined key!
1970
1971Now, consider this:
1972
92993692 1973 delete $hash{'a'}
68dc0745 1974
1975your table now reads:
1976
1977 keys values
1978 +------+------+
1979 | x | 7 |
1980 | d | 0 |
1981 | e | 2 |
1982 +------+------+
1983
1984and these conditions now hold; changes in caps:
1985
92993692
JH
1986 $hash{'a'} is false
1987 $hash{'d'} is false
1988 defined $hash{'d'} is true
1989 defined $hash{'a'} is false
1990 exists $hash{'a'} is FALSE (Perl5 only)
1991 grep ($_ eq 'a', keys %hash) is FALSE
68dc0745 1992
1993See, the whole entry is gone!
1994
1995=head2 Why don't my tied hashes make the defined/exists distinction?
1996
92993692
JH
1997This depends on the tied hash's implementation of EXISTS().
1998For example, there isn't the concept of undef with hashes
1999that are tied to DBM* files. It also means that exists() and
2000defined() do the same thing with a DBM* file, and what they
2001end up doing is not what they do with ordinary hashes.
68dc0745 2002
2003=head2 How do I reset an each() operation part-way through?
2004
5a964f20 2005Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 2006the hash I<and> resets the iterator associated with the hash. You may
ac9dac7f
RGS
2007need to do this if you use C<last> to exit a loop early so that when
2008you re-enter it, the hash iterator has been reset.
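
A small sketch of that situation, with demo data. The first loop exits
early, C<keys> resets the iterator, and the second loop sees every pair
again:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );   # demo data

while ( my ($key, $value) = each %hash ) {
    last;    # bail out after the first pair; the iterator stays part-way
}

my $num_keys = keys %hash;   # scalar context: counts keys and resets the iterator

my $visited = 0;
while ( my ($key, $value) = each %hash ) {
    $visited++;
}
print "visited $visited of $num_keys pairs\n";
```

Without the call to C<keys>, the second loop would resume where the first
one stopped.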
68dc0745 2009
2010=head2 How can I get the unique keys from two hashes?
2011
d92eb7b0
GS
2012First you extract the keys from the hashes into lists, then solve
2013the "removing duplicates" problem described above. For example:
68dc0745 2014
ac9dac7f
RGS
2015 %seen = ();
2016 for $element (keys(%foo), keys(%bar)) {
2017 $seen{$element}++;
2018 }
2019 @uniq = keys %seen;
68dc0745 2020
2021Or more succinctly:
2022
ac9dac7f 2023 @uniq = keys %{{%foo,%bar}};
68dc0745 2024
2025Or if you really want to save space:
2026
ac9dac7f
RGS
2027 %seen = ();
2028 while (defined ($key = each %foo)) {
2029 $seen{$key}++;
2030 }
2031 while (defined ($key = each %bar)) {
2032 $seen{$key}++;
2033 }
2034 @uniq = keys %seen;
68dc0745 2035
2036=head2 How can I store a multidimensional array in a DBM file?
2037
2038Either stringify the structure yourself (no fun), or else
2039get the MLDBM (which uses Data::Dumper) module from CPAN and layer
2040it on top of either DB_File or GDBM_File.
2041
2042=head2 How can I make my hash remember the order I put elements into it?
2043
ac9dac7f 2044Use the C<Tie::IxHash> module from CPAN.
68dc0745 2045
ac9dac7f
RGS
2046 use Tie::IxHash;
2047
2048 tie my %myhash, 'Tie::IxHash';
2049
2050 for (my $i=0; $i<20; $i++) {
2051 $myhash{$i} = 2*$i;
2052 }
2053
2054 my @keys = keys %myhash;
2055 # @keys = (0,1,2,3,...)
46fc3d4c 2056
68dc0745 2057=head2 Why does passing a subroutine an undefined element in a hash create it?
2058
2059If you say something like:
2060
ac9dac7f 2061 somefunc($hash{"nonesuch key here"});
68dc0745 2062
2063Then that element "autovivifies"; that is, it springs into existence
2064whether you store something there or not. That's because functions
2065get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
2066it has to be ready to write it back into the caller's version.
2067
87275199 2068This has been fixed as of Perl5.004.
68dc0745 2069
2070Normally, merely accessing a key's value for a nonexistent key does
2071I<not> cause that key to be forever there. This is different than
2072awk's behavior.
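
The difference can be checked with C<exists>. The sketch below also shows a
related case that is not covered above: reading through a nested level does
create the intermediate key. The key names are made up:

```perl
use strict;
use warnings;

my %hash;

my $value = $hash{nonesuch};                 # plain read of a missing key
print "top-level key created\n"    if exists $hash{nonesuch};   # not printed

my $deep = $hash{outer}{inner};              # read through a nested level
print "intermediate key created\n" if exists $hash{outer};      # printed
```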
2073
fc36a67e 2074=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 2075
65acb1b1
TC
2076Usually a hash ref, perhaps like this:
2077
ac9dac7f
RGS
2078 $record = {
2079 NAME => "Jason",
2080 EMPNO => 132,
2081 TITLE => "deputy peon",
2082 AGE => 23,
2083 SALARY => 37_000,
2084 PALS => [ "Norbert", "Rhys", "Phineas"],
2085 };
65acb1b1
TC
2086
2087References are documented in L<perlref> and L<perlreftut>.
2088Examples of complex data structures are given in L<perldsc> and
2089L<perllol>. Examples of structures and object-oriented classes are
2090in L<perltoot>.
68dc0745 2091
2092=head2 How can I use a reference as a hash key?
2093
9e72e4c6
RGS
2094(contributed by brian d foy)
2095
2096Hash keys are strings, so you can't really use a reference as the key.
2097When you try to do that, perl turns the reference into its stringified
ac9dac7f
RGS
2098form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
2099back the reference from the stringified form, at least without doing
2100some extra work on your own. Also remember that hash keys must be
2101unique, but two different variables can store the same reference (and
2102those variables can change later).
9e72e4c6 2103
ac9dac7f
RGS
2104The C<Tie::RefHash> module, which is distributed with perl, might be
2105what you want. It handles that extra work.
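
A minimal sketch of C<Tie::RefHash> in use. The fruit records and the
C<%color_of> hash are hypothetical:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $apple  = { name => 'apple'  };   # hypothetical records
my $banana = { name => 'banana' };

$color_of{$apple}  = 'red';
$color_of{$banana} = 'yellow';

# keys() hands back the original references, not their stringified forms
for my $fruit ( keys %color_of ) {
    print "$fruit->{name} is $color_of{$fruit}\n";
}
```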
68dc0745 2106
2107=head1 Data: Misc
2108
2109=head2 How do I handle binary data correctly?
2110
ac9dac7f 2111Perl is binary clean, so it can handle binary data just fine.
e573f903 2112On Windows or DOS, however, you have to use C<binmode> for binary
ac9dac7f
RGS
2113files to avoid conversions for line endings. In general, you should
2114use C<binmode> any time you want to work with binary data.
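
As a sketch, this writes a few bytes (including CR and LF) to a scratch
file and reads them back with C<binmode> set on both handles. The use of
C<File::Temp> is only a convenience for the demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $filename) = tempfile();    # scratch file for the demo
close $tmp_fh;

my $bytes = "\x00\x0d\x0a\xff";          # includes CR and LF bytes

open my $out, '>', $filename or die "Can't write $filename: $!";
binmode $out;                            # no line-ending translation on output
print {$out} $bytes;
close $out or die "Can't close $filename: $!";

open my $in, '<', $filename or die "Can't read $filename: $!";
binmode $in;                             # nor on input
my $read_back = do { local $/; <$in> };  # slurp the whole file
close $in;

print "round trip ", ( $read_back eq $bytes ? "ok" : "failed" ), "\n";
unlink $filename;
```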
68dc0745 2115
ac9dac7f 2116Also see L<perlfunc/"binmode"> or L<perlopentut>.
68dc0745 2117
ac9dac7f 2118If you're concerned about 8-bit textual data then see L<perllocale>.
54310121 2119If you want to deal with multibyte characters, however, there are
68dc0745 2120some gotchas. See the section on Regular Expressions.
2121
2122=head2 How do I determine whether a scalar is a number/whole/integer/float?
2123
2124Assuming that you don't care about IEEE notations like "NaN" or
2125"Infinity", you probably just want to use a regular expression.
2126
ac9dac7f
RGS
2127 if (/\D/) { print "has nondigits\n" }
2128 if (/^\d+$/) { print "is a whole number\n" }
2129 if (/^-?\d+$/) { print "is an integer\n" }
2130 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
2131 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
2132 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
2133 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
881bdbd4 2134 { print "a C float\n" }
68dc0745 2135
f0d19b68
RGS
2136There are also some commonly used modules for the task.
2137L<Scalar::Util> (distributed with 5.8) provides access to perl's
ac9dac7f
RGS
2138internal function C<looks_like_number> for determining whether a
2139variable looks like a number. L<Data::Types> exports functions that
2140validate data types using both the above and other regular
2141expressions. Thirdly, there is C<Regexp::Common> which has regular
2142expressions to match various types of numbers. Those three modules are
2143available from the CPAN.
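
A brief sketch of C<looks_like_number> applied to a handful of made-up
candidate strings. Exactly which edge cases it accepts (surrounding
whitespace, for instance) may depend on your version, so run it rather
than rely on a guess:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Candidate strings chosen only for illustration
for my $candidate ( "42", "-3.14", "1e5", " 7", "0x1A", "abc", "" ) {
    printf "%-8s %s\n", "'$candidate'",
        looks_like_number($candidate) ? "looks like a number" : "does not";
}
```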
f0d19b68
RGS
2144
2145If you're on a POSIX system, Perl supports the C<POSIX::strtod>
ac9dac7f
RGS
2146function. Its semantics are somewhat cumbersome, so here's a
2147C<getnum> wrapper function for more convenient access. This function
2148takes a string and returns the number it found, or C<undef> for input
2149that isn't a C float. The C<is_numeric> function is a front end to
2150C<getnum> if you just want to say, "Is this a float?"
2151
2152 sub getnum {
2153 use POSIX qw(strtod);
2154 my $str = shift;
2155 $str =~ s/^\s+//;
2156 $str =~ s/\s+$//;
2157 $! = 0;
2158 my($num, $unparsed) = strtod($str);
2159 if (($str eq '') || ($unparsed != 0) || $!) {
2160 return undef;
2161 }
2162 else {
2163 return $num;
2164 }
2165 }
5a964f20 2166
ac9dac7f 2167 sub is_numeric { defined getnum($_[0]) }
5a964f20 2168
f0d19b68 2169Or you could check out the L<String::Scanf> module on the CPAN
ac9dac7f
RGS
2170instead. The C<POSIX> module (part of the standard Perl distribution)
2171provides the C<strtod> and C<strtol> functions for converting strings to
2172doubles and longs, respectively.
68dc0745 2173
2174=head2 How do I keep persistent data across program calls?
2175
2176For some specific applications, you can use one of the DBM modules.
ac9dac7f
RGS
2177See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
2178or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
2179of the standard distribution. Here's one example using C<Storable>'s C<store>
fe854a6f 2180and C<retrieve> functions:
65acb1b1 2181
ac9dac7f
RGS
2182 use Storable;
2183 store(\%hash, "filename");
65acb1b1 2184
ac9dac7f
RGS
2185 # later on...
2186 $href = retrieve("filename"); # by ref
2187 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 2188
2189=head2 How do I print out or copy a recursive data structure?
2190
ac9dac7f
RGS
2191The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
2192for printing out data structures. The C<Storable> module on CPAN (or the
6f82c03a
EM
21935.8 release of Perl) provides a function called C<dclone> that recursively
2194copies its argument.
65acb1b1 2195
ac9dac7f
RGS
2196 use Storable qw(dclone);
2197 $r2 = dclone($r1);
68dc0745 2198
ac9dac7f 2199Where C<$r1> can be a reference to any kind of data structure you'd like.
65acb1b1
TC
2200It will be deeply copied. Because C<dclone> takes and returns references,
2201you'd have to add extra punctuation if you had a hash of arrays that
2202you wanted to copy.
68dc0745 2203
ac9dac7f 2204 %newhash = %{ dclone(\%oldhash) };
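
For the printing half of the question, here is a minimal sketch with
C<Data::Dumper> on a structure that refers to itself. The structure is
made up, and setting C<$Data::Dumper::Sortkeys> is an optional choice made
here only to get a stable key order:

```perl
use strict;
use warnings;
use Data::Dumper;

my $r1 = { name => 'loop', list => [ 1, 2, 3 ] };
$r1->{self} = $r1;                  # make the structure recursive

local $Data::Dumper::Sortkeys = 1;  # stable key order for readable output
print Dumper($r1);                  # the self-reference is shown, not followed forever
```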
68dc0745 2205
2206=head2 How do I define methods for every class/object?
2207
ac9dac7f 2208Use the C<UNIVERSAL> class (see L<UNIVERSAL>).
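
A small sketch, assuming a hypothetical method name C<describe> and a
throwaway C<Widget> class. Anything placed in C<UNIVERSAL> becomes visible
to every class in the program, so this is best used sparingly:

```perl
use strict;
use warnings;

# Hypothetical method; every class implicitly inherits from UNIVERSAL.
sub UNIVERSAL::describe {
    my $self = shift;
    return "I am a " . ( ref($self) || $self );
}

package Widget;
sub new { return bless {}, shift }

package main;

print Widget->new->describe, "\n";   # called on an object
print Widget->describe, "\n";        # called on the class name
```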
68dc0745 2209
2210=head2 How do I verify a credit card checksum?
2211
ac9dac7f 2212Get the C<Business::CreditCard> module from CPAN.
68dc0745 2213
65acb1b1
TC
2214=head2 How do I pack arrays of doubles or floats for XS code?
2215
ac9dac7f 2216The kgbpack.c code in the C<PGPLOT> module on CPAN does just this.
65acb1b1 2217If you're doing a lot of float or double processing, consider using
ac9dac7f 2218the C<PDL> module from CPAN instead--it makes number-crunching easy.
65acb1b1 2219
500071f4
RGS
2220=head1 REVISION
2221
322be77c 2222Revision: $Revision: 7996 $
500071f4 2223
322be77c 2224Date: $Date: 2006-11-01 09:24:38 +0100 (mer, 01 nov 2006) $
500071f4
RGS
2225
2226See L<perlfaq> for source control details and availability.
2227
68dc0745 2228=head1 AUTHOR AND COPYRIGHT
2229
58103a2e 2230Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
7678cced 2231other authors as noted. All rights reserved.
5a964f20 2232
5a7beb56
JH
2233This documentation is free; you can redistribute it and/or modify it
2234under the same terms as Perl itself.
5a964f20
TC
2235
2236Irrespective of its distribution, all code examples in this file
2237are hereby placed into the public domain. You are permitted and
2238encouraged to use this code in your own programs for fun
2239or for profit as you see fit. A simple comment in the code giving
2240credit would be courteous but is not required.