This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=head1 NAME

perlfaq4 - Data Manipulation ($Revision: 6816 $)

=head1 DESCRIPTION

This section of the FAQ answers questions related to manipulating
numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

=head1 Data: Numbers

=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

Internally, your computer represents floating-point numbers in binary.
Digital (as in powers of two) computers cannot store all numbers
exactly. Some real numbers lose precision in the process. This is a
problem with how computers store numbers and affects all computer
languages, not just Perl.

L<perlnumber> shows the gory details of number representations and
conversions.

To limit the number of decimal places in your numbers, you can use the
C<printf> or C<sprintf> function. See
L<"Floating Point Arithmetic"|perlop> for more details.

    printf "%.2f", 10/3;

    my $number = sprintf "%.2f", 10/3;

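As a minimal sketch of the same idea (the variable name and the
tolerance are illustrative assumptions, not part of the FAQ), comparing
floating-point results within a small tolerance, or comparing their
rounded string forms, avoids surprises from the stored binary value:

    my $sum = 0.1 + 0.2;

    # Print more digits than usual to see the value actually stored.
    printf "%.17g\n", $sum;

    # Compare within a tolerance chosen for the application (1e-9 is assumed here).
    print "close enough\n" if abs($sum - 0.3) < 1e-9;

    # Or compare the rounded string forms.
    print "same to 2 places\n" if sprintf("%.2f", $sum) eq sprintf("%.2f", 0.3);
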
=head2 Why is int() broken?

Your C<int()> is most probably working just fine. It's the numbers that
aren't quite what you think.

First, see the answer to "Why am I getting long decimals
(eg, 19.9499999999999) instead of the numbers I should be getting
(eg, 19.95)?".

For example, this

    print int(0.6/0.2-2), "\n";

will on most computers print 0, not 1, because even such simple
numbers as 0.6 and 0.2 cannot be represented exactly by floating-point
numbers. What you think in the above as 'three' is really more like
2.9999999999999995559.

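If the value is known to be a whole number apart from representation
error, one possible workaround is to round it first and only then treat
it as an integer. This is a sketch under that assumption, not a general
fix, and the variable names are illustrative:

    my $quotient = 0.6 / 0.2;                    # stored as slightly less than 3
    my $rounded  = sprintf("%.0f", $quotient);   # rounds to the nearest integer: "3"
    print $rounded - 2, "\n";                    # prints 1
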
=head2 Why isn't my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as
literals in your program. Octal literals in perl must start with a
leading C<0> and hexadecimal literals must start with a leading C<0x>.
If they are read in from somewhere and assigned, no automatic
conversion takes place. You must explicitly use C<oct()> or C<hex()> if you
want the values converted to decimal. C<oct()> interprets hexadecimal (C<0x350>),
octal (C<0350> or even without the leading C<0>, like C<377>) and binary
(C<0b1010>) numbers, while C<hex()> only converts hexadecimal ones, with
or without a leading C<0x>, such as C<0x255>, C<3A>, C<ff>, or C<deadbeef>.
The inverse mapping from decimal to octal can be done with either the
C<%o> or C<%O> C<sprintf()> formats.

This problem shows up most often when people try using C<chmod()>,
C<mkdir()>, C<umask()>, or C<sysopen()>, which by widespread tradition
typically take permissions in octal.

    chmod(644,  $file);   # WRONG
    chmod(0644, $file);   # right

Note the mistake in the first line was specifying the decimal literal
C<644>, rather than the intended octal literal C<0644>. The problem can
be seen with:

    printf("%#o",644);   # prints 01204

Surely you had not intended C<chmod(01204, $file);> - did you? If you
want to use numeric literals as arguments to C<chmod()> et al. then please
try to express them as octal constants, that is with a leading zero and
with the following digits restricted to the set C<0..7>.

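When the permission arrives as a string (from a configuration file or
the command line, say), it has to go through C<oct()> before it is
usable as a mode. A minimal sketch, with a made-up input value:

    my $input = "644";            # hypothetical value read from somewhere
    my $mode  = oct($input);      # the number 420, i.e. octal 644

    printf "%04o\n", $mode;       # prints 0644
    # chmod($mode, $file);        # now does what was intended
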
=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?

Remember that C<int()> merely truncates toward 0. For rounding to a
certain number of digits, C<sprintf()> or C<printf()> is usually the
easiest route.

    printf("%.3f", 3.1415926535);   # prints 3.142

The C<POSIX> module (part of the standard Perl distribution)
implements C<ceil()>, C<floor()>, and a number of other mathematical
and trigonometric functions.

    use POSIX;
    $ceil  = ceil(3.5);    # 4
    $floor = floor(3.5);   # 3

In 5.000 to 5.003 perls, trigonometry was done in the C<Math::Complex>
module. With 5.004, the C<Math::Trig> module (part of the standard Perl
distribution) implements the trigonometric functions. Internally it
uses the C<Math::Complex> module and some functions can break out from
the real axis into the complex plane, for example the inverse sine of
2.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

To see why, notice how you'll still have an issue on half-way-point
alternation:

    for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}

    0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
    0.8 0.8 0.9 0.9 1.0 1.0

Don't blame Perl. It's the same as in C. IEEE says we have to do
this. Perl numbers whose absolute values are integers under 2**31 (on
32 bit machines) will work pretty much like mathematical integers.
Other numbers are not guaranteed.

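As a sketch of what "implement the rounding function you need yourself"
might look like, here is a round-half-up helper built on C<POSIX::floor>.
The name C<round_half_up> is hypothetical, not a Perl builtin, and the
choice of "halves round toward positive infinity" is only one of several
possible rules:

    use POSIX qw(floor);

    # Round $value to $places decimal places, halves going up.
    sub round_half_up {
        my ($value, $places) = @_;
        my $factor = 10 ** $places;
        return floor($value * $factor + 0.5) / $factor;
    }

    print round_half_up( 2.5, 0), "\n";   # prints 3
    print round_half_up(-2.5, 0), "\n";   # prints -2

The inputs are still binary floating-point numbers, so a value that
cannot be stored exactly may already sit slightly below the half-way
point before the function sees it. For money, keeping amounts as integer
cents avoids that.
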
=head2 How do I convert between numeric representations/bases/radixes?

As always with Perl there is more than one way to do it. Below are a
few examples of approaches to making common conversions between number
representations. This is intended to be representative rather than
exhaustive.

Some of the examples later in L<perlfaq4> use the C<Bit::Vector>
module from CPAN. The reason you might choose C<Bit::Vector> over the
perl built in functions is that it works with numbers of ANY size,
that it is optimized for speed on some operations, and for at least
some programmers the notation might be familiar.

=over 4

=item How do I convert hexadecimal into decimal

Using perl's built in conversion of C<0x> notation:

    $dec = 0xDEADBEEF;

Using the C<hex> function:

    $dec = hex("DEADBEEF");

Using C<pack>:

    $dec = unpack("N", pack("H8", substr("0" x 8 . "DEADBEEF", -8)));

Using the CPAN module C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Hex(32, "DEADBEEF");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to hexadecimal

Using C<sprintf>:

    $hex = sprintf("%X", 3735928559); # upper case A-F
    $hex = sprintf("%x", 3735928559); # lower case a-f

Using C<unpack>:

    $hex = unpack("H*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $hex = $vec->to_Hex();

And C<Bit::Vector> supports odd bit counts:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(33, 3735928559);
    $vec->Resize(32); # suppress leading 0 if unwanted
    $hex = $vec->to_Hex();

=item How do I convert from octal to decimal

Using Perl's built in conversion of numbers with leading zeros:

    $dec = 033653337357; # note the leading 0!

Using the C<oct> function:

    $dec = oct("33653337357");

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new(32);
    $vec->Chunk_List_Store(3, split(//, reverse "33653337357"));
    $dec = $vec->to_Dec();

=item How do I convert from decimal to octal

Using C<sprintf>:

    $oct = sprintf("%o", 3735928559);

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $oct = reverse join('', $vec->Chunk_List_Read(3));

=item How do I convert from binary to decimal

Perl 5.6 lets you write binary numbers directly with
the C<0b> notation:

    $number = 0b10110110;

Using C<oct>:

    my $input = "10110110";
    $decimal = oct( "0b$input" );

Using C<pack> and C<ord>:

    $decimal = ord(pack('B8', '10110110'));

Using C<pack> and C<unpack> for larger strings:

    $int = unpack("N", pack("B32",
        substr("0" x 32 . "11110101011011011111011101111", -32)));
    $dec = sprintf("%d", $int);

    # substr() is used to left pad a 32 character string with zeros.

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Bin(32, "11011110101011011011111011101111");
    $dec = $vec->to_Dec();

=item How do I convert from decimal to binary

Using C<sprintf> (perl 5.6+):

    $bin = sprintf("%b", 3735928559);

Using C<unpack>:

    $bin = unpack("B*", pack("N", 3735928559));

Using C<Bit::Vector>:

    use Bit::Vector;
    $vec = Bit::Vector->new_Dec(32, -559038737);
    $bin = $vec->to_Bin();

The remaining transformations (e.g. hex -> oct, bin -> hex, etc.)
are left as an exercise to the inclined reader.

=back

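One possible approach to the remaining transformations is to go through
a native number: parse with C<hex> or C<oct>, then format again with
C<sprintf>. This sketch assumes the values fit in a native integer, and
the variable names are illustrative:

    my $bin_from_hex = sprintf("%b", hex("ff"));           # "11111111"
    my $oct_from_hex = sprintf("%o", hex("ff"));           # "377"
    my $hex_from_bin = sprintf("%x", oct("0b10110110"));   # "b6"
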
=head2 Why doesn't & work the way I want it to?

The behavior of the bitwise operators (such as C<&> and C<|>) depends
on whether they're used on numbers or strings. The operators treat a
string as a series of bits and work with that (the string C<"3"> is the
bit pattern C<00110011>). The operators work with the binary form of a
number (the number C<3> is treated as the bit pattern C<00000011>).

So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).

Most problems with C<&> and C<|> arise because the programmer thinks
they have a number but really it's a string. The rest arise because
the programmer says:

    if ("\020\020" & "\101\101") {
        # ...
    }

but a string consisting of two null bytes (the result of C<"\020\020"
& "\101\101">) is not a false value in Perl. You need:

    # true when the result contains nothing but null bytes
    if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
        # ...
    }

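When the operands may be strings (read from a file or from user input,
for example), adding 0 forces the numeric form of the operator. A
minimal sketch with illustrative variable names; it assumes the
C<bitwise> feature of newer perls is not enabled:

    my ($x, $y) = ("11", "3");               # strings, as if read from input

    my $as_strings = $x & $y;                # AND of the characters: "1"
    my $as_numbers = (0 + $x) & (0 + $y);    # AND of the numbers: 3
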
=head2 How do I multiply matrices?

Use the C<Math::Matrix> or C<Math::MatrixReal> modules (available from CPAN)
or the C<PDL> extension (also available from CPAN).

=head2 How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the
results, use:

    @results = map { my_func($_) } @array;

For example:

    @triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the
results:

    foreach $iterator (@array) {
        some_func($iterator);
    }

To call a function on each integer in a (small) range, you B<can> use:

    @results = map { some_func($_) } (5 .. 25);

but you should be aware that the C<..> operator creates a list of
all integers in the range. This can take a lot of memory for large
ranges. Instead use:

    @results = ();
    for ($i=5; $i < 500_005; $i++) {
        push(@results, some_func($i));
    }

This situation has been fixed in Perl 5.005. Use of C<..> in a C<for>
loop will iterate over the range, without creating the entire range.

    for my $i (5 .. 500_005) {
        push(@results, some_func($i));
    }

will not create a list of 500,000 integers.

=head2 How can I output Roman numerals?

Get the http://www.cpan.org/modules/by-module/Roman module.

=head2 Why aren't my random numbers random?

If you're using a version of Perl before 5.004, you must call C<srand>
once at the start of your program to seed the random number generator.

    BEGIN { srand() if $] < 5.004 }

5.004 and later automatically call C<srand> at the beginning. Don't
call C<srand> more than once--you make your numbers less random,
rather than more.

Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-). The
F<random> article in the "Far More Than You Ever Wanted To Know"
collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz , courtesy
of Tom Phoenix, talks more about this. John von Neumann said, "Anyone
who attempts to generate random numbers by deterministic means is, of
course, living in a state of sin."

If you want numbers that are more random than C<rand> with C<srand>
provides, you should also check out the C<Math::TrulyRandom> module from
CPAN. It uses the imperfections in your system's timer to generate
random numbers, but this takes quite a while. If you want a better
pseudorandom generator than comes with your operating system, look at
"Numerical Recipes in C" at http://www.nr.com/ .

=head2 How do I get a random number between X and Y?

To get a random number between two values, you can use the
C<rand()> builtin to get a random number between 0 and 1.

C<rand($x)> returns a number such that
C<< 0 <= rand($x) < $x >>. Thus what you want to have perl
figure out is a random number in the range from 0 to the
difference between your I<X> and I<Y>.

That is, to get a number between 10 and 15, inclusive, you
want a random number between 0 and 5 that you can then add
to 10.

    my $number = 10 + int rand( 15-10+1 );

Hence you derive the following simple function to abstract
that. It selects a random integer between the two given
integers (inclusive). For example: C<random_int_between(50,120)>.

    sub random_int_between {
        my($min, $max) = @_;
        # Assumes that the two arguments are integers themselves!
        return $min if $min == $max;
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + int rand(1 + $max - $min);
    }

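The same shift-and-scale idea works for non-integer values. In this
sketch C<random_float_between> is a hypothetical helper name; the result
is at least the minimum and below the maximum (up to floating-point
rounding at the very top of the range):

    sub random_float_between {
        my ($min, $max) = @_;
        return $min if $min == $max;    # rand(0) would behave like rand(1)
        ($min, $max) = ($max, $min) if $min > $max;
        return $min + rand($max - $min);
    }

    my $reading = random_float_between(36.0, 38.5);
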
=head1 Data: Dates

=head2 How do I find the day or week of the year?

The C<localtime> function returns the day of the year. Without an
argument C<localtime> uses the current time.

    $day_of_year = (localtime)[7];   # counts from 0 for January 1

The C<POSIX> module can also format a date as the day of the year or
week of the year.

    use POSIX qw/strftime/;
    my $day_of_year  = strftime "%j", localtime;
    my $week_of_year = strftime "%W", localtime;

To get the day or week of the year for any date, use C<POSIX>'s
C<mktime> to get a time in epoch seconds for the argument to
C<localtime>.

    use POSIX qw/mktime strftime/;
    my $week_of_year = strftime "%W",
        localtime( mktime( 0, 0, 0, 18, 11, 87 ) );

The C<Date::Calc> module provides two functions to calculate these.

    use Date::Calc qw( Day_of_Year Week_of_Year );
    my $day_of_year  = Day_of_Year(  1987, 12, 18 );
    my $week_of_year = Week_of_Year( 1987, 12, 18 );

=head2 How do I find the current century or millennium?

Use the following simple functions:

    sub get_century {
        return int((((localtime(shift || time))[5] + 1999))/100);
    }

    sub get_millennium {
        return 1+int((((localtime(shift || time))[5] + 1899))/1000);
    }

On some systems, the C<POSIX> module's C<strftime()> function has been
extended in a non-standard way to use a C<%C> format, which they
sometimes claim is the "century". It isn't, because on most such
systems, this is only the first two digits of the four-digit year, and
thus cannot be used to reliably determine the current century or
millennium.

=head2 How can I compare two dates and find the difference?

(contributed by brian d foy)

You could just store all your dates as a number and then subtract.
Life isn't always that simple though. If you want to work with
formatted dates, the C<Date::Manip>, C<Date::Calc>, or C<DateTime>
modules can help you.

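For plain calendar dates, one way to store a date as a number is to
convert it to epoch seconds with C<timegm> from the standard
C<Time::Local> module and subtract. This sketch uses midnight UTC so
that daylight saving changes do not affect the result; the dates and
variable names are illustrative:

    use Time::Local qw(timegm);

    # timegm( sec, min, hour, day, month (0-11), year )
    my $start = timegm( 0, 0, 0, 18, 11, 1987 );   # 18 December 1987
    my $end   = timegm( 0, 0, 0,  1,  0, 1988 );   #  1 January 1988

    my $days = ( $end - $start ) / 86_400;         # 14
    print "$days days apart\n";
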
=head2 How can I take a string and turn it into epoch seconds?

If it's a regular enough string that it always has the same format,
you can split it up and pass the parts to C<timelocal> in the standard
C<Time::Local> module. Otherwise, you should look into the C<Date::Calc>
and C<Date::Manip> modules from CPAN.

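A minimal sketch for a fixed C<YYYY-MM-DD hh:mm:ss> format. It assumes
the timestamp is in UTC and therefore uses C<timegm>; use C<timelocal>
with the same arguments for local time. The sample string and variable
names are made up for illustration:

    use Time::Local qw(timegm);

    my $stamp = "2001-11-13 01:00:00";

    my ($year, $mon, $day, $hour, $min, $sec) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "Unexpected date format: $stamp\n";

    # Months are numbered 0 to 11 in Time::Local.
    my $epoch = timegm( $sec, $min, $hour, $day, $mon - 1, $year );
    print "$epoch\n";    # prints 1005613200
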
=head2 How can I find the Julian Day?

(contributed by brian d foy and Dave Cross)

You can use the C<Time::JulianDay> module available on CPAN. Ensure
that you really want to find a Julian day, though, as many people have
different ideas about Julian days. See
http://www.hermetic.ch/cal_stud/jdn.htm for instance.

You can also try the C<DateTime> module, which can convert a date/time
to a Julian Day.

    $ perl -MDateTime -le'print DateTime->today->jd'
    2453401.5

Or the modified Julian Day

    $ perl -MDateTime -le'print DateTime->today->mjd'
    53401

Or even the day of the year (which is what some people think of as a
Julian day)

    $ perl -MDateTime -le'print DateTime->today->doy'
    31

=head2 How do I find yesterday's date?

(contributed by brian d foy)

Use one of the Date modules. The C<DateTime> module makes it simple, and
gives you the same time of day, only the day before.

    use DateTime;

    my $yesterday = DateTime->now->subtract( days => 1 );

    print "Yesterday was $yesterday\n";

You can also use the C<Date::Calc> module using its C<Today_and_Now>
function.

    use Date::Calc qw( Today_and_Now Add_Delta_DHMS );

    my @date_time = Add_Delta_DHMS( Today_and_Now(), -1, 0, 0, 0 );

    print "@date_time\n";

Most people try to use the time rather than the calendar to figure out
dates, but that assumes that days are twenty-four hours each. For
most people, there are two days a year when they aren't: the switch to
and from summer time throws this off. Let the modules do the work.

=head2 Does Perl have a Year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
Y2K compliant (whatever that means). The programmers you've hired to
use it, however, probably are not.

Long answer: The question belies a true understanding of the issue.
Perl is just as Y2K compliant as your pencil--no more, and no less.
Can you use your pencil to write a non-Y2K-compliant memo? Of course
you can. Is that the pencil's fault? Of course it isn't.

The date and time functions supplied with Perl (gmtime and localtime)
supply adequate information to determine the year well beyond 2000
(2038 is when trouble strikes for 32-bit machines). The year returned
by these functions when used in a list context is the year minus 1900.
For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
number. To avoid the year 2000 problem simply do not treat the year as
a 2-digit number. It isn't.

When gmtime() and localtime() are used in scalar context they return
a timestamp string that contains a fully-expanded year. For example,
C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2001". There's no year 2000 problem here.

That doesn't mean that Perl can't be used to create non-Y2K compliant
programs. It can. But so can your pencil. It's the fault of the user,
not the language. At the risk of inflaming the NRA: "Perl doesn't
break Y2K, people do." See http://www.perl.org/about/y2k.html for
a longer exposition.

=head1 Data: Strings

=head2 How do I validate input?

(contributed by brian d foy)

There are many ways to ensure that values are what you expect or
want to accept. Besides the specific examples that we cover in the
perlfaq, you can also look at the modules with "Assert" and "Validate"
in their names, along with other modules such as C<Regexp::Common>.

Some modules have validation for particular types of input, such
as C<Business::ISBN>, C<Business::CreditCard>, C<Email::Valid>,
and C<Data::Validate::IP>.

=head2 How do I unescape a string?

It depends just what you mean by "escape". URL escapes are dealt
with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
character are removed with

    s/\\(.)/$1/g;

This won't expand C<"\n"> or C<"\t"> or any other special escapes;
it only drops the backslash and keeps the character that follows it.

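If a few special escapes should be expanded as well, one possible
approach is a lookup table consulted from the replacement side. The
table C<%expansion> and the sample text are assumptions made for this
sketch; any escape not listed simply loses its backslash, as above:

    my %expansion = ( n => "\n", t => "\t" );   # extend as needed

    my $text = 'name\tvalue\n';                 # single quotes keep the backslashes
    $text =~ s/\\(.)/exists $expansion{$1} ? $expansion{$1} : $1/ge;

    # $text now holds a real tab and a real newline
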
=head2 How do I remove consecutive pairs of characters?

(contributed by brian d foy)

You can use the substitution operator to find pairs of characters (or
runs of characters) and replace them with a single instance. In this
substitution, we find a character in C<(.)>. The memory parentheses
store the matched character in the back-reference C<\1> and we use
that to require that the same thing immediately follow it. We replace
that part of the string with the character in C<$1>.

    s/(.)\1/$1/g;

We can also use the transliteration operator, C<tr///>. In this
example, the search list side of our C<tr///> contains nothing, but
the C<c> option complements that so it contains everything. The
replacement list also contains nothing, so the transliteration is
almost a no-op since it won't do any replacements (or more exactly,
replace the character with itself). However, the C<s> option squashes
duplicated and consecutive characters in the string so a character
does not show up next to itself.

    my $str = 'Haarlem';   # in the Netherlands
    $str =~ tr///cs;       # Now Harlem, like in New York

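To squash only some characters, put them in the search list and drop
the C<c> option. In this illustrative sketch (the sample string is made
up) only runs of lower case letters are collapsed, so the digits are
left alone:

    my $word = 'bookkeeper 1100';
    (my $squeezed = $word) =~ tr/a-z//s;   # 'bokeper 1100'
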
=head2 How do I expand function calls in a string?

(contributed by brian d foy)

This is documented in L<perlref>, and although it's not the easiest
thing to read, it does work. In each of these examples, we call the
function inside the braces used to dereference a reference. If we
have more than one return value, we can construct and dereference an
anonymous array. In this case, we call the function in list context.

    print "The time values are @{ [localtime] }.\n";

If we want to call the function in scalar context, we have to do a bit
more work. We can really have any code we like inside the braces, so
we simply have to end with the scalar reference, although how you do
that is up to you, and you can use code inside the braces.

    print "The time is ${\(scalar localtime)}.\n";

    print "The time is ${ my $x = localtime; \$x }.\n";

If your function already returns a reference, you don't need to create
the reference yourself.

    sub timestamp { my $t = localtime; \$t }

    print "The time is ${ timestamp() }.\n";

The C<Interpolation> module can also do a lot of magic for you. You can
specify a variable name, in this case C<E>, to set up a tied hash that
does the interpolation for you. It has several other methods to do this
as well.

    use Interpolation E => 'eval';
    print "The time values are $E{localtime()}.\n";

In most cases, it is probably easier to simply use string concatenation,
which also forces scalar context.

    print "The time is " . localtime() . ".\n";

=head2 How do I find matching/nesting anything?

This isn't something that can be done in one regular expression, no
matter how complicated. To find something between two single
characters, a pattern like C</x([^x]*)x/> will get the intervening
bits in $1. For multi-character delimiters, something more like
C</alpha(.*?)omega/> would be needed. But none of these deals with
nested patterns. For balanced expressions using C<(>, C<{>, C<[> or
C<< < >> as delimiters, use the CPAN module Regexp::Common, or see
L<perlre/(??{ code })>. For other cases, you'll have to write a
parser.

If you are serious about writing a parser, there are a number of
modules or oddities that will make your life a lot easier. There are
the CPAN modules C<Parse::RecDescent>, C<Parse::Yapp>, and
C<Text::Balanced>; and the C<byacc> program. Starting from perl 5.8,
C<Text::Balanced> is part of the standard distribution.

One simple destructive, inside-out approach that you might try is to
pull out the smallest nesting parts one at a time:

    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }

A more complicated and sneaky approach is to make Perl's regular
expression engine do it for you. This is courtesy Dean Inada, and
rather has the nature of an Obfuscated Perl Contest entry, but it
really does work:

    # $_ contains the string to parse
    # BEGIN and END are the opening and closing markers for the
    # nested text.

    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
    @$ = (eval{/$re/},$@!~/unmatched/i);
    print join("\n",@$[0..$#$]) if( $$[-1] );

=head2 How do I reverse a string?

Use C<reverse()> in scalar context, as documented in
L<perlfunc/reverse>.

    $reversed = reverse $string;

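The context matters because C<reverse()> in list context reverses the
order of its arguments instead of the characters. A short sketch with
a made-up string shows the difference:

    my $string = "stressed";

    my $reversed = reverse $string;          # scalar assignment: "desserts"

    print scalar reverse($string), "\n";     # prints desserts
    print reverse($string), "\n";            # list context: prints stressed
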
=head2 How do I expand tabs in a string?

You can do it yourself:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the C<Text::Tabs> module (part of the standard Perl
distribution).

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);

=head2 How do I reformat a paragraph?

Use C<Text::Wrap> (part of the standard Perl distribution):

    use Text::Wrap;
    print wrap("\t", ' ', @paragraphs);

The paragraphs you give to C<Text::Wrap> should not contain embedded
newlines. C<Text::Wrap> doesn't justify the lines (flush-right).

Or use the CPAN module C<Text::Autoformat>. Formatting files can be
easily done by making a shell alias, like so:

    alias fmt="perl -i -MText::Autoformat -n0777 \
        -e 'print autoformat $_, {all=>1}' $*"

See the documentation for C<Text::Autoformat> to appreciate its many
capabilities.

=head2 How can I access or change N characters of a string?

You can access the first characters of a string with C<substr()>.
To get the first character, for example, start at position 0
and grab the string of length 1.

    $string = "Just another Perl Hacker";
    $first_char = substr( $string, 0, 1 );   # 'J'

To change part of a string, you can use the optional fourth
argument which is the replacement string.

    substr( $string, 13, 4, "Perl 5.8.0" );

You can also use C<substr()> as an lvalue.

    substr( $string, 13, 4 ) = "Perl 5.8.0";

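Two further details of C<substr()> are often handy: a negative offset
counts from the end of the string, and the four-argument form returns
the text it replaced. A brief sketch (the variable names are
illustrative):

    my $string = "Just another Perl Hacker";

    my $last_word = substr( $string, -6 );               # 'Hacker'
    my $old       = substr( $string, 5, 7, "one more" ); # returns 'another'

    # $string is now 'Just one more Perl Hacker'
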
=head2 How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let's say you want
to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
C<"whosoever"> or C<"whomsoever">, case insensitively. These
all assume that $_ contains the string to be altered.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5       # is it the 5th?
            ? "${2}soever"  # yes, swap
            : $1            # renege and leave it there
    }ige;

In the more general case, you can use the C</g> modifier in a C<while>
loop, keeping count of matches.

    $WANT = 3;
    $count = 0;
    $_ = "One fish two fish red fish blue fish";
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
        }
    }

That prints out: C<"The third fish is a red one."> You can also use a
repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

68dc0745 757=head2 How can I count the number of occurrences of a substring within a string?
758
a6dd486b 759There are a number of ways, with varying efficiency. If you want a
68dc0745 760count of a certain single character (X) within a string, you can use the
761C<tr///> function like so:
762
ac9dac7f
RGS
763 $string = "ThisXlineXhasXsomeXx'sXinXit";
764 $count = ($string =~ tr/X//);
765 print "There are $count X characters in the string";
68dc0745 766
767This is fine if you are just looking for a single character. However,
768if you are trying to count multiple character substrings within a
769larger string, C<tr///> won't work. What you can do is wrap a while()
770loop around a global pattern match. For example, let's count negative
771integers:
772
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;    # reset: $count still holds the result of the tr/// example
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
68dc0745 776
881bdbd4
JH
777Another version uses a global match in list context, then assigns the
778result to a scalar, producing a count of the number of matches.
779
780 $count = () = $string =~ /-\d+/g;
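
As a quick cross-check, the three counting techniques from this answer can be run side by side on the same string. This is only a sketch; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# 1. while loop around a global match in scalar context
my $loop_count = 0;
$loop_count++ while $string =~ /-\d+/g;

# 2. global match in list context, counted via the empty-list assignment
my $list_count = () = $string =~ /-\d+/g;

# 3. tr/// counts single characters only: here, the minus signs
my $minus_signs = ($string =~ tr/-//);

print "$loop_count $list_count $minus_signs\n";   # prints "4 4 4"
```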
781
68dc0745 782=head2 How do I capitalize all the words on one line?
783
784To make the first letter of each word upper case:
3fe9a6f1 785
ac9dac7f 786 $line =~ s/\b(\w)/\U$1/g;
68dc0745 787
46fc3d4c 788This has the strange effect of turning "C<don't do it>" into "C<Don'T
a6dd486b 789Do It>". Sometimes you might want this. Other times you might need a
24f1ba9b 790more thorough solution (Suggested by brian d foy):
46fc3d4c 791
ac9dac7f
RGS
792 $string =~ s/ (
793 (^\w) #at the beginning of the line
794 | # or
795 (\s\w) #preceded by whitespace
796 )
797 /\U$1/xg;
798
799 $string =~ s/([\w']+)/\u\L$1/g;
46fc3d4c 800
68dc0745 801To make the whole line upper case:
3fe9a6f1 802
ac9dac7f 803 $line = uc($line);
68dc0745 804
805To force each word to be lower case, with the first letter upper case:
3fe9a6f1 806
ac9dac7f 807 $line =~ s/(\w+)/\u\L$1/g;
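
To see why the character class matters, here is a small comparison of C<\w+> against C<[\w']+> on a line that contains an apostrophe. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $line = "don't DO it";

(my $naive  = $line) =~ s/(\w+)/\u\L$1/g;      # apostrophe splits the word
(my $better = $line) =~ s/([\w']+)/\u\L$1/g;   # apostrophe kept inside the word

print "$naive\n";    # prints "Don'T Do It"
print "$better\n";   # prints "Don't Do It"
```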
68dc0745 808
5a964f20
TC
809You can (and probably should) enable locale awareness of those
810characters by placing a C<use locale> pragma in your program.
92c2ed05 811See L<perllocale> for endless details on locales.
5a964f20 812
65acb1b1 813This is sometimes referred to as putting something into "title
d92eb7b0 814case", but that's not quite accurate. Consider the proper
65acb1b1
TC
815capitalization of the movie I<Dr. Strangelove or: How I Learned to
816Stop Worrying and Love the Bomb>, for example.
817
369b44b4
RGS
818Damian Conway's L<Text::Autoformat> module provides some smart
819case transformations:
820
ac9dac7f
RGS
821 use Text::Autoformat;
822 my $x = "Dr. Strangelove or: How I Learned to Stop ".
823 "Worrying and Love the Bomb";
369b44b4 824
ac9dac7f
RGS
825 print $x, "\n";
826 for my $style (qw( sentence title highlight )) {
827 print autoformat($x, { case => $style }), "\n";
828 }
369b44b4 829
49d635f9 830=head2 How can I split a [character] delimited string except when inside [character]?
68dc0745 831
ac9dac7f
RGS
832Several modules can handle this sort of parsing--C<Text::Balanced>,
833C<Text::CSV>, C<Text::CSV_XS>, and C<Text::ParseWords>, among others.
49d635f9
RGS
834
835Take the example case of trying to split a string that is
836comma-separated into its different fields. You can't use C<split(/,/)>
837because you shouldn't split if the comma is inside quotes. For
838example, take a data line like this:
68dc0745 839
ac9dac7f 840 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
68dc0745 841
842Due to the restriction of the quotes, this is a fairly complex
197aec24 843problem. Thankfully, we have Jeffrey Friedl, author of
49d635f9 844I<Mastering Regular Expressions>, to handle these for us. He
ac9dac7f 845suggests (assuming your string is contained in C<$text>):
68dc0745 846
ac9dac7f
RGS
847 @new = ();
848 push(@new, $+) while $text =~ m{
849 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
850 | ([^,]+),?
851 | ,
852 }gx;
853 push(@new, undef) if substr($text,-1,1) eq ',';
68dc0745 854
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">).
46fc3d4c 858
ac9dac7f
RGS
859Alternatively, the C<Text::ParseWords> module (part of the standard
860Perl distribution) lets you say:
68dc0745 861
ac9dac7f
RGS
862 use Text::ParseWords;
863 @new = quotewords(",", 0, $text);
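
Applied to the sample data line from this answer, the C<quotewords> call looks like the sketch below. It assumes the record contains no stray apostrophes, since C<Text::ParseWords> also treats single quotes as quoting characters.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A second argument of 0 means "do not keep the quotes in the returned fields".
my @new = quotewords(",", 0, $text);

printf "%d fields\n", scalar @new;
print "third field: $new[2]\n";    # prints "third field: Cimetrix, Inc"
```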
65acb1b1 864
68dc0745 865=head2 How do I strip blank space from the beginning/end of a string?
866
6670e5e7 867(contributed by brian d foy)
68dc0745 868
6670e5e7
RGS
869A substitution can do this for you. For a single line, you want to
870replace all the leading or trailing whitespace with nothing. You
871can do that with a pair of substitutions.
68dc0745 872
6670e5e7
RGS
873 s/^\s+//;
874 s/\s+$//;
68dc0745 875
6670e5e7
RGS
876You can also write that as a single substitution, although it turns
877out the combined statement is slower than the separate ones. That
878might not matter to you, though.
68dc0745 879
6670e5e7 880 s/^\s+|\s+$//g;
68dc0745 881
In this regular expression, the alternation matches either at the
beginning or at the end of the string, since the alternation has a lower
precedence than the anchors. With the C</g> flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the C<\s+>, and the C<$> anchor can match at the
physical end of the string, so the newline disappears too. Just add
the newline back to the output, which has the added benefit of preserving
"blank" (consisting entirely of whitespace) lines, which the C<^\s+>
would remove all by itself.
68dc0745 891
6670e5e7
RGS
892 while( <> )
893 {
894 s/^\s+|\s+$//g;
895 print "$_\n";
896 }
5a964f20 897
6670e5e7
RGS
898For a multi-line string, you can apply the regular expression
899to each logical line in the string by adding the C</m> flag (for
900"multi-line"). With the C</m> flag, the C<$> matches I<before> an
901embedded newline, so it doesn't remove it. It still removes the
902newline at the end of the string.
903
ac9dac7f 904 $string =~ s/^\s+|\s+$//gm;
6670e5e7
RGS
905
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string
and replace it with nothing. If you need to keep embedded blank lines,
you have to do a little more work. Instead of matching any whitespace
(since that includes a newline), just match the other whitespace.
911
912 $string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
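
If you trim strings in many places, the pair of substitutions from the start of this answer can be wrapped in a small subroutine. This is a sketch only; C<trim> is a hypothetical name, not a Perl built-in on the versions this FAQ covers.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the argument without leading
# or trailing whitespace (newlines included).
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

print "[", trim("  \t hello world \n"), "]\n";   # prints "[hello world]"
```

Because the subroutine works on a copy, the caller's variable is left untouched.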
5a964f20 913
65acb1b1
TC
914=head2 How do I pad a string with blanks or pad a number with zeroes?
915
65acb1b1 916In the following examples, C<$pad_len> is the length to which you wish
d92eb7b0
GS
917to pad the string, C<$text> or C<$num> contains the string to be padded,
918and C<$pad_char> contains the padding character. You can use a single
919character string constant instead of the C<$pad_char> variable if you
920know what it is in advance. And in the same way you can use an integer in
921place of C<$pad_len> if you know the pad length in advance.
65acb1b1 922
d92eb7b0
GS
923The simplest method uses the C<sprintf> function. It can pad on the left
924or right with blanks and on the left with zeroes and it will not
925truncate the result. The C<pack> function can only pad strings on the
926right with blanks and it will truncate the result to a maximum length of
927C<$pad_len>.
65acb1b1 928
ac9dac7f 929 # Left padding a string with blanks (no truncation):
04d666b1
RGS
930 $padded = sprintf("%${pad_len}s", $text);
931 $padded = sprintf("%*s", $pad_len, $text); # same thing
65acb1b1 932
ac9dac7f 933 # Right padding a string with blanks (no truncation):
04d666b1
RGS
934 $padded = sprintf("%-${pad_len}s", $text);
935 $padded = sprintf("%-*s", $pad_len, $text); # same thing
65acb1b1 936
ac9dac7f 937 # Left padding a number with 0 (no truncation):
04d666b1
RGS
938 $padded = sprintf("%0${pad_len}d", $num);
939 $padded = sprintf("%0*d", $pad_len, $num); # same thing
65acb1b1 940
ac9dac7f
RGS
941 # Right padding a string with blanks using pack (will truncate):
942 $padded = pack("A$pad_len",$text);
65acb1b1 943
d92eb7b0
GS
944If you need to pad with a character other than blank or zero you can use
945one of the following methods. They all generate a pad string with the
946C<x> operator and combine that with C<$text>. These methods do
947not truncate C<$text>.
65acb1b1 948
d92eb7b0 949Left and right padding with any character, creating a new string:
65acb1b1 950
ac9dac7f
RGS
951 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text;
952 $padded = $text . $pad_char x ( $pad_len - length( $text ) );
65acb1b1 953
d92eb7b0 954Left and right padding with any character, modifying C<$text> directly:
65acb1b1 955
ac9dac7f
RGS
956 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) );
957 $text .= $pad_char x ( $pad_len - length( $text ) );
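
The sketch below runs several of the padding methods above on fixed sample values, so you can compare the results. The values C<"abc">, C<42>, C<6>, and C<"*"> are arbitrary.

```perl
use strict;
use warnings;

my ($text, $num, $pad_len, $pad_char) = ("abc", 42, 6, "*");

my $left   = sprintf("%*s",  $pad_len, $text);    # "   abc"
my $right  = sprintf("%-*s", $pad_len, $text);    # "abc   "
my $zeroes = sprintf("%0*d", $pad_len, $num);     # "000042"
my $packed = pack("A2", $text);                   # "ab" (truncated)
my $stars  = $pad_char x ($pad_len - length $text) . $text;   # "***abc"

print "[$_]\n" for $left, $right, $zeroes, $packed, $stars;
```

The C<x> operator returns an empty string when its count is zero or negative, so the C<x>-based methods add nothing when C<$text> is already at least C<$pad_len> characters long.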
65acb1b1 958
68dc0745 959=head2 How do I extract selected columns from a string?
960
ac9dac7f 961Use C<substr()> or C<unpack()>, both documented in L<perlfunc>.
197aec24 962If you prefer thinking in terms of columns instead of widths,
5a964f20
TC
963you can use this kind of thing:
964
ac9dac7f
RGS
965 # determine the unpack format needed to split Linux ps output
966 # arguments are cut columns
967 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
968
969 sub cut2fmt {
970 my(@positions) = @_;
971 my $template = '';
972 my $lastpos = 1;
973 for my $place (@positions) {
974 $template .= "A" . ($place - $lastpos) . " ";
975 $lastpos = $place;
976 }
977 $template .= "A*";
978 return $template;
979 }
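
Here is how the template from C<cut2fmt()> might be used with C<unpack()>. The record layout (name in columns 1-7, pid in columns 8-13, command from column 14 on) is invented for the illustration and is much simpler than real C<ps> output.

```perl
use strict;
use warnings;

# Same cut2fmt() as above, repeated so this example is self-contained.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Build a made-up fixed-width record: 7 columns, 6 columns, the rest.
my $line = sprintf("%-7s%6s%s", "alice", 1234, "sleep 10");
my $fmt  = cut2fmt(8, 14);                 # "A7 A6 A*"
my ($user, $pid, $command) = unpack($fmt, $line);

print "$fmt\n";
print "[$user] [$pid] [$command]\n";       # prints "[alice] [  1234] [sleep 10]"
```

The C<A> format strips trailing spaces only, which is why the right-aligned pid keeps its leading blanks.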
68dc0745 980
981=head2 How do I find the soundex value of a string?
982
7678cced
RGS
983(contributed by brian d foy)
984
You can use the C<Text::Soundex> module. If you want to do fuzzy or close
matching, you might also try the C<String::Approx>,
C<Text::Metaphone>, and C<Text::DoubleMetaphone> modules.
68dc0745 988
989=head2 How can I expand variables in text strings?
990
7678cced
RGS
991Let's assume that you have a string that contains placeholder
992variables.
68dc0745 993
ac9dac7f 994 $text = 'this has a $foo in it and a $bar';
5a964f20 995
7678cced
RGS
996You can use a substitution with a double evaluation. The
997first /e turns C<$1> into C<$foo>, and the second /e turns
998C<$foo> into its value. You may want to wrap this in an
999C<eval>: if you try to get the value of an undeclared variable
1000while running under C<use strict>, you get a fatal error.
5a964f20 1001
ac9dac7f
RGS
1002 eval { $text =~ s/(\$\w+)/$1/eeg };
1003 die if $@;
68dc0745 1004
5a964f20
TC
1005It's probably better in the general case to treat those
1006variables as entries in some special hash. For example:
1007
ac9dac7f
RGS
1008 %user_defs = (
1009 foo => 23,
1010 bar => 19,
1011 );
1012 $text =~ s/\$(\w+)/$user_defs{$1}/g;
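
With the hash approach, a placeholder that has no entry is replaced by nothing (and triggers a warning under C<use warnings>). One possible variation, sketched here with a made-up C<$baz> placeholder, leaves unknown names in the text untouched.

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);
my $text = 'this has a $foo in it, a $bar, and an unknown $baz';

# Replace known names; put unknown placeholders back as they were.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";   # prints "this has a 23 in it, a 19, and an unknown $baz"
```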
68dc0745 1013
1014=head2 What's wrong with always quoting "$vars"?
1015
ac9dac7f
RGS
1016The problem is that those double-quotes force
1017stringification--coercing numbers and references into
1018strings--even when you don't want them to be strings. Think
1019of it this way: double-quote expansion is used to produce
1020new strings. If you already have a string, why do you need
1021more?
68dc0745 1022
1023If you get used to writing odd things like these:
1024
ac9dac7f
RGS
1025 print "$var"; # BAD
1026 $new = "$old"; # BAD
1027 somefunc("$var"); # BAD
68dc0745 1028
1029You'll be in trouble. Those should (in 99.8% of the cases) be
1030the simpler and more direct:
1031
ac9dac7f
RGS
1032 print $var;
1033 $new = $old;
1034 somefunc($var);
68dc0745 1035
1036Otherwise, besides slowing you down, you're going to break code when
1037the thing in the scalar is actually neither a string nor a number, but
1038a reference:
1039
ac9dac7f
RGS
1040 func(\@array);
1041 sub func {
1042 my $aref = shift;
1043 my $oref = "$aref"; # WRONG
1044 }
68dc0745 1045
1046You can also get into subtle problems on those few operations in Perl
1047that actually do care about the difference between a string and a
1048number, such as the magical C<++> autoincrement operator or the
1049syscall() function.
1050
197aec24 1051Stringification also destroys arrays.
5a964f20 1052
ac9dac7f
RGS
1053 @lines = `command`;
1054 print "@lines"; # WRONG - extra blanks
1055 print @lines; # right
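
A short demonstration of both problems, as a sketch with made-up data: a stringified reference is no longer a reference, and an interpolated array gains separators.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = $aref;       # still a reference
my $str   = "$aref";     # just text such as "ARRAY(0x...)"

print ref($copy), "\n";                                     # prints "ARRAY"
print ref($str) eq '' ? "plain string\n" : "reference\n";   # prints "plain string"

my @lines  = ("one\n", "two\n");
my $joined = "@lines";   # elements joined with $" (a space by default)
print $joined eq "one\n two\n" ? "extra blank added\n" : "unchanged\n";
# prints "extra blank added"
```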
5a964f20 1056
04d666b1 1057=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 1058
1059Check for these three things:
1060
1061=over 4
1062
04d666b1 1063=item There must be no space after the E<lt>E<lt> part.
68dc0745 1064
197aec24 1065=item There (probably) should be a semicolon at the end.
68dc0745 1066
197aec24 1067=item You can't (easily) have any space in front of the tag.
68dc0745 1068
1069=back
1070
197aec24 1071If you want to indent the text in the here document, you
5a964f20
TC
1072can do this:
1073
    # all in one
    ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
        your text
        goes here
    HERE_TARGET
1079
1080But the HERE_TARGET must still be flush against the margin.
197aec24 1081If you want that indented also, you'll have to quote
5a964f20
TC
1082in the indentation.
1083
    ($quote = <<'    FINIS') =~ s/^\s+//gm;
            ...we will have peace, when you and all your works have
            perished--and the works of your dark master to whom you
            would deliver us. You are a liar, Saruman, and a corrupter
            of men's hearts. --Theoden in /usr/src/perl/taint.c
        FINIS
    $quote =~ s/\s+--/\n--/;
5a964f20
TC
1091
1092A nice general-purpose fixer-upper function for indented here documents
1093follows. It expects to be called with a here document as its argument.
1094It looks to see whether each line begins with a common substring, and
a6dd486b
JB
1095if so, strips that substring off. Otherwise, it takes the amount of leading
1096whitespace found on the first line and removes that much off each
5a964f20
TC
1097subsequent line.
1098
1099 sub fix {
1100 local $_ = shift;
a6dd486b 1101 my ($white, $leader); # common whitespace and common leading string
5a964f20
TC
1102 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
1103 ($white, $leader) = ($2, quotemeta($1));
1104 } else {
1105 ($white, $leader) = (/^(\s+)/, '');
1106 }
1107 s/^\s*?$leader(?:$white)?//gm;
1108 return $_;
1109 }
1110
c8db1d39 1111This works with leading special strings, dynamically determined:
5a964f20 1112
ac9dac7f 1113 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
5a964f20
TC
1114 @@@ int
1115 @@@ runops() {
1116 @@@ SAVEI32(runlevel);
1117 @@@ runlevel++;
d92eb7b0 1118 @@@ while ( op = (*op->op_ppaddr)() );
5a964f20
TC
1119 @@@ TAINT_NOT;
1120 @@@ return 0;
1121 @@@ }
ac9dac7f 1122 MAIN_INTERPRETER_LOOP
5a964f20 1123
a6dd486b 1124Or with a fixed amount of leading whitespace, with remaining
5a964f20
TC
1125indentation correctly preserved:
1126
ac9dac7f 1127 $poem = fix<<EVER_ON_AND_ON;
5a964f20
TC
1128 Now far ahead the Road has gone,
1129 And I must follow, if I can,
1130 Pursuing it with eager feet,
1131 Until it joins some larger way
1132 Where many paths and errands meet.
1133 And whither then? I cannot say.
1134 --Bilbo in /usr/src/perl/pp_ctl.c
ac9dac7f 1135 EVER_ON_AND_ON
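
If you can rely on Perl 5.26 or later (an assumption; the techniques above also work on older versions), the indented here-document syntax C<<< <<~ >>> strips the terminator's indentation from every line and lets the terminator itself be indented. A minimal sketch:

```perl
use v5.26;
use warnings;

sub demo {
    my $text = <<~EOT;
        your text
          goes here
        EOT
    return $text;
}

print demo();   # prints "your text" and "  goes here", each on its own line
```

Indentation beyond that of the terminator, such as the two extra spaces on the second line, is preserved.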
5a964f20 1136
68dc0745 1137=head1 Data: Arrays
1138
65acb1b1
TC
1139=head2 What is the difference between a list and an array?
1140
ac9dac7f
RGS
1141An array has a changeable length. A list does not. An array is
1142something you can push or pop, while a list is a set of values. Some
1143people make the distinction that a list is a value while an array is a
1144variable. Subroutines are passed and return lists, you put things into
1145list context, you initialize arrays with lists, and you C<foreach()>
1146across a list. C<@> variables are arrays, anonymous arrays are
1147arrays, arrays in scalar context behave like the number of elements in
1148them, subroutines access their arguments through the array C<@_>, and
1149C<push>/C<pop>/C<shift> only work on arrays.
65acb1b1
TC
1150
1151As a side note, there's no such thing as a list in scalar context.
1152When you say
1153
ac9dac7f 1154 $scalar = (2, 5, 7, 9);
65acb1b1 1155
d92eb7b0 1156you're using the comma operator in scalar context, so it uses the scalar
ac9dac7f 1157comma operator. There never was a list there at all! This causes the
d92eb7b0 1158last value to be returned: 9.
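
The following sketch puts the different cases side by side. The C<no warnings 'void'> line is there only because Perl would otherwise warn about the discarded constants in the comma-operator example.

```perl
use strict;
use warnings;
no warnings 'void';   # the comma-operator line below would otherwise warn

my @array = (2, 5, 7, 9);

my $count   = @array;             # array in scalar context: 4 elements
my $last    = (2, 5, 7, 9);       # comma operator in scalar context: 9
my ($first) = (2, 5, 7, 9);       # list assignment: 2
my $n       = () = (2, 5, 7, 9);  # list assignment in scalar context: 4

print "$count $last $first $n\n";   # prints "4 9 2 4"
```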
65acb1b1 1159
68dc0745 1160=head2 What is the difference between $array[1] and @array[1]?
1161
a6dd486b 1162The former is a scalar value; the latter an array slice, making
68dc0745 1163it a list with one (scalar) value. You should use $ when you want a
1164scalar value (most of the time) and @ when you want a list with one
1165scalar value in it (very, very rarely; nearly never, in fact).
1166
1167Sometimes it doesn't make a difference, but sometimes it does.
1168For example, compare:
1169
ac9dac7f 1170 $good[0] = `some program that outputs several lines`;
68dc0745 1171
1172with
1173
ac9dac7f 1174 @bad[0] = `same program that outputs several lines`;
68dc0745 1175
197aec24 1176The C<use warnings> pragma and the B<-w> flag will warn you about these
9f1b1f2d 1177matters.
68dc0745 1178
d92eb7b0 1179=head2 How can I remove duplicate elements from a list or array?
68dc0745 1180
6670e5e7 1181(contributed by brian d foy)
68dc0745 1182
6670e5e7
RGS
1183Use a hash. When you think the words "unique" or "duplicated", think
1184"hash keys".
68dc0745 1185
6670e5e7
RGS
1186If you don't care about the order of the elements, you could just
1187create the hash then extract the keys. It's not important how you
1188create that hash: just that you use C<keys> to get the unique
1189elements.
551e1d92 1190
ac9dac7f
RGS
1191 my %hash = map { $_, 1 } @array;
1192 # or a hash slice: @hash{ @array } = ();
1193 # or a foreach: $hash{$_} = 1 foreach ( @array );
1194
1195 my @unique = keys %hash;
68dc0745 1196
ac9dac7f
RGS
1197If you want to use a module, try the C<uniq> function from
1198C<List::MoreUtils>. In list context it returns the unique elements,
1199preserving their order in the list. In scalar context, it returns the
1200number of unique elements.
1201
1202 use List::MoreUtils qw(uniq);
1203
1204 my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
1205 my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
68dc0745 1206
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in C<%seen>. The postfix C<++>
returns the old value for that key, which is C<undef> and therefore
false, so the C<next> does not fire and the loop continues to the
C<push>; as a side effect, the C<++> creates the key with the value 1.
The next time the loop sees that same element, its key exists in
the hash I<and> the value for that key is true (since it's not 0 or
C<undef>), so the C<next> skips that iteration and the loop goes to the
next element.
551e1d92 1216
6670e5e7
RGS
1217 my @unique = ();
1218 my %seen = ();
68dc0745 1219
6670e5e7
RGS
1220 foreach my $elem ( @array )
1221 {
1222 next if $seen{ $elem }++;
1223 push @unique, $elem;
1224 }
68dc0745 1225
6670e5e7
RGS
1226You can write this more briefly using a grep, which does the
1227same thing.
68dc0745 1228
ac9dac7f
RGS
1229 my %seen = ();
1230 my @unique = grep { ! $seen{ $_ }++ } @array;
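
A worked run of the C<grep> version on sample data shows that the first occurrence of each element wins and the original order is kept. As a side effect, C<%seen> ends up holding how often each element occurred.

```perl
use strict;
use warnings;

my @array = qw(b a b c a);

my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print "@unique\n";     # prints "b a c"
print "$seen{b}\n";    # prints "2"
```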
65acb1b1 1231
ddbc1f16 1232=head2 How can I tell whether a certain element is contained in a list or array?
5a964f20 1233
9e72e4c6
RGS
1234(portions of this answer contributed by Anno Siegel)
1235
5a964f20
TC
1236Hearing the word "in" is an I<in>dication that you probably should have
1237used a hash, not a list or array, to store your data. Hashes are
1238designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 1239
5a964f20
TC
1240That being said, there are several ways to approach this. If you
1241are going to make this query many times over arbitrary string values,
881bdbd4
JH
1242the fastest way is probably to invert the original array and maintain a
1243hash whose keys are the first array's values.
68dc0745 1244
ac9dac7f
RGS
1245 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
1246 %is_blue = ();
1247 for (@blues) { $is_blue{$_} = 1 }
68dc0745 1248
ac9dac7f
RGS
1249Now you can check whether C<$is_blue{$some_color}>. It might have
1250been a good idea to keep the blues all in a hash in the first place.
68dc0745 1251
1252If the values are all small integers, you could use a simple indexed
1253array. This kind of an array will take up less space:
1254
    @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
    @is_tiny_prime = ();
    for (@primes) { $is_tiny_prime[$_] = 1 }
    # or simply  @is_tiny_prime[@primes] = (1) x @primes;

Now you can check whether C<$is_tiny_prime[$some_number]> is true.
1261
1262If the values in question are integers instead of strings, you can save
1263quite a lot of space by using bit strings instead:
1264
ac9dac7f
RGS
1265 @articles = ( 1..10, 150..2000, 2017 );
1266 undef $read;
1267 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 1268
1269Now check whether C<vec($read,$n,1)> is true for some C<$n>.
1270
9e72e4c6
RGS
1271These methods guarantee fast individual tests but require a re-organization
1272of the original list or array. They only pay off if you have to test
1273multiple values against the same array.
68dc0745 1274
If you are testing only once, the standard module C<List::Util> exports
the function C<first> for this purpose. It works by stopping once it
finds the element. It's written in C for speed, and its Perl equivalent
looks like this subroutine:
68dc0745 1279
9e72e4c6
RGS
1280 sub first (&@) {
1281 my $code = shift;
1282 foreach (@_) {
1283 return $_ if &{$code}();
1284 }
1285 undef;
1286 }
68dc0745 1287
9e72e4c6
RGS
1288If speed is of little concern, the common idiom uses grep in scalar context
1289(which returns the number of items that passed its condition) to traverse the
1290entire list. This does have the benefit of telling you how many matches it
1291found, though.
68dc0745 1292
9e72e4c6 1293 my $is_there = grep $_ eq $whatever, @array;
65acb1b1 1294
9e72e4c6
RGS
1295If you want to actually extract the matching elements, simply use grep in
1296list context.
68dc0745 1297
9e72e4c6 1298 my @matches = grep $_ eq $whatever, @array;
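
The sketch below contrasts a one-off test using C<first> with the build-once lookup hash, reusing the C<@blues> data from earlier in this answer. The other variable names are made up.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blues = qw(azure cerulean teal turquoise lapis-lazuli);

# One-off test: first() stops at the first hit. It returns the element
# itself, so test with defined() in case the element is 0 or "".
my $hit     = first { $_ eq 'teal' } @blues;
my $one_off = defined $hit ? 1 : 0;

# Repeated tests: build the lookup hash once, then each test is a hash access.
my %is_blue = map { $_ => 1 } @blues;

print "$one_off\n";                           # prints "1"
print $is_blue{crimson} ? "yes\n" : "no\n";   # prints "no"
```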
58103a2e 1299
68dc0745 1300=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
1301
ac9dac7f
RGS
1302Use a hash. Here's code to do both and more. It assumes that each
1303element is unique in a given array:
68dc0745 1304
ac9dac7f
RGS
1305 @union = @intersection = @difference = ();
1306 %count = ();
1307 foreach $element (@array1, @array2) { $count{$element}++ }
1308 foreach $element (keys %count) {
1309 push @union, $element;
1310 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
1311 }
68dc0745 1312
ac9dac7f
RGS
1313Note that this is the I<symmetric difference>, that is, all elements
1314in either A or in B but not in both. Think of it as an xor operation.
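
A worked run on two small arrays, with the results sorted for readability, may make this concrete. The last step is an addition to the code above: a one-sided difference (elements of the first array that are missing from the second), computed with a lookup hash.

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3, 4);
my @array2 = (3, 4, 5);

my %count;
$count{$_}++ for @array1, @array2;

my @union        = sort { $a <=> $b } keys %count;
my @intersection = grep { $count{$_} > 1 }  @union;
my @symmetric    = grep { $count{$_} == 1 } @union;

# One-sided difference: elements of @array1 that are not in @array2.
my %in_array2 = map { $_ => 1 } @array2;
my @only_in_1 = grep { !$in_array2{$_} } @array1;

print "@union | @intersection | @symmetric | @only_in_1\n";
# prints "1 2 3 4 5 | 3 4 | 1 2 5 | 1 2"
```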
d92eb7b0 1315
65acb1b1
TC
1316=head2 How do I test whether two arrays or hashes are equal?
1317
ac9dac7f
RGS
1318The following code works for single-level arrays. It uses a
1319stringwise comparison, and does not distinguish defined versus
1320undefined empty strings. Modify if you have other needs.
65acb1b1 1321
ac9dac7f 1322 $are_equal = compare_arrays(\@frogs, \@toads);
65acb1b1 1323
ac9dac7f
RGS
1324 sub compare_arrays {
1325 my ($first, $second) = @_;
1326 no warnings; # silence spurious -w undef complaints
1327 return 0 unless @$first == @$second;
1328 for (my $i = 0; $i < @$first; $i++) {
1329 return 0 if $first->[$i] ne $second->[$i];
1330 }
1331 return 1;
1332 }
65acb1b1
TC
1333
1334For multilevel structures, you may wish to use an approach more
ac9dac7f 1335like this one. It uses the CPAN module C<FreezeThaw>:
65acb1b1 1336
ac9dac7f
RGS
1337 use FreezeThaw qw(cmpStr);
1338 @a = @b = ( "this", "that", [ "more", "stuff" ] );
65acb1b1 1339
ac9dac7f
RGS
1340 printf "a and b contain %s arrays\n",
1341 cmpStr(\@a, \@b) == 0
1342 ? "the same"
1343 : "different";
65acb1b1 1344
ac9dac7f
RGS
1345This approach also works for comparing hashes. Here we'll demonstrate
1346two different answers:
65acb1b1 1347
ac9dac7f 1348 use FreezeThaw qw(cmpStr cmpStrHard);
65acb1b1 1349
ac9dac7f
RGS
1350 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
1351 $a{EXTRA} = \%b;
1352 $b{EXTRA} = \%a;
65acb1b1 1353
ac9dac7f 1354 printf "a and b contain %s hashes\n",
65acb1b1
TC
1355 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
1356
ac9dac7f 1357 printf "a and b contain %s hashes\n",
65acb1b1
TC
1358 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
1359
1360
The first reports that both hashes contain the same data,
while the second reports that they do not. Which you prefer is left as
an exercise to the reader.
1364
68dc0745 1365=head2 How do I find the first array element for which a condition is true?
1366
49d635f9 1367To find the first array element which satisfies a condition, you can
ac9dac7f
RGS
1368use the C<first()> function in the C<List::Util> module, which comes
1369with Perl 5.8. This example finds the first element that contains
1370"Perl".
49d635f9
RGS
1371
1372 use List::Util qw(first);
197aec24 1373
49d635f9 1374 my $element = first { /Perl/ } @array;
197aec24 1375
ac9dac7f 1376If you cannot use C<List::Util>, you can make your own loop to do the
49d635f9
RGS
1377same thing. Once you find the element, you stop the loop with last.
1378
1379 my $found;
ac9dac7f 1380 foreach ( @array ) {
6670e5e7 1381 if( /Perl/ ) { $found = $_; last }
49d635f9
RGS
1382 }
1383
1384If you want the array index, you can iterate through the indices
1385and check the array element at each index until you find one
1386that satisfies the condition.
1387
    my( $found, $index ) = ( undef, -1 );
    for( my $i = 0; $i < @array; $i++ ) {
        if( $array[$i] =~ /Perl/ ) {
            $found = $array[$i];
            $index = $i;
            last;
        }
    }
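
If C<List::Util> is available, one possible shortcut is to run C<first> over the indices instead of the elements. This is a sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @array = ("Python", "Perl 5", "Ruby", "Perl 6");

# Search the indices rather than the elements.
my $index = first { $array[$_] =~ /Perl/ } 0 .. $#array;

# Index 0 is a valid answer but false, so test with defined().
print defined $index ? "found at index $index\n" : "not found\n";
# prints "found at index 1"
```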
68dc0745 1396
1397=head2 How do I handle linked lists?
1398
You usually don't need a linked list in Perl, since with
regular arrays you can C<push> and C<pop> or C<shift> and C<unshift> at either
end, or you can use C<splice> to add or remove an arbitrary number of
elements at arbitrary points. Both C<pop> and C<shift> are O(1)
operations on Perl's dynamic arrays. In the absence of shifts and
pops, C<push> in general needs to reallocate on the order of log(N)
times, and C<unshift> will need to copy pointers each time.
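
As a sketch of what that looks like in practice, here are the typical linked-list operations (add at either end, insert in the middle, delete from the middle) done on a plain array with made-up values:

```perl
use strict;
use warnings;

my @list = (1, 2, 5);

push    @list, 6;              # add at the tail:  1 2 5 6
unshift @list, 0;              # add at the head:  0 1 2 5 6
splice  @list, 3, 0, 3, 4;     # insert 3 and 4 before index 3: 0 1 2 3 4 5 6
my @removed = splice @list, 1, 2;   # remove two elements starting at index 1

print "@list\n";       # prints "0 3 4 5 6"
print "@removed\n";    # prints "1 2"
```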
68dc0745 1406
1407If you really, really wanted, you could use structures as described in
ac9dac7f
RGS
1408L<perldsc> or L<perltoot> and do just what the algorithm book tells
1409you to do. For example, imagine a list node like this:
65acb1b1 1410
ac9dac7f
RGS
1411 $node = {
1412 VALUE => 42,
1413 LINK => undef,
1414 };
65acb1b1
TC
1415
1416You could walk the list this way:
1417
ac9dac7f
RGS
1418 print "List: ";
1419 for ($node = $head; $node; $node = $node->{LINK}) {
1420 print $node->{VALUE}, " ";
1421 }
1422 print "\n";
65acb1b1 1423
a6dd486b 1424You could add to the list this way:
65acb1b1 1425
ac9dac7f
RGS
1426 my ($head, $tail);
1427 $tail = append($head, 1); # grow a new head
1428 for $value ( 2 .. 10 ) {
1429 $tail = append($tail, $value);
1430 }
65acb1b1 1431
ac9dac7f
RGS
1432 sub append {
1433 my($list, $value) = @_;
1434 my $node = { VALUE => $value };
1435 if ($list) {
1436 $node->{LINK} = $list->{LINK};
1437 $list->{LINK} = $node;
1438 }
1439 else {
1440 $_[0] = $node; # replace caller's version
1441 }
1442 return $node;
1443 }
65acb1b1
TC
1444
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1446
1447=head2 How do I handle circular lists?
1448
1449Circular lists could be handled in the traditional fashion with linked
1450lists, or you could just do something like this with an array:
1451
ac9dac7f
RGS
1452 unshift(@array, pop(@array)); # the last shall be first
1453 push(@array, shift(@array)); # and vice versa
1454
1455You can also use C<Tie::Cycle>:
1456
1457 use Tie::Cycle;
1458
1459 tie my $cycle, 'Tie::Cycle', [ qw( FFFFFF 000000 FFFF00 ) ];
1460
1461 print $cycle; # FFFFFF
1462 print $cycle; # 000000
1463 print $cycle; # FFFF00
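
If you only need to read the elements round-robin and do not want to rotate the array or load a module, another option is to keep a counter and wrap it with the modulus operator. A sketch, reusing the colors from the C<Tie::Cycle> example:

```perl
use strict;
use warnings;

my @colors = qw( FFFFFF 000000 FFFF00 );
my @picked;

for my $i ( 0 .. 4 ) {
    push @picked, $colors[ $i % @colors ];   # @colors in numeric context is 3
}
print "@picked\n";   # prints "FFFFFF 000000 FFFF00 FFFFFF 000000"
```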
68dc0745 1464
1465=head2 How do I shuffle an array randomly?
1466
45bbf655
JH
If you have either Perl 5.8.0 or later, or Scalar-List-Utils 1.03 or
later, installed, you can say:
1469
ac9dac7f 1470 use List::Util 'shuffle';
45bbf655
JH
1471
1472 @shuffled = shuffle(@list);
1473
f05bbc40 1474If not, you can use a Fisher-Yates shuffle.
5a964f20 1475
    sub fisher_yates_shuffle {
        my $deck = shift;  # $deck is a reference to an array
        return unless @$deck;  # an empty array would make the loop below misbehave
        my $i = @$deck;
        while (--$i) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }
5a964f20 1484
ac9dac7f
RGS
1485 # shuffle my mpeg collection
1486 #
1487 my @mpeg = <audio/*/*.mp3>;
1488 fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place
1489 print @mpeg;
5a964f20 1490
45bbf655 1491Note that the above implementation shuffles an array in place,
ac9dac7f 1492unlike the C<List::Util::shuffle()> which takes a list and returns
45bbf655
JH
1493a new shuffled list.
1494
You've probably seen shuffling algorithms that work using C<splice>,
randomly picking another element to swap the current element with:
68dc0745 1497
ac9dac7f
RGS
1498 srand;
1499 @new = ();
1500 @old = 1 .. 10; # just a demo
1501 while (@old) {
1502 push(@new, splice(@old, rand @old, 1));
1503 }
68dc0745 1504
ac9dac7f
RGS
1505This is bad because splice is already O(N), and since you do it N
1506times, you just invented a quadratic algorithm; that is, O(N**2).
1507This does not scale, although Perl is so efficient that you probably
1508won't notice this until you have rather largish arrays.
68dc0745 1509
1510=head2 How do I process/modify each element of an array?
1511
1512Use C<for>/C<foreach>:
1513
ac9dac7f 1514 for (@lines) {
6670e5e7
RGS
1515 s/foo/bar/; # change that word
1516 tr/XZ/ZX/; # swap those letters
ac9dac7f 1517 }
68dc0745 1518
1519Here's another; let's compute spherical volumes:
1520
ac9dac7f 1521 for (@volumes = @radii) { # @volumes has changed parts
6670e5e7
RGS
1522 $_ **= 3;
1523 $_ *= (4/3) * 3.14159; # this will be constant folded
ac9dac7f 1524 }
197aec24 1525
ac9dac7f 1526which can also be done with C<map()> which is made to transform
49d635f9
RGS
1527one list into another:
1528
1529 @volumes = map {$_ ** 3 * (4/3) * 3.14159} @radii;
68dc0745 1530
If you want to do the same thing to modify the values of a
hash, you can use the C<values> function. As of Perl 5.6
the values are not copied, so if you modify C<$orbit> (in this
case), you modify the value.
5a964f20 1535
ac9dac7f 1536 for $orbit ( values %orbits ) {
6670e5e7 1537 ($orbit **= 3) *= (4/3) * 3.14159;
ac9dac7f 1538 }
818c4caa 1539
76817d6d
JH
1540Prior to perl 5.6 C<values> returned copies of the values,
1541so older perl code often contains constructions such as
1542C<@orbits{keys %orbits}> instead of C<values %orbits> where
1543the hash is to be modified.
818c4caa 1544
68dc0745 1545=head2 How do I select a random element from an array?
1546
ac9dac7f 1547Use the C<rand()> function (see L<perlfunc/rand>):
68dc0745 1548
ac9dac7f
RGS
1549 $index = rand @array;
1550 $element = $array[$index];
68dc0745 1551
793f5136 1552Or, simply:
ac9dac7f
RGS
1553
1554 my $element = $array[ rand @array ];
5a964f20 1555
68dc0745 1556=head2 How do I permute N elements of a list?
1557
ac9dac7f
RGS
1558Use the C<List::Permutor> module on CPAN. If the list is actually an
1559array, try the C<Algorithm::Permute> module (also on CPAN). It's
1560written in XS code and is very efficient.
49d635f9
RGS
1561
1562 use Algorithm::Permute;
1563 my @array = 'a'..'d';
1564 my $p_iterator = Algorithm::Permute->new ( \@array );
1565 while (my @perm = $p_iterator->next) {
1566 print "next permutation: (@perm)\n";
ac9dac7f 1567 }
49d635f9 1568
197aec24
RGS
1569For even faster execution, you could do:
1570
ac9dac7f
RGS
1571 use Algorithm::Permute;
1572 my @array = 'a'..'d';
1573 Algorithm::Permute::permute {
1574 print "next permutation: (@array)\n";
1575 } @array;
197aec24 1576
49d635f9
RGS
1577Here's a little program that generates all permutations of
1578all the words on each line of input. The algorithm embodied
ac9dac7f 1579in the C<permute()> function is discussed in Volume 4 (still
49d635f9
RGS
1580unpublished) of Knuth's I<The Art of Computer Programming>
1581and will work on any list:
1582
    #!/usr/bin/perl -n
    # Fischer-Krause ordered permutation generator

    sub permute (&@) {
        my $code = shift;
        my @idx = 0..$#_;
        while ( $code->(@_[@idx]) ) {
            my $p = $#idx;
            --$p while $idx[$p-1] > $idx[$p];
            my $q = $p or return;
            push @idx, reverse splice @idx, $p;
            ++$q while $idx[$p-1] > $idx[$q];
            @idx[$p-1,$q]=@idx[$q,$p-1];
        }
    }

    permute { print "@_\n" } split;

=head2 How do I sort an array by (anything)?

Supply a comparison function to sort() (described in L<perlfunc/sort>):

    @list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would
sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<< <=> >>, used above, is
the numerical comparison operator.

If you have a complicated function needed to pull out the part you
want to sort on, then don't do it inside the sort function. Pull it
out first, because the sort BLOCK can be called many times for the
same element. Here's an example of how to pull out the first word
after the first number on each item, and then sort those words
case-insensitively.

    @idx = ();
    for (@data) {
        ($item) = /\d+\s*(\S+)/;
        push @idx, uc($item);
    }
    @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

which could also be written this way, using a trick
that's come to be known as the Schwartzian Transform:

    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

    @sorted = sort {
        field1($a) <=> field1($b) ||
        field2($a) cmp field2($b) ||
        field3($a) cmp field3($b)
    } @data;

This can be conveniently combined with precalculation of keys as given
above.
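
As a rough sketch of that combination, here is a Schwartzian Transform
feeding a multi-field comparison. The C<name:age:city> record layout
and the sample data are invented for illustration. Each record is
split exactly once, and the sort block only looks at the cached fields.

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age:city".
my @data = ( "carol:30:Oslo", "alice:25:Rome", "bob:30:Bern", "dave:25:Rome" );

my @sorted =
    map  { $_->[0] }                      # recover the original string
    sort {
        $a->[2] <=> $b->[2]               # age, numerically
            ||
        $a->[3] cmp $b->[3]               # then city, as a string
            ||
        $a->[1] cmp $b->[1]               # then name, as a string
    }
    map  { [ $_, split /:/ ] } @data;     # [ original, name, age, city ]

print "$_\n" for @sorted;
```

The extra memory for the temporary array references is usually a fair
trade when extracting the fields is the expensive part.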

See the F<sort> article in the "Far More Than You Ever Wanted
To Know" collection in http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz for
more about this approach.

See also the question later in L<perlfaq4> on sorting hashes.

=head2 How do I manipulate arrays of bits?

Use C<pack()> and C<unpack()>, or else C<vec()> and the bitwise
operations.

For example, this sets bit N of C<$vec> for every integer N stored
in C<@ints>:

    $vec = '';
    foreach(@ints) { vec($vec,$_,1) = 1 }

Here's how, given a vector in C<$vec>, you can get those bits into your
C<@ints> array:

    sub bitvec_to_list {
        my $vec = shift;
        my @ints;
        # Find null-byte density then select best algorithm
        if ($vec =~ tr/\0// / length $vec > 0.95) {
            use integer;
            my $i;

            # This method is faster with mostly null-bytes
            while($vec =~ /[^\0]/g ) {
                $i = -9 + 8 * pos $vec;
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
                push @ints, $i if vec($vec, ++$i, 1);
            }
        }
        else {
            # This method is a fast general algorithm
            use integer;
            my $bits = unpack "b*", $vec;
            push @ints, 0 if $bits =~ s/^(\d)// && $1;
            push @ints, pos $bits while($bits =~ /1/g);
        }

        return \@ints;
    }

This method gets faster the more sparse the bit vector is.
(Courtesy of Tim Bunce and Winfried Koenig.)

You can make the while loop a lot shorter with this suggestion
from Benjamin Goldberg:

    while($vec =~ /[^\0]+/g ) {
        push @ints, grep vec($vec, $_, 1), $-[0] * 8 .. $+[0] * 8;
    }

Or use the CPAN module C<Bit::Vector>:

    $vector = Bit::Vector->new($num_of_bits);
    $vector->Index_List_Store(@ints);
    @ints = $vector->Index_List_Read();

C<Bit::Vector> provides efficient methods for bit vectors, sets of
small integers, and "big int" math.

Here's a more extensive illustration using vec():

    # vec demo
    $vector = "\xff\x0f\xef\xfe";
    print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
        unpack("N", $vector), "\n";
    $is_set = vec($vector, 23, 1);
    print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
    pvec($vector);

    set_vec(1,1,1);
    set_vec(3,1,1);
    set_vec(23,1,1);

    set_vec(3,1,3);
    set_vec(3,2,3);
    set_vec(3,4,3);
    set_vec(3,4,7);
    set_vec(3,8,3);
    set_vec(3,8,7);

    set_vec(0,32,17);
    set_vec(1,32,17);

    sub set_vec {
        my ($offset, $width, $value) = @_;
        my $vector = '';
        vec($vector, $offset, $width) = $value;
        print "offset=$offset width=$width value=$value\n";
        pvec($vector);
    }

    sub pvec {
        my $vector = shift;
        my $bits = unpack("b*", $vector);
        my $i = 0;
        my $BASE = 8;

        print "vector length in bytes: ", length($vector), "\n";
        @bytes = unpack("A8" x length($vector), $bits);
        print "bits are: @bytes\n\n";
    }

=head2 Why does defined() return true on empty arrays and hashes?

The short story is that you should probably only use defined on scalars or
functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
in the 5.004 release or later of Perl for more detail.
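
If what you really want to know is whether an aggregate has anything
in it, a minimal sketch is to test the array or hash itself in boolean
context. The variable names below are only placeholders.

```perl
use strict;
use warnings;

my @array = ();
my %hash  = ();

# An aggregate in boolean (scalar) context is true only when non-empty.
print "array is empty\n" unless @array;
print "hash is empty\n"  unless %hash;

push @array, 'x';
$hash{key} = 'value';

print "array has ", scalar(@array), " element(s)\n" if @array;
print "hash has ", scalar(keys %hash), " key(s)\n"  if %hash;
```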

=head1 Data: Hashes (Associative Arrays)

=head2 How do I process an entire hash?

Use the each() function (see L<perlfunc/each>) if you don't care
whether it's sorted:

    while ( ($key, $value) = each %hash) {
        print "$key = $value\n";
    }

If you want it sorted, you'll have to use foreach() on the result of
sorting the keys as shown in an earlier question.

=head2 What happens if I add or remove keys from a hash while iterating over it?

(contributed by brian d foy)

The easy answer is "Don't do that!"

If you iterate through the hash with each(), you can delete the key
most recently returned without worrying about it. If you delete or add
other keys, the iterator may skip or double up on them since perl
may rearrange the hash table. See the
entry for C<each()> in L<perlfunc>.
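
Here is a small sketch of both situations, using an invented C<%stock>
hash. The first loop makes only the one change that is safe during
each(). The second loop wants to add and remove keys, so it walks the
list returned by keys() instead, which is built before the loop starts.

```perl
use strict;
use warnings;

# Hypothetical data: remove every entry whose value is zero.
my %stock = ( apples => 3, pears => 0, plums => 7, figs => 0 );

while ( my ($item, $count) = each %stock ) {
    # Deleting the key that each() just returned is the one safe change.
    delete $stock{$item} if $count == 0;
}

# To add keys or delete other keys, walk a snapshot of the key list
# instead, so that each()'s iterator is never involved.
for my $item ( keys %stock ) {
    $stock{"$item (checked)"} = delete $stock{$item};
}

print join( ", ", sort keys %stock ), "\n";
```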

=head2 How do I look up a hash element by value?

Create a reverse hash:

    %by_value = reverse %by_key;
    $key = $by_value{$value};

That's not particularly efficient. It would be more space-efficient
to use:

    while (($key, $value) = each %by_key) {
        $by_value{$value} = $key;
    }

If your hash could have repeated values, the methods above will only find
one of the associated keys. This may or may not worry you. If it does
worry you, you can always reverse the hash into a hash of arrays instead:

    while (($key, $value) = each %by_key) {
        push @{$key_list_by_value{$value}}, $key;
    }

=head2 How can I know how many entries are in a hash?

If you mean how many keys, then all you have to do is
use the keys() function in a scalar context:

    $num_keys = keys %hash;

The keys() function also resets the iterator, which means that you may
see strange results if you use this between uses of other hash operators
such as each().

=head2 How do I sort a hash (optionally by value instead of key)?

(contributed by brian d foy)

To sort a hash, start with the keys. In this example, we give the list of
keys to the sort function which then compares them ASCIIbetically (which
might be affected by your locale settings). The output list has the keys
in ASCIIbetical order. Once we have the keys, we can go through them to
create a report which lists the keys in ASCIIbetical order.

    my @keys = sort { $a cmp $b } keys %hash;

    foreach my $key ( @keys )
    {
        printf "%-20s %6d\n", $key, $hash{$key};
    }

We could get more fancy in the C<sort()> block though. Instead of
comparing the keys, we can compute a value with them and use that
value as the comparison.

For instance, to make our report order case-insensitive, we use
the C<\L> sequence in a double-quoted string to make everything
lowercase. The C<sort()> block then compares the lowercased
values to determine in which order to put the keys.

    my @keys = sort { "\L$a" cmp "\L$b" } keys %hash;

Note: if the computation is expensive or the hash has many elements,
you may want to look at the Schwartzian Transform to cache the
computation results.

If we want to sort by the hash value instead, we use the hash key
to look it up. We still get out a list of keys, but this time they
are ordered by their value.

    my @keys = sort { $hash{$a} <=> $hash{$b} } keys %hash;

From there we can get more complex. If the hash values are the same,
we can provide a secondary sort on the hash key.

    my @keys = sort {
        $hash{$a} <=> $hash{$b}
            or
        "\L$a" cmp "\L$b"
    } keys %hash;

=head2 How can I always keep my hash sorted?
X<hash tie sort DB_File Tie::IxHash>

You can look into using the C<DB_File> module and C<tie()> using the
C<$DB_BTREE> hash bindings as documented in L<DB_File/"In Memory
Databases">. The C<Tie::IxHash> module from CPAN might also be
instructive. Although this does keep your hash sorted, you might not
like the slowdown you suffer from the tie interface. Are you sure you
need to do this? :)

=head2 What's the difference between "delete" and "undef" with hashes?

Hashes contain pairs of scalars: the first is the key, the
second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string,
number, or reference. If a key C<$key> is present in
%hash, C<exists($hash{$key})> will return true. The value
for a given key can be C<undef>, in which case
C<$hash{$key}> will be C<undef> while C<exists $hash{$key}>
will return true. This corresponds to (C<$key>, C<undef>)
being in the hash.

Pictures help... here's the C<%hash> table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $hash{'a'}                       is true
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is true
    exists $hash{'a'}                is true (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is true

If you now say

    undef $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is FALSE
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is FALSE
    exists $hash{'a'}                is true (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $hash{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $hash{'a'}                       is false
    $hash{'d'}                       is false
    defined $hash{'d'}               is true
    defined $hash{'a'}               is false
    exists $hash{'a'}                is FALSE (Perl5 only)
    grep ($_ eq 'a', keys %hash)     is FALSE

See, the whole entry is gone!
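
If you'd rather watch it happen than read tables, this short script
walks through the same steps on the same four-entry hash. It is only a
sketch of the behavior described above, printing 1 or 0 for each test.

```perl
use strict;
use warnings;

my %hash = ( a => 3, x => 7, d => 0, e => 2 );

undef $hash{a};     # the key stays, the value becomes undef
print "after undef:  exists=",  ( exists  $hash{a} ? 1 : 0 ),
      " defined=",              ( defined $hash{a} ? 1 : 0 ), "\n";

delete $hash{a};    # the whole entry is removed
print "after delete: exists=",  ( exists  $hash{a} ? 1 : 0 ),
      " defined=",              ( defined $hash{a} ? 1 : 0 ), "\n";

print "remaining keys: ", join( " ", sort keys %hash ), "\n";
```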

=head2 Why don't my tied hashes make the defined/exists distinction?

This depends on the tied hash's implementation of EXISTS().
For example, there isn't the concept of undef with hashes
that are tied to DBM* files. It also means that exists() and
defined() do the same thing with a DBM* file, and what they
end up doing is not what they do with ordinary hashes.

=head2 How do I reset an each() operation part-way through?

Using C<keys %hash> in scalar context returns the number of keys in
the hash I<and> resets the iterator associated with the hash. You may
need to do this if you use C<last> to exit a loop early so that when
you re-enter it, the hash iterator has been reset.
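
A minimal sketch of that situation follows. The hash contents and the
C<$seen> counter are made up for the demonstration. Without the
C<keys> call in the middle, the second loop would pick up where the
first one left off.

```perl
use strict;
use warnings;

my %hash = ( one => 1, two => 2, three => 3 );

# Leave the loop after the first pair; the iterator stays where it stopped.
while ( my ($key, $value) = each %hash ) {
    last;
}

my $count = keys %hash;    # scalar context: counts keys and resets the iterator

# This loop now starts from the beginning and sees every pair.
my $seen = 0;
while ( my ($key, $value) = each %hash ) {
    $seen++;
}
print "saw $seen of $count pairs\n";
```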

=head2 How can I get the unique keys from two hashes?

First you extract the keys from the hashes into lists, then solve
the "removing duplicates" problem described above. For example:

    %seen = ();
    for $element (keys(%foo), keys(%bar)) {
        $seen{$element}++;
    }
    @uniq = keys %seen;

Or more succinctly:

    @uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

    %seen = ();
    while (defined ($key = each %foo)) {
        $seen{$key}++;
    }
    while (defined ($key = each %bar)) {
        $seen{$key}++;
    }
    @uniq = keys %seen;

=head2 How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else
get the MLDBM (which uses Data::Dumper) module from CPAN and layer
it on top of either DB_File or GDBM_File.

=head2 How can I make my hash remember the order I put elements into it?

Use the C<Tie::IxHash> module from CPAN.

    use Tie::IxHash;

    tie my %myhash, 'Tie::IxHash';

    for (my $i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }

    my @keys = keys %myhash;
    # @keys = (0,1,2,3,...)

=head2 Why does passing a subroutine an undefined element in a hash create it?

If you say something like:

    somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence
whether you store something there or not. That's because functions
get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
it has to be ready to write it back into the caller's version.

This has been fixed as of Perl5.004.

Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there. This is different from
awk's behavior.
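
On a perl with that fix, you can check the behavior yourself with a
sketch like this one. The C<peek> and C<poke> subroutines are
hypothetical names: the first only reads its argument, the second
assigns through C<$_[0]>.

```perl
use strict;
use warnings;

my %hash;

sub peek { my ($value) = @_; return defined $value }   # only reads its argument
sub poke { $_[0] = "set by poke" }                      # writes through the alias

peek( $hash{first} );
print "after peek: ", ( exists $hash{first} ? "exists" : "absent" ), "\n";

my $copy = $hash{second};      # plain read access
print "after read: ", ( exists $hash{second} ? "exists" : "absent" ), "\n";

poke( $hash{third} );
print "after poke: ", ( exists $hash{third} ? "exists" : "absent" ), "\n";
```

Only the element that was actually assigned to should end up in the
hash.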

=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Usually a hash ref, perhaps like this:

    $record = {
        NAME   => "Jason",
        EMPNO  => 132,
        TITLE  => "deputy peon",
        AGE    => 23,
        SALARY => 37_000,
        PALS   => [ "Norbert", "Rhys", "Phineas"],
    };

References are documented in L<perlref> and L<perlreftut>.
Examples of complex data structures are given in L<perldsc> and
L<perllol>. Examples of structures and object-oriented classes are
in L<perltoot>.
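
To give a feel for how such a record is used, here is a short sketch
that reads and updates the fields. The second staff record and the
extra pal are invented for the example.

```perl
use strict;
use warnings;

my $record = {
    NAME   => "Jason",
    EMPNO  => 132,
    PALS   => [ "Norbert", "Rhys", "Phineas" ],
};

print "name: $record->{NAME}\n";              # one field
print "first pal: $record->{PALS}[0]\n";       # element of the nested array
print "pal count: ", scalar @{ $record->{PALS} }, "\n";

$record->{AGE} = 23;                           # add a field later
push @{ $record->{PALS} }, "Zelda";            # grow the nested array

# An array of such records behaves like an array of structs.
my @staff = ( $record, { NAME => "Rhys", EMPNO => 133, PALS => [] } );
print "$_->{EMPNO}: $_->{NAME}\n" for @staff;
```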

=head2 How can I use a reference as a hash key?

(contributed by brian d foy)

Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, C<HASH(0xDEADBEEF)>). From there you can't get
back the reference from the stringified form, at least without doing
some extra work on your own. Also remember that hash keys must be
unique, but two different variables can store the same reference (and
those variables can change later).

The C<Tie::RefHash> module, which is distributed with perl, might be
what you want. It handles that extra work.
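
The difference shows up as soon as you ask for the keys back. This
sketch stores the same array reference in a plain hash and in a hash
tied to C<Tie::RefHash>. The C<$point> data is only an illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my $point = [ 3, 4 ];     # hypothetical data used as a key

my %plain;
$plain{$point} = "origin distance 5";
my ($plain_key) = keys %plain;
print "plain key is ", ( ref $plain_key ? "a reference" : "a string: $plain_key" ), "\n";

tie my %by_ref, 'Tie::RefHash';
$by_ref{$point} = "origin distance 5";
my ($ref_key) = keys %by_ref;
print "tied key is ", ( ref $ref_key ? "a reference to (@$ref_key)" : "a string" ), "\n";
```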

=head1 Data: Misc

=head2 How do I handle binary data correctly?

Perl is binary clean, so it can handle binary data just fine.
On Windows or DOS, however, you have to use C<binmode> for binary
files to avoid conversions for line endings. In general, you should
use C<binmode> any time you want to work with binary data.

Also see L<perlfunc/"binmode"> or L<perlopentut>.

If you're concerned about 8-bit textual data then see L<perllocale>.
If you want to deal with multibyte characters, however, there are
some gotchas. See the section on Regular Expressions.
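
As an illustration of the C<binmode> advice, this sketch writes a few
awkward bytes to a temporary file and reads them back. The payload is
arbitrary, and the temporary file comes from the standard
C<File::Temp> module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes that a text-mode layer could alter on some platforms.
my $payload = "\x00\x0d\x0a\x1a\xff";

my ($out, $path) = tempfile( UNLINK => 1 );
binmode $out;                       # write the bytes exactly as given
print {$out} $payload;
close $out or die "close failed: $!";

open my $in, '<', $path or die "can't open $path: $!";
binmode $in;                        # read them back without translation
my $read = do { local $/; <$in> };  # slurp the whole file
close $in;

printf "wrote %d bytes, read %d bytes, %s\n",
    length $payload, length $read,
    ( $read eq $payload ? "identical" : "different" );
```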

=head2 How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don't care about IEEE notations like "NaN" or
"Infinity", you probably just want to use a regular expression.

    if (/\D/)            { print "has nondigits\n" }
    if (/^\d+$/)         { print "is a whole number\n" }
    if (/^-?\d+$/)       { print "is an integer\n" }
    if (/^[+-]?\d+$/)    { print "is a +/- integer\n" }
    if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
    if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
    if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
                         { print "a C float\n" }

There are also some commonly used modules for the task.
L<Scalar::Util> (distributed with 5.8) provides access to perl's
internal function C<looks_like_number> for determining whether a
variable looks like a number. L<Data::Types> exports functions that
validate data types using both the above and other regular
expressions. Thirdly, there is C<Regexp::Common> which has regular
expressions to match various types of numbers. Those three modules are
available from the CPAN.
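
For instance, a quick way to try C<looks_like_number> on a handful of
strings might look like this. The candidate values are made up, and
the function answers the question "would perl treat this as a number?"
rather than matching any one of the patterns above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical inputs; the last three are expected to be rejected.
my @candidates = ( "42", "-3.14", "1e5", " 7", "0x1F", "12abc", "" );

for my $value (@candidates) {
    printf "%-8s %s\n", "'$value'",
        looks_like_number($value) ? "looks like a number" : "does not";
}
```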

If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a
C<getnum> wrapper function for more convenient access. This function
takes a string and returns the number it found, or C<undef> for input
that isn't a C float. The C<is_numeric> function is a front end to
C<getnum> if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;
        $str =~ s/\s+$//;
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;
        }
        else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out the L<String::Scanf> module on the CPAN
instead. The C<POSIX> module (part of the standard Perl distribution)
provides the C<strtod> and C<strtol> functions for converting strings
to doubles and longs, respectively.

=head2 How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules.
See L<AnyDBM_File>. More generically, you should consult the C<FreezeThaw>
or C<Storable> modules from CPAN. Starting from Perl 5.8 C<Storable> is part
of the standard distribution. Here's one example using C<Storable>'s C<store>
and C<retrieve> functions:

    use Storable;
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash
2154=head2 How do I print out or copy a recursive data structure?
2155
ac9dac7f
RGS
2156The C<Data::Dumper> module on CPAN (or the 5.005 release of Perl) is great
2157for printing out data structures. The C<Storable> module on CPAN (or the
6f82c03a
EM
21585.8 release of Perl), provides a function called C<dclone> that recursively
2159copies its argument.
65acb1b1 2160
ac9dac7f
RGS
2161 use Storable qw(dclone);
2162 $r2 = dclone($r1);
68dc0745 2163
ac9dac7f 2164Where C<$r1> can be a reference to any kind of data structure you'd like.
65acb1b1
TC
2165It will be deeply copied. Because C<dclone> takes and returns references,
2166you'd have to add extra punctuation if you had a hash of arrays that
2167you wanted to copy.
68dc0745 2168
ac9dac7f 2169 %newhash = %{ dclone(\%oldhash) };
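
To see why the deep copy matters, compare it with a plain assignment
in this sketch. The hash contents are invented. A plain assignment
copies only the top-level references, so the nested arrays stay
shared.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %oldhash = ( primes => [ 2, 3, 5 ], evens => [ 2, 4, 6 ] );

my %shallow = %oldhash;                   # copies only the top level
my %deep    = %{ dclone( \%oldhash ) };   # copies the nested arrays too

push @{ $oldhash{primes} }, 7;            # change the original afterwards

print "original: @{ $oldhash{primes} }\n";
print "shallow:  @{ $shallow{primes} }\n";   # shares the array, sees the 7
print "deep:     @{ $deep{primes} }\n";      # independent copy
```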

=head2 How do I define methods for every class/object?

Use the C<UNIVERSAL> class (see L<UNIVERSAL>).

=head2 How do I verify a credit card checksum?

Get the C<Business::CreditCard> module from CPAN.
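
If you only need the checksum itself and can't install a module, the
check digit on card numbers follows the Luhn (mod 10) algorithm. The
sketch below is a standalone illustration of that algorithm, not the
module's code. C<luhn_ok> is a hypothetical helper name, and a passing
checksum says nothing about whether the card is real.

```perl
use strict;
use warnings;

# luhn_ok is a hypothetical helper name, not part of any module.
sub luhn_ok {
    my ($number) = @_;
    $number =~ s/[\s-]//g;                 # allow spaces and dashes
    return 0 unless $number =~ /^\d+$/;

    my $sum = 0;
    my @digits = reverse split //, $number;
    for my $i ( 0 .. $#digits ) {
        my $d = $digits[$i];
        if ( $i % 2 ) {                    # every second digit from the right
            $d *= 2;
            $d -= 9 if $d > 9;
        }
        $sum += $d;
    }
    return $sum % 10 == 0 ? 1 : 0;
}

print luhn_ok("4111 1111 1111 1111") ? "valid\n" : "invalid\n";
print luhn_ok("4111 1111 1111 1112") ? "valid\n" : "invalid\n";
```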

=head2 How do I pack arrays of doubles or floats for XS code?

The kgbpack.c code in the C<PGPLOT> module on CPAN does just this.
If you're doing a lot of float or double processing, consider using
the C<PDL> module from CPAN instead--it makes number-crunching easy.
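
On the Perl side, one possible approach (an assumption here, not what
kgbpack.c does) is to build the buffer with C<pack>, whose C<d> and
C<f> templates produce native doubles and floats. XS code could then
treat the string's bytes as a C array. The sample values are
arbitrary.

```perl
use strict;
use warnings;

my @samples = ( 1.5, -2.25, 3.0 );        # hypothetical data

# "d*" packs each value as a native double, "f*" as a native float.
my $doubles = pack "d*", @samples;
my $floats  = pack "f*", @samples;

printf "%d doubles in %d bytes\n", scalar @samples, length $doubles;
printf "%d floats in %d bytes\n",  scalar @samples, length $floats;

# Unpacking with the same template recovers the values.
my @back = unpack "d*", $doubles;
print "round trip: @back\n";
```

The byte counts depend on the platform's C<double> and C<float> sizes,
so check them rather than hard-coding them.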

=head1 REVISION

Revision: $Revision: 6816 $

Date: $Date: 2006-08-20 21:20:03 +0200 (dim, 20 août 2006) $

See L<perlfaq> for source control details and availability.

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2006 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.