68dc0745 1=head1 NAME
2
65acb1b1 3perlfaq4 - Data Manipulation ($Revision: 1.40 $, $Date: 1999/01/08 04:26:39 $)
68dc0745 4
5=head1 DESCRIPTION
6
This section of the FAQ answers questions related to the manipulation
of data as numbers, dates, strings, arrays, hashes, and miscellaneous
data issues.
10
11=head1 Data: Numbers
12
46fc3d4c 13=head2 Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
14
The infinite set that a mathematician thinks of as the real numbers can
only be approximated on a computer, since the computer only has a finite
number of bits to store an infinite number of, um, numbers.
18
46fc3d4c 19Internally, your computer represents floating-point numbers in binary.
20Floating-point numbers read in from a file or appearing as literals
21in your program are converted from their decimal floating-point
46fc3d4c 22representation (eg, 19.95) to the internal binary representation.
23
24However, 19.95 can't be precisely represented as a binary
25floating-point number, just like 1/3 can't be exactly represented as a
26decimal floating-point number. The computer's binary representation
27of 19.95, therefore, isn't exactly 19.95.
28
When a floating-point number gets printed, the binary floating-point
representation is converted back to decimal. These decimal numbers
are displayed in either the format you specify with printf(), or the
current output format for numbers (see L<perlvar/"$#">) if you use
print. C<$#> has a different default value in Perl5 than it did in
Perl4. Changing C<$#> yourself is deprecated.
35
36This affects B<all> computer languages that represent decimal
37floating-point numbers in binary, not just Perl. Perl provides
38arbitrary-precision decimal numbers with the Math::BigFloat module
39(part of the standard Perl distribution), but mathematical operations
40are consequently slower.
41
42To get rid of the superfluous digits, just use a format (eg,
43C<printf("%.2f", 19.95)>) to get the required precision.
65acb1b1 44See L<perlop/"Floating-point Arithmetic">.
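
As a quick illustration of the points above, the following sketch prints
the stored value of 19.95 with more digits than it can honestly carry,
then rounds it for display with sprintf(). The variable names are made up
for this example, and the exact trailing digits depend on your platform's
floating-point format.

```perl
use strict;
use warnings;

my $price = 19.95;

# Ask for far more digits than a double holds to expose the
# binary approximation (the exact digits vary by platform).
printf "stored:  %.20f\n", $price;

# Round for display; sprintf() returns the string instead of printing it.
my $shown = sprintf("%.2f", $price);
print "display: $shown\n";

# Compare floating-point results with a tolerance rather than ==.
my $sum = 0.1 + 0.2;
my $close_enough = abs($sum - 0.3) < 1e-9;
print "0.1 + 0.2 is ", ($close_enough ? "close to" : "far from"), " 0.3\n";
```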
46fc3d4c 45
68dc0745 46=head2 Why isn't my octal data interpreted correctly?
47
48Perl only understands octal and hex numbers as such when they occur
49as literals in your program. If they are read in from somewhere and
50assigned, no automatic conversion takes place. You must explicitly
51use oct() or hex() if you want the values converted. oct() interprets
52both hex ("0x350") numbers and octal ones ("0350" or even without the
53leading "0", like "377"), while hex() only converts hexadecimal ones,
54with or without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
55
56This problem shows up most often when people try using chmod(), mkdir(),
57umask(), or sysopen(), which all want permissions in octal.
58
59 chmod(644, $file); # WRONG -- perl -w catches this
60 chmod(0644, $file); # right
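
The sketch below shows the conversion applied to strings that arrive at
run time, for example from a configuration file. The variable names and
sample values are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Strings such as these might come from a file or from user input.
my $mode_string = "0644";
my $hex_string  = "0x350";

my $mode     = oct($mode_string);   # octal string to number: 420
my $from_hex = oct($hex_string);    # oct() also understands a 0x prefix: 848
my $plain    = hex("ff");           # hex() accepts digits without 0x: 255

printf "%s => %d (printed back in octal: %o)\n", $mode_string, $mode, $mode;
printf "%s => %d\n", $hex_string, $from_hex;
printf "ff => %d\n", $plain;

# chmod($mode, $file) would now receive the intended permissions.
```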
61
65acb1b1 62=head2 Does Perl have a round() function? What about ceil() and floor()? Trig functions?
68dc0745 63
64Remember that int() merely truncates toward 0. For rounding to a
65certain number of digits, sprintf() or printf() is usually the easiest
66route.
67
68 printf("%.3f", 3.1415926535); # prints 3.142
68dc0745 69
70The POSIX module (part of the standard perl distribution) implements
71ceil(), floor(), and a number of other mathematical and trigonometric
72functions.
73
74 use POSIX;
75 $ceil = ceil(3.5); # 4
76 $floor = floor(3.5); # 3
77
46fc3d4c 78In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex
79module. With 5.004, the Math::Trig module (part of the standard perl
80distribution) implements the trigonometric functions. Internally it
81uses the Math::Complex module and some functions can break out from
82the real axis into the complex plane, for example the inverse sine of
832.
68dc0745 84
85Rounding in financial applications can have serious implications, and
86the rounding method used should be specified precisely. In these
87cases, it probably pays not to trust whichever system rounding is
88being used by Perl, but to instead implement the rounding function you
89need yourself.
90
91To see why, notice how you'll still have an issue on half-way-point
92alternation:
93
94 for ($i = 0; $i < 1.01; $i += 0.05) { printf "%.1f ",$i}
95
96 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 0.6 0.7 0.7
97 0.8 0.8 0.9 0.9 1.0 1.0
98
99Don't blame Perl. It's the same as in C. IEEE says we have to do this.
100Perl numbers whose absolute values are integers under 2**31 (on 32 bit
101machines) will work pretty much like mathematical integers. Other numbers
102are not guaranteed.
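
For the financial case mentioned above, one common way to control rounding
is to keep amounts as integer cents and do the rounding with integer
arithmetic. The following is a minimal sketch under that assumption;
C<div_round_half_up> is a hypothetical helper name, it handles only
non-negative amounts and a positive divisor, and real applications should
follow whatever rounding rule their requirements specify.

```perl
use strict;
use warnings;

# Divide a non-negative integer amount by a positive integer divisor,
# rounding halves upward, using only integer arithmetic.
sub div_round_half_up {
    my ($amount, $divisor) = @_;
    use integer;                      # integer division inside this sub
    return (2 * $amount + $divisor) / (2 * $divisor);
}

# Split 1005 cents between two people: 502.5 rounds up to 503.
print div_round_half_up(1005, 2), "\n";   # 503
# Split 1000 cents three ways: 333.33... rounds down to 333.
print div_round_half_up(1000, 3), "\n";   # 333
```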
103
68dc0745 104=head2 How do I convert bits into ints?
105
To turn a string of 1s and 0s like C<10110110> into a scalar containing
its binary value, use the pack() function (documented in
L<perlfunc/"pack">). pack() returns a one-byte string, so take its
ordinal value with ord() to get the number:

    $decimal = ord(pack('B8', '10110110'));
111
112Here's an example of going the other way:
113
114 $binary_string = join('', unpack('B*', "\x29"));
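
In Perl 5.6 and later, oct() with a C<0b> prefix and sprintf() with the
C<%b> format offer another route between bit strings and numbers. This is
a sketch of that alternative rather than a replacement for the pack()
approach; the variable names are illustrative.

```perl
use strict;
use warnings;

my $bits = '10110110';

# oct() treats a leading "0b" as binary notation.
my $number = oct("0b$bits");          # 182

# %b formats a number in binary; %08b pads it to eight digits.
my $back = sprintf("%08b", 0x29);     # "00101001"

print "$bits => $number\n";
print "0x29 => $back\n";
```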
115
116=head2 Why doesn't & work the way I want it to?
117
118The behavior of binary arithmetic operators depends on whether they're
119used on numbers or strings. The operators treat a string as a series
120of bits and work with that (the string C<"3"> is the bit pattern
121C<00110011>). The operators work with the binary form of a number
122(the number C<3> is treated as the bit pattern C<00000011>).
123
So, saying C<11 & 3> performs the "and" operation on numbers (yielding
C<3>). Saying C<"11" & "3"> performs the "and" operation on strings
(yielding C<"1">).
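
The difference is easy to check directly. This sketch assumes the
traditional operator behavior described here (newer Perls can change it
with the C<bitwise> feature); the variable names are made up for the
example.

```perl
use strict;
use warnings;

my $num_and = 11 & 3;        # numeric "and": 0b1011 & 0b0011 == 3
my $str_and = "11" & "3";    # string "and": compares bytes, giving "1"

# Input read from a file or STDIN is a string; adding 0 forces a number.
my ($x, $y) = ("11", "3");
my $forced = ($x + 0) & ($y + 0);   # 3

print "numbers: $num_and\n";
print "strings: $str_and\n";
print "forced:  $forced\n";
```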
127
128Most problems with C<&> and C<|> arise because the programmer thinks
129they have a number but really it's a string. The rest arise because
130the programmer says:
131
132 if ("\020\020" & "\101\101") {
133 # ...
134 }
135
136but a string consisting of two null bytes (the result of C<"\020\020"
137& "\101\101">) is not a false value in Perl. You need:
138
139 if ( ("\020\020" & "\101\101") !~ /[^\000]/) {
140 # ...
141 }
142
68dc0745 143=head2 How do I multiply matrices?
144
145Use the Math::Matrix or Math::MatrixReal modules (available from CPAN)
146or the PDL extension (also available from CPAN).
147
148=head2 How do I perform an operation on a series of integers?
149
150To call a function on each element in an array, and collect the
151results, use:
152
153 @results = map { my_func($_) } @array;
154
155For example:
156
157 @triple = map { 3 * $_ } @single;
158
159To call a function on each element of an array, but ignore the
160results:
161
162 foreach $iterator (@array) {
65acb1b1 163 some_func($iterator);
68dc0745 164 }
165
166To call a function on each integer in a (small) range, you B<can> use:
167
65acb1b1 168 @results = map { some_func($_) } (5 .. 25);
68dc0745 169
170but you should be aware that the C<..> operator creates an array of
171all integers in the range. This can take a lot of memory for large
172ranges. Instead use:
173
174 @results = ();
175 for ($i=5; $i < 500_005; $i++) {
65acb1b1 176 push(@results, some_func($i));
68dc0745 177 }
178
179=head2 How can I output Roman numerals?
180
181Get the http://www.perl.com/CPAN/modules/by-module/Roman module.
182
183=head2 Why aren't my random numbers random?
184
185If you're using a version of Perl before 5.004, you must call C<srand>
186once at the start of your program to seed the random number generator.
1875.004 and later automatically call C<srand> at the beginning. Don't
188call C<srand> more than once--you make your numbers less random, rather
189than more.
92c2ed05 190
Computers are good at being predictable and bad at being random
(despite appearances caused by bugs in your programs :-).
http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy of Tom
Phoenix, talks more about this. John von Neumann said, ``Anyone who
attempts to generate random numbers by deterministic means is, of
course, living in a state of sin.''
197
198If you want numbers that are more random than C<rand> with C<srand>
199provides, you should also check out the Math::TrulyRandom module from
200CPAN. It uses the imperfections in your system's timer to generate
201random numbers, but this takes quite a while. If you want a better
92c2ed05 202pseudorandom generator than comes with your operating system, look at
65acb1b1 203``Numerical Recipes in C'' at http://www.nr.com/ .
68dc0745 204
205=head1 Data: Dates
206
207=head2 How do I find the week-of-the-year/day-of-the-year?
208
209The day of the year is in the array returned by localtime() (see
210L<perlfunc/"localtime">):
211
212 $day_of_year = (localtime(time()))[7];
213
214or more legibly (in 5.004 or higher):
215
216 use Time::localtime;
217 $day_of_year = localtime(time())->yday;
218
219You can find the week of the year by dividing this by 7:
220
221 $week_of_year = int($day_of_year / 7);
222
223Of course, this believes that weeks start at zero. The Date::Calc
224module from CPAN has a lot of date calculation functions, including
5e3006a4 225day of the year, week of the year, and so on. Note that not
226all businesses consider ``week 1'' to be the same; for example,
227American businesses often consider the first week with a Monday
228in it to be Work Week #1, despite ISO 8601, which considers
229WW1 to be the first week with a Thursday in it.
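
If you would rather not do the arithmetic by hand, strftime() from the
standard POSIX module can format the day and week of the year. The sketch
below uses a fixed timestamp (the one from the Y2K answer later in this
document) so the result is repeatable; C<%j> is the day of the year
starting at 001, while C<%U> and C<%W> count weeks starting on Sunday and
Monday respectively. Support for other week formats varies by platform.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Tue Nov 13 01:00:00 2001 UTC, used here so the output is repeatable.
my @when = gmtime(1005613200);

my $yday        = $when[7];                     # 0-based day of year
my $day_of_year = strftime("%j", @when);        # 1-based, zero padded
my $week_sun    = strftime("%U", @when);        # weeks start on Sunday
my $week_mon    = strftime("%W", @when);        # weeks start on Monday

print "yday=$yday day-of-year=$day_of_year\n";
print "week (Sunday first)=$week_sun, week (Monday first)=$week_mon\n";
```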
68dc0745 230
92c2ed05 231=head2 How can I compare two dates and find the difference?
68dc0745 232
If you're storing your dates as epoch seconds then simply subtract one
from the other. If you've got a structured date (distinct year, day,
month, hour, minute, seconds values) then use either the Date::Manip
or the Date::Calc module from CPAN.
68dc0745 237
238=head2 How can I take a string and turn it into epoch seconds?
239
240If it's a regular enough string that it always has the same format,
241you can split it up and pass the parts to C<timelocal> in the standard
242Time::Local module. Otherwise, you should look into the Date::Calc
243and Date::Manip modules from CPAN.
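
As a sketch of the fixed-format case, the following parses a timestamp of
the form C<YYYY-MM-DD hh:mm:ss> with a regular expression and hands the
pieces to timegm(), the UTC counterpart of timelocal() in Time::Local. The
input format, the helper name C<parse_utc>, and the variable names are
assumptions for the example. Once two dates are in epoch seconds, their
difference is a plain subtraction.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

sub parse_utc {
    my ($stamp) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognized timestamp: $stamp";
    # Months are 0-based for timegm(); a four-digit year is passed as-is.
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

my $start = parse_utc("2001-11-13 01:00:00");
my $end   = parse_utc("2001-11-14 13:00:00");

print "start is $start epoch seconds\n";
printf "difference: %.1f days\n", ($end - $start) / 86400;
```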
68dc0745 244
245=head2 How can I find the Julian Day?
246
Neither Date::Manip nor Date::Calc deals with Julian days. Instead,
there is an example of Julian date calculation that should help you in
Time::JulianDay (part of the Time-modules bundle) which can be found at
http://www.perl.com/CPAN/modules/by-module/Time/.
251
68dc0745 252
253=head2 How do I find yesterday's date?
254
255The C<time()> function returns the current time in seconds since the
256epoch. Take one day off that:
257
258 $yesterday = time() - ( 24 * 60 * 60 );
259
260Then you can pass this to C<localtime()> and get the individual year,
261month, day, hour, minute, seconds values.
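
Here is a sketch of the second step. Note that localtime() returns the
month as 0..11 and the year as an offset from 1900. Subtracting 24 hours
can land on the wrong calendar day around a daylight saving change, so
treat this as an approximation. The helper name C<ymd> is made up for the
example.

```perl
use strict;
use warnings;

# Format the date fields of a localtime()/gmtime() list as YYYY-MM-DD.
sub ymd {
    my @t = @_;
    return sprintf("%04d-%02d-%02d", $t[5] + 1900, $t[4] + 1, $t[3]);
}

my $yesterday = time() - 24 * 60 * 60;
print "yesterday was ", ymd(localtime($yesterday)), "\n";

# A fixed timestamp makes the result easy to check:
# 1005613200 is Tue Nov 13 01:00:00 2001 UTC.
print ymd(gmtime(1005613200 - 24 * 60 * 60)), "\n";   # 2001-11-12
```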
262
5a964f20 263=head2 Does Perl have a year 2000 problem? Is Perl Y2K compliant?
68dc0745 264
265Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is
266Y2K compliant (whatever that means). The programmers you've hired to
267use it, however, probably are not.
268
269Long answer: The question belies a true understanding of the issue.
270Perl is just as Y2K compliant as your pencil--no more, and no less.
271Can you use your pencil to write a non-Y2K-compliant memo? Of course
272you can. Is that the pencil's fault? Of course it isn't.
92c2ed05 273
274The date and time functions supplied with perl (gmtime and localtime)
275supply adequate information to determine the year well beyond 2000
276(2038 is when trouble strikes for 32-bit machines). The year returned
277by these functions when used in an array context is the year minus 1900.
278For years between 1910 and 1999 this I<happens> to be a 2-digit decimal
279number. To avoid the year 2000 problem simply do not treat the year as
280a 2-digit number. It isn't.
68dc0745 281
5a964f20 282When gmtime() and localtime() are used in scalar context they return
68dc0745 283a timestamp string that contains a fully-expanded year. For example,
284C<$timestamp = gmtime(1005613200)> sets $timestamp to "Tue Nov 13 01:00:00
2852001". There's no year 2000 problem here.
286
287That doesn't mean that Perl can't be used to create non-Y2K compliant
288programs. It can. But so can your pencil. It's the fault of the user,
289not the language. At the risk of inflaming the NRA: ``Perl doesn't
290break Y2K, people do.'' See http://language.perl.com/news/y2k.html for
291a longer exposition.
292
68dc0745 293=head1 Data: Strings
294
295=head2 How do I validate input?
296
297The answer to this question is usually a regular expression, perhaps
5a964f20 298with auxiliary logic. See the more specific questions (numbers, mail
68dc0745 299addresses, etc.) for details.
300
301=head2 How do I unescape a string?
302
303It depends just what you mean by ``escape''. URL escapes are dealt
304with in L<perlfaq9>. Shell escapes with the backslash (C<\>)
68dc0745 305character are removed with:
306
307 s/\\(.)/$1/g;
308
92c2ed05 309This won't expand C<"\n"> or C<"\t"> or any other special escapes.
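
A short sketch of the substitution at work; the sample string is invented
for the example. Each backslash is dropped and the character after it is
kept, which is also why C<\n> becomes a plain C<n> rather than a newline.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes literally in the sample string.
my $escaped = 'two\ words and a \$dollar and a \n';

(my $plain = $escaped) =~ s/\\(.)/$1/g;

print "$plain\n";    # two words and a $dollar and a n
```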
68dc0745 310
311=head2 How do I remove consecutive pairs of characters?
312
92c2ed05 313To turn C<"abbcccd"> into C<"abccd">:
68dc0745 314
315 s/(.)\1/$1/g;
316
317=head2 How do I expand function calls in a string?
318
319This is documented in L<perlref>. In general, this is fraught with
320quoting and readability problems, but it is possible. To interpolate
5a964f20 321a subroutine call (in list context) into a string:
68dc0745 322
323 print "My sub returned @{[mysub(1,2,3)]} that time.\n";
324
325If you prefer scalar context, similar chicanery is also useful for
326arbitrary expressions:
327
328 print "That yields ${\($n + 5)} widgets\n";
329
330Version 5.004 of Perl had a bug that gave list context to the
331expression in C<${...}>, but this is fixed in version 5.005.
332
333See also ``How can I expand variables in text strings?'' in this
334section of the FAQ.
46fc3d4c 335
68dc0745 336=head2 How do I find matching/nesting anything?
337
338This isn't something that can be done in one regular expression, no
339matter how complicated. To find something between two single
340characters, a pattern like C</x([^x]*)x/> will get the intervening
341bits in $1. For multiple ones, then something more like
342C</alpha(.*?)omega/> would be needed. But none of these deals with
343nested patterns, nor can they. For that you'll have to write a
344parser.
345
346If you are serious about writing a parser, there are a number of
347modules or oddities that will make your life a lot easier. There is
348the CPAN module Parse::RecDescent, the standard module Text::Balanced,
349the byacc program, the CPAN module Parse::Yapp, and Mark-Jason
350Dominus's excellent I<py> tool at http://www.plover.com/~mjd/perl/py/
351.
68dc0745 352
353One simple destructive, inside-out approach that you might try is to
354pull out the smallest nesting parts one at a time:
5a964f20 355
    while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
        # do something with $1
    }
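
To see the inside-out approach at work, the sketch below removes one
innermost BEGIN...END pair per iteration and collects what was between the
markers. It drops the C</g> modifier so that C<$1> refers to exactly one
removed pair each time around the loop; the sample text is made up.

```perl
use strict;
use warnings;

my $text = "BEGIN a BEGIN b END c END";
my @found;

# Each pass deletes one pair whose body contains no further markers.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @found, $1;
}

print "[$_]\n" for @found;    # [ b ] then [ a  c ]
```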
359
360A more complicated and sneaky approach is to make Perl's regular
361expression engine do it for you. This is courtesy Dean Inada, and
362rather has the nature of an Obfuscated Perl Contest entry, but it
363really does work:
364
365 # $_ contains the string to parse
366 # BEGIN and END are the opening and closing markers for the
367 # nested text.
368
369 @( = ('(','');
370 @) = (')','');
371 ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
372 @$ = (eval{/$re/},$@!~/unmatched/);
373 print join("\n",@$[0..$#$]) if( $$[-1] );
374
68dc0745 375=head2 How do I reverse a string?
376
5a964f20 377Use reverse() in scalar context, as documented in
68dc0745 378L<perlfunc/reverse>.
379
380 $reversed = reverse $string;
381
382=head2 How do I expand tabs in a string?
383
5a964f20 384You can do it yourself:
68dc0745 385
386 1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
387
388Or you can just use the Text::Tabs module (part of the standard perl
389distribution).
390
391 use Text::Tabs;
392 @expanded_lines = expand(@lines_with_tabs);
393
394=head2 How do I reformat a paragraph?
395
396Use Text::Wrap (part of the standard perl distribution):
397
398 use Text::Wrap;
399 print wrap("\t", ' ', @paragraphs);
400
92c2ed05 401The paragraphs you give to Text::Wrap should not contain embedded
46fc3d4c 402newlines. Text::Wrap doesn't justify the lines (flush-right).
403
68dc0745 404=head2 How can I access/change the first N letters of a string?
405
406There are many ways. If you just want to grab a copy, use
92c2ed05 407substr():
68dc0745 408
409 $first_byte = substr($a, 0, 1);
410
411If you want to modify part of a string, the simplest way is often to
412use substr() as an lvalue:
413
414 substr($a, 0, 3) = "Tom";
415
416Although those with a pattern matching kind of thought process will
417likely prefer:
68dc0745 418
419 $a =~ s/^.../Tom/;
420
421=head2 How do I change the Nth occurrence of something?
422
423You have to keep track of N yourself. For example, let's say you want
424to change the fifth occurrence of C<"whoever"> or C<"whomever"> into
425C<"whosoever"> or C<"whomsoever">, case insensitively.
68dc0745 426
427 $count = 0;
428 s{((whom?)ever)}{
429 ++$count == 5 # is it the 5th?
430 ? "${2}soever" # yes, swap
431 : $1 # renege and leave it there
432 }igex;
433
434In the more general case, you can use the C</g> modifier in a C<while>
435loop, keeping count of matches.
436
437 $WANT = 3;
438 $count = 0;
439 while (/(\w+)\s+fish\b/gi) {
440 if (++$count == $WANT) {
441 print "The third fish is a $1 one.\n";
442 # Warning: don't `last' out of this loop
443 }
444 }
445
92c2ed05 446That prints out: C<"The third fish is a red one."> You can also use a
447repetition count and repeated pattern like this:
448
449 /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
450
68dc0745 451=head2 How can I count the number of occurrences of a substring within a string?
452
453There are a number of ways, with varying efficiency: If you want a
454count of a certain single character (X) within a string, you can use the
455C<tr///> function like so:
456
    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";
68dc0745 460
461This is fine if you are just looking for a single character. However,
462if you are trying to count multiple character substrings within a
463larger string, C<tr///> won't work. What you can do is wrap a while()
464loop around a global pattern match. For example, let's count negative
465integers:
466
    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
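
Another way to count matches, sketched here, is to evaluate a global match
in list context by assigning it to an empty list, and then assign that
list assignment to a scalar to get the number of results. The variable
names are illustrative.

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty list forces list context; the scalar receives the match count.
my $negatives = () = $string =~ /-\d+/g;

# tr/// returns how many characters it matched.
my $letters = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count = ($letters =~ tr/X//);

print "negative numbers: $negatives\n";    # 4
print "X characters: $x_count\n";          # 6
```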
470
471=head2 How do I capitalize all the words on one line?
472
473To make the first letter of each word upper case:
3fe9a6f1 474
68dc0745 475 $line =~ s/\b(\w)/\U$1/g;
476
46fc3d4c 477This has the strange effect of turning "C<don't do it>" into "C<Don'T
478Do It>". Sometimes you might want this, instead (Suggested by Brian
92c2ed05 479Foy):
46fc3d4c 480
    $string =~ s/ (
                     (^\w)    # at the beginning of the line
                       |      # or
                     (\s\w)   # preceded by whitespace
                   )
                 /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;
488
68dc0745 489To make the whole line upper case:
3fe9a6f1 490
68dc0745 491 $line = uc($line);
492
493To force each word to be lower case, with the first letter upper case:
3fe9a6f1 494
68dc0745 495 $line =~ s/(\w+)/\u\L$1/g;
496
497You can (and probably should) enable locale awareness of those
498characters by placing a C<use locale> pragma in your program.
92c2ed05 499See L<perllocale> for endless details on locales.
5a964f20 500
This is sometimes referred to as putting something into "title
case", but that's not quite accurate. Consider the proper
capitalization of the movie I<Dr. Strangelove or: How I Learned to
Stop Worrying and Love the Bomb>, for example.
505
68dc0745 506=head2 How can I split a [character] delimited string except when inside
507[character]? (Comma-separated files)
508
509Take the example case of trying to split a string that is comma-separated
510into its different fields. (We'll pretend you said comma-separated, not
511comma-delimited, which is different and almost never what you mean.) You
512can't use C<split(/,/)> because you shouldn't split if the comma is inside
513quotes. For example, take a data line like this:
514
515 SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
516
517Due to the restriction of the quotes, this is a fairly complex
518problem. Thankfully, we have Jeffrey Friedl, author of a highly
519recommended book on regular expressions, to handle these for us. He
520suggests (assuming your string is contained in $text):
521
522 @new = ();
523 push(@new, $+) while $text =~ m{
524 "([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
525 | ([^,]+),?
526 | ,
527 }gx;
528 push(@new, undef) if substr($text,-1,1) eq ',';
529
If you want to represent quotation marks inside a
quotation-mark-delimited field, escape them with backslashes (eg,
C<"like \"this\"">). Unescaping them is a task addressed earlier in
this section.
534
68dc0745 535Alternatively, the Text::ParseWords module (part of the standard perl
536distribution) lets you say:
537
538 use Text::ParseWords;
539 @new = quotewords(",", 0, $text);
540
541There's also a Text::CSV module on CPAN.
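
Applied to the sample line above, the Text::ParseWords approach might look
like the following sketch. The second argument to quotewords() is the
"keep" flag; passing 0 strips the quotes from the returned fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

my @fields = quotewords(',', 0, $text);

# Commas inside quotes stay part of their field.
printf "%2d: %s\n", $_, $fields[$_] for 0 .. $#fields;
```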
542
68dc0745 543=head2 How do I strip blank space from the beginning/end of a string?
544
Although the simplest approach would seem to be:

    $string =~ s/^\s*(.*?)\s*$/$1/;

this is unnecessarily slow, destructive, and fails with embedded newlines.
It is much faster to do this in two steps:
68dc0745 551
552 $string =~ s/^\s+//;
553 $string =~ s/\s+$//;
554
555Or more nicely written as:
556
557 for ($string) {
558 s/^\s+//;
559 s/\s+$//;
560 }
561
This idiom takes advantage of the C<foreach> loop's aliasing
behavior to factor out common code. You can do this
on several strings at once, or arrays, or even the
values of a hash if you use a slice:
566
567 # trim whitespace in the scalar, the array,
568 # and all the values in the hash
569 foreach ($scalar, @array, @hash{keys %hash}) {
570 s/^\s+//;
571 s/\s+$//;
572 }
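
If you would rather leave the original string untouched, the two
substitutions can be wrapped in a small function that works on a copy.
C<trim> is a hypothetical helper name for this sketch, not a Perl builtin.

```perl
use strict;
use warnings;

# Return a copy of the argument with leading and trailing whitespace removed.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $raw   = "  \t padded text \n";
my $clean = trim($raw);

print "[$clean]\n";    # [padded text]
```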
573
574=head2 How do I pad a string with blanks or pad a number with zeroes?
575
576(This answer contributed by Uri Guttman)
577
578In the following examples, C<$pad_len> is the length to which you wish
579to pad the string, C<$text> or C<$num> contains the string to be
580padded, and C<$pad_char> contains the padding character. You can use a
581single character string constant instead of the C<$pad_char> variable
582if you know what it is in advance.
583
The simplest method uses the C<sprintf> function. It can pad on the
left or right with blanks and on the left with zeroes.

    # Left padding with blank:
    $padded = sprintf( "%${pad_len}s", $text ) ;

    # Right padding with blank:
    $padded = sprintf( "%-${pad_len}s", $text ) ;

    # Left padding with 0:
    $padded = sprintf( "%0${pad_len}d", $num ) ;
595
596If you need to pad with a character other than blank or zero you can use
597one of the following methods.
598
599These methods generate a pad string with the C<x> operator and
600concatenate that with the original text.
601
602Left and right padding with any character:
603
604 $padded = $pad_char x ( $pad_len - length( $text ) ) . $text ;
605 $padded = $text . $pad_char x ( $pad_len - length( $text ) ) ;
606
607Or you can left or right pad $text directly:
608
609 $text .= $pad_char x ( $pad_len - length( $text ) ) ;
610 substr( $text, 0, 0 ) = $pad_char x ( $pad_len - length( $text ) ) ;
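
The following sketch shows what each method produces for a short sample
string. The values of C<$pad_len>, C<$text>, C<$num>, and C<$pad_char> are
arbitrary choices for the demonstration.

```perl
use strict;
use warnings;

my ($pad_len, $text, $num, $pad_char) = (6, "abc", 42, "*");

my $left  = sprintf("%${pad_len}s",  $text);    # "   abc"
my $right = sprintf("%-${pad_len}s", $text);    # "abc   "
my $zero  = sprintf("%0${pad_len}d", $num);     # "000042"

# Any pad character: build the padding with x, then concatenate.
my $star_left  = $pad_char x ($pad_len - length($text)) . $text;   # "***abc"
my $star_right = $text . $pad_char x ($pad_len - length($text));   # "abc***"

print "[$_]\n" for $left, $right, $zero, $star_left, $star_right;
```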
611
68dc0745 612=head2 How do I extract selected columns from a string?
613
614Use substr() or unpack(), both documented in L<perlfunc>.
615If you prefer thinking in terms of columns instead of widths,
616you can use this kind of thing:
617
618 # determine the unpack format needed to split Linux ps output
619 # arguments are cut columns
620 my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
621
622 sub cut2fmt {
623 my(@positions) = @_;
624 my $template = '';
625 my $lastpos = 1;
626 for my $place (@positions) {
627 $template .= "A" . ($place - $lastpos) . " ";
628 $lastpos = $place;
629 }
630 $template .= "A*";
631 return $template;
632 }
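
A usage sketch: the template returned by cut2fmt() goes straight to
unpack(). The function is repeated here so the example runs on its own,
and the column positions and sample line are invented.

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Fields start at columns 1, 5, and 10 (counting from 1).
my $fmt    = cut2fmt(5, 10);                     # "A4 A5 A*"
my @fields = unpack($fmt, "ab  cde  remainder");

print "$fmt\n";
print "[$_]\n" for @fields;    # [ab] [cde] [remainder]
```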
68dc0745 633
634=head2 How do I find the soundex value of a string?
635
636Use the standard Text::Soundex module distributed with perl.
637
638=head2 How can I expand variables in text strings?
639
640Let's assume that you have a string like:
641
642 $text = 'this has a $foo in it and a $bar';
643
644If those were both global variables, then this would
645suffice:
646
65acb1b1 647 $text =~ s/\$(\w+)/${$1}/g; # no /e needed
68dc0745 648
649But since they are probably lexicals, or at least, they could
650be, you'd have to do this:
68dc0745 651
652 $text =~ s/(\$\w+)/$1/eeg;
65acb1b1 653 die if $@; # needed /ee, not /e
68dc0745 654
655It's probably better in the general case to treat those
656variables as entries in some special hash. For example:
657
658 %user_defs = (
659 foo => 23,
660 bar => 19,
661 );
662 $text =~ s/\$(\w+)/$user_defs{$1}/g;
68dc0745 663
92c2ed05 664See also ``How do I expand function calls in a string?'' in this section
46fc3d4c 665of the FAQ.
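
A sketch of the hash-based approach, extended so that names missing from
the hash are left in the text instead of silently turning into empty
strings. That extension is an assumption about what you might want, not
part of the answer above.

```perl
use strict;
use warnings;

my %user_defs = (foo => 23, bar => 19);
my $text = 'this has a $foo in it and a $bar but no $baz';

# /e evaluates the replacement as code, so we can test for the key first.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";    # this has a 23 in it and a 19 but no $baz
```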
666
68dc0745 667=head2 What's wrong with always quoting "$vars"?
668
669The problem is that those double-quotes force stringification,
670coercing numbers and references into strings, even when you
671don't want them to be. Think of it this way: double-quote
672expansion is used to produce new strings. If you already
673have a string, why do you need more?
68dc0745 674
675If you get used to writing odd things like these:
676
677 print "$var"; # BAD
678 $new = "$old"; # BAD
679 somefunc("$var"); # BAD
680
681You'll be in trouble. Those should (in 99.8% of the cases) be
682the simpler and more direct:
683
684 print $var;
685 $new = $old;
686 somefunc($var);
687
688Otherwise, besides slowing you down, you're going to break code when
689the thing in the scalar is actually neither a string nor a number, but
690a reference:
691
692 func(\@array);
693 sub func {
694 my $aref = shift;
695 my $oref = "$aref"; # WRONG
696 }
697
698You can also get into subtle problems on those few operations in Perl
699that actually do care about the difference between a string and a
700number, such as the magical C<++> autoincrement operator or the
701syscall() function.
702
703Stringification also destroys arrays.
704
705 @lines = `command`;
706 print "@lines"; # WRONG - extra blanks
707 print @lines; # right
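
The damage done to references is easy to demonstrate. In this sketch
(variable names are illustrative) the quoted copy is just a string that
looks like an address and can no longer be dereferenced.

```perl
use strict;
use warnings;

my $aref = [1, 2, 3];

my $copy   = $aref;      # still a reference to the same array
my $broken = "$aref";    # a plain string such as "ARRAY(0x...)"

print "copy is a reference to: ", ref($copy), "\n";                      # ARRAY
print "broken is a reference?  ", (ref($broken) ? "yes" : "no"), "\n";   # no
print "elements via copy: @$copy\n";                                     # 1 2 3
```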
708
65acb1b1 709=head2 Why don't my E<lt>E<lt>HERE documents work?
68dc0745 710
711Check for these three things:
712
713=over 4
714
715=item 1. There must be no space after the << part.
716
717=item 2. There (probably) should be a semicolon at the end.
718
719=item 3. You can't (easily) have any space in front of the tag.
720
721=back
722
723If you want to indent the text in the here document, you
724can do this:
725
726 # all in one
727 ($VAR = <<HERE_TARGET) =~ s/^\s+//gm;
728 your text
729 goes here
730 HERE_TARGET
731
732But the HERE_TARGET must still be flush against the margin.
733If you want that indented also, you'll have to quote
734in the indentation.
735
736 ($quote = <<' FINIS') =~ s/^\s+//gm;
737 ...we will have peace, when you and all your works have
738 perished--and the works of your dark master to whom you
739 would deliver us. You are a liar, Saruman, and a corrupter
740 of men's hearts. --Theoden in /usr/src/perl/taint.c
741 FINIS
742 $quote =~ s/\s*--/\n--/;
743
744A nice general-purpose fixer-upper function for indented here documents
745follows. It expects to be called with a here document as its argument.
746It looks to see whether each line begins with a common substring, and
747if so, strips that off. Otherwise, it takes the amount of leading
748white space found on the first line and removes that much off each
749subsequent line.
750
751 sub fix {
752 local $_ = shift;
753 my ($white, $leader); # common white space and common leading string
754 if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
755 ($white, $leader) = ($2, quotemeta($1));
756 } else {
757 ($white, $leader) = (/^(\s+)/, '');
758 }
759 s/^\s*?$leader(?:$white)?//gm;
760 return $_;
761 }
762
c8db1d39 763This works with leading special strings, dynamically determined:
764
765 $remember_the_main = fix<<' MAIN_INTERPRETER_LOOP';
766 @@@ int
767 @@@ runops() {
768 @@@ SAVEI32(runlevel);
769 @@@ runlevel++;
770 @@@ while ( op = (*op->op_ppaddr)() ) ;
771 @@@ TAINT_NOT;
772 @@@ return 0;
773 @@@ }
774 MAIN_INTERPRETER_LOOP
775
776Or with a fixed amount of leading white space, with remaining
777indentation correctly preserved:
778
779 $poem = fix<<EVER_ON_AND_ON;
780 Now far ahead the Road has gone,
781 And I must follow, if I can,
782 Pursuing it with eager feet,
783 Until it joins some larger way
784 Where many paths and errands meet.
785 And whither then? I cannot say.
786 --Bilbo in /usr/src/perl/pp_ctl.c
787 EVER_ON_AND_ON
788
68dc0745 789=head1 Data: Arrays
790
791=head2 What is the difference between a list and an array?
792
793An array has a changeable length. A list does not. An array is something
794you can push or pop, while a list is a set of values. Some people make
795the distinction that a list is a value while an array is a variable.
796Subroutines are passed and return lists, you put things into list
797context, you initialize arrays with lists, and you foreach() across
798a list. C<@> variables are arrays, anonymous arrays are arrays, arrays
799in scalar context behave like the number of elements in them, subroutines
800access their arguments through the array C<@_>, push/pop/shift only work
801on arrays.
802
803As a side note, there's no such thing as a list in scalar context.
804When you say
805
806 $scalar = (2, 5, 7, 9);
807
you're using the comma operator in scalar context, so it evaluates the
left hand side, then evaluates and returns the right hand side. This
causes the last value to be returned: 9.
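
A small sketch of the difference (variable names are illustrative). The
inner block switches off the C<void> warning category, which would
otherwise complain about the discarded constants on that line.

```perl
use strict;
use warnings;

my @array = (2, 5, 7, 9);

my $count = @array;              # array in scalar context: 4 elements
my $last;
{
    no warnings 'void';          # silence "Useless use of a constant"
    $last = (2, 5, 7, 9);        # comma operator in scalar context: 9
}
my ($first) = (2, 5, 7, 9);      # list assignment: 2

print "count=$count last=$last first=$first\n";
```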
811
68dc0745 812=head2 What is the difference between $array[1] and @array[1]?
813
814The former is a scalar value, the latter an array slice, which makes
815it a list with one (scalar) value. You should use $ when you want a
816scalar value (most of the time) and @ when you want a list with one
817scalar value in it (very, very rarely; nearly never, in fact).
818
819Sometimes it doesn't make a difference, but sometimes it does.
820For example, compare:
821
822 $good[0] = `some program that outputs several lines`;
823
824with
825
826 @bad[0] = `same program that outputs several lines`;
827
828The B<-w> flag will warn you about these matters.
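
One way to watch the difference is to call a function that reports its
calling context through wantarray(). C<context> is a hypothetical helper
for this sketch. An element assignment supplies scalar context, while a
slice assignment supplies list context, which is why backticks return one
string in the first case and a list of lines in the second.

```perl
use strict;
use warnings;

# Report the context in which the function was called.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);

$good[0] = context();                 # element: scalar context
{
    no warnings 'syntax';             # -w would flag the one-element slice
    @bad[0] = context();              # slice: list context
}

print "element assignment saw $good[0] context\n";   # scalar
print "slice assignment saw $bad[0] context\n";      # list
```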
829
830=head2 How can I extract just the unique elements of an array?
831
832There are several possible ways, depending on whether the array is
833ordered and whether you wish to preserve the ordering.
834
835=over 4
836
837=item a) If @in is sorted, and you want @out to be sorted:
5a964f20 838(this assumes all true values in the array)
68dc0745 839
840 $prev = 'nonesuch';
841 @out = grep($_ ne $prev && ($prev = $_), @in);
842
843This is nice in that it doesn't use much extra memory, simulating
844uniq(1)'s behavior of removing only adjacent duplicates. It's less
845nice in that it won't work with false values like undef, 0, or "";
846"0 but true" is ok, though.
68dc0745 847
848=item b) If you don't know whether @in is sorted:
849
850 undef %saw;
851 @out = grep(!$saw{$_}++, @in);
852
853=item c) Like (b), but @in contains only small integers:
854
855 @out = grep(!$saw[$_]++, @in);
856
857=item d) A way to do (b) without any loops or greps:
858
859 undef %saw;
860 @saw{@in} = ();
861 @out = sort keys %saw; # remove sort if undesired
862
863=item e) Like (d), but @in contains only small positive integers:
864
865 undef @ary;
866 @ary[@in] = @in;
867 @out = @ary;
868
869=back
870
871But perhaps you should have been using a hash all along, eh?
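
As a self-contained sketch of approach (b), using lexical variables and
made-up sample data, this keeps the first occurrence of each element
and preserves the original order:

    my @in = qw(b a b c a d);
    my %seen;
    my @out = grep { !$seen{$_}++ } @in;   # true only the first time $_ is seen
    print "@out\n";                        # prints "b a c d"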
872
873=head2 How can I tell whether a list or array contains a certain element?
874
875Hearing the word "in" is an I<in>dication that you probably should have
876used a hash, not a list or array, to store your data. Hashes are
877designed to answer this question quickly and efficiently. Arrays aren't.
68dc0745 878
879That being said, there are several ways to approach this. If you
880are going to make this query many times over arbitrary string values,
881the fastest way is probably to invert the original array and keep an
68dc0745 882associative array lying about whose keys are the first array's values.
883
884 @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
885 undef %is_blue;
886 for (@blues) { $is_blue{$_} = 1 }
887
888Now you can check whether $is_blue{$some_color}. It might have been a
889good idea to keep the blues all in a hash in the first place.
890
891If the values are all small integers, you could use a simple indexed
892array. This kind of an array will take up less space:
893
894 @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
895 undef @is_tiny_prime;
896 for (@primes) { $is_tiny_prime[$_] = 1; }
897
898Now you check whether $is_tiny_prime[$some_number].
899
900If the values in question are integers instead of strings, you can save
901quite a lot of space by using bit strings instead:
902
903 @articles = ( 1..10, 150..2000, 2017 );
904 undef $read;
7b8d334a 905 for (@articles) { vec($read,$_,1) = 1 }
68dc0745 906
907Now check whether C<vec($read,$n,1)> is true for some C<$n>.
908
909Please do not use
910
911 $is_there = grep $_ eq $whatever, @array;
912
913or worse yet
914
915 $is_there = grep /$whatever/, @array;
916
These are slow (they check every element even if the first matches),
inefficient (same reason), and potentially buggy (what if there are
regexp characters in $whatever?). If you're only testing once, then
use:
921
922 $is_there = 0;
923 foreach $elt (@array) {
924 if ($elt eq $elt_to_find) {
925 $is_there = 1;
926 last;
927 }
928 }
929 if ($is_there) { ... }
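
If you'll be asking the question repeatedly, the inverted hash described
earlier can also be built in one statement with a hash slice. This is
only a sketch; the test colors are made up for the demonstration:

    my @blues = qw/azure cerulean teal turquoise lapis-lazuli/;
    my %is_blue;
    @is_blue{@blues} = (1) x @blues;    # hash slice: every blue maps to 1

    for my $color (qw/teal crimson/) {
        print "$color is ", ($is_blue{$color} ? "" : "not "), "blue\n";
    }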
68dc0745 930
931=head2 How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
932
933Use a hash. Here's code to do both and more. It assumes that
934each element is unique in a given array:
935
936 @union = @intersection = @difference = ();
937 %count = ();
938 foreach $element (@array1, @array2) { $count{$element}++ }
939 foreach $element (keys %count) {
940 push @union, $element;
941 push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
942 }
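
Note that C<@difference> above ends up holding the I<symmetric>
difference: elements found in exactly one of the two arrays. If you
want only the elements of the first array that are missing from the
second, a sketch along these lines (with made-up data) may be closer
to what you need:

    my @array1 = qw(a b c d);
    my @array2 = qw(c d e f);

    my %in_second     = map  { $_ => 1 } @array2;
    my @only_in_first = grep { !$in_second{$_} } @array1;
    my @in_both       = grep {  $in_second{$_} } @array1;

    print "only in first: @only_in_first\n";   # a b
    print "in both:       @in_both\n";         # c d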
943
944=head2 How do I test whether two arrays or hashes are equal?
945
946The following code works for single-level arrays. It uses a stringwise
947comparison, and does not distinguish defined versus undefined empty
948strings. Modify if you have other needs.
949
950 $are_equal = compare_arrays(\@frogs, \@toads);
951
952 sub compare_arrays {
953 my ($first, $second) = @_;
954 local $^W = 0; # silence spurious -w undef complaints
955 return 0 unless @$first == @$second;
956 for (my $i = 0; $i < @$first; $i++) {
957 return 0 if $first->[$i] ne $second->[$i];
958 }
959 return 1;
960 }
961
962For multilevel structures, you may wish to use an approach more
963like this one. It uses the CPAN module FreezeThaw:
964
965 use FreezeThaw qw(cmpStr);
966 @a = @b = ( "this", "that", [ "more", "stuff" ] );
967
968 printf "a and b contain %s arrays\n",
969 cmpStr(\@a, \@b) == 0
970 ? "the same"
971 : "different";
972
973This approach also works for comparing hashes. Here
974we'll demonstrate two different answers:
975
976 use FreezeThaw qw(cmpStr cmpStrHard);
977
978 %a = %b = ( "this" => "that", "extra" => [ "more", "stuff" ] );
979 $a{EXTRA} = \%b;
980 $b{EXTRA} = \%a;
981
982 printf "a and b contain %s hashes\n",
983 cmpStr(\%a, \%b) == 0 ? "the same" : "different";
984
985 printf "a and b contain %s hashes\n",
986 cmpStrHard(\%a, \%b) == 0 ? "the same" : "different";
987
988
The first reports that both hashes contain the same data, while the
second reports that they do not. Which you prefer is left as
an exercise to the reader.
992
68dc0745 993=head2 How do I find the first array element for which a condition is true?
994
995You can use this if you care about the index:
996
65acb1b1 997 for ($i= 0; $i < @array; $i++) {
68dc0745 998 if ($array[$i] eq "Waldo") {
999 $found_index = $i;
1000 last;
1001 }
1002 }
1003
1004Now C<$found_index> has what you want.
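
If you care about the element rather than its index, the same
early-exit idea works with C<foreach>. This is a sketch; the data and
the pattern are invented for illustration:

    my @array = qw(Wally Wanda Waldo Wenda);
    my $found;
    for my $elt (@array) {
        if ($elt =~ /do$/) {       # any condition you like
            $found = $elt;
            last;                  # stop at the first match
        }
    }
    print defined $found ? "found $found\n" : "no match\n";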
1005
1006=head2 How do I handle linked lists?
1007
In general, you usually don't need a linked list in Perl, since with
regular arrays, you can push and pop or shift and unshift at either end,
or you can use splice to add and/or remove an arbitrary number of
elements at arbitrary points. Both pop and shift are O(1) operations on
perl's dynamic arrays. In the absence of shifts and pops, push in general
needs to reallocate on the order of every log(N) times, and unshift will
need to copy pointers each time.
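
For instance, here is a small sketch of using splice to insert into and
remove from the middle of an ordinary array (the data is made up):

    my @list = qw(a b d e);
    splice(@list, 2, 0, 'c');              # insert 'c' before index 2
    print "@list\n";                       # a b c d e

    my @removed = splice(@list, 1, 2);     # remove 2 elements from index 1
    print "@list (removed @removed)\n";    # a d e (removed b c)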
68dc0745 1015
1016If you really, really wanted, you could use structures as described in
1017L<perldsc> or L<perltoot> and do just what the algorithm book tells you
1018to do. For example, imagine a list node like this:
1019
1020 $node = {
1021 VALUE => 42,
1022 LINK => undef,
1023 };
1024
1025You could walk the list this way:
1026
1027 print "List: ";
1028 for ($node = $head; $node; $node = $node->{LINK}) {
1029 print $node->{VALUE}, " ";
1030 }
1031 print "\n";
1032
1033You could grow the list this way:
1034
1035 my ($head, $tail);
1036 $tail = append($head, 1); # grow a new head
1037 for $value ( 2 .. 10 ) {
1038 $tail = append($tail, $value);
1039 }
1040
1041 sub append {
1042 my($list, $value) = @_;
1043 my $node = { VALUE => $value };
1044 if ($list) {
1045 $node->{LINK} = $list->{LINK};
1046 $list->{LINK} = $node;
1047 } else {
1048 $_[0] = $node; # replace caller's version
1049 }
1050 return $node;
1051 }
1052
But again, Perl's built-in arrays are virtually always good enough.
68dc0745 1054
1055=head2 How do I handle circular lists?
1056
1057Circular lists could be handled in the traditional fashion with linked
1058lists, or you could just do something like this with an array:
1059
1060 unshift(@array, pop(@array)); # the last shall be first
1061 push(@array, shift(@array)); # and vice versa
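
If you'd rather leave the array untouched, another option is to walk it
with an index that wraps around. This sketch assumes a non-empty array;
C<@ring> and C<@visited> are illustrative names:

    my @ring = qw(red green blue);
    my $i = 0;
    my @visited;
    push @visited, $ring[ $i++ % @ring ] for 1 .. 5;   # index wraps at the end
    print "@visited\n";                 # red green blue red green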
1062
1063=head2 How do I shuffle an array randomly?
1064
1065Use this:
1066
    # fisher_yates_shuffle( \@array ) :
    # generate a random permutation of @array in place
    sub fisher_yates_shuffle {
        my $array = shift;
        return unless @$array;     # an empty array would never end the loop
        my $i;
        for ($i = @$array; --$i; ) {
            my $j = int rand ($i+1);
            next if $i == $j;
            @$array[$i,$j] = @$array[$j,$i];
        }
    }

    fisher_yates_shuffle( \@array );    # permutes @array in place
1080
You've probably seen shuffling algorithms that work using splice,
randomly picking another element to swap the current element with:
1083
1084 srand;
1085 @new = ();
1086 @old = 1 .. 10; # just a demo
1087 while (@old) {
1088 push(@new, splice(@old, rand @old, 1));
1089 }
1090
1091This is bad because splice is already O(N), and since you do it N times,
1092you just invented a quadratic algorithm; that is, O(N**2). This does
1093not scale, although Perl is so efficient that you probably won't notice
1094this until you have rather largish arrays.
68dc0745 1095
1096=head2 How do I process/modify each element of an array?
1097
1098Use C<for>/C<foreach>:
1099
1100 for (@lines) {
1101 s/foo/bar/; # change that word
1102 y/XZ/ZX/; # swap those letters
68dc0745 1103 }
1104
1105Here's another; let's compute spherical volumes:
1106
5a964f20 1107 for (@volumes = @radii) { # @volumes has changed parts
68dc0745 1108 $_ **= 3;
1109 $_ *= (4/3) * 3.14159; # this will be constant folded
1110 }
1111
1112If you want to do the same thing to modify the values of the hash,
1113you may not use the C<values> function, oddly enough. You need a slice:
1114
1115 for $orbit ( @orbits{keys %orbits} ) {
1116 ($orbit **= 3) *= (4/3) * 3.14159;
1117 }
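
The loops above change their data in place, because the loop variable
is an alias for each element. When you'd rather build a new list and
leave the original alone, C<map> is the usual tool. A sketch with
made-up radii:

    my @radii   = (1, 2, 3);
    my @volumes = map { $_ ** 3 * (4/3) * 3.14159 } @radii;
    print "radii still: @radii\n";      # radii still: 1 2 3
    print "volumes:     @volumes\n";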
1118
68dc0745 1119=head2 How do I select a random element from an array?
1120
1121Use the rand() function (see L<perlfunc/rand>):
1122
5a964f20 1123 # at the top of the program:
68dc0745 1124 srand; # not needed for 5.004 and later
1125
1126 # then later on
68dc0745 1127 $index = rand @array;
1128 $element = $array[$index];
1129
1130Make sure you I<only call srand once per program, if then>.
1131If you are calling it more than once (such as before each
1132call to rand), you're almost certainly doing something wrong.
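
The two steps are often folded into one expression, since an array
index is truncated to an integer anyway. A sketch with illustrative
data:

    my @array   = qw(rock paper scissors);
    my $element = $array[ rand @array ];   # rand @array is in [0, 3)
    print "picked $element\n";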
1133
68dc0745 1134=head2 How do I permute N elements of a list?
1135
1136Here's a little program that generates all permutations
1137of all the words on each line of input. The algorithm embodied
5a964f20 1138in the permute() function should work on any list:
68dc0745 1139
1140 #!/usr/bin/perl -n
1141 # tsc-permute: permute each word of input
1142 permute([split], []);
1143 sub permute {
1144 my @items = @{ $_[0] };
1145 my @perms = @{ $_[1] };
1146 unless (@items) {
1147 print "@perms\n";
68dc0745 1148 } else {
1149 my(@newitems,@newperms,$i);
1150 foreach $i (0 .. $#items) {
1151 @newitems = @items;
1152 @newperms = @perms;
1153 unshift(@newperms, splice(@newitems, $i, 1));
1154 permute([@newitems], [@newperms]);
68dc0745 1155 }
1156 }
1157 }
1158
1159=head2 How do I sort an array by (anything)?
1160
1161Supply a comparison function to sort() (described in L<perlfunc/sort>):
1162
1163 @list = sort { $a <=> $b } @list;
1164
1165The default sort function is cmp, string comparison, which would
1166sort C<(1, 2, 10)> into C<(1, 10, 2)>. C<E<lt>=E<gt>>, used above, is
1167the numerical comparison operator.
1168
1169If you have a complicated function needed to pull out the part you
1170want to sort on, then don't do it inside the sort function. Pull it
1171out first, because the sort BLOCK can be called many times for the
1172same element. Here's an example of how to pull out the first word
1173after the first number on each item, and then sort those words
1174case-insensitively.
1175
1176 @idx = ();
1177 for (@data) {
1178 ($item) = /\d+\s*(\S+)/;
1179 push @idx, uc($item);
1180 }
1181 @sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
1182
1183Which could also be written this way, using a trick
1184that's come to be known as the Schwartzian Transform:
1185
    @sorted = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;
68dc0745 1189
1190If you need to sort on several fields, the following paradigm is useful.
1191
1192 @sorted = sort { field1($a) <=> field1($b) ||
1193 field2($a) cmp field2($b) ||
1194 field3($a) cmp field3($b)
1195 } @data;
1196
1197This can be conveniently combined with precalculation of keys as given
1198above.
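
Here's a sketch of that combination. It assumes, purely for
illustration, that each record is a "name age" string: the keys are
extracted once, then the records are sorted by age (descending) and by
name (ascending) to break ties.

    my @data = ("bob 30", "alice 25", "carol 30");

    my @sorted = map  { $_->[0] }                    # recover the record
                 sort { $b->[2] <=> $a->[2]          # age, high to low
                                 ||
                        $a->[1] cmp $b->[1] }        # then name
                 map  { [ $_, split(' ', $_) ] }     # [record, name, age]
                 @data;

    print "$_\n" for @sorted;     # bob 30, carol 30, alice 25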
1199
1200See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about
1201this approach.
1202
1203See also the question below on sorting hashes.
1204
1205=head2 How do I manipulate arrays of bits?
1206
1207Use pack() and unpack(), or else vec() and the bitwise operations.
1208
1209For example, this sets $vec to have bit N set if $ints[N] was set:
1210
1211 $vec = '';
1212 foreach(@ints) { vec($vec,$_,1) = 1 }
1213
1214And here's how, given a vector in $vec, you can
1215get those bits into your @ints array:
1216
1217 sub bitvec_to_list {
1218 my $vec = shift;
1219 my @ints;
1220 # Find null-byte density then select best algorithm
1221 if ($vec =~ tr/\0// / length $vec > 0.95) {
1222 use integer;
1223 my $i;
1224 # This method is faster with mostly null-bytes
1225 while($vec =~ /[^\0]/g ) {
1226 $i = -9 + 8 * pos $vec;
1227 push @ints, $i if vec($vec, ++$i, 1);
1228 push @ints, $i if vec($vec, ++$i, 1);
1229 push @ints, $i if vec($vec, ++$i, 1);
1230 push @ints, $i if vec($vec, ++$i, 1);
1231 push @ints, $i if vec($vec, ++$i, 1);
1232 push @ints, $i if vec($vec, ++$i, 1);
1233 push @ints, $i if vec($vec, ++$i, 1);
1234 push @ints, $i if vec($vec, ++$i, 1);
1235 }
1236 } else {
1237 # This method is a fast general algorithm
1238 use integer;
1239 my $bits = unpack "b*", $vec;
1240 push @ints, 0 if $bits =~ s/^(\d)// && $1;
1241 push @ints, pos $bits while($bits =~ /1/g);
1242 }
1243 return \@ints;
1244 }
1245
1246This method gets faster the more sparse the bit vector is.
1247(Courtesy of Tim Bunce and Winfried Koenig.)
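
When speed doesn't matter, a much simpler (and slower) way to get the
same answer is to test every bit with vec(). This is a sketch for
small vectors, not a replacement for the tuned routine above:

    my $vec = '';
    vec($vec, $_, 1) = 1 for (1, 5, 9);            # set bits 1, 5 and 9

    my $nbits = 8 * length($vec);                  # bits actually stored
    my @ints  = grep { vec($vec, $_, 1) } 0 .. $nbits - 1;
    print "@ints\n";                               # prints "1 5 9"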
1248
1249Here's a demo on how to use vec():
1250
1251 # vec demo
1252 $vector = "\xff\x0f\xef\xfe";
1253 print "Ilya's string \\xff\\x0f\\xef\\xfe represents the number ",
1254 unpack("N", $vector), "\n";
1255 $is_set = vec($vector, 23, 1);
1256 print "Its 23rd bit is ", $is_set ? "set" : "clear", ".\n";
1257 pvec($vector);
1258
1259 set_vec(1,1,1);
1260 set_vec(3,1,1);
1261 set_vec(23,1,1);
1262
1263 set_vec(3,1,3);
1264 set_vec(3,2,3);
1265 set_vec(3,4,3);
1266 set_vec(3,4,7);
1267 set_vec(3,8,3);
1268 set_vec(3,8,7);
1269
1270 set_vec(0,32,17);
1271 set_vec(1,32,17);
1272
1273 sub set_vec {
1274 my ($offset, $width, $value) = @_;
1275 my $vector = '';
1276 vec($vector, $offset, $width) = $value;
1277 print "offset=$offset width=$width value=$value\n";
1278 pvec($vector);
1279 }
1280
1281 sub pvec {
1282 my $vector = shift;
1283 my $bits = unpack("b*", $vector);
1284 my $i = 0;
1285 my $BASE = 8;
1286
1287 print "vector length in bytes: ", length($vector), "\n";
1288 @bytes = unpack("A8" x length($vector), $bits);
1289 print "bits are: @bytes\n\n";
1290 }
1291
68dc0745 1292=head2 Why does defined() return true on empty arrays and hashes?
1293
1294The short story is that you should probably only use defined on scalars or
1295functions, not on aggregates (arrays and hashes). See L<perlfunc/defined>
1296in the 5.004 release or later of Perl for more detail.
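
If what you really want to know is whether an aggregate is empty, just
test it in a boolean (scalar) context. A minimal sketch:

    my (@array, %hash);
    print "array is empty\n" unless @array;
    print "hash is empty\n"  unless %hash;

    push @array, 'x';
    $hash{key} = 'value';
    print scalar(@array), " element(s), ",
          scalar(keys %hash), " key(s)\n";     # 1 element(s), 1 key(s)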
68dc0745 1297
1298=head1 Data: Hashes (Associative Arrays)
1299
1300=head2 How do I process an entire hash?
1301
1302Use the each() function (see L<perlfunc/each>) if you don't care
1303whether it's sorted:
1304
5a964f20 1305 while ( ($key, $value) = each %hash) {
68dc0745 1306 print "$key = $value\n";
1307 }
1308
1309If you want it sorted, you'll have to use foreach() on the result of
1310sorting the keys as shown in an earlier question.
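
For example, with a small made-up hash:

    my %hash = (pear => 3, apple => 1, fig => 2);
    for my $key (sort keys %hash) {
        print "$key = $hash{$key}\n";      # apple, then fig, then pear
    }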
1311
1312=head2 What happens if I add or remove keys from a hash while iterating over it?
1313
1314Don't do that.
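
If you need to prune a hash as you go, one common workaround (offered
as a sketch, not the only way) is to loop over the list that keys()
returns rather than using each(). That list is built before the loop
starts, so deleting from the hash doesn't disturb the traversal:

    my %count = (a => 1, b => 2, c => 3, d => 4);
    for my $key (keys %count) {
        delete $count{$key} if $count{$key} % 2;   # drop odd values
    }
    print join(" ", sort keys %count), "\n";       # prints "b d"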
1315
1316=head2 How do I look up a hash element by value?
1317
1318Create a reverse hash:
1319
1320 %by_value = reverse %by_key;
1321 $key = $by_value{$value};
1322
1323That's not particularly efficient. It would be more space-efficient
1324to use:
1325
1326 while (($key, $value) = each %by_key) {
1327 $by_value{$value} = $key;
1328 }
1329
1330If your hash could have repeated values, the methods above will only
1331find one of the associated keys. This may or may not worry you.
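
If it does worry you, one option is to map each value to a list of all
the keys that carry it. A sketch with made-up data; C<%keys_for> is
just an illustrative name:

    my %by_key = (apple => 'red', cherry => 'red', banana => 'yellow');
    my %keys_for;
    while (my ($key, $value) = each %by_key) {
        push @{ $keys_for{$value} }, $key;      # value => [ key, key, ... ]
    }
    my @red = sort @{ $keys_for{red} };
    print "red: @red\n";                        # red: apple cherry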
1332
1333=head2 How can I know how many entries are in a hash?
1334
1335If you mean how many keys, then all you have to do is
1336take the scalar sense of the keys() function:
1337
3fe9a6f1 1338 $num_keys = scalar keys %hash;
68dc0745 1339
In void context, keys() just resets the hash's iterator and does
nothing else, which is faster for tied hashes.
1342
1343=head2 How do I sort a hash (optionally by value instead of key)?
1344
1345Internally, hashes are stored in a way that prevents you from imposing
1346an order on key-value pairs. Instead, you have to sort a list of the
1347keys or values:
1348
1349 @keys = sort keys %hash; # sorted by key
1350 @keys = sort {
1351 $hash{$a} cmp $hash{$b}
1352 } keys %hash; # and by value
1353
Here we'll do a reverse numeric sort by value, and if two values are
identical, sort by length of key, and if that fails, by straight ASCII
comparison of the keys (well, possibly modified by your locale -- see
L<perllocale>).
1358
1359 @keys = sort {
1360 $hash{$b} <=> $hash{$a}
1361 ||
1362 length($b) <=> length($a)
1363 ||
1364 $a cmp $b
1365 } keys %hash;
1366
1367=head2 How can I always keep my hash sorted?
1368
1369You can look into using the DB_File module and tie() using the
1370$DB_BTREE hash bindings as documented in L<DB_File/"In Memory Databases">.
5a964f20 1371The Tie::IxHash module from CPAN might also be instructive.
68dc0745 1372
1373=head2 What's the difference between "delete" and "undef" with hashes?
1374
Hashes are pairs of scalars: the first is the key, the second is the
value. The key will be coerced to a string, although the value can be
any kind of scalar: string, number, or reference. If a key C<$key> is
present in the hash C<%hash>, C<exists($hash{$key})> will return true.
The value for a given key can be C<undef>, in which case C<$hash{$key}>
will be C<undef> while C<exists($hash{$key})> will return true. This
corresponds to (C<$key>, C<undef>) being in the hash.
1382
1383Pictures help... here's the C<%ary> table:
1384
1385 keys values
1386 +------+------+
1387 | a | 3 |
1388 | x | 7 |
1389 | d | 0 |
1390 | e | 2 |
1391 +------+------+
1392
1393And these conditions hold
1394
1395 $ary{'a'} is true
1396 $ary{'d'} is false
1397 defined $ary{'d'} is true
1398 defined $ary{'a'} is true
1399 exists $ary{'a'} is true (perl5 only)
1400 grep ($_ eq 'a', keys %ary) is true
1401
1402If you now say
1403
1404 undef $ary{'a'}
1405
1406your table now reads:
1407
1408
1409 keys values
1410 +------+------+
1411 | a | undef|
1412 | x | 7 |
1413 | d | 0 |
1414 | e | 2 |
1415 +------+------+
1416
1417and these conditions now hold; changes in caps:
1418
1419 $ary{'a'} is FALSE
1420 $ary{'d'} is false
1421 defined $ary{'d'} is true
1422 defined $ary{'a'} is FALSE
1423 exists $ary{'a'} is true (perl5 only)
1424 grep ($_ eq 'a', keys %ary) is true
1425
1426Notice the last two: you have an undef value, but a defined key!
1427
1428Now, consider this:
1429
1430 delete $ary{'a'}
1431
1432your table now reads:
1433
1434 keys values
1435 +------+------+
1436 | x | 7 |
1437 | d | 0 |
1438 | e | 2 |
1439 +------+------+
1440
1441and these conditions now hold; changes in caps:
1442
1443 $ary{'a'} is false
1444 $ary{'d'} is false
1445 defined $ary{'d'} is true
1446 defined $ary{'a'} is false
1447 exists $ary{'a'} is FALSE (perl5 only)
1448 grep ($_ eq 'a', keys %ary) is FALSE
1449
1450See, the whole entry is gone!
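
You can check the tables above for yourself with a short script along
these lines (C<state_of> is a throwaway helper written for this
sketch):

    my %ary = (a => 3, x => 7, d => 0, e => 2);

    sub state_of {
        my ($hash, $key) = @_;
        return (exists  $hash->{$key} ? "exists"  : "gone") . ", "
             . (defined $hash->{$key} ? "defined" : "undefined");
    }

    undef $ary{'a'};
    print "after undef:  ", state_of(\%ary, 'a'), "\n";  # exists, undefined
    delete $ary{'a'};
    print "after delete: ", state_of(\%ary, 'a'), "\n";  # gone, undefined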
1451
1452=head2 Why don't my tied hashes make the defined/exists distinction?
1453
That depends on how the tied class implements its EXISTS() and FETCH()
methods; there is no separate DEFINED() method, and C<defined> simply
tests whatever FETCH() returns. For example, there isn't the concept of
undef with hashes that are tied to DBM* files. This means the true/false
tables above will give different results when used on such a hash. It
also means that exists and defined do the same thing with a DBM* file,
and what they end up doing is not what they do with ordinary hashes.
1460
1461=head2 How do I reset an each() operation part-way through?
1462
5a964f20 1463Using C<keys %hash> in scalar context returns the number of keys in
68dc0745 1464the hash I<and> resets the iterator associated with the hash. You may
1465need to do this if you use C<last> to exit a loop early so that when you
46fc3d4c 1466re-enter it, the hash iterator has been reset.
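
A small sketch of the effect, using a made-up hash: after an abandoned
each(), calling keys() in void context makes the next each() start
over from the first pair.

    my %hash = (a => 1, b => 2, c => 3);

    my ($first) = each %hash;    # begin iterating, then abandon it
    keys %hash;                  # reset the iterator
    my ($again) = each %hash;    # starts from the beginning again

    print "iterator was reset\n" if $first eq $again;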
68dc0745 1467
1468=head2 How can I get the unique keys from two hashes?
1469
First you extract the keys from the hashes into lists, and then solve
the problem of extracting unique elements described above. For example:
1472
1473 %seen = ();
1474 for $element (keys(%foo), keys(%bar)) {
1475 $seen{$element}++;
1476 }
1477 @uniq = keys %seen;
1478
1479Or more succinctly:
1480
1481 @uniq = keys %{{%foo,%bar}};
1482
1483Or if you really want to save space:
1484
1485 %seen = ();
1486 while (defined ($key = each %foo)) {
1487 $seen{$key}++;
1488 }
1489 while (defined ($key = each %bar)) {
1490 $seen{$key}++;
1491 }
1492 @uniq = keys %seen;
1493
1494=head2 How can I store a multidimensional array in a DBM file?
1495
1496Either stringify the structure yourself (no fun), or else
1497get the MLDBM (which uses Data::Dumper) module from CPAN and layer
1498it on top of either DB_File or GDBM_File.
1499
1500=head2 How can I make my hash remember the order I put elements into it?
1501
Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');
    for ($i=0; $i<20; $i++) {
        $myhash{$i} = 2*$i;
    }
    @keys = keys %myhash;
    # @keys = (0,1,2,3,...)
1511
68dc0745 1512=head2 Why does passing a subroutine an undefined element in a hash create it?
1513
1514If you say something like:
1515
1516 somefunc($hash{"nonesuch key here"});
1517
1518Then that element "autovivifies"; that is, it springs into existence
1519whether you store something there or not. That's because functions
1520get scalars passed in by reference. If somefunc() modifies C<$_[0]>,
1521it has to be ready to write it back into the caller's version.
1522
1523This has been fixed as of perl5.004.
1524
Normally, merely accessing a key's value for a nonexistent key does
I<not> cause that key to be forever there. This is different from
awk's behavior.
1528
fc36a67e 1529=head2 How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
68dc0745 1530
1531Usually a hash ref, perhaps like this:
1532
1533 $record = {
1534 NAME => "Jason",
1535 EMPNO => 132,
1536 TITLE => "deputy peon",
1537 AGE => 23,
1538 SALARY => 37_000,
1539 PALS => [ "Norbert", "Rhys", "Phineas"],
1540 };
1541
1542References are documented in L<perlref> and the upcoming L<perlreftut>.
1543Examples of complex data structures are given in L<perldsc> and
1544L<perllol>. Examples of structures and object-oriented classes are
1545in L<perltoot>.
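
As a brief sketch of how such a record is used, fields are reached
through the reference with the arrow operator, and nested structures
are dereferenced in the usual way (the record here is a cut-down copy
of the one above):

    my $record = {
        NAME  => "Jason",
        EMPNO => 132,
        PALS  => [ "Norbert", "Rhys", "Phineas" ],
    };

    print "$record->{NAME} is employee $record->{EMPNO}\n";
    print "first pal: $record->{PALS}[0]\n";        # Norbert
    print "all pals:  @{ $record->{PALS} }\n";
    $record->{AGE} = 23;                            # add a field later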
68dc0745 1546
1547=head2 How can I use a reference as a hash key?
1548
You can't do this directly, but you could use the standard Tie::RefHash
module distributed with perl.
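
A minimal sketch of how that looks in practice; the hash name and data
are invented. With the tie in place, the keys you get back are still
real references rather than strings like C<ARRAY(0x...)>:

    use Tie::RefHash;

    tie my %color_of, 'Tie::RefHash';
    my $point = [1, 2];
    $color_of{$point} = 'red';

    for my $key (keys %color_of) {
        print "@$key is $color_of{$key}\n";    # prints "1 2 is red"
    }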
1551
1552=head1 Data: Misc
1553
1554=head2 How do I handle binary data correctly?
1555
1556Perl is binary clean, so this shouldn't be a problem. For example,
1557this works fine (assuming the files are found):
1558
1559 if (`cat /vmunix` =~ /gzip/) {
1560 print "Your kernel is GNU-zip enabled!\n";
1561 }
1562
1563On some legacy systems, however, you have to play tedious games with
1564"text" versus "binary" files. See L<perlfunc/"binmode">, or the upcoming
1565L<perlopentut> manpage.
68dc0745 1566
1567If you're concerned about 8-bit ASCII data, then see L<perllocale>.
1568
54310121 1569If you want to deal with multibyte characters, however, there are
68dc0745 1570some gotchas. See the section on Regular Expressions.
1571
1572=head2 How do I determine whether a scalar is a number/whole/integer/float?
1573
1574Assuming that you don't care about IEEE notations like "NaN" or
1575"Infinity", you probably just want to use a regular expression.
1576
1577 if (/\D/) { print "has nondigits\n" }
1578 if (/^\d+$/) { print "is a whole number\n" }
1579 if (/^-?\d+$/) { print "is an integer\n" }
1580 if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
1581 if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
1582 if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number" }
1583 if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
1584 { print "a C float" }
68dc0745 1585
If you're on a POSIX system, Perl supports the C<POSIX::strtod>
function. Its semantics are somewhat cumbersome, so here's a C<getnum>
wrapper function for more convenient access. This function takes
a string and returns the number it found, or C<undef> for input that
isn't a C float. The C<is_numeric> function is a front end to C<getnum>
if you just want to say, ``Is this a float?''
1592
1593 sub getnum {
1594 use POSIX qw(strtod);
1595 my $str = shift;
1596 $str =~ s/^\s+//;
1597 $str =~ s/\s+$//;
1598 $! = 0;
1599 my($num, $unparsed) = strtod($str);
1600 if (($str eq '') || ($unparsed != 0) || $!) {
1601 return undef;
1602 } else {
1603 return $num;
1604 }
1605 }
1606
    sub is_numeric { defined getnum($_[0]) }
1608
Or you could check out String::Scanf which can be found at
http://www.perl.com/CPAN/modules/by-module/String/.
The POSIX module (part of the standard Perl distribution) provides
the C<strtod> and C<strtol> functions for converting strings to doubles
and longs, respectively.
1614
1615=head2 How do I keep persistent data across program calls?
1616
1617For some specific applications, you can use one of the DBM modules.
1618See L<AnyDBM_File>. More generically, you should consult the FreezeThaw,
1619Storable, or Class::Eroot modules from CPAN. Here's one example using
1620Storable's C<store> and C<retrieve> functions:
1621
1622 use Storable;
1623 store(\%hash, "filename");
1624
1625 # later on...
1626 $href = retrieve("filename"); # by ref
1627 %hash = %{ retrieve("filename") }; # direct to hash
68dc0745 1628
1629=head2 How do I print out or copy a recursive data structure?
1630
1631The Data::Dumper module on CPAN (or the 5.005 release of Perl) is great
1632for printing out data structures. The Storable module, found on CPAN,
1633provides a function called C<dclone> that recursively copies its argument.
1634
1635 use Storable qw(dclone);
1636 $r2 = dclone($r1);
68dc0745 1637
1638Where $r1 can be a reference to any kind of data structure you'd like.
1639It will be deeply copied. Because C<dclone> takes and returns references,
1640you'd have to add extra punctuation if you had a hash of arrays that
1641you wanted to copy.
68dc0745 1642
65acb1b1 1643 %newhash = %{ dclone(\%oldhash) };
68dc0745 1644
1645=head2 How do I define methods for every class/object?
1646
1647Use the UNIVERSAL class (see L<UNIVERSAL>).
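
As a sketch of the idea, any subroutine placed in package UNIVERSAL is
found by method lookup on every class and object. The C<describe>
method and the C<Frog> class are invented for this example:

    # Every class ultimately inherits from UNIVERSAL.
    sub UNIVERSAL::describe {
        my $self = shift;
        return "I am a " . (ref($self) || $self);
    }

    package Frog;
    sub new { return bless {}, shift }

    package main;
    my $frog = Frog->new;
    print $frog->describe, "\n";       # prints "I am a Frog"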
1648
1649=head2 How do I verify a credit card checksum?
1650
1651Get the Business::CreditCard module from CPAN.
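
If you're curious what such a check involves, card numbers are
commonly validated with the Luhn checksum. Here's a minimal sketch of
that checksum; C<luhn_ok> is a made-up name for illustration and is
not the interface of Business::CreditCard:

    sub luhn_ok {
        my $number = shift;
        $number =~ tr/0-9//cd;               # keep digits only
        my ($sum, $double) = (0, 0);
        for my $digit (reverse split //, $number) {
            $digit *= 2 if $double;          # double every second digit
            $digit -= 9 if $digit > 9;       # same as adding its two digits
            $sum   += $digit;
            $double = !$double;
        }
        return length($number) && $sum % 10 == 0;
    }

    print luhn_ok("4111 1111 1111 1111") ? "passes\n" : "fails\n";

A passing checksum only means the digits are self-consistent, not
that the card exists.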
1652
1653=head2 How do I pack arrays of doubles or floats for XS code?
1654
1655The kgbpack.c code in the PGPLOT module on CPAN does just this.
1656If you're doing a lot of float or double processing, consider using
1657the PDL module from CPAN instead--it makes number-crunching easy.
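
On the Perl side alone, pack() with the C<d> template already produces
a string of native doubles laid end to end, which XS code can
typically read as a C<double *> buffer. This sketch shows only that
round trip and does not reproduce what kgbpack.c does:

    my @doubles = (1.5, 2.25, -3);
    my $packed  = pack("d*", @doubles);    # native doubles, back to back
    my @copy    = unpack("d*", $packed);   # and back to a Perl list
    print "@copy\n";                       # prints "1.5 2.25 -3"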
1658
68dc0745 1659=head1 AUTHOR AND COPYRIGHT
1660
65acb1b1 1661Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
1662All rights reserved.
1663
1664When included as part of the Standard Version of Perl, or as part of
1665its complete documentation whether printed or otherwise, this work
c2611fb3 1666may be distributed only under the terms of Perl's Artistic Licence.
1667Any distribution of this file or derivatives thereof I<outside>
1668of that package require that special arrangements be made with
1669copyright holder.
1670
1671Irrespective of its distribution, all code examples in this file
1672are hereby placed into the public domain. You are permitted and
1673encouraged to use this code in your own programs for fun
1674or for profit as you see fit. A simple comment in the code giving
1675credit would be courteous but is not required.
65acb1b1 1676