This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Update perlfaq to CPAN version 5.0150034
=head1 NAME

perlfaq6 - Regular Expressions

=head1 DESCRIPTION

This section is surprisingly small because the rest of the FAQ is
littered with answers involving regular expressions. For example,
decoding a URL and checking whether something is a number can be handled
with regular expressions, but those answers are found elsewhere in
this document (in L<perlfaq9>: "How do I decode or create those %-encodings
on the web" and L<perlfaq4>: "How do I determine whether a scalar is
a number/whole/integer/float", to be precise).

=head2 How can I hope to use regular expressions without creating illegible and unmaintainable code?
X<regex, legibility> X<regexp, legibility>
X<regular expression, legibility> X</x>

Three techniques can make regular expressions maintainable and
understandable.

=over 4

=item Comments Outside the Regex

Describe what you're doing and how you're doing it, using normal Perl
comments.

    # turn the line into the lowercased first word, a colon, and
    # the number of characters on the rest of the line
    s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;

=item Comments Inside the Regex

The C</x> modifier causes whitespace to be ignored in a regex pattern
(except in a character class and a few other places), and also allows you to
use normal comments there, too. As you can imagine, whitespace and comments
help a lot.

C</x> lets you turn this:

    s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

into this:

    s{ <                    # opening angle bracket
        (?:                 # Non-backreffing grouping paren
            [^>'"] *        # 0 or more things that are neither > nor ' nor "
                |           # or else
            ".*?"           # a section between double quotes (stingy match)
                |           # or else
            '.*?'           # a section between single quotes (stingy match)
        ) +                 # all occurring one or more times
        >                   # closing angle bracket
    }{}gsx;                 # replace with nothing, i.e. delete

It's still not quite as clear as prose, but it is very useful for
describing the meaning of each part of the pattern.

=item Different Delimiters

While we normally think of patterns as being delimited with C</>
characters, they can be delimited by almost any character. L<perlre>
describes this. For example, the C<s///> above uses braces as
delimiters. Selecting another delimiter can avoid quoting the
delimiter within the pattern:

    s/\/usr\/local/\/usr\/share/g;    # bad delimiter choice
    s#/usr/local#/usr/share#g;        # better

Using logically paired delimiters can be even more readable:

    s{/usr/local}{/usr/share}g;       # better still

=back

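As a small illustration of the C</x> technique, the sketch below wraps
the commented tag-stripping pattern in a subroutine. The name
C<strip_tags> and the sample input are assumptions made for this
example only, and the pattern is still not a real HTML parser.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name used only for this sketch.
sub strip_tags {
    my ($text) = @_;
    $text =~ s{
        <                 # opening angle bracket
        (?:               # non-capturing group
            [^>'"] *      # things that are neither > nor ' nor "
          |               # or else
            ".*?"         # a section between double quotes
          |               # or else
            '.*?'         # a section between single quotes
        ) +               # all occurring one or more times
        >                 # closing angle bracket
    }{}gsx;
    return $text;
}

# The > inside the quoted attribute does not end the tag early.
print strip_tags(qq{<a href="x>y">link</a> text\n});
```

Keeping the pattern in one commented place means a later reader can
check each alternative against its description.
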
=head2 I'm having trouble matching over more than one line. What's wrong?
X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>

Either you don't have more than one line in the string you're looking
at (probably), or else you aren't using the correct modifier(s) on
your pattern (possibly).

There are many ways to get multiline data into a string. If you want
it to happen automatically while reading input, you'll want to set C<$/>
(probably to C<''> for paragraphs or C<undef> for the whole file) to
allow you to read more than one line at a time.

Read L<perlre> to help you decide which of C</s> and C</m> (or both)
you might want to use: C</s> allows dot to include newline, and C</m>
allows caret and dollar to match next to a newline, not just at the
beginning and end of the string. You do need to make sure that you've
actually got a multiline string in there.

For example, this program detects duplicate words, even when they span
line breaks (but not paragraph ones). For this example, we don't need
C</s> because we aren't using dot in a regular expression that we want
to cross line boundaries. Neither do we need C</m> because we don't
want caret or dollar to match at any point inside the record next
to newlines. But it's imperative that C<$/> be set to something other
than the default, or else we won't actually ever have a multiline
record read in.

    $/ = '';  # read in whole paragraph, not just one line
    while ( <> ) {
        while ( /\b([\w'-]+)(\s+\g1)+\b/gi ) {  # word starts alpha
            print "Duplicate $1 at paragraph $.\n";
        }
    }

Here's code that finds lines that begin with "From " (which would
be mangled by many mailers):

    $/ = '';  # read in whole paragraph, not just one line
    while ( <> ) {
        while ( /^From /gm ) { # /m makes ^ match next to \n
            print "leading from in paragraph $.\n";
        }
    }

Here's code that finds everything between START and END in a file:

    undef $/;  # read in whole file, not just one line or paragraph
    while ( <> ) {
        while ( /START(.*?)END/sgm ) { # /s makes . cross line boundaries
            print "$1\n";
        }
    }

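To see the two modifiers side by side without reading any files, here
is a minimal sketch that works on an in-memory string. The sample text
and the variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text = "From here\nSTART one\ntwo END\nFrom there\n";

# Without /m, ^ matches only at the very start of the string.
my $plain = () = $text =~ /^From /g;

# With /m, ^ also matches just after each embedded newline.
my $multi = () = $text =~ /^From /gm;

# Without /s, dot stops at a newline, so START ... END is not found.
my $no_s = $text =~ /START(.*?)END/ ? 1 : 0;

# With /s, dot crosses the newline and the capture spans two lines.
my ($between) = $text =~ /START(.*?)END/s;

print "plain=$plain multi=$multi no_s=$no_s\n";
```
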
=head2 How can I pull out lines between two patterns that are themselves on different lines?
X<..>

You can use Perl's somewhat exotic C<..> operator (documented in
L<perlop>):

    perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

    perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of C<START> through C<END>, you'll
run up against the problem described in the question in this section
on matching balanced text.

Here's another example of using C<..>:

    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        # now choose between them
    } continue {
        $. = 0 if eof;  # fix $.
    }

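The same flip-flop behavior can be tried on an array of lines instead
of files. This is only a sketch; the C<@lines> data is invented for the
example.

```perl
use strict;
use warnings;

my @lines = ( "intro\n", "START\n", "kept\n", "END\n", "outro\n" );

my @picked;
for (@lines) {
    # The flip-flop becomes true on the START line and stays true
    # through the END line, inclusive.
    push @picked, $_ if /START/ .. /END/;
}

print @picked;
```

Note that both delimiter lines are included in the result; filter them
out afterwards if you want only the lines in between.
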
=head2 How do I match XML, HTML, or other nasty, ugly things with a regex?
X<regex, XML> X<regex, HTML> X<XML> X<HTML> X<pain> X<frustration>
X<sucking out, will to live>

(contributed by brian d foy)

If you just want to get work done, use a module and forget about the
regular expressions. The L<XML::Parser> and L<HTML::Parser> modules
are good starts, although each namespace has other parsing modules
specialized for certain tasks and different ways of doing it. Start at
CPAN Search ( L<http://search.cpan.org> ) and wonder at all the work people
have done for you already! :)

The problem with things such as XML is that they have balanced text
containing multiple levels of balanced text, but sometimes it isn't
balanced text, as in an empty tag (C<< <br/> >>, for instance). Even then,
things can occur out-of-order. Just when you think you've got a
pattern that matches your input, someone throws you a curveball.

If you'd like to do it the hard way, scratching and clawing your way
toward a right answer but constantly being disappointed, besieged by
bug reports, and weary from the inordinate amount of time you have to
spend reinventing a triangular wheel, then there are several things
you can try before you give up in frustration:

=over 4

=item * Solve the balanced text problem from another question in L<perlfaq6>

=item * Try the recursive regex features in Perl 5.10 and later. See L<perlre>

=item * Try defining a grammar using Perl 5.10's C<(?DEFINE)> feature.

=item * Break the problem down into sub-problems instead of trying to use a single regex

=item * Convince everyone not to use XML or HTML in the first place

=back

Good luck!

=head2 I put a regular expression into $/ but it didn't work. What's wrong?
X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
X<$RS, regexes in>

C<$/> has to be a string; Perl does not treat it as a regular
expression. You can use these examples if you really need to do this.

If you have L<File::Stream>, this is easy.

    use File::Stream;

    my $stream = File::Stream->new(
        $filehandle,
        separator => qr/\s*,\s*/,
        );

    print "$_\n" while <$stream>;

If you don't have File::Stream, you have to do a little more work.

You can use the four-argument form of sysread to continually add to
a buffer. After you add to the buffer, you check if you have a
complete record (using your regular expression).

    local $_ = "";
    while( sysread FH, $_, 8192, length ) {
        while( s/^((?s).*?)your_pattern// ) {
            my $record = $1;
            # do stuff here.
        }
    }

You can do the same thing with foreach and a match using the
c flag and the \G anchor, if you do not mind your entire file
being in memory at the end.

    local $_ = "";
    while( sysread FH, $_, 8192, length ) {
        foreach my $record ( m/\G((?s).*?)your_pattern/gc ) {
            # do stuff here.
        }
        substr( $_, 0, pos ) = "" if pos;
    }

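Here is a self-contained sketch of the buffer technique. It makes
several assumptions for the sake of a runnable example: an in-memory
filehandle stands in for a real file, C<read> is used in place of
C<sysread> (which does not work on in-memory filehandles), and the
chunk size is kept tiny so the loop runs more than once. Like the code
above, it can misjudge a separator that straddles a chunk boundary.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file in this sketch.
my $data = "alpha , beta,gamma ,delta";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $separator = qr/\s*,\s*/;    # the pattern we wish we could put in $/
my @records;
my $buffer = '';

# read() with an offset appends to the buffer, as sysread does above.
while ( read $fh, $buffer, 8, length $buffer ) {
    # Peel complete records off the front of the buffer.
    while ( $buffer =~ s/^(.*?)$separator//s ) {
        push @records, $1;
    }
}

# Whatever is left after the last separator is the final record.
push @records, $buffer if length $buffer;

print "$_\n" for @records;
```
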
=head2 How do I substitute case-insensitively on the LHS while preserving case on the RHS?
X<replace, case preserving> X<substitute, case preserving>
X<substitution, case preserving> X<s, case preserving>

Here's a lovely Perlish solution by Larry Rosler. It exploits
properties of bitwise xor on ASCII strings.

    $_= "this is a TEsT case";

    $old = 'test';
    $new = 'success';

    s{(\Q$old\E)}
    { uc $new | (uc $1 ^ $1) .
        (uc(substr $1, -1) ^ substr $1, -1) x
        (length($new) - length $1)
    }egi;

    print;

And here it is as a subroutine, modeled after the above:

    sub preserve_case($$) {
        my ($old, $new) = @_;
        my $mask = uc $old ^ $old;

        uc $new | $mask .
            substr($mask, -1) x (length($new) - length($old))
    }

    $string = "this is a TEsT case";
    $string =~ s/(test)/preserve_case($1, "success")/egi;
    print "$string\n";

This prints:

    this is a SUcCESS case

As an alternative, to keep the case of the replacement word if it is
longer than the original, you can use this code, by Jeff Pinyan:

    sub preserve_case {
        my ($from, $to) = @_;
        my ($lf, $lt) = map length, @_;

        if ($lt < $lf) { $from = substr $from, 0, $lt }
        else { $from .= substr $to, $lf }

        return uc $to | ($from ^ uc $from);
    }

This changes the sentence to "this is a SUcCess case."

Just to show that C programmers can write C in any programming language,
if you prefer a more C-like solution, the following script makes the
substitution have the same case, letter by letter, as the original.
(It also happens to run about 240% slower than the Perlish solution runs.)
If the substitution has more characters than the string being substituted,
the case of the last character is used for the rest of the substitution.

    # Original by Nathan Torkington, massaged by Jeffrey Friedl
    #
    sub preserve_case($$)
    {
        my ($old, $new) = @_;
        my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
        my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
        my ($len) = $oldlen < $newlen ? $oldlen : $newlen;

        for ($i = 0; $i < $len; $i++) {
            if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
                $state = 0;
            } elsif (lc $c eq $c) {
                substr($new, $i, 1) = lc(substr($new, $i, 1));
                $state = 1;
            } else {
                substr($new, $i, 1) = uc(substr($new, $i, 1));
                $state = 2;
            }
        }
        # finish up with any remaining new (for when new is longer than old)
        if ($newlen > $oldlen) {
            if ($state == 1) {
                substr($new, $oldlen) = lc(substr($new, $oldlen));
            } elsif ($state == 2) {
                substr($new, $oldlen) = uc(substr($new, $oldlen));
            }
        }
        return $new;
    }

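The xor trick in the Perlish solutions is easier to follow on a single
word. The sketch below assumes ASCII input and assumes the C<bitwise>
feature is not enabled, so that C<^> and C<|> act on strings; the
sample words are invented for the illustration.

```perl
use strict;
use warnings;

my $old = 'TEsT';

# Xoring a string with its uppercase form leaves "\0" where the letter
# was already uppercase and a space (0x20) where it was lowercase.
my $mask = uc($old) ^ $old;

# Oring that mask into an uppercase word sets the 0x20 bit in the same
# positions, which lowercases exactly those letters.
my $new = uc('best') | $mask;

printf "%vd\n", $mask;    # the mask bytes as decimal numbers
print "$new\n";
```
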
=head2 How can I make C<\w> match national character sets?
X<\w>

Put C<use locale;> in your script. The C<\w> character class is taken
from the current locale.

See L<perllocale> for details.

=head2 How can I match a locale-smart version of C</[a-zA-Z]/>?
X<alpha>

You can use the POSIX character class syntax C</[[:alpha:]]/>
documented in L<perlre>.

No matter which locale you are in, the alphabetic characters are
the characters in C<\w> without the digits and the underscore.
As a regex, that looks like C</[^\W\d_]/>. Its complement,
the non-alphabetics, is then everything in C<\W> along with
the digits and the underscore, or C</[\W\d_]/>.

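A quick way to convince yourself that the two spellings agree is to
run both against a handful of characters. This sketch uses only ASCII
test characters, chosen arbitrarily, and does not exercise any locale.

```perl
use strict;
use warnings;

my %is_alpha;
for my $c ( 'a', 'Z', '7', '_', '-', ' ' ) {
    my $posix   = $c =~ /\A[[:alpha:]]\z/ ? 1 : 0;    # POSIX class
    my $derived = $c =~ /\A[^\W\d_]\z/    ? 1 : 0;    # \w minus digits and _
    $is_alpha{$c} = [ $posix, $derived ];
    printf "%-4s posix=%d derived=%d\n", "'$c'", $posix, $derived;
}
```
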
=head2 How can I quote a variable to use in a regex?
X<regex, escaping> X<regexp, escaping> X<regular expression, escaping>

The Perl parser will expand $variable and @variable references in
regular expressions unless the delimiter is a single quote. Remember,
too, that the right-hand side of a C<s///> substitution is considered
a double-quoted string (see L<perlop> for more details). Remember
also that any regex special characters in the variable will be acted
on unless you precede the variable with C<\Q>. Here's an example:

    $string = "Placido P. Octopus";
    $regex  = "P.";

    $string =~ s/$regex/Polyp/;
    # $string is now "Polypacido P. Octopus"

Because C<.> is special in regular expressions, and can match any
single character, the regex C<P.> here has matched the C<Pl> in the
original string.

To escape the special meaning of C<.>, we use C<\Q>:

    $string = "Placido P. Octopus";
    $regex  = "P.";

    $string =~ s/\Q$regex/Polyp/;
    # $string is now "Placido Polyp Octopus"

The use of C<\Q> causes the C<.> in the regex to be treated as a
regular character, so that C<P.> matches a C<P> followed by a dot.

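The two substitutions can be put side by side in one short script,
together with C<quotemeta>, the function form of C<\Q>. The variable
names here are assumptions for the sketch.

```perl
use strict;
use warnings;

my $literal = 'P.';

my $unquoted = 'Placido P. Octopus';
$unquoted =~ s/$literal/Polyp/;      # "." matches any character

my $quoted = 'Placido P. Octopus';
$quoted =~ s/\Q$literal\E/Polyp/;    # "." matches only a literal dot

# quotemeta() returns the escaped string that \Q...\E would produce.
my $escaped = quotemeta $literal;

print "$unquoted\n$quoted\n$escaped\n";
```
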
=head2 What is C</o> really for?
X</o, regular expressions> X<compile, regular expressions>

(contributed by brian d foy)

The C</o> option for regular expressions (documented in L<perlop> and
L<perlreref>) tells Perl to compile the regular expression only once.
This is only useful when the pattern contains a variable. Perl 5.6
and later handle this automatically if the pattern does not change.

Since the match operator C<m//>, the substitution operator C<s///>,
and the regular expression quoting operator C<qr//> are double-quotish
constructs, you can interpolate variables into the pattern. See the
answer to "How can I quote a variable to use in a regex?" for more
details.

This example takes a regular expression from the argument list and
prints the lines of input that match it:

    my $pattern = shift @ARGV;

    while( <> ) {
        print if m/$pattern/;
    }

Versions of Perl prior to 5.6 would recompile the regular expression
for each iteration, even if C<$pattern> had not changed. The C</o>
would prevent this by telling Perl to compile the pattern the first
time, then reuse that for subsequent iterations:

    my $pattern = shift @ARGV;

    while( <> ) {
        print if m/$pattern/o; # useful for Perl < 5.6
    }

In versions 5.6 and later, Perl won't recompile the regular expression
if the variable hasn't changed, so you probably don't need the C</o>
option. It doesn't hurt, but it doesn't help either. If you want any
version of Perl to compile the regular expression only once even if
the variable changes (thus, only using its initial value), you still
need the C</o>.

You can watch Perl's regular expression engine at work to verify for
yourself if Perl is recompiling a regular expression. The C<use re
'debug'> pragma (comes with Perl 5.005 and later) shows the details.
With Perls before 5.6, you should see C<re> reporting that it's
compiling the regular expression on each iteration. With Perl 5.6 or
later, you should only see C<re> report that for the first iteration.

    use re 'debug';

    $regex = 'Perl';
    foreach ( qw(Perl Java Ruby Python) ) {
        print STDERR "-" x 73, "\n";
        print STDERR "Trying $_...\n";
        print STDERR "\t$_ is good!\n" if m/$regex/;
    }

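If the goal is simply to freeze a pattern at its initial value, a
C<qr//> object is one alternative to C</o>. This is a sketch of that
alternative rather than anything the text above prescribes; the
pattern and word list are made up for the example.

```perl
use strict;
use warnings;

my $pattern = 'P.rl';
my $regex   = qr/$pattern/;    # compiled here, once, from the current value

$pattern = 'Java';             # later changes do not affect $regex

my @hits = grep { $_ =~ $regex } qw(Perl Java Ruby Python);
print "@hits\n";
```

Because the compiled pattern lives in an ordinary variable, it is also
obvious to the reader when and where the compilation happened.
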
=head2 How do I use a regular expression to strip C-style comments from a file?

While this actually can be done, it's much harder than you'd think.
For example, this one-liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

will work in many but not all cases. You see, it's too simple-minded for
certain kinds of C programs, in particular, those with what appear to be
comments in quoted strings. For that, you'd need something like this,
created by Jeffrey Friedl and later modified by Fred Curtis.

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    print;

This could, of course, be more legibly written with the C</x> modifier, adding
whitespace and comments. Here it is expanded, courtesy of Fred Curtis.

    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment

     |         ##     OR  various things which aren't comments:

       (
         "           ##  Start of " ... " string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^"\\]        ##  Non "\
         )*
         "           ##  End of " ... " string

       |         ##     OR

         '           ##  Start of ' ... ' string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^'\\]        ##  Non '\
         )*
         '           ##  End of ' ... ' string

       |         ##     OR

         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;

A slight modification also removes C++ comments, possibly spanning multiple lines
using a continuation character:

    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

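To check that the longer pattern really leaves comment-like text in
strings alone, you can wrap it in a subroutine and feed it a small
sample. The wrapper name C<strip_c_comments> and the C fragment are
assumptions for this sketch; the substitution itself is the one shown
above for C comments.

```perl
use strict;
use warnings;

# strip_c_comments is a hypothetical wrapper name for this sketch.
sub strip_c_comments {
    my ($source) = @_;
    $source =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    return $source;
}

# The real comment goes away; the look-alike inside the string stays.
my $c = qq{int x; /* gone */ char *s = "/* kept */";\n};
print strip_c_comments($c);
```

The whitespace on either side of the removed comment is preserved, so
the output has two spaces where the comment used to be.
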
=head2 Can I use Perl regular expressions to match balanced text?
X<regex, matching balanced text> X<regexp, matching balanced text>
X<regular expression, matching balanced text> X<possessive> X<PARNO>
X<Text::Balanced> X<Regexp::Common> X<backtracking> X<recursion>

(contributed by brian d foy)

Your first try should probably be the L<Text::Balanced> module, which
is in the Perl standard library since Perl 5.8. It has a variety of
functions to deal with tricky text. The L<Regexp::Common> module can
also help by providing canned patterns you can use.

As of Perl 5.10, you can match balanced text with regular expressions
using recursive patterns. Before Perl 5.10, you had to resort to
various tricks such as using Perl code in C<(??{})> sequences.

Here's an example using a recursive regular expression. The goal is to
capture all of the text within angle brackets, including the text in
nested angle brackets. This sample text has two "major" groups: a
group with one level of nesting and a group with two levels of
nesting. There are five total groups in angle brackets:

    I have some <brackets in <nested brackets> > and
    <another group <nested once <nested twice> > >
    and that's it.

The regular expression to match the balanced text uses two new (to
Perl 5.10) regular expression features. These are covered in L<perlre>
and this example is a modified version of one in that documentation.

First, adding the new possessive C<+> to any quantifier finds the
longest match and does not backtrack. That's important since you want
to handle any angle brackets through the recursion, not backtracking.
The group C<< [^<>]++ >> finds one or more non-angle brackets without
backtracking.

Second, the new C<(?PARNO)> refers to the sub-pattern in the
particular capture group given by C<PARNO>. In the following regex,
the first capture group finds (and remembers) the balanced text, and
you need that same pattern within the first capture group to get past
the nested text. That's the recursive part. The C<(?1)> uses the pattern
in the outer capture group as an independent part of the regex.

Putting it all together, you have:

    #!/usr/local/bin/perl5.10.0

    my $string =<<"HERE";
    I have some <brackets in <nested brackets> > and
    <another group <nested once <nested twice> > >
    and that's it.
    HERE

    my @groups = $string =~ m/
            (                   # start of capture group 1
            <                   # match an opening angle bracket
                (?:
                    [^<>]++     # one or more non angle brackets, non backtracking
                      |
                    (?1)        # found < or >, so recurse to capture group 1
                )*
            >                   # match a closing angle bracket
            )                   # end of capture group 1
            /xg;

    $" = "\n\t";
    print "Found:\n\t@groups\n";

The output shows that Perl found the two major groups:

    Found:
        <brackets in <nested brackets> >
        <another group <nested once <nested twice> > >

With a little extra work, you can get all of the groups in angle
brackets even if they are in other angle brackets too. Each time you
get a balanced match, remove its outer delimiter (that's the one you
just matched so don't match it again) and add it to a queue of strings
to process. Keep doing that until you get no matches:

    #!/usr/local/bin/perl5.10.0

    my @queue =<<"HERE";
    I have some <brackets in <nested brackets> > and
    <another group <nested once <nested twice> > >
    and that's it.
    HERE

    my $regex = qr/
            (                   # start of bracket 1
            <                   # match an opening angle bracket
                (?:
                    [^<>]++     # one or more non angle brackets, non backtracking
                      |
                    (?1)        # recurse to bracket 1
                )*
            >                   # match a closing angle bracket
            )                   # end of bracket 1
            /x;

    $" = "\n\t";

    while( @queue ) {
        my $string = shift @queue;

        my @groups = $string =~ m/$regex/g;
        print "Found:\n\t@groups\n\n" if @groups;

        unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
    }

The output shows all of the groups. The outermost matches show up
first and the nested matches show up later:

    Found:
        <brackets in <nested brackets> >
        <another group <nested once <nested twice> > >

    Found:
        <nested brackets>

    Found:
        <nested once <nested twice> >

    Found:
        <nested twice>

=head2 What does it mean that regexes are greedy? How can I get around it?
X<greedy> X<greediness>

Most people mean that greedy regexes match as much as they can.
Technically speaking, it's actually the quantifiers (C<?>, C<*>, C<+>,
C<{}>) that are greedy rather than the whole pattern; Perl prefers local
greed and immediate gratification to overall greed. To get non-greedy
versions of the same quantifiers, use (C<??>, C<*?>, C<+?>, C<{}?>).

An example:

    $s1 = $s2 = "I am very very cold";
    $s1 =~ s/ve.*y //;      # I am cold
    $s2 =~ s/ve.*?y //;     # I am very cold

Notice how the second substitution stopped matching as soon as it
encountered "y ". The C<*?> quantifier effectively tells the regular
expression engine to find a match as quickly as possible and pass
control on to whatever is next in line, as you would if you were
playing hot potato.

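The difference is just as visible with captures. In this sketch (the
sample markup is invented for the example) the greedy quantifier runs
to the last C<< > >> in the string, while the non-greedy one and a
negated character class both stop at the first.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy)  = $html =~ /<(.+)>/;       # runs to the last > in the string
my ($lazy)    = $html =~ /<(.+?)>/;      # stops at the first >
my ($negated) = $html =~ /<([^>]+)>/;    # cannot cross a > at all

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";
```
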
=head2 How do I process each word on each line?
X<word>

Use the split function:

    while (<>) {
        foreach $word ( split ) {
            # do something with $word here
        }
    }

Note that these aren't really words in the English sense; they're just
chunks of consecutive non-whitespace characters.

To work with only alphanumeric sequences (including underscores), you
might consider

    while (<>) {
        foreach $word (m/(\w+)/g) {
            # do something with $word here
        }
    }

=head2 How can I print out a word-frequency or line-frequency summary?

To do this, you have to parse out each word in the input stream. We'll
pretend that by word you mean a chunk of alphabetics, hyphens, or
apostrophes, rather than the non-whitespace chunk idea of a word given
in the previous question:

    while (<>) {
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }

    while ( ($word, $count) = each %seen ) {
        print "$count $word\n";
    }

If you wanted to do the same thing for lines, you wouldn't need a
regular expression:

    while (<>) {
        $seen{$_}++;
    }

    while ( ($line, $count) = each %seen ) {
        print "$count $line";
    }

If you want these output in a sorted order, see L<perlfaq4>: "How do I
sort a hash (optionally by value instead of key)?".

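For a quick experiment, the word count and a sorted report fit in a
few lines. This sketch works on a literal string rather than input
files, and it folds case with C<lc>, which the code above does not do;
both are choices made for the example.

```perl
use strict;
use warnings;

my $text = "the cat saw the dog; the dog ran";

my %seen;
$seen{ lc $1 }++ while $text =~ /(\b[^\W_\d][\w'-]+\b)/g;

# Sort by descending count, then alphabetically so ties are stable.
my @report = map  { "$seen{$_} $_" }
             sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

print "$_\n" for @report;
```
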
=head2 How can I do approximate matching?
X<match, approximate> X<matching, approximate>

See the module L<String::Approx> available from CPAN.

=head2 How do I efficiently match many regular expressions at once?
X<regex, efficiency> X<regexp, efficiency>
X<regular expression, efficiency>

(contributed by brian d foy)

If you have Perl 5.10 or later, this is almost trivial. You just smart
match against an array of regular expression objects:

    my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

    if( $string ~~ @patterns ) {
        ...
    };

The smart match stops when it finds a match, so it doesn't have to try
every expression. Note, however, that smart match was marked
experimental in Perl 5.18 and deprecated in Perl 5.38, so the
loop-based approaches that follow are the safer choice for new code.

Before Perl 5.10, or without smart match, you have a bit of work to do.
You want to avoid compiling a regular expression every time you want to
match it. In this example, perl must recompile the regular expression
for every iteration of the C<foreach> loop since it has no way to know
what C<$pattern> will be:

    my @patterns = qw( foo bar baz );

    LINE: while( <DATA> ) {
        foreach $pattern ( @patterns ) {
            if( /\b$pattern\b/i ) {
                print;
                next LINE;
            }
        }
    }

The C<qr//> operator showed up in perl 5.005. It compiles a regular
expression, but doesn't apply it. When you use the pre-compiled
version of the regex, perl does less work. In this example, I inserted
a C<map> to turn each pattern into its pre-compiled form. The rest of
the script is the same, but faster:

    my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

    LINE: while( <> ) {
        foreach $pattern ( @patterns ) {
            if( /$pattern/ ) {
                print;
                next LINE;
            }
        }
    }

In some cases, you may be able to make several patterns into a single
regular expression. Beware of situations that require backtracking
though.

    my $regex = join '|', qw( foo bar baz );

    LINE: while( <> ) {
        print if /\b(?:$regex)\b/i;
    }

For more details on regular expression efficiency, see I<Mastering
Regular Expressions> by Jeffrey Friedl. He explains how regular
expression engines work and why some patterns are surprisingly
inefficient. Once you understand how perl applies regular expressions,
you can tune them for individual situations.

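The single-regex approach can be combined with C<qr//> and
C<quotemeta> when the alternatives are literal words. The following is
a sketch under that assumption, with an invented word list and sample
lines; if your alternatives are real patterns, leave out the
C<quotemeta> step.

```perl
use strict;
use warnings;

my @words = qw( foo bar baz );

# Escape each literal word, join the alternatives, and compile once.
my $combined = join '|', map { quotemeta($_) } @words;
my $regex    = qr/\b(?:$combined)\b/i;

my @lines   = ( "a Foo here", "nothing", "rebar only", "BAZ!" );
my @matched = grep { $_ =~ $regex } @lines;

print "$_\n" for @matched;
```

The C<\b> anchors sit outside the group, so "rebar" does not count as
a match for "bar".
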
780=head2 Why don't word-boundary searches with C<\b> work for me?
d74e8afc 781X<\b>
68dc0745 782
7678cced
RGS
783(contributed by brian d foy)
784
785Ensure that you know what \b really does: it's the boundary between a
786word character, \w, and something that isn't a word character. That
787thing that isn't a word character might be \W, but it can also be the
788start or end of the string.
789
790It's not (not!) the boundary between whitespace and non-whitespace,
791and it's not the stuff between words we use to create sentences.
792
793In regex speak, a word boundary (\b) is a "zero width assertion",
794meaning that it doesn't represent a character in the string, but a
795condition at a certain position.
796
797For the regular expression, /\bPerl\b/, there has to be a word
c93274ad 798boundary before the "P" and after the "l". As long as something other
7678cced
RGS
799than a word character precedes the "P" and succeeds the "l", the
800pattern will match. These strings match /\bPerl\b/.
801
a9feb6cb
CBW
802 "Perl" # no word char before P or after l
803 "Perl " # same as previous (space is not a word char)
804 "'Perl'" # the ' char is not a word char
805 "Perl's" # no word char before P, non-word char after "l"
7678cced
RGS
806
807These strings do not match /\bPerl\b/.
808
a9feb6cb
CBW
809 "Perl_" # _ is a word char!
810 "Perler" # no word char before P, but one after l
6670e5e7 811
c93274ad 812You don't have to use \b to match words though. You can look for
813non-word characters surrounded by word characters. These strings
7678cced
RGS
814match the pattern /\b'\b/.
815
a9feb6cb
CBW
816 "don't" # the ' char is surrounded by "n" and "t"
817 "qep'a'" # the ' char is surrounded by "p" and "a"
6670e5e7 818
7678cced 819These strings do not match /\b'\b/.
68dc0745 820
a9feb6cb 821 "foo'" # there is no word char after non-word '
6670e5e7 822
You can also use the complement of \b, \B, to specify that there
should not be a word boundary.

In the pattern /\Bam\B/, there must be a word character before the "a"
and after the "m". These strings match /\Bam\B/:

    "llama"    # "am" surrounded by word chars
    "Samuel"   # same

These strings do not match /\Bam\B/:

    "Sam"      # no word boundary before "a", but one after "m"
    "I am Sam" # "am" surrounded by non-word chars

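The match tables above can be checked mechanically. The following is a minimal sketch that filters the example strings through each pattern; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Example strings from the text, grouped by the pattern they exercise.
my @perl_strings  = ( "Perl", "Perl ", "'Perl'", "Perl's", "Perl_", "Perler" );
my @quote_strings = ( "don't", "qep'a'", "foo'" );
my @am_strings    = ( "llama", "Samuel", "Sam", "I am Sam" );

# Keep only the strings that each pattern matches.
my @perl_hits  = grep { /\bPerl\b/ } @perl_strings;
my @quote_hits = grep { /\b'\b/ } @quote_strings;
my @am_hits    = grep { /\Bam\B/ } @am_strings;

print "\\bPerl\\b matches: ", join( ", ", map { "[$_]" } @perl_hits ),  "\n";
print "\\b'\\b matches:    ", join( ", ", map { "[$_]" } @quote_hits ), "\n";
print "\\Bam\\B matches:   ", join( ", ", map { "[$_]" } @am_hits ),    "\n";
```

The brackets in the output make the trailing space in C<"Perl "> visible.
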
=head2 Why does using $&, $`, or $' slow my program down?
X<$MATCH> X<$&> X<$POSTMATCH> X<$'> X<$PREMATCH> X<$`>

(contributed by Anno Siegel)

Once Perl sees that you need one of these variables anywhere in the
program, it provides them on each and every pattern match. That means
that on every pattern match the entire string will be copied, part of it
to $`, part to $&, and part to $'. Thus the penalty is most severe with
long strings and patterns that match often. Avoid $&, $', and $` if you
can, but if you can't, once you've used them at all, use them at will
because you've already paid the price. Remember that some algorithms
really appreciate them. As of the 5.005 release, the $& variable is no
longer "expensive" the way the other two are.

Since Perl 5.6.1 the special variables @- and @+ can functionally replace
$`, $& and $'. These arrays contain the offsets of the beginning and end
of each match (see L<perlvar> for the full story), so they give you
essentially the same information, but without the risk of excessive
string copying.

Perl 5.10 added three specials, C<${^MATCH}>, C<${^PREMATCH}>, and
C<${^POSTMATCH}> to do the same job but without the global performance
penalty. Perl 5.10 only sets these variables if you compile or execute the
regular expression with the C</p> modifier.

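As a rough sketch of both alternatives, the code below rebuilds the prematch, match, and postmatch with C<substr> and the offsets in C<@-> and C<@+>, and then fetches the same three pieces with the C</p> modifier. The sample string and variable names are illustrative only, and the second half assumes Perl 5.10 or later.

```perl
use 5.010;
use strict;
use warnings;

my $string = "The quick brown fox";

# Alternative 1: $-[0] and $+[0] are the start and end offsets of the
# whole match, so substr() can rebuild all three pieces.
my ( $pre, $match, $post );
if ( $string =~ /quick/ ) {
    $pre   = substr( $string, 0, $-[0] );
    $match = substr( $string, $-[0], $+[0] - $-[0] );
    $post  = substr( $string, $+[0] );
}

# Alternative 2: the /p modifier makes the ${^...} variables available
# for this match only.
my ( $p_pre, $p_match, $p_post );
if ( $string =~ /quick/p ) {
    ( $p_pre, $p_match, $p_post )
        = ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
}

print "offsets: [$pre][$match][$post]\n";
print "/p:      [$p_pre][$p_match][$p_post]\n";
```
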
=head2 What good is C<\G> in a regular expression?
X<\G>

You use the C<\G> anchor to start the next match on the same
string where the last match left off. The regular
expression engine cannot skip over any characters to find
the next match with this anchor, so C<\G> is similar to the
beginning of string anchor, C<^>. The C<\G> anchor is typically
used with the C<g> flag. It uses the value of C<pos()>
as the position to start the next match. As the match
operator makes successive matches, it updates C<pos()> with the
position of the next character past the last match (or the
first character of the next match, depending on how you like
to look at it). Each string has its own C<pos()> value.

Suppose you want to match all of the consecutive pairs of digits
in a string like "1122a44" and stop matching when you
encounter non-digits. You want to match C<11> and C<22> but
the letter C<a> shows up between C<22> and C<44> and you want
to stop at C<a>. Simply matching pairs of digits skips over
the C<a> and still matches C<44>.

    $_ = "1122a44";
    my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )

If you use the C<\G> anchor, you force the match after C<22> to
start with the C<a>. The regular expression cannot match
there since it does not find a digit, so the next match
fails and the match operator returns the pairs it already
found.

    $_ = "1122a44";
    my @pairs = m/\G(\d\d)/g; # qw( 11 22 )

You can also use the C<\G> anchor in scalar context. You
still need the C<g> flag.

    $_ = "1122a44";
    while( m/\G(\d\d)/g ) {
        print "Found $1\n";
    }

After the match fails at the letter C<a>, perl resets C<pos()>
and the next match on the same string starts at the beginning.

    $_ = "1122a44";
    while( m/\G(\d\d)/g ) {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g; # finds "11"

You can disable C<pos()> resets on fail with the C<c> flag, documented
in L<perlop> and L<perlreref>. Subsequent matches start where the last
successful match ended (the value of C<pos()>) even if a match on the
same string has failed in the meantime. In this case, the match after
the C<while()> loop starts at the C<a> (where the last match stopped),
and since it does not use any anchor it can skip over the C<a> to find
C<44>.

    $_ = "1122a44";
    while( m/\G(\d\d)/gc ) {
        print "Found $1\n";
    }

    print "Found $1 after while" if m/(\d\d)/g; # finds "44"

Typically you use the C<\G> anchor with the C<c> flag
when you want to try a different match if one fails,
such as in a tokenizer. Jeffrey Friedl offers this example
which works in 5.004 or later.

    while (<>) {
        chomp;
        PARSER: {
            m/ \G( \d+\b    )/gcx && do { print "number: $1\n"; redo; };
            m/ \G( \w+      )/gcx && do { print "word: $1\n"; redo; };
            m/ \G( \s+      )/gcx && do { print "space: $1\n"; redo; };
            m/ \G( [^\w\d]+ )/gcx && do { print "other: $1\n"; redo; };
        }
    }

For each line, the C<PARSER> loop first tries to match a series
of digits followed by a word boundary. This match has to
start at the place the last match left off (or the beginning
of the string on the first match). Since C<m/ \G( \d+\b
)/gcx> uses the C<c> flag, if the string does not match that
regular expression, perl does not reset C<pos()> and the next
match starts at the same position to try a different
pattern.

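To watch C<pos()> move, here is a small sketch that records its value after each successful C<\G> match and after the failing one, with and without the C<c> flag. It reuses the "1122a44" string from above; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "1122a44";
my @positions;

# Each successful \G match moves pos() just past the matched pair.
while ( $text =~ m/\G(\d\d)/gc ) {
    push @positions, pos($text);
}

# With /c, the failed match at "a" leaves pos() where the last
# successful match ended.
my $after_fail_with_c = pos($text);

# Without /c, the failed match resets pos() to undef.
my $other = "1122a44";
1 while $other =~ m/\G(\d\d)/g;
my $after_fail_without_c = pos($other);

print "pos() after each match: @positions\n";
print "after failure with /c: $after_fail_with_c\n";
print "after failure without /c: ",
    defined $after_fail_without_c ? $after_fail_without_c : "undef", "\n";
```
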
=head2 Are Perl regexes DFAs or NFAs? Are they POSIX compliant?
X<DFA> X<NFA> X<POSIX>

While it's true that Perl's regular expressions resemble the DFAs
(deterministic finite automata) of the egrep(1) program, they are in
fact implemented as NFAs (non-deterministic finite automata) to allow
backtracking and backreferencing. And they aren't POSIX-style either,
because those guarantee worst-case behavior for all cases. (It seems
that some people prefer guarantees of consistency, even when what's
guaranteed is slowness.) See the book "Mastering Regular Expressions"
(from O'Reilly) by Jeffrey Friedl for all the details you could ever
hope to know on these matters (a full citation appears in
L<perlfaq2>).

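One visible consequence of the backtracking design is that Perl takes the first alternative that lets the overall match succeed, whereas POSIX matching asks for the longest match at the leftmost position. The sketch below, with a made-up test string, shows that the order of alternatives changes what Perl captures.

```perl
use strict;
use warnings;

my $string = "foobar";

# Perl tries alternatives left to right and keeps the first that works.
my ($short_first) = $string =~ /(foo|foobar)/;
my ($long_first)  = $string =~ /(foobar|foo)/;

print "foo|foobar captured: $short_first\n";   # foo
print "foobar|foo captured: $long_first\n";    # foobar
```
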
=head2 What's wrong with using grep in a void context?
X<grep>

The problem is that grep builds a return list, regardless of the context.
This means you're making Perl go to the trouble of building a list that
you then just throw away. If the list is large, you waste both time and space.
If your intent is to iterate over the list, then use a for loop for this
purpose.

In perls older than 5.8.1, map suffers from this problem as well.
But since 5.8.1, this has been fixed, and map is context aware - in void
context, no lists are constructed.

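The advice about using a for loop can be sketched as follows. The word list and variable names are invented for the example; the commented-out line shows the void-context form this answer warns against.

```perl
use strict;
use warnings;

my @words = qw( apple Banana cherry );
my @seen;

# Discouraged: grep used only for its side effect, in void context.
# grep { push @seen, lc } @words;

# Preferred: a for loop states the intent to iterate.
for my $word (@words) {
    push @seen, lc $word;
}

# grep is still the right tool when you want the list it returns.
my @capitalized = grep { /^[A-Z]/ } @words;

print "seen: @seen\n";
print "capitalized: @capitalized\n";
```
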
=head2 How can I match strings with multibyte characters?
X<regex, and multibyte characters> X<regexp, and multibyte characters>
X<regular expression, and multibyte characters> X<martian> X<encoding, Martian>

Starting from Perl 5.6 Perl has had some level of multibyte character
support. Perl 5.8 or later is recommended. Supported multibyte
character repertoires include Unicode, and legacy encodings
through the Encode module. See L<perluniintro>, L<perlunicode>,
and L<Encode>.

If you are stuck with older Perls, you can do Unicode with the
L<Unicode::String> module, and character conversions using the
L<Unicode::Map8> and L<Unicode::Map> modules. If you are using
Japanese encodings, you might try using jperl 5.005_03.

Finally, the following set of approaches was offered by Jeffrey
Friedl, whose article in issue #5 of The Perl Journal talks about
this very matter.

Let's suppose you have some weird Martian encoding where pairs of
ASCII uppercase letters encode single Martian letters (e.g. the two
bytes "CV" make a single Martian letter, as do the two bytes "SG",
"VS", "XX", etc.). Other bytes represent single characters, just like
ASCII.

So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the
nine characters 'I', ' ', 'a', 'm', ' ', 'CV', 'SG', 'XX', '!'.

Now, say you want to search for the single character C</GX/>. Perl
doesn't know about Martian, so it'll find the two bytes "GX" in the "I
am CVSGXX!" string, even though that character isn't there: it just
looks like it is because "SG" is next to "XX", but there's no real
"GX". This is a big problem.

Here are a few ways, all painful, to deal with it:

    # Make sure adjacent "martian" bytes are no longer adjacent.
    $martian =~ s/([A-Z][A-Z])/ $1 /g;

    print "found GX!\n" if $martian =~ /GX/;

Or like this:

    my @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
    # above is conceptually similar to:  my @chars = $text =~ m/(.)/g;
    #
    foreach my $char (@chars) {
        # parenthesize print's arguments so that "last" runs after it
        print("found GX!\n"), last if $char eq 'GX';
    }

Or like this:

    while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {  # \G probably unneeded
        print("found GX!\n"), last if $1 eq 'GX';
    }

Here's another, slightly less painful, way to do it from Benjamin
Goldberg, who uses a zero-width negative look-behind assertion.

    print "found GX!\n" if $martian =~ m/
        (?<![A-Z])
        (?:[A-Z][A-Z])*?
        GX
        /x;

This succeeds if the "martian" character GX is in the string, and fails
otherwise. If you don't like using (?<!), a zero-width negative
look-behind assertion, you can replace (?<![A-Z]) with (?:^|[^A-Z]).

It does have the drawback of putting the wrong thing in $-[0] and $+[0],
but this usually can be worked around.

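To see the false positive and two of the workarounds side by side, here is a small self-contained sketch. The second test string and the helper names C<has_gx_by_split> and C<has_gx_by_lookbehind> are made up for this example.

```perl
use strict;
use warnings;

my $without_gx = "I am CVSGXX!";   # characters CV, SG, XX: no real GX
my $with_gx    = "I am CVGXSG!";   # characters CV, GX, SG (invented string)

# A plain byte-level search is fooled by "SG" sitting next to "XX".
my $naive_false_hit = $without_gx =~ /GX/ ? 1 : 0;

# Split into Martian characters first, then compare whole characters.
sub has_gx_by_split {
    my ($martian) = @_;
    my @chars = $martian =~ m/([A-Z][A-Z]|.)/gs;
    return scalar grep { $_ eq 'GX' } @chars;
}

# The look-behind version only accepts GX after whole pairs of letters.
sub has_gx_by_lookbehind {
    my ($martian) = @_;
    return $martian =~ m/ (?<![A-Z]) (?:[A-Z][A-Z])*? GX /x ? 1 : 0;
}

for my $martian ( $without_gx, $with_gx ) {
    printf "%-14s split: %d  look-behind: %d\n",
        $martian,
        has_gx_by_split($martian),
        has_gx_by_lookbehind($martian);
}
```
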
=head2 How do I match a regular expression that's in a variable?
X<regex, in variable> X<eval> X<regex> X<quotemeta> X<\Q, regex>
X<\E, regex> X<qr//>

(contributed by brian d foy)

You don't have to hard-code patterns into the match operator (or
anything else that works with regular expressions). You can put the
pattern in a variable for later use.

The match operator is a double quote context, so you can interpolate
your variable just like a double quoted string. In this case, you
read the regular expression as user input and store it in C<$regex>.
Once you have the pattern in C<$regex>, you use that variable in the
match operator.

    chomp( my $regex = <STDIN> );

    if( $string =~ m/$regex/ ) { ... }

Any regular expression special characters in C<$regex> are still
special, and the pattern still has to be valid or Perl will complain.
For instance, in this pattern there is an unpaired parenthesis.

    my $regex = "Unmatched ( paren";

    "Two parens to bind them all" =~ m/$regex/;

When Perl compiles the regular expression, it treats the parenthesis
as the start of a capture group. When it doesn't find the closing
parenthesis, it complains:

    Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.

You can get around this in several ways depending on your situation.
First, if you don't want any of the characters in the string to be
special, you can escape them with C<quotemeta> before you use the string.

    chomp( my $regex = <STDIN> );
    $regex = quotemeta( $regex );

    if( $string =~ m/$regex/ ) { ... }

You can also do this directly in the match operator using the C<\Q>
and C<\E> sequences. The C<\Q> tells Perl where to start escaping
special characters, and the C<\E> tells it where to stop (see L<perlop>
for more details).

    chomp( my $regex = <STDIN> );

    if( $string =~ m/\Q$regex\E/ ) { ... }

Alternatively, you can use C<qr//>, the regular expression quote operator (see
L<perlop> for more details). It quotes and perhaps compiles the pattern,
and you can apply regular expression flags to the pattern.

    chomp( my $input = <STDIN> );

    my $regex = qr/$input/is;

    $string =~ m/$regex/;  # same as m/$input/is

You might also want to trap any errors by wrapping an C<eval> block
around the whole thing.

    chomp( my $input = <STDIN> );

    eval {
        # no \Q here: a quoted pattern can never fail to compile
        if( $string =~ m/$input/ ) { ... }
    };
    warn $@ if $@;

Or...

    my $regex = eval { qr/$input/is };
    if( defined $regex ) {
        $string =~ m/$regex/;
    }
    else {
        warn $@;
    }

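Putting these pieces together, the following sketch wraps the C<eval>/C<qr//> idiom in a small helper and contrasts it with C<quotemeta>. The helper name C<compile_pattern> and the sample strings are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled pattern, or undef when
# $source is not a valid regular expression.
sub compile_pattern {
    my ($source) = @_;
    my $regex = eval { qr/$source/ };
    return $regex;
}

my $good = compile_pattern('b(a+)r');
my $bad  = compile_pattern('Unmatched ( paren');

# quotemeta() turns the same text into a pattern that matches it literally.
my $literal = quotemeta('Unmatched ( paren');
my $found   = "An Unmatched ( paren here" =~ /$literal/ ? 1 : 0;

print "good pattern compiled\n" if defined $good;
print "bad pattern rejected\n"  if !defined $bad;
print "literal text found\n"    if $found;
```
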
=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.