escape sequences, e.g., C<\033>, or hexadecimal escape sequences,
e.g., C<\x1B>:
- "1000\t2000" =~ m(0\t2) # matches
- "cat" =~ /\143\x61\x74/ # matches in ASCII, but a weird way to spell cat
+ "1000\t2000" =~ m(0\t2) # matches
+ "cat" =~ /\143\x61\x74/ # matches in ASCII, but
+ # a weird way to spell cat
Regexes are treated mostly as double-quoted strings, so variable
substitution works:
/[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
Perl has several abbreviations for common character classes. (These
-definitions are those that Perl uses in ASCII mode with the C</a> modifier.
-See L<perlrecharclass/Backslash sequences> for details.)
+definitions are those that Perl uses in ASCII-safe mode with the C</a> modifier.
+Otherwise they could match many more non-ASCII Unicode characters as
+well. See L<perlrecharclass/Backslash sequences> for details.)
=over 4
In the last example, the end of the string is considered a word
boundary.
+For natural language processing (so that, for example, apostrophes are
+included in words), use C<\b{wb}> instead:
+
+ "don't" =~ / .+? \b{wb} /x; # matches the whole string
+
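As a rough sketch of the difference, assuming Perl 5.22 or later, the snippet below captures the shortest prefix of a word that ends at each kind of boundary. The variable names are only illustrative and do not come from the surrounding text.

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} is only available from Perl 5.22 onward

my $word = "don't";    # illustrative sample string

# Plain \b sees the apostrophe as a non-word character,
# so the first boundary after the start falls right after "don".
my ($plain) = $word =~ /(.+?)\b/;

# \b{wb} follows the Unicode word-boundary rules, which keep an
# apostrophe between letters inside the word.
my ($wb) = $word =~ /(.+?)\b{wb}/;

print "$plain\n";    # don
print "$wb\n";       # don't
```

With plain C<\b> the contraction is split at the apostrophe; with C<\b{wb}> it stays in one piece.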
=head2 Matching this or that
We can match different character strings with the B<alternation>
/(\w+)\s+\g1/; # match doubled words of arbitrary length
$year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
# than 4 digits
- $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3 digit dates
+ $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3 digit dates
These quantifiers will try to match as much of the string as possible,
while still allowing the regex to match. So we have
=head2 More matching
There are a few more things you might want to know about matching
-operators. In the code
-
- $pattern = 'Seuss';
- while (<>) {
- print if /$pattern/;
- }
-
-Perl has to re-evaluate C<$pattern> each time through the loop. If
-C<$pattern> won't be changing, use the C<//o> modifier, to only
-perform variable substitutions once. If you don't want any
-substitutions at all, use the special delimiter C<m''>:
-
- @pattern = ('Seuss');
- m/@pattern/; # matches 'Seuss'
- m'@pattern'; # matches the literal string '@pattern'
-
+operators.
The global modifier C<//g> allows the matching operator to match
within a string as many times as possible. In scalar context,
successive matches against a string will have C<//g> jump from match
print "$x $y\n"; # prints "I like dogs. I like cats."
$x = "Cats are great.";
- print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~ s/Frogs/Hedgehogs/r, "\n";
+ print $x =~ s/Cats/Dogs/r =~ s/Dogs/Frogs/r =~
+ s/Frogs/Hedgehogs/r, "\n";
# prints "Hedgehogs are great."
@foo = map { s/[a-z]/X/r } qw(a b c 1 2 3);
Since the first character of $x matched the regex, C<split> prepended
an empty initial element to the list.
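A minimal sketch of this behavior follows. The sample path is only an illustration; any string that begins with the separator should behave the same way.

```perl
use strict;
use warnings;

# The separator matches at position 0, so the field before it is empty.
my $x     = "/usr/bin/perl";    # illustrative sample string
my @parts = split m!/!, $x;

print scalar(@parts), "\n";       # 4
print join('|', @parts), "\n";    # |usr|bin|perl
```

The leading C<|> in the joined output is the empty first element showing through.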
+=head2 C<use re 'strict'>
+
+New in v5.22, this mode applies stricter rules than usual when compiling
+regular expression patterns. It can flag constructs that, while legal,
+may not be what you intended.
+
+See L<'strict' in re|re/'strict' mode>.
+
=head1 BUGS
None.