the string), and "$" will match before any newline. At the
cost of a little more overhead, you can do this by using the /m modifier
on the pattern match operator. (Older programs did this by setting C<$*>,
-but this option was removed in perl 5.9.)
+but this option was removed in perl 5.10.)
X<^> X<$> X</m>
To simplify multi-line substitutions, the "." character never matches a
B<WARNING>: This extended regular expression feature is considered
experimental, and may be changed without notice. Code executed that
has side effects may not perform identically from version to version
-due to the effect of future optimisations in the regex engine.
+due to the effect of future optimisations in the regex engine. The
+implementation of this feature was radically overhauled for the 5.18.0
+release, and its behaviour in earlier versions of perl was much buggier,
+especially in relation to parsing, lexical vars, scoping, recursion and
+reentrancy.
-This zero-width assertion evaluates any embedded Perl code. It
-always succeeds, and its C<code> is not interpolated. Currently,
-the rules to determine where the C<code> ends are somewhat convoluted.
+This zero-width assertion executes any embedded Perl code. It always
+succeeds, and its return value is set as C<$^R>.
-This feature can be used together with the special variable C<$^N> to
-capture the results of submatches in variables without having to keep
-track of the number of nested parentheses. For example:
+In literal patterns, the code is parsed at the same time as the
+surrounding code. While within the pattern, control is passed temporarily
+back to the perl parser, until the logically-balancing closing brace is
+encountered. This is similar to the way that an array index expression in
+a literal string is handled, for example:
- $_ = "The brown fox jumps over the lazy dog";
- /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
- print "color = $color, animal = $animal\n";
+ "abc$array[ 1 + f('[') + g()]def"
+
+In particular, braces do not need to be balanced:
+
+ s/abc(?{ f('{'); })/def/
+
+Even in a pattern that is interpolated and compiled at run-time, literal
+code blocks will be compiled once, at perl compile time; the following
+prints "ABCD":
+
+ print "D";
+ my $qr = qr/(?{ BEGIN { print "A" } })/;
+ my $foo = "foo";
+ /$foo$qr(?{ BEGIN { print "B" } })/;
+ BEGIN { print "C" }
+
+In patterns where the text of the code is derived from run-time
+information rather than appearing literally in a source code /pattern/,
+the code is compiled at the same time that the pattern is compiled, and
+for reasons of security, C<use re 'eval'> must be in scope. This is to
+stop user-supplied patterns containing code snippets from being
+executable.
+
+In situations where you need to enable this with C<use re 'eval'>, you should
+also have taint checking enabled. Better yet, use the carefully
+constrained evaluation within a Safe compartment. See L<perlsec> for
+details about both these mechanisms.
+
+From the viewpoint of parsing, lexical variable scope and closures,
+
+ /AAA(?{ BBB })CCC/
+
+behaves approximately like
+
+ /AAA/ && do { BBB } && /CCC/
+
+Similarly,
+
+ qr/AAA(?{ BBB })CCC/
+
+behaves approximately like
-Inside the C<(?{...})> block, C<$_> refers to the string the regular
+ sub { /AAA/ && do { BBB } && /CCC/ }
+
+In particular:
+
+ { my $i = 1; $r = qr/(?{ print $i })/ }
+ my $i = 2;
+ /$r/; # prints "1"
+
+Inside a C<(?{...})> block, C<$_> refers to the string the regular
expression is matching against. You can also use C<pos()> to know what is
the current position of matching within this string.
-The C<code> is properly scoped in the following sense: If the assertion
-is backtracked (compare L<"Backtracking">), all changes introduced after
-C<local>ization are undone, so that
+The code block introduces a new scope from the perspective of lexical
+variable declarations, but B<not> from the perspective of C<local> and
+similar localizing behaviours. So later code blocks within the same
+pattern will still see the values which were localized in earlier blocks.
+These accumulated localizations are undone either at the end of a
+successful match, or if the assertion is backtracked (compare
+L<"Backtracking">). For example,
$_ = 'a' x 8;
m<
# non-localized location.
>x;
-will set C<$res = 4>. Note that after the match, C<$cnt> returns to the globally
-introduced value, because the scopes that restrict C<local> operators
-are unwound.
+will initially increment C<$cnt> up to 8; then during backtracking, its
+value will be unwound back to 4, which is the value assigned to C<$res>.
+At the end of the regex execution, C<$cnt> will be wound back to its initial
+value of 0.
+
+This assertion may be used as the condition in a
+
+ (?(condition)yes-pattern|no-pattern)
-This assertion may be used as a C<(?(condition)yes-pattern|no-pattern)>
-switch. If I<not> used in this way, the result of evaluation of
-C<code> is put into the special variable C<$^R>. This happens
-immediately, so C<$^R> can be used from other C<(?{ code })> assertions
-inside the same regular expression.
+switch. If I<not> used in this way, the result of evaluation of C<code>
+is put into the special variable C<$^R>. This happens immediately, so
+C<$^R> can be used from other C<(?{ code })> assertions inside the same
+regular expression.
The assignment to C<$^R> above is properly localized, so the old
value of C<$^R> is restored if the assertion is backtracked; compare
L<"Backtracking">.
-For reasons of security, this construct is forbidden if the regular
-expression involves run-time interpolation of variables, unless the
-perilous C<use re 'eval'> pragma has been used (see L<re>), or the
-variables contain results of the C<qr//> operator (see
-L<perlop/"qr/STRINGE<sol>msixpodual">).
+Note that the special variable C<$^N> is particularly useful with code
+blocks to capture the results of submatches in variables without having to
+keep track of the number of nested parentheses. For example:
-This restriction is due to the wide-spread and remarkably convenient
-custom of using run-time determined strings as patterns. For example:
+ $_ = "The brown fox jumps over the lazy dog";
+ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
+ print "color = $color, animal = $animal\n";
- $re = <>;
- chomp $re;
- $string =~ /$re/;
-
-Before Perl knew how to execute interpolated code within a pattern,
-this operation was completely safe from a security point of view,
-although it could raise an exception from an illegal pattern. If
-you turn on the C<use re 'eval'>, though, it is no longer secure,
-so you should only do so if you are also using taint checking.
-Better yet, use the carefully constrained evaluation within a Safe
-compartment. See L<perlsec> for details about both these mechanisms.
-
-B<WARNING>: Use of lexical (C<my>) variables in these blocks is
-broken. The result is unpredictable and will make perl unstable. The
-workaround is to use global (C<our>) variables.
-
-B<WARNING>: In perl 5.12.x and earlier, the regex engine
-was not re-entrant, so interpolated code could not
-safely invoke the regex engine either directly with
-C<m//> or C<s///>), or indirectly with functions such as
-C<split>. Invoking the regex engine in these blocks would make perl
-unstable.
=item C<(??{ code })>
X<(??{})>
has side effects may not perform identically from version to version
due to the effect of future optimisations in the regex engine.
-This is a "postponed" regular subexpression. The C<code> is evaluated
-at run time, at the moment this subexpression may match. The result
-of evaluation is considered a regular expression and matched as
-if it were inserted instead of this construct. Note that this means
-that the contents of capture groups defined inside an eval'ed pattern
-are not available outside of the pattern, and vice versa, there is no
-way for the inner pattern returned from the code block to refer to a
-capture group defined outside. (The code block itself can use C<$1>, etc.,
-to refer to the enclosing pattern's capture groups.) Thus,
+This is a "postponed" regular subexpression. It behaves in I<exactly> the
+same way as a C<(?{ code })> code block as described above, except that
+its return value, rather than being assigned to C<$^R>, is treated as a
+pattern, compiled if it's a string (or used as-is if it's a C<qr//> object),
+then matched as if it were inserted instead of this construct.
- ('a' x 100)=~/(??{'(.)' x 100})/
+During the matching of this sub-pattern, it has its own set of
+captures which are valid during the sub-match, but are discarded once
+control returns to the main pattern. For example, the following matches,
+with the inner pattern capturing "B" and matching "BB", while the outer
+pattern captures "A":
+
+ my $inner = '(.)\1';
+ "ABBA" =~ /^(.)(??{ $inner })\1/;
+ print $1; # prints "A";
+
+Note that this means that there is no way for the inner pattern to refer
+to a capture group defined outside. (The code block itself can use C<$1>,
+etc., to refer to the enclosing pattern's capture groups.) Thus, although
-B<will> match, it will B<not> set $1.
+ ('a' x 100)=~/(??{'(.)' x 100})/
-The C<code> is not interpolated. As before, the rules to determine
-where the C<code> ends are currently somewhat convoluted.
+I<will> match, it will I<not> set C<$1> on exit.
The following pattern matches a parenthesized group:
See also C<(?PARNO)> for a different, more efficient way to accomplish
the same task.
-For reasons of security, this construct is forbidden if the regular
-expression involves run-time interpolation of variables, unless the
-perilous C<use re 'eval'> pragma has been used (see L<re>), or the
-variables contain results of the C<qr//> operator (see
-L<perlop/"qrE<sol>STRINGE<sol>msixpodual">).
-
-In perl 5.12.x and earlier, because the regex engine was not re-entrant,
-delayed code could not safely invoke the regex engine either directly with
-C<m//> or C<s///>), or indirectly with functions such as C<split>.
-
-Recursing deeper than 50 times without consuming any input string will
-result in a fatal error. The maximum depth is compiled into perl, so
-changing it requires a custom build.
+Executing a postponed regular expression 50 times without consuming any
+input string will result in a fatal error. The maximum depth is compiled
+into perl, so changing it requires a custom build.
=item C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)>
X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
X<regex, relative recursion>
-Similar to C<(??{ code })> except it does not involve compiling any code,
-instead it treats the contents of a capture group as an independent
-pattern that must match at the current position. Capture groups
-contained by the pattern will have the value as determined by the
-outermost recursion.
+Similar to C<(??{ code })> except that it does not involve executing any
+code or potentially compiling a returned pattern string; instead it treats
+the part of the current pattern contained within a specified capture group
+as an independent pattern that must match at the current position.
+Capture groups contained by the pattern will have the value as determined
+by the outermost recursion.
PARNO is a sequence of digits (not starting with 0) whose value reflects
the paren-number of the capture group to recurse to. C<(?R)> recurses to
for later use:
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
- if (/foo $parens \s+ + \s+ bar $parens/x) {
+ if (/foo $parens \s+ \+ \s+ bar $parens/x) {
# do something here...
}
a true value, matches C<no-pattern> otherwise. A missing pattern always
matches.
-C<(condition)> should be either an integer in
+C<(condition)> should be one of: 1) an integer in
parentheses (which is valid if the corresponding pair of parentheses
-matched), a look-ahead/look-behind/evaluate zero-width assertion, a
+matched); 2) a look-ahead/look-behind/evaluate zero-width assertion; 3) a
name in angle brackets or single quotes (which is valid if a group
-with the given name matched), or the special symbol (R) (true when
+with the given name matched); or 4) the special symbol (R) (true when
evaluated inside of recursion or eval). Additionally the R may be
followed by a number, (which will be true when evaluated when recursing
inside of the appropriate group), or by C<&NAME>, in which case it will
but
- / ( A (*THEN) B | C (*THEN) D ) /
+ / ( A (*THEN) B | C ) /
is not the same as
- / ( A (*PRUNE) B | C (*PRUNE) D ) /
+ / ( A (*PRUNE) B | C ) /
as after matching the A but failing on the B the C<(*THEN)> verb will
backtrack and try C; but the C<(*PRUNE)> verb will simply fail.