=item p
X</p> X<regex, preserve> X<regexp, preserve>
-Preserve the string matched such that ${^PREMATCH}, {$^MATCH}, and
+Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
${^POSTMATCH} are available for use after matching.
=item g and c
=head3 Metacharacters
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L<Version 8 Regular Expressions> for
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
- equivalent to (?:\PM\pM*)
+ equivalent to (?>\PM\pM*)
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
\g{name} Named backreference
\k<name> Named backreference
\K Keep the stuff left of the \K, don't include it in $&
+ \N Any character but \n
\v Vertical whitespace
\V Not vertical whitespace
\h Horizontal whitespace
# this is not, and will generate a warning:
$string =~ /[:alpha:]/;
-The available classes and their backslash equivalents (if available) are
-as follows:
-X<character class>
+The following table shows the mapping of POSIX character class
+names, common escapes, literal escape sequences and their equivalent
+Unicode style property names.
+X<character class> X<\p> X<\p{}>
X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
- alpha
- alnum
- ascii
- blank [1]
- cntrl
- digit \d
- graph
- lower
- print
- punct
- space \s [2]
- upper
- word \w [3]
- xdigit
+B<Note:> up to Perl 5.10 the property names used were shared with
+standard Unicode properties, this was changed in Perl 5.11, see
+L<perl5110delta> for details.
+
+ POSIX Esc Class Property Note
+ --------------------------------------------------------
+ alnum [0-9A-Za-z] IsPosixAlnum
+ alpha [A-Za-z] IsPosixAlpha
+ ascii [\000-\177] IsASCII
+ blank [\011 ] IsPosixBlank [1]
+ cntrl [\0-\37\177] IsPosixCntrl
+ digit \d [0-9] IsPosixDigit
+ graph [!-~] IsPosixGraph
+ lower [a-z] IsPosixLower
+ print [ -~] IsPosixPrint
+ punct [!-/:-@[-`{-~] IsPosixPunct
+ space [\11-\15 ] IsPosixSpace [2]
+ \s [\11\12\14\15 ] IsPerlSpace [2]
+ upper [A-Z] IsPosixUpper
+ word \w [0-9A-Z_a-z] IsPerlWord [3]
+ xdigit [0-9A-Fa-f] IsXDigit
=over
=item [2]
-Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
-also the (very rare) "vertical tabulator", "\cK" or chr(11) in ASCII.
+Note that C<\s> and C<[[:space:]]> are B<not> equivalent as C<[[:space:]]>
+includes also the (very rare) "vertical tabulator", "\cK" or chr(11) in
+ASCII.
=item [3]
matches zero, one, any alphabetic character, and the percent sign.
-The following equivalences to Unicode \p{} constructs and equivalent
-backslash character classes (if available), will hold:
-X<character class> X<\p> X<\p{}>
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
- [[:...:]] \p{...} backslash
-
- alpha IsAlpha
- alnum IsAlnum
- ascii IsASCII
- blank
- cntrl IsCntrl
- digit IsDigit \d
- graph IsGraph
- lower IsLower
- print IsPrint
- punct IsPunct
- space IsSpace
- IsSpacePerl \s
- upper IsUpper
- word IsWord
- xdigit IsXDigit
-
-For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
-
-If the C<utf8> pragma is not used but the C<locale> pragma is, the
-classes correlate with the usual isalpha(3) interface (except for
-"word" and "blank").
+Modifier symbols (accents)
+
+
+=back
The other named classes are:
POSIX traditional Unicode
- [[:^digit:]] \D \P{IsDigit}
- [[:^space:]] \S \P{IsSpace}
- [[:^word:]] \W \P{IsWord}
+ [[:^digit:]] \D \P{IsPosixDigit}
+ [[:^space:]] \S \P{IsPosixSpace}
+ [[:^word:]] \W \P{IsPerlWord}
Perl respects the POSIX standard in that POSIX character classes are
only supported within a character class. The POSIX character classes
X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
In order to provide a safer and easier way to construct patterns using
-backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
-brackets are optional, however omitting them is less safe as the meaning
-of the pattern can be changed by text (such as digits) following it.
-When N is a positive integer the C<\g{N}> notation is exactly equivalent
-to using normal backreferences. When N is a negative integer then it is
-a relative backreference referring to the previous N'th capturing group.
-When the bracket form is used and N is not an integer, it is treated as a
-reference to a named buffer.
+backreferences, Perl provides the C<\g{N}> notation (starting with perl
+5.10.0). The curly brackets are optional, however omitting them is less
+safe as the meaning of the pattern can be changed by text (such as digits)
+following it. When N is a positive integer the C<\g{N}> notation is
+exactly equivalent to using normal backreferences. When N is a negative
+integer then it is a relative backreference referring to the previous N'th
+capturing group. When the bracket form is used and N is not an integer, it
+is treated as a reference to a named buffer.
Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
buffer before that. For example:
and would match the same as C</(Y) ( (X) \3 \1 )/x>.
-Additionally, as of Perl 5.10 you may use named capture buffers and named
+Additionally, as of Perl 5.10.0 you may use named capture buffers and named
backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
to reference. You may also use apostrophes instead of angle brackets to delimit the
name; and you may use the bracketed C<< \g{name} >> backreference syntax.
other two.
X<$&> X<$`> X<$'>
-As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
+As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
and C<$'>, B<except> that they are only guaranteed to be defined after a
successful match that was executed with the C</p> (preserve) modifier.
repetition of the previous word, assuming the C</x> modifier, and no C</i>
modifier outside this group.
+These modifiers do not carry over into named subpatterns called in the
+enclosing group. In other words, a pattern such as C<((?i)(&NAME))> does not
+change the case-sensitivity of the "NAME" pattern.
+
Note that the C<p> modifier is special in that it can only be enabled,
not disabled, and that its presence anywhere in a pattern has a global
effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
This is the "branch reset" pattern, which has the special property
that the capture buffers are numbered from the same starting point
-in each alternation branch. It is available starting from perl 5.10.
+in each alternation branch. It is available starting from perl 5.10.0.
Capture buffers are numbered from left to right, but inside this
construct the numbering is restarted for each branch.
value of C<$^R> is restored if the assertion is backtracked; compare
L<"Backtracking">.
-Due to an unfortunate implementation issue, the Perl code contained in these
-blocks is treated as a compile time closure that can have seemingly bizarre
-consequences when used with lexically scoped variables inside of subroutines
-or loops. There are various workarounds for this, including simply using
-global variables instead. If you are using this construct and strange results
-occur then check for the use of lexically scoped variables.
-
For reasons of security, this construct is forbidden if the regular
expression involves run-time interpolation of variables, unless the
perilous C<use re 'eval'> pragma has been used (see L<re>), or the
Better yet, use the carefully constrained evaluation within a Safe
compartment. See L<perlsec> for details about both these mechanisms.
-Because Perl's regex engine is currently not re-entrant, interpolated
-code may not invoke the regex engine either directly with C<m//> or C<s///>),
-or indirectly with functions such as C<split>.
+B<WARNING>: Use of lexical (C<my>) variables in these blocks is
+broken. The result is unpredictable and will make perl unstable. The
+workaround is to use global (C<our>) variables.
+
+B<WARNING>: Because Perl's regex engine is currently not re-entrant,
+interpolated code may not invoke the regex engine either directly with
+C<m//> or C<s///>), or indirectly with functions such as
+C<split>. Invoking the regex engine in these blocks will make perl
+unstable.
=item C<(??{ code })>
X<(??{})>
=head1 PCRE/Python Support
-As of Perl 5.10 Perl supports several Python/PCRE specific extensions
+As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions
to the regex syntax. While Perl programmers are encouraged to use the
-Perl specific syntax, the following are legal in Perl 5.10:
+Perl specific syntax, the following are also accepted:
=over 4