=item p
X</p> X<regex, preserve> X<regexp, preserve>
-Preserve the string matched such that ${^PREMATCH}, {$^MATCH}, and
+Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
${^POSTMATCH} are available for use after matching.
=item g and c
=head3 Metacharacters
-The patterns used in Perl pattern matching evolved from the ones supplied in
+The patterns used in Perl pattern matching evolved from those supplied in
the Version 8 regex routines. (The routines are derived
(distantly) from Henry Spencer's freely redistributable reimplementation
of the V8 routines.) See L<Version 8 Regular Expressions> for
\pP Match P, named property. Use \p{Prop} for longer names.
\PP Match non-P
\X Match eXtended Unicode "combining character sequence",
- equivalent to (?:\PM\pM*)
+ equivalent to (?>\PM\pM*)
\C Match a single C char (octet) even under Unicode.
NOTE: breaks up characters into their UTF-8 bytes,
so you may end up with malformed pieces of UTF-8.
\g{name} Named backreference
\k<name> Named backreference
\K Keep the stuff left of the \K, don't include it in $&
+ \N Any character but \n
\v Vertical whitespace
\V Not vertical whitespace
\h Horizontal whitespace
# this is not, and will generate a warning:
$string =~ /[:alpha:]/;
-The available classes and their backslash equivalents (if available) are
-as follows:
-X<character class>
+The following table shows the mapping of POSIX character class
+names, common escapes, literal escape sequences and their equivalent
+Unicode style property names.
+X<character class> X<\p> X<\p{}>
X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
- alpha
- alnum
- ascii
- blank [1]
- cntrl
- digit \d
- graph
- lower
- print
- punct
- space \s [2]
- upper
- word \w [3]
- xdigit
+B<Note:> up to Perl 5.10 the property names used were shared with
+standard Unicode properties, this was changed in Perl 5.11, see
+L<perl5110delta> for details.
+
+ POSIX Esc Class Property Note
+ --------------------------------------------------------
+ alnum [0-9A-Za-z] IsPosixAlnum
+ alpha [A-Za-z] IsPosixAlpha
+ ascii [\000-\177] IsASCII
+ blank [\011 ] IsPosixBlank [1]
+ cntrl [\0-\37\177] IsPosixCntrl
+ digit \d [0-9] IsPosixDigit
+ graph [!-~] IsPosixGraph
+ lower [a-z] IsPosixLower
+ print [ -~] IsPosixPrint
+ punct [!-/:-@[-`{-~] IsPosixPunct
+ space [\11-\15 ] IsPosixSpace [2]
+ \s [\11\12\14\15 ] IsPerlSpace [2]
+ upper [A-Z] IsPosixUpper
+ word \w [0-9A-Z_a-z] IsPerlWord [3]
+ xdigit [0-9A-Fa-f] IsXDigit
=over
=item [2]
-Not exactly equivalent to C<\s> since the C<[[:space:]]> includes
-also the (very rare) "vertical tabulator", "\cK" or chr(11) in ASCII.
+Note that C<\s> and C<[[:space:]]> are B<not> equivalent as C<[[:space:]]>
+includes also the (very rare) "vertical tabulator", "\cK" or chr(11) in
+ASCII.
=item [3]
matches zero, one, any alphabetic character, and the percent sign.
-The following equivalences to Unicode \p{} constructs and equivalent
-backslash character classes (if available), will hold:
-X<character class> X<\p> X<\p{}>
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
- [[:...:]] \p{...} backslash
-
- alpha IsAlpha
- alnum IsAlnum
- ascii IsASCII
- blank
- cntrl IsCntrl
- digit IsDigit \d
- graph IsGraph
- lower IsLower
- print IsPrint
- punct IsPunct
- space IsSpace
- IsSpacePerl \s
- upper IsUpper
- word IsWord
- xdigit IsXDigit
-
-For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
-
-If the C<utf8> pragma is not used but the C<locale> pragma is, the
-classes correlate with the usual isalpha(3) interface (except for
-"word" and "blank").
+Modifier symbols (accents)
+
+
+=back
The other named classes are:
POSIX traditional Unicode
- [[:^digit:]] \D \P{IsDigit}
- [[:^space:]] \S \P{IsSpace}
- [[:^word:]] \W \P{IsWord}
+ [[:^digit:]] \D \P{IsPosixDigit}
+ [[:^space:]] \S \P{IsPosixSpace}
+ [[:^word:]] \W \P{IsPerlWord}
Perl respects the POSIX standard in that POSIX character classes are
only supported within a character class. The POSIX character classes
repetition of the previous word, assuming the C</x> modifier, and no C</i>
modifier outside this group.
+These modifiers do not carry over into named subpatterns called in the
+enclosing group. In other words, a pattern such as C<((?i)(&NAME))> does not
+change the case-sensitivity of the "NAME" pattern.
+
Note that the C<p> modifier is special in that it can only be enabled,
not disabled, and that its presence anywhere in a pattern has a global
effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
value of C<$^R> is restored if the assertion is backtracked; compare
L<"Backtracking">.
-Due to an unfortunate implementation issue, the Perl code contained in these
-blocks is treated as a compile time closure that can have seemingly bizarre
-consequences when used with lexically scoped variables inside of subroutines
-or loops. There are various workarounds for this, including simply using
-global variables instead. If you are using this construct and strange results
-occur then check for the use of lexically scoped variables.
-
For reasons of security, this construct is forbidden if the regular
expression involves run-time interpolation of variables, unless the
perilous C<use re 'eval'> pragma has been used (see L<re>), or the
Better yet, use the carefully constrained evaluation within a Safe
compartment. See L<perlsec> for details about both these mechanisms.
-Because Perl's regex engine is currently not re-entrant, interpolated
-code may not invoke the regex engine either directly with C<m//> or C<s///>),
-or indirectly with functions such as C<split>.
+B<WARNING>: Use of lexical (C<my>) variables in these blocks is
+broken. The result is unpredictable and will make perl unstable. The
+workaround is to use global (C<our>) variables.
+
+B<WARNING>: Because Perl's regex engine is currently not re-entrant,
+interpolated code may not invoke the regex engine either directly with
+C<m//> or C<s///>), or indirectly with functions such as
+C<split>. Invoking the regex engine in these blocks will make perl
+unstable.
=item C<(??{ code })>
X<(??{})>