X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/d35dd6c678badc24d545f8b7b7a3ebdf0fb0b355..4ee2b8db537d28b77d127a86307e426289e5c8b5:/pod/perlreref.pod
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index 5247a63..db7c173 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -21,7 +21,7 @@ false if the match succeeds, and true if it fails.
$var !~ /foo/;
-C searches a string for a pattern match,
+C searches a string for a pattern match,
applying the given options.
m Multiline mode - ^ and $ match internal lines
@@ -33,17 +33,24 @@ applying the given options.
o compile pattern Once
g Global - all occurrences
c don't reset pos on failed matches when using /g
+ a restrict \d, \s, \w and [:posix:] to match ASCII only
+ aa (two a's) also /i matches exclude ASCII/non-ASCII
+ l match according to current locale
+ u match according to Unicode rules
+ d match according to native rules unless something indicates
+ Unicode
+ n Non-capture mode. Don't let () fill in $1, $2, etc...
If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.
-C lets you store a regex in a variable,
+C lets you store a regex in a variable,
or pass one around. Modifiers as for C<m//>, and are stored
within the regex.
-C substitutes matches of
+C substitutes matches of
'pattern' with 'replacement'. Modifiers as for C<m//>,
with two additions:
@@ -101,6 +108,7 @@ These work as in normal strings.
\u Titlecase next character
\L Lowercase until \E
\U Uppercase until \E
+ \F Foldcase until \E
\Q Disable pattern metacharacters until \E
\E End modification
@@ -129,14 +137,13 @@ and L for details.
\S A non-whitespace character
\h A horizontal whitespace
\H A non horizontal whitespace
- \N A non newline (when not followed by '{NAME}'; experimental;
+ \N A non newline (when not followed by '{NAME}';
not valid in a character class; equivalent to [^\n]; it's
like '.' without /s modifier)
\v A vertical whitespace
\V A non vertical whitespace
\R A generic newline (?>\v|\x0D\x0A)
- \C Match a byte (with Unicode, '.' matches a character)
\pP Match P-named (Unicode) property
\p{...} Match Unicode property with name longer than 1 character
\PP Match non-P
@@ -150,7 +157,7 @@ POSIX character classes and their Unicode and Perl equivalents:
[[:...:]] \p{...} \p{...} sequence Description
-----------------------------------------------------------------------
- alnum PosixAlnum XPosixAlnum Alpha plus Digit
+ alnum PosixAlnum XPosixAlnum 'alpha' plus 'digit'
alpha PosixAlpha XPosixAlpha Alphabetic characters
ascii ASCII Any ASCII character
blank PosixBlank XPosixBlank \h Horizontal whitespace;
@@ -160,19 +167,18 @@ POSIX character classes and their Unicode and Perl equivalents:
extension)
cntrl PosixCntrl XPosixCntrl Control characters
digit PosixDigit XPosixDigit \d Decimal digits
- graph PosixGraph XPosixGraph Alnum plus Punct
+ graph PosixGraph XPosixGraph 'alnum' plus 'punct'
lower PosixLower XPosixLower Lowercase characters
- print PosixPrint XPosixPrint Graph plus Print, but
- not any Cntrls
+ print PosixPrint XPosixPrint 'graph' plus 'space',
+ but not any Controls
punct PosixPunct XPosixPunct Punctuation and Symbols
in ASCII-range; just
punct outside it
- space PosixSpace XPosixSpace [\s\cK] Whitespace
- PerlSpace XPerlSpace \s Perl's whitespace def'n
+ space PosixSpace XPosixSpace \s Whitespace
upper PosixUpper XPosixUpper Uppercase characters
- word PerlWord XPosixWord \w Alnum + Unicode marks +
- connectors, like '_'
- (Perl extension)
+ word PosixWord XPosixWord \w 'alnum' + Unicode marks
+ + connectors, like
+ '_' (Perl extension)
xdigit ASCII_Hex_Digit XPosixDigit Hexadecimal digit,
ASCII-range is
[0-9A-Fa-f]
@@ -192,6 +198,8 @@ All are zero-width assertions.
^ Match string start (or line, if /m is used)
$ Match string end (or line, if /m is used) or before newline
+ \b{} Match boundary of type specified within the braces
+ \B{} Match wherever \b{} doesn't match
\b Match word boundary (between \w and \W)
\B Match except at word boundary (between \w and \w or \W and \W)
\A Match string start (regardless of /m)
@@ -234,6 +242,7 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
(?<name>...) Named capture
(?'name'...) Named capture
(?P<name>...) Named capture (python syntax)
+ (?[...]) Extended bracketed character class
(?{ code }) Embedded code, return value becomes $^R
(??{ code }) Dynamic regex, return value used as regex
(?N) Recurse into subpattern number N
@@ -243,6 +252,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
(?P>name) Recurse into a named subpattern (python syntax)
(?(cond)yes|no)
(?(cond)yes) Conditional expression, where "cond" can be:
+ (?=pat) lookahead
+ (?!pat) negative lookahead
+ (?<=pat) lookbehind
+ (?<!pat) negative lookbehind
(N) subpattern N has matched something
(<name>) named subpattern has matched something
('name') named subpattern has matched something
@@ -264,6 +277,7 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
${^MATCH} Entire matched string
${^POSTMATCH} Everything after the matched string
+Note to those still using Perl 5.18 or earlier:
The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause slow down.
@@ -271,6 +285,7 @@ See also L. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.
+In Perl 5.20, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
$1, $2 ... hold the Xth captured expr
$+ Last parenthesized pattern match
@@ -289,6 +304,7 @@ Captured groups are numbered according to their I paren.
lcfirst Lowercase first char of a string
uc Uppercase a string
ucfirst Titlecase first char of a string
+ fc Foldcase a string
pos Return or set current match position
quotemeta Quote metacharacters
@@ -297,8 +313,9 @@ Captured groups are numbered according to their I paren.
split Use a regex to split a string into parts
-The first four of these are like the escape sequences C<\L>, C<\l>,
-C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; For
+Foldcase, see L</Foldcase>.
=head2 TERMINOLOGY
@@ -307,6 +324,12 @@ C<\U>, and C<\u>. For Titlecase, see L.
Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.
+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have complex one-to-many case mappings. Primarily a
+variant of lowercase.
+
=head1 AUTHOR
Iain Truskett. Updated by the Perl 5 Porters.
@@ -360,7 +383,7 @@ debugging.
=item *
-L
+L
=item *