X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/64c5a5665d9d2e73526d93f8e1b8e0488ead3228..9f4a55d4a28d81a94e54fa1913ec5c7affbce6fe:/pod/perlreref.pod

diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index b9fb3b0..817b740 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -57,25 +57,26 @@ delimiters can be used. Must be reset with reset().
 
 =head2 SYNTAX
 
-  \                         Escapes the character immediately following it
-  .                         Matches any single character except a newline (unless /s is used)
-  ^                         Matches at the beginning of the string (or line, if /m is used)
-  $                         Matches at the end of the string (or line, if /m is used)
-  *                         Matches the preceding element 0 or more times
-  +                         Matches the preceding element 1 or more times
-  ?                         Matches the preceding element 0 or 1 times
-  {...}                     Specifies a range of occurrences for the element preceding it
-  [...]                     Matches any one of the characters contained within the brackets
-  (...)                     Groups subexpressions for capturing to $1, $2...
-  (?:...)                   Groups subexpressions without capturing (cluster)
-  |                         Matches either the subexpression preceding or following it
-  \1, \2, \3 ...            Matches the text from the Nth group
-  \g1 or \g{1}, \g2 ...     Matches the text from the Nth group
-  \g-1 or \g{-1}, \g-2 ...  Matches the text from the Nth previous group
-  \g{name}                  Named backreference
-  \k<name>                  Named backreference
-  \k'name'                  Named backreference
-  (?P=name)                 Named backreference (python syntax)
+  \                         Escapes the character immediately following it
+  .                         Matches any single character except a newline (unless /s is
+                            used)
+  ^                         Matches at the beginning of the string (or line, if /m is used)
+  $                         Matches at the end of the string (or line, if /m is used)
+  *                         Matches the preceding element 0 or more times
+  +                         Matches the preceding element 1 or more times
+  ?                         Matches the preceding element 0 or 1 times
+  {...}                     Specifies a range of occurrences for the element preceding it
+  [...]                     Matches any one of the characters contained within the brackets
+  (...)                     Groups subexpressions for capturing to $1, $2...
+  (?:...)                   Groups subexpressions without capturing (cluster)
+  |                         Matches either the subexpression preceding or following it
+  \1, \2, \3 ...            Matches the text from the Nth group
+  \g1 or \g{1}, \g2 ...     Matches the text from the Nth group
+  \g-1 or \g{-1}, \g-2 ...  Matches the text from the Nth previous group
+  \g{name}                  Named backreference
+  \k<name>                  Named backreference
+  \k'name'                  Named backreference
+  (?P=name)                 Named backreference (python syntax)
 
 =head2 ESCAPE SEQUENCES
 
@@ -92,6 +93,7 @@ These work as in normal strings.
   \x{263a}    A wide hexadecimal value
   \cx         Control-x
   \N{name}    A named character
+  \N{U+263D}  A Unicode character by hex ordinal
 
   \l          Lowercase next character
   \u          Titlecase next character
@@ -113,7 +115,7 @@ This one works differently from normal strings:
   [f-j-]  Dash escaped or at start or end means 'dash'
   [^f-j]  Caret indicates "match any character _except_ these"
 
-The following sequences work within or without a character class.
+The following sequences (except C<\N>) work within or without a character class.
 The first six are locale aware, all are Unicode aware. See L<perllocale>
 and L<perlunicode> for details.
 
@@ -123,42 +125,68 @@ and L<perlunicode> for details.
   \W       A non-word character
   \s       A whitespace character
   \S       A non-whitespace character
-  \h       An horizontal white space
-  \H       A non horizontal white space
-  \v       A vertical white space
-  \V       A non vertical white space
+  \h       An horizontal whitespace
+  \H       A non horizontal whitespace
+  \N       A non newline (when not followed by '{NAME}'; experimental;
+           not valid in a character class; equivalent to [^\n]; it's
+           like '.' without /s modifier)
+  \v       A vertical whitespace
+  \V       A non vertical whitespace
   \R       A generic newline (?>\v|\x0D\x0A)
   \C       Match a byte (with Unicode, '.' matches a character)
   \pP      Match P-named (Unicode) property
-  \p{...}  Match Unicode property with long name
+  \p{...}  Match Unicode property with name longer than 1 character
   \PP      Match non-P
-  \P{...}  Match lack of Unicode property with long name
-  \X       Match extended Unicode combining character sequence
+  \P{...}  Match lack of Unicode property with name longer than 1 char
+  \X       Match Unicode extended grapheme cluster
 
 POSIX character classes and their Unicode and Perl equivalents:
 
-  alnum   IsAlnum                    Alphanumeric
-  alpha   IsAlpha                    Alphabetic
-  ascii   IsASCII                    Any ASCII char
-  blank   IsSpace      [ \t]         Horizontal whitespace (GNU extension)
-  cntrl   IsCntrl                    Control characters
-  digit   IsDigit      \d            Digits
-  graph   IsGraph                    Alphanumeric and punctuation
-  lower   IsLower                    Lowercase chars (locale and Unicode aware)
-  print   IsPrint                    Alphanumeric, punct, and space
-  punct   IsPunct                    Punctuation
-  space   IsSpace      [\s\ck]       Whitespace
-          IsSpacePerl  \s            Perl's whitespace definition
-  upper   IsUpper                    Uppercase chars (locale and Unicode aware)
-  word    IsWord       \w            Alphanumeric plus _ (Perl extension)
-  xdigit  IsXDigit     [0-9A-Fa-f]   Hexadecimal digit
+          ASCII-           Full-
+          range            range      backslash
+  POSIX   \p{...}          \p{}       sequence   Description
+  -----------------------------------------------------------------------
+  alnum   PosixAlnum       Alnum                 Alpha plus Digit
+  alpha   PosixAlpha       Alpha                 Alphabetic characters
+  ascii   ASCII                                  Any ASCII character
+  blank   PosixBlank       Blank      \h         Horizontal whitespace;
+                                                 full-range also written
+                                                 as \p{HorizSpace} (GNU
+                                                 extension)
+  cntrl   PosixCntrl       Cntrl                 Control characters
+  digit   PosixDigit       Digit      \d         Decimal digits
+  graph   PosixGraph       Graph                 Alnum plus Punct
+  lower   PosixLower       Lower                 Lowercase characters
+  print   PosixPrint       Print                 Graph plus Print, but not
+                                                 any Cntrls
+  punct   PosixPunct       Punct                 These aren't precisely
+                                                 equivalent. See NOTE,
+                                                 below.
+  space   PosixSpace       Space      [\s\cK]    Whitespace
+          PerlSpace        SpacePerl  \s         Perl's whitespace
+                                                 definition
+  upper   PosixUpper       Upper                 Uppercase characters
+  word    PerlWord         Word       \w         Alnum plus '_' (Perl
+                                                 extension)
+  xdigit  ASCII_Hex_Digit  XDigit                Hexadecimal digit,
+                                                 ASCII-range is
+                                                 [0-9A-Fa-f]
+
+NOTE on C<[[:punct:]]>, C<\p{PosixPunct}> and C<\p{Punct}>:
+In the ASCII range, C<[[:punct:]]> and C<\p{PosixPunct}> match
+C<[-!"#$%&'()*+,./:;<=E<gt>?@[\\\]^_`{|}~]> (although if a locale is in
+effect, it could alter the behavior of C<[[:punct:]]>); and C<\p{Punct}>
+matches C<[-!"#%&'()*,./:;?@[\\\]_{}]>. When matching a UTF-8 string,
+C<[[:punct:]]> matches what it does in the ASCII range, plus what
+C<\p{Punct}> matches. C<\p{Punct}> matches, anything that isn't a
+control, an alphanumeric, a space, nor a symbol.
 
 Within a character class:
 
-  POSIX       traditional  Unicode
-  [:digit:]   \d           \p{IsDigit}
-  [:^digit:]  \D           \P{IsDigit}
+  POSIX       traditional  Unicode
+  [:digit:]   \d           \p{Digit}
+  [:^digit:]  \D           \P{Digit}
 
 =head2 ANCHORS
 
@@ -172,12 +200,11 @@ All are zero-width assertions.
   \Z  Match string end (before optional newline)
   \z  Match absolute string end
   \G  Match where previous m//g left off
-  \K  Keep the stuff left of the \K, don't include it in $&
 
 =head2 QUANTIFIERS
 
-Quantifiers are greedy by default -- match the B<longest> leftmost.
+Quantifiers are greedy by default and match the B<longest> leftmost.
 
   Maximal Minimal Possessive Allowed range
   ------- ------- ---------- -------------
@@ -193,7 +220,7 @@ The possessive forms (new in Perl 5.10) prevent backtracking:
 what gets matched by a pattern with a possessive quantifier will not be
 backtracked into, even if that causes the whole match to fail.
 
-There is no quantifier {,n} -- that gets understood as a literal string.
+There is no quantifier C<{,n}>. That's interpreted as a literal string.
 
 =head2 EXTENDED CONSTRUCTS
 
@@ -345,7 +372,7 @@ for details on regexes and internationalisation.
 =item *
 
 I<Mastering Regular Expressions> by Jeffrey Friedl
-(F) for a thorough grounding and
+(F) for a thorough grounding and
 reference on the topic.
 
 =back