X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/b3b85878703a83ab8f906188035b0be144ebdd9e..ba535ffe33d92fe0557b19c2d88ef45885ef313a:/pod/perlreref.pod

diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index c6cbe75..83c1316 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -21,7 +21,7 @@ false if the match succeeds, and true if it fails.

     $var !~ /foo/;

-C searches a string for a pattern match,
+C searches a string for a pattern match,
 applying the given options.

   m  Multiline mode - ^ and $ match internal lines
@@ -33,21 +33,28 @@ applying the given options.
   o  compile pattern Once
   g  Global - all occurrences
   c  don't reset pos on failed matches when using /g
+  a  restrict \d, \s, \w and [:posix:] to match ASCII only
+  aa (two a's) also /i matches exclude ASCII/non-ASCII
+  l  match according to current locale
+  u  match according to Unicode rules
+  d  match according to native rules unless something indicates
+     Unicode

 If 'pattern' is an empty string, the last I matched
 regex is used. Delimiters other than '/' may be used for both this
 operator and the following ones. The leading C can be omitted
 if the delimiter is '/'.

-C lets you store a regex in a variable,
+C lets you store a regex in a variable,
 or pass one around. Modifiers as for C, and are stored
 within the regex.

-C substitutes matches of
+C substitutes matches of
 'pattern' with 'replacement'. Modifiers as for C,
-with one addition:
+with two additions:

   e  Evaluate 'replacement' as an expression
+  r  Return substitution and leave the original string untouched.

 'e' may be specified multiple times. 'replacement' is interpreted
 as a double quoted string unless a single-quote (C<'>) is the delimiter.
@@ -57,25 +64,26 @@ delimiters can be used. Must be reset with reset().

 =head2 SYNTAX

-   \       Escapes the character immediately following it
-   .       Matches any single character except a newline (unless /s is used)
-   ^       Matches at the beginning of the string (or line, if /m is used)
-   $       Matches at the end of the string (or line, if /m is used)
-   *       Matches the preceding element 0 or more times
-   +       Matches the preceding element 1 or more times
-   ?       Matches the preceding element 0 or 1 times
-   {...}   Specifies a range of occurrences for the element preceding it
-   [...]   Matches any one of the characters contained within the brackets
-   (...)   Groups subexpressions for capturing to $1, $2...
-   (?:...) Groups subexpressions without capturing (cluster)
-   |       Matches either the subexpression preceding or following it
-   \1, \2, \3 ...           Matches the text from the Nth group
-   \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
-   \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
-   \g{name}     Named backreference
-   \k<name>     Named backreference
-   \k'name'     Named backreference
-   (?P=name)    Named backreference (python syntax)
+ \       Escapes the character immediately following it
+ .       Matches any single character except a newline (unless /s is
+           used)
+ ^       Matches at the beginning of the string (or line, if /m is used)
+ $       Matches at the end of the string (or line, if /m is used)
+ *       Matches the preceding element 0 or more times
+ +       Matches the preceding element 1 or more times
+ ?       Matches the preceding element 0 or 1 times
+ {...}   Specifies a range of occurrences for the element preceding it
+ [...]   Matches any one of the characters contained within the brackets
+ (...)   Groups subexpressions for capturing to $1, $2...
+ (?:...) Groups subexpressions without capturing (cluster)
+ |       Matches either the subexpression preceding or following it
+ \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
+ \1, \2, \3 ...           Matches the text from the Nth group
+ \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
+ \g{name}     Named backreference
+ \k<name>     Named backreference
+ \k'name'     Named backreference
+ (?P=name)    Named backreference (python syntax)

 =head2 ESCAPE SEQUENCES

@@ -87,17 +95,19 @@ These work as in normal strings.
    \n       Newline
    \r       Carriage return
    \t       Tab
-   \037     Any octal ASCII value
-   \x7f     Any hexadecimal ASCII value
-   \x{263a} A wide hexadecimal value
+   \037     Char whose ordinal is the 3 octal digits, max \777
+   \o{2307} Char whose ordinal is the octal number, unrestricted
+   \x7f     Char whose ordinal is the 2 hex digits, max \xFF
+   \x{263a} Char whose ordinal is the hex number, unrestricted
    \cx      Control-x
-   \N{name} A named character
+   \N{name} A named Unicode character or character sequence
    \N{U+263D} A Unicode character by hex ordinal

    \l  Lowercase next character
    \u  Titlecase next character

    \L  Lowercase until \E
    \U  Uppercase until \E
+   \F  Foldcase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification
@@ -124,13 +134,13 @@ and L for details.
    \W      A non-word character
    \s      A whitespace character
    \S      A non-whitespace character
-   \h      An horizontal white space
-   \H      A non horizontal white space
-   \N      A non newline (when not followed by '{NAME}'; experimental; not
-           valid in a character class; equivalent to [^\n]; it's like '.'
-           without /s modifier)
-   \v      A vertical white space
-   \V      A non vertical white space
+   \h      An horizontal whitespace
+   \H      A non horizontal whitespace
+   \N      A non newline (when not followed by '{NAME}'; experimental;
+           not valid in a character class; equivalent to [^\n]; it's
+           like '.' without /s modifier)
+   \v      A vertical whitespace
+   \V      A non vertical whitespace
    \R      A generic newline           (?>\v|\x0D\x0A)

    \C      Match a byte (with Unicode, '.' matches a character)

@@ -142,27 +152,46 @@ and L for details.

 POSIX character classes and their Unicode and Perl equivalents:

-   alnum   IsAlnum              Alphanumeric
-   alpha   IsAlpha              Alphabetic
-   ascii   IsASCII              Any ASCII char
-   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
-   cntrl   IsCntrl              Control characters
-   digit   IsDigit  \d          Digits
-   graph   IsGraph              Alphanumeric and punctuation
-   lower   IsLower              Lowercase chars (locale and Unicode aware)
-   print   IsPrint              Alphanumeric, punct, and space
-   punct   IsPunct              Punctuation
-   space   IsSpace  [\s\ck]     Whitespace
-           IsSpacePerl   \s     Perl's whitespace definition
-   upper   IsUpper              Uppercase chars (locale and Unicode aware)
-   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
-   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit
+            ASCII-         Full-
+   POSIX    range          range    backslash
+ [[:...:]]  \p{...}        \p{...}   sequence    Description
+
+ -----------------------------------------------------------------------
+ alnum   PosixAlnum       XPosixAlnum            Alpha plus Digit
+ alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
+ ascii   ASCII                                   Any ASCII character
+ blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
+                                                   full-range also
+                                                   written as
+                                                   \p{HorizSpace} (GNU
+                                                   extension)
+ cntrl   PosixCntrl       XPosixCntrl            Control characters
+ digit   PosixDigit       XPosixDigit   \d       Decimal digits
+ graph   PosixGraph       XPosixGraph            Alnum plus Punct
+ lower   PosixLower       XPosixLower            Lowercase characters
+ print   PosixPrint       XPosixPrint            Graph plus Print, but
+                                                   not any Cntrls
+ punct   PosixPunct       XPosixPunct            Punctuation and Symbols
+                                                   in ASCII-range; just
+                                                   punct outside it
+ space   PosixSpace       XPosixSpace   [\s\cK]
+         PerlSpace        XPerlSpace    \s       Perl's whitespace def'n
+ upper   PosixUpper       XPosixUpper            Uppercase characters
+ word    PosixWord        XPosixWord    \w       Alnum + Unicode marks
+                                                   + connectors, like '_'
+                                                   (Perl extension)
+ xdigit  ASCII_Hex_Digit  XPosixDigit            Hexadecimal digit,
+                                                   ASCII-range is
+                                                   [0-9A-Fa-f]
+
+Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
+in L

 Within a character class:

-    POSIX       traditional   Unicode
-    [:digit:]       \d        \p{IsDigit}
-    [:^digit:]      \D        \P{IsDigit}
+    POSIX      traditional   Unicode
+  [:digit:]       \d        \p{Digit}
+  [:^digit:]      \D        \P{Digit}

 =head2 ANCHORS

@@ -176,7 +205,6 @@ All are zero-width assertions.
    \Z  Match string end (before optional newline)
    \z  Match absolute string end
    \G  Match where previous m//g left off
-   \K  Keep the stuff left of the \K, don't include it in $&

 =head2 QUANTIFIERS

@@ -222,6 +250,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
    (?P>name)         Recurse into a named subpattern (python syntax)
    (?(cond)yes|no)
    (?(cond)yes)      Conditional expression, where "cond" can be:
+                     (?=pat)   look-ahead
+                     (?!pat)   negative look-ahead
+                     (?<=pat)  look-behind
+                     (?<!pat)  negative look-behind
                      (N)       subpattern N has matched something
                      (<name>)  named subpattern has matched something
                      ('name')  named subpattern has matched something
@@ -243,13 +275,15 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
    ${^MATCH}      Entire matched string
    ${^POSTMATCH}  Everything after to matched string

+Note to those still using Perl 5.16 or earlier:
 The use of C<$`>, C<$&> or C<$'> will slow down B regex use
-within your program. Consult L for C<@->
-to see equivalent expressions that won't cause slow down.
-See also L. Starting with Perl 5.10, you
+within your program. Consult L for C<@->
+to see equivalent expressions that won't cause slowdown.
+See also L. Starting with Perl 5.10, you
 can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
 and C<${^POSTMATCH}>, but for them to be defined, you have to
 specify the C</p> (preserve) modifier on your regular expression.
+In Perl 5.18, the use of C<$`>, C<$&> and C<$'> makes no speed difference.

    $1, $2 ...  hold the Xth captured expr
    $+    Last parenthesized pattern match
@@ -257,8 +291,8 @@ specify the C</p> (preserve) modifier on your regular expression.
    $^R   Holds the result of the last (?{...}) expr
    @-    Offsets of starts of groups. $-[0] holds start of whole match
    @+    Offsets of ends of groups. $+[0] holds end of whole match
-   %+    Named capture buffers
-   %-    Named capture buffers, as array refs
+   %+    Named capture groups
+   %-    Named capture groups, as array refs

 Captured groups are numbered according to their I paren.

@@ -268,6 +302,7 @@ Captured groups are numbered according to their I paren.
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
    ucfirst     Titlecase first char of a string
+   fc          Foldcase a string

    pos         Return or set current match position
    quotemeta   Quote metacharacters
@@ -276,8 +311,9 @@ Captured groups are numbered according to their I paren.

    split       Use a regex to split a string into parts

-The first four of these are like the escape sequences C<\L>, C<\l>,
-C<\U>, and C<\u>. For Titlecase, see L.
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>. For Titlecase, see L; For
+Foldcase, see L.

 =head2 TERMINOLOGY

@@ -286,6 +322,12 @@ C<\U>, and C<\u>. For Titlecase, see L.
 Unicode concept which most often is equal to uppercase, but for
 certain characters like the German "sharp s" there is a difference.

+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have complex one-to-many case mappings. Primarily a
+variant of lowercase.
+
 =head1 AUTHOR

 Iain Truskett. Updated by the Perl 5 Porters.
@@ -339,7 +381,7 @@ debugging.

 =item *

-L
+L

 =item *
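
Some of the features this patch documents (the /r substitution modifier, fc and \F foldcasing, and the /a modifier) can be illustrated with a short Perl sketch. It is not part of the patch. It assumes Perl 5.16 or later, since fc needs the 'fc' feature, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use feature 'fc';   # assumption: Perl 5.16+, where fc() is available

# /r returns the substituted copy and leaves the original untouched.
my $original = "Hello World";
my $changed  = $original =~ s/World/Perl/r;

# fc() folds case for caseless comparison; \F ... \E does the same
# inside a double-quoted string.
my $same_fc = fc("HELLO") eq fc("hello") ? 1 : 0;
my $same_F  = "\FHELLO\E" eq "\Fhello\E" ? 1 : 0;

# /a restricts \d to ASCII digits. Without it, \d also matches other
# Unicode decimal digits in a Unicode string.
my $arabic_three  = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $unicode_digit = $arabic_three =~ /^\d$/  ? 1 : 0;
my $ascii_digit   = $arabic_three =~ /^\d$/a ? 1 : 0;

print "$original -> $changed\n";
print "fc equal: $same_fc, \\F equal: $same_F\n";
print "\\d: $unicode_digit, \\d with /a: $ascii_digit\n";
```

The /r form is convenient in expressions such as map, where modifying the source string in place would be a side effect.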