X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/41ef34de832ea2283fbc3b35c6da583d89497204..4ee2b8db537d28b77d127a86307e426289e5c8b5:/pod/perlreref.pod
diff --git a/pod/perlreref.pod b/pod/perlreref.pod
index efae00c..db7c173 100644
--- a/pod/perlreref.pod
+++ b/pod/perlreref.pod
@@ -21,7 +21,7 @@ false if the match succeeds, and true if it fails.
 $var !~ /foo/;
-C searches a string for a pattern match,
+C searches a string for a pattern match,
 applying the given options.
 m Multiline mode - ^ and $ match internal lines
@@ -33,17 +33,24 @@ applying the given options.
 o compile pattern Once
 g Global - all occurrences
 c don't reset pos on failed matches when using /g
+ a restrict \d, \s, \w and [:posix:] to match ASCII only
+ aa (two a's) also /i matches exclude ASCII/non-ASCII
+ l match according to current locale
+ u match according to Unicode rules
+ d match according to native rules unless something indicates
+ Unicode
+ n Non-capture mode. Don't let () fill in $1, $2, etc...
 If 'pattern' is an empty string, the last I matched regex is used. Delimiters other than '/' may be used for both this operator and the following ones. The leading C can be omitted if the delimiter is '/'.
-C lets you store a regex in a variable,
+C lets you store a regex in a variable,
 or pass one around. Modifiers as for C, and are stored within the regex.
-C substitutes matches of
+C substitutes matches of
 'pattern' with 'replacement'. Modifiers as for C, with two additions:
@@ -101,6 +108,7 @@ These work as in normal strings.
 \u Titlecase next character
 \L Lowercase until \E
 \U Uppercase until \E
+ \F Foldcase until \E
 \Q Disable pattern metacharacters until \E
 \E End modification
@@ -129,14 +137,13 @@ and L for details.
 \S A non-whitespace character
 \h An horizontal whitespace
 \H A non horizontal whitespace
- \N A non newline (when not followed by '{NAME}'; experimental;
+ \N A non newline (when not followed by '{NAME}';;
 not valid in a character class; equivalent to [^\n]; it's like '.' without /s modifier)
 \v A vertical whitespace
 \V A non vertical whitespace
 \R A generic newline (?>\v|\x0D\x0A)
- \C Match a byte (with Unicode, '.' matches a character)
 \pP Match P-named (Unicode) property
 \p{...} Match Unicode property with name longer than 1 character
 \PP Match non-P
@@ -150,7 +157,7 @@ POSIX character classes and their Unicode and Perl equivalents:
 [[:...:]] \p{...} \p{...} sequence Description
 -----------------------------------------------------------------------
- alnum PosixAlnum XPosixAlnum Alpha plus Digit
+ alnum PosixAlnum XPosixAlnum 'alpha' plus 'digit'
 alpha PosixAlpha XPosixAlpha Alphabetic characters
 ascii ASCII Any ASCII character
 blank PosixBlank XPosixBlank \h Horizontal whitespace;
@@ -160,19 +167,18 @@ POSIX character classes and their Unicode and Perl equivalents:
 extension)
 cntrl PosixCntrl XPosixCntrl Control characters
 digit PosixDigit XPosixDigit \d Decimal digits
- graph PosixGraph XPosixGraph Alnum plus Punct
+ graph PosixGraph XPosixGraph 'alnum' plus 'punct'
 lower PosixLower XPosixLower Lowercase characters
- print PosixPrint XPosixPrint Graph plus Print, but
- not any Cntrls
+ print PosixPrint XPosixPrint 'graph' plus 'space',
+ but not any Controls
 punct PosixPunct XPosixPunct Punctuation and Symbols in
 ASCII-range; just punct outside it
- space PosixSpace XPosixSpace [\s\cK] Whitespace
- PerlSpace XPerlSpace \s Perl's whitespace def'n
+ space PosixSpace XPosixSpace \s Whitespace
 upper PosixUpper XPosixUpper Uppercase characters
- word PerlWord XPosixWord \w Alnum + Unicode marks +
- connectors, like '_'
- (Perl extension)
+ word PosixWord XPosixWord \w 'alnum' + Unicode marks +
+ connectors, like
+ '_' (Perl extension)
 xdigit ASCII_Hex_Digit XPosixDigit Hexadecimal digit,
 ASCII-range is [0-9A-Fa-f]
@@ -192,6 +198,8 @@ All are zero-width assertions.
 ^ Match string start (or line, if /m is used)
 $ Match string end (or line, if /m is used) or before newline
+ \b{} Match boundary of type specified within the braces
+ \B{} Match wherever \b{} doesn't match
 \b Match word boundary (between \w and \W)
 \B Match except at word boundary (between \w and \w or \W and \W)
 \A Match string start (regardless of /m)
@@ -234,6 +242,7 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
 (?...) Named capture
 (?'name'...) Named capture
 (?P...) Named capture (python syntax)
+ (?[...]) Extended bracketed character class
 (?{ code }) Embedded code, return value becomes $^R
 (??{ code }) Dynamic regex, return value used as regex
 (?N) Recurse into subpattern number N
@@ -243,10 +252,10 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
 (?P>name) Recurse into a named subpattern (python syntax)
 (?(cond)yes|no)
 (?(cond)yes) Conditional expression, where "cond" can be:
- (?=pat) look-ahead
- (?!pat) negative look-ahead
- (?<=pat) look-behind
- (?) named subpattern has matched something
 ('name') named subpattern has matched something
@@ -268,6 +277,7 @@ There is no quantifier C<{,n}>. That's interpreted as a literal string.
 ${^MATCH} Entire matched string
 ${^POSTMATCH} Everything after to matched string
+Note to those still using Perl 5.18 or earlier:
 The use of C<$`>, C<$&> or C<$'> will slow down B regex use within your program. Consult L for C<@-> to see equivalent expressions that won't cause slow down.
@@ -275,6 +285,7 @@ See also L.
 Starting with Perl 5.10, you can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}>, but for them to be defined, you have to specify the C</p> (preserve) modifier on your regular expression.
+In Perl 5.20, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
 $1, $2 ... hold the Xth captured expr
 $+ Last parenthesized pattern match
@@ -293,6 +304,7 @@ Captured groups are numbered according to their I paren.
 lcfirst Lowercase first char of a string
 uc Uppercase a string
 ucfirst Titlecase first char of a string
+ fc Foldcase a string
 pos Return or set current match position
 quotemeta Quote metacharacters
@@ -301,8 +313,9 @@ Captured groups are numbered according to their I paren.
 split Use a regex to split a string into parts
-The first four of these are like the escape sequences C<\L>, C<\l>,
-C<\U>, and C<\u>. For Titlecase, see L.
+The first five of these are like the escape sequences C<\L>, C<\l>,
+C<\U>, C<\u>, and C<\F>. For Titlecase, see L; For
+Foldcase, see L.
 =head2 TERMINOLOGY
@@ -311,6 +324,12 @@ C<\U>, and C<\u>. For Titlecase, see L.
 Unicode concept which most often is equal to uppercase, but for certain characters like the German "sharp s" there is a difference.
+=head3 Foldcase
+
+Unicode form that is useful when comparing strings regardless of case,
+as certain characters have complex one-to-many case mappings. Primarily a
+variant of lowercase.
 =head1 AUTHOR
 Iain Truskett. Updated by the Perl 5 Porters.
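Several additions in this patch (the /a and /n modifiers, fc and \F, and the ${^PREMATCH} family) are easy to check by hand. The script below is a minimal sketch that sits outside the diff itself. It assumes Perl 5.22 or later, because /n first appeared there (fc and \F need 5.16). The variable names are illustrative only.

```perl
#!/usr/bin/perl
# Sketch exercising features documented in the patch; assumes Perl 5.22+.
use v5.22;      # enables strict plus the 'say', 'fc' and 'unicode_strings' features
use warnings;

# /a: \d is restricted to ASCII digits. U+0663 is ARABIC-INDIC DIGIT THREE.
my $arabic_three  = "\x{0663}";
my $unicode_digit = $arabic_three =~ /\d/  ? 1 : 0;   # Unicode digit: matches
my $ascii_digit   = $arabic_three =~ /\d/a ? 1 : 0;   # not ASCII: no match

# /n: plain parentheses still group, but they no longer fill in $1, $2, ...
"ab" =~ /(a)(b)/n;
my $captured = defined $1 ? 1 : 0;

# fc and \F: full case folding, so U+00DF (sharp s) folds to "ss".
my $sharp   = "stra\x{DF}e";
my $same_fc = fc($sharp) eq fc("STRASSE")       ? 1 : 0;
my $same_F  = "\F$sharp\E" eq "\FSTRASSE\E"     ? 1 : 0;

# /p keeps ${^PREMATCH} and friends defined before 5.20; from 5.20 on it has no effect.
"hello world" =~ /o w/p;
my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});

say join ",", $unicode_digit, $ascii_digit, $captured, $same_fc, $same_F;  # prints 1,0,0,1,1
say "[$pre][$match][$post]";                                               # prints [hell][o w][orld]
```

The flags are collected as 0/1 values so that a single line of output shows how each rule behaved.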