=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L<references|/"SEE ALSO"> section in this document.

=head1 OPERATORS

  =~ determines to which variable the regex is applied.
     In its absence, $_ is used.

        $var =~ /foo/;

  m/pattern/igmsoxc searches a string for a pattern match,
     applying the given options.

        i  case-Insensitive
        g  Global - all occurrences
        m  Multiline mode - ^ and $ match internal lines
        s  match as a Single line - . matches \n
        o  compile pattern Once
        x  eXtended legibility - free whitespace and comments
        c  don't reset pos on fails when using /g

     If 'pattern' is an empty string, the last I<successfully> matched
     regex is used. Delimiters other than '/' may be used for both this
     operator and the following ones.
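
As an illustration of the m// modifiers listed above, here is a minimal
sketch. The sample text and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz\n";

# /i ignores case; in list context, /g returns every match.
my @all = $text =~ /foo/ig;            # ('Foo', 'foo')

# /m lets ^ match at the start of each internal line.
my @starts = $text =~ /^(\w+)/mg;      # ('Foo', 'foo')

# /x allows whitespace and comments inside the pattern.
my ($second) = $text =~ /
    \w+ \s+     # first word and the whitespace after it
    (\w+)       # capture the second word
/x;

print "@all | @starts | $second\n";
```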

  qr/pattern/imsox lets you store a regex in a variable,
     or pass one around. Modifiers are as for m// and are stored
     within the regex.
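
A short sketch of qr// in use, assuming a made-up date pattern. It shows
that the modifiers given to qr// stay with the compiled regex when it is
interpolated into another pattern.

```perl
use strict;
use warnings;

# Hypothetical pattern: an ISO-style date, written with /x for legibility.
my $date = qr/ (\d{4}) - (\d{2}) - (\d{2}) /x;

# The stored regex can be used on its own ...
my @parts = "2004-02-29" =~ $date;                        # ('2004', '02', '29')

# ... or as part of a larger pattern that has no /x of its own.
my $due = "due: 2004-02-29" =~ /^due:\s+$date\z/ ? 1 : 0; # 1

# The /i stored in $word still applies inside the outer pattern.
my $word = qr/perl/i;
my $hit  = "I like PERL" =~ /like\s+$word/ ? 1 : 0;       # 1

print "@parts $due $hit\n";
```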

  s/pattern/replacement/igmsoxe substitutes matches of
     'pattern' with 'replacement'. Modifiers as for m//
     with one addition:

        e  Evaluate replacement as an expression

     'e' may be specified multiple times. 'replacement' is interpreted
     as a double quoted string unless a single-quote (') is the delimiter.
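
The following sketch contrasts /e with a single-quote delimiter. The input
string is invented for the example.

```perl
use strict;
use warnings;

my $stock = "3 apples and 12 pears";

# /e evaluates the replacement as Perl code; /g repeats it for every match.
(my $doubled = $stock) =~ s/(\d+)/$1 * 2/ge;   # "6 apples and 24 pears"

# With single-quote delimiters the replacement is taken literally,
# so the text "$1" is inserted rather than the captured digits.
(my $literal = $stock) =~ s'\d+'$1'g;

print "$doubled\n$literal\n";
```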

  ?pattern? is like m/pattern/ but matches only once. No alternate
     delimiters can be used. Must be reset with 'reset'.

=head1 SYNTAX

   \       Escapes the character(s) immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the expression preceding or following it
   \1, \2 ...  The text from the Nth group
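
A few of the constructs in the table above, sketched on an invented
sample line:

```perl
use strict;
use warnings;

my $line = "the the quick brown fox";

# (...) captures and \1 refers back to what the first group matched.
my ($dup) = $line =~ /\b(\w+) \1\b/;                  # 'the' (a doubled word)

# (?:...) groups the alternation without using up a capture number.
my ($animal) = $line =~ /(?:quick|slow) \w+ (\w+)/;   # 'fox'

# {n,m} bounds how often the preceding element may repeat.
my $short = "brown" =~ /^\w{3,5}\z/ ? 1 : 0;          # 1

print "$dup $animal $short\n";
```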

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \033     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Uppercase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification
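
A minimal sketch of \Q ... \E and the case escapes. The search text is a
hypothetical value chosen because it is full of metacharacters.

```perl
use strict;
use warnings;

my $needle = "a.b*c";

# Interpolated as-is, '.' and '*' keep their special meaning.
my $loose  = "aXbbbc" =~ /^$needle\z/     ? 1 : 0;   # 1

# \Q ... \E quotes them, so only the literal text matches.
my $strict = "aXbbbc" =~ /^\Q$needle\E\z/ ? 1 : 0;   # 0
my $exact  = "a.b*c"  =~ /^\Q$needle\E\z/ ? 1 : 0;   # 1

# The case escapes behave as they do in double-quoted strings.
my $name  = "pERL";
my $fixed = "\u\L$name\E";                           # "Perl"

print "$loose $strict $exact $fixed\n";
```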

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"
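
The four bracketed forms above, sketched against invented sample strings.
Each line collects every character that the class matches.

```perl
use strict;
use warnings;

my $s = "fig-jam";

my $set    = join '', "mayday" =~ /[amy]/g;   # "mayay" (the 'd' is skipped)
my $range  = join '', $s =~ /[f-j]/g;         # "figj"
my $dash   = join '', $s =~ /[f-j-]/g;        # "fig-j"
my $negate = join '', $s =~ /[^f-j]/g;        # "-am"

print "$set $range $dash $negate\n";
```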

The following work inside or outside a character class:

   \d      A digit, same as [0-9]
   \D      A nondigit, same as [^0-9]
   \w      A word character (alphanumeric), same as [a-zA-Z_0-9]
   \W      A non-word character, [^a-zA-Z_0-9]
   \s      A whitespace character, same as [ \t\n\r\f]
   \S      A non-whitespace character, [^ \t\n\r\f]
   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum             Alphanumeric
   alpha   IsAlpha             Alphabetic
   ascii   IsASCII             Any ASCII char
   blank   IsSpace  [ \t]      Horizontal whitespace (GNU)
   cntrl   IsCntrl             Control characters
   digit   IsDigit  \d         Digits
   graph   IsGraph             Alphanumeric and punctuation
   lower   IsLower             Lowercase chars (locale aware)
   print   IsPrint             Alphanumeric, punct, and space
   punct   IsPunct             Punctuation
   space   IsSpace  [\s\ck]    Whitespace
           IsSpacePerl   \s    Perl's whitespace definition
   upper   IsUpper             Uppercase chars (locale aware)
   word    IsWord   \w         Alphanumeric plus _ (Perl)
   xdigit  IsXDigit [\dA-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}
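
A sketch of the three notations side by side, on an invented string. Note
that the POSIX form is only valid inside a bracketed class, hence the
doubled brackets.

```perl
use strict;
use warnings;

my $s = "abc123";

my ($posix)   = $s =~ /([[:digit:]]+)/;    # "123"
my ($negated) = $s =~ /([[:^digit:]]+)/;   # "abc"
my ($trad)    = $s =~ /(\d+)/;             # "123"
my ($uni)     = $s =~ /(\p{IsDigit}+)/;    # "123"

# A POSIX class can be mixed with other members of the same class.
my ($mixed) = "id_42!" =~ /([[:alpha:]_\d]+)/;   # "id_42"

print "$posix $negated $trad $uni $mixed\n";
```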

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary
   \A Match string start (regardless of /m)
   \Z Match string end (preceding optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

The /c modifier (a modifier, not an anchor) suppresses resetting of the
search position when a /g match fails. Without /c, a failed match resets
the search position to the beginning of the string.
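
A minimal sketch of the string anchors, followed by a tiny tokenizer that
relies on \G together with the /g and /c modifiers. The input strings and
the token labels are invented for the example.

```perl
use strict;
use warnings;

my $str = "line one\nline two\n";

my $Z     = $str =~ /two\Z/ ? 1 : 0;   # 1: \Z allows one trailing newline
my $z     = $str =~ /two\z/ ? 1 : 0;   # 0: \z is the absolute end
my @caret = $str =~ /^line/mg;         # 2 matches: ^ honours /m
my @A     = $str =~ /\Aline/mg;        # 1 match:   \A ignores /m

# \G anchors each attempt where the previous /g match stopped.
# /c keeps that position when an alternative fails to match.
my $src = "12+345";
my @tokens;
while (1) {
    if    ($src =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                         { last }
}

print "$Z $z ", scalar(@caret), " ", scalar(@A), " @tokens\n";
```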

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must match exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})
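
The difference between the maximal and minimal forms, sketched on invented
strings:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ runs to the last '>' in the string.
my ($greedy)  = $html =~ /<(.+)>/;    # 'b>bold</b> and <i>italic</i'

# Minimal: .+? stops at the first '>' that lets the match succeed.
my ($minimal) = $html =~ /<(.+?)>/;   # 'b'

# The same applies to counted ranges.
my ($max) = "aaaa" =~ /(a{2,3})/;     # 'aaa'
my ($min) = "aaaa" =~ /(a{2,3}?)/;    # 'aa'

print "$greedy\n$minimal $max $min\n";
```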

=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?imxs-imsx:...) Enable/disable option (as per m//)
   (?=...)          Zero-width positive lookahead assertion
   (?!...)          Zero-width negative lookahead assertion
   (?<=...)         Zero-width positive lookbehind assertion
   (?<!...)         Zero-width negative lookbehind assertion
   (?>...)          Grab what we can, prohibit backtracking
   (?{ code })      Embedded code, return value becomes $^R
   (??{ code })     Dynamic regex, return value used as regex
   (?(cond)yes|no)  cond being integer corresponding to capturing parens
   (?(cond)yes)        or a lookaround/eval zero-width assertion
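
A sketch of some of these constructs in use. The price string and the
patterns are invented for illustration, not taken from any particular
program.

```perl
use strict;
use warnings;

my $s = "price: 100USD, 200EUR";

# Lookahead: digits followed by "USD", without consuming "USD".
my @usd = $s =~ /(\d+)(?=USD)/g;              # ('100')

# Negative lookahead: a whole number not followed by "USD".
my @other = $s =~ /(\d+)(?!\d|USD)/g;         # ('200')

# Lookbehind: digits preceded by the literal "price: ".
my ($first) = $s =~ /(?<=price: )(\d+)/;      # '100'

# (?i:...) switches on /i for just that group.
my $ci = "Price" =~ /^(?i:price)\z/ ? 1 : 0;  # 1

# (?>...) does not give characters back once it has matched.
my $plain  = "aaab" =~ /a+ab/     ? 1 : 0;    # 1
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;    # 0

# (?(1)...) requires ')' only if group 1 matched '('.
my @ok = grep { /^(\()?\d+(?(1)\))\z/ } "42", "(42)", "(42", "42)";

print "@usd @other $first $ci $plain $atomic @ok\n";
```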

=head1 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to matched string
   $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  hold the Xth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. [0] holds start of whole match
   @+    Offsets of ends of groups. [0] holds end of whole match

Capture groups are numbered according to their I<opening> paren.
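
A minimal sketch of the capture variables and of the offset arrays used as
substitutes for the three slow variables. The input string is invented.

```perl
use strict;
use warnings;

my $str = "say: key=value!";
my ($key, $value, $pre, $match, $post);

if ($str =~ /(\w+)=(\w+)/) {
    ($key, $value) = ($1, $2);                       # 'key', 'value'

    # Offsets from @- and @+ rebuild the pieces with substr.
    $pre   = substr($str, 0, $-[0]);                 # text before the match
    $match = substr($str, $-[0], $+[0] - $-[0]);     # the match itself
    $post  = substr($str, $+[0]);                    # text after the match
}

# Groups are numbered by their opening parenthesis, outermost first.
my @groups = "key=value" =~ /((\w+)=(\w+))/;

print "[$pre][$match][$post] $key $value | @groups\n";
```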

=head1 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string
   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts
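
A short sketch of several of these functions, using invented inputs:

```perl
use strict;
use warnings;

my @fields = split /\s*,\s*/, "a , b,c";   # ('a', 'b', 'c')
my $quoted = quotemeta("1+1=2");           # 1\+1\=2
my $name   = ucfirst(lc("pERL"));          # 'Perl'

my $s = "aXbXc";
$s =~ /X/g;                    # a /g match records where it stopped
my $after_first = pos($s);     # 2
pos($s) = 0;                   # rewind the match position ...
$s =~ /X/g;                    # ... so the first X is found again
my $after_rewind = pos($s);    # 2

print "@fields $quoted $name $after_first $after_rewind\n";
```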

=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.