=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

 =~ determines to which variable the regex is applied.
    In its absence, $_ is used.

      $var =~ /foo/;

 !~ determines to which variable the regex is applied,
    and negates the result of the match; it returns
    false if the match succeeds, and true if it fails.

      $var !~ /foo/;

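This is a minimal sketch of the two binding operators and the C<$_> default. The sample strings are arbitrary and were chosen only for illustration.

```perl
use strict;
use warnings;

my $var = "seafood";

# =~ binds the match to $var instead of $_
my $has_foo = ($var =~ /foo/) ? 1 : 0;     # 1: "seafood" contains "foo"

# !~ binds and negates: true only when the match fails
my $lacks_bar = ($var !~ /bar/) ? 1 : 0;   # 1: no "bar" in "seafood"

# With no binding operator, the match is applied to $_
my $count = 0;
for ("food", "drink", "foot") {
    $count++ if /foo/;                     # tests $_ on each pass
}

print "has_foo=$has_foo lacks_bar=$lacks_bar count=$count\n";
```
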
 m/pattern/igmsoxc searches a string for a pattern match,
    applying the given options.

      i  case-Insensitive
      g  Global - all occurrences
      m  Multiline mode - ^ and $ match internal lines
      s  match as a Single line - . matches \n
      o  compile pattern Once
      x  eXtended legibility - free whitespace and comments
      c  don't reset pos on failed matches when using /g

    If 'pattern' is an empty string, the last successfully matched
    regex is used. Delimiters other than '/' may be used for this
    operator and the following ones. The leading 'm' can be omitted
    if the delimiter is '/'.

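The sketch below applies several of the modifiers above to a single made-up string. It is illustrative only, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "Cat cat CAT\ndog";

# /i: case-insensitive; /g in list context returns every match
my @cats = $text =~ /cat/ig;                  # ("Cat", "cat", "CAT")

# /m: ^ also matches after each internal newline, so "dog" is found
my $dog_at_line_start = ($text =~ /^dog/m) ? 1 : 0;

# /s: . also matches "\n", so one match can span both lines
my $spans = ($text =~ /CAT.dog/s) ? 1 : 0;

# /x: whitespace and comments inside the pattern are ignored
my ($word) = $text =~ / ( [a-z]+ ) \z   # lowercase run at the very end
                      /x;                     # "dog"

print scalar(@cats), " $dog_at_line_start $spans $word\n";
```
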
 qr/pattern/imsox lets you store a regex in a variable,
    or pass one around. Modifiers as for m//; they are stored
    within the regex.

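This small example shows that a pattern compiled with C<qr//> keeps its C</i> flag in three situations: when used directly, when interpolated into a larger pattern, and when passed to a subroutine. The helper C<count_matches> is hypothetical and was written only for this example.

```perl
use strict;
use warnings;

# qr// compiles a pattern; the /i modifier is stored with it
my $word_re = qr/perl/i;

# A qr// object can be used directly as a pattern...
my $hit = ("I like Perl" =~ $word_re) ? 1 : 0;

# ...or interpolated into a larger pattern, keeping its own /i
my $version_re = qr/$word_re\s+(\d+)/;
my ($major) = "PERL 5" =~ $version_re;

# ...or passed to a subroutine like any other scalar
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $n = count_matches($word_re, "perl", "PERL", "python");

print "$hit $major $n\n";
```
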
 s/pattern/replacement/igmsoxe substitutes matches of
    'pattern' with 'replacement'. Modifiers as for m//,
    with one addition:

      e  Evaluate 'replacement' as an expression

    'e' may be specified multiple times; each extra 'e' evaluates
    the result again. 'replacement' is interpreted as a double-quoted
    string unless a single-quote (') is the delimiter.

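The following sketch shows a plain substitution, C</g>, C</e>, C</ee>, and the non-interpolating single-quote delimiter. It uses the C<(my $copy = $orig) =~ s///> idiom so that each result lands in a fresh variable and the original is left unchanged.

```perl
use strict;
use warnings;

my $s = "2 apples and 3 pears";

# Plain substitution: the replacement is a double-quoted string
(my $t = $s) =~ s/apples/oranges/;       # "2 oranges and 3 pears"

# /g replaces every match; $1 is interpolated into the replacement
(my $u = $s) =~ s/(\d+)/<$1>/g;          # "<2> apples and <3> pears"

# /e evaluates the replacement as Perl code
(my $v = $s) =~ s/(\d+)/$1 * 10/ge;      # "20 apples and 30 pears"

# /ee evaluates twice: the code '$x' yields the string '$x',
# and that string is then evaluated as code, giving 42
my $x = 42;
(my $w = "value: VAR") =~ s/VAR/'$x'/ee; # "value: 42"

# With ' as the delimiter, the replacement is not interpolated
(my $lit = "cost") =~ s'cost'$5';        # literally "$5"

print "$t\n$u\n$v\n$w\n$lit\n";
```
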
 ?pattern? is like m/pattern/ but matches only once. No alternate
    delimiters can be used. Must be reset with reset().

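This sketch of the match-once behaviour assumes a reasonably modern Perl. Since Perl 5.22 the bare C<?pattern?> spelling is no longer accepted, so the code writes C<m?pattern?> instead, which behaves the same way. The helper C<first_match> is hypothetical.

```perl
use strict;
use warnings;

# m?...? succeeds only once until reset() re-arms it
sub first_match {
    my @hits;
    for my $line (@_) {
        push @hits, $line if $line =~ m?^a\d?;
    }
    return @hits;
}

my @lines = ("a1", "b2", "a3");

my @first  = first_match(@lines);   # ("a1"): "a3" no longer matches
my @second = first_match(@lines);   # (): the one-time match is used up
reset;                              # re-arm m?? searches in this package
my @third  = first_match(@lines);   # ("a1") again

print scalar(@first), scalar(@second), scalar(@third), "\n";
```
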
=head2 SYNTAX

   \           Escapes the character immediately following it
   .           Matches any single character except a newline (unless /s is used)
   ^           Matches at the beginning of the string (or line, if /m is used)
   $           Matches at the end of the string (or line, if /m is used)
   *           Matches the preceding element 0 or more times
   +           Matches the preceding element 1 or more times
   ?           Matches the preceding element 0 or 1 times
   {...}       Specifies a range of occurrences for the element preceding it
   [...]       Matches any one of the characters contained within the brackets
   (...)       Groups subexpressions for capturing to $1, $2...
   (?:...)     Groups subexpressions without capturing (cluster)
   |           Matches either the subexpression preceding or following it
   \1, \2 ...  The text from the Nth group

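Here are a few of the metacharacters above in action. The patterns and inputs are illustrative and do not come from any particular program.

```perl
use strict;
use warnings;

# Doubled word: \b anchors, (\w+) captures, \1 reuses the captured text
my $doubled = "this is is a test" =~ /\b(\w+)\s+\1\b/ ? $1 : "";

# Alternation inside a non-capturing group, repeated exactly twice
my $ok = "catcat" =~ /^(?:cat|dog){2}$/ ? 1 : 0;

# \ escapes a metacharacter: \. is a literal dot, . matches anything
my $literal = "a.c" =~ /^a\.c$/ ? 1 : 0;   # 1
my $escaped = "abc" =~ /^a\.c$/ ? 1 : 0;   # 0: no literal dot in "abc"
my $any     = "abc" =~ /^a.c$/  ? 1 : 0;   # 1

# ? makes the preceding element optional
my @colors = grep { /^colou?r$/ } ("color", "colour", "colouur");

print "$doubled $ok $literal $escaped $any @colors\n";
```
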
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

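This sketch demonstrates C<\Q...\E>, the case-changing escapes, and the two meanings of C<\b>. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $needle = "1+1";

# Without \Q, "+" is a quantifier: /^1+1/ means "one or more 1s, then 1"
my $raw    = "1+1=2" =~ /^$needle/     ? 1 : 0;   # 0
# \Q...\E escapes the interpolated metacharacters, so "+" is literal
my $quoted = "1+1=2" =~ /^\Q$needle\E/ ? 1 : 0;   # 1

# \U...\E and \u change the case of interpolated text
my $name  = "perl";
my $shout = "\U$name\E rocks";                    # "PERL rocks"
my $title = "\u$name";                            # "Perl"

# Character escapes such as \x{..} work inside patterns too
my $tabbed = "a\tb" =~ /a\x{09}b/ ? 1 : 0;        # 1: \x{09} is a tab

# \b is a word boundary in a pattern, but a backspace inside [...]
my $boundary  = "a b"  =~ /a\b/     ? 1 : 0;      # 1
my $backspace = "x\by" =~ /x[\b]y/  ? 1 : 0;      # 1: "\b" in a string is BS

print "$raw $quoted $shout $title $tabbed $boundary $backspace\n";
```
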
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following work within or without a character class:

   \d       A digit, same as [0-9]
   \D       A nondigit, same as [^0-9]
   \w       A word character (alphanumeric plus '_'), same as [a-zA-Z0-9_]
   \W       A non-word character, [^a-zA-Z0-9_]
   \s       A whitespace character, same as [ \t\n\r\f]
   \S       A non-whitespace character, [^ \t\n\r\f]
   \C       Match a byte (with Unicode, '.' matches a character)
   \pP      Match P-named (Unicode) property
   \p{...}  Match Unicode property with long name
   \PP      Match non-P
   \P{...}  Match lack of Unicode property with long name
   \X       Match extended Unicode combining character sequence

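This example runs the bracketed classes and the shorthand escapes against one made-up string. The results shown assume plain ASCII input.

```perl
use strict;
use warnings;

my $s = "f-16 jet, Model X9";

my @range  = $s =~ /([f-j])/g;          # letters f..j: ("f", "j")
my @dash   = $s =~ /([x9-])/g;          # a dash last in the class is literal
my @notfj  = "fghijk" =~ /([^f-j])/g;   # everything except f..j: ("k")
my @digits = $s =~ /(\d+)/g;            # ("16", "9")
my @words  = $s =~ /(\w+)/g;            # ("f", "16", "jet", "Model", "X9")
my $spaces = () = $s =~ /\s/g;          # count of whitespace characters: 3
my $upper  = () = "abcDEF" =~ /\p{IsUpper}/g;   # Unicode property: 3

print "@range | @dash | @notfj | @digits | @words | $spaces | $upper\n";
```
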
POSIX character classes and their Unicode and Perl equivalents:

   POSIX   Unicode      Perl        Description
   alnum   IsAlnum                  Alphanumeric
   alpha   IsAlpha                  Alphabetic
   ascii   IsASCII                  Any ASCII char
   blank   IsSpace      [ \t]       Horizontal whitespace (GNU)
   cntrl   IsCntrl                  Control characters
   digit   IsDigit      \d          Digits
   graph   IsGraph                  Alphanumeric and punctuation
   lower   IsLower                  Lowercase chars (locale aware)
   print   IsPrint                  Alphanumeric, punct, and space
   punct   IsPunct                  Punctuation
   space   IsSpace      [\s\ck]     Whitespace
           IsSpacePerl  \s          Perl's whitespace definition
   upper   IsUpper                  Uppercase chars (locale aware)
   word    IsWord       \w          Alphanumeric plus _ (Perl)
   xdigit  IsXDigit     [\dA-Fa-f]  Hexadecimal digit

POSIX classes are only valid within a character class, as in
C<[[:digit:]]>:

   POSIX        traditional   Unicode
   [:digit:]    \d            \p{IsDigit}
   [:^digit:]   \D            \P{IsDigit}

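Because POSIX classes only work inside a bracketed class, the pattern below is written C<[[:digit:]]> rather than C<[:digit:]>. Here is a short sketch with made-up inputs:

```perl
use strict;
use warnings;

my $s = "Room 101b, floor 3";

# [[:digit:]] matches the same characters as \d for this ASCII input
my @posix = $s =~ /([[:digit:]]+)/g;              # ("101", "3")
my @perl  = $s =~ /(\d+)/g;                       # same result

# [:^digit:] negates the class, like \D
my $nondigits = () = "a1b2" =~ /[[:^digit:]]/g;   # 2

# A POSIX class can share a bracket expression with other characters
my @hexish = "0xBEEF zz" =~ /([[:xdigit:]x]+)/g;  # ("0xBEEF")

print "@posix | @perl | $nondigits | @hexish\n";
```
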
=head2 ANCHORS

All are zero-width assertions.

   ^   Match string start (or line, if /m is used)
   $   Match string end (or line, if /m is used) or before newline
   \b  Match word boundary (between \w and \W)
   \B  Match except at word boundary (between \w and \w or \W and \W)
   \A  Match string start (regardless of /m)
   \Z  Match string end (before optional newline)
   \z  Match absolute string end
   \G  Match where previous m//g left off

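The anchors are easiest to tell apart when they are compared side by side. This sketch uses an arbitrary two-line string, plus a hypothetical comma-separated input for C<\G>.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# ^ with /m matches at every line start; \A only at the string start
my @starts_m = $text =~ /^(\w+)/mg;         # ("first", "second")
my @starts_A = $text =~ /\A(\w+)/mg;        # ("first")

# $ and \Z also match just before a final newline; \z does not
my $dollar  = $text =~ /line$/  ? 1 : 0;    # 1
my $big_z   = $text =~ /line\Z/ ? 1 : 0;    # 1
my $small_z = $text =~ /line\z/ ? 1 : 0;    # 0: "\n" is still left

# \b needs a \w/\W transition; \B matches where there is none
my $whole_word = "cat scatter" =~ /\bcat\b/ ? 1 : 0;   # 1
my @inner      = "cat scatter" =~ /(\Bcat)/g;          # ("cat") in "scatter"

# \G anchors at pos(); /c keeps pos() when the final match fails
my $csv = "12,34,x5";
my @nums;
while ($csv =~ /\G(\d+),?/gc) {
    push @nums, $1;                         # "12", then "34"
}
my $stopped_at = pos($csv);                 # 6: just before "x5"

print "@starts_m | @starts_A | $dollar$big_z$small_z | @nums | $stopped_at\n";
```
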
=head2 QUANTIFIERS

Quantifiers are greedy by default: they match the B<longest> leftmost.

   Maximal Minimal  Allowed range
   ------- -------  -------------
   {n,m}   {n,m}?   Must occur at least n times but no more than m times
   {n,}    {n,}?    Must occur at least n times
   {n}     {n}?     Must occur exactly n times
   *       *?       0 or more times (same as {0,})
   +       +?       1 or more times (same as {1,})
   ?       ??       0 or 1 time (same as {0,1})

There is no quantifier C<{,n}>; it is treated as a literal string.
(Perl 5.34 and later do accept C<{,n}> as meaning C<{0,n}>.)

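This sketch contrasts maximal and minimal quantifiers. The HTML snippet serves only as a convenient test string.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* runs to the end, then backtracks only as far as the last ">"
my ($greedy)  = $html =~ /(<.*>)/;      # the whole string
# Minimal: .*? stops at the first ">" that lets the match succeed
my ($minimal) = $html =~ /(<.*?>)/;     # "<b>"

# Counted quantifiers, greedy and minimal
my $iso     = "2003-06-15" =~ /^\d{4}-\d{2}-\d{2}$/ ? 1 : 0;   # 1
my ($most)  = "aaaaa" =~ /(a{2,3})/;    # "aaa": the maximum of 2..3
my ($least) = "aaaaa" =~ /(a{2,3}?)/;   # "aa": the minimum of 2..3

# For "at most n", {0,n} works on every Perl version
my ($upto) = "aaaaa" =~ /(a{0,2})/;     # "aa"

print "$greedy | $minimal | $iso | $most | $least | $upto\n";
```
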
=head2 EXTENDED CONSTRUCTS

   (?#text)          A comment
   (?imxs-imsx:...)  Enable/disable option (as per m// modifiers)
   (?=...)           Zero-width positive lookahead assertion
   (?!...)           Zero-width negative lookahead assertion
   (?<=...)          Zero-width positive lookbehind assertion
   (?<!...)          Zero-width negative lookbehind assertion
   (?>...)           Grab what we can, prohibit backtracking
   (?{ code })       Embedded code, return value becomes $^R
   (??{ code })      Dynamic regex, return value used as regex
   (?(cond)yes|no)   cond being integer corresponding to capturing parens
   (?(cond)yes)         or a lookaround/eval zero-width assertion

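Each construct below is shown on a tiny made-up input. The last pair shows how C<< (?>...) >> changes the outcome by refusing to give back characters it has already consumed.

```perl
use strict;
use warnings;

my $price = "USD100 EUR200 USD300";

# (?<=...) lookbehind: digits preceded by "USD", which is not captured
my @usd = $price =~ /(?<=USD)(\d+)/g;                      # ("100", "300")

# (?=...) lookahead: a word followed by a comma; the comma is not consumed
my @before_comma = "a, b c, d" =~ /(\w+)(?=,)/g;           # ("a", "c")

# (?!...) negative lookahead: "foo" not followed by "bar"
my $neg = "foobar foobaz" =~ /(foo(?!bar)\w+)/ ? $1 : "";  # "foobaz"

# (?i:...) turns on case-insensitivity for just that group
my $mixed = "HELLO world" =~ /^(?i:hello) world$/ ? 1 : 0; # 1

# (?#...) is an inline comment
my $commented = "abc" =~ /a(?# ignored)bc/ ? 1 : 0;        # 1

# (?>...) forbids backtracking into the group: the atomic a* takes
# every "a", leaving none for the final "a", so the match fails
my $atomic = "aaa" =~ /^(?>a*)a/ ? 1 : 0;                  # 0
my $normal = "aaa" =~ /^a*a/     ? 1 : 0;                  # 1

print "@usd | @before_comma | $neg | $mixed $commented | $atomic $normal\n";
```
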
=head2 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to matched string
   $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause slow down.
See also L<Devel::SawAmpersand>. (Perl 5.20 and later use
copy-on-write strings, which remove this penalty.)

   $1, $2 ...  hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their I<opening> paren.

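This sketch shows the capture variables and offset arrays in use. Offsets in C<@-> and C<@+> are character positions, and C<$+[n]> points just past the end of group n. The input strings are illustrative.

```perl
use strict;
use warnings;

my $date = "Due: 2003-06-15!";
my (@groups, $last, @whole, @fourth);

if ($date =~ /((\d+)-(\d+))-(\d+)/) {
    # Groups are numbered by their opening paren, left to right:
    # $1 = "2003-06" (outer), $2 = "2003", $3 = "06", $4 = "15"
    @groups = ($1, $2, $3, $4);
    $last   = $+;               # highest-numbered group that matched: "15"
    @whole  = ($-[0], $+[0]);   # (5, 15): start and end of the whole match
    @fourth = ($-[4], $+[4]);   # (13, 15): start and end of group 4
}

# $+ versus $^N: the outer group closes after the inner one
my ($plus, $closed);
if ("ab" =~ /(a(b))/) {
    $plus   = $+;      # "b": $2, the highest-numbered group
    $closed = $^N;     # "ab": $1, the most recently closed group
}

print "@groups | $last | @whole | @fourth | $plus $closed\n";
```
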
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.

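Here is a quick tour of several of the functions above. The inputs are arbitrary, and the results are noted in comments.

```perl
use strict;
use warnings;

# The case functions mirror \L, \l, \U and \u
my @cased = (lc("HeLLo"), lcfirst("HELLO"), uc("hello"), ucfirst("hello"));
# ("hello", "hELLO", "HELLO", "Hello")

# quotemeta is what \Q uses: it backslashes non-word characters
my $quoted = quotemeta("a.b*c");             # 'a\.b\*c'

# pos reports where the last scalar m//g match on a variable ended
my $str = "aXbXc";
my $found = ($str =~ /X/g);                  # scalar context: first "X"
my $after_first = pos($str);                 # 2
pos($str) = 0;                               # pos can also be assigned

# split takes a regex as its separator
my @fields = split /\s*,\s*/, "a , b,c ,d";  # ("a", "b", "c", "d")

print "@cased | $quoted | $after_first | @fields\n";
```
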
=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is usually the same as uppercase, but
differs for certain characters such as the German "sharp s".

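This sketch of the titlecase distinction assumes a Perl with full Unicode case mappings. C<utf8::upgrade> is called so that the 8-bit sharp s gets Unicode casing rules instead of native byte semantics. U+01C6 (LATIN SMALL LETTER DZ WITH CARON) was chosen because its titlecase and uppercase forms are different single characters.

```perl
use strict;
use warnings;

# U+01C6 has distinct uppercase (U+01C4) and titlecase (U+01C5) forms
my $dz    = "\x{1C6}";
my $upper = uc($dz);          # "\x{1C4}"
my $title = ucfirst($dz);     # "\x{1C5}"

# German sharp s: uppercase is "SS", titlecase is "Ss".
# utf8::upgrade makes Perl apply Unicode rather than native 8-bit casing.
my $sharp = "\x{DF}";
utf8::upgrade($sharp);
my $sharp_uc = uc($sharp);        # "SS"
my $sharp_tc = ucfirst($sharp);   # "Ss"

# %vX prints each character's code point in hex
printf "%vX %vX %s %s\n", $upper, $title, $sharp_uc, $sharp_tc;
```
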
=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://regex.info/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut