Commit | Line | Data |
---|---|---|
30487ceb RGS |
1 | =head1 NAME |
2 | ||
3 | perlreref - Perl Regular Expressions Reference | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | This is a quick reference to Perl's regular expressions. | |
8 | For full information see L<perlre> and L<perlop>, as well | |
6d014f17 | 9 | as the L</"SEE ALSO"> section in this document. |
30487ceb | 10 | |
a5365663 | 11 | =head2 OPERATORS |
30487ceb RGS |
12 | |
13 | =~ determines to which variable the regex is applied. | |
e5a7b003 | 14 | In its absence, $_ is used. |
30487ceb RGS |
15 | |
16 | $var =~ /foo/; | |
17 | ||
6d014f17 JH |
18 | !~ determines to which variable the regex is applied, |
19 | and negates the result of the match; it returns | |
20 | false if the match succeeds, and true if it fails. | |
21 | ||
22 | $var !~ /foo/; | |
23 | ||
30487ceb RGS |
24 | m/pattern/igmsoxc searches a string for a pattern match, |
25 | applying the given options. | |
26 | ||
27 | i case-Insensitive | |
28 | g Global - all occurrences | |
29 | m Multiline mode - ^ and $ match internal lines | |
30 | s match as a Single line - . matches \n | |
31 | o compile pattern Once | |
32 | x eXtended legibility - free whitespace and comments | |
6d014f17 | 33 | c don't reset pos on failed matches when using /g |
30487ceb | 34 | |
6d014f17 | 35 | If 'pattern' is an empty string, the last I<successfully> matched |
e5a7b003 | 36 | regex is used. Delimiters other than '/' may be used for both this |
30487ceb RGS |
37 | operator and the following ones. |
38 | ||
39 | qr/pattern/imsox lets you store a regex in a variable, | |
e5a7b003 | 40 | or pass one around. Modifiers are as for m//, and are stored
30487ceb RGS |
41 | within the regex. |
42 | ||
43 | s/pattern/replacement/igmsoxe substitutes matches of | |
e5a7b003 RGS |
44 | 'pattern' with 'replacement'. Modifiers as for m// |
45 | with one addition: | |
30487ceb RGS |
46 | |
47 | e Evaluate replacement as an expression | |
48 | ||
49 | 'e' may be specified multiple times. 'replacement' is interpreted | |
50 | as a double quoted string unless a single-quote (') is the delimiter. | |
51 | ||
e5a7b003 | 52 | ?pattern? is like m/pattern/ but matches only once. No alternate |
6d014f17 | 53 | delimiters can be used. Must be reset with L<reset|perlfunc/reset>. |
30487ceb | 54 | |
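As a minimal sketch of the operators above (the sample strings and variable names are illustrative, not part of the original reference):

```perl
use strict;
use warnings;

my $var = "foo bar foo";

# =~ binds the match to $var; !~ negates the result.
my $has_foo = ($var =~ /foo/) ? 1 : 0;    # 1
my $no_baz  = ($var !~ /baz/) ? 1 : 0;    # 1

# /g in list context returns every match; /i ignores case.
my @hits = $var =~ /FOO/gi;               # ('foo', 'foo')

# qr// stores a compiled regex, modifiers included.
my $word = qr/b[a-z]+/i;
my ($first) = "Foo BAR" =~ /($word)/;     # 'BAR'

# s///e evaluates the replacement as an expression.
(my $doubled = "a1b22") =~ s/(\d+)/$1 * 2/ge;   # 'a2b44'
```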
a5365663 | 55 | =head2 SYNTAX |
30487ceb | 56 | |
6d014f17 | 57 | \ Escapes the character immediately following it |
e5a7b003 RGS |
58 | . Matches any single character except a newline (unless /s is used) |
59 | ^ Matches at the beginning of the string (or line, if /m is used) | |
60 | $ Matches at the end of the string (or line, if /m is used) | |
61 | * Matches the preceding element 0 or more times | |
62 | + Matches the preceding element 1 or more times | |
63 | ? Matches the preceding element 0 or 1 times | |
64 | {...} Specifies a range of occurrences for the element preceding it | |
65 | [...] Matches any one of the characters contained within the brackets | |
66 | (...) Groups subexpressions for capturing to $1, $2... | |
67 | (?:...) Groups subexpressions without capturing (cluster) | |
6d014f17 | 68 | | Matches either the subexpression preceding or following it |
30487ceb RGS |
69 | \1, \2 ... The text from the Nth group |
70 | ||
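The grouping and backreference syntax above might be exercised as follows; this is only a sketch with made-up input strings:

```perl
use strict;
use warnings;

# (...) captures to $1; \1 refers back to the same text inside the pattern.
my ($dup) = "hello hello world" =~ /\b(\w+) \1\b/;     # 'hello'

# (?:...) groups without capturing, so the digits are the only capture.
my ($num) = "item42" =~ /(?:item|part)(\d+)/;          # '42'

# {...} bounds the number of repetitions of the preceding element.
my $two_or_three = "aaa"  =~ /^a{2,3}$/ ? 1 : 0;       # 1
my $too_many     = "aaaa" =~ /^a{2,3}$/ ? 1 : 0;       # 0
```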
71 | =head2 ESCAPE SEQUENCES | |
72 | ||
73 | These work as in normal strings. | |
74 | ||
75 | \a Alarm (beep) | |
76 | \e Escape | |
77 | \f Formfeed | |
78 | \n Newline | |
79 | \r Carriage return | |
80 | \t Tab | |
6ed007ae | 81 | \037 Any octal ASCII value |
30487ceb RGS |
82 | \x7f Any hexadecimal ASCII value |
83 | \x{263a} A wide hexadecimal value | |
84 | \cx Control-x | |
85 | \N{name} A named character | |
86 | ||
6d014f17 | 87 | \l Lowercase next character |
d3b55b48 | 88 | \u Titlecase next character |
30487ceb | 89 | \L Lowercase until \E |
d3b55b48 | 90 | \U Uppercase until \E |
30487ceb RGS |
91 | \Q Disable pattern metacharacters until \E |
92 | \E End case modification | |
93 | ||
47e8a552 IT |
94 | For Titlecase, see L</Titlecase>. |
95 | ||
30487ceb RGS |
96 | This one works differently from normal strings: |
97 | ||
98 | \b An assertion, not backspace, except in a character class | |
99 | ||
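A short sketch of the case-modification and quoting escapes, plus the two meanings of C<\b>. The strings used here are illustrative assumptions:

```perl
use strict;
use warnings;

# \u and \L combine: lowercase everything, then titlecase the first char.
my $name  = "pERL";
my $title = "\u\L$name\E rocks";                       # 'Perl rocks'

# \Q...\E disables metacharacters, so '.' only matches a literal dot.
my $literal  = "a.b";
my $quoted   = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;      # 0
my $unquoted = "axb" =~ /^$literal$/     ? 1 : 0;      # 1

# \b is a backspace inside a character class, an assertion elsewhere.
my $in_class  = "\x08" =~ /[\b]/ ? 1 : 0;              # 1
my $assertion = "\x08" =~ /\b/   ? 1 : 0;              # 0, no word char nearby
```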
100 | =head2 CHARACTER CLASSES | |
101 | ||
102 | [amy] Match 'a', 'm' or 'y' | |
103 | [f-j] Dash specifies "range" | |
104 | [f-j-] Dash escaped or at start or end means 'dash' | |
6d014f17 | 105 | [^f-j] Caret indicates "match any character _except_ these" |
30487ceb | 106 | |
e04a154e JH |
107 | The following sequences work inside or outside a character class. |
108 | The first six are locale aware, all are Unicode aware. The default | |
109 | character class equivalents are given. See L<perllocale> and | |
110 | L<perlunicode> for details. | |
111 | ||
112 | \d A digit [0-9] | |
113 | \D A nondigit [^0-9] | |
114 | \w A word character [a-zA-Z0-9_] | |
115 | \W A non-word character [^a-zA-Z0-9_] | |
116 | \s A whitespace character [ \t\n\r\f] | |
117 | \S A non-whitespace character [^ \t\n\r\f] | |
118 | ||
119 | \C Match a byte (with Unicode, '.' matches a character) | |
30487ceb RGS |
120 | \pP Match P-named (Unicode) property |
121 | \p{...} Match Unicode property with long name | |
122 | \PP Match non-P | |
123 | \P{...} Match lack of Unicode property with long name | |
124 | \X Match extended Unicode sequence | |
125 | ||
126 | POSIX character classes and their Unicode and Perl equivalents: | |
127 | ||
e04a154e JH |
128 | alnum IsAlnum Alphanumeric |
129 | alpha IsAlpha Alphabetic | |
130 | ascii IsASCII Any ASCII char | |
131 | blank IsSpace [ \t] Horizontal whitespace (GNU extension) | |
132 | cntrl IsCntrl Control characters | |
133 | digit IsDigit \d Digits | |
134 | graph IsGraph Alphanumeric and punctuation | |
135 | lower IsLower Lowercase chars (locale and Unicode aware) | |
136 | print IsPrint Alphanumeric, punct, and space | |
137 | punct IsPunct Punctuation | |
138 | space IsSpace [\s\ck] Whitespace | |
139 | IsSpacePerl \s Perl's whitespace definition | |
140 | upper IsUpper Uppercase chars (locale and Unicode aware) | |
141 | word IsWord \w Alphanumeric plus _ (Perl extension) | |
142 | xdigit IsXDigit [0-9A-Fa-f] Hexadecimal digit | |
30487ceb RGS |
143 | |
144 | Within a character class: | |
145 | ||
146 | POSIX traditional Unicode | |
147 | [:digit:] \d \p{IsDigit} | |
148 | [:^digit:] \D \P{IsDigit} | |
149 | ||
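The bracketed classes and their POSIX spellings can be compared with a sketch like this one (ASCII-only input, chosen for illustration):

```perl
use strict;
use warnings;

my $input = "a1-b22";

# POSIX classes are only valid inside a bracketed class: [[:digit:]].
my @posix = $input =~ /[[:digit:]]+/g;       # ('1', '22')
my @trad  = $input =~ /\d+/g;                # ('1', '22')

# [:^digit:] is the negated form, like \D.
my @nondigit = $input =~ /[[:^digit:]]+/g;   # ('a', '-b')

# A dash at the end of a class is a literal dash.
my $dash = "f-j" =~ /^[f-j-]+$/ ? 1 : 0;     # 1

# A leading caret negates the class.
my ($outside) = "ghia" =~ /([^f-j])/;        # 'a'
```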
150 | =head2 ANCHORS | |
151 | ||
152 | All are zero-width assertions. | |
153 | ||
154 | ^ Match string start (or line, if /m is used) | |
155 | $ Match string end (or line, if /m is used) or before newline | |
156 | \b Match word boundary (between \w and \W) | |
6d014f17 | 157 | \B Match except at word boundary (between \w and \w or \W and \W) |
30487ceb | 158 | \A Match string start (regardless of /m) |
6d014f17 | 159 | \Z Match string end (before optional newline) |
30487ceb RGS |
160 | \z Match absolute string end |
161 | \G Match where previous m//g left off | |
30487ceb RGS |
162 | |
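A sketch of how the anchors differ in practice, assuming a two-line sample string that ends in a newline:

```perl
use strict;
use warnings;

my $s = "alpha one\nbeta two\n";

# With /m, ^ matches at the start of every line; \A never does.
my @line_starts = $s =~ /^(\w+)/mg;          # ('alpha', 'beta')
my @str_start   = $s =~ /\A(\w+)/mg;         # ('alpha')

# \Z tolerates a trailing newline; \z demands the absolute end.
my $cap_z   = $s =~ /two\Z/ ? 1 : 0;         # 1
my $small_z = $s =~ /two\z/ ? 1 : 0;         # 0

# \G continues where the previous scalar-context m//g stopped.
my $str     = "aaabbb";
my $found_a = $str =~ /a+/g;                 # pos($str) is now 3
my $tail    = $str =~ /\G(b+)/g ? $1 : undef;   # 'bbb'
```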
163 | =head2 QUANTIFIERS | |
164 | ||
6d014f17 | 165 | Quantifiers are greedy by default: they match the B<longest> leftmost string.
30487ceb RGS |
166 | |
167 | Maximal Minimal Allowed range | |
168 | ------- ------- ------------- | |
169 | {n,m} {n,m}? Must occur at least n times but no more than m times | |
170 | {n,} {n,}? Must occur at least n times | |
6d014f17 | 171 | {n} {n}? Must occur exactly n times |
30487ceb RGS |
172 | * *? 0 or more times (same as {0,}) |
173 | + +? 1 or more times (same as {1,}) | |
174 | ? ?? 0 or 1 time (same as {0,1}) | |
175 | ||
6d014f17 JH |
176 | There is no quantifier {,n} -- that gets understood as a literal string. |
177 | ||
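The difference between the maximal and minimal forms is easiest to see side by side. A minimal sketch with illustrative input:

```perl
use strict;
use warnings;

my $html = "<b>bold</b>";

# Greedy .+ runs to the last '>'; minimal .+? stops at the first one.
my ($greedy)  = $html =~ /(<.+>)/;           # '<b>bold</b>'
my ($minimal) = $html =~ /(<.+?>)/;          # '<b>'

# {n,m} takes as many as allowed; {n,m}? takes as few as allowed.
my ($max) = "aaaaa" =~ /(a{2,3})/;           # 'aaa'
my ($min) = "aaaaa" =~ /(a{2,3}?)/;          # 'aa'

# {n} matches exactly n times.
my ($year) = "2024-01" =~ /^(\d{4})/;        # '2024'
```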
30487ceb RGS |
178 | =head2 EXTENDED CONSTRUCTS |
179 | ||
180 | (?#text) A comment | |
6d014f17 | 181 | (?imxs-imsx:...) Enable/disable option (as per m// modifiers) |
30487ceb RGS |
182 | (?=...) Zero-width positive lookahead assertion |
183 | (?!...) Zero-width negative lookahead assertion | |
6d014f17 | 184 | (?<=...) Zero-width positive lookbehind assertion |
30487ceb RGS |
185 | (?<!...) Zero-width negative lookbehind assertion |
186 | (?>...) Grab what we can, prohibit backtracking | |
187 | (?{ code }) Embedded code, return value becomes $^R | |
188 | (??{ code }) Dynamic regex, return value used as regex | |
e5a7b003 | 189 | (?(cond)yes|no) cond being integer corresponding to capturing parens |
30487ceb RGS |
190 | (?(cond)yes) or a lookaround/eval zero-width assertion |
191 | ||
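A few of the extended constructs in action. This is a sketch only; the patterns and inputs are assumptions chosen to keep each case small:

```perl
use strict;
use warnings;

# Lookahead and lookbehind assert context without consuming it.
my ($price)  = "cost: 42USD"   =~ /(\d+)(?=USD)/;      # '42'
my ($suffix) = "foobar foobaz" =~ /foo(?!bar)(\w+)/;   # 'baz'
my ($amount) = 'total $99'     =~ /(?<=\$)(\d+)/;      # '99'

# (?i:...) turns on case-insensitivity for just that group.
my $inline = "HELLO world" =~ /(?i:hello) world/ ? 1 : 0;   # 1

# (?>...) keeps what it grabbed, so the trailing 'a' can never match.
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;             # 0
my $normal = "aaab" =~ /a+ab/     ? 1 : 0;             # 1

# (?(1)yes) requires ')' only if group 1 matched '('.
my $paren = qr/^(\()?\d+(?(1)\))$/;
my @balanced = map { /$paren/ ? 1 : 0 } ("(12)", "12", "(12");   # (1, 1, 0)
```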
a5365663 | 192 | =head2 VARIABLES |
30487ceb RGS |
193 | |
194 | $_ Default variable for operators to use | |
8da7c437 | 195 | $* Enable multiline matching (deprecated; not in 5.9.0 or later) |
30487ceb RGS |
196 | |
197 | $& Entire matched string | |
198 | $` Everything prior to the matched string | |
199 | $' Everything after the matched string | |
200 | ||
201 | The use of those last three will slow down B<all> regex use | |
202 | within your program. Consult L<perlvar> for C<@LAST_MATCH_START> | |
203 | to see equivalent expressions that won't cause a slowdown. | |
204 | See also L<Devel::SawAmpersand>. | |
205 | ||
206 | $1, $2 ... hold the Nth captured expr | |
207 | $+ Last parenthesized pattern match | |
208 | $^N Holds the most recently closed capture | |
209 | $^R Holds the result of the last (?{...}) expr | |
6d014f17 JH |
210 | @- Offsets of starts of groups. $-[0] holds start of whole match |
211 | @+ Offsets of ends of groups. $+[0] holds end of whole match | |
30487ceb | 212 | |
6d014f17 | 213 | Captured groups are numbered according to their I<opening> paren. |
30487ceb | 214 | |
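The following sketch shows group numbering by opening paren and uses C<@-> and C<@+> to recover the whole match without touching C<$&>. The date string and variable names are illustrative:

```perl
use strict;
use warnings;

my $date = "2024-06-15";
my ($outer, $year, $month, $day, $last, $whole, $start3, $end3);

if ($date =~ /((\d+)-(\d+))-(\d+)/) {
    # Groups are numbered by their opening paren, so $1 is the outer group.
    ($outer, $year, $month, $day) = ($1, $2, $3, $4);

    # $+ is the last parenthesized match, here group 4.
    $last = $+;

    # substr with $-[0] and $+[0] is equivalent to $& for this match.
    $whole = substr($date, $-[0], $+[0] - $-[0]);

    # Offsets of group 3 ('06'): starts at 5, ends at 7.
    ($start3, $end3) = ($-[3], $+[3]);
}
```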
a5365663 | 215 | =head2 FUNCTIONS |
30487ceb RGS |
216 | |
217 | lc Lowercase a string | |
218 | lcfirst Lowercase first char of a string | |
219 | uc Uppercase a string | |
47e8a552 IT |
220 | ucfirst Titlecase first char of a string |
221 | ||
30487ceb RGS |
222 | pos Return or set current match position |
223 | quotemeta Quote metacharacters | |
224 | reset Reset ?pattern? status | |
225 | study Analyze string for optimizing matching | |
226 | ||
227 | split Use regex to split a string into parts | |
228 | ||
d3b55b48 JH |
229 | The first four of these are like the escape sequences C<\L>, C<\l>, |
230 | C<\U>, and C<\u>. For Titlecase, see L</Titlecase>. | |
47e8a552 | 231 | |
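A brief sketch of several of these functions, with sample values chosen for illustration:

```perl
use strict;
use warnings;

# split uses a regex as the separator.
my @parts = split /X/, "aXbXc";              # ('a', 'b', 'c')

# quotemeta backslashes metacharacters, like \Q in a pattern.
my $safe = quotemeta("a.b*c");               # 'a\.b\*c'

# pos reports where the last scalar-context m//g left off, and can be set.
my $text    = "aaa";
my $matched = $text =~ /a/g;
my $after   = pos($text);                    # 1
pos($text)  = 0;                             # rewind to the start

# lc and ucfirst mirror the \L and \u escapes.
my $cap = ucfirst(lc("hELLO"));              # 'Hello'
```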
1501d360 | 232 | =head2 TERMINOLOGY |
47e8a552 | 233 | |
a5365663 | 234 | =head3 Titlecase |
47e8a552 IT |
235 | |
236 | A Unicode concept that most often equals uppercase, but for | |
237 | certain characters, such as the German "sharp s", there is a difference. | |
238 | ||
40506b5d | 239 | =head1 AUTHOR |
30487ceb RGS |
240 | |
241 | Iain Truskett. | |
242 | ||
243 | This document may be distributed under the same terms as Perl itself. | |
244 | ||
40506b5d | 245 | =head1 SEE ALSO |
30487ceb RGS |
246 | |
247 | =over 4 | |
248 | ||
249 | =item * | |
250 | ||
251 | L<perlretut> for a tutorial on regular expressions. | |
252 | ||
253 | =item * | |
254 | ||
255 | L<perlrequick> for a rapid tutorial. | |
256 | ||
257 | =item * | |
258 | ||
259 | L<perlre> for more details. | |
260 | ||
261 | =item * | |
262 | ||
263 | L<perlvar> for details on the variables. | |
264 | ||
265 | =item * | |
266 | ||
267 | L<perlop> for details on the operators. | |
268 | ||
269 | =item * | |
270 | ||
271 | L<perlfunc> for details on the functions. | |
272 | ||
273 | =item * | |
274 | ||
275 | L<perlfaq6> for FAQs on regular expressions. | |
276 | ||
277 | =item * | |
278 | ||
279 | The L<re> module to alter behaviour and aid | |
280 | debugging. | |
281 | ||
282 | =item * | |
283 | ||
284 | L<perldebug/"Debugging regular expressions"> | |
285 | ||
286 | =item * | |
287 | ||
288 | L<perluniintro>, L<perlunicode>, L<charnames> and L<locale> | |
289 | for details on regexes and internationalisation. | |
290 | ||
291 | =item * | |
292 | ||
293 | I<Mastering Regular Expressions> by Jeffrey Friedl | |
294 | (F<http://regex.info/>) for a thorough grounding and | |
295 | reference on the topic. | |
296 | ||
297 | =back | |
298 | ||
40506b5d | 299 | =head1 THANKS |
30487ceb RGS |
300 | |
301 | David P.C. Wollmann, | |
302 | Richard Soderberg, | |
303 | Sean M. Burke, | |
304 | Tom Christiansen, | |
e5a7b003 | 305 | Jim Cromie, |
30487ceb RGS |
306 | and |
307 | Jeffrey Goff | |
308 | for useful advice. | |
6d014f17 JH |
309 | |
310 | =cut |