	1	=head1 NAME
	2
	3	perlreref - Perl Regular Expressions Reference
	4
	5	=head1 DESCRIPTION
	6
	7	This is a quick reference to Perl's regular expressions.
	8	For full information see L<perlre> and L<perlop>, as well
	9	as the L</"SEE ALSO"> section in this document.
	10
	11	=head2 OPERATORS
	12
	13	C<=~> determines to which variable the regex is applied.
	14	In its absence, $_ is used.
	15
	16	$var =~ /foo/;
	17
	18	C<!~> determines to which variable the regex is applied,
	19	and negates the result of the match; it returns
	20	false if the match succeeds, and true if it fails.
	21
	22	$var !~ /foo/;
	23
	24	C<m/pattern/msixpogcdual> searches a string for a pattern match,
	25	applying the given options.
	26
	27	m Multiline mode - ^ and $ match internal lines
	28	s match as a Single line - . matches \n
	29	i case-Insensitive
	30	x eXtended legibility - free whitespace and comments
	31	p Preserve a copy of the matched string -
	32	${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
	33	o compile pattern Once
	34	g Global - all occurrences
	35	c don't reset pos on failed matches when using /g
	36	a restrict \d, \s, \w and [:posix:] to match ASCII only
	37	aa (two a's) also, under /i, ASCII and non-ASCII characters never match each other
	38	l match according to current locale
	39	u match according to Unicode rules
	40	d match according to native rules unless something indicates
	41	Unicode
	42
	43	If 'pattern' is an empty string, the last I<successfully> matched
	44	regex is used. Delimiters other than '/' may be used for both this
	45	operator and the following ones. The leading C<m> can be omitted
	46	if the delimiter is '/'.
	47
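As a rough illustration of the binding operators and a few of the modifiers listed above, here is a minimal sketch. The sample strings and variable names are made up for the example and are not part of the reference.

```perl
use strict;
use warnings;

# Sample data (hypothetical, for illustration only).
my $line = "Hello World\nhello perl\n";

# =~ binds the match to $line; /i makes it case-insensitive.
my $has_hello = $line =~ /hello/i ? 1 : 0;

# !~ negates the result: true when the pattern does NOT match.
my $no_python = $line !~ /python/ ? 1 : 0;

# /g in list context returns every match; /m lets ^ match at each line start.
my @firsts = $line =~ /^(\w+)/mg;

# /x permits insignificant whitespace inside the pattern.
my ($second) = $line =~ / \s (\w+) \n /x;

# Without =~, the match is applied to $_.
my $count = 0;
for ("foo", "bar", "food") {
    $count++ if /foo/;
}

print "$has_hello $no_python @firsts $second $count\n";
```
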
	48	C<qr/pattern/msixpodual> lets you store a regex in a variable,
	49	or pass one around. The modifiers are as for C<m//>, and are stored
	50	within the regex.
	51
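The following sketch shows one way a compiled regex might be stored and reused, and that its modifiers travel with it. The variable names and test strings are assumptions chosen for the example.

```perl
use strict;
use warnings;

# qr// compiles a pattern; its modifiers (/i here) are stored with it.
my $word = qr/perl/i;

# The compiled regex can be used directly...
my $direct = "I like Perl" =~ $word ? 1 : 0;

# ...or interpolated into a larger pattern that has no /i of its own.
my $embedded = "PERL rocks" =~ /^$word\s+rocks$/ ? 1 : 0;

# The outer pattern stays case-sensitive: 'ROCKS' does not match 'rocks'.
my $outer_cs = "perl ROCKS" =~ /^$word\s+rocks$/ ? 1 : 0;

print "$direct $embedded $outer_cs\n";
```
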
	52	C<s/pattern/replacement/msixpogcedualr> substitutes matches of
	53	'pattern' with 'replacement'. Modifiers as for C<m//>,
	54	with two additions:
	55
	56	e Evaluate 'replacement' as an expression
	57	r Return substitution and leave the original string untouched.
	58
	59	'e' may be specified multiple times. 'replacement' is interpreted
	60	as a double quoted string unless a single-quote (C<'>) is the delimiter.
	61
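A short sketch of the two substitution-only modifiers follows. The input string is invented for the example, and C</r> assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Sample data (hypothetical).
my $text = "3 apples and 4 pears";

# /e evaluates the replacement as Perl code; /g applies it to every match.
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

# /r returns the modified copy and leaves $text untouched (Perl 5.14+).
my $shouted = $text =~ s/apples/APPLES/r;

# Without /r, s/// returns the number of substitutions it made.
my $copy = $text;
my $n = ($copy =~ s/\d+/N/g);

print "$doubled | $shouted | $text | $n\n";
```
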
	62	C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
	63	delimiters can be used. Must be reset with reset().
	64
	65	=head2 SYNTAX
	66
	67	\ Escapes the character immediately following it
	68	. Matches any single character except a newline (unless /s is
	69	used)
	70	^ Matches at the beginning of the string (or line, if /m is used)
	71	$ Matches at the end of the string (or line, if /m is used)
	72	* Matches the preceding element 0 or more times
	73	+ Matches the preceding element 1 or more times
	74	? Matches the preceding element 0 or 1 times
	75	{...} Specifies a range of occurrences for the element preceding it
	76	[...] Matches any one of the characters contained within the brackets
	77	(...) Groups subexpressions for capturing to $1, $2...
	78	(?:...) Groups subexpressions without capturing (cluster)
	79	| Matches either the subexpression preceding or following it
	80	\g1 or \g{1}, \g2 ... Matches the text from the Nth group
	81	\1, \2, \3 ... Matches the text from the Nth group
	82	\g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
	83	\g{name} Named backreference
	84	\k<name> Named backreference
	85	\k'name' Named backreference
	86	(?P=name) Named backreference (python syntax)
	87
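The grouping and backreference forms above can be sketched as follows. The strings and the group name C<q> are assumptions made up for the example.

```perl
use strict;
use warnings;

# \1 and \g{1} both refer back to the first capture group.
my $dup_old = "this is is a test" =~ /\b(\w+) \1\b/    ? $1 : '';
my $dup_new = "this is is a test" =~ /\b(\w+) \g{1}\b/ ? $1 : '';

# \g{-N} is relative: it counts back from the groups opened so far.
my $relative = "abab" =~ /^(a)(b)\g{-2}\g{-1}$/ ? 1 : 0;

# (?<name>...) paired with \k<name> gives a named backreference.
# Named groups are still numbered, so the text between quotes is $2.
my $quoted = q{say "hi" now} =~ /(?<q>["'])(.*?)\k<q>/ ? $2 : '';

# (?:...) groups without capturing, so (\d+) is still group 1.
my ($num) = "item-42" =~ /(?:item|id)-(\d+)/;

print "$dup_old $dup_new $relative $quoted $num\n";
```
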
	88	=head2 ESCAPE SEQUENCES
	89
	90	These work as in normal strings.
	91
	92	\a Alarm (beep)
	93	\e Escape
	94	\f Formfeed
	95	\n Newline
	96	\r Carriage return
	97	\t Tab
	98	\037 Char whose ordinal is the 3 octal digits, max \777
	99	\o{2307} Char whose ordinal is the octal number, unrestricted
	100	\x7f Char whose ordinal is the 2 hex digits, max \xFF
	101	\x{263a} Char whose ordinal is the hex number, unrestricted
	102	\cx Control-x
	103	\N{name} A named Unicode character or character sequence
	104	\N{U+263D} A Unicode character by hex ordinal
	105
	106	\l Lowercase next character
	107	\u Titlecase next character
	108	\L Lowercase until \E
	109	\U Uppercase until \E
	110	\Q Disable pattern metacharacters until \E
	111	\E End modification
	112
	113	For Titlecase, see L</Titlecase>.
	114
	115	This one works differently from normal strings:
	116
	117	\b An assertion, not backspace, except in a character class
	118
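As an illustration of the escapes in this section, the sketch below contrasts C<\Q> quoting, the case-modifying escapes, and the two meanings of C<\b>. All strings and variable names are hypothetical.

```perl
use strict;
use warnings;

# \Q...\E disables metacharacters in an interpolated string.
my $literal  = "a.b";
my $unquoted = "axb" =~ /^$literal$/     ? 1 : 0;   # '.' acts as a wildcard
my $quoted   = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;   # '.' is literal

# Case-modifying escapes work in double-quoted strings and patterns.
my $name  = "perl";
my $title = "\u$name";           # titlecase the next character
my $upper = "\U$name\E rocks";   # uppercase until \E

# \x{...} names a character by its hex ordinal; \t is a tab.
my $tab_ok = "a\tb" =~ /a\x{09}b/ ? 1 : 0;

# \b is a word-boundary assertion in a pattern, but a backspace
# inside a character class.
my $boundary  = "cat"  =~ /\bcat\b/ ? 1 : 0;
my $backspace = "\x08" =~ /^[\b]$/  ? 1 : 0;

print "$unquoted $quoted $title $upper $tab_ok $boundary $backspace\n";
```
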
	119	=head2 CHARACTER CLASSES
	120
	121	[amy] Match 'a', 'm' or 'y'
	122	[f-j] Dash specifies "range"
	123	[f-j-] Dash escaped or at start or end means 'dash'
	124	[^f-j] Caret indicates "match any character _except_ these"
	125
	126	The following sequences (except C<\N>) work inside or outside a character class.
	127	The first six are locale aware, all are Unicode aware. See L<perllocale>
	128	and L<perlunicode> for details.
	129
	130	\d A digit
	131	\D A nondigit
	132	\w A word character
	133	\W A non-word character
	134	\s A whitespace character
	135	\S A non-whitespace character
	136	\h A horizontal whitespace character
	137	\H A non-horizontal-whitespace character
	138	\N A non-newline (when not followed by '{NAME}'; experimental;
	139	not valid in a character class; equivalent to [^\n]; it's
	140	like '.' without the /s modifier)
	141	\v A vertical whitespace character
	142	\V A non-vertical-whitespace character
	143	\R A generic newline (?>\v|\x0D\x0A)
	144
	145	\C Match a byte (with Unicode, '.' matches a character)
	146	\pP Match P-named (Unicode) property
	147	\p{...} Match Unicode property with name longer than 1 character
	148	\PP Match non-P
	149	\P{...} Match lack of Unicode property with name longer than 1 char
	150	\X Match Unicode extended grapheme cluster
	151
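A minimal sketch of the shorthand classes and Unicode properties follows. The sample string is an assumption; the explicit C</u> and C</a> modifiers (Perl 5.14+) are used so that the result does not depend on how the string happens to be stored internally.

```perl
use strict;
use warnings;

# "cafe" with an e-acute (U+00E9), then a space and two digits (hypothetical data).
my $s = "caf\x{e9} 42";

# A bracketed class with a leading caret matches anything NOT listed.
my ($non_vowels) = $s =~ /^([^aeiou]+)/;

# \d matches digits.
my ($digits) = $s =~ /(\d+)/;

# Under /u, \w follows Unicode rules and accepts the accented letter.
my ($uni_word) = $s =~ /^(\w+)/u;

# Under /a, \w is restricted to ASCII and stops before it.
my ($ascii_word) = $s =~ /^(\w+)/a;

# \p{...} matches by Unicode property; count the alphabetic characters.
my $letters = () = $s =~ /\p{Alpha}/g;

# \h matches horizontal whitespace such as space and tab.
my ($gap) = "a \tb" =~ /(\h+)/;

print join(" ", length($uni_word), $ascii_word, $digits, $letters), "\n";
```
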
	152	POSIX character classes and their Unicode and Perl equivalents:
	153
	154	            ASCII-           Full-
	155	  POSIX     range            range         backslash
	156	[[:...:]]   \p{...}          \p{...}       sequence   Description
	157
	158	------------------------------------------------------------------------------
	159	alnum       PosixAlnum       XPosixAlnum              Alpha plus Digit
	160	alpha       PosixAlpha       XPosixAlpha              Alphabetic characters
	161	ascii       ASCII                                     Any ASCII character
	162	blank       PosixBlank       XPosixBlank   \h         Horizontal whitespace;
	163	                                                        full-range also
	164	                                                        written as
	165	                                                        \p{HorizSpace} (GNU
	166	                                                        extension)
	167	cntrl       PosixCntrl       XPosixCntrl              Control characters
	168	digit       PosixDigit       XPosixDigit   \d         Decimal digits
	169	graph       PosixGraph       XPosixGraph              Alnum plus Punct
	170	lower       PosixLower       XPosixLower              Lowercase characters
	171	print       PosixPrint       XPosixPrint              Graph plus Space, but
	172	                                                        not any Cntrls
	173	punct       PosixPunct       XPosixPunct              Punctuation and Symbols
	174	                                                        in ASCII-range; just
	175	                                                        punct outside it
	176	space       PosixSpace       XPosixSpace              [\s\cK]
	177	            PerlSpace        XPerlSpace    \s         Perl's whitespace def'n
	178	upper       PosixUpper       XPosixUpper              Uppercase characters
	179	word        PosixWord        XPosixWord    \w         Alnum + Unicode marks +
	180	                                                        connectors, like '_'
	181	                                                        (Perl extension)
	182	xdigit      ASCII_Hex_Digit  XPosixXDigit             Hexadecimal digit,
	183	                                                        ASCII-range is
	184	                                                        [0-9A-Fa-f]
	185
	186	There are also various synonyms, such as C<\p{Alpha}> for C<\p{XPosixAlpha}>; all are
	187	listed in L<perluniprops/Properties accessible through \p{} and \P{}>.
	188
	189	Within a character class:
	190
	191	POSIX traditional Unicode
	192	[:digit:] \d \p{Digit}
	193	[:^digit:] \D \P{Digit}
	194
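The POSIX classes in the table can be sketched in use as follows. Note that they are written inside a bracketed class. The sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

# Sample data (hypothetical).
my $s = "Room 101, Floor B";

# POSIX classes go inside a bracketed class: [[:digit:]].
my ($room) = $s =~ /([[:digit:]]+)/;

# [:^digit:] is the negated form.
my ($lead) = $s =~ /^([[:^digit:]]+)/;

# Several POSIX classes may share one bracketed class.
my ($token) = $s =~ /^([[:alpha:][:space:]]+)/;

# /g in list context collects every uppercase letter.
my @caps = $s =~ /([[:upper:]])/g;

# The ASCII-range property from the table, written with \p{...}.
my $ascii_digit = "7" =~ /^\p{PosixDigit}$/ ? 1 : 0;

print "$room|$lead|$token|@caps|$ascii_digit\n";
```
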
	195	=head2 ANCHORS
	196
	197	All are zero-width assertions.
	198
	199	^ Match string start (or line, if /m is used)
	200	$ Match string end (or line, if /m is used) or before newline
	201	\b Match word boundary (between \w and \W)
	202	\B Match except at word boundary (between \w and \w or \W and \W)
	203	\A Match string start (regardless of /m)
	204	\Z Match string end (before optional newline)
	205	\z Match absolute string end
	206	\G Match where previous m//g left off
	207	\K Keep the stuff left of the \K, don't include it in $&
	208
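To make the differences between the anchors concrete, here is a small sketch. The strings are assumptions chosen so that each anchor behaves differently. C<\K> assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Two lines, with a trailing newline (hypothetical data).
my $s = "one\ntwo\n";

# $ and \Z match before a final newline; \z only at the very end.
my $end_dollar = $s =~ /two$/  ? 1 : 0;
my $end_Z      = $s =~ /two\Z/ ? 1 : 0;
my $end_z      = $s =~ /two\z/ ? 1 : 0;

# Without /m, ^ and $ see one long string; with /m they see lines.
my $line_plain = $s =~ /^one$/  ? 1 : 0;
my $line_multi = $s =~ /^one$/m ? 1 : 0;

# \G anchors each attempt where the previous /g match left off,
# so the loop stops at the first position that is not two digits.
my $data = "112233abc44";
my @pairs;
push @pairs, $1 while $data =~ /\G(\d\d)/g;

# \K keeps what is to its left out of the matched text, so only
# the part after "dir/" is replaced.
(my $renamed = "dir/file.txt") =~ s{^dir/\K\w+}{notes};

print "$end_dollar $end_Z $end_z $line_plain $line_multi @pairs $renamed\n";
```
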
	209	=head2 QUANTIFIERS
	210
	211	Quantifiers are greedy by default and match the B<longest> leftmost.
	212
	213	Maximal  Minimal  Possessive  Allowed range
	214	-------  -------  ----------  -------------
	215	{n,m}    {n,m}?   {n,m}+      Must occur at least n times
	216	                              but no more than m times
	217	{n,}     {n,}?    {n,}+       Must occur at least n times
	218	{n}      {n}?     {n}+        Must occur exactly n times
	219	*        *?       *+          0 or more times (same as {0,})
	220	+        +?       ++          1 or more times (same as {1,})
	221	?        ??       ?+          0 or 1 time (same as {0,1})
	222
	223	The possessive forms (new in Perl 5.10) prevent backtracking: what gets
	224	matched by a pattern with a possessive quantifier will not be backtracked
	225	into, even if that causes the whole match to fail.
	226
	227	There is no quantifier C<{,n}>. That's interpreted as a literal string.
	228
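The three quantifier flavours can be compared with a sketch like the one below. The markup string is invented for the example; possessive quantifiers assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Sample data (hypothetical).
my $html = "<b>bold</b> and <i>italic</i>";

# Maximal (greedy): .+ runs to the last '>' in the string.
my ($greedy) = $html =~ /(<.+>)/;

# Minimal: .+? stops at the first '>' that allows a match.
my ($lazy) = $html =~ /(<.+?>)/;

# Greedy a+ gives back one 'a' so that the final 'a' can match.
my $normal = "aaa" =~ /^a+a$/ ? 1 : 0;

# Possessive a++ never gives anything back, so the match fails.
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;

# {n,m} takes as many as it can, up to m.
my ($bounded) = "aaaa" =~ /^(a{2,3})/;

print "$greedy\n$lazy\n$normal $possessive $bounded\n";
```
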
	229	=head2 EXTENDED CONSTRUCTS
	230
	231	(?#text) A comment
	232	(?:...) Groups subexpressions without capturing (cluster)
	233	(?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
	234	(?=...) Zero-width positive lookahead assertion
	235	(?!...) Zero-width negative lookahead assertion
	236	(?<=...) Zero-width positive lookbehind assertion
	237	(?<!...) Zero-width negative lookbehind assertion
	238	(?>...) Grab what we can, prohibit backtracking
	239	(?|...) Branch reset
	240	(?<name>...) Named capture
	241	(?'name'...) Named capture
	242	(?P<name>...) Named capture (python syntax)
	243	(?{ code }) Embedded code, return value becomes $^R
	244	(??{ code }) Dynamic regex, return value used as regex
	245	(?N) Recurse into subpattern number N
	246	(?-N), (?+N) Recurse into Nth previous/next subpattern
	247	(?R), (?0) Recurse at the beginning of the whole pattern
	248	(?&name) Recurse into a named subpattern
	249	(?P>name) Recurse into a named subpattern (python syntax)
	250	(?(cond)yes|no)
	251	(?(cond)yes) Conditional expression, where "cond" can be:
	252	(?=pat) look-ahead
	253	(?!pat) negative look-ahead
	254	(?<=pat) look-behind
	255	(?<!pat) negative look-behind
	256	(N) subpattern N has matched something
	257	(<name>) named subpattern has matched something
	258	('name') named subpattern has matched something
	259	(?{code}) code condition
	260	(R) true if recursing
	261	(RN) true if recursing into Nth subpattern
	262	(R&name) true if recursing into named subpattern
	263	(DEFINE) always false, no no-pattern allowed
	264
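A few of the extended constructs are sketched below. The inputs, the group names C<y>, C<m> and C<d>, and the variable names are all assumptions for illustration; the balanced-parentheses pattern is one common way to use recursion, not the only one.

```perl
use strict;
use warnings;

# Lookahead: digits only when followed by "USD", which is not consumed.
my $price = "cost: 100USD" =~ /(\d+)(?=USD)/ ? $1 : '';

# Negative lookbehind: numbers not preceded by a minus sign or a digit.
my @unsigned = "5 -3 8" =~ /(?<![-\d])(\d+)/g;

# Named captures are available through %+ after a successful match.
my %date;
if ("2024-05-17" =~ /(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)/) {
    %date = %+;
}

# Branch reset: both alternatives capture into group 1.
my ($id) = "n:7" =~ /(?|id=(\d+)|n:(\d+))/;

# Inline modifier: only "foo" is matched case-insensitively.
my $ci_part = "FOObar" =~ /^(?i:foo)bar$/ ? 1 : 0;
my $ci_rest = "FOOBAR" =~ /^(?i:foo)bar$/ ? 1 : 0;

# Recursion: (?1) re-enters group 1 to match nested parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my $nested_ok  = "(a(b)c)" =~ $balanced ? 1 : 0;
my $nested_bad = "(a(b)c"  =~ $balanced ? 1 : 0;

# Conditional: require ')' only if group 1 matched '('.
my $cond = qr/^(\()?\d+(?(1)\))$/;
my @cond_results = map { $_ =~ $cond ? 1 : 0 } "(12)", "12", "(12";

print "$price @unsigned $date{y} $id @cond_results\n";
```
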
	265	=head2 VARIABLES
	266
	267	$_ Default variable for operators to use
	268
	269	$` Everything prior to matched string
	270	$& Entire matched string
	271	$' Everything after matched string
	272
	273	${^PREMATCH} Everything prior to matched string
	274	${^MATCH} Entire matched string
	275	${^POSTMATCH} Everything after matched string
	276
	277	The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
	278	within your program. See the entry for C<@-> in L<perlvar>
	279	for equivalent expressions that won't cause a slowdown.
	280	See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
	281	can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
	282	and C<${^POSTMATCH}>, but for them to be defined, you have to
	283	specify the C</p> (preserve) modifier on your regular expression.
	284
	285	$1, $2 ... hold the Xth captured expr
	286	$+ Last parenthesized pattern match
	287	$^N Holds the most recently closed capture
	288	$^R Holds the result of the last (?{...}) expr
	289	@- Offsets of starts of groups. $-[0] holds start of whole match
	290	@+ Offsets of ends of groups. $+[0] holds end of whole match
	291	%+ Named capture groups
	292	%- Named capture groups, as array refs
	293
	294	Captured groups are numbered according to their I<opening> paren.
	295
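The match variables can be inspected with a sketch like this one. The input string and the group names C<k> and C<v> are hypothetical. Values are copied into lexicals inside the block because the match variables are reset by later matches.

```perl
use strict;
use warnings;

# Sample data (hypothetical).
my $s = "key=value;";
my (%seen, @span);

if ($s =~ /(?<k>\w+)=(?<v>\w+)/p) {
    # /p makes the ${^...} variables available without the $& penalty.
    %seen = (
        pre   => ${^PREMATCH},
        match => ${^MATCH},
        post  => ${^POSTMATCH},
        first => $1,
        last  => $+,          # last parenthesized group that matched
        named => $+{k},       # named capture via %+
    );
    # @- and @+ hold start and end offsets; index 0 is the whole match.
    @span = ($-[0], $+[0], $-[2], $+[2]);
    # The offsets can rebuild a capture without using $2.
    $seen{rebuilt} = substr($s, $-[2], $+[2] - $-[2]);
}

print "$seen{match} [@span] $seen{rebuilt}\n";
```
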
	296	=head2 FUNCTIONS
	297
	298	lc Lowercase a string
	299	lcfirst Lowercase first char of a string
	300	uc Uppercase a string
	301	ucfirst Titlecase first char of a string
	302
	303	pos Return or set current match position
	304	quotemeta Quote metacharacters
	305	reset Reset ?pattern? status
	306	study Analyze string for optimizing matching
	307
	308	split Use a regex to split a string into parts
	309
	310	The first four of these are like the escape sequences C<\L>, C<\l>,
	311	C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.
	312
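The functions above might be exercised as in this sketch. The strings are assumptions made up for the example.

```perl
use strict;
use warnings;

# Case functions mirror the \L, \l, \U and \u escapes.
my $name = ucfirst(lc("pERL"));
my $loud = uc("perl");
my $soft = lcfirst("PERL");

# split uses a regex as the field separator.
my @fields = split /\s*,\s*/, "a , b,c";

# quotemeta escapes metacharacters so the string matches literally.
my $safe    = quotemeta("1+1=2");
my $literal = "1+1=2" =~ /^$safe$/ ? 1 : 0;

# pos reports where the last m//g match on a string ended.
my $str = "aXbXc";
my $ok1 = $str =~ /X/g;
my $first_pos = pos($str);
my $ok2 = $str =~ /X/g;
my $second_pos = pos($str);

# Assigning to pos moves the starting point of the next m//g.
pos($str) = 0;
my $ok3 = $str =~ /X/g;
my $reset_pos = pos($str);

print "$name $loud $soft @fields $first_pos $second_pos $reset_pos\n";
```
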
	313	=head2 TERMINOLOGY
	314
	315	=head3 Titlecase
	316
	317	A Unicode concept that is most often the same as uppercase, but for
	318	certain characters, such as the German "sharp s", the two differ.
	319
	320	=head1 AUTHOR
	321
	322	Iain Truskett. Updated by the Perl 5 Porters.
	323
	324	This document may be distributed under the same terms as Perl itself.
	325
	326	=head1 SEE ALSO
	327
	328	=over 4
	329
	330	=item *
	331
	332	L<perlretut> for a tutorial on regular expressions.
	333
	334	=item *
	335
	336	L<perlrequick> for a rapid tutorial.
	337
	338	=item *
	339
	340	L<perlre> for more details.
	341
	342	=item *
	343
	344	L<perlvar> for details on the variables.
	345
	346	=item *
	347
	348	L<perlop> for details on the operators.
	349
	350	=item *
	351
	352	L<perlfunc> for details on the functions.
	353
	354	=item *
	355
	356	L<perlfaq6> for FAQs on regular expressions.
	357
	358	=item *
	359
	360	L<perlrebackslash> for a reference on backslash sequences.
	361
	362	=item *
	363
	364	L<perlrecharclass> for a reference on character classes.
	365
	366	=item *
	367
	368	The L<re> module to alter behaviour and aid
	369	debugging.
	370
	371	=item *
	372
	373	L<perldebug/"Debugging Regular Expressions">
	374
	375	=item *
	376
	377	L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
	378	for details on regexes and internationalisation.
	379
	380	=item *
	381
	382	I<Mastering Regular Expressions> by Jeffrey Friedl
	383	(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
	384	reference on the topic.
	385
	386	=back
	387
	388	=head1 THANKS
	389
	390	David P.C. Wollmann,
	391	Richard Soderberg,
	392	Sean M. Burke,
	393	Tom Christiansen,
	394	Jim Cromie,
	395	and
	396	Jeffrey Goff
	397	for useful advice.
	398
	399	=cut