Commit	Line	Data
	1	=head1 NAME
	2
	3	perlreref - Perl Regular Expressions Reference
	4
	5	=head1 DESCRIPTION
	6
	7	This is a quick reference to Perl's regular expressions.
	8	For full information see L<perlre> and L<perlop>, as well
	9	as the L</"SEE ALSO"> section in this document.
	10
	11	=head2 OPERATORS
	12
	13	C<=~> determines to which variable the regex is applied.
	14	In its absence, $_ is used.
	15
	16	    $var =~ /foo/;
	17
	18	C<!~> determines to which variable the regex is applied,
	19	and negates the result of the match; it returns
	20	false if the match succeeds, and true if it fails.
	21
	22	    $var !~ /foo/;
	23
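A minimal sketch of the two binding operators and of the implicit C<$_> target. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $var = "foobar";

my $bound   = $var =~ /foo/ ? 1 : 0;   # 1: "foobar" contains "foo"
my $negated = $var !~ /foo/ ? 1 : 0;   # 0: !~ inverts the result

# Without a binding operator the match is applied to $_.
my $implicit;
for ("no match here") {
    $implicit = /foo/ ? 1 : 0;         # 0: $_ does not contain "foo"
}

print "$bound $negated $implicit\n";   # prints "1 0 0"
```
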
	24	C<m/pattern/msixpogcdualn> searches a string for a pattern match,
	25	applying the given options.
	26
	27	    m  Multiline mode - ^ and $ match internal lines
	28	    s  match as a Single line - . matches \n
	29	    i  case-Insensitive
	30	    x  eXtended legibility - free whitespace and comments
	31	    p  Preserve a copy of the matched string -
	32	       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
	33	    o  compile pattern Once
	34	    g  Global - all occurrences
	35	    c  don't reset pos on failed matches when using /g
	36	    a  restrict \d, \s, \w and [:posix:] to match ASCII only
	37	    aa (two a's) also /i matches exclude ASCII/non-ASCII
	38	    l  match according to current locale
	39	    u  match according to Unicode rules
	40	    d  match according to native rules unless something indicates
	41	       Unicode
	42	    n  Non-capture mode. Don't let () fill in $1, $2, etc...
	43
	44	If 'pattern' is an empty string, the last I<successfully> matched
	45	regex is used. Delimiters other than '/' may be used for both this
	46	operator and the following ones. The leading C<m> can be omitted
	47	if the delimiter is '/'.
	48
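The modifiers above can be combined freely. The following sketch, on made-up sample data, uses C</x>, C</m>, C</i> and C</g> together with an alternate delimiter, and then shows how C</c> preserves C<pos> after a failed C</g> match.

```perl
use strict;
use warnings;

my $text = "Alpha=1\nbeta=22\nGAMMA=333\n";

# /x allows whitespace and comments, /m makes ^ and $ match at line
# boundaries, /i ignores case, and /g in list context returns the
# captures of every match.
my @pairs = $text =~ m{
    ^ ([a-z]+)      # key at the start of a line
    = (\d+) $       # value up to the end of that line
}xmig;

# /g in scalar context steps through the string one match at a time;
# /c keeps pos() where it was when the match finally fails.
my $count = 0;
$count++ while $text =~ /\d+/gc;
my $pos_after = pos($text);

print "@pairs\n";                  # prints "Alpha 1 beta 22 GAMMA 333"
print "$count $pos_after\n";       # prints "3 25"
```
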
	49	C<qr/pattern/msixpodualn> lets you store a regex in a variable,
	50	or pass one around. Modifiers as for C<m//>, and are stored
	51	within the regex.
	52
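As an illustration, a C<qr//> object can be interpolated into a larger pattern or passed to a subroutine. C<count_matches> is a hypothetical helper written for this sketch, not a built-in.

```perl
use strict;
use warnings;

my $word = qr/[a-z]+/i;                 # the /i is stored in the object
my $pair = qr/($word)\s*=\s*($word)/;   # interpolated into a larger regex

my ($key, $value) = "Colour = RED" =~ $pair;

# Hypothetical helper: a compiled regex passed around like any scalar.
sub count_matches {
    my ($string, $re) = @_;
    my $n = () = $string =~ /$re/g;     # count the matches
    return $n;
}
my $n = count_matches("one Two three", $word);

print "$key $value $n\n";               # prints "Colour RED 3"
```
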
	53	C<s/pattern/replacement/msixpodualngcer> substitutes matches of
	54	'pattern' with 'replacement'. Modifiers as for C<m//>,
	55	with two additions:
	56
	57	    e  Evaluate 'replacement' as an expression
	58	    r  Return substitution and leave the original string untouched.
	59
	60	'e' may be specified multiple times. 'replacement' is interpreted
	61	as a double quoted string unless a single-quote (C<'>) is the delimiter.
	62
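A short sketch of C</e> and C</r> on sample data. It assumes Perl 5.14 or later, where C</r> is available.

```perl
use strict;
use warnings;

my $price = "cost: 4 apples, 10 pears";

# /e evaluates the replacement as Perl code for every match.
(my $doubled = $price) =~ s/(\d+)/$1 * 2/ge;

# /r returns the modified copy and leaves $price untouched.
my $masked = $price =~ s/\d+/N/gr;

# Without /r the substitution edits in place and returns the number
# of substitutions made.
my $changed = ($price =~ s/apples/plums/);

print "$doubled\n";    # prints "cost: 8 apples, 20 pears"
print "$masked\n";     # prints "cost: N apples, N pears"
print "$price\n";      # prints "cost: 4 plums, 10 pears"
```
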
	63	C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
	64	delimiters can be used. Must be reset with reset().
	65
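The once-only state belongs to each individual C<m?pattern?> operator, so this sketch wraps one in a hypothetical subroutine, C<first_digit_line>, in order to reach the same operator repeatedly.

```perl
use strict;
use warnings;

# Hypothetical helper: every call evaluates the same m?? operator.
sub first_digit_line {
    my ($line) = @_;
    return $line =~ m?\d? ? 1 : 0;
}

# Only the first successful match counts; later calls fail.
my @seen = map { first_digit_line($_) } "a1", "b2", "c3";

reset;                                   # re-arm m?? in this package
my $after_reset = first_digit_line("d4");

print "@seen $after_reset\n";            # prints "1 0 0 1"
```
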
	66	=head2 SYNTAX
	67
	68	    \       Escapes the character immediately following it
	69	    .       Matches any single character except a newline (unless /s is
	70	              used)
	71	    ^       Matches at the beginning of the string (or line, if /m is used)
	72	    $       Matches at the end of the string (or line, if /m is used)
	73	    *       Matches the preceding element 0 or more times
	74	    +       Matches the preceding element 1 or more times
	75	    ?       Matches the preceding element 0 or 1 times
	76	    {...}   Specifies a range of occurrences for the element preceding it
	77	    [...]   Matches any one of the characters contained within the brackets
	78	    (...)   Groups subexpressions for capturing to $1, $2...
	79	    (?:...) Groups subexpressions without capturing (cluster)
	80	    |       Matches either the subexpression preceding or following it
	81	    \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
	82	    \1, \2, \3 ...           Matches the text from the Nth group
	83	    \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
	84	    \g{name}     Named backreference
	85	    \k<name>     Named backreference
	86	    \k'name'     Named backreference
	87	    (?P=name)    Named backreference (python syntax)
	88
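The grouping and backreference forms can be tried out as follows. This is a sketch on sample strings, and the group name C<q> is arbitrary.

```perl
use strict;
use warnings;

# \1 refers back to what group 1 captured: find a doubled character.
my ($double) = "hello world" =~ /(.)\1/;

# \g{-1} refers to the group immediately before it.
my $relative = "abab" =~ /(ab)\g{-1}/ ? 1 : 0;

# A named group and \k<name>: the closing quote must equal the opener.
my ($body) = q{say "hi there" now} =~ /(?<q>["'])(.*?)\k<q>/ ? $2 : undef;

# (?:...) groups without capturing, so (bar) is still group 1.
my ($tail) = "foobar" =~ /^(?:foo|baz)(bar)$/;

print "$double $relative [$body] $tail\n";   # prints "l 1 [hi there] bar"
```
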
	89	=head2 ESCAPE SEQUENCES
	90
	91	These work as in normal strings.
	92
	93	    \a       Alarm (beep)
	94	    \e       Escape
	95	    \f       Formfeed
	96	    \n       Newline
	97	    \r       Carriage return
	98	    \t       Tab
	99	    \037     Char whose ordinal is the 3 octal digits, max \777
	100	    \o{2307} Char whose ordinal is the octal number, unrestricted
	101	    \x7f     Char whose ordinal is the 2 hex digits, max \xFF
	102	    \x{263a} Char whose ordinal is the hex number, unrestricted
	103	    \cx      Control-x
	104	    \N{name} A named Unicode character or character sequence
	105	    \N{U+263D} A Unicode character by hex ordinal
	106
	107	    \l  Lowercase next character
	108	    \u  Titlecase next character
	109	    \L  Lowercase until \E
	110	    \U  Uppercase until \E
	111	    \F  Foldcase until \E
	112	    \Q  Disable pattern metacharacters until \E
	113	    \E  End modification
	114
	115	For Titlecase, see L</Titlecase>.
	116
	117	This one works differently from normal strings:
	118
	119	    \b  An assertion, not backspace, except in a character class
	120
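A few of these sequences in action, as a sketch with illustrative variable names. Note how C<\Q> changes the meaning of an interpolated string, and how C<\b> differs inside and outside a bracketed class.

```perl
use strict;
use warnings;

my $literal = "a.c";

# Interpolated as-is, "." is a metacharacter and matches "b".
my $unquoted = "abc" =~ /^$literal$/     ? 1 : 0;

# \Q...\E quotes metacharacters, so "." must match a literal dot.
my $quoted   = "abc" =~ /^\Q$literal\E$/ ? 1 : 0;

# Case-modifying escapes work in patterns and double-quoted strings.
my $name  = "perl";
my $shout = "\u$name / \U$name\E!";

# \x09 is a tab, just as in a normal string.
my $tab = "a\tb" =~ /a\x09b/ ? 1 : 0;

# \b is a word boundary in a pattern but a backspace inside [...].
my $boundary  = "a cat" =~ /\bcat\b/ ? 1 : 0;
my $backspace = "\x08"  =~ /[\b]/    ? 1 : 0;

print "$unquoted $quoted $tab $boundary $backspace\n";  # prints "1 0 1 1 1"
print "$shout\n";                                       # prints "Perl / PERL!"
```
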
	121	=head2 CHARACTER CLASSES
	122
	123	    [amy]    Match 'a', 'm' or 'y'
	124	    [f-j]    Dash specifies "range"
	125	    [f-j-]   Dash escaped or at start or end means 'dash'
	126	    [^f-j]   Caret indicates "match any character _except_ these"
	127
	128	The following sequences (except C<\N>) work within or without a character class.
	129	The first six are locale aware, all are Unicode aware. See L<perllocale>
	130	and L<perlunicode> for details.
	131
	132	    \d      A digit
	133	    \D      A nondigit
	134	    \w      A word character
	135	    \W      A non-word character
	136	    \s      A whitespace character
	137	    \S      A non-whitespace character
	138	    \h      A horizontal whitespace
	139	    \H      A non horizontal whitespace
	140	    \N      A non newline (when not followed by '{NAME}';
	141	            not valid in a character class; equivalent to [^\n]; it's
	142	            like '.' without /s modifier)
	143	    \v      A vertical whitespace
	144	    \V      A non vertical whitespace
	145	    \R      A generic newline           (?>\x0D\x0A|\v)
	146
	147	    \pP     Match P-named (Unicode) property
	148	    \p{...} Match Unicode property with name longer than 1 character
	149	    \PP     Match non-P
	150	    \P{...} Match lack of Unicode property with name longer than 1 char
	151	    \X      Match Unicode extended grapheme cluster
	152
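The following sketch exercises several of these sequences on strings built from C<\x{...}> escapes, so that the source stays plain ASCII. The sample strings are illustrative.

```perl
use strict;
use warnings;

my $s = "caf\x{e9} \x{3b1}\x{3b2}";      # "cafe" with e-acute, then alpha beta

my $greek   = () = $s =~ /\p{Greek}/g;   # characters in the Greek script
my $letters = () = $s =~ /\pL/g;         # any letter (one-letter property)

# \R treats CRLF, LF and CR each as one line break.
my @lines = split /\R/, "a\r\nb\nc\rd";

# \X matches a base character plus its combining marks as one unit.
my $clusters = () = "e\x{301}x" =~ /\X/g;

# \N stops at a newline, like "." without /s.
my ($first_line) = "ab\ncd" =~ /^(\N+)/;

# \h matches the space and the tab but not the newline.
my $horizontal = () = "a \t\nb" =~ /\h/g;

print "$greek $letters ", scalar(@lines), " $clusters $first_line $horizontal\n";
# prints "2 6 4 2 ab 2"
```
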
	153	POSIX character classes and their Unicode and Perl equivalents:
	154
	155	            ASCII-         Full-
	156	   POSIX    range          range    backslash
	157	 [[:...:]]  \p{...}        \p{...}   sequence    Description
	158
	159	 -----------------------------------------------------------------------
	160	 alnum   PosixAlnum       XPosixAlnum            'alpha' plus 'digit'
	161	 alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
	162	 ascii   ASCII                                   Any ASCII character
	163	 blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
	164	                                                   full-range also
	165	                                                   written as
	166	                                                   \p{HorizSpace} (GNU
	167	                                                   extension)
	168	 cntrl   PosixCntrl       XPosixCntrl            Control characters
	169	 digit   PosixDigit       XPosixDigit   \d       Decimal digits
	170	 graph   PosixGraph       XPosixGraph            'alnum' plus 'punct'
	171	 lower   PosixLower       XPosixLower            Lowercase characters
	172	 print   PosixPrint       XPosixPrint            'graph' plus 'space',
	173	                                                   but not any Controls
	174	 punct   PosixPunct       XPosixPunct            Punctuation and Symbols
	175	                                                   in ASCII-range; just
	176	                                                   punct outside it
	177	 space   PosixSpace       XPosixSpace   \s       Whitespace
	178	 upper   PosixUpper       XPosixUpper            Uppercase characters
	179	 word    PosixWord        XPosixWord    \w       'alnum' + Unicode marks
	180	                                                    + connectors, like
	181	                                                    '_' (Perl extension)
	182	 xdigit  ASCII_Hex_Digit  XPosixXDigit           Hexadecimal digit,
	183	                                                    ASCII-range is
	184	                                                    [0-9A-Fa-f]
	185
	186	Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
	187	in L<perluniprops/Properties accessible through \p{} and \P{}>
	188
	189	Within a character class:
	190
	191	    POSIX      traditional   Unicode
	192	    [:digit:]       \d       \p{Digit}
	193	    [:^digit:]      \D       \P{Digit}
	194
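To make the ASCII-range and full-range columns concrete, this sketch counts digit matches in a string containing one ASCII digit and one non-ASCII digit (U+0663, ARABIC-INDIC DIGIT THREE). It assumes Perl 5.14 or later for the C</a> modifier.

```perl
use strict;
use warnings;

my $mixed = "x1\x{663}";    # "x", ASCII digit 1, Arabic-Indic digit 3

my $full  = () = $mixed =~ /\d/g;               # full range: both digits
my $posix = () = $mixed =~ /[[:digit:]]/g;      # same as \d here
my $ascii = () = $mixed =~ /\d/ag;              # /a restricts \d to ASCII
my $prop  = () = $mixed =~ /\p{PosixDigit}/g;   # ASCII-range property
my $non   = () = $mixed =~ /[[:^digit:]]/g;     # negated class: the "x"

# [:xdigit:] must be used inside a bracketed character class.
my ($hex) = "0xBEEF" =~ /^0x([[:xdigit:]]+)$/;

print "$full $posix $ascii $prop $non $hex\n";  # prints "2 2 1 1 1 BEEF"
```
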
	195	=head2 ANCHORS
	196
	197	All are zero-width assertions.
	198
	199	    ^   Match string start (or line, if /m is used)
	200	    $   Match string end (or line, if /m is used) or before newline
	201	    \b{} Match boundary of type specified within the braces
	202	    \B{} Match wherever \b{} doesn't match
	203	    \b  Match word boundary (between \w and \W)
	204	    \B  Match except at word boundary (between \w and \w or \W and \W)
	205	    \A  Match string start (regardless of /m)
	206	    \Z  Match string end (before optional newline)
	207	    \z  Match absolute string end
	208	    \G  Match where previous m//g left off
	209	    \K  Keep the stuff left of the \K, don't include it in $&
	210
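The differences between the anchors show up most clearly on a string that ends in a newline. The sketch below also uses C<\G> for a tiny tokenizer and C<\K> in a substitution; all input strings are made up for the example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

my $cap_z   = $text =~ /two\Z/  ? 1 : 0;   # 1: \Z allows a final newline
my $low_z   = $text =~ /two\z/  ? 1 : 0;   # 0: \z is the absolute end
my $dollar  = $text =~ /one$/   ? 1 : 0;   # 0: no /m, "one" is not at the end
my $multi   = $text =~ /one$/m  ? 1 : 0;   # 1: /m lets $ match before "\n"
my $caret_m = $text =~ /^two/m  ? 1 : 0;   # 1: /m lets ^ match after "\n"
my $cap_a   = $text =~ /\Atwo/m ? 1 : 0;   # 0: \A ignores /m

# \G continues exactly where the previous /g match stopped.
my $input = "12+34";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                           { last }
}

# \K drops everything matched so far from the text to be replaced.
(my $path = "/usr/local/bin") =~ s{.*/\Kbin}{sbin};

print "$cap_z $low_z $dollar $multi $caret_m $cap_a\n";  # prints "1 0 0 1 1 0"
print "@tokens\n";                                       # prints "NUM:12 OP:+ NUM:34"
print "$path\n";                                         # prints "/usr/local/sbin"
```
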
	211	=head2 QUANTIFIERS
	212
	213	Quantifiers are greedy by default and match the B<longest> leftmost.
	214
	215	    Maximal Minimal Possessive Allowed range
	216	    ------- ------- ---------- -------------
	217	    {n,m}   {n,m}?  {n,m}+     Must occur at least n times
	218	                               but no more than m times
	219	    {n,}    {n,}?   {n,}+      Must occur at least n times
	220	    {,n}    {,n}?   {,n}+      Must occur at most n times
	221	    {n}     {n}?    {n}+       Must occur exactly n times
	222	    *       *?      *+         0 or more times (same as {0,})
	223	    +       +?      ++         1 or more times (same as {1,})
	224	    ?       ??      ?+         0 or 1 time (same as {0,1})
	225
	226	The possessive forms (new in Perl 5.10) prevent backtracking: what gets
	227	matched by a pattern with a possessive quantifier will not be backtracked
	228	into, even if that causes the whole match to fail.
	229
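The three columns behave differently on the same input, as this sketch with sample strings shows.

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Maximal (greedy): takes as much as possible, then backs off.
my ($greedy)  = $html =~ /<b>(.*)<\/b>/;

# Minimal: takes as little as possible, then extends.
my ($minimal) = $html =~ /<b>(.*?)<\/b>/;

# a+ gives back one "a" so the trailing "a" can match; a++ never does.
my $backtracks = "aaa" =~ /^a+a$/  ? 1 : 0;
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;

# Ranges follow the same rules.
my ($range)      = "aaaaa" =~ /(a{2,3})/;
my ($lazy_range) = "aaaaa" =~ /(a{2,3}?)/;

print "$greedy\n";                       # prints "one</b><b>two"
print "$minimal\n";                      # prints "one"
print "$backtracks $possessive\n";       # prints "1 0"
print "$range $lazy_range\n";            # prints "aaa aa"
```
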
	230	=head2 EXTENDED CONSTRUCTS
	231
	232	    (?#text)          A comment
	233	    (?:...)           Groups subexpressions without capturing (cluster)
	234	    (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
	235	    (?=...)           Zero-width positive lookahead assertion
	236	    (*pla:...)        Same, starting in 5.32; experimentally in 5.28
	237	    (*positive_lookahead:...) Same, same versions as *pla
	238	    (?!...)           Zero-width negative lookahead assertion
	239	    (*nla:...)        Same, starting in 5.32; experimentally in 5.28
	240	    (*negative_lookahead:...) Same, same versions as *nla
	241	    (?<=...)          Zero-width positive lookbehind assertion
	242	    (*plb:...)        Same, starting in 5.32; experimentally in 5.28
	243	    (*positive_lookbehind:...) Same, same versions as *plb
	244	    (?<!...)          Zero-width negative lookbehind assertion
	245	    (*nlb:...)        Same, starting in 5.32; experimentally in 5.28
	246	    (*negative_lookbehind:...) Same, same versions as *nlb
	247	    (?>...)           Grab what we can, prohibit backtracking
	248	    (*atomic:...)     Same, starting in 5.32; experimentally in 5.28
	249	    (?|...)           Branch reset
	250	    (?<name>...)      Named capture
	251	    (?'name'...)      Named capture
	252	    (?P<name>...)     Named capture (python syntax)
	253	    (?[...])          Extended bracketed character class
	254	    (?{ code })       Embedded code, return value becomes $^R
	255	    (??{ code })      Dynamic regex, return value used as regex
	256	    (?N)              Recurse into subpattern number N
	257	    (?-N), (?+N)      Recurse into Nth previous/next subpattern
	258	    (?R), (?0)        Recurse at the beginning of the whole pattern
	259	    (?&name)          Recurse into a named subpattern
	260	    (?P>name)         Recurse into a named subpattern (python syntax)
	261	    (?(cond)yes|no)
	262	    (?(cond)yes)      Conditional expression, where "(cond)" can be:
	263	                      (?=pat)   lookahead; also (*pla:pat)
	264	                                (*positive_lookahead:pat)
	265	                      (?!pat)   negative lookahead; also (*nla:pat)
	266	                                (*negative_lookahead:pat)
	267	                      (?<=pat)  lookbehind; also (*plb:pat)
	268	                                (*positive_lookbehind:pat)
	269	                      (?<!pat)  negative lookbehind; also (*nlb:pat)
	270	                                (*negative_lookbehind:pat)
	271	                      (N)       subpattern N has matched something
	272	                      (<name>)  named subpattern has matched something
	273	                      ('name')  named subpattern has matched something
	274	                      (?{code}) code condition
	275	                      (R)       true if recursing
	276	                      (RN)      true if recursing into Nth subpattern
	277	                      (R&name)  true if recursing into named subpattern
	278	                      (DEFINE)  always false; the 'no' pattern is not allowed
	279
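A sketch of some of the more commonly used constructs: named captures, lookaround, branch reset, recursion and a conditional. It assumes Perl 5.10 or later, and the patterns are illustrative rather than production-grade.

```perl
use strict;
use warnings;

# Named captures are available through %+.
my %date;
if ("2024-05-17" =~ /^(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)\z/) {
    %date = %+;
}

# Lookbehind and lookahead are zero-width: only a comma is inserted.
(my $grouped = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;

# Negative lookbehind: "bar" not preceded by "foo".
my $lone_bar = () = "foobar bar" =~ /(?<!foo)bar/g;

# Branch reset: both alternatives capture into group 1.
my ($digit) = "b=7" =~ /(?|a=(\d)|b=(\d))/;

# (?1) recurses into group 1 to match balanced parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))\z/;
my $ok  = "(a(b)c)" =~ $balanced ? 1 : 0;
my $bad = "(a(b)c"  =~ $balanced ? 1 : 0;

# Conditional: require ">" only if the optional "<" (group 1) matched.
my $tag = qr/^(<)?\w+(?(1)>)\z/;
my @tag_results = map { $_ =~ $tag ? 1 : 0 } "<tag>", "tag", "<tag";

print "$date{y}/$date{m}/$date{d} $grouped $lone_bar $digit\n";
# prints "2024/05/17 1,234,567 1 7"
print "$ok $bad @tag_results\n";     # prints "1 0 1 1 0"
```
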
	280	=head2 VARIABLES
	281
	282	    $_    Default variable for operators to use
	283
	284	    $`    Everything prior to matched string
	285	    $&    Entire matched string
	286	    $'    Everything after matched string
	287
	288	    ${^PREMATCH}   Everything prior to matched string
	289	    ${^MATCH}      Entire matched string
	290	    ${^POSTMATCH}  Everything after matched string
	291
	292	Note to those still using Perl 5.18 or earlier:
	293	The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
	294	within your program. Consult L<perlvar> for C<@->
	295	to see equivalent expressions that won't cause a slowdown.
	296	See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
	297	can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
	298	and C<${^POSTMATCH}>, but for them to be defined, you have to
	299	specify the C</p> (preserve) modifier on your regular expression.
	300	In Perl 5.20 and later, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
	301
	302	    $1, $2 ...  hold the Nth captured expression
	303	    $+    Last parenthesized pattern match
	304	    $^N   Holds the most recently closed capture
	305	    $^R   Holds the result of the last (?{...}) expr
	306	    @-    Offsets of starts of groups. $-[0] holds start of whole match
	307	    @+    Offsets of ends of groups. $+[0] holds end of whole match
	308	    %+    Named capture groups
	309	    %-    Named capture groups, as array refs
	310
	311	Captured groups are numbered according to their I<opening> paren.
	312
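The match variables can be inspected right after a successful match. This sketch uses the C</p> modifier so that the C<${^...}> variables are defined on any Perl from 5.10 onward; the sample string and group names are arbitrary.

```perl
use strict;
use warnings;

my $s = "key=value; other";
my ($pre, $match, $post, @starts, @ends, %named, $last_paren, $by_offset);

if ($s =~ /(?<k>\w+)=(?<v>\w+)/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    @starts     = @-;       # start offsets: whole match, group 1, group 2
    @ends       = @+;       # end offsets in the same order
    %named      = %+;       # named groups
    $last_paren = $+;       # last parenthesized match
    # The offsets allow a capture to be rebuilt with substr.
    $by_offset  = substr($s, $-[2], $+[2] - $-[2]);
}

print "[$pre] [$match] [$post]\n";     # prints "[] [key=value] [; other]"
print "@starts | @ends\n";             # prints "0 0 4 | 9 3 9"
print "$named{k} $named{v} $last_paren $by_offset\n";
# prints "key value value value"
```
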
	313	=head2 FUNCTIONS
	314
	315	    lc          Lowercase a string
	316	    lcfirst     Lowercase first char of a string
	317	    uc          Uppercase a string
	318	    ucfirst     Titlecase first char of a string
	319	    fc          Foldcase a string
	320
	321	    pos         Return or set current match position
	322	    quotemeta   Quote metacharacters
	323	    reset       Reset m?pattern? status
	324	    study       Analyze string for optimizing matching
	325
	326	    split       Use a regex to split a string into parts
	327
	328	The first five of these are like the escape sequences C<\L>, C<\l>,
	329	C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; for
	330	Foldcase, see L</Foldcase>.
	331
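A brief sketch of several of these functions. It assumes Perl 5.16 or later, because C<fc> has to be enabled with C<use feature 'fc'>.

```perl
use strict;
use warnings;
use feature 'fc';

my $title = ucfirst(lc("pERL"));                     # "Perl"
my $same  = fc("HeLLo") eq fc("hEllO") ? 1 : 0;      # caseless comparison

# quotemeta is the function form of \Q.
my $safe = quotemeta("a.b*c");

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";

# pos reports where the last /g match ended and can also be assigned.
my $str   = "aXbXc";
my $found = $str =~ /X/g;
my $first = pos($str);                 # 2: just past the first "X"
pos($str) = 3;                         # jump to the second "X"
my $at_three = $str =~ /\GX/g ? 1 : 0;

print "$title $same $safe\n";          # prints "Perl 1 a\.b\*c"
print "@fields $first $at_three\n";    # prints "a b c 2 1"
```
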
	332	=head2 TERMINOLOGY
	333
	334	=head3 Titlecase
	335
	336	Unicode concept which most often is equal to uppercase, but for
	337	certain characters like the German "sharp s" there is a difference.
	338
	339	=head3 Foldcase
	340
	341	Unicode form that is useful when comparing strings regardless of case,
	342	as certain characters have complex one-to-many case mappings. Primarily a
	343	variant of lowercase.
	344
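Both terms can be illustrated with characters whose case mappings are not one-to-one. The sketch uses U+01C6 (LATIN SMALL LETTER DZ WITH CARON), whose titlecase and uppercase forms are different code points, and the sharp s for folding. It assumes Perl 5.16 or later, where C<use v5.16> enables C<fc> and Unicode string semantics.

```perl
use v5.16;      # enables strict, the fc function and unicode_strings
use warnings;

my $dz = "\x{1C6}";     # LATIN SMALL LETTER DZ WITH CARON

# Titlecase and uppercase are different code points for this letter.
my $title_cp = sprintf "U+%04X", ord(ucfirst($dz));
my $upper_cp = sprintf "U+%04X", ord(uc($dz));

# Foldcase maps the sharp s to "ss", so the two spellings compare
# equal; plain lowercasing does not.
my $fold_equal = fc("stra\x{DF}e") eq fc("STRASSE") ? 1 : 0;
my $lc_equal   = lc("stra\x{DF}e") eq lc("STRASSE") ? 1 : 0;

print "$title_cp $upper_cp $fold_equal $lc_equal\n";
# prints "U+01C5 U+01C4 1 0"
```
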
	345	=head1 AUTHOR
	346
	347	Iain Truskett. Updated by the Perl 5 Porters.
	348
	349	This document may be distributed under the same terms as Perl itself.
	350
	351	=head1 SEE ALSO
	352
	353	=over 4
	354
	355	=item *
	356
	357	L<perlretut> for a tutorial on regular expressions.
	358
	359	=item *
	360
	361	L<perlrequick> for a rapid tutorial.
	362
	363	=item *
	364
	365	L<perlre> for more details.
	366
	367	=item *
	368
	369	L<perlvar> for details on the variables.
	370
	371	=item *
	372
	373	L<perlop> for details on the operators.
	374
	375	=item *
	376
	377	L<perlfunc> for details on the functions.
	378
	379	=item *
	380
	381	L<perlfaq6> for FAQs on regular expressions.
	382
	383	=item *
	384
	385	L<perlrebackslash> for a reference on backslash sequences.
	386
	387	=item *
	388
	389	L<perlrecharclass> for a reference on character classes.
	390
	391	=item *
	392
	393	The L<re> module to alter behaviour and aid
	394	debugging.
	395
	396	=item *
	397
	398	L<perldebug/"Debugging Regular Expressions">
	399
	400	=item *
	401
	402	L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
	403	for details on regexes and internationalisation.
	404
	405	=item *
	406
	407	I<Mastering Regular Expressions> by Jeffrey Friedl
	408	(L<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
	409	reference on the topic.
	410
	411	=back
	412
	413	=head1 THANKS
	414
	415	David P.C. Wollmann,
	416	Richard Soderberg,
	417	Sean M. Burke,
	418	Tom Christiansen,
	419	Jim Cromie,
	420	and
	421	Jeffrey Goff
	422	for useful advice.
	423
	424	=cut