Commit	Line	Data
	1	=head1 NAME
	2
	3	perlreref - Perl Regular Expressions Reference
	4
	5	=head1 DESCRIPTION
	6
	7	This is a quick reference to Perl's regular expressions.
	8	For full information see L<perlre> and L<perlop>, as well
	9	as the L</"SEE ALSO"> section in this document.
	10
	11	=head2 OPERATORS
	12
	13	C<=~> determines to which variable the regex is applied.
	14	In its absence, $_ is used.
	15
	16	$var =~ /foo/;
	17
	18	C<!~> determines to which variable the regex is applied,
	19	and negates the result of the match; it returns
	20	false if the match succeeds, and true if it fails.
	21
	22	$var !~ /foo/;
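
A minimal sketch of the two binding operators and of the implicit C<$_> target. The sub names C<classify> and C<count_foo> and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how a string relates to /foo/.
sub classify {
    my ($var) = @_;
    return 'contains foo' if $var =~ /foo/;   # true when the match succeeds
    return 'no foo'       if $var !~ /foo/;   # true when the match fails
    return 'unreachable';
}

# Hypothetical helper: with no =~ the match is applied to $_.
sub count_foo {
    my $count = 0;
    for (@_) {              # each element is aliased to $_
        $count++ if /foo/;
    }
    return $count;
}

print classify('seafood'), "\n";
print classify('bar'), "\n";
print count_foo('foo', 'bar', 'tofoo'), "\n";
```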
	23
	24	C<m/pattern/msixpogcdualn> searches a string for a pattern match,
	25	applying the given options.
	26
	27	m Multiline mode - ^ and $ match internal lines
	28	s match as a Single line - . matches \n
	29	i case-Insensitive
	30	x eXtended legibility - free whitespace and comments
	31	p Preserve a copy of the matched string -
	32	${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
	33	o compile pattern Once
	34	g Global - all occurrences
	35	c don't reset pos on failed matches when using /g
	36	a restrict \d, \s, \w and [:posix:] to match ASCII only
	37	aa (two a's) also, under /i, ASCII characters don't match non-ASCII ones
	38	l match according to current locale
	39	u match according to Unicode rules
	40	d match according to native rules unless something indicates
	41	Unicode
	42	n Non-capture mode. Don't let () fill in $1, $2, etc...
	43
	44	If 'pattern' is an empty string, the last I<successfully> matched
	45	regex is used. Delimiters other than '/' may be used for both this
	46	operator and the following ones. The leading C<m> can be omitted
	47	if the delimiter is '/'.
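
The following sketch shows a few of the modifiers above in use. The helper names and the sample text are assumptions chosen for the example, not part of any API.

```perl
use strict;
use warnings;

# /i is case-insensitive; /g in list context returns every match.
sub all_alphas { my @hits = $_[0] =~ /alpha/gi; return @hits }

# /m lets ^ match at the start of each internal line.
sub line_starts { my @first = $_[0] =~ /^(\w+)/mg; return @first }

# /x ignores whitespace and allows comments inside the pattern;
# a delimiter other than '/' is used here, so the leading m is required.
sub second_word {
    my ($line) = @_;
    my ($word) = $line =~ m{
        ^ \w+      # first word
        \s+        # separator
        (\w+)      # second word, captured
    }x;
    return $word;
}

my $text = "Alpha one\nbeta two\nALPHA three\n";
print join(',', all_alphas($text)), "\n";
print join(',', line_starts($text)), "\n";
print second_word('beta two'), "\n";
```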
	48
	49	C<qr/pattern/msixpodualn> lets you store a regex in a variable,
	50	or pass one around. The modifiers are as for C<m//>, and they are
	51	stored within the regex.
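
As an illustration, a compiled regex can be passed to a sub or interpolated into a larger pattern. The subs C<first_match> and C<version_after> below are hypothetical helpers written for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first candidate matching a regex object.
sub first_match {
    my ($re, @candidates) = @_;
    for my $c (@candidates) {
        return $c if $c =~ $re;
    }
    return;
}

# Hypothetical helper: interpolates a stored regex into a larger pattern.
# The /i inside $name_re travels with it; the outer /x applies only to
# the text written here.
sub version_after {
    my ($name_re, $str) = @_;
    return $str =~ /$name_re \s+ (\d+\.\d+)/x ? $1 : undef;
}

my $starts_with_b = qr/^b/i;
print first_match($starts_with_b, 'apple', 'Banana', 'cherry'), "\n";
print version_after(qr/perl/i, 'Running PERL 5.36 today'), "\n";
```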
	52
	53	C<s/pattern/replacement/msixpogcedualr> substitutes matches of
	54	'pattern' with 'replacement'. Modifiers as for C<m//>,
	55	with two additions:
	56
	57	e Evaluate 'replacement' as an expression
	58	r Return substitution and leave the original string untouched.
	59
	60	'e' may be specified multiple times. 'replacement' is interpreted
	61	as a double quoted string unless a single-quote (C<'>) is the delimiter.
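
A short sketch of the two extra modifiers, assuming Perl 5.14 or later for C</r>. The sub names and sample strings are invented for the example.

```perl
use strict;
use warnings;

# /r returns the changed copy and leaves the original string untouched.
# The replacement is treated like a double-quoted string, so \U applies.
sub shout { return $_[0] =~ s/(\w+)/\U$1/gr }

# /e evaluates the replacement as a Perl expression.
sub double_numbers { return $_[0] =~ s/(\d+)/$1 * 2/ger }

my $original = 'hello world';
my $loud     = shout($original);
print "$original -> $loud\n";
print double_numbers('3 apples, 10 pears'), "\n";
```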
	62
	63	C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
	64	delimiters can be used. Must be reset with reset().
	65
	66	=head2 SYNTAX
	67
	68	\ Escapes the character immediately following it
	69	. Matches any single character except a newline (unless /s is
	70	used)
	71	^ Matches at the beginning of the string (or line, if /m is used)
	72	$ Matches at the end of the string (or line, if /m is used)
	73	* Matches the preceding element 0 or more times
	74	+ Matches the preceding element 1 or more times
	75	? Matches the preceding element 0 or 1 times
	76	{...} Specifies a range of occurrences for the element preceding it
	77	[...] Matches any one of the characters contained within the brackets
	78	(...) Groups subexpressions for capturing to $1, $2...
	79	(?:...) Groups subexpressions without capturing (cluster)
	80	| Matches either the subexpression preceding or following it
	81	\g1 or \g{1}, \g2 ... Matches the text from the Nth group
	82	\1, \2, \3 ... Matches the text from the Nth group
	83	\g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
	84	\g{name} Named backreference
	85	\k<name> Named backreference
	86	\k'name' Named backreference
	87	(?P=name) Named backreference (python syntax)
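
The backreference forms listed above can be sketched as follows, assuming Perl 5.10 or later for C<\g{-1}> and named groups. The sub names and inputs are hypothetical.

```perl
use strict;
use warnings;

# \1 matches the same text that group 1 captured (a doubled word here).
sub doubled_word { return $_[0] =~ /\b(\w+)\s+\1\b/ ? $1 : undef }

# \g{-1} refers to the group opened most recently before it.
sub has_repeated_pair { return $_[0] =~ /(\d\d)-\g{-1}/ ? 1 : 0 }

# \k<q> is a named backreference: the closing quote must equal the opening one.
sub quoted_text {
    return $_[0] =~ /(?<q>["'])(?<body>.*?)\k<q>/ ? $+{body} : undef;
}

print doubled_word('this is is a test'), "\n";
print has_repeated_pair('12-12'), has_repeated_pair('12-34'), "\n";
print quoted_text(q{say "it's" now}), "\n";
```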
	88
	89	=head2 ESCAPE SEQUENCES
	90
	91	These work as in normal strings.
	92
	93	\a Alarm (beep)
	94	\e Escape
	95	\f Formfeed
	96	\n Newline
	97	\r Carriage return
	98	\t Tab
	99	\037 Char whose ordinal is the 3 octal digits, max \777
	100	\o{2307} Char whose ordinal is the octal number, unrestricted
	101	\x7f Char whose ordinal is the 2 hex digits, max \xFF
	102	\x{263a} Char whose ordinal is the hex number, unrestricted
	103	\cx Control-x
	104	\N{name} A named Unicode character or character sequence
	105	\N{U+263D} A Unicode character by hex ordinal
	106
	107	\l Lowercase next character
	108	\u Titlecase next character
	109	\L Lowercase until \E
	110	\U Uppercase until \E
	111	\F Foldcase until \E
	112	\Q Disable pattern metacharacters until \E
	113	\E End modification
	114
	115	For Titlecase, see L</Titlecase>.
	116
	117	This one works differently from normal strings:
	118
	119	\b An assertion, not backspace, except in a character class
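
A minimal sketch of C<\Q...\E>, the case-changing escapes and the two meanings of C<\b>. It assumes Perl 5.14 or later for C</r>; all sub names are made up for the example.

```perl
use strict;
use warnings;

# \Q...\E quotes metacharacters in the interpolated text, so the '+'
# in $literal is matched literally instead of acting as a quantifier.
sub contains_literal {
    my ($haystack, $literal) = @_;
    return $haystack =~ /\Q$literal\E/ ? 1 : 0;
}

# \u and \L behave in the replacement as in a double-quoted string.
sub capitalize { return $_[0] =~ s/(\w+)/\u\L$1/gr }

# In a pattern \b is a word-boundary assertion ...
sub whole_word_cat { return $_[0] =~ /\bcat\b/ ? 1 : 0 }

# ... but inside a character class it is a backspace character.
sub has_backspace { return $_[0] =~ /[\b]/ ? 1 : 0 }

print contains_literal('a+b=c', 'a+b'), contains_literal('aab', 'a+b'), "\n";
print capitalize('hELLO wORLD'), "\n";
print whole_word_cat('the cat sat'), whole_word_cat('concatenate'), "\n";
```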
	120
	121	=head2 CHARACTER CLASSES
	122
	123	[amy] Match 'a', 'm' or 'y'
	124	[f-j] Dash specifies "range"
	125	[f-j-] Dash escaped or at start or end means 'dash'
	126	[^f-j] Caret indicates "match any character _except_ these"
	127
	128	The following sequences (except C<\N>) work both inside and outside a character class.
	129	The first six are locale aware, all are Unicode aware. See L<perllocale>
	130	and L<perlunicode> for details.
	131
	132	\d A digit
	133	\D A nondigit
	134	\w A word character
	135	\W A non-word character
	136	\s A whitespace character
	137	\S A non-whitespace character
	138	\h A horizontal whitespace
	139	\H A non-horizontal whitespace
	140	\N A non-newline (when not followed by '{NAME}';
	141	not valid in a character class; equivalent to [^\n]; it's
	142	like '.' without /s modifier)
	143	\v A vertical whitespace
	144	\V A non-vertical whitespace
	145	\R A generic newline (?>\v|\x0D\x0A)
	146
	147	\pP Match P-named (Unicode) property
	148	\p{...} Match Unicode property with name longer than 1 character
	149	\PP Match non-P
	150	\P{...} Match lack of Unicode property with name longer than 1 char
	151	\X Match Unicode extended grapheme cluster
	152
	153	POSIX character classes and their Unicode and Perl equivalents:
	154
	155	            ASCII-           Full-
	156	 POSIX      range            range         backslash
	157	 [[:...:]]  \p{...}          \p{...}       sequence  Description
	158	
	159	 ----------------------------------------------------------------------------
	160	 alnum      PosixAlnum       XPosixAlnum             'alpha' plus 'digit'
	161	 alpha      PosixAlpha       XPosixAlpha             Alphabetic characters
	162	 ascii      ASCII                                    Any ASCII character
	163	 blank      PosixBlank       XPosixBlank   \h        Horizontal whitespace;
	164	                                                       full-range also
	165	                                                       written as
	166	                                                       \p{HorizSpace} (GNU
	167	                                                       extension)
	168	 cntrl      PosixCntrl       XPosixCntrl             Control characters
	169	 digit      PosixDigit       XPosixDigit   \d        Decimal digits
	170	 graph      PosixGraph       XPosixGraph             'alnum' plus 'punct'
	171	 lower      PosixLower       XPosixLower             Lowercase characters
	172	 print      PosixPrint       XPosixPrint             'graph' plus 'space',
	173	                                                       but not any Controls
	174	 punct      PosixPunct       XPosixPunct             Punctuation and Symbols
	175	                                                       in ASCII-range; just
	176	                                                       punct outside it
	177	 space      PosixSpace       XPosixSpace   \s        Whitespace
	178	 upper      PosixUpper       XPosixUpper             Uppercase characters
	179	 word       PosixWord        XPosixWord    \w        'alnum' + Unicode marks
	180	                                                       + connectors, like
	181	                                                       '_' (Perl extension)
	182	 xdigit     ASCII_Hex_Digit  XPosixXDigit            Hexadecimal digit,
	183	                                                       ASCII-range is
	184	                                                       [0-9A-Fa-f]
	185
	186	Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
	187	in L<perluniprops/Properties accessible through \p{} and \P{}>.
	188
	189	Within a character class:
	190
	191	 POSIX       traditional  Unicode
	192	 [:digit:]   \d           \p{Digit}
	193	 [:^digit:]  \D           \P{Digit}
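
To illustrate the difference between Unicode-aware and ASCII-only matching, here is a small sketch assuming Perl 5.14 or later for the C</a> modifier. The sub names are hypothetical, and the test character U+0663 (ARABIC-INDIC DIGIT THREE) is written as an escape to keep the source ASCII-only.

```perl
use strict;
use warnings;

# \d is Unicode-aware by default on a string holding wide characters.
sub is_digit { return $_[0] =~ /\A\d\z/ ? 1 : 0 }

# /a restricts \d to the ASCII digits 0-9.
sub is_ascii_digit { return $_[0] =~ /\A\d\z/a ? 1 : 0 }

# A POSIX class is written inside a bracketed character class.
sub is_posix_digit { return $_[0] =~ /\A[[:digit:]]\z/ ? 1 : 0 }

# \p{...} matches a Unicode property; Lu is "uppercase letter".
sub is_upper { return $_[0] =~ /\A\p{Lu}\z/ ? 1 : 0 }

my $arabic_three = "\x{0663}";
print is_digit($arabic_three), is_ascii_digit($arabic_three), "\n";
print is_posix_digit('7'), is_posix_digit('x'), "\n";
print is_upper('A'), is_upper('a'), "\n";
```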
	194
	195	=head2 ANCHORS
	196
	197	All are zero-width assertions.
	198
	199	^ Match string start (or line, if /m is used)
	200	$ Match string end (or line, if /m is used) or before newline
	201	\b{} Match boundary of type specified within the braces
	202	\B{} Match wherever \b{} doesn't match
	203	\b Match word boundary (between \w and \W)
	204	\B Match except at word boundary (between \w and \w or \W and \W)
	205	\A Match string start (regardless of /m)
	206	\Z Match string end (before optional newline)
	207	\z Match absolute string end
	208	\G Match where previous m//g left off
	209	\K Keep the stuff left of the \K, don't include it in $&
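
The sketch below contrasts C<\z> with C<\Z> and shows C<\K> and C<\G> in use. It assumes Perl 5.14 or later (C<\K> needs 5.10, C</r> needs 5.14); the sub names and the tiny token grammar are invented for illustration.

```perl
use strict;
use warnings;

sub ends_z { return $_[0] =~ /foo\z/ ? 1 : 0 }   # absolute end only
sub ends_Z { return $_[0] =~ /foo\Z/ ? 1 : 0 }   # end, or before a final newline

# \K keeps "version=" out of the matched text, so only the digits are replaced.
sub bump_version { return $_[0] =~ s/version=\K\d+/99/r }

# \G with /gc: each match must start where the previous one stopped,
# and a failed match does not reset pos().
sub tokens {
    my ($s) = @_;
    my @out;
    pos($s) = 0;
    while (1) {
        if    ($s =~ /\G\s+/gc)      { next }
        elsif ($s =~ /\G(\d+)/gc)    { push @out, "NUM:$1" }
        elsif ($s =~ /\G([a-z]+)/gc) { push @out, "WORD:$1" }
        else                         { last }
    }
    return @out;
}

print ends_z("foo\n"), ends_Z("foo\n"), "\n";
print bump_version('version=12;'), "\n";
print join(' ', tokens('ab 12 cd')), "\n";
```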
	210
	211	=head2 QUANTIFIERS
	212
	213	Quantifiers are greedy by default and match the B<longest> leftmost.
	214
	215	 Maximal Minimal Possessive Allowed range
	216	 ------- ------- ---------- -------------
	217	 {n,m}   {n,m}?  {n,m}+     Must occur at least n times
	218	                            but no more than m times
	219	 {n,}    {n,}?   {n,}+      Must occur at least n times
	220	 {n}     {n}?    {n}+       Must occur exactly n times
	221	 *       *?      *+         0 or more times (same as {0,})
	222	 +       +?      ++         1 or more times (same as {1,})
	223	 ?       ??      ?+         0 or 1 time (same as {0,1})
	224
	225	The possessive forms (new in Perl 5.10) prevent backtracking: what gets
	226	matched by a pattern with a possessive quantifier will not be backtracked
	227	into, even if that causes the whole match to fail.
	228
	229	Until Perl 5.34 there was no quantifier C<{,n}>; it was interpreted as a literal string. From Perl 5.34 onwards it means at most n times.
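
The three quantifier flavours can be compared on one input. This is a sketch with hypothetical sub names, assuming Perl 5.10 or later for the possessive forms.

```perl
use strict;
use warnings;

# Maximal (greedy): .* runs to the last closing quote.
sub greedy { return $_[0] =~ /"(.*)"/ ? $1 : undef }

# Minimal: .*? stops at the first closing quote.
sub minimal { return $_[0] =~ /"(.*?)"/ ? $1 : undef }

# Possessive: .*+ swallows the rest of the string, including the closing
# quote, and never gives anything back, so the match fails.
sub possessive_dot { return $_[0] =~ /"(.*+)"/ ? 1 : 0 }

# A possessive quantifier works when what it consumes cannot include
# the text that must follow.
sub possessive_class { return $_[0] =~ /"([^"]*+)"/ ? $1 : undef }

# Counted forms: a{2,3} takes three if it can, a{2,3}? only two.
sub counted { return $_[0] =~ /^(a{2,3})/  ? $1 : undef }
sub counted_min { return $_[0] =~ /^(a{2,3}?)/ ? $1 : undef }

my $s = 'x"bbb"ccc"ddd"';
print greedy($s), "\n";
print minimal($s), "\n";
print possessive_dot($s), "\n";
print possessive_class($s), "\n";
print counted('aaaa'), ' ', counted_min('aaaa'), "\n";
```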
	230
	231	=head2 EXTENDED CONSTRUCTS
	232
	233	(?#text) A comment
	234	(?:...) Groups subexpressions without capturing (cluster)
	235	(?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
	236	(?=...) Zero-width positive lookahead assertion
	237	(?!...) Zero-width negative lookahead assertion
	238	(?<=...) Zero-width positive lookbehind assertion
	239	(?<!...) Zero-width negative lookbehind assertion
	240	(?>...) Grab what we can, prohibit backtracking
	241	(?|...) Branch reset
	242	(?<name>...) Named capture
	243	(?'name'...) Named capture
	244	(?P<name>...) Named capture (python syntax)
	245	(?[...]) Extended bracketed character class
	246	(?{ code }) Embedded code, return value becomes $^R
	247	(??{ code }) Dynamic regex, return value used as regex
	248	(?N) Recurse into subpattern number N
	249	(?-N), (?+N) Recurse into Nth previous/next subpattern
	250	(?R), (?0) Recurse at the beginning of the whole pattern
	251	(?&name) Recurse into a named subpattern
	252	(?P>name) Recurse into a named subpattern (python syntax)
	253	(?(cond)yes|no)
	254	(?(cond)yes) Conditional expression, where "cond" can be:
	255	    (?=pat) lookahead
	256	    (?!pat) negative lookahead
	257	    (?<=pat) lookbehind
	258	    (?<!pat) negative lookbehind
	259	    (N) subpattern N has matched something
	260	    (<name>) named subpattern has matched something
	261	    ('name') named subpattern has matched something
	262	    (?{code}) code condition
	263	    (R) true if recursing
	264	    (RN) true if recursing into Nth subpattern
	265	    (R&name) true if recursing into named subpattern
	266	    (DEFINE) always false, no no-pattern allowed
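
A few of the constructs above, sketched with hypothetical helpers. This assumes Perl 5.14 or later (named captures and recursion need 5.10, C</r> needs 5.14).

```perl
use strict;
use warnings;

# Lookbehind plus lookahead: find the zero-width spots that have a digit
# before them and a multiple of three digits up to the end after them.
sub commify { return $_[0] =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/gr }

# Named captures end up in %+.
sub parse_date {
    my ($s) = @_;
    return unless $s =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/;
    my %date = %+;          # copy before another match overwrites %+
    return \%date;
}

# (?1) recurses into group 1, which allows nested parentheses.
sub balanced { return $_[0] =~ /\A(\((?:[^()]++|(?1))*\))\z/ ? 1 : 0 }

# (?(1)...) is a conditional: require ')' only if group 1 matched '('.
sub optional_parens { return $_[0] =~ /\A(\()?\w+(?(1)\))\z/ ? 1 : 0 }

print commify('1234567'), "\n";
print parse_date('on 2024-02-29')->{month}, "\n";
print balanced('(a(b)c)'), balanced('(a(b c)'), "\n";
print optional_parens('(abc)'), optional_parens('(abc'), "\n";
```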
	267
	268	=head2 VARIABLES
	269
	270	$_ Default variable for operators to use
	271
	272	$` Everything prior to the matched string
	273	$& Entire matched string
	274	$' Everything after the matched string
	275	
	276	${^PREMATCH} Everything prior to the matched string
	277	${^MATCH} Entire matched string
	278	${^POSTMATCH} Everything after the matched string
	279
	280	Note to those still using Perl 5.18 or earlier:
	281	The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
	282	within your program. Consult L<perlvar> for C<@->
	283	to see equivalent expressions that won't cause a slowdown.
	284	See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
	285	can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
	286	and C<${^POSTMATCH}>, but for them to be defined, you have to
	287	specify the C</p> (preserve) modifier on your regular expression.
	288	From Perl 5.20 onwards, the use of C<$`>, C<$&> and C<$'> makes no speed difference.
	289
	290	$1, $2 ... hold the Xth captured expr
	291	$+ Last parenthesized pattern match
	292	$^N Holds the most recently closed capture
	293	$^R Holds the result of the last (?{...}) expr
	294	@- Offsets of starts of groups. $-[0] holds start of whole match
	295	@+ Offsets of ends of groups. $+[0] holds end of whole match
	296	%+ Named capture groups
	297	%- Named capture groups, as array refs
	298
	299	Captured groups are numbered according to their I<opening> paren.
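
A sketch of how these variables line up after one successful match, assuming Perl 5.10 or later. The sub name C<match_report>, the hash keys and the sample input are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the match variables into a hash.
sub match_report {
    my ($s) = @_;
    return unless $s =~ /(?<key>\w+)=(?<val>\d+)/p;   # /p fills ${^MATCH} etc.
    my %report = (
        pre       => ${^PREMATCH},
        match     => ${^MATCH},
        post      => ${^POSTMATCH},
        first     => $1,          # groups are numbered by opening paren
        key       => $+{key},     # the same group, by name
        start     => $-[0],       # offset where the whole match starts
        end       => $+[0],       # offset just past the whole match
        val_start => $-[2],       # offset where group 2 starts
    );
    return \%report;
}

my $r = match_report('x: width=42;');
print "$_ = $r->{$_}\n" for sort keys %$r;
```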
	300
	301	=head2 FUNCTIONS
	302
	303	lc Lowercase a string
	304	lcfirst Lowercase first char of a string
	305	uc Uppercase a string
	306	ucfirst Titlecase first char of a string
	307	fc Foldcase a string
	308
	309	pos Return or set current match position
	310	quotemeta Quote metacharacters
	311	reset Reset ?pattern? status
	312	study Analyze string for optimizing matching
	313
	314	split Use a regex to split a string into parts
	315
	316	The first five of these are like the escape sequences C<\L>, C<\l>,
	317	C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; for
	318	Foldcase, see L</Foldcase>.
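
A brief sketch of several of these functions. The wrapper subs exist only to give the calls names and are not part of Perl.

```perl
use strict;
use warnings;

# split: the regex describes the separator, not the fields.
sub fields { return split /\s*,\s*/, $_[0] }

# quotemeta: the function form of \Q...\E.
sub escaped { return quotemeta $_[0] }

# pos: after each successful /g match in scalar context, pos() is the
# offset just past that match.
sub end_offsets {
    my ($s, $re) = @_;
    my @at;
    while ($s =~ /$re/g) {
        push @at, pos($s);
    }
    return @at;
}

# lc and ucfirst correspond to \L and \u.
sub name_case { return ucfirst lc $_[0] }

print join('|', fields('a , b,c')), "\n";
print escaped('a.b'), "\n";
print join(',', end_offsets('abcabc', qr/b/)), "\n";
print name_case('hELLO'), "\n";
```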
	319
	320	=head2 TERMINOLOGY
	321
	322	=head3 Titlecase
	323
	324	Unicode concept which most often is equal to uppercase, but for
	325	certain characters like the German "sharp s" there is a difference.
	326
	327	=head3 Foldcase
	328
	329	Unicode form that is useful when comparing strings regardless of case,
	330	as certain characters have complex one-to-many case mappings. Primarily a
	331	variant of lowercase.
	332
	333	=head1 AUTHOR
	334
	335	Iain Truskett. Updated by the Perl 5 Porters.
	336
	337	This document may be distributed under the same terms as Perl itself.
	338
	339	=head1 SEE ALSO
	340
	341	=over 4
	342
	343	=item *
	344
	345	L<perlretut> for a tutorial on regular expressions.
	346
	347	=item *
	348
	349	L<perlrequick> for a rapid tutorial.
	350
	351	=item *
	352
	353	L<perlre> for more details.
	354
	355	=item *
	356
	357	L<perlvar> for details on the variables.
	358
	359	=item *
	360
	361	L<perlop> for details on the operators.
	362
	363	=item *
	364
	365	L<perlfunc> for details on the functions.
	366
	367	=item *
	368
	369	L<perlfaq6> for FAQs on regular expressions.
	370
	371	=item *
	372
	373	L<perlrebackslash> for a reference on backslash sequences.
	374
	375	=item *
	376
	377	L<perlrecharclass> for a reference on character classes.
	378
	379	=item *
	380
	381	The L<re> module to alter behaviour and aid
	382	debugging.
	383
	384	=item *
	385
	386	L<perldebug/"Debugging Regular Expressions">
	387
	388	=item *
	389
	390	L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
	391	for details on regexes and internationalisation.
	392
	393	=item *
	394
	395	I<Mastering Regular Expressions> by Jeffrey Friedl
	396	(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
	397	reference on the topic.
	398
	399	=back
	400
	401	=head1 THANKS
	402
	403	David P.C. Wollmann,
	404	Richard Soderberg,
	405	Sean M. Burke,
	406	Tom Christiansen,
	407	Jim Cromie,
	408	and
	409	Jeffrey Goff
	410	for useful advice.
	411
	412	=cut