=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, C<$_> is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

C<m/pattern/msixpogcdual> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g
    a  restrict \d, \s, \w and [:posix:] to match ASCII only
    aa (two a's) also /i matches exclude ASCII/non-ASCII
    l  match according to current locale
    u  match according to Unicode rules
    d  match according to native rules unless something indicates
       Unicode

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.

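As a minimal sketch of the options above, the following combines C</x> and C</g> in list context and uses an alternate delimiter. The sample string and the variable names are illustrative assumptions, not part of this reference.

```perl
use strict;
use warnings;

my $text = "alpha=1, beta=22, gamma=333";

# /x allows whitespace and comments inside the pattern;
# /g in list context returns the captures from every match.
my %pairs = $text =~ /
    (\w+)   # key
    =
    (\d+)   # value
/xg;

# A delimiter other than "/" requires the leading "m".
my $has_beta = $text =~ m{beta=\d+} ? 1 : 0;

print join(", ", map { "$_=$pairs{$_}" } sort keys %pairs), "\n";
print "has beta: $has_beta\n";
```
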
C<qr/pattern/msixpodual> lets you store a regex in a variable,
or pass one around. Modifiers are as for C<m//> and are stored
within the regex.

C<s/pattern/replacement/msixpogcedualr> substitutes matches of
'pattern' with 'replacement'. Modifiers are as for C<m//>,
with two additions:

    e  Evaluate 'replacement' as an expression
    r  Return substitution and leave the original string untouched.

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (C<'>) is the delimiter.

C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used. It must be reset with C<reset()>.

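The following sketch shows C<qr//> together with the C</r> and C</e> modifiers of C<s///>. It assumes Perl 5.14 or later (needed for C</r>); the strings and variable names are made up for illustration.

```perl
use v5.14;      # s///r needs Perl 5.14 or later; also enables strict
use warnings;

# qr// compiles a pattern once so it can be stored and reused.
my $word = qr/(\w+)/;

my $original = "hello world";

# /r returns the modified copy and leaves $original untouched.
my $shouted = $original =~ s/$word/\U$1/gr;

# /e evaluates the replacement as a Perl expression.
(my $doubled = "3 apples, 4 pears") =~ s/(\d+)/$1 * 2/ge;

print "$original\n";
print "$shouted\n";
print "$doubled\n";
```
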
=head2 SYNTAX

 \       Escapes the character immediately following it
 .       Matches any single character except a newline (unless /s is
           used)
 ^       Matches at the beginning of the string (or line, if /m is used)
 $       Matches at the end of the string (or line, if /m is used)
 *       Matches the preceding element 0 or more times
 +       Matches the preceding element 1 or more times
 ?       Matches the preceding element 0 or 1 times
 {...}   Specifies a range of occurrences for the element preceding it
 [...]   Matches any one of the characters contained within the brackets
 (...)   Groups subexpressions for capturing to $1, $2...
 (?:...) Groups subexpressions without capturing (cluster)
 |       Matches either the subexpression preceding or following it
 \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
 \1, \2, \3 ...           Matches the text from the Nth group
 \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
 \g{name}     Named backreference
 \k<name>     Named backreference
 \k'name'     Named backreference
 (?P=name)    Named backreference (Python syntax)

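A short sketch of the relative and named backreference forms listed above. The patterns and inputs are illustrative assumptions; the group name C<q> is arbitrary.

```perl
use strict;
use warnings;

# \g{-1} refers to the group that closed most recently before it.
my $repeat = qr/\b(\w+)\s+\g{-1}\b/;

# \k<q> must match the same quote character that group "q" captured.
my $quoted = qr/(?<q>['"])(.*?)\k<q>/;

my ($dup) = "it was the the best" =~ $repeat;
my ($quote_char, $inner) = q{say "hi there" now} =~ $quoted;

print "repeated word: $dup\n";
print "quoted text:   $inner\n";
```
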
=head2 ESCAPE SEQUENCES

These work as in normal strings.

 \a         Alarm (beep)
 \e         Escape
 \f         Formfeed
 \n         Newline
 \r         Carriage return
 \t         Tab
 \037       Char whose ordinal is the 3 octal digits, max \777
 \o{2307}   Char whose ordinal is the octal number, unrestricted
 \x7f       Char whose ordinal is the 2 hex digits, max \xFF
 \x{263a}   Char whose ordinal is the hex number, unrestricted
 \cx        Control-x
 \N{name}   A named Unicode character or character sequence
 \N{U+263D} A Unicode character by hex ordinal

 \l  Lowercase next character
 \u  Titlecase next character
 \L  Lowercase until \E
 \U  Uppercase until \E
 \F  Foldcase until \E
 \Q  Disable pattern metacharacters until \E
 \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

 \b  An assertion, not backspace, except in a character class

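To illustrate a few of these sequences, here is a small sketch of C<\Q...\E>, the case-modifying escapes, and C<\b> used as an assertion. All strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my $literal = "a.b*c";

# \Q...\E makes the interpolated metacharacters match literally.
my $quoted_hit   = "xxa.b*cxx"  =~ /\Q$literal\E/ ? 1 : 0;
my $quoted_miss  = "xxaXbbbcxx" =~ /\Q$literal\E/ ? 1 : 0;

# Without \Q, "." and "*" keep their special meaning.
my $unquoted_hit = "xxaXbbbcxx" =~ /$literal/ ? 1 : 0;

# Case escapes work the same way in double-quoted strings.
my $name  = "perl";
my $title = "\u$name";      # first character titlecased
my $upper = "\U$name\E!";   # uppercased up to \E

# In a pattern, \b is a word-boundary assertion, not a backspace.
my $whole_word = "concatenate" =~ /\bcat\b/ ? 1 : 0;

print "$quoted_hit $quoted_miss $unquoted_hit $title $upper $whole_word\n";
```
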
=head2 CHARACTER CLASSES

 [amy]    Match 'a', 'm' or 'y'
 [f-j]    Dash specifies "range"
 [f-j-]   Dash escaped or at start or end means 'dash'
 [^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except C<\N>) work inside or outside a character
class. The first six are locale aware; all are Unicode aware. See
L<perllocale> and L<perlunicode> for details.

 \d      A digit
 \D      A nondigit
 \w      A word character
 \W      A non-word character
 \s      A whitespace character
 \S      A non-whitespace character
 \h      A horizontal whitespace character
 \H      A non-horizontal-whitespace character
 \N      A non-newline (when not followed by '{NAME}';
         not valid in a character class; equivalent to [^\n]; it's
         like '.' without /s modifier)
 \v      A vertical whitespace character
 \V      A non-vertical-whitespace character
 \R      A generic newline           (?>\v|\x0D\x0A)

 \C      Match a byte (with Unicode, '.' matches a character)
 \pP     Match P-named (Unicode) property
 \p{...} Match Unicode property with name longer than 1 character
 \PP     Match non-P
 \P{...} Match lack of Unicode property with name longer than 1 char
 \X      Match Unicode extended grapheme cluster

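The sketch below contrasts Unicode-aware C<\d> with its C</a> (ASCII-only) form and exercises C<\p{...}>, C<\R> and C<\h>. It assumes Perl 5.14 or later for C</a>; the test characters are chosen only for illustration.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

# \d is Unicode aware; /a restricts it to ASCII 0-9.
my $unicode_digit = $arabic_three =~ /^\d$/  ? 1 : 0;
my $ascii_digit   = $arabic_three =~ /^\d$/a ? 1 : 0;

# \p{Lu} matches any uppercase letter (here GREEK CAPITAL LETTER DELTA).
my $is_upper = "\x{0394}" =~ /^\p{Lu}$/ ? 1 : 0;

# \R treats "\r\n" as a single line break.
my @lines = split /\R/, "one\ntwo\r\nthree";

# \h matches spaces and tabs but not newlines.
my $tabbed = "a \tb" =~ /^a\h+b$/ ? 1 : 0;

print "$unicode_digit $ascii_digit $is_upper ", scalar(@lines), " $tabbed\n";
```
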
POSIX character classes and their Unicode and Perl equivalents:

            ASCII-           Full-
   POSIX    range            range      backslash
 [[:...:]]  \p{...}          \p{...}    sequence  Description

 -----------------------------------------------------------------------
 alnum      PosixAlnum       XPosixAlnum          Alpha plus Digit
 alpha      PosixAlpha       XPosixAlpha          Alphabetic characters
 ascii      ASCII                                 Any ASCII character
 blank      PosixBlank       XPosixBlank   \h     Horizontal whitespace;
                                                    full-range also
                                                    written as
                                                    \p{HorizSpace} (GNU
                                                    extension)
 cntrl      PosixCntrl       XPosixCntrl          Control characters
 digit      PosixDigit       XPosixDigit   \d     Decimal digits
 graph      PosixGraph       XPosixGraph          Alnum plus Punct
 lower      PosixLower       XPosixLower          Lowercase characters
 print      PosixPrint       XPosixPrint          Graph plus Space, but
                                                    not any Cntrls
 punct      PosixPunct       XPosixPunct          Punctuation and Symbols
                                                    in ASCII-range; just
                                                    punct outside it
 space      PosixSpace       XPosixSpace          [\s\cK]
            PerlSpace        XPerlSpace    \s     Perl's whitespace def'n
 upper      PosixUpper       XPosixUpper          Uppercase characters
 word       PosixWord        XPosixWord    \w     Alnum + Unicode marks +
                                                    connectors, like '_'
                                                    (Perl extension)
 xdigit     ASCII_Hex_Digit  XPosixXDigit         Hexadecimal digit,
                                                    ASCII-range is
                                                    [0-9A-Fa-f]

Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all are
listed in L<perluniprops/Properties accessible through \p{} and \P{}>.

Within a character class:

 POSIX       traditional  Unicode
 [:digit:]   \d           \p{Digit}
 [:^digit:]  \D           \P{Digit}

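As a rough illustration of the table, this sketch uses bracketed POSIX classes and compares the ASCII-range and full-range properties for digits. The inputs are arbitrary examples.

```perl
use strict;
use warnings;

# POSIX classes are written inside a bracketed character class.
my @runs = "ab12cd" =~ /[[:alpha:]]+/g;

# [:^digit:] is the negated form; here it strips every nondigit.
(my $digits = "a1b2c3") =~ s/[[:^digit:]]//g;

# PosixDigit covers ASCII only; XPosixDigit covers the full range.
my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $in_ascii_range = $arabic_three =~ /\p{PosixDigit}/  ? 1 : 0;
my $in_full_range  = $arabic_three =~ /\p{XPosixDigit}/ ? 1 : 0;

print "@runs $digits $in_ascii_range $in_full_range\n";
```
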
=head2 ANCHORS

All are zero-width assertions.

 ^  Match string start (or line, if /m is used)
 $  Match string end (or line, if /m is used) or before newline
 \b Match word boundary (between \w and \W)
 \B Match except at word boundary (between \w and \w or \W and \W)
 \A Match string start (regardless of /m)
 \Z Match string end (before optional newline)
 \z Match absolute string end
 \G Match where previous m//g left off
 \K Keep the stuff left of the \K, don't include it in $&

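The differences between C<$>, C<\Z> and C<\z>, and the use of C<\K> and C<\G>, can be sketched as follows. The tiny tokenizer is a hypothetical example of the C<\G> with C</gc> idiom, not a complete lexer.

```perl
use strict;
use warnings;

my $line = "foo\n";

# $ and \Z may match before a trailing newline; \z may not.
my $dollar  = $line =~ /foo$/  ? 1 : 0;
my $upper_z = $line =~ /foo\Z/ ? 1 : 0;
my $lower_z = $line =~ /foo\z/ ? 1 : 0;

# \K drops everything matched so far from what gets replaced.
(my $path = "/usr/local/bin") =~ s{.*/\Kbin}{sbin};

# \G anchors each attempt where the previous /g match ended;
# /c keeps pos() in place when an attempt fails.
my $input = "12+34";
my @tokens;
pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                           { last }
}

print "$dollar $upper_z $lower_z $path @tokens\n";
```
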
=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

 Maximal Minimal Possessive Allowed range
 ------- ------- ---------- -------------
 {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                            but no more than m times
 {n,}    {n,}?   {n,}+      Must occur at least n times
 {n}     {n}?    {n}+       Must occur exactly n times
 *       *?      *+         0 or more times (same as {0,})
 +       +?      ++         1 or more times (same as {1,})
 ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.

There is no quantifier C<{,n}>. That's interpreted as a literal string.

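A minimal sketch of the three quantifier flavours; the sample inputs are illustrative only.

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Maximal (greedy): runs to the last closing tag.
my ($greedy) = $html =~ m{<b>(.*)</b>};

# Minimal: stops at the first closing tag.
my ($lazy) = $html =~ m{<b>(.*?)</b>};

# Possessive: a++ takes every "a" and never gives one back,
# so the final "a" in the pattern has nothing left to match.
my $possessive = "aaa" =~ /a++a/ ? 1 : 0;
my $backtracks = "aaa" =~ /a+a/  ? 1 : 0;

print "$greedy | $lazy | $possessive | $backtracks\n";
```
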
=head2 EXTENDED CONSTRUCTS

 (?#text)          A comment
 (?:...)           Groups subexpressions without capturing (cluster)
 (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
 (?=...)           Zero-width positive lookahead assertion
 (?!...)           Zero-width negative lookahead assertion
 (?<=...)          Zero-width positive lookbehind assertion
 (?<!...)          Zero-width negative lookbehind assertion
 (?>...)           Grab what we can, prohibit backtracking
 (?|...)           Branch reset
 (?<name>...)      Named capture
 (?'name'...)      Named capture
 (?P<name>...)     Named capture (Python syntax)
 (?{ code })       Embedded code, return value becomes $^R
 (??{ code })      Dynamic regex, return value used as regex
 (?N)              Recurse into subpattern number N
 (?-N), (?+N)      Recurse into Nth previous/next subpattern
 (?R), (?0)        Recurse at the beginning of the whole pattern
 (?&name)          Recurse into a named subpattern
 (?P>name)         Recurse into a named subpattern (Python syntax)
 (?(cond)yes|no)
 (?(cond)yes)      Conditional expression, where "cond" can be:
                   (?=pat)   look-ahead
                   (?!pat)   negative look-ahead
                   (?<=pat)  look-behind
                   (?<!pat)  negative look-behind
                   (N)       subpattern N has matched something
                   (<name>)  named subpattern has matched something
                   ('name')  named subpattern has matched something
                   (?{code}) code condition
                   (R)       true if recursing
                   (RN)      true if recursing into Nth subpattern
                   (R&name)  true if recursing into named subpattern
                   (DEFINE)  always false, no no-pattern allowed

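Several of these constructs are easier to follow in use. The sketch below, assuming Perl 5.10 or later, exercises lookaround, named capture, branch reset, recursion and a conditional; every pattern and input is a made-up example.

```perl
use strict;
use warnings;

# Lookbehind and lookahead: insert thousands separators.
(my $number = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;

# Named capture: results are available through %+.
"2024-01-15" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/
    or die "date did not match";
my ($year, $month) = ($+{year}, $+{month});

# Branch reset: both alternatives capture into $1.
my ($size) = "12em" =~ /(?|(\d+)px|(\d+)em)/;

# Recursion: (?1) re-enters group 1 to match nested parentheses.
my $balanced = qr/\A(\((?:[^()]++|(?1))*\))\z/;
my $nested_ok  = "(a(b)c)" =~ $balanced ? 1 : 0;
my $nested_bad = "(a(b)c"  =~ $balanced ? 1 : 0;

# Conditional: require ")" only if group 1 matched "(".
my $cond = qr/\A(\()?\d+(?(1)\))\z/;
my @cond_results = map { $_ =~ $cond ? 1 : 0 } "(42)", "42", "(42", "42)";

print "$number $year-$month $size $nested_ok $nested_bad @cond_results\n";
```
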
=head2 VARIABLES

 $_    Default variable for operators to use

 $`    Everything prior to matched string
 $&    Entire matched string
 $'    Everything after matched string

 ${^PREMATCH}   Everything prior to matched string
 ${^MATCH}      Entire matched string
 ${^POSTMATCH}  Everything after matched string

Note to those still using Perl 5.18 or earlier:
The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.
In Perl 5.20 and later, the use of C<$`>, C<$&> and C<$'> makes no
speed difference.

 $1, $2 ...  Hold the Nth captured expr
 $+    Last parenthesized pattern match
 $^N   Holds the most recently closed capture
 $^R   Holds the result of the last (?{...}) expr
 @-    Offsets of starts of groups. $-[0] holds start of whole match
 @+    Offsets of ends of groups. $+[0] holds end of whole match
 %+    Named capture groups
 %-    Named capture groups, as array refs

Captured groups are numbered according to their I<opening> paren.

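As a sketch of how the offset arrays relate to the match variables, the following rebuilds the prematch, match and postmatch strings from C<@-> and C<@+>, and compares one of them with its C</p> counterpart. It assumes Perl 5.10 or later; the input string and names are illustrative.

```perl
use strict;
use warnings;

my $s = "say: key=value!";
$s =~ /(?<k>\w+)=(?<v>\w+)/p or die "no match";

# Equivalents of $`, $& and $' built from the offset arrays.
my $pre   = substr($s, 0, $-[0]);
my $match = substr($s, $-[0], $+[0] - $-[0]);
my $post  = substr($s, $+[0]);

# With /p, the same text is available in ${^PREMATCH} and friends.
my $pre_p = ${^PREMATCH};

# Group 2 starts at $-[2] and ends at $+[2].
my ($start2, $end2) = ($-[2], $+[2]);

# %+ maps group names to captured text; copy it before the next match.
my %named = %+;

print "[$pre][$match][$post] $start2-$end2 $named{k}=$named{v}\n";
```
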
=head2 FUNCTIONS

 lc          Lowercase a string
 lcfirst     Lowercase first char of a string
 uc          Uppercase a string
 ucfirst     Titlecase first char of a string
 fc          Foldcase a string

 pos         Return or set current match position
 quotemeta   Quote metacharacters
 reset       Reset ?pattern? status
 study       Analyze string for optimizing matching

 split       Use a regex to split a string into parts

The first five of these are like the escape sequences C<\L>, C<\l>,
C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; for
Foldcase, see L</Foldcase>.

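A brief sketch of some of these functions in use. It assumes Perl 5.16 or later, because C<fc> is only available with the C<fc> feature; the inputs are arbitrary.

```perl
use v5.16;      # enables the "fc" feature (and strict)
use warnings;

# Case functions mirror \L, \u and \F.
my $title = ucfirst lc "PERL";
my $same  = fc("Hello") eq fc("HELLO") ? 1 : 0;

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";

# quotemeta backslashes every ASCII non-word character.
my $quoted = quotemeta "1+1=2";

# After a successful scalar /g match, pos() is the offset just past it.
my $str = "aXbXc";
$str =~ /X/g;
my $after_first = pos $str;

print "$title $same @fields $quoted $after_first\n";
```
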
=head2 TERMINOLOGY

=head3 Titlecase

Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.

=head3 Foldcase

Unicode form that is useful when comparing strings regardless of case,
as certain characters have complex one-to-many case mappings. Primarily a
variant of lowercase.

=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging Regular Expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut