=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

  $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

  $var !~ /foo/;

C<m/pattern/msixpogcdual> searches a string for a pattern match,
applying the given options.

  m  Multiline mode - ^ and $ match internal lines
  s  match as a Single line - . matches \n
  i  case-Insensitive
  x  eXtended legibility - free whitespace and comments
  p  Preserve a copy of the matched string -
     ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
  o  compile pattern Once
  g  Global - all occurrences
  c  don't reset pos on failed matches when using /g
  a  restrict \d, \s, \w and [:posix:] to match ASCII only
  aa (two a's) also /i matches exclude ASCII/non-ASCII
  l  match according to current locale
  u  match according to Unicode rules
  d  match according to native rules unless something indicates
     Unicode

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.

C<qr/pattern/msixpodual> lets you store a regex in a variable,
or pass one around. It takes the same modifiers as C<m//> (apart from
C</g> and C</c>), and they are stored within the regex.

C<s/pattern/replacement/msixpogcedualr> substitutes matches of
'pattern' with 'replacement'. Modifiers as for C<m//>,
with two additions:

  e  Evaluate 'replacement' as an expression
  r  Return substitution and leave the original string untouched.

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (C<'>) is the delimiter.

C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used. Must be reset with reset(). The bare form
C<?pattern?>, without the leading C<m>, was removed in Perl 5.22.
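
The following is a minimal sketch of the operators above, not part of the
original reference. The variable names and sample string are made up for
illustration, and C<s///r> assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $text = "cat bat rat";

# =~ binds a match to $text; !~ is true when the match fails.
my $has_bat = ($text =~ /bat/) ? 1 : 0;
my $no_dog  = ($text !~ /dog/) ? 1 : 0;

# m//g in list context returns every capture from every match.
my @initials = $text =~ /\b(\w)at\b/g;

# qr// stores a compiled pattern together with its modifiers.
my $vowel  = qr/[aeiou]/i;
my $vowels = () = $text =~ /$vowel/g;

# s///r (Perl 5.14+) returns the changed copy; $text is untouched.
my $titled = $text =~ s/(\w+)/\u$1/gr;

# s///e evaluates the replacement as a Perl expression.
(my $lengths = $text) =~ s/(\w+)/length $1/ge;

print "$titled | $lengths | @initials | $vowels\n";
```

Copying into a new variable before C<s///e>, as in the last statement, is the
usual way to get a modified copy on Perls older than 5.14.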

=head2 SYNTAX

  \       Escapes the character immediately following it
  .       Matches any single character except a newline (unless /s is
            used)
  ^       Matches at the beginning of the string (or line, if /m is used)
  $       Matches at the end of the string (or line, if /m is used)
  *       Matches the preceding element 0 or more times
  +       Matches the preceding element 1 or more times
  ?       Matches the preceding element 0 or 1 times
  {...}   Specifies a range of occurrences for the element preceding it
  [...]   Matches any one of the characters contained within the brackets
  (...)   Groups subexpressions for capturing to $1, $2...
  (?:...) Groups subexpressions without capturing (cluster)
  |       Matches either the subexpression preceding or following it
  \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
  \1, \2, \3 ...           Matches the text from the Nth group
  \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
  \g{name}     Named backreference
  \k<name>     Named backreference
  \k'name'     Named backreference
  (?P=name)    Named backreference (python syntax)
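
As a rough illustration of grouping and backreferences (the sample strings
and variable names are invented here, not taken from the reference):

```perl
use strict;
use warnings;

# \g1 (or \1) must match the same text that group 1 captured.
my ($repeated) = "this is is a test" =~ /\b(\w+)\s+\g1\b/;

# \g{-1} and \g{-2} count backwards from the current position.
my ($mirror) = "xabbay" =~ /((\w)(\w)\g{-1}\g{-2})/;

# A named group and \k<name> pair up opening and closing quotes.
my @strings;
my $src = q{say "hi" or 'bye'};
while ($src =~ /(?<quote>["'])(.*?)\k<quote>/g) {
    push @strings, $2;
}

# (?:...) clusters the alternation without creating a capture group.
my @plurals = grep { /^(?:cat|dog)s?$/ } qw(cat dogs cow cats);
```

Relative references such as C<\g{-1}> keep working when the pattern is
embedded in a larger one, where absolute group numbers would shift.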

=head2 ESCAPE SEQUENCES

These work as in normal strings.

  \a         Alarm (beep)
  \e         Escape
  \f         Formfeed
  \n         Newline
  \r         Carriage return
  \t         Tab
  \037       Char whose ordinal is the 3 octal digits, max \777
  \o{2307}   Char whose ordinal is the octal number, unrestricted
  \x7f       Char whose ordinal is the 2 hex digits, max \xFF
  \x{263a}   Char whose ordinal is the hex number, unrestricted
  \cx        Control-x
  \N{name}   A named Unicode character or character sequence
  \N{U+263D} A Unicode character by hex ordinal

  \l  Lowercase next character
  \u  Titlecase next character
  \L  Lowercase until \E
  \U  Uppercase until \E
  \F  Foldcase until \E
  \Q  Disable pattern metacharacters until \E
  \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

  \b  An assertion, not backspace, except in a character class
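
A short sketch of a few of these escapes in use. The strings are arbitrary
examples chosen for this illustration:

```perl
use strict;
use warnings;

# \u and \L combine: titlecase the first character, lowercase the rest.
my $name  = "pERL";
my $fixed = "\u\L$name\E rocks";

# \Q...\E quotes metacharacters in an interpolated string.
my $literal = "a.b*c";
my $quoted  = ("xaXbbcx" =~ /\Q$literal\E/) ? 1 : 0;  # '.' and '*' literal
my $loose   = ("xaXbbcx" =~ /$literal/)     ? 1 : 0;  # '.' and '*' special

# \x{...} and \N{U+...} name the same code point.
my $same = ("\x{263a}" eq "\N{U+263A}") ? 1 : 0;

# \b is a word boundary in a pattern but a backspace inside [...].
my $boundary  = ("a b"  =~ /a\b/)  ? 1 : 0;
my $backspace = ("\x08" =~ /[\b]/) ? 1 : 0;
```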

=head2 CHARACTER CLASSES

  [amy]    Match 'a', 'm' or 'y'
  [f-j]    Dash specifies "range"
  [f-j-]   Dash escaped or at start or end means 'dash'
  [^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except C<\N>) work both inside and outside a
character class. The first six are locale aware, all are Unicode aware.
See L<perllocale> and L<perlunicode> for details.

  \d      A digit
  \D      A nondigit
  \w      A word character
  \W      A non-word character
  \s      A whitespace character
  \S      A non-whitespace character
  \h      A horizontal whitespace
  \H      A non-horizontal whitespace
  \N      A non-newline (when not followed by '{NAME}';
          not valid in a character class; equivalent to [^\n]; it's
          like '.' without /s modifier)
  \v      A vertical whitespace
  \V      A non-vertical whitespace
  \R      A generic newline           (?>\v|\x0D\x0A)

  \C      Match a byte (with Unicode, '.' matches a character)
          (Deprecated.)
  \pP     Match P-named (Unicode) property
  \p{...} Match Unicode property with name longer than 1 character
  \PP     Match non-P
  \P{...} Match lack of Unicode property with name longer than 1 char
  \X      Match Unicode extended grapheme cluster

POSIX character classes and their Unicode and Perl equivalents:

            ASCII-           Full-
  POSIX     range            range        backslash
  [[:...:]] \p{...}          \p{...}      sequence  Description

  -----------------------------------------------------------------------
  alnum     PosixAlnum       XPosixAlnum            Alpha plus Digit
  alpha     PosixAlpha       XPosixAlpha            Alphabetic characters
  ascii     ASCII                                   Any ASCII character
  blank     PosixBlank       XPosixBlank  \h        Horizontal whitespace;
                                                      full-range also
                                                      written as
                                                      \p{HorizSpace} (GNU
                                                      extension)
  cntrl     PosixCntrl       XPosixCntrl            Control characters
  digit     PosixDigit       XPosixDigit  \d        Decimal digits
  graph     PosixGraph       XPosixGraph            Alnum plus Punct
  lower     PosixLower       XPosixLower            Lowercase characters
  print     PosixPrint       XPosixPrint            Graph plus Space, but
                                                      not any Cntrls
  punct     PosixPunct       XPosixPunct            Punctuation and Symbols
                                                      in ASCII-range; just
                                                      punct outside it
  space     PosixSpace       XPosixSpace  [\s\cK]
            PerlSpace        XPerlSpace   \s        Perl's whitespace def'n
  upper     PosixUpper       XPosixUpper            Uppercase characters
  word      PosixWord        XPosixWord   \w        Alnum + Unicode marks +
                                                      connectors, like '_'
                                                      (Perl extension)
  xdigit    ASCII_Hex_Digit  XPosixXDigit           Hexadecimal digit,
                                                      ASCII-range is
                                                      [0-9A-Fa-f]

Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
in L<perluniprops/Properties accessible through \p{} and \P{}>

Within a character class:

  POSIX       traditional   Unicode
  [:digit:]       \d        \p{Digit}
  [:^digit:]      \D        \P{Digit}
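
To make the ASCII-range versus full-range distinction concrete, here is a
small sketch. The test character (U+0663, ARABIC-INDIC DIGIT THREE) and the
hash keys are choices made for this example; C</a> assumes Perl 5.14 or
later.

```perl
use strict;
use warnings;

my $arabic_three = "\x{663}";    # ARABIC-INDIC DIGIT THREE

my %result = (
    d_unicode   => ($arabic_three =~ /\d/)             ? 1 : 0,
    d_ascii     => ($arabic_three =~ /\d/a)            ? 1 : 0,
    posix_class => ($arabic_three =~ /[[:digit:]]/)    ? 1 : 0,
    posix_ascii => ($arabic_three =~ /\p{PosixDigit}/) ? 1 : 0,
    not_digit   => ("x"  =~ /[[:^digit:]]/)            ? 1 : 0,
    tab_is_h    => ("\t" =~ /\h/)                      ? 1 : 0,
    nl_is_v     => ("\n" =~ /\v/)                      ? 1 : 0,
    nl_is_h     => ("\n" =~ /\h/)                      ? 1 : 0,
    crlf_is_R   => ("a\r\nb" =~ /a\Rb/)                ? 1 : 0,
    upper_prop  => ("A"  =~ /\p{Lu}/)                  ? 1 : 0,
);
```

When only C<0> through C<9> should be accepted, C</a>, C<\p{PosixDigit}> or
an explicit C<[0-9]> are the forms to reach for.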

=head2 ANCHORS

All are zero-width assertions.

  ^  Match string start (or line, if /m is used)
  $  Match string end (or line, if /m is used) or before newline
  \b Match word boundary (between \w and \W)
  \B Match except at word boundary (between \w and \w or \W and \W)
  \A Match string start (regardless of /m)
  \Z Match string end (before optional newline)
  \z Match absolute string end
  \G Match where previous m//g left off
  \K Keep the stuff left of the \K, don't include it in $&
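
The differences between these anchors are easiest to see side by side. The
snippet below is an illustrative sketch with made-up inputs; the tiny
tokenizer is only meant to show C<\G> with C</gc>, not a recommended lexer
design.

```perl
use strict;
use warnings;

my $line = "foo\n";
my $dollar  = ($line =~ /foo$/)  ? 1 : 0;   # $ allows a trailing newline
my $big_z   = ($line =~ /foo\Z/) ? 1 : 0;   # so does \Z
my $small_z = ($line =~ /foo\z/) ? 1 : 0;   # \z does not

# With /m, ^ and $ match at internal line boundaries; \A never does.
my $caret_m = ("a\nb" =~ /^b$/m) ? 1 : 0;
my $big_a_m = ("a\nb" =~ /\Ab/m) ? 1 : 0;

# \G with /gc: each match continues where the previous one ended.
my $src = "12+34";
my @tokens;
while (1) {
    if    ($src =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                         { last }
}

# \K keeps everything to its left out of the replaced text.
(my $path = "/usr/local/bin") =~ s{.*/\Kbin}{sbin};
```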

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

  Maximal Minimal Possessive Allowed range
  ------- ------- ---------- -------------
  {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                             but no more than m times
  {n,}    {n,}?   {n,}+      Must occur at least n times
  {n}     {n}?    {n}+       Must occur exactly n times
  *       *?      *+         0 or more times (same as {0,})
  +       +?      ++         1 or more times (same as {1,})
  ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.

There is no quantifier C<{,n}>. That's interpreted as a literal string.
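
A brief sketch contrasting the three flavours of quantifier. The sample
strings are invented for illustration:

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Greedy .* takes as much as it can; minimal .*? as little as it can.
my ($greedy) = $html =~ m{<b>(.*)</b>};
my ($lazy)   = $html =~ m{<b>(.*?)</b>};

# a+ can give back one 'a' so the final 'a' matches; a++ cannot.
my $backtracks = ("aaa" =~ /a+a/)  ? 1 : 0;
my $possessive = ("aaa" =~ /a++a/) ? 1 : 0;

# {n,m} bounds the repeat count.
my @valid = grep { /^\d{2,4}$/ } qw(1 12 1234 12345);
```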

=head2 EXTENDED CONSTRUCTS

  (?#text)          A comment
  (?:...)           Groups subexpressions without capturing (cluster)
  (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
  (?=...)           Zero-width positive lookahead assertion
  (?!...)           Zero-width negative lookahead assertion
  (?<=...)          Zero-width positive lookbehind assertion
  (?<!...)          Zero-width negative lookbehind assertion
  (?>...)           Grab what we can, prohibit backtracking
  (?|...)           Branch reset
  (?<name>...)      Named capture
  (?'name'...)      Named capture
  (?P<name>...)     Named capture (python syntax)
  (?{ code })       Embedded code, return value becomes $^R
  (??{ code })      Dynamic regex, return value used as regex
  (?N)              Recurse into subpattern number N
  (?-N), (?+N)      Recurse into Nth previous/next subpattern
  (?R), (?0)        Recurse at the beginning of the whole pattern
  (?&name)          Recurse into a named subpattern
  (?P>name)         Recurse into a named subpattern (python syntax)
  (?(cond)yes|no)
  (?(cond)yes)      Conditional expression, where "cond" can be:
                    (?=pat)   look-ahead
                    (?!pat)   negative look-ahead
                    (?<=pat)  look-behind
                    (?<!pat)  negative look-behind
                    (N)       subpattern N has matched something
                    (<name>)  named subpattern has matched something
                    ('name')  named subpattern has matched something
                    (?{code}) code condition
                    (R)       true if recursing
                    (RN)      true if recursing into Nth subpattern
                    (R&name)  true if recursing into named subpattern
                    (DEFINE)  always false, no no-pattern allowed
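
Several of these constructs are shown together in the sketch below. All
inputs and names are hypothetical; the patterns are kept small so that each
one demonstrates a single construct (Perl 5.10 or later is assumed).

```perl
use strict;
use warnings;

# Named captures land in %+.
my %date;
if ("2024-01-15" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    %date = %+;
}

# Lookbehind and lookahead assert context without consuming it.
(my $grouped = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;

# Branch reset: both alternatives capture into $1.
my ($digit) = "b7" =~ /(?|a(\d)|b(\d))/;

# (?1) recurses into group 1 to match nested parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my @nested   = grep { $_ =~ $balanced } ("(a(b)c)", "(a(b)c", "()");

# Conditional: require ')' only if group 1 matched '('.
my @numbers = grep { /^(\()?\d+(?(1)\))$/ } ("(12)", "12", "(12", "12)");

# (?i:...) turns on case-insensitivity for one cluster only.
my $partial = ("FOObar" =~ /(?i:foo)bar/) ? 1 : 0;
my $whole   = ("FOOBAR" =~ /(?i:foo)bar/) ? 1 : 0;
```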

=head2 VARIABLES

  $_    Default variable for operators to use

  $`    Everything prior to matched string
  $&    Entire matched string
  $'    Everything after the matched string

  ${^PREMATCH}   Everything prior to matched string
  ${^MATCH}      Entire matched string
  ${^POSTMATCH}  Everything after the matched string

Note to those still using Perl 5.18 or earlier:
The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.
In Perl 5.20, the use of C<$`>, C<$&> and C<$'> makes no speed difference.

  $1, $2 ...  hold the Xth captured expr
  $+    Last parenthesized pattern match
  $^N   Holds the most recently closed capture
  $^R   Holds the result of the last (?{...}) expr
  @-    Offsets of starts of groups. $-[0] holds start of whole match
  @+    Offsets of ends of groups. $+[0] holds end of whole match
  %+    Named capture groups
  %-    Named capture groups, as array refs

Captured groups are numbered according to their I<opening> paren.
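
The sketch below reads several of these variables after one match. The input
string and the C<%seen> and C<@span> containers are made up for the example;
the values are copied inside the C<if> block because the match variables are
only meaningful after a successful match.

```perl
use strict;
use warnings;

my $s = "x: key=value!";
my (%seen, @span);

if ($s =~ /(?<k>\w+)=(?<v>\w+)/p) {
    %seen = (
        pre   => ${^PREMATCH},
        match => ${^MATCH},
        post  => ${^POSTMATCH},
        first => $1,
        last  => $+,        # last parenthesized match
        named => $+{k},     # capture group 'k'
    );
    # @- and @+ give offsets; index 0 is the whole match, N is group N.
    @span = ($-[0], $+[0], $-[2], $+[2]);
}

# The offsets reproduce $2 without touching $&.
my $value = substr($s, $span[2], $span[3] - $span[2]);
```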

=head2 FUNCTIONS

  lc          Lowercase a string
  lcfirst     Lowercase first char of a string
  uc          Uppercase a string
  ucfirst     Titlecase first char of a string
  fc          Foldcase a string

  pos         Return or set current match position
  quotemeta   Quote metacharacters
  reset       Reset m?pattern? status
  study       Analyze string for optimizing matching

  split       Use a regex to split a string into parts

The first five of these are like the escape sequences C<\L>, C<\l>,
C<\U>, C<\u>, and C<\F>. For Titlecase, see L</Titlecase>; for
Foldcase, see L</Foldcase>.
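
A few of these functions in action, as a minimal sketch with invented
inputs:

```perl
use strict;
use warnings;

# quotemeta backslash-escapes regex metacharacters (like \Q...\E).
my $safe = quotemeta("a.b");

# pos reports where the last m//g match on a string ended, and can set it.
my $str = "aXbXc";
$str =~ /X/g;
my $after_first = pos($str);
pos($str) = 0;                      # restart scanning from the beginning
my $count = () = $str =~ /X/g;

# split takes a regex; captures in it are returned as fields too.
my @fields = split /\s*,\s*/, "a , b,c";
my @kept   = split /([,;])/,  "a,b;c";

# lc and ucfirst mirror \L and \u.
my $name = ucfirst(lc("hELLO"));
```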

=head2 TERMINOLOGY

=head3 Titlecase

Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.

=head3 Foldcase

Unicode form that is useful when comparing strings regardless of case,
as certain characters have complex one-to-many case mappings. Primarily a
variant of lowercase.
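
Both terms can be observed directly. This sketch assumes Perl 5.16 or later
(for C<fc>) and enables the C<unicode_strings> feature so that the sharp s
is handled with Unicode rules; the variable names are illustrative only.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);    # fc needs Perl 5.16 or later

# U+01C6 has distinct uppercase (U+01C4) and titlecase (U+01C5) forms.
my $dz    = "\x{1c6}";
my $upper = uc($dz);
my $title = ucfirst($dz);

# Sharp s: uppercase is "SS", titlecase is "Ss".
my $sharp_upper = uc("\N{U+DF}");
my $sharp_title = ucfirst("\N{U+DF}");

# Foldcase maps sharp s to "ss", so the comparison succeeds where lc fails.
my $street      = "Stra\N{U+DF}e";
my $fold_equal  = (fc($street) eq fc("STRASSE")) ? 1 : 0;
my $lower_equal = (lc($street) eq lc("STRASSE")) ? 1 : 0;
```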

=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging Regular Expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut
	407	and
	408	Jeffrey Goff
	409	for useful advice.
	410
	411	=cut