perl5.git.perl.org Git - perl5.git/blame_incremental

	1	=head1 NAME
	2	X<regular expression> X<regex> X<regexp>
	3
	4	perlre - Perl regular expressions
	5
	6	=head1 DESCRIPTION
	7
	8	This page describes the syntax of regular expressions in Perl.
	9
	10	If you haven't used regular expressions before, a quick-start
	11	introduction is available in L<perlrequick>, and a longer tutorial
	12	introduction is available in L<perlretut>.
	13
	14	For reference on how regular expressions are used in matching
	15	operations, plus various examples of the same, see discussions of
	16	C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like
	17	Operators">.
	18
	19
	20	=head2 Modifiers
	21
	22	Matching operations can have various modifiers. Modifiers
	23	that relate to the interpretation of the regular expression inside
	24	are listed below. Modifiers that alter the way a regular expression
	25	is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and
	26	L<perlop/"Gory details of parsing quoted constructs">.
	27
	28	=over 4
	29
	30	=item m
	31	X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>
	32
	33	Treat string as multiple lines. That is, change "^" and "$" from matching
	34	the start or end of the string to matching the start or end of any
	35	line anywhere within the string.
	36
	37	=item s
	38	X</s> X<regex, single-line> X<regexp, single-line>
	39	X<regular expression, single-line>
	40
	41	Treat string as single line. That is, change "." to match any character
	42	whatsoever, even a newline, which normally it would not match.
	43
	44	Used together, as C</ms>, they let the "." match any character whatsoever,
	45	while still allowing "^" and "$" to match, respectively, just after
	46	and just before newlines within the string.
	47
	48	=item i
	49	X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
	50	X<regular expression, case-insensitive>
	51
	52	Do case-insensitive pattern matching.
	53
	54	If C<use locale> is in effect, the case map is taken from the current
	55	locale. See L<perllocale>.
	56
	57	=item x
	58	X</x>
	59
	60	Extend your pattern's legibility by permitting whitespace and comments.
	61
	62	=item p
	63	X</p> X<regex, preserve> X<regexp, preserve>
	64
	65	Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
	66	${^POSTMATCH} are available for use after matching.
	67
	68	=item g and c
	69	X</g> X</c>
	70
	71	Global matching, and keep the Current position after failed matching.
	72	Unlike i, m, s and x, these two flags affect the way the regex is used
	73	rather than the regex itself. See
	74	L<perlretut/"Using regular expressions in Perl"> for further explanation
	75	of the g and c modifiers.
	76
	77	=back
	78
	79	These are usually written as "the C</x> modifier", even though the delimiter
	80	in question might not really be a slash. Any of these
	81	modifiers may also be embedded within the regular expression itself using
	82	the C<(?...)> construct. See below.
	83
	84	The C</x> modifier itself needs a little more explanation. It tells
	85	the regular expression parser to ignore most whitespace that is neither
	86	backslashed nor within a character class. You can use this to break up
	87	your regular expression into (slightly) more readable parts. The C<#>
	88	character is also treated as a metacharacter introducing a comment,
	89	just as in ordinary Perl code. This also means that if you want real
	90	whitespace or C<#> characters in the pattern (outside a character
	91	class, where they are unaffected by C</x>), then you'll either have to
	92	escape them (using backslashes or C<\Q...\E>) or encode them using octal,
	93	hex, or C<\N{}> escapes. Taken together, these features go a long way towards
	94	making Perl's regular expressions more readable. Note that you have to
	95	be careful not to include the pattern delimiter in the comment--perl has
	96	no way of knowing you did not intend to close the pattern early. See
	97	the C-comment deletion code in L<perlop>. Also note that anything inside
	98	a C<\Q...\E> stays unaffected by C</x>. And note that C</x> doesn't affect
	99	space interpretation within a single multi-character construct. For
	100	example in C<\x{...}>, regardless of the C</x> modifier, there can be no
	101	spaces. Same for a L<quantifier|/Quantifiers> such as C<{3}> or
	102	C<{5,}>. Similarly, C<(?:...)> can't have a space between the C<?> and C<:>,
	103	but can between the C<(> and C<?>. Within any delimiters for such a
	104	construct, allowed spaces are not affected by C</x>, and depend on the
	105	construct. For example, C<\x{...}> can't have spaces because hexadecimal
	106	numbers don't have spaces in them. But, Unicode properties can have spaces, so
	107	in C<\p{...}> there can be spaces that follow the Unicode rules, for which see
	108	L<perluniprops/Properties accessible through \p{} and \P{}>.
	109	X</x>
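
The rules above might be exercised as in the following sketch. The pattern and
the input string are invented for illustration. Note that none of the comments
contains the pattern delimiter, and that the literal space and C<#> are
protected by a character class and a backslash respectively.

```perl
use strict;
use warnings;

my $re = qr/
    (\d{4}) - (\d\d) - (\d\d)   # a date; the spaces around the hyphens are ignored
    [ ]                         # a literal space, protected by a character class
    \#                          # a literal hash sign, protected by a backslash
    (\d+)                       # the number that follows the hash sign
/x;

# Hypothetical input string, made up for this example.
my @parts = "2011-05-14 #42" =~ $re;

print "@parts\n";               # 2011 05 14 42
```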
	110
	111	=head2 Regular Expressions
	112
	113	=head3 Metacharacters
	114
	115	The patterns used in Perl pattern matching evolved from those supplied in
	116	the Version 8 regex routines. (The routines are derived
	117	(distantly) from Henry Spencer's freely redistributable reimplementation
	118	of the V8 routines.) See L<Version 8 Regular Expressions> for
	119	details.
	120
	121	In particular the following metacharacters have their standard I<egrep>-ish
	122	meanings:
	123	X<metacharacter>
	124	X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>
	125
	126
	127	\ Quote the next metacharacter
	128	^ Match the beginning of the line
	129	. Match any character (except newline)
	130	$ Match the end of the line (or before newline at the end)
	131	| Alternation
	132	() Grouping
	133	[] Bracketed Character class
	134
	135	By default, the "^" character is guaranteed to match only the
	136	beginning of the string, the "$" character only the end (or before the
	137	newline at the end), and Perl does certain optimizations with the
	138	assumption that the string contains only one line. Embedded newlines
	139	will not be matched by "^" or "$". You may, however, wish to treat a
	140	string as a multi-line buffer, such that the "^" will match after any
	141	newline within the string (except if the newline is the last character in
	142	the string), and "$" will match before any newline. At the
	143	cost of a little more overhead, you can do this by using the /m modifier
	144	on the pattern match operator. (Older programs did this by setting C<$*>,
	145	but support for that variable was removed in perl 5.9.)
	146	X<^> X<$> X</m>
	147
	148	To simplify multi-line substitutions, the "." character never matches a
	149	newline unless you use the C</s> modifier, which in effect tells Perl to pretend
	150	the string is a single line--even if it isn't.
	151	X<.> X</s>
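
A small sketch of the default and C</m> behaviour of "^" and "$" follows. The
two-line buffer is invented for the example; the C<() = ...> assignment is
only a way to count the matches.

```perl
use strict;
use warnings;

# Two lines, the second one ending in a newline.
my $buffer = "one\ntwo\n";

# By default "^" matches only at the very beginning of the string.
my $default_count = () = $buffer =~ /^/g;

# With /m it also matches after each embedded newline, but not after
# a newline that is the last character of the string.
my $multiline_count = () = $buffer =~ /^/mg;

# "$" matches at the end, or just before a trailing newline.
my $dollar_before_newline = $buffer =~ /two$/ ? 1 : 0;

print "$default_count $multiline_count $dollar_before_newline\n";   # 1 2 1
```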
	152
	153	=head3 Quantifiers
	154
	155	The following standard quantifiers are recognized:
	156	X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
	157
	158	* Match 0 or more times
	159	+ Match 1 or more times
	160	? Match 1 or 0 times
	161	{n} Match exactly n times
	162	{n,} Match at least n times
	163	{n,m} Match at least n but not more than m times
	164
	165	(If a curly bracket occurs in any other context, it is treated
	166	as a regular character. In particular, the lower bound
	167	is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+"
	168	quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
	169	to non-negative integral values less than a preset limit defined when perl is built.
	170	This is usually 32766 on the most common platforms. The actual limit can
	171	be seen in the error message generated by code such as this:
	172
	173	$_ **= $_ , / {$_} / for 2 .. 42;
	174
	175	By default, a quantified subpattern is "greedy", that is, it will match as
	176	many times as possible (given a particular starting location) while still
	177	allowing the rest of the pattern to match. If you want it to match the
	178	minimum number of times possible, follow the quantifier with a "?". Note
	179	that the meanings don't change, just the "greediness":
	180	X<metacharacter> X<greedy> X<greediness>
	181	X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
	182
	183	*? Match 0 or more times, not greedily
	184	+? Match 1 or more times, not greedily
	185	?? Match 0 or 1 time, not greedily
	186	{n}? Match exactly n times, not greedily
	187	{n,}? Match at least n times, not greedily
	188	{n,m}? Match at least n but not more than m times, not greedily
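
To make the difference concrete, here is a minimal sketch comparing a greedy
and a non-greedy quantifier on the same, invented, input string:

```perl
use strict;
use warnings;

# Sample markup invented for this illustration.
my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: ".+" runs to the last ">" that still lets the pattern match.
my ($greedy) = $html =~ /<(.+)>/;

# Non-greedy: ".+?" stops at the first ">" that lets the pattern match.
my ($minimal) = $html =~ /<(.+?)>/;

print "$greedy\n";      # b>bold</b> and <i>italic</i
print "$minimal\n";     # b
```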
	189
	190	By default, when a quantified subpattern does not allow the rest of the
	191	overall pattern to match, Perl will backtrack. However, this behaviour is
	192	sometimes undesirable. Thus Perl provides the "possessive" quantifier form
	193	as well.
	194
	195	*+ Match 0 or more times and give nothing back
	196	++ Match 1 or more times and give nothing back
	197	?+ Match 0 or 1 time and give nothing back
	198	{n}+ Match exactly n times and give nothing back (redundant)
	199	{n,}+ Match at least n times and give nothing back
	200	{n,m}+ Match at least n but not more than m times and give nothing back
	201
	202	For instance,
	203
	204	'aaaa' =~ /a++a/
	205
	206	will never match, as the C<a++> will gobble up all the C<a>'s in the
	207	string and won't leave any for the remaining part of the pattern. This
	208	feature can be extremely useful to give perl hints about where it
	209	shouldn't backtrack. For instance, the typical "match a double-quoted
	210	string" problem can be most efficiently performed when written as:
	211
	212	/"(?:[^"\\]++|\\.)*+"/
	213
	214	as we know that if the final quote does not match, backtracking will not
	215	help. See the independent subexpression C<< (?>...) >> for more details;
	216	possessive quantifiers are just syntactic sugar for that construct. For
	217	instance the above example could also be written as follows:
	218
	219	/"(?>(?:(?>[^"\\]+)|\\.)*)"/
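
The following sketch, with invented sample strings, contrasts an ordinary
quantifier with its possessive form and with the equivalent independent
subexpression, and then applies the double-quoted-string pattern to a string
containing escaped quotes:

```perl
use strict;
use warnings;

# An ordinary quantifier backtracks and gives one "a" back.
my $ordinary   = 'aaaa' =~ /a+a/     ? 1 : 0;

# The possessive form keeps everything, so the final "a" cannot match.
my $possessive = 'aaaa' =~ /a++a/    ? 1 : 0;

# The same behaviour written as an independent subexpression.
my $atomic     = 'aaaa' =~ /(?>a+)a/ ? 1 : 0;

# The double-quoted-string pattern applied to a hypothetical input.
my $dq_string = qr/"(?:[^"\\]++|\\.)*+"/;
my ($found) = 'say "a \"quoted\" word" now' =~ /($dq_string)/;

print "$ordinary $possessive $atomic\n";    # 1 0 0
print "$found\n";                           # "a \"quoted\" word"
```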
	220
	221	=head3 Escape sequences
	222
	223	Because patterns are processed as double quoted strings, the following
	224	also work:
	225
	226	\t tab (HT, TAB)
	227	\n newline (LF, NL)
	228	\r return (CR)
	229	\f form feed (FF)
	230	\a alarm (bell) (BEL)
	231	\e escape (think troff) (ESC)
	232	\cK control char (example: VT)
	233	\x{}, \x00 character whose ordinal is the given hexadecimal number
	234	\N{name} named Unicode character or character sequence
	235	\N{U+263D} Unicode character (example: FIRST QUARTER MOON)
	236	\o{}, \000 character whose ordinal is the given octal number
	237	\l lowercase next char (think vi)
	238	\u uppercase next char (think vi)
	239	\L lowercase till \E (think vi)
	240	\U uppercase till \E (think vi)
	241	\Q quote (disable) pattern metacharacters till \E
	242	\E end either case modification or quoted section, think vi
	243
	244	Details are in L<perlop/Quote and Quote-like Operators>.
	245
	246	=head3 Character Classes and other Special Escapes
	247
	248	In addition, Perl defines the following:
	249	X<\g> X<\k> X<\K> X<backreference>
	250
	251	    Sequence   Note  Description
	252	    [...]      [1]   Match a character according to the rules of the
	253	                       bracketed character class defined by the "...".
	254	                       Example: [a-z] matches "a" or "b" or "c" ... or "z"
	255	    [[:...:]]  [2]   Match a character according to the rules of the POSIX
	256	                       character class "..." within the outer bracketed
	257	                       character class. Example: [[:upper:]] matches any
	258	                       uppercase character.
	259	    \w         [3]   Match a "word" character (alphanumeric plus "_")
	260	    \W         [3]   Match a non-"word" character
	261	    \s         [3]   Match a whitespace character
	262	    \S         [3]   Match a non-whitespace character
	263	    \d         [3]   Match a decimal digit character
	264	    \D         [3]   Match a non-digit character
	265	    \pP        [3]   Match P, named property. Use \p{Prop} for longer names
	266	    \PP        [3]   Match non-P
	267	    \X         [4]   Match Unicode "eXtended grapheme cluster"
	268	    \C               Match a single C-language char (octet) even if that is
	269	                       part of a larger UTF-8 character. Thus it breaks up
	270	                       characters into their UTF-8 bytes, so you may end up
	271	                       with malformed pieces of UTF-8. Unsupported in
	272	                       lookbehind.
	273	    \1         [5]   Backreference to a specific capture group or buffer.
	274	                       '1' may actually be any positive integer.
	275	    \g1        [5]   Backreference to a specific or previous group,
	276	    \g{-1}     [5]   The number may be negative indicating a relative
	277	                       previous group and may optionally be wrapped in
	278	                       curly brackets for safer parsing.
	279	    \g{name}   [5]   Named backreference
	280	    \k<name>   [5]   Named backreference
	281	    \K         [6]   Keep the stuff left of the \K, don't include it in $&
	282	    \N         [7]   Any character but \n (experimental). Not affected by
	283	                       /s modifier
	284	    \v         [3]   Vertical whitespace
	285	    \V         [3]   Not vertical whitespace
	286	    \h         [3]   Horizontal whitespace
	287	    \H         [3]   Not horizontal whitespace
	288	    \R         [4]   Linebreak
	289
	290	=over 4
	291
	292	=item [1]
	293
	294	See L<perlrecharclass/Bracketed Character Classes> for details.
	295
	296	=item [2]
	297
	298	See L<perlrecharclass/POSIX Character Classes> for details.
	299
	300	=item [3]
	301
	302	See L<perlrecharclass/Backslash sequences> for details.
	303
	304	=item [4]
	305
	306	See L<perlrebackslash/Misc> for details.
	307
	308	=item [5]
	309
	310	See L</Capture groups> below for details.
	311
	312	=item [6]
	313
	314	See L</Extended Patterns> below for details.
	315
	316	=item [7]
	317
	318	Note that C<\N> has two meanings. When of the form C<\N{NAME}>, it matches the
	319	character or character sequence whose name is C<NAME>; and similarly
	320	when of the form C<\N{U+I<hex>}>, it matches the character whose Unicode
	321	code point is I<hex>. Otherwise it matches any character but C<\n>.
	322
	323	=back
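
A few of the sequences from the table might be used as in this sketch. All
input strings are invented; C<\K>, C<\h> and C<\R> assume perl 5.10 or later.

```perl
use strict;
use warnings;

# \w matches alphanumerics plus "_", so "-" splits the second word.
my @words = "foo_1 bar-2" =~ /(\w+)/g;

# \K keeps what is to its left out of the matched text, so only the
# digits are replaced.
(my $price = "price: 42 USD") =~ s/price:\s*\K\d+/100/;

# \R matches a generic linebreak, treating "\r\n" as a single unit.
my @lines = split /\R/, "a\r\nb\nc";

# \h matches horizontal whitespace such as spaces and tabs.
my $horizontal = "a \t b" =~ /^a\h+b$/ ? 1 : 0;

print join("|", @words), "\n";      # foo_1|bar|2
print "$price\n";                   # price: 100 USD
print join("|", @lines), "\n";      # a|b|c
print "$horizontal\n";              # 1
```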
	324
	325	=head3 Assertions
	326
	327	Perl defines the following zero-width assertions:
	328	X<zero-width assertion> X<assertion> X<regex, zero-width assertion>
	329	X<regexp, zero-width assertion>
	330	X<regular expression, zero-width assertion>
	331	X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
	332
	333	\b Match a word boundary
	334	\B Match except at a word boundary
	335	\A Match only at beginning of string
	336	\Z Match only at end of string, or before newline at the end
	337	\z Match only at end of string
	338	\G Match only at pos() (e.g. at the end-of-match position
	339	of prior m//g)
	340
	341	A word boundary (C<\b>) is a spot between two characters
	342	that has a C<\w> on one side of it and a C<\W> on the other side
	343	of it (in either order), counting the imaginary characters off the
	344	beginning and end of the string as matching a C<\W>. (Within
	345	character classes C<\b> represents backspace rather than a word
	346	boundary, just as it normally does in any double-quoted string.)
	347	The C<\A> and C<\Z> are just like "^" and "$", except that they
	348	won't match multiple times when the C</m> modifier is used, while
	349	"^" and "$" will match at every internal line boundary. To match
	350	the actual end of the string and not ignore an optional trailing
	351	newline, use C<\z>.
	352	X<\b> X<\A> X<\Z> X<\z> X</m>
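
The distinction between these assertions can be seen in a short sketch (the
sample strings are invented for the example):

```perl
use strict;
use warnings;

# \b: "cat" inside "concat" is not preceded by a word boundary.
my $whole_words = () = "cat concat cat." =~ /\bcat\b/g;

# \Z tolerates one trailing newline; \z does not.
my $big_z   = "line\n" =~ /line\Z/ ? 1 : 0;
my $small_z = "line\n" =~ /line\z/ ? 1 : 0;

# Under /m, "^" matches at every line start, but \A still matches
# only at the beginning of the string.
my $caret_hits = () = "a\nb\nc" =~ /^\w/mg;
my $big_a_hits = () = "a\nb\nc" =~ /\A\w/mg;

print "$whole_words $big_z $small_z $caret_hits $big_a_hits\n";   # 2 1 0 3 1
```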
	353
	354	The C<\G> assertion can be used to chain global matches (using
	355	C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
	356	It is also useful when writing C<lex>-like scanners, when you have
	357	several patterns that you want to match against consecutive substrings
	358	of your string; see the previous reference. The actual location
	359	where C<\G> will match can also be influenced by using C<pos()> as
	360	an lvalue: see L<perlfunc/pos>. Note that the rule for zero-length
	361	matches is modified somewhat, in that contents to the left of C<\G> are
	362	not counted when determining the length of the match. Thus the following
	363	will not match forever:
	364	X<\G>
	365
	366	my $string = 'ABC';
	367	pos($string) = 1;
	368	while ($string =~ /(.\G)/g) {
	369	print $1;
	370	}
	371
	372	It will print 'A' and then terminate, as it considers the match to
	373	be zero-width, and thus will not match at the same position twice in a
	374	row.
	375
	376	It is worth noting that C<\G> improperly used can result in an infinite
	377	loop. Take care when using patterns that include C<\G> in an alternation.
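
As a sketch of the C<lex>-like use of C<\G> mentioned above, the loop below
tokenizes a small string with C<m//gc>, so that a failed attempt leaves
C<pos()> where it was. The token names and the input are hypothetical, chosen
only for this illustration.

```perl
use strict;
use warnings;

my $source = "total = 42";
my @tokens;

pos($source) = 0;                       # start scanning at the beginning
while (pos($source) < length $source) {
    # Each pattern is anchored with \G at the current position; /c keeps
    # that position intact when an alternative fails.
    if    ($source =~ /\G\s+/gc)      { next }
    elsif ($source =~ /\G([a-z]+)/gc) { push @tokens, "IDENT($1)" }
    elsif ($source =~ /\G(\d+)/gc)    { push @tokens, "NUMBER($1)" }
    elsif ($source =~ /\G(=)/gc)      { push @tokens, "ASSIGN($1)" }
    else  { die "unexpected character at offset " . pos($source) . "\n" }
}

print join(" ", @tokens), "\n";         # IDENT(total) ASSIGN(=) NUMBER(42)
```

Because every branch either consumes at least one character or dies, this
particular loop cannot spin forever at one position.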
	378
	379	=head3 Capture groups
	380
	381	The bracketing construct C<( ... )> creates capture groups (also referred to as
	382	capture buffers). To refer to the current contents of a group later on, within
	383	the same pattern, use C<\g1> (or C<\g{1}>) for the first, C<\g2> (or C<\g{2}>)
	384	for the second, and so on.
	385	This is called a I<backreference>.
	386	X<regex, capture buffer> X<regexp, capture buffer>
	387	X<regex, capture group> X<regexp, capture group>
	388	X<regular expression, capture buffer> X<backreference>
	389	X<regular expression, capture group> X<backreference>
	390	X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
	391	X<named capture buffer> X<regular expression, named capture buffer>
	392	X<named capture group> X<regular expression, named capture group>
	393	X<%+> X<$+{name}> X<< \k<name> >>
	394	There is no limit to the number of captured substrings that you may use.
	395	Groups are numbered with the leftmost open parenthesis being number 1, etc. If
	396	a group did not match, the associated backreference won't match either. (This
	397	can happen if the group is optional, or in a different branch of an
	398	alternation.)
	399	You can omit the C<"g">, and write C<"\1">, etc, but there are some issues with
	400	this form, described below.
	401
	402	You can also refer to capture groups relatively, by using a negative number, so
	403	that C<\g-1> and C<\g{-1}> both refer to the immediately preceding capture
	404	group, and C<\g-2> and C<\g{-2}> both refer to the group before it. For
	405	example:
	406
	407	/
	408	(Y) # group 1
	409	( # group 2
	410	(X) # group 3
	411	\g{-1} # backref to group 3
	412	\g{-3} # backref to group 1
	413	)
	414	/x
	415
	416	would match the same as C</(Y) ( (X) \g3 \g1 )/x>. This allows you to
	417	interpolate regexes into larger regexes and not have to worry about the
	418	capture groups being renumbered.
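
A minimal sketch of that renumbering problem, using invented fragment names:
the same "doubled character" fragment is written once with a relative and
once with an absolute backreference, and each is then interpolated after
another capture group.

```perl
use strict;
use warnings;

# Two hypothetical reusable fragments matching a doubled word character.
my $doubled_rel = qr/(\w)\g{-1}/;   # relative: refers to its own group
my $doubled_abs = qr/(\w)\g1/;      # absolute: refers to group 1 of the
                                    # whole pattern it ends up in

# On their own, both behave the same.
my $rel_alone = "oo" =~ $doubled_rel ? 1 : 0;
my $abs_alone = "oo" =~ $doubled_abs ? 1 : 0;

# After interpolation, (\d+) becomes group 1 and (\w) becomes group 2.
my $rel_embedded = "12-oo" =~ /(\d+)-$doubled_rel/ ? 1 : 0;
my $abs_embedded = "12-oo" =~ /(\d+)-$doubled_abs/ ? 1 : 0;

print "$rel_alone $abs_alone $rel_embedded $abs_embedded\n";   # 1 1 1 0
```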
	419
	420	You can dispense with numbers altogether and create named capture groups.
	421	The notation is C<(?E<lt>I<name>E<gt>...)> to declare and C<\g{I<name>}> to
	422	reference. (To be compatible with .NET regular expressions, C<\g{I<name>}> may
	423	also be written as C<\k{I<name>}>, C<\kE<lt>I<name>E<gt>> or C<\k'I<name>'>.)
	424	I<name> must not begin with a number, nor contain hyphens.
	425	When different groups within the same pattern have the same name, any reference
	426	to that name assumes the leftmost defined group. Named groups count in
	427	absolute and relative numbering, and so can also be referred to by those
	428	numbers.
	429	(It's possible to do things with named capture groups that would otherwise
	430	require C<(??{})>.)
	431
	432	Capture group contents are dynamically scoped and available to you outside the
	433	pattern until the end of the enclosing block or until the next successful
	434	match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
	435	You can refer to them by absolute number (using C<"$1"> instead of C<"\g1">,
	436	etc); or by name via the C<%+> hash, using C<"$+{I<name>}">.
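
For example, a date might be picked apart with named groups as in the sketch
below. The group names and the input are invented; the captures are copied
out of C<%+> immediately because of the dynamic scoping described above.

```perl
use strict;
use warnings;

my (%date, $first_group);
if ("2011-05-14" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    %date        = %+;   # copy the named captures while they are in scope
    $first_group = $1;   # named groups are also numbered: "year" is group 1
}

# \k<name> refers back to a named group inside the pattern itself.
my $doubled = "bookkeeper" =~ /(?<ch>.)\k<ch>/ ? $+{ch} : undef;

print "$date{year} $date{month} $date{day} $first_group $doubled\n";
# 2011 05 14 2011 o
```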
	437
	438	Braces are required in referring to named capture groups, but are optional for
	439	absolute or relative numbered ones. Braces are safer when creating a regex by
	440	concatenating smaller strings. For example if you have C<qr/$a$b/>, and C<$a>
	441	contained C<"\g1">, and C<$b> contained C<"37">, you would get C</\g137/> which
	442	is probably not what you intended.
	443
	444	The C<\g> and C<\k> notations were introduced in Perl 5.10.0. Prior to that
	445	there were no named nor relative numbered capture groups. Absolute numbered
	446	groups were referred to using C<\1>, C<\2>, etc, and this notation is still
	447	accepted (and likely always will be). But it leads to some ambiguities if
	448	there are more than 9 capture groups, as C<\10> could mean either the tenth
	449	capture group, or the character whose ordinal in octal is 010 (a backspace in
	450	ASCII). Perl resolves this ambiguity by interpreting C<\10> as a backreference
	451	only if at least 10 left parentheses have opened before it. Likewise C<\11> is
	452	a backreference only if at least 11 left parentheses have opened before it.
	453	And so on. C<\1> through C<\9> are always interpreted as backreferences.
	454	There are several examples below that illustrate these perils. You can avoid
	455	the ambiguity by always using C<\g{}> or C<\g> if you mean capturing groups;
	456	and for octal constants always using C<\o{}>, or for C<\077> and below, using 3
	457	digits padded with leading zeros, since a leading zero implies an octal
	458	constant.
	459
	460	The C<\I<digit>> notation also works in certain circumstances outside
	461	the pattern. See L</Warning on \1 Instead of $1> below for details.
	462
	463	Examples:
	464
	465	s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words
	466
	467	/(.)\g1/ # find first doubled char
	468	and print "'$1' is the first doubled character\n";
	469
	470	/(?<char>.)\k<char>/ # ... a different way
	471	and print "'$+{char}' is the first doubled character\n";
	472
	473	/(?'char'.)\g1/ # ... mix and match
	474	and print "'$1' is the first doubled character\n";
	475
	476	if (/Time: (..):(..):(..)/) { # parse out values
	477	$hours = $1;
	478	$minutes = $2;
	479	$seconds = $3;
	480	}
	481
	482	/(.)(.)(.)(.)(.)(.)(.)(.)(.)\g10/ # \g10 is a backreference
	483	/(.)(.)(.)(.)(.)(.)(.)(.)(.)\10/ # \10 is octal
	484	/((.)(.)(.)(.)(.)(.)(.)(.)(.))\10/ # \10 is a backreference
	485	/((.)(.)(.)(.)(.)(.)(.)(.)(.))\010/ # \010 is octal
	486
	487	$a = '(.)\1'; # Creates problems when concatenated.
	488	$b = '(.)\g{1}'; # Avoids the problems.
	489	"aa" =~ /${a}/; # True
	490	"aa" =~ /${b}/; # True
	491	"aa0" =~ /${a}0/; # False!
	492	"aa0" =~ /${b}0/; # True
	493	"aa\x08" =~ /${a}0/; # True!
	494	"aa\x08" =~ /${b}0/; # False
	495
	496	Several special variables also refer back to portions of the previous
	497	match. C<$+> returns whatever the last bracket match matched.
	498	C<$&> returns the entire matched string. (At one point C<$0> did
	499	also, but now it returns the name of the program.) C<$`> returns
	500	everything before the matched string. C<$'> returns everything
	501	after the matched string. And C<$^N> contains whatever was matched by
	502	the most-recently closed group (submatch). C<$^N> can be used in
	503	extended patterns (see below), for example to assign a submatch to a
	504	variable.
	505	X<$+> X<$^N> X<$&> X<$`> X<$'>
	506
	507	These special variables, like the C<%+> hash and the numbered match variables
	508	(C<$1>, C<$2>, C<$3>, etc.) are dynamically scoped
	509	until the end of the enclosing block or until the next successful
	510	match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
	511	X<$+> X<$^N> X<$&> X<$`> X<$'>
	512	X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>
	513
	514	B<NOTE>: Failed matches in Perl do not reset the match variables,
	515	which makes it easier to write code that tests for a series of more
	516	specific cases and remembers the best match.
	517
	518	B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or
	519	C<$'> anywhere in the program, it has to provide them for every
	520	pattern match. This may substantially slow your program. Perl
	521	uses the same mechanism to produce C<$1>, C<$2>, etc, so you also pay a
	522	price for each pattern that contains capturing parentheses. (To
	523	avoid this cost while retaining the grouping behaviour, use the
	524	extended regular expression C<(?: ... )> instead.) But if you never
	525	use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing
	526	parentheses will not be penalized. So avoid C<$&>, C<$'>, and C<$`>
	527	if you can, but if you can't (and some algorithms really appreciate
	528	them), once you've used them once, use them at will, because you've
	529	already paid the price. As of 5.005, C<$&> is not so costly as the
	530	other two.
	531	X<$&> X<$`> X<$'>
	532
	533	As a workaround for this problem, Perl 5.10.0 introduces C<${^PREMATCH}>,
	534	C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
	535	and C<$'>, B<except> that they are only guaranteed to be defined after a
	536	successful match that was executed with the C</p> (preserve) modifier.
	537	Unlike their punctuation-character equivalents, these variables incur
	538	no global performance penalty; the trade-off is that you have to tell
	539	perl when you want to use them.
	540	X</p> X<p modifier>
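
A minimal sketch of the C</p> modifier and its three variables, using an
invented input string:

```perl
use strict;
use warnings;

my ($before, $matched, $after);

# /p asks perl to preserve the pieces of the string for this match only.
if ("hello world" =~ /o w/p) {
    ($before, $matched, $after) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$before] [$matched] [$after]\n";    # [hell] [o w] [orld]
```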
	541
	542	=head2 Quoting metacharacters
	543
	544	Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
	545	C<\w>, C<\n>. Unlike some other regular expression languages, there
	546	are no backslashed symbols that aren't alphanumeric. So anything
	547	that looks like \\, \(, \), \<, \>, \{, or \} is always
	548	interpreted as a literal character, not a metacharacter. This was
	549	once used in a common idiom to disable or quote the special meanings
	550	of regular expression metacharacters in a string that you want to
	551	use for a pattern. Simply quote all non-"word" characters:
	552
	553	$pattern =~ s/(\W)/\\$1/g;
	554
	555	(If C<use locale> is set, then this depends on the current locale.)
	556	Today it is more common to use the quotemeta() function or the C<\Q>
	557	metaquoting escape sequence to disable all metacharacters' special
	558	meanings like this:
	559
	560	/$unquoted\Q$quoted\E$unquoted/
	561
	562	Beware that if you put literal backslashes (those not inside
	563	interpolated variables) between C<\Q> and C<\E>, double-quotish
	564	backslash interpolation may lead to confusing results. If you
	565	I<need> to use literal backslashes within C<\Q...\E>,
	566	consult L<perlop/"Gory details of parsing quoted constructs">.
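
The effect of quoting can be sketched as follows; C<$literal> is a made-up
string that happens to contain metacharacters.

```perl
use strict;
use warnings;

# A string that is meant literally but contains "." and "*".
my $literal = "a.b*c";

# quotemeta() backslashes every non-"word" character.
my $quoted = quotemeta $literal;

# Interpolated unquoted, "." and "*" keep their special meanings.
my $unquoted_hit = "aXbbc" =~ /\A$literal\z/     ? 1 : 0;

# Between \Q and \E the same text matches only itself.
my $quoted_miss  = "aXbbc" =~ /\A\Q$literal\E\z/ ? 1 : 0;
my $quoted_hit   = "a.b*c" =~ /\A\Q$literal\E\z/ ? 1 : 0;

print "$quoted\n";                                      # a\.b\*c
print "$unquoted_hit $quoted_miss $quoted_hit\n";       # 1 0 1
```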
	567
	568	=head2 Extended Patterns
	569
	570	Perl also defines a consistent extension syntax for features not
	571	found in standard tools like B<awk> and B<lex>. The syntax is a
	572	pair of parentheses with a question mark as the first thing within
	573	the parentheses. The character after the question mark indicates
	574	the extension.
	575
	576	The stability of these extensions varies widely. Some have been
	577	part of the core language for many years. Others are experimental
	578	and may change without warning or be completely removed. Check
	579	the documentation on an individual feature to verify its current
	580	status.
	581
	582	A question mark was chosen for this and for the minimal-matching
	583	construct because 1) question marks are rare in older regular
	584	expressions, and 2) whenever you see one, you should stop and
	585	"question" exactly what is going on. That's psychology...
	586
	587	=over 10
	588
	589	=item C<(?#text)>
	590	X<(?#)>
	591
	592	A comment. The text is ignored. If the C</x> modifier enables
	593	whitespace formatting, a simple C<#> will suffice. Note that Perl closes
	594	the comment as soon as it sees a C<)>, so there is no way to put a literal
	595	C<)> in the comment.
	596
	597	=item C<(?dlupimsx-imsx)>
	598
	599	=item C<(?^lupimsx)>
	600	X<(?)> X<(?^)>
	601
	602	One or more embedded pattern-match modifiers, to be turned on (or
	603	turned off, if preceded by C<->) for the remainder of the pattern or
	604	the remainder of the enclosing pattern group (if any).
	605
	606	This is particularly useful for dynamic patterns, such as those read in from a
	607	configuration file, taken from an argument, or specified in a table
	608	somewhere. Consider the case where some patterns want to be case
	609	sensitive and some do not: The case insensitive ones merely need to
	610	include C<(?i)> at the front of the pattern. For example:
	611
	612	$pattern = "foobar";
	613	if ( /$pattern/i ) { }
	614
	615	# more flexible:
	616
	617	$pattern = "(?i)foobar";
	618	if ( /$pattern/ ) { }
	619
	620	These modifiers are restored at the end of the enclosing group. For example,
	621
	622	( (?i) blah ) \s+ \g1
	623
	624	will match C<blah> in any case, some spaces, and an exact (I<including the case>!)
	625	repetition of the previous word, assuming the C</x> modifier, and no C</i>
	626	modifier outside this group.
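
That behaviour can be checked with a short sketch (the test strings are
invented):

```perl
use strict;
use warnings;

# (?i) applies only inside the group; \g1 is outside it, so the
# repetition must reproduce the captured text exactly.
my $re = qr/ ( (?i) blah ) \s+ \g1 /x;

my $same_case  = "BLAH   BLAH" =~ $re ? 1 : 0;
my $other_case = "BLAH blah"   =~ $re ? 1 : 0;

print "$same_case $other_case\n";   # 1 0
```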
	627
	628	These modifiers do not carry over into named subpatterns called in the
	629	enclosing group. In other words, a pattern such as C<((?i)(?&NAME))> does not
	630	change the case-sensitivity of the "NAME" pattern.
	631
	632	Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
	633	after the C<"?"> is a shorthand equivalent to C<d-imsx>. Flags (except
	634	C<"d">) may follow the caret to override it.
	635	But a minus sign is not legal with it.
	636
	637	Also new in Perl 5.14 are the modifiers C<"d">, C<"l">, and C<"u">,
	638	which in 5.14 may not be used as suffix modifiers.
	639
	640	C<"l"> means to use a locale (see L<perllocale>) when pattern matching.
	641	The locale used will be the one in effect at the time of execution of
	642	the pattern match. This may not be the same as the compilation-time
	643	locale, and can differ from one match to another if there is an
	644	intervening call of the
	645	L<setlocale() function|perllocale/The setlocale function>.
	646	This modifier is automatically set if the regular expression is compiled
	647	within the scope of a C<"use locale"> pragma.
	648
	649	C<"u"> has no effect currently. It is automatically set if the regular
	650	expression is compiled within the scope of a
	651	L<C<"use feature 'unicode_strings'">|feature> pragma.
	652
	653	C<"d"> means to use the traditional Perl pattern matching behavior.
	654	This is dualistic (hence the name C<"d">, which also could stand for
	655	"default"). When this is in effect, Perl matches utf8-encoded strings
	656	using Unicode rules, and matches non-utf8-encoded strings using the
	657	platform's native character set rules.
	658	See L<perlunicode/The "Unicode Bug">. It is automatically selected by
	659	default if the regular expression is compiled neither within the scope
	660	of a C<"use locale"> pragma nor a C<"use feature 'unicode_strings'">
	661	pragma.
	662
	663	Note that the C<d>, C<l>, C<p>, and C<u> modifiers are special in that
	664	they can only be enabled, not disabled, and the C<d>, C<l>, and C<u>
	665	modifiers are mutually exclusive; a maximum of one may appear in the
	666	construct. Specifying one de-specifies the others. Thus, for example,
	667	C<(?-p)> and C<(?-d:...)> are meaningless and will warn when compiled
	668	under C<use warnings>.
	669
	670	Note also that the C<p> modifier is special in that its presence
	671	anywhere in a pattern has a global effect.
	672
	673	=item C<(?:pattern)>
	674	X<(?:)>
	675
	676	=item C<(?dluimsx-imsx:pattern)>
	677
	678	=item C<(?^luimsx:pattern)>
	679	X<(?^:)>
	680
	681	This is for clustering, not capturing; it groups subexpressions like
	682	"()", but doesn't make backreferences as "()" does. So
	683
	684	@fields = split(/\b(?:a|b|c)\b/)
	685
	686	is like
	687
	688	@fields = split(/\b(a|b|c)\b/)
	689
	690	but doesn't spit out extra fields. It's also cheaper not to capture
	691	characters if you don't need to.
	692
	693	Any letters between C<?> and C<:> act as flag modifiers as with
	694	C<(?dluimsx-imsx)>. For example,
	695
	696	/(?s-i:more.than).million/i
	697
	698	is equivalent to the more verbose
	699
	700	/(?:(?s-i)more.than).million/i
	701
	702	Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
	703	after the C<"?"> is a shorthand equivalent to C<d-imsx>. Any positive
	704	flags (except C<"d">) may follow the caret, so
	705
	706	(?^x:foo)
	707
	708	is equivalent to
	709
	710	(?x-ims:foo)
	711
	712	The caret tells Perl that this cluster doesn't inherit the flags of any
	713	surrounding pattern, but to go back to the system defaults (C<d-imsx>),
	714	modified by any flags specified.
	715
	716	The caret allows for simpler stringification of compiled regular
	717	expressions. These look like
	718
	719	(?^:pattern)
	720
	721	with any non-default flags appearing between the caret and the colon.
	722	A test that looks at such stringification thus doesn't need to have the
	723	system default flags hard-coded in it, just the caret. If new flags are
	724	added to Perl, the meaning of the caret's expansion will change to include
	725	the default for those flags, so the test will still work, unchanged.
	726
	727	Specifying a negative flag after the caret is an error, as the flag is
	728	redundant.
	729
	730	Mnemonic for C<(?^...)>: A fresh beginning since the usual use of a caret is
	731	to match at the beginning.
	732
	733	=item C<(?|pattern)>
	734	X<(?|)> X<Branch reset>
	735
	736	This is the "branch reset" pattern, which has the special property
	737	that the capture groups are numbered from the same starting point
	738	in each alternation branch. It is available starting from perl 5.10.0.
	739
	740	Capture groups are numbered from left to right, but inside this
	741	construct the numbering is restarted for each branch.
	742
	743	The numbering within each branch will be as normal, and any groups
	744	following this construct will be numbered as though the construct
	745	contained only one branch, that being the one with the most capture
	746	groups in it.
	747
	748	This construct will be useful when you want to capture one of a
	749	number of alternative matches.
	750
	751	Consider the following pattern. The numbers underneath show in
	752	which group the captured content will be stored.
	753
	754
	755	    # before ---------------branch-reset----------- after
	756	    / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
	757	    # 1           2         2  3        2     3     4
	758
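The numbering in the diagram above can be checked with a short script. This is only a sketch: the three input strings are made up so that each one exercises a different branch, and `-` stands for a group that did not take part in the match.

```perl
use strict;
use warnings;

# The pattern from the diagram above, compiled once.
my $re = qr/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x;

my %groups;    # input string => "group1 group2 group3 group4"
for my $input (qw(axyzz apqrz atuvz)) {
    next unless $input =~ $re;
    my @caps = ($1, $2, $3, $4);
    $groups{$input} = join ' ', map { defined($_) ? $_ : '-' } @caps;
    print "$input: $groups{$input}\n";
}
```

In every branch the first capture lands in group 2, and the trailing `( z )` is always group 4, because the branch with the most groups uses two of them.
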
	759	Be careful when using the branch reset pattern in combination with
	760	named captures. Named captures are implemented as being aliases to
	761	numbered groups holding the captures, and that interferes with the
	762	implementation of the branch reset pattern. If you are using named
	763	captures in a branch reset pattern, it's best to use the same names,
	764	in the same order, in each of the alternations:
	765
	766	    /(?| (?<a> x ) (?<b> y )
	767	       | (?<a> z ) (?<b> w )) /x
	768
	769	Not doing so may lead to surprises:
	770
	771	    "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
	772	    say $+ {a}; # Prints '12'
	773	    say $+ {b}; # Also prints '12'.
	774
	775	The problem here is that both the group named C<< a >> and the group
	776	named C<< b >> are aliases for the group belonging to C<< $1 >>.
	777
	778	=item Look-Around Assertions
	779	X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
	780
	781	Look-around assertions are zero-width patterns which match a specific
	782	pattern without including it in C<$&>. Positive assertions match when
	783	their subpattern matches; negative assertions match when their subpattern
	784	fails. Look-behind matches text up to the current match position;
	785	look-ahead matches text following the current match position.
	786
	787	=over 4
	788
	789	=item C<(?=pattern)>
	790	X<(?=)> X<look-ahead, positive> X<lookahead, positive>
	791
	792	A zero-width positive look-ahead assertion. For example, C</\w+(?=\t)/>
	793	matches a word followed by a tab, without including the tab in C<$&>.
	794
	795	=item C<(?!pattern)>
	796	X<(?!)> X<look-ahead, negative> X<lookahead, negative>
	797
	798	A zero-width negative look-ahead assertion. For example, C</foo(?!bar)/>
	799	matches any occurrence of "foo" that isn't followed by "bar". Note
	800	however that look-ahead and look-behind are NOT the same thing. You cannot
	801	use this for look-behind.
	802
	803	If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/>
	804	will not do what you want. That's because the C<(?!foo)> is just saying that
	805	the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
	806	match. You would have to do something like C</(?!foo)...bar/> for that. We
	807	say "like" because there's the case of your "bar" not having three characters
	808	before it. You could cover that this way: C</(?:(?!foo)...|^.{0,2})bar/>.
	809	Sometimes it's still easier just to say:
	810
	811	if (/bar/ && $` !~ /foo$/)
	812
	813	For look-behind see below.
	814
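The difference between the three approaches discussed above can be seen side by side. A small sketch, with three sample strings chosen for illustration ("bar" preceded by "foo", preceded by something else, and preceded by nothing):

```perl
use strict;
use warnings;

my %result;    # string => "naive padded lookbehind", each flag 0 or 1
for my $s ('foobar', 'bazbar', 'bar') {
    my $naive      = $s =~ /(?!foo)bar/                ? 1 : 0;
    my $padded     = $s =~ /(?:(?!foo)...|^.{0,2})bar/ ? 1 : 0;
    my $lookbehind = $s =~ /(?<!foo)bar/               ? 1 : 0;
    $result{$s} = "$naive $padded $lookbehind";
    print "$s: naive=$naive padded=$padded lookbehind=$lookbehind\n";
}
```

Only the naive look-ahead form accepts "foobar"; the padded look-ahead and the negative look-behind both reject it while still accepting the other two strings.
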
	815	=item C<(?<=pattern)> C<\K>
	816	X<(?<=)> X<look-behind, positive> X<lookbehind, positive> X<\K>
	817
	818	A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
	819	matches a word that follows a tab, without including the tab in C<$&>.
	820	Works only for fixed-width look-behind.
	821
	822	There is a special form of this construct, called C<\K>, which causes the
	823	regex engine to "keep" everything it had matched prior to the C<\K> and
	824	not include it in C<$&>. This effectively provides variable length
	825	look-behind. The use of C<\K> inside of another look-around assertion
	826	is allowed, but the behaviour is currently not well defined.
	827
	828	For various reasons C<\K> may be significantly more efficient than the
	829	equivalent C<< (?<=...) >> construct, and it is especially useful in
	830	situations where you want to efficiently remove something following
	831	something else in a string. For instance
	832
	833	s/(foo)bar/$1/g;
	834
	835	can be rewritten as the much more efficient
	836
	837	s/foo\Kbar//g;
	838
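As a quick check that the two substitutions above are interchangeable, the following sketch applies both to the same made-up string and compares the results.

```perl
use strict;
use warnings;

my $text = 'foobar foobaz barfoobar';

(my $with_capture = $text) =~ s/(foo)bar/$1/g;   # capture "foo" and put it back
(my $with_keep    = $text) =~ s/foo\Kbar//g;     # keep "foo", delete only "bar"

print "$with_capture\n$with_keep\n";
```

Both variables end up holding the same string; the `\K` form simply never has to copy "foo" into a capture buffer.
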
	839	=item C<(?<!pattern)>
	840	X<(?<!)> X<look-behind, negative> X<lookbehind, negative>
	841
	842	A zero-width negative look-behind assertion. For example, C</(?<!bar)foo/>
	843	matches any occurrence of "foo" that does not follow "bar". Works
	844	only for fixed-width look-behind.
	845
	846	=back
	847
	848	=item C<(?'NAME'pattern)>
	849
	850	=item C<< (?<NAME>pattern) >>
	851	X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
	852
	853	A named capture group. Identical in every respect to normal capturing
	854	parentheses C<()> but for the additional fact that C<%+> or C<%-> may be
	855	used after a successful match to refer to a named group. See L<perlvar>
	856	for more details on the C<%+> and C<%-> hashes.
	857
	858	If multiple distinct capture groups have the same name, then
	859	C<$+{NAME}> will refer to the leftmost defined group in the match.
	860
	861	The forms C<(?'NAME'pattern)> and C<< (?<NAME>pattern) >> are equivalent.
	862
	863	B<NOTE:> While the notation of this construct is the same as the similar
	864	function in .NET regexes, the behavior is not. In Perl the groups are
	865	numbered sequentially regardless of being named or not. Thus in the
	866	pattern
	867
	868	/(x)(?<foo>y)(z)/
	869
	870	C<$+{foo}> will be the same as C<$2>, and C<$3> will contain 'z', instead of
	871	the opposite, which is what a .NET regex hacker might expect.
	872
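The sequential numbering described above is easy to confirm. A minimal sketch, using the input string 'xyz' as an assumed example:

```perl
use strict;
use warnings;

my ($named, $second, $third);
if ('xyz' =~ /(x)(?<foo>y)(z)/) {
    # The named group is also numbered group 2; group 3 follows it.
    ($named, $second, $third) = ($+{foo}, $2, $3);
    print "\$+{foo}=$named \$2=$second \$3=$third\n";
}
```
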
	873	Currently NAME is restricted to simple identifiers only.
	874	In other words, it must match C</^[_A-Za-z][_A-Za-z0-9]*\z/> or
	875	its Unicode extension (see L<utf8>),
	876	though it isn't extended by the locale (see L<perllocale>).
	877
	878	B<NOTE:> In order to make things easier for programmers with experience
	879	with the Python or PCRE regex engines, the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
	880	may be used instead of C<< (?<NAME>pattern) >>; however this form does not
	881	support the use of single quotes as a delimiter for the name.
	882
	883	=item C<< \k<NAME> >>
	884
	885	=item C<< \k'NAME' >>
	886
	887	Named backreference. Similar to numeric backreferences, except that
	888	the group is designated by name and not number. If multiple groups
	889	have the same name then it refers to the leftmost defined group in
	890	the current match.
	891
	892	It is an error to refer to a name not defined by a C<< (?<NAME>) >>
	893	earlier in the pattern.
	894
	895	Both forms are equivalent.
	896
	897	B<NOTE:> In order to make things easier for programmers with experience
	898	with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
	899	may be used instead of C<< \k<NAME> >>.
	900
	901	=item C<(?{ code })>
	902	X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
	903
	904	B<WARNING>: This extended regular expression feature is considered
	905	experimental, and may be changed without notice. Code executed that
	906	has side effects may not perform identically from version to version
	907	due to the effect of future optimisations in the regex engine.
	908
	909	This zero-width assertion evaluates any embedded Perl code. It
	910	always succeeds, and its C<code> is not interpolated. Currently,
	911	the rules to determine where the C<code> ends are somewhat convoluted.
	912
	913	This feature can be used together with the special variable C<$^N> to
	914	capture the results of submatches in variables without having to keep
	915	track of the number of nested parentheses. For example:
	916
	917	$_ = "The brown fox jumps over the lazy dog";
	918	/the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
	919	print "color = $color, animal = $animal\n";
	920
	921	Inside the C<(?{...})> block, C<$_> refers to the string the regular
	922	expression is matching against. You can also use C<pos()> to find
	923	the current match position within this string.
	924
	925	The C<code> is properly scoped in the following sense: If the assertion
	926	is backtracked (compare L<"Backtracking">), all changes introduced after
	927	C<local>ization are undone, so that
	928
	929	    $_ = 'a' x 8;
	930	    m<
	931	       (?{ $cnt = 0 })                 # Initialize $cnt.
	932	       (
	933	         a
	934	         (?{
	935	             local $cnt = $cnt + 1;    # Update $cnt, backtracking-safe.
	936	         })
	937	       )*
	938	       aaaa
	939	       (?{ $res = $cnt })              # On success copy to
	940	                                       # non-localized location.
	941	     >x;
	942
	943	will set C<$res = 4>. Note that after the match, C<$cnt> returns to the globally
	944	introduced value, because the scopes that restrict C<local> operators
	945	are unwound.
	946
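A self-contained variant of the counting example, written as a sketch under two assumptions: the counters are declared with C<our> (the workaround this section recommends for lexicals), and the comments inside the pattern avoid the C<$> sigil so that nothing in them is taken for a variable to interpolate.

```perl
use strict;
use warnings;

# Package variables, so the code blocks do not depend on lexicals.
our ($cnt, $res);

$_ = 'a' x 8;
my $matched = m<
    (?{ $cnt = 0 })                     # initialize the counter
    (
        a
        (?{ local $cnt = $cnt + 1 })    # undone if this "a" is given back
    )*
    aaaa
    (?{ $res = $cnt })                  # copy to a non-localized variable
>x;

print "matched, res=$res\n" if $matched;
```

The starred group first takes all eight characters and is then forced to give four back so that `aaaa` can match; each give-back undoes one localized increment, which is why the copied value is 4 rather than 8.
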
	947	This assertion may be used as a C<(?(condition)yes-pattern|no-pattern)>
	948	switch. If I<not> used in this way, the result of evaluation of
	949	C<code> is put into the special variable C<$^R>. This happens
	950	immediately, so C<$^R> can be used from other C<(?{ code })> assertions
	951	inside the same regular expression.
	952
	953	The assignment to C<$^R> above is properly localized, so the old
	954	value of C<$^R> is restored if the assertion is backtracked; compare
	955	L<"Backtracking">.
	956
	957	For reasons of security, this construct is forbidden if the regular
	958	expression involves run-time interpolation of variables, unless the
	959	perilous C<use re 'eval'> pragma has been used (see L<re>), or the
	960	variables contain results of the C<qr//> operator (see
	961	L<perlop/"qrE<sol>STRINGE<sol>msixpo">).
	962
	963	This restriction is due to the wide-spread and remarkably convenient
	964	custom of using run-time determined strings as patterns. For example:
	965
	966	$re = <>;
	967	chomp $re;
	968	$string =~ /$re/;
	969
	970	Before Perl knew how to execute interpolated code within a pattern,
	971	this operation was completely safe from a security point of view,
	972	although it could raise an exception from an illegal pattern. If
	973	you turn on C<use re 'eval'>, though, it is no longer secure,
	974	so you should only do so if you are also using taint checking.
	975	Better yet, use the carefully constrained evaluation within a Safe
	976	compartment. See L<perlsec> for details about both these mechanisms.
	977
	978	B<WARNING>: Use of lexical (C<my>) variables in these blocks is
	979	broken. The result is unpredictable and will make perl unstable. The
	980	workaround is to use global (C<our>) variables.
	981
	982	B<WARNING>: Because Perl's regex engine is currently not re-entrant,
	983	interpolated code may not invoke the regex engine either directly (with
	984	C<m//> or C<s///>) or indirectly (with functions such as
	985	C<split>). Invoking the regex engine in these blocks will make perl
	986	unstable.
	987
	988	=item C<(??{ code })>
	989	X<(??{})>
	990	X<regex, postponed> X<regexp, postponed> X<regular expression, postponed>
	991
	992	B<WARNING>: This extended regular expression feature is considered
	993	experimental, and may be changed without notice. Code executed that
	994	has side effects may not perform identically from version to version
	995	due to the effect of future optimisations in the regex engine.
	996
	997	This is a "postponed" regular subexpression. The C<code> is evaluated
	998	at run time, at the moment this subexpression may match. The result
	999	of evaluation is considered as a regular expression and matched as
	1000	if it were inserted instead of this construct. Note that this means
	1001	that the contents of capture groups defined inside an eval'ed pattern
	1002	are not available outside of the pattern, and vice versa, there is no
	1003	way for the inner pattern to refer to a capture group defined outside.
	1004	Thus,
	1005
	1006	('a' x 100)=~/(??{'(.)' x 100})/
	1007
	1008	B<will> match, but it will B<not> set C<$1>.
	1009
	1010	The C<code> is not interpolated. As before, the rules to determine
	1011	where the C<code> ends are currently somewhat convoluted.
	1012
	1013	The following pattern matches a parenthesized group:
	1014
	1015	    $re = qr{
	1016	               \(
	1017	               (?:
	1018	                  (?> [^()]+ )  # Non-parens without backtracking
	1019	                |
	1020	                  (??{ $re })   # Group with matching parens
	1021	               )*
	1022	               \)
	1023	            }x;
	1024
	1025	See also C<(?PARNO)> for a different, more efficient way to accomplish
	1026	the same task.
	1027
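The pattern above can be exercised as follows. This is a sketch, not a recommended idiom: C<$re> is declared with C<our> so the postponed block can see it at match time, and the test strings are invented for illustration.

```perl
use strict;
use warnings;

our $re;    # package variable, looked up each time the block runs
$re = qr{
    \(
    (?:
        (?> [^()]+ )    # non-parens without backtracking
      |
        (??{ $re })     # a nested group with matching parens
    )*
    \)
}x;

my ($group)  = 'f(a(b)c)(d' =~ /($re)/;          # first balanced group
my $balanced = '(a(b)c)' =~ /^$re$/ ? 1 : 0;     # whole string balanced
my $broken   = '(a(b)c'  =~ /^$re$/ ? 1 : 0;     # missing a closing paren

print "group=$group balanced=$balanced broken=$broken\n";
```
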
	1028	For reasons of security, this construct is forbidden if the regular
	1029	expression involves run-time interpolation of variables, unless the
	1030	perilous C<use re 'eval'> pragma has been used (see L<re>), or the
	1031	variables contain results of the C<qr//> operator (see
	1032	L<perlop/"qrE<sol>STRINGE<sol>msixpo">).
	1033
	1034	Because perl's regex engine is not currently re-entrant, delayed
	1035	code may not invoke the regex engine either directly (with C<m//> or C<s///>)
	1036	or indirectly (with functions such as C<split>).
	1037
	1038	Recursing deeper than 50 times without consuming any input string will
	1039	result in a fatal error. The maximum depth is compiled into perl, so
	1040	changing it requires a custom build.
	1041
	1042	=item C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)>
	1043	X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
	1044	X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
	1045	X<regex, relative recursion>
	1046
	1047	Similar to C<(??{ code })> except that it does not involve compiling any code;
	1048	instead, it treats the contents of a capture group as an independent
	1049	pattern that must match at the current position. Capture groups
	1050	contained by the pattern will have the value as determined by the
	1051	outermost recursion.
	1052
	1053	PARNO is a sequence of digits (not starting with 0) whose value reflects
	1054	the paren-number of the capture group to recurse to. C<(?R)> recurses to
	1055	the beginning of the whole pattern. C<(?0)> is an alternate syntax for
	1056	C<(?R)>. If PARNO is preceded by a plus or minus sign then it is assumed
	1057	to be relative, with negative numbers indicating preceding capture groups
	1058	and positive ones following. Thus C<(?-1)> refers to the most recently
	1059	declared group, and C<(?+1)> indicates the next group to be declared.
	1060	Note that the counting for relative recursion differs from that of
	1061	relative backreferences, in that with recursion unclosed groups B<are>
	1062	included.
	1063
	1064	The following pattern matches a function foo() which may contain
	1065	balanced parentheses as the argument.
	1066
	1067	    $re = qr{ (                   # paren group 1 (full function)
	1068	                foo
	1069	                (                 # paren group 2 (parens)
	1070	                  \(
	1071	                    (             # paren group 3 (contents of parens)
	1072	                    (?:
	1073	                     (?> [^()]+ ) # Non-parens without backtracking
	1074	                    |
	1075	                     (?2)         # Recurse to start of paren group 2
	1076	                    )*
	1077	                    )
	1078	                  \)
	1079	                )
	1080	              )
	1081	            }x;
	1082
	1083	If the pattern was used as follows
	1084
	1085	'foo(bar(baz)+baz(bop))'=~/$re/
	1086	and print "\$1 = $1\n",
	1087	"\$2 = $2\n",
	1088	"\$3 = $3\n";
	1089
	1090	the output produced should be the following:
	1091
	1092	$1 = foo(bar(baz)+baz(bop))
	1093	$2 = (bar(baz)+baz(bop))
	1094	$3 = bar(baz)+baz(bop)
	1095
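The pattern and its use can be combined into one runnable script. A sketch that collects the three captures into an array (the array name is an assumption of this example) instead of printing C<$1> to C<$3> directly:

```perl
use strict;
use warnings;

my $re = qr{ (                          # paren group 1 (full function)
               foo
               (                        # paren group 2 (parens)
                 \(
                   (                    # paren group 3 (contents of parens)
                     (?:
                        (?> [^()]+ )    # non-parens without backtracking
                      |
                        (?2)            # recurse to start of paren group 2
                     )*
                   )
                 \)
               )
             )
           }x;

# In list context the match returns the contents of groups 1 to 3.
my @captures = 'foo(bar(baz)+baz(bop))' =~ $re;
print "\$$_ = $captures[$_ - 1]\n" for 1 .. @captures;
```

Group 3 reports the contents seen by the outermost call, not the `baz` or `bop` matched during the nested recursions.
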
	1096	If there is no corresponding capture group defined, then it is a
	1097	fatal error. Recursing deeper than 50 times without consuming any input
	1098	string will also result in a fatal error. The maximum depth is compiled
	1099	into perl, so changing it requires a custom build.
	1100
	1101	The following shows how using negative indexing can make it
	1102	easier to embed recursive patterns inside of a C<qr//> construct
	1103	for later use:
	1104
	1105	    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
	1106	    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
	1107	        # do something here...
	1108	    }
	1109
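To see why the relative form matters, the sketch below interpolates the same C<qr//> object twice. It assumes a literal plus sign between the two calls and uses an invented input string; an absolute C<(?1)> would have pointed the second copy at the first group.

```perl
use strict;
use warnings;

# (?-1) recurses into the group that encloses it, whatever its absolute number.
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

my ($first, $second);
if ('foo(a(b)) + bar(c)' =~ /foo $parens \s+ \+ \s+ bar $parens/x) {
    ($first, $second) = ($1, $2);
    print "first=$first second=$second\n";
}
```
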
	1110	B<Note> that this pattern does not behave the same way as the equivalent
	1111	PCRE or Python construct of the same form. In Perl you can backtrack into
	1112	a recursed group; in PCRE and Python the recursed-into group is treated
	1113	as atomic. Also, modifiers are resolved at compile time, so constructs
	1114	like C<(?i:(?1))> or C<(?:(?i)(?1))> do not affect how the sub-pattern will
	1115	be processed.
	1116
	1117	=item C<(?&NAME)>
	1118	X<(?&NAME)>
	1119
	1120	Recurse to a named subpattern. Identical to C<(?PARNO)> except that the
	1121	parenthesis to recurse to is determined by name. If multiple parentheses have
	1122	the same name, then it recurses to the leftmost.
	1123
	1124	It is an error to refer to a name that is not declared somewhere in the
	1125	pattern.
	1126
	1127	B<NOTE:> In order to make things easier for programmers with experience
	1128	with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
	1129	may be used instead of C<< (?&NAME) >>.
	1130
	1131	=item C<(?(condition)yes-pattern|no-pattern)>
	1132	X<(?()>
	1133
	1134	=item C<(?(condition)yes-pattern)>
	1135
	1136	Conditional expression. C<(condition)> should be either an integer in
	1137	parentheses (which is valid if the corresponding pair of parentheses
	1138	matched), a look-ahead/look-behind/evaluate zero-width assertion, a
	1139	name in angle brackets or single quotes (which is valid if a group
	1140	with the given name matched), or the special symbol C<(R)> (true when
	1141	evaluated inside of recursion or eval). Additionally, the C<R> may be
	1142	followed by a number (which will be true when evaluated when recursing
	1143	inside of the appropriate group), or by C<&NAME>, in which case it will
	1144	be true only when evaluated during recursion in the named group.
	1145
	1146	Here's a summary of the possible predicates:
	1147
	1148	=over 4
	1149
	1150	=item (1) (2) ...
	1151
	1152	Checks if the numbered capturing group has matched something.
	1153
	1154	=item (<NAME>) ('NAME')
	1155
	1156	Checks if a group with the given name has matched something.
	1157
	1158	=item (?{ CODE })
	1159
	1160	Treats the code block as the condition.
	1161
	1162	=item (R)
	1163
	1164	Checks if the expression has been evaluated inside of recursion.
	1165
	1166	=item (R1) (R2) ...
	1167
	1168	Checks if the expression has been evaluated while executing directly
	1169	inside of the n-th capture group. This check is the regex equivalent of
	1170
	1171	if ((caller(0))[3] eq 'subname') { ... }
	1172
	1173	In other words, it does not check the full recursion stack.
	1174
	1175	=item (R&NAME)
	1176
	1177	Similar to C<(R1)>, this predicate checks to see if we're executing
	1178	directly inside of the leftmost group with a given name (this is the same
	1179	logic used by C<(?&NAME)> to disambiguate). It does not check the full
	1180	stack, but only the name of the innermost active recursion.
	1181
	1182	=item (DEFINE)
	1183
	1184	In this case, the yes-pattern is never directly executed, and no
	1185	no-pattern is allowed. Similar in spirit to C<(?{0})> but more efficient.
	1186	See below for details.
	1187
	1188	=back
	1189
	1190	For example:
	1191
	1192	m{ ( \( )?
	1193	[^()]+
	1194	(?(1) \) )
	1195	}x
	1196
	1197	matches a chunk of non-parentheses, possibly included in parentheses
	1198	themselves.
	1199
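A short sketch of the conditional in action. It adds C<^> and C<$> anchors (an assumption of this example, not part of the pattern above) so that a stray parenthesis makes the whole match fail instead of being skipped over.

```perl
use strict;
use warnings;

# Anchored variant of the pattern above: the whole string must conform.
my $re = qr{ ^ ( \( )? [^()]+ (?(1) \) ) $ }x;

my %ok;    # candidate string => 1 if accepted, 0 otherwise
for my $s ('abc', '(abc)', '(abc', 'abc)') {
    $ok{$s} = $s =~ $re ? 1 : 0;
    print "$s => $ok{$s}\n";
}
```

The closing parenthesis is demanded only when group 1 matched an opening one, so the two lopsided strings are rejected.
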
	1200	A special form is the C<(DEFINE)> predicate, which never directly executes
	1201	its yes-pattern, and does not allow a no-pattern. This allows you to define
	1202	subpatterns which will be executed only by using the recursion mechanism.
	1203	This way, you can define a set of regular expression rules that can be
	1204	bundled into any pattern you choose.
	1205
	1206	It is recommended that for this usage you put the DEFINE block at the
	1207	end of the pattern, and that you name any subpatterns defined within it.
	1208
	1209	Also, it's worth noting that patterns defined this way probably will
	1210	not be as efficient, as the optimiser is not very clever about
	1211	handling them.
	1212
	1213	An example of how this might be used is as follows:
	1214
	1215	    /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
	1216	     (?(DEFINE)
	1217	       (?<NAME_PAT>....)
	1218	       (?<ADDRESS_PAT>....)
	1219	     )/x
	1220
	1221	Note that capture groups matched inside of recursion are not accessible
	1222	after the recursion returns, so the extra layer of capturing groups is
	1223	necessary. Thus C<$+{NAME_PAT}> would not be defined even though
	1224	C<$+{NAME}> would be.
	1225
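A concrete sketch of the same layout. The bodies of C<NAME_PAT> and C<ADDRESS_PAT>, the angle brackets around the address, and the sample input are all hypothetical stand-ins for the rules elided above; they are far too simple for real names or addresses.

```perl
use strict;
use warnings;

# NAME_PAT and ADDRESS_PAT are illustrative rules only.
my $re = qr{
    (?<NAME> (?&NAME_PAT) ) \s* < (?<ADDR> (?&ADDRESS_PAT) ) >
    (?(DEFINE)
        (?<NAME_PAT>    [A-Za-z]+ (?: \s+ [A-Za-z]+ )* )
        (?<ADDRESS_PAT> [\w.]+ \@ [\w.]+ )
    )
}x;

my %got;
if ('Jane Doe <jane@example.org>' =~ $re) {
    %got = (
        name          => $+{NAME},
        addr          => $+{ADDR},
        inner_defined => defined $+{NAME_PAT} ? 1 : 0,
    );
    print "name=$got{name} addr=$got{addr} NAME_PAT defined=$got{inner_defined}\n";
}
```

The outer C<NAME> and C<ADDR> groups carry the results; the groups inside the DEFINE block act only as subroutines.
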
	1226	=item C<< (?>pattern) >>
	1227	X<backtrack> X<backtracking> X<atomic> X<possessive>
	1228
	1229	An "independent" subexpression, one which matches the substring
	1230	that a I<standalone> C<pattern> would match if anchored at the given
	1231	position, and it matches I<nothing other than this substring>. This
	1232	construct is useful for optimizations of what would otherwise be
	1233	"eternal" matches, because it will not backtrack (see L<"Backtracking">).
	1234	It may also be useful in places where the "grab all you can, and do not
	1235	give anything back" semantic is desirable.
	1236
	1237	For example: C<< ^(?>a*)ab >> will never match, since C<< (?>a*) >>
	1238	(anchored at the beginning of string, as above) will match I<all>
	1239	characters C<a> at the beginning of string, leaving no C<a> for
	1240	C<ab> to match. In contrast, C<a*ab> will match the same as C<a+b>,
	1241	since the match of the subgroup C<a*> is influenced by the following
	1242	group C<ab> (see L<"Backtracking">). In particular, C<a*> inside
	1243	C<a*ab> will match fewer characters than a standalone C<a*>, since
	1244	this makes the tail match.
	1245
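The contrast can be verified directly. A minimal sketch on the made-up string 'aaab', which also includes the look-ahead emulation of an independent subexpression (written with C<\g{1}> to keep the backreference visually separate from the literal text that follows):

```perl
use strict;
use warnings;

my $atomic    = 'aaab' =~ /^(?>a*)ab/        ? 1 : 0;   # keeps all three "a"s
my $greedy    = 'aaab' =~ /^a*ab/            ? 1 : 0;   # gives one "a" back
my $lookahead = 'aaab' =~ /^(?=(a*))\g{1}ab/ ? 1 : 0;   # emulation, also fails

print "atomic=$atomic greedy=$greedy lookahead=$lookahead\n";
```
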
	1246	An effect similar to C<< (?>pattern) >> may be achieved by writing
	1247	C<(?=(pattern))\g1>. This matches the same substring as a standalone
	1248	C<pattern>, and the following C<\g1> eats the matched string; it therefore
	1249	makes a zero-length assertion into an analogue of C<< (?>...) >>.
	1250	(The difference between these two constructs is that the second one
	1251	uses a capturing group, thus shifting ordinals of backreferences
	1252	in the rest of a regular expression.)
	1253
	1254	Consider this pattern:
	1255
	1256	    m{ \(
	1257	          (
	1258	            [^()]+              # x+
	1259	          |
	1260	            \( [^()]* \)
	1261	          )+
	1262	       \)
	1263	     }x
	1264
	1265	That will efficiently match a nonempty group with matching parentheses
	1266	two levels deep or less. However, if there is no such group, it
	1267	will take virtually forever on a long string. That's because there
	1268	are so many different ways to split a long string into several
	1269	substrings. This is what C<(.+)+> is doing, and C<(.+)+> is similar
	1270	to a subpattern of the above pattern. Consider how the pattern
	1271	above detects no-match on C<((()aaaaaaaaaaaaaaaaaa> in several
	1272	seconds, but that each extra letter doubles this time. This
	1273	exponential performance will make it appear that your program has
	1274	hung. However, a tiny change to this pattern
	1275
	1276	    m{ \(
	1277	          (
	1278	            (?> [^()]+ )        # change x+ above to (?> x+ )
	1279	          |
	1280	            \( [^()]* \)
	1281	          )+
	1282	       \)
	1283	     }x
	1284
	1285	which uses C<< (?>...) >> matches exactly when the one above does (verifying
	1286	this yourself would be a productive exercise), but finishes in a fourth
	1287	the time when used on a similar string with 1000000 C<a>s. Be aware,
	1288	however, that this pattern currently triggers a warning message under
	1289	the C<use warnings> pragma or B<-w> switch saying it
	1290	C<"matches null string many times in regex">.
	1291
	1292	On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
	1293	effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
	1294	This was only 4 times slower on a string with 1000000 C<a>s.
	1295
	1296	The "grab all you can, and do not give anything back" semantic is desirable
	1297	in many situations where at first sight a simple C<()*> looks like
	1298	the correct solution. Suppose we parse text with comments being delimited
	1299	by C<#> followed by some optional (horizontal) whitespace. Contrary to
	1300	its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
	1301	the comment delimiter, because it may "give up" some whitespace if
	1302	the remainder of the pattern can be made to match that way. The correct
	1303	answer is either one of these:
	1304
	1305	(?>#[ \t]*)
	1306	#[ \t]*(?![ \t])
	1307
	1308	For example, to grab non-empty comments into $1, one should use either
	1309	one of these:
	1310
	1311	/ (?> \# [ \t]* ) ( .+ ) /x;
	1312	/ \# [ \t]* ( [^ \t] .* ) /x;
	1313
	1314	Which one you pick depends on which of these expressions better reflects
	1315	the above specification of comments.
	1316
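The "give up some whitespace" problem shows most clearly on a comment that contains nothing but blanks. In this sketch the input (a hash sign followed by three spaces) and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $blank_comment = "#   ";    # a delimiter followed only by whitespace

# Plain quantifier: hands one space back so that (.+) has something to match.
my ($naive)  = $blank_comment =~ / \# [ \t]* ( .+ ) /x;
# Independent subexpression: keeps every blank, so nothing is left to capture.
my ($atomic) = $blank_comment =~ / (?> \# [ \t]* ) ( .+ ) /x;
# Explicit non-blank first character: also finds no comment text.
my ($strict) = $blank_comment =~ / \# [ \t]* ( [^ \t] .* ) /x;

printf "naive=%s atomic=%s strict=%s\n",
    map { defined($_) ? "'$_'" : 'no match' } $naive, $atomic, $strict;
```

Only the naive form reports a "comment", consisting of a single space that was meant to be part of the delimiter.
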
	1317	In some literature this construct is called "atomic matching" or
	1318	"possessive matching".
	1319
	1320	Possessive quantifiers are equivalent to putting the item they are applied
	1321	to inside of one of these constructs. The following equivalences apply:
	1322
	1323	    Quantifier Form     Bracketing Form
	1324	    ---------------     ---------------
	1325	    PAT*+               (?>PAT*)
	1326	    PAT++               (?>PAT+)
	1327	    PAT?+               (?>PAT?)
	1328	    PAT{min,max}+       (?>PAT{min,max})
	1329
	1330	=back
	1331
	1332	=head2 Special Backtracking Control Verbs
	1333
	1334	B<WARNING:> These patterns are experimental and subject to change or
	1335	removal in a future version of Perl. Their usage in production code should
	1336	be noted to avoid problems during upgrades.
	1337
	1338	These special patterns are generally of the form C<(*VERB:ARG)>. Unless
	1339	otherwise stated the ARG argument is optional; in some cases, it is
	1340	forbidden.
	1341
	1342	Any pattern containing a special backtracking verb that allows an argument
	1343	has the special behaviour that when executed it sets the current package's
	1344	C<$REGERROR> and C<$REGMARK> variables. When doing so the following
	1345	rules apply:
	1346
	1347	On failure, the C<$REGERROR> variable will be set to the ARG value of the
	1348	verb pattern, if the verb was involved in the failure of the match. If the
	1349	ARG part of the pattern was omitted, then C<$REGERROR> will be set to the
	1350	name of the last C<(*MARK:NAME)> pattern executed, or to TRUE if there was
	1351	none. Also, the C<$REGMARK> variable will be set to FALSE.
	1352
	1353	On a successful match, the C<$REGERROR> variable will be set to FALSE, and
	1354	the C<$REGMARK> variable will be set to the name of the last
	1355	C<(*MARK:NAME)> pattern executed. See the explanation for the
	1356	C<(*MARK:NAME)> verb below for more details.
	1357
	1358	B<NOTE:> C<$REGERROR> and C<$REGMARK> are not magic variables like C<$1>
	1359	and most other regex related variables. They are not local to a scope, nor
	1360	readonly, but instead are volatile package variables similar to C<$AUTOLOAD>.
	1361	Use C<local> to localize changes to them to a specific scope if necessary.
	1362
	1363	If a pattern does not contain a special backtracking verb that allows an
	1364	argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
	1365
	1366	=over 4
	1367
	1368	=item Verbs that take an argument
	1369
	1370	=over 4
	1371
	1372	=item C<(*PRUNE)> C<(*PRUNE:NAME)>
	1373	X<(*PRUNE)> X<(*PRUNE:NAME)>
	1374
	1375	This zero-width pattern prunes the backtracking tree at the current point
	1376	when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
	1377	where A and B are complex patterns. Until the C<(*PRUNE)> verb is reached,
	1378	A may backtrack as necessary to match. Once it is reached, matching
	1379	continues in B, which may also backtrack as necessary; however, should B
	1380	not match, then no further backtracking will take place, and the pattern
	1381	will fail outright at the current starting position.
	1382
	1383	The following example counts all the possible matching strings in a
	1384	pattern (without actually matching any of them).
	1385
	1386	'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
	1387	print "Count=$count\n";
	1388
	1389	which produces:
	1390
	1391	aaab
	1392	aaa
	1393	aa
	1394	a
	1395	aab
	1396	aa
	1397	a
	1398	ab
	1399	a
	1400	Count=9
	1401
	1402	If we add a C<(*PRUNE)> before the count like the following
	1403
	1404	    'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
	1405	print "Count=$count\n";
	1406
	1407	we prevent backtracking and find the count of the longest matching
	1408	string at each matching starting point like so:
	1409
	1410	aaab
	1411	aab
	1412	ab
	1413	Count=3
	1414
	1415	Any number of C<(*PRUNE)> assertions may be used in a pattern.
	1416
	1417	See also C<< (?>pattern) >> and possessive quantifiers for other ways to
	1418	control backtracking. In some cases, the use of C<(*PRUNE)> can be
	1419	replaced with a C<< (?>pattern) >> with no functional difference; however,
	1420	C<(*PRUNE)> can be used to handle cases that cannot be expressed using a
	1421	C<< (?>pattern) >> alone.
	1422
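The two counts can be reproduced without printing every candidate. A sketch that only increments a counter, declared with C<our> so the embedded code blocks can reach it; the names C<$all_candidates> and C<$pruned_candidates> are introduced here for illustration.

```perl
use strict;
use warnings;

our $count;    # package variable, visible inside the (?{ ... }) blocks

# Without (*PRUNE): every way of matching a+b? at every start position.
$count = 0;
'aaab' =~ /a+b?(?{ $count++ })(*FAIL)/;
my $all_candidates = $count;

# With (*PRUNE): only the first (longest) candidate at each start position.
$count = 0;
'aaab' =~ /a+b?(*PRUNE)(?{ $count++ })(*FAIL)/;
my $pruned_candidates = $count;

print "without (*PRUNE): $all_candidates, with (*PRUNE): $pruned_candidates\n";
```
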
	1423
	1424	=item C<(*SKIP)> C<(*SKIP:NAME)>
	1425	X<(*SKIP)>
	1426
	1427	This zero-width pattern is similar to C<(*PRUNE)>, except that on
	1428	failure it also signifies that whatever text was matched leading up
	1429	to the C<(*SKIP)> pattern being executed cannot be part of I<any> match
	1430	of this pattern. This effectively means that the regex engine "skips" forward
	1431	to this position on failure and tries to match again (assuming that
	1432	there is sufficient room to match).
	1433
	1434	The name of the C<(*SKIP:NAME)> pattern has special significance. If a
	1435	C<(*MARK:NAME)> was encountered while matching, then it is that position
	1436	which is used as the "skip point". If no C<(*MARK)> of that name was
	1437	encountered, then the C<(*SKIP)> operator has no effect. When used
	1438	without a name the "skip point" is where the match point was when
	1439	executing the (*SKIP) pattern.
	1440
	1441	Compare the following to the examples in C<(*PRUNE)>; note the string
	1442	is twice as long:
	1443
	1444	    'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
	1445	print "Count=$count\n";
	1446
	1447	outputs
	1448
	1449	aaab
	1450	aaab
	1451	Count=2
	1452
	1453	Once the 'aaab' at the start of the string has matched, and the C<(*SKIP)>
	1454	executed, the next starting point will be where the cursor was when the
	1455	C<(*SKIP)> was executed.
	1456
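For comparison, this sketch runs the same eight-character string through C<(*PRUNE)> and C<(*SKIP)> and counts how often the embedded block is reached. The counter is an C<our> variable; the two result names are assumptions of this example.

```perl
use strict;
use warnings;

our $count;    # package variable, visible inside the (?{ ... }) blocks

# (*PRUNE) gives up only the current start position, so every later
# position that begins with "a" is still tried.
$count = 0;
'aaabaaab' =~ /a+b?(*PRUNE)(?{ $count++ })(*FAIL)/;
my $with_prune = $count;

# (*SKIP) also moves the next start position past the text already matched.
$count = 0;
'aaabaaab' =~ /a+b?(*SKIP)(?{ $count++ })(*FAIL)/;
my $with_skip = $count;

print "(*PRUNE): $with_prune attempts, (*SKIP): $with_skip attempts\n";
```

With C<(*PRUNE)> each of the six positions holding an C<a> produces one attempt; with C<(*SKIP)> only the two positions that start an C<aaab> run do.
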
	1457	=item C<(*MARK:NAME)> C<(*:NAME)>
	1458	X<(*MARK)> X<(*MARK:NAME)> X<(*:NAME)>
	1459
	1460	This zero-width pattern can be used to mark the point reached in a string
	1461	when a certain part of the pattern has been successfully matched. This
	1462	mark may be given a name. A later C<(*SKIP)> pattern will then skip
	1463	forward to that point if backtracked into on failure. Any number of
	1464	C<(*MARK)> patterns are allowed, and the NAME portion may be duplicated.
	1465
	1466	In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:NAME)>
	1467	can be used to "label" a pattern branch, so that after matching, the
	1468	program can determine which branches of the pattern were involved in the
	1469	match.
	1470
	1471	When a match is successful, the C<$REGMARK> variable will be set to the
	1472	name of the most recently executed C<(*MARK:NAME)> that was involved
	1473	in the match.
	1474
	1475	This can be used to determine which branch of a pattern was matched
	1476	without using a separate capture group for each branch, which in turn
	1477	can result in a performance improvement, as perl cannot optimize
	1478	C</(?:(x)|(y)|(z))/> as efficiently as something like
	1479	C</(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/>.
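The branch-labelling idea above can be sketched as follows. This is a minimal illustration, assuming perl 5.10 or later; the helper name C<which_branch> is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# $REGMARK is an ordinary package variable, so declare it under strict.
our $REGMARK;

# Hypothetical helper: report which branch matched, using the mark name.
sub which_branch {
    my ($text) = @_;
    return $text =~ /(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/ ? $REGMARK : undef;
}

# Map each test string to the label of the branch that matched it.
my @labels = map { which_branch($_) // 'none' } qw(x y z q);
print "@labels\n";
```

No capture groups are needed here; the label comes entirely from the mark name.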
	1480
	1481	When a match has failed, and unless another verb has been involved in
	1482	failing the match and has provided its own name to use, the C<$REGERROR>
	1483	variable will be set to the name of the most recently executed
	1484	C<(*MARK:NAME)>.
	1485
	1486	See C<(*SKIP)> for more details.
	1487
	1488	As a shortcut, C<(*MARK:NAME)> can be written C<(*:NAME)>.
	1489
	1490	=item C<(*THEN)> C<(*THEN:NAME)>
	1491
	1492	This is similar to the "cut group" operator C<::> from Perl 6. Like
	1493	C<(*PRUNE)>, this verb always matches, and when backtracked into on
	1494	failure, it causes the regex engine to try the next alternation in the
	1495	innermost enclosing group (capturing or otherwise).
	1496
	1497	Its name comes from the observation that this operation combined with the
	1498	alternation operator (C<|>) can be used to create what is essentially a
	1499	pattern-based if/then/else block:
	1500
	1501	( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
	1502
	1503	Note that if this operator is used outside of an alternation, then
	1504	it acts exactly like the C<(*PRUNE)> operator.
	1505
	1506	/ A (*PRUNE) B /
	1507
	1508	is the same as
	1509
	1510	/ A (*THEN) B /
	1511
	1512	but
	1513
	1514	/ ( A (*THEN) B | C (*THEN) D ) /
	1515
	1516	is not the same as
	1517
	1518	/ ( A (*PRUNE) B | C (*PRUNE) D ) /
	1519
	1520	as after matching the A but failing on the B the C<(*THEN)> verb will
	1521	backtrack and try C; but the C<(*PRUNE)> verb will simply fail.
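The difference between the two verbs inside an alternation can be tried out with a small script. This is only a sketch, assuming a reasonably recent perl; the sample string and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $text = 'ac';

# (*THEN): when "b" fails, the engine moves on to the next alternative.
my $then_ok  = $text =~ /^(?: a (*THEN)  b | a c )$/x ? 1 : 0;

# (*PRUNE): when "b" fails, the attempt at this start position fails,
# and the ^ anchor rules out every later start position.
my $prune_ok = $text =~ /^(?: a (*PRUNE) b | a c )$/x ? 1 : 0;

print "THEN matched: $then_ok, PRUNE matched: $prune_ok\n";
```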
	1522
	1523	=item C<(*COMMIT)>
	1524	X<(*COMMIT)>
	1525
	1526	This is the Perl 6 "commit pattern" C<< <commit> >> or C<:::>. It's a
	1527	zero-width pattern similar to C<(*SKIP)>, except that when backtracked
	1528	into on failure it causes the match to fail outright. No further attempts
	1529	to find a valid match by advancing the start pointer will occur again.
	1530	For example,
	1531
	1532	'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
	1533	print "Count=$count\n";
	1534
	1535	outputs
	1536
	1537	aaab
	1538	Count=1
	1539
	1540	In other words, once the C<(*COMMIT)> has been entered, and if the pattern
	1541	does not match, the regex engine will not try any further matching on the
	1542	rest of the string.
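For a side-by-side comparison under C<use strict>, the C<(*SKIP)> and C<(*COMMIT)> examples can be combined into one script. This is a sketch rather than a definitive test; the array names are illustrative, and package variables are used so that the embedded code blocks can update them on any perl that supports these verbs.

```perl
use strict;
use warnings;

# Package variables that the embedded code blocks append to.
our (@skip_seen, @commit_seen);

# Each code block records the text matched so far, then (*FAIL)
# forces the engine to backtrack into the verb.
'aaabaaab' =~ /a+b?(*SKIP)(?{ push @skip_seen, $& })(*FAIL)/;
'aaabaaab' =~ /a+b?(*COMMIT)(?{ push @commit_seen, $& })(*FAIL)/;

printf "SKIP:   %d attempt(s): %s\n", scalar @skip_seen,   "@skip_seen";
printf "COMMIT: %d attempt(s): %s\n", scalar @commit_seen, "@commit_seen";
```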
	1543
	1544	=back
	1545
	1546	=item Verbs without an argument
	1547
	1548	=over 4
	1549
	1550	=item C<(*FAIL)> C<(*F)>
	1551	X<(*FAIL)> X<(*F)>
	1552
	1553	This pattern matches nothing and always fails. It can be used to force the
	1554	engine to backtrack. It is equivalent to C<(?!)>, but easier to read. In
	1555	fact, C<(?!)> gets optimised into C<(*FAIL)> internally.
	1556
	1557	It is probably useful only when combined with C<(?{})> or C<(??{})>.
	1558
	1559	=item C<(*ACCEPT)>
	1560	X<(*ACCEPT)>
	1561
	1562	B<WARNING:> This feature is highly experimental. It is not recommended
	1563	for production code.
	1564
	1565	This pattern matches nothing and causes the end of successful matching at
	1566	the point at which the C<(*ACCEPT)> pattern was encountered, regardless of
	1567	whether there is actually more to match in the string. When inside of a
	1568	nested pattern, such as recursion, or in a subpattern dynamically generated
	1569	via C<(??{})>, only the innermost pattern is ended immediately.
	1570
	1571	If the C<(*ACCEPT)> is inside of capturing groups then the groups are
	1572	marked as ended at the point at which the C<(*ACCEPT)> was encountered.
	1573	For instance:
	1574
	1575	'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
	1576
	1577	will match; C<$1> will be C<AB>, C<$2> will be C<B>, and C<$3> will not
	1578	be set. If another branch in the inner parentheses were matched, such as in the
	1579	string 'ACDE', then the C<D> and C<E> would have to be matched as well.
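The capture behaviour described above can be inspected with a short script. This is a sketch assuming a perl that supports C<(*ACCEPT)>; the variable names are illustrative.

```perl
use strict;
use warnings;

my @groups;
if ( 'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x ) {
    # Copy the capture variables before anything else can reset them.
    @groups = ($1, $2, $3);
}

# Show unset groups explicitly instead of printing an empty string.
my @shown = map { defined $_ ? "'$_'" : 'undef' } @groups;
print "@shown\n";
```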
	1580
	1581	=back
	1582
	1583	=back
	1584
	1585	=head2 Backtracking
	1586	X<backtrack> X<backtracking>
	1587
	1588	NOTE: This section presents an abstract approximation of regular
	1589	expression behavior. For a more rigorous (and complicated) view of
	1590	the rules involved in selecting a match among possible alternatives,
	1591	see L<Combining RE Pieces>.
	1592
	1593	A fundamental feature of regular expression matching involves the
	1594	notion called I<backtracking>, which is currently used (when needed)
	1595	by all non-possessive regular expression quantifiers, namely C<*>, C<*?>, C<+>,
	1596	C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
	1597	internally, but the general principle outlined here is valid.
	1598
	1599	For a regular expression to match, the I<entire> regular expression must
	1600	match, not just part of it. So if the beginning of a pattern containing a
	1601	quantifier succeeds in a way that causes later parts in the pattern to
	1602	fail, the matching engine backs up and recalculates the beginning
	1603	part--that's why it's called backtracking.
	1604
	1605	Here is an example of backtracking: Let's say you want to find the
	1606	word following "foo" in the string "Food is on the foo table.":
	1607
	1608	$_ = "Food is on the foo table.";
	1609	if ( /\b(foo)\s+(\w+)/i ) {
	1610	print "$2 follows $1.\n";
	1611	}
	1612
	1613	When the match runs, the first part of the regular expression (C<\b(foo)>)
	1614	finds a possible match right at the beginning of the string, and loads up
	1615	$1 with "Foo". However, as soon as the matching engine sees that there's
	1616	no whitespace following the "Foo" that it had saved in $1, it realizes its
	1617	mistake and starts over again one character after where it had the
	1618	tentative match. This time it goes all the way until the next occurrence
	1619	of "foo". The complete regular expression matches this time, and you get
	1620	the expected output of "table follows foo."
	1621
	1622	Sometimes minimal matching can help a lot. Imagine you'd like to match
	1623	everything between "foo" and "bar". Initially, you write something
	1624	like this:
	1625
	1626	$_ = "The food is under the bar in the barn.";
	1627	if ( /foo(.*)bar/ ) {
	1628	print "got <$1>\n";
	1629	}
	1630
	1631	Which perhaps unexpectedly yields:
	1632
	1633	got <d is under the bar in the >
	1634
	1635	That's because C<.*> was greedy, so you get everything between the
	1636	I<first> "foo" and the I<last> "bar". Here it's more effective
	1637	to use minimal matching to make sure you get the text between a "foo"
	1638	and the first "bar" thereafter.
	1639
	1640	if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
	1641	got <d is under the >
	1642
	1643	Here's another example. Let's say you'd like to match a number at the end
	1644	of a string, and you also want to keep the preceding part of the match.
	1645	So you write this:
	1646
	1647	$_ = "I have 2 numbers: 53147";
	1648	if ( /(.*)(\d*)/ ) { # Wrong!
	1649	print "Beginning is <$1>, number is <$2>.\n";
	1650	}
	1651
	1652	That won't work at all, because C<.*> was greedy and gobbled up the
	1653	whole string. As C<\d*> can match on an empty string the complete
	1654	regular expression matched successfully.
	1655
	1656	Beginning is <I have 2 numbers: 53147>, number is <>.
	1657
	1658	Here are some variants, most of which don't work:
	1659
	1660	$_ = "I have 2 numbers: 53147";
	1661	@pats = qw{
	1662	(.*)(\d*)
	1663	(.*)(\d+)
	1664	(.*?)(\d*)
	1665	(.*?)(\d+)
	1666	(.*)(\d+)$
	1667	(.*?)(\d+)$
	1668	(.*)\b(\d+)$
	1669	(.*\D)(\d+)$
	1670	};
	1671
	1672	for $pat (@pats) {
	1673	printf "%-12s ", $pat;
	1674	if ( /$pat/ ) {
	1675	print "<$1> <$2>\n";
	1676	} else {
	1677	print "FAIL\n";
	1678	}
	1679	}
	1680
	1681	That will print out:
	1682
	1683	(.*)(\d*)    <I have 2 numbers: 53147> <>
	1684	(.*)(\d+)    <I have 2 numbers: 5314> <7>
	1685	(.*?)(\d*)   <> <>
	1686	(.*?)(\d+)   <I have > <2>
	1687	(.*)(\d+)$   <I have 2 numbers: 5314> <7>
	1688	(.*?)(\d+)$  <I have 2 numbers: > <53147>
	1689	(.*)\b(\d+)$ <I have 2 numbers: > <53147>
	1690	(.*\D)(\d+)$ <I have 2 numbers: > <53147>
	1691
	1692	As you see, this can be a bit tricky. It's important to realize that a
	1693	regular expression is merely a set of assertions that gives a definition
	1694	of success. There may be 0, 1, or several different ways that the
	1695	definition might succeed against a particular string. And if there are
	1696	multiple ways it might succeed, you need to understand backtracking to
	1697	know which variety of success you will achieve.
	1698
	1699	When using look-ahead assertions and negations, this can all get even
	1700	trickier. Imagine you'd like to find a sequence of non-digits not
	1701	followed by "123". You might try to write that as
	1702
	1703	$_ = "ABC123";
	1704	if ( /^\D*(?!123)/ ) { # Wrong!
	1705	print "Yup, no 123 in $_\n";
	1706	}
	1707
	1708	But that isn't going to match; at least, not the way you're hoping. It
	1709	claims that there is no 123 in the string. Here's a clearer picture of
	1710	why that pattern matches, contrary to popular expectations:
	1711
	1712	$x = 'ABC123';
	1713	$y = 'ABC445';
	1714
	1715	print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
	1716	print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
	1717
	1718	print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
	1719	print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
	1720
	1721	This prints
	1722
	1723	2: got ABC
	1724	3: got AB
	1725	4: got ABC
	1726
	1727	You might have expected test 3 to fail because it seems to be a more
	1728	general-purpose version of test 1. The important difference between
	1729	them is that test 3 contains a quantifier (C<\D*>) and so can use
	1730	backtracking, whereas test 1 will not. What's happening is
	1731	that you've asked "Is it true that at the start of $x, following 0 or more
	1732	non-digits, you have something that's not 123?" If the pattern matcher had
	1733	let C<\D*> expand to "ABC", this would have caused the whole pattern to
	1734	fail.
	1735
	1736	The search engine will initially match C<\D*> with "ABC". Then it will
	1737	try to match C<(?!123)> with "123", which fails. But because
	1738	a quantifier (C<\D*>) has been used in the regular expression, the
	1739	search engine can backtrack and retry the match differently
	1740	in the hope of matching the complete regular expression.
	1741
	1742	The pattern really, I<really> wants to succeed, so it uses the
	1743	standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this
	1744	time. Now there's indeed something following "AB" that is not
	1745	"123". It's "C123", which suffices.
	1746
	1747	We can deal with this by using both an assertion and a negation.
	1748	We'll say that the first part in $1 must be followed both by a digit
	1749	and by something that's not "123". Remember that the look-aheads
	1750	are zero-width expressions--they only look, but don't consume any
	1751	of the string in their match. So rewriting this way produces what
	1752	you'd expect; that is, case 5 will fail, but case 6 succeeds:
	1753
	1754	print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
	1755	print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
	1756
	1757	6: got ABC
	1758
	1759	In other words, the two zero-width assertions next to each other work as though
	1760	they're ANDed together, just as you'd use any built-in assertions: C</^$/>
	1761	matches only if you're at the beginning of the line AND the end of the
	1762	line simultaneously. The deeper underlying truth is that juxtaposition in
	1763	regular expressions always means AND, except when you write an explicit OR
	1764	using the vertical bar. C</ab/> means match "a" AND (then) match "b",
	1765	although the attempted matches are made at different positions because "a"
	1766	is not a zero-width assertion, but a one-width assertion.
	1767
	1768	B<WARNING>: Particularly complicated regular expressions can take
	1769	exponential time to solve because of the immense number of possible
	1770	ways they can use backtracking to try for a match. For example, without
	1771	internal optimizations done by the regular expression engine, this will
	1772	take a painfully long time to run:
	1773
	1774	'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
	1775
	1776	And if you used C<*>'s in the internal groups instead of limiting them
	1777	to 0 through 5 matches, then it would take forever--or until you ran
	1778	out of stack space. Moreover, these internal optimizations are not
	1779	always applicable. For example, if you put C<{0,5}> instead of C<*>
	1780	on the external group, no current optimization is applicable, and the
	1781	match takes a long time to finish.
	1782
	1783	A powerful tool for optimizing such beasts is what is known as an
	1784	"independent group",
	1785	which does not backtrack (see L<C<< (?>pattern) >>>). Note also that
	1786	zero-length look-ahead/look-behind assertions will not backtrack to make
	1787	the tail match, since they are in "logical" context: only
	1788	whether they match is considered relevant. For an example
	1789	where side-effects of look-ahead I<might> have influenced the
	1790	following match, see L<C<< (?>pattern) >>>.
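The effect of an independent group on backtracking can be seen in a small contrast. This is a minimal sketch; the sample string is chosen only for illustration.

```perl
use strict;
use warnings;

my $text = 'aaab';

# Ordinary quantifier: a+ gives back one "a" so that "ab" can match.
my $plain  = $text =~ /a+ab/     ? 1 : 0;

# Independent group: whatever (?>a+) has taken is never given back,
# so the following "a" has nothing left to match.
my $atomic = $text =~ /(?>a+)ab/ ? 1 : 0;

print "plain: $plain, atomic: $atomic\n";
```

The same refusal to backtrack is what keeps such a group from exploring an exponential number of alternatives.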
	1791
	1792	=head2 Version 8 Regular Expressions
	1793	X<regular expression, version 8> X<regex, version 8> X<regexp, version 8>
	1794
	1795	In case you're not familiar with the "regular" Version 8 regex
	1796	routines, here are the pattern-matching rules not described above.
	1797
	1798	Any single character matches itself, unless it is a I<metacharacter>
	1799	with a special meaning described here or above. You can cause
	1800	characters that normally function as metacharacters to be interpreted
	1801	literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
	1802	character; "\\" matches a "\"). This escape mechanism is also required
	1803	for the character used as the pattern delimiter.
	1804
	1805	A series of characters matches that series of characters in the target
	1806	string, so the pattern C<blurfl> would match "blurfl" in the target
	1807	string.
	1808
	1809	You can specify a character class, by enclosing a list of characters
	1810	in C<[]>, which will match any character from the list. If the
	1811	first character after the "[" is "^", the class matches any character not
	1812	in the list. Within a list, the "-" character specifies a
	1813	range, so that C<a-z> represents all characters between "a" and "z",
	1814	inclusive. If you want either "-" or "]" itself to be a member of a
	1815	class, put it at the start of the list (possibly after a "^"), or
	1816	escape it with a backslash. "-" is also taken literally when it is
	1817	at the end of the list, just before the closing "]". (The
	1818	following all specify the same class of three characters: C<[-az]>,
	1819	C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
	1820	specifies a class containing twenty-six characters, even on EBCDIC-based
	1821	character sets.) Also, if you try to use the character
	1822	classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
	1823	a range, the "-" is understood literally.
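The rules for a literal "-" inside a class can be checked with a short script. This is a sketch; the probe characters are arbitrary choices.

```perl
use strict;
use warnings;

# Three spellings of the same three-character class, plus a true range.
my %class = (
    '[-az]'  => qr/^[-az]$/,
    '[az-]'  => qr/^[az-]$/,
    '[a\-z]' => qr/^[a\-z]$/,
    '[a-z]'  => qr/^[a-z]$/,
);

# For each class, collect which of the probe characters it accepts.
my %matched;
for my $name (sort keys %class) {
    $matched{$name} = join '', grep { $_ =~ $class{$name} } ('-', 'a', 'm', 'z');
    printf "%-7s matches: %s\n", $name, $matched{$name};
}
```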
	1824
	1825	Note also that the whole range idea is rather unportable between
	1826	character sets--and even within character sets they may cause results
	1827	you probably didn't expect. A sound principle is to use only ranges
	1828	that begin from and end at either alphabetics of equal case ([a-e],
	1829	[A-E]), or digits ([0-9]). Anything else is unsafe. If in doubt,
	1830	spell out the character sets in full.
	1831
	1832	Characters may be specified using a metacharacter syntax much like that
	1833	used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
	1834	"\f" a form feed, etc. More generally, \I<nnn>, where I<nnn> is a string
	1835	of three octal digits, matches the character whose coded character set value
	1836	is I<nnn>. Similarly, \xI<nn>, where I<nn> are hexadecimal digits,
	1837	matches the character whose ordinal is I<nn>. The expression \cI<x>
	1838	matches the character control-I<x>. Finally, the "." metacharacter
	1839	matches any character except "\n" (unless you use C</s>).
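A few of these escapes can be verified directly. The sketch below assumes an ASCII-based platform, where tab is octal 011 and "A" is hexadecimal 41; the hash keys are illustrative labels.

```perl
use strict;
use warnings;

# Each entry is 1 if the pattern matched and 0 otherwise.
my %result = (
    octal_tab     => ( "\t"   =~ /\A\011\z/ ? 1 : 0 ),   # \nnn, octal
    hex_A         => ( "A"    =~ /\A\x41\z/ ? 1 : 0 ),   # \xnn, hexadecimal
    control_A     => ( chr(1) =~ /\A\cA\z/  ? 1 : 0 ),   # \cx, control character
    dot_newline   => ( "\n"   =~ /\A.\z/    ? 1 : 0 ),   # "." skips newline
    dot_newline_s => ( "\n"   =~ /\A.\z/s   ? 1 : 0 ),   # unless /s is used
);

print "$_: $result{$_}\n" for sort keys %result;
```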
	1840
	1841	You can specify a series of alternatives for a pattern using "|" to
	1842	separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
	1843	or "foe" in the target string (as would C<f(e|i|o)e>). The
	1844	first alternative includes everything from the last pattern delimiter
	1845	("(", "[", or the beginning of the pattern) up to the first "|", and
	1846	the last alternative contains everything from the last "|" to the next
	1847	pattern delimiter. That's why it's common practice to include
	1848	alternatives in parentheses: to minimize confusion about where they
	1849	start and end.
	1850
	1851	Alternatives are tried from left to right, so the first
	1852	alternative found for which the entire expression matches, is the one that
	1853	is chosen. This means that alternatives are not necessarily greedy. For
	1854	example: when matching C<foo|foot> against "barefoot", only the "foo"
	1855	part will match, as that is the first alternative tried, and it successfully
	1856	matches the target string. (This might not seem important, but it is
	1857	important when you are capturing matched text using parentheses.)
	1858
	1859	Also remember that "|" is interpreted as a literal within square brackets,
	1860	so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
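The left-to-right rule and the literal bar inside a class can both be demonstrated briefly. This is a sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $word = 'barefoot';

# The first alternative that lets the whole pattern succeed is chosen.
my ($short_first) = $word =~ /(foo|foot)/;
my ($long_first)  = $word =~ /(foot|foo)/;

# Inside square brackets the bar is just another member of the class.
my $bar_in_class = '|' =~ /\A[fee|fie|foe]\z/ ? 1 : 0;

print "$short_first $long_first $bar_in_class\n";
```

Listing the longer alternative first is one way to get the longer match when both alternatives can succeed.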
	1861
	1862	Within a pattern, you may designate subpatterns for later reference
	1863	by enclosing them in parentheses, and you may refer back to the
	1864	I<n>th subpattern later in the pattern using the metacharacter
	1865	\I<n>. Subpatterns are numbered based on the left to right order
	1866	of their opening parenthesis. A backreference matches whatever
	1867	actually matched the subpattern in the string being examined, not
	1868	the rules for that subpattern. Therefore, C<(0|0x)\d*\s\g1\d*> will
	1869	match "0x1234 0x4321", but not "0x1234 01234", because subpattern
	1870	1 matched "0x", even though the rule C<0|0x> could potentially match
	1871	the leading 0 in the second number.
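The backreference example can be run as written. The following sketch applies the pattern to both strings; the hash name is illustrative.

```perl
use strict;
use warnings;

# \g1 must repeat the text that group 1 actually matched.
my $re = qr/(0|0x)\d*\s\g1\d*/;

my %verdict;
for my $text ('0x1234 0x4321', '0x1234 01234') {
    $verdict{$text} = $text =~ $re ? 'matches' : 'does not match';
    print "'$text' $verdict{$text}\n";
}
```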
	1872
	1873	=head2 Warning on \1 Instead of $1
	1874
	1875	Some people get too used to writing things like:
	1876
	1877	$pattern =~ s/(\W)/\\\1/g;
	1878
	1879	This is grandfathered (for \1 to \9) for the RHS of a substitute to avoid
	1880	shocking the
	1881	B<sed> addicts, but it's a dirty habit to get into. That's because in
	1882	PerlThink, the righthand side of an C<s///> is a double-quoted string. C<\1> in
	1883	the usual double-quoted string means a control-A. The customary Unix
	1884	meaning of C<\1> is kludged in for C<s///>. However, if you get into the habit
	1885	of doing that, you get yourself into trouble if you then add an C</e>
	1886	modifier.
	1887
	1888	s/(\d+)/ \1 + 1 /eg; # causes warning under -w
	1889
	1890	Or if you try to do
	1891
	1892	s/(\d+)/\1000/;
	1893
	1894	You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with
	1895	C<${1}000>. The operation of interpolation should not be confused
	1896	with the operation of matching a backreference. Certainly they mean two
	1897	different things on the I<left> side of the C<s///>.
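The C<$1> forms recommended here behave as follows. This is a short sketch; the sample strings are invented for illustration.

```perl
use strict;
use warnings;

my $price = 'cost: 5';
# ${1}000 is unambiguous: group 1 followed by the literal text "000".
(my $thousands = $price) =~ s/(\d+)/${1}000/;

my $counter = 'item 41';
# With /e the replacement is Perl code, so $1 is an ordinary variable.
(my $bumped = $counter) =~ s/(\d+)/$1 + 1/e;

print "$thousands\n$bumped\n";
```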
	1898
	1899	=head2 Repeated Patterns Matching a Zero-length Substring
	1900
	1901	B<WARNING>: Difficult material (and prose) ahead. This section needs a rewrite.
	1902
	1903	Regular expressions provide a terse and powerful programming language. As
	1904	with most other power tools, power comes together with the ability
	1905	to wreak havoc.
	1906
	1907	A common abuse of this power stems from the ability to make infinite
	1908	loops using regular expressions, with something as innocuous as:
	1909
	1910	'foo' =~ m{ ( o? )* }x;
	1911
	1912	The C<o?> matches at the beginning of C<'foo'>, and since the position
	1913	in the string is not moved by the match, C<o?> would match again and again
	1914	because of the C<*> quantifier. Another common way to create a similar cycle
	1915	is with the looping modifier C<//g>:
	1916
	1917	@matches = ( 'foo' =~ m{ o? }xg );
	1918
	1919	or
	1920
	1921	print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
	1922	may match zero-length substrings. Here's a simple example:
	1923	or the loop implied by split().
	1924
	1925	However, long experience has shown that many programming tasks may
	1926	be significantly simplified by using repeated subexpressions that
	1927	may match zero-length substrings. Here's a simple example being:
	1928
	1929	@chars = split //, $string; # // is not magic in split
	1930	($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
	1931
	1932	Thus Perl allows such constructs, by I<forcefully breaking
	1933	the infinite loop>. The rules for this are different for lower-level
	1934	loops given by the greedy quantifiers C<*+{}>, and for higher-level
	1935	ones like the C</g> modifier or split() operator.
	1936
	1937	The lower-level loops are I<interrupted> (that is, the loop is
	1938	broken) when Perl detects that a repeated expression matched a
	1939	zero-length substring. Thus
	1940
	1941	m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
	1942
	1943	is made equivalent to
	1944
	1945	m{ (?: NON_ZERO_LENGTH )*
	1946	 |
	1947	   (?: ZERO_LENGTH )?
	1948	}x;
	1949
	1950	The higher-level loops preserve an additional state between iterations:
	1951	whether the last match was zero-length. To break the loop, the following
	1952	match after a zero-length match is prohibited from having a length of zero.
	1953	This prohibition interacts with backtracking (see L<"Backtracking">),
	1954	and so the I<second best> match is chosen if the I<best> match is of
	1955	zero length.
	1956
	1957	For example:
	1958
	1959	$_ = 'bar';
	1960	s/\w??/<$&>/g;
	1961
	1962	results in C<< <><b><><a><><r><> >>. At each position of the string the best
	1963	match given by non-greedy C<??> is the zero-length match, and the I<second
	1964	best> match is what is matched by C<\w>. Thus zero-length matches
	1965	alternate with one-character-long matches.
	1966
	1967	Similarly, for repeated C<m/()/g> the second-best match is the match at the
	1968	position one notch further in the string.
	1969
	1970	The additional state of being I<matched with zero-length> is associated with
	1971	the matched string, and is reset by each assignment to pos().
	1972	Zero-length matches at the end of the previous match are ignored
	1973	during C<split>.
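Two of the higher-level loops discussed in this section can be observed directly. This is a minimal sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

# Zero-length and one-character matches alternate under /g.
(my $marked = 'bar') =~ s/\w??/<$&>/g;

# An empty pattern in split() separates every character and terminates.
my @chars = split //, 'bar';

print "$marked\n";
print join(',', @chars), "\n";
```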
	1974
	1975	=head2 Combining RE Pieces
	1976
	1977	Each of the elementary pieces of regular expressions which were described
	1978	before (such as C<ab> or C<\Z>) could match at most one substring
	1979	at the given position of the input string. However, in a typical regular
	1980	expression these elementary pieces are combined into more complicated
	1981	patterns using combining operators C<ST>, C<S|T>, C<S*> etc.
	1982	(in these examples C<S> and C<T> are regular subexpressions).
	1983
	1984	Such combinations can include alternatives, leading to a problem of choice:
	1985	if we match a regular expression C<a|ab> against C<"abc">, will it match
	1986	substring C<"a"> or C<"ab">? One way to describe which substring is
	1987	actually matched is the concept of backtracking (see L<"Backtracking">).
	1988	However, this description is too low-level and makes you think
	1989	in terms of a particular implementation.
	1990
	1991	Another description starts with notions of "better"/"worse". All the
	1992	substrings which may be matched by the given regular expression can be
	1993	sorted from the "best" match to the "worst" match, and it is the "best"
	1994	match which is chosen. This substitutes the question of "what is chosen?"
	1995	by the question of "which matches are better, and which are worse?".
	1996
	1997	Again, for elementary pieces there is no such question, since at most
	1998	one match at a given position is possible. This section describes the
	1999	notion of better/worse for combining operators. In the description
	2000	below C<S> and C<T> are regular subexpressions.
	2001
	2002	=over 4
	2003
	2004	=item C<ST>
	2005
	2006	Consider two possible matches, C<AB> and C<A'B'>, where C<A> and C<A'> are
	2007	substrings which can be matched by C<S>, and C<B> and C<B'> are substrings
	2008	which can be matched by C<T>.
	2009
	2010	If C<A> is a better match for C<S> than C<A'>, C<AB> is a better
	2011	match than C<A'B'>.
	2012
	2013	If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if
	2014	C<B> is a better match for C<T> than C<B'>.
	2015
	2016	=item C<S|T>
	2017
	2018	When C<S> can match, it is a better match than when only C<T> can match.
	2019
	2020	Ordering of two matches for C<S> is the same as for C<S>. Similarly for
	2021	two matches for C<T>.
	2022
	2023	=item C<S{REPEAT_COUNT}>
	2024
	2025	Matches as C<SSS...S> (repeated as many times as necessary).
	2026
	2027	=item C<S{min,max}>
	2028
	2029	Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>.
	2030
	2031	=item C<S{min,max}?>
	2032
	2033	Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>.
	2034
	2035	=item C<S?>, C<S*>, C<S+>
	2036
	2037	Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively.
	2038
	2039	=item C<S??>, C<S*?>, C<S+?>
	2040
	2041	Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively.
	2042
	2043	=item C<< (?>S) >>
	2044
	2045	Matches the best match for C<S> and only that.
	2046
	2047	=item C<(?=S)>, C<(?<=S)>
	2048
	2049	Only the best match for C<S> is considered. (This is important only if
	2050	C<S> has capturing parentheses, and backreferences are used somewhere
	2051	else in the whole regular expression.)
	2052
	2053	=item C<(?!S)>, C<(?<!S)>
	2054
	2055	For this grouping operator there is no need to describe the ordering, since
	2056	only whether or not C<S> can match is important.
	2057
	2058	=item C<(??{ EXPR })>, C<(?PARNO)>
	2059
	2060	The ordering is the same as for the regular expression which is
	2061	the result of EXPR, or the pattern contained by capture group PARNO.
	2062
	2063	=item C<(?(condition)yes-pattern|no-pattern)>
	2064
	2065	Recall that which of C<yes-pattern> or C<no-pattern> actually matches is
	2066	already determined. The ordering of the matches is the same as for the
	2067	chosen subexpression.
	2068
	2069	=back
	2070
	2071	The above recipes describe the ordering of matches I<at a given position>.
	2072	One more rule is needed to understand how a match is determined for the
	2073	whole regular expression: a match at an earlier position is always better
	2074	than a match at a later position.
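Several of the ordering rules above can be observed with small patterns. This is a sketch; the helper C<matched> is hypothetical and simply returns the text a pattern matched.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a pattern matches, or undef.
sub matched {
    my ($text, $re) = @_;
    return $text =~ $re ? substr($text, $-[0], $+[0] - $-[0]) : undef;
}

my %got = (
    'a|ab on abc'     => matched('abc',  qr/a|ab/),      # S|T: S is preferred
    'a{1,3} on aaaa'  => matched('aaaa', qr/a{1,3}/),    # largest count first
    'a{1,3}? on aaaa' => matched('aaaa', qr/a{1,3}?/),   # smallest count first
    'ab|x on xab'     => matched('xab',  qr/ab|x/),      # earlier position wins
);

printf "%-16s => %s\n", $_, $got{$_} for sort keys %got;
```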
	2075
	2076	=head2 Creating Custom RE Engines
	2077
	2078	Overloaded constants (see L<overload>) provide a simple way to extend
	2079	the functionality of the RE engine.
	2080
	2081	Suppose that we want to enable a new RE escape-sequence C<\Y|> which
	2082	matches at a boundary between whitespace characters and non-whitespace
	2083	characters. Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
	2084	at these positions, so we want to have each C<\Y|> in the place of the
	2085	more complicated version. We can create a module C<customre> to do
	2086	this:
	2087
	2088	package customre;
	2089	use overload;
	2090
	2091	sub import {
	2092	shift;
	2093	die "No argument to customre::import allowed" if @_;
	2094	overload::constant 'qr' => \&convert;
	2095	}
	2096
	2097	sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
	2098
	2099	# We must also take care of not escaping the legitimate \\Y|
	2100	# sequence, hence the presence of '\\' in the conversion rules.
	2101	my %rules = ( '\\' => '\\\\',
	2102	              'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
	2103	sub convert {
	2104	    my $re = shift;
	2105	    $re =~ s{
	2106	              \\ ( \\ | Y . )
	2107	            }
	2108	            { $rules{$1} or invalid($re,$1) }sgex;
	2109	    return $re;
	2110	}
	2111
	2112	Now C<use customre> enables the new escape in constant regular
	2113	expressions, i.e., those without any runtime variable interpolations.
	2114	As documented in L<overload>, this conversion will work only over
	2115	literal parts of regular expressions. For C<\Y|$re\Y|> the variable
	2116	part of this regular expression needs to be converted explicitly
	2117	(but only if the special meaning of C<\Y|> should be enabled inside $re):
	2118
	2119	use customre;
	2120	$re = <>;
	2121	chomp $re;
	2122	$re = customre::convert $re;
	2123	/\Y|$re\Y|/;
	2124
	2125	=head1 PCRE/Python Support
	2126
	2127	As of Perl 5.10.0, Perl supports several Python/PCRE specific extensions
	2128	to the regex syntax. While Perl programmers are encouraged to use the
	2129	Perl specific syntax, the following are also accepted:
	2130
	2131	=over 4
	2132
	2133	=item C<< (?PE<lt>NAMEE<gt>pattern) >>
	2134
	2135	Define a named capture group. Equivalent to C<< (?<NAME>pattern) >>.
	2136
	2137	=item C<< (?P=NAME) >>
	2138
	2139	Backreference to a named capture group. Equivalent to C<< \g{NAME} >>.
	2140
	2141	=item C<< (?P>NAME) >>
	2142
	2143	Subroutine call to a named capture group. Equivalent to C<< (?&NAME) >>.
	2144
	2145	=back
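The three Python/PCRE-style constructs can be exercised together. This is a sketch assuming perl 5.10 or later; the group name C<word> and the sample strings are illustrative.

```perl
use strict;
use warnings;

# (?P<NAME>...) defines the group; (?P=NAME) must repeat its matched text.
my $repeated = 'abc-abc' =~ /\A(?P<word>\w+)-(?P=word)\z/ ? $+{word} : undef;

# (?P>NAME) re-runs the group's pattern, so different text is accepted.
my $same_shape = 'abc-xyz' =~ /\A(?P<word>\w+)-(?P>word)\z/ ? 1 : 0;

# The backreference form rejects text that merely has the same shape.
my $same_text  = 'abc-xyz' =~ /\A(?P<word>\w+)-(?P=word)\z/ ? 1 : 0;

print "$repeated $same_shape $same_text\n";
```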
	2146
	2147	=head1 BUGS
	2148
	2149	There are numerous problems with case insensitive matching of characters
	2150	outside the ASCII range, especially with those whose folds are multiple
	2151	characters, such as ligatures like C<LATIN SMALL LIGATURE FF>.
	2152
	2153	In a bracketed character class with case insensitive matching, ranges only work
	2154	for ASCII characters. For example,
	2155	C<m/[\N{CYRILLIC CAPITAL LETTER A}-\N{CYRILLIC CAPITAL LETTER YA}]/i>
	2156	doesn't match all the Russian upper and lower case letters.
	2157
	2158	Many regular expression constructs don't work on EBCDIC platforms.
	2159
	2160	This document varies from difficult to understand to completely
	2161	and utterly opaque. The wandering prose riddled with jargon is
	2162	hard to fathom in several places.
	2163
	2164	This document needs a rewrite that separates the tutorial content
	2165	from the reference content.
	2166
	2167	=head1 SEE ALSO
	2168
	2169	L<perlrequick>.
	2170
	2171	L<perlretut>.
	2172
	2173	L<perlop/"Regexp Quote-Like Operators">.
	2174
	2175	L<perlop/"Gory details of parsing quoted constructs">.
	2176
	2177	L<perlfaq6>.
	2178
	2179	L<perlfunc/pos>.
	2180
	2181	L<perllocale>.
	2182
	2183	L<perlebcdic>.
	2184
	2185	I<Mastering Regular Expressions> by Jeffrey Friedl, published
	2186	by O'Reilly and Associates.