perl5.git.perl.org Git - perl5.git/blame_incremental

	1	=head1 NAME
	2	X<regular expression> X<regex> X<regexp>
	3
	4	perlre - Perl regular expressions
	5
	6	=head1 DESCRIPTION
	7
	8	This page describes the syntax of regular expressions in Perl.
	9
	10	If you haven't used regular expressions before, a quick-start
	11	introduction is available in L<perlrequick>, and a longer tutorial
	12	introduction is available in L<perlretut>.
	13
	14	For reference on how regular expressions are used in matching
	15	operations, plus various examples of the same, see discussions of
	16	C<m//>, C<s///>, C<qr//> and C<??> in L<perlop/"Regexp Quote-Like
	17	Operators">.
	18
	19
	20	=head2 Modifiers
	21
	22	Matching operations can have various modifiers. Modifiers
	23	that relate to the interpretation of the regular expression inside
	24	are listed below. Modifiers that alter the way a regular expression
	25	is used by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and
	26	L<perlop/"Gory details of parsing quoted constructs">.
	27
	28	=over 4
	29
	30	=item m
	31	X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>
	32
	33	Treat string as multiple lines. That is, change "^" and "$" from matching
	34	the start or end of the string to matching the start or end of any
	35	line anywhere within the string.
	36
	37	=item s
	38	X</s> X<regex, single-line> X<regexp, single-line>
	39	X<regular expression, single-line>
	40
	41	Treat string as single line. That is, change "." to match any character
	42	whatsoever, even a newline, which normally it would not match.
	43
	44	Used together, as /ms, they let the "." match any character whatsoever,
	45	while still allowing "^" and "$" to match, respectively, just after
	46	and just before newlines within the string.
	47
	48	=item i
	49	X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
	50	X<regular expression, case-insensitive>
	51
	52	Do case-insensitive pattern matching.
	53
	54	If C<use locale> is in effect, the case map is taken from the current
	55	locale. See L<perllocale>.
	56
	57	=item x
	58	X</x>
	59
	60	Extend your pattern's legibility by permitting whitespace and comments.
	61
	62	=item p
	63	X</p> X<regex, preserve> X<regexp, preserve>
	64
	65	Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
	66	${^POSTMATCH} are available for use after matching.
	67
	68	=back
	69
	70	These are usually written as "the C</x> modifier", even though the delimiter
	71	in question might not really be a slash. Any of these
	72	modifiers may also be embedded within the regular expression itself using
	73	the C<(?...)> construct. See below.
	74
	75	The C</x> modifier itself needs a little more explanation. It tells
	76	the regular expression parser to ignore whitespace that is neither
	77	backslashed nor within a character class. You can use this to break up
	78	your regular expression into (slightly) more readable parts. The C<#>
	79	character is also treated as a metacharacter introducing a comment,
	80	just as in ordinary Perl code. This also means that if you want real
	81	whitespace or C<#> characters in the pattern (outside a character
	82	class, where they are unaffected by C</x>), then you'll either have to
	83	escape them (using backslashes or C<\Q...\E>) or encode them using octal
	84	or hex escapes. Taken together, these features go a long way towards
	85	making Perl's regular expressions more readable. Note that you have to
	86	be careful not to include the pattern delimiter in the comment--perl has
	87	no way of knowing you did not intend to close the pattern early. See
	88	the C-comment deletion code in L<perlop>. Also note that anything inside
	89	a C<\Q...\E> stays unaffected by C</x>.
	90	X</x>
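
As a minimal sketch of the C</m>, C</s>, and C</x> modifiers described above (the sample string and variable names are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $text = "first line\nsecond line\n";

# Without /m, "^" matches only at the start of the string.
my @plain = $text =~ /^(\w+)/g;     # ("first")

# With /m, "^" also matches just after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;    # ("first", "second")

# Without /s, "." will not cross the newline; with /s it will.
my $dot_plain  = ($text =~ /first.*second/)  ? 1 : 0;   # 0
my $dot_single = ($text =~ /first.*second/s) ? 1 : 0;   # 1

# With /x, unescaped whitespace and "#" comments in the pattern are ignored.
my ($word) = $text =~ /
    ^ (\w+)     # leading word
    \s+ line    # then whitespace and the literal "line"
/x;

print "@plain | @multi | $dot_plain$dot_single | $word\n";
# prints "first | first second | 01 | first"
```

Note that the comments inside the C</x> pattern avoid the pattern delimiter, for the reason given above.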
	91
	92	=head2 Regular Expressions
	93
	94	=head3 Metacharacters
	95
	96	The patterns used in Perl pattern matching evolved from the ones supplied in
	97	the Version 8 regex routines. (The routines are derived
	98	(distantly) from Henry Spencer's freely redistributable reimplementation
	99	of the V8 routines.) See L<Version 8 Regular Expressions> for
	100	details.
	101
	102	In particular the following metacharacters have their standard I<egrep>-ish
	103	meanings:
	104	X<metacharacter>
	105	X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>
	106
	107
	108	\ Quote the next metacharacter
	109	^ Match the beginning of the line
	110	. Match any character (except newline)
	111	$ Match the end of the line (or before newline at the end)
	112	| Alternation
	113	() Grouping
	114	[] Character class
	115
	116	By default, the "^" character is guaranteed to match only the
	117	beginning of the string, the "$" character only the end (or before the
	118	newline at the end), and Perl does certain optimizations with the
	119	assumption that the string contains only one line. Embedded newlines
	120	will not be matched by "^" or "$". You may, however, wish to treat a
	121	string as a multi-line buffer, such that the "^" will match after any
	122	newline within the string (except if the newline is the last character in
	123	the string), and "$" will match before any newline. At the
	124	cost of a little more overhead, you can do this by using the /m modifier
	125	on the pattern match operator. (Older programs did this by setting C<$*>,
	126	but this practice has been removed in perl 5.9.)
	127	X<^> X<$> X</m>
	128
	129	To simplify multi-line substitutions, the "." character never matches a
	130	newline unless you use the C</s> modifier, which in effect tells Perl to pretend
	131	the string is a single line--even if it isn't.
	132	X<.> X</s>
	133
	134	=head3 Quantifiers
	135
	136	The following standard quantifiers are recognized:
	137	X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
	138
	139	* Match 0 or more times
	140	+ Match 1 or more times
	141	? Match 1 or 0 times
	142	{n} Match exactly n times
	143	{n,} Match at least n times
	144	{n,m} Match at least n but not more than m times
	145
	146	(If a curly bracket occurs in any other context, it is treated
	147	as a regular character. In particular, the lower bound
	148	is not optional.) The "*" quantifier is equivalent to C<{0,}>, the "+"
	149	quantifier to C<{1,}>, and the "?" quantifier to C<{0,1}>. n and m are limited
	150	to integral values less than a preset limit defined when perl is built.
	151	This is usually 32766 on the most common platforms. The actual limit can
	152	be seen in the error message generated by code such as this:
	153
	154	$_ **= $_ , / {$_} / for 2 .. 42;
	155
	156	By default, a quantified subpattern is "greedy", that is, it will match as
	157	many times as possible (given a particular starting location) while still
	158	allowing the rest of the pattern to match. If you want it to match the
	159	minimum number of times possible, follow the quantifier with a "?". Note
	160	that the meanings don't change, just the "greediness":
	161	X<metacharacter> X<greedy> X<greediness>
	162	X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{n,m}?>
	163
	164	*? Match 0 or more times, not greedily
	165	+? Match 1 or more times, not greedily
	166	?? Match 0 or 1 time, not greedily
	167	{n}? Match exactly n times, not greedily
	168	{n,}? Match at least n times, not greedily
	169	{n,m}? Match at least n but not more than m times, not greedily
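
The difference between greedy and non-greedy matching can be sketched as follows (the input string is a made-up example):

```perl
use strict;
use warnings;

# Hypothetical input with two tag-like pieces.
my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: ".+" runs to the last ">" that still lets the match succeed.
my ($greedy) = $html =~ /<(.+)>/;

# Non-greedy: ".+?" stops at the first ">" it can.
my ($lazy) = $html =~ /<(.+?)>/;

print "greedy: $greedy\n";   # greedy: b>bold</b> and <i>italic</i
print "lazy:   $lazy\n";     # lazy:   b
```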
	170
	171	By default, when a quantified subpattern does not allow the rest of the
	172	overall pattern to match, Perl will backtrack. However, this behaviour is
	173	sometimes undesirable. Thus Perl provides the "possessive" quantifier form
	174	as well.
	175
	176	*+ Match 0 or more times and give nothing back
	177	++ Match 1 or more times and give nothing back
	178	?+ Match 0 or 1 time and give nothing back
	179	{n}+ Match exactly n times and give nothing back (redundant)
	180	{n,}+ Match at least n times and give nothing back
	181	{n,m}+ Match at least n but not more than m times and give nothing back
	182
	183	For instance,
	184
	185	'aaaa' =~ /a++a/
	186
	187	will never match, as the C<a++> will gobble up all the C<a>'s in the
	188	string and won't leave any for the remaining part of the pattern. This
	189	feature can be extremely useful to give perl hints about where it
	190	shouldn't backtrack. For instance, the typical "match a double-quoted
	191	string" problem can be most efficiently performed when written as:
	192
	193	/"(?:[^"\\]++|\\.)*+"/
	194
	195	as we know that if the final quote does not match, backtracking will not
	196	help. See the independent subexpression C<< (?>...) >> for more details;
	197	possessive quantifiers are just syntactic sugar for that construct. For
	198	instance the above example could also be written as follows:
	199
	200	/"(?>(?:(?>[^"\\]+)|\\.)*)"/
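
A small sketch of possessive quantifiers in use, assuming Perl 5.10 or later (the input string and variable names are illustrative only):

```perl
use strict;
use warnings;

# "a++" takes every "a" and gives nothing back, so the final "a" cannot match.
my $possessive = ('aaaa' =~ /a++a/) ? 1 : 0;   # 0

# The ordinary greedy form backtracks one "a" and succeeds.
my $greedy = ('aaaa' =~ /a+a/) ? 1 : 0;        # 1

# Hypothetical input containing a double-quoted string with escaped quotes.
my $input = 'say "hello \"world\"" now';
my ($quoted) = $input =~ /("(?:[^"\\]++|\\.)*+")/;

print "$possessive $greedy $quoted\n";
# prints: 0 1 "hello \"world\""
```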
	201
	202	=head3 Escape sequences
	203
	204	Because patterns are processed as double quoted strings, the following
	205	also work:
	206	X<\t> X<\n> X<\r> X<\f> X<\e> X<\a> X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
	207	X<\0> X<\c> X<\N> X<\x>
	208
	209	\t tab (HT, TAB)
	210	\n newline (LF, NL)
	211	\r return (CR)
	212	\f form feed (FF)
	213	\a alarm (bell) (BEL)
	214	\e escape (think troff) (ESC)
	215	\033 octal char (example: ESC)
	216	\x1B hex char (example: ESC)
	217	\x{263a} wide hex char (example: Unicode SMILEY)
	218	\cK control char (example: VT)
	219	\N{name} named char
	220	\l lowercase next char (think vi)
	221	\u uppercase next char (think vi)
	222	\L lowercase till \E (think vi)
	223	\U uppercase till \E (think vi)
	224	\E end case modification (think vi)
	225	\Q quote (disable) pattern metacharacters till \E
	226
	227	If C<use locale> is in effect, the case map used by C<\l>, C<\L>, C<\u>
	228	and C<\U> is taken from the current locale. See L<perllocale>. For
	229	documentation of C<\N{name}>, see L<charnames>.
	230
	231	You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
	232	An unescaped C<$> or C<@> interpolates the corresponding variable,
	233	while escaping will cause the literal string C<\$> to be matched.
	234	You'll need to write something like C<m/\Quser\E\@\Qhost/>.
	235
	236	=head3 Character classes
	237
	238	In addition, Perl defines the following:
	239	X<\w> X<\W> X<\s> X<\S> X<\d> X<\D> X<\X> X<\p> X<\P> X<\C>
	240	X<\g> X<\k> X<\N> X<\K> X<\v> X<\V>
	241	X<word> X<whitespace> X<character class> X<backreference>
	242
	243	    \w       Match a "word" character (alphanumeric plus "_")
	244	    \W       Match a non-"word" character
	245	    \s       Match a whitespace character
	246	    \S       Match a non-whitespace character
	247	    \d       Match a digit character
	248	    \D       Match a non-digit character
	249	    \pP      Match P, named property.  Use \p{Prop} for longer names.
	250	    \PP      Match non-P
	251	    \X       Match eXtended Unicode "combining character sequence",
	252	             equivalent to (?:\PM\pM*)
	253	    \C       Match a single C char (octet) even under Unicode.
	254	             NOTE: breaks up characters into their UTF-8 bytes,
	255	             so you may end up with malformed pieces of UTF-8.
	256	             Unsupported in lookbehind.
	257	    \1       Backreference to a specific group.
	258	             '1' may actually be any positive integer.
	259	    \g1      Backreference to a specific or previous group,
	260	    \g{-1}   number may be negative indicating a previous buffer and may
	261	             optionally be wrapped in curly brackets for safer parsing.
	262	    \g{name} Named backreference
	263	    \k<name> Named backreference
	264	    \N{name} Named unicode character, or unicode escape
	265	    \x12     Hexadecimal escape sequence
	266	    \x{1234} Long hexadecimal escape sequence
	267	    \K       Keep the stuff left of the \K, don't include it in $&
	268	    \v       Match a vertical whitespace character
	269	    \V       Match a character that is not vertical whitespace
	270
	271	A C<\w> matches a single alphanumeric character (an alphabetic
	272	character, or a decimal digit) or C<_>, not a whole word. Use C<\w+>
	273	to match a string of Perl-identifier characters (which isn't the same
	274	as matching an English word). If C<use locale> is in effect, the list
	275	of alphabetic characters generated by C<\w> is taken from the current
	276	locale. See L<perllocale>. You may use C<\w>, C<\W>, C<\s>, C<\S>,
	277	C<\d>, and C<\D> within character classes, but they aren't usable
	278	as either end of a range. If any of them precedes or follows a "-",
	279	the "-" is understood literally. If Unicode is in effect, C<\s> also
	280	matches "\x{85}", "\x{2028}", and "\x{2029}". See L<perlunicode> for more
	281	details about C<\pP>, C<\PP>, C<\X> and the possibility of defining
	282	your own C<\p> and C<\P> properties, and L<perluniintro> about Unicode
	283	in general.
	284	X<\w> X<\W> X<word>
	285
	286	The POSIX character class syntax
	287	X<character class>
	288
	289	[:class:]
	290
	291	is also available. Note that the C<[> and C<]> brackets are I<literal>;
	292	they must always be used within a character class expression.
	293
	294	# this is correct:
	295	$string =~ /[[:alpha:]]/;
	296
	297	# this is not, and will generate a warning:
	298	$string =~ /[:alpha:]/;
	299
	300	The available classes and their backslash equivalents (if available) are
	301	as follows:
	302	X<character class>
	303	X<alpha> X<alnum> X<ascii> X<blank> X<cntrl> X<digit> X<graph>
	304	X<lower> X<print> X<punct> X<space> X<upper> X<word> X<xdigit>
	305
	306	alpha
	307	alnum
	308	ascii
	309	blank [1]
	310	cntrl
	311	digit \d
	312	graph
	313	lower
	314	print
	315	punct
	316	space \s [2]
	317	upper
	318	word \w [3]
	319	xdigit
	320
	321	=over
	322
	323	=item [1]
	324
	325	A GNU extension equivalent to C<[ \t]>, "all horizontal whitespace".
	326
	327	=item [2]
	328
	329	Not exactly equivalent to C<\s>, since C<[[:space:]]> also includes
	330	the (very rare) "vertical tabulator", "\cK" or chr(11) in ASCII.
	331
	332	=item [3]
	333
	334	A Perl extension, see above.
	335
	336	=back
	337
	338	For example use C<[:upper:]> to match all the uppercase characters.
	339	Note that the C<[]> are part of the C<[::]> construct, not part of the
	340	whole character class. For example:
	341
	342	[01[:alpha:]%]
	343
	344	matches zero, one, any alphabetic character, and the percent sign.
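
The bracket nesting can be checked with a short sketch like this one (the candidate strings are arbitrary examples):

```perl
use strict;
use warnings;

# Arbitrary one-character test strings.
my @candidates = ('0', '1', 'x', '%', '2', '-');

# The POSIX class sits inside an ordinary bracketed character class.
my @accepted = grep { /^[01[:alpha:]%]$/ } @candidates;

# A POSIX class used on its own still needs the outer brackets.
my @upper = 'Hello World' =~ /[[:upper:]]/g;

print "@accepted\n";   # 0 1 x %
print "@upper\n";      # H W
```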
	345
	346	The following equivalences to Unicode \p{} constructs and equivalent
	347	backslash character classes (if available), will hold:
	348	X<character class> X<\p> X<\p{}>
	349
	350	    [[:...:]]   \p{...}         backslash
	351
	352	    alpha       IsAlpha
	353	    alnum       IsAlnum
	354	    ascii       IsASCII
	355	    blank
	356	    cntrl       IsCntrl
	357	    digit       IsDigit         \d
	358	    graph       IsGraph
	359	    lower       IsLower
	360	    print       IsPrint
	361	    punct       IsPunct
	362	    space       IsSpace
	363	                IsSpacePerl     \s
	364	    upper       IsUpper
	365	    word        IsWord
	366	    xdigit      IsXDigit
	367
	368	For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
	369
	370	If the C<utf8> pragma is not used but the C<locale> pragma is, the
	371	classes correlate with the usual isalpha(3) interface (except for
	372	"word" and "blank").
	373
	374	The classes whose names may not be obvious are:
	375
	376	=over 4
	377
	378	=item cntrl
	379	X<cntrl>
	380
	381	Any control character. Usually characters that don't produce output as
	382	such but instead control the terminal somehow: for example newline and
	383	backspace are control characters. All characters with ord() less than
	384	32 are usually classified as control characters (assuming ASCII,
	385	the ISO Latin character sets, and Unicode), as is the character with
	386	the ord() value of 127 (C<DEL>).
	387
	388	=item graph
	389	X<graph>
	390
	391	Any alphanumeric or punctuation (special) character.
	392
	393	=item print
	394	X<print>
	395
	396	Any alphanumeric or punctuation (special) character or the space character.
	397
	398	=item punct
	399	X<punct>
	400
	401	Any punctuation (special) character.
	402
	403	=item xdigit
	404	X<xdigit>
	405
	406	Any hexadecimal digit. Though this may feel silly ([0-9A-Fa-f] would
	407	work just fine) it is included for completeness.
	408
	409	=back
	410
	411	You can negate the [::] character classes by prefixing the class name
	412	with a '^'. This is a Perl extension. For example:
	413	X<character class, negation>
	414
	415	POSIX traditional Unicode
	416
	417	[[:^digit:]] \D \P{IsDigit}
	418	[[:^space:]] \S \P{IsSpace}
	419	[[:^word:]] \W \P{IsWord}
	420
	421	Perl respects the POSIX standard in that POSIX character classes are
	422	only supported within a character class. The POSIX character classes
	423	[.cc.] and [=cc=] are recognized but B<not> supported and trying to
	424	use them will cause an error.
	425
	426	=head3 Assertions
	427
	428	Perl defines the following zero-width assertions:
	429	X<zero-width assertion> X<assertion> X<regex, zero-width assertion>
	430	X<regexp, zero-width assertion>
	431	X<regular expression, zero-width assertion>
	432	X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
	433
	434	\b Match a word boundary
	435	\B Match except at a word boundary
	436	\A Match only at beginning of string
	437	\Z Match only at end of string, or before newline at the end
	438	\z Match only at end of string
	439	\G Match only at pos() (e.g. at the end-of-match position
	440	of prior m//g)
	441
	442	A word boundary (C<\b>) is a spot between two characters
	443	that has a C<\w> on one side of it and a C<\W> on the other side
	444	of it (in either order), counting the imaginary characters off the
	445	beginning and end of the string as matching a C<\W>. (Within
	446	character classes C<\b> represents backspace rather than a word
	447	boundary, just as it normally does in any double-quoted string.)
	448	The C<\A> and C<\Z> are just like "^" and "$", except that they
	449	won't match multiple times when the C</m> modifier is used, while
	450	"^" and "$" will match at every internal line boundary. To match
	451	the actual end of the string and not ignore an optional trailing
	452	newline, use C<\z>.
	453	X<\b> X<\A> X<\Z> X<\z> X</m>
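
A brief sketch of how these assertions differ, using a made-up string that ends in a newline:

```perl
use strict;
use warnings;

# Hypothetical sample: "cat" appears as a word and inside "concat".
my $s = "cat concat\n";

# \b on both sides rejects the "cat" embedded in "concat".
my @whole = $s =~ /\bcat\b/g;
my $count = scalar @whole;                       # 1

# \Z tolerates a trailing newline; \z does not.
my $end_Z  = ($s =~ /concat\Z/)   ? 1 : 0;       # 1
my $end_z  = ($s =~ /concat\z/)   ? 1 : 0;       # 0
my $end_nl = ($s =~ /concat\n\z/) ? 1 : 0;       # 1

print "$count $end_Z $end_z $end_nl\n";          # 1 1 0 1
```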
	454
	455	The C<\G> assertion can be used to chain global matches (using
	456	C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
	457	It is also useful when writing C<lex>-like scanners, when you have
	458	several patterns that you want to match against successive substrings
	459	of your string; see the previous reference. The actual location
	460	where C<\G> will match can also be influenced by using C<pos()> as
	461	an lvalue: see L<perlfunc/pos>. Note that the rule for zero-length
	462	matches is modified somewhat, in that content to the left of C<\G> is
	463	not counted when determining the length of the match. Thus the following
	464	will not match forever:
	465	X<\G>
	466
	467	    $str = 'ABC';
	468	    pos($str) = 1;
	469	    while ($str =~ /.\G/g) {
	470	        print $&;
	471	    }
	472
	473	It will print 'A' and then terminate, as it considers the match to
	474	be zero-width, and thus will not match at the same position twice in a
	475	row.
	476
	477	It is worth noting that C<\G> improperly used can result in an infinite
	478	loop. Take care when using patterns that include C<\G> in an alternation.
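
The C<lex>-like use of C<\G> mentioned above might look like the following sketch. The token names and the input are invented for illustration. It also relies on the C</c> modifier (documented in L<perlop>) to keep C<pos()> unchanged when an alternative fails.

```perl
use strict;
use warnings;

# Hypothetical input for a toy scanner.
my $src = 'x = 42';
my @tokens;

for ($src) {                      # alias $_ to $src so pos() is shared
    while (1) {
        if    (/\G\s+/gc)              { next }                       # skip whitespace
        elsif (/\G([A-Za-z_]\w*)/gc)   { push @tokens, "IDENT:$1" }
        elsif (/\G(\d+)/gc)            { push @tokens, "NUM:$1" }
        elsif (/\G(=)/gc)              { push @tokens, "OP:$1" }
        else                           { last }                       # nothing matches here
    }
}

print join(' ', @tokens), "\n";   # IDENT:x OP:= NUM:42
```

Each pattern is anchored with C<\G>, so it can only match where the previous one left off. The loop ends as soon as no alternative matches at the current position.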
	479
	480	=head3 Capture buffers
	481
	482	The bracketing construct C<( ... )> creates capture buffers. To refer
	483	to the current contents of a buffer later on, within the same pattern,
	484	use \1 for the first, \2 for the second, and so on.
	485	Outside the match use "$" instead of "\". (The
	486	\<digit> notation works in certain circumstances outside
	487	the match. See the warning below about \1 vs $1 for details.)
	488	Referring back to another part of the match is called a
	489	I<backreference>.
	490	X<regex, capture buffer> X<regexp, capture buffer>
	491	X<regular expression, capture buffer> X<backreference>
	492
	493	There is no limit to the number of captured substrings that you may
	494	use. However Perl also uses \10, \11, etc. as aliases for \010,
	495	\011, etc. (Recall that 0 means octal, so \011 is the character at
	496	number 9 in your coded character set; which would be the 10th character,
	497	a horizontal tab under ASCII.) Perl resolves this
	498	ambiguity by interpreting \10 as a backreference only if at least 10
	499	left parentheses have opened before it. Likewise \11 is a
	500	backreference only if at least 11 left parentheses have opened
	501	before it. And so on. \1 through \9 are always interpreted as
	502	backreferences.
	503
	504	X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
	505	In order to provide a safer and easier way to construct patterns using
	506	backreferences, Perl 5.10 provides the C<\g{N}> notation. The curly
	507	brackets are optional, however omitting them is less safe as the meaning
	508	of the pattern can be changed by text (such as digits) following it.
	509	When N is a positive integer the C<\g{N}> notation is exactly equivalent
	510	to using normal backreferences. When N is a negative integer then it is
	511	a relative backreference referring to the previous N'th capturing group.
	512	When the bracket form is used and N is not an integer, it is treated as a
	513	reference to a named buffer.
	514
	515	Thus C<\g{-1}> refers to the last buffer, C<\g{-2}> refers to the
	516	buffer before that. For example:
	517
	518	    /
	519	     (Y)            # buffer 1
	520	     (              # buffer 2
	521	        (X)         # buffer 3
	522	        \g{-1}      # backref to buffer 3
	523	        \g{-3}      # backref to buffer 1
	524	     )
	525	    /x
	526
	527	would match the same as C</(Y) ( (X) \3 \1 )/x>.
	528
	529	Additionally, as of Perl 5.10 you may use named capture buffers and named
	530	backreferences. The notation is C<< (?<name>...) >> to declare and C<< \k<name> >>
	531	to reference. You may also use apostrophes instead of angle brackets to delimit the
	532	name; and you may use the bracketed C<< \g{name} >> backreference syntax.
	533	It's possible to refer to a named capture buffer by absolute and relative number as well.
	534	Outside the pattern, a named capture buffer is available via the C<%+> hash.
	535	When different buffers within the same pattern have the same name, C<$+{name}>
	536	and C<< \k<name> >> refer to the leftmost defined group. (Thus it's possible
	537	to do things with named capture buffers that would otherwise require C<(??{})>
	538	code to accomplish.)
	539	X<named capture buffer> X<regular expression, named capture buffer>
	540	X<%+> X<$+{name}> X<\k{name}>
	541
	542	Examples:
	543
	544	    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words
	545
	546	    /(.)\1/                         # find first doubled char
	547	         and print "'$1' is the first doubled character\n";
	548
	549	    /(?<char>.)\k<char>/            # ... a different way
	550	         and print "'$+{char}' is the first doubled character\n";
	551
	552	    /(?'char'.)\1/                  # ... mix and match
	553	         and print "'$1' is the first doubled character\n";
	554
	555	    if (/Time: (..):(..):(..)/) {   # parse out values
	556	        $hours = $1;
	557	        $minutes = $2;
	558	        $seconds = $3;
	559	    }
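
As a self-contained variation on the examples above, here is a sketch using named buffers, the C<%+> hash, and a relative backreference. It assumes Perl 5.10 or later, and the buffer names and input strings are invented.

```perl
use strict;
use warnings;

# Hypothetical input line.
my $line = 'Time: 12:34:56';
my %time;

# Named buffers are available through %+ after a successful match.
if ($line =~ /Time: (?<hours>\d\d):(?<minutes>\d\d):(?<seconds>\d\d)/) {
    %time = %+;
}

# \g{-1} refers to the most recently opened buffer, here buffer 1.
my ($doubled) = 'bookkeeper' =~ /(.)\g{-1}/;

# The same search written with a named buffer and \k<name>.
my ($named) = 'bookkeeper' =~ /(?<char>.)\k<char>/;

print "$time{hours}h $time{minutes}m $time{seconds}s\n";   # 12h 34m 56s
print "$doubled $named\n";                                 # o o
```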
	560
	561	Several special variables also refer back to portions of the previous
	562	match. C<$+> returns whatever the last bracket match matched.
	563	C<$&> returns the entire matched string. (At one point C<$0> did
	564	also, but now it returns the name of the program.) C<$`> returns
	565	everything before the matched string. C<$'> returns everything
	566	after the matched string. And C<$^N> contains whatever was matched by
	567	the most-recently closed group (submatch). C<$^N> can be used in
	568	extended patterns (see below), for example to assign a submatch to a
	569	variable.
	570	X<$+> X<$^N> X<$&> X<$`> X<$'>
	571
	572	The numbered match variables ($1, $2, $3, etc.) and the related punctuation
	573	set (C<$+>, C<$&>, C<$`>, C<$'>, and C<$^N>) are all dynamically scoped
	574	until the end of the enclosing block or until the next successful
	575	match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
	576	X<$+> X<$^N> X<$&> X<$`> X<$'>
	577	X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>
	578
	579
	580	B<NOTE>: Failed matches in Perl do not reset the match variables,
	581	which makes it easier to write code that tests for a series of more
	582	specific cases and remembers the best match.
	583
	584	B<WARNING>: Once Perl sees that you need one of C<$&>, C<$`>, or
	585	C<$'> anywhere in the program, it has to provide them for every
	586	pattern match. This may substantially slow your program. Perl
	587	uses the same mechanism to produce $1, $2, etc, so you also pay a
	588	price for each pattern that contains capturing parentheses. (To
	589	avoid this cost while retaining the grouping behaviour, use the
	590	extended regular expression C<(?: ... )> instead.) But if you never
	591	use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing
	592	parentheses will not be penalized. So avoid C<$&>, C<$'>, and C<$`>
	593	if you can, but if you can't (and some algorithms really appreciate
	594	them), once you've used them once, use them at will, because you've
	595	already paid the price. As of 5.005, C<$&> is not so costly as the
	596	other two.
	597	X<$&> X<$`> X<$'>
	598
	599	As a workaround for this problem, Perl 5.10 introduces C<${^PREMATCH}>,
	600	C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
	601	and C<$'>, B<except> that they are only guaranteed to be defined after a
	602	successful match that was executed with the C</p> (preserve) modifier.
	603	The use of these variables incurs no global performance penalty, unlike
	604	their punctuation char equivalents, however at the trade-off that you
	605	have to tell perl when you want to use them.
	606	X</p> X<p modifier>
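
A minimal sketch of the C</p> modifier, assuming Perl 5.10 or later (the string is an arbitrary example):

```perl
use strict;
use warnings;

# Arbitrary sample string.
my $str = 'hello world';
my ($pre, $match, $post);

# /p asks perl to preserve the pieces for this match only.
if ($str =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre][$match][$post]\n";   # [hell][o w][orld]
```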
	607
	608	Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
	609	C<\w>, C<\n>. Unlike some other regular expression languages, there
	610	are no backslashed symbols that aren't alphanumeric. So anything
	611	that looks like \\, \(, \), \<, \>, \{, or \} is always
	612	interpreted as a literal character, not a metacharacter. This was
	613	once used in a common idiom to disable or quote the special meanings
	614	of regular expression metacharacters in a string that you want to
	615	use for a pattern. Simply quote all non-"word" characters:
	616
	617	$pattern =~ s/(\W)/\\$1/g;
	618
	619	(If C<use locale> is set, then this depends on the current locale.)
	620	Today it is more common to use the quotemeta() function or the C<\Q>
	621	metaquoting escape sequence to disable all metacharacters' special
	622	meanings like this:
	623
	624	/$unquoted\Q$quoted\E$unquoted/
	625
	626	Beware that if you put literal backslashes (those not inside
	627	interpolated variables) between C<\Q> and C<\E>, double-quotish
	628	backslash interpolation may lead to confusing results. If you
	629	I<need> to use literal backslashes within C<\Q...\E>,
	630	consult L<perlop/"Gory details of parsing quoted constructs">.
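
The effect of C<\Q> and quotemeta() can be seen in a short sketch (the literal string is a made-up example):

```perl
use strict;
use warnings;

# Hypothetical string containing the metacharacters "." and "*".
my $literal = 'a.b*c';

# quotemeta() backslashes every non-"word" character.
my $escaped = quotemeta($literal);

# Interpolated as-is, "." and "*" keep their special meanings.
my $as_pattern = ('aXbbc' =~ /\A$literal\z/)     ? 1 : 0;   # 1

# Inside \Q...\E the interpolated text is matched literally.
my $as_text    = ('aXbbc' =~ /\A\Q$literal\E\z/) ? 1 : 0;   # 0
my $exact      = ('a.b*c' =~ /\A\Q$literal\E\z/) ? 1 : 0;   # 1

print "$escaped $as_pattern $as_text $exact\n";   # a\.b\*c 1 0 1
```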
	631
	632	=head2 Extended Patterns
	633
	634	Perl also defines a consistent extension syntax for features not
	635	found in standard tools like B<awk> and B<lex>. The syntax is a
	636	pair of parentheses with a question mark as the first thing within
	637	the parentheses. The character after the question mark indicates
	638	the extension.
	639
	640	The stability of these extensions varies widely. Some have been
	641	part of the core language for many years. Others are experimental
	642	and may change without warning or be completely removed. Check
	643	the documentation on an individual feature to verify its current
	644	status.
	645
	646	A question mark was chosen for this and for the minimal-matching
	647	construct because 1) question marks are rare in older regular
	648	expressions, and 2) whenever you see one, you should stop and
	649	"question" exactly what is going on. That's psychology...
	650
	651	=over 10
	652
	653	=item C<(?#text)>
	654	X<(?#)>
	655
	656	A comment. The text is ignored. If the C</x> modifier enables
	657	whitespace formatting, a simple C<#> will suffice. Note that Perl closes
	658	the comment as soon as it sees a C<)>, so there is no way to put a literal
	659	C<)> in the comment.
	660
	661	=item C<(?pimsx-imsx)>
	662	X<(?)>
	663
	664	One or more embedded pattern-match modifiers, to be turned on (or
	665	turned off, if preceded by C<->) for the remainder of the pattern or
	666	the remainder of the enclosing pattern group (if any). This is
	667	particularly useful for dynamic patterns, such as those read in from a
	668	configuration file, taken from an argument, or specified in a table
	669	somewhere. Consider the case where some patterns want to be case
	670	sensitive and some do not: The case insensitive ones merely need to
	671	include C<(?i)> at the front of the pattern. For example:
	672
	673	$pattern = "foobar";
	674	if ( /$pattern/i ) { }
	675
	676	# more flexible:
	677
	678	$pattern = "(?i)foobar";
	679	if ( /$pattern/ ) { }
	680
	681	These modifiers are restored at the end of the enclosing group. For example,
	682
	683	( (?i) blah ) \s+ \1
	684
	685	will match C<blah> in any case, some spaces, and an exact (I<including the case>!)
	686	repetition of the previous word, assuming the C</x> modifier, and no C</i>
	687	modifier outside this group.
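
The scoping of embedded modifiers can be sketched as follows (the patterns and test strings are illustrative):

```perl
use strict;
use warnings;

# A pattern can carry its own case-insensitivity with it.
my %hits;
for my $pattern ('(?i)foobar', 'foobar') {
    $hits{$pattern} = ('FooBar' =~ /$pattern/) ? 1 : 0;
}

# (?i) applies only inside the group; the backreference \1 is case-sensitive.
my $same_case  = ('Blah   Blah' =~ /((?i)blah)\s+\1/) ? 1 : 0;   # 1
my $other_case = ('Blah   blah' =~ /((?i)blah)\s+\1/) ? 1 : 0;   # 0

print "$hits{'(?i)foobar'} $hits{'foobar'} $same_case $other_case\n";   # 1 0 1 0
```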
	688
	689	Note that the C<p> modifier is special in that it can only be enabled,
	690	not disabled, and that its presence anywhere in a pattern has a global
	691	effect. Thus C<(?-p)> and C<(?-p:...)> are meaningless and will warn
	692	when executed under C<use warnings>.
	693
	694	=item C<(?:pattern)>
	695	X<(?:)>
	696
	697	=item C<(?imsx-imsx:pattern)>
	698
	699	This is for clustering, not capturing; it groups subexpressions like
	700	"()", but doesn't make backreferences as "()" does. So
	701
	702	@fields = split(/\b(?:a|b|c)\b/)
	703
	704	is like
	705
	706	@fields = split(/\b(a|b|c)\b/)
	707
	708	but doesn't spit out extra fields. It's also cheaper not to capture
	709	characters if you don't need to.
	710
	711	Any letters between C<?> and C<:> act as flag modifiers as with
	712	C<(?imsx-imsx)>. For example,
	713
	714	/(?s-i:more.than).million/i
	715
	716	is equivalent to the more verbose
	717
	718	/(?:(?s-i)more.than).million/i
	719
	720	=item C<(?|pattern)>
	721	X<(?|)> X<Branch reset>
	722
	723	This is the "branch reset" pattern, which has the special property
	724	that the capture buffers are numbered from the same starting point
	725	in each alternation branch. It is available starting from perl 5.10.
	726
	727	Capture buffers are numbered from left to right, but inside this
	728	construct the numbering is restarted for each branch.
	729
	730	The numbering within each branch will be as normal, and any buffers
	731	following this construct will be numbered as though the construct
	732	contained only one branch, that being the one with the most capture
	733	buffers in it.
	734
	735	This construct will be useful when you want to capture one of a
	736	number of alternative matches.
	737
	738	Consider the following pattern. The numbers underneath show in
	739	which buffer the captured content will be stored.
	740
	741
	742	    # before  ---------------branch-reset----------- after
	743	    / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
	744	    # 1            2         2  3        2     3     4
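
A small sketch of why this is convenient, assuming Perl 5.10 or later (the inputs are invented): whichever branch matches, the captured digit lands in C<$1>.

```perl
use strict;
use warnings;

my @got;

# Hypothetical inputs, one for each branch of the alternation.
for my $in ('a:1', 'b=2') {
    # Both branches number their capture buffer as 1.
    if ($in =~ /(?| a:(\d) | b=(\d) )/x) {
        push @got, $1;
    }
}

print "@got\n";   # 1 2
```

Without the branch reset, the second branch would capture into C<$2>, and the code would need to check both variables.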
	745
	746	=item Look-Around Assertions
	747	X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
	748
	749	Look-around assertions are zero width patterns which match a specific
	750	pattern without including it in C<$&>. Positive assertions match when
	751	their subpattern matches, negative assertions match when their subpattern
	752	fails. Look-behind matches text up to the current match position;
	753	look-ahead matches text following the current match position.
	754
	755	=over 4
	756
	757	=item C<(?=pattern)>
	758	X<(?=)> X<look-ahead, positive> X<lookahead, positive>
	759
	760	A zero-width positive look-ahead assertion. For example, C</\w+(?=\t)/>
	761	matches a word followed by a tab, without including the tab in C<$&>.
	762
	763	=item C<(?!pattern)>
	764	X<(?!)> X<look-ahead, negative> X<lookahead, negative>
	765
	766	A zero-width negative look-ahead assertion. For example C</foo(?!bar)/>
	767	matches any occurrence of "foo" that isn't followed by "bar". Note
	768	however that look-ahead and look-behind are NOT the same thing. You cannot
	769	use this for look-behind.
	770
	771	If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/>
	772	will not do what you want. That's because the C<(?!foo)> is just saying that
	773	the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
	774	match. You would have to do something like C</(?!foo)...bar/> for that. We
	775	say "like" because there's the case of your "bar" not having three characters
	776	before it. You could cover that this way: C</(?:(?!foo)...|^.{0,2})bar/>.
	777	Sometimes it's still easier just to say:
	778
	779	    if (/bar/ && $` !~ /foo$/)
	780
	781	For look-behind see below.
	782
	783	=item C<(?<=pattern)> C<\K>
	784	X<(?<=)> X<look-behind, positive> X<lookbehind, positive> X<\K>
	785
	786	A zero-width positive look-behind assertion. For example, C</(?<=\t)\w+/>
	787	matches a word that follows a tab, without including the tab in C<$&>.
	788	Works only for fixed-width look-behind.
	789
	790	There is a special form of this construct, called C<\K>, which causes the
	791	regex engine to "keep" everything it had matched prior to the C<\K> and
	792	not include it in C<$&>. This effectively provides variable length
	793	look-behind. The use of C<\K> inside of another look-around assertion
	794	is allowed, but the behaviour is currently not well defined.
	795
	796	For various reasons C<\K> may be significantly more efficient than the
	797	equivalent C<< (?<=...) >> construct, and it is especially useful in
	798	situations where you want to efficiently remove something following
	799	something else in a string. For instance
	800
	801	    s/(foo)bar/$1/g;
	802
	803	can be rewritten as the much more efficient
	804
	805	    s/foo\Kbar//g;
	806
	807	=item C<(?<!pattern)>
	808	X<(?<!)> X<look-behind, negative> X<lookbehind, negative>
	809
	810	A zero-width negative look-behind assertion. For example C</(?<!bar)foo/>
	811	matches any occurrence of "foo" that does not follow "bar". Works
	812	only for fixed-width look-behind.
	813
	814	=back
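
The four assertions and C<\K> can be compared side by side. The following
is a small sketch, assuming Perl 5.10 or later (needed for C<\K>); the
sample text and variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $text = "foobar bar foobaz";

# Negative look-ahead: offsets of every "foo" NOT followed by "bar".
my @foo_offsets;
push @foo_offsets, $-[0] while $text =~ /foo(?!bar)/g;

# Negative look-behind: offsets of every "bar" NOT preceded by "foo".
my @bar_offsets;
push @bar_offsets, $-[0] while $text =~ /(?<!foo)bar/g;

# \K keeps "foo" out of $&, so only the "bar" that follows it is removed.
(my $trimmed = $text) =~ s/foo\Kbar//g;
```

Only the "foo" at offset 11 and the "bar" at offset 7 qualify, and
C<$trimmed> ends up as C<"foo bar foobaz">.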
	815
	816	=item C<(?'NAME'pattern)>
	817
	818	=item C<< (?<NAME>pattern) >>
	819	X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
	820
	821	A named capture buffer. Identical in every respect to normal capturing
	822	parentheses C<()> but for the additional fact that C<%+> may be used after
	823	a successful match to refer to a named buffer. See L<perlvar> for more
	824	details on the C<%+> hash.
	825
	826	If multiple distinct capture buffers have the same name then
	827	C<$+{NAME}> will refer to the leftmost defined buffer in the match.
	828
	829	The forms C<(?'NAME'pattern)> and C<< (?<NAME>pattern) >> are equivalent.
	830
	831	B<NOTE:> While the notation of this construct is the same as the similar
	832	function in .NET regexes, the behavior is not. In Perl the buffers are
	833	numbered sequentially regardless of being named or not. Thus in the
	834	pattern
	835
	836	    /(x)(?<foo>y)(z)/
	837
	838	$+{foo} will be the same as $2, and $3 will contain 'z' instead of
	839	the opposite which is what a .NET regex hacker might expect.
	840
	841	Currently NAME is restricted to simple identifiers only.
	842	In other words, it must match C</^[_A-Za-z][_A-Za-z0-9]*\z/> or
	843	its Unicode extension (see L<utf8>),
	844	though it isn't extended by the locale (see L<perllocale>).
	845
	846	B<NOTE:> In order to make things easier for programmers with experience
	847	with the Python or PCRE regex engines, the pattern C<< (?PE<lt>NAMEE<gt>pattern) >>
	848	may be used instead of C<< (?<NAME>pattern) >>; however this form does not
	849	support the use of single quotes as a delimiter for the name. This is
	850	only available in Perl 5.10 or later.
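
The sequential numbering described above can be confirmed with a few
lines. This is a sketch only, assuming Perl 5.10 or later; the variable
names are arbitrary.

```perl
use strict;
use warnings;

my ($by_name, $second, $third);
if ('xyz' =~ /(x)(?<foo>y)(z)/) {
    $by_name = $+{foo};   # access through the %+ hash
    $second  = $2;        # the named buffer is still buffer number 2
    $third   = $3;        # so 'z' lands in buffer number 3
}
```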
	851
	852	=item C<< \k<NAME> >>
	853
	854	=item C<< \k'NAME' >>
	855
	856	Named backreference. Similar to numeric backreferences, except that
	857	the group is designated by name and not number. If multiple groups
	858	have the same name then it refers to the leftmost defined group in
	859	the current match.
	860
	861	It is an error to refer to a name not defined by a C<< (?<NAME>) >>
	862	earlier in the pattern.
	863
	864	Both forms are equivalent.
	865
	866	B<NOTE:> In order to make things easier for programmers with experience
	867	with the Python or PCRE regex engines, the pattern C<< (?P=NAME) >>
	868	may be used instead of C<< \k<NAME> >> in Perl 5.10 or later.
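
As an illustration, a named backreference can be used to find a doubled
word. This is a minimal sketch, assuming Perl 5.10 or later; the helper
name C<doubled_word> and the group name C<word> are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first word that is immediately
# repeated in $text, or undef if there is none.
sub doubled_word {
    my ($text) = @_;
    return $text =~ /\b(?<word>\w+)\s+\k<word>\b/ ? $+{word} : undef;
}

my $found   = doubled_word('this is is a test');
my $nothing = doubled_word('no repeats here');
```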
	869
	870	=item C<(?{ code })>
	871	X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
	872
	873	B<WARNING>: This extended regular expression feature is considered
	874	experimental, and may be changed without notice. Code executed that
	875	has side effects may not perform identically from version to version
	876	due to the effect of future optimisations in the regex engine.
	877
	878	This zero-width assertion evaluates any embedded Perl code. It
	879	always succeeds, and its C<code> is not interpolated. Currently,
	880	the rules to determine where the C<code> ends are somewhat convoluted.
	881
	882	This feature can be used together with the special variable C<$^N> to
	883	capture the results of submatches in variables without having to keep
	884	track of the number of nested parentheses. For example:
	885
	886	    $_ = "The brown fox jumps over the lazy dog";
	887	    /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
	888	    print "color = $color, animal = $animal\n";
	889
	890	Inside the C<(?{...})> block, C<$_> refers to the string the regular
	891	expression is matching against. You can also use C<pos()> to know what is
	892	the current position of matching within this string.
	893
	894	The C<code> is properly scoped in the following sense: If the assertion
	895	is backtracked (compare L<"Backtracking">), all changes introduced after
	896	C<local>ization are undone, so that
	897
	898	    $_ = 'a' x 8;
	899	    m<
	900	       (?{ $cnt = 0 })                   # Initialize $cnt.
	901	       (
	902	         a
	903	         (?{
	904	             local $cnt = $cnt + 1;      # Update $cnt, backtracking-safe.
	905	         })
	906	       )*
	907	       aaaa
	908	       (?{ $res = $cnt })                # On success copy to non-localized
	909	                                         # location.
	910	     >x;
	911
	912	will set C<$res = 4>. Note that after the match, C<$cnt> returns to the globally
	913	introduced value, because the scopes that restrict C<local> operators
	914	are unwound.
	915
	916	This assertion may be used as a C<(?(condition)yes-pattern|no-pattern)>
	917	switch. If I<not> used in this way, the result of evaluation of
	918	C<code> is put into the special variable C<$^R>. This happens
	919	immediately, so C<$^R> can be used from other C<(?{ code })> assertions
	920	inside the same regular expression.
	921
	922	The assignment to C<$^R> above is properly localized, so the old
	923	value of C<$^R> is restored if the assertion is backtracked; compare
	924	L<"Backtracking">.
	925
	926	Due to an unfortunate implementation issue, the Perl code contained in these
	927	blocks is treated as a compile time closure that can have seemingly bizarre
	928	consequences when used with lexically scoped variables inside of subroutines
	929	or loops. There are various workarounds for this, including simply using
	930	global variables instead. If you are using this construct and strange results
	931	occur then check for the use of lexically scoped variables.
	932
	933	For reasons of security, this construct is forbidden if the regular
	934	expression involves run-time interpolation of variables, unless the
	935	perilous C<use re 'eval'> pragma has been used (see L<re>), or the
	936	variables contain results of C<qr//> operator (see
	937	L<perlop/"qr/STRING/imosx">).
	938
	939	This restriction is due to the wide-spread and remarkably convenient
	940	custom of using run-time determined strings as patterns. For example:
	941
	942	    $re = <>;
	943	    chomp $re;
	944	    $string =~ /$re/;
	945
	946	Before Perl knew how to execute interpolated code within a pattern,
	947	this operation was completely safe from a security point of view,
	948	although it could raise an exception from an illegal pattern. If
	949	you turn on the C<use re 'eval'>, though, it is no longer secure,
	950	so you should only do so if you are also using taint checking.
	951	Better yet, use the carefully constrained evaluation within a Safe
	952	compartment. See L<perlsec> for details about both these mechanisms.
	953
	954	Because Perl's regex engine is currently not re-entrant, interpolated
	955	code may not invoke the regex engine either directly (with C<m//> or C<s///>)
	956	or indirectly with functions such as C<split>.
	957
	958	=item C<(??{ code })>
	959	X<(??{})>
	960	X<regex, postponed> X<regexp, postponed> X<regular expression, postponed>
	961
	962	B<WARNING>: This extended regular expression feature is considered
	963	experimental, and may be changed without notice. Code executed that
	964	has side effects may not perform identically from version to version
	965	due to the effect of future optimisations in the regex engine.
	966
	967	This is a "postponed" regular subexpression. The C<code> is evaluated
	968	at run time, at the moment this subexpression may match. The result
	969	of evaluation is considered as a regular expression and matched as
	970	if it were inserted instead of this construct. Note that this means
	971	that the contents of capture buffers defined inside an eval'ed pattern
	972	are not available outside of the pattern, and vice versa, there is no
	973	way for the inner pattern to refer to a capture buffer defined outside.
	974	Thus,
	975
	976	    ('a' x 100)=~/(??{'(.)' x 100})/
	977
	978	B<will> match, but it will B<not> set C<$1>.
	979
	980	The C<code> is not interpolated. As before, the rules to determine
	981	where the C<code> ends are currently somewhat convoluted.
	982
	983	The following pattern matches a parenthesized group:
	984
	985	    $re = qr{
	986	               \(
	987	               (?:
	988	                  (?> [^()]+ )    # Non-parens without backtracking
	989	                |
	990	                  (??{ $re })     # Group with matching parens
	991	               )*
	992	               \)
	993	            }x;
	994
	995	See also C<(?PARNO)> for a different, more efficient way to accomplish
	996	the same task.
	997
	998	Because Perl's regex engine is not currently re-entrant, delayed
	999	code may not invoke the regex engine either directly (with C<m//> or C<s///>)
	1000	or indirectly with functions such as C<split>.
	1001
	1002	Recursing deeper than 50 times without consuming any input string will
	1003	result in a fatal error. The maximum depth is compiled into perl, so
	1004	changing it requires a custom build.
	1005
	1006	=item C<(?PARNO)> C<(?-PARNO)> C<(?+PARNO)> C<(?R)> C<(?0)>
	1007	X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
	1008	X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
	1009	X<regex, relative recursion>
	1010
	1011	Similar to C<(??{ code })> except it does not involve compiling any code,
	1012	instead it treats the contents of a capture buffer as an independent
	1013	pattern that must match at the current position. Capture buffers
	1014	contained by the pattern will have the value as determined by the
	1015	outermost recursion.
	1016
	1017	PARNO is a sequence of digits (not starting with 0) whose value reflects
	1018	the paren-number of the capture buffer to recurse to. C<(?R)> recurses to
	1019	the beginning of the whole pattern. C<(?0)> is an alternate syntax for
	1020	C<(?R)>. If PARNO is preceded by a plus or minus sign then it is assumed
	1021	to be relative, with negative numbers indicating preceding capture buffers
	1022	and positive ones following. Thus C<(?-1)> refers to the most recently
	1023	declared buffer, and C<(?+1)> indicates the next buffer to be declared.
	1024	Note that the counting for relative recursion differs from that of
	1025	relative backreferences, in that with recursion unclosed buffers B<are>
	1026	included.
	1027
	1028	The following pattern matches a function foo() which may contain
	1029	balanced parentheses as the argument.
	1030
	1031	    $re = qr{ (                    # paren group 1 (full function)
	1032	                foo
	1033	                (                  # paren group 2 (parens)
	1034	                  \(
	1035	                    (              # paren group 3 (contents of parens)
	1036	                    (?:
	1037	                     (?> [^()]+ )  # Non-parens without backtracking
	1038	                    |
	1039	                     (?2)          # Recurse to start of paren group 2
	1040	                    )*
	1041	                    )
	1042	                  \)
	1043	                )
	1044	              )
	1045	            }x;
	1046
	1047	If the pattern was used as follows
	1048
	1049	    'foo(bar(baz)+baz(bop))'=~/$re/
	1050	        and print "\$1 = $1\n",
	1051	                  "\$2 = $2\n",
	1052	                  "\$3 = $3\n";
	1053
	1054	the output produced should be the following:
	1055
	1056	    $1 = foo(bar(baz)+baz(bop))
	1057	    $2 = (bar(baz)+baz(bop))
	1058	    $3 = bar(baz)+baz(bop)
	1059
	1060	If there is no corresponding capture buffer defined, then it is a
	1061	fatal error. Recursing deeper than 50 times without consuming any input
	1062	string will also result in a fatal error. The maximum depth is compiled
	1063	into perl, so changing it requires a custom build.
	1064
	1065	The following shows how using negative indexing can make it
	1066	easier to embed recursive patterns inside of a C<qr//> construct
	1067	for later use:
	1068
	1069	    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
	1070	    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
	1071	       # do something here...
	1072	    }
	1073
	1074	B<Note> that this pattern does not behave the same way as the equivalent
	1075	PCRE or Python construct of the same form. In Perl you can backtrack into
	1076	a recursed group, in PCRE and Python the recursed into group is treated
	1077	as atomic. Also, modifiers are resolved at compile time, so constructs
	1078	like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will
	1079	be processed.
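
The relative form can be exercised as follows. This is a sketch, assuming
Perl 5.10 or later, with the literal plus sign written as C<\+>; the
helper name C<parse_sum> and the sample inputs are invented for the
illustration.

```perl
use strict;
use warnings;

# A balanced, possibly nested, parenthesised chunk. (?-1) recurses into
# the group it sits in, whatever number that group ends up with once the
# qr// object is embedded in a larger pattern.
my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

# Hypothetical helper: returns both argument lists, or an empty list if
# the expression does not have the expected shape.
sub parse_sum {
    my ($expr) = @_;
    return unless $expr =~ /^ foo $parens \s+ \+ \s+ bar $parens $/x;
    return ($1, $2);
}

my @good = parse_sum('foo(a(b)) + bar(c)');   # both chunks balanced
my @bad  = parse_sum('foo(a(b) + bar(c)');    # first chunk never closes
```

Because the recursion is relative, the same C<$parens> object works both
as group 1 and as group 2 of the enclosing pattern.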
	1080
	1081	=item C<(?&NAME)>
	1082	X<(?&NAME)>
	1083
	1084	Recurse to a named subpattern. Identical to C<(?PARNO)> except that the
	1085	parenthesis to recurse to is determined by name. If multiple parentheses have
	1086	the same name, then it recurses to the leftmost.
	1087
	1088	It is an error to refer to a name that is not declared somewhere in the
	1089	pattern.
	1090
	1091	B<NOTE:> In order to make things easier for programmers with experience
	1092	with the Python or PCRE regex engines the pattern C<< (?P>NAME) >>
	1093	may be used instead of C<< (?&NAME) >> in Perl 5.10 or later.
	1094
	1095	=item C<(?(condition)yes-pattern|no-pattern)>
	1096	X<(?()>
	1097
	1098	=item C<(?(condition)yes-pattern)>
	1099
	1100	Conditional expression. C<(condition)> should be either an integer in
	1101	parentheses (which is valid if the corresponding pair of parentheses
	1102	matched), a look-ahead/look-behind/evaluate zero-width assertion, a
	1103	name in angle brackets or single quotes (which is valid if a buffer
	1104	with the given name matched), or the special symbol (R) (true when
	1105	evaluated inside of recursion or eval). Additionally the R may be
	1106	followed by a number, (which will be true when evaluated when recursing
	1107	inside of the appropriate group), or by C<&NAME>, in which case it will
	1108	be true only when evaluated during recursion in the named group.
	1109
	1110	Here's a summary of the possible predicates:
	1111
	1112	=over 4
	1113
	1114	=item (1) (2) ...
	1115
	1116	Checks if the numbered capturing buffer has matched something.
	1117
	1118	=item (<NAME>) ('NAME')
	1119
	1120	Checks if a buffer with the given name has matched something.
	1121
	1122	=item (?{ CODE })
	1123
	1124	Treats the code block as the condition.
	1125
	1126	=item (R)
	1127
	1128	Checks if the expression has been evaluated inside of recursion.
	1129
	1130	=item (R1) (R2) ...
	1131
	1132	Checks if the expression has been evaluated while executing directly
	1133	inside of the n-th capture group. This check is the regex equivalent of
	1134
	1135	    if ((caller(0))[3] eq 'subname') { ... }
	1136
	1137	In other words, it does not check the full recursion stack.
	1138
	1139	=item (R&NAME)
	1140
	1141	Similar to C<(R1)>, this predicate checks to see if we're executing
	1142	directly inside of the leftmost group with a given name (this is the same
	1143	logic used by C<(?&NAME)> to disambiguate). It does not check the full
	1144	stack, but only the name of the innermost active recursion.
	1145
	1146	=item (DEFINE)
	1147
	1148	In this case, the yes-pattern is never directly executed, and no
	1149	no-pattern is allowed. Similar in spirit to C<(?{0})> but more efficient.
	1150	See below for details.
	1151
	1152	=back
	1153
	1154	For example:
	1155
	1156	    m{ ( \( )?
	1157	       [^()]+
	1158	       (?(1) \) )
	1159	     }x
	1160
	1161	matches a chunk of non-parentheses, possibly included in parentheses
	1162	themselves.
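
A short sketch shows the conditional at work. It assumes the pattern is
anchored at both ends, which the example above does not do, and the
helper name C<is_chunk> is hypothetical.

```perl
use strict;
use warnings;

# Same pattern as above, anchored so the whole string must be the chunk.
my $chunk = qr{ ^ ( \( )? [^()]+ (?(1) \) ) $ }x;

# Hypothetical helper: true if $string is a run of non-parentheses,
# optionally wrapped in one matching pair of parentheses.
sub is_chunk { return $_[0] =~ $chunk ? 1 : 0 }

my @results = map { is_chunk($_) } ('abc', '(abc)', '(abc', 'abc)');
```

The closing parenthesis is required only when group 1 took part in the
match, so the first two strings are accepted and the last two rejected.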
	1163
	1164	A special form is the C<(DEFINE)> predicate, which never directly executes
	1165	its yes-pattern, and does not allow a no-pattern. This allows you to define
	1166	subpatterns which will be executed only by using the recursion mechanism.
	1167	This way, you can define a set of regular expression rules that can be
	1168	bundled into any pattern you choose.
	1169
	1170	It is recommended that for this usage you put the DEFINE block at the
	1171	end of the pattern, and that you name any subpatterns defined within it.
	1172
	1173	Also, it's worth noting that patterns defined this way probably will
	1174	not be as efficient, as the optimiser is not very clever about
	1175	handling them.
	1176
	1177	An example of how this might be used is as follows:
	1178
	1179	    /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
	1180	     (?(DEFINE)
	1181	       (?<NAME_PAT>....)
	1182	       (?<ADDRESS_PAT>....)
	1183	     )/x
	1184
	1185	Note that capture buffers matched inside of recursion are not accessible
	1186	after the recursion returns, so the extra layer of capturing buffers is
	1187	necessary. Thus C<$+{NAME_PAT}> would not be defined even though
	1188	C<$+{NAME}> would be.
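
A concrete, deliberately tiny instance of this layout is sketched below,
assuming Perl 5.10 or later. The rule names C<KEY_PAT> and C<VALUE_PAT>,
the C<key=value> input format, and the helper C<parse_pair> are all
made up for the illustration.

```perl
use strict;
use warnings;

my $pair = qr{
    ^ (?<KEY> (?&KEY_PAT) ) = (?<VALUE> (?&VALUE_PAT) ) $
    (?(DEFINE)
        (?<KEY_PAT>   [A-Za-z_]\w* )   # rule: an identifier
        (?<VALUE_PAT> \d+ )            # rule: an unsigned integer
    )
}x;

# Hypothetical helper: returns (key, value), or an empty list on failure.
sub parse_pair {
    my ($line) = @_;
    return unless $line =~ $pair;
    return ($+{KEY}, $+{VALUE});
}

my @ok  = parse_pair('width=42');
my @bad = parse_pair('9lives=1');   # key does not satisfy KEY_PAT
```

The outer C<KEY> and C<VALUE> groups are the extra capturing layer that
makes the text matched by the rules available after the match.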
	1189
	1190	=item C<< (?>pattern) >>
	1191	X<backtrack> X<backtracking> X<atomic> X<possessive>
	1192
	1193	An "independent" subexpression, one which matches the substring
	1194	that a I<standalone> C<pattern> would match if anchored at the given
	1195	position, and it matches I<nothing other than this substring>. This
	1196	construct is useful for optimizations of what would otherwise be
	1197	"eternal" matches, because it will not backtrack (see L<"Backtracking">).
	1198	It may also be useful in places where the "grab all you can, and do not
	1199	give anything back" semantic is desirable.
	1200
	1201	For example: C<< ^(?>a*)ab >> will never match, since C<< (?>a*) >>
	1202	(anchored at the beginning of string, as above) will match I<all>
	1203	characters C<a> at the beginning of string, leaving no C<a> for
	1204	C<ab> to match. In contrast, C<a*ab> will match the same as C<a+b>,
	1205	since the match of the subgroup C<a*> is influenced by the following
	1206	group C<ab> (see L<"Backtracking">). In particular, C<a*> inside
	1207	C<a*ab> will match fewer characters than a standalone C<a*>, since
	1208	this makes the tail match.
	1209
	1210	An effect similar to C<< (?>pattern) >> may be achieved by writing
	1211	C<(?=(pattern))\1>. The look-ahead matches the same substring as a standalone
	1212	C<pattern>, and the following C<\1> eats the matched string; it therefore
	1213	makes a zero-length assertion into an analogue of C<< (?>...) >>.
	1214	(The difference between these two constructs is that the second one
	1215	uses a capturing group, thus shifting ordinals of backreferences
	1216	in the rest of a regular expression.)
	1217
	1218	Consider this pattern:
	1219
	1220	    m{ \(
	1221	          (
	1222	            [^()]+              # x+
	1223	          |
	1224	            \( [^()]* \)
	1225	          )+
	1226	       \)
	1227	     }x
	1228
	1229	That will efficiently match a nonempty group with matching parentheses
	1230	two levels deep or less. However, if there is no such group, it
	1231	will take virtually forever on a long string. That's because there
	1232	are so many different ways to split a long string into several
	1233	substrings. This is what C<(.+)+> is doing, and C<(.+)+> is similar
	1234	to a subpattern of the above pattern. Consider how the pattern
	1235	above detects no-match on C<((()aaaaaaaaaaaaaaaaaa> in several
	1236	seconds, but that each extra letter doubles this time. This
	1237	exponential performance will make it appear that your program has
	1238	hung. However, a tiny change to this pattern
	1239
	1240	    m{ \(
	1241	          (
	1242	            (?> [^()]+ )        # change x+ above to (?> x+ )
	1243	          |
	1244	            \( [^()]* \)
	1245	          )+
	1246	       \)
	1247	     }x
	1248
	1249	which uses C<< (?>...) >> matches exactly when the one above does (verifying
	1250	this yourself would be a productive exercise), but finishes in a quarter of
	1251	the time when used on a similar string with 1000000 C<a>s. Be aware,
	1252	however, that this pattern currently triggers a warning message under
	1253	the C<use warnings> pragma or B<-w> switch saying it
	1254	C<"matches null string many times in regex">.
	1255
	1256	On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
	1257	effect may be achieved by negative look-ahead, as in C<[^()]+ (?! [^()] )>.
	1258	This was only 4 times slower on a string with 1000000 C<a>s.
	1259
	1260	The "grab all you can, and do not give anything back" semantic is desirable
	1261	in many situations where at first sight a simple C<()*> looks like
	1262	the correct solution. Suppose we parse text with comments being delimited
	1263	by C<#> followed by some optional (horizontal) whitespace. Contrary to
	1264	its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
	1265	the comment delimiter, because it may "give up" some whitespace if
	1266	the remainder of the pattern can be made to match that way. The correct
	1267	answer is either one of these:
	1268
	1269	    (?>#[ \t]*)
	1270	    #[ \t]*(?![ \t])
	1271
	1272	For example, to grab non-empty comments into $1, one should use either
	1273	one of these:
	1274
	1275	    / (?> \# [ \t]* ) ( .+ ) /x;
	1276	    / \# [ \t]* ( [^ \t] .* ) /x;
	1277
	1278	Which one you pick depends on which of these expressions better reflects
	1279	the above specification of comments.
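
The way the naive delimiter "gives up" whitespace shows on a comment that
contains nothing but blanks. This is a sketch only; the sample line and
variable names are invented for the illustration.

```perl
use strict;
use warnings;

my $line = "#   ";   # a '#' followed only by blanks: an empty comment

# Naive delimiter: [ \t]* hands back one blank so that ( .+ ) can match.
my $naive  = $line =~ / \# [ \t]* ( .+ ) /x
           ? "matched '$1'" : 'no match';

# Atomic delimiter: the blanks are never handed back, so ( .+ ) fails.
my $atomic = $line =~ / (?> \# [ \t]* ) ( .+ ) /x
           ? "matched '$1'" : 'no match';
```

Here C<$naive> reports a one-blank "comment", while C<$atomic> correctly
reports no match.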
	1280
	1281	In some literature this construct is called "atomic matching" or
	1282	"possessive matching".
	1283
	1284	Possessive quantifiers are equivalent to putting the item they are applied
	1285	to inside of one of these constructs. The following equivalences apply:
	1286
	1287	    Quantifier Form     Bracketing Form
	1288	    ---------------     ---------------
	1289	    PAT*+               (?>PAT*)
	1290	    PAT++               (?>PAT+)
	1291	    PAT?+               (?>PAT?)
	1292	    PAT{min,max}+       (?>PAT{min,max})
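
The first row of the table can be checked against an ordinary greedy
quantifier. This is a minimal sketch, assuming Perl 5.10 or later for the
possessive form; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Greedy: a* first takes "aaa", then gives one "a" back so "ab" can match.
my $greedy     = 'aaab' =~ /^a*ab/     ? 1 : 0;

# Possessive: a*+ takes "aaa" and never gives anything back.
my $possessive = 'aaab' =~ /^a*+ab/    ? 1 : 0;

# Atomic group: the bracketing form of the same thing.
my $atomic     = 'aaab' =~ /^(?>a*)ab/ ? 1 : 0;
```

Only the greedy form matches; the possessive and atomic forms fail in
the same way.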
	1293
	1294	=back
	1295
	1296	=head2 Special Backtracking Control Verbs
	1297
	1298	B<WARNING:> These patterns are experimental and subject to change or
	1299	removal in a future version of Perl. Their usage in production code should
	1300	be noted to avoid problems during upgrades.
	1301
	1302	These special patterns are generally of the form C<(*VERB:ARG)>. Unless
	1303	otherwise stated the ARG argument is optional; in some cases, it is
	1304	forbidden.
	1305
	1306	Any pattern containing a special backtracking verb that allows an argument
	1307	has the special behaviour that when executed it sets the current package's
	1308	C<$REGERROR> and C<$REGMARK> variables. When doing so the following
	1309	rules apply:
	1310
	1311	On failure, the C<$REGERROR> variable will be set to the ARG value of the
	1312	verb pattern, if the verb was involved in the failure of the match. If the
	1313	ARG part of the pattern was omitted, then C<$REGERROR> will be set to the
	1314	name of the last C<(*MARK:NAME)> pattern executed, or to TRUE if there was
	1315	none. Also, the C<$REGMARK> variable will be set to FALSE.
	1316
	1317	On a successful match, the C<$REGERROR> variable will be set to FALSE, and
	1318	the C<$REGMARK> variable will be set to the name of the last
	1319	C<(*MARK:NAME)> pattern executed. See the explanation for the
	1320	C<(*MARK:NAME)> verb below for more details.
	1321
	1322	B<NOTE:> C<$REGERROR> and C<$REGMARK> are not magic variables like C<$1>
	1323	and most other regex related variables. They are not local to a scope, nor
	1324	readonly, but instead are volatile package variables similar to C<$AUTOLOAD>.
	1325	Use C<local> to localize changes to them to a specific scope if necessary.
	1326
	1327	If a pattern does not contain a special backtracking verb that allows an
	1328	argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
	1329
	1330	=over 4
	1331
	1332	=item Verbs that take an argument
	1333
	1334	=over 4
	1335
	1336	=item C<(*PRUNE)> C<(*PRUNE:NAME)>
	1337	X<(*PRUNE)> X<(*PRUNE:NAME)> X<\v>
	1338
	1339	This zero-width pattern prunes the backtracking tree at the current point
	1340	when backtracked into on failure. Consider the pattern C<A (*PRUNE) B>,
	1341	where A and B are complex patterns. Until the C<(*PRUNE)> verb is reached,
	1342	A may backtrack as necessary to match. Once it is reached, matching
	1343	continues in B, which may also backtrack as necessary; however, should B
	1344	not match, then no further backtracking will take place, and the pattern
	1345	will fail outright at the current starting position.
	1346
	1347	As a shortcut, C<\v> is exactly equivalent to C<(*PRUNE)>.
	1348
	1349	The following example counts all the possible matching strings in a
	1350	pattern (without actually matching any of them).
	1351
	1352	    'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
	1353	    print "Count=$count\n";
	1354
	1355	which produces:
	1356
	1357	    aaab
	1358	    aaa
	1359	    aa
	1360	    a
	1361	    aab
	1362	    aa
	1363	    a
	1364	    ab
	1365	    a
	1366	    Count=9
	1367
	1368	If we add a C<(*PRUNE)> before the count like the following
	1369
	1370	    'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
	1371	    print "Count=$count\n";
	1372
	1373	we prevent backtracking and find the count of the longest matching string
	1374	at each matching starting point like so:
	1375
	1376	    aaab
	1377	    aab
	1378	    ab
	1379	    Count=3
	1380
	1381	Any number of C<(*PRUNE)> assertions may be used in a pattern.
	1382
	1383	See also C<< (?>pattern) >> and possessive quantifiers for other ways to
	1384	control backtracking. In some cases, the use of C<(*PRUNE)> can be
	1385	replaced with a C<< (?>pattern) >> with no functional difference; however,
	1386	C<(*PRUNE)> can be used to handle cases that cannot be expressed using a
	1387	C<< (?>pattern) >> alone.
	1388
	1389
	1390	=item C<(*SKIP)> C<(*SKIP:NAME)>
	1391	X<(*SKIP)>
	1392
	1393	This zero-width pattern is similar to C<(*PRUNE)>, except that on
	1394	failure it also signifies that whatever text was matched leading up
	1395	to the C<(*SKIP)> pattern being executed cannot be part of I<any> match
	1396	of this pattern. This effectively means that the regex engine "skips" forward
	1397	to this position on failure and tries to match again (assuming that
	1398	there is sufficient room to match).
	1399
	1400	As a shortcut C<\V> is exactly equivalent to C<(*SKIP)>.
	1401
	1402	The name of the C<(*SKIP:NAME)> pattern has special significance. If a
	1403	C<(*MARK:NAME)> was encountered while matching, then it is that position
	1404	which is used as the "skip point". If no C<(*MARK)> of that name was
	1405	encountered, then the C<(*SKIP)> operator has no effect. When used
	1406	without a name the "skip point" is where the match point was when
	1407	executing the (*SKIP) pattern.
	1408
	1409	Compare the following to the examples in C<(*PRUNE)> (note that the string
	1410	is twice as long):
	1411
	1412	    'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
	1413	    print "Count=$count\n";
	1414
	1415	outputs
	1416
	1417	    aaab
	1418	    aaab
	1419	    Count=2
	1420
	1421	Once the 'aaab' at the start of the string has matched, and the C<(*SKIP)>
	1422	executed, the next startpoint will be where the cursor was when the
	1423	C<(*SKIP)> was executed.
	1424
	1425	=item C<(*MARK:NAME)> C<(*:NAME)>
	1426	X<(*MARK)> C<(*MARK:NAME)> C<(*:NAME)>
	1427
	1428	This zero-width pattern can be used to mark the point reached in a string
	1429	when a certain part of the pattern has been successfully matched. This
	1430	mark may be given a name. A later C<(*SKIP)> pattern will then skip
	1431	forward to that point if backtracked into on failure. Any number of
	1432	C<(*MARK)> patterns are allowed, and the NAME portion is optional and may
	1433	be duplicated.
	1434
	1435	In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:NAME)>
	1436	can be used to "label" a pattern branch, so that after matching, the
	1437	program can determine which branches of the pattern were involved in the
	1438	match.
	1439
	1440	When a match is successful, the C<$REGMARK> variable will be set to the
	1441	name of the most recently executed C<(*MARK:NAME)> that was involved
	1442	in the match.
	1443
	1444	This can be used to determine which branch of a pattern was matched
	1445	without using a separate capture buffer for each branch, which in turn
	1446	can result in a performance improvement, as perl cannot optimize
	1447	C</(?:(x)|(y)|(z))/> as efficiently as something like
	1448	C</(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/>.
	1449
	1450	When a match has failed, and unless another verb has been involved in
	1451	failing the match and has provided its own name to use, the C<$REGERROR>
	1452	variable will be set to the name of the most recently executed
	1453	C<(*MARK:NAME)>.
	1454
	1455	See C<(*SKIP)> for more details.
	1456
	1457	As a shortcut C<(*MARK:NAME)> can be written C<(*:NAME)>.
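
Branch labelling with C<$REGMARK> might look like the sketch below. It
assumes a Perl that supports these experimental verbs (5.10 or later) and
that the code runs in the package where C<$REGMARK> is declared; the
helper name C<branch_taken> is hypothetical.

```perl
use strict;
use warnings;

# $REGMARK is an ordinary package variable, so it must be declared
# under "use strict".
our $REGMARK;

# Hypothetical helper: returns the label of the branch that matched,
# or undef if the string did not match at all.
sub branch_taken {
    my ($string) = @_;
    return $string =~ /(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/
         ? $REGMARK
         : undef;
}

my $label = branch_taken('y');
my $none  = branch_taken('q');
```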
	1458
	1459	=item C<(*THEN)> C<(*THEN:NAME)>
	1460
	1461	This is similar to the "cut group" operator C<::> from Perl6. Like
	1462	C<(*PRUNE)>, this verb always matches, and when backtracked into on
	1463	failure, it causes the regex engine to try the next alternation in the
	1464	innermost enclosing group (capturing or otherwise).
	1465
	1466	Its name comes from the observation that this operation combined with the
	1467	alternation operator (C<|>) can be used to create what is essentially a
	1468	pattern-based if/then/else block:
	1469
	1470	    ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
	1471
	1472	Note that if this operator is used and NOT inside of an alternation then
	1473	it acts exactly like the C<(*PRUNE)> operator.
	1474
	1475	    / A (*PRUNE) B /
	1476
	1477	is the same as
	1478
	1479	    / A (*THEN) B /
	1480
	1481	but
	1482
	1483	    / ( A (*THEN) B | C (*THEN) D ) /
	1484
	1485	is not the same as
	1486
	1487	    / ( A (*PRUNE) B | C (*PRUNE) D ) /
	1488
	1489	as after matching the A but failing on the B the C<(*THEN)> verb will
	1490	backtrack and try C; but the C<(*PRUNE)> verb will simply fail.
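
The difference can be observed on a two-character string. This is only a
sketch, assuming a Perl that supports these experimental verbs; the
patterns and variable names are invented for the illustration.

```perl
use strict;
use warnings;

# After "a" matches and "b" fails, (*THEN) moves on to the "ac" branch.
my $with_then  = 'ac' =~ /^(?: a (*THEN)  b | ac )$/x ? 1 : 0;

# With (*PRUNE) the failure of "b" ends the attempt at this starting
# position, and the ^ anchor rules out every other starting position.
my $with_prune = 'ac' =~ /^(?: a (*PRUNE) b | ac )$/x ? 1 : 0;
```

Under those assumptions C<$with_then> is 1 and C<$with_prune> is 0.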
	1491
	1492	=item C<(*COMMIT)>
	1493	X<(*COMMIT)>
	1494
	1495	This is the Perl6 "commit pattern" C<< <commit> >> or C<:::>. It's a
	1496	zero-width pattern similar to C<(*SKIP)>, except that when backtracked
	1497	into on failure it causes the match to fail outright. No further attempts
	1498	to find a valid match by advancing the start pointer will occur again.
	1499	For example,
	1500
	1501	    'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
	1502	    print "Count=$count\n";
	1503
	1504	outputs
	1505
	1506	    aaab
	1507	    Count=1
	1508
	1509	In other words, once the C<(*COMMIT)> has been entered, and if the pattern
	1510	does not match, the regex engine will not try any further matching on the
	1511	rest of the string.
	1512
	1513	=back
	1514
	1515	=item Verbs without an argument
	1516
	1517	=over 4
	1518
	1519	=item C<(*FAIL)> C<(*F)>
	1520	X<(*FAIL)> X<(*F)>
	1521
	1522	This pattern matches nothing and always fails. It can be used to force the
	1523	engine to backtrack. It is equivalent to C<(?!)>, but easier to read. In
	1524	fact, C<(?!)> gets optimised into C<(*FAIL)> internally.
	1525
	1526	It is probably useful only when combined with C<(?{})> or C<(??{})>.
	1527
	1528	=item C<(*ACCEPT)>
	1529	X<(*ACCEPT)>
	1530
	1531	B<WARNING:> This feature is highly experimental. It is not recommended
	1532	for production code.
	1533
	1534	This pattern matches nothing and causes the end of successful matching at
	1535	the point at which the C<(*ACCEPT)> pattern was encountered, regardless of
	1536	whether there is actually more to match in the string. When inside of a
	1537	nested pattern, such as recursion, or in a subpattern dynamically generated
	1538	via C<(??{})>, only the innermost pattern is ended immediately.
	1539
	1540	If the C<(*ACCEPT)> is inside of capturing buffers then the buffers are
	1541	marked as ended at the point at which the C<(*ACCEPT)> was encountered.
	1542	For instance:
	1543
	1544	    'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
	1545
	1546	will match, and C<$1> will be C<AB> and C<$2> will be C<B>, C<$3> will not
	1547	be set. If another branch in the inner parentheses were matched, such as in the
	1548	string 'ACDE', then the C<D> and C<E> would have to be matched as well.
	1549
	1550	=back
	1551
	1552	=back
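
As a quick check of the verbs described above, the following sketch counts how
often a code block is reached with and without C<(*COMMIT)>, and then inspects
the captures left behind by C<(*ACCEPT)>. The variable names are illustrative
only, and the exact count without C<(*COMMIT)> is deliberately not asserted,
since it depends on how many ways the engine finds to retry.

```perl
use strict;
use warnings;

# Count how many times the code block is reached before the match gives up.
my ($with_commit, $without_commit) = (0, 0);
'aaabaaab' =~ /a+b?(*COMMIT)(?{ $with_commit++ })(*FAIL)/;
'aaabaaab' =~ /a+b?(?{ $without_commit++ })(*FAIL)/;
print "with (*COMMIT): $with_commit, without: $without_commit\n";

# (*ACCEPT) ends the match early; the enclosing groups are closed where it was hit.
my @captures;
if ('AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x) {
    @captures = ($1, $2, $3);
}
print "first: $captures[0], second: $captures[1]\n";
```

With C<(*COMMIT)> the block runs once; without it the engine backtracks and
advances the start position, so the block runs several times.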
	1553
	1554	=head2 Backtracking
	1555	X<backtrack> X<backtracking>
	1556
	1557	NOTE: This section presents an abstract approximation of regular
	1558	expression behavior. For a more rigorous (and complicated) view of
	1559	the rules involved in selecting a match among possible alternatives,
	1560	see L<Combining RE Pieces>.
	1561
	1562	A fundamental feature of regular expression matching involves the
	1563	notion called I<backtracking>, which is currently used (when needed)
	1564	by all non-possessive regular expression quantifiers, namely C<*>, C<*?>, C<+>,
	1565	C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often optimized
	1566	internally, but the general principle outlined here is valid.
	1567
	1568	For a regular expression to match, the I<entire> regular expression must
	1569	match, not just part of it. So if the beginning of a pattern containing a
	1570	quantifier succeeds in a way that causes later parts in the pattern to
	1571	fail, the matching engine backs up and recalculates the beginning
	1572	part--that's why it's called backtracking.
	1573
	1574	Here is an example of backtracking: Let's say you want to find the
	1575	word following "foo" in the string "Food is on the foo table.":
	1576
	1577	    $_ = "Food is on the foo table.";
	1578	    if ( /\b(foo)\s+(\w+)/i ) {
	1579	        print "$2 follows $1.\n";
	1580	    }
	1581
	1582	When the match runs, the first part of the regular expression (C<\b(foo)>)
	1583	finds a possible match right at the beginning of the string, and loads up
	1584	$1 with "Foo". However, as soon as the matching engine sees that there's
	1585	no whitespace following the "Foo" that it had saved in $1, it realizes its
	1586	mistake and starts over again one character after where it had the
	1587	tentative match. This time it goes all the way until the next occurrence
	1588	of "foo". The complete regular expression matches this time, and you get
	1589	the expected output of "table follows foo."
	1590
	1591	Sometimes minimal matching can help a lot. Imagine you'd like to match
	1592	everything between "foo" and "bar". Initially, you write something
	1593	like this:
	1594
	1595	    $_ = "The food is under the bar in the barn.";
	1596	    if ( /foo(.*)bar/ ) {
	1597	        print "got <$1>\n";
	1598	    }
	1599
	1600	Which perhaps unexpectedly yields:
	1601
	1602	    got <d is under the bar in the >
	1603
	1604	That's because C<.*> was greedy, so you get everything between the
	1605	I<first> "foo" and the I<last> "bar". Here it's more effective
	1606	to use minimal matching to make sure you get the text between a "foo"
	1607	and the first "bar" thereafter.
	1608
	1609	    if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
	1610	    got <d is under the >
	1611
	1612	Here's another example. Let's say you'd like to match a number at the end
	1613	of a string, and you also want to keep the preceding part of the match.
	1614	So you write this:
	1615
	1616	    $_ = "I have 2 numbers: 53147";
	1617	    if ( /(.*)(\d*)/ ) { # Wrong!
	1618	        print "Beginning is <$1>, number is <$2>.\n";
	1619	    }
	1620
	1621	That won't work at all, because C<.*> was greedy and gobbled up the
	1622	whole string. As C<\d*> can match on an empty string the complete
	1623	regular expression matched successfully.
	1624
	1625	    Beginning is <I have 2 numbers: 53147>, number is <>.
	1626
	1627	Here are some variants, most of which don't work:
	1628
	1629	    $_ = "I have 2 numbers: 53147";
	1630	    @pats = qw{
	1631	        (.*)(\d*)
	1632	        (.*)(\d+)
	1633	        (.*?)(\d*)
	1634	        (.*?)(\d+)
	1635	        (.*)(\d+)$
	1636	        (.*?)(\d+)$
	1637	        (.*)\b(\d+)$
	1638	        (.*\D)(\d+)$
	1639	    };
	1640
	1641	    for $pat (@pats) {
	1642	        printf "%-12s ", $pat;
	1643	        if ( /$pat/ ) {
	1644	            print "<$1> <$2>\n";
	1645	        } else {
	1646	            print "FAIL\n";
	1647	        }
	1648	    }
	1649
	1650	That will print out:
	1651
	1652	    (.*)(\d*)    <I have 2 numbers: 53147> <>
	1653	    (.*)(\d+)    <I have 2 numbers: 5314> <7>
	1654	    (.*?)(\d*)   <> <>
	1655	    (.*?)(\d+)   <I have > <2>
	1656	    (.*)(\d+)$   <I have 2 numbers: 5314> <7>
	1657	    (.*?)(\d+)$  <I have 2 numbers: > <53147>
	1658	    (.*)\b(\d+)$ <I have 2 numbers: > <53147>
	1659	    (.*\D)(\d+)$ <I have 2 numbers: > <53147>
	1660
	1661	As you see, this can be a bit tricky. It's important to realize that a
	1662	regular expression is merely a set of assertions that gives a definition
	1663	of success. There may be 0, 1, or several different ways that the
	1664	definition might succeed against a particular string. And if there are
	1665	multiple ways it might succeed, you need to understand backtracking to
	1666	know which variety of success you will achieve.
	1667
	1668	When using look-ahead assertions and negations, this can all get even
	1669	trickier. Imagine you'd like to find a sequence of non-digits not
	1670	followed by "123". You might try to write that as
	1671
	1672	    $_ = "ABC123";
	1673	    if ( /^\D*(?!123)/ ) { # Wrong!
	1674	        print "Yup, no 123 in $_\n";
	1675	    }
	1676
	1677	But that isn't going to match; at least, not the way you're hoping. It
	1678	claims that there is no 123 in the string. Here's a clearer picture of
	1679	why that pattern matches, contrary to popular expectations:
	1680
	1681	    $x = 'ABC123';
	1682	    $y = 'ABC445';
	1683
	1684	    print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
	1685	    print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
	1686
	1687	    print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
	1688	    print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
	1689
	1690	This prints
	1691
	1692	    2: got ABC
	1693	    3: got AB
	1694	    4: got ABC
	1695
	1696	You might have expected test 3 to fail because it seems to be a more
	1697	general-purpose version of test 1. The important difference between
	1698	them is that test 3 contains a quantifier (C<\D*>) and so can use
	1699	backtracking, whereas test 1 will not. What's happening is
	1700	that you've asked "Is it true that at the start of $x, following 0 or more
	1701	non-digits, you have something that's not 123?" If the pattern matcher had
	1702	let C<\D*> expand to "ABC", this would have caused the whole pattern to
	1703	fail.
	1704
	1705	The search engine will initially match C<\D*> with "ABC". Then it will
	1706	try to match C<(?!123)> with "123", which fails. But because
	1707	a quantifier (C<\D*>) has been used in the regular expression, the
	1708	search engine can backtrack and retry the match differently
	1709	in the hope of matching the complete regular expression.
	1710
	1711	The pattern really, I<really> wants to succeed, so it uses the
	1712	standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this
	1713	time. Now there's indeed something following "AB" that is not
	1714	"123". It's "C123", which suffices.
	1715
	1716	We can deal with this by using both an assertion and a negation.
	1717	We'll say that the first part in $1 must be followed both by a digit
	1718	and by something that's not "123". Remember that the look-aheads
	1719	are zero-width expressions--they only look, but don't consume any
	1720	of the string in their match. So rewriting this way produces what
	1721	you'd expect; that is, case 5 will fail, but case 6 succeeds:
	1722
	1723	    print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
	1724	    print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
	1725
	1726	    6: got ABC
	1727
	1728	In other words, the two zero-width assertions next to each other work as though
	1729	they're ANDed together, just as you'd use any built-in assertions: C</^$/>
	1730	matches only if you're at the beginning of the line AND the end of the
	1731	line simultaneously. The deeper underlying truth is that juxtaposition in
	1732	regular expressions always means AND, except when you write an explicit OR
	1733	using the vertical bar. C</ab/> means match "a" AND (then) match "b",
	1734	although the attempted matches are made at different positions because "a"
	1735	is not a zero-width assertion, but a one-width assertion.
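
The ANDing of adjacent zero-width assertions can be seen in a small sketch.
The pattern and test strings below are made up for illustration: two
look-aheads placed side by side at the start of the string must both hold for
the match to succeed.

```perl
use strict;
use warnings;

# Both look-aheads are tested at the same position (the start of the string),
# so the match succeeds only if the string holds a digit AND a lowercase letter.
my $both = qr/^(?=.*\d)(?=.*[a-z])/;

my @results = map { /$both/ ? 1 : 0 } ('abc1', 'abc', '123');
print "@results\n";    # prints "1 0 0"
```

Neither look-ahead consumes any text, which is why the order in which the two
are written does not change which strings match.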
	1736
	1737	B<WARNING>: Particularly complicated regular expressions can take
	1738	exponential time to solve because of the immense number of possible
	1739	ways they can use backtracking to try for a match. For example, without
	1740	internal optimizations done by the regular expression engine, this will
	1741	take a painfully long time to run:
	1742
	1743	    'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
	1744
	1745	And if you used C<*>'s in the internal groups instead of limiting them
	1746	to 0 through 5 matches, then it would take forever--or until you ran
	1747	out of stack space. Moreover, these internal optimizations are not
	1748	always applicable. For example, if you put C<{0,5}> instead of C<*>
	1749	on the external group, no current optimization is applicable, and the
	1750	match takes a long time to finish.
	1751
	1752	A powerful tool for optimizing such beasts is what is known as an
	1753	"independent group",
	1754	which does not backtrack (see L<C<< (?>pattern) >>>). Note also that
	1755	zero-length look-ahead/look-behind assertions will not backtrack to make
	1756	the tail match, since they are in "logical" context: only
	1757	whether they match is considered relevant. For an example
	1758	where side-effects of look-ahead I<might> have influenced the
	1759	following match, see L<C<< (?>pattern) >>>.
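
A minimal sketch of the difference, using a made-up string: an ordinary
quantifier gives characters back when the rest of the pattern needs them,
whereas the same quantifier inside an independent group does not.

```perl
use strict;
use warnings;

# The ordinary quantifier gives back one "a" so that the trailing "ab" can match.
my $plain = 'aaab' =~ /a+ab/ ? 1 : 0;

# Inside (?>...) the "a+" keeps everything it grabbed, so "ab" can never match.
my $independent = 'aaab' =~ /(?>a+)ab/ ? 1 : 0;

print "plain: $plain, independent: $independent\n";    # prints "plain: 1, independent: 0"
```

Because the group never yields characters, the engine has far fewer
combinations to explore, which is what makes it useful for taming
pathological patterns.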
	1760
	1761	=head2 Version 8 Regular Expressions
	1762	X<regular expression, version 8> X<regex, version 8> X<regexp, version 8>
	1763
	1764	In case you're not familiar with the "regular" Version 8 regex
	1765	routines, here are the pattern-matching rules not described above.
	1766
	1767	Any single character matches itself, unless it is a I<metacharacter>
	1768	with a special meaning described here or above. You can cause
	1769	characters that normally function as metacharacters to be interpreted
	1770	literally by prefixing them with a "\" (e.g., "\." matches a ".", not any
	1771	character; "\\" matches a "\"). This escape mechanism is also required
	1772	for the character used as the pattern delimiter.
	1773
	1774	A series of characters matches that series of characters in the target
	1775	string, so the pattern C<blurfl> would match "blurfl" in the target
	1776	string.
	1777
	1778	You can specify a character class, by enclosing a list of characters
	1779	in C<[]>, which will match any character from the list. If the
	1780	first character after the "[" is "^", the class matches any character not
	1781	in the list. Within a list, the "-" character specifies a
	1782	range, so that C<a-z> represents all characters between "a" and "z",
	1783	inclusive. If you want either "-" or "]" itself to be a member of a
	1784	class, put it at the start of the list (possibly after a "^"), or
	1785	escape it with a backslash. "-" is also taken literally when it is
	1786	at the end of the list, just before the closing "]". (The
	1787	following all specify the same class of three characters: C<[-az]>,
	1788	C<[az-]>, and C<[a\-z]>. All are different from C<[a-z]>, which
	1789	specifies a class containing twenty-six characters, even on EBCDIC-based
	1790	character sets.) Also, if you try to use the character
	1791	classes C<\w>, C<\W>, C<\s>, C<\S>, C<\d>, or C<\D> as endpoints of
	1792	a range, the "-" is understood literally.
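
The equivalence of the three spellings can be checked with a short sketch.
The candidate characters are chosen only for illustration.

```perl
use strict;
use warnings;

# Three spellings of the same three-character class: "-", "a" and "z".
my @classes = (qr/^[-az]$/, qr/^[az-]$/, qr/^[a\-z]$/);

my @hits_per_class;
for my $class (@classes) {
    my @hits = grep { /$class/ } ('a', 'z', '-', 'b');
    push @hits_per_class, "@hits";
    print "@hits\n";    # prints "a z -" each time
}

# By contrast, [a-z] is a range, so it matches "b" but not "-".
my $range_b    = 'b' =~ /^[a-z]$/ ? 1 : 0;
my $range_dash = '-' =~ /^[a-z]$/ ? 1 : 0;
print "$range_b $range_dash\n";    # prints "1 0"
```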
	1793
	1794	Note also that the whole range idea is rather unportable between
	1795	character sets--and even within character sets they may cause results
	1796	you probably didn't expect. A sound principle is to use only ranges
	1797	that begin from and end at either alphabetics of equal case ([a-e],
	1798	[A-E]), or digits ([0-9]). Anything else is unsafe. If in doubt,
	1799	spell out the character sets in full.
	1800
	1801	Characters may be specified using a metacharacter syntax much like that
	1802	used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
	1803	"\f" a form feed, etc. More generally, \I<nnn>, where I<nnn> is a string
	1804	of octal digits, matches the character whose coded character set value
	1805	is I<nnn>. Similarly, \xI<nn>, where I<nn> are hexadecimal digits,
	1806	matches the character whose numeric value is I<nn>. The expression \cI<x>
	1807	matches the character control-I<x>. Finally, the "." metacharacter
	1808	matches any character except "\n" (unless you use C</s>).
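
The escapes above can be tried out as follows. This is a sketch that assumes
an ASCII-based platform, where "A" has code 65 (octal 101, hexadecimal 41) and
control-I is a tab.

```perl
use strict;
use warnings;

my $octal   = 'A'  =~ /^\101$/ ? 1 : 0;   # octal 101 is "A"
my $hex     = 'A'  =~ /^\x41$/ ? 1 : 0;   # hexadecimal 41 is also "A"
my $control = "\t" =~ /^\cI$/  ? 1 : 0;   # control-I is a tab
my $dot     = "\n" =~ /./      ? 1 : 0;   # "." does not match a newline
my $dot_s   = "\n" =~ /./s     ? 1 : 0;   # unless /s is in effect

print "$octal $hex $control $dot $dot_s\n";    # prints "1 1 1 0 1"
```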
	1809
	1810	You can specify a series of alternatives for a pattern using "|" to
	1811	separate them, so that C<fee|fie|foe> will match any of "fee", "fie",
	1812	or "foe" in the target string (as would C<f(e|i|o)e>). The
	1813	first alternative includes everything from the last pattern delimiter
	1814	("(", "[", or the beginning of the pattern) up to the first "|", and
	1815	the last alternative contains everything from the last "|" to the next
	1816	pattern delimiter. That's why it's common practice to include
	1817	alternatives in parentheses: to minimize confusion about where they
	1818	start and end.
	1819
	1820	Alternatives are tried from left to right, so the first
	1821	alternative found for which the entire expression matches is the one that
	1822	is chosen. This means that alternatives are not necessarily greedy. For
	1823	example, when matching C<foo|foot> against "barefoot", only the "foo"
	1824	part will match, as that is the first alternative tried, and it successfully
	1825	matches the target string. (This might not seem important, but it is
	1826	important when you are capturing matched text using parentheses.)
	1827
	1828	Also remember that "|" is interpreted as a literal within square brackets,
	1829	so if you write C<[fee|fie|foe]> you're really only matching C<[feio|]>.
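
Both points can be illustrated with a brief sketch. The capturing parentheses
are added here only so that the chosen alternative can be inspected.

```perl
use strict;
use warnings;

# The first alternative that lets the whole pattern match wins,
# even when a later alternative would match more text.
my ($short) = 'barefoot' =~ /(foo|foot)/;    # "foo"
my ($long)  = 'barefoot' =~ /(foot|foo)/;    # "foot"

# Inside square brackets "|" is just another character in the class,
# and the class as a whole matches exactly one character.
my $pipe_in_class = '|'   =~ /^[fee|fie|foe]$/ ? 1 : 0;
my $word_in_class = 'fee' =~ /^[fee|fie|foe]$/ ? 1 : 0;

print "$short $long $pipe_in_class $word_in_class\n";    # prints "foo foot 1 0"
```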
	1830
	1831	Within a pattern, you may designate subpatterns for later reference
	1832	by enclosing them in parentheses, and you may refer back to the
	1833	I<n>th subpattern later in the pattern using the metacharacter
	1834	\I<n>. Subpatterns are numbered based on the left to right order
	1835	of their opening parenthesis. A backreference matches whatever
	1836	actually matched the subpattern in the string being examined, not
	1837	the rules for that subpattern. Therefore, C<(0|0x)\d*\s\1\d*> will
	1838	match "0x1234 0x4321", but not "0x1234 01234", because subpattern
	1839	1 matched "0x", even though the rule C<0|0x> could potentially match
	1840	the leading 0 in the second number.
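
A short sketch of this behaviour, using the pattern C<(0|0x)\d*\s\1\d*> and
the two strings discussed above:

```perl
use strict;
use warnings;

# \1 must repeat the text that group 1 actually captured, not the rule 0|0x.
my $pair = qr/(0|0x)\d*\s\1\d*/;

my $same_prefix  = '0x1234 0x4321' =~ $pair ? 1 : 0;
my $captured     = $same_prefix ? $1 : '';
my $mixed_prefix = '0x1234 01234' =~ $pair ? 1 : 0;

print "$same_prefix ($captured) $mixed_prefix\n";    # prints "1 (0x) 0"
```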
	1841
	1842	=head2 Warning on \1 Instead of $1
	1843
	1844	Some people get too used to writing things like:
	1845
	1846	    $pattern =~ s/(\W)/\\\1/g;
	1847
	1848	This is grandfathered for the RHS of a substitute to avoid shocking the
	1849	B<sed> addicts, but it's a dirty habit to get into. That's because in
	1850	PerlThink, the righthand side of an C<s///> is a double-quoted string. C<\1> in
	1851	the usual double-quoted string means a control-A. The customary Unix
	1852	meaning of C<\1> is kludged in for C<s///>. However, if you get into the habit
	1853	of doing that, you get yourself into trouble if you then add an C</e>
	1854	modifier.
	1855
	1856	    s/(\d+)/ \1 + 1 /eg; # causes warning under -w
	1857
	1858	Or if you try to do
	1859
	1860	    s/(\d+)/\1000/;
	1861
	1862	You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with
	1863	C<${1}000>. The operation of interpolation should not be confused
	1864	with the operation of matching a backreference. Certainly they mean two
	1865	different things on the I<left> side of the C<s///>.
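
For comparison, here is a sketch of the same two substitutions written with
C<$1>. The sample strings are invented for the example.

```perl
use strict;
use warnings;

# ${1} delimits the capture variable, so the digits that follow are literal text.
(my $thousands = 'cost: 5') =~ s/(\d+)/${1}000/;

# With /e the replacement is Perl code, where $1 is an ordinary variable.
(my $incremented = 'count: 41') =~ s/(\d+)/$1 + 1/e;

print "$thousands; $incremented\n";    # prints "cost: 5000; count: 42"
```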
	1866
	1867	=head2 Repeated Patterns Matching a Zero-length Substring
	1868
	1869	B<WARNING>: Difficult material (and prose) ahead. This section needs a rewrite.
	1870
	1871	Regular expressions provide a terse and powerful programming language. As
	1872	with most other power tools, power comes together with the ability
	1873	to wreak havoc.
	1874
	1875	A common abuse of this power stems from the ability to make infinite
	1876	loops using regular expressions, with something as innocuous as:
	1877
	1878	    'foo' =~ m{ ( o? )* }x;
	1879
	1880	The C<o?> matches at the beginning of C<'foo'>, and since the position
	1881	in the string is not moved by the match, C<o?> would match again and again
	1882	because of the C<*> modifier. Another common way to create a similar cycle
	1883	is with the looping modifier C<//g>:
	1884
	1885	    @matches = ( 'foo' =~ m{ o? }xg );
	1886
	1887	or
	1888
	1889	    print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
	1890
	1891	or the loop implied by split().
	1892
	1893	However, long experience has shown that many programming tasks may
	1894	be significantly simplified by using repeated subexpressions that
	1895	may match zero-length substrings. Here's a simple example:
	1896
	1897	    @chars = split //, $string; # // is not magic in split
	1898	    ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
	1899
	1900	Thus Perl allows such constructs, by I<forcefully breaking
	1901	the infinite loop>. The rules for this are different for lower-level
	1902	loops given by the greedy modifiers C<*+{}>, and for higher-level
	1903	ones like the C</g> modifier or split() operator.
	1904
	1905	The lower-level loops are I<interrupted> (that is, the loop is
	1906	broken) when Perl detects that a repeated expression matched a
	1907	zero-length substring. Thus
	1908
	1909	    m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
	1910
	1911	is made equivalent to
	1912
	1913	    m{ (?: NON_ZERO_LENGTH )*
	1914	       |
	1915	       (?: ZERO_LENGTH )?
	1916	    }x;
	1917
	1918	The higher-level loops preserve an additional state between iterations:
	1919	whether the last match was zero-length. To break the loop, the following
	1920	match after a zero-length match is prohibited from having a length of zero.
	1921	This prohibition interacts with backtracking (see L<"Backtracking">),
	1922	and so the I<second best> match is chosen if the I<best> match is of
	1923	zero length.
	1924
	1925	For example:
	1926
	1927	    $_ = 'bar';
	1928	    s/\w??/<$&>/g;
	1929
	1930	results in C<< <><b><><a><><r><> >>. At each position of the string the best
	1931	match given by non-greedy C<??> is the zero-length match, and the I<second
	1932	best> match is what is matched by C<\w>. Thus zero-length matches
	1933	alternate with one-character-long matches.
	1934
	1935	Similarly, for repeated C<m/()/g> the second-best match is the match at the
	1936	position one notch further in the string.
	1937
	1938	The additional state of being I<matched with zero-length> is associated with
	1939	the matched string, and is reset by each assignment to pos().
	1940	Zero-length matches at the end of the previous match are ignored
	1941	during C<split>.
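
The effect of these rules on the higher-level loops can be observed directly.
The sketch below uses a capture group in place of C<$&>, which is a stylistic
choice made here and not something this section requires.

```perl
use strict;
use warnings;

# With /g, zero-length matches alternate with one-character matches.
(my $marked = 'bar') =~ s/(\w??)/<$1>/g;

# split with an empty pattern yields the individual characters.
my @chars = split //, 'bar';

print "$marked\n";    # prints "<><b><><a><><r><>"
print "@chars\n";     # prints "b a r"
```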
	1942
	1943	=head2 Combining RE Pieces
	1944
	1945	Each of the elementary pieces of regular expressions which were described
	1946	before (such as C<ab> or C<\Z>) could match at most one substring
	1947	at the given position of the input string. However, in a typical regular
	1948	expression these elementary pieces are combined into more complicated
	1949	patterns using combining operators C<ST>, C<S|T>, C<S*> etc.
	1950	(in these examples C<S> and C<T> are regular subexpressions).
	1951
	1952	Such combinations can include alternatives, leading to a problem of choice:
	1953	if we match a regular expression C<a|ab> against C<"abc">, will it match
	1954	substring C<"a"> or C<"ab">? One way to describe which substring is
	1955	actually matched is the concept of backtracking (see L<"Backtracking">).
	1956	However, this description is too low-level and makes you think
	1957	in terms of a particular implementation.
	1958
	1959	Another description starts with notions of "better"/"worse". All the
	1960	substrings which may be matched by the given regular expression can be
	1961	sorted from the "best" match to the "worst" match, and it is the "best"
	1962	match which is chosen. This substitutes the question of "what is chosen?"
	1963	by the question of "which matches are better, and which are worse?".
	1964
	1965	Again, for elementary pieces there is no such question, since at most
	1966	one match at a given position is possible. This section describes the
	1967	notion of better/worse for combining operators. In the description
	1968	below C<S> and C<T> are regular subexpressions.
	1969
	1970	=over 4
	1971
	1972	=item C<ST>
	1973
	1974	Consider two possible matches, C<AB> and C<A'B'>, where C<A> and C<A'> are
	1975	substrings which can be matched by C<S>, and C<B> and C<B'> are substrings
	1976	which can be matched by C<T>.
	1977
	1978	If C<A> is a better match for C<S> than C<A'>, C<AB> is a better
	1979	match than C<A'B'>.
	1980
	1981	If C<A> and C<A'> coincide: C<AB> is a better match than C<AB'> if
	1982	C<B> is a better match for C<T> than C<B'>.
	1983
	1984	=item C<S|T>
	1985
	1986	When C<S> can match, it is a better match than when only C<T> can match.
	1987
	1988	Ordering of two matches for C<S> is the same as for C<S>. Similarly for
	1989	two matches for C<T>.
	1990
	1991	=item C<S{REPEAT_COUNT}>
	1992
	1993	Matches as C<SSS...S> (repeated as many times as necessary).
	1994
	1995	=item C<S{min,max}>
	1996
	1997	Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>.
	1998
	1999	=item C<S{min,max}?>
	2000
	2001	Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>.
	2002
	2003	=item C<S?>, C<S*>, C<S+>
	2004
	2005	Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively.
	2006
	2007	=item C<S??>, C<S*?>, C<S+?>
	2008
	2009	Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively.
	2010
	2011	=item C<< (?>S) >>
	2012
	2013	Matches the best match for C<S> and only that.
	2014
	2015	=item C<(?=S)>, C<(?<=S)>
	2016
	2017	Only the best match for C<S> is considered. (This is important only if
	2018	C<S> has capturing parentheses, and backreferences are used somewhere
	2019	else in the whole regular expression.)
	2020
	2021	=item C<(?!S)>, C<(?<!S)>
	2022
	2023	For this grouping operator there is no need to describe the ordering, since
	2024	only whether or not C<S> can match is important.
	2025
	2026	=item C<(??{ EXPR })>, C<(?PARNO)>
	2027
	2028	The ordering is the same as for the regular expression which is
	2029	the result of EXPR, or the pattern contained by capture buffer PARNO.
	2030
	2031	=item C<(?(condition)yes-pattern|no-pattern)>
	2032
	2033	Recall that which of C<yes-pattern> or C<no-pattern> actually matches is
	2034	already determined. The ordering of the matches is the same as for the
	2035	chosen subexpression.
	2036
	2037	=back
	2038
	2039	The above recipes describe the ordering of matches I<at a given position>.
	2040	One more rule is needed to understand how a match is determined for the
	2041	whole regular expression: a match at an earlier position is always better
	2042	than a match at a later position.
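
A few of these ordering rules can be seen at work in a short sketch. The
strings are chosen only to make each rule visible, and the parentheses are
there just to capture what was matched.

```perl
use strict;
use warnings;

my ($alt)      = 'abc'  =~ /(a|ab)/;      # "a": S is preferred over T in S|T
my ($greedy)   = 'aaaa' =~ /(a{1,3})/;    # "aaa": ordered as S{3}|S{2}|S{1}
my ($lazy)     = 'aaaa' =~ /(a{1,3}?)/;   # "a": ordered as S{1}|S{2}|S{3}
my ($earliest) = 'xab'  =~ /(b|ab)/;      # "ab": an earlier position beats alternative order

print "$alt $greedy $lazy $earliest\n";    # prints "a aaa a ab"
```

The last case shows the final rule: although C<b> is the preferred
alternative, C<ab> starts one character earlier, so it is the match that is
reported.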
	2043
	2044	=head2 Creating Custom RE Engines
	2045
	2046	Overloaded constants (see L<overload>) provide a simple way to extend
	2047	the functionality of the RE engine.
	2048
	2049	Suppose that we want to enable a new RE escape-sequence C<\Y|> which
	2050	matches at a boundary between whitespace characters and non-whitespace
	2051	characters. Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
	2052	at these positions, so we want to have each C<\Y|> in the place of the
	2053	more complicated version. We can create a module C<customre> to do
	2054	this:
	2055
	2056	    package customre;
	2057	    use overload;
	2058
	2059	    sub import {
	2060	      shift;
	2061	      die "No argument to customre::import allowed" if @_;
	2062	      overload::constant 'qr' => \&convert;
	2063	    }
	2064
	2065	    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
	2066
	2067	    # We must also take care of not escaping the legitimate \\Y|
	2068	    # sequence, hence the presence of '\\' in the conversion rules.
	2069	    my %rules = ( '\\' => '\\\\',
	2070	                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
	2071	    sub convert {
	2072	      my $re = shift;
	2073	      $re =~ s{
	2074	                \\ ( \\ | Y . )
	2075	              }
	2076	              { $rules{$1} or invalid($re,$1) }sgex;
	2077	      return $re;
	2078	    }
	2079
	2080	Now C<use customre> enables the new escape in constant regular
	2081	expressions, i.e., those without any runtime variable interpolations.
	2082	As documented in L<overload>, this conversion will work only over
	2083	literal parts of regular expressions. For C<\Y|$re\Y|> the variable
	2084	part of this regular expression needs to be converted explicitly
	2085	(but only if the special meaning of C<\Y|> should be enabled inside $re):
	2086
	2087	    use customre;
	2088	    $re = <>;
	2089	    chomp $re;
	2090	    $re = customre::convert $re;
	2091	    /\Y|$re\Y|/;
	2092
	2093	=head1 PCRE/Python Support
	2094
	2095	As of Perl 5.10, Perl supports several Python/PCRE-specific extensions
	2096	to the regex syntax. While Perl programmers are encouraged to use the
	2097	Perl-specific syntax, the following are legal in Perl 5.10:
	2098
	2099	=over 4
	2100
	2101	=item C<< (?PE<lt>NAMEE<gt>pattern) >>
	2102
	2103	Define a named capture buffer. Equivalent to C<< (?<NAME>pattern) >>.
	2104
	2105	=item C<< (?P=NAME) >>
	2106
	2107	Backreference to a named capture buffer. Equivalent to C<< \g{NAME} >>.
	2108
	2109	=item C<< (?P>NAME) >>
	2110
	2111	Subroutine call to a named capture buffer. Equivalent to C<< (?&NAME) >>.
	2112
	2113	=back
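
The three forms listed above can be combined in a short sketch. The group
names C<word> and C<paren> and the sample strings are invented for the
example.

```perl
use strict;
use warnings;

# (?P<word>...) names the group and (?P=word) refers back to what it matched.
my $doubled = '';
if ('this is is a test' =~ /\b(?P<word>\w+)\s+(?P=word)\b/) {
    $doubled = $+{word};
}

# (?P>paren) calls the named group recursively, here to match nested parentheses.
my $balanced_re = qr/^(?P<paren>\((?:[^()]++|(?P>paren))*\))$/;
my $balanced    = '(a(b)c)' =~ $balanced_re ? 1 : 0;
my $unbalanced  = '(a(b c)' =~ $balanced_re ? 1 : 0;

print "$doubled $balanced $unbalanced\n";    # prints "is 1 0"
```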
	2114
	2115	=head1 BUGS
	2116
	2117	This document varies from difficult to understand to completely
	2118	and utterly opaque. The wandering prose riddled with jargon is
	2119	hard to fathom in several places.
	2120
	2121	This document needs a rewrite that separates the tutorial content
	2122	from the reference content.
	2123
	2124	=head1 SEE ALSO
	2125
	2126	L<perlrequick>.
	2127
	2128	L<perlretut>.
	2129
	2130	L<perlop/"Regexp Quote-Like Operators">.
	2131
	2132	L<perlop/"Gory details of parsing quoted constructs">.
	2133
	2134	L<perlfaq6>.
	2135
	2136	L<perlfunc/pos>.
	2137
	2138	L<perllocale>.
	2139
	2140	L<perlebcdic>.
	2141
	2142	I<Mastering Regular Expressions> by Jeffrey Friedl, published
	2143	by O'Reilly and Associates.