=head1 NAME
X<regular expression> X<regex> X<regexp>

perlre - Perl regular expressions

=head1 DESCRIPTION

This page describes the syntax of regular expressions in Perl.

If you haven't used regular expressions before, a tutorial introduction
is available in L<perlretut>. If you know just a little about them,
a quick-start introduction is available in L<perlrequick>.

Except for the L</The Basics> section, this page assumes you are familiar
with regular expression basics, like what a "pattern" is, what it
looks like, and how it is basically used. For a reference on how they
are used, plus various examples of the same, see discussions of C<m//>,
C<s///>, C<qr//> and C<"??"> in L<perlop/"Regexp Quote-Like Operators">.

New in v5.22, L<C<use re 'strict'>|re/'strict' mode> applies stricter
rules than otherwise when compiling regular expression patterns. It can
find things that, while legal, may not be what you intended.

=head2 The Basics
X<regular expression, version 8> X<regex, version 8> X<regexp, version 8>

Regular expressions are strings with the very particular syntax and
meaning described in this document and auxiliary documents referred to
by this one. The strings are called "patterns". Patterns are used to
determine if some other string, called the "target", has (or doesn't
have) the characteristics specified by the pattern. We call this
"matching" the target string against the pattern. Usually the match is
done by having the target be the first operand, and the pattern be the
second operand, of one of the two binary operators C<=~> and C<!~>,
listed in L<perlop/Binding Operators>; and the pattern will have been
converted from an ordinary string by one of the operators in
L<perlop/"Regexp Quote-Like Operators">, like so:

    $foo =~ m/abc/

This evaluates to true if and only if the string in the variable C<$foo>
contains, somewhere in it, the sequence of characters "a", "b", then "c".
(The C<=~ m>, or match operator, is described in
L<perlop/m/PATTERN/msixpodualngc>.)

Patterns that aren't already stored in some variable must be delimited,
at both ends, by delimiter characters. These are often, as in the
example above, forward slashes, and the typical way a pattern is written
in documentation is with those slashes. In most cases, the delimiter
is the same character, fore and aft, but a few characters have a
mirror-image mate; with these, the opening version is the beginning
delimiter and the closing one is the ending delimiter, like

    $foo =~ m<abc>

Most times, the pattern is evaluated in double-quotish context, but it
is possible to choose delimiters to force single-quotish, like

    $foo =~ m'abc'

If the pattern contains its delimiter within it, that delimiter must be
escaped. Prefixing it with a backslash (I<e.g.>, C<"/foo\/bar/">)
serves this purpose.
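
The delimiter rules above can be tried out with a short, self-contained
sketch (the variable names and strings are invented for illustration).
It matches text using slash, brace, and single-quote delimiters. In the
last case, single-quotish evaluation means C<@example> is not
interpolated as an array, so the C<"@"> reaches the regex engine as an
ordinary literal character.

```perl
use strict;
use warnings;

my $path  = "/usr/local/bin";      # illustrative target strings
my $email = 'user@example.com';

# Slash delimiters: each "/" inside the pattern must be backslashed.
my $slash_ok = $path =~ m/\/local\// ? 1 : 0;

# Mirror-image delimiters: the slashes inside need no escaping.
my $brace_ok = $path =~ m{/local/} ? 1 : 0;

# Single-quote delimiters: no interpolation, so "@example" is not
# treated as an array; "@" is then matched as a literal character.
my $quote_ok = $email =~ m'user@example\.com' ? 1 : 0;

print "$slash_ok $brace_ok $quote_ok\n";    # prints "1 1 1"
```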

Any single character in a pattern matches that same character in the
target string, unless the character is a I<metacharacter> with a special
meaning described in this document. A sequence of non-metacharacters
matches the same sequence in the target string, as we saw above with
C<m/abc/>.

Only a few characters (all of them being ASCII punctuation characters)
are metacharacters. The most commonly used one is a dot C<".">, which
normally matches almost any character (including a dot itself).

You can cause characters that normally function as metacharacters to be
interpreted literally by prefixing them with a C<"\">, just like the
pattern's delimiter must be escaped if it also occurs within the
pattern. Thus, C<"\."> matches just a literal dot, C<"."> instead of
its normal meaning. This means that the backslash is also a
metacharacter, so C<"\\"> matches a single C<"\">. And a sequence that
contains an escaped metacharacter matches the same sequence (but without
the escape) in the target string. So, the pattern C</blur\\fl/> would
match any target string that contains the sequence C<"blur\fl">.
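
A minimal sketch of these escaping rules, using only core Perl: it
contrasts an unescaped and an escaped dot, and then matches the
C<"blur\fl"> example. The target is single-quoted so that its backslash
stays a literal backslash character rather than becoming a form feed.

```perl
use strict;
use warnings;

# An unescaped "." matches (almost) any character ...
my $dot_any     = "a-c" =~ /^a.c$/  ? 1 : 0;    # 1
# ... while "\." matches only a literal dot.
my $dot_literal = "a-c" =~ /^a\.c$/ ? 1 : 0;    # 0

# "\\" in a pattern matches one literal backslash. The single-quoted
# target holds the seven characters b, l, u, r, \, f, l.
my $backslash   = 'blur\fl' =~ /blur\\fl/ ? 1 : 0;    # 1

print "$dot_any $dot_literal $backslash\n";    # prints "1 0 1"
```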

The metacharacter C<"|"> is used to match one thing or another. Thus

    $foo =~ m/this|that/

is TRUE if and only if C<$foo> contains either the sequence C<"this"> or
the sequence C<"that">. Like all metacharacters, prefixing the C<"|">
with a backslash makes it match the plain punctuation character; in its
case, the VERTICAL LINE.

    $foo =~ m/this\|that/

is TRUE if and only if C<$foo> contains the sequence C<"this|that">.

You aren't limited to just a single C<"|">.

    $foo =~ m/fee|fie|foe|fum/

is TRUE if and only if C<$foo> contains any of those 4 sequences from
the children's story "Jack and the Beanstalk".

As you can see, the C<"|"> binds less tightly than a sequence of
ordinary characters. We can override this by using the grouping
metacharacters, the parentheses C<"("> and C<")">.

    $foo =~ m/th(is|at) thing/

is TRUE if and only if C<$foo> contains either the sequence S<C<"this
thing">> or the sequence S<C<"that thing">>. The portions of the string
that match the portions of the pattern enclosed in parentheses are
normally made available separately for use later in the pattern,
substitution, or program. This is called "capturing", and it can get
complicated. See L</Capture groups>.

The first alternative includes everything from the last pattern
delimiter (C<"(">, C<"(?:"> (described later), I<etc>., or the beginning
of the pattern) up to the first C<"|">, and the last alternative
contains everything from the last C<"|"> to the next closing pattern
delimiter. That's why it's common practice to include alternatives in
parentheses: to minimize confusion about where they start and end.

Alternatives are tried from left to right, so the first alternative
found for which the entire expression matches is the one that is
chosen. This means that alternatives are not necessarily greedy. For
example: when matching C<foo|foot> against C<"barefoot">, only the C<"foo">
part will match, as that is the first alternative tried, and it successfully
matches the target string. (This might not seem important, but it is
important when you are capturing matched text using parentheses.)
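
The following sketch, using only core Perl and made-up target strings,
exercises the alternation and grouping examples above. The two
C<"barefoot"> matches show that swapping the order of the alternatives
changes what is captured, since the first alternative that lets the
whole match succeed wins.

```perl
use strict;
use warnings;

# Parentheses limit the alternation and capture the chosen alternative.
my $which = "that thing" =~ /th(is|at) thing/ ? $1 : 'no match';    # "at"

# A backslashed "|" is only a literal VERTICAL LINE.
my $literal = "this|that" =~ /this\|that/ ? 1 : 0;                   # 1

# Alternatives are tried left to right: "foo" is accepted before
# "foot" is ever considered, even though "foot" is longer.
my ($first)  = "barefoot" =~ /(foo|foot)/;                           # "foo"

# Listing the longer alternative first changes what is captured.
my ($longer) = "barefoot" =~ /(foot|foo)/;                           # "foot"

print "$which $literal $first $longer\n";    # prints "at 1 foo foot"
```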

Besides taking away the special meaning of a metacharacter, a prefixed
backslash changes some letter and digit characters away from matching
just themselves to instead have special meaning. These are called
"escape sequences", and all such are described in L<perlrebackslash>. A
backslash sequence (of a letter or digit) that doesn't currently have
special meaning to Perl will raise a warning if warnings are enabled,
as those are reserved for potential future use.

One such sequence is C<\b>, which matches a boundary of some sort.
C<\b{wb}> and a few others give specialized types of boundaries.
(They are all described in detail starting at
L<perlrebackslash/\b{}, \b, \B{}, \B>.) Note that these don't match
characters, but the zero-width spaces between characters. They are an
example of a L<zero-width assertion|/Assertions>. Consider again,

    $foo =~ m/fee|fie|foe|fum/

It evaluates to TRUE if, besides those 4 words, any of the sequences
"feed", "field", "Defoe", "fume", and many others are in C<$foo>. By
judicious use of C<\b> (or, better, C<\b{wb}>, which is designed to
handle natural language), we can make sure that only the Giant's words
are matched:

    $foo =~ m/\b(fee|fie|foe|fum)\b/
    $foo =~ m/\b{wb}(fee|fie|foe|fum)\b{wb}/

The final example shows that the characters C<"{"> and C<"}"> are
metacharacters.
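
Here is a small sketch comparing the unanchored Giant's pattern with the
C<\b> and C<\b{wb}> versions on a few invented strings. It assumes
Perl 5.22 or later, which is when C<\b{wb}> became available. Each
result string records, in order, whether the plain, C<\b>, and
C<\b{wb}> patterns matched.

```perl
use strict;
use warnings;

my @texts = ("fee fie foe fum", "a field of feed", "Defoe");
my %found;

for my $text (@texts) {
    $found{$text} = join ' ',
        ($text =~ /fee|fie|foe|fum/               ? 1 : 0),  # substrings anywhere
        ($text =~ /\b(fee|fie|foe|fum)\b/         ? 1 : 0),  # whole words only
        ($text =~ /\b{wb}(fee|fie|foe|fum)\b{wb}/ ? 1 : 0);  # Unicode word breaks
    print "$text: $found{$text}\n";
}
# fee fie foe fum: 1 1 1
# a field of feed: 1 0 0
# Defoe: 1 0 0
```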

Another use for escape sequences is to specify characters that cannot be
(or that you prefer not to be) written literally. These are described
in detail in L<perlrebackslash/Character Escapes>, but the next three
paragraphs briefly describe some of them.

Various control characters can be written in C language style: C<"\n">
matches a newline, C<"\t"> a tab, C<"\r"> a carriage return, C<"\f"> a
form feed, I<etc>.

More generally, C<\I<nnn>>, where I<nnn> is a string of three octal
digits, matches the character whose native code point is I<nnn>. You
can easily run into trouble if you don't have exactly three digits. So
always use three, or since Perl 5.14, you can use C<\o{...}> to specify
any number of octal digits.

Similarly, C<\xI<nn>>, where I<nn> are hexadecimal digits, matches the
character whose native ordinal is I<nn>. Again, not using exactly two
digits is a recipe for disaster, but you can use C<\x{...}> to specify
any number of hex digits.
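
A short sketch of the octal and hexadecimal forms, assuming an ASCII
platform (where C<"A"> is code point 65, that is, octal 101 and hex 41)
and Perl 5.14 or later for C<\o{...}>. The last check shows why the
digit count matters: C<\x41> consumes only two hex digits, so the
following C<"42"> is matched literally.

```perl
use strict;
use warnings;

my @hits = (
    ("A" =~ /^\101$/    ? 1 : 0),           # exactly three octal digits
    ("A" =~ /^\o{101}$/ ? 1 : 0),           # braced octal: any number of digits
    ("A" =~ /^\x41$/    ? 1 : 0),           # exactly two hex digits
    ("A" =~ /^\x{41}$/  ? 1 : 0),           # braced hex: any number of digits
    ("\x{263A}" =~ /^\x{263A}$/ ? 1 : 0),   # above 0xFF, braces are required
    ("A42" =~ /^\x4142$/ ? 1 : 0),          # "\x41" followed by a literal "42"
);
print "@hits\n";    # prints "1 1 1 1 1 1"
```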

Besides being a metacharacter, the C<"."> is an example of a "character
class", something that can match any single character of a given set of
them. In its case, the set is just about all possible characters. Perl
predefines several character classes besides the C<".">; there is a
separate reference page about just these, L<perlrecharclass>.

You can define your own custom character classes by putting into your
pattern, in the appropriate place(s), a list of all the characters you
want in the set. You do this by enclosing the list within C<[]> bracket
characters. These are called "bracketed character classes" when we are
being precise, but often the word "bracketed" is dropped. (Dropping it
usually doesn't cause confusion.) This means that the C<"["> character
is another metacharacter. It doesn't match anything just by itself; it
is used only to tell Perl that what follows it is a bracketed character
class. If you want to match a literal left square bracket, you must
escape it, like C<"\[">. The matching C<"]"> is also a metacharacter;
again it doesn't match anything by itself, but just marks the end of
your custom class to Perl. It is an example of a "sometimes
metacharacter". It isn't a metacharacter if there is no corresponding
C<"[">, and matches its literal self:

    print "]" =~ /]/;  # prints 1

The list of characters within the character class gives the set of
characters matched by the class. C<"[abc]"> matches a single "a" or "b"
or "c". But if the first character after the C<"["> is C<"^">, the
class instead matches any character not in the list. Within a list, the
C<"-"> character specifies a range of characters, so that C<a-z>
represents all characters between "a" and "z", inclusive. If you want
either C<"-"> or C<"]"> itself to be a member of a class, put it at the
start of the list (possibly after a C<"^">), or escape it with a
backslash. C<"-"> is also taken literally when it is at the end of the
list, just before the closing C<"]">. (The following all specify the
same class of three characters: C<[-az]>, C<[az-]>, and C<[a\-z]>. All
are different from C<[a-z]>, which specifies a class containing
twenty-six characters, even on EBCDIC-based character sets.)
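
As a sketch (core Perl only; the list of test characters is arbitrary),
the code below runs a handful of characters through the classes
discussed above and records, in order, which characters each class
accepts.

```perl
use strict;
use warnings;

my @chars = ('a', 'm', 'z', '-', ']');

my %class = (
    'a-z'  => qr/[a-z]/,     # range: the 26 lowercase letters
    '-az'  => qr/[-az]/,     # leading "-" is literal: just "-", "a", "z"
    '^a-z' => qr/[^a-z]/,    # leading "^" complements the class
    ']x'   => qr/[]x]/,      # leading "]" is a literal member
);

my %accepted;
for my $name (sort keys %class) {
    # Keep, in their original order, the characters this class matches.
    $accepted{$name} = join '', grep { $_ =~ $class{$name} } @chars;
    print "[$name] accepts: $accepted{$name}\n";
}
```

With these inputs, C<[a-z]> accepts C<amz>, C<[-az]> accepts C<az->,
C<[^a-z]> accepts C<-]>, and C<[]x]> accepts only C<]>.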

There is a lot more to bracketed character classes; full details are in
L<perlrecharclass/Bracketed Character Classes>.

=head3 Metacharacters
X<metacharacter>
X<\> X<^> X<.> X<$> X<|> X<(> X<()> X<[> X<[]>

L</The Basics> introduced some of the metacharacters. This section
gives them all. Most of them have the same meaning as in the I<egrep>
command.

Only the C<"\"> is always a metacharacter. The others are metacharacters
just sometimes. The following table lists all of them, summarizes
their use, and gives the contexts where they are metacharacters.
Outside those contexts or if prefixed by a C<"\">, they match their
corresponding punctuation character. In some cases, their meaning
varies depending on various pattern modifiers that alter the default
behaviors. See L</Modifiers>.

        PURPOSE                                      WHERE
    \   Escape the next character                    Always, except when
                                                     escaped by another \
    ^   Match the beginning of the string            Not in []
          (or line, if /m is used)
    ^   Complement the [] class                      At the beginning of []
    .   Match any single character except newline    Not in []
          (under /s, includes newline)
    $   Match the end of the string                  Not in [], but can
          (or before newline at the end of the       also interpolate a
          string; or before any newline if /m is     scalar
          used)
    |   Alternation                                  Not in []
    ()  Grouping                                     Not in []
    [   Start Bracketed Character class              Not in []
    ]   End Bracketed Character class                Only in [], and
                                                     not first
    *   Matches the preceding element 0 or more      Not in []
          times
    +   Matches the preceding element 1 or more      Not in []
          times
    ?   Matches the preceding element 0 or 1         Not in []
          times
    {   Starts a sequence that gives number(s)       Not in []
          of times the preceding element can be
          matched
    {   when following certain escape sequences
          starts a modifier to the meaning of the
          sequence
    }   End sequence started by {
    -   Indicates a range                            Only in [] interior
    #   Beginning of comment, extends to line end    Only with /x modifier

Notice that most of the metacharacters lose their special meaning when
they occur in a bracketed character class, except that C<"^"> has a
different meaning when it is at the beginning of such a class. And C<"-">
and C<"]"> are metacharacters only at restricted positions within
bracketed character classes; while C<"}"> is a metacharacter only when
closing a special construct started by C<"{">.

In double-quotish context, as is usually the case, you need to be
careful about C<"$"> and the non-metacharacter C<"@">. Those could
interpolate variables, which may or may not be what you intended.

These rules were designed for compactness of expression, rather than
legibility and maintainability. The L</E<sol>x and E<sol>xx> pattern
modifiers allow you to insert white space to improve readability. And
use of S<C<L<re 'strict'|re/'strict' mode>>> adds extra checking to
catch some typos that might silently compile into something unintended.

By default, the C<"^"> character is guaranteed to match only the
beginning of the string, the C<"$"> character only the end (or before the
newline at the end), and Perl does certain optimizations with the
assumption that the string contains only one line. Embedded newlines
will not be matched by C<"^"> or C<"$">. You may, however, wish to treat a
string as a multi-line buffer, such that the C<"^"> will match after any
newline within the string (except if the newline is the last character in
the string), and C<"$"> will match before any newline. At the
cost of a little more overhead, you can do this by using the
C<L</E<sol>m>> modifier on the pattern match operator. (Older programs
did this by setting C<$*>, but this option was removed in perl 5.10.)
X<^> X<$> X</m>

To simplify multi-line substitutions, the C<"."> character never matches a
newline unless you use the L<C<E<sol>s>|/s> modifier, which in effect tells
Perl to pretend the string is a single line, even if it isn't.
X<.> X</s>
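
A minimal sketch of the C<"^">, C<"$">, and C<"."> behaviors described
above, using a made-up two-line string. The final match needs both
modifiers: C</m> so that C<"^"> can match after the embedded newline,
and C</s> so that C<".*"> can cross it.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, "^" anchors only at the start of the string.
my @starts_default = $text =~ /^(\w+)/g;     # ("first")
# With /m, "^" also matches just after each embedded newline.
my @starts_m       = $text =~ /^(\w+)/mg;    # ("first", "second")

# Without /s, "." refuses to match "\n", so this match fails.
my $dot_default = $text =~ /line.second/  ? 1 : 0;    # 0
# With /s, "." matches the newline as well.
my $dot_s       = $text =~ /line.second/s ? 1 : 0;    # 1

# /ms: "^" works per line and "." crosses line boundaries.
my ($between) = $text =~ /^first(.*)^second/ms;       # " line\n"

print "@starts_default | @starts_m | $dot_default $dot_s\n";
# prints "first | first second | 0 1"
```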

=head2 Modifiers

=head3 Overview

The default behavior for matching can be changed using various
modifiers. Modifiers that relate to the interpretation of the pattern
are listed just below. Modifiers that alter the way a pattern is used
by Perl are detailed in L<perlop/"Regexp Quote-Like Operators"> and
L<perlop/"Gory details of parsing quoted constructs">. Modifiers can be added
dynamically; see L</Extended Patterns> below.

=over 4

=item B<C<m>>
X</m> X<regex, multiline> X<regexp, multiline> X<regular expression, multiline>

Treat the string being matched against as multiple lines. That is,
change C<"^"> and C<"$"> from matching the start of the string's first
line and the end of its last line to matching the start and end of each
line within the string.

=item B<C<s>>
X</s> X<regex, single-line> X<regexp, single-line>
X<regular expression, single-line>

Treat the string as a single line. That is, change C<"."> to match any
character whatsoever, even a newline, which normally it would not match.

Used together, as C</ms>, they let the C<"."> match any character whatsoever,
while still allowing C<"^"> and C<"$"> to match, respectively, just after
and just before newlines within the string.

=item B<C<i>>
X</i> X<regex, case-insensitive> X<regexp, case-insensitive>
X<regular expression, case-insensitive>

Do case-insensitive pattern matching. For example, "A" will match "a"
under C</i>.

If locale matching rules are in effect, the case map is taken from the
current locale for code points less than 256, and from Unicode rules for
larger code points. However, matches that would cross the Unicode
rules/non-Unicode rules boundary (ords 255/256) will not succeed, unless
the locale is a UTF-8 one. See L<perllocale>.

There are a number of Unicode characters that match a sequence of
multiple characters under C</i>. For example,
C<LATIN SMALL LIGATURE FI> should match the sequence C<fi>. Perl is not
currently able to do this when the multiple characters are in the pattern and
are split between groupings, or when one or more are quantified. Thus

    "\N{LATIN SMALL LIGATURE FI}" =~ /fi/i;          # Matches
    "\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i;    # Doesn't match!
    "\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i;         # Doesn't match!

    # The below doesn't match, and it isn't clear what $1 and $2 would
    # be even if it did!!
    "\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i;      # Doesn't match!

Perl doesn't match multiple characters in a bracketed
character class unless the character that maps to them is explicitly
mentioned, and it doesn't match them at all if the character class is
inverted, which otherwise could be highly confusing. See
L<perlrecharclass/Bracketed Character Classes>, and
L<perlrecharclass/Negation>.

=item B<C<x>> and B<C<xx>>
X</x>

Extend your pattern's legibility by permitting whitespace and comments.
Details in L</E<sol>x and E<sol>xx>.

=item B<C<p>>
X</p> X<regex, preserve> X<regexp, preserve>

Preserve the string matched such that C<${^PREMATCH}>, C<${^MATCH}>, and
C<${^POSTMATCH}> are available for use after matching.

In Perl 5.20 and higher this is ignored. Due to a new copy-on-write
mechanism, C<${^PREMATCH}>, C<${^MATCH}>, and C<${^POSTMATCH}> will be available
after the match regardless of the modifier.

=item B<C<a>>, B<C<d>>, B<C<l>>, and B<C<u>>
X</a> X</d> X</l> X</u>

These modifiers, all new in 5.14, affect which character-set rules
(Unicode, I<etc>.) are used, as described below in
L</Character set modifiers>.

=item B<C<n>>
X</n> X<regex, non-capture> X<regexp, non-capture>
X<regular expression, non-capture>

Prevent the grouping metacharacters C<()> from capturing. This modifier,
new in 5.22, will stop C<$1>, C<$2>, I<etc>., from being filled in.

    "hello" =~ /(hi|hello)/;      # $1 is "hello"
    "hello" =~ /(hi|hello)/n;     # $1 is undef

This is equivalent to putting C<?:> at the beginning of every capturing group:

    "hello" =~ /(?:hi|hello)/;    # $1 is undef

C</n> can be negated on a per-group basis. Alternatively, named captures
may still be used.

    "hello" =~ /(?-n:(hi|hello))/n;     # $1 is "hello"
    "hello" =~ /(?<greet>hi|hello)/n;   # $1 is "hello", $+{greet} is
                                        # "hello"

=item Other Modifiers

There are a number of flags that can be found at the end of regular
expression constructs that are I<not> generic regular expression flags, but
apply to the operation being performed, like matching or substitution (C<m//>
or C<s///> respectively).

Flags described further in
L<perlretut/"Using regular expressions in Perl"> are:

    c  - keep the current position during repeated matching
    g  - globally match the pattern repeatedly in the string

Substitution-specific modifiers described in
L<perlop/"s/PATTERN/REPLACEMENT/msixpodualngcer"> are:

    e  - evaluate the right-hand side as an expression
    ee - evaluate the right side as a string then eval the result
    o  - pretend to optimize your code, but actually introduce bugs
    r  - perform non-destructive substitution and return the new value

=back
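
The sketch below tries out C</n> and the operation-specific flags
C</g>, C</e>, and C</r> listed above. It assumes Perl 5.22 or later
(needed for C</n>; C</r> needs 5.14), and the strings are arbitrary
examples.

```perl
use strict;
use warnings;

# /n: plain parentheses group without capturing ...
"hello" =~ /(hi|hello)/n;
my $plain = defined $1 ? $1 : 'undef';              # "undef"

# ... but named groups still capture.
"hello" =~ /(?<greet>hi|hello)/n;
my $named = $+{greet};                               # "hello"

# /g in list context returns every match.
my @numbers = "a1 b22 c333" =~ /(\d+)/g;             # (1, 22, 333)

# /e: the replacement side is Perl code, evaluated once per match.
(my $doubled = "a1 b22 c333") =~ s/(\d+)/$1 * 2/ge;  # "a2 b44 c666"

# /r: return a modified copy and leave the original alone.
my $original = "hello world";
my $shouted  = $original =~ s/world/WORLD/r;         # "hello WORLD"

print "$plain $named @numbers $doubled $shouted\n";
# prints "undef hello 1 22 333 a2 b44 c666 hello WORLD"
```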

Regular expression modifiers are usually written in documentation
as I<e.g.>, "the C</x> modifier", even though the delimiter
in question might not really be a slash. The modifiers C</imnsxadlup>
may also be embedded within the regular expression itself using
the C<(?...)> construct; see L</Extended Patterns> below.

=head3 Details on some modifiers

Some of the modifiers require more explanation than given in the
L</Overview> above.

=head4 C</x> and C</xx>

A single C</x> tells
the regular expression parser to ignore most whitespace that is neither
backslashed nor within a bracketed character class, nor within the characters
of a multi-character metapattern like C<(?i: ... )>. You can use this to
break up your regular expression into more readable parts.
Also, the C<"#"> character is treated as a metacharacter introducing a
comment that runs up to the pattern's closing delimiter, or to the end
of the current line if the pattern extends onto the next line. Hence,
this is very much like an ordinary Perl code comment. (You can include
the closing delimiter within the comment only if you precede it with a
backslash, so be careful!)

Use of C</x> means that if you want real
whitespace or C<"#"> characters in the pattern (outside a bracketed character
class, which is unaffected by C</x>), then you'll either have to
escape them (using backslashes or C<\Q...\E>) or encode them using octal,
hex, or C<\N{}> or C<\p{name=...}> escapes.
It is ineffective to try to continue a comment onto the next line by
escaping the C<\n> with a backslash or C<\Q>.

You can use L</(?#text)> to create a comment that ends earlier than the
end of the current line, but C<text> also can't contain the closing
delimiter unless escaped with a backslash.

A common pitfall is to forget that C<"#"> characters (outside a
bracketed character class) begin a comment under C</x> and are not
matched literally. Just keep that in mind when trying to puzzle out why
a particular C</x> pattern isn't working as expected.
Inside a bracketed character class, C<"#"> retains its non-special,
literal meaning.

Starting in Perl v5.26, if the modifier has a second C<"x"> within it,
the effect of a single C</x> is increased. The only difference is that
inside bracketed character classes, non-escaped (by a backslash) SPACE
and TAB characters are not added to the class, and hence can be inserted
to make the classes more readable:

    / [d-e g-i 3-7]/xx
    /[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

may be easier to grasp than the squashed equivalents

    /[d-eg-i3-7]/
    /[!@"#$%^&*()=?<>']/

Note that this unfortunately doesn't mean that your bracketed classes
can contain comments or extend over multiple lines. A C<#> inside a
character class is still just a literal C<#>, and doesn't introduce a
comment. And, unless the closing bracket is on the same line as the
opening one, the newline character (and everything on the next line(s)
until terminated by a C<]>) will be part of the class, just as if you'd
written C<\n>.

Taken together, these features go a long way towards
making Perl's regular expressions more readable. Here's an example:

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

Note that anything inside
a C<\Q...\E> stays unaffected by C</x>. And note that C</x> doesn't affect
space interpretation within a single multi-character construct. For
example, C<(?:...)> can't have a space between the C<"(">,
C<"?">, and C<":">. Within any delimiters for such a construct, allowed
spaces are not affected by C</x>, and depend on the construct. For
example, all constructs using curly braces as delimiters, such as
C<\x{...}>, can have blanks within but adjacent to the braces, but not
elsewhere, and no non-blank space characters. An exception is Unicode
properties, which follow Unicode rules; for these, see
L<perluniprops/Properties accessible through \p{} and \P{}>.
X</x>
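
A sketch of the C</x> and C</xx> rules, assuming Perl 5.26 or later for
C</xx>; the date string is just an example. Under C</x>, the literal
space and C<"#"> in the first pattern have to be escaped or bracketed,
as described above. The last three checks show that only C</xx> stops
blanks inside a bracketed class from counting as class members.

```perl
use strict;
use warnings;

my $line = "2024-01-15 # release";

# Under /x, blanks are ignored and "#" starts a comment, so the literal
# space is written "\ " and the literal "#" is put in a bracketed class.
my $re = qr/
    ^ (\d{4}) - (\d\d) - (\d\d)    # year, month, day
    \ [#]                          # a space, then a literal "#"
/x;
my @ymd = $line =~ $re;            # ("2024", "01", "15")

# /xx also ignores unescaped blanks inside bracketed classes.
my $vowel_xx = "e" =~ /^[ a e i o u ]$/xx ? 1 : 0;    # 1
my $space_xx = " " =~ /^[ a e i o u ]$/xx ? 1 : 0;    # 0: blanks ignored
my $space_x  = " " =~ /^[ a e i o u ]$/x  ? 1 : 0;    # 1: blanks are members

print "@ymd $vowel_xx $space_xx $space_x\n";    # prints "2024 01 15 1 0 1"
```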

The characters deemed whitespace are those that Unicode calls
"Pattern White Space", namely:

    U+0009 CHARACTER TABULATION
    U+000A LINE FEED
    U+000B LINE TABULATION
    U+000C FORM FEED
    U+000D CARRIAGE RETURN
    U+0020 SPACE
    U+0085 NEXT LINE
    U+200E LEFT-TO-RIGHT MARK
    U+200F RIGHT-TO-LEFT MARK
    U+2028 LINE SEPARATOR
    U+2029 PARAGRAPH SEPARATOR

=head4 Character set modifiers

C</d>, C</u>, C</a>, and C</l>, available starting in 5.14, are called
the character set modifiers; they affect the character set rules
used for the regular expression.

The C</d>, C</u>, and C</l> modifiers are not likely to be of much use
to you, and so you need not worry about them very much. They exist for
Perl's internal use, so that complex regular expression data structures
can be automatically serialized and later exactly reconstituted,
including all their nuances. But, since Perl can't keep a secret, and
there may be rare instances where they are useful, they are documented
here.

The C</a> modifier, on the other hand, may be useful. Its purpose is to
allow code that is to work mostly on ASCII data to not have to concern
itself with Unicode.

Briefly, C</l> sets the character set to that of whatever B<L>ocale is in
effect at the time of the execution of the pattern match.

C</u> sets the character set to B<U>nicode.

C</a> also sets the character set to Unicode, BUT adds several
restrictions for B<A>SCII-safe matching.

C</d> is the old, problematic, pre-5.14 B<D>efault character set
behavior. Its only use is to force that old behavior.

At any given time, exactly one of these modifiers is in effect. Their
existence allows Perl to keep the originally compiled behavior of a
regular expression, regardless of what rules are in effect when it is
actually executed. And if it is interpolated into a larger regex, the
original's rules continue to apply to it, and don't affect the other
parts.
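
A sketch of that last point, assuming Perl 5.14 or later: the inner
pattern below is compiled with C</a>, so its C<\d> keeps matching only
ASCII digits even after being interpolated into an outer pattern
compiled with C</u>. U+09EA is C<BENGALI DIGIT FOUR>, a decimal digit
that C<\d> accepts under Unicode rules; the variable names are only
illustrative.

```perl
use strict;
use warnings;

my $ascii_digit = qr/\d/a;                 # compiled under /a: 0-9 only
my $number      = qr/^$ascii_digit+$/u;    # outer /u; inner part keeps /a

my $bengali = "\x{09EA}\x{09EA}";          # two BENGALI DIGIT FOURs

my $ascii_ok   = "42"     =~ $number  ? 1 : 0;    # 1
my $bengali_in = $bengali =~ $number  ? 1 : 0;    # 0: inner /a still applies
my $bengali_u  = $bengali =~ /^\d+$/u ? 1 : 0;    # 1: plain /u \d accepts it

print "$ascii_ok $bengali_in $bengali_u\n";       # prints "1 0 1"
```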

The C</l> and C</u> modifiers are automatically selected for
regular expressions compiled within the scope of various pragmas,
and we recommend that in general, you use those pragmas instead of
specifying these modifiers explicitly. For one thing, the modifiers
affect only pattern matching, and do not extend even to any replacement
done, whereas using the pragmas gives consistent results for all
appropriate operations within their scopes. For example,

    s/foo/\Ubar/il

will match "foo" using the locale's rules for case-insensitive matching,
but the C</l> does not affect how the C<\U> operates. Most likely you
want both of them to use locale rules. To do this, instead compile the
regular expression within the scope of C<use locale>. This both
implicitly adds the C</l>, and applies locale rules to the C<\U>. The
lesson is to C<use locale>, and not C</l> explicitly.

Similarly, it would be better to use C<use feature 'unicode_strings'>
instead of

    s/foo/\Lbar/iu

to get Unicode rules, as the C<\L> in the former (but not necessarily
the latter) would also use Unicode rules.

More detail on each of the modifiers follows. Most likely you don't
need to know this detail for C</l>, C</u>, and C</d>, and can skip ahead
to L<E<sol>a|/E<sol>a (and E<sol>aa)>.

=head4 /l

means to use the current locale's rules (see L<perllocale>) when pattern
matching. For example, C<\w> will match the "word" characters of that
locale, and C<"/i"> case-insensitive matching will match according to
the locale's case folding rules. The locale used will be the one in
effect at the time of execution of the pattern match. This may not be
the same as the compilation-time locale, and can differ from one match
to another if there is an intervening call of the
L<setlocale() function|perllocale/The setlocale function>.

Prior to v5.20, Perl did not support multi-byte locales. Starting then,
UTF-8 locales are supported. No other multi-byte locales are ever
likely to be supported. However, in all locales, one can have code
points above 255 and these will always be treated as Unicode no matter
what locale is in effect.

Under Unicode rules, there are a few case-insensitive matches that cross
the 255/256 boundary. Except for UTF-8 locales in Perls v5.20 and
later, these are disallowed under C</l>. For example, 0xFF (on ASCII
platforms) does not caselessly match the character at 0x178, C<LATIN
CAPITAL LETTER Y WITH DIAERESIS>, because 0xFF may not be C<LATIN SMALL
LETTER Y WITH DIAERESIS> in the current locale, and Perl has no way of
knowing if that character even exists in the locale, much less what code
point it is.

In a UTF-8 locale in v5.20 and later, the only visible difference
between locale and non-locale in regular expressions should be tainting,
if your perl supports taint checking (see L<perlsec>).

This modifier may be specified to be the default by C<use locale>, but
see L</Which character set modifier is in effect?>.
X</l>
	643
	644	=head4 /u
	645
	646	means to use Unicode rules when pattern matching. On ASCII platforms,
	647	this means that the code points between 128 and 255 take on their
	648	Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's).
	649	(Otherwise Perl considers their meanings to be undefined.) Thus,
	650	under this modifier, the ASCII platform effectively becomes a Unicode
	651	platform; and hence, for example, C<\w> will match any of the more than
	652	100_000 word characters in Unicode.
	653
	654	Unlike most locales, which are specific to a language and country pair,
	655	Unicode classifies all the characters that are letters I<somewhere> in
	656	the world as
	657	C<\w>. For example, your locale might not think that C<LATIN SMALL
	658	LETTER ETH> is a letter (unless you happen to speak Icelandic), but
	659	Unicode does. Similarly, all the characters that are decimal digits
	660	somewhere in the world will match C<\d>; this is hundreds, not 10,
	661	possible matches. And some of those digits look like some of the 10
	662	ASCII digits, but mean a different number, so a human could easily think
	663	a number is a different quantity than it really is. For example,
	664	C<BENGALI DIGIT FOUR> (U+09EA) looks very much like an
	665	C<ASCII DIGIT EIGHT> (U+0038), and C<LEPCHA DIGIT SIX> (U+1C46) looks
very much like an C<ASCII DIGIT FIVE> (U+0035). And C<\d+> may match
	667	strings of digits that are a mixture from different writing systems,
	668	creating a security issue. A fraudulent website, for example, could
	669	display the price of something using U+1C46, and it would appear to the
	670	user that something cost 500 units, but it really costs 600. A browser
	671	that enforced script runs (L</Script Runs>) would prevent that
	672	fraudulent display. L<Unicode::UCD/num()> can also be used to sort this
	673	out. Or the C</a> modifier can be used to force C<\d> to match just the
	674	ASCII 0 through 9.
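
The sketch below illustrates both halves of the problem. Under C</u> a Bengali digit satisfies C<\d>, and C<num()> reports the value that the digit actually denotes. It assumes Perl v5.14 or later and the core L<Unicode::UCD> module. The variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

my $bengali_four = "\x{09EA}";    # BENGALI DIGIT FOUR, resembles "8"
my $lepcha_six   = "\x{1C46}";    # LEPCHA DIGIT SIX, resembles "5"

# Under /u, \d accepts decimal digits from any script ...
my $u_digit = $bengali_four =~ /^\d$/u ? 1 : 0;    # 1
# ... while /a limits \d to the ASCII digits 0 through 9.
my $a_digit = $bengali_four =~ /^\d$/a ? 1 : 0;    # 0

# num() returns the numeric value each digit stands for,
# regardless of which ASCII digit it happens to look like.
my $four = num($bengali_four);    # 4
my $six  = num($lepcha_six);      # 6
```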
	675
Also, under this modifier, case-insensitive matching works on the full
set of Unicode characters. The C<KELVIN SIGN>, for example, matches the
letters "k" and
	679	"K"; and C<LATIN SMALL LIGATURE FF> matches the sequence "ff", which,
	680	if you're not prepared, might make it look like a hexadecimal constant,
	681	presenting another potential security issue. See
	682	L<https://unicode.org/reports/tr36> for a detailed discussion of Unicode
	683	security issues.
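
The following small sketch checks that the folds described above succeed. It assumes Perl v5.14 or later.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN

# The KELVIN SIGN folds to "k", so it matches both "k" and "K" under /i.
my $k_lower = $kelvin =~ /^k$/iu ? 1 : 0;    # 1
my $k_upper = $kelvin =~ /^K$/iu ? 1 : 0;    # 1

# A multi-character fold: LATIN SMALL LIGATURE FF folds to "ff".
my $ff = "ff" =~ /^\x{FB00}$/iu ? 1 : 0;     # 1
```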
	684
This modifier may be specified to be the default by C<use feature
'unicode_strings'>, C<use locale ':not_characters'>, or
C<L<use v5.12|perlfunc/use VERSION>> (or higher),
but see L</Which character set modifier is in effect?>.
	689	X</u>
	690
	691	=head4 /d
	692
B<IMPORTANT:> Because of the unpredictable behaviors this
modifier causes, only use it to maintain weird backward compatibilities.
Use the
L<< C<unicode_strings>|feature/"The 'unicode_strings' feature" >>
feature
in new code to avoid inadvertently enabling this modifier by default.
	699
	700	What does this modifier do? It "Depends"!
	701
	702	This modifier means to use platform-native matching rules
	703	except when there is cause to use Unicode rules instead, as follows:
	704
	705	=over 4
	706
	707	=item 1
	708
the target string's L<UTF8 flag|perlunifaq/What is "the UTF8 flag"?>
	710	(see below) is set; or
	711
	712	=item 2
	713
the pattern's L<UTF8 flag|perlunifaq/What is "the UTF8 flag"?>
	715	(see below) is set; or
	716
	717	=item 3
	718
	719	the pattern explicitly mentions a code point that is above 255 (say by
	720	C<\x{100}>); or
	721
	722	=item 4
	723
	724	the pattern uses a Unicode name (C<\N{...}>); or
	725
	726	=item 5
	727
	728	the pattern uses a Unicode property (C<\p{...}> or C<\P{...}>); or
	729
	730	=item 6
	731
	732	the pattern uses a Unicode break (C<\b{...}> or C<\B{...}>); or
	733
	734	=item 7
	735
the pattern uses C<L</(?[ ])>>; or
	737
	738	=item 8
	739
the pattern uses L<C<(*script_run: ...)>|/Script Runs>.
	741
	742	=back
	743
	744	Regarding the "UTF8 flag" references above: normally Perl applications
	745	shouldn't think about that flag. It's part of Perl's internals,
	746	so it can change whenever Perl wants. C</d> may thus cause unpredictable
	747	results. See L<perlunicode/The "Unicode Bug">. This bug
	748	has become rather infamous, leading to yet other (without swearing) names
	749	for this modifier like "Dicey" and "Dodgy".
	750
	751	Here are some examples of how that works on an ASCII platform:
	752
    $str =  "\xDF";        # LATIN SMALL LETTER SHARP S
    utf8::downgrade($str); # $str is not UTF8-flagged.
    $str =~ /^\w/;         # No match, since no UTF8 flag.

    $str .= "\x{0e0b}";    # Now $str is UTF8-flagged.
    $str =~ /^\w/;         # Match! $str is now UTF8-flagged.
    chop $str;
    $str =~ /^\w/;         # Still a match! $str retains its UTF8 flag.
	761
Under Perl's default configuration, this modifier is selected
automatically when none of the others are, so yet another name
for it (unfortunately) is "Default".
	765
Whenever you can, use the
L<< C<unicode_strings>|feature/"The 'unicode_strings' feature" >>
feature to cause C</u> to be the default instead.
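
The following sketch shows how the result under C</d> follows the internal UTF8 flag, and how the C<unicode_strings> feature removes that dependence. It assumes Perl v5.14 or later and that no pragma already in scope changes the default modifier. It uses only the core functions C<utf8::upgrade> and C<utf8::downgrade>, which need no C<use utf8>.

```perl
use strict;
use warnings;

my $s = "\xDF";              # LATIN SMALL LETTER SHARP S
utf8::downgrade($s);         # ensure the UTF8 flag is off

# No pragma in scope, so /d applies and the flag decides the rules.
my $flag_off = $s =~ /^\w/ ? 1 : 0;    # 0: native (ASCII) rules
utf8::upgrade($s);                     # same characters, flag now on
my $flag_on  = $s =~ /^\w/ ? 1 : 0;    # 1: Unicode rules

# With 'unicode_strings' in lexical scope, /u is the default, so the
# match no longer depends on the internal representation.
utf8::downgrade($s);
my $with_feature = do {
    use feature 'unicode_strings';
    $s =~ /^\w/ ? 1 : 0;               # 1
};
```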
	769
	770	=head4 /a (and /aa)
	771
	772	This modifier stands for ASCII-restrict (or ASCII-safe). This modifier
may be doubled up to increase its effect.
	774
	775	When it appears singly, it causes the sequences C<\d>, C<\s>, C<\w>, and
	776	the Posix character classes to match only in the ASCII range. They thus
	777	revert to their pre-5.6, pre-Unicode meanings. Under C</a>, C<\d>
	778	always means precisely the digits C<"0"> to C<"9">; C<\s> means the five
	779	characters C<[ \f\n\r\t]>, and starting in Perl v5.18, the vertical tab;
	780	C<\w> means the 63 characters
	781	C<[A-Za-z0-9_]>; and likewise, all the Posix classes such as
	782	C<[[:print:]]> match only the appropriate ASCII-range characters.
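
To make the ASCII restriction concrete, this sketch counts the C<\w> characters among code points 0 to 255 under C</a> and under C</u>. It also probes C<\s> with C<NO-BREAK SPACE> and with the vertical tab. It assumes Perl v5.18 or later, for the vertical-tab behavior.

```perl
use strict;
use warnings;

# grep in scalar context returns the number of code points that matched.
my $ascii_word   = grep { chr($_) =~ /\w/a } 0 .. 255;    # 63
my $unicode_word = grep { chr($_) =~ /\w/u } 0 .. 255;    # more than 63

# NO-BREAK SPACE (U+00A0) is white space to Unicode, but not under /a.
my $nbsp_u = "\xA0" =~ /^\s$/u ? 1 : 0;    # 1
my $nbsp_a = "\xA0" =~ /^\s$/a ? 1 : 0;    # 0

# The vertical tab matches \s under /a starting in v5.18.
my $vt_a = "\x0B" =~ /^\s$/a ? 1 : 0;      # 1
```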
	783
	784	This modifier is useful for people who only incidentally use Unicode,
	785	and who do not wish to be burdened with its complexities and security
	786	concerns.
	787
With C</a>, you can write C<\d> with confidence that it will only match
ASCII characters, and should the need arise to match beyond ASCII, you
can instead use C<\p{Digit}> (or C<\p{Word}> for C<\w>). Similar
C<\p{...}> constructs can match beyond ASCII for both white space (see
L<perlrecharclass/Whitespace>) and the Posix classes (see
L<perlrecharclass/POSIX Character Classes>). Thus, this modifier
doesn't mean you can't use Unicode; it means that to get Unicode
matching you must explicitly use a construct (C<\p{}>, C<\P{}>) that
signals Unicode.
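
The following sketch shows that escape hatch. Under C</a>, C<\d> rejects a Bengali digit, but C<\p{Digit}> still accepts it. Likewise, C<\p{Word}> accepts a Latin-1 letter that C<\w> would not. It assumes Perl v5.14 or later.

```perl
use strict;
use warnings;

my $bengali_four = "\x{09EA}";    # BENGALI DIGIT FOUR
my $e_acute      = "\xE9";        # LATIN SMALL LETTER E WITH ACUTE

my $d  = $bengali_four =~ /^\d$/a        ? 1 : 0;    # 0: ASCII-only \d
my $pd = $bengali_four =~ /^\p{Digit}$/a ? 1 : 0;    # 1: \p{} signals Unicode
my $w  = $e_acute      =~ /^\w$/a        ? 1 : 0;    # 0
my $pw = $e_acute      =~ /^\p{Word}$/a  ? 1 : 0;    # 1
```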
	797
	798	As you would expect, this modifier causes, for example, C<\D> to mean
	799	the same thing as C<[^0-9]>; in fact, all non-ASCII characters match
	800	C<\D>, C<\S>, and C<\W>. C<\b> still means to match at the boundary
	801	between C<\w> and C<\W>, using the C</a> definitions of them (similarly
	802	for C<\B>).
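
The following sketch shows C<\w>, C<\D>, and C<\b> all using the C</a> definitions on a word that contains U+00EF. Collecting C<pos()> after each zero-width C<\b> match is just one convenient way to list the boundary offsets. It assumes Perl v5.14 or later.

```perl
use strict;
use warnings;

my $word = "na\x{EF}ve";    # "naive" spelled with U+00EF (i with diaeresis)

# Under /a, U+00EF is \W, so \w+ stops just before it ...
my ($ascii_run)   = $word =~ /(\w+)/a;    # "na"
# ... whereas under /u the whole word is one run of \w.
my ($unicode_run) = $word =~ /(\w+)/u;    # all five characters

# Every non-ASCII character matches \D under /a, even a Bengali digit.
my $non_digit = "\x{09EA}" =~ /^\D$/a ? 1 : 0;    # 1

# \b also uses the /a definitions, so there are boundaries on both sides of U+00EF.
my @boundaries;
push @boundaries, pos($word) while $word =~ /\b/ga;    # (0, 2, 3, 5)
```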
	803
	804	Otherwise, C</a> behaves like the C</u> modifier, in that
	805	case-insensitive matching uses Unicode rules; for example, "k" will
match the Unicode C<\N{KELVIN SIGN}> under C</i> matching, and code
points in the Latin-1 range above ASCII follow Unicode rules when it
comes to case-insensitive matching.
	809
	810	To forbid ASCII/non-ASCII matches (like "k" with C<\N{KELVIN SIGN}>),
	811	specify the C<"a"> twice, for example C</aai> or C</aia>. (The first
	812	occurrence of C<"a"> restricts the C<\d>, I<etc>., and the second occurrence
	813	adds the C</i> restrictions.) But, note that code points outside the
	814	ASCII range will use Unicode rules for C</i> matching, so the modifier
	815	doesn't really restrict things to just ASCII; it just forbids the
	816	intermixing of ASCII and non-ASCII.
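
The following sketch shows the difference between a single and a doubled C<"a">. It assumes Perl v5.14 or later.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN

my $single = $kelvin =~ /^k$/ai  ? 1 : 0;    # 1: /a still folds using Unicode rules
my $double = $kelvin =~ /^k$/aai ? 1 : 0;    # 0: /aa forbids ASCII/non-ASCII matches

# Two non-ASCII characters may still match each other under /aai.
my $latin1 = "\xE9" =~ /^\xC9$/aai ? 1 : 0;  # 1: e-acute versus E-acute
```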
	817
	818	To summarize, this modifier provides protection for applications that
	819	don't wish to be exposed to all of Unicode. Specifying it twice
	820	gives added protection.
	821
	822	This modifier may be specified to be the default by C<use re '/a'>
	823	or C<use re '/aa'>. If you do so, you may actually have occasion to use
	824	the C</u> modifier explicitly if there are a few regular expressions
	825	where you do want full Unicode rules (but even here, it's best if
everything is under feature C<"unicode_strings">, along with the
	827	C<use re '/aa'>). Also see L</Which character set modifier is in
	828	effect?>.
	829	X</a>
	830	X</aa>
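
The following sketch shows a lexically scoped default. The bare block limits the pragma to the two matches inside it, and the explicit C</u> on the second match takes priority over the default. It assumes Perl v5.14 or later, when C<use re '/flags'> became available.

```perl
use strict;
use warnings;

my $bengali_four = "\x{09EA}";    # BENGALI DIGIT FOUR
my ($by_default, $explicit_u);
{
    use re '/a';                  # /a is the default only inside this block
    $by_default = $bengali_four =~ /^\d$/  ? 1 : 0;    # 0: compiled with /a
    $explicit_u = $bengali_four =~ /^\d$/u ? 1 : 0;    # 1: explicit /u wins
}
```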
	831
	832	=head4 Which character set modifier is in effect?
	833
	834	Which of these modifiers is in effect at any given point in a regular
	835	expression depends on a fairly complex set of interactions. These have
	836	been designed so that in general you don't have to worry about it, but
	837	this section gives the gory details. As
	838	explained below in L</Extended Patterns> it is possible to explicitly
	839	specify modifiers that apply only to portions of a regular expression.
	840	The innermost always has priority over any outer ones, and one applying
	841	to the whole expression has priority over any of the default settings that are
	842	described in the remainder of this section.
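
The following sketch shows that priority rule. The outer C</u> governs the whole pattern, but the inner C<(?a:...)> group overrides it for the first C<\d> only. It assumes Perl v5.14 or later.

```perl
use strict;
use warnings;

# First \d: ASCII digits only (inner /a). Second \d: any decimal digit (outer /u).
my $pattern = qr/^(?a:\d)\d$/u;

my $ascii_first   = "1\x{09EA}" =~ $pattern ? 1 : 0;    # 1: "1", then BENGALI DIGIT FOUR
my $bengali_first = "\x{09EA}1" =~ $pattern ? 1 : 0;    # 0: first digit is not ASCII
```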
	843
The C<L<use re 'E<sol>foo'|re/"'/flags' mode">> pragma can be used to set
	845	default modifiers (including these) for regular expressions compiled
	846	within its scope. This pragma has precedence over the other pragmas
	847	listed below that also change the defaults.
	848
Otherwise, C<L<use locale|perllocale>> sets the default modifier to C</l>;
and C<L<use feature 'unicode_strings'|feature>>, or
C<L<use v5.12|perlfunc/use VERSION>> (or higher) set the default to
C</u> when not in the same scope as either C<L<use locale|perllocale>>
or C<L<use bytes|bytes>>.
(C<L<use locale ':not_characters'|perllocale/Unicode and UTF-8>> also
sets the default to C</u>, overriding any plain C<use locale>.)
Unlike the mechanisms mentioned above, these
affect operations besides regular expression pattern matching, and so
give more consistent results with other operators, including using
C<\U>, C<\l>, I<etc>. in substitution replacements.
	860
	861	If none of the above apply, for backwards compatibility reasons, the
	862	C</d> modifier is the one in effect by default. As this can lead to
	863	unexpected results, it is best to specify which other rule set should be
	864	used.
	865
	866	=head4 Character set modifier behavior prior to Perl 5.14
	867
	868	Prior to 5.14, there were no explicit modifiers, but C</l> was implied
	869	for regexes compiled within the scope of C<use locale>, and C</d> was
	870	implied otherwise. However, interpolating a regex into a larger regex
	871	would ignore the original compilation in favor of whatever was in effect
	872	at the time of the second compilation. There were a number of
	873	inconsistencies (bugs) with the C</d> modifier, where Unicode rules
	874	would be used when inappropriate, and vice versa. C<\p{}> did not imply
	875	Unicode rules, and neither did all occurrences of C<\N{}>, until 5.12.
	876
	877	=head2 Regular Expressions
	878
	879	=head3 Quantifiers
	880
	881	Quantifiers are used when a particular portion of a pattern needs to
match a certain number (or numbers) of times. If there isn't a
quantifier, the number of times to match is exactly one. The following
	884	standard quantifiers are recognized:
	885	X<metacharacter> X<quantifier> X<*> X<+> X<?> X<{n}> X<{n,}> X<{n,m}>
	886
    *           Match 0 or more times
    +           Match 1 or more times
    ?           Match 1 or 0 times
    {n}         Match exactly n times
    {n,}        Match at least n times
    {,n}        Match at most n times
    {n,m}       Match at least n but not more than m times
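
The following short sketch exercises the standard quantifiers in core Perl. It leaves out C<{,n}> because older perls do not recognize it.

```perl
use strict;
use warnings;

my ($exact)    = "aaaaa" =~ /^(a{3})/;       # "aaa"
my ($range)    = "aaaaa" =~ /^(a{2,4})/;     # "aaaa": as many as allowed
my ($at_least) = "aaaaa" =~ /^(a{2,})/;      # "aaaaa"
my ($optional) = "color" =~ /^(colou?r)$/;   # "color": "u?" matched zero times
my $too_few    = "aaaaa" =~ /^a{6,}/ ? 1 : 0;    # 0: only five "a"s available
```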
	894
	895	(If a non-escaped curly bracket occurs in a context other than one of
	896	the quantifiers listed above, where it does not form part of a
	897	backslashed sequence like C<\x{...}>, it is either a fatal syntax error,
	898	or treated as a regular character, generally with a deprecation warning
	899	raised. To escape it, you can precede it with a backslash (C<"\{">) or
	900	enclose it within square brackets (C<"[{]">).
	901	This change will allow for future syntax extensions (like making the
	902	lower bound of a quantifier optional), and better error checking of
	903	quantifiers).
	904
	905	The C<"*"> quantifier is equivalent to C<{0,}>, the C<"+">
	906	quantifier to C<{1,}>, and the C<"?"> quantifier to C<{0,1}>. I<n> and I<m> are limited
	907	to non-negative integral values less than a preset limit defined when perl is built.
	908	This is usually 65534 on the most common platforms. The actual limit can
	909	be seen in the error message generated by code such as this:
	910
    $_ **= $_ , / {$_} / for 2 .. 42;
	912
	913	By default, a quantified subpattern is "greedy", that is, it will match as
	914	many times as possible (given a particular starting location) while still
	915	allowing the rest of the pattern to match. If you want it to match the
	916	minimum number of times possible, follow the quantifier with a C<"?">. Note
	917	that the meanings don't change, just the "greediness":
	918	X<metacharacter> X<greedy> X<greediness>
	919	X<?> X<*?> X<+?> X<??> X<{n}?> X<{n,}?> X<{,n}?> X<{n,m}?>
	920
    *?        Match 0 or more times, not greedily
    +?        Match 1 or more times, not greedily
    ??        Match 0 or 1 time, not greedily
    {n}?      Match exactly n times, not greedily (redundant)
    {n,}?     Match at least n times, not greedily
    {,n}?     Match at most n times, not greedily
    {n,m}?    Match at least n but not more than m times, not greedily
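
The difference shows up clearly when the same text contains two candidate end points. Here is a sketch in plain Perl:

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

my ($greedy) = $html =~ m{<b>(.*)</b>};     # "one</b> and <b>two": runs to the last </b>
my ($lazy)   = $html =~ m{<b>(.*?)</b>};    # "one": stops at the first </b>
```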
	928
	929	Normally when a quantified subpattern does not allow the rest of the
	930	overall pattern to match, Perl will backtrack. However, this behaviour is
	931	sometimes undesirable. Thus Perl provides the "possessive" quantifier form
	932	as well.
	933
    *+     Match 0 or more times and give nothing back
    ++     Match 1 or more times and give nothing back
    ?+     Match 0 or 1 time and give nothing back
    {n}+   Match exactly n times and give nothing back (redundant)
    {n,}+  Match at least n times and give nothing back
    {,n}+  Match at most n times and give nothing back
    {n,m}+ Match at least n but not more than m times and give nothing back
	941
	942	For instance,
	943
    'aaaa' =~ /a++a/
	945
	946	will never match, as the C<a++> will gobble up all the C<"a">'s in the
	947	string and won't leave any for the remaining part of the pattern. This
	948	feature can be extremely useful to give perl hints about where it
	949	shouldn't backtrack. For instance, the typical "match a double-quoted
	950	string" problem can be most efficiently performed when written as:
	951
    /"(?:[^"\\]++|\\.)*+"/
	953
	954	as we know that if the final quote does not match, backtracking will not
	955	help. See the independent subexpression
	956	C<L</(?E<gt>I<pattern>)>> for more details;
	957	possessive quantifiers are just syntactic sugar for that construct. For
	958	instance the above example could also be written as follows:
	959
    /"(?>(?:(?>[^"\\]+)|\\.)*)"/
	961
Note that the possessive quantifier modifier cannot be combined
with the non-greedy modifier, because the combination would make no
sense. Consider the following equivalency table:
	965
    Illegal         Legal
    ------------    ------
    X??+            X{0}
    X+?+            X{1}
    X{min,max}?+    X{min}
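
The following sketch checks the claims above in plain Perl. Possessive quantifiers need v5.10 or later, and the sample input line is made up for illustration.

```perl
use strict;
use warnings;

# a++ keeps every "a" it takes, so the trailing "a" in the pattern can never match.
my $possessive = 'aaaa' =~ /a++a/ ? 1 : 0;    # 0
# Plain a+ gives one "a" back on backtracking, so the match succeeds.
my $greedy     = 'aaaa' =~ /a+a/  ? 1 : 0;    # 1

# The double-quoted-string pattern from above, applied to a sample line.
my $line = q{say "a \"quoted\" word" now};
my ($string) = $line =~ /("(?:[^"\\]++|\\.)*+")/;    # q{"a \"quoted\" word"}
```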
	971
	972	=head3 Escape sequences
	973
	974	Because patterns are processed as double-quoted strings, the following
	975	also work:
	976
    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \cK         control char          (example: VT)
    \x{}, \x00  character whose ordinal is the given hexadecimal number
    \N{name}    named Unicode character or character sequence
    \N{U+263D}  Unicode character     (example: FIRST QUARTER MOON)
    \o{}, \000  character whose ordinal is the given octal number
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase until \E (think vi)
    \U          uppercase until \E (think vi)
    \Q          quote (disable) pattern metacharacters until \E
    \E          end either case modification or quoted section, think vi
	994
	995	Details are in L<perlop/Quote and Quote-like Operators>.
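
The following minimal sketch puts a few of these escapes to use. It assumes Perl v5.14 or later, for C<\o{}>, and C<$name> is an illustrative variable.

```perl
use strict;
use warnings;

my $name = "abc";

my $hex_oct = "AB"       =~ /^\x{41}\o{102}$/ ? 1 : 0;    # 1: hex 41 is "A", octal 102 is "B"
my $tab     = "a\tb"     =~ /^a\tb$/          ? 1 : 0;    # 1
my $upper   = "ABC"      =~ /^\U$name\E$/     ? 1 : 0;    # 1: \U uppercases the interpolated text
my $moon    = "\x{263D}" =~ /^\N{U+263D}$/    ? 1 : 0;    # 1: FIRST QUARTER MOON
```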
	996
	997	=head3 Character Classes and other Special Escapes
	998
	999	In addition, Perl defines the following:
	1000	X<\g> X<\k> X<\K> X<backreference>
	1001
    Sequence   Note    Description
     [...]     [1]  Match a character according to the rules of the
                      bracketed character class defined by the "...".
                      Example: [a-z] matches "a" or "b" or "c" ... or "z"
     [[:...:]] [2]  Match a character according to the rules of the POSIX
                      character class "..." within the outer bracketed
                      character class.  Example: [[:upper:]] matches any
                      uppercase character.
     (?[...])  [8]  Extended bracketed character class
     \w        [3]  Match a "word" character (alphanumeric plus "_", plus
                      other connector punctuation chars plus Unicode
                      marks)
     \W        [3]  Match a non-"word" character
     \s        [3]  Match a whitespace character
     \S        [3]  Match a non-whitespace character
     \d        [3]  Match a decimal digit character
     \D        [3]  Match a non-digit character
     \pP       [3]  Match P, named property.  Use \p{Prop} for longer names
     \PP       [3]  Match non-P
     \X        [4]  Match Unicode "eXtended grapheme cluster"
     \1        [5]  Backreference to a specific capture group or buffer.
                      '1' may actually be any positive integer.
     \g1       [5]  Backreference to a specific or previous group,
     \g{-1}    [5]  The number may be negative indicating a relative
                      previous group and may optionally be wrapped in
                      curly brackets for safer parsing.
     \g{name}  [5]  Named backreference
     \k<name>  [5]  Named backreference
     \k'name'  [5]  Named backreference
     \k{name}  [5]  Named backreference
     \K        [6]  Keep the stuff left of the \K, don't include it in $&
     \N        [7]  Any character but \n.  Not affected by /s modifier
     \v        [3]  Vertical whitespace
     \V        [3]  Not vertical whitespace
     \h        [3]  Horizontal whitespace
     \H        [3]  Not horizontal whitespace
     \R        [4]  Linebreak
	1039
	1040	=over 4
	1041
	1042	=item [1]
	1043
	1044	See L<perlrecharclass/Bracketed Character Classes> for details.
	1045
	1046	=item [2]
	1047
	1048	See L<perlrecharclass/POSIX Character Classes> for details.
	1049
	1050	=item [3]
	1051
See L<perlunicode/Unicode Character Properties> for details.
	1053
	1054	=item [4]
	1055
	1056	See L<perlrebackslash/Misc> for details.
	1057
	1058	=item [5]
	1059
	1060	See L</Capture groups> below for details.
	1061
	1062	=item [6]
	1063
	1064	See L</Extended Patterns> below for details.
	1065
	1066	=item [7]
	1067
	1068	Note that C<\N> has two meanings. When of the form C<\N{I<NAME>}>, it
	1069	matches the character or character sequence whose name is I<NAME>; and
	1070	similarly
	1071	when of the form C<\N{U+I<hex>}>, it matches the character whose Unicode
	1072	code point is I<hex>. Otherwise it matches any character but C<\n>.
	1073
	1074	=item [8]
	1075
	1076	See L<perlrecharclass/Extended Bracketed Character Classes> for details.
	1077
	1078	=back
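
The following sketch exercises a few of these escapes. It assumes Perl v5.10 or later, and the strings are made up for illustration.

```perl
use strict;
use warnings;

# \K: keep "key=" out of the part that gets replaced.
(my $config = "key=old") =~ s/key=\K\w+/new/;    # "key=new"

# \R matches any linebreak, including the two-character "\r\n".
my @lines = split /\R/, "one\r\ntwo\nthree";     # ("one", "two", "three")

# \N is "any character but \n", whatever the /s modifier says.
my $no_newline = "x\ny" =~ /^\N+$/s ? 1 : 0;     # 0
my $dot_all    = "x\ny" =~ /^.+$/s  ? 1 : 0;     # 1

# \h matches horizontal white space only.
my $horizontal = " \t" =~ /^\h+$/ ? 1 : 0;       # 1
```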
	1079
	1080	=head3 Assertions
	1081
Besides L<C<"^"> and C<"$">|/Metacharacters>, Perl defines the following
	1083	zero-width assertions:
	1084	X<zero-width assertion> X<assertion> X<regex, zero-width assertion>
	1085	X<regexp, zero-width assertion>
	1086	X<regular expression, zero-width assertion>
	1087	X<\b> X<\B> X<\A> X<\Z> X<\z> X<\G>
	1088
    \b{}   Match at Unicode boundary of specified type
    \B{}   Match where corresponding \b{} doesn't match
    \b     Match a \w\W or \W\w boundary
    \B     Match except at a \w\W or \W\w boundary
    \A     Match only at beginning of string
    \Z     Match only at end of string, or before newline at the end
    \z     Match only at end of string
    \G     Match only at pos() (e.g. at the end-of-match position
           of prior m//g)
	1098
	1099	A Unicode boundary (C<\b{}>), available starting in v5.22, is a spot
	1100	between two characters, or before the first character in the string, or
	1101	after the final character in the string where certain criteria defined
	1102	by Unicode are met. See L<perlrebackslash/\b{}, \b, \B{}, \B> for
	1103	details.
	1104
	1105	A word boundary (C<\b>) is a spot between two characters
	1106	that has a C<\w> on one side of it and a C<\W> on the other side
	1107	of it (in either order), counting the imaginary characters off the
	1108	beginning and end of the string as matching a C<\W>. (Within
	1109	character classes C<\b> represents backspace rather than a word
	1110	boundary, just as it normally does in any double-quoted string.)
	1111	The C<\A> and C<\Z> are just like C<"^"> and C<"$">, except that they
	1112	won't match multiple times when the C</m> modifier is used, while
	1113	C<"^"> and C<"$"> will match at every internal line boundary. To match
	1114	the actual end of the string and not ignore an optional trailing
	1115	newline, use C<\z>.
	1116	X<\b> X<\A> X<\Z> X<\z> X</m>
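
The following minimal sketch contrasts these assertions on made-up inputs, in plain Perl.

```perl
use strict;
use warnings;

my $line = "foo\n";
my $big_z   = $line =~ /foo\Z/ ? 1 : 0;    # 1: \Z allows a trailing newline
my $small_z = $line =~ /foo\z/ ? 1 : 0;    # 0: \z is the very end

my $text = "first\nsecond";
my $caret_m = $text =~ /^second/m  ? 1 : 0;   # 1: ^ matches after each internal newline under /m
my $big_a_m = $text =~ /\Asecond/m ? 1 : 0;   # 0: \A only ever matches at the start

# \b matches around the standalone words, but not inside "concatenate".
my @cats = "cat concatenate cat." =~ /\bcat\b/g;    # two matches
```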
	1117
	1118	The C<\G> assertion can be used to chain global matches (using
	1119	C<m//g>), as described in L<perlop/"Regexp Quote-Like Operators">.
	1120	It is also useful when writing C<lex>-like scanners, when you have
several patterns that you want to match against consecutive substrings
	1122	of your string; see the previous reference. The actual location
	1123	where C<\G> will match can also be influenced by using C<pos()> as
	1124	an lvalue: see L<perlfunc/pos>. Note that the rule for zero-length
	1125	matches (see L</"Repeated Patterns Matching a Zero-length Substring">)
	1126	is modified somewhat, in that contents to the left of C<\G> are
	1127	not counted when determining the length of the match. Thus the following
	1128	will not match forever:
	1129	X<\G>
	1130
    my $string = 'ABC';
    pos($string) = 1;
    while ($string =~ /(.\G)/g) {
        print $1;
    }
	1136
	1137	It will print 'A' and then terminate, as it considers the match to
	1138	be zero-width, and thus will not match at the same position twice in a
	1139	row.
	1140
	1141	It is worth noting that C<\G> improperly used can result in an infinite
	1142	loop. Take care when using patterns that include C<\G> in an alternation.
	1143
	1144	Note also that C<s///> will refuse to overwrite part of a substitution
	1145	that has already been replaced; so for example this will stop after the
	1146	first iteration, rather than iterating its way backwards through the
	1147	string:
	1148
    $_ = "123456789";
    pos = 6;
    s/.(?=.\G)/X/g;
    print;                  # prints 1234X6789, not XXXXX6789
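
The following sketch implements the C<lex>-like scanner mentioned above. Each alternative is anchored with C<\G>, and C</gc> keeps C<pos()> from being reset when an alternative fails. The token names (C<ID>, C<NUM>, C<OP>) and the input string are hypothetical.

```perl
use strict;
use warnings;

my $src = "x = 42 + y";
my @tokens;
pos($src) = 0;    # start scanning at the beginning
while (pos($src) < length($src)) {
    # /gc: on failure pos() stays put, so the next alternative starts at the same spot.
    if    ($src =~ /\G\s+/gc)            { next }
    elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "ID:$1" }
    elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM:$1" }
    elsif ($src =~ /\G([=+])/gc)         { push @tokens, "OP:$1" }
    else  { die "unexpected character at offset ", pos($src), "\n" }
}
# @tokens is now ("ID:x", "OP:=", "NUM:42", "OP:+", "ID:y")
```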
	1153
	1154
	1155	=head3 Capture groups
	1156
	1157	The grouping construct C<( ... )> creates capture groups (also referred to as
	1158	capture buffers). To refer to the current contents of a group later on, within
	1159	the same pattern, use C<\g1> (or C<\g{1}>) for the first, C<\g2> (or C<\g{2}>)
	1160	for the second, and so on.
	1161	This is called a I<backreference>.
	1162	X<regex, capture buffer> X<regexp, capture buffer>
	1163	X<regex, capture group> X<regexp, capture group>
	1164	X<regular expression, capture buffer> X<backreference>
	1165	X<regular expression, capture group> X<backreference>
	1166	X<\g{1}> X<\g{-1}> X<\g{name}> X<relative backreference> X<named backreference>
	1167	X<named capture buffer> X<regular expression, named capture buffer>
	1168	X<named capture group> X<regular expression, named capture group>
	1169	X<%+> X<$+{name}> X<< \k<name> >>
	1170	There is no limit to the number of captured substrings that you may use.
	1171	Groups are numbered with the leftmost open parenthesis being number 1, I<etc>. If
	1172	a group did not match, the associated backreference won't match either. (This
	1173	can happen if the group is optional, or in a different branch of an
	1174	alternation.)
	1175	You can omit the C<"g">, and write C<"\1">, I<etc>, but there are some issues with
	1176	this form, described below.
	1177
	1178	You can also refer to capture groups relatively, by using a negative number, so
	1179	that C<\g-1> and C<\g{-1}> both refer to the immediately preceding capture
	1180	group, and C<\g-2> and C<\g{-2}> both refer to the group before it. For
	1181	example:
	1182
    /
     (Y)            # group 1
     (              # group 2
        (X)         # group 3
        \g{-1}      # backref to group 3
        \g{-3}      # backref to group 1
     )
    /x
	1191
	1192	would match the same as C</(Y) ( (X) \g3 \g1 )/x>. This allows you to
	1193	interpolate regexes into larger regexes and not have to worry about the
	1194	capture groups being renumbered.
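
The following sketch shows why relative backreferences help with interpolation. C<$relative> and C<$absolute> are illustrative names for the same "doubled character" fragment written two ways.

```perl
use strict;
use warnings;

my $relative = qr/(.)\g{-1}/;    # \g{-1}: the group just before it, wherever it lands
my $absolute = qr/(.)\1/;        # \1: group 1 of whatever pattern this ends up in

# Two groups precede the fragment, so \1 now refers to "a", not to the fragment's own group.
my $rel_ok = "ab-cc" =~ /^(a)(b)-$relative$/ ? 1 : 0;    # 1
my $abs_ok = "ab-cc" =~ /^(a)(b)-$absolute$/ ? 1 : 0;    # 0
```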
	1195
	1196	You can dispense with numbers altogether and create named capture groups.
	1197	The notation is C<(?E<lt>I<name>E<gt>...)> to declare and C<\g{I<name>}> to
	1198	reference. (To be compatible with .Net regular expressions, C<\g{I<name>}> may
	1199	also be written as C<\k{I<name>}>, C<\kE<lt>I<name>E<gt>> or C<\k'I<name>'>.)
	1200	I<name> must not begin with a number, nor contain hyphens.
	1201	When different groups within the same pattern have the same name, any reference
	1202	to that name assumes the leftmost defined group. Named groups count in
	1203	absolute and relative numbering, and so can also be referred to by those
	1204	numbers.
	1205	(It's possible to do things with named capture groups that would otherwise
	1206	require C<(??{})>.)
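
The following sketch shows named groups in use. It assumes Perl v5.10 or later, and the date and sentence are made-up inputs.

```perl
use strict;
use warnings;

my ($year, $month, $month_by_number);
if ("2024-05-17" =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
    $year            = $+{year};     # "2024"
    $month           = $+{month};    # "05"
    $month_by_number = $2;           # "05": named groups are numbered as well
}

# A named backreference: find a repeated word.
my ($repeated) = "this is is a test" =~ /\b(?<word>\w+) \k<word>\b/;    # "is"
```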
	1207
	1208	Capture group contents are dynamically scoped and available to you outside the
	1209	pattern until the end of the enclosing block or until the next successful
	1210	match in the same scope, whichever comes first.
	1211	See L<perlsyn/"Compound Statements"> and
	1212	L<perlvar/"Scoping Rules of Regex Variables"> for more details.
	1213
	1214	You can access the contents of a capture group by absolute number (using
	1215	C<"$1"> instead of C<"\g1">, I<etc>); or by name via the C<%+> hash,
	1216	using C<"$+{I<name>}">.
	1217
	1218	Braces are required in referring to named capture groups, but are optional for
	1219	absolute or relative numbered ones. Braces are safer when creating a regex by
concatenating smaller strings. For example, if you have C<qr/$x$y/>, and C<$x>
contains C<"\g1">, and C<$y> contains C<"37">, you would get C</\g137/> which
	1222	is probably not what you intended.
	1223
	1224	If you use braces, you may also optionally add any number of blank
	1225	(space or tab) characters within but adjacent to the braces, like
	1226	S<C<\g{ -1 }>>, or S<C<\k{ I<name> }>>.
	1227
	1228	The C<\g> and C<\k> notations were introduced in Perl 5.10.0. Prior to that
there were neither named nor relative numbered capture groups. Absolute numbered
	1230	groups were referred to using C<\1>,
	1231	C<\2>, I<etc>., and this notation is still
	1232	accepted (and likely always will be). But it leads to some ambiguities if
	1233	there are more than 9 capture groups, as C<\10> could mean either the tenth
	1234	capture group, or the character whose ordinal in octal is 010 (a backspace in
	1235	ASCII). Perl resolves this ambiguity by interpreting C<\10> as a backreference
	1236	only if at least 10 left parentheses have opened before it. Likewise C<\11> is
	1237	a backreference only if at least 11 left parentheses have opened before it.
	1238	And so on. C<\1> through C<\9> are always interpreted as backreferences.
	1239	There are several examples below that illustrate these perils. You can avoid
	1240	the ambiguity by always using C<\g{}> or C<\g> if you mean capturing groups;
	1241	and for octal constants always using C<\o{}>, or for C<\077> and below, using 3
	1242	digits padded with leading zeros, since a leading zero implies an octal
	1243	constant.
	1244
	1245	The C<\I<digit>> notation also works in certain circumstances outside
	1246	the pattern. See L</Warning on \1 Instead of $1> below for details.
	1247
	1248	Examples:
	1249
    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words

    /(.)\g1/                        # find first doubled char
         and print "'$1' is the first doubled character\n";

    /(?<char>.)\k<char>/            # ... a different way
         and print "'$+{char}' is the first doubled character\n";

    /(?'char'.)\g1/                 # ... mix and match
         and print "'$1' is the first doubled character\n";

    if (/Time: (..):(..):(..)/) {   # parse out values
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }

    /(.)(.)(.)(.)(.)(.)(.)(.)(.)\g10/   # \g10 is a backreference
    /(.)(.)(.)(.)(.)(.)(.)(.)(.)\10/    # \10 is octal
    /((.)(.)(.)(.)(.)(.)(.)(.)(.))\10/  # \10 is a backreference
    /((.)(.)(.)(.)(.)(.)(.)(.)(.))\010/ # \010 is octal

    $x = '(.)\1';        # Creates problems when concatenated.
    $y = '(.)\g{1}';     # Avoids the problems.
    "aa" =~ /${x}/;      # True
    "aa" =~ /${y}/;      # True
    "aa0" =~ /${x}0/;    # False!
    "aa0" =~ /${y}0/;    # True
    "aa\x08" =~ /${x}0/;  # True!
    "aa\x08" =~ /${y}0/;  # False
	1280
	1281	Several special variables also refer back to portions of the previous
	1282	match. C<$+> returns whatever the last bracket match matched.
	1283	C<$&> returns the entire matched string. (At one point C<$0> did
	1284	also, but now it returns the name of the program.) C<$`> returns
	1285	everything before the matched string. C<$'> returns everything
	1286	after the matched string. And C<$^N> contains whatever was matched by
	1287	the most-recently closed group (submatch). C<$^N> can be used in
	1288	extended patterns (see below), for example to assign a submatch to a
	1289	variable.
	1290	X<$+> X<$^N> X<$&> X<$`> X<$'>
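
The following sketch shows the difference between C<$+> and C<$^N>, plus the common use of C<$+> with an alternation. The inputs are made up.

```perl
use strict;
use warnings;

my ($last_paren, $last_closed, $rev);
if ("abc" =~ /(a)(b(c))/) {
    $last_paren  = $+;     # "c": the highest-numbered group that matched
    $last_closed = $^N;    # "bc": group 2 closes after group 3
}

# $+ is handy with alternations, where only one group participates.
if ("Rev: 7" =~ /Version: (\S+)|Rev: (\S+)/) {
    $rev = $+;             # "7", taken from group 2
}
```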
	1291
	1292	These special variables, like the C<%+> hash and the numbered match variables
	1293	(C<$1>, C<$2>, C<$3>, I<etc>.) are dynamically scoped
	1294	until the end of the enclosing block or until the next successful
	1295	match, whichever comes first. (See L<perlsyn/"Compound Statements">.)
	1296	X<$+> X<$^N> X<$&> X<$`> X<$'>
	1297	X<$1> X<$2> X<$3> X<$4> X<$5> X<$6> X<$7> X<$8> X<$9>
	1298	X<@{^CAPTURE}>
	1299
	1300	The C<@{^CAPTURE}> array may be used to access ALL of the capture buffers
	1301	as an array without needing to know how many there are. For instance
	1302
    $string=~/$pattern/ and @captured = @{^CAPTURE};
	1304
	1305	will place a copy of each capture variable, C<$1>, C<$2> etc, into the
	1306	C<@captured> array.
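
The following minimal sketch copies every capture at once. It assumes Perl v5.26 or later, where C<@{^CAPTURE}> is available.

```perl
use strict;
use warnings;

my @captured;
if ("a=1, b=2" =~ /(\w)=(\d), (\w)=(\d)/) {
    @captured = @{^CAPTURE};    # copies of $1, $2, $3, $4
}
```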
	1307
	1308	Be aware that when interpolating a subscript of the C<@{^CAPTURE}>
	1309	array you must use demarcated curly brace notation:
	1310
    print "${^CAPTURE[0]}";
	1312
	1313	See L<perldata/"Demarcated variable names using braces"> for more on
	1314	this notation.
	1315
	1316	B<NOTE>: Failed matches in Perl do not reset the match variables,
	1317	which makes it easier to write code that tests for a series of more
	1318	specific cases and remembers the best match.
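
The following minimal sketch shows a failed match leaving C<$1> untouched. Both matches run in the same scope, and the inputs are made up.

```perl
use strict;
use warnings;

my $first  = "version 5" =~ /(\d+)/ ? 1 : 0;    # succeeds: $1 is now "5"
my $second = "no digits" =~ /(\d+)/ ? 1 : 0;    # fails: $1 is left unchanged
my $still  = $1;                                 # "5"
```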
	1319
	1320	B<WARNING>: If your code is to run on Perl 5.16 or earlier,
	1321	beware that once Perl sees that you need one of C<$&>, C<$`>, or
	1322	C<$'> anywhere in the program, it has to provide them for every
	1323	pattern match. This may substantially slow your program.
	1324
	1325	Perl uses the same mechanism to produce C<$1>, C<$2>, I<etc>, so you also
	1326	pay a price for each pattern that contains capturing parentheses.
	1327	(To avoid this cost while retaining the grouping behaviour, use the
	1328	extended regular expression C<(?: ... )> instead.) But if you never
	1329	use C<$&>, C<$`> or C<$'>, then patterns I<without> capturing
	1330	parentheses will not be penalized. So avoid C<$&>, C<$'>, and C<$`>
	1331	if you can, but if you can't (and some algorithms really appreciate
	1332	them), once you've used them once, use them at will, because you've
	1333	already paid the price.
	1334	X<$&> X<$`> X<$'>
	1335
	1336	Perl 5.16 introduced a slightly more efficient mechanism that notes
	1337	separately whether each of C<$`>, C<$&>, and C<$'> have been seen, and
	1338	thus may only need to copy part of the string. Perl 5.20 introduced a
	1339	much more efficient copy-on-write mechanism which eliminates any slowdown.
	1340
	1341	As another workaround for this problem, Perl 5.10.0 introduced C<${^PREMATCH}>,
	1342	C<${^MATCH}> and C<${^POSTMATCH}>, which are equivalent to C<$`>, C<$&>
	1343	and C<$'>, B<except> that they are only guaranteed to be defined after a
	1344	successful match that was executed with the C</p> (preserve) modifier.
These variables incur no global performance penalty, unlike
their punctuation-character equivalents; the trade-off is that you
have to tell perl when you want to use them.
	1348	X</p> X<p modifier>
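
The following minimal sketch uses the C</p> variables on a made-up string. It assumes Perl v5.10 or later.

```perl
use strict;
use warnings;

my ($pre, $match, $post);
if ("hello world" =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
# $pre is "hell", $match is "o w", $post is "orld"
```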
	1349
	1350	=head2 Quoting metacharacters
	1351
Backslashed metacharacters in Perl are alphanumeric, such as C<\b>,
C<\w>, C<\n>. Unlike some other regular expression languages, there
are no backslashed symbols that aren't alphanumeric. So anything
that looks like C<\\>, C<\(>, C<\)>, C<\[>, C<\]>, C<\{>, or C<\}> is
always interpreted as a literal character, not a metacharacter. This was
once used in a common idiom to disable or quote the special meanings
of regular expression metacharacters in a string that you want to
use for a pattern. Simply quote all non-"word" characters:

    $pattern =~ s/(\W)/\\$1/g;
	1363
	1364	(If C<use locale> is set, then this depends on the current locale.)
	1365	Today it is more common to use the C<L<quotemeta()\|perlfunc/quotemeta>>
	1366	function or the C<\Q> metaquoting escape sequence to disable all
	1367	metacharacters' special meanings like this:
	1368
    /$unquoted\Q$quoted\E$unquoted/
	1370
	1371	Beware that if you put literal backslashes (those not inside
	1372	interpolated variables) between C<\Q> and C<\E>, double-quotish
	1373	backslash interpolation may lead to confusing results. If you
	1374	I<need> to use literal backslashes within C<\Q...\E>,
	1375	consult L<perlop/"Gory details of parsing quoted constructs">.
	1376
	1377	C<quotemeta()> and C<\Q> are fully described in L<perlfunc/quotemeta>.
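
A short sketch that contrasts an unquoted interpolation with a quoted one. The string C<"a.b*c"> is only an illustrative value that contains two metacharacters:

```perl

    use strict;
    use warnings;

    my $literal = "a.b*c";               # "." and "*" are metacharacters
    my $quoted  = quotemeta($literal);   # 'a\.b\*c'

    # Interpolated as-is, "." and "*" keep their special meanings
    my $raw  = "xaXbbbc" =~ /$literal/     ? 1 : 0;   # 1
    # With \Q...\E (or quotemeta) they only match themselves
    my $miss = "xaXbbbc" =~ /\Q$literal\E/ ? 1 : 0;   # 0
    my $hit  = "xa.b*cx" =~ /\Q$literal\E/ ? 1 : 0;   # 1
    my $same = "xa.b*cx" =~ /$quoted/      ? 1 : 0;   # 1

```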
	1378
	1379	=head2 Extended Patterns
	1380
	1381	Perl also defines a consistent extension syntax for features not
	1382	found in standard tools like B<awk> and
	1383	B<lex>. The syntax for most of these is a
	1384	pair of parentheses with a question mark as the first thing within
	1385	the parentheses. The character after the question mark indicates
	1386	the extension.
	1387
	1388	A question mark was chosen for this and for the minimal-matching
	1389	construct because 1) question marks are rare in older regular
	1390	expressions, and 2) whenever you see one, you should stop and
	1391	"question" exactly what is going on. That's psychology....
	1392
	1393	=over 4
	1394
	1395	=item C<(?#I<text>)>
	1396	X<(?#)>
	1397
	1398	A comment. The I<text> is ignored.
	1399	Note that Perl closes
	1400	the comment as soon as it sees a C<")">, so there is no way to put a literal
	1401	C<")"> in the comment. The pattern's closing delimiter must be escaped by
	1402	a backslash if it appears in the comment.
	1403
	1404	See L</E<sol>x> for another way to have comments in patterns.
	1405
	1406	Note that a comment can go just about anywhere, except in the middle of
	1407	an escape sequence. Examples:
	1408
    qr/foo(?#comment)bar/   # Matches 'foobar'

    # The pattern below matches 'abcd', 'abccd', or 'abcccd'
    qr/abc(?#comment between literal and its quantifier){1,3}d/

    # The pattern below generates a syntax error, because the '\p' must
    # be followed immediately by a '{'.
    qr/\p(?#comment between \p and its property name){Any}/

    # The pattern below generates a syntax error, because the initial
    # '\(' is a literal opening parenthesis, and so there is nothing
    # for the closing ')' to match
    qr/\(?#the backslash means this isn't a comment)p{Any}/

    # Comments can be used to fold long patterns into multiple lines
    qr/First part of a long regex(?#
      )remaining part/
	1426
	1427	=item C<(?adlupimnsx-imnsx)>
	1428
	1429	=item C<(?^alupimnsx)>
	1430	X<(?)> X<(?^)>
	1431
	1432	Zero or more embedded pattern-match modifiers, to be turned on (or
	1433	turned off if preceded by C<"-">) for the remainder of the pattern or
	1434	the remainder of the enclosing pattern group (if any).
	1435
	1436	This is particularly useful for dynamically-generated patterns,
	1437	such as those read in from a
	1438	configuration file, taken from an argument, or specified in a table
	1439	somewhere. Consider the case where some patterns want to be
	1440	case-sensitive and some do not: The case-insensitive ones merely need to
	1441	include C<(?i)> at the front of the pattern. For example:
	1442
    $pattern = "foobar";
    if ( /$pattern/i ) { }

    # more flexible:

    $pattern = "(?i)foobar";
    if ( /$pattern/ ) { }
	1450
	1451	These modifiers are restored at the end of the enclosing group. For example,
	1452
    ( (?i) blah ) \s+ \g1
	1454
	1455	will match C<blah> in any case, some spaces, and an exact (I<including the case>!)
	1456	repetition of the previous word, assuming the C</x> modifier, and no C</i>
	1457	modifier outside this group.
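
You can check the scoping rule directly. This sketch reuses the pattern above, and the two sample strings are made up:

```perl

    use strict;
    use warnings;

    # (?i) applies only inside group 1, so \g1 must repeat the exact case
    my $re = qr/ ( (?i) blah ) \s+ \g1 /x;

    my $same  = "BLAH BLAH" =~ $re ? 1 : 0;   # 1
    my $mixed = "BLAH blah" =~ $re ? 1 : 0;   # 0: the repetition differs in case

```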
	1458
	1459	These modifiers do not carry over into named subpatterns called in the
	1460	enclosing group. In other words, a pattern such as C<((?i)(?&I<NAME>))> does not
	1461	change the case-sensitivity of the I<NAME> pattern.
	1462
	1463	A modifier is overridden by later occurrences of this construct in the
	1464	same scope containing the same modifier, so that
	1465
    /((?im)foo(?-m)bar)/
	1467
	1468	matches all of C<foobar> case insensitively, but uses C</m> rules for
	1469	only the C<foo> portion. The C<"a"> flag overrides C<aa> as well;
	1470	likewise C<aa> overrides C<"a">. The same goes for C<"x"> and C<xx>.
	1471	Hence, in
	1472
    /(?-x)foo/xx
	1474
	1475	both C</x> and C</xx> are turned off during matching C<foo>. And in
	1476
    /(?x)foo/x
	1478
	1479	C</x> but NOT C</xx> is turned on for matching C<foo>. (One might
	1480	mistakenly think that since the inner C<(?x)> is already in the scope of
	1481	C</x>, that the result would effectively be the sum of them, yielding
C</xx>. It doesn't work that way.) Similarly, doing something like
C<(?xx-x)foo> turns off all C<"x"> behavior for matching C<foo>; you do
not subtract one C<"x"> from two to get one C<"x"> remaining.
	1485
	1486	Any of these modifiers can be set to apply globally to all regular
	1487	expressions compiled within the scope of a C<use re>. See
	1488	L<re/"'/flags' mode">.
	1489
	1490	Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
	1491	after the C<"?"> is a shorthand equivalent to C<d-imnsx>. Flags (except
	1492	C<"d">) may follow the caret to override it.
	1493	But a minus sign is not legal with it.
	1494
	1495	Note that the C<"a">, C<"d">, C<"l">, C<"p">, and C<"u"> modifiers are special in
	1496	that they can only be enabled, not disabled, and the C<"a">, C<"d">, C<"l">, and
	1497	C<"u"> modifiers are mutually exclusive: specifying one de-specifies the
	1498	others, and a maximum of one (or two C<"a">'s) may appear in the
	1499	construct. Thus, for
	1500	example, C<(?-p)> will warn when compiled under C<use warnings>;
	1501	C<(?-d:...)> and C<(?dl:...)> are fatal errors.
	1502
	1503	Note also that the C<"p"> modifier is special in that its presence
	1504	anywhere in a pattern has a global effect.
	1505
	1506	Having zero modifiers makes this a no-op (so why did you specify it,
	1507	unless it's generated code), and starting in v5.30, warns under L<C<use
re 'strict'>|re/'strict' mode>.
	1509
	1510	=item C<(?:I<pattern>)>
	1511	X<(?:)>
	1512
	1513	=item C<(?adluimnsx-imnsx:I<pattern>)>
	1514
	1515	=item C<(?^aluimnsx:I<pattern>)>
	1516	X<(?^:)>
	1517
	1518	This is for clustering, not capturing; it groups subexpressions like
	1519	C<"()">, but doesn't make backreferences as C<"()"> does. So
	1520
    @fields = split(/\b(?:a|b|c)\b/)

matches the same field delimiters as

    @fields = split(/\b(a|b|c)\b/)
	1526
	1527	but doesn't spit out the delimiters themselves as extra fields (even though
	1528	that's the behaviour of L<perlfunc/split> when its pattern contains capturing
	1529	groups). It's also cheaper not to capture
	1530	characters if you don't need to.
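
The difference is easy to see with a simpler, hypothetical set of delimiters (C<-> or C<+>, instead of the word-bounded C<a|b|c> above):

```perl

    use strict;
    use warnings;

    my $str = "1-2+3";
    my @plain   = split /(?:-|\+)/, $str;   # ("1", "2", "3")
    my @capture = split /(-|\+)/,   $str;   # ("1", "-", "2", "+", "3")
    print scalar(@plain), " vs ", scalar(@capture), " fields\n";   # "3 vs 5 fields"

```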
	1531
Any letters between C<"?"> and C<":"> act as flag modifiers as with
C<(?adluimnsx-imnsx)>. For example,
	1534
    /(?s-i:more.than).million/i
	1536
	1537	is equivalent to the more verbose
	1538
    /(?:(?s-i)more.than).million/i
	1540
	1541	Note that any C<()> constructs enclosed within this one will still
	1542	capture unless the C</n> modifier is in effect.
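
A quick way to confirm that the two spellings of C<(?s-i:more.than).million> behave the same is to run both against a few made-up strings, each of which exercises one of the flags:

```perl

    use strict;
    use warnings;

    my $short   = qr/(?s-i:more.than).million/i;
    my $verbose = qr/(?:(?s-i)more.than).million/i;

    my @inputs = (
        "more\nthan MILLION",   # "." inside matches "\n"; /i applies outside
        "MORE than million",    # fails: no /i inside the cluster
        "more than\nmillion",   # fails: no /s outside the cluster
    );
    my @short   = map { $_ =~ $short   ? 1 : 0 } @inputs;   # (1, 0, 0)
    my @verbose = map { $_ =~ $verbose ? 1 : 0 } @inputs;   # (1, 0, 0)

```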
	1543
	1544	Like the L</(?adlupimnsx-imnsx)> construct, C<aa> and C<"a"> override each
	1545	other, as do C<xx> and C<"x">. They are not additive. So, doing
	1546	something like C<(?xx-x:foo)> turns off all C<"x"> behavior for matching
	1547	C<foo>.
	1548
	1549	Starting in Perl 5.14, a C<"^"> (caret or circumflex accent) immediately
	1550	after the C<"?"> is a shorthand equivalent to C<d-imnsx>. Any positive
	1551	flags (except C<"d">) may follow the caret, so
	1552
    (?^x:foo)
	1554
	1555	is equivalent to
	1556
    (?x-imns:foo)
	1558
	1559	The caret tells Perl that this cluster doesn't inherit the flags of any
	1560	surrounding pattern, but uses the system defaults (C<d-imnsx>),
	1561	modified by any flags specified.
	1562
	1563	The caret allows for simpler stringification of compiled regular
	1564	expressions. These look like
	1565
    (?^:pattern)
	1567
	1568	with any non-default flags appearing between the caret and the colon.
	1569	A test that looks at such stringification thus doesn't need to have the
	1570	system default flags hard-coded in it, just the caret. If new flags are
	1571	added to Perl, the meaning of the caret's expansion will change to include
	1572	the default for those flags, so the test will still work, unchanged.
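
A small sketch of the stringified form. It assumes Perl 5.14 or later and plain ASCII patterns. Other flags, such as C<u>, can show up in other situations:

```perl

    use strict;
    use warnings;

    my $plain  = qr/foo/;
    my $nocase = qr/foo/i;
    print "$plain $nocase\n";    # prints "(?^:foo) (?^i:foo)"

    # The embedded flags travel with the pattern when it is interpolated,
    # so "foo" stays case-insensitive inside a case-sensitive pattern
    my $hit  = "xFOO" =~ /x$nocase/ ? 1 : 0;   # 1
    my $miss = "XFOO" =~ /x$nocase/ ? 1 : 0;   # 0: the "x" is outside

```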
	1573
	1574	Specifying a negative flag after the caret is an error, as the flag is
	1575	redundant.
	1576
	1577	Mnemonic for C<(?^...)>: A fresh beginning since the usual use of a caret is
	1578	to match at the beginning.
	1579
=item C<(?|I<pattern>)>
X<(?|)> X<Branch reset>
	1582
	1583	This is the "branch reset" pattern, which has the special property
	1584	that the capture groups are numbered from the same starting point
	1585	in each alternation branch. It is available starting from perl 5.10.0.
	1586
	1587	Capture groups are numbered from left to right, but inside this
	1588	construct the numbering is restarted for each branch.
	1589
	1590	The numbering within each branch will be as normal, and any groups
	1591	following this construct will be numbered as though the construct
	1592	contained only one branch, that being the one with the most capture
	1593	groups in it.
	1594
	1595	This construct is useful when you want to capture one of a
	1596	number of alternative matches.
	1597
	1598	Consider the following pattern. The numbers underneath show in
	1599	which group the captured content will be stored.
	1600
    # before  ---------------branch-reset----------- after
    / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
    # 1            2         2  3        2     3     4
	1605
	1606	Be careful when using the branch reset pattern in combination with
	1607	named captures. Named captures are implemented as being aliases to
	1608	numbered groups holding the captures, and that interferes with the
	1609	implementation of the branch reset pattern. If you are using named
	1610	captures in a branch reset pattern, it's best to use the same names,
	1611	in the same order, in each of the alternations:
	1612
    /(?| (?<a> x ) (?<b> y )
       | (?<a> z ) (?<b> w )) /x
	1615
	1616	Not doing so may lead to surprises:
	1617
    "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
    say $+{a};    # Prints '12'
    say $+{b};    # Also prints '12'.
	1621
	1622	The problem here is that both the group named C<< a >> and the group
	1623	named C<< b >> are aliases for the group belonging to C<< $1 >>.
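
The group numbering in the branch reset example above can be checked by matching that same pattern against three made-up strings, one per branch. In list context a successful match returns (C<$1>, C<$2>, C<$3>, C<$4>):

```perl

    use strict;
    use warnings;

    my $re = qr/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x;

    my @pqr = "apqrz" =~ $re;   # ("a", "pqr", "q", "z")
    my @tuv = "atuvz" =~ $re;   # ("a", "t", "v", "z")
    my @xyz = "axyzz" =~ $re;   # ("a", "y", undef, "z"): this branch has no group 3

```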
	1624
	1625	=item Lookaround Assertions
	1626	X<look-around assertion> X<lookaround assertion> X<look-around> X<lookaround>
	1627
	1628	Lookaround assertions are zero-width patterns which match a specific
	1629	pattern without including it in C<$&>. Positive assertions match when
	1630	their subpattern matches, negative assertions match when their subpattern
fails. Lookbehind matches text up to the current match position;
lookahead matches text following the current match position.
	1633
	1634	=over 4
	1635
	1636	=item C<(?=I<pattern>)>
	1637
	1638	=item C<(*pla:I<pattern>)>
	1639
	1640	=item C<(*positive_lookahead:I<pattern>)>
	1641	X<(?=)>
	1642	X<(*pla>
	1643	X<(*positive_lookahead>
	1644	X<look-ahead, positive> X<lookahead, positive>
	1645
	1646	A zero-width positive lookahead assertion. For example, C</\w+(?=\t)/>
	1647	matches a word followed by a tab, without including the tab in C<$&>.
	1648
	1649	=item C<(?!I<pattern>)>
	1650
	1651	=item C<(*nla:I<pattern>)>
	1652
	1653	=item C<(*negative_lookahead:I<pattern>)>
	1654	X<(?!)>
	1655	X<(*nla>
	1656	X<(*negative_lookahead>
	1657	X<look-ahead, negative> X<lookahead, negative>
	1658
	1659	A zero-width negative lookahead assertion. For example C</foo(?!bar)/>
	1660	matches any occurrence of "foo" that isn't followed by "bar". Note
	1661	however that lookahead and lookbehind are NOT the same thing. You cannot
	1662	use this for lookbehind.
	1663
	1664	If you are looking for a "bar" that isn't preceded by a "foo", C</(?!foo)bar/>
	1665	will not do what you want. That's because the C<(?!foo)> is just saying that
	1666	the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will
	1667	match. Use lookbehind instead (see below).
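
A short sketch of both points: lookahead leaving the tab out of C<$&>, and C<(?!foo)bar> versus C<< (?<!foo)bar >>. The sample strings are arbitrary:

```perl

    use strict;
    use warnings;

    # Positive lookahead: the tab must follow, but is not part of $&
    "key\tvalue" =~ /\w+(?=\t)/ or die "no match";
    my $word = $&;                                    # "key"

    # (?!foo)bar does not stop a "foo" that comes before "bar" ...
    my $ahead  = "foobar" =~ /(?!foo)bar/  ? 1 : 0;   # 1
    # ... but a negative lookbehind does
    my $behind = "foobar" =~ /(?<!foo)bar/ ? 1 : 0;   # 0

```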
	1668
	1669	=item C<(?<=I<pattern>)>
	1670
	1671	=item C<\K>
	1672
	1673	=item C<(*plb:I<pattern>)>
	1674
	1675	=item C<(*positive_lookbehind:I<pattern>)>
	1676	X<(?<=)>
	1677	X<(*plb>
	1678	X<(*positive_lookbehind>
	1679	X<look-behind, positive> X<lookbehind, positive> X<\K>
	1680
	1681	A zero-width positive lookbehind assertion. For example, C</(?<=\t)\w+/>
	1682	matches a word that follows a tab, without including the tab in C<$&>.
	1683
	1684	Prior to Perl 5.30, it worked only for fixed-width lookbehind, but
	1685	starting in that release, it can handle variable lengths from 1 to 255
	1686	characters as an experimental feature. The feature is enabled
	1687	automatically if you use a variable length positive lookbehind assertion.
	1688
	1689	In Perl 5.35.10 the scope of the experimental nature of this construct
	1690	has been reduced, and experimental warnings will only be produced when
the construct contains capturing parentheses. The warnings will be
raised at pattern compilation time, unless turned off, in the
C<experimental::vlb> category. This is to warn you that the exact
contents of capturing buffers in a variable length positive lookbehind
are not well defined and are subject to change in a future release of perl.
	1696
	1697	Currently if you use capture buffers inside of a positive variable length
	1698	lookbehind the result will be the longest and thus leftmost match possible.
	1699	This means that
	1700
    "aax" =~ /(?=x)(?<=(a|aa))/
    "aax" =~ /(?=x)(?<=(aa|a))/
    "aax" =~ /(?=x)(?<=(a{1,2}?))/
    "aax" =~ /(?=x)(?<=(a{1,2}))/
	1705
	1706	will all result in C<$1> containing C<"aa">. It is possible in a future
	1707	release of perl we will change this behavior.
	1708
	1709	There is a special form of this construct, called C<\K>
	1710	(available since Perl 5.10.0), which causes the
	1711	regex engine to "keep" everything it had matched prior to the C<\K> and
	1712	not include it in C<$&>. This effectively provides non-experimental
	1713	variable-length lookbehind of any length.
	1714
There is also a technique that can be used to handle variable length
	1716	lookbehinds on earlier releases, and longer than 255 characters. It is
	1717	described in
	1718	L<http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html>.
	1719
	1720	Note that under C</i>, a few single characters match two or three other
	1721	characters. This makes them variable length, and the 255 length applies
	1722	to the maximum number of characters in the match. For
	1723	example C<qr/\N{LATIN SMALL LETTER SHARP S}/i> matches the sequence
	1724	C<"ss">. Your lookbehind assertion could contain 127 Sharp S
	1725	characters under C</i>, but adding a 128th would generate a compilation
	1726	error, as that could match 256 C<"s"> characters in a row.
	1727
	1728	The use of C<\K> inside of another lookaround assertion
	1729	is allowed, but the behaviour is currently not well defined.
	1730
	1731	For various reasons C<\K> may be significantly more efficient than the
	1732	equivalent C<< (?<=...) >> construct, and it is especially useful in
	1733	situations where you want to efficiently remove something following
	1734	something else in a string. For instance
	1735
    s/(foo)bar/$1/g;
	1737
	1738	can be rewritten as the much more efficient
	1739
    s/foo\Kbar//g;
	1741
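The capturing substitution and the C<\K> substitution give the same result. Here is a quick check on an arbitrary string:

```perl

    use strict;
    use warnings;

    my $captured = "foobar foobaz foobar";
    my $kept     = $captured;

    $captured =~ s/(foo)bar/$1/g;   # removes "foo" too, then puts it back from $1
    $kept     =~ s/foo\Kbar//g;     # "foo" is kept; only "bar" is replaced

    print "$kept\n";                # prints "foo foobaz foo"

```
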
	1742	Use of the non-greedy modifier C<"?"> may not give you the expected
	1743	results if it is within a capturing group within the construct.
	1744
	1745	=item C<(?<!I<pattern>)>
	1746
	1747	=item C<(*nlb:I<pattern>)>
	1748
	1749	=item C<(*negative_lookbehind:I<pattern>)>
	1750	X<(?<!)>
	1751	X<(*nlb>
	1752	X<(*negative_lookbehind>
	1753	X<look-behind, negative> X<lookbehind, negative>
	1754
	1755	A zero-width negative lookbehind assertion. For example C</(?<!bar)foo/>
	1756	matches any occurrence of "foo" that does not follow "bar".
	1757
	1758	Prior to Perl 5.30, it worked only for fixed-width lookbehind, but
	1759	starting in that release, it can handle variable lengths from 1 to 255
	1760	characters as an experimental feature. The feature is enabled
	1761	automatically if you use a variable length negative lookbehind assertion.
	1762
	1763	In Perl 5.35.10 the scope of the experimental nature of this construct
	1764	has been reduced, and experimental warnings will only be produced when
	1765	the construct contains capturing parentheses. The warnings will be
	1766	raised at pattern compilation time, unless turned off, in the
	1767	C<experimental::vlb> category. This is to warn you that the exact
	1768	contents of capturing buffers in a variable length negative lookbehind
are not well defined and are subject to change in a future release of perl.
	1770
	1771	Currently if you use capture buffers inside of a negative variable length
	1772	lookbehind the result may not be what you expect, for instance:
	1773
    say "axfoo"=~/(?=foo)(?<!(a|ax)(?{ say $1 }))/ ? "y" : "n";
	1775
	1776	will output the following:
	1777
    a
    n
	1780
which does not make sense, as this should print out "ax": an "a" on its
own does not line up at the correct place. Another example would be:
	1783
    say "yes: '$1-$2'" if "aayfoo"=~/(?=foo)(?<!(a|aa)(a|aa)x)/;
	1785
	1786	will output the following:
	1787
    yes: 'aa-a'
	1789
It is possible that in a future release of perl we will change this
behavior so both of these examples produce more reasonable output.

Note that we are confident that the construct will match and reject
patterns appropriately; the undefined behavior strictly relates to the
value of the capture buffer during or after matching.
	1796
	1797	There is a technique that can be used to handle variable length
	1798	lookbehind on earlier releases, and longer than 255 characters. It is
	1799	described in
	1800	L<http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html>.
	1801
	1802	Note that under C</i>, a few single characters match two or three other
	1803	characters. This makes them variable length, and the 255 length applies
	1804	to the maximum number of characters in the match. For
	1805	example C<qr/\N{LATIN SMALL LETTER SHARP S}/i> matches the sequence
	1806	C<"ss">. Your lookbehind assertion could contain 127 Sharp S
	1807	characters under C</i>, but adding a 128th would generate a compilation
	1808	error, as that could match 256 C<"s"> characters in a row.
	1809
	1810	Use of the non-greedy modifier C<"?"> may not give you the expected
	1811	results if it is within a capturing group within the construct.
	1812
	1813	=back
	1814
	1815	=item C<< (?<I<NAME>>I<pattern>) >>
	1816
	1817	=item C<(?'I<NAME>'I<pattern>)>
	1818	X<< (?<NAME>) >> X<(?'NAME')> X<named capture> X<capture>
	1819
	1820	A named capture group. Identical in every respect to normal capturing
	1821	parentheses C<()> but for the additional fact that the group
	1822	can be referred to by name in various regular expression
	1823	constructs (like C<\g{I<NAME>}>) and can be accessed by name
	1824	after a successful match via C<%+> or C<%->. See L<perlvar>
	1825	for more details on the C<%+> and C<%-> hashes.
	1826
	1827	If multiple distinct capture groups have the same name, then
	1828	C<$+{I<NAME>}> will refer to the leftmost defined group in the match.
	1829
	1830	The forms C<(?'I<NAME>'I<pattern>)> and C<< (?<I<NAME>>I<pattern>) >>
	1831	are equivalent.
	1832
	1833	B<NOTE:> While the notation of this construct is the same as the similar
	1834	function in .NET regexes, the behavior is not. In Perl the groups are
	1835	numbered sequentially regardless of being named or not. Thus in the
	1836	pattern
	1837
    /(x)(?<foo>y)(z)/
	1839
	1840	C<$+{foo}> will be the same as C<$2>, and C<$3> will contain 'z' instead of
	1841	the opposite which is what a .NET regex hacker might expect.
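
A sketch of the numbering rule, plus the duplicate-name rule described earlier. The sample strings are made up:

```perl

    use strict;
    use warnings;

    # Named and unnamed groups share one left-to-right numbering
    "xyz" =~ /(x)(?<foo>y)(z)/ or die "no match";
    my ($named, $two, $three) = ($+{foo}, $2, $3);   # ("y", "y", "z")

    # With a repeated name, %+ reports the leftmost group that matched
    "b" =~ /(?<n>a)|(?<n>b)/ or die "no match";
    my $n = $+{n};                                   # "b"

```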
	1842
	1843	Currently I<NAME> is restricted to simple identifiers only.
	1844	In other words, it must match C</^[_A-Za-z][_A-Za-z0-9]*\z/> or
	1845	its Unicode extension (see L<utf8>),
	1846	though it isn't extended by the locale (see L<perllocale>).
	1847
	1848	B<NOTE:> In order to make things easier for programmers with experience
	1849	with the Python or PCRE regex engines, the pattern C<<
	1850	(?PE<lt>I<NAME>E<gt>I<pattern>) >>
	1851	may be used instead of C<< (?<I<NAME>>I<pattern>) >>; however this form does not
	1852	support the use of single quotes as a delimiter for the name.
	1853
	1854	=item C<< \k<I<NAME>> >>
	1855
	1856	=item C<< \k'I<NAME>' >>
	1857
	1858	=item C<< \k{I<NAME>} >>
	1859
	1860	Named backreference. Similar to numeric backreferences, except that
	1861	the group is designated by name and not number. If multiple groups
	1862	have the same name then it refers to the leftmost defined group in
	1863	the current match.
	1864
	1865	It is an error to refer to a name not defined by a C<< (?<I<NAME>>) >>
	1866	earlier in the pattern.
	1867
	1868	All three forms are equivalent, although with C<< \k{ I<NAME> } >>,
	1869	you may optionally have blanks within but adjacent to the braces, as
	1870	shown.
	1871
	1872	B<NOTE:> In order to make things easier for programmers with experience
	1873	with the Python or PCRE regex engines, the pattern C<< (?P=I<NAME>) >>
	1874	may be used instead of C<< \k<I<NAME>> >>.
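
A minimal doubled-word finder that uses each spelling of the named backreference, including the Python/PCRE form. The group name C<word> and the sample sentence are arbitrary:

```perl

    use strict;
    use warnings;

    my @found;
    for my $re (qr/\b(?<word>\w+)\s+\k<word>\b/,
                qr/\b(?<word>\w+)\s+\k'word'\b/,
                qr/\b(?<word>\w+)\s+\k{word}\b/,
                qr/\b(?<word>\w+)\s+(?P=word)\b/) {
        # Each pattern finds the first doubled word
        push @found, $+{word} if "this is is a test" =~ $re;
    }
    print "@found\n";    # prints "is is is is"

```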
	1875
	1876	=item C<(?{ I<code> })>
	1877	X<(?{})> X<regex, code in> X<regexp, code in> X<regular expression, code in>
	1878
	1879	B<WARNING>: Using this feature safely requires that you understand its
	1880	limitations. Code executed that has side effects may not perform identically
	1881	from version to version due to the effect of future optimisations in the regex
	1882	engine. For more information on this, see L</Embedded Code Execution
	1883	Frequency>.
	1884
	1885	This zero-width assertion executes any embedded Perl code. It always
	1886	succeeds, and its return value is set as C<$^R>.
	1887
	1888	In literal patterns, the code is parsed at the same time as the
	1889	surrounding code. While within the pattern, control is passed temporarily
	1890	back to the perl parser, until the logically-balancing closing brace is
	1891	encountered. This is similar to the way that an array index expression in
	1892	a literal string is handled, for example
	1893
    "abc$array[ 1 + f('[') + g()]def"
	1895
	1896	In particular, braces do not need to be balanced:
	1897
    s/abc(?{ f('{'); })/def/
	1899
	1900	Even in a pattern that is interpolated and compiled at run-time, literal
	1901	code blocks will be compiled once, at perl compile time; the following
	1902	prints "ABCD":
	1903
    print "D";
    my $qr = qr/(?{ BEGIN { print "A" } })/;
    my $foo = "foo";
    /$foo$qr(?{ BEGIN { print "B" } })/;
    BEGIN { print "C" }
	1909
	1910	In patterns where the text of the code is derived from run-time
	1911	information rather than appearing literally in a source code /pattern/,
	1912	the code is compiled at the same time that the pattern is compiled, and
	1913	for reasons of security, C<use re 'eval'> must be in scope. This is to
	1914	stop user-supplied patterns containing code snippets from being
	1915	executable.
	1916
	1917	In situations where you need to enable this with C<use re 'eval'>, you should
	1918	also have taint checking enabled, if your perl supports it.
	1919	Better yet, use the carefully constrained evaluation within a Safe compartment.
	1920	See L<perlsec> for details about both these mechanisms.
	1921
	1922	From the viewpoint of parsing, lexical variable scope and closures,
	1923
    /AAA(?{ BBB })CCC/
	1925
	1926	behaves approximately like
	1927
    /AAA/ && do { BBB } && /CCC/
	1929
	1930	Similarly,
	1931
    qr/AAA(?{ BBB })CCC/
	1933
	1934	behaves approximately like
	1935
    sub { /AAA/ && do { BBB } && /CCC/ }
	1937
	1938	In particular:
	1939
    { my $i = 1; $r = qr/(?{ print $i })/ }
    my $i = 2;
    /$r/; # prints "1"
	1943
	1944	Inside a C<(?{...})> block, C<$_> refers to the string the regular
expression is matching against. You can also use C<pos()> to find the
current position of the match within this string.
	1947
	1948	The code block introduces a new scope from the perspective of lexical
	1949	variable declarations, but B<not> from the perspective of C<local> and
	1950	similar localizing behaviours. So later code blocks within the same
	1951	pattern will still see the values which were localized in earlier blocks.
	1952	These accumulated localizations are undone either at the end of a
	1953	successful match, or if the assertion is backtracked (compare
	1954	L</"Backtracking">). For example,
	1955
    $_ = 'a' x 8;
    m<
       (?{ $cnt = 0 })               # Initialize $cnt.
       (
         a
         (?{
             local $cnt = $cnt + 1;  # Update $cnt,
                                     # backtracking-safe.
         })
       )*
       aaaa
       (?{ $res = $cnt })            # On success copy to
                                     # non-localized location.
     >x;
	1970
	1971	will initially increment C<$cnt> up to 8; then during backtracking, its
	1972	value will be unwound back to 4, which is the value assigned to C<$res>.
	1973	At the end of the regex execution, C<$cnt> will be wound back to its initial
	1974	value of 0.
	1975
	1976	This assertion may be used as the condition in a
	1977
    (?(condition)yes-pattern|no-pattern)
	1979
	1980	switch. If I<not> used in this way, the result of evaluation of I<code>
	1981	is put into the special variable C<$^R>. This happens immediately, so
	1982	C<$^R> can be used from other C<(?{ I<code> })> assertions inside the same
	1983	regular expression.
	1984
	1985	The assignment to C<$^R> above is properly localized, so the old
	1986	value of C<$^R> is restored if the assertion is backtracked; compare
	1987	L</"Backtracking">.
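
A small sketch of both uses of a code block described above. In one, the block's value ends up in C<$^R>. In the other, its value is the condition of a C<(?(condition)yes-pattern|no-pattern)> switch. The numbers and strings are arbitrary:

```perl

    use strict;
    use warnings;

    # Not used as a condition: the block's value ends up in $^R
    "ab" =~ /a(?{ 42 })b/ or die "no match";
    my $r = $^R;    # 42

    # Used as a condition: require "x" after numbers above 9, "y" otherwise
    my @ok = map { $_ =~ /^(\d+)(?(?{ $1 > 9 })x|y)$/ ? 1 : 0 }
             qw(15x 5y 15y 5x);    # (1, 1, 0, 0)

```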
	1988
	1989	Note that the special variable C<$^N> is particularly useful with code
	1990	blocks to capture the results of submatches in variables without having to
	1991	keep track of the number of nested parentheses. For example:
	1992
    $_ = "The brown fox jumps over the lazy dog";
    /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
    print "color = $color, animal = $animal\n";
	1996
	1997	The use of this construct disables some optimisations globally in the
	1998	pattern, and the pattern may execute much slower as a consequence.
Use C<*> in place of C<?>, as in C<(*{ ... })>, to create an optimistic
form of this construct that should not disable any optimisations.
	2001
	2002	=item C<(*{ I<code> })>
	2003	X<(*{})> X<regex, optimistic code>
	2004
	2005	This is exactly the same as C<(?{ I<code> })> with the exception
	2006	that it does not disable B<any> optimisations at all in the regex engine.
	2007	How often it is executed may vary from perl release to perl release.
	2008	In a failing match it may not even be executed at all.
	2009
	2010	=item C<(??{ I<code> })>
	2011	X<(??{})>
	2012	X<regex, postponed> X<regexp, postponed> X<regular expression, postponed>
	2013
	2014	B<WARNING>: Using this feature safely requires that you understand its
	2015	limitations. Code executed that has side effects may not perform
	2016	identically from version to version due to the effect of future
	2017	optimisations in the regex engine. For more information on this, see
	2018	L</Embedded Code Execution Frequency>.
	2019
	2020	This is a "postponed" regular subexpression. It behaves in I<exactly> the
	2021	same way as a C<(?{ I<code> })> code block as described above, except that
	2022	its return value, rather than being assigned to C<$^R>, is treated as a
pattern, compiled if it's a string (or used as-is if it's a C<qr//> object),
	2024	then matched as if it were inserted instead of this construct.
	2025
	2026	During the matching of this sub-pattern, it has its own set of
	2027	captures which are valid during the sub-match, but are discarded once
	2028	control returns to the main pattern. For example, the following matches,
	2029	with the inner pattern capturing "B" and matching "BB", while the outer
pattern captures "A":
	2031
    my $inner = '(.)\1';
    "ABBA" =~ /^(.)(??{ $inner })\1/;
    print $1; # prints "A"
	2035
	2036	Note that this means that there is no way for the inner pattern to refer
	2037	to a capture group defined outside. (The code block itself can use C<$1>,
	2038	I<etc>., to refer to the enclosing pattern's capture groups.) Thus, although
	2039
    ('a' x 100)=~/(??{'(.)' x 100})/
	2041
	2042	I<will> match, it will I<not> set C<$1> on exit.
	2043
	2044	The following pattern matches a parenthesized group:
	2045
    $re = qr{
               \(
               (?:
                  (?> [^()]+ )  # Non-parens without backtracking
                |
                  (??{ $re })   # Group with matching parens
               )*
               \)
            }x;
	2055
	2056	See also
L<C<(?I<PARNO>)>|/(?I<PARNO>) (?-I<PARNO>) (?+I<PARNO>) (?R) (?0)>
	2058	for a different, more efficient way to accomplish
	2059	the same task.
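
The C<(??{ ... })> example above can be exercised as follows. One assumption is worth spelling out: C<$re> must be declared before the C<qr{}> that mentions it, so that the code block closes over the same lexical variable. The sample strings are made up:

```perl

    use strict;
    use warnings;

    my $re;    # must exist before the qr{} below refers to it
    $re = qr{
               \(
               (?:
                  (?> [^()]+ )  # Non-parens without backtracking
                |
                  (??{ $re })   # Group with matching parens
               )*
               \)
            }x;

    my ($group) = "x (a(b)c) y" =~ /($re)/;     # "(a(b)c)"
    my $broken  = "x (a(b c y"  =~ $re ? 1 : 0; # 0: the parens never balance

```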
	2060
	2061	Executing a postponed regular expression too many times without
	2062	consuming any input string will also result in a fatal error. The depth
	2063	at which that happens is compiled into perl, so it can be changed with a
	2064	custom build.
	2065
	2066	The use of this construct disables some optimisations globally in the pattern,
	2067	and the pattern may execute much slower as a consequence.
	2068
	2069	=item C<(?I<PARNO>)> C<(?-I<PARNO>)> C<(?+I<PARNO>)> C<(?R)> C<(?0)>
	2070	X<(?PARNO)> X<(?1)> X<(?R)> X<(?0)> X<(?-1)> X<(?+1)> X<(?-PARNO)> X<(?+PARNO)>
	2071	X<regex, recursive> X<regexp, recursive> X<regular expression, recursive>
	2072	X<regex, relative recursion> X<GOSUB> X<GOSTART>
	2073
	2074	Recursive subpattern. Treat the contents of a given capture buffer in the
	2075	current pattern as an independent subpattern and attempt to match it at
	2076	the current position in the string. Information about capture state from
	2077	the caller for things like backreferences is available to the subpattern,
	2078	but capture buffers set by the subpattern are not visible to the caller.
	2079
	2080	Similar to C<(??{ I<code> })> except that it does not involve executing any
	2081	code or potentially compiling a returned pattern string; instead it treats
	2082	the part of the current pattern contained within a specified capture group
	2083	as an independent pattern that must match at the current position. Also
different is the treatment of capture buffers: unlike C<(??{ I<code> })>,
recursive patterns have access to their caller's match state, so one can
use backreferences safely.
	2087
	2088	I<PARNO> is a sequence of digits (not starting with 0) whose value reflects
	2089	the paren-number of the capture group to recurse to. C<(?R)> recurses to
	2090	the beginning of the whole pattern. C<(?0)> is an alternate syntax for
	2091	C<(?R)>. If I<PARNO> is preceded by a plus or minus sign then it is assumed
	2092	to be relative, with negative numbers indicating preceding capture groups
	2093	and positive ones following. Thus C<(?-1)> refers to the most recently
	2094	declared group, and C<(?+1)> indicates the next group to be declared.
	2095	Note that the counting for relative recursion differs from that of
	2096	relative backreferences, in that with recursion unclosed groups B<are>
	2097	included.
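
The following sketch illustrates the counting rule just described; the test strings are invented for illustration. The balanced-parentheses pattern is written twice, once with the absolute C<(?1)> and once with the relative C<(?-1)>. Group 1 is still open at the point where C<(?-1)> appears. Because unclosed groups are counted, both spellings recurse into the same group and should agree on every input:

```perl
use strict;
use warnings;

# Absolute recursion: (?1) re-enters capture group 1.
my $abs = qr/^(\((?:[^()]++|(?1))*\))\z/;

# Relative recursion: group 1 is still open where (?-1) appears,
# but unclosed groups are counted, so (?-1) also refers to group 1.
my $rel = qr/^(\((?:[^()]++|(?-1))*\))\z/;

my %result;
for my $str ('(a(b)c)', '((()))', '(a(b)c', '()()') {
    my $m_abs = $str =~ $abs ? 1 : 0;
    my $m_rel = $str =~ $rel ? 1 : 0;
    $result{$str} = [ $m_abs, $m_rel ];
    printf "%-8s abs=%d rel=%d\n", $str, $m_abs, $m_rel;
}
```

The string C<()()> fails because the anchors require one balanced group to span the entire input.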
	2098
	2099	The following pattern matches a function C<foo()> which may contain
	2100	balanced parentheses as the argument.
	2101
    $re = qr{ (                    # paren group 1 (full function)
              foo
              (                    # paren group 2 (parens)
                \(
                  (                # paren group 3 (contents of parens)
                  (?:
                   (?> [^()]+ )    # Non-parens without backtracking
                  |
                   (?2)            # Recurse to start of paren group 2
                  )*
                  )
                \)
              )
            )
            }x;
	2117
	2118	If the pattern was used as follows
	2119
	2120	'foo(bar(baz)+baz(bop))'=~/$re/
	2121	and print "\$1 = $1\n",
	2122	"\$2 = $2\n",
	2123	"\$3 = $3\n";
	2124
	2125	the output produced should be the following:
	2126
	2127	$1 = foo(bar(baz)+baz(bop))
	2128	$2 = (bar(baz)+baz(bop))
	2129	$3 = bar(baz)+baz(bop)
	2130
	2131	If there is no corresponding capture group defined, then it is a
	2132	fatal error. Recursing deeply without consuming any input string will
	2133	also result in a fatal error. The depth at which that happens is
	2134	compiled into perl, so it can be changed with a custom build.
	2135
	2136	The following shows how using negative indexing can make it
	2137	easier to embed recursive patterns inside of a C<qr//> construct
	2138	for later use:
	2139
    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
       # do something here...
    }
	2144
B<Note> that this pattern does not behave the same way as the equivalent
PCRE or Python construct of the same form. In Perl you can backtrack into
a recursed group; in PCRE and Python the recursed-into group is treated
as atomic. Also, modifiers are resolved at compile time, so constructs
like C<(?i:(?1))> or C<(?:(?i)(?1))> do not affect how the sub-pattern will
be processed.
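
The following sketch, which uses made-up inputs, illustrates both points.

In the first match, C<(?1)> refers forward to C<(a+)>. The recursion initially takes every C<"a">, and the pattern succeeds only because Perl can backtrack into the recursed group.

In the second match, C<(?i:...)> wraps the recursion. Group 1 was compiled without C</i>, so C<"AB"> is expected not to match:

```perl
use strict;
use warnings;

# (?1) recurses into group 1, (a+), which is written later in the pattern.
# The recursion first consumes all three "a"s, leaving nothing for the
# real (a+); Perl then backtracks into the recursion, which gives one back.
my $captured = ("aaa" =~ /^(?1)(a+)\z/) ? $1 : 'no match';
print "captured: $captured\n";

# Group 1 below is compiled case-sensitively, so wrapping the recursion
# in (?i:...) does not let it match "AB".
my $ci = ("AB" =~ /^(?i:(?1))\z(?(DEFINE)(ab))/) ? 'match' : 'no match';
print "case-insensitive wrapper: $ci\n";
```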
	2151
	2152	=item C<(?&I<NAME>)>
	2153	X<(?&NAME)>
	2154
	2155	Recurse to a named subpattern. Identical to C<(?I<PARNO>)> except that the
	2156	parenthesis to recurse to is determined by name. If multiple parentheses have
	2157	the same name, then it recurses to the leftmost.
	2158
	2159	It is an error to refer to a name that is not declared somewhere in the
	2160	pattern.
	2161
	2162	B<NOTE:> In order to make things easier for programmers with experience
	2163	with the Python or PCRE regex engines the pattern C<< (?P>I<NAME>) >>
	2164	may be used instead of C<< (?&I<NAME>) >>.
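
Here is a minimal sketch of named recursion that uses an invented group name, C<paren>. Both spellings below should accept the balanced string and reject the unbalanced one:

```perl
use strict;
use warnings;

# (?&paren) recurses into the group named "paren"; (?P>paren) is the
# PCRE/Python-style spelling of the same recursion.
my $perl_style = qr/^(?<paren>\((?:[^()]++|(?&paren))*\))\z/;
my $pcre_style = qr/^(?<paren>\((?:[^()]++|(?P>paren))*\))\z/;

my %seen;
for my $str ('(x(y)z)', '(x(y)z') {
    # Record "1" (match) or "0" (no match) for each style, e.g. "1,1".
    $seen{$str} = join ',', map { ($str =~ $_) ? 1 : 0 } $perl_style, $pcre_style;
    print "$str => $seen{$str}\n";
}
```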
	2165
=item C<(?(I<condition>)I<yes-pattern>|I<no-pattern>)>
	2167	X<(?()>
	2168
	2169	=item C<(?(I<condition>)I<yes-pattern>)>
	2170
	2171	Conditional expression. Matches I<yes-pattern> if I<condition> yields
	2172	a true value, matches I<no-pattern> otherwise. A missing pattern always
	2173	matches.
	2174
	2175	C<(I<condition>)> should be one of:
	2176
	2177	=over 4
	2178
	2179	=item an integer in parentheses
	2180
	2181	(which is valid if the corresponding pair of parentheses
	2182	matched);
	2183
	2184	=item a lookahead/lookbehind/evaluate zero-width assertion;
	2185
	2186	=item a name in angle brackets or single quotes
	2187
	2188	(which is valid if a group with the given name matched);
	2189
	2190	=item the special symbol C<(R)>
	2191
(true when evaluated inside of recursion or eval). Additionally the
C<"R"> may be followed by a number (which will be true when evaluated
when recursing inside of the appropriate group), or by C<&I<NAME>>, in
which case it will be true only when evaluated during recursion in the
named group.
	2197
	2198	=back
	2199
	2200	Here's a summary of the possible predicates:
	2201
	2202	=over 4
	2203
	2204	=item C<(1)> C<(2)> ...
	2205
	2206	Checks if the numbered capturing group has matched something.
Full syntax: C<< (?(1)then|else) >>
	2208
	2209	=item C<(E<lt>I<NAME>E<gt>)> C<('I<NAME>')>
	2210
	2211	Checks if a group with the given name has matched something.
Full syntax: C<< (?(<name>)then|else) >>
	2213
	2214	=item C<(?=...)> C<(?!...)> C<(?<=...)> C<(?<!...)>
	2215
	2216	Checks whether the pattern matches (or does not match, for the C<"!">
	2217	variants).
Full syntax: C<< (?(?=I<lookahead>)I<then>|I<else>) >>
	2219
	2220	=item C<(?{ I<CODE> })>
	2221
	2222	Treats the return value of the code block as the condition.
Full syntax: C<< (?(?{ I<CODE> })I<then>|I<else>) >>

Note that use of this construct may globally affect the performance
of the pattern. Consider using C<(*{ I<CODE> })> instead.
	2227
	2228	=item C<(*{ I<CODE> })>
	2229
	2230	Treats the return value of the code block as the condition.
Full syntax: C<< (?(*{ I<CODE> })I<then>|I<else>) >>
	2232
	2233	=item C<(R)>
	2234
	2235	Checks if the expression has been evaluated inside of recursion.
Full syntax: C<< (?(R)I<then>|I<else>) >>
	2237
	2238	=item C<(R1)> C<(R2)> ...
	2239
	2240	Checks if the expression has been evaluated while executing directly
	2241	inside of the n-th capture group. This check is the regex equivalent of
	2242
	2243	if ((caller(0))[3] eq 'subname') { ... }
	2244
	2245	In other words, it does not check the full recursion stack.
	2246
Full syntax: C<< (?(R1)I<then>|I<else>) >>
	2248
	2249	=item C<(R&I<NAME>)>
	2250
	2251	Similar to C<(R1)>, this predicate checks to see if we're executing
	2252	directly inside of the leftmost group with a given name (this is the same
	2253	logic used by C<(?&I<NAME>)> to disambiguate). It does not check the full
	2254	stack, but only the name of the innermost active recursion.
Full syntax: C<< (?(R&I<NAME>)I<then>|I<else>) >>
	2256
	2257	=item C<(DEFINE)>
	2258
	2259	In this case, the yes-pattern is never directly executed, and no
	2260	no-pattern is allowed. Similar in spirit to C<(?{0})> but more efficient.
	2261	See below for details.
	2262	Full syntax: C<< (?(DEFINE)I<definitions>...) >>
	2263
	2264	=back
	2265
	2266	For example:
	2267
	2268	m{ ( \( )?
	2269	[^()]+
	2270	(?(1) \) )
	2271	}x
	2272
	2273	matches a chunk of non-parentheses, possibly included in parentheses
	2274	themselves.
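
Below is a runnable version of the example above with two assumptions added:

=over 4

=item *

C<^> and C<\z> anchors, so that the whole string must be consumed.

=item *

A named-group variant using C<< (?(<open>)...) >>, where the group name C<open> is hypothetical.

=back

```perl
use strict;
use warnings;

# Numbered condition: require ")" only if group 1 captured "(".
my $numbered = qr/^ ( \( )? [^()]+ (?(1) \) ) \z/x;
# Named condition: the same idea using (?(<open>)...).
my $named    = qr/^ (?<open> \( )? [^()]+ (?(<open>) \) ) \z/x;

my %ok;
for my $str ('abc', '(abc)', '(abc', 'abc)') {
    # Two digits: numbered result, then named result.
    $ok{$str} = ($str =~ $numbered ? 1 : 0) . ($str =~ $named ? 1 : 0);
    print "$str: $ok{$str}\n";
}
```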
	2275
	2276	A special form is the C<(DEFINE)> predicate, which never executes its
	2277	yes-pattern directly, and does not allow a no-pattern. This allows one to
	2278	define subpatterns which will be executed only by the recursion mechanism.
	2279	This way, you can define a set of regular expression rules that can be
	2280	bundled into any pattern you choose.
	2281
	2282	It is recommended that for this usage you put the DEFINE block at the
	2283	end of the pattern, and that you name any subpatterns defined within it.
	2284
	2285	Also, it's worth noting that patterns defined this way probably will
	2286	not be as efficient, as the optimizer is not very clever about
	2287	handling them.
	2288
	2289	An example of how this might be used is as follows:
	2290
	2291	/(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
	2292	(?(DEFINE)
	2293	(?<NAME_PAT>....)
	2294	(?<ADDRESS_PAT>....)
	2295	)/x
	2296
	2297	Note that capture groups matched inside of recursion are not accessible
	2298	after the recursion returns, so the extra layer of capturing groups is
	2299	necessary. Thus C<$+{NAME_PAT}> would not be defined even though
	2300	C<$+{NAME}> would be.
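
The following sketch makes the skeleton above concrete. The rule names C<WORD> and C<NUMBER> and the C<key = value> input format are hypothetical. The outer named groups C<key> and C<value> do the capturing, and C<$+{WORD}> is expected to be undefined after the match, as described above:

```perl
use strict;
use warnings;

# WORD and NUMBER are hypothetical rule names; they are only reached
# through (?&WORD) and (?&NUMBER), never executed directly.
my $assignment = qr{
    ^ (?<key> (?&WORD) ) \s* = \s* (?<value> (?&NUMBER) ) \z
    (?(DEFINE)
        (?<WORD>   [A-Za-z_] \w* )
        (?<NUMBER> -? \d+ (?: \. \d+ )? )
    )
}x;

my ($key, $value, $word_seen);
if ("rate = 3.5" =~ $assignment) {
    ($key, $value) = ($+{key}, $+{value});
    # Captures made inside the recursion are discarded on return.
    $word_seen = defined $+{WORD} ? 1 : 0;
    print "key=$key value=$value WORD defined? $word_seen\n";
}
```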
	2301
	2302	Finally, keep in mind that subpatterns created inside a DEFINE block
	2303	count towards the absolute and relative number of captures, so this:
	2304
	2305	my @captures = "a" =~ /(.) # First capture
	2306	(?(DEFINE)
	2307	(?<EXAMPLE> 1 ) # Second capture
	2308	)/x;
	2309	say scalar @captures;
	2310
will output 2, not 1. This is particularly important if you intend to
compile the definitions with the C<qr//> operator, and later
interpolate them in another pattern.
	2314
	2315	=item C<< (?>I<pattern>) >>
	2316
	2317	=item C<< (*atomic:I<pattern>) >>
	2318	X<(?E<gt>pattern)>
	2319	X<(*atomic>
	2320	X<backtrack> X<backtracking> X<atomic> X<possessive>
	2321
	2322	An "independent" subexpression, one which matches the substring
	2323	that a standalone I<pattern> would match if anchored at the given
	2324	position, and it matches I<nothing other than this substring>. This
	2325	construct is useful for optimizations of what would otherwise be
	2326	"eternal" matches, because it will not backtrack (see L</"Backtracking">).
	2327	It may also be useful in places where the "grab all you can, and do not
	2328	give anything back" semantic is desirable.
	2329
For example: C<< ^(?>a*)ab >> will never match, since C<< (?>a*) >>
(anchored at the beginning of string, as above) will match I<all>
characters C<"a"> at the beginning of string, leaving no C<"a"> for
C<ab> to match. In contrast, C<a*ab> will match the same as C<a+b>,
since the match of the subgroup C<a*> is influenced by the following
group C<ab> (see L</"Backtracking">). In particular, C<a*> inside
C<a*ab> will match fewer characters than a standalone C<a*>, since
this makes the tail match.
	2338
C<< (?>I<pattern>) >> does not disable backtracking altogether once it has
matched. It is still possible to backtrack past the construct, but not
into it. So C<< ((?>a*)|(?>b*))ar >> will still match "bar".
	2342
An effect similar to C<< (?>I<pattern>) >> may be achieved by writing
C<(?=(I<pattern>))\g{-1}>. The lookahead captures the same substring that a
standalone I<pattern> would match, and the following C<\g{-1}> eats the
matched string; it therefore makes a zero-length assertion into an analogue
of C<< (?>...) >>.
	2347	(The difference between these two constructs is that the second one
	2348	uses a capturing group, thus shifting ordinals of backreferences
	2349	in the rest of a regular expression.)
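
The claims in the preceding paragraphs can be checked with a short sketch. The inputs are chosen for illustration and C<^> anchors have been added. Each result is 1 for a match and 0 for a failure:

```perl
use strict;
use warnings;

my %r;
# Ordinary greedy a* can give back one "a" so that "ab" matches.
$r{plain}     = ("aaab" =~ /^a*ab/)              ? 1 : 0;
# The atomic group keeps every "a", so "ab" can never match.
$r{atomic}    = ("aaab" =~ /^(?>a*)ab/)          ? 1 : 0;
# Lookahead plus backreference behaves like the atomic group here.
$r{lookahead} = ("aaab" =~ /^(?=(a*))\g{-1}ab/)  ? 1 : 0;
# Backtracking *past* the atomic groups is still allowed:
# (?>a*) matches "" and "ar" fails, then the (?>b*) branch is tried.
$r{past}      = ("bar" =~ /^((?>a*)|(?>b*))ar/)  ? 1 : 0;

print "$_ => $r{$_}\n" for sort keys %r;
```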
	2350
	2351	Consider this pattern:
	2352
    m{ \(
          (
            [^()]+           # x+
          |
            \( [^()]* \)
          )+
       \)
     }x
	2361
	2362	That will efficiently match a nonempty group with matching parentheses
	2363	two levels deep or less. However, if there is no such group, it
	2364	will take virtually forever on a long string. That's because there
	2365	are so many different ways to split a long string into several
	2366	substrings. This is what C<(.+)+> is doing, and C<(.+)+> is similar
	2367	to a subpattern of the above pattern. Consider how the pattern
	2368	above detects no-match on C<((()aaaaaaaaaaaaaaaaaa> in several
	2369	seconds, but that each extra letter doubles this time. This
	2370	exponential performance will make it appear that your program has
	2371	hung. However, a tiny change to this pattern
	2372
    m{ \(
          (
            (?> [^()]+ )        # change x+ above to (?> x+ )
          |
            \( [^()]* \)
          )+
       \)
     }x
	2381
which uses C<< (?>...) >> matches exactly when the one above does (verifying
this yourself would be a productive exercise), but finishes in a fourth of
the time when used on a similar string with 1000000 C<"a">s. Be aware,
however, that when this construct is followed by a quantifier, it currently
triggers a warning message under the C<use warnings> pragma or B<-w> switch
saying it C<"matches null string many times in regex">.
	2389
	2390	On simple groups, such as the pattern C<< (?> [^()]+ ) >>, a comparable
	2391	effect may be achieved by negative lookahead, as in C<[^()]+ (?! [^()] )>.
	2392	This was only 4 times slower on a string with 1000000 C<"a">s.
	2393
	2394	The "grab all you can, and do not give anything back" semantic is desirable
in many situations where at first sight a simple C<()*> looks like
	2396	the correct solution. Suppose we parse text with comments being delimited
	2397	by C<"#"> followed by some optional (horizontal) whitespace. Contrary to
	2398	its appearance, C<#[ \t]*> I<is not> the correct subexpression to match
	2399	the comment delimiter, because it may "give up" some whitespace if
	2400	the remainder of the pattern can be made to match that way. The correct
	2401	answer is either one of these:
	2402
	2403	(?>#[ \t]*)
	2404	#[ \t]*(?![ \t])
	2405
	2406	For example, to grab non-empty comments into C<$1>, one should use either
	2407	one of these:
	2408
	2409	/ (?> \# [ \t]* ) ( .+ ) /x;
	2410	/ \# [ \t]* ( [^ \t] .* ) /x;
	2411
	2412	Which one you pick depends on which of these expressions better reflects
	2413	the above specification of comments.
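
The difference matters for an empty comment. In this sketch the sample lines are invented. The naive C<#[ \t]*(.+)> gives back a blank so that C<.+> can match it, while both recommended forms reject the empty comment:

```perl
use strict;
use warnings;

my %got;
for my $line ("#   hello", "#   ") {
    # Naive: [ \t]* may give back a blank so that .+ can match it.
    my $naive    = $line =~ / \# [ \t]* ( .+ ) /x        ? "<$1>" : 'no match';
    # Atomic: the blanks are never given back.
    my $atomic   = $line =~ / (?> \# [ \t]* ) ( .+ ) /x  ? "<$1>" : 'no match';
    # Explicit: the comment text must start with a non-blank.
    my $explicit = $line =~ / \# [ \t]* ( [^ \t] .* ) /x ? "<$1>" : 'no match';
    $got{$line} = [ $naive, $atomic, $explicit ];
    print "'$line': naive=$naive atomic=$atomic explicit=$explicit\n";
}
```

Note that C</x> does not make whitespace insignificant inside a bracketed character class, so C<[ \t]> still contains a space.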
	2414
	2415	In some literature this construct is called "atomic matching" or
	2416	"possessive matching".
	2417
	2418	Possessive quantifiers are equivalent to putting the item they are applied
	2419	to inside of one of these constructs. The following equivalences apply:
	2420
    Quantifier Form     Bracketing Form
    ---------------     ---------------
    PAT*+               (?>PAT*)
    PAT++               (?>PAT+)
    PAT?+               (?>PAT?)
    PAT{min,max}+       (?>PAT{min,max})
	2427
	2428	Nested C<(?E<gt>...)> constructs are not no-ops, even if at first glance
	2429	they might seem to be. This is because the nested C<(?E<gt>...)> can
	2430	restrict internal backtracking that otherwise might occur. For example,
	2431
	2432	"abc" =~ /(?>a[bc]*c)/
	2433
	2434	matches, but
	2435
	2436	"abc" =~ /(?>a(?>[bc]*)c)/
	2437
	2438	does not.
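
The following sketch checks one row of the equivalence table above and the nested example. The inputs are chosen for illustration:

```perl
use strict;
use warnings;

my %r;
# Possessive a*+ behaves like the atomic (?>a*): neither gives an "a" back.
$r{greedy}     = ("aaa" =~ /^a*a/)      ? 1 : 0;
$r{possessive} = ("aaa" =~ /^a*+a/)     ? 1 : 0;
$r{atomic}     = ("aaa" =~ /^(?>a*)a/)  ? 1 : 0;
# Nested atomic groups are not no-ops: the inner one stops [bc]*
# from giving back the "c" that the outer pattern needs.
$r{outer_only} = ("abc" =~ /(?>a[bc]*c)/)      ? 1 : 0;
$r{nested}     = ("abc" =~ /(?>a(?>[bc]*)c)/)  ? 1 : 0;

print "$_ => $r{$_}\n" for sort keys %r;
```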
	2439
	2440	=item C<(?[ ])>
	2441
	2442	See L<perlrecharclass/Extended Bracketed Character Classes>.
	2443
	2444	=back
	2445
	2446	=head2 Backtracking
	2447	X<backtrack> X<backtracking>
	2448
	2449	NOTE: This section presents an abstract approximation of regular
	2450	expression behavior. For a more rigorous (and complicated) view of
	2451	the rules involved in selecting a match among possible alternatives,
	2452	see L</Combining RE Pieces>.
	2453
	2454	A fundamental feature of regular expression matching involves the
	2455	notion called I<backtracking>, which is currently used (when needed)
	2456	by all regular non-possessive expression quantifiers, namely C<"*">,
	2457	C<*?>, C<"+">, C<+?>, C<{n,m}>, and C<{n,m}?>. Backtracking is often
	2458	optimized internally, but the general principle outlined here is valid.
	2459
	2460	For a regular expression to match, the I<entire> regular expression must
	2461	match, not just part of it. So if the beginning of a pattern containing a
	2462	quantifier succeeds in a way that causes later parts in the pattern to
	2463	fail, the matching engine backs up and recalculates the beginning
	2464	part--that's why it's called backtracking.
	2465
	2466	Here is an example of backtracking: Let's say you want to find the
	2467	word following "foo" in the string "Food is on the foo table.":
	2468
	2469	$_ = "Food is on the foo table.";
	2470	if ( /\b(foo)\s+(\w+)/i ) {
	2471	print "$2 follows $1.\n";
	2472	}
	2473
	2474	When the match runs, the first part of the regular expression (C<\b(foo)>)
	2475	finds a possible match right at the beginning of the string, and loads up
	2476	C<$1> with "Foo". However, as soon as the matching engine sees that there's
	2477	no whitespace following the "Foo" that it had saved in C<$1>, it realizes its
	2478	mistake and starts over again one character after where it had the
	2479	tentative match. This time it goes all the way until the next occurrence
	2480	of "foo". The complete regular expression matches this time, and you get
	2481	the expected output of "table follows foo."
	2482
	2483	Sometimes minimal matching can help a lot. Imagine you'd like to match
	2484	everything between "foo" and "bar". Initially, you write something
	2485	like this:
	2486
	2487	$_ = "The food is under the bar in the barn.";
	2488	if ( /foo(.*)bar/ ) {
	2489	print "got <$1>\n";
	2490	}
	2491
	2492	Which perhaps unexpectedly yields:
	2493
	2494	got <d is under the bar in the >
	2495
	2496	That's because C<.*> was greedy, so you get everything between the
	2497	I<first> "foo" and the I<last> "bar". Here it's more effective
	2498	to use minimal matching to make sure you get the text between a "foo"
	2499	and the first "bar" thereafter.
	2500
	2501	if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
	2502	got <d is under the >
	2503
	2504	Here's another example. Let's say you'd like to match a number at the end
	2505	of a string, and you also want to keep the preceding part of the match.
	2506	So you write this:
	2507
    $_ = "I have 2 numbers: 53147";
    if ( /(.*)(\d*)/ ) {                                # Wrong!
        print "Beginning is <$1>, number is <$2>.\n";
    }
	2512
That won't work at all, because C<.*> was greedy and gobbled up the
whole string. As C<\d*> can match an empty string, the complete
regular expression matched successfully.
	2516
	2517	Beginning is <I have 2 numbers: 53147>, number is <>.
	2518
	2519	Here are some variants, most of which don't work:
	2520
    $_ = "I have 2 numbers: 53147";
    @pats = qw{
        (.*)(\d*)
        (.*)(\d+)
        (.*?)(\d*)
        (.*?)(\d+)
        (.*)(\d+)$
        (.*?)(\d+)$
        (.*)\b(\d+)$
        (.*\D)(\d+)$
    };
	2532
	2533	for $pat (@pats) {
	2534	printf "%-12s ", $pat;
	2535	if ( /$pat/ ) {
	2536	print "<$1> <$2>\n";
	2537	} else {
	2538	print "FAIL\n";
	2539	}
	2540	}
	2541
	2542	That will print out:
	2543
    (.*)(\d*)    <I have 2 numbers: 53147> <>
    (.*)(\d+)    <I have 2 numbers: 5314> <7>
    (.*?)(\d*)   <> <>
    (.*?)(\d+)   <I have > <2>
    (.*)(\d+)$   <I have 2 numbers: 5314> <7>
    (.*?)(\d+)$  <I have 2 numbers: > <53147>
    (.*)\b(\d+)$ <I have 2 numbers: > <53147>
    (.*\D)(\d+)$ <I have 2 numbers: > <53147>
	2552
	2553	As you see, this can be a bit tricky. It's important to realize that a
	2554	regular expression is merely a set of assertions that gives a definition
	2555	of success. There may be 0, 1, or several different ways that the
	2556	definition might succeed against a particular string. And if there are
	2557	multiple ways it might succeed, you need to understand backtracking to
	2558	know which variety of success you will achieve.
	2559
	2560	When using lookahead assertions and negations, this can all get even
	2561	trickier. Imagine you'd like to find a sequence of non-digits not
	2562	followed by "123". You might try to write that as
	2563
	2564	$_ = "ABC123";
	2565	if ( /^\D*(?!123)/ ) { # Wrong!
	2566	print "Yup, no 123 in $_\n";
	2567	}
	2568
	2569	But that isn't going to match; at least, not the way you're hoping. It
	2570	claims that there is no 123 in the string. Here's a clearer picture of
	2571	why that pattern matches, contrary to popular expectations:
	2572
	2573	$x = 'ABC123';
	2574	$y = 'ABC445';
	2575
	2576	print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
	2577	print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
	2578
	2579	print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
	2580	print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
	2581
	2582	This prints
	2583
	2584	2: got ABC
	2585	3: got AB
	2586	4: got ABC
	2587
You might have expected test 3 to fail because it seems to be a more
general-purpose version of test 1. The important difference between
	2590	them is that test 3 contains a quantifier (C<\D*>) and so can use
	2591	backtracking, whereas test 1 will not. What's happening is
	2592	that you've asked "Is it true that at the start of C<$x>, following 0 or more
	2593	non-digits, you have something that's not 123?" If the pattern matcher had
	2594	let C<\D*> expand to "ABC", this would have caused the whole pattern to
	2595	fail.
	2596
	2597	The search engine will initially match C<\D*> with "ABC". Then it will
	2598	try to match C<(?!123)> with "123", which fails. But because
	2599	a quantifier (C<\D*>) has been used in the regular expression, the
	2600	search engine can backtrack and retry the match differently
	2601	in the hope of matching the complete regular expression.
	2602
	2603	The pattern really, I<really> wants to succeed, so it uses the
	2604	standard pattern back-off-and-retry and lets C<\D*> expand to just "AB" this
	2605	time. Now there's indeed something following "AB" that is not
	2606	"123". It's "C123", which suffices.
	2607
	2608	We can deal with this by using both an assertion and a negation.
	2609	We'll say that the first part in C<$1> must be followed both by a digit
	2610	and by something that's not "123". Remember that the lookaheads
	2611	are zero-width expressions--they only look, but don't consume any
	2612	of the string in their match. So rewriting this way produces what
	2613	you'd expect; that is, case 5 will fail, but case 6 succeeds:
	2614
	2615	print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
	2616	print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
	2617
	2618	6: got ABC
	2619
	2620	In other words, the two zero-width assertions next to each other work as though
	2621	they're ANDed together, just as you'd use any built-in assertions: C</^$/>
	2622	matches only if you're at the beginning of the line AND the end of the
	2623	line simultaneously. The deeper underlying truth is that juxtaposition in
	2624	regular expressions always means AND, except when you write an explicit OR
	2625	using the vertical bar. C</ab/> means match "a" AND (then) match "b",
	2626	although the attempted matches are made at different positions because "a"
	2627	is not a zero-width assertion, but a one-width assertion.
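
Another way to get the intended behaviour is to make the quantifier possessive. The text above does not mention this approach, so it is offered here only as a sketch. C<\D*+> cannot shrink from C<"ABC"> to C<"AB">, so C<(?!123)> is tested only after all the leading non-digits.

This differs from the C<(?=\d)> version in one respect: a string containing no digits at all would also match.

```perl
use strict;
use warnings;

my ($x, $y) = ('ABC123', 'ABC445');
my %got;
for my $pair ([ x => $x ], [ y => $y ]) {
    my ($name, $str) = @$pair;
    # \D*+ is possessive: once it has taken "ABC" it cannot shrink,
    # so (?!123) is checked only right after all the non-digits.
    $got{$name} = $str =~ /^(\D*+)(?!123)/ ? $1 : 'no match';
    print "$name: $got{$name}\n";
}
```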
	2628
	2629	B<WARNING>: Particularly complicated regular expressions can take
	2630	exponential time to solve because of the immense number of possible
	2631	ways they can use backtracking to try for a match. For example, without
	2632	internal optimizations done by the regular expression engine, this will
	2633	take a painfully long time to run:
	2634
	2635	'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
	2636
	2637	And if you used C<"*">'s in the internal groups instead of limiting them
	2638	to 0 through 5 matches, then it would take forever--or until you ran
	2639	out of stack space. Moreover, these internal optimizations are not
	2640	always applicable. For example, if you put C<{0,5}> instead of C<"*">
	2641	on the external group, no current optimization is applicable, and the
	2642	match takes a long time to finish.
	2643
	2644	A powerful tool for optimizing such beasts is what is known as an
	2645	"independent group",
	2646	which does not backtrack (see C<L</(?E<gt>pattern)>>). Note also that
	2647	zero-length lookahead/lookbehind assertions will not backtrack to make
	2648	the tail match, since they are in "logical" context: only
	2649	whether they match is considered relevant. For an example
	2650	where side-effects of lookahead I<might> have influenced the
	2651	following match, see C<L</(?E<gt>pattern)>>.
	2652
	2653	=head2 Script Runs
	2654	X<(*script_run:...)> X<(sr:...)>
	2655	X<(*atomic_script_run:...)> X<(asr:...)>
	2656
	2657	A script run is basically a sequence of characters, all from the same
	2658	Unicode script (see L<perlunicode/Scripts>), such as Latin or Greek. In
	2659	most places a single word would never be written in multiple scripts,
unless it is a spoofing attack. An infamous example is
	2661
	2662	paypal.com
	2663
	2664	Those letters could all be Latin (as in the example just above), or they
	2665	could be all Cyrillic (except for the dot), or they could be a mixture
of the two. In the case of an internet address the C<.com> would be in
Latin, and any Cyrillic ones would cause it to be a mixture, not a
script run. Someone clicking on such a link would not be directed to
the real Paypal website; instead, an attacker would have crafted a
look-alike one to attempt to gather sensitive information from the person.
	2671
	2672	Starting in Perl 5.28, it is now easy to detect strings that aren't
	2673	script runs. Simply enclose just about any pattern like either of
	2674	these:
	2675
	2676	(*script_run:pattern)
	2677	(*sr:pattern)
	2678
	2679	What happens is that after I<pattern> succeeds in matching, it is
	2680	subjected to the additional criterion that every character in it must be
	2681	from the same script (see exceptions below). If this isn't true,
	2682	backtracking occurs until something all in the same script is found that
	2683	matches, or all possibilities are exhausted. This can cause a lot of
backtracking, but generally only malicious input will result in this,
though the slowdown could cause a denial of service attack. If your
	2686	needs permit, it is best to make the pattern atomic to cut down on the
	2687	amount of backtracking. This is so likely to be what you want, that
	2688	instead of writing this:
	2689
	2690	(*script_run:(?>pattern))
	2691
	2692	you can write either of these:
	2693
	2694	(*atomic_script_run:pattern)
	2695	(*asr:pattern)
	2696
	2697	(See C<L</(?E<gt>I<pattern>)>>.)
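
Here is a minimal sketch of the spoofing case. It assumes Perl 5.28 or later; on 5.28 and 5.30 the construct is experimental and may emit a warning. The spoofed string replaces the first C<"a"> with U+0430 CYRILLIC SMALL LETTER A. It is therefore not a script run, and the atomic C<(*asr:...)> pattern should reject it:

```perl
use v5.28;      # script runs first appeared in Perl 5.28
use warnings;   # 5.28 and 5.30 may warn that script runs are experimental

my $latin   = "paypal";
my $spoofed = "p\x{0430}ypal";   # U+0430 CYRILLIC SMALL LETTER A

my %is_run;
for my $str ($latin, $spoofed) {
    # \w+ accepts both strings; the script-run check then rejects the mix.
    $is_run{$str} = $str =~ /^(*asr:\w+)\z/ ? 1 : 0;
}
printf "latin=%d spoofed=%d\n", $is_run{$latin}, $is_run{$spoofed};
```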
	2698
	2699	In Taiwan, Japan, and Korea, it is common for text to have a mixture of
	2700	characters from their native scripts and base Chinese. Perl follows
	2701	Unicode's UTS 39 (L<https://unicode.org/reports/tr39/>) Unicode Security
	2702	Mechanisms in allowing such mixtures. For example, the Japanese scripts
	2703	Katakana and Hiragana are commonly mixed together in practice, along
	2704	with some Chinese characters, and hence are treated as being in a single
	2705	script run by Perl.
	2706
	2707	The rules used for matching decimal digits are slightly stricter. Many
	2708	scripts have their own sets of digits equivalent to the Western C<0>
	2709	through C<9> ones. A few, such as Arabic, have more than one set. For
	2710	a string to be considered a script run, all digits in it must come from
	2711	the same set of ten, as determined by the first digit encountered.
	2712	As an example,
	2713
	2714	qr/(*script_run: \d+ \b )/x
	2715
	2716	guarantees that the digits matched will all be from the same set of 10.
	2717	You won't get a look-alike digit from a different script that has a
	2718	different value than what it appears to be.
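
The following sketch illustrates the digit rule, again assuming Perl 5.28 or later. C<\d> accepts both ASCII and Arabic-Indic digits; U+0661 to U+0663 are used here. The string that mixes the two sets should fail the script-run check:

```perl
use v5.28;      # enables say; script runs need 5.28 or later
use warnings;

my %digits = (
    ascii        => "123",
    arabic_indic => "\x{0661}\x{0662}\x{0663}",   # U+0661..U+0663
    mixed        => "12\x{0663}",                 # ASCII then Arabic-Indic
);
my %ok;
for my $name (sort keys %digits) {
    # \d matches both kinds of digit, but the script-run check then
    # requires all of them to come from the same set of ten.
    $ok{$name} = $digits{$name} =~ /^(*sr:\d+)\z/ ? 1 : 0;
    say "$name: $ok{$name}";
}
```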
	2719
	2720	Unicode has three pseudo scripts that are handled specially.
	2721
	2722	"Unknown" is applied to code points whose meaning has yet to be
determined. Perl currently will match as a script run any single
	2724	character string consisting of one of these code points. But any string
	2725	longer than one code point containing one of these will not be
	2726	considered a script run.
	2727
	2728	"Inherited" is applied to characters that modify another, such as an
	2729	accent of some type. These are considered to be in the script of the
	2730	master character, and so never cause a script run to not match.
	2731
	2732	The other one is "Common". This consists of mostly punctuation, emoji,
	2733	characters used in mathematics and music, the ASCII digits C<0>
	2734	through C<9>, and full-width forms of these digits. These characters
	2735	can appear intermixed in text in many of the world's scripts. These
	2736	also don't cause a script run to not match. But like other scripts, all
	2737	digits in a run must come from the same set of 10.
	2738
	2739	This construct is non-capturing. You can add parentheses to I<pattern>
	2740	to capture, if desired. You will have to do this if you plan to use
L</(*ACCEPT) (*ACCEPT:arg)> and not have it bypass the script run
	2742	checking.
	2743
	2744	The C<Script_Extensions> property as modified by UTS 39
	2745	(L<https://unicode.org/reports/tr39/>) is used as the basis for this
	2746	feature.
	2747
	2748	To summarize,
	2749
	2750	=over 4
	2751
	2752	=item *
	2753
	2754	All length 0 or length 1 sequences are script runs.
	2755
	2756	=item *
	2757
	2758	A longer sequence is a script run if and only if B<all> of the following
	2759	conditions are met:
	2760
	2761	Z<>
	2762
	2763	=over
	2764
	2765	=item 1
	2766
No code point in the sequence has the C<Script_Extensions> property of
	2768	C<Unknown>.
	2769
	2770	This currently means that all code points in the sequence have been
	2771	assigned by Unicode to be characters that aren't private use nor
	2772	surrogate code points.
	2773
	2774	=item 2
	2775
	2776	All characters in the sequence come from the Common script and/or the
	2777	Inherited script and/or a single other script.
	2778
	2779	The script of a character is determined by the C<Script_Extensions>
	2780	property as modified by UTS 39 (L<https://unicode.org/reports/tr39/>), as
	2781	described above.
	2782
	2783	=item 3
	2784
	2785	All decimal digits in the sequence come from the same block of 10
	2786	consecutive digits.
	2787
	2788	=back
	2789
	2790	=back
	2791
	2792	=head2 Special Backtracking Control Verbs
	2793
	2794	These special patterns are generally of the form C<(*I<VERB>:I<arg>)>. Unless
	2795	otherwise stated the I<arg> argument is optional; in some cases, it is
	2796	mandatory.
	2797
	2798	Any pattern containing a special backtracking verb that allows an argument
	2799	has the special behaviour that when executed it sets the current package's
	2800	C<$REGERROR> and C<$REGMARK> variables. When doing so the following
	2801	rules apply:
	2802
	2803	On failure, the C<$REGERROR> variable will be set to the I<arg> value of the
	2804	verb pattern, if the verb was involved in the failure of the match. If the
	2805	I<arg> part of the pattern was omitted, then C<$REGERROR> will be set to the
	2806	name of the last C<(*MARK:I<NAME>)> pattern executed, or to TRUE if there was
	2807	none. Also, the C<$REGMARK> variable will be set to FALSE.
	2808
	2809	On a successful match, the C<$REGERROR> variable will be set to FALSE, and
	2810	the C<$REGMARK> variable will be set to the name of the last
	2811	C<(*MARK:I<NAME>)> pattern executed. See the explanation for the
	2812	C<(*MARK:I<NAME>)> verb below for more details.
	2813
	2814	B<NOTE:> C<$REGERROR> and C<$REGMARK> are not magic variables like C<$1>
	2815	and most other regex-related variables. They are not local to a scope, nor
	2816	readonly, but instead are volatile package variables similar to C<$AUTOLOAD>.
	2817	They are set in the package containing the code that I<executed> the regex
	2818	(rather than the one that compiled it, where those differ). If necessary, you
	2819	can use C<local> to localize changes to these variables to a specific scope
	2820	before executing a regex.
	2821
	2822	If a pattern does not contain a special backtracking verb that allows an
	2823	argument, then C<$REGERROR> and C<$REGMARK> are not touched at all.
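
Here is a minimal sketch of the success case; the mark names C<saw_a> and C<saw_b> are arbitrary. Because the variables are plain package variables, they are declared with C<our> under C<use strict>:

```perl
use strict;
use warnings;

our ($REGMARK, $REGERROR);   # package variables set by the regex engine

my $ok = "abc" =~ /^a(*MARK:saw_a)b(*MARK:saw_b)c/ ? 1 : 0;
# On success $REGMARK holds the name of the last (*MARK:NAME) executed
# and $REGERROR is set to a false value.
printf "matched=%d REGMARK=%s REGERROR is %s\n",
    $ok, $REGMARK, ($REGERROR ? 'true' : 'false');
```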
	2824
	2825	=over 3
	2826
	2827	=item Verbs
	2828
	2829	=over 4
	2830
=item C<(*PRUNE)> C<(*PRUNE:I<NAME>)>
X<(*PRUNE)> X<(*PRUNE:NAME)>
	2833
	2834	This zero-width pattern prunes the backtracking tree at the current point
	2835	when backtracked into on failure. Consider the pattern C</I<A> (*PRUNE) I<B>/>,
	2836	where I<A> and I<B> are complex patterns. Until the C<(*PRUNE)> verb is reached,
	2837	I<A> may backtrack as necessary to match. Once it is reached, matching
continues in I<B>, which may also backtrack as necessary; however, should I<B>
	2839	not match, then no further backtracking will take place, and the pattern
	2840	will fail outright at the current starting position.
	2841
	2842	The following example counts all the possible matching strings in a
	2843	pattern (without actually matching any of them).
	2844
	2845	'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
	2846	print "Count=$count\n";
	2847
	2848	which produces:
	2849
	2850	aaab
	2851	aaa
	2852	aa
	2853	a
	2854	aab
	2855	aa
	2856	a
	2857	ab
	2858	a
	2859	Count=9
	2860
	2861	If we add a C<(*PRUNE)> before the count like the following
	2862
    'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
    print "Count=$count\n";
	2865
	2866	we prevent backtracking and find the count of the longest matching string
	2867	at each matching starting point like so:
	2868
	2869	aaab
	2870	aab
	2871	ab
	2872	Count=3
	2873
	2874	Any number of C<(*PRUNE)> assertions may be used in a pattern.
	2875
See also C<<< L<< /(?>I<pattern>) >> >>> and possessive quantifiers for
other ways to control backtracking. In some cases, the use of
C<(*PRUNE)> can be replaced with a C<< (?>I<pattern>) >> with no
functional difference; however, C<(*PRUNE)> can be used to handle cases
that cannot be expressed using a C<< (?>I<pattern>) >> alone.
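
As a small sketch of that equivalence, the script below runs the counting
pattern from above three ways: with C<(*PRUNE)>, with an atomic group, and
with possessive quantifiers. The array names are illustrative, and the
script assumes a reasonably modern perl, where lexical variables can be
used inside C<(?{ })>. For this particular input all three stop
backtracking at the same point. That is not a claim that they are
interchangeable in general.

```perl
use strict;
use warnings;

# Each pattern records the text matched so far from the current start
# position, then (*FAIL) forces the engine to backtrack.
my (@prune, @atomic, @possessive);

'aaab' =~ /a+b?(*PRUNE)(?{ push @prune, $& })(*FAIL)/;
'aaab' =~ /(?>a+b?)(?{ push @atomic, $& })(*FAIL)/;
'aaab' =~ /a++b?+(?{ push @possessive, $& })(*FAIL)/;

# All three keep only the longest match at each start position.
print "PRUNE:      @prune\n";        # aaab aab ab
print "atomic:     @atomic\n";       # aaab aab ab
print "possessive: @possessive\n";   # aaab aab ab
```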
	2882
=item C<(*SKIP)> C<(*SKIP:I<NAME>)>
	2884	X<(*SKIP)>
	2885
This zero-width pattern is similar to C<(*PRUNE)>, except that on
failure it also signifies that whatever text was matched leading up
to the C<(*SKIP)> pattern being executed cannot be part of I<any> match
of this pattern. This effectively means that the regex engine "skips" forward
to this position on failure and tries to match again (assuming that
there is sufficient room to match).
	2892
	2893	The name of the C<(*SKIP:I<NAME>)> pattern has special significance. If a
	2894	C<(*MARK:I<NAME>)> was encountered while matching, then it is that position
	2895	which is used as the "skip point". If no C<(*MARK)> of that name was
	2896	encountered, then the C<(*SKIP)> operator has no effect. When used
	2897	without a name the "skip point" is where the match point was when
	2898	executing the C<(*SKIP)> pattern.
	2899
	2900	Compare the following to the examples in C<(*PRUNE)>; note the string
	2901	is twice as long:
	2902
    'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
    print "Count=$count\n";
	2905
	2906	outputs
	2907
	2908	aaab
	2909	aaab
	2910	Count=2
	2911
	2912	Once the 'aaab' at the start of the string has matched, and the C<(*SKIP)>
	2913	executed, the next starting point will be where the cursor was when the
	2914	C<(*SKIP)> was executed.
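
To see the difference between the two verbs on the same input, the sketch
below collects the strings into two arrays instead of printing them. The
array names are illustrative. With C<(*PRUNE)> the engine still retries
from every following character. With C<(*SKIP)> it resumes where the
C<(*SKIP)> was executed.

```perl
use strict;
use warnings;

my (@prune, @skip);

# (*PRUNE): give up at the current start position, then retry from the
# next character.
'aaabaaab' =~ /a+b?(*PRUNE)(?{ push @prune, $& })(*FAIL)/;

# (*SKIP): give up, then retry from where the (*SKIP) was executed.
'aaabaaab' =~ /a+b?(*SKIP)(?{ push @skip, $& })(*FAIL)/;

print "PRUNE: @prune\n";   # aaab aab ab aaab aab ab
print "SKIP:  @skip\n";    # aaab aaab
```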
	2915
=item C<(*MARK:I<NAME>)> C<(*:I<NAME>)>
X<(*MARK)> X<(*MARK:NAME)> X<(*:NAME)>
	2918
	2919	This zero-width pattern can be used to mark the point reached in a string
	2920	when a certain part of the pattern has been successfully matched. This
	2921	mark may be given a name. A later C<(*SKIP)> pattern will then skip
	2922	forward to that point if backtracked into on failure. Any number of
	2923	C<(*MARK)> patterns are allowed, and the I<NAME> portion may be duplicated.
	2924
In addition to interacting with the C<(*SKIP)> pattern, C<(*MARK:I<NAME>)>
	2926	can be used to "label" a pattern branch, so that after matching, the
	2927	program can determine which branches of the pattern were involved in the
	2928	match.
	2929
	2930	When a match is successful, the C<$REGMARK> variable will be set to the
	2931	name of the most recently executed C<(*MARK:I<NAME>)> that was involved
	2932	in the match.
	2933
	2934	This can be used to determine which branch of a pattern was matched
	2935	without using a separate capture group for each branch, which in turn
	2936	can result in a performance improvement, as perl cannot optimize
C</(?:(x)|(y)|(z))/> as efficiently as something like
C</(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/>.
	2939
	2940	When a match has failed, and unless another verb has been involved in
	2941	failing the match and has provided its own name to use, the C<$REGERROR>
	2942	variable will be set to the name of the most recently executed
	2943	C<(*MARK:I<NAME>)>.
	2944
	2945	See L</(*SKIP)> for more details.
	2946
As a shortcut, C<(*MARK:I<NAME>)> can be written C<(*:I<NAME>)>.
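
As a sketch of branch labelling, the loop below classifies a few tokens
without any capture groups. It reads C<$REGMARK> after each successful
match. The token list, the mark names and the C<%kind_of> hash are all
illustrative. C<$REGMARK> is declared with C<our> because it is a package
variable, set in the package that runs the match.

```perl
use strict;
use warnings;

# $REGMARK is a package variable, set in the package that runs the match.
our $REGMARK;

my %kind_of;
for my $token ('42', 'foo', '+') {
    # Each branch ends with a (*MARK:NAME) that labels it.
    if ($token =~ /^(?:\d+(*MARK:number)|[a-z]+(*MARK:word)|[-+*\/](*MARK:operator))$/) {
        $kind_of{$token} = $REGMARK;
    }
}

print "$_ => $kind_of{$_}\n" for sort keys %kind_of;
```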
	2948
=item C<(*THEN)> C<(*THEN:I<NAME>)>
	2950
	2951	This is similar to the "cut group" operator C<::> from Raku. Like
	2952	C<(*PRUNE)>, this verb always matches, and when backtracked into on
	2953	failure, it causes the regex engine to try the next alternation in the
	2954	innermost enclosing group (capturing or otherwise) that has alternations.
The two branches of a C<(?(I<condition>)I<yes-pattern>|I<no-pattern>)> do not
count as an alternation, as far as C<(*THEN)> is concerned.
	2957
Its name comes from the observation that this operation combined with the
alternation operator (C<"|">) can be used to create what is essentially a
pattern-based if/then/else block:
	2961
    ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
	2963
	2964	Note that if this operator is used and NOT inside of an alternation then
	2965	it acts exactly like the C<(*PRUNE)> operator.
	2966
    / A (*PRUNE) B /

is the same as

    / A (*THEN) B /

but

    / ( A (*THEN) B | C ) /

is not the same as

    / ( A (*PRUNE) B | C ) /
	2980
	2981	as after matching the I<A> but failing on the I<B> the C<(*THEN)> verb will
	2982	backtrack and try I<C>; but the C<(*PRUNE)> verb will simply fail.
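
Here is a concrete sketch of the last pair, with hypothetical stand-ins
C<a> for I<A>, C<x> for I<B> and C<\w+> for I<C>. A non-capturing group
is used because the captures play no part in the comparison. The pattern
is anchored with C<^>, so only one start position is ever tried.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: A is "a", B is "x", and C is "\w+".
my $then  = 'ab' =~ /^(?:a(*THEN)x|\w+)$/  ? 'match' : 'no match';
my $prune = 'ab' =~ /^(?:a(*PRUNE)x|\w+)$/ ? 'match' : 'no match';

print "THEN:  $then\n";    # match: (*THEN) moves on to the \w+ branch
print "PRUNE: $prune\n";   # no match: (*PRUNE) abandons the attempt
```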
	2983
=item C<(*COMMIT)> C<(*COMMIT:I<arg>)>
	2985	X<(*COMMIT)>
	2986
	2987	This is the Raku "commit pattern" C<< <commit> >> or C<:::>. It's a
	2988	zero-width pattern similar to C<(*SKIP)>, except that when backtracked
into on failure it causes the match to fail outright. No further attempts
to find a valid match by advancing the start pointer will be made.
	2991	For example,
	2992
    'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
    print "Count=$count\n";
	2995
	2996	outputs
	2997
	2998	aaab
	2999	Count=1
	3000
	3001	In other words, once the C<(*COMMIT)> has been entered, and if the pattern
	3002	does not match, the regex engine will not try any further matching on the
	3003	rest of the string.
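
The sketch below makes the point that it is I<entering> C<(*COMMIT)> that
matters. The string starts with a C<b>, so the first attempt that reaches
the verb begins at offset 1. After that attempt fails, the second C<aaab>
at offset 5 is never tried. The variable names are illustrative.

```perl
use strict;
use warnings;

my @seen;
my $matched = 'baaabaaab' =~ /a+b?(*COMMIT)(?{ push @seen, $& })(*FAIL)/;

# Only one attempt ever reached the code block.
print 'matched: ', ($matched ? 'yes' : 'no'), "\n";   # matched: no
print "seen: @seen\n";                                # seen: aaab
```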
	3004
=item C<(*FAIL)> C<(*F)> C<(*FAIL:I<arg>)>
X<(*FAIL)> X<(*F)>
	3007
	3008	This pattern matches nothing and always fails. It can be used to force the
	3009	engine to backtrack. It is equivalent to C<(?!)>, but easier to read. In
	3010	fact, C<(?!)> gets optimised into C<(*FAIL)> internally. You can provide
	3011	an argument so that if the match fails because of this C<FAIL> directive
	3012	the argument can be obtained from C<$REGERROR>.
	3013
	3014	It is probably useful only when combined with C<(?{})> or C<(??{})>.
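
As one sketch of that combination, a code block that records C<$&>,
followed by a forced failure, enumerates every substring the engine tries.
Writing C<(?!)> in place of C<(*FAIL)> should give the same list. The
array names are illustrative.

```perl
use strict;
use warnings;

# Record every run of digits the engine tries, at every start position,
# then force a failure so that it keeps backtracking.
my (@with_fail, @with_neg);
'123' =~ /\d+(?{ push @with_fail, $& })(*FAIL)/;
'123' =~ /\d+(?{ push @with_neg,  $& })(?!)/;

print "@with_fail\n";   # 123 12 1 23 2 3
print "@with_neg\n";    # 123 12 1 23 2 3
```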
	3015
=item C<(*ACCEPT)> C<(*ACCEPT:I<arg>)>
	3017	X<(*ACCEPT)>
	3018
	3019	This pattern matches nothing and causes the end of successful matching at
	3020	the point at which the C<(*ACCEPT)> pattern was encountered, regardless of
	3021	whether there is actually more to match in the string. When inside of a
	3022	nested pattern, such as recursion, or in a subpattern dynamically generated
	3023	via C<(??{})>, only the innermost pattern is ended immediately.
	3024
	3025	If the C<(*ACCEPT)> is inside of capturing groups then the groups are
	3026	marked as ended at the point at which the C<(*ACCEPT)> was encountered.
	3027	For instance:
	3028
    'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
	3030
will match, with C<$1> set to C<"AB">, C<$2> set to C<"B">, and C<$3> left
unset. If another branch in the inner parentheses was matched, such as in the
	3033	string 'ACDE', then the C<"D"> and C<"E"> would have to be matched as well.
	3034
You can provide an argument, which will be available in the variable
C<$REGMARK> after the match completes.
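
The capture values described above can be checked with a short sketch that
matches the same pattern in list context. A successful match in list
context returns C<($1, $2, $3)>, with C<undef> for any group that was not
set. The variable names are illustrative.

```perl
use strict;
use warnings;

my $re = qr/(A (A|B(*ACCEPT)|C) D)(E)/x;

my @ab   = 'AB'   =~ $re;   # ('AB', 'B', undef): D and E were never needed
my @acde = 'ACDE' =~ $re;   # ('ACD', 'C', 'E'): no (*ACCEPT) on this path

print join(', ', map { defined($_) ? $_ : 'undef' } @ab), "\n";
print join(', ', @acde), "\n";
```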
	3037
	3038	=back
	3039
	3040	=back
	3041
	3042	=head2 Warning on C<\1> Instead of C<$1>
	3043
	3044	Some people get too used to writing things like:
	3045
	3046	$pattern =~ s/(\W)/\\\1/g;
	3047
This is grandfathered (for C<\1> to C<\9>) for the RHS of a substitute to
avoid shocking the B<sed> addicts, but it's a dirty habit to get into.
That's because in PerlThink, the righthand side of an C<s///> is a
double-quoted string. C<\1> in the usual double-quoted string means a
control-A. The customary Unix meaning of C<\1> is kludged in for
C<s///>. However, if you get into the habit of doing that, you get
yourself into trouble if you then add an C</e> modifier.
	3056
	3057	s/(\d+)/ \1 + 1 /eg; # causes warning under -w
	3058
	3059	Or if you try to do
	3060
	3061	s/(\d+)/\1000/;
	3062
	3063	You can't disambiguate that by saying C<\{1}000>, whereas you can fix it with
	3064	C<${1}000>. The operation of interpolation should not be confused
	3065	with the operation of matching a backreference. Certainly they mean two
	3066	different things on the I<left> side of the C<s///>.
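
A minimal sketch of the recommended spellings follows. C<${1}> keeps the
capture separate from the digits that follow it. Under C</e> the
replacement is ordinary Perl code, so C<$1> is needed there. The sample
strings are illustrative.

```perl
use strict;
use warnings;

# Use ${1} when the replacement text continues with digits.
(my $padded = 'item 7') =~ s/(\d+)/${1}000/;   # 'item 7000'

# Under /e the replacement is Perl code, where \1 would not mean the
# first capture group.
(my $bumped = 'a1 b22') =~ s/(\d+)/$1 + 1/eg;  # 'a2 b23'

print "$padded\n$bumped\n";
```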
	3067
	3068	=head2 Repeated Patterns Matching a Zero-length Substring
	3069
	3070	B<WARNING>: Difficult material (and prose) ahead. This section needs a rewrite.
	3071
	3072	Regular expressions provide a terse and powerful programming language. As
	3073	with most other power tools, power comes together with the ability
	3074	to wreak havoc.
	3075
	3076	A common abuse of this power stems from the ability to make infinite
	3077	loops using regular expressions, with something as innocuous as:
	3078
	3079	'foo' =~ m{ ( o? )* }x;
	3080
The C<o?> matches the empty string at the beginning of "C<foo>", and since
the position in the string is not moved by the match, C<o?> would match
again and again because of the C<"*"> quantifier. Another common way to
create a similar cycle is with the looping modifier C</g>:
	3085
	3086	@matches = ( 'foo' =~ m{ o? }xg );
	3087
	3088	or
	3089
	3090	print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
	3091
	3092	or the loop implied by C<split()>.
	3093
However, long experience has shown that many programming tasks may
be significantly simplified by using repeated subexpressions that
may match zero-length substrings. Here are two simple examples:
	3097
	3098	@chars = split //, $string; # // is not magic in split
	3099	($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
	3100
Thus Perl allows such constructs by I<forcefully breaking
the infinite loop>. The rules for this are different for lower-level
loops given by the greedy quantifiers C<*+{}>, and for higher-level
ones like the C</g> modifier or C<split()> operator.
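
The results of the two examples above can be checked with a short sketch.
The sample string is illustrative. The empty pattern matches between
characters and at both ends, and each zero-length match is followed by
progress through the string rather than by an endless loop.

```perl
use strict;
use warnings;

my $string = 'abc';

# split // matches the empty string between characters.
my @chars = split //, $string;              # ('a', 'b', 'c')

# s/()/ /g matches the empty string at every position, both ends included.
(my $whitewashed = $string) =~ s/()/ /g;    # ' a b c '

print join(',', @chars), "\n";
print "[$whitewashed]\n";
```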
	3105
	3106	The lower-level loops are I<interrupted> (that is, the loop is
	3107	broken) when Perl detects that a repeated expression matched a
	3108	zero-length substring. Thus
	3109
    m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;

is made equivalent to

    m{ (?: NON_ZERO_LENGTH )* (?: ZERO_LENGTH )? }x;
	3115
	3116	For example, this program
	3117
    #!perl -l
    "aaaaab" =~ /
      (?:
         a                 # non-zero
         |                 # or
        (?{print "hello"}) # print hello whenever this
                           #    branch is tried
        (?=(b))            # zero-width assertion
      )*  # any number of times
     /x;
    print $&;
    print $1;
	3130
	3131	prints
	3132
	3133	hello
	3134	aaaaa
	3135	b
	3136
	3137	Notice that "hello" is only printed once, as when Perl sees that the sixth
	3138	iteration of the outermost C<(?:)*> matches a zero-length string, it stops
	3139	the C<"*">.
	3140
The higher-level loops preserve an additional state between iterations:
whether the last match was zero-length. To break the loop, a match that
immediately follows a zero-length match may not itself be a zero-length
match at the same position. This prohibition interacts with backtracking
(see L</"Backtracking">), and so the I<second best> match is chosen if
the I<best> match is of zero length.
	3147
	3148	For example:
	3149
	3150	$_ = 'bar';
	3151	s/\w??/<$&>/g;
	3152
	3153	results in C<< <><b><><a><><r><> >>. At each position of the string the best
	3154	match given by non-greedy C<??> is the zero-length match, and the I<second
	3155	best> match is what is matched by C<\w>. Thus zero-length matches
	3156	alternate with one-character-long matches.
	3157
	3158	Similarly, for repeated C<m/()/g> the second-best match is the match at the
	3159	position one notch further in the string.
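
Both statements can be observed directly in the sketch below. The
substitution should reproduce the alternating result shown above.
Recording C<pos()> after each C<m/()/g> match shows the zero-length
matches advancing by one position each time. The variable names are
illustrative.

```perl
use strict;
use warnings;

# The best match of \w?? is zero-length; right after a zero-length match
# at a position, the second-best (one character) is used instead.
(my $tagged = 'bar') =~ s/\w??/<$&>/g;        # '<><b><><a><><r><>'

# For m/()/g the second-best match is one position further along.
my $s = 'abc';
my @positions;
push @positions, pos($s) while $s =~ /()/g;   # (0, 1, 2, 3)

print "$tagged\n@positions\n";
```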
	3160
	3161	The additional state of being I<matched with zero-length> is associated with
	3162	the matched string, and is reset by each assignment to C<pos()>.
	3163	Zero-length matches at the end of the previous match are ignored
	3164	during C<split>.
	3165
	3166	=head2 Combining RE Pieces
	3167
	3168	Each of the elementary pieces of regular expressions which were described
	3169	before (such as C<ab> or C<\Z>) could match at most one substring
	3170	at the given position of the input string. However, in a typical regular
	3171	expression these elementary pieces are combined into more complicated
patterns using combining operators C<ST>, C<S|T>, C<S*> I<etc>.
(In these examples C<"S"> and C<"T"> are regular subexpressions.)
	3174
	3175	Such combinations can include alternatives, leading to a problem of choice:
if we match a regular expression C<a|ab> against C<"abc">, will it match
	3177	substring C<"a"> or C<"ab">? One way to describe which substring is
	3178	actually matched is the concept of backtracking (see L</"Backtracking">).
	3179	However, this description is too low-level and makes you think
	3180	in terms of a particular implementation.
	3181
	3182	Another description starts with notions of "better"/"worse". All the
	3183	substrings which may be matched by the given regular expression can be
	3184	sorted from the "best" match to the "worst" match, and it is the "best"
match which is chosen. This replaces the question of "what is chosen?"
with the question of "which matches are better, and which are worse?".
	3187
	3188	Again, for elementary pieces there is no such question, since at most
	3189	one match at a given position is possible. This section describes the
	3190	notion of better/worse for combining operators. In the description
	3191	below C<"S"> and C<"T"> are regular subexpressions.
	3192
	3193	=over 4
	3194
	3195	=item C<ST>
	3196
Consider two possible matches, C<AB> and C<A'B'>, where C<"A"> and C<A'>
are substrings which can be matched by C<"S">, and C<"B"> and C<B'> are
substrings which can be matched by C<"T">.
	3200
	3201	If C<"A"> is a better match for C<"S"> than C<A'>, C<AB> is a better
	3202	match than C<A'B'>.
	3203
	3204	If C<"A"> and C<A'> coincide: C<AB> is a better match than C<AB'> if
	3205	C<"B"> is a better match for C<"T"> than C<B'>.
	3206
=item C<S|T>
	3208
	3209	When C<"S"> can match, it is a better match than when only C<"T"> can match.
	3210
When both matches are matches for C<"S">, they are ordered as they would
be for C<"S"> alone. The same holds for two matches for C<"T">.
	3213
	3214	=item C<S{REPEAT_COUNT}>
	3215
Matches as C<SSS...S>, with C<"S"> repeated exactly REPEAT_COUNT times.
	3217
	3218	=item C<S{min,max}>
	3219
Matches as C<S{max}|S{max-1}|...|S{min+1}|S{min}>.
	3221
	3222	=item C<S{min,max}?>
	3223
Matches as C<S{min}|S{min+1}|...|S{max-1}|S{max}>.
	3225
	3226	=item C<S?>, C<S*>, C<S+>
	3227
	3228	Same as C<S{0,1}>, C<S{0,BIG_NUMBER}>, C<S{1,BIG_NUMBER}> respectively.
	3229
	3230	=item C<S??>, C<S*?>, C<S+?>
	3231
	3232	Same as C<S{0,1}?>, C<S{0,BIG_NUMBER}?>, C<S{1,BIG_NUMBER}?> respectively.
	3233
	3234	=item C<< (?>S) >>
	3235
	3236	Matches the best match for C<"S"> and only that.
	3237
	3238	=item C<(?=S)>, C<(?<=S)>
	3239
	3240	Only the best match for C<"S"> is considered. (This is important only if
	3241	C<"S"> has capturing parentheses, and backreferences are used somewhere
	3242	else in the whole regular expression.)
	3243
	3244	=item C<(?!S)>, C<(?<!S)>
	3245
	3246	For this grouping operator there is no need to describe the ordering, since
	3247	only whether or not C<"S"> can match is important.
	3248
	3249	=item C<(??{ I<EXPR> })>, C<(?I<PARNO>)>
	3250
	3251	The ordering is the same as for the regular expression which is
	3252	the result of I<EXPR>, or the pattern contained by capture group I<PARNO>.
	3253
=item C<(?(I<condition>)I<yes-pattern>|I<no-pattern>)>
	3255
	3256	Recall that which of I<yes-pattern> or I<no-pattern> actually matches is
	3257	already determined. The ordering of the matches is the same as for the
	3258	chosen subexpression.
	3259
	3260	=back
	3261
	3262	The above recipes describe the ordering of matches I<at a given position>.
	3263	One more rule is needed to understand how a match is determined for the
	3264	whole regular expression: a match at an earlier position is always better
	3265	than a match at a later position.
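
A few of these rules can be seen in the small sketch below. Each pattern
wraps its subject in one capture group so that the chosen match can be
read back. The inputs are illustrative.

```perl
use strict;
use warnings;

# S|T: a match for S wins, even though T would match more text.
my ($short) = 'abc' =~ /(a|ab)/;       # 'a'
my ($long)  = 'abc' =~ /(ab|a)/;       # 'ab'

# S{min,max} tries the most repetitions first, S{min,max}? the fewest.
my ($greedy) = 'aaaa' =~ /(a{1,3})/;   # 'aaa'
my ($lazy)   = 'aaaa' =~ /(a{1,3}?)/;  # 'a'

# An earlier position always wins: 'x' at offset 0 beats 'ab' at offset 1.
my ($early) = 'xab' =~ /(ab|x)/;       # 'x'

print "$short $long $greedy $lazy $early\n";
```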
	3266
	3267	=head2 Creating Custom RE Engines
	3268
	3269	As of Perl 5.10.0, one can create custom regular expression engines. This
	3270	is not for the faint of heart, as they have to plug in at the C level. See
	3271	L<perlreapi> for more details.
	3272
	3273	As an alternative, overloaded constants (see L<overload>) provide a simple
	3274	way to extend the functionality of the RE engine, by substituting one
	3275	pattern for another.
	3276
Suppose that we want to enable a new RE escape-sequence C<\Y|> which
matches at a boundary between whitespace characters and non-whitespace
characters. Note that C<(?=\S)(?<!\S)|(?!\S)(?<=\S)> matches exactly
at these positions, so we want to have each C<\Y|> in the place of the
more complicated version. We can create a module C<customre> to do
this:
	3283
    package customre;
    use overload;

    sub import {
      shift;
      die "No argument to customre::import allowed" if @_;
      overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}

    # We must also take care of not escaping the legitimate \\Y|
    # sequence, hence the presence of '\\' in the conversion rules.
    my %rules = ( '\\' => '\\\\',
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
    sub convert {
      my $re = shift;
      $re =~ s{
                \\ ( \\ | Y . )
              }
              { $rules{$1} or invalid($re,$1) }sgex;
      return $re;
    }

    1;  # a module loaded with use/require must return a true value
	3307
	3308	Now C<use customre> enables the new escape in constant regular
	3309	expressions, I<i.e.>, those without any runtime variable interpolations.
	3310	As documented in L<overload>, this conversion will work only over
literal parts of regular expressions. For C<\Y|$re\Y|> the variable
part of this regular expression needs to be converted explicitly
(but only if the special meaning of C<\Y|> should be enabled inside C<$re>):
	3314
    use customre;
    $re = <>;
    chomp $re;
    $re = customre::convert $re;
    /\Y|$re\Y|/;
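
The claim that the lookaround alternation matches exactly at the
whitespace/non-whitespace boundaries can be checked on its own, without
the C<customre> module. The sketch below records each offset where it
matches in a sample string. The string and the variable names are
illustrative.

```perl
use strict;
use warnings;

# The expansion that the customre module substitutes for \Y|.
my $boundary = qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/;

# Record every offset at which the boundary pattern matches.
my $text = 'ab  cd';
my @at;
push @at, pos($text) while $text =~ /$boundary/g;

print "@at\n";   # 0 2 4 6
```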
	3320
	3321	=head2 Embedded Code Execution Frequency
	3322
	3323	The exact rules for how often C<(?{})> and C<(??{})> are executed in a pattern
	3324	are unspecified, and this is even more true of C<(*{})>.
In the case of a successful match you can assume that they DWIM and
will be executed in left to right order the appropriate number of times in the
accepting path of the pattern as would any other meta-pattern. How
non-accepting pathways and match failures affect the number of times a code
block is executed is specifically unspecified and may vary depending on what
optimizations can be applied to the pattern, and is likely to change from
version to version.
	3332
	3333	For instance in
	3334
	3335	"aaabcdeeeee"=~/a(?{print "a"})b(?{print "b"})cde/;
	3336
the exact number of times "a" or "b" are printed out is unspecified for
failure, but you may assume they will be printed at least once during
a successful match. Additionally, you may assume that if "b" is printed,
it will be preceded by at least one "a".
	3341
	3342	In the case of branching constructs like the following:
	3343
    /a(b|(?{ print "a" }))c(?{ print "c" })/;
	3345
	3346	you can assume that the input "ac" will output "ac", and that "abc"
	3347	will output only "c".
	3348
	3349	When embedded code is quantified, successful matches will call the
	3350	code once for each matched iteration of the quantifier. For
	3351	example:
	3352
	3353	"good" =~ /g(?:o(?{print "o"}))*d/;
	3354
	3355	will output "o" twice.
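
The two guarantees above can be checked with a short sketch. It appends
to a string instead of printing, and counts instead of printing "o". The
variable names are illustrative. The script assumes a reasonably modern
perl, in which a C<qr//> containing a code block closes over the lexical
C<$out>. Only successful matches are checked, since behaviour on failing
paths is unspecified.

```perl
use strict;
use warnings;

# Branches: code in an alternative runs only when that branch is taken.
my $out;
my $re = qr/a(b|(?{ $out .= "a" }))c(?{ $out .= "c" })/;

$out = '';
'ac' =~ $re;
my $for_ac = $out;      # 'ac'

$out = '';
'abc' =~ $re;
my $for_abc = $out;     # 'c'

# Quantified code runs once per matched iteration on the accepting path.
my $o_count = 0;
'good' =~ /g(?:o(?{ $o_count++ }))*d/;   # $o_count is now 2

print "$for_ac $for_abc $o_count\n";
```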
	3356
For historical and consistency reasons, the use of normal code blocks
anywhere in a pattern will disable certain optimisations. As of 5.37.7
you can use an "optimistic" code block, C<(*{ ... })>, as a replacement
for C<(?{ ... })> if you do not wish to disable these optimisations.
This may result in the code block being called less often than it would
have been had it not been optimistic.
	3363
	3364	=head2 PCRE/Python Support
	3365
	3366	As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions
	3367	to the regex syntax. While Perl programmers are encouraged to use the
	3368	Perl-specific syntax, the following are also accepted:
	3369
	3370	=over 4
	3371
	3372	=item C<< (?PE<lt>I<NAME>E<gt>I<pattern>) >>
	3373
	3374	Define a named capture group. Equivalent to C<< (?<I<NAME>>I<pattern>) >>.
	3375
	3376	=item C<< (?P=I<NAME>) >>
	3377
	3378	Backreference to a named capture group. Equivalent to C<< \g{I<NAME>} >>.
	3379
	3380	=item C<< (?P>I<NAME>) >>
	3381
	3382	Subroutine call to a named capture group. Equivalent to C<< (?&I<NAME>) >>.
	3383
	3384	=back
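
A small sketch of the three forms follows. The group names and the inputs
are illustrative. Note the difference between the last two matches: the
subroutine call C<< (?P>I<NAME>) >> reuses the group's I<pattern>, while
the backreference C<< (?P=I<NAME>) >> requires the same I<text> again.

```perl
use strict;
use warnings;

# (?P<NAME>...) defines a named group and (?P=NAME) refers back to it.
my $doubled = 'abab' =~ /^(?P<pair>ab)(?P=pair)$/ ? 1 : 0;   # 1
my $pair    = $+{pair};                                      # 'ab'

# (?P>NAME) reuses the group's pattern, so it need not match the same text.
my $called  = 'abcd' =~ /^(?P<two>\w\w)(?P>two)$/ ? 1 : 0;   # 1
my $backref = 'abcd' =~ /^(?P<two>\w\w)(?P=two)$/ ? 1 : 0;   # 0

print "$doubled $pair $called $backref\n";
```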
	3385
	3386	=head1 BUGS
	3387
	3388	There are a number of issues with regard to case-insensitive matching
	3389	in Unicode rules. See C<"i"> under L</Modifiers> above.
	3390
	3391	This document varies from difficult to understand to completely
	3392	and utterly opaque. The wandering prose riddled with jargon is
	3393	hard to fathom in several places.
	3394
	3395	This document needs a rewrite that separates the tutorial content
	3396	from the reference content.
	3397
	3398	=head1 SEE ALSO
	3399
	3400	The syntax of patterns used in Perl pattern matching evolved from those
	3401	supplied in the Bell Labs Research Unix 8th Edition (Version 8) regex
	3402	routines. (The code is actually derived (distantly) from Henry
	3403	Spencer's freely redistributable reimplementation of those V8 routines.)
	3404
	3405	L<perlrequick>.
	3406
	3407	L<perlretut>.
	3408
	3409	L<perlop/"Regexp Quote-Like Operators">.
	3410
	3411	L<perlop/"Gory details of parsing quoted constructs">.
	3412
	3413	L<perlfaq6>.
	3414
	3415	L<perlfunc/pos>.
	3416
	3417	L<perllocale>.
	3418
	3419	L<perlebcdic>.
	3420
	3421	I<Mastering Regular Expressions> by Jeffrey Friedl, published
	3422	by O'Reilly and Associates.