=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

C<m/pattern/msixpogcdualn> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g
    a  restrict \d, \s, \w and [:posix:] to match ASCII only
    aa (two a's) like /a, and /i never matches ASCII with non-ASCII
    l  match according to current locale
    u  match according to Unicode rules
    d  match according to native rules unless something indicates
       Unicode
    n  Non-capture mode. Don't let () fill in $1, $2, etc...

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.
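
As a small illustration of the modifiers listed above, the following
sketch combines C</x>, C</i> and C</g>. The sample text and variable
names are invented for the example.

    use strict;
    use warnings;

    my $text = "Alpha=1, beta=22, GAMMA=333";

    # x allows whitespace and comments inside the pattern.
    # In list context, g returns every capture from every match.
    my @pairs = $text =~ /
        ([a-z]+)    # key; the i modifier makes this match any case
        =
        (\d+)       # numeric value
    /xig;
    # @pairs is ('Alpha', 1, 'beta', 22, 'GAMMA', 333)

    # In scalar context, g steps through the matches one at a time.
    my $count = 0;
    $count++ while $text =~ /\d+/g;    # $count ends up as 3

Note that the pattern delimiter must not appear inside a C</x> comment,
since it would end the pattern early.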

C<qr/pattern/msixpodualn> lets you store a regex in a variable,
or pass one around. Modifiers as for C<m//>, and are stored
within the regex.
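
A minimal sketch of how modifiers travel with a C<qr//> object; the
strings and variable names are made up for illustration.

    use strict;
    use warnings;

    # The /i modifier is stored inside the compiled regex.
    my $word = qr/perl/i;

    # A qr// object can be matched directly ...
    my $direct = "I like Perl" =~ $word;           # true

    # ... or interpolated into a larger pattern, keeping its own /i.
    my $embedded = "PERL 5" =~ /^$word\s+\d+$/;    # true

    # The surrounding pattern stays case-sensitive outside $word.
    my $outer = "perl V" =~ /^$word\s+v$/;         # false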

C<s/pattern/replacement/msixpogcedualnr> substitutes matches of
'pattern' with 'replacement'. Modifiers as for C<m//>,
with two additions:

    e  Evaluate 'replacement' as an expression
    r  Return substitution and leave the original string untouched.

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (C<'>) is the delimiter.
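
The following sketch shows C</r>, C</e> and the single-quote delimiter
side by side. The data is invented for the example.

    use strict;
    use warnings;

    my $price = "apples 3, pears 4";

    # /r returns the modified copy; $price itself is left unchanged.
    my $shout = $price =~ s/(\w+)/\u$1/gr;          # "Apples 3, Pears 4"

    # /e evaluates the replacement as Perl code for each match.
    (my $doubled = $price) =~ s/(\d+)/$1 * 2/ge;    # "apples 6, pears 8"

    # With ' as the delimiter, the replacement is not interpolated.
    (my $literal = $price) =~ s'\d+'$n'g;           # 'apples $n, pears $n'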

C<m?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used.  Must be reset with reset().

=head2 SYNTAX

 \       Escapes the character immediately following it
 .       Matches any single character except a newline (unless /s is
           used)
 ^       Matches at the beginning of the string (or line, if /m is used)
 $       Matches at the end of the string (or line, if /m is used)
 *       Matches the preceding element 0 or more times
 +       Matches the preceding element 1 or more times
 ?       Matches the preceding element 0 or 1 times
 {...}   Specifies a range of occurrences for the element preceding it
 [...]   Matches any one of the characters contained within the brackets
 (...)   Groups subexpressions for capturing to $1, $2...
 (?:...) Groups subexpressions without capturing (cluster)
 |       Matches either the subexpression preceding or following it
 \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
 \1, \2, \3 ...           Matches the text from the Nth group
 \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
 \g{name}     Named backreference
 \k<name>     Named backreference
 \k'name'     Named backreference
 (?P=name)    Named backreference (python syntax)
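
A short sketch of the three backreference styles from the table above
(numbered, relative and named), using invented sample strings:

    use strict;
    use warnings;

    # \1 refers to the first capture group: find a doubled word.
    my ($dup) = "it was was here" =~ /\b(\w+)\s+\1\b/;    # "was"

    # \g{-2} and \g{-1} count backwards from where they appear.
    my $rel = "abab" =~ /^(a)(b)\g{-2}\g{-1}$/;           # true

    # A named group can be referenced by name.
    my $quoted;
    if (q{say "hello" now} =~ /(?<q>["'])(.*?)\k<q>/) {
        $quoted = $2;                                     # "hello"
    }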

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Char whose ordinal is the 3 octal digits, max \777
   \o{2307} Char whose ordinal is the octal number, unrestricted
   \x7f     Char whose ordinal is the 2 hex digits, max \xFF
   \x{263a} Char whose ordinal is the hex number, unrestricted
   \cx      Control-x
   \N{name} A named Unicode character or character sequence
   \N{U+263D} A Unicode character by hex ordinal

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \F  Foldcase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.
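
A brief sketch of C<\Q...\E> and the case-changing escapes; the
variable names and strings are illustrative only.

    use strict;
    use warnings;

    # \Q...\E quotes metacharacters in interpolated text, so the dot
    # and plus sign are matched literally.
    my $needle = "a.b+c";
    my $exact  = "xa.b+cx" =~ /\Q$needle\E/;    # true
    my $loose  = "aXbbc"   =~ /$needle/;        # true: . and + are special
    my $strict = "aXbbc"   =~ /\Q$needle\E/;    # false

    # Case escapes also work in double-quoted strings.
    my $name  = "jOHN";
    my $fixed = "\u\L$name\E";                  # "John"

    # \x{...} and \N{U+...} name the same character by code point.
    my $same = "\x{263A}" eq "\N{U+263A}";      # true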

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except C<\N>) work both inside and outside a
character class.
The first six are locale aware, all are Unicode aware. See L<perllocale>
and L<perlunicode> for details.

   \d      A digit
   \D      A nondigit
   \w      A word character
   \W      A non-word character
   \s      A whitespace character
   \S      A non-whitespace character
   \h      A horizontal whitespace
   \H      A non horizontal whitespace
   \N      A non newline (when not followed by '{NAME}';
           not valid in a character class; equivalent to [^\n]; it's
           like '.' without /s modifier)
   \v      A vertical whitespace
   \V      A non vertical whitespace
   \R      A generic newline           (?>\v|\x0D\x0A)

   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with name longer than 1 character
   \PP     Match non-P
   \P{...} Match lack of Unicode property with name longer than 1 char
   \X      Match Unicode extended grapheme cluster

POSIX character classes and their Unicode and Perl equivalents:

            ASCII-         Full-
   POSIX    range          range    backslash
 [[:...:]]  \p{...}        \p{...}   sequence    Description

 -----------------------------------------------------------------------
 alnum   PosixAlnum       XPosixAlnum            'alpha' plus 'digit'
 alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
 ascii   ASCII                                   Any ASCII character
 blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
                                                   full-range also
                                                   written as
                                                   \p{HorizSpace} (GNU
                                                   extension)
 cntrl   PosixCntrl       XPosixCntrl            Control characters
 digit   PosixDigit       XPosixDigit   \d       Decimal digits
 graph   PosixGraph       XPosixGraph            'alnum' plus 'punct'
 lower   PosixLower       XPosixLower            Lowercase characters
 print   PosixPrint       XPosixPrint            'graph' plus 'space',
                                                   but not any Controls
 punct   PosixPunct       XPosixPunct            Punctuation and Symbols
                                                   in ASCII-range; just
                                                   punct outside it
 space   PosixSpace       XPosixSpace   \s       Whitespace
 upper   PosixUpper       XPosixUpper            Uppercase characters
 word    PosixWord        XPosixWord    \w       'alnum' + Unicode marks
                                                    + connectors, like
                                                    '_' (Perl extension)
 xdigit  ASCII_Hex_Digit  XPosixXDigit           Hexadecimal digit,
                                                    ASCII-range is
                                                    [0-9A-Fa-f]

Also, various synonyms like C<\p{Alpha}> for C<\p{XPosixAlpha}>; all listed
in L<perluniprops/Properties accessible through \p{} and \P{}>

Within a character class:

    POSIX      traditional   Unicode
  [:digit:]       \d        \p{Digit}
  [:^digit:]      \D        \P{Digit}
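
To illustrate the difference between the ASCII-range and full-range
columns, this sketch tests a non-ASCII digit against several forms. The
choice of U+0663 (ARABIC-INDIC DIGIT THREE) is just an example.

    use strict;
    use warnings;

    my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE

    my $unicode = $arabic_three =~ /^\d$/;                # true
    my $ascii   = $arabic_three =~ /^\d$/a;               # false
    my $posix   = $arabic_three =~ /^[[:digit:]]$/a;      # false
    my $prop    = $arabic_three =~ /^\p{PosixDigit}$/;    # false
    my $xprop   = $arabic_three =~ /^\p{XPosixDigit}$/;   # true

    # Negated and ordinary POSIX classes inside a bracketed class.
    my @non_digits = "a1-b2"  =~ /[[:^digit:]]/g;     # ('a', '-', 'b')
    my @hex        = "0xBEEF" =~ /[[:xdigit:]]+/g;    # ('0', 'BEEF')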

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b{} Match boundary of type specified within the braces
   \B{} Match wherever \b{} doesn't match
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off
   \K Keep the stuff left of the \K, don't include it in $&
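
The sketch below exercises several of these anchors. The tiny
tokenizer is only an illustration of C<\G> with C</gc>, not a
recommended lexer design.

    use strict;
    use warnings;

    my $line = "foo\n";
    my $Z = $line =~ /foo\Z/;    # true: \Z allows one trailing newline
    my $z = $line =~ /foo\z/;    # false: \z is the absolute end

    # /m lets ^ match at the start of each internal line.
    my @starts = "one\ntwo\nthree" =~ /^(\w)/mg;    # ('o', 't', 't')

    # \G with /gc resumes exactly where the previous match stopped.
    my $src = "12+34";
    my @tokens;
    for ($src) {
        while (1) {
            if    (/\G(\d+)/gc)  { push @tokens, "NUM:$1" }
            elsif (/\G([-+])/gc) { push @tokens, "OP:$1" }
            else                 { last }
        }
    }
    # @tokens is ('NUM:12', 'OP:+', 'NUM:34')

    # \K drops everything matched so far from the text being replaced.
    (my $kept = "price: 42") =~ s/price: \K\d+/99/;    # "price: 99"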

=head2 QUANTIFIERS

Quantifiers are greedy by default: each matches as B<many> times as it
can while still allowing the rest of the pattern to match.

   Maximal Minimal Possessive Allowed range
   ------- ------- ---------- -------------
   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                              but no more than m times
   {n,}    {n,}?   {n,}+      Must occur at least n times
   {,n}    {,n}?   {,n}+      Must occur at most n times
   {n}     {n}?    {n}+       Must occur exactly n times
   *       *?      *+         0 or more times (same as {0,})
   +       +?      ++         1 or more times (same as {1,})
   ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.
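
A minimal comparison of the three quantifier flavours, using invented
sample strings:

    use strict;
    use warnings;

    my ($greedy)  = "<b>bold</b>" =~ /<(.+)>/;     # "b>bold</b"
    my ($minimal) = "<b>bold</b>" =~ /<(.+?)>/;    # "b"

    # A greedy quantifier gives characters back so the match succeeds;
    my $backtracks = "aaa" =~ /^a+a$/;     # true
    # a possessive one never does, so nothing is left for the last "a".
    my $possessive = "aaa" =~ /^a++a$/;    # false

    # Counted forms.
    my $exact = "2024"  =~ /^\d{4}$/;      # true
    my $range = "12345" =~ /^\d{2,4}$/;    # false: five digits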

=head2 EXTENDED CONSTRUCTS

   (?#text)          A comment
   (?:...)           Groups subexpressions without capturing (cluster)
   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)           Zero-width positive lookahead assertion
   (*pla:...)        Same, starting in 5.32; experimentally in 5.28
   (*positive_lookahead:...) Same, same versions as *pla
   (?!...)           Zero-width negative lookahead assertion
   (*nla:...)        Same, starting in 5.32; experimentally in 5.28
   (*negative_lookahead:...) Same, same versions as *nla
   (?<=...)          Zero-width positive lookbehind assertion
   (*plb:...)        Same, starting in 5.32; experimentally in 5.28
   (*positive_lookbehind:...) Same, same versions as *plb
   (?<!...)          Zero-width negative lookbehind assertion
   (*nlb:...)        Same, starting in 5.32; experimentally in 5.28
   (*negative_lookbehind:...) Same, same versions as *nlb
   (?>...)           Grab what we can, prohibit backtracking
   (*atomic:...)     Same, starting in 5.32; experimentally in 5.28
   (?|...)           Branch reset
   (?<name>...)      Named capture
   (?'name'...)      Named capture
   (?P<name>...)     Named capture (python syntax)
   (?[...])          Extended bracketed character class
   (?{ code })       Embedded code, return value becomes $^R
   (??{ code })      Dynamic regex, return value used as regex
   (?N)              Recurse into subpattern number N
   (?-N), (?+N)      Recurse into Nth previous/next subpattern
   (?R), (?0)        Recurse at the beginning of the whole pattern
   (?&name)          Recurse into a named subpattern
   (?P>name)         Recurse into a named subpattern (python syntax)
   (?(cond)yes|no)
   (?(cond)yes)      Conditional expression, where "(cond)" can be:
                     (?=pat)   lookahead; also (*pla:pat)
                               (*positive_lookahead:pat)
                     (?!pat)   negative lookahead; also (*nla:pat)
                               (*negative_lookahead:pat)
                     (?<=pat)  lookbehind; also (*plb:pat)
                               (*positive_lookbehind:pat)
                     (?<!pat)  negative lookbehind; also (*nlb:pat)
                               (*negative_lookbehind:pat)
                     (N)       subpattern N has matched something
                     (<name>)  named subpattern has matched something
                     ('name')  named subpattern has matched something
                     (?{code}) code condition
                     (R)       true if recursing
                     (RN)      true if recursing into Nth subpattern
                     (R&name)  true if recursing into named subpattern
                     (DEFINE)  always false, no no-pattern allowed
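
The following sketch touches a few of the constructs above: named
captures, lookaround, branch reset, recursion and a conditional. All
patterns and strings are illustrative assumptions, not canonical idioms.

    use strict;
    use warnings;

    # Named captures are available through %+ after a successful match.
    my %date;
    if ("2024-05-17" =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
        %date = %+;    # year => 2024, month => '05', day => 17
    }

    # Lookbehind and lookahead assert context without consuming it.
    my @usd  = "USD 10, EUR 20, USD 30" =~ /(?<=USD )\d+/g;    # (10, 30)
    my $no_u = "Iraq" =~ /q(?!u)/;                             # true

    # Branch reset: both alternatives capture into $1.
    my ($num) = "N:7" =~ /(?|n=(\d)|N:(\d))/;                  # 7

    # (?1) recurses into group 1 to match balanced parentheses.
    my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
    my $ok  = "(a(b)c)" =~ $balanced;    # true
    my $bad = "(a(b c"  =~ $balanced;    # false

    # Conditional: ">" is required only if "<" was captured.
    my $cond    = qr/^(<)?\w+(?(1)>)$/;
    my @results = map { $_ =~ $cond ? 1 : 0 } "<tag>", "tag", "<tag";
    # @results is (1, 1, 0)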

=head2 VARIABLES

   $_    Default variable for operators to use

   $`    Everything prior to matched string
   $&    Entire matched string
   $'    Everything after matched string

   ${^PREMATCH}   Everything prior to matched string
   ${^MATCH}      Entire matched string
   ${^POSTMATCH}  Everything after matched string

Note to those still using Perl 5.18 or earlier:
The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.
From Perl 5.20 onward, the use of C<$`>, C<$&> and C<$'> makes no speed
difference.

   $1, $2 ...  Hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match
   %+    Named capture groups
   %-    Named capture groups, as array refs

Captured groups are numbered according to their I<opening> paren.
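
A small sketch that collects several of these variables after one
match. The hash C<%seen> and its keys are invented for the example.

    use strict;
    use warnings;

    my $str = "say key=value now";
    my %seen;
    if ($str =~ /(?<name>\w+)=(\w+)/p) {
        %seen = (
            pre        => ${^PREMATCH},     # "say "
            match      => ${^MATCH},        # "key=value"
            post       => ${^POSTMATCH},    # " now"
            start      => $-[0],            # 4
            end        => $+[0],            # 13
            name       => $+{name},         # "key"
            last_group => $+,               # "value"
            # @- and @+ give the offsets of group 2, the same text as $2
            group2     => substr($str, $-[2], $+[2] - $-[2]),
        );
    }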

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string
   fc          Foldcase a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset m?pattern? status
   study       Analyze string for optimizing matching

   split       Use a regex to split a string into parts

The first five of these are like the escape sequences C<\L>, C<\l>,
C<\U>, C<\u>, and C<\F>.  For Titlecase, see L</Titlecase>; for
Foldcase, see L</Foldcase>.
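
A brief sketch of some of these functions in use, assuming Perl 5.16
or later for C<fc>. The sample strings are illustrative.

    use strict;
    use warnings;
    use feature 'fc';    # fc() needs Perl 5.16 or later

    # quotemeta() is the function form of \Q...\E.
    my $safe = quotemeta("1+1=2");               # '1\+1\=2'

    # pos() reports where the last successful /g match ended.
    my $s = "aXbXc";
    $s =~ /X/g;
    my $first = pos($s);                         # 2
    $s =~ /X/g;
    my $second = pos($s);                        # 4

    # split() with a regex; captured separators are returned as well.
    my @fields = split /\s*,\s*/, "a , b,c";     # ('a', 'b', 'c')
    my @parts  = split /([-+])/, "1+2-3";        # (1, '+', 2, '-', 3)

    # fc() gives a key for caseless comparison.
    my $same = fc("HeLLo") eq fc("hEllO");       # true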

=head2 TERMINOLOGY

=head3 Titlecase

Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.

=head3 Foldcase

Unicode form that is useful when comparing strings regardless of case,
as certain characters have complex one-to-many case mappings. Primarily a
variant of lowercase.
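
The "sharp s" mentioned above makes a compact illustration of both
terms. This sketch assumes Perl 5.16 or later and enables the
C<unicode_strings> feature so that Unicode rules apply to the string.

    use strict;
    use warnings;
    use feature qw(fc unicode_strings);    # Perl 5.16 or later

    my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S

    my $upper = uc($sharp_s);         # "SS"
    my $title = ucfirst($sharp_s);    # "Ss": titlecase, not uppercase
    my $fold  = fc($sharp_s);         # "ss"

    # Folding makes the two spellings compare equal; lc() does not.
    my $via_fc = fc("stra${sharp_s}e") eq fc("STRASSE");    # true
    my $via_lc = lc("stra${sharp_s}e") eq lc("STRASSE");    # false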

=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging Regular Expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(L<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut