This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
# regcomp.sym
#
# File has two sections, divided by a line of dashes '-'.
#
# Rows that are empty once #-comments have been removed from the input are ignored.
#
# First section is for regops, second section is for regmatch-states.
#
# Note that the order in this file is important.
#
# Format for first section:
# NAME \t TYPE, arg-description [num-args] [longjump-len] \t DESCRIPTION
#
#

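The row format above can be illustrated with a small parser. The following is a hypothetical Python sketch, not the generator the Perl build uses to read this file; `RegopDef`, `parse_regop_line` and `number_regops` are invented names. It assumes the columns are tab-separated as the format line says, and that opcodes are numbered by their position in the file starting at 0, which is what the ranges in the `#*` headings (END and SUCCEED at 0 and 1, the anchors at 2..13) imply.

```python
# Hypothetical sketch of reading first-section rows; not the Perl build's generator.
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class RegopDef:
    name: str                    # e.g. "CURLY"
    type: str                    # the TYPE column, e.g. "BRANCHJ"
    arg_desc: str                # e.g. "sv", "no", "off"
    num_args: Optional[str]      # e.g. "2", "2L", "charclass" (kept as text)
    longjump_len: Optional[str]  # e.g. "2" for IFMATCH
    description: str


def parse_regop_line(line: str) -> Optional[RegopDef]:
    """Parse one first-section row; return None for blank or #-comment rows."""
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    # NAME \t TYPE, arg-description [num-args] [longjump-len] \t DESCRIPTION
    fields = re.split(r"\t+", line.strip(), maxsplit=2)
    if len(fields) != 3:
        raise ValueError(f"malformed regop row: {line!r}")
    name, middle, description = fields
    type_, _, rest = middle.partition(",")
    extra = rest.split()
    return RegopDef(
        name=name,
        type=type_.strip(),
        arg_desc=extra[0] if extra else "",
        num_args=extra[1] if len(extra) > 1 else None,
        longjump_len=extra[2] if len(extra) > 2 else None,
        description=description.strip(),
    )


def number_regops(lines: Iterable[str]) -> Dict[str, int]:
    """Assign opcode numbers by position in the file, starting at 0."""
    numbers: Dict[str, int] = {}
    for line in lines:
        parsed = parse_regop_line(line)
        if parsed is not None:
            numbers[parsed.name] = len(numbers)
    return numbers
```

The num-args column is not always numeric (it can read `2L` or `charclass`), so the sketch keeps it as a string.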
#* Exit points (0,1)

END	END, no	End of program.
SUCCEED	END, no	Return from a subroutine, basically.

#* Anchors: (2..13)

BOL	BOL, no	Match "" at beginning of line.
MBOL	BOL, no	Same, assuming multiline.
SBOL	BOL, no	Same, assuming singleline.
EOS	EOL, no	Match "" at end of string.
EOL	EOL, no	Match "" at end of line.
MEOL	EOL, no	Same, assuming multiline.
SEOL	EOL, no	Same, assuming singleline.
BOUND	BOUND, no	Match "" at any word boundary
BOUNDL	BOUND, no	Match "" at any word boundary in locale
NBOUND	NBOUND, no	Match "" at any word non-boundary
NBOUNDL	NBOUND, no	Match "" at any word non-boundary in locale
GPOS	GPOS, no	Matches where last m//g left off.

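As a reading aid for the anchor descriptions, here is a sketch of each zero-width test as a predicate on a string `s` and position `i`. It assumes the usual Perl meanings: `^` without /m (BOL), `^` with /m (MBOL, which does not match after a newline that ends the string), `\z` (EOS), `$` without /m (EOL, which also matches before a final newline), `$` with /m (MEOL) and `\b` (BOUND). The function names are hypothetical, `\w` is approximated with ASCII letters, digits and underscore, and the singleline and locale variants are not modeled separately.

```python
# Hypothetical predicates for the anchor regops; 0 <= i <= len(s) is assumed.

def is_word(ch: str) -> bool:
    # ASCII approximation of \w; locale rules (BOUNDL/NBOUNDL) are not modeled.
    return ch.isascii() and (ch.isalnum() or ch == "_")

def bol(s: str, i: int) -> bool:     # BOL: ^ without /m
    return i == 0

def mbol(s: str, i: int) -> bool:    # MBOL: ^ with /m, not after a final newline
    return i == 0 or (i < len(s) and s[i - 1] == "\n")

def eos(s: str, i: int) -> bool:     # EOS: only at the very end of the string
    return i == len(s)

def eol(s: str, i: int) -> bool:     # EOL: at the end, or just before a final newline
    return i == len(s) or (i == len(s) - 1 and s[i] == "\n")

def meol(s: str, i: int) -> bool:    # MEOL: at the end, or before any newline
    return i == len(s) or s[i] == "\n"

def bound(s: str, i: int) -> bool:   # BOUND: \w-ness differs on the two sides
    before = i > 0 and is_word(s[i - 1])
    after = i < len(s) and is_word(s[i])
    return before != after

def nbound(s: str, i: int) -> bool:  # NBOUND: the complement of BOUND
    return not bound(s, i)
```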
#* [Special] alternatives: (14..30)

REG_ANY	REG_ANY, no	Match any one character (except newline).
SANY	REG_ANY, no	Match any one character.
CANY	REG_ANY, no	Match any one byte.
ANYOF	ANYOF, sv	Match character in (or not in) this class.
ALNUM	ALNUM, no	Match any alphanumeric character
ALNUML	ALNUM, no	Match any alphanumeric char in locale
NALNUM	NALNUM, no	Match any non-alphanumeric character
NALNUML	NALNUM, no	Match any non-alphanumeric char in locale
SPACE	SPACE, no	Match any whitespace character
SPACEL	SPACE, no	Match any whitespace char in locale
NSPACE	NSPACE, no	Match any non-whitespace character
NSPACEL	NSPACE, no	Match any non-whitespace char in locale
DIGIT	DIGIT, no	Match any numeric character
DIGITL	DIGIT, no	Match any numeric character in locale
NDIGIT	NDIGIT, no	Match any non-numeric character
NDIGITL	NDIGIT, no	Match any non-numeric character in locale
CLUMP	CLUMP, no	Match any combining character sequence

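Most of these regops consume exactly one character that passes a fixed test, and each N... form is the complement of its positive form. A minimal sketch of that idea, assuming ASCII-only definitions (the `\s` set below, space, tab, newline, carriage return and form feed, is an assumption); `CLASS_OPS` and `match_class` are hypothetical names, and CANY, ANYOF, CLUMP and the locale (...L) variants are not modeled.

```python
# Hypothetical one-character class matchers; ASCII-only approximations.
from typing import Callable, Dict, Optional


def _word(c: str) -> bool:
    return c.isascii() and (c.isalnum() or c == "_")


def _space(c: str) -> bool:
    # Assumed \s set for this sketch: space, \t, \n, \r, \f (no vertical tab).
    return c in " \t\n\r\f"


def _digit(c: str) -> bool:
    return "0" <= c <= "9"


CLASS_OPS: Dict[str, Callable[[str], bool]] = {
    "REG_ANY": lambda c: c != "\n",  # any character except newline
    "SANY": lambda c: True,          # any character at all
    "ALNUM": _word,
    "NALNUM": lambda c: not _word(c),
    "SPACE": _space,
    "NSPACE": lambda c: not _space(c),
    "DIGIT": _digit,
    "NDIGIT": lambda c: not _digit(c),
}


def match_class(op: str, s: str, i: int) -> Optional[int]:
    """Consume one character at s[i] if it satisfies op; return the new position."""
    if i < len(s) and CLASS_OPS[op](s[i]):
        return i + 1
    return None
```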
#* Alternation (31)

# BRANCH        The set of branches constituting a single choice are hooked
#               together with their "next" pointers, since precedence prevents
#               anything being concatenated to any individual branch.  The
#               "next" pointer of the last BRANCH in a choice points to the
#               thing following the whole choice.  This is also where the
#               final "next" pointer of each individual branch points; each
#               branch starts with the operand node of a BRANCH node.
#
BRANCH	BRANCH, node	Match this alternative, or the next...

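A minimal sketch of these linking rules for a plain `(w1|w2|...)` choice followed by END. `Node`, `compile_alternation` and `match_at` are hypothetical names; only BRANCH, EXACT and END are modeled, and `next` holds an absolute list index where a compiled program would store an offset.

```python
# Hypothetical model of BRANCH chaining; not the regex engine's data layout.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    op: str              # "BRANCH", "EXACT" or "END" in this sketch
    next: Optional[int]  # absolute index of the following node (None for END)
    arg: str = ""        # literal text for EXACT


def compile_alternation(words: List[str]) -> List[Node]:
    """Lay out (w1|w2|...) followed by END using the linking rules above."""
    end = 2 * len(words)  # each alternative is a BRANCH followed by its operand
    prog: List[Node] = []
    for k, word in enumerate(words):
        is_last = k == len(words) - 1
        # BRANCHes are chained through "next"; the last one points past the choice.
        prog.append(Node("BRANCH", end if is_last else 2 * (k + 1)))
        # The operand's final "next" also points to the thing after the choice.
        prog.append(Node("EXACT", end, word))
    prog.append(Node("END", None))
    return prog


def match_at(prog: List[Node], pc: int, s: str, i: int) -> Optional[int]:
    """Return the end position of a match of prog starting at node pc, or None."""
    node = prog[pc]
    if node.op == "END":
        return i
    if node.op == "EXACT":
        if s.startswith(node.arg, i):
            return match_at(prog, node.next, s, i + len(node.arg))
        return None
    if node.op == "BRANCH":
        while True:
            # The alternative's operand is the node right after the BRANCH.
            result = match_at(prog, pc + 1, s, i)
            if result is not None:
                return result
            nxt = prog[pc].next
            if nxt is None or prog[nxt].op != "BRANCH":
                return None  # no more alternatives in this choice
            pc = nxt
    raise ValueError(f"unsupported op {node.op}")
```

Because the alternatives are tried in chain order, `cat|ca` matches three characters of `"cats"`, never two.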
#*Back pointer (32)

# BACK          Normal "next" pointers all implicitly point forward; BACK
#               exists to make loop structures possible.
# not used
BACK	BACK, no	Match "", "next" ptr points backward.

#*Literals (33..35)

EXACT	EXACT, sv	Match this string (preceded by length).
EXACTF	EXACT, sv	Match this string, folded (prec. by length).
EXACTFL	EXACT, sv	Match this string, folded in locale (w/len).

#*Do nothing types (36..37)

NOTHING	NOTHING,no	Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL	NOTHING,no	Match empty string. Can jump here from outside.

#*Loops (38..44)

# STAR,PLUS     '?', and complex '*' and '+', are implemented as circular
#               BRANCH structures using BACK.  Simple cases (one character
#               per match) are implemented with STAR and PLUS for speed
#               and to minimize recursive plunges.
#
STAR	STAR, node	Match this (simple) thing 0 or more times.
PLUS	PLUS, node	Match this (simple) thing 1 or more times.

CURLY	CURLY, sv 2	Match this simple thing {n,m} times.
CURLYN	CURLY, no 2	Match next-after-this simple thing
#                               {n,m} times, set parenths.
CURLYM	CURLY, no 2	Match this medium-complex thing {n,m} times.
CURLYX	CURLY, sv 2	Match this complex thing {n,m} times.

# This terminator creates a loop structure for CURLYX
WHILEM	WHILEM, no	Do curly processing and see if rest matches.

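For the simple case (one character per match) STAR, PLUS and CURLY can be pictured as: count how far the operand repeats, then hand each candidate end position to the rest of the pattern, backing off one step at a time. The sketch below assumes that model; `match_curly` and its `rest` continuation are hypothetical, `greedy=False` stands in for a preceding MINMOD, and CURLYN, CURLYM, CURLYX and WHILEM are not modeled.

```python
# Hypothetical sketch of {n,m} repetition of a one-character operand.
from typing import Callable, Optional

Rest = Callable[[str, int], Optional[int]]  # matches the rest of the pattern at j


def match_curly(s: str, i: int, pred: Callable[[str], bool], n: int,
                m: Optional[int], rest: Rest, greedy: bool = True) -> Optional[int]:
    """Match pred {n,m} times at s[i:], then rest; return the overall end position.

    m=None means unbounded, so STAR is n=0 and PLUS is n=1.
    """
    # Count the longest run of matching characters, capped at m.
    limit = len(s) - i if m is None else min(m, len(s) - i)
    count = 0
    while count < limit and pred(s[i + count]):
        count += 1
    if count < n:
        return None
    # Greedy tries the longest run first; minimal (MINMOD) tries the shortest first.
    tries = range(count, n - 1, -1) if greedy else range(n, count + 1)
    for k in tries:
        end = rest(s, i + k)
        if end is not None:
            return end
    return None
```

Since the operand is one character wide, backing off is just decrementing a counter, with no need to re-run the operand.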
#*Buffer related (45..49)

# OPEN,CLOSE,GROUPP     ...are numbered at compile time.
OPEN	OPEN, num 1	Mark this point in input as start of #n.
CLOSE	CLOSE, num 1	Analogous to OPEN.

REF	REF, num 1	Match some already matched string
REFF	REF, num 1	Match already matched string, folded
REFFL	REF, num 1	Match already matched string, folded in loc.

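REF compares the input against the text a group captured between its OPEN and CLOSE marks. A sketch, assuming the captured text is passed in as a string (or `None` when the group has not matched, which this sketch treats as a failure); `match_ref` is a hypothetical name, REFF is approximated with `str.lower()`, and REFFL's locale rules are not modeled.

```python
# Hypothetical sketch of REF/REFF; the capture is supplied by the caller.
from typing import Optional


def match_ref(s: str, i: int, captured: Optional[str], fold: bool = False) -> Optional[int]:
    """Match previously captured text at s[i:]; return the new position or None."""
    if captured is None:
        return None  # group #n has not matched: fail in this sketch
    candidate = s[i:i + len(captured)]
    if len(candidate) != len(captured):
        return None  # not enough input left
    if fold:
        ok = candidate.lower() == captured.lower()  # rough stand-in for REFF folding
    else:
        ok = candidate == captured
    return i + len(captured) if ok else None
```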
#*Grouping assertions (50..54)

IFMATCH	BRANCHJ,off 1 2	Succeeds if the following matches.
UNLESSM	BRANCHJ,off 1 2	Fails if the following matches.
SUSPEND	BRANCHJ,off 1 1	"Independent" sub-RE.
IFTHEN	BRANCHJ,off 1 1	Switch, should be preceded by switcher.
GROUPP	GROUPP, num 1	Whether the group matched.

#*Support for long RE (55..56)

LONGJMP	LONGJMP,off 1 1	Jump far away.
BRANCHJ	BRANCHJ,off 1 1	BRANCH with long offset.

#*The heavy worker (57)

EVAL	EVAL, evl 1	Execute some Perl code.

#*Modifiers (58..59)

MINMOD	MINMOD, no	Next operator is not greedy.
LOGICAL	LOGICAL,no	Next opcode should set the flag only.

# This is not used yet (60)
RENUM	BRANCHJ,off 1 1	Group with independently numbered parens.

#*Trie Related (61..64)

# Behave the same as A|LIST|OF|WORDS would. The '..C' variants have
# inline charclass data (ASCII only); that is, they store it in the structure.
# NOTE: the relative order of the TRIE-like regops is significant

TRIE	TRIE, trie 1	Match many EXACT(FL?)? at once. flags==type
TRIEC	TRIE, trie charclass	Same as TRIE, but with embedded charclass data

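The claim that a trie behaves like `A|LIST|OF|WORDS` can be made concrete: one walk down the trie finds every word that matches at the current position, and taking those hits in alternative order reproduces the order in which the BRANCH chain would try them. A hypothetical Python sketch of that idea only; folding (the EXACTF/EXACTFL cases) and the charclass storage of TRIEC are not modeled.

```python
# Hypothetical trie sketch; plain characters only, no folding or charclass data.
from typing import Dict, List, Tuple


def build_trie(words: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, List[int]]]:
    """Build a trie: trans[state][ch] -> child state, accept[state] -> word indices."""
    trans: List[Dict[str, int]] = [{}]  # state 0 is the root
    accept: Dict[int, List[int]] = {}
    for idx, word in enumerate(words):
        state = 0
        for ch in word:
            if ch not in trans[state]:
                trans.append({})
                trans[state][ch] = len(trans) - 1
            state = trans[state][ch]
        accept.setdefault(state, []).append(idx)
    return trans, accept


def trie_candidates(words: List[str], s: str, i: int) -> List[Tuple[int, int]]:
    """Return (word_index, end_pos) for every word matching at s[i:].

    One walk finds all of them; sorting by word index gives the order in
    which the alternation words[0]|words[1]|... would try them.
    """
    trans, accept = build_trie(words)
    found: List[Tuple[int, int]] = []
    state, j = 0, i
    while True:
        for idx in accept.get(state, []):
            found.append((idx, j))
        if j >= len(s) or s[j] not in trans[state]:
            break
        state = trans[state][s[j]]
        j += 1
    return sorted(found)
```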
# For start classes, contains an added fail table.
AHOCORASICK	TRIE, trie 1	Aho Corasick stclass. flags==type
AHOCORASICKC	TRIE, trie charclass	Same as AHOCORASICK, but with embedded charclass data

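The added fail table is what turns the trie into an Aho-Corasick automaton: on a mismatch the scan follows a fail link to the longest proper suffix of the text read so far that is still a trie prefix, so every occurrence of every word is found in one left-to-right pass without rescanning. That is what makes it usable for finding where a match could start. A textbook sketch with hypothetical names; how the engine consumes the start class is not modeled.

```python
# Hypothetical textbook Aho-Corasick sketch; not the engine's stclass code.
from collections import deque
from typing import Dict, List, Tuple


def build_aho_corasick(words: List[str]) -> Tuple[List[Dict[str, int]], List[int], List[List[int]]]:
    """Return (goto, fail, out); out[state] lists lengths of words ending there."""
    goto: List[Dict[str, int]] = [{}]
    out: List[List[int]] = [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(len(word))
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:                     # breadth-first: shallower fail targets are done first
        r = queue.popleft()
        for ch, u in goto[r].items():
            queue.append(u)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[u] = goto[f].get(ch, 0)
            out[u] = out[u] + out[fail[u]]  # also report words ending at the fail state
    return goto, fail, out


def find_all(words: List[str], text: str) -> List[Tuple[int, int]]:
    """Return sorted (start, end) spans of every occurrence of any word, in one pass."""
    goto, fail, out = build_aho_corasick(words)
    state = 0
    spans: List[Tuple[int, int]] = []
    for j, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # follow fail links instead of rescanning the text
        state = goto[state].get(ch, 0)
        for length in out[state]:
            spans.append((j - length + 1, j + 1))
    return sorted(spans)
```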
#*Recursion (65..66)
RECURSE	RECURSE, num/ofs 2L	recurse to paren arg1 at (signed) ofs arg2
SRECURSE	RECURSE, no	recurse to start of pattern

# NEW STUFF ABOVE THIS LINE -- Please update counts below.

################################################################################

#*SPECIAL REGOPS (67, 68)

# This is not really a node, but an optimized away piece of a "long" node.
# To simplify debugging output, we mark it as if it were a node
OPTIMIZED	NOTHING,off	Placeholder for dump.

# Special opcode with the property that no opcode in a compiled program
# will ever be of this type. Thus it can be used as a flag value that
# no other opcode has been seen. END is used similarly, in that an END
# node can't be optimized. So END implies "unoptimizable" and PSEUDO means
# "not seen anything to optimize yet".
PSEUDO	PSEUDO,off	Pseudo opcode for internal use.

-------------------------------------------------------------------------------
# Format for second section:
# REGOP \t typelist [ \t typelist] [# Comment]
# typelist= namelist
#         = namelist:FAIL
#         = name:count

# Anything below is a state
#
#
TRIE	next:FAIL
EVAL	AB:FAIL
CURLYX	resume
WHILEM	resume:6
BRANCH	next:FAIL
CURLYM	A,B:FAIL
IFMATCH	A:FAIL
CURLY	B_min_known,B_min,B_max:FAIL

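The typelist grammar reads as an expansion rule: each name in a namelist yields a state, `:FAIL` adds a failure twin for each one, and `name:count` yields `count` numbered states. The sketch below assumes that reading; `expand_states` is hypothetical, and the `REGOP_name`, `_fail` and numeric-suffix naming is invented for illustration, so it may not match the names the build generates.

```python
# Hypothetical expansion of second-section rows; the naming scheme is invented.
from typing import List


def expand_states(row: str) -> List[str]:
    """Expand 'REGOP \\t typelist [\\t typelist] [# Comment]' into state names."""
    fields = row.split("#", 1)[0].split()  # drop an optional trailing comment
    if not fields:
        return []
    regop, *typelists = fields
    states: List[str] = []
    for typelist in typelists:
        names, _, special = typelist.partition(":")
        for name in names.split(","):
            base = f"{regop}_{name}"
            if special == "FAIL":      # namelist:FAIL -> each state plus a failure twin
                states += [base, base + "_fail"]
            elif special.isdigit():    # name:count -> count numbered states
                states += [f"{base}{k}" for k in range(1, int(special) + 1)]
            elif special == "":        # plain namelist -> one state per name
                states.append(base)
            else:
                raise ValueError(f"unknown suffix :{special}")
    return states
```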