Commit | Line | Data |
---|---|---|
055fd3a9 GS |
1 | =head1 NAME |
2 | ||
3 | perldebguts - Guts of Perl debugging | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
ba555bf5 | 7 | This is not L<perldebug>, which tells you how to use |
74410c12 JM |
8 | the debugger. This manpage describes low-level details concerning |
9 | the debugger's internals, which range from difficult to impossible | |
10 | to understand for anyone who isn't incredibly intimate with Perl's guts. | |
11 | Caveat lector. | |
055fd3a9 GS |
12 | |
13 | =head1 Debugger Internals | |
14 | ||
15 | Perl has special debugging hooks at compile-time and run-time used | |
16 | to create debugging environments. These hooks are not to be confused | |
028611fa DB |
17 | with the I<perl -Dxxx> command described in L<perlrun|perlrun/-Dletters>, |
18 | which is usable only if a special Perl is built per the instructions in | |
1d251f8b | 19 | the F<INSTALL> file in the Perl source tree. |
055fd3a9 GS |
20 | |
21 | For example, whenever you call Perl's built-in C<caller> function | |
74410c12 JM |
22 | from the package C<DB>, the arguments that the corresponding stack |
23 | frame was called with are copied to the C<@DB::args> array. These | |
24 | mechanisms are enabled by calling Perl with the B<-d> switch. | |
25 | Specifically, the following additional features are enabled | |
26 | (cf. L<perlvar/$^P>): | |
055fd3a9 | 27 | |
13a2d996 | 28 | =over 4 |
055fd3a9 GS |
29 | |
30 | =item * | |
31 | ||
32 | Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require | |
33 | 'perl5db.pl'}> if not present) before the first line of your program. | |
34 | ||
35 | =item * | |
36 | ||
aa0b556f | 37 | Each array C<@{"_<$filename"}> holds the lines of $filename for a |
74410c12 JM |
38 | file compiled by Perl. The same is also true for C<eval>ed strings |
39 | that contain subroutines, or which are currently being executed. | |
40 | The $filename for C<eval>ed strings looks like C<(eval 34)>. | |
8894c26d MJD |
41 | |
42 | Values in this array are magical in numeric context: they compare | |
43 | equal to zero only if the line is not breakable. | |
055fd3a9 GS |
44 | |
45 | =item * | |
46 | ||
aa0b556f | 47 | Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed |
055fd3a9 GS |
48 | by line number. Individual entries (as opposed to the whole hash) |
49 | are settable. Perl only cares about Boolean true here, although | |
50 | the values used by F<perl5db.pl> have the form | |
8894c26d | 51 | C<"$break_condition\0$action">. |
055fd3a9 GS |
52 | |
53 | The same holds for evaluated strings that contain subroutines, or | |
54 | which are currently being executed. The $filename for C<eval>ed strings | |
d24ca0c5 | 55 | looks like C<(eval 34)>. |
055fd3a9 GS |
56 | |
57 | =item * | |
58 | ||
6e764e36 | 59 | Each scalar C<${"_<$filename"}> contains C<$filename>. This is |
055fd3a9 | 60 | also the case for evaluated strings that contain subroutines, or |
6e764e36 | 61 | which are currently being executed. The C<$filename> for C<eval>ed |
d24ca0c5 | 62 | strings looks like C<(eval 34)>. |
055fd3a9 GS |
63 | |
64 | =item * | |
65 | ||
66 | After each C<require>d file is compiled, but before it is executed, | |
67 | C<DB::postponed(*{"_<$filename"})> is called if the subroutine | |
68 | C<DB::postponed> exists. Here, the $filename is the expanded name of | |
69 | the C<require>d file, as found in the values of %INC. | |
70 | ||
71 | =item * | |
72 | ||
73 | After each subroutine C<subname> is compiled, the existence of | |
74 | C<$DB::postponed{subname}> is checked. If this key exists, | |
75 | C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine | |
76 | also exists. | |
77 | ||
78 | =item * | |
79 | ||
80 | A hash C<%DB::sub> is maintained, whose keys are subroutine names | |
81 | and whose values have the form C<filename:startline-endline>. | |
82 | C<filename> has the form C<(eval 34)> for subroutines defined inside | |
d24ca0c5 | 83 | C<eval>s. |
055fd3a9 GS |
84 | |
85 | =item * | |
86 | ||
87 | When the execution of your program reaches a point that can hold a | |
74410c12 JM |
88 | breakpoint, the C<DB::DB()> subroutine is called if any of the variables |
89 | C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables | |
055fd3a9 GS |
90 | are not C<local>izable. This feature is disabled when executing |
91 | inside C<DB::DB()>, including functions called from it | |
92 | unless C<< $^D & (1<<30) >> is true. | |
93 | ||
94 | =item * | |
95 | ||
96 | When execution of the program reaches a subroutine call, a call to | |
5109e49f Z |
97 | C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> set to identify |
98 | the called subroutine. (This doesn't happen if the calling subroutine | |
99 | was compiled in the C<DB> package.) C<$DB::sub> normally holds the name | |
100 | of the called subroutine, if it has a name by which it can be looked up. | |
101 | Failing that, C<$DB::sub> will hold a reference to the called subroutine. | |
102 | Either way, the C<&DB::sub> subroutine can use C<$DB::sub> as a reference | |
103 | by which to call the called subroutine, which it will normally want to do. | |
055fd3a9 | 104 | |
77e42cd2 TC |
105 | X<&DB::lsub>If the call is to an lvalue subroutine, and C<&DB::lsub> |
106 | is defined, C<&DB::lsub>(I<args>) is called instead, otherwise falling | |
107 | back to C<&DB::sub>(I<args>). | |
108 | ||
261cbad1 TC |
109 | =item * |
110 | ||
5109e49f Z |
111 | When execution of the program uses C<goto> to enter a non-XS subroutine |
112 | and the 0x80 bit is set in C<$^P>, a call to C<&DB::goto> is made, with | |
113 | C<$DB::sub> set to identify the subroutine being entered. The call to | |
114 | C<&DB::goto> does not replace the C<goto>; the requested subroutine will | |
115 | still be entered once C<&DB::goto> has returned. C<$DB::sub> normally | |
116 | holds the name of the subroutine being entered, if it has one. Failing | |
117 | that, C<$DB::sub> will hold a reference to the subroutine being entered. | |
118 | Unlike when C<&DB::sub> is called, it is not guaranteed that C<$DB::sub> | |
119 | can be used as a reference to operate on the subroutine being entered. | |
261cbad1 | 120 | |
055fd3a9 GS |
121 | =back |
122 | ||
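The C<@DB::args> behaviour mentioned at the start of this section can be tried in a few lines. This is only a sketch: C<caller_args> and C<greet> are hypothetical names invented for the illustration. The copying of arguments happens whenever C<caller> is called with an argument from package C<DB>, so this script should not need the B<-d> switch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the arguments of the subroutine that called it.
sub caller_args {
    package DB;              # caller() fills @DB::args only when called from package DB
    my @frame = caller(1);   # frame 1 is the call to the sub that called us
    return @DB::args;        # return a copy; the array is overwritten by later calls
}

# Hypothetical subroutine whose arguments we want to inspect.
sub greet {
    my @seen = caller_args();
    return "greet was called with: @seen";
}

my $report = greet('hello', 'world');
print "$report\n";
```

The C<package DB> statement is scoped to the body of C<caller_args>, so the rest of the file stays in package C<main>.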
123 | Note that if C<&DB::sub> needs external data for it to work, no | |
74410c12 JM |
124 | subroutine call is possible without it. As an example, the standard |
125 | debugger's C<&DB::sub> depends on the C<$DB::deep> variable | |
126 | (it defines how many levels of recursion deep into the debugger you can go | |
127 | before a mandatory break). If C<$DB::deep> is not defined, subroutine | |
128 | calls are not possible, even though C<&DB::sub> exists. | |
055fd3a9 GS |
129 | |
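Values in C<%DB::sub> have the form C<filename:startline-endline>, so a tool that reads them has to take them apart again. A minimal sketch follows, assuming a hypothetical helper C<parse_sub_location> and a made-up sample value. The greedy match is a design choice so that any colons in the file name stay with the file name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a %DB::sub value into file, start line, end line.
sub parse_sub_location {
    my ($location) = @_;
    # Greedy (.*) keeps any colons that belong to the file name itself.
    my ($file, $start, $end) = $location =~ /\A(.*):(\d+)-(\d+)\z/s
        or return;
    return ($file, $start, $end);
}

# Made-up sample value in the documented format.
my @parts = parse_sub_location('(eval 34):2-5');
print join(' | ', @parts), "\n";
```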
130 | =head2 Writing Your Own Debugger | |
131 | ||
74410c12 | 132 | =head3 Environment Variables |
666f95b9 | 133 | |
74410c12 JM |
134 | The C<PERL5DB> environment variable can be used to define a debugger. |
135 | For example, the minimal "working" debugger (it actually doesn't do anything) | |
136 | consists of one line: | |
666f95b9 | 137 | |
055fd3a9 GS |
138 | sub DB::DB {} |
139 | ||
74410c12 | 140 | It can easily be defined like this: |
666f95b9 | 141 | |
055fd3a9 GS |
142 | $ PERL5DB="sub DB::DB {}" perl -d your-script |
143 | ||
74410c12 | 144 | Another brief debugger, slightly more useful, can be created |
055fd3a9 GS |
145 | with only the line: |
146 | ||
147 | sub DB::DB {print ++$i; scalar <STDIN>} | |
148 | ||
74410c12 JM |
149 | This debugger prints a number which increments for each statement |
150 | encountered and waits for you to hit a newline before continuing | |
151 | to the next statement. | |
666f95b9 | 152 | |
74410c12 | 153 | The following debugger is actually useful: |
666f95b9 | 154 | |
055fd3a9 GS |
155 | { |
156 | package DB; | |
157 | sub DB {} | |
158 | sub sub {print ++$i, " $sub\n"; &$sub} | |
159 | } | |
160 | ||
74410c12 JM |
161 | It prints the sequence number of each subroutine call and the name of the |
162 | called subroutine. Note that C<&DB::sub> is being compiled into the | |
163 | package C<DB> through the use of the C<package> directive. | |
055fd3a9 | 164 | |
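To see the call-tracing debugger above in action without typing at a shell, one can drive it from a small script. The following is a sketch under stated assumptions: the program being debugged (C<square>, C<twice>) is invented for the illustration, and the debugger text is the one shown above, squeezed onto one line for C<PERL5DB>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway program to be debugged; its contents are made up for this sketch.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_PROGRAM';
sub square { $_[0] ** 2 }
sub twice  { 2 * square($_[0]) }
print twice(3), "\n";
END_PROGRAM
close $fh or die "cannot close $script: $!";

# The debugger from the text, kept on one line for the environment variable.
local $ENV{PERL5DB} =
    q[{ package DB; sub DB {} sub sub { print ++$i, " $sub\n"; &$sub } }];

# Run the program under -d and collect everything it prints.
open my $pipe, '-|', $^X, '-d', $script or die "cannot run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe;

print $output;
```

Each traced call should appear as a sequence number followed by the fully qualified subroutine name, ahead of the program's own output.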
74410c12 JM |
165 | When it starts, the debugger reads your rc file (F<./.perldb> or |
166 | F<~/.perldb> under Unix), which can set important options. | |
167 | (A subroutine (C<&afterinit>) can be defined here as well; it is executed | |
168 | after the debugger completes its own initialization.) | |
055fd3a9 GS |
169 | |
170 | After the rc file is read, the debugger reads the PERLDB_OPTS | |
74410c12 JM |
171 | environment variable and uses it to set debugger options. The |
172 | contents of this variable are treated as if they were the argument | |
96090e4f | 173 | of an C<o ...> debugger command (q.v. in L<perldebug/"Configurable Options">). |
74410c12 | 174 | |
7b406369 | 175 | =head3 Debugger Internal Variables |
25cf7dea | 176 | |
74410c12 JM |
177 | In addition to the file and subroutine-related variables mentioned above, |
178 | the debugger also maintains various magical internal variables. | |
179 | ||
180 | =over 4 | |
181 | ||
182 | =item * | |
055fd3a9 | 183 | |
74410c12 JM |
184 | C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which |
185 | holds the lines of the currently-selected file (compiled by Perl), either | |
186 | explicitly chosen with the debugger's C<f> command, or implicitly by flow | |
187 | of execution. | |
188 | ||
189 | Values in this array are magical in numeric context: they compare | |
190 | equal to zero only if the line is not breakable. | |
191 | ||
192 | =item * | |
193 | ||
7b406369 | 194 | C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which |
74410c12 JM |
195 | contains breakpoints and actions keyed by line number in |
196 | the currently-selected file, either explicitly chosen with the | |
055fd3a9 GS |
197 | debugger's C<f> command, or implicitly by flow of execution. |
198 | ||
74410c12 JM |
199 | As previously noted, individual entries (as opposed to the whole hash) |
200 | are settable. Perl only cares about Boolean true here, although | |
201 | the values used by F<perl5db.pl> have the form | |
202 | C<"$break_condition\0$action">. | |
203 | ||
204 | =back | |
205 | ||
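The per-file source arrays can be used directly from a custom C<DB::DB>. The sketch below is illustrative only: it runs a made-up three-line program under B<-d> with a one-line debugger that uses C<caller> to find the current file and line, then looks the line up in C<@{"main::_E<lt>$file"}>. The standard debugger reaches the same data through its C<@DB::dbline> alias.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway program to be traced; its contents are made up for this sketch.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_PROGRAM';
my $x = 2;
my $y = $x * 21;
print "$y\n";
END_PROGRAM
close $fh or die "cannot close $script: $!";

# One-line debugger: print the source text of each statement before it runs.
my $debugger = <<'END_DEBUGGER';
sub DB::DB { my ($pkg, $file, $line) = caller; print "$line: ", ${"main::_<$file"}[$line]; }
END_DEBUGGER
chomp $debugger;
local $ENV{PERL5DB} = $debugger;

# Run the program under -d and collect everything it prints.
open my $pipe, '-|', $^X, '-d', $script or die "cannot run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe;

print $output;
```

The output should show each source line prefixed with its line number, interleaved with whatever the program itself prints.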
7b406369 | 206 | =head3 Debugger Customization Functions |
74410c12 JM |
207 | |
208 | Some functions are provided to simplify customization. | |
209 | ||
210 | =over 4 | |
211 | ||
212 | =item * | |
213 | ||
71110851 RGS |
214 | See L<perldebug/"Configurable Options"> for a description of options parsed by |
215 | C<DB::parse_options(string)>. | |
74410c12 JM |
216 | |
217 | =item * | |
218 | ||
219 | C<DB::dump_trace(skip[,count])> skips the specified number of frames | |
220 | and returns a list containing information about the calling frames (all | |
221 | of them, if C<count> is missing). Each entry is a reference to a hash | |
222 | with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine | |
055fd3a9 GS |
223 | name, or info about C<eval>), C<args> (C<undef> or a reference to |
224 | an array), C<file>, and C<line>. | |
225 | ||
74410c12 JM |
226 | =item * |
227 | ||
228 | C<DB::print_trace(FH, skip[, count[, short]])> prints | |
055fd3a9 GS |
229 | formatted info about caller frames. The last two functions may be |
230 | convenient as arguments to C<< < >>, C<< << >> commands. | |
231 | ||
74410c12 JM |
232 | =back |
233 | ||
055fd3a9 GS |
234 | Note that any variables and functions that are not documented in |
235 | this manpage (or in L<perldebug>) are considered for internal | |
236 | use only, and as such are subject to change without notice. | |
237 | ||
238 | =head1 Frame Listing Output Examples | |
239 | ||
240 | The C<frame> option can be used to control the output of frame | |
241 | information. For example, contrast this expression trace: | |
242 | ||
243 | $ perl -de 42 | |
244 | Stack dump during die enabled outside of evals. | |
245 | ||
246 | Loading DB routines from perl5db.pl patch level 0.94 | |
247 | Emacs support available. | |
248 | ||
ccf3535a | 249 | Enter h or 'h h' for help. |
055fd3a9 GS |
250 | |
251 | main::(-e:1): 0 | |
252 | DB<1> sub foo { 14 } | |
253 | ||
254 | DB<2> sub bar { 3 } | |
255 | ||
256 | DB<3> t print foo() * bar() | |
257 | main::((eval 172):3): print foo() * bar(); |
258 | main::foo((eval 168):2): | |
259 | main::bar((eval 170):2): | |
260 | 42 | |
261 | ||
492652be | 262 | with this one, once the C<o>ption C<frame=2> has been set: |
055fd3a9 | 263 | |
492652be | 264 | DB<4> o f=2 |
055fd3a9 GS |
265 | frame = '2' |
266 | DB<5> t print foo() * bar() | |
267 | 3: foo() * bar() | |
268 | entering main::foo | |
269 | 2: sub foo { 14 }; | |
270 | exited main::foo | |
271 | entering main::bar | |
272 | 2: sub bar { 3 }; | |
273 | exited main::bar | |
274 | 42 | |
275 | ||
276 | By way of demonstration, we present below a laborious listing | |
277 | resulting from setting your C<PERLDB_OPTS> environment variable to | |
278 | the value C<f=n N>, and running I<perl -d -V> from the command line. | |
7b406369 FC |
279 | Examples using various values of C<n> are shown to give you a feel |
280 | for the difference between settings. Long though it may be, this | |
055fd3a9 GS |
281 | is not a complete listing, but only excerpts. |
282 | ||
283 | =over 4 | |
284 | ||
285 | =item 1 | |
286 | ||
f185f654 KW |
287 | entering main::BEGIN |
288 | entering Config::BEGIN | |
289 | Package lib/Exporter.pm. | |
290 | Package lib/Carp.pm. | |
291 | Package lib/Config.pm. | |
292 | entering Config::TIEHASH | |
293 | entering Exporter::import | |
294 | entering Exporter::export | |
295 | entering Config::myconfig | |
296 | entering Config::FETCH | |
297 | entering Config::FETCH | |
298 | entering Config::FETCH | |
299 | entering Config::FETCH | |
055fd3a9 GS |
300 | |
301 | =item 2 | |
302 | ||
f185f654 KW |
303 | entering main::BEGIN |
304 | entering Config::BEGIN | |
305 | Package lib/Exporter.pm. | |
306 | Package lib/Carp.pm. | |
307 | exited Config::BEGIN | |
308 | Package lib/Config.pm. | |
309 | entering Config::TIEHASH | |
310 | exited Config::TIEHASH | |
311 | entering Exporter::import | |
312 | entering Exporter::export | |
313 | exited Exporter::export | |
314 | exited Exporter::import | |
315 | exited main::BEGIN | |
316 | entering Config::myconfig | |
317 | entering Config::FETCH | |
318 | exited Config::FETCH | |
319 | entering Config::FETCH | |
320 | exited Config::FETCH | |
321 | entering Config::FETCH | |
055fd3a9 | 322 | |
d5e42f17 | 323 | =item 3 |
055fd3a9 | 324 | |
f185f654 KW |
325 | in $=main::BEGIN() from /dev/null:0 |
326 | in $=Config::BEGIN() from lib/Config.pm:2 | |
327 | Package lib/Exporter.pm. | |
328 | Package lib/Carp.pm. | |
329 | Package lib/Config.pm. | |
330 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
331 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
332 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li | |
333 | in @=Config::myconfig() from /dev/null:0 | |
334 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
335 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
336 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
337 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 | |
338 | in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 | |
339 | in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 | |
055fd3a9 | 340 | |
d5e42f17 | 341 | =item 4 |
055fd3a9 | 342 | |
f185f654 KW |
343 | in $=main::BEGIN() from /dev/null:0 |
344 | in $=Config::BEGIN() from lib/Config.pm:2 | |
345 | Package lib/Exporter.pm. | |
346 | Package lib/Carp.pm. | |
347 | out $=Config::BEGIN() from lib/Config.pm:0 | |
348 | Package lib/Config.pm. | |
349 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
350 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
351 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
352 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ | |
353 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ | |
354 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
355 | out $=main::BEGIN() from /dev/null:0 | |
356 | in @=Config::myconfig() from /dev/null:0 | |
357 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
358 | out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
359 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
360 | out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
361 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
362 | out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
363 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 | |
055fd3a9 | 364 | |
d5e42f17 | 365 | =item 5 |
055fd3a9 | 366 | |
f185f654 KW |
367 | in $=main::BEGIN() from /dev/null:0 |
368 | in $=Config::BEGIN() from lib/Config.pm:2 | |
369 | Package lib/Exporter.pm. | |
370 | Package lib/Carp.pm. | |
371 | out $=Config::BEGIN() from lib/Config.pm:0 | |
372 | Package lib/Config.pm. | |
373 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
374 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
375 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
376 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E | |
377 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E | |
378 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
379 | out $=main::BEGIN() from /dev/null:0 | |
380 | in @=Config::myconfig() from /dev/null:0 | |
381 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 | |
382 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 | |
383 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 | |
384 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 | |
055fd3a9 | 385 | |
d5e42f17 | 386 | =item 6 |
055fd3a9 | 387 | |
f185f654 KW |
388 | in $=CODE(0x15eca4)() from /dev/null:0 |
389 | in $=CODE(0x182528)() from lib/Config.pm:2 | |
390 | Package lib/Exporter.pm. | |
391 | out $=CODE(0x182528)() from lib/Config.pm:0 | |
392 | scalar context return from CODE(0x182528): undef | |
393 | Package lib/Config.pm. | |
394 | in $=Config::TIEHASH('Config') from lib/Config.pm:628 | |
395 | out $=Config::TIEHASH('Config') from lib/Config.pm:628 | |
396 | scalar context return from Config::TIEHASH: empty hash | |
397 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
398 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 | |
399 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 | |
400 | scalar context return from Exporter::export: '' | |
401 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
402 | scalar context return from Exporter::import: '' | |
055fd3a9 GS |
403 | |
404 | =back | |
405 | ||
406 | In all cases shown above, the line indentation shows the call tree. | |
407 | If bit 2 of C<frame> is set, a line is printed on exit from a | |
408 | subroutine as well. If bit 4 is set, the arguments are printed | |
409 | along with the caller info. If bit 8 is set, the arguments are | |
410 | printed even if they are tied or references. If bit 16 is set, the | |
411 | return value is printed, too. | |
412 | ||
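Since C<frame> is a bit mask, a given value can be decoded mechanically. The following sketch uses a hypothetical helper, C<describe_frame>, and paraphrases the bit meanings from the paragraph above; it is not part of the debugger.

```perl
use strict;
use warnings;

# Bit meanings as listed in the paragraph above; the wording is paraphrased.
my @FRAME_BITS = (
    [ 2  => 'print a line on subroutine exit' ],
    [ 4  => 'print arguments with the caller info' ],
    [ 8  => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# Hypothetical helper: list the behaviours that a frame value switches on.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

my @effects = describe_frame(6);   # 6 == 2 | 4
print "frame=6: $_\n" for @effects;
```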
413 | When a package is compiled, a line like this | |
414 | ||
415 | Package lib/Carp.pm. | |
416 | ||
417 | is printed with proper indentation. | |
418 | ||
7b406369 | 419 | =head1 Debugging Regular Expressions |
055fd3a9 GS |
420 | |
421 | There are two ways to enable debugging output for regular expressions. | |
422 | ||
423 | If your perl is compiled with C<-DDEBUGGING>, you may use the | |
dafb2544 KW |
424 | B<-Dr> flag on the command line, and C<-Drv> for more verbose |
425 | information. | |
055fd3a9 | 426 | |
dafb2544 | 427 | Otherwise, one can C<use re 'debug'>, which has effects at both |
3d71525d NJ |
428 | compile time and run time. Since Perl 5.9.5, this pragma is lexically |
429 | scoped. | |
055fd3a9 | 430 | |
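The debugging output is written to standard error, so capturing it takes a little care. As a sketch, assuming the standard C<IPC::Open3> module is available, the script below compiles the pattern used in the next section in a child perl with C<use re 'debug'> in effect (via B<-Mre=debug>) and collects the trace. The exact text of the trace varies between Perl versions.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);

# Compile the example pattern in a child perl with "use re 'debug'" active.
my @command = ($^X, '-Mre=debug', '-e', 'qr/[bc]d(ef*g)+h[ij]k$/');

# With a false third argument, open3 sends the child's STDERR to the same
# handle as its STDOUT, which is where the debugging output ends up.
my $pid = open3(my $to_child, my $from_child, undef, @command);
close $to_child;
my $trace = do { local $/; <$from_child> };
waitpid $pid, 0;

print $trace;
```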
7b406369 | 431 | =head2 Compile-time Output |
055fd3a9 GS |
432 | |
433 | The debugging output at compile time looks like this: | |
434 | ||
ccf3535a | 435 | Compiling REx '[bc]d(ef*g)+h[ij]k$' |
1c102323 MJD |
436 | size 45 Got 364 bytes for offset annotations. |
437 | first at 1 | |
438 | rarest char g at 0 | |
439 | rarest char d at 0 | |
440 | 1: ANYOF[bc](12) | |
441 | 12: EXACT <d>(14) | |
442 | 14: CURLYX[0] {1,32767}(28) | |
443 | 16: OPEN1(18) | |
444 | 18: EXACT <e>(20) | |
445 | 20: STAR(23) | |
446 | 21: EXACT <f>(0) | |
447 | 23: EXACT <g>(25) | |
448 | 25: CLOSE1(27) | |
449 | 27: WHILEM[1/1](0) | |
450 | 28: NOTHING(29) | |
451 | 29: EXACT <h>(31) | |
452 | 31: ANYOF[ij](42) | |
453 | 42: EXACT <k>(44) | |
454 | 44: EOL(45) | |
455 | 45: END(0) | |
ccf3535a JK |
456 | anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) |
457 | stclass 'ANYOF[bc]' minlen 7 | |
1c102323 MJD |
458 | Offsets: [45] |
459 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] | |
460 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] | |
461 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] | |
462 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] | |
463 | Omitting $` $& $' support. | |
055fd3a9 GS |
464 | |
465 | The first line shows the pre-compiled form of the regex. The second | |
466 | shows the size of the compiled form (in arbitrary units, usually | |
1c102323 MJD |
467 | 4-byte words) and the total number of bytes allocated for the |
468 | offset/length table, usually 4+C<size>*8. The next line shows the | |
469 | label I<id> of the first node that does a match. | |
055fd3a9 | 470 | |
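The "usually 4+C<size>*8" rule can be checked against the sample output above, which reports C<size 45> and C<Got 364 bytes>:

```perl
use strict;
use warnings;

my $size  = 45;              # "size 45" from the sample output
my $bytes = 4 + $size * 8;   # the "usually 4+size*8" rule from the text
print "$bytes\n";            # 364, matching "Got 364 bytes"
```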
1c102323 MJD |
471 | The |
472 | ||
ccf3535a JK |
473 | anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) |
474 | stclass 'ANYOF[bc]' minlen 7 | |
1c102323 MJD |
475 | |
476 | line (split into two lines above) contains optimizer | |
055fd3a9 GS |
477 | information. In the example shown, the optimizer found that the match |
478 | should contain a substring C<de> at offset 1, plus substring C<gh> | |
479 | at some offset between 3 and infinity. Moreover, when checking for | |
480 | these substrings (to abandon impossible matches quickly), Perl will check | |
481 | for the substring C<gh> before checking for the substring C<de>. The | |
482 | optimizer may also use the knowledge that the match starts (at the | |
1c102323 MJD |
483 | C<first> I<id>) with a character class, and no string |
484 | shorter than 7 characters can possibly match. | |
055fd3a9 | 485 | |
1c102323 | 486 | The fields of interest which may appear in this line are |
055fd3a9 | 487 | |
13a2d996 | 488 | =over 4 |
055fd3a9 GS |
489 | |
490 | =item C<anchored> I<STRING> C<at> I<POS> | |
491 | ||
492 | =item C<floating> I<STRING> C<at> I<POS1..POS2> | |
493 | ||
494 | See above. | |
495 | ||
496 | =item C<matching floating/anchored> | |
497 | ||
498 | Which substring to check first. | |
499 | ||
500 | =item C<minlen> | |
501 | ||
502 | The minimal length of the match. | |
503 | ||
504 | =item C<stclass> I<TYPE> | |
505 | ||
506 | Type of first matching node. | |
507 | ||
508 | =item C<noscan> | |
509 | ||
510 | Don't scan for the found substrings. | |
511 | ||
512 | =item C<isall> | |
513 | ||
1c102323 | 514 | Means that the optimizer information is all that the regular |
055fd3a9 GS |
515 | expression contains, and thus one does not need to enter the regex engine at |
516 | all. | |
517 | ||
518 | =item C<GPOS> | |
519 | ||
520 | Set if the pattern contains C<\G>. | |
521 | ||
522 | =item C<plus> | |
523 | ||
524 | Set if the pattern starts with a repeated char (as in C<x+y>). | |
525 | ||
526 | =item C<implicit> | |
527 | ||
528 | Set if the pattern starts with C<.*>. | |
529 | ||
530 | =item C<with eval> | |
531 | ||
532 | Set if the pattern contains eval-groups, such as C<(?{ code })> and | |
533 | C<(??{ code })>. | |
534 | ||
535 | =item C<anchored(TYPE)> | |
536 | ||
7b406369 | 537 | If the pattern may match only at a handful of places, with C<TYPE> |
d3d47aac | 538 | being C<SBOL>, C<MBOL>, or C<GPOS>. See the table below. |
055fd3a9 GS |
539 | |
540 | =back | |
541 | ||
542 | If a substring is known to match at end-of-line only, it may be | |
ccf3535a | 543 | followed by C<$>, as in C<floating 'k'$>. |
055fd3a9 | 544 | |
1c102323 MJD |
545 | The optimizer-specific information is used to avoid entering (a slow) regex |
546 | engine on strings that definitely will not match. If the C<isall> flag | |
055fd3a9 GS |
547 | is set, a call to the regex engine may be avoided even when the optimizer |
548 | found an appropriate place for the match. | |
549 | ||
1c102323 | 550 | Above the optimizer section is the list of I<nodes> of the compiled |
055fd3a9 GS |
551 | form of the regex. Each line has format |
552 | ||
553 | C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) | |
554 | ||
7b406369 | 555 | =head2 Types of Nodes |
055fd3a9 | 556 | |
78465a4b | 557 | Here are the current possible types, with short descriptions: |
055fd3a9 | 558 | |
65aa4ca7 FC |
559 | =for comment |
560 | This table is generated by regen/regcomp.pl. Any changes made here | |
561 | will be lost. | |
562 | ||
563 | =for regcomp.pl begin | |
564 | ||
e21ef692 | 565 | # TYPE arg-description [regnode-struct-suffix] [longjump-len] DESCRIPTION |
5da6b59a KW |
566 | |
567 | # Exit points | |
65aa4ca7 | 568 | |
89829bb5 KW |
569 | END no End of program. |
570 | SUCCEED no Return from a subroutine, basically. | |
5da6b59a | 571 | |
d3d47aac | 572 | # Line Start Anchors: |
89829bb5 KW |
573 | SBOL no Match "" at beginning of line: /^/, /\A/ |
574 | MBOL no Same, assuming multiline: /^/m | |
5da6b59a | 575 | |
d3d47aac | 576 | # Line End Anchors: |
89829bb5 KW |
577 | SEOL no Match "" at end of line: /$/ |
578 | MEOL no Same, assuming multiline: /$/m | |
579 | EOS no Match "" at end of string: /\z/ | |
d3d47aac YO |
580 | |
581 | # Match Start Anchors: | |
89829bb5 | 582 | GPOS no Matches where last m//g left off. |
d3d47aac YO |
583 | |
584 | # Word Boundary Opcodes: | |
0991ffc9 KW |
585 | BOUND no Like BOUNDA for non-utf8, otherwise like |
586 | BOUNDU | |
89829bb5 KW |
587 | BOUNDL no Like BOUND/BOUNDU, but \w and \W are |
588 | defined by current locale | |
589 | BOUNDU no Match "" at any boundary of a given type | |
912b808c | 590 | using /u rules. |
89829bb5 KW |
591 | BOUNDA no Match "" at any boundary between \w\W or |
592 | \W\w, where \w is [_a-zA-Z0-9] | |
0991ffc9 KW |
593 | NBOUND no Like NBOUNDA for non-utf8, otherwise like |
594 | NBOUNDU |
89829bb5 KW |
595 | NBOUNDL no Like NBOUND/NBOUNDU, but \w and \W are |
596 | defined by current locale | |
597 | NBOUNDU no Match "" at any non-boundary of a given | |
912b808c | 598 | type using /u rules. |
89829bb5 KW |
599 | NBOUNDA no Match "" between any \w\w or \W\W, where |
600 | \w is [_a-zA-Z0-9] | |
5da6b59a KW |
601 | |
602 | # [Special] alternatives: | |
89829bb5 KW |
603 | REG_ANY no Match any one character (except newline). |
604 | SANY no Match any one character. | |
46fc0c43 KW |
605 | ANYOF sv Match character in (or not in) this class, |
606 | charclass single char match only | |
607 | ANYOFD sv Like ANYOF, but /d is in effect | |
608 | charclass | |
609 | ANYOFL sv Like ANYOF, but /l is in effect | |
610 | charclass | |
611 | ANYOFPOSIXL sv Like ANYOFL, but matches [[:posix:]] | |
612 | charclass_ classes | |
613 | posixl | |
f6eaa562 | 614 | |
c316b824 | 615 | ANYOFH sv 1 Like ANYOF, but only has "High" matches, |
29a889ef KW |
616 | none in the bitmap; the flags field |
617 | contains the lowest matchable UTF-8 start | |
618 | byte | |
6966a05b | 619 | ANYOFHb sv 1 Like ANYOFH, but all matches share the same |
a5bc0742 | 620 | UTF-8 start byte, given in the flags field |
3146c00a KW |
621 | ANYOFHr sv 1 Like ANYOFH, but the flags field contains |
622 | packed bounds for all matchable UTF-8 start | |
623 | bytes. | |
689eab88 | 624 | ANYOFHs sv:str 1 Like ANYOFHb, but has a string field that |
34924db0 KW |
625 | gives the leading matchable UTF-8 bytes; |
626 | flags field is len | |
13fcf652 KW |
627 | ANYOFR packed 1 Matches any character in the range given by |
628 | its packed args: upper 12 bits is the max | |
629 | delta from the base lower 20; the flags | |
630 | field contains the lowest matchable UTF-8 | |
631 | start byte | |
2d5613be KW |
632 | ANYOFRb packed 1 Like ANYOFR, but all matches share the same |
633 | UTF-8 start byte, given in the flags field | |
4c8c99df KW |
634 | |
635 | ANYOFHbbm none bbm Like ANYOFHb, but only for 2-byte UTF-8 | |
636 | characters; uses a bitmap to match the | |
637 | continuation byte | |
638 | ||
89829bb5 KW |
639 | ANYOFM byte 1 Like ANYOF, but matches an invariant byte |
640 | as determined by the mask and arg | |
3db0bccc | 641 | NANYOFM byte 1 complement of ANYOFM |
7bc66b18 | 642 | |
d3d47aac | 643 | # POSIX Character Classes: |
89829bb5 KW |
644 | POSIXD none Some [[:class:]] under /d; the FLAGS field |
645 | gives which one | |
646 | POSIXL none Some [[:class:]] under /l; the FLAGS field | |
647 | gives which one | |
648 | POSIXU none Some [[:class:]] under /u; the FLAGS field | |
649 | gives which one | |
650 | POSIXA none Some [[:class:]] under /a; the FLAGS field | |
651 | gives which one | |
652 | NPOSIXD none complement of POSIXD, [[:^class:]] | |
653 | NPOSIXL none complement of POSIXL, [[:^class:]] | |
654 | NPOSIXU none complement of POSIXU, [[:^class:]] | |
655 | NPOSIXA none complement of POSIXA, [[:^class:]] | |
656 | ||
89829bb5 KW |
657 | CLUMP no Match any extended grapheme cluster |
658 | sequence | |
5da6b59a KW |
659 | |
660 | # Alternation | |
661 | ||
65aa4ca7 FC |
662 | # BRANCH The set of branches constituting a single choice are |
663 | # hooked together with their "next" pointers, since | |
664 | # precedence prevents anything being concatenated to | |
665 | # any individual branch. The "next" pointer of the last | |
666 | # BRANCH in a choice points to the thing following the | |
667 | # whole choice. This is also where the final "next" | |
668 | # pointer of each individual branch points; each branch | |
669 | # starts with the operand node of a BRANCH node. | |
5da6b59a | 670 | # |
acababb4 | 671 | BRANCH node 1 Match this alternative, or the next... |
5da6b59a | 672 | |
5da6b59a KW |
673 | # Literals |
674 | ||
3ace85ea KW |
675 | EXACT str Match this string (flags field is the |
676 | length). | |
ae06e581 KW |
677 | |
678 | # In a long string node, the U32 argument is the length, and is | |
679 | # immediately followed by the string. | |
680 | LEXACT len:str 1 Match this long string (preceded by length; | |
681 | flags unused). | |
89829bb5 | 682 | EXACTL str Like EXACT, but /l is in effect (used so |
58ea1df2 | 683 | locale-related warnings can be checked for) |
a2f213ef | 684 | EXACTF str Like EXACT, but match using /id rules; |
58ea1df2 KW |
685 | (string not UTF-8, ASCII folded; non-ASCII |
686 | not) | |
a2f213ef | 687 | EXACTFL str Like EXACT, but match using /il rules; |
58ea1df2 | 688 | (string not likely to be folded) |
a2f213ef | 689 | EXACTFU str Like EXACT, but match using /iu rules; |
58ea1df2 KW |
690 | (string folded) |
691 | ||
a2f213ef | 692 | EXACTFAA str Like EXACT, but match using /iaa rules; |
5f162c35 KW |
693 | (string folded except MICRO in non-UTF8 |
694 | patterns; doesn't contain SHARP S unless | |
695 | UTF-8; folded length <= unfolded) | |
f97d9711 KW |
696 | EXACTFAA_NO_TRIE str Like EXACTFAA, (string not UTF-8, folded |
697 | except: MICRO, SHARP S; folded length <= | |
698 | unfolded, not currently trie-able) | |
a2f213ef KW |
699 | |
700 | EXACTFUP str Like EXACT, but match using /iu rules; | |
5f162c35 KW |
701 | (string not UTF-8, folded except MICRO: |
702 | hence Problematic) | |
aa419ff3 | 703 | |
a2f213ef KW |
704 | EXACTFLU8 str Like EXACTFU, but use /il, UTF-8, (string |
705 | is folded, and everything in it is above | |
58ea1df2 | 706 | 255 |
3f2416ae | 707 | EXACT_REQ8 str Like EXACT, but only UTF-8 encoded targets |
4f4c2c24 | 708 | can match |
3f2416ae | 709 | LEXACT_REQ8 len:str 1 Like LEXACT, but only UTF-8 encoded targets |
5cd61b66 | 710 | can match |
3f2416ae | 711 | EXACTFU_REQ8 str Like EXACTFU, but only UTF-8 encoded |
4f4c2c24 | 712 | targets can match |
f6b4b99d | 713 | |
95fb0a6e KW |
714 | EXACTFU_S_EDGE str /di rules, but nothing in it precludes /ui, |
715 | except begins and/or ends with [Ss]; | |
58ea1df2 | 716 | (string not UTF-8; compile-time only) |
8a100c91 | 717 | |
7af55186 KW |
718 | # New charclass like patterns |
719 | LNBREAK none generic newline pattern | |
720 | ||
721 | # Trie Related | |
722 | ||
723 | # Behave the same as A|LIST|OF|WORDS would. The '..C' variants | |
724 | # have inline charclass data (ascii only), the 'C' store it in the | |
725 | # structure. | |
726 | ||
727 | TRIE trie 1 Match many EXACT(F[ALU]?)? at once. | |
728 | flags==type | |
729 | TRIEC trie Same as TRIE, but with embedded charclass | |
730 | charclass data | |
731 | ||
732 | AHOCORASICK trie 1 Aho Corasick stclass. flags==type | |
733 | AHOCORASICKC trie Same as AHOCORASICK, but with embedded | |
734 | charclass charclass data | |
735 | ||
5da6b59a KW |
736 | # Do nothing types |
737 | ||
89829bb5 | 738 | NOTHING no Match empty string. |
5da6b59a | 739 | # A variant of above which delimits a group, thus stops optimizations |
89829bb5 KW |
740 | TAIL no Match empty string. Can jump here from |
741 | outside. | |
5da6b59a KW |
742 | |
743 | # Loops | |
744 | ||
65aa4ca7 | 745 | # STAR,PLUS '?', and complex '*' and '+', are implemented as |
62e6ef33 | 746 | # circular BRANCH structures. Simple cases |
65aa4ca7 FC |
747 | # (one character per match) are implemented with STAR |
748 | # and PLUS for speed and to minimize recursive plunges. | |
5da6b59a | 749 | # |
ac577429 KW |
750 | STAR node Match this (simple) thing 0 or more times: |
751 | /A{0,}B/ where A is width 1 char | |
752 | PLUS node Match this (simple) thing 1 or more times: | |
753 | /A{1,}B/ where A is width 1 char | |
754 | ||
17e3e02a | 755 | CURLY sv 3 Match this (simple) thing {n,m} times: |
ac577429 | 756 | /A{m,n}B/ where A is width 1 char |
17e3e02a | 757 | CURLYN no 3 Capture next-after-this simple thing: |
ac577429 | 758 | /(A){m,n}B/ where A is width 1 char |
17e3e02a | 759 | CURLYM no 3 Capture this medium-complex thing {n,m} |
ac577429 | 760 | times: /(A){m,n}B/ where A is fixed-length |
17e3e02a | 761 | CURLYX sv 3 Match/Capture this complex thing {n,m} |
89829bb5 | 762 | times. |
5da6b59a KW |
763 | |
764 | # This terminator creates a loop structure for CURLYX | |
89829bb5 KW |
765 | WHILEM no Do curly processing and see if rest |
766 | matches. | |
5da6b59a KW |
767 | |
768 | # Buffer related | |
769 | ||
770 | # OPEN,CLOSE,GROUPP ...are numbered at compile time. | |
89829bb5 KW |
771 | OPEN num 1 Mark this point in input as start of #n. |
772 | CLOSE num 1 Close corresponding OPEN of #n. | |
773 | SROPEN none Same as OPEN, but for script run | |
774 | SRCLOSE none Close preceding SROPEN | |
775 | ||
d78630f1 YO |
776 | REF num 2 Match some already matched string |
777 | REFF num 2 Match already matched string, using /di | |
912b808c | 778 | rules. |
d78630f1 | 779 | REFFL num 2 Match already matched string, using /li |
912b808c | 780 | rules. |
d78630f1 YO |
781 | REFFU num 2 Match already matched string, using /ui. |
782 | REFFA num 2 Match already matched string, using /aai | |
912b808c | 783 | rules. |
65aa4ca7 FC |
784 | |
785 | # Named references. Code in regcomp.c assumes that these all are after | |
786 | # the numbered references | |
d78630f1 YO |
787 | REFN no-sv 2 Match some already matched string |
788 | REFFN no-sv 2 Match already matched string, using /di | |
912b808c | 789 | rules. |
d78630f1 | 790 | REFFLN no-sv 2 Match already matched string, using /li |
912b808c | 791 | rules. |
d78630f1 | 792 | REFFUN num 2 Match already matched string, using /ui |
912b808c | 793 | rules. |
d78630f1 | 794 | REFFAN num 2 Match already matched string, using /aai |
912b808c | 795 | rules. |
7bc66b18 | 796 | |
d3d47aac | 797 | # Support for long RE |
89829bb5 | 798 | LONGJMP off 1 1 Jump far away. |
17e3e02a | 799 | BRANCHJ off 2 1 BRANCH with long offset. |
d3d47aac YO |
800 | |
801 | # Special Case Regops | |
80101a2c | 802 | IFMATCH off 1 1 Succeeds if the following matches; non-zero |
2abbd513 KW |
803 | flags "f", next_off "o" means lookbehind |
804 | assertion starting "f..(f-o)" characters | |
805 | before current | |
80101a2c | 806 | UNLESSM off 1 1 Fails if the following matches; non-zero |
2abbd513 KW |
807 | flags "f", next_off "o" means lookbehind |
808 | assertion starting "f..(f-o)" characters | |
809 | before current | |
89829bb5 KW |
810 | SUSPEND off 1 1 "Independent" sub-RE. |
811 | IFTHEN off 1 1 Switch, should be preceded by switcher. | |
812 | GROUPP num 1 Whether the group matched. | |
5da6b59a | 813 | |
5da6b59a KW |
814 | # The heavy worker |
815 | ||
89829bb5 | 816 | EVAL evl/flags Execute some Perl code. |
17e3e02a | 817 | 2 |
5da6b59a KW |
818 | |
819 | # Modifiers | |
820 | ||
89829bb5 KW |
821 | MINMOD no Next operator is not greedy. |
822 | LOGICAL no Next opcode should set the flag only. | |
5da6b59a KW |
823 | |
824 | # This is not used yet | |
89829bb5 | 825 | RENUM off 1 1 Group with independently numbered parens. |
5da6b59a | 826 | |
5da6b59a | 827 | # Regex Subroutines |
17e3e02a | 828 | GOSUB num/ofs 2 recurse to paren arg1 at (signed) ofs arg2 |
5da6b59a KW |
829 | |
830 | # Special conditionals | |
016b7209 | 831 | GROUPPN no-sv 1 Whether the group matched. |
89829bb5 KW |
832 | INSUBP num 1 Whether we are in a specific recurse. |
833 | DEFINEP none 1 Never execute directly. | |
5da6b59a KW |
834 | |
835 | # Backtracking Verbs | |
89829bb5 KW |
836 | ENDLIKE none Used only for the type field of verbs |
837 | OPFAIL no-sv 1 Same as (?!), but with verb arg | |
838 | ACCEPT no-sv/num Accepts the current matched string, with | |
17e3e02a | 839 | 2 verbar |
5da6b59a KW |
840 | |
841 | # Verbs With Arguments | |
89829bb5 KW |
842 | VERB no-sv 1 Used only for the type field of verbs |
843 | PRUNE no-sv 1 Pattern fails at this startpoint if no- | |
844 | backtracking through this | |
845 | MARKPOINT no-sv 1 Push the current location for rollback by | |
846 | cut. | |
847 | SKIP no-sv 1 On failure skip forward (to the mark) | |
848 | before retrying | |
849 | COMMIT no-sv 1 Pattern fails outright if backtracking | |
850 | through this | |
851 | CUTGROUP no-sv 1 On failure go to the next alternation in | |
852 | the group | |
5da6b59a KW |
853 | |
854 | # Control what to keep in $&. | |
89829bb5 | 855 | KEEPS no $& begins here. |
5da6b59a | 856 | |
271c3af7 YO |
857 | # Validate that lookbehind IFMATCH and UNLESSM end at the right place |
858 | LOOKBEHIND_END no Return from lookbehind (IFMATCH/UNLESSM) | |
859 | and validate position | |
860 | ||
5da6b59a KW |
861 | # SPECIAL REGOPS |
862 | ||
65aa4ca7 FC |
863 | # This is not really a node, but an optimized away piece of a "long" |
864 | # node. To simplify debugging output, we mark it as if it were a node | |
89829bb5 | 865 | OPTIMIZED off Placeholder for dump. |
5da6b59a KW |
866 | |
867 | # Special opcode with the property that no opcode in a compiled program | |
868 | # will ever be of this type. Thus it can be used as a flag value that | |
869 | # no other opcode has been seen. END is used similarly, in that an END | |
65aa4ca7 FC |
870 | # node can't be optimized. So END implies "unoptimizable" and PSEUDO |
871 | # means "not seen anything to optimize yet". |
89829bb5 | 872 | PSEUDO off Pseudo opcode for internal use. |
65aa4ca7 | 873 | |
86451f01 KW |
874 | REGEX_SET depth p Regex set, temporary node used in pre- |
875 | optimization compilation | |
876 | ||
65aa4ca7 | 877 | =for regcomp.pl end |
055fd3a9 | 878 | |
1c102323 MJD |
879 | =for unprinted-credits |
880 | Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 | |
881 | ||
882 | Following the optimizer information is a dump of the offset/length | |
883 | table, here split across several lines: | |
884 | ||
885 | Offsets: [45] | |
886 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] | |
887 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] | |
888 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] | |
889 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] | |
890 | ||
891 | The first line here indicates that the offset/length table contains 45 | |
892 | entries. Each entry is a pair of integers, denoted by C<offset[length]>. | |
17c338f3 | 893 | Entries are numbered starting with 1, so entry #1 here is C<1[4]> and |
1c102323 MJD |
894 | entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:> |
895 | (the C<1: ANYOF[bc]>) begins at character position 1 in the | |
896 | pre-compiled form of the regex, and has a length of 4 characters. | |
897 | C<5[1]> in position 12 | |
898 | indicates that the node labeled C<12:> | |
899 | (the C<< 12: EXACT <d> >>) begins at character position 5 in the | |
900 | pre-compiled form of the regex, and has a length of 1 character. | |
901 | C<12[1]> in position 14 | |
902 | indicates that the node labeled C<14:> | |
903 | (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the | |
904 | pre-compiled form of the regex, and has a length of 1 character---that | |
905 | is, it corresponds to the C<+> symbol in the precompiled regex. | |
906 | ||
907 | C<0[0]> items indicate that there is no corresponding node. | |
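
As a rough illustration, the table can be decoded by hand. The sketch below
assumes that the dump above belongs to the pattern C<[bc]d(ef*g)+h[ij]k$>
(the pattern shown in the run-time example); the variable names are purely
illustrative and are not part of any Perl API. It maps each non-empty entry
back to the piece of the pattern it covers.

```perl
use strict;
use warnings;

# Assumption: this is the pattern the offsets were dumped for.
my $pattern = '[bc]d(ef*g)+h[ij]k$';

# The offset[length] entries copied from the dump, without the header line.
my $dump = <<'END_DUMP';
1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
END_DUMP

my @pairs   = $dump =~ /(\d+)\[(\d+)\]/g;   # flat list: offset, length, ...
my $entries = @pairs / 2;
my %text_for;                               # entry number => pattern text

for my $entry (1 .. $entries) {
    my ($offset, $length) = @pairs[ 2 * ($entry - 1), 2 * ($entry - 1) + 1 ];
    next if $offset == 0 && $length == 0;   # 0[0]: no corresponding node
    # Offsets are 1-based positions in the pattern; substr() is 0-based.
    $text_for{$entry} = substr $pattern, $offset - 1, $length;
}

printf "%2d: '%s'\n", $_, $text_for{$_} for sort { $a <=> $b } keys %text_for;
```

Entries #1, #12 and #14 should come out as the pieces of the pattern
described in the text above.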
908 | ||
7b406369 | 909 | =head2 Run-time Output |
055fd3a9 GS |
910 | |
911 | First of all, when doing a match, one may get no run-time output even | |
912 | if debugging is enabled. This means that the regex engine was never | |
913 | entered and that the whole job was therefore done by the optimizer. |
914 | ||
915 | If the regex engine was entered, the output may look like this: | |
916 | ||
ccf3535a | 917 | Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__' |
055fd3a9 GS |
918 | Setting an EVAL scope, savestack=3 |
919 | 2 <ab> <cdefg__gh_> | 1: ANYOF | |
920 | 3 <abc> <defg__gh_> | 11: EXACT <d> | |
921 | 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} | |
922 | 4 <abcd> <efg__gh_> | 26: WHILEM | |
923 | 0 out of 1..32767 cc=effff31c | |
924 | 4 <abcd> <efg__gh_> | 15: OPEN1 | |
925 | 4 <abcd> <efg__gh_> | 17: EXACT <e> | |
926 | 5 <abcde> <fg__gh_> | 19: STAR | |
927 | EXACT <f> can match 1 times out of 32767... | |
928 | Setting an EVAL scope, savestack=3 | |
929 | 6 <bcdef> <g__gh__> | 22: EXACT <g> | |
930 | 7 <bcdefg> <__gh__> | 24: CLOSE1 | |
931 | 7 <bcdefg> <__gh__> | 26: WHILEM | |
932 | 1 out of 1..32767 cc=effff31c | |
933 | Setting an EVAL scope, savestack=12 | |
934 | 7 <bcdefg> <__gh__> | 15: OPEN1 | |
935 | 7 <bcdefg> <__gh__> | 17: EXACT <e> | |
936 | restoring \1 to 4(4)..7 | |
937 | failed, try continuation... | |
938 | 7 <bcdefg> <__gh__> | 27: NOTHING | |
939 | 7 <bcdefg> <__gh__> | 28: EXACT <h> | |
940 | failed... | |
941 | failed... | |
942 | ||
943 | The most significant information in the output is about the particular I<node> | |
944 | of the compiled regex that is currently being tested against the target string. | |
945 | The format of these lines is | |
946 | ||
947 | C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> | |
948 | ||
949 | The I<TYPE> info is indented with respect to the backtracking level. | |
950 | Other incidental information appears interspersed within. | |
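
A trace of this kind can usually be requested with the C<re> pragma. The
following is a minimal sketch: it reuses the pattern and target string from
the example above, and C<$matched> is just an illustrative variable. The
exact text written to STDERR varies between Perl releases, and newer
releases may print nothing for the run-time phase if the optimizer rejects
the match before the engine is entered.

```perl
use strict;
use warnings;

my $target = 'abcdefg__gh__';
my $matched;

{
    # Lexically scoped: only regexes compiled in this block are traced.
    # Both the compile-time and the run-time output go to STDERR.
    use re 'debug';
    $matched = $target =~ /[bc]d(ef*g)+h[ij]k$/ ? 1 : 0;
}

print "matched: $matched\n";
```

Keeping the pragma inside a small block limits the (often very verbose)
output to the one match being investigated.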
951 | ||
7b406369 | 952 | =head1 Debugging Perl Memory Usage |
055fd3a9 GS |
953 | |
954 | Perl is a profligate wastrel when it comes to memory use. There | |
955 | is a saying that to estimate memory usage of Perl, assume a reasonable | |
956 | algorithm for memory allocation, multiply that estimate by 10, and | |
957 | while you still may miss the mark, at least you won't be quite so | |
4375e838 | 958 | astonished. This is not absolutely true, but may provide a good |
055fd3a9 GS |
959 | grasp of what happens. |
960 | ||
961 | Assume that an integer cannot take less than 20 bytes of memory, a | |
962 | float cannot take less than 24 bytes, a string cannot take less | |
963 | than 32 bytes (all these examples assume 32-bit architectures; the |
964 | results are quite a bit worse on 64-bit architectures). If a variable |
965 | is accessed in two of three different ways (which require an integer, | |
966 | a float, or a string), the memory footprint may increase yet another | |
b9449ee0 | 967 | 20 bytes. A sloppy malloc(3) implementation can inflate these |
055fd3a9 GS |
968 | numbers dramatically. |
969 | ||
970 | On the opposite end of the scale, a declaration like | |
971 | ||
972 | sub foo; | |
973 | ||
974 | may take up to 500 bytes of memory, depending on which release of Perl | |
975 | you're running. | |
976 | ||
977 | Anecdotal estimates of source-to-compiled code bloat suggest an | |
978 | eightfold increase. This means that the compiled form of reasonable | |
979 | (normally commented, properly indented etc.) code will take | |
980 | about eight times more space in memory than the code took | |
981 | on disk. | |
982 | ||
b30f304a JH |
983 | The B<-DL> command-line switch has been obsolete since about Perl 5.6.0 |
984 | (it was available only if Perl was built with C<-DDEBUGGING>). | |
985 | The switch was used to track Perl's memory allocations and possible | |
986 | memory leaks. These days the use of malloc debugging tools like | |
5b6a3331 | 987 | F<Purify> or F<valgrind> is suggested instead. See also |
7b406369 | 988 | L<perlhacktips/PERL_MEM_LOG>. |
b30f304a JH |
989 | |
990 | One way to find out how much memory is being used by Perl data | |
991 | structures is to install the Devel::Size module from CPAN: it gives | |
992 | you the minimum number of bytes required to store a particular data | |
993 | structure. Please be mindful of the difference between size() |
994 | and total_size(). |
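
The difference can be seen with a small experiment. This is only a sketch:
it assumes Devel::Size has been installed from CPAN (it is loaded
defensively because it is not a core module), and C<%data> is made-up
sample data. No byte counts are quoted here because they depend on the Perl
version and platform.

```perl
use strict;
use warnings;

# Devel::Size is a CPAN module, so load it defensively.
my $have_devel_size = eval { require Devel::Size; 1 };

# Hypothetical sample data: a hash holding a reference to a 1000-element array.
my %data = ( numbers => [ 1 .. 1000 ], label => 'example' );

my ($shallow, $deep);
if ($have_devel_size) {
    $shallow = Devel::Size::size(\%data);        # the hash only; references not followed
    $deep    = Devel::Size::total_size(\%data);  # the hash plus everything it refers to
    print "size():       $shallow bytes\n";
    print "total_size(): $deep bytes\n";
}
else {
    print "Devel::Size is not installed\n";
}
```

For nested structures total_size() is normally the figure of interest, since
size() leaves out whatever the structure merely points to.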
995 | ||
996 | If Perl has been compiled using Perl's malloc you can analyze Perl | |
7b406369 | 997 | memory usage by setting C<$ENV{PERL_DEBUG_MSTATS}>. |
055fd3a9 GS |
998 | |
999 | =head2 Using C<$ENV{PERL_DEBUG_MSTATS}> | |
1000 | ||
1001 | If your perl is using Perl's malloc() and was compiled with the | |
1002 | necessary switches (this is the default), then it will print memory | |
4375e838 | 1003 | usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} |
055fd3a9 GS |
1004 | > 1 >>, and before termination of the program when C<< |
1005 | $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to | |
1006 | the following example: | |
1007 | ||
f185f654 KW |
1008 | $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" |
1009 | Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) | |
1010 | 14216 free: 130 117 28 7 9 0 2 2 1 0 0 | |
055fd3a9 | 1011 | 437 61 36 0 5 |
f185f654 | 1012 | 60924 used: 125 137 161 55 7 8 6 16 2 0 1 |
055fd3a9 | 1013 | 74 109 304 84 20 |
f185f654 KW |
1014 | Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. |
1015 | Memory allocation statistics after execution: (buckets 4(4)..8188(8192) | |
1016 | 30888 free: 245 78 85 13 6 2 1 3 2 0 1 | |
055fd3a9 | 1017 | 315 162 39 42 11 |
f185f654 | 1018 | 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 |
055fd3a9 | 1019 | 196 178 1066 798 39 |
f185f654 | 1020 | Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. |
055fd3a9 GS |
1021 | |
1022 | It is possible to ask for such a statistic at arbitrary points in | |
b9449ee0 | 1023 | your execution using the mstat() function from the standard |
055fd3a9 GS |
1024 | Devel::Peek module. |
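
A minimal sketch of such a call is shown below. It first consults
C<$Config{usemymalloc}> to see whether Perl's own malloc() appears to be in
use; on a perl built with the system malloc, mstat() is expected to print
only a short notice instead of the statistics. The labels passed to mstat()
and the array C<@numbers> are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use Config;
use Devel::Peek qw(mstat);

# 'y' here suggests that Perl's own malloc is in use.
my $mymalloc = $Config{usemymalloc} // 'unknown';
print STDERR "usemymalloc=$mymalloc\n";

mstat("before allocation");        # report is written to STDERR
my @numbers = (0) x 100_000;       # allocate something noticeable
mstat("after allocation");
```

Comparing the two reports gives a rough idea of what the allocation in
between cost.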
1025 | ||
1026 | Here is some explanation of that format: | |
1027 | ||
13a2d996 | 1028 | =over 4 |
055fd3a9 GS |
1029 | |
1030 | =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> | |
1031 | ||
1032 | Perl's malloc() uses bucketed allocations. Every request is rounded | |
1033 | up to the closest bucket size available, and a bucket is taken from | |
1034 | the pool of buckets of that size. | |
1035 | ||
1036 | The line above describes the limits of buckets currently in use. | |
1037 | Each bucket has two sizes: memory footprint and the maximal size | |
1038 | of user data that can fit into this bucket. Suppose in the above | |
1039 | example that the smallest bucket were size 4. The biggest bucket | |
1040 | would have usable size 8188, and the memory footprint would be 8192. | |
1041 | ||
1042 | In a Perl built for debugging, some buckets may have negative usable | |
1043 | size. This means that these buckets cannot (and will not) be used. | |
1044 | For larger buckets, the memory footprint may be one page greater | |
7b406369 | 1045 | than a power of 2. If so, the corresponding power of two is |
055fd3a9 GS |
1046 | printed in the C<APPROX> field above. |
1047 | ||
1048 | =item Free/Used | |
1049 | ||
1050 | The 1 or 2 rows of numbers following that correspond to the number | |
1051 | of buckets of each size between C<SMALLEST> and C<GREATEST>. In | |
1052 | the first row, the sizes (memory footprints) of buckets are powers | |
1053 | of two--or possibly one page greater. In the second row, if present, | |
1054 | the memory footprints of the buckets are between the memory footprints | |
1055 | of two buckets "above". | |
1056 | ||
4375e838 | 1057 | For example, suppose that in the previous example the memory footprints |
055fd3a9 GS |
1058 | were |
1059 | ||
f185f654 | 1060 | free: 8 16 32 64 128 256 512 1024 2048 4096 8192 |
055fd3a9 GS |
1061 | 4 12 24 48 80 |
1062 | ||
7b406369 | 1063 | With a non-C<DEBUGGING> perl, the buckets starting from C<128> have |
d1be9408 | 1064 | a 4-byte overhead, and thus an 8192-long bucket may take up to |
055fd3a9 GS |
1065 | 8188-byte allocations. |
1066 | ||
1067 | =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> | |
1068 | ||
1069 | The first two fields give the total amount of memory perl sbrk(2)ed | |
1070 | (ess-broken? :-) and number of sbrk(2)s used. The third number is | |
1071 | what perl thinks about continuity of returned chunks. So long as | |
1072 | this number is positive, malloc() will assume that it is probable | |
1073 | that sbrk(2) will provide continuous memory. | |
1074 | ||
1075 | Memory allocated by external libraries is not counted. | |
1076 | ||
1077 | =item C<pad: 0> | |
1078 | ||
1079 | The amount of sbrk(2)ed memory needed to keep buckets aligned. | |
1080 | ||
1081 | =item C<heads: 2192> | |
1082 | ||
1083 | Although memory overhead of bigger buckets is kept inside the bucket, for | |
1084 | smaller buckets, it is kept in separate areas. This field gives the | |
1085 | total size of these areas. | |
1086 | ||
1087 | =item C<chain: 0> | |
1088 | ||
1089 | malloc() may want to subdivide a bigger bucket into smaller buckets. | |
1090 | If only a part of the deceased bucket is left unsubdivided, the rest | |
1091 | is kept as an element of a linked list. This field gives the total | |
1092 | size of these chunks. | |
1093 | ||
1094 | =item C<tail: 6144> | |
1095 | ||
1096 | To minimize the number of sbrk(2)s, malloc() asks for more memory. This | |
1097 | field gives the size of the yet unused part, which is sbrk(2)ed, but | |
1098 | never touched. | |
1099 | ||
1100 | =back | |
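
When several reports have to be compared, the summary line can be split
into the fields described in the list above. The following sketch parses the
"after execution" line from the sample report; the hash C<%stat> and its key
names are illustrative and do not correspond to any interface provided by
Perl.

```perl
use strict;
use warnings;

# Summary line copied from the "after execution" report above.
my $line = 'Total sbrk(): 215040/47:145. '
         . 'Odd ends: pad+heads+chain+tail: 0+2192+0+6144.';

my %stat;
if ( my ($sbrk_part, $odd_part) =
        $line =~ /^Total sbrk\(\): (\S+)\. Odd ends: \S+ (\S+)\.$/ )
{
    # SBRKed/SBRKs:CONTINUOUS
    @stat{qw(sbrked sbrks continuous)} = split m{[/:]}, $sbrk_part;
    # pad+heads+chain+tail
    @stat{qw(pad heads chain tail)} = split /\+/, $odd_part;
}

printf "%-10s %s\n", $_, $stat{$_}
    for qw(sbrked sbrks continuous pad heads chain tail);
```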
1101 | ||
055fd3a9 GS |
1102 | =head1 SEE ALSO |
1103 | ||
1104 | L<perldebug>, | |
65ac759c | 1105 | L<perl5db.pl>, |
055fd3a9 | 1106 | L<perlguts>, |
65ac759c | 1107 | L<perlrun>, |
055fd3a9 GS |
1108 | L<re>, |
1109 | and | |
fe854a6f | 1110 | L<Devel::DProf>. |