Commit | Line | Data |
---|---|---|
055fd3a9 GS |
1 | =head1 NAME |
2 | ||
3 | perldebguts - Guts of Perl debugging | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | This is not the perldebug(1) manpage, which tells you how to use | |
8 | the debugger. This manpage describes low-level details ranging | |
9 | between difficult and impossible for anyone who isn't incredibly | |
10 | intimate with Perl's guts to understand. Caveat lector. | |
11 | ||
12 | =head1 Debugger Internals | |
13 | ||
14 | Perl has special debugging hooks at compile-time and run-time used | |
15 | to create debugging environments. These hooks are not to be confused | |
4375e838 GS |
16 | with the I<perl -Dxxx> command described in L<perlrun>, which is |
17 | usable only if a special Perl is built per the instructions in the | |
055fd3a9 GS |
18 | F<INSTALL> podpage in the Perl source tree. |
19 | ||
20 | For example, whenever you call Perl's built-in C<caller> function | |
21 | from the package DB, the arguments that the corresponding stack | |
106325ad | 22 | frame was called with are copied to the @DB::args array. The |
055fd3a9 GS |
23 | general mechanism is enabled by calling Perl with the B<-d> switch. Specifically, the |
24 | following additional features are enabled (cf. L<perlvar/$^P>): | |
25 | ||
13a2d996 | 26 | =over 4 |
055fd3a9 GS |
27 | |
28 | =item * | |
29 | ||
30 | Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require | |
31 | 'perl5db.pl'}> if not present) before the first line of your program. | |
32 | ||
33 | =item * | |
34 | ||
aa0b556f MJD |
35 | Each array C<@{"_<$filename"}> holds the lines of $filename for a |
36 | file compiled by Perl. The same for C<eval>ed strings that contain | |
055fd3a9 GS |
37 | subroutines, or which are currently being executed. The $filename |
38 | for C<eval>ed strings looks like C<(eval 34)>. Code assertions | |
8894c26d MJD |
39 | in regexes look like C<(re_eval 19)>. |
40 | ||
41 | Values in this array are magical in numeric context: they compare | |
42 | equal to zero only if the line is not breakable. | |
055fd3a9 GS |
43 | |
44 | =item * | |
45 | ||
aa0b556f | 46 | Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed |
055fd3a9 GS |
47 | by line number. Individual entries (as opposed to the whole hash) |
48 | are settable. Perl only cares about Boolean true here, although | |
49 | the values used by F<perl5db.pl> have the form | |
8894c26d | 50 | C<"$break_condition\0$action">. |
055fd3a9 GS |
51 | |
52 | The same holds for evaluated strings that contain subroutines, or | |
53 | which are currently being executed. The $filename for C<eval>ed strings | |
54 | looks like C<(eval 34)> or C<(re_eval 19)>. | |
55 | ||
56 | =item * | |
57 | ||
aa0b556f | 58 | Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is |
055fd3a9 GS |
59 | also the case for evaluated strings that contain subroutines, or |
60 | which are currently being executed. The $filename for C<eval>ed | |
61 | strings looks like C<(eval 34)> or C<(re_eval 19)>. | |
62 | ||
63 | =item * | |
64 | ||
65 | After each C<require>d file is compiled, but before it is executed, | |
66 | C<DB::postponed(*{"_<$filename"})> is called if the subroutine | |
67 | C<DB::postponed> exists. Here, the $filename is the expanded name of | |
68 | the C<require>d file, as found in the values of %INC. | |
69 | ||
70 | =item * | |
71 | ||
72 | After each subroutine C<subname> is compiled, the existence of | |
73 | C<$DB::postponed{subname}> is checked. If this key exists, | |
74 | C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine | |
75 | also exists. | |
76 | ||
77 | =item * | |
78 | ||
79 | A hash C<%DB::sub> is maintained, whose keys are subroutine names | |
80 | and whose values have the form C<filename:startline-endline>. | |
81 | C<filename> has the form C<(eval 34)> for subroutines defined inside | |
82 | C<eval>s, or C<(re_eval 19)> for those within regex code assertions. | |
83 | ||
84 | =item * | |
85 | ||
86 | When the execution of your program reaches a point that can hold a | |
87 | breakpoint, the C<DB::DB()> subroutine is called if any of the variables | |
88 | $DB::trace, $DB::single, or $DB::signal is true. These variables | |
89 | are not C<local>izable. This feature is disabled when executing | |
90 | inside C<DB::DB()>, including functions called from it | |
91 | unless C<< $^D & (1<<30) >> is true. | |
92 | ||
93 | =item * | |
94 | ||
95 | When execution of the program reaches a subroutine call, a call to | |
96 | C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the | |
97 | name of the called subroutine. (This doesn't happen if the subroutine | |
98 | was compiled in the C<DB> package.) | |
99 | ||
100 | =back | |
101 | ||
102 | Note that if C<&DB::sub> needs external data for it to work, no | |
103 | subroutine call is possible until this is done. For the standard | |
104 | debugger, the C<$DB::deep> variable (how many levels of recursion | |
105 | deep into the debugger you can go before a mandatory break) gives | |
106 | an example of such a dependency. | |
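
The value formats described in the list above can be taken apart with ordinary string operations. The following is a minimal sketch, not code taken from F<perl5db.pl>: the helper names C<parse_sub_location> and C<parse_breakpoint> and the sample values are hypothetical, and it runs on hand-written data without the B<-d> switch.

```perl
use strict;
use warnings;

# Hypothetical helper: split a %DB::sub value of the form
# "filename:startline-endline" into its three parts.
sub parse_sub_location {
    my ($location) = @_;
    # The filename may look like "(eval 34)" or contain colons itself,
    # so anchor on the trailing ":start-end" part.
    my ($file, $start, $end) = $location =~ /^(.*):(\d+)-(\d+)$/s
        or return;
    return ($file, $start, $end);
}

# Hypothetical helper: split a perl5db.pl-style breakpoint entry,
# "$break_condition\0$action", into condition and action.
sub parse_breakpoint {
    my ($entry) = @_;
    my ($condition, $action) = split /\0/, $entry, 2;
    return ($condition, $action);
}

my @loc = parse_sub_location('(eval 34):2-5');
print "file=$loc[0] start=$loc[1] end=$loc[2]\n";

my ($cond, $act) = parse_breakpoint("\$x > 3\0print \$x");
print "condition=[$cond] action=[$act]\n";
```

Remember that Perl itself only tests these hash entries for truth; the condition/action split is a convention of the standard debugger.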
107 | ||
108 | =head2 Writing Your Own Debugger | |
109 | ||
110 | The minimal working debugger consists of one line | |
111 | ||
112 | sub DB::DB {} | |
113 | ||
114 | which is quite handy as the contents of the C<PERL5DB> environment | |
115 | variable: | |
116 | ||
117 | $ PERL5DB="sub DB::DB {}" perl -d your-script | |
118 | ||
119 | Another brief debugger, slightly more useful, could be created | |
120 | with only the line: | |
121 | ||
122 | sub DB::DB {print ++$i; scalar <STDIN>} | |
123 | ||
124 | This debugger would print the sequential number of each encountered | |
125 | statement, and would wait for you to hit a newline before continuing. | |
126 | ||
127 | The following debugger is quite functional: | |
128 | ||
129 | { | |
130 | package DB; | |
131 | sub DB {} | |
132 | sub sub {print ++$i, " $sub\n"; &$sub} | |
133 | } | |
134 | ||
135 | It prints the sequential number of each subroutine call and the name of the | |
136 | called subroutine. Note that C<&DB::sub> should be compiled into the | |
137 | package C<DB>. | |
138 | ||
139 | At the start, the debugger reads your rc file (F<./.perldb> or | |
140 | F<~/.perldb> under Unix), which can set important options. This file may | |
141 | define a subroutine C<&afterinit> to be executed after the debugger is | |
142 | initialized. | |
143 | ||
144 | After the rc file is read, the debugger reads the PERLDB_OPTS | |
145 | environment variable and parses this as the remainder of a C<O ...> | |
146 | line as one might enter at the debugger prompt. | |
147 | ||
148 | The debugger also maintains magical internal variables, such as | |
149 | C<@DB::dbline>, C<%DB::dbline>, which are aliases for | |
150 | C<@{"::_<current_file"}> and C<%{"::_<current_file"}>. Here C<current_file> | |
151 | is the currently selected file, either explicitly chosen with the | |
152 | debugger's C<f> command, or implicitly by flow of execution. | |
153 | ||
154 | Some functions are provided to simplify customization. See | |
155 | L<perldebug/"Options"> for description of options parsed by | |
156 | C<DB::parse_options(string)>. The function C<DB::dump_trace(skip[, | |
157 | count])> skips the specified number of frames and returns a list | |
158 | containing information about the calling frames (all of them, if | |
106325ad | 159 | C<count> is missing). Each entry is a reference to a hash with |
055fd3a9 GS |
160 | keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine |
161 | name, or info about C<eval>), C<args> (C<undef> or a reference to | |
162 | an array), C<file>, and C<line>. | |
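
To make the shape of these records concrete, here is a small sketch that formats hashes with the keys listed above. It is an illustration only: C<format_frame> is a hypothetical helper, the sample records are hand-written rather than returned by C<DB::dump_trace>, and the output layout merely imitates the frame listings, without claiming to match the debugger's own formatting code.

```perl
use strict;
use warnings;

# Hypothetical formatter for frame records shaped like the hashes
# described above (keys: context, sub, args, file, line).
sub format_frame {
    my ($frame) = @_;
    # args is either undef or a reference to an array.
    my $args = defined $frame->{args}
        ? join(', ', map { "'$_'" } @{ $frame->{args} })
        : '';
    return sprintf '%s=%s(%s) from %s:%s',
        $frame->{context}, $frame->{sub}, $args,
        $frame->{file}, $frame->{line};
}

# Hand-written sample data; a real list would come from DB::dump_trace.
my @frames = (
    { context => '$', sub => 'Config::TIEHASH', args => ['Config'],
      file => 'lib/Config.pm', line => 644 },
    { context => '@', sub => 'Config::myconfig', args => undef,
      file => '/dev/null', line => 0 },
);

print format_frame($_), "\n" for @frames;
```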
163 | ||
164 | The function C<DB::print_trace(FH, skip[, count[, short]])> prints | |
165 | formatted info about caller frames. The last two functions may be | |
166 | convenient as arguments to the C<< < >> and C<< << >> commands. | |
167 | ||
168 | Note that any variables and functions that are not documented in | |
169 | this manpage (or in L<perldebug>) are considered for internal | |
170 | use only, and as such are subject to change without notice. | |
171 | ||
172 | =head1 Frame Listing Output Examples | |
173 | ||
174 | The C<frame> option can be used to control the output of frame | |
175 | information. For example, contrast this expression trace: | |
176 | ||
177 | $ perl -de 42 | |
178 | Stack dump during die enabled outside of evals. | |
179 | ||
180 | Loading DB routines from perl5db.pl patch level 0.94 | |
181 | Emacs support available. | |
182 | ||
183 | Enter h or `h h' for help. | |
184 | ||
185 | main::(-e:1): 0 | |
186 | DB<1> sub foo { 14 } | |
187 | ||
188 | DB<2> sub bar { 3 } | |
189 | ||
190 | DB<3> t print foo() * bar() | |
191 | main::((eval 172):3): print foo() * bar(); | |
192 | main::foo((eval 168):2): | |
193 | main::bar((eval 170):2): | |
194 | 42 | |
195 | ||
196 | with this one, once the C<O>ption C<frame=2> has been set: | |
197 | ||
198 | DB<4> O f=2 | |
199 | frame = '2' | |
200 | DB<5> t print foo() * bar() | |
201 | 3: foo() * bar() | |
202 | entering main::foo | |
203 | 2: sub foo { 14 }; | |
204 | exited main::foo | |
205 | entering main::bar | |
206 | 2: sub bar { 3 }; | |
207 | exited main::bar | |
208 | 42 | |
209 | ||
210 | By way of demonstration, we present below a laborious listing | |
211 | resulting from setting your C<PERLDB_OPTS> environment variable to | |
212 | the value C<f=n N>, and running I<perl -d -V> from the command line. | |
213 | Examples using various values of C<n> are shown to give you a feel | |
214 | for the difference between settings. Long though it may be, this | |
215 | is not a complete listing, but only excerpts. | |
216 | ||
217 | =over 4 | |
218 | ||
219 | =item 1 | |
220 | ||
221 | entering main::BEGIN | |
222 | entering Config::BEGIN | |
223 | Package lib/Exporter.pm. | |
224 | Package lib/Carp.pm. | |
225 | Package lib/Config.pm. | |
226 | entering Config::TIEHASH | |
227 | entering Exporter::import | |
228 | entering Exporter::export | |
229 | entering Config::myconfig | |
230 | entering Config::FETCH | |
231 | entering Config::FETCH | |
232 | entering Config::FETCH | |
233 | entering Config::FETCH | |
234 | ||
235 | =item 2 | |
236 | ||
237 | entering main::BEGIN | |
238 | entering Config::BEGIN | |
239 | Package lib/Exporter.pm. | |
240 | Package lib/Carp.pm. | |
241 | exited Config::BEGIN | |
242 | Package lib/Config.pm. | |
243 | entering Config::TIEHASH | |
244 | exited Config::TIEHASH | |
245 | entering Exporter::import | |
246 | entering Exporter::export | |
247 | exited Exporter::export | |
248 | exited Exporter::import | |
249 | exited main::BEGIN | |
250 | entering Config::myconfig | |
251 | entering Config::FETCH | |
252 | exited Config::FETCH | |
253 | entering Config::FETCH | |
254 | exited Config::FETCH | |
255 | entering Config::FETCH | |
256 | ||
257 | =item 4 | |
258 | ||
259 | in $=main::BEGIN() from /dev/null:0 | |
260 | in $=Config::BEGIN() from lib/Config.pm:2 | |
261 | Package lib/Exporter.pm. | |
262 | Package lib/Carp.pm. | |
263 | Package lib/Config.pm. | |
264 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
265 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
266 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li | |
267 | in @=Config::myconfig() from /dev/null:0 | |
268 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
269 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
270 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
271 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 | |
272 | in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 | |
273 | in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 | |
274 | ||
275 | =item 6 | |
276 | ||
277 | in $=main::BEGIN() from /dev/null:0 | |
278 | in $=Config::BEGIN() from lib/Config.pm:2 | |
279 | Package lib/Exporter.pm. | |
280 | Package lib/Carp.pm. | |
281 | out $=Config::BEGIN() from lib/Config.pm:0 | |
282 | Package lib/Config.pm. | |
283 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
284 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
285 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
286 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ | |
287 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ | |
288 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
289 | out $=main::BEGIN() from /dev/null:0 | |
290 | in @=Config::myconfig() from /dev/null:0 | |
291 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
292 | out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 | |
293 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
294 | out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 | |
295 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
296 | out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 | |
297 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 | |
298 | ||
299 | =item 14 | |
300 | ||
301 | in $=main::BEGIN() from /dev/null:0 | |
302 | in $=Config::BEGIN() from lib/Config.pm:2 | |
303 | Package lib/Exporter.pm. | |
304 | Package lib/Carp.pm. | |
305 | out $=Config::BEGIN() from lib/Config.pm:0 | |
306 | Package lib/Config.pm. | |
307 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
308 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 | |
309 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
310 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E | |
311 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E | |
312 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
313 | out $=main::BEGIN() from /dev/null:0 | |
314 | in @=Config::myconfig() from /dev/null:0 | |
315 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 | |
316 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 | |
317 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 | |
318 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 | |
319 | ||
320 | =item 30 | |
321 | ||
322 | in $=CODE(0x15eca4)() from /dev/null:0 | |
323 | in $=CODE(0x182528)() from lib/Config.pm:2 | |
324 | Package lib/Exporter.pm. | |
325 | out $=CODE(0x182528)() from lib/Config.pm:0 | |
326 | scalar context return from CODE(0x182528): undef | |
327 | Package lib/Config.pm. | |
328 | in $=Config::TIEHASH('Config') from lib/Config.pm:628 | |
329 | out $=Config::TIEHASH('Config') from lib/Config.pm:628 | |
330 | scalar context return from Config::TIEHASH: empty hash | |
331 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
332 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 | |
333 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 | |
334 | scalar context return from Exporter::export: '' | |
335 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 | |
336 | scalar context return from Exporter::import: '' | |
337 | ||
338 | =back | |
339 | ||
340 | In all cases shown above, the line indentation shows the call tree. | |
341 | If bit 2 of C<frame> is set, a line is printed on exit from a | |
342 | subroutine as well. If bit 4 is set, the arguments are printed | |
343 | along with the caller info. If bit 8 is set, the arguments are | |
344 | printed even if they are tied or references. If bit 16 is set, the | |
345 | return value is printed, too. | |
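
Since the C<frame> value is a sum of flag values, a setting such as C<14> or C<30> can be decoded mechanically. The sketch below does this for the four flags named above; C<describe_frame> and the wording of the descriptions are our own and are not part of the debugger.

```perl
use strict;
use warnings;

# Flag values as given in the text above; the descriptions are ours.
my @FRAME_BITS = (
    [  2 => 'print a line on exit from a subroutine' ],
    [  4 => 'print arguments along with the caller info' ],
    [  8 => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# Hypothetical helper: list the effects enabled by a frame value.
sub describe_frame {
    my ($value) = @_;
    return map { $_->[1] } grep { $value & $_->[0] } @FRAME_BITS;
}

for my $n (2, 6, 14, 30) {
    print "frame=$n\n";
    print "  $_\n" for describe_frame($n);
}
```

With this reading, C<frame=14> is 2+4+8 and C<frame=30> adds the return-value flag, which agrees with the listings shown for those settings.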
346 | ||
347 | When a package is compiled, a line like this | |
348 | ||
349 | Package lib/Carp.pm. | |
350 | ||
351 | is printed with proper indentation. | |
352 | ||
353 | =head1 Debugging regular expressions | |
354 | ||
355 | There are two ways to enable debugging output for regular expressions. | |
356 | ||
357 | If your perl is compiled with C<-DDEBUGGING>, you may use the | |
358 | B<-Dr> flag on the command line. | |
359 | ||
360 | Otherwise, one can C<use re 'debug'>, which has effects at | |
361 | compile time and run time. It is not lexically scoped. | |
362 | ||
363 | =head2 Compile-time output | |
364 | ||
365 | The debugging output at compile time looks like this: | |
366 | ||
1c102323 MJD |
367 | Compiling REx `[bc]d(ef*g)+h[ij]k$' |
368 | size 45 Got 364 bytes for offset annotations. | |
369 | first at 1 | |
370 | rarest char g at 0 | |
371 | rarest char d at 0 | |
372 | 1: ANYOF[bc](12) | |
373 | 12: EXACT <d>(14) | |
374 | 14: CURLYX[0] {1,32767}(28) | |
375 | 16: OPEN1(18) | |
376 | 18: EXACT <e>(20) | |
377 | 20: STAR(23) | |
378 | 21: EXACT <f>(0) | |
379 | 23: EXACT <g>(25) | |
380 | 25: CLOSE1(27) | |
381 | 27: WHILEM[1/1](0) | |
382 | 28: NOTHING(29) | |
383 | 29: EXACT <h>(31) | |
384 | 31: ANYOF[ij](42) | |
385 | 42: EXACT <k>(44) | |
386 | 44: EOL(45) | |
387 | 45: END(0) | |
388 | anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) | |
389 | stclass `ANYOF[bc]' minlen 7 | |
390 | Offsets: [45] | |
391 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] | |
392 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] | |
393 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] | |
394 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] | |
395 | Omitting $` $& $' support. | |
055fd3a9 GS |
396 | |
397 | The first line shows the pre-compiled form of the regex. The second | |
398 | shows the size of the compiled form (in arbitrary units, usually | |
1c102323 MJD |
399 | 4-byte words) and the total number of bytes allocated for the |
400 | offset/length table, usually 4+C<size>*8. The next line shows the | |
401 | label I<id> of the first node that does a match. | |
055fd3a9 | 402 | |
1c102323 MJD |
403 | The |
404 | ||
405 | anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) | |
406 | stclass `ANYOF[bc]' minlen 7 | |
407 | ||
408 | line (split into two lines above) contains optimizer | |
055fd3a9 GS |
409 | information. In the example shown, the optimizer found that the match |
410 | should contain a substring C<de> at offset 1, plus substring C<gh> | |
411 | at some offset between 3 and infinity. Moreover, when checking for | |
412 | these substrings (to abandon impossible matches quickly), Perl will check | |
413 | for the substring C<gh> before checking for the substring C<de>. The | |
414 | optimizer may also use the knowledge that the match starts (at the | |
1c102323 MJD |
415 | C<first> I<id>) with a character class, and no string |
416 | shorter than 7 characters can possibly match. | |
055fd3a9 | 417 | |
1c102323 | 418 | The fields of interest which may appear in this line are |
055fd3a9 | 419 | |
13a2d996 | 420 | =over 4 |
055fd3a9 GS |
421 | |
422 | =item C<anchored> I<STRING> C<at> I<POS> | |
423 | ||
424 | =item C<floating> I<STRING> C<at> I<POS1..POS2> | |
425 | ||
426 | See above. | |
427 | ||
428 | =item C<matching floating/anchored> | |
429 | ||
430 | Which substring to check first. | |
431 | ||
432 | =item C<minlen> | |
433 | ||
434 | The minimal length of the match. | |
435 | ||
436 | =item C<stclass> I<TYPE> | |
437 | ||
438 | Type of first matching node. | |
439 | ||
440 | =item C<noscan> | |
441 | ||
442 | Don't scan for the found substrings. | |
443 | ||
444 | =item C<isall> | |
445 | ||
1c102323 | 446 | Means that the optimizer information is all that the regular |
055fd3a9 GS |
447 | expression contains, and thus one does not need to enter the regex engine at |
448 | all. | |
449 | ||
450 | =item C<GPOS> | |
451 | ||
452 | Set if the pattern contains C<\G>. | |
453 | ||
454 | =item C<plus> | |
455 | ||
456 | Set if the pattern starts with a repeated char (as in C<x+y>). | |
457 | ||
458 | =item C<implicit> | |
459 | ||
460 | Set if the pattern starts with C<.*>. | |
461 | ||
462 | =item C<with eval> | |
463 | ||
464 | Set if the pattern contains eval-groups, such as C<(?{ code })> and | |
465 | C<(??{ code })>. | |
466 | ||
467 | =item C<anchored(TYPE)> | |
468 | ||
469 | If the pattern may match only at a handful of places, with C<TYPE> | |
470 | being C<BOL>, C<MBOL>, or C<GPOS>. See the table below. | |
471 | ||
472 | =back | |
473 | ||
474 | If a substring is known to match at end-of-line only, it may be | |
475 | followed by C<$>, as in C<floating `k'$>. | |
476 | ||
1c102323 MJD |
477 | The optimizer-specific information is used to avoid entering (a slow) regex |
478 | engine on strings that definitely will not match. If the C<isall> flag | |
055fd3a9 GS |
479 | is set, a call to the regex engine may be avoided even when the optimizer |
480 | found an appropriate place for the match. | |
481 | ||
1c102323 | 482 | Above the optimizer section is the list of I<nodes> of the compiled |
055fd3a9 GS |
483 | form of the regex. Each line has the format |
484 | ||
485 | C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) | |
486 | ||
487 | =head2 Types of nodes | |
488 | ||
489 | Here are the possible types, with short descriptions: | |
490 | ||
491 | # TYPE arg-description [num-args] [longjump-len] DESCRIPTION | |
492 | ||
493 | # Exit points | |
494 | END no End of program. | |
495 | SUCCEED no Return from a subroutine, basically. | |
496 | ||
497 | # Anchors: | |
498 | BOL no Match "" at beginning of line. | |
499 | MBOL no Same, assuming multiline. | |
500 | SBOL no Same, assuming singleline. | |
501 | EOS no Match "" at end of string. | |
502 | EOL no Match "" at end of line. | |
503 | MEOL no Same, assuming multiline. | |
504 | SEOL no Same, assuming singleline. | |
505 | BOUND no Match "" at any word boundary | |
506 | BOUNDL no Match "" at any word boundary | |
507 | NBOUND no Match "" at any word non-boundary | |
508 | NBOUNDL no Match "" at any word non-boundary | |
509 | GPOS no Matches where last m//g left off. | |
510 | ||
511 | # [Special] alternatives | |
512 | ANY no Match any one character (except newline). | |
513 | SANY no Match any one character. | |
514 | ANYOF sv Match character in (or not in) this class. | |
515 | ALNUM no Match any alphanumeric character | |
516 | ALNUML no Match any alphanumeric char in locale | |
517 | NALNUM no Match any non-alphanumeric character | |
518 | NALNUML no Match any non-alphanumeric char in locale | |
519 | SPACE no Match any whitespace character | |
520 | SPACEL no Match any whitespace char in locale | |
521 | NSPACE no Match any non-whitespace character | |
522 | NSPACEL no Match any non-whitespace char in locale | |
523 | DIGIT no Match any numeric character | |
524 | NDIGIT no Match any non-numeric character | |
525 | ||
526 | # BRANCH The set of branches constituting a single choice are hooked | |
527 | # together with their "next" pointers, since precedence prevents | |
528 | # anything being concatenated to any individual branch. The | |
529 | # "next" pointer of the last BRANCH in a choice points to the | |
530 | # thing following the whole choice. This is also where the | |
531 | # final "next" pointer of each individual branch points; each | |
532 | # branch starts with the operand node of a BRANCH node. | |
533 | # | |
534 | BRANCH node Match this alternative, or the next... | |
535 | ||
536 | # BACK Normal "next" pointers all implicitly point forward; BACK | |
537 | # exists to make loop structures possible. | |
538 | # not used | |
539 | BACK no Match "", "next" ptr points backward. | |
540 | ||
541 | # Literals | |
542 | EXACT sv Match this string (preceded by length). | |
543 | EXACTF sv Match this string, folded (prec. by length). | |
544 | EXACTFL sv Match this string, folded in locale (w/len). | |
545 | ||
546 | # Do nothing | |
547 | NOTHING no Match empty string. | |
548 | # A variant of above which delimits a group, thus stops optimizations | |
549 | TAIL no Match empty string. Can jump here from outside. | |
550 | ||
551 | # STAR,PLUS '?', and complex '*' and '+', are implemented as circular | |
552 | # BRANCH structures using BACK. Simple cases (one character | |
553 | # per match) are implemented with STAR and PLUS for speed | |
554 | # and to minimize recursive plunges. | |
555 | # | |
556 | STAR node Match this (simple) thing 0 or more times. | |
557 | PLUS node Match this (simple) thing 1 or more times. | |
558 | ||
559 | CURLY sv 2 Match this simple thing {n,m} times. | |
560 | CURLYN no 2 Match next-after-this simple thing | |
561 | # {n,m} times, set parens. | |
562 | CURLYM no 2 Match this medium-complex thing {n,m} times. | |
563 | CURLYX sv 2 Match this complex thing {n,m} times. | |
564 | ||
565 | # This terminator creates a loop structure for CURLYX | |
566 | WHILEM no Do curly processing and see if rest matches. | |
567 | ||
568 | # OPEN,CLOSE,GROUPP ...are numbered at compile time. | |
569 | OPEN num 1 Mark this point in input as start of #n. | |
570 | CLOSE num 1 Analogous to OPEN. | |
571 | ||
572 | REF num 1 Match some already matched string | |
573 | REFF num 1 Match already matched string, folded | |
574 | REFFL num 1 Match already matched string, folded in loc. | |
575 | ||
576 | # grouping assertions | |
577 | IFMATCH off 1 2 Succeeds if the following matches. | |
578 | UNLESSM off 1 2 Fails if the following matches. | |
579 | SUSPEND off 1 1 "Independent" sub-regex. | |
580 | IFTHEN off 1 1 Switch, should be preceded by switcher . | |
581 | GROUPP num 1 Whether the group matched. | |
582 | ||
583 | # Support for long regex | |
584 | LONGJMP off 1 1 Jump far away. | |
585 | BRANCHJ off 1 1 BRANCH with long offset. | |
586 | ||
587 | # The heavy worker | |
588 | EVAL evl 1 Execute some Perl code. | |
589 | ||
590 | # Modifiers | |
591 | MINMOD no Next operator is not greedy. | |
592 | LOGICAL no Next opcode should set the flag only. | |
593 | ||
594 | # This is not used yet | |
595 | RENUM off 1 1 Group with independently numbered parens. | |
596 | ||
597 | # This is not really a node, but an optimized away piece of a "long" node. | |
598 | # To simplify debugging output, we mark it as if it were a node | |
599 | OPTIMIZED off Placeholder for dump. | |
600 | ||
1c102323 MJD |
601 | =for unprinted-credits |
602 | Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 | |
603 | ||
604 | Following the optimizer information is a dump of the offset/length | |
605 | table, here split across several lines: | |
606 | ||
607 | Offsets: [45] | |
608 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] | |
609 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] | |
610 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] | |
611 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] | |
612 | ||
613 | The first line here indicates that the offset/length table contains 45 | |
614 | entries. Each entry is a pair of integers, denoted by C<offset[length]>. | |
615 | Entries are numbered starting with 1, so entry #1 here is C<1[4]> and | |
616 | entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:> | |
617 | (the C<1: ANYOF[bc]>) begins at character position 1 in the | |
618 | pre-compiled form of the regex, and has a length of 4 characters. | |
619 | C<5[1]> in position 12 | |
620 | indicates that the node labeled C<12:> | |
621 | (the C<< 12: EXACT <d> >>) begins at character position 5 in the | |
622 | pre-compiled form of the regex, and has a length of 1 character. | |
623 | C<12[1]> in position 14 | |
624 | indicates that the node labeled C<14:> | |
625 | (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the | |
626 | pre-compiled form of the regex, and has a length of 1 character---that | |
627 | is, it corresponds to the C<+> symbol in the precompiled regex. | |
628 | ||
629 | C<0[0]> items indicate that there is no corresponding node. | |
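
The correspondence between table entries and the source regex can be checked with a few lines of Perl. This sketch pastes in the dump from the example above and, for every non-empty entry, extracts the matching piece of the pre-compiled regex. The variable names are ours; it does not read the table from a live regex.

```perl
use strict;
use warnings;

my $regex = '[bc]d(ef*g)+h[ij]k$';

# The offset/length dump from the example above, as one string.
my $dump = join ' ',
    '1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]',
    '0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]',
    '11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]',
    '0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]';

# Map each node id (entries are numbered from 1) to the piece of the
# source regex it came from.  Offsets in the dump are 1-based.
my %source_for;
my $id = 0;
for my $entry (split ' ', $dump) {
    $id++;
    my ($offset, $length) = $entry =~ /^(\d+)\[(\d+)\]$/
        or die "unexpected entry '$entry'";
    next if $offset == 0;    # 0[0]: no corresponding node
    $source_for{$id} = substr $regex, $offset - 1, $length;
}

printf "%2d: '%s'\n", $_, $source_for{$_}
    for sort { $a <=> $b } keys %source_for;
```

Entries with a length of 0, such as C<12[0]>, yield an empty string: the node has a position in the source but no text of its own.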
630 | ||
055fd3a9 GS |
631 | =head2 Run-time output |
632 | ||
633 | First of all, when doing a match, one may get no run-time output even | |
634 | if debugging is enabled. This means that the regex engine was never | |
635 | entered and that all of the job was therefore done by the optimizer. | |
636 | ||
637 | If the regex engine was entered, the output may look like this: | |
638 | ||
639 | Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__' | |
640 | Setting an EVAL scope, savestack=3 | |
641 | 2 <ab> <cdefg__gh_> | 1: ANYOF | |
642 | 3 <abc> <defg__gh_> | 11: EXACT <d> | |
643 | 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} | |
644 | 4 <abcd> <efg__gh_> | 26: WHILEM | |
645 | 0 out of 1..32767 cc=effff31c | |
646 | 4 <abcd> <efg__gh_> | 15: OPEN1 | |
647 | 4 <abcd> <efg__gh_> | 17: EXACT <e> | |
648 | 5 <abcde> <fg__gh_> | 19: STAR | |
649 | EXACT <f> can match 1 times out of 32767... | |
650 | Setting an EVAL scope, savestack=3 | |
651 | 6 <bcdef> <g__gh__> | 22: EXACT <g> | |
652 | 7 <bcdefg> <__gh__> | 24: CLOSE1 | |
653 | 7 <bcdefg> <__gh__> | 26: WHILEM | |
654 | 1 out of 1..32767 cc=effff31c | |
655 | Setting an EVAL scope, savestack=12 | |
656 | 7 <bcdefg> <__gh__> | 15: OPEN1 | |
657 | 7 <bcdefg> <__gh__> | 17: EXACT <e> | |
658 | restoring \1 to 4(4)..7 | |
659 | failed, try continuation... | |
660 | 7 <bcdefg> <__gh__> | 27: NOTHING | |
661 | 7 <bcdefg> <__gh__> | 28: EXACT <h> | |
662 | failed... | |
663 | failed... | |
664 | ||
665 | The most significant information in the output is about the particular I<node> | |
666 | of the compiled regex that is currently being tested against the target string. | |
667 | The format of these lines is | |
668 | ||
669 | C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> | |
670 | ||
671 | The I<TYPE> info is indented with respect to the backtracking level. | |
672 | Other incidental information appears interspersed within. | |
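
The node lines follow the format above closely enough to be picked out with a regular expression. The following sketch assumes that format and nothing more. C<parse_trace_line> is a hypothetical helper, and incidental lines such as C<failed...> are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical parser for one node line of the run-time trace:
#   STRING-OFFSET <PRE-STRING> <POST-STRING> |ID: TYPE
# Returns a hash reference, or nothing for lines of any other shape.
sub parse_trace_line {
    my ($line) = @_;
    my ($offset, $pre, $post, $id, $type) =
        $line =~ /^\s*(\d+)\s+<(.*?)>\s+<(.*?)>\s*\|\s*(\d+):\s*(\S.*?)\s*$/
        or return;
    return { offset => $offset, pre => $pre, post => $post,
             id => $id, type => $type };
}

# Sample lines copied from the trace above.
my @lines = (
    '   3 <abc> <defg__gh_>    | 11:  EXACT <d>',
    '   4 <abcd> <efg__gh_>    | 26:    WHILEM',
    '                              failed...',
);

for my $line (@lines) {
    my $node = parse_trace_line($line) or next;
    print "at $node->{offset}: node $node->{id} ($node->{type})\n";
}
```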
673 | ||
674 | =head1 Debugging Perl memory usage | |
675 | ||
676 | Perl is a profligate wastrel when it comes to memory use. There | |
677 | is a saying that to estimate memory usage of Perl, assume a reasonable | |
678 | algorithm for memory allocation, multiply that estimate by 10, and | |
679 | while you still may miss the mark, at least you won't be quite so | |
4375e838 | 680 | astonished. This is not absolutely true, but may provide a good |
055fd3a9 GS |
681 | grasp of what happens. |
682 | ||
683 | Assume that an integer cannot take less than 20 bytes of memory, a | |
684 | float cannot take less than 24 bytes, a string cannot take less | |
685 | than 32 bytes (all these examples assume 32-bit architectures, the | |
686 | results are quite a bit worse on 64-bit architectures). If a variable | |
687 | is accessed in two of three different ways (which require an integer, | |
688 | a float, or a string), the memory footprint may increase yet another | |
b9449ee0 | 689 | 20 bytes. A sloppy malloc(3) implementation can inflate these |
055fd3a9 GS |
690 | numbers dramatically. |
691 | ||
692 | On the opposite end of the scale, a declaration like | |
693 | ||
694 | sub foo; | |
695 | ||
696 | may take up to 500 bytes of memory, depending on which release of Perl | |
697 | you're running. | |
698 | ||
699 | Anecdotal estimates of source-to-compiled code bloat suggest an | |
700 | eightfold increase. This means that the compiled form of reasonable | |
701 | (normally commented, properly indented etc.) code will take | |
702 | about eight times more space in memory than the code took | |
703 | on disk. | |
704 | ||
705 | There are two Perl-specific ways to analyze memory usage: | |
706 | $ENV{PERL_DEBUG_MSTATS} and the B<-DL> command-line switch. The first | |
707 | is available only if Perl is compiled with Perl's malloc(); the | |
708 | second only if Perl was built with C<-DDEBUGGING>. See the | |
709 | instructions for how to do this in the F<INSTALL> podpage at | |
710 | the top level of the Perl source tree. | |
711 | ||
712 | =head2 Using C<$ENV{PERL_DEBUG_MSTATS}> | |
713 | ||
714 | If your perl is using Perl's malloc() and was compiled with the | |
715 | necessary switches (this is the default), then it will print memory | |
4375e838 | 716 | usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} |
055fd3a9 GS |
717 | > 1 >>, and before termination of the program when C<< |
718 | $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to | |
719 | the following example: | |
720 | ||
721 | $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" | |
722 | Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) | |
723 | 14216 free: 130 117 28 7 9 0 2 2 1 0 0 | |
724 | 437 61 36 0 5 | |
725 | 60924 used: 125 137 161 55 7 8 6 16 2 0 1 | |
726 | 74 109 304 84 20 | |
727 | Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. | |
728 | Memory allocation statistics after execution: (buckets 4(4)..8188(8192) | |
729 | 30888 free: 245 78 85 13 6 2 1 3 2 0 1 | |
730 | 315 162 39 42 11 | |
731 | 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 | |
732 | 196 178 1066 798 39 | |
733 | Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. | |
734 | ||
735 | It is possible to ask for such a statistic at arbitrary points in | |
b9449ee0 | 736 | your execution using the mstat() function from the standard |
055fd3a9 GS |
737 | Devel::Peek module. |
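
A minimal sketch of such a call follows. It assumes a perl built with Perl's malloc(); on other builds the call does not produce the report. The label strings and the list being built are arbitrary and purely illustrative.

```perl
use strict;
use warnings;
use Devel::Peek;    # exports mstat() by default

# The argument is a free-form label printed with the report.
mstat("before building the list: ");

# Allocate something noticeable so that the two reports differ.
my @list = map { "item $_" } 1 .. 10_000;

mstat("after building the list: ");
```

Comparing the two reports shows which bucket sizes absorbed the new allocations.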
738 | ||
739 | Here is some explanation of that format: | |
740 | ||
13a2d996 | 741 | =over 4 |
055fd3a9 GS |
742 | |
743 | =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> | |
744 | ||
745 | Perl's malloc() uses bucketed allocations. Every request is rounded | |
746 | up to the closest bucket size available, and a bucket is taken from | |
747 | the pool of buckets of that size. | |
748 | ||
749 | The line above describes the limits of buckets currently in use. | |
750 | Each bucket has two sizes: memory footprint and the maximal size | |
751 | of user data that can fit into this bucket. Suppose in the above | |
752 | example that the smallest bucket were size 4. The biggest bucket | |
753 | would have usable size 8188, and the memory footprint would be 8192. | |
754 | ||
755 | In a Perl built for debugging, some buckets may have negative usable | |
756 | size. This means that these buckets cannot (and will not) be used. | |
757 | For larger buckets, the memory footprint may be one page greater | |
758 | than a power of 2. If so, the corresponding power of two is | |
759 | printed in the C<APPROX> field above. | |
760 | ||
761 | =item Free/Used | |
762 | ||
763 | The 1 or 2 rows of numbers following that correspond to the number | |
764 | of buckets of each size between C<SMALLEST> and C<GREATEST>. In | |
765 | the first row, the sizes (memory footprints) of buckets are powers | |
766 | of two--or possibly one page greater. In the second row, if present, | |
767 | the memory footprints of the buckets are between the memory footprints | |
768 | of two buckets "above". | |
769 | ||
4375e838 | 770 | For example, suppose that in the previous example the memory footprints |
055fd3a9 GS |
771 | were |
772 | ||
773 | free: 8 16 32 64 128 256 512 1024 2048 4096 8192 | |
774 | 4 12 24 48 80 | |
775 | ||
776 | With non-C<DEBUGGING> perl, the buckets starting from C<128> have | |
777 | a 4-byte overhead, and thus an 8192-long bucket may take up to | |
778 | 8188-byte allocations. | |
779 | ||
780 | =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> | |
781 | ||
782 | The first two fields give the total amount of memory perl sbrk(2)ed | |
783 | (ess-broken? :-) and number of sbrk(2)s used. The third number is | |
784 | what perl thinks about continuity of returned chunks. So long as | |
785 | this number is positive, malloc() will assume that it is probable | |
786 | that sbrk(2) will provide continuous memory. | |
787 | ||
788 | Memory allocated by external libraries is not counted. | |
789 | ||
790 | =item C<pad: 0> | |
791 | ||
792 | The amount of sbrk(2)ed memory needed to keep buckets aligned. | |
793 | ||
794 | =item C<heads: 2192> | |
795 | ||
796 | Although memory overhead of bigger buckets is kept inside the bucket, for | |
797 | smaller buckets, it is kept in separate areas. This field gives the | |
798 | total size of these areas. | |
799 | ||
800 | =item C<chain: 0> | |
801 | ||
802 | malloc() may want to subdivide a bigger bucket into smaller buckets. | |
803 | If only a part of the deceased bucket is left unsubdivided, the rest | |
804 | is kept as an element of a linked list. This field gives the total | |
805 | size of these chunks. | |
806 | ||
807 | =item C<tail: 6144> | |
808 | ||
809 | To minimize the number of sbrk(2)s, malloc() asks for more memory. This | |
810 | field gives the size of the yet unused part, which is sbrk(2)ed, but | |
811 | never touched. | |
812 | ||
813 | =back | |
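
The bucket arithmetic described in the items above can be checked with a short sketch. It assumes the memory footprints supposed in the Free/Used item and the non-C<DEBUGGING> rule of a 4-byte overhead for buckets of 128 bytes and more; the helper names usable(), bucket_for() and total() are hypothetical and not part of Perl's malloc(). Under these assumptions, the counts in the "free" rows printed after compilation add up to the reported 14216 bytes.

```perl
use strict;
use warnings;

# Footprints assumed in the Free/Used item (first row, then second row).
my @row1 = (8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192);
my @row2 = (4, 12, 24, 48, 80);

# Usable size of a bucket, assuming a 4-byte overhead for
# footprints of 128 bytes and more (non-DEBUGGING perl).
sub usable {
    my ($footprint) = @_;
    return $footprint >= 128 ? $footprint - 4 : $footprint;
}

# Footprint of the smallest bucket in this table whose usable size
# fits the request; nothing if the request exceeds the table.
sub bucket_for {
    my ($request) = @_;
    for my $footprint (sort { $a <=> $b } @row1, @row2) {
        return $footprint if usable($footprint) >= $request;
    }
    return;
}

# Total usable bytes for one pair of count rows from the report.
sub total {
    my ($counts1, $counts2) = @_;
    my $sum = 0;
    $sum += $counts1->[$_] * usable($row1[$_]) for 0 .. $#row1;
    $sum += $counts2->[$_] * usable($row2[$_]) for 0 .. $#row2;
    return $sum;
}

# "free" rows of the report printed after compilation.
my $free = total([130, 117, 28, 7, 9, 0, 2, 2, 1, 0, 0],
                 [437, 61, 36, 0, 5]);

print "free total: $free\n";                       # 14216
print "bucket for 100: ", bucket_for(100), "\n";   # 128
print "bucket for 125: ", bucket_for(125), "\n";   # 256
```

A 125-byte request skips the 128-byte bucket because only 124 bytes of it are usable. The same calculation applied to the "used" rows after compilation gives 60924.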
814 | ||
815 | =head2 Example of using B<-DL> switch | |
816 | ||
817 | Below we show how to analyse memory usage by | |
818 | ||
819 | do 'lib/auto/POSIX/autosplit.ix'; | |
820 | ||
821 | The file in question contains a header and 146 lines similar to | |
822 | ||
823 | sub getcwd; | |
824 | ||
825 | B<WARNING>: The discussion below assumes a 32-bit architecture. In | |
826 | newer releases of Perl, memory usage of the constructs discussed | |
827 | here is greatly improved, but the story discussed below is a real-life | |
828 | story. This story is mercilessly terse, and assumes rather more than cursory | |
829 | knowledge of Perl internals. Type space to continue, `q' to quit. | |
830 | (Actually, you just want to skip to the next section.) | |
831 | ||
832 | Here is the itemized list of Perl allocations performed during parsing | |
833 | of this file: | |
834 | ||
835 | !!! "after" at test.pl line 3. | |
836 | Id subtot 4 8 12 16 20 24 28 32 36 40 48 56 64 72 80 80+ | |
837 | 0 02 13752 . . . . 294 . . . . . . . . . . 4 | |
838 | 0 54 5545 . . 8 124 16 . . . 1 1 . . . . . 3 | |
839 | 5 05 32 . . . . . . . 1 . . . . . . . . | |
840 | 6 02 7152 . . . . . . . . . . 149 . . . . . | |
841 | 7 02 3600 . . . . . 150 . . . . . . . . . . | |
842 | 7 03 64 . -1 . 1 . . 2 . . . . . . . . . | |
843 | 7 04 7056 . . . . . . . . . . . . . . . 7 | |
844 | 7 17 38404 . . . . . . . 1 . . 442 149 . . 147 . | |
845 | 9 03 2078 17 249 32 . . . . 2 . . . . . . . . | |
846 | ||
847 | ||
848 | To see this list, insert two C<warn('!...')> statements around the call: | |
849 | ||
850 | warn('!'); | |
851 | do 'lib/auto/POSIX/autosplit.ix'; | |
852 | warn('!!! "after"'); | |
853 | ||
4375e838 | 854 | and run it with Perl's B<-DL> option. The first warn() will print |
055fd3a9 GS |
855 | memory allocation info before parsing the file and will memorize |
856 | the statistics at this point (we ignore what it prints). The second | |
857 | warn() prints increments with respect to these memorized data. This | |
858 | is the printout shown above. | |
859 | ||
860 | Different I<Id>s on the left correspond to different subsystems of | |
861 | the perl interpreter. They are just the first argument given to | |
862 | the perl memory allocation API named New(). To find what C<9 03> | |
863 | means, just B<grep> the perl source for C<903>. You'll find it in | |
864 | F<util.c>, function savepvn(). (I know, you wonder why we told you | |
865 | to B<grep> and then gave away the answer. That's because grepping | |
866 | the source is good for the soul.) This function is used to store | |
867 | a copy of an existing chunk of memory. Using a C debugger, one can | |
868 | see that the function was called either directly from gv_init() or | |
869 | via sv_magic(), and that gv_init() is called from gv_fetchpv()--which | |
870 | was itself called from newSUB(). Please stop to catch your breath now. | |
871 | ||
872 | B<NOTE>: To reach this point in the debugger and skip the calls to | |
873 | savepvn() during the compilation of the main program, you should | |
874 | set a C breakpoint | |
875 | in Perl_warn(), continue until this point is reached, and I<then> set | |
876 | a C breakpoint in Perl_savepvn(). Note that you may need to skip a | |
877 | handful of Perl_savepvn() calls that do not correspond to mass production | |
878 | of CVs (there are more C<903> allocations than 146 similar lines of | |
879 | F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are | |
880 | added by macroization code in perl header files to avoid conflicts | |
881 | with external libraries. | |
882 | ||
883 | Anyway, we see that C<903> ids correspond to creation of globs, twice | |
884 | per glob: once for the glob name, and once for glob stringification magic. | |
885 | ||
886 | Here are explanations for other I<Id>s above: | |
887 | ||
13a2d996 | 888 | =over 4 |
055fd3a9 GS |
889 | |
890 | =item C<717> | |
891 | ||
4375e838 | 892 | Creates bigger C<XPV*> structures. In the case above, it |
055fd3a9 GS |
893 | creates 3 C<AV>s per subroutine, one for a list of lexical variable |
894 | names, one for a scratchpad (which contains lexical variables and | |
895 | C<targets>), and one for the array of scratchpads needed for | |
896 | recursion. | |
897 | ||
898 | It also creates a C<GV> and a C<CV> per subroutine, all called from | |
899 | start_subparse(). | |
900 | ||
901 | =item C<002> | |
902 | ||
903 | Creates a C array corresponding to the C<AV> of scratchpads and the | |
904 | scratchpad itself. The first fake entry of this scratchpad is | |
905 | created even though the subroutine itself is not defined yet. | |
906 | ||
907 | It also creates C arrays to keep data for the stash. This is one HV, | |
908 | but it grows; thus, there are 4 big allocations: the big chunks are not | |
909 | freed, but are kept as additional arenas for C<SV> allocations. | |
910 | ||
911 | =item C<054> | |
912 | ||
913 | Creates a C<HEK> for the name of the glob for the subroutine. This | |
914 | name is a key in a I<stash>. | |
915 | ||
916 | Big allocations with this I<Id> correspond to allocations of new | |
917 | arenas to keep C<HE>. | |
918 | ||
919 | =item C<602> | |
920 | ||
921 | Creates a C<GP> for the glob for the subroutine. | |
922 | ||
923 | =item C<702> | |
924 | ||
925 | Creates the C<MAGIC> for the glob for the subroutine. | |
926 | ||
927 | =item C<704> | |
928 | ||
929 | Creates I<arenas> which keep SVs. | |
930 | ||
931 | =back | |
932 | ||
933 | =head2 B<-DL> details | |
934 | ||
935 | If Perl is run with the B<-DL> option, then warn()s that start with `!' | |
936 | behave specially. They print a list of I<categories> of memory | |
937 | allocations, and statistics of allocations of different sizes for | |
938 | these categories. | |
939 | ||
940 | If the warn() string starts with | |
941 | ||
13a2d996 | 942 | =over 4 |
055fd3a9 GS |
943 | |
944 | =item C<!!!> | |
945 | ||
946 | print changed categories only; print the differences in counts of allocations. | |
947 | ||
948 | =item C<!!> | |
949 | ||
950 | print grown categories only; print the absolute values of counts, and totals. | |
951 | ||
952 | =item C<!> | |
953 | ||
954 | print nonempty categories; print the absolute values of counts and totals. | |
955 | ||
956 | =back | |
957 | ||
958 | =head2 Limitations of B<-DL> statistics | |
959 | ||
960 | If an extension or external library does not use the Perl API to | |
961 | allocate memory, such allocations are not counted. | |
962 | ||
963 | =head1 SEE ALSO | |
964 | ||
965 | L<perldebug>, | |
966 | L<perlguts>, | |
967 | L<perlrun>, | |
968 | L<re>, | |
969 | and | |
970 | L<Devel::DProf>. |