=head1 NAME

perldebguts - Guts of Perl debugging

=head1 DESCRIPTION

This is not the perldebug(1) manpage, which tells you how to use
the debugger. This manpage describes low-level details ranging
between difficult and impossible for anyone who isn't incredibly
intimate with Perl's guts to understand. Caveat lector.

=head1 Debugger Internals

Perl has special debugging hooks at compile-time and run-time used
to create debugging environments. These hooks are not to be confused
with the I<perl -Dxxx> command described in L<perlrun>, which is
usable only if a special Perl is built per the instructions in the
F<INSTALL> podpage in the Perl source tree.

For example, whenever you call Perl's built-in C<caller> function
from the package DB, the arguments that the corresponding stack
frame was called with are copied to the @DB::args array. The
general mechanism is enabled by calling Perl with the B<-d> switch.
Specifically, the following additional features are enabled
(cf. L<perlvar/$^P>):

=over

=item *

Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
'perl5db.pl'}> if not present) before the first line of your program.

=item *

The array C<@{"_<$filename"}> holds the lines of $filename for all
files compiled by Perl. The same holds for C<eval>ed strings that contain
subroutines, or which are currently being executed. The $filename
for C<eval>ed strings looks like C<(eval 34)>. Code assertions
in regexes look like C<(re_eval 19)>.

=item *

The hash C<%{"_<$filename"}> contains breakpoints and actions keyed
by line number. Individual entries (as opposed to the whole hash)
are settable. Perl only cares about Boolean true here, although
the values used by F<perl5db.pl> have the form
C<"$break_condition\0$action">. Values in this hash are magical
in numeric context: they are zeros if the line is not breakable.

The same holds for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed strings
looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

The scalar C<${"_<$filename"}> contains C<"_<$filename">. This is
also the case for evaluated strings that contain subroutines, or
which are currently being executed. The $filename for C<eval>ed
strings looks like C<(eval 34)> or C<(re_eval 19)>.

=item *

After each C<require>d file is compiled, but before it is executed,
C<DB::postponed(*{"_<$filename"})> is called if the subroutine
C<DB::postponed> exists. Here, the $filename is the expanded name of
the C<require>d file, as found in the values of %INC.

=item *

After each subroutine C<subname> is compiled, the existence of
C<$DB::postponed{subname}> is checked. If this key exists,
C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
also exists.

=item *

A hash C<%DB::sub> is maintained, whose keys are subroutine names
and whose values have the form C<filename:startline-endline>.
C<filename> has the form C<(eval 34)> for subroutines defined inside
C<eval>s, or C<(re_eval 19)> for those within regex code assertions.

=item *

When the execution of your program reaches a point that can hold a
breakpoint, the C<DB::DB()> subroutine is called if any of the variables
$DB::trace, $DB::single, or $DB::signal is true. These variables
are not C<local>izable. This feature is disabled when executing
inside C<DB::DB()>, including functions called from it,
unless C<< $^D & (1<<30) >> is true.

=item *

When execution of the program reaches a subroutine call, a call to
C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
name of the called subroutine. (This doesn't happen if the subroutine
was compiled in the C<DB> package.)

=back

Note that if C<&DB::sub> needs external data for it to work, no
subroutine call is possible until that data is available. For the standard
debugger, the C<$DB::deep> variable (how many levels of recursion
deep into the debugger you can go before a mandatory break) gives
an example of such a dependency.
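
The C<@DB::args> behaviour mentioned at the start of this section does
not depend on B<-d>, so it can be tried in isolation. The following is
a minimal sketch, not code taken from F<perl5db.pl>. The helper name
C<DB::frame_args> and the subroutines C<inner> and C<outer> are
invented for illustration.

```perl
use strict;
use warnings;

package DB;

# caller() fills @DB::args only when it is called with an argument
# from code compiled in package DB, so this helper lives there.
sub frame_args {
    my ($level) = @_;
    # Level 0 here means "the subroutine that called frame_args".
    my @frame = caller($level + 1);
    return @frame ? [@DB::args] : undef;
}

package main;

sub inner { return DB::frame_args(0) }   # reports inner()'s own arguments
sub outer { return inner('a', 42) }

my $seen = outer();
print "inner was called with: @$seen\n";
```

The copy C<[@DB::args]> is taken at once because the array is
overwritten by the next such C<caller> call.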

=head2 Writing Your Own Debugger

The minimal working debugger consists of one line:

    sub DB::DB {}

which is quite handy as the contents of the C<PERL5DB> environment
variable:

    $ PERL5DB="sub DB::DB {}" perl -d your-script

Another brief debugger, slightly more useful, could be created
with only the line:

    sub DB::DB {print ++$i; scalar <STDIN>}

This debugger would print the sequence number of each statement
encountered, and would wait for you to hit a newline before continuing.

The following debugger is quite functional:

    {
      package DB;
      sub DB {}
      sub sub {print ++$i, " $sub\n"; &$sub}
    }

It prints the sequence number of each subroutine call and the name of the
called subroutine. Note that C<&DB::sub> should be compiled into the
package C<DB>.
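
As a further sketch, a tracing debugger can combine C<DB::DB> with the
C<@{"_<$filename"}> array described in the previous section. The driver
below is an assumption made for demonstration only: it writes a
three-line script to a temporary file and runs it in a child perl under
B<-d>, passing the debugger through C<PERL5DB>. The script contents and
all variable names are hypothetical, and the list form of piped C<open>
is assumed to be available on your platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The debugger itself: before each statement, print file, line number
# and the source text kept by perl in @{"_<$filename"}.
# (No comments inside: the text is collapsed to one line below.)
my $debugger = <<'END_DEBUGGER';
sub DB::DB {
    my ($package, $file, $line) = caller;
    my $source = ${"main::_<$file"}[$line];
    $source = "\n" unless defined $source;
    print "$file:$line: $source";
}
END_DEBUGGER

# Keep PERL5DB on a single line so that it cannot disturb the line
# numbering of the traced script.
(my $one_line = $debugger) =~ s/\s+/ /g;

# A hypothetical script to trace.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "my \$x = 1;\nmy \$y = \$x + 1;\nprint \"sum=\$y\\n\";\n";
close $fh or die "Cannot close $script: $!";

# Run the script under -d with our debugger and collect everything
# the child prints (trace lines and the script's own output).
local $ENV{PERL5DB} = $one_line;
open(my $pipe, '-|', $^X, '-d', $script) or die "Cannot run $^X: $!";
my $output = do { local $/; <$pipe> };
close $pipe;

print $output;
```

The debugger is defined with C<sub DB::DB> rather than inside
C<package DB> purely to keep it short; a C<&DB::sub> hook, if added,
would still need to be compiled in package C<DB> as noted above.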

At the start, the debugger reads your rc file (F<./.perldb> or
F<~/.perldb> under Unix), which can set important options. This file may
define a subroutine C<&afterinit> to be executed after the debugger is
initialized.

After the rc file is read, the debugger reads the PERLDB_OPTS
environment variable and parses this as the remainder of a C<O ...>
line as one might enter at the debugger prompt.

The debugger also maintains magical internal variables, such as
C<@DB::dbline> and C<%DB::dbline>, which are aliases for
C<@{"::_<current_file"}> and C<%{"::_<current_file"}>. Here C<current_file>
is the currently selected file, either explicitly chosen with the
debugger's C<f> command, or implicitly by flow of execution.

Some functions are provided to simplify customization. See
L<perldebug/"Options"> for a description of the options parsed by
C<DB::parse_options(string)>. The function C<DB::dump_trace(skip[,
count])> skips the specified number of frames and returns a list
containing information about the calling frames (all of them, if
C<count> is missing). Each entry is a reference to a hash with
keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
name, or info about C<eval>), C<args> (C<undef> or a reference to
an array), C<file>, and C<line>.

The function C<DB::print_trace(FH, skip[, count[, short]])> prints
formatted info about caller frames. The last two functions may be
convenient as arguments to the C<< < >> and C<< << >> commands.

Note that any variables and functions that are not documented in
this manpage (or in L<perldebug>) are considered for internal
use only, and as such are subject to change without notice.

=head1 Frame Listing Output Examples

The C<frame> option can be used to control the output of frame
information. For example, contrast this expression trace:

    $ perl -de 42
    Stack dump during die enabled outside of evals.

    Loading DB routines from perl5db.pl patch level 0.94
    Emacs support available.

    Enter h or `h h' for help.

    main::(-e:1): 0
      DB<1> sub foo { 14 }

      DB<2> sub bar { 3 }

      DB<3> t print foo() * bar()
    main::((eval 172):3): print foo() * bar();
    main::foo((eval 168):2):
    main::bar((eval 170):2):
    42

with this one, once the C<O>ption C<frame=2> has been set:

      DB<4> O f=2
    frame = '2'
      DB<5> t print foo() * bar()
    3: foo() * bar()
    entering main::foo
    2: sub foo { 14 };
    exited main::foo
    entering main::bar
    2: sub bar { 3 };
    exited main::bar
    42

By way of demonstration, we present below a laborious listing
resulting from setting your C<PERLDB_OPTS> environment variable to
the value C<f=n N>, and running I<perl -d -V> from the command line.
Examples using various values of C<n> are shown to give you a feel
for the difference between settings. Long though it may be, this
is not a complete listing, but only excerpts.

=over 4

=item 1

    entering main::BEGIN
     entering Config::BEGIN
      Package lib/Exporter.pm.
      Package lib/Carp.pm.
     Package lib/Config.pm.
     entering Config::TIEHASH
     entering Exporter::import
      entering Exporter::export
    entering Config::myconfig
     entering Config::FETCH
     entering Config::FETCH
     entering Config::FETCH
     entering Config::FETCH

=item 2

    entering main::BEGIN
     entering Config::BEGIN
      Package lib/Exporter.pm.
      Package lib/Carp.pm.
     exited Config::BEGIN
     Package lib/Config.pm.
     entering Config::TIEHASH
     exited Config::TIEHASH
     entering Exporter::import
      entering Exporter::export
      exited Exporter::export
     exited Exporter::import
    exited main::BEGIN
    entering Config::myconfig
     entering Config::FETCH
     exited Config::FETCH
     entering Config::FETCH
     exited Config::FETCH
     entering Config::FETCH

=item 4

    in $=main::BEGIN() from /dev/null:0
     in $=Config::BEGIN() from lib/Config.pm:2
      Package lib/Exporter.pm.
      Package lib/Carp.pm.
     Package lib/Config.pm.
     in $=Config::TIEHASH('Config') from lib/Config.pm:644
     in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
      in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
    in @=Config::myconfig() from /dev/null:0
     in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574

=item 6

    in $=main::BEGIN() from /dev/null:0
     in $=Config::BEGIN() from lib/Config.pm:2
      Package lib/Exporter.pm.
      Package lib/Carp.pm.
     out $=Config::BEGIN() from lib/Config.pm:0
     Package lib/Config.pm.
     in $=Config::TIEHASH('Config') from lib/Config.pm:644
     out $=Config::TIEHASH('Config') from lib/Config.pm:644
     in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
      in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
      out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
     out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    out $=main::BEGIN() from /dev/null:0
    in @=Config::myconfig() from /dev/null:0
     in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
     out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
     out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
     out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
     in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574

=item 14

    in $=main::BEGIN() from /dev/null:0
     in $=Config::BEGIN() from lib/Config.pm:2
      Package lib/Exporter.pm.
      Package lib/Carp.pm.
     out $=Config::BEGIN() from lib/Config.pm:0
     Package lib/Config.pm.
     in $=Config::TIEHASH('Config') from lib/Config.pm:644
     out $=Config::TIEHASH('Config') from lib/Config.pm:644
     in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
      in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
      out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
     out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
    out $=main::BEGIN() from /dev/null:0
    in @=Config::myconfig() from /dev/null:0
     in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
     out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
     in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
     out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574

=item 30

    in $=CODE(0x15eca4)() from /dev/null:0
     in $=CODE(0x182528)() from lib/Config.pm:2
      Package lib/Exporter.pm.
     out $=CODE(0x182528)() from lib/Config.pm:0
     scalar context return from CODE(0x182528): undef
     Package lib/Config.pm.
     in $=Config::TIEHASH('Config') from lib/Config.pm:628
     out $=Config::TIEHASH('Config') from lib/Config.pm:628
     scalar context return from Config::TIEHASH: empty hash
     in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
      in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
      out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
      scalar context return from Exporter::export: ''
     out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
     scalar context return from Exporter::import: ''

=back

In all cases shown above, the line indentation shows the call tree.
If bit 2 of C<frame> is set, a line is printed on exit from a
subroutine as well. If bit 4 is set, the arguments are printed
along with the caller info. If bit 8 is set, the arguments are
printed even if they are tied or references. If bit 16 is set, the
return value is printed, too.
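
The bit values above can be decoded mechanically. The sketch below is
not part of the debugger; C<describe_frame> and its table are
hypothetical names, and it reads "bit N" as the bit whose numeric
value is N, which is consistent with the C<frame> values 2, 4, 6, 14
and 30 used in the listings.

```perl
use strict;
use warnings;

# Numeric bit value => effect, as described in the paragraph above.
my @FRAME_BITS = (
    [ 2,  'print a line on exit from each subroutine' ],
    [ 4,  'print the arguments along with the caller info' ],
    [ 8,  'print arguments even if they are tied or references' ],
    [ 16, 'print the return value' ],
);

# Return the list of extra effects enabled by a given frame value.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (1, 2, 6, 14, 30) {
    my @effects = describe_frame($frame);
    print "frame=$frame: ", (@effects ? join('; ', @effects) : 'entry lines only'), "\n";
}
```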

When a package is compiled, a line like this

    Package lib/Carp.pm.

is printed with proper indentation.

=head1 Debugging regular expressions

There are two ways to enable debugging output for regular expressions.

If your perl is compiled with C<-DDEBUGGING>, you may use the
B<-Dr> flag on the command line.

Otherwise, one can C<use re 'debug'>, which has effects at
compile time and run time. It is not lexically scoped.
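
A minimal way to try the second approach is sketched below. It uses
the regex from the examples that follow; the subject string
C<abcdefghik> is an assumption chosen so that the match succeeds. The
debugging output goes to STDERR, and its exact layout varies between
Perl releases, so it will not necessarily look like the listings in
this document.

```perl
use strict;
use warnings;
use re 'debug';    # regex debugging output is written to STDERR

# Compiling this pattern emits the compile-time dump; running the
# match may emit the run-time trace.
my $matched = 'abcdefghik' =~ /[bc]d(ef*g)+h[ij]k$/ ? 1 : 0;

print "matched: $matched\n";
```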

=head2 Compile-time output

The debugging output at compile time looks like this:

    compiling RE `[bc]d(ef*g)+h[ij]k$'
    size 43 first at 1
     1: ANYOF(11)
    11: EXACT <d>(13)
    13: CURLYX {1,32767}(27)
    15: OPEN1(17)
    17: EXACT <e>(19)
    19: STAR(22)
    20: EXACT <f>(0)
    22: EXACT <g>(24)
    24: CLOSE1(26)
    26: WHILEM(0)
    27: NOTHING(28)
    28: EXACT <h>(30)
    30: ANYOF(40)
    40: EXACT <k>(42)
    42: EOL(43)
    43: END(0)
    anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
                                      stclass `ANYOF' minlen 7

The first line shows the pre-compiled form of the regex. The second
shows the size of the compiled form (in arbitrary units, usually
4-byte words) and the label I<id> of the first node that does a
match.

The last line (split into two lines above) contains optimizer
information. In the example shown, the optimizer found that the match
should contain a substring C<de> at offset 1, plus substring C<gh>
at some offset between 3 and infinity. Moreover, when checking for
these substrings (to abandon impossible matches quickly), Perl will check
for the substring C<gh> before checking for the substring C<de>. The
optimizer may also use the knowledge that the match starts (at the
C<first> I<id>) with a character class, and the match cannot be
shorter than 7 chars.

The fields of interest which may appear in the last line are

=over

=item C<anchored> I<STRING> C<at> I<POS>

=item C<floating> I<STRING> C<at> I<POS1..POS2>

See above.

=item C<matching floating/anchored>

Which substring to check first.

=item C<minlen>

The minimal length of the match.

=item C<stclass> I<TYPE>

Type of first matching node.

=item C<noscan>

Don't scan for the found substrings.

=item C<isall>

Means that the optimizer info is all that the regular
expression contains, and thus one does not need to enter the regex engine at
all.

=item C<GPOS>

Set if the pattern contains C<\G>.

=item C<plus>

Set if the pattern starts with a repeated char (as in C<x+y>).

=item C<implicit>

Set if the pattern starts with C<.*>.

=item C<with eval>

Set if the pattern contains eval-groups, such as C<(?{ code })> and
C<(??{ code })>.

=item C<anchored(TYPE)>

If the pattern may match only at a handful of places, with C<TYPE>
being C<BOL>, C<MBOL>, or C<GPOS>. See the table below.

=back

If a substring is known to match at end-of-line only, it may be
followed by C<$>, as in C<floating `k'$>.

The optimizer-specific info is used to avoid entering (a slow) regex
engine on strings that will definitely not match. If the C<isall> flag
is set, a call to the regex engine may be avoided even when the optimizer
found an appropriate place for the match.

The rest of the output contains the list of I<nodes> of the compiled
form of the regex. Each line has the format

C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
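
To make the node-line format concrete, here is a small sketch that
splits such a line into its four parts. The function name
C<parse_node_line> and the hash keys are hypothetical, and the pattern
is written only against the sample dump shown above; other Perl
releases may print nodes differently.

```perl
use strict;
use warnings;

# Split "id: TYPE OPTIONAL-INFO (next-id)" into its components.
# Returns a hash reference, or nothing if the line is not a node line.
sub parse_node_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\d+):\s+([A-Z]+\d*)\s*(.*?)\s*\((\d+)\)\s*$/;
    return { id => $1, type => $2, info => $3, next => $4 };
}

for my $line ('13: CURLYX {1,32767}(27)', '11: EXACT <d>(13)', ' 1: ANYOF(11)') {
    my $node = parse_node_line($line) or next;
    print "node $node->{id} is $node->{type}",
          (length $node->{info} ? " [$node->{info}]" : ''),
          ", next node $node->{next}\n";
}
```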

=head2 Types of nodes

Here are the possible types, with short descriptions:

    # TYPE arg-description [num-args] [longjump-len] DESCRIPTION

    # Exit points
    END         no      End of program.
    SUCCEED     no      Return from a subroutine, basically.

    # Anchors:
    BOL         no      Match "" at beginning of line.
    MBOL        no      Same, assuming multiline.
    SBOL        no      Same, assuming singleline.
    EOS         no      Match "" at end of string.
    EOL         no      Match "" at end of line.
    MEOL        no      Same, assuming multiline.
    SEOL        no      Same, assuming singleline.
    BOUND       no      Match "" at any word boundary
    BOUNDL      no      Match "" at any word boundary
    NBOUND      no      Match "" at any word non-boundary
    NBOUNDL     no      Match "" at any word non-boundary
    GPOS        no      Matches where last m//g left off.

    # [Special] alternatives
    ANY         no      Match any one character (except newline).
    SANY        no      Match any one character.
    ANYOF       sv      Match character in (or not in) this class.
    ALNUM       no      Match any alphanumeric character
    ALNUML      no      Match any alphanumeric char in locale
    NALNUM      no      Match any non-alphanumeric character
    NALNUML     no      Match any non-alphanumeric char in locale
    SPACE       no      Match any whitespace character
    SPACEL      no      Match any whitespace char in locale
    NSPACE      no      Match any non-whitespace character
    NSPACEL     no      Match any non-whitespace char in locale
    DIGIT       no      Match any numeric character
    NDIGIT      no      Match any non-numeric character

    # BRANCH    The set of branches constituting a single choice are hooked
    #           together with their "next" pointers, since precedence prevents
    #           anything being concatenated to any individual branch. The
    #           "next" pointer of the last BRANCH in a choice points to the
    #           thing following the whole choice. This is also where the
    #           final "next" pointer of each individual branch points; each
    #           branch starts with the operand node of a BRANCH node.
    #
    BRANCH      node    Match this alternative, or the next...

    # BACK      Normal "next" pointers all implicitly point forward; BACK
    #           exists to make loop structures possible.
    # not used
    BACK        no      Match "", "next" ptr points backward.

    # Literals
    EXACT       sv      Match this string (preceded by length).
    EXACTF      sv      Match this string, folded (prec. by length).
    EXACTFL     sv      Match this string, folded in locale (w/len).

    # Do nothing
    NOTHING     no      Match empty string.
    # A variant of above which delimits a group, thus stops optimizations
    TAIL        no      Match empty string. Can jump here from outside.

    # STAR,PLUS '?', and complex '*' and '+', are implemented as circular
    #           BRANCH structures using BACK. Simple cases (one character
    #           per match) are implemented with STAR and PLUS for speed
    #           and to minimize recursive plunges.
    #
    STAR        node    Match this (simple) thing 0 or more times.
    PLUS        node    Match this (simple) thing 1 or more times.

    CURLY       sv 2    Match this simple thing {n,m} times.
    CURLYN      no 2    Match next-after-this simple thing
    #                   {n,m} times, set parens.
    CURLYM      no 2    Match this medium-complex thing {n,m} times.
    CURLYX      sv 2    Match this complex thing {n,m} times.

    # This terminator creates a loop structure for CURLYX
    WHILEM      no      Do curly processing and see if rest matches.

    # OPEN,CLOSE,GROUPP ...are numbered at compile time.
    OPEN        num 1   Mark this point in input as start of #n.
    CLOSE       num 1   Analogous to OPEN.

    REF         num 1   Match some already matched string
    REFF        num 1   Match already matched string, folded
    REFFL       num 1   Match already matched string, folded in loc.

    # grouping assertions
    IFMATCH     off 1 2 Succeeds if the following matches.
    UNLESSM     off 1 2 Fails if the following matches.
    SUSPEND     off 1 1 "Independent" sub-regex.
    IFTHEN      off 1 1 Switch, should be preceded by switcher.
    GROUPP      num 1   Whether the group matched.

    # Support for long regex
    LONGJMP     off 1 1 Jump far away.
    BRANCHJ     off 1 1 BRANCH with long offset.

    # The heavy worker
    EVAL        evl 1   Execute some Perl code.

    # Modifiers
    MINMOD      no      Next operator is not greedy.
    LOGICAL     no      Next opcode should set the flag only.

    # This is not used yet
    RENUM       off 1 1 Group with independently numbered parens.

    # This is not really a node, but an optimized away piece of a "long" node.
    # To simplify debugging output, we mark it as if it were a node
    OPTIMIZED   off     Placeholder for dump.

=head2 Run-time output

First of all, when doing a match, one may get no run-time output even
if debugging is enabled. This means that the regex engine was never
entered and that all of the job was therefore done by the optimizer.

If the regex engine was entered, the output may look like this:

    Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
    Setting an EVAL scope, savestack=3
    2 <ab> <cdefg__gh_> | 1: ANYOF
    3 <abc> <defg__gh_> | 11: EXACT <d>
    4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
    4 <abcd> <efg__gh_> | 26: WHILEM
    0 out of 1..32767 cc=effff31c
    4 <abcd> <efg__gh_> | 15: OPEN1
    4 <abcd> <efg__gh_> | 17: EXACT <e>
    5 <abcde> <fg__gh_> | 19: STAR
    EXACT <f> can match 1 times out of 32767...
    Setting an EVAL scope, savestack=3
    6 <bcdef> <g__gh__> | 22: EXACT <g>
    7 <bcdefg> <__gh__> | 24: CLOSE1
    7 <bcdefg> <__gh__> | 26: WHILEM
    1 out of 1..32767 cc=effff31c
    Setting an EVAL scope, savestack=12
    7 <bcdefg> <__gh__> | 15: OPEN1
    7 <bcdefg> <__gh__> | 17: EXACT <e>
    restoring \1 to 4(4)..7
    failed, try continuation...
    7 <bcdefg> <__gh__> | 27: NOTHING
    7 <bcdefg> <__gh__> | 28: EXACT <h>
    failed...
    failed...

The most significant information in the output is about the particular I<node>
of the compiled regex that is currently being tested against the target string.
The format of these lines is

C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE>

The I<TYPE> info is indented with respect to the backtracking level.
Other incidental information appears interspersed within.
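
The trace-line format can be picked apart in the same spirit. This is
only a sketch against the sample trace above: C<parse_trace_line> and
its hash keys are hypothetical names, and lines that carry incidental
information (such as C<failed...>) are simply skipped.

```perl
use strict;
use warnings;

# Split "OFFSET <PRE> <POST> | ID: TYPE" into its components.
# Returns a hash reference, or nothing for lines in another format.
sub parse_trace_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\d+)\s+<(.*?)>\s+<(.*?)>\s*\|\s*(\d+):\s+(.*?)\s*$/;
    return { offset => $1, pre => $2, post => $3, id => $4, node => $5 };
}

for my $line (' 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}', 'failed...') {
    my $step = parse_trace_line($line);
    if ($step) {
        print "at offset $step->{offset}, node $step->{id} ($step->{node}) ",
              "sees '$step->{post}' ahead\n";
    }
    else {
        print "incidental line: $line\n";
    }
}
```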

=head1 Debugging Perl memory usage

Perl is a profligate wastrel when it comes to memory use. There
is a saying that to estimate memory usage of Perl, assume a reasonable
algorithm for memory allocation, multiply that estimate by 10, and
while you still may miss the mark, at least you won't be quite so
astonished. This is not absolutely true, but may provide a good
grasp of what happens.

Assume that an integer cannot take less than 20 bytes of memory, a
float cannot take less than 24 bytes, a string cannot take less
than 32 bytes (all these examples assume 32-bit architectures, the
results are quite a bit worse on 64-bit architectures). If a variable
is accessed in two of three different ways (which require an integer,
a float, or a string), the memory footprint may increase yet another
20 bytes. A sloppy malloc(3) implementation can inflate these
numbers dramatically.

On the opposite end of the scale, a declaration like

    sub foo;

may take up to 500 bytes of memory, depending on which release of Perl
you're running.

Anecdotal estimates of source-to-compiled code bloat suggest an
eightfold increase. This means that the compiled form of reasonable
(normally commented, properly indented etc.) code will take
about eight times more space in memory than the code took
on disk.
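
The rules of thumb above amount to simple arithmetic, sketched here.
The function names are hypothetical, the per-value figures are the
32-bit lower bounds quoted in the text, and the results are rough
estimates rather than measurements.

```perl
use strict;
use warnings;

# Lower bounds per value, in bytes, as quoted above for 32-bit builds.
my %MIN_BYTES = (integer => 20, float => 24, string => 32);

# Estimate a lower bound for a mix of values, e.g. (integer => 1000).
sub estimate_min_bytes {
    my (%count) = @_;
    my $total = 0;
    for my $kind (keys %count) {
        die "unknown kind '$kind'" unless exists $MIN_BYTES{$kind};
        $total += $count{$kind} * $MIN_BYTES{$kind};
    }
    return $total;
}

# Apply the anecdotal eightfold source-to-compiled factor.
sub estimate_compiled_bytes {
    my ($source_bytes) = @_;
    return 8 * $source_bytes;
}

print "1000 integers: at least ", estimate_min_bytes(integer => 1000), " bytes\n";
print "10000 bytes of source: about ", estimate_compiled_bytes(10_000), " bytes compiled\n";
```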

There are two Perl-specific ways to analyze memory usage:
$ENV{PERL_DEBUG_MSTATS} and the B<-DL> command-line switch. The first
is available only if Perl is compiled with Perl's malloc(); the
second only if Perl was built with C<-DDEBUGGING>. See the
instructions for how to do this in the F<INSTALL> podpage at
the top level of the Perl source tree.

=head2 Using C<$ENV{PERL_DEBUG_MSTATS}>

If your perl is using Perl's malloc() and was compiled with the
necessary switches (this is the default), then it will print memory
usage statistics after compiling your code when
C<< $ENV{PERL_DEBUG_MSTATS} > 1 >>, and before termination of the
program when C<< $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format
is similar to the following example:

    $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
    Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
     14216 free:   130  117   28    7    9    0    2    2    1    0    0
                     437   61   36    0    5
     60924 used:   125  137  161   55    7    8    6   16    2    0    1
                      74  109  304   84   20
    Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
    Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
     30888 free:   245   78   85   13    6    2    1    3    2    0    1
                     315  162   39   42   11
    175816 used:   265  176 1112  111   26   22   11   27    2    1    1
                     196  178 1066  798   39
    Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.

It is possible to ask for such a statistic at arbitrary points in
your execution using the mstat() function out of the standard
Devel::Peek module.

Here is some explanation of that format:

=over

=item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>

Perl's malloc() uses bucketed allocations. Every request is rounded
up to the closest bucket size available, and a bucket is taken from
the pool of buckets of that size.

The line above describes the limits of buckets currently in use.
Each bucket has two sizes: memory footprint and the maximal size
of user data that can fit into this bucket. Suppose in the above
example that the smallest bucket were size 4. The biggest bucket
would have usable size 8188, and the memory footprint would be 8192.

In a Perl built for debugging, some buckets may have negative usable
size. This means that these buckets cannot (and will not) be used.
For larger buckets, the memory footprint may be one page greater
than a power of 2. If so, the corresponding power of two is
printed in the C<APPROX> field above.

=item Free/Used

The 1 or 2 rows of numbers following that correspond to the number
of buckets of each size between C<SMALLEST> and C<GREATEST>. In
the first row, the sizes (memory footprints) of buckets are powers
of two--or possibly one page greater. In the second row, if present,
the memory footprints of the buckets are between the memory footprints
of two buckets "above".

For example, suppose under the previous example, the memory footprints
were

    free:    8   16   32   64  128  256  512 1024 2048 4096 8192
               4   12   24   48   80

With non-C<DEBUGGING> perl, the buckets starting from C<128> have
a 4-byte overhead, and thus an 8192-long bucket may take up to
8188-byte allocations.

=item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>

The first two fields give the total amount of memory perl sbrk(2)ed
(ess-broken? :-) and number of sbrk(2)s used. The third number is
what perl thinks about continuity of returned chunks. So long as
this number is positive, malloc() will assume that it is probable
that sbrk(2) will provide continuous memory.

Memory allocated by external libraries is not counted.

=item C<pad: 0>

The amount of sbrk(2)ed memory needed to keep buckets aligned.

=item C<heads: 2192>

Although memory overhead of bigger buckets is kept inside the bucket, for
smaller buckets, it is kept in separate areas. This field gives the
total size of these areas.

=item C<chain: 0>

malloc() may want to subdivide a bigger bucket into smaller buckets.
If only a part of the deceased bucket is left unsubdivided, the rest
is kept as an element of a linked list. This field gives the total
size of these chunks.

=item C<tail: 6144>

To minimize the number of sbrk(2)s, malloc() asks for more memory. This
field gives the size of the yet unused part, which is sbrk(2)ed, but
never touched.

=back
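
The bucket rounding described under C<buckets> and C<Free/Used> above
can be illustrated with a deliberately simplified model. This is not
the algorithm used by Perl's malloc(): the hypothetical function
C<bucket_for> assumes power-of-two footprints starting at 4, a 4-byte
overhead inside buckets of 128 bytes and more (the non-C<DEBUGGING>
case from the text), and it ignores the intermediate bucket sizes of
the second row as well as footprints that are one page greater than a
power of two.

```perl
use strict;
use warnings;

# Return the memory footprint of the smallest bucket, in this
# simplified model, whose usable size can hold $request bytes.
sub bucket_for {
    my ($request) = @_;
    my $footprint = 4;    # smallest bucket in the sample report
    while (1) {
        # Buckets from 128 bytes up keep 4 bytes of overhead inside.
        my $overhead = $footprint >= 128 ? 4 : 0;
        return $footprint if $footprint - $overhead >= $request;
        $footprint *= 2;
    }
}

for my $request (1, 64, 100, 125, 8188, 8189) {
    print "request of $request bytes -> bucket footprint ",
          bucket_for($request), "\n";
}
```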

=head2 Example of using B<-DL> switch

Below we show how to analyse memory usage by

    do 'lib/auto/POSIX/autosplit.ix';

The file in question contains a header and 146 lines similar to

    sub getcwd;

B<WARNING>: The discussion below supposes 32-bit architecture. In
newer releases of Perl, memory usage of the constructs discussed
here is greatly improved, but the story discussed below is a real-life
story. This story is mercilessly terse, and assumes rather more than cursory
knowledge of Perl internals. Type space to continue, `q' to quit.
(Actually, you just want to skip to the next section.)

Here is the itemized list of Perl allocations performed during parsing
of this file:

    !!! "after" at test.pl line 3.
       Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
     0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
     0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
     5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
     6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
     7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
     7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
     7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
     7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
     9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .

To see this list, insert two C<warn('!...')> statements around the call:

    warn('!');
    do 'lib/auto/POSIX/autosplit.ix';
    warn('!!! "after"');

and run it with Perl's B<-DL> option. The first warn() will print
memory allocation info before parsing the file and will memorize
the statistics at this point (we ignore what it prints). The second
warn() prints increments with respect to these memorized data. This
is the printout shown above.

Different I<Id>s on the left correspond to different subsystems of
the perl interpreter. They are just the first argument given to
the perl memory allocation API named New(). To find what C<9 03>
means, just B<grep> the perl source for C<903>. You'll find it in
F<util.c>, function savepvn(). (I know, you wonder why we told you
to B<grep> and then gave away the answer. That's because grepping
the source is good for the soul.) This function is used to store
a copy of an existing chunk of memory. Using a C debugger, one can
see that the function was called either directly from gv_init() or
via sv_magic(), and that gv_init() is called from gv_fetchpv()--which
was itself called from newSUB(). Please stop to catch your breath now.

B<NOTE>: To reach this point in the debugger and skip the calls to
savepvn() during the compilation of the main program, you should
set a C breakpoint in Perl_warn(), continue until this point is
reached, and I<then> set a C breakpoint in Perl_savepvn(). Note that
you may need to skip a handful of Perl_savepvn() calls that do not
correspond to mass production of CVs (there are more C<903>
allocations than 146 similar lines of
F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are
added by macroization code in perl header files to avoid conflicts
with external libraries.

Anyway, we see that C<903> ids correspond to creation of globs, twice
per glob - for glob name, and glob stringification magic.

Here are explanations for other I<Id>s above:

=over

=item C<717>

Creates bigger C<XPV*> structures. In the case above, it
creates 3 C<AV>s per subroutine, one for a list of lexical variable
names, one for a scratchpad (which contains lexical variables and
C<targets>), and one for the array of scratchpads needed for
recursion.

It also creates a C<GV> and a C<CV> per subroutine, all called from
start_subparse().

=item C<002>

Creates a C array corresponding to the C<AV> of scratchpads and the
scratchpad itself. The first fake entry of this scratchpad is
created though the subroutine itself is not defined yet.

It also creates C arrays to keep data for the stash. This is one HV,
but it grows; thus, there are 4 big allocations: the big chunks are not
freed, but are kept as additional arenas for C<SV> allocations.

=item C<054>

Creates a C<HEK> for the name of the glob for the subroutine. This
name is a key in a I<stash>.

Big allocations with this I<Id> correspond to allocations of new
arenas to keep C<HE>.

=item C<602>

Creates a C<GP> for the glob for the subroutine.

=item C<702>

Creates the C<MAGIC> for the glob for the subroutine.

=item C<704>

Creates I<arenas> which keep SVs.

=back

=head2 B<-DL> details

If Perl is run with the B<-DL> option, then warn()s that start with `!'
behave specially. They print a list of I<categories> of memory
allocations, and statistics of allocations of different sizes for
these categories.

If the warn() string starts with

=over

=item C<!!!>

print changed categories only, print the differences in counts of allocations.

=item C<!!>

print grown categories only; print the absolute values of counts, and totals.

=item C<!>

print nonempty categories, print the absolute values of counts and totals.

=back

=head2 Limitations of B<-DL> statistics

If an extension or external library does not use the Perl API to
allocate memory, such allocations are not counted.

=head1 SEE ALSO

L<perldebug>,
L<perlguts>,
L<perlrun>,
L<re>,
and
L<Devel::DProf>.