| 1 | =head1 NAME |
| 2 | |
| 3 | perldebguts - Guts of Perl debugging |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | This is not L<perldebug>, which tells you how to use |
| 8 | the debugger. This manpage describes low-level details concerning |
| 9 | the debugger's internals, which range from difficult to impossible |
| 10 | to understand for anyone who isn't incredibly intimate with Perl's guts. |
| 11 | Caveat lector. |
| 12 | |
| 13 | =head1 Debugger Internals |
| 14 | |
| 15 | Perl has special debugging hooks at compile-time and run-time used |
| 16 | to create debugging environments. These hooks are not to be confused |
| 17 | with the I<perl -Dxxx> command described in L<perlrun>, which is |
| 18 | usable only if a special Perl is built per the instructions in the |
| 19 | F<INSTALL> podpage in the Perl source tree. |
| 20 | |
| 21 | For example, whenever you call Perl's built-in C<caller> function |
| 22 | from the package C<DB>, the arguments that the corresponding stack |
| 23 | frame was called with are copied to the C<@DB::args> array. These |
| 24 | mechanisms are enabled by calling Perl with the B<-d> switch. |
| 25 | Specifically, the following additional features are enabled |
| 26 | (cf. L<perlvar/$^P>): |
| 27 | |
| 28 | =over 4 |
| 29 | |
| 30 | =item * |
| 31 | |
| 32 | Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require |
| 33 | 'perl5db.pl'}> if not present) before the first line of your program. |
| 34 | |
| 35 | =item * |
| 36 | |
| 37 | Each array C<@{"_<$filename"}> holds the lines of $filename for a |
| 38 | file compiled by Perl. The same is also true for C<eval>ed strings |
| 39 | that contain subroutines, or which are currently being executed. |
| 40 | The $filename for C<eval>ed strings looks like C<(eval 34)>. |
| 41 | |
| 42 | Values in this array are magical in numeric context: they compare |
| 43 | equal to zero only if the line is not breakable. |
| 44 | |
| 45 | =item * |
| 46 | |
| 47 | Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed |
| 48 | by line number. Individual entries (as opposed to the whole hash) |
| 49 | are settable. Perl only cares about Boolean true here, although |
| 50 | the values used by F<perl5db.pl> have the form |
| 51 | C<"$break_condition\0$action">. |
| 52 | |
| 53 | The same holds for evaluated strings that contain subroutines, or |
| 54 | which are currently being executed. The $filename for C<eval>ed strings |
| 55 | looks like C<(eval 34)>. |
| 56 | |
| 57 | =item * |
| 58 | |
| 59 | Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is |
| 60 | also the case for evaluated strings that contain subroutines, or |
| 61 | which are currently being executed. The $filename for C<eval>ed |
| 62 | strings looks like C<(eval 34)>. |
| 63 | |
| 64 | =item * |
| 65 | |
| 66 | After each C<require>d file is compiled, but before it is executed, |
| 67 | C<DB::postponed(*{"_<$filename"})> is called if the subroutine |
| 68 | C<DB::postponed> exists. Here, the $filename is the expanded name of |
| 69 | the C<require>d file, as found in the values of %INC. |
| 70 | |
| 71 | =item * |
| 72 | |
| 73 | After each subroutine C<subname> is compiled, the existence of |
| 74 | C<$DB::postponed{subname}> is checked. If this key exists, |
| 75 | C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine |
| 76 | also exists. |
| 77 | |
| 78 | =item * |
| 79 | |
| 80 | A hash C<%DB::sub> is maintained, whose keys are subroutine names |
| 81 | and whose values have the form C<filename:startline-endline>. |
| 82 | C<filename> has the form C<(eval 34)> for subroutines defined inside |
| 83 | C<eval>s. |
| 84 | |
| 85 | =item * |
| 86 | |
| 87 | When the execution of your program reaches a point that can hold a |
| 88 | breakpoint, the C<DB::DB()> subroutine is called if any of the variables |
| 89 | C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables |
| 90 | are not C<local>izable. This feature is disabled when executing |
| 91 | inside C<DB::DB()>, including functions called from it, |
| 92 | unless C<< $^D & (1<<30) >> is true. |
| 93 | |
| 94 | =item * |
| 95 | |
| 96 | When execution of the program reaches a subroutine call, a call to |
| 97 | C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> set to identify |
| 98 | the called subroutine. (This doesn't happen if the calling subroutine |
| 99 | was compiled in the C<DB> package.) C<$DB::sub> normally holds the name |
| 100 | of the called subroutine, if it has a name by which it can be looked up. |
| 101 | Failing that, C<$DB::sub> will hold a reference to the called subroutine. |
| 102 | Either way, the C<&DB::sub> subroutine can use C<$DB::sub> as a reference |
| 103 | by which to call the called subroutine, which it will normally want to do. |
| 104 | |
| 105 | X<&DB::lsub>If the call is to an lvalue subroutine and C<&DB::lsub> |
| 106 | is defined, C<&DB::lsub>(I<args>) is called instead; otherwise Perl |
| 107 | falls back to C<&DB::sub>(I<args>). |
| 108 | |
| 109 | =item * |
| 110 | |
| 111 | When execution of the program uses C<goto> to enter a non-XS subroutine |
| 112 | and the 0x80 bit is set in C<$^P>, a call to C<&DB::goto> is made, with |
| 113 | C<$DB::sub> set to identify the subroutine being entered. The call to |
| 114 | C<&DB::goto> does not replace the C<goto>; the requested subroutine will |
| 115 | still be entered once C<&DB::goto> has returned. C<$DB::sub> normally |
| 116 | holds the name of the subroutine being entered, if it has one. Failing |
| 117 | that, C<$DB::sub> will hold a reference to the subroutine being entered. |
| 118 | Unlike when C<&DB::sub> is called, it is not guaranteed that C<$DB::sub> |
| 119 | can be used as a reference to operate on the subroutine being entered. |
| 120 | |
| 121 | =back |
| 122 | |
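As an illustration of the C<%DB::sub> value format described in the list above, here is a minimal sketch that splits a C<filename:startline-endline> string into its parts. The helper name C<parse_sub_location> and the sample values are hypothetical; they are not part of Perl or of F<perl5db.pl>.

```perl
use strict;
use warnings;

# Split a %DB::sub-style value of the form "filename:startline-endline".
# The greedy (.*) lets the filename itself contain colons.
sub parse_sub_location {
    my ($value) = @_;
    my ($file, $start, $end) = $value =~ /\A(.*):(\d+)-(\d+)\z/s
        or return;
    return { file => $file, start => $start, end => $end };
}

# Hypothetical sample values, not taken from a real debugger session.
for my $value ('lib/Config.pm:570-576', '(eval 34):2-2') {
    my $loc = parse_sub_location($value);
    print "$loc->{file} spans lines $loc->{start}..$loc->{end}\n";
}
```

Anchoring the pattern on the trailing C<:start-end> rather than on the first colon keeps filenames such as C<(eval 34)> or paths containing colons intact.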
| 123 | Note that if C<&DB::sub> needs external data for it to work, no |
| 124 | subroutine call is possible without it. As an example, the standard |
| 125 | debugger's C<&DB::sub> depends on the C<$DB::deep> variable |
| 126 | (it defines how many levels of recursion deep into the debugger you can go |
| 127 | before a mandatory break). If C<$DB::deep> is not defined, subroutine |
| 128 | calls are not possible, even though C<&DB::sub> exists. |
| 129 | |
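The C<@DB::args> behaviour mentioned at the start of this section can be tried in a few lines. This is only a sketch: the helper C<DB::caller_args> and the sub C<greet> are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

package DB;

our @args;    # this is @DB::args, declared here to keep strict happy

# Hypothetical helper: return the arguments of the sub that called us.
# Because this code is compiled in package DB, calling caller() with an
# argument in list context also fills @DB::args for the requested frame.
sub caller_args {
    my @frame = caller(1);    # frame 1 is the invocation of our caller
    return @frame ? @DB::args : ();
}

package main;

sub greet { return DB::caller_args() }

my @seen = greet('hello', 42);
print "greet was called with: @seen\n";
```

The helper inspects frame 1 rather than frame 0 because frame 0 describes the call to C<DB::caller_args> itself, which here receives no arguments.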
| 130 | =head2 Writing Your Own Debugger |
| 131 | |
| 132 | =head3 Environment Variables |
| 133 | |
| 134 | The C<PERL5DB> environment variable can be used to define a debugger. |
| 135 | For example, the minimal "working" debugger (it actually doesn't do anything) |
| 136 | consists of one line: |
| 137 | |
| 138 | sub DB::DB {} |
| 139 | |
| 140 | It can easily be defined like this: |
| 141 | |
| 142 | $ PERL5DB="sub DB::DB {}" perl -d your-script |
| 143 | |
| 144 | Another brief debugger, slightly more useful, can be created |
| 145 | with only the line: |
| 146 | |
| 147 | sub DB::DB {print ++$i; scalar <STDIN>} |
| 148 | |
| 149 | This debugger prints a number which increments for each statement |
| 150 | encountered and waits for you to hit a newline before continuing |
| 151 | to the next statement. |
| 152 | |
| 153 | The following debugger is actually useful: |
| 154 | |
| 155 | { |
| 156 | package DB; |
| 157 | sub DB {} |
| 158 | sub sub {print ++$i, " $sub\n"; &$sub} |
| 159 | } |
| 160 | |
| 161 | It prints the sequence number of each subroutine call and the name of the |
| 162 | called subroutine. Note that C<&DB::sub> is being compiled into the |
| 163 | package C<DB> through the use of the C<package> directive. |
| 164 | |
| 165 | When it starts, the debugger reads your rc file (F<./.perldb> or |
| 166 | F<~/.perldb> under Unix), which can set important options. |
| 167 | (A subroutine (C<&afterinit>) can be defined here as well; it is executed |
| 168 | after the debugger completes its own initialization.) |
| 169 | |
| 170 | After the rc file is read, the debugger reads the PERLDB_OPTS |
| 171 | environment variable and uses it to set debugger options. The |
| 172 | contents of this variable are treated as if they were the argument |
| 173 | of an C<o ...> debugger command (q.v. in L<perldebug/"Configurable Options">). |
| 174 | |
| 175 | =head3 Debugger Internal Variables |
| 176 | |
| 177 | In addition to the file and subroutine-related variables mentioned above, |
| 178 | the debugger also maintains various magical internal variables. |
| 179 | |
| 180 | =over 4 |
| 181 | |
| 182 | =item * |
| 183 | |
| 184 | C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which |
| 185 | holds the lines of the currently-selected file (compiled by Perl), either |
| 186 | explicitly chosen with the debugger's C<f> command, or implicitly by flow |
| 187 | of execution. |
| 188 | |
| 189 | Values in this array are magical in numeric context: they compare |
| 190 | equal to zero only if the line is not breakable. |
| 191 | |
| 192 | =item * |
| 193 | |
| 194 | C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which |
| 195 | contains breakpoints and actions keyed by line number in |
| 196 | the currently-selected file, either explicitly chosen with the |
| 197 | debugger's C<f> command, or implicitly by flow of execution. |
| 198 | |
| 199 | As previously noted, individual entries (as opposed to the whole hash) |
| 200 | are settable. Perl only cares about Boolean true here, although |
| 201 | the values used by F<perl5db.pl> have the form |
| 202 | C<"$break_condition\0$action">. |
| 203 | |
| 204 | =back |
| 205 | |
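To make the C<"$break_condition\0$action"> layout concrete, the sketch below builds one such value and splits it again. The helper name C<split_dbline_entry> and the sample condition and action are assumptions made for the example; Perl itself only tests the entry for truth.

```perl
use strict;
use warnings;

# Split a perl5db.pl-style entry "$break_condition\0$action" into its parts.
# The limit of 2 keeps any further NUL bytes inside the action.
sub split_dbline_entry {
    my ($entry) = @_;
    my ($condition, $action) = split /\0/, $entry, 2;
    return ($condition, $action);
}

# Hypothetical entry: break when $x > 3, and run an action at that line.
my $entry = join "\0", '$x > 3', 'print "x=$x\n"';
my ($condition, $action) = split_dbline_entry($entry);
print "condition: $condition\n";
print "action:    $action\n";
```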
| 206 | =head3 Debugger Customization Functions |
| 207 | |
| 208 | Some functions are provided to simplify customization. |
| 209 | |
| 210 | =over 4 |
| 211 | |
| 212 | =item * |
| 213 | |
| 214 | See L<perldebug/"Configurable Options"> for a description of options parsed by |
| 215 | C<DB::parse_options(string)>. |
| 216 | |
| 217 | =item * |
| 218 | |
| 219 | C<DB::dump_trace(skip[,count])> skips the specified number of frames |
| 220 | and returns a list containing information about the calling frames (all |
| 221 | of them, if C<count> is missing). Each entry is a reference to a hash |
| 222 | with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine |
| 223 | name, or info about C<eval>), C<args> (C<undef> or a reference to |
| 224 | an array), C<file>, and C<line>. |
| 225 | |
| 226 | =item * |
| 227 | |
| 228 | C<DB::print_trace(FH, skip[, count[, short]])> prints |
| 229 | formatted info about caller frames. The last two functions may be |
| 230 | convenient as arguments to C<< < >>, C<< << >> commands. |
| 231 | |
| 232 | =back |
| 233 | |
| 234 | Note that any variables and functions that are not documented in |
| 235 | this manpage (or in L<perldebug>) are considered for internal |
| 236 | use only, and as such are subject to change without notice. |
| 237 | |
| 238 | =head1 Frame Listing Output Examples |
| 239 | |
| 240 | The C<frame> option can be used to control the output of frame |
| 241 | information. For example, contrast this expression trace: |
| 242 | |
| 243 | $ perl -de 42 |
| 244 | Stack dump during die enabled outside of evals. |
| 245 | |
| 246 | Loading DB routines from perl5db.pl patch level 0.94 |
| 247 | Emacs support available. |
| 248 | |
| 249 | Enter h or 'h h' for help. |
| 250 | |
| 251 | main::(-e:1): 0 |
| 252 | DB<1> sub foo { 14 } |
| 253 | |
| 254 | DB<2> sub bar { 3 } |
| 255 | |
| 256 | DB<3> t print foo() * bar() |
| 257 | main::((eval 172):3): print foo() * bar(); |
| 258 | main::foo((eval 168):2): |
| 259 | main::bar((eval 170):2): |
| 260 | 42 |
| 261 | |
| 262 | with this one, once the C<o>ption C<frame=2> has been set: |
| 263 | |
| 264 | DB<4> o f=2 |
| 265 | frame = '2' |
| 266 | DB<5> t print foo() * bar() |
| 267 | 3: foo() * bar() |
| 268 | entering main::foo |
| 269 | 2: sub foo { 14 }; |
| 270 | exited main::foo |
| 271 | entering main::bar |
| 272 | 2: sub bar { 3 }; |
| 273 | exited main::bar |
| 274 | 42 |
| 275 | |
| 276 | By way of demonstration, we present below a laborious listing |
| 277 | resulting from setting your C<PERLDB_OPTS> environment variable to |
| 278 | the value C<f=n N>, and running I<perl -d -V> from the command line. |
| 279 | Examples using various values of C<n> are shown to give you a feel |
| 280 | for the difference between settings. Long though it may be, this |
| 281 | is not a complete listing, but only excerpts. |
| 282 | |
| 283 | =over 4 |
| 284 | |
| 285 | =item 1 |
| 286 | |
| 287 | entering main::BEGIN |
| 288 | entering Config::BEGIN |
| 289 | Package lib/Exporter.pm. |
| 290 | Package lib/Carp.pm. |
| 291 | Package lib/Config.pm. |
| 292 | entering Config::TIEHASH |
| 293 | entering Exporter::import |
| 294 | entering Exporter::export |
| 295 | entering Config::myconfig |
| 296 | entering Config::FETCH |
| 297 | entering Config::FETCH |
| 298 | entering Config::FETCH |
| 299 | entering Config::FETCH |
| 300 | |
| 301 | =item 2 |
| 302 | |
| 303 | entering main::BEGIN |
| 304 | entering Config::BEGIN |
| 305 | Package lib/Exporter.pm. |
| 306 | Package lib/Carp.pm. |
| 307 | exited Config::BEGIN |
| 308 | Package lib/Config.pm. |
| 309 | entering Config::TIEHASH |
| 310 | exited Config::TIEHASH |
| 311 | entering Exporter::import |
| 312 | entering Exporter::export |
| 313 | exited Exporter::export |
| 314 | exited Exporter::import |
| 315 | exited main::BEGIN |
| 316 | entering Config::myconfig |
| 317 | entering Config::FETCH |
| 318 | exited Config::FETCH |
| 319 | entering Config::FETCH |
| 320 | exited Config::FETCH |
| 321 | entering Config::FETCH |
| 322 | |
| 323 | =item 3 |
| 324 | |
| 325 | in $=main::BEGIN() from /dev/null:0 |
| 326 | in $=Config::BEGIN() from lib/Config.pm:2 |
| 327 | Package lib/Exporter.pm. |
| 328 | Package lib/Carp.pm. |
| 329 | Package lib/Config.pm. |
| 330 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
| 331 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 332 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li |
| 333 | in @=Config::myconfig() from /dev/null:0 |
| 334 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
| 335 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
| 336 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
| 337 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
| 338 | in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 |
| 339 | in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 |
| 340 | |
| 341 | =item 4 |
| 342 | |
| 343 | in $=main::BEGIN() from /dev/null:0 |
| 344 | in $=Config::BEGIN() from lib/Config.pm:2 |
| 345 | Package lib/Exporter.pm. |
| 346 | Package lib/Carp.pm. |
| 347 | out $=Config::BEGIN() from lib/Config.pm:0 |
| 348 | Package lib/Config.pm. |
| 349 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
| 350 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
| 351 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 352 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
| 353 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ |
| 354 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 355 | out $=main::BEGIN() from /dev/null:0 |
| 356 | in @=Config::myconfig() from /dev/null:0 |
| 357 | in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
| 358 | out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 |
| 359 | in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
| 360 | out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 |
| 361 | in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
| 362 | out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 |
| 363 | in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 |
| 364 | |
| 365 | =item 5 |
| 366 | |
| 367 | in $=main::BEGIN() from /dev/null:0 |
| 368 | in $=Config::BEGIN() from lib/Config.pm:2 |
| 369 | Package lib/Exporter.pm. |
| 370 | Package lib/Carp.pm. |
| 371 | out $=Config::BEGIN() from lib/Config.pm:0 |
| 372 | Package lib/Config.pm. |
| 373 | in $=Config::TIEHASH('Config') from lib/Config.pm:644 |
| 374 | out $=Config::TIEHASH('Config') from lib/Config.pm:644 |
| 375 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 376 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
| 377 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E |
| 378 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 379 | out $=main::BEGIN() from /dev/null:0 |
| 380 | in @=Config::myconfig() from /dev/null:0 |
| 381 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
| 382 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 |
| 383 | in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
| 384 | out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 |
| 385 | |
| 386 | =item 6 |
| 387 | |
| 388 | in $=CODE(0x15eca4)() from /dev/null:0 |
| 389 | in $=CODE(0x182528)() from lib/Config.pm:2 |
| 390 | Package lib/Exporter.pm. |
| 391 | out $=CODE(0x182528)() from lib/Config.pm:0 |
| 392 | scalar context return from CODE(0x182528): undef |
| 393 | Package lib/Config.pm. |
| 394 | in $=Config::TIEHASH('Config') from lib/Config.pm:628 |
| 395 | out $=Config::TIEHASH('Config') from lib/Config.pm:628 |
| 396 | scalar context return from Config::TIEHASH: empty hash |
| 397 | in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 398 | in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
| 399 | out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 |
| 400 | scalar context return from Exporter::export: '' |
| 401 | out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 |
| 402 | scalar context return from Exporter::import: '' |
| 403 | |
| 404 | =back |
| 405 | |
| 406 | In all cases shown above, the line indentation shows the call tree. |
| 407 | If the bit with value 2 is set in C<frame>, a line is printed on exit |
| 408 | from a subroutine as well. If the bit with value 4 is set, the arguments |
| 409 | are printed along with the caller info. If the bit with value 8 is set, |
| 410 | the arguments are printed even if they are tied or references. If the |
| 411 | bit with value 16 is set, the return value is printed, too. |
| 412 | |
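The C<frame> value is a bit mask, so a setting such as 6 combines the effects of 2 and 4. The following sketch decodes a value into the effects listed in the paragraph above. The table C<@FRAME_BITS> and the function C<describe_frame> are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Bit values of the frame option and their effect, as described above.
my @FRAME_BITS = (
    [  2 => 'print a line on subroutine exit' ],
    [  4 => 'print arguments with the caller info' ],
    [  8 => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# Return the descriptions of all bits that are set in $frame.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (2, 6, 30) {
    print "frame=$frame:\n";
    print "  $_\n" for describe_frame($frame);
}
```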
| 413 | When a package is compiled, a line like this |
| 414 | |
| 415 | Package lib/Carp.pm. |
| 416 | |
| 417 | is printed with proper indentation. |
| 418 | |
| 419 | =head1 Debugging Regular Expressions |
| 420 | |
| 421 | There are two ways to enable debugging output for regular expressions. |
| 422 | |
| 423 | If your perl is compiled with C<-DDEBUGGING>, you may use the |
| 424 | B<-Dr> flag on the command line. |
| 425 | |
| 426 | Otherwise, one can C<use re 'debug'>, which has effects at |
| 427 | compile time and run time. Since Perl 5.9.5, this pragma is lexically |
| 428 | scoped. |
| 429 | |
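A minimal sketch of the pragma form follows, assuming a Perl recent enough for C<use re 'debug'> to be lexically scoped. The sample string C<bdeghik> is an assumption chosen to match the pattern used in the next section. The debugging report is written to STDERR.

```perl
use strict;
use warnings;

my $matched = 0;
{
    # The pragma applies only to this block; its report goes to STDERR.
    use re 'debug';
    my $re = qr/[bc]d(ef*g)+h[ij]k$/;
    $matched = 1 if 'bdeghik' =~ $re;
}
print "matched\n" if $matched;
```

Wrapping the pattern in a bare block keeps the output limited to the regex under study instead of every regex in the program.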
| 430 | =head2 Compile-time Output |
| 431 | |
| 432 | The debugging output at compile time looks like this: |
| 433 | |
| 434 | Compiling REx '[bc]d(ef*g)+h[ij]k$' |
| 435 | size 45 Got 364 bytes for offset annotations. |
| 436 | first at 1 |
| 437 | rarest char g at 0 |
| 438 | rarest char d at 0 |
| 439 | 1: ANYOF[bc](12) |
| 440 | 12: EXACT <d>(14) |
| 441 | 14: CURLYX[0] {1,32767}(28) |
| 442 | 16: OPEN1(18) |
| 443 | 18: EXACT <e>(20) |
| 444 | 20: STAR(23) |
| 445 | 21: EXACT <f>(0) |
| 446 | 23: EXACT <g>(25) |
| 447 | 25: CLOSE1(27) |
| 448 | 27: WHILEM[1/1](0) |
| 449 | 28: NOTHING(29) |
| 450 | 29: EXACT <h>(31) |
| 451 | 31: ANYOF[ij](42) |
| 452 | 42: EXACT <k>(44) |
| 453 | 44: EOL(45) |
| 454 | 45: END(0) |
| 455 | anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) |
| 456 | stclass 'ANYOF[bc]' minlen 7 |
| 457 | Offsets: [45] |
| 458 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] |
| 459 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] |
| 460 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] |
| 461 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] |
| 462 | Omitting $` $& $' support. |
| 463 | |
| 464 | The first line shows the pre-compiled form of the regex. The second |
| 465 | shows the size of the compiled form (in arbitrary units, usually |
| 466 | 4-byte words) and the total number of bytes allocated for the |
| 467 | offset/length table, usually 4+C<size>*8. The next line shows the |
| 468 | label I<id> of the first node that does a match. |
| 469 | |
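For the listing above, the usual C<4+size*8> relation can be checked directly against the reported figures (size 45, 364 bytes). This is only a worked instance of that arithmetic; the text says "usually", so other builds may report different numbers.

```perl
use strict;
use warnings;

my $size  = 45;                 # "size 45" from the listing above
my $bytes = 4 + $size * 8;      # usual size of the offset/length table
print "expected $bytes bytes for offset annotations\n";
```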
| 470 | The |
| 471 | |
| 472 | anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) |
| 473 | stclass 'ANYOF[bc]' minlen 7 |
| 474 | |
| 475 | line (split into two lines above) contains optimizer |
| 476 | information. In the example shown, the optimizer found that the match |
| 477 | should contain a substring C<de> at offset 1, plus substring C<gh> |
| 478 | at some offset between 3 and infinity. Moreover, when checking for |
| 479 | these substrings (to abandon impossible matches quickly), Perl will check |
| 480 | for the substring C<gh> before checking for the substring C<de>. The |
| 481 | optimizer may also use the knowledge that the match starts (at the |
| 482 | C<first> I<id>) with a character class, and no string |
| 483 | shorter than 7 characters can possibly match. |
| 484 | |
| 485 | The fields of interest which may appear in this line are |
| 486 | |
| 487 | =over 4 |
| 488 | |
| 489 | =item C<anchored> I<STRING> C<at> I<POS> |
| 490 | |
| 491 | =item C<floating> I<STRING> C<at> I<POS1..POS2> |
| 492 | |
| 493 | See above. |
| 494 | |
| 495 | =item C<checking floating/anchored> |
| 496 | |
| 497 | Which substring to check first. |
| 498 | |
| 499 | =item C<minlen> |
| 500 | |
| 501 | The minimal length of the match. |
| 502 | |
| 503 | =item C<stclass> I<TYPE> |
| 504 | |
| 505 | Type of first matching node. |
| 506 | |
| 507 | =item C<noscan> |
| 508 | |
| 509 | Don't scan for the found substrings. |
| 510 | |
| 511 | =item C<isall> |
| 512 | |
| 513 | Means that the optimizer information is all that the regular |
| 514 | expression contains, and thus one does not need to enter the regex engine at |
| 515 | all. |
| 516 | |
| 517 | =item C<GPOS> |
| 518 | |
| 519 | Set if the pattern contains C<\G>. |
| 520 | |
| 521 | =item C<plus> |
| 522 | |
| 523 | Set if the pattern starts with a repeated char (as in C<x+y>). |
| 524 | |
| 525 | =item C<implicit> |
| 526 | |
| 527 | Set if the pattern starts with C<.*>. |
| 528 | |
| 529 | =item C<with eval> |
| 530 | |
| 531 | Set if the pattern contains eval-groups, such as C<(?{ code })> and |
| 532 | C<(??{ code })>. |
| 533 | |
| 534 | =item C<anchored(TYPE)> |
| 535 | |
| 536 | If the pattern may match only at a handful of places, with C<TYPE> |
| 537 | being C<SBOL>, C<MBOL>, or C<GPOS>. See the table below. |
| 538 | |
| 539 | =back |
| 540 | |
| 541 | If a substring is known to match at end-of-line only, it may be |
| 542 | followed by C<$>, as in C<floating 'k'$>. |
| 543 | |
| 544 | The optimizer-specific information is used to avoid entering the (slow) regex |
| 545 | engine on strings that will definitely not match. If the C<isall> flag |
| 546 | is set, a call to the regex engine may be avoided even when the optimizer |
| 547 | found an appropriate place for the match. |
| 548 | |
| 549 | Above the optimizer section is the list of I<nodes> of the compiled |
| 550 | form of the regex. Each line has format |
| 551 | |
| 552 | C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>) |
| 553 | |
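As a rough illustration of that line format, the sketch below pulls the four fields out of a node line. The function C<parse_node_line> is hypothetical, and its pattern is an assumption based only on the sample output shown earlier; other Perl versions may print node lines that it does not handle.

```perl
use strict;
use warnings;

# Parse one node line of the form "id: TYPE OPTIONAL-INFO (next-id)".
# Returns a hash reference, or nothing if the line is not a node line.
sub parse_node_line {
    my ($line) = @_;
    my ($id, $type, $info, $next) =
        $line =~ /\A\s*(\d+):\s*(\w+)\s*(.*?)\s*\((\d+)\)\s*\z/
        or return;
    return { id => $id, type => $type, info => $info, next => $next };
}

# Sample lines copied from the compile-time listing above.
for my $line ('12: EXACT <d>(14)', '14: CURLYX[0] {1,32767}(28)', '45: END(0)') {
    my $node = parse_node_line($line);
    printf "%3d %-7s %-14s -> %d\n", @$node{qw(id type info next)};
}
```

The lazy C<(.*?)> for the optional info, combined with the anchored C<(next-id)> at the end of the line, is what lets nodes with and without extra information share one pattern.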
| 554 | =head2 Types of Nodes |
| 555 | |
| 556 | Here are the current possible types, with short descriptions: |
| 557 | |
| 558 | =for comment |
| 559 | This table is generated by regen/regcomp.pl. Any changes made here |
| 560 | will be lost. |
| 561 | |
| 562 | =for regcomp.pl begin |
| 563 | |
| 564 | # TYPE arg-description [num-args] [longjump-len] DESCRIPTION |
| 565 | |
| 566 | # Exit points |
| 567 | |
| 568 | END no End of program. |
| 569 | SUCCEED no Return from a subroutine, basically. |
| 570 | |
| 571 | # Line Start Anchors: |
| 572 | SBOL no Match "" at beginning of line: /^/, /\A/ |
| 573 | MBOL no Same, assuming multiline: /^/m |
| 574 | |
| 575 | # Line End Anchors: |
| 576 | SEOL no Match "" at end of line: /$/ |
| 577 | MEOL no Same, assuming multiline: /$/m |
| 578 | EOS no Match "" at end of string: /\z/ |
| 579 | |
| 580 | # Match Start Anchors: |
| 581 | GPOS no Matches where last m//g left off. |
| 582 | |
| 583 | # Word Boundary Opcodes: |
| 584 | BOUND no Like BOUNDA for non-utf8, otherwise match |
| 585 | "" between any Unicode \w\W or \W\w |
| 586 | BOUNDL no Like BOUND/BOUNDU, but \w and \W are |
| 587 | defined by current locale |
| 588 | BOUNDU no Match "" at any boundary of a given type |
| 589 | using Unicode rules |
| 590 | BOUNDA no Match "" at any boundary between \w\W or |
| 591 | \W\w, where \w is [_a-zA-Z0-9] |
| 592 | NBOUND no Like NBOUNDA for non-utf8, otherwise match |
| 593 | "" between any Unicode \w\w or \W\W |
| 594 | NBOUNDL no Like NBOUND/NBOUNDU, but \w and \W are |
| 595 | defined by current locale |
| 596 | NBOUNDU no Match "" at any non-boundary of a given |
| 597 | type using Unicode rules |
| 598 | NBOUNDA no Match "" between any \w\w or \W\W, where |
| 599 | \w is [_a-zA-Z0-9] |
| 600 | |
| 601 | # [Special] alternatives: |
| 602 | REG_ANY no Match any one character (except newline). |
| 603 | SANY no Match any one character. |
| 604 | ANYOF sv 1 Match character in (or not in) this class, |
| 605 | single char match only |
| 606 | ANYOFD sv 1 Like ANYOF, but /d is in effect |
| 607 | ANYOFL sv 1 Like ANYOF, but /l is in effect |
| 608 | ANYOFM byte 1 Like ANYOF, but matches an invariant byte |
| 609 | as determined by the mask and arg |
| 610 | |
| 611 | # POSIX Character Classes: |
| 612 | POSIXD none Some [[:class:]] under /d; the FLAGS field |
| 613 | gives which one |
| 614 | POSIXL none Some [[:class:]] under /l; the FLAGS field |
| 615 | gives which one |
| 616 | POSIXU none Some [[:class:]] under /u; the FLAGS field |
| 617 | gives which one |
| 618 | POSIXA none Some [[:class:]] under /a; the FLAGS field |
| 619 | gives which one |
| 620 | NPOSIXD none complement of POSIXD, [[:^class:]] |
| 621 | NPOSIXL none complement of POSIXL, [[:^class:]] |
| 622 | NPOSIXU none complement of POSIXU, [[:^class:]] |
| 623 | NPOSIXA none complement of POSIXA, [[:^class:]] |
| 624 | |
| 625 | ASCII none [[:ascii:]] |
| 626 | NASCII none [[:^ascii:]] |
| 627 | |
| 628 | CLUMP no Match any extended grapheme cluster |
| 629 | sequence |
| 630 | |
| 631 | # Alternation |
| 632 | |
| 633 | # BRANCH The set of branches constituting a single choice are |
| 634 | # hooked together with their "next" pointers, since |
| 635 | # precedence prevents anything being concatenated to |
| 636 | # any individual branch. The "next" pointer of the last |
| 637 | # BRANCH in a choice points to the thing following the |
| 638 | # whole choice. This is also where the final "next" |
| 639 | # pointer of each individual branch points; each branch |
| 640 | # starts with the operand node of a BRANCH node. |
| 641 | # |
| 642 | BRANCH node Match this alternative, or the next... |
| 643 | |
| 644 | # Literals |
| 645 | |
| 646 | EXACT str Match this string (preceded by length). |
| 647 | EXACTL str Like EXACT, but /l is in effect (used so |
| 648 | locale-related warnings can be checked |
| 649 | for). |
| 650 | EXACTF str Match this non-UTF-8 string (not guaranteed |
| 651 | to be folded) using /id rules (w/len). |
| 652 | EXACTFL str Match this string (not guaranteed to be |
| 653 | folded) using /il rules (w/len). |
| 654 | EXACTFU str Match this string (folded iff in UTF-8, |
| 655 | length in folding doesn't change if not in |
| 656 | UTF-8) using /iu rules (w/len). |
| 657 | EXACTFAA str Match this string (not guaranteed to be |
| 658 | folded) using /iaa rules (w/len). |
| 659 | |
| 660 | EXACTFU_SS str Match this string (folded iff in UTF-8, |
| 661 | length in folding may change even if not in |
| 662 | UTF-8) using /iu rules (w/len). |
| 663 | EXACTFLU8 str Rare circumstances: like EXACTFU, but is |
| 664 | under /l, UTF-8, folded, and everything in |
| 665 | it is above 255. |
| 666 | EXACTFAA_NO_TRIE str Match this string (which is not trie-able; |
| 667 | not guaranteed to be folded) using /iaa |
| 668 | rules (w/len). |
| 669 | |
| 670 | # Do nothing types |
| 671 | |
| 672 | NOTHING no Match empty string. |
| 673 | # A variant of above which delimits a group, thus stops optimizations |
| 674 | TAIL no Match empty string. Can jump here from |
| 675 | outside. |
| 676 | |
| 677 | # Loops |
| 678 | |
| 679 | # STAR,PLUS '?', and complex '*' and '+', are implemented as |
| 680 | # circular BRANCH structures. Simple cases |
| 681 | # (one character per match) are implemented with STAR |
| 682 | # and PLUS for speed and to minimize recursive plunges. |
| 683 | # |
| 684 | STAR node Match this (simple) thing 0 or more times. |
| 685 | PLUS node Match this (simple) thing 1 or more times. |
| 686 | |
| 687 | CURLY sv 2 Match this simple thing {n,m} times. |
| 688 | CURLYN no 2 Capture next-after-this simple thing |
| 689 | CURLYM no 2 Capture this medium-complex thing {n,m} |
| 690 | times. |
| 691 | CURLYX sv 2 Match this complex thing {n,m} times. |
| 692 | |
| 693 | # This terminator creates a loop structure for CURLYX |
| 694 | WHILEM no Do curly processing and see if rest |
| 695 | matches. |
| 696 | |
| 697 | # Buffer related |
| 698 | |
| 699 | # OPEN,CLOSE,GROUPP ...are numbered at compile time. |
| 700 | OPEN num 1 Mark this point in input as start of #n. |
| 701 | CLOSE num 1 Close corresponding OPEN of #n. |
| 702 | SROPEN none Same as OPEN, but for script run |
| 703 | SRCLOSE none Close preceding SROPEN |
| 704 | |
| 705 | REF num 1 Match some already matched string |
| 706 | REFF num 1 Match already matched string, folded using |
| 707 | native charset rules for non-utf8 |
| 708 | REFFL num 1 Match already matched string, folded in |
| 709 | loc. |
| 710 | REFFU num 1 Match already matched string, folded using |
| 711 | unicode rules for non-utf8 |
| 712 | REFFA num 1 Match already matched string, folded using |
| 713 | unicode rules for non-utf8, no mixing |
| 714 | ASCII, non-ASCII |
| 715 | |
| 716 | # Named references. Code in regcomp.c assumes that these all are after |
| 717 | # the numbered references |
| 718 | NREF no-sv 1 Match some already matched string |
| 719 | NREFF no-sv 1 Match already matched string, folded using |
| 720 | native charset rules for non-utf8 |
| 721 | NREFFL no-sv 1 Match already matched string, folded in |
| 722 | loc. |
| 723 | NREFFU num 1 Match already matched string, folded using |
| 724 | unicode rules for non-utf8 |
| 725 | NREFFA num 1 Match already matched string, folded using |
| 726 | unicode rules for non-utf8, no mixing |
| 727 | ASCII, non-ASCII |
| 728 | |
| 729 | # Support for long RE |
| 730 | LONGJMP off 1 1 Jump far away. |
| 731 | BRANCHJ off 1 1 BRANCH with long offset. |
| 732 | |
| 733 | # Special Case Regops |
| 734 | IFMATCH off 1 2 Succeeds if the following matches. |
| 735 | UNLESSM off 1 2 Fails if the following matches. |
| 736 | SUSPEND off 1 1 "Independent" sub-RE. |
| 737 | IFTHEN off 1 1 Switch, should be preceded by switcher. |
| 738 | GROUPP num 1 Whether the group matched. |
| 739 | |
| 740 | # The heavy worker |
| 741 | |
| 742 | EVAL evl/flags Execute some Perl code. |
| 743 | 2L |
| 744 | |
| 745 | # Modifiers |
| 746 | |
| 747 | MINMOD no Next operator is not greedy. |
| 748 | LOGICAL no Next opcode should set the flag only. |
| 749 | |
| 750 | # This is not used yet |
| 751 | RENUM off 1 1 Group with independently numbered parens. |
| 752 | |
| 753 | # Trie Related |
| 754 | |
| 755 | # Behave the same as A|LIST|OF|WORDS would. The '..C' variants |
| 756 | # have inline charclass data (ascii only), the 'C' store it in the |
| 757 | # structure. |
| 758 | |
| 759 | TRIE trie 1 Match many EXACT(F[ALU]?)? at once. |
| 760 | flags==type |
| 761 | TRIEC trie Same as TRIE, but with embedded charclass |
| 762 | charclass data |
| 763 | |
| 764 | AHOCORASICK trie 1 Aho Corasick stclass. flags==type |
| 765 | AHOCORASICKC trie Same as AHOCORASICK, but with embedded |
| 766 | charclass charclass data |
| 767 | |
| 768 | # Regex Subroutines |
| 769 | GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2 |
| 770 | |
| 771 | # Special conditionals |
| 772 | NGROUPP no-sv 1 Whether the group matched. |
| 773 | INSUBP num 1 Whether we are in a specific recurse. |
| 774 | DEFINEP none 1 Never execute directly. |
| 775 | |
| 776 | # Backtracking Verbs |
| 777 | ENDLIKE none Used only for the type field of verbs |
| 778 | OPFAIL no-sv 1 Same as (?!), but with verb arg |
| 779 | ACCEPT no-sv/num Accepts the current matched string, with |
| 780 | 2L verb arg |
| 781 | |
| 782 | # Verbs With Arguments |
| 783 | VERB no-sv 1 Used only for the type field of verbs |
| 784 | PRUNE no-sv 1 Pattern fails at this startpoint if |
| 785 | backtracking through this |
| 786 | MARKPOINT no-sv 1 Push the current location for rollback by |
| 787 | cut. |
| 788 | SKIP no-sv 1 On failure skip forward (to the mark) |
| 789 | before retrying |
| 790 | COMMIT no-sv 1 Pattern fails outright if backtracking |
| 791 | through this |
| 792 | CUTGROUP no-sv 1 On failure go to the next alternation in |
| 793 | the group |
| 794 | |
| 795 | # Control what to keep in $&. |
| 796 | KEEPS no $& begins here. |
| 797 | |
| 798 | # New charclass like patterns |
| 799 | LNBREAK none generic newline pattern |
| 800 | |
| 801 | # SPECIAL REGOPS |
| 802 | |
| 803 | # This is not really a node, but an optimized away piece of a "long" |
| 804 | # node. To simplify debugging output, we mark it as if it were a node |
| 805 | OPTIMIZED off Placeholder for dump. |
| 806 | |
| 807 | # Special opcode with the property that no opcode in a compiled program |
| 808 | # will ever be of this type. Thus it can be used as a flag value that |
| 809 | # no other opcode has been seen. END is used similarly, in that an END |
| 810 | # node can't be optimized. So END implies "unoptimizable" and PSEUDO |
| 811 | # means "not seen anything to optimize yet". |
| 812 | PSEUDO off Pseudo opcode for internal use. |
| 813 | |
| 814 | =for regcomp.pl end |
| 815 | |
| 816 | =for unprinted-credits |
| 817 | Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 |
| 818 | |
| 819 | Following the optimizer information is a dump of the offset/length |
| 820 | table, here split across several lines: |
| 821 | |
| 822 | Offsets: [45] |
| 823 | 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] |
| 824 | 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] |
| 825 | 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0] |
| 826 | 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0] |
| 827 | |
| 828 | The first line here indicates that the offset/length table contains 45 |
| 829 | entries. Each entry is a pair of integers, denoted by C<offset[length]>. |
| 830 | Entries are numbered starting with 1, so entry #1 here is C<1[4]> and |
| 831 | entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:> |
| 832 | (the C<1: ANYOF[bc]>) begins at character position 1 in the |
| 833 | pre-compiled form of the regex, and has a length of 4 characters. |
| 834 | C<5[1]> in position 12 |
| 835 | indicates that the node labeled C<12:> |
| 836 | (the C<< 12: EXACT <d> >>) begins at character position 5 in the |
| 837 | pre-compiled form of the regex, and has a length of 1 character. |
| 838 | C<12[1]> in position 14 |
| 839 | indicates that the node labeled C<14:> |
| 840 | (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the |
| 841 | pre-compiled form of the regex, and has a length of 1 character; that |
| 842 | is, it corresponds to the C<+> symbol in the pre-compiled regex. |
| 843 | |
| 844 | C<0[0]> items indicate that there is no corresponding node. |
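
As a rough illustration of the layout just described, the following sketch parses the dump into numbered entries. The subroutine name C<parse_offsets> is hypothetical and not part of Perl; the dump text is copied from the example above.

```perl
use strict;
use warnings;

# Offset/length dump copied from the example above.
my $dump = <<'END';
Offsets: [45]
1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
END

# Hypothetical helper: returns the declared entry count and a list of
# { offset, length } pairs in the order they appear in the dump.
sub parse_offsets {
    my ($text) = @_;
    my ($count) = $text =~ /Offsets:\s*\[(\d+)\]/
        or die "no Offsets header found\n";
    my @entries;
    # The header "[45]" has no digits before the bracket, so it is not
    # mistaken for an offset[length] pair.
    while ($text =~ /(\d+)\[(\d+)\]/g) {
        push @entries, { offset => $1, length => $2 };
    }
    return ($count, \@entries);
}

my ($count, $entries) = parse_offsets($dump);

# Entries are numbered from 1, so entry N lives at index N-1.
for my $n (1 .. @$entries) {
    my $e = $entries->[$n - 1];
    next if $e->{offset} == 0 && $e->{length} == 0;    # no node here
    printf "node %2d: offset %2d, length %d\n",
        $n, $e->{offset}, $e->{length};
}
```

Entries of the form C<0[0]> are skipped, since they have no corresponding node.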
| 845 | |
| 846 | =head2 Run-time Output |
| 847 | |
| 848 | First of all, when doing a match, one may get no run-time output even |
| 849 | if debugging is enabled. This means that the regex engine was never |
| 850 | entered and that all of the job was therefore done by the optimizer. |
| 851 | |
| 852 | If the regex engine was entered, the output may look like this: |
| 853 | |
| 854 | Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__' |
| 855 | Setting an EVAL scope, savestack=3 |
| 856 | 2 <ab> <cdefg__gh_> | 1: ANYOF |
| 857 | 3 <abc> <defg__gh_> | 11: EXACT <d> |
| 858 | 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767} |
| 859 | 4 <abcd> <efg__gh_> | 26: WHILEM |
| 860 | 0 out of 1..32767 cc=effff31c |
| 861 | 4 <abcd> <efg__gh_> | 15: OPEN1 |
| 862 | 4 <abcd> <efg__gh_> | 17: EXACT <e> |
| 863 | 5 <abcde> <fg__gh_> | 19: STAR |
| 864 | EXACT <f> can match 1 times out of 32767... |
| 865 | Setting an EVAL scope, savestack=3 |
| 866 | 6 <bcdef> <g__gh__> | 22: EXACT <g> |
| 867 | 7 <bcdefg> <__gh__> | 24: CLOSE1 |
| 868 | 7 <bcdefg> <__gh__> | 26: WHILEM |
| 869 | 1 out of 1..32767 cc=effff31c |
| 870 | Setting an EVAL scope, savestack=12 |
| 871 | 7 <bcdefg> <__gh__> | 15: OPEN1 |
| 872 | 7 <bcdefg> <__gh__> | 17: EXACT <e> |
| 873 | restoring \1 to 4(4)..7 |
| 874 | failed, try continuation... |
| 875 | 7 <bcdefg> <__gh__> | 27: NOTHING |
| 876 | 7 <bcdefg> <__gh__> | 28: EXACT <h> |
| 877 | failed... |
| 878 | failed... |
| 879 | |
| 880 | The most significant information in the output is about the particular I<node> |
| 881 | of the compiled regex that is currently being tested against the target string. |
| 882 | The format of these lines is |
| 883 | |
| 884 | C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE> |
| 885 | |
| 886 | The I<TYPE> info is indented with respect to the backtracking level. |
| 887 | Other incidental information appears interspersed within. |
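
The line format above can be picked apart with a regular expression. This is only a sketch: the subroutine name C<parse_trace_line> and the hash keys are hypothetical, and the exact spacing of real trace output may differ between Perl releases.

```perl
use strict;
use warnings;

# Hypothetical helper: split one node line of the trace into its fields.
# Returns nothing for incidental lines such as "failed...".
sub parse_trace_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^\s* (\d+)            # STRING-OFFSET
        \s+ <([^>]*)>         # PRE-STRING
        \s+ <([^>]*)>         # POST-STRING
        \s* \| \s* (\d+) :    # ID
        \s* (.*?) \s* $       # TYPE
    }x;
    return { offset => $1, pre => $2, post => $3, id => $4, type => $5 };
}

# A few lines taken from the sample output above.
my @sample = (
    '   3 <abc> <defg__gh_>    |  11:  EXACT <d>',
    '   4 <abcd> <efg__gh_>    |  13:  CURLYX {1,32767}',
    '                            failed...',
);

for my $line (@sample) {
    my $f = parse_trace_line($line) or next;    # skip incidental lines
    printf "at %d, node %d (%s): seen '%s', remaining '%s'\n",
        @{$f}{qw(offset id type pre post)};
}
```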
| 888 | |
| 889 | =head1 Debugging Perl Memory Usage |
| 890 | |
| 891 | Perl is a profligate wastrel when it comes to memory use. There |
| 892 | is a saying that to estimate memory usage of Perl, assume a reasonable |
| 893 | algorithm for memory allocation, multiply that estimate by 10, and |
| 894 | while you still may miss the mark, at least you won't be quite so |
| 895 | astonished. This is not absolutely true, but may provide a good |
| 896 | grasp of what happens. |
| 897 | |
| 898 | Assume that an integer cannot take less than 20 bytes of memory, a |
| 899 | float cannot take less than 24 bytes, a string cannot take less |
| 900 | than 32 bytes (all these examples assume 32-bit architectures; the |
| 901 | results are quite a bit worse on 64-bit architectures). If a variable |
| 902 | is accessed in two of three different ways (which require an integer, |
| 903 | a float, or a string), the memory footprint may increase yet another |
| 904 | 20 bytes. A sloppy malloc(3) implementation can inflate these |
| 905 | numbers dramatically. |
| 906 | |
| 907 | On the opposite end of the scale, a declaration like |
| 908 | |
| 909 | sub foo; |
| 910 | |
| 911 | may take up to 500 bytes of memory, depending on which release of Perl |
| 912 | you're running. |
| 913 | |
| 914 | Anecdotal estimates of source-to-compiled code bloat suggest an |
| 915 | eightfold increase. This means that the compiled form of reasonable |
| 916 | (normally commented, properly indented etc.) code will take |
| 917 | about eight times more space in memory than the code took |
| 918 | on disk. |
| 919 | |
| 920 | The B<-DL> command-line switch is obsolete since circa Perl 5.6.0 |
| 921 | (it was available only if Perl was built with C<-DDEBUGGING>). |
| 922 | The switch was used to track Perl's memory allocations and possible |
| 923 | memory leaks. These days the use of malloc debugging tools like |
| 924 | F<Purify> or F<valgrind> is suggested instead. See also |
| 925 | L<perlhacktips/PERL_MEM_LOG>. |
| 926 | |
| 927 | One way to find out how much memory is being used by Perl data |
| 928 | structures is to install the Devel::Size module from CPAN: it gives |
| 929 | you the minimum number of bytes required to store a particular data |
| 930 | structure. Please be mindful of the difference between size(), which counts |
| 931 | only the structure itself, and total_size(), which also follows references. |
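
A minimal sketch of the two calls, assuming Devel::Size is installed from CPAN. The C<%config> data is made up for illustration, and the byte counts vary by platform and Perl build, so none are shown.

```perl
use strict;
use warnings;
use Devel::Size qw(size total_size);

# Illustrative data only: a hash holding a reference to an array.
my %config = (
    name  => 'example',
    ports => [ 80, 443, 8080 ],
);

# size() reports on the hash itself; total_size() also follows the
# reference stored under "ports" and counts what it points to.
printf "size:       %d bytes\n", size(\%config);
printf "total_size: %d bytes\n", total_size(\%config);
```

For nested structures the second figure is usually the one that matters, since it includes everything reachable from the top-level variable.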
| 932 | |
| 933 | If Perl has been compiled using Perl's malloc you can analyze Perl |
| 934 | memory usage by setting $ENV{PERL_DEBUG_MSTATS}. |
| 935 | |
| 936 | =head2 Using C<$ENV{PERL_DEBUG_MSTATS}> |
| 937 | |
| 938 | If your perl is using Perl's malloc() and was compiled with the |
| 939 | necessary switches (this is the default), then it will print memory |
| 940 | usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} |
| 941 | > 1 >>, and before termination of the program when C<< |
| 942 | $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to |
| 943 | the following example: |
| 944 | |
| 945 | $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" |
| 946 | Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) |
| 947 | 14216 free: 130 117 28 7 9 0 2 2 1 0 0 |
| 948 | 437 61 36 0 5 |
| 949 | 60924 used: 125 137 161 55 7 8 6 16 2 0 1 |
| 950 | 74 109 304 84 20 |
| 951 | Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. |
| 952 | Memory allocation statistics after execution: (buckets 4(4)..8188(8192) |
| 953 | 30888 free: 245 78 85 13 6 2 1 3 2 0 1 |
| 954 | 315 162 39 42 11 |
| 955 | 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 |
| 956 | 196 178 1066 798 39 |
| 957 | Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. |
| 958 | |
| 959 | It is possible to ask for these statistics at arbitrary points in |
| 960 | your execution using the mstat() function from the standard |
| 961 | Devel::Peek module. |
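
For instance, a sketch along these lines brackets an allocation-heavy step with two reports. The label strings and the C<@data> array are illustrative. On a perl that was not built with Perl's malloc, mstat() should print a short notice rather than statistics.

```perl
use strict;
use warnings;
use Devel::Peek qw(mstat);

# Allocate a noticeable amount of memory, then report.
my @data = map { "x" x 100 } 1 .. 10_000;
mstat("after building \@data: ");

# Release it again and report once more for comparison.
@data = ();
mstat("after clearing \@data: ");
```

The argument to mstat() is a label that is printed with the report, which helps tell several reports apart.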
| 962 | |
| 963 | Here is some explanation of that format: |
| 964 | |
| 965 | =over 4 |
| 966 | |
| 967 | =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)> |
| 968 | |
| 969 | Perl's malloc() uses bucketed allocations. Every request is rounded |
| 970 | up to the closest bucket size available, and a bucket is taken from |
| 971 | the pool of buckets of that size. |
| 972 | |
| 973 | The line above describes the limits of buckets currently in use. |
| 974 | Each bucket has two sizes: memory footprint and the maximal size |
| 975 | of user data that can fit into this bucket. Suppose in the above |
| 976 | example that the smallest bucket were size 4. The biggest bucket |
| 977 | would have usable size 8188, and the memory footprint would be 8192. |
| 978 | |
| 979 | In a Perl built for debugging, some buckets may have negative usable |
| 980 | size. This means that these buckets cannot (and will not) be used. |
| 981 | For larger buckets, the memory footprint may be one page greater |
| 982 | than a power of 2. If so, the corresponding power of two is |
| 983 | printed in the C<APPROX> field above. |
| 984 | |
| 985 | =item Free/Used |
| 986 | |
| 987 | The 1 or 2 rows of numbers following that correspond to the number |
| 988 | of buckets of each size between C<SMALLEST> and C<GREATEST>. In |
| 989 | the first row, the sizes (memory footprints) of buckets are powers |
| 990 | of two--or possibly one page greater. In the second row, if present, |
| 991 | the memory footprints of the buckets are between the memory footprints |
| 992 | of two buckets "above". |
| 993 | |
| 994 | For example, suppose under the previous example, the memory footprints |
| 995 | were |
| 996 | |
| 997 | free: 8 16 32 64 128 256 512 1024 2048 4096 8192 |
| 998 | 4 12 24 48 80 |
| 999 | |
| 1000 | With a non-C<DEBUGGING> perl, the buckets starting from C<128> have |
| 1001 | a 4-byte overhead, and thus an 8192-long bucket may take up to |
| 1002 | 8188-byte allocations. |
| 1003 | |
| 1004 | =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS> |
| 1005 | |
| 1006 | The first two fields give the total amount of memory perl sbrk(2)ed |
| 1007 | (ess-broken? :-) and number of sbrk(2)s used. The third number is |
| 1008 | what perl thinks about continuity of returned chunks. So long as |
| 1009 | this number is positive, malloc() will assume that it is probable |
| 1010 | that sbrk(2) will provide continuous memory. |
| 1011 | |
| 1012 | Memory allocated by external libraries is not counted. |
| 1013 | |
| 1014 | =item C<pad: 0> |
| 1015 | |
| 1016 | The amount of sbrk(2)ed memory needed to keep buckets aligned. |
| 1017 | |
| 1018 | =item C<heads: 2192> |
| 1019 | |
| 1020 | Although memory overhead of bigger buckets is kept inside the bucket, for |
| 1021 | smaller buckets, it is kept in separate areas. This field gives the |
| 1022 | total size of these areas. |
| 1023 | |
| 1024 | =item C<chain: 0> |
| 1025 | |
| 1026 | malloc() may want to subdivide a bigger bucket into smaller buckets. |
| 1027 | If only a part of the deceased bucket is left unsubdivided, the rest |
| 1028 | is kept as an element of a linked list. This field gives the total |
| 1029 | size of these chunks. |
| 1030 | |
| 1031 | =item C<tail: 6144> |
| 1032 | |
| 1033 | To minimize the number of sbrk(2)s, malloc() asks for more memory. This |
| 1034 | field gives the size of the yet unused part, which is sbrk(2)ed, but |
| 1035 | never touched. |
| 1036 | |
| 1037 | =back |
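
The summary line described in the items above can be split into its fields as follows. This is a sketch that assumes the line has exactly the shape shown in the sample report; the hash keys are names chosen here for illustration.

```perl
use strict;
use warnings;

# Summary line copied from the "after execution" report above.
my $line = 'Total sbrk(): 215040/47:145. '
         . 'Odd ends: pad+heads+chain+tail: 0+2192+0+6144.';

my %stats;
if ($line =~ m{Total sbrk\(\): (\d+)/(\d+):(-?\d+)\. Odd ends: pad\+heads\+chain\+tail: (\d+)\+(\d+)\+(\d+)\+(\d+)\.}) {
    # Field names are hypothetical labels for the positions in the line.
    @stats{qw(sbrked sbrks continuous pad heads chain tail)}
        = ($1, $2, $3, $4, $5, $6, $7);
}
else {
    die "unrecognized summary line\n";
}

printf "%-10s %d\n", $_, $stats{$_}
    for qw(sbrked sbrks continuous pad heads chain tail);
```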
| 1038 | |
| 1039 | =head1 SEE ALSO |
| 1040 | |
| 1041 | L<perldebug>, |
| 1042 | L<perlguts>, |
| 1043 | L<perlrun>, |
| 1044 | L<re>, |
| 1045 | and |
| 1046 | L<Devel::DProf>. |