pod/perldebguts.pod

   1 =head1 NAME
   2
   3 perldebguts - Guts of Perl debugging
   4
   5 =head1 DESCRIPTION
   6
   7 This is not L<perldebug>, which tells you how to use
   8 the debugger.  This manpage describes low-level details concerning
   9 the debugger's internals, which range from difficult to impossible
  10 to understand for anyone who isn't incredibly intimate with Perl's guts.
  11 Caveat lector.
  12
  13 =head1 Debugger Internals
  14
  15 Perl has special debugging hooks at compile-time and run-time used
  16 to create debugging environments.  These hooks are not to be confused
  17 with the I<perl -Dxxx> command described in L<perlrun>, which is
  18 usable only if a special Perl is built per the instructions in the
  19 F<INSTALL> podpage in the Perl source tree.
  20
  21 For example, whenever you call Perl's built-in C<caller> function
  22 from the package C<DB>, the arguments that the corresponding stack
  23 frame was called with are copied to the C<@DB::args> array.  These
  24 mechanisms are enabled by calling Perl with the B<-d> switch.
  25 Specifically, the following additional features are enabled
  26 (cf. L<perlvar/$^P>):
  27
  28 =over 4
  29
  30 =item *
  31
  32 Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
  33 'perl5db.pl'}> if not present) before the first line of your program.
  34
  35 =item *
  36
  37 Each array C<@{"_<$filename"}> holds the lines of $filename for a
  38 file compiled by Perl.  The same is also true for C<eval>ed strings
  39 that contain subroutines, or which are currently being executed.
  40 The $filename for C<eval>ed strings looks like C<(eval 34)>.
  41
  42 Values in this array are magical in numeric context: they compare
  43 equal to zero only if the line is not breakable.
  44
  45 =item *
  46
  47 Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed
  48 by line number.  Individual entries (as opposed to the whole hash)
  49 are settable.  Perl only cares about Boolean true here, although
  50 the values used by F<perl5db.pl> have the form
  51 C<"$break_condition\0$action">.
  52
  53 The same holds for evaluated strings that contain subroutines, or
  54 which are currently being executed.  The $filename for C<eval>ed strings
  55 looks like C<(eval 34)>.
  56
  57 =item *
  58
  59 Each scalar C<${"_<$filename"}> contains C<"_<$filename">.  This is
  60 also the case for evaluated strings that contain subroutines, or
  61 which are currently being executed.  The $filename for C<eval>ed
  62 strings looks like C<(eval 34)>.
  63
  64 =item *
  65
  66 After each C<require>d file is compiled, but before it is executed,
  67 C<DB::postponed(*{"_<$filename"})> is called if the subroutine
  68 C<DB::postponed> exists.  Here, the $filename is the expanded name of
  69 the C<require>d file, as found in the values of %INC.
  70
  71 =item *
  72
  73 After each subroutine C<subname> is compiled, the existence of
  74 C<$DB::postponed{subname}> is checked.  If this key exists,
  75 C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
  76 also exists.
  77
  78 =item *
  79
  80 A hash C<%DB::sub> is maintained, whose keys are subroutine names
  81 and whose values have the form C<filename:startline-endline>.
  82 C<filename> has the form C<(eval 34)> for subroutines defined inside
  83 C<eval>s.
  84
  85 =item *
  86
  87 When the execution of your program reaches a point that can hold a
  88 breakpoint, the C<DB::DB()> subroutine is called if any of the variables
  89 C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true.  These variables
  90 are not C<local>izable.  This feature is disabled when executing
  91 inside C<DB::DB()>, including functions called from it
  92 unless C<< $^D & (1<<30) >> is true.
  93
  94 =item *
  95
  96 When execution of the program reaches a subroutine call, a call to
  97 C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
  98 name of the called subroutine. (This doesn't happen if the subroutine
  99 was compiled in the C<DB> package.)
 100
  101 X<&DB::lsub>If the call is to an lvalue subroutine and C<&DB::lsub>
  102 is defined, C<&DB::lsub>(I<args>) is called instead; otherwise Perl falls
  103 back to C<&DB::sub>(I<args>).
 104
 105 =item *
 106
 107 When execution of the program uses C<goto> to enter a non-XS
 108 subroutine and the 0x80 bit is set in C<$^P>, a call to C<&DB::goto>
 109 is made, with C<$DB::sub> holding the name of the subroutine being
 110 entered.
 111
 112 =back
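
The C<@DB::args> behavior mentioned before the list can be sketched in a
few lines.  This is a minimal illustration, not code from F<perl5db.pl>:
the helper C<caller_args> and the sub C<greet> are made up for the example.
It relies on C<caller> being invoked with an argument, in list context,
from code compiled in package C<DB>, which is how L<perlfunc/caller>
describes the feature.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the arguments of the sub that called it.
sub caller_args {
    package DB;               # the rest of this block is compiled in package DB
    no warnings 'once';       # @DB::args is mentioned only once in this file
    my @frame = caller(1);    # list context, with an argument: fills @DB::args
    return @DB::args;
}

# Hypothetical sub whose arguments we want to recover.
sub greet {
    my @seen = caller_args();
    return join ",", @seen;
}

print greet("a", 42), "\n";   # prints "a,42"
```

The C<package DB> statement sits inside the sub body so that only the call
to C<caller> is compiled in C<DB>; the sub itself remains C<main::caller_args>.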
 113
 114 Note that if C<&DB::sub> needs external data for it to work, no
 115 subroutine call is possible without it. As an example, the standard
 116 debugger's C<&DB::sub> depends on the C<$DB::deep> variable
 117 (it defines how many levels of recursion deep into the debugger you can go
 118 before a mandatory break).  If C<$DB::deep> is not defined, subroutine
 119 calls are not possible, even though C<&DB::sub> exists.
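
The C<filename:startline-endline> values kept in C<%DB::sub> (see the list
above) are easy to take apart.  The sketch below uses a hand-written sample
hash rather than the real C<%DB::sub>, which is only maintained under B<-d>;
the name C<parse_sub_location> and the sample entries are assumptions made
for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a %DB::sub-style value into its three parts.
# The file name may itself contain colons (or look like "(eval 34)"), so
# the match is anchored on the trailing ":start-end".
sub parse_sub_location {
    my ($value) = @_;
    my ($file, $start, $end) = $value =~ /\A(.*):(\d+)-(\d+)\z/s
        or return;
    return { file => $file, start => $start, end => $end };
}

# Sample data in the documented form; not taken from a real debugger session.
my %sample = (
    'main::foo' => 'script.pl:10-14',
    'main::bar' => '(eval 34):1-3',
);

for my $name (sort keys %sample) {
    my $loc = parse_sub_location($sample{$name});
    print "$name: $loc->{file} lines $loc->{start}..$loc->{end}\n";
}
```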
 120
 121 =head2 Writing Your Own Debugger
 122
 123 =head3 Environment Variables
 124
 125 The C<PERL5DB> environment variable can be used to define a debugger.
 126 For example, the minimal "working" debugger (it actually doesn't do anything)
 127 consists of one line:
 128
 129   sub DB::DB {}
 130
 131 It can easily be defined like this:
 132
 133   $ PERL5DB="sub DB::DB {}" perl -d your-script
 134
 135 Another brief debugger, slightly more useful, can be created
 136 with only the line:
 137
 138   sub DB::DB {print ++$i; scalar <STDIN>}
 139
 140 This debugger prints a number which increments for each statement
 141 encountered and waits for you to hit a newline before continuing
 142 to the next statement.
 143
 144 The following debugger is actually useful:
 145
 146   {
 147     package DB;
 148     sub DB  {}
 149     sub sub {print ++$i, " $sub\n"; &$sub}
 150   }
 151
 152 It prints the sequence number of each subroutine call and the name of the
 153 called subroutine.  Note that C<&DB::sub> is being compiled into the
 154 package C<DB> through the use of the C<package> directive.
 155
 156 When it starts, the debugger reads your rc file (F<./.perldb> or
 157 F<~/.perldb> under Unix), which can set important options.
 158 (A subroutine (C<&afterinit>) can be defined here as well; it is executed
 159 after the debugger completes its own initialization.)
 160
 161 After the rc file is read, the debugger reads the PERLDB_OPTS
 162 environment variable and uses it to set debugger options. The
 163 contents of this variable are treated as if they were the argument
 164 of an C<o ...> debugger command (q.v. in L<perldebug/"Configurable Options">).
 165
 166 =head3 Debugger Internal Variables
 167
 168 In addition to the file and subroutine-related variables mentioned above,
 169 the debugger also maintains various magical internal variables.
 170
 171 =over 4
 172
 173 =item *
 174
 175 C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which
 176 holds the lines of the currently-selected file (compiled by Perl), either
 177 explicitly chosen with the debugger's C<f> command, or implicitly by flow
 178 of execution.
 179
 180 Values in this array are magical in numeric context: they compare
 181 equal to zero only if the line is not breakable.
 182
 183 =item *
 184
 185 C<%DB::dbline> is an alias for C<%{"::_<current_file"}>, which
 186 contains breakpoints and actions keyed by line number in
 187 the currently-selected file, either explicitly chosen with the
 188 debugger's C<f> command, or implicitly by flow of execution.
 189
 190 As previously noted, individual entries (as opposed to the whole hash)
 191 are settable.  Perl only cares about Boolean true here, although
 192 the values used by F<perl5db.pl> have the form
 193 C<"$break_condition\0$action">.
 194
 195 =back
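
The C<"$break_condition\0$action"> layout used by F<perl5db.pl> can be
illustrated with an ordinary hash standing in for C<%DB::dbline>.  This is
only a sketch: the hash C<%fake_dbline>, the helper C<split_entry>, and the
sample conditions and actions are invented for the example, and
F<perl5db.pl> itself remains the authority on how these entries are read
and written.

```perl
use strict;
use warnings;

# Stand-in for %DB::dbline, keyed by line number (hypothetical data).
my %fake_dbline = (
    12 => "\$x > 3\0",                # breakpoint condition only
    20 => "\0print \"here\\n\"",      # action only
    31 => "1\0\$count++",             # both
);

# Hypothetical helper: separate the condition from the action.
# A limit of 2 keeps any later NUL bytes inside the action.
sub split_entry {
    my ($entry) = @_;
    my ($condition, $action) = split /\0/, $entry, 2;
    return ($condition // '', $action // '');
}

for my $line (sort { $a <=> $b } keys %fake_dbline) {
    my ($condition, $action) = split_entry($fake_dbline{$line});
    printf "line %d: break if [%s], action [%s]\n", $line, $condition, $action;
}
```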
 196
 197 =head3 Debugger Customization Functions
 198
 199 Some functions are provided to simplify customization.
 200
 201 =over 4
 202
 203 =item *
 204
 205 See L<perldebug/"Configurable Options"> for a description of options parsed by
 206 C<DB::parse_options(string)>.
 207
 208 =item *
 209
 210 C<DB::dump_trace(skip[,count])> skips the specified number of frames
 211 and returns a list containing information about the calling frames (all
  212 of them, if C<count> is missing).  Each entry is a reference to a hash
 213 with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
 214 name, or info about C<eval>), C<args> (C<undef> or a reference to
 215 an array), C<file>, and C<line>.
 216
 217 =item *
 218
 219 C<DB::print_trace(FH, skip[, count[, short]])> prints
 220 formatted info about caller frames.  The last two functions may be
 221 convenient as arguments to C<< < >>, C<< << >> commands.
 222
 223 =back
 224
 225 Note that any variables and functions that are not documented in
  226 this manpage (or in L<perldebug>) are considered for internal
 227 use only, and as such are subject to change without notice.
 228
 229 =head1 Frame Listing Output Examples
 230
 231 The C<frame> option can be used to control the output of frame
 232 information.  For example, contrast this expression trace:
 233
 234  $ perl -de 42
 235  Stack dump during die enabled outside of evals.
 236
 237  Loading DB routines from perl5db.pl patch level 0.94
 238  Emacs support available.
 239
 240  Enter h or 'h h' for help.
 241
 242  main::(-e:1):   0
 243    DB<1> sub foo { 14 }
 244
 245    DB<2> sub bar { 3 }
 246
 247    DB<3> t print foo() * bar()
  248  main::((eval 172):3):   print foo() * bar();
 249  main::foo((eval 168):2):
 250  main::bar((eval 170):2):
 251  42
 252
 253 with this one, once the C<o>ption C<frame=2> has been set:
 254
 255    DB<4> o f=2
 256                 frame = '2'
 257    DB<5> t print foo() * bar()
 258  3:      foo() * bar()
 259  entering main::foo
 260   2:     sub foo { 14 };
 261  exited main::foo
 262  entering main::bar
 263   2:     sub bar { 3 };
 264  exited main::bar
 265  42
 266
 267 By way of demonstration, we present below a laborious listing
 268 resulting from setting your C<PERLDB_OPTS> environment variable to
 269 the value C<f=n N>, and running I<perl -d -V> from the command line.
 270 Examples using various values of C<n> are shown to give you a feel
 271 for the difference between settings.  Long though it may be, this
 272 is not a complete listing, but only excerpts.
 273
 274 =over 4
 275
 276 =item 1
 277
 278  entering main::BEGIN
 279   entering Config::BEGIN
 280    Package lib/Exporter.pm.
 281    Package lib/Carp.pm.
 282   Package lib/Config.pm.
 283   entering Config::TIEHASH
 284   entering Exporter::import
 285    entering Exporter::export
 286  entering Config::myconfig
 287   entering Config::FETCH
 288   entering Config::FETCH
 289   entering Config::FETCH
 290   entering Config::FETCH
 291
 292 =item 2
 293
 294  entering main::BEGIN
 295   entering Config::BEGIN
 296    Package lib/Exporter.pm.
 297    Package lib/Carp.pm.
 298   exited Config::BEGIN
 299   Package lib/Config.pm.
 300   entering Config::TIEHASH
 301   exited Config::TIEHASH
 302   entering Exporter::import
 303    entering Exporter::export
 304    exited Exporter::export
 305   exited Exporter::import
 306  exited main::BEGIN
 307  entering Config::myconfig
 308   entering Config::FETCH
 309   exited Config::FETCH
 310   entering Config::FETCH
 311   exited Config::FETCH
 312   entering Config::FETCH
 313
 314 =item 3
 315
 316  in  $=main::BEGIN() from /dev/null:0
 317   in  $=Config::BEGIN() from lib/Config.pm:2
 318    Package lib/Exporter.pm.
 319    Package lib/Carp.pm.
 320   Package lib/Config.pm.
 321   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
 322   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 323    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
 324  in  @=Config::myconfig() from /dev/null:0
 325   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
 326   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
 327   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
 328   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
 329   in  $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
 330   in  $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
 331
 332 =item 4
 333
 334  in  $=main::BEGIN() from /dev/null:0
 335   in  $=Config::BEGIN() from lib/Config.pm:2
 336    Package lib/Exporter.pm.
 337    Package lib/Carp.pm.
 338   out $=Config::BEGIN() from lib/Config.pm:0
 339   Package lib/Config.pm.
 340   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
 341   out $=Config::TIEHASH('Config') from lib/Config.pm:644
 342   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 343    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
 344    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
 345   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 346  out $=main::BEGIN() from /dev/null:0
 347  in  @=Config::myconfig() from /dev/null:0
 348   in  $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
 349   out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
 350   in  $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
 351   out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
 352   in  $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
 353   out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
 354   in  $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
 355
 356 =item 5
 357
 358  in  $=main::BEGIN() from /dev/null:0
 359   in  $=Config::BEGIN() from lib/Config.pm:2
 360    Package lib/Exporter.pm.
 361    Package lib/Carp.pm.
 362   out $=Config::BEGIN() from lib/Config.pm:0
 363   Package lib/Config.pm.
 364   in  $=Config::TIEHASH('Config') from lib/Config.pm:644
 365   out $=Config::TIEHASH('Config') from lib/Config.pm:644
 366   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 367    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
 368    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
 369   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 370  out $=main::BEGIN() from /dev/null:0
 371  in  @=Config::myconfig() from /dev/null:0
 372   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
 373   out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
 374   in  $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
 375   out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
 376
 377 =item 6
 378
 379  in  $=CODE(0x15eca4)() from /dev/null:0
 380   in  $=CODE(0x182528)() from lib/Config.pm:2
 381    Package lib/Exporter.pm.
 382   out $=CODE(0x182528)() from lib/Config.pm:0
 383   scalar context return from CODE(0x182528): undef
 384   Package lib/Config.pm.
 385   in  $=Config::TIEHASH('Config') from lib/Config.pm:628
 386   out $=Config::TIEHASH('Config') from lib/Config.pm:628
 387   scalar context return from Config::TIEHASH:   empty hash
 388   in  $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 389    in  $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
 390    out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
 391    scalar context return from Exporter::export: ''
 392   out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
 393   scalar context return from Exporter::import: ''
 394
 395 =back
 396
 397 In all cases shown above, the line indentation shows the call tree.
  398 The C<frame> value is a bitmask, and the bits are named here by their
  399 values.  If bit 2 is set, a line is printed on exit from a subroutine as
  400 well.  If bit 4 is set, the arguments are printed along with the caller
  401 info.  If bit 8 is set, the arguments are printed even if they are tied
  402 or references.  If bit 16 is set, the return value is printed, too.
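
Read as a bitmask, a C<frame> value can be decoded mechanically.  The
following sketch only restates the paragraph above in code; the helper name
C<describe_frame> and the wording of the descriptions are ours, and it makes
no claim about how F<perl5db.pl> tests these bits internally.

```perl
use strict;
use warnings;

# Bit values and meanings as described in the text above.
my @FRAME_BITS = (
    [  2 => 'print a line on subroutine exit as well' ],
    [  4 => 'print arguments along with the caller info' ],
    [  8 => 'print arguments even if tied or references' ],
    [ 16 => 'print the return value' ],
);

# Hypothetical helper: list the documented features a frame value enables.
sub describe_frame {
    my ($frame) = @_;
    return map { $_->[1] } grep { $frame & $_->[0] } @FRAME_BITS;
}

for my $frame (2, 6, 14, 30) {
    print "frame=$frame\n";
    print "  $_\n" for describe_frame($frame);
}
```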
 403
 404 When a package is compiled, a line like this
 405
 406     Package lib/Carp.pm.
 407
 408 is printed with proper indentation.
 409
 410 =head1 Debugging Regular Expressions
 411
 412 There are two ways to enable debugging output for regular expressions.
 413
 414 If your perl is compiled with C<-DDEBUGGING>, you may use the
 415 B<-Dr> flag on the command line.
 416
 417 Otherwise, one can C<use re 'debug'>, which has effects at
 418 compile time and run time.  Since Perl 5.9.5, this pragma is lexically
 419 scoped.
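
As a small illustration of the pragma's lexical scoping, the sketch below
confines C<use re 'debug'> to a bare block, so that only the pattern inside
that block produces debugging output (on STDERR).  The pattern is the one
used in the compile-time example in the next section; the input string and
variable names are made up.

```perl
use strict;
use warnings;

my $string = "abcdefgh";    # made-up input

my $in_scope;
{
    use re 'debug';          # lexically scoped since Perl 5.9.5
    # Compilation and execution of this pattern are traced on STDERR.
    $in_scope = $string =~ /[bc]d(ef*g)+h[ij]k$/ ? 1 : 0;
}

# Outside the block the pragma is no longer in effect, so this match
# produces no debugging output.
my $out_of_scope = $string =~ /cde/ ? 1 : 0;

print "in scope: $in_scope, out of scope: $out_of_scope\n";
```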
 420
 421 =head2 Compile-time Output
 422
 423 The debugging output at compile time looks like this:
 424
 425   Compiling REx '[bc]d(ef*g)+h[ij]k$'
 426   size 45 Got 364 bytes for offset annotations.
 427   first at 1
 428   rarest char g at 0
 429   rarest char d at 0
 430      1: ANYOF[bc](12)
 431     12: EXACT <d>(14)
 432     14: CURLYX[0] {1,32767}(28)
 433     16:   OPEN1(18)
 434     18:     EXACT <e>(20)
 435     20:     STAR(23)
 436     21:       EXACT <f>(0)
 437     23:     EXACT <g>(25)
 438     25:   CLOSE1(27)
 439     27:   WHILEM[1/1](0)
 440     28: NOTHING(29)
 441     29: EXACT <h>(31)
 442     31: ANYOF[ij](42)
 443     42: EXACT <k>(44)
 444     44: EOL(45)
 445     45: END(0)
 446   anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
 447         stclass 'ANYOF[bc]' minlen 7
 448   Offsets: [45]
 449         1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
 450         0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
 451         11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
 452         0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
 453   Omitting $` $& $' support.
 454
 455 The first line shows the pre-compiled form of the regex.  The second
 456 shows the size of the compiled form (in arbitrary units, usually
 457 4-byte words) and the total number of bytes allocated for the
 458 offset/length table, usually 4+C<size>*8.  The next line shows the
 459 label I<id> of the first node that does a match.
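
For the example above, the rule of thumb C<4+size*8> can be checked by hand:
with a size of 45 it gives 364, which matches the
C<Got 364 bytes for offset annotations> line.  The snippet below just repeats
that arithmetic; the variable names are ours.

```perl
use strict;
use warnings;

my $size  = 45;                 # from "size 45" in the output above
my $bytes = 4 + $size * 8;      # the "usually 4+size*8" rule of thumb

print "$bytes\n";               # prints 364, as in "Got 364 bytes"
```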
 460
 461 The
 462
 463   anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating)
 464         stclass 'ANYOF[bc]' minlen 7
 465
 466 line (split into two lines above) contains optimizer
 467 information.  In the example shown, the optimizer found that the match
 468 should contain a substring C<de> at offset 1, plus substring C<gh>
 469 at some offset between 3 and infinity.  Moreover, when checking for
 470 these substrings (to abandon impossible matches quickly), Perl will check
 471 for the substring C<gh> before checking for the substring C<de>.  The
 472 optimizer may also use the knowledge that the match starts (at the
 473 C<first> I<id>) with a character class, and no string
 474 shorter than 7 characters can possibly match.
 475
 476 The fields of interest which may appear in this line are
 477
 478 =over 4
 479
 480 =item C<anchored> I<STRING> C<at> I<POS>
 481
 482 =item C<floating> I<STRING> C<at> I<POS1..POS2>
 483
 484 See above.
 485
  486 =item C<checking floating/anchored>
 487
 488 Which substring to check first.
 489
 490 =item C<minlen>
 491
 492 The minimal length of the match.
 493
 494 =item C<stclass> I<TYPE>
 495
 496 Type of first matching node.
 497
 498 =item C<noscan>
 499
 500 Don't scan for the found substrings.
 501
 502 =item C<isall>
 503
 504 Means that the optimizer information is all that the regular
 505 expression contains, and thus one does not need to enter the regex engine at
 506 all.
 507
 508 =item C<GPOS>
 509
 510 Set if the pattern contains C<\G>.
 511
 512 =item C<plus>
 513
 514 Set if the pattern starts with a repeated char (as in C<x+y>).
 515
 516 =item C<implicit>
 517
 518 Set if the pattern starts with C<.*>.
 519
 520 =item C<with eval>
 521
  522 Set if the pattern contains eval-groups, such as C<(?{ code })> and
 523 C<(??{ code })>.
 524
 525 =item C<anchored(TYPE)>
 526
 527 If the pattern may match only at a handful of places, with C<TYPE>
 528 being C<SBOL>, C<MBOL>, or C<GPOS>.  See the table below.
 529
 530 =back
 531
 532 If a substring is known to match at end-of-line only, it may be
 533 followed by C<$>, as in C<floating 'k'$>.
 534
  535 The optimizer-specific information is used to avoid entering the (slow) regex
  536 engine on strings that definitely will not match.  If the C<isall> flag
 537 is set, a call to the regex engine may be avoided even when the optimizer
 538 found an appropriate place for the match.
 539
 540 Above the optimizer section is the list of I<nodes> of the compiled
  541 form of the regex.  Each line has the format
 542
 543 C<   >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
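
A line in this format can be picked apart with a single pattern.  The sketch
below is an illustration under stated assumptions rather than a supported
parser: the helper name C<parse_node_line> is invented, the sample lines are
copied from the dump shown earlier, and I<TYPE> and I<OPTIONAL-INFO> are
deliberately kept together as one description field because the boundary
between them varies by node type.

```perl
use strict;
use warnings;

# Hypothetical helper: extract id, description, and next-id from a node line.
sub parse_node_line {
    my ($line) = @_;
    my ($id, $desc, $next) = $line =~ /\A\s*(\d+):\s+(.*?)\s*\((\d+)\)\s*\z/
        or return;
    return { id => $id, desc => $desc, next => $next };
}

# Lines taken from the compile-time output shown earlier.
my @lines = (
    '   1: ANYOF[bc](12)',
    '  14: CURLYX[0] {1,32767}(28)',
    '  21:       EXACT <f>(0)',
);

for my $line (@lines) {
    my $node = parse_node_line($line) or next;
    print "node $node->{id} ($node->{desc}) -> $node->{next}\n";
}
```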
 544
 545 =head2 Types of Nodes
 546
 547 Here are the current possible types, with short descriptions:
 548
 549 =for comment
 550 This table is generated by regen/regcomp.pl.  Any changes made here
 551 will be lost.
 552
 553 =for regcomp.pl begin
 554
 555  # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
 556
 557  # Exit points
 558
 559  END             no         End of program.
 560  SUCCEED         no         Return from a subroutine, basically.
 561
 562  # Line Start Anchors:
 563  SBOL            no         Match "" at beginning of line: /^/, /\A/
 564  MBOL            no         Same, assuming multiline: /^/m
 565
 566  # Line End Anchors:
 567  SEOL            no         Match "" at end of line: /$/
 568  MEOL            no         Same, assuming multiline: /$/m
 569  EOS             no         Match "" at end of string: /\z/
 570
 571  # Match Start Anchors:
 572  GPOS            no         Matches where last m//g left off.
 573
 574  # Word Boundary Opcodes:
 575  BOUND           no         Like BOUNDA for non-utf8, otherwise match ""
 576                             between any Unicode \w\W or \W\w
 577  BOUNDL          no         Like BOUND/BOUNDU, but \w and \W are defined
 578                             by current locale
 579  BOUNDU          no         Match "" at any boundary of a given type
 580                             using Unicode rules
 581  BOUNDA          no         Match "" at any boundary between \w\W or
 582                             \W\w, where \w is [_a-zA-Z0-9]
 583  NBOUND          no         Like NBOUNDA for non-utf8, otherwise match
 584                             "" between any Unicode \w\w or \W\W
 585  NBOUNDL         no         Like NBOUND/NBOUNDU, but \w and \W are
 586                             defined by current locale
 587  NBOUNDU         no         Match "" at any non-boundary of a given type
  588                             using Unicode rules
  589  NBOUNDA         no         Match "" between any \w\w or \W\W, where \w
 590                             is [_a-zA-Z0-9]
 591
 592  # [Special] alternatives:
 593  REG_ANY         no         Match any one character (except newline).
 594  SANY            no         Match any one character.
 595  ANYOF           sv 1       Match character in (or not in) this class,
 596                             single char match only
 597  ANYOFD          sv 1       Like ANYOF, but /d is in effect
 598  ANYOFL          sv 1       Like ANYOF, but /l is in effect
 599
 600  # POSIX Character Classes:
 601  POSIXD          none       Some [[:class:]] under /d; the FLAGS field
 602                             gives which one
 603  POSIXL          none       Some [[:class:]] under /l; the FLAGS field
 604                             gives which one
 605  POSIXU          none       Some [[:class:]] under /u; the FLAGS field
 606                             gives which one
 607  POSIXA          none       Some [[:class:]] under /a; the FLAGS field
 608                             gives which one
 609  NPOSIXD         none       complement of POSIXD, [[:^class:]]
 610  NPOSIXL         none       complement of POSIXL, [[:^class:]]
 611  NPOSIXU         none       complement of POSIXU, [[:^class:]]
 612  NPOSIXA         none       complement of POSIXA, [[:^class:]]
 613
 614  CLUMP           no         Match any extended grapheme cluster sequence
 615
 616  # Alternation
 617
 618  # BRANCH        The set of branches constituting a single choice are
 619  #               hooked together with their "next" pointers, since
 620  #               precedence prevents anything being concatenated to
 621  #               any individual branch.  The "next" pointer of the last
 622  #               BRANCH in a choice points to the thing following the
 623  #               whole choice.  This is also where the final "next"
 624  #               pointer of each individual branch points; each branch
 625  #               starts with the operand node of a BRANCH node.
 626  #
 627  BRANCH          node       Match this alternative, or the next...
 628
 629  # Literals
 630
 631  EXACT           str        Match this string (preceded by length).
 632  EXACTL          str        Like EXACT, but /l is in effect (used so
 633                             locale-related warnings can be checked for).
 634  EXACTF          str        Match this non-UTF-8 string (not guaranteed
 635                             to be folded) using /id rules (w/len).
 636  EXACTFL         str        Match this string (not guaranteed to be
 637                             folded) using /il rules (w/len).
 638  EXACTFU         str        Match this string (folded iff in UTF-8,
 639                             length in folding doesn't change if not in
 640                             UTF-8) using /iu rules (w/len).
 641  EXACTFA         str        Match this string (not guaranteed to be
 642                             folded) using /iaa rules (w/len).
 643
 644  EXACTFU_SS      str        Match this string (folded iff in UTF-8,
 645                             length in folding may change even if not in
 646                             UTF-8) using /iu rules (w/len).
  647  EXACTFLU8       str        Rare circumstances: like EXACTFU, but is
 648                             under /l, UTF-8, folded, and everything in
 649                             it is above 255.
 650  EXACTFA_NO_TRIE str        Match this string (which is not trie-able;
 651                             not guaranteed to be folded) using /iaa
 652                             rules (w/len).
 653
 654  # Do nothing types
 655
 656  NOTHING         no         Match empty string.
 657  # A variant of above which delimits a group, thus stops optimizations
 658  TAIL            no         Match empty string. Can jump here from
 659                             outside.
 660
 661  # Loops
 662
 663  # STAR,PLUS    '?', and complex '*' and '+', are implemented as
 664  #               circular BRANCH structures.  Simple cases
 665  #               (one character per match) are implemented with STAR
 666  #               and PLUS for speed and to minimize recursive plunges.
 667  #
 668  STAR            node       Match this (simple) thing 0 or more times.
 669  PLUS            node       Match this (simple) thing 1 or more times.
 670
 671  CURLY           sv 2       Match this simple thing {n,m} times.
 672  CURLYN          no 2       Capture next-after-this simple thing
 673  CURLYM          no 2       Capture this medium-complex thing {n,m}
 674                             times.
 675  CURLYX          sv 2       Match this complex thing {n,m} times.
 676
 677  # This terminator creates a loop structure for CURLYX
 678  WHILEM          no         Do curly processing and see if rest matches.
 679
 680  # Buffer related
 681
 682  # OPEN,CLOSE,GROUPP     ...are numbered at compile time.
 683  OPEN            num 1      Mark this point in input as start of #n.
 684  CLOSE           num 1      Analogous to OPEN.
 685
 686  REF             num 1      Match some already matched string
 687  REFF            num 1      Match already matched string, folded using
 688                             native charset rules for non-utf8
 689  REFFL           num 1      Match already matched string, folded in loc.
 690  REFFU           num 1      Match already matched string, folded using
 691                             unicode rules for non-utf8
 692  REFFA           num 1      Match already matched string, folded using
 693                             unicode rules for non-utf8, no mixing ASCII,
 694                             non-ASCII
 695
 696  # Named references.  Code in regcomp.c assumes that these all are after
 697  # the numbered references
 698  NREF            no-sv 1    Match some already matched string
 699  NREFF           no-sv 1    Match already matched string, folded using
 700                             native charset rules for non-utf8
 701  NREFFL          no-sv 1    Match already matched string, folded in loc.
 702  NREFFU          num 1      Match already matched string, folded using
 703                             unicode rules for non-utf8
 704  NREFFA          num 1      Match already matched string, folded using
 705                             unicode rules for non-utf8, no mixing ASCII,
 706                             non-ASCII
 707
 708  # Support for long RE
 709  LONGJMP         off 1 1    Jump far away.
 710  BRANCHJ         off 1 1    BRANCH with long offset.
 711
 712  # Special Case Regops
 713  IFMATCH         off 1 2    Succeeds if the following matches.
 714  UNLESSM         off 1 2    Fails if the following matches.
 715  SUSPEND         off 1 1    "Independent" sub-RE.
 716  IFTHEN          off 1 1    Switch, should be preceded by switcher.
 717  GROUPP          num 1      Whether the group matched.
 718
 719  # The heavy worker
 720
 721  EVAL            evl/flags  Execute some Perl code.
 722                  2L
 723
 724  # Modifiers
 725
 726  MINMOD          no         Next operator is not greedy.
 727  LOGICAL         no         Next opcode should set the flag only.
 728
 729  # This is not used yet
 730  RENUM           off 1 1    Group with independently numbered parens.
 731
 732  # Trie Related
 733
 734  # Behave the same as A|LIST|OF|WORDS would. The '..C' variants
 735  # have inline charclass data (ascii only), the 'C' store it in the
 736  # structure.
 737
 738  TRIE            trie 1     Match many EXACT(F[ALU]?)? at once.
 739                             flags==type
 740  TRIEC           trie       Same as TRIE, but with embedded charclass
 741                  charclass  data
 742
 743  AHOCORASICK     trie 1     Aho Corasick stclass. flags==type
 744  AHOCORASICKC    trie       Same as AHOCORASICK, but with embedded
 745                  charclass  charclass data
 746
 747  # Regex Subroutines
 748  GOSUB           num/ofs 2L recurse to paren arg1 at (signed) ofs arg2
 749
 750  # Special conditionals
 751  NGROUPP         no-sv 1    Whether the group matched.
 752  INSUBP          num 1      Whether we are in a specific recurse.
 753  DEFINEP         none 1     Never execute directly.
 754
 755  # Backtracking Verbs
 756  ENDLIKE         none       Used only for the type field of verbs
 757  OPFAIL          no-sv 1    Same as (?!), but with verb arg
 758  ACCEPT          no-sv/num  Accepts the current matched string, with
 759                  2L         verbar
 760
 761  # Verbs With Arguments
 762  VERB            no-sv 1    Used only for the type field of verbs
 763  PRUNE           no-sv 1    Pattern fails at this startpoint if no-
 764                             backtracking through this
 765  MARKPOINT       no-sv 1    Push the current location for rollback by
 766                             cut.
 767  SKIP            no-sv 1    On failure skip forward (to the mark) before
 768                             retrying
 769  COMMIT          no-sv 1    Pattern fails outright if backtracking
 770                             through this
 771  CUTGROUP        no-sv 1    On failure go to the next alternation in the
 772                             group
 773
 774  # Control what to keep in $&.
 775  KEEPS           no         $& begins here.
 776
 777  # New charclass like patterns
 778  LNBREAK         none       generic newline pattern
 779
 780  # SPECIAL  REGOPS
 781
 782  # This is not really a node, but an optimized away piece of a "long"
 783  # node.  To simplify debugging output, we mark it as if it were a node
 784  OPTIMIZED       off        Placeholder for dump.
 785
 786  # Special opcode with the property that no opcode in a compiled program
 787  # will ever be of this type. Thus it can be used as a flag value that
 788  # no other opcode has been seen. END is used similarly, in that an END
 789  # node can't be optimized. So END implies "unoptimizable" and PSEUDO
 790  # means "not seen anything to optimize yet".
 791  PSEUDO          off        Pseudo opcode for internal use.
 792
 793 =for regcomp.pl end
 794
 795 =for unprinted-credits
 796 Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421
 797
 798 Following the optimizer information is a dump of the offset/length
 799 table, here split across several lines:
 800
 801   Offsets: [45]
 802         1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
 803         0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
 804         11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
 805         0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
 806
 807 The first line here indicates that the offset/length table contains 45
 808 entries.  Each entry is a pair of integers, denoted by C<offset[length]>.
 809 Entries are numbered starting with 1, so entry #1 here is C<1[4]> and
 810 entry #12 is C<5[1]>.  C<1[4]> indicates that the node labeled C<1:>
 811 (the C<1: ANYOF[bc]>) begins at character position 1 in the
 812 pre-compiled form of the regex, and has a length of 4 characters.
 813 C<5[1]> in position 12
 814 indicates that the node labeled C<12:>
 815 (the C<< 12: EXACT <d> >>) begins at character position 5 in the
 816 pre-compiled form of the regex, and has a length of 1 character.
 817 C<12[1]> in position 14
 818 indicates that the node labeled C<14:>
 819 (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the
 820 pre-compiled form of the regex, and has a length of 1 character---that
 821 is, it corresponds to the C<+> symbol in the pre-compiled regex.
 822
 823 C<0[0]> items indicate that there is no corresponding node.
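
The lookup described above can be sketched in a few lines of Perl.  This
is only an illustration, not part of the regex engine: the pattern is
assumed to be the one used in the run-time example in this document, the
entry list is copied from the dump above, and C<node_source> is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Assumption: the pattern is the one shown in this document's run-time
# example; the entries are copied from the Offsets dump above.
my $pattern = '[bc]d(ef*g)+h[ij]k$';
my $dump = <<'END';
1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
END

# Collect [offset, length] pairs; entry N is stored at index N-1.
my @entries;
while ($dump =~ /(\d+)\[(\d+)\]/g) {
    push @entries, [$1, $2];
}

# Hypothetical helper: return the piece of the pattern that node $node
# was compiled from, or undef for a 0[0] entry (no corresponding node).
sub node_source {
    my ($node) = @_;
    my ($offset, $length) = @{ $entries[$node - 1] };
    return undef if $offset == 0 && $length == 0;
    return substr($pattern, $offset - 1, $length);  # offsets are 1-based
}

printf "%d entries\n", scalar @entries;
for my $node (1, 12, 14) {
    printf "node %2d: '%s'\n", $node, node_source($node);
}
```

For nodes 1, 12 and 14 this picks out C<[bc]>, C<d> and C<+>, matching
the walk-through above.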
 824
 825 =head2 Run-time Output
 826
 827 First of all, when doing a match, one may get no run-time output even
 828 if debugging is enabled.  This means that the regex engine was never
 829 entered and that all of the job was therefore done by the optimizer.
 830
 831 If the regex engine was entered, the output may look like this:
 832
 833   Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__'
 834     Setting an EVAL scope, savestack=3
 835      2 <ab> <cdefg__gh_>    |  1: ANYOF
 836      3 <abc> <defg__gh_>    | 11: EXACT <d>
 837      4 <abcd> <efg__gh_>    | 13: CURLYX {1,32767}
 838      4 <abcd> <efg__gh_>    | 26:   WHILEM
 839                                 0 out of 1..32767  cc=effff31c
 840      4 <abcd> <efg__gh_>    | 15:     OPEN1
 841      4 <abcd> <efg__gh_>    | 17:     EXACT <e>
 842      5 <abcde> <fg__gh_>    | 19:     STAR
 843                              EXACT <f> can match 1 times out of 32767...
 844     Setting an EVAL scope, savestack=3
 845      6 <bcdef> <g__gh__>    | 22:       EXACT <g>
 846      7 <bcdefg> <__gh__>    | 24:       CLOSE1
 847      7 <bcdefg> <__gh__>    | 26:       WHILEM
 848                                     1 out of 1..32767  cc=effff31c
 849     Setting an EVAL scope, savestack=12
 850      7 <bcdefg> <__gh__>    | 15:         OPEN1
 851      7 <bcdefg> <__gh__>    | 17:         EXACT <e>
 852        restoring \1 to 4(4)..7
 853                                     failed, try continuation...
 854      7 <bcdefg> <__gh__>    | 27:         NOTHING
 855      7 <bcdefg> <__gh__>    | 28:         EXACT <h>
 856                                     failed...
 857                                 failed...
 858
 859 The most significant information in the output is about the particular I<node>
 860 of the compiled regex that is currently being tested against the target string.
 861 The format of these lines is
 862
 863 C<    >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>>   |I<ID>:  I<TYPE>
 864
 865 The I<TYPE> info is indented with respect to the backtracking level.
 866 Other incidental information appears interspersed within.
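
As a rough sketch, a node line in this format can be split into its
fields with a regular expression.  The helper name C<parse_trace_line>
and the hash keys are made up for this illustration, and the pattern
assumes that neither string fragment contains a C<< > >> character.

```perl
use strict;
use warnings;

# Split one node line of the run-time trace into its fields.  Returns
# nothing for lines that do not have this shape (for example the
# "failed..." or "Setting an EVAL scope" lines).
sub parse_trace_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s* (\d+)          # STRING-OFFSET
        \s+ <(.*?)>          # PRE-STRING
        \s+ <(.*?)>          # POST-STRING
        \s* \| \s* (\d+) :   # ID
        (\s*) (.*?) \s* $    # indentation, then TYPE
    }x;
    return {
        offset => $1, pre => $2, post => $3,
        id     => $4, indent => length($5), type => $6,
    };
}

my $node = parse_trace_line('   5 <abcde> <fg__gh_>    | 19:     STAR');
printf "offset=%d pre=%s post=%s id=%d type=%s\n",
    @{$node}{qw(offset pre post id type)};
```

The C<indent> field records how far I<TYPE> is indented, which gives a
crude measure of the backtracking level.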
 867
 868 =head1 Debugging Perl Memory Usage
 869
 870 Perl is a profligate wastrel when it comes to memory use.  There
 871 is a saying that to estimate memory usage of Perl, assume a reasonable
 872 algorithm for memory allocation, multiply that estimate by 10, and
 873 while you still may miss the mark, at least you won't be quite so
 874 astonished.  This is not absolutely true, but may provide a good
 875 grasp of what happens.
 876
 877 Assume that an integer cannot take less than 20 bytes of memory, a
 878 float cannot take less than 24 bytes, a string cannot take less
 879 than 32 bytes (all these examples assume 32-bit architectures; the
 880 results are quite a bit worse on 64-bit architectures).  If a variable
 881 is accessed in two of three different ways (which require an integer,
 882 a float, or a string), the memory footprint may increase yet another
 883 20 bytes.  A sloppy malloc(3) implementation can inflate these
 884 numbers dramatically.
 885
 886 On the opposite end of the scale, a declaration like
 887
 888   sub foo;
 889
 890 may take up to 500 bytes of memory, depending on which release of Perl
 891 you're running.
 892
 893 Anecdotal estimates of source-to-compiled code bloat suggest an
 894 eightfold increase.  This means that the compiled form of reasonable
 895 (normally commented, properly indented etc.) code will take
 896 about eight times more space in memory than the code took
 897 on disk.
 898
 899 The B<-DL> command-line switch has been obsolete since about Perl 5.6.0
 900 (it was available only if Perl was built with C<-DDEBUGGING>).
 901 The switch was used to track Perl's memory allocations and possible
 902 memory leaks.  These days the use of malloc debugging tools like
 903 F<Purify> or F<valgrind> is suggested instead.  See also
 904 L<perlhacktips/PERL_MEM_LOG>.
 905
 906 One way to find out how much memory is being used by Perl data
 907 structures is to install the Devel::Size module from CPAN: it gives
 908 you the minimum number of bytes required to store a particular data
 909 structure.  Be mindful of the difference between size(), which counts
 910 only the structure itself, and total_size(), which also counts its contents.
 911
 912 If Perl has been compiled using Perl's malloc you can analyze Perl
 913 memory usage by setting C<$ENV{PERL_DEBUG_MSTATS}>.
 914
 915 =head2 Using C<$ENV{PERL_DEBUG_MSTATS}>
 916
 917 If your perl is using Perl's malloc() and was compiled with the
 918 necessary switches (this is the default), then it will print memory
 919 usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS}
 920 > 1 >>, and before termination of the program when C<<
 921 $ENV{PERL_DEBUG_MSTATS} >= 1 >>.  The report format is similar to
 922 the following example:
 923
 924  $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
 925  Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
 926     14216 free:   130   117    28     7     9   0   2     2   1 0 0
 927                 437    61    36     0     5
 928     60924 used:   125   137   161    55     7   8   6    16   2 0 1
 929                  74   109   304    84    20
 930  Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
 931  Memory allocation statistics after execution:   (buckets 4(4)..8188(8192)
 932     30888 free:   245    78    85    13     6   2   1     3   2 0 1
 933                 315   162    39    42    11
 934    175816 used:   265   176  1112   111    26  22  11    27   2 1 1
 935                 196   178  1066   798    39
 936  Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
 937
 938 It is possible to ask for such statistics at arbitrary points in
 939 your program's execution using the mstat() function from the standard
 940 Devel::Peek module.
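
A minimal sketch of such a call follows.  The label strings and the test
data are arbitrary choices for this illustration.  On a perl that was
not built with Perl's malloc, mstat() should only report that the
statistics are unavailable.

```perl
use strict;
use warnings;
use Devel::Peek;    # exports mstat() by default

# Report allocator statistics before and after building a large list.
# Each label is an arbitrary string that prefixes the report on STDERR.
mstat("before building the list");
my @list = map { "item $_" } 1 .. 10_000;
mstat("after building the list");
```

Comparing the two reports gives a rough idea of what the list cost.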
 941
 942 Here is some explanation of that format:
 943
 944 =over 4
 945
 946 =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>
 947
 948 Perl's malloc() uses bucketed allocations.  Every request is rounded
 949 up to the closest bucket size available, and a bucket is taken from
 950 the pool of buckets of that size.
 951
 952 The line above describes the limits of buckets currently in use.
 953 Each bucket has two sizes: memory footprint and the maximal size
 954 of user data that can fit into this bucket.  In the above example,
 955 the smallest bucket has usable size 4 and memory footprint 4, while the
 956 biggest bucket has usable size 8188 and memory footprint 8192.
 957
 958 In a Perl built for debugging, some buckets may have negative usable
 959 size.  This means that these buckets cannot (and will not) be used.
 960 For larger buckets, the memory footprint may be one page greater
 961 than a power of 2.  If so, the corresponding power of two is
 962 printed in the C<APPROX> field above.
 963
 964 =item Free/Used
 965
 966 The 1 or 2 rows of numbers following that correspond to the number
 967 of buckets of each size between C<SMALLEST> and C<GREATEST>.  In
 968 the first row, the sizes (memory footprints) of buckets are powers
 969 of two--or possibly one page greater.  In the second row, if present,
 970 the memory footprints of the buckets are between the memory footprints
 971 of two buckets "above".
 972
 973 For example, suppose that in the previous example the memory footprints
 974 were
 975
 976    free:    8     16    32    64    128  256 512 1024 2048 4096 8192
 977            4     12    24    48    80
 978
 979 With a non-C<DEBUGGING> perl, the buckets starting from C<128> have
 980 a 4-byte overhead, and thus an 8192-long bucket may take up to
 981 8188-byte allocations.
 982
 983 =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>
 984
 985 The first two fields give the total amount of memory perl sbrk(2)ed
 986 (ess-broken? :-) and number of sbrk(2)s used.  The third number is
 987 what perl thinks about continuity of returned chunks.  So long as
 988 this number is positive, malloc() will assume that it is probable
 989 that sbrk(2) will provide continuous memory.
 990
 991 Memory allocated by external libraries is not counted.
 992
 993 =item C<pad: 0>
 994
 995 The amount of sbrk(2)ed memory needed to keep buckets aligned.
 996
 997 =item C<heads: 2192>
 998
 999 Although memory overhead of bigger buckets is kept inside the bucket, for
1000 smaller buckets, it is kept in separate areas.  This field gives the
1001 total size of these areas.
1002
1003 =item C<chain: 0>
1004
1005 malloc() may want to subdivide a bigger bucket into smaller buckets.
1006 If only a part of the deceased bucket is left unsubdivided, the rest
1007 is kept as an element of a linked list.  This field gives the total
1008 size of these chunks.
1009
1010 =item C<tail: 6144>
1011
1012 To minimize the number of sbrk(2)s, malloc() asks for more memory.  This
1013 field gives the size of the yet unused part, which is sbrk(2)ed, but
1014 never touched.
1015
1016 =back
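
The fields described above can be pulled out of a report with a few
regular expressions.  The sketch below parses the "after execution"
report from the example and adds up the free, used, pad, heads, chain
and tail figures.  In this sample the sum equals the sbrk() total; that
is an observation about the sample, not a documented guarantee, and the
variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

# The "after execution" report from the example above.
my $report = <<'END';
Memory allocation statistics after execution:   (buckets 4(4)..8188(8192)
   30888 free:   245    78    85    13     6   2   1     3   2 0 1
               315   162    39    42    11
  175816 used:   265   176  1112   111    26  22  11    27   2 1 1
               196   178  1066   798    39
Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
END

# Byte totals at the start of the "free:" and "used:" rows.
my ($free) = $report =~ /^\s*(\d+) free:/m;
my ($used) = $report =~ /^\s*(\d+) used:/m;

# Total sbrk(): SBRKed/SBRKs:CONTINUOUS
my ($sbrked, $sbrks, $continuous) =
    $report =~ m{Total sbrk\(\): (\d+)/(\d+):(-?\d+)};

# Odd ends: pad+heads+chain+tail
my ($pad, $heads, $chain, $tail) =
    $report =~ /pad\+heads\+chain\+tail: (\d+)\+(\d+)\+(\d+)\+(\d+)/;

my $accounted = $free + $used + $pad + $heads + $chain + $tail;
printf "sbrk()ed %d bytes in %d calls; accounted for %d bytes\n",
    $sbrked, $sbrks, $accounted;
```

The same arithmetic holds for the "after compilation" report in the
example.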
1017
1018 =head1 SEE ALSO
1019
1020 L<perldebug>,
1021 L<perlguts>,
1022 L<perlrun>,
1023 L<re>,
1024 and
1025 L<Devel::DProf>.