X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/7b406369ac9ca595fb848308e83bf235e8fd196f..12e1284a67e5e3404c704c3f864749fd9f04c7c4:/pod/perldebguts.pod diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod index d6bffb1..b924689 100644 --- a/pod/perldebguts.pod +++ b/pod/perldebguts.pod @@ -38,7 +38,6 @@ Each array C<@{"_<$filename"}> holds the lines of $filename for a file compiled by Perl. The same is also true for Ced strings that contain subroutines, or which are currently being executed. The $filename for Ced strings looks like C<(eval 34)>. -Code assertions in regexes look like C<(re_eval 19)>. Values in this array are magical in numeric context: they compare equal to zero only if the line is not breakable. @@ -53,14 +52,14 @@ C<"$break_condition\0$action">. The same holds for evaluated strings that contain subroutines, or which are currently being executed. The $filename for Ced strings -looks like C<(eval 34)> or C<(re_eval 19)>. +looks like C<(eval 34)>. =item * Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is also the case for evaluated strings that contain subroutines, or which are currently being executed. The $filename for Ced -strings looks like C<(eval 34)> or C<(re_eval 19)>. +strings looks like C<(eval 34)>. =item * @@ -81,7 +80,7 @@ also exists. A hash C<%DB::sub> is maintained, whose keys are subroutine names and whose values have the form C. C has the form C<(eval 34)> for subroutines defined inside -Cs, or C<(re_eval 19)> for those within regex code assertions. +Cs. =item * @@ -95,9 +94,29 @@ unless C<< $^D & (1<<30) >> is true. =item * When execution of the program reaches a subroutine call, a call to -C<&DB::sub>(I) is made instead, with C<$DB::sub> holding the -name of the called subroutine. (This doesn't happen if the subroutine -was compiled in the C package.) +C<&DB::sub>(I) is made instead, with C<$DB::sub> set to identify +the called subroutine. 
(This doesn't happen if the calling subroutine +was compiled in the C package.) C<$DB::sub> normally holds the name +of the called subroutine, if it has a name by which it can be looked up. +Failing that, C<$DB::sub> will hold a reference to the called subroutine. +Either way, the C<&DB::sub> subroutine can use C<$DB::sub> as a reference +by which to call the called subroutine, which it will normally want to do. + +X<&DB::lsub>If the call is to an lvalue subroutine, and C<&DB::lsub> +is defined, C<&DB::lsub>(I) is called instead, otherwise falling +back to C<&DB::sub>(I). + +=item * + +When execution of the program uses C to enter a non-XS subroutine +and the 0x80 bit is set in C<$^P>, a call to C<&DB::goto> is made, with +C<$DB::sub> set to identify the subroutine being entered. The call to +C<&DB::goto> does not replace the C; the requested subroutine will +still be entered once C<&DB::goto> has returned. C<$DB::sub> normally +holds the name of the subroutine being entered, if it has one. Failing +that, C<$DB::sub> will hold a reference to the subroutine being entered. +Unlike when C<&DB::sub> is called, it is not guaranteed that C<$DB::sub> +can be used as a reference to operate on the subroutine being entered. =back @@ -151,7 +170,7 @@ after the debugger completes its own initialization.) After the rc file is read, the debugger reads the PERLDB_OPTS environment variable and uses it to set debugger options. The contents of this variable are treated as if they were the argument -of an C debugger command (q.v. in L). +of an C debugger command (q.v. in L). =head3 Debugger Internal Variables @@ -227,7 +246,7 @@ information. For example, contrast this expression trace: Loading DB routines from perl5db.pl patch level 0.94 Emacs support available. - Enter h or `h h' for help. + Enter h or 'h h' for help. main::(-e:1): 0 DB<1> sub foo { 14 } @@ -265,122 +284,122 @@ is not a complete listing, but only excerpts. 
=item 1 - entering main::BEGIN - entering Config::BEGIN - Package lib/Exporter.pm. - Package lib/Carp.pm. - Package lib/Config.pm. - entering Config::TIEHASH - entering Exporter::import - entering Exporter::export - entering Config::myconfig - entering Config::FETCH - entering Config::FETCH - entering Config::FETCH - entering Config::FETCH + entering main::BEGIN + entering Config::BEGIN + Package lib/Exporter.pm. + Package lib/Carp.pm. + Package lib/Config.pm. + entering Config::TIEHASH + entering Exporter::import + entering Exporter::export + entering Config::myconfig + entering Config::FETCH + entering Config::FETCH + entering Config::FETCH + entering Config::FETCH =item 2 - entering main::BEGIN - entering Config::BEGIN - Package lib/Exporter.pm. - Package lib/Carp.pm. - exited Config::BEGIN - Package lib/Config.pm. - entering Config::TIEHASH - exited Config::TIEHASH - entering Exporter::import - entering Exporter::export - exited Exporter::export - exited Exporter::import - exited main::BEGIN - entering Config::myconfig - entering Config::FETCH - exited Config::FETCH - entering Config::FETCH - exited Config::FETCH - entering Config::FETCH + entering main::BEGIN + entering Config::BEGIN + Package lib/Exporter.pm. + Package lib/Carp.pm. + exited Config::BEGIN + Package lib/Config.pm. + entering Config::TIEHASH + exited Config::TIEHASH + entering Exporter::import + entering Exporter::export + exited Exporter::export + exited Exporter::import + exited main::BEGIN + entering Config::myconfig + entering Config::FETCH + exited Config::FETCH + entering Config::FETCH + exited Config::FETCH + entering Config::FETCH =item 3 - in $=main::BEGIN() from /dev/null:0 - in $=Config::BEGIN() from lib/Config.pm:2 - Package lib/Exporter.pm. - Package lib/Carp.pm. - Package lib/Config.pm. 
- in $=Config::TIEHASH('Config') from lib/Config.pm:644 - in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li - in @=Config::myconfig() from /dev/null:0 - in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574 =item 4 - in $=main::BEGIN() from /dev/null:0 - in $=Config::BEGIN() from lib/Config.pm:2 - Package lib/Exporter.pm. - Package lib/Carp.pm. - out $=Config::BEGIN() from lib/Config.pm:0 - Package lib/Config.pm. 
- in $=Config::TIEHASH('Config') from lib/Config.pm:644 - out $=Config::TIEHASH('Config') from lib/Config.pm:644 - in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ - out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ - out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - out $=main::BEGIN() from /dev/null:0 - in @=Config::myconfig() from /dev/null:0 - in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 - out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 - out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 - out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 - in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + out $=Config::BEGIN() from lib/Config.pm:0 + Package lib/Config.pm. 
+ in $=Config::TIEHASH('Config') from lib/Config.pm:644 + out $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/ + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + out $=main::BEGIN() from /dev/null:0 + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574 + in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574 =item 5 - in $=main::BEGIN() from /dev/null:0 - in $=Config::BEGIN() from lib/Config.pm:2 - Package lib/Exporter.pm. - Package lib/Carp.pm. - out $=Config::BEGIN() from lib/Config.pm:0 - Package lib/Config.pm. 
- in $=Config::TIEHASH('Config') from lib/Config.pm:644 - out $=Config::TIEHASH('Config') from lib/Config.pm:644 - in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E - out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E - out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - out $=main::BEGIN() from /dev/null:0 - in @=Config::myconfig() from /dev/null:0 - in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 - out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 - in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 - out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 + in $=main::BEGIN() from /dev/null:0 + in $=Config::BEGIN() from lib/Config.pm:2 + Package lib/Exporter.pm. + Package lib/Carp.pm. + out $=Config::BEGIN() from lib/Config.pm:0 + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:644 + out $=Config::TIEHASH('Config') from lib/Config.pm:644 + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + out $=main::BEGIN() from /dev/null:0 + in @=Config::myconfig() from /dev/null:0 + in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 + out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574 + in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 + out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574 =item 6 - in $=CODE(0x15eca4)() from /dev/null:0 - in $=CODE(0x182528)() from lib/Config.pm:2 - Package lib/Exporter.pm. 
- out $=CODE(0x182528)() from lib/Config.pm:0 - scalar context return from CODE(0x182528): undef - Package lib/Config.pm. - in $=Config::TIEHASH('Config') from lib/Config.pm:628 - out $=Config::TIEHASH('Config') from lib/Config.pm:628 - scalar context return from Config::TIEHASH: empty hash - in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 - out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 - scalar context return from Exporter::export: '' - out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 - scalar context return from Exporter::import: '' + in $=CODE(0x15eca4)() from /dev/null:0 + in $=CODE(0x182528)() from lib/Config.pm:2 + Package lib/Exporter.pm. + out $=CODE(0x182528)() from lib/Config.pm:0 + scalar context return from CODE(0x182528): undef + Package lib/Config.pm. + in $=Config::TIEHASH('Config') from lib/Config.pm:628 + out $=Config::TIEHASH('Config') from lib/Config.pm:628 + scalar context return from Config::TIEHASH: empty hash + in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 + out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171 + scalar context return from Exporter::export: '' + out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0 + scalar context return from Exporter::import: '' =back @@ -402,9 +421,10 @@ is printed with proper indentation. There are two ways to enable debugging output for regular expressions. If your perl is compiled with C<-DDEBUGGING>, you may use the -B<-Dr> flag on the command line. +B<-Dr> flag on the command line, and C<-Drv> for more verbose +information. 
-Otherwise, one can C, which has effects at +Otherwise, one can C, which has effects at both compile time and run time. Since Perl 5.9.5, this pragma is lexically scoped. @@ -412,7 +432,7 @@ scoped. The debugging output at compile time looks like this: - Compiling REx `[bc]d(ef*g)+h[ij]k$' + Compiling REx '[bc]d(ef*g)+h[ij]k$' size 45 Got 364 bytes for offset annotations. first at 1 rarest char g at 0 @@ -433,8 +453,8 @@ The debugging output at compile time looks like this: 42: EXACT (44) 44: EOL(45) 45: END(0) - anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) - stclass `ANYOF[bc]' minlen 7 + anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) + stclass 'ANYOF[bc]' minlen 7 Offsets: [45] 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1] 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0] @@ -450,8 +470,8 @@ label I of the first node that does a match. The - anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating) - stclass `ANYOF[bc]' minlen 7 + anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) + stclass 'ANYOF[bc]' minlen 7 line (split into two lines above) contains optimizer information. In the example shown, the optimizer found that the match @@ -515,12 +535,12 @@ C<(??{ code })>. =item C If the pattern may match only at a handful of places, with C -being C, C, or C. See the table below. +being C, C, or C. See the table below. =back If a substring is known to match at end-of-line only, it may be -followed by C<$>, as in C. +followed by C<$>, as in C. The optimizer-specific information is used to avoid entering (a slow) regex engine on strings that will not definitely match. If the C flag @@ -534,117 +554,293 @@ C< >I: I I (I) =head2 Types of Nodes -Here are the possible types, with short descriptions: - - # TYPE arg-description [num-args] [longjump-len] DESCRIPTION - - # Exit points - END no End of program. - SUCCEED no Return from a subroutine, basically. 
- - # Anchors: - BOL no Match "" at beginning of line. - MBOL no Same, assuming multiline. - SBOL no Same, assuming singleline. - EOS no Match "" at end of string. - EOL no Match "" at end of line. - MEOL no Same, assuming multiline. - SEOL no Same, assuming singleline. - BOUND no Match "" at any word boundary - BOUNDL no Match "" at any word boundary - NBOUND no Match "" at any word non-boundary - NBOUNDL no Match "" at any word non-boundary - GPOS no Matches where last m//g left off. - - # [Special] alternatives - ANY no Match any one character (except newline). - SANY no Match any one character. - ANYOF sv Match character in (or not in) this class. - ALNUM no Match any alphanumeric character - ALNUML no Match any alphanumeric char in locale - NALNUM no Match any non-alphanumeric character - NALNUML no Match any non-alphanumeric char in locale - SPACE no Match any whitespace character - SPACEL no Match any whitespace char in locale - NSPACE no Match any non-whitespace character - NSPACEL no Match any non-whitespace char in locale - DIGIT no Match any numeric character - NDIGIT no Match any non-numeric character - - # BRANCH The set of branches constituting a single choice are hooked - # together with their "next" pointers, since precedence prevents - # anything being concatenated to any individual branch. The - # "next" pointer of the last BRANCH in a choice points to the - # thing following the whole choice. This is also where the - # final "next" pointer of each individual branch points; each - # branch starts with the operand node of a BRANCH node. - # - BRANCH node Match this alternative, or the next... - - # BACK Normal "next" pointers all implicitly point forward; BACK - # exists to make loop structures possible. - # not used - BACK no Match "", "next" ptr points backward. - - # Literals - EXACT sv Match this string (preceded by length). - EXACTF sv Match this string, folded (prec. by length). - EXACTFL sv Match this string, folded in locale (w/len). 
- - # Do nothing - NOTHING no Match empty string. - # A variant of above which delimits a group, thus stops optimizations - TAIL no Match empty string. Can jump here from outside. - - # STAR,PLUS '?', and complex '*' and '+', are implemented as circular - # BRANCH structures using BACK. Simple cases (one character - # per match) are implemented with STAR and PLUS for speed - # and to minimize recursive plunges. - # - STAR node Match this (simple) thing 0 or more times. - PLUS node Match this (simple) thing 1 or more times. - - CURLY sv 2 Match this simple thing {n,m} times. - CURLYN no 2 Match next-after-this simple thing - # {n,m} times, set parens. - CURLYM no 2 Match this medium-complex thing {n,m} times. - CURLYX sv 2 Match this complex thing {n,m} times. - - # This terminator creates a loop structure for CURLYX - WHILEM no Do curly processing and see if rest matches. - - # OPEN,CLOSE,GROUPP ...are numbered at compile time. - OPEN num 1 Mark this point in input as start of #n. - CLOSE num 1 Analogous to OPEN. - - REF num 1 Match some already matched string - REFF num 1 Match already matched string, folded - REFFL num 1 Match already matched string, folded in loc. - - # grouping assertions - IFMATCH off 1 2 Succeeds if the following matches. - UNLESSM off 1 2 Fails if the following matches. - SUSPEND off 1 1 "Independent" sub-regex. - IFTHEN off 1 1 Switch, should be preceded by switcher. - GROUPP num 1 Whether the group matched. - - # Support for long regex - LONGJMP off 1 1 Jump far away. - BRANCHJ off 1 1 BRANCH with long offset. - - # The heavy worker - EVAL evl 1 Execute some Perl code. - - # Modifiers - MINMOD no Next operator is not greedy. - LOGICAL no Next opcode should set the flag only. - - # This is not used yet - RENUM off 1 1 Group with independently numbered parens. - - # This is not really a node, but an optimized-away piece of a "long" node. - # To simplify debugging output, we mark it as if it were a node - OPTIMIZED off Placeholder for dump. 
+Here are the current possible types, with short descriptions: + +=for comment +This table is generated by regen/regcomp.pl. Any changes made here +will be lost. + +=for regcomp.pl begin + + # TYPE arg-description [num-args] [longjump-len] DESCRIPTION + + # Exit points + + END no End of program. + SUCCEED no Return from a subroutine, basically. + + # Line Start Anchors: + SBOL no Match "" at beginning of line: /^/, /\A/ + MBOL no Same, assuming multiline: /^/m + + # Line End Anchors: + SEOL no Match "" at end of line: /$/ + MEOL no Same, assuming multiline: /$/m + EOS no Match "" at end of string: /\z/ + + # Match Start Anchors: + GPOS no Matches where last m//g left off. + + # Word Boundary Opcodes: + BOUND no Like BOUNDA for non-utf8, otherwise match + "" between any Unicode \w\W or \W\w + BOUNDL no Like BOUND/BOUNDU, but \w and \W are + defined by current locale + BOUNDU no Match "" at any boundary of a given type + using /u rules. + BOUNDA no Match "" at any boundary between \w\W or + \W\w, where \w is [_a-zA-Z0-9] + NBOUND no Like NBOUNDA for non-utf8, otherwise match + "" between any Unicode \w\w or \W\W + NBOUNDL no Like NBOUND/NBOUNDU, but \w and \W are + defined by current locale + NBOUNDU no Match "" at any non-boundary of a given + type using /u rules. + NBOUNDA no Match "" between any \w\w or \W\W, where + \w is [_a-zA-Z0-9] + + # [Special] alternatives: + REG_ANY no Match any one character (except newline). + SANY no Match any one character. 
+ ANYOF sv Match character in (or not in) this class, + charclass single char match only + ANYOFD sv Like ANYOF, but /d is in effect + charclass + ANYOFL sv Like ANYOF, but /l is in effect + charclass + ANYOFPOSIXL sv Like ANYOFL, but matches [[:posix:]] + charclass_ classes + posixl + + ANYOFH sv 1 Like ANYOF, but only has "High" matches, + none in the bitmap; the flags field + contains the lowest matchable UTF-8 start + byte + ANYOFHb sv 1 Like ANYOFH, but all matches share the same + UTF-8 start byte, given in the flags field + ANYOFHr sv 1 Like ANYOFH, but the flags field contains + packed bounds for all matchable UTF-8 start + bytes. + + ANYOFM byte 1 Like ANYOF, but matches an invariant byte + as determined by the mask and arg + NANYOFM byte 1 complement of ANYOFM + + # POSIX Character Classes: + POSIXD none Some [[:class:]] under /d; the FLAGS field + gives which one + POSIXL none Some [[:class:]] under /l; the FLAGS field + gives which one + POSIXU none Some [[:class:]] under /u; the FLAGS field + gives which one + POSIXA none Some [[:class:]] under /a; the FLAGS field + gives which one + NPOSIXD none complement of POSIXD, [[:^class:]] + NPOSIXL none complement of POSIXL, [[:^class:]] + NPOSIXU none complement of POSIXU, [[:^class:]] + NPOSIXA none complement of POSIXA, [[:^class:]] + + CLUMP no Match any extended grapheme cluster + sequence + + # Alternation + + # BRANCH The set of branches constituting a single choice are + # hooked together with their "next" pointers, since + # precedence prevents anything being concatenated to + # any individual branch. The "next" pointer of the last + # BRANCH in a choice points to the thing following the + # whole choice. This is also where the final "next" + # pointer of each individual branch points; each branch + # starts with the operand node of a BRANCH node. + # + BRANCH node Match this alternative, or the next... + + # Literals + + EXACT str Match this string (preceded by length). 
+ EXACTL str Like EXACT, but /l is in effect (used so + locale-related warnings can be checked + for). + EXACTF str Match this string using /id rules (w/len); + (string not UTF-8, not guaranteed to be + folded). + EXACTFL str Match this string using /il rules (w/len); + (string not guaranteed to be folded). + EXACTFU str Match this string using /iu rules (w/len); + (string folded iff in UTF-8; non-UTF8 + folded length <= unfolded). + EXACTFAA str Match this string using /iaa rules (w/len) + (string folded iff in UTF-8; non-UTF8 + folded length <= unfolded). + + EXACTFUP str Match this string using /iu rules (w/len); + (string not UTF-8, not guaranteed to be + folded; and it's Problematic). + + EXACTFLU8 str Like EXACTFU, but use /il, UTF-8, folded, + and everything in it is above 255. + EXACTFAA_NO_TRIE str Match this string using /iaa rules (w/len) + (string not UTF-8, not guaranteed to be + folded, not currently trie-able). + + EXACT_ONLY8 str Like EXACT, but only UTF-8 encoded targets + can match + EXACTFU_ONLY8 str Like EXACTFU, but only UTF-8 encoded + targets can match + + EXACTFU_S_EDGE str /di rules, but nothing in it precludes /ui, + except begins and/or ends with [Ss]; + (string not UTF-8; compile-time only). + + # Do nothing types + + NOTHING no Match empty string. + # A variant of above which delimits a group, thus stops optimizations + TAIL no Match empty string. Can jump here from + outside. + + # Loops + + # STAR,PLUS '?', and complex '*' and '+', are implemented as + # circular BRANCH structures. Simple cases + # (one character per match) are implemented with STAR + # and PLUS for speed and to minimize recursive plunges. + # + STAR node Match this (simple) thing 0 or more times. + PLUS node Match this (simple) thing 1 or more times. + + CURLY sv 2 Match this simple thing {n,m} times. + CURLYN no 2 Capture next-after-this simple thing + CURLYM no 2 Capture this medium-complex thing {n,m} + times. + CURLYX sv 2 Match this complex thing {n,m} times. 
+ + # This terminator creates a loop structure for CURLYX + WHILEM no Do curly processing and see if rest + matches. + + # Buffer related + + # OPEN,CLOSE,GROUPP ...are numbered at compile time. + OPEN num 1 Mark this point in input as start of #n. + CLOSE num 1 Close corresponding OPEN of #n. + SROPEN none Same as OPEN, but for script run + SRCLOSE none Close preceding SROPEN + + REF num 1 Match some already matched string + REFF num 1 Match already matched string, using /di + rules. + REFFL num 1 Match already matched string, using /li + rules. + REFFU num 1 Match already matched string, using /ui. + REFFA num 1 Match already matched string, using /aai + rules. + + # Named references. Code in regcomp.c assumes that these all are after + # the numbered references + REFN no-sv 1 Match some already matched string + REFFN no-sv 1 Match already matched string, using /di + rules. + REFFLN no-sv 1 Match already matched string, using /li + rules. + REFFUN num 1 Match already matched string, using /ui + rules. + REFFAN num 1 Match already matched string, using /aai + rules. + + # Support for long RE + LONGJMP off 1 1 Jump far away. + BRANCHJ off 1 1 BRANCH with long offset. + + # Special Case Regops + IFMATCH off 1 1 Succeeds if the following matches; non-zero + flags "f", next_off "o" means lookbehind + assertion starting "f..(f-o)" characters + before current + UNLESSM off 1 1 Fails if the following matches; non-zero + flags "f", next_off "o" means lookbehind + assertion starting "f..(f-o)" characters + before current + SUSPEND off 1 1 "Independent" sub-RE. + IFTHEN off 1 1 Switch, should be preceded by switcher. + GROUPP num 1 Whether the group matched. + + # The heavy worker + + EVAL evl/flags Execute some Perl code. + 2L + + # Modifiers + + MINMOD no Next operator is not greedy. + LOGICAL no Next opcode should set the flag only. + + # This is not used yet + RENUM off 1 1 Group with independently numbered parens. 
+ + # Trie Related + + # Behave the same as A|LIST|OF|WORDS would. The '..C' variants + # have inline charclass data (ascii only), the 'C' store it in the + # structure. + + TRIE trie 1 Match many EXACT(F[ALU]?)? at once. + flags==type + TRIEC trie Same as TRIE, but with embedded charclass + charclass data + + AHOCORASICK trie 1 Aho Corasick stclass. flags==type + AHOCORASICKC trie Same as AHOCORASICK, but with embedded + charclass charclass data + + # Regex Subroutines + GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2 + + # Special conditionals + GROUPPN no-sv 1 Whether the group matched. + INSUBP num 1 Whether we are in a specific recurse. + DEFINEP none 1 Never execute directly. + + # Backtracking Verbs + ENDLIKE none Used only for the type field of verbs + OPFAIL no-sv 1 Same as (?!), but with verb arg + ACCEPT no-sv/num Accepts the current matched string, with + 2L verbar + + # Verbs With Arguments + VERB no-sv 1 Used only for the type field of verbs + PRUNE no-sv 1 Pattern fails at this startpoint if no- + backtracking through this + MARKPOINT no-sv 1 Push the current location for rollback by + cut. + SKIP no-sv 1 On failure skip forward (to the mark) + before retrying + COMMIT no-sv 1 Pattern fails outright if backtracking + through this + CUTGROUP no-sv 1 On failure go to the next alternation in + the group + + # Control what to keep in $&. + KEEPS no $& begins here. + + # New charclass like patterns + LNBREAK none generic newline pattern + + # SPECIAL REGOPS + + # This is not really a node, but an optimized away piece of a "long" + # node. To simplify debugging output, we mark it as if it were a node + OPTIMIZED off Placeholder for dump. + + # Special opcode with the property that no opcode in a compiled program + # will ever be of this type. Thus it can be used as a flag value that + # no other opcode has been seen. END is used similarly, in that an END + # node can't be optimized. 
So END implies "unoptimizable" and PSEUDO + # means "not seen anything to optimize yet". + PSEUDO off Pseudo opcode for internal use. + +=for regcomp.pl end =for unprinted-credits Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421 @@ -684,7 +880,7 @@ entered and that all of the job was therefore done by the optimizer. If the regex engine was entered, the output may look like this: - Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__' + Matching '[bc]d(ef*g)+h[ij]k$' against 'abcdefg__gh__' Setting an EVAL scope, savestack=3 2 | 1: ANYOF 3 | 11: EXACT @@ -775,19 +971,19 @@ usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS} $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to the following example: - $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" - Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) - 14216 free: 130 117 28 7 9 0 2 2 1 0 0 + $ PERL_DEBUG_MSTATS=2 perl -e "require Carp" + Memory allocation statistics after compilation: (buckets 4(4)..8188(8192) + 14216 free: 130 117 28 7 9 0 2 2 1 0 0 437 61 36 0 5 - 60924 used: 125 137 161 55 7 8 6 16 2 0 1 + 60924 used: 125 137 161 55 7 8 6 16 2 0 1 74 109 304 84 20 - Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. - Memory allocation statistics after execution: (buckets 4(4)..8188(8192) - 30888 free: 245 78 85 13 6 2 1 3 2 0 1 + Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048. + Memory allocation statistics after execution: (buckets 4(4)..8188(8192) + 30888 free: 245 78 85 13 6 2 1 3 2 0 1 315 162 39 42 11 - 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 + 175816 used: 265 176 1112 111 26 22 11 27 2 1 1 196 178 1066 798 39 - Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. + Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144. 
It is possible to ask for such a statistic at arbitrary points in your execution using the mstat() function out of the standard @@ -827,7 +1023,7 @@ of two buckets "above". For example, suppose under the previous example, the memory footprints were - free: 8 16 32 64 128 256 512 1024 2048 4096 8192 + free: 8 16 32 64 128 256 512 1024 2048 4096 8192 4 12 24 48 80 With a non-C perl, the buckets starting from C<128> have