=head1 NAME

perlreapi - perl regular expression plugin interface

=head1 DESCRIPTION

As of Perl 5.9.5 there is a new interface for using regexp engines other than
the default one. Each engine is supposed to provide access to a constant
structure of the following format:

    typedef struct regexp_engine {
        REGEXP* (*comp) (pTHX_ const SV * const pattern, const U32 flags);
        I32     (*exec) (pTHX_ REGEXP * const rx, char* stringarg, char* strend,
                         char* strbeg, I32 minend, SV* screamer,
                         void* data, U32 flags);
        char*   (*intuit) (pTHX_ REGEXP * const rx, SV *sv, char *strpos,
                           char *strend, U32 flags,
                           struct re_scream_pos_data_s *data);
        SV*     (*checkstr) (pTHX_ REGEXP * const rx);
        void    (*free) (pTHX_ REGEXP * const rx);
        void    (*numbered_buff_FETCH) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV * const sv);
        void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV const * const value);
        I32     (*numbered_buff_LENGTH) (pTHX_ REGEXP * const rx, const SV * const sv,
                                         const I32 paren);
        SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
                               SV * const value, U32 flags);
        SV*     (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
                                    const U32 flags);
        SV*     (*qr_package)(pTHX_ REGEXP * const rx);
    #ifdef USE_ITHREADS
        void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
    #endif
    } regexp_engine;

When a regexp is compiled, its C<engine> field is then set to point at
the appropriate structure so that when it needs to be used Perl can find
the right routines to do so.

In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures. When compiling, the C<comp> method is executed, and the
resulting regexp structure's engine field is expected to point back at
the same structure.

The C<pTHX_> symbol in the definition is a macro used by perl under threading
to provide an extra argument to the routine holding a pointer back to
the interpreter that is executing the regexp. So under threading all
routines get an extra argument.

=head1 Callbacks

=head2 comp

    REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);

Compile the pattern stored in C<pattern> using the given C<flags> and
return a pointer to a prepared C<REGEXP> structure that can perform
the match. See L</The REGEXP structure> below for an explanation of
the individual fields in the REGEXP struct.

The C<pattern> parameter is the scalar that was used as the
pattern. Previous versions of perl would pass two C<char*> pointers
indicating the start and end of the stringified pattern; the following
snippet can be used to get the old parameters:

    STRLEN plen;
    char*  exp = SvPV(pattern, plen);
    char* xend = exp + plen;

Since any scalar can be passed as a pattern it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). perl's own engine will always
stringify everything using the snippet above but that doesn't mean
other engines have to.

The C<flags> parameter is a bitfield which indicates which of the
C<msixp> flags the regex was compiled with. In addition it contains
info about whether C<use locale> is in effect and optimization info
for C<split>. A regex engine might want to use the same split
optimizations with a different syntax. For instance a Perl6 engine
would treat C<split /^^/> equivalently to perl's C<split /^/>. See
L<split documentation|perlfunc> and the relevant code in C<pp_split>
in F<pp.c> to find out whether your engine should be setting these.

The C<eogc> flags are stripped out before being passed to the comp
routine. The regex engine does not need to know whether any of these
are set as those flags should only affect what perl does with the
pattern and its match variables, not how it gets compiled and executed.

=over 4

=item RXf_SKIPWHITE

C<split ' '> or C<split> with no arguments (which really means
C<split(' ', $_)>, see L<split|perlfunc>).

=item RXf_START_ONLY

Set if the pattern is C</^/> (C<< r->prelen == 1 && r->precomp[0] ==
'^' >>). Will be used by the C<split> operator to split the given
string on C<\n> (even under C</^/s>, see L<split|perlfunc>).

=item RXf_WHITE

Set if the pattern is exactly C</\s+/> and used by C<split>. The
definition of whitespace varies depending on whether RXf_UTF8 or
RXf_PMf_LOCALE is set.

=item RXf_PMf_LOCALE

Makes C<split> use the locale dependent definition of whitespace under C<use
locale> when RXf_SKIPWHITE or RXf_WHITE is in effect. Under ASCII whitespace is
defined as per L<isSPACE|perlapi/ISSPACE>, and by the internal macros
C<is_utf8_space> under UTF-8 and C<isSPACE_LC> under C<use locale>.

=item RXf_PMf_MULTILINE

The C</m> flag. This ends up being passed to C<Perl_fbm_instr> by
C<pp_split> regardless of the engine.

=item RXf_PMf_SINGLELINE

The C</s> flag. Guaranteed not to be used outside the regex engine.

=item RXf_PMf_FOLD

The C</i> flag. Guaranteed not to be used outside the regex engine.

=item RXf_PMf_EXTENDED

The C</x> flag. Guaranteed not to be used outside the regex
engine. However if present on a regex C<#> comments will be stripped
by the tokenizer regardless of the engine currently in use.

=item RXf_PMf_KEEPCOPY

The C</p> flag.

=item RXf_UTF8

Set if the pattern is L<SvUTF8()|perlapi/SvUTF8>, set by Perl_pmruntime.

=back

In general these flags should be preserved in C<< rx->extflags >>
after compilation, although it is possible the regex includes
constructs that change them. The perl engine for instance may upgrade
non-utf8 strings to utf8 if the pattern includes constructs such as
C<\x{...}> that can only match Unicode values. RXf_SKIPWHITE should
always be preserved verbatim in C<< rx->extflags >>.

=head2 exec

    I32 exec(pTHX_ REGEXP * const rx,
             char *stringarg, char* strend, char* strbeg,
             I32 minend, SV* screamer,
             void* data, U32 flags);

Execute a regexp.

=head2 intuit

    char* intuit(pTHX_ REGEXP * const rx,
                 SV *sv, char *strpos, char *strend,
                 const U32 flags, struct re_scream_pos_data_s *data);

Find the start position where a regex match should be attempted,
or possibly whether the regex engine should not be run because the
pattern can't match. This is called as appropriate by the core
depending on the values of the extflags member of the regexp
structure.

=head2 checkstr

    SV* checkstr(pTHX_ REGEXP * const rx);

Return a SV containing a string that must appear in the pattern. Used
by C<split> for optimising matches.

=head2 free

    void free(pTHX_ REGEXP * const rx);

Called by perl when it is freeing a regexp pattern so that the engine
can release any resources pointed to by the C<pprivate> member of the
regexp structure. This is only responsible for freeing private data;
perl will handle releasing anything else contained in the regexp structure.

=head2 Numbered capture callbacks

Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
equivalents, C<${^PREMATCH}>, C<${^POSTMATCH}> and C<${^MATCH}>, as well as
the numbered capture buffers (C<$1>, C<$2>, ...).

The C<paren> parameter will be C<-2> for C<$`>, C<-1> for C<$'>, C<0>
for C<$&>, C<1> for C<$1> and so forth.

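The mapping described above can be sketched in plain C. This is only an
illustration: C<example_paren_name> is a hypothetical helper that is not
part of the perl API, and it takes a plain C<int> where a real callback
receives an C<I32>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the perl API): name the variable
 * that a given `paren' argument of the numbered_buff_* callbacks
 * refers to, following the mapping described in the text. */
static const char *
example_paren_name(int paren)
{
    switch (paren) {
    case -2: return "$`";   /* prematch    */
    case -1: return "$'";   /* postmatch   */
    case  0: return "$&";   /* whole match */
    default:
        /* 1, 2, ... are the numbered captures $1, $2, ...;
         * values below -2 are not described by the documentation. */
        return paren > 0 ? "$<N>" : NULL;
    }
}
```

An engine would typically switch on C<paren> in this way at the top of
its FETCH, STORE and LENGTH callbacks before looking at any offsets.
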
The names have been chosen by analogy with L<Tie::Scalar> method
names with an additional B<LENGTH> callback for efficiency. However
numbered capture variables are currently not tied internally but
implemented via magic.

=head3 numbered_buff_FETCH

    void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
                             SV * const sv);

Fetch a specified numbered capture. C<sv> should be set to the scalar
to return. The scalar is passed as an argument rather than being
returned from the function because when it's called perl already has a
scalar to store the value, so creating another one would be
redundant. The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
friends, see L<perlapi>.

This callback is where perl untaints its own capture variables under
taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_get>
function in F<regcomp.c> for how to untaint capture variables if
that's something you'd like your engine to do as well.

=head3 numbered_buff_STORE

    void numbered_buff_STORE(pTHX_ REGEXP * const rx, const I32 paren,
                             SV const * const value);

Set the value of a numbered capture variable. C<value> is the scalar
that is to be used as the new value. It's up to the engine to make
sure this is used as the new value (or reject it).

Example:

    if ("ook" =~ /(o*)/) {
        # `paren' will be `1' and `value' will be `ee'
        $1 =~ tr/o/e/;
    }

Perl's own engine will croak on any attempt to modify the capture
variables. To do this in another engine use the following callback
(copied from C<Perl_reg_numbered_buff_store>):

    void
    Example_reg_numbered_buff_store(pTHX_ REGEXP * const rx, const I32 paren,
                                    SV const * const value)
    {
        PERL_UNUSED_ARG(rx);
        PERL_UNUSED_ARG(paren);
        PERL_UNUSED_ARG(value);

        if (!PL_localizing)
            Perl_croak(aTHX_ PL_no_modify);
    }

Actually perl 5.10 will not I<always> croak in a statement that looks
like it would modify a numbered capture variable. This is because the
STORE callback will not be called if perl can determine that it
doesn't have to modify the value. This is exactly how tied variables
behave in the same situation:

    package CaptureVar;
    use base 'Tie::Scalar';

    sub TIESCALAR { bless [] }
    sub FETCH { undef }
    sub STORE { die "This doesn't get called" }

    package main;

    tie my $sv => "CaptureVar";
    $sv =~ y/a/b/;

Because C<$sv> is C<undef> when the C<y///> operator is applied to it
the transliteration won't actually execute and the program won't
C<die>. This is different to how 5.8 and earlier versions behaved,
since the capture variables were READONLY variables then. Now they'll
just die when assigned to in the default engine.

=head3 numbered_buff_LENGTH

    I32 numbered_buff_LENGTH (pTHX_ REGEXP * const rx, const SV * const sv,
                              const I32 paren);

Get the C<length> of a capture variable. There's a special callback
for this so that perl doesn't have to do a FETCH and run C<length> on
the result. Since the length is (in perl's case) known from an offset
stored in C<< rx->offs >> this is much more efficient:

    I32 s1  = rx->offs[paren].start;
    I32 s2  = rx->offs[paren].end;
    I32 len = s2 - s1;

This is a little bit more complex in the case of UTF-8, see what
C<Perl_reg_numbered_buff_length> does with
L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.

=head2 Named capture callbacks

Called to get/set the value of C<%+> and C<%-> as well as by some
utility functions in L<re>.

There are two callbacks. C<named_buff> is called in all the cases the
FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR L<Tie::Hash> callbacks
would be on changes to C<%+> and C<%->, and C<named_buff_iter> in the
same cases as FIRSTKEY and NEXTKEY.

The C<flags> parameter can be used to determine which of these
operations the callbacks should respond to. The following flags are
currently defined:

Which L<Tie::Hash> operation is being performed from the Perl level on
C<%+> or C<%->, if any:

    RXf_HASH_FETCH
    RXf_HASH_STORE
    RXf_HASH_DELETE
    RXf_HASH_CLEAR
    RXf_HASH_EXISTS
    RXf_HASH_SCALAR
    RXf_HASH_FIRSTKEY
    RXf_HASH_NEXTKEY

Whether C<%+> or C<%-> is being operated on, if any.

    RXf_HASH_ONE /* %+ */
    RXf_HASH_ALL /* %- */

Whether this is being called as C<re::regname>, C<re::regnames> or
C<re::regnames_count>, if any. The first two will be combined with
C<RXf_HASH_ONE> or C<RXf_HASH_ALL>.

    RXf_HASH_REGNAME
    RXf_HASH_REGNAMES
    RXf_HASH_REGNAMES_COUNT

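As a rough sketch of how a C<named_buff> implementation might dispatch on
C<flags>, the following self-contained C fragment tests the operation bit
first and the C<%+>/C<%-> bit second. The C<SK_*> constants and their
numeric values are invented for this illustration. A real engine would
use the C<RXf_HASH_*> constants from perl's headers, whose values are not
given here.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. Real code should
 * use the RXf_HASH_* constants; their values may differ. */
enum {
    SK_HASH_FETCH  = 0x01,
    SK_HASH_STORE  = 0x02,
    SK_HASH_EXISTS = 0x04,
    SK_HASH_ONE    = 0x10,  /* stands in for RXf_HASH_ONE, %+ */
    SK_HASH_ALL    = 0x20   /* stands in for RXf_HASH_ALL, %- */
};

typedef enum {
    SK_ACT_UNKNOWN,
    SK_ACT_FETCH_ONE,   /* return the value for one name from %+ */
    SK_ACT_FETCH_ALL,   /* return all values for one name from %- */
    SK_ACT_REJECT,      /* engine treats the hash as read-only */
    SK_ACT_EXISTS
} sk_action;

/* Decide what to do for a given combination of flags. */
static sk_action
sk_named_buff_action(unsigned flags)
{
    if (flags & SK_HASH_FETCH)
        return (flags & SK_HASH_ALL) ? SK_ACT_FETCH_ALL
                                     : SK_ACT_FETCH_ONE;
    if (flags & SK_HASH_STORE)
        return SK_ACT_REJECT;
    if (flags & SK_HASH_EXISTS)
        return SK_ACT_EXISTS;
    return SK_ACT_UNKNOWN;
}
```

Treating STORE as rejected is a design choice of this sketch, not a
requirement of the interface.
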
Internally C<%+> and C<%-> are implemented with a real tied interface
via L<Tie::Hash::NamedCapture>. The methods in that package will call
back into these functions. However the usage of
L<Tie::Hash::NamedCapture> for this purpose might change in future
releases. For instance this might be implemented by magic instead
(would need an extension to mgvtbl).

=head3 named_buff

    SV* named_buff(pTHX_ REGEXP * const rx, SV * const key,
                   SV * const value, U32 flags);

=head3 named_buff_iter

    SV* named_buff_iter(pTHX_ REGEXP * const rx, const SV * const lastkey,
                        const U32 flags);

=head2 qr_package

    SV* qr_package(pTHX_ REGEXP * const rx);

The package the qr// magic object is blessed into (as seen by C<ref
qr//>). It is recommended that engines change this to their package
name for identification regardless of whether they implement methods
on the object.

The package this method returns should also have the internal
C<Regexp> package in its C<@ISA>. C<< qr//->isa("Regexp") >> should always
be true regardless of what engine is being used.

An example implementation might be:

    SV*
    Example_qr_package(pTHX_ REGEXP * const rx)
    {
        PERL_UNUSED_ARG(rx);
        return newSVpvs("re::engine::Example");
    }

Any method calls on an object created with C<qr//> will be dispatched to the
package as a normal object.

    use re::engine::Example;
    my $re = qr//;
    $re->meth; # dispatched to re::engine::Example::meth()

To retrieve the C<REGEXP> object from the scalar in an XS function use the
following snippet:

    void meth(SV * sv)
    PPCODE:
        MAGIC  * mg;
        REGEXP * re;

        if (SvMAGICAL(sv))
            mg_get(sv);
        if (SvROK(sv) &&
            (sv = (SV*)SvRV(sv)) &&     /* assignment deliberate */
            SvTYPE(sv) == SVt_PVMG &&
            (mg = mg_find(sv, PERL_MAGIC_qr))) /* assignment deliberate */
        {
            re = (REGEXP *)mg->mg_obj;
        }

=head2 dupe

    void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);

On threaded builds a regexp may need to be duplicated so that the pattern
can be used by multiple threads. This routine is expected to handle the
duplication of any private data pointed to by the C<pprivate> member of
the regexp structure. It will be called with the preconstructed new
regexp structure as an argument. The C<pprivate> member will point at
the B<old> private structure, and it is this routine's responsibility to
construct a copy and return a pointer to it (which perl will then use to
overwrite the field as passed to this routine).

This allows the engine to dupe its private data but also if necessary
modify the final structure if it really must.

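The copy-and-return contract can be illustrated with a small stand-alone
sketch. Everything here is an assumption made for the example: the
C<sk_private> layout is invented, and C<malloc> stands in for whatever
allocation routines a real engine would use inside perl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private data an engine might hang off pprivate. */
typedef struct {
    char  *program;      /* engine-specific compiled form */
    size_t program_len;  /* length of program in bytes */
} sk_private;

/* Deep-copy the old private data and return the copy. In the real
 * interface perl would store the returned pointer in the pprivate
 * field of the new regexp. Returns NULL if allocation fails. */
static void *
sk_dupe_private(const sk_private *old)
{
    sk_private *copy;

    assert(old != NULL);
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;

    copy->program_len = old->program_len;
    copy->program = malloc(old->program_len ? old->program_len : 1);
    if (copy->program == NULL) {
        free(copy);
        return NULL;
    }
    if (old->program_len)
        memcpy(copy->program, old->program, old->program_len);
    return copy;
}
```

The point of the deep copy is that the two threads never share a
pointer into the same private buffer.
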
On unthreaded builds this field doesn't exist.

=head1 The REGEXP structure

The REGEXP struct is defined in F<regexp.h>. All regex engines must be able to
correctly build such a structure in their L</comp> routine.

The REGEXP structure contains all the data that perl needs to be aware of
to properly work with the regular expression. It includes data about
optimisations that perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is anchored
in some way, what flags were used during the compile, or whether the
program contains special constructs that perl needs to be aware of.

In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
an arbitrary structure whose use and management is the responsibility
of the compiling engine. perl will never modify either of these
values.

    typedef struct regexp {
        /* what engine created this regexp? */
        const struct regexp_engine* engine;

        /* what re is this a lightweight copy of? */
        struct regexp* mother_re;

        /* Information about the match that the perl core uses to manage things */
        U32 extflags;   /* Flags used both externally and internally */
        I32 minlen;     /* minimum possible length of string to match */
        I32 minlenret;  /* minimum possible length of $& */
        U32 gofs;       /* chars left of pos that we search from */

        /* substring data about strings that must appear
           in the final match, used for optimisations */
        struct reg_substr_data *substrs;

        U32 nparens;  /* number of capture buffers */

        /* private engine specific data */
        U32 intflags;   /* Engine Specific Internal flags */
        void *pprivate; /* Data private to the regex engine which
                           created this object. */

        /* Data about the last/current match. These are modified during matching*/
        U32 lastparen;            /* last open paren matched */
        U32 lastcloseparen;       /* last close paren matched */
        regexp_paren_pair *swap;  /* Swap copy of *offs */
        regexp_paren_pair *offs;  /* Array of offsets for (@-) and (@+) */

        char *subbeg;  /* saved or original string so \digit works forever. */
        SV_SAVED_COPY  /* If non-NULL, SV which is COW from original */
        I32 sublen;    /* Length of string pointed by subbeg */

        /* Information about the match that isn't often used */
        I32 prelen;           /* length of precomp */
        const char *precomp;  /* pre-compilation regular expression */

        /* wrapped can't be const char*, as it is returned by sv_2pv_flags */
        char *wrapped;  /* wrapped version of the pattern */
        I32 wraplen;    /* length of wrapped */

        I32 seen_evals;   /* number of eval groups in the pattern - for security checks */
        HV *paren_names;  /* Optional hash of paren names */

        /* Refcount of this regexp */
        I32 refcnt;             /* Refcount of this regexp */
    } regexp;

The fields are discussed in more detail below:

=head2 C<engine>

This field points at a regexp_engine structure which contains pointers
to the subroutines that are to be used for performing a match. It
is the compiling routine's responsibility to populate this field before
returning the regexp object.

Internally this is set to C<NULL> unless a custom engine is specified in
C<$^H{regcomp}>. Perl's own set of callbacks can be accessed in the struct
pointed to by C<RE_ENGINE_PTR>.

=head2 C<mother_re>

TODO, see L<http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>

=head2 C<extflags>

This will be used by perl to see what flags the regexp was compiled
with. This will normally be set to the value of the flags parameter by
the L<comp|/comp> callback.

=head2 C<minlen> C<minlenret>

The minimum string length required for the pattern to match. This is used to
prune the search space by not bothering to match any closer to the end of a
string than would allow a match. For instance there is no point in even
starting the regex engine if the minlen is 10 but the string is only 5
characters long. There is no way that the pattern can match.

C<minlenret> is the minimum length of the string that would be found
in C<$&> after a match.

The difference between C<minlen> and C<minlenret> can be seen in the
following pattern:

    /ns(?=\d)/

where the C<minlen> would be 3 but C<minlenret> would only be 2 as the C<\d> is
required to match but is not actually included in the matched content. This
distinction is particularly important as the substitution logic uses the
C<minlenret> to tell whether it can do in-place substitution which can result in
considerable speedup.

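The pruning test described above amounts to a single comparison. The
sketch below shows it in isolation. C<sk_worth_matching> is a
hypothetical name, and plain C types stand in for perl's C<I32> and
C<STRLEN>.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if a match attempt is worthwhile, 0 if the remaining
 * string is shorter than the pattern's minimum match length and
 * the engine need not be started at all. */
static int
sk_worth_matching(size_t remaining_len, long minlen)
{
    assert(minlen >= 0);
    return remaining_len >= (size_t)minlen;
}
```

With the numbers from the text, a C<minlen> of 10 against a string of 5
characters makes this return 0.
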
=head2 C<gofs>

Left offset from pos() to start match at.

=head2 C<substrs>

Substring data about strings that must appear in the final match. This
is currently only used internally by perl's engine, but might be
used in the future for all engines for optimisations like C<minlen>.

=head2 C<nparens>, C<lastparen>, and C<lastcloseparen>

These fields are used to keep track of how many paren groups could be matched
in the pattern, which was the last open paren to be entered, and which was
the last close paren to be entered.

=head2 C<intflags>

The engine's private copy of the flags the pattern was compiled with. Usually
this is the same as C<extflags> unless the engine chose to modify one of them.

=head2 C<pprivate>

A void* pointing to an engine-defined data structure. The perl engine uses the
C<regexp_internal> structure (see L<perlreguts/Base Structures>) but a custom
engine should use something else.

=head2 C<swap>

TODO: document

=head2 C<offs>

An array of C<regexp_paren_pair> structures which define offsets into the
string being matched which correspond to the C<$&> and C<$1>, C<$2> etc.
captures. The C<regexp_paren_pair> struct is defined as follows:

    typedef struct regexp_paren_pair {
        I32 start;
        I32 end;
    } regexp_paren_pair;

If C<< ->offs[num].start >> or C<< ->offs[num].end >> is C<-1> then that
capture buffer did not match. C<< ->offs[0].start/end >> represents C<$&> (or
C<${^MATCH}> under C<//p>) and C<< ->offs[paren].end >> matches C<$$paren> where
C<< $paren >= 1 >>.

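A minimal sketch of reading one entry of such an array follows. The
C<sk_> types are simplified stand-ins for C<I32> and
C<regexp_paren_pair>, and the helper name is invented for the example.
The result is a byte length, so it ignores the UTF-8 complication that
C<numbered_buff_LENGTH> has to deal with.

```c
#include <assert.h>
#include <stddef.h>

typedef int sk_i32;              /* stand-in for perl's I32 */

typedef struct {
    sk_i32 start;
    sk_i32 end;
} sk_paren_pair;                 /* mirrors regexp_paren_pair */

/* Return the byte length of capture `paren', or -1 if that capture
 * buffer does not exist or did not take part in the match.
 * `offs' must hold nparens + 1 entries; entry 0 describes $&. */
static sk_i32
sk_capture_length(const sk_paren_pair *offs, size_t nparens, size_t paren)
{
    assert(offs != NULL);
    if (paren > nparens)
        return -1;                            /* no such buffer */
    if (offs[paren].start == -1 || offs[paren].end == -1)
        return -1;                            /* buffer did not match */
    return offs[paren].end - offs[paren].start;
}
```

Checking for C<-1> before subtracting matters, since an unmatched
buffer would otherwise report a length of 0.
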
=head2 C<precomp> C<prelen>

Used for optimisations. C<precomp> holds a copy of the pattern that
was compiled and C<prelen> its length. When a new pattern is to be
compiled (such as inside a loop) the internal C<regcomp> operator
checks whether the last compiled C<REGEXP>'s C<precomp> and C<prelen>
are equivalent to the new one, and if so uses the old pattern instead
of compiling a new one.

The relevant snippet from C<Perl_pp_regcomp>:

    if (!re || !re->precomp || re->prelen != (I32)len ||
        memNE(re->precomp, t, len))
    /* Compile a new pattern */

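The same test can be tried outside perl with a reduced structure. In
this sketch C<sk_regexp> keeps only the two fields involved,
C<memcmp> takes the place of C<memNE>, and the function name is
invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Reduced stand-in for the REGEXP struct: only the cache key. */
typedef struct {
    const char *precomp;  /* copy of the pattern that was compiled */
    long        prelen;   /* its length */
} sk_regexp;

/* Return 1 if `pat' (of length `len') has to be compiled afresh,
 * 0 if the previously compiled regexp can be reused. */
static int
sk_needs_recompile(const sk_regexp *re, const char *pat, size_t len)
{
    return !re || !re->precomp || re->prelen != (long)len
        || memcmp(re->precomp, pat, len) != 0;
}
```

The length comparison comes first so that C<memcmp> only runs when both
buffers are known to hold C<len> bytes.
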
=head2 C<paren_names>

This is a hash used internally to track named capture buffers and their
offsets. The keys are the names of the buffers; the values are dualvars,
with the IV slot holding the number of buffers with the given name and the
pv being an embedded array of I32. The values may also be contained
independently in the data array in cases where named backreferences are
used.

=head2 C<reg_substr_data>

Holds information on the longest string that must occur at a fixed
offset from the start of the pattern, and the longest string that must
occur at a floating offset from the start of the pattern. Used to do
Fast-Boyer-Moore searches on the string to find out if it's worth using
the regex engine at all, and if so where in the string to search.

=head2 C<subbeg> C<sublen> C<saved_copy>

    #define SAVEPVN(p,n) ((p) ? savepvn(p,n) : NULL)
    if (RX_MATCH_COPIED(ret))
        ret->subbeg = SAVEPVN(ret->subbeg, ret->sublen);
    else
        ret->subbeg = NULL;

C<< PL_sawampersand || rx->extflags & RXf_PMf_KEEPCOPY >>

These are used during the execution phase for managing search and replace
patterns.

=head2 C<wrapped> C<wraplen>

Stores the string C<qr//> stringifies to, for example C<(?-xism:eek)>
in the case of C<qr/eek/>.

When using a custom engine that doesn't support the C<(?:)> construct for
inline modifiers it's best to have C<qr//> stringify to the supplied pattern.
Note that this will create invalid patterns in cases such as:

    my $x = qr/a|b/;  # "a|b"
    my $y = qr/c/i;   # "c"
    my $z = qr/$x$y/; # "a|bc"

There's no solution for this problem other than making the custom
engine understand a construct like C<(?:)>.

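To make the difference concrete, the sketch below wraps each pattern in
a plain non-capturing group before the pieces are concatenated. It is
only an illustration: C<sk_wrap> is a hypothetical helper, and it leaves
out the inline modifiers (such as the C</i> on the second pattern) that
a real C<wrapped> string would also have to carry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "(?:" pat ")" into out. Returns 1 on success, 0 if the
 * result did not fit into outlen bytes. Modifiers are ignored. */
static int
sk_wrap(char *out, size_t outlen, const char *pat)
{
    int n = snprintf(out, outlen, "(?:%s)", pat);
    return n >= 0 && (size_t)n < outlen;
}
```

Concatenating the wrapped forms of C<a|b> and C<c> gives
C<(?:a|b)(?:c)>, which keeps the alternation confined to the first
pattern, whereas the unwrapped forms give C<a|bc>.
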
The C<Perl_reg_stringify> function in F<regcomp.c> does the stringification
work.

=head2 C<seen_evals>

This stores the number of eval groups in the pattern. This is used for security
purposes when embedding compiled regexes into larger patterns with C<qr//>.

=head2 C<refcnt>

The number of times the structure is referenced. When this falls to 0 the
regexp is automatically freed by a call to C<pregfree>. This should be set to 1
in each engine's L</comp> routine.

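The life cycle this implies can be sketched with a toy structure. All
names below are invented for the example, and setting a C<freed> marker
takes the place of the call to C<pregfree> that perl itself would make.

```c
#include <assert.h>

/* Toy stand-in for a regexp: just the reference count, plus a
 * marker that records when the sketch would have freed it. */
typedef struct {
    int refcnt;
    int freed;
} sk_regexp_rc;

/* What a comp routine is expected to do: start the count at 1. */
static void
sk_rc_init(sk_regexp_rc *rx)
{
    rx->refcnt = 1;
    rx->freed  = 0;
}

/* Take an additional reference. */
static void
sk_rc_ref(sk_regexp_rc *rx)
{
    assert(rx->refcnt > 0);
    rx->refcnt++;
}

/* Drop a reference; at zero the regexp would be freed. */
static void
sk_rc_unref(sk_regexp_rc *rx)
{
    assert(rx->refcnt > 0);
    if (--rx->refcnt == 0)
        rx->freed = 1;   /* this is where pregfree would run */
}
```

Starting the count at 1 in the comp routine means the structure
survives until its first owner lets go of it.
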
=head1 HISTORY

Originally part of L<perlreguts>.

=head1 AUTHORS

Originally written by Yves Orton, expanded by E<AElig>var ArnfjE<ouml>rE<eth>
Bjarmason.

=head1 LICENSE

Copyright 2006 Yves Orton and 2007 E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason.

This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut