Commit | Line | Data |
---|---|---|
fed514bc KW |
1 | =for comment |
2 | The part of this file between =for mg_vtable.pl markers is auto | |
3 | generated by mg_vtable.pl; any changes there need to be made instead to | |
4 | mg_vtable.pl | |
5 | ||
a0d0e21e LW |
6 | =head1 NAME |
7 | ||
954c1994 | 8 | perlguts - Introduction to the Perl API |
a0d0e21e LW |
9 | |
10 | =head1 DESCRIPTION | |
11 | ||
b3b6085d | 12 | This document attempts to describe how to use the Perl API, as well as |
10e2eb10 FC |
13 | to provide some info on the basic workings of the Perl core. It is far |
14 | from complete and probably contains many errors. Please refer any | |
b3b6085d | 15 | questions or comments to the author below. |
a0d0e21e | 16 | |
0a753a76 | 17 | =head1 Variables |
18 | ||
5f05dabc | 19 | =head2 Datatypes |
a0d0e21e LW |
20 | |
21 | Perl has three typedefs that handle Perl's three main data types: | |
22 | ||
23 | SV Scalar Value | |
24 | AV Array Value | |
25 | HV Hash Value | |
26 | ||
d1b91892 | 27 | Each typedef has specific routines that manipulate the various data types. |
a0d0e21e | 28 | |
3f620621 | 29 | =for apidoc_section $AV |
63dbc4a9 | 30 | =for apidoc Ayh||AV |
3f620621 | 31 | =for apidoc_section $HV |
63dbc4a9 | 32 | =for apidoc Ayh||HV |
3f620621 | 33 | =for apidoc_section $SV |
63dbc4a9 KW |
34 | =for apidoc Ayh||SV |
35 | ||
a0d0e21e LW |
36 | =head2 What is an "IV"? |
37 | ||
954c1994 | 38 | Perl uses a special typedef IV which is a simple signed integer type that is |
5f05dabc | 39 | guaranteed to be large enough to hold a pointer (as well as an integer). |
954c1994 | 40 | Additionally, there is the UV, which is simply an unsigned IV. |
a0d0e21e | 41 | |
63dbc4a9 KW |
42 | Perl also uses several special typedefs to declare variables to hold |
43 | integers of (at least) a given size. | |
44 | Use I8, I16, I32, and I64 to declare a signed integer variable which has | |
45 | at least as many bits as the number in its name. These all evaluate to | |
46 | the native C type that is closest to the given number of bits, but no | |
47 | smaller than that number. For example, on many platforms, a C<short> is | |
48 | 16 bits long, and if so, I16 will evaluate to a C<short>. But on | |
49 | platforms where a C<short> isn't exactly 16 bits, Perl will use the | |
50 | smallest type that contains 16 bits or more. | |
51 | ||
52 | Use U8, U16, U32, and U64 to declare the corresponding unsigned integer | |
53 | types. | |
54 | ||
55 | If the platform doesn't support 64-bit integers, both I64 and U64 will | |
56 | be undefined. Use IV and UV to declare the largest practicable integer | |
57 | types, and C<L<perlapi/WIDEST_UTYPE>> for the widest unsigned type | |
58 | available, though it may not be usable in all circumstances. | |
59 | ||
60 | A numeric constant can be specified with L<perlapi/C<INT16_C>>, | |
61 | L<perlapi/C<UINTMAX_C>>, and similar. | |
62 | ||
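As a rough illustration of the "at least N bits" idea, standard C99 offers close analogues in C<< <stdint.h> >>. The sketch below uses only standard C, not the Perl headers. Pairing C<I16> with C<int_least16_t> and C<IV> with C<intptr_t> is an analogy assumed for illustration, not a statement of how Perl defines its types, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Storage bits occupied by an object of the given size. */
unsigned bits_of(size_t size_in_bytes)
{
    return (unsigned)(size_in_bytes * CHAR_BIT);
}

/* int_least16_t has at least 16 bits, possibly more on platforms
 * without an exact 16-bit type (the role the text gives to I16). */
int_least16_t small_constant(void)
{
    return INT16_C(12345);
}

/* UINTMAX_C builds a constant of the widest unsigned type. */
uintmax_t wide_constant(void)
{
    return UINTMAX_C(4294967295);
}

/* intptr_t is the standard integer type that can round-trip a pointer;
 * the text describes IV as offering a similar guarantee. */
int pointer_round_trips(void *p)
{
    intptr_t as_integer = (intptr_t)p;
    return (void *)as_integer == p;
}
```

Inside the Perl core or an extension you would use the Perl typedefs themselves, since they also account for platforms that lack the C99 header.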
3f620621 | 63 | =for apidoc_section $integer |
1607e393 KW |
64 | =for apidoc Ayh ||IV |
65 | =for apidoc_item ||I8 | |
63dbc4a9 KW |
66 | =for apidoc_item ||I16 |
67 | =for apidoc_item ||I32 | |
68 | =for apidoc_item ||I64 | |
63dbc4a9 | 69 | |
1607e393 KW |
70 | =for apidoc Ayh ||UV |
71 | =for apidoc_item ||U8 | |
63dbc4a9 KW |
72 | =for apidoc_item ||U16 |
73 | =for apidoc_item ||U32 | |
74 | =for apidoc_item ||U64 | |
a0d0e21e | 75 | |
54310121 | 76 | =head2 Working with SVs |
a0d0e21e | 77 | |
20dbd849 NC |
78 | An SV can be created and loaded with one command. There are five types of |
79 | values that can be loaded: an integer value (IV), an unsigned integer | |
80 | value (UV), a double (NV), a string (PV), and another scalar (SV). | |
61984ee1 KW |
81 | ("PV" stands for "Pointer Value". You might think that it is misnamed |
82 | because it is described as pointing only to strings. However, it is | |
3ee1a09c | 83 | possible to have it point to other things. For example, it could point |
d6605d24 | 84 | to an array of UVs. But, |
61984ee1 KW |
85 | using it for non-strings requires care, as the underlying assumption of |
86 | much of the internals is that PVs are just for strings. Often, for | |
6602b933 | 87 | example, a trailing C<NUL> is tacked on automatically. The non-string use |
61984ee1 | 88 | is documented only in this paragraph.) |
a0d0e21e | 89 | |
7cc7ada7 | 90 | =for apidoc_section $floating |
63dbc4a9 KW |
91 | =for apidoc Ayh||NV |
92 | ||
20dbd849 | 93 | The seven routines are: |
a0d0e21e LW |
94 | |
95 | SV* newSViv(IV); | |
20dbd849 | 96 | SV* newSVuv(UV); |
a0d0e21e | 97 | SV* newSVnv(double); |
06f6df17 RGS |
98 | SV* newSVpv(const char*, STRLEN); |
99 | SV* newSVpvn(const char*, STRLEN); | |
46fc3d4c | 100 | SV* newSVpvf(const char*, ...); |
a0d0e21e LW |
101 | SV* newSVsv(SV*); |
102 | ||
e613617c | 103 | C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in |
06f6df17 RGS |
104 | F<config.h>) guaranteed to be large enough to represent the size of |
105 | any string that perl can handle. | |
106 | ||
7cc7ada7 | 107 | =for apidoc_section $string |
63dbc4a9 KW |
108 | =for apidoc Ayh||STRLEN |
109 | ||
3bf17896 | 110 | In the unlikely case of an SV requiring more complex initialization, you |
06f6df17 RGS |
111 | can create an empty SV with newSV(len). If C<len> is 0 an empty SV of |
112 | type NULL is returned, else an SV of type PV is returned with len + 1 (for | |
6602b933 | 113 | the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases |
da8c5729 | 114 | the SV has the undef value. |
20dbd849 | 115 | |
06f6df17 | 116 | SV *sv = newSV(0); /* no storage allocated */ |
a9b0660e KW |
117 | SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage |
118 | * allocated */ | |
20dbd849 | 119 | |
06f6df17 | 120 | To change the value of an I<already-existing> SV, there are eight routines: |
a0d0e21e LW |
121 | |
122 | void sv_setiv(SV*, IV); | |
deb3007b | 123 | void sv_setuv(SV*, UV); |
a0d0e21e | 124 | void sv_setnv(SV*, double); |
08105a92 | 125 | void sv_setpv(SV*, const char*); |
06f6df17 | 126 | void sv_setpvn(SV*, const char*, STRLEN); |
46fc3d4c | 127 | void sv_setpvf(SV*, const char*, ...); |
a9b0660e | 128 | void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, |
03a22d83 | 129 | SV **, Size_t, bool *); |
a0d0e21e LW |
130 | void sv_setsv(SV*, SV*); |
131 | ||
132 | Notice that you can choose to specify the length of the string to be | |
9da1e3b5 MUN |
133 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
134 | allow Perl to calculate the length by using C<sv_setpv> or by specifying | |
135 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will | |
136 | determine the string's length by using C<strlen>, which depends on the | |
6602b933 | 137 | string terminating with a C<NUL> character, and not otherwise containing |
a9b0660e | 138 | NULs. |
9abd00ed GS |
139 | |
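The C<strlen> caveat can be seen in plain C, without the Perl headers. In this sketch the buffer and function names are hypothetical. The point is only that a length computed with C<strlen> stops at the first C<NUL>, which is why data with embedded NULs needs an explicit length, as C<sv_setpvn> and C<newSVpvn> accept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Five bytes of payload with a NUL in the middle; the compiler appends
 * a sixth, terminating NUL. */
static const char payload[] = "ab\0cd";

/* Length as strlen sees it: counting stops at the first NUL. */
size_t length_by_strlen(void)
{
    return strlen(payload);
}

/* Length as the caller knows it: every byte except the terminator. */
size_t length_by_caller(void)
{
    return sizeof(payload) - 1;
}
```

Here C<strlen> reports 2 while the caller knows the data is 5 bytes long, so only the explicit-length form would preserve the trailing C<cd>.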
140 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the | |
141 | formatted output becomes the value. | |
142 | ||
328bf373 | 143 | C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
9abd00ed GS |
144 | either a pointer to a variable argument list or the address and length of |
145 | an array of SVs. The last argument points to a boolean; on return, if that | |
146 | boolean is true, then locale-specific information has been used to format | |
c2611fb3 | 147 | the string, and the string's contents are therefore untrustworthy (see |
9abd00ed GS |
148 | L<perlsec>). This pointer may be NULL if that information is not |
149 | important. Note that this function requires you to specify the length of | |
150 | the format. | |
151 | ||
9da1e3b5 | 152 | The C<sv_set*()> functions are not generic enough to operate on values |
5a0de581 | 153 | that have "magic". See L</Magic Virtual Tables> later in this document. |
a0d0e21e | 154 | |
6602b933 KW |
155 | All SVs that contain strings should be terminated with a C<NUL> character. |
156 | If a string is not C<NUL>-terminated there is a risk of | |
5f05dabc | 157 | core dumps and corruptions from code which passes the string to C |
6602b933 KW |
158 | functions or system calls which expect a C<NUL>-terminated string. |
159 | Perl's own functions typically add a trailing C<NUL> for this reason. | |
5f05dabc | 160 | Nevertheless, you should be very careful when you pass a string stored |
161 | in an SV to a C function or system call. | |
162 | ||
3c3f883d FG |
163 | To access the actual value that an SV points to, Perl's API exposes |
164 | several macros that coerce the actual scalar type into an IV, UV, double, | |
165 | or string: | |
166 | ||
167 | =over | |
168 | ||
169 | =item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>) | |
170 | ||
171 | =item * C<SvNV(SV*)> (C<double>) | |
172 | ||
173 | =item * Strings are a bit complicated: | |
174 | ||
175 | =over | |
176 | ||
177 | =item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)> | |
178 | ||
179 | If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>. | |
180 | ||
181 | This is suitable for Perl strings that represent bytes. | |
182 | ||
183 | =item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)> | |
184 | ||
185 | If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>. | |
186 | ||
187 | This is suitable for Perl strings that represent characters. | |
188 | ||
189 | B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant, | |
190 | which means that if the SV contains non-Unicode code points (e.g., | |
191 | 0x110000), then the result may contain extensions over valid UTF-8. | |
192 | See L<perlapi/is_strict_utf8_string> for some methods Perl gives | |
193 | you to check the UTF-8 validity of these macros' returns. | |
194 | ||
195 | =item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)> | |
196 | to fetch the SV's raw internal buffer. This is tricky, though; if your Perl | |
197 | string | |
198 | is C<"\xff\xff">, then depending on the SV's internal encoding you might get | |
199 | back a 2-byte B<OR> a 4-byte C<char*>. | |
200 | Moreover, if it's the 4-byte string, that could come from either Perl | |
201 | C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored | |
202 | as raw octets. To differentiate between these you B<MUST> look up the | |
203 | SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string | |
204 | is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be | |
205 | off). | |
206 | ||
207 | B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or | |
208 | similarly-named macros I<without> looking up the SV's UTF8 bit is | |
209 | almost certainly a bug if non-ASCII input is allowed. | |
210 | ||
211 | When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies | |
212 | here as for C<SvPVutf8>. | |
213 | ||
214 | =back | |
215 | ||
216 | (See L</How do I pass a Perl string to a C library?> for more details.) | |
217 | ||
218 | In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned | |
219 | is placed into the | |
220 | variable C<len> (these are macros, so you do I<not> use C<&len>). If you do | |
221 | not care what the length of the data is, use C<SvPVbyte_nolen>, | |
222 | C<SvPVutf8_nolen>, or C<SvPV_nolen> instead. | |
223 | The global variable C<PL_na> can also be given to | |
224 | C<SvPVbyte>/C<SvPVutf8>/C<SvPV> | |
225 | in this case. But that can be quite inefficient because C<PL_na> must | |
1fa8b10d JD |
226 | be accessed in thread-local storage in threaded Perl. In any case, remember |
227 | that Perl allows arbitrary strings of data that may both contain NULs and | |
6602b933 | 228 | might not be terminated by a C<NUL>. |
a0d0e21e | 229 | |
3c3f883d | 230 | Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len), |
10e2eb10 FC |
231 | len);>. It might work with your |
232 | compiler, but it won't work for everyone. | |
ce2f5d8f KA |
233 | Break this sort of statement up into separate assignments: |
234 | ||
1aa6ea50 JC |
235 | SV *s; |
236 | STRLEN len; | |
61955433 | 237 | char *ptr; |
3c3f883d | 238 | ptr = SvPVbyte(s, len); |
1aa6ea50 | 239 | foo(ptr, len); |
ce2f5d8f | 240 | |
3c3f883d FG |
241 | =back |
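
The 2-byte and 4-byte results quoted above for C<"\xff\xff"> follow from UTF-8 encoding each code point from 0x80 to 0x7FF as two bytes. The following plain-C sketch does not use the Perl API. C<encode_utf8_below_0x800> is a hypothetical helper that handles only code points below 0x800, which is enough to reproduce the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode code points below 0x800 as UTF-8.
 * Writes to out and returns the number of bytes produced; out must
 * have room for two bytes per code point. */
size_t encode_utf8_below_0x800(const unsigned *code_points, size_t count,
                               unsigned char *out)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned cp = code_points[i];
        assert(cp < 0x800);
        if (cp < 0x80) {
            out[used++] = (unsigned char)cp;                    /* one byte */
        } else {
            out[used++] = (unsigned char)(0xC0 | (cp >> 6));    /* lead byte */
            out[used++] = (unsigned char)(0x80 | (cp & 0x3F));  /* continuation */
        }
    }
    return used;
}
```

For the two code points 0xFF, 0xFF this yields the four bytes C3 BF C3 BF, the same octets as the byte string C<"\xc3\xbf\xc3\xbf">, which is why the UTF8 flag is needed to tell the two apart.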
242 | ||
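The one-line call is unsafe because C does not order the evaluation of function arguments relative to one another, while macros such as C<SvPVbyte> assign to C<len> as a side effect. The plain-C sketch below imitates that shape with a hypothetical C<GET_BYTES> macro and a hypothetical C<struct toy_string>. It is not the real macro, only a model of the safe two-step pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV holding a byte string. */
struct toy_string {
    const char *bytes;
    size_t      length;
};

/* Shaped like SvPVbyte: yields the pointer and, as a side effect,
 * stores the length in the lvalue passed as the second argument. */
#define GET_BYTES(s, len) ((len) = (s)->length, (s)->bytes)

/* Hypothetical consumer that needs both the pointer and the length. */
size_t count_byte(const char *ptr, size_t len, char wanted)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; i++)
        if (ptr[i] == wanted)
            hits++;
    return hits;
}

/* Safe pattern: fetch first, so len is assigned before it is read. */
size_t count_byte_in(const struct toy_string *s, char wanted)
{
    size_t len;
    const char *ptr = GET_BYTES(s, len);
    return count_byte(ptr, len, wanted);
}
```

Writing C<count_byte(GET_BYTES(s, len), len, wanted)> instead would read C<len> at a point that is not ordered relative to the assignment, so a compiler may pass the old, uninitialised value.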
07fa94a1 | 243 | If you want to know if the scalar value is TRUE, you can use: |
a0d0e21e LW |
244 | |
245 | SvTRUE(SV*) | |
246 | ||
247 | Although Perl will automatically grow strings for you, if you need to force | |
248 | Perl to allocate more memory for your SV, you can use the macro | |
249 | ||
250 | SvGROW(SV*, STRLEN newlen) | |
251 | ||
252 | which will determine if more memory needs to be allocated. If so, it will | |
253 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not | |
5f05dabc | 254 | decrease, the allocated memory of an SV and that it does not automatically |
6602b933 | 255 | add space for the trailing C<NUL> byte (perl's own string functions typically do |
8ebc5c01 | 256 | C<SvGROW(sv, len + 1)>). |
a0d0e21e | 257 | |
21134f66 | 258 | If you want to write to an existing SV's buffer and set its value to a |
3c3f883d | 259 | string, use SvPVbyte_force() or one of its variants to force the SV to be |
21134f66 TC |
260 | a PV. This will remove any of various types of non-stringness from |
261 | the SV while preserving the content of the SV in the PV. This can be | |
262 | used, for example, to append data from an API function to a buffer | |
263 | without extra copying: | |
264 | ||
265 | (void)SvPVbyte_force(sv, len); | |
266 | s = SvGROW(sv, len + needlen + 1); | |
267 | /* something that modifies up to needlen bytes at s+len, but | |
268 | modifies newlen bytes | |
269 | eg. newlen = read(fd, s + len, needlen); | |
270 | ignoring errors for these examples | |
271 | */ | |
272 | s[len + newlen] = '\0'; | |
273 | SvCUR_set(sv, len + newlen); | |
274 | SvUTF8_off(sv); | |
275 | SvSETMAGIC(sv); | |
276 | ||
277 | If you already have the data in memory or if you want to keep your | |
278 | code simple, you can use one of the sv_cat*() variants, such as | |
279 | sv_catpvn(). If you want to insert anywhere in the string you can use | |
280 | sv_insert() or sv_insert_flags(). | |
281 | ||
282 | If you don't need the existing content of the SV, you can avoid some | |
283 | copying with: | |
284 | ||
5b1fede8 | 285 | SvPVCLEAR(sv); |
21134f66 TC |
286 | s = SvGROW(sv, needlen + 1); |
287 | /* something that modifies up to needlen bytes at s, but modifies | |
288 | newlen bytes | |
889339bf | 289 | eg. newlen = read(fd, s, needlen); |
21134f66 TC |
290 | */ |
291 | s[newlen] = '\0'; | |
292 | SvCUR_set(sv, newlen); | |
293 | SvPOK_only(sv); /* also clears SVf_UTF8 */ | |
294 | SvSETMAGIC(sv); | |
295 | ||
296 | Again, if you already have the data in memory or want to avoid the | |
297 | complexity of the above, you can use sv_setpvn(). | |
298 | ||
299 | If you have a buffer allocated with Newx() and want to set that as the | |
300 | SV's value, you can use sv_usepvn_flags(). That has some requirements | |
301 | if you want to avoid perl re-allocating the buffer to fit the trailing | |
302 | NUL: | |
303 | ||
304 | Newx(buf, somesize+1, char); | |
305 | /* ... fill in buf ... */ | |
306 | buf[somesize] = '\0'; | |
307 | sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL); | |
308 | /* buf now belongs to perl, don't release it */ | |
309 | ||
a0d0e21e LW |
310 | If you have an SV and want to know what kind of data Perl thinks is stored |
311 | in it, you can use the following macros to check the type of SV you have. | |
312 | ||
313 | SvIOK(SV*) | |
314 | SvNOK(SV*) | |
315 | SvPOK(SV*) | |
316 | ||
dcab5185 TC |
317 | Be aware that retrieving the numeric value of an SV can set IOK or NOK |
318 | on that SV, even when the SV started as a string. Prior to Perl | |
319 | 5.36.0 retrieving the string value of an integer could set POK, but | |
320 | this can no longer occur. From 5.36.0 this can be used to distinguish | |
321 | the original representation of an SV and is intended to make life | |
322 | simpler for serializers: | |
323 | ||
324 | /* references handled elsewhere */ | |
325 | if (SvIsBOOL(sv)) { | |
326 | /* originally boolean */ | |
327 | ... | |
328 | } | |
329 | else if (SvPOK(sv)) { | |
330 | /* originally a string */ | |
331 | ... | |
332 | } | |
333 | else if (SvNIOK(sv)) { | |
334 | /* originally numeric */ | |
335 | ... | |
336 | } | |
337 | else { | |
338 | /* something special or undef */ | |
339 | } | |
340 | ||
a0d0e21e LW |
341 | You can get and set the current length of the string stored in an SV with |
342 | the following macros: | |
343 | ||
344 | SvCUR(SV*) | |
345 | SvCUR_set(SV*, STRLEN val) | |
346 | ||
cb1a09d0 AD |
347 | You can also get a pointer to the end of the string stored in the SV |
348 | with the macro: | |
349 | ||
350 | SvEND(SV*) | |
351 | ||
352 | But note that these last three macros are valid only if C<SvPOK()> is true. | |
a0d0e21e | 353 | |
d1b91892 AD |
354 | If you want to append something to the end of the string stored in an C<SV*>, | |
355 | you can use the following functions: | |
356 | ||
08105a92 | 357 | void sv_catpv(SV*, const char*); |
e65f3abd | 358 | void sv_catpvn(SV*, const char*, STRLEN); |
46fc3d4c | 359 | void sv_catpvf(SV*, const char*, ...); |
a9b0660e KW |
360 | void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, |
361 | Size_t, bool *); | |
d1b91892 AD |
362 | void sv_catsv(SV*, SV*); |
363 | ||
364 | The first function calculates the length of the string to be appended by | |
365 | using C<strlen>. In the second, you specify the length of the string | |
46fc3d4c | 366 | yourself. The third function processes its arguments like C<sprintf> and |
9abd00ed GS |
367 | appends the formatted output. The fourth function works like C<vsprintf>. |
368 | You can specify the address and length of an array of SVs instead of the | |
10e2eb10 FC |
369 | va_list argument. The fifth function |
370 | extends the string stored in the first | |
9abd00ed GS |
371 | SV with the string stored in the second SV. It also forces the second SV |
372 | to be interpreted as a string. | |
373 | ||
374 | The C<sv_cat*()> functions are not generic enough to operate on values that | |
5a0de581 | 375 | have "magic". See L</Magic Virtual Tables> later in this document. |
d1b91892 | 376 | |
a0d0e21e LW |
377 | If you know the name of a scalar variable, you can get a pointer to its SV |
378 | by using the following: | |
379 | ||
64ace3f8 | 380 | SV* get_sv("package::varname", 0); |
a0d0e21e LW |
381 | |
382 | This returns NULL if the variable does not exist. | |
383 | ||
d1b91892 | 384 | If you want to know if this variable (or any other SV) is actually C<defined>, |
a0d0e21e LW |
385 | you can call: |
386 | ||
387 | SvOK(SV*) | |
388 | ||
06f6df17 | 389 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. |
9adebda4 | 390 | |
10e2eb10 FC |
391 | Its address can be used whenever an C<SV*> is needed. Make sure that |
392 | you don't try to compare a random sv with C<&PL_sv_undef>. For example | |
9adebda4 SB |
393 | when interfacing Perl code, it'll work correctly for: |
394 | ||
395 | foo(undef); | |
396 | ||
397 | But won't work when called as: | |
398 | ||
399 | $x = undef; | |
400 | foo($x); | |
401 | ||
402 | So, to repeat: always use SvOK() to check whether an sv is defined. | |
403 | ||
404 | Also you have to be careful when using C<&PL_sv_undef> as a value in | |
5a0de581 | 405 | AVs or HVs (see L</AVs, HVs and undefined values>). |
a0d0e21e | 406 | |
06f6df17 RGS |
407 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain |
408 | boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their | |
409 | addresses can be used whenever an C<SV*> is needed. | |
a0d0e21e | 410 | |
9cde0e7f | 411 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
a0d0e21e LW |
412 | Take this code: |
413 | ||
414 | SV* sv = (SV*) 0; | |
415 | if (I-am-to-return-a-real-value) { | |
416 | sv = sv_2mortal(newSViv(42)); | |
417 | } | |
418 | sv_setsv(ST(0), sv); | |
419 | ||
420 | This code tries to return a new SV (which contains the value 42) if it should | |
04343c6d | 421 | return a real value, or undef otherwise. Instead it has returned a NULL |
a0d0e21e | 422 | pointer which, somewhere down the line, will cause a segmentation violation, |
06f6df17 RGS |
423 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the |
424 | first line and all will be well. | |
a0d0e21e LW |
425 | |
426 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this | |
5a0de581 | 427 | call is not necessary (see L</Reference Counts and Mortality>). |
a0d0e21e | 428 | |
94dde4fb SC |
429 | =head2 Offsets |
430 | ||
431 | Perl provides the function C<sv_chop> to efficiently remove characters | |
432 | from the beginning of a string; you give it an SV and a pointer to | |
da75cd15 | 433 | somewhere inside the PV, and it discards everything before the |
10e2eb10 | 434 | pointer. The efficiency comes by means of a little hack: instead of |
94dde4fb SC |
435 | actually removing the characters, C<sv_chop> sets the flag C<OOK> |
436 | (offset OK) to signal to other functions that the offset hack is in | |
883bb8c0 KW |
437 | effect, and it moves the PV pointer (called C<SvPVX>) forward |
438 | by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN> | |
439 | accordingly. (A portion of the space between the old and new PV | |
440 | pointers is used to store the count of chopped bytes.) | |
94dde4fb SC |
441 | |
442 | Hence, at this point, the start of the buffer that we allocated lives | |
443 | at C<SvPVX(sv)> minus the number of chopped bytes (which C<SvOOK_offset> | |
444 | retrieves), and the PV pointer is pointing into the middle of this allocated storage. | |
445 | ||
f942a0df FC |
446 | This is best demonstrated by example. Normally copy-on-write will prevent |
447 | the substitution operator from using this hack, but if you can craft a | |
448 | string for which copy-on-write is not possible, you can see it in play. In | |
449 | the current implementation, the final byte of a string buffer is used as a | |
450 | copy-on-write reference count. If the buffer is not big enough, then | |
451 | copy-on-write is skipped. First have a look at an empty string: | |
452 | ||
453 | % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a' | |
454 | SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390 | |
455 | REFCNT = 1 | |
456 | FLAGS = (POK,pPOK) | |
457 | PV = 0x7ffb7bc05b50 ""\0 | |
458 | CUR = 0 | |
459 | LEN = 10 | |
460 | ||
461 | Notice here the LEN is 10. (It may differ on your platform.) Extend the | |
462 | length of the string to one less than 10, and do a substitution: | |
94dde4fb | 463 | |
e46aa1dd KW |
464 | % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \ |
465 | Dump($a)' | |
466 | SV = PV(0x7ffa04008a70) at 0x7ffa04030390 | |
467 | REFCNT = 1 | |
468 | FLAGS = (POK,OOK,pPOK) | |
469 | OFFSET = 1 | |
470 | PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0 | |
471 | CUR = 8 | |
472 | LEN = 9 | |
94dde4fb | 473 | |
f942a0df | 474 | Here the number of bytes chopped off (1) is shown next as the OFFSET. The |
94dde4fb SC |
475 | portion of the string between the "real" and the "fake" beginnings is |
476 | shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect | |
f942a0df FC |
477 | the fake beginning, not the real one. (The first character of the string |
478 | buffer happens to have changed to "\1" here, not "1", because the current | |
479 | implementation stores the offset count in the string buffer. This is | |
480 | subject to change.) | |
94dde4fb | 481 | |
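The bookkeeping can be modelled in a few lines of plain C. This is a toy sketch, not Perl's implementation: C<struct toy_pv> and its fields are hypothetical stand-ins for C<SvPVX>, C<SvCUR>, C<SvLEN> and the stored offset, and the offset is kept in a struct field rather than inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a string SV with the offset hack. */
struct toy_pv {
    char  *pv;      /* visible start of the string (like SvPVX) */
    size_t cur;     /* bytes in use, excluding the NUL (like SvCUR) */
    size_t len;     /* bytes allocated from pv onward (like SvLEN) */
    size_t offset;  /* bytes chopped so far; nonzero plays the role of OOK */
};

/* Allocate a buffer of `capacity` bytes and copy `text` into it. */
int toy_init(struct toy_pv *s, const char *text, size_t capacity)
{
    size_t n = strlen(text);
    if (capacity < n + 1 || (s->pv = malloc(capacity)) == NULL)
        return -1;
    memcpy(s->pv, text, n + 1);
    s->cur = n;
    s->len = capacity;
    s->offset = 0;
    return 0;
}

/* Discard everything before `new_start` without moving any bytes. */
void toy_chop(struct toy_pv *s, char *new_start)
{
    size_t delta = (size_t)(new_start - s->pv);
    assert(delta <= s->cur);
    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The real start of the allocation is only needed when freeing. */
void toy_free(struct toy_pv *s)
{
    free(s->pv - s->offset);
    s->pv = NULL;
}
```

Starting from C<"123456789"> in a 10-byte buffer and chopping one byte leaves an offset of 1, a current length of 8 and an allocated length of 9, the same figures as the C<Dump> output above.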
fe854a6f | 482 | Something similar to the offset hack is performed on AVs to enable |
319cef53 SC |
483 | efficient shifting and splicing off the beginning of the array; while |
484 | C<AvARRAY> points to the first element in the array that is visible from | |
10e2eb10 | 485 | Perl, C<AvALLOC> points to the real start of the C array. These are |
319cef53 | 486 | usually the same, but a C<shift> operation can be carried out by |
6de131f0 | 487 | increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>. |
319cef53 | 488 | Again, the location of the real start of the C array only comes into |
10e2eb10 | 489 | play when freeing the array. See C<av_shift> in F<av.c>. |
319cef53 | 490 | |
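The array variant can be sketched the same way in plain C. Again this is only a model under stated assumptions: C<struct toy_av> and its fields are hypothetical names mirroring C<AvALLOC>, C<AvARRAY>, C<AvFILL> and C<AvMAX>, and the elements are plain C<int>s rather than C<SV*>s. F<av.c> remains the reference for the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_av {
    int      *alloc;  /* real start of the C array (like AvALLOC) */
    int      *array;  /* first visible element (like AvARRAY) */
    ptrdiff_t fill;   /* highest visible index, -1 if empty (like AvFILL) */
    ptrdiff_t max;    /* highest usable index from array (like AvMAX) */
};

int toy_av_init(struct toy_av *av, const int *values, size_t count)
{
    size_t slots = count ? count : 1;
    av->alloc = malloc(slots * sizeof *av->alloc);
    if (av->alloc == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        av->alloc[i] = values[i];
    av->array = av->alloc;
    av->fill = (ptrdiff_t)count - 1;
    av->max = (ptrdiff_t)slots - 1;
    return 0;
}

/* Shift: step array forward and shrink fill and max; nothing is copied.
 * The caller must make sure the array is not empty. */
int toy_av_shift(struct toy_av *av)
{
    assert(av->fill >= 0);
    int first = *av->array;
    av->array++;
    av->fill--;
    av->max--;
    return first;
}

void toy_av_free(struct toy_av *av)
{
    free(av->alloc);  /* the real start, not the shifted pointer */
    av->alloc = av->array = NULL;
}
```

After one shift, C<array> sits one element past C<alloc>, and only C<toy_av_free> ever looks at C<alloc> again.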
6ef63541 KW |
491 | =for apidoc_section $AV |
492 | =for apidoc Amh||AvALLOC|AV* av | |
493 | ||
d1b91892 | 494 | =head2 What's Really Stored in an SV? |
a0d0e21e LW |
495 | |
496 | Recall that the usual method of determining the type of scalar you have is | |
5f05dabc | 497 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
d1b91892 | 498 | these macros will usually return TRUE, and calling the C<Sv*V> |
a0d0e21e LW |
499 | macros will do the appropriate conversion of string to integer/double or |
500 | integer/double to string. | |
501 | ||
502 | If you I<really> need to know if you have an integer, double, or string | |
503 | pointer in an SV, you can use the following three macros instead: | |
504 | ||
505 | SvIOKp(SV*) | |
506 | SvNOKp(SV*) | |
507 | SvPOKp(SV*) | |
508 | ||
509 | These will tell you if you truly have an integer, double, or string pointer | |
d1b91892 | 510 | stored in your SV. The "p" stands for private. |
a0d0e21e | 511 | |
da8c5729 | 512 | There are various ways in which the private and public flags may differ. |
9090718a FC |
513 | For example, in perl 5.16 and earlier a tied SV may have a valid |
514 | underlying value in the IV slot (so SvIOKp is true), but the data | |
515 | should be accessed via the FETCH routine rather than directly, | |
516 | so SvIOK is false. (In perl 5.18 onwards, tied scalars use | |
517 | the flags the same way as untied scalars.) Another is when | |
d7f8936a | 518 | numeric conversion has occurred and precision has been lost: only the |
10e2eb10 | 519 | private flag is set on 'lossy' values. So when an NV is converted to an |
9e9796d6 JH |
520 | IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be. | |
521 | ||
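The lossy-conversion case can be modelled in plain C. This is a simplified sketch and not Perl's actual conversion code: the C<TOY_*> flags and C<struct toy_num> are hypothetical, and the only rule modelled is the one stated above, namely that an inexact conversion sets the private integer flag without the public one.

```c
#include <assert.h>

enum {
    TOY_NOK  = 1 << 0,  /* public: the double slot is valid */
    TOY_NOKp = 1 << 1,  /* private counterpart of TOY_NOK */
    TOY_IOK  = 1 << 2,  /* public: the integer slot is an exact value */
    TOY_IOKp = 1 << 3   /* private: the integer slot is filled, maybe lossy */
};

struct toy_num {
    double   nv;
    long     iv;
    unsigned flags;
};

void toy_set_nv(struct toy_num *n, double value)
{
    n->nv = value;
    n->iv = 0;
    n->flags = TOY_NOK | TOY_NOKp;
}

/* Fill the integer slot from the double. Only values that certainly
 * fit in a long are accepted, to keep the cast well defined. */
long toy_get_iv(struct toy_num *n)
{
    assert(n->nv > -1e9 && n->nv < 1e9);
    n->iv = (long)n->nv;
    n->flags |= TOY_IOKp;
    if ((double)n->iv == n->nv)
        n->flags |= TOY_IOK;  /* exact conversion: public flag as well */
    return n->iv;
}
```

Converting 3.75 leaves only the private integer flag set alongside both double flags, while converting 42.0 sets the public integer flag too.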
07fa94a1 | 522 | In general, though, it's best to use the C<Sv*V> macros. |
a0d0e21e | 523 | |
54310121 | 524 | =head2 Working with AVs |
a0d0e21e | 525 | |
e69d7f8b RL |
526 | There are two main, longstanding ways to create and load an AV. The first |
527 | method creates an empty AV: | |
a0d0e21e LW |
528 | |
529 | AV* newAV(); | |
530 | ||
54310121 | 531 | The second method both creates the AV and initially populates it with SVs: |
a0d0e21e | 532 | |
c70927a6 | 533 | AV* av_make(SSize_t num, SV **ptr); |
a0d0e21e | 534 | |
5f05dabc | 535 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
54310121 | 536 | AV has been created, the SVs can be destroyed, if so desired. |
a0d0e21e | 537 | |
e69d7f8b RL |
538 | Perl v5.36 added two new ways to create an AV and allocate a SV** array |
539 | without populating it. These are more efficient than a newAV() followed by an | |
540 | av_extend(). | |
541 | ||
542 | /* Creates but does not initialize (Zero) the SV** array */ | |
543 | AV *av = newAV_alloc_x(1); | |
544 | /* Creates and does initialize (Zero) the SV** array */ | |
545 | AV *av = newAV_alloc_xz(1); | |
546 | ||
547 | The numerical argument refers to the number of array elements to allocate, not | |
548 | an array index, and must be >0. The first form must only ever be used when all | |
549 | elements will be initialized before any read occurs. Reading a non-initialized | |
550 | SV* - i.e. treating a random memory address as a SV* - is a serious bug. | |
551 | ||
da8c5729 | 552 | Once the AV has been created, the following operations are possible on it: |
a0d0e21e LW |
553 | |
554 | void av_push(AV*, SV*); | |
555 | SV* av_pop(AV*); | |
556 | SV* av_shift(AV*); | |
c70927a6 | 557 | void av_unshift(AV*, SSize_t num); |
a0d0e21e LW |
558 | |
559 | These should be familiar operations, with the exception of C<av_unshift>. | |
560 | This routine adds C<num> elements at the front of the array with the C<undef> | |
561 | value. You must then use C<av_store> (described below) to assign values | |
562 | to these new elements. | |
563 | ||
564 | Here are some other functions: | |
565 | ||
c70927a6 FC |
566 | SSize_t av_top_index(AV*); |
567 | SV** av_fetch(AV*, SSize_t key, I32 lval); | |
568 | SV** av_store(AV*, SSize_t key, SV* val); | |
a0d0e21e | 569 | |
dab460cd | 570 | The C<av_top_index> function returns the highest index value in an array (just |
5f05dabc | 571 | like $#array in Perl). If the array is empty, -1 is returned. The |
572 | C<av_fetch> function returns the value at index C<key>, but if C<lval> | |
573 | is non-zero, then C<av_fetch> will store an undef value at that index. | |
04343c6d GS |
574 | The C<av_store> function stores the value C<val> at index C<key>, and does |
575 | not increment the reference count of C<val>. Thus the caller is responsible | |
576 | for taking care of that, and if C<av_store> returns NULL, the caller will | |
577 | have to decrement the reference count to avoid a memory leak. Note that | |
578 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their | |
579 | return value. | |
d1b91892 | 580 | |
da8c5729 MH |
581 | A few more: |
582 | ||
a0d0e21e | 583 | void av_clear(AV*); |
a0d0e21e | 584 | void av_undef(AV*); |
c70927a6 | 585 | void av_extend(AV*, SSize_t key); |
5f05dabc | 586 | |
587 | The C<av_clear> function deletes all the elements in the AV* array, but | |
588 | does not actually delete the array itself. The C<av_undef> function will | |
589 | delete all the elements in the array plus the array itself. The | |
adc882cf GS |
590 | C<av_extend> function extends the array so that it contains at least C<key+1> |
591 | elements. If C<key+1> is less than the currently allocated length of the array, | |
592 | then nothing is done. | |
a0d0e21e LW |
593 | |
594 | If you know the name of an array variable, you can get a pointer to its AV | |
595 | by using the following: | |
596 | ||
cbfd0a87 | 597 | AV* get_av("package::varname", 0); |
a0d0e21e LW |
598 | |
599 | This returns NULL if the variable does not exist. | |
600 | ||
5a0de581 | 601 | See L</Understanding the Magic of Tied Hashes and Arrays> for more |
04343c6d GS |
602 | information on how to use the array access functions on tied arrays. |
603 | ||
e69d7f8b RL |
604 | =head3 More efficient working with new or vanilla AVs |
605 | ||
606 | Perl v5.36 and v5.38 introduced streamlined, inlined versions of some | |
607 | functions: | |
608 | ||
609 | =over | |
610 | ||
611 | =item * C<av_store_simple> | |
612 | ||
613 | =item * C<av_fetch_simple> | |
614 | ||
615 | =item * C<av_push_simple> | |
616 | ||
617 | =back | |
618 | ||
619 | These are drop-in replacements, but can only be used on straightforward | |
620 | AVs that meet the following criteria: | |
621 | ||
622 | =over | |
623 | ||
624 | =item * are not magical | |
625 | ||
626 | =item * are not readonly | |
627 | ||
628 | =item * are "real" (refcounted) AVs | |
629 | ||
630 | =item * have an av_top_index value > -2 | |
631 | ||
632 | =back | |
633 | ||
634 | AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and | |
635 | C<newAV_alloc_xz> are all compatible at the time of creation. It is | |
636 | only if they are declared readonly or unreal, have magic attached, or | |
637 | are otherwise configured unusually that they will stop being compatible. | |
638 | ||
639 | Note that some interpreter functions may attach magic to an AV as part | |
640 | of normal operations. It is therefore safest, unless you are sure of the | |
641 | lifecycle of an AV, to only use these new functions close to the point | |
642 | of AV creation. | |
643 | ||
54310121 | 644 | =head2 Working with HVs |
a0d0e21e LW |
645 | |
646 | To create an HV, you use the following routine: | |
647 | ||
648 | HV* newHV(); | |
649 | ||
da8c5729 | 650 | Once the HV has been created, the following operations are possible on it: |
a0d0e21e | 651 | |
08105a92 GS |
652 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
653 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); | |
a0d0e21e | 654 | |
5f05dabc | 655 | The C<klen> parameter is the length of the key being passed in (Note that |
656 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the | |
657 | length of the key). The C<val> argument contains the SV pointer to the | |
54310121 | 658 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
5f05dabc | 659 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
660 | indicates whether this fetch is actually a part of a store operation, in | |
661 | which case a new undefined value will be added to the HV with the supplied | |
662 | key and C<hv_fetch> will return as if the value had already existed. | |
a0d0e21e | 663 | |
5f05dabc | 664 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
665 | C<SV*>. To access the scalar value, you must first dereference the return | |
666 | value. However, you should check to make sure that the return value is | |
667 | not NULL before dereferencing it. | |
a0d0e21e | 668 | |
da8c5729 MH |
669 | The first of these two functions checks if a hash table entry exists, and the |
670 | second deletes it. | |
a0d0e21e | 671 | |
08105a92 GS |
672 | bool hv_exists(HV*, const char* key, U32 klen); |
673 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); | |
a0d0e21e | 674 | |
5f05dabc | 675 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
676 | create and return a mortal copy of the deleted value. | |
677 | ||
a0d0e21e LW |
678 | And more miscellaneous functions: |
679 | ||
680 | void hv_clear(HV*); | |
a0d0e21e | 681 | void hv_undef(HV*); |
5f05dabc | 682 | |
683 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash | |
684 | table but does not actually delete the hash table. The C<hv_undef> function deletes | |
685 | both the entries and the hash table itself. | |
a0d0e21e | 686 | |
a9b0660e | 687 | Perl keeps the actual data in a linked list of structures with a typedef of HE. |
d1b91892 AD |
688 | These contain the actual key and value pointers (plus extra administrative |
689 | overhead). The key is a string pointer; the value is an C<SV*>. However, | |
690 | once you have an C<HE*>, to get the actual key and value, use the routines | |
691 | specified below. | |
692 | ||
7cc7ada7 | 693 | =for apidoc_section $HV |
63dbc4a9 KW |
694 | =for apidoc Ayh||HE |
695 | ||
a0d0e21e LW |
696 | I32 hv_iterinit(HV*); |
697 | /* Prepares starting point to traverse hash table */ | |
698 | HE* hv_iternext(HV*); | |
699 | /* Get the next entry, and return a pointer to a | |
700 | structure that has both the key and value */ | |
701 | char* hv_iterkey(HE* entry, I32* retlen); | |
702 | /* Get the key from an HE structure and also return | |
703 | the length of the key string */ | |
cb1a09d0 | 704 | SV* hv_iterval(HV*, HE* entry); |
d1be9408 | 705 | /* Return an SV pointer to the value of the HE |
a0d0e21e | 706 | structure */ |
cb1a09d0 | 707 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
d1b91892 AD |
708 | /* This convenience routine combines hv_iternext, |
709 | hv_iterkey, and hv_iterval. The key and retlen | |
710 | arguments are return values for the key and its | |
711 | length. The value is the SV* returned by the function */ | |
a0d0e21e LW |
712 | |
713 | If you know the name of a hash variable, you can get a pointer to its HV | |
714 | by using the following: | |
715 | ||
6673a63c | 716 | HV* get_hv("package::varname", 0); |
a0d0e21e LW |
717 | |
718 | This returns NULL if the variable does not exist. | |
719 | ||
a43e7901 | 720 | The hash algorithm is defined in the C<PERL_HASH> macro: |
a0d0e21e | 721 | |
a43e7901 | 722 | PERL_HASH(hash, key, klen) |
ab192400 | 723 | |
a43e7901 YO |
724 | The exact implementation of this macro varies by architecture and version |
725 | of perl, and the return value may change per invocation, so the value | |
726 | is only valid for the duration of a single perl process. | |
a0d0e21e | 727 | |
5a0de581 | 728 | See L</Understanding the Magic of Tied Hashes and Arrays> for more |
04343c6d GS |
729 | information on how to use the hash access functions on tied hashes. |
730 | ||
3f620621 | 731 | =for apidoc_section $HV |
4f313521 KW |
732 | =for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen |
733 | ||
1e422769 | 734 | =head2 Hash API Extensions |
735 | ||
736 | Beginning with version 5.004, the following functions are also supported: | |
737 | ||
738 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); | |
739 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); | |
c47ff5f1 | 740 | |
1e422769 | 741 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
742 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); | |
c47ff5f1 | 743 | |
1e422769 | 744 | SV* hv_iterkeysv (HE* entry); |
745 | ||
746 | Note that these functions take C<SV*> keys, which simplifies writing | |
747 | of extension code that deals with hash structures. These functions | |
748 | also allow passing of C<SV*> keys to C<tie> functions without forcing | |
749 | you to stringify the keys (unlike the previous set of functions). | |
750 | ||
751 | They also return and accept whole hash entries (C<HE*>), making their | |
752 | use more efficient (since the hash number for a particular string | |
4a4eefd0 GS |
753 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
754 | descriptions. | |
1e422769 | 755 | |
756 | The following macros must always be used to access the contents of hash | |
757 | entries. Note that the arguments to these macros must be simple | |
758 | variables, since they may get evaluated more than once. See | |
4a4eefd0 | 759 | L<perlapi> for detailed descriptions of these macros. |
1e422769 | 760 | |
761 | HePV(HE* he, STRLEN len) | |
762 | HeVAL(HE* he) | |
763 | HeHASH(HE* he) | |
764 | HeSVKEY(HE* he) | |
765 | HeSVKEY_force(HE* he) | |
766 | HeSVKEY_set(HE* he, SV* sv) | |
767 | ||
768 | These two lower level macros are defined, but must only be used when | |
769 | dealing with keys that are not C<SV*>s: | |
770 | ||
771 | HeKEY(HE* he) | |
772 | HeKLEN(HE* he) | |
773 | ||
04343c6d GS |
774 | Note that neither C<hv_store> nor C<hv_store_ent> increments the | |
775 | reference count of the stored C<val>; that is the caller's responsibility. | |
776 | If these functions return a NULL value, the caller will usually have to | |
777 | decrement the reference count of C<val> to avoid a memory leak. | |
1e422769 | 778 | |
a9381218 MHM |
779 | =head2 AVs, HVs and undefined values |
780 | ||
10e2eb10 FC |
781 | Sometimes you have to store undefined values in AVs or HVs. Although |
782 | this may be a rare case, it can be tricky. That's because you're | |
a9381218 MHM |
783 | used to using C<&PL_sv_undef> if you need an undefined SV. |
784 | ||
785 | For example, intuition tells you that this XS code: | |
786 | ||
787 | AV *av = newAV(); | |
788 | av_store( av, 0, &PL_sv_undef ); | |
789 | ||
790 | is equivalent to this Perl code: | |
791 | ||
792 | my @av; | |
793 | $av[0] = undef; | |
794 | ||
f3c4ec28 | 795 | Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker |
a9381218 MHM |
796 | for indicating that an array element has not yet been initialized. |
797 | Thus, C<exists $av[0]> would be true for the above Perl code, but | |
f3c4ec28 FC |
798 | false for the array generated by the XS code. In perl 5.20, storing |
799 | C<&PL_sv_undef> will create a read-only element, because the scalar | |
800 | C<&PL_sv_undef> itself is stored, not a copy. | |
a9381218 | 801 | |
f3c4ec28 | 802 | Similar problems can occur when storing C<&PL_sv_undef> in HVs: |
a9381218 MHM |
803 | |
804 | hv_store( hv, "key", 3, &PL_sv_undef, 0 ); | |
805 | ||
806 | This will indeed make the value C<undef>, but if you try to modify | |
807 | the value of C<key>, you'll get the following error: | |
808 | ||
809 | Modification of non-creatable hash value attempted | |
810 | ||
811 | In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders | |
10e2eb10 | 812 | in restricted hashes. This caused such hash entries not to appear |
a9381218 MHM |
813 | when iterating over the hash or when checking for the keys |
814 | with the C<hv_exists> function. | |
815 | ||
8abccac8 | 816 | You can run into similar problems when you store C<&PL_sv_yes> or |
10e2eb10 | 817 | C<&PL_sv_no> into AVs or HVs. Trying to modify such elements |
a9381218 MHM |
818 | will give you the following error: |
819 | ||
820 | Modification of a read-only value attempted | |
821 | ||
822 | To make a long story short, you can use the special variables | |
8abccac8 | 823 | C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and |
a9381218 MHM |
824 | HVs, but you have to make sure you know what you're doing. |
825 | ||
826 | Generally, if you want to store an undefined value in an AV | |
827 | or HV, you should not use C<&PL_sv_undef>, but rather create a | |
828 | new undefined value using the C<newSV> function, for example: | |
829 | ||
830 | av_store( av, 42, newSV(0) ); | |
831 | hv_store( hv, "foo", 3, newSV(0), 0 ); | |
832 | ||
a0d0e21e LW |
833 | =head2 References |
834 | ||
d1b91892 | 835 | References are a special type of scalar that point to other data types |
a9b0660e | 836 | (including other references). |
a0d0e21e | 837 | |
07fa94a1 | 838 | To create a reference, use either of the following functions: |
a0d0e21e | 839 | |
5f05dabc | 840 | SV* newRV_inc((SV*) thing); |
841 | SV* newRV_noinc((SV*) thing); | |
a0d0e21e | 842 | |
5f05dabc | 843 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
07fa94a1 JO |
844 | functions are identical except that C<newRV_inc> increments the reference |
845 | count of the C<thing>, while C<newRV_noinc> does not. For historical | |
846 | reasons, C<newRV> is a synonym for C<newRV_inc>. | |
847 | ||
848 | Once you have a reference, you can use the following macro to dereference | |
849 | the reference: | |
a0d0e21e LW |
850 | |
851 | SvRV(SV*) | |
852 | ||
853 | then call the appropriate routines, casting the returned C<SV*> to either an | |
d1b91892 | 854 | C<AV*> or C<HV*>, if required. |
a0d0e21e | 855 | |
d1b91892 | 856 | To determine if an SV is a reference, you can use the following macro: |
a0d0e21e LW |
857 | |
858 | SvROK(SV*) | |
859 | ||
07fa94a1 JO |
860 | To discover what type of value the reference refers to, use the following |
861 | macro and then check the return value. | |
d1b91892 AD |
862 | |
863 | SvTYPE(SvRV(SV*)) | |
864 | ||
865 | The most useful types that will be returned are: | |
866 | ||
a5e62da0 FC |
867 | SVt_PVAV Array |
868 | SVt_PVHV Hash | |
869 | SVt_PVCV Code | |
870 | SVt_PVGV Glob (possibly a file handle) | |
871 | ||
2d0e7d1f DM |
872 | Any numerical value returned which is less than SVt_PVAV will be a scalar |
873 | of some form. | |
874 | ||
a5e62da0 | 875 | See L<perlapi/svtype> for more details. |
d1b91892 | 876 | |
cb1a09d0 AD |
877 | =head2 Blessed References and Class Objects |
878 | ||
06f6df17 | 879 | References are also used to support object-oriented programming. In perl's |
cb1a09d0 AD |
880 | OO lexicon, an object is simply a reference that has been blessed into a |
881 | package (or class). Once blessed, the programmer may now use the reference | |
882 | to access the various methods in the class. | |
883 | ||
884 | A reference can be blessed into a package with the following function: | |
885 | ||
886 | SV* sv_bless(SV* sv, HV* stash); | |
887 | ||
06f6df17 RGS |
888 | The C<sv> argument must be a reference value. The C<stash> argument |
889 | specifies which class the reference will belong to. See | |
5a0de581 | 890 | L</Stashes and Globs> for information on converting class names into stashes. |
cb1a09d0 AD |
891 | |
892 | /* Still under construction */ | |
893 | ||
ddd2cc91 DM |
894 | The following function upgrades C<rv> to a reference if it is not already | |
895 | one, and creates a new SV for C<rv> to point to. If C<classname> is non-null, | |
896 | the new SV is blessed into the specified class. The new SV is returned. | |
cb1a09d0 | 897 | |
08105a92 | 898 | SV* newSVrv(SV* rv, const char* classname); |
cb1a09d0 | 899 | |
ddd2cc91 DM |
900 | The following three functions copy integer, unsigned integer or double |
901 | into an SV whose reference is C<rv>. SV is blessed if C<classname> is | |
902 | non-null. | |
cb1a09d0 | 903 | |
08105a92 | 904 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
e1c57cef | 905 | SV* sv_setref_uv(SV* rv, const char* classname, UV uv); |
08105a92 | 906 | SV* sv_setref_nv(SV* rv, const char* classname, NV nv); |
cb1a09d0 | 907 | |
ddd2cc91 DM |
908 | The following function copies the pointer value (I<the address, not the |
909 | string!>) into an SV whose reference is C<rv>. SV is blessed if C<classname> | |
910 | is non-null. | |
cb1a09d0 | 911 | |
ddd2cc91 | 912 | SV* sv_setref_pv(SV* rv, const char* classname, void* pv); |
cb1a09d0 | 913 | |
a9b0660e | 914 | The following function copies a string into an SV whose reference is C<rv>. |
ddd2cc91 DM |
915 | The string length must be given explicitly in C<length>. SV is blessed if | |
916 | C<classname> is non-null. | |
cb1a09d0 | 917 | |
a9b0660e KW |
918 | SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, |
919 | STRLEN length); | |
cb1a09d0 | 920 | |
ddd2cc91 DM |
921 | The following function tests whether the SV is blessed into the specified |
922 | class. It does not check inheritance relationships. | |
9abd00ed | 923 | |
08105a92 | 924 | int sv_isa(SV* sv, const char* name); |
9abd00ed | 925 | |
ddd2cc91 | 926 | The following function tests whether the SV is a reference to a blessed object. |
9abd00ed GS |
927 | |
928 | int sv_isobject(SV* sv); | |
929 | ||
ddd2cc91 | 930 | The following function tests whether the SV is derived from the specified |
10e2eb10 FC |
931 | class. SV can be either a reference to a blessed object or a string |
932 | containing a class name. This is the function implementing the | |
ddd2cc91 | 933 | C<UNIVERSAL::isa> functionality. |
9abd00ed | 934 | |
08105a92 | 935 | bool sv_derived_from(SV* sv, const char* name); |
9abd00ed | 936 | |
00aadd71 | 937 | To check if you've got an object derived from a specific class you have |
9abd00ed GS |
938 | to write: |
939 | ||
940 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } | |
cb1a09d0 | 941 | |
5f05dabc | 942 | =head2 Creating New Variables |
cb1a09d0 | 943 | |
5f05dabc | 944 | To create a new Perl variable with an undef value which can be accessed from |
945 | your Perl script, use the following routines, depending on the variable type. | |
cb1a09d0 | 946 | |
64ace3f8 | 947 | SV* get_sv("package::varname", GV_ADD); |
cbfd0a87 | 948 | AV* get_av("package::varname", GV_ADD); |
6673a63c | 949 | HV* get_hv("package::varname", GV_ADD); |
cb1a09d0 | 950 | |
058a5f6c | 951 | Notice the use of C<GV_ADD> as the second parameter. The new variable can now | |
cb1a09d0 AD |
952 | be set, using the routines appropriate to the data type. |
953 | ||
5f05dabc | 954 | There are additional macros whose values may be bitwise OR'ed with the |
058a5f6c | 955 | C<GV_ADD> argument to enable certain extra features. Those bits are: |
cb1a09d0 | 956 | |
9a68f1db SB |
957 | =over |
958 | ||
959 | =item GV_ADDMULTI | |
960 | ||
961 | Marks the variable as multiply defined, thus preventing the: | |
962 | ||
963 | Name <varname> used only once: possible typo | |
964 | ||
965 | warning. | |
966 | ||
9a68f1db SB |
967 | =item GV_ADDWARN |
968 | ||
969 | Issues the warning: | |
970 | ||
971 | Had to create <varname> unexpectedly | |
972 | ||
973 | if the variable did not exist before the function was called. | |
974 | ||
975 | =back | |
cb1a09d0 | 976 | |
07fa94a1 JO |
977 | If you do not specify a package name, the variable is created in the current |
978 | package. | |
cb1a09d0 | 979 | |
5f05dabc | 980 | =head2 Reference Counts and Mortality |
a0d0e21e | 981 | |
10e2eb10 | 982 | Perl uses a reference count-driven garbage collection mechanism. SVs, |
54310121 | 983 | AVs, or HVs (xV for short in the following) start their life with a |
55497cff | 984 | reference count of 1. If the reference count of an xV ever drops to 0, |
07fa94a1 | 985 | then it will be destroyed and its memory made available for reuse. |
3d2ba989 Z |
986 | At the most basic internal level, reference counts can be manipulated |
987 | with the following macros: | |
55497cff | 988 | |
989 | int SvREFCNT(SV* sv); | |
5f05dabc | 990 | SV* SvREFCNT_inc(SV* sv); |
55497cff | 991 | void SvREFCNT_dec(SV* sv); |
992 | ||
3d2ba989 Z |
993 | (There are also suffixed versions of the increment and decrement macros, |
994 | for situations where the full generality of these basic macros can be | |
995 | exchanged for some performance.) | |
996 | ||
997 | However, the way a programmer should think about references is not so | |
998 | much in terms of the bare reference count, but in terms of I<ownership> | |
999 | of references. A reference to an xV can be owned by any of a variety | |
1000 | of entities: another xV, the Perl interpreter, an XS data structure, | |
1001 | a piece of running code, or a dynamic scope. An xV generally does not | |
1002 | know what entities own the references to it; it only knows how many | |
1003 | references there are, which is the reference count. | |
1004 | ||
1005 | To correctly maintain reference counts, it is essential to keep track | |
1006 | of what references the XS code is manipulating. The programmer should | |
1007 | always know where a reference has come from and who owns it, and be | |
1008 | aware of any creation or destruction of references, and any transfers | |
1009 | of ownership. Because ownership isn't represented explicitly in the xV | |
1010 | data structures, only the reference count need be actually maintained | |
1011 | by the code, and that means that this understanding of ownership is not | |
1012 | actually evident in the code. For example, transferring ownership of a | |
1013 | reference from one owner to another doesn't change the reference count | |
1014 | at all, so may be achieved with no actual code. (The transferring code | |
1015 | doesn't touch the referenced object, but does need to ensure that the | |
1016 | former owner knows that it no longer owns the reference, and that the | |
1017 | new owner knows that it now does.) | |
1018 | ||
1019 | An xV that is visible at the Perl level should not become unreferenced | |
1020 | and thus be destroyed. Normally, an object will only become unreferenced | |
1021 | when it is no longer visible, often by the same means that makes it | |
1022 | invisible. For example, a Perl reference value (RV) owns a reference to | |
1023 | its referent, so if the RV is overwritten that reference gets destroyed, | |
1024 | and the no-longer-reachable referent may be destroyed as a result. | |
1025 | ||
1026 | Many functions have some kind of reference manipulation as | |
1027 | part of their purpose. Sometimes this is documented in terms | |
1028 | of ownership of references, and sometimes it is (less helpfully) | |
1029 | documented in terms of changes to reference counts. For example, the | |
1030 | L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV | |
1031 | (with reference count 1) and increment the reference count of the referent | |
1032 | that was supplied by the caller. This is best understood as creating | |
1033 | a new reference to the referent, which is owned by the created RV, | |
1034 | and returning to the caller ownership of the sole reference to the RV. | |
1035 | The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not | |
1036 | increment the reference count of the referent, but the RV nevertheless | |
1037 | ends up owning a reference to the referent. It is therefore implied | |
1038 | that the caller of C<newRV_noinc()> is relinquishing a reference to the | |
1039 | referent, making this conceptually a more complicated operation even | |
1040 | though it does less to the data structures. | |
1041 | ||
1042 | For example, imagine you want to return a reference from an XSUB | |
1043 | function. Inside the XSUB routine, you create an SV which initially | |
1044 | has just a single reference, owned by the XSUB routine. This reference | |
1045 | needs to be disposed of before the routine is complete, otherwise it | |
1046 | will leak, preventing the SV from ever being destroyed. So to create | |
1047 | an RV referencing the SV, it is most convenient to pass the SV to | |
1048 | C<newRV_noinc()>, which consumes that reference. Now the XSUB routine | |
1049 | no longer owns a reference to the SV, but does own a reference to the RV, | |
1050 | which in turn owns a reference to the SV. The ownership of the reference | |
1051 | to the RV is then transferred by the process of returning the RV from | |
1052 | the XSUB. | |
55497cff | 1053 | |
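The ownership reasoning above can be sketched with a toy model in plain C.
This is not Perl's implementation: C<toy_sv>, C<toy_new>, C<toy_dec>,
C<toy_newRV_inc>, C<toy_newRV_noinc> and C<toy_leaked_after_wrap> are
hypothetical names invented for illustration, and only the reference-count
bookkeeping is modelled. The sketch assumes a caller that creates a value,
wraps it in an RV, and then lets go of the RV without doing anything else.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an xV: a reference count plus, for an RV, its referent. */
typedef struct toy_sv {
    int refcnt;
    struct toy_sv *referent;   /* non-NULL only when this value acts as an RV */
} toy_sv;

static int toy_live = 0;       /* number of toy values not yet destroyed */

/* A new value starts with one reference, owned by the caller. */
static toy_sv *toy_new(void)
{
    toy_sv *sv = calloc(1, sizeof *sv);
    if (!sv) abort();
    sv->refcnt = 1;
    toy_live++;
    return sv;
}

/* Give up one reference. On reaching zero the value is destroyed, which
 * also gives up the reference an RV owns to its referent. */
static void toy_dec(toy_sv *sv)
{
    if (!sv) return;
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        toy_dec(sv->referent);
        free(sv);
        toy_live--;
    }
}

/* Modelled on newRV_noinc: the new RV takes over the caller's reference. */
static toy_sv *toy_newRV_noinc(toy_sv *thing)
{
    toy_sv *rv = toy_new();
    rv->referent = thing;
    return rv;
}

/* Modelled on newRV_inc: a fresh reference is created for the RV, so the
 * caller still owns (and must eventually release) its own reference. */
static toy_sv *toy_newRV_inc(toy_sv *thing)
{
    thing->refcnt++;
    return toy_newRV_noinc(thing);
}

/* Create a value, wrap it in an RV, drop the RV, and report how many
 * values are left alive because nobody released the remaining reference. */
static int toy_leaked_after_wrap(int use_inc)
{
    int before = toy_live;
    toy_sv *sv = toy_new();
    toy_sv *rv = use_inc ? toy_newRV_inc(sv) : toy_newRV_noinc(sv);
    toy_dec(rv);
    return toy_live - before;
}
```

In this model, wrapping with the C<noinc> variant leaves nothing behind once
the RV is released, while the C<inc> variant leaves the referent alive until
the caller also releases the reference it still owns.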
5f05dabc | 1054 | There are some convenience functions available that can help with the |
54310121 | 1055 | destruction of xVs. These functions introduce the concept of "mortality". |
3d2ba989 Z |
1056 | Much documentation speaks of an xV itself being mortal, but this is |
1057 | misleading. It is really I<a reference to> an xV that is mortal, and it | |
1058 | is possible for there to be more than one mortal reference to a single xV. | |
1059 | For a reference to be mortal means that it is owned by the temps stack, | |
1060 | one of perl's many internal stacks, which will destroy that reference | |
1061 | "a short time later". Usually the "short time later" is the end of | |
1062 | the current Perl statement. However, it gets more complicated around | |
1063 | dynamic scopes: there can be multiple sets of mortal references hanging | |
1064 | around at the same time, with different death dates. Internally, the | |
1065 | actual determinant for when mortal xV references are destroyed depends | |
1066 | on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs> | |
e55ec392 | 1067 | and L</Temporaries Stack> below for more details on these macros. |
3d2ba989 Z |
1068 | |
1069 | Mortal references are mainly used for xVs that are placed on perl's | |
1070 | main stack. The stack is problematic for reference tracking, because it | |
1071 | contains a lot of xV references, but doesn't own those references: they | |
1072 | are not counted. Currently, there are many bugs resulting from xVs being | |
1073 | destroyed while referenced by the stack, because the stack's uncounted | |
1074 | references aren't enough to keep the xVs alive. So when putting an | |
1075 | (uncounted) reference on the stack, it is vitally important to ensure that | |
1076 | there will be a counted reference to the same xV that will last at least | |
1077 | as long as the uncounted reference. But it's also important that that | |
1078 | counted reference be cleaned up at an appropriate time, and not unduly | |
1079 | prolong the xV's life. For there to be a mortal reference is often the | |
1080 | best way to satisfy this requirement, especially if the xV was created | |
1081 | especially to be put on the stack and would otherwise be unreferenced. | |
1082 | ||
1083 | To create a mortal reference, use the functions: | |
a0d0e21e LW |
1084 | |
1085 | SV* sv_newmortal() | |
a0d0e21e | 1086 | SV* sv_mortalcopy(SV*) |
3d2ba989 | 1087 | SV* sv_2mortal(SV*) |
a0d0e21e | 1088 | |
3d2ba989 Z |
1089 | C<sv_newmortal()> creates an SV (with the undefined value) whose sole |
1090 | reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a | |
1091 | copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()> | |
1092 | mortalises an existing xV reference: it transfers ownership of a reference | |
1093 | from the caller to the temps stack. Because C<sv_newmortal> gives the new | |
1094 | SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>, | |
1095 | etc.: | |
00aadd71 NIS |
1096 | |
1097 | SV *tmp = sv_newmortal(); | |
1098 | sv_setiv(tmp, an_integer); | |
1099 | ||
1100 | As that is multiple C statements, it is quite common to see this idiom instead: | |
1101 | ||
1102 | SV *tmp = sv_2mortal(newSViv(an_integer)); | |
1103 | ||
ac036724 | 1104 | The mortal routines are not just for SVs; AVs and HVs can be |
faed5253 | 1105 | made mortal by passing their address (cast to C<SV*>) to the | |
07fa94a1 | 1106 | C<sv_2mortal> or C<sv_mortalcopy> routines. |
a0d0e21e | 1107 | |
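The idea of mortal references with different "death dates" can be sketched
as a toy model in plain C. This is only an illustration under simplifying
assumptions, not how perl's temps stack is implemented: C<toy_sv>,
C<toy_2mortal>, C<toy_savetmps>, C<toy_freetmps>, C<toy_restore_floor> and
C<toy_demo> are hypothetical names, and values are marked as destroyed
rather than actually freed so that the result can be inspected.

```c
#include <assert.h>

#define TOY_TMPS_MAX 16

typedef struct {
    int refcnt;
    int destroyed;             /* set once the last reference is gone */
} toy_sv;

static toy_sv *toy_tmps[TOY_TMPS_MAX];  /* the toy temps stack */
static int toy_tmps_ix = 0;     /* number of mortal references held */
static int toy_tmps_floor = 0;  /* entries below this belong to outer scopes */

/* A new value starts with one reference, owned by the caller. */
static void toy_init(toy_sv *sv)
{
    sv->refcnt = 1;
    sv->destroyed = 0;
}

static void toy_dec(toy_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->destroyed = 1;
}

/* Modelled on sv_2mortal: ownership of the caller's reference moves to
 * the temps stack; the reference count itself does not change. */
static toy_sv *toy_2mortal(toy_sv *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Modelled on SAVETMPS: protect everything already on the temps stack.
 * Returns the previous floor so the caller can restore it later. */
static int toy_savetmps(void)
{
    int old_floor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old_floor;
}

/* Modelled on FREETMPS: release every mortal reference above the floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_dec(toy_tmps[--toy_tmps_ix]);
}

/* Stand-in for leaving the scope, which restores the saved floor. */
static void toy_restore_floor(int old_floor)
{
    toy_tmps_floor = old_floor;
}

/* Returns 1 if the inner mortal dies first and the outer one dies later. */
static int toy_demo(void)
{
    toy_sv outer, inner;
    int old_floor, inner_died_first;

    toy_init(&outer);
    toy_2mortal(&outer);

    old_floor = toy_savetmps();       /* enter an inner scope */
    toy_init(&inner);
    toy_2mortal(&inner);
    toy_freetmps();                   /* only the inner mortal is released */
    inner_died_first = inner.destroyed && !outer.destroyed;

    toy_restore_floor(old_floor);     /* leave the inner scope */
    toy_freetmps();                   /* now the outer mortal is released */
    return inner_died_first && outer.destroyed;
}
```

The point of the floor is that releasing temporaries in an inner scope does
not touch mortal references created by the enclosing code.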
5f05dabc | 1108 | =head2 Stashes and Globs |
a0d0e21e | 1109 | |
06f6df17 RGS |
1110 | A B<stash> is a hash that contains all variables that are defined |
1111 | within a package. Each key of the stash is a symbol | |
aa689395 | 1112 | name (shared by all the different types of objects that have the same |
1113 | name), and each value in the hash table is a GV (Glob Value). This GV | |
1114 | in turn contains references to the various objects of that name, | |
1115 | including (but not limited to) the following: | |
cb1a09d0 | 1116 | |
a0d0e21e LW |
1117 | Scalar Value |
1118 | Array Value | |
1119 | Hash Value | |
a3cb178b | 1120 | I/O Handle |
a0d0e21e LW |
1121 | Format |
1122 | Subroutine | |
1123 | ||
06f6df17 RGS |
1124 | There is a single stash called C<PL_defstash> that holds the items that exist |
1125 | in the C<main> package. To get at the items in other packages, append the | |
1126 | string "::" to the package name. The items in the C<Foo> package are in | |
1127 | the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are | |
1128 | in the stash C<Baz::> in C<Bar::>'s stash. | |
a0d0e21e | 1129 | |
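As a small illustration of the nesting described above, the following plain C
sketch splits a package name into the sequence of stash keys that would be
followed, starting from the top-level stash. It does not call the Perl API;
C<toy_stash_key> is a hypothetical helper, and it assumes package names that
use only C<::> as a separator.

```c
#include <assert.h>
#include <string.h>

/* Write the idx-th stash key for package `pkg` into `out`, so that
 * "Bar::Baz" yields "Bar::" for idx 0 and "Baz::" for idx 1.
 * Returns 1 on success, 0 if there is no such component or `out`
 * is too small. */
static int toy_stash_key(const char *pkg, int idx, char *out, size_t outlen)
{
    const char *start = pkg;

    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (idx == 0) {
            /* Need room for the component, "::" and the trailing NUL. */
            if (len == 0 || len + 3 > outlen)
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);
            return 1;
        }
        if (!sep)
            return 0;          /* ran out of components */
        start = sep + 2;
        idx--;
    }
}
```

In real code there is normally no need to walk the stashes by hand, since
the stash lookup functions accept the full package name.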
6ef63541 KW |
1130 | =for apidoc_section $GV |
1131 | =for apidoc Amnh||PL_defstash | |
1132 | ||
d1b91892 | 1133 | To get the stash pointer for a particular package, use one of these functions: | |
a0d0e21e | 1134 | |
da51bb9b NC |
1135 | HV* gv_stashpv(const char* name, I32 flags) |
1136 | HV* gv_stashsv(SV*, I32 flags) | |
a0d0e21e LW |
1137 | |
1138 | The first function takes a literal string, the second uses the string stored | |
d1b91892 | 1139 | in the SV. Remember that a stash is just a hash table, so you get back an |
da51bb9b | 1140 | C<HV*>. Setting C<flags> to C<GV_ADD> creates the package if it does not already exist. | |
a0d0e21e LW |
1141 | |
1142 | The name that C<gv_stash*v> wants is the name of the package whose symbol table | |
1143 | you want. The default package is called C<main>. If you have multiply nested | |
d1b91892 AD |
1144 | packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
1145 | language itself. | |
a0d0e21e LW |
1146 | |
1147 | Alternatively, if you have an SV that is a blessed reference, you can find | |
1148 | out the stash pointer by using: | |
1149 | ||
1150 | HV* SvSTASH(SvRV(SV*)); | |
1151 | ||
1152 | then use the following to get the package name itself: | |
1153 | ||
1154 | char* HvNAME(HV* stash); | |
1155 | ||
5f05dabc | 1156 | If you need to bless or re-bless an object you can use the following |
1157 | function: | |
a0d0e21e LW |
1158 | |
1159 | SV* sv_bless(SV*, HV* stash) | |
1160 | ||
1161 | where the first argument, an C<SV*>, must be a reference, and the second | |
1162 | argument is a stash. The returned C<SV*> can now be used in the same way | |
1163 | as any other SV. | |
1164 | ||
d1b91892 AD |
1165 | For more information on references and blessings, consult L<perlref>. |
1166 | ||
4c29eb71 TC |
1167 | =head2 I/O Handles |
1168 | ||
1169 | Like AVs and HVs, IO objects are another type of non-scalar SV which | |
1170 | may contain input and output L<PerlIO|perlapio> objects or a C<DIR *> | |
1171 | from opendir(). | |
1172 | ||
1173 | You can create a new IO object: | |
1174 | ||
1175 | IO* newIO(); | |
1176 | ||
1177 | Unlike other SVs, a new IO object is automatically blessed into the | |
1178 | L<IO::File> class. | |
1179 | ||
1180 | The IO object contains an input and output PerlIO handle: | |
1181 | ||
1182 | PerlIO *IoIFP(IO *io); | |
1183 | PerlIO *IoOFP(IO *io); | |
1184 | ||
6ef63541 KW |
1185 | =for apidoc_section $io |
1186 | =for apidoc Amh|PerlIO *|IoIFP|IO *io | |
1187 | =for apidoc Amh|PerlIO *|IoOFP|IO *io | |
1188 | ||
4c29eb71 TC |
1189 | Typically if the IO object has been opened on a file, the input handle |
1190 | is always present, but the output handle is only present if the file | |
1191 | is open for output. For a file, if both are present they will be the | |
1192 | same PerlIO object. | |
1193 | ||
1194 | Distinct input and output PerlIO objects are created for sockets and | |
1195 | character devices. | |
1196 | ||
1197 | The IO object also contains other data associated with Perl I/O | |
1198 | handles: | |
1199 | ||
1200 | IV IoLINES(io); /* $. */ | |
1201 | IV IoPAGE(io); /* $% */ | |
1202 | IV IoPAGE_LEN(io); /* $= */ | |
1203 | IV IoLINES_LEFT(io); /* $- */ | |
1204 | char *IoTOP_NAME(io); /* $^ */ | |
1205 | GV *IoTOP_GV(io); /* $^ */ | |
1206 | char *IoFMT_NAME(io); /* $~ */ | |
1207 | GV *IoFMT_GV(io); /* $~ */ | |
1208 | char *IoBOTTOM_NAME(io); | |
1209 | GV *IoBOTTOM_GV(io); | |
1210 | char IoTYPE(io); | |
1211 | U8 IoFLAGS(io); | |
1212 | ||
6ef63541 KW |
1213 | =for apidoc_sections $io_scn, $formats_section |
1214 | =for apidoc_section $reports | |
1215 | =for apidoc Amh|IV|IoLINES|IO *io | |
1216 | =for apidoc Amh|IV|IoPAGE|IO *io | |
1217 | =for apidoc Amh|IV|IoPAGE_LEN|IO *io | |
1218 | =for apidoc Amh|IV|IoLINES_LEFT|IO *io | |
1219 | =for apidoc Amh|char *|IoTOP_NAME|IO *io | |
1220 | =for apidoc Amh|GV *|IoTOP_GV|IO *io | |
1221 | =for apidoc Amh|char *|IoFMT_NAME|IO *io | |
1222 | =for apidoc Amh|GV *|IoFMT_GV|IO *io | |
1223 | =for apidoc Amh|char *|IoBOTTOM_NAME|IO *io | |
1224 | =for apidoc Amh|GV *|IoBOTTOM_GV|IO *io | |
1225 | =for apidoc_section $io | |
1226 | =for apidoc Amh|char|IoTYPE|IO *io | |
1227 | =for apidoc Amh|U8|IoFLAGS|IO *io | |
1228 | ||
4c29eb71 TC |
1229 | Most of these are involved with L<formats|perlform>. |
1230 | ||
1231 | C<IoFLAGS()> may contain a combination of flags, the most interesting of | |
1232 | which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>, | |
1233 | settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>. | |
1234 | ||
6ef63541 KW |
1235 | =for apidoc Amnh||IOf_FLUSH |
1236 | =for apidoc Amnh||IOf_UNTAINT | |
1237 | ||
4c29eb71 TC |
1238 | The IO object may also contain a directory handle: | |
1239 | ||
1240 | DIR *IoDIRP(io); | |
1241 | ||
6ef63541 KW |
1242 | =for apidoc Amh|DIR *|IoDIRP|IO *io |
1243 | ||
4c29eb71 TC |
1244 | suitable for use with PerlDir_read() etc. |
1245 | ||
1246 | All of these accessor macros are lvalues; there are no distinct | |
1247 | C<_set()> macros to modify the members of the IO object. | |
1248 | ||
54310121 | 1249 | =head2 Double-Typed SVs |
0a753a76 | 1250 | |
1251 | Scalar variables normally contain only one type of value, an integer, | |
1252 | double, pointer, or reference. Perl will automatically convert the | |
1253 | actual scalar data from the stored type into the requested type. | |
1254 | ||
1255 | Some scalar variables contain more than one type of scalar data. For | |
1256 | example, the variable C<$!> contains either the numeric value of C<errno> | |
1257 | or its string equivalent from either C<strerror> or C<sys_errlist[]>. | |
1258 | ||
1259 | To force multiple data values into an SV, you must do two things: use the | |
1260 | C<sv_set*v> routines to add the additional scalar type, then set a flag | |
1261 | so that Perl will believe it contains more than one type of data. The | |
1262 | four macros to set the flags are: | |
1263 | ||
1264 | SvIOK_on | |
1265 | SvNOK_on | |
1266 | SvPOK_on | |
1267 | SvROK_on | |
1268 | ||
1269 | The particular macro you must use depends on which C<sv_set*v> routine | |
1270 | you called first. This is because every C<sv_set*v> routine turns on | |
1271 | only the bit for the particular type of data being set, and turns off | |
1272 | all the rest. | |
1273 | ||
1274 | For example, to create a new Perl variable called "dberror" that contains | |
1275 | both the numeric and descriptive string error values, you could use the | |
1276 | following code: | |
1277 | ||
1278 | extern int dberror; | |
1279 | extern char *dberror_list[]; | |
1280 | ||
64ace3f8 | 1281 | SV* sv = get_sv("dberror", GV_ADD); |
0a753a76 | 1282 | sv_setiv(sv, (IV) dberror); |
1283 | sv_setpv(sv, dberror_list[dberror]); | |
1284 | SvIOK_on(sv); | |
1285 | ||
1286 | If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the | |
1287 | macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. | |
1288 | ||
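The flag bookkeeping described above can be imitated in a few lines of plain C. The sketch below is a toy model only: C<toy_sv>, C<toy_setiv>, C<toy_setpv>, C<toy_iok_on> and the C<TOY_*> flags are hypothetical names invented for illustration and are not part of the Perl API. It assumes, as the text states, that each setter validates only its own slot and that a separate step re-validates the other one.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: these names are hypothetical and are not the Perl API. */
enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    unsigned flags;      /* which slots currently hold valid data */
    long     iv;         /* integer slot */
    char     pv[64];     /* string slot */
} toy_sv;

/* Like sv_setiv: store the integer and mark ONLY the integer slot valid. */
static void toy_setiv(toy_sv *sv, long iv)
{
    assert(sv != NULL);
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv: store the string and mark ONLY the string slot valid. */
static void toy_setpv(toy_sv *sv, const char *s)
{
    assert(sv != NULL && s != NULL);
    strncpy(sv->pv, s, sizeof sv->pv - 1);
    sv->pv[sizeof sv->pv - 1] = '\0';
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: re-validate the integer slot without touching the string. */
static void toy_iok_on(toy_sv *sv)
{
    assert(sv != NULL);
    sv->flags |= TOY_IOK;
}

/* Build a dual-valued scalar in the same order as the dberror example. */
static toy_sv toy_dualvar(long num, const char *msg)
{
    toy_sv sv = { 0, 0, "" };
    toy_setiv(&sv, num);   /* flags: IOK */
    toy_setpv(&sv, msg);   /* flags: POK only; the integer is still stored */
    toy_iok_on(&sv);       /* flags: IOK|POK */
    return sv;
}
```

Calling C<toy_dualvar(13, "Permission denied")> leaves both flags set, whereas stopping after C<toy_setpv> would leave only the string flag on, which is why the final flag macro depends on the order of the two setters.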
4f4531b8 FC |
1289 | =head2 Read-Only Values |
1290 | ||
1291 | In Perl 5.16 and earlier, copy-on-write (see the next section) shared a | |
1292 | flag bit with read-only scalars. So the only way to test whether | |
1293 | C<sv_setsv>, etc., will raise a "Modification of a read-only value" error | |
1294 | in those versions is: | |
1295 | ||
1296 | SvREADONLY(sv) && !SvIsCOW(sv) | |
1297 | ||
1298 | Under Perl 5.18 and later, SvREADONLY only applies to read-only variables, | |
1299 | and, under 5.20, copy-on-write scalars can also be read-only, so the above | |
1300 | check is incorrect. You just want: | |
1301 | ||
1302 | SvREADONLY(sv) | |
1303 | ||
1304 | If you need to do this check often, define your own macro like this: | |
1305 | ||
1306 | #if PERL_VERSION >= 18 | |
1307 | # define SvTRULYREADONLY(sv) SvREADONLY(sv) | |
1308 | #else | |
1309 | # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv)) | |
1310 | #endif | |
1311 | ||
1312 | =head2 Copy on Write | |
1313 | ||
1314 | Perl implements a copy-on-write (COW) mechanism for scalars, in which | |
1315 | string copies are not immediately made when requested, but are deferred | |
1316 | until made necessary by one or the other scalar changing. This is mostly | |
1317 | transparent, but one must take care not to modify string buffers that are | |
1318 | shared by multiple SVs. | |
1319 | ||
1320 | You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>. | |
1321 | ||
1322 | You can force an SV to make its own copy of its string buffer by calling C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>. | |
1323 | ||
1324 | If you want to make the SV drop its string buffer, use | |
1325 | C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply | |
1326 | C<sv_setsv(sv, NULL)>. | |
1327 | ||
1328 | All of these functions will croak on read-only scalars (see the previous | |
1329 | section for more on those). | |
1330 | ||
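To make the deferred copy concrete, here is a minimal stand-alone sketch of the idea. It is a toy model with hypothetical names (C<toy_str>, C<toy_copy>, C<toy_force_normal> and so on) and does not reflect how perl actually lays out or counts shared buffers; it only illustrates sharing until the first write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, not Perl's actual COW implementation. */
typedef struct {
    char  *data;
    size_t len;
    int    owners;       /* how many toy_str values share this buffer */
} toy_buf;

typedef struct { toy_buf *buf; } toy_str;

static toy_str toy_new(const char *s)
{
    toy_str t;
    t.buf = malloc(sizeof *t.buf);
    if (!t.buf) abort();
    t.buf->len = strlen(s);
    t.buf->data = malloc(t.buf->len + 1);
    if (!t.buf->data) abort();
    memcpy(t.buf->data, s, t.buf->len + 1);
    t.buf->owners = 1;
    return t;
}

/* "Copying" only shares the buffer; no bytes are duplicated yet. */
static toy_str toy_copy(toy_str src)
{
    src.buf->owners++;
    return src;
}

/* Analogous in spirit to SvIsCOW(sv). */
static int toy_is_cow(toy_str t) { return t.buf->owners > 1; }

/* Analogous in spirit to sv_force_normal(sv): take a private copy. */
static void toy_force_normal(toy_str *t)
{
    if (!toy_is_cow(*t)) return;
    toy_buf *shared = t->buf;
    *t = toy_new(shared->data);   /* the deferred copy happens here */
    shared->owners--;
}

/* Every write goes through toy_force_normal first. */
static void toy_set_char(toy_str *t, size_t i, char c)
{
    assert(i < t->buf->len);
    toy_force_normal(t);
    t->buf->data[i] = c;
}

static void toy_free(toy_str *t)
{
    if (--t->buf->owners == 0) { free(t->buf->data); free(t->buf); }
    t->buf = NULL;
}
```

The point of routing every write through C<toy_force_normal> is the same as in the text: code that pokes at a shared buffer directly would silently change every scalar sharing it.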
1331 | To test that your code is behaving correctly and not modifying COW buffers, | |
1332 | on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with | |
1333 | C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations | |
1334 | into crashes. You will find it to be marvellously slow, so you may want to | |
1335 | skip perl's own tests. | |
1336 | ||
0a753a76 | 1337 | =head2 Magic Variables |
a0d0e21e | 1338 | |
d1b91892 AD |
1339 | [This section still under construction. Ignore everything here. Post no |
1340 | bills. Everything not permitted is forbidden.] | |
1341 | ||
d1b91892 AD |
1342 | Any SV may be magical, that is, it has special features that a normal |
1343 | SV does not have. These features are stored in the SV structure in a | |
5f05dabc | 1344 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
d1b91892 AD |
1345 | |
1346 | struct magic { | |
1347 | MAGIC* mg_moremagic; | |
1348 | MGVTBL* mg_virtual; | |
1349 | U16 mg_private; | |
1350 | char mg_type; | |
1351 | U8 mg_flags; | |
b205eb13 | 1352 | I32 mg_len; |
d1b91892 AD |
1353 | SV* mg_obj; |
1354 | char* mg_ptr; | |
d1b91892 AD |
1355 | }; |
1356 | ||
1357 | Note this is current as of patchlevel 0, and could change at any time. | |
1358 | ||
1359 | =head2 Assigning Magic | |
1360 | ||
1361 | Perl adds magic to an SV using the sv_magic function: | |
1362 | ||
a9b0660e | 1363 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
d1b91892 AD |
1364 | |
1365 | The C<sv> argument is a pointer to the SV that is to acquire a new magical | |
1366 | feature. | |
1367 | ||
1368 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to | |
10e2eb10 FC |
1369 | convert C<sv> to type C<SVt_PVMG>. |
1370 | Perl then continues by adding new magic | |
645c22ef DM |
1371 | to the beginning of the linked list of magical features. Any prior entry |
1372 | of the same type of magic is deleted. Note that this can be overridden, | |
1373 | and multiple instances of the same type of magic can be associated with an | |
1374 | SV. | |
d1b91892 | 1375 | |
54310121 | 1376 | The C<name> and C<namlen> arguments are used to associate a string with |
10e2eb10 | 1377 | the magic, typically the name of a variable. C<namlen> is stored in the |
2d8d5d5a SH |
1378 | C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of |
1379 | C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on | |
1380 | whether C<namlen> is greater than zero or equal to zero respectively. As a | |
1381 | special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed | |
1382 | to contain an C<SV*> and is stored as-is with its REFCNT incremented. | |
d1b91892 AD |
1383 | |
1384 | The sv_magic function uses C<how> to determine which, if any, predefined | |
1385 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. | |
5a0de581 | 1386 | See the L</Magic Virtual Tables> section below. The C<how> argument is also |
10e2eb10 FC |
1387 | stored in the C<mg_type> field. The value of |
1388 | C<how> should be chosen from the set of macros | |
1389 | C<PERL_MAGIC_foo> found in F<perl.h>. Note that before | |
645c22ef | 1390 | these macros were added, Perl internals used to directly use character |
14befaf4 | 1391 | literals, so you may occasionally come across old code or documentation |
75d0f26d | 1392 | referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example. |
d1b91892 AD |
1393 | |
1394 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> | |
1395 | structure. If it is not the same as the C<sv> argument, the reference | |
1396 | count of the C<obj> object is incremented. If it is the same, or if | |
27deb0cf YO |
1397 | the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>, |
1398 | C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely | |
1399 | stored, without the reference count being incremented. | |
d1b91892 | 1400 | |
2d8d5d5a SH |
1401 | See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic |
1402 | to an SV. | |
1403 | ||
cb1a09d0 AD |
1404 | There is also a function to add magic to an C<HV>: |
1405 | ||
1406 | void hv_magic(HV *hv, GV *gv, int how); | |
1407 | ||
1408 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. | |
1409 | ||
1410 | To remove the magic from an SV, call the function sv_unmagic: | |
1411 | ||
70a53b35 | 1412 | int sv_unmagic(SV *sv, int type); |
cb1a09d0 AD |
1413 | |
1414 | The C<type> argument should be equal to the C<how> value when the C<SV> | |
1415 | was initially made magical. | |
1416 | ||
f6ee7b17 | 1417 | However, note that C<sv_unmagic> removes all magic of a certain C<type> from the |
10e2eb10 FC |
1418 | C<SV>. If you want to remove only certain |
1419 | magic of a C<type> based on the magic | |
f6ee7b17 FR |
1420 | virtual table, use C<sv_unmagicext> instead: |
1421 | ||
1422 | int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); | |
1423 | ||
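The list handling described in this section can be sketched in stand-alone C. This is a toy model under stated assumptions: C<toy_magic>, C<toy_vtbl>, C<toy_magic_add> and C<toy_unmagicext> are hypothetical names, the entry carries only three of the real fields, and passing a NULL vtable to mean "any vtable" is a convention of this sketch rather than of perl. It follows the behaviour the text describes: new magic goes to the front of the list and a prior entry of the same type is removed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, far simpler than Perl's struct magic. */
typedef struct toy_vtbl { int unused; } toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *more;   /* next entry, like mg_moremagic */
    const toy_vtbl   *vtbl;   /* like mg_virtual */
    char              type;   /* like mg_type */
} toy_magic;

/* Remove entries of `type`; if vtbl is non-NULL, only those using it.
 * Returns the number of entries removed. */
static int toy_unmagicext(toy_magic **head, char type, const toy_vtbl *vtbl)
{
    int removed = 0;
    assert(head != NULL);
    while (*head) {
        toy_magic *mg = *head;
        if (mg->type == type && (vtbl == NULL || mg->vtbl == vtbl)) {
            *head = mg->more;      /* unlink, then keep scanning */
            free(mg);
            removed++;
        } else {
            head = &mg->more;
        }
    }
    return removed;
}

/* Add magic at the front of the list, replacing prior entries of the
 * same type (the default behaviour described in the text). */
static toy_magic *toy_magic_add(toy_magic **head, char type,
                                const toy_vtbl *vtbl)
{
    toy_magic *mg = malloc(sizeof *mg);
    if (!mg) abort();
    toy_unmagicext(head, type, NULL);
    mg->more = *head;
    mg->vtbl = vtbl;
    mg->type = type;
    *head = mg;
    return mg;
}

/* Count the entries currently on the list. */
static int toy_count(const toy_magic *mg)
{
    int n = 0;
    for (; mg; mg = mg->more) n++;
    return n;
}
```

Removing by type alone corresponds to the C<sv_unmagic> case, and removing by type plus vtable to the C<sv_unmagicext> case; the latter is what lets two extensions share a magic type without deleting each other's entries.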
d1b91892 AD |
1424 | =head2 Magic Virtual Tables |
1425 | ||
d1be9408 | 1426 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an |
d1b91892 AD |
1427 | C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers |
1428 | that handle the various operations that might be applied to that | |
1429 | variable. | |
1430 | ||
39988615 | 1431 | =for apidoc_section $magic |
63dbc4a9 KW |
1432 | =for apidoc Ayh||MGVTBL |
1433 | ||
301cb7e8 DM |
1434 | The C<MGVTBL> has five (or sometimes eight) pointers to the following |
1435 | routine types: | |
d1b91892 | 1436 | |
e97ca230 DM |
1437 | int (*svt_get) (pTHX_ SV* sv, MAGIC* mg); |
1438 | int (*svt_set) (pTHX_ SV* sv, MAGIC* mg); | |
1439 | U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg); | |
1440 | int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg); | |
1441 | int (*svt_free) (pTHX_ SV* sv, MAGIC* mg); | |
d1b91892 | 1442 | |
e97ca230 | 1443 | int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv, |
a9b0660e | 1444 | const char *name, I32 namlen); |
e97ca230 DM |
1445 | int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param); |
1446 | int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg); | |
301cb7e8 DM |
1447 | |
1448 | ||
06f6df17 | 1449 | This MGVTBL structure is set at compile-time in F<perl.h> and there are |
b7a0f54c S |
1450 | currently 32 types. These different structures contain pointers to various |
1451 | routines that perform additional actions depending on which function is | |
1452 | being called. | |
d1b91892 | 1453 | |
a9b0660e KW |
1454 | Function pointer Action taken |
1455 | ---------------- ------------ | |
1456 | svt_get Do something before the value of the SV is | |
1457 | retrieved. | |
1458 | svt_set Do something after the SV is assigned a value. | |
1459 | svt_len Report on the SV's length. | |
1460 | svt_clear Clear something the SV represents. | |
1461 | svt_free Free any extra storage associated with the SV. | |
d1b91892 | 1462 | |
a9b0660e KW |
1463 | svt_copy copy tied variable magic to a tied element |
1464 | svt_dup duplicate a magic structure during thread cloning | |
1465 | svt_local copy magic to local value during 'local' | |
301cb7e8 | 1466 | |
d1b91892 | 1467 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
14befaf4 | 1468 | to an C<mg_type> of C<PERL_MAGIC_sv>) contains: |
d1b91892 AD |
1469 | |
1470 | { magic_get, magic_set, magic_len, 0, 0 } | |
1471 | ||
14befaf4 DM |
1472 | Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>, |
1473 | if a get operation is being performed, the routine C<magic_get> is | |
1474 | called. All the various routines for the various magical types begin | |
1475 | with C<magic_>. NOTE: the magic routines are not considered part of | |
1476 | the Perl API, and may not be exported by the Perl library. | |
d1b91892 | 1477 | |
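The dispatch pattern just described, a table of optional function pointers consulted around each operation, can be shown with a small stand-alone sketch. Everything here is hypothetical (C<toy_scalar>, C<toy_vtbl>, C<toy_fetch>, C<toy_store>); real C<MGVTBL> callbacks take C<(pTHX_ SV *sv, MAGIC *mg)> and are invoked by the interpreter, not by code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names; real MGVTBL callbacks take
 * (pTHX_ SV *sv, MAGIC *mg) and live inside the interpreter. */
typedef struct toy_scalar toy_scalar;

typedef struct {
    int (*get)(toy_scalar *sv);   /* run before the value is read    */
    int (*set)(toy_scalar *sv);   /* run after the value is assigned */
} toy_vtbl;

struct toy_scalar {
    long            value;
    const toy_vtbl *vtbl;         /* NULL means "not magical" */
    int             gets, sets;   /* counters for the demonstration */
};

static int count_get(toy_scalar *sv) { sv->gets++; return 0; }
static int count_set(toy_scalar *sv) { sv->sets++; return 0; }

/* A table with an empty slot, like the zeros in vtbl_sv's initializer. */
static const toy_vtbl get_only    = { count_get, NULL };
static const toy_vtbl get_and_set = { count_get, count_set };

static long toy_fetch(toy_scalar *sv)
{
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->get)
        sv->vtbl->get(sv);        /* 'get' magic fires before the read */
    return sv->value;
}

static void toy_store(toy_scalar *sv, long v)
{
    assert(sv != NULL);
    sv->value = v;
    if (sv->vtbl && sv->vtbl->set)
        sv->vtbl->set(sv);        /* 'set' magic fires after the write */
}
```

A scalar using C<get_only> runs its callback on every fetch and nothing on store, which mirrors how a zero slot in a real vtable simply means "no extra action for this operation".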
301cb7e8 DM |
1478 | The last three slots are a recent addition, and for source code |
1479 | compatibility they are only checked for if one of the three flags | |
0985f7e5 | 1480 | C<MGf_COPY>, C<MGf_DUP>, or C<MGf_LOCAL> is set in mg_flags. |
10e2eb10 FC |
1481 | This means that most code can continue declaring |
1482 | a vtable as a 5-element value. These three are | |
301cb7e8 DM |
1483 | currently used exclusively by the threading code, and are highly subject |
1484 | to change. | |
1485 | ||
6ef63541 KW |
1486 | =for apidoc_section $magic |
1487 | =for apidoc Amnh||MGf_COPY | |
1488 | =for apidoc_item ||MGf_DUP | |
1489 | =for apidoc_item ||MGf_LOCAL | |
1490 | ||
d1b91892 AD |
1491 | The current kinds of Magic Virtual Tables are: |
1492 | ||
f1f5ddd7 FC |
1493 | =for comment |
1494 | This table is generated by regen/mg_vtable.pl. Any changes made here | |
1495 | will be lost. | |
1496 | ||
1497 | =for mg_vtable.pl begin | |
1498 | ||
a9b0660e | 1499 | mg_type |
bd6e6c12 FC |
1500 | (old-style char and macro) MGVTBL Type of magic |
1501 | -------------------------- ------ ------------- | |
1502 | \0 PERL_MAGIC_sv vtbl_sv Special scalar variable | |
1503 | # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) | |
e5e1ee61 | 1504 | % PERL_MAGIC_rhash (none) Extra data for restricted |
bd6e6c12 | 1505 | hashes |
a6d69523 TC |
1506 | * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace |
1507 | vars | |
bd6e6c12 | 1508 | . PERL_MAGIC_pos vtbl_pos pos() lvalue |
e5e1ee61 | 1509 | : PERL_MAGIC_symtab (none) Extra data for symbol |
bd6e6c12 | 1510 | tables |
e5e1ee61 FC |
1511 | < PERL_MAGIC_backref vtbl_backref For weak ref data |
1512 | @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV | |
2f920c2f | 1513 | B PERL_MAGIC_bm vtbl_regexp Boyer-Moore |
bd6e6c12 | 1514 | (fast string search) |
2f920c2f | 1515 | c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table |
bd6e6c12 | 1516 | (AMT) on stash |
2f920c2f | 1517 | D PERL_MAGIC_regdata vtbl_regdata Regex match position data |
bd6e6c12 FC |
1518 | (@+ and @- vars) |
1519 | d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data | |
1520 | element | |
1521 | E PERL_MAGIC_env vtbl_env %ENV hash | |
1522 | e PERL_MAGIC_envelem vtbl_envelem %ENV hash element | |
2f920c2f | 1523 | f PERL_MAGIC_fm vtbl_regexp Formline |
bd6e6c12 | 1524 | ('compiled' format) |
bd6e6c12 FC |
1525 | g PERL_MAGIC_regex_global vtbl_mglob m//g target |
1526 | H PERL_MAGIC_hints vtbl_hints %^H hash | |
1527 | h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element | |
1528 | I PERL_MAGIC_isa vtbl_isa @ISA array | |
1529 | i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element | |
1530 | k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue | |
1531 | L PERL_MAGIC_dbfile (none) Debugger %_<filename | |
1532 | l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename | |
1533 | element | |
1534 | N PERL_MAGIC_shared (none) Shared between threads | |
1535 | n PERL_MAGIC_shared_scalar (none) Shared between threads | |
1536 | o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation | |
1537 | P PERL_MAGIC_tied vtbl_pack Tied array or hash | |
1538 | p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element | |
1539 | q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle | |
e5e1ee61 | 1540 | r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex |
55f5e765 | 1541 | S PERL_MAGIC_sig vtbl_sig %SIG hash |
bd6e6c12 FC |
1542 | s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element |
1543 | t PERL_MAGIC_taint vtbl_taint Taintedness | |
1544 | U PERL_MAGIC_uvar vtbl_uvar Available for use by | |
1545 | extensions | |
1546 | u PERL_MAGIC_uvar_elem (none) Reserved for use by | |
1547 | extensions | |
4499db73 | 1548 | V PERL_MAGIC_vstring (none) SV was vstring literal |
bd6e6c12 FC |
1549 | v PERL_MAGIC_vec vtbl_vec vec() lvalue |
1550 | w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information | |
2f920c2f | 1551 | X PERL_MAGIC_destruct vtbl_destruct destruct callback |
bd6e6c12 | 1552 | x PERL_MAGIC_substr vtbl_substr substr() lvalue |
1f1dcfb5 FC |
1553 | Y PERL_MAGIC_nonelem vtbl_nonelem Array element that does not |
1554 | exist | |
bd6e6c12 FC |
1555 | y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator |
1556 | variable / smart parameter | |
1557 | vivification | |
93f6f965 YO |
1558 | Z PERL_MAGIC_hook vtbl_hook %{^HOOK} hash |
1559 | z PERL_MAGIC_hookelem vtbl_hookelem %{^HOOK} hash element | |
baabe3fb FC |
1560 | \ PERL_MAGIC_lvref vtbl_lvref Lvalue reference |
1561 | constructor | |
e5e1ee61 | 1562 | ] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call |
bd6e6c12 | 1563 | to this CV |
3e510e80 LT |
1564 | ^ PERL_MAGIC_extvalue (none) Value magic available for |
1565 | use by extensions | |
1566 | ~ PERL_MAGIC_ext (none) Variable magic available | |
1567 | for use by extensions | |
0cbee0a4 | 1568 | |
ed48408e | 1569 | |
7d61aa1c | 1570 | =for apidoc_section $magic |
4295f56c KW |
1571 | =for apidoc AmnhU||PERL_MAGIC_arylen |
1572 | =for apidoc_item ||PERL_MAGIC_arylen_p | |
1573 | =for apidoc_item ||PERL_MAGIC_backref | |
1574 | =for apidoc_item ||PERL_MAGIC_bm | |
1575 | =for apidoc_item ||PERL_MAGIC_checkcall | |
1576 | =for apidoc_item ||PERL_MAGIC_collxfrm | |
1577 | =for apidoc_item ||PERL_MAGIC_dbfile | |
1578 | =for apidoc_item ||PERL_MAGIC_dbline | |
1579 | =for apidoc_item ||PERL_MAGIC_debugvar | |
1580 | =for apidoc_item ||PERL_MAGIC_defelem | |
2f920c2f | 1581 | =for apidoc_item ||PERL_MAGIC_destruct |
4295f56c KW |
1582 | =for apidoc_item ||PERL_MAGIC_env |
1583 | =for apidoc_item ||PERL_MAGIC_envelem | |
1584 | =for apidoc_item ||PERL_MAGIC_ext | |
3e510e80 | 1585 | =for apidoc_item ||PERL_MAGIC_extvalue |
4295f56c KW |
1586 | =for apidoc_item ||PERL_MAGIC_fm |
1587 | =for apidoc_item ||PERL_MAGIC_hints | |
1588 | =for apidoc_item ||PERL_MAGIC_hintselem | |
93f6f965 YO |
1589 | =for apidoc_item ||PERL_MAGIC_hook |
1590 | =for apidoc_item ||PERL_MAGIC_hookelem | |
4295f56c KW |
1591 | =for apidoc_item ||PERL_MAGIC_isa |
1592 | =for apidoc_item ||PERL_MAGIC_isaelem | |
1593 | =for apidoc_item ||PERL_MAGIC_lvref | |
1594 | =for apidoc_item ||PERL_MAGIC_nkeys | |
1595 | =for apidoc_item ||PERL_MAGIC_nonelem | |
1596 | =for apidoc_item ||PERL_MAGIC_overload_table | |
1597 | =for apidoc_item ||PERL_MAGIC_pos | |
1598 | =for apidoc_item ||PERL_MAGIC_qr | |
1599 | =for apidoc_item ||PERL_MAGIC_regdata | |
1600 | =for apidoc_item ||PERL_MAGIC_regdatum | |
1601 | =for apidoc_item ||PERL_MAGIC_regex_global | |
1602 | =for apidoc_item ||PERL_MAGIC_rhash | |
1603 | =for apidoc_item ||PERL_MAGIC_shared | |
1604 | =for apidoc_item ||PERL_MAGIC_shared_scalar | |
1605 | =for apidoc_item ||PERL_MAGIC_sig | |
1606 | =for apidoc_item ||PERL_MAGIC_sigelem | |
1607 | =for apidoc_item ||PERL_MAGIC_substr | |
1608 | =for apidoc_item ||PERL_MAGIC_sv | |
1609 | =for apidoc_item ||PERL_MAGIC_symtab | |
1610 | =for apidoc_item ||PERL_MAGIC_taint | |
1611 | =for apidoc_item ||PERL_MAGIC_tied | |
1612 | =for apidoc_item ||PERL_MAGIC_tiedelem | |
1613 | =for apidoc_item ||PERL_MAGIC_tiedscalar | |
1614 | =for apidoc_item ||PERL_MAGIC_utf8 | |
1615 | =for apidoc_item ||PERL_MAGIC_uvar | |
1616 | =for apidoc_item ||PERL_MAGIC_uvar_elem | |
1617 | =for apidoc_item ||PERL_MAGIC_vec | |
1618 | =for apidoc_item ||PERL_MAGIC_vstring | |
ed48408e | 1619 | |
f1f5ddd7 | 1620 | =for mg_vtable.pl end |
d1b91892 | 1621 | |
68dc0745 | 1622 | When an uppercase and lowercase letter both exist in the table, then the |
92f0c265 JP |
1623 | uppercase letter is typically used to represent some kind of composite type |
1624 | (a list or a hash), and the lowercase letter is used to represent an element | |
10e2eb10 | 1625 | of that composite type. Some internals code makes use of this case |
92f0c265 | 1626 | relationship. However, 'v' and 'V' (vec and v-string) are in no way related. |
14befaf4 | 1627 | |
3e510e80 LT |
1628 | The C<PERL_MAGIC_ext>, C<PERL_MAGIC_extvalue> and C<PERL_MAGIC_uvar> magic types |
1629 | are defined specifically for use by extensions and will not be used by perl | |
1630 | itself. Extensions can use C<PERL_MAGIC_ext> or C<PERL_MAGIC_extvalue> magic to | |
1631 | 'attach' private information to variables (typically objects). This is | |
1632 | especially useful because there is no way for normal perl code to corrupt this | |
1633 | private information (unlike using extra elements of a hash object). | |
1634 | C<PERL_MAGIC_extvalue> is value magic (unlike C<PERL_MAGIC_ext> and | |
1635 | C<PERL_MAGIC_uvar>) meaning that on localization the new value will not be | |
1636 | magical. | |
14befaf4 DM |
1637 | |
1638 | Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a | |
1639 | C function any time a scalar's value is used or changed. The C<MAGIC>'s | |
bdbeb323 SM |
1640 | C<mg_ptr> field points to a C<ufuncs> structure: |
1641 | ||
1642 | struct ufuncs { | |
a9402793 AB |
1643 | I32 (*uf_val)(pTHX_ IV, SV*); |
1644 | I32 (*uf_set)(pTHX_ IV, SV*); | |
bdbeb323 SM |
1645 | IV uf_index; |
1646 | }; | |
1647 | ||
1648 | When the SV is read from or written to, the C<uf_val> or C<uf_set> | |
14befaf4 DM |
1649 | function will be called with C<uf_index> as the first arg and a pointer to |
1650 | the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> | |
1526ead6 AB |
1651 | magic is shown below. Note that the ufuncs structure is copied by |
1652 | sv_magic, so you can safely allocate it on the stack. | |
1653 | ||
1654 | void | |
1655 | Umagic(sv) | |
1656 | SV *sv; | |
1657 | PREINIT: | |
1658 | struct ufuncs uf; | |
1659 | CODE: | |
1660 | uf.uf_val = &my_get_fn; | |
1661 | uf.uf_set = &my_set_fn; | |
1662 | uf.uf_index = 0; | |
14befaf4 | 1663 | sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf)); |
5f05dabc | 1664 | |
1e73acc8 AS |
1665 | Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect. |
1666 | ||
1667 | For hashes there is a specialized hook that gives control over hash | |
1668 | keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic | |
1669 | if the "set" function in the C<ufuncs> structure is NULL. The hook | |
1670 | is activated whenever the hash is accessed with a key specified as | |
1671 | an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>, | |
1672 | C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string | |
1673 | through the functions without the C<..._ent> suffix circumvents the | |
4509d391 | 1674 | hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description. |
1e73acc8 | 1675 | |
14befaf4 DM |
1676 | Note that because multiple extensions may be using C<PERL_MAGIC_ext> |
1677 | or C<PERL_MAGIC_uvar> magic, it is important for extensions to take | |
1678 | extra care to avoid conflict. Typically only using the magic on | |
1679 | objects blessed into the same class as the extension is sufficient. | |
2f07f21a FR |
1680 | For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an |
1681 | C<MGVTBL>, even if all its fields will be C<0>, so that individual | |
1682 | C<MAGIC> pointers can be identified as a particular kind of magic | |
10e2eb10 | 1683 | using their magic virtual table. C<mg_findext> provides an easy way |
f6ee7b17 | 1684 | to do that: |
2f07f21a FR |
1685 | |
1686 | STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 }; | |
1687 | ||
1688 | MAGIC *mg; | |
f6ee7b17 FR |
1689 | if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) { |
1690 | /* this is really ours, not another module's PERL_MAGIC_ext */ | |
1691 | my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr; | |
1692 | ... | |
2f07f21a | 1693 | } |
5f05dabc | 1694 | |
ef50df4b GS |
1695 | Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
1696 | earlier do B<not> invoke 'set' magic on their targets. This must | |
1697 | be done by the user either by calling the C<SvSETMAGIC()> macro after | |
1698 | calling these functions, or by using one of the C<sv_set*_mg()> or | |
1699 | C<sv_cat*_mg()> functions. Similarly, generic C code must call the | |
1700 | C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV | |
1701 | obtained from external sources in functions that don't handle magic. | |
4a4eefd0 | 1702 | See L<perlapi> for a description of these functions. |
189b2af5 GS |
1703 | For example, calls to the C<sv_cat*()> functions typically need to be |
1704 | followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> | |
1705 | since their implementation handles 'get' magic. | |
1706 | ||
d1b91892 AD |
1707 | =head2 Finding Magic |
1708 | ||
a9b0660e KW |
1709 | MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that |
1710 | * type */ | |
f6ee7b17 FR |
1711 | |
1712 | This routine returns a pointer to a C<MAGIC> structure stored in the SV. | |
10e2eb10 FC |
1713 | If the SV does not have that magical |
1714 | feature, C<NULL> is returned. If the | |
f6ee7b17 | 1715 | SV has multiple instances of that magical feature, the first one will be |
10e2eb10 FC |
1716 | returned. C<mg_findext> can be used |
1717 | to find a C<MAGIC> structure of an SV | |
da8c5729 | 1718 | based on both its magic type and its magic virtual table: |
f6ee7b17 FR |
1719 | |
1720 | MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); | |
d1b91892 | 1721 | |
f6ee7b17 FR |
1722 | Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type |
1723 | C<SVt_PVMG>, Perl may core dump. | |
d1b91892 | 1724 | |
08105a92 | 1725 | int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
d1b91892 AD |
1726 | |
1727 | This routine checks to see what types of magic C<sv> has. If the mg_type | |
68dc0745 | 1728 | field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
1729 | the mg_type field is changed to be the lowercase letter. | |
a0d0e21e | 1730 | |
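The case convention that C<mg_copy> relies on can be expressed as a one-line mapping. The helper below is a hypothetical illustration (C<toy_elem_magic_type> is not a Perl API function) and covers only the letter-case rule stated above; the real C<mg_copy> does considerably more than this.

```c
#include <assert.h>
#include <ctype.h>

/* Toy helper, hypothetical name: the element-level magic type that
 * corresponds to a container-level one, or 0 if `type` is not an
 * uppercase letter. */
static char toy_elem_magic_type(char type)
{
    if (isupper((unsigned char)type))
        return (char)tolower((unsigned char)type);
    return 0;
}
```

For example, the tied-container type C<'P'> maps to the tied-element type C<'p'>, and C<'E'> (C<%ENV>) maps to C<'e'>. The mapping is purely mechanical, so it says nothing about pairs such as C<'V'> and C<'v'>, which the table notes are unrelated.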
04343c6d GS |
1731 | =head2 Understanding the Magic of Tied Hashes and Arrays |
1732 | ||
14befaf4 DM |
1733 | Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> |
1734 | magic type. | |
9edb2b46 GS |
1735 | |
1736 | WARNING: As of the 5.004 release, proper usage of the array and hash | |
1737 | access functions requires understanding a few caveats. Some | |
1738 | of these caveats are actually considered bugs in the API, to be fixed | |
10e2eb10 | 1739 | in later releases, and are bracketed with [MAYCHANGE] below. If |
9edb2b46 GS |
1740 | you find yourself actually applying such information in this section, be |
1741 | aware that the behavior may change in the future, umm, without warning. | |
04343c6d | 1742 | |
1526ead6 | 1743 | The perl tie function associates a variable with an object that implements |
9a68f1db | 1744 | the various GET, SET, etc. methods. To perform the equivalent of the perl |
1526ead6 | 1745 | tie function from an XSUB, you must mimic this behaviour. The code below |
61ad4b94 | 1746 | carries out the necessary steps -- firstly it creates a new hash, and then |
1526ead6 | 1747 | creates a second hash which it blesses into the class which will implement |
10e2eb10 | 1748 | the tie methods. Lastly it ties the two hashes together, and returns a |
1526ead6 AB |
1749 | reference to the new tied hash. Note that the code below does NOT call the |
1750 | TIEHASH method in the MyTie class - | |
5a0de581 | 1751 | see L</Calling Perl Routines from within C Programs> for details on how |
1526ead6 AB |
1752 | to do this. |
1753 | ||
1754 | SV* | |
1755 | mytie() | |
1756 | PREINIT: | |
1757 | HV *hash; | |
1758 | HV *stash; | |
1759 | SV *tie; | |
1760 | CODE: | |
1761 | hash = newHV(); | |
1762 | tie = newRV_noinc((SV*)newHV()); | |
da51bb9b | 1763 | stash = gv_stashpv("MyTie", GV_ADD); |
1526ead6 | 1764 | sv_bless(tie, stash); |
899e16d0 | 1765 | hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); |
1526ead6 AB |
1766 | RETVAL = newRV_noinc(hash); |
1767 | OUTPUT: | |
1768 | RETVAL | |
1769 | ||
04343c6d GS |
1770 | The C<av_store> function, when given a tied array argument, merely |
1771 | copies the magic of the array onto the value to be "stored", using | |
1772 | C<mg_copy>. It may also return NULL, indicating that the value did not | |
9edb2b46 GS |
1773 | actually need to be stored in the array. [MAYCHANGE] After a call to |
1774 | C<av_store> on a tied array, the caller will usually need to call | |
1775 | C<mg_set(val)> to actually invoke the perl level "STORE" method on the | |
1776 | TIEARRAY object. If C<av_store> did return NULL, a call to | |
1777 | C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory | |
1778 | leak. [/MAYCHANGE] | |
04343c6d GS |
1779 | |
1780 | The previous paragraph is applicable verbatim to tied hash access using the | |
1781 | C<hv_store> and C<hv_store_ent> functions as well. | |
1782 | ||
1783 | C<av_fetch> and the corresponding hash functions C<hv_fetch> and | |
1784 | C<hv_fetch_ent> actually return an undefined mortal value whose magic | |
1785 | has been initialized using C<mg_copy>. Note the value so returned does not | |
9edb2b46 GS |
1786 | need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
1787 | need to call C<mg_get()> on the returned value in order to actually invoke | |
1788 | the perl level "FETCH" method on the underlying TIE object. Similarly, | |
04343c6d GS |
1789 | you may also call C<mg_set()> on the return value after possibly assigning |
1790 | a suitable value to it using C<sv_setsv>, which will invoke the "STORE" | |
9edb2b46 | 1791 | method on the TIE object. [/MAYCHANGE] |
04343c6d | 1792 | |
9edb2b46 | 1793 | [MAYCHANGE] |
04343c6d GS |
1794 | In other words, the array or hash fetch/store functions don't really |
1795 | fetch and store actual values in the case of tied arrays and hashes. They | |
1796 | merely call C<mg_copy> to attach magic to the values that were meant to be | |
1797 | "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually | |
1798 | do the job of invoking the TIE methods on the underlying objects. Thus | |
9edb2b46 | 1799 | the magic mechanism currently implements a kind of lazy access to arrays |
04343c6d GS |
1800 | and hashes. |
1801 | ||
1802 | Currently (as of perl version 5.004), use of the hash and array access | |
1803 | functions requires the user to be aware of whether they are operating on | |
9edb2b46 GS |
1804 | "normal" hashes and arrays, or on their tied variants. The API may be |
1805 | changed to provide more transparent access to both tied and normal data | |
1806 | types in future versions. | |
1807 | [/MAYCHANGE] | |
04343c6d GS |
1808 | |
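The "lazy access" idea in the bracketed paragraphs above can be modelled without any Perl at all. In this hypothetical sketch (C<toy_tied>, C<toy_proxy>, C<toy_fetch> and C<toy_get> are invented names), fetching from a tied container only builds a proxy that remembers where to look; the user-supplied callback standing in for FETCH runs later, when the proxy is actually read.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names. A "tied" container whose fetch
 * merely hands back a proxy; the FETCH callback runs on toy_get(). */
typedef struct {
    long (*fetch)(const char *key);   /* stands in for the FETCH method */
    int   fetch_calls;
} toy_tied;

typedef struct {
    toy_tied   *owner;    /* copied "magic": which object to ask */
    const char *key;      /* and which key to ask it about */
    long        value;    /* not meaningful until toy_get() runs */
    int         loaded;
} toy_proxy;

/* Like hv_fetch on a tied hash: attach magic, do not call FETCH. */
static toy_proxy toy_fetch(toy_tied *t, const char *key)
{
    toy_proxy p = { t, key, 0, 0 };
    return p;
}

/* Like mg_get: this is where FETCH is finally invoked. */
static long toy_get(toy_proxy *p)
{
    assert(p != NULL && p->owner != NULL);
    p->value = p->owner->fetch(p->key);
    p->owner->fetch_calls++;
    p->loaded = 1;
    return p->value;
}

/* A sample FETCH stand-in: the "value" of a key is its length. */
static long key_length(const char *key) { return (long)strlen(key); }
```

Forgetting the C<toy_get> step leaves the proxy holding no useful value, which is the toy equivalent of omitting C<mg_get()> after C<hv_fetch> on a tied hash.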
1809 | You would do well to understand that the TIEARRAY and TIEHASH interfaces | |
1810 | are mere sugar to invoke some perl method calls while using the uniform hash | |
1811 | and array syntax. The use of this sugar imposes some overhead (typically | |
1812 | about two to four extra opcodes per FETCH/STORE operation, in addition to | |
1813 | the creation of all the mortal variables required to invoke the methods). | |
1814 | This overhead will be comparatively small if the TIE methods are themselves | |
1815 | substantial, but if they are only a few statements long, the overhead | |
1816 | will not be insignificant. | |
1817 | ||
d1c897a1 IZ |
1818 | =head2 Localizing changes |
1819 | ||
1820 | Perl has a very handy construction | |
1821 | ||
1822 | { | |
1823 | local $var = 2; | |
1824 | ... | |
1825 | } | |
1826 | ||
1827 | This construction is I<approximately> equivalent to | |
1828 | ||
1829 | { | |
1830 | my $oldvar = $var; | |
1831 | $var = 2; | |
1832 | ... | |
1833 | $var = $oldvar; | |
1834 | } | |
1835 | ||
1836 | The biggest difference is that the first construction would | |
1837 | reinstate the initial value of $var, irrespective of how control exits | |
10e2eb10 | 1838 | the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit |
d1c897a1 IZ |
1839 | more efficient as well. |
1840 | ||
1841 | There is a way to achieve a similar task from C via the Perl API: create a | |
1842 | I<pseudo-block>, and arrange for some changes to be automatically | |
1843 | undone at the end of it, either explicitly, or via a non-local exit (via | |
10e2eb10 | 1844 | die()). A I<block>-like construct is created by a pair of |
b687b08b TC |
1845 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
1846 | Such a construct may be created specially for some important localized | |
1847 | task, or an existing one (like boundaries of enclosing Perl | |
1848 | subroutine/block, or an existing pair for freeing TMPs) may be | |
10e2eb10 FC |
1849 | used. (In the second case the overhead of additional localization must |
1850 | be almost negligible.) Note that any XSUB is automatically enclosed in | |
b687b08b | 1851 | an C<ENTER>/C<LEAVE> pair. |
d1c897a1 IZ |
1852 | |
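The mechanism behind such a I<pseudo-block> is essentially a stack of "undo" records plus a marker for where each block began. The following stand-alone sketch models that for C<int> variables only; C<toy_enter>, C<toy_saveint> and C<toy_leave> are hypothetical stand-ins for C<ENTER>, C<SAVEINT> and C<LEAVE>, and the fixed-size arrays are an assumption made to keep the example short, not a description of perl's savestack.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Perl's savestack. */
#define TOY_SAVE_MAX 32

typedef struct { int *where; int old; } toy_saved_int;

static toy_saved_int toy_stack[TOY_SAVE_MAX];
static size_t toy_top;                   /* next free slot */
static size_t toy_marks[TOY_SAVE_MAX];   /* stack height at each enter */
static size_t toy_depth;

/* Like ENTER: remember how tall the save stack is right now. */
static void toy_enter(void)
{
    assert(toy_depth < TOY_SAVE_MAX);
    toy_marks[toy_depth++] = toy_top;
}

/* Like SAVEINT(i): record the address and current value of an int. */
static void toy_saveint(int *p)
{
    assert(p != NULL && toy_top < TOY_SAVE_MAX);
    toy_stack[toy_top].where = p;
    toy_stack[toy_top].old = *p;
    toy_top++;
}

/* Like LEAVE: undo everything saved since the matching toy_enter(),
 * newest first. */
static void toy_leave(void)
{
    assert(toy_depth > 0);
    size_t mark = toy_marks[--toy_depth];
    while (toy_top > mark) {
        toy_top--;
        *toy_stack[toy_top].where = toy_stack[toy_top].old;
    }
}
```

Because restoration is driven by the stack rather than by code at the end of the block, nested blocks unwind in the right order, and an early exit only needs to call the "leave" step to put every saved variable back.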
1853 | Inside such a I<pseudo-block> the following services are available: | |
1854 | ||
13a2d996 | 1855 | =over 4 |
d1c897a1 IZ |
1856 | |
1857 | =item C<SAVEINT(int i)> | |
1858 | ||
1859 | =item C<SAVEIV(IV i)> | |
1860 | ||
1861 | =item C<SAVEI32(I32 i)> | |
1862 | ||
1863 | =item C<SAVELONG(long i)> | |
1864 | ||
6c53e783 KW |
1865 | =item C<SAVEI8(I8 i)> |
1866 | ||
1867 | =item C<SAVEI16(I16 i)> | |
1868 | ||
1869 | =item C<SAVEBOOL(bool i)> | |
1870 | ||
58541fd0 PE |
1871 | =item C<SAVESTRLEN(STRLEN i)> |
1872 | ||
d1c897a1 | 1873 | These macros arrange things to restore the value of integer variable |
88d9f68d | 1874 | C<i> at the end of the enclosing I<pseudo-block>. |
d1c897a1 | 1875 | |
7cc7ada7 | 1876 | =for apidoc_section $callback |
9144f9d9 KW |
1877 | =for apidoc Amh||SAVEINT|int i |
1878 | =for apidoc Amh||SAVEIV|IV i | |
1879 | =for apidoc Amh||SAVEI32|I32 i | |
1880 | =for apidoc Amh||SAVELONG|long i | |
1881 | =for apidoc Amh||SAVEI8|I8 i | |
1882 | =for apidoc Amh||SAVEI16|I16 i | |
d633272a | 1883 | =for apidoc Amh||SAVEBOOL|bool i |
58541fd0 | 1884 | =for apidoc Amh||SAVESTRLEN|STRLEN i |
9144f9d9 | 1885 | |
d1c897a1 IZ |
1886 | =item C<SAVESPTR(s)> |
1887 | ||
1888 | =item C<SAVEPPTR(p)> | |
1889 | ||
1890 | These macros arrange things to restore the value of pointers C<s> and | |
10e2eb10 | 1891 | C<p>. C<s> must be a pointer of a type which survives conversion to |
d1c897a1 IZ |
1892 | C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
1893 | and back. | |
1894 | ||
9144f9d9 KW |
1895 | =for apidoc Amh||SAVESPTR|SV * s |
1896 | =for apidoc Amh||SAVEPPTR|char * p | |
1897 | ||
624f6f53 YO |
1898 | =item C<SAVERCPV(char **ppv)> |
1899 | ||
1900 | This macro arranges to restore the value of a C<char *> variable which | |
1901 | was allocated with a call to C<rcpv_new()> to its previous state when | |
1902 | the current pseudo block is completed. The pointer stored in C<*ppv> at | |
1903 | the time of the call will be refcount incremented and stored on the save | |
1904 | stack. Later when the current I<pseudo-block> is completed the value | |
1905 | stored in C<*ppv> will be refcount decremented, and the previous value | |
1906 | restored from the savestack which will also be refcount decremented. | |
1907 | ||
1908 | This is the C<RCPV> equivalent of C<SAVEGENERICSV()>. | |
1909 | ||
1910 | =for apidoc Amh||SAVERCPV|char *pv | |
1911 | ||
1912 | =item C<SAVEGENERICSV(SV **psv)> | |
1913 | ||
1914 | This macro arranges to restore the value of a C<SV *> variable to its | |
1915 | previous state when the current pseudo block is completed. The pointer | |
1916 | stored in C<*psv> at the time of the call will be refcount incremented | |
1917 | and stored on the save stack. Later when the current I<pseudo-block> is | |
1918 | completed the value stored in C<*psv> will be refcount decremented, and | |
1919 | the previous value restored from the savestack which will also be refcount | |
1920 | decremented. This is the C equivalent of C<local $sv>. | |
1921 | ||
1922 | =for apidoc Amh||SAVEGENERICSV|SV **psv | |
1923 | ||
d1c897a1 IZ |
1924 | =item C<SAVEFREESV(SV *sv)> |
1925 | ||
06f1e0b6 | 1926 | The refcount of C<sv> will be decremented at the end of |
26d9b02f JH |
1927 | I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a |
1928 | mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> | |
1929 | extends the lifetime of C<sv> until the beginning of the next statement, | |
1930 | C<SAVEFREESV> extends it until the end of the enclosing scope. These | |
1931 | lifetimes can be wildly different. | |
1932 | ||
1933 | Also compare C<SAVEMORTALIZESV>. | |
1934 | ||
9144f9d9 KW |
1935 | =for apidoc Amh||SAVEFREESV|SV* sv |
1936 | ||
26d9b02f JH |
1937 | =item C<SAVEMORTALIZESV(SV *sv)> |
1938 | ||
1939 | Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current | |
1940 | scope instead of decrementing its reference count. This usually has the | |
1941 | effect of keeping C<sv> alive until the statement that called the currently | |
1942 | live scope has finished executing. | |
d1c897a1 | 1943 | |
9144f9d9 KW |
1944 | =for apidoc Amh||SAVEMORTALIZESV|SV* sv |
1945 | ||
d1c897a1 IZ |
1946 | =item C<SAVEFREEOP(OP *op)> |
1947 | ||
624f6f53 | 1948 | The C<OP *> is C<op_free()>ed at the end of I<pseudo-block>. |
d1c897a1 | 1949 | |
9144f9d9 KW |
1950 | =for apidoc Amh||SAVEFREEOP|OP *op |
1951 | ||
d1c897a1 IZ |
1952 | =item C<SAVEFREEPV(p)> |
1953 | ||
624f6f53 YO |
1954 | The chunk of memory which is pointed to by C<p> is C<Safefree()>ed at the |
1955 | end of the current I<pseudo-block>. | |
1956 | ||
1957 | =for apidoc Amh||SAVEFREEPV|char *pv | |
1958 | ||
1959 | =item C<SAVEFREERCPV(char *pv)> | |
1960 | ||
1961 | Ensures that a C<char *> which was created by a call to C<rcpv_new()> is | |
1962 | C<rcpv_free()>ed at the end of the current I<pseudo-block>. | |
1963 | ||
1964 | This is the RCPV equivalent of C<SAVEFREESV()>. | |
d1c897a1 | 1965 | |
624f6f53 | 1966 | =for apidoc Amh||SAVEFREERCPV|char *pv |
9144f9d9 | 1967 | |
d1c897a1 IZ |
1968 | =item C<SAVECLEARSV(SV *sv)> |
1969 | ||
1970 | Clears a slot in the current scratchpad which corresponds to C<sv> at | |
1971 | the end of I<pseudo-block>. | |
1972 | ||
1973 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)> | |
1974 | ||
10e2eb10 | 1975 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
d1c897a1 IZ |
1976 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
1977 | short-lived storage, the corresponding string may be reallocated like | |
1978 | this: | |
1979 | ||
9cde0e7f | 1980 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
d1c897a1 | 1981 | |
9144f9d9 KW |
1982 | =for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length |
1983 | ||
c76ac1ee | 1984 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
d1c897a1 IZ |
1985 | |
1986 | At the end of I<pseudo-block> the function C<f> is called with the | |
2f920c2f | 1987 | only argument C<p> which may be NULL. |
c76ac1ee | 1988 | |
63dbc4a9 | 1989 | =for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t |
9144f9d9 KW |
1990 | =for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p |
1991 | ||
c76ac1ee GS |
1992 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> |
1993 | ||
1994 | At the end of I<pseudo-block> the function C<f> is called with the | |
2f920c2f YO |
1995 | implicit context argument (if any), and C<p> which may be NULL. |
1996 | ||
1997 | Note the I<end of the current pseudo-block> may occur much later than | |
493e6288 | 1998 | the I<end of the current statement>. You may wish to look at the |
475dc022 | 1999 | C<MORTALSVFUNC_X()> macro instead. |
d1c897a1 | 2000 | |
63dbc4a9 KW |
2001 | =for apidoc Ayh||DESTRUCTORFUNC_t |
2002 | =for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p | |
9144f9d9 | 2003 | |
2f920c2f YO |
2004 | =item C<MORTALSVFUNC_X(SVFUNC_t f, SV *sv)> |
2005 | ||
2006 | At the end of I<the current statement> the function C<f> is called with | |
2007 | the implicit context argument (if any), and C<sv> which may be NULL. | |
2008 | ||
2009 | Be aware that the parameter argument to the destructor function differs | |
2010 | from the related C<SAVEDESTRUCTOR_X()> in that it MUST be either NULL or | |
2011 | an C<SV*>. | |
2012 | ||
2013 | Note the I<end of the current statement> may occur much earlier than | |
2014 | the I<end of the current pseudo-block>. You may wish to look at the | |
2015 | C<SAVEDESTRUCTOR_X()> macro instead. | |
2016 | ||
475dc022 | 2017 | =for apidoc Amh||MORTALSVFUNC_X|SVFUNC_t f|SV *sv |
2f920c2f YO |
2018 | |
2019 | =item C<MORTALDESTRUCTOR_SV(SV *coderef, SV *args)> | |
2020 | ||
2021 | At the end of I<the current statement> the Perl function contained in | |
2022 | C<coderef> is called with the arguments provided (if any) in C<args>. | |
2023 | See the documentation for C<mortal_destructor_sv()> for details on | |
2024 | how the C<args> parameter is handled. | |
2025 | ||
2026 | Note the I<end of the current statement> may occur much earlier than | |
2027 | the I<end of the current pseudo-block>. If you wish to call a Perl | |
2028 | function at the end of the current pseudo-block you should use the | |
2029 | C<SAVEDESTRUCTOR_X()> API instead, which will require you to create a | |
2030 | C wrapper to call the Perl function. | |
2031 | ||
2032 | =for apidoc Amh||MORTALDESTRUCTOR_SV|SV *coderef|SV *args | |
2033 | ||
d1c897a1 IZ |
2034 | =item C<SAVESTACK_POS()> |
2035 | ||
2036 | The current offset on the Perl internal stack (cf. C<SP>) is restored | |
2037 | at the end of I<pseudo-block>. | |
2038 | ||
9144f9d9 KW |
2039 | =for apidoc Amh||SAVESTACK_POS |
2040 | ||
d1c897a1 IZ |
2041 | =back |
2042 | ||
2043 | The following API list contains functions, thus one needs to | |
2044 | provide pointers to the modifiable data explicitly (either C pointers, | |
00aadd71 | 2045 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
d1c897a1 IZ |
2046 | function takes C<int *>. |
2047 | ||
9144f9d9 KW |
2048 | Other macros above have functions implementing them, but it's probably |
2049 | best to just use the macro, and not those or the ones below. | |
2050 | ||
13a2d996 | 2051 | =over 4 |
d1c897a1 IZ |
2052 | |
2053 | =item C<SV* save_scalar(GV *gv)> | |
2054 | ||
4f313521 KW |
2055 | =for apidoc save_scalar |
2056 | ||
d1c897a1 IZ |
2057 | Equivalent to Perl code C<local $gv>. |
2058 | ||
2059 | =item C<AV* save_ary(GV *gv)> | |
2060 | ||
4f313521 KW |
2061 | =for apidoc save_ary |
2062 | ||
d1c897a1 IZ |
2063 | =item C<HV* save_hash(GV *gv)> |
2064 | ||
4f313521 KW |
2065 | =for apidoc save_hash |
2066 | ||
d1c897a1 IZ |
2067 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. |
2068 | ||
2069 | =item C<void save_item(SV *item)> | |
2070 | ||
53dedf6f KW |
2071 | =for apidoc save_item |
2072 | ||
2073 | Duplicates the current value of C<SV>. On the exit from the current | |
2074 | C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored | |
10e2eb10 | 2075 | using the stored value. It doesn't handle magic. Use C<save_scalar> if |
038fcae3 | 2076 | magic is affected. |
d1c897a1 | 2077 | |
d1c897a1 IZ |
2078 | =item C<SV* save_svref(SV **sptr)> |
2079 | ||
4f313521 KW |
2080 | =for apidoc save_svref |
2081 | ||
d1be9408 | 2082 | Similar to C<save_scalar>, but will reinstate an C<SV *>. |
d1c897a1 IZ |
2083 | |
2084 | =item C<void save_aptr(AV **aptr)> | |
2085 | ||
2086 | =item C<void save_hptr(HV **hptr)> | |
2087 | ||
4f313521 KW |
2088 | =for apidoc save_aptr |
2089 | =for apidoc save_hptr | |
2090 | ||
d1c897a1 IZ |
2091 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. |
2092 | ||
2093 | =back | |
2094 | ||
2095 | The C<Alias> module implements localization of the basic types within the | |
2096 | I<caller's scope>. People who are interested in how to localize things in | |
2097 | the containing scope should take a look there too. | |
2098 | ||
0a753a76 | 2099 | =head1 Subroutines |
a0d0e21e | 2100 | |
68dc0745 | 2101 | =head2 XSUBs and the Argument Stack |
5f05dabc | 2102 | |
2103 | The XSUB mechanism is a simple way for Perl programs to access C subroutines. | |
2104 | An XSUB routine will have a stack that contains the arguments from the Perl | |
2105 | program, and a way to map from the Perl data structures to a C equivalent. | |
2106 | ||
2107 | The stack arguments are accessible through the C<ST(n)> macro, which returns | |
2108 | the C<n>'th stack argument. Argument 0 is the first argument passed in the | |
2109 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere | |
2110 | an C<SV*> is used. | |
2111 | ||
2112 | Most of the time, output from the C routine can be handled through use of | |
2113 | the RETVAL and OUTPUT directives. However, there are some cases where the | |
2114 | argument stack is not already long enough to handle all the return values. | |
2115 | An example is the POSIX tzname() call, which takes no arguments, but returns | |
2116 | two, the local time zone's standard and summer time abbreviations. | |
2117 | ||
2118 | To handle this situation, the PPCODE directive is used and the stack is | |
2119 | extended using the macro: | |
2120 | ||
924508f0 | 2121 | EXTEND(SP, num); |
5f05dabc | 2122 | |
924508f0 GS |
2123 | where C<SP> is the macro that represents the local copy of the stack pointer, |
2124 | and C<num> is the number of elements the stack should be extended by. | |
5f05dabc | 2125 | |
00aadd71 | 2126 | Now that there is room on the stack, values can be pushed on it using the |
10e2eb10 | 2127 | C<PUSHs> macro. The pushed values will often need to be "mortal" (see |
d82b684c | 2128 | L</Reference Counts and Mortality>): |
5f05dabc | 2129 | |
00aadd71 | 2130 | PUSHs(sv_2mortal(newSViv(an_integer))) |
d82b684c SH |
2131 | PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) |
2132 | PUSHs(sv_2mortal(newSVnv(a_double))) | |
00aadd71 | 2133 | PUSHs(sv_2mortal(newSVpv("Some String",0))) |
a9b0660e KW |
2134 | /* Although the last example is better written as the more |
2135 | * efficient: */ | |
a3179684 | 2136 | PUSHs(newSVpvs_flags("Some String", SVs_TEMP)) |
5f05dabc | 2137 | |
2138 | Now, in the Perl program calling C<tzname>, the two values will be assigned | |
2139 | as in: | |
2140 | ||
2141 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname; | |
2142 | ||
2143 | An alternate (and possibly simpler) method to pushing values on the stack is | |
00aadd71 | 2144 | to use the macro: |
5f05dabc | 2145 | |
5f05dabc | 2146 | XPUSHs(SV*) |
2147 | ||
da8c5729 | 2148 | This macro automatically adjusts the stack for you, if needed. Thus, you |
5f05dabc | 2149 | do not need to call C<EXTEND> to extend the stack. |
00aadd71 NIS |
2150 | |
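The difference between reserving room first and letting the push macro do
it can be illustrated with a toy stack in plain C. This is a sketch under
stated assumptions, not Perl's argument stack: C<toy_extend>, C<toy_push>,
and C<toy_xpush> are hypothetical names modelled on C<EXTEND>, C<PUSHs>,
and C<XPUSHs>, and the stack holds C<int> values rather than C<SV*>.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   *base;  /* start of the storage */
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} toy_stack;

/* Modelled on EXTEND(SP, n): make sure n more elements fit.
 * Returns 0 on success, -1 if memory could not be obtained. */
int toy_extend(toy_stack *s, size_t n) {
    size_t new_cap;
    int *p;
    if (s->cap - s->len >= n)
        return 0;                      /* already enough room */
    new_cap = s->len + n;
    p = realloc(s->base, new_cap * sizeof *p);
    if (!p)
        return -1;
    s->base = p;
    s->cap = new_cap;
    return 0;
}

/* Modelled on PUSHs: the caller must have reserved room beforehand. */
void toy_push(toy_stack *s, int v) {
    assert(s->len < s->cap);
    s->base[s->len++] = v;
}

/* Modelled on XPUSHs: reserve one slot if needed, then push. */
int toy_xpush(toy_stack *s, int v) {
    if (toy_extend(s, 1) != 0)
        return -1;
    toy_push(s, v);
    return 0;
}
```

With this model, returning two values is either one C<toy_extend(&s, 2)>
followed by two C<toy_push> calls, or simply two C<toy_xpush> calls; the
first form checks the capacity once instead of once per value.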
2151 | Despite what earlier versions of this document suggested, the macros | |
d82b684c SH |
2152 | C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results. |
2153 | For that, either stick to the C<(X)PUSHs> macros shown above, or use the new | |
2154 | C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>. | |
5f05dabc | 2155 | |
2156 | For more information, consult L<perlxs> and L<perlxstut>. | |
2157 | ||
5b36e945 FC |
2158 | =head2 Autoloading with XSUBs |
2159 | ||
2160 | If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the | |
2161 | fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable | |
2162 | of the XSUB's package. | |
2163 | ||
2164 | But it also puts the same information in certain fields of the XSUB itself: | |
2165 | ||
2166 | HV *stash = CvSTASH(cv); | |
2167 | const char *subname = SvPVX(cv); | |
2168 | STRLEN name_length = SvCUR(cv); /* in bytes */ | |
2169 | U32 is_utf8 = SvUTF8(cv); | |
f703fc96 | 2170 | |
5b36e945 | 2171 | C<SvPVX(cv)> contains just the sub name itself, not including the package. |
d8893903 FC |
2172 | For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, |
2173 | C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package. | |
5b36e945 FC |
2174 | |
2175 | B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support | |
2176 | XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the | |
2177 | XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need | |
2178 | to support 5.8-5.14, use the XSUB's fields. | |
2179 | ||
5f05dabc | 2180 | =head2 Calling Perl Routines from within C Programs |
a0d0e21e LW |
2181 | |
2182 | There are four routines that can be used to call a Perl subroutine from | |
2183 | within a C program. These four are: | |
2184 | ||
954c1994 GS |
2185 | I32 call_sv(SV*, I32); |
2186 | I32 call_pv(const char*, I32); | |
2187 | I32 call_method(const char*, I32); | |
5aaab254 | 2188 | I32 call_argv(const char*, I32, char**); |
a0d0e21e | 2189 | |
954c1994 | 2190 | The routine most often used is C<call_sv>. The C<SV*> argument |
d1b91892 AD |
2191 | contains either the name of the Perl subroutine to be called, or a |
2192 | reference to the subroutine. The second argument consists of flags | |
2193 | that control the context in which the subroutine is called, whether | |
2194 | or not the subroutine is being passed arguments, how errors should be | |
2195 | trapped, and how to treat return values. | |
a0d0e21e LW |
2196 | |
2197 | All four routines return the number of values that the subroutine returned | |
2198 | on the Perl stack. | |
2199 | ||
9a68f1db | 2200 | These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0, |
954c1994 GS |
2201 | but those names are now deprecated; macros of the same name are provided for |
2202 | compatibility. | |
2203 | ||
2204 | When using any of these routines (except C<call_argv>), the programmer | |
d1b91892 AD |
2205 | must manipulate the Perl stack. The tools for this include the following macros and |
2206 | functions: | |
a0d0e21e LW |
2207 | |
2208 | dSP | |
924508f0 | 2209 | SP |
a0d0e21e LW |
2210 | PUSHMARK() |
2211 | PUTBACK | |
2212 | SPAGAIN | |
2213 | ENTER | |
2214 | SAVETMPS | |
2215 | FREETMPS | |
2216 | LEAVE | |
2217 | XPUSH*() | |
cb1a09d0 | 2218 | POP*() |
a0d0e21e | 2219 | |
5f05dabc | 2220 | For a detailed description of calling conventions from C to Perl, |
2221 | consult L<perlcall>. | |
a0d0e21e | 2222 | |
8ebc5c01 | 2223 | =head2 Putting a C value on Perl stack |
ce3d39e2 IZ |
2224 | |
2225 | A lot of opcodes (an opcode is an elementary operation in the internal perl | |
10e2eb10 FC |
2226 | stack machine) put an SV* on the stack. However, as an optimization |
2227 | the corresponding SV is (usually) not recreated each time. The opcodes | |
ce3d39e2 IZ |
2228 | reuse specially assigned SVs (I<target>s) which are (as a corollary) |
2229 | not constantly freed/created. | |
2230 | ||
0a753a76 | 2231 | Each of the targets is created only once (but see |
5a0de581 | 2232 | L</Scratchpads and recursion> below), and when an opcode needs to put |
01825556 | 2233 | an integer, a double, or a string on the stack, it just sets the |
ce3d39e2 IZ |
2234 | corresponding parts of its I<target> and puts the I<target> on stack. |
2235 | ||
2236 | The macro to put this target on stack is C<PUSHTARG>, and it is | |
2237 | directly used in some opcodes, as well as indirectly in zillions of | |
d82b684c | 2238 | others, which use it via C<(X)PUSH[iunp]>. |
ce3d39e2 | 2239 | |
1bd1c0d5 | 2240 | Because the target is reused, you must be careful when pushing multiple |
10e2eb10 | 2241 | values on the stack. The following code will not do what you think: |
1bd1c0d5 SC |
2242 | |
2243 | XPUSHi(10); | |
2244 | XPUSHi(20); | |
2245 | ||
2246 | This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto | |
2247 | the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". | |
2248 | At the end of the operation, the stack does not contain the values 10 | |
2249 | and 20, but actually contains two pointers to C<TARG>, which we have set | |
d82b684c | 2250 | to 20. |
1bd1c0d5 | 2251 | |
d82b684c SH |
2252 | If you need to push multiple different values then you should either use |
2253 | the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros, | |
2254 | none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an | |
2255 | SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>, | |
2256 | will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make | |
2257 | this a little easier to achieve by creating a new mortal for you (via | |
2258 | C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary | |
2259 | in the case of the C<mXPUSH[iunp]> macros), and then setting its value. | |
2260 | Thus, instead of writing this to "fix" the example above: | |
2261 | ||
2262 | XPUSHs(sv_2mortal(newSViv(10))) | |
2263 | XPUSHs(sv_2mortal(newSViv(20))) | |
2264 | ||
2265 | you can simply write: | |
2266 | ||
2267 | mXPUSHi(10) | |
2268 | mXPUSHi(20) | |
2269 | ||
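The pitfall and its fix can be reproduced in a small standalone C model.
The sketch below is an illustration only: C<toy_sv>, C<toy_xpushi>, and
C<toy_mxpushi> are hypothetical stand-ins for an integer SV, C<XPUSHi>,
and C<mXPUSHi>, and the fresh values are released with C<free()> by the
caller, whereas in Perl the mortal mechanism takes care of that.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int iv; } toy_sv;   /* stand-in for an SV holding an integer */

#define TOY_STACK_MAX 8

typedef struct {
    toy_sv *slot[TOY_STACK_MAX];  /* the stack holds pointers, not values */
    int     top;                  /* number of pointers on the stack */
    toy_sv  targ;                 /* the single reusable target */
} toy_op_ctx;

/* Modelled on XPUSHi: set the shared target, push a pointer to it. */
void toy_xpushi(toy_op_ctx *c, int v) {
    assert(c->top < TOY_STACK_MAX);
    c->targ.iv = v;
    c->slot[c->top++] = &c->targ;
}

/* Modelled on mXPUSHi: create a fresh value for every push.
 * Returns 0 on success, -1 if memory could not be obtained. */
int toy_mxpushi(toy_op_ctx *c, int v) {
    toy_sv *sv;
    assert(c->top < TOY_STACK_MAX);
    sv = malloc(sizeof *sv);
    if (!sv)
        return -1;
    sv->iv = v;
    c->slot[c->top++] = sv;
    return 0;
}
```

After C<toy_xpushi(&c, 10)> and C<toy_xpushi(&c, 20)> both stack slots
point at the same target, which holds 20. After the same two calls to
C<toy_mxpushi> the slots point at distinct values holding 10 and 20.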
2270 | On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to | |
1bd1c0d5 | 2271 | need a C<dTARG> in your variable declarations so that the C<*PUSH*> |
0985f7e5 | 2272 | macros can make use of the local variable C<TARG>. See also |
6ef63541 | 2273 | C<dTARGET> and C<dXSTARG>. |
1bd1c0d5 | 2274 | |
8ebc5c01 | 2275 | =head2 Scratchpads |
ce3d39e2 | 2276 | |
54310121 | 2277 | The question remains on when the SVs which are I<target>s for opcodes |
10e2eb10 | 2278 | are created. The answer is that they are created when the current |
ac036724 | 2279 | unit--a subroutine or a file (for opcodes for statements outside of |
10e2eb10 | 2280 | subroutines)--is compiled. During this time a special anonymous Perl |
ac036724 | 2281 | array is created, which is called a scratchpad for the current unit. |
ce3d39e2 | 2282 | |
54310121 | 2283 | A scratchpad keeps SVs which are lexicals for the current unit and are |
d777b41a FC |
2284 | targets for opcodes. A previous version of this document |
2285 | stated that one can deduce that an SV lives on a scratchpad | |
ce3d39e2 | 2286 | by looking on its flags: lexicals have C<SVs_PADMY> set, and |
eee3e302 | 2287 | I<target>s have C<SVs_PADTMP> set. But this has never been fully true. |
d777b41a FC |
2288 | C<SVs_PADMY> could be set on a variable that no longer resides in any pad. |
2289 | While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables | |
eed77337 FC |
2290 | that have never resided in a pad, but nonetheless act like I<target>s. As |
2291 | of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as | |
2292 | 0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>. | |
ce3d39e2 | 2293 | |
6ef63541 KW |
2294 | =for apidoc_section $pad |
2295 | =for apidoc Amnh||SVs_PADTMP | |
2296 | =for apidoc AmnhD||SVs_PADMY | |
2297 | ||
10e2eb10 | 2298 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
54310121 | 2299 | OPs in the compile tree of the unit can use the same target, if this |
ce3d39e2 IZ |
2300 | would not conflict with the expected life of the temporary. |
2301 | ||
2ae324a7 | 2302 | =head2 Scratchpads and recursion |
ce3d39e2 IZ |
2303 | |
2304 | In fact it is not 100% true that a compiled unit contains a pointer to | |
10e2eb10 FC |
2305 | the scratchpad AV. In fact it contains a pointer to an AV of |
2306 | (initially) one element, and this element is the scratchpad AV. Why do | |
ce3d39e2 IZ |
2307 | we need an extra level of indirection? |
2308 | ||
10e2eb10 | 2309 | The answer is B<recursion>, and maybe B<threads>. Both |
ce3d39e2 | 2310 | these can create several execution pointers going into the same |
10e2eb10 | 2311 | subroutine. For the subroutine-child not to write over the temporaries |
ce3d39e2 IZ |
2312 | for the subroutine-parent (lifespan of which covers the call to the |
2313 | child), the parent and the child should have different | |
10e2eb10 | 2314 | scratchpads. (I<And> the lexicals should be separate anyway!) |
ce3d39e2 | 2315 | |
5f05dabc | 2316 | So each subroutine is born with an array of scratchpads (of length 1). |
2317 | On each entry to the subroutine it is checked that the current | |
ce3d39e2 IZ |
2318 | depth of the recursion is not more than the length of this array, and |
2319 | if it is, a new scratchpad is created and pushed into the array. | |
2320 | ||
2321 | The I<target>s on this scratchpad are C<undef>s, but they are already | |
2322 | marked with correct flags. | |
2323 | ||
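The extra level of indirection can be sketched in standalone C. This is a
toy model under stated assumptions, not Perl's padlist code: C<toy_sub>,
C<toy_pad>, and the C<toy_sub_*> functions are hypothetical names, and a
scratchpad is reduced to a small array of integers.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

typedef struct { int slot[TOY_PAD_SLOTS]; } toy_pad;  /* one scratchpad */

typedef struct {
    toy_pad **pads;   /* array of scratchpads; index is depth - 1 */
    size_t    npads;  /* length of that array */
    size_t    depth;  /* current recursion depth, 0 when not running */
} toy_sub;

/* A subroutine is born with an array holding one scratchpad. */
int toy_sub_init(toy_sub *s) {
    s->pads = malloc(sizeof *s->pads);
    if (!s->pads)
        return -1;
    s->pads[0] = calloc(1, sizeof **s->pads);
    if (!s->pads[0]) {
        free(s->pads);
        s->pads = NULL;
        return -1;
    }
    s->npads = 1;
    s->depth = 0;
    return 0;
}

/* On entry: if the new depth exceeds the array length, add a pad.
 * Returns the pad for the new depth, or NULL on allocation failure. */
toy_pad *toy_sub_enter(toy_sub *s) {
    size_t depth = s->depth + 1;
    if (depth > s->npads) {
        toy_pad **grown = realloc(s->pads, depth * sizeof *grown);
        if (!grown)
            return NULL;
        s->pads = grown;
        s->pads[depth - 1] = calloc(1, sizeof **s->pads);
        if (!s->pads[depth - 1])
            return NULL;
        s->npads = depth;
    }
    s->depth = depth;
    return s->pads[depth - 1];
}

void toy_sub_leave(toy_sub *s) {
    assert(s->depth > 0);
    s->depth--;
}

void toy_sub_free(toy_sub *s) {
    size_t i;
    for (i = 0; i < s->npads; i++)
        free(s->pads[i]);
    free(s->pads);
    s->pads = NULL;
    s->npads = 0;
}
```

A recursive call gets its own pad, so writes by the child do not disturb
the parent's temporaries. Pads are kept after the call returns, so a later
call at the same depth reuses the existing pad instead of allocating.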
22d36020 FC |
2324 | =head1 Memory Allocation |
2325 | ||
2326 | =head2 Allocation | |
2327 | ||
2328 | All memory meant to be used with the Perl API functions should be manipulated | |
2329 | using the macros described in this section. The macros provide the necessary | |
2330 | transparency between differences in the actual malloc implementation that is | |
2331 | used within perl. | |
2332 | ||
22d36020 FC |
2333 | The following three macros are used to initially allocate memory: |
2334 | ||
2335 | Newx(pointer, number, type); | |
2336 | Newxc(pointer, number, type, cast); | |
2337 | Newxz(pointer, number, type); | |
2338 | ||
2339 | The first argument C<pointer> should be the name of a variable that will | |
2340 | point to the newly allocated memory. | |
2341 | ||
2342 | The second and third arguments C<number> and C<type> specify how many of | |
2343 | the specified type of data structure should be allocated. The argument | |
2344 | C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>, | |
2345 | should be used if the C<pointer> argument is different from the C<type> | |
2346 | argument. | |
2347 | ||
2348 | Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero> | |
2349 | to zero out all the newly allocated memory. | |
2350 | ||
2351 | =head2 Reallocation | |
2352 | ||
2353 | Renew(pointer, number, type); | |
2354 | Renewc(pointer, number, type, cast); | |
2355 | Safefree(pointer); | |
2356 | ||
2357 | These three macros are used to change a memory buffer size or to free a | |
2358 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> | |
2359 | match those of C<Newx> and C<Newxc>, described above; C<Safefree> takes only | |
2360 | the pointer to be freed. | |
2361 | ||
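For readers more familiar with the C library, the calling pattern of these
macros can be mimicked with C<calloc>, C<realloc>, and C<free>. The
C<TOY_*> macros below are hypothetical stand-ins that only mirror the
argument order shown above; they are not Perl's definitions, and code that
works with the Perl API should use the real macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins built on the C library; NOT Perl's definitions. */
#define TOY_NEWXZ(p, n, type) \
    ((p) = (type *)calloc((size_t)(n), sizeof(type)))
#define TOY_RENEW(p, n, type) \
    ((p) = (type *)realloc((p), (size_t)(n) * sizeof(type)))
#define TOY_SAFEFREE(p) free(p)

/* Allocate `initial` zeroed ints, fill them with 1..initial, grow the
 * buffer to `grown` elements, and return the sum of the first `initial`
 * elements to show that resizing preserved them. Returns -1 on failure. */
long toy_grow_and_sum(size_t initial, size_t grown) {
    int *buf, *resized;
    long sum = 0;
    size_t i;

    assert(initial > 0 && grown >= initial);
    TOY_NEWXZ(buf, initial, int);
    if (!buf)
        return -1;
    for (i = 0; i < initial; i++)
        buf[i] = (int)i + 1;

    resized = buf;                 /* keep buf valid if resizing fails */
    TOY_RENEW(resized, grown, int);
    if (!resized) {
        TOY_SAFEFREE(buf);
        return -1;
    }
    buf = resized;

    for (i = 0; i < initial; i++)
        sum += buf[i];
    TOY_SAFEFREE(buf);
    return sum;
}
```

Note how the pointer variable is the first argument and is assigned by the
macro itself, rather than being the return value as with C<malloc>.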
2362 | =head2 Moving | |
2363 | ||
2364 | Move(source, dest, number, type); | |
2365 | Copy(source, dest, number, type); | |
2366 | Zero(dest, number, type); | |
2367 | ||
2368 | These three macros are used to move, copy, or zero out previously allocated | |
2369 | memory. The C<source> and C<dest> arguments point to the source and | |
2370 | destination starting points. Perl will move, copy, or zero out C<number> | |
2371 | instances of the size of the C<type> data structure (using the C<sizeof> | |
2372 | function). | |
2373 | ||
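The argument order (source before destination) is the reverse of the C
library's C<memmove> and C<memcpy>, which is easy to get wrong. The sketch
below uses hypothetical C<TOY_*> macros to show the pattern; it assumes
that the move form tolerates overlapping regions like C<memmove> and that
the copy form, like C<memcpy>, does not.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; argument order follows the text above:
 * source, dest, number, type. These are NOT Perl's definitions. */
#define TOY_MOVE(s, d, n, type) memmove((d), (s), (size_t)(n) * sizeof(type))
#define TOY_COPY(s, d, n, type) memcpy((d), (s), (size_t)(n) * sizeof(type))
#define TOY_ZERO(d, n, type)    memset((d), 0, (size_t)(n) * sizeof(type))

/* Shift the first n-1 elements of a[] one place to the right and zero
 * a[0]. Source and destination overlap, so the move form is required. */
void toy_shift_right(int *a, size_t n) {
    if (n == 0)
        return;
    TOY_MOVE(a, a + 1, n - 1, int);
    TOY_ZERO(a, 1, int);
}

/* Duplicate n elements into a separate, non-overlapping buffer. */
void toy_duplicate(const int *from, int *to, size_t n) {
    TOY_COPY(from, to, n, int);
}
```

Applied to the array C<{1, 2, 3, 4}>, C<toy_shift_right> yields
C<{0, 1, 2, 3}>. The element count is given in units of C<type>, not in
bytes.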
2374 | =head1 PerlIO | |
2375 | ||
2376 | The most recent development releases of Perl have been experimenting with | |
2377 | removing Perl's dependency on the "normal" standard I/O suite and allowing | |
2378 | other stdio implementations to be used. This involves creating a new | |
2379 | abstraction layer that then calls whichever implementation of stdio Perl | |
2380 | was compiled with. All XSUBs should now use the functions in the PerlIO | |
2381 | abstraction layer and not make any assumptions about what kind of stdio | |
2382 | is being used. | |
2383 | ||
2384 | For a complete description of the PerlIO abstraction, consult L<perlapio>. | |
2385 | ||
0a753a76 | 2386 | =head1 Compiled code |
2387 | ||
2388 | =head2 Code tree | |
2389 | ||
2390 | Here we describe the internal form your code is converted to by | |
10e2eb10 | 2391 | Perl. Start with a simple example: |
0a753a76 | 2392 | |
2393 | $a = $b + $c; | |
2394 | ||
2395 | This is converted to a tree similar to this one: | |
2396 | ||
2397 | assign-to | |
2398 | / \ | |
2399 | + $a | |
2400 | / \ | |
2401 | $b $c | |
2402 | ||
7b8d334a | 2403 | (but slightly more complicated). This tree reflects the way Perl |
0a753a76 | 2404 | parsed your code, but has nothing to do with the execution order. |
2405 | There is an additional "thread" going through the nodes of the tree | |
2406 | which shows the order of execution of the nodes. In our simplified | |
2407 | example above it looks like: | |
2408 | ||
2409 | $b ---> $c ---> + ---> $a ---> assign-to | |
2410 | ||
2411 | But with the actual compile tree for C<$a = $b + $c> it is different: | |
2412 | some nodes are I<optimized away>. As a corollary, though the actual tree | |
2413 | contains more nodes than our simplified example, the execution order | |
2414 | is the same as in our example. | |
2415 | ||
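The distinction between the tree and the execution-order thread can be
made concrete with a toy model in plain C. This is a sketch only: the
C<toy_op> structure and its field names are hypothetical, and threading
the nodes children-first is an assumption that fits this simple
expression, not a description of how perl links every kind of op.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* tree link: first child */
    struct toy_op *sibling;  /* tree link: next child of the same parent */
    struct toy_op *next;     /* thread link: op executed after this one */
} toy_op;

/* Thread the subtree rooted at o: children left to right, then o itself.
 * *tail is the op threaded most recently (NULL before the first call). */
void toy_thread(toy_op *o, toy_op **tail) {
    toy_op *kid;
    for (kid = o->first; kid; kid = kid->sibling)
        toy_thread(kid, tail);
    if (*tail)
        (*tail)->next = o;
    o->next = NULL;
    *tail = o;
}

/* In this model the first op to run is the leftmost leaf. */
toy_op *toy_start(toy_op *root) {
    while (root->first)
        root = root->first;
    return root;
}

/* Write the names along the thread into out, separated by spaces. */
void toy_run_order(const toy_op *op, char *out, size_t outsz) {
    assert(outsz > 0);
    out[0] = '\0';
    for (; op; op = op->next) {
        if (out[0])
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, op->name, outsz - strlen(out) - 1);
    }
}
```

Building the five-node tree drawn above (C<assign-to> over C<+> and C<$a>,
with C<+> over C<$b> and C<$c>) and threading it gives the order
C<$b $c + $a assign-to>, the same sequence as in the diagram.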
2416 | =head2 Examining the tree | |
2417 | ||
06f6df17 RGS |
2418 | If you have your perl compiled for debugging (usually done with |
2419 | C<-DDEBUGGING> on the C<Configure> command line), you may examine the | |
0a753a76 | 2420 | compiled tree by specifying C<-Dx> on the Perl command line. The |
2421 | output takes several lines per node, and for C<$b+$c> it looks like | |
2422 | this: | |
2423 | ||
2424 | 5 TYPE = add ===> 6 | |
2425 | TARG = 1 | |
2426 | FLAGS = (SCALAR,KIDS) | |
2427 | { | |
2428 | TYPE = null ===> (4) | |
2429 | (was rv2sv) | |
2430 | FLAGS = (SCALAR,KIDS) | |
2431 | { | |
2432 | 3 TYPE = gvsv ===> 4 | |
2433 | FLAGS = (SCALAR) | |
2434 | GV = main::b | |
2435 | } | |
2436 | } | |
2437 | { | |
2438 | TYPE = null ===> (5) | |
2439 | (was rv2sv) | |
2440 | FLAGS = (SCALAR,KIDS) | |
2441 | { | |
2442 | 4 TYPE = gvsv ===> 5 | |
2443 | FLAGS = (SCALAR) | |
2444 | GV = main::c | |
2445 | } | |
2446 | } | |
2447 | ||
2448 | This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are | |
2449 | not optimized away (one per number in the left column). The immediate | |
2450 | children of the given node correspond to C<{}> pairs on the same level | |
2451 | of indentation, thus this listing corresponds to the tree: | |
2452 | ||
2453 | add | |
2454 | / \ | |
2455 | null null | |
2456 | | | | |
2457 | gvsv gvsv | |
2458 | ||
2459 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 | |
2460 | 4 5 6> (node C<6> is not included in the above listing), i.e., | |
2461 | C<gvsv gvsv add whatever>. | |
2462 | ||
9afa14e3 | 2463 | Each of these nodes represents an op, a fundamental operation inside the |
10e2eb10 | 2464 | Perl core. The code which implements each operation can be found in the |
9afa14e3 | 2465 | F<pp*.c> files; the function which implements the op with type C<gvsv> |
10e2eb10 | 2466 | is C<pp_gvsv>, and so on. As the tree above shows, different ops have |
9afa14e3 | 2467 | different numbers of children: C<add> is a binary operator, as one would |
10e2eb10 | 2468 | expect, and so has two children. To accommodate the various different |
9afa14e3 SC |
2469 | numbers of children, there are various types of op data structure, and |
2470 | they link together in different ways. | |
2471 | ||
10e2eb10 | 2472 | The simplest type of op structure is C<OP>: this has no children. Unary |
9afa14e3 | 2473 | operators, C<UNOP>s, have one child, and this is pointed to by the |
10e2eb10 FC |
2474 | C<op_first> field. Binary operators (C<BINOP>s) have not only an |
2475 | C<op_first> field but also an C<op_last> field. The most complex type of | |
2476 | op is a C<LISTOP>, which has any number of children. In this case, the | |
9afa14e3 | 2477 | first child is pointed to by C<op_first> and the last child by |
10e2eb10 | 2478 | C<op_last>. The children in between can be found by iteratively |
86cd3a13 | 2479 | following the C<OpSIBLING> pointer from the first child to the last (but |
29e61fd9 | 2480 | see below). |
9afa14e3 | 2481 | |
7cc7ada7 | 2482 | =for apidoc_section $optree_construction |
63dbc4a9 KW |
2483 | =for apidoc Ayh||OP |
2484 | =for apidoc Ayh||BINOP | |
2485 | =for apidoc Ayh||LISTOP | |
2486 | =for apidoc Ayh||UNOP | |
2487 | ||
29e61fd9 | 2488 | There are also some other op types: a C<PMOP> holds a regular expression, |
10e2eb10 FC |
2489 | and has no children, and a C<LOOP> may or may not have children. If the |
2490 | C<op_children> field is non-zero, it behaves like a C<LISTOP>. To | |
9afa14e3 SC |
2491 | complicate matters, if a C<UNOP> is actually a C<null> op after |
2492 | optimization (see L</Compile pass 2: context propagation>) it will still | |
2493 | have children in accordance with its former type. | |
2494 | ||
63dbc4a9 KW |
2495 | =for apidoc Ayh||LOOP |
2496 | =for apidoc Ayh||PMOP | |
2497 | ||
29e61fd9 DM |
2498 | Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one |
2499 | or more children, but it doesn't have an C<op_last> field: so you have to | |
86cd3a13 | 2500 | follow C<op_first> and then the C<OpSIBLING> chain itself to find the |
29e61fd9 DM |
2501 | last child. Instead it has an C<op_other> field, which is comparable to |
2502 | the C<op_next> field described below, and represents an alternate | |
2503 | execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note | |
2504 | that in general, C<op_other> may not point to any of the direct children | |
2505 | of the C<LOGOP>. | |
2506 | ||
63dbc4a9 KW |
2507 | =for apidoc Ayh||LOGOP |
2508 | ||
29e61fd9 DM |
2509 | Starting in version 5.21.2, perls built with the experimental |
2510 | define C<-DPERL_OP_PARENT> add an extra boolean flag for each op, | |
87b5a8b9 | 2511 | C<op_moresib>. When not set, this indicates that this is the last op in an |
86cd3a13 DM |
2512 | C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last |
2513 | sibling to point back to the parent op. Under this build, that field is | |
2514 | also renamed C<op_sibparent> to reflect its joint role. The macro | |
2515 | C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on | |
2516 | the last sibling. With this build the C<op_parent(o)> function can be | |
2517 | used to find the parent of any op. Thus for forward compatibility, you | |
2518 | should always use the C<OpSIBLING(o)> macro rather than accessing | |
2519 | C<op_sibling> directly. | |
29e61fd9 | 2520 | |
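The sibling-chain convention described above can be modelled in a few lines of plain C. This is only a sketch: C<ToyOp> and the C<toy_*> functions are hypothetical stand-ins invented for illustration, not Perl's real C<OP> structure or API. It shows how a single flag lets the last sibling's pointer double as a parent pointer, and why code should go through an accessor rather than read the field directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an op; NOT Perl's real struct op. */
typedef struct ToyOp {
    struct ToyOp *first;      /* first child, or NULL */
    struct ToyOp *sibparent;  /* next sibling if moresib, else parent */
    bool          moresib;    /* true when another sibling follows */
} ToyOp;

/* Models OpSIBLING(o): always NULL on the last sibling. */
static ToyOp *toy_sibling(const ToyOp *o)
{
    return o->moresib ? o->sibparent : NULL;
}

/* Models op_parent(o): skip to the last sibling, then follow
 * its back-pointer. */
static ToyOp *toy_parent(const ToyOp *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* With no op_last field (as on a LOGOP), the last child is found
 * by walking op_first and then the sibling chain. */
static ToyOp *toy_last_child(const ToyOp *parent)
{
    ToyOp *kid = parent->first;
    if (!kid)
        return NULL;
    while (toy_sibling(kid))
        kid = toy_sibling(kid);
    return kid;
}

/* Attach n children (n >= 1) to parent, in order. */
static void toy_set_children(ToyOp *parent, ToyOp **kids, size_t n)
{
    assert(n >= 1);
    parent->first = kids[0];
    for (size_t i = 0; i < n; i++) {
        kids[i]->moresib   = (i + 1 < n);
        kids[i]->sibparent = (i + 1 < n) ? kids[i + 1] : parent;
    }
}
```

A caller that read C<sibparent> directly on the last child would get the parent back and mistake it for another sibling; going through the accessor avoids that.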
06f6df17 RGS |
2521 | Another way to examine the tree is to use a compiler back-end module, such |
2522 | as L<B::Concise>. | |
2523 | ||
0a753a76 | 2524 | =head2 Compile pass 1: check routines |
2525 | ||
8870b5c7 | 2526 | The tree is created by the compiler while I<yacc> code feeds it |
10e2eb10 | 2527 | the constructions it recognizes. Since I<yacc> works bottom-up, so does |
0a753a76 | 2528 | the first pass of perl compilation. |
2529 | ||
2530 | What makes this pass interesting for perl developers is that some | |
2531 | optimization may be performed on this pass. This is optimization by | |
8870b5c7 | 2532 | so-called "check routines". The correspondence between node names |
0a753a76 | 2533 | and corresponding check routines is described in F<opcode.pl> (do not |
2534 | forget to run C<make regen_headers> if you modify this file). | |
2535 | ||
2536 | A check routine is called when the node is fully constructed except | |
7b8d334a | 2537 | for the execution-order thread. Since at this time there are no |
0a753a76 | 2538 | back-links to the currently constructed node, one can do most any |
2539 | operation to the top-level node, including freeing it and/or creating | |
2540 | new nodes above/below it. | |
2541 | ||
2542 | The check routine returns the node which should be inserted into the | |
2543 | tree (if the top-level node was not modified, check routine returns | |
2544 | its argument). | |
2545 | ||
10e2eb10 | 2546 | By convention, check routines have names C<ck_*>. They are usually |
0a753a76 | 2547 | called from C<new*OP> subroutines (or C<convert>) (which in turn are |
2548 | called from F<perly.y>). | |
2549 | ||
2550 | =head2 Compile pass 1a: constant folding | |
2551 | ||
2552 | Immediately after the check routine is called the returned node is | |
2553 | checked for being compile-time executable. If it is (the value is | |
2554 | judged to be constant) it is immediately executed, and a I<constant> | |
2555 | node with the "return value" of the corresponding subtree is | |
2556 | substituted instead. The subtree is deleted. | |
2557 | ||
2558 | If constant folding was not performed, the execution-order thread is | |
2559 | created. | |
2560 | ||
2561 | =head2 Compile pass 2: context propagation | |
2562 | ||
2563 | When a context for a part of compile tree is known, it is propagated | |
a3cb178b | 2564 | down through the tree. At this time the context can have 5 values |
0a753a76 | 2565 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
2566 | lvalue. In contrast with the pass 1 this pass is processed from top | |
2567 | to bottom: a node's context determines the context for its children. | |
2568 | ||
2569 | Additional context-dependent optimizations are performed at this time. | |
2570 | Since at this moment the compile tree contains back-references (via | |
2571 | "thread" pointers), nodes cannot be free()d now. To allow | |
2572 | optimized-away nodes at this stage, such nodes are null()ified instead | |
2573 | of free()ing (i.e. their type is changed to OP_NULL). | |
2574 | ||
2575 | =head2 Compile pass 3: peephole optimization | |
2576 | ||
2577 | After the compile tree for a subroutine (or for an C<eval> or a file) | |
10e2eb10 | 2578 | is created, an additional pass over the code is performed. This pass |
0a753a76 | 2579 | is neither top-down nor bottom-up, but in the execution order (with |
9ea12537 Z |
2580 | additional complications for conditionals). Optimizations performed |
2581 | at this stage are subject to the same restrictions as in the pass 2. | |
2582 | ||
2583 | Peephole optimizations are done by calling the function pointed to | |
2584 | by the global variable C<PL_peepp>. By default, C<PL_peepp> just | |
2585 | calls the function pointed to by the global variable C<PL_rpeepp>. | |
2586 | By default, that performs some basic op fixups and optimisations along | |
2587 | the execution-order op chain, and recursively calls C<PL_rpeepp> for | |
2588 | each side chain of ops (resulting from conditionals). Extensions may | |
2589 | provide additional optimisations or fixups, hooking into either the | |
2590 | per-subroutine or recursive stage, like this: | |
2591 | ||
2592 | static peep_t prev_peepp; | |
2593 | static void my_peep(pTHX_ OP *o) | |
2594 | { | |
2595 | /* custom per-subroutine optimisation goes here */ | |
f0358462 | 2596 | prev_peepp(aTHX_ o); |
9ea12537 Z |
2597 | /* custom per-subroutine optimisation may also go here */ |
2598 | } | |
2599 | BOOT: | |
2600 | prev_peepp = PL_peepp; | |
2601 | PL_peepp = my_peep; | |
2602 | ||
2603 | static peep_t prev_rpeepp; | |
39f7bd8a | 2604 | static void my_rpeep(pTHX_ OP *first) |
9ea12537 | 2605 | { |
39f7bd8a IB |
2606 | OP *o = first, *t = first; |
2607 | for (; o; o = o->op_next, t = t->op_next) { | |
9ea12537 | 2608 | /* custom per-op optimisation goes here */ |
39f7bd8a IB |
2609 | o = o->op_next; |
2610 | if (!o || o == t) break; | |
2611 | /* custom per-op optimisation goes AND here */ | |
9ea12537 | 2612 | } |
f0358462 | 2613 | prev_rpeepp(aTHX_ first); |
9ea12537 Z |
2614 | } |
2615 | BOOT: | |
2616 | prev_rpeepp = PL_rpeepp; | |
2617 | PL_rpeepp = my_rpeep; | |
0a753a76 | 2618 | |
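The two-pointer loop in C<my_rpeep> is easier to follow in isolation. The sketch below is a standalone model, not Perl code: C<ToyOp>, its C<seen> counter and C<toy_walk> are hypothetical names. It shows why the trailing pointer C<t> is there: C<o> advances two ops per iteration and C<t> one, so if the C<op_next> chain loops back on itself the two eventually meet and the walk stops instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an op in an execution-order chain. */
typedef struct ToyOp {
    struct ToyOp *op_next;  /* execution-order successor, or NULL */
    int           seen;     /* how many times the walker touched it */
} ToyOp;

/* Walk the chain as my_rpeep does; returns the number of visits. */
static int toy_walk(ToyOp *first)
{
    ToyOp *o = first, *t = first;
    int touched = 0;

    for (; o; o = o->op_next, t = t->op_next) {
        assert(t != NULL);      /* t trails o, so it is never NULL */
        o->seen++;              /* "per-op optimisation goes here" */
        touched++;
        o = o->op_next;
        if (!o || o == t)       /* end of chain, or caught a cycle */
            break;
        o->seen++;              /* "... AND here" */
        touched++;
    }
    return touched;
}
```

On a straight chain every op is visited exactly once. On a cyclic chain some ops may be visited more than once before C<o> meets C<t>, so a per-op optimisation written this way should tolerate seeing the same op again.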
7cc7ada7 | 2619 | =for apidoc_section $optree_manipulation |
63dbc4a9 KW |
2620 | =for apidoc Ayh||peep_t |
2621 | ||
1ba7f851 PJ |
2622 | =head2 Pluggable runops |
2623 | ||
2624 | The compile tree is executed in a runops function. There are two runops | |
1388f78e RGS |
2625 | functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used |
2626 | with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine | |
2627 | control over the execution of the compile tree it is possible to provide | |
2628 | your own runops function. | |
1ba7f851 PJ |
2629 | |
2630 | It's probably best to copy one of the existing runops functions and | |
2631 | change it to suit your needs. Then, in the BOOT section of your XS | |
2632 | file, add the line: | |
2633 | ||
2634 | PL_runops = my_runops; | |
2635 | ||
7cc7ada7 | 2636 | =for apidoc_section $debugging |
6ef63541 KW |
2637 | =for apidoc runops_debug |
2638 | =for apidoc runops_standard | |
6a7c980a KW |
2639 | =for apidoc Amnh|runops_proc_t|PL_runops |
2640 | ||
1ba7f851 PJ |
2641 | This function should be as efficient as possible to keep your programs |
2642 | running as fast as possible. | |
2643 | ||
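To make the idea of a replaceable runops function concrete, here is a toy model in plain C. It is a sketch under simplifying assumptions: every C<Toy*> and C<toy_*> name is hypothetical, and the real functions take the interpreter context and work on C<PL_op> rather than on explicit arguments. What it keeps is the shape: each op's function does its work and returns the next op to run, and the runops loop just keeps calling until it gets NULL.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp ToyOp;
struct ToyOp {
    ToyOp *(*ppaddr)(ToyOp *self, long *acc); /* returns next op */
    ToyOp  *next;
    long    arg;
};

typedef int (*toy_runops_t)(ToyOp *start, long *acc);

static ToyOp *toy_pp_add(ToyOp *self, long *acc)
{
    *acc += self->arg;
    return self->next;
}

static ToyOp *toy_pp_mul(ToyOp *self, long *acc)
{
    *acc *= self->arg;
    return self->next;
}

/* The plain loop: run ops until one returns NULL. */
static int toy_runops_standard(ToyOp *op, long *acc)
{
    assert(op != NULL);
    while ((op = op->ppaddr(op, acc))) { }
    return 0;
}

/* A customised copy that also counts how many ops were run. */
static unsigned long toy_ops_executed;

static int toy_runops_counting(ToyOp *op, long *acc)
{
    assert(op != NULL);
    do {
        toy_ops_executed++;
    } while ((op = op->ppaddr(op, acc)));
    return 0;
}

/* Stand-in for PL_runops: swap the pointer to change the loop. */
static toy_runops_t toy_runops = toy_runops_standard;
```

Even in this toy the counting version does extra work on every single op, which is why the advice above about keeping the function efficient matters.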
fd85fad2 BM |
2644 | =head2 Compile-time scope hooks |
2645 | ||
2646 | As of perl 5.14 it is possible to hook into the compile-time lexical | |
10e2eb10 | 2647 | scope mechanism using C<Perl_blockhook_register>. This is used like |
fd85fad2 BM |
2648 | this: |
2649 | ||
2650 | STATIC void my_start_hook(pTHX_ int full); | |
2651 | STATIC BHK my_hooks; | |
2652 | ||
2653 | BOOT: | |
a88d97bf | 2654 | BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); |
fd85fad2 BM |
2655 | Perl_blockhook_register(aTHX_ &my_hooks); |
2656 | ||
2657 | This will arrange to have C<my_start_hook> called at the start of | |
10e2eb10 | 2658 | compiling every lexical scope. The available hooks are: |
fd85fad2 | 2659 | |
7cc7ada7 | 2660 | =for apidoc_section $lexer |
63dbc4a9 KW |
2661 | =for apidoc Ayh||BHK |
2662 | ||
fd85fad2 BM |
2663 | =over 4 |
2664 | ||
a88d97bf | 2665 | =item C<void bhk_start(pTHX_ int full)> |
fd85fad2 | 2666 | |
10e2eb10 | 2667 | This is called just after starting a new lexical scope. Note that Perl |
fd85fad2 BM |
2668 | code like |
2669 | ||
2670 | if ($x) { ... } | |
2671 | ||
2672 | creates two scopes: the first starts at the C<(> and has C<full == 1>, | |
10e2eb10 | 2673 | the second starts at the C<{> and has C<full == 0>. Both end at the |
f185f654 | 2674 | C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything |
fd85fad2 BM |
2675 | pushed onto the save stack by this hook will be popped just before the |
2676 | scope ends (between the C<pre_> and C<post_end> hooks, in fact). | |
2677 | ||
a88d97bf | 2678 | =item C<void bhk_pre_end(pTHX_ OP **o)> |
fd85fad2 BM |
2679 | |
2680 | This is called at the end of a lexical scope, just before unwinding the | |
10e2eb10 | 2681 | stack. I<o> is the root of the optree representing the scope; it is a |
fd85fad2 BM |
2682 | double pointer so you can replace the OP if you need to. |
2683 | ||
a88d97bf | 2684 | =item C<void bhk_post_end(pTHX_ OP **o)> |
fd85fad2 BM |
2685 | |
2686 | This is called at the end of a lexical scope, just after unwinding the | |
10e2eb10 | 2687 | stack. I<o> is as above. Note that it is possible for calls to C<pre_> |
fd85fad2 BM |
2688 | and C<post_end> to nest, if there is something on the save stack that |
2689 | calls string eval. | |
2690 | ||
a88d97bf | 2691 | =item C<void bhk_eval(pTHX_ OP *const o)> |
fd85fad2 BM |
2692 | |
2693 | This is called just before starting to compile an C<eval STRING>, C<do | |
10e2eb10 | 2694 | FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the |
fd85fad2 BM |
2695 | OP that requested the eval, and will normally be an C<OP_ENTEREVAL>, |
2696 | C<OP_DOFILE> or C<OP_REQUIRE>. | |
2697 | ||
2698 | =back | |
2699 | ||
2700 | Once you have your hook functions, you need a C<BHK> structure to put | |
10e2eb10 FC |
2701 | them in. It's best to allocate it statically, since there is no way to |
2702 | free it once it's registered. The function pointers should be inserted | |
fd85fad2 | 2703 | into this structure using the C<BhkENTRY_set> macro, which will also set |
10e2eb10 | 2704 | flags indicating which entries are valid. If you do need to allocate |
fd85fad2 BM |
2705 | your C<BHK> dynamically for some reason, be sure to zero it before you |
2706 | start. | |
2707 | ||
2708 | Once registered, there is no mechanism to switch these hooks off, so if | |
10e2eb10 | 2709 | that is necessary you will need to do this yourself. An entry in C<%^H> |
a3e07c87 BM |
2710 | is probably the best way, so the effect is lexically scoped; however it |
2711 | is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to | |
10e2eb10 | 2712 | temporarily switch entries on and off. You should also be aware that |
a3e07c87 | 2713 | generally speaking at least one scope will have opened before your |
f185f654 | 2714 | extension is loaded, so you will see some C<pre>/C<post_end> pairs that |
a3e07c87 | 2715 | didn't have a matching C<start>. |
fd85fad2 | 2716 | |
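The "table of function pointers plus validity flags" arrangement can be sketched in standalone C as follows. This is not the real C<BHK> definition: C<ToyHooks>, the C<TOYf_*> bits and the C<TOY_*> macros are hypothetical names that only imitate the pattern described above. The sketch shows why a zeroed structure is safe (no flag set means no entry is called) and how an entry can be switched off and on again without forgetting the function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy hook table; NOT the real BHK structure. */
typedef struct ToyHooks {
    unsigned flags;              /* one bit per entry that may run */
    void   (*start)(int full);
    void   (*pre_end)(void);
} ToyHooks;

enum { TOYf_start = 1u << 0, TOYf_pre_end = 1u << 1 };

/* Store the pointer and mark the entry as valid. */
#define TOY_ENTRY_SET(hk, which, fn) \
    ((hk)->which = (fn), (hk)->flags |= TOYf_ ## which)

/* Temporarily switch an entry off; the pointer is kept. */
#define TOY_DISABLE(hk, which) \
    ((hk)->flags &= ~(unsigned)TOYf_ ## which)

/* Switch it back on; only legal if a pointer was ever set. */
#define TOY_ENABLE(hk, which) \
    (assert((hk)->which != NULL), (hk)->flags |= TOYf_ ## which)

/* Call an entry only if its flag says it is valid and enabled. */
#define TOY_CALL(hk, which, args) \
    do { \
        if ((hk)->flags & TOYf_ ## which) \
            (hk)->which args; \
    } while (0)

/* Example hook: count scopes by kind. */
static int toy_full_scopes, toy_inner_scopes;

static void toy_count_start(int full)
{
    if (full)
        toy_full_scopes++;
    else
        toy_inner_scopes++;
}
```

A dynamically allocated table that was not zeroed could have stray flag bits set, and the call macro would then jump through a garbage pointer; that is the failure the advice about zeroing guards against.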
9afa14e3 SC |
2717 | =head1 Examining internal data structures with the C<dump> functions |
2718 | ||
2719 | To aid debugging, the source file F<dump.c> contains a number of | |
2720 | functions which produce formatted output of internal data structures. | |
2721 | ||
2722 | The most commonly used of these functions is C<Perl_sv_dump>; it's used | |
10e2eb10 | 2723 | for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls |
9afa14e3 | 2724 | C<sv_dump> to produce debugging output from Perl-space, so users of that |
00aadd71 | 2725 | module should already be familiar with its format. |
9afa14e3 SC |
2726 | |
2727 | C<Perl_op_dump> can be used to dump an C<OP> structure or any of its | |
210b36aa | 2728 | derivatives, and produces output similar to C<perl -Dx>; in fact, |
9afa14e3 SC |
2729 | C<Perl_dump_eval> will dump the main root of the code being evaluated, |
2730 | exactly like C<-Dx>. | |
2731 | ||
03c0fc11 KW |
2732 | =for apidoc_section $debugging |
2733 | =for apidoc dump_eval | |
2734 | ||
9afa14e3 SC |
2735 | Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an |
2736 | op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the | |
2737 | subroutines in a package like so: (Thankfully, these are all xsubs, so | |
2738 | there is no op tree) | |
2739 | ||
6ef63541 KW |
2740 | =for apidoc_section $debugging |
2741 | =for apidoc dump_sub | |
2742 | ||
9afa14e3 SC |
2743 | (gdb) print Perl_dump_packsubs(PL_defstash) |
2744 | ||
2745 | SUB attributes::bootstrap = (xsub 0x811fedc 0) | |
2746 | ||
2747 | SUB UNIVERSAL::can = (xsub 0x811f50c 0) | |
2748 | ||
2749 | SUB UNIVERSAL::isa = (xsub 0x811f304 0) | |
2750 | ||
2751 | SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) | |
2752 | ||
2753 | SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) | |
2754 | ||
2755 | and C<Perl_dump_all>, which dumps all the subroutines in the stash and | |
2756 | the op tree of the main root. | |
2757 | ||
954c1994 | 2758 | =head1 How multiple interpreters and concurrency are supported |
ee072b34 | 2759 | |
6e512bc2 | 2760 | =head2 Background and MULTIPLICITY |
ee072b34 | 2761 | |
6ef63541 KW |
2762 | =for apidoc_section $concurrency |
2763 | =for apidoc Amnh||PERL_IMPLICIT_CONTEXT | |
2764 | ||
ee072b34 GS |
2765 | The Perl interpreter can be regarded as a closed box: it has an API |
2766 | for feeding it code or otherwise making it do things, but it also has | |
2767 | functions for its own use. This smells a lot like an object, and | |
8c3a0f6c | 2768 | there is a way for you to build Perl so that you can have multiple |
acfe0abc GS |
2769 | interpreters, with one interpreter represented either as a C structure, |
2770 | or inside a thread-specific structure. These structures contain all | |
2771 | the context, the state of that interpreter. | |
2772 | ||
8c3a0f6c | 2773 | The macro that controls the major Perl build flavor is MULTIPLICITY. The |
7b52221d | 2774 | MULTIPLICITY build has a C structure that packages all the interpreter |
6e512bc2 TK |
2775 | state, which is being passed to various perl functions as a "hidden" |
2776 | first argument. MULTIPLICITY makes multi-threaded perls possible (with the | |
2777 | ithreads threading model, related to the macro USE_ITHREADS.) | |
2778 | ||
2779 | PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY. | |
54aff467 | 2780 | |
6ef63541 KW |
2781 | =for apidoc_section $concurrency |
2782 | =for apidoc Amnh||MULTIPLICITY | |
2783 | ||
9aa97215 JH |
2784 | To see whether you have non-const data you can use a BSD (or GNU) |
2785 | compatible C<nm>: | |
bc028b6b JH |
2786 | |
2787 | nm libperl.a | grep -v ' [TURtr] ' | |
2788 | ||
9aa97215 JH |
2789 | If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>), |
2790 | you have non-const data. The symbols the C<grep> removed are as follows: | |
2791 | C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data, | |
2792 | and the C<U> is I<undefined>, external symbols referred to. | |
2793 | ||
2794 | The test F<t/porting/libperl.t> does this kind of symbol sanity | |
2795 | checking on C<libperl.a>. | |
bc028b6b | 2796 | |
54aff467 | 2797 | All this obviously requires a way for the Perl internal functions to be |
acfe0abc | 2798 | either subroutines taking some kind of structure as the first |
ee072b34 | 2799 | argument, or subroutines taking nothing as the first argument. To |
acfe0abc | 2800 | enable these two very different ways of building the interpreter, |
ee072b34 GS |
2801 | the Perl source (as it does in so many other situations) makes heavy |
2802 | use of macros and subroutine naming conventions. | |
2803 | ||
54aff467 | 2804 | First problem: deciding which functions will be public API functions and |
00aadd71 | 2805 | which will be private. All functions whose names begin C<S_> are private |
954c1994 GS |
2806 | (think "S" for "secret" or "static"). All other functions begin with |
2807 | "Perl_", but just because a function begins with "Perl_" does not mean it is | |
10e2eb10 FC |
2808 | part of the API. (See L</Internal |
2809 | Functions>.) The easiest way to be B<sure> a | |
00aadd71 NIS |
2810 | function is part of the API is to find its entry in L<perlapi>. |
2811 | If it exists in L<perlapi>, it's part of the API. If it doesn't, and you | |
8166b4e0 DB |
2812 | think it should be (i.e., you need it for your extension), submit an issue at |
2813 | L<https://github.com/Perl/perl5/issues> explaining why you think it should be. | |
ee072b34 GS |
2814 | |
2815 | Second problem: there must be a syntax so that the same subroutine | |
2816 | declarations and calls can pass a structure as their first argument, | |
2817 | or pass nothing. To solve this, the subroutines are named and | |
2818 | declared in a particular way. Here's a typical start of a static | |
2819 | function used within the Perl guts: | |
2820 | ||
2821 | STATIC void | |
2822 | S_incline(pTHX_ char *s) | |
2823 | ||
acfe0abc | 2824 | STATIC becomes "static" in C, and may be #define'd to nothing in some |
da8c5729 | 2825 | configurations in the future. |
ee072b34 | 2826 | |
3f620621 | 2827 | =for apidoc_section $directives |
63dbc4a9 KW |
2828 | =for apidoc Ayh||STATIC |
2829 | ||
651a3225 GS |
2830 | A public function (i.e. part of the internal API, but not necessarily |
2831 | sanctioned for use in extensions) begins like this: | |
ee072b34 GS |
2832 | |
2833 | void | |
2307c6d0 | 2834 | Perl_sv_setiv(pTHX_ SV* dsv, IV num) |
ee072b34 | 2835 | |
0147cd53 | 2836 | C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the |
ee072b34 GS |
2837 | details of the interpreter's context. THX stands for "thread", "this", |
2838 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) | |
2839 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, | |
a7486cbb JH |
2840 | or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and |
2841 | their variants. | |
ee072b34 | 2842 | |
3f620621 | 2843 | =for apidoc_section $concurrency |
4f313521 KW |
2844 | =for apidoc Amnh||aTHX |
2845 | =for apidoc Amnh||aTHX_ | |
2846 | =for apidoc Amnh||dTHX | |
2847 | =for apidoc Amnh||pTHX | |
2848 | =for apidoc Amnh||pTHX_ | |
2849 | ||
6e512bc2 | 2850 | When Perl is built without options that set MULTIPLICITY, there is no |
a7486cbb | 2851 | first argument containing the interpreter's context. The trailing underscore |
ee072b34 GS |
2852 | in the pTHX_ macro indicates that the macro expansion needs a comma |
2853 | after the context argument because other arguments follow it. If | |
6e512bc2 | 2854 | MULTIPLICITY is not defined, pTHX_ will be ignored, and the |
54aff467 GS |
2855 | subroutine is not prototyped to take the extra argument. The form of the |
2856 | macro without the trailing underscore is used when there are no additional | |
ee072b34 GS |
2857 | explicit arguments. |
2858 | ||
54aff467 | 2859 | When a core function calls another, it must pass the context. This |
2307c6d0 | 2860 | is normally hidden via macros. Consider C<sv_setiv>. It expands into |
ee072b34 GS |
2861 | something like this: |
2862 | ||
6e512bc2 | 2863 | #ifdef MULTIPLICITY |
2307c6d0 | 2864 | #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) |
ee072b34 | 2865 | /* can't do this for vararg functions, see below */ |
2307c6d0 SB |
2866 | #else |
2867 | #define sv_setiv Perl_sv_setiv | |
2868 | #endif | |
ee072b34 GS |
2869 | |
2870 | This works well, and means that XS authors can gleefully write: | |
2871 | ||
2307c6d0 | 2872 | sv_setiv(foo, bar); |
ee072b34 GS |
2873 | |
2874 | and still have it work under all the modes Perl could have been | |
2875 | compiled with. | |
2876 | ||
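The macro scheme is perhaps easiest to see in a miniature that compiles on its own. The following is a sketch, not the real definitions from F<perl.h>: C<ToyInterp>, C<TOY_MULTIPLICITY> and the C<pTOY>/C<aTOY> family are hypothetical names chosen to mirror C<pTHX>/C<aTHX>. It shows how the underscore variants carry the comma, and how a short macro name lets callers ignore whether a context argument exists at all.

```c
#include <assert.h>

/* Comment this out to model a single-interpreter build. */
#define TOY_MULTIPLICITY 1

typedef struct ToyInterp { int counter; } ToyInterp;

#ifdef TOY_MULTIPLICITY
#  define pTOY        ToyInterp *my_toy  /* prototype, no more args */
#  define pTOY_       pTOY,              /* prototype, args follow  */
#  define aTOY        my_toy             /* argument, no more args  */
#  define aTOY_       aTOY,              /* argument, args follow   */
#  define TOY_counter (my_toy->counter)
#else
static ToyInterp toy_global;
#  define pTOY        void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_counter (toy_global.counter)
#endif

/* "Public" functions carry the long name and the context. */
static void Toy_counter_add(pTOY_ int n)
{
    assert(n >= 0);
    TOY_counter += n;
}

static int Toy_counter_get(pTOY)
{
    return TOY_counter;
}

/* Short names hide the context from callers. */
#define toy_counter_add(n) Toy_counter_add(aTOY_ n)
#define toy_counter_get()  Toy_counter_get(aTOY)

/* A caller written once, valid in either build flavour. */
static int toy_demo(pTOY)
{
    toy_counter_add(2);
    toy_counter_add(3);
    return toy_counter_get();
}
```

Note that the body of C<toy_demo> never mentions the context. Only its prototype does, and under the single-interpreter flavour even that reduces to C<void>.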
ee072b34 GS |
2877 | This doesn't work so cleanly for varargs functions, though, as macros |
2878 | imply that the number of arguments is known in advance. Instead we | |
2879 | either need to spell them out fully, passing C<aTHX_> as the first | |
2880 | argument (the Perl core tends to do this with functions like | |
2881 | Perl_warner), or use a context-free version. | |
2882 | ||
2883 | The context-free version of Perl_warner is called | |
2884 | Perl_warner_nocontext, and does not take the extra argument. Instead | |
10bee092 | 2885 | it does C<dTHX;> to get the context from thread-local storage. We |
ee072b34 GS |
2886 | C<#define warner Perl_warner_nocontext> so that extensions get source |
2887 | compatibility at the expense of performance. (Passing an arg is | |
2888 | cheaper than grabbing it from thread-local storage.) | |
2889 | ||
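The explicit-context and context-free flavours of a varargs function can be modelled like this. Again it is only a sketch with hypothetical names (C<ToyInterp>, C<Toy_warn>, C<Toy_warn_nocontext>, C<dTOY>), and a plain static pointer stands in for the thread-local slot that a real threaded build would use. It shows the trade-off: the context-free form keeps the call site simple but has to look the context up on every call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct ToyInterp {
    char last_warning[128];
} ToyInterp;

/* Stand-in for the thread-local slot (an assumption of this
 * sketch; a threaded build would use real per-thread storage). */
static ToyInterp *toy_current_context;

/* Models dTHX: declare the context and fetch it from the slot. */
#define dTOY ToyInterp *my_toy = toy_current_context

static void Toy_vwarn(ToyInterp *my_toy, const char *fmt,
                      va_list args)
{
    vsnprintf(my_toy->last_warning, sizeof my_toy->last_warning,
              fmt, args);
}

/* Explicit-context form: the caller passes the context. */
static void Toy_warn(ToyInterp *my_toy, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    Toy_vwarn(my_toy, fmt, args);
    va_end(args);
}

/* Context-free form: fetches the context itself. */
static void Toy_warn_nocontext(const char *fmt, ...)
{
    dTOY;
    va_list args;
    assert(my_toy != NULL);
    va_start(args, fmt);
    Toy_vwarn(my_toy, fmt, args);
    va_end(args);
}

/* Extensions get the simple spelling. */
#define toy_warn Toy_warn_nocontext

/* Convenience for inspecting the most recent message. */
static int toy_last_warning_is(const ToyInterp *my_toy,
                               const char *expected)
{
    return strcmp(my_toy->last_warning, expected) == 0;
}
```

Because C<toy_warn> is an object-like macro rather than a function-like one, it works with any number of arguments, which is exactly what a function-like macro with a fixed parameter list could not do.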
acfe0abc | 2890 | You can ignore [pad]THXx when browsing the Perl headers/sources. |
ee072b34 GS |
2891 | Those are strictly for use within the core. Extensions and embedders |
2892 | need only be aware of [pad]THX. | |
2893 | ||
a7486cbb JH |
2894 | =head2 So what happened to dTHR? |
2895 | ||
7cc7ada7 | 2896 | =for apidoc_section $concurrency |
4f313521 KW |
2897 | =for apidoc Amnh||dTHR |
2898 | ||
a7486cbb JH |
2899 | C<dTHR> was introduced in perl 5.005 to support the older thread model. |
2900 | The older thread model now uses the C<THX> mechanism to pass context | |
2901 | pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and | |
2902 | later still have it for backward source compatibility, but it is defined | |
2903 | to be a no-op. | |
2904 | ||
ee072b34 GS |
2905 | =head2 How do I use all this in extensions? |
2906 | ||
6e512bc2 | 2907 | When Perl is built with MULTIPLICITY, extensions that call |
ee072b34 GS |
2908 | any functions in the Perl API will need to pass the initial context |
2909 | argument somehow. The kicker is that you will need to write it in | |
2910 | such a way that the extension still compiles when Perl hasn't been | |
6e512bc2 | 2911 | built with MULTIPLICITY enabled. |
ee072b34 GS |
2912 | |
2913 | There are three ways to do this. First, the easy but inefficient way, | |
2914 | which is also the default, in order to maintain source compatibility | |
0147cd53 | 2915 | with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX |
ee072b34 GS |
2916 | and aTHX_ macros to call a function that will return the context. |
2917 | Thus, something like: | |
2918 | ||
2307c6d0 | 2919 | sv_setiv(sv, num); |
ee072b34 | 2920 | |
6e512bc2 | 2921 | in your extension will translate to this when MULTIPLICITY is |
54aff467 | 2922 | in effect: |
ee072b34 | 2923 | |
2307c6d0 | 2924 | Perl_sv_setiv(Perl_get_context(), sv, num); |
ee072b34 | 2925 | |
54aff467 | 2926 | or to this otherwise: |
ee072b34 | 2927 | |
2307c6d0 | 2928 | Perl_sv_setiv(sv, num); |
ee072b34 | 2929 | |
da8c5729 | 2930 | You don't have to do anything new in your extension to get this; since |
2fa86c13 | 2931 | the Perl library provides Perl_get_context(), it will all just |
ee072b34 GS |
2932 | work. |
2933 | ||
2934 | The second, more efficient way is to use the following template for | |
2935 | your Foo.xs: | |
2936 | ||
c52f9dcd JH |
2937 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
2938 | #include "EXTERN.h" | |
2939 | #include "perl.h" | |
2940 | #include "XSUB.h" | |
ee072b34 | 2941 | |
fd061412 | 2942 | STATIC void my_private_function(int arg1, int arg2); |
ee072b34 | 2943 | |
fd061412 | 2944 | STATIC void |
c52f9dcd JH |
2945 | my_private_function(int arg1, int arg2) |
2946 | { | |
2947 | dTHX; /* fetch context */ | |
2948 | ... call many Perl API functions ... | |
2949 | } | |
ee072b34 GS |
2950 | |
2951 | [... etc ...] | |
2952 | ||
c52f9dcd | 2953 | MODULE = Foo PACKAGE = Foo |
ee072b34 | 2954 | |
c52f9dcd | 2955 | /* typical XSUB */ |
ee072b34 | 2956 | |
c52f9dcd JH |
2957 | void |
2958 | my_xsub(arg) | |
2959 | int arg | |
2960 | CODE: | |
2961 | my_private_function(arg, 10); | |
ee072b34 GS |
2962 | |
2963 | Note that the only two changes from the normal way of writing an | |
2964 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before | |
2965 | including the Perl headers, followed by a C<dTHX;> declaration at | |
2966 | the start of every function that will call the Perl API. (You'll | |
2967 | know which functions need this, because the C compiler will complain | |
2968 | that there's an undeclared identifier in those functions.) No changes | |
2969 | are needed for the XSUBs themselves, because the XS() macro is | |
2970 | correctly defined to pass in the implicit context if needed. | |
2971 | ||
40578475 KW |
2972 | =for apidoc_section $concurrency |
2973 | =for apidoc AmnhU||PERL_NO_GET_CONTEXT | |
2974 | ||
ee072b34 GS |
2975 | The third, even more efficient way is to ape how it is done within |
2976 | the Perl guts: | |
2977 | ||
2978 | ||
c52f9dcd JH |
2979 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
2980 | #include "EXTERN.h" | |
2981 | #include "perl.h" | |
2982 | #include "XSUB.h" | |
ee072b34 GS |
2983 | |
2984 | /* pTHX_ only needed for functions that call Perl API */ | |
fd061412 | 2985 | STATIC void my_private_function(pTHX_ int arg1, int arg2); |
ee072b34 | 2986 | |
fd061412 | 2987 | STATIC void |
c52f9dcd JH |
2988 | my_private_function(pTHX_ int arg1, int arg2) |
2989 | { | |
2990 | /* dTHX; not needed here, because THX is an argument */ | |
2991 | ... call Perl API functions ... | |
2992 | } | |
ee072b34 GS |
2993 | |
2994 | [... etc ...] | |
2995 | ||
c52f9dcd | 2996 | MODULE = Foo PACKAGE = Foo |
ee072b34 | 2997 | |
c52f9dcd | 2998 | /* typical XSUB */ |
ee072b34 | 2999 | |
c52f9dcd JH |
3000 | void |
3001 | my_xsub(arg) | |
3002 | int arg | |
3003 | CODE: | |
3004 | my_private_function(aTHX_ arg, 10); | |
ee072b34 GS |
3005 | |
3006 | This implementation never has to fetch the context using a function | |
3007 | call, since it is always passed as an extra argument. Depending on | |
3008 | your needs for simplicity or efficiency, you may mix the previous | |
3009 | two approaches freely. | |
3010 | ||
651a3225 GS |
3011 | Never add a comma after C<pTHX> yourself--always use the form of the |
3012 | macro with the underscore for functions that take explicit arguments, | |
3013 | or the form without the underscore for functions with no explicit arguments. | |
ee072b34 | 3014 | |
a7486cbb JH |
3015 | =head2 Should I do anything special if I call perl from multiple threads? |
3016 | ||
3017 | If you create interpreters in one thread and then proceed to call them in | |
3018 | another, you need to make sure perl's own Thread Local Storage (TLS) slot is | |
3019 | initialized correctly in each of those threads. | |
3020 | ||
3021 | The C<perl_alloc> and C<perl_clone> API functions will automatically set | |
3022 | the TLS slot to the interpreter they created, so that there is no need to do | |
3023 | anything special if the interpreter is always accessed in the same thread that | |
3024 | created it, and that thread did not create or call any other interpreters | |
3025 | afterwards. If that is not the case, you have to set the TLS slot of the | |
3026 | thread before calling any functions in the Perl API on that particular | |
3027 | interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that | |
3028 | thread as the first thing you do: | |
3029 | ||
3030 | /* do this before doing anything else with some_perl */ | |
3031 | PERL_SET_CONTEXT(some_perl); | |
3032 | ||
3033 | ... other Perl API calls on some_perl go here ... | |
3034 | ||
32c3a37b KW |
3035 | =for apidoc_section $embedding |
3036 | =for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i | |
3037 | ||
3038 | (You can always get the current context via C<PERL_GET_CONTEXT>.) | |
3039 | ||
3040 | =for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT| | |
3041 | ||
ee072b34 GS |
3042 | =head2 Future Plans and PERL_IMPLICIT_SYS |
3043 | ||
6e512bc2 | 3044 | Just as MULTIPLICITY provides a way to bundle up everything |
ee072b34 GS |
3045 | that the interpreter knows about itself and pass it around, so too are |
3046 | there plans to allow the interpreter to bundle up everything it knows | |
3047 | about the environment it's running on. This is enabled with the | |
7b52221d RGS |
3048 | PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on |
3049 | Windows. | |
ee072b34 GS |
3050 | |
3051 | This allows an extra pointer (called the "host" environment) to be | |
3052 | provided for all the system calls. This makes it possible for | |
3053 | all the system stuff to maintain their own state, broken down into | |
3054 | seven C structures. These are thin wrappers around the usual system | |
0147cd53 | 3055 | calls (see F<win32/perllib.c>) for the default perl executable, but for a |
ee072b34 GS |
3056 | more ambitious host (like the one that would do fork() emulation) all |
3057 | the extra work needed to pretend that different interpreters are | |
3058 | actually different "processes", would be done here. | |
3059 | ||
3060 | The Perl engine/interpreter and the host are orthogonal entities. | |
3061 | There could be one or more interpreters in a process, and one or | |
3062 | more "hosts", with free association between them. | |
3063 | ||
a422fd2d SC |
3064 | =head1 Internal Functions |
3065 | ||
3066 | All of Perl's internal functions which will be exposed to the outside | |
06f6df17 | 3067 | world are prefixed by C<Perl_> so that they will not conflict with XS |
a422fd2d | 3068 | functions or functions used in a program in which Perl is embedded. |
10e2eb10 | 3069 | Similarly, all global variables begin with C<PL_>. (By convention, |
06f6df17 | 3070 | static functions start with C<S_>.) |
a422fd2d | 3071 | |
0972ecdf DM |
3072 | Inside the Perl core (C<PERL_CORE> defined), you can get at the functions |
3073 | either with or without the C<Perl_> prefix, thanks to a bunch of defines | |
10e2eb10 | 3074 | that live in F<embed.h>. Note that extension code should I<not> set |
0972ecdf DM |
3075 | C<PERL_CORE>; this exposes the full perl internals, and is likely to cause |
3076 | breakage of the XS in each new perl release. | |
3077 | ||
3078 | The file F<embed.h> is generated automatically from | |
10e2eb10 | 3079 | F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping |
dc9b1d22 | 3080 | header files for the internal functions, generates the documentation |
10e2eb10 | 3081 | and a lot of other bits and pieces. It's important that when you add |
dc9b1d22 | 3082 | a new function to the core or change an existing one, you change the |
10e2eb10 | 3083 | data in the table in F<embed.fnc> as well. Here's a sample entry from |
dc9b1d22 | 3084 | that table: |
a422fd2d SC |
3085 | |
3086 | Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval | |
3087 | ||
790ba721 KW |
3088 | The first column is a set of flags, the second column the return type, |
3089 | the third column the name. Columns after that are the arguments. | |
3090 | The flags are documented at the top of F<embed.fnc>. | |
a422fd2d | 3091 | |
dc9b1d22 MHM |
3092 | If you edit F<embed.pl> or F<embed.fnc>, you will need to run |
3093 | C<make regen_headers> to force a rebuild of F<embed.h> and other | |
3094 | auto-generated files. | |
a422fd2d | 3095 | |
6b4667fc | 3096 | =head2 Formatted Printing of IVs, UVs, and NVs |
9dd9db0b | 3097 | |
6b4667fc A |
3098 | If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style |
3099 | formatting codes like C<%d>, C<%ld>, C<%f>, you should use the | |
3100 | following macros for portability: | |
9dd9db0b | 3101 | |
c52f9dcd JH |
3102 | IVdf IV in decimal |
3103 | UVuf UV in decimal | |
3104 | UVof UV in octal | |
3105 | UVxf UV in hexadecimal | |
3106 | NVef NV %e-like | |
3107 | NVff NV %f-like | |
3108 | NVgf NV %g-like | |
9dd9db0b | 3109 | |
6b4667fc A |
3110 | These will take care of 64-bit integers and long doubles. |
3111 | For example: | |
3112 | ||
9faa5a89 | 3113 | printf("IV is %" IVdf "\n", iv); |
6b4667fc | 3114 | |
9faa5a89 KW |
3115 | The C<IVdf> will expand to whatever is the correct format for the IVs. |
3116 | Note that the spaces are required around the format in case the code is | |
3117 | compiled with C++, to maintain compliance with its standard. | |
9dd9db0b | 3118 | |
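The format-macro technique is ordinary C string-literal concatenation, and can be tried outside Perl with the standard F<inttypes.h> macros. The sketch below uses hypothetical C<Toy*>/C<TOY_*> names and assumes 64-bit integer types purely for illustration; in a real extension the types and the C<IVdf> family come from the Perl headers and match however that perl was configured.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for IV/UV and their format macros (assumed 64-bit). */
typedef int64_t  ToyIV;
typedef uint64_t ToyUV;
#define TOY_IVdf PRId64
#define TOY_UVuf PRIu64
#define TOY_UVxf PRIx64

/* Stand-in for PTR2UV(): a pointer as an unsigned integer. */
#define TOY_PTR2UV(p) ((ToyUV)(uintptr_t)(p))

/* The spaces around each macro keep the pieces separate tokens,
 * which C++ requires next to a string literal. */
static int toy_describe(char *buf, size_t len, ToyIV iv, ToyUV uv)
{
    int n = snprintf(buf, len,
                     "IV is %" TOY_IVdf ", UV is %" TOY_UVuf
                     " (0x%" TOY_UVxf ")",
                     iv, uv, uv);
    assert(n >= 0);
    return n;
}

/* Append an address using the hexadecimal UV format. */
static void toy_append_addr(char *buf, size_t len, const void *p)
{
    size_t used = strlen(buf);
    assert(used < len);
    snprintf(buf + used, len - used, " at 0x%" TOY_UVxf,
             TOY_PTR2UV(p));
}
```

The value of the approach is that the call sites never change: if the integer type behind the typedef changes, only the macro definitions need to follow.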
aacf4ea2 JH |
3119 | Note that there are different "long doubles": Perl will use |
3120 | whatever the compiler has. | |
3121 | ||
a7c67fbc KW |
3122 | If you are printing addresses of pointers, use %p or UVxf combined |
3123 | with PTR2UV(). | |
8908e76d | 3124 | |
2d197238 KW |
3125 | =head2 Formatted Printing of SVs |
3126 | ||
3127 | The contents of SVs may be printed using the C<SVf> format, like so: | |
3128 | ||
0e13edb0 | 3129 | Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg)) |
2d197238 KW |
3130 | |
3131 | where C<err_msg> is an SV. | |
3132 | ||
7cc7ada7 | 3133 | =for apidoc_section $io_formats |
6015ce9b KW |
3134 | =for apidoc Amnh||SVf |
3135 | =for apidoc Amh||SVfARG|SV *sv | |
3136 | ||
2d197238 KW |
3137 | Not all scalar types are printable. Simple values certainly are: one of |
3138 | IV, UV, NV, or PV. Also, if the SV is a reference to some value, | |
3139 | either it will be dereferenced and the value printed, or information | |
3140 | about the type of that value and its address are displayed. The results | |
3141 | of printing any other type of SV are undefined and likely to lead to an | |
e807022f | 3142 | interpreter crash. NVs are printed using a C<%g>-ish format. |
2d197238 KW |
3143 | |
3144 | Note that the spaces are required around the C<SVf> in case the code is | |
3145 | compiled with C++, to maintain compliance with its standard. | |
3146 | ||
3147 | Note that any filehandle being printed to under UTF-8 must be expecting | |
3148 | UTF-8 in order to get good results and avoid Wide-character warnings. | |
3149 | One way to do this for typical filehandles is to invoke perl with the | |
2d2503ee | 3150 | C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.) |
2d197238 KW |
3151 | |
3152 | You can use this to concatenate two scalars: | |
3153 | ||
3154 | SV *var1 = get_sv("var1", GV_ADD); | |
3155 | SV *var2 = get_sv("var2", GV_ADD); | |
3156 | SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf, | |
e807022f | 3157 | SVfARG(var1), SVfARG(var2)); |
2d197238 | 3158 | |
33ef5d2c YO |
3159 | =for apidoc Amnh||SVf_QUOTEDPREFIX |
3160 | ||
3161 | C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the | |
3162 | number of the characters printed, showing at most the first | |
3163 | C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with | |
3164 | double quotes and with the contents escaped using double quoted string | |
3165 | escaping rules. If the string is longer than this then ellipses "..." | |
3166 | will be appended after the trailing quote. This is intended for error | |
3167 | messages where the string is assumed to be a class name. | |
3168 | ||
332af227 YO |
3169 | =for apidoc Amnh||HvNAMEf |
3170 | =for apidoc Amnh||HvNAMEf_QUOTEDPREFIX | |
3171 | ||
3172 | C<HvNAMEf> and C<HvNAMEf_QUOTEDPREFIX> are similar to C<SVf> except they | |
3173 | extract the string, length and utf8 flags from the argument using the | |
3174 | C<HvNAME()>, C<HvNAMELEN()>, C<HvNAMEUTF8()> macros. This is intended | |
3175 | for stringifying a class name directly from a stash HV. | |
3176 | ||
9bec17d7 | 3177 | =head2 Formatted Printing of Strings |
8b64b5d1 | 3178 | |
aae69fa9 P |
3179 | If you just want the bytes printed in a 7bit NUL-terminated string, you can |
3180 | just use C<%s> (assuming they are all really only 7bit). But if there is a | |
3181 | possibility the value will be encoded as UTF-8 or contains bytes above | |
3182 | C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format. | |
3183 | And as its parameter, use the C<UTF8fARG()> macro: | |
9bec17d7 KW |
3184 | |
3185 | char * msg; | |
3186 | ||
3187 | /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK | |
3188 | U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */ | |
3189 | if (can_utf8) | |
3190 | msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99"; | |
3191 | else | |
3192 | msg = "'Uses simple quotes'"; | |
3193 | ||
3194 | Perl_croak(aTHX_ "The message is: %" UTF8f "\n", | |
3195 | UTF8fARG(can_utf8, strlen(msg), msg)); | |
8b64b5d1 KW |
3196 | |
3197 | The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in | |
aae69fa9 | 3198 | UTF-8; 0 if the string is in the native byte encoding (Latin1). |
8b64b5d1 KW |
3199 | The second parameter is the number of bytes in the string to print. |
3200 | And the third and final parameter is a pointer to the first byte in the | |
3201 | string. | |
3202 | ||
1f633c5e KW |
3203 | Note that any filehandle being printed to under UTF-8 must be expecting |
3204 | UTF-8 in order to get good results and avoid Wide-character warnings. | |
3205 | One way to do this for typical filehandles is to invoke perl with the | |
2d2503ee | 3206 | C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.) |
1f633c5e | 3207 | |
a87f9c51 | 3208 | =for apidoc_section $io_formats |
5c29a976 | 3209 | =for apidoc Amnh||UTF8f |
33ef5d2c YO |
3210 | Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose |
3211 | the arguments for this format. | |
3212 | =for apidoc Amnh||UTF8f_QUOTEDPREFIX | |
3213 | Same as C<UTF8f> but the output is quoted, escaped and length limited. | |
3214 | See C<SVf_QUOTEDPREFIX> for more details on escaping. | |
5c29a976 KW |
3215 | =for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str |
3216 | ||
51b56f5c KW |
3217 | =cut |
3218 | ||
e613617c | 3219 | =head2 Formatted Printing of C<Size_t> and C<SSize_t> |
5862f74e KW |
3220 | |
3221 | The most general way to do this is to cast them to a UV or IV, and | |
3222 | print as in the | |
3223 | L<previous section|/Formatted Printing of IVs, UVs, and NVs>. | |
3224 | ||
3225 | But if you're using C<PerlIO_printf()>, it's less typing and visual | |
e807022f | 3226 | clutter to use the C<%z> length modifier (for I<siZe>): |
5862f74e KW |
3227 | |
3228 | PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len); | |
3229 | ||
3230 | This modifier is not portable, so its use should be restricted to | |
3231 | C<PerlIO_printf()>. | |
3232 | ||
f02bba19 KW |
3233 | =head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes |
3234 | ||
3235 | There are modifiers for these special situations if you are using | |
3236 | C<PerlIO_printf()>. See L<perlfunc/size>. | |
3237 | ||
8908e76d JH |
3238 | =head2 Pointer-To-Integer and Integer-To-Pointer |
3239 | ||
3240 | Because pointer size does not necessarily equal integer size, | |
3241 | use the following macros to do it right. | |
3242 | ||
c52f9dcd JH |
3243 | PTR2UV(pointer) |
3244 | PTR2IV(pointer) | |
3245 | PTR2NV(pointer) | |
3246 | INT2PTR(pointertotype, integer) | |
8908e76d | 3247 | |
3f620621 | 3248 | =for apidoc_section $casting |
3e2d7a92 KW |
3249 | =for apidoc Amh|type|INT2PTR|type|int value |
3250 | =for apidoc Amh|UV|PTR2UV|void * ptr | |
3251 | =for apidoc Amh|IV|PTR2IV|void * ptr | |
3252 | =for apidoc Amh|NV|PTR2NV|void * ptr | |
4f313521 | 3253 | |
8908e76d JH |
3254 | For example: |
3255 | ||
c52f9dcd JH |
3256 | IV iv = ...; |
3257 | SV *sv = INT2PTR(SV*, iv); | |
8908e76d JH |
3258 | |
3259 | and | |
3260 | ||
c52f9dcd JH |
3261 | AV *av = ...; |
3262 | UV uv = PTR2UV(av); | |
8908e76d | 3263 | |
b770a21b KW |
3264 | There are also |
3265 | ||
3266 | PTR2nat(pointer) /* pointer to integer of PTRSIZE */ | |
3267 | PTR2ul(pointer) /* pointer to unsigned long */ | |
3268 | ||
3269 | =for apidoc Amh|IV|PTR2nat|void * | |
3270 | =for apidoc Amh|unsigned long|PTR2ul|void * | |
3271 | ||
3272 | There is also C<PTRV>, which gives the native type for an integer the same size as | |
3273 | pointers, such as C<unsigned> or C<unsigned long>. | |
3274 | ||
21017b82 | 3275 | =for apidoc Ayh|type|PTRV |
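
For readers who know standard C better than the Perl macros, the round trip
above can be sketched with C<uintptr_t>. This is only an analogy: it assumes
a platform that provides C<uintptr_t>, and the helper names C<ptr_to_uint>
and C<uint_to_ptr> are hypothetical. The real macros are defined in the Perl
headers and may differ in detail.

```c
#include <stdint.h>

/* Plain-C counterpart of PTR2UV(): pointer to unsigned integer. */
uintptr_t ptr_to_uint(const void *p)
{
    return (uintptr_t)p;
}

/* Plain-C counterpart of INT2PTR(void *, u): integer back to pointer. */
void *uint_to_ptr(uintptr_t u)
{
    return (void *)u;
}
```

Converting a pointer to an integer of the right width and back yields a
pointer that compares equal to the original, which is the property the Perl
macros preserve without depending on C<uintptr_t>.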
b770a21b | 3276 | |
0ca3a874 MHM |
3277 | =head2 Exception Handling |
3278 | ||
9b5c3821 | 3279 | There are a couple of macros to do very basic exception handling in XS |
10e2eb10 | 3280 | modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to |
9b5c3821 MHM |
3281 | be able to use these macros: |
3282 | ||
3283 | #define NO_XSLOCKS | |
3284 | #include "XSUB.h" | |
3285 | ||
3286 | You can use these macros if you call code that may croak, but you need | |
10e2eb10 | 3287 | to do some cleanup before giving control back to Perl. For example: |
0ca3a874 | 3288 | |
d7f8936a | 3289 | dXCPT; /* set up necessary variables */ |
0ca3a874 MHM |
3290 | |
3291 | XCPT_TRY_START { | |
3292 | code_that_may_croak(); | |
3293 | } XCPT_TRY_END | |
3294 | ||
3295 | XCPT_CATCH | |
3296 | { | |
3297 | /* do cleanup here */ | |
3298 | XCPT_RETHROW; | |
3299 | } | |
3300 | ||
3301 | Note that you always have to rethrow an exception that has been | |
10e2eb10 FC |
3302 | caught. Using these macros, it is not possible to just catch the |
3303 | exception and ignore it. If you have to ignore the exception, you | |
0ca3a874 MHM |
3304 | have to use one of the C<call_*> functions. |
3305 | ||
3306 | The advantage of using the above macros is that you don't have | |
3307 | to set up an extra function for C<call_*>, and that using these | |
3308 | macros is faster than using C<call_*>. | |
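
The catch, clean up, rethrow pattern can be tried out without Perl at all.
The following standalone sketch assumes a plain C<setjmp>/C<longjmp> handler
chain. It is not the expansion of the C<XCPT_*> macros, and every name in it
(C<current_handler>, C<throw_error>, C<guarded_call>, C<run_demo>) is
hypothetical; the comments only indicate which macro each step loosely
corresponds to.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *current_handler;    /* innermost active handler, or NULL */

/* Stand-in for croak(): jump to the innermost handler. */
void throw_error(void)
{
    if (current_handler)
        longjmp(*current_handler, 1);
    abort();                        /* no handler installed */
}

void code_that_may_throw(void)
{
    throw_error();
}

/* Calls code_that_may_throw(); on failure frees its buffer, records
   that cleanup ran in *cleaned, and passes the error outward. */
void guarded_call(volatile int *cleaned)
{
    jmp_buf here;
    jmp_buf *outer = current_handler;   /* roughly dXCPT */
    char *buffer = malloc(16);

    current_handler = &here;
    if (setjmp(here) == 0) {            /* roughly XCPT_TRY_START */
        code_that_may_throw();
        current_handler = outer;        /* roughly XCPT_TRY_END */
    } else {                            /* roughly XCPT_CATCH */
        current_handler = outer;
        free(buffer);                   /* do cleanup here */
        *cleaned = 1;
        throw_error();                  /* roughly XCPT_RETHROW */
    }
    free(buffer);
}

/* Returns 1 if an error propagated out of guarded_call(), else 0. */
int run_demo(volatile int *cleaned)
{
    jmp_buf top;

    current_handler = &top;
    if (setjmp(top) == 0) {
        guarded_call(cleaned);
        current_handler = NULL;
        return 0;
    }
    current_handler = NULL;
    return 1;
}
```

The point of the sketch is the ordering: the outer handler is restored
before cleanup, and the error is passed on only after the cleanup has run.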
3309 | ||
a422fd2d SC |
3310 | =head2 Source Documentation |
3311 | ||
3312 | There's an effort going on to document the internal functions and | |
61ad4b94 | 3313 | automatically produce reference manuals from them -- L<perlapi> is one |
a422fd2d | 3314 | such manual which details all the functions which are available to XS |
10e2eb10 | 3315 | writers. L<perlintern> is the autogenerated manual for the functions |
a422fd2d SC |
3316 | which are not part of the API and are supposedly for internal use only. |
3317 | ||
3318 | Source documentation is created by putting POD comments into the C | |
3319 | source, like this: | |
3320 | ||
3321 | /* | |
3322 | =for apidoc sv_setiv | |
3323 | ||
3324 | Copies an integer into the given SV. Does not handle 'set' magic. See | |
a95b3d6a | 3325 | L<perlapi/sv_setiv_mg>. |
a422fd2d SC |
3326 | |
3327 | =cut | |
3328 | */ | |
3329 | ||
3330 | Please try to supply some documentation if you add functions to the | |
3331 | Perl core. | |
3332 | ||
0d098d33 MHM |
3333 | =head2 Backwards compatibility |
3334 | ||
10e2eb10 FC |
3335 | The Perl API changes over time. New functions are |
3336 | added or the interfaces of existing functions are | |
3337 | changed. The C<Devel::PPPort> module tries to | |
0d098d33 MHM |
3338 | provide compatibility code for some of these changes, so XS writers don't |
3339 | have to code it themselves when supporting multiple versions of Perl. | |
3340 | ||
3341 | C<Devel::PPPort> generates a C header file F<ppport.h> that can also | |
10e2eb10 | 3342 | be run as a Perl script. To generate F<ppport.h>, run: |
0d098d33 MHM |
3343 | |
3344 | perl -MDevel::PPPort -eDevel::PPPort::WriteFile | |
3345 | ||
3346 | Besides checking existing XS code, the script can also be used to retrieve | |
3347 | compatibility information for various API calls using the C<--api-info> | |
10e2eb10 | 3348 | command line switch. For example: |
0d098d33 MHM |
3349 | |
3350 | % perl ppport.h --api-info=sv_magicext | |
3351 | ||
0985f7e5 | 3352 | For details, see S<C<perldoc ppport.h>>. |
0d098d33 | 3353 | |
a422fd2d SC |
3354 | =head1 Unicode Support |
3355 | ||
10e2eb10 | 3356 | Perl 5.6.0 introduced Unicode support. It's important for porters and XS |
a422fd2d SC |
3357 | writers to understand this support and make sure that the code they |
3358 | write does not corrupt Unicode data. | |
3359 | ||
3360 | =head2 What B<is> Unicode, anyway? | |
3361 | ||
10e2eb10 FC |
3362 | In the olden, less enlightened times, we all used to use ASCII. Most of |
3363 | us did, anyway. The big problem with ASCII is that it's American. Well, | |
a422fd2d | 3364 | no, that's not actually the problem; the problem is that it's not |
10e2eb10 | 3365 | particularly useful for people who don't use the Roman alphabet. What |
a422fd2d | 3366 | used to happen was that particular languages would stick their own |
10e2eb10 | 3367 | alphabet in the upper range of the sequence, between 128 and 255. Of |
a422fd2d SC |
3368 | course, we then ended up with plenty of variants that weren't quite |
3369 | ASCII, and the whole point of it being a standard was lost. | |
3370 | ||
3371 | Worse still, if you've got a language like Chinese or | |
3372 | Japanese that has hundreds or thousands of characters, then you really | |
3373 | can't fit them into a mere 256, so they had to forget about ASCII | |
3374 | altogether, and build their own systems using pairs of numbers to refer | |
3375 | to one character. | |
3376 | ||
3377 | To fix this, some people formed Unicode, Inc. and | |
3378 | produced a new character set containing all the characters you can | |
10e2eb10 FC |
3379 | possibly think of and more. There are several ways of representing these |
3380 | characters, and the one Perl uses is called UTF-8. UTF-8 uses | |
3381 | a variable number of bytes to represent a character. You can learn more | |
2575c402 | 3382 | about Unicode and Perl's Unicode model in L<perlunicode>. |
a422fd2d | 3383 | |
3ad86f0e KW |
3384 | (On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of |
3385 | UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8. | |
3386 | UTF-EBCDIC is like UTF-8, but the details are different. The macros | |
3387 | hide the differences from you; just remember that the particular numbers | |
3388 | and bit patterns presented below will differ in UTF-EBCDIC.) | |
3389 | ||
1e54db1a | 3390 | =head2 How can I recognise a UTF-8 string? |
a422fd2d | 3391 | |
10e2eb10 FC |
3392 | You can't. This is because UTF-8 data is stored in bytes just like |
3393 | non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types), | |
a422fd2d | 3394 | capital E with a grave accent, is represented by the two bytes |
10e2eb10 | 3395 | C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)> |
61ad4b94 | 3396 | has that byte sequence as well. So you can't tell just by looking -- this |
a422fd2d SC |
3397 | is what makes Unicode input an interesting problem. |
3398 | ||
2575c402 JW |
3399 | In general, you either have to know what you're dealing with, or you |
3400 | have to guess. The API function C<is_utf8_string> can help; it'll tell | |
61ad4b94 KW |
3401 | you if a string contains only valid UTF-8 characters, and the chances |
3402 | of a non-UTF-8 string looking like valid UTF-8 become very small very | |
3403 | quickly with increasing string length. On a character-by-character | |
3404 | basis, C<isUTF8_CHAR> | |
2575c402 | 3405 | will tell you whether the current character in a string is valid UTF-8. |
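
To see why validity alone cannot settle the question, here is a much
simplified structural check written in plain C. It is an illustration only:
the name C<looks_like_utf8> is hypothetical, and unlike C<is_utf8_string> it
makes no attempt to reject overlong forms, surrogates or out-of-range code
points.

```c
#include <stddef.h>

/* Returns 1 if buf[0..len) has the byte structure of UTF-8: each lead
   byte is followed by the right number of continuation bytes.
   Simplified: overlongs and surrogates are not rejected. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;

        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;          /* stray continuation or invalid lead */

        if (extra > len - i - 1)
            return 0;           /* sequence runs past the buffer */
        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;       /* expected a continuation byte */
        i += extra + 1;
    }
    return 1;
}
```

The two bytes C<0xC3 0xA9> pass this check, yet they could equally be one
UTF-8 character (e with an acute accent) or two Latin-1 characters. A lone
C<0xC8> byte fails, so it can only be non-UTF-8 data.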
a422fd2d | 3406 | |
1e54db1a | 3407 | =head2 How does UTF-8 represent Unicode characters? |
a422fd2d | 3408 | |
1e54db1a | 3409 | As mentioned above, UTF-8 uses a variable number of bytes to store a |
10e2eb10 FC |
3410 | character. Characters with values 0...127 are stored in one |
3411 | byte, just like good ol' ASCII. Character 128 is stored as | |
3412 | C<v194.128>; this continues up to character 191, which is | |
3413 | C<v194.191>. Now we've run out of bits (191 is binary | |
61ad4b94 | 3414 | C<10111111>) so we move on; character 192 is C<v195.128>. And |
a422fd2d | 3415 | so it goes on, moving to three bytes at character 2048. |
6e31cdd1 | 3416 | L<perlunicode/Unicode Encodings> has pictures of how this works. |
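
The byte values quoted above can be reproduced with a small standalone
encoder. This is a sketch of standard UTF-8 only, under the assumption of an
ASCII platform and code points no larger than U+10FFFF; Perl's own
C<uvchr_to_utf8> handles more cases. The name C<encode_utf8> is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes cp into out, which must have room for 4 bytes. Returns the
   number of bytes written, or 0 if cp is above U+10FFFF. Surrogates
   are not rejected in this sketch. */
size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 0...127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 128...2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 2048...65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                /* up to U+10FFFF: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Feeding it 128, 191 and 192 gives the byte pairs 194.128, 194.191 and
195.128, and 2048 is the first value that needs three bytes.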
a422fd2d | 3417 | |
1e54db1a | 3418 | Assuming you know you're dealing with a UTF-8 string, you can find out |
a422fd2d SC |
3419 | how long the first character in it is with the C<UTF8SKIP> macro: |
3420 | ||
3421 | char *utf = "\305\233\340\240\201"; | |
3422 | I32 len; | |
3423 | ||
3424 | len = UTF8SKIP(utf); /* len is 2 here */ | |
3425 | utf += len; | |
3426 | len = UTF8SKIP(utf); /* len is 3 here */ | |
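
What C<UTF8SKIP> reports can be derived from the first byte alone. The
standalone sketch below shows the idea for standard UTF-8 of one to four
bytes; it is an assumption-laden illustration, not how Perl implements the
macro. The names C<utf8_skip_sketch> and C<count_chars> are hypothetical.

```c
#include <stddef.h>

/* Length in bytes of the character starting at s, judged from its
   first byte. Returns 1 for unexpected bytes so callers always
   make progress. */
size_t utf8_skip_sketch(const unsigned char *s)
{
    unsigned char c = *s;

    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Counts characters in [s, end), stopping early rather than
   stepping past end if the last character is truncated. */
size_t count_chars(const unsigned char *s, const unsigned char *end)
{
    size_t n = 0;

    while (s < end) {
        size_t step = utf8_skip_sketch(s);
        if (step > (size_t)(end - s))
            break;
        s += step;
        n++;
    }
    return n;
}
```

Note that the counting loop carries its own end pointer; that is the bounds
check you have to supply yourself when stepping through a buffer by hand.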
3427 | ||
1e54db1a | 3428 | Another way to skip over characters in a UTF-8 string is to use |
a422fd2d | 3429 | C<utf8_hop>, which takes a string and a number of characters to skip |
10e2eb10 | 3430 | over. You're on your own about bounds checking, though, so don't use it |
a422fd2d SC |
3431 | lightly. |
3432 | ||
1e54db1a | 3433 | All bytes in a multi-byte UTF-8 character will have the high bit set, |
3a2263fe | 3434 | so you can test if you need to do something special with this |
61ad4b94 | 3435 | character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests |
9f98c7fe | 3436 | whether the byte is encoded as a single byte even in UTF-8): |
a422fd2d | 3437 | |
32128a7f KW |
3438 | U8 *utf; /* Initialize this to point to the beginning of the |
3439 | sequence to convert */ | |
3440 | U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence | |
3441 | pointed to by 'utf' */ | |
3442 | UV uv; /* Returned code point; note: a UV, not a U8, not a | |
3443 | char */ | |
3444 | STRLEN len; /* Returned length of character in bytes */ | |
a422fd2d | 3445 | |
3a2263fe | 3446 | if (!UTF8_IS_INVARIANT(*utf)) |
1e54db1a | 3447 | /* Must treat this as UTF-8 */ |
4b88fb76 | 3448 | uv = utf8_to_uvchr_buf(utf, utf_end, &len); |
a422fd2d SC |
3449 | else |
3450 | /* OK to treat this character as a byte */ | |
3451 | uv = *utf; | |
3452 | ||
4b88fb76 | 3453 | You can also see in that example that we use C<utf8_to_uvchr_buf> to get the |
95701e00 | 3454 | value of the character; the inverse function C<uvchr_to_utf8> is available |
1e54db1a | 3455 | for putting a UV into UTF-8: |
a422fd2d | 3456 | |
61ad4b94 | 3457 | if (!UVCHR_IS_INVARIANT(uv)) |
a422fd2d | 3458 | /* Must treat this as UTF8 */ |
95701e00 | 3459 | utf8 = uvchr_to_utf8(utf8, uv); |
a422fd2d SC |
3460 | else |
3461 | /* OK to treat this character as a byte */ | |
3462 | *utf8++ = uv; | |
3463 | ||
3464 | You B<must> convert characters to UVs using the above functions if | |
1e54db1a | 3465 | you're ever in a situation where you have to match UTF-8 and non-UTF-8 |
10e2eb10 | 3466 | characters. You may not skip over UTF-8 characters in this case. If you |
1e54db1a JH |
3467 | do this, you'll lose the ability to match hi-bit non-UTF-8 characters; |
3468 | for instance, if your UTF-8 string contains C<v195.136>, and you skip | |
3469 | that character, you can never match a C<chr(200)> in a non-UTF-8 string. | |
a422fd2d SC |
3470 | So don't do that! |
3471 | ||
61ad4b94 KW |
3472 | (Note that we don't have to test for invariant characters in the |
3473 | examples above. The functions work on any well-formed UTF-8 input. | |
3474 | It's just that it's faster to avoid the function overhead when it's not | |
3475 | needed.) | |
3476 | ||
1e54db1a | 3477 | =head2 How does Perl store UTF-8 strings? |
a422fd2d | 3478 | |
61ad4b94 | 3479 | Currently, Perl deals with UTF-8 strings and non-UTF-8 strings |
10e2eb10 FC |
3480 | slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the |
3481 | string is internally encoded as UTF-8. Without it, the byte value is the | |
61ad4b94 KW |
3482 | codepoint number and vice versa. This flag is only meaningful if the SV |
3483 | is C<SvPOK> or immediately after stringification via C<SvPV> or a | |
3484 | similar macro. You can check and manipulate this flag with the | |
2575c402 | 3485 | following macros: |
a422fd2d SC |
3486 | |
3487 | SvUTF8(sv) | |
3488 | SvUTF8_on(sv) | |
3489 | SvUTF8_off(sv) | |
3490 | ||
3491 | This flag has an important effect on Perl's treatment of the string: if | |
61ad4b94 | 3492 | UTF-8 data is not properly distinguished, regular expressions, |
a422fd2d | 3493 | C<length>, C<substr> and other string handling operations will have |
61ad4b94 | 3494 | undesirable (wrong) results. |
a422fd2d SC |
3495 | |
3496 | The problem comes when you have, for instance, a string that isn't | |
61ad4b94 | 3497 | flagged as UTF-8, and contains a byte sequence that could be UTF-8 -- |
1e54db1a | 3498 | especially when combining non-UTF-8 and UTF-8 strings. |
a422fd2d | 3499 | |
61ad4b94 KW |
3500 | Never forget that the C<SVf_UTF8> flag is separate from the PV value; you |
3501 | need to be sure you don't accidentally knock it off while you're | |
10e2eb10 | 3502 | manipulating SVs. More specifically, you cannot expect to do this: |
a422fd2d SC |
3503 | |
3504 | SV *sv; | |
3505 | SV *nsv; | |
3506 | STRLEN len; | |
3507 | char *p; | |
3508 | ||
3509 | p = SvPV(sv, len); | |
3510 | frobnicate(p); | |
3511 | nsv = newSVpvn(p, len); | |
3512 | ||
3513 | The C<char*> string does not tell you the whole story, and you can't | |
10e2eb10 | 3514 | copy or reconstruct an SV just by copying the string value. Check if the |
c31cc9fc FC |
3515 | old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act |
3516 | accordingly: | |
a422fd2d SC |
3517 | |
3518 | p = SvPV(sv, len); | |
6db25795 KW |
3519 | is_utf8 = SvUTF8(sv); |
3520 | frobnicate(p, is_utf8); | |
a422fd2d | 3521 | nsv = newSVpvn(p, len); |
6db25795 | 3522 | if (is_utf8) |
a422fd2d SC |
3523 | SvUTF8_on(nsv); |
3524 | ||
6db25795 KW |
3525 | In the above, your C<frobnicate> function has been changed to be made |
3526 | aware of whether or not it's dealing with UTF-8 data, so that it can | |
3527 | handle the string appropriately. | |
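
The reason the flag has to travel with the bytes can be shown without any
Perl data structures. In this standalone sketch, C<struct flagged_str> and
C<char_count> are hypothetical stand-ins for "a PV plus its UTF-8 flag";
they are not part of the Perl API.

```c
#include <stddef.h>

/* A byte buffer together with the flag that says how to read it. */
struct flagged_str {
    const unsigned char *bytes;
    size_t len;
    int is_utf8;
};

/* Number of characters in s. Without the flag every byte is one
   character; with it, continuation bytes (10xxxxxx) are not counted. */
size_t char_count(struct flagged_str s)
{
    size_t i, n = 0;

    if (!s.is_utf8)
        return s.len;
    for (i = 0; i < s.len; i++)
        if ((s.bytes[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

The same two bytes C<0xC3 0xA9> count as two characters with the flag off
and as one with it on, so dropping the flag silently changes the string.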
a422fd2d | 3528 | |
3a2263fe | 3529 | Since just passing an SV to an XS function and copying the data of |
2575c402 | 3530 | the SV is not enough to copy the UTF8 flags, even less right is just |
61ad4b94 | 3531 | passing a S<C<char *>> to an XS function. |
3a2263fe | 3532 | |
dc83bf8e | 3533 | For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the |
6db25795 KW |
3534 | string in an SV is to be I<treated> as UTF-8. This takes into account |
3535 | if the call to the XS function is being made from within the scope of | |
3536 | L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the | |
3537 | UTF-8 string are to be exposed, rather than the character they | |
3538 | represent. But this pragma should only really be used for debugging and | |
3539 | perhaps low-level testing at the byte level. Hence most XS code need | |
3540 | not concern itself with this, but various areas of the perl core do need | |
3541 | to support it. | |
3542 | ||
3543 | And this isn't the whole story. Starting in Perl v5.12, strings that | |
3544 | aren't encoded in UTF-8 may also be treated as Unicode under various | |
6e31cdd1 | 3545 | conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>). |
6db25795 KW |
3546 | This is only really a problem for characters whose ordinals are between |
3547 | 128 and 255, and their behavior varies under ASCII versus Unicode rules | |
3548 | in ways that your code cares about (see L<perlunicode/The "Unicode Bug">). | |
3549 | There is no published API for dealing with this, as it is subject to | |
3550 | change, but you can look at the code for C<pp_lc> in F<pp.c> for an | |
3551 | example as to how it's currently done. | |
3552 | ||
3c3f883d FG |
3553 | =head2 How do I pass a Perl string to a C library? |
3554 | ||
3555 | A Perl string, conceptually, is an opaque sequence of code points. | |
3556 | Many C libraries expect their inputs to be "classical" C strings, which are | |
3557 | arrays of octets 1-255, terminated with a NUL byte. Your job when writing | |
3558 | an interface between Perl and a C library is to define the mapping between | |
3559 | Perl and that library. | |
3560 | ||
3561 | Generally speaking, C<SvPVbyte> and related macros suit this task well. | |
3562 | These assume that your Perl string is a "byte string", i.e., is either | |
3563 | raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8. | |
3564 | ||
3565 | Alternatively, if your C library expects UTF-8 text, you can use | |
3566 | C<SvPVutf8> and related macros. This has the same effect as encoding | |
3567 | to UTF-8 then calling the corresponding C<SvPVbyte>-related macro. | |
3568 | ||
3569 | Some C libraries may expect other encodings (e.g., UTF-16LE). To give | |
3570 | Perl strings to such libraries | |
3571 | you must either do that encoding in Perl then use C<SvPVbyte>, or | |
3572 | use an intermediary C library to convert from however Perl stores the | |
3573 | string to the desired encoding. | |
3574 | ||
3575 | Take care also that NULs in your Perl string don't confuse the C | |
3576 | library. If possible, give the string's length to the C library; if that's | |
3577 | not possible, consider rejecting strings that contain NUL bytes. | |
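
Rejecting embedded NULs is a one-line check once you have the pointer and
length that C<SvPVbyte> gave you. A minimal sketch in plain C, with the
hypothetical helper name C<has_embedded_nul>:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the len bytes at buf contain a NUL, which would make a
   NUL-terminated C API see a shorter string than Perl does. */
int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

An XS wrapper could call this on the buffer and croak when it returns true,
before handing the pointer to the library.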
3578 | ||
3579 | =head3 What about C<SvPV>, C<SvPV_nolen>, etc.? | |
3580 | ||
3581 | Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">. | |
3582 | Perl can store these 3 characters either of two ways: | |
3583 | ||
3584 | =over | |
3585 | ||
3586 | =item * bytes: 0x64 0x78 0x8c | |
3587 | ||
3588 | =item * UTF-8: 0x64 0x78 0xc2 0x8c | |
3589 | ||
3590 | =back | |
3591 | ||
3592 | Now let's say you convert C<$foo> to a C string thus: | |
3593 | ||
3594 | STRLEN len; | |
3595 | char *str = SvPV(foo_sv, len); | |
3596 | ||
3597 | At this point C<str> could point to a 3-byte C string or a 4-byte one. | |
3598 | ||
3599 | Generally speaking, we want C<str> to be the same regardless of how | |
3600 | Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte> | |
3601 | and C<SvPVutf8> solve that by giving predictable output: use | |
3602 | C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8> | |
3603 | if it expects UTF-8. | |
3604 | ||
3605 | If your C library happens to support both encodings, then C<SvPV>--always | |
3606 | in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more | |
3607 | efficient. | |
3608 | ||
3609 | B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions | |
3610 | in your tests to ensure consistent handling regardless of Perl's | |
3611 | internal encoding. | |
3612 | ||
1e54db1a | 3613 | =head2 How do I convert a string to UTF-8? |
a422fd2d | 3614 | |
2575c402 | 3615 | If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade |
61ad4b94 | 3616 | the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do |
2575c402 | 3617 | this is: |
a422fd2d SC |
3618 | |
3619 | sv_utf8_upgrade(sv); | |
3620 | ||
3621 | However, you must not do this, for example: | |
3622 | ||
3623 | if (!SvUTF8(left)) | |
3624 | sv_utf8_upgrade(left); | |
3625 | ||
3626 | If you do this in a binary operator, you will actually change one of the | |
b1866b2d | 3627 | strings that came into the operator, and, while it shouldn't be noticeable |
2575c402 | 3628 | by the end user, it can cause problems in deficient code. |
a422fd2d | 3629 | |
1e54db1a | 3630 | Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its |
10e2eb10 FC |
3631 | string argument. This is useful for having the data available for |
3632 | comparisons and so on, without harming the original SV. There's also | |
a422fd2d SC |
3633 | C<utf8_to_bytes> to go the other way, but naturally, this will fail if |
3634 | the string contains any characters above 255 that can't be represented | |
3635 | in a single byte. | |
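
The idea of producing an upgraded copy while leaving the input alone can be
sketched in standalone C. This assumes the input is Latin-1 on an ASCII
platform; it is modelled only loosely on what C<bytes_to_utf8> does, and the
name C<latin1_to_utf8_copy> and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns a newly allocated, NUL-terminated UTF-8 copy of the len
   Latin-1 bytes at src and stores its byte length in *out_len.
   Returns NULL on allocation failure. The caller frees the result;
   src itself is never modified. */
unsigned char *latin1_to_utf8_copy(const unsigned char *src, size_t len,
                                   size_t *out_len)
{
    /* Worst case: every input byte becomes two output bytes. */
    unsigned char *dst = (unsigned char *)malloc(2 * len + 1);
    size_t i, j = 0;

    if (!dst)
        return NULL;
    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[j++] = src[i];          /* ASCII is unchanged */
        } else {
            dst[j++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[j++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    dst[j] = '\0';
    *out_len = j;
    return dst;
}
```

Applied to the three bytes C<0x64 0x78 0x8c>, it yields the four bytes
C<0x64 0x78 0xc2 0x8c>, and the original buffer is left as it was.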
3636 | ||
6db25795 KW |
3637 | =head2 How do I compare strings? |
3638 | ||
3639 | L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic | |
3640 | comparison of two SVs, and handle UTF-8ness properly. Note, however, | |
3641 | that Unicode specifies a much fancier mechanism for collation, available | |
3642 | via the L<Unicode::Collate> module. | |
3643 | ||
3644 | To just compare two strings for equality/non-equality, you can just use | |
3645 | L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual, | |
3646 | except the strings must be both UTF-8 or not UTF-8 encoded. | |
3647 | ||
3648 | To compare two strings case-insensitively, use | |
3649 | L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have | |
3650 | the same UTF-8ness). | |
3651 | ||
a422fd2d SC |
3652 | =head2 Is there anything else I need to know? |
3653 | ||
10e2eb10 | 3654 | Not really. Just remember these things: |
a422fd2d SC |
3655 | |
3656 | =over 3 | |
3657 | ||
3658 | =item * | |
3659 | ||
6db25795 KW |
3660 | There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8 |
3661 | or not. But you can tell if an SV is to be treated as UTF-8 by calling | |
3662 | C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar | |
3663 | macro. And, you can tell if an SV is actually UTF-8 (even if it is not to | |
3664 | be treated as such) by looking at its C<SvUTF8> flag (again after | |
3665 | stringifying it). Don't forget to set the flag if something should be | |
3666 | UTF-8. | |
3667 | Treat the flag as part of the PV, even though it's not -- if you pass on | |
3668 | the PV to somewhere, pass on the flag too. | |
a422fd2d SC |
3669 | |
3670 | =item * | |
3671 | ||
4b88fb76 | 3672 | If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value, |
3a2263fe | 3673 | unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>. |
a422fd2d SC |
3674 | |
3675 | =item * | |
3676 | ||
61ad4b94 KW |
3677 | When writing a character UV to a UTF-8 string, B<always> use |
3678 | C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case | |
3a2263fe | 3679 | you can use C<*s = uv>. |
a422fd2d SC |
3680 | |
3681 | =item * | |
3682 | ||
10e2eb10 FC |
3683 | Mixing UTF-8 and non-UTF-8 strings is |
3684 | tricky. Use C<bytes_to_utf8> to get | |
2bbc8d55 | 3685 | a new string which is UTF-8 encoded, and then combine them. |
a422fd2d SC |
3686 | |
3687 | =back | |
3688 | ||
53e06cf0 SC |
3689 | =head1 Custom Operators |
3690 | ||
2a0fd0f1 | 3691 | Custom operator support is an experimental feature that allows you to |
10e2eb10 | 3692 | define your own ops. This is primarily to allow the building of |
53e06cf0 SC |
3693 | interpreters for other languages in the Perl core, but it also allows |
3694 | optimizations through the creation of "macro-ops" (ops which perform the | |
3695 | functions of multiple ops which are usually executed together, such as | |
1aa6ea50 | 3696 | C<gvsv, gvsv, add>). |
53e06cf0 | 3697 | |
10e2eb10 | 3698 | This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl |
53e06cf0 | 3699 | core does not "know" anything special about this op type, and so it will |
10e2eb10 | 3700 | not be involved in any optimizations. This also means that you can |
61ad4b94 KW |
3701 | define your custom ops to be any op structure -- unary, binary, list and |
3702 | so on -- you like. | |
53e06cf0 | 3703 | |
10e2eb10 FC |
3704 | It's important to know what custom operators won't do for you. They |
3705 | won't let you add new syntax to Perl, directly. They won't even let you | |
3706 | add new keywords, directly. In fact, they won't change the way Perl | |
3707 | compiles a program at all. You have to do those changes yourself, after | |
3708 | Perl has compiled the program. You do this either by manipulating the op | |
53e06cf0 SC |
3709 | tree using a C<CHECK> block and the C<B::Generate> module, or by adding |
3710 | a custom peephole optimizer with the C<optimize> module. | |
3711 | ||
3712 | When you do this, you replace ordinary Perl ops with custom ops by | |
407f86e1 | 3713 | creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own |
10e2eb10 FC |
3714 | PP function. This should be defined in XS code, and should look like |
3715 | the PP ops in C<pp_*.c>. You are responsible for ensuring that your op | |
53e06cf0 SC |
3716 | takes the appropriate number of values from the stack, and you are |
3717 | responsible for adding stack marks if necessary. | |
3718 | ||
3719 | You should also "register" your op with the Perl interpreter so that it | |
10e2eb10 | 3720 | can produce sensible error and warning messages. Since it is possible to |
53e06cf0 | 3721 | have multiple custom ops within the one "logical" op type C<OP_CUSTOM>, |
9733086d | 3722 | Perl uses the value of C<< o->op_ppaddr >> to determine which custom op |
10e2eb10 | 3723 | it is dealing with. You should create an C<XOP> structure for each |
9733086d BM |
3724 | ppaddr you use, set the properties of the custom op with |
3725 | C<XopENTRY_set>, and register the structure against the ppaddr using | |
10e2eb10 | 3726 | C<Perl_custom_op_register>. A trivial example might look like: |
9733086d | 3727 | |
7cc7ada7 | 3728 | =for apidoc_section $optree_manipulation |
63dbc4a9 KW |
3729 | =for apidoc Ayh||XOP |
3730 | ||
9733086d BM |
3731 | static XOP my_xop; |
3732 | static OP *my_pp(pTHX); | |
3733 | ||
3734 | BOOT: | |
3735 | XopENTRY_set(&my_xop, xop_name, "myxop"); | |
3736 | XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); | |
3737 | Perl_custom_op_register(aTHX_ my_pp, &my_xop); | |
3738 | ||
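The idea of keying op metadata on the function pointer can be sketched in plain C. This is only a toy model under stated assumptions: it does not use the Perl headers, and every name beginning with C<toy_> (including the fallback name C<"unknown">) is hypothetical and invented for illustration. It shows a descriptor being registered against a function pointer and later looked up through that same pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT the real Perl types. */
typedef int (*toy_ppaddr_t)(int);

typedef struct {
    const char *name;   /* plays the role of xop_name */
    const char *desc;   /* plays the role of xop_desc */
} toy_xop;

#define TOY_MAX_XOPS 8

static struct {
    toy_ppaddr_t   ppaddr;
    const toy_xop *xop;
} toy_registry[TOY_MAX_XOPS];
static size_t toy_registry_len = 0;

/* Associate a descriptor with a function pointer.
 * Returns 0 on success, -1 if the table is full. */
static int toy_custom_op_register(toy_ppaddr_t ppaddr, const toy_xop *xop)
{
    assert(xop != NULL);
    if (toy_registry_len == TOY_MAX_XOPS)
        return -1;
    toy_registry[toy_registry_len].ppaddr = ppaddr;
    toy_registry[toy_registry_len].xop    = xop;
    toy_registry_len++;
    return 0;
}

/* Find the name registered for a function pointer, or a fallback. */
static const char *toy_op_name(toy_ppaddr_t ppaddr)
{
    for (size_t i = 0; i < toy_registry_len; i++)
        if (toy_registry[i].ppaddr == ppaddr)
            return toy_registry[i].xop->name;
    return "unknown";
}

/* Two "ops" that share one logical type but differ by address. */
static int toy_pp_double(int x) { return 2 * x; }
static int toy_pp_negate(int x) { return -x; }

static const toy_xop toy_double_xop = { "double", "Doubles its argument" };
```

The linear table is used purely for brevity; the point is only that the function pointer itself is the lookup key, so two custom ops are distinguished by which function they run.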
3739 | The available fields in the structure are: | |
3740 | ||
3741 | =over 4 | |
3742 | ||
3743 | =item xop_name | |
3744 | ||
10e2eb10 | 3745 | A short name for your op. This will be included in some error messages, |
9733086d BM |
3746 | and will also be returned as C<< $op->name >> by the L<B|B> module, so |
3747 | it will appear in the output of modules like L<B::Concise|B::Concise>. | |
3748 | ||
3749 | =item xop_desc | |
3750 | ||
3751 | A short description of the function of the op. | |
3752 | ||
3753 | =item xop_class | |
3754 | ||
10e2eb10 | 3755 | Which of the various C<*OP> structures this op uses. This should be one of |
9733086d BM |
3756 | the C<OA_*> constants from F<op.h>, namely |
3757 | ||
3758 | =over 4 | |
3759 | ||
3760 | =item OA_BASEOP | |
3761 | ||
3762 | =item OA_UNOP | |
3763 | ||
3764 | =item OA_BINOP | |
3765 | ||
3766 | =item OA_LOGOP | |
3767 | ||
3768 | =item OA_LISTOP | |
3769 | ||
3770 | =item OA_PMOP | |
3771 | ||
3772 | =item OA_SVOP | |
3773 | ||
3774 | =item OA_PADOP | |
3775 | ||
3776 | =item OA_PVOP_OR_SVOP | |
3777 | ||
10e2eb10 | 3778 | This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because |
9733086d BM |
3779 | the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead. |
3780 | ||
3781 | =item OA_LOOP | |
3782 | ||
3783 | =item OA_COP | |
3784 | ||
6ef63541 KW |
3785 | =for apidoc_section $optree_manipulation |
3786 | =for apidoc Amnh||OA_BASEOP | |
3787 | =for apidoc_item OA_BINOP | |
3788 | =for apidoc_item OA_COP | |
3789 | =for apidoc_item OA_LISTOP | |
3790 | =for apidoc_item OA_LOGOP | |
1607e393 | 3791 | =for apidoc_item OA_LOOP |
6ef63541 KW |
3792 | =for apidoc_item OA_PADOP |
3793 | =for apidoc_item OA_PMOP | |
3794 | =for apidoc_item OA_PVOP_OR_SVOP | |
3795 | =for apidoc_item OA_SVOP | |
3796 | =for apidoc_item OA_UNOP | |
6ef63541 | 3797 | |
9733086d BM |
3798 | =back |
3799 | ||
3800 | The other C<OA_*> constants should not be used. | |
3801 | ||
3802 | =item xop_peep | |
3803 | ||
3804 | This member is of type C<Perl_cpeep_t>, which expands to C<void | |
10e2eb10 | 3805 | (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function |
9733086d | 3806 | will be called from C<Perl_rpeep> when ops of this type are encountered |
10e2eb10 | 3807 | by the peephole optimizer. I<o> is the OP that needs optimizing; |
9733086d BM |
3808 | I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>. |
3809 | ||
7cc7ada7 | 3810 | =for apidoc_section $optree_manipulation |
63dbc4a9 KW |
3811 | =for apidoc Ayh||Perl_cpeep_t |
3812 | ||
9733086d | 3813 | =back |
53e06cf0 | 3814 | |
e7d4c058 | 3815 | C<B::Generate> directly supports the creation of custom ops by name. |
53e06cf0 | 3816 | |
e55ec392 PE |
3817 | =head1 Stacks |
3818 | ||
3819 | Descriptions above occasionally refer to "the stack", but there are in fact | |
3820 | many stack-like data structures within the perl interpreter. When otherwise | |
3821 | unqualified, "the stack" usually refers to the value stack. | |
3822 | ||
3823 | The various stacks have different purposes, and operate in slightly different | |
3824 | ways. Their differences are noted below. | |
3825 | ||
3826 | =head2 Value Stack | |
3827 | ||
3828 | This stack stores the values that regular perl code is operating on, usually | |
3829 | intermediate values of expressions within a statement. The stack itself is | |
3830 | formed of an array of SV pointers. | |
3831 | ||
3832 | The base of this stack is pointed to by the interpreter variable | |
3833 | C<PL_stack_base>, of type C<SV **>. | |
3834 | ||
6ef63541 KW |
3835 | =for apidoc_section $stack |
3836 | =for apidoc Amnh||PL_stack_base | |
3837 | ||
e55ec392 PE |
3838 | The head of the stack is C<PL_stack_sp>, and points to the most |
3839 | recently-pushed item. | |
3840 | ||
6ef63541 KW |
3841 | =for apidoc Amnh||PL_stack_sp |
3842 | ||
e55ec392 PE |
3843 | Items are pushed to the stack by using the C<PUSHs()> macro or its variants |
3844 | described above; C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed | |
3845 | versions. Note carefully that the non-C<X> versions of these macros do not | |
3846 | check the size of the stack and assume it to be big enough. These must be | |
3847 | paired with a suitable check of the stack's size, such as the C<EXTEND> macro | |
3848 | to ensure it is large enough. For example | |
3849 | ||
3850 | EXTEND(SP, 4); | |
3851 | mPUSHi(10); | |
3852 | mPUSHi(20); | |
3853 | mPUSHi(30); | |
3854 | mPUSHi(40); | |
3855 | ||
3856 | This is slightly more performant than making four separate checks in four | |
3857 | separate C<mXPUSHi()> calls. | |
3858 | ||
3859 | As a further performance optimisation, the various C<PUSH> macros all operate | |
3860 | using a local variable C<SP>, rather than the interpreter-global variable | |
3861 | C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is | |
3862 | normally implied by XSUBs and similar so it is rare you have to consider it | |
3863 | directly. Once declared, the C<PUSH> macros will operate only on this local | |
3864 | variable, so before invoking any other perl core functions you must use the | |
3865 | C<PUTBACK> macro to return the value from the local C<SP> variable back to | |
3866 | the interpreter variable. Similarly, after calling a perl core function which | |
3867 | may have had reason to move the stack or push/pop values to it, you must use | |
3868 | the C<SPAGAIN> macro which refreshes the local C<SP> value back from the | |
3869 | interpreter one. | |
3870 | ||
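The interplay between a local stack pointer and the global one can be sketched in plain C. This is a toy model only: it does not use the Perl headers, all C<toy_> names are hypothetical, and the stack holds C<int> values rather than SV pointers. Slot 0 is left unused so that the head pointer can always point at the most recently pushed item.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_MAX 16

static int  toy_stack[TOY_STACK_MAX];
/* Global head pointer: points at the most recently pushed item.
 * Slot 0 is a reserved sentinel, so an empty stack points there. */
static int *toy_stack_sp = toy_stack;

/* A callee that knows nothing about our local copy and works
 * directly on the global pointer. */
static void toy_callee_push(int v)
{
    *++toy_stack_sp = v;
}

/* Pushes two items locally, calls out, and returns the number of
 * items now on the stack. */
static int toy_demo(void)
{
    int *sp = toy_stack_sp;     /* like dSP: take a local copy       */

    *++sp = 10;                 /* like PUSH: touches the local only */
    *++sp = 20;

    toy_stack_sp = sp;          /* like PUTBACK: publish the local   */
    toy_callee_push(30);        /* callee moves the global pointer   */
    sp = toy_stack_sp;          /* like SPAGAIN: refresh the local   */

    return (int)(sp - toy_stack);
}
```

Omitting the publish step in this model would make the callee overwrite the first pushed item, and omitting the refresh step would leave the local copy pointing below the callee's item.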
3871 | Items are popped from the stack by using the C<POPs> macro or its typed | |
3872 | versions. There is also a macro C<TOPs> that inspects the topmost item without | |
3873 | removing it. | |
3874 | ||
6ef63541 KW |
3875 | =for apidoc_section $stack |
3876 | =for apidoc Amnh||TOPs | |
3877 | ||
e55ec392 PE |
3878 | Note specifically that SV pointers on the value stack do not contribute to the |
3879 | overall reference count of the xVs being referred to. If newly-created xVs are | |
3880 | being pushed to the stack you must arrange for them to be destroyed at a | |
3881 | suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()> | |
3882 | to mortalise the xV. | |
3883 | ||
3884 | =head2 Mark Stack | |
3885 | ||
3886 | The value stack stores individual perl scalar values as temporaries between | |
3887 | expressions. Some perl expressions operate on entire lists; for that purpose | |
3888 | we need to know where on the stack each list begins. This is the purpose of the | |
3889 | mark stack. | |
3890 | ||
3891 | The mark stack stores integers as I32 values, which are the height of the | |
3892 | value stack at the time before the list began; thus the mark itself actually | |
3893 | points to the value stack entry one before the list. The list itself starts at | |
3894 | C<mark + 1>. | |
3895 | ||
3896 | The base of this stack is pointed to by the interpreter variable | |
3897 | C<PL_markstack>, of type C<I32 *>. | |
3898 | ||
6ef63541 KW |
3899 | =for apidoc_section $stack |
3900 | =for apidoc Amnh||PL_markstack | |
3901 | ||
e55ec392 PE |
3902 | The head of the stack is C<PL_markstack_ptr>, and points to the most |
3903 | recently-pushed item. | |
3904 | ||
6ef63541 KW |
3905 | =for apidoc Amnh||PL_markstack_ptr |
3906 | ||
e55ec392 PE |
3907 | Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though |
3908 | the stack itself stores (value) stack indices as integers, the C<PUSHMARK> | |
3909 | macro should be given a stack pointer directly; it will calculate the index | |
3910 | offset by comparing to the C<PL_stack_base> variable. Thus almost always the | |
3911 | code to perform this is | |
3912 | ||
3913 | PUSHMARK(SP); | |
3914 | ||
3915 | Items are popped from the stack by the C<POPMARK> macro. There is also a macro | |
3916 | C<TOPMARK> that inspects the topmost item without removing it. These macros | |
3917 | return I32 index values directly. There is also the C<dMARK> macro which | |
3918 | declares a new SV double-pointer variable, called C<mark>, which points at the | |
3919 | marked stack slot; this is the usual macro that C code will use when operating | |
3920 | on lists given on the stack. | |
3921 | ||
3922 | As noted above, the C<mark> variable itself will point at the most recently | |
3923 | pushed value on the value stack before the list begins, and so the list itself | |
3924 | starts at C<mark + 1>. The values of the list may be iterated by code such as | |
3925 | ||
3926 | for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) { | |
3927 | SV *item = *svp; | |
3928 | ... | |
3929 | } | |
3930 | ||
3931 | Note specifically that when the list is empty, C<mark> will | |
3932 | equal C<PL_stack_sp>. | |
3933 | ||
3934 | Because the C<mark> variable is converted to a pointer on the value stack, | |
3935 | extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are | |
3936 | invoked within the function, because the stack may need to be moved to | |
3937 | extend it and so the existing pointer will now be invalid. If this may be a | |
3938 | problem, a possible solution is to save the mark offset as an integer and | |
3939 | recompute the mark pointer later on, after the stack has been moved. | |
3940 | ||
3941 | I32 markoff = POPMARK; | |
3942 | ||
3943 | ... | |
3944 | ||
3945 | SV **mark = PL_stack_base + markoff; | |
3946 | ||
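Why an offset survives a stack move while a pointer does not can be sketched in plain C. This is a toy model under stated assumptions: it does not use the Perl headers, all C<toy_> names are hypothetical, and, unlike a real mark (which points one slot before the list), the offset here is simply the index of the first list item.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   *base;   /* start of the allocation        */
    size_t len;    /* number of items currently held */
    size_t cap;    /* allocated capacity             */
} toy_vstack;

/* Push a value, growing the array when it is full. Growing may move
 * the whole array, which invalidates any pointer into it.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_push(toy_vstack *s, int v)
{
    if (s->len == s->cap) {
        size_t ncap  = s->cap ? s->cap * 2 : 2;
        int   *nbase = realloc(s->base, ncap * sizeof *nbase);
        if (!nbase)
            return -1;
        s->base = nbase;
        s->cap  = ncap;
    }
    s->base[s->len++] = v;
    return 0;
}

/* Sum the items from a recorded offset to the top of the stack.
 * The pointer is rebuilt from the current base, so it is valid no
 * matter how many times the array has moved since the offset was
 * recorded. */
static int toy_sum_since(const toy_vstack *s, size_t markoff)
{
    const int *first = s->base + markoff;
    int sum = 0;

    for (const int *p = first; p < s->base + s->len; p++)
        sum += *p;
    return sum;
}
```

A pointer captured before the pushes would have to be treated as dangling after any growth; the integer offset stays meaningful because it is always reinterpreted relative to the current base.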
3947 | =head2 Temporaries Stack | |
3948 | ||
3949 | As noted above, xV references on the main value stack do not contribute to the | |
3950 | reference count of an xV, and so another mechanism is used to track when | |
3951 | temporary values which live on the stack must be released. This is the job of | |
3952 | the temporaries stack. | |
3953 | ||
3954 | The temporaries stack stores pointers to xVs whose reference counts will be | |
3955 | decremented soon. | |
3956 | ||
3957 | The base of this stack is pointed to by the interpreter variable | |
3958 | C<PL_tmps_stack>, of type C<SV **>. | |
3959 | ||
6ef63541 KW |
3960 | =for apidoc_section $stack |
3961 | =for apidoc Amnh||PL_tmps_stack | |
3962 | ||
e55ec392 PE |
3963 | The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the |
3964 | index in the array of the most recently-pushed item. | |
3965 | ||
6ef63541 KW |
3966 | =for apidoc Amnh||PL_tmps_ix |
3967 | ||
e55ec392 PE |
3968 | There is no public API to directly push items to the temporaries stack. Instead, |
3969 | the API function C<sv_2mortal()> is used to mortalize an xV, adding its | |
3970 | address to the temporaries stack. | |
3971 | ||
3972 | Likewise, there is no public API to read values from the temporaries stack. | |
b1b78d72 | 3973 | Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS> |
e55ec392 PE |
3974 | macro establishes the base levels of the temporaries stack, by capturing the |
3975 | current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous | |
3976 | value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of | |
3977 | the temporaries that have been pushed since that level are reclaimed. | |
3978 | ||
6ef63541 KW |
3979 | =for apidoc_section $stack |
3980 | =for apidoc Amnh||PL_tmps_floor | |
3981 | ||
e55ec392 PE |
3982 | While it is common to see these two macros in pairs within an C<ENTER>/ |
3983 | C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke | |
3984 | C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a | |
3985 | loop iterating over elements of a list. While you can invoke C<SAVETMPS> | |
3986 | multiple times within a scope pair, it is unlikely to be useful. Subsequent | |
3987 | invocations will move the temporaries floor further up, thus effectively | |
3988 | trapping the existing temporaries to only be released at the end of the scope. | |
3989 | ||
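The floor mechanism can be sketched in plain C. This is a toy model only: it does not use the Perl headers, all C<toy_> names are hypothetical, and the old floor is returned to the caller instead of being saved on a save stack as the real C<SAVETMPS> does.

```c
#include <assert.h>

#define TOY_TMPS_MAX 32

typedef struct { int refcnt; } toy_sv;

static toy_sv *toy_tmps[TOY_TMPS_MAX];
static int toy_tmps_ix    = -1;   /* index of the most recent entry */
static int toy_tmps_floor = -1;   /* entries above this may be freed */

/* Like sv_2mortal(): schedule one later reference-count decrement. */
static toy_sv *toy_mortal(toy_sv *sv)
{
    assert(toy_tmps_ix + 1 < TOY_TMPS_MAX);
    toy_tmps[++toy_tmps_ix] = sv;
    return sv;
}

/* Like SAVETMPS: raise the floor to the current top.
 * Returns the previous floor so the caller can restore it. */
static int toy_savetmps(void)
{
    int old = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix;
    return old;
}

/* Like FREETMPS: release everything pushed above the floor. */
static void toy_freetmps(void)
{
    while (toy_tmps_ix > toy_tmps_floor)
        toy_tmps[toy_tmps_ix--]->refcnt--;
}
```

In this model, calling C<toy_freetmps()> repeatedly against the same floor is harmless, whereas a second C<toy_savetmps()> would shelter everything pushed so far until the earlier floor is restored.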
3990 | =head2 Save Stack | |
3991 | ||
3992 | The save stack is used by perl to implement the C<local> keyword and other | |
3993 | similar behaviours; any cleanup operations that need to be performed when | |
3994 | leaving the current scope. Items pushed to this stack generally capture the | |
3995 | current value of some internal variable or state, which will be restored when | |
3996 | the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other | |
3997 | reasons. | |
3998 | ||
3999 | Whereas other perl internal stacks store individual items all of the same type | |
4000 | (usually SV pointers or integers), the items pushed to the save stack are | |
4001 | formed of many different types, having multiple fields to them. For example, | |
4002 | the C<SAVEt_INT> type needs to store both the address of the C<int> variable | |
4003 | to restore, and the value to restore it to. This information could have been | |
4004 | stored using fields of a C<struct>, but would have to be large enough to store | |
4005 | three pointers in the largest case, which would waste a lot of space in most | |
4006 | of the smaller cases. | |
4007 | ||
6ef63541 KW |
4008 | =for apidoc_section $stack |
4009 | =for apidoc Amnh||SAVEt_INT | |
4010 | ||
e55ec392 PE |
4011 | Instead, the stack stores information in a variable-length encoding of C<ANY> |
4012 | structures. The final value pushed is stored in the C<UV> field which encodes | |
5ab5717f | 4013 | the kind of item held by the preceding items; the count and types of which |
e55ec392 PE |
4014 | will depend on what kind of item is being stored. The kind field is pushed |
4015 | last because that will be the first field to be popped when unwinding items | |
4016 | from the stack. | |
4017 | ||
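The tag-last encoding can be sketched in plain C. This is a toy model under stated assumptions: it does not use the Perl headers, all C<toy_> and C<TOY_> names are hypothetical, and the two entry kinds (a three-slot "restore an int" entry and a two-slot "bump a counter" entry standing in for a cleanup action) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One slot of the stack; which member is valid depends on position. */
typedef union {
    void  *any_ptr;
    int    any_i;
    size_t any_uv;
} toy_any;

enum { TOY_SAVEt_INT = 1, TOY_SAVEt_BUMP = 2 };

#define TOY_SS_MAX 64
static toy_any toy_ss[TOY_SS_MAX];
static size_t  toy_ss_ix = 0;   /* index at which the next slot goes */

/* Three slots: address, old value, then the type tag last. */
static void toy_save_int(int *p)
{
    assert(toy_ss_ix + 3 <= TOY_SS_MAX);
    toy_ss[toy_ss_ix++].any_ptr = p;
    toy_ss[toy_ss_ix++].any_i   = *p;
    toy_ss[toy_ss_ix++].any_uv  = TOY_SAVEt_INT;
}

/* Two slots: address of a counter, then the type tag last. */
static void toy_save_bump(int *counter)
{
    assert(toy_ss_ix + 2 <= TOY_SS_MAX);
    toy_ss[toy_ss_ix++].any_ptr = counter;
    toy_ss[toy_ss_ix++].any_uv  = TOY_SAVEt_BUMP;
}

/* Unwind to a given height. The tag is popped first and tells us
 * how many further slots belong to the entry and what they mean. */
static void toy_leave_scope(size_t base)
{
    while (toy_ss_ix > base) {
        size_t type = toy_ss[--toy_ss_ix].any_uv;

        switch (type) {
        case TOY_SAVEt_INT: {
            int  val = toy_ss[--toy_ss_ix].any_i;
            int *p   = toy_ss[--toy_ss_ix].any_ptr;
            *p = val;               /* restore the saved value */
            break;
        }
        case TOY_SAVEt_BUMP: {
            int *p = toy_ss[--toy_ss_ix].any_ptr;
            (*p)++;                 /* stand-in for a cleanup action */
            break;
        }
        default:
            assert(!"unknown save type");
        }
    }
}
```

Because each entry occupies only as many slots as it needs, the small entry costs two slots rather than the three that a fixed-size record sized for the largest case would require.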
4018 | The base of this stack is pointed to by the interpreter variable | |
4019 | C<PL_savestack>, of type C<ANY *>. | |
4020 | ||
6ef63541 KW |
4021 | =for apidoc_section $stack |
4022 | =for apidoc Amnh||PL_savestack | |
4023 | ||
e55ec392 PE |
4024 | The head of the stack is indexed by C<PL_savestack_ix>, an integer which |
4025 | stores the index in the array at which the next item should be pushed. (Note | |
4026 | that this is different to most other stacks, which reference the most | |
4027 | recently-pushed item). | |
4028 | ||
6ef63541 KW |
4029 | =for apidoc_section $stack |
4030 | =for apidoc Amnh||PL_savestack_ix | |
4031 | ||
e55ec392 PE |
4032 | Items are pushed to the save stack by using the various C<SAVE...()> macros. |
4033 | Many of these macros take a variable and store both its address and current | |
4034 | value on the save stack, ensuring that value gets restored on scope exit. | |
4035 | ||
4036 | SAVEI8(i8) | |
4037 | SAVEI16(i16) | |
4038 | SAVEI32(i32) | |
4039 | SAVEINT(i) | |
4040 | ... | |
4041 | ||
4042 | There are also a variety of other special-purpose macros which save particular | |
4043 | types or values of interest. C<SAVETMPS> has already been mentioned above. | |
4044 | Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to | |
4045 | be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to | |
4046 | be invoked on scope exit. A full list of such macros can be found in | |
4047 | F<scope.h>. | |
4048 | ||
4049 | There is no public API for popping individual values or items from the save | |
4050 | stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way | |
4051 | to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will | |
4052 | restore all of the saved values that had been pushed since the most recent | |
4053 | C<ENTER>. | |
4054 | ||
4055 | =head2 Scope Stack | |
4056 | ||
4057 | As with the mark stack to the value stack, the scope stack forms a pair with | |
170a6378 | 4058 | the save stack. The scope stack stores the height of the save stack at which |
e55ec392 PE |
4059 | nested scopes begin, and allows the save stack to be unwound back to that |
4060 | point when the scope is left. | |
4061 | ||
4062 | When perl is built with debugging enabled, there is a second part to this | |
4063 | stack storing human-readable string names describing the type of stack | |
4064 | context. Each push operation saves the name as well as the height of the save | |
4065 | stack, and each pop operation checks the topmost name against what is expected, | |
4066 | causing an assertion failure if the name does not match. | |
4067 | ||
4068 | The base of this stack is pointed to by the interpreter variable | |
4069 | C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are | |
4070 | stored in a separate array pointed to by C<PL_scopestack_name>, of type | |
4071 | C<const char **>. | |
4072 | ||
6ef63541 KW |
4073 | =for apidoc_section $stack |
4074 | =for apidoc Amnh||PL_scopestack | |
4075 | =for apidoc Amnh||PL_scopestack_name | |
4076 | ||
e55ec392 PE |
4077 | The head of the stack is indexed by C<PL_scopestack_ix>, an integer which |
4078 | stores the index of the array or arrays at which the next item should be | |
4079 | pushed. (Note that this is different to most other stacks, which reference the | |
4080 | most recently-pushed item). | |
4081 | ||
6ef63541 KW |
4082 | =for apidoc_section $stack |
4083 | =for apidoc Amnh||PL_scopestack_ix | |
4084 | ||
e55ec392 PE |
4085 | Values are pushed to the scope stack using the C<ENTER> macro, which begins a |
4086 | new nested scope. Any items pushed to the save stack are then restored at the | |
4087 | next nested invocation of the C<LEAVE> macro. | |
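The pairing of the scope stack with the save stack can be sketched in plain C. This is a toy model only: it does not use the Perl headers, all C<toy_> names are hypothetical, and the save stack is simplified to fixed-size (address, old value) pairs for C<int> variables.

```c
#include <assert.h>

#define TOY_MAX 32

/* Simplified save stack: (address, old value) pairs. */
static struct { int *addr; int old; } toy_save[TOY_MAX];
static int toy_save_ix = 0;     /* index at which the next pair goes */

/* Scope stack: the save-stack height at which each scope began. */
static int toy_scope[TOY_MAX];
static int toy_scope_ix = 0;    /* index at which the next height goes */

/* Like ENTER: remember how tall the save stack is right now. */
static void toy_enter(void)
{
    assert(toy_scope_ix < TOY_MAX);
    toy_scope[toy_scope_ix++] = toy_save_ix;
}

/* Like SAVEINT: record the variable's address and current value. */
static void toy_saveint(int *p)
{
    assert(toy_save_ix < TOY_MAX);
    toy_save[toy_save_ix].addr = p;
    toy_save[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Like LEAVE: unwind the save stack to the remembered height,
 * restoring each saved variable in reverse order. */
static void toy_leave(void)
{
    int height;

    assert(toy_scope_ix > 0);
    height = toy_scope[--toy_scope_ix];
    while (toy_save_ix > height) {
        toy_save_ix--;
        *toy_save[toy_save_ix].addr = toy_save[toy_save_ix].old;
    }
}
```

Nesting falls out naturally: each C<toy_leave()> undoes only the saves made since the matching C<toy_enter()>, leaving those of the enclosing scope in place.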
bf5e9371 DM |
4088 | |
4089 | =head1 Dynamic Scope and the Context Stack | |
4090 | ||
4091 | B<Note:> this section describes a non-public internal API that is subject | |
4092 | to change without notice. | |
4093 | ||
4094 | =head2 Introduction to the context stack | |
4095 | ||
4096 | In Perl, dynamic scoping refers to the runtime nesting of things like | |
4097 | subroutine calls, evals etc, as well as the entering and exiting of block | |
4098 | scopes. For example, the restoring of a C<local>ised variable is | |
4099 | determined by the dynamic scope. | |
4100 | ||
4101 | Perl tracks the dynamic scope by a data structure called the context | |
4102 | stack, which is an array of C<PERL_CONTEXT> structures, each of which is | |
4103 | a big union covering all the types of context. Whenever a new scope is | |
4104 | entered (such as a block, a C<for> loop, or a subroutine call), a new | |
4105 | context entry is pushed onto the stack. Similarly when leaving a block or | |
4106 | returning from a subroutine call etc. a context is popped. Since the | |
4107 | context stack represents the current dynamic scope, it can be searched. | |
4108 | For example, C<next LABEL> searches back through the stack looking for a | |
4109 | loop context that matches the label; C<return> pops contexts until it | |
4110 | finds a sub or eval context or similar; C<caller> examines sub contexts on | |
4111 | the stack. | |
4112 | ||
6ef63541 KW |
4113 | =for apidoc_section $concurrency |
4114 | =for apidoc Cyh||PERL_CONTEXT | |
4115 | ||
bf5e9371 DM |
4116 | Each context entry is labelled with a context type, C<cx_type>. Typical |
4117 | context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK> | |
4118 | and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>) | |
4119 | and a sort block respectively. The type determines which parts of the context union are | |
4120 | valid. | |
4121 | ||
6ef63541 KW |
4122 | =for apidoc Cyh ||cx_type |
4123 | ||
4124 | =for apidoc Cmnh||CXt_BLOCK | |
4125 | =for apidoc_item ||CXt_EVAL | |
4126 | =for apidoc_item ||CXt_FORMAT | |
4127 | =for apidoc_item ||CXt_GIVEN | |
4128 | =for apidoc_item ||CXt_LOOP_ARY | |
4129 | =for apidoc_item ||CXt_LOOP_LAZYIV | |
4130 | =for apidoc_item ||CXt_LOOP_LAZYSV | |
4131 | =for apidoc_item ||CXt_LOOP_LIST | |
4132 | =for apidoc_item ||CXt_LOOP_PLAIN | |
4133 | =for apidoc_item ||CXt_NULL | |
4134 | =for apidoc_item ||CXt_SUB | |
4135 | =for apidoc_item ||CXt_SUBST | |
4136 | =for apidoc_item ||CXt_WHEN | |
4137 | ||
bf5e9371 DM |
4138 | The main division in the context struct is between a substitution scope |
4139 | (C<CXt_SUBST>) and block scopes, which are everything else. The former is | |
d7c7f8cb | 4140 | just used while executing C<s///e>, and won't be discussed further |
bf5e9371 DM |
4141 | here. |
4142 | ||
4143 | All the block scope types share a common base, which corresponds to | |
4144 | C<CXt_BLOCK>. This stores the old values of various scope-related | |
4145 | variables like C<PL_curpm>, as well as information about the current | |
4146 | scope, such as C<gimme>. On scope exit, the old variables are restored. | |
4147 | ||
4148 | Particular block scope types store extra per-type information. For | |
4149 | example, C<CXt_SUB> stores the currently executing CV, while the various | |
4150 | for loop types might hold the original loop variable SV. On scope exit, | |
4151 | the per-type data is processed; for example the CV has its reference count | |
4152 | decremented, and the original loop variable is restored. | |
4153 | ||
4154 | The macro C<cxstack> returns the base of the current context stack, while | |
4155 | C<cxstack_ix> is the index of the current frame within that stack. | |
4156 | ||
6ef63541 KW |
4157 | =for apidoc_section $concurrency |
4158 | =for apidoc Cmnh|PERL_CONTEXT *|cxstack | |
4159 | =for apidoc Cmnh|I32|cxstack_ix | |
4160 | ||
bf5e9371 DM |
4161 | In fact, the context stack is actually part of a stack-of-stacks system; |
4162 | whenever something unusual is done such as calling a C<DESTROY> or tie | |
4163 | handler, a new stack is pushed, then popped at the end. | |
4164 | ||
4165 | Note that the API described here changed considerably in perl 5.24; prior | |
4166 | to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24 | |
4167 | they were replaced by the inline static functions described below. In | |
4168 | addition, the ordering and detail of how these macros/functions work | |
4169 | changed in many ways, often subtly. In particular the old macros didn't handle | |
4170 | saving the savestack and temps stack positions, and required additional | |
4171 | C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The | |
4172 | old-style macros will not be described further. | |
4173 | ||
4174 | ||
4175 | =head2 Pushing contexts | |
4176 | ||
4177 | For pushing a new context, the two basic functions are | |
4178 | C<cx = cx_pushblock()>, which pushes a new basic context block and returns | |
4179 | its address, and a family of similar functions with names like | |
4180 | C<cx_pushsub(cx)> which populate the additional type-dependent fields in | |
4181 | the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their | |
4182 | own push functions, as they don't store any data beyond that pushed by | |
4183 | C<cx_pushblock>. | |
4184 | ||
4185 | The fields of the context struct and the arguments to the C<cx_*> | |
4186 | functions are subject to change between perl releases, representing | |
4187 | whatever is convenient or efficient for that release. | |
4188 | ||
4189 | A typical context stack pushing can be found in C<pp_entersub>; the | |
4190 | following shows a simplified and stripped-down example of a non-XS call, | |
4191 | along with comments showing roughly what each function does. | |
4192 | ||
61f554bd KW |
4193 | dMARK; |
4194 | U8 gimme = GIMME_V; | |
4195 | bool hasargs = cBOOL(PL_op->op_flags & OPf_STACKED); | |
4196 | OP *retop = PL_op->op_next; | |
4197 | I32 old_ss_ix = PL_savestack_ix; | |
4198 | CV *cv = ....; | |
4199 | ||
4200 | /* ... make mortal copies of stack args which are PADTMPs here ... */ | |
4201 | ||
4202 | /* ... do any additional savestack pushes here ... */ | |
4203 | ||
4204 | /* Now push a new context entry of type 'CXt_SUB'; initially just | |
4205 | * doing the actions common to all block types: */ | |
4206 | ||
4207 | cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix); | |
4208 | ||
4209 | /* this does (approximately): | |
4210 | CXINC; /* cxstack_ix++ (grow if necessary) */ | |
4211 | cx = CX_CUR(); /* and get the address of new frame */ | |
4212 | cx->cx_type = CXt_SUB; | |
4213 | cx->blk_gimme = gimme; | |
4214 | cx->blk_oldsp = MARK - PL_stack_base; | |
4215 | cx->blk_oldsaveix = old_ss_ix; | |
4216 | cx->blk_oldcop = PL_curcop; | |
4217 | cx->blk_oldmarksp = PL_markstack_ptr - PL_markstack; | |
4218 | cx->blk_oldscopesp = PL_scopestack_ix; | |
4219 | cx->blk_oldpm = PL_curpm; | |
4220 | cx->blk_old_tmpsfloor = PL_tmps_floor; | |
4221 | ||
4222 | PL_tmps_floor = PL_tmps_ix; | |
4223 | */ | |
bf5e9371 DM |
4224 | |
4225 | ||
61f554bd KW |
4226 | /* then update the new context frame with subroutine-specific info, |
4227 | * such as the CV about to be executed: */ | |
bf5e9371 | 4228 | |
61f554bd | 4229 | cx_pushsub(cx, cv, retop, hasargs); |
bf5e9371 | 4230 | |
61f554bd KW |
4231 | /* this does (approximately): |
4232 | cx->blk_sub.cv = cv; | |
4233 | cx->blk_sub.olddepth = CvDEPTH(cv); | |
4234 | cx->blk_sub.prevcomppad = PL_comppad; | |
4235 | cx->cx_type |= (hasargs) ? CXp_HASARGS : 0; | |
4236 | cx->blk_sub.retop = retop; | |
4237 | SvREFCNT_inc_simple_void_NN(cv); | |
4238 | */ | |
bf5e9371 | 4239 | |
6ef63541 KW |
4240 | =for apidoc_section $concurrency |
4241 | =for apidoc Cmnh||CXINC | |
4242 | ||
bf5e9371 DM |
4243 | Note that C<cx_pushblock()> sets two new floors: for the args stack (to |
4244 | C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this | |
4245 | scope level, every C<nextstate> (amongst others) will reset the args and | |
4246 | tmps stack levels to these floors. Note that since C<cx_pushblock> uses | |
4247 | the current value of C<PL_tmps_ix> rather than it being passed as an arg, | |
4248 | this dictates at what point C<cx_pushblock> should be called. In | |
4249 | particular, any new mortals which should be freed only on scope exit | |
4250 | (rather than at the next C<nextstate>) should be created first. | |
4251 | ||
4252 | Most callers of C<cx_pushblock> simply set the new args stack floor to the | |
4253 | top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the | |
4254 | items being iterated over on the stack, and so sets C<blk_oldsp> to the | |
4255 | top of these items instead. Note that, contrary to its name, C<blk_oldsp> | |
4256 | doesn't always represent the value to restore C<PL_stack_sp> to on scope | |
4257 | exit. | |
4258 | ||
4259 | Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is | |
4260 | later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>, | |
4261 | this is because, although most values needing saving are stored in fields | |
4262 | of the context struct, an extra value needs saving only when the debugger | |
4263 | is running, and it doesn't make sense to bloat the struct for this rare | |
4264 | case. So instead it is saved on the savestack. Since this value gets | |
4265 | calculated and saved before the context is pushed, it is necessary to pass | |
4266 | the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the | |
4267 | saved value gets freed during scope exit. For most users of | |
4268 | C<cx_pushblock>, where nothing needs pushing on the save stack, | |
4269 | C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>. | |
4270 | ||
4271 | Note that where possible, values should be saved in the context struct | |
4272 | rather than on the save stack; it's much faster that way. | |
4273 | ||
4274 | Normally C<cx_pushblock> should be immediately followed by the appropriate | |
4275 | C<cx_pushfoo>, with nothing between them; this is because if code | |
4276 | in-between could die (e.g. a warning upgraded to fatal), then the context | |
4277 | stack unwinding code in C<dounwind> would see (in the example above) a | |
4278 | C<CXt_SUB> context frame, but without all the subroutine-specific fields | |
4279 | set, and crashes would soon ensue. | |
4280 | ||
82f65d69 KW |
4281 | =for apidoc dounwind |
4282 | ||
bf5e9371 DM |
4283 | Where the two must be separate, initially set the type to C<CXt_NULL> or |
4284 | C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the | |
4285 | C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's | |
4286 | determined which type of loop it's pushing. | |
4287 | ||
4288 | =head2 Popping contexts | |
4289 | ||
4290 | Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note | |
4291 | however, that unlike C<cx_pushblock>, neither of these functions actually | |
4292 | decrements the current context stack index; this is done separately using | |
4293 | C<CX_POP()>. | |
4294 | ||
6ef63541 KW |
4295 | =for apidoc_section $concurrency |
4296 | =for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx | |
4297 | ||
bf5e9371 DM |
4298 | There are two main ways that contexts are popped. During normal execution |
4299 | as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and | |
4300 | C<pp_leavesub> process and pop just one context using C<cx_popfoo> and | |
4301 | C<cx_popblock>. On the other hand, things like C<pp_return> and C<next> | |
4302 | may have to pop back several scopes until a sub or loop context is found, | |
4303 | and exceptions (such as C<die>) need to pop back contexts until an eval | |
4304 | context is found. Both of these are accomplished by C<dounwind()>, which | |
4305 | is capable of processing and popping all contexts above the target one. | |
4306 | ||
4307 | Here is a typical example of context popping, as found in C<pp_leavesub> | |
4308 | (simplified slightly): | |
4309 | ||
61f554bd KW |
4310 | U8 gimme; |
4311 | PERL_CONTEXT *cx; | |
4312 | SV **oldsp; | |
4313 | OP *retop; | |
bf5e9371 | 4314 | |
61f554bd | 4315 | cx = CX_CUR(); |
bf5e9371 | 4316 | |
61f554bd KW |
4317 | gimme = cx->blk_gimme; |
4318 | oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */ | |
bf5e9371 | 4319 | |
61f554bd KW |
4320 | if (gimme == G_VOID) |
4321 | PL_stack_sp = oldsp; | |
4322 | else | |
4323 | leave_adjust_stacks(oldsp, oldsp, gimme, 0); | |
bf5e9371 | 4324 | |
61f554bd KW |
4325 | CX_LEAVE_SCOPE(cx); |
4326 | cx_popsub(cx); | |
4327 | cx_popblock(cx); | |
4328 | retop = cx->blk_sub.retop; | |
4329 | CX_POP(cx); | |
bf5e9371 | 4330 | |
61f554bd | 4331 | return retop; |
bf5e9371 | 4332 | |
6ef63541 KW |
4333 | =for apidoc_section $concurrency |
4334 | =for apidoc Cmh||CX_CUR | |
4335 | ||
bf5e9371 DM |
4336 | The steps above are in a very specific order, designed to be the reverse |
4337 | order of when the context was pushed. The first thing to do is to copy | |
a3815e44 | 4338 | and/or protect any return arguments and free any temps in the current |
bf5e9371 DM |
4339 | scope. Scope exits like an rvalue sub normally return a mortal copy of |
4340 | their return args (as opposed to lvalue subs). It is important to make | |
4341 | this copy before the save stack is popped or variables are restored, or | |
4342 | bad things like the following can happen: | |
4343 | ||
4344 | sub f { my $x =...; $x } # $x freed before we get to copy it | |
4345 | sub f { /(...)/; $1 } # PL_curpm restored before $1 copied | |
4346 | ||
4347 | Although we wish to free any temps at the same time, we have to be careful | |
4348 | not to free any temps which are keeping return args alive; nor to free the | |
4349 | temps we have just created while mortal copying return args. Fortunately, | |
4350 | C<leave_adjust_stacks()> is capable of making mortal copies of return args, | |
4351 | shifting args down the stack, and only processing those entries on the | |
4352 | temps stack that are safe to do so. | |
4353 | ||
4354 | In void context no args are returned, so it's more efficient to skip | |
4355 | calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op | |
4356 | is likely to be imminently called which will do a C<FREETMPS>, so there's | |
4357 | no need to do that either. | |
4358 | ||
4359 | The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just | |
4b93f2ab | 4360 | defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the |
bf5e9371 DM |
4361 | popping, it's possible for perl to call destructors, call C<STORE> to undo |
4362 | localisations of tied vars, and so on. Any of these can die or call | |
4363 | C<exit()>. In this case, C<dounwind()> will be called, and the current | |
4364 | context stack frame will be re-processed. Thus it is vital that all steps | |
4365 | in popping a context are done in such a way as to support reentrancy. The | |
4366 | other alternative, of decrementing C<cxstack_ix> I<before> processing the | |
4367 | frame, would lead to leaks and the like if something died halfway through, | |
4368 | or overwriting of the current frame. | |
4369 | ||
6ef63541 KW |
4370 | =for apidoc_section $concurrency |
4371 | =for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx | |
4372 | ||
bf5e9371 DM |
4373 | C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack |
4374 | items have been popped before dying and getting trapped by eval, then the | |
4375 | C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where | |
4376 | the first one left off. | |
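The "continue where the first one left off" behaviour can be illustrated with a toy model. This is only a sketch: C<toy_savestack>, C<toy_leave_scope> and C<toy_leave_with_retry> are hypothetical names, a negative entry stands in for a restore action that dies, and the sketch assumes each entry is unlinked from the stack before it is acted on. It is not perl's actual savestack code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical stand-in for the savestack; not perl's real structure. */
typedef struct toy_savestack {
    int      entries[TOY_MAX]; /* a negative entry "dies" while being restored */
    int      ix;               /* number of entries still to be popped */
    int      restored;         /* how many entries have been processed */
    jmp_buf *top_env;          /* where a simulated die jumps to */
} toy_savestack;

/* Pop entries down to 'base'.  The index is decremented before the entry
 * is acted on, so a die in the middle leaves the stack in a state from
 * which a second call simply carries on with the remaining entries. */
static void toy_leave_scope(toy_savestack *ss, int base)
{
    while (ss->ix > base) {
        int entry = ss->entries[--ss->ix];
        ss->restored++;
        if (entry < 0)
            longjmp(*ss->top_env, 1);   /* simulated die */
    }
}

/* Plays the role of the unwinder: every time a die is caught, the same
 * scope is left again.  Returns the number of dies that were caught. */
static int toy_leave_with_retry(toy_savestack *ss, int base)
{
    jmp_buf env;
    volatile int deaths = 0;    /* volatile: modified after a longjmp */

    ss->top_env = &env;
    if (setjmp(env) != 0)
        deaths++;
    toy_leave_scope(ss, base);
    return deaths;
}
```

With five entries of which two die, C<toy_leave_with_retry> catches two dies, yet every entry is processed exactly once, because no call ever revisits an index that an earlier call already consumed.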
4377 | ||
4378 | The next step is the type-specific context processing; in this case | |
4379 | C<cx_popsub>. In part, this looks like: | |
4380 | ||
4381 | cv = cx->blk_sub.cv; | |
4382 | CvDEPTH(cv) = cx->blk_sub.olddepth; | |
4383 | cx->blk_sub.cv = NULL; | |
4384 | SvREFCNT_dec(cv); | |
4385 | ||
4386 | where it's processing the just-executed CV. Note that before it decrements | |
4387 | the CV's reference count, it nulls the C<blk_sub.cv>. This means that if | |
4388 | it re-enters, the CV won't be freed twice. It also means that you can't | |
4389 | rely on such type-specific fields having useful values after the return | |
4390 | from C<cx_popfoo>. | |
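The value of clearing the field before dropping the reference can be seen in a small stand-alone model. Everything here is hypothetical: C<toy_cv> and C<toy_frame> are not perl's C<CV> or C<PERL_CONTEXT>, the C<on_free> hook merely simulates a destructor that causes the frame to be processed again, and the early return on a null pointer is an assumption of the sketch rather than a claim about C<cx_popsub>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a CV and a sub context frame. */
typedef struct toy_cv {
    int refcnt;
    int depth;
} toy_cv;

typedef struct toy_frame toy_frame;
struct toy_frame {
    toy_cv *cv;
    int     olddepth;
    void  (*on_free)(toy_frame *);  /* simulated destructor side effect */
};

static void toy_popsub(toy_frame *fr);

/* A destructor that ends up processing the same frame a second time. */
static void toy_reenter(toy_frame *fr)
{
    toy_popsub(fr);
}

static void toy_popsub(toy_frame *fr)
{
    toy_cv *cv = fr->cv;

    if (cv == NULL)             /* frame already processed: nothing to do */
        return;
    cv->depth = fr->olddepth;
    fr->cv = NULL;              /* clear the field first ... */
    cv->refcnt--;               /* ... then drop the reference */
    if (cv->refcnt == 0 && fr->on_free)
        fr->on_free(fr);        /* may re-enter toy_popsub() */
}
```

If the two marked statements were swapped, the re-entrant call would still see a non-null C<cv> and decrement the count a second time, driving it negative; that is the toy analogue of freeing the CV twice.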
4391 | ||
4392 | Next, C<cx_popblock> restores all the various interpreter vars to their | |
4393 | previous values or previous high water marks; it expands to: | |
4394 | ||
4395 | PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp; | |
4396 | PL_scopestack_ix = cx->blk_oldscopesp; | |
4397 | PL_curpm = cx->blk_oldpm; | |
4398 | PL_curcop = cx->blk_oldcop; | |
4399 | PL_tmps_floor = cx->blk_old_tmpsfloor; | |
4400 | ||
4401 | Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier, | |
4402 | which value to restore it to depends on the context type (specifically | |
4403 | C<for (list) {}>), and what args (if any) it returns; and that will | |
4404 | already have been sorted out earlier by C<leave_adjust_stacks()>. | |
4405 | ||
4406 | Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>. | |
4407 | After this point, it's possible that the current context frame could | |
4408 | be overwritten by other contexts being pushed. Although things like ties | |
4409 | and C<DESTROY> are supposed to work within a new context stack, it's best | |
4410 | not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately | |
4411 | sets C<cx> to null to detect code that is still relying on the field | |
4412 | values in that context frame. Note in the C<pp_leavesub()> example above, | |
4413 | we grab C<blk_sub.retop> I<before> calling C<CX_POP>. | |
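The ordering constraint can be shown in miniature. In this sketch C<toy_cx>, C<toy_cxstack>, C<TOY_CX_POP> and C<toy_leave> are hypothetical names, and an C<int> stands in for the real C<OP *> held in C<retop>; the macro imitates the debugging-build behaviour described above by nulling the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEPTH 4

/* Hypothetical stand-ins for a context frame and the context stack. */
typedef struct toy_cx {
    int retop;                  /* stands in for blk_sub.retop */
} toy_cx;

typedef struct toy_cxstack {
    toy_cx frames[TOY_DEPTH];
    int    ix;                  /* index of the current frame */
} toy_cxstack;

/* Drop the top frame and null the caller's pointer, imitating what the
 * text says CX_POP does on debugging builds. */
#define TOY_CX_POP(stk, cx) do { (stk)->ix--; (cx) = NULL; } while (0)

static int toy_leave(toy_cxstack *stk)
{
    toy_cx *cx = &stk->frames[stk->ix];
    int retop = cx->retop;      /* read the field while the frame is ours */

    TOY_CX_POP(stk, cx);
    assert(cx == NULL);         /* a later cx->retop would now fault */
    return retop;
}
```

Reading C<retop> after the pop would dereference a null pointer in this model, which is exactly the kind of mistake the debugging-build behaviour is meant to expose.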
4414 | ||
4415 | =head2 Redoing contexts | |
4416 | ||
4417 | Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate> | |
4418 | as regards resetting various vars to their base values. It is used in | |
4419 | places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than | |
4420 | exiting a scope, we want to re-initialise the scope. As well as resetting | |
4421 | C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>, | |
4422 | C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a | |
4423 | C<FREETMPS>. | |
4424 | ||
4425 | ||
091ff1b6 AC |
4426 | =head1 Slab-based operator allocation |
4427 | ||
4428 | B<Note:> this section describes a non-public internal API that is subject | |
4429 | to change without notice. | |
4430 | ||
4431 | Perl's internal error-handling mechanisms implement C<die> (and its internal | |
4432 | equivalents) using longjmp. If this occurs during lexing, parsing or | |
4433 | compilation, we must ensure that any ops allocated as part of the compilation | |
4434 | process are freed. (Older Perl versions did not adequately handle this | |
4435 | situation: when failing a parse, they would leak ops that were stored in | |
4436 | C C<auto> variables and not linked anywhere else.) | |
4437 | ||
4438 | To handle this situation, Perl uses I<op slabs> that are attached to the | |
4439 | currently-compiling CV. A slab is a chunk of allocated memory. New ops are | |
4440 | allocated as regions of the slab. If the slab fills up, a new one is created | |
4441 | (and linked from the previous one). When an error occurs and the CV is freed, | |
4442 | any ops remaining are freed. | |
4443 | ||
4444 | Each op is preceded by two pointers: one points to the next op in the slab, and | |
4445 | the other points to the slab that owns it. The next-op pointer is needed so | |
4446 | that Perl can iterate over a slab and free all its ops. (Op structures are of | |
4447 | different sizes, so the slab's ops can't merely be treated as a dense array.) | |
4448 | The slab pointer is needed for accessing a reference count on the slab: when | |
4449 | the last op on a slab is freed, the slab itself is freed. | |
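A minimal sketch of this layout follows. The names C<toy_slot>, C<toy_slab> and the C<toy_*> functions are hypothetical, as is the choice to measure the slab in header-sized cells; perl's real slab structures differ, and a real allocator would chain a new slab instead of returning null when the current one is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_slab;

/* Header placed in front of every op: the two pointers from the text. */
typedef struct toy_slot {
    struct toy_slot *next;      /* next op slot in this slab, or NULL */
    struct toy_slab *slab;      /* slab that owns this op */
} toy_slot;

typedef struct toy_slab {
    size_t    refcnt;           /* live ops allocated from this slab */
    size_t    ncells;           /* capacity, in header-sized cells */
    size_t    used;             /* cells handed out so far */
    toy_slot *last;             /* head of the slot chain */
    toy_slot  cells[];          /* headers and op bodies live here */
} toy_slab;

static toy_slab *toy_slab_new(size_t ncells)
{
    toy_slab *slab = malloc(sizeof *slab + ncells * sizeof(toy_slot));
    if (slab == NULL)
        return NULL;
    slab->refcnt = 0;
    slab->ncells = ncells;
    slab->used = 0;
    slab->last = NULL;
    return slab;
}

/* Carve out one header cell plus enough cells for a body of body_size
 * bytes.  Returns a pointer to the body, or NULL if the slab is full. */
static void *toy_op_alloc(toy_slab *slab, size_t body_size)
{
    size_t body_cells = (body_size + sizeof(toy_slot) - 1) / sizeof(toy_slot);
    size_t need = 1 + body_cells;
    toy_slot *slot;

    if (need > slab->ncells - slab->used)
        return NULL;
    slot = &slab->cells[slab->used];
    slab->used += need;
    slot->next = slab->last;
    slot->slab = slab;
    slab->last = slot;
    slab->refcnt++;
    return slot + 1;            /* the body starts right after the header */
}

/* Step back over the header to find the owning slab. */
static toy_slab *toy_op_slab(void *op)
{
    return ((toy_slot *)op - 1)->slab;
}

/* Walk the chain; this works even though op bodies differ in size.
 * It counts every slot ever handed out, freed or not. */
static size_t toy_slab_count_ops(const toy_slab *slab)
{
    size_t n = 0;
    const toy_slot *s;
    for (s = slab->last; s != NULL; s = s->next)
        n++;
    return n;
}

/* Drop one op.  Returns 1 if that released the slab itself. */
static int toy_op_free(void *op)
{
    toy_slab *slab = toy_op_slab(op);
    assert(slab->refcnt > 0);
    if (--slab->refcnt == 0) {
        free(slab);
        return 1;
    }
    return 0;
}
```

The chain pointer is what makes iteration possible over variably sized ops, and the slab pointer is what lets C<toy_op_free> reach the reference count from nothing but the op's own address.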
4450 | ||
4451 | The slab allocator puts the ops at the end of the slab first. This will tend to | |
4452 | allocate the leaves of the op tree first, and the layout will therefore | |
4453 | hopefully be cache-friendly. In addition, this means that there's no need to | |
4454 | store the size of the slab (see below on why slabs vary in size), because Perl | |
4455 | can follow pointers to find the last op. | |
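Allocating from the end first amounts to a bump-down allocator. The following sketch uses hypothetical names (C<toy_arena>, C<toy_alloc_from_end>, C<TOY_ARENA_FULL>) and works in abstract cell indices rather than real memory; it only illustrates the direction of allocation, not perl's bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-down arena over a run of numbered cells. */
typedef struct toy_arena {
    size_t low;     /* lowest cell handed out so far; starts at the capacity */
} toy_arena;

#define TOY_ARENA_FULL ((size_t)-1)

/* Reserve 'want' cells just below everything allocated so far.
 * Returns the first cell index of the new region, or TOY_ARENA_FULL. */
static size_t toy_alloc_from_end(toy_arena *arena, size_t want)
{
    if (want > arena->low)
        return TOY_ARENA_FULL;
    arena->low -= want;
    return arena->low;
}
```

Earlier allocations land at higher indices than later ones, so ops built first (typically leaves) sit at the end of the slab and their parents just below them.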
4456 | ||
eab86acb | 4457 | It might seem possible to eliminate slab reference counts altogether, by having |
091ff1b6 AC |
4458 | all ops implicitly attached to C<PL_compcv> when allocated and freed when the |
4459 | CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and | |
4460 | thus free ops faster. But that doesn't work in those cases where ops need to | |
4461 | survive beyond their CVs, such as re-evals. | |
4462 | ||
4463 | The CV also has to have a reference count on the slab. Sometimes the first op | |
4464 | created is immediately freed. Without the CV's own reference, the slab's | |
4465 | count would then drop to 0 and the slab would be freed with the CV still pointing to it. | |
4466 | ||
4467 | CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count | |
4468 | on the slab. When this flag is set, the slab is accessible via C<CvSTART> when | |
4469 | C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from | |
4470 | C<CvROOT> when it is set. The alternative to this approach of sneaking the slab | |
4471 | into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by | |
4472 | another pointer. But that would make all CVs larger, even though slab-based op | |
4473 | freeing is typically of benefit only for programs that make significant use of | |
4474 | string eval. | |
4475 | ||
6ef63541 KW |
4476 | =for apidoc_section $concurrency |
4477 | =for apidoc Cmnh| |CVf_SLABBED | |
4478 | =for apidoc_item |OP *|CvROOT|CV * sv | |
4479 | =for apidoc_item |OP *|CvSTART|CV * sv | |
4480 | ||
091ff1b6 AC |
4481 | When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing |
4482 | the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is | |
4483 | assumed that a compilation error has occurred, so the op slab is traversed and | |
4484 | all the ops are freed. | |
4485 | ||
4486 | Under normal circumstances, the CV forgets about its slab (decrementing the | |
4487 | reference count) when the root is attached. So the slab reference counting that | |
4488 | happens when ops are freed takes care of freeing the slab. In some cases, the | |
4489 | CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the | |
4490 | ops can survive after the CV is done away with. | |
4491 | ||
4492 | Forgetting the slab when the root is attached is not strictly necessary, but | |
4493 | avoids potential problems with C<CvROOT> being written over. There is code all | |
4494 | over the place, both in core and on CPAN, that does things with C<CvROOT>, so | |
4495 | forgetting the slab makes things more robust. | |
4496 | ||
4497 | Since the CV takes ownership of its slab when flagged, that flag is never | |
4498 | copied when a CV is cloned, as one CV could free a slab that another CV still | |
4499 | points to, since forced freeing of ops ignores the reference count (but asserts | |
4500 | that it looks right). | |
4501 | ||
4502 | To avoid slab fragmentation, freed ops are marked as freed and attached to the | |
4503 | slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused | |
4504 | when possible. Not reusing freed ops would be simpler, but it would result in | |
4505 | significantly higher memory usage for programs with large C<if (DEBUG) {...}> | |
4506 | blocks. | |
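A freed chain of this kind is essentially an intrusive free list. The sketch below is an assumption-laden illustration: C<toy_freed>, C<toy_chain_push> and C<toy_chain_take> are hypothetical names, and the first-fit search by size is one plausible reuse policy, not necessarily the one perl applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record written into a freed op's own storage. */
typedef struct toy_freed {
    struct toy_freed *next;     /* next entry in the freed chain */
    size_t            size;     /* usable size of this freed region */
} toy_freed;

/* Mark a region as freed by pushing it onto the chain. */
static void toy_chain_push(toy_freed **chain, toy_freed *region, size_t size)
{
    region->size = size;
    region->next = *chain;
    *chain = region;
}

/* Unlink and return the first region of at least 'want' bytes, or NULL
 * if nothing on the chain is large enough. */
static toy_freed *toy_chain_take(toy_freed **chain, size_t want)
{
    toy_freed **link;

    for (link = chain; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= want) {
            toy_freed *hit = *link;
            *link = hit->next;
            hit->next = NULL;
            return hit;
        }
    }
    return NULL;
}
```

An allocator built on this would call C<toy_chain_take> first and fall back to carving fresh space from the slab only when it returns null.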
4507 | ||
4508 | C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause | |
4509 | an op to be freed after its CV. If the CV has forcibly freed the ops on its | |
4510 | slab and the slab itself, then we will be fiddling with a freed slab. Making | |
4511 | C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when | |
4512 | there is no compilation error, so the op would never be freed. Such an op holds | |
4513 | a reference count on the slab, so the whole slab would leak. Instead, C<SAVEFREEOP> | |
4514 | now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of | |
4515 | ops after a compilation error won't free any ops thus marked. | |
4516 | ||
4517 | Since many pieces of code create tiny subroutines consisting of only a few ops, | |
4518 | and since a huge slab would be quite a bit of baggage for those to carry | |
4519 | around, the first slab is always very small. To avoid allocating too many | |
4520 | slabs for a single CV, each subsequent slab is twice the size of the previous. | |
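The effect of doubling can be checked with a little arithmetic. In this sketch C<toy_slabs_needed> is a hypothetical helper, sizes are in arbitrary units, and any upper bound that perl may place on slab size is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* How many slabs are needed to hold 'total' units of ops when the first
 * slab holds 'first_slab' units and each later slab is twice the size of
 * the one before it? */
static size_t toy_slabs_needed(size_t first_slab, size_t total)
{
    size_t slabs = 0;
    size_t capacity = 0;
    size_t next = first_slab;

    assert(first_slab > 0);
    while (capacity < total) {
        capacity += next;
        next *= 2;
        slabs++;
    }
    return slabs;
}
```

With a first slab of 8 units the cumulative capacities are 8, 24, 56 and 120, so a tiny sub pays for only one small slab while a CV fifteen times larger still needs just four.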
4521 | ||
4522 | Smartmatch expects to be able to allocate an op at run time, run it, and then | |
0985f7e5 | 4523 | throw it away. For that to work, the op is simply malloced when C<PL_compcv> hasn't |
091ff1b6 AC |
4524 | been set up. So all slab-allocated ops are marked as such (C<< ->op_slabbed >>), |
4525 | to distinguish them from malloced ops. | |
4526 | ||
4527 | ||
954c1994 | 4528 | =head1 AUTHORS |
e89caa19 | 4529 | |
954c1994 | 4530 | Until May 1997, this document was maintained by Jeff Okamoto |
9b5bb84f SB |
4531 | E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl |
4532 | itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>. | |
cb1a09d0 | 4533 | |
954c1994 GS |
4534 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
4535 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil | |
4536 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, | |
4537 | Stephen McCamant, and Gurusamy Sarathy. | |
cb1a09d0 | 4538 | |
954c1994 | 4539 | =head1 SEE ALSO |
cb1a09d0 | 4540 | |
ba555bf5 | 4541 | L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed> |