| 1 | =for comment |
| 2 | The part of this file between =for mg_vtable.pl markers is auto |
| 3 | generated by mg_vtable.pl; any changes there need to be made instead to |
| 4 | mg_vtable.pl |
| 5 | |
| 6 | =head1 NAME |
| 7 | |
| 8 | perlguts - Introduction to the Perl API |
| 9 | |
| 10 | =head1 DESCRIPTION |
| 11 | |
| 12 | This document attempts to describe how to use the Perl API, as well as |
| 13 | to provide some info on the basic workings of the Perl core. It is far |
| 14 | from complete and probably contains many errors. Please refer any |
| 15 | questions or comments to the author below. |
| 16 | |
| 17 | =head1 Variables |
| 18 | |
| 19 | =head2 Datatypes |
| 20 | |
| 21 | Perl has three typedefs that handle Perl's three main data types: |
| 22 | |
| 23 | SV Scalar Value |
| 24 | AV Array Value |
| 25 | HV Hash Value |
| 26 | |
| 27 | Each typedef has specific routines that manipulate the various data types. |
| 28 | |
| 29 | =for apidoc_section $AV |
| 30 | =for apidoc Ayh||AV |
| 31 | =for apidoc_section $HV |
| 32 | =for apidoc Ayh||HV |
| 33 | =for apidoc_section $SV |
| 34 | =for apidoc Ayh||SV |
| 35 | |
| 36 | =head2 What is an "IV"? |
| 37 | |
| 38 | Perl uses a special typedef IV which is a simple signed integer type that is |
| 39 | guaranteed to be large enough to hold a pointer (as well as an integer). |
| 40 | Additionally, there is the UV, which is simply an unsigned IV. |
| 41 | |
| 42 | Perl also uses several special typedefs to declare variables to hold |
| 43 | integers of (at least) a given size. |
| 44 | Use I8, I16, I32, and I64 to declare a signed integer variable which has |
| 45 | at least as many bits as the number in its name. These all evaluate to |
| 46 | the native C type that is closest to the given number of bits, but no |
| 47 | smaller than that number. For example, on many platforms, a C<short> is |
| 48 | 16 bits long, and if so, I16 will evaluate to a C<short>. But on |
| 49 | platforms where a C<short> isn't exactly 16 bits, Perl will use the |
| 50 | smallest type that contains 16 bits or more. |
| 51 | |
| 52 | U8, U16, U32, and U64 are to declare the corresponding unsigned integer |
| 53 | types. |
| 54 | |
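The "at least N bits" promise is similar in spirit to the C99 C<int_leastN_t> family. The following standalone sketch does not include Perl's headers and does not claim that Perl's typedefs are defined this way; the helper names C<bits_in> and C<least_width_types_ok> are hypothetical. It only checks the same kind of guarantee on the C99 types:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bits in an object of nbytes bytes. */
static size_t bits_in(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}

/* Returns 1 when every C99 least-width type is at least as wide as its
 * name says. This is the same kind of promise that is documented for
 * I8..I32 and U8..U32, but these are C99 types, not Perl's typedefs. */
static int least_width_types_ok(void)
{
    assert(CHAR_BIT >= 8);
    return bits_in(sizeof(int_least8_t))   >= 8
        && bits_in(sizeof(int_least16_t))  >= 16
        && bits_in(sizeof(int_least32_t))  >= 32
        && bits_in(sizeof(uint_least8_t))  >= 8
        && bits_in(sizeof(uint_least16_t)) >= 16
        && bits_in(sizeof(uint_least32_t)) >= 32;
}
```

A type may be wider than its name suggests, so code should not assume that arithmetic wraps at exactly the named width.
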
| 55 | If the platform doesn't support 64-bit integers, both I64 and U64 will |
| 56 | be undefined. Use IV and UV to declare the largest practicable integer |
| 57 | types, and C<L<perlapi/WIDEST_UTYPE>> for the widest unsigned type, which |
| 58 | may not be usable in all circumstances. |
| 59 | |
| 60 | A numeric constant can be specified with L<perlapi/C<INT16_C>>, |
| 61 | L<perlapi/C<UINTMAX_C>>, and similar. |
| 62 | |
| 63 | =for apidoc_section $integer |
| 64 | =for apidoc Ayh ||IV |
| 65 | =for apidoc_item ||I8 |
| 66 | =for apidoc_item ||I16 |
| 67 | =for apidoc_item ||I32 |
| 68 | =for apidoc_item ||I64 |
| 69 | |
| 70 | =for apidoc Ayh ||UV |
| 71 | =for apidoc_item ||U8 |
| 72 | =for apidoc_item ||U16 |
| 73 | =for apidoc_item ||U32 |
| 74 | =for apidoc_item ||U64 |
| 75 | |
| 76 | =head2 Working with SVs |
| 77 | |
| 78 | An SV can be created and loaded with one command. There are five types of |
| 79 | values that can be loaded: an integer value (IV), an unsigned integer |
| 80 | value (UV), a double (NV), a string (PV), and another scalar (SV). |
| 81 | ("PV" stands for "Pointer Value". You might think that it is misnamed |
| 82 | because it is described as pointing only to strings. However, it is |
| 83 | possible to have it point to other things. For example, it could point |
| 84 | to an array of UVs. But, |
| 85 | using it for non-strings requires care, as the underlying assumption of |
| 86 | much of the internals is that PVs are just for strings. Often, for |
| 87 | example, a trailing C<NUL> is tacked on automatically. The non-string use |
| 88 | is documented only in this paragraph.) |
| 89 | |
| 90 | =for apidoc_section $floating |
| 91 | =for apidoc Ayh||NV |
| 92 | |
| 93 | The seven routines are: |
| 94 | |
| 95 | SV* newSViv(IV); |
| 96 | SV* newSVuv(UV); |
| 97 | SV* newSVnv(double); |
| 98 | SV* newSVpv(const char*, STRLEN); |
| 99 | SV* newSVpvn(const char*, STRLEN); |
| 100 | SV* newSVpvf(const char*, ...); |
| 101 | SV* newSVsv(SV*); |
| 102 | |
| 103 | C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in |
| 104 | F<config.h>) guaranteed to be large enough to represent the size of |
| 105 | any string that perl can handle. |
| 106 | |
| 107 | =for apidoc_section $string |
| 108 | =for apidoc Ayh||STRLEN |
| 109 | |
| 110 | In the unlikely case of an SV requiring more complex initialization, you |
| 111 | can create an empty SV with newSV(len). If C<len> is 0 an empty SV of |
| 112 | type NULL is returned, else an SV of type PV is returned with len + 1 (for |
| 113 | the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases |
| 114 | the SV has the undef value. |
| 115 | |
| 116 | SV *sv = newSV(0); /* no storage allocated */ |
| 117 | SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage |
| 118 | * allocated */ |
| 119 | |
| 120 | To change the value of an I<already-existing> SV, there are eight routines: |
| 121 | |
| 122 | void sv_setiv(SV*, IV); |
| 123 | void sv_setuv(SV*, UV); |
| 124 | void sv_setnv(SV*, double); |
| 125 | void sv_setpv(SV*, const char*); |
| 126 | void sv_setpvn(SV*, const char*, STRLEN); |
| 127 | void sv_setpvf(SV*, const char*, ...); |
| 128 | void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, |
| 129 | SV **, Size_t, bool *); |
| 130 | void sv_setsv(SV*, SV*); |
| 131 | |
| 132 | Notice that you can choose to specify the length of the string to be |
| 133 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
| 134 | allow Perl to calculate the length by using C<sv_setpv> or by specifying |
| 135 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will |
| 136 | determine the string's length by using C<strlen>, which depends on the |
| 137 | string terminating with a C<NUL> character, and not otherwise containing |
| 138 | NULs. |
| 139 | |
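The warning about C<strlen> can be seen without any Perl code at all. The following standalone C sketch (the helper names C<strlen_view> and C<count_embedded_nuls> are hypothetical) shows how a length measured with C<strlen> stops at the first C<NUL>, while an explicit length keeps the whole buffer in view:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length as a strlen-based routine sees it: stops at the first NUL. */
static size_t strlen_view(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Hypothetical helper: count NUL bytes inside a buffer whose length
 * is given explicitly, as with the *pvn style of interface. */
static size_t count_embedded_nuls(const char *buf, size_t len)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\0')
            count++;
    }
    return count;
}
```

For the five data bytes C<"ab\0cd">, C<strlen_view> reports 2, so a length-measuring call would silently drop the last three bytes; passing the length 5 explicitly avoids that.
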
| 140 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the |
| 141 | formatted output becomes the value. |
| 142 | |
| 143 | C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
| 144 | either a pointer to a variable argument list or the address and length of |
| 145 | an array of SVs. The last argument points to a boolean; on return, if that |
| 146 | boolean is true, then locale-specific information has been used to format |
| 147 | the string, and the string's contents are therefore untrustworthy (see |
| 148 | L<perlsec>). This pointer may be NULL if that information is not |
| 149 | important. Note that this function requires you to specify the length of |
| 150 | the format. |
| 151 | |
| 152 | The C<sv_set*()> functions are not generic enough to operate on values |
| 153 | that have "magic". See L</Magic Virtual Tables> later in this document. |
| 154 | |
| 155 | All SVs that contain strings should be terminated with a C<NUL> character. |
| 156 | If a string is not C<NUL>-terminated, there is a risk of |
| 157 | core dumps and corruptions from code which passes the string to C |
| 158 | functions or system calls which expect a C<NUL>-terminated string. |
| 159 | Perl's own functions typically add a trailing C<NUL> for this reason. |
| 160 | Nevertheless, you should be very careful when you pass a string stored |
| 161 | in an SV to a C function or system call. |
| 162 | |
| 163 | To access the actual value that an SV points to, Perl's API exposes |
| 164 | several macros that coerce the actual scalar type into an IV, UV, double, |
| 165 | or string: |
| 166 | |
| 167 | =over |
| 168 | |
| 169 | =item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>) |
| 170 | |
| 171 | =item * C<SvNV(SV*)> (C<double>) |
| 172 | |
| 173 | =item * Strings are a bit complicated: |
| 174 | |
| 175 | =over |
| 176 | |
| 177 | =item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)> |
| 178 | |
| 179 | If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>. |
| 180 | |
| 181 | This is suitable for Perl strings that represent bytes. |
| 182 | |
| 183 | =item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)> |
| 184 | |
| 185 | If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>. |
| 186 | |
| 187 | This is suitable for Perl strings that represent characters. |
| 188 | |
| 189 | B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant, |
| 190 | which means that if the SV contains non-Unicode code points (e.g., |
| 191 | 0x110000), then the result may contain extensions over valid UTF-8. |
| 192 | See L<perlapi/is_strict_utf8_string> for some methods Perl gives |
| 193 | you to check the UTF-8 validity of these macros' returns. |
| 194 | |
| 195 | =item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)> |
| 196 | to fetch the SV's raw internal buffer. This is tricky, though; if your Perl |
| 197 | string |
| 198 | is C<"\xff\xff">, then depending on the SV's internal encoding you might get |
| 199 | back a 2-byte B<OR> a 4-byte C<char*>. |
| 200 | Moreover, if it's the 4-byte string, that could come from either Perl |
| 201 | C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored |
| 202 | as raw octets. To differentiate between these you B<MUST> look up the |
| 203 | SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string |
| 204 | is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be |
| 205 | off). |
| 206 | |
| 207 | B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or |
| 208 | similarly-named macros I<without> looking up the SV's UTF8 bit is |
| 209 | almost certainly a bug if non-ASCII input is allowed. |
| 210 | |
| 211 | When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies |
| 212 | here as for C<SvPVutf8>. |
| 213 | |
| 214 | =back |
| 215 | |
| 216 | (See L</How do I pass a Perl string to a C library?> for more details.) |
| 217 | |
| 218 | In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned |
| 219 | is placed into the |
| 220 | variable C<len> (these are macros, so you do I<not> use C<&len>). If you do |
| 221 | not care what the length of the data is, use C<SvPVbyte_nolen>, |
| 222 | C<SvPVutf8_nolen>, or C<SvPV_nolen> instead. |
| 223 | The global variable C<PL_na> can also be given to |
| 224 | C<SvPVbyte>/C<SvPVutf8>/C<SvPV> |
| 225 | in this case. But that can be quite inefficient because C<PL_na> must |
| 226 | be accessed in thread-local storage in threaded Perl. In any case, remember |
| 227 | that Perl allows arbitrary strings of data that may both contain NULs and |
| 228 | might not be terminated by a C<NUL>. |
| 229 | |
| 230 | Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len), |
| 231 | len);>. It might work with your |
| 232 | compiler, but it won't work for everyone. |
| 233 | Break this sort of statement up into separate assignments: |
| 234 | |
| 235 | SV *s; |
| 236 | STRLEN len; |
| 237 | char *ptr; |
| 238 | ptr = SvPVbyte(s, len); |
| 239 | foo(ptr, len); |
| 240 | |
| 241 | =back |
| 242 | |
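The reason for splitting the call is that C does not fix the order in which function arguments are evaluated, so the second argument may be read before the macro in the first argument has assigned it. The following standalone sketch uses a toy macro, C<TOY_PV>, as a stand-in for a length-setting macro; it is an assumption made for illustration and is not how C<SvPVbyte> is defined. C<toy_consume> and C<safe_call> are likewise hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a length-setting macro: yields the pointer and
 * assigns the length as a side effect. Not Perl's definition. */
#define TOY_PV(str, len) ((len) = strlen(str), (str))

/* Hypothetical consumer: sums the first len bytes at p. */
static size_t toy_consume(const char *p, size_t len)
{
    size_t i, total = 0;

    assert(p != NULL);
    for (i = 0; i < len; i++)
        total += (unsigned char)p[i];
    return total;
}

/* Safe pattern: let the macro (and its side effect on len) finish in
 * its own statement, then pass both values to the function. */
static size_t safe_call(const char *str)
{
    size_t len = 0;
    const char *ptr = TOY_PV(str, len);

    return toy_consume(ptr, len);
}
```

Writing C<toy_consume(TOY_PV(str, len), len)> instead would leave the value of the second argument up to the compiler.
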
| 243 | If you want to know if the scalar value is TRUE, you can use: |
| 244 | |
| 245 | SvTRUE(SV*) |
| 246 | |
| 247 | Although Perl will automatically grow strings for you, if you need to force |
| 248 | Perl to allocate more memory for your SV, you can use the macro |
| 249 | |
| 250 | SvGROW(SV*, STRLEN newlen) |
| 251 | |
| 252 | which will determine if more memory needs to be allocated. If so, it will |
| 253 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not |
| 254 | decrease, the allocated memory of an SV and that it does not automatically |
| 255 | add space for the trailing C<NUL> byte (perl's own string functions typically do |
| 256 | C<SvGROW(sv, len + 1)>). |
| 257 | |
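The grow-only behaviour can be pictured with a toy buffer. This is a standalone sketch under stated assumptions: C<ToyBuf> and C<toy_grow> are hypothetical names, and the code uses C<realloc> rather than anything from Perl's allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy buffer: pv is the storage, len is the number of bytes allocated. */
typedef struct {
    char  *pv;
    size_t len;
} ToyBuf;

/* Grow-only: reallocates when newlen exceeds the current allocation and
 * otherwise does nothing. Returns the (possibly moved) buffer, or NULL
 * if allocation fails. The caller adds 1 for a trailing NUL itself. */
static char *toy_grow(ToyBuf *b, size_t newlen)
{
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}
```

Asking for a smaller size leaves both the pointer and the allocated length unchanged, which mirrors the "increase only" rule described above.
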
| 258 | If you want to write to an existing SV's buffer and set its value to a |
| 259 | string, use SvPVbyte_force() or one of its variants to force the SV to be |
| 260 | a PV. This will remove any of various types of non-stringness from |
| 261 | the SV while preserving the content of the SV in the PV. This can be |
| 262 | used, for example, to append data from an API function to a buffer |
| 263 | without extra copying: |
| 264 | |
| 265 | (void)SvPVbyte_force(sv, len); |
| 266 | s = SvGROW(sv, len + needlen + 1); |
| 267 | /* something that may write up to needlen bytes at s+len, but |
| 268 | actually writes newlen bytes |
| 269 | eg. newlen = read(fd, s + len, needlen); |
| 270 | ignoring errors for these examples |
| 271 | */ |
| 272 | s[len + newlen] = '\0'; |
| 273 | SvCUR_set(sv, len + newlen); |
| 274 | SvUTF8_off(sv); |
| 275 | SvSETMAGIC(sv); |
| 276 | |
| 277 | If you already have the data in memory or if you want to keep your |
| 278 | code simple, you can use one of the sv_cat*() variants, such as |
| 279 | sv_catpvn(). If you want to insert anywhere in the string you can use |
| 280 | sv_insert() or sv_insert_flags(). |
| 281 | |
| 282 | If you don't need the existing content of the SV, you can avoid some |
| 283 | copying with: |
| 284 | |
| 285 | SvPVCLEAR(sv); |
| 286 | s = SvGROW(sv, needlen + 1); |
| 287 | /* something that may write up to needlen bytes at s, but actually |
| 288 | writes newlen bytes |
| 289 | eg. newlen = read(fd, s, needlen); |
| 290 | */ |
| 291 | s[newlen] = '\0'; |
| 292 | SvCUR_set(sv, newlen); |
| 293 | SvPOK_only(sv); /* also clears SVf_UTF8 */ |
| 294 | SvSETMAGIC(sv); |
| 295 | |
| 296 | Again, if you already have the data in memory or want to avoid the |
| 297 | complexity of the above, you can use sv_setpvn(). |
| 298 | |
| 299 | If you have a buffer allocated with Newx() and want to set that as the |
| 300 | SV's value, you can use sv_usepvn_flags(). That has some requirements |
| 301 | if you want to avoid perl re-allocating the buffer to fit the trailing |
| 302 | NUL: |
| 303 | |
| 304 | Newx(buf, somesize+1, char); |
| 305 | /* ... fill in buf ... */ |
| 306 | buf[somesize] = '\0'; |
| 307 | sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL); |
| 308 | /* buf now belongs to perl, don't release it */ |
| 309 | |
| 310 | If you have an SV and want to know what kind of data Perl thinks is stored |
| 311 | in it, you can use the following macros to check the type of SV you have. |
| 312 | |
| 313 | SvIOK(SV*) |
| 314 | SvNOK(SV*) |
| 315 | SvPOK(SV*) |
| 316 | |
| 317 | Be aware that retrieving the numeric value of an SV can set IOK or NOK |
| 318 | on that SV, even when the SV started as a string. Prior to Perl |
| 319 | 5.36.0 retrieving the string value of an integer could set POK, but |
| 320 | this can no longer occur. From 5.36.0 this can be used to distinguish |
| 321 | the original representation of an SV and is intended to make life |
| 322 | simpler for serializers: |
| 323 | |
| 324 | /* references handled elsewhere */ |
| 325 | if (SvIsBOOL(sv)) { |
| 326 | /* originally boolean */ |
| 327 | ... |
| 328 | } |
| 329 | else if (SvPOK(sv)) { |
| 330 | /* originally a string */ |
| 331 | ... |
| 332 | } |
| 333 | else if (SvNIOK(sv)) { |
| 334 | /* originally numeric */ |
| 335 | ... |
| 336 | } |
| 337 | else { |
| 338 | /* something special or undef */ |
| 339 | } |
| 340 | |
| 341 | You can get and set the current length of the string stored in an SV with |
| 342 | the following macros: |
| 343 | |
| 344 | SvCUR(SV*) |
| 345 | SvCUR_set(SV*, STRLEN val) |
| 346 | |
| 347 | You can also get a pointer to the end of the string stored in the SV |
| 348 | with the macro: |
| 349 | |
| 350 | SvEND(SV*) |
| 351 | |
| 352 | But note that these last three macros are valid only if C<SvPOK()> is true. |
| 353 | |
| 354 | If you want to append something to the end of the string stored in an C<SV*>, |
| 355 | you can use the following functions: |
| 356 | |
| 357 | void sv_catpv(SV*, const char*); |
| 358 | void sv_catpvn(SV*, const char*, STRLEN); |
| 359 | void sv_catpvf(SV*, const char*, ...); |
| 360 | void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, |
| 361 | Size_t, bool *); |
| 362 | void sv_catsv(SV*, SV*); |
| 363 | |
| 364 | The first function calculates the length of the string to be appended by |
| 365 | using C<strlen>. In the second, you specify the length of the string |
| 366 | yourself. The third function processes its arguments like C<sprintf> and |
| 367 | appends the formatted output. The fourth function works like C<vsprintf>. |
| 368 | You can specify the address and length of an array of SVs instead of the |
| 369 | va_list argument. The fifth function |
| 370 | extends the string stored in the first |
| 371 | SV with the string stored in the second SV. It also forces the second SV |
| 372 | to be interpreted as a string. |
| 373 | |
| 374 | The C<sv_cat*()> functions are not generic enough to operate on values that |
| 375 | have "magic". See L</Magic Virtual Tables> later in this document. |
| 376 | |
| 377 | If you know the name of a scalar variable, you can get a pointer to its SV |
| 378 | by using the following: |
| 379 | |
| 380 | SV* get_sv("package::varname", 0); |
| 381 | |
| 382 | This returns NULL if the variable does not exist. |
| 383 | |
| 384 | If you want to know if this variable (or any other SV) is actually C<defined>, |
| 385 | you can call: |
| 386 | |
| 387 | SvOK(SV*) |
| 388 | |
| 389 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. |
| 390 | |
| 391 | Its address can be used whenever an C<SV*> is needed. Make sure that |
| 392 | you don't try to compare a random sv with C<&PL_sv_undef>. For example |
| 393 | when interfacing Perl code, it'll work correctly for: |
| 394 | |
| 395 | foo(undef); |
| 396 | |
| 397 | But won't work when called as: |
| 398 | |
| 399 | $x = undef; |
| 400 | foo($x); |
| 401 | |
| 402 | So, to repeat: always use SvOK() to check whether an sv is defined. |
| 403 | |
| 404 | Also you have to be careful when using C<&PL_sv_undef> as a value in |
| 405 | AVs or HVs (see L</AVs, HVs and undefined values>). |
| 406 | |
| 407 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain |
| 408 | boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their |
| 409 | addresses can be used whenever an C<SV*> is needed. |
| 410 | |
| 411 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
| 412 | Take this code: |
| 413 | |
| 414 | SV* sv = (SV*) 0; |
| 415 | if (I-am-to-return-a-real-value) { |
| 416 | sv = sv_2mortal(newSViv(42)); |
| 417 | } |
| 418 | sv_setsv(ST(0), sv); |
| 419 | |
| 420 | This code tries to return a new SV (which contains the value 42) if it should |
| 421 | return a real value, or undef otherwise. Instead it has returned a NULL |
| 422 | pointer which, somewhere down the line, will cause a segmentation violation, |
| 423 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the |
| 424 | first line and all will be well. |
| 425 | |
| 426 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this |
| 427 | call is not necessary (see L</Reference Counts and Mortality>). |
| 428 | |
| 429 | =head2 Offsets |
| 430 | |
| 431 | Perl provides the function C<sv_chop> to efficiently remove characters |
| 432 | from the beginning of a string; you give it an SV and a pointer to |
| 433 | somewhere inside the PV, and it discards everything before the |
| 434 | pointer. The efficiency comes by means of a little hack: instead of |
| 435 | actually removing the characters, C<sv_chop> sets the flag C<OOK> |
| 436 | (offset OK) to signal to other functions that the offset hack is in |
| 437 | effect, and it moves the PV pointer (called C<SvPVX>) forward |
| 438 | by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN> |
| 439 | accordingly. (A portion of the space between the old and new PV |
| 440 | pointers is used to store the count of chopped bytes.) |
| 441 | |
| 442 | Hence, at this point, the start of the buffer that we allocated lives |
| 443 | at C<SvPVX(sv)> minus the chopped-byte count (as read by C<SvOOK_offset>) |
| 444 | in memory, and the PV pointer is pointing into the middle of this allocated storage. |
| 445 | |
| 446 | This is best demonstrated by example. Normally copy-on-write will prevent |
| 447 | the substitution operator from using this hack, but if you can craft a |
| 448 | string for which copy-on-write is not possible, you can see it in play. In |
| 449 | the current implementation, the final byte of a string buffer is used as a |
| 450 | copy-on-write reference count. If the buffer is not big enough, then |
| 451 | copy-on-write is skipped. First have a look at an empty string: |
| 452 | |
| 453 | % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a' |
| 454 | SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390 |
| 455 | REFCNT = 1 |
| 456 | FLAGS = (POK,pPOK) |
| 457 | PV = 0x7ffb7bc05b50 ""\0 |
| 458 | CUR = 0 |
| 459 | LEN = 10 |
| 460 | |
| 461 | Notice here the LEN is 10. (It may differ on your platform.) Extend the |
| 462 | length of the string to one less than 10, and do a substitution: |
| 463 | |
| 464 | % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \ |
| 465 | Dump($a)' |
| 466 | SV = PV(0x7ffa04008a70) at 0x7ffa04030390 |
| 467 | REFCNT = 1 |
| 468 | FLAGS = (POK,OOK,pPOK) |
| 469 | OFFSET = 1 |
| 470 | PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0 |
| 471 | CUR = 8 |
| 472 | LEN = 9 |
| 473 | |
| 474 | Here the number of bytes chopped off (1) is shown next as the OFFSET. The |
| 475 | portion of the string between the "real" and the "fake" beginnings is |
| 476 | shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect |
| 477 | the fake beginning, not the real one. (The first character of the string |
| 478 | buffer happens to have changed to "\1" here, not "1", because the current |
| 479 | implementation stores the offset count in the string buffer. This is |
| 480 | subject to change.) |
| 481 | |
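The bookkeeping behind the offset hack can be modelled in a few lines of standalone C. This is a toy under stated assumptions, not perl's implementation: C<ToyStr> and the C<toy_*> functions are hypothetical, and the chopped-byte count is kept in a separate field for clarity, whereas the text above notes that perl stores it inside the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy string body. Field names echo SvPVX/SvCUR/SvLEN but this is not
 * perl's structure. */
typedef struct {
    char  *pv;      /* visible start of the string */
    size_t cur;     /* visible length, excluding the NUL */
    size_t len;     /* bytes available from pv onward */
    size_t offset;  /* bytes chopped so far (toy-only field) */
} ToyStr;

static int toy_init(ToyStr *s, const char *text)
{
    size_t n = strlen(text);

    s->pv = malloc(n + 1);
    if (s->pv == NULL)
        return -1;
    memcpy(s->pv, text, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->offset = 0;
    return 0;
}

/* Discard everything before ptr without moving any bytes. */
static void toy_chop(ToyStr *s, char *ptr)
{
    size_t delta;

    assert(ptr >= s->pv && ptr <= s->pv + s->cur);
    delta = (size_t)(ptr - s->pv);
    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The real start of the allocation is needed only when freeing. */
static void toy_free(ToyStr *s)
{
    free(s->pv - s->offset);
    s->pv = NULL;
    s->cur = s->len = s->offset = 0;
}
```

Chopping one byte from C<"123456789"> in this model leaves an offset of 1, a current length of 8 and an available length of 9, the same shape as the C<Dump> output above.
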
| 482 | Something similar to the offset hack is performed on AVs to enable |
| 483 | efficient shifting and splicing off the beginning of the array; while |
| 484 | C<AvARRAY> points to the first element in the array that is visible from |
| 485 | Perl, C<AvALLOC> points to the real start of the C array. These are |
| 486 | usually the same, but a C<shift> operation can be carried out by |
| 487 | increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>. |
| 488 | Again, the location of the real start of the C array only comes into |
| 489 | play when freeing the array. See C<av_shift> in F<av.c>. |
| 490 | |
| 491 | =for apidoc_section $AV |
| 492 | =for apidoc Amh||AvALLOC|AV* av |
| 493 | |
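The array variant of the trick can be sketched the same way. The following standalone C model is an illustration only: C<ToyArr> and the C<toy_arr_*> functions are hypothetical, and the elements are plain C<int>s rather than C<SV*>s to keep the example short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int *alloc;  /* real start of the C array, like AvALLOC */
    int *array;  /* first visible element, like AvARRAY */
    long fill;   /* highest visible index, -1 if empty, like AvFILL */
    long max;    /* highest index usable without reallocating, like AvMAX */
} ToyArr;

static int toy_arr_init(ToyArr *a, const int *vals, long n)
{
    long i;

    assert(n > 0);
    a->alloc = malloc((size_t)n * sizeof *a->alloc);
    if (a->alloc == NULL)
        return -1;
    for (i = 0; i < n; i++)
        a->alloc[i] = vals[i];
    a->array = a->alloc;
    a->fill  = n - 1;
    a->max   = n - 1;
    return 0;
}

/* Remove and return the first element without moving the others. */
static int toy_arr_shift(ToyArr *a)
{
    int first;

    assert(a->fill >= 0);
    first = a->array[0];
    a->array++;   /* the visible start moves forward by one */
    a->fill--;
    a->max--;
    return first;
}

/* Freeing is the one place where the real start matters. */
static void toy_arr_free(ToyArr *a)
{
    free(a->alloc);
    a->alloc = a->array = NULL;
    a->fill = a->max = -1;
}
```

After one shift, C<array> is one element past C<alloc>, and both C<fill> and C<max> have dropped by one.
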
| 494 | =head2 What's Really Stored in an SV? |
| 495 | |
| 496 | Recall that the usual method of determining the type of scalar you have is |
| 497 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
| 498 | these macros will usually return TRUE, and calling the C<Sv*V> |
| 499 | macros will do the appropriate conversion of string to integer/double or |
| 500 | integer/double to string. |
| 501 | |
| 502 | If you I<really> need to know if you have an integer, double, or string |
| 503 | pointer in an SV, you can use the following three macros instead: |
| 504 | |
| 505 | SvIOKp(SV*) |
| 506 | SvNOKp(SV*) |
| 507 | SvPOKp(SV*) |
| 508 | |
| 509 | These will tell you if you truly have an integer, double, or string pointer |
| 510 | stored in your SV. The "p" stands for private. |
| 511 | |
| 512 | There are various ways in which the private and public flags may differ. |
| 513 | For example, in perl 5.16 and earlier a tied SV may have a valid |
| 514 | underlying value in the IV slot (so SvIOKp is true), but the data |
| 515 | should be accessed via the FETCH routine rather than directly, |
| 516 | so SvIOK is false. (In perl 5.18 onwards, tied scalars use |
| 517 | the flags the same way as untied scalars.) Another is when |
| 518 | numeric conversion has occurred and precision has been lost: only the |
| 519 | private flag is set on 'lossy' values. So when an NV is converted to an |
| 520 | IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be. |
| 521 | |
| 522 | In general, though, it's best to use the C<Sv*V> macros. |
| 523 | |
| 524 | =head2 Working with AVs |
| 525 | |
| 526 | There are two main, longstanding ways to create and load an AV. The first |
| 527 | method creates an empty AV: |
| 528 | |
| 529 | AV* newAV(); |
| 530 | |
| 531 | The second method both creates the AV and initially populates it with SVs: |
| 532 | |
| 533 | AV* av_make(SSize_t num, SV **ptr); |
| 534 | |
| 535 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
| 536 | AV has been created, the SVs can be destroyed, if so desired. |
| 537 | |
| 538 | Perl v5.36 added two new ways to create an AV and allocate an SV** array |
| 539 | without populating it. These are more efficient than a newAV() followed by an |
| 540 | av_extend(). |
| 541 | |
| 542 | /* Creates but does not initialize (Zero) the SV** array */ |
| 543 | AV *av = newAV_alloc_x(1); |
| 544 | /* Creates and does initialize (Zero) the SV** array */ |
| 545 | AV *av = newAV_alloc_xz(1); |
| 546 | |
| 547 | The numerical argument refers to the number of array elements to allocate, not |
| 548 | an array index, and must be >0. The first form must only ever be used when all |
| 549 | elements will be initialized before any read occurs. Reading a non-initialized |
| 550 | SV* - i.e. treating a random memory address as a SV* - is a serious bug. |
| 551 | |
| 552 | Once the AV has been created, the following operations are possible on it: |
| 553 | |
| 554 | void av_push(AV*, SV*); |
| 555 | SV* av_pop(AV*); |
| 556 | SV* av_shift(AV*); |
| 557 | void av_unshift(AV*, SSize_t num); |
| 558 | |
| 559 | These should be familiar operations, with the exception of C<av_unshift>. |
| 560 | This routine adds C<num> elements at the front of the array with the C<undef> |
| 561 | value. You must then use C<av_store> (described below) to assign values |
| 562 | to these new elements. |
| 563 | |
| 564 | Here are some other functions: |
| 565 | |
| 566 | SSize_t av_top_index(AV*); |
| 567 | SV** av_fetch(AV*, SSize_t key, I32 lval); |
| 568 | SV** av_store(AV*, SSize_t key, SV* val); |
| 569 | |
| 570 | The C<av_top_index> function returns the highest index value in an array (just |
| 571 | like $#array in Perl). If the array is empty, -1 is returned. The |
| 572 | C<av_fetch> function returns the value at index C<key>, but if C<lval> |
| 573 | is non-zero, then C<av_fetch> will store an undef value at that index. |
| 574 | The C<av_store> function stores the value C<val> at index C<key>, and does |
| 575 | not increment the reference count of C<val>. Thus the caller is responsible |
| 576 | for taking care of that, and if C<av_store> returns NULL, the caller will |
| 577 | have to decrement the reference count to avoid a memory leak. Note that |
| 578 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their |
| 579 | return value. |
| 580 | |
| 581 | A few more: |
| 582 | |
| 583 | void av_clear(AV*); |
| 584 | void av_undef(AV*); |
| 585 | void av_extend(AV*, SSize_t key); |
| 586 | |
| 587 | The C<av_clear> function deletes all the elements in the AV* array, but |
| 588 | does not actually delete the array itself. The C<av_undef> function will |
| 589 | delete all the elements in the array plus the array itself. The |
| 590 | C<av_extend> function extends the array so that it contains at least C<key+1> |
| 591 | elements. If C<key+1> is less than the currently allocated length of the array, |
| 592 | then nothing is done. |
| 593 | |
| 594 | If you know the name of an array variable, you can get a pointer to its AV |
| 595 | by using the following: |
| 596 | |
| 597 | AV* get_av("package::varname", 0); |
| 598 | |
| 599 | This returns NULL if the variable does not exist. |
| 600 | |
| 601 | See L</Understanding the Magic of Tied Hashes and Arrays> for more |
| 602 | information on how to use the array access functions on tied arrays. |
| 603 | |
| 604 | =head3 More efficient working with new or vanilla AVs |
| 605 | |
| 606 | Perl v5.36 and v5.38 introduced streamlined, inlined versions of some |
| 607 | functions: |
| 608 | |
| 609 | =over |
| 610 | |
| 611 | =item * C<av_store_simple> |
| 612 | |
| 613 | =item * C<av_fetch_simple> |
| 614 | |
| 615 | =item * C<av_push_simple> |
| 616 | |
| 617 | =back |
| 618 | |
| 619 | These are drop-in replacements, but can only be used on straightforward |
| 620 | AVs that meet the following criteria: |
| 621 | |
| 622 | =over |
| 623 | |
| 624 | =item * are not magical |
| 625 | |
| 626 | =item * are not readonly |
| 627 | |
| 628 | =item * are "real" (refcounted) AVs |
| 629 | |
| 630 | =item * have an av_top_index value > -2 |
| 631 | |
| 632 | =back |
| 633 | |
| 634 | AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and |
| 635 | C<newAV_alloc_xz> are all compatible at the time of creation. It is |
| 636 | only if they are declared readonly or unreal, have magic attached, or |
| 637 | are otherwise configured unusually that they will stop being compatible. |
| 638 | |
| 639 | Note that some interpreter functions may attach magic to an AV as part |
| 640 | of normal operations. It is therefore safest, unless you are sure of the |
| 641 | lifecycle of an AV, to only use these new functions close to the point |
| 642 | of AV creation. |
| 643 | |
| 644 | =head2 Working with HVs |
| 645 | |
| 646 | To create an HV, you use the following routine: |
| 647 | |
| 648 | HV* newHV(); |
| 649 | |
| 650 | Once the HV has been created, the following operations are possible on it: |
| 651 | |
| 652 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
| 653 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); |
| 654 | |
| 655 | The C<klen> parameter is the length of the key being passed in (Note that |
| 656 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the |
| 657 | length of the key). The C<val> argument contains the SV pointer to the |
| 658 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
| 659 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
| 660 | indicates whether this fetch is actually a part of a store operation, in |
| 661 | which case a new undefined value will be added to the HV with the supplied |
| 662 | key and C<hv_fetch> will return as if the value had already existed. |
| 663 | |
| 664 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
| 665 | C<SV*>. To access the scalar value, you must first dereference the return |
| 666 | value. However, you should check to make sure that the return value is |
| 667 | not NULL before dereferencing it. |
| 668 | |
| 669 | The first of these two functions checks if a hash table entry exists, and the |
| 670 | second deletes it. |
| 671 | |
| 672 | bool hv_exists(HV*, const char* key, U32 klen); |
| 673 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); |
| 674 | |
| 675 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
| 676 | create and return a mortal copy of the deleted value. |
| 677 | |
| 678 | And more miscellaneous functions: |
| 679 | |
| 680 | void hv_clear(HV*); |
| 681 | void hv_undef(HV*); |
| 682 | |
| 683 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash |
| 684 | table but does not actually delete the hash table. The C<hv_undef> function deletes |
| 685 | both the entries and the hash table itself. |
| 686 | |
| 687 | Perl keeps the actual data in a linked list of structures with a typedef of HE. |
| 688 | These contain the actual key and value pointers (plus extra administrative |
| 689 | overhead). The key is a string pointer; the value is an C<SV*>. However, |
| 690 | once you have an C<HE*>, to get the actual key and value, use the routines |
| 691 | specified below. |
| 692 | |
| 693 | =for apidoc_section $HV |
| 694 | =for apidoc Ayh||HE |
| 695 | |
| 696 | I32 hv_iterinit(HV*); |
| 697 | /* Prepares starting point to traverse hash table */ |
| 698 | HE* hv_iternext(HV*); |
| 699 | /* Get the next entry, and return a pointer to a |
| 700 | structure that has both the key and value */ |
| 701 | char* hv_iterkey(HE* entry, I32* retlen); |
| 702 | /* Get the key from an HE structure and also return |
| 703 | the length of the key string */ |
| 704 | SV* hv_iterval(HV*, HE* entry); |
| 705 | /* Return an SV pointer to the value of the HE |
| 706 | structure */ |
| 707 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
| 708 | /* This convenience routine combines hv_iternext, |
| 709 | hv_iterkey, and hv_iterval. The key and retlen |
| 710 | arguments are return values for the key and its |
| 711 | length. The value is the SV* that the function returns */ |
| 712 | |
| 713 | If you know the name of a hash variable, you can get a pointer to its HV |
| 714 | by using the following: |
| 715 | |
| 716 | HV* get_hv("package::varname", 0); |
| 717 | |
| 718 | This returns NULL if the variable does not exist. |
| 719 | |
| 720 | The hash algorithm is defined in the C<PERL_HASH> macro: |
| 721 | |
| 722 | PERL_HASH(hash, key, klen) |
| 723 | |
| 724 | The exact implementation of this macro varies by architecture and version |
| 725 | of perl, and the return value may change per invocation, so the value |
| 726 | is only valid for the duration of a single perl process. |
| 727 | |
| 728 | See L</Understanding the Magic of Tied Hashes and Arrays> for more |
| 729 | information on how to use the hash access functions on tied hashes. |
| 730 | |
| 731 | =for apidoc_section $HV |
| 732 | =for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen |
| 733 | |
| 734 | =head2 Hash API Extensions |
| 735 | |
| 736 | Beginning with version 5.004, the following functions are also supported: |
| 737 | |
| 738 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); |
| 739 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); |
| 740 | |
| 741 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
| 742 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); |
| 743 | |
| 744 | SV* hv_iterkeysv (HE* entry); |
| 745 | |
| 746 | Note that these functions take C<SV*> keys, which simplifies writing |
| 747 | of extension code that deals with hash structures. These functions |
| 748 | also allow passing of C<SV*> keys to C<tie> functions without forcing |
| 749 | you to stringify the keys (unlike the previous set of functions). |
| 750 | |
| 751 | They also return and accept whole hash entries (C<HE*>), making their |
| 752 | use more efficient (since the hash number for a particular string |
| 753 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
| 754 | descriptions. |
| 755 | |
| 756 | The following macros must always be used to access the contents of hash |
| 757 | entries. Note that the arguments to these macros must be simple |
| 758 | variables, since they may get evaluated more than once. See |
| 759 | L<perlapi> for detailed descriptions of these macros. |
| 760 | |
| 761 | HePV(HE* he, STRLEN len) |
| 762 | HeVAL(HE* he) |
| 763 | HeHASH(HE* he) |
| 764 | HeSVKEY(HE* he) |
| 765 | HeSVKEY_force(HE* he) |
| 766 | HeSVKEY_set(HE* he, SV* sv) |
| 767 | |
| 768 | These two lower level macros are defined, but must only be used when |
| 769 | dealing with keys that are not C<SV*>s: |
| 770 | |
| 771 | HeKEY(HE* he) |
| 772 | HeKLEN(HE* he) |
| 773 | |
| 774 | Note that both C<hv_store> and C<hv_store_ent> do not increment the |
| 775 | reference count of the stored C<val>, which is the caller's responsibility. |
| 776 | If these functions return a NULL value, the caller will usually have to |
| 777 | decrement the reference count of C<val> to avoid a memory leak. |
| 778 | |
| 779 | =head2 AVs, HVs and undefined values |
| 780 | |
| 781 | Sometimes you have to store undefined values in AVs or HVs. Although |
| 782 | this may be a rare case, it can be tricky. That's because you're |
| 783 | used to using C<&PL_sv_undef> if you need an undefined SV. |
| 784 | |
| 785 | For example, intuition tells you that this XS code: |
| 786 | |
| 787 | AV *av = newAV(); |
| 788 | av_store( av, 0, &PL_sv_undef ); |
| 789 | |
| 790 | is equivalent to this Perl code: |
| 791 | |
| 792 | my @av; |
| 793 | $av[0] = undef; |
| 794 | |
| 795 | Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker |
| 796 | for indicating that an array element has not yet been initialized. |
| 797 | Thus, C<exists $av[0]> would be true for the above Perl code, but |
| 798 | false for the array generated by the XS code. In perl 5.20, storing |
| 799 | C<&PL_sv_undef> will create a read-only element, because the scalar |
| 800 | C<&PL_sv_undef> itself is stored, not a copy. |
| 801 | |
| 802 | Similar problems can occur when storing C<&PL_sv_undef> in HVs: |
| 803 | |
| 804 | hv_store( hv, "key", 3, &PL_sv_undef, 0 ); |
| 805 | |
| 806 | This will indeed make the value C<undef>, but if you try to modify |
| 807 | the value of C<key>, you'll get the following error: |
| 808 | |
| 809 | Modification of non-creatable hash value attempted |
| 810 | |
| 811 | In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders |
| 812 | in restricted hashes. This caused such hash entries not to appear |
| 813 | when iterating over the hash or when checking for the keys |
| 814 | with the C<hv_exists> function. |
| 815 | |
| 816 | You can run into similar problems when you store C<&PL_sv_yes> or |
| 817 | C<&PL_sv_no> into AVs or HVs. Trying to modify such elements |
| 818 | will give you the following error: |
| 819 | |
| 820 | Modification of a read-only value attempted |
| 821 | |
| 822 | To make a long story short, you can use the special variables |
| 823 | C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and |
| 824 | HVs, but you have to make sure you know what you're doing. |
| 825 | |
| 826 | Generally, if you want to store an undefined value in an AV |
| 827 | or HV, you should not use C<&PL_sv_undef>, but rather create a |
| 828 | new undefined value using the C<newSV> function, for example: |
| 829 | |
| 830 | av_store( av, 42, newSV(0) ); |
| 831 | hv_store( hv, "foo", 3, newSV(0), 0 ); |
| 832 | |
| 833 | =head2 References |
| 834 | |
| 835 | References are a special type of scalar that point to other data types |
| 836 | (including other references). |
| 837 | |
| 838 | To create a reference, use either of the following functions: |
| 839 | |
| 840 | SV* newRV_inc((SV*) thing); |
| 841 | SV* newRV_noinc((SV*) thing); |
| 842 | |
| 843 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
| 844 | functions are identical except that C<newRV_inc> increments the reference |
| 845 | count of the C<thing>, while C<newRV_noinc> does not. For historical |
| 846 | reasons, C<newRV> is a synonym for C<newRV_inc>. |
| 847 | |
| 848 | Once you have a reference, you can use the following macro to dereference |
| 849 | the reference: |
| 850 | |
| 851 | SvRV(SV*) |
| 852 | |
| 853 | then call the appropriate routines, casting the returned C<SV*> to either an |
| 854 | C<AV*> or C<HV*>, if required. |
| 855 | |
| 856 | To determine if an SV is a reference, you can use the following macro: |
| 857 | |
| 858 | SvROK(SV*) |
| 859 | |
| 860 | To discover what type of value the reference refers to, use the following |
| 861 | macro and then check the return value. |
| 862 | |
| 863 | SvTYPE(SvRV(SV*)) |
| 864 | |
| 865 | The most useful types that will be returned are: |
| 866 | |
| 867 | SVt_PVAV Array |
| 868 | SVt_PVHV Hash |
| 869 | SVt_PVCV Code |
| 870 | SVt_PVGV Glob (possibly a file handle) |
| 871 | |
| 872 | Any numerical value returned which is less than SVt_PVAV will be a scalar |
| 873 | of some form. |
| 874 | |
| 875 | See L<perlapi/svtype> for more details. |
| 876 | |
| 877 | =head2 Blessed References and Class Objects |
| 878 | |
| 879 | References are also used to support object-oriented programming. In perl's |
| 880 | OO lexicon, an object is simply a reference that has been blessed into a |
| 881 | package (or class). Once blessed, the programmer may now use the reference |
| 882 | to access the various methods in the class. |
| 883 | |
| 884 | A reference can be blessed into a package with the following function: |
| 885 | |
| 886 | SV* sv_bless(SV* sv, HV* stash); |
| 887 | |
| 888 | The C<sv> argument must be a reference value. The C<stash> argument |
| 889 | specifies which class the reference will belong to. See |
| 890 | L</Stashes and Globs> for information on converting class names into stashes. |
| 891 | |
| 892 | /* Still under construction */ |
| 893 | |
| 894 | The following function upgrades C<rv> to a reference if it is not already |
| 895 | one, and creates a new SV for C<rv> to point to. If C<classname> is non-null, |
| 896 | the new SV is blessed into the specified class. The new SV is returned. |
| 897 | |
| 898 | SV* newSVrv(SV* rv, const char* classname); |
| 899 | |
| 900 | The following three functions copy integer, unsigned integer or double |
| 901 | into an SV whose reference is C<rv>. SV is blessed if C<classname> is |
| 902 | non-null. |
| 903 | |
| 904 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
| 905 | SV* sv_setref_uv(SV* rv, const char* classname, UV uv); |
| 906 | SV* sv_setref_nv(SV* rv, const char* classname, NV nv); |
| 907 | |
| 908 | The following function copies the pointer value (I<the address, not the |
| 909 | string!>) into an SV whose reference is C<rv>. SV is blessed if C<classname> |
| 910 | is non-null. |
| 911 | |
| 912 | SV* sv_setref_pv(SV* rv, const char* classname, void* pv); |
| 913 | |
| 914 | The following function copies a string into an SV whose reference is C<rv>. |
| 915 | C<length> must give the number of bytes to copy; passing 0 stores an empty |
| 916 | string rather than making Perl measure C<pv>. SV is blessed if C<classname> is non-null. |
| 917 | |
| 918 | SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, |
| 919 | STRLEN length); |
| 920 | |
| 921 | The following function tests whether the SV is blessed into the specified |
| 922 | class. It does not check inheritance relationships. |
| 923 | |
| 924 | int sv_isa(SV* sv, const char* name); |
| 925 | |
| 926 | The following function tests whether the SV is a reference to a blessed object. |
| 927 | |
| 928 | int sv_isobject(SV* sv); |
| 929 | |
| 930 | The following function tests whether the SV is derived from the specified |
| 931 | class. SV can be either a reference to a blessed object or a string |
| 932 | containing a class name. This is the function implementing the |
| 933 | C<UNIVERSAL::isa> functionality. |
| 934 | |
| 935 | bool sv_derived_from(SV* sv, const char* name); |
| 936 | |
| 937 | To check if you've got an object derived from a specific class you have |
| 938 | to write: |
| 939 | |
| 940 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } |
| 941 | |
| 942 | =head2 Creating New Variables |
| 943 | |
| 944 | To create a new Perl variable with an undef value which can be accessed from |
| 945 | your Perl script, use the following routines, depending on the variable type. |
| 946 | |
| 947 | SV* get_sv("package::varname", GV_ADD); |
| 948 | AV* get_av("package::varname", GV_ADD); |
| 949 | HV* get_hv("package::varname", GV_ADD); |
| 950 | |
| 951 | Notice the use of GV_ADD as the second parameter. The new variable can now |
| 952 | be set, using the routines appropriate to the data type. |
| 953 | |
| 954 | There are additional macros whose values may be bitwise OR'ed with the |
| 955 | C<GV_ADD> argument to enable certain extra features. Those bits are: |
| 956 | |
| 957 | =over |
| 958 | |
| 959 | =item GV_ADDMULTI |
| 960 | |
| 961 | Marks the variable as multiply defined, thus preventing the: |
| 962 | |
| 963 | Name <varname> used only once: possible typo |
| 964 | |
| 965 | warning. |
| 966 | |
| 967 | =item GV_ADDWARN |
| 968 | |
| 969 | Issues the warning: |
| 970 | |
| 971 | Had to create <varname> unexpectedly |
| 972 | |
| 973 | if the variable did not exist before the function was called. |
| 974 | |
| 975 | =back |
| 976 | |
| 977 | If you do not specify a package name, the variable is created in the current |
| 978 | package. |
| 979 | |
| 980 | =head2 Reference Counts and Mortality |
| 981 | |
| 982 | Perl uses a reference count-driven garbage collection mechanism. SVs, |
| 983 | AVs, or HVs (xV for short in the following) start their life with a |
| 984 | reference count of 1. If the reference count of an xV ever drops to 0, |
| 985 | then it will be destroyed and its memory made available for reuse. |
| 986 | At the most basic internal level, reference counts can be manipulated |
| 987 | with the following macros: |
| 988 | |
| 989 | int SvREFCNT(SV* sv); |
| 990 | SV* SvREFCNT_inc(SV* sv); |
| 991 | void SvREFCNT_dec(SV* sv); |
| 992 | |
| 993 | (There are also suffixed versions of the increment and decrement macros, |
| 994 | for situations where the full generality of these basic macros can be |
| 995 | exchanged for some performance.) |
| 996 | |
| 997 | However, the way a programmer should think about references is not so |
| 998 | much in terms of the bare reference count, but in terms of I<ownership> |
| 999 | of references. A reference to an xV can be owned by any of a variety |
| 1000 | of entities: another xV, the Perl interpreter, an XS data structure, |
| 1001 | a piece of running code, or a dynamic scope. An xV generally does not |
| 1002 | know what entities own the references to it; it only knows how many |
| 1003 | references there are, which is the reference count. |
| 1004 | |
| 1005 | To correctly maintain reference counts, it is essential to keep track |
| 1006 | of what references the XS code is manipulating. The programmer should |
| 1007 | always know where a reference has come from and who owns it, and be |
| 1008 | aware of any creation or destruction of references, and any transfers |
| 1009 | of ownership. Because ownership isn't represented explicitly in the xV |
| 1010 | data structures, only the reference count need be actually maintained |
| 1011 | by the code, and that means that this understanding of ownership is not |
| 1012 | actually evident in the code. For example, transferring ownership of a |
| 1013 | reference from one owner to another doesn't change the reference count |
| 1014 | at all, so may be achieved with no actual code. (The transferring code |
| 1015 | doesn't touch the referenced object, but does need to ensure that the |
| 1016 | former owner knows that it no longer owns the reference, and that the |
| 1017 | new owner knows that it now does.) |
| 1018 | |
| 1019 | An xV that is visible at the Perl level should not become unreferenced |
| 1020 | and thus be destroyed. Normally, an object will only become unreferenced |
| 1021 | when it is no longer visible, often by the same means that makes it |
| 1022 | invisible. For example, a Perl reference value (RV) owns a reference to |
| 1023 | its referent, so if the RV is overwritten that reference gets destroyed, |
| 1024 | and the no-longer-reachable referent may be destroyed as a result. |
| 1025 | |
| 1026 | Many functions have some kind of reference manipulation as |
| 1027 | part of their purpose. Sometimes this is documented in terms |
| 1028 | of ownership of references, and sometimes it is (less helpfully) |
| 1029 | documented in terms of changes to reference counts. For example, the |
| 1030 | L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV |
| 1031 | (with reference count 1) and increment the reference count of the referent |
| 1032 | that was supplied by the caller. This is best understood as creating |
| 1033 | a new reference to the referent, which is owned by the created RV, |
| 1034 | and returning to the caller ownership of the sole reference to the RV. |
| 1035 | The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not |
| 1036 | increment the reference count of the referent, but the RV nevertheless |
| 1037 | ends up owning a reference to the referent. It is therefore implied |
| 1038 | that the caller of C<newRV_noinc()> is relinquishing a reference to the |
| 1039 | referent, making this conceptually a more complicated operation even |
| 1040 | though it does less to the data structures. |
| 1041 | |
| 1042 | For example, imagine you want to return a reference from an XSUB |
| 1043 | function. Inside the XSUB routine, you create an SV which initially |
| 1044 | has just a single reference, owned by the XSUB routine. This reference |
| 1045 | needs to be disposed of before the routine is complete, otherwise it |
| 1046 | will leak, preventing the SV from ever being destroyed. So to create |
| 1047 | an RV referencing the SV, it is most convenient to pass the SV to |
| 1048 | C<newRV_noinc()>, which consumes that reference. Now the XSUB routine |
| 1049 | no longer owns a reference to the SV, but does own a reference to the RV, |
| 1050 | which in turn owns a reference to the SV. The ownership of the reference |
| 1051 | to the RV is then transferred by the process of returning the RV from |
| 1052 | the XSUB. |
| 1053 | |
| 1054 | There are some convenience functions available that can help with the |
| 1055 | destruction of xVs. These functions introduce the concept of "mortality". |
| 1056 | Much documentation speaks of an xV itself being mortal, but this is |
| 1057 | misleading. It is really I<a reference to> an xV that is mortal, and it |
| 1058 | is possible for there to be more than one mortal reference to a single xV. |
| 1059 | For a reference to be mortal means that it is owned by the temps stack, |
| 1060 | one of perl's many internal stacks, which will destroy that reference |
| 1061 | "a short time later". Usually the "short time later" is the end of |
| 1062 | the current Perl statement. However, it gets more complicated around |
| 1063 | dynamic scopes: there can be multiple sets of mortal references hanging |
| 1064 | around at the same time, with different death dates. Internally, the |
| 1065 | actual determinant for when mortal xV references are destroyed depends |
| 1066 | on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs> |
| 1067 | and L</Temporaries Stack> below for more details on these macros. |
| 1068 | |
| 1069 | Mortal references are mainly used for xVs that are placed on perl's |
| 1070 | main stack. The stack is problematic for reference tracking, because it |
| 1071 | contains a lot of xV references, but doesn't own those references: they |
| 1072 | are not counted. Currently, there are many bugs resulting from xVs being |
| 1073 | destroyed while referenced by the stack, because the stack's uncounted |
| 1074 | references aren't enough to keep the xVs alive. So when putting an |
| 1075 | (uncounted) reference on the stack, it is vitally important to ensure that |
| 1076 | there will be a counted reference to the same xV that will last at least |
| 1077 | as long as the uncounted reference. But it's also important that that |
| 1078 | counted reference be cleaned up at an appropriate time, and not unduly |
| 1079 | prolong the xV's life. For there to be a mortal reference is often the |
| 1080 | best way to satisfy this requirement, especially if the xV was created |
| 1081 | especially to be put on the stack and would otherwise be unreferenced. |
| 1082 | |
| 1083 | To create a mortal reference, use the functions: |
| 1084 | |
| 1085 | SV* sv_newmortal() |
| 1086 | SV* sv_mortalcopy(SV*) |
| 1087 | SV* sv_2mortal(SV*) |
| 1088 | |
| 1089 | C<sv_newmortal()> creates an SV (with the undefined value) whose sole |
| 1090 | reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a |
| 1091 | copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()> |
| 1092 | mortalises an existing xV reference: it transfers ownership of a reference |
| 1093 | from the caller to the temps stack. Because C<sv_newmortal> gives the new |
| 1094 | SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>, |
| 1095 | etc.: |
| 1096 | |
| 1097 | SV *tmp = sv_newmortal(); |
| 1098 | sv_setiv(tmp, an_integer); |
| 1099 | |
| 1100 | As that is multiple C statements, it is quite common to see this idiom instead: |
| 1101 | |
| 1102 | SV *tmp = sv_2mortal(newSViv(an_integer)); |
| 1103 | |
| 1104 | The mortal routines are not just for SVs; AVs and HVs can be |
| 1105 | made mortal by passing their address (type-casted to C<SV*>) to the |
| 1106 | C<sv_2mortal> or C<sv_mortalcopy> routines. |
| 1107 | |
| 1108 | =head2 Stashes and Globs |
| 1109 | |
| 1110 | A B<stash> is a hash that contains all variables that are defined |
| 1111 | within a package. Each key of the stash is a symbol |
| 1112 | name (shared by all the different types of objects that have the same |
| 1113 | name), and each value in the hash table is a GV (Glob Value). This GV |
| 1114 | in turn contains references to the various objects of that name, |
| 1115 | including (but not limited to) the following: |
| 1116 | |
| 1117 | Scalar Value |
| 1118 | Array Value |
| 1119 | Hash Value |
| 1120 | I/O Handle |
| 1121 | Format |
| 1122 | Subroutine |
| 1123 | |
| 1124 | There is a single stash called C<PL_defstash> that holds the items that exist |
| 1125 | in the C<main> package. To get at the items in other packages, append the |
| 1126 | string "::" to the package name. The items in the C<Foo> package are in |
| 1127 | the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package are |
| 1128 | in the stash C<Baz::> in C<Bar::>'s stash. |
| 1129 | |
| 1130 | =for apidoc_section $GV |
| 1131 | =for apidoc Amnh||PL_defstash |
| 1132 | |
| 1133 | To get the stash pointer for a particular package, use one of these functions: |
| 1134 | |
| 1135 | HV* gv_stashpv(const char* name, I32 flags) |
| 1136 | HV* gv_stashsv(SV*, I32 flags) |
| 1137 | |
| 1138 | The first function takes a C string, the second uses the string stored |
| 1139 | in the SV. Remember that a stash is just a hash table, so you get back an |
| 1140 | C<HV*>. If C<flags> is set to C<GV_ADD>, the package is created if it does not exist. |
| 1141 | |
| 1142 | The name that C<gv_stash*v> wants is the name of the package whose symbol table |
| 1143 | you want. The default package is called C<main>. If you have multiply nested |
| 1144 | packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
| 1145 | language itself. |
| 1146 | |
| 1147 | Alternately, if you have an SV that is a blessed reference, you can find |
| 1148 | out the stash pointer by using: |
| 1149 | |
| 1150 | HV* SvSTASH(SvRV(SV*)); |
| 1151 | |
| 1152 | then use the following to get the package name itself: |
| 1153 | |
| 1154 | char* HvNAME(HV* stash); |
| 1155 | |
| 1156 | If you need to bless or re-bless an object you can use the following |
| 1157 | function: |
| 1158 | |
| 1159 | SV* sv_bless(SV*, HV* stash) |
| 1160 | |
| 1161 | where the first argument, an C<SV*>, must be a reference, and the second |
| 1162 | argument is a stash. The returned C<SV*> can now be used in the same way |
| 1163 | as any other SV. |
| 1164 | |
| 1165 | For more information on references and blessings, consult L<perlref>. |
| 1166 | |
| 1167 | =head2 I/O Handles |
| 1168 | |
| 1169 | Like AVs and HVs, IO objects are another type of non-scalar SV which |
| 1170 | may contain input and output L<PerlIO|perlapio> objects or a C<DIR *> |
| 1171 | from opendir(). |
| 1172 | |
| 1173 | You can create a new IO object: |
| 1174 | |
| 1175 | IO* newIO(); |
| 1176 | |
| 1177 | Unlike other SVs, a new IO object is automatically blessed into the |
| 1178 | L<IO::File> class. |
| 1179 | |
| 1180 | The IO object contains an input and output PerlIO handle: |
| 1181 | |
| 1182 | PerlIO *IoIFP(IO *io); |
| 1183 | PerlIO *IoOFP(IO *io); |
| 1184 | |
| 1185 | =for apidoc_section $io |
| 1186 | =for apidoc Amh|PerlIO *|IoIFP|IO *io |
| 1187 | =for apidoc Amh|PerlIO *|IoOFP|IO *io |
| 1188 | |
| 1189 | Typically if the IO object has been opened on a file, the input handle |
| 1190 | is always present, but the output handle is only present if the file |
| 1191 | is open for output. For a file, if both are present they will be the |
| 1192 | same PerlIO object. |
| 1193 | |
| 1194 | Distinct input and output PerlIO objects are created for sockets and |
| 1195 | character devices. |
| 1196 | |
| 1197 | The IO object also contains other data associated with Perl I/O |
| 1198 | handles: |
| 1199 | |
| 1200 | IV IoLINES(io); /* $. */ |
| 1201 | IV IoPAGE(io); /* $% */ |
| 1202 | IV IoPAGE_LEN(io); /* $= */ |
| 1203 | IV IoLINES_LEFT(io); /* $- */ |
| 1204 | char *IoTOP_NAME(io); /* $^ */ |
| 1205 | GV *IoTOP_GV(io); /* $^ */ |
| 1206 | char *IoFMT_NAME(io); /* $~ */ |
| 1207 | GV *IoFMT_GV(io); /* $~ */ |
| 1208 | char *IoBOTTOM_NAME(io); |
| 1209 | GV *IoBOTTOM_GV(io); |
| 1210 | char IoTYPE(io); |
| 1211 | U8 IoFLAGS(io); |
| 1212 | |
| 1213 | =for apidoc_sections $io_scn, $formats_section |
| 1214 | =for apidoc_section $reports |
| 1215 | =for apidoc Amh|IV|IoLINES|IO *io |
| 1216 | =for apidoc Amh|IV|IoPAGE|IO *io |
| 1217 | =for apidoc Amh|IV|IoPAGE_LEN|IO *io |
| 1218 | =for apidoc Amh|IV|IoLINES_LEFT|IO *io |
| 1219 | =for apidoc Amh|char *|IoTOP_NAME|IO *io |
| 1220 | =for apidoc Amh|GV *|IoTOP_GV|IO *io |
| 1221 | =for apidoc Amh|char *|IoFMT_NAME|IO *io |
| 1222 | =for apidoc Amh|GV *|IoFMT_GV|IO *io |
| 1223 | =for apidoc Amh|char *|IoBOTTOM_NAME|IO *io |
| 1224 | =for apidoc Amh|GV *|IoBOTTOM_GV|IO *io |
| 1225 | =for apidoc_section $io |
| 1226 | =for apidoc Amh|char|IoTYPE|IO *io |
| 1227 | =for apidoc Amh|U8|IoFLAGS|IO *io |
| 1228 | |
| 1229 | Most of these are involved with L<formats|perlform>. |
| 1230 | |
| 1231 | C<IoFLAGS()> may contain a combination of flags, the most interesting of |
| 1232 | which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>, |
| 1233 | settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>. |
| 1234 | |
| 1235 | =for apidoc Amnh||IOf_FLUSH |
| 1236 | =for apidoc Amnh||IOf_UNTAINT |
| 1237 | |
| 1238 | The IO object may also contain a directory handle: |
| 1239 | |
| 1240 | DIR *IoDIRP(io); |
| 1241 | |
| 1242 | =for apidoc Amh|DIR *|IoDIRP|IO *io |
| 1243 | |
| 1244 | suitable for use with PerlDir_read() etc. |
| 1245 | |
| 1246 | All of these accessor macros are lvalues; there are no distinct |
| 1247 | C<_set()> macros to modify the members of the IO object. |
| 1248 | |
| 1249 | =head2 Double-Typed SVs |
| 1250 | |
| 1251 | Scalar variables normally contain only one type of value, an integer, |
| 1252 | double, pointer, or reference. Perl will automatically convert the |
| 1253 | actual scalar data from the stored type into the requested type. |
| 1254 | |
| 1255 | Some scalar variables contain more than one type of scalar data. For |
| 1256 | example, the variable C<$!> contains either the numeric value of C<errno> |
| 1257 | or its string equivalent from either C<strerror> or C<sys_errlist[]>. |
| 1258 | |
| 1259 | To force multiple data values into an SV, you must do two things: use the |
| 1260 | C<sv_set*v> routines to add the additional scalar type, then set a flag |
| 1261 | so that Perl will believe it contains more than one type of data. The |
| 1262 | four macros to set the flags are: |
| 1263 | |
| 1264 | SvIOK_on |
| 1265 | SvNOK_on |
| 1266 | SvPOK_on |
| 1267 | SvROK_on |
| 1268 | |
| 1269 | The particular macro you must use depends on which C<sv_set*v> routine |
| 1270 | you called first. This is because every C<sv_set*v> routine turns on |
| 1271 | only the bit for the particular type of data being set, and turns off |
| 1272 | all the rest. |
| 1273 | |
| 1274 | For example, to create a new Perl variable called "dberror" that contains |
| 1275 | both the numeric and descriptive string error values, you could use the |
| 1276 | following code: |
| 1277 | |
| 1278 | extern int dberror; |
| 1279 | extern char *dberror_list[]; |
| 1280 | |
| 1281 | SV* sv = get_sv("dberror", GV_ADD); |
| 1282 | sv_setiv(sv, (IV) dberror); |
| 1283 | sv_setpv(sv, dberror_list[dberror]); |
| 1284 | SvIOK_on(sv); |
| 1285 | |
| 1286 | If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the |
| 1287 | macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. |
| 1288 | |
| 1289 | =head2 Read-Only Values |
| 1290 | |
| 1291 | In Perl 5.16 and earlier, copy-on-write (see the next section) shared a |
| 1292 | flag bit with read-only scalars. So the only way to test whether |
| 1293 | C<sv_setsv>, etc., will raise a "Modification of a read-only value" error |
| 1294 | in those versions is: |
| 1295 | |
| 1296 | SvREADONLY(sv) && !SvIsCOW(sv) |
| 1297 | |
| 1298 | Under Perl 5.18 and later, SvREADONLY only applies to read-only variables, |
| 1299 | and, under 5.20, copy-on-write scalars can also be read-only, so the above |
| 1300 | check is incorrect. You just want: |
| 1301 | |
| 1302 | SvREADONLY(sv) |
| 1303 | |
| 1304 | If you need to do this check often, define your own macro like this: |
| 1305 | |
| 1306 | #if PERL_VERSION >= 18 |
| 1307 | # define SvTRULYREADONLY(sv) SvREADONLY(sv) |
| 1308 | #else |
| 1309 | # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv)) |
| 1310 | #endif |
| 1311 | |
| 1312 | =head2 Copy on Write |
| 1313 | |
| 1314 | Perl implements a copy-on-write (COW) mechanism for scalars, in which |
| 1315 | string copies are not immediately made when requested, but are deferred |
| 1316 | until made necessary by one or the other scalar changing. This is mostly |
| 1317 | transparent, but one must take care not to modify string buffers that are |
| 1318 | shared by multiple SVs. |
| 1319 | |
| 1320 | You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>. |
| 1321 | |
| 1322 | You can force an SV to make its own copy of its string buffer by calling C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>. |
| 1323 | |
| 1324 | If you want to make the SV drop its string buffer, use |
| 1325 | C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply |
| 1326 | C<sv_setsv(sv, NULL)>. |
| 1327 | |
| 1328 | All of these functions will croak on read-only scalars (see the previous |
| 1329 | section for more on those). |
| 1330 | |
| 1331 | To test that your code is behaving correctly and not modifying COW buffers, |
| 1332 | on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with |
| 1333 | C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations |
| 1334 | into crashes. You will find it to be marvellously slow, so you may want to |
| 1335 | skip perl's own tests. |
| 1336 | |
| 1337 | =head2 Magic Variables |
| 1338 | |
| 1339 | [This section still under construction. Ignore everything here. Post no |
| 1340 | bills. Everything not permitted is forbidden.] |
| 1341 | |
| 1342 | Any SV may be magical, that is, it has special features that a normal |
| 1343 | SV does not have. These features are stored in the SV structure in a |
| 1344 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
| 1345 | |
| 1346 | struct magic { |
| 1347 | MAGIC* mg_moremagic; |
| 1348 | MGVTBL* mg_virtual; |
| 1349 | U16 mg_private; |
| 1350 | char mg_type; |
| 1351 | U8 mg_flags; |
| 1352 | I32 mg_len; |
| 1353 | SV* mg_obj; |
| 1354 | char* mg_ptr; |
| 1355 | }; |
| 1356 | |
| 1357 | Note this is current as of patchlevel 0, and could change at any time. |
| 1358 | |
| 1359 | =head2 Assigning Magic |
| 1360 | |
| 1361 | Perl adds magic to an SV using the C<sv_magic> function: |
| 1362 | |
| 1363 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
| 1364 | |
| 1365 | The C<sv> argument is a pointer to the SV that is to acquire a new magical |
| 1366 | feature. |
| 1367 | |
| 1368 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to |
| 1369 | convert C<sv> to type C<SVt_PVMG>. |
| 1370 | Perl then continues by adding new magic |
| 1371 | to the beginning of the linked list of magical features. Any prior entry |
| 1372 | of the same type of magic is deleted. Note that this can be overridden, |
| 1373 | and multiple instances of the same type of magic can be associated with an |
| 1374 | SV. |
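
The prepend-and-replace behaviour described above can be pictured with a
standalone toy model. This is only a sketch: C<ToySV>, C<ToyMagic>,
C<toy_magic_add> and the other C<toy_*> names are hypothetical, nothing here
uses the Perl headers, and it is not how C<sv_magic> is actually written.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for MAGIC and an SV's magic chain; not Perl's real types. */
typedef struct toy_magic {
    struct toy_magic *moremagic;  /* next entry, like mg_moremagic */
    char              type;       /* like mg_type */
    int               payload;    /* arbitrary data for the demo */
} ToyMagic;

typedef struct { ToyMagic *chain; } ToySV;

/* Remove every entry of the given type; return how many were freed. */
static int toy_unmagic(ToySV *sv, char type)
{
    int removed = 0;
    ToyMagic **link = &sv->chain;
    while (*link) {
        ToyMagic *mg = *link;
        if (mg->type == type) {
            *link = mg->moremagic;
            free(mg);
            removed++;
        } else {
            link = &mg->moremagic;
        }
    }
    return removed;
}

/* Drop any prior entry of this type, then push the new one at the head. */
static ToyMagic *toy_magic_add(ToySV *sv, char type, int payload)
{
    ToyMagic *mg;
    assert(sv != NULL);
    toy_unmagic(sv, type);
    mg = malloc(sizeof *mg);
    if (!mg)
        return NULL;
    mg->type = type;
    mg->payload = payload;
    mg->moremagic = sv->chain;
    sv->chain = mg;
    return mg;
}

/* Count the entries currently on the chain. */
static int toy_magic_count(const ToySV *sv)
{
    int n = 0;
    const ToyMagic *mg;
    for (mg = sv->chain; mg; mg = mg->moremagic)
        n++;
    return n;
}
```

Adding type C<'U'>, then C<'~'>, then C<'U'> again leaves two entries on the
toy chain, with the newest C<'U'> entry at the head.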
| 1375 | |
| 1376 | The C<name> and C<namlen> arguments are used to associate a string with |
| 1377 | the magic, typically the name of a variable. C<namlen> is stored in the |
| 1378 | C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of |
| 1379 | C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on |
| 1380 | whether C<namlen> is greater than zero or equal to zero respectively. As a |
| 1381 | special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed |
| 1382 | to contain an C<SV*> and is stored as-is with its REFCNT incremented. |
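
The three cases for C<name> and C<namlen> can be summarised in a small
standalone sketch. The names C<ToyName>, C<toy_store_name> and C<TOY_SVKEY>
are hypothetical; C<TOY_SVKEY> merely plays the role of C<HEf_SVKEY>, its
value is an assumption of this sketch, and the reference count increment
that the real special case performs is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SVKEY (-2)   /* hypothetical marker playing the role of HEf_SVKEY */

typedef struct {
    char *ptr;    /* like mg_ptr */
    int   len;    /* like mg_len */
    int   owned;  /* demo-only flag: 1 if ptr is a private copy */
} ToyName;

/* Apply the storage rules described above to name/namlen. */
static ToyName toy_store_name(const char *name, int namlen)
{
    ToyName out = { NULL, 0, 0 };
    assert(namlen >= 0 || namlen == TOY_SVKEY);
    out.len = namlen;                 /* the length is always recorded */
    if (!name)
        return out;
    if (namlen > 0) {
        /* copy namlen bytes and NUL-terminate, much as savepvn would */
        char *copy = malloc((size_t)namlen + 1);
        if (copy) {
            memcpy(copy, name, (size_t)namlen);
            copy[namlen] = '\0';
            out.ptr = copy;
            out.owned = 1;
        }
    } else {
        /* namlen == 0 or TOY_SVKEY: keep the caller's pointer as-is */
        out.ptr = (char *)name;
    }
    return out;
}
```

The practical consequence is ownership: with a positive length the caller's
buffer may be reused afterwards, whereas with a zero length the stored
pointer must stay valid for as long as the magic exists.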
| 1383 | |
| 1384 | The sv_magic function uses C<how> to determine which, if any, predefined |
| 1385 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. |
| 1386 | See the L</Magic Virtual Tables> section below. The C<how> argument is also |
| 1387 | stored in the C<mg_type> field. The value of |
| 1388 | C<how> should be chosen from the set of macros |
| 1389 | C<PERL_MAGIC_foo> found in F<perl.h>. Note that before |
| 1390 | these macros were added, Perl internals used to directly use character |
| 1391 | literals, so you may occasionally come across old code or documentation |
| 1392 | referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example. |
| 1393 | |
| 1394 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> |
| 1395 | structure. If it is not the same as the C<sv> argument, the reference |
| 1396 | count of the C<obj> object is incremented. If it is the same, or if |
| 1397 | the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>, |
| 1398 | C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely |
| 1399 | stored, without the reference count being incremented. |
| 1400 | |
| 1401 | See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic |
| 1402 | to an SV. |
| 1403 | |
| 1404 | There is also a function to add magic to an C<HV>: |
| 1405 | |
| 1406 | void hv_magic(HV *hv, GV *gv, int how); |
| 1407 | |
| 1408 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. |
| 1409 | |
| 1410 | To remove the magic from an SV, call the function sv_unmagic: |
| 1411 | |
| 1412 | int sv_unmagic(SV *sv, int type); |
| 1413 | |
| 1414 | The C<type> argument should be equal to the C<how> value when the C<SV> |
| 1415 | was initially made magical. |
| 1416 | |
| 1417 | However, note that C<sv_unmagic> removes all magic of a certain C<type> from the |
| 1418 | C<SV>. If you want to remove only certain |
| 1419 | magic of a C<type> based on the magic |
| 1420 | virtual table, use C<sv_unmagicext> instead: |
| 1421 | |
| 1422 | int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); |
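
The difference between the two removal functions can be sketched with a
standalone toy chain. Everything named C<toy_*> or C<Toy*> is hypothetical
and independent of the Perl headers; the sketch only unlinks caller-owned
nodes and frees nothing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int unused; } ToyVtbl;   /* identity matters, contents do not */

typedef struct toy_magic {
    struct toy_magic *moremagic;
    char              type;
    const ToyVtbl    *vtbl;
} ToyMagic;

/* Unlink entries of `type`; when match_vtbl is nonzero, only those whose
 * vtbl pointer equals `vtbl`. Returns the number of entries unlinked. */
static int toy_unlink(ToyMagic **head, char type,
                      const ToyVtbl *vtbl, int match_vtbl)
{
    int removed = 0;
    assert(head != NULL);
    while (*head) {
        ToyMagic *mg = *head;
        if (mg->type == type && (!match_vtbl || mg->vtbl == vtbl)) {
            *head = mg->moremagic;
            removed++;
        } else {
            head = &mg->moremagic;
        }
    }
    return removed;
}

/* Counterpart of sv_unmagic in this toy: every entry of the type goes. */
static int toy_unmagic(ToyMagic **head, char type)
{
    return toy_unlink(head, type, NULL, 0);
}

/* Counterpart of sv_unmagicext: the vtbl pointer must match as well. */
static int toy_unmagicext(ToyMagic **head, char type, const ToyVtbl *vtbl)
{
    return toy_unlink(head, type, vtbl, 1);
}
```

With two C<'~'> entries owned by different modules, C<toy_unmagicext>
removes only the one whose table address matches, while C<toy_unmagic>
would remove both.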
| 1423 | |
| 1424 | =head2 Magic Virtual Tables |
| 1425 | |
| 1426 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an |
| 1427 | C<MGVTBL> (short for "Magic Virtual Table"), a structure of function |
| 1428 | pointers that handle the various operations that might be applied to |
| 1429 | that variable. |
| 1430 | |
| 1431 | =for apidoc_section $magic |
| 1432 | =for apidoc Ayh||MGVTBL |
| 1433 | |
| 1434 | The C<MGVTBL> has five (or sometimes eight) pointers to the following |
| 1435 | routine types: |
| 1436 | |
| 1437 | int (*svt_get) (pTHX_ SV* sv, MAGIC* mg); |
| 1438 | int (*svt_set) (pTHX_ SV* sv, MAGIC* mg); |
| 1439 | U32 (*svt_len) (pTHX_ SV* sv, MAGIC* mg); |
| 1440 | int (*svt_clear)(pTHX_ SV* sv, MAGIC* mg); |
| 1441 | int (*svt_free) (pTHX_ SV* sv, MAGIC* mg); |
| 1442 | |
| 1443 | int (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv, |
| 1444 | const char *name, I32 namlen); |
| 1445 | int (*svt_dup) (pTHX_ MAGIC *mg, CLONE_PARAMS *param); |
| 1446 | int (*svt_local)(pTHX_ SV *nsv, MAGIC *mg); |
| 1447 | |
| 1448 | |
| 1449 | This MGVTBL structure is set at compile-time in F<perl.h> and there are |
| 1450 | currently 32 types. These different structures contain pointers to various |
| 1451 | routines that perform additional actions depending on which function is |
| 1452 | being called. |
| 1453 | |
| 1454 | Function pointer Action taken |
| 1455 | ---------------- ------------ |
| 1456 | svt_get Do something before the value of the SV is |
| 1457 | retrieved. |
| 1458 | svt_set Do something after the SV is assigned a value. |
| 1459 | svt_len Report on the SV's length. |
| 1460 | svt_clear Clear something the SV represents. |
| 1461 | svt_free Free any extra storage associated with the SV. |
| 1462 | |
| 1463 | svt_copy Copy tied variable magic to a tied element. |
| 1464 | svt_dup Duplicate a magic structure during thread cloning. |
| 1465 | svt_local Copy magic to local value during 'local'. |
| 1466 | |
| 1467 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
| 1468 | to an C<mg_type> of C<PERL_MAGIC_sv>) contains: |
| 1469 | |
| 1470 | { magic_get, magic_set, magic_len, 0, 0 } |
| 1471 | |
| 1472 | Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>, |
| 1473 | if a get operation is being performed, the routine C<magic_get> is |
| 1474 | called. All the various routines for the various magical types begin |
| 1475 | with C<magic_>. NOTE: the magic routines are not considered part of |
| 1476 | the Perl API, and may not be exported by the Perl library. |
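
The dispatch just described, where a hook is called only if its slot is
filled in, can be modelled in a few lines of standalone C. This is a
simplified sketch: C<ToyVtbl>, C<toy_vtbl_sv>, C<toy_fetch> and
C<toy_store> are hypothetical names, and the real hooks also take a
C<MAGIC*> and the interpreter context.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv ToySV;

/* Five-slot table in the spirit of MGVTBL; the names are hypothetical. */
typedef struct {
    int (*get)(ToySV *sv);
    int (*set)(ToySV *sv);
    int (*len)(ToySV *sv);
    int (*clear)(ToySV *sv);
    int (*free_)(ToySV *sv);
} ToyVtbl;

struct toy_sv {
    int            value;
    int            gets;   /* how many times the get hook ran */
    const ToyVtbl *vtbl;   /* NULL for a non-magical value */
};

/* Demo get hook: refresh the value and count the call. */
static int toy_magic_get(ToySV *sv)
{
    sv->gets++;
    sv->value = 42;
    return 0;
}

/* As with vtbl_sv in the text, unused slots are simply left as 0. */
static const ToyVtbl toy_vtbl_sv = { toy_magic_get, 0, 0, 0, 0 };

/* Read a value, running the get hook first when one is installed. */
static int toy_fetch(ToySV *sv)
{
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->get)
        sv->vtbl->get(sv);
    return sv->value;
}

/* Store a value, running the set hook afterwards when one is installed. */
static void toy_store(ToySV *sv, int v)
{
    assert(sv != NULL);
    sv->value = v;
    if (sv->vtbl && sv->vtbl->set)
        sv->vtbl->set(sv);
}
```

A value with no table behaves like plain storage; one that points at
C<toy_vtbl_sv> has its get hook run on every C<toy_fetch>, and its empty
set slot is skipped on C<toy_store>.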
| 1477 | |
| 1478 | The last three slots are a recent addition, and for source code |
| 1479 | compatibility they are only checked for if one of the three flags |
| 1480 | C<MGf_COPY>, C<MGf_DUP>, or C<MGf_LOCAL> is set in mg_flags. |
| 1481 | This means that most code can continue declaring |
| 1482 | a vtable as a 5-element value. These three are |
| 1483 | currently used exclusively by the threading code, and are highly subject |
| 1484 | to change. |
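
Why a five-element initializer keeps working can be seen from plain C
initialization rules: members without an initializer are zero-filled. The
standalone sketch below assumes a hypothetical eight-slot C<ToyVtbl> and a
made-up flag value C<TOY_MGf_COPY>; neither is taken from the Perl headers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MGf_COPY 0x08   /* hypothetical flag value for this sketch only */

typedef int (*toy_hook)(void *sv);

typedef struct {
    toy_hook get, set, len, clear, free_;   /* the original five slots */
    toy_hook copy, dup, local;              /* the three later additions */
} ToyVtbl;

static int toy_noop(void *sv) { (void)sv; return 0; }

/* Only five initializers: C zero-fills copy, dup and local. */
static const ToyVtbl toy_five = { toy_noop, toy_noop, 0, 0, 0 };

/* All eight slots spelled out, with a copy hook installed. */
static const ToyVtbl toy_eight = { 0, 0, 0, 0, 0, toy_noop, 0, 0 };

/* Return the copy hook only when the flag says the slot is meaningful. */
static toy_hook toy_copy_hook(const ToyVtbl *vtbl, unsigned flags)
{
    assert(vtbl != NULL);
    return (flags & TOY_MGf_COPY) ? vtbl->copy : NULL;
}
```

In this model a table declared with five values never exposes a copy hook,
and an eight-value table exposes one only when the matching flag is set.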
| 1485 | |
| 1486 | =for apidoc_section $magic |
| 1487 | =for apidoc Amnh||MGf_COPY |
| 1488 | =for apidoc_item ||MGf_DUP |
| 1489 | =for apidoc_item ||MGf_LOCAL |
| 1490 | |
| 1491 | The current kinds of Magic Virtual Tables are: |
| 1492 | |
| 1493 | =for comment |
| 1494 | This table is generated by regen/mg_vtable.pl. Any changes made here |
| 1495 | will be lost. |
| 1496 | |
| 1497 | =for mg_vtable.pl begin |
| 1498 | |
| 1499 | mg_type |
| 1500 | (old-style char and macro) MGVTBL Type of magic |
| 1501 | -------------------------- ------ ------------- |
| 1502 | \0 PERL_MAGIC_sv vtbl_sv Special scalar variable |
| 1503 | # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) |
| 1504 | % PERL_MAGIC_rhash (none) Extra data for restricted |
| 1505 | hashes |
| 1506 | * PERL_MAGIC_debugvar vtbl_debugvar $DB::single, signal, trace |
| 1507 | vars |
| 1508 | . PERL_MAGIC_pos vtbl_pos pos() lvalue |
| 1509 | : PERL_MAGIC_symtab (none) Extra data for symbol |
| 1510 | tables |
| 1511 | < PERL_MAGIC_backref vtbl_backref For weak ref data |
| 1512 | @ PERL_MAGIC_arylen_p (none) To move arylen out of XPVAV |
| 1513 | B PERL_MAGIC_bm vtbl_regexp Boyer-Moore |
| 1514 | (fast string search) |
| 1515 | c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table |
| 1516 | (AMT) on stash |
| 1517 | D PERL_MAGIC_regdata vtbl_regdata Regex match position data |
| 1518 | (@+ and @- vars) |
| 1519 | d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data |
| 1520 | element |
| 1521 | E PERL_MAGIC_env vtbl_env %ENV hash |
| 1522 | e PERL_MAGIC_envelem vtbl_envelem %ENV hash element |
| 1523 | f PERL_MAGIC_fm vtbl_regexp Formline |
| 1524 | ('compiled' format) |
| 1525 | g PERL_MAGIC_regex_global vtbl_mglob m//g target |
| 1526 | H PERL_MAGIC_hints vtbl_hints %^H hash |
| 1527 | h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element |
| 1528 | I PERL_MAGIC_isa vtbl_isa @ISA array |
| 1529 | i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element |
| 1530 | k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue |
| 1531 | L PERL_MAGIC_dbfile (none) Debugger %_<filename |
| 1532 | l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename |
| 1533 | element |
| 1534 | N PERL_MAGIC_shared (none) Shared between threads |
| 1535 | n PERL_MAGIC_shared_scalar (none) Shared between threads |
| 1536 | o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation |
| 1537 | P PERL_MAGIC_tied vtbl_pack Tied array or hash |
| 1538 | p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element |
| 1539 | q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle |
| 1540 | r PERL_MAGIC_qr vtbl_regexp Precompiled qr// regex |
| 1541 | S PERL_MAGIC_sig vtbl_sig %SIG hash |
| 1542 | s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element |
| 1543 | t PERL_MAGIC_taint vtbl_taint Taintedness |
| 1544 | U PERL_MAGIC_uvar vtbl_uvar Available for use by |
| 1545 | extensions |
| 1546 | u PERL_MAGIC_uvar_elem (none) Reserved for use by |
| 1547 | extensions |
| 1548 | V PERL_MAGIC_vstring (none) SV was vstring literal |
| 1549 | v PERL_MAGIC_vec vtbl_vec vec() lvalue |
| 1550 | w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information |
| 1551 | X PERL_MAGIC_destruct vtbl_destruct destruct callback |
| 1552 | x PERL_MAGIC_substr vtbl_substr substr() lvalue |
| 1553 | Y PERL_MAGIC_nonelem vtbl_nonelem Array element that does not |
| 1554 | exist |
| 1555 | y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator |
| 1556 | variable / smart parameter |
| 1557 | vivification |
| 1558 | Z PERL_MAGIC_hook vtbl_hook %{^HOOK} hash |
| 1559 | z PERL_MAGIC_hookelem vtbl_hookelem %{^HOOK} hash element |
| 1560 | \ PERL_MAGIC_lvref vtbl_lvref Lvalue reference |
| 1561 | constructor |
| 1562 | ] PERL_MAGIC_checkcall vtbl_checkcall Inlining/mutation of call |
| 1563 | to this CV |
| 1564 | ^ PERL_MAGIC_extvalue (none) Value magic available for |
| 1565 | use by extensions |
| 1566 | ~ PERL_MAGIC_ext (none) Variable magic available |
| 1567 | for use by extensions |
| 1568 | |
| 1569 | |
| 1570 | =for apidoc_section $magic |
| 1571 | =for apidoc AmnhU||PERL_MAGIC_arylen |
| 1572 | =for apidoc_item ||PERL_MAGIC_arylen_p |
| 1573 | =for apidoc_item ||PERL_MAGIC_backref |
| 1574 | =for apidoc_item ||PERL_MAGIC_bm |
| 1575 | =for apidoc_item ||PERL_MAGIC_checkcall |
| 1576 | =for apidoc_item ||PERL_MAGIC_collxfrm |
| 1577 | =for apidoc_item ||PERL_MAGIC_dbfile |
| 1578 | =for apidoc_item ||PERL_MAGIC_dbline |
| 1579 | =for apidoc_item ||PERL_MAGIC_debugvar |
| 1580 | =for apidoc_item ||PERL_MAGIC_defelem |
| 1581 | =for apidoc_item ||PERL_MAGIC_destruct |
| 1582 | =for apidoc_item ||PERL_MAGIC_env |
| 1583 | =for apidoc_item ||PERL_MAGIC_envelem |
| 1584 | =for apidoc_item ||PERL_MAGIC_ext |
| 1585 | =for apidoc_item ||PERL_MAGIC_extvalue |
| 1586 | =for apidoc_item ||PERL_MAGIC_fm |
| 1587 | =for apidoc_item ||PERL_MAGIC_hints |
| 1588 | =for apidoc_item ||PERL_MAGIC_hintselem |
| 1589 | =for apidoc_item ||PERL_MAGIC_hook |
| 1590 | =for apidoc_item ||PERL_MAGIC_hookelem |
| 1591 | =for apidoc_item ||PERL_MAGIC_isa |
| 1592 | =for apidoc_item ||PERL_MAGIC_isaelem |
| 1593 | =for apidoc_item ||PERL_MAGIC_lvref |
| 1594 | =for apidoc_item ||PERL_MAGIC_nkeys |
| 1595 | =for apidoc_item ||PERL_MAGIC_nonelem |
| 1596 | =for apidoc_item ||PERL_MAGIC_overload_table |
| 1597 | =for apidoc_item ||PERL_MAGIC_pos |
| 1598 | =for apidoc_item ||PERL_MAGIC_qr |
| 1599 | =for apidoc_item ||PERL_MAGIC_regdata |
| 1600 | =for apidoc_item ||PERL_MAGIC_regdatum |
| 1601 | =for apidoc_item ||PERL_MAGIC_regex_global |
| 1602 | =for apidoc_item ||PERL_MAGIC_rhash |
| 1603 | =for apidoc_item ||PERL_MAGIC_shared |
| 1604 | =for apidoc_item ||PERL_MAGIC_shared_scalar |
| 1605 | =for apidoc_item ||PERL_MAGIC_sig |
| 1606 | =for apidoc_item ||PERL_MAGIC_sigelem |
| 1607 | =for apidoc_item ||PERL_MAGIC_substr |
| 1608 | =for apidoc_item ||PERL_MAGIC_sv |
| 1609 | =for apidoc_item ||PERL_MAGIC_symtab |
| 1610 | =for apidoc_item ||PERL_MAGIC_taint |
| 1611 | =for apidoc_item ||PERL_MAGIC_tied |
| 1612 | =for apidoc_item ||PERL_MAGIC_tiedelem |
| 1613 | =for apidoc_item ||PERL_MAGIC_tiedscalar |
| 1614 | =for apidoc_item ||PERL_MAGIC_utf8 |
| 1615 | =for apidoc_item ||PERL_MAGIC_uvar |
| 1616 | =for apidoc_item ||PERL_MAGIC_uvar_elem |
| 1617 | =for apidoc_item ||PERL_MAGIC_vec |
| 1618 | =for apidoc_item ||PERL_MAGIC_vstring |
| 1619 | |
| 1620 | =for mg_vtable.pl end |
| 1621 | |
| 1622 | When an uppercase and lowercase letter both exist in the table, then the |
| 1623 | uppercase letter is typically used to represent some kind of composite type |
| 1624 | (a list or a hash), and the lowercase letter is used to represent an element |
| 1625 | of that composite type. Some internals code makes use of this case |
| 1626 | relationship. However, 'v' and 'V' (vec and v-string) are in no way related. |
| 1627 | |
| 1628 | The C<PERL_MAGIC_ext>, C<PERL_MAGIC_extvalue> and C<PERL_MAGIC_uvar> magic types |
| 1629 | are defined specifically for use by extensions and will not be used by perl |
| 1630 | itself. Extensions can use C<PERL_MAGIC_ext> or C<PERL_MAGIC_extvalue> magic to |
| 1631 | 'attach' private information to variables (typically objects). This is |
| 1632 | especially useful because there is no way for normal perl code to corrupt this |
| 1633 | private information (unlike using extra elements of a hash object). |
| 1634 | C<PERL_MAGIC_extvalue> is value magic (unlike C<PERL_MAGIC_ext> and |
| 1635 | C<PERL_MAGIC_uvar>) meaning that on localization the new value will not be |
| 1636 | magical. |
| 1637 | |
| 1638 | Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a |
| 1639 | C function any time a scalar's value is used or changed. The C<MAGIC>'s |
| 1640 | C<mg_ptr> field points to a C<ufuncs> structure: |
| 1641 | |
| 1642 | struct ufuncs { |
| 1643 | I32 (*uf_val)(pTHX_ IV, SV*); |
| 1644 | I32 (*uf_set)(pTHX_ IV, SV*); |
| 1645 | IV uf_index; |
| 1646 | }; |
| 1647 | |
| 1648 | When the SV is read from or written to, the C<uf_val> or C<uf_set> |
| 1649 | function will be called with C<uf_index> as the first arg and a pointer to |
| 1650 | the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> |
| 1651 | magic is shown below. Note that the ufuncs structure is copied by |
| 1652 | sv_magic, so you can safely allocate it on the stack. |
| 1653 | |
| 1654 | void |
| 1655 | Umagic(sv) |
| 1656 | SV *sv; |
| 1657 | PREINIT: |
| 1658 | struct ufuncs uf; |
| 1659 | CODE: |
| 1660 | uf.uf_val = &my_get_fn; |
| 1661 | uf.uf_set = &my_set_fn; |
| 1662 | uf.uf_index = 0; |
| 1663 | sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf)); |
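
The remark that the structure is copied can be illustrated with a
standalone toy model. It is only a sketch: C<ToySV>, C<toy_ufuncs>,
C<toy_attach>, C<toy_get> and C<toy_set> are hypothetical, the C<pTHX_>
context argument is omitted, and the copy is kept inside the toy value
rather than in a C<MAGIC> structure.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv ToySV;

/* Shaped like struct ufuncs, minus the interpreter context argument. */
struct toy_ufuncs {
    int  (*uf_val)(long index, ToySV *sv);
    int  (*uf_set)(long index, ToySV *sv);
    long  uf_index;
};

struct toy_sv {
    long              value;
    long              last_index;  /* index seen by the most recent hook call */
    int               has_uf;      /* nonzero once hooks are attached */
    struct toy_ufuncs uf;          /* private copy of the caller's struct */
};

/* Demo hook used for both reads and writes: remember the index passed in. */
static int toy_note(long index, ToySV *sv)
{
    sv->last_index = index;
    return 0;
}

/* Attach hooks by value, so the caller's struct may go out of scope. */
static void toy_attach(ToySV *sv, const struct toy_ufuncs *uf)
{
    assert(sv != NULL && uf != NULL);
    sv->uf = *uf;
    sv->has_uf = 1;
}

/* Read: the uf_val hook runs before the value is handed back. */
static long toy_get(ToySV *sv)
{
    if (sv->has_uf && sv->uf.uf_val)
        sv->uf.uf_val(sv->uf.uf_index, sv);
    return sv->value;
}

/* Write: the uf_set hook runs after the new value is in place. */
static void toy_set(ToySV *sv, long v)
{
    sv->value = v;
    if (sv->has_uf && sv->uf.uf_set)
        sv->uf.uf_set(sv->uf.uf_index, sv);
}
```

Because C<toy_attach> copies by value, changing or discarding the caller's
structure afterwards has no effect on which index the hooks receive.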
| 1664 | |
| 1665 | Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect. |
| 1666 | |
| 1667 | For hashes there is a specialized hook that gives control over hash |
| 1668 | keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic |
| 1669 | if the "set" function in the C<ufuncs> structure is NULL. The hook |
| 1670 | is activated whenever the hash is accessed with a key specified as |
| 1671 | an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>, |
| 1672 | C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string |
| 1673 | through the functions without the C<..._ent> suffix circumvents the |
| 1674 | hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description. |
| 1675 | |
| 1676 | Note that because multiple extensions may be using C<PERL_MAGIC_ext> |
| 1677 | or C<PERL_MAGIC_uvar> magic, it is important for extensions to take |
| 1678 | extra care to avoid conflict. Typically only using the magic on |
| 1679 | objects blessed into the same class as the extension is sufficient. |
| 1680 | For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an |
| 1681 | C<MGVTBL>, even if all its fields will be C<0>, so that individual |
| 1682 | C<MAGIC> pointers can be identified as a particular kind of magic |
| 1683 | using their magic virtual table. C<mg_findext> provides an easy way |
| 1684 | to do that: |
| 1685 | |
| 1686 | STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 }; |
| 1687 | |
| 1688 | MAGIC *mg; |
| 1689 | if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) { |
| 1690 | /* this is really ours, not another module's PERL_MAGIC_ext */ |
| 1691 | my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr; |
| 1692 | ... |
| 1693 | } |
| 1694 | |
| 1695 | Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
| 1696 | earlier do B<not> invoke 'set' magic on their targets. This must |
| 1697 | be done by the user either by calling the C<SvSETMAGIC()> macro after |
| 1698 | calling these functions, or by using one of the C<sv_set*_mg()> or |
| 1699 | C<sv_cat*_mg()> functions. Similarly, generic C code must call the |
| 1700 | C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV |
| 1701 | obtained from external sources in functions that don't handle magic. |
| 1702 | See L<perlapi> for a description of these functions. |
| 1703 | For example, calls to the C<sv_cat*()> functions typically need to be |
| 1704 | followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> |
| 1705 | since their implementation handles 'get' magic. |
| 1706 | |
| 1707 | =head2 Finding Magic |
| 1708 | |
| 1709 | MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that |
| 1710 | * type */ |
| 1711 | |
| 1712 | This routine returns a pointer to a C<MAGIC> structure stored in the SV. |
| 1713 | If the SV does not have that magical |
| 1714 | feature, C<NULL> is returned. If the |
| 1715 | SV has multiple instances of that magical feature, the first one will be |
| 1716 | returned. C<mg_findext> can be used |
| 1717 | to find a C<MAGIC> structure of an SV |
| 1718 | based on both its magic type and its magic virtual table: |
| 1719 | |
| 1720 | MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); |
| 1721 | |
| 1722 | Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type |
| 1723 | C<SVt_PVMG>, Perl may core dump. |
| 1724 | |
| 1725 | int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
| 1726 | |
| 1727 | This routine checks to see what types of magic C<sv> has. If the mg_type |
| 1728 | field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
| 1729 | the mg_type field is changed to be the lowercase letter. |
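
The uppercase-to-lowercase rule can be stated as a tiny standalone helper.
The name C<toy_element_type> is hypothetical, and the sketch covers only the
type mapping, not the copying of C<mg_obj> or any special cases the real
C<mg_copy> may handle.

```c
#include <ctype.h>

/* Element-magic type derived from a container-magic type, or 0 if the
 * container type is not an uppercase letter. */
static char toy_element_type(char container_type)
{
    unsigned char c = (unsigned char)container_type;
    return isupper(c) ? (char)tolower(c) : 0;
}
```

For instance, the tied-container type C<'P'> maps to the tied-element type
C<'p'>, while a non-letter type such as C<'~'> yields no element type.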
| 1730 | |
| 1731 | =head2 Understanding the Magic of Tied Hashes and Arrays |
| 1732 | |
| 1733 | Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> |
| 1734 | magic type. |
| 1735 | |
| 1736 | WARNING: As of the 5.004 release, proper usage of the array and hash |
| 1737 | access functions requires understanding a few caveats. Some |
| 1738 | of these caveats are actually considered bugs in the API, to be fixed |
| 1739 | in later releases, and are bracketed with [MAYCHANGE] below. If |
| 1740 | you find yourself actually applying such information in this section, be |
| 1741 | aware that the behavior may change in the future, umm, without warning. |
| 1742 | |
| 1743 | The perl tie function associates a variable with an object that implements |
| 1744 | the various GET, SET, etc. methods. To perform the equivalent of the perl |
| 1745 | tie function from an XSUB, you must mimic this behaviour. The code below |
| 1746 | carries out the necessary steps -- firstly it creates a new hash, and then |
| 1747 | creates a second hash which it blesses into the class which will implement |
| 1748 | the tie methods. Lastly it ties the two hashes together, and returns a |
| 1749 | reference to the new tied hash. Note that the code below does NOT call the |
| 1750 | TIEHASH method in the MyTie class - |
| 1751 | see L</Calling Perl Routines from within C Programs> for details on how |
| 1752 | to do this. |
| 1753 | |
| 1754 | SV* |
| 1755 | mytie() |
| 1756 | PREINIT: |
| 1757 | HV *hash; |
| 1758 | HV *stash; |
| 1759 | SV *tie; |
| 1760 | CODE: |
| 1761 | hash = newHV(); |
| 1762 | tie = newRV_noinc((SV*)newHV()); |
| 1763 | stash = gv_stashpv("MyTie", GV_ADD); |
| 1764 | sv_bless(tie, stash); |
| 1765 | hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); |
| 1766 | SvREFCNT_dec(tie); /* hv_magic() increases tie ref count */ |
| 1767 | RETVAL = newRV_noinc(hash); |
| 1768 | OUTPUT: |
| 1769 | RETVAL |
| 1770 | |
| 1771 | The C<av_store> function, when given a tied array argument, merely |
| 1772 | copies the magic of the array onto the value to be "stored", using |
| 1773 | C<mg_copy>. It may also return NULL, indicating that the value did not |
| 1774 | actually need to be stored in the array. [MAYCHANGE] After a call to |
| 1775 | C<av_store> on a tied array, the caller will usually need to call |
| 1776 | C<mg_set(val)> to actually invoke the perl level "STORE" method on the |
| 1777 | TIEARRAY object. If C<av_store> did return NULL, a call to |
| 1778 | C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory |
| 1779 | leak. [/MAYCHANGE] |
| 1780 | |
| 1781 | The previous paragraph is applicable verbatim to tied hash access using the |
| 1782 | C<hv_store> and C<hv_store_ent> functions as well. |
| 1783 | |
| 1784 | C<av_fetch> and the corresponding hash functions C<hv_fetch> and |
| 1785 | C<hv_fetch_ent> actually return an undefined mortal value whose magic |
| 1786 | has been initialized using C<mg_copy>. Note the value so returned does not |
| 1787 | need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
| 1788 | need to call C<mg_get()> on the returned value in order to actually invoke |
| 1789 | the perl level "FETCH" method on the underlying TIE object. Similarly, |
| 1790 | you may also call C<mg_set()> on the return value after possibly assigning |
| 1791 | a suitable value to it using C<sv_setsv>, which will invoke the "STORE" |
| 1792 | method on the TIE object. [/MAYCHANGE] |
| 1793 | |
| 1794 | [MAYCHANGE] |
| 1795 | In other words, the array or hash fetch/store functions don't really |
| 1796 | fetch and store actual values in the case of tied arrays and hashes. They |
| 1797 | merely call C<mg_copy> to attach magic to the values that were meant to be |
| 1798 | "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually |
| 1799 | do the job of invoking the TIE methods on the underlying objects. Thus |
| 1800 | the magic mechanism currently implements a kind of lazy access to arrays |
| 1801 | and hashes. |
| 1802 | |
| 1803 | Currently (as of perl version 5.004), use of the hash and array access |
| 1804 | functions requires the user to be aware of whether they are operating on |
| 1805 | "normal" hashes and arrays, or on their tied variants. The API may be |
| 1806 | changed to provide more transparent access to both tied and normal data |
| 1807 | types in future versions. |
| 1808 | [/MAYCHANGE] |
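
The lazy access described above can be modelled in standalone C. This is a
toy sketch under stated assumptions: C<ToyTied>, C<ToyElem>,
C<toy_av_fetch>, C<toy_mg_get> and C<toy_mg_set> are hypothetical names, and
plain function pointers stand in for the FETCH and STORE methods.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LEN 4

typedef struct toy_tied ToyTied;
struct toy_tied {
    int  (*fetch)(ToyTied *self, int index);         /* stands in for FETCH */
    void (*store)(ToyTied *self, int index, int v);  /* stands in for STORE */
    int  backing[TOY_LEN];
    int  fetch_calls;
};

/* What a "fetch" hands back: a placeholder that knows where it came from. */
typedef struct {
    ToyTied *owner;
    int      index;
    int      value;
    int      defined;
} ToyElem;

static int toy_fetch_impl(ToyTied *self, int index)
{
    self->fetch_calls++;
    return self->backing[index];
}

static void toy_store_impl(ToyTied *self, int index, int v)
{
    self->backing[index] = v;
}

/* Like av_fetch on a tied array: no method call yet, only bookkeeping. */
static ToyElem toy_av_fetch(ToyTied *tied, int index)
{
    ToyElem e;
    assert(tied != NULL && index >= 0 && index < TOY_LEN);
    e.owner = tied;
    e.index = index;
    e.value = 0;
    e.defined = 0;
    return e;
}

/* Like mg_get: this is where the fetch method actually runs. */
static int toy_mg_get(ToyElem *e)
{
    e->value = e->owner->fetch(e->owner, e->index);
    e->defined = 1;
    return e->value;
}

/* Like mg_set: push the element's current value through the store method. */
static void toy_mg_set(ToyElem *e)
{
    e->owner->store(e->owner, e->index, e->value);
}
```

In the model, C<toy_av_fetch> alone never touches the backing data; the
method runs only once C<toy_mg_get> or C<toy_mg_set> is called on the
placeholder.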
| 1809 | |
| 1810 | You would do well to understand that the TIEARRAY and TIEHASH interfaces |
| 1811 | are mere sugar to invoke some perl method calls while using the uniform hash |
| 1812 | and array syntax. The use of this sugar imposes some overhead (typically |
| 1813 | about two to four extra opcodes per FETCH/STORE operation, in addition to |
| 1814 | the creation of all the mortal variables required to invoke the methods). |
| 1815 | This overhead will be comparatively small if the TIE methods are themselves |
| 1816 | substantial, but if they are only a few statements long, the overhead |
| 1817 | will not be insignificant. |
| 1818 | |
| 1819 | =head2 Localizing changes |
| 1820 | |
| 1821 | Perl has a very handy construction |
| 1822 | |
| 1823 | { |
| 1824 | local $var = 2; |
| 1825 | ... |
| 1826 | } |
| 1827 | |
| 1828 | This construction is I<approximately> equivalent to |
| 1829 | |
| 1830 | { |
| 1831 | my $oldvar = $var; |
| 1832 | $var = 2; |
| 1833 | ... |
| 1834 | $var = $oldvar; |
| 1835 | } |
| 1836 | |
| 1837 | The biggest difference is that the first construction would |
| 1838 | reinstate the initial value of $var, irrespective of how control exits |
| 1839 | the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit |
| 1840 | more efficient as well. |
| 1841 | |
| 1842 | There is a way to achieve a similar task from C via the Perl API: create a |
| 1843 | I<pseudo-block>, and arrange for some changes to be automatically |
| 1844 | undone at the end of it, either explicitly or via a non-local exit (via |
| 1845 | die()). A I<block>-like construct is created by a pair of |
| 1846 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
| 1847 | Such a construct may be created specially for some important localized |
| 1848 | task, or an existing one (like boundaries of enclosing Perl |
| 1849 | subroutine/block, or an existing pair for freeing TMPs) may be |
| 1850 | used. (In the second case the overhead of additional localization must |
| 1851 | be almost negligible.) Note that any XSUB is automatically enclosed in |
| 1852 | an C<ENTER>/C<LEAVE> pair. |
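
The idea behind a I<pseudo-block> can be sketched as a small save stack in
standalone C. This is a toy model, not Perl's implementation: C<toy_enter>,
C<toy_save_int> and C<toy_leave> are hypothetical names that loosely mirror
C<ENTER>, C<SAVEINT> and C<LEAVE>, and only C<int> variables are handled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

typedef struct { int *addr; int old; } ToySaved;

static ToySaved toy_savestack[TOY_SAVE_MAX];
static size_t   toy_save_ix;                /* next free save slot */
static size_t   toy_scopes[TOY_SAVE_MAX];   /* save index at each enter */
static size_t   toy_scope_ix;

/* Open a pseudo-block: remember where the save stack currently ends. */
static void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

/* Record the current value of *p so that toy_leave() can restore it. */
static void toy_save_int(int *p)
{
    assert(p != NULL && toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Close the innermost pseudo-block, undoing its saves newest-first. */
static void toy_leave(void)
{
    size_t base;
    assert(toy_scope_ix > 0);
    base = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}
```

Nested blocks restore independently: leaving the inner block brings the
variable back to the value it had when the inner save was made, and leaving
the outer block brings back the original.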
| 1853 | |
| 1854 | Inside such a I<pseudo-block> the following service is available: |
| 1855 | |
| 1856 | =over 4 |
| 1857 | |
| 1858 | =item C<SAVEINT(int i)> |
| 1859 | |
| 1860 | =item C<SAVEIV(IV i)> |
| 1861 | |
| 1862 | =item C<SAVEI32(I32 i)> |
| 1863 | |
| 1864 | =item C<SAVELONG(long i)> |
| 1865 | |
| 1866 | =item C<SAVEI8(I8 i)> |
| 1867 | |
| 1868 | =item C<SAVEI16(I16 i)> |
| 1869 | |
| 1870 | =item C<SAVEBOOL(bool i)> |
| 1871 | |
| 1872 | =item C<SAVESTRLEN(STRLEN i)> |
| 1873 | |
| 1874 | These macros arrange things to restore the value of integer variable |
| 1875 | C<i> at the end of the enclosing I<pseudo-block>. |
| 1876 | |
| 1877 | =for apidoc_section $callback |
| 1878 | =for apidoc Amh||SAVEINT|int i |
| 1879 | =for apidoc Amh||SAVEIV|IV i |
| 1880 | =for apidoc Amh||SAVEI32|I32 i |
| 1881 | =for apidoc Amh||SAVELONG|long i |
| 1882 | =for apidoc Amh||SAVEI8|I8 i |
| 1883 | =for apidoc Amh||SAVEI16|I16 i |
| 1884 | =for apidoc Amh||SAVEBOOL|bool i |
| 1885 | =for apidoc Amh||SAVESTRLEN|STRLEN i |
| 1886 | |
| 1887 | =item C<SAVESPTR(s)> |
| 1888 | |
| 1889 | =item C<SAVEPPTR(p)> |
| 1890 | |
| 1891 | These macros arrange things to restore the value of pointers C<s> and |
| 1892 | C<p>. C<s> must be a pointer of a type which survives conversion to |
| 1893 | C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
| 1894 | and back. |
| 1895 | |
| 1896 | =for apidoc Amh||SAVESPTR|SV * s |
| 1897 | =for apidoc Amh||SAVEPPTR|char * p |
| 1898 | |
| 1899 | =item C<SAVERCPV(char **ppv)> |
| 1900 | |
| 1901 | This macro arranges to restore the value of a C<char *> variable which |
| 1902 | was allocated with a call to C<rcpv_new()> to its previous state when |
| 1903 | the current pseudo block is completed. The pointer stored in C<*ppv> at |
| 1904 | the time of the call will be refcount incremented and stored on the save |
| 1905 | stack. Later when the current I<pseudo-block> is completed the value |
| 1906 | stored in C<*ppv> will be refcount decremented, and the previous value |
| 1907 | restored from the savestack which will also be refcount decremented. |
| 1908 | |
| 1909 | This is the C<RCPV> equivalent of C<SAVEGENERICSV()>. |
| 1910 | |
| 1911 | =for apidoc Amh||SAVERCPV|char *pv |
| 1912 | |
| 1913 | =item C<SAVEGENERICSV(SV **psv)> |
| 1914 | |
| 1915 | This macro arranges to restore the value of a C<SV *> variable to its |
| 1916 | previous state when the current I<pseudo-block> is completed. The pointer |
| 1917 | stored in C<*psv> at the time of the call will be refcount incremented |
| 1918 | and stored on the save stack. Later when the current I<pseudo-block> is |
| 1919 | completed the value stored in C<*psv> will be refcount decremented, and |
| 1920 | the previous value restored from the savestack which will also be refcount |
| 1921 | decremented. This is the C equivalent of C<local $sv>. |
| 1922 | |
| 1923 | =for apidoc Amh||SAVEGENERICSV|char **psv |
| 1924 | |
| 1925 | =item C<SAVEFREESV(SV *sv)> |
| 1926 | |
| 1927 | The refcount of C<sv> will be decremented at the end of |
| 1928 | I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a |
| 1929 | mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> |
| 1930 | extends the lifetime of C<sv> until the beginning of the next statement, |
| 1931 | C<SAVEFREESV> extends it until the end of the enclosing scope. These |
| 1932 | lifetimes can be wildly different. |
| 1933 | |
| 1934 | Also compare C<SAVEMORTALIZESV>. |
| 1935 | |
| 1936 | =for apidoc Amh||SAVEFREESV|SV* sv |
| 1937 | |
| 1938 | =item C<SAVEMORTALIZESV(SV *sv)> |
| 1939 | |
| 1940 | Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current |
| 1941 | scope instead of decrementing its reference count. This usually has the |
| 1942 | effect of keeping C<sv> alive until the statement that called the currently |
| 1943 | live scope has finished executing. |
| 1944 | |
| 1945 | =for apidoc Amh||SAVEMORTALIZESV|SV* sv |
| 1946 | |
| 1947 | =item C<SAVEFREEOP(OP *op)> |
| 1948 | |
| 1949 | The C<OP *> is C<op_free()>ed at the end of I<pseudo-block>. |
| 1950 | |
| 1951 | =for apidoc Amh||SAVEFREEOP|OP *op |
| 1952 | |
| 1953 | =item C<SAVEFREEPV(p)> |
| 1954 | |
| 1955 | The chunk of memory which is pointed to by C<p> is C<Safefree()>ed at the |
| 1956 | end of the current I<pseudo-block>. |
| 1957 | |
| 1958 | =for apidoc Amh||SAVEFREEPV|char *pv |
| 1959 | |
| 1960 | =item C<SAVEFREERCPV(char *pv)> |
| 1961 | |
| 1962 | Ensures that a C<char *> which was created by a call to C<rcpv_new()> is |
| 1963 | C<rcpv_free()>ed at the end of the current I<pseudo-block>. |
| 1964 | |
| 1965 | This is the RCPV equivalent of C<SAVEFREESV()>. |
| 1966 | |
| 1967 | =for apidoc Amh||SAVEFREERCPV|char *pv |
| 1968 | |
| 1969 | =item C<SAVECLEARSV(SV *sv)> |
| 1970 | |
| 1971 | Clears a slot in the current scratchpad which corresponds to C<sv> at |
| 1972 | the end of I<pseudo-block>. |
| 1973 | |
| 1974 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)> |
| 1975 | |
| 1976 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
| 1977 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
| 1978 | short-lived storage, the corresponding string may be reallocated like |
| 1979 | this: |
| 1980 | |
| 1981 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
| 1982 | |
| 1983 | =for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length |
| 1984 | |
| 1985 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
| 1986 | |
| 1987 | At the end of I<pseudo-block> the function C<f> is called with the |
| 1988 | only argument C<p> which may be NULL. |
| 1989 | |
| 1990 | =for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t |
| 1991 | =for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p |
| 1992 | |
| 1993 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> |
| 1994 | |
| 1995 | At the end of I<pseudo-block> the function C<f> is called with the |
| 1996 | implicit context argument (if any), and C<p> which may be NULL. |
| 1997 | |
| 1998 | Note the I<end of the current pseudo-block> may occur much later than |
| 1999 | the I<end of the current statement>. You may wish to look at the |
| 2000 | C<MORTALSVFUNC_X()> macro instead. |
| 2001 | |
| 2002 | =for apidoc Ayh||DESTRUCTORFUNC_t |
| 2003 | =for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p |
| 2004 | |
| 2005 | =item C<MORTALSVFUNC_X(SVFUNC_t f, SV *sv)> |
| 2006 | |
| 2007 | At the end of I<the current statement> the function C<f> is called with |
| 2008 | the implicit context argument (if any), and C<sv> which may be NULL. |
| 2009 | |
| 2010 | Be aware that the parameter argument to the destructor function differs |
| 2011 | from the related C<SAVEDESTRUCTOR_X()> in that it MUST be either NULL or |
| 2012 | an C<SV*>. |
| 2013 | |
| 2014 | Note the I<end of the current statement> may occur well before the |
| 2015 | I<end of the current pseudo-block>. You may wish to look at the |
| 2016 | C<SAVEDESTRUCTOR_X()> macro instead. |
| 2017 | |
| 2018 | =for apidoc Amh||MORTALSVFUNC_X|SVFUNC_t f|SV *sv |
| 2019 | |
| 2020 | =item C<MORTALDESTRUCTOR_SV(SV *coderef, SV *args)> |
| 2021 | |
| 2022 | At the end of I<the current statement> the Perl function contained in |
| 2023 | C<coderef> is called with the arguments provided (if any) in C<args>. |
| 2024 | See the documentation for C<mortal_destructor_sv()> for details on |
| 2025 | how the C<args> parameter is handled. |
| 2026 | |
| 2027 | Note the I<end of the current statement> may occur much before the |
| 2028 | I<end of the current pseudo-block>. If you wish to call a perl |
| 2029 | function at the end of the current pseudo block you should use the |
| 2030 | C<SAVEDESTRUCTOR_X()> API instead, which will require you to create a |
| 2031 | C wrapper to call the Perl function. |
| 2032 | |
| 2033 | =for apidoc Amh||MORTALDESTRUCTOR_SV|SV *coderef|SV *args |
| 2034 | |
| 2035 | =item C<SAVESTACK_POS()> |
| 2036 | |
| 2037 | The current offset on the Perl internal stack (cf. C<SP>) is restored |
| 2038 | at the end of I<pseudo-block>. |
| 2039 | |
| 2040 | =for apidoc Amh||SAVESTACK_POS |
| 2041 | |
| 2042 | =back |
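
The timing described for C<SAVEDESTRUCTOR> can be pictured with a toy
model written in plain C. This is only a sketch: every name in it
(C<toy_enter>, C<toy_save_destructor>, C<toy_leave> and so on) is invented
for illustration and is not part of the Perl API, and the real save stack
stores many more kinds of entry. It assumes only what is stated above, that
queued destructors run when the pseudo-block is left, plus the usual
stack discipline of unwinding the newest entry first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the Perl API. */
typedef void (*toy_destructor_t)(void *p);

struct toy_save_entry {
    toy_destructor_t f;
    void *p;
};

#define TOY_SAVE_MAX 16
static struct toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_savestack_ix = 0;

/* Plays the role of ENTER: remember where this pseudo-block starts. */
static size_t toy_enter(void)
{
    return toy_savestack_ix;
}

/* Plays the role of SAVEDESTRUCTOR: queue f(p) for the end of the block. */
static void toy_save_destructor(toy_destructor_t f, void *p)
{
    assert(toy_savestack_ix < TOY_SAVE_MAX);
    toy_savestack[toy_savestack_ix].f = f;
    toy_savestack[toy_savestack_ix].p = p;
    toy_savestack_ix++;
}

/* Plays the role of LEAVE: unwind back to the mark, newest entry first. */
static void toy_leave(size_t mark)
{
    while (toy_savestack_ix > mark) {
        struct toy_save_entry *e = &toy_savestack[--toy_savestack_ix];
        e->f(e->p);
    }
}

/* A destructor that appends its tag as a decimal digit to a shared log. */
struct toy_note { int *log; int tag; };

static void toy_append(void *p)
{
    struct toy_note *note = (struct toy_note *)p;
    *note->log = *note->log * 10 + note->tag;
}

/* Queues two destructors and returns the log after leaving the block. */
static int toy_demo(void)
{
    int log = 0;
    struct toy_note first = { &log, 1 };
    struct toy_note second = { &log, 2 };
    size_t mark = toy_enter();

    toy_save_destructor(toy_append, &first);
    toy_save_destructor(toy_append, &second);
    if (log != 0)           /* nothing may run before the block ends */
        return -1;
    toy_leave(mark);
    return log;             /* tag 2 ran first, then tag 1 */
}
```

In this model C<toy_demo()> returns 21: neither destructor runs until
C<toy_leave()>, and the one queued last runs first.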
| 2043 | |
| 2044 | The following API list contains functions, thus one needs to |
| 2045 | provide pointers to the modifiable data explicitly (either C pointers, |
| 2046 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
| 2047 | function takes C<int *>. |
| 2048 | |
| 2049 | Other macros above have functions implementing them, but it's probably |
| 2050 | best to just use the macro, and not those or the ones below. |
| 2051 | |
| 2052 | =over 4 |
| 2053 | |
| 2054 | =item C<SV* save_scalar(GV *gv)> |
| 2055 | |
| 2056 | =for apidoc save_scalar |
| 2057 | |
| 2058 | Equivalent to Perl code C<local $gv>. |
| 2059 | |
| 2060 | =item C<AV* save_ary(GV *gv)> |
| 2061 | |
| 2062 | =for apidoc save_ary |
| 2063 | |
| 2064 | =item C<HV* save_hash(GV *gv)> |
| 2065 | |
| 2066 | =for apidoc save_hash |
| 2067 | |
| 2068 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. |
| 2069 | |
| 2070 | =item C<void save_item(SV *item)> |
| 2071 | |
| 2072 | =for apidoc save_item |
| 2073 | |
| 2074 | Duplicates the current value of C<SV>. On the exit from the current |
| 2075 | C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored |
| 2076 | using the stored value. It doesn't handle magic. Use C<save_scalar> if |
| 2077 | magic is affected. |
| 2078 | |
| 2079 | =item C<SV* save_svref(SV **sptr)> |
| 2080 | |
| 2081 | =for apidoc save_svref |
| 2082 | |
| 2083 | Similar to C<save_scalar>, but will reinstate an C<SV *>. |
| 2084 | |
| 2085 | =item C<void save_aptr(AV **aptr)> |
| 2086 | |
| 2087 | =item C<void save_hptr(HV **hptr)> |
| 2088 | |
| 2089 | =for apidoc save_aptr |
| 2090 | =for apidoc save_hptr |
| 2091 | |
| 2092 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. |
| 2093 | |
| 2094 | =back |
| 2095 | |
| 2096 | The C<Alias> module implements localization of the basic types within the |
| 2097 | I<caller's scope>. People who are interested in how to localize things in |
| 2098 | the containing scope should take a look there too. |
| 2099 | |
| 2100 | =head1 Subroutines |
| 2101 | |
| 2102 | =head2 XSUBs and the Argument Stack |
| 2103 | |
| 2104 | The XSUB mechanism is a simple way for Perl programs to access C subroutines. |
| 2105 | An XSUB routine will have a stack that contains the arguments from the Perl |
| 2106 | program, and a way to map from the Perl data structures to a C equivalent. |
| 2107 | |
| 2108 | The stack arguments are accessible through the C<ST(n)> macro, which returns |
| 2109 | the C<n>'th stack argument. Argument 0 is the first argument passed in the |
| 2110 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere |
| 2111 | an C<SV*> is used. |
| 2112 | |
| 2113 | Most of the time, output from the C routine can be handled through use of |
| 2114 | the RETVAL and OUTPUT directives. However, there are some cases where the |
| 2115 | argument stack is not already long enough to handle all the return values. |
| 2116 | An example is the POSIX tzname() call, which takes no arguments, but returns |
| 2117 | two, the local time zone's standard and summer time abbreviations. |
| 2118 | |
| 2119 | To handle this situation, the PPCODE directive is used and the stack is |
| 2120 | extended using the macro: |
| 2121 | |
| 2122 | EXTEND(SP, num); |
| 2123 | |
| 2124 | where C<SP> is the macro that represents the local copy of the stack pointer, |
| 2125 | and C<num> is the number of elements the stack should be extended by. |
| 2126 | |
| 2127 | Now that there is room on the stack, values can be pushed on it using the |
| 2128 | C<PUSHs> macro. The pushed values will often need to be "mortal" (see |
| 2129 | L</Reference Counts and Mortality>): |
| 2130 | |
| 2131 | PUSHs(sv_2mortal(newSViv(an_integer))) |
| 2132 | PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) |
| 2133 | PUSHs(sv_2mortal(newSVnv(a_double))) |
| 2134 | PUSHs(sv_2mortal(newSVpv("Some String",0))) |
| 2135 | /* Although the last example is better written as the more |
| 2136 | * efficient: */ |
| 2137 | PUSHs(newSVpvs_flags("Some String", SVs_TEMP)) |
| 2138 | |
| 2139 | Now, in the Perl program calling C<tzname>, the two values will be assigned |
| 2140 | as in: |
| 2141 | |
| 2142 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname; |
| 2143 | |
| 2144 | An alternate (and possibly simpler) method to pushing values on the stack is |
| 2145 | to use the macro: |
| 2146 | |
| 2147 | XPUSHs(SV*) |
| 2148 | |
| 2149 | This macro automatically adjusts the stack for you, if needed. Thus, you |
| 2150 | do not need to call C<EXTEND> to extend the stack. |
| 2151 | |
| 2152 | Despite their suggestions in earlier versions of this document the macros |
| 2153 | C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results. |
| 2154 | For that, either stick to the C<(X)PUSHs> macros shown above, or use the new |
| 2155 | C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>. |
| 2156 | |
| 2157 | For more information, consult L<perlxs> and L<perlxstut>. |
| 2158 | |
| 2159 | =head2 Autoloading with XSUBs |
| 2160 | |
| 2161 | If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the |
| 2162 | fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable |
| 2163 | of the XSUB's package. |
| 2164 | |
| 2165 | But it also puts the same information in certain fields of the XSUB itself: |
| 2166 | |
| 2167 | HV *stash = CvSTASH(cv); |
| 2168 | const char *subname = SvPVX(cv); |
| 2169 | STRLEN name_length = SvCUR(cv); /* in bytes */ |
| 2170 | U32 is_utf8 = SvUTF8(cv); |
| 2171 | |
| 2172 | C<SvPVX(cv)> contains just the sub name itself, not including the package. |
| 2173 | For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, |
| 2174 | C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package. |
| 2175 | |
| 2176 | B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support |
| 2177 | XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the |
| 2178 | XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need |
| 2179 | to support 5.8-5.14, use the XSUB's fields. |
| 2180 | |
| 2181 | =head2 Calling Perl Routines from within C Programs |
| 2182 | |
| 2183 | There are four routines that can be used to call a Perl subroutine from |
| 2184 | within a C program. These four are: |
| 2185 | |
| 2186 | I32 call_sv(SV*, I32); |
| 2187 | I32 call_pv(const char*, I32); |
| 2188 | I32 call_method(const char*, I32); |
| 2189 | I32 call_argv(const char*, I32, char**); |
| 2190 | |
| 2191 | The routine most often used is C<call_sv>. The C<SV*> argument |
| 2192 | contains either the name of the Perl subroutine to be called, or a |
| 2193 | reference to the subroutine. The second argument consists of flags |
| 2194 | that control the context in which the subroutine is called, whether |
| 2195 | or not the subroutine is being passed arguments, how errors should be |
| 2196 | trapped, and how to treat return values. |
| 2197 | |
| 2198 | All four routines return the number of items that the subroutine returned |
| 2199 | on the Perl stack. |
| 2200 | |
| 2201 | These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0, |
| 2202 | but those names are now deprecated; macros of the same name are provided for |
| 2203 | compatibility. |
| 2204 | |
| 2205 | When using any of these routines (except C<call_argv>), the programmer |
| 2206 | must manipulate the Perl stack. The tools for this include the following |
| 2207 | macros and functions: |
| 2208 | |
| 2209 | dSP |
| 2210 | SP |
| 2211 | PUSHMARK() |
| 2212 | PUTBACK |
| 2213 | SPAGAIN |
| 2214 | ENTER |
| 2215 | SAVETMPS |
| 2216 | FREETMPS |
| 2217 | LEAVE |
| 2218 | XPUSH*() |
| 2219 | POP*() |
| 2220 | |
| 2221 | For a detailed description of calling conventions from C to Perl, |
| 2222 | consult L<perlcall>. |
| 2223 | |
| 2224 | =head2 Putting a C value on Perl stack |
| 2225 | |
| 2226 | A lot of opcodes (an opcode is an elementary operation in the internal perl |
| 2227 | stack machine) put an SV* on the stack. However, as an optimization |
| 2228 | the corresponding SV is (usually) not recreated each time. The opcodes |
| 2229 | reuse specially assigned SVs (I<target>s) which are (as a corollary) |
| 2230 | not constantly freed/created. |
| 2231 | |
| 2232 | Each of the targets is created only once (but see |
| 2233 | L</Scratchpads and recursion> below), and when an opcode needs to put |
| 2234 | an integer, a double, or a string on the stack, it just sets the |
| 2235 | corresponding parts of its I<target> and puts the I<target> on stack. |
| 2236 | |
| 2237 | The macro to put this target on stack is C<PUSHTARG>, and it is |
| 2238 | directly used in some opcodes, as well as indirectly in zillions of |
| 2239 | others, which use it via C<(X)PUSH[iunp]>. |
| 2240 | |
| 2241 | Because the target is reused, you must be careful when pushing multiple |
| 2242 | values on the stack. The following code will not do what you think: |
| 2243 | |
| 2244 | XPUSHi(10); |
| 2245 | XPUSHi(20); |
| 2246 | |
| 2247 | This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto |
| 2248 | the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". |
| 2249 | At the end of the operation, the stack does not contain the values 10 |
| 2250 | and 20, but actually contains two pointers to C<TARG>, which we have set |
| 2251 | to 20. |
| 2252 | |
| 2253 | If you need to push multiple different values then you should either use |
| 2254 | the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros, |
| 2255 | none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an |
| 2256 | SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>, |
| 2257 | will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make |
| 2258 | this a little easier to achieve by creating a new mortal for you (via |
| 2259 | C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary |
| 2260 | in the case of the C<mXPUSH[iunp]> macros), and then setting its value. |
| 2261 | Thus, instead of writing this to "fix" the example above: |
| 2262 | |
| 2263 | XPUSHs(sv_2mortal(newSViv(10))) |
| 2264 | XPUSHs(sv_2mortal(newSViv(20))) |
| 2265 | |
| 2266 | you can simply write: |
| 2267 | |
| 2268 | mXPUSHi(10) |
| 2269 | mXPUSHi(20) |
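
The difference between pushing a shared target and pushing fresh values
can be reproduced in a few lines of plain C. The following is a toy model
under stated assumptions, not Perl code: C<struct toy_sv>, C<toy_xpushi>
and C<toy_mxpushi> are invented names, an "SV" is reduced to a struct
holding one integer, and a small static array stands in for freshly
created mortals.

```c
#include <assert.h>

/* Toy model only: an "SV" is just a struct holding an integer. */
struct toy_sv { long iv; };

static struct toy_sv toy_targ;          /* the single reusable target */
static struct toy_sv toy_mortals[4];    /* stand-in for fresh mortal SVs */
static int toy_mortal_count = 0;
static struct toy_sv *toy_stack[8];     /* the stack holds pointers */
static int toy_sp = 0;

/* Models XPUSHi: set the shared target, then push a pointer to it. */
static void toy_xpushi(long v)
{
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Models mXPUSHi: take a fresh SV, set it, push a pointer to that. */
static void toy_mxpushi(long v)
{
    struct toy_sv *sv = &toy_mortals[toy_mortal_count++];
    sv->iv = v;
    toy_stack[toy_sp++] = sv;
}

/* Pushes 10 and 20 via the shared target and sums what the stack sees. */
static long toy_sum_with_targ(void)
{
    toy_sp = 0;
    toy_xpushi(10);
    toy_xpushi(20);
    return toy_stack[0]->iv + toy_stack[1]->iv;
}

/* Pushes 10 and 20 as separate values and sums what the stack sees. */
static long toy_sum_with_mortals(void)
{
    toy_sp = 0;
    toy_mortal_count = 0;
    toy_mxpushi(10);
    toy_mxpushi(20);
    return toy_stack[0]->iv + toy_stack[1]->iv;
}
```

In the model, C<toy_sum_with_targ()> yields 40 because both stack slots
point at the same target, which last held 20, while
C<toy_sum_with_mortals()> yields the expected 30.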
| 2270 | |
| 2271 | On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to |
| 2272 | need a C<dTARG> in your variable declarations so that the C<*PUSH*> |
| 2273 | macros can make use of the local variable C<TARG>. See also |
| 2274 | C<dTARGET> and C<dXSTARG>. |
| 2275 | |
| 2276 | =head2 Scratchpads |
| 2277 | |
| 2278 | The question remains on when the SVs which are I<target>s for opcodes |
| 2279 | are created. The answer is that they are created when the current |
| 2280 | unit--a subroutine or a file (for opcodes for statements outside of |
| 2281 | subroutines)--is compiled. During this time a special anonymous Perl |
| 2282 | array is created, which is called a scratchpad for the current unit. |
| 2283 | |
| 2284 | A scratchpad keeps SVs which are lexicals for the current unit and are |
| 2285 | targets for opcodes. A previous version of this document |
| 2286 | stated that one can deduce that an SV lives on a scratchpad |
| 2287 | by looking at its flags: lexicals have C<SVs_PADMY> set, and |
| 2288 | I<target>s have C<SVs_PADTMP> set. But this has never been fully true. |
| 2289 | C<SVs_PADMY> could be set on a variable that no longer resides in any pad. |
| 2290 | While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables |
| 2291 | that have never resided in a pad, but nonetheless act like I<target>s. As |
| 2292 | of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as |
| 2293 | 0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>. |
| 2294 | |
| 2295 | =for apidoc_section $pad |
| 2296 | =for apidoc Amnh||SVs_PADTMP |
| 2297 | =for apidoc AmnhD||SVs_PADMY |
| 2298 | |
| 2299 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
| 2300 | OPs in the compile tree of the unit can use the same target, if this |
| 2301 | would not conflict with the expected life of the temporary. |
| 2302 | |
| 2303 | =head2 Scratchpads and recursion |
| 2304 | |
| 2305 | In fact it is not 100% true that a compiled unit contains a pointer to |
| 2306 | the scratchpad AV. In fact it contains a pointer to an AV of |
| 2307 | (initially) one element, and this element is the scratchpad AV. Why do |
| 2308 | we need an extra level of indirection? |
| 2309 | |
| 2310 | The answer is B<recursion>, and maybe B<threads>. Both |
| 2311 | these can create several execution pointers going into the same |
| 2312 | subroutine. For the subroutine-child not to write over the temporaries |
| 2313 | for the subroutine-parent (lifespan of which covers the call to the |
| 2314 | child), the parent and the child should have different |
| 2315 | scratchpads. (I<And> the lexicals should be separate anyway!) |
| 2316 | |
| 2317 | So each subroutine is born with an array of scratchpads (of length 1). |
| 2318 | On each entry to the subroutine it is checked that the current |
| 2319 | depth of the recursion is not more than the length of this array, and |
| 2320 | if it is, a new scratchpad is created and pushed into the array. |
| 2321 | |
| 2322 | The I<target>s on this scratchpad are C<undef>s, but they are already |
| 2323 | marked with correct flags. |
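
The extra level of indirection can be sketched in plain C. This is a toy
model, not the real pad code: C<struct toy_pad>, C<struct toy_padlist>,
C<toy_pad_for_entry> and the other names are invented, a pad is reduced to
a few integer slots, and for brevity the array of pads starts empty rather
than with one element.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

/* Toy model only: a scratchpad is a handful of integer slots. */
struct toy_pad { long slot[TOY_PAD_SLOTS]; };

struct toy_padlist {
    struct toy_pad **pads;  /* pads[d - 1] serves recursion depth d */
    size_t len;
};

struct toy_sub {
    struct toy_padlist padlist;
    size_t depth;           /* current recursion depth, 0 when idle */
};

/* On entry: bump the depth and create a pad for it if none exists yet. */
static struct toy_pad *toy_pad_for_entry(struct toy_sub *sub)
{
    sub->depth++;
    if (sub->depth > sub->padlist.len) {
        struct toy_pad **grown =
            realloc(sub->padlist.pads, sub->depth * sizeof *grown);
        if (!grown)
            abort();
        grown[sub->depth - 1] = calloc(1, sizeof **grown);
        if (!grown[sub->depth - 1])
            abort();
        sub->padlist.pads = grown;
        sub->padlist.len = sub->depth;
    }
    return sub->padlist.pads[sub->depth - 1];
}

static void toy_sub_exit(struct toy_sub *sub)
{
    sub->depth--;
}

/* A recursive "subroutine" whose temporary must survive the inner call. */
static long toy_sum_to(struct toy_sub *sub, long n)
{
    struct toy_pad *pad = toy_pad_for_entry(sub);
    long result;

    pad->slot[0] = n;       /* parent's temporary */
    pad->slot[1] = (n > 0) ? toy_sum_to(sub, n - 1) : 0;
    result = pad->slot[0] + pad->slot[1];
    toy_sub_exit(sub);
    return result;
}
```

Calling C<toy_sum_to> with 3 on a zero-initialised C<struct toy_sub> uses
four pads, one per depth, and each level finds its own C<slot[0]> intact
after the inner call returns. With a single shared pad the inner calls
would have overwritten it. The pads are kept for later calls, so a second
call to the same depth allocates nothing.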
| 2324 | |
| 2325 | =head1 Memory Allocation |
| 2326 | |
| 2327 | =head2 Allocation |
| 2328 | |
| 2329 | All memory meant to be used with the Perl API functions should be manipulated |
| 2330 | using the macros described in this section. The macros provide the necessary |
| 2331 | transparency between differences in the actual malloc implementation that is |
| 2332 | used within perl. |
| 2333 | |
| 2334 | The following three macros are used to initially allocate memory: |
| 2335 | |
| 2336 | Newx(pointer, number, type); |
| 2337 | Newxc(pointer, number, type, cast); |
| 2338 | Newxz(pointer, number, type); |
| 2339 | |
| 2340 | The first argument C<pointer> should be the name of a variable that will |
| 2341 | point to the newly allocated memory. |
| 2342 | |
| 2343 | The second and third arguments C<number> and C<type> specify how many of |
| 2344 | the specified type of data structure should be allocated. The argument |
| 2345 | C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>, |
| 2346 | should be used if the C<pointer> argument is different from the C<type> |
| 2347 | argument. |
| 2348 | |
| 2349 | Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero> |
| 2350 | to zero out all the newly allocated memory. |
| 2351 | |
| 2352 | =head2 Reallocation |
| 2353 | |
| 2354 | Renew(pointer, number, type); |
| 2355 | Renewc(pointer, number, type, cast); |
| 2356 | Safefree(pointer); |
| 2357 | |
| 2358 | These three macros are used to change a memory buffer size or to free a |
| 2359 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> |
| 2360 | match those of C<Newx> and C<Newxc>. (The obsolete C<New> and C<Newc> also |
| 2361 | took a "magic cookie" argument, which none of these macros need.) |
| 2362 | |
| 2363 | =head2 Moving |
| 2364 | |
| 2365 | Move(source, dest, number, type); |
| 2366 | Copy(source, dest, number, type); |
| 2367 | Zero(dest, number, type); |
| 2368 | |
| 2369 | These three macros are used to move, copy, or zero out previously allocated |
| 2370 | memory. The C<source> and C<dest> arguments point to the source and |
| 2371 | destination starting points. Perl will move, copy, or zero out C<number> |
| 2372 | instances of the size of the C<type> data structure (using the C<sizeof> |
| 2373 | function). |
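
One detail that is easy to get wrong is the argument order: the source
comes first and the destination second, the reverse of the C library's
C<memcpy>. The sketch below uses stand-in macros (C<TOY_MOVE>, C<TOY_COPY>
and C<TOY_ZERO>, all invented names) built on C<memmove>, C<memcpy> and
C<memset> so that it compiles without the Perl headers. That mapping is an
assumption made for illustration, and the real macros may add checks of
their own.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins with the same argument order as Move, Copy and Zero.
 * Assumption for illustration: they map onto memmove, memcpy, memset. */
#define TOY_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define TOY_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define TOY_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Returns 1 if every step left the buffers in the expected state. */
static int toy_shift_demo(void)
{
    int buf[5] = { 1, 2, 3, 4, 5 };
    int copy[5];

    TOY_COPY(buf, copy, 5, int);      /* disjoint buffers: plain copy  */
    TOY_MOVE(buf, buf + 1, 4, int);   /* overlapping: buf is 1 1 2 3 4 */
    TOY_ZERO(buf, 1, int);            /* buf is now 0 1 2 3 4          */

    return buf[0] == 0 && buf[1] == 1 && buf[2] == 2
        && buf[3] == 3 && buf[4] == 4
        && copy[0] == 1 && copy[4] == 5;
}
```

The overlapping shift uses the move form, since a plain copy between
overlapping regions is not safe with C<memcpy>.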
| 2374 | |
| 2375 | =head1 PerlIO |
| 2376 | |
| 2377 | The most recent development releases of Perl have been experimenting with |
| 2378 | removing Perl's dependency on the "normal" standard I/O suite and allowing |
| 2379 | other stdio implementations to be used. This involves creating a new |
| 2380 | abstraction layer that then calls whichever implementation of stdio Perl |
| 2381 | was compiled with. All XSUBs should now use the functions in the PerlIO |
| 2382 | abstraction layer and not make any assumptions about what kind of stdio |
| 2383 | is being used. |
| 2384 | |
| 2385 | For a complete description of the PerlIO abstraction, consult L<perlapio>. |
| 2386 | |
| 2387 | =head1 Compiled code |
| 2388 | |
| 2389 | =head2 Code tree |
| 2390 | |
| 2391 | Here we describe the internal form your code is converted to by |
| 2392 | Perl. Start with a simple example: |
| 2393 | |
| 2394 | $a = $b + $c; |
| 2395 | |
| 2396 | This is converted to a tree similar to this one: |
| 2397 | |
| 2398 |              assign-to |
| 2399 |            /           \ |
| 2400 |           +             $a |
| 2401 |         /   \ |
| 2402 |       $b     $c |
| 2403 | |
| 2404 | (but slightly more complicated). This tree reflects the way Perl |
| 2405 | parsed your code, but has nothing to do with the execution order. |
| 2406 | There is an additional "thread" going through the nodes of the tree |
| 2407 | which shows the order of execution of the nodes. In our simplified |
| 2408 | example above it looks like: |
| 2409 | |
| 2410 | $b ---> $c ---> + ---> $a ---> assign-to |
| 2411 | |
| 2412 | But with the actual compile tree for C<$a = $b + $c> it is different: |
| 2413 | some nodes are I<optimized away>. As a corollary, though the actual tree |
| 2414 | contains more nodes than our simplified example, the execution order |
| 2415 | is the same as in our example. |
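
The relationship between the parse tree and the execution-order thread can
be sketched in plain C. This is a toy model: C<struct tree_op>,
C<toy_link> and C<toy_execution_order> are invented names, and the linking
rule used here (run every child, left to right, before its parent) is a
simplification that happens to reproduce the order shown above. The real
linking code handles many special cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: one node of the simplified tree for $a = $b + $c. */
struct tree_op {
    const char *name;
    struct tree_op *first;    /* leftmost child, or NULL */
    struct tree_op *sibling;  /* next child of the same parent */
    struct tree_op *next;     /* execution-order thread */
};

/* Thread the subtree at o: all children first, then o itself. */
static void toy_link(struct tree_op *o,
                     struct tree_op **head, struct tree_op **tail)
{
    struct tree_op *kid;

    for (kid = o->first; kid; kid = kid->sibling)
        toy_link(kid, head, tail);
    if (*tail)
        (*tail)->next = o;
    else
        *head = o;
    o->next = NULL;
    *tail = o;
}

/* Builds the tree, threads it, and writes the visiting order to out.
 * outsize must be at least 1. */
static void toy_execution_order(char *out, size_t outsize)
{
    struct tree_op c = { "$c", NULL, NULL, NULL };
    struct tree_op b = { "$b", NULL, &c, NULL };
    struct tree_op a = { "$a", NULL, NULL, NULL };
    struct tree_op add = { "+", &b, &a, NULL };
    struct tree_op assign = { "assign-to", &add, NULL, NULL };
    struct tree_op *head = NULL, *tail = NULL, *o;

    toy_link(&assign, &head, &tail);
    out[0] = '\0';
    for (o = head; o; o = o->next) {
        if (out[0] != '\0')
            strncat(out, " ", outsize - strlen(out) - 1);
        strncat(out, o->name, outsize - strlen(out) - 1);
    }
}
```

Walking the C<next> pointers from the head visits the nodes in the same
order as the thread drawn above, without consulting the tree shape again.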
| 2416 | |
| 2417 | =head2 Examining the tree |
| 2418 | |
| 2419 | If you have your perl compiled for debugging (usually done with |
| 2420 | C<-DDEBUGGING> on the C<Configure> command line), you may examine the |
| 2421 | compiled tree by specifying C<-Dx> on the Perl command line. The |
| 2422 | output takes several lines per node, and for C<$b+$c> it looks like |
| 2423 | this: |
| 2424 | |
| 2425 | 5           TYPE = add ===> 6 |
| 2426 |             TARG = 1 |
| 2427 |             FLAGS = (SCALAR,KIDS) |
| 2428 |             { |
| 2429 |                 TYPE = null ===> (4) |
| 2430 |                   (was rv2sv) |
| 2431 |                 FLAGS = (SCALAR,KIDS) |
| 2432 |                 { |
| 2433 | 3                   TYPE = gvsv ===> 4 |
| 2434 |                     FLAGS = (SCALAR) |
| 2435 |                     GV = main::b |
| 2436 |                 } |
| 2437 |             } |
| 2438 |             { |
| 2439 |                 TYPE = null ===> (5) |
| 2440 |                   (was rv2sv) |
| 2441 |                 FLAGS = (SCALAR,KIDS) |
| 2442 |                 { |
| 2443 | 4                   TYPE = gvsv ===> 5 |
| 2444 |                     FLAGS = (SCALAR) |
| 2445 |                     GV = main::c |
| 2446 |                 } |
| 2447 |             } |
| 2448 | |
| 2449 | This tree has 5 nodes (one per C<TYPE> specifier); only 3 of them are |
| 2450 | not optimized away (one per number in the left column). The immediate |
| 2451 | children of the given node correspond to C<{}> pairs on the same level |
| 2452 | of indentation, thus this listing corresponds to the tree: |
| 2453 | |
| 2454 |                    add |
| 2455 |                  /     \ |
| 2456 |                null    null |
| 2457 |                 |       | |
| 2458 |                gvsv    gvsv |
| 2459 | |
| 2460 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 |
| 2461 | 4 5 6> (node C<6> is not included in the above listing), i.e., |
| 2462 | C<gvsv gvsv add whatever>. |
| 2463 | |
| 2464 | Each of these nodes represents an op, a fundamental operation inside the |
| 2465 | Perl core. The code which implements each operation can be found in the |
| 2466 | F<pp*.c> files; the function which implements the op with type C<gvsv> |
| 2467 | is C<pp_gvsv>, and so on. As the tree above shows, different ops have |
| 2468 | different numbers of children: C<add> is a binary operator, as one would |
| 2469 | expect, and so has two children. To accommodate the various different |
| 2470 | numbers of children, there are various types of op data structure, and |
| 2471 | they link together in different ways. |
| 2472 | |
| 2473 | The simplest type of op structure is C<OP>: this has no children. Unary |
| 2474 | operators, C<UNOP>s, have one child, and this is pointed to by the |
| 2475 | C<op_first> field. Binary operators (C<BINOP>s) have not only an |
| 2476 | C<op_first> field but also an C<op_last> field. The most complex type of |
| 2477 | op is a C<LISTOP>, which has any number of children. In this case, the |
| 2478 | first child is pointed to by C<op_first> and the last child by |
| 2479 | C<op_last>. The children in between can be found by iteratively |
| 2480 | following the C<OpSIBLING> pointer from the first child to the last (but |
| 2481 | see below). |
| 2482 | |
| 2483 | =for apidoc_section $optree_construction |
| 2484 | =for apidoc Ayh||OP |
| 2485 | =for apidoc Ayh||BINOP |
| 2486 | =for apidoc Ayh||LISTOP |
| 2487 | =for apidoc Ayh||UNOP |
| 2488 | |
| 2489 | There are also some other op types: a C<PMOP> holds a regular expression, |
| 2490 | and has no children, and a C<LOOP> may or may not have children. If the |
| 2491 | C<op_children> field is non-zero, it behaves like a C<LISTOP>. To |
| 2492 | complicate matters, if a C<UNOP> is actually a C<null> op after |
| 2493 | optimization (see L</Compile pass 2: context propagation>) it will still |
| 2494 | have children in accordance with its former type. |
| 2495 | |
| 2496 | =for apidoc Ayh||LOOP |
| 2497 | =for apidoc Ayh||PMOP |
| 2498 | |
| 2499 | Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one |
| 2500 | or more children, but it doesn't have an C<op_last> field: so you have to |
| 2501 | follow C<op_first> and then the C<OpSIBLING> chain itself to find the |
| 2502 | last child. Instead it has an C<op_other> field, which is comparable to |
| 2503 | the C<op_next> field described below, and represents an alternate |
| 2504 | execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note |
| 2505 | that in general, C<op_other> may not point to any of the direct children |
| 2506 | of the C<LOGOP>. |
| 2507 | |
| 2508 | =for apidoc Ayh||LOGOP |
| 2509 | |
| 2510 | Starting in version 5.21.2, perls built with the experimental |
| 2511 | define C<-DPERL_OP_PARENT> add an extra boolean flag for each op, |
| 2512 | C<op_moresib>. When not set, this indicates that this is the last op in an |
| 2513 | C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last |
| 2514 | sibling to point back to the parent op. Under this build, that field is |
| 2515 | also renamed C<op_sibparent> to reflect its joint role. The macro |
| 2516 | C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on |
| 2517 | the last sibling. With this build the C<op_parent(o)> function can be |
| 2518 | used to find the parent of any op. Thus for forward compatibility, you |
| 2519 | should always use the C<OpSIBLING(o)> macro rather than accessing |
| 2520 | C<op_sibling> directly. |
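
The shared sibling-or-parent field can be modelled in plain C as follows.
This is a sketch under stated assumptions: C<struct sib_op>,
C<toy_sibling>, C<toy_parent> and C<toy_count_kids> are invented names
that mirror the behaviour described above, not the definitions Perl itself
uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: an op with a combined sibling/parent pointer. */
struct sib_op {
    const char *name;
    struct sib_op *first;      /* first child, or NULL */
    struct sib_op *sibparent;  /* next sibling, or parent if last sibling */
    bool moresib;              /* true if sibparent is a sibling */
};

/* Models OpSIBLING(o): NULL on the last sibling. */
static struct sib_op *toy_sibling(const struct sib_op *o)
{
    return o->moresib ? o->sibparent : NULL;
}

/* Models op_parent(o): skip to the last sibling, then step up. */
static struct sib_op *toy_parent(const struct sib_op *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;       /* NULL for the root */
}

/* Counts children by following the sibling chain from the first child. */
static int toy_count_kids(const struct sib_op *parent)
{
    const struct sib_op *kid;
    int n = 0;

    for (kid = parent->first; kid; kid = toy_sibling(kid))
        n++;
    return n;
}
```

Code that walks children through a helper like C<toy_sibling> never sees
the parent pointer by accident, which is the reason for preferring the
macro over direct field access.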
| 2521 | |
| 2522 | Another way to examine the tree is to use a compiler back-end module, such |
| 2523 | as L<B::Concise>. |
| 2524 | |
| 2525 | =head2 Compile pass 1: check routines |
| 2526 | |
| 2527 | The tree is created by the compiler while I<yacc> code feeds it |
| 2528 | the constructions it recognizes. Since I<yacc> works bottom-up, so does |
| 2529 | the first pass of perl compilation. |
| 2530 | |
| 2531 | What makes this pass interesting for perl developers is that some |
| 2532 | optimization may be performed on this pass. This is optimization by |
| 2533 | so-called "check routines". The correspondence between node names |
| 2534 | and corresponding check routines is described in F<opcode.pl> (do not |
| 2535 | forget to run C<make regen_headers> if you modify this file). |
| 2536 | |
| 2537 | A check routine is called when the node is fully constructed except |
| 2538 | for the execution-order thread. Since at this time there are no |
| 2539 | back-links to the currently constructed node, one can do most any |
| 2540 | operation to the top-level node, including freeing it and/or creating |
| 2541 | new nodes above/below it. |
| 2542 | |
| 2543 | The check routine returns the node which should be inserted into the |
| 2544 | tree (if the top-level node was not modified, the check routine returns |
| 2545 | its argument). |
| 2546 | |
| 2547 | By convention, check routines have names C<ck_*>. They are usually |
| 2548 | called from C<new*OP> subroutines (or C<convert>) (which in turn are |
| 2549 | called from F<perly.y>). |
| 2550 | |
| 2551 | =head2 Compile pass 1a: constant folding |
| 2552 | |
| 2553 | Immediately after the check routine is called the returned node is |
| 2554 | checked for being compile-time executable. If it is (the value is |
| 2555 | judged to be constant) it is immediately executed, and a I<constant> |
| 2556 | node with the "return value" of the corresponding subtree is |
| 2557 | substituted instead. The subtree is deleted. |
| 2558 | |
| 2559 | If constant folding was not performed, the execution-order thread is |
| 2560 | created. |
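
As a rough illustration of folding during bottom-up construction, here is
a toy model in plain C. All names (C<struct fold_op>, C<fold_check> and
the C<FOLD_*> constants) are invented. Unlike the description above, the
model rewrites the node in place instead of executing the subtree and
substituting a new constant node, and it only knows about addition.

```c
#include <assert.h>
#include <stddef.h>

enum fold_type { FOLD_CONST, FOLD_VAR, FOLD_ADD };

/* Toy model only: a node of a tiny expression tree. */
struct fold_op {
    enum fold_type type;
    long value;                     /* meaningful for FOLD_CONST */
    struct fold_op *left, *right;   /* meaningful for FOLD_ADD */
};

/* Called on a node once its children are complete. Returns the node
 * that should take its place in the tree. */
static struct fold_op *fold_check(struct fold_op *o)
{
    if (o->type == FOLD_ADD
        && o->left->type == FOLD_CONST
        && o->right->type == FOLD_CONST) {
        /* Both operands are known now: compute the result early. */
        o->value = o->left->value + o->right->value;
        o->type = FOLD_CONST;
        o->left = o->right = NULL;  /* the subtree is dropped */
    }
    return o;
}
```

Because construction is bottom-up, an expression such as C<(1 + 2) + $x>
has its inner addition folded to the constant 3 before the outer addition
is checked. The outer node is left alone since one operand is a variable.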
| 2561 | |
| 2562 | =head2 Compile pass 2: context propagation |
| 2563 | |
| 2564 | When a context for a part of compile tree is known, it is propagated |
| 2565 | down through the tree. At this time the context can have 5 values |
| 2566 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
| 2567 | lvalue. In contrast with pass 1, this pass is processed from top |
| 2568 | to bottom: a node's context determines the context for its children. |
| 2569 | |
| 2570 | Additional context-dependent optimizations are performed at this time. |
| 2571 | Since at this moment the compile tree contains back-references (via |
| 2572 | "thread" pointers), nodes cannot be free()d now. To allow |
| 2573 | optimized-away nodes at this stage, such nodes are null()ified instead |
| 2574 | of free()ing (i.e. their type is changed to OP_NULL). |
| 2575 | |
| 2576 | =head2 Compile pass 3: peephole optimization |
| 2577 | |
| 2578 | After the compile tree for a subroutine (or for an C<eval> or a file) |
| 2579 | is created, an additional pass over the code is performed. This pass |
| 2580 | is neither top-down nor bottom-up, but in the execution order (with |
| 2581 | additional complications for conditionals). Optimizations performed |
| 2582 | at this stage are subject to the same restrictions as in pass 2. |
| 2583 | |
| 2584 | Peephole optimizations are done by calling the function pointed to |
| 2585 | by the global variable C<PL_peepp>. By default, C<PL_peepp> just |
| 2586 | calls the function pointed to by the global variable C<PL_rpeepp>. |
| 2587 | By default, that performs some basic op fixups and optimisations along |
| 2588 | the execution-order op chain, and recursively calls C<PL_rpeepp> for |
| 2589 | each side chain of ops (resulting from conditionals). Extensions may |
| 2590 | provide additional optimisations or fixups, hooking into either the |
| 2591 | per-subroutine or recursive stage, like this: |
| 2592 | |
| 2593 | static peep_t prev_peepp; |
| 2594 | static void my_peep(pTHX_ OP *o) |
| 2595 | { |
| 2596 | /* custom per-subroutine optimisation goes here */ |
| 2597 | prev_peepp(aTHX_ o); |
| 2598 | /* custom per-subroutine optimisation may also go here */ |
| 2599 | } |
| 2600 | BOOT: |
| 2601 | prev_peepp = PL_peepp; |
| 2602 | PL_peepp = my_peep; |
| 2603 | |
| 2604 | static peep_t prev_rpeepp; |
| 2605 | static void my_rpeep(pTHX_ OP *first) |
| 2606 | { |
| 2607 | OP *o = first, *t = first; |
| 2608 | for(; o; o = o->op_next, t = t->op_next) { |
| 2609 | /* custom per-op optimisation goes here */ |
| 2610 | o = o->op_next; |
| 2611 | if (!o || o == t) break; |
| 2612 | /* custom per-op optimisation goes AND here */ |
| 2613 | } |
| 2614 | prev_rpeepp(aTHX_ first); |
| 2615 | } |
| 2616 | BOOT: |
| 2617 | prev_rpeepp = PL_rpeepp; |
| 2618 | PL_rpeepp = my_rpeep; |
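
The loop in C<my_rpeep> advances C<o> two ops per iteration and C<t> one
op, which appears intended to stop the walk if the C<op_next> chain loops
back on itself. That shape can be tried in isolation with a toy model in
plain C. C<struct peep_op> and C<toy_rpeep> are invented names, and the
C<visits> counter stands in for the per-op work.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: an op reduced to its op_next pointer and a counter. */
struct peep_op {
    struct peep_op *op_next;
    int visits;
};

/* Walk the chain with a fast pointer (o) and a slow pointer (t).
 * Stops at the end of the chain, or when o catches up with t. */
static void toy_rpeep(struct peep_op *first)
{
    struct peep_op *o = first, *t = first;

    for (; o; o = o->op_next, t = t->op_next) {
        o->visits++;            /* per-op work, first slot */
        o = o->op_next;
        if (!o || o == t)
            break;
        o->visits++;            /* per-op work, second slot */
    }
}
```

On a straight chain every op is visited exactly once. On a chain that
loops the walk still terminates, but depending on the shape of the loop an
op may be visited more than once, so per-op work in such a hook should be
safe to repeat.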
| 2619 | |
| 2620 | =for apidoc_section $optree_manipulation |
| 2621 | =for apidoc Ayh||peep_t |
| 2622 | |
| 2623 | =head2 Pluggable runops |
| 2624 | |
| 2625 | The compile tree is executed in a runops function. There are two runops |
| 2626 | functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used |
| 2627 | with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine |
| 2628 | control over the execution of the compile tree it is possible to provide |
| 2629 | your own runops function. |
| 2630 | |
| 2631 | It's probably best to copy one of the existing runops functions and |
| 2632 | change it to suit your needs. Then, in the BOOT section of your XS |
| 2633 | file, add the line: |
| 2634 | |
| 2635 | PL_runops = my_runops; |
| 2636 | |
| 2637 | =for apidoc_section $debugging |
| 2638 | =for apidoc runops_debug |
| 2639 | =for apidoc runops_standard |
| 2640 | =for apidoc Amnh|runops_proc_t|PL_runops |
| 2641 | |
| 2642 | This function should be as efficient as possible to keep your programs |
| 2643 | running as fast as possible. |
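
The general shape of a runops loop can be sketched in plain C: each op's
function does its work and returns the next op to run, and the loop
simply repeats until it gets NULL back. This is a toy model, not a drop-in
replacement: C<struct run_op>, C<struct run_state>, C<toy_pp_push>,
C<toy_pp_add> and C<toy_runops> are invented names, and the model passes
an explicit state pointer where Perl uses the interpreter context.

```c
#include <assert.h>
#include <stddef.h>

struct run_op;

/* Toy model only: a tiny value stack plus the op being executed. */
struct run_state {
    long stack[8];
    int sp;
    const struct run_op *op;
};

typedef const struct run_op *(*run_ppaddr_t)(struct run_state *st);

struct run_op {
    run_ppaddr_t ppaddr;          /* function implementing this op */
    const struct run_op *next;    /* execution-order successor */
    long operand;                 /* value to push, for push ops */
};

/* Pushes the op's operand and hands back the next op. */
static const struct run_op *toy_pp_push(struct run_state *st)
{
    st->stack[st->sp++] = st->op->operand;
    return st->op->next;
}

/* Pops two values, pushes their sum, and hands back the next op. */
static const struct run_op *toy_pp_add(struct run_state *st)
{
    long right = st->stack[--st->sp];
    long left = st->stack[--st->sp];
    st->stack[st->sp++] = left + right;
    return st->op->next;
}

/* The runops loop itself: run ops until one returns NULL. */
static void toy_runops(struct run_state *st)
{
    while ((st->op = st->op->ppaddr(st)) != NULL)
        ;
}

/* Executes the chain "push b, push c, add" and returns the result. */
static long toy_run_add(long b, long c)
{
    struct run_op add = { toy_pp_add, NULL, 0 };
    struct run_op push_c = { toy_pp_push, &add, c };
    struct run_op push_b = { toy_pp_push, &push_c, b };
    struct run_state st = { { 0 }, 0, &push_b };

    toy_runops(&st);
    return st.stack[0];
}
```

Everything a custom runops function adds, such as tracing or counting,
goes inside that loop and therefore runs once per op, which is why its
cost matters so much.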
| 2644 | |
| 2645 | =head2 Compile-time scope hooks |
| 2646 | |
| 2647 | As of perl 5.14 it is possible to hook into the compile-time lexical |
| 2648 | scope mechanism using C<Perl_blockhook_register>. This is used like |
| 2649 | this: |
| 2650 | |
| 2651 | STATIC void my_start_hook(pTHX_ int full); |
| 2652 | STATIC BHK my_hooks; |
| 2653 | |
| 2654 | BOOT: |
| 2655 | BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); |
| 2656 | Perl_blockhook_register(aTHX_ &my_hooks); |
| 2657 | |
| 2658 | This will arrange to have C<my_start_hook> called at the start of |
| 2659 | compiling every lexical scope. The available hooks are: |
| 2660 | |
| 2661 | =for apidoc_section $lexer |
| 2662 | =for apidoc Ayh||BHK |
| 2663 | |
| 2664 | =over 4 |
| 2665 | |
| 2666 | =item C<void bhk_start(pTHX_ int full)> |
| 2667 | |
| 2668 | This is called just after starting a new lexical scope. Note that Perl |
| 2669 | code like |
| 2670 | |
| 2671 | if ($x) { ... } |
| 2672 | |
| 2673 | creates two scopes: the first starts at the C<(> and has C<full == 1>, |
| 2674 | the second starts at the C<{> and has C<full == 0>. Both end at the |
| 2675 | C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything |
| 2676 | pushed onto the save stack by this hook will be popped just before the |
| 2677 | scope ends (between the C<pre_> and C<post_end> hooks, in fact). |
| 2678 | |
| 2679 | =item C<void bhk_pre_end(pTHX_ OP **o)> |
| 2680 | |
| 2681 | This is called at the end of a lexical scope, just before unwinding the |
| 2682 | stack. I<o> is the root of the optree representing the scope; it is a |
| 2683 | double pointer so you can replace the OP if you need to. |
| 2684 | |
| 2685 | =item C<void bhk_post_end(pTHX_ OP **o)> |
| 2686 | |
| 2687 | This is called at the end of a lexical scope, just after unwinding the |
| 2688 | stack. I<o> is as above. Note that it is possible for calls to C<pre_> |
| 2689 | and C<post_end> to nest, if there is something on the save stack that |
| 2690 | calls string eval. |
| 2691 | |
| 2692 | =item C<void bhk_eval(pTHX_ OP *const o)> |
| 2693 | |
| 2694 | This is called just before starting to compile an C<eval STRING>, C<do |
| 2695 | FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the |
| 2696 | OP that requested the eval, and will normally be an C<OP_ENTEREVAL>, |
| 2697 | C<OP_DOFILE> or C<OP_REQUIRE>. |
| 2698 | |
| 2699 | =back |
| 2700 | |
| 2701 | Once you have your hook functions, you need a C<BHK> structure to put |
| 2702 | them in. It's best to allocate it statically, since there is no way to |
| 2703 | free it once it's registered. The function pointers should be inserted |
| 2704 | into this structure using the C<BhkENTRY_set> macro, which will also set |
| 2705 | flags indicating which entries are valid. If you do need to allocate |
| 2706 | your C<BHK> dynamically for some reason, be sure to zero it before you |
| 2707 | start. |
| 2708 | |
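The advice about zeroing a dynamically allocated C<BHK> is easier to see in a small standalone model. This is only a sketch, not Perl's real definition: C<model_bhk>, C<MODEL_BHK_SET>, C<model_bhk_init> and C<model_bhk_call_start> are hypothetical names that imitate a hook table whose validity flags are set together with each function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hook table: one flag bit per entry. */
#define MODEL_BHKf_START 0x01u

typedef void (*model_start_hook)(int full);

struct model_bhk {
    unsigned flags;            /* which entries are valid */
    model_start_hook start;    /* only meaningful if its flag is set */
};

/* Zero the table so that no entry looks valid by accident. */
void
model_bhk_init(struct model_bhk *hk)
{
    assert(hk != NULL);
    memset(hk, 0, sizeof *hk);
}

/* Store the pointer and mark the entry valid in one step. */
#define MODEL_BHK_SET(hk, fn) \
    do { (hk)->start = (fn); (hk)->flags |= MODEL_BHKf_START; } while (0)

/* Call the start hook only when its flag says the entry is valid.
   Returns 1 if the hook ran, 0 otherwise. */
int
model_bhk_call_start(const struct model_bhk *hk, int full)
{
    assert(hk != NULL);
    if (!(hk->flags & MODEL_BHKf_START))
        return 0;
    hk->start(full);
    return 1;
}

/* Example hook body: remember the argument it was called with. */
int model_last_full = -1;

void
model_my_start(int full)
{
    model_last_full = full;
}
```

In this model a freshly zeroed table ignores every call, and an entry becomes live only once it has been stored through the setter. A table that was never zeroed could carry stray flag bits, so the caller would jump through a garbage pointer.
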
| 2709 | Once registered, there is no mechanism to switch these hooks off, so if |
| 2710 | that is necessary you will need to do this yourself. An entry in C<%^H> |
| 2711 | is probably the best way, so the effect is lexically scoped; however it |
| 2712 | is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to |
| 2713 | temporarily switch entries on and off. You should also be aware that |
| 2714 | generally speaking at least one scope will have opened before your |
| 2715 | extension is loaded, so you will see some C<pre>/C<post_end> pairs that |
| 2716 | didn't have a matching C<start>. |
| 2717 | |
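Because some C<pre>/C<post_end> calls arrive without a matching C<start>, any bookkeeping that pairs them up has to tolerate the imbalance. The following is a minimal standalone sketch of such bookkeeping. The names C<model_scope_start> and C<model_scope_post_end> are hypothetical stand-ins for the bodies of real hook functions, and nothing here calls the Perl API.

```c
#include <assert.h>

/* Depth of scopes whose start this extension actually saw. */
int model_depth = 0;
/* Ends seen for scopes that were opened before registration. */
int model_unmatched_ends = 0;

/* Would be the body of a start hook. */
void
model_scope_start(int full)
{
    (void)full;               /* not needed for counting */
    model_depth++;
}

/* Would be the body of a post_end hook. */
void
model_scope_post_end(void)
{
    if (model_depth == 0) {
        /* The scope was opened before we were loaded. */
        model_unmatched_ends++;
        return;
    }
    model_depth--;
    assert(model_depth >= 0);
}
```

Since scopes nest properly, an end that belongs to a scope opened before registration can only arrive once every scope the extension saw has closed, which is why testing for a depth of zero is enough in this model.
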
| 2718 | =head1 Examining internal data structures with the C<dump> functions |
| 2719 | |
| 2720 | To aid debugging, the source file F<dump.c> contains a number of |
| 2721 | functions which produce formatted output of internal data structures. |
| 2722 | |
| 2723 | The most commonly used of these functions is C<Perl_sv_dump>; it's used |
| 2724 | for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls |
| 2725 | C<sv_dump> to produce debugging output from Perl-space, so users of that |
| 2726 | module should already be familiar with its format. |
| 2727 | |
| 2728 | C<Perl_op_dump> can be used to dump an C<OP> structure or any of its |
| 2729 | derivatives, and produces output similar to C<perl -Dx>; in fact, |
| 2730 | C<Perl_dump_eval> will dump the main root of the code being evaluated, |
| 2731 | exactly like C<-Dx>. |
| 2732 | |
| 2733 | =for apidoc_section $debugging |
| 2734 | =for apidoc dump_eval |
| 2735 | |
| 2736 | Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an |
| 2737 | op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the |
| 2738 | subroutines in a package like so: (Thankfully, these are all xsubs, so |
| 2739 | there is no op tree) |
| 2740 | |
| 2741 | =for apidoc_section $debugging |
| 2742 | =for apidoc dump_sub |
| 2743 | |
| 2744 | (gdb) print Perl_dump_packsubs(PL_defstash) |
| 2745 | |
| 2746 | SUB attributes::bootstrap = (xsub 0x811fedc 0) |
| 2747 | |
| 2748 | SUB UNIVERSAL::can = (xsub 0x811f50c 0) |
| 2749 | |
| 2750 | SUB UNIVERSAL::isa = (xsub 0x811f304 0) |
| 2751 | |
| 2752 | SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) |
| 2753 | |
| 2754 | SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) |
| 2755 | |
| 2756 | and C<Perl_dump_all>, which dumps all the subroutines in the stash and |
| 2757 | the op tree of the main root. |
| 2758 | |
| 2759 | =head1 How multiple interpreters and concurrency are supported |
| 2760 | |
| 2761 | =head2 Background and MULTIPLICITY |
| 2762 | |
| 2763 | =for apidoc_section $concurrency |
| 2764 | =for apidoc Amnh||PERL_IMPLICIT_CONTEXT |
| 2765 | |
| 2766 | The Perl interpreter can be regarded as a closed box: it has an API |
| 2767 | for feeding it code or otherwise making it do things, but it also has |
| 2768 | functions for its own use. This smells a lot like an object, and |
| 2769 | there is a way for you to build Perl so that you can have multiple |
| 2770 | interpreters, with one interpreter represented either as a C structure, |
| 2771 | or inside a thread-specific structure. These structures contain all |
| 2772 | the context, the state of that interpreter. |
| 2773 | |
| 2774 | The macro that controls the major Perl build flavor is MULTIPLICITY. The |
| 2775 | MULTIPLICITY build has a C structure that packages all the interpreter |
| 2776 | state, which is passed to various perl functions as a "hidden" |
| 2777 | first argument. MULTIPLICITY makes multi-threaded perls possible (with the |
| 2778 | ithreads threading model, related to the macro USE_ITHREADS). |
| 2779 | |
| 2780 | PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY. |
| 2781 | |
| 2782 | =for apidoc_section $concurrency |
| 2783 | =for apidoc Amnh||MULTIPLICITY |
| 2784 | |
| 2785 | To see whether you have non-const data you can use a BSD (or GNU) |
| 2786 | compatible C<nm>: |
| 2787 | |
| 2788 | nm libperl.a | grep -v ' [TURtr] ' |
| 2789 | |
| 2790 | If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>), |
| 2791 | you have non-const data. The symbols the C<grep> removed are as follows: |
| 2792 | C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data, |
| 2793 | and the C<U> is I<undefined>, external symbols referred to. |
| 2794 | |
| 2795 | The test F<t/porting/libperl.t> does this kind of symbol sanity |
| 2796 | checking on C<libperl.a>. |
| 2797 | |
| 2798 | All this obviously requires a way for the Perl internal functions to be |
| 2799 | either subroutines taking some kind of structure as the first |
| 2800 | argument, or subroutines taking nothing as the first argument. To |
| 2801 | enable these two very different ways of building the interpreter, |
| 2802 | the Perl source (as it does in so many other situations) makes heavy |
| 2803 | use of macros and subroutine naming conventions. |
| 2804 | |
| 2805 | First problem: deciding which functions will be public API functions and |
| 2806 | which will be private. All functions whose names begin C<S_> are private |
| 2807 | (think "S" for "secret" or "static"). All other functions begin with |
| 2808 | "Perl_", but just because a function begins with "Perl_" does not mean it is |
| 2809 | part of the API. (See L</Internal |
| 2810 | Functions>.) The easiest way to be B<sure> a |
| 2811 | function is part of the API is to find its entry in L<perlapi>. |
| 2812 | If it exists in L<perlapi>, it's part of the API. If it doesn't, and you |
| 2813 | think it should be (i.e., you need it for your extension), submit an issue at |
| 2814 | L<https://github.com/Perl/perl5/issues> explaining why you think it should be. |
| 2815 | |
| 2816 | Second problem: there must be a syntax so that the same subroutine |
| 2817 | declarations and calls can pass a structure as their first argument, |
| 2818 | or pass nothing. To solve this, the subroutines are named and |
| 2819 | declared in a particular way. Here's a typical start of a static |
| 2820 | function used within the Perl guts: |
| 2821 | |
| 2822 | STATIC void |
| 2823 | S_incline(pTHX_ char *s) |
| 2824 | |
| 2825 | STATIC becomes "static" in C, and may be #define'd to nothing in some |
| 2826 | configurations in the future. |
| 2827 | |
| 2828 | =for apidoc_section $directives |
| 2829 | =for apidoc Ayh||STATIC |
| 2830 | |
| 2831 | A public function (i.e. part of the internal API, but not necessarily |
| 2832 | sanctioned for use in extensions) begins like this: |
| 2833 | |
| 2834 | void |
| 2835 | Perl_sv_setiv(pTHX_ SV* dsv, IV num) |
| 2836 | |
| 2837 | C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the |
| 2838 | details of the interpreter's context. THX stands for "thread", "this", |
| 2839 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) |
| 2840 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, |
| 2841 | or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and |
| 2842 | their variants. |
| 2843 | |
| 2844 | =for apidoc_section $concurrency |
| 2845 | =for apidoc Amnh||aTHX |
| 2846 | =for apidoc Amnh||aTHX_ |
| 2847 | =for apidoc Amnh||dTHX |
| 2848 | =for apidoc Amnh||pTHX |
| 2849 | =for apidoc Amnh||pTHX_ |
| 2850 | |
| 2851 | When Perl is built without options that set MULTIPLICITY, there is no |
| 2852 | first argument containing the interpreter's context. The trailing underscore |
| 2853 | in the pTHX_ macro indicates that the macro expansion needs a comma |
| 2854 | after the context argument because other arguments follow it. If |
| 2855 | MULTIPLICITY is not defined, pTHX_ will be ignored, and the |
| 2856 | subroutine is not prototyped to take the extra argument. The form of the |
| 2857 | macro without the trailing underscore is used when there are no additional |
| 2858 | explicit arguments. |
| 2859 | |
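The effect of the underscore and non-underscore forms can be imitated in a few lines of ordinary C. The sketch below deliberately uses made-up names (C<pMODEL>, C<pMODEL_>, C<aMODEL>, C<aMODEL_>, C<MODEL_MULTIPLICITY>, C<struct model_interp>) rather than the real definitions from F<perl.h>, so it compiles on its own; it only illustrates the convention.

```c
#include <assert.h>

/* Set this to 0 to model a build without a context argument. */
#ifndef MODEL_MULTIPLICITY
#define MODEL_MULTIPLICITY 1
#endif

struct model_interp { long last_iv; };

#if MODEL_MULTIPLICITY
#  define pMODEL      struct model_interp *my_perl  /* prototype, alone   */
#  define pMODEL_     pMODEL,                       /* prototype, + args  */
#  define aMODEL      my_perl                       /* argument, alone    */
#  define aMODEL_     aMODEL,                       /* argument, + args   */
#  define MODEL_STATE (my_perl)
#else
struct model_interp model_global;                   /* the only state     */
#  define pMODEL      void
#  define pMODEL_
#  define aMODEL
#  define aMODEL_
#  define MODEL_STATE (&model_global)
#endif

/* Explicit arguments follow, so the underscore form supplies the comma. */
void
model_set_iv(pMODEL_ long *dst, long value)
{
    assert(dst != 0);
    *dst = value;
    MODEL_STATE->last_iv = value;
}

/* No explicit arguments: use the form without the underscore. */
long
model_last_iv(pMODEL)
{
    return MODEL_STATE->last_iv;
}

/* A caller that already has the context passes it straight through. */
long
model_demo(pMODEL_ long value)
{
    long slot = 0;

    model_set_iv(aMODEL_ &slot, value);
    return slot + model_last_iv(aMODEL);
}
```

With C<MODEL_MULTIPLICITY> set to 1, C<model_demo> takes the context as its first parameter. With it set to 0, the same source declares C<model_demo(long value)> and every call site loses the leading argument without being edited.
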
| 2860 | When a core function calls another, it must pass the context. This |
| 2861 | is normally hidden via macros. Consider C<sv_setiv>. It expands into |
| 2862 | something like this: |
| 2863 | |
| 2864 | #ifdef MULTIPLICITY |
| 2865 | #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) |
| 2866 | /* can't do this for vararg functions, see below */ |
| 2867 | #else |
| 2868 | #define sv_setiv Perl_sv_setiv |
| 2869 | #endif |
| 2870 | |
| 2871 | This works well, and means that XS authors can gleefully write: |
| 2872 | |
| 2873 | sv_setiv(foo, bar); |
| 2874 | |
| 2875 | and still have it work under all the modes Perl could have been |
| 2876 | compiled with. |
| 2877 | |
| 2878 | This doesn't work so cleanly for varargs functions, though, as macros |
| 2879 | imply that the number of arguments is known in advance. Instead we |
| 2880 | either need to spell them out fully, passing C<aTHX_> as the first |
| 2881 | argument (the Perl core tends to do this with functions like |
| 2882 | Perl_warner), or use a context-free version. |
| 2883 | |
| 2884 | The context-free version of Perl_warner is called |
| 2885 | Perl_warner_nocontext, and does not take the extra argument. Instead |
| 2886 | it does C<dTHX;> to get the context from thread-local storage. We |
| 2887 | C<#define warner Perl_warner_nocontext> so that extensions get source |
| 2888 | compatibility at the expense of performance. (Passing an arg is |
| 2889 | cheaper than grabbing it from thread-local storage.) |
| 2890 | |
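The split between a context-taking variadic function and its context-free twin can be sketched in standalone C. Everything below is a hypothetical model (C<model_warner>, C<model_warner_nocontext>, C<model_current>), not the real C<Perl_warner> family: the "context" is a plain struct and the thread-local lookup is replaced by reading a global pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct model_interp { char last_warning[128]; };

/* Stand-in for the per-thread "current interpreter" slot. */
struct model_interp *model_current = NULL;

/* Shared worker: formats into the given context. */
void
model_vwarner(struct model_interp *my_perl, const char *fmt, va_list args)
{
    assert(my_perl != NULL);
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning,
              fmt, args);
}

/* Context-taking form: the caller spells the context out. */
void
model_warner(struct model_interp *my_perl, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    model_vwarner(my_perl, fmt, args);
    va_end(args);
}

/* Context-free form: looks the context up instead of receiving it. */
void
model_warner_nocontext(const char *fmt, ...)
{
    struct model_interp *my_perl = model_current;  /* the lookup step */
    va_list args;

    va_start(args, fmt);
    model_vwarner(my_perl, fmt, args);
    va_end(args);
}
```

Both entry points end up in the same worker; the only difference is where the context comes from, which is the cost being traded for source compatibility.
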
| 2891 | You can ignore [pad]THXx when browsing the Perl headers/sources. |
| 2892 | Those are strictly for use within the core. Extensions and embedders |
| 2893 | need only be aware of [pad]THX. |
| 2894 | |
| 2895 | =head2 So what happened to dTHR? |
| 2896 | |
| 2897 | =for apidoc_section $concurrency |
| 2898 | =for apidoc Amnh||dTHR |
| 2899 | |
| 2900 | C<dTHR> was introduced in perl 5.005 to support the older thread model. |
| 2901 | The older thread model now uses the C<THX> mechanism to pass context |
| 2902 | pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and |
| 2903 | later still have it for backward source compatibility, but it is defined |
| 2904 | to be a no-op. |
| 2905 | |
| 2906 | =head2 How do I use all this in extensions? |
| 2907 | |
| 2908 | When Perl is built with MULTIPLICITY, extensions that call |
| 2909 | any functions in the Perl API will need to pass the initial context |
| 2910 | argument somehow. The kicker is that you will need to write it in |
| 2911 | such a way that the extension still compiles when Perl hasn't been |
| 2912 | built with MULTIPLICITY enabled. |
| 2913 | |
| 2914 | There are three ways to do this. First, the easy but inefficient way, |
| 2915 | which is also the default, in order to maintain source compatibility |
| 2916 | with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX |
| 2917 | and aTHX_ macros to call a function that will return the context. |
| 2918 | Thus, something like: |
| 2919 | |
| 2920 | sv_setiv(sv, num); |
| 2921 | |
| 2922 | in your extension will translate to this when MULTIPLICITY is |
| 2923 | in effect: |
| 2924 | |
| 2925 | Perl_sv_setiv(Perl_get_context(), sv, num); |
| 2926 | |
| 2927 | or to this otherwise: |
| 2928 | |
| 2929 | Perl_sv_setiv(sv, num); |
| 2930 | |
| 2931 | You don't have to do anything new in your extension to get this; since |
| 2932 | the Perl library provides Perl_get_context(), it will all just |
| 2933 | work. |
| 2934 | |
| 2935 | The second, more efficient way is to use the following template for |
| 2936 | your Foo.xs: |
| 2937 | |
| 2938 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
| 2939 | #include "EXTERN.h" |
| 2940 | #include "perl.h" |
| 2941 | #include "XSUB.h" |
| 2942 | |
| 2943 | STATIC void my_private_function(int arg1, int arg2); |
| 2944 | |
| 2945 | STATIC void |
| 2946 | my_private_function(int arg1, int arg2) |
| 2947 | { |
| 2948 | dTHX; /* fetch context */ |
| 2949 | ... call many Perl API functions ... |
| 2950 | } |
| 2951 | |
| 2952 | [... etc ...] |
| 2953 | |
| 2954 | MODULE = Foo PACKAGE = Foo |
| 2955 | |
| 2956 | /* typical XSUB */ |
| 2957 | |
| 2958 | void |
| 2959 | my_xsub(arg) |
| 2960 | int arg |
| 2961 | CODE: |
| 2962 | my_private_function(arg, 10); |
| 2963 | |
| 2964 | Note that the only two changes from the normal way of writing an |
| 2965 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before |
| 2966 | including the Perl headers, followed by a C<dTHX;> declaration at |
| 2967 | the start of every function that will call the Perl API. (You'll |
| 2968 | know which functions need this, because the C compiler will complain |
| 2969 | that there's an undeclared identifier in those functions.) No changes |
| 2970 | are needed for the XSUBs themselves, because the XS() macro is |
| 2971 | correctly defined to pass in the implicit context if needed. |
| 2972 | |
| 2973 | =for apidoc_section $concurrency |
| 2974 | =for apidoc AmnhU#||PERL_NO_GET_CONTEXT |
| 2975 | |
| 2976 | The third, even more efficient way is to ape how it is done within |
| 2977 | the Perl guts: |
| 2978 | |
| 2979 | |
| 2980 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
| 2981 | #include "EXTERN.h" |
| 2982 | #include "perl.h" |
| 2983 | #include "XSUB.h" |
| 2984 | |
| 2985 | /* pTHX_ only needed for functions that call Perl API */ |
| 2986 | STATIC void my_private_function(pTHX_ int arg1, int arg2); |
| 2987 | |
| 2988 | STATIC void |
| 2989 | my_private_function(pTHX_ int arg1, int arg2) |
| 2990 | { |
| 2991 | /* dTHX; not needed here, because THX is an argument */ |
| 2992 | ... call Perl API functions ... |
| 2993 | } |
| 2994 | |
| 2995 | [... etc ...] |
| 2996 | |
| 2997 | MODULE = Foo PACKAGE = Foo |
| 2998 | |
| 2999 | /* typical XSUB */ |
| 3000 | |
| 3001 | void |
| 3002 | my_xsub(arg) |
| 3003 | int arg |
| 3004 | CODE: |
| 3005 | my_private_function(aTHX_ arg, 10); |
| 3006 | |
| 3007 | This implementation never has to fetch the context using a function |
| 3008 | call, since it is always passed as an extra argument. Depending on |
| 3009 | your needs for simplicity or efficiency, you may mix the previous |
| 3010 | two approaches freely. |
| 3011 | |
| 3012 | Never add a comma after C<pTHX> yourself--always use the form of the |
| 3013 | macro with the underscore for functions that take explicit arguments, |
| 3014 | or the form without the underscore for functions with no explicit arguments. |
| 3015 | |
| 3016 | =head2 Should I do anything special if I call perl from multiple threads? |
| 3017 | |
| 3018 | If you create interpreters in one thread and then proceed to call them in |
| 3019 | another, you need to make sure perl's own Thread Local Storage (TLS) slot is |
| 3020 | initialized correctly in each of those threads. |
| 3021 | |
| 3022 | The C<perl_alloc> and C<perl_clone> API functions will automatically set |
| 3023 | the TLS slot to the interpreter they created, so that there is no need to do |
| 3024 | anything special if the interpreter is always accessed in the same thread that |
| 3025 | created it, and that thread did not create or call any other interpreters |
| 3026 | afterwards. If that is not the case, you have to set the TLS slot of the |
| 3027 | thread before calling any functions in the Perl API on that particular |
| 3028 | interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that |
| 3029 | thread as the first thing you do: |
| 3030 | |
| 3031 | /* do this before doing anything else with some_perl */ |
| 3032 | PERL_SET_CONTEXT(some_perl); |
| 3033 | |
| 3034 | ... other Perl API calls on some_perl go here ... |
| 3035 | |
| 3036 | =for apidoc_section $embedding |
| 3037 | =for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i |
| 3038 | |
| 3039 | (You can always get the current context via C<PERL_GET_CONTEXT>.) |
| 3040 | |
| 3041 | =for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT| |
| 3042 | |
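Why the slot has to be reset is perhaps clearest in a toy model. The sketch below is standalone C with hypothetical names (C<model_alloc>, C<MODEL_SET_CONTEXT>, C<MODEL_GET_CONTEXT>, C<model_current_id>). A single static pointer stands in for the per-thread slot, so it shows only the "created another interpreter afterwards" case, not real threading.

```c
#include <assert.h>
#include <stddef.h>

struct model_interp { int id; };

/* One slot per thread in a real build; a single static in this model. */
struct model_interp *model_slot = NULL;

#define MODEL_SET_CONTEXT(i) (model_slot = (i))
#define MODEL_GET_CONTEXT    (model_slot)

/* As described above for the allocation functions, creating an
   interpreter also points the slot at it. */
struct model_interp *
model_alloc(struct model_interp *storage, int id)
{
    assert(storage != NULL);
    storage->id = id;
    MODEL_SET_CONTEXT(storage);
    return storage;
}

/* An API-like call that trusts whatever the slot currently holds. */
int
model_current_id(void)
{
    assert(MODEL_GET_CONTEXT != NULL);
    return MODEL_GET_CONTEXT->id;
}
```

After allocating interpreter 1 and then interpreter 2, C<model_current_id()> reports 2. Work intended for interpreter 1 would silently run against the wrong state until C<MODEL_SET_CONTEXT> is called with the first interpreter again.
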
| 3043 | =head2 Future Plans and PERL_IMPLICIT_SYS |
| 3044 | |
| 3045 | Just as MULTIPLICITY provides a way to bundle up everything |
| 3046 | that the interpreter knows about itself and pass it around, so too are |
| 3047 | there plans to allow the interpreter to bundle up everything it knows |
| 3048 | about the environment it's running on. This is enabled with the |
| 3049 | PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on |
| 3050 | Windows. |
| 3051 | |
| 3052 | This provides an extra pointer (called the "host" |
| 3053 | environment) for all the system calls. This makes it possible for |
| 3054 | the system-level code to maintain its own state, broken down into |
| 3055 | seven C structures. These are thin wrappers around the usual system |
| 3056 | calls (see F<win32/perllib.c>) for the default perl executable, but for a |
| 3057 | more ambitious host (like the one that would do fork() emulation) all |
| 3058 | the extra work needed to pretend that different interpreters are |
| 3059 | actually different "processes", would be done here. |
| 3060 | |
| 3061 | The Perl engine/interpreter and the host are orthogonal entities. |
| 3062 | There could be one or more interpreters in a process, and one or |
| 3063 | more "hosts", with free association between them. |
| 3064 | |
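The idea of routing system calls through a host pointer can be pictured with a very small model. This is only an illustration under stated assumptions: C<struct model_host> and the functions below are hypothetical, show a single environment-lookup entry rather than the seven structures mentioned above, and are not taken from the Windows sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_host;
typedef const char *(*model_getenv_fn)(struct model_host *host,
                                       const char *name);

struct model_host {
    model_getenv_fn getenv_fn;   /* the "system call" entry point   */
    const char *private_name;    /* per-host state (hypothetical)   */
    const char *private_value;
};

/* Thin wrapper: simply defers to the C library. */
const char *
model_thin_getenv(struct model_host *host, const char *name)
{
    (void)host;
    return getenv(name);
}

/* More ambitious host: answers from its own state first. */
const char *
model_private_getenv(struct model_host *host, const char *name)
{
    if (host->private_name && strcmp(host->private_name, name) == 0)
        return host->private_value;
    return getenv(name);
}

/* Every lookup goes through the host pointer. */
const char *
model_host_getenv(struct model_host *host, const char *name)
{
    assert(host != NULL && host->getenv_fn != NULL);
    return host->getenv_fn(host, name);
}
```

Two hosts in the same process can then give different answers to the same question, which is the property a C<fork()> emulation would need in order to make interpreters look like separate processes.
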
| 3065 | =head1 Internal Functions |
| 3066 | |
| 3067 | All of Perl's internal functions which will be exposed to the outside |
| 3068 | world are prefixed by C<Perl_> so that they will not conflict with XS |
| 3069 | functions or functions used in a program in which Perl is embedded. |
| 3070 | Similarly, all global variables begin with C<PL_>. (By convention, |
| 3071 | static functions start with C<S_>.) |
| 3072 | |
| 3073 | Inside the Perl core (C<PERL_CORE> defined), you can get at the functions |
| 3074 | either with or without the C<Perl_> prefix, thanks to a bunch of defines |
| 3075 | that live in F<embed.h>. Note that extension code should I<not> set |
| 3076 | C<PERL_CORE>; this exposes the full perl internals, and is likely to cause |
| 3077 | breakage of the XS in each new perl release. |
| 3078 | |
| 3079 | The file F<embed.h> is generated automatically from |
| 3080 | F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping |
| 3081 | header files for the internal functions, generates the documentation |
| 3082 | and a lot of other bits and pieces. It's important that when you add |
| 3083 | a new function to the core or change an existing one, you change the |
| 3084 | data in the table in F<embed.fnc> as well. Here's a sample entry from |
| 3085 | that table: |
| 3086 | |
| 3087 | Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval |
| 3088 | |
| 3089 | The first column is a set of flags, the second column the return type, |
| 3090 | the third column the name. Columns after that are the arguments. |
| 3091 | The flags are documented at the top of F<embed.fnc>. |
| 3092 | |
| 3093 | If you edit F<embed.pl> or F<embed.fnc>, you will need to run |
| 3094 | C<make regen_headers> to force a rebuild of F<embed.h> and other |
| 3095 | auto-generated files. |
| 3096 | |
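The column layout of an entry can be made concrete with a short splitter. This is a standalone sketch, not the logic of F<embed.pl>: C<model_split_entry> is a hypothetical helper that only splits one line on C<|> and trims blanks. It ignores comments, continuation lines and the meaning of the flags.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FIELDS 16

/* Split one entry in place on '|' and trim blanks around each field.
   Stores pointers into buf in fields[] and returns the field count. */
size_t
model_split_entry(char *buf, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = buf;

    assert(buf != NULL && fields != NULL);
    while (n < max_fields) {
        char *bar = strchr(p, '|');
        char *end;

        if (bar)
            *bar = '\0';                       /* cut at the separator */
        while (isspace((unsigned char)*p))     /* trim leading blanks  */
            p++;
        end = p + strlen(p);
        while (end > p && isspace((unsigned char)end[-1]))
            *--end = '\0';                     /* trim trailing blanks */
        fields[n++] = p;
        if (!bar)
            break;                             /* that was the last one */
        p = bar + 1;
    }
    return n;
}
```

Applied to the sample entry above, the split gives six fields: the flags C<Apd>, the return type C<SV**>, the name C<av_fetch>, and one field per argument.
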
| 3097 | =head2 Formatted Printing of IVs, UVs, and NVs |
| 3098 | |
| 3099 | If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style |
| 3100 | formatting codes like C<%d>, C<%ld>, C<%f>, you should use the |
| 3101 | following macros for portability: |
| 3102 | |
| 3103 | IVdf IV in decimal |
| 3104 | UVuf UV in decimal |
| 3105 | UVof UV in octal |
| 3106 | UVxf UV in hexadecimal |
| 3107 | NVef NV %e-like |
| 3108 | NVff NV %f-like |
| 3109 | NVgf NV %g-like |
| 3110 | |
| 3111 | These will take care of 64-bit integers and long doubles. |
| 3112 | For example: |
| 3113 | |
| 3114 | printf("IV is %" IVdf "\n", iv); |
| 3115 | |
| 3116 | The C<IVdf> will expand to whatever is the correct format for the IVs. |
| 3117 | Note that the spaces are required around the format in case the code is |
| 3118 | compiled with C++, to maintain compliance with its standard. |
| 3119 | |
| 3120 | Note that there are different "long doubles": Perl will use |
| 3121 | whatever the compiler has. |
| 3122 | |
| 3123 | If you are printing addresses of pointers, use %p or UVxf combined |
| 3124 | with PTR2UV(). |
| 3125 | |
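The mechanism behind these macros is plain string-literal concatenation, which can be shown without any Perl headers. In the sketch below C<model_iv>, C<model_uv>, C<MODEL_IVdf> and C<MODEL_UVxf> are hypothetical stand-ins fixed to C<long long>. A real build picks the underlying type and the matching format at configuration time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: one integer type and its format letters. */
typedef long long          model_iv;
typedef unsigned long long model_uv;
#define MODEL_IVdf "lld"
#define MODEL_UVxf "llx"

/* The macro is spliced into the literal by the compiler, so the call
   site never names the underlying C type. */
int
model_format_iv(char *buf, size_t size, model_iv iv)
{
    assert(buf != NULL);
    return snprintf(buf, size, "IV is %" MODEL_IVdf "\n", iv);
}

/* Same idea for an unsigned value printed in hexadecimal. */
int
model_format_uv_hex(char *buf, size_t size, model_uv uv)
{
    assert(buf != NULL);
    return snprintf(buf, size, "UV is 0x%" MODEL_UVxf "\n", uv);
}
```

If the typedef were changed to a different width, only the two macro definitions would need to follow; the format strings at the call sites stay as written.
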
| 3126 | =head2 Formatted Printing of SVs |
| 3127 | |
| 3128 | The contents of SVs may be printed using the C<SVf> format, like so: |
| 3129 | |
| 3130 | Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg)) |
| 3131 | |
| 3132 | where C<err_msg> is an SV. |
| 3133 | |
| 3134 | =for apidoc_section $io_formats |
| 3135 | =for apidoc Amnh||SVf |
| 3136 | =for apidoc Amh||SVfARG|SV *sv |
| 3137 | |
| 3138 | Not all scalar types are printable. Simple values certainly are: one of |
| 3139 | IV, UV, NV, or PV. Also, if the SV is a reference to some value, |
| 3140 | either it will be dereferenced and the value printed, or information |
| 3141 | about the type of that value and its address are displayed. The results |
| 3142 | of printing any other type of SV are undefined and likely to lead to an |
| 3143 | interpreter crash. NVs are printed using a C<%g>-ish format. |
| 3144 | |
| 3145 | Note that the spaces are required around the C<SVf> in case the code is |
| 3146 | compiled with C++, to maintain compliance with its standard. |
| 3147 | |
| 3148 | Note that any filehandle being printed to under UTF-8 must be expecting |
| 3149 | UTF-8 in order to get good results and avoid Wide-character warnings. |
| 3150 | One way to do this for typical filehandles is to invoke perl with the |
| 3151 | C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.) |
| 3152 | |
| 3153 | You can use this to concatenate two scalars: |
| 3154 | |
| 3155 | SV *var1 = get_sv("var1", GV_ADD); |
| 3156 | SV *var2 = get_sv("var2", GV_ADD); |
| 3157 | SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf, |
| 3158 | SVfARG(var1), SVfARG(var2)); |
| 3159 | |
| 3160 | =for apidoc Amnh||SVf_QUOTEDPREFIX |
| 3161 | |
| 3162 | C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the |
| 3163 | number of the characters printed, showing at most the first |
| 3164 | C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with |
| 3165 | double quotes and with the contents escaped using double quoted string |
| 3166 | escaping rules. If the string is longer than this then ellipses "..." |
| 3167 | will be appended after the trailing quote. This is intended for error |
| 3168 | messages where the string is assumed to be a class name. |
| 3169 | |
| 3170 | =for apidoc Amnh||HvNAMEf |
| 3171 | =for apidoc Amnh||HvNAMEf_QUOTEDPREFIX |
| 3172 | |
| 3173 | C<HvNAMEf> and C<HvNAMEf_QUOTEDPREFIX> are similar to C<SVf> except they |
| 3174 | extract the string, length and utf8 flags from the argument using the |
| 3175 | C<HvNAME()>, C<HvNAMELEN()>, C<HvNAMEUTF8()> macros. This is intended |
| 3176 | for stringifying a class name directly from a stash HV. |
| 3177 | |
| 3178 | =head2 Formatted Printing of Strings |
| 3179 | |
| 3180 | If you just want the bytes printed in a 7bit NUL-terminated string, you can |
| 3181 | just use C<%s> (assuming they are all really only 7bit). But if there is a |
| 3182 | possibility the value will be encoded as UTF-8 or contains bytes above |
| 3183 | C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format. |
| 3184 | And as its parameter, use the C<UTF8fARG()> macro: |
| 3185 | |
| 3186 | char * msg; |
| 3187 | |
| 3188 | /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK |
| 3189 | U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */ |
| 3190 | if (can_utf8) |
| 3191 | msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99"; |
| 3192 | else |
| 3193 | msg = "'Uses simple quotes'"; |
| 3194 | |
| 3195 | Perl_croak(aTHX_ "The message is: %" UTF8f "\n", |
| 3196 | UTF8fARG(can_utf8, strlen(msg), msg)); |
| 3197 | |
| 3198 | The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in |
| 3199 | UTF-8; 0 if the string is in native byte encoding (Latin1). |
| 3200 | The second parameter is the number of bytes in the string to print. |
| 3201 | And the third and final parameter is a pointer to the first byte in the |
| 3202 | string. |
| 3203 | |
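That the second parameter counts bytes rather than characters matters as soon as the string holds multi-byte sequences. The following standalone sketch compares the two counts. C<model_utf8_chars> is a hypothetical helper that assumes well-formed UTF-8 on an ASCII platform; it is not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count characters in a buffer assumed to hold well-formed UTF-8:
   every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t
model_utf8_chars(const char *s, size_t byte_len)
{
    size_t chars = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < byte_len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

For the fancy-quotes message in the example above, C<strlen> reports 23 bytes while this helper counts 19 characters. Passing the character count as the length would cut the output short in the middle of the string.
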
| 3204 | Note that any filehandle being printed to under UTF-8 must be expecting |
| 3205 | UTF-8 in order to get good results and avoid Wide-character warnings. |
| 3206 | One way to do this for typical filehandles is to invoke perl with the |
| 3207 | C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.) |
| 3208 | |
| 3209 | =for apidoc_section $io_formats |
| 3210 | =for apidoc Amnh||UTF8f |
| 3211 | Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose |
| 3212 | the arguments for this format. |
| 3213 | =for apidoc Amnh||UTF8f_QUOTEDPREFIX |
| 3214 | Same as C<UTF8f> but the output is quoted, escaped and length limited. |
| 3215 | See C<SVf_QUOTEDPREFIX> for more details on escaping. |
| 3216 | =for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str |
| 3217 | |
| 3218 | =cut |
| 3219 | |
| 3220 | =head2 Formatted Printing of C<Size_t> and C<SSize_t> |
| 3221 | |
| 3222 | The most general way to do this is to cast them to a UV or IV, and |
| 3223 | print as in the |
| 3224 | L<previous section|/Formatted Printing of IVs, UVs, and NVs>. |
| 3225 | |
| 3226 | But if you're using C<PerlIO_printf()>, it's less typing and visual |
| 3227 | clutter to use the C<%z> length modifier (for I<siZe>): |
| 3228 | |
| 3229 | PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len); |
| 3230 | |
| 3231 | This modifier is not portable, so its use should be restricted to |
| 3232 | C<PerlIO_printf()>. |
| 3233 | |
| 3234 | =head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes |
| 3235 | |
| 3236 | There are modifiers for these special situations if you are using |
| 3237 | C<PerlIO_printf()>. See L<perlfunc/size>. |
| 3238 | |
| 3239 | =head2 Pointer-To-Integer and Integer-To-Pointer |
| 3240 | |
| 3241 | Because pointer size does not necessarily equal integer size, |
| 3242 | use the following macros to do it right. |
| 3243 | |
| 3244 | PTR2UV(pointer) |
| 3245 | PTR2IV(pointer) |
| 3246 | PTR2NV(pointer) |
| 3247 | INT2PTR(pointertotype, integer) |
| 3248 | |
| 3249 | =for apidoc_section $casting |
| 3250 | =for apidoc Amh|type|INT2PTR|type|int value |
| 3251 | =for apidoc Amh|UV|PTR2UV|void * ptr |
| 3252 | =for apidoc Amh|IV|PTR2IV|void * ptr |
| 3253 | =for apidoc Amh|NV|PTR2NV|void * ptr |
| 3254 | |
| 3255 | For example: |
| 3256 | |
| 3257 | IV iv = ...; |
| 3258 | SV *sv = INT2PTR(SV*, iv); |
| 3259 | |
| 3260 | and |
| 3261 | |
| 3262 | AV *av = ...; |
| 3263 | UV uv = PTR2UV(av); |
| 3264 | |
| 3265 | There are also |
| 3266 | |
| 3267 | PTR2nat(pointer) /* pointer to integer of PTRSIZE */ |
| 3268 | PTR2ul(pointer) /* pointer to unsigned long */ |
| 3269 | |
| 3270 | =for apidoc Amh|IV|PTR2nat|void * |
| 3271 | =for apidoc Amh|unsigned long|PTR2ul|void * |
| 3272 | |
| 3273 | And C<PTRV> which gives the native type for an integer the same size as |
| 3274 | pointers, such as C<unsigned> or C<unsigned long>. |
| 3275 | |
| 3276 | =for apidoc Ayh|type|PTRV |
| 3277 | |
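The round trip these macros are meant to make safe can be demonstrated in standalone C. The names C<MODEL_PTR2UV>, C<MODEL_INT2PTR>, C<model_uv> and C<struct model_av> are hypothetical, built on the standard C<uintptr_t> type. The real macros choose their integer types at configuration time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins built on the pointer-sized integer type. */
typedef uintptr_t model_uv;
#define MODEL_PTR2UV(p)        ((model_uv)(p))
#define MODEL_INT2PTR(type, i) ((type)(model_uv)(i))

struct model_av { int fill; };

/* Stash a pointer in an integer and recover it again. */
struct model_av *
model_round_trip(struct model_av *av)
{
    model_uv as_int = MODEL_PTR2UV(av);
    struct model_av *back = MODEL_INT2PTR(struct model_av *, as_int);

    assert(back == av);      /* nothing was lost on the way */
    return back;
}
```

The cast through an integer type that is at least as wide as a pointer is the whole point: a direct cast to a narrower C<int> could silently drop the upper bits on platforms where pointers are wider.
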
| 3278 | =head2 Exception Handling |
| 3279 | |
| 3280 | There are a couple of macros to do very basic exception handling in XS |
| 3281 | modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to |
| 3282 | be able to use these macros: |
| 3283 | |
| 3284 | #define NO_XSLOCKS |
| 3285 | #include "XSUB.h" |
| 3286 | |
| 3287 | You can use these macros if you call code that may croak, but you need |
| 3288 | to do some cleanup before giving control back to Perl. For example: |
| 3289 | |
| 3290 | dXCPT; /* set up necessary variables */ |
| 3291 | |
| 3292 | XCPT_TRY_START { |
| 3293 | code_that_may_croak(); |
| 3294 | } XCPT_TRY_END |
| 3295 | |
| 3296 | XCPT_CATCH |
| 3297 | { |
| 3298 | /* do cleanup here */ |
| 3299 | XCPT_RETHROW; |
| 3300 | } |
| 3301 | |
| 3302 | Note that you always have to rethrow an exception that has been |
| 3303 | caught. Using these macros, it is not possible to just catch the |
| 3304 | exception and ignore it. If you have to ignore the exception, you |
| 3305 | have to use one of the C<call_*> functions. |
| 3306 | |
| 3307 | The advantage of using the above macros is that you don't have |
| 3308 | to set up an extra function for C<call_*>, and that using these |
| 3309 | macros is faster than using C<call_*>. |
| 3310 | |
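The catch, clean up, rethrow shape of this pattern can be reproduced with the standard C<setjmp>/C<longjmp> pair. This is a standalone model under stated assumptions: Perl's macros use the interpreter's own jump environment, whereas here a hypothetical global C<model_top> tracks the innermost handler and C<model_croak> stands in for croaking.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active handler; NULL when none is installed. */
jmp_buf *model_top = NULL;
int model_cleanups = 0;

/* Stand-in for croaking: jump to the innermost handler. */
void
model_croak(void)
{
    assert(model_top != NULL);
    longjmp(*model_top, 1);
}

void
model_code_that_may_croak(int fail)
{
    if (fail)
        model_croak();
}

/* Try the code; on failure clean up, then rethrow to the caller's
   handler.  Returns normally only if nothing croaked. */
void
model_guarded(int fail)
{
    jmp_buf here;
    jmp_buf *outer = model_top;     /* remember the enclosing handler */

    model_top = &here;
    if (setjmp(here) == 0) {
        model_code_that_may_croak(fail);
        model_top = outer;          /* normal exit: pop our handler   */
        return;
    }
    model_top = outer;              /* caught: pop our handler,       */
    model_cleanups++;               /* do the cleanup,                */
    model_croak();                  /* and rethrow                    */
}

/* Outermost caller: returns 1 if an exception reached it, else 0. */
int
model_run(int fail)
{
    jmp_buf here;
    jmp_buf *outer = model_top;

    model_top = &here;
    if (setjmp(here) == 0) {
        model_guarded(fail);
        model_top = outer;
        return 0;
    }
    model_top = outer;
    return 1;
}
```

In this model C<model_run(1)> returns 1 with the cleanup counter incremented once. The guarded function sees the failure first, but the failure still reaches the outer handler, mirroring the rule that a caught exception must be rethrown.
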
| 3311 | =head2 Source Documentation |
| 3312 | |
| 3313 | There's an effort going on to document the internal functions and |
| 3314 | automatically produce reference manuals from them -- L<perlapi> is one |
| 3315 | such manual which details all the functions which are available to XS |
| 3316 | writers. L<perlintern> is the autogenerated manual for the functions |
| 3317 | which are not part of the API and are supposedly for internal use only. |
| 3318 | |
| 3319 | Source documentation is created by putting POD comments into the C |
| 3320 | source, like this: |
| 3321 | |
| 3322 | /* |
| 3323 | =for apidoc sv_setiv |
| 3324 | |
| 3325 | Copies an integer into the given SV. Does not handle 'set' magic. See |
| 3326 | L<perlapi/sv_setiv_mg>. |
| 3327 | |
| 3328 | =cut |
| 3329 | */ |
| 3330 | |
| 3331 | Please try and supply some documentation if you add functions to the |
| 3332 | Perl core. |
| 3333 | |
| 3334 | =head2 Backwards compatibility |
| 3335 | |
| 3336 | The Perl API changes over time. New functions are |
| 3337 | added or the interfaces of existing functions are |
| 3338 | changed. The C<Devel::PPPort> module tries to |
| 3339 | provide compatibility code for some of these changes, so XS writers don't |
| 3340 | have to code it themselves when supporting multiple versions of Perl. |
| 3341 | |
| 3342 | C<Devel::PPPort> generates a C header file F<ppport.h> that can also |
| 3343 | be run as a Perl script. To generate F<ppport.h>, run: |
| 3344 | |
| 3345 | perl -MDevel::PPPort -eDevel::PPPort::WriteFile |
| 3346 | |
| 3347 | Besides checking existing XS code, the script can also be used to retrieve |
| 3348 | compatibility information for various API calls using the C<--api-info> |
| 3349 | command line switch. For example: |
| 3350 | |
| 3351 | % perl ppport.h --api-info=sv_magicext |
| 3352 | |
| 3353 | For details, see S<C<perldoc ppport.h>>. |
| 3354 | |
| 3355 | =head1 Unicode Support |
| 3356 | |
| 3357 | Perl 5.6.0 introduced Unicode support. It's important for porters and XS |
| 3358 | writers to understand this support and make sure that the code they |
| 3359 | write does not corrupt Unicode data. |
| 3360 | |
| 3361 | =head2 What B<is> Unicode, anyway? |
| 3362 | |
| 3363 | In the olden, less enlightened times, we all used to use ASCII. Most of |
| 3364 | us did, anyway. The big problem with ASCII is that it's American. Well, |
| 3365 | no, that's not actually the problem; the problem is that it's not |
| 3366 | particularly useful for people who don't use the Roman alphabet. What |
| 3367 | used to happen was that particular languages would stick their own |
| 3368 | alphabet in the upper range of the sequence, between 128 and 255. Of |
| 3369 | course, we then ended up with plenty of variants that weren't quite |
| 3370 | ASCII, and the whole point of it being a standard was lost. |
| 3371 | |
| 3372 | Worse still, if you've got a language like Chinese or |
| 3373 | Japanese that has hundreds or thousands of characters, then you really |
| 3374 | can't fit them into a mere 256, so they had to forget about ASCII |
| 3375 | altogether, and build their own systems using pairs of numbers to refer |
| 3376 | to one character. |
| 3377 | |
| 3378 | To fix this, some people formed Unicode, Inc. and |
| 3379 | produced a new character set containing all the characters you can |
| 3380 | possibly think of and more. There are several ways of representing these |
| 3381 | characters, and the one Perl uses is called UTF-8. UTF-8 uses |
| 3382 | a variable number of bytes to represent a character. You can learn more |
| 3383 | about Unicode and Perl's Unicode model in L<perlunicode>. |
| 3384 | |
| 3385 | (On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of |
| 3386 | UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8. |
| 3387 | UTF-EBCDIC is like UTF-8, but the details are different. The macros |
| 3388 | hide the differences from you; just remember that the particular numbers |
| 3389 | and bit patterns presented below will differ in UTF-EBCDIC.) |
| 3390 | |
| 3391 | =head2 How can I recognise a UTF-8 string? |
| 3392 | |
| 3393 | You can't. This is because UTF-8 data is stored in bytes just like |
| 3394 | non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types), |
| 3395 | capital E with a grave accent, is represented by the two bytes |
| 3396 | C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)> |
| 3397 | has that byte sequence as well. So you can't tell just by looking -- this |
| 3398 | is what makes Unicode input an interesting problem. |
| 3399 | |
| 3400 | In general, you either have to know what you're dealing with, or you |
| 3401 | have to guess. The API function C<is_utf8_string> can help; it'll tell |
| 3402 | you if a string contains only valid UTF-8 characters, and the chances |
| 3403 | of a non-UTF-8 string looking like valid UTF-8 become very small very |
| 3404 | quickly with increasing string length. On a character-by-character |
| 3405 | basis, C<isUTF8_CHAR> |
| 3406 | will tell you whether the current character in a string is valid UTF-8. |
| 3407 | |
| 3408 | =head2 How does UTF-8 represent Unicode characters? |
| 3409 | |
| 3410 | As mentioned above, UTF-8 uses a variable number of bytes to store a |
| 3411 | character. Characters with values 0...127 are stored in one |
| 3412 | byte, just like good ol' ASCII. Character 128 is stored as |
| 3413 | C<v194.128>; this continues up to character 191, which is |
| 3414 | C<v194.191>. Now we've run out of bits (191 is binary |
| 3415 | C<10111111>) so we move on; character 192 is C<v195.128>. And |
| 3416 | so it goes on, moving to three bytes at character 2048. |
| 3417 | L<perlunicode/Unicode Encodings> has pictures of how this works. |
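
The byte values quoted above can be reproduced with a few shifts and masks.
The following is a minimal standalone sketch, not part of the Perl API:
C<sketch_encode_utf8> is a hypothetical helper, it assumes an ASCII
(non-EBCDIC) platform, and it only covers code points below 0x10000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a Perl API function: encode one code point
 * below 0x10000 as UTF-8 into out[] and return the number of bytes
 * written (1, 2 or 3).  Surrogates and 4-byte forms are ignored here. */
static size_t sketch_encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    assert(cp < 0x10000);

    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

With this sketch, code point 128 yields the bytes 194, 128; code point 192
yields 195, 128; and code point 2048 is the first to need three bytes
(224, 160, 128).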
| 3418 | |
| 3419 | Assuming you know you're dealing with a UTF-8 string, you can find out |
| 3420 | how long the first character in it is with the C<UTF8SKIP> macro: |
| 3421 | |
| 3422 | char *utf = "\305\233\340\240\201"; |
| 3423 | I32 len; |
| 3424 | |
| 3425 | len = UTF8SKIP(utf); /* len is 2 here */ |
| 3426 | utf += len; |
| 3427 | len = UTF8SKIP(utf); /* len is 3 here */ |
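
As a rough picture of what C<UTF8SKIP> reports, the length of a character
can be read off its first byte alone. This standalone sketch assumes an
ASCII platform and at most four bytes per character; C<sketch_utf8_skip>
is a hypothetical name, and Perl's own macro may be implemented quite
differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the idea behind UTF8SKIP: judge the length
 * of a character from its first byte.  The pointer is assumed to sit at
 * the start of a well-formed character; a stray continuation byte is
 * simply treated as length 1 in this sketch. */
static size_t sketch_utf8_skip(const unsigned char *s)
{
    assert(s != NULL);

    if (*s < 0xC0) return 1;   /* 0xxxxxxx (or stray 10xxxxxx)    */
    if (*s < 0xE0) return 2;   /* 110xxxxx: start of 2-byte char  */
    if (*s < 0xF0) return 3;   /* 1110xxxx: start of 3-byte char  */
    return 4;                  /* 11110xxx: start of 4-byte char  */
}
```

Applied to the string C<"\305\233\340\240\201"> from the example above, the
sketch reports 2 for the first character and 3 for the second.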
| 3428 | |
| 3429 | Another way to skip over characters in a UTF-8 string is to use |
| 3430 | C<utf8_hop>, which takes a string and a number of characters to skip |
| 3431 | over. You're on your own about bounds checking, though, so don't use it |
| 3432 | lightly. |
| 3433 | |
| 3434 | All bytes in a multi-byte UTF-8 character will have the high bit set, |
| 3435 | so you can test if you need to do something special with this |
| 3436 | character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests |
| 3437 | whether the byte is encoded as a single byte even in UTF-8): |
| 3438 | |
| 3439 | U8 *utf; /* Initialize this to point to the beginning of the |
| 3440 | sequence to convert */ |
| 3441 | U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence |
| 3442 | pointed to by 'utf' */ |
| 3443 | UV uv; /* Returned code point; note: a UV, not a U8, not a |
| 3444 | char */ |
| 3445 | STRLEN len; /* Returned length of character in bytes */ |
| 3446 | |
| 3447 | if (!UTF8_IS_INVARIANT(*utf)) |
| 3448 | /* Must treat this as UTF-8 */ |
| 3449 | uv = utf8_to_uvchr_buf(utf, utf_end, &len); |
| 3450 | else |
| 3451 | /* OK to treat this character as a byte */ |
| 3452 | uv = *utf; |
| 3453 | |
| 3454 | You can also see in that example that we use C<utf8_to_uvchr_buf> to get the |
| 3455 | value of the character; the inverse function C<uvchr_to_utf8> is available |
| 3456 | for putting a UV into UTF-8: |
| 3457 | |
| 3458 | if (!UVCHR_IS_INVARIANT(uv)) |
| 3459 | /* Must treat this as UTF8 */ |
| 3460 | utf8 = uvchr_to_utf8(utf8, uv); |
| 3461 | else |
| 3462 | /* OK to treat this character as a byte */ |
| 3463 | *utf8++ = uv; |
| 3464 | |
| 3465 | You B<must> convert characters to UVs using the above functions if |
| 3466 | you're ever in a situation where you have to match UTF-8 and non-UTF-8 |
| 3467 | characters. You may not skip over UTF-8 characters in this case. If you |
| 3468 | do this, you'll lose the ability to match hi-bit non-UTF-8 characters; |
| 3469 | for instance, if your UTF-8 string contains C<v195.136>, and you skip |
| 3470 | that character, you can never match a C<chr(200)> in a non-UTF-8 string. |
| 3471 | So don't do that! |
| 3472 | |
| 3473 | (Note that we don't have to test for invariant characters in the |
| 3474 | examples above. The functions work on any well-formed UTF-8 input. |
| 3475 | It's just that it's faster to avoid the function overhead when it's not |
| 3476 | needed.) |
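
To see why matching must happen on code points rather than raw bytes, here
is a standalone sketch that compares a UTF-8 string against a
one-byte-per-character string. It is an illustration only, assuming an
ASCII platform and well-formed input of at most two bytes per character;
C<sketch_decode_utf8> and C<sketch_same_chars> are hypothetical names, not
Perl API functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the 1- or 2-byte UTF-8 character at s
 * (well-formed input assumed), store its byte length in *len and return
 * the code point.  Longer sequences are left out to keep this short. */
static uint32_t sketch_decode_utf8(const unsigned char *s, size_t *len)
{
    assert(s != NULL && len != NULL);

    if (s[0] < 0x80) {               /* invariant: the byte is the value */
        *len = 1;
        return s[0];
    }
    assert((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80);
    *len = 2;
    return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
}

/* Return 1 if the UTF-8 string u (ulen bytes) and the non-UTF-8 string
 * b (blen bytes, one character per byte) hold the same code points. */
static int sketch_same_chars(const unsigned char *u, size_t ulen,
                             const unsigned char *b, size_t blen)
{
    size_t ui = 0, bi = 0;

    while (ui < ulen && bi < blen) {
        size_t clen;
        uint32_t cp = sketch_decode_utf8(u + ui, &clen);

        if (cp != b[bi])             /* compare values, not raw bytes */
            return 0;
        ui += clen;
        bi++;
    }
    return ui == ulen && bi == blen;
}
```

In this sketch the UTF-8 bytes C<0x61 0xC3 0xA9> compare equal to the
non-UTF-8 bytes C<0x61 0xE9>, because both spell "a" followed by code
point 233, even though a bytewise comparison would say they differ.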
| 3477 | |
| 3478 | =head2 How does Perl store UTF-8 strings? |
| 3479 | |
| 3480 | Currently, Perl deals with UTF-8 strings and non-UTF-8 strings |
| 3481 | slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the |
| 3482 | string is internally encoded as UTF-8. Without it, the byte value is the |
| 3483 | codepoint number and vice versa. This flag is only meaningful if the SV |
| 3484 | is C<SvPOK> or immediately after stringification via C<SvPV> or a |
| 3485 | similar macro. You can check and manipulate this flag with the |
| 3486 | following macros: |
| 3487 | |
| 3488 | SvUTF8(sv) |
| 3489 | SvUTF8_on(sv) |
| 3490 | SvUTF8_off(sv) |
| 3491 | |
| 3492 | This flag has an important effect on Perl's treatment of the string: if |
| 3493 | UTF-8 data is not properly distinguished, regular expressions, |
| 3494 | C<length>, C<substr> and other string handling operations will have |
| 3495 | undesirable (wrong) results. |
| 3496 | |
| 3497 | The problem comes when you have, for instance, a string that isn't |
| 3498 | flagged as UTF-8, and contains a byte sequence that could be UTF-8 -- |
| 3499 | especially when combining non-UTF-8 and UTF-8 strings. |
| 3500 | |
| 3501 | Never forget that the C<SVf_UTF8> flag is separate from the PV value; you |
| 3502 | need to be sure you don't accidentally knock it off while you're |
| 3503 | manipulating SVs. More specifically, you cannot expect to do this: |
| 3504 | |
| 3505 | SV *sv; |
| 3506 | SV *nsv; |
| 3507 | STRLEN len; |
| 3508 | char *p; |
| 3509 | |
| 3510 | p = SvPV(sv, len); |
| 3511 | frobnicate(p); |
| 3512 | nsv = newSVpvn(p, len); |
| 3513 | |
| 3514 | The C<char*> string does not tell you the whole story, and you can't |
| 3515 | copy or reconstruct an SV just by copying the string value. Check if the |
| 3516 | old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act |
| 3517 | accordingly: |
| 3518 | |
| 3519 | p = SvPV(sv, len); |
| 3520 | is_utf8 = SvUTF8(sv); |
| 3521 | frobnicate(p, is_utf8); |
| 3522 | nsv = newSVpvn(p, len); |
| 3523 | if (is_utf8) |
| 3524 | SvUTF8_on(nsv); |
| 3525 | |
| 3526 | In the above, your C<frobnicate> function has been changed to be made |
| 3527 | aware of whether or not it's dealing with UTF-8 data, so that it can |
| 3528 | handle the string appropriately. |
| 3529 | |
| 3530 | Copying an SV's string data inside an XS function is not enough to |
| 3531 | carry the UTF8 flag along, and passing only a S<C<char *>> to an XS |
| 3532 | function is even less adequate, because no flag is available at all. |
| 3533 | |
| 3534 | For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the |
| 3535 | string in an SV is to be I<treated> as UTF-8. This takes into account |
| 3536 | if the call to the XS function is being made from within the scope of |
| 3537 | L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the |
| 3538 | UTF-8 string are to be exposed, rather than the character they |
| 3539 | represent. But this pragma should only really be used for debugging and |
| 3540 | perhaps low-level testing at the byte level. Hence most XS code need |
| 3541 | not concern itself with this, but various areas of the perl core do need |
| 3542 | to support it. |
| 3543 | |
| 3544 | And this isn't the whole story. Starting in Perl v5.12, strings that |
| 3545 | aren't encoded in UTF-8 may also be treated as Unicode under various |
| 3546 | conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>). |
| 3547 | This is only really a problem for characters whose ordinals are between |
| 3548 | 128 and 255, and whose behavior varies under ASCII versus Unicode rules |
| 3549 | in ways that your code cares about (see L<perlunicode/The "Unicode Bug">). |
| 3550 | There is no published API for dealing with this, as it is subject to |
| 3551 | change, but you can look at the code for C<pp_lc> in F<pp.c> for an |
| 3552 | example as to how it's currently done. |
| 3553 | |
| 3554 | =head2 How do I pass a Perl string to a C library? |
| 3555 | |
| 3556 | A Perl string, conceptually, is an opaque sequence of code points. |
| 3557 | Many C libraries expect their inputs to be "classical" C strings, which are |
| 3558 | arrays of octets 1-255, terminated with a NUL byte. Your job when writing |
| 3559 | an interface between Perl and a C library is to define the mapping between |
| 3560 | Perl and that library. |
| 3561 | |
| 3562 | Generally speaking, C<SvPVbyte> and related macros suit this task well. |
| 3563 | These assume that your Perl string is a "byte string", i.e., is either |
| 3564 | raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8. |
| 3565 | |
| 3566 | Alternatively, if your C library expects UTF-8 text, you can use |
| 3567 | C<SvPVutf8> and related macros. This has the same effect as encoding |
| 3568 | to UTF-8 then calling the corresponding C<SvPVbyte>-related macro. |
| 3569 | |
| 3570 | Some C libraries may expect other encodings (e.g., UTF-16LE). To give |
| 3571 | Perl strings to such libraries |
| 3572 | you must either do that encoding in Perl then use C<SvPVbyte>, or |
| 3573 | use an intermediary C library to convert from however Perl stores the |
| 3574 | string to the desired encoding. |
| 3575 | |
| 3576 | Take care also that NULs in your Perl string don't confuse the C |
| 3577 | library. If possible, give the string's length to the C library; if that's |
| 3578 | not possible, consider rejecting strings that contain NUL bytes. |
| 3579 | |
| 3580 | =head3 What about C<SvPV>, C<SvPV_nolen>, etc.? |
| 3581 | |
| 3582 | Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">. |
| 3583 | Perl can store these 3 characters in either of two ways: |
| 3584 | |
| 3585 | =over |
| 3586 | |
| 3587 | =item * bytes: 0x64 0x78 0x8c |
| 3588 | |
| 3589 | =item * UTF-8: 0x64 0x78 0xc2 0x8c |
| 3590 | |
| 3591 | =back |
| 3592 | |
| 3593 | Now let's say you convert C<$foo> to a C string thus: |
| 3594 | |
| 3595 | STRLEN strlen; |
| 3596 | char *str = SvPV(foo_sv, strlen); |
| 3597 | |
| 3598 | At this point C<str> could point to a 3-byte C string or a 4-byte one. |
| 3599 | |
| 3600 | Generally speaking, we want C<str> to be the same regardless of how |
| 3601 | Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte> |
| 3602 | and C<SvPVutf8> solve that by giving predictable output: use |
| 3603 | C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8> |
| 3604 | if it expects UTF-8. |
| 3605 | |
| 3606 | If your C library happens to support both encodings, then C<SvPV>--always |
| 3607 | in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more |
| 3608 | efficient. |
| 3609 | |
| 3610 | B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions |
| 3611 | in your tests to ensure consistent handling regardless of Perl's |
| 3612 | internal encoding. |
| 3613 | |
| 3614 | =head2 How do I convert a string to UTF-8? |
| 3615 | |
| 3616 | If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade |
| 3617 | the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do |
| 3618 | this is: |
| 3619 | |
| 3620 | sv_utf8_upgrade(sv); |
| 3621 | |
| 3622 | However, you must not do this, for example: |
| 3623 | |
| 3624 | if (!SvUTF8(left)) |
| 3625 | sv_utf8_upgrade(left); |
| 3626 | |
| 3627 | If you do this in a binary operator, you will actually change one of the |
| 3628 | strings that came into the operator, and, while it shouldn't be noticeable |
| 3629 | by the end user, it can cause problems in deficient code. |
| 3630 | |
| 3631 | Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its |
| 3632 | string argument. This is useful for having the data available for |
| 3633 | comparisons and so on, without harming the original SV. There's also |
| 3634 | C<utf8_to_bytes> to go the other way, but naturally, this will fail if |
| 3635 | the string contains any characters above 255 that can't be represented |
| 3636 | in a single byte. |
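
The idea of producing an upgraded copy while leaving the original alone can
be sketched in standalone C. This is only an illustration of the concept,
assuming an ASCII platform and input in which every byte is one code point
in the range 0..255; C<sketch_bytes_to_utf8> is a hypothetical name and is
not the real C<bytes_to_utf8>, whose signature and memory handling differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return a freshly malloc'd, NUL-terminated UTF-8
 * copy of the len bytes at s, storing the new length in *newlen.  The
 * input buffer is never modified.  The caller frees the result. */
static unsigned char *sketch_bytes_to_utf8(const unsigned char *s,
                                           size_t len, size_t *newlen)
{
    unsigned char *out, *d;
    size_t i;

    assert(s != NULL && newlen != NULL);

    out = malloc(2 * len + 1);       /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;

    d = out;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {           /* invariant: copy unchanged */
            *d++ = s[i];
        } else {                     /* 128..255: becomes two bytes */
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}
```

Fed the three bytes C<0x64 0x78 0x8c> from the C<SvPV> discussion above,
the sketch returns the four bytes C<0x64 0x78 0xc2 0x8c>, and the input
buffer still holds its original three bytes.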
| 3637 | |
| 3638 | =head2 How do I compare strings? |
| 3639 | |
| 3640 | L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic |
| 3641 | comparison of two SV's, and handle UTF-8ness properly. Note, however, |
| 3642 | that Unicode specifies a much fancier mechanism for collation, available |
| 3643 | via the L<Unicode::Collate> module. |
| 3644 | |
| 3645 | To just compare two strings for equality/non-equality, you can use |
| 3646 | L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual, |
| 3647 | provided that both strings are UTF-8 encoded or both are not. |
| 3648 | |
| 3649 | To compare two strings case-insensitively, use |
| 3650 | L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have |
| 3651 | the same UTF-8ness). |
| 3652 | |
| 3653 | =head2 Is there anything else I need to know? |
| 3654 | |
| 3655 | Not really. Just remember these things: |
| 3656 | |
| 3657 | =over 3 |
| 3658 | |
| 3659 | =item * |
| 3660 | |
| 3661 | There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8 |
| 3662 | or not. But you can tell if an SV is to be treated as UTF-8 by calling |
| 3663 | C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar |
| 3664 | macro. And you can tell if an SV is actually UTF-8 (even if it is not to |
| 3665 | be treated as such) by looking at its C<SvUTF8> flag (again after |
| 3666 | stringifying it). Don't forget to set the flag if something should be |
| 3667 | UTF-8. |
| 3668 | Treat the flag as part of the PV, even though it's not -- if you pass on |
| 3669 | the PV to somewhere, pass on the flag too. |
| 3670 | |
| 3671 | =item * |
| 3672 | |
| 3673 | If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value, |
| 3674 | unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>. |
| 3675 | |
| 3676 | =item * |
| 3677 | |
| 3678 | When writing a character UV to a UTF-8 string, B<always> use |
| 3679 | C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case |
| 3680 | you can use C<*s = uv>. |
| 3681 | |
| 3682 | =item * |
| 3683 | |
| 3684 | Mixing UTF-8 and non-UTF-8 strings is |
| 3685 | tricky. Use C<bytes_to_utf8> to get |
| 3686 | a new string which is UTF-8 encoded, and then combine them. |
| 3687 | |
| 3688 | =back |
| 3689 | |
| 3690 | =head1 Custom Operators |
| 3691 | |
| 3692 | Custom operator support is an experimental feature that allows you to |
| 3693 | define your own ops. This is primarily to allow the building of |
| 3694 | interpreters for other languages in the Perl core, but it also allows |
| 3695 | optimizations through the creation of "macro-ops" (ops which perform the |
| 3696 | functions of multiple ops which are usually executed together, such as |
| 3697 | C<gvsv, gvsv, add>). |
| 3698 | |
| 3699 | This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl |
| 3700 | core does not "know" anything special about this op type, and so it will |
| 3701 | not be involved in any optimizations. This also means that you can |
| 3702 | define your custom ops to be any op structure -- unary, binary, list and |
| 3703 | so on -- you like. |
| 3704 | |
| 3705 | It's important to know what custom operators won't do for you. They |
| 3706 | won't let you add new syntax to Perl, directly. They won't even let you |
| 3707 | add new keywords, directly. In fact, they won't change the way Perl |
| 3708 | compiles a program at all. You have to do those changes yourself, after |
| 3709 | Perl has compiled the program. You do this either by manipulating the op |
| 3710 | tree using a C<CHECK> block and the C<B::Generate> module, or by adding |
| 3711 | a custom peephole optimizer with the C<optimize> module. |
| 3712 | |
| 3713 | When you do this, you replace ordinary Perl ops with custom ops by |
| 3714 | creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own |
| 3715 | PP function. This should be defined in XS code, and should look like |
| 3716 | the PP ops in C<pp_*.c>. You are responsible for ensuring that your op |
| 3717 | takes the appropriate number of values from the stack, and you are |
| 3718 | responsible for adding stack marks if necessary. |
| 3719 | |
| 3720 | You should also "register" your op with the Perl interpreter so that it |
| 3721 | can produce sensible error and warning messages. Since it is possible to |
| 3722 | have multiple custom ops within the one "logical" op type C<OP_CUSTOM>, |
| 3723 | Perl uses the value of C<< o->op_ppaddr >> to determine which custom op |
| 3724 | it is dealing with. You should create an C<XOP> structure for each |
| 3725 | ppaddr you use, set the properties of the custom op with |
| 3726 | C<XopENTRY_set>, and register the structure against the ppaddr using |
| 3727 | C<Perl_custom_op_register>. A trivial example might look like: |
| 3728 | |
| 3729 | =for apidoc_section $optree_manipulation |
| 3730 | =for apidoc Ayh||XOP |
| 3731 | |
| 3732 | static XOP my_xop; |
| 3733 | static OP *my_pp(pTHX); |
| 3734 | |
| 3735 | BOOT: |
| 3736 | XopENTRY_set(&my_xop, xop_name, "myxop"); |
| 3737 | XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); |
| 3738 | Perl_custom_op_register(aTHX_ my_pp, &my_xop); |
| 3739 | |
| 3740 | The available fields in the structure are: |
| 3741 | |
| 3742 | =over 4 |
| 3743 | |
| 3744 | =item xop_name |
| 3745 | |
| 3746 | A short name for your op. This will be included in some error messages, |
| 3747 | and will also be returned as C<< $op->name >> by the L<B|B> module, so |
| 3748 | it will appear in the output of modules like L<B::Concise|B::Concise>. |
| 3749 | |
| 3750 | =item xop_desc |
| 3751 | |
| 3752 | A short description of the function of the op. |
| 3753 | |
| 3754 | =item xop_class |
| 3755 | |
| 3756 | Which of the various C<*OP> structures this op uses. This should be one of |
| 3757 | the C<OA_*> constants from F<op.h>, namely |
| 3758 | |
| 3759 | =over 4 |
| 3760 | |
| 3761 | =item OA_BASEOP |
| 3762 | |
| 3763 | =item OA_UNOP |
| 3764 | |
| 3765 | =item OA_BINOP |
| 3766 | |
| 3767 | =item OA_LOGOP |
| 3768 | |
| 3769 | =item OA_LISTOP |
| 3770 | |
| 3771 | =item OA_PMOP |
| 3772 | |
| 3773 | =item OA_SVOP |
| 3774 | |
| 3775 | =item OA_PADOP |
| 3776 | |
| 3777 | =item OA_PVOP_OR_SVOP |
| 3778 | |
| 3779 | This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because |
| 3780 | the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead. |
| 3781 | |
| 3782 | =item OA_LOOP |
| 3783 | |
| 3784 | =item OA_COP |
| 3785 | |
| 3786 | =for apidoc_section $optree_manipulation |
| 3787 | =for apidoc Amnh||OA_BASEOP |
| 3788 | =for apidoc_item OA_BINOP |
| 3789 | =for apidoc_item OA_COP |
| 3790 | =for apidoc_item OA_LISTOP |
| 3791 | =for apidoc_item OA_LOGOP |
| 3792 | =for apidoc_item OA_LOOP |
| 3793 | =for apidoc_item OA_PADOP |
| 3794 | =for apidoc_item OA_PMOP |
| 3795 | =for apidoc_item OA_PVOP_OR_SVOP |
| 3796 | =for apidoc_item OA_SVOP |
| 3797 | =for apidoc_item OA_UNOP |
| 3798 | |
| 3799 | =back |
| 3800 | |
| 3801 | The other C<OA_*> constants should not be used. |
| 3802 | |
| 3803 | =item xop_peep |
| 3804 | |
| 3805 | This member is of type C<Perl_cpeep_t>, which expands to C<void |
| 3806 | (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function |
| 3807 | will be called from C<Perl_rpeep> when ops of this type are encountered |
| 3808 | by the peephole optimizer. I<o> is the OP that needs optimizing; |
| 3809 | I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>. |
| 3810 | |
| 3811 | =for apidoc_section $optree_manipulation |
| 3812 | =for apidoc Ayh||Perl_cpeep_t |
| 3813 | |
| 3814 | =back |
| 3815 | |
| 3816 | C<B::Generate> directly supports the creation of custom ops by name. |
| 3817 | |
| 3818 | =head1 Stacks |
| 3819 | |
| 3820 | Descriptions above occasionally refer to "the stack", but there are in fact |
| 3821 | many stack-like data structures within the perl interpreter. When otherwise |
| 3822 | unqualified, "the stack" usually refers to the value stack. |
| 3823 | |
| 3824 | The various stacks have different purposes, and operate in slightly different |
| 3825 | ways. Their differences are noted below. |
| 3826 | |
| 3827 | =head2 Value Stack |
| 3828 | |
| 3829 | This stack stores the values that regular perl code is operating on, usually |
| 3830 | intermediate values of expressions within a statement. The stack itself is |
| 3831 | formed of an array of SV pointers. |
| 3832 | |
| 3833 | The base of this stack is pointed to by the interpreter variable |
| 3834 | C<PL_stack_base>, of type C<SV **>. |
| 3835 | |
| 3836 | =for apidoc_section $stack |
| 3837 | =for apidoc Amnh||PL_stack_base |
| 3838 | |
| 3839 | The head of the stack is C<PL_stack_sp>, and points to the most |
| 3840 | recently-pushed item. |
| 3841 | |
| 3842 | =for apidoc Amnh||PL_stack_sp |
| 3843 | |
| 3844 | Items are pushed to the stack by using the C<PUSHs()> macro or its variants |
| 3845 | described above; C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed |
| 3846 | versions. Note carefully that the non-C<X> versions of these macros do not |
| 3847 | check the size of the stack and assume it to be big enough. These must be |
| 3848 | paired with a suitable check of the stack's size, such as the C<EXTEND> macro |
| 3849 | to ensure it is large enough. For example |
| 3850 | |
| 3851 | EXTEND(SP, 4); |
| 3852 | mPUSHi(10); |
| 3853 | mPUSHi(20); |
| 3854 | mPUSHi(30); |
| 3855 | mPUSHi(40); |
| 3856 | |
| 3857 | This is slightly more performant than making four separate checks in four |
| 3858 | separate C<mXPUSHi()> calls. |
| 3859 | |
| 3860 | As a further performance optimisation, the various C<PUSH> macros all operate |
| 3861 | using a local variable C<SP>, rather than the interpreter-global variable |
| 3862 | C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is |
| 3863 | normally implied by XSUBs and similar so it is rare you have to consider it |
| 3864 | directly. Once declared, the C<PUSH> macros will operate only on this local |
| 3865 | variable, so before invoking any other perl core functions you must use the |
| 3866 | C<PUTBACK> macro to return the value from the local C<SP> variable back to |
| 3867 | the interpreter variable. Similarly, after calling a perl core function which |
| 3868 | may have had reason to move the stack or push/pop values to it, you must use |
| 3869 | the C<SPAGAIN> macro which refreshes the local C<SP> value back from the |
| 3870 | interpreter one. |
| 3871 | |
| 3872 | Items are popped from the stack by using the C<POPs> macro or its typed |
| 3873 | versions. There is also a macro C<TOPs> that inspects the topmost item without |
| 3874 | removing it. |
| 3875 | |
| 3876 | =for apidoc_section $stack |
| 3877 | =for apidoc Amnh||TOPs |
| 3878 | |
| 3879 | Note specifically that SV pointers on the value stack do not contribute to the |
| 3880 | overall reference count of the xVs being referred to. If newly-created xVs are |
| 3881 | being pushed to the stack you must arrange for them to be destroyed at a |
| 3882 | suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()> |
| 3883 | to mortalise the xV. |
| 3884 | |
| 3885 | =head2 Mark Stack |
| 3886 | |
| 3887 | The value stack stores individual perl scalar values as temporaries between |
| 3888 | expressions. Some perl expressions operate on entire lists; for that purpose |
| 3889 | we need to know where on the stack each list begins. This is the purpose of the |
| 3890 | mark stack. |
| 3891 | |
| 3892 | The mark stack stores integers as I32 values, which are the height of the |
| 3893 | value stack at the time before the list began; thus the mark itself actually |
| 3894 | points to the value stack entry one before the list. The list itself starts at |
| 3895 | C<mark + 1>. |
| 3896 | |
| 3897 | The base of this stack is pointed to by the interpreter variable |
| 3898 | C<PL_markstack>, of type C<I32 *>. |
| 3899 | |
| 3900 | =for apidoc_section $stack |
| 3901 | =for apidoc Amnh||PL_markstack |
| 3902 | |
| 3903 | The head of the stack is C<PL_markstack_ptr>, and points to the most |
| 3904 | recently-pushed item. |
| 3905 | |
| 3906 | =for apidoc Amnh||PL_markstack_ptr |
| 3907 | |
| 3908 | Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though |
| 3909 | the stack itself stores (value) stack indices as integers, the C<PUSHMARK> |
| 3910 | macro should be given a stack pointer directly; it will calculate the index |
| 3911 | offset by comparing to the C<PL_stack_base> variable. Thus almost always the |
| 3912 | code to perform this is |
| 3913 | |
| 3914 | PUSHMARK(SP); |
| 3915 | |
| 3916 | Items are popped from the stack by the C<POPMARK> macro. There is also a macro |
| 3917 | C<TOPMARK> that inspects the topmost item without removing it. These macros |
| 3918 | return I32 index values directly. There is also the C<dMARK> macro which |
| 3919 | declares a new SV double-pointer variable, called C<mark>, which points at the |
| 3920 | marked stack slot; this is the usual macro that C code will use when operating |
| 3921 | on lists given on the stack. |
| 3922 | |
| 3923 | As noted above, the C<mark> variable itself will point at the most recently |
| 3924 | pushed value on the value stack before the list begins, and so the list itself |
| 3925 | starts at C<mark + 1>. The values of the list may be iterated by code such as |
| 3926 | |
| 3927 | for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) { |
| 3928 | SV *item = *svp; |
| 3929 | ... |
| 3930 | } |
| 3931 | |
| 3932 | Note in particular that when the list is empty, C<mark> will equal |
| 3933 | C<PL_stack_sp>. |
| 3934 | |
| 3935 | Because the C<mark> variable is converted to a pointer on the value stack, |
| 3936 | extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are |
| 3937 | invoked within the function, because the stack may need to be moved to |
| 3938 | extend it and so the existing pointer will now be invalid. If this may be a |
| 3939 | problem, a possible solution is to record the mark offset as an integer and |
| 3940 | recompute the mark pointer later on, after the stack has been moved. |
| 3941 | |
| 3942 | I32 markoff = POPMARK; |
| 3943 | |
| 3944 | ... |
| 3945 | |
| 3946 | SV **mark = PL_stack_base + markoff; |
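
The reason an integer offset survives where a pointer does not can be shown
with a toy model. The sketch below is standalone C, not interpreter code:
C<struct toy_stack>, C<toy_push> and C<toy_sum_list> are hypothetical
names, plain C<int> values stand in for SV pointers, and growth is done
with C<realloc>, which may move the whole array just as extending the value
stack may.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: a growable array standing in for the value stack. */
struct toy_stack {
    int   *base;   /* plays the role of PL_stack_base */
    size_t used;   /* number of items currently on the stack */
    size_t cap;    /* allocated capacity */
};

/* Push one item, growing the array when full.  Returns 0 on success. */
static int toy_push(struct toy_stack *st, int v)
{
    assert(st != NULL);

    if (st->used == st->cap) {
        size_t ncap = st->cap ? st->cap * 2 : 4;
        int *nbase = realloc(st->base, ncap * sizeof *nbase);

        if (nbase == NULL)
            return -1;
        st->base = nbase;   /* pointers taken earlier are now stale */
        st->cap = ncap;
    }
    st->base[st->used++] = v;
    return 0;
}

/* Sum the list that starts one past the marked slot.  The mark pointer
 * is rebuilt from the current base and the saved integer offset. */
static long toy_sum_list(const struct toy_stack *st, size_t markoff)
{
    const int *mark, *top, *p;
    long sum = 0;

    assert(st != NULL && st->used > 0 && markoff < st->used);

    mark = st->base + markoff;
    top = st->base + st->used - 1;
    for (p = mark + 1; p <= top; p++)
        sum += *p;
    return sum;
}
```

A caller would note C<markoff> as the index of the top item before the list
begins, push as many values as it likes, and only then turn the offset back
into a pointer. When the offset names the top item itself, the list is
empty and the loop body never runs.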
| 3947 | |
| 3948 | =head2 Temporaries Stack |
| 3949 | |
| 3950 | As noted above, xV references on the main value stack do not contribute to the |
| 3951 | reference count of an xV, and so another mechanism is used to track when |
| 3952 | temporary values which live on the stack must be released. This is the job of |
| 3953 | the temporaries stack. |
| 3954 | |
| 3955 | The temporaries stack stores pointers to xVs whose reference counts will be |
| 3956 | decremented soon. |
| 3957 | |
| 3958 | The base of this stack is pointed to by the interpreter variable |
| 3959 | C<PL_tmps_stack>, of type C<SV **>. |
| 3960 | |
| 3961 | =for apidoc_section $stack |
| 3962 | =for apidoc Amnh||PL_tmps_stack |
| 3963 | |
| 3964 | The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the |
| 3965 | index in the array of the most recently-pushed item. |
| 3966 | |
| 3967 | =for apidoc Amnh||PL_tmps_ix |
| 3968 | |
| 3969 | There is no public API to directly push items to the temporaries stack. Instead, |
| 3970 | the API function C<sv_2mortal()> is used to mortalize an xV, adding its |
| 3971 | address to the temporaries stack. |
| 3972 | |
| 3973 | Likewise, there is no public API to read values from the temporaries stack. |
| 3974 | Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS> |
| 3975 | macro establishes the base levels of the temporaries stack, by capturing the |
| 3976 | current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous |
| 3977 | value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of |
| 3978 | the temporaries that have been pushed since that level are reclaimed. |
| 3979 | |
| 3980 | =for apidoc_section $stack |
| 3981 | =for apidoc Amnh||PL_tmps_floor |
| 3982 | |
| 3983 | While it is common to see these two macros in pairs within an C<ENTER>/ |
| 3984 | C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke |
| 3985 | C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a |
| 3986 | loop iterating over elements of a list. While you can invoke C<SAVETMPS> |
| 3987 | multiple times within a scope pair, it is unlikely to be useful. Subsequent |
| 3988 | invocations will move the temporaries floor further up, thus effectively |
| 3989 | trapping the existing temporaries to only be released at the end of the scope. |
| 3990 | |
| 3991 | =head2 Save Stack |
| 3992 | |
| 3993 | The save stack is used by perl to implement the C<local> keyword and other |
| 3994 | similar behaviours: any cleanup operations that need to be performed when |
| 3995 | leaving the current scope. Items pushed to this stack generally capture the |
| 3996 | current value of some internal variable or state, which will be restored when |
| 3997 | the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other |
| 3998 | reasons. |
| 3999 | |
| 4000 | Whereas other perl internal stacks store individual items all of the same type |
| 4001 | (usually SV pointers or integers), the items pushed to the save stack are |
| 4002 | formed of many different types, having multiple fields to them. For example, |
| 4003 | the C<SAVEt_INT> type needs to store both the address of the C<int> variable |
| 4004 | to restore, and the value to restore it to. This information could have been |
| 4005 | stored using fields of a C<struct>, but would have to be large enough to store |
| 4006 | three pointers in the largest case, which would waste a lot of space in most |
| 4007 | of the smaller cases. |
| 4008 | |
| 4009 | =for apidoc_section $stack |
| 4010 | =for apidoc Amnh||SAVEt_INT |
| 4011 | |
| 4012 | Instead, the stack stores information in a variable-length encoding of C<ANY> |
| 4013 | structures. The final value pushed is stored in the C<UV> field which encodes |
| 4014 | the kind of item held by the preceding items; the count and types of which |
| 4015 | will depend on what kind of item is being stored. The kind field is pushed |
| 4016 | last because that will be the first field to be popped when unwinding items |
| 4017 | from the stack. |
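
The kind-pushed-last encoding can be sketched with a toy model. This is
only an illustration, not perl's implementation: every C<enc_*> name,
including the one-pointer C<ENC_SAVE_CLEAR> kind, is invented here, and
real save types carry different payloads.

```c
#include <assert.h>

/* Toy model only: every enc_* name is invented for this sketch. */
typedef union {
    void         *any_ptr;
    int           any_int;
    unsigned long any_uv;
} enc_any;

enum { ENC_SAVE_INT = 1, ENC_SAVE_CLEAR = 2 };

#define ENC_STACK_MAX 64
static enc_any enc_stack[ENC_STACK_MAX];
static int     enc_ix = 0;          /* index of the next free slot */

/* Three slots: address, old value, then the kind (pushed last). */
static void enc_save_int(int *p)
{
    assert(enc_ix + 3 <= ENC_STACK_MAX);
    enc_stack[enc_ix++].any_ptr = p;
    enc_stack[enc_ix++].any_int = *p;
    enc_stack[enc_ix++].any_uv  = ENC_SAVE_INT;
}

/* Two slots: address, then the kind.  On unwind *p is set to zero. */
static void enc_save_clear(int *p)
{
    assert(enc_ix + 2 <= ENC_STACK_MAX);
    enc_stack[enc_ix++].any_ptr = p;
    enc_stack[enc_ix++].any_uv  = ENC_SAVE_CLEAR;
}

/* Unwind to old_ix.  The kind is popped first and tells us how many
 * payload slots sit beneath it. */
static void enc_unwind(int old_ix)
{
    while (enc_ix > old_ix) {
        unsigned long kind = enc_stack[--enc_ix].any_uv;
        switch (kind) {
        case ENC_SAVE_INT: {
            int  old = enc_stack[--enc_ix].any_int;
            int *p   = (int *)enc_stack[--enc_ix].any_ptr;
            *p = old;
            break;
        }
        case ENC_SAVE_CLEAR: {
            int *p = (int *)enc_stack[--enc_ix].any_ptr;
            *p = 0;
            break;
        }
        default:
            assert(0 && "unknown save kind");
        }
    }
}
```

Because the two kinds occupy three and two slots respectively, no slot is
wasted on padding, which is the point of the variable-length layout.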
| 4018 | |
| 4019 | The base of this stack is pointed to by the interpreter variable |
| 4020 | C<PL_savestack>, of type C<ANY *>. |
| 4021 | |
| 4022 | =for apidoc_section $stack |
| 4023 | =for apidoc Amnh||PL_savestack |
| 4024 | |
| 4025 | The head of the stack is indexed by C<PL_savestack_ix>, an integer which |
| 4026 | stores the index in the array at which the next item should be pushed. (Note |
| 4027 | that this is different to most other stacks, which reference the most |
| 4028 | recently-pushed item). |
| 4029 | |
| 4030 | =for apidoc_section $stack |
| 4031 | =for apidoc Amnh||PL_savestack_ix |
| 4032 | |
| 4033 | Items are pushed to the save stack by using the various C<SAVE...()> macros. |
| 4034 | Many of these macros take a variable and store both its address and current |
| 4035 | value on the save stack, ensuring that value gets restored on scope exit. |
| 4036 | |
| 4037 | SAVEI8(i8) |
| 4038 | SAVEI16(i16) |
| 4039 | SAVEI32(i32) |
| 4040 | SAVEINT(i) |
| 4041 | ... |
| 4042 | |
| 4043 | There are also a variety of other special-purpose macros which save particular |
| 4044 | types or values of interest. C<SAVETMPS> has already been mentioned above. |
| 4045 | Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to |
| 4046 | be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to |
| 4047 | be invoked on scope exit. A full list of such macros can be found in |
| 4048 | F<scope.h>. |
| 4049 | |
| 4050 | There is no public API for popping individual values or items from the save |
| 4051 | stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way |
| 4052 | to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will |
| 4053 | restore all of the saved values that had been pushed since the most recent |
| 4054 | C<ENTER>. |
| 4055 | |
| 4056 | =head2 Scope Stack |
| 4057 | |
| 4058 | As with the mark stack to the value stack, the scope stack forms a pair with |
| 4059 | the save stack. The scope stack stores the height of the save stack at which |
| 4060 | nested scopes begin, and allows the save stack to be unwound back to that |
| 4061 | point when the scope is left. |
| 4062 | |
| 4063 | When perl is built with debugging enabled, there is a second part to this |
| 4064 | stack storing human-readable string names describing the type of stack |
| 4065 | context. Each push operation saves the name as well as the height of the save |
| 4066 | stack, and each pop operation checks the topmost name with what is expected, |
| 4067 | causing an assertion failure if the name does not match. |
| 4068 | |
| 4069 | The base of this stack is pointed to by the interpreter variable |
| 4070 | C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are |
| 4071 | stored in a separate array pointed to by C<PL_scopestack_name>, of type |
| 4072 | C<const char **>. |
| 4073 | |
| 4074 | =for apidoc_section $stack |
| 4075 | =for apidoc Amnh||PL_scopestack |
| 4076 | =for apidoc Amnh||PL_scopestack_name |
| 4077 | |
| 4078 | The head of the stack is indexed by C<PL_scopestack_ix>, an integer which |
| 4079 | stores the index of the array or arrays at which the next item should be |
| 4080 | pushed. (Note that this is different to most other stacks, which reference the |
| 4081 | most recently-pushed item). |
| 4082 | |
| 4083 | =for apidoc_section $stack |
| 4084 | =for apidoc Amnh||PL_scopestack_ix |
| 4085 | |
| 4086 | Values are pushed to the scope stack using the C<ENTER> macro, which begins a |
| 4087 | new nested scope. Any items pushed to the save stack are then restored at the |
| 4088 | next nested invocation of the C<LEAVE> macro. |
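
The pairing between the two stacks can be sketched as follows. This is a
toy model under simplifying assumptions: all C<scp_*> names are invented
here, the save stack holds only C<int> restores, and there is no
debugging name array.

```c
#include <assert.h>

/* Toy model only: every scp_* name is invented for this sketch. */
struct scp_saved_int { int *addr; int old; };

#define SCP_SAVE_MAX  32
#define SCP_SCOPE_MAX 16
static struct scp_saved_int scp_savestack[SCP_SAVE_MAX];
static int scp_savestack_ix = 0;      /* next free save stack slot  */
static int scp_scopestack[SCP_SCOPE_MAX];
static int scp_scopestack_ix = 0;     /* next free scope stack slot */

/* cf. ENTER: remember how high the save stack currently is. */
static void scp_enter(void)
{
    assert(scp_scopestack_ix < SCP_SCOPE_MAX);
    scp_scopestack[scp_scopestack_ix++] = scp_savestack_ix;
}

/* cf. SAVEINT(i): record the address and the current value. */
static void scp_saveint(int *p)
{
    assert(scp_savestack_ix < SCP_SAVE_MAX);
    scp_savestack[scp_savestack_ix].addr = p;
    scp_savestack[scp_savestack_ix].old  = *p;
    scp_savestack_ix++;
}

/* cf. LEAVE: unwind the save stack back to the recorded height. */
static void scp_leave(void)
{
    int base;
    assert(scp_scopestack_ix > 0);
    base = scp_scopestack[--scp_scopestack_ix];
    while (scp_savestack_ix > base) {
        struct scp_saved_int *s = &scp_savestack[--scp_savestack_ix];
        *s->addr = s->old;
    }
}
```

Nested scopes then restore in the expected order: the inner C<scp_leave()>
undoes only what was saved since the inner C<scp_enter()>.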
| 4089 | |
| 4090 | =head1 Dynamic Scope and the Context Stack |
| 4091 | |
| 4092 | B<Note:> this section describes a non-public internal API that is subject |
| 4093 | to change without notice. |
| 4094 | |
| 4095 | =head2 Introduction to the context stack |
| 4096 | |
| 4097 | In Perl, dynamic scoping refers to the runtime nesting of things like |
| 4098 | subroutine calls, evals etc, as well as the entering and exiting of block |
| 4099 | scopes. For example, the restoring of a C<local>ised variable is |
| 4100 | determined by the dynamic scope. |
| 4101 | |
| 4102 | Perl tracks the dynamic scope by a data structure called the context |
| 4103 | stack, which is an array of C<PERL_CONTEXT> structures, each of which is |
| 4104 | itself a big union for all the types of context. Whenever a new scope is |
| 4105 | entered (such as a block, a C<for> loop, or a subroutine call), a new |
| 4106 | context entry is pushed onto the stack. Similarly when leaving a block or |
| 4107 | returning from a subroutine call etc. a context is popped. Since the |
| 4108 | context stack represents the current dynamic scope, it can be searched. |
| 4109 | For example, C<next LABEL> searches back through the stack looking for a |
| 4110 | loop context that matches the label; C<return> pops contexts until it |
| 4111 | finds a sub or eval context or similar; C<caller> examines sub contexts on |
| 4112 | the stack. |
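
The kind of searching described above can be sketched with a toy context
stack. This is an illustration only: every C<ctx_*> name is invented
here, the frames hold nothing but a type and a label, and the unwinder
does none of the per-frame cleanup that perl performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every ctx_* name is invented for this sketch. */
enum ctx_type { CTX_BLOCK, CTX_LOOP, CTX_SUB, CTX_EVAL };

struct ctx_frame {
    enum ctx_type type;
    const char   *label;   /* loop label, or NULL */
};

#define CTX_MAX 16
static struct ctx_frame ctx_stack[CTX_MAX];
static int ctx_ix = -1;    /* index of the current (topmost) frame */

static void ctx_push(enum ctx_type type, const char *label)
{
    assert(ctx_ix + 1 < CTX_MAX);
    ctx_ix++;
    ctx_stack[ctx_ix].type  = type;
    ctx_stack[ctx_ix].label = label;
}

/* What 'return' needs: the innermost sub or eval frame, or -1. */
static int ctx_find_sub_or_eval(void)
{
    int i;
    for (i = ctx_ix; i >= 0; i--)
        if (ctx_stack[i].type == CTX_SUB || ctx_stack[i].type == CTX_EVAL)
            return i;
    return -1;
}

/* What 'next LABEL' needs: the innermost loop with that label, or -1. */
static int ctx_find_loop(const char *label)
{
    int i;
    for (i = ctx_ix; i >= 0; i--)
        if (ctx_stack[i].type == CTX_LOOP && ctx_stack[i].label
            && strcmp(ctx_stack[i].label, label) == 0)
            return i;
    return -1;
}

/* Pop every frame above 'target', leaving it as the current frame. */
static void ctx_unwind_to(int target)
{
    while (ctx_ix > target)
        ctx_ix--;   /* a real unwinder would also clean up each frame */
}
```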
| 4113 | |
| 4114 | =for apidoc_section $concurrency |
| 4115 | =for apidoc Cyh||PERL_CONTEXT |
| 4116 | |
| 4117 | Each context entry is labelled with a context type, C<cx_type>. Typical |
| 4118 | context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK> |
| 4119 | and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>) |
| 4120 | and a sort block respectively. The type determines which parts of the |
| 4121 | context union are valid. |
| 4122 | |
| 4123 | =for apidoc Cyh ||cx_type |
| 4124 | |
| 4125 | =for apidoc Cmnh||CXt_BLOCK |
| 4126 | =for apidoc_item ||CXt_EVAL |
| 4127 | =for apidoc_item ||CXt_FORMAT |
| 4128 | =for apidoc_item ||CXt_GIVEN |
| 4129 | =for apidoc_item ||CXt_LOOP_ARY |
| 4130 | =for apidoc_item ||CXt_LOOP_LAZYIV |
| 4131 | =for apidoc_item ||CXt_LOOP_LAZYSV |
| 4132 | =for apidoc_item ||CXt_LOOP_LIST |
| 4133 | =for apidoc_item ||CXt_LOOP_PLAIN |
| 4134 | =for apidoc_item ||CXt_NULL |
| 4135 | =for apidoc_item ||CXt_SUB |
| 4136 | =for apidoc_item ||CXt_SUBST |
| 4137 | =for apidoc_item ||CXt_WHEN |
| 4138 | |
| 4139 | The main division in the context struct is between a substitution scope |
| 4140 | (C<CXt_SUBST>) and block scopes, which are everything else. The former is |
| 4141 | just used while executing C<s///e>, and won't be discussed further |
| 4142 | here. |
| 4143 | |
| 4144 | All the block scope types share a common base, which corresponds to |
| 4145 | C<CXt_BLOCK>. This stores the old values of various scope-related |
| 4146 | variables like C<PL_curpm>, as well as information about the current |
| 4147 | scope, such as C<gimme>. On scope exit, the old variables are restored. |
| 4148 | |
| 4149 | Particular block scope types store extra per-type information. For |
| 4150 | example, C<CXt_SUB> stores the currently executing CV, while the various |
| 4151 | for loop types might hold the original loop variable SV. On scope exit, |
| 4152 | the per-type data is processed; for example the CV has its reference count |
| 4153 | decremented, and the original loop variable is restored. |
| 4154 | |
| 4155 | The macro C<cxstack> returns the base of the current context stack, while |
| 4156 | C<cxstack_ix> is the index of the current frame within that stack. |
| 4157 | |
| 4158 | =for apidoc_section $concurrency |
| 4159 | =for apidoc Cmnh|PERL_CONTEXT *|cxstack |
| 4160 | =for apidoc Cmnh|I32|cxstack_ix |
| 4161 | |
| 4162 | In fact, the context stack is actually part of a stack-of-stacks system; |
| 4163 | whenever something unusual is done such as calling a C<DESTROY> or tie |
| 4164 | handler, a new stack is pushed, then popped at the end. |
| 4165 | |
| 4166 | Note that the API described here changed considerably in perl 5.24; prior |
| 4167 | to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24 |
| 4168 | they were replaced by the inline static functions described below. In |
| 4169 | addition, the ordering and detail of how these macros/functions work |
| 4170 | changed in many ways, often subtly. In particular the old macros didn't |
| 4171 | handle saving the savestack and temps stack positions, and required additional |
| 4172 | C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The |
| 4173 | old-style macros will not be described further. |
| 4174 | |
| 4175 | |
| 4176 | =head2 Pushing contexts |
| 4177 | |
| 4178 | For pushing a new context, the two basic functions are |
| 4179 | C<cx = cx_pushblock()>, which pushes a new basic context block and returns |
| 4180 | its address, and a family of similar functions with names like |
| 4181 | C<cx_pushsub(cx)> which populate the additional type-dependent fields in |
| 4182 | the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their |
| 4183 | own push functions, as they don't store any data beyond that pushed by |
| 4184 | C<cx_pushblock>. |
| 4185 | |
| 4186 | The fields of the context struct and the arguments to the C<cx_*> |
| 4187 | functions are subject to change between perl releases, representing |
| 4188 | whatever is convenient or efficient for that release. |
| 4189 | |
| 4190 | A typical context stack pushing can be found in C<pp_entersub>; the |
| 4191 | following shows a simplified and stripped-down example of a non-XS call, |
| 4192 | along with comments showing roughly what each function does. |
| 4193 | |
| 4194 | dMARK; |
| 4195 | U8 gimme = GIMME_V; |
| 4196 | bool hasargs = cBOOL(PL_op->op_flags & OPf_STACKED); |
| 4197 | OP *retop = PL_op->op_next; |
| 4198 | I32 old_ss_ix = PL_savestack_ix; |
| 4199 | CV *cv = ....; |
| 4200 | |
| 4201 | /* ... make mortal copies of stack args which are PADTMPs here ... */ |
| 4202 | |
| 4203 | /* ... do any additional savestack pushes here ... */ |
| 4204 | |
| 4205 | /* Now push a new context entry of type 'CXt_SUB'; initially just |
| 4206 | * doing the actions common to all block types: */ |
| 4207 | |
| 4208 | cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix); |
| 4209 | |
| 4210 | /* this does (approximately): |
| 4211 | CXINC; /* cxstack_ix++ (grow if necessary) */ |
| 4212 | cx = CX_CUR(); /* and get the address of new frame */ |
| 4213 | cx->cx_type = CXt_SUB; |
| 4214 | cx->blk_gimme = gimme; |
| 4215 | cx->blk_oldsp = MARK - PL_stack_base; |
| 4216 | cx->blk_oldsaveix = old_ss_ix; |
| 4217 | cx->blk_oldcop = PL_curcop; |
| 4218 | cx->blk_oldmarksp = PL_markstack_ptr - PL_markstack; |
| 4219 | cx->blk_oldscopesp = PL_scopestack_ix; |
| 4220 | cx->blk_oldpm = PL_curpm; |
| 4221 | cx->blk_old_tmpsfloor = PL_tmps_floor; |
| 4222 | |
| 4223 | PL_tmps_floor = PL_tmps_ix; |
| 4224 | */ |
| 4225 | |
| 4226 | |
| 4227 | /* then update the new context frame with subroutine-specific info, |
| 4228 | * such as the CV about to be executed: */ |
| 4229 | |
| 4230 | cx_pushsub(cx, cv, retop, hasargs); |
| 4231 | |
| 4232 | /* this does (approximately): |
| 4233 | cx->blk_sub.cv = cv; |
| 4234 | cx->blk_sub.olddepth = CvDEPTH(cv); |
| 4235 | cx->blk_sub.prevcomppad = PL_comppad; |
| 4236 | cx->cx_type |= (hasargs) ? CXp_HASARGS : 0; |
| 4237 | cx->blk_sub.retop = retop; |
| 4238 | SvREFCNT_inc_simple_void_NN(cv); |
| 4239 | */ |
| 4240 | |
| 4241 | =for apidoc_section $concurrency |
| 4242 | =for apidoc Cmnh||CXINC |
| 4243 | |
| 4244 | Note that C<cx_pushblock()> sets two new floors: for the args stack (to |
| 4245 | C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this |
| 4246 | scope level, every C<nextstate> (amongst others) will reset the args and |
| 4247 | tmps stack levels to these floors. Note that since C<cx_pushblock> uses |
| 4248 | the current value of C<PL_tmps_ix> rather than it being passed as an arg, |
| 4249 | this dictates at what point C<cx_pushblock> should be called. In |
| 4250 | particular, any new mortals which should be freed only on scope exit |
| 4251 | (rather than at the next C<nextstate>) should be created first. |
| 4252 | |
| 4253 | Most callers of C<cx_pushblock> simply set the new args stack floor to the |
| 4254 | top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the |
| 4255 | items being iterated over on the stack, and so sets C<blk_oldsp> to the |
| 4256 | top of these items instead. Note that, contrary to its name, C<blk_oldsp> |
| 4257 | doesn't always represent the value to restore C<PL_stack_sp> to on scope |
| 4258 | exit. |
| 4259 | |
| 4260 | Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is |
| 4261 | later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>, |
| 4262 | this is because, although most values needing saving are stored in fields |
| 4263 | of the context struct, an extra value needs saving only when the debugger |
| 4264 | is running, and it doesn't make sense to bloat the struct for this rare |
| 4265 | case. So instead it is saved on the savestack. Since this value gets |
| 4266 | calculated and saved before the context is pushed, it is necessary to pass |
| 4267 | the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the |
| 4268 | saved value gets freed during scope exit. For most users of |
| 4269 | C<cx_pushblock>, where nothing needs pushing on the save stack, |
| 4270 | C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>. |
| 4271 | |
| 4272 | Note that where possible, values should be saved in the context struct |
| 4273 | rather than on the save stack; it's much faster that way. |
| 4274 | |
| 4275 | Normally C<cx_pushblock> should be immediately followed by the appropriate |
| 4276 | C<cx_pushfoo>, with nothing between them; this is because if code |
| 4277 | in-between could die (e.g. a warning upgraded to fatal), then the context |
| 4278 | stack unwinding code in C<dounwind> would see (in the example above) a |
| 4279 | C<CXt_SUB> context frame, but without all the subroutine-specific fields |
| 4280 | set, and crashes would soon ensue. |
| 4281 | |
| 4282 | =for apidoc dounwind |
| 4283 | |
| 4284 | Where the two must be separate, initially set the type to C<CXt_NULL> or |
| 4285 | C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the |
| 4286 | C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's |
| 4287 | determined which type of loop it's pushing. |
| 4288 | |
| 4289 | =head2 Popping contexts |
| 4290 | |
| 4291 | Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note |
| 4292 | however, that unlike C<cx_pushblock>, neither of these functions actually |
| 4293 | decrements the current context stack index; this is done separately using |
| 4294 | C<CX_POP()>. |
| 4295 | |
| 4296 | =for apidoc_section $concurrency |
| 4297 | =for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx |
| 4298 | |
| 4299 | There are two main ways that contexts are popped. During normal execution |
| 4300 | as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and |
| 4301 | C<pp_leavesub> process and pop just one context using C<cx_popfoo> and |
| 4302 | C<cx_popblock>. On the other hand, things like C<pp_return> and C<next> |
| 4303 | may have to pop back several scopes until a sub or loop context is found, |
| 4304 | and exceptions (such as C<die>) need to pop back contexts until an eval |
| 4305 | context is found. Both of these are accomplished by C<dounwind()>, which |
| 4306 | is capable of processing and popping all contexts above the target one. |
| 4307 | |
| 4308 | Here is a typical example of context popping, as found in C<pp_leavesub> |
| 4309 | (simplified slightly): |
| 4310 | |
| 4311 | U8 gimme; |
| 4312 | PERL_CONTEXT *cx; |
| 4313 | SV **oldsp; |
| 4314 | OP *retop; |
| 4315 | |
| 4316 | cx = CX_CUR(); |
| 4317 | |
| 4318 | gimme = cx->blk_gimme; |
| 4319 | oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */ |
| 4320 | |
| 4321 | if (gimme == G_VOID) |
| 4322 | PL_stack_sp = oldsp; |
| 4323 | else |
| 4324 | leave_adjust_stacks(oldsp, oldsp, gimme, 0); |
| 4325 | |
| 4326 | CX_LEAVE_SCOPE(cx); |
| 4327 | cx_popsub(cx); |
| 4328 | cx_popblock(cx); |
| 4329 | retop = cx->blk_sub.retop; |
| 4330 | CX_POP(cx); |
| 4331 | |
| 4332 | return retop; |
| 4333 | |
| 4334 | =for apidoc_section $concurrency |
| 4335 | =for apidoc Cmh||CX_CUR |
| 4336 | |
| 4337 | The steps above are in a very specific order, designed to be the reverse |
| 4338 | order of when the context was pushed. The first thing to do is to copy |
| 4339 | and/or protect any return arguments and free any temps in the current |
| 4340 | scope. Scope exits like an rvalue sub normally return a mortal copy of |
| 4341 | their return args (as opposed to lvalue subs). It is important to make |
| 4342 | this copy before the save stack is popped or variables are restored, or |
| 4343 | bad things like the following can happen: |
| 4344 | |
| 4345 | sub f { my $x =...; $x } # $x freed before we get to copy it |
| 4346 | sub f { /(...)/; $1 } # PL_curpm restored before $1 copied |
| 4347 | |
| 4348 | Although we wish to free any temps at the same time, we have to be careful |
| 4349 | not to free any temps which are keeping return args alive; nor to free the |
| 4350 | temps we have just created while mortal copying return args. Fortunately, |
| 4351 | C<leave_adjust_stacks()> is capable of making mortal copies of return args, |
| 4352 | shifting args down the stack, and only processing those entries on the |
| 4353 | temps stack that are safe to do so. |
| 4354 | |
| 4355 | In void context no args are returned, so it's more efficient to skip |
| 4356 | calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op |
| 4357 | is likely to be imminently called which will do a C<FREETMPS>, so there's |
| 4358 | no need to do that either. |
| 4359 | |
| 4360 | The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just |
| 4361 | defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the |
| 4362 | popping, it's possible for perl to call destructors, call C<STORE> to undo |
| 4363 | localisations of tied vars, and so on. Any of these can die or call |
| 4364 | C<exit()>. In this case, C<dounwind()> will be called, and the current |
| 4365 | context stack frame will be re-processed. Thus it is vital that all steps |
| 4366 | in popping a context are done in such a way as to support reentrancy. The |
| 4367 | other alternative, of decrementing C<cxstack_ix> I<before> processing the |
| 4368 | frame, would lead to leaks and the like if something died halfway through, |
| 4369 | or overwriting of the current frame. |
| 4370 | |
| 4371 | =for apidoc_section $concurrency |
| 4372 | =for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx |
| 4373 | |
| 4374 | C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack |
| 4375 | items have been popped before dying and getting trapped by eval, then the |
| 4376 | C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where |
| 4377 | the first one left off. |
| 4378 | |
| 4379 | The next step is the type-specific context processing; in this case |
| 4380 | C<cx_popsub>. In part, this looks like: |
| 4381 | |
| 4382 | cv = cx->blk_sub.cv; |
| 4383 | CvDEPTH(cv) = cx->blk_sub.olddepth; |
| 4384 | cx->blk_sub.cv = NULL; |
| 4385 | SvREFCNT_dec(cv); |
| 4386 | |
| 4387 | where it's processing the just-executed CV. Note that before it decrements |
| 4388 | the CV's reference count, it nulls the C<blk_sub.cv>. This means that if |
| 4389 | it re-enters, the CV won't be freed twice. It also means that you can't |
| 4390 | rely on such type-specific fields having useful values after the return |
| 4391 | from C<cx_popfoo>. |
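
The clear-then-release ordering can be demonstrated in isolation. The
following is a toy model, not perl code: every C<rent_*> name is invented
here, and the "destructor that dies" is simulated by calling the pop
function a second time from inside the free.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every rent_* name is invented for this sketch. */
struct rent_cv    { int refcnt; int times_freed; };
struct rent_frame { struct rent_cv *cv; };

static int rent_reentered = 0;
static void rent_popsub(struct rent_frame *fr);

/* Drop one reference.  Freeing stands in for running a destructor that
 * dies, which makes the unwinder process the same frame a second time. */
static void rent_refcnt_dec(struct rent_cv *cv, struct rent_frame *fr)
{
    if (cv == NULL)
        return;
    if (--cv->refcnt == 0) {
        cv->times_freed++;
        if (!rent_reentered) {
            rent_reentered = 1;
            rent_popsub(fr);            /* simulated re-entry */
        }
    }
}

static void rent_popsub(struct rent_frame *fr)
{
    struct rent_cv *cv = fr->cv;
    fr->cv = NULL;                      /* clear the field first ...      */
    rent_refcnt_dec(cv, fr);            /* ... then release the reference */
}
```

Had the field been cleared I<after> the decrement, the simulated re-entry
would have found the stale pointer and released it a second time.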
| 4392 | |
| 4393 | Next, C<cx_popblock> restores all the various interpreter vars to their |
| 4394 | previous values or previous high water marks; it expands to: |
| 4395 | |
| 4396 | PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp; |
| 4397 | PL_scopestack_ix = cx->blk_oldscopesp; |
| 4398 | PL_curpm = cx->blk_oldpm; |
| 4399 | PL_curcop = cx->blk_oldcop; |
| 4400 | PL_tmps_floor = cx->blk_old_tmpsfloor; |
| 4401 | |
| 4402 | Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier, |
| 4403 | which value to restore it to depends on the context type (specifically |
| 4404 | C<for (list) {}>), and what args (if any) it returns; and that will |
| 4405 | already have been sorted out earlier by C<leave_adjust_stacks()>. |
| 4406 | |
| 4407 | Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>. |
| 4408 | After this point, it's possible that the current context frame could |
| 4409 | be overwritten by other contexts being pushed. Although things like ties |
| 4410 | and C<DESTROY> are supposed to work within a new context stack, it's best |
| 4411 | not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately |
| 4412 | sets C<cx> to null to detect code that is still relying on the field |
| 4413 | values in that context frame. Note in the C<pp_leavesub()> example above, |
| 4414 | we grab C<blk_sub.retop> I<before> calling C<CX_POP>. |
| 4415 | |
| 4416 | =head2 Redoing contexts |
| 4417 | |
| 4418 | Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate> |
| 4419 | as regards resetting various vars to their base values. It is used in |
| 4420 | places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than |
| 4421 | exiting a scope, we want to re-initialise the scope. As well as resetting |
| 4422 | C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>, |
| 4423 | C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a |
| 4424 | C<FREETMPS>. |
| 4425 | |
| 4426 | |
| 4427 | =head1 Reference-counted argument stack |
| 4428 | |
| 4429 | =head2 Introduction |
| 4430 | |
| 4431 | As of perl 5.40, there is a build option, C<PERL_RC_STACK>, not enabled by |
| 4432 | default, which requires that items pushed onto, or popped off the argument |
| 4433 | stack have their reference counts adjusted. It is intended that eventually |
| 4434 | this will be the default way (and finally the only way) to configure perl. |
| 4435 | |
| 4436 | The macros which manipulate the stack such as PUSHs() and POPs() don't |
| 4437 | adjust the reference count of the SV. Most of the time this is fine, since |
| 4438 | something else is keeping the SV alive while on the argument stack, such as |
| 4439 | a pointer from the TEMPs stack, or from the pad (e.g. a lexical variable |
| 4440 | or a C<PADTMP>). Occasionally this can go horribly wrong. For example, |
| 4441 | this code: |
| 4442 | |
| 4443 | my @a = (1,2,3); |
| 4444 | sub f { @a = (); print "(@_)\n" }; |
| 4445 | f(@a, 4); |
| 4446 | |
| 4447 | may print undefined or random freed values, since some of the elements of |
| 4448 | @_, which have been aliased to the elements of @a, have been freed. |
| 4449 | C<PERL_RC_STACK> is intended to fix this by making each SV pointer on the |
| 4450 | argument stack increment the reference count (RC) of the SV by one. |
| 4451 | |
| 4452 | In this new environment, unmodified existing PP and XS functions, which |
| 4453 | have been written assuming a non reference-counted stack (non-RC for |
| 4454 | short), are called via special wrapper functions which adjust the stack |
| 4455 | before and after. At the moment there is no API to write an RC XS |
| 4456 | function, so all XS code will continue to be called via a wrapper (which |
| 4457 | makes them slightly slower), but means that in general, CPAN distributions |
| 4458 | containing XS code continue to work without modification. |
| 4459 | |
| 4460 | However, PP functions, either in perl core, or those in XS functions used |
| 4461 | to implement custom ops or to override the PP functions for built-in ops, |
| 4462 | need dealing with specially. For the latter, they can just be wrapped; |
| 4463 | this involves the least work, but has a performance impact. In the longer |
| 4464 | term, and for core PP functions, they need unwrapping and rewriting using |
| 4465 | a new API. With this, the old macros such as PUSHs() have been replaced |
| 4466 | with a new set of (mostly inline) functions with a common prefix, such as |
| 4467 | rpp_push_1(). "RPP" stands for "reference-counted push and pop functions". |
| 4468 | The new functions modify the reference count on C<PERL_RC_STACK> builds, |
| 4469 | while leaving them unadjusted otherwise. Thus in core they generally work |
| 4470 | in both cases, while in XS code they are portable to older perl versions |
| 4471 | via C<PPPort> (XXX assuming that they get added to C<PPPort>). |
| 4472 | |
| 4473 | The rest of this section is mainly concerned with how to convert existing |
| 4474 | PP functions, and how to write new PP functions to use the new C<rpp_> |
| 4475 | API. |
| 4476 | |
| 4477 | A reference-counted perl can be built using the PERL_RC_STACK define. |
| 4478 | For development and debugging purposes, it is best to enable leaking |
| 4479 | scalar debugging too, as that displays extra information about scalars |
| 4480 | that have leaked or been prematurely freed. |
| 4481 | |
| 4482 | Configure -DDEBUGGING \ |
| 4483 | -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' |
| 4484 | |
| 4485 | =head2 Reference counted stack states |
| 4486 | |
| 4487 | In the new regime, the current argument stack can be in one of three |
| 4488 | states, which can be determined by the shown expression. |
| 4489 | |
| 4490 | =over |
| 4491 | |
| 4492 | =item * not reference-counted |
| 4493 | |
| 4494 | !AvREAL(PL_curstack) |
| 4495 | |
| 4496 | In this case, perl will assume when emptying the stack (such as during a |
| 4497 | croak()) that the items on it don't need freeing. This is the traditional |
| 4498 | perl behaviour. On C<PERL_RC_STACK> builds, such stacks will be rarely |
| 4499 | encountered. |
| 4500 | |
| 4501 | =item * fully reference-counted |
| 4502 | |
| 4503 | AvREAL(PL_curstack) && !PL_curstackinfo->si_stack_nonrc_base |
| 4504 | |
| 4505 | All the items on the stack are reference counted, and will be freed by |
| 4506 | functions like rpp_popfree_1() or if perl croak()s. This is the normal |
| 4507 | state of the stack in C<PERL_RC_STACK> builds. |
| 4508 | |
| 4509 | =item * partially reference-counted (split) |
| 4510 | |
| 4511 | AvREAL(PL_curstack) && PL_curstackinfo->si_stack_nonrc_base > 0 |
| 4512 | |
| 4513 | In this case, items on the stack from the index C<si_stack_nonrc_base> |
| 4514 | upwards are non-RC; those below are RC. This state occurs when a PP or XS |
| 4515 | function has been wrapped. In this case, the wrapper function pushes a |
| 4516 | non-RC copy of the arg pointers above the cut then calls the real |
| 4517 | function. When that returns, the wrapper function bumps up the RC of any |
| 4518 | returned args. See below for more details. |
| 4519 | |
| 4520 | =back |
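
The three states and their test expressions can be summarised as a small
classifier. This is a sketch over a stand-in struct, not over the real
interpreter variables: every C<rcs_*> name is invented here, with
C<is_real> standing in for C<AvREAL(PL_curstack)> and C<nonrc_base> for
C<si_stack_nonrc_base>.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every rcs_* name is invented for this sketch. */
enum rcs_state { RCS_NON_RC, RCS_FULLY_RC, RCS_SPLIT };

struct rcs_stack {
    bool is_real;      /* stands in for AvREAL(PL_curstack) */
    int  nonrc_base;   /* stands in for si_stack_nonrc_base */
};

static enum rcs_state rcs_classify(const struct rcs_stack *s)
{
    if (!s->is_real)
        return RCS_NON_RC;
    return s->nonrc_base > 0 ? RCS_SPLIT : RCS_FULLY_RC;
}

/* Does the pointer at stack index ix own a reference count? */
static bool rcs_slot_is_rc(const struct rcs_stack *s, int ix)
{
    switch (rcs_classify(s)) {
    case RCS_NON_RC:   return false;
    case RCS_FULLY_RC: return true;
    default:           return ix < s->nonrc_base;   /* RCS_SPLIT */
    }
}
```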
| 4521 | |
| 4522 | Note that perl uses a stack-of-stacks, and the AvREAL() and |
| 4523 | C<si_stack_nonrc_base> states are per stack. When perl starts up, the main |
| 4524 | stack is RC, but by default, new stacks pushed in XS code via PUSHSTACKi() |
| 4525 | are non-RC, so it is quite possible to get a mixture. The perl core itself |
| 4526 | uses the new push_stackinfo() function which replaces PUSHSTACKi() and |
| 4527 | allows you to specify that the new stack should be RC by default. |
| 4528 | (XXX core mostly hasn't actually been updated yet to use push_stackinfo()) |
| 4529 | |
| 4530 | Most places in the core assume a particular RC environment. In particular, |
| 4531 | it is assumed that within a runops loop, all the PP functions are |
| 4532 | RC-aware, either because they have been (re)written to be aware, or |
| 4533 | because they have been wrapped. Whenever a runops loop is entered via |
| 4534 | CALLRUNOPS(), it will check the current state of the stack, and if it's |
| 4535 | not fully RC, will temporarily update its contents to be fully RC before |
| 4536 | entering the main runops loop. Then if necessary it will restore the stack |
| 4537 | to its old state on return. This means that functions like call_sv(), |
| 4538 | which can be called from any environment (e.g. RC core or wrapped and |
| 4539 | temporarily non-RC XS code) will always do the Right Thing when invoking |
| 4540 | the runops loop, no matter what the current stack state is. |
| 4541 | |
| 4542 | Similarly, croaks and the like (which can occur anywhere) have to be able |
| 4543 | to handle both stack types. So there are a few places in core - call_sv(), |
| 4544 | eval_sv() etc, Perl_die_unwind() and S_my_exit_jump() - which have been |
| 4545 | specially crafted to handle both cases; everything else can assume a fixed |
| 4546 | environment. |
| 4547 | |
| 4548 | =head2 Wrapping |
| 4549 | |
| 4550 | Normally a core PP function is declared like this: |
| 4551 | |
| 4552 | PP(pp_foo) |
| 4553 | { |
| 4554 | ... |
| 4555 | } |
| 4556 | |
| 4557 | This expands to something like: |
| 4558 | |
| 4559 | OP* Perl_pp_foo(pTHX) |
| 4560 | { |
| 4561 | ... |
| 4562 | } |
| 4563 | |
| 4564 | When such a function needs to be wrapped, it is instead declared as: |
| 4565 | |
| 4566 | PP_wrapped(pp_foo, nargs, nlists) |
| 4567 | { |
| 4568 | ... |
| 4569 | } |
| 4570 | |
| 4571 | which on non-RC builds, expands to the same as PP() (the extra args are |
| 4572 | ignored). On RC builds it expands to something like |
| 4573 | |
| 4574 | OP* Perl_pp_foo(pTHX) |
| 4575 | { |
| 4576 | return Perl_pp_wrap(aTHX_ S_Perl_pp_foo_norc, nargs, nlists); |
| 4577 | } |
| 4578 | |
| 4579 | STATIC OP* S_Perl_pp_foo_norc(pTHX) |
| 4580 | { |
| 4581 | ... |
| 4582 | } |
| 4583 | |
| 4584 | Here the externally visible PP function calls pp_wrap(), which adjusts |
| 4585 | the stack contents, then calls the hidden real body of the PP function, |
| 4586 | then on return, adjusts the stack back. |
| 4587 | |
| 4588 | There is an API macro, XSPP_wrapped(), intended for use on PP functions |
| 4589 | declared in XS code. It is identical to PP_wrapped(), except that it |
| 4590 | doesn't prepend a C<Perl_> prefix to the function name. |
| 4591 | |
| 4592 | The C<nargs> and C<nlists> parameters to the macro are numeric constants |
| 4593 | or simple expressions which specify how many arguments the PP function |
| 4594 | expects, or how many lists it expects. For example, |
| 4595 | |
| 4596 | PP_wrapped(pp_add, 2, 0); /* consumes two args off the stack */ |
| 4597 | |
| 4598 | PP_wrapped(pp_readline, /* consumes one or two args */ |
| 4599 | ((PL_op->op_flags & OPf_STACKED) ? 2 : 1), 0); |
| 4600 | |
| 4601 | PP_wrapped(pp_push, 0, 1); /* consumes one list */ |
| 4602 | |
| 4603 | PP_wrapped(pp_aassign, 0, 2); /* consumes two lists */ |
| 4604 | |
| 4605 | To understand what pp_wrap() does, consider calling Perl_pp_foo() which |
| 4606 | expects three arguments. On entry the stack may look like: |
| 4607 | |
| 4608 | ... A+ B+ C+ |
| 4609 | |
| 4610 | (where the C<+> indicates that the pointers to A, B and C are each |
| 4611 | reference counted). The wrapper function pp_wrap() marks a cut at the |
| 4612 | current stack position using C<si_stack_nonrc_base>, then, based on the |
| 4613 | value of C<nargs>, pushes a copy of those three pointers above the cut: |
| 4614 | |
| 4615 | ... A+ B+ C+ | A0 B0 C0 |
| 4616 | |
| 4617 | (where the C<0> indicates that the pointers aren't RC), then calls the |
| 4618 | real PP function, S_Perl_pp_foo_norc(). That function processes A, B and C, |
| 4619 | pops them off the stack, and pushes some result SVs. None of this |
| 4620 | manipulation adjusts any RCs. On return to pp_wrap(), the stack may look |
| 4621 | something like: |
| 4622 | |
| 4623 | ... A+ B+ C+ | X0 Y0 |
| 4624 | |
| 4625 | The wrapper function bumps up the RCs of X and Y, decrements those of A, |
| 4626 | B and C, shifts the results down and sets C<si_stack_nonrc_base> to zero, |
| 4627 | leaving the stack as: |
| 4628 | |
| 4629 | ... X+ Y+ |
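
The sequence above can be modelled outside of perl with a toy stack. What follows is only a sketch, not the real pp_wrap(): C<ToySV>, C<ToyStack>, toy_pp_wrap() and toy_pp_add_norc() are hypothetical names invented for the illustration, slot 0 of the toy stack is a dummy, and "freeing" is reduced to a reference count reaching zero.

```c
#include <assert.h>

/* Toy stand-in for an SV: just a reference count and an integer value. */
typedef struct { int refcnt; int value; } ToySV;

enum { TOY_STACK_MAX = 32 };

typedef struct {
    ToySV *slot[TOY_STACK_MAX];
    int    sp;          /* index of the top item; slot 0 is a dummy */
    int    nonrc_base;  /* 0: no cut; else index of first non-RC slot */
} ToyStack;

typedef void (*toy_pp_body)(ToyStack *);

/* The pad's TARG in miniature; the "pad" owns one reference to it. */
ToySV toy_targ = { 1, 0 };

/* Hypothetical unwrapped body: old-style add, never touches refcounts. */
void toy_pp_add_norc(ToyStack *st)
{
    int right = st->slot[st->sp--]->value;
    int left  = st->slot[st->sp--]->value;
    toy_targ.value = left + right;
    st->slot[++st->sp] = &toy_targ;
}

/* Model of the wrapper for a body expecting `nargs` arguments. */
void toy_pp_wrap(ToyStack *st, toy_pp_body body, int nargs)
{
    int cut = st->sp + 1;       /* first slot above the RC items */
    int i, nret;

    assert(nargs <= st->sp && cut + nargs <= TOY_STACK_MAX);
    st->nonrc_base = cut;
    for (i = 0; i < nargs; i++) /* ... A+ B+ | A0 B0 */
        st->slot[cut + i] = st->slot[cut - nargs + i];
    st->sp = cut + nargs - 1;

    body(st);                   /* ... A+ B+ | X0 */

    nret = st->sp - cut + 1;
    for (i = 0; i < nret; i++)  /* results become reference-counted */
        st->slot[cut + i]->refcnt++;
    for (i = 0; i < nargs; i++) /* the original arguments are released */
        st->slot[cut - nargs + i]->refcnt--;
    for (i = 0; i < nret; i++)  /* shift the results down over the args */
        st->slot[cut - nargs + i] = st->slot[cut + i];
    st->sp = cut - nargs + nret - 1;
    st->nonrc_base = 0;         /* ... X+ */
}
```

Calling toy_pp_wrap(&st, toy_pp_add_norc, 2) on a toy stack holding two items leaves a single item, the toy TARG, whose count has been bumped by the stack's reference. Bumping the results before releasing the arguments matters when the body returns one of its own arguments.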
| 4630 | |
| 4631 | In places like pp_entersub(), a similar wrapping (via the functions |
| 4632 | rpp_invoke_xs() and then xs_wrap()) is done when calling XS subs. |
| 4633 | |
| 4634 | When C<nlists> is positive, a similar action takes place, except that the |
| 4635 | mark stack is examined (and adjusted) in order to determine the number of |
| 4636 | args that need copying. |
| 4637 | |
| 4638 | A complex calling environment might have multiple nested stacks with |
| 4639 | different RC states. Perl starts off with an RC stack. Then for example, |
| 4640 | pp_entersub() is called, which (via xs_wrap()) splits the stack and |
| 4641 | executes the XS function in a non-RC environment. That function may call |
| 4642 | PUSHSTACKi(), which creates a new non-RC stack, then calls call_sv(), which |
| 4643 | does CALLRUNOPS(), which causes the new stack to temporarily become RC. |
| 4644 | Then a tied method is called, which pushes a new RC stack, and so on. (XXX |
| 4645 | currently tied methods actually push a non-RC stack. To be fixed soon). |
| 4646 | |
| 4647 | =head2 (Re)writing a PP function using the rpp_() API |
| 4648 | |
| 4649 | Wrapping a PP function has a performance overhead, and is there mainly as |
| 4650 | a temporary crutch. Eventually, PP functions should be updated to use |
| 4651 | rpp_() functions, and any new PP functions should be written this way from |
| 4652 | scratch and thus not ever need wrapping. |
| 4653 | |
| 4654 | A couple of examples of core PP functions being converted can be seen in the |
| 4655 | commits C<v5.39.1-304-g205fcd8410> and C<v5.39.1-303-g2fe263a83a>, which |
| 4656 | demonstrate a unary and a binary op being converted (pp_not() and |
| 4657 | pp_and()). |
| 4658 | |
| 4659 | The traditional PP stack API consisted of a C<dSP> declaration, plus a |
| 4660 | number of macros to push, pop and extend the stack. A I<very simplified> |
| 4661 | pp_add() function might look something like: |
| 4662 | |
| 4663 | PP(pp_add) |
| 4664 | { |
| 4665 | dSP; |
| 4666 | dTARGET; |
| 4667 | IV right = SvIV(POPs); |
| 4668 | IV left = SvIV(POPs); |
| 4669 | TARGi(left + right, 1); |
| 4670 | PUSHs(TARG); |
| 4671 | PUTBACK; |
| 4672 | return NORMAL; |
| 4673 | } |
| 4674 | |
| 4675 | which expands to something like: |
| 4676 | |
| 4677 | { |
| 4678 | SV **sp = PL_stack_sp; |
| 4679 | SV *targ = PAD_SV(PL_op->op_targ); |
| 4680 | IV right = SvIV(*sp--); |
| 4681 | IV left = SvIV(*sp--); |
| 4682 | sv_setiv(targ, left + right); |
| 4683 | *++sp = targ; |
| 4684 | PL_stack_sp = sp; |
| 4685 | return PL_op->op_next; |
| 4686 | } |
| 4687 | |
| 4688 | The whole C<dSP> thing harks back to the days before decent optimising |
| 4689 | compilers. It was always error-prone, e.g. if you forgot a C<PUTBACK> or |
| 4690 | C<SPAGAIN>. The new API always just accesses C<PL_stack_sp> directly. In |
| 4691 | fact the first step of upgrading a PP function is always to remove the |
| 4692 | C<dSP> declaration. This has the happy side effect that any old-style |
| 4693 | macros left in the pp function which implicitly use C<sp> will become |
| 4694 | compile errors. The existence of a C<dSP> somewhere in core is a good sign |
| 4695 | that that function still needs updating. |
| 4696 | |
| 4697 | An obvious question is: why not just modify the definitions of the PUSHs() |
| 4698 | etc macros to modify reference counts on RC builds? The basic problem is |
| 4699 | that an SV may now be kept alive only by a single reference count from |
| 4700 | the stack (formerly, they tended to be on the TEMPs stack too). So in code |
| 4701 | like: |
| 4702 | |
| 4703 | SV *sv = POPs; |
| 4704 | IV i = SvIV(sv); |
| 4705 | |
| 4706 | including an SvREFCNT_dec() in the C<POPs> macro definition would cause |
| 4707 | C<sv> to be freed immediately, before its integer value can be read. |
| 4708 | |
| 4709 | A potential issue with the new regime is that perl can croak at basically |
| 4710 | any point in execution (e.g. the SvIV() above might call FETCH() on a tied |
| 4711 | variable which then croaks). Thus at all times, the RC of each SV must be |
| 4712 | properly accounted for. In the example above, a naive approach to avoiding |
| 4713 | a premature free of C<sv> might be: |
| 4714 | |
| 4715 | SV *sv = *PL_stack_sp--; |
| 4716 | IV i = SvIV(sv); |
| 4717 | SvREFCNT_dec(sv); // got i, so ok to free sv now |
| 4718 | |
| 4719 | but that means that C<sv> leaks if SvIV() triggers a croak. |
| 4720 | |
| 4721 | To avoid that, the new regime has the general outline that arguments are |
| 4722 | left on the stack I<until they are finished with>, then removed and their |
| 4723 | reference count adjusted at that point. With the new API, the pp_add() |
| 4724 | function looks something like: |
| 4725 | |
| 4726 | { |
| 4727 | dTARGET; |
| 4728 | IV right = SvIV(PL_stack_sp[ 0]); // NB: arguments left on stack |
| 4729 | IV left = SvIV(PL_stack_sp[-1]); |
| 4730 | TARGi(left + right, 1); |
| 4731 | rpp_replace_2_1(targ); |
| 4732 | return NORMAL; |
| 4733 | } |
| 4734 | |
| 4735 | The rpp_replace_2_1() function pops two values off the stack and pushes |
| 4736 | one new value on, while adjusting reference counts as appropriate |
| 4737 | (depending on whether built with C<PERL_RC_STACK> or not). |
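
The leak described earlier, and the way leaving arguments on the stack avoids it, can be demonstrated with a toy model that uses longjmp() in place of a croak. This is a sketch under stated assumptions: all the C<toy_> names are hypothetical, toy_iv() merely pretends to be an SvIV() that croaks on a tied value, and the unwinding code is assumed to release whatever is still on the stack.

```c
#include <assert.h>
#include <setjmp.h>

typedef struct { int refcnt; int is_tied; int value; } ToySV;

jmp_buf toy_top;        /* stand-in for perl's exception frame */
ToySV  *toy_stack[8];
int     toy_sp = 0;     /* index of the top item; slot 0 is a dummy */

/* Stand-in for SvIV(): "croaks" when asked to read a tied value. */
int toy_iv(ToySV *sv)
{
    if (sv->is_tied)
        longjmp(toy_top, 1);
    return sv->value;
}

/* Assumed unwinding behaviour: release everything still on the stack. */
void toy_unwind(void)
{
    while (toy_sp > 0)
        toy_stack[toy_sp--]->refcnt--;
}

/* Naive approach: pop first, release after use. */
int toy_naive(void)
{
    ToySV *sv = toy_stack[toy_sp--];
    int i = toy_iv(sv);     /* on a croak, sv is never released */
    sv->refcnt--;
    return i;
}

/* New regime: leave the argument on the stack until finished with. */
int toy_careful(void)
{
    int i = toy_iv(toy_stack[toy_sp]);
    toy_stack[toy_sp--]->refcnt--;
    return i;
}

/* Run fn with sv on the stack; return sv's final reference count. */
int toy_run(int (*fn)(void), ToySV *sv)
{
    toy_sp = 0;
    toy_stack[++toy_sp] = sv;
    if (setjmp(toy_top) == 0)
        (void)fn();
    else
        toy_unwind();
    return sv->refcnt;
}
```

With a plain value both variants end with a count of zero. With a "tied" value, toy_naive() leaves the count at one (a leak), while toy_careful() lets the unwinding code release it.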
| 4738 | |
| 4739 | The rpp_() functions in the new API will be described in detail below, but |
| 4740 | in summary: |
| 4741 | |
| 4742 | new function                  approximate old equivalent |
| 4743 | ------------                  -------------------------- |
| 4744 | |
| 4745 | rpp_extend(n)                 EXTEND(SP, n) |
| 4746 | |
| 4747 | rpp_push_1(sv)                PUSHs(sv) |
| 4748 | rpp_push_2(sv1, sv2)          PUSHs(sv1); PUSHs(sv2) |
| 4749 | rpp_xpush_1(sv)               XPUSHs(sv) |
| 4750 | rpp_xpush_2(sv1, sv2)         EXTEND(SP,2); PUSHs(sv1); PUSHs(sv2); |
| 4751 | |
| 4752 | rpp_push_1_norc(sv)           mPUSHs(sv) // on RC builds, skips RC++; |
| 4753 |                                          // on non-RC builds, mortalises |
| 4754 | rpp_popfree_1()               (void)POPs; |
| 4755 | rpp_popfree_2()               (void)POPs; (void)POPs; |
| 4756 | rpp_popfree_to(svp)           PL_stack_sp = svp; |
| 4757 | rpp_obliterate_stack_to(ix)   // see description below |
| 4758 | |
| 4759 | sv = rpp_pop_1_norc()         sv = SvREFCNT_inc(POPs) |
| 4760 | |
| 4761 | rpp_replace_1_1(sv)           (void)POPs; PUSHs(sv); |
| 4762 | rpp_replace_2_1(sv)           (void)POPs; (void)POPs; PUSHs(sv); |
| 4763 | rpp_replace_at(sp, sv)        *sp = sv; |
| 4764 | rpp_replace_at_norc(sp, sv)   *sp = sv_2mortal(sv); |
| 4765 | |
| 4766 | rpp_context(mark, gimme, |
| 4767 |             extra)            SP -= extra; |
| 4768 |                               // impose void/scalar/list context on return args |
| 4769 |                               SP = (gimme == G_VOID) ? mark : .... |
| 4770 | |
| 4771 | rpp_try_AMAGIC_1()            tryAMAGICun_MG() |
| 4772 | rpp_try_AMAGIC_2()            tryAMAGICbin_MG() |
| 4773 | |
| 4774 | rpp_is_lone(sv)               SvTEMP(sv) && SvREFCNT(sv) == 1 |
| 4775 | rpp_stack_is_rc()             no equivalent |
| 4776 | |
| 4777 | rpp_invoke_xs(cv)             CvXSUB(cv)(aTHX_ cv); |
| 4778 | |
| 4779 | |
| 4780 | (no replacement)              dATARGET // just write the macro body in full |
| 4781 | |
| 4782 | There are also some C<_NN> variants which assume that any items being |
| 4783 | removed from the stack are non-NULL, and so are slightly more efficient: |
| 4784 | |
| 4785 | rpp_popfree_1_NN() |
| 4786 | rpp_popfree_2_NN() |
| 4787 | rpp_popfree_to_NN(svp) |
| 4788 | |
| 4789 | rpp_replace_1_1_NN(sv) |
| 4790 | rpp_replace_2_1_NN(sv) |
| 4791 | rpp_replace_at_NN(sp, sv) |
| 4792 | rpp_replace_at_norc_NN(sp, sv) |
| 4793 | |
| 4794 | There are also a few C<_IMM> variants, which expect the single pushed or |
| 4795 | replacement value to be an immortal, such as C<&PL_sv_undef> - this skips |
| 4796 | incrementing the ref count of the immortal SV. It doesn't matter if the |
| 4797 | ref count of the SV prematurely reaches zero, as sv_free2() will just |
| 4798 | resurrect it. Not every variant is provided; if a suitable one |
| 4799 | doesn't exist, just using a standard C<_1> version is fine, albeit |
| 4800 | slightly slower. |
| 4801 | |
| 4802 | rpp_push_IMM(&PL_sv_undef) |
| 4803 | rpp_xpush_IMM(&PL_sv_zero) |
| 4804 | rpp_replace_1_IMM_NN(&PL_sv_yes) |
| 4805 | rpp_replace_2_IMM_NN(&PL_sv_no) |
| 4806 | |
| 4807 | Other new C and perl functions related to reference-counted stacks are: |
| 4808 | |
| 4809 | push_stackinfo(type,rc) PUSHSTACKi(type) |
| 4810 | pop_stackinfo() POPSTACK() |
| 4811 | switch_argstack(to) SWITCHSTACK(from,to) |
| 4812 | |
| 4813 | (Internals::stack_refcounted() & 1) # perl built with PERL_RC_STACK |
| 4814 | |
| 4815 | Some of these new functions are trivial, but should be used in preference |
| 4816 | to writing direct code because they will work on both RC and non-RC |
| 4817 | builds, and may do extra checks and assertions on C<DEBUGGING> builds. |
| 4818 | |
| 4819 | Note that rpp_popfree_1() etc aren't direct replacements for C<POPs>. The |
| 4820 | rpp_() variants don't return a value and are intended to be called when |
| 4821 | the SV is finished with. So |
| 4822 | |
| 4823 | SV *sv = POPs; |
| 4824 | ... do stuff with sv ... |
| 4825 | |
| 4826 | becomes |
| 4827 | |
| 4828 | SV *sv = *PL_stack_sp; |
| 4829 | ... do stuff with sv ... |
| 4830 | rpp_popfree_1(); /* does SvREFCNT_dec(*PL_stack_sp--) */ |
| 4831 | |
| 4832 | The rpp_replace_M_N() functions are shortcuts for popping and freeing C<M> |
| 4833 | items then pushing and bumping up the RCs of C<N> items. Note that they |
| 4834 | handle edge cases such as an old and new SV being the same. |
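
To see why the "old and new SV being the same" case needs care, here is a toy model of a replace operation. It is a sketch only: C<ToySV>, toy_inc(), toy_dec() and toy_replace_m_1() are hypothetical names, and the ordering shown (claim the new item before releasing the old ones) is one way of handling the edge case, not necessarily how the real functions are written.

```c
#include <assert.h>

typedef struct { int refcnt; int freed; } ToySV;

void toy_inc(ToySV *sv) { sv->refcnt++; }

/* A count reaching zero "frees" the toy SV. */
void toy_dec(ToySV *sv)
{
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Replace the top m items with sv; returns the new stack pointer.
 * The caller must ensure at least m items sit above the stack base. */
ToySV **toy_replace_m_1(ToySV **sp, int m, ToySV *sv)
{
    int i;

    toy_inc(sv);            /* claim the new item first ... */
    for (i = 0; i < m; i++)
        toy_dec(*sp--);     /* ... then release the old ones */
    *++sp = sv;
    return sp;
}
```

If the decrement came first, replacing an SV whose only reference is held by the stack with itself would drop its count to zero, "freeing" it before it was pushed back.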
| 4835 | |
| 4836 | rpp_replace_at(sp, sv) is similar to rpp_replace_1_1(), except that |
| 4837 | it replaces an SV at an address in the stack rather than at the top. |
| 4838 | |
| 4839 | rpp_replace_at_norc(sp, sv) is similar to rpp_replace_at(), except that |
| 4840 | it assumes that C<sv> already has a bumped reference count. So, a bit |
| 4841 | like rpp_push_1_norc() (see below), it doesn't bother increasing C<sv>'s |
| 4842 | reference count, or on non-RC builds it mortalises it instead. |
| 4843 | |
| 4844 | rpp_popfree_to(svp) is designed to replace code like |
| 4845 | |
| 4846 | PL_stack_sp = PL_stack_base + cx->blk_oldsp; |
| 4847 | |
| 4848 | which typically appears in list ops or scope exits when the arguments are |
| 4849 | finished with. Left unaltered, all the SVs above C<oldsp> would leak. The |
| 4850 | new approach is |
| 4851 | |
| 4852 | rpp_popfree_to(PL_stack_base + cx->blk_oldsp); |
| 4853 | |
| 4854 | There is a rarely-used variant of this, rpp_obliterate_stack_to(), which |
| 4855 | pops the stack back to the specified index regardless of the current RC |
| 4856 | state of the stack. So for example if the stack is split, it will only |
| 4857 | adjust the RCs of any SVs which are below the split point, while |
| 4858 | rpp_popfree_to() would mindlessly free I<all> SVs (on RC builds anyway). |
| 4859 | For normal PP functions you should only ever use rpp_popfree_to(), which |
| 4860 | is faster. |
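
The difference between the two functions on a split stack can be illustrated with a toy model. This is a sketch following the description above rather than the real implementation; C<ToyStack>, toy_popfree_to() and toy_obliterate_to() are hypothetical names, and slot 0 is a dummy so that a C<nonrc_base> of zero can mean "no split".

```c
#include <assert.h>

typedef struct { int refcnt; } ToySV;

typedef struct {
    ToySV *slot[16];
    int    sp;          /* index of the top item; slot 0 is a dummy */
    int    nonrc_base;  /* 0: whole stack is RC; else first non-RC index */
} ToyStack;

/* In the spirit of rpp_popfree_to(): release every item above ix. */
void toy_popfree_to(ToyStack *st, int ix)
{
    while (st->sp > ix)
        st->slot[st->sp--]->refcnt--;
}

/* In the spirit of rpp_obliterate_stack_to(): only items below the
 * split point have their reference counts adjusted. */
void toy_obliterate_to(ToyStack *st, int ix)
{
    while (st->sp > ix) {
        int counted = !st->nonrc_base || st->sp < st->nonrc_base;
        if (counted)
            st->slot[st->sp]->refcnt--;
        st->sp--;
    }
    /* Toy simplification: once popped below the split, it is gone. */
    if (st->nonrc_base && st->sp < st->nonrc_base)
        st->nonrc_base = 0;
}
```

On a toy stack laid out as C<A+ B+ | A0 B0>, toy_obliterate_to() releases A and B once each, whereas toy_popfree_to() decrements each of them twice.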
| 4861 | |
| 4862 | There are no new equivalents for all the convenience macros like POPi() |
| 4863 | and (shudder) dPOPPOPiirl(). These should be replaced with the rpp_() |
| 4864 | functions above and with the conversions and variable declarations being |
| 4865 | made explicit, e.g. dPOPPOPiirl() becomes: |
| 4866 | |
| 4867 | IV right = SvIV(PL_stack_sp[ 0]); |
| 4868 | IV left = SvIV(PL_stack_sp[-1]); |
| 4869 | rpp_popfree_2(); |
| 4870 | |
| 4871 | A couple of the rpp_() functions with C<norc> in their names don't adjust |
| 4872 | the reference count on RC builds (but, conversely, do on non-RC builds). |
| 4873 | |
| 4874 | rpp_push_1_norc(sv) does a simple C<*++PL_stack_sp = sv> on RC builds. It |
| 4875 | is typically used to "root" a newly-created SV, which already has an RC of |
| 4876 | 1. On non-RC builds it mortalises the SV instead. So for example, code |
| 4877 | which used to look like |
| 4878 | |
| 4879 | mPUSHs(newSViv(i)); |
| 4880 | |
| 4881 | and which expanded to the equivalent of: |
| 4882 | |
| 4883 | PUSHs(sv_2mortal(newSViv(i))); |
| 4884 | |
| 4885 | should be rewritten as: |
| 4886 | |
| 4887 | rpp_push_1_norc(newSViv(i)); |
| 4888 | |
| 4889 | This is because newSViv() and similar create a new SV with a reference |
| 4890 | count one too high (1 rather than 0). This count is then "donated" to the |
| 4891 | stack by pushing it. Conversely on non-RC builds, the count is donated to |
| 4892 | the TEMPs stack. |
| 4893 | |
| 4894 | Similarly, on RC builds, C<sv = rpp_pop_1_norc()> does a simple |
| 4895 | C<sv = *PL_stack_sp--> without adjusting the reference count, while on |
| 4896 | non-RC builds it actually increments the SV's reference count. It is |
| 4897 | intended for cases where you immediately want to increment the reference |
| 4898 | count again after popping, e.g. where the SV is to be immediately embedded |
| 4899 | somewhere. For example this code: |
| 4900 | |
| 4901 | SV *sv = PL_stack_sp[0]; |
| 4902 | SvREFCNT_inc(sv); |
| 4903 | av_store(av, i, sv); /* in real life should check return value */ |
| 4904 | rpp_popfree_1(); |
| 4905 | |
| 4906 | can be more efficiently written as |
| 4907 | |
| 4908 | av_store(av, i, rpp_pop_1_norc()); |
| 4909 | |
| 4910 | By using this function, the code works correctly on both RC and non-RC |
| 4911 | builds. |
| 4912 | |
| 4913 | A common operation on list ops is to impose void, scalar or list context |
| 4914 | on the return arguments, possibly discarding all, or all except one, of |
| 4915 | them. rpp_context(mark, gimme, extra) does this. As a first step (for |
| 4916 | convenience and efficiency) it notionally pops C<extra> args off the |
| 4917 | stack. Then, for list context, it leaves things as is. For void context, the |
| 4918 | stack pointer is reset to mark, and everything above is popped. For |
| 4919 | scalar, the top argument (or &PL_sv_undef) is moved from the top to |
| 4920 | mark+1 and everything above is discarded. |
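
The three cases can be sketched with a toy model. The code below follows the prose description only and makes no claim to match the real function line for line; C<ToySV>, C<toy_gimme>, toy_popfree_to() and toy_context() are hypothetical names, and the C<undef> argument stands in for the immortal C<&PL_sv_undef>, whose count is deliberately left alone.

```c
#include <assert.h>

typedef struct { int refcnt; } ToySV;

enum toy_gimme { TOY_VOID, TOY_SCALAR, TOY_LIST };

/* Pop and release every item above `to`; returns the new stack pointer. */
ToySV **toy_popfree_to(ToySV **sp, ToySV **to)
{
    while (sp > to)
        (*sp--)->refcnt--;
    return sp;
}

/* Impose a context on the items between mark+1 and sp. */
ToySV **toy_context(ToySV **sp, ToySV **mark, enum toy_gimme gimme,
                    int extra, ToySV *undef)
{
    ToySV *top;

    assert(extra >= 0 && mark <= sp - extra);
    sp = toy_popfree_to(sp, sp - extra);    /* drop the extra items */

    if (gimme == TOY_LIST)
        return sp;                          /* keep every return value */
    if (gimme == TOY_VOID)
        return toy_popfree_to(sp, mark);    /* discard everything */

    /* Scalar context: exactly one item must end up at mark+1. */
    if (sp == mark) {
        *++sp = undef;                      /* nothing returned */
        return sp;
    }
    top = *sp;                              /* swap top and bottom ... */
    *sp = mark[1];
    mark[1] = top;
    return toy_popfree_to(sp, mark + 1);    /* ... and free the rest */
}
```

Swapping the top and bottom items before popping means the item that used to be at C<mark+1> is released along with the others, while the former top item survives with its count untouched.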
| 4921 | |
| 4922 | The macros which appear at the start of many PP functions to check for |
| 4923 | unary or binary op overloading (among other things) have been replaced |
| 4924 | with rpp_try_AMAGIC_1() and _2() inline functions, which now rely on the |
| 4925 | calling PP function to choose whether to return immediately rather than |
| 4926 | the return being hidden away in the macro. |
| 4927 | |
| 4928 | The rpp_invoke_xs() function calls the XS function associated with the CV, |
| 4929 | but may do so via a wrapper function to adjust the stack as necessary. |
| 4930 | |
| 4931 | In the spirit of hiding away less in macros, C<dATARGET> hasn't been given |
| 4932 | a replacement; where its effect is needed, it is now written out in full; |
| 4933 | see pp_add() for an example. |
| 4934 | |
| 4935 | Finally, a couple of rpp_() functions provide information rather than |
| 4936 | manipulate the stack. |
| 4937 | |
| 4938 | rpp_is_lone(sv) indicates whether C<sv>, assumed to be still on the stack, |
| 4939 | is kept alive only by a single reference-counted pointer from the argument |
| 4940 | and/or temps stacks, and thus is a candidate for some optimisations (like |
| 4941 | skipping the copying of return arguments from a subroutine call). |
| 4942 | |
| 4943 | rpp_stack_is_rc() indicates whether the current stack is currently |
| 4944 | reference-counted. It's used mainly in a few places like call_sv() which |
| 4945 | can be called from anywhere, and thus have to deal with both cases. |
| 4946 | |
| 4947 | So for example, rather than using rpp_xpush_1(), call_sv() has lines like: |
| 4948 | |
| 4949 | rpp_extend(1); |
| 4950 | *++PL_stack_sp = sv; |
| 4951 | #ifdef PERL_RC_STACK |
| 4952 | if (rpp_stack_is_rc()) |
| 4953 | SvREFCNT_inc_simple_void_NN(sv); |
| 4954 | #endif |
| 4955 | |
| 4956 | which works on both standard builds and RC builds, and works whether |
| 4957 | call_sv() is called from a standard PP function (rpp_stack_is_rc() is |
| 4958 | true) or from a wrapped PP or XS function (rpp_stack_is_rc() is false). |
| 4959 | Note that you're unlikely to need to use this function, as in most places, |
| 4960 | such as PP or XS functions, the stack is always RC or non-RC respectively. In |
| 4961 | fact on debugging builds under C<PERL_RC_STACK>, PUSHs() and similar |
| 4962 | macros include an C<assert(!rpp_stack_is_rc())>, while rpp_push_1() and |
| 4963 | similar functions have C<assert(rpp_stack_is_rc())>. |
| 4964 | |
| 4965 | The macros for pushing new stackinfos have been replaced with inline |
| 4966 | functions which don't rely on C<dSP> being in scope, and which have less |
| 4967 | ambiguous names: they make it clear that a new I<stackinfo> is being |
| 4968 | pushed, rather than just some sort of I<stack>. push_stackinfo() also has |
| 4969 | a boolean argument indicating whether the new argument stack should be |
| 4970 | reference-counted or not. For backwards compatibility, PUSHSTACKi(type) is |
| 4971 | defined to be push_stackinfo(type, 0). |
| 4972 | |
| 4973 | Some test scripts check for things like leaks by testing that the |
| 4974 | reference count of a particular variable has an expected value. If this |
| 4975 | is different on a perl built with C<PERL_RC_STACK>, then the perl |
| 4976 | function Internals::stack_refcounted() can be used. This returns an |
| 4977 | integer, the lowest bit of which indicates that perl was built with |
| 4978 | C<PERL_RC_STACK>. Other bits are reserved for future use and should be |
| 4979 | masked out. |
| 4980 | |
| 4981 | =head1 Slab-based operator allocation |
| 4982 | |
| 4983 | B<Note:> this section describes a non-public internal API that is subject |
| 4984 | to change without notice. |
| 4985 | |
| 4986 | Perl's internal error-handling mechanisms implement C<die> (and its internal |
| 4987 | equivalents) using longjmp. If this occurs during lexing, parsing or |
| 4988 | compilation, we must ensure that any ops allocated as part of the compilation |
| 4989 | process are freed. (Older Perl versions did not adequately handle this |
| 4990 | situation: when failing a parse, they would leak ops that were stored in |
| 4991 | C C<auto> variables and not linked anywhere else.) |
| 4992 | |
| 4993 | To handle this situation, Perl uses I<op slabs> that are attached to the |
| 4994 | currently-compiling CV. A slab is a chunk of allocated memory. New ops are |
| 4995 | allocated as regions of the slab. If the slab fills up, a new one is created |
| 4996 | (and linked from the previous one). When an error occurs and the CV is freed, |
| 4997 | any ops remaining are freed. |
| 4998 | |
| 4999 | Each op is preceded by two pointers: one points to the next op in the slab, and |
| 5000 | the other points to the slab that owns it. The next-op pointer is needed so |
| 5001 | that Perl can iterate over a slab and free all its ops. (Op structures are of |
| 5002 | different sizes, so the slab's ops can't merely be treated as a dense array.) |
| 5003 | The slab pointer is needed for accessing a reference count on the slab: when |
| 5004 | the last op on a slab is freed, the slab itself is freed. |
| 5005 | |
| 5006 | The slab allocator puts the ops at the end of the slab first. This will tend to |
| 5007 | allocate the leaves of the op tree first, and the layout will therefore |
| 5008 | hopefully be cache-friendly. In addition, this means that there's no need to |
| 5009 | store the size of the slab (see below on why slabs vary in size), because Perl |
| 5010 | can follow pointers to find the last op. |
| 5011 | |
| 5012 | It might seem possible to eliminate slab reference counts altogether, by having |
| 5013 | all ops implicitly attached to C<PL_compcv> when allocated and freed when the |
| 5014 | CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and |
| 5015 | thus free ops faster. But that doesn't work in those cases where ops need to |
| 5016 | survive beyond their CVs, such as re-evals. |
| 5017 | |
| 5018 | The CV also has to have a reference count on the slab. Sometimes the first op |
| 5019 | created is immediately freed. Without the CV's own count, the slab's reference |
| 5020 | count would reach 0 and it would be freed with the CV still pointing to it. |
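
The layout and reference counting described so far can be modelled in a few lines. This is a toy sketch, not perl's allocator: C<ToySlab>, C<ToySlot> and the C<toy_> functions are hypothetical names, sizes are counted in pointer-sized words, there is a single slab with no chaining, and the free-space counter is a simplification that the real slab does not need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToySlab ToySlab;

/* Two-pointer header placed in front of every toy op. */
typedef struct ToySlot {
    struct ToySlot *next;   /* the op allocated just before this one */
    ToySlab        *slab;   /* the slab that owns this op */
} ToySlot;

struct ToySlab {
    ToySlot *newest;        /* most recent slot, lowest address so far */
    size_t   free_words;    /* pointer-sized words not yet handed out */
    int      refcnt;        /* one per live op, plus one for the owner */
};

int toy_slabs_freed = 0;    /* lets a caller observe the final free */

ToySlab *toy_slab_new(size_t words)
{
    ToySlab *slab = malloc(sizeof *slab + words * sizeof(void *));
    if (!slab)
        return NULL;
    slab->newest = NULL;
    slab->free_words = words;
    slab->refcnt = 1;       /* held by the owner (the CV in the text) */
    return slab;
}

/* Drop one reference; the slab goes away with the last one. */
void toy_slab_release(ToySlab *slab)
{
    if (--slab->refcnt == 0) {
        free(slab);
        toy_slabs_freed++;
    }
}

/* Carve an op of op_words words from the END of the free region. */
void *toy_op_alloc(ToySlab *slab, size_t op_words)
{
    size_t need = op_words + sizeof(ToySlot) / sizeof(void *);
    void **base = (void **)(slab + 1);
    ToySlot *slot;

    if (need > slab->free_words)
        return NULL;        /* a real allocator would add a new slab */
    slab->free_words -= need;
    slot = (ToySlot *)(base + slab->free_words);
    slot->next = slab->newest;
    slot->slab = slab;
    slab->newest = slot;
    slab->refcnt++;
    return slot + 1;        /* the op body follows its header */
}

/* Freeing an op only needs the slab pointer stored in its header. */
void toy_op_free(void *op)
{
    ToySlot *slot = (ToySlot *)op - 1;
    toy_slab_release(slot->slab);
}

/* Walk the header chain; ops vary in size, so indexing won't do. */
int toy_slab_count(const ToySlab *slab)
{
    const ToySlot *s;
    int n = 0;
    for (s = slab->newest; s; s = s->next)
        n++;
    return n;
}
```

In this model the owner's reference is what keeps the slab alive if the first op is freed straight away. Note that toy_op_free() does not unlink the slot, so toy_slab_count() reports every slot allocated so far, not just the live ones.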
| 5021 | |
| 5022 | CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count |
| 5023 | on the slab. When this flag is set, the slab is accessible via C<CvSTART> when |
| 5024 | C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from |
| 5025 | C<CvROOT> when it is set. The alternative to this approach of sneaking the slab |
| 5026 | into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by |
| 5027 | another pointer. But that would make all CVs larger, even though slab-based op |
| 5028 | freeing is typically of benefit only for programs that make significant use of |
| 5029 | string eval. |
| 5030 | |
| 5031 | =for apidoc_section $concurrency |
| 5032 | =for apidoc Cmnh| |CVf_SLABBED |
| 5033 | =for apidoc_item |OP *|CvROOT|CV * sv |
| 5034 | =for apidoc_item |OP *|CvSTART|CV * sv |
| 5035 | |
| 5036 | When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing |
| 5037 | the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is |
| 5038 | assumed that a compilation error has occurred, so the op slab is traversed and |
| 5039 | all the ops are freed. |
| 5040 | |
| 5041 | Under normal circumstances, the CV forgets about its slab (decrementing the |
| 5042 | reference count) when the root is attached. So the slab reference counting that |
| 5043 | happens when ops are freed takes care of freeing the slab. In some cases, the |
| 5044 | CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the |
| 5045 | ops can survive after the CV is done away with. |
| 5046 | |
| 5047 | Forgetting the slab when the root is attached is not strictly necessary, but |
| 5048 | avoids potential problems with C<CvROOT> being written over. There is code all |
| 5049 | over the place, both in core and on CPAN, that does things with C<CvROOT>, so |
| 5050 | forgetting the slab makes things more robust. |
| 5051 | |
| 5052 | Since the CV takes ownership of its slab when flagged, that flag is never |
| 5053 | copied when a CV is cloned, as one CV could free a slab that another CV still |
| 5054 | points to, since forced freeing of ops ignores the reference count (but asserts |
| 5055 | that it looks right). |
| 5056 | |
| 5057 | To avoid slab fragmentation, freed ops are marked as freed and attached to the |
| 5058 | slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused |
| 5059 | when possible. Not reusing freed ops would be simpler, but it would result in |
| 5060 | significantly higher memory usage for programs with large C<if (DEBUG) {...}> |
| 5061 | blocks. |
| 5062 | |
| 5063 | C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause |
| 5064 | an op to be freed after its CV. If the CV has forcibly freed the ops on its |
| 5065 | slab and the slab itself, then we will be fiddling with a freed slab. Making |
| 5066 | C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when |
| 5067 | there is no compilation error, so the op would never be freed. It holds |
| 5068 | a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP> |
| 5069 | now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of |
| 5070 | ops after a compilation error won't free any ops thus marked. |
| 5071 | |
| 5072 | Since many pieces of code create tiny subroutines consisting of only a few ops, |
| 5073 | and since a huge slab would be quite a bit of baggage for those to carry |
| 5074 | around, the first slab is always very small. To avoid allocating too many |
| 5075 | slabs for a single CV, each subsequent slab is twice the size of the previous. |
| 5076 | |
| 5077 | Smartmatch expects to be able to allocate an op at run time, run it, and then |
| 5078 | throw it away. For that to work the op is simply malloced when C<PL_compcv> hasn't |
| 5079 | been set up. So all slab-allocated ops are marked as such (C<< ->op_slabbed >>), |
| 5080 | to distinguish them from malloced ops. |
| 5081 | |
| 5082 | |
| 5083 | =head1 AUTHORS |
| 5084 | |
| 5085 | Until May 1997, this document was maintained by Jeff Okamoto |
| 5086 | E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl |
| 5087 | itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>. |
| 5088 | |
| 5089 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
| 5090 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil |
| 5091 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, |
| 5092 | Stephen McCamant, and Gurusamy Sarathy. |
| 5093 | |
| 5094 | =head1 SEE ALSO |
| 5095 | |
| 5096 | L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed> |