Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
954c1994 | 3 | perlguts - Introduction to the Perl API |
a0d0e21e LW |
4 | |
5 | =head1 DESCRIPTION | |
6 | ||
b3b6085d | 7 | This document attempts to describe how to use the Perl API, as well as |
10e2eb10 FC |
8 | to provide some info on the basic workings of the Perl core. It is far |
9 | from complete and probably contains many errors. Please refer any | |
b3b6085d | 10 | questions or comments to the author below. |
a0d0e21e | 11 | |
0a753a76 | 12 | =head1 Variables |
13 | ||
5f05dabc | 14 | =head2 Datatypes |
a0d0e21e LW |
15 | |
16 | Perl has three typedefs that handle Perl's three main data types: | |
17 | ||
18 | SV Scalar Value | |
19 | AV Array Value | |
20 | HV Hash Value | |
21 | ||
d1b91892 | 22 | Each typedef has specific routines that manipulate the various data types. |
a0d0e21e LW |
23 | |
24 | =head2 What is an "IV"? | |
25 | ||
954c1994 | 26 | Perl uses a special typedef IV which is a simple signed integer type that is |
5f05dabc | 27 | guaranteed to be large enough to hold a pointer (as well as an integer). |
954c1994 | 28 | Additionally, there is the UV, which is simply an unsigned IV. |
a0d0e21e | 29 | |
d1b91892 | 30 | Perl also uses two special typedefs, I32 and I16, which will always be at |
10e2eb10 | 31 | least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16, |
20dbd849 NC |
32 | as well.) They will usually be exactly 32 and 16 bits long, but on Crays |
33 | they will both be 64 bits. | |
a0d0e21e | 34 | |
54310121 | 35 | =head2 Working with SVs |
a0d0e21e | 36 | |
20dbd849 NC |
37 | An SV can be created and loaded with one command. There are five types of |
38 | values that can be loaded: an integer value (IV), an unsigned integer | |
39 | value (UV), a double (NV), a string (PV), and another scalar (SV). | |
61984ee1 KW |
40 | ("PV" stands for "Pointer Value". You might think that it is misnamed |
41 | because it is described as pointing only to strings. However, it is | |
d6605d24 KW |
42 | possible to have it point to other things. For example, it could point |
43 | to an array of UVs. But, | |
61984ee1 KW |
44 | using it for non-strings requires care, as the underlying assumption of |
45 | much of the internals is that PVs are just for strings. Often, for | |
46 | example, a trailing NUL is tacked on automatically. The non-string use | |
47 | is documented only in this paragraph.) | |
a0d0e21e | 48 | |
20dbd849 | 49 | The seven routines are: |
a0d0e21e LW |
50 | |
51 | SV* newSViv(IV); | |
20dbd849 | 52 | SV* newSVuv(UV); |
a0d0e21e | 53 | SV* newSVnv(double); |
06f6df17 RGS |
54 | SV* newSVpv(const char*, STRLEN); |
55 | SV* newSVpvn(const char*, STRLEN); | |
46fc3d4c | 56 | SV* newSVpvf(const char*, ...); |
a0d0e21e LW |
57 | SV* newSVsv(SV*); |
58 | ||
06f6df17 RGS |
59 | C<STRLEN> is an integer type (Size_t, usually defined as size_t in |
60 | F<config.h>) guaranteed to be large enough to represent the size of | |
61 | any string that perl can handle. | |
62 | ||
3bf17896 | 63 | In the unlikely case of an SV requiring more complex initialization, you |
06f6df17 RGS |
64 | can create an empty SV with newSV(len). If C<len> is 0 an empty SV of |
65 | type NULL is returned, else an SV of type PV is returned with len + 1 (for | |
66 | the NUL) bytes of storage allocated, accessible via SvPVX. In both cases | |
da8c5729 | 67 | the SV has the undef value. |
20dbd849 | 68 | |
06f6df17 | 69 | SV *sv = newSV(0); /* no storage allocated */ |
a9b0660e KW |
70 | SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage |
71 | * allocated */ | |
20dbd849 | 72 | |
06f6df17 | 73 | To change the value of an I<already-existing> SV, there are eight routines: |
a0d0e21e LW |
74 | |
75 | void sv_setiv(SV*, IV); | |
deb3007b | 76 | void sv_setuv(SV*, UV); |
a0d0e21e | 77 | void sv_setnv(SV*, double); |
08105a92 | 78 | void sv_setpv(SV*, const char*); |
06f6df17 | 79 | void sv_setpvn(SV*, const char*, STRLEN); |
46fc3d4c | 80 | void sv_setpvf(SV*, const char*, ...); |
a9b0660e KW |
81 | void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, |
82 | SV **, I32, bool *); | |
a0d0e21e LW |
83 | void sv_setsv(SV*, SV*); |
84 | ||
85 | Notice that you can choose to specify the length of the string to be | |
9da1e3b5 MUN |
86 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
87 | allow Perl to calculate the length by using C<sv_setpv> or by specifying | |
88 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will | |
89 | determine the string's length by using C<strlen>, which depends on the | |
a9b0660e KW |
90 | string terminating with a NUL character, and not otherwise containing |
91 | NULs. | |
9abd00ed GS |
92 | |
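The C<strlen> pitfall is independent of Perl. The following plain C sketch (no Perl headers; the buffer contents and the helper name C<measured_length> are made up for illustration) shows how a length measured with C<strlen> falls short of the real length once the data contains an embedded NUL, which is the situation the explicit-length routines are meant for:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length a strlen-based routine would see. */
static size_t measured_length(const char *buf)
{
    return strlen(buf);   /* stops at the first NUL byte */
}

/* Seven bytes of payload with a NUL in the middle, plus the implicit
 * terminating NUL that the string literal adds. */
static const char payload[] = "abc\0def";

/* The real payload length, known from the array size rather than measured. */
static const size_t payload_len = sizeof(payload) - 1;
```

Here C<measured_length(payload)> sees only the bytes before the embedded NUL, while C<payload_len> covers the whole buffer. Data like this should be passed with its explicit length.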
93 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the | |
94 | formatted output becomes the value. | |
95 | ||
328bf373 | 96 | C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
9abd00ed GS |
97 | either a pointer to a variable argument list or the address and length of |
98 | an array of SVs. The last argument points to a boolean; on return, if that | |
99 | boolean is true, then locale-specific information has been used to format | |
c2611fb3 | 100 | the string, and the string's contents are therefore untrustworthy (see |
9abd00ed GS |
101 | L<perlsec>). This pointer may be NULL if that information is not |
102 | important. Note that this function requires you to specify the length of | |
103 | the format. | |
104 | ||
9da1e3b5 MUN |
105 | The C<sv_set*()> functions are not generic enough to operate on values |
106 | that have "magic". See L<Magic Virtual Tables> later in this document. | |
a0d0e21e | 107 | |
a3cb178b GS |
108 | All SVs that contain strings should be terminated with a NUL character. |
109 | If it is not NUL-terminated there is a risk of | |
5f05dabc | 110 | core dumps and corruptions from code which passes the string to C |
111 | functions or system calls which expect a NUL-terminated string. | |
112 | Perl's own functions typically add a trailing NUL for this reason. | |
113 | Nevertheless, you should be very careful when you pass a string stored | |
114 | in an SV to a C function or system call. | |
115 | ||
a0d0e21e LW |
116 | To access the actual value that an SV points to, you can use the macros: |
117 | ||
118 | SvIV(SV*) | |
954c1994 | 119 | SvUV(SV*) |
a0d0e21e LW |
120 | SvNV(SV*) |
121 | SvPV(SV*, STRLEN len) | |
1fa8b10d | 122 | SvPV_nolen(SV*) |
a0d0e21e | 123 | |
954c1994 | 124 | which will automatically coerce the actual scalar type into an IV, UV, double, |
a0d0e21e LW |
125 | or string. |
126 | ||
127 | In the C<SvPV> macro, the length of the string returned is placed into the | |
1fa8b10d JD |
128 | variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do |
129 | not care what the length of the data is, use the C<SvPV_nolen> macro. | |
130 | Historically the C<SvPV> macro with the global variable C<PL_na> has been | |
131 | used in this case. But that can be quite inefficient because C<PL_na> must | |
132 | be accessed in thread-local storage in threaded Perl. In any case, remember | |
133 | that Perl allows arbitrary strings of data that may both contain NULs and | |
134 | might not be terminated by a NUL. | |
a0d0e21e | 135 | |
ce2f5d8f | 136 | Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), |
10e2eb10 FC |
137 | len);>. It might work with your |
138 | compiler, but it won't work for everyone. | |
ce2f5d8f KA |
139 | Break this sort of statement up into separate assignments: |
140 | ||
1aa6ea50 JC |
141 | SV *s; |
142 | STRLEN len; | |
61955433 | 143 | char *ptr; |
1aa6ea50 JC |
144 | ptr = SvPV(s, len); |
145 | foo(ptr, len); | |
ce2f5d8f | 146 | |
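The reason for splitting the call can be seen without any Perl headers. In the sketch below, C<TOY_PV> is a made-up macro with the same shape as C<SvPV> (it yields the string and writes the length as a side effect), and C<foo> and C<call_safely> are hypothetical names; none of this is the real implementation:

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for SvPV: yields the string and stores its length in `len`
 * as a side effect. This is NOT the real macro, only the same shape. */
#define TOY_PV(str, len) ((len) = strlen(str), (str))

/* Hypothetical consumer that just reports the length it was given. */
static size_t foo(const char *ptr, size_t len)
{
    (void)ptr;
    return len;
}

static size_t call_safely(const char *s)
{
    size_t len = 0;
    const char *ptr;

    /* Unsafe: foo(TOY_PV(s, len), len) writes `len` in one argument and
     * reads it in another, and C does not order the two evaluations. */
    ptr = TOY_PV(s, len);   /* the side effect on len completes here */
    return foo(ptr, len);   /* len is now certain to be up to date   */
}
```

With the assignment on its own statement, the write to C<len> is finished before C<foo> is called, regardless of how the compiler orders argument evaluation.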
07fa94a1 | 147 | If you want to know if the scalar value is TRUE, you can use: |
a0d0e21e LW |
148 | |
149 | SvTRUE(SV*) | |
150 | ||
151 | Although Perl will automatically grow strings for you, if you need to force | |
152 | Perl to allocate more memory for your SV, you can use the macro | |
153 | ||
154 | SvGROW(SV*, STRLEN newlen) | |
155 | ||
156 | which will determine if more memory needs to be allocated. If so, it will | |
157 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not | |
5f05dabc | 158 | decrease, the allocated memory of an SV and that it does not automatically |
a9b0660e | 159 | add space for the trailing NUL byte (perl's own string functions typically do |
8ebc5c01 | 160 | C<SvGROW(sv, len + 1)>). |
a0d0e21e LW |
161 | |
162 | If you have an SV and want to know what kind of data Perl thinks is stored | |
163 | in it, you can use the following macros to check the type of SV you have. | |
164 | ||
165 | SvIOK(SV*) | |
166 | SvNOK(SV*) | |
167 | SvPOK(SV*) | |
168 | ||
169 | You can get and set the current length of the string stored in an SV with | |
170 | the following macros: | |
171 | ||
172 | SvCUR(SV*) | |
173 | SvCUR_set(SV*, I32 val) | |
174 | ||
cb1a09d0 AD |
175 | You can also get a pointer to the end of the string stored in the SV |
176 | with the macro: | |
177 | ||
178 | SvEND(SV*) | |
179 | ||
180 | But note that these last three macros are valid only if C<SvPOK()> is true. | |
a0d0e21e | 181 | |
d1b91892 AD |
182 | If you want to append something to the end of the string stored in an C<SV*>, |
183 | you can use the following functions: | |
184 | ||
08105a92 | 185 | void sv_catpv(SV*, const char*); |
e65f3abd | 186 | void sv_catpvn(SV*, const char*, STRLEN); |
46fc3d4c | 187 | void sv_catpvf(SV*, const char*, ...); |
a9b0660e KW |
188 | void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, |
189 | I32, bool *); | |
d1b91892 AD |
190 | void sv_catsv(SV*, SV*); |
191 | ||
192 | The first function calculates the length of the string to be appended by | |
193 | using C<strlen>. In the second, you specify the length of the string | |
46fc3d4c | 194 | yourself. The third function processes its arguments like C<sprintf> and |
9abd00ed GS |
195 | appends the formatted output. The fourth function works like C<vsprintf>. |
196 | You can specify the address and length of an array of SVs instead of the | |
10e2eb10 FC |
197 | va_list argument. The fifth function |
198 | extends the string stored in the first | |
9abd00ed GS |
199 | SV with the string stored in the second SV. It also forces the second SV |
200 | to be interpreted as a string. | |
201 | ||
202 | The C<sv_cat*()> functions are not generic enough to operate on values that | |
203 | have "magic". See L<Magic Virtual Tables> later in this document. | |
d1b91892 | 204 | |
a0d0e21e LW |
205 | If you know the name of a scalar variable, you can get a pointer to its SV |
206 | by using the following: | |
207 | ||
64ace3f8 | 208 | SV* get_sv("package::varname", 0); |
a0d0e21e LW |
209 | |
210 | This returns NULL if the variable does not exist. | |
211 | ||
d1b91892 | 212 | If you want to know if this variable (or any other SV) is actually C<defined>, |
a0d0e21e LW |
213 | you can call: |
214 | ||
215 | SvOK(SV*) | |
216 | ||
06f6df17 | 217 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. |
9adebda4 | 218 | |
10e2eb10 FC |
219 | Its address can be used whenever an C<SV*> is needed. Make sure that |
220 | you don't try to compare a random sv with C<&PL_sv_undef>. For example | |
9adebda4 SB |
221 | when interfacing Perl code, it'll work correctly for: |
222 | ||
223 | foo(undef); | |
224 | ||
225 | But won't work when called as: | |
226 | ||
227 | $x = undef; | |
228 | foo($x); | |
229 | ||
230 | So, to repeat: always use C<SvOK()> to check whether an SV is defined. | |
231 | ||
232 | Also you have to be careful when using C<&PL_sv_undef> as a value in | |
233 | AVs or HVs (see L<AVs, HVs and undefined values>). | |
a0d0e21e | 234 | |
06f6df17 RGS |
235 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain |
236 | boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their | |
237 | addresses can be used whenever an C<SV*> is needed. | |
a0d0e21e | 238 | |
9cde0e7f | 239 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
a0d0e21e LW |
240 | Take this code: |
241 | ||
242 | SV* sv = (SV*) 0; | |
243 | if (I-am-to-return-a-real-value) { | |
244 | sv = sv_2mortal(newSViv(42)); | |
245 | } | |
246 | sv_setsv(ST(0), sv); | |
247 | ||
248 | This code tries to return a new SV (which contains the value 42) if it should | |
04343c6d | 249 | return a real value, or undef otherwise. Instead it has returned a NULL |
a0d0e21e | 250 | pointer which, somewhere down the line, will cause a segmentation violation, |
06f6df17 RGS |
251 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the |
252 | first line and all will be well. | |
a0d0e21e LW |
253 | |
254 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this | |
3fe9a6f1 | 255 | call is not necessary (see L<Reference Counts and Mortality>). |
a0d0e21e | 256 | |
94dde4fb SC |
257 | =head2 Offsets |
258 | ||
259 | Perl provides the function C<sv_chop> to efficiently remove characters | |
260 | from the beginning of a string; you give it an SV and a pointer to | |
da75cd15 | 261 | somewhere inside the PV, and it discards everything before the |
10e2eb10 | 262 | pointer. The efficiency comes by means of a little hack: instead of |
94dde4fb SC |
263 | actually removing the characters, C<sv_chop> sets the flag C<OOK> |
264 | (offset OK) to signal to other functions that the offset hack is in | |
883bb8c0 KW |
265 | effect, and it moves the PV pointer (called C<SvPVX>) forward |
266 | by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN> | |
267 | accordingly. (A portion of the space between the old and new PV | |
268 | pointers is used to store the count of chopped bytes.) | |
94dde4fb SC |
269 | |
270 | Hence, at this point, the start of the buffer that we allocated lives | |
271 | at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing | |
272 | into the middle of this allocated storage. | |
273 | ||
274 | This is best demonstrated by example: | |
275 | ||
276 | % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)' | |
277 | SV = PVIV(0x8128450) at 0x81340f0 | |
278 | REFCNT = 1 | |
279 | FLAGS = (POK,OOK,pPOK) | |
280 | IV = 1 (OFFSET) | |
281 | PV = 0x8135781 ( "1" . ) "2345"\0 | |
282 | CUR = 4 | |
283 | LEN = 5 | |
284 | ||
285 | Here the number of bytes chopped off (1) is put into IV, and | |
10e2eb10 | 286 | C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The |
94dde4fb SC |
287 | portion of the string between the "real" and the "fake" beginnings is |
288 | shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect | |
289 | the fake beginning, not the real one. | |
290 | ||
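The bookkeeping behind the offset hack can be modelled in a few lines of plain C. This is only a toy model under stated assumptions: C<toy_sv>, C<toy_chop> and C<toy_real_start> are made-up names, the offset is kept in an ordinary struct field, and nothing here reflects Perl's real memory layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy string body; the fields only mirror the roles of SvPVX, SvCUR,
 * SvLEN and the stored offset. This is not Perl's real layout. */
typedef struct {
    char  *pv;      /* visible start of the string ("fake" beginning) */
    size_t cur;     /* visible string length, excluding the NUL       */
    size_t len;     /* allocated bytes counted from pv                */
    size_t offset;  /* bytes chopped off the front so far             */
} toy_sv;

/* Discard everything before `ptr`, which must point into the string. */
static void toy_chop(toy_sv *sv, char *ptr)
{
    size_t delta;

    assert(ptr >= sv->pv && (size_t)(ptr - sv->pv) <= sv->cur);
    delta = (size_t)(ptr - sv->pv);

    sv->pv      = ptr;      /* no bytes are moved */
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
}

/* The address that must eventually be handed back to the allocator. */
static char *toy_real_start(const toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Chopping one byte off a six-byte buffer holding C<"12345"> leaves C<cur> at 4, C<len> at 5 and C<offset> at 1 in this model, the same numbers as in the dump above.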
fe854a6f | 291 | Something similar to the offset hack is performed on AVs to enable |
319cef53 SC |
292 | efficient shifting and splicing off the beginning of the array; while |
293 | C<AvARRAY> points to the first element in the array that is visible from | |
10e2eb10 | 294 | Perl, C<AvALLOC> points to the real start of the C array. These are |
319cef53 | 295 | usually the same, but a C<shift> operation can be carried out by |
6de131f0 | 296 | increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>. |
319cef53 | 297 | Again, the location of the real start of the C array only comes into |
10e2eb10 | 298 | play when freeing the array. See C<av_shift> in F<av.c>. |
319cef53 | 299 | |
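The same idea for arrays, again as a plain C toy model rather than the code in F<av.c>: C<toy_av> and C<toy_shift> are made-up names, the elements are plain C<int>s instead of C<SV*>s, and the fields only mirror the roles of C<AvALLOC>, C<AvARRAY>, C<AvFILL> and C<AvMAX>.

```c
#include <stddef.h>

/* Toy array body; fields mirror the roles of AvALLOC, AvARRAY, AvFILL and
 * AvMAX, with int elements instead of SV pointers. */
typedef struct {
    int      *alloc;  /* real start of the C array (what gets freed) */
    int      *array;  /* first element visible from Perl             */
    ptrdiff_t fill;   /* highest visible index, -1 when empty        */
    ptrdiff_t max;    /* highest index that fits without realloc     */
} toy_av;

/* Remove and return the first element; returns `fallback` when empty. */
static int toy_shift(toy_av *av, int fallback)
{
    int first;

    if (av->fill < 0)
        return fallback;
    first = av->array[0];
    av->array++;   /* step over the element instead of moving the rest */
    av->fill--;
    av->max--;
    return first;
}
```

After a shift, C<alloc> still points at the original storage, so the whole block can be released later even though C<array> has moved on.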
d1b91892 | 300 | =head2 What's Really Stored in an SV? |
a0d0e21e LW |
301 | |
302 | Recall that the usual method of determining the type of scalar you have is | |
5f05dabc | 303 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
d1b91892 | 304 | these macros will usually return TRUE, and calling the C<Sv*V> |
a0d0e21e LW |
305 | macros will do the appropriate conversion of string to integer/double or |
306 | integer/double to string. | |
307 | ||
308 | If you I<really> need to know if you have an integer, double, or string | |
309 | pointer in an SV, you can use the following three macros instead: | |
310 | ||
311 | SvIOKp(SV*) | |
312 | SvNOKp(SV*) | |
313 | SvPOKp(SV*) | |
314 | ||
315 | These will tell you if you truly have an integer, double, or string pointer | |
d1b91892 | 316 | stored in your SV. The "p" stands for private. |
a0d0e21e | 317 | |
da8c5729 | 318 | There are various ways in which the private and public flags may differ. |
9090718a FC |
319 | For example, in perl 5.16 and earlier a tied SV may have a valid |
320 | underlying value in the IV slot (so SvIOKp is true), but the data | |
321 | should be accessed via the FETCH routine rather than directly, | |
322 | so SvIOK is false. (In perl 5.18 onwards, tied scalars use | |
323 | the flags the same way as untied scalars.) Another is when | |
d7f8936a | 324 | numeric conversion has occurred and precision has been lost: only the |
10e2eb10 | 325 | private flag is set on 'lossy' values. So when an NV is converted to an |
9e9796d6 JH |
326 | IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be. |
327 | ||
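The lossy-conversion case can be pictured with a toy model in plain C. The flag names below only echo C<SvIOK>, C<SvIOKp>, C<SvNOK> and C<SvNOKp>; the values, the C<toy_num_sv> struct and the C<toy_nv_to_iv> function are assumptions made for illustration and are not Perl's real definitions.

```c
/* Toy flag bits; not Perl's real flag values. */
enum {
    TOY_IOK  = 1 << 0,  /* public: integer slot is an exact value */
    TOY_IOKp = 1 << 1,  /* private: integer slot holds something  */
    TOY_NOK  = 1 << 2,  /* public: double slot is an exact value  */
    TOY_NOKp = 1 << 3   /* private: double slot holds something   */
};

typedef struct {
    long     iv;
    double   nv;
    unsigned flags;
} toy_num_sv;

/* Fill the integer slot from the double slot. Assumes nv fits in a long. */
static void toy_nv_to_iv(toy_num_sv *sv)
{
    sv->iv = (long)sv->nv;           /* truncates toward zero */
    sv->flags |= TOY_IOKp;           /* the slot is now populated */
    if ((double)sv->iv == sv->nv)    /* nothing lost: also set public */
        sv->flags |= TOY_IOK;
}
```

In this model, converting 3.7 sets only the private integer flag, while converting 4.0 sets both the private and the public one.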
07fa94a1 | 328 | In general, though, it's best to use the C<Sv*V> macros. |
a0d0e21e | 329 | |
54310121 | 330 | =head2 Working with AVs |
a0d0e21e | 331 | |
07fa94a1 JO |
332 | There are two ways to create and load an AV. The first method creates an |
333 | empty AV: | |
a0d0e21e LW |
334 | |
335 | AV* newAV(); | |
336 | ||
54310121 | 337 | The second method both creates the AV and initially populates it with SVs: |
a0d0e21e | 338 | |
c70927a6 | 339 | AV* av_make(SSize_t num, SV **ptr); |
a0d0e21e | 340 | |
5f05dabc | 341 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
54310121 | 342 | AV has been created, the SVs can be destroyed, if so desired. |
a0d0e21e | 343 | |
da8c5729 | 344 | Once the AV has been created, the following operations are possible on it: |
a0d0e21e LW |
345 | |
346 | void av_push(AV*, SV*); | |
347 | SV* av_pop(AV*); | |
348 | SV* av_shift(AV*); | |
c70927a6 | 349 | void av_unshift(AV*, SSize_t num); |
a0d0e21e LW |
350 | |
351 | These should be familiar operations, with the exception of C<av_unshift>. | |
352 | This routine adds C<num> elements at the front of the array with the C<undef> | |
353 | value. You must then use C<av_store> (described below) to assign values | |
354 | to these new elements. | |
355 | ||
356 | Here are some other functions: | |
357 | ||
c70927a6 FC |
358 | SSize_t av_top_index(AV*); |
359 | SV** av_fetch(AV*, SSize_t key, I32 lval); | |
360 | SV** av_store(AV*, SSize_t key, SV* val); | |
a0d0e21e | 361 | |
dab460cd | 362 | The C<av_top_index> function returns the highest index value in an array (just |
5f05dabc | 363 | like $#array in Perl). If the array is empty, -1 is returned. The |
364 | C<av_fetch> function returns the value at index C<key>, but if C<lval> | |
365 | is non-zero, then C<av_fetch> will store an undef value at that index. | |
04343c6d GS |
366 | The C<av_store> function stores the value C<val> at index C<key>, and does |
367 | not increment the reference count of C<val>. Thus the caller is responsible | |
368 | for taking care of that, and if C<av_store> returns NULL, the caller will | |
369 | have to decrement the reference count to avoid a memory leak. Note that | |
370 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their | |
371 | return value. | |
d1b91892 | 372 | |
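The ownership rule for a store that does not touch the reference count can be sketched in plain C. Everything here is hypothetical: C<toy_val>, C<toy_store> and C<store_shared> are invented names, and the fixed-size slot table merely stands in for an array that can refuse a store.

```c
#include <stddef.h>

/* Toy reference-counted value. */
typedef struct {
    int refcnt;
} toy_val;

/* Toy store into a fixed-size slot table: returns the slot address on
 * success, NULL on failure, and never touches val->refcnt itself. */
static toy_val **toy_store(toy_val **slots, int nslots, int key, toy_val *val)
{
    if (key < 0 || key >= nslots)
        return NULL;
    slots[key] = val;
    return &slots[key];
}

/* Caller-side pattern: take a share for the container, and give it back
 * if the store fails. Returns 1 on success and 0 on failure. */
static int store_shared(toy_val **slots, int nslots, int key, toy_val *val)
{
    val->refcnt++;                       /* share for the container */
    if (toy_store(slots, nslots, key, val) == NULL) {
        val->refcnt--;                   /* store failed: undo it   */
        return 0;
    }
    return 1;
}
```

The point of the pattern is that the count ends up unchanged when the store fails, so the value is neither leaked nor freed early.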
da8c5729 MH |
373 | A few more: |
374 | ||
a0d0e21e | 375 | void av_clear(AV*); |
a0d0e21e | 376 | void av_undef(AV*); |
c70927a6 | 377 | void av_extend(AV*, SSize_t key); |
5f05dabc | 378 | |
379 | The C<av_clear> function deletes all the elements in the AV* array, but | |
380 | does not actually delete the array itself. The C<av_undef> function will | |
381 | delete all the elements in the array plus the array itself. The | |
adc882cf GS |
382 | C<av_extend> function extends the array so that it contains at least C<key+1> |
383 | elements. If C<key+1> is less than the currently allocated length of the array, | |
384 | then nothing is done. | |
a0d0e21e LW |
385 | |
386 | If you know the name of an array variable, you can get a pointer to its AV | |
387 | by using the following: | |
388 | ||
cbfd0a87 | 389 | AV* get_av("package::varname", 0); |
a0d0e21e LW |
390 | |
391 | This returns NULL if the variable does not exist. | |
392 | ||
04343c6d GS |
393 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
394 | information on how to use the array access functions on tied arrays. | |
395 | ||
54310121 | 396 | =head2 Working with HVs |
a0d0e21e LW |
397 | |
398 | To create an HV, you use the following routine: | |
399 | ||
400 | HV* newHV(); | |
401 | ||
da8c5729 | 402 | Once the HV has been created, the following operations are possible on it: |
a0d0e21e | 403 | |
08105a92 GS |
404 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
405 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); | |
a0d0e21e | 406 | |
5f05dabc | 407 | The C<klen> parameter is the length of the key being passed in (Note that |
408 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the | |
409 | length of the key). The C<val> argument contains the SV pointer to the | |
54310121 | 410 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
5f05dabc | 411 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
412 | indicates whether this fetch is actually a part of a store operation, in | |
413 | which case a new undefined value will be added to the HV with the supplied | |
414 | key and C<hv_fetch> will return as if the value had already existed. | |
a0d0e21e | 415 | |
5f05dabc | 416 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
417 | C<SV*>. To access the scalar value, you must first dereference the return | |
418 | value. However, you should check to make sure that the return value is | |
419 | not NULL before dereferencing it. | |
a0d0e21e | 420 | |
da8c5729 MH |
421 | The first of these two functions checks if a hash table entry exists, and the |
422 | second deletes it. | |
a0d0e21e | 423 | |
08105a92 GS |
424 | bool hv_exists(HV*, const char* key, U32 klen); |
425 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); | |
a0d0e21e | 426 | |
5f05dabc | 427 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
428 | create and return a mortal copy of the deleted value. | |
429 | ||
a0d0e21e LW |
430 | And more miscellaneous functions: |
431 | ||
432 | void hv_clear(HV*); | |
a0d0e21e | 433 | void hv_undef(HV*); |
5f05dabc | 434 | |
435 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash | |
436 | table but does not actually delete the hash table. The C<hv_undef> deletes | |
437 | both the entries and the hash table itself. | |
a0d0e21e | 438 | |
a9b0660e | 439 | Perl keeps the actual data in a linked list of structures with a typedef of HE. |
d1b91892 AD |
440 | These contain the actual key and value pointers (plus extra administrative |
441 | overhead). The key is a string pointer; the value is an C<SV*>. However, | |
442 | once you have an C<HE*>, to get the actual key and value, use the routines | |
443 | specified below. | |
444 | ||
a0d0e21e LW |
445 | I32 hv_iterinit(HV*); |
446 | /* Prepares starting point to traverse hash table */ | |
447 | HE* hv_iternext(HV*); | |
448 | /* Get the next entry, and return a pointer to a | |
449 | structure that has both the key and value */ | |
450 | char* hv_iterkey(HE* entry, I32* retlen); | |
451 | /* Get the key from an HE structure and also return | |
452 | the length of the key string */ | |
cb1a09d0 | 453 | SV* hv_iterval(HV*, HE* entry); |
d1be9408 | 454 | /* Return an SV pointer to the value of the HE |
a0d0e21e | 455 | structure */ |
cb1a09d0 | 456 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
d1b91892 AD |
457 | /* This convenience routine combines hv_iternext, |
458 | hv_iterkey, and hv_iterval. The key and retlen | |
459 | arguments are return values for the key and its | |
460 | length. The value is the SV* return value */ |
a0d0e21e LW |
461 | |
462 | If you know the name of a hash variable, you can get a pointer to its HV | |
463 | by using the following: | |
464 | ||
6673a63c | 465 | HV* get_hv("package::varname", 0); |
a0d0e21e LW |
466 | |
467 | This returns NULL if the variable does not exist. | |
468 | ||
a43e7901 | 469 | The hash algorithm is defined in the C<PERL_HASH> macro: |
a0d0e21e | 470 | |
a43e7901 | 471 | PERL_HASH(hash, key, klen) |
ab192400 | 472 | |
a43e7901 YO |
473 | The exact implementation of this macro varies by architecture and version |
474 | of perl, and the return value may change per invocation, so the value | |
475 | is only valid for the duration of a single perl process. | |
a0d0e21e | 476 | |
04343c6d GS |
477 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
478 | information on how to use the hash access functions on tied hashes. | |
479 | ||
1e422769 | 480 | =head2 Hash API Extensions |
481 | ||
482 | Beginning with version 5.004, the following functions are also supported: | |
483 | ||
484 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); | |
485 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); | |
c47ff5f1 | 486 | |
1e422769 | 487 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
488 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); | |
c47ff5f1 | 489 | |
1e422769 | 490 | SV* hv_iterkeysv (HE* entry); |
491 | ||
492 | Note that these functions take C<SV*> keys, which simplifies writing | |
493 | of extension code that deals with hash structures. These functions | |
494 | also allow passing of C<SV*> keys to C<tie> functions without forcing | |
495 | you to stringify the keys (unlike the previous set of functions). | |
496 | ||
497 | They also return and accept whole hash entries (C<HE*>), making their | |
498 | use more efficient (since the hash number for a particular string | |
4a4eefd0 GS |
499 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
500 | descriptions. | |
1e422769 | 501 | |
502 | The following macros must always be used to access the contents of hash | |
503 | entries. Note that the arguments to these macros must be simple | |
504 | variables, since they may get evaluated more than once. See | |
4a4eefd0 | 505 | L<perlapi> for detailed descriptions of these macros. |
1e422769 | 506 | |
507 | HePV(HE* he, STRLEN len) | |
508 | HeVAL(HE* he) | |
509 | HeHASH(HE* he) | |
510 | HeSVKEY(HE* he) | |
511 | HeSVKEY_force(HE* he) | |
512 | HeSVKEY_set(HE* he, SV* sv) | |
513 | ||
514 | These two lower level macros are defined, but must only be used when | |
515 | dealing with keys that are not C<SV*>s: | |
516 | ||
517 | HeKEY(HE* he) | |
518 | HeKLEN(HE* he) | |
519 | ||
04343c6d GS |
520 | Note that both C<hv_store> and C<hv_store_ent> do not increment the |
521 | reference count of the stored C<val>, which is the caller's responsibility. | |
522 | If these functions return a NULL value, the caller will usually have to | |
523 | decrement the reference count of C<val> to avoid a memory leak. | |
1e422769 | 524 | |
a9381218 MHM |
525 | =head2 AVs, HVs and undefined values |
526 | ||
10e2eb10 FC |
527 | Sometimes you have to store undefined values in AVs or HVs. Although |
528 | this may be a rare case, it can be tricky. That's because you're | |
a9381218 MHM |
529 | used to using C<&PL_sv_undef> if you need an undefined SV. |
530 | ||
531 | For example, intuition tells you that this XS code: | |
532 | ||
533 | AV *av = newAV(); | |
534 | av_store( av, 0, &PL_sv_undef ); | |
535 | ||
536 | is equivalent to this Perl code: | |
537 | ||
538 | my @av; | |
539 | $av[0] = undef; | |
540 | ||
f3c4ec28 | 541 | Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker |
a9381218 MHM |
542 | for indicating that an array element has not yet been initialized. |
543 | Thus, C<exists $av[0]> would be true for the above Perl code, but | |
f3c4ec28 FC |
544 | false for the array generated by the XS code. In perl 5.20, storing |
545 | C<&PL_sv_undef> will create a read-only element, because the scalar |
546 | C<&PL_sv_undef> itself is stored, not a copy. |
a9381218 | 547 | |
f3c4ec28 | 548 | Similar problems can occur when storing C<&PL_sv_undef> in HVs: |
a9381218 MHM |
549 | |
550 | hv_store( hv, "key", 3, &PL_sv_undef, 0 ); | |
551 | ||
552 | This will indeed make the value C<undef>, but if you try to modify | |
553 | the value of C<key>, you'll get the following error: | |
554 | ||
555 | Modification of non-creatable hash value attempted | |
556 | ||
557 | In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders | |
10e2eb10 | 558 | in restricted hashes. This caused such hash entries not to appear |
a9381218 MHM |
559 | when iterating over the hash or when checking for the keys |
560 | with the C<hv_exists> function. | |
561 | ||
8abccac8 | 562 | You can run into similar problems when you store C<&PL_sv_yes> or |
10e2eb10 | 563 | C<&PL_sv_no> into AVs or HVs. Trying to modify such elements |
a9381218 MHM |
564 | will give you the following error: |
565 | ||
566 | Modification of a read-only value attempted | |
567 | ||
568 | To make a long story short, you can use the special variables | |
8abccac8 | 569 | C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and |
a9381218 MHM |
570 | HVs, but you have to make sure you know what you're doing. |
571 | ||
572 | Generally, if you want to store an undefined value in an AV | |
573 | or HV, you should not use C<&PL_sv_undef>, but rather create a | |
574 | new undefined value using the C<newSV> function, for example: | |
575 | ||
576 | av_store( av, 42, newSV(0) ); | |
577 | hv_store( hv, "foo", 3, newSV(0), 0 ); | |
578 | ||
a0d0e21e LW |
579 | =head2 References |
580 | ||
d1b91892 | 581 | References are a special type of scalar that point to other data types |
a9b0660e | 582 | (including other references). |
a0d0e21e | 583 | |
07fa94a1 | 584 | To create a reference, use either of the following functions: |
a0d0e21e | 585 | |
5f05dabc | 586 | SV* newRV_inc((SV*) thing); |
587 | SV* newRV_noinc((SV*) thing); | |
a0d0e21e | 588 | |
5f05dabc | 589 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
07fa94a1 JO |
590 | functions are identical except that C<newRV_inc> increments the reference |
591 | count of the C<thing>, while C<newRV_noinc> does not. For historical | |
592 | reasons, C<newRV> is a synonym for C<newRV_inc>. | |
593 | ||
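The difference between the two constructors is purely one of ownership, which a plain C toy model can show. The names C<toy_thing>, C<toy_rv>, C<toy_rv_inc> and C<toy_rv_noinc> are invented for this sketch and are not part of the Perl API.

```c
/* Toy reference-counted value; `refcnt` plays the role of SvREFCNT. */
typedef struct {
    int refcnt;
} toy_thing;

/* Toy reference: just a pointer to the thing it refers to. */
typedef struct {
    toy_thing *target;
} toy_rv;

/* Like newRV_inc: the reference takes its own share of ownership. */
static toy_rv toy_rv_inc(toy_thing *thing)
{
    toy_rv rv = { thing };
    thing->refcnt++;
    return rv;
}

/* Like newRV_noinc: the reference takes over the caller's existing share. */
static toy_rv toy_rv_noinc(toy_thing *thing)
{
    toy_rv rv = { thing };
    return rv;
}
```

In terms of this model, the C<noinc> form suits a freshly created thing that only the reference should own, and the C<inc> form suits a thing the caller keeps using independently.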
594 | Once you have a reference, you can use the following macro to dereference | |
595 | the reference: | |
a0d0e21e LW |
596 | |
597 | SvRV(SV*) | |
598 | ||
599 | then call the appropriate routines, casting the returned C<SV*> to either an | |
d1b91892 | 600 | C<AV*> or C<HV*>, if required. |
a0d0e21e | 601 | |
d1b91892 | 602 | To determine if an SV is a reference, you can use the following macro: |
a0d0e21e LW |
603 | |
604 | SvROK(SV*) | |
605 | ||
07fa94a1 JO |
606 | To discover what type of value the reference refers to, use the following |
607 | macro and then check the return value. | |
d1b91892 AD |
608 | |
609 | SvTYPE(SvRV(SV*)) | |
610 | ||
611 | The most useful types that will be returned are: | |
612 | ||
a5e62da0 FC |
613 | < SVt_PVAV Scalar |
614 | SVt_PVAV Array | |
615 | SVt_PVHV Hash | |
616 | SVt_PVCV Code | |
617 | SVt_PVGV Glob (possibly a file handle) | |
618 | ||
619 | See L<perlapi/svtype> for more details. | |
d1b91892 | 620 | |
cb1a09d0 AD |
621 | =head2 Blessed References and Class Objects |
622 | ||
06f6df17 | 623 | References are also used to support object-oriented programming. In perl's |
cb1a09d0 AD |
624 | OO lexicon, an object is simply a reference that has been blessed into a |
625 | package (or class). Once it has been blessed, the programmer may use the | |
626 | reference to access the various methods in the class. | |
627 | ||
628 | A reference can be blessed into a package with the following function: | |
629 | ||
630 | SV* sv_bless(SV* sv, HV* stash); | |
631 | ||
06f6df17 RGS |
632 | The C<sv> argument must be a reference value. The C<stash> argument |
633 | specifies which class the reference will belong to. See | |
2ae324a7 | 634 | L<Stashes and Globs> for information on converting class names into stashes. |
cb1a09d0 AD |
635 | |
636 | /* Still under construction */ | |
637 | ||
ddd2cc91 DM |
638 | The following function upgrades C<rv> to a reference if it is not already |
639 | one and creates a new SV for C<rv> to point to. If C<classname> is non-null, | |
640 | the new SV is blessed into the specified class. The new SV is returned. | |
cb1a09d0 | 641 | |
08105a92 | 642 | SV* newSVrv(SV* rv, const char* classname); |
cb1a09d0 | 643 | |
ddd2cc91 DM |
644 | The following three functions copy an integer, unsigned integer, or double |
645 | into an SV whose reference is C<rv>. The SV is blessed if C<classname> is | |
646 | non-null. | |
cb1a09d0 | 647 | |
08105a92 | 648 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
e1c57cef | 649 | SV* sv_setref_uv(SV* rv, const char* classname, UV uv); |
08105a92 | 650 | SV* sv_setref_nv(SV* rv, const char* classname, NV nv); |
cb1a09d0 | 651 | |
ddd2cc91 DM |
652 | The following function copies the pointer value (I<the address, not the |
653 | string!>) into an SV whose reference is C<rv>. The SV is blessed if | |
654 | C<classname> is non-null. | |
cb1a09d0 | 655 | |
ddd2cc91 | 656 | SV* sv_setref_pv(SV* rv, const char* classname, void* pv); |
cb1a09d0 | 657 | |
a9b0660e | 658 | The following function copies a string into an SV whose reference is C<rv>. |
ddd2cc91 DM |
659 | Set length to 0 to let Perl calculate the string length. SV is blessed if |
660 | C<classname> is non-null. | |
cb1a09d0 | 661 | |
a9b0660e KW |
662 | SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, |
663 | STRLEN length); | |
cb1a09d0 | 664 | |
ddd2cc91 DM |
665 | The following function tests whether the SV is blessed into the specified |
666 | class. It does not check inheritance relationships. | |
9abd00ed | 667 | |
08105a92 | 668 | int sv_isa(SV* sv, const char* name); |
9abd00ed | 669 | |
ddd2cc91 | 670 | The following function tests whether the SV is a reference to a blessed object. |
9abd00ed GS |
671 | |
672 | int sv_isobject(SV* sv); | |
673 | ||
ddd2cc91 | 674 | The following function tests whether the SV is derived from the specified |
10e2eb10 FC |
675 | class. SV can be either a reference to a blessed object or a string |
676 | containing a class name. This is the function implementing the | |
ddd2cc91 | 677 | C<UNIVERSAL::isa> functionality. |
9abd00ed | 678 | |
08105a92 | 679 | bool sv_derived_from(SV* sv, const char* name); |
9abd00ed | 680 | |
00aadd71 | 681 | To check if you've got an object derived from a specific class you have |
9abd00ed GS |
682 | to write: |
683 | ||
684 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } | |
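
The difference between C<sv_isa> and C<sv_derived_from> can be illustrated
without the Perl API. What follows is only a minimal sketch in plain C. It
assumes a hypothetical C<toy_class> record with a single parent, whereas real
Perl classes may list several parents in C<@ISA>. The names C<toy_isa> and
C<toy_derived_from> are illustrative and are not API functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-parent class record. Real Perl classes can list
 * several parents in @ISA; one parent keeps the sketch short. */
typedef struct toy_class {
    const char *name;
    const struct toy_class *parent;   /* NULL for a root class */
} toy_class;

/* Like sv_isa: true only when the object's own class has this name. */
static int toy_isa(const toy_class *cls, const char *name) {
    assert(name != NULL);
    return cls != NULL && strcmp(cls->name, name) == 0;
}

/* Like sv_derived_from: also true when any ancestor has this name. */
static int toy_derived_from(const toy_class *cls, const char *name) {
    assert(name != NULL);
    for (; cls != NULL; cls = cls->parent)
        if (strcmp(cls->name, name) == 0)
            return 1;
    return 0;
}
```

With a C<Dog> class whose parent is C<Animal>, the exact-match test accepts
only C<"Dog">, while the inheritance-aware test accepts both names.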
cb1a09d0 | 685 | |
5f05dabc | 686 | =head2 Creating New Variables |
cb1a09d0 | 687 | |
5f05dabc | 688 | To create a new Perl variable with an undef value which can be accessed from |
689 | your Perl script, use the following routines, depending on the variable type. | |
cb1a09d0 | 690 | |
64ace3f8 | 691 | SV* get_sv("package::varname", GV_ADD); |
cbfd0a87 | 692 | AV* get_av("package::varname", GV_ADD); |
6673a63c | 693 | HV* get_hv("package::varname", GV_ADD); |
cb1a09d0 | 694 | |
058a5f6c | 695 | Notice the use of GV_ADD as the second parameter. The new variable can now |
cb1a09d0 AD |
696 | be set, using the routines appropriate to the data type. |
697 | ||
5f05dabc | 698 | There are additional macros whose values may be bitwise OR'ed with the |
058a5f6c | 699 | C<GV_ADD> argument to enable certain extra features. Those bits are: |
cb1a09d0 | 700 | |
9a68f1db SB |
701 | =over |
702 | ||
703 | =item GV_ADDMULTI | |
704 | ||
705 | Marks the variable as multiply defined, thus preventing the: | |
706 | ||
707 | Name <varname> used only once: possible typo | |
708 | ||
709 | warning. | |
710 | ||
9a68f1db SB |
711 | =item GV_ADDWARN |
712 | ||
713 | Issues the warning: | |
714 | ||
715 | Had to create <varname> unexpectedly | |
716 | ||
717 | if the variable did not exist before the function was called. | |
718 | ||
719 | =back | |
cb1a09d0 | 720 | |
07fa94a1 JO |
721 | If you do not specify a package name, the variable is created in the current |
722 | package. | |
cb1a09d0 | 723 | |
5f05dabc | 724 | =head2 Reference Counts and Mortality |
a0d0e21e | 725 | |
10e2eb10 | 726 | Perl uses a reference count-driven garbage collection mechanism. SVs, |
54310121 | 727 | AVs, or HVs (xV for short in the following) start their life with a |
55497cff | 728 | reference count of 1. If the reference count of an xV ever drops to 0, |
07fa94a1 | 729 | then it will be destroyed and its memory made available for reuse. |
55497cff | 730 | |
731 | This normally doesn't happen at the Perl level unless a variable is | |
5f05dabc | 732 | undef'ed or the last variable holding a reference to it is changed or |
733 | overwritten. At the internal level, however, reference counts can be | |
55497cff | 734 | manipulated with the following macros: |
735 | ||
736 | int SvREFCNT(SV* sv); | |
5f05dabc | 737 | SV* SvREFCNT_inc(SV* sv); |
55497cff | 738 | void SvREFCNT_dec(SV* sv); |
739 | ||
740 | However, there is one other function which manipulates the reference | |
07fa94a1 JO |
741 | count of its argument. The C<newRV_inc> function, you will recall, |
742 | creates a reference to the specified argument. As a side effect, | |
743 | it increments the argument's reference count. If this is not what | |
744 | you want, use C<newRV_noinc> instead. | |
745 | ||
746 | For example, imagine you want to return a reference from an XSUB function. | |
747 | Inside the XSUB routine, you create an SV which initially has a reference | |
748 | count of one. Then you call C<newRV_inc>, passing it the just-created SV. | |
5f05dabc | 749 | This returns the reference as a new SV, but the reference count of the |
750 | SV you passed to C<newRV_inc> has been incremented to two. Now you | |
07fa94a1 JO |
751 | return the reference from the XSUB routine and forget about the SV. |
752 | But Perl hasn't! Whenever the returned reference is destroyed, the | |
753 | reference count of the original SV is decreased to one and nothing happens. | |
754 | The SV will hang around without any way to access it until Perl itself | |
755 | terminates. This is a memory leak. | |
5f05dabc | 756 | |
757 | The correct procedure, then, is to use C<newRV_noinc> instead of | |
faed5253 JO |
758 | C<newRV_inc>. Then, if and when the last reference is destroyed, |
759 | the reference count of the SV will go to zero and it will be destroyed, | |
07fa94a1 | 760 | stopping any memory leak. |
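
The counting in the XSUB scenario above can be traced with a toy model. This
is a sketch in plain C, not code that uses the Perl API: C<rc_sv>, C<rc_init>,
C<rc_newrv>, C<rc_dec> and C<rc_leaks> are hypothetical names that only mimic
the rules described in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV: a count, a "destroyed" marker, and an optional
 * referent. All names are hypothetical; this is not the Perl API. */
typedef struct rc_sv {
    int refcnt;
    int destroyed;
    struct rc_sv *referent;    /* non-NULL when this value is a reference */
} rc_sv;

/* Every new value starts life with a reference count of 1. */
static void rc_init(rc_sv *sv) {
    sv->refcnt = 1;
    sv->destroyed = 0;
    sv->referent = NULL;
}

/* Drop one count; at zero the value is destroyed and lets go of its
 * referent, if it has one. */
static void rc_dec(rc_sv *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        sv->destroyed = 1;
        if (sv->referent)
            rc_dec(sv->referent);
    }
}

/* Mimics newRV_inc (inc != 0) or newRV_noinc (inc == 0). */
static void rc_newrv(rc_sv *rv, rc_sv *thing, int inc) {
    rc_init(rv);
    rv->referent = thing;
    if (inc)
        thing->refcnt++;
}

/* Create a value, wrap it in a reference, hand the reference away and let
 * the caller destroy it. Returns 1 if the value was never destroyed. */
static int rc_leaks(int inc) {
    rc_sv thing, rv;
    rc_init(&thing);               /* count is 1 */
    rc_newrv(&rv, &thing, inc);    /* count is 2 if inc, otherwise still 1 */
    rc_dec(&rv);                   /* the last reference goes away */
    return !thing.destroyed;
}
```

In this model C<rc_leaks(1)> reports a leak because the value's count stops at
one, while C<rc_leaks(0)> does not.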
55497cff | 761 | |
5f05dabc | 762 | There are some convenience functions available that can help with the |
54310121 | 763 | destruction of xVs. These functions introduce the concept of "mortality". |
07fa94a1 JO |
764 | An xV that is mortal has had its reference count marked to be decremented, |
765 | but not actually decremented, until "a short time later". Generally the | |
766 | term "short time later" means a single Perl statement, such as a call to | |
54310121 | 767 | an XSUB function. The actual determinant for when mortal xVs have their |
07fa94a1 JO |
768 | reference count decremented depends on two macros, SAVETMPS and FREETMPS. |
769 | See L<perlcall> and L<perlxs> for more details on these macros. | |
55497cff | 770 | |
771 | "Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>. | |
772 | However, if you mortalize a variable twice, the reference count will | |
773 | later be decremented twice. | |
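
Reading mortalization as a deferred decrement can be made concrete with a
small model. The sketch below is plain C under stated assumptions:
C<mortal_sv>, C<mortal_defer> and C<mortal_free_tmps> are hypothetical names
standing in for an SV, C<sv_2mortal> and C<FREETMPS>. Perl's real temporaries
stack is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of mortality; none of this is the Perl API. */
typedef struct { int refcnt; } mortal_sv;

#define MORTAL_TMPS_MAX 16
static mortal_sv *mortal_tmps[MORTAL_TMPS_MAX];  /* deferred decrements */
static size_t mortal_tmps_count = 0;

/* Like sv_2mortal: leave the count alone now, remember to decrement later. */
static mortal_sv *mortal_defer(mortal_sv *sv) {
    assert(mortal_tmps_count < MORTAL_TMPS_MAX);
    mortal_tmps[mortal_tmps_count++] = sv;
    return sv;
}

/* Like FREETMPS: perform every decrement that was deferred. */
static void mortal_free_tmps(void) {
    while (mortal_tmps_count > 0) {
        mortal_sv *sv = mortal_tmps[--mortal_tmps_count];
        assert(sv->refcnt > 0);
        sv->refcnt--;
    }
}

/* Mortalize a value `times` times and report its count after cleanup. */
static int mortal_count_after(int start, int times) {
    mortal_sv sv = { start };
    for (int i = 0; i < times; i++)
        mortal_defer(&sv);
    /* Nothing has been decremented yet; that happens "a short time later". */
    assert(sv.refcnt == start);
    mortal_free_tmps();
    return sv.refcnt;
}
```

Starting from a count of two, mortalizing once leaves one and mortalizing
twice leaves zero, which is why double mortalization is dangerous.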
774 | ||
00aadd71 NIS |
775 | "Mortal" SVs are mainly used for SVs that are placed on perl's stack. |
776 | For example an SV which is created just to pass a number to a called sub | |
06f6df17 | 777 | is made mortal to have it cleaned up automatically when it's popped off |
10e2eb10 | 778 | the stack. Similarly, results returned by XSUBs (which are pushed on the |
06f6df17 | 779 | stack) are often made mortal. |
a0d0e21e LW |
780 | |
781 | To create a mortal variable, use the functions: | |
782 | ||
783 | SV* sv_newmortal() | |
784 | SV* sv_2mortal(SV*) | |
785 | SV* sv_mortalcopy(SV*) | |
786 | ||
00aadd71 | 787 | The first call creates a mortal SV (with no value), the second converts an existing |
5f05dabc | 788 | SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the |
789 | third creates a mortal copy of an existing SV. | |
da8c5729 | 790 | Because C<sv_newmortal> gives the new SV no value, it must normally be given one |
9a68f1db | 791 | via C<sv_setpv>, C<sv_setiv>, etc.: |
00aadd71 NIS |
792 | |
793 | SV *tmp = sv_newmortal(); | |
794 | sv_setiv(tmp, an_integer); | |
795 | ||
796 | As that takes multiple C statements, it is quite common to see this idiom instead: | |
797 | ||
798 | SV *tmp = sv_2mortal(newSViv(an_integer)); | |
799 | ||
800 | ||
801 | You should be careful about creating mortal variables. Strange things | |
802 | can happen if you make the same value mortal within multiple contexts, | |
10e2eb10 FC |
803 | or if you make a variable mortal multiple |
804 | times. Thinking of "Mortalization" | |
00aadd71 | 805 | as deferred C<SvREFCNT_dec> should help to minimize such problems. |
da8c5729 | 806 | For example if you are passing an SV which you I<know> has a high enough REFCNT |
00aadd71 NIS |
807 | to survive its use on the stack you need not do any mortalization. |
808 | If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or | |
809 | making a C<sv_mortalcopy> is safer. | |
a0d0e21e | 810 | |
ac036724 | 811 | The mortal routines are not just for SVs; AVs and HVs can be |
faed5253 | 812 | made mortal by passing their address (type-casted to C<SV*>) to the |
07fa94a1 | 813 | C<sv_2mortal> or C<sv_mortalcopy> routines. |
a0d0e21e | 814 | |
5f05dabc | 815 | =head2 Stashes and Globs |
a0d0e21e | 816 | |
06f6df17 RGS |
817 | A B<stash> is a hash that contains all variables that are defined |
818 | within a package. Each key of the stash is a symbol | |
aa689395 | 819 | name (shared by all the different types of objects that have the same |
820 | name), and each value in the hash table is a GV (Glob Value). This GV | |
821 | in turn contains references to the various objects of that name, | |
822 | including (but not limited to) the following: | |
cb1a09d0 | 823 | |
a0d0e21e LW |
824 | Scalar Value |
825 | Array Value | |
826 | Hash Value | |
a3cb178b | 827 | I/O Handle |
a0d0e21e LW |
828 | Format |
829 | Subroutine | |
830 | ||
06f6df17 RGS |
831 | There is a single stash called C<PL_defstash> that holds the items that exist |
832 | in the C<main> package. To get at the items in other packages, append the | |
833 | string "::" to the package name. The items in the C<Foo> package are in | |
834 | the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package are | |
835 | in the stash C<Baz::> in C<Bar::>'s stash. | |
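
The nesting rule can be spelled out as a walk over the package name. The
helper below is a hypothetical sketch in plain C, not an API function. It only
computes which stash keys would be looked up in turn, starting from the
C<main> stash, and does not touch any real symbol table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_MAX 64

/* Hypothetical helper, not part of the Perl API: split a package name into
 * the successive stash keys that would be looked up starting from the main
 * stash. "Bar::Baz" yields "Bar::" and then "Baz::". Returns the number of
 * keys written, or -1 if the name does not fit. */
static int toy_stash_keys(const char *package, char keys[][TOY_KEY_MAX],
                          int max_keys) {
    int n = 0;
    const char *p = package;
    assert(package != NULL && keys != NULL);
    while (*p) {
        const char *sep = strstr(p, "::");
        size_t len = sep ? (size_t)(sep - p) : strlen(p);
        if (n >= max_keys || len + 3 > TOY_KEY_MAX)
            return -1;
        memcpy(keys[n], p, len);
        memcpy(keys[n] + len, "::", 3);   /* append "::" and the NUL */
        n++;
        p = sep ? sep + 2 : p + len;
    }
    return n;
}
```

For C<"Bar::Baz"> the helper produces two keys: the first names an entry in
the C<main> stash, and the second names an entry in the stash found there.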
a0d0e21e | 836 | |
d1b91892 | 837 | To get the stash pointer for a particular package, use one of the functions: |
a0d0e21e | 838 | |
da51bb9b NC |
839 | HV* gv_stashpv(const char* name, I32 flags) |
840 | HV* gv_stashsv(SV*, I32 flags) | |
a0d0e21e LW |
841 | |
842 | The first function takes a literal string, the second uses the string stored | |
d1b91892 | 843 | in the SV. Remember that a stash is just a hash table, so you get back an |
da51bb9b | 844 | C<HV*>. The C<flags> argument will create a new package if it is set to C<GV_ADD>. |
a0d0e21e LW |
845 | |
846 | The name that C<gv_stash*v> wants is the name of the package whose symbol table | |
847 | you want. The default package is called C<main>. If you have multiply nested | |
d1b91892 AD |
848 | packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
849 | language itself. | |
a0d0e21e LW |
850 | |
851 | Alternately, if you have an SV that is a blessed reference, you can find | |
852 | out the stash pointer by using: | |
853 | ||
854 | HV* SvSTASH(SvRV(SV*)); | |
855 | ||
856 | then use the following to get the package name itself: | |
857 | ||
858 | char* HvNAME(HV* stash); | |
859 | ||
5f05dabc | 860 | If you need to bless or re-bless an object you can use the following |
861 | function: | |
a0d0e21e LW |
862 | |
863 | SV* sv_bless(SV*, HV* stash) | |
864 | ||
865 | where the first argument, an C<SV*>, must be a reference, and the second | |
866 | argument is a stash. The returned C<SV*> can now be used in the same way | |
867 | as any other SV. | |
868 | ||
d1b91892 AD |
869 | For more information on references and blessings, consult L<perlref>. |
870 | ||
54310121 | 871 | =head2 Double-Typed SVs |
0a753a76 | 872 | |
873 | Scalar variables normally contain only one type of value, an integer, | |
874 | double, pointer, or reference. Perl will automatically convert the | |
875 | actual scalar data from the stored type into the requested type. | |
876 | ||
877 | Some scalar variables contain more than one type of scalar data. For | |
878 | example, the variable C<$!> contains either the numeric value of C<errno> | |
879 | or its string equivalent from either C<strerror> or C<sys_errlist[]>. | |
880 | ||
881 | To force multiple data values into an SV, you must do two things: use the | |
882 | C<sv_set*v> routines to add the additional scalar type, then set a flag | |
883 | so that Perl will believe it contains more than one type of data. The | |
884 | four macros to set the flags are: | |
885 | ||
886 | SvIOK_on | |
887 | SvNOK_on | |
888 | SvPOK_on | |
889 | SvROK_on | |
890 | ||
891 | The particular macro you must use depends on which C<sv_set*v> routine | |
892 | you called first. This is because every C<sv_set*v> routine turns on | |
893 | only the bit for the particular type of data being set, and turns off | |
894 | all the rest. | |
895 | ||
896 | For example, to create a new Perl variable called "dberror" that contains | |
897 | both the numeric and descriptive string error values, you could use the | |
898 | following code: | |
899 | ||
900 | extern int dberror; | |
901 | extern char *dberror_list[]; | |
902 | ||
64ace3f8 | 903 | SV* sv = get_sv("dberror", GV_ADD); |
0a753a76 | 904 | sv_setiv(sv, (IV) dberror); |
905 | sv_setpv(sv, dberror_list[dberror]); | |
906 | SvIOK_on(sv); | |
907 | ||
908 | If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the | |
909 | macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. | |
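
Why the order of the calls decides which flag macro is needed can be seen in a
toy model of the flag handling. This is a sketch in plain C: the C<dual_sv>
struct, the C<DUAL_IOK> and C<DUAL_POK> bits and the functions are
hypothetical and do not reflect the real SV layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and value struct; not the real SV layout. */
enum { DUAL_IOK = 1, DUAL_POK = 2 };

typedef struct {
    unsigned flags;
    long iv;
    const char *pv;
} dual_sv;

/* Each setter stores its slot and leaves only its own flag switched on. */
static void dual_setiv(dual_sv *sv, long iv) {
    sv->iv = iv;
    sv->flags = DUAL_IOK;
}

static void dual_setpv(dual_sv *sv, const char *pv) {
    sv->pv = pv;
    sv->flags = DUAL_POK;
}

/* Counterpart of SvIOK_on: declare the integer slot valid again. */
static void dual_iok_on(dual_sv *sv) {
    sv->flags |= DUAL_IOK;
}

/* Same order as the dberror example: integer first, string second, then
 * re-enable the integer flag that the string setter cleared. */
static unsigned dual_make(dual_sv *sv, long num, const char *msg) {
    dual_setiv(sv, num);
    dual_setpv(sv, msg);
    assert(sv->flags == DUAL_POK);   /* the integer flag is off here */
    dual_iok_on(sv);
    return sv->flags;
}
```

The integer stored by the first setter is still in its slot after the second
setter runs. Only its flag was cleared, so turning the flag back on is enough.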
910 | ||
4f4531b8 FC |
911 | =head2 Read-Only Values |
912 | ||
913 | In Perl 5.16 and earlier, copy-on-write (see the next section) shared a | |
914 | flag bit with read-only scalars. So the only way to test whether | |
915 | C<sv_setsv>, etc., will raise a "Modification of a read-only value" error | |
916 | in those versions is: | |
917 | ||
918 | SvREADONLY(sv) && !SvIsCOW(sv) | |
919 | ||
920 | Under Perl 5.18 and later, SvREADONLY only applies to read-only variables, | |
921 | and, under 5.20, copy-on-write scalars can also be read-only, so the above | |
922 | check is incorrect. You just want: | |
923 | ||
924 | SvREADONLY(sv) | |
925 | ||
926 | If you need to do this check often, define your own macro like this: | |
927 | ||
928 | #if PERL_VERSION >= 18 | |
929 | # define SvTRULYREADONLY(sv) SvREADONLY(sv) | |
930 | #else | |
931 | # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv)) | |
932 | #endif | |
933 | ||
934 | =head2 Copy on Write | |
935 | ||
936 | Perl implements a copy-on-write (COW) mechanism for scalars, in which | |
937 | string copies are not immediately made when requested, but are deferred | |
938 | until made necessary by one or the other scalar changing. This is mostly | |
939 | transparent, but one must take care not to modify string buffers that are | |
940 | shared by multiple SVs. | |
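
The idea of deferring the copy can be sketched with a shared buffer and a
sharer count. The code below is a plain C model under stated assumptions:
C<cow_buf>, C<cow_str> and the C<cow_*> functions are hypothetical, and Perl's
real bookkeeping for shared buffers differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write model; Perl's real bookkeeping differs. */
typedef struct {
    char *data;
    int sharers;              /* how many scalars point at this buffer */
} cow_buf;

typedef struct { cow_buf *buf; } cow_str;

static cow_str cow_new(const char *s) {
    cow_str sv;
    sv.buf = malloc(sizeof *sv.buf);
    assert(sv.buf != NULL);
    sv.buf->data = malloc(strlen(s) + 1);
    assert(sv.buf->data != NULL);
    strcpy(sv.buf->data, s);
    sv.buf->sharers = 1;
    return sv;
}

/* "Copying" only shares the buffer; no bytes are duplicated yet. */
static cow_str cow_copy(cow_str src) {
    src.buf->sharers++;
    return src;
}

static int cow_is_shared(cow_str sv) { return sv.buf->sharers > 1; }

/* Take a private buffer if the current one is shared. */
static void cow_force_normal(cow_str *sv) {
    if (cow_is_shared(*sv)) {
        cow_buf *shared = sv->buf;
        *sv = cow_new(shared->data);
        shared->sharers--;
    }
}

/* Any in-place modification must un-share first (non-empty strings only). */
static void cow_set_first_char(cow_str *sv, char c) {
    cow_force_normal(sv);
    assert(sv->buf->data[0] != '\0');
    sv->buf->data[0] = c;
}

/* Release one scalar; the buffer goes away with its last sharer. */
static void cow_free(cow_str *sv) {
    if (--sv->buf->sharers == 0) {
        free(sv->buf->data);
        free(sv->buf);
    }
    sv->buf = NULL;
}
```

Writing through one scalar without the un-sharing step would silently change
the other, which is the kind of bug this section warns about.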
941 | ||
942 | You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>. | |
943 | ||
944 | You can force an SV to make its own copy of its string buffer by calling C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>. | |
945 | ||
946 | If you want to make the SV drop its string buffer, use | |
947 | C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply | |
948 | C<sv_setsv(sv, NULL)>. | |
949 | ||
950 | All of these functions will croak on read-only scalars (see the previous | |
951 | section for more on those). | |
952 | ||
953 | To test that your code is behaving correctly and not modifying COW buffers, | |
954 | on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with | |
955 | C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations | |
956 | into crashes. You will find it to be marvellously slow, so you may want to | |
957 | skip perl's own tests. | |
958 | ||
0a753a76 | 959 | =head2 Magic Variables |
a0d0e21e | 960 | |
d1b91892 AD |
961 | [This section still under construction. Ignore everything here. Post no |
962 | bills. Everything not permitted is forbidden.] | |
963 | ||
d1b91892 AD |
964 | Any SV may be magical, that is, it has special features that a normal |
965 | SV does not have. These features are stored in the SV structure in a | |
5f05dabc | 966 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
d1b91892 AD |
967 | |
968 | struct magic { | |
969 | MAGIC* mg_moremagic; | |
970 | MGVTBL* mg_virtual; | |
971 | U16 mg_private; | |
972 | char mg_type; | |
973 | U8 mg_flags; | |
b205eb13 | 974 | I32 mg_len; |
d1b91892 AD |
975 | SV* mg_obj; |
976 | char* mg_ptr; | |
d1b91892 AD |
977 | }; |
978 | ||
979 | Note this is current as of patchlevel 0, and could change at any time. | |
980 | ||
981 | =head2 Assigning Magic | |
982 | ||
983 | Perl adds magic to an SV using the C<sv_magic> function: | |
984 | ||
a9b0660e | 985 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
d1b91892 AD |
986 | |
987 | The C<sv> argument is a pointer to the SV that is to acquire a new magical | |
988 | feature. | |
989 | ||
990 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to | |
10e2eb10 FC |
991 | convert C<sv> to type C<SVt_PVMG>. |
992 | Perl then continues by adding new magic | |
645c22ef DM |
993 | to the beginning of the linked list of magical features. Any prior entry |
994 | of the same type of magic is deleted. Note that this can be overridden, | |
995 | and multiple instances of the same type of magic can be associated with an | |
996 | SV. | |
d1b91892 | 997 | |
54310121 | 998 | The C<name> and C<namlen> arguments are used to associate a string with |
10e2eb10 | 999 | the magic, typically the name of a variable. C<namlen> is stored in the |
2d8d5d5a SH |
1000 | C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of |
1001 | C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on | |
1002 | whether C<namlen> is greater than zero or equal to zero respectively. As a | |
1003 | special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed | |
1004 | to contain an C<SV*> and is stored as-is with its REFCNT incremented. | |
d1b91892 AD |
1005 | |
1006 | The C<sv_magic> function uses C<how> to determine which, if any, predefined | |
1007 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. | |
06f6df17 | 1008 | See the L<Magic Virtual Tables> section below. The C<how> argument is also |
10e2eb10 FC |
1009 | stored in the C<mg_type> field. The value of |
1010 | C<how> should be chosen from the set of macros | |
1011 | C<PERL_MAGIC_foo> found in F<perl.h>. Note that before | |
645c22ef | 1012 | these macros were added, Perl internals used to directly use character |
14befaf4 | 1013 | literals, so you may occasionally come across old code or documentation |
75d0f26d | 1014 | referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example. |
d1b91892 AD |
1015 | |
1016 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> | |
1017 | structure. If it is not the same as the C<sv> argument, the reference | |
1018 | count of the C<obj> object is incremented. If it is the same, or if | |
645c22ef | 1019 | the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer, |
14befaf4 | 1020 | then C<obj> is merely stored, without the reference count being incremented. |
d1b91892 | 1021 | |
2d8d5d5a SH |
1022 | See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic |
1023 | to an SV. | |
1024 | ||
cb1a09d0 AD |
1025 | There is also a function to add magic to an C<HV>: |
1026 | ||
1027 | void hv_magic(HV *hv, GV *gv, int how); | |
1028 | ||
1029 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. | |
1030 | ||
1031 | To remove the magic from an SV, call the function C<sv_unmagic>: | |
1032 | ||
70a53b35 | 1033 | int sv_unmagic(SV *sv, int type); |
cb1a09d0 AD |
1034 | |
1035 | The C<type> argument should be equal to the C<how> value when the C<SV> | |
1036 | was initially made magical. | |
1037 | ||
f6ee7b17 | 1038 | However, note that C<sv_unmagic> removes all magic of a certain C<type> from the |
10e2eb10 FC |
1039 | C<SV>. If you want to remove only certain |
1040 | magic of a C<type> based on the magic | |
f6ee7b17 FR |
1041 | virtual table, use C<sv_unmagicext> instead: |
1042 | ||
1043 | int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl); | |
1044 | ||
d1b91892 AD |
1045 | =head2 Magic Virtual Tables |
1046 | ||
d1be9408 | 1047 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an |
d1b91892 AD |
1048 | C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers |
1049 | that handle the various operations that might be applied to that | |
1050 | variable. | |
1051 | ||
301cb7e8 DM |
1052 | The C<MGVTBL> has five (or sometimes eight) pointers to the following |
1053 | routine types: | |
d1b91892 AD |
1054 | |
1055 | int (*svt_get)(SV* sv, MAGIC* mg); | |
1056 | int (*svt_set)(SV* sv, MAGIC* mg); | |
1057 | U32 (*svt_len)(SV* sv, MAGIC* mg); | |
1058 | int (*svt_clear)(SV* sv, MAGIC* mg); | |
1059 | int (*svt_free)(SV* sv, MAGIC* mg); | |
1060 | ||
a9b0660e KW |
1061 | int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv, |
1062 | const char *name, I32 namlen); | |
301cb7e8 DM |
1063 | int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param); |
1064 | int (*svt_local)(SV *nsv, MAGIC *mg); | |
1065 | ||
1066 | ||
06f6df17 | 1067 | This MGVTBL structure is set at compile-time in F<perl.h> and there are |
b7a0f54c S |
1068 | currently 32 types. These different structures contain pointers to various |
1069 | routines that perform additional actions depending on which function is | |
1070 | being called. | |
d1b91892 | 1071 | |
a9b0660e KW |
1072 | Function pointer Action taken |
1073 | ---------------- ------------ | |
1074 | svt_get Do something before the value of the SV is | |
1075 | retrieved. | |
1076 | svt_set Do something after the SV is assigned a value. | |
1077 | svt_len Report on the SV's length. | |
1078 | svt_clear Clear something the SV represents. | |
1079 | svt_free Free any extra storage associated with the SV. | |
d1b91892 | 1080 | |
a9b0660e KW |
1081 | svt_copy copy tied variable magic to a tied element |
1082 | svt_dup duplicate a magic structure during thread cloning | |
1083 | svt_local copy magic to local value during 'local' | |
301cb7e8 | 1084 | |
d1b91892 | 1085 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
14befaf4 | 1086 | to an C<mg_type> of C<PERL_MAGIC_sv>) contains: |
d1b91892 AD |
1087 | |
1088 | { magic_get, magic_set, magic_len, 0, 0 } | |
1089 | ||
14befaf4 DM |
1090 | Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>, |
1091 | if a get operation is being performed, the routine C<magic_get> is | |
1092 | called. All the various routines for the various magical types begin | |
1093 | with C<magic_>. NOTE: the magic routines are not considered part of | |
1094 | the Perl API, and may not be exported by the Perl library. | |
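
The dispatch through a vtable can be pictured with a miniature version of the
structures. This is a sketch in plain C under stated assumptions: the field
names echo C<MAGIC> and C<MGVTBL>, but C<toy_value>, C<toy_vtbl>,
C<toy_magic>, C<toy_fetch> and C<toy_store> are hypothetical, and the types
and calling convention are simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of MAGIC and MGVTBL; not the real definitions. */
struct toy_magic;

typedef struct toy_value {
    int value;
    struct toy_magic *magic;      /* head of the magic chain, or NULL */
} toy_value;

typedef struct toy_vtbl {
    int (*svt_get)(toy_value *sv, struct toy_magic *mg);
    int (*svt_set)(toy_value *sv, struct toy_magic *mg);
} toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *mg_moremagic;
    const toy_vtbl *mg_virtual;
    int mg_private;               /* here: counts how often 'get' fired */
} toy_magic;

/* A 'get' hook: runs before the value is retrieved. */
static int toy_count_get(toy_value *sv, toy_magic *mg) {
    (void)sv;
    mg->mg_private++;
    return 0;
}

/* Only the get slot is filled; an empty slot means "no special action". */
static const toy_vtbl toy_counting_vtbl = { toy_count_get, NULL };

/* Reading walks the chain and calls every non-NULL svt_get first. */
static int toy_fetch(toy_value *sv) {
    assert(sv != NULL);
    for (toy_magic *mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_get)
            mg->mg_virtual->svt_get(sv, mg);
    return sv->value;
}

/* Writing stores the value, then calls every non-NULL svt_set. */
static void toy_store(toy_value *sv, int value) {
    assert(sv != NULL);
    sv->value = value;
    for (toy_magic *mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_set)
            mg->mg_virtual->svt_set(sv, mg);
}
```

A value carrying C<toy_counting_vtbl> magic counts its reads but ignores
writes, because its set slot is empty.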
d1b91892 | 1095 | |
301cb7e8 DM |
1096 | The last three slots are a recent addition, and for source code |
1097 | compatibility they are only checked for if one of the three flags | |
10e2eb10 FC |
1098 | MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. |
1099 | This means that most code can continue declaring | |
1100 | a vtable as a 5-element value. These three are | |
301cb7e8 DM |
1101 | currently used exclusively by the threading code, and are highly subject |
1102 | to change. | |
1103 | ||
d1b91892 AD |
1104 | The current kinds of Magic Virtual Tables are: |
1105 | ||
f1f5ddd7 FC |
1106 | =for comment |
1107 | This table is generated by regen/mg_vtable.pl. Any changes made here | |
1108 | will be lost. | |
1109 | ||
1110 | =for mg_vtable.pl begin | |
1111 | ||
a9b0660e | 1112 | mg_type |
bd6e6c12 FC |
1113 | (old-style char and macro) MGVTBL Type of magic |
1114 | -------------------------- ------ ------------- | |
1115 | \0 PERL_MAGIC_sv vtbl_sv Special scalar variable | |
1116 | # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary) | |
1117 | % PERL_MAGIC_rhash (none) extra data for restricted | |
1118 | hashes | |
2a388207 | 1119 | & PERL_MAGIC_proto (none) my sub prototype CV |
bd6e6c12 FC |
1120 | . PERL_MAGIC_pos vtbl_pos pos() lvalue |
1121 | : PERL_MAGIC_symtab (none) extra data for symbol | |
1122 | tables | |
1123 | < PERL_MAGIC_backref vtbl_backref for weak ref data | |
1124 | @ PERL_MAGIC_arylen_p (none) to move arylen out of XPVAV | |
1125 | B PERL_MAGIC_bm vtbl_regexp Boyer-Moore | |
1126 | (fast string search) | |
1127 | c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table | |
1128 | (AMT) on stash | |
1129 | D PERL_MAGIC_regdata vtbl_regdata Regex match position data | |
1130 | (@+ and @- vars) | |
1131 | d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data | |
1132 | element | |
1133 | E PERL_MAGIC_env vtbl_env %ENV hash | |
1134 | e PERL_MAGIC_envelem vtbl_envelem %ENV hash element | |
eccba044 | 1135 | f PERL_MAGIC_fm vtbl_regexp Formline |
bd6e6c12 | 1136 | ('compiled' format) |
bd6e6c12 FC |
1137 | g PERL_MAGIC_regex_global vtbl_mglob m//g target |
1138 | H PERL_MAGIC_hints vtbl_hints %^H hash | |
1139 | h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element | |
1140 | I PERL_MAGIC_isa vtbl_isa @ISA array | |
1141 | i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element | |
1142 | k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue | |
1143 | L PERL_MAGIC_dbfile (none) Debugger %_<filename | |
1144 | l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename | |
1145 | element | |
1146 | N PERL_MAGIC_shared (none) Shared between threads | |
1147 | n PERL_MAGIC_shared_scalar (none) Shared between threads | |
1148 | o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation | |
1149 | P PERL_MAGIC_tied vtbl_pack Tied array or hash | |
1150 | p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element | |
1151 | q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle | |
1152 | r PERL_MAGIC_qr vtbl_regexp precompiled qr// regex | |
1153 | S PERL_MAGIC_sig (none) %SIG hash | |
1154 | s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element | |
1155 | t PERL_MAGIC_taint vtbl_taint Taintedness | |
1156 | U PERL_MAGIC_uvar vtbl_uvar Available for use by | |
1157 | extensions | |
1158 | u PERL_MAGIC_uvar_elem (none) Reserved for use by | |
1159 | extensions | |
4499db73 | 1160 | V PERL_MAGIC_vstring (none) SV was vstring literal |
bd6e6c12 FC |
1161 | v PERL_MAGIC_vec vtbl_vec vec() lvalue |
1162 | w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information | |
1163 | x PERL_MAGIC_substr vtbl_substr substr() lvalue | |
1164 | y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator | |
1165 | variable / smart parameter | |
1166 | vivification | |
1167 | ] PERL_MAGIC_checkcall vtbl_checkcall inlining/mutation of call | |
1168 | to this CV | |
1169 | ~ PERL_MAGIC_ext (none) Available for use by | |
1170 | extensions | |
0cbee0a4 | 1171 | |
f1f5ddd7 | 1172 | =for mg_vtable.pl end |
d1b91892 | 1173 | |
68dc0745 | 1174 | When an uppercase and lowercase letter both exist in the table, then the |
92f0c265 JP |
1175 | uppercase letter is typically used to represent some kind of composite type |
1176 | (a list or a hash), and the lowercase letter is used to represent an element | |
10e2eb10 | 1177 | of that composite type. Some internals code makes use of this case |
92f0c265 | 1178 | relationship. However, 'v' and 'V' (vec and v-string) are in no way related. |
14befaf4 DM |
1179 | |
1180 | The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined | |
1181 | specifically for use by extensions and will not be used by perl itself. | |
1182 | Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information | |
1183 | to variables (typically objects). This is especially useful because | |
1184 | there is no way for normal perl code to corrupt this private information | |
1185 | (unlike using extra elements of a hash object). | |
1186 | ||
1187 | Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a | |
1188 | C function any time a scalar's value is used or changed. The C<MAGIC>'s | |
bdbeb323 SM |
1189 | C<mg_ptr> field points to a C<ufuncs> structure: |
1190 | ||
1191 | struct ufuncs { | |
a9402793 AB |
1192 | I32 (*uf_val)(pTHX_ IV, SV*); |
1193 | I32 (*uf_set)(pTHX_ IV, SV*); | |
bdbeb323 SM |
1194 | IV uf_index; |
1195 | }; | |
1196 | ||
1197 | When the SV is read from or written to, the C<uf_val> or C<uf_set> | |
14befaf4 DM |
1198 | function will be called with C<uf_index> as the first arg and a pointer to |
1199 | the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar> | |
1526ead6 AB |
1200 | magic is shown below. Note that the ufuncs structure is copied by |
1201 | sv_magic, so you can safely allocate it on the stack. | |
1202 | ||
1203 | void | |
1204 | Umagic(sv) | |
1205 | SV *sv; | |
1206 | PREINIT: | |
1207 | struct ufuncs uf; | |
1208 | CODE: | |
1209 | uf.uf_val = &my_get_fn; | |
1210 | uf.uf_set = &my_set_fn; | |
1211 | uf.uf_index = 0; | |
14befaf4 | 1212 | sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf)); |
5f05dabc | 1213 | |
1e73acc8 AS |
1214 | Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect. |
1215 | ||
1216 | For hashes there is a specialized hook that gives control over hash | |
1217 | keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic | |
1218 | if the "set" function in the C<ufuncs> structure is NULL. The hook | |
1219 | is activated whenever the hash is accessed with a key specified as | |
1220 | an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>, | |
1221 | C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string | |
1222 | through the functions without the C<..._ent> suffix circumvents the | |
4509d391 | 1223 | hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description. |
1e73acc8 | 1224 | |
14befaf4 DM |
1225 | Note that because multiple extensions may be using C<PERL_MAGIC_ext> |
1226 | or C<PERL_MAGIC_uvar> magic, it is important for extensions to take | |
1227 | extra care to avoid conflict. Typically only using the magic on | |
1228 | objects blessed into the same class as the extension is sufficient. | |
2f07f21a FR |
1229 | For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an |
1230 | C<MGVTBL>, even if all its fields will be C<0>, so that individual | |
1231 | C<MAGIC> pointers can be identified as a particular kind of magic | |
10e2eb10 | 1232 | using their magic virtual table. C<mg_findext> provides an easy way |
f6ee7b17 | 1233 | to do that: |
2f07f21a FR |
1234 | |
1235 | STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 }; | |
1236 | ||
1237 | MAGIC *mg; | |
f6ee7b17 FR |
1238 | if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) { |
1239 | /* this is really ours, not another module's PERL_MAGIC_ext */ | |
1240 | my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr; | |
1241 | ... | |
2f07f21a | 1242 | } |
5f05dabc | 1243 | |
ef50df4b GS |
1244 | Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
1245 | earlier do B<not> invoke 'set' magic on their targets. This must | |
1246 | be done by the user either by calling the C<SvSETMAGIC()> macro after | |
1247 | calling these functions, or by using one of the C<sv_set*_mg()> or | |
1248 | C<sv_cat*_mg()> functions. Similarly, generic C code must call the | |
1249 | C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV | |
1250 | obtained from external sources in functions that don't handle magic. | |
4a4eefd0 | 1251 | See L<perlapi> for a description of these functions. |
189b2af5 GS |
1252 | For example, calls to the C<sv_cat*()> functions typically need to be |
1253 | followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> | |
1254 | since their implementation handles 'get' magic. | |
1255 | ||
d1b91892 AD |
1256 | =head2 Finding Magic |
1257 | ||
a9b0660e KW |
1258 | MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that |
1259 | * type */ | |
f6ee7b17 FR |
1260 | |
1261 | This routine returns a pointer to a C<MAGIC> structure stored in the SV. | |
10e2eb10 FC |
1262 | If the SV does not have that magical |
1263 | feature, C<NULL> is returned. If the | |
f6ee7b17 | 1264 | SV has multiple instances of that magical feature, the first one will be |
10e2eb10 FC |
1265 | returned. C<mg_findext> can be used |
1266 | to find a C<MAGIC> structure of an SV | |
da8c5729 | 1267 | based on both its magic type and its magic virtual table: |
f6ee7b17 FR |
1268 | |
1269 | MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); | |
d1b91892 | 1270 | |
f6ee7b17 FR |
1271 | Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type |
1272 | SVt_PVMG, Perl may core dump. | |
d1b91892 | 1273 | |
08105a92 | 1274 | int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
d1b91892 AD |
1275 | |
1276 | This routine checks to see what types of magic C<sv> has. If the mg_type | |
68dc0745 | 1277 | field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
1278 | the mg_type field is changed to be the lowercase letter. | |
a0d0e21e | 1279 | |
04343c6d GS |
1280 | =head2 Understanding the Magic of Tied Hashes and Arrays |
1281 | ||
14befaf4 DM |
1282 | Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> |
1283 | magic type. | |
9edb2b46 GS |
1284 | |
1285 | WARNING: As of the 5.004 release, proper usage of the array and hash | |
1286 | access functions requires understanding a few caveats. Some | |
1287 | of these caveats are actually considered bugs in the API, to be fixed | |
10e2eb10 | 1288 | in later releases, and are bracketed with [MAYCHANGE] below. If |
9edb2b46 GS |
1289 | you find yourself actually applying such information in this section, be |
1290 | aware that the behavior may change in the future, umm, without warning. | |
04343c6d | 1291 | |
1526ead6 | 1292 | The perl tie function associates a variable with an object that implements |
9a68f1db | 1293 | the various GET, SET, etc methods. To perform the equivalent of the perl |
1526ead6 AB |
1294 | tie function from an XSUB, you must mimic this behaviour. The code below |
1295 | carries out the necessary steps - firstly it creates a new hash, and then | |
1296 | creates a second hash which it blesses into the class which will implement | |
10e2eb10 | 1297 | the tie methods. Lastly it ties the two hashes together, and returns a |
1526ead6 AB |
1298 | reference to the new tied hash. Note that the code below does NOT call the |
1299 | TIEHASH method in the MyTie class - | |
1300 | see L<Calling Perl Routines from within C Programs> for details on how | |
1301 | to do this. | |
1302 | ||
1303 | SV* | |
1304 | mytie() | |
1305 | PREINIT: | |
1306 | HV *hash; | |
1307 | HV *stash; | |
1308 | SV *tie; | |
1309 | CODE: | |
1310 | hash = newHV(); | |
1311 | tie = newRV_noinc((SV*)newHV()); | |
da51bb9b | 1312 | stash = gv_stashpv("MyTie", GV_ADD); |
1526ead6 | 1313 | sv_bless(tie, stash); |
899e16d0 | 1314 | hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); |
1526ead6 AB |
1315 | RETVAL = newRV_noinc((SV*)hash); |
1316 | OUTPUT: | |
1317 | RETVAL | |
1318 | ||
04343c6d GS |
1319 | The C<av_store> function, when given a tied array argument, merely |
1320 | copies the magic of the array onto the value to be "stored", using | |
1321 | C<mg_copy>. It may also return NULL, indicating that the value did not | |
9edb2b46 GS |
1322 | actually need to be stored in the array. [MAYCHANGE] After a call to |
1323 | C<av_store> on a tied array, the caller will usually need to call | |
1324 | C<mg_set(val)> to actually invoke the perl level "STORE" method on the | |
1325 | TIEARRAY object. If C<av_store> did return NULL, a call to | |
1326 | C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory | |
1327 | leak. [/MAYCHANGE] | |
04343c6d GS |
1328 | |
1329 | The previous paragraph is applicable verbatim to tied hash access using the | |
1330 | C<hv_store> and C<hv_store_ent> functions as well. | |
1331 | ||
1332 | C<av_fetch> and the corresponding hash functions C<hv_fetch> and | |
1333 | C<hv_fetch_ent> actually return an undefined mortal value whose magic | |
1334 | has been initialized using C<mg_copy>. Note the value so returned does not | |
9edb2b46 GS |
1335 | need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
1336 | need to call C<mg_get()> on the returned value in order to actually invoke | |
1337 | the perl level "FETCH" method on the underlying TIE object. Similarly, | |
04343c6d GS |
1338 | you may also call C<mg_set()> on the return value after possibly assigning |
1339 | a suitable value to it using C<sv_setsv>, which will invoke the "STORE" | |
9edb2b46 | 1340 | method on the TIE object. [/MAYCHANGE] |
04343c6d | 1341 | |
9edb2b46 | 1342 | [MAYCHANGE] |
04343c6d GS |
1343 | In other words, the array or hash fetch/store functions don't really |
1344 | fetch and store actual values in the case of tied arrays and hashes. They | |
1345 | merely call C<mg_copy> to attach magic to the values that were meant to be | |
1346 | "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually | |
1347 | do the job of invoking the TIE methods on the underlying objects. Thus | |
9edb2b46 | 1348 | the magic mechanism currently implements a kind of lazy access to arrays |
04343c6d GS |
1349 | and hashes. |
1350 | ||
1351 | Currently (as of perl version 5.004), use of the hash and array access | |
1352 | functions requires the user to be aware of whether they are operating on | |
9edb2b46 GS |
1353 | "normal" hashes and arrays, or on their tied variants. The API may be |
1354 | changed to provide more transparent access to both tied and normal data | |
1355 | types in future versions. | |
1356 | [/MAYCHANGE] | |
04343c6d GS |
1357 | |
1358 | You would do well to understand that the TIEARRAY and TIEHASH interfaces | |
1359 | are mere sugar to invoke some perl method calls while using the uniform hash | |
1360 | and array syntax. The use of this sugar imposes some overhead (typically | |
1361 | about two to four extra opcodes per FETCH/STORE operation, in addition to | |
1362 | the creation of all the mortal variables required to invoke the methods). | |
1363 | This overhead will be comparatively small if the TIE methods are themselves | |
1364 | substantial, but if they are only a few statements long, the overhead | |
1365 | will not be insignificant. | |
1366 | ||
d1c897a1 IZ |
1367 | =head2 Localizing changes |
1368 | ||
1369 | Perl has a very handy construction | |
1370 | ||
1371 | { | |
1372 | local $var = 2; | |
1373 | ... | |
1374 | } | |
1375 | ||
1376 | This construction is I<approximately> equivalent to | |
1377 | ||
1378 | { | |
1379 | my $oldvar = $var; | |
1380 | $var = 2; | |
1381 | ... | |
1382 | $var = $oldvar; | |
1383 | } | |
1384 | ||
1385 | The biggest difference is that the first construction would | |
1386 | reinstate the initial value of $var, irrespective of how control exits | |
10e2eb10 | 1387 | the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit |
d1c897a1 IZ |
1388 | more efficient as well. |
1389 | ||
1390 | There is a way to achieve a similar task from C via Perl API: create a | |
1391 | I<pseudo-block>, and arrange for some changes to be automatically | |
1392 | undone at the end of it, either explicitly, or via a non-local exit (via | |
10e2eb10 | 1393 | die()). A I<block>-like construct is created by a pair of |
b687b08b TC |
1394 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
1395 | Such a construct may be created specially for some important localized | |
1396 | task, or an existing one (like boundaries of enclosing Perl | |
1397 | subroutine/block, or an existing pair for freeing TMPs) may be | |
10e2eb10 FC |
1398 | used. (In the second case the overhead of additional localization must |
1399 | be almost negligible.) Note that any XSUB is automatically enclosed in | |
b687b08b | 1400 | an C<ENTER>/C<LEAVE> pair. |
d1c897a1 IZ |
1401 | |
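The save-and-restore idea behind an C<ENTER>/C<LEAVE> pair can be sketched in plain C. This is a toy model only, not Perl's implementation: every name prefixed C<toy_> is hypothetical, and it handles only C<int> variables. It shows one plausible mechanism, where each "save" records an address and its old value, and leaving the pseudo-block replays those records newest first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and are NOT the Perl API. */
#define TOY_SAVESTACK_MAX 64

typedef struct {
    int *addr;   /* variable to restore */
    int  old;    /* value it had when it was saved */
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVESTACK_MAX];
static size_t toy_savestack_ix = 0;

/* Plays the role of ENTER: remember where this pseudo-block's entries start. */
static size_t toy_enter(void)
{
    return toy_savestack_ix;
}

/* Plays the role of SAVEINT(i): record the address and current value. */
static void toy_saveint(int *p)
{
    assert(toy_savestack_ix < TOY_SAVESTACK_MAX);
    toy_savestack[toy_savestack_ix].addr = p;
    toy_savestack[toy_savestack_ix].old  = *p;
    toy_savestack_ix++;
}

/* Plays the role of LEAVE: undo, newest first, everything saved since mark. */
static void toy_leave(size_t mark)
{
    while (toy_savestack_ix > mark) {
        toy_savestack_ix--;
        *toy_savestack[toy_savestack_ix].addr =
            toy_savestack[toy_savestack_ix].old;
    }
}

/* Demonstration: returns the value of var after the pseudo-block ends. */
static int toy_demo(void)
{
    int var = 1;
    size_t mark = toy_enter();

    toy_saveint(&var);
    var = 2;            /* the localized change */
    toy_leave(mark);    /* puts the saved value back */
    return var;
}
```

In this model C<toy_demo()> returns the original value, because C<toy_leave> restores it. What the sketch cannot show is the non-local exit case: in Perl the restoration also happens when control leaves via C<die>.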
1402 | Inside such a I<pseudo-block> the following services are available: | |
1403 | ||
13a2d996 | 1404 | =over 4 |
d1c897a1 IZ |
1405 | |
1406 | =item C<SAVEINT(int i)> | |
1407 | ||
1408 | =item C<SAVEIV(IV i)> | |
1409 | ||
1410 | =item C<SAVEI32(I32 i)> | |
1411 | ||
1412 | =item C<SAVELONG(long i)> | |
1413 | ||
1414 | These macros arrange things to restore the value of integer variable | |
1415 | C<i> at the end of enclosing I<pseudo-block>. | |
1416 | ||
1417 | =item C<SAVESPTR(s)> | |
1418 | ||
1419 | =item C<SAVEPPTR(p)> | |
1420 | ||
1421 | These macros arrange things to restore the value of pointers C<s> and | |
10e2eb10 | 1422 | C<p>. C<s> must be a pointer of a type which survives conversion to |
d1c897a1 IZ |
1423 | C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
1424 | and back. | |
1425 | ||
1426 | =item C<SAVEFREESV(SV *sv)> | |
1427 | ||
1428 | The refcount of C<sv> would be decremented at the end of | |
26d9b02f JH |
1429 | I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a |
1430 | mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> | |
1431 | extends the lifetime of C<sv> until the beginning of the next statement, | |
1432 | C<SAVEFREESV> extends it until the end of the enclosing scope. These | |
1433 | lifetimes can be wildly different. | |
1434 | ||
1435 | Also compare C<SAVEMORTALIZESV>. | |
1436 | ||
1437 | =item C<SAVEMORTALIZESV(SV *sv)> | |
1438 | ||
1439 | Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current | |
1440 | scope instead of decrementing its reference count. This usually has the | |
1441 | effect of keeping C<sv> alive until the statement that called the currently | |
1442 | live scope has finished executing. | |
d1c897a1 IZ |
1443 | |
1444 | =item C<SAVEFREEOP(OP *op)> | |
1445 | ||
1446 | The C<OP *> is op_free()ed at the end of I<pseudo-block>. | |
1447 | ||
1448 | =item C<SAVEFREEPV(p)> | |
1449 | ||
1450 | The chunk of memory which is pointed to by C<p> is Safefree()ed at the | |
1451 | end of I<pseudo-block>. | |
1452 | ||
1453 | =item C<SAVECLEARSV(SV *sv)> | |
1454 | ||
1455 | Clears a slot in the current scratchpad which corresponds to C<sv> at | |
1456 | the end of I<pseudo-block>. | |
1457 | ||
1458 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)> | |
1459 | ||
10e2eb10 | 1460 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
d1c897a1 IZ |
1461 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
1462 | short-lived storage, the corresponding string may be reallocated like | |
1463 | this: | |
1464 | ||
9cde0e7f | 1465 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
d1c897a1 | 1466 | |
c76ac1ee | 1467 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
d1c897a1 IZ |
1468 | |
1469 | At the end of I<pseudo-block> the function C<f> is called with the | |
c76ac1ee GS |
1470 | only argument C<p>. |
1471 | ||
1472 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> | |
1473 | ||
1474 | At the end of I<pseudo-block> the function C<f> is called with the | |
1475 | implicit context argument (if any), and C<p>. | |
d1c897a1 IZ |
1476 | |
1477 | =item C<SAVESTACK_POS()> | |
1478 | ||
1479 | The current offset on the Perl internal stack (cf. C<SP>) is restored | |
1480 | at the end of I<pseudo-block>. | |
1481 | ||
1482 | =back | |
1483 | ||
1484 | The following API list contains functions, thus one needs to | |
1485 | provide pointers to the modifiable data explicitly (either C pointers, | |
00aadd71 | 1486 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
d1c897a1 IZ |
1487 | function takes C<int *>. |
1488 | ||
13a2d996 | 1489 | =over 4 |
d1c897a1 IZ |
1490 | |
1491 | =item C<SV* save_scalar(GV *gv)> | |
1492 | ||
1493 | Equivalent to Perl code C<local $gv>. | |
1494 | ||
1495 | =item C<AV* save_ary(GV *gv)> | |
1496 | ||
1497 | =item C<HV* save_hash(GV *gv)> | |
1498 | ||
1499 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. | |
1500 | ||
1501 | =item C<void save_item(SV *item)> | |
1502 | ||
1503 | Duplicates the current value of C<SV>. On the exit from the current | |
1504 | C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored | |
10e2eb10 | 1505 | using the stored value. It doesn't handle magic. Use C<save_scalar> if |
038fcae3 | 1506 | magic is affected. |
d1c897a1 IZ |
1507 | |
1508 | =item C<void save_list(SV **sarg, I32 maxsarg)> | |
1509 | ||
1510 | A variant of C<save_item> which takes multiple arguments via an array | |
1511 | C<sarg> of C<SV*> of length C<maxsarg>. | |
1512 | ||
1513 | =item C<SV* save_svref(SV **sptr)> | |
1514 | ||
d1be9408 | 1515 | Similar to C<save_scalar>, but will reinstate an C<SV *>. |
d1c897a1 IZ |
1516 | |
1517 | =item C<void save_aptr(AV **aptr)> | |
1518 | ||
1519 | =item C<void save_hptr(HV **hptr)> | |
1520 | ||
1521 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. | |
1522 | ||
1523 | =back | |
1524 | ||
1525 | The C<Alias> module implements localization of the basic types within the | |
1526 | I<caller's scope>. People who are interested in how to localize things in | |
1527 | the containing scope should take a look there too. | |
1528 | ||
0a753a76 | 1529 | =head1 Subroutines |
a0d0e21e | 1530 | |
68dc0745 | 1531 | =head2 XSUBs and the Argument Stack |
5f05dabc | 1532 | |
1533 | The XSUB mechanism is a simple way for Perl programs to access C subroutines. | |
1534 | An XSUB routine will have a stack that contains the arguments from the Perl | |
1535 | program, and a way to map from the Perl data structures to a C equivalent. | |
1536 | ||
1537 | The stack arguments are accessible through the C<ST(n)> macro, which returns | |
1538 | the C<n>'th stack argument. Argument 0 is the first argument passed in the | |
1539 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere | |
1540 | an C<SV*> is used. | |
1541 | ||
1542 | Most of the time, output from the C routine can be handled through use of | |
1543 | the RETVAL and OUTPUT directives. However, there are some cases where the | |
1544 | argument stack is not already long enough to handle all the return values. | |
1545 | An example is the POSIX tzname() call, which takes no arguments, but returns | |
1546 | two, the local time zone's standard and summer time abbreviations. | |
1547 | ||
1548 | To handle this situation, the PPCODE directive is used and the stack is | |
1549 | extended using the macro: | |
1550 | ||
924508f0 | 1551 | EXTEND(SP, num); |
5f05dabc | 1552 | |
924508f0 GS |
1553 | where C<SP> is the macro that represents the local copy of the stack pointer, |
1554 | and C<num> is the number of elements the stack should be extended by. | |
5f05dabc | 1555 | |
00aadd71 | 1556 | Now that there is room on the stack, values can be pushed on it using the C<PUSHs> |
10e2eb10 | 1557 | macro. The pushed values will often need to be "mortal" (See |
d82b684c | 1558 | L</Reference Counts and Mortality>): |
5f05dabc | 1559 | |
00aadd71 | 1560 | PUSHs(sv_2mortal(newSViv(an_integer))) |
d82b684c SH |
1561 | PUSHs(sv_2mortal(newSVuv(an_unsigned_integer))) |
1562 | PUSHs(sv_2mortal(newSVnv(a_double))) | |
00aadd71 | 1563 | PUSHs(sv_2mortal(newSVpv("Some String",0))) |
a9b0660e KW |
1564 | /* Although the last example is better written as the more |
1565 | * efficient: */ | |
a3179684 | 1566 | PUSHs(newSVpvs_flags("Some String", SVs_TEMP)) |
5f05dabc | 1567 | |
1568 | Now, in the Perl program calling C<tzname>, the two values will be assigned | |
1569 | as in: | |
1570 | ||
1571 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname; | |
1572 | ||
1573 | An alternate (and possibly simpler) method to pushing values on the stack is | |
00aadd71 | 1574 | to use the macro: |
5f05dabc | 1575 | |
5f05dabc | 1576 | XPUSHs(SV*) |
1577 | ||
da8c5729 | 1578 | This macro automatically adjusts the stack for you, if needed. Thus, you |
5f05dabc | 1579 | do not need to call C<EXTEND> to extend the stack. |
00aadd71 NIS |
1580 | |
1581 | Despite what earlier versions of this document suggested, the macros | |
d82b684c SH |
1582 | C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results. |
1583 | For that, either stick to the C<(X)PUSHs> macros shown above, or use the new | |
1584 | C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>. | |
5f05dabc | 1585 | |
1586 | For more information, consult L<perlxs> and L<perlxstut>. | |
1587 | ||
5b36e945 FC |
1588 | =head2 Autoloading with XSUBs |
1589 | ||
1590 | If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the | |
1591 | fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable | |
1592 | of the XSUB's package. | |
1593 | ||
1594 | But it also puts the same information in certain fields of the XSUB itself: | |
1595 | ||
1596 | HV *stash = CvSTASH(cv); | |
1597 | const char *subname = SvPVX(cv); | |
1598 | STRLEN name_length = SvCUR(cv); /* in bytes */ | |
1599 | U32 is_utf8 = SvUTF8(cv); | |
f703fc96 | 1600 | |
5b36e945 | 1601 | C<SvPVX(cv)> contains just the sub name itself, not including the package. |
d8893903 FC |
1602 | For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, |
1603 | C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package. | |
5b36e945 FC |
1604 | |
1605 | B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support | |
1606 | XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the | |
1607 | XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need | |
1608 | to support 5.8-5.14, use the XSUB's fields. | |
1609 | ||
5f05dabc | 1610 | =head2 Calling Perl Routines from within C Programs |
a0d0e21e LW |
1611 | |
1612 | There are four routines that can be used to call a Perl subroutine from | |
1613 | within a C program. These four are: | |
1614 | ||
954c1994 GS |
1615 | I32 call_sv(SV*, I32); |
1616 | I32 call_pv(const char*, I32); | |
1617 | I32 call_method(const char*, I32); | |
5aaab254 | 1618 | I32 call_argv(const char*, I32, char**); |
a0d0e21e | 1619 | |
954c1994 | 1620 | The routine most often used is C<call_sv>. The C<SV*> argument |
d1b91892 AD |
1621 | contains either the name of the Perl subroutine to be called, or a |
1622 | reference to the subroutine. The second argument consists of flags | |
1623 | that control the context in which the subroutine is called, whether | |
1624 | or not the subroutine is being passed arguments, how errors should be | |
1625 | trapped, and how to treat return values. | |
a0d0e21e LW |
1626 | |
1627 | All four routines return the number of values that the subroutine returned | |
1628 | on the Perl stack. | |
1629 | ||
9a68f1db | 1630 | These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0, |
954c1994 GS |
1631 | but those names are now deprecated; macros of the same name are provided for |
1632 | compatibility. | |
1633 | ||
1634 | When using any of these routines (except C<call_argv>), the programmer | |
d1b91892 AD |
1635 | must manipulate the Perl stack. The tools for this include the following macros and |
1636 | functions: | |
a0d0e21e LW |
1637 | |
1638 | dSP | |
924508f0 | 1639 | SP |
a0d0e21e LW |
1640 | PUSHMARK() |
1641 | PUTBACK | |
1642 | SPAGAIN | |
1643 | ENTER | |
1644 | SAVETMPS | |
1645 | FREETMPS | |
1646 | LEAVE | |
1647 | XPUSH*() | |
cb1a09d0 | 1648 | POP*() |
a0d0e21e | 1649 | |
5f05dabc | 1650 | For a detailed description of calling conventions from C to Perl, |
1651 | consult L<perlcall>. | |
a0d0e21e | 1652 | |
8ebc5c01 | 1653 | =head2 Putting a C value on Perl stack |
ce3d39e2 IZ |
1654 | |
1655 | A lot of opcodes (an opcode is an elementary operation in the internal perl | |
10e2eb10 FC |
1656 | stack machine) put an SV* on the stack. However, as an optimization |
1657 | the corresponding SV is (usually) not recreated each time. The opcodes | |
ce3d39e2 IZ |
1658 | reuse specially assigned SVs (I<target>s) which are (as a corollary) |
1659 | not constantly freed/created. | |
1660 | ||
0a753a76 | 1661 | Each of the targets is created only once (but see |
ce3d39e2 IZ |
1662 | L<Scratchpads and recursion> below), and when an opcode needs to put |
1663 | an integer, a double, or a string on stack, it just sets the | |
1664 | corresponding parts of its I<target> and puts the I<target> on stack. | |
1665 | ||
1666 | The macro to put this target on stack is C<PUSHTARG>, and it is | |
1667 | directly used in some opcodes, as well as indirectly in zillions of | |
d82b684c | 1668 | others, which use it via C<(X)PUSH[iunp]>. |
ce3d39e2 | 1669 | |
1bd1c0d5 | 1670 | Because the target is reused, you must be careful when pushing multiple |
10e2eb10 | 1671 | values on the stack. The following code will not do what you think: |
1bd1c0d5 SC |
1672 | |
1673 | XPUSHi(10); | |
1674 | XPUSHi(20); | |
1675 | ||
1676 | This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto | |
1677 | the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". | |
1678 | At the end of the operation, the stack does not contain the values 10 | |
1679 | and 20, but actually contains two pointers to C<TARG>, which we have set | |
d82b684c | 1680 | to 20. |
1bd1c0d5 | 1681 | |
d82b684c SH |
1682 | If you need to push multiple different values then you should either use |
1683 | the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros, | |
1684 | none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an | |
1685 | SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>, | |
1686 | will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make | |
1687 | this a little easier to achieve by creating a new mortal for you (via | |
1688 | C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary | |
1689 | in the case of the C<mXPUSH[iunp]> macros), and then setting its value. | |
1690 | Thus, instead of writing this to "fix" the example above: | |
1691 | ||
1692 | XPUSHs(sv_2mortal(newSViv(10))) | |
1693 | XPUSHs(sv_2mortal(newSViv(20))) | |
1694 | ||
1695 | you can simply write: | |
1696 | ||
1697 | mXPUSHi(10) | |
1698 | mXPUSHi(20) | |
1699 | ||
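The difference between pushing a shared target and pushing a fresh value can be modelled in plain C. The sketch below is a toy, not the Perl API: all C<toy_> names are hypothetical, the "SV" is a struct holding one integer, and "mortals" are simply slots in a fixed array. It is meant only to show why a stack of pointers to one reused object ends up holding the last value twice.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not the Perl API. */
typedef struct { int iv; } toy_sv;

static toy_sv  toy_targ;          /* the single reusable target */
static toy_sv  toy_mortals[8];    /* storage for fresh "mortal" values */
static int     toy_mortal_ix = 0;
static toy_sv *toy_stack[8];      /* the stack holds pointers, not values */
static int     toy_sp = 0;

/* Models XPUSHi: set the shared target, then push a pointer to it. */
static void toy_xpushi(int i)
{
    assert(toy_sp < 8);
    toy_targ.iv = i;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Models mXPUSHi: create a fresh value, then push a pointer to that. */
static void toy_mxpushi(int i)
{
    assert(toy_sp < 8 && toy_mortal_ix < 8);
    toy_mortals[toy_mortal_ix].iv = i;
    toy_stack[toy_sp++] = &toy_mortals[toy_mortal_ix++];
}

/* Empty the model stack and forget the fresh values. */
static void toy_reset(void)
{
    toy_sp = 0;
    toy_mortal_ix = 0;
}

/* The integer seen through stack slot n. */
static int toy_peek(int n)
{
    assert(n >= 0 && n < toy_sp);
    return toy_stack[n]->iv;
}
```

After C<toy_xpushi(10); toy_xpushi(20);> both slots read back the same integer, since both point at C<toy_targ>. After C<toy_mxpushi(10); toy_mxpushi(20);> each slot points at its own value.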
1700 | On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to | |
1bd1c0d5 | 1701 | need a C<dTARG> in your variable declarations so that the C<*PUSH*> |
d82b684c SH |
1702 | macros can make use of the local variable C<TARG>. See also C<dTARGET> |
1703 | and C<dXSTARG>. | |
1bd1c0d5 | 1704 | |
8ebc5c01 | 1705 | =head2 Scratchpads |
ce3d39e2 | 1706 | |
54310121 | 1707 | The question remains of when the SVs which are I<target>s for opcodes |
10e2eb10 | 1708 | are created. The answer is that they are created when the current |
ac036724 | 1709 | unit--a subroutine or a file (for opcodes for statements outside of |
10e2eb10 | 1710 | subroutines)--is compiled. During this time a special anonymous Perl |
ac036724 | 1711 | array is created, which is called a scratchpad for the current unit. |
ce3d39e2 | 1712 | |
54310121 | 1713 | A scratchpad keeps SVs which are lexicals for the current unit and are |
d777b41a FC |
1714 | targets for opcodes. A previous version of this document |
1715 | stated that one can deduce that an SV lives on a scratchpad | |
ce3d39e2 | 1716 | by looking at its flags: lexicals have C<SVs_PADMY> set, and |
d777b41a FC |
1717 | I<target>s have C<SVs_PADTMP> set. But this has never been fully true. |
1718 | C<SVs_PADMY> could be set on a variable that no longer resides in any pad. | |
1719 | While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables | |
1720 | that have never resided in a pad, but nonetheless act like I<target>s. | |
ce3d39e2 | 1721 | |
10e2eb10 | 1722 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
54310121 | 1723 | OPs in the compile tree of the unit can use the same target, if this |
ce3d39e2 IZ |
1724 | would not conflict with the expected life of the temporary. |
1725 | ||
2ae324a7 | 1726 | =head2 Scratchpads and recursion |
ce3d39e2 IZ |
1727 | |
1728 | In fact it is not 100% true that a compiled unit contains a pointer to | |
10e2eb10 FC |
1729 | the scratchpad AV. Rather, it contains a pointer to an AV of |
1730 | (initially) one element, and this element is the scratchpad AV. Why do | |
ce3d39e2 IZ |
1731 | we need an extra level of indirection? |
1732 | ||
10e2eb10 | 1733 | The answer is B<recursion>, and maybe B<threads>. Both of |
ce3d39e2 | 1734 | these can create several execution pointers going into the same |
10e2eb10 | 1735 | subroutine. For the subroutine-child not to write over the temporaries |
ce3d39e2 IZ |
1736 | for the subroutine-parent (lifespan of which covers the call to the |
1737 | child), the parent and the child should have different | |
10e2eb10 | 1738 | scratchpads. (I<And> the lexicals should be separate anyway!) |
ce3d39e2 | 1739 | |
5f05dabc | 1740 | So each subroutine is born with an array of scratchpads (of length 1). |
1741 | On each entry to the subroutine it is checked that the current | |
ce3d39e2 IZ |
1742 | depth of the recursion is not more than the length of this array, and |
1743 | if it is, a new scratchpad is created and pushed into the array. | |
1744 | ||
1745 | The I<target>s on this scratchpad are C<undef>s, but they are already | |
1746 | marked with correct flags. | |
1747 | ||
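The "array of scratchpads, grown on demand" scheme can be sketched in plain C. This is a toy model under stated assumptions, not Perl's padlist code: the C<toy_> names are hypothetical, a scratchpad is just an array of C<int> slots, and a pad created for a given depth is kept and reused on later calls that reach that depth.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not Perl's padlist implementation. */
typedef struct {
    int  **pads;     /* pads[d - 1] is the scratchpad for depth d */
    size_t npads;    /* how many scratchpads exist so far */
    size_t depth;    /* current recursion depth, 0 when not running */
    size_t padsize;  /* number of slots in each scratchpad */
} toy_sub;

/* A subroutine is "born" with an array holding one scratchpad. */
static int toy_sub_init(toy_sub *s, size_t padsize)
{
    assert(padsize > 0);
    s->pads = malloc(sizeof *s->pads);
    if (!s->pads)
        return -1;
    s->pads[0] = calloc(padsize, sizeof **s->pads);
    if (!s->pads[0]) {
        free(s->pads);
        return -1;
    }
    s->npads = 1;
    s->depth = 0;
    s->padsize = padsize;
    return 0;
}

/* On entry: if we are deeper than ever before, create and push a new pad.
 * Returns the pad for the new depth, or NULL if allocation failed. */
static int *toy_sub_enter(toy_sub *s)
{
    s->depth++;
    if (s->depth > s->npads) {
        int  *pad   = calloc(s->padsize, sizeof *pad);
        int **grown = pad ? realloc(s->pads, s->depth * sizeof *grown) : NULL;
        if (!grown) {
            free(pad);
            s->depth--;
            return NULL;
        }
        s->pads = grown;
        s->pads[s->npads++] = pad;
    }
    return s->pads[s->depth - 1];
}

/* On exit: the pad stays in the array for reuse by later calls. */
static void toy_sub_leave(toy_sub *s)
{
    assert(s->depth > 0);
    s->depth--;
}

static void toy_sub_free(toy_sub *s)
{
    size_t i;
    for (i = 0; i < s->npads; i++)
        free(s->pads[i]);
    free(s->pads);
}
```

With this model, a call nested inside another call receives a different pad from its parent, so the child's temporaries cannot overwrite the parent's. A later top-level call gets the first pad back.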
22d36020 FC |
1748 | =head1 Memory Allocation |
1749 | ||
1750 | =head2 Allocation | |
1751 | ||
1752 | All memory meant to be used with the Perl API functions should be manipulated | |
1753 | using the macros described in this section. The macros provide the necessary | |
1754 | transparency between differences in the actual malloc implementation that is | |
1755 | used within perl. | |
1756 | ||
1757 | It is suggested that you enable the version of malloc that is distributed | |
1758 | with Perl. It keeps pools of various sizes of unallocated memory in | |
1759 | order to satisfy allocation requests more quickly. However, on some | |
1760 | platforms, it may cause spurious malloc or free errors. | |
1761 | ||
1762 | The following three macros are used to initially allocate memory: | |
1763 | ||
1764 | Newx(pointer, number, type); | |
1765 | Newxc(pointer, number, type, cast); | |
1766 | Newxz(pointer, number, type); | |
1767 | ||
1768 | The first argument C<pointer> should be the name of a variable that will | |
1769 | point to the newly allocated memory. | |
1770 | ||
1771 | The second and third arguments C<number> and C<type> specify how many of | |
1772 | the specified type of data structure should be allocated. The argument | |
1773 | C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>, | |
1774 | should be used if the C<pointer> argument is different from the C<type> | |
1775 | argument. | |
1776 | ||
1777 | Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero> | |
1778 | to zero out all the newly allocated memory. | |
1779 | ||
1780 | =head2 Reallocation | |
1781 | ||
1782 | Renew(pointer, number, type); | |
1783 | Renewc(pointer, number, type, cast); | |
1784 | Safefree(pointer) | |
1785 | ||
1786 | These three macros are used to change a memory buffer size or to free a | |
1787 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> | |
1788 | match those of C<Newx> and C<Newxc>. (The older C<New> and C<Newc> macros | |
1789 | took an additional "magic cookie" first argument, which is not needed here.) | |
1790 | ||
1791 | =head2 Moving | |
1792 | ||
1793 | Move(source, dest, number, type); | |
1794 | Copy(source, dest, number, type); | |
1795 | Zero(dest, number, type); | |
1796 | ||
1797 | These three macros are used to move, copy, or zero out previously allocated | |
1798 | memory. The C<source> and C<dest> arguments point to the source and | |
1799 | destination starting points. Perl will move, copy, or zero out C<number> | |
1800 | instances of the size of the C<type> data structure (using the C<sizeof> | |
1801 | function). | |
1802 | ||
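To make the argument order and the C<sizeof> scaling concrete, here is a sketch using hypothetical stand-in macros built on the standard C library. The C<TOY_> macros are assumptions for illustration only; the real C<Move>, C<Copy> and C<Zero> come from the Perl headers and may add checks that are not modelled here. Note that the source comes before the destination, the reverse of C<memmove> and C<memcpy>.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the Perl macros themselves.
 * Each one scales the element count by sizeof(type). */
#define TOY_MOVE(s, d, n, t) ((void)memmove((d), (s), (n) * sizeof(t)))
#define TOY_COPY(s, d, n, t) ((void)memcpy((d), (s), (n) * sizeof(t)))
#define TOY_ZERO(d, n, t)    ((void)memset((d), 0, (n) * sizeof(t)))

/* Shift the first n - 1 elements one place to the right and clear slot 0.
 * Source and destination overlap, so the memmove-based macro is used. */
static void toy_shift_right(int *a, size_t n)
{
    assert(n > 0);
    TOY_MOVE(a, a + 1, n - 1, int);
    TOY_ZERO(a, 1, int);
}
```

The copy variant is appropriate only when the two regions do not overlap, for example when duplicating one array into a separate buffer with C<TOY_COPY(src, dst, 4, int)>.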
1803 | =head1 PerlIO | |
1804 | ||
1805 | The most recent development releases of Perl have been experimenting with | |
1806 | removing Perl's dependency on the "normal" standard I/O suite and allowing | |
1807 | other stdio implementations to be used. This involves creating a new | |
1808 | abstraction layer that then calls whichever implementation of stdio Perl | |
1809 | was compiled with. All XSUBs should now use the functions in the PerlIO | |
1810 | abstraction layer and not make any assumptions about what kind of stdio | |
1811 | is being used. | |
1812 | ||
1813 | For a complete description of the PerlIO abstraction, consult L<perlapio>. | |
1814 | ||
0a753a76 | 1815 | =head1 Compiled code |
1816 | ||
1817 | =head2 Code tree | |
1818 | ||
1819 | Here we describe the internal form your code is converted to by | |
10e2eb10 | 1820 | Perl. Start with a simple example: |
0a753a76 | 1821 | |
1822 | $a = $b + $c; | |
1823 | ||
1824 | This is converted to a tree similar to this one: | |
1825 | ||
1826 | assign-to | |
1827 | / \ | |
1828 | + $a | |
1829 | / \ | |
1830 | $b $c | |
1831 | ||
7b8d334a | 1832 | (but slightly more complicated). This tree reflects the way Perl |
0a753a76 | 1833 | parsed your code, but has nothing to do with the execution order. |
1834 | There is an additional "thread" going through the nodes of the tree | |
1835 | which shows the order of execution of the nodes. In our simplified | |
1836 | example above it looks like: | |
1837 | ||
1838 | $b ---> $c ---> + ---> $a ---> assign-to | |
1839 | ||
1840 | But with the actual compile tree for C<$a = $b + $c> it is different: | |
1841 | some nodes are I<optimized away>. As a corollary, though the actual tree | |
1842 | contains more nodes than our simplified example, the execution order | |
1843 | is the same as in our example. | |
1844 | ||
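The relationship between the parse tree and the execution "thread" can be sketched with a small C model. It is a toy, not Perl's C<OP> structure: the C<toy_> names are hypothetical, and the model assumes that threading is a plain post-order walk (children before parent), which reproduces the order shown for the simplified example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: this is not Perl's OP structure. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child (tree structure) */
    struct toy_op *sibling;  /* next child of the same parent */
    struct toy_op *next;     /* next op in execution order */
} toy_op;

/* Thread the subtree rooted at o so that children run before their parent.
 * Returns the first op to execute; *tail receives the last one (o itself). */
static toy_op *toy_thread(toy_op *o, toy_op **tail)
{
    toy_op *start = NULL, *prev = NULL, *kid;

    for (kid = o->first; kid; kid = kid->sibling) {
        toy_op *kid_tail;
        toy_op *kid_start = toy_thread(kid, &kid_tail);
        if (prev)
            prev->next = kid_start;
        else
            start = kid_start;
        prev = kid_tail;
    }
    if (prev)
        prev->next = o;
    else
        start = o;
    o->next = NULL;
    *tail = o;
    return start;
}

/* Build the simplified tree for "$a = $b + $c" and write the execution
 * order into buf, with names separated by single spaces. */
static void toy_order(char *buf, size_t len)
{
    static toy_op c   = { "$c", NULL, NULL, NULL };
    static toy_op b   = { "$b", NULL, &c,   NULL };
    static toy_op a   = { "$a", NULL, NULL, NULL };
    static toy_op add = { "+",  &b,   &a,   NULL };
    static toy_op asg = { "assign-to", &add, NULL, NULL };
    toy_op *tail, *o;

    assert(len > 0);
    buf[0] = '\0';
    for (o = toy_thread(&asg, &tail); o; o = o->next) {
        if (buf[0])
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, o->name, len - strlen(buf) - 1);
    }
}
```

The tree links (C<first>, C<sibling>) record how the expression was parsed, while the separate C<next> link records the order in which nodes run. Following only C<next> never requires walking the tree again.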
1845 | =head2 Examining the tree | |
1846 | ||
06f6df17 RGS |
1847 | If you have your perl compiled for debugging (usually done with |
1848 | C<-DDEBUGGING> on the C<Configure> command line), you may examine the | |
0a753a76 | 1849 | compiled tree by specifying C<-Dx> on the Perl command line. The |
1850 | output takes several lines per node, and for C<$b+$c> it looks like | |
1851 | this: | |
1852 | ||
1853 | 5 TYPE = add ===> 6 | |
1854 | TARG = 1 | |
1855 | FLAGS = (SCALAR,KIDS) | |
1856 | { | |
1857 | TYPE = null ===> (4) | |
1858 | (was rv2sv) | |
1859 | FLAGS = (SCALAR,KIDS) | |
1860 | { | |
1861 | 3 TYPE = gvsv ===> 4 | |
1862 | FLAGS = (SCALAR) | |
1863 | GV = main::b | |
1864 | } | |
1865 | } | |
1866 | { | |
1867 | TYPE = null ===> (5) | |
1868 | (was rv2sv) | |
1869 | FLAGS = (SCALAR,KIDS) | |
1870 | { | |
1871 | 4 TYPE = gvsv ===> 5 | |
1872 | FLAGS = (SCALAR) | |
1873 | GV = main::c | |
1874 | } | |
1875 | } | |
1876 | ||
1877 | This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them are | |
1878 | not optimized away (one per number in the left column). The immediate | |
1879 | children of the given node correspond to C<{}> pairs on the same level | |
1880 | of indentation, thus this listing corresponds to the tree: | |
1881 | ||
1882 | add | |
1883 | / \ | |
1884 | null null | |
1885 | | | | |
1886 | gvsv gvsv | |
1887 | ||
1888 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 | |
1889 | 4 5 6> (node C<6> is not included in the above listing), i.e., | |
1890 | C<gvsv gvsv add whatever>. | |
1891 | ||
9afa14e3 | 1892 | Each of these nodes represents an op, a fundamental operation inside the |
10e2eb10 | 1893 | Perl core. The code which implements each operation can be found in the |
9afa14e3 | 1894 | F<pp*.c> files; the function which implements the op with type C<gvsv> |
10e2eb10 | 1895 | is C<pp_gvsv>, and so on. As the tree above shows, different ops have |
9afa14e3 | 1896 | different numbers of children: C<add> is a binary operator, as one would |
10e2eb10 | 1897 | expect, and so has two children. To accommodate the various different |
9afa14e3 SC |
1898 | numbers of children, there are various types of op data structure, and |
1899 | they link together in different ways. | |
1900 | ||
10e2eb10 | 1901 | The simplest type of op structure is C<OP>: this has no children. Unary |
9afa14e3 | 1902 | operators, C<UNOP>s, have one child, and this is pointed to by the |
10e2eb10 FC |
1903 | C<op_first> field. Binary operators (C<BINOP>s) have not only an |
1904 | C<op_first> field but also an C<op_last> field. The most complex type of | |
1905 | op is a C<LISTOP>, which has any number of children. In this case, the | |
9afa14e3 | 1906 | first child is pointed to by C<op_first> and the last child by |
10e2eb10 | 1907 | C<op_last>. The children in between can be found by iteratively |
9afa14e3 SC |
1908 | following the C<op_sibling> pointer from the first child to the last. |
1909 | ||
1910 | There are also two other op types: a C<PMOP> holds a regular expression, | |
10e2eb10 FC |
1911 | and has no children, and a C<LOOP> may or may not have children. If the |
1912 | C<op_children> field is non-zero, it behaves like a C<LISTOP>. To | |
9afa14e3 SC |
1913 | complicate matters, if a C<UNOP> is actually a C<null> op after |
1914 | optimization (see L</Compile pass 2: context propagation>) it will still | |
1915 | have children in accordance with its former type. | |
1916 | ||
06f6df17 RGS |
1917 | Another way to examine the tree is to use a compiler back-end module, such |
1918 | as L<B::Concise>. | |
1919 | ||
0a753a76 | 1920 | =head2 Compile pass 1: check routines |
1921 | ||
8870b5c7 | 1922 | The tree is created by the compiler while I<yacc> code feeds it |
10e2eb10 | 1923 | the constructions it recognizes. Since I<yacc> works bottom-up, so does |
0a753a76 | 1924 | the first pass of perl compilation. |
1925 | ||
1926 | What makes this pass interesting for perl developers is that some | |
1927 | optimization may be performed on this pass. This is optimization by | |
8870b5c7 | 1928 | so-called "check routines". The correspondence between node names |
0a753a76 | 1929 | and corresponding check routines is described in F<opcode.pl> (do not |
1930 | forget to run C<make regen_headers> if you modify this file). | |
1931 | ||
1932 | A check routine is called when the node is fully constructed except | |
7b8d334a | 1933 | for the execution-order thread. Since at this time there are no |
0a753a76 | 1934 | back-links to the currently constructed node, one can do almost any |
1935 | operation to the top-level node, including freeing it and/or creating | |
1936 | new nodes above/below it. | |
1937 | ||
1938 | The check routine returns the node which should be inserted into the | |
1939 | tree (if the top-level node was not modified, the check routine returns | |
1940 | its argument). | |
1941 | ||
10e2eb10 | 1942 | By convention, check routines have names C<ck_*>. They are usually |
0a753a76 | 1943 | called from C<new*OP> subroutines (or C<convert>) (which in turn are |
1944 | called from F<perly.y>). | |
1945 | ||
1946 | =head2 Compile pass 1a: constant folding | |
1947 | ||
1948 | Immediately after the check routine is called the returned node is | |
1949 | checked for being compile-time executable. If it is (the value is | |
1950 | judged to be constant) it is immediately executed, and a I<constant> | |
1951 | node with the "return value" of the corresponding subtree is | |
1952 | substituted instead. The subtree is deleted. | |
1953 | ||
1954 | If constant folding was not performed, the execution-order thread is | |
1955 | created. | |
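The idea of folding a constant subtree as soon as it is built can be sketched in a few lines. This is a simplified model, not perl's implementation: C<ToyNode>, C<toy_fold> and the C<toy_new_*> helpers are hypothetical names, and where perl actually executes the subtree to obtain its value, the sketch simply adds two integers.

```c
#include <stdlib.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } ToyType;

/* Hypothetical expression node for this sketch only. */
typedef struct ToyNode {
    ToyType type;
    int value;                      /* meaningful for TOY_CONST only */
    struct ToyNode *left, *right;   /* children of TOY_ADD, else NULL */
} ToyNode;

static ToyNode *toy_new(ToyType type, int value, ToyNode *l, ToyNode *r)
{
    ToyNode *n = malloc(sizeof *n);
    if (!n)
        abort();
    n->type = type;
    n->value = value;
    n->left = l;
    n->right = r;
    return n;
}

/* If both operands are constants, replace the subtree by one constant
 * node holding the result and delete the old subtree. */
static ToyNode *toy_fold(ToyNode *n)
{
    ToyNode *folded;
    if (n->type != TOY_ADD
        || n->left->type != TOY_CONST || n->right->type != TOY_CONST)
        return n;                   /* not compile-time executable */
    folded = toy_new(TOY_CONST, n->left->value + n->right->value,
                     NULL, NULL);
    free(n->left);
    free(n->right);
    free(n);
    return folded;
}

static ToyNode *toy_new_const(int v) { return toy_new(TOY_CONST, v, NULL, NULL); }
static ToyNode *toy_new_var(void)    { return toy_new(TOY_VAR, 0, NULL, NULL); }

/* Construction is bottom-up, so folding runs right after each node is built. */
static ToyNode *toy_new_add(ToyNode *l, ToyNode *r)
{
    return toy_fold(toy_new(TOY_ADD, 0, l, r));
}

static void toy_free(ToyNode *n)
{
    if (!n)
        return;
    toy_free(n->left);
    toy_free(n->right);
    free(n);
}
```

Because folding happens at construction time, an expression such as 2 + (3 + 4) collapses step by step into a single constant node, while a sum involving a variable is left alone.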
1956 | ||
1957 | =head2 Compile pass 2: context propagation | |
1958 | ||
1959 | When a context for a part of the compile tree is known, it is propagated | |
a3cb178b | 1960 | down through the tree. At this time the context can have 5 values |
0a753a76 | 1961 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
1962 | lvalue. In contrast with pass 1, this pass is processed from top | |
1963 | to bottom: a node's context determines the context for its children. | |
1964 | ||
1965 | Additional context-dependent optimizations are performed at this time. | |
1966 | Since at this moment the compile tree contains back-references (via | |
1967 | "thread" pointers), nodes cannot be free()d now. To allow | |
1968 | optimized-away nodes at this stage, such nodes are null()ified instead | |
1969 | of being free()d (i.e. their type is changed to OP_NULL). | |
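Why a node is nulled rather than freed can be illustrated with a toy thread of ops. The names below (C<ToyOp>, C<toy_op_null>, C<toy_count_live>, C<toy_count_all>) are hypothetical and exist only for this sketch; it assumes nothing about perl's own data layout beyond what the paragraph above states.

```c
#include <stddef.h>

typedef enum { TOY_NULL, TOY_PUSH, TOY_ADD } ToyType;

/* Hypothetical op with just a type and an execution-order pointer. */
typedef struct ToyOp {
    ToyType type;
    ToyType was;            /* former type, kept for diagnostics */
    struct ToyOp *next;     /* execution-order thread */
} ToyOp;

/* Optimize an op away without freeing it: pointers that other
 * ops hold to it stay valid, it just no longer does any work. */
static void toy_op_null(ToyOp *o)
{
    if (o->type == TOY_NULL)
        return;
    o->was = o->type;
    o->type = TOY_NULL;
}

/* Count the ops on the thread that still do something. */
static int toy_count_live(const ToyOp *start)
{
    int n = 0;
    const ToyOp *o;
    for (o = start; o; o = o->next)
        if (o->type != TOY_NULL)
            n++;
    return n;
}

/* Count every op reachable along the thread, nulled or not. */
static int toy_count_all(const ToyOp *start)
{
    int n = 0;
    const ToyOp *o;
    for (o = start; o; o = o->next)
        n++;
    return n;
}
```

After C<toy_op_null> the node is still reachable along the thread, so nothing dangles. Remembering the former type is what lets a dump print an annotation in the spirit of C<(was rv2sv)>.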
1970 | ||
1971 | =head2 Compile pass 3: peephole optimization | |
1972 | ||
1973 | After the compile tree for a subroutine (or for an C<eval> or a file) | |
10e2eb10 | 1974 | is created, an additional pass over the code is performed. This pass |
0a753a76 | 1975 | is neither top-down nor bottom-up, but in the execution order (with
9ea12537 Z |
1976 | additional complications for conditionals). Optimizations performed |
1977 | at this stage are subject to the same restrictions as in pass 2. | |
1978 | ||
1979 | Peephole optimizations are done by calling the function pointed to | |
1980 | by the global variable C<PL_peepp>. By default, C<PL_peepp> just | |
1981 | calls the function pointed to by the global variable C<PL_rpeepp>. | |
1982 | By default, that performs some basic op fixups and optimisations along | |
1983 | the execution-order op chain, and recursively calls C<PL_rpeepp> for | |
1984 | each side chain of ops (resulting from conditionals). Extensions may | |
1985 | provide additional optimisations or fixups, hooking into either the | |
1986 | per-subroutine or recursive stage, like this: | |
1987 | ||
1988 | static peep_t prev_peepp; | |
1989 | static void my_peep(pTHX_ OP *o) | |
1990 | { | |
1991 | /* custom per-subroutine optimisation goes here */ | |
f0358462 | 1992 | prev_peepp(aTHX_ o); |
9ea12537 Z |
1993 | /* custom per-subroutine optimisation may also go here */ |
1994 | } | |
1995 | BOOT: | |
1996 | prev_peepp = PL_peepp; | |
1997 | PL_peepp = my_peep; | |
1998 | ||
1999 | static peep_t prev_rpeepp; | |
2000 | static void my_rpeep(pTHX_ OP *o) | |
2001 | { | |
2002 | OP *orig_o = o; | |
2003 | for(; o; o = o->op_next) { | |
2004 | /* custom per-op optimisation goes here */ | |
2005 | } | |
f0358462 | 2006 | prev_rpeepp(aTHX_ orig_o); |
9ea12537 Z |
2007 | } |
2008 | BOOT: | |
2009 | prev_rpeepp = PL_rpeepp; | |
2010 | PL_rpeepp = my_rpeep; | |
0a753a76 | 2011 | |
1ba7f851 PJ |
2012 | =head2 Pluggable runops |
2013 | ||
2014 | The compile tree is executed in a runops function. There are two runops | |
1388f78e RGS |
2015 | functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used |
2016 | with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine | |
2017 | control over the execution of the compile tree it is possible to provide | |
2018 | your own runops function. | |
1ba7f851 PJ |
2019 | |
2020 | It's probably best to copy one of the existing runops functions and | |
2021 | change it to suit your needs. Then, in the BOOT section of your XS | |
2022 | file, add the line: | |
2023 | ||
2024 | PL_runops = my_runops; | |
2025 | ||
2026 | This function should be as efficient as possible to keep your programs | |
2027 | running as fast as possible. | |
2028 | ||
fd85fad2 BM |
2029 | =head2 Compile-time scope hooks |
2030 | ||
2031 | As of perl 5.14 it is possible to hook into the compile-time lexical | |
10e2eb10 | 2032 | scope mechanism using C<Perl_blockhook_register>. This is used like |
fd85fad2 BM |
2033 | this: |
2034 | ||
2035 | STATIC void my_start_hook(pTHX_ int full); | |
2036 | STATIC BHK my_hooks; | |
2037 | ||
2038 | BOOT: | |
a88d97bf | 2039 | BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); |
fd85fad2 BM |
2040 | Perl_blockhook_register(aTHX_ &my_hooks); |
2041 | ||
2042 | This will arrange to have C<my_start_hook> called at the start of | |
10e2eb10 | 2043 | compiling every lexical scope. The available hooks are: |
fd85fad2 BM |
2044 | |
2045 | =over 4 | |
2046 | ||
a88d97bf | 2047 | =item C<void bhk_start(pTHX_ int full)> |
fd85fad2 | 2048 | |
10e2eb10 | 2049 | This is called just after starting a new lexical scope. Note that Perl |
fd85fad2 BM |
2050 | code like |
2051 | ||
2052 | if ($x) { ... } | |
2053 | ||
2054 | creates two scopes: the first starts at the C<(> and has C<full == 1>, | |
10e2eb10 FC |
2055 | the second starts at the C<{> and has C<full == 0>. Both end at the |
2056 | C<}>, so calls to C<start> and C<pre/post_end> will match. Anything | |
fd85fad2 BM |
2057 | pushed onto the save stack by this hook will be popped just before the |
2058 | scope ends (between the C<pre_> and C<post_end> hooks, in fact). | |
2059 | ||
a88d97bf | 2060 | =item C<void bhk_pre_end(pTHX_ OP **o)> |
fd85fad2 BM |
2061 | |
2062 | This is called at the end of a lexical scope, just before unwinding the | |
10e2eb10 | 2063 | stack. I<o> is the root of the optree representing the scope; it is a |
fd85fad2 BM |
2064 | double pointer so you can replace the OP if you need to. |
2065 | ||
a88d97bf | 2066 | =item C<void bhk_post_end(pTHX_ OP **o)> |
fd85fad2 BM |
2067 | |
2068 | This is called at the end of a lexical scope, just after unwinding the | |
10e2eb10 | 2069 | stack. I<o> is as above. Note that it is possible for calls to C<pre_> |
fd85fad2 BM |
2070 | and C<post_end> to nest, if there is something on the save stack that |
2071 | calls string eval. | |
2072 | ||
a88d97bf | 2073 | =item C<void bhk_eval(pTHX_ OP *const o)> |
fd85fad2 BM |
2074 | |
2075 | This is called just before starting to compile an C<eval STRING>, C<do | |
10e2eb10 | 2076 | FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the |
fd85fad2 BM |
2077 | OP that requested the eval, and will normally be an C<OP_ENTEREVAL>, |
2078 | C<OP_DOFILE> or C<OP_REQUIRE>. | |
2079 | ||
2080 | =back | |
2081 | ||
2082 | Once you have your hook functions, you need a C<BHK> structure to put | |
10e2eb10 FC |
2083 | them in. It's best to allocate it statically, since there is no way to |
2084 | free it once it's registered. The function pointers should be inserted | |
fd85fad2 | 2085 | into this structure using the C<BhkENTRY_set> macro, which will also set |
10e2eb10 | 2086 | flags indicating which entries are valid. If you do need to allocate |
fd85fad2 BM |
2087 | your C<BHK> dynamically for some reason, be sure to zero it before you |
2088 | start. | |
2089 | ||
2090 | Once registered, there is no mechanism to switch these hooks off, so if | |
10e2eb10 | 2091 | that is necessary you will need to do this yourself. An entry in C<%^H> |
a3e07c87 BM |
2092 | is probably the best way, so the effect is lexically scoped; however it |
2093 | is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to | |
10e2eb10 | 2094 | temporarily switch entries on and off. You should also be aware that |
a3e07c87 BM |
2095 | generally speaking at least one scope will have opened before your |
2096 | extension is loaded, so you will see some C<pre/post_end> pairs that | |
2097 | didn't have a matching C<start>. | |
fd85fad2 | 2098 | |
9afa14e3 SC |
2099 | =head1 Examining internal data structures with the C<dump> functions |
2100 | ||
2101 | To aid debugging, the source file F<dump.c> contains a number of | |
2102 | functions which produce formatted output of internal data structures. | |
2103 | ||
2104 | The most commonly used of these functions is C<Perl_sv_dump>; it's used | |
10e2eb10 | 2105 | for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls |
9afa14e3 | 2106 | C<sv_dump> to produce debugging output from Perl-space, so users of that |
00aadd71 | 2107 | module should already be familiar with its format. |
9afa14e3 SC |
2108 | |
2109 | C<Perl_op_dump> can be used to dump an C<OP> structure or any of its | |
210b36aa | 2110 | derivatives, and produces output similar to C<perl -Dx>; in fact, |
9afa14e3 SC |
2111 | C<Perl_dump_eval> will dump the main root of the code being evaluated, |
2112 | exactly like C<-Dx>. | |
2113 | ||
2114 | Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an | |
2115 | op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the | |
2116 | subroutines in a package like so: (Thankfully, these are all xsubs, so | |
2117 | there is no op tree.) | |
2118 | ||
2119 | (gdb) print Perl_dump_packsubs(PL_defstash) | |
2120 | ||
2121 | SUB attributes::bootstrap = (xsub 0x811fedc 0) | |
2122 | ||
2123 | SUB UNIVERSAL::can = (xsub 0x811f50c 0) | |
2124 | ||
2125 | SUB UNIVERSAL::isa = (xsub 0x811f304 0) | |
2126 | ||
2127 | SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) | |
2128 | ||
2129 | SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) | |
2130 | ||
2131 | and C<Perl_dump_all>, which dumps all the subroutines in the stash and | |
2132 | the op tree of the main root. | |
2133 | ||
954c1994 | 2134 | =head1 How multiple interpreters and concurrency are supported |
ee072b34 | 2135 | |
ee072b34 GS |
2136 | =head2 Background and PERL_IMPLICIT_CONTEXT |
2137 | ||
2138 | The Perl interpreter can be regarded as a closed box: it has an API | |
2139 | for feeding it code or otherwise making it do things, but it also has | |
2140 | functions for its own use. This smells a lot like an object, and | |
2141 | there are ways for you to build Perl so that you can have multiple | |
acfe0abc GS |
2142 | interpreters, with one interpreter represented either as a C structure, |
2143 | or inside a thread-specific structure. These structures contain all | |
2144 | the context, the state of that interpreter. | |
2145 | ||
10e2eb10 | 2146 | One macro controls the major Perl build flavor: MULTIPLICITY. The |
7b52221d | 2147 | MULTIPLICITY build has a C structure that packages all the interpreter |
10e2eb10 | 2148 | state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also |
7b52221d | 2149 | normally defined, and enables the support for passing in a "hidden" first |
10e2eb10 | 2150 | argument that represents the interpreter's state. MULTIPLICITY makes |
1a64a5e6 | 2151 | multi-threaded perls possible (with the ithreads threading model, related |
7b52221d | 2152 | to the macro USE_ITHREADS). |
54aff467 | 2153 | |
27da23d5 JH |
2154 | Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and |
2155 | PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the | |
2156 | former turns on MULTIPLICITY). The PERL_GLOBAL_STRUCT macro causes all the | |
2157 | internal variables of Perl to be wrapped inside a single global struct, | |
2158 | struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or | |
2159 | the function Perl_GetVars(). The PERL_GLOBAL_STRUCT_PRIVATE goes | |
2160 | one step further: there is still a single struct (allocated in main() | |
2161 | either from heap or from stack) but there are no global data symbols | |
3bf17896 | 2162 | pointing to it. In either case the global struct should be initialized |
27da23d5 JH |
2163 | as the very first thing in main() using Perl_init_global_struct() and |
2164 | correspondingly torn down after perl_free() using Perl_free_global_struct(); | |
2165 | please see F<miniperlmain.c> for usage details. You may also need | |
2166 | to use C<dVAR> in your coding to "declare the global variables" | |
2167 | when you are using them. dTHX does this for you automatically. | |
2168 | ||
bc028b6b JH |
2169 | To see whether you have non-const data you can use a BSD-compatible C<nm>: |
2170 | ||
2171 | nm libperl.a | grep -v ' [TURtr] ' | |
2172 | ||
2173 | If this displays any C<D> or C<d> symbols, you have non-const data. | |
2174 | ||
27da23d5 JH |
2175 | For backward compatibility reasons defining just PERL_GLOBAL_STRUCT |
2176 | doesn't actually hide all symbols inside a big global struct: some | |
2177 | PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE | |
2178 | then hides everything (see how the PERLIO_FUNCS_DECL is used). | |
2179 | ||
54aff467 | 2180 | All this obviously requires a way for the Perl internal functions to be |
acfe0abc | 2181 | either subroutines taking some kind of structure as the first |
ee072b34 | 2182 | argument, or subroutines taking nothing as the first argument. To |
acfe0abc | 2183 | enable these two very different ways of building the interpreter, |
ee072b34 GS |
2184 | the Perl source (as it does in so many other situations) makes heavy |
2185 | use of macros and subroutine naming conventions. | |
2186 | ||
54aff467 | 2187 | First problem: deciding which functions will be public API functions and |
00aadd71 | 2188 | which will be private. All functions whose names begin with C<S_> are private |
954c1994 GS |
2189 | (think "S" for "secret" or "static"). All other functions begin with |
2190 | "Perl_", but just because a function begins with "Perl_" does not mean it is | |
10e2eb10 FC |
2191 | part of the API. (See L</Internal |
2192 | Functions>.) The easiest way to be B<sure> a | |
00aadd71 NIS |
2193 | function is part of the API is to find its entry in L<perlapi>. |
2194 | If it exists in L<perlapi>, it's part of the API. If it doesn't, and you | |
2195 | think it should be (i.e., you need it for your extension), send mail via | |
a422fd2d | 2196 | L<perlbug> explaining why you think it should be. |
ee072b34 GS |
2197 | |
2198 | Second problem: there must be a syntax so that the same subroutine | |
2199 | declarations and calls can pass a structure as their first argument, | |
2200 | or pass nothing. To solve this, the subroutines are named and | |
2201 | declared in a particular way. Here's a typical start of a static | |
2202 | function used within the Perl guts: | |
2203 | ||
2204 | STATIC void | |
2205 | S_incline(pTHX_ char *s) | |
2206 | ||
acfe0abc | 2207 | STATIC becomes "static" in C, and may be #define'd to nothing in some |
da8c5729 | 2208 | configurations in the future. |
ee072b34 | 2209 | |
651a3225 GS |
2210 | A public function (i.e. part of the internal API, but not necessarily |
2211 | sanctioned for use in extensions) begins like this: | |
ee072b34 GS |
2212 | |
2213 | void | |
2307c6d0 | 2214 | Perl_sv_setiv(pTHX_ SV* dsv, IV num) |
ee072b34 | 2215 | |
0147cd53 | 2216 | C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the |
ee072b34 GS |
2217 | details of the interpreter's context. THX stands for "thread", "this", |
2218 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) | |
2219 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, | |
a7486cbb JH |
2220 | or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and |
2221 | their variants. | |
ee072b34 | 2222 | |
a7486cbb JH |
2223 | When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no |
2224 | first argument containing the interpreter's context. The trailing underscore | |
ee072b34 GS |
2225 | in the pTHX_ macro indicates that the macro expansion needs a comma |
2226 | after the context argument because other arguments follow it. If | |
2227 | PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the | |
54aff467 GS |
2228 | subroutine is not prototyped to take the extra argument. The form of the |
2229 | macro without the trailing underscore is used when there are no additional | |
ee072b34 GS |
2230 | explicit arguments. |
2231 | ||
54aff467 | 2232 | When a core function calls another, it must pass the context. This |
2307c6d0 | 2233 | is normally hidden via macros. Consider C<sv_setiv>. It expands into |
ee072b34 GS |
2234 | something like this: |
2235 | ||
2307c6d0 SB |
2236 | #ifdef PERL_IMPLICIT_CONTEXT |
2237 | #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) | |
ee072b34 | 2238 | /* can't do this for vararg functions, see below */ |
2307c6d0 SB |
2239 | #else |
2240 | #define sv_setiv Perl_sv_setiv | |
2241 | #endif | |
ee072b34 GS |
2242 | |
2243 | This works well, and means that XS authors can gleefully write: | |
2244 | ||
2307c6d0 | 2245 | sv_setiv(foo, bar); |
ee072b34 GS |
2246 | |
2247 | and still have it work under all the modes Perl could have been | |
2248 | compiled with. | |
2249 | ||
ee072b34 GS |
2250 | This doesn't work so cleanly for varargs functions, though, as macros |
2251 | imply that the number of arguments is known in advance. Instead we | |
2252 | either need to spell them out fully, passing C<aTHX_> as the first | |
2253 | argument (the Perl core tends to do this with functions like | |
2254 | Perl_warner), or use a context-free version. | |
2255 | ||
2256 | The context-free version of Perl_warner is called | |
2257 | Perl_warner_nocontext, and does not take the extra argument. Instead | |
2258 | it does dTHX; to get the context from thread-local storage. We | |
2259 | C<#define warner Perl_warner_nocontext> so that extensions get source | |
2260 | compatibility at the expense of performance. (Passing an arg is | |
2261 | cheaper than grabbing it from thread-local storage.) | |
2262 | ||
acfe0abc | 2263 | You can ignore [pad]THXx when browsing the Perl headers/sources. |
ee072b34 GS |
2264 | Those are strictly for use within the core. Extensions and embedders |
2265 | need only be aware of [pad]THX. | |
2266 | ||
a7486cbb JH |
2267 | =head2 So what happened to dTHR? |
2268 | ||
2269 | C<dTHR> was introduced in perl 5.005 to support the older thread model. | |
2270 | The older thread model now uses the C<THX> mechanism to pass context | |
2271 | pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and | |
2272 | later still have it for backward source compatibility, but it is defined | |
2273 | to be a no-op. | |
2274 | ||
ee072b34 GS |
2275 | =head2 How do I use all this in extensions? |
2276 | ||
2277 | When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call | |
2278 | any functions in the Perl API will need to pass the initial context | |
2279 | argument somehow. The kicker is that you will need to write it in | |
2280 | such a way that the extension still compiles when Perl hasn't been | |
2281 | built with PERL_IMPLICIT_CONTEXT enabled. | |
2282 | ||
2283 | There are three ways to do this. First, the easy but inefficient way, | |
2284 | which is also the default, in order to maintain source compatibility | |
0147cd53 | 2285 | with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX |
ee072b34 GS |
2286 | and aTHX_ macros to call a function that will return the context. |
2287 | Thus, something like: | |
2288 | ||
2307c6d0 | 2289 | sv_setiv(sv, num); |
ee072b34 | 2290 | |
4375e838 | 2291 | in your extension will translate to this when PERL_IMPLICIT_CONTEXT is |
54aff467 | 2292 | in effect: |
ee072b34 | 2293 | |
2307c6d0 | 2294 | Perl_sv_setiv(Perl_get_context(), sv, num); |
ee072b34 | 2295 | |
54aff467 | 2296 | or to this otherwise: |
ee072b34 | 2297 | |
2307c6d0 | 2298 | Perl_sv_setiv(sv, num); |
ee072b34 | 2299 | |
da8c5729 | 2300 | You don't have to do anything new in your extension to get this; since |
2fa86c13 | 2301 | the Perl library provides Perl_get_context(), it will all just |
ee072b34 GS |
2302 | work. |
2303 | ||
2304 | The second, more efficient way is to use the following template for | |
2305 | your Foo.xs: | |
2306 | ||
c52f9dcd JH |
2307 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
2308 | #include "EXTERN.h" | |
2309 | #include "perl.h" | |
2310 | #include "XSUB.h" | |
ee072b34 | 2311 | |
fd061412 | 2312 | STATIC void my_private_function(int arg1, int arg2); |
ee072b34 | 2313 | |
fd061412 | 2314 | STATIC void |
c52f9dcd JH |
2315 | my_private_function(int arg1, int arg2) |
2316 | { | |
2317 | dTHX; /* fetch context */ | |
2318 | ... call many Perl API functions ... | |
2319 | } | |
ee072b34 GS |
2320 | |
2321 | [... etc ...] | |
2322 | ||
c52f9dcd | 2323 | MODULE = Foo PACKAGE = Foo |
ee072b34 | 2324 | |
c52f9dcd | 2325 | /* typical XSUB */ |
ee072b34 | 2326 | |
c52f9dcd JH |
2327 | void |
2328 | my_xsub(arg) | |
2329 | int arg | |
2330 | CODE: | |
2331 | my_private_function(arg, 10); | |
ee072b34 GS |
2332 | |
2333 | Note that the only two changes from the normal way of writing an | |
2334 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before | |
2335 | including the Perl headers, followed by a C<dTHX;> declaration at | |
2336 | the start of every function that will call the Perl API. (You'll | |
2337 | know which functions need this, because the C compiler will complain | |
2338 | that there's an undeclared identifier in those functions.) No changes | |
2339 | are needed for the XSUBs themselves, because the XS() macro is | |
2340 | correctly defined to pass in the implicit context if needed. | |
2341 | ||
2342 | The third, even more efficient way is to ape how it is done within | |
2343 | the Perl guts: | |
2344 | ||
2345 | ||
c52f9dcd JH |
2346 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
2347 | #include "EXTERN.h" | |
2348 | #include "perl.h" | |
2349 | #include "XSUB.h" | |
ee072b34 GS |
2350 | |
2351 | /* pTHX_ only needed for functions that call Perl API */ | |
fd061412 | 2352 | STATIC void my_private_function(pTHX_ int arg1, int arg2); |
ee072b34 | 2353 | |
fd061412 | 2354 | STATIC void |
c52f9dcd JH |
2355 | my_private_function(pTHX_ int arg1, int arg2) |
2356 | { | |
2357 | /* dTHX; not needed here, because THX is an argument */ | |
2358 | ... call Perl API functions ... | |
2359 | } | |
ee072b34 GS |
2360 | |
2361 | [... etc ...] | |
2362 | ||
c52f9dcd | 2363 | MODULE = Foo PACKAGE = Foo |
ee072b34 | 2364 | |
c52f9dcd | 2365 | /* typical XSUB */ |
ee072b34 | 2366 | |
c52f9dcd JH |
2367 | void |
2368 | my_xsub(arg) | |
2369 | int arg | |
2370 | CODE: | |
2371 | my_private_function(aTHX_ arg, 10); | |
ee072b34 GS |
2372 | |
2373 | This implementation never has to fetch the context using a function | |
2374 | call, since it is always passed as an extra argument. Depending on | |
2375 | your needs for simplicity or efficiency, you may mix the previous | |
2376 | two approaches freely. | |
2377 | ||
651a3225 GS |
2378 | Never add a comma after C<pTHX> yourself--always use the form of the |
2379 | macro with the underscore for functions that take explicit arguments, | |
2380 | or the form without the underscore for functions with no explicit arguments. | |
ee072b34 | 2381 | |
27da23d5 JH |
2382 | If one is compiling Perl with the C<-DPERL_GLOBAL_STRUCT> the C<dVAR> |
2383 | definition is needed if the Perl global variables (see F<perlvars.h> | |
2384 | or F<globvar.sym>) are accessed in the function and C<dTHX> is not | |
2385 | used (the C<dTHX> includes the C<dVAR> if necessary). One notices | |
2386 | the need for C<dVAR> only with the said compile-time define, because | |
2387 | otherwise the Perl global variables are visible as-is. | |
2388 | ||
a7486cbb JH |
2389 | =head2 Should I do anything special if I call perl from multiple threads? |
2390 | ||
2391 | If you create interpreters in one thread and then proceed to call them in | |
2392 | another, you need to make sure perl's own Thread Local Storage (TLS) slot is | |
2393 | initialized correctly in each of those threads. | |
2394 | ||
2395 | The C<perl_alloc> and C<perl_clone> API functions will automatically set | |
2396 | the TLS slot to the interpreter they created, so that there is no need to do | |
2397 | anything special if the interpreter is always accessed in the same thread that | |
2398 | created it, and that thread did not create or call any other interpreters | |
2399 | afterwards. If that is not the case, you have to set the TLS slot of the | |
2400 | thread before calling any functions in the Perl API on that particular | |
2401 | interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that | |
2402 | thread as the first thing you do: | |
2403 | ||
2404 | /* do this before doing anything else with some_perl */ | |
2405 | PERL_SET_CONTEXT(some_perl); | |
2406 | ||
2407 | ... other Perl API calls on some_perl go here ... | |
2408 | ||
ee072b34 GS |
2409 | =head2 Future Plans and PERL_IMPLICIT_SYS |
2410 | ||
2411 | Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything | |
2412 | that the interpreter knows about itself and pass it around, so too are | |
2413 | there plans to allow the interpreter to bundle up everything it knows | |
2414 | about the environment it's running on. This is enabled with the | |
7b52221d RGS |
2415 | PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on |
2416 | Windows. | |
ee072b34 GS |
2417 | |
2418 | This gives the ability to provide an extra pointer (called the "host" | |
2419 | environment) for all the system calls. This makes it possible for | |
2420 | all the system stuff to maintain their own state, broken down into | |
2421 | seven C structures. These are thin wrappers around the usual system | |
0147cd53 | 2422 | calls (see F<win32/perllib.c>) for the default perl executable, but for a |
ee072b34 GS |
2423 | more ambitious host (like the one that would do fork() emulation) all |
2424 | the extra work needed to pretend that different interpreters are | |
2425 | actually different "processes", would be done here. | |
2426 | ||
2427 | The Perl engine/interpreter and the host are orthogonal entities. | |
2428 | There could be one or more interpreters in a process, and one or | |
2429 | more "hosts", with free association between them. | |
2430 | ||
a422fd2d SC |
2431 | =head1 Internal Functions |
2432 | ||
2433 | All of Perl's internal functions which will be exposed to the outside | |
06f6df17 | 2434 | world are prefixed by C<Perl_> so that they will not conflict with XS |
a422fd2d | 2435 | functions or functions used in a program in which Perl is embedded. |
10e2eb10 | 2436 | Similarly, all global variables begin with C<PL_>. (By convention, |
06f6df17 | 2437 | static functions start with C<S_>.) |
a422fd2d | 2438 | |
0972ecdf DM |
2439 | Inside the Perl core (C<PERL_CORE> defined), you can get at the functions |
2440 | either with or without the C<Perl_> prefix, thanks to a bunch of defines | |
10e2eb10 | 2441 | that live in F<embed.h>. Note that extension code should I<not> set |
0972ecdf DM |
2442 | C<PERL_CORE>; this exposes the full perl internals, and is likely to cause |
2443 | breakage of the XS in each new perl release. | |
2444 | ||
2445 | The file F<embed.h> is generated automatically from | |
10e2eb10 | 2446 | F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping |
dc9b1d22 | 2447 | header files for the internal functions, generates the documentation |
10e2eb10 | 2448 | and a lot of other bits and pieces. It's important that when you add |
dc9b1d22 | 2449 | a new function to the core or change an existing one, you change the |
10e2eb10 | 2450 | data in the table in F<embed.fnc> as well. Here's a sample entry from |
dc9b1d22 | 2451 | that table: |
a422fd2d SC |
2452 | |
2453 | Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval | |
2454 | ||
10e2eb10 FC |
2455 | The second column is the return type, the third column the name. Columns |
2456 | after that are the arguments. The first column is a set of flags: | |
a422fd2d SC |
2457 | |
2458 | =over 3 | |
2459 | ||
2460 | =item A | |
2461 | ||
10e2eb10 FC |
2462 | This function is a part of the public |
2463 | API. All such functions should also | |
1aa6ea50 | 2464 | have 'd'; very few do not. |
a422fd2d SC |
2465 | |
2466 | =item p | |
2467 | ||
1aa6ea50 JC |
2468 | This function has a C<Perl_> prefix; i.e. it is defined as |
2469 | C<Perl_av_fetch>. | |
a422fd2d SC |
2470 | |
2471 | =item d | |
2472 | ||
2473 | This function has documentation using the C<apidoc> feature which we'll | |
1aa6ea50 | 2474 | look at in a second. Some functions have 'd' but not 'A'; docs are good. |
a422fd2d SC |
2475 | |
2476 | =back | |
2477 | ||
2478 | Other available flags are: | |
2479 | ||
2480 | =over 3 | |
2481 | ||
2482 | =item s | |
2483 | ||
1aa6ea50 JC |
2484 | This is a static function and is defined as C<STATIC S_whatever>, and |
2485 | usually called within the sources as C<whatever(...)>. | |
a422fd2d SC |
2486 | |
2487 | =item n | |
2488 | ||
da8c5729 | 2489 | This does not need an interpreter context, so the definition has no |
1aa6ea50 | 2490 | C<pTHX>, and it follows that callers don't use C<aTHX>. (See |
d3a43cd8 | 2491 | L</Background and PERL_IMPLICIT_CONTEXT>.) |
a422fd2d SC |
2492 | |
2493 | =item r | |
2494 | ||
2495 | This function never returns; C<croak>, C<exit> and friends. | |
2496 | ||
2497 | =item f | |
2498 | ||
2499 | This function takes a variable number of arguments, C<printf> style. | |
2500 | The argument list should end with C<...>, like this: | |
2501 | ||
2502 | Afprd |void |croak |const char* pat|... | |
2503 | ||
a7486cbb | 2504 | =item M |
a422fd2d | 2505 | |
00aadd71 | 2506 | This function is part of the experimental development API, and may change |
a422fd2d SC |
2507 | or disappear without notice. |
2508 | ||
2509 | =item o | |
2510 | ||
2511 | This function should not have a compatibility macro to define, say, | |
10e2eb10 | 2512 | C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>. |
a422fd2d | 2513 | |
a422fd2d SC |
2514 | =item x |
2515 | ||
2516 | This function isn't exported out of the Perl core. | |
2517 | ||
dc9b1d22 MHM |
2518 | =item m |
2519 | ||
2520 | This is implemented as a macro. | |
2521 | ||
2522 | =item X | |
2523 | ||
2524 | This function is explicitly exported. | |
2525 | ||
2526 | =item E | |
2527 | ||
2528 | This function is visible to extensions included in the Perl core. | |
2529 | ||
2530 | =item b | |
2531 | ||
2532 | Binary backward compatibility; this function is a macro but also has | |
2533 | a C<Perl_> implementation (which is exported). | |
2534 | ||
1aa6ea50 JC |
2535 | =item others |
2536 | ||
2537 | See the comments at the top of C<embed.fnc> for others. | |
2538 | ||
a422fd2d SC |
2539 | =back |
2540 | ||
dc9b1d22 MHM |
2541 | If you edit F<embed.pl> or F<embed.fnc>, you will need to run |
2542 | C<make regen_headers> to force a rebuild of F<embed.h> and other | |
2543 | auto-generated files. | |
a422fd2d | 2544 | |
6b4667fc | 2545 | =head2 Formatted Printing of IVs, UVs, and NVs |
9dd9db0b | 2546 | |
6b4667fc A |
2547 | If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style |
2548 | formatting codes like C<%d>, C<%ld>, C<%f>, you should use the | |
2549 | following macros for portability: | |
9dd9db0b | 2550 | |
c52f9dcd JH |
2551 | IVdf IV in decimal |
2552 | UVuf UV in decimal | |
2553 | UVof UV in octal | |
2554 | UVxf UV in hexadecimal | |
2555 | NVef NV %e-like | |
2556 | NVff NV %f-like | |
2557 | NVgf NV %g-like | |
9dd9db0b | 2558 | |
6b4667fc A |
2559 | These will take care of 64-bit integers and long doubles. |
2560 | For example: | |
2561 | ||
c52f9dcd | 2562 | printf("IV is %" IVdf "\n", iv); |
6b4667fc A |
2563 | |
2564 | The IVdf will expand to whatever is the correct format for the IVs. | |
9dd9db0b | 2565 | |
8908e76d JH |
2566 | If you are printing addresses of pointers, use UVxf combined |
2567 | with PTR2UV(); do not use C<%lx> or C<%p>. | |
2568 | ||
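The same mechanism exists in standard C99, which may make the macros above easier to picture. The sketch below uses C<PRIdMAX> and C<PRIxMAX> from F<inttypes.h> as stand-ins for C<IVdf> and C<UVxf>; they are not Perl macros, and C<format_signed> and C<format_hex> are hypothetical helper names. In both cases the macro expands to a string literal that the compiler joins to its neighbors, which is why whitespace around the macro is harmless.

```c
#include <inttypes.h>
#include <stdio.h>

/* PRIdMAX plays the role here that IVdf plays for an IV: it expands to
 * a string literal that is concatenated with the neighboring literals. */
static int format_signed(char *buf, size_t size, intmax_t value)
{
    return snprintf(buf, size, "value is %" PRIdMAX "\n", value);
}

/* Hexadecimal output of an unsigned value, analogous to UVxf. */
static int format_hex(char *buf, size_t size, uintmax_t value)
{
    return snprintf(buf, size, "0x%" PRIxMAX, value);
}
```
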
2569 | =head2 Pointer-To-Integer and Integer-To-Pointer | |
2570 | ||
2571 | Because pointer size does not necessarily equal integer size, | |
2572 | use the following macros to convert between the two. | |
2573 | ||
c52f9dcd JH |
2574 | PTR2UV(pointer) |
2575 | PTR2IV(pointer) | |
2576 | PTR2NV(pointer) | |
2577 | INT2PTR(pointertotype, integer) | |
8908e76d JH |
2578 | |
2579 | For example: | |
2580 | ||
c52f9dcd JH |
2581 | IV iv = ...; |
2582 | SV *sv = INT2PTR(SV*, iv); | |
8908e76d JH |
2583 | |
2584 | and | |
2585 | ||
c52f9dcd JH |
2586 | AV *av = ...; |
2587 | UV uv = PTR2UV(av); | |
8908e76d | 2588 | |
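For comparison, here is the same round trip in standard C, using C<uintptr_t> from F<stdint.h>. This is an analogy only: C<ptr_to_uint> and C<uint_to_ptr> are hypothetical helpers, and in XS code you should use the Perl macros listed above, since they work with Perl's own integer types.

```c
#include <stdint.h>

/* uintptr_t stands in for an integer type wide enough to hold a pointer. */
static uintptr_t ptr_to_uint(const void *p)
{
    return (uintptr_t)p;
}

/* Convert the integer back; the caller casts the result to the real type. */
static void *uint_to_ptr(uintptr_t u)
{
    return (void *)u;
}
```
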
0ca3a874 MHM |
2589 | =head2 Exception Handling |
2590 | ||
9b5c3821 | 2591 | There are a couple of macros to do very basic exception handling in XS |
10e2eb10 | 2592 | modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to |
9b5c3821 MHM |
2593 | be able to use these macros: |
2594 | ||
2595 | #define NO_XSLOCKS | |
2596 | #include "XSUB.h" | |
2597 | ||
2598 | You can use these macros if you call code that may croak, but you need | |
10e2eb10 | 2599 | to do some cleanup before giving control back to Perl. For example: |
0ca3a874 | 2600 | |
d7f8936a | 2601 | dXCPT; /* set up necessary variables */ |
0ca3a874 MHM |
2602 | |
2603 | XCPT_TRY_START { | |
2604 | code_that_may_croak(); | |
2605 | } XCPT_TRY_END | |
2606 | ||
2607 | XCPT_CATCH | |
2608 | { | |
2609 | /* do cleanup here */ | |
2610 | XCPT_RETHROW; | |
2611 | } | |
2612 | ||
2613 | Note that you always have to rethrow an exception that has been | |
10e2eb10 FC |
2614 | caught. Using these macros, it is not possible to just catch the |
2615 | exception and ignore it. If you have to ignore the exception, you | |
0ca3a874 MHM |
2616 | have to use one of the C<call_*> functions. |
2617 | ||
2618 | The advantage of using the above macros is that you don't have | |
2619 | to set up an extra function for C<call_*>, and that using these | |
2620 | macros is faster than using C<call_*>. | |
2621 | ||
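The try, clean up, rethrow shape of the example above can be modeled in standard C with C<setjmp> and C<longjmp>. The sketch below is an illustration of the control flow only, and is not how F<XSUB.h> implements the C<XCPT_*> macros; every name in it (C<demo_throw>, C<guarded_call>, C<run_guarded> and the static variables) is hypothetical.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf *current_env;   /* innermost active handler, or NULL */
static int pending_code;       /* code of the exception in flight */
static int cleanups_run;       /* counts cleanups, for illustration only */

/* Hypothetical stand-in for croak(): jump to the innermost handler. */
static void demo_throw(int code)
{
    if (current_env == NULL)
        abort();
    pending_code = code;
    longjmp(*current_env, 1);
}

static void code_that_may_throw(int fail)
{
    if (fail)
        demo_throw(42);
}

/* Try, clean up whether or not something was thrown, then rethrow. */
static void guarded_call(int fail)
{
    jmp_buf env;
    jmp_buf *outer = current_env;
    char *scratch = malloc(16);
    int caught = 0;

    current_env = &env;
    if (setjmp(env) == 0)
        code_that_may_throw(fail);
    else
        caught = 1;
    current_env = outer;            /* pop our handler */

    free(scratch);                  /* the cleanup */
    cleanups_run++;
    if (caught)
        demo_throw(pending_code);   /* rethrow to the outer handler */
}

/* Outermost handler: returns the propagated code, or 0 if none. */
static int run_guarded(int fail)
{
    jmp_buf env;
    int result = 0;

    current_env = &env;
    if (setjmp(env) == 0)
        guarded_call(fail);
    else
        result = pending_code;
    current_env = NULL;
    return result;
}
```

Note that C<guarded_call> restores the outer handler before it rethrows; otherwise the rethrow would jump back into its own, already finished, handler.
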
a422fd2d SC |
2622 | =head2 Source Documentation |
2623 | ||
2624 | There's an effort going on to document the internal functions and | |
2625 | automatically produce reference manuals from them - L<perlapi> is one | |
2626 | such manual which details all the functions which are available to XS | |
10e2eb10 | 2627 | writers. L<perlintern> is the autogenerated manual for the functions |
a422fd2d SC |
2628 | which are not part of the API and are supposedly for internal use only. |
2629 | ||
2630 | Source documentation is created by putting POD comments into the C | |
2631 | source, like this: | |
2632 | ||
2633 | /* | |
2634 | =for apidoc sv_setiv | |
2635 | ||
2636 | Copies an integer into the given SV. Does not handle 'set' magic. See | |
2637 | C<sv_setiv_mg>. | |
2638 | ||
2639 | =cut | |
2640 | */ | |
2641 | ||
2642 | Please try and supply some documentation if you add functions to the | |
2643 | Perl core. | |
2644 | ||
0d098d33 MHM |
2645 | =head2 Backwards compatibility |
2646 | ||
10e2eb10 FC |
2647 | The Perl API changes over time. New functions are |
2648 | added or the interfaces of existing functions are | |
2649 | changed. The C<Devel::PPPort> module tries to | |
0d098d33 MHM |
2650 | provide compatibility code for some of these changes, so XS writers don't |
2651 | have to code it themselves when supporting multiple versions of Perl. | |
2652 | ||
2653 | C<Devel::PPPort> generates a C header file F<ppport.h> that can also | |
10e2eb10 | 2654 | be run as a Perl script. To generate F<ppport.h>, run: |
0d098d33 MHM |
2655 | |
2656 | perl -MDevel::PPPort -eDevel::PPPort::WriteFile | |
2657 | ||
2658 | Besides checking existing XS code, the script can also be used to retrieve | |
2659 | compatibility information for various API calls using the C<--api-info> | |
10e2eb10 | 2660 | command line switch. For example: |
0d098d33 MHM |
2661 | |
2662 | % perl ppport.h --api-info=sv_magicext | |
2663 | ||
2664 | For details, see C<perldoc ppport.h>. | |
2665 | ||
a422fd2d SC |
2666 | =head1 Unicode Support |
2667 | ||
10e2eb10 | 2668 | Perl 5.6.0 introduced Unicode support. It's important for porters and XS |
a422fd2d SC |
2669 | writers to understand this support and make sure that the code they |
2670 | write does not corrupt Unicode data. | |
2671 | ||
2672 | =head2 What B<is> Unicode, anyway? | |
2673 | ||
10e2eb10 FC |
2674 | In the olden, less enlightened times, we all used to use ASCII. Most of |
2675 | us did, anyway. The big problem with ASCII is that it's American. Well, | |
a422fd2d | 2676 | no, that's not actually the problem; the problem is that it's not |
10e2eb10 | 2677 | particularly useful for people who don't use the Roman alphabet. What |
a422fd2d | 2678 | used to happen was that particular languages would stick their own |
10e2eb10 | 2679 | alphabet in the upper range of the sequence, between 128 and 255. Of |
a422fd2d SC |
2680 | course, we then ended up with plenty of variants that weren't quite |
2681 | ASCII, and the whole point of it being a standard was lost. | |
2682 | ||
2683 | Worse still, if you've got a language like Chinese or | |
2684 | Japanese that has hundreds or thousands of characters, then you really | |
2685 | can't fit them into a mere 256, so they had to forget about ASCII | |
2686 | altogether, and build their own systems using pairs of numbers to refer | |
2687 | to one character. | |
2688 | ||
2689 | To fix this, some people formed Unicode, Inc. and | |
2690 | produced a new character set containing all the characters you can | |
10e2eb10 FC |
2691 | possibly think of and more. There are several ways of representing these |
2692 | characters, and the one Perl uses is called UTF-8. UTF-8 uses | |
2693 | a variable number of bytes to represent a character. You can learn more | |
2575c402 | 2694 | about Unicode and Perl's Unicode model in L<perlunicode>. |
a422fd2d | 2695 | |
1e54db1a | 2696 | =head2 How can I recognise a UTF-8 string? |
a422fd2d | 2697 | |
10e2eb10 FC |
2698 | You can't. This is because UTF-8 data is stored in bytes just like |
2699 | non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types), | |
a422fd2d | 2700 | capital E with a grave accent, is represented by the two bytes |
10e2eb10 FC |
2701 | C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)> |
2702 | has that byte sequence as well. So you can't tell just by looking - this | |
a422fd2d SC |
2703 | is what makes Unicode input an interesting problem. |
2704 | ||
2575c402 JW |
2705 | In general, you either have to know what you're dealing with, or you |
2706 | have to guess. The API function C<is_utf8_string> can help; it'll tell | |
10e2eb10 FC |
2707 | you if a string contains only valid UTF-8 characters. However, it can't |
2708 | do the work for you. On a character-by-character basis, | |
49f4c4e4 | 2709 | C<is_utf8_char_buf> |
2575c402 | 2710 | will tell you whether the current character in a string is valid UTF-8. |
a422fd2d | 2711 | |
1e54db1a | 2712 | =head2 How does UTF-8 represent Unicode characters? |
a422fd2d | 2713 | |
1e54db1a | 2714 | As mentioned above, UTF-8 uses a variable number of bytes to store a |
10e2eb10 FC |
2715 | character. Characters with values 0...127 are stored in one |
2716 | byte, just like good ol' ASCII. Character 128 is stored as | |
2717 | C<v194.128>; this continues up to character 191, which is | |
2718 | C<v194.191>. Now we've run out of bits (191 is binary | |
2719 | C<10111111>) so we move on; 192 is C<v195.128>. And | |
a422fd2d SC |
2720 | so it goes on, moving to three bytes at character 2048. |
2721 | ||
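The byte values quoted above can be reproduced with a short encoder. This is a plain C sketch of the standard UTF-8 bit layout, not Perl's C<uvchr_to_utf8>; C<encode_utf8> is a hypothetical name, and it does no validation beyond a range check.

```c
#include <stddef.h>

/* Write the UTF-8 form of cp into out (room for 4 bytes) and return its
 * length, or 0 if cp is beyond U+10FFFF. Surrogates are not rejected. */
static size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding 128, 191 and 192 with this function gives the byte pairs 194.128, 194.191 and 195.128 mentioned in the text, and 2048 is the first value that needs three bytes.
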
1e54db1a | 2722 | Assuming you know you're dealing with a UTF-8 string, you can find out |
a422fd2d SC |
2723 | how long the first character in it is with the C<UTF8SKIP> macro: |
2724 | ||
2725 | char *utf = "\305\233\340\240\201"; | |
2726 | I32 len; | |
2727 | ||
2728 | len = UTF8SKIP(utf); /* len is 2 here */ | |
2729 | utf += len; | |
2730 | len = UTF8SKIP(utf); /* len is 3 here */ | |
2731 | ||
1e54db1a | 2732 | Another way to skip over characters in a UTF-8 string is to use |
a422fd2d | 2733 | C<utf8_hop>, which takes a string and a number of characters to skip |
10e2eb10 | 2734 | over. You're on your own about bounds checking, though, so don't use it |
a422fd2d SC |
2735 | lightly. |
2736 | ||
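Since bounds checking is left to the caller, it can help to see what a checked skip looks like. The following plain C sketch is not Perl's C<UTF8SKIP> or C<utf8_hop>; C<seq_len> and C<hop_bounded> are hypothetical names, and the length is derived from the lead byte using the standard UTF-8 layout.

```c
#include <stddef.h>

/* Length in bytes of the sequence introduced by this lead byte. Anything
 * that is not a valid lead byte counts as 1 so that a scan always advances. */
static size_t seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Skip nchars characters starting at s without reading past end.
 * Returns the new position, or NULL if the buffer is too short. */
static const unsigned char *hop_bounded(const unsigned char *s,
                                        const unsigned char *end,
                                        size_t nchars)
{
    while (nchars-- > 0) {
        size_t len;

        if (s >= end)
            return NULL;
        len = seq_len(*s);
        if ((size_t)(end - s) < len)
            return NULL;
        s += len;
    }
    return s;
}
```
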
1e54db1a | 2737 | All bytes in a multi-byte UTF-8 character will have the high bit set, |
3a2263fe RGS |
2738 | so you can test if you need to do something special with this |
2739 | character like this (the UTF8_IS_INVARIANT() is a macro that tests | |
9f98c7fe | 2740 | whether the byte is encoded as a single byte even in UTF-8): |
a422fd2d | 2741 | |
3a2263fe | 2742 | U8 *utf; |
4b88fb76 | 2743 | U8 *utf_end; /* 1 beyond buffer pointed to by utf */ |
3a2263fe | 2744 | UV uv; /* Note: a UV, not a U8, not a char */ |
95701e00 | 2745 | STRLEN len; /* length of character in bytes */ |
a422fd2d | 2746 | |
3a2263fe | 2747 | if (!UTF8_IS_INVARIANT(*utf)) |
1e54db1a | 2748 | /* Must treat this as UTF-8 */ |
4b88fb76 | 2749 | uv = utf8_to_uvchr_buf(utf, utf_end, &len); |
a422fd2d SC |
2750 | else |
2751 | /* OK to treat this character as a byte */ | |
2752 | uv = *utf; | |
2753 | ||
4b88fb76 | 2754 | You can also see in that example that we use C<utf8_to_uvchr_buf> to get the |
95701e00 | 2755 | value of the character; the inverse function C<uvchr_to_utf8> is available |
1e54db1a | 2756 | for putting a UV into UTF-8: |
a422fd2d | 2757 | |
3a2263fe | 2758 | if (!UTF8_IS_INVARIANT(uv)) |
a422fd2d | 2759 | /* Must treat this as UTF-8 */
95701e00 | 2760 | utf8 = uvchr_to_utf8(utf8, uv); |
a422fd2d SC |
2761 | else |
2762 | /* OK to treat this character as a byte */ | |
2763 | *utf8++ = uv; | |
2764 | ||
2765 | You B<must> convert characters to UVs using the above functions if | |
1e54db1a | 2766 | you're ever in a situation where you have to match UTF-8 and non-UTF-8 |
10e2eb10 | 2767 | characters. You may not skip over UTF-8 characters in this case. If you |
1e54db1a JH |
2768 | do this, you'll lose the ability to match hi-bit non-UTF-8 characters; |
2769 | for instance, if your UTF-8 string contains C<v195.136>, and you skip | |
2770 | that character, you can never match a C<chr(200)> in a non-UTF-8 string. | |
a422fd2d SC |
2771 | So don't do that! |
2772 | ||
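The following plain C sketch shows why the comparison has to happen on character values rather than on bytes. It does not use the Perl API: C<decode_utf8> and C<utf8_equals_bytes> are hypothetical helpers, the decoder is deliberately minimal, and the non-UTF-8 string is assumed to hold one character per byte.

```c
#include <stddef.h>

/* Decode one character from [s, end); return its byte length, or 0 if the
 * sequence is cut short or malformed. Overlong forms are not rejected. */
static size_t decode_utf8(const unsigned char *s, const unsigned char *end,
                          unsigned long *cp)
{
    size_t len, k;
    unsigned long v;

    if (s >= end)
        return 0;
    if (*s < 0x80) {
        *cp = *s;
        return 1;
    }
    if ((*s & 0xE0) == 0xC0)      { len = 2; v = *s & 0x1F; }
    else if ((*s & 0xF0) == 0xE0) { len = 3; v = *s & 0x0F; }
    else if ((*s & 0xF8) == 0xF0) { len = 4; v = *s & 0x07; }
    else
        return 0;
    if ((size_t)(end - s) < len)
        return 0;
    for (k = 1; k < len; k++) {
        if ((s[k] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (s[k] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Compare a UTF-8 string with a one-byte-per-character string,
 * character by character. Returns 1 if they hold the same characters. */
static int utf8_equals_bytes(const unsigned char *u, size_t ulen,
                             const unsigned char *b, size_t blen)
{
    const unsigned char *uend = u + ulen;
    size_t i = 0;

    while (u < uend && i < blen) {
        unsigned long cp;
        size_t len = decode_utf8(u, uend, &cp);

        if (len == 0 || cp != b[i])
            return 0;
        u += len;
        i++;
    }
    return u == uend && i == blen;
}
```

With this approach a UTF-8 encoded capital E with a grave accent matches the single byte 200, while two strings that merely share the same raw bytes do not match.
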
1e54db1a | 2773 | =head2 How does Perl store UTF-8 strings? |
a422fd2d SC |
2774 | |
2775 | Currently, Perl deals with Unicode strings and non-Unicode strings | |
10e2eb10 FC |
2776 | slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the |
2777 | string is internally encoded as UTF-8. Without it, the byte value is the | |
2575c402 | 2778 | codepoint number and vice versa (in other words, the string is encoded |
e1b711da | 2779 | as iso-8859-1, but C<use feature 'unicode_strings'> is needed to get iso-8859-1 |
c31cc9fc FC |
2780 | semantics). This flag is only meaningful if the SV is C<SvPOK> |
2781 | or immediately after stringification via C<SvPV> or a similar | |
2782 | macro. You can check and manipulate this flag with the | |
2575c402 | 2783 | following macros: |
a422fd2d SC |
2784 | |
2785 | SvUTF8(sv) | |
2786 | SvUTF8_on(sv) | |
2787 | SvUTF8_off(sv) | |
2788 | ||
2789 | This flag has an important effect on Perl's treatment of the string: if | |
2790 | Unicode data is not properly distinguished, regular expressions, | |
2791 | C<length>, C<substr> and other string handling operations will have | |
2792 | undesirable results. | |
2793 | ||
2794 | The problem comes when you have, for instance, a string that isn't | |
2575c402 | 2795 | flagged as UTF-8, and contains a byte sequence that could be UTF-8 - |
1e54db1a | 2796 | especially when combining non-UTF-8 and UTF-8 strings. |
a422fd2d SC |
2797 | |
2798 | Never forget that the C<SVf_UTF8> flag is separate to the PV value; you | |
2799 | need to be sure you don't accidentally knock it off while you're | |
10e2eb10 | 2800 | manipulating SVs. More specifically, you cannot expect to do this: |
a422fd2d SC |
2801 | |
2802 | SV *sv; | |
2803 | SV *nsv; | |
2804 | STRLEN len; | |
2805 | char *p; | |
2806 | ||
2807 | p = SvPV(sv, len); | |
2808 | frobnicate(p); | |
2809 | nsv = newSVpvn(p, len); | |
2810 | ||
2811 | The C<char*> string does not tell you the whole story, and you can't | |
10e2eb10 | 2812 | copy or reconstruct an SV just by copying the string value. Check if the |
c31cc9fc FC |
2813 | old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act |
2814 | accordingly: | |
a422fd2d SC |
2815 | |
2816 | p = SvPV(sv, len); | |
2817 | frobnicate(p); | |
2818 | nsv = newSVpvn(p, len); | |
2819 | if (SvUTF8(sv)) | |
2820 | SvUTF8_on(nsv); | |
2821 | ||
2822 | In fact, your C<frobnicate> function should be made aware of whether or | |
1e54db1a | 2823 | not it's dealing with UTF-8 data, so that it can handle the string |
a422fd2d SC |
2824 | appropriately. |
2825 | ||
3a2263fe | 2826 | Since copying the string data of an SV passed to an XS function is |
2575c402 | 2827 | not enough to carry over the UTF8 flag, passing just a C<char *> to |
3a2263fe RGS |
2828 | an XS function is even less adequate. |
2829 | ||
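The point that the flag must travel with the bytes can be modeled outside Perl. In the plain C sketch below, C<flagged_str>, C<char_count> and C<flagged_copy> are hypothetical; the struct is a toy model of "string plus flag", not the layout of an SV. The same two bytes count as one character with the flag set and as two without it, which is the kind of wrong answer C<length> would give if the flag were lost in a copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a string value: bytes plus an "is UTF-8" flag. */
typedef struct flagged_str {
    unsigned char *bytes;
    size_t len;
    int is_utf8;
} flagged_str;

/* Character count: the flag decides how the bytes are read. */
static size_t char_count(const flagged_str *s)
{
    size_t i, n = 0;

    if (!s->is_utf8)
        return s->len;
    for (i = 0; i < s->len; i++)
        if ((s->bytes[i] & 0xC0) != 0x80)   /* not a continuation byte */
            n++;
    return n;
}

/* Copy the bytes and the flag together.
 * Returns 0 on success, -1 if allocation fails. */
static int flagged_copy(flagged_str *dst, const flagged_str *src)
{
    dst->bytes = malloc(src->len ? src->len : 1);
    if (dst->bytes == NULL)
        return -1;
    memcpy(dst->bytes, src->bytes, src->len);
    dst->len = src->len;
    dst->is_utf8 = src->is_utf8;
    return 0;
}
```
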
1e54db1a | 2830 | =head2 How do I convert a string to UTF-8? |
a422fd2d | 2831 | |
2575c402 | 2832 | If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade |
10e2eb10 | 2833 | one of the strings to UTF-8. If you've got an SV, the easiest way to do |
2575c402 | 2834 | this is: |
a422fd2d SC |
2835 | |
2836 | sv_utf8_upgrade(sv); | |
2837 | ||
2838 | However, you must not do this, for example: | |
2839 | ||
2840 | if (!SvUTF8(left)) | |
2841 | sv_utf8_upgrade(left); | |
2842 | ||
2843 | If you do this in a binary operator, you will actually change one of the | |
b1866b2d | 2844 | strings that came into the operator, and, while it shouldn't be noticeable |
2575c402 | 2845 | by the end user, it can cause problems in deficient code. |
a422fd2d | 2846 | |
1e54db1a | 2847 | Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its |
10e2eb10 FC |
2848 | string argument. This is useful for having the data available for |
2849 | comparisons and so on, without harming the original SV. There's also | |
a422fd2d SC |
2850 | C<utf8_to_bytes> to go the other way, but naturally, this will fail if |
2851 | the string contains any characters above 255 that can't be represented | |
2852 | in a single byte. | |
2853 | ||
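What "a UTF-8-encoded copy" means, and why the reverse direction can fail, is sketched below in plain C. These are not Perl's C<bytes_to_utf8> and C<utf8_to_bytes>; C<latin1_to_utf8_copy> and C<utf8_to_latin1_copy> are hypothetical names, and the input to the first function is assumed to hold one character per byte.

```c
#include <stdlib.h>

/* Return a newly allocated UTF-8 copy of a one-byte-per-character string.
 * The caller frees the result; the original is left untouched. */
static unsigned char *latin1_to_utf8_copy(const unsigned char *s, size_t len,
                                          size_t *out_len)
{
    unsigned char *out = malloc(2 * len + 1);   /* worst case: 2 bytes each */
    size_t i, n = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[n++] = s[i];
        } else {
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    out[n] = '\0';
    *out_len = n;
    return out;
}

/* The reverse: returns NULL if the input is malformed or contains a
 * character above 255, which cannot be stored in a single byte. */
static unsigned char *utf8_to_latin1_copy(const unsigned char *s, size_t len,
                                          size_t *out_len)
{
    unsigned char *out = malloc(len + 1);
    size_t i = 0, n = 0;

    if (out == NULL)
        return NULL;
    while (i < len) {
        if (s[i] < 0x80) {
            out[n++] = s[i++];
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            out[n++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            free(out);
            return NULL;
        }
    }
    out[n] = '\0';
    *out_len = n;
    return out;
}
```
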
2854 | =head2 Is there anything else I need to know? | |
2855 | ||
10e2eb10 | 2856 | Not really. Just remember these things: |
a422fd2d SC |
2857 | |
2858 | =over 3 | |
2859 | ||
2860 | =item * | |
2861 | ||
10e2eb10 | 2862 | There's no way to tell if a string is UTF-8 or not. You can tell if an SV |
c31cc9fc FC |
2863 | is UTF-8 by looking at its C<SvUTF8> flag after stringifying it |
2864 | with C<SvPV> or a similar macro. Don't forget to set the flag if | |
10e2eb10 | 2865 | something should be UTF-8. Treat the flag as part of the PV, even though |
a422fd2d SC |
2866 | it's not - if you pass on the PV to somewhere, pass on the flag too. |
2867 | ||
2868 | =item * | |
2869 | ||
4b88fb76 | 2870 | If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value, |
3a2263fe | 2871 | unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>. |
a422fd2d SC |
2872 | |
2873 | =item * | |
2874 | ||
1e54db1a | 2875 | When writing a character C<uv> to a UTF-8 string, B<always> use |
95701e00 | 2876 | C<uvchr_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case |
3a2263fe | 2877 | you can use C<*s = uv>. |
a422fd2d SC |
2878 | |
2879 | =item * | |
2880 | ||
10e2eb10 FC |
2881 | Mixing UTF-8 and non-UTF-8 strings is |
2882 | tricky. Use C<bytes_to_utf8> to get | |
2bbc8d55 | 2883 | a new string which is UTF-8 encoded, and then combine them. |
a422fd2d SC |
2884 | |
2885 | =back | |
2886 | ||
53e06cf0 SC |
2887 | =head1 Custom Operators |
2888 | ||
2a0fd0f1 | 2889 | Custom operator support is an experimental feature that allows you to |
10e2eb10 | 2890 | define your own ops. This is primarily to allow the building of |
53e06cf0 SC |
2891 | interpreters for other languages in the Perl core, but it also allows |
2892 | optimizations through the creation of "macro-ops" (ops which perform the | |
2893 | functions of multiple ops which are usually executed together, such as | |
1aa6ea50 | 2894 | C<gvsv, gvsv, add>.) |
53e06cf0 | 2895 | |
10e2eb10 | 2896 | This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl |
53e06cf0 | 2897 | core does not "know" anything special about this op type, and so it will |
10e2eb10 | 2898 | not be involved in any optimizations. This also means that you can |
53e06cf0 SC |
2899 | define your custom ops to be any op structure - unary, binary, list and |
2900 | so on - you like. | |
2901 | ||
10e2eb10 FC |
2902 | It's important to know what custom operators won't do for you. They |
2903 | won't let you add new syntax to Perl, directly. They won't even let you | |
2904 | add new keywords, directly. In fact, they won't change the way Perl | |
2905 | compiles a program at all. You have to do those changes yourself, after | |
2906 | Perl has compiled the program. You do this either by manipulating the op | |
53e06cf0 SC |
2907 | tree using a C<CHECK> block and the C<B::Generate> module, or by adding |
2908 | a custom peephole optimizer with the C<optimize> module. | |
2909 | ||
2910 | When you do this, you replace ordinary Perl ops with custom ops by | |
407f86e1 | 2911 | creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own |
10e2eb10 FC |
2912 | PP function. This should be defined in XS code, and should look like |
2913 | the PP ops in C<pp_*.c>. You are responsible for ensuring that your op | |
53e06cf0 SC |
2914 | takes the appropriate number of values from the stack, and you are |
2915 | responsible for adding stack marks if necessary. | |
2916 | ||
2917 | You should also "register" your op with the Perl interpreter so that it | |
10e2eb10 | 2918 | can produce sensible error and warning messages. Since it is possible to |
53e06cf0 | 2919 | have multiple custom ops within the one "logical" op type C<OP_CUSTOM>, |
9733086d | 2920 | Perl uses the value of C<< o->op_ppaddr >> to determine which custom op |
10e2eb10 | 2921 | it is dealing with. You should create an C<XOP> structure for each |
9733086d BM |
2922 | ppaddr you use, set the properties of the custom op with |
2923 | C<XopENTRY_set>, and register the structure against the ppaddr using | |
10e2eb10 | 2924 | C<Perl_custom_op_register>. A trivial example might look like: |
9733086d BM |
2925 | |
2926 | static XOP my_xop; | |
2927 | static OP *my_pp(pTHX); | |
2928 | ||
2929 | BOOT: | |
2930 | XopENTRY_set(&my_xop, xop_name, "myxop"); | |
2931 | XopENTRY_set(&my_xop, xop_desc, "Useless custom op"); | |
2932 | Perl_custom_op_register(aTHX_ my_pp, &my_xop); | |
2933 | ||
2934 | The available fields in the structure are: | |
2935 | ||
2936 | =over 4 | |
2937 | ||
2938 | =item xop_name | |
2939 | ||
10e2eb10 | 2940 | A short name for your op. This will be included in some error messages, |
9733086d BM |
2941 | and will also be returned as C<< $op->name >> by the L<B|B> module, so |
2942 | it will appear in the output of modules like L<B::Concise|B::Concise>. | |
2943 | ||
2944 | =item xop_desc | |
2945 | ||
2946 | A short description of the function of the op. | |
2947 | ||
2948 | =item xop_class | |
2949 | ||
10e2eb10 | 2950 | Which of the various C<*OP> structures this op uses. This should be one of |
9733086d BM |
2951 | the C<OA_*> constants from F<op.h>, namely |
2952 | ||
2953 | =over 4 | |
2954 | ||
2955 | =item OA_BASEOP | |
2956 | ||
2957 | =item OA_UNOP | |
2958 | ||
2959 | =item OA_BINOP | |
2960 | ||
2961 | =item OA_LOGOP | |
2962 | ||
2963 | =item OA_LISTOP | |
2964 | ||
2965 | =item OA_PMOP | |
2966 | ||
2967 | =item OA_SVOP | |
2968 | ||
2969 | =item OA_PADOP | |
2970 | ||
2971 | =item OA_PVOP_OR_SVOP | |
2972 | ||
10e2eb10 | 2973 | This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because |
9733086d BM |
2974 | the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead. |
2975 | ||
2976 | =item OA_LOOP | |
2977 | ||
2978 | =item OA_COP | |
2979 | ||
2980 | =back | |
2981 | ||
2982 | The other C<OA_*> constants should not be used. | |
2983 | ||
2984 | =item xop_peep | |
2985 | ||
2986 | This member is of type C<Perl_cpeep_t>, which expands to C<void | |
10e2eb10 | 2987 | (*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function |
9733086d | 2988 | will be called from C<Perl_rpeep> when ops of this type are encountered |
10e2eb10 | 2989 | by the peephole optimizer. I<o> is the OP that needs optimizing; |
9733086d BM |
2990 | I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>. |
2991 | ||
2992 | =back | |
53e06cf0 | 2993 | |
e7d4c058 | 2994 | C<B::Generate> directly supports the creation of custom ops by name. |
53e06cf0 | 2995 | |
954c1994 | 2996 | =head1 AUTHORS |
e89caa19 | 2997 | |
954c1994 | 2998 | Until May 1997, this document was maintained by Jeff Okamoto |
9b5bb84f SB |
2999 | E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl |
3000 | itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>. | |
cb1a09d0 | 3001 | |
954c1994 GS |
3002 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
3003 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil | |
3004 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, | |
3005 | Stephen McCamant, and Gurusamy Sarathy. | |
cb1a09d0 | 3006 | |
954c1994 | 3007 | =head1 SEE ALSO |
cb1a09d0 | 3008 | |
ba555bf5 | 3009 | L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed> |