Commit | Line | Data |
---|---|---|
a0d0e21e LW |
1 | =head1 NAME |
2 | ||
954c1994 | 3 | perlguts - Introduction to the Perl API |
a0d0e21e LW |
4 | |
5 | =head1 DESCRIPTION | |
6 | ||
b3b6085d PP |
7 | This document attempts to describe how to use the Perl API, as well as |
8 | containing some information on the basic workings of the Perl core. It is far | |
9 | from complete and probably contains many errors. Please refer any | |
10 | questions or comments to the author below. | |
a0d0e21e | 11 | |
0a753a76 | 12 | =head1 Variables |
13 | ||
5f05dabc | 14 | =head2 Datatypes |
a0d0e21e LW |
15 | |
16 | Perl has three typedefs that handle Perl's three main data types: | |
17 | ||
18 | SV Scalar Value | |
19 | AV Array Value | |
20 | HV Hash Value | |
21 | ||
d1b91892 | 22 | Each typedef has specific routines that manipulate the various data types. |
a0d0e21e LW |
23 | |
24 | =head2 What is an "IV"? | |
25 | ||
954c1994 | 26 | Perl uses a special typedef IV which is a simple signed integer type that is |
5f05dabc | 27 | guaranteed to be large enough to hold a pointer (as well as an integer). |
954c1994 | 28 | Additionally, there is the UV, which is simply an unsigned IV. |
a0d0e21e | 29 | |
d1b91892 | 30 | Perl also uses two special typedefs, I32 and I16, which will always be at |
954c1994 GS |
31 | least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
32 | as well.) | |
a0d0e21e | 33 | |
54310121 | 34 | =head2 Working with SVs |
a0d0e21e LW |
35 | |
36 | An SV can be created and loaded with one command. There are four types of | |
a7dfe00a MS |
37 | values that can be loaded: an integer value (IV), a double (NV), |
38 | a string (PV), and another scalar (SV). | |
a0d0e21e | 39 | |
9da1e3b5 | 40 | The six routines are: |
a0d0e21e LW |
41 | |
42 | SV* newSViv(IV); | |
43 | SV* newSVnv(double); | |
08105a92 GS |
44 | SV* newSVpv(const char*, int); |
45 | SV* newSVpvn(const char*, int); | |
46fc3d4c | 46 | SV* newSVpvf(const char*, ...); |
a0d0e21e LW |
47 | SV* newSVsv(SV*); |
48 | ||
deb3007b | 49 | To change the value of an I<already-existing> SV, there are eight routines: |
a0d0e21e LW |
50 | |
51 | void sv_setiv(SV*, IV); | |
deb3007b | 52 | void sv_setuv(SV*, UV); |
a0d0e21e | 53 | void sv_setnv(SV*, double); |
08105a92 GS |
54 | void sv_setpv(SV*, const char*); |
55 | void sv_setpvn(SV*, const char*, int); | |
46fc3d4c | 56 | void sv_setpvf(SV*, const char*, ...); |
9abd00ed | 57 | void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
a0d0e21e LW |
58 | void sv_setsv(SV*, SV*); |
59 | ||
60 | Notice that you can choose to specify the length of the string to be | |
9da1e3b5 MUN |
61 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
62 | allow Perl to calculate the length by using C<sv_setpv> or by specifying | |
63 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will | |
64 | determine the string's length by using C<strlen>, which depends on the | |
9abd00ed GS |
65 | string terminating with a NUL character. |
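Why the explicit length matters can be seen in plain C, without the Perl API. The sketch below is a standalone illustration, not Perl code: the helper C<implicit_len> is hypothetical and merely stands in for the C<strlen> that Perl applies when you let it measure the string (C<sv_setpv>, or C<newSVpv> with a length of 0). A buffer with an embedded NUL is cut short by C<strlen>, while an explicit length, such as the one you would pass to C<sv_setpvn> or C<newSVpvn>, covers every byte.

```c
#include <assert.h>
#include <string.h>

/* 7 bytes of real data: "ab", an embedded NUL, then "cdef".
 * The string literal appends one more NUL, so sizeof payload == 8. */
static const char payload[] = "ab\0cdef";
static const size_t payload_len = sizeof payload - 1;   /* 7 */

/* Hypothetical helper standing in for "let Perl measure the string"
 * (sv_setpv, or newSVpv with a length of 0): that path uses strlen(). */
static size_t implicit_len(const char *s)
{
    return strlen(s);   /* stops at the first NUL, so only "ab" is seen */
}
```

Passing C<payload_len> explicitly would keep all seven bytes; relying on C<implicit_len> would silently keep only two.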
66 | ||
67 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the | |
68 | formatted output becomes the value. | |
69 | ||
70 | C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify | |
71 | either a pointer to a variable argument list or the address and length of | |
72 | an array of SVs. The last argument points to a boolean; on return, if that | |
73 | boolean is true, then locale-specific information has been used to format | |
c2611fb3 | 74 | the string, and the string's contents are therefore untrustworthy (see |
9abd00ed GS |
75 | L<perlsec>). This pointer may be NULL if that information is not |
76 | important. Note that this function requires you to specify the length of | |
77 | the format. | |
78 | ||
9da1e3b5 MUN |
79 | The C<sv_set*()> functions are not generic enough to operate on values |
80 | that have "magic". See L<Magic Virtual Tables> later in this document. | |
a0d0e21e | 81 | |
a3cb178b GS |
82 | All SVs that contain strings should be terminated with a NUL character. |
83 | If it is not NUL-terminated there is a risk of | |
5f05dabc | 84 | core dumps and corruptions from code which passes the string to C |
85 | functions or system calls which expect a NUL-terminated string. | |
86 | Perl's own functions typically add a trailing NUL for this reason. | |
87 | Nevertheless, you should be very careful when you pass a string stored | |
88 | in an SV to a C function or system call. | |
89 | ||
a0d0e21e LW |
90 | To access the actual value that an SV points to, you can use the macros: |
91 | ||
92 | SvIV(SV*) | |
954c1994 | 93 | SvUV(SV*) |
a0d0e21e LW |
94 | SvNV(SV*) |
95 | SvPV(SV*, STRLEN len) | |
1fa8b10d | 96 | SvPV_nolen(SV*) |
a0d0e21e | 97 | |
954c1994 | 98 | which will automatically coerce the actual scalar type into an IV, UV, double, |
a0d0e21e LW |
99 | or string. |
100 | ||
101 | In the C<SvPV> macro, the length of the string returned is placed into the | |
1fa8b10d JD |
102 | variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do |
103 | not care what the length of the data is, use the C<SvPV_nolen> macro. | |
104 | Historically the C<SvPV> macro with the global variable C<PL_na> has been | |
105 | used in this case. But that can be quite inefficient because C<PL_na> must | |
106 | be accessed in thread-local storage in threaded Perl. In any case, remember | |
107 | that Perl allows arbitrary strings of data that may both contain NULs and | |
108 | might not be terminated by a NUL. | |
a0d0e21e | 109 | |
ce2f5d8f KA |
110 | Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), |
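111 | len);>. Because C does not specify the order in which function arguments are evaluated, C<len> may be read before C<SvPV> has set it. It might work with your compiler, but it won't work for everyone. | |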
112 | Break this sort of statement up into separate assignments: | |
113 | ||
b2f5ed49 | 114 | SV *s; |
ce2f5d8f KA |
115 | STRLEN len; |
116 | char * ptr; | |
b2f5ed49 | 117 | ptr = SvPV(s, len); |
ce2f5d8f KA |
118 | foo(ptr, len); |
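The hazard can be reproduced outside Perl. In this standalone sketch, the macro C<FAKE_SVPV> and the function C<consume> are hypothetical stand-ins: like C<SvPV>, the macro assigns the length as a side effect and yields the pointer. The safe form performs that assignment in its own statement, so the length is set before it is read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for SvPV(s, len): stores the length into `len`
 * as a side effect and evaluates to the string pointer. */
#define FAKE_SVPV(s, len) ((len) = strlen(s), (s))

/* Hypothetical consumer that needs both the pointer and the length. */
static size_t consume(const char *p, size_t n)
{
    assert(strlen(p) == n);     /* only holds if n was set before the call */
    return n;
}

/* Safe pattern: the assignment is a separate statement, so `len` is
 * guaranteed to hold the length before consume() reads it. */
static size_t safe_call(const char *s)
{
    size_t len = 0;
    const char *ptr;
    ptr = FAKE_SVPV(s, len);
    return consume(ptr, len);
}

/* Unsafe form, deliberately not compiled: consume(FAKE_SVPV(s, len), len);
 * The two arguments are unsequenced, so the second `len` may be read
 * before the macro has assigned it. */
```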
119 | ||
07fa94a1 | 120 | If you want to know if the scalar value is TRUE, you can use: |
a0d0e21e LW |
121 | |
122 | SvTRUE(SV*) | |
123 | ||
124 | Although Perl will automatically grow strings for you, if you need to force | |
125 | Perl to allocate more memory for your SV, you can use the macro | |
126 | ||
127 | SvGROW(SV*, STRLEN newlen) | |
128 | ||
129 | which will determine if more memory needs to be allocated. If so, it will | |
130 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not | |
5f05dabc | 131 | decrease, the allocated memory of an SV and that it does not automatically |
132 | add a byte for a trailing NUL (Perl's own string functions typically do | |
8ebc5c01 | 133 | C<SvGROW(sv, len + 1)>). |
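The growth rule described above can be modeled with an ordinary C buffer. This is a toy sketch, not Perl's SV layout: C<struct toybuf>, C<toy_grow>, and C<toy_cat> are hypothetical names, with C<cur> and C<len> playing the roles of C<SvCUR> and C<SvLEN>. The grow function only ever enlarges the allocation and adds no room for the NUL itself, so the append function reserves the extra byte the way C<SvGROW(sv, len + 1)> does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string buffer: cur ~ SvCUR (string length), len ~ SvLEN (bytes allocated). */
struct toybuf {
    char  *pv;
    size_t cur;
    size_t len;
};

/* Modeled on the SvGROW rule: grows only when needed, never shrinks, and
 * does not add a byte for the trailing NUL. Returns the (possibly moved)
 * buffer, or NULL if realloc fails. */
static char *toy_grow(struct toybuf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;               /* asking for less is a no-op */
}

/* Append n bytes, reserving the +1 for the NUL explicitly. */
static int toy_cat(struct toybuf *b, const char *s, size_t n)
{
    if (toy_grow(b, b->cur + n + 1) == NULL)
        return -1;
    memcpy(b->pv + b->cur, s, n);
    b->cur += n;
    b->pv[b->cur] = '\0';
    return 0;
}
```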
a0d0e21e LW |
134 | |
135 | If you have an SV and want to know what kind of data Perl thinks is stored | |
136 | in it, you can use the following macros to check the type of SV you have. | |
137 | ||
138 | SvIOK(SV*) | |
139 | SvNOK(SV*) | |
140 | SvPOK(SV*) | |
141 | ||
142 | You can get and set the current length of the string stored in an SV with | |
143 | the following macros: | |
144 | ||
145 | SvCUR(SV*) | |
146 | SvCUR_set(SV*, I32 val) | |
147 | ||
cb1a09d0 AD |
148 | You can also get a pointer to the end of the string stored in the SV |
149 | with the macro: | |
150 | ||
151 | SvEND(SV*) | |
152 | ||
153 | But note that these last three macros are valid only if C<SvPOK()> is true. | |
a0d0e21e | 154 | |
d1b91892 AD |
155 | If you want to append something to the end of the string stored in an C<SV*>,
156 | you can use the following functions: | |
157 | ||
08105a92 | 158 | void sv_catpv(SV*, const char*); |
e65f3abd | 159 | void sv_catpvn(SV*, const char*, STRLEN); |
46fc3d4c | 160 | void sv_catpvf(SV*, const char*, ...); |
9abd00ed | 161 | void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
d1b91892 AD |
162 | void sv_catsv(SV*, SV*); |
163 | ||
164 | The first function calculates the length of the string to be appended by | |
165 | using C<strlen>. In the second, you specify the length of the string | |
46fc3d4c | 166 | yourself. The third function processes its arguments like C<sprintf> and |
9abd00ed GS |
167 | appends the formatted output. The fourth function works like C<vsprintf>. |
168 | You can specify the address and length of an array of SVs instead of the | |
169 | va_list argument. The fifth function extends the string stored in the first | |
170 | SV with the string stored in the second SV. It also forces the second SV | |
171 | to be interpreted as a string. | |
172 | ||
173 | The C<sv_cat*()> functions are not generic enough to operate on values that | |
174 | have "magic". See L<Magic Virtual Tables> later in this document. | |
d1b91892 | 175 | |
a0d0e21e LW |
176 | If you know the name of a scalar variable, you can get a pointer to its SV |
177 | by using the following: | |
178 | ||
4929bf7b | 179 | SV* get_sv("package::varname", FALSE); |
a0d0e21e LW |
180 | |
181 | This returns NULL if the variable does not exist. | |
182 | ||
d1b91892 | 183 | If you want to know if this variable (or any other SV) is actually C<defined>, |
a0d0e21e LW |
184 | you can call: |
185 | ||
186 | SvOK(SV*) | |
187 | ||
9cde0e7f | 188 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its |
a0d0e21e LW |
189 | address can be used whenever an C<SV*> is needed. |
190 | ||
9cde0e7f GS |
191 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean |
192 | TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can | |
a0d0e21e LW |
193 | be used whenever an C<SV*> is needed. |
194 | ||
9cde0e7f | 195 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
a0d0e21e LW |
196 | Take this code: |
197 | ||
198 | SV* sv = (SV*) 0; | |
199 | if (I-am-to-return-a-real-value) { | |
200 | sv = sv_2mortal(newSViv(42)); | |
201 | } | |
202 | sv_setsv(ST(0), sv); | |
203 | ||
204 | This code tries to return a new SV (which contains the value 42) if it should | |
04343c6d | 205 | return a real value, or undef otherwise. Instead it has returned a NULL |
a0d0e21e | 206 | pointer which, somewhere down the line, will cause a segmentation violation, |
9cde0e7f | 207 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first |
5f05dabc | 208 | line and all will be well. |
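The difference between a NULL pointer and a real "undef" object can be modeled in plain C. In this toy sketch, C<struct toyscalar>, C<toy_sv_undef>, C<toy_setsv>, and C<toy_return_value> are hypothetical: the sentinel plays the role of C<PL_sv_undef>, and the copy function, like C<sv_setsv>, dereferences its source. Starting from the sentinel instead of C<0> means the copy always has something valid to read.

```c
#include <assert.h>
#include <stddef.h>

/* Toy scalar: a "defined" flag and an integer (not Perl's SV). */
struct toyscalar { int defined; long iv; };

/* Sentinel standing in for PL_sv_undef: a real object whose address is
 * always valid, unlike a NULL pointer. */
static struct toyscalar toy_sv_undef = { 0, 0 };

/* Like sv_setsv, this copies src into dst and therefore dereferences src. */
static void toy_setsv(struct toyscalar *dst, const struct toyscalar *src)
{
    assert(src != NULL);    /* a NULL src is the crash described in the text */
    *dst = *src;
}

/* The corrected pattern: start from the sentinel, not from a null pointer. */
static void toy_return_value(struct toyscalar *out, int have_value)
{
    struct toyscalar fortytwo = { 1, 42 };
    const struct toyscalar *sv = &toy_sv_undef;
    if (have_value)
        sv = &fortytwo;
    toy_setsv(out, sv);
}
```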
a0d0e21e LW |
209 | |
210 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this | |
3fe9a6f1 | 211 | call is not necessary (see L<Reference Counts and Mortality>). |
a0d0e21e | 212 | |
94dde4fb SC |
213 | =head2 Offsets |
214 | ||
215 | Perl provides the function C<sv_chop> to efficiently remove characters | |
216 | from the beginning of a string; you give it an SV and a pointer to | |
217 | somewhere inside the PV, and it discards everything before the | |
218 | pointer. The efficiency comes by means of a little hack: instead of | |
219 | actually removing the characters, C<sv_chop> sets the flag C<OOK> | |
220 | (offset OK) to signal to other functions that the offset hack is in | |
221 | effect, and it puts the number of bytes chopped off into the IV field | |
222 | of the SV. It then moves the PV pointer (called C<SvPVX>) forward that | |
223 | many bytes, and adjusts C<SvCUR> and C<SvLEN>. | |
224 | ||
225 | Hence, at this point, the start of the buffer that we allocated lives | |
226 | at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing | |
227 | into the middle of this allocated storage. | |
228 | ||
229 | This is best demonstrated by example: | |
230 | ||
231 | % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)' | |
232 | SV = PVIV(0x8128450) at 0x81340f0 | |
233 | REFCNT = 1 | |
234 | FLAGS = (POK,OOK,pPOK) | |
235 | IV = 1 (OFFSET) | |
236 | PV = 0x8135781 ( "1" . ) "2345"\0 | |
237 | CUR = 4 | |
238 | LEN = 5 | |
239 | ||
240 | Here the number of bytes chopped off (1) is put into IV, and | |
241 | C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The | |
242 | portion of the string between the "real" and the "fake" beginnings is | |
243 | shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect | |
244 | the fake beginning, not the real one. | |
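The bookkeeping can be mirrored in a small C model. This is a toy sketch under stated assumptions, not Perl's struct: C<struct toysv>, C<toy_chop>, and C<toy_real_start> are hypothetical, and their fields only imitate C<SvPVX>, C<SvCUR>, C<SvLEN>, the IV slot, and the C<OOK> flag. As described above, chopping advances the string pointer and records the offset instead of moving any bytes; repeated chops add to the stored offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the OOK ("offset OK") trick. */
struct toysv {
    char  *pvx;     /* current start of the string (like SvPVX)        */
    size_t cur;     /* string length (like SvCUR)                      */
    size_t len;     /* bytes available from pvx onward (like SvLEN)    */
    size_t iv;      /* bytes chopped off the front while ook is set    */
    int    ook;     /* the OOK flag                                    */
};

/* In the spirit of sv_chop: discard everything before ptr without
 * copying, by advancing pvx and recording the offset. */
static void toy_chop(struct toysv *sv, char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pvx && ptr <= sv->pvx + sv->cur);
    delta = (size_t)(ptr - sv->pvx);
    if (!sv->ook) {         /* first chop: start recording an offset */
        sv->iv  = 0;
        sv->ook = 1;
    }
    sv->iv  += delta;
    sv->pvx += delta;
    sv->cur -= delta;
    sv->len -= delta;
}

/* Where the allocated buffer really begins: SvPVX(sv) - SvIV(sv). */
static char *toy_real_start(const struct toysv *sv)
{
    return sv->pvx - sv->iv;
}
```

Starting from C<"12345"> with six bytes allocated and chopping one character reproduces the C<Devel::Peek> figures shown above: an offset of 1, C<CUR> of 4, and C<LEN> of 5.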
245 | ||
d1b91892 | 246 | =head2 What's Really Stored in an SV? |
a0d0e21e LW |
247 | |
248 | Recall that the usual method of determining the type of scalar you have is | |
5f05dabc | 249 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
d1b91892 | 250 | these macros will usually return TRUE, and calling the C<Sv*V> |
a0d0e21e LW |
251 | macros will do the appropriate conversion of string to integer/double or |
252 | integer/double to string. | |
253 | ||
254 | If you I<really> need to know if you have an integer, double, or string | |
255 | pointer in an SV, you can use the following three macros instead: | |
256 | ||
257 | SvIOKp(SV*) | |
258 | SvNOKp(SV*) | |
259 | SvPOKp(SV*) | |
260 | ||
261 | These will tell you if you truly have an integer, double, or string pointer | |
d1b91892 | 262 | stored in your SV. The "p" stands for private. |
a0d0e21e | 263 | |
07fa94a1 | 264 | In general, though, it's best to use the C<Sv*V> macros. |
a0d0e21e | 265 | |
54310121 | 266 | =head2 Working with AVs |
a0d0e21e | 267 | |
07fa94a1 JO |
268 | There are two ways to create and load an AV. The first method creates an |
269 | empty AV: | |
a0d0e21e LW |
270 | |
271 | AV* newAV(); | |
272 | ||
54310121 | 273 | The second method both creates the AV and initially populates it with SVs: |
a0d0e21e LW |
274 | |
275 | AV* av_make(I32 num, SV **ptr); | |
276 | ||
5f05dabc | 277 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
54310121 | 278 | AV has been created, the SVs can be destroyed, if so desired, because C<av_make> stores copies of them. |
a0d0e21e | 279 | |
54310121 | 280 | Once the AV has been created, the following operations are possible on AVs: |
a0d0e21e LW |
281 | |
282 | void av_push(AV*, SV*); | |
283 | SV* av_pop(AV*); | |
284 | SV* av_shift(AV*); | |
285 | void av_unshift(AV*, I32 num); | |
286 | ||
287 | These should be familiar operations, with the exception of C<av_unshift>. | |
288 | This routine adds C<num> elements at the front of the array with the C<undef> | |
289 | value. You must then use C<av_store> (described below) to assign values | |
290 | to these new elements. | |
291 | ||
292 | Here are some other functions: | |
293 | ||
5f05dabc | 294 | I32 av_len(AV*); |
a0d0e21e | 295 | SV** av_fetch(AV*, I32 key, I32 lval); |
a0d0e21e | 296 | SV** av_store(AV*, I32 key, SV* val); |
a0d0e21e | 297 | |
5f05dabc | 298 | The C<av_len> function returns the highest index value in the array (just |
299 | like $#array in Perl). If the array is empty, -1 is returned. The | |
300 | C<av_fetch> function returns the value at index C<key>, but if C<lval> | |
301 | is non-zero, then C<av_fetch> will store an undef value at that index. | |
04343c6d GS |
302 | The C<av_store> function stores the value C<val> at index C<key>, and does |
303 | not increment the reference count of C<val>. Thus the caller is responsible | |
304 | for taking care of that, and if C<av_store> returns NULL, the caller will | |
305 | have to decrement the reference count to avoid a memory leak. Note that | |
306 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their | |
307 | return value. | |
d1b91892 | 308 | |
a0d0e21e | 309 | void av_clear(AV*); |
a0d0e21e | 310 | void av_undef(AV*); |
cb1a09d0 | 311 | void av_extend(AV*, I32 key); |
5f05dabc | 312 | |
313 | The C<av_clear> function deletes all the elements in the AV* array, but | |
314 | does not actually delete the array itself. The C<av_undef> function will | |
315 | delete all the elements in the array plus the array itself. The | |
adc882cf GS |
316 | C<av_extend> function extends the array so that it contains at least C<key+1> |
317 | elements. If C<key+1> is less than the currently allocated length of the array, | |
318 | then nothing is done. | |
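The index conventions of C<av_len> and C<av_extend> can be modeled with a plain C array. This toy sketch is not Perl's AV: C<struct toyav> and the C<toy_av_*> functions are hypothetical, and unlike the real C<av_extend> it allocates exactly C<key+1> slots rather than "at least" that many. It shows that the highest index is -1 for an empty array and that extending storage does not change that index.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy array modeled on the AV semantics above (not Perl's AV). */
struct toyav {
    void **ary;
    long   fill;    /* highest used index; -1 when empty (what av_len reports) */
    long   max;     /* highest index that currently has storage */
};

/* Like av_len: the highest index, or -1 for an empty array ($#array). */
static long toy_av_len(const struct toyav *av)
{
    return av->fill;
}

/* Like av_extend: ensure storage for key+1 elements; if there is
 * already room, nothing is done. Returns 0 on success, -1 on failure. */
static int toy_av_extend(struct toyav *av, long key)
{
    long i;
    void **p;
    if (key <= av->max)
        return 0;
    p = realloc(av->ary, (size_t)(key + 1) * sizeof *p);
    if (p == NULL)
        return -1;
    for (i = av->max + 1; i <= key; i++)
        p[i] = NULL;        /* new slots start out empty, like undef */
    av->ary = p;
    av->max = key;
    return 0;
}

/* Like av_push: store one element at index fill+1. */
static int toy_av_push(struct toyav *av, void *val)
{
    if (toy_av_extend(av, av->fill + 1) != 0)
        return -1;
    av->ary[++av->fill] = val;
    return 0;
}
```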
a0d0e21e LW |
319 | |
320 | If you know the name of an array variable, you can get a pointer to its AV | |
321 | by using the following: | |
322 | ||
4929bf7b | 323 | AV* get_av("package::varname", FALSE); |
a0d0e21e LW |
324 | |
325 | This returns NULL if the variable does not exist. | |
326 | ||
04343c6d GS |
327 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
328 | information on how to use the array access functions on tied arrays. | |
329 | ||
54310121 | 330 | =head2 Working with HVs |
a0d0e21e LW |
331 | |
332 | To create an HV, you use the following routine: | |
333 | ||
334 | HV* newHV(); | |
335 | ||
54310121 | 336 | Once the HV has been created, the following operations are possible on HVs: |
a0d0e21e | 337 | |
08105a92 GS |
338 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
339 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); | |
a0d0e21e | 340 | |
5f05dabc | 341 | The C<klen> parameter is the length of the key being passed in (note that |
342 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the | |
343 | length of the key). The C<val> argument contains the SV pointer to the | |
54310121 | 344 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
5f05dabc | 345 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
346 | indicates whether this fetch is actually a part of a store operation, in | |
347 | which case a new undefined value will be added to the HV with the supplied | |
348 | key and C<hv_fetch> will return as if the value had already existed. | |
a0d0e21e | 349 | |
5f05dabc | 350 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
351 | C<SV*>. To access the scalar value, you must first dereference the return | |
352 | value. However, you should check to make sure that the return value is | |
353 | not NULL before dereferencing it. | |
a0d0e21e LW |
354 | |
355 | The following two functions check whether a hash table entry exists and delete an entry, respectively. | |
356 | ||
08105a92 GS |
357 | bool hv_exists(HV*, const char* key, U32 klen); |
358 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); | |
a0d0e21e | 359 | |
5f05dabc | 360 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
361 | create and return a mortal copy of the deleted value. | |
362 | ||
a0d0e21e LW |
363 | And more miscellaneous functions: |
364 | ||
365 | void hv_clear(HV*); | |
a0d0e21e | 366 | void hv_undef(HV*); |
5f05dabc | 367 | |
368 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash | |
369 | table but does not actually delete the hash table. The C<hv_undef> deletes | |
370 | both the entries and the hash table itself. | |
a0d0e21e | 371 | |
d1b91892 AD |
372 | Perl keeps the actual data in a linked list of structures with a typedef of HE.
373 | These contain the actual key and value pointers (plus extra administrative | |
374 | overhead). The key is a string pointer; the value is an C<SV*>. However, | |
375 | once you have an C<HE*>, to get the actual key and value, use the routines | |
376 | specified below. | |
377 | ||
a0d0e21e LW |
378 | I32 hv_iterinit(HV*); |
379 | /* Prepares starting point to traverse hash table */ | |
380 | HE* hv_iternext(HV*); | |
381 | /* Get the next entry, and return a pointer to a | |
382 | structure that has both the key and value */ | |
383 | char* hv_iterkey(HE* entry, I32* retlen); | |
384 | /* Get the key from an HE structure and also return | |
385 | the length of the key string */ | |
cb1a09d0 | 386 | SV* hv_iterval(HV*, HE* entry); |
a0d0e21e LW |
387 | /* Return a SV pointer to the value of the HE |
388 | structure */ | |
cb1a09d0 | 389 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
d1b91892 AD |
390 | /* This convenience routine combines hv_iternext, |
391 | hv_iterkey, and hv_iterval. The key and retlen | |
392 | arguments are return values for the key and its | |
393 | length. The value is the function's return value */ | |
a0d0e21e LW |
394 | |
395 | If you know the name of a hash variable, you can get a pointer to its HV | |
396 | by using the following: | |
397 | ||
4929bf7b | 398 | HV* get_hv("package::varname", FALSE); |
a0d0e21e LW |
399 | |
400 | This returns NULL if the variable does not exist. | |
401 | ||
8ebc5c01 | 402 | The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro: |
a0d0e21e | 403 | |
a0d0e21e | 404 | hash = 0; |
ab192400 GS |
405 | while (klen--) |
406 | hash = (hash * 33) + *key++; | |
87275199 | 407 | hash = hash + (hash >> 5); /* after 5.6 */ |
ab192400 | 408 | |
87275199 | 409 | The last step was added in version 5.6 to improve distribution of |
ab192400 | 410 | lower bits in the resulting hash value. |
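The macro body above translates directly into a C function. In this sketch the name C<perl_hash_56> is hypothetical and C<uint32_t> stands in for Perl's C<U32>. Like the macro, it reads the key through a plain C<const char *>, so keys containing bytes above 0x7F can hash differently depending on whether the platform's C<char> is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The hash shown above, written as a function. */
static uint32_t perl_hash_56(const char *key, size_t klen)
{
    uint32_t hash = 0;
    while (klen--)
        hash = hash * 33 + (uint32_t)*key++;
    return hash + (hash >> 5);      /* the extra mixing step added in 5.6 */
}
```

As a worked example, the key C<"ab"> gives C<97 * 33 + 98 = 3299> after the loop, and the final step adds C<3299 E<gt>E<gt> 5 = 103>, for a hash of 3402.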
a0d0e21e | 411 | |
04343c6d GS |
412 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
413 | information on how to use the hash access functions on tied hashes. | |
414 | ||
1e422769 | 415 | =head2 Hash API Extensions |
416 | ||
417 | Beginning with version 5.004, the following functions are also supported: | |
418 | ||
419 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); | |
420 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); | |
c47ff5f1 | 421 | |
1e422769 | 422 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
423 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); | |
c47ff5f1 | 424 | |
1e422769 | 425 | SV* hv_iterkeysv (HE* entry); |
426 | ||
427 | Note that these functions take C<SV*> keys, which simplifies writing | |
428 | of extension code that deals with hash structures. These functions | |
429 | also allow passing of C<SV*> keys to C<tie> functions without forcing | |
430 | you to stringify the keys (unlike the previous set of functions). | |
431 | ||
432 | They also return and accept whole hash entries (C<HE*>), making their | |
433 | use more efficient (since the hash number for a particular string | |
4a4eefd0 GS |
434 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
435 | descriptions. | |
1e422769 | 436 | |
437 | The following macros must always be used to access the contents of hash | |
438 | entries. Note that the arguments to these macros must be simple | |
439 | variables, since they may get evaluated more than once. See | |
4a4eefd0 | 440 | L<perlapi> for detailed descriptions of these macros. |
1e422769 | 441 | |
442 | HePV(HE* he, STRLEN len) | |
443 | HeVAL(HE* he) | |
444 | HeHASH(HE* he) | |
445 | HeSVKEY(HE* he) | |
446 | HeSVKEY_force(HE* he) | |
447 | HeSVKEY_set(HE* he, SV* sv) | |
448 | ||
449 | These two lower level macros are defined, but must only be used when | |
450 | dealing with keys that are not C<SV*>s: | |
451 | ||
452 | HeKEY(HE* he) | |
453 | HeKLEN(HE* he) | |
454 | ||
04343c6d GS |
455 | Note that both C<hv_store> and C<hv_store_ent> do not increment the |
456 | reference count of the stored C<val>, which is the caller's responsibility. | |
457 | If these functions return a NULL value, the caller will usually have to | |
458 | decrement the reference count of C<val> to avoid a memory leak. | |
1e422769 | 459 | |
a0d0e21e LW |
460 | =head2 References |
461 | ||
d1b91892 AD |
462 | References are a special type of scalar that point to other data types |
463 | (including references). | |
a0d0e21e | 464 | |
07fa94a1 | 465 | To create a reference, use either of the following functions: |
a0d0e21e | 466 | |
5f05dabc | 467 | SV* newRV_inc((SV*) thing); |
468 | SV* newRV_noinc((SV*) thing); | |
a0d0e21e | 469 | |
5f05dabc | 470 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
07fa94a1 JO |
471 | functions are identical except that C<newRV_inc> increments the reference |
472 | count of the C<thing>, while C<newRV_noinc> does not. For historical | |
473 | reasons, C<newRV> is a synonym for C<newRV_inc>. | |
474 | ||
475 | Once you have a reference, you can use the following macro to dereference | |
476 | the reference: | |
a0d0e21e LW |
477 | |
478 | SvRV(SV*) | |
479 | ||
480 | then call the appropriate routines, casting the returned C<SV*> to either an | |
d1b91892 | 481 | C<AV*> or C<HV*>, if required. |
a0d0e21e | 482 | |
d1b91892 | 483 | To determine if an SV is a reference, you can use the following macro: |
a0d0e21e LW |
484 | |
485 | SvROK(SV*) | |
486 | ||
07fa94a1 JO |
487 | To discover what type of value the reference refers to, use the following |
488 | macro and then check the return value. | |
d1b91892 AD |
489 | |
490 | SvTYPE(SvRV(SV*)) | |
491 | ||
492 | The most useful types that will be returned are: | |
493 | ||
494 | SVt_IV Scalar | |
495 | SVt_NV Scalar | |
496 | SVt_PV Scalar | |
5f05dabc | 497 | SVt_RV Scalar |
d1b91892 AD |
498 | SVt_PVAV Array |
499 | SVt_PVHV Hash | |
500 | SVt_PVCV Code | |
5f05dabc | 501 | SVt_PVGV Glob (possible a file handle) |
502 | SVt_PVMG Blessed or Magical Scalar | |
503 | ||
504 | See the sv.h header file for more details. | |
d1b91892 | 505 | |
cb1a09d0 AD |
506 | =head2 Blessed References and Class Objects |
507 | ||
508 | References are also used to support object-oriented programming. In the | |
509 | OO lexicon, an object is simply a reference that has been blessed into a | |
510 | package (or class). Once blessed, the programmer may now use the reference | |
511 | to access the various methods in the class. | |
512 | ||
513 | A reference can be blessed into a package with the following function: | |
514 | ||
515 | SV* sv_bless(SV* sv, HV* stash); | |
516 | ||
517 | The C<sv> argument must be a reference. The C<stash> argument specifies | |
3fe9a6f1 | 518 | which class the reference will belong to. See |
2ae324a7 | 519 | L<Stashes and Globs> for information on converting class names into stashes. |
cb1a09d0 AD |
520 | |
521 | /* Still under construction */ | |
522 | ||
523 | Upgrades C<rv> to a reference if it is not already one and creates a new SV | |
8ebc5c01 | 524 | for C<rv> to point to. If C<classname> is non-null, the new SV is blessed into the |
525 | specified class. The new SV is returned. | |
cb1a09d0 | 526 | |
08105a92 | 527 | SV* newSVrv(SV* rv, const char* classname); |
cb1a09d0 | 528 | |
8ebc5c01 | 529 | Copies integer or double into an SV whose reference is C<rv>. SV is blessed |
530 | if C<classname> is non-null. | |
cb1a09d0 | 531 | |
08105a92 GS |
532 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
533 | SV* sv_setref_nv(SV* rv, const char* classname, NV nv); | |
cb1a09d0 | 534 | |
5f05dabc | 535 | Copies the pointer value (I<the address, not the string!>) into an SV whose |
8ebc5c01 | 536 | reference is rv. SV is blessed if C<classname> is non-null. |
cb1a09d0 | 537 | |
08105a92 | 538 | SV* sv_setref_pv(SV* rv, const char* classname, void* pv); |
cb1a09d0 | 539 | |
8ebc5c01 | 540 | Copies string into an SV whose reference is C<rv>. Set length to 0 to let |
541 | Perl calculate the string length. SV is blessed if C<classname> is non-null. | |
cb1a09d0 | 542 | |
e65f3abd | 543 | SV* sv_setref_pvn(SV* rv, const char* classname, const char* pv, STRLEN length); |
cb1a09d0 | 544 | |
9abd00ed GS |
545 | Tests whether the SV is blessed into the specified class. It does not |
546 | check inheritance relationships. | |
547 | ||
08105a92 | 548 | int sv_isa(SV* sv, const char* name); |
9abd00ed GS |
549 | |
550 | Tests whether the SV is a reference to a blessed object. | |
551 | ||
552 | int sv_isobject(SV* sv); | |
553 | ||
554 | Tests whether the SV is derived from the specified class. SV can be either | |
555 | a reference to a blessed object or a string containing a class name. This | |
556 | is the function implementing the C<UNIVERSAL::isa> functionality. | |
557 | ||
08105a92 | 558 | bool sv_derived_from(SV* sv, const char* name); |
9abd00ed GS |
559 | |
560 | To check if you've got an object derived from a specific class you have | |
561 | to write: | |
562 | ||
563 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } | |
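The distinction between an exact class test and an inheritance-aware one can be sketched with a toy class record. Everything here is hypothetical: C<struct toyclass> has a single parent (real Perl classes may list several in C<@ISA>), C<toy_isa> mirrors what C<sv_isa> checks, and C<toy_derived_from> mirrors C<sv_derived_from> by walking up the parent chain.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical class record with at most one parent. */
struct toyclass {
    const char            *name;
    const struct toyclass *parent;
};

/* Like sv_isa: exact class only, no inheritance. */
static int toy_isa(const struct toyclass *c, const char *name)
{
    return c != NULL && strcmp(c->name, name) == 0;
}

/* Like sv_derived_from: the class itself or any ancestor matches. */
static int toy_derived_from(const struct toyclass *c, const char *name)
{
    for (; c != NULL; c = c->parent)
        if (strcmp(c->name, name) == 0)
            return 1;
    return 0;
}
```

With C<Dog> inheriting from C<Animal>, a C<Dog> object passes C<toy_derived_from(..., "Animal")> but fails C<toy_isa(..., "Animal")>.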
cb1a09d0 | 564 | |
5f05dabc | 565 | =head2 Creating New Variables |
cb1a09d0 | 566 | |
5f05dabc | 567 | To create a new Perl variable with an undef value which can be accessed from |
568 | your Perl script, use the following routines, depending on the variable type. | |
cb1a09d0 | 569 | |
4929bf7b GS |
570 | SV* get_sv("package::varname", TRUE); |
571 | AV* get_av("package::varname", TRUE); | |
572 | HV* get_hv("package::varname", TRUE); | |
cb1a09d0 AD |
573 | |
574 | Notice the use of TRUE as the second parameter. The new variable can now | |
575 | be set, using the routines appropriate to the data type. | |
576 | ||
5f05dabc | 577 | There are additional macros whose values may be bitwise OR'ed with the |
578 | C<TRUE> argument to enable certain extra features. Those bits are: | |
cb1a09d0 | 579 | |
5f05dabc | 580 | GV_ADDMULTI Marks the variable as multiply defined, thus preventing the |
54310121 | 581 | "Name <varname> used only once: possible typo" warning. |
07fa94a1 JO |
582 | GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if |
583 | the variable did not exist before the function was called. | |
cb1a09d0 | 584 | |
07fa94a1 JO |
585 | If you do not specify a package name, the variable is created in the current |
586 | package. | |
cb1a09d0 | 587 | |
5f05dabc | 588 | =head2 Reference Counts and Mortality |
a0d0e21e | 589 | |
54310121 | 590 | Perl uses a reference-count-driven garbage collection mechanism. SVs, |
591 | AVs, or HVs (xV for short in the following) start their life with a | |
55497cff | 592 | reference count of 1. If the reference count of an xV ever drops to 0, |
07fa94a1 | 593 | then it will be destroyed and its memory made available for reuse. |
55497cff | 594 | |
595 | This normally doesn't happen at the Perl level unless a variable is | |
5f05dabc | 596 | undef'ed or the last variable holding a reference to it is changed or |
597 | overwritten. At the internal level, however, reference counts can be | |
55497cff | 598 | manipulated with the following macros: |
599 | ||
600 | int SvREFCNT(SV* sv); | |
5f05dabc | 601 | SV* SvREFCNT_inc(SV* sv); |
55497cff | 602 | void SvREFCNT_dec(SV* sv); |
603 | ||
604 | However, there is one other function which manipulates the reference | |
07fa94a1 JO |
605 | count of its argument. The C<newRV_inc> function, you will recall, |
606 | creates a reference to the specified argument. As a side effect, | |
607 | it increments the argument's reference count. If this is not what | |
608 | you want, use C<newRV_noinc> instead. | |
609 | ||
610 | For example, imagine you want to return a reference from an XSUB function. | |
611 | Inside the XSUB routine, you create an SV which initially has a reference | |
612 | count of one. Then you call C<newRV_inc>, passing it the just-created SV. | |
5f05dabc | 613 | This returns the reference as a new SV, but the reference count of the |
614 | SV you passed to C<newRV_inc> has been incremented to two. Now you | |
07fa94a1 JO |
615 | return the reference from the XSUB routine and forget about the SV. |
616 | But Perl hasn't! Whenever the returned reference is destroyed, the | |
617 | reference count of the original SV is decreased to one and nothing happens. | |
618 | The SV will hang around without any way to access it until Perl itself | |
619 | terminates. This is a memory leak. | |
5f05dabc | 620 | |
621 | The correct procedure, then, is to use C<newRV_noinc> instead of | |
faed5253 JO |
622 | C<newRV_inc>. Then, if and when the last reference is destroyed, |
623 | the reference count of the SV will go to zero and it will be destroyed, | |
07fa94a1 | 624 | stopping any memory leak. |
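The leak scenario can be replayed with a toy reference-counted value. This sketch does not use Perl's API: C<struct toyval>, C<struct toyref>, and the C<toy_*> functions are hypothetical, with C<toy_rv_inc> and C<toy_rv_noinc> imitating the counting behavior of C<newRV_inc> and C<newRV_noinc>. A new value starts with a count of 1; when its only reference is destroyed, the C<_inc> variant leaves the count at 1 (leaked), while the C<_noinc> variant drops it to 0 and frees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted value; *freed is set to 1 when it is destroyed. */
struct toyval {
    int  refcnt;
    int *freed;
};

static struct toyval *toy_new(int *freed_flag)
{
    struct toyval *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refcnt = 1;          /* a new value starts with a count of 1 */
    v->freed = freed_flag;
    *freed_flag = 0;
    return v;
}

static void toy_dec(struct toyval *v)
{
    if (--v->refcnt == 0) {
        *v->freed = 1;
        free(v);
    }
}

/* A "reference" that owns one count on its target. */
struct toyref { struct toyval *target; };

/* Like newRV_inc: the reference takes an extra count of its own. */
static struct toyref toy_rv_inc(struct toyval *v)
{
    v->refcnt++;
    return (struct toyref){ v };
}

/* Like newRV_noinc: the reference adopts the creator's existing count. */
static struct toyref toy_rv_noinc(struct toyval *v)
{
    return (struct toyref){ v };
}

/* Destroying the reference releases the count it owns. */
static void toy_rv_free(struct toyref r)
{
    toy_dec(r.target);
}
```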

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. When mortal xVs actually have their reference count
decremented is determined by two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times.

To create a mortal variable, use the functions:

    SV* sv_newmortal()
    SV* sv_2mortal(SV*)
    SV* sv_mortalcopy(SV*)

The first call creates a mortal SV, the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.
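
One way to picture a mortal is as an entry on a stack of pending decrements. The sketch below is a simplified model built on that assumption. C<toy_2mortal> plays the part of C<sv_2mortal>, and C<toy_savetmps>/C<toy_freetmps> stand in for the SAVETMPS/FREETMPS pair. None of these names exist in Perl; the real temporaries stack lives inside the interpreter.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy model of mortality: values with a reference count, plus a
       "temporaries" stack of values whose decrement has been deferred.
       The names mimic sv_2mortal/SAVETMPS/FREETMPS; this is not the Perl API. */
    typedef struct toy_sv { int refcnt; } toy_sv;

    #define TOY_TMPS_MAX 64
    static toy_sv *toy_tmps[TOY_TMPS_MAX];
    static size_t toy_tmps_ix = 0;    /* number of pending deferred decrements */

    /* Like sv_2mortal: schedule one decrement for later; nothing changes now. */
    static toy_sv *toy_2mortal(toy_sv *sv) {
        assert(toy_tmps_ix < TOY_TMPS_MAX);
        toy_tmps[toy_tmps_ix++] = sv;
        return sv;
    }

    /* Like SAVETMPS: remember how far the stack reached on entry. */
    static size_t toy_savetmps(void) { return toy_tmps_ix; }

    /* Like FREETMPS: carry out every decrement scheduled since `mark`. */
    static void toy_freetmps(size_t mark) {
        while (toy_tmps_ix > mark) {
            toy_sv *sv = toy_tmps[--toy_tmps_ix];
            sv->refcnt--;             /* a real implementation frees the value at zero */
        }
    }

```

Calling C<toy_2mortal> twice on the same value schedules two decrements. This matches the warning above about mortalizing a variable more than once.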

=head2 Stashes and Globs

A "stash" is a hash that contains all of the different objects that
are contained within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called "PL_defstash" that holds the items that exist
in the "main" package. To get at the items in other packages, append the
string "::" to the package name. The items in the "Foo" package are in
the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are
in the stash "Baz::" in "Bar::"'s stash.

To get the stash pointer for a particular package, use one of the functions:

    HV* gv_stashpv(const char* name, I32 create)
    HV* gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. If the C<create> flag is set, the package is created when it does
not already exist.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.
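
The naming rule for nested packages can be written as a small string routine. The hypothetical C<toy_stash_keys> helper below is not a Perl function. It splits a package name on C<::> and appends C<::> to each part, which yields the keys that are looked up one stash at a time, starting from C<PL_defstash>. In real code you would simply pass the full name to C<gv_stashpv>.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical sketch: compute the chain of stash keys that a nested
       package name walks through, starting from the "main" stash.  Only the
       naming rule is modeled; real lookups go through gv_stashpv(). */
    #define TOY_KEY_MAX 64

    /* Returns the number of keys written, or 0 if they do not fit. */
    static size_t toy_stash_keys(const char *pkg, char keys[][TOY_KEY_MAX],
                                 size_t max_keys) {
        size_t n = 0;
        const char *start = pkg;
        for (;;) {
            const char *sep = strstr(start, "::");
            size_t len = sep ? (size_t)(sep - start) : strlen(start);
            if (n == max_keys || len + 3 > TOY_KEY_MAX)
                return 0;                     /* too many or too long: give up */
            memcpy(keys[n], start, len);
            memcpy(keys[n] + len, "::", 3);   /* append "::" plus the NUL */
            n++;
            if (!sep)
                return n;
            start = sep + 2;                  /* continue after the "::" */
        }
    }

```

For example, C<"Bar::Baz"> produces the keys C<"Bar::"> and C<"Baz::">, in that order.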

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char* HvNAME(HV* stash);

If you need to bless or re-bless an object, you can use the following
function:

    SV* sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> yields either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>,
depending on whether it is used as a number or as a string.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
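
The flag behaviour described above can be modelled with an ordinary bit mask. In this sketch, the C<TOY_*> flags and the C<toy_*> functions are hypothetical analogues of the C<SvIOK_on>, C<sv_setiv> and C<sv_setpv> family. They reproduce only the rule that each setter leaves just its own flag on.

```c

    #include <assert.h>

    /* Toy model of the "which slots are valid" flags on a scalar.  The
       names imitate IOK/NOK/POK, sv_setiv, sv_setpv and SvIOK_on, but this
       is an illustration, not Perl's implementation. */
    enum { TOY_IOK = 1 << 0, TOY_NOK = 1 << 1, TOY_POK = 1 << 2 };

    typedef struct toy_sv {
        unsigned flags;
        long iv;
        const char *pv;
    } toy_sv;

    /* Each setter stores its value and leaves ONLY its own flag on. */
    static void toy_setiv(toy_sv *sv, long iv)        { sv->iv = iv; sv->flags = TOY_IOK; }
    static void toy_setpv(toy_sv *sv, const char *pv) { sv->pv = pv; sv->flags = TOY_POK; }

    /* Turning a flag back on declares that slot valid again. */
    static void toy_iok_on(toy_sv *sv) { sv->flags |= TOY_IOK; }

```

Following the C<dberror> example, the integer is set first, the string second, and the integer flag is then switched back on. After that, both slots are marked valid.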
752 | ||
753 | =head2 Magic Variables | |
a0d0e21e | 754 | |
d1b91892 AD |
755 | [This section still under construction. Ignore everything here. Post no |
756 | bills. Everything not permitted is forbidden.] | |
757 | ||
d1b91892 AD |
758 | Any SV may be magical, that is, it has special features that a normal |
759 | SV does not have. These features are stored in the SV structure in a | |
5f05dabc | 760 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
d1b91892 AD |
761 | |
762 | struct magic { | |
763 | MAGIC* mg_moremagic; | |
764 | MGVTBL* mg_virtual; | |
765 | U16 mg_private; | |
766 | char mg_type; | |
767 | U8 mg_flags; | |
768 | SV* mg_obj; | |
769 | char* mg_ptr; | |
770 | I32 mg_len; | |
771 | }; | |
772 | ||
773 | Note this is current as of patchlevel 0, and could change at any time. | |
774 | ||
775 | =head2 Assigning Magic | |
776 | ||
777 | Perl adds magic to an SV using the sv_magic function: | |
778 | ||
08105a92 | 779 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
d1b91892 AD |
780 | |
781 | The C<sv> argument is a pointer to the SV that is to acquire a new magical | |
782 | feature. | |
783 | ||
784 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to | |
785 | set the C<SVt_PVMG> flag for the C<sv>. Perl then continues by adding | |
786 | it to the beginning of the linked list of magical features. Any prior | |
787 | entry of the same type of magic is deleted. Note that this can be | |
5fb8527f | 788 | overridden, and multiple instances of the same type of magic can be |
d1b91892 AD |
789 | associated with an SV. |
790 | ||
54310121 | 791 | The C<name> and C<namlen> arguments are used to associate a string with |
792 | the magic, typically the name of a variable. C<namlen> is stored in the | |
793 | C<mg_len> field and if C<name> is non-null and C<namlen> >= 0 a malloc'd | |
d1b91892 AD |
794 | copy of the name is stored in C<mg_ptr> field. |
795 | ||
796 | The sv_magic function uses C<how> to determine which, if any, predefined | |
797 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. | |
cb1a09d0 AD |
798 | See the "Magic Virtual Table" section below. The C<how> argument is also |
799 | stored in the C<mg_type> field. | |
d1b91892 AD |
800 | |
801 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> | |
802 | structure. If it is not the same as the C<sv> argument, the reference | |
803 | count of the C<obj> object is incremented. If it is the same, or if | |
04343c6d | 804 | the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is |
d1b91892 AD |
805 | merely stored, without the reference count being incremented. |
806 | ||
cb1a09d0 AD |
807 | There is also a function to add magic to an C<HV>: |
808 | ||
809 | void hv_magic(HV *hv, GV *gv, int how); | |
810 | ||
811 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. | |
812 | ||
813 | To remove the magic from an SV, call the function sv_unmagic: | |
814 | ||
815 | void sv_unmagic(SV *sv, int type); | |
816 | ||
817 | The C<type> argument should be equal to the C<how> value when the C<SV> | |
818 | was initially made magical. | |
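
The list handling performed by C<sv_magic> and C<sv_unmagic> can be sketched with a plain singly linked list. The C<toy_magic> structure borrows only the C<mg_moremagic> and C<mg_type> roles from C<struct magic>. The functions are hypothetical and implement the default rule described above: new magic goes at the head, and an older entry of the same type is removed. They do not model the override that allows duplicates.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Toy linked list of magic entries, modeled on the mg_moremagic and
       mg_type fields of struct magic.  A sketch, not the real sv_magic(). */
    typedef struct toy_magic {
        struct toy_magic *next;   /* plays the role of mg_moremagic */
        char type;                /* plays the role of mg_type */
    } toy_magic;

    /* Remove every entry of the given type (compare sv_unmagic). */
    static void toy_unmagic(toy_magic **head, char type) {
        toy_magic **link = head;
        while (*link) {
            if ((*link)->type == type) {
                toy_magic *dead = *link;
                *link = dead->next;   /* unlink, then release */
                free(dead);
            } else {
                link = &(*link)->next;
            }
        }
    }

    /* Add magic of `type` at the head of the list, replacing any prior entry. */
    static int toy_add_magic(toy_magic **head, char type) {
        toy_magic *mg = malloc(sizeof *mg);
        if (!mg)
            return -1;
        toy_unmagic(head, type);
        mg->type = type;
        mg->next = *head;
        *head = mg;
        return 0;
    }

    static int toy_count(const toy_magic *mg) {
        int n = 0;
        for (; mg; mg = mg->next)
            n++;
        return n;
    }

```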
819 | ||
d1b91892 AD |
820 | =head2 Magic Virtual Tables |
821 | ||
822 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a | |
823 | C<MGVTBL>, which is a structure of function pointers and stands for | |
824 | "Magic Virtual Table" to handle the various operations that might be | |
825 | applied to that variable. | |
826 | ||
827 | The C<MGVTBL> has five pointers to the following routine types: | |
828 | ||
829 | int (*svt_get)(SV* sv, MAGIC* mg); | |
830 | int (*svt_set)(SV* sv, MAGIC* mg); | |
831 | U32 (*svt_len)(SV* sv, MAGIC* mg); | |
832 | int (*svt_clear)(SV* sv, MAGIC* mg); | |
833 | int (*svt_free)(SV* sv, MAGIC* mg); | |
834 | ||
835 | This MGVTBL structure is set at compile-time in C<perl.h> and there are | |
836 | currently 19 types (or 21 with overloading turned on). These different | |
837 | structures contain pointers to various routines that perform additional | |
838 | actions depending on which function is being called. | |
839 | ||
840 | Function pointer Action taken | |
841 | ---------------- ------------ | |
842 | svt_get Do something after the value of the SV is retrieved. | |
843 | svt_set Do something after the SV is assigned a value. | |
844 | svt_len Report on the SV's length. | |
845 | svt_clear Clear something the SV represents. | |
846 | svt_free Free any extra storage associated with the SV. | |
847 | ||
848 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds | |
849 | to an C<mg_type> of '\0') contains: | |
850 | ||
851 | { magic_get, magic_set, magic_len, 0, 0 } | |
852 | ||
853 | Thus, when an SV is determined to be magical and of type '\0', if a get | |
854 | operation is being performed, the routine C<magic_get> is called. All | |
855 | the various routines for the various magical types begin with C<magic_>. | |
954c1994 GS |
856 | NOTE: the magic routines are not considered part of the Perl API, and may |
857 | not be exported by the Perl library. | |
d1b91892 AD |
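
The sketch below isolates the virtual-table idea: a structure of function pointers that is consulted on every read and write. C<toy_vtbl> is a hypothetical two-slot cut-down of C<MGVTBL> that keeps only the get and set hooks. The counting and logging hooks are made-up examples. Treat the code as an illustration of the dispatch, not of Perl's internals.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy "magic virtual table": function pointers modeled on the svt_get
       and svt_set members of MGVTBL.  The hooks below are hypothetical. */
    typedef struct toy_sv toy_sv;

    typedef struct toy_vtbl {
        int (*get)(toy_sv *sv);   /* run before the value is read */
        int (*set)(toy_sv *sv);   /* run after a value is stored */
    } toy_vtbl;

    struct toy_sv {
        long value;
        const toy_vtbl *vtbl;     /* NULL means "not magical" */
    };

    static long toy_read(toy_sv *sv) {
        if (sv->vtbl && sv->vtbl->get)
            sv->vtbl->get(sv);    /* lets the hook refresh sv->value */
        return sv->value;
    }

    static void toy_write(toy_sv *sv, long v) {
        sv->value = v;
        if (sv->vtbl && sv->vtbl->set)
            sv->vtbl->set(sv);    /* lets the hook react to the new value */
    }

    /* Example hooks: reads count upward, and writes are recorded. */
    static long toy_reads = 0;
    static long toy_last_set = 0;
    static int counting_get(toy_sv *sv) { sv->value = ++toy_reads; return 0; }
    static int logging_set(toy_sv *sv)  { toy_last_set = sv->value; return 0; }
    static const toy_vtbl toy_counter_vtbl = { counting_get, logging_set };

```

A scalar whose C<vtbl> is NULL behaves like an ordinary variable. Attaching C<toy_counter_vtbl> changes what every read returns without changing the code that does the reading.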
858 | |
859 | The current kinds of Magic Virtual Tables are: | |
860 | ||
bdbeb323 | 861 | mg_type MGVTBL Type of magic |
5f05dabc | 862 | ------- ------ ---------------------------- |
bdbeb323 SM |
863 | \0 vtbl_sv Special scalar variable |
864 | A vtbl_amagic %OVERLOAD hash | |
865 | a vtbl_amagicelem %OVERLOAD hash element | |
866 | c (none) Holds overload table (AMT) on stash | |
867 | B vtbl_bm Boyer-Moore (fast string search) | |
c2e66d9e GS |
868 | D vtbl_regdata Regex match position data (@+ and @- vars) |
869 | d vtbl_regdatum Regex match position data element | |
d1b91892 AD |
870 | E vtbl_env %ENV hash |
871 | e vtbl_envelem %ENV hash element | |
bdbeb323 SM |
872 | f vtbl_fm Formline ('compiled' format) |
873 | g vtbl_mglob m//g target / study()ed string | |
d1b91892 AD |
874 | I vtbl_isa @ISA array |
875 | i vtbl_isaelem @ISA array element | |
bdbeb323 SM |
876 | k vtbl_nkeys scalar(keys()) lvalue |
877 | L (none) Debugger %_<filename | |
878 | l vtbl_dbline Debugger %_<filename element | |
44a8e56a | 879 | o vtbl_collxfrm Locale transformation |
bdbeb323 SM |
880 | P vtbl_pack Tied array or hash |
881 | p vtbl_packelem Tied array or hash element | |
882 | q vtbl_packelem Tied scalar or handle | |
883 | S vtbl_sig %SIG hash | |
884 | s vtbl_sigelem %SIG hash element | |
d1b91892 | 885 | t vtbl_taint Taintedness |
bdbeb323 SM |
886 | U vtbl_uvar Available for use by extensions |
887 | v vtbl_vec vec() lvalue | |
888 | x vtbl_substr substr() lvalue | |
889 | y vtbl_defelem Shadow "foreach" iterator variable / | |
890 | smart parameter vivification | |
891 | * vtbl_glob GV (typeglob) | |
892 | # vtbl_arylen Array length ($#ary) | |
893 | . vtbl_pos pos() lvalue | |
894 | ~ (none) Available for use by extensions | |
d1b91892 | 895 | |
68dc0745 | 896 | When an uppercase and lowercase letter both exist in the table, then the |
897 | uppercase letter is used to represent some kind of composite type (a list | |
898 | or a hash), and the lowercase letter is used to represent an element of | |
d1b91892 AD |
899 | that composite type. |
900 | ||
bdbeb323 SM |
901 | The '~' and 'U' magic types are defined specifically for use by |
902 | extensions and will not be used by perl itself. Extensions can use | |
903 | '~' magic to 'attach' private information to variables (typically | |
904 | objects). This is especially useful because there is no way for | |
905 | normal perl code to corrupt this private information (unlike using | |
906 | extra elements of a hash object). | |
907 | ||
908 | Similarly, 'U' magic can be used much like tie() to call a C function | |
909 | any time a scalar's value is used or changed. The C<MAGIC>'s | |
910 | C<mg_ptr> field points to a C<ufuncs> structure: | |
911 | ||
912 | struct ufuncs { | |
913 | I32 (*uf_val)(IV, SV*); | |
914 | I32 (*uf_set)(IV, SV*); | |
915 | IV uf_index; | |
916 | }; | |
917 | ||
918 | When the SV is read from or written to, the C<uf_val> or C<uf_set> | |
919 | function will be called with C<uf_index> as the first arg and a | |
1526ead6 AB |
920 | pointer to the SV as the second. A simple example of how to add 'U' |
921 | magic is shown below. Note that the ufuncs structure is copied by | |
922 | sv_magic, so you can safely allocate it on the stack. | |
923 | ||
924 | void | |
925 | Umagic(sv) | |
926 | SV *sv; | |
927 | PREINIT: | |
928 | struct ufuncs uf; | |
929 | CODE: | |
930 | uf.uf_val = &my_get_fn; | |
931 | uf.uf_set = &my_set_fn; | |
932 | uf.uf_index = 0; | |
933 | sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf)); | |
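
Outside of XS, the 'U' mechanism amounts to a pair of callbacks that receive C<uf_index> and the scalar. The sketch below models this in plain C under a few assumptions: C<long> stands in for C<IV>, C<toy_sv> stands in for C<SV>, and C<toy_fetch>/C<toy_store> are hypothetical versions of the read and write paths. C<my_get_fn> and C<my_set_fn> are the placeholder names from the example above. Here they are given illustrative bodies that mirror an element of a C array.

```c

    #include <assert.h>

    /* Toy version of the 'U' magic callback record.  The field layout
       follows struct ufuncs (uf_val, uf_set, uf_index); long stands in for
       IV and toy_sv for SV.  All names here are illustrative. */
    typedef struct toy_sv { long value; } toy_sv;

    struct toy_ufuncs {
        int (*uf_val)(long index, toy_sv *sv);
        int (*uf_set)(long index, toy_sv *sv);
        long uf_index;
    };

    /* A hypothetical table of C variables that the scalars mirror. */
    static long toy_backing[4];

    static int my_get_fn(long index, toy_sv *sv) { sv->value = toy_backing[index]; return 0; }
    static int my_set_fn(long index, toy_sv *sv) { toy_backing[index] = sv->value; return 0; }

    /* The read and write paths call the hook with uf_index first and the
       scalar second, as described above. */
    static long toy_fetch(toy_sv *sv, const struct toy_ufuncs *uf) {
        uf->uf_val(uf->uf_index, sv);
        return sv->value;
    }

    static void toy_store(toy_sv *sv, const struct toy_ufuncs *uf, long v) {
        sv->value = v;
        uf->uf_set(uf->uf_index, sv);
    }

```

Because C<uf_index> is passed to both callbacks, a single pair of functions can serve many scalars, each bound to a different slot.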

Note that because multiple extensions may be using '~' or 'U' magic,
it is important for extensions to take extra care to avoid conflict.
Typically, restricting the magic to objects blessed into the extension's
own class is sufficient. For '~' magic, it may also be
appropriate to add an I32 'signature' at the top of the private data
area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. Also,
if the SV is not of type C<SVt_PVMG>, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the C<mg_type>
field is an uppercase letter, then the C<mg_obj> is copied to C<nsv>, but
the C<mg_type> field is changed to be the lowercase letter.
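
The uppercase-to-lowercase rule used by C<mg_copy> can be illustrated on its own. The C<toy_magic> structure and the C<toy_mg_copy> function below are hypothetical. They model only the type mapping and the sharing of C<mg_obj>, not the key arguments or the real C<MAGIC> layout.

```c

    #include <assert.h>
    #include <ctype.h>

    /* Hypothetical cut-down magic record: just mg_type and mg_obj. */
    typedef struct toy_magic {
        char type;
        void *obj;
    } toy_magic;

    /* Returns 1 and fills *elem if the aggregate's magic is copied onto
       elements (uppercase type), otherwise returns 0 and leaves *elem alone. */
    static int toy_mg_copy(const toy_magic *agg, toy_magic *elem) {
        unsigned char c = (unsigned char)agg->type;
        if (!isupper(c))
            return 0;
        elem->type = (char)tolower(c);  /* e.g. 'P' (tied aggregate) -> 'p' */
        elem->obj  = agg->obj;          /* the element shares the same object */
        return 1;
    }

```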

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the 'P' magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The Perl C<tie> function associates a variable with an object that implements
the various FETCH, STORE, etc. methods. To perform the equivalent of the Perl
C<tie> function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps: first it creates a new hash, then it
creates a second hash which it blesses into the class that will implement
the tie methods. Lastly it ties the two hashes together and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class; see
L</"Calling Perl Routines from within C Programs"> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();                     /* the hash that will be tied */
        tie = newRV_noinc((SV*)newHV());    /* reference to the tie object */
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, 'P');      /* hv_magic expects a GV* */
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the Perl-level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the Perl-level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

    {
      local $var = 2;
      ...
    }

This construction is I<approximately> equivalent to

    {
      my $oldvar = $var;
      $var = 2;
      ...
      $var = $oldvar;
    }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like the boundaries of the enclosing Perl
subroutine or block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of the additional localization
should be almost negligible.) Note that any XSUB is automatically enclosed
in an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following services are available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of the integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back; C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of the
I<pseudo-block>. This is similar to C<sv_2mortal>, which should perhaps
be used instead.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of the I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of the I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of the I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of the I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If the I<key> is in
short-lived storage, a copy of the string can be made like this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of the I<pseudo-block> the function C<f> is called with
C<p> as its only argument.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of the I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of the I<pseudo-block>.

=back
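
Conceptually, each C<SAVE*> call pushes an undo record onto a stack. C<LEAVE> then pops records back to the level that C<ENTER> recorded, newest first. The sketch below models C<SAVEINT> and C<SAVEDESTRUCTOR> in this way. The C<toy_*> names are hypothetical. The model also ignores non-local exits; in Perl, the pending records are run as well when C<die> unwinds the pseudo-block.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Toy save stack modeling ENTER/LEAVE with SAVEINT and SAVEDESTRUCTOR.
       Each entry records how to undo one change; LEAVE pops entries back to
       the mark recorded by ENTER, newest first.  Not Perl's implementation. */
    typedef void (*toy_destructor)(void *);

    typedef struct toy_save {
        int *int_addr;            /* SAVEINT: where to restore ... */
        int int_old;              /* ... and the value to put back */
        toy_destructor fn;        /* SAVEDESTRUCTOR: function to call ... */
        void *arg;                /* ... with this argument */
    } toy_save;

    #define TOY_SAVE_MAX 32
    static toy_save toy_ss[TOY_SAVE_MAX];
    static size_t toy_ss_ix = 0;

    static size_t toy_enter(void) { return toy_ss_ix; }

    static void toy_saveint(int *i) {
        assert(toy_ss_ix < TOY_SAVE_MAX);
        toy_ss[toy_ss_ix++] = (toy_save){ i, *i, NULL, NULL };
    }

    static void toy_savedestructor(toy_destructor f, void *p) {
        assert(toy_ss_ix < TOY_SAVE_MAX);
        toy_ss[toy_ss_ix++] = (toy_save){ NULL, 0, f, p };
    }

    static void toy_leave(size_t mark) {
        while (toy_ss_ix > mark) {
            toy_save *e = &toy_ss[--toy_ss_ix];
            if (e->int_addr)
                *e->int_addr = e->int_old;   /* undo a SAVEINT */
            else
                e->fn(e->arg);               /* run a SAVEDESTRUCTOR */
        }
    }

    /* Example destructor: counts how many times it runs. */
    static void toy_count_call(void *p) { ++*(int *)p; }

```

Because records are popped newest first, localizing the same variable twice within one pseudo-block still restores its original value.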
1158 | ||
1159 | The following API list contains functions, thus one needs to | |
1160 | provide pointers to the modifiable data explicitly (either C pointers, | |
1161 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar | |
1162 | function takes C<int *>. | |
1163 | ||
13a2d996 | 1164 | =over 4 |
d1c897a1 IZ |
1165 | |
1166 | =item C<SV* save_scalar(GV *gv)> | |
1167 | ||
1168 | Equivalent to Perl code C<local $gv>. | |
1169 | ||
1170 | =item C<AV* save_ary(GV *gv)> | |
1171 | ||
1172 | =item C<HV* save_hash(GV *gv)> | |
1173 | ||
1174 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. | |
1175 | ||
1176 | =item C<void save_item(SV *item)> | |
1177 | ||
1178 | Duplicates the current value of C<SV>, on the exit from the current | |
1179 | C<ENTER>/C<LEAVE> I<pseudo-block> will restore the value of C<SV> | |
1180 | using the stored value. | |
1181 | ||
1182 | =item C<void save_list(SV **sarg, I32 maxsarg)> | |
1183 | ||
1184 | A variant of C<save_item> which takes multiple arguments via an array | |
1185 | C<sarg> of C<SV*> of length C<maxsarg>. | |
1186 | ||
1187 | =item C<SV* save_svref(SV **sptr)> | |
1188 | ||
1189 | Similar to C<save_scalar>, but will reinstate a C<SV *>. | |
1190 | ||
1191 | =item C<void save_aptr(AV **aptr)> | |
1192 | ||
1193 | =item C<void save_hptr(HV **hptr)> | |
1194 | ||
1195 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. | |
1196 | ||
1197 | =back | |
1198 | ||
1199 | The C<Alias> module implements localization of the basic types within the | |
1200 | I<caller's scope>. People who are interested in how to localize things in | |
1201 | the containing scope should take a look there too. | |
1202 | ||

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments but returns
two values: the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
macros to push IVs, doubles, strings, and SV pointers respectively:

    PUSHi(IV)
    PUSHn(double)
    PUSHp(char*, I32)
    PUSHs(SV*)

The Perl program calling C<tzname> can then assign the two values as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternative (and possibly simpler) method of pushing values on the stack
is to use the macros:

    XPUSHi(IV)
    XPUSHn(double)
    XPUSHp(char*, I32)
    XPUSHs(SV*)

These macros automatically adjust the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.
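
The difference between C<EXTEND> followed by C<PUSH*> and the self-extending C<XPUSH*> macros can be modelled with a growable array. In this hypothetical sketch, C<toy_extend>, C<toy_push> and C<toy_xpush> play the roles of C<EXTEND>, C<PUSHi> and C<XPUSHi>. The values are C<long>s rather than C<SV*>s, and the doubling growth policy is an assumption, not Perl's.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Toy argument stack; the real Perl stack holds SV pointers. */
    typedef struct toy_stack {
        long *base;
        size_t used;
        size_t size;
    } toy_stack;

    /* Like EXTEND(SP, n): guarantee room for n more entries. */
    static int toy_extend(toy_stack *st, size_t n) {
        if (st->used + n <= st->size)
            return 0;
        size_t want = (st->used + n) * 2;        /* assumed growth policy */
        long *p = realloc(st->base, want * sizeof *p);
        if (!p)
            return -1;
        st->base = p;
        st->size = want;
        return 0;
    }

    /* Like PUSHi: assumes the room already exists. */
    static void toy_push(toy_stack *st, long v) {
        assert(st->used < st->size);
        st->base[st->used++] = v;
    }

    /* Like XPUSHi: extends by one if needed, then pushes. */
    static int toy_xpush(toy_stack *st, long v) {
        if (toy_extend(st, 1) != 0)
            return -1;
        toy_push(st, v);
        return 0;
    }

```

A C<tzname>-style routine that knows it returns exactly two values can extend once and push twice. When the number of results is not known in advance, the self-extending form is the simpler choice.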
1253 | ||
1254 | For more information, consult L<perlxs> and L<perlxstut>. | |
1255 | ||
1256 | =head2 Calling Perl Routines from within C Programs | |
a0d0e21e LW |
1257 | |
1258 | There are four routines that can be used to call a Perl subroutine from | |
1259 | within a C program. These four are: | |
1260 | ||
954c1994 GS |
1261 | I32 call_sv(SV*, I32); |
1262 | I32 call_pv(const char*, I32); | |
1263 | I32 call_method(const char*, I32); | |
1264 | I32 call_argv(const char*, I32, register char**); | |
a0d0e21e | 1265 | |
954c1994 | 1266 | The routine most often used is C<call_sv>. The C<SV*> argument |
d1b91892 AD |
1267 | contains either the name of the Perl subroutine to be called, or a |
1268 | reference to the subroutine. The second argument consists of flags | |
1269 | that control the context in which the subroutine is called, whether | |
1270 | or not the subroutine is being passed arguments, how errors should be | |
1271 | trapped, and how to treat return values. | |
a0d0e21e LW |
1272 | |
1273 | All four routines return the number of arguments that the subroutine returned | |
1274 | on the Perl stack. | |
1275 | ||
954c1994 GS |
1276 | These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0, |
1277 | but those names are now deprecated; macros of the same name are provided for | |
1278 | compatibility. | |
1279 | ||
1280 | When using any of these routines (except C<call_argv>), the programmer | |
d1b91892 AD |
1281 | must manipulate the Perl stack. These include the following macros and |
1282 | functions: | |
a0d0e21e LW |
1283 | |
1284 | dSP | |
924508f0 | 1285 | SP |
a0d0e21e LW |
1286 | PUSHMARK() |
1287 | PUTBACK | |
1288 | SPAGAIN | |
1289 | ENTER | |
1290 | SAVETMPS | |
1291 | FREETMPS | |
1292 | LEAVE | |
1293 | XPUSH*() | |
cb1a09d0 | 1294 | POP*() |
a0d0e21e | 1295 | |
5f05dabc | 1296 | For a detailed description of calling conventions from C to Perl, |
1297 | consult L<perlcall>. | |
a0d0e21e | 1298 | |
5f05dabc | 1299 | =head2 Memory Allocation |
a0d0e21e | 1300 | |
86058a2d GS |
1301 | All memory meant to be used with the Perl API functions should be manipulated |
1302 | using the macros described in this section. The macros hide the |
1303 | differences between the malloc implementations that perl may be built |
1304 | with. |
1305 | ||
1306 | It is suggested that you enable the version of malloc that is distributed | |
5f05dabc | 1307 | with Perl. It keeps pools of various sizes of unallocated memory in |
07fa94a1 JO |
1308 | order to satisfy allocation requests more quickly. However, on some |
1309 | platforms, it may cause spurious malloc or free errors. | |
d1b91892 AD |
1310 | |
1311 | New(x, pointer, number, type); | |
1312 | Newc(x, pointer, number, type, cast); | |
1313 | Newz(x, pointer, number, type); | |
1314 | ||
07fa94a1 | 1315 | These three macros are used to initially allocate memory. |
5f05dabc | 1316 | |
1317 | The first argument C<x> was a "magic cookie" that was used to keep track | |
1318 | of who called the macro, to help when debugging memory problems. However, | |
07fa94a1 JO |
1319 | the current code makes no use of this feature (most Perl developers now |
1320 | use run-time memory checkers), so this argument can be any number. | |
5f05dabc | 1321 | |
1322 | The second argument C<pointer> should be the name of a variable that will | |
1323 | point to the newly allocated memory. | |
d1b91892 | 1324 | |
d1b91892 AD |
1325 | The third and fourth arguments C<number> and C<type> specify how many of |
1326 | the specified type of data structure should be allocated. The argument | |
1327 | C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>, | |
1328 | should be used if C<pointer> is not declared as a pointer to |
1329 | C<type>. |
1330 | ||
1331 | Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero> | |
1332 | to zero out all the newly allocated memory. | |
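As a rough mental model, the three macros behave like C<malloc()> or C<calloc()> with the size computed as C<number * sizeof(type)>. The stand-alone sketch below assumes plain C C<malloc()> underneath. The names C<MY_NEW>, C<MY_NEWC>, C<MY_NEWZ>, C<my_safe_malloc> and C<make_counters> are hypothetical, and the exit-on-failure policy is an assumption loosely modeled on perl's allocator, not perl's actual definition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for perl's safemalloc(): it never returns NULL. */
static void *my_safe_malloc(size_t bytes)
{
    void *p = malloc(bytes ? bytes : 1);   /* sidestep malloc(0) returning NULL */
    if (p == NULL) {
        fputs("Out of memory!\n", stderr);
        exit(1);
    }
    return p;
}

/* MY_NEW(x, ptr, n, type): the cookie x is evaluated and ignored. */
#define MY_NEW(x, ptr, n, type) \
    ((void)(x), (ptr) = (type *)my_safe_malloc((size_t)(n) * sizeof(type)))

/* MY_NEWC: the size comes from `type`, but the result is stored as a `cast *`. */
#define MY_NEWC(x, ptr, n, type, cast) \
    ((void)(x), (ptr) = (cast *)my_safe_malloc((size_t)(n) * sizeof(type)))

/* MY_NEWZ: like MY_NEW, then zero every byte of the new block. */
#define MY_NEWZ(x, ptr, n, type) \
    (MY_NEW(x, ptr, n, type), (void)memset((ptr), 0, (size_t)(n) * sizeof(type)))

/* Usage: ten zero-initialised counters; the caller frees them. */
static int *make_counters(void)
{
    int *counts;
    MY_NEWZ(42, counts, 10, int);   /* 42: arbitrary, unused cookie */
    return counts;
}
```

For example, C<MY_NEWZ(0, counts, 10, int)> leaves C<counts> pointing at ten zeroed ints that the caller later releases with C<free()>.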
1333 | ||
1334 | Renew(pointer, number, type); | |
1335 | Renewc(pointer, number, type, cast); | |
1336 | Safefree(pointer); |
1337 | ||
1338 | These three macros are used to change a memory buffer size or to free a | |
1339 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> | |
1340 | match those of C<New> and C<Newc> with the exception of not needing the | |
1341 | "magic cookie" argument. | |
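Resizing and freeing can be modeled the same way. This sketch assumes C<realloc()> and C<free()> underneath; C<MY_RENEW>, C<MY_SAFEFREE>, C<my_safe_realloc> and C<grow_demo> are hypothetical names, and the exit-on-failure behaviour is an assumption rather than a description of perl's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for perl's saferealloc(): exits instead of
   returning NULL, so callers never see a failed resize. */
static void *my_safe_realloc(void *old, size_t bytes)
{
    void *p = realloc(old, bytes ? bytes : 1);
    if (p == NULL) {
        fputs("Out of memory!\n", stderr);
        exit(1);
    }
    return p;
}

/* Same shape as Renew(pointer, number, type): no "magic cookie". */
#define MY_RENEW(ptr, n, type) \
    ((ptr) = (type *)my_safe_realloc((ptr), (size_t)(n) * sizeof(type)))

/* Same shape as Safefree(pointer); free(NULL) is a no-op in C. */
#define MY_SAFEFREE(ptr) free(ptr)

/* Usage: fill four ints, then grow the buffer to eight. */
static int *grow_demo(void)
{
    int *v = (int *)my_safe_realloc(NULL, 4 * sizeof(int));
    for (int i = 0; i < 4; i++)
        v[i] = i * i;
    MY_RENEW(v, 8, int);
    assert(v[3] == 9);   /* realloc keeps the first four elements */
    return v;
}
```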
1342 | ||
1343 | Move(source, dest, number, type); | |
1344 | Copy(source, dest, number, type); | |
1345 | Zero(dest, number, type); | |
1346 | ||
1347 | These three macros are used to move, copy, or zero out previously allocated | |
1348 | memory. The C<source> and C<dest> arguments point to the source and | |
1349 | destination starting points. Perl will move, copy, or zero out C<number> | |
1350 | instances of the size of the C<type> data structure (using the C<sizeof> | |
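1351 | operator). Note that C<source> comes before C<dest>, the reverse of the argument order of C's C<memmove> and C<memcpy>. |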
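A small stand-alone model makes the source-then-destination argument order concrete. It assumes the macros reduce to the standard C functions with the size multiplied by C<sizeof(type)>; C<MY_MOVE>, C<MY_COPY>, C<MY_ZERO> and C<shift_right_demo> are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical models of Move/Copy/Zero.  Source comes first and the
   destination second, so the arguments are swapped for memmove()/memcpy(). */
#define MY_MOVE(s, d, n, t) ((void)memmove((d), (s), (size_t)(n) * sizeof(t)))
#define MY_COPY(s, d, n, t) ((void)memcpy((d), (s), (size_t)(n) * sizeof(t)))
#define MY_ZERO(d, n, t)    ((void)memset((d), 0, (size_t)(n) * sizeof(t)))

/* Usage: shift a[0..2] right by one slot.  The regions overlap, so the
   memmove-based MY_MOVE is required rather than MY_COPY. */
static void shift_right_demo(int a[5])
{
    MY_MOVE(a, a + 1, 3, int);   /* a[1..3] = old a[0..2] */
    MY_ZERO(a, 1, int);          /* clear the vacated first slot */
}
```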
a0d0e21e | 1352 | |
0cf5025f SC |
1353 | Here is a handy table of equivalents between ordinary C and Perl's |
1354 | memory abstraction layer: | |
1355 | ||
ef7adf26 JH |
1356 | Instead Of: Use: |
1357 | ||
1358 | malloc New | |
1359 | calloc Newz | |
1360 | realloc Renew | |
1361 | memcpy Copy |
1362 | memmove Move | |
1363 | free Safefree | |
1364 | strdup savepv | |
1365 | strndup savepvn (strndup is not available everywhere) |
1366 | memcpy/*(struct foo *) StructCopy | |
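The C<savepv>/C<savepvn> rows can be illustrated with a stand-alone model. The sketch assumes C<savepvn> copies exactly C<len> bytes (embedded NULs included) and then appends a NUL; C<my_savepv> and C<my_savepvn> are hypothetical, and the memory they return is released here with C<free()> rather than C<Safefree>.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of savepvn(): copy exactly len bytes into a fresh
   buffer and append a terminating NUL. */
static char *my_savepvn(const char *pv, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        fputs("Out of memory!\n", stderr);
        exit(1);
    }
    memcpy(copy, pv, len);   /* copies embedded NULs too, unlike strndup() */
    copy[len] = '\0';
    return copy;
}

/* Hypothetical model of savepv(): the strdup() counterpart. */
static char *my_savepv(const char *pv)
{
    return my_savepvn(pv, strlen(pv));
}
```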
0cf5025f | 1367 | |
5f05dabc | 1368 | =head2 PerlIO |
ce3d39e2 | 1369 | |
5f05dabc | 1370 | The most recent development releases of Perl have been experimenting with |
1371 | removing Perl's dependency on the "normal" standard I/O suite and allowing | |
1372 | other stdio implementations to be used. This involves creating a new | |
1373 | abstraction layer that then calls whichever implementation of stdio Perl | |
68dc0745 | 1374 | was compiled with. All XSUBs should now use the functions in the PerlIO |
5f05dabc | 1375 | abstraction layer and not make any assumptions about what kind of stdio |
1376 | is being used. | |
1377 | ||
1378 | For a complete description of the PerlIO abstraction, consult L<perlapio>. | |
1379 | ||
8ebc5c01 | 1380 | =head2 Putting a C value on Perl stack |
ce3d39e2 IZ |
1381 | |
1382 | A lot of opcodes (this is an elementary operation in the internal perl | |
1383 | stack machine) put an SV* on the stack. However, as an optimization | |
1384 | the corresponding SV is (usually) not recreated each time. The opcodes | |
1385 | reuse specially assigned SVs (I<target>s) which are (as a corollary) | |
1386 | not constantly freed/created. | |
1387 | ||
0a753a76 | 1388 | Each of the targets is created only once (but see |
ce3d39e2 IZ |
1389 | L<Scratchpads and recursion> below), and when an opcode needs to put |
1390 | an integer, a double, or a string on the stack, it just sets the |
1391 | corresponding parts of its I<target> and puts the I<target> on the stack. |
1392 | ||
1393 | The macro to put this target on the stack is C<PUSHTARG>, and it is |
1394 | directly used in some opcodes, as well as indirectly in zillions of | |
1395 | others, which use it via C<(X)PUSH[pni]>. | |
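A toy model may make target reuse concrete. In the sketch below, C<ToyScalar>, C<ToyStack>, C<ToyAddOp> and C<toy_pp_add> are hypothetical and far simpler than perl's real SVs and ops. The point is only that the op owns one result scalar, overwrites it on every execution and pushes that same scalar, in the spirit of C<PUSHTARG>, instead of allocating a new one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of target reuse; all types and names are hypothetical. */
typedef struct { double nv; } ToyScalar;

typedef struct {
    ToyScalar *stack[16];
    size_t sp;               /* number of items on the stack */
} ToyStack;

typedef struct {
    ToyScalar targ;          /* created once, reused by every execution */
} ToyAddOp;

/* Pop two operands, store their sum in the op's target, push the target. */
static void toy_pp_add(ToyAddOp *op, ToyStack *st)
{
    assert(st->sp >= 2);
    ToyScalar *right = st->stack[--st->sp];
    ToyScalar *left = st->stack[--st->sp];
    op->targ.nv = left->nv + right->nv;   /* no new scalar is allocated */
    st->stack[st->sp++] = &op->targ;
}
```

In this model, running the op twice leaves the same pointer on the stack both times, so a caller that wants to keep a result must copy it before the op runs again.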
1396 | ||
8ebc5c01 | 1397 | =head2 Scratchpads |
ce3d39e2 | 1398 | |
54310121 | 1399 | The question remains of when the SVs which are I<target>s for opcodes |
5f05dabc | 1400 | are created. The answer is that they are created when the current unit -- |
1401 | a subroutine or a file (for opcodes for statements outside of | |
1402 | subroutines) -- is compiled. During this time a special anonymous Perl | |
ce3d39e2 IZ |
1403 | array is created, which is called a scratchpad for the current |
1404 | unit. | |
1405 | ||
54310121 | 1406 | A scratchpad keeps SVs which are lexicals for the current unit and are |
ce3d39e2 IZ |
1407 | targets for opcodes. One can deduce that an SV lives on a scratchpad |
1408 | by looking at its flags: lexicals have C<SVs_PADMY> set, and |
1409 | I<target>s have C<SVs_PADTMP> set. | |
1410 | ||
54310121 | 1411 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
1412 | OPs in the compile tree of the unit can use the same target, if this | |
ce3d39e2 IZ |
1413 | would not conflict with the expected life of the temporary. |
1414 | ||
2ae324a7 | 1415 | =head2 Scratchpads and recursion |
ce3d39e2 IZ |
1416 | |
1417 | Strictly speaking, a compiled unit does not contain a pointer to |
1418 | the scratchpad AV. Instead, it contains a pointer to an AV of |
1419 | (initially) one element, and this element is the scratchpad AV. Why do | |
1420 | we need an extra level of indirection? | |
1421 | ||
1422 | The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both | |
1423 | these can create several execution pointers going into the same | |
1424 | subroutine. For the subroutine-child not to write over the temporaries |
1425 | for the subroutine-parent (whose lifespan covers the call to the |
1426 | child), the parent and the child should have different | |
1427 | scratchpads. (I<And> the lexicals should be separate anyway!) | |
1428 | ||
5f05dabc | 1429 | So each subroutine is born with an array of scratchpads (of length 1). |
1430 | On each entry to the subroutine it is checked that the current | |
ce3d39e2 IZ |
1431 | depth of the recursion is not more than the length of this array, and |
1432 | if it is more, a new scratchpad is created and pushed onto the array. |
1433 | ||
1434 | The I<target>s on this scratchpad are C<undef>s, but they are already | |
1435 | marked with correct flags. | |
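The growth rule can be sketched with a toy "array of pads". All names (C<ToyPad>, C<ToySub>, C<toy_sub_init>, C<toy_sub_enter>, C<toy_sub_leave>) are hypothetical; a real scratchpad is an AV, and the zero-filled slots here merely stand in for freshly created C<undef> targets.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int slots[4];            /* stand-in for lexicals and targets */
} ToyPad;

typedef struct {
    ToyPad **pads;           /* pads[d - 1] is used at recursion depth d */
    int npads;               /* length of the array, initially 1 */
    int depth;               /* current recursion depth */
} ToySub;

/* assert() stands in for real allocation-failure handling. */
static void toy_sub_init(ToySub *cv)
{
    cv->pads = malloc(sizeof *cv->pads);
    assert(cv->pads != NULL);
    cv->pads[0] = calloc(1, sizeof(ToyPad));
    assert(cv->pads[0] != NULL);
    cv->npads = 1;
    cv->depth = 0;
}

/* On entry, if the new depth exceeds the array length, push a new pad. */
static ToyPad *toy_sub_enter(ToySub *cv)
{
    cv->depth++;
    if (cv->depth > cv->npads) {
        ToyPad **grown = realloc(cv->pads, (size_t)cv->depth * sizeof *grown);
        assert(grown != NULL);
        cv->pads = grown;
        cv->pads[cv->npads] = calloc(1, sizeof(ToyPad));
        assert(cv->pads[cv->npads] != NULL);
        cv->npads++;
    }
    return cv->pads[cv->depth - 1];
}

static void toy_sub_leave(ToySub *cv)
{
    cv->depth--;             /* pads are kept for reuse by later recursion */
}
```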
1436 | ||
0a753a76 | 1437 | =head1 Compiled code |
1438 | ||
1439 | =head2 Code tree | |
1440 | ||
1441 | Here we describe the internal form your code is converted to by | |
1442 | Perl. Start with a simple example: | |
1443 | ||
1444 | $a = $b + $c; | |
1445 | ||
1446 | This is converted to a tree similar to this one: | |
1447 | ||
1448 | assign-to | |
1449 | / \ | |
1450 | + $a | |
1451 | / \ | |
1452 | $b $c | |
1453 | ||
7b8d334a | 1454 | (but slightly more complicated). This tree reflects the way Perl |
0a753a76 | 1455 | parsed your code, but has nothing to do with the execution order. |
1456 | There is an additional "thread" going through the nodes of the tree | |
1457 | which shows the order of execution of the nodes. In our simplified | |
1458 | example above it looks like: | |
1459 | ||
1460 | $b ---> $c ---> + ---> $a ---> assign-to | |
1461 | ||
1462 | But with the actual compile tree for C<$a = $b + $c> it is different: | |
1463 | some nodes are I<optimized away>. As a corollary, though the actual tree |
1464 | contains more nodes than our simplified example, the execution order | |
1465 | is the same as in our example. | |
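The separation between tree links and the execution "thread" can be sketched with a toy op structure. Every name below is hypothetical and much simpler than perl's real OP. Each node keeps its parse-tree children, while a separate C<next> pointer gives the order C<$b, $c, +, $a, assign-to>, and the runner follows only C<next>.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_VAR, T_ADD, T_ASSIGN } ToyType;

typedef struct ToyOp {
    ToyType type;
    struct ToyOp *kid[2];    /* parse-tree children; the runner ignores them */
    struct ToyOp *next;      /* the execution-order "thread" */
    double *var;             /* T_VAR: its variable; T_ADD: its target */
} ToyOp;

/* Execute by following next pointers, using a stack of value pointers. */
static void toy_run(ToyOp *op)
{
    double *stack[8];
    int sp = 0;
    for (; op != NULL; op = op->next) {
        switch (op->type) {
        case T_VAR:                       /* push the variable itself */
            stack[sp++] = op->var;
            break;
        case T_ADD: {                     /* pop two values, push the sum */
            double *right = stack[--sp];
            double *left = stack[--sp];
            *op->var = *left + *right;
            stack[sp++] = op->var;
            break;
        }
        case T_ASSIGN: {                  /* pop destination, then value */
            double *dest = stack[--sp];
            double *val = stack[--sp];
            *dest = *val;
            stack[sp++] = dest;
            break;
        }
        }
    }
    assert(sp == 1);
}

/* Build the tree for  a = b + c  and thread its execution order. */
static ToyOp *toy_build(ToyOp ops[5], double *a, double *b, double *c,
                        double *targ)
{
    ToyOp *vb = &ops[0], *vc = &ops[1], *add = &ops[2];
    ToyOp *va = &ops[3], *asg = &ops[4];
    *vb  = (ToyOp){ T_VAR,    { NULL, NULL }, vc,   b };
    *vc  = (ToyOp){ T_VAR,    { NULL, NULL }, add,  c };
    *add = (ToyOp){ T_ADD,    { vb, vc },     va,   targ };
    *va  = (ToyOp){ T_VAR,    { NULL, NULL }, asg,  a };
    *asg = (ToyOp){ T_ASSIGN, { add, va },    NULL, NULL };
    return vb;   /* execution starts at a leaf, not at the root (asg) */
}
```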
1466 | ||
1467 | =head2 Examining the tree | |
1468 | ||
1469 | If you have your perl compiled for debugging (usually done with C<-D | |
1470 | optimize=-g> on the C<Configure> command line), you may examine the |
1471 | compiled tree by specifying C<-Dx> on the Perl command line. The | |
1472 | output takes several lines per node, and for C<$b+$c> it looks like | |
1473 | this: | |
1474 | ||
1475 | 5 TYPE = add ===> 6 | |
1476 | TARG = 1 | |
1477 | FLAGS = (SCALAR,KIDS) | |
1478 | { | |
1479 | TYPE = null ===> (4) | |
1480 | (was rv2sv) | |
1481 | FLAGS = (SCALAR,KIDS) | |
1482 | { | |
1483 | 3 TYPE = gvsv ===> 4 | |
1484 | FLAGS = (SCALAR) | |
1485 | GV = main::b | |
1486 | } | |
1487 | } | |
1488 | { | |
1489 | TYPE = null ===> (5) | |
1490 | (was rv2sv) | |
1491 | FLAGS = (SCALAR,KIDS) | |
1492 | { | |
1493 | 4 TYPE = gvsv ===> 5 | |
1494 | FLAGS = (SCALAR) | |
1495 | GV = main::c | |
1496 | } | |
1497 | } | |
1498 | ||
1499 | This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are |
1500 | not optimized away (one per number in the left column). The immediate | |
1501 | children of the given node correspond to C<{}> pairs on the same level | |
1502 | of indentation, thus this listing corresponds to the tree: | |
1503 | ||
1504 | add | |
1505 | / \ | |
1506 | null null | |
1507 | | | | |
1508 | gvsv gvsv | |
1509 | ||
1510 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 | |
1511 | 4 5 6> (node C<6> is not included in the above listing), i.e., |
1512 | C<gvsv gvsv add whatever>. | |
1513 | ||
1514 | =head2 Compile pass 1: check routines | |
1515 | ||
8870b5c7 GS |
1516 | The tree is created by the compiler while I<yacc> code feeds it |
1517 | the constructions it recognizes. Since I<yacc> works bottom-up, so does | |
0a753a76 | 1518 | the first pass of perl compilation. |
1519 | ||
1520 | What makes this pass interesting for perl developers is that some | |
1521 | optimization may be performed during this pass. This is optimization by |
8870b5c7 | 1522 | so-called "check routines". The correspondence between node names |
0a753a76 | 1523 | and corresponding check routines is described in F<opcode.pl> (do not |
1524 | forget to run C<make regen_headers> if you modify this file). | |
1525 | ||
1526 | A check routine is called when the node is fully constructed except | |
7b8d334a | 1527 | for the execution-order thread. Since at this time there are no |
0a753a76 | 1528 | back-links to the currently constructed node, one can do most any |
1529 | operation to the top-level node, including freeing it and/or creating | |
1530 | new nodes above/below it. | |
1531 | ||
1532 | The check routine returns the node which should be inserted into the | |
1533 | tree (if the top-level node was not modified, the check routine returns |
1534 | its argument). | |
1535 | ||
1536 | By convention, check routines have names C<ck_*>. They are usually | |
1537 | called from C<new*OP> subroutines (or C<convert>) (which in turn are | |
1538 | called from F<perly.y>). | |
1539 | ||
1540 | =head2 Compile pass 1a: constant folding | |
1541 | ||
1542 | Immediately after the check routine is called, the returned node is |
1543 | checked for being compile-time executable. If it is (the value is | |
1544 | judged to be constant) it is immediately executed, and a I<constant> | |
1545 | node with the "return value" of the corresponding subtree is | |
1546 | substituted instead. The subtree is deleted. | |
1547 | ||
1548 | If constant folding was not performed, the execution-order thread is | |
1549 | created. | |
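Check routines and constant folding can be combined in one toy model: a constructor builds a node bottom-up, then immediately hands it to a routine that returns either the node itself or a replacement. The names C<Node>, C<new_node>, C<new_binop> and C<fold_constants> are hypothetical, this is not perl's implementation, and it only folds C<+> and C<*> on two constant children.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } NodeType;

typedef struct Node {
    NodeType type;
    double value;            /* N_CONST only */
    struct Node *kid[2];     /* N_ADD / N_MUL only */
} Node;

static Node *new_node(NodeType type, double value, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->type = type;
    n->value = value;
    n->kid[0] = l;
    n->kid[1] = r;
    return n;
}

/* If both children are constants, evaluate now, free the subtree and
   return a constant node in its place; otherwise return the node as is. */
static Node *fold_constants(Node *n)
{
    if ((n->type == N_ADD || n->type == N_MUL) &&
        n->kid[0]->type == N_CONST && n->kid[1]->type == N_CONST) {
        double l = n->kid[0]->value, r = n->kid[1]->value;
        double v = (n->type == N_ADD) ? l + r : l * r;
        free(n->kid[0]);
        free(n->kid[1]);
        free(n);
        return new_node(N_CONST, v, NULL, NULL);
    }
    return n;
}

/* Build a binary node, then let the folding step decide what to insert. */
static Node *new_binop(NodeType type, Node *l, Node *r)
{
    return fold_constants(new_node(type, 0.0, l, r));
}
```

Because arguments are built before the outer call, C<(2 + 3) * 4> folds bottom-up to a single constant C<20>, while C<x + 1> keeps its C<N_ADD> node.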
1550 | ||
1551 | =head2 Compile pass 2: context propagation | |
1552 | ||
1553 | When a context for a part of compile tree is known, it is propagated | |
a3cb178b | 1554 | down through the tree. At this time the context can have 5 values |
0a753a76 | 1555 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
1556 | lvalue. In contrast with pass 1, this pass is processed from top |
1557 | to bottom: a node's context determines the context for its children. | |
1558 | ||
1559 | Additional context-dependent optimizations are performed at this time. | |
1560 | Since at this moment the compile tree contains back-references (via | |
1561 | "thread" pointers), nodes cannot be free()d now. To allow nodes to be |
1562 | optimized away at this stage, they are null()ified instead of |
1563 | free()d (i.e. their type is changed to OP_NULL). |
1564 | ||
1565 | =head2 Compile pass 3: peephole optimization | |
1566 | ||
1567 | After the compile tree for a subroutine (or for an C<eval> or a file) | |
1568 | is created, an additional pass over the code is performed. This pass | |
1569 | is neither top-down nor bottom-up, but in the execution order (with |
7b8d334a | 1570 | additional complications for conditionals). These optimizations are |
0a753a76 | 1571 | done in the subroutine peep(). Optimizations performed at this stage |
1572 | are subject to the same restrictions as in pass 2. |
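One kind of rewrite that fits an execution-order pass is splicing null()ified ops out of the C<next> chain so the runtime loop never visits them. The sketch uses hypothetical types (C<XOp>, C<toy_peep>) and only illustrates the idea; it is not a description of what peep() actually does. Note that it relinks pointers but frees nothing, in keeping with the pass 2 restriction.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { X_NULL, X_WORK } XType;

typedef struct XOp {
    XType type;              /* X_NULL: nulled by an earlier pass */
    struct XOp *next;        /* execution-order link */
    int id;
} XOp;

/* Walk in execution order and make each op's next pointer skip any run of
   null ops.  Null ops stay allocated: tree links may still refer to them. */
static XOp *toy_peep(XOp *start)
{
    while (start != NULL && start->type == X_NULL)
        start = start->next;
    for (XOp *o = start; o != NULL; o = o->next) {
        while (o->next != NULL && o->next->type == X_NULL)
            o->next = o->next->next;
    }
    return start;            /* new first op to execute */
}
```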
1573 | ||
954c1994 | 1574 | =head1 How multiple interpreters and concurrency are supported |
ee072b34 | 1575 | |
ee072b34 GS |
1576 | =head2 Background and PERL_IMPLICIT_CONTEXT |
1577 | ||
1578 | The Perl interpreter can be regarded as a closed box: it has an API | |
1579 | for feeding it code or otherwise making it do things, but it also has | |
1580 | functions for its own use. This smells a lot like an object, and | |
1581 | there are ways for you to build Perl so that you can have multiple | |
1582 | interpreters, with one interpreter represented either as a C++ object, | |
1583 | a C structure, or inside a thread. The thread, the C structure, or | |
1584 | the C++ object will contain all the context, the state of that | |
1585 | interpreter. | |
1586 | ||
54aff467 GS |
1587 | Three macros control the major Perl build flavors: MULTIPLICITY, |
1588 | USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure | |
1589 | that packages all the interpreter state; there is a similar thread-specific |
1590 | data structure under USE_THREADS, and the PERL_OBJECT build has a C++ | |
1591 | class to maintain interpreter state. In all three cases, | |
1592 | PERL_IMPLICIT_CONTEXT is also normally defined, and enables the | |
1593 | support for passing in a "hidden" first argument that represents all three | |
651a3225 | 1594 | data structures. |
54aff467 GS |
1595 | |
1596 | All this obviously requires a way for the Perl internal functions to be | |
ee072b34 GS |
1597 | C++ methods, subroutines taking some kind of structure as the first |
1598 | argument, or subroutines taking nothing as the first argument. To | |
1599 | enable these three very different ways of building the interpreter, | |
1600 | the Perl source (as it does in so many other situations) makes heavy | |
1601 | use of macros and subroutine naming conventions. | |
1602 | ||
54aff467 | 1603 | First problem: deciding which functions will be public API functions and |
954c1994 GS |
1604 | which will be private. All functions whose names begin with C<S_> are private |
1605 | (think "S" for "secret" or "static"). All other functions begin with | |
1606 | "Perl_", but just because a function begins with "Perl_" does not mean it is | |
a422fd2d SC |
1607 | part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a |
1608 | function is part of the API is to find its entry in L<perlapi>. | |
1609 | If it exists in L<perlapi>, it's part of the API. If it doesn't, and you | |
1610 | think it should be (i.e., you need it for your extension), send mail via | |
1611 | L<perlbug> explaining why you think it should be. | |
ee072b34 GS |
1612 | |
1613 | Second problem: there must be a syntax so that the same subroutine | |
1614 | declarations and calls can pass a structure as their first argument, | |
1615 | or pass nothing. To solve this, the subroutines are named and | |
1616 | declared in a particular way. Here's a typical start of a static | |
1617 | function used within the Perl guts: | |
1618 | ||
1619 | STATIC void | |
1620 | S_incline(pTHX_ char *s) | |
1621 | ||
1622 | STATIC becomes "static" in C, and is #define'd to nothing in C++. | |
1623 | ||
651a3225 GS |
1624 | A public function (i.e. part of the internal API, but not necessarily |
1625 | sanctioned for use in extensions) begins like this: | |
ee072b34 GS |
1626 | |
1627 | void | |
1628 | Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv) | |
1629 | ||
1630 | C<pTHX_> is one of a number of macros (in perl.h) that hide the | |
1631 | details of the interpreter's context. THX stands for "thread", "this", | |
1632 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) | |
1633 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, | |
1634 | or 'd' for B<d>eclaration. | |
1635 | ||
1636 | When Perl is built without PERL_IMPLICIT_CONTEXT, there is no first | |
1637 | argument containing the interpreter's context. The trailing underscore | |
1638 | in the pTHX_ macro indicates that the macro expansion needs a comma | |
1639 | after the context argument because other arguments follow it. If | |
1640 | PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the | |
54aff467 GS |
1641 | subroutine is not prototyped to take the extra argument. The form of the |
1642 | macro without the trailing underscore is used when there are no additional | |
ee072b34 GS |
1643 | explicit arguments. |
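The trailing-underscore convention can be reproduced outside perl with a handful of macros. This is a stand-alone analogue, not the definitions in perl.h: C<MY_pTHX>, C<MY_pTHX_>, C<MY_aTHX>, C<MY_aTHX_>, C<MY_STATE>, C<MY_IMPLICIT_CONTEXT>, C<ToyInterp> and the C<Toy_*> functions are all hypothetical. Deleting the C<#define MY_IMPLICIT_CONTEXT> line switches every prototype and call to the context-free form without touching the function bodies.

```c
#include <assert.h>

/* Remove this line for the flavour with a single global state. */
#define MY_IMPLICIT_CONTEXT

typedef struct { int line; } ToyInterp;   /* hypothetical interpreter state */

#ifdef MY_IMPLICIT_CONTEXT
#  define MY_pTHX   ToyInterp *my_perl
#  define MY_pTHX_  MY_pTHX,              /* trailing comma: more args follow */
#  define MY_aTHX   my_perl
#  define MY_aTHX_  MY_aTHX,
#  define MY_STATE  (my_perl->line)
#else
static ToyInterp toy_global;
#  define MY_pTHX   void
#  define MY_pTHX_                        /* expands to nothing */
#  define MY_aTHX
#  define MY_aTHX_
#  define MY_STATE  (toy_global.line)
#endif

/* Written once; compiles under both flavours. */
static void Toy_incline(MY_pTHX_ int by)
{
    MY_STATE += by;
}

static int Toy_get_line(MY_pTHX)
{
    return MY_STATE;
}

/* A function that has the context passes it along explicitly. */
static int Toy_bump_twice(MY_pTHX)
{
    Toy_incline(MY_aTHX_ 1);
    Toy_incline(MY_aTHX_ 1);
    return Toy_get_line(MY_aTHX);
}
```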
1644 | ||
54aff467 | 1645 | When a core function calls another, it must pass the context. This |
ee072b34 GS |
1646 | is normally hidden via macros. Consider C<sv_setsv>. It expands |
1647 | something like this: | |
1648 | ||
1649 | #ifdef PERL_IMPLICIT_CONTEXT |
1650 | #define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b) |
1651 | /* can't do this for vararg functions, see below */ |
1652 | #else |
1653 | #define sv_setsv Perl_sv_setsv |
1654 | #endif |
1655 | ||
1656 | This works well, and means that XS authors can gleefully write: | |
1657 | ||
1658 | sv_setsv(foo, bar); | |
1659 | ||
1660 | and still have it work under all the modes Perl could have been | |
1661 | compiled with. | |
1662 | ||
1663 | Under PERL_OBJECT in the core, that will translate to either: | |
1664 | ||
1665 | CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions, | |
1666 | # C++ takes care of 'this' | |
1667 | or | |
1668 | ||
1669 | pPerl->Perl_sv_setsv(foo,bar); # in truly static functions, | |
1670 | # see objXSUB.h | |
1671 | ||
1672 | Under PERL_OBJECT in extensions (aka PERL_CAPI), or under | |
1673 | MULTIPLICITY/USE_THREADS w/ PERL_IMPLICIT_CONTEXT in both core | |
1674 | and extensions, it will be: | |
1675 | ||
1676 | Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API" | |
1677 | # for all build flavors | |
1678 | ||
1679 | This doesn't work so cleanly for varargs functions, though, as macros | |
1680 | imply that the number of arguments is known in advance. Instead we | |
1681 | either need to spell them out fully, passing C<aTHX_> as the first | |
1682 | argument (the Perl core tends to do this with functions like | |
1683 | Perl_warner), or use a context-free version. | |
1684 | ||
1685 | The context-free version of Perl_warner is called | |
1686 | Perl_warner_nocontext, and does not take the extra argument. Instead | |
1687 | it does dTHX; to get the context from thread-local storage. We | |
1688 | C<#define warner Perl_warner_nocontext> so that extensions get source | |
1689 | compatibility at the expense of performance. (Passing an arg is | |
1690 | cheaper than grabbing it from thread-local storage.) | |
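The context-free variant can be sketched with C11 thread-local storage. C<ToyInterp>, C<toy_current>, C<toy_set_context>, C<Toy_warner>, C<Toy_warner_nocontext> and the C<toy_warner> macro are hypothetical stand-ins for the perl names in the text, and unlike the real functions they merely store the formatted message in a buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef struct { char last_warning[128]; } ToyInterp;   /* hypothetical */

/* Per-thread "current interpreter", standing in for what dTHX fetches. */
static _Thread_local ToyInterp *toy_current;

static void toy_set_context(ToyInterp *i) { toy_current = i; }
static ToyInterp *toy_get_context(void)  { return toy_current; }

static void Toy_vwarner(ToyInterp *my_perl, const char *pat, va_list ap)
{
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning, pat, ap);
}

/* Explicit-context variadic function: the context is the first argument. */
static void Toy_warner(ToyInterp *my_perl, const char *pat, ...)
{
    va_list ap;
    va_start(ap, pat);
    Toy_vwarner(my_perl, pat, ap);
    va_end(ap);
}

/* Context-free variant: no extra argument; the context is looked up in
   thread-local storage, which costs a little more than receiving it. */
static void Toy_warner_nocontext(const char *pat, ...)
{
    ToyInterp *my_perl = toy_get_context();   /* the dTHX step */
    assert(my_perl != NULL);
    va_list ap;
    va_start(ap, pat);
    Toy_vwarner(my_perl, pat, ap);
    va_end(ap);
}

/* Short, context-free name for extension code. */
#define toy_warner Toy_warner_nocontext
```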
1691 | ||
1692 | You can ignore [pad]THX[xo] when browsing the Perl headers/sources. | |
1693 | Those are strictly for use within the core. Extensions and embedders | |
1694 | need only be aware of [pad]THX. | |
1695 | ||
1696 | =head2 How do I use all this in extensions? | |
1697 | ||
1698 | When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call | |
1699 | any functions in the Perl API will need to pass the initial context | |
1700 | argument somehow. The kicker is that you will need to write it in | |
1701 | such a way that the extension still compiles when Perl hasn't been | |
1702 | built with PERL_IMPLICIT_CONTEXT enabled. | |
1703 | ||
1704 | There are three ways to do this. First, the easy but inefficient way, | |
1705 | which is also the default, in order to maintain source compatibility | |
1706 | with extensions: whenever XSUB.h is #included, it redefines the aTHX | |
1707 | and aTHX_ macros to call a function that will return the context. | |
1708 | Thus, something like: | |
1709 | ||
1710 | sv_setsv(asv, bsv); | |
1711 | ||
4375e838 | 1712 | in your extension will translate to this when PERL_IMPLICIT_CONTEXT is |
54aff467 | 1713 | in effect: |
ee072b34 | 1714 | |
2fa86c13 | 1715 | Perl_sv_setsv(Perl_get_context(), asv, bsv); |
ee072b34 | 1716 | |
54aff467 | 1717 | or to this otherwise: |
ee072b34 GS |
1718 | |
1719 | Perl_sv_setsv(asv, bsv); | |
1720 | ||
1721 | You have to do nothing new in your extension to get this; since | |
2fa86c13 | 1722 | the Perl library provides Perl_get_context(), it will all just |
ee072b34 GS |
1723 | work. |
1724 | ||
1725 | The second, more efficient way is to use the following template for | |
1726 | your Foo.xs: | |
1727 | ||
1728 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ | |
1729 | #include "EXTERN.h" | |
1730 | #include "perl.h" | |
1731 | #include "XSUB.h" | |
1732 | ||
1733 | static SV *my_private_function(int arg1, int arg2); |
1734 | ||
1735 | static SV * | |
54aff467 | 1736 | my_private_function(int arg1, int arg2) |
ee072b34 GS |
1737 | { |
1738 | dTHX; /* fetch context */ | |
1739 | ... call many Perl API functions ... | |
1740 | } | |
1741 | ||
1742 | [... etc ...] | |
1743 | ||
1744 | MODULE = Foo PACKAGE = Foo | |
1745 | ||
1746 | /* typical XSUB */ | |
1747 | ||
1748 | void | |
1749 | my_xsub(arg) | |
1750 | int arg | |
1751 | CODE: | |
1752 | my_private_function(arg, 10); | |
1753 | ||
1754 | Note that the only two changes from the normal way of writing an | |
1755 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before |
1756 | including the Perl headers, followed by a C<dTHX;> declaration at | |
1757 | the start of every function that will call the Perl API. (You'll | |
1758 | know which functions need this, because the C compiler will complain | |
1759 | that there's an undeclared identifier in those functions.) No changes | |
1760 | are needed for the XSUBs themselves, because the XS() macro is | |
1761 | correctly defined to pass in the implicit context if needed. | |
1762 | ||
1763 | The third, even more efficient way is to ape how it is done within | |
1764 | the Perl guts: | |
1765 | ||
1766 | ||
1767 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ | |
1768 | #include "EXTERN.h" | |
1769 | #include "perl.h" | |
1770 | #include "XSUB.h" | |
1771 | ||
1772 | /* pTHX_ only needed for functions that call Perl API */ | |
1773 | static SV *my_private_function(pTHX_ int arg1, int arg2); |
1774 | ||
1775 | static SV * | |
1776 | my_private_function(pTHX_ int arg1, int arg2) | |
1777 | { | |
1778 | /* dTHX; not needed here, because THX is an argument */ | |
1779 | ... call Perl API functions ... | |
1780 | } | |
1781 | ||
1782 | [... etc ...] | |
1783 | ||
1784 | MODULE = Foo PACKAGE = Foo | |
1785 | ||
1786 | /* typical XSUB */ | |
1787 | ||
1788 | void | |
1789 | my_xsub(arg) | |
1790 | int arg | |
1791 | CODE: | |
1792 | my_private_function(aTHX_ arg, 10); | |
1793 | ||
1794 | This implementation never has to fetch the context using a function | |
1795 | call, since it is always passed as an extra argument. Depending on | |
1796 | your needs for simplicity or efficiency, you may mix the previous | |
1797 | two approaches freely. | |
1798 | ||
651a3225 GS |
1799 | Never add a comma after C<pTHX> yourself; always use the form of the |
1800 | macro with the underscore for functions that take explicit arguments, |
1801 | or the form without the underscore for functions with no explicit arguments. |
ee072b34 GS |
1802 | |
1803 | =head2 Future Plans and PERL_IMPLICIT_SYS | |
1804 | ||
1805 | Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything | |
1806 | that the interpreter knows about itself and pass it around, so too are | |
1807 | there plans to allow the interpreter to bundle up everything it knows | |
1808 | about the environment it's running on. This is enabled with the | |
1809 | PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT, | |
1810 | but is mostly there for MULTIPLICITY and USE_THREADS (see inside | |
1811 | iperlsys.h). | |
1812 | ||
1813 | This makes it possible to provide an extra pointer (called the "host" |
1814 | environment) for all the system calls, which in turn lets all the |
1815 | system-level code maintain its own state, broken down into |
1816 | seven C structures. These are thin wrappers around the usual system | |
1817 | calls (see win32/perllib.c) for the default perl executable, but for a | |
1818 | more ambitious host (like the one that would do fork() emulation) all | |
1819 | the extra work needed to pretend that different interpreters are | |
1820 | actually different "processes" would be done here. |
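A "host" can be pictured as a table of function pointers plus host-owned state that the interpreter calls instead of the C library. Everything below (C<ToyHost>, C<ToyInterp>, C<default_get_env>, C<private_get_env>, C<toy_interp_getenv>) is hypothetical and far smaller than the seven structures described above. It shows only how the same interpreter code can run against a thin pass-through host or a host that keeps a private environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A table of system services plus per-host state. */
typedef struct ToyHost {
    const char *(*get_env)(struct ToyHost *host, const char *name);
    void *state;
} ToyHost;

typedef struct {
    ToyHost *host;           /* the extra pointer carried by the interpreter */
} ToyInterp;

/* Default host: a thin wrapper around the real system call. */
static const char *default_get_env(ToyHost *host, const char *name)
{
    (void)host;
    return getenv(name);
}

/* A more ambitious host: each "process" sees its own environment,
   stored as a NULL-terminated list of "NAME=value" strings. */
static const char *private_get_env(ToyHost *host, const char *name)
{
    const char *const *env = host->state;
    size_t n = strlen(name);
    for (; *env != NULL; env++)
        if (strncmp(*env, name, n) == 0 && (*env)[n] == '=')
            return *env + n + 1;
    return NULL;
}

/* Interpreter code: identical whichever host it is attached to. */
static const char *toy_interp_getenv(ToyInterp *interp, const char *name)
{
    return interp->host->get_env(interp->host, name);
}
```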
1821 | ||
1822 | The Perl engine/interpreter and the host are orthogonal entities. | |
1823 | There could be one or more interpreters in a process, and one or | |
1824 | more "hosts", with free association between them. | |
1825 | ||
a422fd2d SC |
1826 | =head1 Internal Functions |
1827 | ||
1828 | All of Perl's internal functions which will be exposed to the outside | |
1829 | world are prefixed by C<Perl_> so that they will not conflict with XS |
1830 | functions or functions used in a program in which Perl is embedded. |
1831 | Similarly, all global variables begin with C<PL_>. (By convention, |
1832 | static functions start with C<S_>.) |
1833 | ||
1834 | Inside the Perl core, you can get at the functions either with or | |
1835 | without the C<Perl_> prefix, thanks to a bunch of defines that live in | |
1836 | F<embed.h>. This header file is generated automatically from | |
1837 | F<embed.pl>. F<embed.pl> also creates the prototyping header files for | |
1838 | the internal functions, generates the documentation and a lot of other | |
1839 | bits and pieces. It's important that when you add a new function to the | |
1840 | core or change an existing one, you change the data in the table at the | |
1841 | end of F<embed.pl> as well. Here's a sample entry from that table: | |
1842 | ||
1843 | Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval | |
1844 | ||
1845 | The second column is the return type, the third column the name. Columns | |
1846 | after that are the arguments. The first column is a set of flags: | |
1847 | ||
1848 | =over 3 | |
1849 | ||
1850 | =item A | |
1851 | ||
1852 | This function is a part of the public API. | |
1853 | ||
1854 | =item p | |
1855 | ||
1856 | This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>. |
1857 | ||
1858 | =item d | |
1859 | ||
1860 | This function has documentation using the C<apidoc> feature which we'll | |
1861 | look at in a second. | |
1862 | ||
1863 | =back | |
1864 | ||
1865 | Other available flags are: | |
1866 | ||
1867 | =over 3 | |
1868 | ||
1869 | =item s | |
1870 | ||
1871 | This is a static function and is defined as C<S_whatever>. | |
1872 | ||
1873 | =item n | |
1874 | ||
1875 | This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See | |
1876 | L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.) | |
1877 | ||
1878 | =item r | |
1879 | ||
1880 | This function never returns, like C<croak>, C<exit> and friends. |
1881 | ||
1882 | =item f | |
1883 | ||
1884 | This function takes a variable number of arguments, C<printf> style. | |
1885 | The argument list should end with C<...>, like this: | |
1886 | ||
1887 | Afprd |void |croak |const char* pat|... | |
1888 | ||
1889 | =item m | |
1890 | ||
1891 | This function is part of the experimental development API, and may change | |
1892 | or disappear without notice. | |
1893 | ||
1894 | =item o | |
1895 | ||
1896 | This function should not have a compatibility macro to define, say, | |
1897 | C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>. | |
1898 | ||
1899 | =item j | |
1900 | ||
1901 | This function is not a member of C<CPerlObj>. If you don't know | |
1902 | what this means, don't use it. | |
1903 | ||
1904 | =item x | |
1905 | ||
1906 | This function isn't exported out of the Perl core. | |
1907 | ||
1908 | =back | |
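To see how the flags shape the generated declarations, here is a small C model that turns one table line into a prototype string. It is an illustration under stated assumptions, not F<embed.pl> itself (which is a Perl script and generates much more): it honours only the C<s>, C<p> and C<n> flags, and C<embed_prototype>, C<trim_copy> and C<append> are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the field [start, end) into out, trimming surrounding spaces. */
static void trim_copy(char *out, size_t cap, const char *start, const char *end)
{
    while (start < end && *start == ' ')
        start++;
    while (end > start && end[-1] == ' ')
        end--;
    size_t len = (size_t)(end - start);
    if (len >= cap)
        len = cap - 1;
    memcpy(out, start, len);
    out[len] = '\0';
}

/* Append s to the NUL-terminated string in out, never overflowing cap. */
static void append(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < cap)
        snprintf(out + used, cap - used, "%s", s);
}

/* Returns 0 on success, -1 if there are fewer than three fields
   (flags, return type, name). */
static int embed_prototype(const char *line, char *out, size_t cap)
{
    char field[8][64];
    int nf = 0;
    for (const char *p = line; nf < 8; ) {
        const char *bar = strchr(p, '|');
        const char *end = bar ? bar : p + strlen(p);
        trim_copy(field[nf++], sizeof field[0], p, end);
        if (bar == NULL)
            break;
        p = bar + 1;
    }
    if (nf < 3 || cap == 0)
        return -1;

    const char *flags = field[0];
    int nargs = nf - 3;
    int has_ctx = strchr(flags, 'n') == NULL;   /* n: no interpreter context */

    out[0] = '\0';
    append(out, cap, field[1]);                  /* return type */
    append(out, cap, " ");
    if (strchr(flags, 's'))
        append(out, cap, "S_");                  /* static function */
    else if (strchr(flags, 'p'))
        append(out, cap, "Perl_");               /* Perl_ prefix */
    append(out, cap, field[2]);                  /* function name */
    append(out, cap, "(");
    if (has_ctx)
        append(out, cap, nargs > 0 ? "pTHX_ " : "pTHX");
    for (int i = 0; i < nargs; i++) {
        if (i > 0)
            append(out, cap, ", ");
        append(out, cap, field[3 + i]);
    }
    if (!has_ctx && nargs == 0)
        append(out, cap, "void");
    append(out, cap, ")");
    return 0;
}
```

For the C<av_fetch> entry shown earlier this produces C<SV** Perl_av_fetch(pTHX_ AV* ar, I32 key, I32 lval)>.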
1909 | ||
1910 | If you edit F<embed.pl>, you will need to run C<make regen_headers> to | |
1911 | force a rebuild of F<embed.h> and other auto-generated files. | |
1912 | ||
6b4667fc | 1913 | =head2 Formatted Printing of IVs, UVs, and NVs |
9dd9db0b | 1914 | |
6b4667fc A |
1915 | If you are printing IVs, UVs, or NVs, instead of the stdio(3)-style |
1916 | formatting codes like C<%d>, C<%ld>, C<%f>, you should use the |
1917 | following macros for portability: |
9dd9db0b JH |
1918 | |
1919 | IVdf IV in decimal | |
1920 | UVuf UV in decimal | |
1921 | UVof UV in octal | |
1922 | UVxf UV in hexadecimal | |
6b4667fc A |
1923 | NVef NV %e-like |
1924 | NVff NV %f-like | |
1925 | NVgf NV %g-like | |
9dd9db0b | 1926 | |
6b4667fc A |
1927 | These will take care of 64-bit integers and long doubles. |
1928 | For example: | |
1929 | ||
1930 | printf("IV is %"IVdf"\n", iv); | |
1931 | ||
1932 | The IVdf will expand to whatever is the correct format for the IVs. | |
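Outside perl, C99's F<inttypes.h> offers the same idea: a format fragment chosen to match an integer type and spliced into the format string by string-literal concatenation. The sketch assumes hypothetical C<MyIV>/C<MyUV> types defined as C<intmax_t>/C<uintmax_t> and hypothetical C<MyIVdf>-style macros; perl's real IV and UV types and their format macros are chosen at Configure time and need not match these.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins: pick the types once, then define matching
   format fragments, the way IVdf/UVuf/UVof/UVxf are used in perl. */
typedef intmax_t  MyIV;
typedef uintmax_t MyUV;
#define MyIVdf PRIdMAX
#define MyUVuf PRIuMAX
#define MyUVof PRIoMAX
#define MyUVxf PRIxMAX

/* "IV is %" MyIVdf becomes one literal such as "IV is %jd". */
static int format_iv(char *buf, size_t cap, MyIV iv)
{
    return snprintf(buf, cap, "IV is %" MyIVdf, iv);
}

static int format_uv_hex(char *buf, size_t cap, MyUV uv)
{
    return snprintf(buf, cap, "0x%" MyUVxf, uv);
}
```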
9dd9db0b | 1933 | |
8908e76d JH |
1934 | If you are printing addresses of pointers, use UVxf combined |
1935 | with PTR2UV(); do not use %lx or %p. |
1936 | ||
1937 | =head2 Pointer-To-Integer and Integer-To-Pointer | |
1938 | ||
1939 | Because pointer size does not necessarily equal integer size, | |
1940 | use the following macros to convert between pointers and integers. |
1941 | ||
1942 | PTR2UV(pointer) | |
1943 | PTR2IV(pointer) | |
1944 | PTR2NV(pointer) | |
1945 | INT2PTR(pointertotype, integer) | |
1946 | ||
1947 | For example: | |
1948 | ||
1949 | IV iv = ...; | |
1950 | SV *sv = INT2PTR(SV*, iv); | |
1951 | ||
1952 | and | |
1953 | ||
1954 | AV *av = ...; | |
1955 | UV uv = PTR2UV(av); | |
1956 | ||
a422fd2d SC |
1957 | =head2 Source Documentation |
1958 | ||
1959 | There's an effort going on to document the internal functions and | |
1960 | automatically produce reference manuals from them; L<perlapi> is one |
1961 | such manual which details all the functions which are available to XS | |
1962 | writers. L<perlintern> is the autogenerated manual for the functions | |
1963 | which are not part of the API and are supposedly for internal use only. | |
1964 | ||
1965 | Source documentation is created by putting POD comments into the C | |
1966 | source, like this: | |
1967 | ||
1968 | /* | |
1969 | =for apidoc sv_setiv | |
1970 | ||
1971 | Copies an integer into the given SV. Does not handle 'set' magic. See | |
1972 | C<sv_setiv_mg>. | |
1973 | ||
1974 | =cut | |
1975 | */ | |
1976 | ||
1977 | Please try and supply some documentation if you add functions to the | |
1978 | Perl core. | |
1979 | ||
=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF8. UTF8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF8 string?

You can't. This is because UTF8 data is stored in bytes just like
non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking; this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF8 characters. However, a string can be valid UTF8
by accident, so it can't make the decision for you. On a
character-by-character basis, C<is_utf8_char> will tell you whether the
current character in a string is valid UTF8.

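To see why validity alone can't settle the question, here is a minimal structural check in standalone C. C<looks_like_utf8()> is a hypothetical helper written for illustration, not Perl's C<is_utf8_string>, and it is deliberately simplified: it checks lead and continuation byte patterns but does not reject every overlong form or surrogate code point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Perl's is_utf8_string(): checks only that
 * lead and continuation bytes fit the UTF-8 patterns. */
bool looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t n;
        if (c < 0x80)                    n = 1;  /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 2;  /* 110xxxxx    */
        else if (c >= 0xE0 && c <= 0xEF) n = 3;  /* 1110xxxx    */
        else if (c >= 0xF0 && c <= 0xF4) n = 4;  /* 11110xxx    */
        else                             return false;
        if (n > len - i)
            return false;            /* sequence runs past the end */
        for (size_t k = 1; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;        /* not a 10xxxxxx continuation */
        i += n;
    }
    return true;
}
```

The bytes 195 and 136 pass this check whether they were meant as one UTF8 character or as two separate 8-bit characters, which is exactly the ambiguity described above.
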
=head2 How does UTF8 represent Unicode characters?

As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.

Assuming you know you're dealing with a UTF8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

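The length of a sequence is determined entirely by the high bits of its first byte. This standalone sketch shows that rule; C<utf8_skip()> is a hypothetical function modelled on what C<UTF8SKIP> reports, not its actual definition, and its handling of invalid lead bytes is an assumption made so callers always make progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function modelled on UTF8SKIP: returns the length of
 * the UTF-8 sequence that starts with lead byte c. Stray continuation
 * bytes and invalid lead bytes are reported as length 1. */
size_t utf8_skip(unsigned char c)
{
    if (c < 0x80) return 1;   /* 0xxxxxxx: one byte, like ASCII */
    if (c < 0xC0) return 1;   /* 10xxxxxx: stray continuation byte */
    if (c < 0xE0) return 2;   /* 110xxxxx */
    if (c < 0xF0) return 3;   /* 1110xxxx */
    if (c < 0xF8) return 4;   /* 11110xxx */
    return 1;                 /* not a valid UTF-8 lead byte */
}
```

Applied to the string in the example above, it gives 2 for C<\305> and 3 for C<\340>.
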
Another way to skip over characters in a UTF8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.

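If you need a hop that cannot run off the end of the buffer, you can pass the end pointer explicitly. The sketch below is a hypothetical, forward-only helper, C<utf8_hop_checked()>, not Perl's C<utf8_hop>; it assumes lead bytes are well formed and simply stops at C<end>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checked, forward-only hop: advances s by up to
 * n characters but never past end. Returns the new position; *hopped
 * (if non-NULL) receives how many characters were actually skipped. */
const unsigned char *utf8_hop_checked(const unsigned char *s,
                                      const unsigned char *end,
                                      size_t n, size_t *hopped)
{
    size_t done = 0;
    assert(s <= end);
    while (done < n && s < end) {
        unsigned char c = *s;
        /* sequence length from the lead byte (simplified, unvalidated) */
        size_t len = c < 0xC0 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        if (len > (size_t)(end - s))
            len = (size_t)(end - s);   /* truncated sequence: stop at end */
        s += len;
        done++;
    }
    if (hopped)
        *hopped = done;
    return s;
}
```
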
All bytes in a multi-byte UTF8 character will have the high bit set, so
you can test if you need to do something special with this character
like this:

    UV uv;

    if (*utf & 0x80)
        /* Must treat this as UTF8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

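Decoding means stripping the length marker from the lead byte and then appending six payload bits from each continuation byte. This is a minimal standalone sketch of that arithmetic; C<decode_utf8()> and C<my_uv> are hypothetical names, not Perl's C<utf8_to_uv>, and the function assumes its input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t my_uv;  /* hypothetical stand-in for Perl's UV */

/* Hypothetical decoder in the spirit of utf8_to_uv(): decodes the
 * character at s (assumed well-formed UTF-8) and stores its byte
 * length in *retlen. No validation is done. */
my_uv decode_utf8(const unsigned char *s, size_t *retlen)
{
    unsigned char c = s[0];
    size_t len;
    my_uv uv;

    if (!(c & 0x80)) {            /* high bit clear: a plain byte */
        *retlen = 1;
        return c;
    }
    if (c < 0xE0)      { len = 2; uv = c & 0x1F; }  /* 110xxxxx  */
    else if (c < 0xF0) { len = 3; uv = c & 0x0F; }  /* 1110xxxx  */
    else               { len = 4; uv = c & 0x07; }  /* 11110xxx  */

    for (size_t i = 1; i < len; i++)
        uv = (uv << 6) | (s[i] & 0x3F);   /* 6 payload bits per byte */

    *retlen = len;
    return uv;
}
```

For C<v195.136> this computes C<(0x03 E<lt>E<lt> 6) | 0x08>, which is 200.
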
You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF8:

    if (uv >= 0x80)
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

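Encoding runs the same bit layout in reverse. The sketch below is a standalone illustration, not Perl's C<uv_to_utf8>; C<encode_utf8()> and C<my_uv> are hypothetical names, and the caller is assumed to supply at least four bytes of space.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t my_uv;  /* hypothetical stand-in for Perl's UV */

/* Hypothetical encoder in the spirit of uv_to_utf8(): writes uv as
 * UTF-8 at d and returns the position just past the written bytes.
 * The caller must supply at least 4 bytes of space. */
unsigned char *encode_utf8(unsigned char *d, my_uv uv)
{
    if (uv < 0x80) {                 /* one byte, like ASCII */
        *d++ = (unsigned char)uv;
    } else if (uv < 0x800) {         /* two bytes: 110xxxxx 10xxxxxx */
        *d++ = (unsigned char)(0xC0 | (uv >> 6));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    } else if (uv < 0x10000) {       /* three bytes from 2048 upward */
        *d++ = (unsigned char)(0xE0 | (uv >> 12));
        *d++ = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    } else {                         /* four bytes, up to 0x10FFFF */
        assert(uv <= 0x10FFFF);
        *d++ = (unsigned char)(0xF0 | (uv >> 18));
        *d++ = (unsigned char)(0x80 | ((uv >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (uv & 0x3F));
    }
    return d;
}
```

Note that 0x80 itself already needs two bytes (C<v194.128>), so only values below 0x80 may be stored as a single byte.
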
You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF8 and non-UTF8
characters. You may not skip over UTF8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF8 characters;
for instance, if your UTF8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF8 string.
So don't do that!

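Matching therefore means decoding each UTF8 character to its code point and comparing that with the corresponding byte of the non-UTF8 string. This standalone sketch shows the idea; C<utf8_eq_bytes()> is a hypothetical helper, not a Perl API function, and it assumes the UTF8 input is well formed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compares a UTF-8 buffer with a non-UTF8 buffer
 * (one byte per character, values 0-255) by decoding each UTF-8
 * character to its code point instead of skipping it.
 * Assumes the UTF-8 input is well formed. */
bool utf8_eq_bytes(const unsigned char *u, size_t ulen,
                   const unsigned char *b, size_t blen)
{
    size_t i = 0, j = 0;
    while (i < ulen && j < blen) {
        unsigned char c = u[i];
        unsigned long cp;
        size_t len;
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        if (len > ulen - i)
            return false;                  /* truncated sequence */
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (u[i + k] & 0x3F);
        if (cp != b[j])                    /* compare as code points */
            return false;
        i += len;
        j++;
    }
    return i == ulen && j == blen;
}
```
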
=head2 How does Perl store UTF8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results (for example, C<length> will count bytes instead of
characters).

The problem comes when you have, for instance, a string that isn't
flagged as UTF8, and contains a byte sequence that could be UTF8,
especially when combining non-UTF8 and UTF8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF8 data, so that it can handle the string
appropriately.

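The rule "copy the flag along with the bytes" can be modelled outside Perl with a toy structure. In this sketch C<struct toy_sv> and C<toy_sv_copy()> are hypothetical and only mimic the relationship between a PV and its C<SVf_UTF8> flag; they are not Perl data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy stand-in for an SV: a byte buffer plus a separate
 * UTF-8 flag, mirroring how SVf_UTF8 lives beside the PV. */
struct toy_sv {
    char  *pv;
    size_t len;
    bool   utf8;
};

/* Copy a toy_sv, carrying the flag along with the bytes; this is the
 * counterpart of the SvUTF8()/SvUTF8_on() check shown earlier.
 * Returns false if allocation fails. */
bool toy_sv_copy(struct toy_sv *dst, const struct toy_sv *src)
{
    dst->pv = malloc(src->len + 1);
    if (dst->pv == NULL)
        return false;
    memcpy(dst->pv, src->pv, src->len);
    dst->pv[src->len] = '\0';
    dst->len  = src->len;
    dst->utf8 = src->utf8;   /* dropping this line would lose the flag */
    return true;
}
```
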
=head2 How do I convert a string to UTF8?

If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
to upgrade one of the strings to UTF8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems, since the caller's SV has been
modified behind its back.

Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

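The two directions can be sketched in standalone C. C<latin1_to_utf8_copy()> and C<utf8_to_latin1_copy()> are hypothetical helpers that only illustrate the idea behind C<bytes_to_utf8> and C<utf8_to_bytes>; their signatures are not Perl's. Both return a new buffer and leave the input untouched, and the downgrade assumes well-formed UTF8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return a newly malloc'd UTF-8 copy of len one-byte characters
 * (values 0-255). *outlen receives the new length. The input is left
 * untouched. Returns NULL on allocation failure. */
unsigned char *latin1_to_utf8_copy(const unsigned char *s, size_t len,
                                   size_t *outlen)
{
    unsigned char *out = malloc(2 * len + 1);  /* worst case: 2 bytes each */
    unsigned char *d = out;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *outlen = (size_t)(d - out);
    return out;
}

/* Return a newly malloc'd one-byte-per-character copy of well-formed
 * UTF-8 input, or NULL if a character needs more than one byte (is
 * above 255), a byte is not a usable lead byte, or allocation fails.
 * Only lead bytes 0xC2 and 0xC3 encode characters 128..255. */
unsigned char *utf8_to_latin1_copy(const unsigned char *s, size_t len,
                                   size_t *outlen)
{
    unsigned char *out = malloc(len + 1);      /* never grows */
    size_t j = 0;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            out[j++] = s[i];
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < len) {
            out[j++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i++;                               /* consumed two bytes */
        } else {
            free(out);                         /* cannot fit in one byte */
            return NULL;
        }
    }
    out[j] = '\0';
    *outlen = j;
    return out;
}
```
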
=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a string is UTF8 or not. You can tell if an SV
is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF8. Treat the flag as part of the PV, even though
it's not; if you pass on the PV to somewhere, pass on the flag too.

=item *

If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
unless C<!(*s & 0x80)> in which case you can use C<*s>.

=item *

When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
C<uv < 0x80> in which case you can use C<*s = uv>.

=item *

Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF8 string until you get to a
high character; C<HALF_UPGRADE> is one of those.

=back

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
<okamoto@corp.hp.com>. It is now maintained as part of Perl itself
by the Perl 5 Porters <perl5-porters@perl.org>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich <roehrich@cray.com>.

Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
Stuhl.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)