=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
containing some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.)

=head2 Working with SVs

An SV can be created and loaded with one command. There are four types of
values that can be loaded: an integer value (IV), a double (NV),
a string (PV), and another scalar (SV).

The six routines are:

    SV*  newSViv(IV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, int);
    SV*  newSVpvn(const char*, int);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, int);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.

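The difference between an explicit length and one measured with C<strlen>
can be seen without the Perl API at all. The following is a minimal sketch
in plain C, not Perl code: C<measured_length> and C<payload> are hypothetical
names introduced only for illustration, and the helper merely mimics the
"length 0 means call C<strlen>" convention described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Perl API): mimics the convention
 * described above, where a length of 0 means "measure with strlen". */
size_t measured_length(const char *buf, size_t len)
{
    return len ? len : strlen(buf);
}

/* A 5-byte payload with an embedded NUL: 'a', 'b', '\0', 'c', 'd'.
 * The compiler appends one more NUL, so the array holds 6 bytes. */
const char payload[] = "ab\0cd";
```

With these definitions, C<measured_length(payload, 0)> stops at the embedded
NUL, whereas C<measured_length(payload, 5)> keeps all five bytes. This is the
reason to prefer the explicit-length routines for data that may contain NULs.
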
The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. The order in which function arguments are evaluated is unspecified,
so C<len> may be read before C<SvPV> has set it. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char * ptr;
    ptr = SvPV(s, len);
    foo(ptr, len);

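The same hazard can be reproduced with a toy macro in plain C. This is only
a sketch under stated assumptions: C<struct toy_sv>, C<TOY_PV>, C<foo> and
C<call_foo_safely> are hypothetical names, and the struct is not Perl's real
SV layout. What matters is that the macro, like C<SvPV>, assigns to its
second argument as a side effect.

```c
#include <stddef.h>

/* Hypothetical stand-in for an SV's string slot; not Perl's real layout. */
struct toy_sv {
    char  *pv;   /* pointer to the string data */
    size_t cur;  /* current string length */
};

/* Like SvPV, this macro assigns to its second argument as a side effect
 * and yields the string pointer. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer that just reports the length it was handed. */
size_t foo(const char *ptr, size_t len)
{
    (void)ptr;
    return len;
}

/* Safe form: the macro is fully evaluated in its own statement, so len
 * is guaranteed to be set before foo() reads it. Writing
 * foo(TOY_PV(s, len), len) instead would leave that order unspecified. */
size_t call_foo_safely(struct toy_sv *s)
{
    size_t len;
    char *ptr;

    ptr = TOY_PV(s, len);
    return foo(ptr, len);
}
```
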
If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

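The "grow only, and remember the extra byte yourself" rule can be sketched
in plain C. Everything named C<toy_*> below is hypothetical and stands in for
the idea only; it does not reproduce how C<sv_grow> is implemented.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer; models only the allocated size (len) and
 * the used size (cur), not Perl's real SV layout. */
struct toy_buf {
    char  *pv;
    size_t cur;  /* bytes in use, excluding any trailing NUL */
    size_t len;  /* bytes allocated */
};

/* Grow-only resize, in the spirit of SvGROW: never shrinks, and adds no
 * extra byte on its own. Returns the buffer pointer, or NULL on failure. */
char *toy_grow(struct toy_buf *b, size_t newlen)
{
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Store n bytes and NUL-terminate, asking for n + 1 bytes as the text
 * says Perl's own string functions do. Returns 0 on success, -1 on
 * allocation failure. */
int toy_set(struct toy_buf *b, const char *src, size_t n)
{
    if (toy_grow(b, n + 1) == NULL)
        return -1;
    memcpy(b->pv, src, n);
    b->pv[n] = '\0';
    b->cur = n;
    return 0;
}
```

Storing a shorter string afterwards leaves the allocation untouched, which
mirrors the statement that the allocated memory can only increase.
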
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its
address can be used whenever an C<SV*> is needed.

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean
TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can
be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first
line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.

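The bookkeeping behind the offset hack can be modelled in a few lines of
plain C. This is a sketch of the idea only: C<struct toy_sv>, C<toy_chop>
and C<toy_real_start> are hypothetical, and the fields merely play the roles
of C<SvPVX>, C<SvCUR>, C<SvLEN>, the IV slot and the C<OOK> flag.

```c
#include <stddef.h>

/* Hypothetical model of the fields involved; not Perl's real SV layout. */
struct toy_sv {
    char  *pv;      /* plays the role of SvPVX: the "fake" beginning */
    size_t cur;     /* plays the role of SvCUR */
    size_t len;     /* plays the role of SvLEN */
    size_t offset;  /* plays the role of the IV field when OOK is set */
    int    ook;     /* plays the role of the OOK flag */
};

/* Discard everything before ptr, which must point inside the string.
 * No bytes are moved; only the bookkeeping changes. */
void toy_chop(struct toy_sv *sv, char *ptr)
{
    size_t delta = (size_t)(ptr - sv->pv);

    sv->ook     = 1;
    sv->offset += delta;
    sv->pv      = ptr;
    sv->cur    -= delta;
    sv->len    -= delta;
}

/* The start of the allocation, i.e. the analogue of
 * SvPVX(sv) - SvIV(sv). */
char *toy_real_start(const struct toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Chopping one byte from a six-byte buffer holding C<"12345"> leaves the model
with an offset of 1, a current length of 4 and an allocated length of 5,
matching the C<IV>, C<CUR> and C<LEN> values in the dump above.
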
=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

These two functions check whether a hash table entry exists and delete it,
respectively.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is returned as the SV* result */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);          /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.

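For experimentation outside the Perl core, the loop above can be wrapped in
an ordinary function. The sketch below makes two assumptions that the text
does not state: the hash is held in a C<uint32_t>, and key bytes are read as
C<unsigned char>. The name C<toy_perl_hash> is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the algorithm quoted above. The hash is modelled as uint32_t
 * and key bytes are read as unsigned char; both are assumptions. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);  /* the step added in 5.6 */
    return hash;
}
```

For the one-byte key C<"a"> (byte value 97) the loop yields 97, and the final
step adds C<<< 97 >> 5 >>>, which is 3, giving 100.
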
See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

    See the sv.h header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In the
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference. The C<stash> argument specifies
which class the reference will belong to. See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades rv to reference if not already one. Creates new SV for rv to
point to. If C<classname> is non-null, the SV is blessed into the specified
class. SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

Copies integer or double into an SV whose reference is C<rv>. SV is blessed
if C<classname> is non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV iv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, PV iv);

Copies string into an SV whose reference is C<rv>. Set length to 0 to let
Perl calculate the string length. SV is blessed if C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length);

Tests whether the SV is blessed into the specified class. It does not
check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features. Those bits are:

    GV_ADDMULTI  Marks the variable as multiply defined, thus preventing the
                 "Name <varname> used only once: possible typo" warning.
    GV_ADDWARN   Issues the warning "Had to create <varname> unexpectedly" if
                 the variable did not exist before the function was called.

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference-count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten. At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument. The C<newRV_inc> function, you will recall,
creates a reference to the specified argument. As a side effect,
it increments the argument's reference count. If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one. Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two. Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't! Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates. This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>. Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.

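The leak scenario can be traced with a toy reference counter in plain C.
This is a model under stated assumptions, not Perl's implementation: every
C<toy_*> name and the C<live_count> counter are hypothetical, with
C<toy_ref_inc> and C<toy_ref_noinc> standing in for C<newRV_inc> and
C<newRV_noinc>.

```c
#include <stdlib.h>

/* Hypothetical reference-counted value; not Perl's real SV. */
struct toy_sv {
    int            refcnt;
    struct toy_sv *target;  /* non-NULL when this value is a reference */
};

int live_count = 0;  /* how many toy_sv objects currently exist */

struct toy_sv *toy_new(void)
{
    struct toy_sv *sv = calloc(1, sizeof *sv);
    if (sv == NULL)
        abort();
    sv->refcnt = 1;  /* every new value starts at 1 */
    live_count++;
    return sv;
}

void toy_dec(struct toy_sv *sv)
{
    if (--sv->refcnt > 0)
        return;
    if (sv->target != NULL)
        toy_dec(sv->target);  /* a dying reference releases its target */
    free(sv);
    live_count--;
}

/* Analogue of newRV_noinc: the reference takes over the caller's count. */
struct toy_sv *toy_ref_noinc(struct toy_sv *thing)
{
    struct toy_sv *rv = toy_new();
    rv->target = thing;
    return rv;
}

/* Analogue of newRV_inc: the target's count is bumped as well. */
struct toy_sv *toy_ref_inc(struct toy_sv *thing)
{
    thing->refcnt++;
    return toy_ref_noinc(thing);
}
```

In this model, destroying a reference made with C<toy_ref_inc> leaves the
target alive with a count of one, which is the leak described above, while a
reference made with C<toy_ref_noinc> takes its target with it.
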
There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later". Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function. The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, SAVETMPS and FREETMPS.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times.

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV, the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.

The mortal routines are not just for SVs -- AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

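Mortality as "a deferred decrement" can be modelled in plain C with a list
of pending decrements. The sketch is hypothetical throughout: C<toy_2mortal>
and C<toy_free_tmps> only imitate the roles of C<sv_2mortal> and C<FREETMPS>,
and the fixed-size C<toy_tmps> array is an assumption made to keep the
example short.

```c
#include <stddef.h>

#define TOY_TMPS_MAX 16

/* Hypothetical value with only a reference count. */
struct toy_sv {
    int refcnt;
};

/* Hypothetical stand-in for the list of pending decrements. */
struct toy_sv *toy_tmps[TOY_TMPS_MAX];
size_t toy_tmps_n = 0;

/* Analogue of sv_2mortal: record one deferred decrement and hand the
 * value back unchanged. Returns NULL if the fixed-size list is full. */
struct toy_sv *toy_2mortal(struct toy_sv *sv)
{
    if (toy_tmps_n == TOY_TMPS_MAX)
        return NULL;
    toy_tmps[toy_tmps_n++] = sv;
    return sv;
}

/* Analogue of FREETMPS: apply every pending decrement. A real
 * implementation would free values whose count reaches zero; here the
 * count is only updated so that it can be inspected afterwards. */
void toy_free_tmps(void)
{
    while (toy_tmps_n > 0)
        toy_tmps[--toy_tmps_n]->refcnt--;
}
```

A value passed to C<toy_2mortal> twice appears twice in the list and so loses
two counts when the list is flushed, which is the double-decrement hazard the
text warns about.
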
=head2 Stashes and Globs

A "stash" is a hash that contains all of the different objects that
are contained within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called "PL_defstash" that holds the items that exist
in the "main" package. To get at the items in other packages, append the
string "::" to the package name. The items in the "Foo" package are in
the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are
in the stash "Baz::" in "Bar::"'s stash.

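The naming rule for nested stashes is simple string manipulation, which the
following plain C sketch spells out. C<toy_stash_key> is a hypothetical
helper, not a Perl API function; it only computes the key that would be
looked up at each nesting level and does not touch any real stash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Perl API). Given a package name
 * such as "Bar::Baz", copy the stash key for nesting level `level`
 * (0-based) into out, e.g. "Bar::" for level 0 and "Baz::" for level 1.
 * Returns 1 on success, 0 if there is no such level or out is too small. */
int toy_stash_key(const char *package, size_t level, char *out, size_t outsz)
{
    const char *start = package;

    for (;;) {
        const char *sep = strstr(start, "::");
        size_t n = sep ? (size_t)(sep - start) : strlen(start);

        if (n == 0)
            return 0;              /* empty component */
        if (level == 0) {
            if (n + 3 > outsz)
                return 0;          /* need room for name, "::" and NUL */
            memcpy(out, start, n);
            memcpy(out + n, "::", 3);
            return 1;
        }
        if (sep == NULL)
            return 0;              /* ran out of components */
        start = sep + 2;
        level--;
    }
}
```

For C<"Bar::Baz"> the helper yields C<"Bar::"> at level 0, the key to look up
in the top-level stash, and C<"Baz::"> at level 1, the key to look up in the
stash found at the previous step.
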
To get the stash pointer for a particular package, use one of the functions:

    HV*  gv_stashpv(const char* name, I32 create)
    HV*  gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<create> flag will create a new package if it is set.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

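Why the extra C<Sv*OK_on> call is needed can be seen in a small standalone
model. Everything here is a stand-in: C<ToySV>, the C<TOY_*> bits and the
C<toy_*> functions are hypothetical names, and the bit values are arbitrary
rather than Perl's real flag constants.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the "OK" flag bits. */
enum { TOY_IOK = 1, TOY_NOK = 2, TOY_POK = 4 };

typedef struct {
    unsigned    flags;  /* which of the slots below hold valid data */
    long        iv;     /* integer slot */
    const char *pv;     /* string slot */
} ToySV;

/* Like sv_setiv as described above: store the value, turn on only the
 * integer bit, and turn every other type bit off. */
static void toy_setiv(ToySV *sv, long i)
{
    sv->iv = i;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv as described above: only the string bit survives. */
static void toy_setpv(ToySV *sv, const char *s)
{
    sv->pv = s;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: re-assert the integer bit, leaving the others alone. */
static void toy_iok_on(ToySV *sv)
{
    sv->flags |= TOY_IOK;
}

/* Build a value that is valid both as an integer and as a string. */
static ToySV toy_make_dual(long code, const char *message)
{
    ToySV sv = { 0, 0, NULL };

    toy_setiv(&sv, code);
    toy_setpv(&sv, message);
    assert(sv.flags == TOY_POK);  /* the integer bit was just cleared */
    toy_iok_on(&sv);              /* the integer slot is still intact */
    return sv;
}
```

The integer stored by the first call is never overwritten; only its "valid"
bit is lost, which is why turning the bit back on is enough.
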
=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
upgrade C<sv> to type C<SVt_PVMG>. Perl then continues by adding
the new magic to the beginning of the linked list of magical features. Any
prior entry of the same type of magic is deleted. Note that this can be
overridden, and multiple instances of the same type of magic can be
associated with an SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null and C<namlen> >= 0 a malloc'd
copy of the name is stored in the C<mg_ptr> field.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the "Magic Virtual Tables" section below. The C<how> argument is also
stored in the C<mg_type> field.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is
merely stored, without the reference count being incremented.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    void sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

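The list handling described in this section (drop any prior entry of the
same type, then add the new entry at the front) can be sketched as a
standalone model. The C<Toy*> types and C<toy_*> functions are hypothetical
and keep only the link and the type byte; this is not how C<sv_magic> itself
is written.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct magic: only the link and the type byte. */
typedef struct toy_magic {
    struct toy_magic *mg_moremagic;  /* next entry in the chain */
    char              mg_type;       /* the "how" value */
} ToyMagic;

typedef struct { ToyMagic *magic; } ToySV;

/* Unlink and free every entry of the given type. */
static void toy_unmagic(ToySV *sv, char type)
{
    ToyMagic **link = &sv->magic;

    while (*link) {
        ToyMagic *mg = *link;
        if (mg->mg_type == type) {
            *link = mg->mg_moremagic;
            free(mg);
        } else {
            link = &mg->mg_moremagic;
        }
    }
}

/* Drop any prior entry of this type, then put a fresh one at the front.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_magic(ToySV *sv, char how)
{
    ToyMagic *mg;

    assert(sv != NULL);
    toy_unmagic(sv, how);
    mg = (ToyMagic *)malloc(sizeof *mg);
    if (!mg)
        return -1;
    mg->mg_type = how;
    mg->mg_moremagic = sv->magic;
    sv->magic = mg;
    return 0;
}

/* Count the entries in the chain. */
static int toy_magic_count(const ToySV *sv)
{
    int n = 0;
    const ToyMagic *mg;

    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        n++;
    return n;
}
```

Adding 'P', then 't', then 'P' again leaves two entries, with the newer 'P'
at the head of the chain.
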
=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL> ("Magic Virtual Table"), a structure of function pointers that
handle the various operations that might be applied to that variable.

The C<MGVTBL> has five pointers to the following routine types:

    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in C<perl.h> and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of '\0') contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type '\0', if a get
operation is being performed, the routine C<magic_get> is called. All
the various routines for the various magical types begin with C<magic_>.
NOTE: the magic routines are not considered part of the Perl API, and may
not be exported by the Perl library.

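Dispatch through a table of function pointers, with a zero slot meaning
"nothing to do", can be modelled in a few lines of standalone C. The
C<Toy*> types and C<toy_*> functions are hypothetical; the model keeps only
the get and set slots and is not Perl's implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv ToySV;

/* Toy vtable with just two slots; a NULL slot means "no hook". */
typedef struct {
    int (*svt_get)(ToySV *sv);
    int (*svt_set)(ToySV *sv);
} ToyVtbl;

struct toy_sv {
    int            value;
    int            stores;      /* how many times the set hook has run */
    const ToyVtbl *mg_virtual;  /* NULL for a non-magical value */
};

/* Hypothetical get hook: refresh the value from some outside source. */
static int toy_magic_get(ToySV *sv) { sv->value = 42; return 0; }

/* Hypothetical set hook: note that a store happened. */
static int toy_magic_set(ToySV *sv) { sv->stores++; return 0; }

static const ToyVtbl toy_vtbl = { toy_magic_get, toy_magic_set };

/* Read the value, calling the get hook if the table has one. */
static int toy_fetch(ToySV *sv)
{
    assert(sv != NULL);
    if (sv->mg_virtual && sv->mg_virtual->svt_get)
        sv->mg_virtual->svt_get(sv);
    return sv->value;
}

/* Assign, then call the set hook if the table has one. */
static void toy_store(ToySV *sv, int v)
{
    assert(sv != NULL);
    sv->value = v;
    if (sv->mg_virtual && sv->mg_virtual->svt_set)
        sv->mg_virtual->svt_set(sv);
}
```

A value without a table behaves like an ordinary variable; the same
C<toy_fetch> and C<toy_store> calls on a value with a table run the hooks.
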
The current kinds of Magic Virtual Tables are:

    mg_type  MGVTBL              Type of magic
    -------  ------              ----------------------------
    \0       vtbl_sv             Special scalar variable
    A        vtbl_amagic         %OVERLOAD hash
    a        vtbl_amagicelem     %OVERLOAD hash element
    c        (none)              Holds overload table (AMT) on stash
    B        vtbl_bm             Boyer-Moore (fast string search)
    D        vtbl_regdata        Regex match position data (@+ and @- vars)
    d        vtbl_regdatum       Regex match position data element
    E        vtbl_env            %ENV hash
    e        vtbl_envelem        %ENV hash element
    f        vtbl_fm             Formline ('compiled' format)
    g        vtbl_mglob          m//g target / study()ed string
    I        vtbl_isa            @ISA array
    i        vtbl_isaelem        @ISA array element
    k        vtbl_nkeys          scalar(keys()) lvalue
    L        (none)              Debugger %_<filename
    l        vtbl_dbline         Debugger %_<filename element
    o        vtbl_collxfrm       Locale transformation
    P        vtbl_pack           Tied array or hash
    p        vtbl_packelem       Tied array or hash element
    q        vtbl_packelem       Tied scalar or handle
    S        vtbl_sig            %SIG hash
    s        vtbl_sigelem        %SIG hash element
    t        vtbl_taint          Taintedness
    U        vtbl_uvar           Available for use by extensions
    v        vtbl_vec            vec() lvalue
    x        vtbl_substr         substr() lvalue
    y        vtbl_defelem        Shadow "foreach" iterator variable /
                                  smart parameter vivification
    *        vtbl_glob           GV (typeglob)
    #        vtbl_arylen         Array length ($#ary)
    .        vtbl_pos            pos() lvalue
    ~        (none)              Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is used to represent some kind of composite type (a list
or a hash), and the lowercase letter is used to represent an element of
that composite type.

The '~' and 'U' magic types are defined specifically for use by
extensions and will not be used by perl itself. Extensions can use
'~' magic to 'attach' private information to variables (typically
objects). This is especially useful because there is no way for
normal perl code to corrupt this private information (unlike using
extra elements of a hash object).

Similarly, 'U' magic can be used much like tie() to call a C function
any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(IV, SV*);
        I32 (*uf_set)(IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a
pointer to the SV as the second. A simple example of how to add 'U'
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));

Note that because multiple extensions may be using '~' or 'U' magic,
it is important for extensions to take extra care to avoid conflict.
Typically only using the magic on objects blessed into the same class
as the extension is sufficient. For '~' magic, it may also be
appropriate to add an I32 'signature' at the top of the private data
area and check that.

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. Also,
if the SV is not of type SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

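Both routines are simple walks over the magic chain, which a standalone
model can show. The C<ToyMagic> type and the C<toy_mg_*> functions are
hypothetical; in particular the toy copy writes into a caller-supplied
array instead of attaching magic to a second SV.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy magic entry: the link, the type byte and the associated object. */
typedef struct toy_magic {
    struct toy_magic *mg_moremagic;
    char              mg_type;
    void             *mg_obj;
} ToyMagic;

/* Return the first entry of the given type, or NULL if there is none. */
static ToyMagic *toy_mg_find(ToyMagic *chain, char type)
{
    ToyMagic *mg;

    for (mg = chain; mg; mg = mg->mg_moremagic)
        if (mg->mg_type == type)
            return mg;
    return NULL;
}

/* For each container-level (uppercase) entry in `chain`, fill one slot of
 * `out` with the element-level (lowercase) counterpart that shares the
 * same mg_obj.  Returns the number of slots written, at most `max`. */
static size_t toy_mg_copy(const ToyMagic *chain, ToyMagic *out, size_t max)
{
    size_t n = 0, i;
    const ToyMagic *mg;

    assert(out != NULL || max == 0);
    for (mg = chain; mg && n < max; mg = mg->mg_moremagic) {
        if (isupper((unsigned char)mg->mg_type)) {
            out[n].mg_type = (char)tolower((unsigned char)mg->mg_type);
            out[n].mg_obj = mg->mg_obj;
            out[n].mg_moremagic = NULL;
            n++;
        }
    }
    /* Chain the copies together in the order they were found. */
    for (i = 0; i + 1 < n; i++)
        out[i].mg_moremagic = &out[i + 1];
    return n;
}
```

Copying from a chain holding 'P' and 't' entries produces a single 'p'
entry that points at the same object as the 'P' entry.
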
=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the 'P' magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps: first it creates a new hash, and then it
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L<Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, 'P');
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also usually be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of enclosing I<pseudo-block>.

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> would be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal>, which should (?) be
used instead.

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is op_free()ed at the end of I<pseudo-block>.

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=back

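The idea behind a I<pseudo-block> (remember what was saved since the block
was entered, then undo it in reverse on exit) can be sketched as a
standalone model. The C<toy_*> names are hypothetical stand-ins for
C<ENTER>, C<SAVEINT> and C<LEAVE>; the fixed-size arrays and the
integers-only save record are simplifications, not Perl's data structures.

```c
#include <assert.h>

#define TOY_STACK_MAX 32

/* One record per saved integer: where it lives and what it held. */
typedef struct { int *where; int old; } ToySave;

static ToySave toy_savestack[TOY_STACK_MAX];
static int     toy_savestack_ix = 0;            /* next free save slot */
static int     toy_scopestack[TOY_STACK_MAX];   /* save height per block */
static int     toy_scopestack_ix = 0;

/* Like ENTER: remember how tall the save stack is right now. */
static void toy_enter(void)
{
    assert(toy_scopestack_ix < TOY_STACK_MAX);
    toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix;
}

/* Like SAVEINT(i): record the current value so it can be put back. */
static void toy_saveint(int *where)
{
    assert(toy_savestack_ix < TOY_STACK_MAX);
    toy_savestack[toy_savestack_ix].where = where;
    toy_savestack[toy_savestack_ix].old = *where;
    toy_savestack_ix++;
}

/* Like LEAVE: undo, newest first, everything saved since the matching
 * toy_enter(). */
static void toy_leave(void)
{
    int base;

    assert(toy_scopestack_ix > 0);
    base = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > base) {
        toy_savestack_ix--;
        *toy_savestack[toy_savestack_ix].where =
            toy_savestack[toy_savestack_ix].old;
    }
}
```

Nested blocks restore independently: leaving the inner block puts back only
what was saved inside it, and leaving the outer block puts back the rest.
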
The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.

=over 4

=item C<SV* save_scalar(GV *gv)>

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=item C<HV* save_hash(GV *gv)>

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

Duplicates the current value of C<SV>; on the exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
using the stored value.

=item C<void save_list(SV **sarg, I32 maxsarg)>

A variant of C<save_item> which takes multiple arguments via an array
C<sarg> of C<SV*> of length C<maxsarg>.

=item C<SV* save_svref(SV **sptr)>

Similar to C<save_scalar>, but will reinstate a C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>. People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument. Argument 0 is the first argument passed in the
Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives. However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
macros to push IVs, doubles, strings, and SV pointers respectively:

    PUSHi(IV)
    PUSHn(double)
    PUSHp(char*, I32)
    PUSHs(SV*)

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macros:

    XPUSHi(IV)
    XPUSHn(double)
    XPUSHp(char*, I32)
    XPUSHs(SV*)

These macros automatically adjust the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.

For more information, consult L<perlxs> and L<perlxstut>.

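The difference between the two families of push macros (reserve room once
and then push, versus check for room on every push) can be modelled with a
growable array. This is a standalone sketch: C<ToyStack> and the C<toy_*>
functions are hypothetical, and the stack holds plain C<long> values rather
than C<SV*>.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy argument stack; `sp` indexes the top element (-1 when empty). */
typedef struct {
    long  *base;
    size_t cap;
    long   sp;
} ToyStack;

/* Like EXTEND(SP, num): make sure `num` more values fit.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_extend(ToyStack *st, size_t num)
{
    size_t need = (size_t)(st->sp + 1) + num;

    if (need > st->cap) {
        long *grown = (long *)realloc(st->base, need * sizeof *grown);
        if (!grown)
            return -1;
        st->base = grown;
        st->cap = need;
    }
    return 0;
}

/* Like PUSHi: the caller must already have reserved room. */
static void toy_push(ToyStack *st, long v)
{
    assert((size_t)(st->sp + 1) < st->cap);
    st->base[++st->sp] = v;
}

/* Like XPUSHi: reserve one slot if needed, then push.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_xpush(ToyStack *st, long v)
{
    if (toy_extend(st, 1) != 0)
        return -1;
    toy_push(st, v);
    return 0;
}
```

A routine that knows it returns two values can call C<toy_extend> once and
push twice; one that does not know the count in advance can use
C<toy_xpush> throughout.
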
=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, register char**);

The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine. The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of values that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack. The macros and functions for doing so
include the following:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Memory Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

It is suggested that you enable the version of malloc that is distributed
with Perl. It keeps pools of various sizes of unallocated memory in
order to satisfy allocation requests more quickly. However, on some
platforms, it may cause spurious malloc or free errors.

    New(x, pointer, number, type);
    Newc(x, pointer, number, type, cast);
    Newz(x, pointer, number, type);

These three macros are used to initially allocate memory.

The first argument C<x> was a "magic cookie" that was used to keep track
of who called the macro, to help when debugging memory problems. However,
the current code makes no use of this feature (most Perl developers now
use run-time memory checkers), so this argument can be any number.

The second argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The third and fourth arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated. The argument
C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>,
should be used if the type of the C<pointer> argument differs from the
C<type> argument.

Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
to zero out all the newly allocated memory.

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<New> and C<Newc> with the exception of not needing the
"magic cookie" argument.

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).

Here is a handy table of equivalents between ordinary C and Perl's
memory abstraction layer:

    Instead Of:                 Use:

    malloc                      New
    calloc                      Newz
    realloc                     Renew
    memcpy                      Copy
    memmove                     Move
    free                        Safefree
    strdup                      savepv
    strndup                     savepvn (Hey, strndup doesn't exist!)
    memcpy/*(struct foo *)      StructCopy

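To make the table of equivalents concrete, here is a standalone sketch in
which hypothetical C<TOY_*> macros stand in for the Perl ones by wrapping
the C library directly. They are assumptions for illustration only: Perl's
own macros are defined differently, and the toy versions drop the "magic
cookie" argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count-and-type wrappers over the C library (stand-ins, see above). */
#define TOY_NEWZ(p, n, t)     ((p) = (t *)calloc((n), sizeof(t)))
#define TOY_RENEW(p, n, t)    ((p) = (t *)realloc((p), (n) * sizeof(t)))
#define TOY_COPY(s, d, n, t)  memcpy((d), (s), (n) * sizeof(t))
#define TOY_MOVE(s, d, n, t)  memmove((d), (s), (n) * sizeof(t))
#define TOY_ZERO(d, n, t)     memset((d), 0, (n) * sizeof(t))
#define TOY_SAFEFREE(p)       free(p)

/* Fill a small buffer, grow it, shift its contents right by one slot with
 * an overlapping move, and return the sum of the first five elements
 * (or -1 if an allocation fails). */
static int toy_demo(void)
{
    int *buf;
    int i, sum = 0;

    TOY_NEWZ(buf, 4, int);               /* four zeroed ints */
    if (!buf)
        return -1;
    for (i = 0; i < 4; i++)
        buf[i] = i + 1;                  /* 1 2 3 4 */

    TOY_RENEW(buf, 8, int);              /* room for eight */
    if (!buf)
        return -1;                       /* (the toy leaks the old block) */
    TOY_ZERO(buf + 4, 4, int);           /* the new tail is uninitialized */

    TOY_MOVE(buf, buf + 1, 4, int);      /* overlapping: 1 1 2 3 4 0 0 0 */
    assert(buf[1] == 1 && buf[4] == 4);

    for (i = 0; i < 5; i++)
        sum += buf[i];
    TOY_SAFEFREE(buf);
    return sum;
}
```

Note that the overlapping shift uses the C<memmove> wrapper; the C<memcpy>
wrapper is only safe when source and destination do not overlap, which is
the same distinction the table draws between C<Move> and C<Copy>.
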
=head2 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head2 Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L<Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on stack.

The macro to put this target on stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[pni]>.

=head2 Scratchpads

The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current unit --
a subroutine or a file (for opcodes for statements outside of
subroutines) -- is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current
unit.

A scratchpad keeps SVs which are lexicals for the current unit and SVs
which are targets for opcodes. One can deduce that an SV lives on a
scratchpad by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.

The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV. In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both
these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

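The extra level of indirection can be illustrated with a standalone model
in which each recursion depth gets its own pad, created on first use and
reused afterwards. The C<Toy*> types and C<toy_*> functions are
hypothetical; for brevity the toy starts with an empty array of pads rather
than one of length 1, and a pad is just a few C<int> slots.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

/* A toy scratchpad: a fixed number of slots for temporaries. */
typedef struct { int slot[TOY_PAD_SLOTS]; } ToyPad;

/* A toy compiled subroutine: one pad per recursion level reached so far,
 * plus the current depth. */
typedef struct {
    ToyPad **pads;
    size_t   npads;
    size_t   depth;
} ToySub;

/* On entry, go one level deeper, creating a pad for that level if none
 * exists yet.  Returns the pad, or NULL if an allocation fails. */
static ToyPad *toy_sub_enter(ToySub *sub)
{
    if (sub->depth == sub->npads) {
        ToyPad **grown = (ToyPad **)realloc(sub->pads,
                                            (sub->npads + 1) * sizeof *grown);
        ToyPad *pad;
        if (!grown)
            return NULL;
        sub->pads = grown;
        pad = (ToyPad *)calloc(1, sizeof *pad);
        if (!pad)
            return NULL;
        sub->pads[sub->npads++] = pad;
    }
    return sub->pads[sub->depth++];
}

static void toy_sub_leave(ToySub *sub)
{
    assert(sub->depth > 0);
    sub->depth--;
}

/* Recursive factorial that parks its argument in slot 0 of its own pad
 * across the recursive call.  Returns -1 if an allocation fails. */
static int toy_fact(ToySub *sub, int n)
{
    ToyPad *pad = toy_sub_enter(sub);
    int result;

    if (!pad)
        return -1;
    pad->slot[0] = n;
    if (n <= 1) {
        result = 1;
    } else {
        int child = toy_fact(sub, n - 1);
        /* The parent's temporary is intact: the child used its own pad. */
        result = (child < 0) ? -1 : pad->slot[0] * child;
    }
    toy_sub_leave(sub);
    return result;
}
```

Had parent and child shared one pad, the child's store into slot 0 would
have overwritten the value the parent still needed after the call.
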
=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

  $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

     $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

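The relation between the parse tree and the execution "thread" in the
simplified example is a post-order walk: children first, left to right,
then the node itself. The following standalone sketch builds that small
tree and links it this way. C<ToyOp> and the C<toy_*> functions are
hypothetical and far simpler than Perl's real op structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy op node: up to two children for the parse tree, plus the
 * execution-order link that is filled in separately. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *kid[2];
    struct toy_op *next;
} ToyOp;

/* Link the subtree at `op` in post-order.  `tail` points at the link to
 * fill in next; the return value is the new tail. */
static ToyOp **toy_thread(ToyOp *op, ToyOp **tail)
{
    int i;

    for (i = 0; i < 2; i++)
        if (op->kid[i])
            tail = toy_thread(op->kid[i], tail);
    *tail = op;
    op->next = NULL;
    return &op->next;
}

/* Build the simplified tree for "$a = $b + $c", thread it, and write the
 * node names in execution order into `buf`, separated by spaces. */
static char *toy_execution_order(char *buf, size_t len)
{
    ToyOp b      = { "$b",        { NULL, NULL }, NULL };
    ToyOp c      = { "$c",        { NULL, NULL }, NULL };
    ToyOp a      = { "$a",        { NULL, NULL }, NULL };
    ToyOp plus   = { "+",         { NULL, NULL }, NULL };
    ToyOp assign = { "assign-to", { NULL, NULL }, NULL };
    ToyOp *start = NULL, *op;

    plus.kid[0] = &b;
    plus.kid[1] = &c;
    assign.kid[0] = &plus;
    assign.kid[1] = &a;
    toy_thread(&assign, &start);

    assert(len > 0);
    buf[0] = '\0';
    for (op = start; op; op = op->next) {
        if (strlen(buf) + strlen(op->name) + 2 > len)
            break;              /* out of room: stop rather than overflow */
        if (buf[0])
            strcat(buf, " ");
        strcat(buf, op->name);
    }
    return buf;
}
```

Following the C<next> links from the first node visits C<$b>, C<$c>, C<+>,
C<$a> and C<assign-to> in that order, matching the thread drawn above.
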
1467=head2 Examining the tree
1468
1469If you have your perl compiled for debugging (usually done with C<-D
1470optimize=-g> on C<Configure> command line), you may examine the
1471compiled tree by specifying C<-Dx> on the Perl command line. The
1472output takes several lines per node, and for C<$b+$c> it looks like
1473this:
1474
 5           TYPE = add  ===> 6
             TARG = 1
             FLAGS = (SCALAR,KIDS)
             {
                 TYPE = null  ===> (4)
                   (was rv2sv)
                 FLAGS = (SCALAR,KIDS)
                 {
 3                   TYPE = gvsv  ===> 4
                     FLAGS = (SCALAR)
                     GV = main::b
                 }
             }
             {
                 TYPE = null  ===> (5)
                   (was rv2sv)
                 FLAGS = (SCALAR,KIDS)
                 {
 4                   TYPE = gvsv  ===> 5
                     FLAGS = (SCALAR)
                     GV = main::c
                 }
             }
1498
1499This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are
1500not optimized away (one per number in the left column). The immediate
1501children of the given node correspond to C<{}> pairs on the same level
1502of indentation, thus this listing corresponds to the tree:
1503
                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv
1509
The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.
1513
1514=head2 Compile pass 1: check routines
1515
The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.
1519
1520What makes this pass interesting for perl developers is that some
1521optimization may be performed on this pass. This is optimization by
8870b5c7 1522so-called "check routines". The correspondence between node names
0a753a76 1523and corresponding check routines is described in F<opcode.pl> (do not
1524forget to run C<make regen_headers> if you modify this file).
1525
A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.
1531
1532The check routine returns the node which should be inserted into the
1533tree (if the top-level node was not modified, check routine returns
1534its argument).
1535
1536By convention, check routines have names C<ck_*>. They are usually
1537called from C<new*OP> subroutines (or C<convert>) (which in turn are
1538called from F<perly.y>).
1539
1540=head2 Compile pass 1a: constant folding
1541
1542Immediately after the check routine is called the returned node is
1543checked for being compile-time executable. If it is (the value is
1544judged to be constant) it is immediately executed, and a I<constant>
1545node with the "return value" of the corresponding subtree is
1546substituted instead. The subtree is deleted.
1547
1548If constant folding was not performed, the execution-order thread is
1549created.
1550
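As a rough illustration of the folding step, here is a toy version in
plain C. It is only a sketch under stated assumptions: C<FNode>,
C<fold_constants> and the node kinds are made up for the example and
handle nothing but addition, whereas Perl decides what is foldable and
runs the real op to obtain the value.

```c
#include <stddef.h>

typedef enum { N_CONST, N_VAR, N_ADD } NodeType;   /* hypothetical node kinds */

typedef struct FNode {
    NodeType      type;
    long          value;    /* meaningful only for N_CONST */
    struct FNode *kid[2];   /* operands of N_ADD */
} FNode;

/* Called once a node is fully built. If it is an addition whose
 * operands are both constants, "execute" it now and turn the node
 * into a constant holding the result. Returns the node to insert. */
static FNode *fold_constants(FNode *n)
{
    if (n->type == N_ADD
        && n->kid[0]->type == N_CONST
        && n->kid[1]->type == N_CONST)
    {
        n->value  = n->kid[0]->value + n->kid[1]->value;
        n->type   = N_CONST;
        n->kid[0] = n->kid[1] = NULL;   /* the subtree is dropped */
    }
    return n;
}
```

An addition with a non-constant operand comes back unchanged, and that
is the case in which the execution-order thread still has to be built.
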
1551=head2 Compile pass 2: context propagation
1552
When a context for a part of the compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.
1558
1559Additional context-dependent optimizations are performed at this time.
1560Since at this moment the compile tree contains back-references (via
1561"thread" pointers), nodes cannot be free()d now. To allow
1562optimized-away nodes at this stage, such nodes are null()ified instead
1563of free()ing (i.e. their type is changed to OP_NULL).
1564
1565=head2 Compile pass 3: peephole optimization
1566
After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). These optimizations are
done in the subroutine peep(). Optimizations performed at this stage
are subject to the same restrictions as in pass 2.
1573
=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT
1577
1578The Perl interpreter can be regarded as a closed box: it has an API
1579for feeding it code or otherwise making it do things, but it also has
1580functions for its own use. This smells a lot like an object, and
1581there are ways for you to build Perl so that you can have multiple
1582interpreters, with one interpreter represented either as a C++ object,
1583a C structure, or inside a thread. The thread, the C structure, or
1584the C++ object will contain all the context, the state of that
1585interpreter.
1586
Three macros control the major Perl build flavors: MULTIPLICITY,
USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure
that packages all the interpreter state, there is a similar thread-specific
data structure under USE_THREADS, and the PERL_OBJECT build has a C++
class to maintain interpreter state. In all three cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
support for passing in a "hidden" first argument that represents all three
data structures.
1595
All this obviously requires a way for the Perl internal functions to be
C++ methods, subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these three very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.
1602
First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
L<perlbug> explaining why you think it should be.
1612
1613Second problem: there must be a syntax so that the same subroutine
1614declarations and calls can pass a structure as their first argument,
1615or pass nothing. To solve this, the subroutines are named and
1616declared in a particular way. Here's a typical start of a static
1617function used within the Perl guts:
1618
1619 STATIC void
1620 S_incline(pTHX_ char *s)
1621
1622STATIC becomes "static" in C, and is #define'd to nothing in C++.
1623
A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:
1626
1627 void
1628 Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)
1629
1630C<pTHX_> is one of a number of macros (in perl.h) that hide the
1631details of the interpreter's context. THX stands for "thread", "this",
1632or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
1633The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
1634or 'd' for B<d>eclaration.
1635
When Perl is built without PERL_IMPLICIT_CONTEXT, there is no first
argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.
1644
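The macro trick can be tried out in isolation. The sketch below is not
taken from F<perl.h>: C<DEMO_IMPLICIT_CONTEXT>, C<Interp>, C<pCTX_>,
C<aCTX_> and C<COUNTER> are hypothetical stand-ins that only mimic the
way the C<pTHX_> and C<aTHX_> family lets one source text compile both
with and without a context argument.

```c
/* Toy model of the pTHX_/aTHX_ idea with made-up names. */
#define DEMO_IMPLICIT_CONTEXT 1     /* remove for the "no context" build */

typedef struct Interp { int counter; } Interp;

#ifdef DEMO_IMPLICIT_CONTEXT
#  define pCTX    Interp *my_ctx    /* parameter, no other arguments */
#  define pCTX_   pCTX,             /* parameter, more arguments follow */
#  define aCTX    my_ctx            /* argument, no other arguments */
#  define aCTX_   aCTX,             /* argument, more arguments follow */
#  define COUNTER (my_ctx->counter)
#else
static Interp demo_global_interp;   /* the single interpreter state */
#  define pCTX    void
#  define pCTX_
#  define aCTX
#  define aCTX_
#  define COUNTER (demo_global_interp.counter)
#endif

/* The same source text compiles in both build flavors. */
static int Demo_bump(pCTX_ int by)
{
    COUNTER += by;
    return COUNTER;
}

static int Demo_bump_twice(pCTX_ int by)
{
    Demo_bump(aCTX_ by);            /* pass the context along */
    return Demo_bump(aCTX_ by);
}
```

Because the comma lives inside the underscore form of the macro, a call
such as C<Demo_bump(aCTX_ by)> never ends up with a stray or missing
comma in either build.
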
When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setsv>. It expands
something like this:
1648
 #ifdef PERL_IMPLICIT_CONTEXT
 #  define sv_setsv(a,b)      Perl_sv_setsv(aTHX_ a, b)
    /* can't do this for vararg functions, see below */
 #else
 #  define sv_setsv           Perl_sv_setsv
 #endif
1655
1656This works well, and means that XS authors can gleefully write:
1657
1658 sv_setsv(foo, bar);
1659
1660and still have it work under all the modes Perl could have been
1661compiled with.
1662
1663Under PERL_OBJECT in the core, that will translate to either:
1664
1665 CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions,
1666 # C++ takes care of 'this'
1667 or
1668
1669 pPerl->Perl_sv_setsv(foo,bar); # in truly static functions,
1670 # see objXSUB.h
1671
1672Under PERL_OBJECT in extensions (aka PERL_CAPI), or under
1673MULTIPLICITY/USE_THREADS w/ PERL_IMPLICIT_CONTEXT in both core
1674and extensions, it will be:
1675
1676 Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API"
1677 # for all build flavors
1678
1679This doesn't work so cleanly for varargs functions, though, as macros
1680imply that the number of arguments is known in advance. Instead we
1681either need to spell them out fully, passing C<aTHX_> as the first
1682argument (the Perl core tends to do this with functions like
1683Perl_warner), or use a context-free version.
1684
1685The context-free version of Perl_warner is called
1686Perl_warner_nocontext, and does not take the extra argument. Instead
1687it does dTHX; to get the context from thread-local storage. We
1688C<#define warner Perl_warner_nocontext> so that extensions get source
1689compatibility at the expense of performance. (Passing an arg is
1690cheaper than grabbing it from thread-local storage.)
1691
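The two flavors of a varargs function can be sketched as follows. All
names here (C<Interp2>, C<demo_get_context>, C<Demo_warner>,
C<Demo_warner_nocontext>) are hypothetical, and a plain static pointer
stands in for thread-local storage; the point is only the shape: one
entry point receives the context, the other looks it up on every call.

```c
#include <stdarg.h>
#include <stdio.h>

typedef struct Interp2 { char last_warning[128]; } Interp2;  /* hypothetical */

static Interp2  demo_interp;                  /* the one interpreter of this toy */
static Interp2 *demo_current = &demo_interp;  /* stands in for thread-local storage */

static Interp2 *demo_get_context(void) { return demo_current; }

/* Shared worker that takes the context and a va_list. */
static void demo_vwarner(Interp2 *ctx, const char *fmt, va_list ap)
{
    vsnprintf(ctx->last_warning, sizeof ctx->last_warning, fmt, ap);
}

/* Explicit-context flavor: the caller passes the context itself. */
static void Demo_warner(Interp2 *ctx, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    demo_vwarner(ctx, fmt, ap);
    va_end(ap);
}

/* Context-free flavor: fetches the context on every call. */
static void Demo_warner_nocontext(const char *fmt, ...)
{
    Interp2 *ctx = demo_get_context();
    va_list ap;
    va_start(ap, fmt);
    demo_vwarner(ctx, fmt, ap);
    va_end(ap);
}
```

A macro cannot insert the context into an argument list of unknown
length, which is why the lookup has to move inside the function.
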
1692You can ignore [pad]THX[xo] when browsing the Perl headers/sources.
1693Those are strictly for use within the core. Extensions and embedders
1694need only be aware of [pad]THX.
1695
1696=head2 How do I use all this in extensions?
1697
1698When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
1699any functions in the Perl API will need to pass the initial context
1700argument somehow. The kicker is that you will need to write it in
1701such a way that the extension still compiles when Perl hasn't been
1702built with PERL_IMPLICIT_CONTEXT enabled.
1703
1704There are three ways to do this. First, the easy but inefficient way,
1705which is also the default, in order to maintain source compatibility
1706with extensions: whenever XSUB.h is #included, it redefines the aTHX
1707and aTHX_ macros to call a function that will return the context.
1708Thus, something like:
1709
1710 sv_setsv(asv, bsv);
1711
in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

 Perl_sv_setsv(Perl_get_context(), asv, bsv);

or to this otherwise:

 Perl_sv_setsv(asv, bsv);

You have to do nothing new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.
1724
1725The second, more efficient way is to use the following template for
1726your Foo.xs:
1727
 #define PERL_NO_GET_CONTEXT     /* we want efficiency */
 #include "EXTERN.h"
 #include "perl.h"
 #include "XSUB.h"

 static SV *my_private_function(int arg1, int arg2);

 static SV *
 my_private_function(int arg1, int arg2)
 {
     dTHX;       /* fetch context */
     ... call many Perl API functions ...
 }

 [... etc ...]

 MODULE = Foo            PACKAGE = Foo

 /* typical XSUB */

 void
 my_xsub(arg)
         int arg
     CODE:
         my_private_function(arg, 10);
1753
Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API. (You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.
1762
1763The third, even more efficient way is to ape how it is done within
1764the Perl guts:
1765
1766
 #define PERL_NO_GET_CONTEXT     /* we want efficiency */
 #include "EXTERN.h"
 #include "perl.h"
 #include "XSUB.h"

 /* pTHX_ only needed for functions that call Perl API */
 static SV *my_private_function(pTHX_ int arg1, int arg2);

 static SV *
 my_private_function(pTHX_ int arg1, int arg2)
 {
     /* dTHX; not needed here, because THX is an argument */
     ... call Perl API functions ...
 }

 [... etc ...]

 MODULE = Foo            PACKAGE = Foo

 /* typical XSUB */

 void
 my_xsub(arg)
         int arg
     CODE:
         my_private_function(aTHX_ arg, 10);
1793
1794This implementation never has to fetch the context using a function
1795call, since it is always passed as an extra argument. Depending on
1796your needs for simplicity or efficiency, you may mix the previous
1797two approaches freely.
1798
Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit arguments.
1802
1803=head2 Future Plans and PERL_IMPLICIT_SYS
1804
1805Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
1806that the interpreter knows about itself and pass it around, so too are
1807there plans to allow the interpreter to bundle up everything it knows
1808about the environment it's running on. This is enabled with the
1809PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT,
1810but is mostly there for MULTIPLICITY and USE_THREADS (see inside
1811iperlsys.h).
1812
1813This allows the ability to provide an extra pointer (called the "host"
1814environment) for all the system calls. This makes it possible for
1815all the system stuff to maintain their own state, broken down into
1816seven C structures. These are thin wrappers around the usual system
1817calls (see win32/perllib.c) for the default perl executable, but for a
1818more ambitious host (like the one that would do fork() emulation) all
1819the extra work needed to pretend that different interpreters are
1820actually different "processes", would be done here.
1821
1822The Perl engine/interpreter and the host are orthogonal entities.
1823There could be one or more interpreters in a process, and one or
1824more "hosts", with free association between them.
1825
=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)
1833
1834Inside the Perl core, you can get at the functions either with or
1835without the C<Perl_> prefix, thanks to a bunch of defines that live in
1836F<embed.h>. This header file is generated automatically from
1837F<embed.pl>. F<embed.pl> also creates the prototyping header files for
1838the internal functions, generates the documentation and a lot of other
1839bits and pieces. It's important that when you add a new function to the
1840core or change an existing one, you change the data in the table at the
1841end of F<embed.pl> as well. Here's a sample entry from that table:
1842
1843 Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
1844
1845The second column is the return type, the third column the name. Columns
1846after that are the arguments. The first column is a set of flags:
1847
1848=over 3
1849
1850=item A
1851
1852This function is a part of the public API.
1853
1854=item p
1855
This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.
1857
1858=item d
1859
1860This function has documentation using the C<apidoc> feature which we'll
1861look at in a second.
1862
1863=back
1864
1865Other available flags are:
1866
1867=over 3
1868
1869=item s
1870
1871This is a static function and is defined as C<S_whatever>.
1872
1873=item n
1874
1875This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
1876L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)
1877
1878=item r
1879
1880This function never returns; C<croak>, C<exit> and friends.
1881
1882=item f
1883
1884This function takes a variable number of arguments, C<printf> style.
1885The argument list should end with C<...>, like this:
1886
1887 Afprd |void |croak |const char* pat|...
1888
1889=item m
1890
1891This function is part of the experimental development API, and may change
1892or disappear without notice.
1893
1894=item o
1895
1896This function should not have a compatibility macro to define, say,
1897C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
1898
1899=item j
1900
1901This function is not a member of C<CPerlObj>. If you don't know
1902what this means, don't use it.
1903
1904=item x
1905
1906This function isn't exported out of the Perl core.
1907
1908=back
1909
1910If you edit F<embed.pl>, you will need to run C<make regen_headers> to
1911force a rebuild of F<embed.h> and other auto-generated files.
1912
=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

 IVdf    IV in decimal
 UVuf    UV in decimal
 UVof    UV in octal
 UVxf    UV in hexadecimal
 NVef    NV %e-like
 NVff    NV %f-like
 NVgf    NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

 printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.
1936
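The same technique exists in standard C, which makes it easy to see how
such macros work without a Perl build at hand. In this sketch
C<demo_IV> and C<DEMO_IVdf> are hypothetical stand-ins built on
F<inttypes.h>; they are not Perl's definitions. The format macro
expands to a string literal, and the compiler joins adjacent literals
into a single format string.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins for IV and IVdf, built on <inttypes.h>. */
typedef int64_t demo_IV;
#define DEMO_IVdf PRId64    /* a string literal such as "ld" or "lld" */

/* Format an integer portably: "IV is %" DEMO_IVdf "\n" is pasted
 * together by the compiler into one format string. */
static int demo_format_iv(char *buf, size_t size, demo_IV iv)
{
    return snprintf(buf, size, "IV is %" DEMO_IVdf "\n", iv);
}
```

The caller never has to know whether the integer type is a C<long> or a
C<long long> on the platform at hand.
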
1937=head2 Pointer-To-Integer and Integer-To-Pointer
1938
Because pointer size does not necessarily equal integer size,
use the following macros to do it right.
1941
1942 PTR2UV(pointer)
1943 PTR2IV(pointer)
1944 PTR2NV(pointer)
1945 INT2PTR(pointertotype, integer)
1946
1947For example:
1948
1949 IV iv = ...;
1950 SV *sv = INT2PTR(SV*, iv);
1951
1952and
1953
1954 AV *av = ...;
1955 UV uv = PTR2UV(av);
1956
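For comparison, the idea behind these macros can be written in standard
C with C<uintptr_t>. The names C<DEMO_PTR2UV> and C<DEMO_INT2PTR> are
hypothetical; this is a sketch of the principle, and Perl's own macros
are defined differently because they have to cope with its configured
integer types.

```c
#include <stdint.h>

/* Hypothetical equivalents of PTR2UV and INT2PTR using uintptr_t,
 * an unsigned integer type wide enough to hold an object pointer. */
#define DEMO_PTR2UV(p)        ((uintptr_t)(p))
#define DEMO_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Round-trip a pointer through an integer and back. */
static int *demo_round_trip(int *p)
{
    uintptr_t as_int = DEMO_PTR2UV(p);
    return DEMO_INT2PTR(int *, as_int);
}
```

Going through an integer type that is known to be wide enough is what
keeps the conversion from silently truncating the pointer.
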
=head2 Source Documentation
1958
1959There's an effort going on to document the internal functions and
1960automatically produce reference manuals from them - L<perlapi> is one
1961such manual which details all the functions which are available to XS
1962writers. L<perlintern> is the autogenerated manual for the functions
1963which are not part of the API and are supposedly for internal use only.
1964
1965Source documentation is created by putting POD comments into the C
1966source, like this:
1967
1968 /*
1969 =for apidoc sv_setiv
1970
1971 Copies an integer into the given SV. Does not handle 'set' magic. See
1972 C<sv_setiv_mg>.
1973
1974 =cut
1975 */
1976
1977Please try and supply some documentation if you add functions to the
1978Perl core.
1979
1980=head1 Unicode Support
1981
1982Perl 5.6.0 introduced Unicode support. It's important for porters and XS
1983writers to understand this support and make sure that the code they
1984write does not corrupt Unicode data.
1985
1986=head2 What B<is> Unicode, anyway?
1987
1988In the olden, less enlightened times, we all used to use ASCII. Most of
1989us did, anyway. The big problem with ASCII is that it's American. Well,
1990no, that's not actually the problem; the problem is that it's not
1991particularly useful for people who don't use the Roman alphabet. What
1992used to happen was that particular languages would stick their own
1993alphabet in the upper range of the sequence, between 128 and 255. Of
1994course, we then ended up with plenty of variants that weren't quite
1995ASCII, and the whole point of it being a standard was lost.
1996
1997Worse still, if you've got a language like Chinese or
1998Japanese that has hundreds or thousands of characters, then you really
1999can't fit them into a mere 256, so they had to forget about ASCII
2000altogether, and build their own systems using pairs of numbers to refer
2001to one character.
2002
To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF8. UTF8 uses
a variable number of bytes to represent a character, instead of just
one. You can learn more about Unicode at http://www.unicode.org/
2009
2010=head2 How can I recognise a UTF8 string?
2011
You can't. This is because UTF8 data is stored in bytes just like
non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.
2018
2019The API function C<is_utf8_string> can help; it'll tell you if a string
2020contains only valid UTF8 characters. However, it can't do the work for
2021you. On a character-by-character basis, C<is_utf8_char> will tell you
2022whether the current character in a string is valid UTF8.
2023
2024=head2 How does UTF8 represent Unicode characters?
2025
As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
2032
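The byte layout can be checked with a few lines of standalone C. This
is a minimal sketch, not the Perl API: C<demo_utf8_encode> is a
hypothetical helper, and it assumes a code point below 0x10000, so it
covers only the one-, two- and three-byte forms.

```c
#include <stddef.h>

/* Encode code point cp (assumed below 0x10000) as UTF-8 into out[].
 * Returns the number of bytes written: 1, 2 or 3. */
static size_t demo_utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

The two comparisons mark the points where the encoding grows by a byte:
0x80 for two bytes and 0x800 (2048) for three.
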
2033Assuming you know you're dealing with a UTF8 string, you can find out
2034how long the first character in it is with the C<UTF8SKIP> macro:
2035
2036 char *utf = "\305\233\340\240\201";
2037 I32 len;
2038
2039 len = UTF8SKIP(utf); /* len is 2 here */
2040 utf += len;
2041 len = UTF8SKIP(utf); /* len is 3 here */
2042
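What a skip macro has to compute can be shown without the Perl headers.
The helpers below, C<demo_utf8_skip> and C<demo_utf8_length>, are
hypothetical and only approximate the idea: the lead byte alone tells
you how long the character is. How C<UTF8SKIP> itself is implemented is
not shown here, and this sketch assumes well-formed input.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b,
 * or 0 if b is a continuation byte (10xxxxxx) or otherwise not a
 * lead byte this sketch understands. */
static size_t demo_utf8_skip(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Count characters in a NUL-terminated, well-formed UTF-8 string. */
static size_t demo_utf8_length(const char *s)
{
    size_t n = 0;
    while (*s) {
        size_t step = demo_utf8_skip((unsigned char)*s);
        if (step == 0)
            break;      /* malformed input: stop rather than loop forever */
        s += step;
        n++;
    }
    return n;
}
```

Like the macro, this trusts the string: a truncated final character
would send the pointer past the terminating NUL.
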
2043Another way to skip over characters in a UTF8 string is to use
2044C<utf8_hop>, which takes a string and a number of characters to skip
2045over. You're on your own about bounds checking, though, so don't use it
2046lightly.
2047
2048All bytes in a multi-byte UTF8 character will have the high bit set, so
2049you can test if you need to do something special with this character
2050like this:
2051
 UV uv;

 if (*utf & 0x80)
     /* Must treat this as UTF8 */
     uv = utf8_to_uv(utf);
 else
     /* OK to treat this character as a byte */
     uv = *utf;
2060
2061You can also see in that example that we use C<utf8_to_uv> to get the
2062value of the character; the inverse function C<uv_to_utf8> is available
2063for putting a UV into UTF8:
2064
 if (uv >= 0x80)
     /* Must treat this as UTF8 */
     utf8 = uv_to_utf8(utf8, uv);
 else
     /* OK to treat this character as a byte */
     *utf8++ = uv;
2071
You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF8 and non-UTF8
characters. You may not skip over UTF8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF8 characters;
for instance, if your UTF8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF8 string.
So don't do that!
2079
2080=head2 How does Perl store UTF8 strings?
2081
2082Currently, Perl deals with Unicode strings and non-Unicode strings
2083slightly differently. If a string has been identified as being UTF-8
2084encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
2085manipulate this flag with the following macros:
2086
2087 SvUTF8(sv)
2088 SvUTF8_on(sv)
2089 SvUTF8_off(sv)
2090
2091This flag has an important effect on Perl's treatment of the string: if
2092Unicode data is not properly distinguished, regular expressions,
2093C<length>, C<substr> and other string handling operations will have
2094undesirable results.
2095
The problem comes when you have, for instance, a string that isn't
flagged as UTF8, and contains a byte sequence that could be UTF8 -
especially when combining non-UTF8 and UTF8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:
2103
2104 SV *sv;
2105 SV *nsv;
2106 STRLEN len;
2107 char *p;
2108
2109 p = SvPV(sv, len);
2110 frobnicate(p);
2111 nsv = newSVpvn(p, len);
2112
2113The C<char*> string does not tell you the whole story, and you can't
2114copy or reconstruct an SV just by copying the string value. Check if the
2115old SV has the UTF8 flag set, and act accordingly:
2116
2117 p = SvPV(sv, len);
2118 frobnicate(p);
2119 nsv = newSVpvn(p, len);
2120 if (SvUTF8(sv))
2121 SvUTF8_on(nsv);
2122
2123In fact, your C<frobnicate> function should be made aware of whether or
2124not it's dealing with UTF8 data, so that it can handle the string
2125appropriately.
2126
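The underlying point, that the flag travels separately from the bytes,
can be modelled without Perl at all. In this toy, C<DemoSV>,
C<demo_sv_copy> and C<demo_sv_free> are hypothetical names and bear no
relation to the real SV layout; the sketch only shows that a copy
routine has to carry the flag over by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Toy scalar: a byte buffer plus a separate "is UTF-8" flag. */
typedef struct DemoSV {
    char  *pv;
    size_t len;
    int    utf8;
} DemoSV;

/* Copy a DemoSV. The flag is not part of the bytes, so it has to be
 * carried over explicitly. Returns NULL on allocation failure. */
static DemoSV *demo_sv_copy(const DemoSV *src)
{
    DemoSV *dst = malloc(sizeof *dst);
    if (!dst)
        return NULL;
    dst->pv = malloc(src->len + 1);
    if (!dst->pv) {
        free(dst);
        return NULL;
    }
    memcpy(dst->pv, src->pv, src->len);
    dst->pv[src->len] = '\0';
    dst->len  = src->len;
    dst->utf8 = src->utf8;      /* the step that is easy to forget */
    return dst;
}

static void demo_sv_free(DemoSV *sv)
{
    if (sv) {
        free(sv->pv);
        free(sv);
    }
}
```

Dropping the one assignment to C<utf8> would still copy every byte
correctly and yet change what the string means.
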
2127=head2 How do I convert a string to UTF8?
2128
2129If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
2130to upgrade one of the strings to UTF8. If you've got an SV, the easiest
2131way to do this is:
2132
2133 sv_utf8_upgrade(sv);
2134
2135However, you must not do this, for example:
2136
2137 if (!SvUTF8(left))
2138 sv_utf8_upgrade(left);
2139
If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
2150
2151=head2 Is there anything else I need to know?
2152
2153Not really. Just remember these things:
2154
2155=over 3
2156
2157=item *
2158
There's no way to tell if a string is UTF8 or not. You can tell if an SV
is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.
2163
2164=item *
2165
2166If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
2167unless C<!(*s & 0x80)> in which case you can use C<*s>.
2168
2169=item *
2170
2171When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
2172C<uv < 0x80> in which case you can use C<*s = uv>.
2173
2174=item *
2175
2176Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
2177a new string which is UTF8 encoded. There are tricks you can use to
2178delay deciding whether you need to use a UTF8 string until you get to a
2179high character - C<HALF_UPGRADE> is one of those.
2180
2181=back
2182
=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
<okamoto@corp.hp.com>. It is now maintained as part of Perl itself
by the Perl 5 Porters <perl5-porters@perl.org>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich <roehrich@cray.com>.

Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
Stuhl.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)