	1	=head1 NAME
	2
	3	perlguts - Introduction to the Perl API
	4
	5	=head1 DESCRIPTION
	6
	7	This document attempts to describe how to use the Perl API, as well as
	8	containing some info on the basic workings of the Perl core. It is far
	9	from complete and probably contains many errors. Please refer any
	10	questions or comments to the author below.
	11
	12	=head1 Variables
	13
	14	=head2 Datatypes
	15
	16	Perl has three typedefs that handle Perl's three main data types:
	17
	18	SV Scalar Value
	19	AV Array Value
	20	HV Hash Value
	21
	22	Each typedef has specific routines that manipulate the various data types.
	23
	24	=head2 What is an "IV"?
	25
	26	Perl uses a special typedef IV which is a simple signed integer type that is
	27	guaranteed to be large enough to hold a pointer (as well as an integer).
	28	Additionally, there is the UV, which is simply an unsigned IV.
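
As a minimal sketch of the pointer-holding guarantee, the following standalone C fragment round-trips a pointer through an integer. It does not include Perl's headers: C<sketch_IV> and C<sketch_UV> are hypothetical stand-ins built on C<intptr_t> and C<uintptr_t>, assumed here only because they share the property described above.

```c
#include <stdint.h>

/* Stand-ins for Perl's typedefs, used here only for illustration.
 * The real IV and UV are chosen when Perl is configured; intptr_t and
 * uintptr_t are assumed substitutes with the same pointer-holding
 * property. */
typedef intptr_t  sketch_IV;
typedef uintptr_t sketch_UV;

/* Store a pointer in a signed integer. */
sketch_IV pointer_to_iv(void *p)
{
    return (sketch_IV)p;
}

/* Recover the pointer from the integer. */
void *iv_to_pointer(sketch_IV iv)
{
    return (void *)iv;
}

/* Returns 1 if a pointer survives the round trip through sketch_IV. */
int iv_round_trip_ok(void)
{
    int object = 42;
    int *recovered = (int *)iv_to_pointer(pointer_to_iv(&object));
    return recovered == &object && *recovered == 42;
}
```

The point of the sketch is only that the integer type is wide enough for the conversion to be lossless; real extension code would use Perl's own typedefs.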
	29
	30	Perl also uses two special typedefs, I32 and I16, which will always be at
	31	least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
	32	as well.)
	33
	34	=head2 Working with SVs
	35
	36	An SV can be created and loaded with one command. There are four types of
	37	values that can be loaded: an integer value (IV), a double (NV),
	38	a string (PV), and another scalar (SV).
	39
	40	The six routines are:
	41
	42	SV* newSViv(IV);
	43	SV* newSVnv(double);
	44	SV* newSVpv(const char*, int);
	45	SV* newSVpvn(const char*, int);
	46	SV* newSVpvf(const char*, ...);
	47	SV* newSVsv(SV*);
	48
	49	To change the value of an already-existing SV, there are eight routines:
	50
	51	void sv_setiv(SV*, IV);
	52	void sv_setuv(SV*, UV);
	53	void sv_setnv(SV*, double);
	54	void sv_setpv(SV*, const char*);
	55	void sv_setpvn(SV*, const char*, int);
	56	void sv_setpvf(SV*, const char*, ...);
	57	void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
	58	void sv_setsv(SV*, SV*);
	59
	60	Notice that you can choose to specify the length of the string to be
	61	assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
	62	allow Perl to calculate the length by using C<sv_setpv> or by specifying
	63	0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
	64	determine the string's length by using C<strlen>, which depends on the
	65	string terminating with a NUL character.
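
The C<strlen> caveat can be seen without any Perl code at all. The sketch below is plain C, and the names C<sample>, C<sample_true_length> and C<sample_strlen_length> are hypothetical. It compares the length a caller knows with the length C<strlen> reports for data containing an embedded NUL.

```c
#include <stddef.h>
#include <string.h>

/* Length as measured by strlen: scanning stops at the first NUL byte. */
size_t measured_length(const char *buf)
{
    return strlen(buf);
}

/* Six bytes of data with a NUL at index 2: 'a','b','\0','c','d','e'.
 * The compiler appends one further NUL after them. */
static const char sample[] = "ab\0cde";

/* The true data length, known to the caller: sizeof minus the final NUL. */
size_t sample_true_length(void)
{
    return sizeof(sample) - 1;
}

/* The length strlen would report for the same data. */
size_t sample_strlen_length(void)
{
    return measured_length(sample);
}
```

Here the caller knows there are six bytes, but C<strlen> sees only two. Letting the length be calculated for such data would therefore drop everything after the first NUL, which is why an explicit length is the safer choice for binary data.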
	66
	67	The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
	68	formatted output becomes the value.
	69
	70	C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify
	71	either a pointer to a variable argument list or the address and length of
	72	an array of SVs. The last argument points to a boolean; on return, if that
	73	boolean is true, then locale-specific information has been used to format
	74	the string, and the string's contents are therefore untrustworthy (see
	75	L<perlsec>). This pointer may be NULL if that information is not
	76	important. Note that this function requires you to specify the length of
	77	the format.
	78
	79	STRLEN is an integer type (Size_t, usually defined as size_t in
	80	config.h) guaranteed to be large enough to represent the size of
	81	any string that perl can handle.
	82
	83	The C<sv_set*()> functions are not generic enough to operate on values
	84	that have "magic". See L<Magic Virtual Tables> later in this document.
	85
	86	All SVs that contain strings should be terminated with a NUL character.
	87	If it is not NUL-terminated there is a risk of
	88	core dumps and corruptions from code which passes the string to C
	89	functions or system calls which expect a NUL-terminated string.
	90	Perl's own functions typically add a trailing NUL for this reason.
	91	Nevertheless, you should be very careful when you pass a string stored
	92	in an SV to a C function or system call.
	93
	94	To access the actual value that an SV points to, you can use the macros:
	95
	96	SvIV(SV*)
	97	SvUV(SV*)
	98	SvNV(SV*)
	99	SvPV(SV*, STRLEN len)
	100	SvPV_nolen(SV*)
	101
	102	which will automatically coerce the actual scalar type into an IV, UV, double,
	103	or string.
	104
	105	In the C<SvPV> macro, the length of the string returned is placed into the
	106	variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
	107	not care what the length of the data is, use the C<SvPV_nolen> macro.
	108	Historically the C<SvPV> macro with the global variable C<PL_na> has been
	109	used in this case. But that can be quite inefficient because C<PL_na> must
	110	be accessed in thread-local storage in threaded Perl. In any case, remember
	111	that Perl allows arbitrary strings of data that may both contain NULs and
	112	might not be terminated by a NUL.
	113
	114	Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
	115	len);>. It might work with your compiler, but it won't work for everyone.
	116	Break this sort of statement up into separate assignments:
	117
	118	SV *s;
	119	STRLEN len;
	120	char * ptr;
	121	ptr = SvPV(s, len);
	122	foo(ptr, len);
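
The reason is that C does not define the order in which function arguments are evaluated, so the second argument may be read before the macro has assigned it. The following standalone sketch models this with a toy macro; C<struct toy_sv>, C<TOY_PV>, C<count_bytes> and C<call_safely> are all hypothetical names, not part of the Perl API.

```c
#include <stddef.h>

/* A toy scalar standing in for an SV; this is not Perl's real layout. */
struct toy_sv {
    const char *pv;   /* string data */
    size_t      cur;  /* length of the data in bytes */
};

/* Like SvPV, this macro yields the string pointer and, as a side
 * effect, assigns the length to its second argument. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer that receives a pointer and a length. */
size_t count_bytes(const char *ptr, size_t len)
{
    (void)ptr;
    return len;
}

/* Safe form: the macro is fully evaluated in its own statement, so
 * len is known to be set before it is passed on. */
size_t call_safely(const struct toy_sv *s)
{
    size_t len = 0;
    const char *ptr = TOY_PV(s, len);
    return count_bytes(ptr, len);
}
```

Writing C<count_bytes(TOY_PV(s, len), len)> instead would leave it to the compiler whether C<len> is read before or after the assignment inside the macro.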
	123
	124	If you want to know if the scalar value is TRUE, you can use:
	125
	126	SvTRUE(SV*)
	127
	128	Although Perl will automatically grow strings for you, if you need to force
	129	Perl to allocate more memory for your SV, you can use the macro
	130
	131	SvGROW(SV*, STRLEN newlen)
	132
	133	which will determine if more memory needs to be allocated. If so, it will
	134	call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
	135	decrease, the allocated memory of an SV and that it does not automatically
	136	add a byte for a trailing NUL (perl's own string functions typically do
	137	C<SvGROW(sv, len + 1)>).
	138
	139	If you have an SV and want to know what kind of data Perl thinks is stored
	140	in it, you can use the following macros to check the type of SV you have.
	141
	142	SvIOK(SV*)
	143	SvNOK(SV*)
	144	SvPOK(SV*)
	145
	146	You can get and set the current length of the string stored in an SV with
	147	the following macros:
	148
	149	SvCUR(SV*)
	150	SvCUR_set(SV*, I32 val)
	151
	152	You can also get a pointer to the end of the string stored in the SV
	153	with the macro:
	154
	155	SvEND(SV*)
	156
	157	But note that these last three macros are valid only if C<SvPOK()> is true.
	158
	159	If you want to append something to the end of string stored in an C<SV*>,
	160	you can use the following functions:
	161
	162	void sv_catpv(SV*, const char*);
	163	void sv_catpvn(SV*, const char*, STRLEN);
	164	void sv_catpvf(SV*, const char*, ...);
	165	void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
	166	void sv_catsv(SV*, SV*);
	167
	168	The first function calculates the length of the string to be appended by
	169	using C<strlen>. In the second, you specify the length of the string
	170	yourself. The third function processes its arguments like C<sprintf> and
	171	appends the formatted output. The fourth function works like C<vsprintf>.
	172	You can specify the address and length of an array of SVs instead of the
	173	va_list argument. The fifth function extends the string stored in the first
	174	SV with the string stored in the second SV. It also forces the second SV
	175	to be interpreted as a string.
	176
	177	The C<sv_cat*()> functions are not generic enough to operate on values that
	178	have "magic". See L<Magic Virtual Tables> later in this document.
	179
	180	If you know the name of a scalar variable, you can get a pointer to its SV
	181	by using the following:
	182
	183	SV* get_sv("package::varname", FALSE);
	184
	185	This returns NULL if the variable does not exist.
	186
	187	If you want to know if this variable (or any other SV) is actually C<defined>,
	188	you can call:
	189
	190	SvOK(SV*)
	191
	192	The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its
	193	address can be used whenever an C<SV*> is needed.
	194
	195	There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean
	196	TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can
	197	be used whenever an C<SV*> is needed.
	198
	199	Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
	200	Take this code:
	201
	202	SV* sv = (SV*) 0;
	203	if (I-am-to-return-a-real-value) {
	204	sv = sv_2mortal(newSViv(42));
	205	}
	206	sv_setsv(ST(0), sv);
	207
	208	This code tries to return a new SV (which contains the value 42) if it should
	209	return a real value, or undef otherwise. Instead it has returned a NULL
	210	pointer which, somewhere down the line, will cause a segmentation violation,
	211	bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first
	212	line and all will be well.
	213
	214	To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
	215	call is not necessary (see L<Reference Counts and Mortality>).
	216
	217	=head2 Offsets
	218
	219	Perl provides the function C<sv_chop> to efficiently remove characters
	220	from the beginning of a string; you give it an SV and a pointer to
	221	somewhere inside the PV, and it discards everything before the
	222	pointer. The efficiency comes by means of a little hack: instead of
	223	actually removing the characters, C<sv_chop> sets the flag C<OOK>
	224	(offset OK) to signal to other functions that the offset hack is in
	225	effect, and it puts the number of bytes chopped off into the IV field
	226	of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
	227	many bytes, and adjusts C<SvCUR> and C<SvLEN>.
	228
	229	Hence, at this point, the start of the buffer that we allocated lives
	230	at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
	231	into the middle of this allocated storage.
	232
	233	This is best demonstrated by example:
	234
	235	% ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
	236	SV = PVIV(0x8128450) at 0x81340f0
	237	REFCNT = 1
	238	FLAGS = (POK,OOK,pPOK)
	239	IV = 1 (OFFSET)
	240	PV = 0x8135781 ( "1" . ) "2345"\0
	241	CUR = 4
	242	LEN = 5
	243
	244	Here the number of bytes chopped off (1) is put into IV, and
	245	C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
	246	portion of the string between the "real" and the "fake" beginnings is
	247	shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
	248	the fake beginning, not the real one.
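
The bookkeeping behind the offset hack can be modeled in a few lines of plain C. This is a toy model under stated assumptions: C<struct toy_sv>, C<toy_chop> and C<toy_real_start> are hypothetical, and the struct only mirrors the fields named in the text, not Perl's real SV layout.

```c
#include <stddef.h>

/* Toy model of the fields involved in the offset hack. */
struct toy_sv {
    char  *pv;      /* current ("fake") start of the string */
    size_t cur;     /* bytes in use, counted from pv */
    size_t len;     /* bytes allocated, counted from pv */
    size_t offset;  /* bytes chopped so far (kept in the IV field in Perl) */
    int    ook;     /* nonzero once the offset hack is in effect */
};

/* Discard everything before new_start, which must point into the
 * string. No bytes are moved; only the bookkeeping changes. */
void toy_chop(struct toy_sv *sv, char *new_start)
{
    size_t delta = (size_t)(new_start - sv->pv);
    sv->pv      = new_start;
    sv->cur    -= delta;
    sv->len    -= delta;
    sv->offset += delta;
    sv->ook     = 1;
}

/* Start of the originally allocated buffer, as in SvPVX(sv) - SvIV(sv). */
char *toy_real_start(const struct toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Starting from a six-byte buffer holding "12345" and chopping one byte leaves a current length of 4, an allocated length of 5 and an offset of 1, the same figures the dump above reports.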
	249
	250	=head2 What's Really Stored in an SV?
	251
	252	Recall that the usual method of determining the type of scalar you have is
	253	to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
	254	these macros will usually return TRUE, and calling the C<Sv*V>
	255	macros will do the appropriate conversion of string to integer/double or
	256	integer/double to string.
	257
	258	If you I<really> need to know if you have an integer, double, or string
	259	pointer in an SV, you can use the following three macros instead:
	260
	261	SvIOKp(SV*)
	262	SvNOKp(SV*)
	263	SvPOKp(SV*)
	264
	265	These will tell you if you truly have an integer, double, or string pointer
	266	stored in your SV. The "p" stands for private.
	267
	268	In general, though, it's best to use the C<Sv*V> macros.
	269
	270	=head2 Working with AVs
	271
	272	There are two ways to create and load an AV. The first method creates an
	273	empty AV:
	274
	275	AV* newAV();
	276
	277	The second method both creates the AV and initially populates it with SVs:
	278
	279	AV* av_make(I32 num, SV **ptr);
	280
	281	The second argument points to an array containing C<num> C<SV*>'s. Once the
	282	AV has been created, the SVs can be destroyed, if so desired.
	283
	284	Once the AV has been created, the following operations are possible on AVs:
	285
	286	void av_push(AV*, SV*);
	287	SV* av_pop(AV*);
	288	SV* av_shift(AV*);
	289	void av_unshift(AV*, I32 num);
	290
	291	These should be familiar operations, with the exception of C<av_unshift>.
	292	This routine adds C<num> elements at the front of the array with the C<undef>
	293	value. You must then use C<av_store> (described below) to assign values
	294	to these new elements.
	295
	296	Here are some other functions:
	297
	298	I32 av_len(AV*);
	299	SV** av_fetch(AV*, I32 key, I32 lval);
	300	SV** av_store(AV*, I32 key, SV* val);
	301
	302	The C<av_len> function returns the highest index value in array (just
	303	like $#array in Perl). If the array is empty, -1 is returned. The
	304	C<av_fetch> function returns the value at index C<key>, but if C<lval>
	305	is non-zero, then C<av_fetch> will store an undef value at that index.
	306	The C<av_store> function stores the value C<val> at index C<key>, and does
	307	not increment the reference count of C<val>. Thus the caller is responsible
	308	for taking care of that, and if C<av_store> returns NULL, the caller will
	309	have to decrement the reference count to avoid a memory leak. Note that
	310	C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
	311	return value.
	312
	313	void av_clear(AV*);
	314	void av_undef(AV*);
	315	void av_extend(AV*, I32 key);
	316
	317	The C<av_clear> function deletes all the elements in the AV* array, but
	318	does not actually delete the array itself. The C<av_undef> function will
	319	delete all the elements in the array plus the array itself. The
	320	C<av_extend> function extends the array so that it contains at least C<key+1>
	321	elements. If C<key+1> is less than the currently allocated length of the array,
	322	then nothing is done.
	323
	324	If you know the name of an array variable, you can get a pointer to its AV
	325	by using the following:
	326
	327	AV* get_av("package::varname", FALSE);
	328
	329	This returns NULL if the variable does not exist.
	330
	331	See L<Understanding the Magic of Tied Hashes and Arrays> for more
	332	information on how to use the array access functions on tied arrays.
	333
	334	=head2 Working with HVs
	335
	336	To create an HV, you use the following routine:
	337
	338	HV* newHV();
	339
	340	Once the HV has been created, the following operations are possible on HVs:
	341
	342	SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
	343	SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
	344
	345	The C<klen> parameter is the length of the key being passed in (Note that
	346	you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
	347	length of the key). The C<val> argument contains the SV pointer to the
	348	scalar being stored, and C<hash> is the precomputed hash value (zero if
	349	you want C<hv_store> to calculate it for you). The C<lval> parameter
	350	indicates whether this fetch is actually a part of a store operation, in
	351	which case a new undefined value will be added to the HV with the supplied
	352	key and C<hv_fetch> will return as if the value had already existed.
	353
	354	Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
	355	C<SV*>. To access the scalar value, you must first dereference the return
	356	value. However, you should check to make sure that the return value is
	357	not NULL before dereferencing it.
	358
	359	The first of these two functions checks whether a hash table entry exists; the second deletes an entry.
	360
	361	bool hv_exists(HV*, const char* key, U32 klen);
	362	SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
	363
	364	If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
	365	create and return a mortal copy of the deleted value.
	366
	367	And more miscellaneous functions:
	368
	369	void hv_clear(HV*);
	370	void hv_undef(HV*);
	371
	372	Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
	373	table but does not actually delete the hash table. The C<hv_undef> deletes
	374	both the entries and the hash table itself.
	375
	376	Perl keeps the actual data in a linked list of structures with a typedef of HE.
	377	These contain the actual key and value pointers (plus extra administrative
	378	overhead). The key is a string pointer; the value is an C<SV*>. However,
	379	once you have an C<HE*>, to get the actual key and value, use the routines
	380	specified below.
	381
	382	I32 hv_iterinit(HV*);
	383	/* Prepares starting point to traverse hash table */
	384	HE* hv_iternext(HV*);
	385	/* Get the next entry, and return a pointer to a
	386	structure that has both the key and value */
	387	char* hv_iterkey(HE* entry, I32* retlen);
	388	/* Get the key from an HE structure and also return
	389	the length of the key string */
	390	SV* hv_iterval(HV*, HE* entry);
	391	/* Return an SV pointer to the value of the HE
	392	structure */
	393	SV* hv_iternextsv(HV*, char** key, I32* retlen);
	394	/* This convenience routine combines hv_iternext,
	395	hv_iterkey, and hv_iterval. The key and retlen
	396	arguments are return values for the key and its
	397	length. The value is returned as the SV* result */
	398
	399	If you know the name of a hash variable, you can get a pointer to its HV
	400	by using the following:
	401
	402	HV* get_hv("package::varname", FALSE);
	403
	404	This returns NULL if the variable does not exist.
	405
	406	The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:
	407
	408	hash = 0;
	409	while (klen--)
	410	hash = (hash * 33) + *key++;
	411	hash = hash + (hash >> 5); /* after 5.6 */
	412
	413	The last step was added in version 5.6 to improve distribution of
	414	lower bits in the resulting hash value.
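
Written out as a standalone function, the algorithm looks like the sketch below. The name C<toy_perl_hash> and the use of C<uint32_t> for C<U32> are assumptions made for illustration. The cast to C<unsigned char> is also a choice of this sketch, so that bytes above 127 give the same result on every platform.

```c
#include <stddef.h>
#include <stdint.h>

/* A function form of the hash loop shown above. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;
    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);   /* the extra mixing step added in 5.6 */
    return hash;
}
```

For the one-byte key "a" (value 97) the loop yields 97, and the final step adds 97 >> 5 = 3, giving 100.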
	415
	416	See L<Understanding the Magic of Tied Hashes and Arrays> for more
	417	information on how to use the hash access functions on tied hashes.
	418
	419	=head2 Hash API Extensions
	420
	421	Beginning with version 5.004, the following functions are also supported:
	422
	423	HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
	424	HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
	425
	426	bool hv_exists_ent (HV* tb, SV* key, U32 hash);
	427	SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
	428
	429	SV* hv_iterkeysv (HE* entry);
	430
	431	Note that these functions take C<SV*> keys, which simplifies writing
	432	of extension code that deals with hash structures. These functions
	433	also allow passing of C<SV*> keys to C<tie> functions without forcing
	434	you to stringify the keys (unlike the previous set of functions).
	435
	436	They also return and accept whole hash entries (C<HE*>), making their
	437	use more efficient (since the hash number for a particular string
	438	doesn't have to be recomputed every time). See L<perlapi> for detailed
	439	descriptions.
	440
	441	The following macros must always be used to access the contents of hash
	442	entries. Note that the arguments to these macros must be simple
	443	variables, since they may get evaluated more than once. See
	444	L<perlapi> for detailed descriptions of these macros.
	445
	446	HePV(HE* he, STRLEN len)
	447	HeVAL(HE* he)
	448	HeHASH(HE* he)
	449	HeSVKEY(HE* he)
	450	HeSVKEY_force(HE* he)
	451	HeSVKEY_set(HE* he, SV* sv)
	452
	453	These two lower level macros are defined, but must only be used when
	454	dealing with keys that are not C<SV*>s:
	455
	456	HeKEY(HE* he)
	457	HeKLEN(HE* he)
	458
	459	Note that both C<hv_store> and C<hv_store_ent> do not increment the
	460	reference count of the stored C<val>, which is the caller's responsibility.
	461	If these functions return a NULL value, the caller will usually have to
	462	decrement the reference count of C<val> to avoid a memory leak.
	463
	464	=head2 References
	465
	466	References are a special type of scalar that point to other data types
	467	(including references).
	468
	469	To create a reference, use either of the following functions:
	470
	471	SV* newRV_inc((SV*) thing);
	472	SV* newRV_noinc((SV*) thing);
	473
	474	The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
	475	functions are identical except that C<newRV_inc> increments the reference
	476	count of the C<thing>, while C<newRV_noinc> does not. For historical
	477	reasons, C<newRV> is a synonym for C<newRV_inc>.
	478
	479	Once you have a reference, you can use the following macro to dereference
	480	the reference:
	481
	482	SvRV(SV*)
	483
	484	then call the appropriate routines, casting the returned C<SV*> to either an
	485	C<AV*> or C<HV*>, if required.
	486
	487	To determine if an SV is a reference, you can use the following macro:
	488
	489	SvROK(SV*)
	490
	491	To discover what type of value the reference refers to, use the following
	492	macro and then check the return value.
	493
	494	SvTYPE(SvRV(SV*))
	495
	496	The most useful types that will be returned are:
	497
	498	SVt_IV Scalar
	499	SVt_NV Scalar
	500	SVt_PV Scalar
	501	SVt_RV Scalar
	502	SVt_PVAV Array
	503	SVt_PVHV Hash
	504	SVt_PVCV Code
	505	SVt_PVGV Glob (possibly a file handle)
	506	SVt_PVMG Blessed or Magical Scalar
	507
	508	See the sv.h header file for more details.
	509
	510	=head2 Blessed References and Class Objects
	511
	512	References are also used to support object-oriented programming. In the
	513	OO lexicon, an object is simply a reference that has been blessed into a
	514	package (or class). Once blessed, the programmer may now use the reference
	515	to access the various methods in the class.
	516
	517	A reference can be blessed into a package with the following function:
	518
	519	SV* sv_bless(SV* sv, HV* stash);
	520
	521	The C<sv> argument must be a reference. The C<stash> argument specifies
	522	which class the reference will belong to. See
	523	L<Stashes and Globs> for information on converting class names into stashes.
	524
	525	/* Still under construction */
	526
	527	Upgrades rv to reference if not already one. Creates new SV for rv to
	528	point to. If C<classname> is non-null, the SV is blessed into the specified
	529	class. SV is returned.
	530
	531	SV* newSVrv(SV* rv, const char* classname);
	532
	533	Copies integer, unsigned integer or double into an SV whose reference is C<rv>. SV is blessed
	534	if C<classname> is non-null.
	535
	536	SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
	537	SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
	538	SV* sv_setref_nv(SV* rv, const char* classname, NV iv);
	539
	540	Copies the pointer value (I<the address, not the string!>) into an SV whose
	541	reference is rv. SV is blessed if C<classname> is non-null.
	542
	543	SV* sv_setref_pv(SV* rv, const char* classname, PV iv);
	544
	545	Copies string into an SV whose reference is C<rv>. Set length to 0 to let
	546	Perl calculate the string length. SV is blessed if C<classname> is non-null.
	547
	548	SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length);
	549
	550	Tests whether the SV is blessed into the specified class. It does not
	551	check inheritance relationships.
	552
	553	int sv_isa(SV* sv, const char* name);
	554
	555	Tests whether the SV is a reference to a blessed object.
	556
	557	int sv_isobject(SV* sv);
	558
	559	Tests whether the SV is derived from the specified class. SV can be either
	560	a reference to a blessed object or a string containing a class name. This
	561	is the function implementing the C<UNIVERSAL::isa> functionality.
	562
	563	bool sv_derived_from(SV* sv, const char* name);
	564
	565	To check if you've got an object derived from a specific class you have
	566	to write:
	567
	568	if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
	569
	570	=head2 Creating New Variables
	571
	572	To create a new Perl variable with an undef value which can be accessed from
	573	your Perl script, use the following routines, depending on the variable type.
	574
	575	SV* get_sv("package::varname", TRUE);
	576	AV* get_av("package::varname", TRUE);
	577	HV* get_hv("package::varname", TRUE);
	578
	579	Notice the use of TRUE as the second parameter. The new variable can now
	580	be set, using the routines appropriate to the data type.
	581
	582	There are additional macros whose values may be bitwise OR'ed with the
	583	C<TRUE> argument to enable certain extra features. Those bits are:
	584
	585	GV_ADDMULTI Marks the variable as multiply defined, thus preventing the
	586	"Name <varname> used only once: possible typo" warning.
	587	GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if
	588	the variable did not exist before the function was called.
	589
	590	If you do not specify a package name, the variable is created in the current
	591	package.
	592
	593	=head2 Reference Counts and Mortality
	594
	595	Perl uses a reference-count-driven garbage collection mechanism. SVs,
	596	AVs, or HVs (xV for short in the following) start their life with a
	597	reference count of 1. If the reference count of an xV ever drops to 0,
	598	then it will be destroyed and its memory made available for reuse.
	599
	600	This normally doesn't happen at the Perl level unless a variable is
	601	undef'ed or the last variable holding a reference to it is changed or
	602	overwritten. At the internal level, however, reference counts can be
	603	manipulated with the following macros:
	604
	605	int SvREFCNT(SV* sv);
	606	SV* SvREFCNT_inc(SV* sv);
	607	void SvREFCNT_dec(SV* sv);
	608
	609	However, there is one other function which manipulates the reference
	610	count of its argument. The C<newRV_inc> function, you will recall,
	611	creates a reference to the specified argument. As a side effect,
	612	it increments the argument's reference count. If this is not what
	613	you want, use C<newRV_noinc> instead.
	614
	615	For example, imagine you want to return a reference from an XSUB function.
	616	Inside the XSUB routine, you create an SV which initially has a reference
	617	count of one. Then you call C<newRV_inc>, passing it the just-created SV.
	618	This returns the reference as a new SV, but the reference count of the
	619	SV you passed to C<newRV_inc> has been incremented to two. Now you
	620	return the reference from the XSUB routine and forget about the SV.
	621	But Perl hasn't! Whenever the returned reference is destroyed, the
	622	reference count of the original SV is decreased to one and nothing happens.
	623	The SV will hang around without any way to access it until Perl itself
	624	terminates. This is a memory leak.
	625
	626	The correct procedure, then, is to use C<newRV_noinc> instead of
	627	C<newRV_inc>. Then, if and when the last reference is destroyed,
	628	the reference count of the SV will go to zero and it will be destroyed,
	629	stopping any memory leak.
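
The difference between the two calls can be followed in a toy model of reference counting. Nothing below is Perl API: C<struct toy_sv> and the C<toy_*> functions are hypothetical, and they model only the counts, under the assumption that destroying a reference releases exactly one count on its target.

```c
/* Toy reference-counted value; not Perl's real SV. */
struct toy_sv {
    int refcnt;   /* number of owners */
    int freed;    /* set to 1 when the count reaches zero */
};

/* A new value starts life with a reference count of 1. */
void toy_new(struct toy_sv *sv)
{
    sv->refcnt = 1;
    sv->freed = 0;
}

/* Drop one count and "free" the value when none remain. */
void toy_dec(struct toy_sv *sv)
{
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Model of newRV_inc: the new reference takes an additional count. */
void toy_ref_inc(struct toy_sv *target)
{
    target->refcnt++;
}

/* Model of newRV_noinc: the new reference adopts the existing count. */
void toy_ref_noinc(struct toy_sv *target)
{
    (void)target;
}

/* Destroying a reference releases the one count it holds on its target. */
void toy_ref_destroy(struct toy_sv *target)
{
    toy_dec(target);
}
```

After C<toy_new>, C<toy_ref_inc> and C<toy_ref_destroy> the count is back at one and the value is never freed, which is the leak described above. With C<toy_ref_noinc> in place of C<toy_ref_inc> the count reaches zero and the value is released.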
	630
	631	There are some convenience functions available that can help with the
	632	destruction of xVs. These functions introduce the concept of "mortality".
	633	An xV that is mortal has had its reference count marked to be decremented,
	634	but not actually decremented, until "a short time later". Generally the
	635	term "short time later" means a single Perl statement, such as a call to
	636	an XSUB function. The actual determinant for when mortal xVs have their
	637	reference count decremented depends on two macros, SAVETMPS and FREETMPS.
	638	See L<perlcall> and L<perlxs> for more details on these macros.
	639
	640	"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
	641	However, if you mortalize a variable twice, the reference count will
	642	later be decremented twice.
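
A deferred decrement can be modeled as a list of pending entries that is flushed later. The sketch below is a toy model, not Perl's implementation: C<toy_temps>, C<toy_2mortal> and C<toy_free_temps> are hypothetical names, and the fixed-size list is an assumption made to keep the example short.

```c
#include <stddef.h>

#define TOY_MAX_TEMPS 8

/* Toy value carrying only a reference count. */
struct toy_sv {
    int refcnt;
};

/* Pending decrements, standing in for Perl's list of temporaries. */
static struct toy_sv *toy_temps[TOY_MAX_TEMPS];
static size_t toy_temps_count = 0;

/* Model of sv_2mortal: record one deferred decrement and return the
 * value. The reference count itself is not touched yet. Entries beyond
 * TOY_MAX_TEMPS are silently ignored in this sketch. */
struct toy_sv *toy_2mortal(struct toy_sv *sv)
{
    if (toy_temps_count < TOY_MAX_TEMPS)
        toy_temps[toy_temps_count++] = sv;
    return sv;
}

/* Model of FREETMPS: apply every deferred decrement recorded so far. */
void toy_free_temps(void)
{
    while (toy_temps_count > 0)
        toy_temps[--toy_temps_count]->refcnt--;
}
```

A value passed to C<toy_2mortal> twice appears in the list twice, so flushing the list lowers its count by two. That is the double decrement the paragraph above warns about.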
	643
	644	You should be careful about creating mortal variables. Strange things
	645	can happen if you make the same value mortal within multiple contexts,
	646	or if you make a variable mortal multiple times.
	647
	648	To create a mortal variable, use the functions:
	649
	650	SV* sv_newmortal()
	651	SV* sv_2mortal(SV*)
	652	SV* sv_mortalcopy(SV*)
	653
	654	The first call creates a mortal SV, the second converts an existing
	655	SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
	656	third creates a mortal copy of an existing SV.
	657
	658	The mortal routines are not just for SVs -- AVs and HVs can be
	659	made mortal by passing their address (type-casted to C<SV*>) to the
	660	C<sv_2mortal> or C<sv_mortalcopy> routines.
	661
	662	=head2 Stashes and Globs
	663
	664	A "stash" is a hash that contains all of the different objects that
	665	are contained within a package. Each key of the stash is a symbol
	666	name (shared by all the different types of objects that have the same
	667	name), and each value in the hash table is a GV (Glob Value). This GV
	668	in turn contains references to the various objects of that name,
	669	including (but not limited to) the following:
	670
	671	Scalar Value
	672	Array Value
	673	Hash Value
	674	I/O Handle
	675	Format
	676	Subroutine
	677
	678	There is a single stash called "PL_defstash" that holds the items that exist
	679	in the "main" package. To get at the items in other packages, append the
	680	string "::" to the package name. The items in the "Foo" package are in
	681	the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are
	682	in the stash "Baz::" in "Bar::"'s stash.
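
The naming rule can be sketched as plain string handling. The helper below, C<toy_stash_key>, is hypothetical and never touches Perl's symbol tables; it only computes the key that would be looked up at each level of nesting.

```c
#include <stddef.h>
#include <string.h>

/* Copy the n-th stash key of a package name into out. For "Bar::Baz",
 * key 0 is "Bar::" and key 1 is "Baz::". Returns 1 on success, 0 if
 * there is no such component or out is too small. */
int toy_stash_key(const char *package, size_t n, char *out, size_t outsize)
{
    const char *start = package;
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t complen = sep ? (size_t)(sep - start) : strlen(start);
        if (n == 0) {
            if (complen == 0 || complen + 3 > outsize)
                return 0;
            memcpy(out, start, complen);
            memcpy(out + complen, "::", 3);   /* appends "::" and the NUL */
            return 1;
        }
        if (!sep)
            return 0;
        start = sep + 2;
        n--;
    }
}
```

For "Bar::Baz" the first key is "Bar::", to be looked up in the top-level stash, and the second is "Baz::", to be looked up in the stash found by the first step.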
	683
	684	To get the stash pointer for a particular package, use the function:
	685
	686	HV* gv_stashpv(const char* name, I32 create)
	687	HV* gv_stashsv(SV*, I32 create)
	688
	689	The first function takes a literal string, the second uses the string stored
	690	in the SV. Remember that a stash is just a hash table, so you get back an
	691	C<HV*>. The C<create> flag will create a new package if it is set.
	692
	693	The name that C<gv_stash*v> wants is the name of the package whose symbol table
	694	you want. The default package is called C<main>. If you have multiply nested
	695	packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
	696	language itself.
	697
	698	Alternately, if you have an SV that is a blessed reference, you can find
	699	out the stash pointer by using:
	700
	701	HV* SvSTASH(SvRV(SV*));
	702
	703	then use the following to get the package name itself:
	704
	705	char* HvNAME(HV* stash);
	706
	707	If you need to bless or re-bless an object you can use the following
	708	function:
	709
	710	SV* sv_bless(SV*, HV* stash)
	711
	712	where the first argument, an C<SV*>, must be a reference, and the second
	713	argument is a stash. The returned C<SV*> can now be used in the same way
	714	as any other SV.
	715
	716	For more information on references and blessings, consult L<perlref>.
	717
	718	=head2 Double-Typed SVs
	719
	720	Scalar variables normally contain only one type of value, an integer,
	721	double, pointer, or reference. Perl will automatically convert the
	722	actual scalar data from the stored type into the requested type.
	723
	724	Some scalar variables contain more than one type of scalar data. For
	725	example, the variable C<$!> contains either the numeric value of C<errno>
	726	or its string equivalent from either C<strerror> or C<sys_errlist[]>.
	727
	728	To force multiple data values into an SV, you must do two things: use the
	729	C<sv_set*v> routines to add the additional scalar type, then set a flag
	730	so that Perl will believe it contains more than one type of data. The
	731	four macros to set the flags are:
	732
	733	SvIOK_on
	734	SvNOK_on
	735	SvPOK_on
	736	SvROK_on
	737
	738	The particular macro you must use depends on which C<sv_set*v> routine
	739	you called first. This is because every C<sv_set*v> routine turns on
	740	only the bit for the particular type of data being set, and turns off
	741	all the rest.
	742
	743	For example, to create a new Perl variable called "dberror" that contains
	744	both the numeric and descriptive string error values, you could use the
	745	following code:
	746
	747	extern int dberror;
	748	extern char *dberror_list[];
	749
	750	SV* sv = get_sv("dberror", TRUE);
	751	sv_setiv(sv, (IV) dberror);
	752	sv_setpv(sv, dberror_list[dberror]);
	753	SvIOK_on(sv);
	754
	755	If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
	756	macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
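
Why the flag must be turned back on is easiest to see in a toy model of the flag handling. The flag values, C<struct toy_sv> and the C<toy_*> functions below are hypothetical; they imitate only the rule stated above, that each set routine enables its own bit and clears the rest.

```c
/* Toy flag bits; these are not Perl's real flag values. */
#define TOY_IOK 0x1u   /* integer slot is valid */
#define TOY_POK 0x2u   /* string slot is valid */

/* Toy scalar with one slot per type. */
struct toy_sv {
    unsigned    flags;
    long        iv;
    const char *pv;
};

/* Like the sv_set*v routines described above: each setter turns on
 * only its own flag and turns off all the others. */
void toy_setiv(struct toy_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

void toy_setpv(struct toy_sv *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare the integer slot valid again without
 * disturbing the other flags. */
void toy_iok_on(struct toy_sv *sv)
{
    sv->flags |= TOY_IOK;
}
```

After C<toy_setiv> followed by C<toy_setpv>, only the string flag is set even though the integer slot still holds its value. Calling C<toy_iok_on> makes both slots count as valid.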
	757
	758	=head2 Magic Variables
	759
	760	[This section still under construction. Ignore everything here. Post no
	761	bills. Everything not permitted is forbidden.]
	762
	763	Any SV may be magical, that is, it has special features that a normal
	764	SV does not have. These features are stored in the SV structure in a
	765	linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
	766
    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };
	777
	778	Note this is current as of patchlevel 0, and could change at any time.
	779
	780	=head2 Assigning Magic
	781
	782	Perl adds magic to an SV using the sv_magic function:
	783
	784	void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
	785
	786	The C<sv> argument is a pointer to the SV that is to acquire a new magical
	787	feature.
	788
	789	If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
	790	set the C<SVt_PVMG> flag for the C<sv>. Perl then continues by adding
	791	it to the beginning of the linked list of magical features. Any prior
	792	entry of the same type of magic is deleted. Note that this can be
	793	overridden, and multiple instances of the same type of magic can be
	794	associated with an SV.
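
The list handling in the previous paragraph can be pictured with a
stand-alone sketch. It assumes a cut-down C<ToyMagic> structure holding only
the link and the type; all C<toy_*> names are hypothetical and the real
C<sv_magic> does considerably more (upgrading the SV, choosing a vtable,
copying the name).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a cut-down stand-in for struct magic. */
typedef struct toy_magic {
    struct toy_magic *mg_moremagic;  /* next entry in the list */
    char              mg_type;       /* the "how" value */
} ToyMagic;

/* Unlink and free the first entry of the given type, if any. */
void toy_unmagic(ToyMagic **head, char type)
{
    ToyMagic **link = head;
    assert(head != NULL);
    while (*link) {
        if ((*link)->mg_type == type) {
            ToyMagic *dead = *link;
            *link = dead->mg_moremagic;
            free(dead);
            return;
        }
        link = &(*link)->mg_moremagic;
    }
}

/* Drop any prior entry of this type, then add the new one at the head.
 * Returns NULL, leaving the list untouched, if allocation fails. */
ToyMagic *toy_magic_add(ToyMagic **head, char how)
{
    ToyMagic *mg = malloc(sizeof *mg);
    if (!mg)
        return NULL;
    toy_unmagic(head, how);
    mg->mg_type = how;
    mg->mg_moremagic = *head;
    *head = mg;
    return mg;
}

/* Walk the list looking for a type, in the spirit of mg_find. */
ToyMagic *toy_find(ToyMagic *head, char type)
{
    for (; head; head = head->mg_moremagic)
        if (head->mg_type == type)
            return head;
    return NULL;
}

/* Number of entries currently on the list. */
int toy_count(const ToyMagic *head)
{
    int n = 0;
    for (; head; head = head->mg_moremagic)
        n++;
    return n;
}
```

Adding 'P', then '~', then 'P' again leaves two entries with the newer 'P'
at the head, since the older 'P' entry is removed first.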
	795
	796	The C<name> and C<namlen> arguments are used to associate a string with
	797	the magic, typically the name of a variable. C<namlen> is stored in the
	798	C<mg_len> field and if C<name> is non-null and C<namlen> >= 0 a malloc'd
	799	copy of the name is stored in C<mg_ptr> field.
	800
	801	The sv_magic function uses C<how> to determine which, if any, predefined
	802	"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
	803	See the "Magic Virtual Table" section below. The C<how> argument is also
	804	stored in the C<mg_type> field.
	805
	806	The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
	807	structure. If it is not the same as the C<sv> argument, the reference
	808	count of the C<obj> object is incremented. If it is the same, or if
	809	the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is
	810	merely stored, without the reference count being incremented.
	811
	812	There is also a function to add magic to an C<HV>:
	813
    void hv_magic(HV *hv, GV *gv, int how);
	815
	816	This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.
	817
	818	To remove the magic from an SV, call the function sv_unmagic:
	819
	820	void sv_unmagic(SV *sv, int type);
	821
	822	The C<type> argument should be equal to the C<how> value when the C<SV>
	823	was initially made magical.
	824
	825	=head2 Magic Virtual Tables
	826
The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a
C<MGVTBL> ("Magic Virtual Table"), which is a structure of function
pointers that handle the various operations that might be applied to
that variable.
	831
	832	The C<MGVTBL> has five pointers to the following routine types:
	833
    int  (*svt_get)(SV* sv, MAGIC* mg);
    int  (*svt_set)(SV* sv, MAGIC* mg);
    U32  (*svt_len)(SV* sv, MAGIC* mg);
    int  (*svt_clear)(SV* sv, MAGIC* mg);
    int  (*svt_free)(SV* sv, MAGIC* mg);
	839
	840	This MGVTBL structure is set at compile-time in C<perl.h> and there are
	841	currently 19 types (or 21 with overloading turned on). These different
	842	structures contain pointers to various routines that perform additional
	843	actions depending on which function is being called.
	844
    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something after the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.
	852
	853	For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
	854	to an C<mg_type> of '\0') contains:
	855
	856	{ magic_get, magic_set, magic_len, 0, 0 }
	857
	858	Thus, when an SV is determined to be magical and of type '\0', if a get
	859	operation is being performed, the routine C<magic_get> is called. All
	860	the various routines for the various magical types begin with C<magic_>.
	861	NOTE: the magic routines are not considered part of the Perl API, and may
	862	not be exported by the Perl library.
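
To make the dispatch concrete, here is a stand-alone sketch of a table of
function pointers being consulted for each entry on a magic list. It is a
toy model, not the code in the Perl core: the C<Toy*> types, the two hooks
and the two tables are hypothetical, and only the get and set slots are
modelled. A NULL slot plays the role of the zeros in C<vtbl_sv>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: simplified stand-ins for SV, MAGIC and MGVTBL. */
typedef struct toy_sv    ToySV;
typedef struct toy_magic ToyMagic;

typedef struct {
    int (*svt_get)(ToySV *sv, ToyMagic *mg);
    int (*svt_set)(ToySV *sv, ToyMagic *mg);
} ToyVtbl;

struct toy_magic {
    ToyMagic      *mg_moremagic;  /* next entry on the list */
    const ToyVtbl *mg_virtual;    /* table of hooks, may be NULL */
};

struct toy_sv {
    long      value;
    int       sets_seen;          /* bumped by the hypothetical set hook */
    ToyMagic *magic;              /* head of the magic list */
};

/* Hypothetical hooks: "get" refreshes the value, "set" counts stores. */
static int toy_get_hook(ToySV *sv, ToyMagic *mg)
{
    (void)mg;
    sv->value = 42;
    return 0;
}

static int toy_set_hook(ToySV *sv, ToyMagic *mg)
{
    (void)mg;
    sv->sets_seen++;
    return 0;
}

static const ToyVtbl toy_vtbl_both    = { toy_get_hook, toy_set_hook };
static const ToyVtbl toy_vtbl_setonly = { NULL,         toy_set_hook };

/* Run every get hook found on the SV's magic list. */
void toy_mg_get(ToySV *sv)
{
    ToyMagic *mg;
    assert(sv != NULL);
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_get)
            mg->mg_virtual->svt_get(sv, mg);
}

/* Run every set hook found on the SV's magic list. */
void toy_mg_set(ToySV *sv)
{
    ToyMagic *mg;
    assert(sv != NULL);
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_virtual && mg->mg_virtual->svt_set)
            mg->mg_virtual->svt_set(sv, mg);
}
```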
	863
	864	The current kinds of Magic Virtual Tables are:
	865
    mg_type  MGVTBL              Type of magic
    -------  ------              ----------------------------
    \0       vtbl_sv             Special scalar variable
    A        vtbl_amagic         %OVERLOAD hash
    a        vtbl_amagicelem     %OVERLOAD hash element
    c        (none)              Holds overload table (AMT) on stash
    B        vtbl_bm             Boyer-Moore (fast string search)
    D        vtbl_regdata        Regex match position data (@+ and @- vars)
    d        vtbl_regdatum       Regex match position data element
    E        vtbl_env            %ENV hash
    e        vtbl_envelem        %ENV hash element
    f        vtbl_fm             Formline ('compiled' format)
    g        vtbl_mglob          m//g target / study()ed string
    I        vtbl_isa            @ISA array
    i        vtbl_isaelem        @ISA array element
    k        vtbl_nkeys          scalar(keys()) lvalue
    L        (none)              Debugger %_<filename
    l        vtbl_dbline         Debugger %_<filename element
    o        vtbl_collxfrm       Locale transformation
    P        vtbl_pack           Tied array or hash
    p        vtbl_packelem       Tied array or hash element
    q        vtbl_packelem       Tied scalar or handle
    S        vtbl_sig            %SIG hash
    s        vtbl_sigelem        %SIG hash element
    t        vtbl_taint          Taintedness
    U        vtbl_uvar           Available for use by extensions
    v        vtbl_vec            vec() lvalue
    x        vtbl_substr         substr() lvalue
    y        vtbl_defelem        Shadow "foreach" iterator variable /
                                  smart parameter vivification
    *        vtbl_glob           GV (typeglob)
    #        vtbl_arylen         Array length ($#ary)
    .        vtbl_pos            pos() lvalue
    ~        (none)              Available for use by extensions
	900
	901	When an uppercase and lowercase letter both exist in the table, then the
	902	uppercase letter is used to represent some kind of composite type (a list
	903	or a hash), and the lowercase letter is used to represent an element of
	904	that composite type.
	905
	906	The '~' and 'U' magic types are defined specifically for use by
	907	extensions and will not be used by perl itself. Extensions can use
	908	'~' magic to 'attach' private information to variables (typically
	909	objects). This is especially useful because there is no way for
	910	normal perl code to corrupt this private information (unlike using
	911	extra elements of a hash object).
	912
	913	Similarly, 'U' magic can be used much like tie() to call a C function
	914	any time a scalar's value is used or changed. The C<MAGIC>'s
	915	C<mg_ptr> field points to a C<ufuncs> structure:
	916
    struct ufuncs {
        I32 (*uf_val)(IV, SV*);
        I32 (*uf_set)(IV, SV*);
        IV uf_index;
    };
	922
	923	When the SV is read from or written to, the C<uf_val> or C<uf_set>
	924	function will be called with C<uf_index> as the first arg and a
	925	pointer to the SV as the second. A simple example of how to add 'U'
	926	magic is shown below. Note that the ufuncs structure is copied by
	927	sv_magic, so you can safely allocate it on the stack.
	928
    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));
	939
	940	Note that because multiple extensions may be using '~' or 'U' magic,
	941	it is important for extensions to take extra care to avoid conflict.
	942	Typically only using the magic on objects blessed into the same class
	943	as the extension is sufficient. For '~' magic, it may also be
	944	appropriate to add an I32 'signature' at the top of the private data
	945	area and check that.
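
The signature idea can be sketched in plain C as follows. Everything here is
an assumption made for illustration: C<MyExtData>, C<MYEXT_SIGNATURE> and
C<myext_check> are hypothetical names, and C<int32_t> stands in for C<I32>.
In a real extension the pointer and length would presumably come from the
C<mg_ptr> and C<mg_len> fields of the '~' magic entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature value chosen by the extension ("MYEX" in ASCII). */
#define MYEXT_SIGNATURE ((int32_t)0x4D594558)

/* Private data an extension might attach to a variable. The signature
 * is the first member so it can be checked before anything else. */
typedef struct {
    int32_t signature;
    int     handle;
} MyExtData;

/* Return the data if it carries this extension's signature, else NULL.
 * A buffer shorter than the structure is rejected without being read. */
MyExtData *myext_check(void *ptr, size_t len)
{
    MyExtData *data = ptr;
    if (!ptr || len < sizeof(MyExtData))
        return NULL;
    return data->signature == MYEXT_SIGNATURE ? data : NULL;
}
```

A check like this only guards against accidental collisions between
cooperating extensions; it is not a defence against deliberately forged
data.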
	946
Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
	954	See L<perlapi> for a description of these functions.
	955	For example, calls to the C<sv_cat*()> functions typically need to be
	956	followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
	957	since their implementation handles 'get' magic.
	958
	959	=head2 Finding Magic
	960
    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
	962
	963	This routine returns a pointer to the C<MAGIC> structure stored in the SV.
	964	If the SV does not have that magical feature, C<NULL> is returned. Also,
	965	if the SV is not of type SVt_PVMG, Perl may core dump.
	966
	967	int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
	968
	969	This routine checks to see what types of magic C<sv> has. If the mg_type
	970	field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
	971	the mg_type field is changed to be the lowercase letter.
	972
	973	=head2 Understanding the Magic of Tied Hashes and Arrays
	974
	975	Tied hashes and arrays are magical beasts of the 'P' magic type.
	976
	977	WARNING: As of the 5.004 release, proper usage of the array and hash
	978	access functions requires understanding a few caveats. Some
	979	of these caveats are actually considered bugs in the API, to be fixed
	980	in later releases, and are bracketed with [MAYCHANGE] below. If
	981	you find yourself actually applying such information in this section, be
	982	aware that the behavior may change in the future, umm, without warning.
	983
	984	The perl tie function associates a variable with an object that implements
	985	the various GET, SET etc methods. To perform the equivalent of the perl
	986	tie function from an XSUB, you must mimic this behaviour. The code below
	987	carries out the necessary steps - firstly it creates a new hash, and then
	988	creates a second hash which it blesses into the class which will implement
	989	the tie methods. Lastly it ties the two hashes together, and returns a
	990	reference to the new tied hash. Note that the code below does NOT call the
	991	TIEHASH method in the MyTie class -
	992	see L<Calling Perl Routines from within C Programs> for details on how
	993	to do this.
	994
    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", TRUE);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, 'P');
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL
	1010
	1011	The C<av_store> function, when given a tied array argument, merely
	1012	copies the magic of the array onto the value to be "stored", using
	1013	C<mg_copy>. It may also return NULL, indicating that the value did not
	1014	actually need to be stored in the array. [MAYCHANGE] After a call to
	1015	C<av_store> on a tied array, the caller will usually need to call
	1016	C<mg_set(val)> to actually invoke the perl level "STORE" method on the
	1017	TIEARRAY object. If C<av_store> did return NULL, a call to
	1018	C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
	1019	leak. [/MAYCHANGE]
	1020
	1021	The previous paragraph is applicable verbatim to tied hash access using the
	1022	C<hv_store> and C<hv_store_ent> functions as well.
	1023
	1024	C<av_fetch> and the corresponding hash functions C<hv_fetch> and
	1025	C<hv_fetch_ent> actually return an undefined mortal value whose magic
	1026	has been initialized using C<mg_copy>. Note the value so returned does not
	1027	need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
	1028	need to call C<mg_get()> on the returned value in order to actually invoke
	1029	the perl level "FETCH" method on the underlying TIE object. Similarly,
	1030	you may also call C<mg_set()> on the return value after possibly assigning
	1031	a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
	1032	method on the TIE object. [/MAYCHANGE]
	1033
	1034	[MAYCHANGE]
	1035	In other words, the array or hash fetch/store functions don't really
	1036	fetch and store actual values in the case of tied arrays and hashes. They
	1037	merely call C<mg_copy> to attach magic to the values that were meant to be
	1038	"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
	1039	do the job of invoking the TIE methods on the underlying objects. Thus
	1040	the magic mechanism currently implements a kind of lazy access to arrays
	1041	and hashes.
	1042
	1043	Currently (as of perl version 5.004), use of the hash and array access
	1044	functions requires the user to be aware of whether they are operating on
	1045	"normal" hashes and arrays, or on their tied variants. The API may be
	1046	changed to provide more transparent access to both tied and normal data
	1047	types in future versions.
	1048	[/MAYCHANGE]
	1049
	1050	You would do well to understand that the TIEARRAY and TIEHASH interfaces
	1051	are mere sugar to invoke some perl method calls while using the uniform hash
	1052	and array syntax. The use of this sugar imposes some overhead (typically
	1053	about two to four extra opcodes per FETCH/STORE operation, in addition to
	1054	the creation of all the mortal variables required to invoke the methods).
	1055	This overhead will be comparatively small if the TIE methods are themselves
	1056	substantial, but if they are only a few statements long, the overhead
	1057	will not be insignificant.
	1058
	1059	=head2 Localizing changes
	1060
	1061	Perl has a very handy construction
	1062
	1063	{
	1064	local $var = 2;
	1065	...
	1066	}
	1067
	1068	This construction is I<approximately> equivalent to
	1069
	1070	{
	1071	my $oldvar = $var;
	1072	$var = 2;
	1073	...
	1074	$var = $oldvar;
	1075	}
	1076
	1077	The biggest difference is that the first construction would
	1078	reinstate the initial value of $var, irrespective of how control exits
	1079	the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit
	1080	more efficient as well.
	1081
	1082	There is a way to achieve a similar task from C via Perl API: create a
	1083	I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
	1085	die()). A I<block>-like construct is created by a pair of
	1086	C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
	1087	Such a construct may be created specially for some important localized
	1088	task, or an existing one (like boundaries of enclosing Perl
	1089	subroutine/block, or an existing pair for freeing TMPs) may be
	1090	used. (In the second case the overhead of additional localization must
	1091	be almost negligible.) Note that any XSUB is automatically enclosed in
	1092	an C<ENTER>/C<LEAVE> pair.
	1093
Inside such a I<pseudo-block> the following services are available:
	1095
	1096	=over 4
	1097
	1098	=item C<SAVEINT(int i)>
	1099
	1100	=item C<SAVEIV(IV i)>
	1101
	1102	=item C<SAVEI32(I32 i)>
	1103
	1104	=item C<SAVELONG(long i)>
	1105
	1106	These macros arrange things to restore the value of integer variable
	1107	C<i> at the end of enclosing I<pseudo-block>.
	1108
	1109	=item C<SAVESPTR(s)>
	1110
	1111	=item C<SAVEPPTR(p)>
	1112
These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.
	1117
	1118	=item C<SAVEFREESV(SV *sv)>
	1119
	1120	The refcount of C<sv> would be decremented at the end of
	1121	I<pseudo-block>. This is similar to C<sv_2mortal>, which should (?) be
	1122	used instead.
	1123
	1124	=item C<SAVEFREEOP(OP *op)>
	1125
	1126	The C<OP *> is op_free()ed at the end of I<pseudo-block>.
	1127
	1128	=item C<SAVEFREEPV(p)>
	1129
	1130	The chunk of memory which is pointed to by C<p> is Safefree()ed at the
	1131	end of I<pseudo-block>.
	1132
	1133	=item C<SAVECLEARSV(SV *sv)>
	1134
	1135	Clears a slot in the current scratchpad which corresponds to C<sv> at
	1136	the end of I<pseudo-block>.
	1137
=item C<SAVEDELETE(HV *hv, char *key, I32 length)>
	1139
	1140	The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
	1141	string pointed to by C<key> is Safefree()ed. If one has a I<key> in
	1142	short-lived storage, the corresponding string may be reallocated like
	1143	this:
	1144
	1145	SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
	1146
	1147	=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
	1148
	1149	At the end of I<pseudo-block> the function C<f> is called with the
	1150	only argument C<p>.
	1151
	1152	=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
	1153
	1154	At the end of I<pseudo-block> the function C<f> is called with the
	1155	implicit context argument (if any), and C<p>.
	1156
	1157	=item C<SAVESTACK_POS()>
	1158
	1159	The current offset on the Perl internal stack (cf. C<SP>) is restored
	1160	at the end of I<pseudo-block>.
	1161
	1162	=back
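
The mechanism behind these macros can be pictured as a stack of "undo"
records that is unwound when the pseudo-block ends. The following is a
minimal stand-alone sketch of that idea for integers only. It is not Perl's
save stack: the C<toy_*> names and the fixed-size arrays are assumptions
made for the illustration.

```c
#include <assert.h>

/* Toy model only: a tiny save stack in the spirit of ENTER/SAVEINT/LEAVE. */
#define TOY_SAVE_MAX 32

typedef struct {
    int *addr;  /* where the saved variable lives */
    int  old;   /* the value it had when it was saved */
} ToySave;

static ToySave toy_savestack[TOY_SAVE_MAX];
static int     toy_save_ix = 0;          /* next free save slot */
static int     toy_scope[TOY_SAVE_MAX];  /* save stack height at each enter */
static int     toy_scope_ix = 0;

/* Open a pseudo-block: remember how tall the save stack is now. */
void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scope[toy_scope_ix++] = toy_save_ix;
}

/* Record the current value of *p so that toy_leave() can put it back. */
void toy_saveint(int *p)
{
    assert(p != 0 && toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Close the pseudo-block: undo its saves, most recent first. */
void toy_leave(void)
{
    int base;
    assert(toy_scope_ix > 0);
    base = toy_scope[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}
```

Nested pseudo-blocks work because each leave unwinds only as far as the
height recorded by its matching enter.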
	1163
	1164	The following API list contains functions, thus one needs to
	1165	provide pointers to the modifiable data explicitly (either C pointers,
	1166	or Perlish C<GV *>s). Where the above macros take C<int>, a similar
	1167	function takes C<int *>.
	1168
	1169	=over 4
	1170
	1171	=item C<SV* save_scalar(GV *gv)>
	1172
	1173	Equivalent to Perl code C<local $gv>.
	1174
	1175	=item C<AV* save_ary(GV *gv)>
	1176
	1177	=item C<HV* save_hash(GV *gv)>
	1178
	1179	Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
	1180
	1181	=item C<void save_item(SV *item)>
	1182
Duplicates the current value of C<SV>. On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
using the stored value.
	1186
	1187	=item C<void save_list(SV **sarg, I32 maxsarg)>
	1188
	1189	A variant of C<save_item> which takes multiple arguments via an array
	1190	C<sarg> of C<SV*> of length C<maxsarg>.
	1191
	1192	=item C<SV* save_svref(SV **sptr)>
	1193
	1194	Similar to C<save_scalar>, but will reinstate a C<SV *>.
	1195
	1196	=item C<void save_aptr(AV **aptr)>
	1197
	1198	=item C<void save_hptr(HV **hptr)>
	1199
Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
	1201
	1202	=back
	1203
	1204	The C<Alias> module implements localization of the basic types within the
	1205	I<caller's scope>. People who are interested in how to localize things in
	1206	the containing scope should take a look there too.
	1207
	1208	=head1 Subroutines
	1209
	1210	=head2 XSUBs and the Argument Stack
	1211
	1212	The XSUB mechanism is a simple way for Perl programs to access C subroutines.
	1213	An XSUB routine will have a stack that contains the arguments from the Perl
	1214	program, and a way to map from the Perl data structures to a C equivalent.
	1215
	1216	The stack arguments are accessible through the C<ST(n)> macro, which returns
	1217	the C<n>'th stack argument. Argument 0 is the first argument passed in the
	1218	Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
	1219	an C<SV*> is used.
	1220
	1221	Most of the time, output from the C routine can be handled through use of
	1222	the RETVAL and OUTPUT directives. However, there are some cases where the
	1223	argument stack is not already long enough to handle all the return values.
	1224	An example is the POSIX tzname() call, which takes no arguments, but returns
	1225	two, the local time zone's standard and summer time abbreviations.
	1226
	1227	To handle this situation, the PPCODE directive is used and the stack is
	1228	extended using the macro:
	1229
	1230	EXTEND(SP, num);
	1231
	1232	where C<SP> is the macro that represents the local copy of the stack pointer,
	1233	and C<num> is the number of elements the stack should be extended by.
	1234
	1235	Now that there is room on the stack, values can be pushed on it using the
	1236	macros to push IVs, doubles, strings, and SV pointers respectively:
	1237
	1238	PUSHi(IV)
	1239	PUSHn(double)
	1240	PUSHp(char*, I32)
	1241	PUSHs(SV*)
	1242
In the Perl program calling C<tzname>, the two values can then be assigned
as in:
	1245
	1246	($standard_abbrev, $summer_abbrev) = POSIX::tzname;
	1247
	1248	An alternate (and possibly simpler) method to pushing values on the stack is
	1249	to use the macros:
	1250
	1251	XPUSHi(IV)
	1252	XPUSHn(double)
	1253	XPUSHp(char*, I32)
	1254	XPUSHs(SV*)
	1255
	1256	These macros automatically adjust the stack for you, if needed. Thus, you
	1257	do not need to call C<EXTEND> to extend the stack.
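
The difference between the two families of macros can be sketched with a
growable array standing in for the argument stack. This is a toy model with
hypothetical names (C<ToyStack>, C<toy_extend>, C<toy_push>, C<toy_xpush>);
it holds plain C<long> values rather than C<SV*> and says nothing about how
Perl actually grows its stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a growable stack of longs. */
typedef struct {
    long  *base;   /* start of the allocation */
    size_t used;   /* elements pushed so far */
    size_t cap;    /* elements allocated */
} ToyStack;

/* Like EXTEND: make room for n more elements. Returns 0 on failure. */
int toy_extend(ToyStack *st, size_t n)
{
    if (st->cap - st->used < n) {
        size_t newcap = st->used + n;
        long *grown = realloc(st->base, newcap * sizeof *grown);
        if (!grown)
            return 0;
        st->base = grown;
        st->cap = newcap;
    }
    return 1;
}

/* Like PUSHi: the caller must already have made room. */
void toy_push(ToyStack *st, long value)
{
    assert(st->used < st->cap);
    st->base[st->used++] = value;
}

/* Like XPUSHi: extend by one if needed, then push. Returns 0 on failure. */
int toy_xpush(ToyStack *st, long value)
{
    if (!toy_extend(st, 1))
        return 0;
    toy_push(st, value);
    return 1;
}
```

Calling C<toy_push> without a prior C<toy_extend> trips the assertion, which
mirrors why the C<PUSH*> macros need an C<EXTEND> first while the C<XPUSH*>
macros do not.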
	1258
	1259	For more information, consult L<perlxs> and L<perlxstut>.
	1260
	1261	=head2 Calling Perl Routines from within C Programs
	1262
	1263	There are four routines that can be used to call a Perl subroutine from
	1264	within a C program. These four are:
	1265
	1266	I32 call_sv(SV*, I32);
	1267	I32 call_pv(const char*, I32);
	1268	I32 call_method(const char*, I32);
    I32 call_argv(const char*, I32, register char**);
	1270
	1271	The routine most often used is C<call_sv>. The C<SV*> argument
	1272	contains either the name of the Perl subroutine to be called, or a
	1273	reference to the subroutine. The second argument consists of flags
	1274	that control the context in which the subroutine is called, whether
	1275	or not the subroutine is being passed arguments, how errors should be
	1276	trapped, and how to treat return values.
	1277
All four routines return the number of values that the subroutine returned
on the Perl stack.
	1280
	1281	These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0,
	1282	but those names are now deprecated; macros of the same name are provided for
	1283	compatibility.
	1284
When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack, using the following macros and
functions:
	1288
	1289	dSP
	1290	SP
	1291	PUSHMARK()
	1292	PUTBACK
	1293	SPAGAIN
	1294	ENTER
	1295	SAVETMPS
	1296	FREETMPS
	1297	LEAVE
	1298	XPUSH*()
	1299	POP*()
	1300
	1301	For a detailed description of calling conventions from C to Perl,
	1302	consult L<perlcall>.
	1303
	1304	=head2 Memory Allocation
	1305
	1306	All memory meant to be used with the Perl API functions should be manipulated
	1307	using the macros described in this section. The macros provide the necessary
	1308	transparency between differences in the actual malloc implementation that is
	1309	used within perl.
	1310
	1311	It is suggested that you enable the version of malloc that is distributed
	1312	with Perl. It keeps pools of various sizes of unallocated memory in
	1313	order to satisfy allocation requests more quickly. However, on some
	1314	platforms, it may cause spurious malloc or free errors.
	1315
	1316	New(x, pointer, number, type);
	1317	Newc(x, pointer, number, type, cast);
	1318	Newz(x, pointer, number, type);
	1319
	1320	These three macros are used to initially allocate memory.
	1321
	1322	The first argument C<x> was a "magic cookie" that was used to keep track
	1323	of who called the macro, to help when debugging memory problems. However,
	1324	the current code makes no use of this feature (most Perl developers now
	1325	use run-time memory checkers), so this argument can be any number.
	1326
	1327	The second argument C<pointer> should be the name of a variable that will
	1328	point to the newly allocated memory.
	1329
	1330	The third and fourth arguments C<number> and C<type> specify how many of
	1331	the specified type of data structure should be allocated. The argument
	1332	C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>,
	1333	should be used if the C<pointer> argument is different from the C<type>
	1334	argument.
	1335
	1336	Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
	1337	to zero out all the newly allocated memory.
	1338
	1339	Renew(pointer, number, type);
	1340	Renewc(pointer, number, type, cast);
	1341	Safefree(pointer)
	1342
	1343	These three macros are used to change a memory buffer size or to free a
	1344	piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
	1345	match those of C<New> and C<Newc> with the exception of not needing the
	1346	"magic cookie" argument.
	1347
	1348	Move(source, dest, number, type);
	1349	Copy(source, dest, number, type);
	1350	Zero(dest, number, type);
	1351
	1352	These three macros are used to move, copy, or zero out previously allocated
	1353	memory. The C<source> and C<dest> arguments point to the source and
	1354	destination starting points. Perl will move, copy, or zero out C<number>
	1355	instances of the size of the C<type> data structure (using the C<sizeof>
	1356	function).
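
For readers who think in terms of the C library, the macros in this section
can be approximated by the stand-ins below. This is a rough sketch, not the
definitions used by Perl: the C<TOY_*> names are hypothetical, the "magic
cookie" argument is dropped, and the real macros go through whichever malloc
perl was built with. The one detail worth noticing is that the move
operation tolerates overlapping regions while the copy operation does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins built directly on the C library. */
#define TOY_NEW(p, n, type)      ((p) = (type *)malloc((n) * sizeof(type)))
#define TOY_NEWZ(p, n, type)     ((p) = (type *)calloc((n), sizeof(type)))
#define TOY_MOVE(s, d, n, type)  memmove((d), (s), (n) * sizeof(type))
#define TOY_COPY(s, d, n, type)  memcpy((d), (s), (n) * sizeof(type))
#define TOY_ZERO(d, n, type)     memset((d), 0, (n) * sizeof(type))
#define TOY_SAFEFREE(p)          free(p)

/* Return a buffer of n + 1 ints holding 0, 1, ..., n, built by filling
 * 1..n at the front and shifting it right by one element.
 * Returns NULL if allocation fails; the caller frees the result. */
int *toy_shifted(size_t n)
{
    int *buf;
    size_t i;

    TOY_NEWZ(buf, n + 1, int);          /* n + 1 zeroed ints */
    if (!buf)
        return NULL;
    for (i = 0; i < n; i++)
        buf[i] = (int)i + 1;            /* 1..n, last slot still 0 */
    TOY_MOVE(buf, buf + 1, n, int);     /* overlapping shift: needs "move" */
    TOY_ZERO(buf, 1, int);              /* clear the vacated first slot */
    assert(buf[0] == 0);
    return buf;
}
```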
	1357
	1358	=head2 PerlIO
	1359
The most recent development releases of Perl have been experimenting with
	1361	removing Perl's dependency on the "normal" standard I/O suite and allowing
	1362	other stdio implementations to be used. This involves creating a new
	1363	abstraction layer that then calls whichever implementation of stdio Perl
	1364	was compiled with. All XSUBs should now use the functions in the PerlIO
	1365	abstraction layer and not make any assumptions about what kind of stdio
	1366	is being used.
	1367
	1368	For a complete description of the PerlIO abstraction, consult L<perlapio>.
	1369
	1370	=head2 Putting a C value on Perl stack
	1371
A lot of opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack. However, as an optimization
	1374	the corresponding SV is (usually) not recreated each time. The opcodes
	1375	reuse specially assigned SVs (I<target>s) which are (as a corollary)
	1376	not constantly freed/created.
	1377
	1378	Each of the targets is created only once (but see
	1379	L<Scratchpads and recursion> below), and when an opcode needs to put
	1380	an integer, a double, or a string on stack, it just sets the
	1381	corresponding parts of its I<target> and puts the I<target> on stack.
	1382
	1383	The macro to put this target on stack is C<PUSHTARG>, and it is
	1384	directly used in some opcodes, as well as indirectly in zillions of
	1385	others, which use it via C<(X)PUSH[pni]>.
	1386
	1387	=head2 Scratchpads
	1388
	1389	The question remains on when the SVs which are I<target>s for opcodes
	1390	are created. The answer is that they are created when the current unit --
	1391	a subroutine or a file (for opcodes for statements outside of
	1392	subroutines) -- is compiled. During this time a special anonymous Perl
	1393	array is created, which is called a scratchpad for the current
	1394	unit.
	1395
	1396	A scratchpad keeps SVs which are lexicals for the current unit and are
	1397	targets for opcodes. One can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
	1399	I<target>s have C<SVs_PADTMP> set.
	1400
	1401	The correspondence between OPs and I<target>s is not 1-to-1. Different
	1402	OPs in the compile tree of the unit can use the same target, if this
	1403	would not conflict with the expected life of the temporary.
	1404
	1405	=head2 Scratchpads and recursion
	1406
Strictly speaking, a compiled unit does not contain a pointer to the
scratchpad AV itself. It contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?
	1411
The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both
of these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (whose lifespan covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)
	1418
	1419	So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.
	1423
	1424	The I<target>s on this scratchpad are C<undef>s, but they are already
	1425	marked with correct flags.
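
The depth check can be sketched as follows. This is a toy model with
hypothetical names (C<ToySub>, C<ToyPad>, C<toy_sub_enter> and friends); a
pad here is just a few C<long> slots. Unlike the description above, the toy
starts with an empty array and creates even the first pad on demand, which
keeps the code short without changing the idea.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAD_SLOTS 4

/* Toy model only: a scratchpad is a handful of slots. */
typedef struct {
    long slot[TOY_PAD_SLOTS];
} ToyPad;

typedef struct {
    ToyPad **pads;   /* the array of scratchpads */
    int      npads;  /* its current length */
    int      depth;  /* current recursion depth, 0 when not running */
} ToySub;

/* Enter the subroutine: create a pad if this depth has none yet, then
 * return the pad for the new depth. Returns NULL if allocation fails. */
ToyPad *toy_sub_enter(ToySub *sub)
{
    if (sub->depth + 1 > sub->npads) {
        ToyPad **grown;
        ToyPad  *pad;

        grown = realloc(sub->pads, (size_t)(sub->npads + 1) * sizeof *grown);
        if (!grown)
            return NULL;
        sub->pads = grown;
        pad = calloc(1, sizeof *pad);   /* fresh pad, all slots zero */
        if (!pad)
            return NULL;
        sub->pads[sub->npads++] = pad;
    }
    sub->depth++;
    return sub->pads[sub->depth - 1];
}

/* Leave the subroutine; its pad is kept for the next call at this depth. */
void toy_sub_leave(ToySub *sub)
{
    assert(sub->depth > 0);
    sub->depth--;
}

/* Release every pad and the array itself. */
void toy_sub_free(ToySub *sub)
{
    int i;
    for (i = 0; i < sub->npads; i++)
        free(sub->pads[i]);
    free(sub->pads);
    sub->pads = NULL;
    sub->npads = sub->depth = 0;
}
```

Because pads are kept after the call returns, a later call at the same depth
reuses the existing pad instead of allocating a new one.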
	1426
	1427	=head1 Compiled code
	1428
	1429	=head2 Code tree
	1430
	1431	Here we describe the internal form your code is converted to by
	1432	Perl. Start with a simple example:
	1433
	1434	$a = $b + $c;
	1435
	1436	This is converted to a tree similar to this one:
	1437
             assign-to
           /           \
          +             $a
        /   \
      $b     $c
	1443
	1444	(but slightly more complicated). This tree reflects the way Perl
	1445	parsed your code, but has nothing to do with the execution order.
	1446	There is an additional "thread" going through the nodes of the tree
	1447	which shows the order of execution of the nodes. In our simplified
	1448	example above it looks like:
	1449
	1450	$b ---> $c ---> + ---> $a ---> assign-to
	1451
	1452	But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
	1454	contains more nodes than our simplified example, the execution order
	1455	is the same as in our example.
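
The distinction between tree structure and execution order can be modelled
with nodes that carry both kinds of link. The sketch below is an assumption
for illustration, not the compiler's own code: C<ToyOp> and the C<toy_*>
functions are hypothetical, and the thread is built by a plain postorder
walk (children left to right, then the parent), which reproduces the order
shown for the simplified example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: tree links plus a separate execution link. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* leftmost child in the tree */
    struct toy_op *sibling;  /* next child of the same parent */
    struct toy_op *next;     /* next op to execute */
} ToyOp;

/* Link the subtree rooted at o in postorder and return the first op to
 * run. *tail is set to the last op of the subtree, which is o itself. */
ToyOp *toy_thread(ToyOp *o, ToyOp **tail)
{
    ToyOp *start = NULL, *prev = NULL, *kid;

    assert(o != NULL && tail != NULL);
    for (kid = o->first; kid; kid = kid->sibling) {
        ToyOp *kid_tail;
        ToyOp *kid_start = toy_thread(kid, &kid_tail);
        if (prev)
            prev->next = kid_start;  /* chain after the previous child */
        else
            start = kid_start;       /* first child starts the thread */
        prev = kid_tail;
    }
    if (prev)
        prev->next = o;              /* parent runs after its children */
    else
        start = o;                   /* a leaf is its own start */
    o->next = NULL;
    *tail = o;
    return start;
}

/* Follow the thread, storing up to max names in out; returns the count. */
int toy_run_order(const ToyOp *start, const char **out, int max)
{
    int n = 0;
    for (; start && n < max; start = start->next)
        out[n++] = start->name;
    return n;
}
```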
	1456
	1457	=head2 Examining the tree
	1458
	1459	If you have your perl compiled for debugging (usually done with C<-D
	1460	optimize=-g> on C<Configure> command line), you may examine the
	1461	compiled tree by specifying C<-Dx> on the Perl command line. The
	1462	output takes several lines per node, and for C<$b+$c> it looks like
	1463	this:
	1464
    5  TYPE = add  ===> 6
       TARG = 1
       FLAGS = (SCALAR,KIDS)
       {
           TYPE = null  ===> (4)
             (was rv2sv)
           FLAGS = (SCALAR,KIDS)
           {
    3          TYPE = gvsv  ===> 4
               FLAGS = (SCALAR)
               GV = main::b
           }
       }
       {
           TYPE = null  ===> (5)
             (was rv2sv)
           FLAGS = (SCALAR,KIDS)
           {
    4          TYPE = gvsv  ===> 5
               FLAGS = (SCALAR)
               GV = main::c
           }
       }
	1488
This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:
	1493
                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv
	1499
The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.
	1503
	1504	Each of these nodes represents an op, a fundamental operation inside the
	1505	Perl core. The code which implements each operation can be found in the
	1506	F<pp*.c> files; the function which implements the op with type C<gvsv>
	1507	is C<pp_gvsv>, and so on. As the tree above shows, different ops have
	1508	different numbers of children: C<add> is a binary operator, as one would
	1509	expect, and so has two children. To accommodate the various different
	1510	numbers of children, there are various types of op data structure, and
	1511	they link together in different ways.
	1512
	1513	The simplest type of op structure is C<OP>: this has no children. Unary
	1514	operators, C<UNOP>s, have one child, and this is pointed to by the
	1515	C<op_first> field. Binary operators (C<BINOP>s) have not only an
	1516	C<op_first> field but also an C<op_last> field. The most complex type of
	1517	op is a C<LISTOP>, which has any number of children. In this case, the
	1518	first child is pointed to by C<op_first> and the last child by
	1519	C<op_last>. The children in between can be found by iteratively
	1520	following the C<op_sibling> pointer from the first child to the last.
	1521
	1522	There are also two other op types: a C<PMOP> holds a regular expression,
	1523	and has no children, and a C<LOOP> may or may not have children. If the
	1524	C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
	1525	complicate matters, if a C<UNOP> is actually a C<null> op after
	1526	optimization (see L</Compile pass 2: context propagation>) it will still
	1527	have children in accordance with its former type.
	1528
	1529	=head2 Compile pass 1: check routines
	1530
	1531	The tree is created by the compiler while I<yacc> code feeds it
	1532	the constructions it recognizes. Since I<yacc> works bottom-up, so does
	1533	the first pass of perl compilation.
	1534
	1535	What makes this pass interesting for perl developers is that some
	1536	optimization may be performed on this pass. This is optimization by
	1537	so-called "check routines". The correspondence between node names
	1538	and corresponding check routines is described in F<opcode.pl> (do not
	1539	forget to run C<make regen_headers> if you modify this file).
	1540
	1541	A check routine is called when the node is fully constructed except
	1542	for the execution-order thread. Since at this time there are no
	1543	back-links to the currently constructed node, one can do almost any
	1544	operation to the top-level node, including freeing it and/or creating
	1545	new nodes above/below it.
	1546
	1547	The check routine returns the node which should be inserted into the
	1548	tree (if the top-level node was not modified, check routine returns
	1549	its argument).
	1550
	1551	By convention, check routines have names C<ck_*>. They are usually
	1552	called from C<new*OP> subroutines (or C<convert>) (which in turn are
	1553	called from F<perly.y>).
	1554
	1555	=head2 Compile pass 1a: constant folding
	1556
	1557	Immediately after the check routine is called the returned node is
	1558	checked for being compile-time executable. If it is (the value is
	1559	judged to be constant) it is immediately executed, and a I<constant>
	1560	node with the "return value" of the corresponding subtree is
	1561	substituted instead. The subtree is deleted.
	1562
	1563	If constant folding was not performed, the execution-order thread is
	1564	created.
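
The idea can be sketched on a toy expression tree. Everything here
(C<toy_node>, C<toy_new>, C<toy_fold>) is invented for the example, and the
only "operation" is addition; the real folding code in F<op.c> actually
runs the op and has many more cases to worry about.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } toy_kind;

typedef struct toy_node {
    toy_kind         kind;
    long             value;     /* used when kind == TOY_CONST */
    struct toy_node *left;      /* used when kind == TOY_ADD */
    struct toy_node *right;
} toy_node;

toy_node *
toy_new(toy_kind kind, long value, toy_node *left, toy_node *right)
{
    toy_node *n = malloc(sizeof *n);

    if (n == NULL)
        abort();
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Called on a freshly built node whose children were already processed
 * (bottom-up).  Returns the node that should go into the tree. */
toy_node *
toy_fold(toy_node *n)
{
    assert(n != NULL);
    if (n->kind == TOY_ADD
        && n->left->kind == TOY_CONST && n->right->kind == TOY_CONST)
    {
        toy_node *folded = toy_new(TOY_CONST,
                                   n->left->value + n->right->value,
                                   NULL, NULL);
        free(n->left);          /* the old subtree is deleted */
        free(n->right);
        free(n);
        return folded;
    }
    return n;                   /* not constant: hand it back unchanged */
}
```

Folding an addition of the constants 2 and 3 hands back a single constant
node holding 5; an addition involving a variable comes back untouched.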
	1565
	1566	=head2 Compile pass 2: context propagation
	1567
	1568	When a context for a part of compile tree is known, it is propagated
	1569	down through the tree. At this time the context can have 5 values
	1570	(instead of 2 for runtime context): void, boolean, scalar, list, and
	1571	lvalue. In contrast with pass 1, this pass is processed from top
	1572	to bottom: a node's context determines the context for its children.
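
A minimal sketch of the top-down direction, under heavy simplification: the
C<toy_op> type, the C<scalar_kids> field and the rule "children inherit the
parent's context unless the parent forces scalar" are all assumptions made
up for this example. The real rules are per-op and considerably hairier.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_CTX_UNKNOWN, TOY_CTX_VOID, TOY_CTX_SCALAR, TOY_CTX_LIST
} toy_ctx;

typedef struct toy_op {
    toy_ctx        ctx;          /* filled in by toy_propagate() */
    int            scalar_kids;  /* nonzero: children always get scalar */
    struct toy_op *op_first;     /* first child, or NULL */
    struct toy_op *op_sibling;   /* next child of the same parent */
} toy_op;

/* Record the context on this node, then push one down to each child. */
void
toy_propagate(toy_op *o, toy_ctx ctx)
{
    toy_op *kid;

    assert(o != NULL && ctx != TOY_CTX_UNKNOWN);
    o->ctx = ctx;
    for (kid = o->op_first; kid != NULL; kid = kid->op_sibling)
        toy_propagate(kid, o->scalar_kids ? TOY_CTX_SCALAR : ctx);
}
```

Note that the parent is settled before any child is visited, which is the
opposite order from the bottom-up construction in pass 1.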
	1573
	1574	Additional context-dependent optimizations are performed at this time.
	1575	Since at this moment the compile tree contains back-references (via
	1576	"thread" pointers), nodes cannot be free()d now. To allow
	1577	optimized-away nodes at this stage, such nodes are null()ified instead
	1578	of free()ing (i.e. their type is changed to OP_NULL).
	1579
	1580	=head2 Compile pass 3: peephole optimization
	1581
	1582	After the compile tree for a subroutine (or for an C<eval> or a file)
	1583	is created, an additional pass over the code is performed. This pass
	1584	is neither top-down nor bottom-up, but in the execution order (with
	1585	additional complications for conditionals). These optimizations are
	1586	done in the subroutine peep(). Optimizations performed at this stage
	1587	are subject to the same restrictions as in pass 2.
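
To make the "nullify, don't free" restriction concrete, here is a toy walk
in execution order. The names C<toy_op>, C<toy_op_null> and C<toy_peep> are
hypothetical, and the sketch assumes a straight-line chain of C<op_next>
pointers with no loops or branches, which real code does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_OP_NULL = 0, TOY_OP_GVSV, TOY_OP_ADD };

typedef struct toy_op {
    int            op_type;
    struct toy_op *op_next;     /* next op in execution order */
} toy_op;

/* Optimize an op away without freeing it: others may still point here. */
void
toy_op_null(toy_op *o)
{
    assert(o != NULL);
    o->op_type = TOY_OP_NULL;
}

/* Walk the execution chain and relink each live op past any nulled
 * ops that follow it.  Returns the first live op, or NULL. */
toy_op *
toy_peep(toy_op *start)
{
    toy_op *o;

    while (start != NULL && start->op_type == TOY_OP_NULL)
        start = start->op_next;
    for (o = start; o != NULL; o = o->op_next) {
        while (o->op_next != NULL && o->op_next->op_type == TOY_OP_NULL)
            o->op_next = o->op_next->op_next;
    }
    return start;
}
```

The nulled op is never touched beyond its type: its own C<op_next> stays
valid, so anything that still points at it keeps working.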
	1588
	1589	=head1 Examining internal data structures with the C<dump> functions
	1590
	1591	To aid debugging, the source file F<dump.c> contains a number of
	1592	functions which produce formatted output of internal data structures.
	1593
	1594	The most commonly used of these functions is C<Perl_sv_dump>; it's used
	1595	for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
	1596	C<sv_dump> to produce debugging output from Perl-space, so users of that
	1597	module should already be familiar with its format.
	1598
	1599	C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
	1600	derivatives, and produces output similar to C<perl -Dx>; in fact,
	1601	C<Perl_dump_eval> will dump the main root of the code being evaluated,
	1602	exactly like C<-Dx>.
	1603
	1604	Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
	1605	op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
	1606	subroutines in a package like so: (Thankfully, these are all xsubs, so
	1607	there is no op tree)
	1608
	1609	(gdb) print Perl_dump_packsubs(PL_defstash)
	1610
	1611	SUB attributes::bootstrap = (xsub 0x811fedc 0)
	1612
	1613	SUB UNIVERSAL::can = (xsub 0x811f50c 0)
	1614
	1615	SUB UNIVERSAL::isa = (xsub 0x811f304 0)
	1616
	1617	SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
	1618
	1619	SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
	1620
	1621	and C<Perl_dump_all>, which dumps all the subroutines in the stash and
	1622	the op tree of the main root.
	1623
	1624	=head1 How multiple interpreters and concurrency are supported
	1625
	1626	=head2 Background and PERL_IMPLICIT_CONTEXT
	1627
	1628	The Perl interpreter can be regarded as a closed box: it has an API
	1629	for feeding it code or otherwise making it do things, but it also has
	1630	functions for its own use. This smells a lot like an object, and
	1631	there are ways for you to build Perl so that you can have multiple
	1632	interpreters, with one interpreter represented either as a C++ object,
	1633	a C structure, or inside a thread. The thread, the C structure, or
	1634	the C++ object will contain all the context, the state of that
	1635	interpreter.
	1636
	1637	Three macros control the major Perl build flavors: MULTIPLICITY,
	1638	USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure
	1639	that packages all the interpreter state, there is a similar thread-specific
	1640	data structure under USE_THREADS, and the PERL_OBJECT build has a C++
	1641	class to maintain interpreter state. In all three cases,
	1642	PERL_IMPLICIT_CONTEXT is also normally defined, and enables the
	1643	support for passing in a "hidden" first argument that represents all three
	1644	data structures.
	1645
	1646	All this obviously requires a way for the Perl internal functions to be
	1647	C++ methods, subroutines taking some kind of structure as the first
	1648	argument, or subroutines taking nothing as the first argument. To
	1649	enable these three very different ways of building the interpreter,
	1650	the Perl source (as it does in so many other situations) makes heavy
	1651	use of macros and subroutine naming conventions.
	1652
	1653	First problem: deciding which functions will be public API functions and
	1654	which will be private. All functions whose names begin C<S_> are private
	1655	(think "S" for "secret" or "static"). All other functions begin with
	1656	"Perl_", but just because a function begins with "Perl_" does not mean it is
	1657	part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
	1658	function is part of the API is to find its entry in L<perlapi>.
	1659	If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
	1660	think it should be (i.e., you need it for your extension), send mail via
	1661	L<perlbug> explaining why you think it should be.
	1662
	1663	Second problem: there must be a syntax so that the same subroutine
	1664	declarations and calls can pass a structure as their first argument,
	1665	or pass nothing. To solve this, the subroutines are named and
	1666	declared in a particular way. Here's a typical start of a static
	1667	function used within the Perl guts:
	1668
	1669	STATIC void
	1670	S_incline(pTHX_ char *s)
	1671
	1672	STATIC becomes "static" in C, and is #define'd to nothing in C++.
	1673
	1674	A public function (i.e. part of the internal API, but not necessarily
	1675	sanctioned for use in extensions) begins like this:
	1676
	1677	void
	1678	Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)
	1679
	1680	C<pTHX_> is one of a number of macros (in perl.h) that hide the
	1681	details of the interpreter's context. THX stands for "thread", "this",
	1682	or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
	1683	The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
	1684	or 'd' for B<d>eclaration.
	1685
	1686	When Perl is built without PERL_IMPLICIT_CONTEXT, there is no first
	1687	argument containing the interpreter's context. The trailing underscore
	1688	in the pTHX_ macro indicates that the macro expansion needs a comma
	1689	after the context argument because other arguments follow it. If
	1690	PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
	1691	subroutine is not prototyped to take the extra argument. The form of the
	1692	macro without the trailing underscore is used when there are no additional
	1693	explicit arguments.
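
The trick is easier to see in a self-contained miniature. The macros below
carry a C<toy_> prefix to make it plain that they are not the definitions
from F<perl.h>; C<toy_interp>, C<TOY_IMPLICIT_CONTEXT> and C<TOY_STATE> are
likewise invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IMPLICIT_CONTEXT 1      /* set to 0 to model the other build */

typedef struct toy_interp { int sv_count; } toy_interp;

#if TOY_IMPLICIT_CONTEXT
#  define toy_pTHX   toy_interp *my_perl  /* prototype, no other args */
#  define toy_pTHX_  toy_pTHX,            /* prototype, more args follow */
#  define toy_aTHX   my_perl              /* argument, no other args */
#  define toy_aTHX_  toy_aTHX,            /* argument, more args follow */
#  define TOY_STATE  (my_perl)
#else
static toy_interp toy_global_interp;      /* one global interpreter */
#  define toy_pTHX   void
#  define toy_pTHX_
#  define toy_aTHX
#  define toy_aTHX_
#  define TOY_STATE  (&toy_global_interp)
#endif

int
toy_count(toy_pTHX)
{
    return TOY_STATE->sv_count;
}

void
toy_bump(toy_pTHX_ int by)
{
    assert(by >= 0);            /* toy invariant: the count only grows */
    TOY_STATE->sv_count += by;
}

/* A function that calls others hands the context along with aTHX. */
int
toy_bump_twice(toy_pTHX_ int by)
{
    toy_bump(toy_aTHX_ by);
    toy_bump(toy_aTHX_ by);
    return toy_count(toy_aTHX);
}
```

The function bodies are identical in both builds; only the macro
expansions change, which is the whole point of the convention.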
	1694
	1695	When a core function calls another, it must pass the context. This
	1696	is normally hidden via macros. Consider C<sv_setsv>. It expands
	1697	something like this:
	1698
	1699	#ifdef PERL_IMPLICIT_CONTEXT
	1700	#define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b)
	1701	/* can't do this for vararg functions, see below */
	1702	#else
	1703	#define sv_setsv Perl_sv_setsv
	1704	#endif
	1705
	1706	This works well, and means that XS authors can gleefully write:
	1707
	1708	sv_setsv(foo, bar);
	1709
	1710	and still have it work under all the modes Perl could have been
	1711	compiled with.
	1712
	1713	Under PERL_OBJECT in the core, that will translate to either:
	1714
	1715	CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions,
	1716	# C++ takes care of 'this'
	1717	or
	1718
	1719	pPerl->Perl_sv_setsv(foo,bar); # in truly static functions,
	1720	# see objXSUB.h
	1721
	1722	Under PERL_OBJECT in extensions (aka PERL_CAPI), or under
	1723	MULTIPLICITY/USE_THREADS w/ PERL_IMPLICIT_CONTEXT in both core
	1724	and extensions, it will be:
	1725
	1726	Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API"
	1727	# for all build flavors
	1728
	1729	This doesn't work so cleanly for varargs functions, though, as macros
	1730	imply that the number of arguments is known in advance. Instead we
	1731	either need to spell them out fully, passing C<aTHX_> as the first
	1732	argument (the Perl core tends to do this with functions like
	1733	Perl_warner), or use a context-free version.
	1734
	1735	The context-free version of Perl_warner is called
	1736	Perl_warner_nocontext, and does not take the extra argument. Instead
	1737	it does dTHX; to get the context from thread-local storage. We
	1738	C<#define warner Perl_warner_nocontext> so that extensions get source
	1739	compatibility at the expense of performance. (Passing an arg is
	1740	cheaper than grabbing it from thread-local storage.)
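
The two flavors might be pictured like this. This is only a sketch: a plain
static pointer stands in for thread-local storage, and C<toy_interp>,
C<toy_get_context>, C<toy_warner> and friends are hypothetical names, not
the core's functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef struct toy_interp { char last_warning[64]; } toy_interp;

/* Stand-in for thread-local storage holding the current context. */
static toy_interp *toy_current_context;

void toy_set_context(toy_interp *interp) { toy_current_context = interp; }
toy_interp *toy_get_context(void) { return toy_current_context; }

/* Shared worker taking a va_list, so both entry points can use it.
 * Returns the length of the formatted message, as vsnprintf does. */
int
toy_vwarner(toy_interp *my_perl, const char *pat, va_list args)
{
    assert(my_perl != NULL && pat != NULL);
    return vsnprintf(my_perl->last_warning,
                     sizeof my_perl->last_warning, pat, args);
}

/* Context passed explicitly as the first argument. */
int
toy_warner(toy_interp *my_perl, const char *pat, ...)
{
    va_list args;
    int len;

    va_start(args, pat);
    len = toy_vwarner(my_perl, pat, args);
    va_end(args);
    return len;
}

/* No context argument: look it up first, much as dTHX would. */
int
toy_warner_nocontext(const char *pat, ...)
{
    toy_interp *my_perl = toy_get_context();
    va_list args;
    int len;

    va_start(args, pat);
    len = toy_vwarner(my_perl, pat, args);
    va_end(args);
    return len;
}
```

Both entry points end up in the same worker; the nocontext one simply pays
for an extra lookup on every call.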
	1741
	1742	You can ignore [pad]THX[xo] when browsing the Perl headers/sources.
	1743	Those are strictly for use within the core. Extensions and embedders
	1744	need only be aware of [pad]THX.
	1745
	1746	=head2 How do I use all this in extensions?
	1747
	1748	When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
	1749	any functions in the Perl API will need to pass the initial context
	1750	argument somehow. The kicker is that you will need to write it in
	1751	such a way that the extension still compiles when Perl hasn't been
	1752	built with PERL_IMPLICIT_CONTEXT enabled.
	1753
	1754	There are three ways to do this. First, the easy but inefficient way,
	1755	which is also the default, in order to maintain source compatibility
	1756	with extensions: whenever XSUB.h is #included, it redefines the aTHX
	1757	and aTHX_ macros to call a function that will return the context.
	1758	Thus, something like:
	1759
	1760	sv_setsv(asv, bsv);
	1761
	1762	in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
	1763	in effect:
	1764
	1765	Perl_sv_setsv(Perl_get_context(), asv, bsv);
	1766
	1767	or to this otherwise:
	1768
	1769	Perl_sv_setsv(asv, bsv);
	1770
	1771	You have to do nothing new in your extension to get this; since
	1772	the Perl library provides Perl_get_context(), it will all just
	1773	work.
	1774
	1775	The second, more efficient way is to use the following template for
	1776	your Foo.xs:
	1777
	1778	#define PERL_NO_GET_CONTEXT /* we want efficiency */
	1779	#include "EXTERN.h"
	1780	#include "perl.h"
	1781	#include "XSUB.h"
	1782
	1783	static SV *my_private_function(int arg1, int arg2);
	1784
	1785	static SV *
	1786	my_private_function(int arg1, int arg2)
	1787	{
	1788	dTHX; /* fetch context */
	1789	... call many Perl API functions ...
	1790	}
	1791
	1792	[... etc ...]
	1793
	1794	MODULE = Foo PACKAGE = Foo
	1795
	1796	/* typical XSUB */
	1797
	1798	void
	1799	my_xsub(arg)
	1800	int arg
	1801	CODE:
	1802	my_private_function(arg, 10);
	1803
	1804	Note that the only two changes from the normal way of writing an
	1805	extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
	1806	including the Perl headers, followed by a C<dTHX;> declaration at
	1807	the start of every function that will call the Perl API. (You'll
	1808	know which functions need this, because the C compiler will complain
	1809	that there's an undeclared identifier in those functions.) No changes
	1810	are needed for the XSUBs themselves, because the XS() macro is
	1811	correctly defined to pass in the implicit context if needed.
	1812
	1813	The third, even more efficient way is to ape how it is done within
	1814	the Perl guts:
	1815
	1816
	1817	#define PERL_NO_GET_CONTEXT /* we want efficiency */
	1818	#include "EXTERN.h"
	1819	#include "perl.h"
	1820	#include "XSUB.h"
	1821
	1822	/* pTHX_ only needed for functions that call Perl API */
	1823	static SV *my_private_function(pTHX_ int arg1, int arg2);
	1824
	1825	static SV *
	1826	my_private_function(pTHX_ int arg1, int arg2)
	1827	{
	1828	/* dTHX; not needed here, because THX is an argument */
	1829	... call Perl API functions ...
	1830	}
	1831
	1832	[... etc ...]
	1833
	1834	MODULE = Foo PACKAGE = Foo
	1835
	1836	/* typical XSUB */
	1837
	1838	void
	1839	my_xsub(arg)
	1840	int arg
	1841	CODE:
	1842	my_private_function(aTHX_ arg, 10);
	1843
	1844	This implementation never has to fetch the context using a function
	1845	call, since it is always passed as an extra argument. Depending on
	1846	your needs for simplicity or efficiency, you may mix the previous
	1847	two approaches freely.
	1848
	1849	Never add a comma after C<pTHX> yourself--always use the form of the
	1850	macro with the underscore for functions that take explicit arguments,
	1851	or the form without the underscore for functions with no explicit arguments.
	1852
	1853	=head2 Future Plans and PERL_IMPLICIT_SYS
	1854
	1855	Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
	1856	that the interpreter knows about itself and pass it around, so too are
	1857	there plans to allow the interpreter to bundle up everything it knows
	1858	about the environment it's running on. This is enabled with the
	1859	PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT,
	1860	but is mostly there for MULTIPLICITY and USE_THREADS (see inside
	1861	iperlsys.h).
	1862
	1863	This lets you provide an extra pointer (called the "host"
	1864	environment) for all the system calls. This makes it possible for
	1865	all the system stuff to maintain its own state, broken down into
	1866	seven C structures. These are thin wrappers around the usual system
	1867	calls (see win32/perllib.c) for the default perl executable, but for a
	1868	more ambitious host (like the one that would do fork() emulation) all
	1869	the extra work needed to pretend that different interpreters are
	1870	actually different "processes", would be done here.
	1871
	1872	The Perl engine/interpreter and the host are orthogonal entities.
	1873	There could be one or more interpreters in a process, and one or
	1874	more "hosts", with free association between them.
	1875
	1876	=head1 Internal Functions
	1877
	1878	All of Perl's internal functions which will be exposed to the outside
	1879	world are prefixed by C<Perl_> so that they will not conflict with XS
	1880	functions or functions used in a program in which Perl is embedded.
	1881	Similarly, all global variables begin with C<PL_>. (By convention,
	1882	static functions start with C<S_>)
	1883
	1884	Inside the Perl core, you can get at the functions either with or
	1885	without the C<Perl_> prefix, thanks to a bunch of defines that live in
	1886	F<embed.h>. This header file is generated automatically from
	1887	F<embed.pl>. F<embed.pl> also creates the prototyping header files for
	1888	the internal functions, generates the documentation and a lot of other
	1889	bits and pieces. It's important that when you add a new function to the
	1890	core or change an existing one, you change the data in the table at the
	1891	end of F<embed.pl> as well. Here's a sample entry from that table:
	1892
	1893	Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
	1894
	1895	The second column is the return type, the third column the name. Columns
	1896	after that are the arguments. The first column is a set of flags:
	1897
	1898	=over 3
	1899
	1900	=item A
	1901
	1902	This function is a part of the public API.
	1903
	1904	=item p
	1905
	1906	This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.
	1907
	1908	=item d
	1909
	1910	This function has documentation using the C<apidoc> feature which we'll
	1911	look at in a second.
	1912
	1913	=back
	1914
	1915	Other available flags are:
	1916
	1917	=over 3
	1918
	1919	=item s
	1920
	1921	This is a static function and is defined as C<S_whatever>.
	1922
	1923	=item n
	1924
	1925	This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
	1926	L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)
	1927
	1928	=item r
	1929
	1930	This function never returns; C<croak>, C<exit> and friends.
	1931
	1932	=item f
	1933
	1934	This function takes a variable number of arguments, C<printf> style.
	1935	The argument list should end with C<...>, like this:
	1936
	1937	Afprd |void |croak |const char* pat|...
	1938
	1939	=item m
	1940
	1941	This function is part of the experimental development API, and may change
	1942	or disappear without notice.
	1943
	1944	=item o
	1945
	1946	This function should not have a compatibility macro to define, say,
	1947	C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
	1948
	1949	=item j
	1950
	1951	This function is not a member of C<CPerlObj>. If you don't know
	1952	what this means, don't use it.
	1953
	1954	=item x
	1955
	1956	This function isn't exported out of the Perl core.
	1957
	1958	=back
	1959
	1960	If you edit F<embed.pl>, you will need to run C<make regen_headers> to
	1961	force a rebuild of F<embed.h> and other auto-generated files.
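
If you want to poke at table entries by hand, the column layout described
above is easy to pick apart. The following is just an illustrative sketch,
not part of F<embed.pl> (which is a Perl script); C<toy_split_entry> and
C<toy_has_flag> are made-up helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TOY_MAX_FIELDS 16

/* Split an entry on '|' in place, trimming blanks around each field.
 * The line buffer is modified.  Returns the number of fields found. */
int
toy_split_entry(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL && max_fields > 0);
    while (n < max_fields) {
        char *end = strchr(p, '|');
        char *tail;

        if (end != NULL)
            *end = '\0';
        while (isspace((unsigned char)*p))              /* trim leading */
            p++;
        tail = p + strlen(p);
        while (tail > p && isspace((unsigned char)tail[-1]))
            *--tail = '\0';                             /* trim trailing */
        fields[n++] = p;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return n;
}

/* Is the given flag letter present in the flags column? */
int
toy_has_flag(const char *flags, char flag)
{
    return flag != '\0' && strchr(flags, flag) != NULL;
}
```

Run on the C<av_fetch> entry shown earlier, this yields six fields: the
flags, the return type, the name, and three arguments.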
	1962
	1963	=head2 Formatted Printing of IVs, UVs, and NVs
	1964
	1965	If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
	1966	formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
	1967	following macros for portability:
	1968
	1969	IVdf IV in decimal
	1970	UVuf UV in decimal
	1971	UVof UV in octal
	1972	UVxf UV in hexadecimal
	1973	NVef NV %e-like
	1974	NVff NV %f-like
	1975	NVgf NV %g-like
	1976
	1977	These will take care of 64-bit integers and long doubles.
	1978	For example:
	1979
	1980	printf("IV is %"IVdf"\n", iv);
	1981
	1982	The IVdf will expand to whatever is the correct format for the IVs.
	1983
	1984	If you are printing addresses of pointers, use UVxf combined
	1985	with PTR2UV(); do not use %lx or %p.
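
The same string-pasting idiom exists in standard C99, which makes for a
standalone illustration you can compile without the Perl headers. In this
sketch C<toy_IV>, C<toy_UV> and the C<TOY_*f> macros are stand-ins built on
F<inttypes.h>; they are an analogy, not the real definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

typedef int64_t  toy_IV;
typedef uint64_t toy_UV;

/* Each macro expands to a string literal that the compiler pastes
 * into the surrounding format string. */
#define TOY_IVdf PRId64
#define TOY_UVuf PRIu64
#define TOY_UVxf PRIx64

/* Format an IV and a UV portably; returns the formatted length. */
int
toy_format(char *buf, size_t size, toy_IV iv, toy_UV uv)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "IV is %" TOY_IVdf ", UV is %" TOY_UVuf
                    " (0x%" TOY_UVxf ")",
                    iv, uv, uv);
}
```

Because the macro supplies the length modifier, the call site never has to
know whether the integer type is C<long>, C<long long> or something else.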
	1986
	1987	=head2 Pointer-To-Integer and Integer-To-Pointer
	1988
	1989	Because pointer size does not necessarily equal integer size,
	1990	use the following macros to do it right.
	1991
	1992	PTR2UV(pointer)
	1993	PTR2IV(pointer)
	1994	PTR2NV(pointer)
	1995	INT2PTR(pointertotype, integer)
	1996
	1997	For example:
	1998
	1999	IV iv = ...;
	2000	SV *sv = INT2PTR(SV*, iv);
	2001
	2002	and
	2003
	2004	AV *av = ...;
	2005	UV uv = PTR2UV(av);
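
For a feel of what such macros have to do, here is a standalone analogue
using C99's C<uintptr_t>. The C<TOY_> macros and the C<toy_av> type are
assumptions for this sketch only; the real macros come from F<perl.h> and
are written in terms of Perl's own IV and UV types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Go through an integer type that is wide enough for a pointer. */
#define TOY_PTR2UV(p)        ((uintptr_t)(p))
#define TOY_INT2PTR(type, i) ((type)(uintptr_t)(i))

typedef struct toy_av { int fill; } toy_av;

/* Stash a pointer in an integer, e.g. to store it in a numeric slot. */
uintptr_t
toy_stash_pointer(toy_av *av)
{
    assert(av != NULL);
    return TOY_PTR2UV(av);
}

/* Recover the pointer from the integer again. */
toy_av *
toy_recover_pointer(uintptr_t uv)
{
    return TOY_INT2PTR(toy_av *, uv);
}
```

The round trip gives back the original pointer, which a cast through
plain C<int> or C<long> would not guarantee on every platform.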
	2006
	2007	=head2 Source Documentation
	2008
	2009	There's an effort going on to document the internal functions and
	2010	automatically produce reference manuals from them - L<perlapi> is one
	2011	such manual which details all the functions which are available to XS
	2012	writers. L<perlintern> is the autogenerated manual for the functions
	2013	which are not part of the API and are supposedly for internal use only.
	2014
	2015	Source documentation is created by putting POD comments into the C
	2016	source, like this:
	2017
	2018	/*
	2019	=for apidoc sv_setiv
	2020
	2021	Copies an integer into the given SV. Does not handle 'set' magic. See
	2022	C<sv_setiv_mg>.
	2023
	2024	=cut
	2025	*/
	2026
	2027	Please try and supply some documentation if you add functions to the
	2028	Perl core.
	2029
	2030	=head1 Unicode Support
	2031
	2032	Perl 5.6.0 introduced Unicode support. It's important for porters and XS
	2033	writers to understand this support and make sure that the code they
	2034	write does not corrupt Unicode data.
	2035
	2036	=head2 What B<is> Unicode, anyway?
	2037
	2038	In the olden, less enlightened times, we all used to use ASCII. Most of
	2039	us did, anyway. The big problem with ASCII is that it's American. Well,
	2040	no, that's not actually the problem; the problem is that it's not
	2041	particularly useful for people who don't use the Roman alphabet. What
	2042	used to happen was that particular languages would stick their own
	2043	alphabet in the upper range of the sequence, between 128 and 255. Of
	2044	course, we then ended up with plenty of variants that weren't quite
	2045	ASCII, and the whole point of it being a standard was lost.
	2046
	2047	Worse still, if you've got a language like Chinese or
	2048	Japanese that has hundreds or thousands of characters, then you really
	2049	can't fit them into a mere 256, so they had to forget about ASCII
	2050	altogether, and build their own systems using pairs of numbers to refer
	2051	to one character.
	2052
	2053	To fix this, some people formed Unicode, Inc. and
	2054	produced a new character set containing all the characters you can
	2055	possibly think of and more. There are several ways of representing these
	2056	characters, and the one Perl uses is called UTF8. UTF8 uses
	2057	a variable number of bytes to represent a character, instead of just
	2058	one. You can learn more about Unicode at http://www.unicode.org/
	2059
	2060	=head2 How can I recognise a UTF8 string?
	2061
	2062	You can't. This is because UTF8 data is stored in bytes just like
	2063	non-UTF8 data. The Unicode character 200, (C<0xC8> for you hex types)
	2064	capital E with a grave accent, is represented by the two bytes
	2065	C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
	2066	has that byte sequence as well. So you can't tell just by looking - this
	2067	is what makes Unicode input an interesting problem.
	2068
	2069	The API function C<is_utf8_string> can help; it'll tell you if a string
	2070	contains only valid UTF8 characters. However, it can't do the work for
	2071	you. On a character-by-character basis, C<is_utf8_char> will tell you
	2072	whether the current character in a string is valid UTF8.
	2073
	2074	=head2 How does UTF8 represent Unicode characters?
	2075
	2076	As mentioned above, UTF8 uses a variable number of bytes to store a
	2077	character. Characters with values 0...127 are stored in one byte, just
	2078	like good ol' ASCII. Character 128 is stored as C<v194.128>; this
	2079	continues up to character 191, which is C<v194.191>. Now we've run out of
	2080	bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
	2081	so it goes on, moving to three bytes at character 2048.
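
The bit layout behind those numbers can be written down in a few lines.
This is a minimal sketch covering only values below 0x10000 (one to three
bytes); C<toy_encode_utf8> is a hypothetical helper, not a Perl API
function, and it makes no attempt to reject surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Encode the code point cp (which must be below 0x10000) into out,
 * which needs room for 3 bytes.  Returns the number of bytes written. */
size_t
toy_encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Feeding it 192 gives the two bytes 195 and 128, and 2048 is the first value
to need three bytes.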
	2082
	2083	Assuming you know you're dealing with a UTF8 string, you can find out
	2084	how long the first character in it is with the C<UTF8SKIP> macro:
	2085
	2086	char *utf = "\305\233\340\240\201";
	2087	I32 len;
	2088
	2089	len = UTF8SKIP(utf); /* len is 2 here */
	2090	utf += len;
	2091	len = UTF8SKIP(utf); /* len is 3 here */
	2092
	2093	Another way to skip over characters in a UTF8 string is to use
	2094	C<utf8_hop>, which takes a string and a number of characters to skip
	2095	over. You're on your own about bounds checking, though, so don't use it
	2096	lightly.
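
If you do need bounds checking, one possible shape for it is sketched
below. C<toy_utf8_skip> and C<toy_utf8_hop> are invented names; unlike the
real macro and function they only look at the lead byte's high bits, only
move forwards, and stop at four-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the character whose first byte is b, judged
 * from the high bits of that byte alone. */
size_t
toy_utf8_skip(unsigned char b)
{
    if (b < 0xC0) return 1;     /* ASCII, or a stray continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    return 4;                   /* longer forms are not modelled here */
}

/* Advance over up to n characters, never moving past end. */
const unsigned char *
toy_utf8_hop(const unsigned char *s, const unsigned char *end, size_t n)
{
    assert(s != NULL && s <= end);
    while (n-- > 0 && s < end) {
        size_t len = toy_utf8_skip(*s);

        if (len > (size_t)(end - s))
            return end;         /* truncated character at the end */
        s += len;
    }
    return s;
}
```

On the five-byte example string above, hopping one character lands two
bytes in, and hopping two (or more) lands exactly on the end.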
	2097
	2098	All bytes in a multi-byte UTF8 character will have the high bit set, so
	2099	you can test if you need to do something special with this character
	2100	like this:
	2101
	2102	UV uv;
	2103
	2104	if (*utf & 0x80)
	2105	/* Must treat this as UTF8 */
	2106	uv = utf8_to_uv(utf);
	2107	else
	2108	/* OK to treat this character as a byte */
	2109	uv = *utf;
	2110
	2111	You can also see in that example that we use C<utf8_to_uv> to get the
	2112	value of the character; the inverse function C<uv_to_utf8> is available
	2113	for putting a UV into UTF8:
	2114
	2115	if (uv >= 0x80)
	2116	/* Must treat this as UTF8 */
	2117	utf8 = uv_to_utf8(utf8, uv);
	2118	else
	2119	/* OK to treat this character as a byte */
	2120	*utf8++ = uv;
	2121
	2122	You B<must> convert characters to UVs using the above functions if
	2123	you're ever in a situation where you have to match UTF8 and non-UTF8
	2124	characters. You may not skip over UTF8 characters in this case. If you
	2125	do this, you'll lose the ability to match hi-bit non-UTF8 characters;
	2126	for instance, if your UTF8 string contains C<v195.136>, and you skip
	2127	that character, you can never match a C<chr(200)> in a non-UTF8 string.
	2128	So don't do that!
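
Here is why decoding matters, in miniature. The helpers
C<toy_utf8_to_uv> and C<toy_first_chars_equal> are hypothetical, handle
only one- to three-byte characters, and assume well-formed input; they are
a sketch of the comparison, not the core's decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one character (1 to 3 bytes) and report its length. */
unsigned long
toy_utf8_to_uv(const unsigned char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    if (s[0] < 0x80) {
        *len = 1;
        return s[0];
    }
    if (s[0] < 0xE0) {
        *len = 2;
        return ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    *len = 3;
    return ((unsigned long)(s[0] & 0x0F) << 12)
         | ((unsigned long)(s[1] & 0x3F) << 6)
         | (s[2] & 0x3F);
}

/* Does the first character of the UTF-8 string u have the same value
 * as the first byte of the plain byte string b? */
int
toy_first_chars_equal(const unsigned char *u, const unsigned char *b)
{
    size_t len;

    return toy_utf8_to_uv(u, &len) == (unsigned long)b[0];
}
```

The two-byte sequence 0xC3 0x88 decodes to 200, so it compares equal to the
single byte 0xC8 in a non-UTF8 string, even though no byte in the two
strings is the same.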
	2129
	2130	=head2 How does Perl store UTF8 strings?
	2131
	2132	Currently, Perl deals with Unicode strings and non-Unicode strings
	2133	slightly differently. If a string has been identified as being UTF-8
	2134	encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
	2135	manipulate this flag with the following macros:
	2136
	2137	SvUTF8(sv)
	2138	SvUTF8_on(sv)
	2139	SvUTF8_off(sv)
	2140
	2141	This flag has an important effect on Perl's treatment of the string: if
	2142	Unicode data is not properly distinguished, regular expressions,
	2143	C<length>, C<substr> and other string handling operations will have
	2144	undesirable results.
	2145
	2146	The problem comes when you have, for instance, a string that isn't
	2147	flagged as UTF8, and contains a byte sequence that could be UTF8 -
	2148	especially when combining non-UTF8 and UTF8 strings.
	2149
	2150	Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
	2151	need to be sure you don't accidentally knock it off while you're
	2152	manipulating SVs. More specifically, you cannot expect to do this:
	2153
	2154	SV *sv;
	2155	SV *nsv;
	2156	STRLEN len;
	2157	char *p;
	2158
	2159	p = SvPV(sv, len);
	2160	frobnicate(p);
	2161	nsv = newSVpvn(p, len);
	2162
	2163	The C<char*> string does not tell you the whole story, and you can't
	2164	copy or reconstruct an SV just by copying the string value. Check if the
	2165	old SV has the UTF8 flag set, and act accordingly:
	2166
	2167	p = SvPV(sv, len);
	2168	frobnicate(p);
	2169	nsv = newSVpvn(p, len);
	2170	if (SvUTF8(sv))
	2171	SvUTF8_on(nsv);
	2172
	2173	In fact, your C<frobnicate> function should be made aware of whether or
	2174	not it's dealing with UTF8 data, so that it can handle the string
	2175	appropriately.
	2176
	2177	=head2 How do I convert a string to UTF8?
	2178
	2179	If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
	2180	to upgrade one of the strings to UTF8. If you've got an SV, the easiest
	2181	way to do this is:
	2182
	2183	sv_utf8_upgrade(sv);
	2184
	2185	However, you must not do this, for example:
	2186
	2187	if (!SvUTF8(left))
	2188	sv_utf8_upgrade(left);
	2189
	2190	If you do this in a binary operator, you will actually change one of the
	2191	strings that came into the operator, and, while it shouldn't be noticeable
	2192	by the end user, it can cause problems.
	2193
	2194	Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
	2195	string argument. This is useful for having the data available for
	2196	comparisons and so on, without harming the original SV. There's also
	2197	C<utf8_to_bytes> to go the other way, but naturally, this will fail if
	2198	the string contains any characters above 255 that can't be represented
	2199	in a single byte.
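
The copying behaviour is the important part, and it can be sketched without
any Perl headers. C<toy_bytes_to_utf8> below is a made-up stand-in with its
own made-up signature; it treats every input byte as a character in the
range 0 to 255 and leaves the input alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return a newly allocated, NUL-terminated UTF-8 copy of the len bytes
 * at s, storing the new length in *newlen.  The original is untouched.
 * The caller frees the result.  Returns NULL if allocation fails. */
unsigned char *
toy_bytes_to_utf8(const unsigned char *s, size_t len, size_t *newlen)
{
    unsigned char *out, *d;
    size_t i;

    assert(s != NULL && newlen != NULL);
    out = malloc(2 * len + 1);  /* worst case: every byte doubles */
    if (out == NULL)
        return NULL;
    d = out;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];        /* ASCII stays as one byte */
        }
        else {                  /* 128..255 become two bytes */
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}
```

Since the result is a fresh buffer, a binary operator can compare against
it and throw it away without its operands ever changing.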
	2200
	2201	=head2 Is there anything else I need to know?
	2202
	2203	Not really. Just remember these things:
	2204
	2205	=over 3
	2206
	2207	=item *
	2208
	2209	There's no way to tell if a string is UTF8 or not. You can tell if an SV
	2210	is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
	2211	something should be UTF8. Treat the flag as part of the PV, even though
	2212	it's not - if you pass on the PV to somewhere, pass on the flag too.
	2213
	2214	=item *
	2215
	2216	If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
	2217	unless C<!(*s & 0x80)> in which case you can use C<*s>.
	2218
	2219	=item *
	2220
	2221	When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
	2222	C<uv < 0x80> in which case you can use C<*s = uv>.
	2223
	2224	=item *
	2225
	2226	Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
	2227	a new string which is UTF8 encoded. There are tricks you can use to
	2228	delay deciding whether you need to use a UTF8 string until you get to a
	2229	high character - C<HALF_UPGRADE> is one of those.
	2230
	2231	=back
	2232
	2233	=head1 AUTHORS
	2234
	2235	Until May 1997, this document was maintained by Jeff Okamoto
	2236	<okamoto@corp.hp.com>. It is now maintained as part of Perl itself
	2237	by the Perl 5 Porters <perl5-porters@perl.org>.
	2238
	2239	With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
	2240	Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
	2241	Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
	2242	Stephen McCamant, and Gurusamy Sarathy.
	2243
	2244	API Listing originally by Dean Roehrich <roehrich@cray.com>.
	2245
	2246	Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
	2247	Stuhl.
	2248
	2249	=head1 SEE ALSO
	2250
	2251	perlapi(1), perlintern(1), perlxs(1), perlembed(1)