	1	=head1 NAME
	2
	3	perlguts - Introduction to the Perl API
	4
	5	=head1 DESCRIPTION
	6
	7	This document attempts to describe how to use the Perl API, as well as
	8	to provide some info on the basic workings of the Perl core. It is far
	9	from complete and probably contains many errors. Please refer any
	10	questions or comments to the author below.
	11
	12	=head1 Variables
	13
	14	=head2 Datatypes
	15
	16	Perl has three typedefs that handle Perl's three main data types:
	17
    SV  Scalar Value
    AV  Array Value
    HV  Hash Value
	21
	22	Each typedef has specific routines that manipulate the various data types.
	23
	24	=head2 What is an "IV"?
	25
	26	Perl uses a special typedef IV which is a simple signed integer type that is
	27	guaranteed to be large enough to hold a pointer (as well as an integer).
	28	Additionally, there is the UV, which is simply an unsigned IV.
	29
Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.
	34
	35	=head2 Working with SVs
	36
	37	An SV can be created and loaded with one command. There are five types of
	38	values that can be loaded: an integer value (IV), an unsigned integer
	39	value (UV), a double (NV), a string (PV), and another scalar (SV).
	40	("PV" stands for "Pointer Value". You might think that it is misnamed
	41	because it is described as pointing only to strings. However, it is
	42	possible to have it point to other things. For example, it could point
	43	to an array of UVs. But,
	44	using it for non-strings requires care, as the underlying assumption of
	45	much of the internals is that PVs are just for strings. Often, for
	46	example, a trailing C<NUL> is tacked on automatically. The non-string use
	47	is documented only in this paragraph.)
	48
	49	The seven routines are:
	50
	51	SV* newSViv(IV);
	52	SV* newSVuv(UV);
	53	SV* newSVnv(double);
	54	SV* newSVpv(const char*, STRLEN);
	55	SV* newSVpvn(const char*, STRLEN);
	56	SV* newSVpvf(const char*, ...);
	57	SV* newSVsv(SV*);
	58
	59	C<STRLEN> is an integer type (Size_t, usually defined as size_t in
	60	F<config.h>) guaranteed to be large enough to represent the size of
	61	any string that perl can handle.
	62
	63	In the unlikely case of a SV requiring more complex initialization, you
	64	can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
	65	type NULL is returned, else an SV of type PV is returned with len + 1 (for
	66	the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
	67	the SV has the undef value.
	68
    SV *sv = newSV(0);   /* no storage allocated */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */
	72
	73	To change the value of an I<already-existing> SV, there are eight routines:
	74
    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);
	84
	85	Notice that you can choose to specify the length of the string to be
	86	assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
	87	allow Perl to calculate the length by using C<sv_setpv> or by specifying
	88	0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
	89	determine the string's length by using C<strlen>, which depends on the
	90	string terminating with a C<NUL> character, and not otherwise containing
	91	NULs.
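
The difference between a measured length and a supplied length can be
seen without any Perl headers. The following plain C sketch is only an
illustration: C<length_by_strlen> and C<embedded_nuls> are hypothetical
helper names, not part of the Perl API. It shows that C<strlen> stops at
the first C<NUL>, while an explicit length covers the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length as strlen() sees it: counting stops at the first NUL byte. */
static size_t length_by_strlen(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}

/* Count the NUL bytes inside buf[0..len). With an explicit length,
 * embedded NULs are ordinary data rather than terminators. */
static size_t embedded_nuls(const char *buf, size_t len)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

For the 7-byte buffer C<"abc\0def">, C<length_by_strlen> reports 3, so
anything after the embedded C<NUL> would be lost. This is why binary data
is better passed with an explicit length, as with C<sv_setpvn> or
C<newSVpvn>.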
	92
	93	The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
	94	formatted output becomes the value.
	95
	96	C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
	97	either a pointer to a variable argument list or the address and length of
	98	an array of SVs. The last argument points to a boolean; on return, if that
	99	boolean is true, then locale-specific information has been used to format
	100	the string, and the string's contents are therefore untrustworthy (see
	101	L<perlsec>). This pointer may be NULL if that information is not
	102	important. Note that this function requires you to specify the length of
	103	the format.
	104
	105	The C<sv_set*()> functions are not generic enough to operate on values
	106	that have "magic". See L<Magic Virtual Tables> later in this document.
	107
	108	All SVs that contain strings should be terminated with a C<NUL> character.
	109	If it is not C<NUL>-terminated there is a risk of
	110	core dumps and corruptions from code which passes the string to C
	111	functions or system calls which expect a C<NUL>-terminated string.
	112	Perl's own functions typically add a trailing C<NUL> for this reason.
	113	Nevertheless, you should be very careful when you pass a string stored
	114	in an SV to a C function or system call.
	115
	116	To access the actual value that an SV points to, you can use the macros:
	117
	118	SvIV(SV*)
	119	SvUV(SV*)
	120	SvNV(SV*)
	121	SvPV(SV*, STRLEN len)
	122	SvPV_nolen(SV*)
	123
	124	which will automatically coerce the actual scalar type into an IV, UV, double,
	125	or string.
	126
	127	In the C<SvPV> macro, the length of the string returned is placed into the
	128	variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
	129	not care what the length of the data is, use the C<SvPV_nolen> macro.
	130	Historically the C<SvPV> macro with the global variable C<PL_na> has been
	131	used in this case. But that can be quite inefficient because C<PL_na> must
	132	be accessed in thread-local storage in threaded Perl. In any case, remember
	133	that Perl allows arbitrary strings of data that may both contain NULs and
	134	might not be terminated by a C<NUL>.
	135
	136	Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
	137	len);>. It might work with your
	138	compiler, but it won't work for everyone.
	139	Break this sort of statement up into separate assignments:
	140
	141	SV *s;
	142	STRLEN len;
	143	char *ptr;
	144	ptr = SvPV(s, len);
	145	foo(ptr, len);
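
The reason for splitting the statement can be shown with a stand-in
macro. This is a minimal sketch in plain C, assuming nothing from the
Perl headers: C<struct toy_sv>, C<TOY_PV>, C<toy_consume> and
C<toy_safe_call> are hypothetical names invented for the illustration.
Like C<SvPV>, the macro assigns the length as a side effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV: a byte pointer plus a length. */
struct toy_sv {
    const char *pv;
    size_t cur;
};

/* Stores the length into its second argument as a side effect and
 * evaluates to the string pointer, in the manner of SvPV. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Stand-in for foo(): simply reports the length it was handed. */
static size_t toy_consume(const char *ptr, size_t len)
{
    assert(ptr != NULL);
    return len;
}

static size_t toy_safe_call(const struct toy_sv *s)
{
    size_t len = 0;
    const char *ptr;

    /* Not safe: toy_consume(TOY_PV(s, len), len);
     * C does not fix the order in which the two arguments are
     * evaluated, so len could be read before the macro assigns it. */

    ptr = TOY_PV(s, len);           /* len is assigned here ...      */
    return toy_consume(ptr, len);   /* ... and only then read here.  */
}
```

With the assignment on its own line, the length is always set before it
is passed on, whatever order the compiler chooses for evaluating
arguments.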
	146
	147	If you want to know if the scalar value is TRUE, you can use:
	148
	149	SvTRUE(SV*)
	150
	151	Although Perl will automatically grow strings for you, if you need to force
	152	Perl to allocate more memory for your SV, you can use the macro
	153
	154	SvGROW(SV*, STRLEN newlen)
	155
	156	which will determine if more memory needs to be allocated. If so, it will
	157	call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
	158	decrease, the allocated memory of an SV and that it does not automatically
	159	add space for the trailing C<NUL> byte (perl's own string functions typically do
	160	C<SvGROW(sv, len + 1)>).
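
The two properties just described (growth only, and no implicit room for
the trailing C<NUL>) can be modelled in a few lines of plain C. This is a
simplified sketch, not Perl's implementation: C<struct toy_buf>,
C<toy_grow> and C<toy_append> are hypothetical names, and C<realloc>
stands in for Perl's own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
    char  *pv;   /* start of the bytes */
    size_t cur;  /* bytes in use, not counting the trailing NUL */
    size_t len;  /* bytes allocated */
};

/* Grow-only: never shrinks, and adds no extra byte for a NUL.
 * Returns the (possibly moved) buffer, or NULL on allocation failure. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append n bytes in place; the caller-side "+ 1" reserves the NUL. */
static int toy_append(struct toy_buf *b, const char *src, size_t n)
{
    char *s;

    assert(b != NULL && src != NULL);
    s = toy_grow(b, b->cur + n + 1);
    if (s == NULL)
        return -1;
    memcpy(s + b->cur, src, n);   /* write straight after old content */
    s[b->cur + n] = '\0';
    b->cur += n;
    return 0;
}
```

Asking C<toy_grow> for fewer bytes than are already allocated leaves the
buffer untouched, which mirrors the "can only increase" behaviour
described for C<SvGROW>.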
	161
	162	If you want to write to an existing SV's buffer and set its value to a
	163	string, use SvPV_force() or one of its variants to force the SV to be
	164	a PV. This will remove any of various types of non-stringness from
	165	the SV while preserving the content of the SV in the PV. This can be
	166	used, for example, to append data from an API function to a buffer
	167	without extra copying:
	168
	169	(void)SvPVbyte_force(sv, len);
	170	s = SvGROW(sv, len + needlen + 1);
	171	/* something that modifies up to needlen bytes at s+len, but
	172	modifies newlen bytes
	173	eg. newlen = read(fd, s + len, needlen);
	174	ignoring errors for these examples
	175	*/
	176	s[len + newlen] = '\0';
	177	SvCUR_set(sv, len + newlen);
	178	SvUTF8_off(sv);
	179	SvSETMAGIC(sv);
	180
	181	If you already have the data in memory or if you want to keep your
	182	code simple, you can use one of the sv_cat*() variants, such as
	183	sv_catpvn(). If you want to insert anywhere in the string you can use
	184	sv_insert() or sv_insert_flags().
	185
	186	If you don't need the existing content of the SV, you can avoid some
	187	copying with:
	188
    sv_setpvn(sv, "", 0);
    s = SvGROW(sv, needlen + 1);
    /* something that modifies up to needlen bytes at s, but modifies
       newlen bytes
         eg. newlen = read(fd, s, needlen);
    */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);
	199
	200	Again, if you already have the data in memory or want to avoid the
	201	complexity of the above, you can use sv_setpvn().
	202
	203	If you have a buffer allocated with Newx() and want to set that as the
	204	SV's value, you can use sv_usepvn_flags(). That has some requirements
	205	if you want to avoid perl re-allocating the buffer to fit the trailing
	206	NUL:
	207
    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */
	213
	214	If you have an SV and want to know what kind of data Perl thinks is stored
	215	in it, you can use the following macros to check the type of SV you have.
	216
	217	SvIOK(SV*)
	218	SvNOK(SV*)
	219	SvPOK(SV*)
	220
	221	You can get and set the current length of the string stored in an SV with
	222	the following macros:
	223
	224	SvCUR(SV*)
	225	SvCUR_set(SV*, I32 val)
	226
	227	You can also get a pointer to the end of the string stored in the SV
	228	with the macro:
	229
	230	SvEND(SV*)
	231
	232	But note that these last three macros are valid only if C<SvPOK()> is true.
	233
If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:
	236
    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      I32, bool *);
    void  sv_catsv(SV*, SV*);
	243
	244	The first function calculates the length of the string to be appended by
	245	using C<strlen>. In the second, you specify the length of the string
	246	yourself. The third function processes its arguments like C<sprintf> and
	247	appends the formatted output. The fourth function works like C<vsprintf>.
	248	You can specify the address and length of an array of SVs instead of the
	249	va_list argument. The fifth function
	250	extends the string stored in the first
	251	SV with the string stored in the second SV. It also forces the second SV
	252	to be interpreted as a string.
	253
	254	The C<sv_cat*()> functions are not generic enough to operate on values that
	255	have "magic". See L<Magic Virtual Tables> later in this document.
	256
	257	If you know the name of a scalar variable, you can get a pointer to its SV
	258	by using the following:
	259
	260	SV* get_sv("package::varname", 0);
	261
	262	This returns NULL if the variable does not exist.
	263
	264	If you want to know if this variable (or any other SV) is actually C<defined>,
	265	you can call:
	266
	267	SvOK(SV*)
	268
	269	The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
	270
	271	Its address can be used whenever an C<SV*> is needed. Make sure that
	272	you don't try to compare a random sv with C<&PL_sv_undef>. For example
	273	when interfacing Perl code, it'll work correctly for:
	274
	275	foo(undef);
	276
	277	But won't work when called as:
	278
	279	$x = undef;
	280	foo($x);
	281
So, to repeat: always use SvOK() to check whether an SV is defined.
	283
	284	Also you have to be careful when using C<&PL_sv_undef> as a value in
	285	AVs or HVs (see L<AVs, HVs and undefined values>).
	286
	287	There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
	288	boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
	289	addresses can be used whenever an C<SV*> is needed.
	290
	291	Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
	292	Take this code:
	293
	294	SV* sv = (SV*) 0;
	295	if (I-am-to-return-a-real-value) {
	296	sv = sv_2mortal(newSViv(42));
	297	}
	298	sv_setsv(ST(0), sv);
	299
	300	This code tries to return a new SV (which contains the value 42) if it should
	301	return a real value, or undef otherwise. Instead it has returned a NULL
	302	pointer which, somewhere down the line, will cause a segmentation violation,
	303	bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
	304	first line and all will be well.
	305
	306	To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
	307	call is not necessary (see L<Reference Counts and Mortality>).
	308
	309	=head2 Offsets
	310
	311	Perl provides the function C<sv_chop> to efficiently remove characters
	312	from the beginning of a string; you give it an SV and a pointer to
	313	somewhere inside the PV, and it discards everything before the
	314	pointer. The efficiency comes by means of a little hack: instead of
	315	actually removing the characters, C<sv_chop> sets the flag C<OOK>
	316	(offset OK) to signal to other functions that the offset hack is in
	317	effect, and it moves the PV pointer (called C<SvPVX>) forward
	318	by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
	319	accordingly. (A portion of the space between the old and new PV
	320	pointers is used to store the count of chopped bytes.)
	321
	322	Hence, at this point, the start of the buffer that we allocated lives
	323	at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
	324	into the middle of this allocated storage.
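
The pointer arithmetic behind the hack can be sketched in plain C. This
is a toy model under stated assumptions, not the code in F<sv.c>:
C<struct toy_str>, C<toy_chop> and C<toy_real_start> are hypothetical
names, and the chopped-byte count is kept in an ordinary struct field
instead of inside the string buffer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_str {
    char  *pvx;     /* visible ("fake") start of the string */
    size_t cur;     /* visible length */
    size_t len;     /* allocated bytes, counted from pvx */
    size_t offset;  /* bytes chopped so far; 0 means no hack in effect */
};

/* Discard everything before ptr, which must point into the string.
 * No bytes are copied; only the bookkeeping changes. */
static void toy_chop(struct toy_str *s, char *ptr)
{
    size_t delta;

    assert(s != NULL && ptr >= s->pvx && ptr <= s->pvx + s->cur);
    delta = (size_t)(ptr - s->pvx);
    s->pvx    += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The address that was originally allocated and must later be freed. */
static char *toy_real_start(const struct toy_str *s)
{
    assert(s != NULL);
    return s->pvx - s->offset;
}
```

Chopping one byte from a 9-character string held in a 10-byte buffer
leaves a visible length of 8 and an allocated length of 9 in this model,
while C<toy_real_start> still yields the original allocation.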
	325
This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:
	332
	333	% ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
	334	SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
	335	REFCNT = 1
	336	FLAGS = (POK,pPOK)
	337	PV = 0x7ffb7bc05b50 ""\0
	338	CUR = 0
	339	LEN = 10
	340
	341	Notice here the LEN is 10. (It may differ on your platform.) Extend the
	342	length of the string to one less than 10, and do a substitution:
	343
	344	% ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; Dump($a)'
	345	SV = PV(0x7ffa04008a70) at 0x7ffa04030390
	346	REFCNT = 1
	347	FLAGS = (POK,OOK,pPOK)
	348	OFFSET = 1
	349	PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
	350	CUR = 8
	351	LEN = 9
	352
	353	Here the number of bytes chopped off (1) is shown next as the OFFSET. The
	354	portion of the string between the "real" and the "fake" beginnings is
	355	shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
	356	the fake beginning, not the real one. (The first character of the string
	357	buffer happens to have changed to "\1" here, not "1", because the current
	358	implementation stores the offset count in the string buffer. This is
	359	subject to change.)
	360
	361	Something similar to the offset hack is performed on AVs to enable
	362	efficient shifting and splicing off the beginning of the array; while
	363	C<AvARRAY> points to the first element in the array that is visible from
	364	Perl, C<AvALLOC> points to the real start of the C array. These are
	365	usually the same, but a C<shift> operation can be carried out by
	366	increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
	367	Again, the location of the real start of the C array only comes into
	368	play when freeing the array. See C<av_shift> in F<av.c>.
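
The same idea for arrays can be modelled as follows. This is a
simplified sketch in plain C and not the code in F<av.c>: C<struct
toy_av> and C<toy_shift> are hypothetical names, and the elements are
plain C<int>s rather than C<SV*>s.

```c
#include <assert.h>
#include <stddef.h>

struct toy_av {
    int      *alloc;  /* real start of the C array (what gets freed) */
    int      *array;  /* first element visible to the user */
    ptrdiff_t fill;   /* highest visible index, -1 when empty */
    ptrdiff_t max;    /* highest index that fits, counted from array */
};

/* Remove and return the first element in constant time by moving the
 * visible start forward; alloc is left alone for the eventual free. */
static int toy_shift(struct toy_av *av)
{
    int first;

    assert(av != NULL && av->fill >= 0);
    first = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return first;
}
```

After one shift the visible start is one element past the real start,
and both the fill and the maximum index have dropped by one; no element
has been moved in memory.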
	369
	370	=head2 What's Really Stored in an SV?
	371
	372	Recall that the usual method of determining the type of scalar you have is
	373	to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
	374	usually these macros will always return TRUE and calling the C<Sv*V>
	375	macros will do the appropriate conversion of string to integer/double or
	376	integer/double to string.
	377
	378	If you I<really> need to know if you have an integer, double, or string
	379	pointer in an SV, you can use the following three macros instead:
	380
	381	SvIOKp(SV*)
	382	SvNOKp(SV*)
	383	SvPOKp(SV*)
	384
	385	These will tell you if you truly have an integer, double, or string pointer
	386	stored in your SV. The "p" stands for private.
	387
There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
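
The lossy-conversion case can be modelled with a pair of flag bits. This
is a toy sketch in plain C, assuming nothing about how F<sv.c> is really
written: the C<TOY_*> flags, C<struct toy_num>, C<toy_set_nv> and
C<toy_iv> are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    TOY_IOK  = 1 << 0,  /* public: integer slot is valid and exact */
    TOY_NOK  = 1 << 1,  /* public: double slot is valid */
    TOY_IOKp = 1 << 2,  /* private: integer slot holds some value */
    TOY_NOKp = 1 << 3   /* private: double slot holds some value */
};

struct toy_num {
    unsigned flags;
    long     iv;
    double   nv;
};

static void toy_set_nv(struct toy_num *n, double value)
{
    assert(n != NULL);
    n->nv = value;
    n->flags = TOY_NOK | TOY_NOKp;
}

/* Return an integer view of the double, caching it in the iv slot.
 * The public flag is set only when the conversion lost nothing. */
static long toy_iv(struct toy_num *n)
{
    assert(n != NULL && (n->flags & TOY_NOKp));
    if (!(n->flags & TOY_IOKp)) {
        n->iv = (long)n->nv;            /* truncates toward zero */
        n->flags |= TOY_IOKp;
        if ((double)n->iv == n->nv)
            n->flags |= TOY_IOK;
    }
    return n->iv;
}
```

In this model, converting 3.75 caches the integer 3 with only the
private integer flag set, while converting 4.0 sets the public flag as
well, because nothing was lost.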
	397
	398	In general, though, it's best to use the C<Sv*V> macros.
	399
	400	=head2 Working with AVs
	401
	402	There are two ways to create and load an AV. The first method creates an
	403	empty AV:
	404
	405	AV* newAV();
	406
	407	The second method both creates the AV and initially populates it with SVs:
	408
	409	AV* av_make(SSize_t num, SV **ptr);
	410
	411	The second argument points to an array containing C<num> C<SV*>'s. Once the
	412	AV has been created, the SVs can be destroyed, if so desired.
	413
	414	Once the AV has been created, the following operations are possible on it:
	415
    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);
	420
	421	These should be familiar operations, with the exception of C<av_unshift>.
	422	This routine adds C<num> elements at the front of the array with the C<undef>
	423	value. You must then use C<av_store> (described below) to assign values
	424	to these new elements.
	425
	426	Here are some other functions:
	427
    SSize_t  av_top_index(AV*);
    SV**     av_fetch(AV*, SSize_t key, I32 lval);
    SV**     av_store(AV*, SSize_t key, SV* val);
	431
The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.
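
The ownership rule for a store that may fail can be sketched in plain C.
This is a toy model and not the Perl API: C<struct toy_val>,
C<toy_new>, C<toy_dec>, C<toy_store>, C<toy_store_or_release> and the
C<toy_live> counter are hypothetical names, and the only failure
modelled here is an out-of-range index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_val { int refcnt; int value; };

static int toy_live = 0;   /* values allocated and not yet freed */

static struct toy_val *toy_new(int value)
{
    struct toy_val *v = malloc(sizeof *v);
    if (v != NULL) {
        v->refcnt = 1;     /* the creator holds the only reference */
        v->value = value;
        toy_live++;
    }
    return v;
}

static void toy_dec(struct toy_val *v)
{
    if (v != NULL && --v->refcnt == 0) {
        free(v);
        toy_live--;
    }
}

#define TOY_SLOTS 4
struct toy_array { struct toy_val *slot[TOY_SLOTS]; };

/* On success the array takes over the caller's reference and the
 * address of the slot is returned. On failure NULL is returned and
 * val is left untouched, still owned by the caller. */
static struct toy_val **toy_store(struct toy_array *a, size_t key,
                                  struct toy_val *val)
{
    assert(a != NULL);
    if (key >= TOY_SLOTS)
        return NULL;
    toy_dec(a->slot[key]);          /* release any previous occupant */
    a->slot[key] = val;
    return &a->slot[key];
}

/* The calling pattern described in the text above. */
static int toy_store_or_release(struct toy_array *a, size_t key, int value)
{
    struct toy_val *v = toy_new(value);

    if (v == NULL)
        return -1;
    if (toy_store(a, key, v) == NULL) {
        toy_dec(v);                 /* store failed: drop our reference */
        return -1;
    }
    return 0;
}
```

Because the store returns the address of the slot, a C<NULL> check has
to come before any dereference, and a failed store must be followed by
a decrement or the value leaks.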
	442
	443	A few more:
	444
	445	void av_clear(AV*);
	446	void av_undef(AV*);
	447	void av_extend(AV*, SSize_t key);
	448
	449	The C<av_clear> function deletes all the elements in the AV* array, but
	450	does not actually delete the array itself. The C<av_undef> function will
	451	delete all the elements in the array plus the array itself. The
	452	C<av_extend> function extends the array so that it contains at least C<key+1>
	453	elements. If C<key+1> is less than the currently allocated length of the array,
	454	then nothing is done.
	455
	456	If you know the name of an array variable, you can get a pointer to its AV
	457	by using the following:
	458
	459	AV* get_av("package::varname", 0);
	460
	461	This returns NULL if the variable does not exist.
	462
	463	See L<Understanding the Magic of Tied Hashes and Arrays> for more
	464	information on how to use the array access functions on tied arrays.
	465
	466	=head2 Working with HVs
	467
	468	To create an HV, you use the following routine:
	469
	470	HV* newHV();
	471
	472	Once the HV has been created, the following operations are possible on it:
	473
    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);
	476
	477	The C<klen> parameter is the length of the key being passed in (Note that
	478	you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
	479	length of the key). The C<val> argument contains the SV pointer to the
	480	scalar being stored, and C<hash> is the precomputed hash value (zero if
	481	you want C<hv_store> to calculate it for you). The C<lval> parameter
	482	indicates whether this fetch is actually a part of a store operation, in
	483	which case a new undefined value will be added to the HV with the supplied
	484	key and C<hv_fetch> will return as if the value had already existed.
	485
	486	Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
	487	C<SV*>. To access the scalar value, you must first dereference the return
	488	value. However, you should check to make sure that the return value is
	489	not NULL before dereferencing it.
	490
	491	The first of these two functions checks if a hash table entry exists, and the
	492	second deletes it.
	493
    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);
	496
	497	If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
	498	create and return a mortal copy of the deleted value.
	499
	500	And more miscellaneous functions:
	501
	502	void hv_clear(HV*);
	503	void hv_undef(HV*);
	504
	505	Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
	506	table but does not actually delete the hash table. The C<hv_undef> deletes
	507	both the entries and the hash table itself.
	508
	509	Perl keeps the actual data in a linked list of structures with a typedef of HE.
	510	These contain the actual key and value pointers (plus extra administrative
	511	overhead). The key is a string pointer; the value is an C<SV*>. However,
	512	once you have an C<HE*>, to get the actual key and value, use the routines
	513	specified below.
	514
    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is returned in the SV* argument */
	531
	532	If you know the name of a hash variable, you can get a pointer to its HV
	533	by using the following:
	534
	535	HV* get_hv("package::varname", 0);
	536
	537	This returns NULL if the variable does not exist.
	538
	539	The hash algorithm is defined in the C<PERL_HASH> macro:
	540
	541	PERL_HASH(hash, key, klen)
	542
	543	The exact implementation of this macro varies by architecture and version
	544	of perl, and the return value may change per invocation, so the value
	545	is only valid for the duration of a single perl process.
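
Why a hash value should not be stored across processes can be shown with
a seeded hash function. The sketch below is plain C and is explicitly
B<not> the algorithm behind C<PERL_HASH>: it is FNV-1a with a seed mixed
into the starting state, and C<toy_hash> and its C<seed> parameter are
assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over klen bytes of key, with a per-process seed folded into
 * the initial state. Same seed and key give the same value; a
 * different seed gives a different value for the same key. */
static uint32_t toy_hash(uint32_t seed, const char *key, size_t klen)
{
    uint32_t h = UINT32_C(2166136261) ^ seed;
    size_t i;

    assert(key != NULL || klen == 0);
    for (i = 0; i < klen; i++) {
        h ^= (unsigned char)key[i];
        h *= UINT32_C(16777619);
    }
    return h;
}
```

Within one run (one seed) the value for a given key is stable, so it can
be precomputed and passed as the C<hash> argument. A value computed under
one seed is of no use under another, which is the situation between two
separate perl processes.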
	546
	547	See L<Understanding the Magic of Tied Hashes and Arrays> for more
	548	information on how to use the hash access functions on tied hashes.
	549
	550	=head2 Hash API Extensions
	551
	552	Beginning with version 5.004, the following functions are also supported:
	553
	554	HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
	555	HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
	556
	557	bool hv_exists_ent (HV* tb, SV* key, U32 hash);
	558	SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
	559
	560	SV* hv_iterkeysv (HE* entry);
	561
	562	Note that these functions take C<SV*> keys, which simplifies writing
	563	of extension code that deals with hash structures. These functions
	564	also allow passing of C<SV*> keys to C<tie> functions without forcing
	565	you to stringify the keys (unlike the previous set of functions).
	566
	567	They also return and accept whole hash entries (C<HE*>), making their
	568	use more efficient (since the hash number for a particular string
	569	doesn't have to be recomputed every time). See L<perlapi> for detailed
	570	descriptions.
	571
	572	The following macros must always be used to access the contents of hash
	573	entries. Note that the arguments to these macros must be simple
	574	variables, since they may get evaluated more than once. See
	575	L<perlapi> for detailed descriptions of these macros.
	576
	577	HePV(HE* he, STRLEN len)
	578	HeVAL(HE* he)
	579	HeHASH(HE* he)
	580	HeSVKEY(HE* he)
	581	HeSVKEY_force(HE* he)
	582	HeSVKEY_set(HE* he, SV* sv)
	583
	584	These two lower level macros are defined, but must only be used when
	585	dealing with keys that are not C<SV*>s:
	586
	587	HeKEY(HE* he)
	588	HeKLEN(HE* he)
	589
	590	Note that both C<hv_store> and C<hv_store_ent> do not increment the
	591	reference count of the stored C<val>, which is the caller's responsibility.
	592	If these functions return a NULL value, the caller will usually have to
	593	decrement the reference count of C<val> to avoid a memory leak.
	594
	595	=head2 AVs, HVs and undefined values
	596
	597	Sometimes you have to store undefined values in AVs or HVs. Although
	598	this may be a rare case, it can be tricky. That's because you're
	599	used to using C<&PL_sv_undef> if you need an undefined SV.
	600
	601	For example, intuition tells you that this XS code:
	602
	603	AV *av = newAV();
	604	av_store( av, 0, &PL_sv_undef );
	605
	606	is equivalent to this Perl code:
	607
	608	my @av;
	609	$av[0] = undef;
	610
	611	Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use C<&PL_sv_undef> as a marker
	612	for indicating that an array element has not yet been initialized.
	613	Thus, C<exists $av[0]> would be true for the above Perl code, but
	614	false for the array generated by the XS code. In perl 5.20, storing
	615	&PL_sv_undef will create a read-only element, because the scalar
	616	&PL_sv_undef itself is stored, not a copy.
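
The marker behaviour described for perl 5.18 and earlier can be modelled
in plain C. This is a toy sketch, not the code in F<av.c>: C<struct
toy_scalar>, C<toy_undef>, C<struct toy_array> and C<toy_exists> are
hypothetical names, with C<toy_undef> standing in for C<PL_sv_undef>.

```c
#include <assert.h>
#include <stddef.h>

struct toy_scalar { int defined; int value; };

/* One shared "undef" object, standing in for PL_sv_undef. */
static struct toy_scalar toy_undef = { 0, 0 };

#define TOY_SLOTS 8
struct toy_array { struct toy_scalar *slot[TOY_SLOTS]; };

/* An element "exists" only if its slot holds something other than
 * NULL or the shared undef object, which doubles as the marker for
 * "never initialized". */
static int toy_exists(const struct toy_array *a, size_t key)
{
    assert(a != NULL);
    return key < TOY_SLOTS
        && a->slot[key] != NULL
        && a->slot[key] != &toy_undef;
}
```

A slot holding the shared object is reported as not existing, while a
slot holding a separate, freshly created undefined scalar is reported as
existing, even though both are undefined.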
	617
	618	Similar problems can occur when storing C<&PL_sv_undef> in HVs:
	619
	620	hv_store( hv, "key", 3, &PL_sv_undef, 0 );
	621
	622	This will indeed make the value C<undef>, but if you try to modify
	623	the value of C<key>, you'll get the following error:
	624
	625	Modification of non-creatable hash value attempted
	626
	627	In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
	628	in restricted hashes. This caused such hash entries not to appear
	629	when iterating over the hash or when checking for the keys
	630	with the C<hv_exists> function.
	631
	632	You can run into similar problems when you store C<&PL_sv_yes> or
	633	C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
	634	will give you the following error:
	635
	636	Modification of a read-only value attempted
	637
	638	To make a long story short, you can use the special variables
	639	C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
	640	HVs, but you have to make sure you know what you're doing.
	641
	642	Generally, if you want to store an undefined value in an AV
	643	or HV, you should not use C<&PL_sv_undef>, but rather create a
	644	new undefined value using the C<newSV> function, for example:
	645
	646	av_store( av, 42, newSV(0) );
	647	hv_store( hv, "foo", 3, newSV(0), 0 );
	648
	649	=head2 References
	650
	651	References are a special type of scalar that point to other data types
	652	(including other references).
	653
	654	To create a reference, use either of the following functions:
	655
	656	SV* newRV_inc((SV*) thing);
	657	SV* newRV_noinc((SV*) thing);
	658
The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.
	663
	664	Once you have a reference, you can use the following macro to dereference
	665	the reference:
	666
	667	SvRV(SV*)
	668
then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.
	671
	672	To determine if an SV is a reference, you can use the following macro:
	673
	674	SvROK(SV*)
	675
	676	To discover what type of value the reference refers to, use the following
	677	macro and then check the return value.
	678
	679	SvTYPE(SvRV(SV*))
	680
	681	The most useful types that will be returned are:
	682
    < SVt_PVAV  Scalar
    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)
	688
	689	See L<perlapi/svtype> for more details.
	690
	691	=head2 Blessed References and Class Objects
	692
	693	References are also used to support object-oriented programming. In perl's
	694	OO lexicon, an object is simply a reference that has been blessed into a
	695	package (or class). Once blessed, the programmer may now use the reference
	696	to access the various methods in the class.
	697
	698	A reference can be blessed into a package with the following function:
	699
	700	SV* sv_bless(SV* sv, HV* stash);
	701
	702	The C<sv> argument must be a reference value. The C<stash> argument
	703	specifies which class the reference will belong to. See
	704	L<Stashes and Globs> for information on converting class names into stashes.
	705
	706	/* Still under construction */
	707
The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the SV is blessed into the specified class. The SV is returned.
	711
	712	SV* newSVrv(SV* rv, const char* classname);
	713
	714	The following three functions copy integer, unsigned integer or double
	715	into an SV whose reference is C<rv>. SV is blessed if C<classname> is
	716	non-null.
	717
    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
	721
	722	The following function copies the pointer value (I<the address, not the
	723	string!>) into an SV whose reference is rv. SV is blessed if C<classname>
	724	is non-null.
	725
	726	SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
	727
	728	The following function copies a string into an SV whose reference is C<rv>.
	729	Set length to 0 to let Perl calculate the string length. SV is blessed if
	730	C<classname> is non-null.
	731
	732	SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
	733	STRLEN length);
	734
	735	The following function tests whether the SV is blessed into the specified
	736	class. It does not check inheritance relationships.
	737
	738	int sv_isa(SV* sv, const char* name);
	739
	740	The following function tests whether the SV is a reference to a blessed object.
	741
	742	int sv_isobject(SV* sv);
	743
	744	The following function tests whether the SV is derived from the specified
	745	class. SV can be either a reference to a blessed object or a string
	746	containing a class name. This is the function implementing the
	747	C<UNIVERSAL::isa> functionality.
	748
	749	bool sv_derived_from(SV* sv, const char* name);
	750
	751	To check if you've got an object derived from a specific class you have
	752	to write:
	753
	754	if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
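
The difference between an exact class match and a derivation check can
be sketched in plain C. This is a toy model only: C<struct toy_class>,
C<toy_isa> and C<toy_derived_from> are hypothetical names, and the model
assumes single inheritance through one C<parent> pointer, whereas Perl
classes may list several parents in C<@ISA>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_class {
    const char             *name;
    const struct toy_class *parent;  /* NULL for a root class */
};

/* Exact match only, in the spirit of sv_isa: parents are ignored. */
static int toy_isa(const struct toy_class *c, const char *name)
{
    assert(name != NULL);
    return c != NULL && strcmp(c->name, name) == 0;
}

/* Walks up the parent chain, in the spirit of sv_derived_from. */
static int toy_derived_from(const struct toy_class *c, const char *name)
{
    assert(name != NULL);
    for (; c != NULL; c = c->parent)
        if (strcmp(c->name, name) == 0)
            return 1;
    return 0;
}
```

With a C<Dog> class whose parent is C<Animal>, the exact check accepts
only C<"Dog">, while the derivation check accepts both names.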
	755
	756	=head2 Creating New Variables
	757
	758	To create a new Perl variable with an undef value which can be accessed from
	759	your Perl script, use the following routines, depending on the variable type.
	760
	761	SV* get_sv("package::varname", GV_ADD);
	762	AV* get_av("package::varname", GV_ADD);
	763	HV* get_hv("package::varname", GV_ADD);
	764
	765	Notice the use of GV_ADD as the second parameter. The new variable can now
	766	be set, using the routines appropriate to the data type.
	767
	768	There are additional macros whose values may be bitwise OR'ed with the
	769	C<GV_ADD> argument to enable certain extra features. Those bits are:
	770
	771	=over
	772
	773	=item GV_ADDMULTI
	774
	775	Marks the variable as multiply defined, thus preventing the:
	776
	777	Name <varname> used only once: possible typo
	778
	779	warning.
	780
	781	=item GV_ADDWARN
	782
	783	Issues the warning:
	784
	785	Had to create <varname> unexpectedly
	786
	787	if the variable did not exist before the function was called.
	788
	789	=back
	790
	791	If you do not specify a package name, the variable is created in the current
	792	package.
	793
	794	=head2 Reference Counts and Mortality
	795
	796	Perl uses a reference count-driven garbage collection mechanism. SVs,
	797	AVs, or HVs (xV for short in the following) start their life with a
	798	reference count of 1. If the reference count of an xV ever drops to 0,
	799	then it will be destroyed and its memory made available for reuse.
	800
	801	This normally doesn't happen at the Perl level unless a variable is
	802	undef'ed or the last variable holding a reference to it is changed or
	803	overwritten. At the internal level, however, reference counts can be
	804	manipulated with the following macros:
	805
	806	int SvREFCNT(SV* sv);
	807	SV* SvREFCNT_inc(SV* sv);
	808	void SvREFCNT_dec(SV* sv);
	809
	810	However, there is one other function which manipulates the reference
	811	count of its argument. The C<newRV_inc> function, you will recall,
	812	creates a reference to the specified argument. As a side effect,
	813	it increments the argument's reference count. If this is not what
	814	you want, use C<newRV_noinc> instead.
	815
	816	For example, imagine you want to return a reference from an XSUB function.
	817	Inside the XSUB routine, you create an SV which initially has a reference
	818	count of one. Then you call C<newRV_inc>, passing it the just-created SV.
	819	This returns the reference as a new SV, but the reference count of the
	820	SV you passed to C<newRV_inc> has been incremented to two. Now you
	821	return the reference from the XSUB routine and forget about the SV.
	822	But Perl hasn't! Whenever the returned reference is destroyed, the
	823	reference count of the original SV is decreased to one and nothing happens.
	824	The SV will hang around without any way to access it until Perl itself
	825	terminates. This is a memory leak.
	826
	827	The correct procedure, then, is to use C<newRV_noinc> instead of
	828	C<newRV_inc>. Then, if and when the last reference is destroyed,
	829	the reference count of the SV will go to zero and it will be destroyed,
	830	stopping any memory leak.
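
The bookkeeping behind this leak can be followed in a small stand-alone
model. The sketch below does not use the Perl API; C<toy_sv>,
C<toy_ref_inc>, C<toy_ref_noinc> and the other names are hypothetical, and
the only assumption carried over from the text is that a new value starts
with a count of 1 and that a dying reference decrements its referent.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an SV: a reference count and, for a "reference"
 * value, a pointer to its referent. This is not perl's layout. */
typedef struct toy_sv {
    int refcnt;
    struct toy_sv *referent;   /* NULL unless this value is a reference */
} toy_sv;

int toy_live = 0;              /* toy values allocated and not yet freed */

toy_sv *toy_new(void) {
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;            /* new values start life at 1 */
    toy_live++;
    return sv;
}

void toy_dec(toy_sv *sv) {
    if (sv == NULL)
        return;
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        toy_dec(sv->referent); /* a dying reference releases its referent */
        free(sv);
        toy_live--;
    }
}

/* Models newRV_noinc: the reference takes over the caller's count. */
toy_sv *toy_ref_noinc(toy_sv *target) {
    toy_sv *rv = toy_new();
    rv->referent = target;
    return rv;
}

/* Models newRV_inc: the reference adds a count of its own. */
toy_sv *toy_ref_inc(toy_sv *target) {
    target->refcnt++;
    return toy_ref_noinc(target);
}

/* Create a value, wrap it in a reference, drop the reference, and
 * report how many toy values are still allocated afterwards. */
int toy_leaked_after(int use_inc) {
    int before = toy_live;
    toy_sv *sv = toy_new();
    toy_sv *rv = use_inc ? toy_ref_inc(sv) : toy_ref_noinc(sv);
    toy_dec(rv);               /* the returned reference goes away */
    return toy_live - before;
}
```

In this model C<toy_leaked_after(1)> leaves one value stranded with a count
of one, while C<toy_leaked_after(0)> leaves none.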
	831
	832	There are some convenience functions available that can help with the
	833	destruction of xVs. These functions introduce the concept of "mortality".
	834	An xV that is mortal has had its reference count marked to be decremented,
	835	but not actually decremented, until "a short time later". Generally the
	836	term "short time later" means a single Perl statement, such as a call to
	837	an XSUB function. The actual determinant for when mortal xVs have their
	838	reference count decremented depends on two macros, SAVETMPS and FREETMPS.
	839	See L<perlcall> and L<perlxs> for more details on these macros.
	840
	841	"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
	842	However, if you mortalize a variable twice, the reference count will
	843	later be decremented twice.
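
Reading mortalization as "remember this value and decrement it later" can
be modelled with a small array acting as the temporaries stack. This is a
simplified sketch with hypothetical names (C<toy_2mortal>,
C<toy_freetmps>), not perl's code; in particular the model never frees
anything, it only shows when the decrements happen and how many there are.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

/* Toy stand-in for the temporaries stack. */
toy_sv *toy_tmps[TOY_TMPS_MAX];
size_t  toy_tmps_ix = 0;

/* Models sv_2mortal: record the value now, decrement it later. */
toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models FREETMPS down to a mark taken earlier (the role of SAVETMPS):
 * one decrement for every recorded entry. */
void toy_freetmps(size_t base) {
    while (toy_tmps_ix > base) {
        toy_sv *sv = toy_tmps[--toy_tmps_ix];
        sv->refcnt--;          /* a real interpreter would free at zero */
    }
}

/* Mortalize one value `times` times and return its count once the
 * temporaries have been released. */
int toy_refcnt_after_mortalizing(int start, int times) {
    toy_sv sv = { start };
    size_t base = toy_tmps_ix;
    for (int i = 0; i < times; i++)
        toy_2mortal(&sv);
    assert(sv.refcnt == start);   /* nothing changes until the release */
    toy_freetmps(base);
    return sv.refcnt;
}
```

A value with a count of 1 that is mortalized twice ends at -1 in this
model, which corresponds to a double free in a real interpreter.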
	844
	845	"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
	846	For example an SV which is created just to pass a number to a called sub
	847	is made mortal to have it cleaned up automatically when it's popped off
	848	the stack. Similarly, results returned by XSUBs (which are pushed on the
	849	stack) are often made mortal.
	850
	851	To create a mortal variable, use the functions:
	852
	853	SV* sv_newmortal()
	854	SV* sv_2mortal(SV*)
	855	SV* sv_mortalcopy(SV*)
	856
	857	The first call creates a mortal SV (with no value), the second converts an existing
	858	SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
	859	third creates a mortal copy of an existing SV.
	860	Because C<sv_newmortal> gives the new SV no value, it must normally be given one
	861	via C<sv_setpv>, C<sv_setiv>, etc.:
	862
	863	SV *tmp = sv_newmortal();
	864	sv_setiv(tmp, an_integer);
	865
	866	As that is multiple C statements, it is quite common to see this idiom instead:
	867
	868	SV *tmp = sv_2mortal(newSViv(an_integer));
	869
	870
	871	You should be careful about creating mortal variables. Strange things
	872	can happen if you make the same value mortal within multiple contexts,
	873	or if you make a variable mortal multiple
	874	times. Thinking of "Mortalization"
	875	as deferred C<SvREFCNT_dec> should help to minimize such problems.
	876	For example if you are passing an SV which you I<know> has a high enough REFCNT
	877	to survive its use on the stack you need not do any mortalization.
	878	If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
	879	making a C<sv_mortalcopy> is safer.
	880
	881	The mortal routines are not just for SVs; AVs and HVs can be
	882	made mortal by passing their address (cast to C<SV*>) to the
	883	C<sv_2mortal> or C<sv_mortalcopy> routines.
	884
	885	=head2 Stashes and Globs
	886
	887	A B<stash> is a hash that contains all variables that are defined
	888	within a package. Each key of the stash is a symbol
	889	name (shared by all the different types of objects that have the same
	890	name), and each value in the hash table is a GV (Glob Value). This GV
	891	in turn contains references to the various objects of that name,
	892	including (but not limited to) the following:
	893
	894	Scalar Value
	895	Array Value
	896	Hash Value
	897	I/O Handle
	898	Format
	899	Subroutine
	900
	901	There is a single stash called C<PL_defstash> that holds the items that exist
	902	in the C<main> package. To get at the items in other packages, append the
	903	string "::" to the package name. The items in the C<Foo> package are in
	904	the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
	905	in the stash C<Baz::> in C<Bar::>'s stash.
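
The nesting rule can be illustrated by computing the chain of stash keys
for a package name. The helper below, C<toy_stash_key>, is hypothetical
and only manipulates strings; it does not touch any real stash, and it
assumes C<::> is the only package separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the idx-th stash key for package `pkg` into `out` (for example
 * "Bar::Baz" yields "Bar::" for idx 0 and "Baz::" for idx 1).
 * Returns 1 on success, 0 if there is no such component or `out` is
 * too small. */
int toy_stash_key(const char *pkg, size_t idx, char *out, size_t outlen) {
    assert(pkg != NULL && out != NULL);
    const char *start = pkg;
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);
        if (idx == 0) {
            if (len == 0 || len + 3 > outlen)
                return 0;
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);   /* copies the trailing NUL too */
            return 1;
        }
        if (sep == NULL)
            return 0;                     /* ran out of components */
        start = sep + 2;
        idx--;
    }
}
```

Looking up each key in turn, starting from the top-level stash, mirrors
the description above: C<Bar::> first, then C<Baz::> inside it.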
	906
	907	To get the stash pointer for a particular package, use the function:
	908
	909	HV* gv_stashpv(const char* name, I32 flags)
	910	HV* gv_stashsv(SV*, I32 flags)
	911
	912	The first function takes a literal string, the second uses the string stored
	913	in the SV. Remember that a stash is just a hash table, so you get back an
	914	C<HV*>. The C<flags> flag will create a new package if it is set to GV_ADD.
	915
	916	The name that C<gv_stash*v> wants is the name of the package whose symbol table
	917	you want. The default package is called C<main>. If you have multiply nested
	918	packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
	919	language itself.
	920
	921	Alternatively, if you have an SV that is a blessed reference, you can find
	922	out the stash pointer by using:
	923
	924	HV* SvSTASH(SvRV(SV*));
	925
	926	then use the following to get the package name itself:
	927
	928	char* HvNAME(HV* stash);
	929
	930	If you need to bless or re-bless an object you can use the following
	931	function:
	932
	933	SV* sv_bless(SV*, HV* stash)
	934
	935	where the first argument, an C<SV*>, must be a reference, and the second
	936	argument is a stash. The returned C<SV*> can now be used in the same way
	937	as any other SV.
	938
	939	For more information on references and blessings, consult L<perlref>.
	940
	941	=head2 Double-Typed SVs
	942
	943	Scalar variables normally contain only one type of value, an integer,
	944	double, pointer, or reference. Perl will automatically convert the
	945	actual scalar data from the stored type into the requested type.
	946
	947	Some scalar variables contain more than one type of scalar data. For
	948	example, the variable C<$!> contains either the numeric value of C<errno>
	949	or its string equivalent from either C<strerror> or C<sys_errlist[]>.
	950
	951	To force multiple data values into an SV, you must do two things: use the
	952	C<sv_set*v> routines to add the additional scalar type, then set a flag
	953	so that Perl will believe it contains more than one type of data. The
	954	four macros to set the flags are:
	955
	956	SvIOK_on
	957	SvNOK_on
	958	SvPOK_on
	959	SvROK_on
	960
	961	The particular macro you must use depends on which C<sv_set*v> routine
	962	you called first. This is because every C<sv_set*v> routine turns on
	963	only the bit for the particular type of data being set, and turns off
	964	all the rest.
	965
	966	For example, to create a new Perl variable called "dberror" that contains
	967	both the numeric and descriptive string error values, you could use the
	968	following code:
	969
	970	extern int dberror;
	971	extern char *dberror_list[];
	972
	973	SV* sv = get_sv("dberror", GV_ADD);
	974	sv_setiv(sv, (IV) dberror);
	975	sv_setpv(sv, dberror_list[dberror]);
	976	SvIOK_on(sv);
	977
	978	If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
	979	macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
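
Why the order matters can be seen in a model where each setter turns on
its own flag and clears the others, as described above. The names here
(C<toy_setiv>, C<toy_setpv>, C<toy_iok_on>, the C<TOY_*> flags) are
hypothetical, and the structure is far simpler than a real SV.

```c
#include <assert.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_NOK = 2, TOY_POK = 4 };

typedef struct {
    unsigned flags;    /* which slots currently hold a valid value */
    long     iv;
    char     pv[32];
} toy_sv;

/* Models sv_setiv: store the integer and mark only the integer valid. */
void toy_setiv(toy_sv *sv, long iv) {
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Models sv_setpv: store the string and mark only the string valid. */
void toy_setpv(toy_sv *sv, const char *s) {
    assert(strlen(s) < sizeof sv->pv);
    strcpy(sv->pv, s);
    sv->flags = TOY_POK;
}

/* Models SvIOK_on: declare the integer slot valid again. */
void toy_iok_on(toy_sv *sv) {
    sv->flags |= TOY_IOK;
}

/* Follow the dberror recipe (integer first, then string) and return
 * the resulting flags, with or without the final flag fix-up. */
unsigned toy_dual_flags(int call_iok_on) {
    toy_sv sv = { 0, 0, "" };
    toy_setiv(&sv, 42);
    toy_setpv(&sv, "out of widgets");
    if (call_iok_on)
        toy_iok_on(&sv);
    return sv.flags;
}
```

Without the last step only the string flag survives; with it, both the
integer and string slots are marked valid.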
	980
	981	=head2 Read-Only Values
	982
	983	In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
	984	flag bit with read-only scalars. So the only way to test whether
	985	C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
	986	in those versions is:
	987
	988	SvREADONLY(sv) && !SvIsCOW(sv)
	989
	990	Under Perl 5.18 and later, SvREADONLY only applies to read-only variables,
	991	and, under 5.20, copy-on-write scalars can also be read-only, so the above
	992	check is incorrect. You just want:
	993
	994	SvREADONLY(sv)
	995
	996	If you need to do this check often, define your own macro like this:
	997
	998	#if PERL_VERSION >= 18
	999	# define SvTRULYREADONLY(sv) SvREADONLY(sv)
	1000	#else
	1001	# define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
	1002	#endif
	1003
	1004	=head2 Copy on Write
	1005
	1006	Perl implements a copy-on-write (COW) mechanism for scalars, in which
	1007	string copies are not immediately made when requested, but are deferred
	1008	until made necessary by one or the other scalar changing. This is mostly
	1009	transparent, but one must take care not to modify string buffers that are
	1010	shared by multiple SVs.
	1011
	1012	You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.
	1013
	1014	You can force an SV to make its own copy of its string buffer by calling C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.
	1015
	1016	If you want to make the SV drop its string buffer, use
	1017	C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
	1018	C<sv_setsv(sv, NULL)>.
	1019
	1020	All of these functions will croak on read-only scalars (see the previous
	1021	section for more on those).
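
The idea of deferring the copy until a write, and of un-sharing before
modifying a buffer, can be sketched independently of perl. The model below
is an assumption-laden illustration: C<toy_buf>, C<toy_copy>,
C<toy_force_normal> and the sharer count are hypothetical and do not
reflect how perl actually stores its COW bookkeeping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string buffer that several toy scalars may share. */
typedef struct { int sharers; char *bytes; } toy_buf;
typedef struct { toy_buf *buf; } toy_sv;

toy_sv toy_new(const char *s) {
    toy_sv sv;
    sv.buf = malloc(sizeof *sv.buf);
    assert(sv.buf != NULL);
    sv.buf->sharers = 1;
    sv.buf->bytes = malloc(strlen(s) + 1);
    assert(sv.buf->bytes != NULL);
    strcpy(sv.buf->bytes, s);
    return sv;
}

/* "Copy" by sharing the buffer; the real copy is deferred. */
toy_sv toy_copy(toy_sv *src) {
    src->buf->sharers++;
    return *src;
}

/* Plays the role of SvIsCOW in this model. */
int toy_is_cow(const toy_sv *sv) {
    return sv->buf->sharers > 1;
}

/* Plays the role of sv_force_normal: give sv a private buffer if it
 * is currently sharing one. */
void toy_force_normal(toy_sv *sv) {
    if (!toy_is_cow(sv))
        return;
    toy_buf *shared = sv->buf;
    *sv = toy_new(shared->bytes);
    shared->sharers--;
}

/* Writers un-share first; otherwise every sharer would see the change. */
void toy_set_first_byte(toy_sv *sv, char c) {
    assert(sv->buf->bytes[0] != '\0');   /* model handles non-empty only */
    toy_force_normal(sv);
    sv->buf->bytes[0] = c;
}
```

Writing through C<toy_set_first_byte> changes only the scalar being
written; skipping the C<toy_force_normal> call would alter both.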
	1022
	1023	To test that your code is behaving correctly and not modifying COW buffers,
	1024	on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
	1025	C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
	1026	into crashes. You will find it to be marvellously slow, so you may want to
	1027	skip perl's own tests.
	1028
	1029	=head2 Magic Variables
	1030
	1031	[This section still under construction. Ignore everything here. Post no
	1032	bills. Everything not permitted is forbidden.]
	1033
	1034	Any SV may be magical, that is, it has special features that a normal
	1035	SV does not have. These features are stored in the SV structure in a
	1036	linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
	1037
	1038	struct magic {
	1039	MAGIC* mg_moremagic;
	1040	MGVTBL* mg_virtual;
	1041	U16 mg_private;
	1042	char mg_type;
	1043	U8 mg_flags;
	1044	I32 mg_len;
	1045	SV* mg_obj;
	1046	char* mg_ptr;
	1047	};
	1048
	1049	Note this is current as of patchlevel 0, and could change at any time.
	1050
	1051	=head2 Assigning Magic
	1052
	1053	Perl adds magic to an SV using the C<sv_magic> function:
	1054
	1055	void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
	1056
	1057	The C<sv> argument is a pointer to the SV that is to acquire a new magical
	1058	feature.
	1059
	1060	If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
	1061	convert C<sv> to type C<SVt_PVMG>.
	1062	Perl then continues by adding new magic
	1063	to the beginning of the linked list of magical features. Any prior entry
	1064	of the same type of magic is deleted. Note that this can be overridden,
	1065	and multiple instances of the same type of magic can be associated with an
	1066	SV.
	1067
	1068	The C<name> and C<namlen> arguments are used to associate a string with
	1069	the magic, typically the name of a variable. C<namlen> is stored in the
	1070	C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
	1071	C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
	1072	whether C<namlen> is greater than zero or equal to zero respectively. As a
	1073	special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
	1074	to contain an C<SV*> and is stored as-is with its REFCNT incremented.
	1075
	1076	The sv_magic function uses C<how> to determine which, if any, predefined
	1077	"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
	1078	See the L<Magic Virtual Tables> section below. The C<how> argument is also
	1079	stored in the C<mg_type> field. The value of
	1080	C<how> should be chosen from the set of macros
	1081	C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
	1082	these macros were added, Perl internals used to directly use character
	1083	literals, so you may occasionally come across old code or documentation
	1084	referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.
	1085
	1086	The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
	1087	structure. If it is not the same as the C<sv> argument, the reference
	1088	count of the C<obj> object is incremented. If it is the same, or if
	1089	the C<how> argument is C<PERL_MAGIC_arylen>, or if it is a NULL pointer,
	1090	then C<obj> is merely stored, without the reference count being incremented.
	1091
	1092	See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
	1093	to an SV.
	1094
	1095	There is also a function to add magic to an C<HV>:
	1096
	1097	void hv_magic(HV *hv, GV *gv, int how);
	1098
	1099	This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.
	1100
	1101	To remove the magic from an SV, call the function sv_unmagic:
	1102
	1103	int sv_unmagic(SV *sv, int type);
	1104
	1105	The C<type> argument should be equal to the C<how> value when the C<SV>
	1106	was initially made magical.
	1107
	1108	However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
	1109	C<SV>. If you want to remove only certain
	1110	magic of a C<type> based on the magic
	1111	virtual table, use C<sv_unmagicext> instead:
	1112
	1113	int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
	1114
	1115	=head2 Magic Virtual Tables
	1116
	1117	The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
	1118	C<MGVTBL> (short for "Magic Virtual Table"), a structure of function
	1119	pointers that handle the various operations that might be applied to
	1120	that variable.
	1121
	1122	The C<MGVTBL> has five (or sometimes eight) pointers to the following
	1123	routine types:
	1124
	1125	int  (*svt_get)(SV* sv, MAGIC* mg);
	1126	int  (*svt_set)(SV* sv, MAGIC* mg);
	1127	U32  (*svt_len)(SV* sv, MAGIC* mg);
	1128	int  (*svt_clear)(SV* sv, MAGIC* mg);
	1129	int  (*svt_free)(SV* sv, MAGIC* mg);
	1130
	1131	int  (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv,
	1132	                 const char *name, I32 namlen);
	1133	int  (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param);
	1134	int  (*svt_local)(SV *nsv, MAGIC *mg);
	1135
	1136
	1137	This MGVTBL structure is set at compile-time in F<perl.h> and there are
	1138	currently 32 types. These different structures contain pointers to various
	1139	routines that perform additional actions depending on which function is
	1140	being called.
	1141
	1142	Function pointer    Action taken
	1143	----------------    ------------
	1144	svt_get             Do something before the value of the SV is
	1145	                    retrieved.
	1146	svt_set             Do something after the SV is assigned a value.
	1147	svt_len             Report on the SV's length.
	1148	svt_clear           Clear something the SV represents.
	1149	svt_free            Free any extra storage associated with the SV.
	1150
	1151	svt_copy            copy tied variable magic to a tied element
	1152	svt_dup             duplicate a magic structure during thread cloning
	1153	svt_local           copy magic to local value during 'local'
	1154
	1155	For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
	1156	to an C<mg_type> of C<PERL_MAGIC_sv>) contains:
	1157
	1158	{ magic_get, magic_set, magic_len, 0, 0 }
	1159
	1160	Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
	1161	if a get operation is being performed, the routine C<magic_get> is
	1162	called. All the various routines for the various magical types begin
	1163	with C<magic_>. NOTE: the magic routines are not considered part of
	1164	the Perl API, and may not be exported by the Perl library.
	1165
	1166	The last three slots are a recent addition, and for source code
	1167	compatibility they are only checked for if one of the three flags
	1168	MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.
	1169	This means that most code can continue declaring
	1170	a vtable as a 5-element value. These three are
	1171	currently used exclusively by the threading code, and are highly subject
	1172	to change.
	1173
	1174	The current kinds of Magic Virtual Tables are:
	1175
	1176	=for comment
	1177	This table is generated by regen/mg_vtable.pl. Any changes made here
	1178	will be lost.
	1179
	1180	=for mg_vtable.pl begin
	1181
	1182	mg_type
	1183	(old-style char and macro)   MGVTBL         Type of magic
	1184	--------------------------   ------         -------------
	1185	\0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
	1186	#  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
	1187	%  PERL_MAGIC_rhash          (none)         Extra data for restricted
	1188	                                            hashes
	1189	*  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
	1190	                                            vars
	1191	.  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
	1192	:  PERL_MAGIC_symtab         (none)         Extra data for symbol
	1193	                                            tables
	1194	<  PERL_MAGIC_backref        vtbl_backref   For weak ref data
	1195	@  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
	1196	B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
	1197	                                            (fast string search)
	1198	c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
	1199	                                            (AMT) on stash
	1200	D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
	1201	                                            (@+ and @- vars)
	1202	d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
	1203	                                            element
	1204	E  PERL_MAGIC_env            vtbl_env       %ENV hash
	1205	e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
	1206	f  PERL_MAGIC_fm             vtbl_regexp    Formline
	1207	                                            ('compiled' format)
	1208	g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
	1209	H  PERL_MAGIC_hints          vtbl_hints     %^H hash
	1210	h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
	1211	I  PERL_MAGIC_isa            vtbl_isa       @ISA array
	1212	i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
	1213	k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
	1214	L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
	1215	l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
	1216	                                            element
	1217	N  PERL_MAGIC_shared         (none)         Shared between threads
	1218	n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
	1219	o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
	1220	P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
	1221	p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
	1222	q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
	1223	r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
	1224	S  PERL_MAGIC_sig            (none)         %SIG hash
	1225	s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
	1226	t  PERL_MAGIC_taint          vtbl_taint     Taintedness
	1227	U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
	1228	                                            extensions
	1229	u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
	1230	                                            extensions
	1231	V  PERL_MAGIC_vstring        (none)         SV was vstring literal
	1232	v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
	1233	w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
	1234	x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
	1235	y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
	1236	                                            variable / smart parameter
	1237	                                            vivification
	1238	\  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
	1239	                                            constructor
	1240	]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
	1241	                                            to this CV
	1242	~  PERL_MAGIC_ext            (none)         Available for use by
	1243	                                            extensions
	1244
	1245	=for mg_vtable.pl end
	1246
	1247	When an uppercase and lowercase letter both exist in the table, then the
	1248	uppercase letter is typically used to represent some kind of composite type
	1249	(a list or a hash), and the lowercase letter is used to represent an element
	1250	of that composite type. Some internals code makes use of this case
	1251	relationship. However, 'v' and 'V' (vec and v-string) are in no way related.
	1252
	1253	The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
	1254	specifically for use by extensions and will not be used by perl itself.
	1255	Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
	1256	to variables (typically objects). This is especially useful because
	1257	there is no way for normal perl code to corrupt this private information
	1258	(unlike using extra elements of a hash object).
	1259
	1260	Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
	1261	C function any time a scalar's value is used or changed. The C<MAGIC>'s
	1262	C<mg_ptr> field points to a C<ufuncs> structure:
	1263
	1264	struct ufuncs {
	1265	I32 (*uf_val)(pTHX_ IV, SV*);
	1266	I32 (*uf_set)(pTHX_ IV, SV*);
	1267	IV uf_index;
	1268	};
	1269
	1270	When the SV is read from or written to, the C<uf_val> or C<uf_set>
	1271	function will be called with C<uf_index> as the first arg and a pointer to
	1272	the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
	1273	magic is shown below. Note that the ufuncs structure is copied by
	1274	sv_magic, so you can safely allocate it on the stack.
	1275
	1276	void
	1277	Umagic(sv)
	1278	SV *sv;
	1279	PREINIT:
	1280	struct ufuncs uf;
	1281	CODE:
	1282	uf.uf_val = &my_get_fn;
	1283	uf.uf_set = &my_set_fn;
	1284	uf.uf_index = 0;
	1285	sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
	1286
	1287	Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.
	1288
	1289	For hashes there is a specialized hook that gives control over hash
	1290	keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
	1291	if the "set" function in the C<ufuncs> structure is NULL. The hook
	1292	is activated whenever the hash is accessed with a key specified as
	1293	an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
	1294	C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
	1295	through the functions without the C<..._ent> suffix circumvents the
	1296	hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.
	1297
	1298	Note that because multiple extensions may be using C<PERL_MAGIC_ext>
	1299	or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
	1300	extra care to avoid conflict. Typically only using the magic on
	1301	objects blessed into the same class as the extension is sufficient.
	1302	For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
	1303	C<MGVTBL>, even if all its fields will be C<0>, so that individual
	1304	C<MAGIC> pointers can be identified as a particular kind of magic
	1305	using their magic virtual table. C<mg_findext> provides an easy way
	1306	to do that:
	1307
	1308	STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
	1309
	1310	MAGIC *mg;
	1311	if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
	1312	/* this is really ours, not another module's PERL_MAGIC_ext */
	1313	my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
	1314	...
	1315	}
	1316
	1317	Also note that the C<sv_set*()> and C<sv_cat*()> functions described
	1318	earlier do B<not> invoke 'set' magic on their targets. This must
	1319	be done by the user either by calling the C<SvSETMAGIC()> macro after
	1320	calling these functions, or by using one of the C<sv_set*_mg()> or
	1321	C<sv_cat*_mg()> functions. Similarly, generic C code must call the
	1322	C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
	1323	obtained from external sources in functions that don't handle magic.
	1324	See L<perlapi> for a description of these functions.
	1325	For example, calls to the C<sv_cat*()> functions typically need to be
	1326	followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
	1327	since their implementation handles 'get' magic.
	1328
	1329	=head2 Finding Magic
	1330
	1331	MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
	1332	                                   * type */
	1333
	1334	This routine returns a pointer to a C<MAGIC> structure stored in the SV.
	1335	If the SV does not have that magical
	1336	feature, C<NULL> is returned. If the
	1337	SV has multiple instances of that magical feature, the first one will be
	1338	returned. C<mg_findext> can be used
	1339	to find a C<MAGIC> structure of an SV
	1340	based on both its magic type and its magic virtual table:
	1341
	1342	MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
	1343
	1344	Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
	1345	SVt_PVMG, Perl may core dump.
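
The difference between looking up magic by type alone and by type plus
virtual table can be shown on a hand-built linked list. This is a toy
model with hypothetical names (C<toy_magic>, C<toy_find>,
C<toy_findext>); only the field roles are borrowed from C<struct magic>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an MGVTBL; only its address matters in this model. */
typedef struct toy_vtbl { int unused; } toy_vtbl;

typedef struct toy_magic {
    struct toy_magic *moremagic;  /* next entry, like mg_moremagic */
    const toy_vtbl   *vtbl;       /* like mg_virtual */
    char              type;       /* like mg_type */
    void             *ptr;        /* like mg_ptr */
} toy_magic;

/* New magic is added at the beginning of the list. */
void toy_magic_push(toy_magic **head, toy_magic *mg) {
    assert(head != NULL && mg != NULL);
    mg->moremagic = *head;
    *head = mg;
}

/* Like mg_find: the first entry of the requested type, or NULL. */
toy_magic *toy_find(toy_magic *head, char type) {
    for (toy_magic *mg = head; mg != NULL; mg = mg->moremagic)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* Like mg_findext: the type and the virtual table must both match. */
toy_magic *toy_findext(toy_magic *head, char type, const toy_vtbl *vtbl) {
    for (toy_magic *mg = head; mg != NULL; mg = mg->moremagic)
        if (mg->type == type && mg->vtbl == vtbl)
            return mg;
    return NULL;
}
```

When two extensions have both attached C<'~'> magic, the lookup by type
returns whichever was added last, while the lookup by virtual table finds
the caller's own entry.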
	1346
	1347	int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
	1348
	1349	This routine checks to see what types of magic C<sv> has. If the mg_type
	1350	field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
	1351	the mg_type field is changed to be the lowercase letter.
	1352
	1353	=head2 Understanding the Magic of Tied Hashes and Arrays
	1354
	1355	Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
	1356	magic type.
	1357
	1358	WARNING: As of the 5.004 release, proper usage of the array and hash
	1359	access functions requires understanding a few caveats. Some
	1360	of these caveats are actually considered bugs in the API, to be fixed
	1361	in later releases, and are bracketed with [MAYCHANGE] below. If
	1362	you find yourself actually applying such information in this section, be
	1363	aware that the behavior may change in the future, umm, without warning.
	1364
	1365	The perl tie function associates a variable with an object that implements
	1366	the various GET, SET, etc methods. To perform the equivalent of the perl
	1367	tie function from an XSUB, you must mimic this behaviour. The code below
	1368	carries out the necessary steps -- firstly it creates a new hash, and then
	1369	creates a second hash which it blesses into the class which will implement
	1370	the tie methods. Lastly it ties the two hashes together, and returns a
	1371	reference to the new tied hash. Note that the code below does NOT call the
	1372	TIEHASH method in the MyTie class -
	1373	see L<Calling Perl Routines from within C Programs> for details on how
	1374	to do this.
	1375
	1376	SV*
	1377	mytie()
	1378	PREINIT:
	1379	HV *hash;
	1380	HV *stash;
	1381	SV *tie;
	1382	CODE:
	1383	hash = newHV();
	1384	tie = newRV_noinc((SV*)newHV());
	1385	stash = gv_stashpv("MyTie", GV_ADD);
	1386	sv_bless(tie, stash);
	1387	hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
	1388	RETVAL = newRV_noinc((SV*)hash);
	1389	OUTPUT:
	1390	RETVAL
	1391
	1392	The C<av_store> function, when given a tied array argument, merely
	1393	copies the magic of the array onto the value to be "stored", using
	1394	C<mg_copy>. It may also return NULL, indicating that the value did not
	1395	actually need to be stored in the array. [MAYCHANGE] After a call to
	1396	C<av_store> on a tied array, the caller will usually need to call
	1397	C<mg_set(val)> to actually invoke the perl level "STORE" method on the
	1398	TIEARRAY object. If C<av_store> did return NULL, a call to
	1399	C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
	1400	leak. [/MAYCHANGE]
	1401
	1402	The previous paragraph is applicable verbatim to tied hash access using the
	1403	C<hv_store> and C<hv_store_ent> functions as well.
	1404
	1405	C<av_fetch> and the corresponding hash functions C<hv_fetch> and
	1406	C<hv_fetch_ent> actually return an undefined mortal value whose magic
	1407	has been initialized using C<mg_copy>. Note the value so returned does not
	1408	need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
	1409	need to call C<mg_get()> on the returned value in order to actually invoke
	1410	the perl level "FETCH" method on the underlying TIE object. Similarly,
	1411	you may also call C<mg_set()> on the return value after possibly assigning
	1412	a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
	1413	method on the TIE object. [/MAYCHANGE]
	1414
	1415	[MAYCHANGE]
	1416	In other words, the array or hash fetch/store functions don't really
	1417	fetch and store actual values in the case of tied arrays and hashes. They
	1418	merely call C<mg_copy> to attach magic to the values that were meant to be
	1419	"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
	1420	do the job of invoking the TIE methods on the underlying objects. Thus
	1421	the magic mechanism currently implements a kind of lazy access to arrays
	1422	and hashes.
	1423
	1424	Currently (as of perl version 5.004), use of the hash and array access
	1425	functions requires the user to be aware of whether they are operating on
	1426	"normal" hashes and arrays, or on their tied variants. The API may be
	1427	changed to provide more transparent access to both tied and normal data
	1428	types in future versions.
	1429	[/MAYCHANGE]
	1430
	1431	You would do well to understand that the TIEARRAY and TIEHASH interfaces
	1432	are mere sugar to invoke some perl method calls while using the uniform hash
	1433	and array syntax. The use of this sugar imposes some overhead (typically
	1434	about two to four extra opcodes per FETCH/STORE operation, in addition to
	1435	the creation of all the mortal variables required to invoke the methods).
	1436	This overhead will be comparatively small if the TIE methods are themselves
	1437	substantial, but if they are only a few statements long, the overhead
	1438	will not be insignificant.
	1439
	1440	=head2 Localizing changes
	1441
	1442	Perl has a very handy construction
	1443
	1444	{
	1445	local $var = 2;
	1446	...
	1447	}
	1448
	1449	This construction is I<approximately> equivalent to
	1450
	1451	{
	1452	my $oldvar = $var;
	1453	$var = 2;
	1454	...
	1455	$var = $oldvar;
	1456	}
	1457
	1458	The biggest difference is that the first construction would
	1459	reinstate the initial value of $var, irrespective of how control exits
	1460	the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
	1461	more efficient as well.
	1462
	1463	There is a way to achieve a similar task from C via the Perl API: create a
	1464	I<pseudo-block>, and arrange for some changes to be automatically
	1465	undone at the end of it, either explicitly, or via a non-local exit (via
	1466	die()). A I<block>-like construct is created by a pair of
	1467	C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
	1468	Such a construct may be created specially for some important localized
	1469	task, or an existing one (like boundaries of enclosing Perl
	1470	subroutine/block, or an existing pair for freeing TMPs) may be
	1471	used. (In the second case the overhead of additional localization must
	1472	be almost negligible.) Note that any XSUB is automatically enclosed in
	1473	an C<ENTER>/C<LEAVE> pair.
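
The mechanism behind a I<pseudo-block> can be approximated with two small
stacks: one recording what to restore, and one recording where each block
began. The sketch below is a stand-alone model with hypothetical names
(C<toy_enter>, C<toy_saveint>, C<toy_leave>); it handles only C<int>
values and does not model non-local exits.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX  32
#define TOY_SCOPE_MAX 8

typedef struct { int *where; int old; } toy_saved_int;

toy_saved_int toy_savestack[TOY_SAVE_MAX];
size_t        toy_save_ix = 0;
size_t        toy_scopestack[TOY_SCOPE_MAX];
size_t        toy_scope_ix = 0;

/* Models ENTER: remember how tall the save stack is right now. */
void toy_enter(void) {
    assert(toy_scope_ix < TOY_SCOPE_MAX);
    toy_scopestack[toy_scope_ix++] = toy_save_ix;
}

/* Models SAVEINT(i): record the address and current value of i. */
void toy_saveint(int *where) {
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].where = where;
    toy_savestack[toy_save_ix].old = *where;
    toy_save_ix++;
}

/* Models LEAVE: undo everything saved since the matching ENTER,
 * most recent first. */
void toy_leave(void) {
    assert(toy_scope_ix > 0);
    size_t base = toy_scopestack[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_saved_int *s = &toy_savestack[--toy_save_ix];
        *s->where = s->old;
    }
}

int toy_counter = 1;

/* Localize toy_counter in two nested pseudo-blocks. Returns the value
 * seen after the inner block times ten, plus the value after the outer. */
int toy_localized_demo(void) {
    toy_enter();
    toy_saveint(&toy_counter);
    toy_counter = 2;
    toy_enter();
    toy_saveint(&toy_counter);
    toy_counter = 3;
    toy_leave();                          /* restores 2 */
    int after_inner = toy_counter;
    toy_leave();                          /* restores 1 */
    return after_inner * 10 + toy_counter;
}
```

Each C<toy_leave> unwinds only the entries made since its own
C<toy_enter>, which is why nested blocks restore to different values.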
	1474
	1475	Inside such a I<pseudo-block> the following service is available:
	1476
	1477	=over 4
	1478
	1479	=item C<SAVEINT(int i)>
	1480
	1481	=item C<SAVEIV(IV i)>
	1482
	1483	=item C<SAVEI32(I32 i)>
	1484
	1485	=item C<SAVELONG(long i)>
	1486
	1487	These macros arrange things to restore the value of integer variable
	1488	C<i> at the end of enclosing I<pseudo-block>.
	1489
	1490	=item C<SAVESPTR(s)>
	1491
	1492	=item C<SAVEPPTR(p)>
	1493
	1494	These macros arrange things to restore the value of pointers C<s> and
	1495	C<p>. C<s> must be a pointer of a type which survives conversion to
	1496	C<SV*> and back; C<p> should be able to survive conversion to C<char*>
	1497	and back.
	1498
	1499	=item C<SAVEFREESV(SV *sv)>
	1500
	1501	The refcount of C<sv> will be decremented at the end of
	1502	I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
	1503	mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
	1504	extends the lifetime of C<sv> until the beginning of the next statement,
	1505	C<SAVEFREESV> extends it until the end of the enclosing scope. These
	1506	lifetimes can be wildly different.
	1507
	1508	Also compare C<SAVEMORTALIZESV>.
	1509
	1510	=item C<SAVEMORTALIZESV(SV *sv)>
	1511
	1512	Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
	1513	scope instead of decrementing its reference count. This usually has the
	1514	effect of keeping C<sv> alive until the statement that called the currently
	1515	live scope has finished executing.
	1516
	1517	=item C<SAVEFREEOP(OP *op)>
	1518
	1519	The C<OP *> is op_free()ed at the end of I<pseudo-block>.
	1520
	1521	=item C<SAVEFREEPV(p)>
	1522
	1523	The chunk of memory which is pointed to by C<p> is Safefree()ed at the
	1524	end of I<pseudo-block>.
	1525
	1526	=item C<SAVECLEARSV(SV *sv)>
	1527
	1528	Clears a slot in the current scratchpad which corresponds to C<sv> at
	1529	the end of I<pseudo-block>.
	1530
	1531	=item C<SAVEDELETE(HV *hv, char *key, I32 length)>
	1532
	1533	The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
	1534	string pointed to by C<key> is Safefree()ed. If one has a I<key> in
	1535	short-lived storage, the corresponding string may be reallocated like
	1536	this:
	1537
	1538	    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
	1539
	1540	=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
	1541
	1542	At the end of I<pseudo-block> the function C<f> is called with the
	1543	only argument C<p>.
	1544
	1545	=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
	1546
	1547	At the end of I<pseudo-block> the function C<f> is called with the
	1548	implicit context argument (if any), and C<p>.
	1549
	1550	=item C<SAVESTACK_POS()>
	1551
	1552	The current offset on the Perl internal stack (cf. C<SP>) is restored
	1553	at the end of I<pseudo-block>.
	1554
	1555	=back
	1556
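To make the mechanics concrete, here is a minimal, self-contained C sketch of how a save stack can undo changes at the end of a I<pseudo-block>. It is a toy model and not Perl's implementation: every C<toy_*> name is hypothetical, only an C<int> variant analogous to C<SAVEINT> is modelled, and the fixed-size arrays are an assumption made to keep the sketch short. C<toy_enter> and C<toy_leave> play the roles of C<ENTER> and C<LEAVE>; calling C<toy_demo()> returns 1, the value the variable had before it was localized.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and are not Perl's API. */
#define TOY_SAVE_MAX 64

typedef struct {
    int *addr;   /* which variable to restore */
    int  old;    /* value it had when it was saved */
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;            /* next free save-stack slot */

static size_t toy_scopestack[TOY_SAVE_MAX];
static size_t toy_scope_ix = 0;           /* current nesting depth */

/* Like ENTER: remember how tall the save stack is right now. */
void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SAVE_MAX);
    toy_scopestack[toy_scope_ix++] = toy_save_ix;
}

/* Like SAVEINT(i): record the address and current value of an int. */
void toy_saveint(int *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Like LEAVE: undo, newest first, everything saved since the matching
 * toy_enter(). */
void toy_leave(void)
{
    size_t base;

    assert(toy_scope_ix > 0);
    base = toy_scopestack[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}

/* Returns the value of the variable after the pseudo-block has ended. */
int toy_demo(void)
{
    int var = 1;

    toy_enter();
    toy_saveint(&var);   /* roughly: local $var */
    var = 2;
    toy_leave();         /* var is restored here */
    return var;
}
```
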
	1557	The following API list contains functions, thus one needs to
	1558	provide pointers to the modifiable data explicitly (either C pointers,
	1559	or Perlish C<GV *>s). Where the above macros take C<int>, a similar
	1560	function takes C<int *>.
	1561
	1562	=over 4
	1563
	1564	=item C<SV* save_scalar(GV *gv)>
	1565
	1566	Equivalent to Perl code C<local $gv>.
	1567
	1568	=item C<AV* save_ary(GV *gv)>
	1569
	1570	=item C<HV* save_hash(GV *gv)>
	1571
	1572	Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
	1573
	1574	=item C<void save_item(SV *item)>
	1575
	1576	Duplicates the current value of C<SV>. On exit from the current
	1577	C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
	1578	from the stored copy. It doesn't handle magic. Use C<save_scalar> if
	1579	magic is affected.
	1580
	1581	=item C<void save_list(SV **sarg, I32 maxsarg)>
	1582
	1583	A variant of C<save_item> which takes multiple arguments via an array
	1584	C<sarg> of C<SV*> of length C<maxsarg>.
	1585
	1586	=item C<SV* save_svref(SV **sptr)>
	1587
	1588	Similar to C<save_scalar>, but will reinstate an C<SV *>.
	1589
	1590	=item C<void save_aptr(AV **aptr)>
	1591
	1592	=item C<void save_hptr(HV **hptr)>
	1593
	1594	Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
	1595
	1596	=back
	1597
	1598	The C<Alias> module implements localization of the basic types within the
	1599	I<caller's scope>. People who are interested in how to localize things in
	1600	the containing scope should take a look there too.
	1601
	1602	=head1 Subroutines
	1603
	1604	=head2 XSUBs and the Argument Stack
	1605
	1606	The XSUB mechanism is a simple way for Perl programs to access C subroutines.
	1607	An XSUB routine will have a stack that contains the arguments from the Perl
	1608	program, and a way to map from the Perl data structures to a C equivalent.
	1609
	1610	The stack arguments are accessible through the C<ST(n)> macro, which returns
	1611	the C<n>'th stack argument. Argument 0 is the first argument passed in the
	1612	Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
	1613	an C<SV*> is used.
	1614
	1615	Most of the time, output from the C routine can be handled through use of
	1616	the RETVAL and OUTPUT directives. However, there are some cases where the
	1617	argument stack is not already long enough to handle all the return values.
	1618	An example is the POSIX tzname() call, which takes no arguments, but returns
	1619	two, the local time zone's standard and summer time abbreviations.
	1620
	1621	To handle this situation, the PPCODE directive is used and the stack is
	1622	extended using the macro:
	1623
	1624	    EXTEND(SP, num);
	1625
	1626	where C<SP> is the macro that represents the local copy of the stack pointer,
	1627	and C<num> is the number of elements the stack should be extended by.
	1628
	1629	Now that there is room on the stack, values can be pushed on it using the
	1630	C<PUSHs> macro. The pushed values will often need to be "mortal" (see
	1631	L</Reference Counts and Mortality>):
	1632
	1633	    PUSHs(sv_2mortal(newSViv(an_integer)))
	1634	    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
	1635	    PUSHs(sv_2mortal(newSVnv(a_double)))
	1636	    PUSHs(sv_2mortal(newSVpv("Some String",0)))
	1637	    /* Although the last example is better written as the more
	1638	     * efficient: */
	1639	    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))
	1640
	1641	Now, in the Perl program calling C<tzname>, the two values will be
	1642	assigned as in:
	1643
	1644	    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
	1645
	1646	An alternate (and possibly simpler) method to pushing values on the stack is
	1647	to use the macro:
	1648
	1649	    XPUSHs(SV*)
	1650
	1651	This macro automatically adjusts the stack for you, if needed. Thus, you
	1652	do not need to call C<EXTEND> to extend the stack.
	1653
	1654	Despite what earlier versions of this document suggested, the macros
	1655	C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
	1656	For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
	1657	C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
	1658
	1659	For more information, consult L<perlxs> and L<perlxstut>.
	1660
	1661	=head2 Autoloading with XSUBs
	1662
	1663	If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
	1664	fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
	1665	of the XSUB's package.
	1666
	1667	But it also puts the same information in certain fields of the XSUB itself:
	1668
	1669	    HV *stash           = CvSTASH(cv);
	1670	    const char *subname = SvPVX(cv);
	1671	    STRLEN name_length  = SvCUR(cv); /* in bytes */
	1672	    U32 is_utf8         = SvUTF8(cv);
	1673
	1674	C<SvPVX(cv)> contains just the sub name itself, not including the package.
	1675	For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
	1676	C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.
	1677
	1678	B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
	1679	XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
	1680	XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
	1681	to support 5.8-5.14, use the XSUB's fields.
	1682
	1683	=head2 Calling Perl Routines from within C Programs
	1684
	1685	There are four routines that can be used to call a Perl subroutine from
	1686	within a C program. These four are:
	1687
	1688	    I32 call_sv(SV*, I32);
	1689	    I32 call_pv(const char*, I32);
	1690	    I32 call_method(const char*, I32);
	1691	    I32 call_argv(const char*, I32, char**);
	1692
	1693	The routine most often used is C<call_sv>. The C<SV*> argument
	1694	contains either the name of the Perl subroutine to be called, or a
	1695	reference to the subroutine. The second argument consists of flags
	1696	that control the context in which the subroutine is called, whether
	1697	or not the subroutine is being passed arguments, how errors should be
	1698	trapped, and how to treat return values.
	1699
	1700	All four routines return the number of arguments that the subroutine returned
	1701	on the Perl stack.
	1702
	1703	These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
	1704	but those names are now deprecated; macros of the same name are provided for
	1705	compatibility.
	1706
	1707	When using any of these routines (except C<call_argv>), the programmer
	1708	must manipulate the Perl stack. The macros and functions for doing so
	1709	include:
	1710
	1711	    dSP
	1712	    SP
	1713	    PUSHMARK()
	1714	    PUTBACK
	1715	    SPAGAIN
	1716	    ENTER
	1717	    SAVETMPS
	1718	    FREETMPS
	1719	    LEAVE
	1720	    XPUSH*()
	1721	    POP*()
	1722
	1723	For a detailed description of calling conventions from C to Perl,
	1724	consult L<perlcall>.
	1725
	1726	=head2 Putting a C value on Perl stack
	1727
	1728	A lot of opcodes (an opcode is an elementary operation in the internal perl
	1729	stack machine) put an SV* on the stack. However, as an optimization
	1730	the corresponding SV is (usually) not recreated each time. The opcodes
	1731	reuse specially assigned SVs (I<target>s) which are (as a corollary)
	1732	not constantly freed/created.
	1733
	1734	Each of the targets is created only once (but see
	1735	L</Scratchpads and recursion> below), and when an opcode needs to put
	1736	an integer, a double, or a string on stack, it just sets the
	1737	corresponding parts of its I<target> and puts the I<target> on stack.
	1738
	1739	The macro to put this target on stack is C<PUSHTARG>, and it is
	1740	directly used in some opcodes, as well as indirectly in zillions of
	1741	others, which use it via C<(X)PUSH[iunp]>.
	1742
	1743	Because the target is reused, you must be careful when pushing multiple
	1744	values on the stack. The following code will not do what you think:
	1745
	1746	    XPUSHi(10);
	1747	    XPUSHi(20);
	1748
	1749	This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
	1750	the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
	1751	At the end of the operation, the stack does not contain the values 10
	1752	and 20, but actually contains two pointers to C<TARG>, which we have set
	1753	to 20.
	1754
	1755	If you need to push multiple different values then you should either use
	1756	the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
	1757	none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
	1758	SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
	1759	will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
	1760	this a little easier to achieve by creating a new mortal for you (via
	1761	C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
	1762	in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
	1763	Thus, instead of writing this to "fix" the example above:
	1764
	1765	    XPUSHs(sv_2mortal(newSViv(10)))
	1766	    XPUSHs(sv_2mortal(newSViv(20)))
	1767
	1768	you can simply write:
	1769
	1770	    mXPUSHi(10)
	1771	    mXPUSHi(20)
	1772
	1773	On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
	1774	need a C<dTARG> in your variable declarations so that the C<PUSH>
	1775	macros can make use of the local variable C<TARG>. See also C<dTARGET>
	1776	and C<dXSTARG>.
	1777
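The aliasing problem described in this section can be reproduced in a few lines of plain C. The sketch below is a toy model under stated assumptions: C<toy_sv>, C<toy_targ>, C<toy_xpushi> and C<toy_mxpushi> are hypothetical stand-ins (not Perl's structures or macros), the stack has a fixed size, and a small static pool takes the place of freshly created mortals. Pushing 10 and then 20 through the shared target leaves two identical pointers on the stack, both seeing 20; pushing through fresh SVs leaves two distinct values.

```c
#include <assert.h>

/* Toy model only: toy_sv, toy_targ and the toy_* functions are
 * hypothetical stand-ins, not Perl's real data structures or macros. */
typedef struct { int iv; } toy_sv;

#define TOY_STACK_MAX 8
toy_sv *toy_stack[TOY_STACK_MAX];   /* the "argument stack" of SV pointers */
int     toy_sp = 0;                 /* number of items on the stack */

toy_sv  toy_targ;                   /* the single reusable target */
toy_sv  toy_mortals[TOY_STACK_MAX]; /* pool standing in for fresh mortals */
int     toy_mortal_ix = 0;

/* Like XPUSHi: set the shared target, then push a pointer to it. */
void toy_xpushi(int i)
{
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ.iv = i;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Like mXPUSHi: take a fresh SV, push a pointer to it, and set its value. */
void toy_mxpushi(int i)
{
    toy_sv *sv;

    assert(toy_sp < TOY_STACK_MAX && toy_mortal_ix < TOY_STACK_MAX);
    sv = &toy_mortals[toy_mortal_ix++];
    toy_stack[toy_sp++] = sv;
    sv->iv = i;
}

/* Sum of the values currently visible through the stack. */
int toy_stack_sum(void)
{
    int sum = 0;

    for (int n = 0; n < toy_sp; n++)
        sum += toy_stack[n]->iv;
    return sum;
}
```
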
	1778	=head2 Scratchpads
	1779
	1780	The question remains of when the SVs which are I<target>s for opcodes
	1781	are created. The answer is that they are created when the current
	1782	unit--a subroutine or a file (for opcodes for statements outside of
	1783	subroutines)--is compiled. During this time a special anonymous Perl
	1784	array is created, which is called a scratchpad for the current unit.
	1785
	1786	A scratchpad keeps SVs which are lexicals for the current unit and are
	1787	targets for opcodes. A previous version of this document
	1788	stated that one can deduce that an SV lives on a scratchpad
	1789	by looking on its flags: lexicals have C<SVs_PADMY> set, and
	1790	I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
	1791	C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
	1792	While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
	1793	that have never resided in a pad, but nonetheless act like I<target>s. As
	1794	of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
	1795	0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.
	1796
	1797	The correspondence between OPs and I<target>s is not 1-to-1. Different
	1798	OPs in the compile tree of the unit can use the same target, if this
	1799	would not conflict with the expected life of the temporary.
	1800
	1801	=head2 Scratchpads and recursion
	1802
	1803	Strictly speaking, a compiled unit does not contain a pointer to
	1804	the scratchpad AV itself. It contains a pointer to an AV of
	1805	(initially) one element, and this element is the scratchpad AV. Why do
	1806	we need an extra level of indirection?
	1807
	1808	The answer is B<recursion>, and maybe B<threads>. Both
	1809	these can create several execution pointers going into the same
	1810	subroutine. For the subroutine-child not to write over the temporaries
	1811	for the subroutine-parent (lifespan of which covers the call to the
	1812	child), the parent and the child should have different
	1813	scratchpads. (I<And> the lexicals should be separate anyway!)
	1814
	1815	So each subroutine is born with an array of scratchpads (of length 1).
	1816	On each entry to the subroutine it is checked whether the current
	1817	depth of the recursion is more than the length of this array, and
	1818	if it is, a new scratchpad is created and pushed into the array.
	1819
	1820	The I<target>s on this scratchpad are C<undef>s, but they are already
	1821	marked with correct flags.
	1822
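The growth rule just described can be sketched in standalone C. This is a toy model under assumptions, not Perl's pad code: C<toy_pad>, C<toy_sub> and the C<toy_sub_*> functions are hypothetical names, a pad is reduced to a few C<int> slots, and allocation failure is handled only by C<assert>. The point it illustrates is that a nested activation gets its own scratchpad, while a later call at the same depth reuses the pad created earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: toy_pad, toy_sub and friends are hypothetical names. */
#define TOY_PAD_SLOTS 4

typedef struct { int slot[TOY_PAD_SLOTS]; } toy_pad;

typedef struct {
    toy_pad **pads;    /* array of scratchpads, one per recursion level */
    int       npads;   /* how many have been created so far */
    int       depth;   /* current recursion depth, 0 when not running */
} toy_sub;

/* A sub is "born" with an array holding exactly one scratchpad. */
toy_sub *toy_sub_new(void)
{
    toy_sub *sub = malloc(sizeof *sub);

    assert(sub != NULL);
    sub->pads = malloc(sizeof *sub->pads);
    assert(sub->pads != NULL);
    sub->pads[0] = calloc(1, sizeof **sub->pads);
    assert(sub->pads[0] != NULL);
    sub->npads = 1;
    sub->depth = 0;
    return sub;
}

/* On entry: if we are deeper than the array is long, push a new pad.
 * Returns the pad this activation should use. */
toy_pad *toy_sub_enter(toy_sub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        toy_pad **grown = realloc(sub->pads,
                                  (size_t)sub->depth * sizeof *sub->pads);
        assert(grown != NULL);
        sub->pads = grown;
        sub->pads[sub->npads] = calloc(1, sizeof **sub->pads);
        assert(sub->pads[sub->npads] != NULL);
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* On exit the pad is kept for reuse by a later call at the same depth. */
void toy_sub_leave(toy_sub *sub)
{
    assert(sub->depth > 0);
    sub->depth--;
}
```
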
	1823	=head1 Memory Allocation
	1824
	1825	=head2 Allocation
	1826
	1827	All memory meant to be used with the Perl API functions should be manipulated
	1828	using the macros described in this section. The macros provide the necessary
	1829	transparency between differences in the actual malloc implementation that is
	1830	used within perl.
	1831
	1832	It is suggested that you enable the version of malloc that is distributed
	1833	with Perl. It keeps pools of various sizes of unallocated memory in
	1834	order to satisfy allocation requests more quickly. However, on some
	1835	platforms, it may cause spurious malloc or free errors.
	1836
	1837	The following three macros are used to initially allocate memory:
	1838
	1839	    Newx(pointer, number, type);
	1840	    Newxc(pointer, number, type, cast);
	1841	    Newxz(pointer, number, type);
	1842
	1843	The first argument C<pointer> should be the name of a variable that will
	1844	point to the newly allocated memory.
	1845
	1846	The second and third arguments C<number> and C<type> specify how many of
	1847	the specified type of data structure should be allocated. The argument
	1848	C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
	1849	should be used if the C<pointer> argument is different from the C<type>
	1850	argument.
	1851
	1852	Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
	1853	to zero out all the newly allocated memory.
	1854
	1855	=head2 Reallocation
	1856
	1857	    Renew(pointer, number, type);
	1858	    Renewc(pointer, number, type, cast);
	1859	    Safefree(pointer)
	1860
	1861	These three macros are used to change a memory buffer size or to free a
	1862	piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
	1863	match those of C<Newx> and C<Newxc>. (The "magic cookie" argument of the
	1864	older C<New> and C<Newc> macros is not needed.)
	1865
	1866	=head2 Moving
	1867
	1868	    Move(source, dest, number, type);
	1869	    Copy(source, dest, number, type);
	1870	    Zero(dest, number, type);
	1871
	1872	These three macros are used to move, copy, or zero out previously allocated
	1873	memory. The C<source> and C<dest> arguments point to the source and
	1874	destination starting points. Perl will move, copy, or zero out C<number>
	1875	instances of the size of the C<type> data structure (using the C<sizeof>
	1876	function).
	1877
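For readers who want to experiment with the calling convention outside of a perl build, the following standalone C sketch mimics the shape of the allocation, reallocation and moving macros using only the C library. The C<TOY_*> macros are hypothetical stand-ins written for this illustration; they are not the real macros, which add their own checks and go through whichever allocator perl was built with, so real XS code should use the real ones. Note the argument order of the moving macros: source first, destination second. C<TOY_MOVE> is written with C<memmove> so that overlapping regions are safe, while C<TOY_COPY> uses C<memcpy> and must only be given non-overlapping regions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that mimic the calling convention of the macros
 * in this section using only the C library. */
#define TOY_NEWX(p, n, t)    ((p) = (t *)malloc((size_t)(n) * sizeof(t)))
#define TOY_NEWXZ(p, n, t)   ((p) = (t *)calloc((size_t)(n), sizeof(t)))
#define TOY_RENEW(p, n, t)   ((p) = (t *)realloc((p), (size_t)(n) * sizeof(t)))
#define TOY_SAFEFREE(p)      free(p)
#define TOY_MOVE(s, d, n, t) memmove((d), (s), (size_t)(n) * sizeof(t))
#define TOY_COPY(s, d, n, t) memcpy((d), (s), (size_t)(n) * sizeof(t))
#define TOY_ZERO(d, n, t)    memset((d), 0, (size_t)(n) * sizeof(t))

/* Build {1,2,3}, grow it to six ints, duplicate the first three into the
 * new tail, then shift everything one slot left with an overlapping move.
 * Returns the sum of the final six elements, or -1 on allocation failure
 * (the toy does not try to recover the old buffer in that case). */
int toy_alloc_demo(void)
{
    int *buf;
    int sum = 0;

    TOY_NEWXZ(buf, 3, int);            /* three zeroed ints */
    if (!buf)
        return -1;
    buf[0] = 1; buf[1] = 2; buf[2] = 3;

    TOY_RENEW(buf, 6, int);            /* room for six ints */
    if (!buf)
        return -1;
    TOY_COPY(buf, buf + 3, 3, int);    /* {1,2,3,1,2,3}: no overlap */
    TOY_MOVE(buf + 1, buf, 5, int);    /* {2,3,1,2,3,3}: overlapping */
    TOY_ZERO(buf + 5, 1, int);         /* {2,3,1,2,3,0} */

    for (int n = 0; n < 6; n++)
        sum += buf[n];
    TOY_SAFEFREE(buf);
    return sum;
}
```
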
	1878	=head1 PerlIO
	1879
	1880	The most recent development releases of Perl have been experimenting with
	1881	removing Perl's dependency on the "normal" standard I/O suite and allowing
	1882	other stdio implementations to be used. This involves creating a new
	1883	abstraction layer that then calls whichever implementation of stdio Perl
	1884	was compiled with. All XSUBs should now use the functions in the PerlIO
	1885	abstraction layer and not make any assumptions about what kind of stdio
	1886	is being used.
	1887
	1888	For a complete description of the PerlIO abstraction, consult L<perlapio>.
	1889
	1890	=head1 Compiled code
	1891
	1892	=head2 Code tree
	1893
	1894	Here we describe the internal form your code is converted to by
	1895	Perl. Start with a simple example:
	1896
	1897	    $a = $b + $c;
	1898
	1899	This is converted to a tree similar to this one:
	1900
	1901	            assign-to
	1902	          /           \
	1903	         +             $a
	1904	       /   \
	1905	     $b     $c
	1906
	1907	(but slightly more complicated). This tree reflects the way Perl
	1908	parsed your code, but has nothing to do with the execution order.
	1909	There is an additional "thread" going through the nodes of the tree
	1910	which shows the order of execution of the nodes. In our simplified
	1911	example above it looks like:
	1912
	1913	    $b ---> $c ---> + ---> $a ---> assign-to
	1914
	1915	But with the actual compile tree for C<$a = $b + $c> it is different:
	1916	some nodes are I<optimized away>. As a corollary, though the actual tree
	1917	contains more nodes than our simplified example, the execution order
	1918	is the same as in our example.
	1919
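The difference between the tree structure and the execution-order "thread" can be illustrated with a small standalone C sketch. It is a toy model only: C<toy_op> is a hypothetical, much-reduced node type, and C<toy_link> is an assumption about one simple way to thread a tree in post-order, not a copy of what perl does. For the simplified tree above, following the C<next> pointers from the op returned by C<toy_link> visits the nodes in the order shown in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: toy_op is a hypothetical, much-reduced stand-in for
 * Perl's op structures. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child, or NULL */
    struct toy_op *sibling;  /* next child of the same parent, or NULL */
    struct toy_op *next;     /* the execution-order "thread" */
} toy_op;

/* Thread a subtree in post-order (children left to right, then the node
 * itself). Returns the first op to execute; on return o->next is NULL so
 * that the caller can chain o to whatever follows. */
toy_op *toy_link(toy_op *o)
{
    toy_op *start = o;      /* a leaf starts at itself */
    toy_op *prev = NULL;    /* root of the previously threaded child */

    for (toy_op *kid = o->first; kid; kid = kid->sibling) {
        toy_op *kid_start = toy_link(kid);
        if (prev)
            prev->next = kid_start;
        else
            start = kid_start;
        prev = kid;
    }
    if (prev)
        prev->next = o;
    o->next = NULL;
    return start;
}

/* Follow the thread and write the op names, space separated, into buf. */
void toy_trace(toy_op *start, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (toy_op *o = start; o; o = o->next) {
        if (buf[0])
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, o->name, len - strlen(buf) - 1);
    }
}

/* Build the simplified tree for "$a = $b + $c" and trace it. */
void toy_tree_demo(char *buf, size_t len)
{
    toy_op a      = { "$a", NULL, NULL, NULL };
    toy_op c      = { "$c", NULL, NULL, NULL };
    toy_op b      = { "$b", NULL, &c,   NULL };
    toy_op add    = { "+",  &b,   &a,   NULL };
    toy_op assign = { "assign-to", &add, NULL, NULL };

    toy_trace(toy_link(&assign), buf, len);
}
```
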
	1920	=head2 Examining the tree
	1921
	1922	If you have your perl compiled for debugging (usually done with
	1923	C<-DDEBUGGING> on the C<Configure> command line), you may examine the
	1924	compiled tree by specifying C<-Dx> on the Perl command line. The
	1925	output takes several lines per node, and for C<$b+$c> it looks like
	1926	this:
	1927
	1928	    5           TYPE = add  ===> 6
	1929	                TARG = 1
	1930	                FLAGS = (SCALAR,KIDS)
	1931	                {
	1932	                    TYPE = null  ===> (4)
	1933	                      (was rv2sv)
	1934	                    FLAGS = (SCALAR,KIDS)
	1935	                    {
	1936	    3                   TYPE = gvsv  ===> 4
	1937	                        FLAGS = (SCALAR)
	1938	                        GV = main::b
	1939	                    }
	1940	                }
	1941	                {
	1942	                    TYPE = null  ===> (5)
	1943	                      (was rv2sv)
	1944	                    FLAGS = (SCALAR,KIDS)
	1945	                    {
	1946	    4                   TYPE = gvsv  ===> 5
	1947	                        FLAGS = (SCALAR)
	1948	                        GV = main::c
	1949	                    }
	1950	                }
	1951
	1952	This tree has 5 nodes (one per C<TYPE> specifier); only 3 of them are
	1953	not optimized away (one per number in the left column). The immediate
	1954	children of the given node correspond to C<{}> pairs on the same level
	1955	of indentation, thus this listing corresponds to the tree:
	1956
	1957	                   add
	1958	                 /     \
	1959	               null    null
	1960	                |       |
	1961	               gvsv    gvsv
	1962
	1963	The execution order is indicated by C<===E<gt>> marks, thus it is C<3
	1964	4 5 6> (node C<6> is not included in the above listing), i.e.,
	1965	C<gvsv gvsv add whatever>.
	1966
	1967	Each of these nodes represents an op, a fundamental operation inside the
	1968	Perl core. The code which implements each operation can be found in the
	1969	F<pp*.c> files; the function which implements the op with type C<gvsv>
	1970	is C<pp_gvsv>, and so on. As the tree above shows, different ops have
	1971	different numbers of children: C<add> is a binary operator, as one would
	1972	expect, and so has two children. To accommodate the various different
	1973	numbers of children, there are various types of op data structure, and
	1974	they link together in different ways.
	1975
	1976	The simplest type of op structure is C<OP>: this has no children. Unary
	1977	operators, C<UNOP>s, have one child, and this is pointed to by the
	1978	C<op_first> field. Binary operators (C<BINOP>s) have not only an
	1979	C<op_first> field but also an C<op_last> field. The most complex type of
	1980	op is a C<LISTOP>, which has any number of children. In this case, the
	1981	first child is pointed to by C<op_first> and the last child by
	1982	C<op_last>. The children in between can be found by iteratively
	1983	following the C<OpSIBLING> pointer from the first child to the last (but
	1984	see below).
	1985
	1986	There are also some other op types: a C<PMOP> holds a regular expression,
	1987	and has no children, and a C<LOOP> may or may not have children. If the
	1988	C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
	1989	complicate matters, if a C<UNOP> is actually a C<null> op after
	1990	optimization (see L</Compile pass 2: context propagation>) it will still
	1991	have children in accordance with its former type.
	1992
	1993	Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
	1994	or more children, but it doesn't have an C<op_last> field: so you have to
	1995	follow C<op_first> and then the C<OpSIBLING> chain itself to find the
	1996	last child. Instead it has an C<op_other> field, which is comparable to
	1997	the C<op_next> field described below, and represents an alternate
	1998	execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
	1999	that in general, C<op_other> may not point to any of the direct children
	2000	of the C<LOGOP>.
	2001
	2002	Starting in version 5.21.2, perls built with the experimental
	2003	define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
	2004	C<op_moresib>. When not set, this indicates that this is the last op in an
	2005	C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
	2006	sibling to point back to the parent op. Under this build, that field is
	2007	also renamed C<op_sibparent> to reflect its joint role. The macro
	2008	C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
	2009	the last sibling. With this build the C<op_parent(o)> function can be
	2010	used to find the parent of any op. Thus for forward compatibility, you
	2011	should always use the C<OpSIBLING(o)> macro rather than accessing
	2012	C<op_sibling> directly.
	2013
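The double role of the sibling field can be modelled in a few lines of standalone C. This is a sketch under assumptions: C<toy_op> is a hypothetical reduction of an op to just the fields discussed here, and C<toy_sibling> and C<toy_parent> are illustrative counterparts of C<OpSIBLING(o)> and C<op_parent(o)> as described above, not the real definitions. It shows why code that goes through the accessor keeps working, whereas code that reads the raw field would walk off the last sibling into the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a hypothetical reduction of the layout described above. */
typedef struct toy_op {
    struct toy_op *first;      /* first child, or NULL */
    struct toy_op *sibparent;  /* next sibling if moresib, else the parent */
    bool           moresib;    /* true when another sibling follows */
} toy_op;

/* Counterpart of OpSIBLING(o): the next sibling, or NULL on the last one. */
toy_op *toy_sibling(const toy_op *o)
{
    return o->moresib ? o->sibparent : NULL;
}

/* Counterpart of op_parent(o): walk to the last sibling, whose sibparent
 * field holds the parent (NULL for the root of the tree). */
toy_op *toy_parent(toy_op *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* Count the children of o using only the sibling accessor. */
int toy_count_kids(const toy_op *o)
{
    int n = 0;

    for (const toy_op *kid = o->first; kid; kid = toy_sibling(kid))
        n++;
    return n;
}

/* Build a parent with three children and return how many are found. */
int toy_sib_demo(void)
{
    toy_op parent = { NULL, NULL, false };
    toy_op k3 = { NULL, &parent, false };  /* last: points at the parent */
    toy_op k2 = { NULL, &k3, true };
    toy_op k1 = { NULL, &k2, true };

    parent.first = &k1;
    assert(toy_parent(&k1) == &parent);
    assert(toy_sibling(&k3) == NULL);
    return toy_count_kids(&parent);
}
```
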
	2014	Another way to examine the tree is to use a compiler back-end module, such
	2015	as L<B::Concise>.
	2016
	2017	=head2 Compile pass 1: check routines
	2018
	2019	The tree is created by the compiler while I<yacc> code feeds it
	2020	the constructions it recognizes. Since I<yacc> works bottom-up, so does
	2021	the first pass of perl compilation.
	2022
	2023	What makes this pass interesting for perl developers is that some
	2024	optimization may be performed on this pass. This is optimization by
	2025	so-called "check routines". The correspondence between node names
	2026	and corresponding check routines is described in F<opcode.pl> (do not
	2027	forget to run C<make regen_headers> if you modify this file).
	2028
	2029	A check routine is called when the node is fully constructed except
	2030	for the execution-order thread. Since at this time there are no
	2031	back-links to the currently constructed node, one can do almost any
	2032	operation to the top-level node, including freeing it and/or creating
	2033	new nodes above/below it.
	2034
	2035	The check routine returns the node which should be inserted into the
	2036	tree (if the top-level node was not modified, the check routine returns
	2037	its argument).
	2038
	2039	By convention, check routines have names C<ck_*>. They are usually
	2040	called from C<new*OP> subroutines (or C<convert>) (which in turn are
	2041	called from F<perly.y>).
	2042
	2043	=head2 Compile pass 1a: constant folding
	2044
	2045	Immediately after the check routine is called the returned node is
	2046	checked for being compile-time executable. If it is (the value is
	2047	judged to be constant) it is immediately executed, and a I<constant>
	2048	node with the "return value" of the corresponding subtree is
	2049	substituted instead. The subtree is deleted.
	2050
	2051	If constant folding was not performed, the execution-order thread is
	2052	created.
	2053
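The idea of constant folding can be shown on a tiny expression tree in standalone C. This is a toy model under assumptions: C<toy_node> and C<toy_fold> are hypothetical, only integer addition and multiplication are modelled, and the fold is written as one recursive walk. In perl the same effect arises incrementally, because each node is examined as it is constructed and its children have therefore already been folded. For C<(2 * 3) + $x> the multiplication is replaced by the constant 6 while the addition is left alone; for C<(2 * 3) + 4> the whole tree collapses to the constant 10.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical node type, not Perl's op structures. */
typedef enum { TOY_CONST, TOY_VAR, TOY_ADD, TOY_MUL } toy_kind;

typedef struct toy_node {
    toy_kind         kind;
    int              value;         /* meaningful for TOY_CONST only */
    struct toy_node *left, *right;  /* meaningful for TOY_ADD / TOY_MUL */
} toy_node;

/* Fold bottom-up, in place. Returns 1 if n is a constant afterwards. */
int toy_fold(toy_node *n)
{
    int l, r;

    assert(n != NULL);
    if (n->kind == TOY_CONST)
        return 1;
    if (n->kind == TOY_VAR)
        return 0;

    /* Fold both children first, without short-circuiting, so that a
     * foldable right subtree is folded even if the left one is not. */
    l = toy_fold(n->left);
    r = toy_fold(n->right);
    if (!(l && r))
        return 0;

    /* "Execute" the node now and substitute a constant for the subtree. */
    n->value = (n->kind == TOY_ADD)
             ? n->left->value + n->right->value
             : n->left->value * n->right->value;
    n->kind  = TOY_CONST;
    n->left  = n->right = NULL;     /* the subtree is dropped */
    return 1;
}
```
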
	2054	=head2 Compile pass 2: context propagation
	2055
	2056	When a context for a part of compile tree is known, it is propagated
	2057	down through the tree. At this time the context can have 5 values
	2058	(instead of 2 for runtime context): void, boolean, scalar, list, and
	2059	lvalue. In contrast with pass 1, this pass is processed from top
	2060	to bottom: a node's context determines the context for its children.
	2061
	2062	Additional context-dependent optimizations are performed at this time.
	2063	Since at this moment the compile tree contains back-references (via
	2064	"thread" pointers), nodes cannot be free()d now. To allow
	2065	optimized-away nodes at this stage, such nodes are null()ified instead
	2066	of being free()d (i.e. their type is changed to OP_NULL).
	2067
	2068	=head2 Compile pass 3: peephole optimization
	2069
	2070	After the compile tree for a subroutine (or for an C<eval> or a file)
	2071	is created, an additional pass over the code is performed. This pass
	2072	is neither top-down nor bottom-up, but in the execution order (with
	2073	additional complications for conditionals). Optimizations performed
	2074	at this stage are subject to the same restrictions as in pass 2.
	2075
	2076	Peephole optimizations are done by calling the function pointed to
	2077	by the global variable C<PL_peepp>. By default, C<PL_peepp> just
	2078	calls the function pointed to by the global variable C<PL_rpeepp>.
	2079	By default, that performs some basic op fixups and optimisations along
	2080	the execution-order op chain, and recursively calls C<PL_rpeepp> for
	2081	each side chain of ops (resulting from conditionals). Extensions may
	2082	provide additional optimisations or fixups, hooking into either the
	2083	per-subroutine or recursive stage, like this:
	2084
	2085	    static peep_t prev_peepp;
	2086	    static void my_peep(pTHX_ OP *o)
	2087	    {
	2088	        /* custom per-subroutine optimisation goes here */
	2089	        prev_peepp(aTHX_ o);
	2090	        /* custom per-subroutine optimisation may also go here */
	2091	    }
	2092	    BOOT:
	2093	        prev_peepp = PL_peepp;
	2094	        PL_peepp = my_peep;
	2095
	2096	    static peep_t prev_rpeepp;
	2097	    static void my_rpeep(pTHX_ OP *o)
	2098	    {
	2099	        OP *orig_o = o;
	2100	        for(; o; o = o->op_next) {
	2101	            /* custom per-op optimisation goes here */
	2102	        }
	2103	        prev_rpeepp(aTHX_ orig_o);
	2104	    }
	2105	    BOOT:
	2106	        prev_rpeepp = PL_rpeepp;
	2107	        PL_rpeepp = my_rpeep;
	2108
	2109	=head2 Pluggable runops
	2110
	2111	The compile tree is executed in a runops function. There are two runops
	2112	functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
	2113	with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
	2114	control over the execution of the compile tree it is possible to provide
	2115	your own runops function.
	2116
	2117	It's probably best to copy one of the existing runops functions and
	2118	change it to suit your needs. Then, in the BOOT section of your XS
	2119	file, add the line:
	2120
	2121	    PL_runops = my_runops;
	2122
	2123	This function should be as efficient as possible to keep your programs
	2124	running as fast as possible.
	2125
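The general shape of a runops loop, and of swapping in a customised one, can be sketched in standalone C. This is a toy model under assumptions and not a copy of perl's functions: C<toy_op>, the C<toy_pp_*> functions and the C<toy_runops> pointer are hypothetical, and each op here receives itself as an argument to keep the sketch free of global interpreter state. The loop simply keeps calling the function attached to the current op, which returns the op to run next, until one returns NULL. The counting variant shows the kind of small change a custom runops function might make.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, loosely shaped like a runops loop. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*ppaddr)(toy_op *self); /* code implementing this op */
    toy_op  *next;                   /* op to run afterwards, or NULL */
    int      arg;
};

int  toy_acc = 0;        /* state the ops work on */
long toy_ops_run = 0;    /* counter maintained by the custom loop */

/* Each "pp function" does its work and returns the op to run next. */
toy_op *toy_pp_add(toy_op *self) { toy_acc += self->arg; return self->next; }
toy_op *toy_pp_mul(toy_op *self) { toy_acc *= self->arg; return self->next; }

/* The plain loop: keep calling the current op until one returns NULL. */
void toy_runops_standard(toy_op *op)
{
    assert(op != NULL);
    while ((op = op->ppaddr(op)) != NULL)
        ;
}

/* A customised copy of the loop that also counts executed ops. */
void toy_runops_counting(toy_op *op)
{
    assert(op != NULL);
    do {
        toy_ops_run++;
    } while ((op = op->ppaddr(op)) != NULL);
}

/* The pluggable part: callers go through this pointer. */
void (*toy_runops)(toy_op *) = toy_runops_standard;
```
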
	2126	=head2 Compile-time scope hooks
	2127
	2128	As of perl 5.14 it is possible to hook into the compile-time lexical
	2129	scope mechanism using C<Perl_blockhook_register>. This is used like
	2130	this:
	2131
	2132	    STATIC void my_start_hook(pTHX_ int full);
	2133	    STATIC BHK my_hooks;
	2134
	2135	    BOOT:
	2136	        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
	2137	        Perl_blockhook_register(aTHX_ &my_hooks);
	2138
	2139	This will arrange to have C<my_start_hook> called at the start of
	2140	compiling every lexical scope. The available hooks are:
	2141
	2142	=over 4
	2143
	2144	=item C<void bhk_start(pTHX_ int full)>
	2145
	2146	This is called just after starting a new lexical scope. Note that Perl
	2147	code like
	2148
	2149	    if ($x) { ... }
	2150
	2151	creates two scopes: the first starts at the C<(> and has C<full == 1>,
	2152	the second starts at the C<{> and has C<full == 0>. Both end at the
	2153	C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
	2154	pushed onto the save stack by this hook will be popped just before the
	2155	scope ends (between the C<pre_> and C<post_end> hooks, in fact).
	2156
	2157	=item C<void bhk_pre_end(pTHX_ OP **o)>
	2158
	2159	This is called at the end of a lexical scope, just before unwinding the
	2160	stack. I<o> is the root of the optree representing the scope; it is a
	2161	double pointer so you can replace the OP if you need to.
	2162
	2163	=item C<void bhk_post_end(pTHX_ OP **o)>
	2164
	2165	This is called at the end of a lexical scope, just after unwinding the
	2166	stack. I<o> is as above. Note that it is possible for calls to C<pre_>
	2167	and C<post_end> to nest, if there is something on the save stack that
	2168	calls string eval.
	2169
	2170	=item C<void bhk_eval(pTHX_ OP *const o)>
	2171
	2172	This is called just before starting to compile an C<eval STRING>, C<do
	2173	FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
	2174	OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
	2175	C<OP_DOFILE> or C<OP_REQUIRE>.
	2176
	2177	=back
	2178
	2179	Once you have your hook functions, you need a C<BHK> structure to put
	2180	them in. It's best to allocate it statically, since there is no way to
	2181	free it once it's registered. The function pointers should be inserted
	2182	into this structure using the C<BhkENTRY_set> macro, which will also set
	2183	flags indicating which entries are valid. If you do need to allocate
	2184	your C<BHK> dynamically for some reason, be sure to zero it before you
	2185	start.
	2186
	2187	Once registered, there is no mechanism to switch these hooks off, so if
	2188	that is necessary you will need to do this yourself. An entry in C<%^H>
	2189	is probably the best way, so the effect is lexically scoped; however it
	2190	is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
	2191	temporarily switch entries on and off. You should also be aware that
	2192	generally speaking at least one scope will have opened before your
	2193	extension is loaded, so you will see some C<pre>/C<post_end> pairs that
	2194	didn't have a matching C<start>.
	2195
	2196	=head1 Examining internal data structures with the C<dump> functions
	2197
	2198	To aid debugging, the source file F<dump.c> contains a number of
	2199	functions which produce formatted output of internal data structures.
	2200
	2201	The most commonly used of these functions is C<Perl_sv_dump>; it's used
	2202	for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
	2203	C<sv_dump> to produce debugging output from Perl-space, so users of that
	2204	module should already be familiar with its format.
	2205
	2206	C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
	2207	derivatives, and produces output similar to C<perl -Dx>; in fact,
	2208	C<Perl_dump_eval> will dump the main root of the code being evaluated,
	2209	exactly like C<-Dx>.
	2210
	2211	Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
	2212	op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
	2213	subroutines in a package like so: (Thankfully, these are all xsubs, so
	2214	there is no op tree)
	2215
	2216	(gdb) print Perl_dump_packsubs(PL_defstash)
	2217
	2218	SUB attributes::bootstrap = (xsub 0x811fedc 0)
	2219
	2220	SUB UNIVERSAL::can = (xsub 0x811f50c 0)
	2221
	2222	SUB UNIVERSAL::isa = (xsub 0x811f304 0)
	2223
	2224	SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)
	2225
	2226	SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
	2227
	2228	and C<Perl_dump_all>, which dumps all the subroutines in the stash and
	2229	the op tree of the main root.
	2230
	2231	=head1 How multiple interpreters and concurrency are supported
	2232
	2233	=head2 Background and PERL_IMPLICIT_CONTEXT
	2234
	2235	The Perl interpreter can be regarded as a closed box: it has an API
	2236	for feeding it code or otherwise making it do things, but it also has
	2237	functions for its own use. This smells a lot like an object, and
	2238	there are ways for you to build Perl so that you can have multiple
	2239	interpreters, with one interpreter represented either as a C structure,
	2240	or inside a thread-specific structure. These structures contain all
	2241	the context, the state of that interpreter.
	2242
	2243	One macro controls the major Perl build flavor: MULTIPLICITY. The
	2244	MULTIPLICITY build has a C structure that packages all the interpreter
	2245	state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
	2246	normally defined, and enables the support for passing in a "hidden" first
	2247	argument that represents the interpreter's state. MULTIPLICITY makes
	2248	multi-threaded perls possible (with the ithreads threading model, related
	2249	to the macro USE_ITHREADS.)
	2250
	2251	Two other "encapsulation" macros are the PERL_GLOBAL_STRUCT and
	2252	PERL_GLOBAL_STRUCT_PRIVATE (the latter turns on the former, and the
	2253	former turns on MULTIPLICITY.) The PERL_GLOBAL_STRUCT causes all the
	2254	internal variables of Perl to be wrapped inside a single global struct,
	2255	struct perl_vars, accessible as (globals) &PL_Vars or PL_VarsPtr or
	2256	the function Perl_GetVars(). PERL_GLOBAL_STRUCT_PRIVATE goes
	2257	one step further: there is still a single struct (allocated in main()
	2258	either from heap or from stack), but there are no global data symbols
	2259	pointing to it. In either case the global struct should be initialized
	2260	as the very first thing in main() using Perl_init_global_struct() and
	2261	correspondingly torn down after perl_free() using Perl_free_global_struct();
	2262	please see F<miniperlmain.c> for usage details. You may also need
	2263	to use C<dVAR> in your code to "declare the global variables"
	2264	when you are using them. C<dTHX> does this for you automatically.
	2265
	2266	To see whether you have non-const data you can use a BSD (or GNU)
	2267	compatible C<nm>:
	2268
	2269	    nm libperl.a | grep -v ' [TURtr] '
	2270
	2271	If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
	2272	you have non-const data. The symbols the C<grep> removed are as follows:
	2273	C<Tt> are I<text>, or code; C<Rr> are I<read-only> (const) data;
	2274	and C<U> is I<undefined>, that is, external symbols referred to.
	2275
	2276	The test F<t/porting/libperl.t> does this kind of symbol sanity
	2277	checking on C<libperl.a>.
	2278
	2279	For backward compatibility reasons defining just PERL_GLOBAL_STRUCT
	2280	doesn't actually hide all symbols inside a big global struct: some
	2281	PerlIO_xxx vtables are left visible. The PERL_GLOBAL_STRUCT_PRIVATE
	2282	then hides everything (see how the PERLIO_FUNCS_DECL is used).
	2283
	2284	All this obviously requires a way for the Perl internal functions to be
	2285	either subroutines taking some kind of structure as the first
	2286	argument, or subroutines taking nothing as the first argument. To
	2287	enable these two very different ways of building the interpreter,
	2288	the Perl source (as it does in so many other situations) makes heavy
	2289	use of macros and subroutine naming conventions.
	2290
	2291	First problem: deciding which functions will be public API functions and
	2292	which will be private. All functions whose names begin C<S_> are private
	2293	(think "S" for "secret" or "static"). All other functions begin with
	2294	"Perl_", but just because a function begins with "Perl_" does not mean it is
	2295	part of the API. (See L</Internal
	2296	Functions>.) The easiest way to be B<sure> a
	2297	function is part of the API is to find its entry in L<perlapi>.
	2298	If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
	2299	think it should be (i.e., you need it for your extension), send mail via
	2300	L<perlbug> explaining why you think it should be.
	2301
	2302	Second problem: there must be a syntax so that the same subroutine
	2303	declarations and calls can pass a structure as their first argument,
	2304	or pass nothing. To solve this, the subroutines are named and
	2305	declared in a particular way. Here's a typical start of a static
	2306	function used within the Perl guts:
	2307
	2308	STATIC void
	2309	S_incline(pTHX_ char *s)
	2310
	2311	STATIC becomes "static" in C, and may be #define'd to nothing in some
	2312	configurations in the future.
	2313
	2314	A public function (i.e. part of the internal API, but not necessarily
	2315	sanctioned for use in extensions) begins like this:
	2316
	2317	void
	2318	Perl_sv_setiv(pTHX_ SV* dsv, IV num)
	2319
	2320	C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
	2321	details of the interpreter's context. THX stands for "thread", "this",
	2322	or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
	2323	The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
	2324	or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
	2325	their variants.
	2326
	2327	When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no
	2328	first argument containing the interpreter's context. The trailing underscore
	2329	in the pTHX_ macro indicates that the macro expansion needs a comma
	2330	after the context argument because other arguments follow it. If
	2331	PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
	2332	subroutine is not prototyped to take the extra argument. The form of the
	2333	macro without the trailing underscore is used when there are no additional
	2334	explicit arguments.
	2335
	2336	When a core function calls another, it must pass the context. This
	2337	is normally hidden via macros. Consider C<sv_setiv>. It expands into
	2338	something like this:
	2339
	2340	    #ifdef PERL_IMPLICIT_CONTEXT
	2341	      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
	2342	      /* can't do this for vararg functions, see below */
	2343	    #else
	2344	      #define sv_setiv           Perl_sv_setiv
	2345	    #endif
	2346
	2347	This works well, and means that XS authors can gleefully write:
	2348
	2349	sv_setiv(foo, bar);
	2350
	2351	and still have it work under all the modes Perl could have been
	2352	compiled with.
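
The macro arrangement described above can be imitated in a few lines of
standalone C. The following is only a sketch: it includes no Perl headers,
every C<demo_>/C<Demo_>/C<DEMO_> name is hypothetical, and the real
definitions in F<perl.h> and F<embed.h> are considerably more involved.
It shows a "long" function name whose prototype mentions the context
macro, and a "short" macro name that passes the context along silently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter structure. */
typedef struct demo_interp { long last_iv; } demo_interp;

/* Remove this line to model a build without an implicit context. */
#define DEMO_IMPLICIT_CONTEXT

#ifdef DEMO_IMPLICIT_CONTEXT
#  define demo_pTHX_   demo_interp *my_perl,   /* prototype side */
#  define demo_aTHX_   my_perl,                /* argument side  */
#  define DEMO_STATE   (*my_perl)
#else
static demo_interp demo_global_state;          /* one global interpreter */
#  define demo_pTHX_
#  define demo_aTHX_
#  define DEMO_STATE   demo_global_state
#endif

/* "Long" name: the prototype always starts with the context macro. */
static void Demo_set_iv(demo_pTHX_ long *slot, long value)
{
    assert(slot != NULL);
    *slot = value;
    DEMO_STATE.last_iv = value;
}

/* "Short" name: callers never spell out the context themselves. */
#define demo_set_iv(slot, value) Demo_set_iv(demo_aTHX_ slot, value)

/* A function that received the context and hands it on implicitly. */
static long demo_caller(demo_pTHX_ long value)
{
    long slot = 0;
    demo_set_iv(&slot, value);
    return slot + DEMO_STATE.last_iv;
}
```

With C<DEMO_IMPLICIT_CONTEXT> defined, C<demo_caller> takes the interpreter
pointer as its first argument; without it, the same source compiles to
functions that take no context argument at all.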
	2353
	2354	This doesn't work so cleanly for varargs functions, though, as macros
	2355	imply that the number of arguments is known in advance. Instead we
	2356	either need to spell them out fully, passing C<aTHX_> as the first
	2357	argument (the Perl core tends to do this with functions like
	2358	Perl_warner), or use a context-free version.
	2359
	2360	The context-free version of Perl_warner is called
	2361	Perl_warner_nocontext, and does not take the extra argument. Instead
	2362	it does dTHX; to get the context from thread-local storage. We
	2363	C<#define warner Perl_warner_nocontext> so that extensions get source
	2364	compatibility at the expense of performance. (Passing an arg is
	2365	cheaper than grabbing it from thread-local storage.)
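
The trade-off between the two spellings of a varargs function can be
sketched in standalone C as follows. This is an illustration under stated
assumptions, not Perl's code: the C<demo_> and C<Demo_> names are
hypothetical, and a plain static pointer stands in for the thread-local
slot so that the sketch stays portable.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interpreter that just remembers its last warning. */
typedef struct demo_interp { char last_warning[64]; } demo_interp;

/* Models the per-thread context slot (a static keeps this portable). */
static demo_interp *demo_current;

static void demo_set_context(demo_interp *interp) { demo_current = interp; }
static demo_interp *demo_get_context(void) { return demo_current; }

/* Shared worker: formats the message into the given interpreter. */
static void demo_vwarner(demo_interp *my_perl, const char *pat, va_list *args)
{
    assert(my_perl != NULL && pat != NULL);
    vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning, pat, *args);
}

/* Context-taking form: the caller must spell out the context argument. */
static void Demo_warner(demo_interp *my_perl, const char *pat, ...)
{
    va_list args;
    va_start(args, pat);
    demo_vwarner(my_perl, pat, &args);
    va_end(args);
}

/* Context-free form: looks the context up itself, at some extra cost. */
static void Demo_warner_nocontext(const char *pat, ...)
{
    demo_interp *my_perl = demo_get_context();
    va_list args;
    va_start(args, pat);
    demo_vwarner(my_perl, pat, &args);
    va_end(args);
}
```

Both forms end up in the same worker; the only difference is where the
context pointer comes from.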
	2366
	2367	You can ignore [pad]THXx when browsing the Perl headers/sources.
	2368	Those are strictly for use within the core. Extensions and embedders
	2369	need only be aware of [pad]THX.
	2370
	2371	=head2 So what happened to dTHR?
	2372
	2373	C<dTHR> was introduced in perl 5.005 to support the older thread model.
	2374	The older thread model now uses the C<THX> mechanism to pass context
	2375	pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
	2376	later still have it for backward source compatibility, but it is defined
	2377	to be a no-op.
	2378
	2379	=head2 How do I use all this in extensions?
	2380
	2381	When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
	2382	any functions in the Perl API will need to pass the initial context
	2383	argument somehow. The kicker is that you will need to write it in
	2384	such a way that the extension still compiles when Perl hasn't been
	2385	built with PERL_IMPLICIT_CONTEXT enabled.
	2386
	2387	There are three ways to do this. First, the easy but inefficient way,
	2388	which is also the default, in order to maintain source compatibility
	2389	with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
	2390	and aTHX_ macros to call a function that will return the context.
	2391	Thus, something like:
	2392
	2393	sv_setiv(sv, num);
	2394
	2395	in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
	2396	in effect:
	2397
	2398	Perl_sv_setiv(Perl_get_context(), sv, num);
	2399
	2400	or to this otherwise:
	2401
	2402	Perl_sv_setiv(sv, num);
	2403
	2404	You don't have to do anything new in your extension to get this; since
	2405	the Perl library provides Perl_get_context(), it will all just
	2406	work.
	2407
	2408	The second, more efficient way is to use the following template for
	2409	your Foo.xs:
	2410
	2411	    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
	2412	    #include "EXTERN.h"
	2413	    #include "perl.h"
	2414	    #include "XSUB.h"
	2415
	2416	    STATIC void my_private_function(int arg1, int arg2);
	2417
	2418	    STATIC void
	2419	    my_private_function(int arg1, int arg2)
	2420	    {
	2421	        dTHX;       /* fetch context */
	2422	        ... call many Perl API functions ...
	2423	    }
	2424
	2425	    [... etc ...]
	2426
	2427	    MODULE = Foo            PACKAGE = Foo
	2428
	2429	    /* typical XSUB */
	2430
	2431	    void
	2432	    my_xsub(arg)
	2433	            int     arg
	2434	        CODE:
	2435	            my_private_function(arg, 10);
	2436
	2437	Note that the only two changes from the normal way of writing an
	2438	extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
	2439	including the Perl headers, followed by a C<dTHX;> declaration at
	2440	the start of every function that will call the Perl API. (You'll
	2441	know which functions need this, because the C compiler will complain
	2442	that there's an undeclared identifier in those functions.) No changes
	2443	are needed for the XSUBs themselves, because the XS() macro is
	2444	correctly defined to pass in the implicit context if needed.
	2445
	2446	The third, even more efficient way is to ape how it is done within
	2447	the Perl guts:
	2448
	2449
	2450	    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
	2451	    #include "EXTERN.h"
	2452	    #include "perl.h"
	2453	    #include "XSUB.h"
	2454
	2455	    /* pTHX_ only needed for functions that call Perl API */
	2456	    STATIC void my_private_function(pTHX_ int arg1, int arg2);
	2457
	2458	    STATIC void
	2459	    my_private_function(pTHX_ int arg1, int arg2)
	2460	    {
	2461	        /* dTHX; not needed here, because THX is an argument */
	2462	        ... call Perl API functions ...
	2463	    }
	2464
	2465	    [... etc ...]
	2466
	2467	    MODULE = Foo            PACKAGE = Foo
	2468
	2469	    /* typical XSUB */
	2470
	2471	    void
	2472	    my_xsub(arg)
	2473	            int     arg
	2474	        CODE:
	2475	            my_private_function(aTHX_ arg, 10);
	2476
	2477	This implementation never has to fetch the context using a function
	2478	call, since it is always passed as an extra argument. Depending on
	2479	your needs for simplicity or efficiency, you may mix the previous
	2480	two approaches freely.
	2481
	2482	Never add a comma after C<pTHX> yourself--always use the form of the
	2483	macro with the underscore for functions that take explicit arguments,
	2484	or the form without the underscore for functions with no explicit arguments.
	2485
	2486	If one is compiling Perl with the C<-DPERL_GLOBAL_STRUCT> the C<dVAR>
	2487	definition is needed if the Perl global variables (see F<perlvars.h>
	2488	or F<globvar.sym>) are accessed in the function and C<dTHX> is not
	2489	used (the C<dTHX> includes the C<dVAR> if necessary). One notices
	2490	the need for C<dVAR> only with the said compile-time define, because
	2491	otherwise the Perl global variables are visible as-is.
	2492
	2493	=head2 Should I do anything special if I call perl from multiple threads?
	2494
	2495	If you create interpreters in one thread and then proceed to call them in
	2496	another, you need to make sure perl's own Thread Local Storage (TLS) slot is
	2497	initialized correctly in each of those threads.
	2498
	2499	The C<perl_alloc> and C<perl_clone> API functions will automatically set
	2500	the TLS slot to the interpreter they created, so that there is no need to do
	2501	anything special if the interpreter is always accessed in the same thread that
	2502	created it, and that thread did not create or call any other interpreters
	2503	afterwards. If that is not the case, you have to set the TLS slot of the
	2504	thread before calling any functions in the Perl API on that particular
	2505	interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
	2506	thread as the first thing you do:
	2507
	2508	/* do this before doing anything else with some_perl */
	2509	PERL_SET_CONTEXT(some_perl);
	2510
	2511	... other Perl API calls on some_perl go here ...
	2512
	2513	=head2 Future Plans and PERL_IMPLICIT_SYS
	2514
	2515	Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
	2516	that the interpreter knows about itself and pass it around, so too are
	2517	there plans to allow the interpreter to bundle up everything it knows
	2518	about the environment it's running on. This is enabled with the
	2519	PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
	2520	Windows.
	2521
	2522	This makes it possible to provide an extra pointer (called the "host"
	2523	environment) for all the system calls, so that the system-level
	2524	code can maintain its own state, broken down into
	2525	seven C structures. These are thin wrappers around the usual system
	2526	calls (see F<win32/perllib.c>) for the default perl executable, but for a
	2527	more ambitious host (like the one that would do fork() emulation) all
	2528	the extra work needed to pretend that different interpreters are
	2529	actually different "processes" would be done here.
	2530
	2531	The Perl engine/interpreter and the host are orthogonal entities.
	2532	There could be one or more interpreters in a process, and one or
	2533	more "hosts", with free association between them.
	2534
	2535	=head1 Internal Functions
	2536
	2537	All of Perl's internal functions which will be exposed to the outside
	2538	world are prefixed by C<Perl_> so that they will not conflict with XS
	2539	functions or functions used in a program in which Perl is embedded.
	2540	Similarly, all global variables begin with C<PL_>. (By convention,
	2541	static functions start with C<S_>.)
	2542
	2543	Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
	2544	either with or without the C<Perl_> prefix, thanks to a bunch of defines
	2545	that live in F<embed.h>. Note that extension code should I<not> set
	2546	C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
	2547	breakage of the XS in each new perl release.
	2548
	2549	The file F<embed.h> is generated automatically from
	2550	F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
	2551	header files for the internal functions, generates the documentation
	2552	and a lot of other bits and pieces. It's important that when you add
	2553	a new function to the core or change an existing one, you change the
	2554	data in the table in F<embed.fnc> as well. Here's a sample entry from
	2555	that table:
	2556
	2557	    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval
	2558
	2559	The second column is the return type, the third column the name. Columns
	2560	after that are the arguments. The first column is a set of flags:
	2561
	2562	=over 3
	2563
	2564	=item A
	2565
	2566	This function is a part of the public
	2567	API. All such functions should also
	2568	have 'd'; very few do not.
	2569
	2570	=item p
	2571
	2572	This function has a C<Perl_> prefix; i.e. it is defined as
	2573	C<Perl_av_fetch>.
	2574
	2575	=item d
	2576
	2577	This function has documentation using the C<apidoc> feature which we'll
	2578	look at in a second. Some functions have 'd' but not 'A'; docs are good.
	2579
	2580	=back
	2581
	2582	Other available flags are:
	2583
	2584	=over 3
	2585
	2586	=item s
	2587
	2588	This is a static function and is defined as C<STATIC S_whatever>, and
	2589	usually called within the sources as C<whatever(...)>.
	2590
	2591	=item n
	2592
	2593	This does not need an interpreter context, so the definition has no
	2594	C<pTHX>, and it follows that callers don't use C<aTHX>. (See
	2595	L</Background and PERL_IMPLICIT_CONTEXT>.)
	2596
	2597	=item r
	2598
	2599	This function never returns; C<croak>, C<exit> and friends.
	2600
	2601	=item f
	2602
	2603	This function takes a variable number of arguments, C<printf> style.
	2604	The argument list should end with C<...>, like this:
	2605
	2606	    Afprd   |void   |croak          |const char* pat|...
	2607
	2608	=item M
	2609
	2610	This function is part of the experimental development API, and may change
	2611	or disappear without notice.
	2612
	2613	=item o
	2614
	2615	This function should not have a compatibility macro to define, say,
	2616	C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
	2617
	2618	=item x
	2619
	2620	This function isn't exported out of the Perl core.
	2621
	2622	=item m
	2623
	2624	This is implemented as a macro.
	2625
	2626	=item X
	2627
	2628	This function is explicitly exported.
	2629
	2630	=item E
	2631
	2632	This function is visible to extensions included in the Perl core.
	2633
	2634	=item b
	2635
	2636	Binary backward compatibility; this function is a macro but also has
	2637	a C<Perl_> implementation (which is exported).
	2638
	2639	=item others
	2640
	2641	See the comments at the top of C<embed.fnc> for others.
	2642
	2643	=back
	2644
	2645	If you edit F<embed.pl> or F<embed.fnc>, you will need to run
	2646	C<make regen_headers> to force a rebuild of F<embed.h> and other
	2647	auto-generated files.
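
To make the column layout of an F<embed.fnc> entry concrete, here is a
small standalone C sketch that splits such a line on C<|> and inspects the
flags column. It is purely illustrative: the real processing is done by
F<embed.pl>, and the helper names C<demo_embed_field> and
C<demo_embed_has_flag> are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy field number `index` (0-based, '|'-separated) of `entry` into
 * `out`, trimming surrounding blanks.  Returns 1 on success, 0 if there
 * is no such field or it does not fit into `out`. */
static int demo_embed_field(const char *entry, size_t index,
                            char *out, size_t out_size)
{
    const char *start = entry;
    const char *end;
    size_t len;

    assert(entry != NULL && out != NULL);

    /* Step over `index` separators to reach the wanted field. */
    while (index > 0) {
        start = strchr(start, '|');
        if (start == NULL)
            return 0;
        start++;
        index--;
    }
    end = strchr(start, '|');
    if (end == NULL)
        end = start + strlen(start);

    while (start < end && (*start == ' ' || *start == '\t'))
        start++;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
        end--;

    len = (size_t)(end - start);
    if (len + 1 > out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* True if the flags column (field 0) of `entry` contains `flag`. */
static int demo_embed_has_flag(const char *entry, char flag)
{
    char flags[32];
    return demo_embed_field(entry, 0, flags, sizeof flags)
        && strchr(flags, flag) != NULL;
}
```

For the C<av_fetch> entry shown earlier, field 0 holds the flags, field 1
the return type, field 2 the name, and the remaining fields the arguments.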
	2648
	2649	=head2 Formatted Printing of IVs, UVs, and NVs
	2650
	2651	If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
	2652	formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
	2653	following macros for portability:
	2654
	2655	    IVdf    IV in decimal
	2656	    UVuf    UV in decimal
	2657	    UVof    UV in octal
	2658	    UVxf    UV in hexadecimal
	2659	    NVef    NV %e-like
	2660	    NVff    NV %f-like
	2661	    NVgf    NV %g-like
	2662
	2663	These will take care of 64-bit integers and long doubles.
	2664	For example:
	2665
	2666	    printf("IV is %" IVdf "\n", iv);
	2667
	2668	The IVdf will expand to whatever is the correct format for the IVs.
	2669
	2670	Note that there are different "long doubles": Perl will use
	2671	whatever the compiler has.
	2672
	2673	If you are printing addresses of pointers, use UVxf combined
	2674	with PTR2UV(); do not use %lx or %p.
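
The same string-pasting technique exists in standard C as the
F<inttypes.h> format macros, which makes it easy to try outside of Perl.
The sketch below is an analogy only: C<demo_IV>, C<demo_UV>,
C<DEMO_IVdf> and C<DEMO_UVxf> are hypothetical stand-ins fixed at 64
bits, whereas the real macros expand to whatever Configure determined
for your build.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for IV/UV and their format macros. */
typedef int64_t  demo_IV;
typedef uint64_t demo_UV;
#define DEMO_IVdf PRId64
#define DEMO_UVxf PRIx64

/* The format macro is pasted between adjacent string literals. */
static int demo_format_iv(char *buf, size_t size, demo_IV iv)
{
    assert(buf != NULL);
    return snprintf(buf, size, "IV is %" DEMO_IVdf, iv);
}

/* Addresses: convert the pointer to an unsigned integer, print in hex. */
static int demo_format_address(char *buf, size_t size, const void *ptr)
{
    assert(buf != NULL);
    return snprintf(buf, size, "0x%" DEMO_UVxf, (demo_UV)(uintptr_t)ptr);
}
```

Note the spaces around the macro inside the format string; they keep the
pasted literal unambiguous for compilers that also accept C++.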
	2675
	2676	=head2 Pointer-To-Integer and Integer-To-Pointer
	2677
	2678	Because pointer size does not necessarily equal integer size,
	2679	use the following macros to do it right.
	2680
	2681	PTR2UV(pointer)
	2682	PTR2IV(pointer)
	2683	PTR2NV(pointer)
	2684	INT2PTR(pointertotype, integer)
	2685
	2686	For example:
	2687
	2688	    IV  iv = ...;
	2689	    SV *sv = INT2PTR(SV*, iv);
	2690
	2691	and
	2692
	2693	AV *av = ...;
	2694	UV uv = PTR2UV(av);
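
The round trip that these macros make safe can be demonstrated in
standalone C with C<uintptr_t>. This is a sketch under the assumption
that an unsigned pointer-sized integer plays the role of a UV; the
C<DEMO_> macros and C<demo_> names are hypothetical and are not how
F<perl.h> defines the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: an integer wide enough to hold a pointer. */
typedef uintptr_t demo_UV;
#define DEMO_PTR2UV(p)         ((demo_UV)(uintptr_t)(p))
#define DEMO_INT2PTR(type, i)  ((type)(uintptr_t)(i))

typedef struct demo_sv { int refcnt; } demo_sv;

/* Store a pointer in an integer, e.g. to keep it in an integer slot. */
static demo_UV demo_stash_pointer(demo_sv *sv)
{
    assert(sv != NULL);
    return DEMO_PTR2UV(sv);
}

/* Recover the pointer; note that the first argument is a pointer type. */
static demo_sv *demo_recover_pointer(demo_UV uv)
{
    return DEMO_INT2PTR(demo_sv *, uv);
}
```

The first argument of the integer-to-pointer macro is the full pointer
type, which is why the example above passes C<demo_sv *> rather than
C<demo_sv>.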
	2695
	2696	=head2 Exception Handling
	2697
	2698	There are a couple of macros to do very basic exception handling in XS
	2699	modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
	2700	be able to use these macros:
	2701
	2702	#define NO_XSLOCKS
	2703	#include "XSUB.h"
	2704
	2705	You can use these macros if you call code that may croak, but you need
	2706	to do some cleanup before giving control back to Perl. For example:
	2707
	2708	    dXCPT;    /* set up necessary variables */
	2709
	2710	    XCPT_TRY_START {
	2711	      code_that_may_croak();
	2712	    } XCPT_TRY_END
	2713
	2714	    XCPT_CATCH
	2715	    {
	2716	      /* do cleanup here */
	2717	      XCPT_RETHROW;
	2718	    }
	2719
	2720	Note that you always have to rethrow an exception that has been
	2721	caught. Using these macros, it is not possible to just catch the
	2722	exception and ignore it. If you have to ignore the exception, you
	2723	have to use the C<call_*> function.
	2724
	2725	The advantage of using the above macros is that you don't have
	2726	to set up an extra function for C<call_*>, and that using these
	2727	macros is faster than using C<call_*>.
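
The "catch, clean up, rethrow" shape can be tried out in standalone C
with C<setjmp> and C<longjmp>. The sketch below does not use the Perl
macros and makes no claim about how they are implemented; all C<demo_>
names are hypothetical, and a chain of handler pointers is assumed as the
way an inner handler finds the outer one.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf *demo_top_env;      /* innermost active handler, if any */
static int      demo_pending_code; /* exception code being propagated  */
static int      demo_cleanups_run; /* how often the cleanup code ran   */

/* Stand-in for croak(): unwind to the innermost handler. */
static void demo_croak(int code)
{
    assert(demo_top_env != NULL && code != 0);
    demo_pending_code = code;
    longjmp(*demo_top_env, 1);
}

static void demo_code_that_may_croak(int fail)
{
    if (fail)
        demo_croak(fail);
}

/* Same shape as the XCPT_* example: try, then clean up and rethrow. */
static void demo_guarded_call(int fail)
{
    jmp_buf env;
    jmp_buf *outer = demo_top_env;     /* set before setjmp, never changed */

    demo_top_env = &env;
    if (setjmp(env) == 0) {
        demo_code_that_may_croak(fail);
        demo_top_env = outer;          /* normal exit: pop our handler */
    }
    else {
        demo_top_env = outer;          /* pop before rethrowing */
        demo_cleanups_run++;           /* do cleanup here */
        demo_croak(demo_pending_code); /* rethrow to the outer handler */
    }
}

/* Outermost handler, playing the role of the calling Perl code.
 * Returns 0 if nothing was thrown, otherwise the exception code. */
static int demo_run(int fail)
{
    jmp_buf env;

    demo_top_env = &env;
    if (setjmp(env) == 0) {
        demo_guarded_call(fail);
        demo_pending_code = 0;
    }
    demo_top_env = NULL;
    return demo_pending_code;
}
```

The cleanup code runs only on the failure path, and the outer handler
still sees the original exception code, which is the behaviour the
rethrow requirement is there to guarantee.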
	2728
	2729	=head2 Source Documentation
	2730
	2731	There's an effort going on to document the internal functions and
	2732	automatically produce reference manuals from them -- L<perlapi> is one
	2733	such manual which details all the functions which are available to XS
	2734	writers. L<perlintern> is the autogenerated manual for the functions
	2735	which are not part of the API and are supposedly for internal use only.
	2736
	2737	Source documentation is created by putting POD comments into the C
	2738	source, like this:
	2739
	2740	/*
	2741	=for apidoc sv_setiv
	2742
	2743	Copies an integer into the given SV. Does not handle 'set' magic. See
	2744	C<sv_setiv_mg>.
	2745
	2746	=cut
	2747	*/
	2748
	2749	Please try and supply some documentation if you add functions to the
	2750	Perl core.
	2751
	2752	=head2 Backwards compatibility
	2753
	2754	The Perl API changes over time. New functions are
	2755	added or the interfaces of existing functions are
	2756	changed. The C<Devel::PPPort> module tries to
	2757	provide compatibility code for some of these changes, so XS writers don't
	2758	have to code it themselves when supporting multiple versions of Perl.
	2759
	2760	C<Devel::PPPort> generates a C header file F<ppport.h> that can also
	2761	be run as a Perl script. To generate F<ppport.h>, run:
	2762
	2763	perl -MDevel::PPPort -eDevel::PPPort::WriteFile
	2764
	2765	Besides checking existing XS code, the script can also be used to retrieve
	2766	compatibility information for various API calls using the C<--api-info>
	2767	command line switch. For example:
	2768
	2769	% perl ppport.h --api-info=sv_magicext
	2770
	2771	For details, see C<perldoc ppport.h>.
	2772
	2773	=head1 Unicode Support
	2774
	2775	Perl 5.6.0 introduced Unicode support. It's important for porters and XS
	2776	writers to understand this support and make sure that the code they
	2777	write does not corrupt Unicode data.
	2778
	2779	=head2 What B<is> Unicode, anyway?
	2780
	2781	In the olden, less enlightened times, we all used to use ASCII. Most of
	2782	us did, anyway. The big problem with ASCII is that it's American. Well,
	2783	no, that's not actually the problem; the problem is that it's not
	2784	particularly useful for people who don't use the Roman alphabet. What
	2785	used to happen was that particular languages would stick their own
	2786	alphabet in the upper range of the sequence, between 128 and 255. Of
	2787	course, we then ended up with plenty of variants that weren't quite
	2788	ASCII, and the whole point of it being a standard was lost.
	2789
	2790	Worse still, if you've got a language like Chinese or
	2791	Japanese that has hundreds or thousands of characters, then you really
	2792	can't fit them into a mere 256, so they had to forget about ASCII
	2793	altogether, and build their own systems using pairs of numbers to refer
	2794	to one character.
	2795
	2796	To fix this, some people formed Unicode, Inc. and
	2797	produced a new character set containing all the characters you can
	2798	possibly think of and more. There are several ways of representing these
	2799	characters, and the one Perl uses is called UTF-8. UTF-8 uses
	2800	a variable number of bytes to represent a character. You can learn more
	2801	about Unicode and Perl's Unicode model in L<perlunicode>.
	2802
	2803	(On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
	2804	UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
	2805	UTF-EBCDIC is like UTF-8, but the details are different. The macros
	2806	hide the differences from you; just remember that the particular numbers
	2807	and bit patterns presented below will differ in UTF-EBCDIC.)
	2808
	2809	=head2 How can I recognise a UTF-8 string?
	2810
	2811	You can't. This is because UTF-8 data is stored in bytes just like
	2812	non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
	2813	capital E with a grave accent, is represented by the two bytes
	2814	C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
	2815	has that byte sequence as well. So you can't tell just by looking -- this
	2816	is what makes Unicode input an interesting problem.
	2817
	2818	In general, you either have to know what you're dealing with, or you
	2819	have to guess. The API function C<is_utf8_string> can help; it'll tell
	2820	you if a string contains only valid UTF-8 characters, and the chances
	2821	of a non-UTF-8 string looking like valid UTF-8 become very small very
	2822	quickly with increasing string length. On a character-by-character
	2823	basis, C<isUTF8_CHAR>
	2824	will tell you whether the current character in a string is valid UTF-8.
	2825
	2826	=head2 How does UTF-8 represent Unicode characters?
	2827
	2828	As mentioned above, UTF-8 uses a variable number of bytes to store a
	2829	character. Characters with values 0...127 are stored in one
	2830	byte, just like good ol' ASCII. Character 128 is stored as
	2831	C<v194.128>; this continues up to character 191, which is
	2832	C<v194.191>. Now we've run out of bits (191 is binary
	2833	C<10111111>) so we move on; character 192 is C<v195.128>. And
	2834	so it goes on, moving to three bytes at character 2048.
	2835	L<perlunicode/Unicode Encodings> has pictures of how this works.
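
The byte values quoted above can be checked with a few lines of
standalone C. This is a minimal sketch of the standard UTF-8 bit layout
for code points up to C<0x10FFFF> on an ASCII platform; the function name
C<demo_encode_utf8> is hypothetical, it performs no checks for surrogates,
and it is not Perl's own encoder (which also handles larger values and
UTF-EBCDIC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point `cp` into `out`; returns the number of bytes
 * written, or 0 if `cp` is outside the range this sketch handles. */
static size_t demo_encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Feeding it 128, 191 and 192 yields the two-byte sequences 194.128,
194.191 and 195.128, and 2048 is the first value that needs three bytes.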
	2836
	2837	Assuming you know you're dealing with a UTF-8 string, you can find out
	2838	how long the first character in it is with the C<UTF8SKIP> macro:
	2839
	2840	char *utf = "\305\233\340\240\201";
	2841	I32 len;
	2842
	2843	len = UTF8SKIP(utf); /* len is 2 here */
	2844	utf += len;
	2845	len = UTF8SKIP(utf); /* len is 3 here */
	2846
	2847	Another way to skip over characters in a UTF-8 string is to use
	2848	C<utf8_hop>, which takes a string and a number of characters to skip
	2849	over. You're on your own about bounds checking, though, so don't use it
	2850	lightly.
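
What a skip macro has to compute can be sketched without Perl headers:
the length of a character follows from its first byte alone. The code
below is a simplified, hypothetical equivalent (C<demo_utf8_skip> and
C<demo_utf8_length> are not Perl API names) covering sequences of up to
four bytes on an ASCII platform; it also shows the kind of bounds check
you have to add yourself when hopping through a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes in the character that starts with byte `lead`. */
static size_t demo_utf8_skip(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;   /* continuation or unsupported byte: step over it */
}

/* Count the characters in a NUL-terminated UTF-8 string. */
static size_t demo_utf8_length(const char *s)
{
    const char *end;
    size_t count = 0;

    assert(s != NULL);
    end = s + strlen(s);
    while (s < end) {
        size_t skip = demo_utf8_skip((unsigned char)*s);
        if (skip > (size_t)(end - s))     /* truncated final character */
            skip = (size_t)(end - s);     /* never run past the buffer */
        s += skip;
        count++;
    }
    return count;
}
```

Applied to the five-byte string from the earlier example, the skips are
2 and then 3, giving two characters in total.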
	2851
	2852	All bytes in a multi-byte UTF-8 character will have the high bit set,
	2853	so you can test if you need to do something special with this
	2854	character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
	2855	whether the byte is encoded as a single byte even in UTF-8):
	2856
	2857	    U8 *utf;
	2858	    U8 *utf_end; /* 1 beyond buffer pointed to by utf */
	2859	    UV uv;       /* Note: a UV, not a U8, not a char */
	2860	    STRLEN len;  /* length of character in bytes */
	2861
	2862	    if (!UTF8_IS_INVARIANT(*utf))
	2863	        /* Must treat this as UTF-8 */
	2864	        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
	2865	    else
	2866	        /* OK to treat this character as a byte */
	2867	        uv = *utf;
	2868
	2869	You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
	2870	value of the character; the inverse function C<uvchr_to_utf8> is available
	2871	for putting a UV into UTF-8:
	2872
	2873	    if (!UVCHR_IS_INVARIANT(uv))
	2874	        /* Must treat this as UTF8 */
	2875	        utf8 = uvchr_to_utf8(utf8, uv);
	2876	    else
	2877	        /* OK to treat this character as a byte */
	2878	        *utf8++ = uv;
	2879
	2880	You B<must> convert characters to UVs using the above functions if
	2881	you're ever in a situation where you have to match UTF-8 and non-UTF-8
	2882	characters. You may not skip over UTF-8 characters in this case. If you
	2883	do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
	2884	for instance, if your UTF-8 string contains C<v195.136>, and you skip
	2885	that character, you can never match a C<chr(200)> in a non-UTF-8 string.
	2886	So don't do that!
	2887
	2888	(Note that we don't have to test for invariant characters in the
	2889	examples above. The functions work on any well-formed UTF-8 input.
	2890	It's just that it's faster to avoid the function overhead when it's not
	2891	needed.)
	2892
	2893	=head2 How does Perl store UTF-8 strings?
	2894
	2895	Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
	2896	slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
	2897	string is internally encoded as UTF-8. Without it, the byte value is the
	2898	codepoint number and vice versa. This flag is only meaningful if the SV
	2899	is C<SvPOK> or immediately after stringification via C<SvPV> or a
	2900	similar macro. You can check and manipulate this flag with the
	2901	following macros:
	2902
	2903	SvUTF8(sv)
	2904	SvUTF8_on(sv)
	2905	SvUTF8_off(sv)
	2906
	2907	This flag has an important effect on Perl's treatment of the string: if
	2908	UTF-8 data is not properly distinguished, regular expressions,
	2909	C<length>, C<substr> and other string handling operations will have
	2910	undesirable (wrong) results.
	2911
	2912	The problem comes when you have, for instance, a string that isn't
	2913	flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
	2914	especially when combining non-UTF-8 and UTF-8 strings.
	2915
	2916	Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
	2917	need to be sure you don't accidentally knock it off while you're
	2918	manipulating SVs. More specifically, you cannot expect to do this:
	2919
	2920	SV *sv;
	2921	SV *nsv;
	2922	STRLEN len;
	2923	char *p;
	2924
	2925	p = SvPV(sv, len);
	2926	frobnicate(p);
	2927	nsv = newSVpvn(p, len);
	2928
	2929	The C<char*> string does not tell you the whole story, and you can't
	2930	copy or reconstruct an SV just by copying the string value. Check if the
	2931	old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
	2932	accordingly:
	2933
	2934	    p = SvPV(sv, len);
	2935	    is_utf8 = SvUTF8(sv);
	2936	    frobnicate(p, is_utf8);
	2937	    nsv = newSVpvn(p, len);
	2938	    if (is_utf8)
	2939	        SvUTF8_on(nsv);
	2940
	2941	In the above, your C<frobnicate> function has been changed to be made
	2942	aware of whether or not it's dealing with UTF-8 data, so that it can
	2943	handle the string appropriately.
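
Why the flag has to travel with the bytes can be shown with a toy model
in standalone C. Everything here is hypothetical: C<demo_sv> is a tiny
struct with a fixed buffer, not Perl's SV, and the C<demo_> functions
merely mimic the idea of a length that depends on the UTF-8 flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy scalar: bytes, byte count, and a "these bytes are UTF-8" flag. */
typedef struct demo_sv {
    char   pv[16];
    size_t cur;
    int    utf8;
} demo_sv;

static void demo_sv_set(demo_sv *sv, const char *bytes, size_t len,
                        int is_utf8)
{
    assert(sv != NULL && bytes != NULL && len < sizeof sv->pv);
    memcpy(sv->pv, bytes, len);
    sv->pv[len] = '\0';
    sv->cur  = len;
    sv->utf8 = is_utf8;
}

/* Length in characters: depends on the flag, not only on the bytes. */
static size_t demo_sv_length(const demo_sv *sv)
{
    size_t i, chars = 0;

    assert(sv != NULL);
    if (!sv->utf8)
        return sv->cur;                   /* one byte per character */
    for (i = 0; i < sv->cur; i++)
        if (((unsigned char)sv->pv[i] & 0xC0) != 0x80)
            chars++;                      /* count non-continuation bytes */
    return chars;
}

/* Wrong: reconstructs the value from the bytes alone. */
static void demo_copy_bytes_only(demo_sv *dst, const demo_sv *src)
{
    demo_sv_set(dst, src->pv, src->cur, 0);
}

/* Right: carries the flag over along with the bytes. */
static void demo_copy_with_flag(demo_sv *dst, const demo_sv *src)
{
    demo_sv_set(dst, src->pv, src->cur, src->utf8);
}
```

Copying the two bytes of a single flagged character without the flag
produces a value that reports a length of 2 instead of 1.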
	2944
	2945	Just passing an SV to an XS function and copying the data of
	2946	the SV is not enough to copy the UTF8 flag; passing a bare
	2947	S<C<char *>> to an XS function is even less adequate.
	2948
	2949	For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
	2950	string in an SV is to be I<treated> as UTF-8. This takes into account
	2951	if the call to the XS function is being made from within the scope of
	2952	L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
	2953	UTF-8 string are to be exposed, rather than the character they
	2954	represent. But this pragma should only really be used for debugging and
	2955	perhaps low-level testing at the byte level. Hence most XS code need
	2956	not concern itself with this, but various areas of the perl core do need
	2957	to support it.
	2958
	2959	And this isn't the whole story. Starting in Perl v5.12, strings that
	2960	aren't encoded in UTF-8 may also be treated as Unicode under various
	2961	conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
	2962	This is only really a problem for characters whose ordinals are between
	2963	128 and 255, and their behavior varies under ASCII versus Unicode rules
	2964	in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
	2965	There is no published API for dealing with this, as it is subject to
	2966	change, but you can look at the code for C<pp_lc> in F<pp.c> for an
	2967	example as to how it's currently done.
	2968
	2969	=head2 How do I convert a string to UTF-8?
	2970
	2971	If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
	2972	the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
	2973	this is:
	2974
	2975	    sv_utf8_upgrade(sv);
	2976
	2977	However, you must not do something like this:
	2978
	2979	    if (!SvUTF8(left))
	2980	        sv_utf8_upgrade(left);
	2981
	2982	If you do this in a binary operator, you will actually change one of the
	2983	strings that came into the operator, and, while it shouldn't be noticeable
	2984	by the end user, it can cause problems in deficient code.
	2985
	2986	Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
	2987	string argument. This is useful for having the data available for
	2988	comparisons and so on, without harming the original SV. There's also
	2989	C<utf8_to_bytes> to go the other way, but naturally, this will fail if
	2990	the string contains any characters above 255 that can't be represented
	2991	in a single byte.
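
To make the "copy, don't modify" idea concrete, here is a minimal sketch in
plain C of what upgrading a native 8-bit string to a UTF-8 copy involves. It
is not perl's implementation: C<latin1_to_utf8_copy> is a hypothetical name,
the sketch assumes an ASCII (non-EBCDIC) platform, and it allocates with
C<malloc> rather than perl's allocator. The real C<bytes_to_utf8> takes the
length by pointer and updates it in place; see L<perlapi/bytes_to_utf8>.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a bytes-to-UTF-8 upgrade: returns a newly
 * allocated, NUL-terminated UTF-8 copy of s and stores its length in
 * *newlen. The input buffer is never touched. Caller frees the result. */
static unsigned char *latin1_to_utf8_copy(const unsigned char *s,
                                          size_t len, size_t *newlen)
{
    /* Worst case: every input byte becomes two output bytes. */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *d = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];                               /* invariant */
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));   /* lead byte */
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F)); /* continuation */
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}
```

Because the original bytes are left alone, the caller can compare or
concatenate using the copy and then discard it, without changing an operand
that came into the operator.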
	2992
	2993	=head2 How do I compare strings?
	2994
	2995	L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
	2996	comparison of two SVs, and handle UTF-8ness properly. Note, however,
	2997	that Unicode specifies a much fancier mechanism for collation, available
	2998	via the L<Unicode::Collate> module.
	2999
	3000	To compare two strings for equality/non-equality, you can just use
	3001	L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
	3002	except that both strings must be UTF-8 encoded or both must not be.
	3003
	3004	To compare two strings case-insensitively, use
	3005	L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
	3006	the same UTF-8ness).
	3007
	3008	=head2 Is there anything else I need to know?
	3009
	3010	Not really. Just remember these things:
	3011
	3012	=over 3
	3013
	3014	=item *
	3015
	3016	There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
	3017	or not. But you can tell if an SV is to be treated as UTF-8 by calling
	3018	C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
	3019	macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
	3020	be treated as such) by looking at its C<SvUTF8> flag (again after
	3021	stringifying it). Don't forget to set the flag if something should be
	3022	UTF-8.
	3023	Treat the flag as part of the PV, even though it's not -- if you pass on
	3024	the PV to somewhere, pass on the flag too.
	3025
	3026	=item *
	3027
	3028	If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
	3029	unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
	3030
	3031	=item *
	3032
	3033	When writing a character UV to a UTF-8 string, B<always> use
	3034	C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
	3035	you can use C<*s = uv>.
	3036
	3037	=item *
	3038
	3039	Mixing UTF-8 and non-UTF-8 strings is
	3040	tricky. Use C<bytes_to_utf8> to get
	3041	a new string which is UTF-8 encoded, and then combine them.
	3042
	3043	=back
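
The rules above about invariants and decoding can be pictured with a small
plain-C sketch. C<decode_one> is a hypothetical function whose calling shape
is loosely modeled on C<utf8_to_uvchr_buf> (start pointer, end-of-buffer
pointer, returned length); it is not perl's decoder, it assumes an ASCII
platform, and it makes no attempt to reject overlong sequences or surrogates.

```c
#include <stddef.h>

/* Hypothetical decoder: returns the code point starting at s and stores
 * the number of bytes consumed in *retlen. On a malformed or truncated
 * sequence it returns 0 and sets *retlen to 0. */
static unsigned long decode_one(const unsigned char *s,
                                const unsigned char *send, size_t *retlen)
{
    unsigned long cp;
    size_t need;
    size_t i;

    *retlen = 0;
    if (s >= send)
        return 0;

    if (s[0] < 0x80) {              /* invariant: the byte is the character */
        *retlen = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; cp = s[0] & 0x07;
    } else {
        return 0;                   /* stray continuation or invalid start */
    }

    if ((size_t)(send - s) < need)
        return 0;                   /* would read past the end of the buffer */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = need;
    return cp;
}
```

The end-of-buffer pointer is what lets the decoder refuse a sequence that is
cut off, instead of reading beyond the string.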
	3044
	3045	=head1 Custom Operators
	3046
	3047	Custom operator support is an experimental feature that allows you to
	3048	define your own ops. This is primarily to allow the building of
	3049	interpreters for other languages in the Perl core, but it also allows
	3050	optimizations through the creation of "macro-ops" (ops which perform the
	3051	functions of multiple ops which are usually executed together, such as
	3052	C<gvsv, gvsv, add>).
	3053
	3054	This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
	3055	core does not "know" anything special about this op type, and so it will
	3056	not be involved in any optimizations. This also means that you can
	3057	define your custom ops to be any op structure -- unary, binary, list and
	3058	so on -- you like.
	3059
	3060	It's important to know what custom operators won't do for you. They
	3061	won't let you add new syntax to Perl, directly. They won't even let you
	3062	add new keywords, directly. In fact, they won't change the way Perl
	3063	compiles a program at all. You have to do those changes yourself, after
	3064	Perl has compiled the program. You do this either by manipulating the op
	3065	tree using a C<CHECK> block and the C<B::Generate> module, or by adding
	3066	a custom peephole optimizer with the C<optimize> module.
	3067
	3068	When you do this, you replace ordinary Perl ops with custom ops by
	3069	creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
	3070	PP function. This should be defined in XS code, and should look like
	3071	the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
	3072	takes the appropriate number of values from the stack, and you are
	3073	responsible for adding stack marks if necessary.
	3074
	3075	You should also "register" your op with the Perl interpreter so that it
	3076	can produce sensible error and warning messages. Since it is possible to
	3077	have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
	3078	Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
	3079	it is dealing with. You should create an C<XOP> structure for each
	3080	ppaddr you use, set the properties of the custom op with
	3081	C<XopENTRY_set>, and register the structure against the ppaddr using
	3082	C<Perl_custom_op_register>. A trivial example might look like:
	3083
	3084	    static XOP my_xop;
	3085	    static OP *my_pp(pTHX);
	3086
	3087	    BOOT:
	3088	        XopENTRY_set(&my_xop, xop_name, "myxop");
	3089	        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
	3090	        Perl_custom_op_register(aTHX_ my_pp, &my_xop);
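
To picture why registration is keyed on the ppaddr, here is a toy model in
plain C. None of these names (C<xop_model>, C<model_register>,
C<model_op_name>) exist in perl; the fixed-size table and the fallback string
are likewise assumptions of the sketch, and it says nothing about how the
core actually stores its registrations.

```c
#include <stddef.h>

typedef void (*pp_fn)(void);        /* stand-in for an op_ppaddr value */

struct xop_model {                  /* stand-in for an XOP structure */
    const char *name;
    const char *desc;
};

#define MAX_CUSTOM_OPS 8

static struct {
    pp_fn ppaddr;
    const struct xop_model *xop;
} registry[MAX_CUSTOM_OPS];
static size_t registered = 0;

/* Associate a description with a PP function; returns -1 if full. */
static int model_register(pp_fn ppaddr, const struct xop_model *xop)
{
    if (registered == MAX_CUSTOM_OPS)
        return -1;
    registry[registered].ppaddr = ppaddr;
    registry[registered].xop = xop;
    registered++;
    return 0;
}

/* Every custom op shares one op type, so the name is looked up by the
 * function pointer instead. */
static const char *model_op_name(pp_fn ppaddr)
{
    size_t i;

    for (i = 0; i < registered; i++) {
        if (registry[i].ppaddr == ppaddr)
            return registry[i].xop->name;
    }
    return "unknown custom op";     /* fallback chosen for this sketch */
}

static void my_pp(void) { }         /* a registered op's PP function */
static void other_pp(void) { }      /* never registered */
```

Two ops with different PP functions get different names even though both
would carry the same op type, which is the reason each ppaddr needs its own
registered structure.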
	3091
	3092	The available fields in the structure are:
	3093
	3094	=over 4
	3095
	3096	=item xop_name
	3097
	3098	A short name for your op. This will be included in some error messages,
	3099	and will also be returned as C<< $op->name >> by the L<B|B> module, so
	3100	it will appear in the output of modules like L<B::Concise|B::Concise>.
	3101
	3102	=item xop_desc
	3103
	3104	A short description of the function of the op.
	3105
	3106	=item xop_class
	3107
	3108	Which of the various C<*OP> structures this op uses. This should be one of
	3109	the C<OA_*> constants from F<op.h>, namely
	3110
	3111	=over 4
	3112
	3113	=item OA_BASEOP
	3114
	3115	=item OA_UNOP
	3116
	3117	=item OA_BINOP
	3118
	3119	=item OA_LOGOP
	3120
	3121	=item OA_LISTOP
	3122
	3123	=item OA_PMOP
	3124
	3125	=item OA_SVOP
	3126
	3127	=item OA_PADOP
	3128
	3129	=item OA_PVOP_OR_SVOP
	3130
	3131	This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
	3132	the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.
	3133
	3134	=item OA_LOOP
	3135
	3136	=item OA_COP
	3137
	3138	=back
	3139
	3140	The other C<OA_*> constants should not be used.
	3141
	3142	=item xop_peep
	3143
	3144	This member is of type C<Perl_cpeep_t>, which expands to C<void
	3145	(*Perl_cpeep_t)(pTHX_ OP *o, OP *oldop)>. If it is set, this function
	3146	will be called from C<Perl_rpeep> when ops of this type are encountered
	3147	by the peephole optimizer. I<o> is the OP that needs optimizing;
	3148	I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.
	3149
	3150	=back
	3151
	3152	C<B::Generate> directly supports the creation of custom ops by name.
	3153
	3154	=head1 AUTHORS
	3155
	3156	Until May 1997, this document was maintained by Jeff Okamoto
	3157	E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
	3158	itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.
	3159
	3160	With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
	3161	Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
	3162	Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
	3163	Stephen McCamant, and Gurusamy Sarathy.
	3164
	3165	=head1 SEE ALSO
	3166
	3167	L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>