=for comment
The part of this file between =for mg_vtable.pl markers is auto
generated by mg_vtable.pl; any changes there need to be made instead to
mg_vtable.pl

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

Use U8, U16, U32, and U64 to declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined. Use IV and UV to declare the largest practicable integer
types, and C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned
type, which, however, may not be usable in all circumstances.

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar macros.

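Perl's own I8 through U64 typedefs come from its build configuration, but the "at least N bits" rule is the same guarantee C99 gives its C<int_leastN_t> types, and C<INT16_C> and C<UINTMAX_C> are also standard C99 macros. The following standalone sketch assumes only a C99 compiler (no Perl headers) and uses those standard types as stand-ins; C<my_I16>, C<my_U32>, and the helper functions are hypothetical names:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for Perl's I16 and U32: C99 guarantees that
 * the int_leastN_t types are the smallest types with at least N bits.
 * Perl picks its own typedefs at Configure time; this is an analogy. */
typedef int_least16_t  my_I16;
typedef uint_least32_t my_U32;

/* Storage size in bits (object representation, padding included). */
int bits_of_my_I16(void) { return (int)(sizeof(my_I16) * CHAR_BIT); }
int bits_of_my_U32(void) { return (int)(sizeof(my_U32) * CHAR_BIT); }

/* Constants written with the C99 constant macros. */
my_I16 largest_i16(void)
{
    my_I16 v = INT16_C(32767);
    assert(v > 0);              /* fits: my_I16 has at least 16 bits */
    return v;
}

uintmax_t two_to_the_32(void) { return UINTMAX_C(4294967296); }
```

On a platform where C<short> is exactly 16 bits, C<int_least16_t> is typically C<short> as well, which mirrors the I16 example in the text.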

=for apidoc_section $integer
=for apidoc Ayh ||IV
=for apidoc_item ||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64

=for apidoc Ayh ||UV
=for apidoc_item ||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But, using it for non-strings requires care, as the
underlying assumption of much of the internals is that PVs are just for
strings. Often, for example, a trailing C<NUL> is tacked on
automatically. The non-string use is documented only in this paragraph.)

=for apidoc_section $floating
=for apidoc Ayh||NV

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc_section $string
=for apidoc Ayh||STRLEN

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with C<newSV(len)>. If C<len> is 0, an empty SV of
type NULL is returned; otherwise an SV of type PV is returned with
C<len + 1> (for the C<NUL>) bytes of storage allocated, accessible via
C<SvPVX>. In both cases the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                                        SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

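The difference between the two ways of supplying a length can be seen in plain C. In this minimal standalone sketch, C<copy_with_strlen> and C<copy_with_len> are hypothetical helpers that only imitate the length handling of C<sv_setpv> and C<sv_setpvn>; they are not Perl API calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Imitates the "let Perl measure it" forms (sv_setpv, or newSVpv with
 * a length of 0): the length comes from strlen(), so copying stops at
 * the first NUL byte. src must be NUL-terminated. */
size_t copy_with_strlen(char *dst, const char *src)
{
    size_t n = strlen(src);
    memcpy(dst, src, n);
    return n;
}

/* Imitates the explicit-length forms (sv_setpvn, newSVpvn): exactly
 * len bytes are copied, embedded NULs included. */
size_t copy_with_len(char *dst, const char *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    memcpy(dst, src, len);
    return len;
}
```

Given the four bytes C<a>, C<NUL>, C<b>, C<c>, the first helper copies one byte and the second copies all four.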

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

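Because C<sv_setpvf> relates to C<sv_vsetpvfn> the way C<sprintf> relates to C<vsprintf>, the usual C pattern applies: the variadic function gathers its arguments into a C<va_list> and passes it to a worker. A minimal standalone sketch of that pattern, assuming only C99; C<struct toy_str>, C<toy_setf>, and C<toy_vsetf> are hypothetical and ignore magic, UTF-8, taint, and the array-of-SVs argument form:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy string; not Perl's SV. */
struct toy_str {
    char  *pv;    /* NUL-terminated buffer, or NULL */
    size_t cur;   /* length, excluding the NUL */
};

/* va_list worker, in the spirit of sv_vsetpvfn: format, then replace
 * the old value. Returns 0 on success, -1 on failure. */
int toy_vsetf(struct toy_str *s, const char *fmt, va_list ap)
{
    va_list ap2;
    int n;
    char *buf;

    assert(s != NULL && fmt != NULL);
    va_copy(ap2, ap);                     /* the list is consumed twice */
    n = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure */
    if (n < 0 || (buf = malloc((size_t)n + 1)) == NULL) {
        va_end(ap2);
        return -1;
    }
    vsnprintf(buf, (size_t)n + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    free(s->pv);
    s->pv  = buf;
    s->cur = (size_t)n;
    return 0;
}

/* Variadic front end, in the spirit of sv_setpvf. */
int toy_setf(struct toy_str *s, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = toy_vsetf(s, fmt, ap);
    va_end(ap);
    return rc;
}
```

Copying the C<va_list> with C<va_copy> is what lets the worker measure the output before allocating; a list that has already been traversed cannot be reused.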

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If a string is not C<NUL>-terminated, there is a risk of core dumps and
corruptions from code which passes the string to C functions or system
calls which expect a C<NUL>-terminated string. Perl's own functions
typically add a trailing C<NUL> for this reason. Nevertheless, you should
be very careful when you pass a string stored in an SV to a C function or
system call.

To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string is C<"\xff\xff">, then depending on the SV's internal encoding you
might get back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets. To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the variable C<len> (these are macros, so you do I<not> use
C<&len>). If you do not care what the length of the data is, use
C<SvPVbyte_nolen>, C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV> in this case. But that can be quite
inefficient because C<PL_na> must be accessed in thread-local storage in
threaded Perl. In any case, remember that Perl allows arbitrary strings of
data that may both contain NULs and might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say
C<foo(SvPVbyte(s, len), len);>: the evaluation of function arguments is
unsequenced, so the second argument may read C<len> before C<SvPVbyte>
has set it. It might work with your compiler, but it won't work for
everyone. Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

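The 2-byte versus 4-byte result comes from the UTF-8 encoding of code points 0x80 through 0xFF. This minimal standalone sketch shows only that encoding step and assumes an ASCII-based platform; C<latin1_to_utf8> is a hypothetical helper, not part of the Perl API, and it does not handle the code points above 0xFF that Perl's internal UTF-8 variant can also represent:

```c
#include <assert.h>
#include <stddef.h>

/* Encodes code points 0x00..0xFF (one per input byte) as UTF-8.
 * Bytes below 0x80 are copied; 0x80..0xFF become two bytes each, so
 * the 2-character string "\xff\xff" becomes C3 BF C3 BF. out must have
 * room for 2 * n bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;

    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                  /* ASCII */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return o;
}
```

The four output bytes are identical to the raw octets of the Perl string C<"\xc3\xbf\xc3\xbf">, which is why the buffer alone cannot tell you which string you have and C<SvUTF8> must be consulted.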

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically
do C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that modifies up to needlen bytes at s+len, but
       modifies newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

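The C<SvGROW>/write/C<SvCUR_set> sequence above can be modelled with a plain C<realloc>-backed buffer. In this minimal standalone sketch, C<struct toy_pv>, C<toy_grow>, and C<toy_append> are hypothetical; the model ignores UTF-8 flags, magic, and copy-on-write, but keeps the two rules from the text: growing never shrinks, and the caller asks for the extra byte for the trailing C<NUL>:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of an SV's string slots:
 * pv = buffer, cur = bytes in use (SvCUR), len = bytes allocated (SvLEN). */
struct toy_pv {
    char  *pv;
    size_t cur;
    size_t len;
};

/* Like SvGROW: only ever enlarges, and adds no room for the NUL by
 * itself. Returns the (possibly moved) buffer, or NULL on failure. */
char *toy_grow(struct toy_pv *s, size_t newlen)
{
    if (newlen > s->len) {
        char *p = realloc(s->pv, newlen);
        if (p == NULL)
            return NULL;
        s->pv  = p;
        s->len = newlen;
    }
    return s->pv;
}

/* The append pattern: grow, write after the existing data, update
 * the length, and keep the buffer NUL-terminated. */
int toy_append(struct toy_pv *s, const char *src, size_t n)
{
    char *p;

    assert(s != NULL && (n == 0 || src != NULL));
    p = toy_grow(s, s->cur + n + 1);     /* +1 for the trailing NUL */
    if (p == NULL)
        return -1;
    memcpy(p + s->cur, src, n);
    s->cur += n;                         /* the SvCUR_set step */
    p[s->cur] = '\0';
    return 0;
}
```

Asking C<toy_grow> for less than the current allocation leaves both the buffer and C<len> untouched, just as C<SvGROW> never decreases an SV's allocation.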

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that modifies up to needlen bytes at s, but modifies
       newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:

    /* references handled elsewhere */
    if (SvIsBOOL(sv)) {
        /* originally boolean */
        ...
    }
    else if (SvPOK(sv)) {
        /* originally a string */
        ...
    }
    else if (SvNIOK(sv)) {
        /* originally numeric */
        ...
    }
    else {
        /* something special or undef */
    }

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, STRLEN len)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an
C<SV*>, you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                                                    Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the
first SV with the string stored in the second SV. It also forces the
second SV to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare an arbitrary SV with C<&PL_sv_undef> to test
whether it is undefined. For example, when interfacing with Perl code,
such a comparison will work correctly for:

    foo(undef);

But it won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use C<SvOK()> to check whether an SV is defined.

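Comparing against C<&PL_sv_undef> tests I<identity>, while C<SvOK()> tests a I<property> of the value; assignment copies the value into a different SV, so only the property survives. A minimal standalone model; C<struct toy_sv>, C<toy_undef>, and the helpers are hypothetical stand-ins for C<SV>, C<PL_sv_undef>, and C<SvOK()>:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy scalar; not Perl's SV. */
struct toy_sv {
    bool defined;   /* plays the role of what SvOK() tests */
    long iv;
};

/* One shared "undef" object, playing the role of PL_sv_undef. */
struct toy_sv toy_undef = { false, 0 };

/* Assignment copies the value into the destination, as "$x = undef"
 * does: the result is undefined, but it is NOT toy_undef itself. */
void toy_assign(struct toy_sv *dst, const struct toy_sv *src)
{
    assert(dst != &toy_undef);   /* never modify the shared sentinel */
    *dst = *src;
}

/* Wrong: identity test against the sentinel. */
bool is_undef_by_address(const struct toy_sv *sv) { return sv == &toy_undef; }

/* Right: ask the value itself, as SvOK() does. */
bool is_undef_by_flag(const struct toy_sv *sv) { return !sv->defined; }
```

After C<toy_assign(&x, &toy_undef)>, the flag test reports C<x> as undefined while the address test does not, which is the C<foo($x)> case above.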

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

    % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
    SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x7ffb7bc05b50 ""\0
      CUR = 0
      LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

    % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                                Dump($a)'
    SV = PV(0x7ffa04008a70) at 0x7ffa04030390
      REFCNT = 1
      FLAGS = (POK,OOK,pPOK)
      OFFSET = 1
      PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
      CUR = 8
      LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)

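The bookkeeping behind the offset hack can be modelled with a plain C struct. This minimal standalone sketch is not Perl's layout: C<struct toy_ook> and its functions are hypothetical, and the model keeps the chopped-byte count in a struct field, whereas the text above explains that perl stores it inside the skipped bytes:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of the OOK offset hack. alloc is the real
 * start of the malloc()ed block; pv is what SvPVX would return;
 * cur and len are measured from pv, as SvCUR and SvLEN are. */
struct toy_ook {
    char  *alloc;
    char  *pv;
    size_t cur;
    size_t len;
    size_t offset;   /* bytes chopped off the front (OFFSET) */
};

int toy_ook_init(struct toy_ook *s, const char *src)
{
    size_t n = strlen(src);
    s->alloc = malloc(n + 1);
    if (s->alloc == NULL)
        return -1;
    memcpy(s->alloc, src, n + 1);
    s->pv     = s->alloc;
    s->cur    = n;
    s->len    = n + 1;
    s->offset = 0;
    return 0;
}

/* Like sv_chop: discard everything before ptr without moving bytes. */
void toy_chop(struct toy_ook *s, char *ptr)
{
    assert(ptr >= s->pv && (size_t)(ptr - s->pv) <= s->cur);
    size_t n = (size_t)(ptr - s->pv);
    s->pv     += n;   /* move the visible start forward */
    s->cur    -= n;
    s->len    -= n;
    s->offset += n;   /* remember how far we moved */
}

/* Freeing must go through the real start, not pv. */
void toy_ook_free(struct toy_ook *s)
{
    free(s->pv - s->offset);   /* same address as s->alloc */
    s->alloc = s->pv = NULL;
}
```

Starting from C<"123456789"> with a LEN of 10, one chop leaves CUR 8 and LEN 9, the same shape as the dump above, and freeing still has to use the real start of the allocation.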

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av

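The C<AvALLOC>/C<AvARRAY> arrangement can be modelled in a few lines. In this minimal standalone sketch, C<struct toy_av> and its functions are hypothetical, elements are plain C<int>s rather than C<SV*>s, and the array never grows:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of AvALLOC/AvARRAY/AvFILL/AvMAX. */
struct toy_av {
    int  *alloc;   /* AvALLOC: real start of the C array */
    int  *array;   /* AvARRAY: first element visible from Perl */
    long  fill;    /* AvFILL: highest used index, -1 if empty */
    long  max;     /* AvMAX: highest index available from array */
};

int toy_av_init(struct toy_av *av, long n)
{
    assert(n > 0);
    av->alloc = calloc((size_t)n, sizeof *av->alloc);
    if (av->alloc == NULL)
        return -1;
    av->array = av->alloc;
    av->fill  = -1;
    av->max   = n - 1;
    return 0;
}

int toy_av_push(struct toy_av *av, int v)
{
    if (av->fill >= av->max)
        return -1;                 /* full; a real AV would grow here */
    av->array[++av->fill] = v;
    return 0;
}

/* shift without moving memory: advance array, shrink fill and max. */
int toy_av_shift(struct toy_av *av, int *out)
{
    if (av->fill < 0)
        return -1;                 /* empty */
    *out = av->array[0];
    av->array++;
    av->fill--;
    av->max--;
    return 0;
}

void toy_av_free(struct toy_av *av)
{
    free(av->alloc);               /* always the real start */
    av->alloc = av->array = NULL;
}
```

After one shift, C<array> sits one element past C<alloc>, which is why only the free path needs to know where the real start is.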

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

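The "private flag only on lossy values" rule can be sketched with a toy number that caches an integer conversion of a double. This is a simplified model, not Perl's conversion logic (which also handles range limits, strings, and UVs); C<struct toy_num>, the C<TOY_*> flag values, and the functions are hypothetical:

```c
#include <assert.h>

/* Hypothetical flag bits loosely named after Perl's; not Perl's values. */
enum { TOY_IOK = 1, TOY_NOK = 2, TOY_IOKp = 4, TOY_NOKp = 8 };

struct toy_num {
    double   nv;
    long     iv;
    unsigned flags;
};

/* Start with a double: public and private NV flags both on. */
void toy_set_nv(struct toy_num *n, double v)
{
    n->nv    = v;
    n->flags = TOY_NOK | TOY_NOKp;
}

/* Cache an integer value; mark it public only if nothing was lost.
 * Assumes nv is finite and within the range of long. */
long toy_get_iv(struct toy_num *n)
{
    assert(n->flags & TOY_NOKp);
    n->iv = (long)n->nv;            /* truncates toward zero */
    n->flags |= TOY_IOKp;           /* an integer is now stored */
    if ((double)n->iv == n->nv)
        n->flags |= TOY_IOK;        /* exact: safe to trust publicly */
    return n->iv;
}
```

For 1.5 the integer slot holds 1 and only the private integer flag is added, matching the "SvIOKp, SvNOKp and SvNOK set, SvIOK not" case; for 3.0 the public flag is set as well.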

=head2 Working with AVs

There are two main, longstanding ways to create and load an AV. The first
method creates an empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Perl v5.36 added two new ways to create an AV and allocate an C<SV**> array
without populating it. These are more efficient than a C<newAV()> followed
by an C<av_extend()>.

    /* Creates but does not initialize (Zero) the SV** array */
    AV *av = newAV_alloc_x(1);
    /* Creates and does initialize (Zero) the SV** array */
    AV *av = newAV_alloc_xz(1);

The numerical argument refers to the number of array elements to allocate,
not an array index, and must be greater than 0. The first form must only
ever be used when all elements will be initialized before any read occurs.
Reading a non-initialized C<SV*> (that is, treating a random memory address
as an C<SV*>) is a serious bug.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like C<$#array> in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns a pointer to the value at index C<key>, but if
there is no element at that index and C<lval> is non-zero, then C<av_fetch>
will first store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

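The ownership rule for C<av_store> (it takes over the caller's reference without incrementing it, and on a C<NULL> return the caller must drop that reference) can be modelled without Perl. In this minimal standalone sketch, C<struct rc_val>, C<struct rc_array>, and all the functions are hypothetical; C<fail_stores> simply forces the failure path, whatever the real reason for a C<NULL> return might be:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value; not Perl's SV. */
struct rc_val {
    int refcnt;
    int value;
};

/* Hypothetical 4-slot container standing in for an AV. */
struct rc_array {
    struct rc_val *slot[4];
    int fail_stores;   /* simulates a store that returns NULL */
};

struct rc_val *rc_new(int value)
{
    struct rc_val *p = malloc(sizeof *p);
    if (p != NULL) {
        p->refcnt = 1;          /* the creator owns one reference */
        p->value  = value;
    }
    return p;
}

void rc_dec(struct rc_val *p)
{
    if (p != NULL && --p->refcnt == 0)
        free(p);
}

/* Like av_store: takes over the caller's reference without
 * incrementing it; returns the slot's address, or NULL on failure. */
struct rc_val **rc_store(struct rc_array *a, int key, struct rc_val *v)
{
    assert(a != NULL && key >= 0);
    if (a->fail_stores || key >= 4)
        return NULL;
    rc_dec(a->slot[key]);       /* release the previous occupant */
    a->slot[key] = v;
    return &a->slot[key];
}

/* Caller-side pattern: if the store fails, the reference was not
 * handed over, so the caller must drop it to avoid a leak. */
int store_or_release(struct rc_array *a, int key, struct rc_val *v)
{
    if (rc_store(a, key, v) == NULL) {
        rc_dec(v);
        return -1;
    }
    return 0;
}
```

In XS code the same shape is usually written as C<if (!av_store(av, key, sv)) SvREFCNT_dec(sv);>.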

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head3 More efficient working with new or vanilla AVs

Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
functions:

=over

=item * C<av_store_simple>

=item * C<av_fetch_simple>

=item * C<av_push_simple>

=back

These are drop-in replacements, but can only be used on straightforward
AVs that meet the following criteria:

=over

=item * are not magical

=item * are not readonly

=item * are "real" (refcounted) AVs

=item * have an av_top_index value > -2

=back

AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and
C<newAV_alloc_xz> are all compatible at the time of creation. It is
only if they are declared readonly or unreal, have magic attached, or
are otherwise configured unusually that they will stop being compatible.

Note that some interpreter functions may attach magic to an AV as part
of normal operations. It is therefore safest, unless you are sure of the
lifecycle of an AV, to only use these new functions close to the point
of AV creation.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you). The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

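To make the C<SV**> return type and the C<klen>/C<lval> parameters concrete, here is a minimal standalone sketch of a chained hash table with the same shape of interface. Everything in it (C<struct toy_hv>, C<struct toy_he>, C<toy_fetch>, C<toy_store>, the djb2 hash) is hypothetical; Perl's real HV also handles hash seeds, UTF-8 keys, shared keys, and magic:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 8

/* Hypothetical entry, cf. HE: keys are (pointer, length) pairs, so
 * they may contain NUL bytes. */
struct toy_he {
    struct toy_he *next;
    char          *key;
    size_t         klen;
    void          *val;
};

struct toy_hv { struct toy_he *bucket[TOY_BUCKETS]; };

static unsigned toy_hash(const char *key, size_t klen)
{
    unsigned h = 5381;              /* djb2, for illustration only */
    for (size_t i = 0; i < klen; i++)
        h = h * 33u + (unsigned char)key[i];
    return h;
}

/* Like hv_fetch: returns the ADDRESS of the value slot, or NULL if
 * absent; with lval set, a missing entry is created first. */
void **toy_fetch(struct toy_hv *hv, const char *key, size_t klen, int lval)
{
    struct toy_he **head;
    struct toy_he *he;
    char *copy;

    assert(hv != NULL && key != NULL);
    head = &hv->bucket[toy_hash(key, klen) % TOY_BUCKETS];
    for (he = *head; he != NULL; he = he->next)
        if (he->klen == klen && memcmp(he->key, key, klen) == 0)
            return &he->val;        /* found: address of the slot */
    if (!lval)
        return NULL;                /* absent: caller must check */
    he   = malloc(sizeof *he);
    copy = malloc(klen ? klen : 1);
    if (he == NULL || copy == NULL) {
        free(he);
        free(copy);
        return NULL;
    }
    memcpy(copy, key, klen);
    he->key  = copy;
    he->klen = klen;
    he->val  = NULL;                /* the new "undef" value */
    he->next = *head;
    *head    = he;
    return &he->val;
}

/* Like hv_store: fetch-or-create the slot, then write into it. */
void **toy_store(struct toy_hv *hv, const char *key, size_t klen, void *val)
{
    void **slot = toy_fetch(hv, key, klen, 1);
    if (slot != NULL)
        *slot = val;
    return slot;
}
```

Because a lookup returns the address of the slot, a caller updates the value in place with C<*slot = val>, and must check for C<NULL> before dereferencing, as with C<hv_fetch>. The keys C<"a\0b"> (length 3) and C<"a"> (length 1) are distinct here only because the length is passed explicitly.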
da8c5729 MH	669	The first of these two functions checks if a hash table entry exists, and the
da8c5729 MH	670	second deletes it.
a0d0e21e	671
08105a92 GS	672	bool hv_exists(HV, const char key, U32 klen);
08105a92 GS	673	SV* hv_delete(HV, const char key, U32 klen, I32 flags);
a0d0e21e	674
5f05dabc	675	If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
	676	create and return a mortal copy of the deleted value.
	677
a0d0e21e LW	678	And more miscellaneous functions:
	679
	680	void hv_clear(HV*);
a0d0e21e	681	void hv_undef(HV*);
5f05dabc	682
	683	Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
	684	table but does not actually delete the hash table. The C<hv_undef> deletes
	685	both the entries and the hash table itself.
a0d0e21e	686
a9b0660e	687	Perl keeps the actual data in a linked list of structures with a typedef of HE.
d1b91892 AD	688	These contain the actual key and value pointers (plus extra administrative
	689	overhead). The key is a string pointer; the value is an C<SV*>. However,
	690	once you have an C<HE*>, to get the actual key and value, use the routines
	691	specified below.
	692
7cc7ada7	693	=for apidoc_section $HV
63dbc4a9 KW	694	=for apidoc Ayh\|\|HE
63dbc4a9 KW	695
    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table,
               and returns the number of keys */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value;
               returns NULL when there are no more entries */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the function's return value */
a0d0e21e LW	712
	713	If you know the name of a hash variable, you can get a pointer to its HV
	714	by using the following:
	715
6673a63c	716	HV* get_hv("package::varname", 0);
a0d0e21e LW	717
	718	This returns NULL if the variable does not exist.
	719
The hash algorithm is defined in the C<PERL_HASH> macro, which computes the
hash of the C<klen> bytes at C<key> and stores the result in C<hash>:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the computed value may differ from one run of perl to the
next (the hash seed is normally randomized), so the value is only valid
for the duration of a single perl process.
a0d0e21e	727
5a0de581	728	See L</Understanding the Magic of Tied Hashes and Arrays> for more
04343c6d GS	729	information on how to use the hash access functions on tied hashes.
04343c6d GS	730
3f620621	731	=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen
4f313521 KW	733
1e422769	734	=head2 Hash API Extensions
	735
	736	Beginning with version 5.004, the following functions are also supported:
	737
	738	HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);
	739	HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);
c47ff5f1	740
1e422769	741	bool hv_exists_ent (HV* tb, SV* key, U32 hash);
1e422769	742	SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);
c47ff5f1	743
1e422769	744	SV* hv_iterkeysv (HE* entry);
	745
	746	Note that these functions take C<SV*> keys, which simplifies writing
	747	of extension code that deals with hash structures. These functions
	748	also allow passing of C<SV*> keys to C<tie> functions without forcing
	749	you to stringify the keys (unlike the previous set of functions).
	750
	751	They also return and accept whole hash entries (C<HE*>), making their
	752	use more efficient (since the hash number for a particular string
4a4eefd0 GS	753	doesn't have to be recomputed every time). See L<perlapi> for detailed
4a4eefd0 GS	754	descriptions.
1e422769	755
	756	The following macros must always be used to access the contents of hash
	757	entries. Note that the arguments to these macros must be simple
	758	variables, since they may get evaluated more than once. See
4a4eefd0	759	L<perlapi> for detailed descriptions of these macros.
1e422769	760
	761	HePV(HE* he, STRLEN len)
	762	HeVAL(HE* he)
	763	HeHASH(HE* he)
	764	HeSVKEY(HE* he)
	765	HeSVKEY_force(HE* he)
	766	HeSVKEY_set(HE* he, SV* sv)
	767
	768	These two lower level macros are defined, but must only be used when
	769	dealing with keys that are not C<SV*>s:
	770
	771	HeKEY(HE* he)
	772	HeKLEN(HE* he)
	773
Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>; that is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.
1e422769	778
a9381218 MHM	779	=head2 AVs, HVs and undefined values
a9381218 MHM	780
10e2eb10 FC	781	Sometimes you have to store undefined values in AVs or HVs. Although
10e2eb10 FC	782	this may be a rare case, it can be tricky. That's because you're
a9381218 MHM	783	used to using C<&PL_sv_undef> if you need an undefined SV.
	784
	785	For example, intuition tells you that this XS code:
	786
	787	AV *av = newAV();
	788	av_store( av, 0, &PL_sv_undef );
	789
	790	is equivalent to this Perl code:
	791
	792	my @av;
	793	$av[0] = undef;
	794
Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker for indicating that an array element has not
yet been initialized. Thus, C<exists $av[0]> would be true for the above
Perl code, but false for the array generated by the XS code. In perl 5.20
and later, storing C<&PL_sv_undef> will create a read-only element, because
the scalar C<&PL_sv_undef> itself is stored, not a copy.
a9381218	801
f3c4ec28	802	Similar problems can occur when storing C<&PL_sv_undef> in HVs:
a9381218 MHM	803
	804	hv_store( hv, "key", 3, &PL_sv_undef, 0 );
	805
	806	This will indeed make the value C<undef>, but if you try to modify
	807	the value of C<key>, you'll get the following error:
	808
	809	Modification of non-creatable hash value attempted
	810
	811	In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
10e2eb10	812	in restricted hashes. This caused such hash entries not to appear
a9381218 MHM	813	when iterating over the hash or when checking for the keys
	814	with the C<hv_exists> function.
	815
8abccac8	816	You can run into similar problems when you store C<&PL_sv_yes> or
10e2eb10	817	C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
a9381218 MHM	818	will give you the following error:
	819
	820	Modification of a read-only value attempted
	821
	822	To make a long story short, you can use the special variables
8abccac8	823	C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
a9381218 MHM	824	HVs, but you have to make sure you know what you're doing.
	825
	826	Generally, if you want to store an undefined value in an AV
	827	or HV, you should not use C<&PL_sv_undef>, but rather create a
	828	new undefined value using the C<newSV> function, for example:
	829
	830	av_store( av, 42, newSV(0) );
	831	hv_store( hv, "foo", 3, newSV(0), 0 );
	832
a0d0e21e LW	833	=head2 References
a0d0e21e LW	834
d1b91892	835	References are a special type of scalar that point to other data types
a9b0660e	836	(including other references).
a0d0e21e	837
07fa94a1	838	To create a reference, use either of the following functions:
a0d0e21e	839
5f05dabc	840	SV* newRV_inc((SV*) thing);
5f05dabc	841	SV* newRV_noinc((SV*) thing);
a0d0e21e	842
The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
07fa94a1 JO	844	functions are identical except that C<newRV_inc> increments the reference
	845	count of the C<thing>, while C<newRV_noinc> does not. For historical
	846	reasons, C<newRV> is a synonym for C<newRV_inc>.
	847
	848	Once you have a reference, you can use the following macro to dereference
	849	the reference:
a0d0e21e LW	850
	851	SvRV(SV*)
	852
then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.
a0d0e21e	855
d1b91892	856	To determine if an SV is a reference, you can use the following macro:
a0d0e21e LW	857
	858	SvROK(SV*)
	859
07fa94a1 JO	860	To discover what type of value the reference refers to, use the following
07fa94a1 JO	861	macro and then check the return value.
d1b91892 AD	862
	863	SvTYPE(SvRV(SV*))
	864
	865	The most useful types that will be returned are:
	866
a5e62da0 FC	867	SVt_PVAV Array
	868	SVt_PVHV Hash
	869	SVt_PVCV Code
	870	SVt_PVGV Glob (possibly a file handle)
	871
2d0e7d1f DM	872	Any numerical value returned which is less than SVt_PVAV will be a scalar
	873	of some form.
	874
a5e62da0	875	See L<perlapi/svtype> for more details.
d1b91892	876
cb1a09d0 AD	877	=head2 Blessed References and Class Objects
cb1a09d0 AD	878
06f6df17	879	References are also used to support object-oriented programming. In perl's
cb1a09d0 AD	880	OO lexicon, an object is simply a reference that has been blessed into a
	881	package (or class). Once blessed, the programmer may now use the reference
	882	to access the various methods in the class.
	883
	884	A reference can be blessed into a package with the following function:
	885
	886	SV* sv_bless(SV* sv, HV* stash);
	887
06f6df17 RGS	888	The C<sv> argument must be a reference value. The C<stash> argument
06f6df17 RGS	889	specifies which class the reference will belong to. See
5a0de581	890	L</Stashes and Globs> for information on converting class names into stashes.
cb1a09d0 AD	891
	892	/* Still under construction */
	893
The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is non-null,
the new SV is blessed into the specified class. The new SV is returned.
cb1a09d0	897
08105a92	898	SV* newSVrv(SV* rv, const char* classname);
cb1a09d0	899
The following three functions copy an integer, unsigned integer, or double
into a new SV referenced by C<rv>. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);
cb1a09d0	907
The following function copies the pointer value (I<the address, not the
string!>) into a new SV referenced by C<rv>. The SV is blessed if
C<classname> is non-null.
cb1a09d0	911
ddd2cc91	912	SV* sv_setref_pv(SV* rv, const char* classname, void* pv);
cb1a09d0	913
The following function copies a string into a new SV referenced by C<rv>.
The length of the string must be given in C<length>: a length of 0 stores an
empty string rather than asking Perl to calculate the length. The SV is
blessed if C<classname> is non-null.
cb1a09d0	917
a9b0660e KW	918	SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
a9b0660e KW	919	STRLEN length);
cb1a09d0	920
ddd2cc91 DM	921	The following function tests whether the SV is blessed into the specified
ddd2cc91 DM	922	class. It does not check inheritance relationships.
9abd00ed	923
08105a92	924	int sv_isa(SV* sv, const char* name);
9abd00ed	925
ddd2cc91	926	The following function tests whether the SV is a reference to a blessed object.
9abd00ed GS	927
	928	int sv_isobject(SV* sv);
	929
ddd2cc91	930	The following function tests whether the SV is derived from the specified
10e2eb10 FC	931	class. SV can be either a reference to a blessed object or a string
10e2eb10 FC	932	containing a class name. This is the function implementing the
ddd2cc91	933	C<UNIVERSAL::isa> functionality.
9abd00ed	934
08105a92	935	bool sv_derived_from(SV* sv, const char* name);
9abd00ed	936
00aadd71	937	To check if you've got an object derived from a specific class you have
9abd00ed GS	938	to write:
	939
	940	if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }
cb1a09d0	941
5f05dabc	942	=head2 Creating New Variables
cb1a09d0	943
5f05dabc	944	To create a new Perl variable with an undef value which can be accessed from
5f05dabc	945	your Perl script, use the following routines, depending on the variable type.
cb1a09d0	946
64ace3f8	947	SV* get_sv("package::varname", GV_ADD);
cbfd0a87	948	AV* get_av("package::varname", GV_ADD);
6673a63c	949	HV* get_hv("package::varname", GV_ADD);
cb1a09d0	950
058a5f6c	951	Notice the use of GV_ADD as the second parameter. The new variable can now
cb1a09d0 AD	952	be set, using the routines appropriate to the data type.
cb1a09d0 AD	953
5f05dabc	954	There are additional macros whose values may be bitwise OR'ed with the
058a5f6c	955	C<GV_ADD> argument to enable certain extra features. Those bits are:
cb1a09d0	956
9a68f1db SB	957	=over
	958
	959	=item GV_ADDMULTI
	960
	961	Marks the variable as multiply defined, thus preventing the:
	962
	963	Name <varname> used only once: possible typo
	964
	965	warning.
	966
9a68f1db SB	967	=item GV_ADDWARN
	968
	969	Issues the warning:
	970
	971	Had to create <varname> unexpectedly
	972
	973	if the variable did not exist before the function was called.
	974
	975	=back
cb1a09d0	976
07fa94a1 JO	977	If you do not specify a package name, the variable is created in the current
07fa94a1 JO	978	package.
cb1a09d0	979
5f05dabc	980	=head2 Reference Counts and Mortality
a0d0e21e	981
10e2eb10	982	Perl uses a reference count-driven garbage collection mechanism. SVs,
54310121	983	AVs, or HVs (xV for short in the following) start their life with a
55497cff	984	reference count of 1. If the reference count of an xV ever drops to 0,
07fa94a1	985	then it will be destroyed and its memory made available for reuse.
3d2ba989 Z	986	At the most basic internal level, reference counts can be manipulated
3d2ba989 Z	987	with the following macros:
55497cff	988
55497cff	989	int SvREFCNT(SV* sv);
5f05dabc	990	SV* SvREFCNT_inc(SV* sv);
55497cff	991	void SvREFCNT_dec(SV* sv);
55497cff	992
3d2ba989 Z	993	(There are also suffixed versions of the increment and decrement macros,
	994	for situations where the full generality of these basic macros can be
	995	exchanged for some performance.)
	996
	997	However, the way a programmer should think about references is not so
	998	much in terms of the bare reference count, but in terms of I<ownership>
	999	of references. A reference to an xV can be owned by any of a variety
	1000	of entities: another xV, the Perl interpreter, an XS data structure,
	1001	a piece of running code, or a dynamic scope. An xV generally does not
	1002	know what entities own the references to it; it only knows how many
	1003	references there are, which is the reference count.
	1004
	1005	To correctly maintain reference counts, it is essential to keep track
	1006	of what references the XS code is manipulating. The programmer should
	1007	always know where a reference has come from and who owns it, and be
	1008	aware of any creation or destruction of references, and any transfers
	1009	of ownership. Because ownership isn't represented explicitly in the xV
	1010	data structures, only the reference count need be actually maintained
	1011	by the code, and that means that this understanding of ownership is not
	1012	actually evident in the code. For example, transferring ownership of a
	1013	reference from one owner to another doesn't change the reference count
	1014	at all, so may be achieved with no actual code. (The transferring code
	1015	doesn't touch the referenced object, but does need to ensure that the
	1016	former owner knows that it no longer owns the reference, and that the
	1017	new owner knows that it now does.)
	1018
	1019	An xV that is visible at the Perl level should not become unreferenced
	1020	and thus be destroyed. Normally, an object will only become unreferenced
	1021	when it is no longer visible, often by the same means that makes it
	1022	invisible. For example, a Perl reference value (RV) owns a reference to
	1023	its referent, so if the RV is overwritten that reference gets destroyed,
	1024	and the no-longer-reachable referent may be destroyed as a result.
	1025
	1026	Many functions have some kind of reference manipulation as
	1027	part of their purpose. Sometimes this is documented in terms
	1028	of ownership of references, and sometimes it is (less helpfully)
	1029	documented in terms of changes to reference counts. For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
	1031	(with reference count 1) and increment the reference count of the referent
	1032	that was supplied by the caller. This is best understood as creating
	1033	a new reference to the referent, which is owned by the created RV,
	1034	and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
	1036	increment the reference count of the referent, but the RV nevertheless
	1037	ends up owning a reference to the referent. It is therefore implied
	1038	that the caller of C<newRV_noinc()> is relinquishing a reference to the
	1039	referent, making this conceptually a more complicated operation even
	1040	though it does less to the data structures.
	1041
	1042	For example, imagine you want to return a reference from an XSUB
	1043	function. Inside the XSUB routine, you create an SV which initially
	1044	has just a single reference, owned by the XSUB routine. This reference
	1045	needs to be disposed of before the routine is complete, otherwise it
	1046	will leak, preventing the SV from ever being destroyed. So to create
	1047	an RV referencing the SV, it is most convenient to pass the SV to
	1048	C<newRV_noinc()>, which consumes that reference. Now the XSUB routine
	1049	no longer owns a reference to the SV, but does own a reference to the RV,
	1050	which in turn owns a reference to the SV. The ownership of the reference
	1051	to the RV is then transferred by the process of returning the RV from
	1052	the XSUB.
55497cff	1053
5f05dabc	1054	There are some convenience functions available that can help with the
54310121	1055	destruction of xVs. These functions introduce the concept of "mortality".
3d2ba989 Z	1056	Much documentation speaks of an xV itself being mortal, but this is
	1057	misleading. It is really I<a reference to> an xV that is mortal, and it
	1058	is possible for there to be more than one mortal reference to a single xV.
	1059	For a reference to be mortal means that it is owned by the temps stack,
	1060	one of perl's many internal stacks, which will destroy that reference
	1061	"a short time later". Usually the "short time later" is the end of
	1062	the current Perl statement. However, it gets more complicated around
	1063	dynamic scopes: there can be multiple sets of mortal references hanging
	1064	around at the same time, with different death dates. Internally, the
	1065	actual determinant for when mortal xV references are destroyed depends
	1066	on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs>
e55ec392	1067	and L</Temporaries Stack> below for more details on these macros.
3d2ba989 Z	1068
	1069	Mortal references are mainly used for xVs that are placed on perl's
	1070	main stack. The stack is problematic for reference tracking, because it
	1071	contains a lot of xV references, but doesn't own those references: they
	1072	are not counted. Currently, there are many bugs resulting from xVs being
	1073	destroyed while referenced by the stack, because the stack's uncounted
	1074	references aren't enough to keep the xVs alive. So when putting an
	1075	(uncounted) reference on the stack, it is vitally important to ensure that
	1076	there will be a counted reference to the same xV that will last at least
	1077	as long as the uncounted reference. But it's also important that that
	1078	counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life. A mortal reference is often the best way to satisfy
this requirement, particularly if the xV was created specifically to be put
on the stack and would otherwise be unreferenced.
	1082
	1083	To create a mortal reference, use the functions:
a0d0e21e LW	1084
a0d0e21e LW	1085	SV* sv_newmortal()
a0d0e21e	1086	SV* sv_mortalcopy(SV*)
3d2ba989	1087	SV* sv_2mortal(SV*)
a0d0e21e	1088
3d2ba989 Z	1089	C<sv_newmortal()> creates an SV (with the undefined value) whose sole
	1090	reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a
	1091	copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()>
	1092	mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack. Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:
00aadd71 NIS	1096
	1097	SV *tmp = sv_newmortal();
	1098	sv_setiv(tmp, an_integer);
	1099
As that is multiple C statements, it is quite common to see this idiom instead:
	1101
	1102	SV *tmp = sv_2mortal(newSViv(an_integer));
	1103
The mortal routines are not just for SVs; AVs and HVs can be made mortal
by passing their address (cast to C<SV*>) to C<sv_2mortal>. C<sv_mortalcopy>
is only suitable for scalars, because it copies its argument with
C<sv_setsv>, which croaks when asked to copy an array or hash.
a0d0e21e	1107
5f05dabc	1108	=head2 Stashes and Globs
a0d0e21e	1109
06f6df17 RGS	1110	A B<stash> is a hash that contains all variables that are defined
06f6df17 RGS	1111	within a package. Each key of the stash is a symbol
aa689395	1112	name (shared by all the different types of objects that have the same
	1113	name), and each value in the hash table is a GV (Glob Value). This GV
	1114	in turn contains references to the various objects of that name,
	1115	including (but not limited to) the following:
cb1a09d0	1116
a0d0e21e LW	1117	Scalar Value
	1118	Array Value
	1119	Hash Value
a3cb178b	1120	I/O Handle
a0d0e21e LW	1121	Format
	1122	Subroutine
	1123
06f6df17 RGS	1124	There is a single stash called C<PL_defstash> that holds the items that exist
	1125	in the C<main> package. To get at the items in other packages, append the
	1126	string "::" to the package name. The items in the C<Foo> package are in
	1127	the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
	1128	in the stash C<Baz::> in C<Bar::>'s stash.
a0d0e21e	1129
6ef63541 KW	1130	=for apidoc_section $GV
=for apidoc Amnh||PL_defstash
	1132
To get the stash pointer for a particular package, use one of these functions:
a0d0e21e	1134
da51bb9b NC	1135	HV* gv_stashpv(const char* name, I32 flags)
da51bb9b NC	1136	HV* gv_stashsv(SV*, I32 flags)
a0d0e21e LW	1137
The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. If C<flags> includes C<GV_ADD>, the package is created if it does not
already exist; otherwise NULL is returned for a nonexistent package.
a0d0e21e LW	1141
	1142	The name that C<gv_stash*v> wants is the name of the package whose symbol table
	1143	you want. The default package is called C<main>. If you have multiply nested
d1b91892 AD	1144	packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
d1b91892 AD	1145	language itself.
a0d0e21e LW	1146
Alternatively, if you have an SV that is a blessed reference, you can find
	1148	out the stash pointer by using:
	1149
	1150	HV* SvSTASH(SvRV(SV*));
	1151
	1152	then use the following to get the package name itself:
	1153
	1154	char* HvNAME(HV* stash);
	1155
5f05dabc	1156	If you need to bless or re-bless an object you can use the following
5f05dabc	1157	function:
a0d0e21e LW	1158
    SV* sv_bless(SV*, HV* stash)
	1160
	1161	where the first argument, an C<SV*>, must be a reference, and the second
	1162	argument is a stash. The returned C<SV*> can now be used in the same way
	1163	as any other SV.
	1164
d1b91892 AD	1165	For more information on references and blessings, consult L<perlref>.
d1b91892 AD	1166
4c29eb71 TC	1167	=head2 I/O Handles
	1168
	1169	Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
	1171	from opendir().
	1172
	1173	You can create a new IO object:
	1174
	1175	IO* newIO();
	1176
	1177	Unlike other SVs, a new IO object is automatically blessed into the
	1178	L<IO::File> class.
	1179
	1180	The IO object contains an input and output PerlIO handle:
	1181
    PerlIO *IoIFP(IO *io);
    PerlIO *IoOFP(IO *io);
	1184
6ef63541 KW	1185	=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io
	1188
4c29eb71 TC	1189	Typically if the IO object has been opened on a file, the input handle
	1190	is always present, but the output handle is only present if the file
	1191	is open for output. For a file, if both are present they will be the
	1192	same PerlIO object.
	1193
	1194	Distinct input and output PerlIO objects are created for sockets and
	1195	character devices.
	1196
	1197	The IO object also contains other data associated with Perl I/O
	1198	handles:
	1199
    IV IoLINES(io);                /* $. */
    IV IoPAGE(io);                 /* $% */
    IV IoPAGE_LEN(io);             /* $= */
    IV IoLINES_LEFT(io);           /* $- */
    char *IoTOP_NAME(io);          /* $^ */
    GV *IoTOP_GV(io);              /* $^ */
    char *IoFMT_NAME(io);          /* $~ */
    GV *IoFMT_GV(io);              /* $~ */
    char *IoBOTTOM_NAME(io);
    GV *IoBOTTOM_GV(io);
    char IoTYPE(io);
    U8 IoFLAGS(io);
	1212
6ef63541 KW	1213	=for apidoc_sections $io_scn, $formats_section
	1214	=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io
	1228
Most of these are involved with L<formats|perlform>.
	1230
IoFLAGS() may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.
	1234
=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT
	1237
The IO object may also contain a directory handle:
	1239
	1240	DIR *IoDIRP(io);
	1241
=for apidoc Amh|DIR *|IoDIRP|IO *io
6ef63541 KW	1243
4c29eb71 TC	1244	suitable for use with PerlDir_read() etc.
	1245
All of these accessor macros are lvalues; there are no distinct
C<_set()> macros to modify the members of the IO object.
	1248
54310121	1249	=head2 Double-Typed SVs
0a753a76	1250
	1251	Scalar variables normally contain only one type of value, an integer,
	1252	double, pointer, or reference. Perl will automatically convert the
	1253	actual scalar data from the stored type into the requested type.
	1254
	1255	Some scalar variables contain more than one type of scalar data. For
	1256	example, the variable C<$!> contains either the numeric value of C<errno>
	1257	or its string equivalent from either C<strerror> or C<sys_errlist[]>.
	1258
	1259	To force multiple data values into an SV, you must do two things: use the
	1260	C<sv_set*v> routines to add the additional scalar type, then set a flag
	1261	so that Perl will believe it contains more than one type of data. The
	1262	four macros to set the flags are:
	1263
	1264	SvIOK_on
	1265	SvNOK_on
	1266	SvPOK_on
	1267	SvROK_on
	1268
	1269	The particular macro you must use depends on which C<sv_set*v> routine
	1270	you called first. This is because every C<sv_set*v> routine turns on
	1271	only the bit for the particular type of data being set, and turns off
	1272	all the rest.
	1273
	1274	For example, to create a new Perl variable called "dberror" that contains
	1275	both the numeric and descriptive string error values, you could use the
	1276	following code:
	1277
    extern int dberror;
    extern const char *dberror_list[];

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);
	1285
	1286	If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
	1287	macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
	1288
4f4531b8 FC	1289	=head2 Read-Only Values
	1290
	1291	In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
	1292	flag bit with read-only scalars. So the only way to test whether
	1293	C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
	1294	in those versions is:
	1295
	1296	SvREADONLY(sv) && !SvIsCOW(sv)
	1297
Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only variables,
and, under 5.20 and later, copy-on-write scalars can also be read-only, so the
above check is incorrect. You just want:
	1301
	1302	SvREADONLY(sv)
	1303
	1304	If you need to do this check often, define your own macro like this:
	1305
	1306	#if PERL_VERSION >= 18
	1307	# define SvTRULYREADONLY(sv) SvREADONLY(sv)
	1308	#else
	1309	# define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
	1310	#endif
	1311
	1312	=head2 Copy on Write
	1313
	1314	Perl implements a copy-on-write (COW) mechanism for scalars, in which
	1315	string copies are not immediately made when requested, but are deferred
	1316	until made necessary by one or the other scalar changing. This is mostly
	1317	transparent, but one must take care not to modify string buffers that are
	1318	shared by multiple SVs.
	1319
	1320	You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.
	1321
You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.
	1323
	1324	If you want to make the SV drop its string buffer, use
	1325	C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
	1326	C<sv_setsv(sv, NULL)>.
	1327
	1328	All of these functions will croak on read-only scalars (see the previous
	1329	section for more on those).
	1330
	1331	To test that your code is behaving correctly and not modifying COW buffers,
	1332	on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
	1333	C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
	1334	into crashes. You will find it to be marvellously slow, so you may want to
	1335	skip perl's own tests.
	1336
0a753a76	1337	=head2 Magic Variables
a0d0e21e	1338
d1b91892 AD	1339	[This section still under construction. Ignore everything here. Post no
	1340	bills. Everything not permitted is forbidden.]
	1341
d1b91892 AD	1342	Any SV may be magical, that is, it has special features that a normal
d1b91892 AD	1343	SV does not have. These features are stored in the SV structure in a
5f05dabc	1344	linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
d1b91892 AD	1345
	1346	struct magic {
	1347	MAGIC* mg_moremagic;
	1348	MGVTBL* mg_virtual;
	1349	U16 mg_private;
	1350	char mg_type;
	1351	U8 mg_flags;
b205eb13	1352	I32 mg_len;
d1b91892 AD	1353	SV* mg_obj;
d1b91892 AD	1354	char* mg_ptr;
d1b91892 AD	1355	};
	1356
	1357	Note this is current as of patchlevel 0, and could change at any time.
	1358
	1359	=head2 Assigning Magic
	1360
	1361	Perl adds magic to an SV using the sv_magic function:
	1362
a9b0660e	1363	void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
d1b91892 AD	1364
	1365	The C<sv> argument is a pointer to the SV that is to acquire a new magical
	1366	feature.
	1367
	1368	If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
10e2eb10 FC	1369	convert C<sv> to type C<SVt_PVMG>.
10e2eb10 FC	1370	Perl then continues by adding new magic
645c22ef DM	1371	to the beginning of the linked list of magical features. Any prior entry
	1372	of the same type of magic is deleted. Note that this can be overridden,
	1373	and multiple instances of the same type of magic can be associated with an
	1374	SV.
d1b91892	1375
54310121	1376	The C<name> and C<namlen> arguments are used to associate a string with
10e2eb10	1377	the magic, typically the name of a variable. C<namlen> is stored in the
2d8d5d5a SH	1378	C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
	1379	C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
	1380	whether C<namlen> is greater than zero or equal to zero respectively. As a
	1381	special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
	1382	to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The C<sv_magic> function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of C<how> should be chosen from
the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar>, for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, if the
C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>, or
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.
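The reference-counting rule for C<obj> can be written as a small predicate. This is a toy sketch: C<toy_sv>, C<toy_obj_needs_incref()>, C<toy_store_obj()> and the C<TOY_MAGIC_*> codes are hypothetical (the codes reuse the old-style characters C<#>, C<d> and C<D> from the table below) and are not Perl's own definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical codes standing in for PERL_MAGIC_arylen, _regdatum and
 * _regdata (their old-style characters from the table below). */
enum { TOY_MAGIC_arylen = '#', TOY_MAGIC_regdatum = 'd', TOY_MAGIC_regdata = 'D' };

typedef struct { int refcnt; } toy_sv;

/* True when storing OBJ as mg_obj should take a new reference. */
static bool toy_obj_needs_incref(const toy_sv *sv, const toy_sv *obj, int how)
{
    if (obj == NULL || obj == sv)
        return false;              /* NULL, or the SV itself */
    if (how == TOY_MAGIC_arylen || how == TOY_MAGIC_regdatum
        || how == TOY_MAGIC_regdata)
        return false;              /* types that never take a reference */
    return true;
}

/* Store OBJ in *MG_OBJ (always) and bump its count when the rule says so. */
static void toy_store_obj(const toy_sv *sv, toy_sv *obj, int how,
                          toy_sv **mg_obj)
{
    *mg_obj = obj;
    if (toy_obj_needs_incref(sv, obj, how))
        obj->refcnt++;
}
```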

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function C<sv_unmagic>:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type>
from the C<SV>. If you want to remove only certain magic of a C<type>
based on the magic virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
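The difference between the two removal functions can be modelled with a toy chain. C<toy_push()>, C<toy_unmagic()> and the types below are hypothetical; the C<match_vtbl> flag stands in for the choice between C<sv_unmagic> (match on type only) and C<sv_unmagicext> (match on type and vtable pointer).

```c
#include <stdlib.h>

/* Stand-in for MGVTBL: only its address matters in this sketch. */
typedef struct toy_vtbl { int unused; } toy_vtbl;

typedef struct toy_magic {
    char type;                 /* like mg_type */
    const toy_vtbl *vtbl;      /* like mg_virtual */
    struct toy_magic *next;
} toy_magic;

/* Push a new entry at the head of *chain. */
static toy_magic *toy_push(toy_magic **chain, char type, const toy_vtbl *vtbl)
{
    toy_magic *mg = malloc(sizeof *mg);
    if (mg != NULL) {
        mg->type = type;
        mg->vtbl = vtbl;
        mg->next = *chain;
        *chain = mg;
    }
    return mg;
}

/* Remove entries of TYPE. With MATCH_VTBL == 0 every entry of that type goes
 * (sv_unmagic-style); otherwise only entries whose vtable pointer equals VTBL
 * (sv_unmagicext-style). Returns how many entries were removed. */
static int toy_unmagic(toy_magic **chain, char type,
                       const toy_vtbl *vtbl, int match_vtbl)
{
    int removed = 0;
    toy_magic **pp = chain;
    while (*pp != NULL) {
        toy_magic *mg = *pp;
        if (mg->type == type && (!match_vtbl || mg->vtbl == vtbl)) {
            *pp = mg->next;
            free(mg);
            removed++;
        } else {
            pp = &mg->next;
        }
    }
    return removed;
}
```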

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, a structure of function pointers (the name stands for "Magic
Virtual Table") that handle the various operations that might be applied
to that variable.

=for apidoc_section $magic
=for apidoc Ayh||MGVTBL

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);

    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
                                          const char *name, I32 namlen);
    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);

These MGVTBL structures are set at compile time in F<perl.h>; the table of
magic types below shows which one each type uses. The different structures
contain pointers to various routines that perform additional actions
depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is
                        retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

    svt_copy            Copy tied variable magic to a tied element.
    svt_dup             Duplicate a magic structure during thread cloning.
    svt_local           Copy magic to the local value during 'local'.

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
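A toy model of vtable dispatch, assuming a stripped-down five-slot table with no C<pTHX_> or C<MAGIC*> arguments. C<toy_vtbl>, C<toy_get()>, C<toy_set()> and the two hooks are hypothetical; the sketch only shows the ordering described above: the get hook runs before a read, and the set hook runs after an assignment.

```c
#include <stddef.h>

typedef struct toy_sv toy_sv;

/* Hypothetical five-slot table shaped like the first five MGVTBL entries. */
typedef struct {
    int      (*svt_get)(toy_sv *sv);
    int      (*svt_set)(toy_sv *sv);
    unsigned (*svt_len)(toy_sv *sv);
    int      (*svt_clear)(toy_sv *sv);
    int      (*svt_free)(toy_sv *sv);
} toy_vtbl;

struct toy_sv {
    long value;
    const toy_vtbl *vtbl;   /* NULL: not magical */
    int gets, sets;         /* counters so the hooks are observable */
};

/* Read: run svt_get first (if present), then return the value. */
static long toy_get(toy_sv *sv)
{
    if (sv->vtbl != NULL && sv->vtbl->svt_get != NULL)
        sv->vtbl->svt_get(sv);
    return sv->value;
}

/* Write: store the value, then run svt_set (if present). */
static void toy_set(toy_sv *sv, long v)
{
    sv->value = v;
    if (sv->vtbl != NULL && sv->vtbl->svt_set != NULL)
        sv->vtbl->svt_set(sv);
}

/* Example hooks: the get hook refreshes the value, the set hook counts. */
static int refresh_get(toy_sv *sv) { sv->gets++; sv->value = 42; return 0; }
static int count_set(toy_sv *sv)   { sv->sets++; return 0; }

/* Unused slots are 0, as in the vtbl_sv initializer above. */
static const toy_vtbl toy_vtbl_demo = { refresh_get, count_set, 0, 0, 0 };
```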

The last three slots were added later, and for source code compatibility
they are only checked for if one of the three flags C<MGf_COPY>,
C<MGf_DUP>, or C<MGf_LOCAL> is set in C<mg_flags>. This means that most
code can continue declaring a vtable as a 5-element value. These three
are currently used exclusively by the threading code, and are highly
subject to change.

=for apidoc_section $magic
=for apidoc Amnh||MGf_COPY
=for apidoc_item ||MGf_DUP
=for apidoc_item ||MGf_LOCAL
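A sketch of the flag check, using hypothetical names and flag values (C<TOY_MGf_COPY> and its siblings are not Perl's constants). It also shows why older five-element initializers keep compiling: C zero-fills the trailing slots an initializer does not mention.

```c
#include <stddef.h>

/* Hypothetical flag values, not Perl's. */
#define TOY_MGf_COPY  0x1u
#define TOY_MGf_DUP   0x2u
#define TOY_MGf_LOCAL 0x4u

typedef struct toy_magic toy_magic;

/* Eight slots, like a full MGVTBL (signatures simplified). */
typedef struct {
    int (*svt_get)(toy_magic *);
    int (*svt_set)(toy_magic *);
    int (*svt_len)(toy_magic *);
    int (*svt_clear)(toy_magic *);
    int (*svt_free)(toy_magic *);
    int (*svt_copy)(toy_magic *);   /* consulted only with TOY_MGf_COPY */
    int (*svt_dup)(toy_magic *);    /* consulted only with TOY_MGf_DUP */
    int (*svt_local)(toy_magic *);  /* consulted only with TOY_MGf_LOCAL */
} toy_vtbl;

struct toy_magic {
    unsigned flags;                 /* like mg_flags */
    const toy_vtbl *vtbl;           /* like mg_virtual */
};

/* Call svt_copy only when the flag says the slot is meaningful; otherwise
 * behave as if the table had only five entries. Returns the hook's result,
 * or 0 when the hook is not consulted. */
static int toy_call_copy(toy_magic *mg)
{
    if ((mg->flags & TOY_MGf_COPY) && mg->vtbl != NULL
        && mg->vtbl->svt_copy != NULL)
        return mg->vtbl->svt_copy(mg);
    return 0;
}

static int copy_hook(toy_magic *mg) { (void)mg; return 1; }

/* A "5-element" initializer still compiles: C zero-fills the rest. */
static const toy_vtbl five_slot_vtbl = { 0, 0, 0, 0, 0 };
```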

The current kinds of Magic Virtual Tables are:

=for comment
This table is generated by regen/mg_vtable.pl.  Any changes made here
will be lost.

=for mg_vtable.pl begin

 mg_type
 (old-style char and macro)   MGVTBL         Type of magic
 --------------------------   ------         -------------
 \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
 #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
 %  PERL_MAGIC_rhash          (none)         Extra data for restricted
                                             hashes
 *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
                                             vars
 .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
 :  PERL_MAGIC_symtab         (none)         Extra data for symbol
                                             tables
 <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
 @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
 B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
                                             (fast string search)
 c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
                                             (AMT) on stash
 D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
                                             (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
                                             element
 E  PERL_MAGIC_env            vtbl_env       %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
 f  PERL_MAGIC_fm             vtbl_regexp    Formline
                                             ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
 H  PERL_MAGIC_hints          vtbl_hints     %^H hash
 h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
 I  PERL_MAGIC_isa            vtbl_isa       @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
                                             element
 N  PERL_MAGIC_shared         (none)         Shared between threads
 n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
 P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
 S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint     Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
                                             extensions
 u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
                                             extensions
 V  PERL_MAGIC_vstring        (none)         SV was vstring literal
 v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
 w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
 X  PERL_MAGIC_destruct       vtbl_destruct  destruct callback
 x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
 Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
                                             exist
 y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
                                             variable / smart parameter
                                             vivification
 Z  PERL_MAGIC_hook           vtbl_hook      %{^HOOK} hash
 z  PERL_MAGIC_hookelem       vtbl_hookelem  %{^HOOK} hash element
 \  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
                                             constructor
 ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
                                             to this CV
 ^  PERL_MAGIC_extvalue       (none)         Value magic available for
                                             use by extensions
 ~  PERL_MAGIC_ext            (none)         Variable magic available
                                             for use by extensions

=for apidoc_section $magic
=for apidoc AmnhU||PERL_MAGIC_arylen
=for apidoc_item ||PERL_MAGIC_arylen_p
=for apidoc_item ||PERL_MAGIC_backref
=for apidoc_item ||PERL_MAGIC_bm
=for apidoc_item ||PERL_MAGIC_checkcall
=for apidoc_item ||PERL_MAGIC_collxfrm
=for apidoc_item ||PERL_MAGIC_dbfile
=for apidoc_item ||PERL_MAGIC_dbline
=for apidoc_item ||PERL_MAGIC_debugvar
=for apidoc_item ||PERL_MAGIC_defelem
=for apidoc_item ||PERL_MAGIC_destruct
=for apidoc_item ||PERL_MAGIC_env
=for apidoc_item ||PERL_MAGIC_envelem
=for apidoc_item ||PERL_MAGIC_ext
=for apidoc_item ||PERL_MAGIC_extvalue
=for apidoc_item ||PERL_MAGIC_fm
=for apidoc_item ||PERL_MAGIC_hints
=for apidoc_item ||PERL_MAGIC_hintselem
=for apidoc_item ||PERL_MAGIC_hook
=for apidoc_item ||PERL_MAGIC_hookelem
=for apidoc_item ||PERL_MAGIC_isa
=for apidoc_item ||PERL_MAGIC_isaelem
=for apidoc_item ||PERL_MAGIC_lvref
=for apidoc_item ||PERL_MAGIC_nkeys
=for apidoc_item ||PERL_MAGIC_nonelem
=for apidoc_item ||PERL_MAGIC_overload_table
=for apidoc_item ||PERL_MAGIC_pos
=for apidoc_item ||PERL_MAGIC_qr
=for apidoc_item ||PERL_MAGIC_regdata
=for apidoc_item ||PERL_MAGIC_regdatum
=for apidoc_item ||PERL_MAGIC_regex_global
=for apidoc_item ||PERL_MAGIC_rhash
=for apidoc_item ||PERL_MAGIC_shared
=for apidoc_item ||PERL_MAGIC_shared_scalar
=for apidoc_item ||PERL_MAGIC_sig
=for apidoc_item ||PERL_MAGIC_sigelem
=for apidoc_item ||PERL_MAGIC_substr
=for apidoc_item ||PERL_MAGIC_sv
=for apidoc_item ||PERL_MAGIC_symtab
=for apidoc_item ||PERL_MAGIC_taint
=for apidoc_item ||PERL_MAGIC_tied
=for apidoc_item ||PERL_MAGIC_tiedelem
=for apidoc_item ||PERL_MAGIC_tiedscalar
=for apidoc_item ||PERL_MAGIC_utf8
=for apidoc_item ||PERL_MAGIC_uvar
=for apidoc_item ||PERL_MAGIC_uvar_elem
=for apidoc_item ||PERL_MAGIC_vec
=for apidoc_item ||PERL_MAGIC_vstring

=for mg_vtable.pl end

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext>, C<PERL_MAGIC_extvalue> and C<PERL_MAGIC_uvar> magic
types are defined specifically for use by extensions and will not be used
by perl itself. Extensions can use C<PERL_MAGIC_ext> or
C<PERL_MAGIC_extvalue> magic to 'attach' private information to variables
(typically objects). This is especially useful because there is no way for
normal perl code to corrupt this private information (unlike using extra
elements of a hash object). C<PERL_MAGIC_extvalue> is value magic (unlike
C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar>), meaning that on localization the
new value will not be magical.

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
C<sv_magic>, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));
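To see the callback flow without a Perl build, here is a standalone toy. C<struct toy_ufuncs>, C<toy_uvar_sv> and the read/write helpers are hypothetical simplifications of C<struct ufuncs> (no C<pTHX_>, a toy SV type); C<my_get_fn> and C<my_set_fn> reuse the placeholder names from the XS example above. The struct copy in C<toy_attach_uvar()> is what makes a stack-allocated C<uf> safe.

```c
#include <stddef.h>

/* Simplified copy of struct ufuncs: no pTHX_, and a toy SV type. */
typedef long toy_IV;
typedef struct { toy_IV value; } toy_sv;

struct toy_ufuncs {
    int (*uf_val)(toy_IV index, toy_sv *sv);   /* called on read */
    int (*uf_set)(toy_IV index, toy_sv *sv);   /* called on write */
    toy_IV uf_index;                           /* passed back as first arg */
};

/* A scalar carrying its own copy of the ufuncs. */
typedef struct {
    toy_sv sv;
    struct toy_ufuncs uf;
} toy_uvar_sv;

static void toy_attach_uvar(toy_uvar_sv *t, const struct toy_ufuncs *uf)
{
    t->uf = *uf;   /* struct copy, so UF may live on the caller's stack */
}

static toy_IV toy_read(toy_uvar_sv *t)
{
    if (t->uf.uf_val != NULL)
        t->uf.uf_val(t->uf.uf_index, &t->sv);
    return t->sv.value;
}

static void toy_write(toy_uvar_sv *t, toy_IV v)
{
    t->sv.value = v;
    if (t->uf.uf_set != NULL)
        t->uf.uf_set(t->uf.uf_index, &t->sv);
}

/* Hypothetical callbacks: the index tells them which variable they serve. */
static toy_IV last_set_index = -1;
static int my_get_fn(toy_IV index, toy_sv *sv) { sv->value = 100 + index; return 0; }
static int my_set_fn(toy_IV index, toy_sv *sv) { (void)sv; last_set_index = index; return 0; }
```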

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL. The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table. C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }
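Why a private C<MGVTBL> helps can be shown with a toy chain in which two modules attach magic of the same type and only the vtable address tells them apart. C<toy_find()> and C<toy_findext()> are hypothetical stand-ins for C<mg_find> and C<mg_findext>; the tables are identical in content and differ only in address.

```c
#include <stddef.h>

/* Stand-in for MGVTBL; contents do not matter, only the address does. */
typedef struct { int (*svt_get)(void *); } toy_vtbl;

typedef struct toy_magic {
    char type;                /* like mg_type */
    const toy_vtbl *vtbl;     /* like mg_virtual */
    void *ptr;                /* like mg_ptr: the owner's private data */
    struct toy_magic *next;
} toy_magic;

/* mg_find-style: first entry of TYPE, whoever added it. */
static toy_magic *toy_find(toy_magic *mg, char type)
{
    for (; mg != NULL; mg = mg->next)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* mg_findext-style: first entry of TYPE that uses exactly VTBL. */
static toy_magic *toy_findext(toy_magic *mg, char type, const toy_vtbl *vtbl)
{
    for (; mg != NULL; mg = mg->next)
        if (mg->type == type && mg->vtbl == vtbl)
            return mg;
    return NULL;
}

/* Two modules' tables: same (all-zero) contents, different addresses. */
static const toy_vtbl my_vtbl    = { 0 };
static const toy_vtbl other_vtbl = { 0 };
```

A type-only lookup would hand back whichever entry is first, possibly another module's; the vtable-aware lookup finds ours.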

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.
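A toy sketch of this convention, with hypothetical names: C<toy_setiv()> plays the part of a plain C<sv_set*()> function, C<toy_setiv_mg()> the C<_mg> variant, and C<TOY_SvSETMAGIC> the macro. Only the plain setter leaves the set hook uncalled.

```c
/* Toy scalar with an optional 'set' hook standing in for set magic. */
typedef struct toy_sv {
    long value;
    void (*set_hook)(struct toy_sv *sv);   /* NULL: no set magic */
    int set_calls;
} toy_sv;

/* SvSETMAGIC-like: run the set hook, if any. */
#define TOY_SvSETMAGIC(sv) \
    do { if ((sv)->set_hook) (sv)->set_hook(sv); } while (0)

/* sv_setiv-like: stores the value and does NOT run set magic. */
static void toy_setiv(toy_sv *sv, long v) { sv->value = v; }

/* sv_setiv_mg-like: stores the value, then runs set magic. */
static void toy_setiv_mg(toy_sv *sv, long v)
{
    toy_setiv(sv, v);
    TOY_SvSETMAGIC(sv);
}

static void count_set(toy_sv *sv) { sv->set_calls++; }
```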

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:

    MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);

Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
C<SVt_PVMG>, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has. If the C<mg_type>
field is an uppercase letter, then the C<mg_obj> is copied to C<nsv>, but
the C<mg_type> field is changed to be the lowercase letter.
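A toy version of the case rule that C<mg_copy> applies, for example turning C<'P'> (C<PERL_MAGIC_tied>) into C<'p'> (C<PERL_MAGIC_tiedelem>). The C<toy_magic> type and C<toy_mg_copy()> are hypothetical, and the sketch ignores C<svt_copy> hooks, keys and everything else the real routine does.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct toy_magic {
    char type;                /* like mg_type */
    void *obj;                /* like mg_obj */
    struct toy_magic *next;
} toy_magic;

/* For each entry on SRC whose type is an uppercase letter, write an entry
 * with the same obj and the lowercase type into OUT (at most MAX entries),
 * chaining the copies together. Returns how many entries were written. */
static size_t toy_mg_copy(const toy_magic *src, toy_magic *out, size_t max)
{
    size_t n = 0;
    for (; src != NULL && n < max; src = src->next) {
        if (isupper((unsigned char)src->type)) {
            out[n].type = (char)tolower((unsigned char)src->type);
            out[n].obj  = src->obj;
            out[n].next = NULL;
            if (n > 0)
                out[n - 1].next = &out[n];   /* keep the copies chained */
            n++;
        }
    }
    return n;
}
```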

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods. To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps: first it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class; see
L</Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        SvREFCNT_dec(tie); /* hv_magic() increases tie ref count */
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>. It may also return NULL, indicating that the value did not
actually need to be stored in the array. [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object. If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>. Note the value so returned does not
need to be deallocated, as it is already mortal. [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object. Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes. They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects. Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants. The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]
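The lazy access described above can be modelled without Perl. In this toy sketch, C<toy_av_fetch()> returns a placeholder that only records the tie object and the index; the FETCH and STORE callbacks run only when C<toy_mg_get()> or C<toy_mg_set()> is called. All names are hypothetical, and the sketch ignores mortals, reference counts and the NULL return of C<av_store>.

```c
#include <stddef.h>

/* Hypothetical tie object: FETCH/STORE callbacks plus backing storage. */
typedef struct toy_tie {
    long (*FETCH)(struct toy_tie *self, size_t index);
    void (*STORE)(struct toy_tie *self, size_t index, long value);
    long storage[8];
    int fetches, stores;      /* counters to show when methods run */
} toy_tie;

/* The "value" returned by a fetch: it remembers which object and index it
 * belongs to (like tiedelem magic) but holds no real data yet. */
typedef struct {
    toy_tie *obj;
    size_t index;
    long value;
} toy_elem;

/* av_fetch-like: builds the placeholder, calls no method. */
static toy_elem toy_av_fetch(toy_tie *tied, size_t index)
{
    toy_elem e = { tied, index, 0 };
    return e;
}

/* mg_get-like: now FETCH actually runs. */
static void toy_mg_get(toy_elem *e) { e->value = e->obj->FETCH(e->obj, e->index); }

/* mg_set-like: now STORE actually runs. */
static void toy_mg_set(toy_elem *e) { e->obj->STORE(e->obj, e->index, e->value); }

/* Example methods for an array-like tie (no bounds checks in this toy). */
static long array_FETCH(toy_tie *self, size_t i) { self->fetches++; return self->storage[i]; }
static void array_STORE(toy_tie *self, size_t i, long v) { self->stores++; self->storage[i] = v; }
```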

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax. The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

    {
      local $var = 2;
      ...
    }

This construction is I<approximately> equivalent to

    {
      my $oldvar = $var;
      $var = 2;
      ...
      $var = $oldvar;
    }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create
a I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used. (In the second case the overhead of additional localization must
be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
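A toy save stack makes the C<ENTER>/C<LEAVE> idea concrete. Everything below is hypothetical and far simpler than Perl's savestack (fixed sizes, no overflow checks, C<int> entries only). The macro takes the variable itself and passes its address to a function, in the same spirit as the macro/function pairs described later in this section.

```c
#include <stddef.h>

/* Hypothetical save stack: each entry remembers where an int lives and the
 * value it held when it was saved. No overflow checks in this toy. */
typedef struct { int *addr; int old; } toy_save_entry;

static toy_save_entry toy_savestack[64];
static size_t toy_savestack_ix;
static size_t toy_scopestack[16];    /* save-stack depth at each ENTER */
static size_t toy_scopestack_ix;

/* Function form: takes a pointer, like the save_* functions. */
static void toy_save_int(int *addr)
{
    toy_savestack[toy_savestack_ix].addr = addr;
    toy_savestack[toy_savestack_ix].old  = *addr;
    toy_savestack_ix++;
}

/* Macro form: takes the variable and passes its address. */
#define TOY_SAVEINT(i) toy_save_int(&(i))

/* ENTER: remember how deep the save stack currently is. */
#define TOY_ENTER() (toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix)

/* LEAVE: undo every save made since the matching ENTER, newest first. */
static void toy_leave(void)
{
    size_t base = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > base) {
        const toy_save_entry *e = &toy_savestack[--toy_savestack_ix];
        *e->addr = e->old;
    }
}
#define TOY_LEAVE() toy_leave()
```

Nested pseudo-blocks unwind independently: each C<TOY_LEAVE()> restores only what was saved since its own C<TOY_ENTER()>.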

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

=item C<SAVEI8(I8 i)>

=item C<SAVEI16(I16 i)>

=item C<SAVEBOOL(bool i)>

=item C<SAVESTRLEN(STRLEN i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=for apidoc_section $callback
=for apidoc Amh||SAVEINT|int i
=for apidoc Amh||SAVEIV|IV i
=for apidoc Amh||SAVEI32|I32 i
=for apidoc Amh||SAVELONG|long i
=for apidoc Amh||SAVEI8|I8 i
=for apidoc Amh||SAVEI16|I16 i
=for apidoc Amh||SAVEBOOL|bool i
=for apidoc Amh||SAVESTRLEN|STRLEN i

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=for apidoc Amh||SAVESPTR|SV * s
=for apidoc Amh||SAVEPPTR|char * p

=item C<SAVERCPV(char **ppv)>

This macro arranges to restore the value of a C<char *> variable which
was allocated with a call to C<rcpv_new()> to its previous state when
the current I<pseudo-block> is completed. The pointer stored in C<*ppv> at
the time of the call will be refcount incremented and stored on the save
stack. Later when the current I<pseudo-block> is completed the value
stored in C<*ppv> will be refcount decremented, and the previous value
restored from the savestack which will also be refcount decremented.

This is the C<RCPV> equivalent of C<SAVEGENERICSV()>.

=for apidoc Amh||SAVERCPV|char *pv

=item C<SAVEGENERICSV(SV **psv)>

This macro arranges to restore the value of a C<SV *> variable to its
previous state when the current I<pseudo-block> is completed. The pointer
stored in C<*psv> at the time of the call will be refcount incremented
and stored on the save stack. Later when the current I<pseudo-block> is
completed the value stored in C<*psv> will be refcount decremented, and
the previous value restored from the savestack which will also be refcount
decremented. This is the C equivalent of C<local $sv>.

=for apidoc Amh||SAVEGENERICSV|char **psv

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope. These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=for apidoc Amh||SAVEFREESV|SV* sv

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count. This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=for apidoc Amh||SAVEMORTALIZESV|SV* sv

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is C<op_free()>ed at the end of I<pseudo-block>.

=for apidoc Amh||SAVEFREEOP|OP *op

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is C<Safefree()>ed at the
end of the current I<pseudo-block>.

=for apidoc Amh||SAVEFREEPV|char *pv

=item C<SAVEFREERCPV(char *pv)>

Ensures that a C<char *> which was created by a call to C<rcpv_new()> is
C<rcpv_free()>ed at the end of the current I<pseudo-block>.

This is the RCPV equivalent of C<SAVEFREESV()>.

=for apidoc Amh||SAVEFREERCPV|char *pv

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is C<Safefree()>ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>, which may be NULL.

=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p> which may be NULL.

Note the I<end of the current pseudo-block> may occur much later than
the I<end of the current statement>. You may wish to look at the
C<MORTALSVFUNC_X()> macro instead.

=for apidoc Ayh||DESTRUCTORFUNC_t
=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p

=item C<MORTALSVFUNC_X(SVFUNC_t f, SV *sv)>

At the end of I<the current statement> the function C<f> is called with
the implicit context argument (if any), and C<sv> which may be NULL.

Be aware that the parameter argument to the destructor function differs
from the related C<SAVEDESTRUCTOR_X()> in that it MUST be either NULL or
an C<SV*>.

Note the I<end of the current statement> may occur much earlier than
the I<end of the current pseudo-block>. You may wish to look at the
C<SAVEDESTRUCTOR_X()> macro instead.

=for apidoc Amh||MORTALSVFUNC_X|SVFUNC_t f|SV *sv

=item C<MORTALDESTRUCTOR_SV(SV *coderef, SV *args)>

At the end of I<the current statement> the Perl function contained in
C<coderef> is called with the arguments provided (if any) in C<args>.
See the documentation for C<mortal_destructor_sv()> for details on
how the C<args> parameter is handled.

Note the I<end of the current statement> may occur much earlier than
the I<end of the current pseudo-block>. If you wish to call a perl
function at the end of the current pseudo-block you should use the
C<SAVEDESTRUCTOR_X()> API instead, which will require you to create a
C wrapper to call the Perl function.

=for apidoc Amh||MORTALDESTRUCTOR_SV|SV *coderef|SV *args

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=for apidoc Amh||SAVESTACK_POS

=back
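The lifetime difference between C<SAVEFREESV> and C<sv_2mortal> (and, likewise, between C<SAVEDESTRUCTOR_X> and C<MORTALSVFUNC_X>) comes down to which list is drained when. This toy sketch uses two hypothetical queues: one emptied at every statement boundary, like C<FREETMPS>, and one emptied when the scope ends, like C<LEAVE>. None of these names are Perl API.

```c
#include <stddef.h>

typedef struct { int refcnt; } toy_sv;

/* Hypothetical queues; fixed sizes, no overflow checks in this toy. */
static toy_sv *toy_tmps[32];          /* drained at statement end */
static size_t  toy_tmps_ix;
static toy_sv *toy_scope_frees[32];   /* drained at scope end */
static size_t  toy_scope_ix;

/* sv_2mortal-like: schedule a decrement for the end of the statement. */
static toy_sv *toy_2mortal(toy_sv *sv) { toy_tmps[toy_tmps_ix++] = sv; return sv; }

/* SAVEFREESV-like: schedule a decrement for the end of the scope. */
static void toy_savefreesv(toy_sv *sv) { toy_scope_frees[toy_scope_ix++] = sv; }

/* FREETMPS-like: called at each statement boundary. */
static void toy_end_statement(void)
{
    while (toy_tmps_ix > 0)
        toy_tmps[--toy_tmps_ix]->refcnt--;
}

/* LEAVE-like: called when the (single, unnested) pseudo-block ends. */
static void toy_end_scope(void)
{
    while (toy_scope_ix > 0)
        toy_scope_frees[--toy_scope_ix]->refcnt--;
}
```

The SV handed to C<toy_savefreesv()> survives any number of statement boundaries and is released only when the scope ends.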
	2043
	2044	The following API list contains functions, thus one needs to
	2045	provide pointers to the modifiable data explicitly (either C pointers,
00aadd71	2046	or Perlish C<GV *>s). Where the above macros take C<int>, a similar
d1c897a1 IZ	2047	function takes C<int *>.
d1c897a1 IZ	2048
9144f9d9 KW	2049	Other macros above have functions implementing them, but its probably
	2050	best to just use the macro, and not those or the ones below.
	2051
13a2d996	2052	=over 4
d1c897a1 IZ	2053
	2054	=item C<SV* save_scalar(GV *gv)>
	2055
4f313521 KW	2056	=for apidoc save_scalar
4f313521 KW	2057
d1c897a1 IZ	2058	Equivalent to Perl code C<local $gv>.
	2059
	2060	=item C<AV* save_ary(GV *gv)>
	2061
4f313521 KW	2062	=for apidoc save_ary
4f313521 KW	2063
d1c897a1 IZ	2064	=item C<HV* save_hash(GV *gv)>
d1c897a1 IZ	2065
4f313521 KW	2066	=for apidoc save_hash
4f313521 KW	2067
d1c897a1 IZ	2068	Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
	2069
	2070	=item C<void save_item(SV *item)>
	2071
53dedf6f KW	2072	=for apidoc save_item
	2073
Duplicates the current value of C<item>. On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<item> will be restored
using the stored value. It doesn't handle magic. Use C<save_scalar> if
magic is affected.
d1c897a1	2078
d1c897a1 IZ	2079	=item C<SV* save_svref(SV **sptr)>
d1c897a1 IZ	2080
4f313521 KW	2081	=for apidoc save_svref
4f313521 KW	2082
d1be9408	2083	Similar to C<save_scalar>, but will reinstate an C<SV *>.
d1c897a1 IZ	2084
	2085	=item C<void save_aptr(AV **aptr)>
	2086
	2087	=item C<void save_hptr(HV **hptr)>
	2088
4f313521 KW	2089	=for apidoc save_aptr
	2090	=for apidoc save_hptr
	2091
Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
	2093
	2094	=back
	2095
	2096	The C<Alias> module implements localization of the basic types within the
	2097	I<caller's scope>. People who are interested in how to localize things in
	2098	the containing scope should take a look there too.
	2099
0a753a76	2100	=head1 Subroutines
a0d0e21e	2101
68dc0745	2102	=head2 XSUBs and the Argument Stack
5f05dabc	2103
	2104	The XSUB mechanism is a simple way for Perl programs to access C subroutines.
	2105	An XSUB routine will have a stack that contains the arguments from the Perl
	2106	program, and a way to map from the Perl data structures to a C equivalent.
	2107
	2108	The stack arguments are accessible through the C<ST(n)> macro, which returns
	2109	the C<n>'th stack argument. Argument 0 is the first argument passed in the
	2110	Perl subroutine call. These arguments are C<SV*>, and can be used anywhere
	2111	an C<SV*> is used.
	2112
	2113	Most of the time, output from the C routine can be handled through use of
	2114	the RETVAL and OUTPUT directives. However, there are some cases where the
	2115	argument stack is not already long enough to handle all the return values.
	2116	An example is the POSIX tzname() call, which takes no arguments, but returns
	2117	two, the local time zone's standard and summer time abbreviations.
	2118
	2119	To handle this situation, the PPCODE directive is used and the stack is
	2120	extended using the macro:
	2121
    EXTEND(SP, num);
5f05dabc	2123
924508f0 GS	2124	where C<SP> is the macro that represents the local copy of the stack pointer,
924508f0 GS	2125	and C<num> is the number of elements the stack should be extended by.
5f05dabc	2126
Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro. The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):
5f05dabc	2130
    PUSHs(sv_2mortal(newSViv(an_integer)));
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)));
    PUSHs(sv_2mortal(newSVnv(a_double)));
    PUSHs(sv_2mortal(newSVpv("Some String",0)));
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP));
5f05dabc	2138
The Perl program calling C<tzname> can then assign the two values
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
	2143
An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)
5f05dabc	2148
da8c5729	2149	This macro automatically adjusts the stack for you, if needed. Thus, you
5f05dabc	2150	do not need to call C<EXTEND> to extend the stack.
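To make the difference between C<EXTEND> followed by C<PUSHs> and the self-extending C<XPUSHs> concrete, here is a small standalone C model. Everything in it (C<toy_stack>, C<toy_extend>, C<toy_push>, C<toy_xpush>) is hypothetical and only imitates the behaviour described above; perl's real stack belongs to the interpreter and its macros work on C<SP>, not on a struct like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an argument stack: NOT perl's real stack. */
typedef struct {
    void  **base;   /* storage for the pushed items */
    size_t  used;   /* number of items currently on the stack */
    size_t  cap;    /* number of slots allocated */
} toy_stack;

/* Like EXTEND(SP, n): ensure room for n more items. Returns 0 on failure. */
static int toy_extend(toy_stack *st, size_t n)
{
    size_t newcap;
    void **p;

    if (st->cap - st->used >= n)
        return 1;                      /* already enough room */
    newcap = st->cap ? st->cap : 4;
    while (newcap - st->used < n)
        newcap *= 2;
    p = realloc(st->base, newcap * sizeof *p);
    if (p == NULL)
        return 0;
    st->base = p;
    st->cap = newcap;
    return 1;
}

/* Like PUSHs: assumes the caller has already made room. */
static void toy_push(toy_stack *st, void *item)
{
    assert(st->used < st->cap);        /* pushing without EXTEND is a bug */
    st->base[st->used++] = item;
}

/* Like XPUSHs: make room for one item if needed, then push it. */
static int toy_xpush(toy_stack *st, void *item)
{
    if (!toy_extend(st, 1))
        return 0;
    toy_push(st, item);
    return 1;
}
```

With C<toy_push> the caller must reserve space first, as a PPCODE section does with C<EXTEND(SP, 2)> before two C<PUSHs>; C<toy_xpush> performs the check on every push, which is simpler to use but does a little more work per item.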
00aadd71 NIS	2151
Despite suggestions in earlier versions of this document, the macros
d82b684c SH	2153	C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
	2154	For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
	2155	C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.
5f05dabc	2156
	2157	For more information, consult L<perlxs> and L<perlxstut>.
	2158
5b36e945 FC	2159	=head2 Autoloading with XSUBs
	2160
	2161	If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
	2162	fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
	2163	of the XSUB's package.
	2164
	2165	But it also puts the same information in certain fields of the XSUB itself:
	2166
    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);
f703fc96	2171
5b36e945	2172	C<SvPVX(cv)> contains just the sub name itself, not including the package.
d8893903 FC	2173	For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
d8893903 FC	2174	C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.
5b36e945 FC	2175
	2176	B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
	2177	XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the
	2178	XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need
	2179	to support 5.8-5.14, use the XSUB's fields.
	2180
5f05dabc	2181	=head2 Calling Perl Routines from within C Programs
a0d0e21e LW	2182
	2183	There are four routines that can be used to call a Perl subroutine from
	2184	within a C program. These four are:
	2185
    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);
a0d0e21e	2190
954c1994	2191	The routine most often used is C<call_sv>. The C<SV*> argument
d1b91892 AD	2192	contains either the name of the Perl subroutine to be called, or a
	2193	reference to the subroutine. The second argument consists of flags
	2194	that control the context in which the subroutine is called, whether
	2195	or not the subroutine is being passed arguments, how errors should be
	2196	trapped, and how to treat return values.
a0d0e21e LW	2197
	2198	All four routines return the number of arguments that the subroutine returned
	2199	on the Perl stack.
	2200
9a68f1db	2201	These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
954c1994 GS	2202	but those names are now deprecated; macros of the same name are provided for
	2203	compatibility.
	2204
	2205	When using any of these routines (except C<call_argv>), the programmer
d1b91892 AD	2206	must manipulate the Perl stack. These include the following macros and
d1b91892 AD	2207	functions:
a0d0e21e LW	2208
    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()
a0d0e21e	2220
5f05dabc	2221	For a detailed description of calling conventions from C to Perl,
5f05dabc	2222	consult L<perlcall>.
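For orientation only, here is a sketch of how those macros usually fit together, following the conventions described in L<perlcall>: it calls a Perl subroutine in scalar context with one integer argument and fetches its single result. The subroutine name C<main::add_one> and the wrapper C<call_add_one> are hypothetical, and the code must be compiled as part of an XS module or an application embedding perl, so it is not a standalone program.

```c
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

/* Hypothetical helper: returns main::add_one($n) evaluated in scalar context. */
static IV
call_add_one(pTHX_ IV n)
{
    dSP;                                /* local copy of the stack pointer */
    int count;
    IV result;

    ENTER;                              /* start a pseudo-block ... */
    SAVETMPS;                           /* ... whose temporaries FREETMPS frees */

    PUSHMARK(SP);                       /* remember where our arguments start */
    XPUSHs(sv_2mortal(newSViv(n)));     /* push the single argument */
    PUTBACK;                            /* make the global stack pointer see it */

    count = call_pv("main::add_one", G_SCALAR);

    SPAGAIN;                            /* refresh SP after the call */
    if (count != 1)
        croak("call_add_one: expected 1 return value, got %d", count);
    result = POPi;                      /* pop the result as an IV */
    PUTBACK;

    FREETMPS;                           /* free temporaries made since SAVETMPS */
    LEAVE;
    return result;
}
```

Code inside an XSUB would call this as C<call_add_one(aTHX_ 41)>.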
a0d0e21e	2223
8ebc5c01	2224	=head2 Putting a C value on Perl stack
ce3d39e2 IZ	2225
ce3d39e2 IZ	2226	A lot of opcodes (this is an elementary operation in the internal perl
10e2eb10 FC	2227	stack machine) put an SV* on the stack. However, as an optimization
10e2eb10 FC	2228	the corresponding SV is (usually) not recreated each time. The opcodes
ce3d39e2 IZ	2229	reuse specially assigned SVs (I<target>s) which are (as a corollary)
	2230	not constantly freed/created.
	2231
0a753a76	2232	Each of the targets is created only once (but see
5a0de581	2233	L</Scratchpads and recursion> below), and when an opcode needs to put
01825556	2234	an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.
	2236
The macro to put this target on the stack is C<PUSHTARG>, and it is
	2238	directly used in some opcodes, as well as indirectly in zillions of
d82b684c	2239	others, which use it via C<(X)PUSH[iunp]>.
ce3d39e2	2240
1bd1c0d5	2241	Because the target is reused, you must be careful when pushing multiple
10e2eb10	2242	values on the stack. The following code will not do what you think:
1bd1c0d5 SC	2243
    XPUSHi(10);
    XPUSHi(20);
	2246
	2247	This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
	2248	the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
	2249	At the end of the operation, the stack does not contain the values 10
	2250	and 20, but actually contains two pointers to C<TARG>, which we have set
d82b684c	2251	to 20.
1bd1c0d5	2252
d82b684c SH	2253	If you need to push multiple different values then you should either use
	2254	the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
	2255	none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an
	2256	SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
	2257	will often need to be "mortal". The new C<m(X)PUSH[iunp]> macros make
	2258	this a little easier to achieve by creating a new mortal for you (via
	2259	C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
	2260	in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
	2261	Thus, instead of writing this to "fix" the example above:
	2262
    XPUSHs(sv_2mortal(newSViv(10)));
    XPUSHs(sv_2mortal(newSViv(20)));
	2265
	2266	you can simply write:
	2267
    mXPUSHi(10);
    mXPUSHi(20);
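The following standalone C code is a toy model of the C<TARG> pitfall described above, not code that uses perl's API. C<toy_targ> plays the part of C<TARG>, C<toy_stack> holds pointers the way the Perl stack holds C<SV*>s, C<toy_pushi_targ> imitates C<XPUSHi>, and C<toy_pushi_fresh> imitates C<mXPUSHi> by giving each value its own slot (perl creates a new mortal SV instead). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV holding an integer. */
typedef struct { long iv; } toy_sv;

#define TOY_STACK_MAX 8

static toy_sv  toy_targ;                    /* the one reused target ("TARG") */
static toy_sv  toy_pool[TOY_STACK_MAX];     /* fresh values ("new mortals") */
static toy_sv *toy_stack[TOY_STACK_MAX];    /* the stack holds pointers */
static size_t  toy_sp, toy_pool_used;

static void toy_reset(void)
{
    toy_sp = 0;
    toy_pool_used = 0;
}

/* Like XPUSHi: overwrite the shared target, then push a pointer to it. */
static void toy_pushi_targ(long value)
{
    assert(toy_sp < TOY_STACK_MAX);
    toy_targ.iv = value;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Like mXPUSHi: take a fresh slot, push it, then set its value. */
static void toy_pushi_fresh(long value)
{
    toy_sv *sv;

    assert(toy_sp < TOY_STACK_MAX && toy_pool_used < TOY_STACK_MAX);
    sv = &toy_pool[toy_pool_used++];
    toy_stack[toy_sp++] = sv;
    sv->iv = value;
}
```

After C<toy_pushi_targ(10); toy_pushi_targ(20);> both stack entries point at C<toy_targ> and read 20. After C<toy_reset()> and the same two calls made with C<toy_pushi_fresh>, the entries read 10 and 20.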
	2270
	2271	On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
1bd1c0d5	2272	need a C<dTARG> in your variable declarations so that the C<PUSH>
0985f7e5	2273	macros can make use of the local variable C<TARG>. See also
6ef63541	2274	C<dTARGET> and C<dXSTARG>.
1bd1c0d5	2275
8ebc5c01	2276	=head2 Scratchpads
ce3d39e2	2277
The question remains of when the SVs which are I<target>s for opcodes
are created. The answer is that they are created when the current
unit (a subroutine, or a file for opcodes for statements outside of
subroutines) is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.
ce3d39e2	2283
54310121	2284	A scratchpad keeps SVs which are lexicals for the current unit and are
d777b41a FC	2285	targets for opcodes. A previous version of this document
d777b41a FC	2286	stated that one can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
eee3e302	2288	I<target>s have C<SVs_PADTMP> set. But this has never been fully true.
d777b41a FC	2289	C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
d777b41a FC	2290	While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
eed77337 FC	2291	that have never resided in a pad, but nonetheless act like I<target>s. As
	2292	of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
	2293	0. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.
ce3d39e2	2294
6ef63541 KW	2295	=for apidoc_section $pad
=for apidoc Amnh||SVs_PADTMP
=for apidoc AmnhD||SVs_PADMY
	2298
10e2eb10	2299	The correspondence between OPs and I<target>s is not 1-to-1. Different
54310121	2300	OPs in the compile tree of the unit can use the same target, if this
ce3d39e2 IZ	2301	would not conflict with the expected life of the temporary.
ce3d39e2 IZ	2302
2ae324a7	2303	=head2 Scratchpads and recursion
ce3d39e2 IZ	2304
Strictly speaking, it is not true that a compiled unit contains a pointer
to the scratchpad AV. It actually contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?
ce3d39e2 IZ	2309
The answer is B<recursion>, and maybe B<threads>. Both
of these can create several execution pointers going into the same
subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (the lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads. (I<And> the lexicals should be separate anyway!)
ce3d39e2	2316
So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed onto the array.
	2321
	2322	The I<target>s on this scratchpad are C<undef>s, but they are already
	2323	marked with correct flags.
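The growth rule above can be sketched as a small standalone C model: a "sub" owns an array of pads, one per recursion depth reached so far, and a new pad is appended only when the depth first exceeds the array length. The names (C<toy_pad>, C<toy_sub>, C<toy_enter>, C<toy_factorial>) are hypothetical, and a pad here holds a single C<long> rather than real lexicals and targets.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scratchpad with a single lexical. */
typedef struct { long lexical; } toy_pad;

/* Hypothetical stand-in for a compiled sub's array of scratchpads. */
typedef struct {
    toy_pad **pads;    /* pads[d - 1] is used at recursion depth d */
    size_t    npads;   /* length of the array, starts at 1 */
    size_t    depth;   /* current recursion depth, 0 when not running */
} toy_sub;

static int toy_sub_init(toy_sub *s)
{
    s->depth = 0;
    s->npads = 0;
    s->pads = malloc(sizeof *s->pads);
    if (s->pads == NULL)
        return 0;
    s->pads[0] = calloc(1, sizeof **s->pads);
    if (s->pads[0] == NULL) {
        free(s->pads);
        return 0;
    }
    s->npads = 1;      /* born with an array of scratchpads of length 1 */
    return 1;
}

/* On entry: if the new depth exceeds the array length, append a pad. */
static toy_pad *toy_enter(toy_sub *s)
{
    size_t d = s->depth + 1;

    if (d > s->npads) {
        toy_pad **p = realloc(s->pads, d * sizeof *p);
        if (p == NULL)
            return NULL;
        s->pads = p;
        s->pads[d - 1] = calloc(1, sizeof **s->pads);  /* zeroed, like fresh undefs */
        if (s->pads[d - 1] == NULL)
            return NULL;
        s->npads = d;
    }
    s->depth = d;
    return s->pads[d - 1];
}

static void toy_leave(toy_sub *s)
{
    s->depth--;
}

static void toy_sub_free(toy_sub *s)
{
    for (size_t i = 0; i < s->npads; i++)
        free(s->pads[i]);
    free(s->pads);
}

/* A recursive "sub": the child gets its own pad, so the parent's
   lexical is still intact after the recursive call returns. */
static long toy_factorial(toy_sub *s, long n)
{
    toy_pad *pad = toy_enter(s);
    long result;

    assert(pad != NULL);
    pad->lexical = n;
    result = n <= 1 ? 1 : n * toy_factorial(s, n - 1);
    assert(pad->lexical == n);          /* not clobbered by the child */
    toy_leave(s);
    return result;
}
```

A call with 5 grows the array to five pads; a later, shallower call reuses the existing pads instead of allocating new ones.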
	2324
22d36020 FC	2325	=head1 Memory Allocation
	2326
	2327	=head2 Allocation
	2328
All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section. The macros hide the differences
between the malloc implementations that perl may be built to use.
	2333
The following three macros are used to initially allocate memory:
	2335
    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);
	2339
	2340	The first argument C<pointer> should be the name of a variable that will
	2341	point to the newly allocated memory.
	2342
	2343	The second and third arguments C<number> and C<type> specify how many of
	2344	the specified type of data structure should be allocated. The argument
	2345	C<type> is passed to C<sizeof>. The final argument to C<Newxc>, C<cast>,
	2346	should be used if the C<pointer> argument is different from the C<type>
	2347	argument.
	2348
	2349	Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
	2350	to zero out all the newly allocated memory.
	2351
	2352	=head2 Reallocation
	2353
    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer);

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc> (unlike the older C<New> and C<Newc>,
they take no "magic cookie" argument).
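As an illustration of the calling pattern only, the sketch below defines hypothetical stand-ins C<TOY_Newxz>, C<TOY_Renew> and C<TOY_Safefree> with the same argument shape as perl's macros, built directly on the C library (the zeroing is done here via C<calloc>). They are not perl's definitions: perl's macros go through its own allocator, check C<number * sizeof(type)> for overflow, and treat running out of memory as fatal rather than returning NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins shaped like Newxz, Renew and Safefree. */
#define TOY_Newxz(ptr, number, type) \
    ((ptr) = (type *)calloc((size_t)(number), sizeof(type)))
#define TOY_Renew(ptr, number, type) \
    ((ptr) = (type *)realloc((ptr), (size_t)(number) * sizeof(type)))
#define TOY_Safefree(ptr) free(ptr)

/* Allocate n counters, all starting at zero. */
static long *toy_make_counts(size_t n)
{
    long *counts;

    TOY_Newxz(counts, n, long);
    return counts;
}

/* Grow the counters from old_n to new_n elements. Like realloc, Renew
   does not zero the new tail, so do it explicitly. Working on a copy
   of the pointer means that on failure the old block is untouched and
   still owned by the caller. */
static long *toy_grow_counts(long *counts, size_t old_n, size_t new_n)
{
    long *p = counts;

    TOY_Renew(p, new_n, long);
    if (p != NULL && new_n > old_n)
        memset(p + old_n, 0, (new_n - old_n) * sizeof *p);
    return p;
}
```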
	2362
	2363	=head2 Moving
	2364
    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory. The C<source> and C<dest> arguments point to the source and
destination starting points. Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).
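Two details are easy to get wrong: C<source> comes I<before> C<dest>, the reverse of C<memcpy>/C<memmove>, and only C<Move> is meant for overlapping ranges (as we understand it, perl builds C<Move> on C<memmove> and C<Copy> on C<memcpy>). The sketch below uses hypothetical stand-ins C<TOY_Move>, C<TOY_Copy> and C<TOY_Zero> with perl's argument order to delete an element from an array in place; it is not perl's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins with the argument order of perl's Move, Copy
   and Zero: source first, then dest. */
#define TOY_Move(source, dest, number, type) \
    memmove((dest), (source), (size_t)(number) * sizeof(type))
#define TOY_Copy(source, dest, number, type) \
    memcpy((dest), (source), (size_t)(number) * sizeof(type))
#define TOY_Zero(dest, number, type) \
    memset((dest), 0, (size_t)(number) * sizeof(type))

/* Remove element i from an array of len ints, shifting the rest left
   and zeroing the slot freed at the end. */
static void toy_delete_at(int *a, size_t len, size_t i)
{
    assert(i < len);
    /* Source and destination overlap, so Move (memmove) is required;
       Copy (memcpy) is undefined on overlapping ranges. */
    TOY_Move(a + i + 1, a + i, len - i - 1, int);
    TOY_Zero(a + len - 1, 1, int);
}
```

A non-overlapping snapshot, such as C<TOY_Copy(a, b, 5, int)> into a separate array C<b>, is the case where C<Copy> is appropriate.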
	2374
	2375	=head1 PerlIO
	2376
	2377	The most recent development releases of Perl have been experimenting with
	2378	removing Perl's dependency on the "normal" standard I/O suite and allowing
	2379	other stdio implementations to be used. This involves creating a new
	2380	abstraction layer that then calls whichever implementation of stdio Perl
	2381	was compiled with. All XSUBs should now use the functions in the PerlIO
	2382	abstraction layer and not make any assumptions about what kind of stdio
	2383	is being used.
	2384
	2385	For a complete description of the PerlIO abstraction, consult L<perlapio>.
	2386
0a753a76	2387	=head1 Compiled code
	2388
	2389	=head2 Code tree
	2390
	2391	Here we describe the internal form your code is converted to by
10e2eb10	2392	Perl. Start with a simple example:
0a753a76	2393
    $a = $b + $c;
	2395
	2396	This is converted to a tree similar to this one:
	2397
         assign-to
        /         \
       +           $a
      / \
    $b   $c
	2403
7b8d334a	2404	(but slightly more complicated). This tree reflects the way Perl
0a753a76	2405	parsed your code, but has nothing to do with the execution order.
	2406	There is an additional "thread" going through the nodes of the tree
	2407	which shows the order of execution of the nodes. In our simplified
	2408	example above it looks like:
	2409
    $b ---> $c ---> + ---> $a ---> assign-to
	2411
But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.
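The separation between the tree links and the execution "thread" can be modelled in a few lines of standalone C. Each hypothetical C<toy_op> carries tree links (C<first>, C<sibling>, loosely named after perl's C<op_first> and sibling chain) and a separate C<next> pointer standing in for C<op_next>. Building the simplified tree for C<$a = $b + $c> and walking it both ways shows that the parse order and the execution order differ. This is a model, not perl's C<OP> structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child (tree link) */
    struct toy_op *sibling;  /* next sibling (tree link) */
    struct toy_op *next;     /* execution order, like op_next */
} toy_op;

/* Append s to buf, separated by a space, never overflowing size. */
static void toy_append(char *buf, size_t size, const char *s)
{
    size_t len = strlen(buf);

    if (len > 0 && len + 1 < size)
        strncat(buf, " ", size - len - 1);
    len = strlen(buf);
    if (len + 1 < size)
        strncat(buf, s, size - len - 1);
}

/* Follow the execution thread from start. */
static void toy_exec_order(const toy_op *start, char *buf, size_t size)
{
    buf[0] = '\0';
    for (const toy_op *o = start; o != NULL; o = o->next)
        toy_append(buf, size, o->name);
}

/* Visit the tree in parse order: a node, then its children. */
static void toy_tree_walk(const toy_op *o, char *buf, size_t size)
{
    for (; o != NULL; o = o->sibling) {
        toy_append(buf, size, o->name);
        toy_tree_walk(o->first, buf, size);
    }
}

static void toy_tree_order(const toy_op *root, char *buf, size_t size)
{
    buf[0] = '\0';
    toy_tree_walk(root, buf, size);
}

/* Build the simplified tree for $a = $b + $c; returns the root.
   Execution starts at n[0] ($b). */
static toy_op *toy_build(toy_op n[5])
{
    toy_op *b = &n[0], *c = &n[1], *add = &n[2], *a = &n[3], *asg = &n[4];

    *b   = (toy_op){ "$b",        NULL, c,    c    };
    *c   = (toy_op){ "$c",        NULL, NULL, add  };
    *add = (toy_op){ "+",         b,    a,    a    };
    *a   = (toy_op){ "$a",        NULL, NULL, asg  };
    *asg = (toy_op){ "assign-to", add,  NULL, NULL };
    return asg;
}
```

Walking the tree from the root yields C<assign-to + $b $c $a>, while following C<next> from C<$b> yields C<$b $c + $a assign-to>, matching the execution thread shown above.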
	2416
	2417	=head2 Examining the tree
	2418
06f6df17 RGS	2419	If you have your perl compiled for debugging (usually done with
06f6df17 RGS	2420	C<-DDEBUGGING> on the C<Configure> command line), you may examine the
0a753a76	2421	compiled tree by specifying C<-Dx> on the Perl command line. The
	2422	output takes several lines per node, and for C<$b+$c> it looks like
	2423	this:
	2424
    5           TYPE = add ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }
	2448
This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:
	2453
         add
        /   \
     null   null
      |      |
     gvsv   gvsv
	2459
The execution order is indicated by C<===E<gt>> marks, thus it is C<3
4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.
	2463
9afa14e3	2464	Each of these nodes represents an op, a fundamental operation inside the
10e2eb10	2465	Perl core. The code which implements each operation can be found in the
9afa14e3	2466	F<pp*.c> files; the function which implements the op with type C<gvsv>
10e2eb10	2467	is C<pp_gvsv>, and so on. As the tree above shows, different ops have
9afa14e3	2468	different numbers of children: C<add> is a binary operator, as one would
10e2eb10	2469	expect, and so has two children. To accommodate the various different
9afa14e3 SC	2470	numbers of children, there are various types of op data structure, and
	2471	they link together in different ways.
	2472
10e2eb10	2473	The simplest type of op structure is C<OP>: this has no children. Unary
9afa14e3	2474	operators, C<UNOP>s, have one child, and this is pointed to by the
10e2eb10 FC	2475	C<op_first> field. Binary operators (C<BINOP>s) have not only an
	2476	C<op_first> field but also an C<op_last> field. The most complex type of
	2477	op is a C<LISTOP>, which has any number of children. In this case, the
9afa14e3	2478	first child is pointed to by C<op_first> and the last child by
10e2eb10	2479	C<op_last>. The children in between can be found by iteratively
86cd3a13	2480	following the C<OpSIBLING> pointer from the first child to the last (but
29e61fd9	2481	see below).
9afa14e3	2482
7cc7ada7	2483	=for apidoc_section $optree_construction
=for apidoc Ayh||OP
=for apidoc Ayh||BINOP
=for apidoc Ayh||LISTOP
=for apidoc Ayh||UNOP
	2488
29e61fd9	2489	There are also some other op types: a C<PMOP> holds a regular expression,
10e2eb10 FC	2490	and has no children, and a C<LOOP> may or may not have children. If the
10e2eb10 FC	2491	C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
9afa14e3 SC	2492	complicate matters, if a C<UNOP> is actually a C<null> op after
	2493	optimization (see L</Compile pass 2: context propagation>) it will still
	2494	have children in accordance with its former type.
	2495
=for apidoc Ayh||LOOP
=for apidoc Ayh||PMOP
	2498
29e61fd9 DM	2499	Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
29e61fd9 DM	2500	or more children, but it doesn't have an C<op_last> field: so you have to
86cd3a13	2501	follow C<op_first> and then the C<OpSIBLING> chain itself to find the
29e61fd9 DM	2502	last child. Instead it has an C<op_other> field, which is comparable to
	2503	the C<op_next> field described below, and represents an alternate
	2504	execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
	2505	that in general, C<op_other> may not point to any of the direct children
	2506	of the C<LOGOP>.
	2507
=for apidoc Ayh||LOGOP
63dbc4a9 KW	2509
29e61fd9 DM	2510	Starting in version 5.21.2, perls built with the experimental
29e61fd9 DM	2511	define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
87b5a8b9	2512	C<op_moresib>. When not set, this indicates that this is the last op in an
86cd3a13 DM	2513	C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
	2514	sibling to point back to the parent op. Under this build, that field is
	2515	also renamed C<op_sibparent> to reflect its joint role. The macro
	2516	C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
	2517	the last sibling. With this build the C<op_parent(o)> function can be
	2518	used to find the parent of any op. Thus for forward compatibility, you
	2519	should always use the C<OpSIBLING(o)> macro rather than accessing
	2520	C<op_sibling> directly.
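The C<op_moresib>/C<op_sibparent> scheme can be modelled in standalone C to show why C<OpSIBLING> returns NULL on the last child and how a parent is found without a dedicated parent field. The struct and functions below (C<toy_op>, C<toy_sibling>, C<toy_parent>, C<toy_link>, C<toy_count_kids>) are hypothetical; their field names echo the ones described above, but this is not perl's C<OP> layout or implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op {
    const char    *name;
    struct toy_op *first;       /* first child, like op_first */
    struct toy_op *sibparent;   /* next sibling, or the parent on the last kid */
    unsigned       moresib : 1; /* like op_moresib: set unless last sibling */
} toy_op;

/* Like OpSIBLING(o): NULL on the last sibling. */
static toy_op *toy_sibling(const toy_op *o)
{
    return o->moresib ? o->sibparent : NULL;
}

/* Like op_parent(o): walk to the last sibling, whose sibparent
   field points back at the parent. */
static toy_op *toy_parent(const toy_op *o)
{
    while (o->moresib)
        o = o->sibparent;
    return o->sibparent;
}

/* Count children the way a LOGOP must: op_first, then the sibling chain. */
static size_t toy_count_kids(const toy_op *parent)
{
    size_t n = 0;

    for (const toy_op *k = parent->first; k != NULL; k = toy_sibling(k))
        n++;
    return n;
}

/* Attach kids[0..n-1] to parent. */
static void toy_link(toy_op *parent, toy_op *kids, size_t n)
{
    parent->first = n > 0 ? &kids[0] : NULL;
    for (size_t i = 0; i < n; i++) {
        kids[i].moresib   = (i + 1 < n);
        kids[i].sibparent = (i + 1 < n) ? &kids[i + 1] : parent;
    }
}
```

Code that only ever goes through C<toy_sibling> keeps working whether or not the last sibling's pointer is reused for the parent, which is the same reason the text recommends C<OpSIBLING(o)> over reading C<op_sibling> directly.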
29e61fd9	2521
06f6df17 RGS	2522	Another way to examine the tree is to use a compiler back-end module, such
	2523	as L<B::Concise>.
	2524
0a753a76	2525	=head2 Compile pass 1: check routines
0a753a76	2526
8870b5c7	2527	The tree is created by the compiler while I<yacc> code feeds it
10e2eb10	2528	the constructions it recognizes. Since I<yacc> works bottom-up, so does
0a753a76	2529	the first pass of perl compilation.
	2530
	2531	What makes this pass interesting for perl developers is that some
	2532	optimization may be performed on this pass. This is optimization by
8870b5c7	2533	so-called "check routines". The correspondence between node names
0a753a76	2534	and corresponding check routines is described in F<opcode.pl> (do not
	2535	forget to run C<make regen_headers> if you modify this file).
	2536
	2537	A check routine is called when the node is fully constructed except
7b8d334a	2538	for the execution-order thread. Since at this time there are no
0a753a76	2539	back-links to the currently constructed node, one can do most any
	2540	operation to the top-level node, including freeing it and/or creating
	2541	new nodes above/below it.
	2542
	2543	The check routine returns the node which should be inserted into the
	2544	tree (if the top-level node was not modified, check routine returns
	2545	its argument).
	2546
10e2eb10	2547	By convention, check routines have names C<ck_*>. They are usually
0a753a76	2548	called from C<new*OP> subroutines (or C<convert>) (which in turn are
	2549	called from F<perly.y>).
	2550
	2551	=head2 Compile pass 1a: constant folding
	2552
	2553	Immediately after the check routine is called the returned node is
	2554	checked for being compile-time executable. If it is (the value is
	2555	judged to be constant) it is immediately executed, and a I<constant>
	2556	node with the "return value" of the corresponding subtree is
	2557	substituted instead. The subtree is deleted.
	2558
	2559	If constant folding was not performed, the execution-order thread is
	2560	created.
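The folding step can be sketched with a hypothetical expression node in standalone C: when a freshly built arithmetic node has only constant operands, it is evaluated immediately and turned into a constant node. Perl substitutes a new constant op and deletes the subtree; for brevity this toy (C<toy_node>, C<toy_fold>) rewrites the node in place and merely drops its child links, since the nodes in the usage are not heap-allocated.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD, TOY_MUL } toy_kind;

typedef struct toy_node {
    toy_kind         kind;
    long             value;    /* meaningful only for TOY_CONST */
    struct toy_node *kid[2];   /* operands of TOY_ADD / TOY_MUL */
} toy_node;

/* Called on each node as it is built bottom-up, so its operands have
   already had their chance to fold. Returns the node to keep. */
static toy_node *toy_fold(toy_node *n)
{
    if ((n->kind == TOY_ADD || n->kind == TOY_MUL)
        && n->kid[0]->kind == TOY_CONST
        && n->kid[1]->kind == TOY_CONST) {
        long l = n->kid[0]->value;
        long r = n->kid[1]->value;

        n->value = (n->kind == TOY_ADD) ? l + r : l * r;
        n->kind = TOY_CONST;
        n->kid[0] = n->kid[1] = NULL;   /* the subtree is no longer needed */
    }
    return n;
}
```

For C<$x + 2 * 3>, folding the C<2 * 3> node yields a constant 6, while the C<+> node is left alone because C<$x> is not a constant; only that unfolded node then gets its execution-order thread.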
	2561
	2562	=head2 Compile pass 2: context propagation
	2563
When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow nodes to be
optimized away at this stage, such nodes are null()ified instead of
being free()d (i.e. their type is changed to OP_NULL).
	2575
	2576	=head2 Compile pass 3: peephole optimization
	2577
After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.
	2583
	2584	Peephole optimizations are done by calling the function pointed to
	2585	by the global variable C<PL_peepp>. By default, C<PL_peepp> just
	2586	calls the function pointed to by the global variable C<PL_rpeepp>.
	2587	By default, that performs some basic op fixups and optimisations along
	2588	the execution-order op chain, and recursively calls C<PL_rpeepp> for
	2589	each side chain of ops (resulting from conditionals). Extensions may
	2590	provide additional optimisations or fixups, hooking into either the
	2591	per-subroutine or recursive stage, like this:
	2592
    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *first)
    {
        OP *o = first, *t = first;
        /* o advances two ops per iteration and t one, so a chain that
         * loops back on itself is detected (o == t) */
        for (; o; o = o->op_next, t = t->op_next) {
            /* custom per-op optimisation goes here */
            o = o->op_next;
            if (!o || o == t) break;
            /* custom per-op optimisation may also go here */
        }
        prev_rpeepp(aTHX_ first);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;
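One kind of fixup a pass over the execution order can make is splicing ops that pass 2 turned into nulls out of the chain, so that later runs never visit them. The standalone C below models only that idea with a hypothetical C<toy_op> (C<type> plus a C<next> pointer standing in for C<op_next>); it is not a description of what perl's own C<rpeep> does, it assumes the first op is not itself a null, and unlike C<my_rpeep> above it does not guard against a chain that loops back on itself.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_NULL, TOY_GVSV, TOY_ADD };

typedef struct toy_op {
    enum toy_type  type;
    struct toy_op *next;      /* execution order, like op_next */
} toy_op;

/* Walk the execution chain and make every op skip over any run of
   null ops that follows it. */
static void toy_peep(toy_op *start)
{
    for (toy_op *o = start; o != NULL; o = o->next)
        while (o->next != NULL && o->next->type == TOY_NULL)
            o->next = o->next->next;
}

/* Count how many ops a run would visit. */
static size_t toy_chain_length(const toy_op *start)
{
    size_t n = 0;

    for (; start != NULL; start = start->next)
        n++;
    return n;
}
```

For a chain C<gvsv, null, gvsv, null, add>, the pass leaves C<gvsv, gvsv, add>; the null ops still exist in the tree, so nothing that points at them from elsewhere is left dangling.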
0a753a76	2619
7cc7ada7	2620	=for apidoc_section $optree_manipulation
=for apidoc Ayh||peep_t
63dbc4a9 KW	2622
1ba7f851 PJ	2623	=head2 Pluggable runops
	2624
	2625	The compile tree is executed in a runops function. There are two runops
1388f78e RGS	2626	functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
	2627	with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
	2628	control over the execution of the compile tree it is possible to provide
	2629	your own runops function.
1ba7f851 PJ	2630
	2631	It's probably best to copy one of the existing runops functions and
	2632	change it to suit your needs. Then, in the BOOT section of your XS
	2633	file, add the line:
	2634
    PL_runops = my_runops;
	2636
7cc7ada7	2637	=for apidoc_section $debugging
6ef63541 KW	2638	=for apidoc runops_debug
6ef63541 KW	2639	=for apidoc runops_standard
=for apidoc Amnh|runops_proc_t|PL_runops
6a7c980a KW	2641
1ba7f851 PJ	2642	This function should be as efficient as possible to keep your programs
	2643	running as fast as possible.
	2644
fd85fad2 BM	2645	=head2 Compile-time scope hooks
	2646
	2647	As of perl 5.14 it is possible to hook into the compile-time lexical
10e2eb10	2648	scope mechanism using C<Perl_blockhook_register>. This is used like
fd85fad2 BM	2649	this:
	2650
    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);
	2657
	2658	This will arrange to have C<my_start_hook> called at the start of
10e2eb10	2659	compiling every lexical scope. The available hooks are:
fd85fad2	2660
7cc7ada7	2661	=for apidoc_section $lexer
=for apidoc Ayh||BHK
63dbc4a9 KW	2663
fd85fad2 BM	2664	=over 4
fd85fad2 BM	2665
a88d97bf	2666	=item C<void bhk_start(pTHX_ int full)>
fd85fad2	2667
10e2eb10	2668	This is called just after starting a new lexical scope. Note that Perl
fd85fad2 BM	2669	code like
	2670
    if ($x) { ... }
	2672
	2673	creates two scopes: the first starts at the C<(> and has C<full == 1>,
10e2eb10	2674	the second starts at the C<{> and has C<full == 0>. Both end at the
f185f654	2675	C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
fd85fad2 BM	2676	pushed onto the save stack by this hook will be popped just before the
	2677	scope ends (between the C<pre_> and C<post_end> hooks, in fact).
	2678
a88d97bf	2679	=item C<void bhk_pre_end(pTHX_ OP **o)>
fd85fad2 BM	2680
fd85fad2 BM	2681	This is called at the end of a lexical scope, just before unwinding the
10e2eb10	2682	stack. I<o> is the root of the optree representing the scope; it is a
fd85fad2 BM	2683	double pointer so you can replace the OP if you need to.
fd85fad2 BM	2684
a88d97bf	2685	=item C<void bhk_post_end(pTHX_ OP **o)>
fd85fad2 BM	2686
fd85fad2 BM	2687	This is called at the end of a lexical scope, just after unwinding the
10e2eb10	2688	stack. I<o> is as above. Note that it is possible for calls to C<pre_>
fd85fad2 BM	2689	and C<post_end> to nest, if there is something on the save stack that
	2690	calls string eval.
	2691
a88d97bf	2692	=item C<void bhk_eval(pTHX_ OP *const o)>
fd85fad2 BM	2693
fd85fad2 BM	2694	This is called just before starting to compile an C<eval STRING>, C<do
10e2eb10	2695	FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
fd85fad2 BM	2696	OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
	2697	C<OP_DOFILE> or C<OP_REQUIRE>.
	2698
	2699	=back
	2700
	2701	Once you have your hook functions, you need a C<BHK> structure to put
10e2eb10 FC	2702	them in. It's best to allocate it statically, since there is no way to
10e2eb10 FC	2703	free it once it's registered. The function pointers should be inserted
fd85fad2	2704	into this structure using the C<BhkENTRY_set> macro, which will also set
10e2eb10	2705	flags indicating which entries are valid. If you do need to allocate
fd85fad2 BM	2706	your C<BHK> dynamically for some reason, be sure to zero it before you
	2707	start.
	2708
	2709	Once registered, there is no mechanism to switch these hooks off, so if
10e2eb10	2710	that is necessary you will need to do this yourself. An entry in C<%^H>
a3e07c87 BM	2711	is probably the best way, so the effect is lexically scoped; however it
a3e07c87 BM	2712	is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
10e2eb10	2713	temporarily switch entries on and off. You should also be aware that
a3e07c87	2714	generally speaking at least one scope will have opened before your
f185f654	2715	extension is loaded, so you will see some C<pre>/C<post_end> pairs that
a3e07c87	2716	didn't have a matching C<start>.
fd85fad2	2717
9afa14e3 SC	2718	=head1 Examining internal data structures with the C<dump> functions
	2719
	2720	To aid debugging, the source file F<dump.c> contains a number of
	2721	functions which produce formatted output of internal data structures.
	2722
	2723	The most commonly used of these functions is C<Perl_sv_dump>; it's used
10e2eb10	2724	for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
9afa14e3	2725	C<sv_dump> to produce debugging output from Perl-space, so users of that
00aadd71	2726	module should already be familiar with its format.
9afa14e3 SC	2727
9afa14e3 SC	2728	C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
210b36aa	2729	derivatives, and produces output similar to C<perl -Dx>; in fact,
9afa14e3 SC	2730	C<Perl_dump_eval> will dump the main root of the code being evaluated,
	2731	exactly like C<-Dx>.
	2732
03c0fc11 KW	2733	=for apidoc_section $debugging
	2734	=for apidoc dump_eval
	2735
Other useful functions are C<Perl_dump_sub>, which dumps the op tree of
the subroutine in a C<GV>; C<Perl_dump_packsubs>, which calls
C<Perl_dump_sub> on all the subroutines in a package, like so
(thankfully, these are all XSUBs, so there is no op tree):
	2740
6ef63541 KW	2741	=for apidoc_section $debugging
	2742	=for apidoc dump_sub
	2743
    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)
	2755
	2756	and C<Perl_dump_all>, which dumps all the subroutines in the stash and
	2757	the op tree of the main root.
	2758
954c1994	2759	=head1 How multiple interpreters and concurrency are supported
ee072b34	2760
6e512bc2	2761	=head2 Background and MULTIPLICITY
ee072b34	2762
6ef63541 KW	2763	=for apidoc_section $concurrency
	2764	=for apidoc Amnh||PERL_IMPLICIT_CONTEXT
	2765
ee072b34 GS	2766	The Perl interpreter can be regarded as a closed box: it has an API
	2767	for feeding it code or otherwise making it do things, but it also has
	2768	functions for its own use. This smells a lot like an object, and
8c3a0f6c	2769	there is a way for you to build Perl so that you can have multiple
acfe0abc GS	2770	interpreters, with one interpreter represented either as a C structure,
	2771	or inside a thread-specific structure. These structures contain all
	2772	the context, the state of that interpreter.
	2773
8c3a0f6c	2774	The macro that controls the major Perl build flavor is MULTIPLICITY. The
7b52221d	2775	MULTIPLICITY build has a C structure that packages all the interpreter
6e512bc2 TK	2776	state, which is passed to various perl functions as a "hidden"
	2777	first argument. MULTIPLICITY makes multi-threaded perls possible (with the
	2778	ithreads threading model, related to the macro USE_ITHREADS).
	2779
	2780	PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY.
54aff467	2781
6ef63541 KW	2782	=for apidoc_section $concurrency
	2783	=for apidoc Amnh||MULTIPLICITY
	2784
9aa97215 JH	2785	To see whether you have non-const data you can use a BSD (or GNU)
9aa97215 JH	2786	compatible C<nm>:
bc028b6b JH	2787
	2788	nm libperl.a | grep -v ' [TURtr] '
	2789
9aa97215 JH	2790	If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
	2791	you have non-const data. The symbols the C<grep> removed are as follows:
	2792	C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
	2793	and the C<U> is I<undefined>, external symbols referred to.
	2794
	2795	The test F<t/porting/libperl.t> does this kind of symbol sanity
	2796	checking on C<libperl.a>.
bc028b6b	2797
54aff467	2798	All this obviously requires a way for the Perl internal functions to be
acfe0abc	2799	either subroutines taking some kind of structure as the first
ee072b34	2800	argument, or subroutines taking nothing as the first argument. To
acfe0abc	2801	enable these two very different ways of building the interpreter,
ee072b34 GS	2802	the Perl source (as it does in so many other situations) makes heavy
	2803	use of macros and subroutine naming conventions.
	2804
54aff467	2805	First problem: deciding which functions will be public API functions and
00aadd71	2806	which will be private. All functions whose names begin with C<S_> are private
954c1994 GS	2807	(think "S" for "secret" or "static"). All other functions begin with
954c1994 GS	2808	"Perl_", but just because a function begins with "Perl_" does not mean it is
10e2eb10 FC	2809	part of the API. (See L</Internal
10e2eb10 FC	2810	Functions>.) The easiest way to be B<sure> a
00aadd71 NIS	2811	function is part of the API is to find its entry in L<perlapi>.
00aadd71 NIS	2812	If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
8166b4e0 DB	2813	think it should be (i.e., you need it for your extension), submit an issue at
8166b4e0 DB	2814	L<https://github.com/Perl/perl5/issues> explaining why you think it should be.
ee072b34 GS	2815
	2816	Second problem: there must be a syntax so that the same subroutine
	2817	declarations and calls can pass a structure as their first argument,
	2818	or pass nothing. To solve this, the subroutines are named and
	2819	declared in a particular way. Here's a typical start of a static
	2820	function used within the Perl guts:
	2821
	2822	STATIC void
	2823	S_incline(pTHX_ char *s)
	2824
acfe0abc	2825	STATIC becomes "static" in C, and may be #define'd to nothing in some
da8c5729	2826	configurations in the future.
ee072b34	2827
3f620621	2828	=for apidoc_section $directives
63dbc4a9 KW	2829	=for apidoc Ayh||STATIC
63dbc4a9 KW	2830
651a3225 GS	2831	A public function (i.e. part of the internal API, but not necessarily
651a3225 GS	2832	sanctioned for use in extensions) begins like this:
ee072b34 GS	2833
ee072b34 GS	2834	void
2307c6d0	2835	Perl_sv_setiv(pTHX_ SV* dsv, IV num)
ee072b34	2836
0147cd53	2837	C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
ee072b34 GS	2838	details of the interpreter's context. THX stands for "thread", "this",
	2839	or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
	2840	The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
a7486cbb JH	2841	or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
a7486cbb JH	2842	their variants.
ee072b34	2843
3f620621	2844	=for apidoc_section $concurrency
4f313521 KW	2845	=for apidoc Amnh||aTHX
	2846	=for apidoc Amnh||aTHX_
	2847	=for apidoc Amnh||dTHX
	2848	=for apidoc Amnh||pTHX
	2849	=for apidoc Amnh||pTHX_
	2850
6e512bc2	2851	When Perl is built without options that set MULTIPLICITY, there is no
a7486cbb	2852	first argument containing the interpreter's context. The trailing underscore
ee072b34 GS	2853	in the pTHX_ macro indicates that the macro expansion needs a comma
ee072b34 GS	2854	after the context argument because other arguments follow it. If
6e512bc2	2855	MULTIPLICITY is not defined, pTHX_ will be ignored, and the
54aff467 GS	2856	subroutine is not prototyped to take the extra argument. The form of the
54aff467 GS	2857	macro without the trailing underscore is used when there are no additional
ee072b34 GS	2858	explicit arguments.
ee072b34 GS	2859
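To make the expansion concrete, here is a self-contained sketch of the same idea. Every name in it (C<MY_MULTIPLICITY>, C<my_interp>, C<my_pTHX_>, C<my_aTHX_> and so on) is hypothetical. These are not the real definitions from F<perl.h>. The sketch only shows how a single prototype can either take or omit the context argument, depending on one build-time switch.

```c
/* Hypothetical stand-in for the interpreter structure. */
typedef struct my_interp {
    long counter;
} my_interp;

/* Build switch: 1 passes a context argument, 0 uses one global state. */
#define MY_MULTIPLICITY 1

#if MY_MULTIPLICITY
#  define my_pTHX   my_interp *my_perl   /* prototype, no other args */
#  define my_pTHX_  my_pTHX,             /* prototype, more args follow */
#  define my_aTHX   my_perl              /* call, no other args */
#  define my_aTHX_  my_aTHX,             /* call, more args follow */
#else
static my_interp my_global_interp;
#  define my_pTHX   void
#  define my_pTHX_
#  define my_aTHX
#  define my_aTHX_
#  define my_perl   (&my_global_interp)
#endif

/* Explicit arguments follow the context: use the underscore form. */
static long my_add(my_pTHX_ long n)
{
    my_perl->counter += n;
    return my_perl->counter;
}

/* No explicit arguments: use the form without the underscore. */
static long my_get(my_pTHX)
{
    return my_perl->counter;
}

/* One internal function calling others passes its context along. */
static long my_add_twice(my_pTHX_ long n)
{
    my_add(my_aTHX_ n);
    my_add(my_aTHX_ n);
    return my_get(my_aTHX);
}
```

With C<MY_MULTIPLICITY> set to 0, the same source compiles to functions that take no context at all. They operate on a single global structure instead.
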
54aff467	2860	When a core function calls another, it must pass the context. This
2307c6d0	2861	is normally hidden via macros. Consider C<sv_setiv>. It expands into
ee072b34 GS	2862	something like this:
ee072b34 GS	2863
6e512bc2	2864	#ifdef MULTIPLICITY
2307c6d0	2865	#define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b)
ee072b34	2866	/* can't do this for vararg functions, see below */
2307c6d0 SB	2867	#else
	2868	#define sv_setiv Perl_sv_setiv
	2869	#endif
ee072b34 GS	2870
	2871	This works well, and means that XS authors can gleefully write:
	2872
2307c6d0	2873	sv_setiv(foo, bar);
ee072b34 GS	2874
	2875	and still have it work under all the modes Perl could have been
	2876	compiled with.
	2877
ee072b34 GS	2878	This doesn't work so cleanly for varargs functions, though, as macros
	2879	imply that the number of arguments is known in advance. Instead we
	2880	either need to spell them out fully, passing C<aTHX_> as the first
	2881	argument (the Perl core tends to do this with functions like
	2882	Perl_warner), or use a context-free version.
	2883
	2884	The context-free version of Perl_warner is called
	2885	Perl_warner_nocontext, and does not take the extra argument. Instead
10bee092	2886	it does C<dTHX;> to get the context from thread-local storage. We
ee072b34 GS	2887	C<#define warner Perl_warner_nocontext> so that extensions get source
	2888	compatibility at the expense of performance. (Passing an arg is
	2889	cheaper than grabbing it from thread-local storage.)
	2890
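Here is a rough model of the varargs case. Again, the names are hypothetical rather than Perl's own. The context-taking function receives the interpreter explicitly. The C<_nocontext> variant first looks the interpreter up in a thread-local slot, which is the extra cost mentioned above. The slot here uses C11 C<_Thread_local> as a stand-in for whatever thread-local storage Perl's platform layer actually uses.

```c
#include <stdarg.h>
#include <stdio.h>

typedef struct my_interp {
    char last_warning[128];
} my_interp;

/* Hypothetical per-thread slot holding the "current" interpreter. */
static _Thread_local my_interp *my_current_interp = NULL;

static void my_set_context(my_interp *interp) { my_current_interp = interp; }
static my_interp *my_get_context(void) { return my_current_interp; }

/* Context-taking varargs function: the caller passes the interpreter. */
static int my_warner(my_interp *my_perl, const char *fmt, ...)
{
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning, fmt, ap);
    va_end(ap);
    return n;
}

/* Context-free variant: looks the interpreter up first (the "dTHX;" step).
   Assumes my_set_context() has already been called in this thread. */
static int my_warner_nocontext(const char *fmt, ...)
{
    my_interp *my_perl = my_get_context();
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(my_perl->last_warning, sizeof my_perl->last_warning, fmt, ap);
    va_end(ap);
    return n;
}

/* Source-compatible short name, as with warner and Perl_warner_nocontext. */
#define my_warn my_warner_nocontext
```
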
acfe0abc	2891	You can ignore [pad]THXx when browsing the Perl headers/sources.
ee072b34 GS	2892	Those are strictly for use within the core. Extensions and embedders
	2893	need only be aware of [pad]THX.
	2894
a7486cbb JH	2895	=head2 So what happened to dTHR?
a7486cbb JH	2896
7cc7ada7	2897	=for apidoc_section $concurrency
4f313521 KW	2898	=for apidoc Amnh||dTHR
4f313521 KW	2899
a7486cbb JH	2900	C<dTHR> was introduced in perl 5.005 to support the older thread model.
	2901	The older thread model now uses the C<THX> mechanism to pass context
	2902	pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
	2903	later still have it for backward source compatibility, but it is defined
	2904	to be a no-op.
	2905
ee072b34 GS	2906	=head2 How do I use all this in extensions?
ee072b34 GS	2907
6e512bc2	2908	When Perl is built with MULTIPLICITY, extensions that call
ee072b34 GS	2909	any functions in the Perl API will need to pass the initial context
	2910	argument somehow. The kicker is that you will need to write it in
	2911	such a way that the extension still compiles when Perl hasn't been
6e512bc2	2912	built with MULTIPLICITY enabled.
ee072b34 GS	2913
	2914	There are three ways to do this. First, the easy but inefficient way,
	2915	which is also the default, in order to maintain source compatibility
0147cd53	2916	with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
ee072b34 GS	2917	and aTHX_ macros to call a function that will return the context.
	2918	Thus, something like:
	2919
2307c6d0	2920	sv_setiv(sv, num);
ee072b34	2921
6e512bc2	2922	in your extension will translate to this when MULTIPLICITY is
54aff467	2923	in effect:
ee072b34	2924
2307c6d0	2925	Perl_sv_setiv(Perl_get_context(), sv, num);
ee072b34	2926
54aff467	2927	or to this otherwise:
ee072b34	2928
2307c6d0	2929	Perl_sv_setiv(sv, num);
ee072b34	2930
da8c5729	2931	You don't have to do anything new in your extension to get this; since
2fa86c13	2932	the Perl library provides Perl_get_context(), it will all just
ee072b34 GS	2933	work.
	2934
	2935	The second, more efficient way is to use the following template for
	2936	your Foo.xs:
	2937
c52f9dcd JH	2938	#define PERL_NO_GET_CONTEXT /* we want efficiency */
	2939	#include "EXTERN.h"
	2940	#include "perl.h"
	2941	#include "XSUB.h"
ee072b34	2942
fd061412	2943	STATIC void my_private_function(int arg1, int arg2);
ee072b34	2944
fd061412	2945	STATIC void
c52f9dcd JH	2946	my_private_function(int arg1, int arg2)
	2947	{
	2948	dTHX; /* fetch context */
	2949	... call many Perl API functions ...
	2950	}
ee072b34 GS	2951
	2952	[... etc ...]
	2953
c52f9dcd	2954	MODULE = Foo PACKAGE = Foo
ee072b34	2955
c52f9dcd	2956	/* typical XSUB */
ee072b34	2957
c52f9dcd JH	2958	void
	2959	my_xsub(arg)
	2960	int arg
	2961	CODE:
	2962	my_private_function(arg, 10);
ee072b34 GS	2963
	2964	Note that the only two changes from the normal way of writing an
	2965	extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
	2966	including the Perl headers, followed by a C<dTHX;> declaration at
	2967	the start of every function that will call the Perl API. (You'll
	2968	know which functions need this, because the C compiler will complain
	2969	that there's an undeclared identifier in those functions.) No changes
	2970	are needed for the XSUBs themselves, because the XS() macro is
	2971	correctly defined to pass in the implicit context if needed.
	2972
40578475	2973	=for apidoc_section $concurrency
9a7c5cb7	2974	=for apidoc AmnhU#||PERL_NO_GET_CONTEXT
40578475	2975
ee072b34 GS	2976	The third, even more efficient way is to ape how it is done within
	2977	the Perl guts:
	2978
	2979
c52f9dcd JH	2980	#define PERL_NO_GET_CONTEXT /* we want efficiency */
	2981	#include "EXTERN.h"
	2982	#include "perl.h"
	2983	#include "XSUB.h"
ee072b34 GS	2984
ee072b34 GS	2985	/* pTHX_ only needed for functions that call Perl API */
fd061412	2986	STATIC void my_private_function(pTHX_ int arg1, int arg2);
ee072b34	2987
fd061412	2988	STATIC void
c52f9dcd JH	2989	my_private_function(pTHX_ int arg1, int arg2)
	2990	{
	2991	/* dTHX; not needed here, because THX is an argument */
	2992	... call Perl API functions ...
	2993	}
ee072b34 GS	2994
	2995	[... etc ...]
	2996
c52f9dcd	2997	MODULE = Foo PACKAGE = Foo
ee072b34	2998
c52f9dcd	2999	/* typical XSUB */
ee072b34	3000
c52f9dcd JH	3001	void
	3002	my_xsub(arg)
	3003	int arg
	3004	CODE:
	3005	my_private_function(aTHX_ arg, 10);
ee072b34 GS	3006
	3007	This implementation never has to fetch the context using a function
	3008	call, since it is always passed as an extra argument. Depending on
	3009	your needs for simplicity or efficiency, you may mix the previous
	3010	two approaches freely.
	3011
651a3225 GS	3012	Never add a comma after C<pTHX> yourself--always use the form of the
	3013	macro with the underscore for functions that take explicit arguments,
	3014	or the form without the underscore for functions with no explicit arguments.
ee072b34	3015
a7486cbb JH	3016	=head2 Should I do anything special if I call perl from multiple threads?
	3017
	3018	If you create interpreters in one thread and then proceed to call them in
	3019	another, you need to make sure perl's own Thread Local Storage (TLS) slot is
	3020	initialized correctly in each of those threads.
	3021
	3022	The C<perl_alloc> and C<perl_clone> API functions will automatically set
	3023	the TLS slot to the interpreter they created, so that there is no need to do
	3024	anything special if the interpreter is always accessed in the same thread that
	3025	created it, and that thread did not create or call any other interpreters
	3026	afterwards. If that is not the case, you have to set the TLS slot of the
	3027	thread before calling any functions in the Perl API on that particular
	3028	interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
	3029	thread as the first thing you do:
	3030
	3031	/* do this before doing anything else with some_perl */
	3032	PERL_SET_CONTEXT(some_perl);
	3033
	3034	... other Perl API calls on some_perl go here ...
	3035
32c3a37b KW	3036	=for apidoc_section $embedding
	3037	=for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i
	3038
	3039	(You can always get the current context via C<PERL_GET_CONTEXT>.)
	3040
	3041	=for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT|
	3042
ee072b34 GS	3043	=head2 Future Plans and PERL_IMPLICIT_SYS
ee072b34 GS	3044
6e512bc2	3045	Just as MULTIPLICITY provides a way to bundle up everything
ee072b34 GS	3046	that the interpreter knows about itself and pass it around, so too are
	3047	there plans to allow the interpreter to bundle up everything it knows
	3048	about the environment it's running on. This is enabled with the
7b52221d RGS	3049	PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
7b52221d RGS	3050	Windows.
ee072b34 GS	3051
	3052	This adds an extra pointer (called the "host"
	3053	environment) for all the system calls. This makes it possible for
	3054	all the system stuff to maintain its own state, broken down into
	3055	seven C structures. These are thin wrappers around the usual system
0147cd53	3056	calls (see F<win32/perllib.c>) for the default perl executable, but for a
ee072b34 GS	3057	more ambitious host (like the one that would do fork() emulation) all
	3058	the extra work needed to pretend that different interpreters are
	3059	actually different "processes" would be done here.
	3060
	3061	The Perl engine/interpreter and the host are orthogonal entities.
	3062	There could be one or more interpreters in a process, and one or
	3063	more "hosts", with free association between them.
	3064
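The sketch below illustrates only the "host" idea. It routes a system call through a table of function pointers that travels alongside the interpreter. The structure names and the single C<get_env> entry are hypothetical. Perl's real host interface (see F<win32/perllib.c>) is split across several structures and covers far more calls.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical host: system-call wrappers plus host-private state. */
typedef struct my_host my_host;
struct my_host {
    const char *(*get_env)(my_host *self, const char *name);
    void *state;
};

typedef struct my_interp {
    my_host *host;
} my_interp;

/* Default host: a thin wrapper around the C library. */
static const char *default_get_env(my_host *self, const char *name)
{
    (void)self;
    return getenv(name);
}

/* A more ambitious host: each "process" gets its own environment,
   stored as a NULL-terminated list of name/value pairs. */
static const char *fake_get_env(my_host *self, const char *name)
{
    const char *const *pairs = self->state;
    for (; pairs[0] != NULL; pairs += 2)
        if (strcmp(pairs[0], name) == 0)
            return pairs[1];
    return NULL;
}

/* Interpreter code never calls getenv() directly; it asks its host. */
static const char *interp_getenv(my_interp *interp, const char *name)
{
    return interp->host->get_env(interp->host, name);
}
```

The interpreter only ever calls through its host pointer. As a result, two interpreters in one process can be given different hosts or can share one, which is the free association described above.
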
a422fd2d SC	3065	=head1 Internal Functions
	3066
	3067	All of Perl's internal functions which will be exposed to the outside
06f6df17	3068	world are prefixed by C<Perl_> so that they will not conflict with XS
a422fd2d	3069	functions or functions used in a program in which Perl is embedded.
10e2eb10	3070	Similarly, all global variables begin with C<PL_>. (By convention,
06f6df17	3071	static functions start with C<S_>.)
a422fd2d	3072
0972ecdf DM	3073	Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
0972ecdf DM	3074	either with or without the C<Perl_> prefix, thanks to a bunch of defines
10e2eb10	3075	that live in F<embed.h>. Note that extension code should I<not> set
0972ecdf DM	3076	C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
	3077	breakage of the XS in each new perl release.
	3078
	3079	The file F<embed.h> is generated automatically from
10e2eb10	3080	F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
dc9b1d22	3081	header files for the internal functions, generates the documentation
10e2eb10	3082	and a lot of other bits and pieces. It's important that when you add
dc9b1d22	3083	a new function to the core or change an existing one, you change the
10e2eb10	3084	data in the table in F<embed.fnc> as well. Here's a sample entry from
dc9b1d22	3085	that table:
a422fd2d SC	3086
	3087	Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
	3088
790ba721 KW	3089	The first column is a set of flags, the second column the return type,
	3090	the third column the name. Columns after that are the arguments.
	3091	The flags are documented at the top of F<embed.fnc>.
a422fd2d	3092
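Here is a small, hypothetical C helper that shows how the columns line up. It splits one such entry on C<|> and trims the whitespace around each field. This is not how F<embed.pl> reads the table, since that is done in Perl. The helper also ignores other content, such as the flag documentation at the top of F<embed.fnc>.

```c
#include <ctype.h>
#include <string.h>

#define MAX_FIELDS 16

/* Split one '|'-separated entry into trimmed fields, in place.
   Returns the number of fields: flags, return type, name, then arguments. */
static int split_fnc_entry(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    while (n < max_fields) {
        char *sep = strchr(p, '|');
        char *end;

        if (sep != NULL)
            *sep = '\0';                      /* terminate this field */
        while (isspace((unsigned char)*p))    /* trim leading blanks */
            p++;
        end = p + strlen(p);
        while (end > p && isspace((unsigned char)end[-1]))
            *--end = '\0';                    /* trim trailing blanks */
        fields[n++] = p;
        if (sep == NULL)
            break;
        p = sep + 1;
    }
    return n;
}
```

For the C<av_fetch> entry above, this yields six fields: C<Apd>, C<SV**>, C<av_fetch>, C<AV* ar>, C<I32 key> and C<I32 lval>.
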
dc9b1d22 MHM	3093	If you edit F<embed.pl> or F<embed.fnc>, you will need to run
	3094	C<make regen_headers> to force a rebuild of F<embed.h> and other
	3095	auto-generated files.
a422fd2d	3096
6b4667fc	3097	=head2 Formatted Printing of IVs, UVs, and NVs
9dd9db0b	3098
6b4667fc A	3099	If you are printing IVs, UVs, or NVs, then instead of the stdio(3)-style
	3100	formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
	3101	following macros for portability:
9dd9db0b	3102
c52f9dcd JH	3103	IVdf IV in decimal
	3104	UVuf UV in decimal
	3105	UVof UV in octal
	3106	UVxf UV in hexadecimal
	3107	NVef NV %e-like
	3108	NVff NV %f-like
	3109	NVgf NV %g-like
9dd9db0b	3110
6b4667fc A	3111	These will take care of 64-bit integers and long doubles.
	3112	For example:
	3113
9faa5a89	3114	printf("IV is %" IVdf "\n", iv);
6b4667fc	3115
9faa5a89 KW	3116	The C<IVdf> will expand to whatever is the correct format for the IVs.
	3117	Note that the spaces are required around the format in case the code is
	3118	compiled with C++, to maintain compliance with its standard.
9dd9db0b	3119
aacf4ea2 JH	3120	Note that there are different "long doubles": Perl will use
	3121	whatever the compiler has.
	3122
a7c67fbc KW	3123	If you are printing addresses of pointers, use %p or UVxf combined
a7c67fbc KW	3124	with PTR2UV().
8908e76d	3125
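Here is a standard-C analogue of the same technique, assuming a build whose IV is a 64-bit C<int64_t>. The C<my_IVdf> and C<my_UVxf> macros below are hypothetical stand-ins. They map to the C99 F<inttypes.h> macros C<PRId64> and C<PRIx64>, which are spliced into the format string in the same way as C<IVdf> and friends.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins, assuming a build with 64-bit IVs and UVs. */
typedef int64_t  my_IV;
typedef uint64_t my_UV;
#define my_IVdf PRId64   /* a string literal such as "ld" or "lld" */
#define my_UVxf PRIx64

/* The format macro is spliced in by string-literal concatenation;
   note the spaces around it. */
static int format_iv(char *buf, size_t size, my_IV iv)
{
    return snprintf(buf, size, "IV is %" my_IVdf, iv);
}

/* A pointer printed through an unsigned integer of known width. */
static int format_ptr(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "0x%" my_UVxf, (my_UV)(uintptr_t)p);
}
```
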
2d197238 KW	3126	=head2 Formatted Printing of SVs
	3127
	3128	The contents of SVs may be printed using the C<SVf> format, like so:
	3129
0e13edb0	3130	Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg));
2d197238 KW	3131
	3132	where C<err_msg> is an SV.
	3133
7cc7ada7	3134	=for apidoc_section $io_formats
6015ce9b KW	3135	=for apidoc Amnh||SVf
	3136	=for apidoc Amh||SVfARG|SV *sv
	3137
2d197238 KW	3138	Not all scalar types are printable. Simple values certainly are: one of
	3139	IV, UV, NV, or PV. Also, if the SV is a reference to some value,
	3140	either it will be dereferenced and the value printed, or information
	3141	about the type of that value and its address are displayed. The results
	3142	of printing any other type of SV are undefined and likely to lead to an
e807022f	3143	interpreter crash. NVs are printed using a C<%g>-ish format.
2d197238 KW	3144
	3145	Note that the spaces are required around the C<SVf> in case the code is
	3146	compiled with C++, to maintain compliance with its standard.
	3147
	3148	Note that any filehandle being printed to under UTF-8 must be expecting
	3149	UTF-8 in order to get good results and avoid Wide-character warnings.
	3150	One way to do this for typical filehandles is to invoke perl with the
2d2503ee	3151	C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)
2d197238 KW	3152
	3153	You can use this to concatenate two scalars:
	3154
	3155	SV *var1 = get_sv("var1", GV_ADD);
	3156	SV *var2 = get_sv("var2", GV_ADD);
	3157	SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
e807022f	3158	SVfARG(var1), SVfARG(var2));
2d197238	3159
33ef5d2c YO	3160	=for apidoc Amnh||SVf_QUOTEDPREFIX
	3161
	3162	C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the
	3163	number of characters printed, showing at most the first
	3164	C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with
	3165	double quotes and with the contents escaped using double quoted string
	3166	escaping rules. If the string is longer than this, an ellipsis "..."
	3167	will be appended after the trailing quote. This is intended for error
	3168	messages where the string is assumed to be a class name.
	3169
332af227 YO	3170	=for apidoc Amnh||HvNAMEf
	3171	=for apidoc Amnh||HvNAMEf_QUOTEDPREFIX
	3172
	3173	C<HvNAMEf> and C<HvNAMEf_QUOTEDPREFIX> are similar to C<SVf> except they
	3174	extract the string, length and utf8 flags from the argument using the
	3175	C<HvNAME()>, C<HvNAMELEN()>, C<HvNAMEUTF8()> macros. This is intended
	3176	for stringifying a class name directly from a stash HV.
	3177
9bec17d7	3178	=head2 Formatted Printing of Strings
8b64b5d1	3179
aae69fa9 P	3180	If you just want the bytes printed in a 7bit NUL-terminated string, you can
	3181	just use C<%s> (assuming they are all really only 7bit). But if there is a
	3182	possibility the value will be encoded as UTF-8 or contains bytes above
	3183	C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format.
	3184	And as its parameter, use the C<UTF8fARG()> macro:
9bec17d7 KW	3185
	3186	const char *msg;
	3187
	3188	/* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
	3189	U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
	3190	if (can_utf8)
	3191	msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
	3192	else
	3193	msg = "'Uses simple quotes'";
	3194
	3195	Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
	3196	UTF8fARG(can_utf8, strlen(msg), msg));
8b64b5d1 KW	3197
8b64b5d1 KW	3198	The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
aae69fa9	3199	UTF-8; 0 if the string is in native byte encoding (Latin1).
8b64b5d1 KW	3200	The second parameter is the number of bytes in the string to print.
	3201	And the third and final parameter is a pointer to the first byte in the
	3202	string.
	3203
1f633c5e KW	3204	Note that any filehandle being printed to under UTF-8 must be expecting
	3205	UTF-8 in order to get good results and avoid Wide-character warnings.
	3206	One way to do this for typical filehandles is to invoke perl with the
2d2503ee	3207	C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)
1f633c5e	3208
a87f9c51	3209	=for apidoc_section $io_formats
5c29a976	3210	=for apidoc Amnh||UTF8f
33ef5d2c YO	3211	Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose
	3212	the arguments for this format.
	3213	=for apidoc Amnh||UTF8f_QUOTEDPREFIX
	3214	Same as C<UTF8f> but the output is quoted, escaped and length limited.
	3215	See C<SVf_QUOTEDPREFIX> for more details on escaping.
5c29a976 KW	3216	=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str
5c29a976 KW	3217
51b56f5c KW	3218	=cut
51b56f5c KW	3219
e613617c	3220	=head2 Formatted Printing of C<Size_t> and C<SSize_t>
5862f74e KW	3221
	3222	The most general way to do this is to cast them to a UV or IV, and
	3223	print as in the
	3224	L<previous section|/Formatted Printing of IVs, UVs, and NVs>.
	3225
	3226	But if you're using C<PerlIO_printf()>, it's less typing and visual
e807022f	3227	clutter to use the C<%z> length modifier (for I<siZe>):
5862f74e KW	3228
	3229	PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);
	3230
	3231	This modifier is not portable, so its use should be restricted to
	3232	C<PerlIO_printf()>.
	3233
f02bba19 KW	3234	=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes
	3235
	3236	There are modifiers for these special situations if you are using
	3237	C<PerlIO_printf()>. See L<perlfunc/size>.
	3238
8908e76d JH	3239	=head2 Pointer-To-Integer and Integer-To-Pointer
	3240
	3241	Because pointer size does not necessarily equal integer size,
	3242	use the following macros to do it right:
	3243
c52f9dcd JH	3244	PTR2UV(pointer)
	3245	PTR2IV(pointer)
	3246	PTR2NV(pointer)
	3247	INT2PTR(pointertotype, integer)
8908e76d	3248
3f620621	3249	=for apidoc_section $casting
3e2d7a92 KW	3250	=for apidoc Amh|type|INT2PTR|type|int value
	3251	=for apidoc Amh|UV|PTR2UV|void * ptr
	3252	=for apidoc Amh|IV|PTR2IV|void * ptr
	3253	=for apidoc Amh|NV|PTR2NV|void * ptr
4f313521	3254
8908e76d JH	3255	For example:
8908e76d JH	3256
c52f9dcd JH	3257	IV iv = ...;
c52f9dcd JH	3258	SV *sv = INT2PTR(SV *, iv);
8908e76d JH	3259
	3260	and
	3261
c52f9dcd JH	3262	AV *av = ...;
c52f9dcd JH	3263	UV uv = PTR2UV(av);
8908e76d	3264
b770a21b KW	3265	There are also
	3266
	3267	PTR2nat(pointer) /* pointer to integer of PTRSIZE */
	3268	PTR2ul(pointer) /* pointer to unsigned long */
	3269
	3270	=for apidoc Amh|IV|PTR2nat|void *
	3271	=for apidoc Amh|unsigned long|PTR2ul|void *
	3272
	3273	And C<PTRV> which gives the native type for an integer the same size as
	3274	pointers, such as C<unsigned> or C<unsigned long>.
	3275
21017b82	3276	=for apidoc Ayh|type|PTRV
b770a21b	3277
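Here is a self-contained illustration of the pointer-to-integer round trip. It uses hypothetical C<MY_PTR2UV> and C<MY_INT2PTR> macros built on C99 C<uintptr_t>. Perl's own macros in F<perl.h> are built around C<PTRV>, described above, so treat this as a sketch of the idea rather than their implementation.

```c
#include <stdint.h>

/* Hypothetical macros, assuming a 64-bit unsigned integer is wide enough
   to hold a pointer (true on common 32- and 64-bit platforms). */
typedef uint64_t my_UV;
#define MY_PTR2UV(p)        ((my_UV)(uintptr_t)(p))
#define MY_INT2PTR(type, i) ((type)(uintptr_t)(i))

typedef struct my_SV {
    int refcnt;
} my_SV;

/* Store a pointer in an integer slot... */
static my_UV stash_pointer(my_SV *sv)
{
    return MY_PTR2UV(sv);
}

/* ...and recover it later as the same pointer type. */
static my_SV *recover_pointer(my_UV uv)
{
    return MY_INT2PTR(my_SV *, uv);
}
```
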
0ca3a874 MHM	3278	=head2 Exception Handling
0ca3a874 MHM	3279
9b5c3821	3280	There are a couple of macros to do very basic exception handling in XS
10e2eb10	3281	modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
9b5c3821 MHM	3282	be able to use these macros:
	3283
	3284	#define NO_XSLOCKS
	3285	#include "XSUB.h"
	3286
	3287	You can use these macros if you call code that may croak, but you need
10e2eb10	3288	to do some cleanup before giving control back to Perl. For example:
0ca3a874	3289
d7f8936a	3290	dXCPT; /* set up necessary variables */
0ca3a874 MHM	3291
	3292	XCPT_TRY_START {
	3293	code_that_may_croak();
	3294	} XCPT_TRY_END
	3295
	3296	XCPT_CATCH
	3297	{
	3298	/* do cleanup here */
	3299	XCPT_RETHROW;
	3300	}
	3301
	3302	Note that you always have to rethrow an exception that has been
10e2eb10 FC	3303	caught. Using these macros, it is not possible to just catch the
10e2eb10 FC	3304	exception and ignore it. If you have to ignore the exception, you
0ca3a874 MHM	3305	have to use the C<call_*> functions.
	3306
	3307	The advantage of using the above macros is that you don't have
	3308	to set up an extra function for C<call_*>, and that using these
	3309	macros is faster than using C<call_*>.
	3310
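The shape of these macros can be modelled with plain C<setjmp>/C<longjmp>. The sketch below is hypothetical and much simpler than Perl's implementation, which uses the interpreter's own jump environments. It shows the try, clean up, rethrow pattern, and why the catch block hands the exception on rather than swallowing it.

```c
#include <setjmp.h>
#include <stddef.h>

/* Innermost active handler; assumes one is always set before croaking. */
static jmp_buf *my_top_env = NULL;
static int cleanup_count = 0;

static void my_croak(void)
{
    longjmp(*my_top_env, 1);           /* unwind to the innermost handler */
}

static void code_that_may_croak(int fail)
{
    if (fail)
        my_croak();
}

/* Run risky code; on failure, clean up and always rethrow. */
static void guarded(int fail)
{
    jmp_buf env;
    jmp_buf *saved = my_top_env;       /* like dXCPT */

    my_top_env = &env;
    if (setjmp(env) == 0) {            /* like XCPT_TRY_START */
        code_that_may_croak(fail);
        my_top_env = saved;            /* like XCPT_TRY_END: no exception */
        return;
    }
    my_top_env = saved;                /* like XCPT_CATCH */
    cleanup_count++;                   /* do cleanup here */
    my_croak();                        /* like XCPT_RETHROW */
}

/* Outermost handler, standing in for Perl's own die/eval machinery.
   Returns 1 if an exception reached it, 0 otherwise. */
static int run_protected(int fail)
{
    jmp_buf env;
    jmp_buf *saved = my_top_env;

    my_top_env = &env;
    if (setjmp(env) == 0) {
        guarded(fail);
        my_top_env = saved;
        return 0;
    }
    my_top_env = saved;
    return 1;
}
```
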
a422fd2d SC	3311	=head2 Source Documentation
	3312
	3313	There's an effort going on to document the internal functions and
61ad4b94	3314	automatically produce reference manuals from them -- L<perlapi> is one
a422fd2d	3315	such manual which details all the functions which are available to XS
10e2eb10	3316	writers. L<perlintern> is the autogenerated manual for the functions
a422fd2d SC	3317	which are not part of the API and are supposedly for internal use only.
	3318
	3319	Source documentation is created by putting POD comments into the C
	3320	source, like this:
	3321
	3322	/*
	3323	=for apidoc sv_setiv
	3324
	3325	Copies an integer into the given SV. Does not handle 'set' magic. See
a95b3d6a	3326	L<perlapi/sv_setiv_mg>.
a422fd2d SC	3327
	3328	=cut
	3329	*/
	3330
	3331	Please try and supply some documentation if you add functions to the
	3332	Perl core.
	3333
0d098d33 MHM	3334	=head2 Backwards compatibility
0d098d33 MHM	3335
10e2eb10 FC	3336	The Perl API changes over time. New functions are
	3337	added or the interfaces of existing functions are
	3338	changed. The C<Devel::PPPort> module tries to
0d098d33 MHM	3339	provide compatibility code for some of these changes, so XS writers don't
	3340	have to code it themselves when supporting multiple versions of Perl.
	3341
	3342	C<Devel::PPPort> generates a C header file F<ppport.h> that can also
10e2eb10	3343	be run as a Perl script. To generate F<ppport.h>, run:
0d098d33 MHM	3344
	3345	perl -MDevel::PPPort -eDevel::PPPort::WriteFile
	3346
	3347	Besides checking existing XS code, the script can also be used to retrieve
	3348	compatibility information for various API calls using the C<--api-info>
10e2eb10	3349	command line switch. For example:
0d098d33 MHM	3350
	3351	% perl ppport.h --api-info=sv_magicext
	3352
0985f7e5	3353	For details, see S<C<perldoc ppport.h>>.
0d098d33	3354
a422fd2d SC	3355	=head1 Unicode Support
a422fd2d SC	3356
10e2eb10	3357	Perl 5.6.0 introduced Unicode support. It's important for porters and XS
a422fd2d SC	3358	writers to understand this support and make sure that the code they
	3359	write does not corrupt Unicode data.
	3360
	3361	=head2 What B<is> Unicode, anyway?
	3362
10e2eb10 FC	3363	In the olden, less enlightened times, we all used to use ASCII. Most of
10e2eb10 FC	3364	us did, anyway. The big problem with ASCII is that it's American. Well,
a422fd2d	3365	no, that's not actually the problem; the problem is that it's not
10e2eb10	3366	particularly useful for people who don't use the Roman alphabet. What
a422fd2d	3367	used to happen was that particular languages would stick their own
10e2eb10	3368	alphabet in the upper range of the sequence, between 128 and 255. Of
a422fd2d SC	3369	course, we then ended up with plenty of variants that weren't quite
	3370	ASCII, and the whole point of it being a standard was lost.
	3371
	3372	Worse still, if you've got a language like Chinese or
	3373	Japanese that has hundreds or thousands of characters, then you really
	3374	can't fit them into a mere 256, so they had to forget about ASCII
	3375	altogether, and build their own systems using pairs of numbers to refer
	3376	to one character.
	3377
	3378	To fix this, some people formed Unicode, Inc. and
	3379	produced a new character set containing all the characters you can
10e2eb10 FC	3380	possibly think of and more. There are several ways of representing these
	3381	characters, and the one Perl uses is called UTF-8. UTF-8 uses
	3382	a variable number of bytes to represent a character. You can learn more
2575c402	3383	about Unicode and Perl's Unicode model in L<perlunicode>.
a422fd2d	3384
3ad86f0e KW	3385	(On EBCDIC platforms, Perl instead uses UTF-EBCDIC, which is a form of
	3386	UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
	3387	UTF-EBCDIC is like UTF-8, but the details are different. The macros
	3388	hide the differences from you; just remember that the particular numbers
	3389	and bit patterns presented below will differ in UTF-EBCDIC.)
	3390
1e54db1a	3391	=head2 How can I recognise a UTF-8 string?
a422fd2d	3392
10e2eb10 FC	3393	You can't. This is because UTF-8 data is stored in bytes just like
10e2eb10 FC	3394	non-UTF-8 data. The Unicode character 200 (C<0xC8> for you hex types),
a422fd2d	3395	capital E with a grave accent, is represented by the two bytes
10e2eb10	3396	C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
61ad4b94	3397	has that byte sequence as well. So you can't tell just by looking -- this
a422fd2d SC	3398	is what makes Unicode input an interesting problem.
a422fd2d SC	3399
2575c402 JW	3400	In general, you either have to know what you're dealing with, or you
2575c402 JW	3401	have to guess. The API function C<is_utf8_string> can help; it'll tell
61ad4b94 KW	3402	you if a string contains only valid UTF-8 characters, and the chances
	3403	of a non-UTF-8 string looking like valid UTF-8 become very small very
	3404	quickly with increasing string length. On a character-by-character
	3405	basis, C<isUTF8_CHAR>
2575c402	3406	will tell you whether the current character in a string is valid UTF-8.
a422fd2d	3407
1e54db1a	3408	=head2 How does UTF-8 represent Unicode characters?
a422fd2d	3409
1e54db1a	3410	As mentioned above, UTF-8 uses a variable number of bytes to store a
10e2eb10 FC	3411	character. Characters with values 0...127 are stored in one
	3412	byte, just like good ol' ASCII. Character 128 is stored as
	3413	C<v194.128>; this continues up to character 191, which is
	3414	C<v194.191>. Now we've run out of bits (191 is binary
61ad4b94	3415	C<10111111>) so we move on; character 192 is C<v195.128>. And
a422fd2d	3416	so it goes on, moving to three bytes at character 2048.
6e31cdd1	3417	L<perlunicode/Unicode Encodings> has pictures of how this works.
a422fd2d	3418
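The arithmetic above can be checked with a short, hypothetical encoder for the standard UTF-8 forms. It does not cover UTF-EBCDIC or Perl's extended forms for code points above 0x10FFFF. It is a sketch of what C<uvchr_to_utf8> does on ASCII platforms, not its implementation, and it returns a byte count rather than an updated pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode cp as UTF-8 into buf (room for 4 bytes). Returns the number of
   bytes written, or 0 if cp is beyond U+10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {                  /* 0..127: one byte, as in ASCII */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 128..2047: two bytes */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {               /* 2048..65535: three bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {              /* up to U+10FFFF: four bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, code point 200 comes out as the bytes 195 and 136. Code point 2048 is the first one that needs three bytes.
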
Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.

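For illustration, here is a bounds-checked hop written as standalone C. Both function names are hypothetical. C<lead_byte_len> computes from the start byte what C<UTF8SKIP> looks up in a table, but only for standard UTF-8 sequences of up to four bytes. The hop refuses to read past C<end> instead of leaving bounds checking to the caller.

```c
#include <stddef.h>

/* Hypothetical helper: byte length implied by a UTF-8 start byte.
 * A stray continuation byte counts as length 1, so a caller always
 * makes progress. */
static size_t lead_byte_len(unsigned char c)
{
    if (c < 0xC0) return 1;    /* ASCII, or a stray continuation byte */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    return 4;
}

/* Hypothetical bounds-checked hop: advance n characters from s without
 * reading at or past end.  Returns NULL if the string runs out first. */
static const unsigned char *
utf8_hop_checked(const unsigned char *s, const unsigned char *end, size_t n)
{
    while (n-- > 0) {
        size_t len;

        if (s >= end)
            return NULL;
        len = lead_byte_len(*s);
        if ((size_t)(end - s) < len)       /* truncated final character */
            return NULL;
        s += len;
    }
    return s;
}
```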
All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;     /* Initialize this to point to the beginning of the
                    sequence to convert */
    U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
                    pointed to by 'utf' */
    UV uv;       /* Returned code point; note: a UV, not a U8, not a
                    char */
    STRLEN len;  /* Returned length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf)) {
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    }
    else {
        /* OK to treat this character as a byte */
        uv = *utf;
        len = 1;
    }

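The same decode-with-fast-path pattern can be sketched without the Perl headers. C<decode_utf8_char> below is a hypothetical stand-in for C<utf8_to_uvchr_buf>. It assumes well-formed input on an ASCII platform. Unlike the real function, it does no validation and issues no warnings.

```c
#include <stddef.h>

/* Hypothetical decoder, not Perl API: returns the code point at s and
 * stores its byte length in *lenp.  Assumes s < end and well-formed
 * UTF-8 of at most four bytes; returns 0 with *lenp == 0 if the
 * character is cut off by end. */
static unsigned long decode_utf8_char(const unsigned char *s,
                                      const unsigned char *end,
                                      size_t *lenp)
{
    unsigned long cp;
    size_t len, i;

    if (*s < 0x80) {       /* invariant: the byte is the value itself */
        *lenp = 1;
        return *s;
    }
    if (*s < 0xE0)      { cp = *s & 0x1F; len = 2; }
    else if (*s < 0xF0) { cp = *s & 0x0F; len = 3; }
    else                { cp = *s & 0x07; len = 4; }

    if ((size_t)(end - s) < len) {
        *lenp = 0;
        return 0;
    }
    for (i = 1; i < len; i++)          /* each continuation adds 6 bits */
        cp = (cp << 6) | (s[i] & 0x3F);
    *lenp = len;
    return cp;
}
```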
You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    /* 'utf8' is a U8 * into a buffer with room for at least
       UTF8_MAXBYTES more bytes */
    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF-8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = (U8)uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)

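You can sketch the rule above in standalone C. The idea is to decode each UTF-8 character to its code point before comparing it with the corresponding byte of the non-UTF-8 string. The function name is hypothetical, and the sketch assumes well-formed UTF-8. It only shows why comparing or skipping byte by byte would go wrong: the two bytes C<0xC3 0x88> must match the single byte C<0xC8>.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Perl API: true if the UTF-8 string u holds
 * the same characters as the one-byte-per-character string b.  Handles
 * sequences of up to four bytes; assumes well-formed input. */
static bool eq_utf8_vs_bytes(const unsigned char *u, size_t ulen,
                             const unsigned char *b, size_t blen)
{
    const unsigned char *uend = u + ulen, *bend = b + blen;

    while (u < uend && b < bend) {
        unsigned long cp;
        size_t len, i;

        if (*u < 0x80)      { cp = *u;        len = 1; }  /* invariant */
        else if (*u < 0xE0) { cp = *u & 0x1F; len = 2; }
        else if (*u < 0xF0) { cp = *u & 0x0F; len = 3; }
        else                { cp = *u & 0x07; len = 4; }
        if ((size_t)(uend - u) < len)
            return false;                  /* truncated input */
        for (i = 1; i < len; i++)
            cp = (cp << 6) | (u[i] & 0x3F);

        if (cp != *b)       /* a byte string cannot hold cp > 255 */
            return false;
        u += len;           /* advance by whole characters, not bytes */
        b++;
    }
    return u == uend && b == bend;         /* both must be exhausted */
}
```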
=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
code point number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8 but contains a byte sequence that could be UTF-8,
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:

    p = SvPV(sv, len);
    is_utf8 = SvUTF8(sv);
    frobnicate(p, is_utf8);
    nsv = newSVpvn(p, len);
    if (is_utf8)
        SvUTF8_on(nsv);

In the above, your C<frobnicate> function has been changed to be
aware of whether or not it's dealing with UTF-8 data, so that it can
handle the string appropriately.

Since passing an SV to an XS function and copying the SV's string data
is not enough to carry over the UTF-8 flag, passing a bare S<C<char *>>
to an XS function is even less correct.

For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
whether the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the characters they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.

And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This only matters for characters whose ordinals are between 128 and
255, whose behavior differs under ASCII rules versus Unicode rules, and
then only if your code cares about that difference (see
L<perlunicode/The "Unicode Bug">). There is no published API for
dealing with this, as it is subject to change, but you can look at the
code for C<pp_lc> in F<pp.c> for an example of how it's currently done.

=head2 How do I pass a Perl string to a C library?

A Perl string, conceptually, is an opaque sequence of code points.
Many C libraries expect their inputs to be "classical" C strings, which are
arrays of octets 1-255, terminated with a NUL byte. Your job when writing
an interface between Perl and a C library is to define the mapping between
Perl and that library.

Generally speaking, C<SvPVbyte> and related macros suit this task well.
These assume that your Perl string is a "byte string", i.e., is either
raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.

Alternatively, if your C library expects UTF-8 text, you can use
C<SvPVutf8> and related macros. This has the same effect as encoding
to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.

Some C libraries may expect other encodings (e.g., UTF-16LE). To give
Perl strings to such libraries, you must either do that encoding in Perl
and then use C<SvPVbyte>, or use an intermediary C library to convert
from however Perl stores the string to the desired encoding.

Take care also that NULs in your Perl string don't confuse the C
library. If possible, give the string's length to the C library; if that's
not possible, consider rejecting strings that contain NUL bytes.

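Here is a minimal check for the NUL case. It assumes you already have a buffer and its length, for example from C<SvPVbyte>. The helper name C<safe_for_c_string> is hypothetical, and it uses only standard C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Perl API: true if the len bytes at buf
 * contain no NUL byte, so an API that expects a NUL-terminated string
 * would see all of them instead of silently stopping early. */
static bool safe_for_c_string(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) == NULL;
}
```

In an XSUB you might call it on the pointer and length returned by C<SvPVbyte>, and C<croak> when it returns false.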
=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?

Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
Perl can store these 3 characters in either of two ways:

=over

=item * bytes: 0x64 0x78 0x8c

=item * UTF-8: 0x64 0x78 0xc2 0x8c

=back

Now let's say you convert C<$foo> to a C string thus:

    STRLEN len;
    char *str = SvPV(foo_sv, len);

At this point C<str> could point to a 3-byte C string or a 4-byte one.

Generally speaking, we want C<str> to be the same regardless of how
Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
and C<SvPVutf8> solve that by giving predictable output: use
C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
if it expects UTF-8.

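To see why C<SvPVbyte> gives predictable output, here is a standalone C sketch of the conversion it needs for a UTF-8-flagged string: every character must become exactly one byte. The helper name C<utf8_to_byte_copy> is hypothetical. The input is assumed to be well-formed UTF-8, and the real macro works on an SV rather than a raw buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not Perl API: copies the UTF-8 string s into out
 * as one byte per character and returns the number of bytes written,
 * or (size_t)-1 if a character is above 255 (or the input is cut off).
 * SvPVbyte itself croaks in the "above 255" case.  out needs room for
 * len bytes. */
static size_t utf8_to_byte_copy(const unsigned char *s, size_t len,
                                unsigned char *out)
{
    const unsigned char *end = s + len;
    size_t n = 0;

    while (s < end) {
        if (*s < 0x80) {                   /* invariant: copy as is */
            out[n++] = *s++;
        }
        else if ((*s == 0xC2 || *s == 0xC3) && s + 1 < end) {
            /* two-byte sequence for a code point in 128..255 */
            out[n++] = (unsigned char)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        }
        else {
            return (size_t)-1;             /* not representable in a byte */
        }
    }
    return n;
}
```

Applied to the UTF-8 form C<0x64 0x78 0xc2 0x8c>, it yields the same three bytes C<0x64 0x78 0x8c> that the byte form already holds.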
If your C library happens to support both encodings, then C<SvPV>, always
used in tandem with a check of C<SvUTF8>, may be safe and (slightly) more
efficient.

B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
in your tests to ensure consistent handling regardless of Perl's
internal encoding.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

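Here is a standalone C sketch of what "a UTF-8-encoded copy" means for a non-UTF-8 string, where each byte is one character in the range 0 to 255. The name C<latin1_to_utf8_copy> is hypothetical. It is not a drop-in replacement for C<bytes_to_utf8>, whose exact signature and memory-ownership rules are documented in L<perlapi>.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not Perl API: returns a newly malloc()ed,
 * NUL-terminated UTF-8 copy of the len bytes at s and stores its byte
 * length in *out_len.  The input is never modified.  Returns NULL if
 * the allocation fails; the caller frees the result. */
static unsigned char *latin1_to_utf8_copy(const unsigned char *s, size_t len,
                                          size_t *out_len)
{
    /* worst case: every byte needs two bytes, plus a trailing NUL */
    unsigned char *out = malloc(2 * len + 1);
    size_t n = 0, i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {                 /* invariant: one byte */
            out[n++] = s[i];
        }
        else {                             /* 128..255: two bytes */
            out[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    out[n] = '\0';
    *out_len = n;
    return out;
}
```

Because the original buffer is never touched, this approach avoids the problem of upgrading an operand in place.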
=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To compare two strings only for equality or inequality, you can use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
provided that both strings are UTF-8 encoded or neither is.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not: if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to
get a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>).

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to have any op structure you like (unary, binary,
list and so on).

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to make those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in F<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

=for apidoc_section $optree_manipulation
=for apidoc Ayh||XOP

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);

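You can model the lookup-by-ppaddr idea in plain C. Everything here is hypothetical (C<OP_MODEL>, C<xop_model>, C<register_xop>, C<custom_op_name>, and the two PP functions). The sketch only mimics how a registry keyed on a function pointer lets one op type carry many distinct custom ops. Real PP functions have a different signature.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the Perl API.  A "ppaddr" is a function
 * pointer, and the registry maps each one to a descriptor, much as Perl
 * maps a custom op's op_ppaddr to its registered XOP. */
typedef struct op_model OP_MODEL;
typedef OP_MODEL *(*ppaddr_t)(void);   /* simplified PP signature */

struct op_model {
    ppaddr_t op_ppaddr;                /* identifies which custom op */
};

typedef struct {
    const char *xop_name;
    const char *xop_desc;
} xop_model;

#define MAX_XOPS 16
static struct {
    ppaddr_t pp;
    const xop_model *xop;
} registry[MAX_XOPS];
static size_t n_registered;

/* Like Perl_custom_op_register(): associate a descriptor with a ppaddr. */
static int register_xop(ppaddr_t pp, const xop_model *xop)
{
    if (n_registered == MAX_XOPS)
        return 0;                      /* registry full */
    registry[n_registered].pp = pp;
    registry[n_registered].xop = xop;
    n_registered++;
    return 1;
}

/* Name for diagnostics, found by comparing the op's ppaddr. */
static const char *custom_op_name(const OP_MODEL *o)
{
    size_t i;

    for (i = 0; i < n_registered; i++)
        if (registry[i].pp == o->op_ppaddr)
            return registry[i].xop->xop_name;
    return "unregistered";             /* arbitrary fallback for the sketch */
}

/* Two hypothetical PP functions; only their addresses matter here. */
static int pp_calls;
static OP_MODEL *pp_first(void)  { pp_calls += 1; return NULL; }
static OP_MODEL *pp_second(void) { pp_calls += 2; return NULL; }
```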
The available fields in the structure are:

=over 4

=item xop_name

A short name for your op. This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=for apidoc_section $optree_manipulation
=for apidoc Amnh||OA_BASEOP
=for apidoc_item OA_BINOP
=for apidoc_item OA_COP
=for apidoc_item OA_LISTOP
=for apidoc_item OA_LOGOP
=for apidoc_item OA_LOOP
=for apidoc_item OA_PADOP
=for apidoc_item OA_PMOP
=for apidoc_item OA_PVOP_OR_SVOP
=for apidoc_item OA_SVOP
=for apidoc_item OA_UNOP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=for apidoc_section $optree_manipulation
=for apidoc Ayh||Perl_cpeep_t

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_stack_base

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_stack_sp

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above: C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro,
to ensure it is large enough. For example:

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly faster than making four separate checks in four
separate C<mXPUSHi()> calls.

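You can model the extend-once-then-push-unchecked pattern in plain C. This is a hypothetical sketch, not Perl's implementation. The C<vstack> type, C<vstack_init>, C<vstack_extend> and C<VPUSH> are illustrative names, and the stack holds C<int>s rather than SV pointers.

```c
#include <stdlib.h>

/* Hypothetical value-stack model, not Perl's implementation.  base is
 * the array, sp the most recently pushed item, max the last usable
 * slot.  Slot 0 is reserved so that sp never points before the array. */
typedef struct {
    int *base;
    int *sp;
    int *max;
} vstack;

static int vstack_init(vstack *st, size_t cap)   /* cap must be >= 1 */
{
    st->base = malloc(cap * sizeof *st->base);
    if (st->base == NULL)
        return 0;
    st->base[0] = 0;              /* reserved bottom slot */
    st->sp = st->base;            /* empty: sp sits on the reserved slot */
    st->max = st->base + cap - 1;
    return 1;
}

/* Like EXTEND: ensure room for n more items, possibly moving the array.
 * Returns 0 if the allocation fails. */
static int vstack_extend(vstack *st, size_t n)
{
    size_t used = (size_t)(st->sp - st->base) + 1;   /* slots in use */
    size_t cap = (size_t)(st->max - st->base) + 1;
    int *nb;

    if (used + n <= cap)
        return 1;                 /* one check covers all n pushes */
    cap = (used + n) * 2;
    nb = realloc(st->base, cap * sizeof *nb);
    if (nb == NULL)
        return 0;
    st->base = nb;                /* old pointers are now invalid, */
    st->sp = nb + (used - 1);     /* so re-derive them from offsets */
    st->max = nb + (cap - 1);
    return 1;
}

/* Like the non-X PUSH macros: no size check at all. */
#define VPUSH(st, v) (*++(st)->sp = (v))
```

C<vstack_extend> may C<realloc> the array, so any pointer saved from before the call (such as a stale copy of C<sp>) is invalid afterwards. This is also why a local C<SP> must be refreshed with C<SPAGAIN> after calls that may move the stack.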
As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro, though it is
normally implied by XSUBs and similar, so you rarely have to consider it
directly. Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

=for apidoc_section $stack
=for apidoc Amnh||TOPs

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to. If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time, usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

=for apidoc_section $stack
=for apidoc Amnh||PL_markstack

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_markstack_ptr

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing it to the C<PL_stack_base> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly. There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>. The values of the list may be iterated by code such as

    for (SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
        SV *item = *svp;
        ...
    }

Note in particular that if the list is empty, C<mark> will equal
C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it, and so the existing pointer will then be invalid. If this may be a
problem, a possible solution is to keep the mark offset as an integer and
recompute the mark pointer later on, after the stack has been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

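Here is a standalone C model of the mark stack. It shows why marks stored as integer offsets survive a reallocation of the value stack, while raw pointers would not. The C<stacks> type and its functions are hypothetical names used only for this sketch, and the values are C<int>s rather than SVs.

```c
#include <stdlib.h>

/* Hypothetical model, not Perl's code: a growable value stack plus a
 * mark stack of integer offsets into it. */
typedef struct {
    int *vals;       /* value stack; slot 0 is a reserved bottom slot */
    int  sp_ix;      /* index of the most recently pushed value */
    int  cap;        /* allocated slots in vals */
    int  marks[16];  /* mark stack; fixed size, no overflow check */
    int  mark_ix;    /* index of the most recent mark, -1 if none */
} stacks;

/* Like PUSHMARK: record the current top as an integer offset. */
static void push_mark(stacks *s) { s->marks[++s->mark_ix] = s->sp_ix; }

/* Like POPMARK: remove and return the most recent mark. */
static int pop_mark(stacks *s) { return s->marks[s->mark_ix--]; }

/* Push with growth, like XPUSH: realloc may move vals, so any int *
 * into the old array would dangle, but integer marks stay valid. */
static int push_val(stacks *s, int v)
{
    if (s->sp_ix + 1 >= s->cap) {
        int *nv = realloc(s->vals, (size_t)s->cap * 2 * sizeof *nv);
        if (nv == NULL)
            return 0;
        s->vals = nv;
        s->cap *= 2;
    }
    s->vals[++s->sp_ix] = v;
    return 1;
}

/* Sum the list above the most recent mark, then pop the list and mark,
 * the way a list op consumes its arguments. */
static int sum_list(stacks *s)
{
    int markoff = pop_mark(s);
    int *mark = s->vals + markoff;     /* pointer derived only now */
    int total = 0;

    for (int *p = mark + 1; p <= s->vals + s->sp_ix; p++)
        total += *p;
    s->sp_ix = markoff;                /* drop the list */
    return total;
}
```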
=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_stack

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

=for apidoc Amnh||PL_tmps_ix

There is no public API to directly push items to the temporaries stack. Instead,
the API function C<sv_2mortal()> is used to mortalize an xV, adding its
address to the temporaries stack.

Likewise, there is no public API to read values from the temporaries stack.
Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
macro establishes the base level of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked, all of
the temporaries that have been pushed since that level are reclaimed.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_floor

While it is common to see these two macros in pairs within an
C<ENTER>/C<LEAVE> pair, it is not necessary to match them. It is permitted to
invoke C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for
example, in a loop iterating over elements of a list. While you can invoke
C<SAVETMPS> multiple times within a scope pair, it is unlikely to be useful.
Subsequent invocations will move the temporaries floor further up, thus
effectively trapping the existing temporaries to only be released at the end
of the scope.

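The floor mechanism can be modelled in a few lines of plain C. This is a hypothetical sketch. C<obj> stands in for an xV, and C<mortalize>, C<save_tmps> and C<free_tmps> stand in for C<sv_2mortal>, C<SAVETMPS> and C<FREETMPS>. The previous floor is returned to the caller instead of being pushed onto the save stack.

```c
#include <stddef.h>

/* Hypothetical model, not Perl's code: an "object" is just a reference
 * count, and mortalizing it schedules one decrement for FREETMPS time. */
typedef struct { int refcnt; } obj;

#define TMPS_MAX 32
static obj *tmps_stack[TMPS_MAX];
static int  tmps_ix = -1;      /* index of the most recent entry */
static int  tmps_floor = -1;   /* entries at or below this survive */

/* Like sv_2mortal(); no bounds check, to keep the sketch short. */
static obj *mortalize(obj *o)
{
    tmps_stack[++tmps_ix] = o;
    return o;
}

/* Like SAVETMPS, but the old floor goes back to the caller rather than
 * onto a save stack. */
static int save_tmps(void)
{
    int old = tmps_floor;
    tmps_floor = tmps_ix;
    return old;
}

/* Like FREETMPS: release everything pushed since the current floor. */
static void free_tmps(void)
{
    while (tmps_ix > tmps_floor)
        tmps_stack[tmps_ix--]->refcnt--;
}
```

Calling C<free_tmps> again without new mortals does nothing. Restoring the old floor, as C<LEAVE> would via the save stack, exposes the older temporaries to the next C<free_tmps>.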
=head2 Save Stack

The save stack is used by perl to implement the C<local> keyword and other
similar behaviours: any cleanup operations that need to be performed when
leaving the current scope. Items pushed to this stack generally capture the
current value of some internal variable or state, which will be restored when
the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other
reasons.

Whereas other perl internal stacks store individual items all of the same type
(usually SV pointers or integers), the items pushed to the save stack are
formed of many different types, having multiple fields to them. For example,
the C<SAVEt_INT> type needs to store both the address of the C<int> variable
to restore, and the value to restore it to. This information could have been
stored using fields of a C<struct>, but such a struct would have to be large
enough to store three pointers in the largest case, which would waste a lot of
space in most of the smaller cases.

=for apidoc_section $stack
=for apidoc Amnh||SAVEt_INT

Instead, the stack stores information in a variable-length encoding of C<ANY>
structures. The final value pushed is stored in the C<UV> field, which encodes
the kind of item held by the preceding items; the count and types of those
items depend on what kind of item is being stored. The kind field is pushed
last because that will be the first field to be popped when unwinding items
from the stack.

The base of this stack is pointed to by the interpreter variable
C<PL_savestack>, of type C<ANY *>.

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack

The head of the stack is indexed by C<PL_savestack_ix>, an integer which
stores the index in the array at which the next item should be pushed. (Note
that this is different from most other stacks, which reference the most
recently-pushed item.)

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack_ix

Items are pushed to the save stack by using the various C<SAVE...()> macros.
Many of these macros take a variable and store both its address and current
value on the save stack, ensuring that value gets restored on scope exit.

    SAVEI8(i8)
    SAVEI16(i16)
    SAVEI32(i32)
    SAVEINT(i)
    ...

There is also a variety of other special-purpose macros which save particular
types or values of interest. C<SAVETMPS> has already been mentioned above.
Others include C<SAVEFREEPV>, which arranges for a PV (i.e. a string buffer) to
be freed, and C<SAVEDESTRUCTOR>, which arranges for a given function pointer to
be invoked on scope exit. A full list of such macros can be found in
F<scope.h>.

There is no public API for popping individual values or items from the save
stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way
to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will
restore all of the saved values that had been pushed since the most recent
C<ENTER>.

=head2 Scope Stack

Just as the mark stack is paired with the value stack, the scope stack is
paired with the save stack. The scope stack stores the height of the save
stack at which nested scopes begin, and allows the save stack to be unwound
back to that point when the scope is left.

When perl is built with debugging enabled, there is a second part to this
stack storing human-readable string names describing the type of stack
context. Each push operation saves the name as well as the height of the save
stack, and each pop operation checks the topmost name against what is expected,
causing an assertion failure if the name does not match.

The base of this stack is pointed to by the interpreter variable
C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are
stored in a separate array pointed to by C<PL_scopestack_name>, of type
C<const char **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack
=for apidoc Amnh||PL_scopestack_name

The head of the stack is indexed by C<PL_scopestack_ix>, an integer which
stores the index of the array or arrays at which the next item should be
pushed. (Note that this is different from most other stacks, which reference
the most recently-pushed item.)

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack_ix

Values are pushed to the scope stack using the C<ENTER> macro, which begins a
new nested scope. Any items pushed to the save stack after that point are
restored by the matching invocation of the C<LEAVE> macro.

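The save stack's tag-last encoding and the scope stack's recorded heights can be modelled together in plain C. Everything below is hypothetical and greatly simplified: there is only one kind of saved item, the arrays are fixed-size, and there are no overflow checks. It mirrors C<SAVEINT>, C<ENTER> and C<LEAVE> in spirit only.

```c
#include <stddef.h>

/* Hypothetical model, not Perl's code.  Each saved item is a run of
 * union slots with its kind tag pushed last, so unwinding pops the tag
 * first and then knows what precedes it. */
typedef union {
    int  *any_iptr;
    int   any_i32;
    long  any_kind;
} any_slot;

enum { SAVEK_INT = 1 };            /* the only kind in this sketch */

static any_slot savestack[64];
static int savestack_ix;           /* next free slot, like PL_savestack_ix */
static int scopestack[16];
static int scopestack_ix;          /* next free slot, like PL_scopestack_ix */

/* Like SAVEINT(x): remember the variable's address and current value. */
static void save_int(int *p)
{
    savestack[savestack_ix++].any_iptr = p;
    savestack[savestack_ix++].any_i32 = *p;
    savestack[savestack_ix++].any_kind = SAVEK_INT;   /* kind goes last */
}

/* Like ENTER: record the save-stack height where this scope begins. */
static void enter(void) { scopestack[scopestack_ix++] = savestack_ix; }

/* Like LEAVE: undo every save made since the matching enter(). */
static void leave(void)
{
    int base = scopestack[--scopestack_ix];

    while (savestack_ix > base) {
        long kind = savestack[--savestack_ix].any_kind;  /* popped first */
        if (kind == SAVEK_INT) {
            int value = savestack[--savestack_ix].any_i32;
            int *where = savestack[--savestack_ix].any_iptr;
            *where = value;
        }
    }
}
```

Nested C<enter>/C<leave> pairs restore values innermost first, which is the behaviour C<local> relies on.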
	4090	=head1 Dynamic Scope and the Context Stack
	4091
	4092	B<Note:> this section describes a non-public internal API that is subject
	4093	to change without notice.
	4094
=head2 Introduction to the context stack

In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals, etc., as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.

Perl tracks the dynamic scope by a data structure called the context
stack. This is an array of C<PERL_CONTEXT> structures, each of which is
essentially a big union for all the types of context. Whenever a new scope
is entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack. Similarly, when leaving a block or
returning from a subroutine call, etc., a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.

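As a rough illustration of searching the context stack, here is a standalone C model with invented names (C<toy_context>, C<toy_find_loop>, C<toy_find_sub_or_eval>). The only idea taken from the text above is scanning downwards from the current frame. Real perl's searches handle many more context types and edge cases; for example, it warns when C<next> exits a subroutine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of searching a context stack. The type names echo perl's
   CXt_* constants, but this struct and these functions are hypothetical. */

typedef enum { TOY_CXt_BLOCK, TOY_CXt_SUB, TOY_CXt_EVAL, TOY_CXt_LOOP } toy_cx_type;

typedef struct {
    toy_cx_type type;
    const char *label; /* loop label, or NULL if unlabelled / not a loop */
} toy_context;

#define TOY_CX_MAX 32
static toy_context toy_cxstack[TOY_CX_MAX];
static int toy_cxstack_ix = -1; /* index of the CURRENT frame, cf. cxstack_ix */

static void toy_push_cx(toy_cx_type type, const char *label) {
    assert(toy_cxstack_ix + 1 < TOY_CX_MAX);
    toy_cxstack_ix++;
    toy_cxstack[toy_cxstack_ix].type = type;
    toy_cxstack[toy_cxstack_ix].label = label;
}

/* Roughly what 'next LABEL' needs: the innermost loop frame whose label
   matches (any loop if label is NULL). Returns its index, or -1. */
static int toy_find_loop(const char *label) {
    for (int i = toy_cxstack_ix; i >= 0; i--) {
        const toy_context *cx = &toy_cxstack[i];
        if (cx->type != TOY_CXt_LOOP)
            continue;
        if (label == NULL || (cx->label != NULL && strcmp(cx->label, label) == 0))
            return i;
    }
    return -1;
}

/* Roughly what 'return' needs: the innermost sub or eval frame. */
static int toy_find_sub_or_eval(void) {
    for (int i = toy_cxstack_ix; i >= 0; i--)
        if (toy_cxstack[i].type == TOY_CXt_SUB || toy_cxstack[i].type == TOY_CXt_EVAL)
            return i;
    return -1;
}
```
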
=for apidoc_section $concurrency
=for apidoc Cyh||PERL_CONTEXT

Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL>, etc., as well as C<CXt_BLOCK>
and C<CXt_NULL>, which represent a basic scope (as pushed by C<pp_enter>)
and a sort block, respectively. The type determines which part of the
context union is valid.

=for apidoc Cyh ||cx_type

=for apidoc Cmnh||CXt_BLOCK
=for apidoc_item ||CXt_EVAL
=for apidoc_item ||CXt_FORMAT
=for apidoc_item ||CXt_GIVEN
=for apidoc_item ||CXt_LOOP_ARY
=for apidoc_item ||CXt_LOOP_LAZYIV
=for apidoc_item ||CXt_LOOP_LAZYSV
=for apidoc_item ||CXt_LOOP_LIST
=for apidoc_item ||CXt_LOOP_PLAIN
=for apidoc_item ||CXt_NULL
=for apidoc_item ||CXt_SUB
=for apidoc_item ||CXt_SUBST
=for apidoc_item ||CXt_WHEN

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
only used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information. For
example, C<CXt_SUB> stores the currently executing CV, while the various
C<for> loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example, the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

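This is a minimal sketch of the layout described above, assuming a much-reduced set of types. Every frame carries a common block base, and a type tag says which member of a per-type union is valid. The struct and field names are hypothetical and only loosely echo perl's C<blk_*> fields. The real C<PERL_CONTEXT> has far more fields, and they change between releases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified layout in the spirit of PERL_CONTEXT:
   a common block base plus a union of per-type data. Not perl's struct. */

typedef enum { TOY_CXt_BLOCK, TOY_CXt_SUB, TOY_CXt_LOOP } toy_cx_type;

typedef struct {      /* data shared by all block scopes */
    int  gimme;       /* cf. blk_gimme */
    long oldsp;       /* cf. blk_oldsp */
} toy_block_base;

typedef struct { void *cv; } toy_sub_data;                 /* cf. blk_sub */
typedef struct { int *itervar; int saved; } toy_loop_data;  /* loop var to restore */

typedef struct {
    toy_cx_type    type;  /* selects which union member is valid */
    toy_block_base base;
    union {
        toy_sub_data  sub;
        toy_loop_data loop;
    } u;
} toy_context;

#define TOY_CX_MAX 16
static toy_context toy_cxstack[TOY_CX_MAX]; /* cf. cxstack */
static int toy_cxstack_ix = -1;             /* cf. cxstack_ix */

/* cf. CX_CUR(): address of the current frame */
static toy_context *toy_cx_cur(void) {
    assert(toy_cxstack_ix >= 0);
    return &toy_cxstack[toy_cxstack_ix];
}

/* Push a frame and fill in only the common base. */
static toy_context *toy_push_block(toy_cx_type type, int gimme, long oldsp) {
    assert(toy_cxstack_ix + 1 < TOY_CX_MAX);
    toy_context *cx = &toy_cxstack[++toy_cxstack_ix];
    cx->type = type;
    cx->base.gimme = gimme;
    cx->base.oldsp = oldsp;
    return cx;
}

/* Per-type exit processing: only touch the union member the type says is valid. */
static void toy_exit_per_type(toy_context *cx) {
    switch (cx->type) {
    case TOY_CXt_LOOP:
        *cx->u.loop.itervar = cx->u.loop.saved; /* restore the original loop var */
        break;
    case TOY_CXt_SUB:
        cx->u.sub.cv = NULL; /* real perl would also drop the CV's refcount */
        break;
    case TOY_CXt_BLOCK:
        break;               /* no per-type data */
    }
}
```
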
=for apidoc_section $concurrency
=for apidoc Cmnh|PERL_CONTEXT *|cxstack
=for apidoc Cmnh|I32|cxstack_ix

In fact, the context stack is part of a stack-of-stacks system;
whenever something unusual is done, such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular, they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.

=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)>, which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack push can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    U8 gimme      = GIMME_V;
    bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop     = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv        = ....;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

        /* this does (approximately):
            CXINC;              /* cxstack_ix++ (grow if necessary) */
            cx = CX_CUR();      /* and get the address of new frame */
            cx->cx_type           = CXt_SUB;
            cx->blk_gimme         = gimme;
            cx->blk_oldsp         = MARK - PL_stack_base;
            cx->blk_oldsaveix     = old_ss_ix;
            cx->blk_oldcop        = PL_curcop;
            cx->blk_oldmarksp     = PL_markstack_ptr - PL_markstack;
            cx->blk_oldscopesp    = PL_scopestack_ix;
            cx->blk_oldpm         = PL_curpm;
            cx->blk_old_tmpsfloor = PL_tmps_floor;

            PL_tmps_floor         = PL_tmps_ix;
        */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

        /* this does (approximately):
            cx->blk_sub.cv          = cv;
            cx->blk_sub.olddepth    = CvDEPTH(cv);
            cx->blk_sub.prevcomppad = PL_comppad;
            cx->cx_type            |= (hasargs) ? CXp_HASARGS : 0;
            cx->blk_sub.retop       = retop;
            SvREFCNT_inc_simple_void_NN(cv);
        */

=for apidoc_section $concurrency
=for apidoc Cmnh||CXINC

Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.

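The effect of the temps floor on mortals can be sketched with a hypothetical model. C<toy_mortalise>, C<toy_freetmps>, C<toy_pushblock> and C<toy_popblock> are invented names standing in for C<sv_2mortal>, C<FREETMPS>, and the temps-related parts of C<cx_pushblock> and C<cx_popblock>. The model shows why a mortal that must survive until scope exit has to be created before the block is pushed.

```c
#include <assert.h>

/* Toy model (hypothetical names) of how pushing a block sets a new temps
   floor, so a nextstate-style FREETMPS only frees mortals created after
   the push. */

#define TOY_TMPS_MAX 32

typedef struct { int alive; } toy_sv;

static toy_sv *toy_tmps_stack[TOY_TMPS_MAX];
static int toy_tmps_ix = -1;    /* cf. PL_tmps_ix: index of the top item */
static int toy_tmps_floor = -1; /* cf. PL_tmps_floor */

typedef struct { int old_tmpsfloor; } toy_block; /* cf. blk_old_tmpsfloor */

/* cf. sv_2mortal(): schedule sv to be freed later */
static void toy_mortalise(toy_sv *sv) {
    assert(toy_tmps_ix + 1 < TOY_TMPS_MAX);
    toy_tmps_stack[++toy_tmps_ix] = sv;
}

/* cf. FREETMPS as done by nextstate: free everything above the floor */
static void toy_freetmps(void) {
    while (toy_tmps_ix > toy_tmps_floor)
        toy_tmps_stack[toy_tmps_ix--]->alive = 0;
}

/* cf. the temps-related part of cx_pushblock() */
static void toy_pushblock(toy_block *blk) {
    blk->old_tmpsfloor = toy_tmps_floor;
    toy_tmps_floor = toy_tmps_ix; /* existing mortals now survive FREETMPS */
}

/* cf. the temps-related part of cx_popblock() */
static void toy_popblock(const toy_block *blk) {
    toy_tmps_floor = blk->old_tmpsfloor;
}
```
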
Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame. C<CXt_LOOP_LIST>, however, stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.

Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them. If code in between could die
(e.g. a warning upgraded to fatal), then the context stack unwinding code
in C<dounwind> would see (in the example above) a C<CXt_SUB> context
frame, but without all the subroutine-specific fields set, and crashes
would soon ensue.

=for apidoc dounwind

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note,
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

=for apidoc_section $concurrency
=for apidoc Cmh||CX_CUR

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x }  # $x freed before we get to copy it
    sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive, nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to process.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be called imminently, which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
alternative, decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or to the current frame being overwritten.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.

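The clear-then-decrement pattern can be seen in isolation in this hypothetical model. C<toy_popsub> stands in for C<cx_popsub>, and calling it a second time simulates C<dounwind()> re-processing the same frame after a die. If the field were cleared after dropping the reference rather than before, the second call would decrement again and trip the assertion in C<toy_refcnt_dec>.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical) of why cx_popsub()-style code clears a field
   BEFORE dropping the reference it holds: a re-entered pop then sees NULL
   and does not decrement a second time. */

typedef struct { int refcnt; } toy_cv;

typedef struct {
    toy_cv *cv;       /* cf. cx->blk_sub.cv */
    int     olddepth; /* cf. cx->blk_sub.olddepth */
} toy_sub_cx;

static int toy_cv_depth = 0; /* stands in for CvDEPTH(cv) */

static void toy_refcnt_dec(toy_cv *cv) {
    if (cv) {
        assert(cv->refcnt > 0); /* a double decrement would trip this */
        cv->refcnt--;
    }
}

static void toy_popsub(toy_sub_cx *cx) {
    toy_cv *cv = cx->cv;
    toy_cv_depth = cx->olddepth; /* idempotent, so safe to repeat */
    cx->cv = NULL;               /* clear first: a re-entered pop is a no-op */
    toy_refcnt_dec(cv);          /* in perl this could run destructors and die */
}
```
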
Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed, on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note that in the C<pp_leavesub()> example
above, we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where, rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.

=head1 Reference-counted argument stack

=head2 Introduction

As of perl 5.40, there is a build option, C<PERL_RC_STACK>, not enabled by
default, which requires that items pushed onto or popped off the argument
stack have their reference counts adjusted. It is intended that eventually
this will be the default way (and finally the only way) to configure perl.

The macros which manipulate the stack, such as PUSHs() and POPs(), don't
adjust the reference count of the SV. Most of the time this is fine, since
something else is keeping the SV alive while on the argument stack, such
as a pointer from the TEMPs stack, or from the pad (e.g. a lexical variable
or a C<PADTMP>). Occasionally this can go horribly wrong. For example,
this code:

    my @a = (1,2,3);
    sub f { @a = (); print "(@_)\n" };
    f(@a, 4);

may print undefined or random freed values, since some of the elements of
@_, which have been aliased to the elements of @a, have been freed.
C<PERL_RC_STACK> is intended to fix this by making each SV pointer on the
argument stack increment the reference count (RC) of the SV by one.

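To make the problem concrete, this hypothetical model pushes an array element onto a stack slot and then clears the array, as C<@a = ()> does in the example above. It is not perl code; C<toy_sv> and the helpers are invented, and "freeing" is simulated with a flag. With a non-RC slot, the stacked pointer is left referring to a freed SV. When the slot owns a reference, the SV survives until it is popped.

```c
#include <assert.h>

/* Toy illustration (hypothetical names, not perl's API): a stack slot that
   does not own a reference can outlive the SV it points to. The 'freed'
   flag simulates the SV being freed when its refcount reaches zero. */

typedef struct { int refcnt; int freed; int value; } toy_sv;

static void toy_inc(toy_sv *sv) { sv->refcnt++; }

static void toy_dec(toy_sv *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->freed = 1; /* a real implementation would release the memory */
}

/* Push an element of "@a" onto a stack slot, then clear "@a" (dropping the
   array's reference). Returns 1 if the stacked SV is still valid. */
static int toy_push_then_clear_array(int rc_stack) {
    toy_sv elem = {1, 0, 42}; /* refcnt 1: owned only by the array */
    toy_sv *stack_slot = &elem;
    if (rc_stack)
        toy_inc(stack_slot);  /* RC stack: the slot owns a reference */
    toy_dec(&elem);           /* @a = (): the array releases its element */
    int still_valid = !stack_slot->freed;
    if (rc_stack)
        toy_dec(stack_slot);  /* popping the RC stack releases the slot's ref */
    return still_valid;
}
```
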
In this new environment, unmodified existing PP and XS functions, which
have been written assuming a non reference-counted stack (non-RC for
short), are called via special wrapper functions which adjust the stack
before and after. At the moment there is no API to write an RC XS
function, so all XS code will continue to be called via a wrapper (which
makes it slightly slower), but this means that, in general, CPAN
distributions containing XS code continue to work without modification.

However, PP functions, either in perl core, or those in XS functions used
to implement custom ops or to override the PP functions for built-in ops,
need special handling. The latter can just be wrapped;
this involves the least work, but has a performance impact. In the longer
term, and for core PP functions, they need unwrapping and rewriting using
a new API. With this, the old macros such as PUSHs() have been replaced
with a new set of (mostly inline) functions with a common prefix, such as
rpp_push_1(). "RPP" stands for "reference-counted push and pop functions".
The new functions modify the reference count on C<PERL_RC_STACK> builds,
while leaving it unadjusted otherwise. Thus in core they generally work
in both cases, while in XS code they are portable to older perl versions
via C<PPPort> (XXX assuming that they get added to C<PPPort>).

The rest of this section is mainly concerned with how to convert existing
PP functions, and how to write new PP functions to use the new C<rpp_>
API.

A reference-counted perl can be built using the C<PERL_RC_STACK> define.
For development and debugging purposes, it is best to enable leaking
scalar debugging too, as that displays extra information about scalars
that have leaked or been prematurely freed.

    Configure -DDEBUGGING \
        -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

=head2 Reference counted stack states

In the new regime, the current argument stack can be in one of three
states, each of which can be detected by the expression shown.

=over

=item * not reference-counted

    !AvREAL(PL_curstack)

In this case, perl will assume when emptying the stack (such as during a
croak()) that the items on it don't need freeing. This is the traditional
perl behaviour. On C<PERL_RC_STACK> builds, such stacks will rarely be
encountered.

=item * fully reference-counted

    AvREAL(PL_curstack) && !PL_curstackinfo->si_stack_nonrc_base

All the items on the stack are reference counted, and will be freed by
functions like rpp_popfree_1() or if perl croak()s. This is the normal
state of the stack in C<PERL_RC_STACK> builds.

=item * partially reference-counted (split)

    AvREAL(PL_curstack) && PL_curstackinfo->si_stack_nonrc_base > 0

In this case, items on the stack from the index C<si_stack_nonrc_base>
upwards are non-RC; those below are RC. This state occurs when a PP or XS
function has been wrapped. In this case, the wrapper function pushes a
non-RC copy of the arg pointers above the cut, then calls the real
function. When that returns, the wrapper function bumps up the RC of any
returned args. See below for more details.

=back

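The three expressions above can be folded into one classification function. In this sketch, the struct C<toy_stackinfo> and its fields are hypothetical stand-ins for C<AvREAL(PL_curstack)> and C<si_stack_nonrc_base>. No real perl macros are used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for AvREAL(PL_curstack) and
   si_stack_nonrc_base, used to classify the three stack states. */

typedef enum { TOY_STACK_NON_RC, TOY_STACK_FULLY_RC, TOY_STACK_SPLIT } toy_stack_state;

typedef struct {
    bool av_real;    /* cf. AvREAL(PL_curstack) */
    int  nonrc_base; /* cf. si_stack_nonrc_base; 0 means "no split" */
} toy_stackinfo;

static toy_stack_state toy_classify(const toy_stackinfo *si) {
    if (!si->av_real)
        return TOY_STACK_NON_RC;
    if (si->nonrc_base == 0)
        return TOY_STACK_FULLY_RC;
    return TOY_STACK_SPLIT; /* slots at index >= nonrc_base are not RC */
}

/* Does the stack slot at index ix own a reference? */
static bool toy_slot_is_rc(const toy_stackinfo *si, int ix) {
    switch (toy_classify(si)) {
    case TOY_STACK_NON_RC:   return false;
    case TOY_STACK_FULLY_RC: return true;
    case TOY_STACK_SPLIT:    return ix < si->nonrc_base;
    }
    return false;
}
```
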
Note that perl uses a stack-of-stacks, and the AvREAL() and
C<si_stack_nonrc_base> states are per stack. When perl starts up, the main
stack is RC, but by default, new stacks pushed in XS code via PUSHSTACKi()
are non-RC, so it is quite possible to get a mixture. The perl core itself
uses the new push_stackinfo() function, which replaces PUSHSTACKi() and
allows you to specify that the new stack should be RC by default.
(XXX core mostly hasn't actually been updated yet to use push_stackinfo())

Most places in the core assume a particular RC environment. In particular,
it is assumed that within a runops loop, all the PP functions are
RC-aware, either because they have been (re)written to be aware, or
because they have been wrapped. Whenever a runops loop is entered via
CALLRUNOPS(), it will check the current state of the stack, and if it's
not fully RC, will temporarily update its contents to be fully RC before
entering the main runops loop. Then, if necessary, it will restore the stack
to its old state on return. This means that functions like call_sv(),
which can be called from any environment (e.g. RC core or wrapped and
temporarily non-RC XS code), will always do the Right Thing when invoking
the runops loop, no matter what the current stack state is.

Similarly, croaks and the like (which can occur anywhere) have to be able
to handle both stack types. So there are a few places in core (call_sv(),
eval_sv(), etc., Perl_die_unwind() and S_my_exit_jump()) which have been
specially crafted to handle both cases; everything else can assume a fixed
environment.

=head2 Wrapping

Normally a core PP function is declared like this:

    PP(pp_foo)
    {
        ...
    }

This expands to something like:

    OP* Perl_pp_foo(pTHX)
    {
        ...
    }

When such a function needs to be wrapped, it is instead declared as:

    PP_wrapped(pp_foo, nargs, nlists)
    {
        ...
    }

which, on non-RC builds, expands to the same as PP() (the extra args are
ignored). On RC builds it expands to something like:

    OP* Perl_pp_foo(pTHX)
    {
        return Perl_pp_wrap(aTHX_ S_Perl_pp_foo_norc, nargs, nlists);
    }

    STATIC OP* S_Perl_pp_foo_norc(pTHX)
    {
        ...
    }

Here the externally visible PP function calls pp_wrap(), which adjusts
the stack contents, then calls the hidden real body of the PP function,
then, on return, adjusts the stack back.

There is an API macro, XSPP_wrapped(), intended for use on PP functions
declared in XS code. It is identical to PP_wrapped(), except that it
doesn't prepend a C<Perl_> prefix to the function name.

The C<nargs> and C<nlists> parameters to the macro are numeric constants
or simple expressions which specify how many arguments the PP function
expects, or how many lists it expects. For example:

    PP_wrapped(pp_add, 2, 0);       /* consumes two args off the stack */

    PP_wrapped(pp_readline,         /* consumes one or two args */
            ((PL_op->op_flags & OPf_STACKED) ? 2 : 1), 0);

    PP_wrapped(pp_push, 0, 1);      /* consumes one list */

    PP_wrapped(pp_aassign, 0, 2);   /* consumes two lists */

To understand what pp_wrap() does, consider calling Perl_pp_foo(), which
expects three arguments. On entry the stack may look like:

    ... A+ B+ C+

(where the C<+> indicates that the pointers to A, B and C are each
reference counted). The wrapper function pp_wrap() marks a cut at the
current stack position using C<si_stack_nonrc_base>, then, based on the
value of C<nargs>, pushes a copy of those three pointers above the cut:

    ... A+ B+ C+ | A0 B0 C0

(where the C<0> indicates that the pointers aren't RC), then calls the
real PP function, S_Perl_pp_foo_norc(). That function processes A, B and C,
pops them off the stack, and pushes some result SVs. None of this
manipulation adjusts any RCs. On return to pp_wrap(), the stack may look
something like:

    ... A+ B+ C+ | X0 Y0

The wrapper function bumps up the RCs of X and Y, decrements the RCs of
A, B and C, shifts the results down and sets C<si_stack_nonrc_base> to
zero, leaving the stack as:

    ... X+ Y+

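The following standalone simulation walks through the same sequence for a fixed C<nargs>, with SVs reduced to bare reference counts. C<toy_wrap>, C<toy_foo_norc> and the global stack are hypothetical. The real pp_wrap() also handles C<nlists>, stack extension and debugging checks.

```c
#include <assert.h>

/* Toy simulation (hypothetical names) of the pp_wrap() steps described
   above, for a fixed number of args and no lists. */

typedef struct { int refcnt; } toy_sv;

#define TOY_STACK_MAX 32
static toy_sv *toy_stack[TOY_STACK_MAX];
static int toy_sp = -1;        /* index of top item, cf. PL_stack_sp */
static int toy_nonrc_base = 0; /* cf. si_stack_nonrc_base */

typedef void (*toy_norc_fn)(void);

static void toy_wrap(toy_norc_fn body, int nargs) {
    int first_arg = toy_sp - nargs + 1;

    /* mark the cut, then push non-RC copies of the args above it */
    toy_nonrc_base = toy_sp + 1;
    for (int i = 0; i < nargs; i++) {
        assert(toy_sp + 1 < TOY_STACK_MAX);
        toy_stack[++toy_sp] = toy_stack[first_arg + i];
    }

    body(); /* pops its args and pushes results; touches no refcounts */

    int nresults = toy_sp - toy_nonrc_base + 1;
    for (int i = toy_nonrc_base; i <= toy_sp; i++)
        toy_stack[i]->refcnt++;         /* bump the results' RCs first... */
    for (int i = first_arg; i < toy_nonrc_base; i++)
        toy_stack[i]->refcnt--;         /* ...then drop the original args */
    for (int i = 0; i < nresults; i++)  /* shift results down over the args */
        toy_stack[first_arg + i] = toy_stack[toy_nonrc_base + i];
    toy_sp = first_arg + nresults - 1;
    toy_nonrc_base = 0;
}

/* Example "real" body, cf. S_Perl_pp_foo_norc(): 3 args in, 2 results out. */
static toy_sv toy_X = {0}, toy_Y = {0};
static void toy_foo_norc(void) {
    toy_sp -= 3;                  /* pop A0 B0 C0: no RC changes */
    toy_stack[++toy_sp] = &toy_X; /* push X0 */
    toy_stack[++toy_sp] = &toy_Y; /* push Y0 */
}
```

Bumping the results before dropping the original args means that a result which is also one of the args is never freed in between.
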
In places like pp_entersub(), a similar wrapping (via the functions
rpp_invoke_xs() and then xs_wrap()) is done when calling XS subs.

When C<nlists> is positive, a similar action takes place, except that the
mark stack is examined (and adjusted) in order to determine the number of
args that need copying.

A complex calling environment might have multiple nested stacks with
different RC states. Perl starts off with an RC stack. Then, for example,
pp_entersub() is called, which (via xs_wrap()) splits the stack and
executes the XS function in a non-RC environment. That function may call
PUSHSTACKi(), which creates a new non-RC stack, then calls call_sv(), which
does CALLRUNOPS(), which causes the new stack to temporarily become RC.
Then a tied method is called, which pushes a new RC stack, and so on. (XXX
currently tied methods actually push a non-RC stack. To be fixed soon.)

=head2 (Re)writing a PP function using the rpp_() API

Wrapping a PP function has a performance overhead, and is there mainly as
a temporary crutch. Eventually, PP functions should be updated to use
rpp_() functions, and any new PP functions should be written this way from
scratch and thus never need wrapping.

A couple of examples of core PP functions being converted can be seen in the
commits C<v5.39.1-304-g205fcd8410> and C<v5.39.1-303-g2fe263a83a>, which
demonstrate a unary and a binary op being converted (pp_not() and
pp_and()).

The traditional PP stack API consisted of a C<dSP> declaration, plus a
number of macros to push, pop and extend the stack. A I<very simplified>
pp_add() function might look something like:

    PP(pp_add)
    {
        dSP;
        dTARGET;
        IV right = SvIV(POPs);
        IV left  = SvIV(POPs);
        TARGi(left + right, 1);
        PUSHs(TARG);
        PUTBACK;
        return NORMAL;
    }

which expands to something like:

    {
        SV **sp = PL_stack_sp;
        SV *targ = PAD_SV(PL_op->op_targ);
        IV right = SvIV(*sp--);
        IV left  = SvIV(*sp--);
        sv_setiv(targ, left + right);
        *++sp = targ;
        PL_stack_sp = sp;
        return PL_op->op_next;
    }

The whole C<dSP> thing harks back to the days before decent optimising
compilers. It was always error-prone, e.g. if you forgot a C<PUTBACK> or
C<SPAGAIN>. The new API always just accesses C<PL_stack_sp> directly. In
fact, the first step of upgrading a PP function is always to remove the
C<dSP> declaration. This has the happy side effect that any old-style
macros left in the pp function which implicitly use C<sp> will become
compile errors. The existence of a C<dSP> somewhere in core is a good sign
that that function still needs updating.

An obvious question is: why not just modify the definitions of the PUSHs()
etc. macros to modify reference counts on RC builds? The basic problem is
that an SV may now be kept alive only by a single reference count from
the stack (formerly, they tended to be on the TEMPs stack too). So in code
like:

    SV *sv = POPs;
    IV i = SvIV(sv);

including an SvREFCNT_dec() in the C<POPs> macro definition would cause
C<sv> to be freed immediately, before its integer value can be read.

A potential issue with the new regime is that perl can croak at basically
any point in execution (e.g. the SvIV() above might call FETCH() on a tied
variable which then croaks). Thus at all times, the RC of each SV must be
properly accounted for. In the example above, a naive approach to avoiding
a premature free of C<sv> might be:

    SV *sv = *PL_stack_sp--;
    IV i = SvIV(sv);
    SvREFCNT_dec(sv); // got i, so ok to free sv now

but that means that C<sv> leaks if SvIV() triggers a croak.

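This hypothetical model uses C<setjmp>/C<longjmp> in place of perl's croak machinery to show the leak. When a croak unwinds an RC stack, every slot's reference is dropped, but a reference that was moved into a C local beforehand is simply lost. C<toy_naive> mirrors the naive code above, and C<toy_safe> mirrors the leave-it-on-the-stack approach. All names are invented.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy model (hypothetical names): on an RC stack, a croak-style unwind
   drops the reference held by every stack slot. A reference moved into a
   C local before something croaks is never dropped, i.e. it leaks. */

typedef struct { int refcnt; int croaks_on_fetch; int value; } toy_sv;

static toy_sv *toy_stack[16];
static int toy_sp = -1;
static jmp_buf toy_top_env; /* stands in for perl's exception target */

static void toy_croak(void) { longjmp(toy_top_env, 1); }

/* Unwinding an RC stack: release every slot's reference. */
static void toy_unwind_stack(void) {
    while (toy_sp >= 0)
        toy_stack[toy_sp--]->refcnt--;
}

/* cf. SvIV(): may "call FETCH", which may croak */
static int toy_sv_iv(toy_sv *sv) {
    if (sv->croaks_on_fetch)
        toy_croak();
    return sv->value;
}

/* Naive: take the pointer off the stack first, then use it. */
static int toy_naive(void) {
    toy_sv *sv = toy_stack[toy_sp--]; /* the stack no longer owns the ref */
    int i = toy_sv_iv(sv);            /* a croak here skips the dec below */
    sv->refcnt--;
    return i;
}

/* New style: leave it on the stack until finished with. */
static int toy_safe(void) {
    int i = toy_sv_iv(toy_stack[toy_sp]); /* on croak, the unwind still sees it */
    toy_stack[toy_sp--]->refcnt--;        /* cf. rpp_popfree_1() */
    return i;
}

/* Run fn with a croaking SV on the stack; return the SV's final refcnt. */
static int toy_run(int (*fn)(void)) {
    static toy_sv sv;
    sv.refcnt = 1;
    sv.croaks_on_fetch = 1;
    sv.value = 0;
    toy_sp = -1;
    toy_stack[++toy_sp] = &sv;  /* the stack's reference */
    if (setjmp(toy_top_env) == 0)
        (void)fn();
    else
        toy_unwind_stack();     /* the croak was "caught" */
    return sv.refcnt;
}
```
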
To avoid that, the new regime has the general outline that arguments are
left on the stack I<until they are finished with>, then removed and their
reference count adjusted at that point. With the new API, the pp_add()
function looks something like:

    {
        dTARGET;
        IV right = SvIV(PL_stack_sp[ 0]); // NB: arguments left on stack
        IV left  = SvIV(PL_stack_sp[-1]);
        TARGi(left + right, 1);
        rpp_replace_2_1(targ);
        return NORMAL;
    }

The rpp_replace_2_1() function pops two values off the stack and pushes
one new value on, while adjusting reference counts as appropriate
(depending on whether built with C<PERL_RC_STACK> or not).

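Below is a sketch of what an rpp_replace_2_1()-style helper has to do on an RC stack, together with a pp_add-like caller. It is a toy model under stated assumptions, not perl's implementation. The names are hypothetical, and SVs are reduced to a refcount and an integer. Bumping the new value's count before dropping the old ones is this sketch's own choice; it keeps the helper safe if the new value is one of the popped items.

```c
#include <assert.h>

/* Toy model (hypothetical names, not perl's implementation) of an
   rpp_replace_2_1()-style helper, plus a pp_add-like function that reads
   its args while they are still on the stack. */

typedef struct { int refcnt; long iv; } toy_sv;

static toy_sv *toy_stack[16];
static int toy_sp = -1; /* index of the top item */

static void toy_dec(toy_sv *sv) { assert(sv->refcnt > 0); sv->refcnt--; }

/* Pop two items and push one, adjusting refcounts. */
static void toy_replace_2_1(toy_sv *sv) {
    toy_sv *a = toy_stack[toy_sp - 1];
    toy_sv *b = toy_stack[toy_sp];
    sv->refcnt++;           /* take the new reference first */
    toy_sp--;
    toy_stack[toy_sp] = sv;
    toy_dec(b);             /* then release the two old ones */
    toy_dec(a);
}

static toy_sv toy_targ = {1, 0}; /* stands in for the op's pad TARG */

static void toy_pp_add(void) {
    long right = toy_stack[toy_sp]->iv;     /* args stay on the stack... */
    long left  = toy_stack[toy_sp - 1]->iv; /* ...while they are read */
    toy_targ.iv = left + right;
    toy_replace_2_1(&toy_targ);
}
```
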
The rpp_() functions in the new API will be described in detail below, but
in summary:

    new function                 approximate old equivalent
    ------------                 --------------------------

    rpp_extend(n)                EXTEND(SP, n)

    rpp_push_1(sv)               PUSHs(sv)
    rpp_push_2(sv1, sv2)         PUSHs(sv1); PUSHs(sv2)
    rpp_xpush_1(sv)              XPUSHs(sv)
    rpp_xpush_2(sv1, sv2)        EXTEND(SP,2); PUSHs(sv1); PUSHs(sv2);

    rpp_push_1_norc(sv)          mPUSHs(sv)  // on RC builds, skips RC++;
                                             // on non-RC builds, mortalises
    rpp_popfree_1()              (void)POPs;
    rpp_popfree_2()              (void)POPs; (void)POPs;
    rpp_popfree_to(svp)          PL_stack_sp = svp;
    rpp_obliterate_stack_to(ix)  // see description below

    sv = rpp_pop_1_norc()        sv = SvREFCNT_inc(POPs)

    rpp_replace_1_1(sv)          (void)POPs; PUSHs(sv);
    rpp_replace_2_1(sv)          (void)POPs; (void)POPs; PUSHs(sv);
    rpp_replace_at(sp, sv)       *sp = sv;
    rpp_replace_at_norc(sp, sv)  *sp = sv_2mortal(sv);

    rpp_context(mark, gimme,
                extra)           SP -= extra;
                                 // impose void/scalar/list context on return args
                                 SP = (gimme == G_VOID) ? mark : ....

    rpp_try_AMAGIC_1()           tryAMAGICun_MG()
    rpp_try_AMAGIC_2()           tryAMAGICbin_MG()

    rpp_is_lone(sv)              SvTEMP(sv) && SvREFCNT(sv) == 1
    rpp_stack_is_rc()            no equivalent

    rpp_invoke_xs(cv)            CvXSUB(cv)(aTHX_ cv);

    (no replacement)             dATARGET  // just write the macro body in full

There are also some C<_NN> variants which assume that any items being
removed from the stack are non-NULL, and so are slightly more efficient:

    rpp_popfree_1_NN()
    rpp_popfree_2_NN()
    rpp_popfree_to_NN(svp)

    rpp_replace_1_1_NN(sv)
    rpp_replace_2_1_NN(sv)
    rpp_replace_at_NN(sp, sv)
    rpp_replace_at_norc_NN(sp, sv)

There are also a few C<_IMM> variants, which expect the single pushed or
replacement value to be an immortal, such as C<&PL_sv_undef> - this skips
incrementing the ref count of the immortal SV. It doesn't matter if the
ref count of the SV prematurely reaches zero, as sv_free2() will just
resurrect it. Not every variant is provided; if a suitable one
doesn't exist, just using a standard C<_1> version is fine, albeit
slightly slower.

    rpp_push_IMM(&PL_sv_undef)
    rpp_xpush_IMM(&PL_sv_zero)
    rpp_replace_1_IMM_NN(&PL_sv_yes)
    rpp_replace_2_IMM_NN(&PL_sv_no)

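To see why skipping the increment is harmless, here is a toy model in C. It is only a sketch under stated assumptions: C<toy_sv>, C<toy_free()>, C<toy_push_imm()> and the fixed C<TOY_IMMORTAL_RC> value are all hypothetical, and the "resurrection" merely restores a large count, whereas the real sv_free2() does considerably more. The point is that an immortal whose count reaches zero is never actually freed.

```c
#include <assert.h>

/* Toy model only; not perl's SV or sv_free2(). */
typedef struct {
    int refcnt;
    int immortal;   /* nonzero for values like &PL_sv_undef */
    int freed;      /* set instead of really freeing, so it can be observed */
} toy_sv;

#define TOY_IMMORTAL_RC 1000  /* arbitrary large count (an assumption) */

/* Called when a count reaches zero. Immortals are "resurrected" by
   restoring a large count; anything else is marked freed. */
static void toy_free(toy_sv *sv)
{
    if (sv->immortal)
        sv->refcnt = TOY_IMMORTAL_RC;
    else
        sv->freed = 1;
}

static void toy_dec(toy_sv *sv)
{
    if (--sv->refcnt == 0)
        toy_free(sv);
}

static toy_sv *toy_slots[8];
static int toy_top = -1;

/* _IMM-style push: no increment, because the value is immortal. */
static void toy_push_imm(toy_sv *sv) { toy_slots[++toy_top] = sv; }

/* Ordinary pop-and-free: always releases one count. */
static void toy_popfree_1(void) { toy_dec(toy_slots[toy_top--]); }
```

Pushing and popping the immortal repeatedly, without ever incrementing its count, leaves it alive: the first popfree triggers the resurrection and later ones just eat into the restored count.
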
Other new C and perl functions related to reference-counted stacks are:

    push_stackinfo(type,rc)   PUSHSTACKi(type)
    pop_stackinfo()           POPSTACK()
    switch_argstack(to)       SWITCHSTACK(from,to)

    (Internals::stack_refcounted() & 1)   # perl built with PERL_RC_STACK

Some of these new functions are trivial, but should be used in preference
to writing direct code because they will work on both RC and non-RC
builds, and may do extra checks and assertions on C<DEBUGGING> builds.

Note that rpp_popfree_1() etc. aren't direct replacements for C<POPs>. The
rpp_() variants don't return a value and are intended to be called when
the SV is finished with. So

    SV *sv = POPs;
    ... do stuff with sv ...

becomes

    SV *sv = *PL_stack_sp;
    ... do stuff with sv ...
    rpp_popfree_1(); /* does SvREFCNT_dec(*PL_stack_sp--) */

The rpp_replace_M_N() functions are shortcuts for popping and freeing C<M>
items then pushing and bumping up the RCs of C<N> items. Note that they
handle edge cases such as an old and new SV being the same.

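The same-SV edge case is easy to get wrong. The toy model below (hypothetical names throughout; it is not perl's implementation) contrasts a replace that releases the old item first with one that retains the new item first. When the old and new items are the same SV and the stack holds its only reference, only the second ordering keeps it alive.

```c
#include <assert.h>

/* Toy model only: not perl's SV. "Freeing" just sets a flag. */
typedef struct {
    int refcnt;
    int freed;
} toy_sv;

static void toy_inc(toy_sv *sv) { sv->refcnt++; }

static void toy_dec(toy_sv *sv)
{
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Releases the old item before retaining the new one: wrong when
   old == new and the stack held the only reference. */
static void naive_replace_1_1(toy_sv **sp, toy_sv *sv)
{
    toy_dec(*sp);
    *sp = sv;
    toy_inc(sv);
}

/* Retains the new item first, so old == new never reaches zero. */
static void safe_replace_1_1(toy_sv **sp, toy_sv *sv)
{
    toy_sv *old = *sp;
    toy_inc(sv);
    *sp = sv;
    toy_dec(old);
}
```

In the naive version the count briefly drops to zero, so the SV is freed before it is stored back on the stack.
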
rpp_replace_at(sp, sv) is similar to rpp_replace_1_1(), except that
it replaces an SV at an address in the stack rather than at the top.

rpp_replace_at_norc(sp, sv) is similar to rpp_replace_at(), except that
it assumes that C<sv> already has a bumped reference count. So, a bit
like rpp_push_1_norc() (see below), it doesn't bother increasing C<sv>'s
reference count, or on non-RC builds it mortalises it instead.

rpp_popfree_to(svp) is designed to replace code like

    PL_stack_sp = PL_stack_base + cx->blk_oldsp;

which typically appears in list ops or scope exits when the arguments are
finished with. Left unaltered, all the SVs above C<oldsp> would leak. The
new approach is

    rpp_popfree_to(PL_stack_base + cx->blk_oldsp);

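The leak can be shown with a toy reference-counted stack. This sketch is hypothetical throughout: C<toy_stack> uses an index instead of a pointer, so C<toy_popfree_to(st, ix)> plays the role of C<rpp_popfree_to(PL_stack_base + ix)>, and C<toy_reset_to()> plays the role of the bare assignment to C<PL_stack_sp>.

```c
#include <assert.h>

/* Toy model only: not perl's SV or argument stack. */
typedef struct { int refcnt; } toy_sv;

typedef struct {
    toy_sv *slots[16];
    int top;                 /* index of the top item; -1 when empty */
} toy_stack;

static void toy_stack_init(toy_stack *st) { st->top = -1; }

/* On an RC stack each slot owns one reference. */
static void toy_push_1(toy_stack *st, toy_sv *sv)
{
    sv->refcnt++;
    st->slots[++st->top] = sv;
}

/* Like a bare "PL_stack_sp = ...": moves the top, leaking references. */
static void toy_reset_to(toy_stack *st, int ix) { st->top = ix; }

/* Like rpp_popfree_to(): releases each reference above ix. */
static void toy_popfree_to(toy_stack *st, int ix)
{
    while (st->top > ix)
        st->slots[st->top--]->refcnt--;
}
```

After a bare reset the SVs are no longer on the stack, but the counts the stack owned are still held, so nothing will ever free them.
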
There is a rarely-used variant of this, rpp_obliterate_stack_to(), which
pops the stack back to the specified index regardless of the current RC
state of the stack. So for example if the stack is split, it will only
adjust the RCs of any SVs which are below the split point, while
rpp_popfree_to() would mindlessly free I<all> SVs (on RC builds anyway).
For normal PP functions you should only ever use rpp_popfree_to(), which
is faster.

There are no new equivalents for any of the convenience macros like POPi()
and (shudder) dPOPPOPiirl(). These should be replaced with the rpp_()
functions above, with the conversions and variable declarations being
made explicit, e.g. dPOPPOPiirl() becomes:

    IV right = SvIV(PL_stack_sp[ 0]);
    IV left  = SvIV(PL_stack_sp[-1]);
    rpp_popfree_2();

A couple of the rpp_() functions with C<norc> in their names don't adjust
the reference count on RC builds (but, conversely, do on non-RC builds).

rpp_push_1_norc(sv) does a simple C<*++PL_stack_sp = sv> on RC builds. It
is typically used to "root" a newly-created SV, which already has an RC
of 1. On non-RC builds it mortalises the SV instead. So, for example, code
which used to look like

    mPUSHs(newSViv(i));

and which expanded to the equivalent of:

    PUSHs(sv_2mortal(newSViv(i)));

should be rewritten as:

    rpp_push_1_norc(newSViv(i));

This is because newSViv() and similar create a new SV with a reference
count one too high (1 rather than 0). This count is then "donated" to the
stack by pushing it. Conversely, on non-RC builds, the count is donated to
the TEMPs stack.

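The donation can be modelled in a few lines. In this hypothetical sketch, C<toy_new_iv()> stands in for newSViv() and returns a value whose count is already 1; C<toy_push_1()> and C<toy_push_1_norc()> model only the RC-build behaviour of their rpp_() namesakes.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: not perl's SV. */
typedef struct { int refcnt; long iv; } toy_sv;

/* Stand-in for newSViv(): the new value starts with a count of 1,
   i.e. the caller already owns one reference. */
static toy_sv *toy_new_iv(long iv)
{
    toy_sv *sv = malloc(sizeof *sv);
    if (sv == NULL)
        abort();
    sv->refcnt = 1;
    sv->iv = iv;
    return sv;
}

static toy_sv *toy_stack_slots[8];
static int toy_top = -1;

/* RC-build rpp_push_1(): the stack takes a new reference. */
static void toy_push_1(toy_sv *sv)
{
    sv->refcnt++;
    toy_stack_slots[++toy_top] = sv;
}

/* RC-build rpp_push_1_norc(): the caller's reference is donated. */
static void toy_push_1_norc(toy_sv *sv)
{
    toy_stack_slots[++toy_top] = sv;
}
```

With the ordinary push the count ends up at 2 while only the stack refers to the value, so it would never be freed; the norc push leaves exactly one count, owned by the stack.
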
Similarly, on RC builds, C<sv = rpp_pop_1_norc()> does a simple
C<sv = *PL_stack_sp--> without adjusting the reference count, while on
non-RC builds it actually increments the SV's reference count. It is
intended for cases where you immediately want to increment the reference
count again after popping, e.g. where the SV is to be immediately embedded
somewhere. For example this code:

    SV *sv = PL_stack_sp[0];
    SvREFCNT_inc(sv);
    av_store(av, i, sv); /* in real life should check return value */
    rpp_popfree_1();

can be more efficiently written as

    av_store(av, i, rpp_pop_1_norc());

By using this function, the code works correctly on both RC and non-RC
builds.

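A toy model of the RC-build case shows that the two versions leave the same final count. Everything here is hypothetical: C<toy_store()> loosely stands in for av_store() taking ownership of one reference, and the model ignores the non-RC behaviour.

```c
#include <assert.h>

/* Toy model of the RC-build behaviour only; not perl's API. */
typedef struct { int refcnt; } toy_sv;

static toy_sv *toy_slots[8];
static int toy_top = -1;
static toy_sv *toy_container;   /* takes ownership of one reference */

static void toy_push_1(toy_sv *sv) { sv->refcnt++; toy_slots[++toy_top] = sv; }

/* Like rpp_popfree_1(): drop the stack's reference. */
static void toy_popfree_1(void) { toy_slots[toy_top--]->refcnt--; }

/* Like rpp_pop_1_norc() on an RC build: hand the stack's reference
   to the caller unchanged. */
static toy_sv *toy_pop_1_norc(void) { return toy_slots[toy_top--]; }

/* Loosely like av_store(): keeps the reference it is given. */
static void toy_store(toy_sv *sv) { toy_container = sv; }
```

The norc pop simply transfers the stack's reference to the container, instead of creating a new one and then discarding the old one.
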
A common operation on list ops is to impose void, scalar or list context
on the return arguments, possibly discarding all, or all except one, of
them. rpp_context(mark, gimme, extra) does this. As a first step (for
convenience and efficiency) it notionally pops C<extra> args off the
stack. Then, for list context, it leaves things as they are. For void
context, the stack pointer is reset to C<mark>, and everything above it is
popped. For scalar context, the top argument (or C<&PL_sv_undef>) is moved
from the top to C<mark+1> and everything above is discarded.

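The context rules can be sketched as a function over a toy stack. This is an illustration, not perl's rpp_context(): the stack is an array indexed from 0, C<mark> is the index just below the first return value (so -1 means "start of stack"), the C<TOY_*> constants stand in for C<G_VOID> and friends, C<toy_undef> stands in for C<&PL_sv_undef>, and it is an assumption that the C<extra> items are released like any other popped item.

```c
#include <assert.h>

/* Toy model only; not perl's rpp_context(). */
enum toy_gimme { TOY_VOID, TOY_SCALAR, TOY_LIST };

typedef struct { int refcnt; } toy_sv;

static toy_sv toy_undef = { 1000 };   /* stands in for &PL_sv_undef */

/* stack[0 .. *top] hold counted references; mark is the index just
   below the first return value. */
static void toy_context(toy_sv **stack, int *top, int mark,
                        enum toy_gimme gimme, int extra)
{
    /* Pop 'extra' items first (released here, by assumption). */
    for (int i = 0; i < extra; i++)
        stack[(*top)--]->refcnt--;

    if (gimme == TOY_LIST)
        return;                       /* leave everything as is */

    if (gimme == TOY_VOID) {
        while (*top > mark)           /* discard everything above mark */
            stack[(*top)--]->refcnt--;
        return;
    }

    /* Scalar: keep the top item (or undef) at mark+1, drop the rest. */
    if (*top > mark) {
        toy_sv *keep = stack[(*top)--];   /* its reference just moves */
        while (*top > mark)
            stack[(*top)--]->refcnt--;
        stack[++(*top)] = keep;
    }
    else {
        toy_undef.refcnt++;
        stack[++(*top)] = &toy_undef;
    }
}
```

In scalar context the surviving item's reference is moved rather than copied, so its count is unchanged while every discarded item loses the count the stack held.
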
The macros which appear at the start of many PP functions to check for
unary or binary op overloading (among other things) have been replaced
with rpp_try_AMAGIC_1() and _2() inline functions, which now rely on the
calling PP function to choose whether to return immediately rather than
the return being hidden away in the macro.

The rpp_invoke_xs() function calls the XS function associated with the CV,
but may do so via a wrapper function to adjust the stack as necessary.

In the spirit of hiding away less in macros, C<dATARGET> hasn't been given
a replacement; where its effect is needed, it is now written out in full;
see pp_add() for an example.

Finally, a couple of rpp_() functions provide information rather than
manipulate the stack.

rpp_is_lone(sv) indicates whether C<sv>, assumed to be still on the stack,
is kept alive only by a single reference-counted pointer from the argument
and/or temps stacks, and thus is a candidate for some optimisations (like
skipping the copying of return arguments from a subroutine call).

rpp_stack_is_rc() indicates whether the current stack is
reference-counted. It's used mainly in a few places like call_sv() which
can be called from anywhere, and thus have to deal with both cases.

So for example, rather than using rpp_xpush_1(), call_sv() has lines like:

    rpp_extend(1);
    *++PL_stack_sp = sv;
    #ifdef PERL_RC_STACK
    if (rpp_stack_is_rc())
        SvREFCNT_inc_simple_void_NN(sv);
    #endif

which works on both standard builds and RC builds, and works whether
call_sv() is called from a standard PP function (rpp_stack_is_rc() is
true) or from a wrapped PP or XS function (rpp_stack_is_rc() is false).
Note that you're unlikely to need to use this function, as in most places,
such as PP or XS functions, it is always RC or non-RC respectively. In
fact on debugging builds under C<PERL_RC_STACK>, PUSHs() and similar
macros include an C<assert(!rpp_stack_is_rc())>, while rpp_push_1() and
similar functions have C<assert(rpp_stack_is_rc())>.

The macros for pushing new stackinfos have been replaced with inline
functions which don't rely on C<dSP> being in scope, and which have less
ambiguous names: they make it clear that a new I<stackinfo> is being
pushed, rather than just some sort of I<stack>. push_stackinfo() also has
a boolean argument indicating whether the new argument stack should be
reference-counted or not. For backwards compatibility, PUSHSTACKi(type) is
defined to be push_stackinfo(type, 0).

Some test scripts check for things like leaks by testing that the
reference count of a particular variable has an expected value. If this
is different on a perl built with C<PERL_RC_STACK>, then the perl
function Internals::stack_refcounted() can be used. This returns an
integer, the lowest bit of which indicates that perl was built with
C<PERL_RC_STACK>. Other bits are reserved for future use and should be
masked out.

=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

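The layout described above can be sketched as a toy allocator in C. All names are hypothetical and perl's real op slabs differ in detail; in particular, rounding every allocation up to a multiple of the two-pointer header size is an assumption made here only to keep the headers aligned. Ops are carved from the end of the slab downwards, each header records the next op and the owning slab, and the slab is freed along with its last op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy op slab; hypothetical names, not perl's structures. */
typedef struct toy_slab toy_slab;

typedef struct toy_op_hdr {
    struct toy_op_hdr *next;  /* next op in the slab (towards its end) */
    toy_slab          *slab;  /* owning slab, for its reference count */
} toy_op_hdr;

struct toy_slab {
    size_t         refcnt;     /* one count per live op */
    size_t         free_bytes; /* unallocated bytes at the front */
    toy_op_hdr    *first;      /* most recently allocated op */
    unsigned char *mem;
};

/* Every allocation is a multiple of the header size (an assumption that
   keeps each header suitably aligned). */
#define TOY_UNIT (sizeof(toy_op_hdr))

static toy_slab *toy_slab_new(size_t units)
{
    toy_slab *s = malloc(sizeof *s);
    if (s == NULL || (s->mem = malloc(units * TOY_UNIT)) == NULL)
        abort();
    s->refcnt = 0;
    s->free_bytes = units * TOY_UNIT;
    s->first = NULL;
    return s;
}

/* Carve an op body of 'size' bytes from the END of the slab, preceded
   by its two-pointer header. Returns NULL if the slab is full. */
static void *toy_op_alloc(toy_slab *s, size_t size)
{
    size_t need = TOY_UNIT + (size + TOY_UNIT - 1) / TOY_UNIT * TOY_UNIT;
    if (need > s->free_bytes)
        return NULL;
    s->free_bytes -= need;
    toy_op_hdr *h = (toy_op_hdr *)(s->mem + s->free_bytes);
    h->next = s->first;   /* the previous op sits just above this one */
    h->slab = s;
    s->first = h;
    s->refcnt++;
    return h + 1;         /* the op body follows its header */
}

/* Step back over the header to find an op's bookkeeping. */
static toy_op_hdr *toy_hdr_of(void *op) { return (toy_op_hdr *)op - 1; }

/* Walk the next pointers; no slab size is needed to find the last op. */
static size_t toy_count_ops(const toy_slab *s)
{
    size_t n = 0;
    for (const toy_op_hdr *h = s->first; h != NULL; h = h->next)
        n++;
    return n;
}

/* Drop one op; frees the slab with its last op. Returns 1 if freed. */
static int toy_op_free(void *op)
{
    toy_slab *s = toy_hdr_of(op)->slab;
    if (--s->refcnt > 0)
        return 0;
    free(s->mem);
    free(s);
    return 1;
}
```

Because the most recently allocated op sits lowest in memory and links to the one above it, walking the C<next> pointers from C<first> visits every op without knowing the slab's size, and stepping back one header from an op pointer recovers its owning slab.
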
It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster. But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. If the reference count of the slab reaches 0,
then it will be freed with the CV still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.

=for apidoc_section $concurrency
=for apidoc Cmnh| |CVf_SLABBED
=for apidoc_item |OP *|CvROOT|CV *sv
=for apidoc_item |OP *|CvSTART|CV *sv

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over. There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust.

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned. Otherwise one CV could free a slab that another
CV still points to, because forced freeing of ops ignores the reference count
(but asserts that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

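A freed chain can be sketched like this. The toy below is hypothetical and simplified (fixed slot records rather than variable-sized ops carved from slab memory, and a first-fit search): freeing marks an op and links it onto the chain, and allocation tries the chain before handing out fresh space.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: fixed slot records instead of variable-sized ops carved
   from memory. Hypothetical names throughout. */
typedef struct toy_op {
    size_t         size;
    int            freed;
    struct toy_op *next_freed;
} toy_op;

typedef struct {
    toy_op  ops[16];
    size_t  used;          /* records handed out fresh so far */
    toy_op *freed_chain;   /* most recently freed op first */
} toy_slab;

static void toy_op_free(toy_slab *s, toy_op *op)
{
    op->freed = 1;                   /* mark as freed ... */
    op->next_freed = s->freed_chain; /* ... and attach to the chain */
    s->freed_chain = op;
}

static toy_op *toy_op_alloc(toy_slab *s, size_t size)
{
    /* First fit: reuse a freed op that is big enough, if any. */
    for (toy_op **pp = &s->freed_chain; *pp != NULL; pp = &(*pp)->next_freed) {
        toy_op *op = *pp;
        if (op->size >= size) {
            *pp = op->next_freed;    /* unlink it from the chain */
            op->freed = 0;
            return op;
        }
    }
    if (s->used == sizeof s->ops / sizeof s->ops[0])
        return NULL;                 /* a real allocator would add a slab */
    toy_op *op = &s->ops[s->used++];
    op->size = size;
    op->freed = 0;
    op->next_freed = NULL;
    return op;
}
```

A freed op is handed back out for any later request it is large enough to hold, so space released by discarded ops is recycled instead of the slab growing.
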
C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

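The effect of doubling can be checked with a little arithmetic: after C<k> slabs with a first slab of C<first> units, total capacity is C<first * (2^k - 1)>, so the number of slabs grows only logarithmically with the number of ops. The sketch below is hypothetical: the initial size is a parameter rather than perl's actual value, and it models no upper limit on slab size.

```c
#include <assert.h>
#include <stddef.h>

/* Number of slabs needed to hold 'total' units when the first slab holds
   'first' units and each later slab is twice the size of the one before.
   'first' is a hypothetical parameter; no upper size limit is modelled. */
static size_t toy_slabs_needed(size_t total, size_t first)
{
    size_t slabs = 0, capacity = 0, next = first;
    while (capacity < total) {
        capacity += next;
        next *= 2;
        slabs++;
    }
    return slabs;
}
```

With a first slab of 4 units, for example, 1000 units fit in 8 slabs (capacity 4 * 255 = 1020), while a tiny subroutine needing 3 units pays for only the one small slab.
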
Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv> hasn't
been set up. So all slab-allocated ops are marked as such (C<< ->op_slabbed >>),
to distinguish them from malloced ops.

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>