=head1 DESCRIPTION
This document attempts to describe how to use the Perl API, as well as
-to provide some info on the basic workings of the Perl core. It is far
-from complete and probably contains many errors. Please refer any
+to provide some info on the basic workings of the Perl core. It is far
+from complete and probably contains many errors. Please refer any
questions or comments to the author below.
=head1 Variables
Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at
-least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
+least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.
might not be terminated by a NUL.
Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
-len);>. It might work with your compiler, but it won't work for everyone.
+len);>. It might work with your
+compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
SV *s;
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
-va_list argument. The fifth function extends the string stored in the first
+va_list argument. The fifth function
+extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.
-Its address can be used whenever an C<SV*> is needed. Make sure that
-you don't try to compare a random sv with C<&PL_sv_undef>. For example
+Its address can be used whenever an C<SV*> is needed. Make sure that
+you don't try to compare a random sv with C<&PL_sv_undef>. For example
when interfacing Perl code, it'll work correctly for:
foo(undef);
Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
-pointer. The efficiency comes by means of a little hack: instead of
+pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
LEN = 5
Here the number of bytes chopped off (1) is put into IV, and
-C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
+C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.
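The bookkeeping behind the offset hack can be sketched in plain C. This is
a toy model only: C<ToyStr>, C<toy_chop> and C<toy_real_start> are
hypothetical names invented for illustration, and the struct is not Perl's
real SV layout. It assumes the chopped byte count is kept in a separate
field, in the way the text describes it being kept in the IV slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy string buffer: NOT Perl's SV layout, just the bookkeeping idea. */
typedef struct {
    char  *pv;      /* "fake" beginning that readers see (role of SvPVX) */
    size_t cur;     /* bytes in use, counted from pv (role of SvCUR) */
    size_t len;     /* bytes available, counted from pv (role of SvLEN) */
    size_t offset;  /* bytes hidden before pv (the value the text puts in IV) */
} ToyStr;

/* Discard everything before ptr without moving any bytes. */
void toy_chop(ToyStr *s, char *ptr)
{
    assert(ptr >= s->pv && ptr <= s->pv + s->cur);
    size_t delta = (size_t)(ptr - s->pv);
    s->pv     += delta;
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* The real start of the allocation, needed only when freeing. */
char *toy_real_start(const ToyStr *s)
{
    return s->pv - s->offset;
}
```

Chopping one byte from a six-byte buffer holding C<"12345"> leaves an
offset of 1, a length in use of 4 and an available length of 5, matching
the dump shown above.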
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
-Perl, C<AvALLOC> points to the real start of the C array. These are
+Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
-play when freeing the array. See C<av_shift> in F<av.c>.
+play when freeing the array. See C<av_shift> in F<av.c>.
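The same idea for arrays, again as a toy model rather than Perl's real AV
implementation. C<ToyArray> and C<toy_shift> are hypothetical names; the
fields only mimic the roles that the text gives to C<AvALLOC>,
C<AvARRAY>, C<AvFILL> and C<AvMAX>.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int      *alloc;  /* real start of the C array (role of AvALLOC) */
    int      *array;  /* first element visible to the user (role of AvARRAY) */
    ptrdiff_t fill;   /* index of the last visible element, -1 if empty */
    ptrdiff_t max;    /* highest usable index, counted from array */
} ToyArray;

/* Remove and return the first element without copying the rest. */
int toy_shift(ToyArray *a)
{
    assert(a->fill >= 0);
    int first = a->array[0];
    a->array += 1;   /* the visible start moves forward by one */
    a->fill  -= 1;
    a->max   -= 1;
    return first;    /* a->alloc is untouched: it is only needed for freeing */
}
```

No element is moved in memory; only the visible window changes.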
=head2 What's Really Stored in an SV?
There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
-routine rather than directly, so SvIOK is false. Another is when
+routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
-private flag is set on 'lossy' values. So when an NV is converted to an
+private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
In general, though, it's best to use the C<Sv*V> macros.
=head2 AVs, HVs and undefined values
-Sometimes you have to store undefined values in AVs or HVs. Although
-this may be a rare case, it can be tricky. That's because you're
+Sometimes you have to store undefined values in AVs or HVs. Although
+this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.
For example, intuition tells you that this XS code:
Modification of non-creatable hash value attempted
In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
-in restricted hashes. This caused such hash entries not to appear
+in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.
You can run into similar problems when you store C<&PL_sv_yes> or
-C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
+C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:
Modification of a read-only value attempted
int sv_isobject(SV* sv);
The following function tests whether the SV is derived from the specified
-class. SV can be either a reference to a blessed object or a string
-containing a class name. This is the function implementing the
+class. SV can be either a reference to a blessed object or a string
+containing a class name. This is the function implementing the
C<UNIVERSAL::isa> functionality.
bool sv_derived_from(SV* sv, const char* name);
=head2 Reference Counts and Mortality
-Perl uses a reference count-driven garbage collection mechanism. SVs,
+Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
"Mortal" SVs are mainly used for SVs that are placed on perl's stack.
For example an SV which is created just to pass a number to a called sub
is made mortal to have it cleaned up automatically when it's popped off
-the stack. Similarly, results returned by XSUBs (which are pushed on the
+the stack. Similarly, results returned by XSUBs (which are pushed on the
stack) are often made mortal.
To create a mortal variable, use the functions:
You should be careful about creating mortal variables. Strange things
can happen if you make the same value mortal within multiple contexts,
-or if you make a variable mortal multiple times. Thinking of "Mortalization"
+or if you make a variable mortal multiple
+times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
For example if you are passing an SV which you I<know> has a high enough REFCNT
to survive its use on the stack you need not do any mortalization.
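The "deferred C<SvREFCNT_dec>" view of mortality can be modelled in a few
lines of plain C. Everything here is hypothetical and simplified:
C<ToyValue>, C<toy_mortal> and C<toy_free_tmps> are illustrative names,
and a C<freed> marker stands in for actually releasing memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int refcnt;
    int freed;   /* set instead of calling free(), so the outcome is visible */
} ToyValue;

#define TOY_TMPS_MAX 16
static ToyValue *toy_tmps[TOY_TMPS_MAX];  /* values that are owed a decrement */
static size_t    toy_tmps_count = 0;

void toy_refcnt_dec(ToyValue *v)
{
    assert(v->refcnt > 0);
    if (--v->refcnt == 0)
        v->freed = 1;
}

/* "Mortalize": record that one decrement is owed, but do not apply it yet. */
ToyValue *toy_mortal(ToyValue *v)
{
    assert(toy_tmps_count < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_count++] = v;
    return v;
}

/* Apply every owed decrement, newest first. */
void toy_free_tmps(void)
{
    while (toy_tmps_count > 0)
        toy_refcnt_dec(toy_tmps[--toy_tmps_count]);
}
```

In this model, mortalizing a value twice owes two decrements. If its count
is only 1, the second decrement trips the assertion, which is the toy
equivalent of the "strange things" described above.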
feature.
If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
-convert C<sv> to type C<SVt_PVMG>. Perl then continues by adding new magic
+convert C<sv> to type C<SVt_PVMG>.
+Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.
The C<name> and C<namlen> arguments are used to associate a string with
-the magic, typically the name of a variable. C<namlen> is stored in the
+the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L<Magic Virtual Tables> section below. The C<how> argument is also
-stored in the C<mg_type> field. The value of C<how> should be chosen
-from the set of macros C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
+stored in the C<mg_type> field. The value of
+C<how> should be chosen from the set of macros
+C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.
was initially made magical.
However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
-C<SV>. If you want to remove only certain magic of a C<type> based on the magic
+C<SV>. If you want to remove only certain
+magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:
int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);
The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
-MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags. This means that most
-code can continue declaring a vtable as a 5-element value. These three are
+MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.
+This means that most code can continue declaring
+a vtable as a 5-element value. These three are
currently used exclusively by the threading code, and are highly subject
to change.
When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
-of that composite type. Some internals code makes use of this case
+of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.
The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
-using their magic virtual table. C<mg_findext> provides an easy way
+using their magic virtual table. C<mg_findext> provides an easy way
to do that:
STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };
* type */
This routine returns a pointer to a C<MAGIC> structure stored in the SV.
-If the SV does not have that magical feature, C<NULL> is returned. If the
+If the SV does not have that magical
+feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
-returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV
+returned. C<mg_findext> can be used
+to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:
MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
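The difference between looking up magic by type alone and by type plus
virtual table can be seen in a toy linked list. This sketch does not use
the Perl headers; C<ToyMagic>, C<ToyVtbl>, C<toy_find> and C<toy_findext>
are hypothetical stand-ins whose fields mimic C<mg_moremagic>, C<mg_type>
and C<mg_virtual>.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyVtbl { int unused; } ToyVtbl;

typedef struct ToyMagic {
    struct ToyMagic *next;   /* role of mg_moremagic */
    int              type;   /* role of mg_type */
    const ToyVtbl   *vtbl;   /* role of mg_virtual */
} ToyMagic;

/* Link a caller-allocated entry at the front of the list. */
void toy_magic_push(ToyMagic **head, ToyMagic *mg)
{
    assert(mg != NULL);
    mg->next = *head;
    *head = mg;
}

/* First entry of the given type, or NULL. */
ToyMagic *toy_find(ToyMagic *head, int type)
{
    for (ToyMagic *mg = head; mg != NULL; mg = mg->next)
        if (mg->type == type)
            return mg;
    return NULL;
}

/* First entry matching both type and vtable, or NULL. */
ToyMagic *toy_findext(ToyMagic *head, int type, const ToyVtbl *vtbl)
{
    for (ToyMagic *mg = head; mg != NULL; mg = mg->next)
        if (mg->type == type && mg->vtbl == vtbl)
            return mg;
    return NULL;
}
```

With two entries of the same type, C<toy_find> returns whichever was added
last, while C<toy_findext> can single out the entry that belongs to a
particular vtable. That is why giving each extension its own vtable, even
an all-zero one, is useful.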
WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats. Some
of these caveats are actually considered bugs in the API, to be fixed
-in later releases, and are bracketed with [MAYCHANGE] below. If
+in later releases, and are bracketed with [MAYCHANGE] below. If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.
tie function from an XSUB, you must mimic this behaviour. The code below
carries out the necessary steps - firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
-the tie methods. Lastly it ties the two hashes together, and returns a
+the tie methods. Lastly it ties the two hashes together, and returns a
reference to the new tied hash. Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L<Calling Perl Routines from within C Programs> for details on how
The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
-the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
+the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit
more efficient as well.
There is a way to achieve a similar task from C via Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly or via a non-local exit (via
-die()). A I<block>-like construct is created by a pair of
+die()). A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
-used. (In the second case the overhead of additional localization must
-be almost negligible.) Note that any XSUB is automatically enclosed in
+used. (In the second case the overhead of additional localization must
+be almost negligible.) Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
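The mechanism behind such a I<pseudo-block> is a save stack. The following
is a minimal sketch in plain C, assuming only C<int> variables need
saving; C<toy_enter>, C<toy_save_int> and C<toy_leave> are hypothetical
names, and the model does not cover the non-local exit case that the real
macros handle.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int *addr; int old; } ToySave;

#define TOY_SAVE_MAX  32
#define TOY_SCOPE_MAX 8
static ToySave toy_savestack[TOY_SAVE_MAX];
static size_t  toy_save_ix = 0;
static size_t  toy_scopes[TOY_SCOPE_MAX];  /* save-stack height at each enter */
static size_t  toy_scope_ix = 0;

/* Open a pseudo-block: remember how high the save stack is right now. */
void toy_enter(void)
{
    assert(toy_scope_ix < TOY_SCOPE_MAX);
    toy_scopes[toy_scope_ix++] = toy_save_ix;
}

/* Record the current value of *p so that toy_leave can put it back. */
void toy_save_int(int *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix].addr = p;
    toy_savestack[toy_save_ix].old  = *p;
    toy_save_ix++;
}

/* Close the pseudo-block: undo, newest first, everything saved since enter. */
void toy_leave(void)
{
    assert(toy_scope_ix > 0);
    size_t base = toy_scopes[--toy_scope_ix];
    while (toy_save_ix > base) {
        toy_save_ix--;
        *toy_savestack[toy_save_ix].addr = toy_savestack[toy_save_ix].old;
    }
}
```

Nested pairs unwind independently: leaving the inner block restores only
what was saved inside it.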
Inside such a I<pseudo-block> the following service is available:
=item C<SAVEPPTR(p)>
These macros arrange things to restore the value of pointers C<s> and
-C<p>. C<s> must be a pointer of a type which survives conversion to
+C<p>. C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.
=item C<SAVEDELETE(HV *hv, char *key, I32 length)>
-The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
+The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
string pointed to by C<key> is Safefree()ed. If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:
Duplicates the current value of C<SV>. On exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<SV> is restored
-using the stored value. It doesn't handle magic. Use C<save_scalar> if
+using the stored value. It doesn't handle magic. Use C<save_scalar> if
magic is affected.
=item C<void save_list(SV **sarg, I32 maxsarg)>
and C<num> is the number of elements the stack should be extended by.
Now that there is room on the stack, values can be pushed on it using the C<PUSHs>
-macro. The pushed values will often need to be "mortal" (See
+macro. The pushed values will often need to be "mortal" (See
L</Reference Counts and Mortality>):
PUSHs(sv_2mortal(newSViv(an_integer)))
=head2 Putting a C value on Perl stack
A lot of opcodes (this is an elementary operation in the internal perl
-stack machine) put an SV* on the stack. However, as an optimization
-the corresponding SV is (usually) not recreated each time. The opcodes
+stack machine) put an SV* on the stack. However, as an optimization
+the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.
others, which use it via C<(X)PUSH[iunp]>.
Because the target is reused, you must be careful when pushing multiple
-values on the stack. The following code will not do what you think:
+values on the stack. The following code will not do what you think:
XPUSHi(10);
XPUSHi(20);
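Why this goes wrong can be shown with a toy stack in plain C. The names
C<toy_targ>, C<toy_stack> and C<toy_push_targ> are hypothetical; the
sketch assumes, as the text says, that each push stores the number in one
shared target and pushes a pointer to that target.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long iv; } ToyScalar;

static ToyScalar  toy_targ;       /* the single reusable target */
static ToyScalar *toy_stack[8];   /* the stack holds pointers, not copies */
static size_t     toy_sp = 0;

/* Store the number in the shared target, then push the target's address. */
void toy_push_targ(long n)
{
    assert(toy_sp < 8);
    toy_targ.iv = n;
    toy_stack[toy_sp++] = &toy_targ;
}
```

After pushing 10 and then 20, both stack slots point at the same scalar,
so both read back as 20.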
=head2 Scratchpads
The question remains on when the SVs which are I<target>s for opcodes
-are created. The answer is that they are created when the current
+are created. The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
-subroutines)--is compiled. During this time a special anonymous Perl
+subroutines)--is compiled. During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.
A scratchpad keeps SVs which are lexicals for the current unit and are
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s.
-The correspondence between OPs and I<target>s is not 1-to-1. Different
+The correspondence between OPs and I<target>s is not 1-to-1. Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.
=head2 Scratchpads and recursion
In fact it is not 100% true that a compiled unit contains a pointer to
-the scratchpad AV. In fact it contains a pointer to an AV of
-(initially) one element, and this element is the scratchpad AV. Why do
+the scratchpad AV. In fact it contains a pointer to an AV of
+(initially) one element, and this element is the scratchpad AV. Why do
we need an extra level of indirection?
-The answer is B<recursion>, and maybe B<threads>. Both
+The answer is B<recursion>, and maybe B<threads>. Both
these can create several execution pointers going into the same
-subroutine. For the subroutine-child not to write over the temporaries
+subroutine. For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
-scratchpads. (I<And> the lexicals should be separate anyway!)
+scratchpads. (I<And> the lexicals should be separate anyway!)
So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
=head2 Code tree
Here we describe the internal form your code is converted to by
-Perl. Start with a simple example:
+Perl. Start with a simple example:
$a = $b + $c;
C<gvsv gvsv add whatever>.
Each of these nodes represents an op, a fundamental operation inside the
-Perl core. The code which implements each operation can be found in the
+Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
-is C<pp_gvsv>, and so on. As the tree above shows, different ops have
+is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
-expect, and so has two children. To accommodate the various different
+expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.
-The simplest type of op structure is C<OP>: this has no children. Unary
+The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
-C<op_first> field. Binary operators (C<BINOP>s) have not only an
-C<op_first> field but also an C<op_last> field. The most complex type of
-op is a C<LISTOP>, which has any number of children. In this case, the
+C<op_first> field. Binary operators (C<BINOP>s) have not only an
+C<op_first> field but also an C<op_last> field. The most complex type of
+op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
-C<op_last>. The children in between can be found by iteratively
+C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.
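Walking the children of an op can be sketched with a toy structure.
C<ToyOp> and C<toy_count_children> are hypothetical; the three link
fields only mimic the roles of C<op_first>, C<op_last> and C<op_sibling>
described above, and real ops carry far more state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first;    /* role of op_first: first child, or NULL */
    struct ToyOp *last;     /* role of op_last: last child, or NULL */
    struct ToyOp *sibling;  /* role of op_sibling: next child of the parent */
} ToyOp;

/* Count children by following sibling links from first to last. */
size_t toy_count_children(const ToyOp *parent)
{
    assert(parent != NULL);
    size_t n = 0;
    for (const ToyOp *kid = parent->first; kid != NULL; kid = kid->sibling) {
        n++;
        if (kid == parent->last)
            break;
    }
    return n;
}
```

The same loop serves a childless op (zero iterations), a binary op (two)
and a list op with any number of children.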
There are also two other op types: a C<PMOP> holds a regular expression,
-and has no children, and a C<LOOP> may or may not have children. If the
-C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
+and has no children, and a C<LOOP> may or may not have children. If the
+C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.
=head2 Compile pass 1: check routines
The tree is created by the compiler while I<yacc> code feeds it
-the constructions it recognizes. Since I<yacc> works bottom-up, so does
+the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.
What makes this pass interesting for perl developers is that some
tree (if the top-level node was not modified, check routine returns
its argument).
-By convention, check routines have names C<ck_*>. They are usually
+By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).
=head2 Compile pass 3: peephole optimization
After the compile tree for a subroutine (or for an C<eval> or a file)
-is created, an additional pass over the code is performed. This pass
+is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.
=head2 Compile-time scope hooks
As of perl 5.14 it is possible to hook into the compile-time lexical
-scope mechanism using C<Perl_blockhook_register>. This is used like
+scope mechanism using C<Perl_blockhook_register>. This is used like
this:
STATIC void my_start_hook(pTHX_ int full);
Perl_blockhook_register(aTHX_ &my_hooks);
This will arrange to have C<my_start_hook> called at the start of
-compiling every lexical scope. The available hooks are:
+compiling every lexical scope. The available hooks are:
=over 4
=item C<void bhk_start(pTHX_ int full)>
-This is called just after starting a new lexical scope. Note that Perl
+This is called just after starting a new lexical scope. Note that Perl
code like
if ($x) { ... }
creates two scopes: the first starts at the C<(> and has C<full == 1>,
-the second starts at the C<{> and has C<full == 0>. Both end at the
-C<}>, so calls to C<start> and C<pre/post_end> will match. Anything
+the second starts at the C<{> and has C<full == 0>. Both end at the
+C<}>, so calls to C<start> and C<pre/post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).
=item C<void bhk_pre_end(pTHX_ OP **o)>
This is called at the end of a lexical scope, just before unwinding the
-stack. I<o> is the root of the optree representing the scope; it is a
+stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.
=item C<void bhk_post_end(pTHX_ OP **o)>
This is called at the end of a lexical scope, just after unwinding the
-stack. I<o> is as above. Note that it is possible for calls to C<pre_>
+stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.
=item C<void bhk_eval(pTHX_ OP *const o)>
This is called just before starting to compile an C<eval STRING>, C<do
-FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
+FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.
=back
Once you have your hook functions, you need a C<BHK> structure to put
-them in. It's best to allocate it statically, since there is no way to
-free it once it's registered. The function pointers should be inserted
+them in. It's best to allocate it statically, since there is no way to
+free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
-flags indicating which entries are valid. If you do need to allocate
+flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.
Once registered, there is no mechanism to switch these hooks off, so if
-that is necessary you will need to do this yourself. An entry in C<%^H>
+that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
-temporarily switch entries on and off. You should also be aware that
+temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre/post_end> pairs that
didn't have a matching C<start>.
functions which produce formatted output of internal data structures.
The most commonly used of these functions is C<Perl_sv_dump>; it's used
-for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
+for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.
-One macro controls the major Perl build flavor: MULTIPLICITY. The
+One macro controls the major Perl build flavor: MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
-state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
+state. With multiplicity-enabled perls, PERL_IMPLICIT_CONTEXT is also
normally defined, and enables the support for passing in a "hidden" first
-argument that represents all three data structures. MULTIPLICITY makes
+argument that represents all three data structures. MULTIPLICITY makes
multi-threaded perls possible (with the ithreads threading model, related
to the macro USE_ITHREADS.)
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
-part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a
+part of the API. (See L</Internal
+Functions>.) The easiest way to be B<sure> a
function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), send mail via
All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
-Similarly, all global variables begin with C<PL_>. (By convention,
+Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)
Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
-that live in F<embed.h>. Note that extension code should I<not> set
+that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.
The file F<embed.h> is generated automatically from
-F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
+F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
-and a lot of other bits and pieces. It's important that when you add
+and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
-data in the table in F<embed.fnc> as well. Here's a sample entry from
+data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:
Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval
-The second column is the return type, the third column the name. Columns
-after that are the arguments. The first column is a set of flags:
+The second column is the return type, the third column the name. Columns
+after that are the arguments. The first column is a set of flags:
=over 3
=item A
-This function is a part of the public API. All such functions should also
+This function is a part of the public
+API. All such functions should also
have 'd'; very few do not.
=item p
=item o
This function should not have a compatibility macro to define, say,
-C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
+C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.
=item x
=head2 Exception Handling
There are a couple of macros to do very basic exception handling in XS
-modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
+modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:
#define NO_XSLOCKS
#include "XSUB.h"
You can use these macros if you call code that may croak, but you need
-to do some cleanup before giving control back to Perl. For example:
+to do some cleanup before giving control back to Perl. For example:
dXCPT; /* set up necessary variables */
}
Note that you always have to rethrow an exception that has been
-caught. Using these macros, it is not possible to just catch the
-exception and ignore it. If you have to ignore the exception, you
+caught. Using these macros, it is not possible to just catch the
+exception and ignore it. If you have to ignore the exception, you
have to use one of the C<call_*> functions.
The advantage of using the above macros is that you don't have
There's an effort going on to document the internal functions and
automatically produce reference manuals from them - L<perlapi> is one
such manual which details all the functions which are available to XS
-writers. L<perlintern> is the autogenerated manual for the functions
+writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.
Source documentation is created by putting POD comments into the C
=head2 Backwards compatibility
-The Perl API changes over time. New functions are added or the interfaces
-of existing functions are changed. The C<Devel::PPPort> module tries to
+The Perl API changes over time. New functions are
+added or the interfaces of existing functions are
+changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.
C<Devel::PPPort> generates a C header file F<ppport.h> that can also
-be run as a Perl script. To generate F<ppport.h>, run:
+be run as a Perl script. To generate F<ppport.h>, run:
perl -MDevel::PPPort -eDevel::PPPort::WriteFile
Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
-command line switch. For example:
+command line switch. For example:
% perl ppport.h --api-info=sv_magicext
=head1 Unicode Support
-Perl 5.6.0 introduced Unicode support. It's important for porters and XS
+Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.
=head2 What B<is> Unicode, anyway?
-In the olden, less enlightened times, we all used to use ASCII. Most of
-us did, anyway. The big problem with ASCII is that it's American. Well,
+In the olden, less enlightened times, we all used to use ASCII. Most of
+us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
-particularly useful for people who don't use the Roman alphabet. What
+particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
-alphabet in the upper range of the sequence, between 128 and 255. Of
+alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.
To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
-possibly think of and more. There are several ways of representing these
-characters, and the one Perl uses is called UTF-8. UTF-8 uses
-a variable number of bytes to represent a character. You can learn more
+possibly think of and more. There are several ways of representing these
+characters, and the one Perl uses is called UTF-8. UTF-8 uses
+a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.
=head2 How can I recognise a UTF-8 string?
-You can't. This is because UTF-8 data is stored in bytes just like
-non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
+You can't. This is because UTF-8 data is stored in bytes just like
+non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
-C<v196.172>. Unfortunately, the non-Unicode string C<chr(196).chr(172)>
-has that byte sequence as well. So you can't tell just by looking - this
+C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
+has that byte sequence as well. So you can't tell just by looking - this
is what makes Unicode input an interesting problem.
In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
-you if a string contains only valid UTF-8 characters. However, it can't
-do the work for you. On a character-by-character basis,
+you if a string contains only valid UTF-8 characters. However, it can't
+do the work for you. On a character-by-character basis,
C<is_utf8_char_buf>
will tell you whether the current character in a string is valid UTF-8.
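To make the limits of validity checking concrete, here is a minimal
standalone sketch in plain C. It is not Perl's implementation of
C<is_utf8_string>; the function name C<sketch_is_utf8> is hypothetical, and
Perl's own functions may accept or reject different inputs (this sketch
does not special-case surrogates, for example). It only checks that a
buffer is structurally well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the idea behind is_utf8_string; NOT Perl's
 * implementation.  Returns 1 if buf[0..len) is structurally valid UTF-8
 * (1- to 4-byte sequences, no overlong forms, nothing above U+10FFFF),
 * and 0 otherwise. */
static int sketch_is_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;            /* continuation bytes expected */
        size_t k;
        unsigned long cp;       /* code point being assembled */
        unsigned long min;      /* smallest value allowed at this length */

        if (c < 0x80) {         /* plain ASCII byte */
            i++;
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;          /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return 0;           /* sequence is cut off by the end of buf */
        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;       /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFFUL)
            return 0;           /* overlong form or out of range */
        i += need + 1;
    }
    return 1;
}
```

The two bytes 196 and 172 pass this check, yet the same two bytes could
equally well be two single-byte characters in a non-UTF-8 string. A
successful check therefore shows only that the data I<could> be UTF-8, not
that it is.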
=head2 How does UTF-8 represent Unicode characters?
As mentioned above, UTF-8 uses a variable number of bytes to store a
-character. Characters with values 0...127 are stored in one byte, just
-like good ol' ASCII. Character 128 is stored as C<v194.128>; this
-continues up to character 191, which is C<v194.191>. Now we've run out of
-bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
+character. Characters with values 0...127 are stored in one
+byte, just like good ol' ASCII. Character 128 is stored as
+C<v194.128>; this continues up to character 191, which is
+C<v194.191>. Now we've run out of bits (191 is binary
+C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
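The byte patterns described above can be reproduced with a few lines of
plain C. This is only an illustrative sketch: C<sketch_utf8_encode> is a
hypothetical helper, not a Perl API function, and it stops at three-byte
sequences (code points below 65536) for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the Perl API.  Encodes one code point
 * below 0x10000 as UTF-8 into out[], which must have room for at least
 * 3 bytes.  Returns the number of bytes written, or 0 if the code point
 * is outside the range this sketch handles. */
static size_t sketch_utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {               /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {           /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                       /* longer sequences left out of the sketch */
}
```

Called with 128, 191 and 192, the helper writes the byte pairs 194/128,
194/191 and 195/128 respectively; called with 2048 it writes three bytes,
224, 160 and 128.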
Assuming you know you're dealing with a UTF-8 string, you can find out
Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
-over. You're on your own about bounds checking, though, so don't use it
+over. You're on your own about bounds checking, though, so don't use it
lightly.
All bytes in a multi-byte UTF-8 character will have the high bit set,
You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
-characters. You may not skip over UTF-8 characters in this case. If you
+characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
=head2 How does Perl store UTF-8 strings?
Currently, Perl deals with Unicode strings and non-Unicode strings
-slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
-string is internally encoded as UTF-8. Without it, the byte value is the
+slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
+string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa (in other words, the string is encoded
as iso-8859-1, but C<use feature 'unicode_strings'> is needed to get iso-8859-1
-semantics). You can check and manipulate this flag with the
+semantics). You can check and manipulate this flag with the
following macros:
SvUTF8(sv)
Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
-manipulating SVs. More specifically, you cannot expect to do this:
+manipulating SVs. More specifically, you cannot expect to do this:
SV *sv;
SV *nsv;
nsv = newSVpvn(p, len);
The C<char*> string does not tell you the whole story, and you can't
-copy or reconstruct an SV just by copying the string value. Check if the
+copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set, and act accordingly:
p = SvPV(sv, len);
=head2 How do I convert a string to UTF-8?
If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
-one of the strings to UTF-8. If you've got an SV, the easiest way to do
+one of the strings to UTF-8. If you've got an SV, the easiest way to do
this is:
sv_utf8_upgrade(sv);
by the end user, it can cause problems in deficient code.
Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
-string argument. This is useful for having the data available for
-comparisons and so on, without harming the original SV. There's also
+string argument. This is useful for having the data available for
+comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.
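The round trip can be pictured with the following standalone sketch. It
is not the code behind C<bytes_to_utf8> or C<utf8_to_bytes>, and its
signatures are not theirs: the names C<sketch_bytes_to_utf8> and
C<sketch_utf8_to_bytes>, the use of C<malloc>, and the C<(size_t)-1>
failure value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration, NOT Perl's bytes_to_utf8.  Returns a
 * malloc'd, NUL-terminated UTF-8 copy of the len bytes at src, treating
 * each byte as a code point 0..255.  The copy's length goes to *out_len.
 * The caller frees the result; the original buffer is left untouched. */
static unsigned char *sketch_bytes_to_utf8(const unsigned char *src,
                                           size_t len, size_t *out_len)
{
    unsigned char *dst, *d;
    size_t i;

    assert(src != NULL && out_len != NULL);
    dst = malloc(len * 2 + 1);      /* worst case: every byte doubles */
    if (dst == NULL)
        return NULL;
    d = dst;
    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            *d++ = src[i];          /* ASCII stays one byte */
        }
        else {                      /* 128..255 becomes two bytes */
            *d++ = (unsigned char)(0xC0 | (src[i] >> 6));
            *d++ = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    *d = '\0';
    *out_len = (size_t)(d - dst);
    return dst;
}

/* Hypothetical illustration, NOT Perl's utf8_to_bytes.  Converts s in
 * place back to one byte per character and returns the new length.
 * Returns (size_t)-1, leaving s unchanged, if s is malformed or holds
 * any character above 255. */
static size_t sketch_utf8_to_bytes(unsigned char *s, size_t len)
{
    size_t i, out = 0;

    assert(s != NULL);
    /* Pass 1: only ASCII, or a 0xC2/0xC3 lead byte (code points
     * 128..255) followed by one continuation byte, is acceptable. */
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80)
            continue;
        if ((s[i] != 0xC2 && s[i] != 0xC3) || i + 1 >= len
            || (s[i + 1] & 0xC0) != 0x80)
            return (size_t)-1;
        i++;                        /* skip the continuation byte */
    }
    /* Pass 2: rewrite in place; the write index never passes the read index. */
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            s[out++] = s[i];
        }
        else {
            s[out++] = (unsigned char)(((s[i] & 0x03) << 6) | (s[i + 1] & 0x3F));
            i++;
        }
    }
    return out;
}
```

Upgrading never fails, because every byte value has a UTF-8 encoding.
Downgrading is checked first and refused as a whole, which mirrors the
point above that the conversion cannot succeed once the string holds a
character that does not fit in a single byte.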
=head2 Is there anything else I need to know?
-Not really. Just remember these things:
+Not really. Just remember these things:
=over 3
=item *
-There's no way to tell if a string is UTF-8 or not. You can tell if an SV
-is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
-something should be UTF-8. Treat the flag as part of the PV, even though
+There's no way to tell if a string is UTF-8 or not. You can tell if an SV
+is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
+something should be UTF-8. Treat the flag as part of the PV, even though
it's not - if you pass the PV on to some other code, pass the flag on too.
=item *
=item *
-Mixing UTF-8 and non-UTF-8 strings is tricky. Use C<bytes_to_utf8> to get
+Mixing UTF-8 and non-UTF-8 strings is
+tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.
=back
=head1 Custom Operators
Custom operator support is an experimental feature that allows you to
-define your own ops. This is primarily to allow the building of
+define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>).
-This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
+This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
-not be involved in any optimizations. This also means that you can
+not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure - unary, binary, list and
so on - you like.
-It's important to know what custom operators won't do for you. They
-won't let you add new syntax to Perl, directly. They won't even let you
-add new keywords, directly. In fact, they won't change the way Perl
-compiles a program at all. You have to do those changes yourself, after
-Perl has compiled the program. You do this either by manipulating the op
+It's important to know what custom operators won't do for you. They
+won't let you add new syntax to Perl, directly. They won't even let you
+add new keywords, directly. In fact, they won't change the way Perl
+compiles a program at all. You have to do those changes yourself, after
+Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.
When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
-PP function. This should be defined in XS code, and should look like
-the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
+PP function. This should be defined in XS code, and should look like
+the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.
You should also "register" your op with the Perl interpreter so that it
-can produce sensible error and warning messages. Since it is possible to
+can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
-it is dealing with. You should create an C<XOP> structure for each
+it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
-C<Perl_custom_op_register>. A trivial example might look like:
+C<Perl_custom_op_register>. A trivial example might look like:
static XOP my_xop;
static OP *my_pp(pTHX);
=item xop_name
-A short name for your op. This will be included in some error messages,
+A short name for your op. This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.
=item xop_class
-Which of the various C<*OP> structures this op uses. This should be one of
+Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely
=over 4
=item OA_PVOP_OR_SVOP
-This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
+This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.
=item OA_LOOP
=item xop_peep
This member is of type C<Perl_cpeep_t>, which expands to C<void
-(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
+(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
-by the peephole optimizer. I<o> is the OP that needs optimizing;
+by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.
=back