Each typedef has specific routines that manipulate the various data types.
+=for apidoc_section $AV
+=for apidoc Ayh||AV
+=for apidoc_section $HV
+=for apidoc Ayh||HV
+=for apidoc_section $SV
+=for apidoc Ayh||SV
+
=head2 What is an "IV"?
Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.
-Perl also uses two special typedefs, I32 and I16, which will always be at
-least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
-as well.) They will usually be exactly 32 and 16 bits long, but on Crays
-they will both be 64 bits.
+Perl also uses several special typedefs to declare variables to hold
+integers of (at least) a given size.
+Use I8, I16, I32, and I64 to declare a signed integer variable which has
+at least as many bits as the number in its name. These all evaluate to
+the native C type that is closest to the given number of bits, but no
+smaller than that number. For example, on many platforms, a C<short> is
+16 bits long, and if so, I16 will evaluate to a C<short>. But on
+platforms where a C<short> isn't exactly 16 bits, Perl will use the
+smallest type that contains 16 bits or more.
+
+U8, U16, U32, and U64 are to declare the corresponding unsigned integer
+types.
+
+If the platform doesn't support 64-bit integers, both I64 and U64 will
+be undefined. Use IV and UV to declare the largest practicable signed
+and unsigned integers, and C<L<perlapi/WIDEST_UTYPE>> for the absolute
+maximum unsigned type, which may not be usable in all circumstances.
+
+A numeric constant can be specified with L<perlapi/C<INT16_C>>,
+L<perlapi/C<UINTMAX_C>>, and similar.
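Perl's I8 through U64 are set up by Perl's build configuration, so they cannot be shown outside a Perl build. As a stand-alone illustration only, the same "at least N bits" contract can be expressed with C99's F<stdint.h> C<int_leastN_t> types and its standard C<INT16_C> and C<UINTMAX_C> macros. Treating these as equivalent to Perl's typedefs is an assumption of this sketch, and the function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bits of storage a type occupies (value bits plus any padding). */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Checks the "at least N bits" guarantee for the C99 analogues of
 * I8/I16/I32/I64 and U16. Returns 1 if every check holds. */
int least_types_are_wide_enough(void)
{
    return BITS_OF(int_least8_t)   >= 8
        && BITS_OF(int_least16_t)  >= 16
        && BITS_OF(int_least32_t)  >= 32
        && BITS_OF(int_least64_t)  >= 64
        && BITS_OF(uint_least16_t) >= 16;
}

/* INT16_C(n) and UINTMAX_C(n) expand to integer constants of a type
 * suitable for int_least16_t and uintmax_t respectively. */
int_least16_t max_16_bit_constant(void) { return INT16_C(32767); }

/* Unsigned arithmetic wraps, so 0 - 1 is the largest uintmax_t. */
uintmax_t widest_unsigned_max(void) { return UINTMAX_C(0) - 1; }
```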
+
+=for apidoc_section $integer
+=for apidoc Ayh||I8
+=for apidoc_item ||I16
+=for apidoc_item ||I32
+=for apidoc_item ||I64
+=for apidoc_item ||IV
+
+=for apidoc Ayh||U8
+=for apidoc_item ||U16
+=for apidoc_item ||U32
+=for apidoc_item ||U64
+=for apidoc_item ||UV
=head2 Working with SVs
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)
+=for apidoc Ayh||NV
+
The seven routines are:
SV* newSViv(IV);
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.
+=for apidoc Ayh||STRLEN
+
In the unlikely case of a SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.
-To access the actual value that an SV points to, you can use the macros:
+To access the actual value that an SV points to, Perl's API exposes
+several macros that coerce the actual scalar type into an IV, UV, double,
+or string:
+
+=over
+
+=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)
+
+=item * C<SvNV(SV*)> (C<double>)
+
+=item * Strings are a bit complicated:
+
+=over
+
+=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>
+
+If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.
+
+This is suitable for Perl strings that represent bytes.
- SvIV(SV*)
- SvUV(SV*)
- SvNV(SV*)
- SvPV(SV*, STRLEN len)
- SvPV_nolen(SV*)
+=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>
-which will automatically coerce the actual scalar type into an IV, UV, double,
-or string.
+If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.
-In the C<SvPV> macro, the length of the string returned is placed into the
-variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do
-not care what the length of the data is, use the C<SvPV_nolen> macro.
-Historically the C<SvPV> macro with the global variable C<PL_na> has been
-used in this case. But that can be quite inefficient because C<PL_na> must
+This is suitable for Perl strings that represent characters.
+
+B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
+which means that if the SV contains non-Unicode code points (e.g.,
+0x110000), then the result may contain extensions over valid UTF-8.
+See L<perlapi/is_strict_utf8_string> for some methods Perl gives
+you to check the UTF-8 validity of these macros' returns.
+
+=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
+to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
+string
+is C<"\xff\xff">, then depending on the SV's internal encoding you might get
+back a 2-byte B<OR> a 4-byte C<char*>.
+Moreover, if it's the 4-byte string, that could come from either Perl
+C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
+as raw octets. To differentiate between these you B<MUST> look up the
+SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
+is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
+off).
+
+B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
+similarly-named macros I<without> looking up the SV's UTF8 bit is
+almost certainly a bug if non-ASCII input is allowed.
+
+When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
+here as for C<SvPVutf8>.
+
+=back
+
+(See L</How do I pass a Perl string to a C library?> for more details.)
+
+In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
+is placed into the
+variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
+not care what the length of the data is, use C<SvPVbyte_nolen>,
+C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
+The global variable C<PL_na> can also be given to
+C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
+in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.
-Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
+Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:
SV *s;
STRLEN len;
char *ptr;
- ptr = SvPV(s, len);
+ ptr = SvPVbyte(s, len);
foo(ptr, len);
+=back
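The statement has to be split because C does not fix the order in which a call's arguments are evaluated, so the second argument may read C<len> before the macro in the first argument has stored into it. A stand-alone sketch of the safe pattern, in which the hypothetical C<GET_BUF> macro imitates the way C<SvPVbyte> writes the length into its second argument and the hypothetical C<consume> stands in for C<foo>:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SvPVbyte(): evaluates to the buffer and, as
 * a side effect, stores its length into `len` (which, as with the real
 * macro, is passed as the variable itself, not as &len). */
#define GET_BUF(buf, len) ((len) = strlen(buf), (buf))

/* Hypothetical stand-in for foo(): returns the length it was handed. */
static size_t consume(const char *ptr, size_t len)
{
    assert(ptr != NULL);
    return len;
}

size_t call_safely(const char *s)
{
    size_t len = 0;
    const char *ptr;

    /* Unsafe: consume(GET_BUF(s, len), len);
     * The two arguments may be evaluated in either order, so the
     * second one may read len before the first has set it. */

    ptr = GET_BUF(s, len);    /* len is fully assigned here ... */
    return consume(ptr, len); /* ... before it is read here */
}
```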
+
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
C<SvGROW(sv, len + 1)>).
If you want to write to an existing SV's buffer and set its value to a
-string, use SvPV_force() or one of its variants to force the SV to be
+string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
s = SvGROW(sv, needlen + 1);
/* something that modifies up to needlen bytes at s, but modifies
newlen bytes
- eg. newlen = read(fd, s. needlen);
+ eg. newlen = read(fd, s, needlen);
*/
s[newlen] = '\0';
SvCUR_set(sv, newlen);
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.
+=for apidoc Ayh||HE
+
I32 hv_iterinit(HV*);
/* Prepares starting point to traverse hash table */
HE* hv_iternext(HV*);
See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
+=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen
=head2 Hash API Extensions
For more information on references and blessings, consult L<perlref>.
+=head2 I/O Handles
+
+Like AVs and HVs, IO objects are another type of non-scalar SV which
+may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
+from opendir().
+
+You can create a new IO object:
+
+ IO* newIO();
+
+Unlike other SVs, a new IO object is automatically blessed into the
+L<IO::File> class.
+
+The IO object contains an input and output PerlIO handle:
+
+ PerlIO *IoIFP(IO *io);
+ PerlIO *IoOFP(IO *io);
+
+Typically if the IO object has been opened on a file, the input handle
+is always present, but the output handle is only present if the file
+is open for output. For a file, if both are present they will be the
+same PerlIO object.
+
+Distinct input and output PerlIO objects are created for sockets and
+character devices.
+
+The IO object also contains other data associated with Perl I/O
+handles:
+
+ IV IoLINES(io); /* $. */
+ IV IoPAGE(io); /* $% */
+ IV IoPAGE_LEN(io); /* $= */
+ IV IoLINES_LEFT(io); /* $- */
+ char *IoTOP_NAME(io); /* $^ */
+ GV *IoTOP_GV(io); /* $^ */
+ char *IoFMT_NAME(io); /* $~ */
+ GV *IoFMT_GV(io); /* $~ */
+ char *IoBOTTOM_NAME(io);
+ GV *IoBOTTOM_GV(io);
+ char IoTYPE(io);
+ U8 IoFLAGS(io);
+
+Most of these are involved with L<formats|perlform>.
+
+IoFLAGS() may contain a combination of flags, the most interesting of
+which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
+settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.
+
+The IO object may also contain a directory handle:
+
+ DIR *IoDIRP(io);
+
+suitable for use with PerlDir_read() etc.
+
+All of these accessor macros are lvalues; there are no distinct
+C<_set()> macros to modify the members of the IO object.
+
=head2 Double-Typed SVs
Scalar variables normally contain only one type of value, an integer,
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.
+=for apidoc Ayh||MGVTBL
+
The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:
extensions
-=for apidoc Amnh||PERL_MAGIC_sv
-=for apidoc Amnh||PERL_MAGIC_arylen
-=for apidoc Amnh||PERL_MAGIC_rhash
-=for apidoc Amnh||PERL_MAGIC_debugvar
-=for apidoc Amnh||PERL_MAGIC_pos
-=for apidoc Amnh||PERL_MAGIC_symtab
-=for apidoc Amnh||PERL_MAGIC_backref
-=for apidoc Amnh||PERL_MAGIC_arylen_p
-=for apidoc Amnh||PERL_MAGIC_bm
-=for apidoc Amnh||PERL_MAGIC_overload_table
-=for apidoc Amnh||PERL_MAGIC_regdata
-=for apidoc Amnh||PERL_MAGIC_regdatum
-=for apidoc Amnh||PERL_MAGIC_env
-=for apidoc Amnh||PERL_MAGIC_envelem
-=for apidoc Amnh||PERL_MAGIC_fm
-=for apidoc Amnh||PERL_MAGIC_regex_global
-=for apidoc Amnh||PERL_MAGIC_hints
-=for apidoc Amnh||PERL_MAGIC_hintselem
-=for apidoc Amnh||PERL_MAGIC_isa
-=for apidoc Amnh||PERL_MAGIC_isaelem
-=for apidoc Amnh||PERL_MAGIC_nkeys
-=for apidoc Amnh||PERL_MAGIC_dbfile
-=for apidoc Amnh||PERL_MAGIC_dbline
-=for apidoc Amnh||PERL_MAGIC_shared
-=for apidoc Amnh||PERL_MAGIC_shared_scalar
-=for apidoc Amnh||PERL_MAGIC_collxfrm
-=for apidoc Amnh||PERL_MAGIC_tied
-=for apidoc Amnh||PERL_MAGIC_tiedelem
-=for apidoc Amnh||PERL_MAGIC_tiedscalar
-=for apidoc Amnh||PERL_MAGIC_qr
-=for apidoc Amnh||PERL_MAGIC_sig
-=for apidoc Amnh||PERL_MAGIC_sigelem
-=for apidoc Amnh||PERL_MAGIC_taint
-=for apidoc Amnh||PERL_MAGIC_uvar
-=for apidoc Amnh||PERL_MAGIC_uvar_elem
-=for apidoc Amnh||PERL_MAGIC_vstring
-=for apidoc Amnh||PERL_MAGIC_vec
-=for apidoc Amnh||PERL_MAGIC_utf8
-=for apidoc Amnh||PERL_MAGIC_substr
-=for apidoc Amnh||PERL_MAGIC_nonelem
-=for apidoc Amnh||PERL_MAGIC_defelem
-=for apidoc Amnh||PERL_MAGIC_lvref
-=for apidoc Amnh||PERL_MAGIC_checkcall
-=for apidoc Amnh||PERL_MAGIC_ext
+=for apidoc AmnhU||PERL_MAGIC_arylen
+=for apidoc_item ||PERL_MAGIC_arylen_p
+=for apidoc_item ||PERL_MAGIC_backref
+=for apidoc_item ||PERL_MAGIC_bm
+=for apidoc_item ||PERL_MAGIC_checkcall
+=for apidoc_item ||PERL_MAGIC_collxfrm
+=for apidoc_item ||PERL_MAGIC_dbfile
+=for apidoc_item ||PERL_MAGIC_dbline
+=for apidoc_item ||PERL_MAGIC_debugvar
+=for apidoc_item ||PERL_MAGIC_defelem
+=for apidoc_item ||PERL_MAGIC_env
+=for apidoc_item ||PERL_MAGIC_envelem
+=for apidoc_item ||PERL_MAGIC_ext
+=for apidoc_item ||PERL_MAGIC_fm
+=for apidoc_item ||PERL_MAGIC_hints
+=for apidoc_item ||PERL_MAGIC_hintselem
+=for apidoc_item ||PERL_MAGIC_isa
+=for apidoc_item ||PERL_MAGIC_isaelem
+=for apidoc_item ||PERL_MAGIC_lvref
+=for apidoc_item ||PERL_MAGIC_nkeys
+=for apidoc_item ||PERL_MAGIC_nonelem
+=for apidoc_item ||PERL_MAGIC_overload_table
+=for apidoc_item ||PERL_MAGIC_pos
+=for apidoc_item ||PERL_MAGIC_qr
+=for apidoc_item ||PERL_MAGIC_regdata
+=for apidoc_item ||PERL_MAGIC_regdatum
+=for apidoc_item ||PERL_MAGIC_regex_global
+=for apidoc_item ||PERL_MAGIC_rhash
+=for apidoc_item ||PERL_MAGIC_shared
+=for apidoc_item ||PERL_MAGIC_shared_scalar
+=for apidoc_item ||PERL_MAGIC_sig
+=for apidoc_item ||PERL_MAGIC_sigelem
+=for apidoc_item ||PERL_MAGIC_substr
+=for apidoc_item ||PERL_MAGIC_sv
+=for apidoc_item ||PERL_MAGIC_symtab
+=for apidoc_item ||PERL_MAGIC_taint
+=for apidoc_item ||PERL_MAGIC_tied
+=for apidoc_item ||PERL_MAGIC_tiedelem
+=for apidoc_item ||PERL_MAGIC_tiedscalar
+=for apidoc_item ||PERL_MAGIC_utf8
+=for apidoc_item ||PERL_MAGIC_uvar
+=for apidoc_item ||PERL_MAGIC_uvar_elem
+=for apidoc_item ||PERL_MAGIC_vec
+=for apidoc_item ||PERL_MAGIC_vstring
=for mg_vtable.pl end
=item C<SAVELONG(long i)>
+=item C<SAVEI8(I8 i)>
+
+=item C<SAVEI16(I16 i)>
+
+=item C<SAVEBOOL(bool i)>
+
-These macros arrange things to restore the value of integer variable
-C<i> at the end of enclosing I<pseudo-block>.
+These macros arrange things to restore the value of the integer variable
+C<i> at the end of the enclosing I<pseudo-block>.
+
+=for apidoc_section $stack
+=for apidoc Amh||SAVEINT|int i
+=for apidoc Amh||SAVEIV|IV i
+=for apidoc Amh||SAVEI32|I32 i
+=for apidoc Amh||SAVELONG|long i
+=for apidoc Amh||SAVEI8|I8 i
+=for apidoc Amh||SAVEI16|I16 i
+=for apidoc Amh||SAVEBOOL|bool i
=item C<SAVESPTR(s)>
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.
+=for apidoc Amh||SAVESPTR|SV * s
+=for apidoc Amh||SAVEPPTR|char * p
+
=item C<SAVEFREESV(SV *sv)>
The refcount of C<sv> will be decremented at the end of
Also compare C<SAVEMORTALIZESV>.
+=for apidoc Amh||SAVEFREESV|SV* sv
+
=item C<SAVEMORTALIZESV(SV *sv)>
Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.
+=for apidoc Amh||SAVEMORTALIZESV|SV* sv
+
=item C<SAVEFREEOP(OP *op)>
The C<OP *> is op_free()ed at the end of I<pseudo-block>.
+=for apidoc Amh||SAVEFREEOP|OP *op
+
=item C<SAVEFREEPV(p)>
The chunk of memory which is pointed to by C<p> is Safefree()ed at the
end of I<pseudo-block>.
+=for apidoc Amh||SAVEFREEPV|void * p
+
=item C<SAVECLEARSV(SV *sv)>
Clears a slot in the current scratchpad which corresponds to C<sv> at
SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
+=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length
+
=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.
+=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
+=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p
+
=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.
+=for apidoc Ayh||DESTRUCTORFUNC_t
+=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p
+
=item C<SAVESTACK_POS()>
The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.
+=for apidoc Amh||SAVESTACK_POS
+
=back
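To make the save/restore semantics concrete, here is a deliberately tiny, stand-alone model of a save stack. Every name in it (C<toy_enter>, C<toy_save_int>, C<toy_save_destructor>, C<toy_leave>) is hypothetical and the layout is not Perl's; it only shows the idea that C<SAVEINT>-style entries record an old value, C<SAVEDESTRUCTOR>-style entries record a callback, and leaving the pseudo-block (which Perl does with C<LEAVE>) unwinds them newest first:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_destructor_t)(void *);

/* One undo record: either "restore *where to old_value" or "call fn(arg)". */
struct toy_save_entry {
    int *where;
    int old_value;
    toy_destructor_t fn;
    void *arg;
};

#define TOY_SAVE_MAX 64
static struct toy_save_entry toy_save_stack[TOY_SAVE_MAX];
static size_t toy_save_ix = 0;

/* Start of a pseudo-block: remember how deep the save stack is. */
static size_t toy_enter(void) { return toy_save_ix; }

/* In the spirit of SAVEINT(i): record the current value of *ip. */
static void toy_save_int(int *ip)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_save_stack[toy_save_ix++] = (struct toy_save_entry){ ip, *ip, NULL, NULL };
}

/* In the spirit of SAVEDESTRUCTOR(f, p): call f(p) when the block ends. */
static void toy_save_destructor(toy_destructor_t f, void *p)
{
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_save_stack[toy_save_ix++] = (struct toy_save_entry){ NULL, 0, f, p };
}

/* End of the pseudo-block: undo everything recorded since `mark`,
 * newest first. */
static void toy_leave(size_t mark)
{
    while (toy_save_ix > mark) {
        struct toy_save_entry *e = &toy_save_stack[--toy_save_ix];
        if (e->where)
            *e->where = e->old_value;
        else
            e->fn(e->arg);
    }
}

/* Example destructor: counts how often it has been called. */
static void toy_count_call(void *p) { ++*(int *)p; }
```

Usage: call C<toy_enter()>, save C<x>, register C<toy_count_call>, modify C<x>, then C<toy_leave(mark)> puts C<x> back and runs the callback once.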
The following API list contains functions, thus one needs to
or Perlish C<GV *>s). Where the above macros take C<int>, a similar
function takes C<int *>.
+Other macros above have functions implementing them, but it's probably
+best to just use the macros, and not those functions or the ones below.
+
=over 4
=item C<SV* save_scalar(GV *gv)>
following the C<OpSIBLING> pointer from the first child to the last (but
see below).
+=for apidoc Ayh||OP
+=for apidoc Ayh||BINOP
+=for apidoc Ayh||LISTOP
+=for apidoc Ayh||UNOP
+
There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.
+=for apidoc Ayh||LOOP
+=for apidoc Ayh||PMOP
+
Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.
+=for apidoc Ayh||LOGOP
+
Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
PL_peepp = my_peep;
static peep_t prev_rpeepp;
- static void my_rpeep(pTHX_ OP *o)
+ static void my_rpeep(pTHX_ OP *first)
{
- OP *orig_o = o;
- for(; o; o = o->op_next) {
+ OP *o = first, *t = first;
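+ /* o steps two ops per pass and t one, so o == t means op_next loops */
+ for(; o; o = o->op_next, t = t->op_next) {
/* custom per-op optimisation goes here */
+ o = o->op_next;
+ if (!o || o == t) break;
+ /* custom per-op optimisation goes here too */
}
- prev_rpeepp(aTHX_ orig_o);
+ prev_rpeepp(aTHX_ first);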
}
prev_rpeepp = PL_rpeepp;
PL_rpeepp = my_rpeep;
+=for apidoc Ayh||peep_t
+
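The loop in C<my_rpeep> is Floyd's cycle detection: C<o> moves two ops per pass and C<t> one, so if the C<op_next> chain loops back on itself, C<o> eventually lands on C<t> and the walk stops instead of spinning forever. A stand-alone sketch of just that loop, using a hypothetical C<struct toy_op> with only an C<op_next> link in place of Perl's C<OP>:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OP: only the op_next link and a flag. */
struct toy_op {
    struct toy_op *op_next;
    int seen;               /* set by the "optimisation" below */
};

/* Stand-in for "custom per-op optimisation". */
static void toy_optimise(struct toy_op *o) { o->seen = 1; }

/* Same loop shape as my_rpeep: o advances two ops per iteration (once
 * in the body, once in the increment) and t advances one, so on a
 * circular op_next chain o catches up with t and the loop exits. */
static void toy_rpeep(struct toy_op *first)
{
    struct toy_op *o = first, *t = first;
    for (; o; o = o->op_next, t = t->op_next) {
        toy_optimise(o);
        o = o->op_next;
        if (!o || o == t)
            break;
        toy_optimise(o);
    }
}
```

On a looping chain, some ops are handed to the optimisation step more than once before the walk stops, so per-op code written this way should tolerate seeing the same op again.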
=head2 Pluggable runops
The compile tree is executed in a runops function. There are two runops
PL_runops = my_runops;
+=for apidoc Amnh|runops_proc_t|PL_runops
+
This function should be as efficient as possible to keep your programs
running as fast as possible.
This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:
+=for apidoc Ayh||BHK
+
=over 4
=item C<void bhk_start(pTHX_ int full)>
STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.
+=for apidoc_section $directives
+=for apidoc Ayh||STATIC
+
A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.
+=for apidoc_section $concurrency
=for apidoc Amnh||aTHX
=for apidoc Amnh||aTHX_
=for apidoc Amnh||dTHX
NVff NV %f-like
NVgf NV %g-like
-=for apidoc Amnh||IVdf
-=for apidoc Amnh||UVuf
-=for apidoc Amnh||UVof
-=for apidoc Amnh||UVxf
-=for apidoc Amnh||NVef
-=for apidoc Amnh||NVff
-=for apidoc Amnh||NVgf
-
These will take care of 64-bit integers and long doubles.
For example:
The contents of SVs may be printed using the C<SVf> format, like so:
- Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SvfARG(err_msg))
+ Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg))
where C<err_msg> is an SV.
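-One way to do this for typical filehandles is to invoke perl with the
-C<-C>> parameter. (See L<perlrun/-C [numberE<sol>list]>.
+One way to do this for typical filehandles is to invoke perl with the
+C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)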
+=for apidoc_section $formats
+=for apidoc Amnh||UTF8f
+=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str
+
+=cut
+
=head2 Formatted Printing of C<Size_t> and C<SSize_t>
The most general way to do this is to cast them to a UV or IV, and
This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.
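Perl's C<UV> and C<UVuf> need a Perl build, so here is the same "cast to a wide type, then use its matching format" pattern in plain C99, with C<uintmax_t> and C<%ju> standing in for C<UV> and C<UVuf>, and C<intmax_t> and C<%jd> with a C<ptrdiff_t> argument standing in for the C<SSize_t> case. Both helper names are hypothetical, and the mapping is an analogy, not Perl's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Unsigned case: widen a size_t to uintmax_t, print with %ju.
 * Returns snprintf's result (characters that would be written). */
int format_size(char *buf, size_t bufsize, size_t n)
{
    return snprintf(buf, bufsize, "%ju", (uintmax_t)n);
}

/* Signed case: widen to intmax_t, print with %jd. */
int format_signed_size(char *buf, size_t bufsize, ptrdiff_t d)
{
    return snprintf(buf, bufsize, "%jd", (intmax_t)d);
}
```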
+=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes
+
+There are modifiers for these special situations if you are using
+C<PerlIO_printf()>. See L<perlfunc/size>.
+
=head2 Pointer-To-Integer and Integer-To-Pointer
Because pointer size does not necessarily equal integer size,
PTR2NV(pointer)
INT2PTR(pointertotype, integer)
-=for apidoc Amh|void *|INT2PTR|type|int value
-=for apidoc Amh|UV|PTR2UV|void *
-=for apidoc Amh|IV|PTR2IV|void *
-=for apidoc Amh|NV|PTR2NV|void *
+=for apidoc_section $casting
+=for apidoc Amh|type|INT2PTR|type|int value
+=for apidoc Amh|UV|PTR2UV|void * ptr
+=for apidoc Amh|IV|PTR2IV|void * ptr
+=for apidoc Amh|NV|PTR2NV|void * ptr
For example:
AV *av = ...;
UV uv = PTR2UV(av);
+There are also:
+
+ PTR2nat(pointer) /* pointer to integer of PTRSIZE */
+ PTR2ul(pointer) /* pointer to unsigned long */
+
+=for apidoc Amh|IV|PTR2nat|void *
+=for apidoc Amh|unsigned long|PTR2ul|void *
+
+And C<PTRV>, which gives the native unsigned integer type that is the
+same size as a pointer, such as C<unsigned> or C<unsigned long>.
+
+=for apidoc Ayh|type|PTRV
+
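For comparison, plain C99 offers C<uintptr_t> for the same round trip that C<PTR2UV> and C<INT2PTR> perform. The sketch below is stand-alone and its helper names are hypothetical; C<uintptr_t> is an optional type in the C standard, so its availability is an assumption (it exists on mainstream platforms):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue of PTR2UV: pointer to an unsigned integer
 * wide enough to hold it. */
uintptr_t toy_ptr2uv(const void *p) { return (uintptr_t)p; }

/* Hypothetical analogue of INT2PTR(void *, u): integer back to pointer. */
void *toy_int2ptr(uintptr_t u) { return (void *)u; }
```

C guarantees that a C<void *> converted to C<uintptr_t> and back compares equal to the original pointer, which is the property the Perl macros rely on as well.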
=head2 Exception Handling
There are a couple of macros to do very basic exception handling in XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.
-=for comment
-skip apidoc
-The following is an example and shouldn't be read as a real apidoc line
-
Source documentation is created by putting POD comments into the C
source, like this:
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.
+=head2 How do I pass a Perl string to a C library?
+
+A Perl string, conceptually, is an opaque sequence of code points.
+Many C libraries expect their inputs to be "classical" C strings, which are
+arrays of octets 1-255, terminated with a NUL byte. Your job when writing
+an interface between Perl and a C library is to define the mapping between
+Perl and that library.
+
+Generally speaking, C<SvPVbyte> and related macros suit this task well.
+These assume that your Perl string is a "byte string", i.e., is either
+raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.
+
+Alternatively, if your C library expects UTF-8 text, you can use
+C<SvPVutf8> and related macros. This has the same effect as encoding
+to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.
+
+Some C libraries may expect other encodings (e.g., UTF-16LE). To give
+Perl strings to such libraries
+you must either do that encoding in Perl then use C<SvPVbyte>, or
+use an intermediary C library to convert from however Perl stores the
+string to the desired encoding.
+
+Take care also that NULs in your Perl string don't confuse the C
+library. If possible, give the string's length to the C library; if that's
+not possible, consider rejecting strings that contain NUL bytes.
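If the library can only take a NUL-terminated string, one way to reject such input is to scan the C<(ptr, len)> pair that C<SvPVbyte> hands back. A minimal stand-alone sketch; C<has_embedded_nul> is a hypothetical helper, not part of Perl's API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if the len bytes at ptr contain a NUL, i.e. if a C
 * library that stops at the first NUL would see a truncated value. */
int has_embedded_nul(const char *ptr, size_t len)
{
    return memchr(ptr, '\0', len) != NULL;
}
```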
+
+=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?
+
+Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
+Perl can store these 3 characters in either of two ways:
+
+=over
+
+=item * bytes: 0x64 0x78 0x8c
+
+=item * UTF-8: 0x64 0x78 0xc2 0x8c
+
+=back
+
+Now let's say you convert C<$foo> to a C string thus:
+
+ STRLEN len;
+ char *str = SvPV(foo_sv, len);
+
+At this point C<str> could point to a 3-byte C string or a 4-byte one.
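The extra byte comes from UTF-8 itself: for code points below 256, bytes 0x00 to 0x7F stay single bytes, while each of 0x80 to 0xFF becomes two bytes. The stand-alone sketch below (hypothetical helper C<latin1_to_utf8>; ASCII platforms only, since EBCDIC Perls use UTF-EBCDIC instead) reproduces both storage forms of C<$foo> listed above:

```c
#include <assert.h>
#include <stddef.h>

/* Encodes code points 0..255 (one per input byte) as UTF-8. `out` must
 * have room for 2 * len bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                  /* ASCII: unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return n;
}
```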
+
+Generally speaking, we want C<str> to be the same regardless of how
+Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
+and C<SvPVutf8> solve that by giving predictable output: use
+C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
+if it expects UTF-8.
+
+If your C library happens to support both encodings, then C<SvPV>--always
+in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more
+efficient.
+
+B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
+in your tests to ensure consistent handling regardless of Perl's
+internal encoding.
+
=head2 How do I convert a string to UTF-8?
If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:
+=for apidoc Ayh||XOP
+
static XOP my_xop;
static OP *my_pp(pTHX);
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.
+=for apidoc Ayh||Perl_cpeep_t
+
=back
C<B::Generate> directly supports the creation of custom ops by name.
address to the temporaries stack.
Likewise, there is no public API to read values from the temporaries stack.
-Instead. the macros C<SAVETMPS> and C<FREETPMS> are used. The C<SAVETMPS>
+Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
macro establishes the base levels of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.
-It might seem possible eliminate slab reference counts altogether, by having
+It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster. But that doesn't work in those cases where ops need to