An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
+("PV" stands for "Pointer Value". You might think that it is misnamed
+because it is described as pointing only to strings. However, it is
+possible to have it point to other things. For example, inversion
+lists, used in regular expression data structures, are scalars, each
+consisting of an array of UVs which are accessed through PVs. But
+using it for non-strings requires care, as the underlying assumption of
+much of the internals is that PVs are just for strings. Often, for
+example, a trailing NUL is tacked on automatically. The non-string use
+is documented only in this paragraph.)
The seven routines are:
can create an empty SV with newSV(len). If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
-the SV has value undef.
+the SV has the undef value.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage
SV *s;
STRLEN len;
- char * ptr;
+ char *ptr;
ptr = SvPV(s, len);
foo(ptr, len);
These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.
-The are various ways in which the private and public flags may differ.
+There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.
-Once the AV has been created, the following operations are possible on AVs:
+Once the AV has been created, the following operations are possible on it:
void av_push(AV*, SV*);
SV* av_pop(AV*);
Here are some other functions:
- I32 av_len(AV*);
+ I32 av_top(AV*);
SV** av_fetch(AV*, I32 key, I32 lval);
SV** av_store(AV*, I32 key, SV* val);
-The C<av_len> function returns the highest index value in array (just
+The C<av_top> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.
+A few more:
+
void av_clear(AV*);
void av_undef(AV*);
void av_extend(AV*, I32 key);
HV* newHV();
-Once the HV has been created, the following operations are possible on HVs:
+Once the HV has been created, the following operations are possible on it:
SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.
-These two functions check if a hash table entry exists, and deletes it.
+The first of these two functions checks if a hash table entry exists, and the
+second deletes it.
bool hv_exists(HV*, const char* key, U32 klen);
SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
This returns NULL if the variable does not exist.
-The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:
+The hash algorithm is defined in the C<PERL_HASH> macro:
- hash = 0;
- while (klen--)
- hash = (hash * 33) + *key++;
- hash = hash + (hash >> 5); /* after 5.6 */
+ PERL_HASH(hash, key, klen)
-The last step was added in version 5.6 to improve distribution of
-lower bits in the resulting hash value.
+The exact implementation of this macro varies by architecture and version
+of perl, and the computed value may differ from one run of perl to the
+next, so it is only valid for the duration of a single perl process.
See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
The most useful types that will be returned are:
- SVt_IV Scalar
- SVt_NV Scalar
- SVt_PV Scalar
- SVt_RV Scalar
- SVt_PVAV Array
- SVt_PVHV Hash
- SVt_PVCV Code
- SVt_PVGV Glob (possibly a file handle)
- SVt_PVMG Blessed or Magical Scalar
+ < SVt_PVAV Scalar
+ SVt_PVAV Array
+ SVt_PVHV Hash
+ SVt_PVCV Code
+ SVt_PVGV Glob (possibly a file handle)
-See the F<sv.h> header file for more details.
+See L<perlapi/svtype> for more details.
=head2 Blessed References and Class Objects
The first call creates a mortal SV (with no value), the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.
-Because C<sv_newmortal> gives the new SV no value,it must normally be given one
+Because C<sv_newmortal> gives the new SV no value, it must normally be given one
via C<sv_setpv>, C<sv_setiv>, etc. :
SV *tmp = sv_newmortal();
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times. Thinking of "Mortalization"
as deferred C<SvREFCNT_dec> should help to minimize such problems.
-For example if you are passing an SV which you I<know> has high enough REFCNT
+For example, if you are passing an SV which you I<know> has a high enough REFCNT
to survive its use on the stack you need not do any mortalization.
If you are not sure then doing an C<SvREFCNT_inc> and C<sv_2mortal>, or
making a C<sv_mortalcopy> is safer.
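The "deferred C<SvREFCNT_dec>" view can be illustrated with a small toy model. This is only a sketch: C<ToySV>, C<toy_2mortal>, C<toy_free_tmps> and the other C<toy_> names are hypothetical stand-ins invented for illustration, not perl's real structures or API. It assumes a fixed-size list of pending decrements that is flushed when the enclosing scope ends.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV: only a reference count and a "freed" marker.
 * Hypothetical; perl's real SV is far richer. */
typedef struct {
    int refcnt;
    int freed;
} ToySV;

#define TOY_TMPS_MAX 16
static ToySV *toy_tmps[TOY_TMPS_MAX];   /* pending deferred decrements */
static size_t toy_tmps_ix = 0;

static void toy_refcnt_inc(ToySV *sv)
{
    sv->refcnt++;
}

static void toy_refcnt_dec(ToySV *sv)
{
    /* Decrementing below zero means one reference was released twice. */
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Mortalizing changes nothing now; it only records a decrement to run later. */
static ToySV *toy_2mortal(ToySV *sv)
{
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Analogue of leaving the scope: run every pending decrement. */
static void toy_free_tmps(void)
{
    while (toy_tmps_ix > 0)
        toy_refcnt_dec(toy_tmps[--toy_tmps_ix]);
}
```

In this model, a value with a single reference that is mortalized once is freed when the pending decrements run, while pairing C<toy_refcnt_inc> with C<toy_2mortal> leaves the original owner's reference intact. Mortalizing the same value twice without a matching increment queues two decrements for one reference, which the assertion in C<toy_refcnt_dec> catches.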
The current kinds of Magic Virtual Tables are:
- mg_type
- (old-style char and macro) MGVTBL Type of magic
- -------------------------- ------ -------------
- \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
- # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
- % PERL_MAGIC_rhash (none) extra data for restricted
- hashes
- . PERL_MAGIC_pos vtbl_pos pos() lvalue
- : PERL_MAGIC_symtab (none) extra data for symbol
- tables
- < PERL_MAGIC_backref vtbl_backref for weak ref data
- @ PERL_MAGIC_arylen_p (none) to move arylen out of
- XPVAV
- A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash
- a PERL_MAGIC_overload_elem vtbl_amagicelem %OVERLOAD hash element
- B PERL_MAGIC_bm vtbl_regexp Boyer-Moore
- (fast string search)
- c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table
- (AMT) on stash
- D PERL_MAGIC_regdata vtbl_regdata Regex match position data
- (@+ and @- vars)
- d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
- element
- E PERL_MAGIC_env vtbl_env %ENV hash
- e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
- f PERL_MAGIC_fm vtbl_regdata Formline ('compiled'
- format)
- G PERL_MAGIC_study vtbl_regdata study()ed string
- g PERL_MAGIC_regex_global vtbl_mglob m//g target
- H PERL_MAGIC_hints vtbl_hints %^H hash
- h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
- I PERL_MAGIC_isa vtbl_isa @ISA array
- i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
- k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
- L PERL_MAGIC_dbfile (none) Debugger %_<filename
- l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename
- element
- N PERL_MAGIC_shared (none) Shared between threads
- n PERL_MAGIC_shared_scalar (none) Shared between threads
- o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation
- P PERL_MAGIC_tied vtbl_pack Tied array or hash
- p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
- q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
- r PERL_MAGIC_qr vtbl_regexp precompiled qr// regex
- S PERL_MAGIC_sig (none) %SIG hash
- s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
- t PERL_MAGIC_taint vtbl_taint Taintedness
- U PERL_MAGIC_uvar vtbl_uvar Available for use by
- extensions
- u PERL_MAGIC_uvar_elem (none) Reserved for use by
- extensions
- V PERL_MAGIC_vstring (none) SV was vstring literal
- v PERL_MAGIC_vec vtbl_vec vec() lvalue
- w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information
- x PERL_MAGIC_substr vtbl_substr substr() lvalue
- y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
- variable / smart parameter
- vivification
- ] PERL_MAGIC_checkcall (none) inlining/mutation of call
- to this CV
- ~ PERL_MAGIC_ext (none) Available for use by
- extensions
+=for comment
+This table is generated by regen/mg_vtable.pl. Any changes made here
+will be lost.
+=for mg_vtable.pl begin
+
+ mg_type
+ (old-style char and macro) MGVTBL Type of magic
+ -------------------------- ------ -------------
+ \0 PERL_MAGIC_sv vtbl_sv Special scalar variable
+ # PERL_MAGIC_arylen vtbl_arylen Array length ($#ary)
+ % PERL_MAGIC_rhash (none) extra data for restricted
+ hashes
+ & PERL_MAGIC_proto (none) my sub prototype CV
+ . PERL_MAGIC_pos vtbl_pos pos() lvalue
+ : PERL_MAGIC_symtab (none) extra data for symbol
+ tables
+ < PERL_MAGIC_backref vtbl_backref for weak ref data
+ @ PERL_MAGIC_arylen_p (none) to move arylen out of XPVAV
+ B PERL_MAGIC_bm vtbl_regexp Boyer-Moore
+ (fast string search)
+ c PERL_MAGIC_overload_table vtbl_ovrld Holds overload table
+ (AMT) on stash
+ D PERL_MAGIC_regdata vtbl_regdata Regex match position data
+ (@+ and @- vars)
+ d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data
+ element
+ E PERL_MAGIC_env vtbl_env %ENV hash
+ e PERL_MAGIC_envelem vtbl_envelem %ENV hash element
+ f PERL_MAGIC_fm vtbl_regexp Formline
+ ('compiled' format)
+ g PERL_MAGIC_regex_global vtbl_mglob m//g target
+ H PERL_MAGIC_hints vtbl_hints %^H hash
+ h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element
+ I PERL_MAGIC_isa vtbl_isa @ISA array
+ i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element
+ k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue
+ L PERL_MAGIC_dbfile (none) Debugger %_<filename
+ l PERL_MAGIC_dbline vtbl_dbline Debugger %_<filename
+ element
+ N PERL_MAGIC_shared (none) Shared between threads
+ n PERL_MAGIC_shared_scalar (none) Shared between threads
+ o PERL_MAGIC_collxfrm vtbl_collxfrm Locale transformation
+ P PERL_MAGIC_tied vtbl_pack Tied array or hash
+ p PERL_MAGIC_tiedelem vtbl_packelem Tied array or hash element
+ q PERL_MAGIC_tiedscalar vtbl_packelem Tied scalar or handle
+ r PERL_MAGIC_qr vtbl_regexp precompiled qr// regex
+ S PERL_MAGIC_sig (none) %SIG hash
+ s PERL_MAGIC_sigelem vtbl_sigelem %SIG hash element
+ t PERL_MAGIC_taint vtbl_taint Taintedness
+ U PERL_MAGIC_uvar vtbl_uvar Available for use by
+ extensions
+ u PERL_MAGIC_uvar_elem (none) Reserved for use by
+ extensions
+ V PERL_MAGIC_vstring (none) SV was vstring literal
+ v PERL_MAGIC_vec vtbl_vec vec() lvalue
+ w PERL_MAGIC_utf8 vtbl_utf8 Cached UTF-8 information
+ x PERL_MAGIC_substr vtbl_substr substr() lvalue
+ y PERL_MAGIC_defelem vtbl_defelem Shadow "foreach" iterator
+ variable / smart parameter
+ vivification
+ ] PERL_MAGIC_checkcall vtbl_checkcall inlining/mutation of call
+ to this CV
+ ~ PERL_MAGIC_ext (none) Available for use by
+ extensions
+
+=for mg_vtable.pl end
When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
If the SV does not have that magical feature, C<NULL> is returned. If the
SV has multiple instances of that magical feature, the first one will be
returned. C<mg_findext> can be used to find a C<MAGIC> structure of an SV
-based on both it's magic type and it's magic virtual table:
+based on both its magic type and its magic virtual table:
MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);
XPUSHs(SV*)
-This macro automatically adjust the stack for you, if needed. Thus, you
+This macro automatically adjusts the stack for you, if needed. Thus, you
do not need to call C<EXTEND> to extend the stack.
Despite their suggestions in earlier versions of this document the macros
I32 call_sv(SV*, I32);
I32 call_pv(const char*, I32);
I32 call_method(const char*, I32);
- I32 call_argv(const char*, I32, register char**);
+ I32 call_argv(const char*, I32, char**);
The routine most often used is C<call_sv>. The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
=head2 PerlIO
-The most recent development releases of Perl has been experimenting with
+The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used. This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
S_incline(pTHX_ char *s)
STATIC becomes "static" in C, and may be #define'd to nothing in some
-configurations in future.
+configurations in the future.
A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:
Perl_sv_setiv(sv, num);
-You have to do nothing new in your extension to get this; since
+You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.
=item n
-This does not need a interpreter context, so the definition has no
+This does not need an interpreter context, so the definition has no
C<pTHX>, and it follows that callers don't use C<aTHX>. (See
L</Background and PERL_IMPLICIT_CONTEXT>.)
In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters. However, it can't
-do the work for you. On a character-by-character basis, C<is_utf8_char>
+do the work for you. On a character-by-character basis,
+C<is_utf8_char_buf>
will tell you whether the current character in a string is valid UTF-8.
=head2 How does UTF-8 represent Unicode characters?
whether the byte can be encoded as a single byte even in UTF-8):
U8 *utf;
+ U8 *utf_end; /* 1 beyond buffer pointed to by utf */
UV uv; /* Note: a UV, not a U8, not a char */
+ STRLEN len; /* length of character in bytes */
if (!UTF8_IS_INVARIANT(*utf))
/* Must treat this as UTF-8 */
- uv = utf8_to_uv(utf);
+ uv = utf8_to_uvchr_buf(utf, utf_end, &len);
else
/* OK to treat this character as a byte */
uv = *utf;
-You can also see in that example that we use C<utf8_to_uv> to get the
-value of the character; the inverse function C<uv_to_utf8> is available
+You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
+value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:
if (!UTF8_IS_INVARIANT(uv))
/* Must treat this as UTF8 */
- utf8 = uv_to_utf8(utf8, uv);
+ utf8 = uvchr_to_utf8(utf8, uv);
else
/* OK to treat this character as a byte */
*utf8++ = uv;
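To show why the decoding call takes an end-of-buffer pointer and reports a length, here is a minimal standalone sketch of a bounds-checked decoder and its inverse. The names C<sketch_is_invariant>, C<sketch_decode> and C<sketch_encode> are hypothetical and are not the perl API. The sketch assumes an ASCII platform, where the invariant characters are exactly those below 0x80, and it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: on ASCII platforms a code point below 0x80 is
 * stored as the same single byte whether or not the string is UTF-8. */
static int sketch_is_invariant(uint32_t c)
{
    return c < 0x80;
}

/* Decode one character starting at s, never reading at or past end.
 * On success, store the number of bytes consumed in *len and return the
 * code point. On malformed or truncated input, store 0 in *len and
 * return 0. */
static uint32_t sketch_decode(const uint8_t *s, const uint8_t *end,
                              size_t *len)
{
    size_t need, i;
    uint32_t cp;

    assert(s != NULL && end != NULL && len != NULL);
    *len = 0;
    if (s >= end)
        return 0;
    if (sketch_is_invariant(*s)) {
        *len = 1;
        return *s;
    }
    if ((*s & 0xE0) == 0xC0)      { need = 2; cp = *s & 0x1F; }
    else if ((*s & 0xF0) == 0xE0) { need = 3; cp = *s & 0x0F; }
    else if ((*s & 0xF8) == 0xF0) { need = 4; cp = *s & 0x07; }
    else
        return 0;                 /* stray continuation or invalid start */

    if ((size_t)(end - s) < need)
        return 0;                 /* character runs past the buffer */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;             /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *len = need;
    return cp;
}

/* Write cp (at most 0x10FFFF) at d and return the position just after
 * the last byte written. The caller must provide at least 4 bytes. */
static uint8_t *sketch_encode(uint8_t *d, uint32_t cp)
{
    assert(d != NULL && cp <= 0x10FFFF);
    if (sketch_is_invariant(cp)) {
        *d++ = (uint8_t)cp;
    } else if (cp < 0x800) {
        *d++ = (uint8_t)(0xC0 | (cp >> 6));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *d++ = (uint8_t)(0xE0 | (cp >> 12));
        *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    } else {
        *d++ = (uint8_t)(0xF0 | (cp >> 18));
        *d++ = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    }
    return d;
}
```

Without the end pointer, a decoder handed a buffer that stops in the middle of a multi-byte character would have to read past the buffer to find the missing continuation bytes. With it, the truncated case is reported as an error instead.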
=item *
There's no way to tell if a string is UTF-8 or not. You can tell if an SV
-is UTF-8 by looking at is C<SvUTF8> flag. Don't forget to set the flag if
+is UTF-8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if
something should be UTF-8. Treat the flag as part of the PV, even though
it's not - if you pass on the PV to somewhere, pass on the flag too.
=item *
-If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
+If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.
=item *
When writing a character C<uv> to a UTF-8 string, B<always> use
-C<uv_to_utf8>, unless C<UTF8_IS_INVARIANT(uv))> in which case
+C<uvchr_to_utf8>, unless C<UTF8_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.
=item *