An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
+("PV" stands for "Pointer Value". You might think that it is misnamed
+because it is described as pointing only to strings. However, it is
+possible to have it point to other things. For example, inversion
+lists, used in regular expression data structures, are scalars, each
+consisting of an array of UVs which are accessed through PVs. But,
+using it for non-strings requires care, as the underlying assumption of
+much of the internals is that PVs are just for strings. Often, for
+example, a trailing NUL is tacked on automatically. The non-string use
+is documented only in this paragraph.)
The seven routines are:
  e  PERL_MAGIC_envelem       vtbl_envelem    %ENV hash element
  f  PERL_MAGIC_fm            vtbl_regdata    Formline
                                              ('compiled' format)
- G  PERL_MAGIC_study         vtbl_regexp     study()ed string
  g  PERL_MAGIC_regex_global  vtbl_mglob      m//g target
  H  PERL_MAGIC_hints         vtbl_hints      %^H hash
  h  PERL_MAGIC_hintselem     vtbl_hintselem  %^H hash element
                                              extensions
  u  PERL_MAGIC_uvar_elem     (none)          Reserved for use by
                                              extensions
- V  PERL_MAGIC_vstring       vtbl_vstring    SV was vstring literal
+ V  PERL_MAGIC_vstring       (none)          SV was vstring literal
  v  PERL_MAGIC_vec           vtbl_vec        vec() lvalue
  w  PERL_MAGIC_utf8          vtbl_utf8       Cached UTF-8 information
  x  PERL_MAGIC_substr        vtbl_substr     substr() lvalue
In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters. However, it can't
-do the work for you. On a character-by-character basis, C<is_utf8_char>
+do the work for you. On a character-by-character basis,
+C<is_utf8_char_buf>
will tell you whether the current character in a string is valid UTF-8.
=head2 How does UTF-8 represent Unicode characters?