X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/bc028b6b7f0f25fba45e10fa46e3fe91dbe9a76d..289d61c2fe0a9a435b7e5828a3fbe9df5967f4d9:/pod/perlguts.pod diff --git a/pod/perlguts.pod b/pod/perlguts.pod index bbf8742..afc69ae 100644 --- a/pod/perlguts.pod +++ b/pod/perlguts.pod @@ -191,7 +191,7 @@ have "magic". See L later in this document. If you know the name of a scalar variable, you can get a pointer to its SV by using the following: - SV* get_sv("package::varname", FALSE); + SV* get_sv("package::varname", 0); This returns NULL if the variable does not exist. @@ -367,7 +367,7 @@ then nothing is done. If you know the name of an array variable, you can get a pointer to its AV by using the following: - AV* get_av("package::varname", FALSE); + AV* get_av("package::varname", 0); This returns NULL if the variable does not exist. @@ -442,7 +442,7 @@ specified below. If you know the name of a hash variable, you can get a pointer to its HV by using the following: - HV* get_hv("package::varname", FALSE); + HV* get_hv("package::varname", 0); This returns NULL if the variable does not exist. @@ -600,7 +600,7 @@ The most useful types that will be returned are: SVt_PVGV Glob (possible a file handle) SVt_PVMG Blessed or Magical Scalar - See the sv.h header file for more details. +See the F header file for more details. =head2 Blessed References and Class Objects @@ -667,9 +667,9 @@ to write: To create a new Perl variable with an undef value which can be accessed from your Perl script, use the following routines, depending on the variable type. - SV* get_sv("package::varname", TRUE); - AV* get_av("package::varname", TRUE); - HV* get_hv("package::varname", TRUE); + SV* get_sv("package::varname", GV_ADD); + AV* get_av("package::varname", GV_ADD); + HV* get_hv("package::varname", GV_ADD); Notice the use of TRUE as the second parameter. The new variable can now be set, using the routines appropriate to the data type. @@ -814,12 +814,12 @@ in the stash C in C's stash. 
To get the stash pointer for a particular package, use the function: - HV* gv_stashpv(const char* name, I32 create) - HV* gv_stashsv(SV*, I32 create) + HV* gv_stashpv(const char* name, I32 flags) + HV* gv_stashsv(SV*, I32 flags) The first function takes a literal string, the second uses the string stored in the SV. Remember that a stash is just a hash table, so you get back an -C<HV*>. The C<create> flag will create a new package if it is set. +C<HV*>. The C<flags> flag will create a new package if it is set to GV_ADD. The name that C wants is the name of the package whose symbol table you want. The default package is called C<main>
. If you have multiply nested @@ -878,7 +878,7 @@ following code: extern int dberror; extern char *dberror_list; - SV* sv = get_sv("dberror", TRUE); + SV* sv = get_sv("dberror", GV_ADD); sv_setiv(sv, (IV) dberror); sv_setpv(sv, dberror_list[dberror]); SvIOK_on(sv); @@ -901,9 +901,9 @@ linked list of C's, typedef'ed to C. U16 mg_private; char mg_type; U8 mg_flags; + I32 mg_len; SV* mg_obj; char* mg_ptr; - I32 mg_len; }; Note this is current as of patchlevel 0, and could change at any time. @@ -979,27 +979,27 @@ routine types: int (*svt_clear)(SV* sv, MAGIC* mg); int (*svt_free)(SV* sv, MAGIC* mg); - int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv, const char *name, int namlen); + int (*svt_copy)(SV *sv, MAGIC* mg, SV *nsv, const char *name, I32 namlen); int (*svt_dup)(MAGIC *mg, CLONE_PARAMS *param); int (*svt_local)(SV *nsv, MAGIC *mg); This MGVTBL structure is set at compile-time in F and there are -currently 19 types (or 21 with overloading turned on). These different -structures contain pointers to various routines that perform additional -actions depending on which function is being called. +currently 32 types. These different structures contain pointers to various +routines that perform additional actions depending on which function is +being called. Function pointer Action taken ---------------- ------------ svt_get Do something before the value of the SV is retrieved. svt_set Do something after the SV is assigned a value. svt_len Report on the SV's length. - svt_clear Clear something the SV represents. + svt_clear Clear something the SV represents. svt_free Free any extra storage associated with the SV. 
- svt_copy copy tied variable magic to a tied element - svt_dup duplicate a magic structure during thread cloning - svt_local copy magic to local value during 'local' + svt_copy copy tied variable magic to a tied element + svt_dup duplicate a magic structure during thread cloning + svt_local copy magic to local value during 'local' For instance, the MGVTBL structure called C (which corresponds to an C of C) contains: @@ -1022,53 +1022,52 @@ to change. The current kinds of Magic Virtual Tables are: mg_type - (old-style char and macro) MGVTBL Type of magic - -------------------------- ------ ---------------------------- - \0 PERL_MAGIC_sv vtbl_sv Special scalar variable - A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash + (old-style char and macro) MGVTBL Type of magic + -------------------------- ------ ------------- + \0 PERL_MAGIC_sv vtbl_sv Special scalar variable + A PERL_MAGIC_overload vtbl_amagic %OVERLOAD hash a PERL_MAGIC_overload_elem vtbl_amagicelem %OVERLOAD hash element - c PERL_MAGIC_overload_table (none) Holds overload table (AMT) - on stash - B PERL_MAGIC_bm vtbl_bm Boyer-Moore (fast string search) - D PERL_MAGIC_regdata vtbl_regdata Regex match position data - (@+ and @- vars) - d PERL_MAGIC_regdatum vtbl_regdatum Regex match position data - element - E PERL_MAGIC_env vtbl_env %ENV hash - e PERL_MAGIC_envelem vtbl_envelem %ENV hash element - f PERL_MAGIC_fm vtbl_fm Formline ('compiled' format) - g PERL_MAGIC_regex_global vtbl_mglob m//g target / study()ed string - H PERL_MAGIC_hints vtbl_sig %^H hash - h PERL_MAGIC_hintselem vtbl_hintselem %^H hash element - I PERL_MAGIC_isa vtbl_isa @ISA array - i PERL_MAGIC_isaelem vtbl_isaelem @ISA array element - k PERL_MAGIC_nkeys vtbl_nkeys scalar(keys()) lvalue - L PERL_MAGIC_dbfile (none) Debugger %_ is one of a number of macros (in perl.h) that hide the +C is one of a number of macros (in F) that hide the details of the interpreter's context. THX stands for "thread", "this", or "thingy", as the case may be. 
(And no, George Lucas is not involved. :-) The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, @@ -2029,7 +2028,7 @@ built with PERL_IMPLICIT_CONTEXT enabled. There are three ways to do this. First, the easy but inefficient way, which is also the default, in order to maintain source compatibility -with extensions: whenever XSUB.h is #included, it redefines the aTHX +with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX and aTHX_ macros to call a function that will return the context. Thus, something like: @@ -2056,9 +2055,9 @@ your Foo.xs: #include "perl.h" #include "XSUB.h" - static my_private_function(int arg1, int arg2); + STATIC void my_private_function(int arg1, int arg2); - static SV * + STATIC void my_private_function(int arg1, int arg2) { dTHX; /* fetch context */ @@ -2096,9 +2095,9 @@ the Perl guts: #include "XSUB.h" /* pTHX_ only needed for functions that call Perl API */ - static my_private_function(pTHX_ int arg1, int arg2); + STATIC void my_private_function(pTHX_ int arg1, int arg2); - static SV * + STATIC void my_private_function(pTHX_ int arg1, int arg2) { /* dTHX; not needed here, because THX is an argument */ @@ -2166,7 +2165,7 @@ This allows the ability to provide an extra pointer (called the "host" environment) for all the system calls. This makes it possible for all the system stuff to maintain their own state, broken down into seven C structures. These are thin wrappers around the usual system -calls (see win32/perllib.c) for the default perl executable, but for a +calls (see F<win32/perllib.c>) for the default perl executable, but for a more ambitious host (like the one that would do fork() emulation) all the extra work needed to pretend that different interpreters are actually different "processes", would be done here. @@ -2431,8 +2430,8 @@ To fix this, some people formed Unicode, Inc. and produced a new character set containing all the characters you can possibly think of and more. There are several ways of representing these characters, and the one Perl uses is called UTF-8.
UTF-8 uses -a variable number of bytes to represent a character, instead of just -one. You can learn more about Unicode at http://www.unicode.org/ +a variable number of bytes to represent a character. You can learn more +about Unicode and Perl's Unicode model in L. =head2 How can I recognise a UTF-8 string? @@ -2443,16 +2442,17 @@ C. Unfortunately, the non-Unicode string C has that byte sequence as well. So you can't tell just by looking - this is what makes Unicode input an interesting problem. -The API function C can help; it'll tell you if a string -contains only valid UTF-8 characters. However, it can't do the work for -you. On a character-by-character basis, C will tell you -whether the current character in a string is valid UTF-8. +In general, you either have to know what you're dealing with, or you +have to guess. The API function C can help; it'll tell +you if a string contains only valid UTF-8 characters. However, it can't +do the work for you. On a character-by-character basis, C +will tell you whether the current character in a string is valid UTF-8. =head2 How does UTF-8 represent Unicode characters? As mentioned above, UTF-8 uses a variable number of bytes to store a -character. Characters with values 1...128 are stored in one byte, just -like good ol' ASCII. Character 129 is stored as C; this +character. Characters with values 0...127 are stored in one byte, just +like good ol' ASCII. Character 128 is stored as C; this continues up to character 191, which is C. Now we've run out of bits (191 is binary C<10111111>) so we move on; 192 is C. And so it goes on, moving to three bytes at character 2048. @@ -2509,9 +2509,11 @@ So don't do that! =head2 How does Perl store UTF-8 strings? Currently, Perl deals with Unicode strings and non-Unicode strings -slightly differently. If a string has been identified as being UTF-8 -encoded, Perl will set a flag in the SV, C. You can check and -manipulate this flag with the following macros: +slightly differently. 
A flag in the SV, C, indicates that the +string is internally encoded as UTF-8. Without it, the byte value is the +codepoint number and vice versa (in other words, the string is encoded +as iso-8859-1). You can check and manipulate this flag with the +following macros: SvUTF8(sv) SvUTF8_on(sv) @@ -2523,7 +2525,7 @@ C, C and other string handling operations will have undesirable results. The problem comes when you have, for instance, a string that isn't -flagged is UTF-8, and contains a byte sequence that could be UTF-8 - +flagged as UTF-8, and contains a byte sequence that could be UTF-8 - especially when combining non-UTF-8 and UTF-8 strings. Never forget that the C flag is separate to the PV value; you @@ -2541,7 +2543,7 @@ manipulating SVs. More specifically, you cannot expect to do this: The C string does not tell you the whole story, and you can't copy or reconstruct an SV just by copying the string value. Check if the -old SV has the UTF-8 flag set, and act accordingly: +old SV has the UTF8 flag set, and act accordingly: p = SvPV(sv, len); frobnicate(p); @@ -2554,14 +2556,14 @@ not it's dealing with UTF-8 data, so that it can handle the string appropriately. Since just passing an SV to an XS function and copying the data of -the SV is not enough to copy the UTF-8 flags, even less right is just +the SV is not enough to copy the UTF8 flags, even less right is just passing a C to an XS function. =head2 How do I convert a string to UTF-8? -If you're mixing UTF-8 and non-UTF-8 strings, you might find it necessary -to upgrade one of the strings to UTF-8. If you've got an SV, the easiest -way to do this is: +If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade +one of the strings to UTF-8. 
If you've got an SV, the easiest way to do +this is: sv_utf8_upgrade(sv); @@ -2572,7 +2574,7 @@ However, you must not do this, for example: If you do this in a binary operator, you will actually change one of the strings that came into the operator, and, while it shouldn't be noticeable -by the end user, it can cause problems. +by the end user, it can cause problems in deficient code. Instead, C will give you a UTF-8-encoded B of its string argument. This is useful for having the data available for @@ -2608,9 +2610,7 @@ you can use C<*s = uv>. =item * Mixing UTF-8 and non-UTF-8 strings is tricky. Use C to get -a new string which is UTF-8 encoded. There are tricks you can use to -delay deciding whether you need to use a UTF-8 string until you get to a -high character - C is one of those. +a new string which is UTF-8 encoded, and then combine them. =back
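
The byte values quoted in the UTF-8 hunks above (characters 0...127 in one byte, 128 as the first two-byte character, three bytes starting at 2048) can be checked with a small stand-alone C sketch. It is illustrative only and uses nothing from the Perl headers: C<utf8_encode> and C<latin1_to_utf8> are hypothetical helper names, not Perl API functions, and surrogate code points are not rejected. Real XS code would use the API routines the patched text names, such as C<sv_utf8_upgrade>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Perl API): encode one code point
 * as UTF-8 into out[], which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if cp is above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 2048..65535: three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {             /* up to U+10FFFF: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: re-encode n ISO-8859-1 bytes as UTF-8.
 * Each input byte is its own code point, so bytes above 127 grow to
 * two bytes. out[] must have room for 2 * n bytes.
 * Returns the length of the UTF-8 output. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t i, len = 0;
    for (i = 0; i < n; i++)
        len += utf8_encode(in[i], out + len);
    return len;
}
```

In this sketch, code point 128 encodes to the two bytes 0xC2 0x80 (194, 128), 191 to 0xC2 0xBF, 192 to 0xC3 0x80, and 2048 is the first code point that needs three bytes. The second helper shows why turning on the UTF8 flag over an ISO-8859-1 buffer is not enough: every byte above 127 has to be rewritten as two bytes, which is the kind of work an upgrade has to do.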