perlexperiment: document the private_use experiment

diff --git a/pod/perlguts.pod b/pod/perlguts.pod

index 96a4098..f1fd7da 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -21,16 +21,52 @@ Perl has three typedefs that handle Perl's three main data types:
  
  Each typedef has specific routines that manipulate the various data types.
  
+=for apidoc_section $AV
+=for apidoc Ayh||AV
+=for apidoc_section $HV
+=for apidoc Ayh||HV
+=for apidoc_section $SV
+=for apidoc Ayh||SV
+
  =head2 What is an "IV"?
  
  Perl uses a special typedef IV which is a simple signed integer type that is
  guaranteed to be large enough to hold a pointer (as well as an integer).
  Additionally, there is the UV, which is simply an unsigned IV.
  
-Perl also uses two special typedefs, I32 and I16, which will always be at
-least 32-bits and 16-bits long, respectively.  (Again, there are U32 and U16,
-as well.)  They will usually be exactly 32 and 16 bits long, but on Crays
-they will both be 64 bits.
+Perl also uses several special typedefs to declare variables to hold
+integers of (at least) a given size.
+Use I8, I16, I32, and I64 to declare a signed integer variable which has
+at least as many bits as the number in its name.  These all evaluate to
+the native C type that is closest to the given number of bits, but no
+smaller than that number.  For example, on many platforms, a C<short> is
+16 bits long, and if so, I16 will evaluate to a C<short>.  But on
+platforms where a C<short> isn't exactly 16 bits, Perl will use the
+smallest type that contains 16 bits or more.
+
+Use U8, U16, U32, and U64 to declare the corresponding unsigned integer
+types.
+
+If the platform doesn't support 64-bit integers, both I64 and U64 will
+be undefined.  Use IV and UV to declare the largest practicable integer
+types, and C<L<perlapi/WIDEST_UTYPE>> for the widest unsigned type
+available, which may not be usable in all circumstances.
+
+A numeric constant can be specified with L<perlapi/C<INT16_C>>,
+L<perlapi/C<UINTMAX_C>>, and similar.
+
+=for apidoc_section $integer
+=for apidoc Ayh||I8
+=for apidoc_item ||I16
+=for apidoc_item ||I32
+=for apidoc_item ||I64
+=for apidoc_item ||IV
+
+=for apidoc Ayh||U8
+=for apidoc_item ||U16
+=for apidoc_item ||U32
+=for apidoc_item ||U64
+=for apidoc_item ||UV
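
As an illustration outside the patch itself, the "at least N bits" guarantee described above can be sketched in plain C99.  The C<int_leastN_t> types and the C<INT16_C> macro come from C<< <stdint.h> >>; the C<my_*> names and C<value_bits> are hypothetical stand-ins, and this is an analogy rather than how Perl actually defines I16, U16, and friends.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for I16/U16/I32/U32: the smallest native types
 * with at least the named number of bits (an analogy, not Perl's code). */
typedef int_least16_t  my_i16;
typedef uint_least16_t my_u16;
typedef int_least32_t  my_i32;
typedef uint_least32_t my_u32;

/* A constant written with the C99 macro for 16-bit-or-wider integers. */
static const my_i16 MY_I16_MAX = INT16_C(32767);

/* Count the value bits of an unsigned type, given its maximum value. */
static unsigned value_bits(uintmax_t max)
{
    unsigned bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}
```

Passing C<(my_u16)-1> to C<value_bits> yields the width of the chosen type, which may be larger than 16 on platforms without an exact 16-bit type.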
  
  =head2 Working with SVs
  
@@ -46,6 +82,8 @@ much of the internals is that PVs are just for strings.  Often, for
  example, a trailing C<NUL> is tacked on automatically.  The non-string use
  is documented only in this paragraph.)
  
+=for apidoc Ayh||NV
+
  The seven routines are:
  
      SV*  newSViv(IV);
@@ -60,6 +98,8 @@ C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
  F<config.h>) guaranteed to be large enough to represent the size of
  any string that perl can handle.
  
+=for apidoc Ayh||STRLEN
+
  In the unlikely case of a SV requiring more complex initialization, you
  can create an empty SV with newSV(len).  If C<len> is 0 an empty SV of
  type NULL is returned, else an SV of type PV is returned with len + 1 (for
@@ -113,27 +153,74 @@ Perl's own functions typically add a trailing C<NUL> for this reason.
  Nevertheless, you should be very careful when you pass a string stored
  in an SV to a C function or system call.
  
-To access the actual value that an SV points to, you can use the macros:
+To access the actual value that an SV points to, Perl's API exposes
+several macros that coerce the actual scalar type into an IV, UV, double,
+or string:
+
+=over
+
+=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)
+
+=item * C<SvNV(SV*)> (C<double>)
+
+=item * Strings are a bit complicated:
+
+=over
+
+=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>
+
+If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.
+
+This is suitable for Perl strings that represent bytes.
+
+=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>
+
+If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.
+
+This is suitable for Perl strings that represent characters.
  
-    SvIV(SV*)
-    SvUV(SV*)
-    SvNV(SV*)
-    SvPV(SV*, STRLEN len)
-    SvPV_nolen(SV*)
+B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
+which means that if the SV contains non-Unicode code points (e.g.,
+0x110000), then the result may contain extensions over valid UTF-8.
+See L<perlapi/is_strict_utf8_string> for some functions Perl provides
+to check the UTF-8 validity of these macros' return values.
  
-which will automatically coerce the actual scalar type into an IV, UV, double,
-or string.
+=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
+to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
+string
+is C<"\xff\xff">, then depending on the SV's internal encoding you might get
+back a 2-byte B<OR> a 4-byte C<char*>.
+Moreover, if it's the 4-byte string, that could come from either Perl
+C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
+as raw octets. To differentiate between these you B<MUST> look up the
+SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
+is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
+off).
  
-In the C<SvPV> macro, the length of the string returned is placed into the
-variable C<len> (this is a macro, so you do I<not> use C<&len>).  If you do
-not care what the length of the data is, use the C<SvPV_nolen> macro.
-Historically the C<SvPV> macro with the global variable C<PL_na> has been
-used in this case.  But that can be quite inefficient because C<PL_na> must
+B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
+similarly-named macros I<without> looking up the SV's UTF8 bit is
+almost certainly a bug if non-ASCII input is allowed.
+
+When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
+here as for C<SvPVutf8>.
+
+=back
+
+(See L</How do I pass a Perl string to a C library?> for more details.)
+
+In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
+is placed into the
+variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
+not care what the length of the data is, use C<SvPVbyte_nolen>,
+C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
+The global variable C<PL_na> can also be given to
+C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
+in this case.  But that can be quite inefficient because C<PL_na> must
  be accessed in thread-local storage in threaded Perl.  In any case, remember
  that Perl allows arbitrary strings of data that may both contain NULs and
  might not be terminated by a C<NUL>.
  
-Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
+Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
  len);>.  It might work with your
  compiler, but it won't work for everyone.
  Break this sort of statement up into separate assignments:
@@ -141,9 +228,11 @@ Break this sort of statement up into separate assignments:
      SV *s;
      STRLEN len;
      char *ptr;
-    ptr = SvPV(s, len);
+    ptr = SvPVbyte(s, len);
      foo(ptr, len);
  
+=back
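
The reason for splitting the call is that C leaves the order in which function arguments are evaluated unspecified, so C<len> might be read before the macro has assigned it.  A minimal sketch of the safe pattern follows, outside the patch itself.  C<struct fake_sv>, C<FAKE_PV>, C<sum_bytes>, and C<call_safely> are hypothetical stand-ins so that the example compiles without Perl's headers; C<FAKE_PV> only mimics the shape of C<SvPVbyte>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV: a buffer plus its current length. */
struct fake_sv {
    const char *buf;
    size_t      cur;
};

/* Mimics the shape of SvPVbyte(sv, len): stores the length into len as a
 * side effect and yields the buffer pointer. */
#define FAKE_PV(sv, len) ((len) = (sv)->cur, (sv)->buf)

/* Stand-in for foo(ptr, len): sums the first len bytes. */
static unsigned long sum_bytes(const char *ptr, size_t len)
{
    unsigned long total = 0;
    assert(ptr != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        total += (unsigned char)ptr[i];
    return total;
}

static unsigned long call_safely(const struct fake_sv *sv)
{
    size_t len;
    const char *ptr = FAKE_PV(sv, len); /* len is assigned here ...      */
    return sum_bytes(ptr, len);         /* ... and only read afterwards. */
}
```

Writing C<sum_bytes(FAKE_PV(sv, len), len)> instead would let a compiler read C<len> before the assignment inside the macro has happened.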
+
  If you want to know if the scalar value is TRUE, you can use:
  
      SvTRUE(SV*)
@@ -160,7 +249,7 @@ add space for the trailing C<NUL> byte (perl's own string functions typically do
  C<SvGROW(sv, len + 1)>).
  
  If you want to write to an existing SV's buffer and set its value to a
-string, use SvPV_force() or one of its variants to force the SV to be
+string, use SvPVbyte_force() or one of its variants to force the SV to be
  a PV.  This will remove any of various types of non-stringness from
  the SV while preserving the content of the SV in the PV.  This can be
  used, for example, to append data from an API function to a buffer
@@ -190,7 +279,7 @@ copying with:
      s = SvGROW(sv, needlen + 1);
      /* something that modifies up to needlen bytes at s, but modifies
         newlen bytes
-         eg. newlen = read(fd, s. needlen);
+         eg. newlen = read(fd, s, needlen);
       */
      s[newlen] = '\0';
      SvCUR_set(sv, newlen);
@@ -513,6 +602,8 @@ overhead).  The key is a string pointer; the value is an C<SV*>.  However,
  once you have an C<HE*>, to get the actual key and value, use the routines
  specified below.
  
+=for apidoc Ayh||HE
+
      I32    hv_iterinit(HV*);
              /* Prepares starting point to traverse hash table */
      HE*    hv_iternext(HV*);
@@ -548,7 +639,7 @@ is only valid for the duration of a single perl process.
  See L</Understanding the Magic of Tied Hashes and Arrays> for more
  information on how to use the hash access functions on tied hashes.
  
-=for apidoc_section HV Handling
+=for apidoc_section $HV
  =for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen
  
  =head2 Hash API Extensions
@@ -981,6 +1072,63 @@ as any other SV.
  
  For more information on references and blessings, consult L<perlref>.
  
+=head2 I/O Handles
+
+Like AVs and HVs, IO objects are another type of non-scalar SV which
+may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
+from opendir().
+
+You can create a new IO object:
+
+    IO*  newIO();
+
+Unlike other SVs, a new IO object is automatically blessed into the
+L<IO::File> class.
+
+The IO object contains an input and output PerlIO handle:
+
+  PerlIO *IoIFP(IO *io);
+  PerlIO *IoOFP(IO *io);
+
+Typically if the IO object has been opened on a file, the input handle
+is always present, but the output handle is only present if the file
+is open for output.  For a file, if both are present they will be the
+same PerlIO object.
+
+Distinct input and output PerlIO objects are created for sockets and
+character devices.
+
+The IO object also contains other data associated with Perl I/O
+handles:
+
+  IV IoLINES(io);                /* $. */
+  IV IoPAGE(io);                 /* $% */
+  IV IoPAGE_LEN(io);             /* $= */
+  IV IoLINES_LEFT(io);           /* $- */
+  char *IoTOP_NAME(io);          /* $^ */
+  GV *IoTOP_GV(io);              /* $^ */
+  char *IoFMT_NAME(io);          /* $~ */
+  GV *IoFMT_GV(io);              /* $~ */
+  char *IoBOTTOM_NAME(io);
+  GV *IoBOTTOM_GV(io);
+  char IoTYPE(io);
+  U8 IoFLAGS(io);
+
+Most of these are involved with L<formats|perlform>.
+
+IoFLAGS() may contain a combination of flags, the most interesting of
+which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
+settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.
+
+The IO object may also contain a directory handle:
+
+  DIR *IoDIRP(io);
+
+suitable for use with PerlDir_read() etc.
+
+All of these accessor macros are lvalues; there are no distinct
+C<_set()> macros to modify the members of the IO object.
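
To show what "accessor macros are lvalues" means in practice, here is a minimal sketch outside the patch itself.  C<struct toy_io>, C<ToyLINES>, C<ToyPAGE_LEN>, and C<toy_reset> are hypothetical and do not reflect the real layout of Perl's IO structure; the point is only that a macro expanding to the member itself can appear on the left of an assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an IO-like struct; not Perl's real layout. */
struct toy_io {
    long lines;
    long page_len;
};

/* Accessors that expand to the member itself, so they are lvalues. */
#define ToyLINES(io)    ((io)->lines)
#define ToyPAGE_LEN(io) ((io)->page_len)

/* Reset the line counter and set a page length by plain assignment;
 * no separate _set() macros are involved. */
static void toy_reset(struct toy_io *io, long page_len)
{
    assert(io != NULL);
    ToyLINES(io) = 0;
    ToyPAGE_LEN(io) = page_len;
}
```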
+
  =head2 Double-Typed SVs
  
  Scalar variables normally contain only one type of value, an integer,
@@ -1163,6 +1311,8 @@ C<MGVTBL>, which is a structure of function pointers and stands for
  "Magic Virtual Table" to handle the various operations that might be
  applied to that variable.
  
+=for apidoc Ayh||MGVTBL
+
  The C<MGVTBL> has five (or sometimes eight) pointers to the following
  routine types:
  
@@ -1289,98 +1439,7 @@ will be lost.
                                               extensions
  
  
-=for apidoc Amnh||PERL_MAGIC_sv
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-=for apidoc_item ||PERL_MAGIC_=for mg_vtable.pl begin
-
-
-=for mg_vtable.pl end
-
-=for apidoc_section Magic
-=for apidoc  Amnh||PERL_MAGIC_arylen
+=for apidoc AmnhU||PERL_MAGIC_arylen
  =for apidoc_item ||PERL_MAGIC_arylen_p
  =for apidoc_item ||PERL_MAGIC_backref
  =for apidoc_item ||PERL_MAGIC_bm
@@ -1413,6 +1472,7 @@ will be lost.
  =for apidoc_item ||PERL_MAGIC_sig
  =for apidoc_item ||PERL_MAGIC_sigelem
  =for apidoc_item ||PERL_MAGIC_substr
+=for apidoc_item ||PERL_MAGIC_sv
  =for apidoc_item ||PERL_MAGIC_symtab
  =for apidoc_item ||PERL_MAGIC_taint
  =for apidoc_item ||PERL_MAGIC_tied
@@ -1424,6 +1484,8 @@ will be lost.
  =for apidoc_item ||PERL_MAGIC_vec
  =for apidoc_item ||PERL_MAGIC_vstring
  
+=for mg_vtable.pl end
+
  When an uppercase and lowercase letter both exist in the table, then the
  uppercase letter is typically used to represent some kind of composite type
  (a list or a hash), and the lowercase letter is used to represent an element
@@ -1673,14 +1735,14 @@ Inside such a I<pseudo-block> the following service is available:
  These macros arrange things to restore the value of integer variable
  C<i> at the end of the enclosing I<pseudo-block>.
  
-=for apidoc_section Stack Manipulation Macros
+=for apidoc_section $stack
  =for apidoc Amh||SAVEINT|int i
  =for apidoc Amh||SAVEIV|IV i
  =for apidoc Amh||SAVEI32|I32 i
  =for apidoc Amh||SAVELONG|long i
  =for apidoc Amh||SAVEI8|I8 i
  =for apidoc Amh||SAVEI16|I16 i
-=for apidoc Amh||SAVEBOOL|int i
+=for apidoc Amh||SAVEBOOL|bool i
  
  =item C<SAVESPTR(s)>
  
@@ -1750,6 +1812,7 @@ this:
  At the end of I<pseudo-block> the function C<f> is called with the
  only argument C<p>.
  
+=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
  =for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p
  
  =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
@@ -1757,7 +1820,8 @@ only argument C<p>.
  At the end of I<pseudo-block> the function C<f> is called with the
  implicit context argument (if any), and C<p>.
  
-for foo AMh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p
+=for apidoc Ayh||DESTRUCTORFUNC_t
+=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p
  
  =item C<SAVESTACK_POS()>
  
@@ -2210,6 +2274,11 @@ C<op_last>.  The children in between can be found by iteratively
  following the C<OpSIBLING> pointer from the first child to the last (but
  see below).
  
+=for apidoc Ayh||OP
+=for apidoc Ayh||BINOP
+=for apidoc Ayh||LISTOP
+=for apidoc Ayh||UNOP
+
  There are also some other op types: a C<PMOP> holds a regular expression,
  and has no children, and a C<LOOP> may or may not have children.  If the
  C<op_children> field is non-zero, it behaves like a C<LISTOP>.  To
@@ -2217,6 +2286,9 @@ complicate matters, if a C<UNOP> is actually a C<null> op after
  optimization (see L</Compile pass 2: context propagation>) it will still
  have children in accordance with its former type.
  
+=for apidoc Ayh||LOOP
+=for apidoc Ayh||PMOP
+
  Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
  or more children, but it doesn't have an C<op_last> field: so you have to
  follow C<op_first> and then the C<OpSIBLING> chain itself to find the
@@ -2226,6 +2298,8 @@ execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
  that in general, C<op_other> may not point to any of the direct children
  of the C<LOGOP>.
  
+=for apidoc Ayh||LOGOP
+
  Starting in version 5.21.2, perls built with the experimental
  define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
  C<op_moresib>.  When not set, this indicates that this is the last op in an
@@ -2336,6 +2410,8 @@ per-subroutine or recursive stage, like this:
          prev_rpeepp = PL_rpeepp;
          PL_rpeepp = my_rpeep;
  
+=for apidoc Ayh||peep_t
+
  =head2 Pluggable runops
  
  The compile tree is executed in a runops function.  There are two runops
@@ -2350,6 +2426,8 @@ file, add the line:
  
    PL_runops = my_runops;
  
+=for apidoc Amnh|runops_proc_t|PL_runops
+
  This function should be as efficient as possible to keep your programs
  running as fast as possible.
  
@@ -2369,6 +2447,8 @@ this:
  This will arrange to have C<my_start_hook> called at the start of
  compiling every lexical scope.  The available hooks are:
  
+=for apidoc Ayh||BHK
+
  =over 4
  
  =item C<void bhk_start(pTHX_ int full)>
@@ -2521,6 +2601,9 @@ function used within the Perl guts:
  STATIC becomes "static" in C, and may be #define'd to nothing in some
  configurations in the future.
  
+=for apidoc_section $directives
+=for apidoc Ayh||STATIC
+
  A public function (i.e. part of the internal API, but not necessarily
  sanctioned for use in extensions) begins like this:
  
@@ -2534,7 +2617,7 @@ The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
  or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
  their variants.
  
-=for apidoc_section Concurrency
+=for apidoc_section $concurrency
  =for apidoc Amnh||aTHX
  =for apidoc Amnh||aTHX_
  =for apidoc Amnh||dTHX
@@ -2790,16 +2873,6 @@ following macros for portability
          NVff            NV %f-like
          NVgf            NV %g-like
  
-=for apidoc_section Formats
-=for apidoc Amnh||IVdf
-=for apidoc Amnh||UVuf
-=for apidoc Amnh||UVof
-=for apidoc Amnh||UVxf
-=for apidoc Amnh||UVXf
-=for apidoc Amnh||NVef
-=for apidoc Amnh||NVff
-=for apidoc Amnh||NVgf
-
  These will take care of 64-bit integers and long doubles.
  For example:
  
@@ -2819,7 +2892,7 @@ with PTR2UV().
  
  The contents of SVs may be printed using the C<SVf> format, like so:
  
- Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SvfARG(err_msg))
+ Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg))
  
  where C<err_msg> is an SV.
  
@@ -2879,7 +2952,7 @@ UTF-8 in order to get good results and avoid Wide-character warnings.
  One way to do this for typical filehandles is to invoke perl with the
  C<-C>> parameter.  (See L<perlrun/-C [numberE<sol>list]>.
  
-=for apidoc_section Formats
+=for apidoc_section $formats
  =for apidoc Amnh||UTF8f
  =for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str
  
@@ -2914,11 +2987,11 @@ use the follow macros to do it right.
          PTR2NV(pointer)
          INT2PTR(pointertotype, integer)
  
-=for apidoc_section Casting
-=for apidoc Amh|void *|INT2PTR|type|int value
-=for apidoc Amh|UV|PTR2UV|void *
-=for apidoc Amh|IV|PTR2IV|void *
-=for apidoc Amh|NV|PTR2NV|void *
+=for apidoc_section $casting
+=for apidoc Amh|type|INT2PTR|type|int value
+=for apidoc Amh|UV|PTR2UV|void * ptr
+=for apidoc Amh|IV|PTR2IV|void * ptr
+=for apidoc Amh|NV|PTR2NV|void * ptr
  
  For example:
  
@@ -2941,7 +3014,7 @@ There are also
  And C<PTRV> which gives the native type for an integer the same size as
  pointers, such as C<unsigned> or C<unsigned long>.
  
-=for apidoc AmhuU|type|PTRV
+=for apidoc Ayh|type|PTRV
  
  =head2 Exception Handling
  
@@ -2984,10 +3057,6 @@ such manual which details all the functions which are available to XS
  writers.  L<perlintern> is the autogenerated manual for the functions
  which are not part of the API and are supposedly for internal use only.
  
-=for comment
-skip apidoc
-The following is an example and shouldn't be read as a real apidoc line
-
  Source documentation is created by putting POD comments into the C
  source, like this:
  
@@ -3223,6 +3292,66 @@ There is no published API for dealing with this, as it is subject to
  change, but you can look at the code for C<pp_lc> in F<pp.c> for an
  example as to how it's currently done.
  
+=head2 How do I pass a Perl string to a C library?
+
+A Perl string, conceptually, is an opaque sequence of code points.
+Many C libraries expect their inputs to be "classical" C strings, which are
+arrays of octets 1-255, terminated with a NUL byte. Your job when writing
+an interface between Perl and a C library is to define the mapping between
+Perl and that library.
+
+Generally speaking, C<SvPVbyte> and related macros suit this task well.
+These assume that your Perl string is a "byte string", i.e., is either
+raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.
+
+Alternatively, if your C library expects UTF-8 text, you can use
+C<SvPVutf8> and related macros. This has the same effect as encoding
+to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.
+
+Some C libraries may expect other encodings (e.g., UTF-16LE). To give
+Perl strings to such libraries
+you must either do that encoding in Perl then use C<SvPVbyte>, or
+use an intermediary C library to convert from however Perl stores the
+string to the desired encoding.
+
+Take care also that NULs in your Perl string don't confuse the C
+library. If possible, give the string's length to the C library; if that's
+not possible, consider rejecting strings that contain NUL bytes.
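
Rejecting strings with embedded NULs can be done on the pointer and length obtained from the SV, before anything is handed to the C library.  A minimal sketch follows, outside the patch itself.  C<has_embedded_nul> is a hypothetical helper name, and only the standard C<memchr> is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if any of the first len bytes of buf is a NUL byte, else 0.
 * The trailing NUL that Perl usually appends lies outside len and is
 * therefore not counted. */
static int has_embedded_nul(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len > 0 && memchr(buf, '\0', len) != NULL;
}
```

A wrapper would typically call this on the result of C<SvPVbyte> and croak if it returns true.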
+
+=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?
+
+Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
+Perl can store these 3 characters in either of two ways:
+
+=over
+
+=item * bytes: 0x64 0x78 0x8c
+
+=item * UTF-8: 0x64 0x78 0xc2 0x8c
+
+=back
+
+Now let's say you convert C<$foo> to a C string thus:
+
+    STRLEN len;
+    char *str = SvPV(foo_sv, len);
+
+At this point C<str> could point to a 3-byte C string or a 4-byte one.
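
The 4-byte form arises because every code point from 0x80 to 0xFF takes two bytes in UTF-8.  The sketch below, outside the patch itself, performs that conversion in plain C.  C<upgrade_to_utf8> is a hypothetical helper written for illustration, not Perl's implementation, and it assumes the platform is ASCII-based.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code points 0x00-0xFF, one per input byte, as UTF-8.
 * out must have room for 2 * len bytes.  Returns the bytes written. */
static size_t upgrade_to_utf8(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                         /* ASCII: 1 byte */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));   /* lead */
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F)); /* cont */
        }
    }
    return n;
}
```

Feeding it the bytes 0x64 0x78 0x8c produces 0x64 0x78 0xc2 0x8c, the second storage form listed above.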
+
+Generally speaking, we want C<str> to be the same regardless of how
+Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
+and C<SvPVutf8> solve that by giving predictable output: use
+C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
+if it expects UTF-8.
+
+If your C library happens to support both encodings, then C<SvPV>--always
+in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more
+efficient.
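
To see why the UTF8 flag must accompany the raw buffer, consider counting characters.  In the sketch below, outside the patch itself, C<char_count> is a hypothetical helper and C<is_utf8> stands in for the result of C<SvUTF8>.  It assumes the buffer is well-formed when the flag is set.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a raw buffer.  With is_utf8 off, every byte is
 * one character.  With it on, only bytes that are not UTF-8 continuation
 * bytes (10xxxxxx) start a character. */
static size_t char_count(const unsigned char *buf, size_t len, int is_utf8)
{
    size_t chars = 0;
    assert(buf != NULL || len == 0);
    if (!is_utf8)
        return len;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

The same four bytes 0xc3 0xbf 0xc3 0xbf count as 2 characters with the flag on and 4 with it off, which is exactly the ambiguity C<SvPV> leaves to the caller.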
+
+B<TESTING TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
+in your tests to ensure consistent handling regardless of Perl's
+internal encoding.
+
  =head2 How do I convert a string to UTF-8?
  
  If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
@@ -3338,6 +3467,8 @@ ppaddr you use, set the properties of the custom op with
  C<XopENTRY_set>, and register the structure against the ppaddr using
  C<Perl_custom_op_register>.  A trivial example might look like:
  
+=for apidoc Ayh||XOP
+
      static XOP my_xop;
      static OP *my_pp(pTHX);
  
@@ -3404,6 +3535,8 @@ will be called from C<Perl_rpeep> when ops of this type are encountered
  by the peephole optimizer.  I<o> is the OP that needs optimizing;
  I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.
  
+=for apidoc Ayh||Perl_cpeep_t
+
  =back
  
  C<B::Generate> directly supports the creation of custom ops by name.
@@ -3546,7 +3679,7 @@ the API function C<sv_2mortal()> is used to mortalize an xV, adding its
  address to the temporaries stack.
  
  Likewise, there is no public API to read values from the temporaries stack.
-Instead. the macros C<SAVETMPS> and C<FREETPMS> are used. The C<SAVETMPS>
+Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
  macro establishes the base levels of the temporaries stack, by capturing the
  current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
  value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of