From: Karl Williamson <khw@cpan.org>
Date: Thu, 7 May 2015 03:07:33 +0000 (-0600)
Subject: perlunicode: Refer to perlguts for XS handling
X-Git-Tag: v5.22.0-RC1~75
X-Git-Url: https://perl5.git.perl.org/perl5.git/commitdiff_plain/37b3b6086552e04f105d3f09b6fdef16cc6a4a64

perlunicode: Refer to perlguts for XS handling

Don't redescribe things here.  Also refer to perlapi.
---

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 34dac61..e11adea 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1690,113 +1690,11 @@ that don't fit into a byte.
 Calling either function on a string that already is in the desired state is a
 no-op.
 
-=head2 Using Unicode in XS
-
-If you want to handle Perl Unicode in XS extensions, you may find the
-following C APIs useful.  See also L<perlguts/"Unicode Support"> for an
-explanation about Unicode at the XS level, and L<perlapi> for the API
-details.
-
-=over 4
-
-=item *
-
-C<DO_UTF8(sv)> returns true if the C<UTF8> flag is on and the bytes
-pragma is not in effect.  C<SvUTF8(sv)> returns true if the C<UTF8>
-flag is on; the C<bytes> pragma is ignored.  The C<UTF8> flag being on
-does B<not> mean that there are any characters of code points greater
-than 255 (or 127) in the scalar or that there are even any characters
-in the scalar.  What the C<UTF8> flag means is that the sequence of
-octets in the representation of the scalar is the sequence of UTF-8
-encoded code points of the characters of a string.  The C<UTF8> flag
-being off means that each octet in this representation encodes a
-single character with code point 0..255 within the string.  Perl's
-Unicode model is not to use UTF-8 until it is absolutely necessary.
-
-=item *
-
-C<uvchr_to_utf8(buf, chr)> writes a Unicode character code point into
-a buffer encoding the code point as UTF-8, and returns a pointer
-pointing after the UTF-8 bytes.  It works appropriately on EBCDIC machines.
-
-=item *
-
-C<utf8_to_uvchr_buf(buf, bufend, lenp)> reads UTF-8 encoded bytes from a
-buffer and
-returns the Unicode character code point and, optionally, the length of
-the UTF-8 byte sequence.  It works appropriately on EBCDIC machines.
-
-=item *
-
-C<utf8_length(start, end)> returns the length of the UTF-8 encoded buffer
-in characters.  C<sv_len_utf8(sv)> returns the length of the UTF-8 encoded
-scalar.
-
-=item *
-
-C<sv_utf8_upgrade(sv)> converts the string of the scalar to its UTF-8
-encoded form.  C<sv_utf8_downgrade(sv)> does the opposite, if
-possible.  C<sv_utf8_encode(sv)> is like sv_utf8_upgrade except that
-it does not set the C<UTF8> flag.  C<sv_utf8_decode()> does the
-opposite of C<sv_utf8_encode()>.  Note that none of these are to be
-used as general-purpose encoding or decoding interfaces: C<use Encode>
-for that.  C<sv_utf8_upgrade()> is affected by the encoding pragma
-but C<sv_utf8_downgrade()> is not (since the encoding pragma is
-designed to be a one-way street).
-
-=item *
 
-C<is_utf8_string(buf, len)> returns true if C<len> bytes of the buffer
-are valid UTF-8.
-
-=item *
-
-C<isUTF8_CHAR(buf, buf_end)> returns true if the pointer points to
-a valid UTF-8 character.
-
-=item *
-
-C<UTF8SKIP(buf)> will return the number of bytes in the UTF-8 encoded
-character in the buffer.  C<UNISKIP(chr)> will return the number of bytes
-required to UTF-8-encode the code point.  C<UTF8SKIP()>
-is useful for example for iterating over the characters of a UTF-8
-encoded buffer; C<UNISKIP()> is useful, for example, in computing
-the size required for a UTF-8 encoded buffer.
-
-=item *
-
-C<utf8_distance(a, b)> will tell the distance in characters between the
-two pointers pointing to the same UTF-8 encoded buffer.
-
-=item *
-
-C<utf8_hop(s, off)> will return a pointer to a UTF-8 encoded buffer
-that is C<off> (positive or negative) Unicode characters displaced
-from the UTF-8 buffer C<s>.  Be careful not to overstep the buffer:
-C<utf8_hop()> will merrily run off the end or the beginning of the
-buffer if told to do so.
-
-=item *
-
-C<pv_uni_display(dsv, spv, len, pvlim, flags)> and
-C<sv_uni_display(dsv, ssv, pvlim, flags)> are useful for debugging the
-output of Unicode strings and scalars.  By default they are useful
-only for debugging--they display B<all> characters as hexadecimal code
-points--but with the flags C<UNI_DISPLAY_ISPRINT>,
-C<UNI_DISPLAY_BACKSLASH>, and C<UNI_DISPLAY_QQ> you can make the
-output more readable.
-
-=item *
-
-C<foldEQ_utf8(s1, pe1, l1, u1, s2, pe2, l2, u2)> can be used to
-compare two strings case-insensitively in Unicode.  For case-sensitive
-comparisons you can just use C<memEQ()> and C<memNE()> as usual, except
-if one string is in utf8 and the other isn't.
-
-=back
+=head2 Using Unicode in XS
 
-For more information, see L<perlapi>, and F<utf8.c> and F<utf8.h>
-in the Perl source code distribution.
+See L<perlguts/"Unicode Support"> for an introduction to Unicode at
+the XS level, and L<perlapi/Unicode Support> for the API details.
 
 =head2 Hacking Perl to work on earlier Unicode versions (for very serious hackers only)