From: Karl Williamson
Date: Thu, 7 May 2015 03:07:33 +0000 (-0600)
Subject: perlunicode: Refer to perlguts for XS handling
X-Git-Tag: v5.22.0-RC1~75
X-Git-Url: https://perl5.git.perl.org/perl5.git/commitdiff_plain/37b3b6086552e04f105d3f09b6fdef16cc6a4a64

perlunicode: Refer to perlguts for XS handling

Don't redescribe things here. Also refer to perlapi.
---

diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index 34dac61..e11adea 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1690,113 +1690,11 @@ that don't fit into a byte.
 Calling either function on a string that already is in the desired state is a no-op.
 
-=head2 Using Unicode in XS
-
-If you want to handle Perl Unicode in XS extensions, you may find the
-following C APIs useful. See also L for an
-explanation about Unicode at the XS level, and L for the API
-details.
-
-=over 4
-
-=item *
-
-C returns true if the C flag is on and the bytes
-pragma is not in effect. C returns true if the C
-flag is on; the C pragma is ignored. The C flag being on
-does B<not> mean that there are any characters of code points greater
-than 255 (or 127) in the scalar or that there are even any characters
-in the scalar. What the C flag means is that the sequence of
-octets in the representation of the scalar is the sequence of UTF-8
-encoded code points of the characters of a string. The C flag
-being off means that each octet in this representation encodes a
-single character with code point 0..255 within the string. Perl's
-Unicode model is not to use UTF-8 until it is absolutely necessary.
-
-=item *
-
-C writes a Unicode character code point into
-a buffer encoding the code point as UTF-8, and returns a pointer
-pointing after the UTF-8 bytes. It works appropriately on EBCDIC machines.
-
-=item *
-
-C reads UTF-8 encoded bytes from a
-buffer and
-returns the Unicode character code point and, optionally, the length of
-the UTF-8 byte sequence. It works appropriately on EBCDIC machines.
-
-=item *
-
-C returns the length of the UTF-8 encoded buffer
-in characters. C returns the length of the UTF-8 encoded
-scalar.
-
-=item *
-
-C converts the string of the scalar to its UTF-8
-encoded form. C does the opposite, if
-possible. C is like sv_utf8_upgrade except that
-it does not set the C flag. C does the
-opposite of C. Note that none of these are to be
-used as general-purpose encoding or decoding interfaces: C
-for that. C is affected by the encoding pragma
-but C is not (since the encoding pragma is
-designed to be a one-way street).
-
-=item *
-C returns true if C bytes of the buffer
-are valid UTF-8.
-
-=item *
-
-C returns true if the pointer points to
-a valid UTF-8 character.
-
-=item *
-
-C will return the number of bytes in the UTF-8 encoded
-character in the buffer. C will return the number of bytes
-required to UTF-8-encode the code point. C
-is useful for example for iterating over the characters of a UTF-8
-encoded buffer; C is useful, for example, in computing
-the size required for a UTF-8 encoded buffer.
-
-=item *
-
-C will tell the distance in characters between the
-two pointers pointing to the same UTF-8 encoded buffer.
-
-=item *
-
-C will return a pointer to a UTF-8 encoded buffer
-that is C (positive or negative) Unicode characters displaced
-from the UTF-8 buffer C. Be careful not to overstep the buffer:
-C will merrily run off the end or the beginning of the
-buffer if told to do so.
-
-=item *
-
-C and
-C are useful for debugging the
-output of Unicode strings and scalars. By default they are useful
-only for debugging--they display B<all> characters as hexadecimal code
-points--but with the flags C,
-C, and C you can make the
-output more readable.
-
-=item *
-
-C can be used to
-compare two strings case-insensitively in Unicode. For case-sensitive
-comparisons you can just use C and C as usual, except
-if one string is in utf8 and the other isn't.
-
-=back
+=head2 Using Unicode in XS
 
-For more information, see L, and F and F
-in the Perl source code distribution.
+See L for an introduction to Unicode at
+the XS level, and L for the API details.
 
 =head2 Hacking Perl to work on earlier Unicode versions (for very serious hackers only)