Slight tweaks on the XS-and-Unicode docs, inspired by [perl #17852].

author Jarkko Hietaniemi <jhi@iki.fi>

Tue, 3 Dec 2002 15:04:07 +0000 (15:04 +0000)

committer Jarkko Hietaniemi <jhi@iki.fi>

Tue, 3 Dec 2002 15:04:07 +0000 (15:04 +0000)
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index d93eadf..e71173f 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -2230,13 +2230,15 @@ C<utf8_hop>, which takes a string and a number of characters to skip
  over. You're on your own about bounds checking, though, so don't use it
  lightly.
  
-All bytes in a multi-byte UTF8 character will have the high bit set, so
-you can test if you need to do something special with this character
-like this:
+All bytes in a multi-byte UTF8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (the UTF8_IS_CONTINUED() is a macro that tests
+whether the byte is part of a multi-byte UTF-8 character):
  
-    UV uv;
+    U8 *utf;
+    UV uv;     /* Note: a UV, not a U8, not a char */
  
-    if (utf & 0x80)
+    if (UTF8_IS_CONTINUED(*utf))
          /* Must treat this as UTF8 */
          uv = utf8_to_uv(utf);
      else
@@ -2247,7 +2249,7 @@ You can also see in that example that we use C<utf8_to_uv> to get the
  value of the character; the inverse function C<uv_to_utf8> is available
  for putting a UV into UTF8:
  
-    if (uv > 0x80)
+    if (UTF8_IS_CONTINUED(uv))
          /* Must treat this as UTF8 */
          utf8 = uv_to_utf8(utf8, uv);
      else
@@ -2309,6 +2311,10 @@ In fact, your C<frobnicate> function should be made aware of whether or
  not it's dealing with UTF8 data, so that it can handle the string
  appropriately.
  
+Since just passing an SV to an XS function and copying the data of
+the SV is not enough to copy the UTF8 flags, even less right is just
+passing a C<char *> to an XS function.
+
  =head2 How do I convert a string to UTF8?
  
  If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
@@ -2349,12 +2355,13 @@ it's not - if you pass on the PV to somewhere, pass on the flag too.
  =item *
  
  If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
-unless C<!(*s & 0x80)> in which case you can use C<*s>.
+unless C<!UTF8_IS_CONTINUED(*s)> in which case you can use C<*s>.
  
  =item *
  
-When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
-C<uv < 0x80> in which case you can use C<*s = uv>.
+When writing a character C<uv> to a UTF8 string, B<always> use
+C<uv_to_utf8>, unless C<!UTF8_IS_CONTINUED(uv))> in which case
+you can use C<*s = uv>.
  
  =item *
  
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index bf21206..ce9883d 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1015,8 +1015,10 @@ straddling of the proverbial fence causes problems.
  
  =head2 Using Unicode in XS
  
-If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful.  See L<perlapi> for details.
+If you want to handle Perl Unicode in XS extensions, you may find the
+following C APIs useful.  See also L<perlguts/"Unicode Support"> for an
+explanation about Unicode at the XS level, and L<perlapi> for the API
+details.
  
  =over 4
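
Note on the high-bit test: the perlguts hunks above rely on the fact that every byte of a multi-byte UTF-8 character has its high bit set. The following is a minimal standalone sketch of that byte-level check, assuming nothing from the Perl headers. `HIGH_BIT_SET` and `count_multibyte_bytes` are hypothetical names introduced only for illustration; they are not Perl's `UTF8_IS_CONTINUED` macro or any perlapi function.

```c
#include <stddef.h>

/* Hypothetical stand-in for the byte-level test described in the patch.
 * It looks at a single byte only, so the argument is narrowed to
 * unsigned char before the high bit is tested. */
#define HIGH_BIT_SET(byte) (((unsigned char)(byte) & 0x80) != 0)

/* Count the bytes in buf[0..len) that have the high bit set, that is,
 * the bytes that belong to multi-byte UTF-8 characters. */
static size_t count_multibyte_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (HIGH_BIT_SET(buf[i]))
            n++;
    }
    return n;
}
```

For the four bytes `'a', 0xC3, 0xA9, 'b'` (an "e" with acute accent between two ASCII letters), the helper counts the two middle bytes. Because the sketch narrows its argument to a single byte, it is only meaningful for bytes read from an encoded string, not for an already decoded wide character value.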