Slight tweaks on the XS-and-Unicode docs, inspired by [perl #17852].

author Jarkko Hietaniemi <jhi@iki.fi>

Tue, 3 Dec 2002 15:04:07 +0000 (15:04 +0000)

committer Jarkko Hietaniemi <jhi@iki.fi>

Tue, 3 Dec 2002 15:04:07 +0000 (15:04 +0000)
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index d93eadf..e71173f 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -2230,13 +2230,15 @@ C<utf8_hop>, which takes a string and a number of characters to skip
  over. You're on your own about bounds checking, though, so don't use it
  lightly.
  
-All bytes in a multi-byte UTF8 character will have the high bit set, so
-you can test if you need to do something special with this character
-like this:
+All bytes in a multi-byte UTF8 character will have the high bit set,
+so you can test if you need to do something special with this
+character like this (the UTF8_IS_CONTINUED() is a macro that tests
+whether the byte is part of a multi-byte UTF-8 character):
  
  
-    UV uv;
+    U8 *utf;
+    UV uv;     /* Note: a UV, not a U8, not a char */
  
  
-    if (utf & 0x80)
+    if (UTF8_IS_CONTINUED(*utf))
          /* Must treat this as UTF8 */
          uv = utf8_to_uv(utf);
      else
@@ -2247,7 +2249,7 @@ You can also see in that example that we use C<utf8_to_uv> to get the
  value of the character; the inverse function C<uv_to_utf8> is available
  for putting a UV into UTF8:
  
-    if (uv > 0x80)
+    if (UTF8_IS_CONTINUED(uv))
          /* Must treat this as UTF8 */
          utf8 = uv_to_utf8(utf8, uv);
      else
@@ -2309,6 +2311,10 @@ In fact, your C<frobnicate> function should be made aware of whether or
  not it's dealing with UTF8 data, so that it can handle the string
  appropriately.
  
+Since just passing an SV to an XS function and copying the data of
+the SV is not enough to copy the UTF8 flags, even less right is just
+passing a C<char *> to an XS function.
+
  =head2 How do I convert a string to UTF8?
  
  If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
@@ -2349,12 +2355,13 @@ it's not - if you pass on the PV to somewhere, pass on the flag too.
  =item *
  
  If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
-unless C<!(*s & 0x80)> in which case you can use C<*s>.
+unless C<!UTF8_IS_CONTINUED(*s)> in which case you can use C<*s>.
  
  =item *
  
-When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
-C<uv < 0x80> in which case you can use C<*s = uv>.
+When writing a character C<uv> to a UTF8 string, B<always> use
+C<uv_to_utf8>, unless C<!UTF8_IS_CONTINUED(uv)> in which case
+you can use C<*s = uv>.
  
  =item *
  
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index bf21206..ce9883d 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1015,8 +1015,10 @@ straddling of the proverbial fence causes problems.
  
  =head2 Using Unicode in XS
  
-If you want to handle Perl Unicode in XS extensions, you may find
-the following C APIs useful.  See L<perlapi> for details.
+If you want to handle Perl Unicode in XS extensions, you may find the
+following C APIs useful.  See also L<perlguts/"Unicode Support"> for an
+explanation about Unicode at the XS level, and L<perlapi> for the API
+details.
  
  =over 4
  
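
The perlguts.pod hunks replace hand-written comparisons against 0x80 with a single high-bit test. To make the boundary case concrete, here is a minimal standalone C sketch. It does not use the Perl API: sketch_is_continued, sketch_uv_to_utf8 and sketch_utf8_to_uv are hypothetical stand-ins named for this illustration only. It assumes an ASCII platform on which UTF8_IS_CONTINUED amounts to the same high-bit test as the removed `utf & 0x80` line, well-formed input, and code points no larger than 0x10FFFF.

```c
#include <stdint.h>

/* Hypothetical stand-in for UTF8_IS_CONTINUED (assumption: ASCII
 * platform). True when the high bit of the byte is set, i.e. the byte
 * belongs to a multi-byte UTF-8 character. */
static int sketch_is_continued(uint8_t byte)
{
    return (byte & 0x80) != 0;
}

/* Hypothetical encoder, shaped like the uv_to_utf8 call in the patch:
 * writes uv at dst and returns a pointer just past the last byte.
 * Assumes uv <= 0x10FFFF and that dst has room for four bytes. */
static uint8_t *sketch_uv_to_utf8(uint8_t *dst, uint32_t uv)
{
    if (uv < 0x80) {
        /* Single byte, high bit clear: plain "*s = uv" is enough. */
        *dst++ = (uint8_t)uv;
    } else if (uv < 0x800) {
        *dst++ = (uint8_t)(0xC0 | (uv >> 6));
        *dst++ = (uint8_t)(0x80 | (uv & 0x3F));
    } else if (uv < 0x10000) {
        *dst++ = (uint8_t)(0xE0 | (uv >> 12));
        *dst++ = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
        *dst++ = (uint8_t)(0x80 | (uv & 0x3F));
    } else {
        *dst++ = (uint8_t)(0xF0 | (uv >> 18));
        *dst++ = (uint8_t)(0x80 | ((uv >> 12) & 0x3F));
        *dst++ = (uint8_t)(0x80 | ((uv >> 6) & 0x3F));
        *dst++ = (uint8_t)(0x80 | (uv & 0x3F));
    }
    return dst;
}

/* Hypothetical decoder, shaped like utf8_to_uv: returns the character
 * that starts at src. No bounds or validity checks, so it is only safe
 * on well-formed input. */
static uint32_t sketch_utf8_to_uv(const uint8_t *src)
{
    uint8_t lead = src[0];

    if (!sketch_is_continued(lead))
        return lead;                       /* one byte, use *s directly */
    if ((lead & 0xE0) == 0xC0)             /* two-byte sequence */
        return ((uint32_t)(lead & 0x1F) << 6)
             | (uint32_t)(src[1] & 0x3F);
    if ((lead & 0xF0) == 0xE0)             /* three-byte sequence */
        return ((uint32_t)(lead & 0x0F) << 12)
             | ((uint32_t)(src[1] & 0x3F) << 6)
             | (uint32_t)(src[2] & 0x3F);
    return ((uint32_t)(lead & 0x07) << 18) /* four-byte sequence */
         | ((uint32_t)(src[1] & 0x3F) << 12)
         | ((uint32_t)(src[2] & 0x3F) << 6)
         | (uint32_t)(src[3] & 0x3F);
}
```

The value 0x80 is the interesting case: it needs two bytes (0xC2 0x80), so the removed `uv > 0x80` comparison would have sent it down the single-byte branch, while the removed `uv < 0x80` comparison in the list item handled it correctly. Using one test everywhere removes that inconsistency.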