From 13aab5dd33b0c93c68cd510a6395b3e2d66079e3 Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Sun, 1 Jul 2018 13:48:34 -0600
Subject: [PATCH 1/1] Fix outdated docs for isUTF8_char()

It doesn't accept non-negative code points that don't fit in an IV
---
 inline.h | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/inline.h b/inline.h
index d0e60e4..78a162c 100644
--- a/inline.h
+++ b/inline.h
@@ -1022,7 +1022,7 @@ value gives how many bytes starting at C comprise the code point's
 representation. Any bytes remaining before C, but beyond the ones needed to
 form the first code point in C, are not examined.
 
-The code point can be any that will fit in a UV on this machine, using Perl's
+The code point can be any that will fit in an IV on this machine, using Perl's
 extension to official UTF-8 to represent those higher than the Unicode maximum
 of 0x10FFFF. That means that this macro is used to efficiently decide if the
 next few bytes in C is legal UTF-8 for a single character.
@@ -1036,12 +1036,8 @@ code points; and C> for a more customized definition.
 Use C>, C>, and
 C> to check entire strings.
 
-Note that it is deprecated to use code points higher than what will fit in an
-IV. This macro does not raise any warnings for such code points, treating them
-as valid.
-
-Note also that a UTF-8 INVARIANT character (i.e. ASCII on non-EBCDIC machines)
-is a valid UTF-8 character.
+Note also that a UTF-8 "invariant" character (i.e. ASCII on non-EBCDIC
+machines) is a valid UTF-8 character.
 
 =cut
 
-- 
1.8.3.1