with no suffix C<"_A">. This variant is used to emphasize by its name that
only ASCII-range characters can return TRUE.
-Variant C<isI<FOO>_L1> imposes the Latin-1 (or EBCDIC equivlalent) character set
+Variant C<isI<FOO>_L1> imposes the Latin-1 (or EBCDIC equivalent) character set
onto the platform. That is, the code points that are ASCII are unaffected,
since ASCII is a subset of Latin-1. But the non-ASCII code points are treated
as if they are Latin-1 characters. For example, C<isWORDCHAR_L1()> will return
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more.)
-=for apidoc Am|UV|toUPPER_utf8|U8* p|U8* s|STRLEN* lenp
-Converts the UTF-8 encoded character at C<p> to its uppercase version, and
+=for apidoc Am|UV|toUPPER_utf8_safe|U8* p|U8* e|U8* s|STRLEN* lenp
+Converts the first UTF-8 encoded character in the sequence starting at C<p> and
+extending no further than S<C<e - 1>> to its uppercase version, and
stores that in UTF-8 in C<s>, and its length in bytes in C<lenp>. Note
that the buffer pointed to by C<s> needs to be at least C<UTF8_MAXBYTES_CASE+1>
bytes since the uppercase version may be longer than the original character.
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-The input character at C<p> is assumed to be well-formed.
+The suffix C<_safe> in the function's name indicates that it will not attempt
+to read beyond S<C<e - 1>>, provided that the constraint S<C<s E<lt> e>> is
+true (this is asserted for in C<-DDEBUGGING> builds). If the UTF-8 for the
+input character is malformed in some way, the program may croak, or the
+function may return the REPLACEMENT CHARACTER, at the discretion of the
+implementation, and subject to change in future releases.
+
+=for apidoc Am|UV|toUPPER_utf8|U8* p|U8* s|STRLEN* lenp
+This is like C<L</toUPPER_utf8_safe>>, but doesn't have the C<e>
+parameter The function therefore can't check if it is reading
+beyond the end of the string. Starting in Perl v5.30, it will take the C<e>
+parameter, becoming a synonym for C<toUPPER_utf8_safe>. At that time every
+program that uses it will have to be changed to successfully compile. In the
+meantime, the first runtime call to C<toUPPER_utf8> from each call point in the
+program will raise a deprecation warning, enabled by default. You can convert
+your program now to use C<toUPPER_utf8_safe>, and avoid the warnings, and get an
+extra measure of protection, or you can wait until v5.30, when you'll be forced
+to add the C<e> parameter.
=for apidoc Am|U8|toFOLD|U8 ch
Converts the specified character to foldcase. If the input is anything but an
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-=for apidoc Am|UV|toFOLD_utf8|U8* p|U8* s|STRLEN* lenp
-Converts the UTF-8 encoded character at C<p> to its foldcase version, and
+=for apidoc Am|UV|toFOLD_utf8_safe|U8* p|U8* e|U8* s|STRLEN* lenp
+Converts the first UTF-8 encoded character in the sequence starting at C<p> and
+extending no further than S<C<e - 1>> to its foldcase version, and
stores that in UTF-8 in C<s>, and its length in bytes in C<lenp>. Note
that the buffer pointed to by C<s> needs to be at least C<UTF8_MAXBYTES_CASE+1>
bytes since the foldcase version may be longer than the original character.
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-The input character at C<p> is assumed to be well-formed.
+The suffix C<_safe> in the function's name indicates that it will not attempt
+to read beyond S<C<e - 1>>, provided that the constraint S<C<s E<lt> e>> is
+true (this is asserted for in C<-DDEBUGGING> builds). If the UTF-8 for the
+input character is malformed in some way, the program may croak, or the
+function may return the REPLACEMENT CHARACTER, at the discretion of the
+implementation, and subject to change in future releases.
+
+=for apidoc Am|UV|toFOLD_utf8|U8* p|U8* s|STRLEN* lenp
+This is like C<L</toFOLD_utf8_safe>>, but doesn't have the C<e>
+parameter The function therefore can't check if it is reading
+beyond the end of the string. Starting in Perl v5.30, it will take the C<e>
+parameter, becoming a synonym for C<toFOLD_utf8_safe>. At that time every
+program that uses it will have to be changed to successfully compile. In the
+meantime, the first runtime call to C<toFOLD_utf8> from each call point in the
+program will raise a deprecation warning, enabled by default. You can convert
+your program now to use C<toFOLD_utf8_safe>, and avoid the warnings, and get an
+extra measure of protection, or you can wait until v5.30, when you'll be forced
+to add the C<e> parameter.
=for apidoc Am|U8|toLOWER|U8 ch
Converts the specified character to lowercase. If the input is anything but an
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-=for apidoc Am|UV|toLOWER_utf8|U8* p|U8* s|STRLEN* lenp
-Converts the UTF-8 encoded character at C<p> to its lowercase version, and
+
+=for apidoc Am|UV|toLOWER_utf8_safe|U8* p|U8* e|U8* s|STRLEN* lenp
+Converts the first UTF-8 encoded character in the sequence starting at C<p> and
+extending no further than S<C<e - 1>> to its lowercase version, and
stores that in UTF-8 in C<s>, and its length in bytes in C<lenp>. Note
that the buffer pointed to by C<s> needs to be at least C<UTF8_MAXBYTES_CASE+1>
bytes since the lowercase version may be longer than the original character.
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-The input character at C<p> is assumed to be well-formed.
+The suffix C<_safe> in the function's name indicates that it will not attempt
+to read beyond S<C<e - 1>>, provided that the constraint S<C<s E<lt> e>> is
+true (this is asserted for in C<-DDEBUGGING> builds). If the UTF-8 for the
+input character is malformed in some way, the program may croak, or the
+function may return the REPLACEMENT CHARACTER, at the discretion of the
+implementation, and subject to change in future releases.
+
+=for apidoc Am|UV|toLOWER_utf8|U8* p|U8* s|STRLEN* lenp
+This is like C<L</toLOWER_utf8_safe>>, but doesn't have the C<e>
+parameter The function therefore can't check if it is reading
+beyond the end of the string. Starting in Perl v5.30, it will take the C<e>
+parameter, becoming a synonym for C<toLOWER_utf8_safe>. At that time every
+program that uses it will have to be changed to successfully compile. In the
+meantime, the first runtime call to C<toLOWER_utf8> from each call point in the
+program will raise a deprecation warning, enabled by default. You can convert
+your program now to use C<toLOWER_utf8_safe>, and avoid the warnings, and get an
+extra measure of protection, or you can wait until v5.30, when you'll be forced
+to add the C<e> parameter.
=for apidoc Am|U8|toTITLE|U8 ch
Converts the specified character to titlecase. If the input is anything but an
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-=for apidoc Am|UV|toTITLE_utf8|U8* p|U8* s|STRLEN* lenp
-Converts the UTF-8 encoded character at C<p> to its titlecase version, and
+=for apidoc Am|UV|toTITLE_utf8_safe|U8* p|U8* e|U8* s|STRLEN* lenp
+Converts the first UTF-8 encoded character in the sequence starting at C<p> and
+extending no further than S<C<e - 1>> to its titlecase version, and
stores that in UTF-8 in C<s>, and its length in bytes in C<lenp>. Note
that the buffer pointed to by C<s> needs to be at least C<UTF8_MAXBYTES_CASE+1>
bytes since the titlecase version may be longer than the original character.
(but note, as explained at L<the top of this section|/Character case
changing>, that there may be more).
-The input character at C<p> is assumed to be well-formed.
+The suffix C<_safe> in the function's name indicates that it will not attempt
+to read beyond S<C<e - 1>>, provided that the constraint S<C<s E<lt> e>> is
+true (this is asserted for in C<-DDEBUGGING> builds). If the UTF-8 for the
+input character is malformed in some way, the program may croak, or the
+function may return the REPLACEMENT CHARACTER, at the discretion of the
+implementation, and subject to change in future releases.
+
+=for apidoc Am|UV|toTITLE_utf8|U8* p|U8* s|STRLEN* lenp
+This is like C<L</toLOWER_utf8_safe>>, but doesn't have the C<e>
+parameter The function therefore can't check if it is reading
+beyond the end of the string. Starting in Perl v5.30, it will take the C<e>
+parameter, becoming a synonym for C<toTITLE_utf8_safe>. At that time every
+program that uses it will have to be changed to successfully compile. In the
+meantime, the first runtime call to C<toTITLE_utf8> from each call point in the
+program will raise a deprecation warning, enabled by default. You can convert
+your program now to use C<toTITLE_utf8_safe>, and avoid the warnings, and get an
+extra measure of protection, or you can wait until v5.30, when you'll be forced
+to add the C<e> parameter.
=cut
/* Because all controls are UTF-8 invariants in EBCDIC, we can use this
* more efficient macro instead of the more general one */
# define isCNTRL_utf8_safe(p, e) \
- (__ASSERT_(_utf8_safe_assert(p, e)) isCNTRL_L1(*(p))
+ (__ASSERT_(_utf8_safe_assert(p, e)) isCNTRL_L1(*(p)))
#else
# define isCNTRL_utf8_safe(p, e) _generic_utf8_safe(_CC_CNTRL, p, e, 0)
#endif
#define toUPPER_utf8(p,s,l) to_utf8_upper(p,s,l)
/* For internal core use only, subject to change */
-#define _toFOLD_utf8_flags(p,s,l,f) _to_utf8_fold_flags (p,s,l,f)
-#define _toLOWER_utf8_flags(p,s,l,f) _to_utf8_lower_flags(p,s,l,f)
-#define _toTITLE_utf8_flags(p,s,l,f) _to_utf8_title_flags(p,s,l,f)
-#define _toUPPER_utf8_flags(p,s,l,f) _to_utf8_upper_flags(p,s,l,f)
+#define _toFOLD_utf8_flags(p,e,s,l,f) _to_utf8_fold_flags (p,e,s,l,f, "", 0)
+#define _toLOWER_utf8_flags(p,e,s,l,f) _to_utf8_lower_flags(p,e,s,l,f, "", 0)
+#define _toTITLE_utf8_flags(p,e,s,l,f) _to_utf8_title_flags(p,e,s,l,f, "", 0)
+#define _toUPPER_utf8_flags(p,e,s,l,f) _to_utf8_upper_flags(p,e,s,l,f, "", 0)
+
+#define toFOLD_utf8_safe(p,e,s,l) _toFOLD_utf8_flags(p,e,s,l, FOLD_FLAGS_FULL)
+#define toLOWER_utf8_safe(p,e,s,l) _toLOWER_utf8_flags(p,e,s,l, 0)
+#define toTITLE_utf8_safe(p,e,s,l) _toTITLE_utf8_flags(p,e,s,l, 0)
+#define toUPPER_utf8_safe(p,e,s,l) _toUPPER_utf8_flags(p,e,s,l, 0)
/* For internal core Perl use only: the base macros for defining macros like
* isALPHA_LC_utf8. These are like _generic_utf8, but if the first code point