can warn and/or disallow these extremely high code points, even if other
above-Unicode ones are accepted. They are the C<UNICODE_WARN_PERL_EXTENDED>
and C<UNICODE_DISALLOW_PERL_EXTENDED> flags. For more information see
-L</C<UTF8_GOT_PERL_EXTENDED>>. Of course C<UNICODE_DISALLOW_SUPER> will
+C<L</UTF8_GOT_PERL_EXTENDED>>. Of course C<UNICODE_DISALLOW_SUPER> will
treat all above-Unicode code points, including these, as malformations. (Note
that the Unicode standard considers anything above 0x10FFFF to be illegal, but
there are standards predating it that allow up to 0x7FFF_FFFF (2**31 -1))
can warn and/or disallow these extremely high code points, even if other
above-Unicode ones are accepted. They are the C<UTF8_WARN_PERL_EXTENDED> and
C<UTF8_DISALLOW_PERL_EXTENDED> flags. For more information see
-L</C<UTF8_GOT_PERL_EXTENDED>>. Of course C<UTF8_DISALLOW_SUPER> will treat all
+C<L</UTF8_GOT_PERL_EXTENDED>>. Of course C<UTF8_DISALLOW_SUPER> will treat all
above-Unicode code points, including these, as malformations.
(Note that the Unicode standard considers anything above 0x10FFFF to be
illegal, but there are standards predating it that allow up to 0x7FFF_FFFF
The input sequence was malformed in that a non-continuation type byte was found
in a position where only a continuation type one should be. See also
-L</C<UTF8_GOT_SHORT>>.
+C<L</UTF8_GOT_SHORT>>.
=item C<UTF8_GOT_OVERFLOW>
}
/*
-=for comment
-skip apidoc
-This is not currently externally documented because we don't want people to use
-it for now. XXX Perhaps that is too paranoid, and it should be documented?
-
=for apidoc bytes_from_utf8_loc
-Like C<L</bytes_from_utf8>()>, but takes an extra parameter, a pointer to where
-to store the location of the first character in C<"s"> that cannot be
+Like C<L<perlapi/bytes_from_utf8>()>, but takes an extra parameter, a pointer
+to where to store the location of the first character in C<"s"> that cannot be
converted to non-UTF8.
If that parameter is C<NULL>, this function behaves identically to
If the entire input string was converted, C<*is_utf8p> is set to a FALSE value,
and C<*first_non_downgradable> is set to C<NULL>.
-Otherwise, C<*first_non_downgradable> set to point to the first byte of the
+Otherwise, C<*first_non_downgradable> is set to point to the first byte of the
first character in the original string that wasn't converted. C<*is_utf8p> is
unchanged. Note that the new string may have length 0.
bool
Perl__is_uni_FOO(pTHX_ const U8 classnum, const UV c)
{
- dVAR;
return _invlist_contains_cp(PL_XPosix_ptrs[classnum], c);
}
bool
Perl__is_uni_perl_idcont(pTHX_ UV c)
{
- dVAR;
return _invlist_contains_cp(PL_utf8_perl_idcont, c);
}
bool
Perl__is_uni_perl_idstart(pTHX_ UV c)
{
- dVAR;
return _invlist_contains_cp(PL_utf8_perl_idstart, c);
}
* The ordinal of the first character of the changed version is returned
* (but note, as explained above, that there may be more.) */
- dVAR;
PERL_ARGS_ASSERT_TO_UNI_UPPER;
if (c < 256) {
UV
Perl_to_uni_title(pTHX_ UV c, U8* p, STRLEN *lenp)
{
- dVAR;
PERL_ARGS_ASSERT_TO_UNI_TITLE;
if (c < 256) {
UV
Perl_to_uni_lower(pTHX_ UV c, U8* p, STRLEN *lenp)
{
- dVAR;
PERL_ARGS_ASSERT_TO_UNI_LOWER;
if (c < 256) {
* FOLD_FLAGS_NOMIX_ASCII iff non-ASCII to ASCII folds are prohibited
*/
- dVAR;
PERL_ARGS_ASSERT__TO_UNI_FOLD_FLAGS;
if (flags & FOLD_FLAGS_LOCALE) {
bool
Perl__is_utf8_FOO(pTHX_ const U8 classnum, const U8 *p, const U8 * const e)
{
- dVAR;
PERL_ARGS_ASSERT__IS_UTF8_FOO;
return is_utf8_common(p, e, PL_XPosix_ptrs[classnum]);
bool
Perl__is_utf8_perl_idstart(pTHX_ const U8 *p, const U8 * const e)
{
- dVAR;
PERL_ARGS_ASSERT__IS_UTF8_PERL_IDSTART;
return is_utf8_common(p, e, PL_utf8_perl_idstart);
bool
Perl__is_utf8_perl_idcont(pTHX_ const U8 *p, const U8 * const e)
{
- dVAR;
PERL_ARGS_ASSERT__IS_UTF8_PERL_IDCONT;
return is_utf8_common(p, e, PL_utf8_perl_idcont);
* constructed with this size (to save space and memory), and we return
* pointers, so they must be this size */
- dVAR;
/* 'index' is guaranteed to be non-negative, as this is an inversion map
* that covers all possible inputs. See [perl #133365] */
SSize_t index = _invlist_search(PL_utf8_foldclosures, cp);
* sequence, and the entire sequence will be stored in *ustrp. ustrp will
* contain *lenp bytes */
- dVAR;
PERL_ARGS_ASSERT_TURKIC_LC;
assert(e > p0);
STRLEN *lenp,
bool flags)
{
- dVAR;
UV result;
PERL_ARGS_ASSERT__TO_UTF8_UPPER_FLAGS;
STRLEN *lenp,
bool flags)
{
- dVAR;
UV result;
PERL_ARGS_ASSERT__TO_UTF8_TITLE_FLAGS;
STRLEN *lenp,
bool flags)
{
- dVAR;
UV result;
PERL_ARGS_ASSERT__TO_UTF8_LOWER_FLAGS;
STRLEN *lenp,
U8 flags)
{
- dVAR;
UV result;
PERL_ARGS_ASSERT__TO_UTF8_FOLD_FLAGS;