the published Unicode rules; otherwise, it uses the C library function that
gives the named classification. For example, C<isDIGIT_LC()> when not in a
UTF-8 locale returns the result of calling C<isdigit()>. FALSE is always
-returned if the input won't fit into an octet.
+returned if the input won't fit into an octet. On some platforms where the C
+library function is known to be defective, Perl changes its result to follow
+the POSIX standard's rules.
Variant C<isFOO_LC_uvchr> is like C<isFOO_LC>, but is defined on any UV. It
returns the same as C<isFOO_LC> for input code points less than 256, and
alphanumeric.
See the L<top of this section|/Character classification> for an explanation of
variants
-C<isWORDCHAR_A>, C<isWORDCHAR_L1>, C<isWORDCHAR_uni>, C<isWORDCHAR_utf8>,
-C<isWORDCHAR_LC>, C<isWORDCHAR_LC_uvchr>, and C<isWORDCHAR_LC_utf8>.
+C<isWORDCHAR_A>, C<isWORDCHAR_L1>, C<isWORDCHAR_uni>, and C<isWORDCHAR_utf8>.
+C<isWORDCHAR_LC>, C<isWORDCHAR_LC_uvchr>, and C<isWORDCHAR_LC_utf8> are also as
+described there, but additionally include the platform's native underscore.
=for apidoc Am|bool|isXDIGIT|char ch
Returns a boolean indicating whether the specified character is a hexadecimal
# define _CC_QUOTEMETA 21
# define _CC_NON_FINAL_FOLD 22
# define _CC_IS_IN_SOME_FOLD 23
-/* Unused: 24-31
+# define _CC_MNEMONIC_CNTRL 24
+/* Unused: 25-31
* If more bits are needed, one could add a second word for non-64bit
* QUAD_IS_INT systems, using some #ifdefs to distinguish between having a 2nd
* word or not. The IS_IN_SOME_FOLD bit is the most easily expendable, as it
# define isALPHANUMERIC_A(c) _generic_isCC_A(c, _CC_ALPHANUMERIC)
# define isBLANK_A(c) _generic_isCC_A(c, _CC_BLANK)
# define isCNTRL_A(c) _generic_isCC_A(c, _CC_CNTRL)
-# define isDIGIT_A(c) _generic_isCC(c, _CC_DIGIT)
+# define isDIGIT_A(c) _generic_isCC(c, _CC_DIGIT) /* No non-ASCII digits */
# define isGRAPH_A(c) _generic_isCC_A(c, _CC_GRAPH)
# define isLOWER_A(c) _generic_isCC_A(c, _CC_LOWER)
# define isPRINT_A(c) _generic_isCC_A(c, _CC_PRINT)
# define isSPACE_A(c) _generic_isCC_A(c, _CC_SPACE)
# define isUPPER_A(c) _generic_isCC_A(c, _CC_UPPER)
# define isWORDCHAR_A(c) _generic_isCC_A(c, _CC_WORDCHAR)
-# define isXDIGIT_A(c) _generic_isCC(c, _CC_XDIGIT)
+# define isXDIGIT_A(c) _generic_isCC(c, _CC_XDIGIT) /* No non-ASCII xdigits */
# define isIDFIRST_A(c) _generic_isCC_A(c, _CC_IDFIRST)
# define isALPHA_L1(c) _generic_isCC(c, _CC_ALPHA)
# define isALPHANUMERIC_L1(c) _generic_isCC(c, _CC_ALPHANUMERIC)
_generic_isCC(c, _CC_NON_FINAL_FOLD)
# define _IS_IN_SOME_FOLD_ONLY_FOR_USE_BY_REGCOMP_DOT_C(c) \
_generic_isCC(c, _CC_IS_IN_SOME_FOLD)
+# define _IS_MNEMONIC_CNTRL_ONLY_FOR_USE_BY_REGCOMP_DOT_C(c) \
+ _generic_isCC(c, _CC_MNEMONIC_CNTRL)
#else /* else we don't have perl.h */
/* If we don't have perl.h, we are compiling a utility program. Below we
* compiler, this reduces to an AND and a TEST. On both EBCDIC and ASCII
* machines, 'A' and 'a' differ by a single bit; the same with the upper and
* lower case of all other ASCII-range alphabetics. On ASCII platforms, they
- * are 32 apart; on EBCDIC, they are 64. This uses an exclusive 'or' to find
- * that bit and then inverts it to form a mask, with just a single 0, in the
- * bit position where the upper- and lowercase differ. */
+ * are 32 apart; on EBCDIC, they are 64. At compile time, this uses an
+ * exclusive 'or' to find that bit and then inverts it to form a mask, with
+ * just a single 0, in the bit position where the upper- and lowercase differ.
+ * */
#define isALPHA_FOLD_EQ(c1, c2) \
(__ASSERT_(isALPHA_A(c1) || isALPHA_A(c2)) \
((c1) & ~('A' ^ 'a')) == ((c2) & ~('A' ^ 'a')))
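
Review note: the masking trick described in the comment above can be tried outside of perl with a minimal sketch like the one below. It assumes an ASCII platform. `is_ascii_alpha()` and `alpha_fold_eq()` are hypothetical stand-ins for `isALPHA_A()` and `isALPHA_FOLD_EQ()`, not the macros from the patch, and the range test in `is_ascii_alpha()` would not be correct on EBCDIC.

```c
#include <assert.h>

/* Hypothetical stand-in for isALPHA_A(); assumes an ASCII platform,
 * where the letters of each case are contiguous. */
static int is_ascii_alpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Hypothetical function form of isALPHA_FOLD_EQ().  'A' ^ 'a' is a
 * constant expression holding the single bit in which the two cases
 * differ (0x20 on ASCII); inverting it gives a mask that clears that bit,
 * so both cases of a letter compare equal after masking. */
static int alpha_fold_eq(unsigned char c1, unsigned char c2)
{
    /* As in the macro, at least one argument must be an ASCII-range
     * alphabetic; otherwise unrelated pairs such as '[' and '{' would
     * also compare equal. */
    assert(is_ascii_alpha(c1) || is_ascii_alpha(c2));
    return (c1 & ~('A' ^ 'a')) == (c2 & ~('A' ^ 'a'));
}
```

With these definitions, `alpha_fold_eq('a', 'A')` and `alpha_fold_eq('Z', 'z')` are true, while `alpha_fold_eq('a', 'b')` and `alpha_fold_eq('A', '!')` are false.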