=back
-All forms other than C<S<"use charnames ();">> also enable the use of
-C<\N{I<CHARNAME>}> sequences to compile a Unicode character into a
-string, based on its name.
+Starting in Perl 5.16, any occurrence of a C<\N{I<CHARNAME>}> sequence
+in a double-quotish string automatically loads this module with arguments
+C<:full> and C<:short> (described below) if it hasn't already been loaded with
+different arguments, in order to compile the named Unicode character into
+its position in the string. Prior to 5.16, an explicit S<C<use charnames>> was
+required to enable this usage. (However, prior to 5.16, the form C<S<"use
+charnames ();">> did not enable C<\N{I<CHARNAME>}>.)
Note that C<\N{U+I<...>}>, where the I<...> is a hexadecimal number,
-also inserts a character into a string, but doesn't require the use of
-this pragma. The character it inserts is the one whose code point
+also inserts a character into a string.
+The character it inserts is the one whose code point
(ordinal value) is equal to the number. For example, C<"\N{U+263a}"> is
-the Unicode (white background, black foreground) smiley face; it doesn't
-require this pragma, whereas the equivalent, C<"\N{WHITE SMILING FACE}">
-does.
+the Unicode (white background, black foreground) smiley face,
+equivalent to C<"\N{WHITE SMILING FACE}">.
Also note, C<\N{I<...>}> can mean a regex quantifier instead of a character
name, when the I<...> is a number (or comma separated pair of numbers
(see L<perlreref/QUANTIFIERS>), and is not related to this pragma.
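A minimal sketch of the behavior described above, assuming Perl 5.16 or later (the C<use 5.016> line states that assumption). No C<use charnames> line appears, yet the full-name, code-point, and short forms of C<\N{...}> all compile. The variable names are illustrative only.

```perl
use 5.016;      # \N{...} autoloads charnames starting in 5.16; also enables strict
use warnings;

# No "use charnames" here: the first \N{NAME} loads it with :full and :short.
my $by_name  = "\N{WHITE SMILING FACE}";   # :full name
my $by_code  = "\N{U+263A}";               # code point form, needs no name lookup
my $by_short = "\N{greek:Sigma}";          # :short SCRIPT:NAME form (capital sigma)

printf "U+%04X %s\n", ord $by_name, ($by_name eq $by_code ? "same" : "different");  # prints "U+263A same"
printf "U+%04X\n", ord $by_short;                                                    # prints "U+03A3"
```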
# ---- For the alias extensions
require "../t/lib/common.pl";
-use charnames ':full';
-
-is("Here\N{EXCLAMATION MARK}?", "Here!?");
+is("Here\N{EXCLAMATION MARK}?", "Here!?", "Basic sanity, autoload of :full upon \\N");
+is("\N{latin: Q}", "Q", "autoload of :short upon \\N");
{
use bytes; # TEST -utf8 can switch utf8 on
# to differentiate between it and gc=c, which can be written as 'isc',
# which is the same characters as ISO_Comment's short name.
- 'Name' => "Accessible via 'use charnames;' or Unicode::UCD::prop_invmap()",
+ 'Name' => "Accessible via \\N{...} or 'use charnames;' or Unicode::UCD::prop_invmap()",
'Simple_Case_Folding' => "$simple. Can access this through Unicode::UCD::casefold or Unicode::UCD::prop_invmap()",
'Simple_Lowercase_Mapping' => "$simple. Can access this through Unicode::UCD::charinfo or Unicode::UCD::prop_invmap()",
expressions.
And, the Name and Name_Aliases properties are accessible through the C<\\N{}>
-interpolation in double-quoted strings and regular expressions, but both
-usages require a L<use charnames;|charnames> to be specified, which also
-contains related functions viacode(), vianame(), and string_vianame().
+interpolation in double-quoted strings and regular expressions, and through the
+functions C<charnames::viacode()>, C<charnames::vianame()>, and
+C<charnames::string_vianame()> (which require a C<use charnames ();> to be
+specified).
Finally, most properties related to decomposition are accessible via
L<Unicode::Normalize>.
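The lookup functions are not provided by the automatic load of C<\N{...}>; a minimal sketch, assuming Perl 5.16 or later, of calling them after an explicit C<use charnames ();>. The variable names are illustrative.

```perl
use strict;
use warnings;
use charnames ();   # defines charnames::viacode() etc.; does not call import()

my $name = charnames::viacode(0x263A);                        # name for a code point
my $cp   = charnames::vianame("WHITE SMILING FACE");          # code point, as a number
my $str  = charnames::string_vianame("WHITE SMILING FACE");   # the character, as a string

printf "%s => U+%04X, length %d\n", $name, $cp, length $str;  # WHITE SMILING FACE => U+263A, length 1
```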
[ List each enhancement as a =head2 entry ]
+=head2 C<use charnames> no longer needed for C<\N{I<name>}>
+
+The C<charnames> module is now automatically loaded when a C<\N{I<name>}>
+escape needs it, as if the C<:full> and C<:short> options had been
+specified, unless it has already been loaded with other options. See
+L<charnames>.
+
=head1 Security
XXX Any security-related notices go here. In particular, any security
(F) The parser found inconsistencies either while attempting to define
an overloaded constant, or when trying to find the character name
specified in the C<\N{...}> escape. Perhaps you forgot to load the
-corresponding C<overload> or C<charnames> pragma? See L<charnames> and
-L<overload>.
+corresponding L<overload> pragma?
=item Constant(%s)%s: %s in regex; marked by <-- HERE in m/%s/
(F) The parser found inconsistencies while attempting to find
-the character name specified in the C<\N{...}> escape. Perhaps you
-forgot to load the corresponding C<charnames> pragma?
-See L<charnames>.
+the character name specified in the C<\N{...}> escape.
=item Constant is not %s reference
Certain sequences of characters also have names.
To specify by name, the name of the character or character sequence goes
-between the curly braces. In this case, you have to C<use charnames> to
-load the Unicode names of the characters; otherwise Perl will complain.
+between the curly braces.
To specify a character by Unicode code point, use the form C<\N{U+I<code
point>}>, where I<code point> is a number in hexadecimal that gives the
=head4 Example
- use charnames ':full'; # Loads the Unicode names.
$str =~ /\N{THAI CHARACTER SO SO}/; # Matches the Thai SO SO character
use charnames 'Cyrillic'; # Loads Cyrillic names.
represent or match the astrological sign for the planet Mercury, we
could use
- use charnames ":full"; # use named chars with Unicode full names
$x = "abc\N{MERCURY}def";
$x =~ /\N{MERCURY}/; # matches
-One can also use short names or restrict names to a certain alphabet:
+One can also use "short" names:
- use charnames ':full';
print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";
-
- use charnames ":short";
print "\N{greek:Sigma} is an upper-case sigma.\n";
+You can also restrict names to a certain alphabet by specifying the
+L<charnames> pragma:
+
use charnames qw(greek);
print "\N{sigma} is Greek sigma\n";
character class, which is the negation of the C<\p{name}> class. For
example, to match lower and uppercase characters,
- use charnames ":full"; # use named chars with Unicode full names
$x = "BOB";
$x =~ /^\p{IsUpper}/; # matches, uppercase char class
$x =~ /^\P{IsUpper}/; # doesn't match, char class sans uppercase
character rather than the Unicode one, thus it is more portable to use
C<\N{U+...}> instead.
-Additionally, if you
-
- use charnames ':full';
-
-you can use the C<\N{...}> notation and put the official Unicode
-character name within the braces, such as C<\N{WHITE SMILING FACE}>.
-See L<charnames>.
+Additionally, you can use the C<\N{...}> notation and put the official
+Unicode character name within the braces, such as
+C<\N{WHITE SMILING FACE}>. This automatically loads the L<charnames>
+module with the C<:full> and C<:short> options. If you prefer different
+options for this module, you can instead load it explicitly, with your
+desired options, before the first C<\N{...}>; for example,
+
+ use charnames ':loose';
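A minimal sketch of the explicit-load alternative, assuming Perl 5.16 or later. Because C<charnames> is already loaded with C<:loose> before the C<\N{...}>, the automatic C<:full>/C<:short> load does not replace it, and the name can be written in lower case. The variable name is illustrative.

```perl
use strict;
use warnings;
use charnames ':loose';   # loaded explicitly, before any \N{...}

# Loose matching ignores case and most spaces, hyphens and underscores,
# so this lower-case spelling names U+263A WHITE SMILING FACE.
my $smiley = "\N{white smiling face}";
printf "U+%04X\n", ord $smiley;   # prints "U+263A"
```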
=item *
characters regardless of the numeric value, use C<pack("U", ...)>
instead of C<\x..>, C<\x{...}>, or C<chr()>.
-You can also use the C<charnames> pragma to invoke characters
+You can invoke characters
by name in double-quoted strings:
- use charnames ':full';
my $arabic_alef = "\N{ARABIC LETTER ALEF}";
And, as mentioned above, you can also C<pack()> numbers into Unicode
Note that Perl considers grapheme clusters to be separate characters, so for
example
- use charnames ':full';
print length("\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}"), "\n";
will print 2, not 1. The only exception is that regular expressions
-Tests for use charnames with aliases.
-(With the exception of the first test, which otherwise would need its own file)
+Tests for use charnames with compilation errors and aliases.
__END__
# unsupported pragma
use warnings;
OPTIONS regex
unsupported special ':scoobydoo' in charnames at
########
+# NAME autoload doesn't get vianame
+print "Here: \N{DIGIT ONE}\n";
+charnames::vianame("DIGIT TWO");
+EXPECT
+Undefined subroutine &charnames::vianame called at - line 2.
+Here: 1
+########
+# NAME autoload doesn't get viacode
+print "Here: \N{DIGIT THREE}\n";
+charnames::viacode(0x34);
+EXPECT
+OPTIONS regex
+Undefined subroutine &charnames::viacode called at - line 2.
+Here: 3
+########
+# NAME autoload doesn't get string_vianame
+print "Here: \N{DIGIT FOUR}\n";
+charnames::string_vianame("DIGIT FIVE");
+EXPECT
+OPTIONS regex
+Undefined subroutine &charnames::string_vianame called at - line 2.
+Here: 4
+########
# wrong type of alias (missing colon)
no warnings;
use charnames "alias";
SV *sv, SV *pv, const char *type, STRLEN typelen)
{
dVAR; dSP;
- HV * const table = GvHV(PL_hintgv); /* ^H */
+ HV * table = GvHV(PL_hintgv); /* ^H */
SV *res;
SV **cvp;
SV *cv, *typesv;
if (PL_error_count > 0 && strEQ(key,"charnames"))
return &PL_sv_undef;
- if (!table || !(PL_hints & HINT_LOCALIZE_HH)) {
+ if (!table
+ || ! (PL_hints & HINT_LOCALIZE_HH)
+ || ! (cvp = hv_fetch(table, key, keylen, FALSE))
+ || ! SvOK(*cvp))
+ {
SV *msg;
- why2 = (const char *)
- (strEQ(key,"charnames")
- ? "(possibly a missing \"use charnames ...\")"
- : "");
- msg = Perl_newSVpvf(aTHX_ "Constant(%s) unknown: %s",
- (type ? type: "undef"), why2);
-
- /* This is convoluted and evil ("goto considered harmful")
- * but I do not understand the intricacies of all the different
- * failure modes of %^H in here. The goal here is to make
- * the most probable error message user-friendly. --jhi */
-
- goto msgdone;
-
+ /* Here we haven't found what we're looking for. If it is charnames,
+ * perhaps it needs to be loaded. Try doing that before giving up */
+ if (strEQ(key,"charnames")) {
+ Perl_load_module(aTHX_
+ 0,
+ newSVpvs("_charnames"),
+ /* version parameter; no need to specify it, as too early
+ * a version would fail anyway, not being able to find
+ * '_charnames' */
+ NULL,
+ newSVpvs(":full"),
+ newSVpvs(":short"),
+ NULL);
+ SPAGAIN;
+ table = GvHV(PL_hintgv);
+ if (table
+ && (PL_hints & HINT_LOCALIZE_HH)
+ && (cvp = hv_fetch(table, key, keylen, FALSE))
+ && SvOK(*cvp))
+ {
+ goto now_ok;
+ }
+ }
+ if (!table || !(PL_hints & HINT_LOCALIZE_HH)) {
+ msg = Perl_newSVpvf(aTHX_
+ "Constant(%s) unknown", (type ? type: "undef"));
+ }
+ else {
+ why1 = "$^H{";
+ why2 = key;
+ why3 = "} is not defined";
report:
msg = Perl_newSVpvf(aTHX_ "Constant(%s): %s%s%s",
(type ? type: "undef"), why1, why2, why3);
- msgdone:
+ }
yyerror(SvPVX_const(msg));
SvREFCNT_dec(msg);
return sv;
}
-
- cvp = hv_fetch(table, key, keylen, FALSE);
- if (!cvp || !SvOK(*cvp)) {
- why1 = "$^H{";
- why2 = key;
- why3 = "} is not defined";
- goto report;
- }
+now_ok:
sv_2mortal(sv); /* Parent created it permanently */
cv = *cvp;
if (!pv && s)