}
# Look at each alias
+ my $is_last_resort = 0;
+ my $deprecated_or_discouraged
+ = qr/ ^ (?: $DEPRECATED | $DISCOURAGED ) $/x;
foreach my $alias ($self->aliases()) {
# Don't use an alias that isn't ok to use for an external name.
my $name = main::Standardize($alias->name);
trace $self, $name if main::DEBUG && $to_trace;
- # Take the first one, or a shorter one that isn't numeric. This
+ # Take the first one, or any non-deprecated non-discouraged one
+ # over one that is, or a shorter one that isn't numeric. This
# relies on numeric aliases always being last in the array
# returned by aliases(). Any alpha one will have precedence.
- if (! defined $short_name{$addr}
+ if ( ! defined $short_name{$addr}
+ || ( $is_last_resort
+ && $alias->status !~ $deprecated_or_discouraged)
|| ($name =~ /\D/
&& length($name) < length($short_name{$addr})))
{
($short_name{$addr} = $name) =~ s/ (?<= . ) _ (?= . ) //xg;
$nominal_short_name_length{$addr} = length $name;
+ $is_last_resort = $alias->status =~ $deprecated_or_discouraged;
}
}
# If the short name isn't a nice one, perhaps an equivalent table has
# a better one.
- if (! defined $short_name{$addr}
- || $short_name{$addr} eq ""
- || $short_name{$addr} eq "_")
+ if ( $self->can('children')
+ && ( ! defined $short_name{$addr}
+ || $short_name{$addr} eq ""
+ || $short_name{$addr} eq "_"))
{
my $return;
foreach my $follower ($self->children) { # All equivalents
my $status = $alias->status;
if ($nominal_property == $block) {
- # For block properties, the 'In' form is preferred for
- # external use; the pod file contains wild cards for
- # this and the 'Is' form so no entries for those; and
- # we don't want people using the name without the
- # 'In', so discourage that.
+ # For block properties, only the compound form is
+ # preferred for external use; the others are
+ # discouraged. The pod file contains wild cards for
+ # the 'In' and 'Is' forms so no entries for those; and
+ # we don't want people using the name without any
+ # prefix, so discourage that.
if ($prefix eq "") {
$make_re_pod_entry = 1;
$status = $status || $DISCOURAGED;
}
elsif ($prefix eq 'In_') {
$make_re_pod_entry = 0;
- $status = $status || $NORMAL;
+ $status = $status || $DISCOURAGED;
$ok_as_filename = 1;
}
else {
# And if this is a compound form name, see if there is a
# single form equivalent
my $single_form;
- if ($table_property != $perl) {
+ if ($table_property != $perl && $table_property != $block) {
# Special case the binary N tables, so that will print
# \P{single}, but use the Y table values to populate
'\p{Block: *}'
. (($has_In_conflicts)
? " $exception_message"
- : ""));
+ : ""),
+ $DISCOURAGED);
@block_warning = << "END";
-Matches in the Block property have shortcuts that begin with "In_". For
-example, C<\\p{Block=Latin1}> can be written as C<\\p{In_Latin1}>. For
-backward compatibility, if there is no conflict with another shortcut, these
-may also be written as C<\\p{Latin1}> or C<\\p{Is_Latin1}>. But, N.B., there
-are numerous such conflicting shortcuts. Use of these forms for Block is
-discouraged, and are flagged as such, not only because of the potential
-confusion as to what is meant, but also because a later release of Unicode may
-preempt the shortcut, and your program would no longer be correct. Use the
-"In_" form instead to avoid this, or even more clearly, use the compound form,
-e.g., C<\\p{blk:latin1}>. See L<perlunicode/"Blocks"> for more information
-about this.
+In particular, matches in the Block property have single forms
+defined by Perl that begin with C<"In_">, C<"Is_">, or even with no prefix at
+all. Like all B<DISCOURAGED> forms, these are not stable. For example,
+C<\\p{Block=Deseret}> can currently be written as C<\\p{In_Deseret}>,
+C<\\p{Is_Deseret}>, or C<\\p{Deseret}>. But, a new Unicode version may
+come along that would force Perl to change the meaning of one or more of
+these, and your program would no longer be correct. Currently there are no
+such conflicts with the form that begins C<"In_">, but there are many with the
+other two shortcuts, and Unicode continues to define new properties that begin
+with C<"In">, so it's quite possible that a conflict will occur in the future.
+The compound form is guaranteed to not become obsolete, and its meaning is
+clearer anyway. See L<perlunicode/"Blocks"> for more information about this.
END
}
my $text = $Is_flags_text;
obsolete. Generally this designation is given to properties that Unicode once
used for internal purposes (but not any longer).
-=back
+=item Discouraged
+
+This is not actually a Unicode-specified obsolescence, but applies to certain
+Perl extensions that are present for backwards compatibility, but are
+discouraged from being used. These are not obsolete, but their meanings are
+not stable. Future Unicode versions could force any of these extensions to be
+removed without warning, replaced by another property with the same name that
+means something different. $A_bold_discouraged flags each such entry in the
+table. Use the equivalent shown instead.
-Some Perl extensions are present for backwards compatibility and are
-discouraged from being used, but are not obsolete. $A_bold_discouraged
-flags each such entry in the table. Future Unicode versions may force
-some of these extensions to be removed without warning, replaced by another
-property with the same name that means something different. Use the
-equivalent shown instead.
+@block_warning
=back
-@block_warning
+=back
The table below has two columns. The left column contains the C<\\p{}>
constructs to look up, possibly preceded by the flags mentioned above; and
Block names are matched in the compound form, like C<\p{Block: Arrows}> or
C<\p{Blk=Hebrew}>. Unlike most other properties, only a few block names have a
-Unicode-defined short name. But Perl does provide a (slight, no longer
-recommended) shortcut: You can say, for example C<\p{In_Arrows}> or
-C<\p{In_Hebrew}>.
-
-For backwards compatibility, the C<In> prefix may be
-omitted if there is no naming conflict with a script or any other
-property, and you can even use an C<Is> prefix instead in those cases.
-But don't do this for new code because your code could break in new
-releases, and this has already happened: There was a time in very
-early Unicode releases when C<\p{Hebrew}> would have matched the
-I<block> Hebrew; now it doesn't.
-
-Using the C<In> prefix avoids this ambiguity, so far. But new versions
-of Unicode continue to add new properties whose names begin with C<In>.
-There is a possibility that one of them someday will conflict with your
-usage. Since this is just a Perl extension, Unicode's name will take
-precedence and your code will become broken. Also, Unicode is free to
-add a script whose name begins with C<In>; that would cause problems.
-
-So it's clearer and best to use the compound form when specifying
-blocks. And be sure that is what you really really want to do. In most
-cases scripts are what you want instead.
-
-A complete list of blocks and their shortcuts is in L<perluniprops>.
+Unicode-defined short name.
+
+Perl also defines single form synonyms for the block property in cases
+where these do not conflict with something else. But don't use any of
+these, because they are unstable. Since these are Perl extensions, they
+are subordinate to official Unicode property names; Unicode neither knows
+nor cares about Perl's extensions. It may happen that a name that
+currently means the Perl extension will later be changed without warning
+to mean a different Unicode property in a future version of the perl
+interpreter that uses a later Unicode release, and your code would no
+longer work. The extensions are mentioned here for completeness: Take
+the block name and prefix it with one of: C<In> (for example
+C<\p{Blk=Arrows}> can currently be written as C<\p{In_Arrows}>); or
+sometimes C<Is> (like C<\p{Is_Arrows}>); or sometimes no prefix at all
+(C<\p{Arrows}>). As of this writing (Unicode 8.0) there are no
+conflicts with using the C<In_> prefix, but there are plenty with the
+other two forms. For example, C<\p{Is_Hebrew}> and C<\p{Hebrew}> mean
+C<\p{Script=Hebrew}> which is NOT the same thing as C<\p{Blk=Hebrew}>. Our
+advice used to be to use the C<In_> prefix as a single form way of
+specifying a block. But Unicode 8.0 added properties whose names begin
+with C<In>, and it's now clear that it's only luck that's so far
+prevented a conflict. Using C<In> is only marginally less typing than
+C<Blk:>, and the latter's meaning is clearer anyway, and guaranteed to
+never conflict. So don't take chances. Use C<\p{Blk=foo}> for new
+code. And be sure that a block is what you really want. In
+most cases scripts are what you want instead.
+
+A complete list of blocks is in L<perluniprops>.
=head3 B<Other Properties>