parenthesized parts of a regular expression are saved under names
containing only digits after the C<$> (see L<perlop> and L<perlre>).
In addition, several special variables that provide windows into
-the inner working of Perl have names containing punctuation characters
-and control characters. These are documented in L<perlvar>.
+the inner workings of Perl have names containing punctuation characters.
+These are documented in L<perlvar>.
X<variable, built-in>
Scalar values are always named with '$', even when referring to a
of this, see L<perlref>.
Names that start with a digit may contain only more digits. Names
-that do not start with a letter, underscore, digit or a caret (i.e.
-a control character) are limited to one character, e.g., C<$%> or
+that do not start with a letter, underscore, digit or a caret are
+limited to one character, e.g., C<$%> or
C<$$>. (Most of these one-character names have a predefined
significance to Perl. For instance, C<$$> is the current process
-id.)
+id. And all such names are reserved for Perl's possible use.)
=head2 Identifier parsing
X<identifiers>
additionally accepts identifier names beginning with an underscore.
If not under C<use utf8>, the source is treated as ASCII + 128 extra
-controls, and identifiers should match
+generic characters, and identifiers should match
/ (?aa) (?!\d) \w+ /x
Meanwhile, special identifiers don't follow the above rules. For the most
part, all of the identifiers in this category have a special meaning given
by Perl. Because they have special parsing rules, these generally can't be
-fully-qualified. They come in four forms:
+fully-qualified. They come in six forms (but don't use forms 5 and 6):
=over
-=item *
+=item 1.
A sigil, followed solely by digits matching C<\p{POSIX_Digit}>, like
C<$0>, C<$1>, or C<$10000>.
-=item *
+=item 2.
-A sigil, followed by a caret and any one of the characters
-C<[][A-Z^_?\]>, like C<$^V> or C<$^]>, or a sigil followed by a literal non-space,
-non-C<NUL> control character matching the C<\p{POSIX_Cntrl}> property.
-Due to a historical oddity, if not running under C<use utf8>, the 128
-characters in the C<[0x80-0xff]> range are considered to be controls,
-and may also be used in length-one variables. However, the use of
-non-graphical characters is deprecated as of v5.22, and support for them
-will be removed in a future version of perl. ASCII space characters and
-C<NUL> already aren't allowed, so this means that a single-character
-variable name with that name being any other C0 control C<[0x01-0x1F]>,
-or C<DEL> will generate a deprecated warning. Already, under C<"use
-utf8">, non-ASCII characters must match C<Perl_XIDS>. As of v5.22, when
-not under C<"use utf8"> C1 controls C<[0x80-0x9F]>, NO BREAK SPACE, and
-SOFT HYPHEN (C<SHY>)) generate a deprecated warning.
-
-=item *
+A sigil followed by a single character matching the C<\p{POSIX_Punct}>
+property, like C<$!> or C<%+>, except the character C<"{"> doesn't work.
-Similar to the above, a sigil, followed by bareword text in braces,
-where the first character is either a caret followed by any one of
-the characters C<[][A-Z^_?\]>, like C<${^GLOBAL_PHASE}>, or a non-C<NUL>,
-non-space literal
-control like C<${\7LOBAL_PHASE}>. Like the above, when not under
-C<"use utf8">, the characters in C<[0x80-0xFF]> are considered controls, but as
-of v5.22, the use of any that are non-graphical are deprecated, and as
-of v5.20 the use of any ASCII-range literal control is deprecated.
-Support for these will be removed in a future version of perl.
+=item 3.
-=item *
+A sigil, followed by a caret and any one of the characters
+C<[][A-Z^_?\]>, like C<$^V> or C<$^]>.
-A sigil followed by a single character matching the C<\p{POSIX_Punct}>
-property, like C<$!> or C<%+>, except the character C<"{"> doesn't work.
+=item 4.
+
+Similar to the above, a sigil, followed by bareword text in braces,
+where the first character is a caret. The next character is any one of
+the characters C<[][A-Z^_?\]>, followed by ASCII word characters. An
+example is C<${^GLOBAL_PHASE}>.
+
+=item 5.
+
+A sigil, followed by any single character in the range C<[\x80-\xFF]>
+when not under C<S<"use utf8">>. (Under C<S<"use utf8">>, the normal
+identifier rules given earlier in this section apply.) Use of
+non-graphic characters (the C1 controls, the NO-BREAK SPACE, and the
+SOFT HYPHEN) is deprecated and will be forbidden in a future Perl
+version. The use of the other characters is unwise, as these are all
+reserved to have special meaning to Perl; none of them currently
+has a special meaning, though this could change without notice.
+
+Note that an implication of this form is that there are identifiers only
+legal under C<S<"use utf8">>, and vice versa. For example, the identifier
+C<$E<233>tat> is legal under C<S<"use utf8">>, but is otherwise
+considered to be the single character variable C<$E<233>> followed by
+the bareword C<"tat">, the combination of which is a syntax error.
+
+=item 6.
+
+This is a combination of the previous two forms. It is valid only when
+not under S<C<"use utf8">> (normal identifier rules apply when under
+S<C<"use utf8">>). The form is a sigil, followed by text in braces,
+where the first character is any one of the characters in the range
+C<[\x80-\xFF]> followed by ASCII word characters up to the trailing
+brace.
+
+The same caveats as the previous form apply: The non-graphic characters
+are deprecated, it is unwise to use this form at all, and utf8ness makes
+a big difference.
=back
-Note that as of Perl 5.20, literal control characters in variable names
-are deprecated; and as of Perl 5.22, any other non-graphic characters
-are also deprecated.
+Prior to Perl v5.24, non-graphical ASCII control characters were also
+allowed in some situations; this had been deprecated since v5.20.
=head2 Context
X<context> X<scalar context> X<list context>
=head1 Incompatible Changes
-XXX For a release on a stable branch, this section aspires to be:
-
- There are no changes intentionally incompatible with 5.XXX.XXX
- If any exist, they are bugs, and we request that you submit a
- report. See L</Reporting Bugs> below.
-
-[ List each incompatible change as a =head2 entry ]
+=head2 ASCII characters in variable names must now be all visible
+
+It was legal until now on ASCII platforms for variable names to contain
+non-graphical ASCII control characters (ordinals 0 through 31, and 127,
+which are the C0 controls and C<DELETE>). This usage has been
+deprecated since v5.20, and as of now causes a syntax error. The
+variables these names referred to are special, reserved by Perl for
+whatever use it may choose, now, or in the future. Each such variable
+has an alternative way of spelling it. Instead of the single
+non-graphic control character, a two character sequence beginning with a
+caret is used, like C<$^]> and C<${^GLOBAL_PHASE}>. Details are at
+L<perlvar>. It remains legal, though unwise and deprecated (raising a
+deprecation warning), to use certain non-graphic non-ASCII characters in
+variable names when not under S<C<use utf8>>. No code should do this,
+as all such variables are reserved by Perl, and Perl doesn't currently
+define any of them (but could at any time, without notice).
=head2 The C<autoderef> feature has been removed
may contain letters, digits, underscores, or the special sequence
C<::> or C<'>. In this case, the part before the last C<::> or
C<'> is taken to be a I<package qualifier>; see L<perlmod>.
-
-Perl variable names may also be a sequence of digits or a single
-punctuation or control character (with the literal control character
-form deprecated). These names are all reserved for
+A Unicode letter that is not ASCII is not considered to be a letter
+unless S<C<"use utf8">> is in effect, and somewhat more complicated
+rules apply; see L<perldata/Identifier parsing> for details.
+
+Perl variable names may also be a sequence of digits, a single
+punctuation character, or the two-character sequence: C<^> (caret or
+CIRCUMFLEX ACCENT) followed by any one of the characters C<[][A-Z^_?\]>.
+These names are all reserved for
special uses by Perl; for example, the all-digits names are used
to hold data captured by backreferences after a regular expression
-match. Perl has a special syntax for the single-control-character
-names: It understands C<^X> (caret C<X>) to mean the control-C<X>
-character. For example, the notation C<$^W> (dollar-sign caret
-C<W>) is the scalar variable whose name is the single character
-control-C<W>. This is better than typing a literal control-C<W>
-into your program.
-
-Since Perl v5.6.0, Perl variable names may be alphanumeric strings that
-begin with a caret (or a control character, but this form is
-deprecated).
-These variables must be written in the form C<${^Foo}>; the braces
-are not optional. C<${^Foo}> denotes the scalar variable whose
-name is a control-C<F> followed by two C<o>'s. These variables are
+match.
+
+Since Perl v5.6.0, Perl variable names may also be alphanumeric strings
+preceded by a caret. These must all be written in the form C<${^Foo}>;
+the braces are not optional. C<${^Foo}> denotes the scalar variable
+whose name is considered to be a control-C<F> followed by two C<o>'s.
+These variables are
reserved for future special uses by Perl, except for the ones that
-begin with C<^_> (control-underscore or caret-underscore). No
-control-character name that begins with C<^_> will acquire a special
+begin with C<^_> (caret-underscore). No
+name that begins with C<^_> will acquire a special
meaning in any future version of Perl; such names may therefore be
used safely in programs. C<$^_> itself, however, I<is> reserved.
-Perl identifiers that begin with digits, control characters, or
+Perl identifiers that begin with digits or
punctuation characters are exempt from the effects of the C<package>
declaration and are always forced to be in package C<main>; they are
also exempt from C<strict 'vars'> errors. A few other names are also
Use of bare << to mean <<"" is deprecated at - line 2.
########
# toke.c
-BEGIN {
- if (ord('A') == 193) {
- print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC";
- exit 0;
- }
-}
-eval "\$\cT";
-eval "\${\7LOBAL_PHASE}";
-eval "\${\cT}";
-eval "\${\n\cT}";
-eval "\${\cT\n}";
-my $ret = eval "\${\n\cT\n}";
-print "ok\n" if $ret == $^T;
-
-no warnings 'deprecated' ;
-eval "\$\cT";
-eval "\${\7LOBAL_PHASE}";
-eval "\${\cT}";
-eval "\${\n\cT}";
-eval "\${\cT\n}";
-eval "\${\n\cT\n}";
-
-EXPECT
-Use of literal control characters in variable names is deprecated at (eval 1) line 1.
-Use of literal control characters in variable names is deprecated at (eval 2) line 1.
-Use of literal control characters in variable names is deprecated at (eval 3) line 1.
-Use of literal control characters in variable names is deprecated at (eval 4) line 2.
-Use of literal control characters in variable names is deprecated at (eval 5) line 1.
-Use of literal control characters in variable names is deprecated at (eval 6) line 2.
-ok
-########
-# toke.c
$a =~ m/$foo/eq;
$a =~ s/$foo/fool/seq;
########
# toke.c
#[perl #119123] disallow literal control character variables
-BEGIN {
- if (ord('A') == 193) {
- print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC";
- exit 0;
- }
-}
-eval "\$\cQ = 25";
-eval "\${ \cX } = 24";
*{
Foo
}; # shouldn't warn on {\n, even though \n is a control character
EXPECT
-Use of literal control characters in variable names is deprecated at (eval 1) line 1.
-Use of literal control characters in variable names is deprecated at (eval 2) line 1.
########
# toke.c
# [perl #120288] -X at start of line gave spurious warning, where X is not
use open qw( :utf8 :std );
no warnings qw(misc reserved);
-plan (tests => 66900);
+plan (tests => 66894);
# ${single:colon} should not be treated as a simple variable, but as a
# block with a label inside.
$syntax_error = 1;
}
elsif ($chr =~ /[[:cntrl:]]/a) {
- if ($chr eq "\N{NULL}") {
- $name = sprintf "\\x%02x, NUL", $ord;
- $syntax_error = 1;
- }
- else {
- $name = sprintf "\\x%02x, an ASCII control", $ord;
- $syntax_error = $::IS_EBCDIC;
- $deprecated = ! $syntax_error;
- }
+ $name = sprintf "\\x%02x, an ASCII control", $ord;
+ $syntax_error = 1;
}
elsif ($chr =~ /\pC/) {
if ($chr eq "\N{SHY}") {
" ... and the same under 'use utf8'");
$tests++;
}
- elsif ($ord < 32 || $chr =~ /[[:punct:][:digit:]]/a) {
+ elsif ($chr =~ /[[:punct:][:digit:]]/a) {
# Unlike other variables, we dare not try setting the length-1
- # variables that are \cX (for all valid X) nor ASCII ones that are
- # punctuation nor digits. This is because many of these variables
- # have meaning to the system, and setting them could have side
- # effects or not work as expected (And using fresh_perl() doesn't
- # always help.) For example, setting $^D (to use a visible
- # representation of code point 0x04) turns on tracing, and setting
- # $^E sets an error number, but what gets printed is instead a
- # string associated with that number. For all these we just
- # verify that they don't generate a syntax error.
+ # variables that are ASCII punctuation and digits. This is
+ # because many of these variables have meaning to the system, and
+ # setting them could have side effects or not work as expected
+ # (And using fresh_perl() doesn't always help.) For all these we
+ # just verify that they don't generate a syntax error.
local $@;
evalbytes "\$$chr;";
is $@, '', "$name as a length-1 variable doesn't generate a syntax error";
{
no strict;
- # Silence the deprecation warning for literal controls
- no warnings 'deprecated';
- for my $var ( '$', "\7LOBAL_PHASE", "^GLOBAL_PHASE", "^V" ) {
- SKIP: {
- skip("Literal control characters in variable names forbidden on EBCDIC", 3)
- if ($::IS_EBCDIC && ord substr($var, 0, 1) < 32);
+ for my $var ( '$', "^GLOBAL_PHASE", "^V" ) {
eval "\${ $var}";
is($@, '', "\${ $var} works" );
eval "\${$var }";
is($@, '', "\${$var } works" );
eval "\${ $var }";
is($@, '', "\${ $var } works" );
- }
}
+ my $var = "\7LOBAL_PHASE";
+ eval "\${ $var}";
+ like($@, qr/Unrecognized character \\x07/,
+ "\${ $var} generates 'Unrecognized character' error" );
+ eval "\${$var }";
+ like($@, qr/Unrecognized character \\x07/,
+ "\${$var } generates 'Unrecognized character' error" );
+ eval "\${ $var }";
+ like($@, qr/Unrecognized character \\x07/,
+ "\${ $var } generates 'Unrecognized character' error" );
}
}
);
}
- SKIP: {
- skip("Literal control characters in variable names forbidden on EBCDIC", 2)
- if $::IS_EBCDIC;
- no warnings 'deprecated';
my $ret = eval "\${\cT\n}";
- is($@, "", 'No errors from using ${\n\cT\n}');
- is($ret, $^T, " ... and we got the right value");
- }
-}
-
-SKIP: {
- skip("Literal control characters in variable names forbidden on EBCDIC", 5)
- if $::IS_EBCDIC;
-
- # Originally from t/base/lex.t, moved here since we can't
- # turn deprecation warnings off in that file.
- no strict;
- no warnings 'deprecated';
-
- my $CX = "\cX";
- $ {$CX} = 17;
-
- # Does the syntax where we use the literal control character still work?
- is(
- eval "\$ {\cX}",
- 17,
- "Literal control character variables work"
- );
-
- eval "\$\cQ = 24"; # Literal control character
- is($@, "", " ... and they can be assigned to without error");
- is(${"\cQ"}, 24, " ... and the assignment works");
- is($^Q, 24, " ... even if we access the variable through the caret name");
- is(\${"\cQ"}, \$^Q, '\${\cQ} == \$^Q');
+    like($@, qr/\QUnrecognized character/, '${\cT\n} gives an error message');
}
{
/* Is the byte 'd' a legal single character identifier name? 'u' is true
* iff Unicode semantics are to be used. The legal ones are any of:
* a) all ASCII characters except:
- * 1) space-type ones, like \t and SPACE;
- 2) NUL;
- * 3) '{'
+ * 1) control and space-type ones, like NUL, SOH, \t, and SPACE;
+ * 2) '{'
* The final case currently doesn't get this far in the program, so we
* don't test for it. If that were to change, it would be ok to allow it.
* c) When not under Unicode rules, any upper Latin1 character
: (isGRAPH_L1(*s) \
&& LIKELY((U8) *(s) != LATIN1_TO_NATIVE(0xAD)))))
#else
-# define VALID_LEN_ONE_IDENT(s, is_utf8) (! isSPACE_A(*(s)) \
- && LIKELY(*(s) != '\0') \
- && (! is_utf8 \
- || isASCII_utf8((U8*) (s)) \
- || isIDFIRST_utf8((U8*) (s))))
+# define VALID_LEN_ONE_IDENT(s, is_utf8) \
+ (isGRAPH_A(*(s)) || ((is_utf8) \
+ ? isIDFIRST_utf8((U8*) (s)) \
+ : ! isASCII_utf8((U8*) (s))))
#endif
if ((s <= PL_bufend - (is_utf8)
? UTF8SKIP(s)
: (! isGRAPH_L1( (U8) *s)
|| UNLIKELY((U8) *(s) == LATIN1_TO_NATIVE(0xAD))))
{
- /* Split messages for back compat */
- if (isCNTRL_A( (U8) *s)) {
- deprecate("literal control characters in variable names");
- }
- else {
- deprecate("literal non-graphic characters in variable names");
- }
+ deprecate("literal non-graphic characters in variable names");
}
if (is_utf8) {