=head1 DESCRIPTION
+In Perl, the operator determines what operation is performed,
+independent of the type of the operands. For example C<$x + $y>
+is always a numeric addition, and if C<$x> or C<$y> does not contain
+a number, an attempt is made to convert it to a number first.
+
+This is in contrast to many other dynamic languages, where the
+operation is determined by the type of the first argument. It also
+means that Perl has two versions of some operators, one for numeric
+and one for string comparison. For example C<$x == $y> compares
+two numbers for equality, and C<$x eq $y> compares two strings.
+
+There are a few exceptions though: C<x> can be either string
+repetition or list repetition, depending on the type of the left
+operand, and C<&>, C<|> and C<^> can be either string or numeric bit
+operations.
+
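As a minimal sketch of the points above (the variable names are only
illustrative and do not come from this document), the same string operands
give different results depending on which operator is applied:

```perl
use strict;
use warnings;

# The operator, not the operand type, selects the operation.
my $sum    = "10" + "20";    # numeric addition: 30
my $concat = "10" . "20";    # string concatenation: "1020"

# Two comparison families: numeric (==) and string (eq).
my $num_equal = ("1.0" == "1") ? 1 : 0;    # 1: equal as numbers
my $str_equal = ("1.0" eq "1") ? 1 : 0;    # 0: different as strings

# One of the exceptions: "x" repeats a string or a list,
# depending on its left operand.
my $line  = "-" x 5;    # "-----"
my @zeros = (0) x 3;    # (0, 0, 0)
```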
=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>
left || //
nonassoc .. ...
right ?:
- right = += -= *= etc.
+ right = += -= *= etc. goto last next redo dump
left , =>
nonassoc list operators (rightward)
right not
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.
+The dereferencing cases (as opposed to method-calling cases) are
+somewhat extended by the experimental C<postderef> feature. For the
+details of that feature, consult L<perlref/Postfix Dereference Syntax>.
+
=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->
print ++$j; # prints 1
Note that just as in C, Perl doesn't define B<when> the variable is
-incremented or decremented. You just know it will be done sometime
-before or after the value is returned. This also means that modifying
+incremented or decremented. You just know it will be done sometime
+before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:
X<**> X<exponentiation> X<power>
Binary "**" is the exponentiation operator. It binds even more
-tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
+tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
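A short sketch of that binding rule follows; the variable names are
illustrative only. The last line additionally relies on C<**> being
right associative.

```perl
use strict;
use warnings;

my $negated = -2**4;      # -(2**4), that is -16
my $grouped = (-2)**4;    # 16
my $chained = 2**3**2;    # right associative: 2**(3**2), that is 512
```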
returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
-the string to a numeric and the arithmetic negation is performed. If the
+the string to a numeric and the arithmetic negation is performed. If the
string cannot be cleanly converted to a numeric, Perl will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>
If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
-time. Note that this means that its contents will be interpolated twice, so
+time. Note that this means that its
+contents will be interpolated twice, so
'\\' =~ q'\\';
Binary "%" is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer
-operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
-C<$a> minus the largest multiple of C<$b> less than or equal to
-C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
-smallest multiple of C<$b> that is not less than C<$a> (that is, the
+operands C<$m> and C<$n>: If C<$n> is positive, then C<$m % $n> is
+C<$m> minus the largest multiple of C<$n> less than or equal to
+C<$m>. If C<$n> is negative, then C<$m % $n> is C<$m> minus the
+smallest multiple of C<$n> that is not less than C<$m> (that is, the
result will be less than or equal to zero). If the operands
-C<$a> and C<$b> are floating point values and the absolute value of
-C<$b> (that is C<abs($b)>) is less than C<(UV_MAX + 1)>, only
-the integer portion of C<$a> and C<$b> will be used in the operation
+C<$m> and C<$n> are floating point values and the absolute value of
+C<$n> (that is C<abs($n)>) is less than C<(UV_MAX + 1)>, only
+the integer portion of C<$m> and C<$n> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
-If the absolute value of the right operand (C<abs($b)>) is greater than
+If the absolute value of the right operand (C<abs($n)>) is greater than
or equal to C<(UV_MAX + 1)>, "%" computes the floating-point remainder
-C<$r> in the equation C<($r = $a - $i*$b)> where C<$i> is a certain
+C<$r> in the equation C<($r = $m - $i*$n)> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
-C<$b> (B<not> as the left operand C<$a> like C function C<fmod()>)
-and the absolute value less than that of C<$b>.
+C<$n> (B<not> as the left operand C<$m> like C function C<fmod()>)
+and the absolute value less than that of C<$n>.
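The sign rules above can be sketched with small integers. This is only an
illustration with made-up variable names, and it assumes C<use integer> is
not in scope:

```perl
use strict;
use warnings;

my @remainders = (
     7 %  3,    #  1
    -7 %  3,    #  2: -7 minus -9, the largest multiple of 3 that is <= -7
     7 % -3,    # -2: 7 minus 9, the smallest multiple of -3 that is >= 7
    -7 % -3,    # -1: -7 minus -6
);

# Floating point operands are truncated to their integer portions first.
my $truncated = 7.9 % 3.2;    # same as 7 % 3, that is 1
```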
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/STRING/>, it repeats the list.
-If the right operand is zero or negative, it returns an empty string
+If the right operand is zero or negative (raising a warning on
+negative), it returns an empty string
or an empty list, depending on the context.
X<x>
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
-"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
-returns true, as does NaN != anything else. If your platform doesn't
+"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
+returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.
X<< <=> >> X<spaceship>
- $ perl -le '$a = "NaN"; print "No NaN support here" if $a == $a'
- $ perl -le '$a = "NaN"; print "NaN support here" if $a != $a'
+ $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
+ $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'
-(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
+(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
support "NaN".)
Binary "eq" returns true if the left argument is stringwise equal to
binary C<~~> does a "smartmatch" between its arguments. This is mostly
used implicitly in the C<when> construct described in L<perlsyn>, although
not all C<when> clauses call the smartmatch operator. Unique among all of
-Perl's operators, the smartmatch operator can recurse.
+Perl's operators, the smartmatch operator can recurse. The smartmatch
+operator is L<experimental|perlpolicy/experimental> and its behavior is
+subject to change.
It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
like: exists HASH->{Any}
Right operand is CODE:
-
+
Left Right Description and pseudocode
===============================================================
ARRAY CODE sub returns true on all ARRAY elements[1]
(eventually) has a 4 in it.
Smartmatching one hash against another reports whether both contain the
-same keys, no more and no less. This could be used to see whether two
+same keys, no more and no less. This could be used to see whether two
records have the same field names, without caring what values those fields
might have. For example:
To avoid relying on an object's underlying representation, if the
smartmatch's right operand is an object that doesn't overload C<~~>,
it raises the exception "C<Smartmatching a non-overloaded object
-breaks encapsulation>". That's because one has no business digging
-around to see whether something is "in" an object. These are all
+breaks encapsulation>". That's because one has no business digging
+around to see whether something is "in" an object. These are all
illegal on objects without a C<~~> overload:
%hash ~~ $object
"fred" ~~ $object
However, you can change the way an object is smartmatched by overloading
-the C<~~> operator. This is allowed to extend the usual smartmatch semantics.
+the C<~~> operator. This is allowed to
+extend the usual smartmatch semantics.
For objects that do have an C<~~> overload, see L<overload>.
Using an object as the left operand is allowed, although not very useful.
=head2 Bitwise And
X<operator, bitwise, and> X<bitwise and> X<&>
-Binary "&" returns its operands ANDed together bit by bit.
-(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
+Binary "&" returns its operands ANDed together bit by bit. Although no
+warning is currently raised, the result is not well defined when this operation
+is performed on operands that aren't either numbers (see
+L<Integer Arithmetic>) or bitstrings (see L<Bitwise String Operators>).
Note that "&" has lower priority than relational operators, so for example
the parentheses are essential in a test like
X<bitwise xor> X<^>
Binary "|" returns its operands ORed together bit by bit.
-(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
Binary "^" returns its operands XORed together bit by bit.
-(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)
+
+Although no warning is currently raised, the results are not well
+defined when these operations are performed on operands that aren't either
+numbers (see L<Integer Arithmetic>) or bitstrings (see L<Bitwise String
+Operators>).
Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus,
C<< EXPR1 // EXPR2 >> returns the value of C<< EXPR1 >> if it's defined,
-otherwise, the value of C<< EXPR2 >> is returned. (C<< EXPR1 >> is evaluated
-in scalar context, C<< EXPR2 >> in the context of C<< // >> itself). Usually,
+otherwise, the value of C<< EXPR2 >> is returned.
+(C<< EXPR1 >> is evaluated in scalar context, C<< EXPR2 >>
+in the context of C<< // >> itself). Usually,
this is the same result as C<< defined(EXPR1) ? EXPR1 : EXPR2 >> (except that
the ternary-operator form can be used as an lvalue, while C<< EXPR1 // EXPR2 >>
-cannot). This is very useful for
+cannot). This is very useful for
providing default values for variables. If you actually want to test if
-at least one of C<$a> and C<$b> is defined, use C<defined($a // $b)>.
+at least one of C<$x> and C<$y> is defined, use C<defined($x // $y)>.
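A brief sketch of the difference between C<||> and C<//> when supplying
defaults; the hash C<%opt> and its keys are hypothetical and exist only for
this illustration:

```perl
use strict;
use warnings;

my %opt = (retries => 0, name => undef);

my $retries_or  = $opt{retries} || 3;           # 3: 0 is false, so the default wins
my $retries_dor = $opt{retries} // 3;           # 0: 0 is defined, so it is kept
my $name        = $opt{name}    // 'anonymous'; # 'anonymous'

# Testing whether at least one of two variables is defined.
my ($p, $q);
my $none = defined($p // $q) ? 1 : 0;    # 0: both are undefined
$q = 0;
my $some = defined($p // $q) ? 1 : 0;    # 1: $q is defined, although false
```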
The C<||>, C<//> and C<&&> operators return the last value evaluated
-(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
+(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:
$home = $ENV{HOME}
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
-C<foreach (1..10)> loops and for doing slice operations on arrays. In
+C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma)
-operator of B<sed>, B<awk>, and various editors. Each ".." operator
+operator of B<sed>, B<awk>, and various editors. Each ".." operator
maintains its own boolean state, even across calls to a subroutine
-that contains it. It is false as long as its left operand is false.
+that contains it. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator
is evaluated. It can test the right operand and become false on the
same evaluation it became true (as in B<awk>), but it still returns
-true once. If you don't want it to test the right operand until the
+true once. If you don't want it to test the right operand until the
next evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.
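The difference between the two forms can be sketched as follows. The input
list and the array names are invented for this illustration; the second
element deliberately satisfies both operands at once.

```perl
use strict;
use warnings;

my @lines = ("x", "begin end", "y", "end", "z");
my (@two_dot, @three_dot);

# ".." tests the right operand on the same evaluation that turned it true,
# so the range opens and closes on "begin end".
for (@lines) {
    push @two_dot, $_ if /begin/ .. /end/;
}

# "..." waits until the next evaluation before testing the right operand,
# so the range stays open until the later "end" element.
for (@lines) {
    push @three_dot, $_ if /begin/ ... /end/;
}

# @two_dot   is ("begin end")
# @three_dot is ("begin end", "y", "end")
```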
}
}
-This program will print only the line containing "Bar". If
+This program will print only the line containing "Bar". If
the range operator is changed to C<...>, it will also print the
"Baz" line.
However, because there are I<many> other lowercase Greek characters than
just those, to match lowercase Greek characters in a regular expression,
-you would use the pattern C</(?:(?=\p{Greek})\p{Lower})+/>.
+you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
+L<experimental feature|perlrecharclass/Extended Bracketed Character
+Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.
Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.
- $a = $ok ? $b : $c; # get a scalar
- @a = $ok ? @b : @c; # get an array
- $a = $ok ? @b : @c; # oops, that's just a count!
+ $x = $ok ? $y : $z; # get a scalar
+ @x = $ok ? @y : @z; # get an array
+ $x = $ok ? @y : @z; # oops, that's just a count!
The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):
- ($a_or_b ? $a : $b) = $c;
+ ($x_or_y ? $x : $y) = $z;
Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:
- $a % 2 ? $a += 10 : $a += 2
+ $x % 2 ? $x += 10 : $x += 2
Really means this:
- (($a % 2) ? ($a += 10) : $a) += 2
+ (($x % 2) ? ($x += 10) : $x) += 2
Rather than this:
- ($a % 2) ? ($a += 10) : ($a += 2)
+ ($x % 2) ? ($x += 10) : ($x += 2)
That should probably be written more simply as:
- $a += ($a % 2) ? 10 : 2;
+ $x += ($x % 2) ? 10 : 2;
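To make the pitfall concrete, here is a small sketch with illustrative
variable names that runs both spellings on an odd value and also shows the
context propagation described earlier:

```perl
use strict;
use warnings;

# Unparenthesized form: parsed as (($odd % 2) ? ($odd += 10) : $odd) += 2
my $odd = 3;
$odd % 2 ? $odd += 10 : $odd += 2;
# $odd is now 15, because 10 and then 2 were both added

# The simpler spelling adds exactly one of the two amounts.
my $fixed = 3;
$fixed += ($fixed % 2) ? 10 : 2;    # 13

# Context propagates into the selected branch.
my @y     = (1, 2, 3);
my @z     = (4);
my $ok    = 1;
my @copy  = $ok ? @y : @z;    # list context: (1, 2, 3)
my $count = $ok ? @y : @z;    # scalar context: 3, the number of elements
```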
=head2 Assignment Operators
X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
Assignment operators work as in C. That is,
- $a += 2;
+ $x += 2;
is equivalent to
- $a = $a + 2;
+ $x = $x + 2;
although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
x=
Although these are grouped by family, they all have the precedence
-of assignment.
+of assignment. These combined assignment operators can only operate on
+scalars, whereas the ordinary assignment operator can assign to arrays,
+hashes, lists and even references. (See L<"Context"|perldata/Context>,
+L<perldata/List value constructors>, and L<perlref/Assigning to
+References>.)
Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
Likewise,
- ($a += 2) *= 3;
+ ($x += 2) *= 3;
is equivalent to
- $a += 2;
- $a *= 3;
+ $x += 2;
+ $x *= 3;
Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
word on its left to be interpreted as a string if it begins with a letter
or underscore and is composed only of letters, digits and underscores.
This includes operands that might otherwise be interpreted as operators,
-constants, single number v-strings or function calls. If in doubt about
+constants, single number v-strings or function calls. If in doubt about
this behavior, the left operand can be quoted explicitly.
Otherwise, the C<< => >> operator behaves exactly as the comma operator
be careful to avoid using it as replacement for the C<||> operator.
It usually works out better for flow control than in assignments:
- $a = $b or $c; # bug: this is wrong
- ($a = $b) or $c; # really means this
- $a = $b || $c; # better written this way
+ $x = $y or $z; # bug: this is wrong
+ ($x = $y) or $z; # really means this
+ $x = $y || $z; # better written this way
However, when it's a list-context assignment and you're trying to use
C<||> for control flow, you probably need "or" so that the assignment
=item unary *
-Dereference-address operator. (Perl's prefix dereferencing
+Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)
=item (TYPE)
Note, however, that this does not always work for quoting Perl code:
- $s = q{ if($a eq "}") ... }; # WRONG
+ $s = q{ if($x eq "}") ... }; # WRONG
-is a syntax error. The C<Text::Balanced> module (standard as of v5.8,
+is a syntax error. The C<Text::Balanced> module (standard as of v5.8,
and from CPAN before then) is able to do this properly.
There can be whitespace between the operator and the quoting
The result is the character specified by the hexadecimal number between
the braces. See L</[8]> below for details on which character.
-Only hexadecimal digits are valid between the braces. If an invalid
+Only hexadecimal digits are valid between the braces. If an invalid
character is encountered, a warning will be issued and the invalid
character and all subsequent characters (valid or invalid) within the
braces will be discarded.
\c[ chr(27)
\c] chr(29)
\c^ chr(30)
- \c? chr(127)
+ \c_ chr(31)
+ \c? chr(127) # (on ASCII platforms)
In other words, it's the character whose code point has had 64 xor'd with
-its uppercase. C<\c?> is DELETE because C<ord("@") ^ 64> is 127, and
+its uppercase. C<\c?> is DELETE on ASCII platforms because
+S<C<ord("?") ^ 64>> is 127, and
C<\c@> is NULL because the ord of "@" is 64, so xor'ing 64 itself produces 0.
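The xor rule can be checked with a short sketch. It assumes an ASCII
platform, and C<ctrl_code> is a hypothetical helper written for this
illustration, not a Perl builtin:

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase the character, then xor its code point with 64.
# Assumes an ASCII platform.
sub ctrl_code {
    my ($char) = @_;
    return ord(uc $char) ^ 64;
}

my $del = ord("\c?");    # 127 on ASCII platforms
my $esc = ord("\c[");    # 27
my $us  = ord("\c_");    # 31
my $soh = ord("\ca");    # 1: the lowercase letter is uppercased first
```

On such a platform, C<ctrl_code("?")> gives 127 and C<ctrl_code("a")>
gives 1, matching the escapes.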
Also, C<\c\I<X>> yields C< chr(28) . "I<X>"> for any I<X>, but cannot come at the
On ASCII platforms, the resulting characters from the list above are the
complete set of ASCII controls. This isn't the case on EBCDIC platforms; see
-L<perlebcdic/OPERATOR DIFFERENCES> for the complete list of what these
-sequences mean on both ASCII and EBCDIC platforms.
+L<perlebcdic/OPERATOR DIFFERENCES> for a full discussion of the
+differences between these for ASCII versus EBCDIC platforms.
-Use of any other character following the "c" besides those listed above is
-discouraged, and some are deprecated with the intention of removing
-those in a later Perl version. What happens for any of these
-other characters currently though, is that the value is derived by xor'ing
-with the seventh bit, which is 64.
+Use of any other character following the C<"c"> besides those listed above is
+discouraged, and as of Perl v5.20, the only characters actually allowed
+are the printable ASCII ones, minus the left brace C<"{">. What happens
+for any of the allowed other characters is that the value is derived by
+xor'ing with the seventh bit, which is 64, and a warning is raised if
+enabled. Using the non-allowed characters generates a fatal error.
To get platform independent controls, you can use C<\N{...}>.
C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
instead.
-Having fewer than 3 digits may lead to a misleading warning message that says
-that what follows is ignored. For example, C<"\128"> in the ASCII character set
-is equivalent to the two characters C<"\n8">, but the warning C<Illegal octal
-digit '8' ignored> will be thrown. If C<"\n8"> is what you want, you can
-avoid this warning by padding your octal number with C<0>'s: C<"\0128">.
-
=item [8]
Several constructs above specify a character by a number. That number
=back
B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
-the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>. (C<\v>
+the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
+use C<\ck> or
+C<\x0b>. (C<\v>
does have meaning in regular expression patterns in Perl, see L<perlre>.)
The following escape sequences are available in constructs that interpolate,
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u>, and
C<\U> is as defined by Unicode. That means that case-mapping
a single character can sometimes produce several characters.
-Under C<use locale>, C<\F> produces the same results as C<\L>.
+Under C<use locale>, C<\F> produces the same results as C<\L>
+for all locales but a UTF-8 one, where it instead uses the Unicode
+definition.
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
the quoting from C<\Q> is applied after interpolation is processed,
-but before escapes are processed. This allows the pattern to match
-literally (except for C<$> and C<@>). For example, the following matches:
+but before escapes are processed. This allows the pattern to match
+literally (except for C<$> and C<@>). For example, the following matches:
'\s\t' =~ /\Q\s\t/
expression. I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
is done. Returns a Perl value which may be used instead of the
-corresponding C</STRING/msixpodual> expression. The returned value is a
-normalized version of the original pattern. It magically differs from
+corresponding C</STRING/msixpodual> expression. The returned value is a
+normalized version of the original pattern. It magically differs from
a string containing the same characters: C<ref(qr/x/)> returns "Regexp";
however, dereferencing it is not well defined (you currently get the
normalized version of the original pattern, but this may change).
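A minimal sketch of using a C<qr//> value, with illustrative names. It
deliberately avoids printing the stringified pattern, since that normalized
form is not guaranteed:

```perl
use strict;
use warnings;

my $word = qr/ab+c/i;

my $kind  = ref($word);                     # "Regexp"
my $alone = ("xABBCx" =~ $word) ? 1 : 0;    # 1: used instead of /ab+c/i

# Embedded in a larger pattern, the /i travels with the precompiled part only.
my $both_ci  = ("ABC-abc" =~ /^${word}-${word}\z/) ? 1 : 0;    # 1
my $outer_cs = ("ABC-ABC" =~ /^${word}-abc\z/)     ? 1 : 0;    # 0: outer "abc" is case sensitive
```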
o Compile pattern only once.
a ASCII-restrict: Use ASCII for \d, \s, \w; specifying two
a's further restricts /i matching so that no ASCII
- character will match a non-ASCII one
- l Use the locale
- u Use Unicode rules
- d Use Unicode or native charset, as in 5.12 and earlier
+ character will match a non-ASCII one.
+ l Use the locale.
+ u Use Unicode rules.
+ d Use Unicode or native charset, as in 5.12 and earlier.
If a precompiled pattern is embedded in a larger pattern then the effect
of "msixpluad" will be propagated appropriately. The effect the "o"
explicitly using it.
The last four modifiers listed above, added in Perl 5.14,
-control the character set semantics, but C</a> is the only one you are likely
+control the character set rules, but C</a> is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.
as delimiters. This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
the delimiter, then a match-only-once rule applies,
-described in C<m?PATTERN?> below.
-If "'" is the delimiter, no interpolation is performed on the PATTERN.
+described in C<m?PATTERN?> below. If "'" (single quote) is the delimiter,
+no interpolation is performed on the PATTERN.
When using a character valid in an identifier, whitespace is required
after the C<m>.
after the trailing delimiter.
Once upon a time, Perl would recompile regular expressions
unnecessarily, and this modifier was useful to tell it not to do so, in the
-interests of speed. But now, the only reasons to use C</o> are either:
+interests of speed. But now, the only reason to use C</o> is one of:
=over
=item The empty pattern //
If the PATTERN evaluates to the empty string, the last
-I<successfully> matched regular expression is used instead. In this
+I<successfully> matched regular expression is used instead. In this
case, only the C<g> and C<c> flags on the empty pattern are honored;
-the other flags are taken from the original pattern. If no match has
+the other flags are taken from the original pattern. If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).
Note that it's possible to confuse Perl into thinking C<//> (the empty
regex) is really C<//> (the defined-or operator). Perl is usually pretty
good about this, but some pathological cases might trigger this, such as
-C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
+C<$x///> (is that C<($x) / (//)> or C<$x // />?) and C<print $fh //>
(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
will assume you meant defined-or. If you meant the empty regex, just
use parentheses or spaces to disambiguate, or even prefix the empty
if the pattern matched.
The C</g> modifier specifies global pattern matching--that is,
-matching as many times as possible within the string. How it behaves
-depends on the context. In list context, it returns a list of the
+matching as many times as possible within the string. How it behaves
+depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
-expression. If there are no parentheses, it returns a list of all
+expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the C<pos()>
-function; see L<perlfunc/pos>. A failed match normally resets the
+function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
-by adding the C</c> modifier (for example, C<m//gc>). Modifying the target
+by adding the C</c> modifier (for example, C<m//gc>). Modifying the target
string also resets the search position.
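The list-context and scalar-context behaviors of C</g> can be sketched as
follows; the strings and variable names are made up for this illustration:

```perl
use strict;
use warnings;

# List context without parentheses: every matched string is returned.
my @numbers = "a1b22c333" =~ /\d+/g;      # ("1", "22", "333")

# List context with parentheses: the captures of every match are returned.
my %pair = "x=1,y=2" =~ /(\w)=(\d)/g;     # (x => 1, y => 2)

# Scalar context: each execution finds the next match and updates pos().
my $text = "aXbXc";
my @ends;
while ($text =~ /X/g) {
    push @ends, pos($text);               # 2, then 4
}
my $after_fail = pos($text);              # undef: the failed match reset it

# With /c, a failed match leaves the search position alone.
$text =~ /X/gc for 1 .. 3;                # the third attempt fails
my $kept = pos($text);                    # still 4
```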
=item \G assertion
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the
-previous C<m//g>, if any, left off. Without the C</g> modifier, the
+previous C<m//g>, if any, left off. Without the C</g> modifier, the
C<\G> assertion still anchors at C<pos()> as it was at the start of
the operation (see L<perlfunc/pos>), but the match is of course only
-attempted once. Using C<\G> without C</g> on a target string that has
+attempted once. Using C<\G> without C</g> on a target string that has
not previously had a C</g> match applied to it is the same as using
the C<\A> assertion to match the beginning of the string. Note also
that, currently, C<\G> is only properly supported when anchored at the
Final: 'q', pos=8
Notice that the final match matched C<q> instead of C<p>, which a match
-without the C<\G> anchor would have done. Also note that the final match
-did not update C<pos>. C<pos> is only updated on a C</g> match. If the
+without the C<\G> anchor would have done. Also note that the final match
+did not update C<pos>. C<pos> is only updated on a C</g> match. If the
final match did indeed match C<p>, it's a good bet that you're running a
very old (pre-5.6.0) version of Perl.
C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there. It is, however, syntax checked at
-compile-time. A second C<e> modifier will cause the replacement portion
+compile-time. A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.
Examples:
s/^=(\w+)/pod($1)/ge; # use function call
$_ = 'abc123xyz';
- $a = s/abc/def/r; # $a is 'def123xyz' and
+ $x = s/abc/def/r; # $x is 'def123xyz' and
# $_ remains 'abc123xyz'.
# expand variables in $_, but dynamics only, using
=item `STRING`
A string which is (possibly) interpolated and then executed as a
-system command with C</bin/sh> or its equivalent. Shell wildcards,
+system command with F</bin/sh> or its equivalent. Shell wildcards,
pipes, and redirections will be honored. The collected standard
output of the command is returned; standard error is unaffected. In
scalar context, it comes back as a single (potentially multi-line)
specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
(Note that this is slightly more flexible than the behavior of some
B<tr> programs, which delete anything they find in the SEARCHLIST,
-period.) If the C</s> modifier is specified, sequences of characters
+period.) If the C</s> modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.
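As a rough sketch of how C</d> and C</s> change the result (the input
string is arbitrary and the names are illustrative), compare three
transliterations whose REPLACEMENTLIST is shorter than the SEARCHLIST:

```perl
use strict;
use warnings;

# No modifier: the last replacement character is reused for "b" and "c".
(my $plain = "aabbccdd") =~ tr/a-c/x/;       # "xxxxxxdd"

# /d: "b" and "c" have no counterpart in the REPLACEMENTLIST, so they are deleted.
(my $deleted = "aabbccdd") =~ tr/a-c/x/d;    # "xxdd"

# /s: the run of characters transliterated to "x" is squashed to one "x".
(my $squashed = "aabbccdd") =~ tr/a-c/x/s;   # "xdd"
```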
=item Single Quotes
Single quotes indicate the text is to be treated literally with no
-interpolation of its content. This is similar to single quoted
+interpolation of its content. This is similar to single quoted
strings except that backslashes have no special meaning, with C<\\>
being treated as two backslashes and not one as they would in every
other quoting construct.
=item Backticks
The content of the here doc is treated just as it would be if the
-string were embedded in backticks. Thus the content is interpolated
+string were embedded in backticks. Thus the content is interpolated
as though it were double quoted and then executed via the shell, with
the results of the execution returned.
FINIS
If you use a here-doc within a delimited construct, such as in C<s///eg>,
-the quoted material must come on the lines following the final delimiter.
-So instead of
+the quoted material must still come on the line following the
+C<<< <<FOO >>> marker, which means it may be inside the delimited
+construct:
s/this/<<E . 'that'
the other
E
. 'more '/eg;
-you have to write
+It works this way as of Perl 5.18. Historically, it was inconsistent, and
+you would have to write
s/this/<<E . 'that'
. 'more '/eg;
the other
E
-If the terminating identifier is on the last line of the program, you
-must be sure there is a newline after it; otherwise, Perl will give the
-warning B<Can't find string terminator "END" anywhere before EOF...>.
+outside of string evals.
Additionally, quoting rules for the end-of-string identifier are
-unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
+unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
supported in place of C<''> and C<"">, and the only interpolation is for
backslashing the quoting character:
The first pass is finding the end of the quoted construct, where
the information about the delimiters is used in parsing.
During this search, text between the starting and ending delimiters
-is copied to a safe location. The text copied gets delimiter-independent.
+is copied to a safe location. The copied text becomes delimiter-independent.
If the construct is a here-doc, the ending delimiter is a line
-that has a terminating string as the content. Therefore C<<<EOF> is
+that has a terminating string as the content. Therefore C<<<EOF> is
terminated by C<EOF> immediately followed by C<"\n"> and starting
from the first column of the terminating line.
When searching for the terminating line of a here-doc, nothing
-is skipped. In other words, lines after the here-doc syntax
+is skipped. In other words, lines after the here-doc syntax
are compared with the terminating string line by line.
For the constructs except here-docs, single characters are used as starting
-and ending delimiters. If the starting delimiter is an opening punctuation
+and ending delimiters. If the starting delimiter is an opening punctuation
(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>).
If the starting delimiter is an unpaired character like C</> or a closing
punctuation, the ending delimiter is the same as the starting delimiter.
Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
-C<qq[]> and C<qq]]> constructs.
+both C<qq[]> and C<qq]]> constructs.
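
The following sketch illustrates the three delimiter cases just described. The variable names are hypothetical and chosen only for this example:

```perl
use strict;
use warnings;

my $slash   = qq/plain/;     # unpaired delimiter: the next "/" ends it
my $bracket = qq[paired];    # "[" opens, so the matching "]" closes
my $closing = qq]closing];   # starts with "]", so "]" also ends it

print "$slash $bracket $closing\n";
```

Each construct yields just the text between its delimiters.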
When searching for single-character delimiters, escaped delimiters
and C<\\> are skipped. For example, while searching for a terminating C</>,
For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.
-If the first delimiter is not an opening punctuation, three delimiters must
-be same such as C<s!!!> and C<tr)))>, in which case the second delimiter
+If the first delimiter is not an opening punctuation, the three delimiters must
+be the same, such as C<s!!!> and C<tr)))>,
+in which case the second delimiter
terminates the left part and starts the right part at once.
If the left part is delimited by bracketing punctuation (that is C<()>,
C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
-and comments are allowed between both parts, though the comment must follow
+and comments are allowed between the two parts, though the comment must follow
at least one whitespace character; otherwise a character expected as the
start of the comment may be regarded as the starting delimiter of the right part.
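
A short sketch of the three-part forms described above, assuming a sample string in a hypothetical variable C<$text>:

```perl
use strict;
use warnings;

my $text = "hello world";

# Same delimiter three times: the middle "!" ends the pattern and starts
# the replacement at once.
(my $same = $text) =~ s!world!Perl!;

# Bracketing delimiters: the replacement gets its own pair, and whitespace
# plus a comment (preceded by whitespace) may sit between the two parts.
(my $paired = $text) =~ s{world}   # the pattern part ends here
                         {Perl};

# A bracketed left part may be followed by an unpaired delimiter pair.
(my $upper = $text) =~ tr[a-z]/A-Z/;

print "$same\n$paired\n$upper\n";
```

Removing the whitespace before the C<#> in the second substitution would make C<#> itself the starting delimiter of the right part.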
modifier. So the embedded C<#> is interpreted as a literal C<#>.
Also no attention is paid to C<\c\> (multichar control char syntax) during
-this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
+this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
of C<\/>, and the following C</> is not recognized as a delimiter.
Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
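
As an illustrative sketch (the variable names are assumptions), both suggested spellings produce the same control character, code point 28, without placing a backslash next to the closing delimiter:

```perl
use strict;
use warnings;

# "\034" (octal) and "\x1c" (hex) both name the control character with
# code point 28, so no backslash ends up next to the closing "/".
my $octal = qq/field\034/;
my $hex   = qq/field\x1c/;

printf "%d %d %s\n",
    ord(substr($octal, -1)),
    ord(substr($hex, -1)),
    ($octal eq $hex ? "same" : "different");
```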
Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way. Something like C<"\Q\\E"> has
-no C<\E> inside. instead, it has C<\Q>, C<\\>, and C<E>, so the
+no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">. As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results. So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
-C<< "a $b -> {c}" >> really means:
+C<< "a $x -> {c}" >> really means:
- "a " . $b . " -> {c}";
+ "a " . $x . " -> {c}";
or:
- "a " . $b -> {c};
+ "a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
=head2 I/O Operators
X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
-X<< <> >> X<@ARGV>
+X<< <> >> X<< <<>> >> X<@ARGV>
There are several I/O operators you should know about.
except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable. It also uses filehandle I<ARGV>
-internally. <> is just a synonym for <ARGV>, which
+internally. <> is just a synonym for <ARGV>, which
is magical. (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)
and call it with C<perl dangerous.pl 'rm -rfv *|'>, it actually opens a
pipe, executes the C<rm> command and reads C<rm>'s output from that pipe.
If you want all items in C<@ARGV> to be interpreted as file names, you
-can use the module C<ARGV::readonly> from CPAN.
+can use the module C<ARGV::readonly> from CPAN, or use the double bracket:
+
+ while (<<>>) {
+ print;
+ }
+
+Using double angle brackets inside of a while causes the open to use the
+three-argument form (with the second argument being C<< < >>), so all
+arguments in @ARGV are treated as literal filenames (including "-").
+(Note that for convenience, if you use C<< <<>> >> and if @ARGV is
+empty, it will still read from the standard input.)
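
The sketch below shows one way to observe the difference. It assumes Perl 5.22 or later and a Unix-like filesystem, and it uses a hypothetical scratch file whose name ends in C<|>. With plain C<< <> >> such a name would be opened as a command pipe, while C<< <<>> >> reads it as an ordinary file:

```perl
use v5.22;            # the double diamond needs Perl 5.22 or later
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch file whose name ends in "|".
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/list|";
open(my $out, '>', $path) or die "Cannot create $path: $!";
print {$out} "alpha\nbeta\n";
close($out) or die "Cannot close $path: $!";

my @lines;
{
    local @ARGV = ($path);          # pretend the name came from the command line
    while (my $line = <<>>) {       # opened as a literal filename
        chomp $line;
        push @lines, $line;
    }
}

print join(",", @lines), "\n";
```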
You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
The standard C<Math::BigInt>, C<Math::BigRat>, and C<Math::BigFloat> modules,
along with the C<bignum>, C<bigint>, and C<bigrat> pragmas, provide
variable-precision arithmetic and overloaded operators, although
-they're currently pretty slow. At the cost of some space and
+they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.
Or with rationals:
- use 5.010;
- use bigrat;
- $a = 3/22;
- $b = 4/6;
- say "a/b is ", $a/$b;
- say "a*b is ", $a*$b;
- a/b is 9/44
- a*b is 1/11
+ use 5.010;
+ use bigrat;
+ $x = 3/22;
+ $y = 4/6;
+ say "x/y is ", $x/$y;
+ say "x*y is ", $x*$y;
+ x/y is 9/44
+ x*y is 1/11
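
The same arithmetic can also be sketched without the pragma, by constructing C<Math::BigRat> objects explicitly and relying on their overloaded operators. This is a variation for illustration, not a replacement for the example above:

```perl
use strict;
use warnings;
use Math::BigRat;

my $x = Math::BigRat->new('3/22');
my $y = Math::BigRat->new('4/6');    # stored in lowest terms as 2/3

# The overloaded operators return new Math::BigRat objects, which
# stringify as exact fractions.
print "x/y is ", $x / $y, "\n";
print "x*y is ", $x * $y, "\n";
```

Only values built through C<< Math::BigRat->new >> are exact rationals here. Ordinary numeric literals elsewhere in the program keep their usual behavior.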
Several modules let you calculate with (bound only by memory and CPU time)
-unlimited or fixed precision. There are also some non-standard modules that
+unlimited or fixed precision. There
+are also some non-standard modules that
provide faster implementations via external C libraries.
Here is a short but incomplete summary:
- Math::Fraction big, unlimited fractions like 9973 / 12967
Math::String treat string sequences like numbers
Math::FixedPrecision calculate with a fixed precision
Math::Currency for currency calculations
Bit::Vector manipulate bit vectors fast (uses C)
Math::BigIntFast Bit::Vector wrapper for big numbers
Math::Pari provides access to the Pari C library
- Math::BigInteger uses an external C library
- Math::Cephes uses external Cephes C library (no big numbers)
+ Math::Cephes uses the external Cephes C library (no
+ big numbers)
Math::Cephes::Fraction fractions via the Cephes library
Math::GMP another one using an external C library
+ Math::GMPz an alternative interface to libgmp's big ints
+ Math::GMPq an interface to libgmp's fraction numbers
+ Math::GMPf an interface to libgmp's floating point numbers
Choose wisely.