This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
[perl5.git] / pod / perlop.pod
=head1 NAME

perlop - Perl operators and precedence

=head1 SYNOPSIS

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor err

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head1 DESCRIPTION

=head2 Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl. TERMs include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance: the parentheses
are taken as the complete argument list of C<print>, so only
C<$foo & 255> is printed and the C<+ 1> is applied to the return
value of C<print>. See
L<Named Unary Operators> for more discussion of this.
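
As a minimal sketch of the leftward/rightward rule (the variable names
are chosen only for this illustration), the following compares an
unparenthesized C<sort> with a parenthesized one:

```perl
use strict;
use warnings;

# Rightward, sort takes every argument that follows it; leftward, the
# whole "sort 4, 2" acts as a single term in the enclosing list.
my @ary = (1, 3, sort 4, 2);
my $joined = join '', @ary;

# Parentheses directly after the operator delimit its argument list,
# so 1 and 3 are no longer arguments of sort.
my @paren = (sort(4, 2), 1, 3);
my $paren_joined = join '', @paren;

print "$joined $paren_joined\n";    # prints "1324 2413"
```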

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L<"I/O Operators">.

=head2 The Arrow Operator

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.
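
The sketch below exercises each form of the arrow operator described
above. The C<Counter> package and all variable names are hypothetical
and exist only for this example.

```perl
use strict;
use warnings;

# A tiny class defined inline; "Counter" is a made-up name for this sketch.
package Counter;
sub new  { my ($class, %args) = @_; return bless { count => $args{start} || 0 }, $class }
sub bump { my ($self) = @_; return ++$self->{count} }

package main;

my $aref = [10, 20, 30];
my $href = { colour => 'red' };
my $cref = sub { return $_[0] * 2 };

my $second  = $aref->[1];        # array subscript
my $colour  = $href->{colour};   # hash subscript
my $doubled = $cref->(21);       # subroutine call

my $obj    = Counter->new(start => 5);   # class name on the left
my $method = 'bump';
my $after  = $obj->$method;              # method name held in a scalar

# An undefined scalar is "a location capable of holding a hard reference":
my $slot;
$slot->[2] = 'x';                        # $slot now holds an array reference
my $slot_len = scalar @$slot;

print "$second $colour $doubled $after $slot_len\n";   # prints "20 red 42 6 3"
```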

=head2 Auto-increment and Auto-decrement

"++" and "--" work as in C. That is, if placed before a variable, they
increment or decrement the variable before returning the value, and if
placed after, increment or decrement the variable after returning the value.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.
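
A short sketch of these rules follows; the variable names are
illustrative only. It contrasts the magical string increment with the
plain numeric decrement.

```perl
use strict;
use warnings;

my @results;
for my $start ('99', 'a9', 'Az', 'zz') {
    my $v = $start;     # set as a string and never used numerically
    $v++;
    push @results, $v;
}
# @results is now ('100', 'b0', 'Ba', 'aaa')

my $undef;
my $returned = $undef++;    # undef is treated as 0 before the increment

my $s = 'ab';
{
    no warnings 'numeric';  # 'ab' is not a number, so -- uses its numeric value 0
    $s--;
}

print "@results | $returned | $s\n";    # prints "100 b0 Ba aaa | 0 | -1"
```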

=head2 Exponentiation

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)
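
For instance, a minimal check of the binding and associativity of
"**" (variable names are illustrative):

```perl
use strict;
use warnings;

my $neg   = -2 ** 4;       # -(2 ** 4)
my $paren = (-2) ** 4;     # parentheses force the negation first
my $chain = 2 ** 3 ** 2;   # right associative: 2 ** (3 ** 2)

print "$neg $paren $chain\n";    # prints "-16 16 512"
```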

=head2 Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned. Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that C<-bareword> is equivalent
to C<"-bareword">.

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
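
The following sketch touches each of the symbolic unary operators;
all names are made up for the example.

```perl
use strict;
use warnings;

my $masked = 0666 & ~027;            # the permission-mask example from the text
my $low32  = ~0 & 0xFFFF_FFFF;       # mask to 32 bits whatever the native width

my $flag    = -"verbose";            # identifier-like string gains a leading minus
my $flipped = -"-verbose";           # a leading minus becomes a plus

my @list = (1, 2, 3);
my $ref  = \@list;                   # a reference to @list itself, not a copy
push @$ref, 4;

printf "%04o %d %s %s %d\n", $masked, $low32, $flag, $flipped, scalar @list;
# prints "0640 4294967295 -verbose +verbose 4"
```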

=head2 Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
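
A brief sketch of the binding operators applied to a string other than
C<$_>; the sample text and variable names are assumptions made for the
illustration.

```perl
use strict;
use warnings;

my $text = "The quick brown fox";

my $has_fox = ($text =~ /fox/) ? 1 : 0;    # match against $text, not $_
my $no_dog  = ($text !~ /dog/) ? 1 : 0;    # logically negated match

(my $copy  = $text) =~ s/quick/slow/;      # substitution applied to a copy
(my $upper = $text) =~ tr/a-z/A-Z/;        # transliteration applied to a copy

my $pattern = '(br\w+)';                   # an ordinary string...
my ($word)  = $text =~ $pattern;           # ...interpreted as a pattern at run time

print "$has_fox $no_dog | $copy | $upper | $word\n";
```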

=head2 Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers. Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero).
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.

Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses, it repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
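
As a worked sketch of the modulus rule for negative operands and of
both forms of "x" (without C<use integer>; the names are illustrative):

```perl
use strict;
use warnings;

my $pos_mod = -7 % 3;    # -7 minus -9 (largest multiple of 3 not above -7)
my $neg_mod = 7 % -3;    #  7 minus  9 (smallest multiple of -3 not below 7)

my $rule  = '-' x 10;            # string repetition
my @pairs = ('a', 'b') x 2;      # list repetition: parenthesized left operand
my @ones  = (1) x 4;
@ones = (5) x @ones;             # right operand @ones is taken as a count

print "$pos_mod $neg_mod $rule @pairs @ones\n";
# prints "2 -2 ---------- a b a b 5 5 5 5"
```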

=head2 Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

=head2 Shift Operators

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.
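
A small sketch that stays well inside the defined range; the flag
layout is hypothetical.

```perl
use strict;
use warnings;

my $left  = 1 << 4;                   # 16
my $right = 256 >> 2;                 # 64

my $flags = (1 << 0) | (1 << 3);      # set bits 0 and 3
my $bit3  = ($flags >> 3) & 1;        # read bit 3 back
my $bit2  = ($flags >> 2) & 1;        # bit 2 was never set

print "$left $right $flags $bit3 $bit2\n";    # prints "16 64 9 1 0"
```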

=head2 Named Unary Operators

The various named unary operators are treated as functions with one
argument, with optional parentheses. These include the filetest
operators, like C<-f>, C<-M>, etc. See L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than ||:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

See also L<"Terms and List Operators (Leftward)">.
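
The same parsing can be observed with a deterministic named unary
operator. The sketch below uses C<lc> with the concatenation operator,
which, like "*", binds more tightly than named unary operators; the
variable names are illustrative.

```perl
use strict;
use warnings;

my $whole = lc "AB" . "CD";       # "." binds tighter: lc("AB" . "CD")
my $part  = lc("AB") . "CD";      # parentheses end the argument: lc("AB") . "CD"
my $plus  = lc +("AB") . "CD";    # unary plus stops the parens from ending it

my $undefined;
my $either = defined $undefined || 'fallback';   # (defined $undefined) || 'fallback'

print "$whole $part $plus $either\n";    # prints "abcd abCD abcd fallback"
```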

=head2 Relational Operators

Binary "<" returns true if the left argument is numerically less than
the right argument.

Binary ">" returns true if the left argument is numerically greater
than the right argument.

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than
the right argument.

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.

=head2 Equality Operators

Binary "==" returns true if the left argument is numerically equal to
the right argument.

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.

    perl -le '$a = NaN; print "No NaN support here" if $a == $a'
    perl -le '$a = NaN; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.
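
To illustrate the difference between the numeric and stringwise
families, here is a small sketch that assumes C<use locale> is not in
effect; the sample values are arbitrary.

```perl
use strict;
use warnings;

my $num_less = (10 < 9)      ? 1 : 0;   # numerically, 10 is not less than 9
my $str_less = ("10" lt "9") ? 1 : 0;   # stringwise, "1" sorts before "9"

my @numeric = sort { $a <=> $b } (10, 9, 100, 1);
my @string  = sort { $a cmp $b } (10, 9, 100, 1);

print "$num_less $str_less | @numeric | @string\n";
# prints "0 1 | 1 9 10 100 | 1 10 100 9"
```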

=head2 Bitwise And

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower priority than relational operators, so for example
the brackets are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower priority than relational operators, so
for example the brackets are essential in a test like

    print "false\n" if (8 | 2) != 10;
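
The sketch below shows what goes wrong when those brackets are left
out. The names are illustrative, and the C<precedence> warning category
is switched off only so that the deliberately wrong line stays quiet.

```perl
use strict;
use warnings;

my $x = 6;
my $with_parens = (($x & 1) == 0) ? 'even' : 'odd';

my $without;
{
    no warnings 'precedence';
    # Parsed as $x & (1 == 0): the comparison is false, so the result is 0.
    $without = ($x & 1 == 0) ? 'even' : 'odd';
}

my $or  = 8 | 2;      # 1000 | 0010 = 1010
my $xor = 12 ^ 10;    # 1100 ^ 1010 = 0110

print "$with_parens $without $or $xor\n";    # prints "even odd 10 6"
```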

=head2 C-style Logical And

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Defined-Or

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
rather than the value of C<defined($a)>) and is exactly equivalent to
C<defined($a) ? $a : $b>. This is very useful for providing default values
for variables. If you actually want to test if at least one of C<$a> and C<$b> is
defined, use C<defined($a // $b)>.
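
A minimal sketch of the difference between C<||> and C<//>, and of
short-circuiting. It assumes a perl that provides C<//> (released
versions from 5.10 on); the C<touch> helper and the variable names are
hypothetical.

```perl
use strict;
use warnings;

my $count = 0;
my $or_default  = $count || 10;    # 0 is false, so the default wins
my $dor_default = $count // 10;    # 0 is defined, so it is kept

my $calls = 0;
sub touch { $calls++; return 1 }   # hypothetical helper that records each call

my ($no, $yes) = (0, 1);
my $skipped = $no  && touch();     # left side false: touch() is not called
my $kept    = $yes || touch();     # left side true:  touch() is not called

print "$or_default $dor_default $calls\n";    # prints "10 0 0"
```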

The C<||>, C<//> and C<&&> operators differ from C's in that, rather than returning
0 or 1, they return the last value evaluated. Thus, a reasonably portable
way to find out the home directory might be:

    $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
        (getpwuid($<))[7] // die "You're homeless!\n";

In particular, this means that you shouldn't use this
for selecting between two aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to C<&&>, C<//> and C<||> when used for
control flow, Perl provides C<and>, C<err> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of "and", "err"
and "or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical auto-increment;
see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint. You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.

If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines, short for
                                #   if ($. == 101 .. $. == 200) ...
    next LINE if (1 .. /^$/);   # skip header lines, short for
                                #   ... if ($. == 1 .. /^$/);
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # ...
        } else { # in body
            # ...
        }
    } continue {
        close ARGV if eof;      # reset $. each file
    }
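
The flip-flop can also be exercised without reading a file. The
following self-contained sketch runs it over an in-memory list (the
marker strings and variable names are made up) and uses the sequence
number and the "E0" suffix to drop both endpoints.

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'one', 'two', 'END', 'outro');
my @seen;
for my $line (@lines) {
    # Neither operand is a constant, so nothing is compared against $.
    my $seq = ($line =~ /^BEGIN$/) .. ($line =~ /^END$/);
    push @seen, "$seq:$line" if $seq;
}
# @seen is now ('1:BEGIN', '2:one', '3:two', '4E0:END')

# Keep the body only: skip sequence number 1 and the entry tagged with E0.
my @body = map { (split /:/)[1] } grep { !/^1:/ && !/E0:/ } @seen;

print "@seen | @body\n";    # prints "1:BEGIN 2:one 3:two 4E0:END | one two"
```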

As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday - 1];

to get dates with leading zeros. If the final value specified is not
in the sequence that the magical increment would produce, the sequence
goes until the next value would be longer than the final value
specified.

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
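
A compact sketch of the list-context behaviours described above; the
variable names are illustrative.

```perl
use strict;
use warnings;

my @letters = ('aa' .. 'ad');     # magical string increment
my @days    = ('08' .. '11');     # leading zero keeps this a string range
my @ints    = (2.18 .. 3.14);     # operands are truncated to integers
my @empty   = (5 .. 1);           # left greater than right: nothing at all

print "@letters | @days | @ints | ", scalar(@empty), "\n";
# prints "aa ab ac ad | 08 09 10 11 | 2 3 | 0"
```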

=head2 Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

Really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;
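
The pitfall can be observed directly. In this sketch (names are
illustrative) the unparenthesized form adds both 10 and 2 to an odd
number:

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);
my $pick_x = 0;
($pick_x ? $x : $y) = 99;              # assigns to $y because $pick_x is false

my $odd = 3;
$odd % 2 ? $odd += 10 : $odd += 2;     # parsed as (... ? ($odd += 10) : $odd) += 2

my $fixed = 3;
$fixed += ($fixed % 2) ? 10 : 2;       # the intended result

print "$x $y $odd $fixed\n";           # prints "1 99 15 13"
```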

=head2 Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.
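
A short sketch of assignment used as an lvalue and of list assignment
in scalar context; the variable names and sample values are assumptions
made for the illustration.

```perl
use strict;
use warnings;

my $global = "MiXeD";
(my $tmp = $global) =~ tr/A-Z/a-z/;    # $tmp is modified, $global is not

my $n = 1;
($n += 2) *= 3;                        # same as $n += 2; $n *= 3;

# A list assignment in scalar context yields the number of elements
# on the right-hand side, even though only two of them are stored.
my $count = (my ($first, $second) = (10, 20, 30));

print "$tmp $global $n $count\n";      # prints "mixed MiXeD 9 3"
```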

=head2 Comma Operator

Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list.

The => digraph is mostly just a synonym for the comma operator. It's useful for
documenting arguments that come in pairs. As of release 5.001, it also forces
any word to the left of it to be interpreted as a string.
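
The three behaviours can be sketched as follows; the hash keys and
variable names are hypothetical.

```perl
use strict;
use warnings;

my $i = 0;
my $last = ($i++, $i++, $i);    # scalar context: only the final value is kept
my @all  = (1, 2, 3);           # list context: a plain separator

my %colour = (red => 0xf00, green => 0x0f0);   # words left of => are quoted
my $keys = join ',', sort keys %colour;

print "$last ", scalar(@all), " $keys\n";      # prints "2 3 green,red"
```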

=head2 List Operators (Rightward)

The right side of a list operator has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. This means that it short-circuits: i.e., the right
expression is evaluated only if the left expression is true.
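
A minimal sketch of how the low precedence of C<not> and C<and>
changes grouping compared with "!" and "&&"; the variable names are
illustrative.

```perl
use strict;
use warnings;

my $low  = (not 0 || 1) ? 'true' : 'false';   # not (0 || 1)
my $high = (!0 || 1)    ? 'true' : 'false';   # (!0) || 1

my $false = 0;
my $ran   = 0;
$false and $ran = 1;            # left side false: the assignment never runs

my $value = (1 and 'right operand');          # returns the last value evaluated

print "$low $high $ran $value\n";   # prints "false true 0 right operand"
```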

=head2 Logical or, Defined or, and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

This means that it short-circuits: i.e., the right expression is evaluated
only if the left expression is false. Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "err" is equivalent to C<//>--it's just like binary "or", except it tests
its left argument's definedness instead of its truth. There are two ways to
remember "err": either because many functions return C<undef> on an B<err>or,
or as a sort of correction: C<$a=($b err 'default')>.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.
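
The context effect in the C<stat> example can be reproduced without
touching the filesystem. In this sketch, C<context> and C<truthy> are
hypothetical helpers written only for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? ('list', 'context') : 'scalar' }

my @with_oror = context() || die "unreachable";   # || puts its left side in scalar context
my @with_or   = context() or die "unreachable";   # the assignment happens first, in list context

my $evaluated = 0;
sub truthy { $evaluated++; return 1 }             # hypothetical helper counting its calls
my $xor = (truthy() xor truthy()) ? 1 : 0;        # both sides always run

print "@with_oror | @with_or | $xor $evaluated\n";
# prints "scalar | list context | 0 2"
```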

=head2 C Operators Missing From Perl

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The C<Text::Balanced> module (from CPAN, and
starting from Perl 5.8 part of the standard distribution) is able
to do this properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.
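
A brief sketch of the generic quoting forms; the strings and variable
names are chosen only for the example.

```perl
use strict;
use warnings;

my $nested = q{foo{bar}baz};          # inner braces nest, so the quote ends at the last }
my $pipes  = q|no {nesting} here|;    # non-bracketing delimiter: same character at both ends

my $name   = 'World';
my $plain  = q(Hello, $name);         # q() does not interpolate
my $interp = qq(Hello, $name);        # qq() does

my @words = qw(alpha beta gamma);

(my $text = 'foo fighters') =~
    s {foo}     # pattern part
      {bar};    # replacement part

print "$nested | $plain | $interp | $text\n";
```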
753
The following escape sequences are available in constructs that interpolate
and in transliterations.

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (ESC)
    \x1b        hex char        (ESC)
    \x{263a}    wide hex char   (SMILEY)
    \c[         control char    (ESC)
    \N{name}    named Unicode character

The following escape sequences are available in constructs that interpolate
but not in transliterations.

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E
a0d0e21e 779
780If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
781C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
782If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
783beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
784C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
785see L<charnames>.
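
A few of the escapes listed above can be exercised as follows.  This is a
minimal sketch that assumes an ASCII platform and no C<use locale>; the
variable names are made up for the example.

```perl
use strict;
use warnings;

my $name = "wORLD";

# \L lowercases up to \E; \u then uppercases the first character.
my $greeting = "Hello, \u\L$name\E!";

# \Q backslash-escapes non-word characters up to \E.
my $quoted = "\Qa.b*c\E";

# Four spellings of the same ESC character (on ASCII platforms).
my $all_esc = ("\e" eq "\033" && "\e" eq "\x1b" && "\e" eq "\c[") ? 1 : 0;

print "$greeting\n$quoted\n$all_esc\n";
```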
a034a98d 786
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline".  There is no such thing as an unvarying, physical
newline character.  It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve.  Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF.  For example,
on a Mac, these are reversed, and on systems without a line terminator,
printing C<"\n"> may emit no actual data.  In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character.  For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">.  If you get in the habit of using C<"\n"> for networking,
you may be burned some day.
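
For instance, a protocol line terminator can be spelled out once and reused.
The sketch below builds a hypothetical HTTP/1.0 request purely as an
illustration; the host name is a placeholder and nothing is sent anywhere.

```perl
use strict;
use warnings;

# Spell out CR+LF explicitly rather than relying on "\n".
my $CRLF = "\015\012";

# A made-up request; only the line terminators matter here.
my $request = join($CRLF,
    "GET / HTTP/1.0",
    "Host: example.com",
    "",     # the empty line ends the header block
    "");

printf "%d bytes\n", length $request;
```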
800
For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated.  Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.

Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>.  "Punctuation" arrays such as C<@+> are only
interpolated if the name is enclosed in braces C<@{+}>.
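
The interpolation rules above can be seen in a few lines.  This is a sketch
with invented data (C<@list> and C<$href> are not from any real program):

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my $href = { key => ["first"] };

my $default = "Items: @list";            # elements joined with $" (a space)
my $custom  = do { local $" = ", "; "Items: @list" };
my $element = "Value: $href->{key}[0]";  # subscripts are followed
my $slice   = "Slice: @list[0,1]";       # so are slices

print "$_\n" for $default, $custom, $element, $slice;
```

Localizing C<$"> inside a block keeps the changed separator from leaking
into the rest of the program.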
af9219ee 810
811You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
812An unescaped C<$> or C<@> interpolates the corresponding variable,
813while escaping will cause the literal string C<\$> to be inserted.
814You'll need to write something like C<m/\Quser\E\@\Qhost/>.
815
816Patterns are subject to an additional level of interpretation as a
817regular expression. This is done as a second pass, after variables are
818interpolated, so that regular expressions may be incorporated into the
819pattern from the variables. If this is not what you want, use C<\Q> to
820interpolate a variable literally.
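
The difference between the two passes shows up as soon as the interpolated
variable contains a regex metacharacter.  A small sketch, with made-up
sample data:

```perl
use strict;
use warnings;

my $needle = "a.c";
my @words  = ("abc", "a.c", "xyz");

# Interpolated as a regex: "." matches any character.
my @as_regex   = grep { /$needle/ } @words;

# Interpolated literally: \Q...\E escapes the ".".
my @as_literal = grep { /\Q$needle\E/ } @words;

print "@as_regex\n@as_literal\n";
```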
821
822Apart from the behavior described above, Perl does not expand
823multiple levels of interpolation. In particular, contrary to the
824expectations of shell programmers, back-quotes do I<NOT> interpolate
825within double quotes, nor do single quotes impede evaluation of
826variables when used within double quotes.
a0d0e21e 827
5f05dabc 828=head2 Regexp Quote-Like Operators
cb1a09d0 829
5f05dabc 830Here are the quote-like operators that apply to pattern
cb1a09d0
AD
831matching and related activities.
832
a0d0e21e
LW
833=over 8
834
835=item ?PATTERN?
836
837This is just like the C</pattern/> search, except that it matches only
838once between calls to the reset() operator. This is a useful
5f05dabc 839optimization when you want to see only the first occurrence of
a0d0e21e
LW
840something in each file of a set of files, for instance. Only C<??>
841patterns local to the current package are reset.
842
5a964f20
TC
843 while (<>) {
844 if (?^$?) {
845 # blank line between header and body
846 }
847 } continue {
848 reset if eof; # clear ?? status for next file
849 }
850
483b4840 851This usage is vaguely deprecated, which means it just might possibly
19799a22
GS
852be removed in some distant future version of Perl, perhaps somewhere
853around the year 2168.
a0d0e21e 854
fb73857a 855=item m/PATTERN/cgimosx
a0d0e21e 856
fb73857a 857=item /PATTERN/cgimosx
a0d0e21e 858
5a964f20 859Searches a string for a pattern match, and in scalar context returns
19799a22
GS
860true if it succeeds, false if it fails. If no string is specified
861via the C<=~> or C<!~> operator, the $_ string is searched. (The
862string specified with C<=~> need not be an lvalue--it may be the
863result of an expression evaluation, but remember the C<=~> binds
864rather tightly.) See also L<perlre>. See L<perllocale> for
865discussion of additional considerations that apply when C<use locale>
866is in effect.
a0d0e21e
LW
867
Options are:

    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.
877
878If "/" is the delimiter then the initial C<m> is optional. With the C<m>
01ae956f 879you can use any pair of non-alphanumeric, non-whitespace characters
19799a22
GS
880as delimiters. This is particularly useful for matching path names
881that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
7bac28a0 882the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
19799a22 883If "'" is the delimiter, no interpolation is performed on the PATTERN.
a0d0e21e
LW
884
885PATTERN may contain variables, which will be interpolated (and the
f70b4f9c 886pattern recompiled) every time the pattern search is evaluated, except
1f247705
GS
887for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
888C<$|> are not interpolated because they look like end-of-string tests.)
f70b4f9c
AB
889If you want such a pattern to be compiled only once, add a C</o> after
890the trailing delimiter. This avoids expensive run-time recompilations,
891and is useful when the value you are interpolating won't change over
892the life of the script. However, mentioning C</o> constitutes a promise
893that you won't change the variables in the pattern. If you change them,
13a2d996 894Perl won't even notice. See also L<"qr/STRING/imosx">.
a0d0e21e 895
If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.  In this
case, only the C<g> and C<c> flags on the empty pattern are honoured;
the other flags are taken from the original pattern.  If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).
a0d0e21e 902
c963b151
BD
903Note that it's possible to confuse Perl into thinking C<//> (the empty
904regex) is really C<//> (the defined-or operator). Perl is usually pretty
905good about this, but some pathological cases might trigger this, such as
906C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
907(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
908will assume you meant defined-or. If you meant the empty regex, just
909use parentheses or spaces to disambiguate, or even prefix the empty
910regex with an C<m> (so C<//> becomes C<m//>).
911
19799a22 912If the C</g> option is not used, C<m//> in list context returns a
a0d0e21e 913list consisting of the subexpressions matched by the parentheses in the
f7e33566
GS
914pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
915also set, and that this differs from Perl 4's behavior.) When there are
916no parentheses in the pattern, the return value is the list C<(1)> for
917success. With or without parentheses, an empty list is returned upon
918failure.
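
The three list-context outcomes described above can be checked directly.
The strings and variable names in this sketch are arbitrary:

```perl
use strict;
use warnings;

# Capturing parentheses: the captures are returned.
my ($hours, $minutes) = "12:34" =~ /(\d+):(\d+)/;

# No parentheses: a successful match returns the list (1).
my @no_parens = "abc" =~ /b/;

# Failure: an empty list, with or without parentheses.
my @failed = "abc" =~ /(x)/;

print "$hours $minutes / @no_parens / ", scalar(@failed), "\n";
```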
a0d0e21e
LW
919
Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc.  The conditional is true if any variables were assigned, i.e., if
the pattern matched.
941
19799a22
GS
942The C</g> modifier specifies global pattern matching--that is,
943matching as many times as possible within the string. How it behaves
944depends on the context. In list context, it returns a list of the
945substrings matched by any capturing parentheses in the regular
946expression. If there are no parentheses, it returns a list of all
947the matched strings, as if there were parentheses around the whole
948pattern.
a0d0e21e 949
7e86de3e 950In scalar context, each execution of C<m//g> finds the next match,
19799a22 951returning true if it matches, and false if there is no further match.
7e86de3e
MG
952The position after the last match can be read or set using the pos()
953function; see L<perlfunc/pos>. A failed match normally resets the
954search position to the beginning of the string, but you can avoid that
955by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
956string also resets the search position.
c90c0ff4 957
958You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
959zero-width assertion that matches the exact position where the previous
5d43e42d
DC
960C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
961still anchors at pos(), but the match is of course only attempted once.
962Using C<\G> without C</g> on a target string that has not previously had a
963C</g> match applied to it is the same as using the C<\A> assertion to match
fe4b3f22
RGS
964the beginning of the string. Note also that, currently, C<\G> is only
965properly supported when anchored at the very beginning of the pattern.
c90c0ff4 966
Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8
1002
1003Notice that the final match matched C<q> instead of C<p>, which a match
1004without the C<\G> anchor would have done. Also note that the final match
1005did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
1006final match did indeed match C<p>, it's a good bet that you're running an
1007older (pre-5.6.0) Perl.
44a8e56a 1008
c90c0ff4 1009A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 1010combine several regexps like this to process a string part-by-part,
c90c0ff4 1011doing different actions depending on which regexp matched. Each
1012regexp tries to match where the previous one leaves off.
e7ea3e70 1013
    $_ = <<'EOL';
        $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
      {
        print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
        print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
        print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
        print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
        print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
        print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
        print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/gc;
        print ". That's all!\n";
      }

Here is the output (split into several lines):

    line-noise lowercase line-noise lowercase UPPERCASE line-noise
    UPPERCASE line-noise lowercase line-noise lowercase line-noise
    lowercase lowercase line-noise lowercase lowercase line-noise
    MiXeD line-noise. That's all!
44a8e56a 1035
a0d0e21e
LW
1036=item q/STRING/
1037
1038=item C<'STRING'>
1039
19799a22 1040A single-quoted, literal string. A backslash represents a backslash
68dc0745 1041unless followed by the delimiter or another backslash, in which case
1042the delimiter or backslash is interpolated.
a0d0e21e
LW
1043
1044 $foo = q!I said, "You said, 'She said it.'"!;
1045 $bar = q('This is it.');
68dc0745 1046 $baz = '\n'; # a two-character string
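
To make the backslash rule concrete, here is a short sketch comparing a few
single-quoted strings by length.  The variable names are illustrative only.

```perl
use strict;
use warnings;

my $two_chars = '\n';        # backslash followed by "n"
my $one_char  = "\n";        # a real newline, for comparison
my $apostrophe = 'It\'s';    # \' yields the delimiter itself
my $path       = 'C:\\dir';  # \\ yields a single backslash
my $braces     = q{a\}b};    # same rule with a chosen delimiter

print length($two_chars), " ", length($one_char), "\n";
print "$apostrophe $path $braces\n";
```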
a0d0e21e
LW
1047
1048=item qq/STRING/
1049
1050=item "STRING"
1051
1052A double-quoted, interpolated string.
1053
1054 $_ .= qq
1055 (*** The previous line contains the naughty word "$1".\n)
19799a22 1056 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 1057 $baz = "\n"; # a one-character string
a0d0e21e 1058
eec2d3df
GS
1059=item qr/STRING/imosx
1060
322edccd 1061This operator quotes (and possibly compiles) its I<STRING> as a regular
19799a22
GS
1062expression. I<STRING> is interpolated the same way as I<PATTERN>
1063in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
1064is done. Returns a Perl value which may be used instead of the
1065corresponding C</STRING/imosx> expression.
4b6a7270
IZ
1066
1067For example,
1068
1069 $rex = qr/my.STRING/is;
1070 s/$rex/foo/;
1071
1072is equivalent to
1073
1074 s/my.STRING/foo/is;
1075
1076The result may be used as a subpattern in a match:
eec2d3df
GS
1077
1078 $re = qr/$pattern/;
0a92e3a8
GS
1079 $string =~ /foo${re}bar/; # can be interpolated in other patterns
1080 $string =~ $re; # or used standalone
4b6a7270
IZ
1081 $string =~ /$re/; # or this way
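
One consequence worth illustrating is that the flags given to qr() travel
with the compiled pattern when it is interpolated into a larger one.  A
minimal sketch, with arbitrary sample strings:

```perl
use strict;
use warnings;

my $re = qr/hello/i;

# The /i applies to the embedded part only.
my $inner_any_case = ("say HELLO there" =~ /say $re there/) ? 1 : 0;

# The surrounding pattern has no /i, so "SAY" does not match "say".
my $outer_any_case = ("SAY hello there" =~ /say $re there/) ? 1 : 0;

# Used standalone, the object behaves like /hello/i.
my $standalone = ("Hello" =~ $re) ? 1 : 0;

print "$inner_any_case $outer_any_case $standalone\n";
```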
1082
Since Perl may compile the pattern at the moment the qr() operator is
executed, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }

Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match C</$pat/> is attempted.  (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)
eec2d3df
GS
1104
1105Options are:
1106
1107 i Do case-insensitive pattern matching.
1108 m Treat string as multiple lines.
1109 o Compile pattern only once.
1110 s Treat string as single line.
1111 x Use extended regular expressions.
1112
0a92e3a8
GS
1113See L<perlre> for additional information on valid syntax for STRING, and
1114for a detailed look at the semantics of regular expressions.
1115
a0d0e21e
LW
1116=item qx/STRING/
1117
1118=item `STRING`
1119
43dd4d21
JH
1120A string which is (possibly) interpolated and then executed as a
1121system command with C</bin/sh> or its equivalent. Shell wildcards,
1122pipes, and redirections will be honored. The collected standard
1123output of the command is returned; standard error is unaffected. In
1124scalar context, it comes back as a single (potentially multi-line)
1125string, or undef if the command failed. In list context, returns a
1126list of lines (however you've defined lines with $/ or
1127$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
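
The difference between the two contexts can be sketched as follows.  The
example assumes a POSIX-like shell in which C<echo> and C<;> behave in the
usual way; on other platforms the command would need to be adapted.

```perl
use strict;
use warnings;

# Assumption: a POSIX-like shell is available.
my $text  = `echo one; echo two`;   # scalar context: one multi-line string
my @lines = `echo one; echo two`;   # list context: one element per line
my $status = $? >> 8;               # exit status of the last command run

print scalar(@lines), " lines, exit status $status\n";
```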
5a964f20
TC
1128
1129Because backticks do not affect standard error, use shell file descriptor
1130syntax (assuming the shell supports this) if you care to address this.
1131To capture a command's STDERR and STDOUT together:
a0d0e21e 1132
5a964f20
TC
1133 $output = `cmd 2>&1`;
1134
1135To capture a command's STDOUT but discard its STDERR:
1136
1137 $output = `cmd 2>/dev/null`;
1138
1139To capture a command's STDERR but discard its STDOUT (ordering is
1140important here):
1141
1142 $output = `cmd 2>&1 1>/dev/null`;
1143
1144To exchange a command's STDOUT and STDERR in order to capture the STDERR
1145but leave its STDOUT to come out the old STDERR:
1146
1147 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
1148
1149To read both a command's STDOUT and its STDERR separately, it's easiest
1150and safest to redirect them separately to files, and then read from those
1151files when the program is done:
1152
1153 system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
1154
1155Using single-quote as a delimiter protects the command from Perl's
1156double-quote interpolation, passing it on to the shell instead:
1157
1158 $perl_info = qx(ps $$); # that's Perl's $$
1159 $shell_info = qx'ps $$'; # that's the new shell's $$
1160
19799a22 1161How that string gets evaluated is entirely subject to the command
5a964f20
TC
1162interpreter on your system. On most platforms, you will have to protect
1163shell metacharacters if you want them treated literally. This is in
1164practice difficult to do, as it's unclear how to escape which characters.
1165See L<perlsec> for a clean and safe example of a manual fork() and exec()
1166to emulate backticks safely.
a0d0e21e 1167
bb32b41a
GS
1168On some platforms (notably DOS-like ones), the shell may not be
1169capable of dealing with multiline commands, so putting newlines in
1170the string may not get you what you want. You may be able to evaluate
1171multiple commands in a single line by separating them with the command
1172separator character, if your shell supports that (e.g. C<;> on many Unix
1173shells; C<&> on the Windows NT C<cmd> shell).
1174
0f897271
GS
1175Beginning with v5.6.0, Perl will attempt to flush all files opened for
1176output before starting the child process, but this may not be supported
1177on some platforms (see L<perlport>). To be safe, you may need to set
1178C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
1179C<IO::Handle> on any open handles.
1180
bb32b41a
GS
1181Beware that some command shells may place restrictions on the length
1182of the command line. You must ensure your strings don't exceed this
1183limit after any necessary interpolations. See the platform-specific
1184release notes for more details about your particular environment.
1185
5a964f20
TC
1186Using this operator can lead to programs that are difficult to port,
1187because the shell commands called vary between systems, and may in
1188fact not be present at all. As one example, the C<type> command under
1189the POSIX shell is very different from the C<type> command under DOS.
1190That doesn't mean you should go out of your way to avoid backticks
1191when they're the right way to get something done. Perl was made to be
1192a glue language, and one of the things it glues together is commands.
1193Just understand what you're getting yourself into.
bb32b41a 1194
dc848c6f 1195See L<"I/O Operators"> for more discussion.
a0d0e21e 1196
945c54fd
JH
1197=item qw/STRING/
1198
1199Evaluates to a list of the words extracted out of STRING, using embedded
1200whitespace as the word delimiters. It can be understood as being roughly
1201equivalent to:
1202
1203 split(' ', q/STRING/);
1204
efb1e162
CW
1205the differences being that it generates a real list at compile time, and
1206in scalar context it returns the last element in the list. So
945c54fd
JH
1207this expression:
1208
1209 qw(foo bar baz)
1210
1211is semantically equivalent to the list:
1212
1213 'foo', 'bar', 'baz'
1214
Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
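
The comma mistake is easy to demonstrate.  In this sketch the C<qw> warning
category is switched off inside a block, only so that the faulty list can be
shown without noise; the word lists are arbitrary.

```perl
use strict;
use warnings;

my @words = qw( foo bar baz );

my @oops;
{
    # Silence the compile-time warning so the mistake can be shown.
    no warnings 'qw';
    @oops = qw( foo, bar, baz );
}

print scalar(@words), " words: @words\n";
print "with commas: @oops\n";
```

The commas end up inside the words themselves, which is rarely what was
intended.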
1224
a0d0e21e
LW
1225=item s/PATTERN/REPLACEMENT/egimosx
1226
1227Searches a string for a pattern, and if found, replaces that pattern
1228with the replacement text and returns the number of substitutions
e37d713d 1229made. Otherwise it returns false (specifically, the empty string).
a0d0e21e
LW
1230
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified.  (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)
a0d0e21e 1235
19799a22 1236If the delimiter chosen is a single quote, no interpolation is
a0d0e21e
LW
1237done on either the PATTERN or the REPLACEMENT. Otherwise, if the
1238PATTERN contains a $ that looks like a variable rather than an
1239end-of-string test, the variable will be interpolated into the pattern
5f05dabc 1240at run-time. If you want the pattern compiled only once the first time
a0d0e21e 1241the variable is interpolated, use the C</o> option. If the pattern
5a964f20 1242evaluates to the empty string, the last successfully executed regular
a0d0e21e 1243expression is used instead. See L<perlre> for further explanation on these.
5a964f20 1244See L<perllocale> for discussion of additional considerations that apply
a034a98d 1245when C<use locale> is in effect.
a0d0e21e
LW
1246
Options are:

    e   Evaluate the right side as an expression.
    g   Replace globally, i.e., all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.
1256
1257Any non-alphanumeric, non-whitespace delimiter may replace the
1258slashes. If single quotes are used, no interpretation is done on the
e37d713d 1259replacement string (the C</e> modifier overrides this, however). Unlike
54310121 1260Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
e37d713d 1261text is not evaluated as a command. If the
a0d0e21e 1262PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
5f05dabc 1263pair of quotes, which may or may not be bracketing quotes, e.g.,
35f2feb0 1264C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
cec88af6
GS
1265replacement portion to be treated as a full-fledged Perl expression
1266and evaluated right then and there. It is, however, syntax checked at
1267compile-time. A second C<e> modifier will cause the replacement portion
1268to be C<eval>ed before being run as a Perl expression.
a0d0e21e
LW
1269
Examples:

    s/\bgreen\b/mauve/g;                # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;         # run-time pattern

    ($foo = $bar) =~ s/this/that/;      # copy first, then change

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;                       # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;          # yields 'abc  246xyz'
    s/\w/$& x 2/eg;                     # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;              # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;              # use function call

    # expand variables in $_, but dynamics only, using
    # symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # Add one to the value of any numbers in the string
    s/(\d+)/1 + $1/eg;

    # This will expand any embedded scalar variable
    # (including lexicals) in $_ : First $1 is interpolated
    # to the variable name, and then evaluated
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim white space in $_, expensively

    for ($variable) {           # trim white space in $variable, cheap
        s/^\s+//;
        s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields
54310121 1319Note the use of $ instead of \ in the last example. Unlike
35f2feb0
GS
1320B<sed>, we use the \<I<digit>> form in only the left hand side.
1321Anywhere else it's $<I<digit>>.
a0d0e21e 1322
Occasionally, you can't use just a C</g> to get all the changes
to occur that you might want.  Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
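
Both loops can be wrapped in small subroutines for experimentation.  The
names C<commify> and C<expand_tabs> below are hypothetical helpers invented
for this sketch, not functions provided by Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat the substitution until nothing changes.
sub commify {
    my ($number) = @_;
    1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $number;
}

# Hypothetical helper: expand the leftmost run of tabs on each pass,
# so that length($`) always reflects already-expanded text.
sub expand_tabs {
    my ($line) = @_;
    1 while $line =~ s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
    return $line;
}

print commify(1234567), "\n";
print length(expand_tabs("a\tb")), "\n";
```

The loop is needed in the first case because each pass can only insert the
rightmost missing comma, and in the second because every expansion changes
the column of the tabs that follow.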
1331
6940069f 1332=item tr/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1333
6940069f 1334=item y/SEARCHLIST/REPLACEMENTLIST/cds
a0d0e21e 1335
2c268ad5 1336Transliterates all occurrences of the characters found in the search list
a0d0e21e
LW
1337with the corresponding character in the replacement list. It returns
1338the number of characters replaced or deleted. If no string is
2c268ad5 1339specified via the =~ or !~ operator, the $_ string is transliterated. (The
54310121 1340string specified with =~ must be a scalar variable, an array element, a
1341hash element, or an assignment to one of those, i.e., an lvalue.)
8ada0baa 1342
2c268ad5
TP
1343A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
1344does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 1345For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
1346SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
1347its own pair of quotes, which may or may not be bracketing quotes,
2c268ad5 1348e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.
a0d0e21e 1349
Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>.  The C<tr> operator is not equivalent to
the tr(1) utility.  If you want to map strings between lower/upper
cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
using the C<s> operator if you need regular expressions.
1355
Note also that the whole range idea is rather unportable between
character sets--and even within a character set, ranges may cause results
you probably didn't expect.  A sound principle is to use only ranges
that begin from and end at either alphabets of equal case (a-e, A-E),
or digits (0-4).  Anything else is unsafe.  If in doubt, spell out the
character sets in full.
1362
a0d0e21e
LW
1363Options:
1364
1365 c Complement the SEARCHLIST.
1366 d Delete found but unreplaced characters.
1367 s Squash duplicate replaced characters.
1368
19799a22
GS
1369If the C</c> modifier is specified, the SEARCHLIST character set
1370is complemented. If the C</d> modifier is specified, any characters
1371specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
1372(Note that this is slightly more flexible than the behavior of some
1373B<tr> programs, which delete anything they find in the SEARCHLIST,
1374period.) If the C</s> modifier is specified, sequences of characters
1375that were transliterated to the same character are squashed down
1376to a single instance of the character.
a0d0e21e
LW
1377
1378If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
1379exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
1380than the SEARCHLIST, the final character is replicated till it is long
5a964f20 1381enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
a0d0e21e
LW
1382This latter is useful for counting characters in a class or for
1383squashing character sequences in a class.
1384
1385Examples:
1386
1387 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case
1388
1389 $cnt = tr/*/*/; # count the stars in $_
1390
1391 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
1392
1393 $cnt = tr/0-9//; # count the digits in $_
1394
1395 tr/a-zA-Z//s; # bookkeeper -> bokeper
1396
1397 ($HOST = $host) =~ tr/a-z/A-Z/;
1398
1399 tr/a-zA-Z/ /cs; # change non-alphas to single space
1400
1401 tr [\200-\377]
1402 [\000-\177]; # delete 8th bit
1403
19799a22
GS
1404If multiple transliterations are given for a character, only the
1405first one is used:
748a9306
LW
1406
1407 tr/AAA/XYZ/
1408
2c268ad5 1409will transliterate any A to X.
748a9306 1410
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation.  That means that if you want to use variables, you
must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;
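
The same technique applied to a named variable rather than C<$_> might look
like the sketch below.  The lists and the C<$string> variable are
illustrative.  Because the lists become part of the evaluated code, they
should never come from untrusted input.

```perl
use strict;
use warnings;

my ($oldlist, $newlist) = ('a-c', 'A-C');
my $string = "abcabc";

# The \$ keeps $string from being interpolated into the eval text,
# while $oldlist and $newlist are interpolated before compilation.
my $count = eval "\$string =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$string ($count changed)\n";
```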
1420
7e3b091d
DA
1421=item <<EOF
1422
1423A line-oriented form of quoting is based on the shell "here-document"
1424syntax. Following a C<< << >> you specify a string to terminate
1425the quoted material, and all lines following the current line down to
1426the terminating string are the value of the item. The terminating
1427string may be either an identifier (a word), or some quoted text. If
1428quoted, the type of quotes you use determines the treatment of the
1429text, just as in regular quoting. An unquoted identifier works like
1430double quotes. There must be no space between the C<< << >> and
1431the identifier, unless the identifier is quoted. (If you put a space it
1432will be treated as a null identifier, which is valid, and matches the first
1433empty line.) The terminating string must appear by itself (unquoted and
1434with no surrounding whitespace) on the terminating line.
1435
1436 print <<EOF;
1437 The price is $Price.
1438 EOF
1439
1440 print << "EOF"; # same as above
1441 The price is $Price.
1442 EOF
1443
1444 print << `EOC`; # execute commands
1445 echo hi there
1446 echo lo there
1447 EOC
1448
1449 print <<"foo", <<"bar"; # you can stack them
1450 I said foo.
1451 foo
1452 I said bar.
1453 bar
1454
1455 myfunc(<< "THIS", 23, <<'THAT');
1456 Here's a line
1457 or two.
1458 THIS
1459 and here's another.
1460 THAT
1461
1462Just don't forget that you have to put a semicolon on the end
1463to finish the statement, as Perl doesn't know you're not going to
1464try to do this:
1465
1466 print <<ABC
1467 179231
1468 ABC
1469 + 20;
1470
1471If you want your here-docs to be indented with the
1472rest of the code, you'll need to remove leading whitespace
1473from each line manually:
1474
1475 ($quote = <<'FINIS') =~ s/^\s+//gm;
1476 The Road goes ever on and on,
1477 down from the door where it began.
1478 FINIS
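
As a runnable variant of the idiom above, the following sketch keeps the
body indented but leaves the terminator at the left margin, since the
terminating string must appear by itself with no surrounding whitespace.

```perl
use strict;
use warnings;

# Indented body; the substitution strips leading whitespace from every line.
(my $quote = <<'FINIS') =~ s/^\s+//gm;
    The Road goes ever on and on,
    down from the door where it began.
FINIS

print $quote;
```

Note that C<s/^\s+//gm> removes all leading whitespace, so any deliberate
relative indentation inside the quoted text is lost as well.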
1479
If you use a here-doc within a delimited construct, such as in C<s///eg>,
the quoted material must come on the lines following the final delimiter.
So instead of

    s/this/<<E . 'that'
    the other
    E
     . 'more '/eg;

you have to write

    s/this/<<E . 'that'
     . 'more '/eg;
    the other
    E

If the terminating identifier is on the last line of the program, you
must be sure there is a newline after it; otherwise, Perl will give the
warning B<Can't find string terminator "END" anywhere before EOF...>.

Additionally, the quoting rules for the identifier are not related to
Perl's quoting rules -- C<q()>, C<qq()>, and the like are not supported
in place of C<''> and C<"">, and the only interpolation is for backslashing
the quoting character:

    print << "abc\"def";
    testing...
    abc"def

Finally, quoted strings cannot span multiple lines. The general rule is
that the identifier must be a string literal. Stick with that, and you
should be safe.

=back

=head2 Gory details of parsing quoted constructs

When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation. This strategy
is so successful that Perl programmers often do not suspect the
ambivalence of what they write. But from time to time, Perl's
notions differ substantially from what the author honestly meant.

This section hopes to clarify how Perl handles quoted constructs.
Although the most common reason to learn this is to unravel labyrinthine
regular expressions, because the initial steps of parsing are the
same for all quoting operators, they are all discussed together.

The most important Perl parsing rule is the first one discussed
below: when processing a quoted construct, Perl first finds the end
of that construct, then interprets its contents. If you understand
this rule, you may skip the rest of this section on the first
reading. The other rules are likely to contradict the user's
expectations much less frequently than this first one.

Some passes discussed below are performed concurrently, but because
their results are the same, we consider them individually. For different
quoting constructs, Perl performs different numbers of passes, from
one to five, but these passes are always performed in the same order.

=over 4

=item Finding the end

The first pass is finding the end of the quoted construct, whether
it be a multicharacter delimiter C<"\nEOF\n"> in the C<<<EOF>
construct, a C</> that terminates a C<qq//> construct, a C<]> which
terminates a C<qq[]> construct, or a C<< > >> which terminates a
fileglob started with C<< < >>.

When searching for single-character non-pairing delimiters, such
as C</>, combinations of C<\\> and C<\/> are skipped. However,
when searching for a single-character pairing delimiter like C<[>,
combinations of C<\\>, C<\]>, and C<\[> are all skipped, and nested
C<[>, C<]> are skipped as well. When searching for multicharacter
delimiters, nothing is skipped.

For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.

During this search no attention is paid to the semantics of the construct.
Thus:

    "$hash{"$foo/$bar"}"

or:

    m/
      bar       # NOT a comment, this slash / terminated m//!
     /x

do not form legal quoted expressions. The quoted part ends on the
first C<"> and C</>, and the rest happens to be a syntax error.
Because the slash that terminated C<m//> was followed by a C<SPACE>,
the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier. So the embedded C<#> is interpreted as a literal C<#>.

=item Removal of backslashes before delimiters

During the second pass, text between the starting and ending
delimiters is copied to a safe location, and the C<\> is removed
from combinations consisting of C<\> and delimiter--or delimiters,
meaning both starting and ending delimiters are handled, should these differ.
This removal does not happen for multi-character delimiters.
Note that the combination C<\\> is left intact, just as it was.

Starting from this step no information about the delimiters is
used in parsing.

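The first two passes can be observed with C<q//>, which performs almost
no further processing. The sketch below only illustrates the rules just
described; the variable names are invented for the example.

```perl
use strict;
use warnings;

# Non-pairing delimiter: \/ is skipped while looking for the end, and
# its backslash is removed afterwards.
my $slash = q/a\/b/;             # a/b

# Pairing delimiters: nested pairs are skipped, and \[ and \] both lose
# their backslash.
my $nested  = q[a[b]c];          # a[b]c
my $escaped = q[x\]y\[z];        # x]y[z

# The pair \\ survives the second pass untouched; q// reduces it to a
# single backslash only in its own interpolation step.
my $backslash = q/a\\b/;         # "a", one backslash, "b"

print "$slash $nested $escaped $backslash\n";
```
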
=item Interpolation

The next step is interpolation in the text obtained, which is now
delimiter-independent. There are four different cases.

=over 4

=item C<<<'EOF'>, C<m''>, C<s'''>, C<tr///>, C<y///>

No interpolation is performed.

=item C<''>, C<q//>

The only interpolation is removal of C<\> from pairs C<\\>.

=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>

C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
The other combinations are replaced with appropriate expansions.

Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">. As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results. So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric). Note also that:

    $str = '\t';
    return "\Q$str";

may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.

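The difference between the two spellings is easiest to see by comparing
the resulting strings. This is a small sketch with invented variable
names; it shows nothing beyond what the paragraph above states.

```perl
use strict;
use warnings;

my $str = '\t';                    # two characters: a backslash and "t"

# \t is expanded to a TAB first, then quotemeta() backslashes the TAB.
my $from_literal = "\Q\t\E";

# The variable's contents are quoted as they are: both the backslash and
# nothing else needs escaping, giving two backslashes followed by "t".
my $from_variable = "\Q$str";

printf "%d %d\n", length $from_literal, length $from_variable;
```

The first string is a backslash followed by a TAB; the second matches a
literal backslash and a C<t> when used as a pattern.
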
Interpolated scalars and arrays are converted internally to the C<join> and
C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:

    $foo . " XXX '" . (join $", @arr) . "'";

All operations above are performed simultaneously, left to right.

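The equivalence can be checked directly. The following sketch uses
made-up values and also shows that the list separator C<$"> is consulted
at the moment the string is built.

```perl
use strict;
use warnings;

my $foo = 'x';
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $by_hand      = $foo . " XXX '" . (join $", @arr) . "'";

# A localized $" changes how the array is joined inside the string.
my $with_commas = do { local $" = ','; "$foo XXX '@arr'" };

print "$interpolated\n$by_hand\n$with_commas\n";
```
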
Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.

Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
C<< "a $b -> {c}" >> really means:

    "a " . $b . " -> {c}";

or:

    "a " . $b -> {c};

Most of the time, the choice is the longest possible text that does not
include spaces between components and which contains matching braces or
brackets. Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.

=item C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
happens (almost) as with C<qq//> constructs, but the substitution
of C<\> followed by RE-special chars (including C<\>) is not
performed. Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
a C<#>-comment in a C<//x>-regular expression, no processing is
performed whatsoever. This is the first step at which the presence
of the C<//x> modifier is relevant.

Interpolation has several quirks: C<$|>, C<$(>, and C<$)> are not
interpolated, and constructs C<$var[SOMETHING]> are voted (by several
different estimators) to be either an array element or C<$var>
followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.

It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///> to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet. A warning
is emitted if the C<use warnings> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.

The lack of processing of C<\\> creates specific restrictions on
the post-processed text. If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step. C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is. Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:

    m m ^ a \s* b mmx;

In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifier is C<mx>, and after backslash-removal the
RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.

=back

This step is the last one for all constructs except regular expressions,
which are processed further.

=item Interpolation of regular expressions

Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate. After the preprocessing
described above, and possibly after evaluation if catenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.

Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.

This is another step where the presence of the C<//x> modifier is
relevant. The RE engine scans the string from left to right and
converts it to a finite automaton.

Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>). Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes. C<(?#...)> comments are ignored. All the rest is either
converted to literal strings to match, or else is ignored (as are
whitespace and C<#>-style comments if C<//x> is present).

Parsing of the bracketed character class construct, C<[...]>, is
rather different than the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash. Similarly, the terminator of
C<(?{...})> is found using the same rules as for finding the
terminator of a C<{}>-delimited construct.

It is possible to inspect both the string given to the RE engine and the
resulting finite automaton. See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.

=item Optimization of regular expressions

This step is listed for completeness only. Since it does not change
semantics, details of this step are not documented and are subject
to change without notice. This step is performed over the finite
automaton that was generated during the previous pass.

It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.

=back

=head2 I/O Operators

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation. It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, like in a shell. In scalar context, a single string
consisting of all output is returned. In list context, a list of
values is returned, one per line of output. (You can set C<$/> to use
a different line terminator.) The command is executed each time the
pseudo-literal is evaluated. The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines. Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation. To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash. The generalized form of backticks is C<qx//>. (Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)

In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.

Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens. If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable $_,
destroying whatever was there previously. (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.) The $_ variable is not implicitly localized.
You'll have to put a C<local $_;> before the loop if you want that
to happen.

The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;

This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline. If you really mean for such values
to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }

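The effect of the implicit C<defined> test can be seen with a final line
that consists of just C<0>. This sketch reads from an in-memory
filehandle rather than STDIN so that it is self-contained; the data and
variable names are invented for the example.

```perl
use strict;
use warnings;

# An in-memory "file" whose last line is "0" with no trailing newline.
my $data = "first\n0";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {      # tested as defined(my $line = <$fh>)
    chomp $line;
    push @seen, $line;
}
close $fh;

print join(',', @seen), "\n";
```

Without the implicit test the loop would stop before storing the last
line, because the string C<"0"> is false.
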
In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined. (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.) Additional filehandles may be created with
the open() function, amongst others. See L<perlopentut> and
L<perlfunc/open> for details on this.

If a <FILEHANDLE> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element. It's easy to grow to a rather large data space this
way, so use with care.

<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
See L<perlfunc/readline>.

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>. Input from <> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...                 # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable. It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical. (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as though the input were one big happy file. See the example
in L<perlfunc/eof> for how to reset line numbers on each file.

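As an illustration of setting @ARGV before the first <>, the following
sketch writes a scratch file and then reads it back through the null
filehandle. The use of C<File::Temp> and the variable names are choices
made for this example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not depend on real arguments.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "one\ntwo\n";
close $tmp_fh or die "close failed: $!";

my @report;
{
    local @ARGV = ($tmp_name);   # as if the file were named on the command line
    while (<>) {
        chomp;
        # $ARGV holds the current filename, $. the running line number.
        push @report, join(':', $ARGV, $., $_);
    }
}

print "$_\n" for @report;
```
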
If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...                   # other switches
    }

    while (<>) {
        # ...                   # code for each line
    }

The <> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same. For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context. This distinction is determined on syntactic
grounds alone. That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element.

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph. (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>. These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.) For example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension. Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list. All values must be read before it will start
over. In list context, this isn't important because you automatically
get them all anyway. However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out. As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop. Again, C<undef> is returned only once. So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

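A self-contained way to try the list-context forms is to glob inside a
scratch directory. This is only a sketch: it uses C<File::Temp>, invented
file names, and assumes a Unix-like system whose temporary directory path
contains no whitespace or glob metacharacters (glob() would treat
whitespace as a pattern separator).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory holding two C files and one unrelated file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c notes.txt)) {
    open(my $out, '>', "$dir/$name") or die "cannot create $name: $!";
    close $out;
}

# List context: every match is returned at once.
my @c_files = glob("$dir/*.c");

# List assignment to one variable: the first match is kept and the
# iterator is run to completion, so nothing is left over for later.
my ($first) = glob("$dir/*.c");

print scalar(@c_files), " C files, first is $first\n";
```
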
=head2 Constant Folding

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpolation also happens at
compile time. You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally. Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) { }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.

=head2 Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n";
    print 'p N$' ^ " E<H\n";            # prints "Perl\n";

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: If an operand is a number, that will imply
a B<numeric> bitwise operation. You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105 ;       # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105 ;       # yields 255
    $foo =  150  | '105';       # yields 255
    $foo = '150' | '105';       # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy

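The rules above can be verified with a short script. This sketch assumes
an ASCII platform and that the C<bitwise> feature of later Perl releases
(which makes C<|>, C<&>, C<^> and C<~> always numeric) is not in effect;
the variable names are invented.

```perl
use strict;
use warnings;

my $numeric = 150 | 105;                 # both operands are numbers: 255
my $mixed   = '150' | 105;               # one number is enough: 255
my $stringy = '150' | '105';             # both are strings: '155'

my ($foo, $bar) = ('150', '105');
my $forced_stringy = "$foo" | "$bar";        # explicitly stringy: '155'
my $forced_numeric = (0 + $foo) | (0 + $bar); # explicitly numeric: 255

print "$numeric $mixed $stringy $forced_stringy $forced_numeric\n";
```
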
See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.

=head2 Integer Arithmetic

By default, Perl assumes that it must do most of its arithmetic in
floating point. But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK. Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined. For example, even under C<use
integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results. (But see also
L<Bitwise String Operators>.) However, C<use integer> still has meaning for
them. By default, their results are interpreted as unsigned integers, but
if C<use integer> is in effect, their results are interpreted
as signed integers. For example, C<~0> usually evaluates to a large
integral value. However, C<use integer; ~0> is C<-1> on twos-complement
machines.

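The block scoping of C<use integer> and its effect on C<~0> can be seen
in a few lines. This is a sketch with invented variable names; the
C<-1> result assumes a twos-complement machine, as noted above.

```perl
use strict;
use warnings;

my $float_quotient = 10 / 3;        # ordinary floating-point division
my $unsigned_not   = ~0;            # all bits set, read as unsigned

my ($int_quotient, $signed_not, $root);
{
    use integer;                    # in effect until the end of this block
    $int_quotient = 10 / 3;         # integer division: 3
    $signed_not   = ~0;             # same bits, read as signed: -1
    $root         = sqrt(2);        # sqrt() still returns a fraction
}

print "$float_quotient $int_quotient $unsigned_not $signed_not $root\n";
```
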
=head2 Floating-point Arithmetic

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places. For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.

Floating-point numbers are only approximations to what a mathematician
would call real numbers. There are infinitely more reals than floats,
so some corners must be cut. For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing for exact floating-point equality or inequality is
not a good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
decimal places. See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        my ($tX, $tY);
        $tX = sprintf("%.${POINTS}g", $X);
        $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }

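A short usage sketch for C<fp_equal> follows. The sample values are
chosen for illustration. Note that the C<%g> conversion counts
significant digits rather than digits after the decimal point, so
C<$POINTS> is effectively a number of significant digits.

```perl
use strict;
use warnings;

# fp_equal() as given in the text: compare the "%g"-formatted values.
sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    my $tX = sprintf("%.${POINTS}g", $X);
    my $tY = sprintf("%.${POINTS}g", $Y);
    return $tX eq $tY;
}

my $sum = 0.1 + 0.2;                 # typically not exactly 0.3 in binary

my $close_enough = fp_equal($sum, 0.3, 10);   # both format as "0.3"
my $different    = fp_equal(1.0, 1.1, 10);    # "1" versus "1.1"

# Rounding for display is a separate job for sprintf().
my $rounded = sprintf("%.2f", 3.14159);

print(($close_enough ? "equal" : "not equal"), " $rounded\n");
```
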
The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to instead implement the rounding function you
need yourself.

=head2 Bigger Numbers

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints +15241578780673678515622620750190521

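The same computation can be written under C<use strict> as below. This
is a sketch only. Because the output shown above carries a leading C<+>
and other Math::BigInt releases may omit it, the example strips an
optional sign before using the digits; that normalization is an
assumption made for portability, not something the module requires.

```perl
use strict;
use warnings;
use Math::BigInt;

my $x      = Math::BigInt->new('123456789123456789');
my $square = $x * $x;               # overloaded *, result is a Math::BigInt

# Stringify, then drop a leading "+" if this release prints one.
(my $digits = "$square") =~ s/^\+//;

print "$digits\n";
```
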
Several modules let you calculate with unlimited precision (bound only
by memory and CPU time) or with fixed precision. There are also
some non-standard modules that provide faster implementations via
external C libraries.

Here is a short, but incomplete summary:

    Math::Fraction           big, unlimited fractions like 9973 / 12967
    Math::String             treat string sequences like numbers
    Math::FixedPrecision     calculate with a fixed precision
    Math::Currency           for currency calculations
    Bit::Vector              manipulate bit vectors fast (uses C)
    Math::BigIntFast         Bit::Vector wrapper for big numbers
    Math::Pari               provides access to the Pari C library
    Math::BigInteger         uses an external C library
    Math::Cephes             uses external Cephes C library (no big numbers)
    Math::Cephes::Fraction   fractions via the Cephes library
    Math::GMP                another one using an external C library

Choose wisely.

=cut