This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=head1 NAME
X<operator>

perlop - Perl operators and precedence

=head1 DESCRIPTION

In Perl, the operator determines what operation is performed,
independent of the type of the operands. For example C<$x + $y>
is always a numeric addition, and if C<$x> or C<$y> do not contain
numbers, an attempt is made to convert them to numbers first.

This is in contrast to many other dynamic languages, where the
operation is determined by the type of the first argument. It also
means that Perl has two versions of some operators, one for numeric
and one for string comparison. For example C<$x == $y> compares
two numbers for equality, and C<$x eq $y> compares two strings.

There are a few exceptions though: C<x> can be either string
repetition or list repetition, depending on the type of the left
operand, and C<&>, C<|>, C<^> and C<~> can be either string or numeric bit
operations.
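
As a rough illustration of the rule above, the following sketch applies numeric and string operators to the same two operands. The variable names are chosen only for this example.

```perl
use strict;
use warnings;

# Both operands are strings; the operator decides how they are treated.
my ($x, $y) = ("10", "9");

my $sum      = $x + $y;              # numeric addition: 19
my $joined   = $x . $y;              # string concatenation: "109"
my $num_less = ($x < $y)  ? 1 : 0;   # numeric comparison: 10 < 9 is false
my $str_less = ($x lt $y) ? 1 : 0;   # string comparison: "10" lt "9" is true

print "$sum $joined $num_less $str_less\n";
```

The operands never change type; only the choice of operator changes the result.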

=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in C<2 + 4 * 5>, the multiplication has higher
precedence so C<4 * 5> is evaluated first yielding C<2 + 20 ==
22> and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in C<8
- 4 - 2>, subtraction is left associative so Perl evaluates the
expression left to right. C<8 - 4> is evaluated first making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.
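
The two worked examples above can be checked directly. This is only a small sketch; the variable names are illustrative, and the parenthesized forms show what the other grouping would give.

```perl
use strict;
use warnings;

my $precedence = 2 + 4 * 5;      # multiplication first: 2 + 20 = 22
my $grouped    = (2 + 4) * 5;    # parentheses force the other order: 30

my $left       = 8 - 4 - 2;      # left associative: (8 - 4) - 2 = 2
my $right      = 8 - (4 - 2);    # the grouping Perl does not choose: 6

print "$precedence $grouped $left $right\n";
```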

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>

A TERM has the highest precedence in Perl. Terms include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.
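
The "gobbling" behavior can be observed without C<print> by capturing the resulting lists. The sketch below uses C<sort> and C<reverse> as the list operators; the array names are assumptions made for this example.

```perl
use strict;
use warnings;

# sort takes everything to its right, then acts as one term on the left.
my @sorted  = (1, 3, sort 4, 2);     # (1, 3, 2, 4)

# Without parentheses, reverse takes all three arguments.
my @all     = (reverse 1, 2, 3);     # (3, 2, 1)

# With parentheses directly after the name, reverse takes only 1 and 2;
# the trailing 3 is a separate list element.
my @partial = (reverse(1, 2), 3);    # (2, 1, 3)

print "@sorted | @all | @partial\n";
```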

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.

=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

The dereferencing cases (as opposed to method-calling cases) are
somewhat extended by the experimental C<postderef> feature. For the
details of that feature, consult L<perlref/Postfix Dereference Syntax>.
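
A minimal sketch of the forms described above: the three subscript cases, then a method call with the method name given directly and through a scalar. The C<Counter> class and all variable names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { color => "red" };
my $cref = sub { return $_[0] * 2 };

my $second = $aref->[1];        # array element: 20
my $color  = $href->{color};    # hash element: "red"
my $double = $cref->(21);       # subroutine call: 42

{
    package Counter;            # hypothetical class for illustration
    sub new  { my ($class) = @_; return bless { n => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{n} }
}

my $obj = Counter->new;         # class name on the left
$obj->bump;                     # object on the left, method name on the right
my $method = "bump";
my $count  = $obj->$method;     # method name held in a scalar: returns 2

print "$second $color $double $count\n";
```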

=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->

"++" and "--" work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = "99");      # prints "100"
    print ++($foo = "a0");      # prints "a1"
    print ++($foo = "Az");      # prints "Ba"
    print ++($foo = "zz");      # prints "aaa"

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.
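
The following sketch gathers the cases above in one place: prefix versus postfix, the string "magic" of C<++>, the treatment of C<undef>, and the lack of magic in C<-->. The variable names are illustrative only.

```perl
use strict;
use warnings;

my ($i, $j) = (0, 0);
my $post = $i++;            # returns the old value 0; $i is now 1
my $pre  = ++$j;            # increments first; returns 1

my $s1 = "Az"; $s1++;       # string increment with carry: "Ba"
my $s2 = "zz"; $s2++;       # carry adds a character: "aaa"

my $u;                      # undef
my $old = $u++;             # undef is treated as 0, so $old is 0 and $u is 1

my $d = "ab";
{
    no warnings 'numeric';  # "ab" is not numeric; silence the warning here
    $d--;                   # no magic: "ab" is treated as 0, giving -1
}

print "$post $pre $s1 $s2 $old $u $d\n";
```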

=head2 Exponentiation
X<**> X<exponentiation> X<power>

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

Note that certain exponentiation expressions are ill-defined:
these include C<0**0>, C<1**Inf>, and C<Inf**0>. Do not expect
any particular results from these special cases; the results
are platform-dependent.
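
A short sketch of the binding rule, together with the right associativity that the precedence table lists for C<**>. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $tight   = -2 ** 4;       # parsed as -(2 ** 4): -16
my $grouped = (-2) ** 4;     # parentheses change the meaning: 16
my $chain   = 2 ** 3 ** 2;   # right associative: 2 ** (3 ** 2) = 512

print "$tight $grouped $chain\n";
```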

=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>

Unary "!" performs logical negation, that is, "not". See also C<not> for a lower
precedence version of this.
X<!>

Unary "-" performs arithmetic negation if the operand is numeric,
including any string that looks like a number. If the operand is
an identifier, a string consisting of a minus sign concatenated
with the identifier is returned. Otherwise, if the string starts
with a plus or minus, a string starting with the opposite sign is
returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a numeric and the arithmetic negation is performed. If the
string cannot be cleanly converted to a numeric, Perl will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>
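
The string cases of unary minus can be sketched as follows. Only quoted strings are used here, and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $neg_num = -"10";      # looks like a number: arithmetic negation, -10
my $neg_id  = -"foo";     # identifier: minus sign is prepended, "-foo"
my $to_plus = -"-foo";    # leading minus becomes plus: "+foo"
my $to_min  = -"+foo";    # leading plus becomes minus: "-foo"

print "$neg_num $neg_id $to_plus $to_min\n";
```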

Unary "~" performs bitwise negation, that is, 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the "&" operator to mask off the excess bits.
X<~> X<negation, binary>
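
A small sketch of the umask-style example and of masking to a fixed width. The 32-bit mask is an assumption made for this example; use whatever width your data requires.

```perl
use strict;
use warnings;

my $perm  = 0666 & ~027;             # clear the bits set in 027
my $octal = sprintf "%o", $perm;     # "640"

# ~0 is as wide as the platform's integers, so mask to the width you need.
my $low32 = ~0 & 0xFFFF_FFFF;        # 4294967295 on 32- and 64-bit builds

print "$octal $low32\n";
```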

When complementing strings, if all characters have ordinal values under
256, then their complements will, also. But if they do not, all
characters will be in either 32- or 64-bit complements, depending on your
architecture. So for example, C<~"\x{3B1}"> is C<"\x{FFFF_FC4E}"> on
32-bit machines and C<"\x{FFFF_FFFF_FFFF_FC4E}"> on 64-bit machines.

If the experimental "bitwise" feature is enabled via C<use feature
'bitwise'>, then unary "~" always treats its argument as a number, and an
alternate form of the operator, "~.", always treats its argument as a
string. So C<~0> and C<~"0"> will both give 2**32-1 on 32-bit platforms,
whereas C<~.0> and C<~."0"> will both yield C<"\xcf"> (the complement of
the character "0", whose ordinal is 0x30). This feature
produces a warning unless you use C<no warnings 'experimental::bitwise'>.

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)
X<+>

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>

=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. The exceptions are substitution (s///)
and transliteration (y///) with the C</r> (non-destructive) option,
which cause the B<r>eturn value to be the result of the substitution.
Behavior in list context depends on the particular operator.
See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
examples using these operators.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its contents will be interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

Binary "!~" with a non-destructive substitution (s///r) or transliteration
(y///r) is a syntax error.
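
The sketch below exercises the binding operators on a string other than C<$_>: a match, a negated match, an in-place substitution on a copy, a non-destructive C</r> substitution, a counting transliteration, and a pattern supplied as an expression. The sample string and variable names are assumptions for the example, and C</r> requires Perl v5.14 or later.

```perl
use v5.14;      # the /r option needs 5.14 or later
use warnings;

my $line = "Hello World";

my $found   = ($line =~ /World/) ? 1 : 0;   # match succeeds: 1
my $missing = ($line !~ /Perl/)  ? 1 : 0;   # negated match: 1

(my $copy = $line) =~ s/World/Perl/;        # modifies $copy, not $line
my $new   = $line =~ s/World/Perl/r;        # returns the result; $line untouched
my $count = ($line =~ tr/o//);              # counts the letter "o": 2

my $pattern = '(W\w+)';                     # expression used as a pattern
my ($word)  = $line =~ $pattern;            # list context returns the capture

say "$found $missing $copy | $new | $count $word | $line";
```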

=head2 Multiplicative Operators
X<operator, multiplicative>

Binary "*" multiplies two numbers.
X<*>

Binary "/" divides two numbers.
X</> X<slash>

Binary "%" is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer
operands C<$m> and C<$n>: If C<$n> is positive, then C<$m % $n> is
C<$m> minus the largest multiple of C<$n> less than or equal to
C<$m>. If C<$n> is negative, then C<$m % $n> is C<$m> minus the
smallest multiple of C<$n> that is not less than C<$m> (that is, the
result will be less than or equal to zero). If the operands
C<$m> and C<$n> are floating point values and the absolute value of
C<$n> (that is C<abs($n)>) is less than C<(UV_MAX + 1)>, only
the integer portion of C<$m> and C<$n> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
If the absolute value of the right operand (C<abs($n)>) is greater than
or equal to C<(UV_MAX + 1)>, "%" computes the floating-point remainder
C<$r> in the equation C<($r = $m - $i*$n)> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
C<$n> (B<not> as the left operand C<$m> like C function C<fmod()>)
and the absolute value less than that of C<$n>.
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
X<%> X<remainder> X<modulo> X<mod>
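
The sign rules for "%" are easier to see with concrete values. In this sketch the results for the first three cases follow from the description above; the C<use integer> case is left without an expected value because it depends on the C compiler. The variable names are illustrative.

```perl
use strict;
use warnings;

my $pos_right = -7 % 3;      # -7 minus (-9), the largest multiple of 3 <= -7: 2
my $neg_right = 7 % -3;      # 7 minus 9, the smallest multiple of -3 >= 7: -2
my $floats    = 7.9 % 3.2;   # only the integer portions are used: 7 % 3 = 1

my $native;
{
    use integer;             # direct access to the C compiler's operator
    $native = -7 % 3;        # result is platform-defined for negative operands
}

print "$pos_right $neg_right $floats $native\n";
```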

Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/STRING/>, it repeats the list.
If the right operand is zero or negative (raising a warning on
negative), it returns an empty string
or an empty list, depending on the context.
X<x>

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5
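
A compact sketch contrasting string repetition with list repetition, using small counts so the results are easy to read. The variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $dashes = '-' x 5;           # string repetition: "-----"
my $empty  = 'ab' x 0;          # zero repetitions: empty string

my @pairs  = (1, 2) x 2;        # list repetition: (1, 2, 1, 2)
my @ones   = (1) x 3;           # (1, 1, 1)
@ones      = (5) x @ones;       # @ones in scalar context is 3: (5, 5, 5)

print "$dashes [$empty] @pairs | @ones\n";
```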

=head2 Additive Operators
X<operator, additive>

Binary C<+> returns the sum of two numbers.
X<+>

Binary C<-> returns the difference of two numbers.
X<->

Binary C<.> concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>

=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>

Binary C<<< << >>> returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary C<<< >> >>> returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both C<<< << >>> and C<<< >> >>> in Perl are implemented directly using
C<<< << >>> and C<<< >> >>> in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is undefined also in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.

If you get tired of being subject to your platform's native integers,
the C<use bigint> pragma neatly sidesteps the issue altogether:

    print 20 << 20;  # 20971520
    print 20 << 40;  # 5120 on 32-bit machines,
                     # 21990232555520 on 64-bit machines
    use bigint;
    print 20 << 100; # 25353012004564588029934064107520
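
As a hedged sketch, the shifts below stay well inside the native integer range, and the C<bigint> pragma is confined to a block so that it affects only the one large shift. The variable names are illustrative.

```perl
use strict;
use warnings;

my $left  = 1 << 4;             # 16
my $right = 256 >> 4;           # 16
my $low8  = (0xFF << 4) & 0xFF; # shift, then mask to 8 bits: 0xF0 (240)

my $big;
{
    use bigint;                 # lexically scoped to this block
    $big = 20 << 100;           # arbitrary-precision result
}

print "$left $right $low8 $big\n";
```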

=head2 Named Unary Operators
X<operator, named unary>

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than C<||>:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.
X<-X> X<filetest> X<operator, filetest>

See also L<"Terms and List Operators (Leftward)">.
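
The same parsing rules can be observed with side-effect-free operators. This sketch substitutes C<lc> and C<uc> for C<chdir> and C<rand>, and "." for "*", since "." also binds more tightly than named unary operators. These substitutions and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $s = "ABC";

my $whole  = lc $s . "X";      # "." binds tighter: lc($s . "X") is "abcx"
my $parens = lc($s) . "X";     # function-call form: lc($s) . "X" is "abcX"
my $plus   = lc +($s) . "X";   # unary plus defeats the rule: "abcx"

my $empty  = "";
my $either = uc $empty || "default";   # (uc $empty) || "default"

print "$whole $parens $plus $either\n";
```

If C<uc> had instead taken C<$empty || "default"> as its argument, the last result would have been "DEFAULT".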

=head2 Relational Operators
X<relational operator> X<operator, relational>

Perl operators that return true or false generally return values
that can be safely used as numbers. For example, the relational
operators in this section and the equality operators in the next
one return C<1> for true and a special version of the defined empty
string, C<"">, which counts as a zero but is exempt from warnings
about improper numeric conversions, just as C<"0 but true"> is.

Binary "<" returns true if the left argument is numerically less than
the right argument.
X<< < >>

Binary ">" returns true if the left argument is numerically greater
than the right argument.
X<< > >>

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>

Binary "lt" returns true if the left argument is stringwise less than
the right argument.
X<< lt >>

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>
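
A brief sketch of the return values described at the start of this section, followed by a numeric and a stringwise comparison of the same operands. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $true   = (3 < 5);       # 1
my $false  = (5 < 3);       # the special empty string ""
my $as_num = $false + 0;    # usable as a number without a warning: 0

my $numeric = (10 < 9)      ? "yes" : "no";   # 10 is not less than 9
my $string  = ("10" lt "9") ? "yes" : "no";   # "1" sorts before "9"

print "[$true] [$false] $as_num $numeric $string\n";
```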

=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>

Binary "==" returns true if the left argument is numerically equal to
the right argument.
X<==>

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.
X<!=>

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.
X<< <=> >> X<spaceship>

    $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
    $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'

(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
support "NaN".)

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.
X<eq>

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.
X<ne>

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>
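
The three-way operators are most often seen in C<sort> blocks. This sketch sorts the same list numerically and stringwise, then probes NaN handling without assuming the platform supports it. The list contents and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100);

my @numeric = sort { $a <=> $b } @nums;   # 9 10 100
my @stringy = sort { $a cmp $b } @nums;   # 10 100 9

my $num_cmp = 2 <=> 10;                   # 2 is numerically less: -1
my $str_cmp = "2" cmp "10";               # "2" sorts after "1": 1

# On platforms with NaN support, <=> with a NaN operand returns undef.
my $nan     = "NaN" + 0;
my $nan_cmp = $nan <=> 1;
my $shown   = defined $nan_cmp ? $nan_cmp : "undef";

print "@numeric | @stringy | $num_cmp $str_cmp | $shown\n";
```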

Binary "~~" does a smartmatch between its arguments. Smart matching
is described in the next section.
X<~~>

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if a legacy C<use locale> (but not
C<use locale ':not_characters'>) is in effect. See
L<perllocale>. Do not mix these with Unicode, only with legacy binary
encodings. The standard L<Unicode::Collate> and
L<Unicode::Collate::Locale> modules offer much more powerful solutions to
collation issues.

For case-insensitive comparisons, look at the L<perlfunc/fc> case-folding
function, available in Perl v5.16 or later:

    if ( fc($x) eq fc($y) ) { ... }

=head2 Smartmatch Operator

First available in Perl 5.10.1 (the 5.10.0 version behaved differently),
binary C<~~> does a "smartmatch" between its arguments. This is mostly
used implicitly in the C<when> construct described in L<perlsyn>, although
not all C<when> clauses call the smartmatch operator. Unique among all of
Perl's operators, the smartmatch operator can recurse. The smartmatch
operator is L<experimental|perlpolicy/experimental> and its behavior is
subject to change.

It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
those operands to those imposed contexts. In contrast, smartmatch
I<infers> contexts from the actual types of its operands and uses that
type information to select a suitable comparison mechanism.

The C<~~> operator compares its operands "polymorphically", determining how
to compare them according to their actual types (numeric, string, array,
hash, etc.). Like the equality operators with which it shares the same
precedence, C<~~> returns 1 for true and C<""> for false. It is often best
read aloud as "in", "inside of", or "is contained in", because the left
operand is often looked for I<inside> the right operand. That makes the
order of the operands to the smartmatch operator often opposite that of
the regular match operator. In other words, the "smaller" thing is usually
placed in the left operand and the larger one in the right.

The behavior of a smartmatch depends on what type of things its arguments
are, as determined by the following table. The first row of the table
whose types apply determines the smartmatch behavior. Because what
actually happens is mostly determined by the type of the second operand,
the table is sorted on the right operand instead of on the left.

    Left      Right      Description and pseudocode
    ===============================================================
    Any       undef      check whether Any is undefined
                         like: !defined Any

    Any       Object     invoke ~~ overloading on Object, or die

    Right operand is an ARRAY:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY1    ARRAY2     recurse on paired elements of ARRAY1 and ARRAY2[2]
                         like: (ARRAY1[0] ~~ ARRAY2[0])
                                 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
    HASH      ARRAY      any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    ARRAY      any ARRAY elements pattern match Regexp
                         like: grep { /Regexp/ } ARRAY
    undef     ARRAY      undef in ARRAY
                         like: grep { !defined } ARRAY
    Any       ARRAY      smartmatch each ARRAY element[3]
                         like: grep { Any ~~ $_ } ARRAY

    Right operand is a HASH:

    Left      Right      Description and pseudocode
    ===============================================================
    HASH1     HASH2      all same keys in both HASHes
                         like: keys HASH1 ==
                                  grep { exists HASH2->{$_} } keys HASH1
    ARRAY     HASH       any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    HASH       any HASH keys pattern match Regexp
                         like: grep { /Regexp/ } keys HASH
    undef     HASH       always false (undef can't be a key)
                         like: 0 == 1
    Any       HASH       HASH key existence
                         like: exists HASH->{Any}

    Right operand is CODE:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     CODE       sub returns true on all ARRAY elements[1]
                         like: !grep { !CODE->($_) } ARRAY
    HASH      CODE       sub returns true on all HASH keys[1]
                         like: !grep { !CODE->($_) } keys HASH
    Any       CODE       sub passed Any returns true
                         like: CODE->(Any)

    Right operand is a Regexp:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                         like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                         like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                         like: Any =~ /Regexp/

    Other:

    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...

    Any       Num        numeric equality
                         like: Any == Num
    Num       nummy[4]   numeric equality
                         like: Num == nummy
    undef     Any        check whether undefined
                         like: !defined(Any)
    Any       Any        string equality
                         like: Any eq Any


Notes:

=over

=item 1.
Empty hashes or arrays match.

=item 2.
That is, each element smartmatches the element of the same index in the other array.[3]

=item 3.
If a circular reference is found, fall back to referential equality.

=item 4.
Either an actual number, or a string that looks like one.

=back

The smartmatch implicitly dereferences any non-blessed hash or array
reference, so the C<I<HASH>> and C<I<ARRAY>> entries apply in those cases.
For blessed references, the C<I<Object>> entries apply. Smartmatches
involving hashes only consider hash keys, never hash values.

The "like" code entry is not always an exact rendition. For example, the
smartmatch operator short-circuits whenever possible, but C<grep> does
not. Also, C<grep> in scalar context returns the number of matches, but
C<~~> returns only true or false.
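
To make the difference from the "like" entries concrete, here is a minimal sketch in plain Perl that mimics two table rows (C<ARRAY ~~ HASH> and C<HASH1 ~~ HASH2>) without using the experimental operator itself. The helper names C<any_element_is_key> and C<same_keys> are hypothetical, the sketch assumes all array elements are defined, and it is not the operator's actual implementation.

```perl
use strict;
use warnings;

# Mimics "any ARRAY elements exist as HASH keys", stopping at the first
# hit (short-circuit) and returning only 1 or "" rather than a count.
sub any_element_is_key {
    my ($array, $hash) = @_;
    for my $elem (@$array) {
        return 1 if exists $hash->{$elem};
    }
    return "";
}

# Mimics "all same keys in both HASHes": same number of keys, and every
# key of the first hash exists in the second. Values are never examined.
sub same_keys {
    my ($h1, $h2) = @_;
    return "" unless keys %$h1 == keys %$h2;
    for my $key (keys %$h1) {
        return "" unless exists $h2->{$key};
    }
    return 1;
}

my %hash  = (red => 1, blue => 2);
my $some  = any_element_is_key([qw(green red)], \%hash);    # 1
my $none  = any_element_is_key([qw(green grey)], \%hash);   # ""
my $same  = same_keys(\%hash, { blue => 9, red => 8 });     # 1
my $extra = same_keys(\%hash, { red => 1, blue => 2, x => 3 });   # ""

print "[$some] [$none] [$same] [$extra]\n";
```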
670
671Unlike most operators, the smartmatch operator knows to treat C<undef>
672specially:
673
674 use v5.10.1;
675 @array = (1, 2, 3, undef, 4, 5);
676 say "some elements undefined" if undef ~~ @array;
677
678Each operand is considered in a modified scalar context, the modification
679being that array and hash variables are passed by reference to the
680operator, which implicitly dereferences them. Both elements
681of each pair are the same:
682
683 use v5.10.1;
684
685 my %hash = (red => 1, blue => 2, green => 3,
686 orange => 4, yellow => 5, purple => 6,
687 black => 7, grey => 8, white => 9);
688
689 my @array = qw(red blue green);
690
691 say "some array elements in hash keys" if @array ~~ %hash;
692 say "some array elements in hash keys" if \@array ~~ \%hash;
693
694 say "red in array" if "red" ~~ @array;
695 say "red in array" if "red" ~~ \@array;
696
697 say "some keys end in e" if /e$/ ~~ %hash;
698 say "some keys end in e" if /e$/ ~~ \%hash;
699
40bec8a5
TC
700Two arrays smartmatch if each element in the first array smartmatches
701(that is, is "in") the corresponding element in the second array,
702recursively.
1ca345ed
TC
703
704 use v5.10.1;
705 my @little = qw(red blue green);
706 my @bigger = ("red", "blue", [ "orange", "green" ] );
707 if (@little ~~ @bigger) { # true!
708 say "little is contained in bigger";
709 }
710
711Because the smartmatch operator recurses on nested arrays, this
712will still report that "red" is in the array.
713
714 use v5.10.1;
715 my @array = qw(red blue green);
716 my $nested_array = [[[[[[[ @array ]]]]]]];
717 say "red in array" if "red" ~~ $nested_array;
718
If two arrays smartmatch each other, then they are deep
copies of each other's values, as this example reports:
721
    use v5.12.0;
    my @a = (0, 1, 2, [3, [4, 5], 6], 7);
    my @b = (0, 1, 2, [3, [4, 5], 6], 7);

    if (@a ~~ @b && @b ~~ @a) {
        say "a and b are deep copies of each other";
    }
    elsif (@a ~~ @b) {
        say "a smartmatches in b";
    }
    elsif (@b ~~ @a) {
        say "b smartmatches in a";
    }
    else {
        say "a and b don't smartmatch each other at all";
    }
738
739
If you were to set C<$b[3] = 4>, then instead of reporting that "a and b
are deep copies of each other", it now reports that "b smartmatches in a".
That's because the corresponding position in C<@a> contains an array that
(eventually) has a 4 in it.
744
745Smartmatching one hash against another reports whether both contain the
46f8a5ea 746same keys, no more and no less. This could be used to see whether two
747records have the same field names, without caring what values those fields
748might have. For example:
749
750 use v5.10.1;
751 sub make_dogtag {
752 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
753
754 my ($class, $init_fields) = @_;
755
756 die "Must supply (only) name, rank, and serial number"
757 unless $init_fields ~~ $REQUIRED_FIELDS;
758
759 ...
760 }
761
762or, if other non-required fields are allowed, use ARRAY ~~ HASH:
763
764 use v5.10.1;
765 sub make_dogtag {
766 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
767
768 my ($class, $init_fields) = @_;
769
770 die "Must supply (at least) name, rank, and serial number"
771 unless [keys %{$init_fields}] ~~ $REQUIRED_FIELDS;
772
773 ...
774 }
775
776The smartmatch operator is most often used as the implicit operator of a
777C<when> clause. See the section on "Switch Statements" in L<perlsyn>.
778
779=head3 Smartmatching of Objects
780
40bec8a5
TC
781To avoid relying on an object's underlying representation, if the
782smartmatch's right operand is an object that doesn't overload C<~~>,
783it raises the exception "C<Smartmatching a non-overloaded object
784breaks encapsulation>". That's because one has no business digging
785around to see whether something is "in" an object. These are all
40bec8a5 786illegal on objects without a C<~~> overload:
1ca345ed
TC
787
788 %hash ~~ $object
789 42 ~~ $object
790 "fred" ~~ $object
791
792However, you can change the way an object is smartmatched by overloading
793the C<~~> operator. This is allowed to
794extend the usual smartmatch semantics.
795For objects that do have an C<~~> overload, see L<overload>.
796
797Using an object as the left operand is allowed, although not very useful.
798Smartmatching rules take precedence over overloading, so even if the
799object in the left operand has smartmatch overloading, this will be
800ignored. A left operand that is a non-overloaded object falls back on a
801string or numeric comparison of whatever the C<ref> operator returns. That
802means that
803
804 $object ~~ X
805
does I<not> invoke the overload method with C<I<X>> as an argument.
Instead the above table is consulted as normal, and based on the type of
C<I<X>>, overloading may or may not be invoked.  For simple strings or
numbers, it becomes equivalent to this:
810
    $object ~~ $number        ref($object) == $number
    $object ~~ $string        ref($object) eq $string
813
814For example, this reports that the handle smells IOish
815(but please don't really do this!):
816
817 use IO::Handle;
818 my $fh = IO::Handle->new();
819 if ($fh ~~ /\bIO\b/) {
820 say "handle smells IOish";
821 }
822
823That's because it treats C<$fh> as a string like
824C<"IO::Handle=GLOB(0x8039e0)">, then pattern matches against that.
a034a98d 825
a0d0e21e 826=head2 Bitwise And
d74e8afc 827X<operator, bitwise, and> X<bitwise and> X<&>
a0d0e21e 828
c791a246
KW
829Binary "&" returns its operands ANDed together bit by bit. Although no
830warning is currently raised, the result is not well defined when this operation
831is performed on operands that aren't either numbers (see
832L<Integer Arithmetic>) or bitstrings (see L<Bitwise String Operators>).
a0d0e21e 833
Note that "&" has lower priority than relational and equality operators,
so for example the parentheses are essential in a test like
2cdc098b 836
1ca345ed 837 print "Even\n" if ($x & 1) == 0;
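
What happens without the parentheses can be seen in a short sketch.  The
variable names are illustrative only, and C<no warnings 'precedence'> is
included on the assumption that your perl might otherwise flag the
unparenthesized form at compile time:

```perl
use strict;
use warnings;
no warnings 'precedence';   # perl may otherwise report a possible precedence problem

my $x = 6;                  # an even number

# Intended test: mask the low bit, then compare the result with zero.
my $with_parens = (($x & 1) == 0) ? "even" : "odd";

# Without the inner parentheses, == binds tighter than &, so this is
# $x & (1 == 0).  The comparison is false, and ANDing with it yields 0.
my $without_parens = ($x & 1 == 0) ? "even" : "odd";

print "with parentheses:    $with_parens\n";
print "without parentheses: $without_parens\n";
```

The unparenthesized form reports "odd" for every value of C<$x>, because
the comparison is evaluated first and is always false.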
2cdc098b 838
If the experimental "bitwise" feature is enabled via C<use feature
'bitwise'>, then this operator always treats its operands as numbers.  This
feature produces a warning unless you use C<no warnings
'experimental::bitwise'>.
843
a0d0e21e 844=head2 Bitwise Or and Exclusive Or
d74e8afc
ITB
845X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
846X<bitwise xor> X<^>
a0d0e21e 847
2cdc098b 848Binary "|" returns its operands ORed together bit by bit.
a0d0e21e 849
2cdc098b 850Binary "^" returns its operands XORed together bit by bit.
c791a246
KW
851
852Although no warning is currently raised, the results are not well
853defined when these operations are performed on operands that aren't either
854numbers (see L<Integer Arithmetic>) or bitstrings (see L<Bitwise String
855Operators>).
a0d0e21e 856
Note that "|" and "^" have lower priority than relational and equality
operators, so for example the parentheses are essential in a test like
859
1ca345ed 860 print "false\n" if (8 | 2) != 10;
2cdc098b 861
If the experimental "bitwise" feature is enabled via C<use feature
'bitwise'>, then these operators always treat their operands as numbers.
This feature produces a warning unless you use C<no warnings
'experimental::bitwise'>.
866
a0d0e21e 867=head2 C-style Logical And
d74e8afc 868X<&&> X<logical and> X<operator, logical, and>
a0d0e21e
LW
869
870Binary "&&" performs a short-circuit logical AND operation. That is,
871if the left operand is false, the right operand is not even evaluated.
872Scalar or list context propagates down to the right operand if it
873is evaluated.
874
875=head2 C-style Logical Or
d74e8afc 876X<||> X<operator, logical, or>
a0d0e21e
LW
877
878Binary "||" performs a short-circuit logical OR operation. That is,
879if the left operand is true, the right operand is not even evaluated.
880Scalar or list context propagates down to the right operand if it
881is evaluated.
882
26d9d83b 883=head2 Logical Defined-Or
d74e8afc 884X<//> X<operator, logical, defined-or>
c963b151
BD
885
886Although it has no direct equivalent in C, Perl's C<//> operator is related
89d205f2 887to its C-style or. In fact, it's exactly the same as C<||>, except that it
888tests the left hand side's definedness instead of its truth. Thus,
889C<< EXPR1 // EXPR2 >> returns the value of C<< EXPR1 >> if it's defined,
890otherwise, the value of C<< EXPR2 >> is returned.
891(C<< EXPR1 >> is evaluated in scalar context, C<< EXPR2 >>
892in the context of C<< // >> itself). Usually,
893this is the same result as C<< defined(EXPR1) ? EXPR1 : EXPR2 >> (except that
the ternary-operator form can be used as an lvalue, while C<< EXPR1 // EXPR2 >>
46f8a5ea 895cannot). This is very useful for
bdc7923b 896providing default values for variables. If you actually want to test if
db691027 897at least one of C<$x> and C<$y> is defined, use C<defined($x // $y)>.
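
The difference between C<||> and C<//> matters most when a defined value
happens to be false.  The following is a minimal sketch; the C<%config>
hash and its keys are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical settings: "retries" is defined but false, "timeout" is absent.
my %config = (retries => 0);

my $retries_or  = $config{retries} || 3;    # 3:  || rejects the false value 0
my $retries_dor = $config{retries} // 3;    # 0:  // keeps any defined value
my $timeout     = $config{timeout} // 30;   # 30: a missing key yields undef

print "||: $retries_or  //: $retries_dor  default: $timeout\n";
```

Here C<||> silently replaces a legitimate setting of zero, which is the
usual reason to prefer C<//> when supplying defaults.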
c963b151 898
d042e63d 899The C<||>, C<//> and C<&&> operators return the last value evaluated
46f8a5ea 900(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
d042e63d 901portable way to find out the home directory might be:
a0d0e21e 902
c543c01b
TC
903 $home = $ENV{HOME}
904 // $ENV{LOGDIR}
905 // (getpwuid($<))[7]
906 // die "You're homeless!\n";
a0d0e21e 907
5a964f20
TC
908In particular, this means that you shouldn't use this
909for selecting between two aggregates for assignment:
910
911 @a = @b || @c; # this is wrong
912 @a = scalar(@b) || @c; # really meant this
913 @a = @b ? @b : @c; # this works fine, though
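
A short sketch makes the problem visible.  The array names and contents
are illustrative only:

```perl
use strict;
use warnings;

my @b     = (10, 20, 30);
my @c     = (1, 2);
my @empty = ();

# The left operand of || is evaluated in scalar context, so @b yields
# its element count, and that count is what gets assigned.
my @wrong    = @b || @c;          # (3)

# The right operand is evaluated in the context of the || itself.
my @fallback = @empty || @c;      # (1, 2)

# The conditional operator tests @b in scalar context but returns the
# selected array in list context.
my @right    = @b ? @b : @c;      # (10, 20, 30)

print "wrong: @wrong\nfallback: @fallback\nright: @right\n";
```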
914
1ca345ed 915As alternatives to C<&&> and C<||> when used for
916control flow, Perl provides the C<and> and C<or> operators (see below).
917The short-circuit behavior is identical. The precedence of "and"
c963b151 918and "or" is much lower, however, so that you can safely use them after a
5a964f20 919list operator without the need for parentheses:
a0d0e21e
LW
920
921 unlink "alpha", "beta", "gamma"
922 or gripe(), next LINE;
923
924With the C-style operators that would have been written like this:
925
926 unlink("alpha", "beta", "gamma")
927 || (gripe(), next LINE);
928
1ca345ed
TC
929It would be even more readable to write that this way:
930
931 unless(unlink("alpha", "beta", "gamma")) {
932 gripe();
933 next LINE;
934 }
935
eeb6a2c9 936Using "or" for assignment is unlikely to do what you want; see below.
5a964f20
TC
937
938=head2 Range Operators
d74e8afc 939X<operator, range> X<range> X<..> X<...>
a0d0e21e
LW
940
941Binary ".." is the range operator, which is really two different
fb53bbb2 942operators depending on the context. In list context, it returns a
54ae734e 943list of values counting (up by ones) from the left value to the right
2cdbc966 944value. If the left value is greater than the right value then it
fb53bbb2 945returns the empty list. The range operator is useful for writing
46f8a5ea 946C<foreach (1..10)> loops and for doing slice operations on arrays. In
947the current implementation, no temporary array is created when the
948range operator is used as the expression in C<foreach> loops, but older
949versions of Perl might burn a lot of memory when you write something
950like this:
a0d0e21e
LW
951
952 for (1 .. 1_000_000) {
953 # code
54310121 954 }
a0d0e21e 955
The range operator also works on strings, using the magical
auto-increment; see below.
54ae734e 958
5a964f20 959In scalar context, ".." returns a boolean value. The operator is
8f0f46f8 960bistable, like a flip-flop, and emulates the line-range (comma)
46f8a5ea 961operator of B<sed>, B<awk>, and various editors. Each ".." operator
8f0f46f8 962maintains its own boolean state, even across calls to a subroutine
46f8a5ea 963that contains it. It is false as long as its left operand is false.
964Once the left operand is true, the range operator stays true until the
965right operand is true, I<AFTER> which the range operator becomes false
8f0f46f8 966again. It doesn't become false till the next time the range operator
967is evaluated. It can test the right operand and become false on the
968same evaluation it became true (as in B<awk>), but it still returns
46f8a5ea 969true once. If you don't want it to test the right operand until the
8f0f46f8 970next evaluation, as in B<sed>, just use three dots ("...") instead of
971two. In all other regards, "..." behaves just like ".." does.
972
973The right operand is not evaluated while the operator is in the
974"false" state, and the left operand is not evaluated while the
975operator is in the "true" state. The precedence is a little lower
976than || and &&. The value returned is either the empty string for
8f0f46f8 977false, or a sequence number (beginning with 1) for true. The sequence
978number is reset for each range encountered. The final sequence number
979in a range has the string "E0" appended to it, which doesn't affect
980its numeric value, but gives you something to search for if you want
981to exclude the endpoint. You can exclude the beginning point by
982waiting for the sequence number to be greater than 1.
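
The sequence numbers and the "E0" marker can be observed directly.  This
is only a sketch: the input records and the C<BEGIN>/C<END> markers are
made up for the illustration.

```perl
use strict;
use warnings;

my @lines = qw(a BEGIN b c END d);   # hypothetical input records
my (@trace, @inside);

for (@lines) {
    # Scalar assignment puts ".." in scalar context, so it is a flip-flop.
    my $seq = /^BEGIN$/ .. /^END$/;
    push @trace, "$_=$seq";

    # Exclude both endpoints: the first has sequence number 1, and the
    # last carries the "E0" suffix.
    push @inside, $_ if $seq && $seq > 1 && $seq !~ /E0\z/;
}

print "@trace\n";    # a= BEGIN=1 b=2 c=3 END=4E0 d=
print "@inside\n";   # b c
```

Testing C<$seq> for truth first avoids a numeric comparison on the empty
string that the operator returns while it is in the "false" state.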
df5f8116
CW
983
984If either operand of scalar ".." is a constant expression,
985that operand is considered true if it is equal (C<==>) to the current
986input line number (the C<$.> variable).
987
988To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
989but that is only an issue if you use a floating point expression; when
990implicitly using C<$.> as described in the previous paragraph, the
991comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
992is set to a floating point value and you are not reading from a file.
993Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each of the operands is evaluated
995using their integer representation.
996
997Examples:
a0d0e21e
LW
998
999As a scalar operator:
1000
    if (101 .. 200) { print; } # print 2nd hundred lines, short for
                               #  if ($. == 101 .. $. == 200) { print; }

    next LINE if (1 .. /^$/);  # skip header lines, short for
                               #   next LINE if ($. == 1 .. /^$/);
                               # (typically in a loop labeled LINE)

    s/^/> / if (/^$/ .. eof());  # quote body
a0d0e21e 1009
    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # do something
        } else { # in body
            # do something else
        }
    } continue {
        close ARGV if eof;             # reset $. each file
    }
1022
acf31ca5
SF
1023Here's a simple example to illustrate the difference between
1024the two range operators:
1025
    @lines = (" - Foo",
              "01 - Bar",
              "1 - Baz",
              " - Quux");

    foreach (@lines) {
        if (/0/ .. /1/) {
            print "$_\n";
        }
    }
1036
46f8a5ea 1037This program will print only the line containing "Bar". If
9f10b797 1038the range operator is changed to C<...>, it will also print the
acf31ca5
SF
1039"Baz" line.
1040
1041And now some examples as a list operator:
a0d0e21e 1042
1ca345ed
TC
1043 for (101 .. 200) { print } # print $_ 100 times
1044 @foo = @foo[0 .. $#foo]; # an expensive no-op
1045 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
a0d0e21e 1046
5a964f20 1047The range operator (in list context) makes use of the magical
5f05dabc 1048auto-increment algorithm if the operands are strings. You
1049can say
1050
c543c01b 1051 @alphabet = ("A" .. "Z");
a0d0e21e 1052
54ae734e 1053to get all normal letters of the English alphabet, or
a0d0e21e 1054
c543c01b 1055 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
a0d0e21e
LW
1056
1057to get a hexadecimal digit, or
1058
1ca345ed
TC
1059 @z2 = ("01" .. "31");
1060 print $z2[$mday];
a0d0e21e 1061
ea4f5703
YST
1062to get dates with leading zeros.
1063
1064If the final value specified is not in the sequence that the magical
1065increment would produce, the sequence goes until the next value would
1066be longer than the final value specified.
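
A few string ranges, chosen only for illustration, show how the magical
increment carries and where the sequence stops when the final value is
never reached:

```perl
use strict;
use warnings;

# The leading zero makes this a string range, not a numeric one.
my @days  = ("08" .. "11");   # 08 09 10 11

# The increment carries from "az" to "ba".
my @carry = ("ax" .. "bb");   # ax ay az ba bb

# "B" is never produced by incrementing "w".  The sequence stops after
# "z" because the next value, "aa", would be longer than "B".
my @short = ("w" .. "B");     # w x y z

print "@days\n@carry\n@short\n";
```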
1067
1068If the initial value specified isn't part of a magical increment
c543c01b 1069sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
1070only the initial value will be returned. So the following will only
1071return an alpha:
1072
c543c01b 1073 use charnames "greek";
ea4f5703
YST
1074 my @greek_small = ("\N{alpha}" .. "\N{omega}");
1075
c543c01b
TC
1076To get the 25 traditional lowercase Greek letters, including both sigmas,
1077you could use this instead:
ea4f5703 1078
    use charnames "greek";
    my @greek_small =  map { chr } ( ord("\N{alpha}")
                                        ..
                                     ord("\N{omega}")
                                   );
c543c01b
TC
1084
1085However, because there are I<many> other lowercase Greek characters than
1086just those, to match lowercase Greek characters in a regular expression,
1087you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
1088L<experimental feature|perlrecharclass/Extended Bracketed Character
1089Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
a0d0e21e 1090
df5f8116
CW
1091Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
1092return two elements in list context.
1093
1094 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
1095
a0d0e21e 1096=head2 Conditional Operator
d74e8afc 1097X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
a0d0e21e
LW
1098
1099Ternary "?:" is the conditional operator, just as in C. It works much
1100like an if-then-else. If the argument before the ? is true, the
1101argument before the : is returned, otherwise the argument after the :
1102is returned. For example:
1103
54310121 1104 printf "I have %d dog%s.\n", $n,
c543c01b 1105 ($n == 1) ? "" : "s";
cb1a09d0
AD
1106
1107Scalar or list context propagates downward into the 2nd
54310121 1108or 3rd argument, whichever is selected.
cb1a09d0 1109
db691027
SF
1110 $x = $ok ? $y : $z; # get a scalar
1111 @x = $ok ? @y : @z; # get an array
1112 $x = $ok ? @y : @z; # oops, that's just a count!
cb1a09d0
AD
1113
1114The operator may be assigned to if both the 2nd and 3rd arguments are
1115legal lvalues (meaning that you can assign to them):
a0d0e21e 1116
db691027 1117 ($x_or_y ? $x : $y) = $z;
a0d0e21e 1118
5a964f20
TC
1119Because this operator produces an assignable result, using assignments
1120without parentheses will get you in trouble. For example, this:
1121
db691027 1122 $x % 2 ? $x += 10 : $x += 2
5a964f20
TC
1123
1124Really means this:
1125
db691027 1126 (($x % 2) ? ($x += 10) : $x) += 2
5a964f20
TC
1127
1128Rather than this:
1129
db691027 1130 ($x % 2) ? ($x += 10) : ($x += 2)
5a964f20 1131
19799a22
GS
1132That should probably be written more simply as:
1133
db691027 1134 $x += ($x % 2) ? 10 : 2;
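
The two spellings can be compared by running them on the same starting
value.  In this sketch the variable names are illustrative, and an odd
value is used because that is the case in which the results differ:

```perl
use strict;
use warnings;

my $trap = 3;
# Parsed as (($trap % 2) ? ($trap += 10) : $trap) += 2, so for an odd
# value both additions are applied: 3 + 10 + 2.
$trap % 2 ? $trap += 10 : $trap += 2;

my $clear = 3;
# The conditional only selects the increment: 3 + 10.
$clear += ($clear % 2) ? 10 : 2;

print "trap: $trap, clear: $clear\n";   # trap: 15, clear: 13
```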
19799a22 1135
4633a7c4 1136=head2 Assignment Operators
d74e8afc 1137X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
5ac3b81c 1138X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
fb7054ba 1139X<%=> X<^=> X<x=> X<&.=> X<|.=> X<^.=>
a0d0e21e
LW
1140
1141"=" is the ordinary assignment operator.
1142
1143Assignment operators work as in C. That is,
1144
db691027 1145 $x += 2;
a0d0e21e
LW
1146
1147is equivalent to
1148
db691027 1149 $x = $x + 2;
a0d0e21e
LW
1150
1151although without duplicating any side effects that dereferencing the lvalue
1152might trigger, such as from tie(). Other assignment operators work similarly.
1153The following are recognized:
a0d0e21e 1154
    **=    +=    *=    &=    &.=    <<=    &&=
           -=    /=    |=    |.=    >>=    ||=
           .=    %=    ^=    ^.=           //=
                 x=
a0d0e21e 1159
19799a22 1160Although these are grouped by family, they all have the precedence
1161of assignment. These combined assignment operators can only operate on
1162scalars, whereas the ordinary assignment operator can assign to arrays,
1163hashes, lists and even references. (See L<"Context"|perldata/Context>
1164and L<perldata/List value constructors>, and L<perlref/Assigning to
1165References>.)
a0d0e21e 1166
b350dd2f
GS
1167Unlike in C, the scalar assignment operator produces a valid lvalue.
1168Modifying an assignment is equivalent to doing the assignment and
1169then modifying the variable that was assigned to. This is useful
1170for modifying a copy of something, like this:
a0d0e21e 1171
1ca345ed
TC
1172 ($tmp = $global) =~ tr/13579/24680/;
1173
Although as of 5.14, that can also be accomplished this way:
1175
1176 use v5.14;
1177 $tmp = ($global =~ tr/13579/24680/r);
a0d0e21e
LW
1178
1179Likewise,
1180
db691027 1181 ($x += 2) *= 3;
a0d0e21e
LW
1182
1183is equivalent to
1184
db691027
SF
1185 $x += 2;
1186 $x *= 3;
a0d0e21e 1187
b350dd2f
GS
1188Similarly, a list assignment in list context produces the list of
1189lvalues assigned to, and a list assignment in scalar context returns
1190the number of elements produced by the expression on the right hand
1191side of the assignment.
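
Both behaviors of list assignment can be sketched briefly.  All variable
names and sample strings below are made up for the illustration:

```perl
use strict;
use warnings;

# Scalar context: the assignment yields the number of right-hand
# elements, even though only two variables receive values.
my $count = (my ($p, $q) = (10, 20, 30));      # 3

# Assigning to an empty list and reading the result in scalar context
# counts the matches without storing them.
my $matches = () = "a1b22c333" =~ /\d+/g;      # 3

# List context: the assignment yields the lvalues themselves, so chomp
# modifies the freshly assigned variables.
chomp(my ($one, $two) = ("one\n", "two\n"));

print "count=$count matches=$matches one=$one two=$two\n";
```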
1192
fb7054ba
FC
1193The three dotted bitwise assignment operators (C<&.= |.= ^.=>) are new in
1194Perl 5.22 and experimental. See L</Bitwise String Operators>.
1195
748a9306 1196=head2 Comma Operator
d74e8afc 1197X<comma> X<operator, comma> X<,>
a0d0e21e 1198
5a964f20 1199Binary "," is the comma operator. In scalar context it evaluates
1200its left argument, throws that value away, then evaluates its right
1201argument and returns that value. This is just like C's comma operator.
1202
5a964f20 1203In list context, it's just the list argument separator, and inserts
1204both its arguments into the list. These arguments are also evaluated
1205from left to right.
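
The two roles of the comma can be contrasted with a helper that records
each call.  The C<step> subroutine is hypothetical and exists only for
this sketch:

```perl
use strict;
use warnings;

my @log;
# Hypothetical helper: records its argument, then returns it.
sub step { push @log, $_[0]; return $_[0] }

# Scalar context: both calls run, left to right, but only the value of
# the right operand is kept.
my $scalar = (step("a"), step("b"));    # "b"

# List context: the comma merely separates list elements.
my @list = (step("c"), step("d"));      # ("c", "d")

print "scalar=$scalar list=@list log=@log\n";
```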
a0d0e21e 1206
4e1988c6
FC
1207The C<< => >> operator is a synonym for the comma except that it causes a
1208word on its left to be interpreted as a string if it begins with a letter
1209or underscore and is composed only of letters, digits and underscores.
1210This includes operands that might otherwise be interpreted as operators,
46f8a5ea 1211constants, single number v-strings or function calls. If in doubt about
c543c01b 1212this behavior, the left operand can be quoted explicitly.
344f2c40
IG
1213
1214Otherwise, the C<< => >> operator behaves exactly as the comma operator
1215or list argument separator, according to context.
1216
1217For example:
a44e5664
MS
1218
1219 use constant FOO => "something";
1220
1221 my %h = ( FOO => 23 );
1222
1223is equivalent to:
1224
1225 my %h = ("FOO", 23);
1226
1227It is I<NOT>:
1228
1229 my %h = ("something", 23);
1230
719b43e8
RGS
1231The C<< => >> operator is helpful in documenting the correspondence
1232between keys and values in hashes, and other paired elements in lists.
748a9306 1233
a12b8f3c
FC
1234 %hash = ( $key => $value );
1235 login( $username => $password );
a44e5664 1236
4e1988c6
FC
1237The special quoting behavior ignores precedence, and hence may apply to
1238I<part> of the left operand:
1239
1240 print time.shift => "bbb";
1241
1242That example prints something like "1314363215shiftbbb", because the
1243C<< => >> implicitly quotes the C<shift> immediately on its left, ignoring
1244the fact that C<time.shift> is the entire left operand.
1245
a0d0e21e 1246=head2 List Operators (Rightward)
d74e8afc 1247X<operator, list, rightward> X<list operator>
a0d0e21e 1248
c543c01b 1249On the right side of a list operator, the comma has very low precedence,
a0d0e21e
LW
1250such that it controls all comma-separated expressions found there.
1251The only operators with lower precedence are the logical operators
1252"and", "or", and "not", which may be used to evaluate calls to list
1ca345ed
TC
1253operators without the need for parentheses:
1254
1255 open HANDLE, "< :utf8", "filename" or die "Can't open: $!\n";
1256
1257However, some people find that code harder to read than writing
1258it with parentheses:
1259
1260 open(HANDLE, "< :utf8", "filename") or die "Can't open: $!\n";
1261
1262in which case you might as well just use the more customary "||" operator:
a0d0e21e 1263
1ca345ed 1264 open(HANDLE, "< :utf8", "filename") || die "Can't open: $!\n";
a0d0e21e 1265
5ba421f6 1266See also discussion of list operators in L<Terms and List Operators (Leftward)>.
a0d0e21e
LW
1267
1268=head2 Logical Not
d74e8afc 1269X<operator, logical, not> X<not>
a0d0e21e
LW
1270
1271Unary "not" returns the logical negation of the expression to its right.
1272It's the equivalent of "!" except for the very low precedence.
1273
1274=head2 Logical And
d74e8afc 1275X<operator, logical, and> X<and>
a0d0e21e
LW
1276
1277Binary "and" returns the logical conjunction of the two surrounding
c543c01b
TC
1278expressions. It's equivalent to C<&&> except for the very low
1279precedence. This means that it short-circuits: the right
a0d0e21e
LW
1280expression is evaluated only if the left expression is true.
1281
59ab9d6e 1282=head2 Logical or and Exclusive Or
f23102e2 1283X<operator, logical, or> X<operator, logical, xor>
59ab9d6e 1284X<operator, logical, exclusive or>
f23102e2 1285X<or> X<xor>
a0d0e21e
LW
1286
1287Binary "or" returns the logical disjunction of the two surrounding
c543c01b
TC
1288expressions. It's equivalent to C<||> except for the very low precedence.
1289This makes it useful for control flow:
5a964f20
TC
1290
1291 print FH $data or die "Can't write to FH: $!";
1292
c543c01b
TC
1293This means that it short-circuits: the right expression is evaluated
1294only if the left expression is false. Due to its precedence, you must
be careful to avoid using it as a replacement for the C<||> operator.
1296It usually works out better for flow control than in assignments:
5a964f20 1297
db691027
SF
1298 $x = $y or $z; # bug: this is wrong
1299 ($x = $y) or $z; # really means this
1300 $x = $y || $z; # better written this way
5a964f20 1301
19799a22 1302However, when it's a list-context assignment and you're trying to use
c543c01b 1303C<||> for control flow, you probably need "or" so that the assignment
5a964f20
TC
1304takes higher precedence.
1305
1306 @info = stat($file) || die; # oops, scalar sense of stat!
1307 @info = stat($file) or die; # better, now @info gets its due
1308
c963b151
BD
1309Then again, you could always use parentheses.
1310
1ca345ed 1311Binary C<xor> returns the exclusive-OR of the two surrounding expressions.
c543c01b 1312It cannot short-circuit (of course).
a0d0e21e 1313
59ab9d6e
MB
1314There is no low precedence operator for defined-OR.
1315
a0d0e21e 1316=head2 C Operators Missing From Perl
d74e8afc
ITB
1317X<operator, missing from perl> X<&> X<*>
1318X<typecasting> X<(TYPE)>
a0d0e21e
LW
1319
1320Here is what C has that Perl doesn't:
1321
1322=over 8
1323
1324=item unary &
1325
1326Address-of operator. (But see the "\" operator for taking a reference.)
1327
1328=item unary *
1329
46f8a5ea 1330Dereference-address operator. (Perl's prefix dereferencing
a0d0e21e
LW
1331operators are typed: $, @, %, and &.)
1332
1333=item (TYPE)
1334
19799a22 1335Type-casting operator.
a0d0e21e
LW
1336
1337=back
1338
5f05dabc 1339=head2 Quote and Quote-like Operators
89d205f2 1340X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
d74e8afc
ITB
1341X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
1342X<escape sequence> X<escape>
1343
a0d0e21e
LW
1344While we usually think of quotes as literal values, in Perl they
1345function as operators, providing various kinds of interpolating and
1346pattern matching capabilities. Perl provides customary quote characters
1347for these behaviors, but also provides a way for you to choose your
1348quote character for any of them. In the following table, a C<{}> represents
9f10b797 1349any pair of delimiters you choose.
a0d0e21e 1350
    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
1364
87275199 1365Non-bracketing delimiters use the same character fore and aft, but the four
c543c01b 1366sorts of ASCII brackets (round, angle, square, curly) all nest, which means
9f10b797 1367that
87275199 1368
c543c01b 1369 q{foo{bar}baz}
35f2feb0 1370
9f10b797 1371is the same as
87275199 1372
c543c01b 1373 'foo{bar}baz'
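
A small sketch shows the nesting rule and the interpolation column of the
table at work.  The variable names and strings are illustrative only:

```perl
use strict;
use warnings;

my $name = "world";

my $nested = q{foo{bar}baz};       # inner braces nest; same as 'foo{bar}baz'
my $interp = qq{hello, $name};     # qq{} interpolates, like ""
my $raw    = q{hello, $name};      # q{} does not, like ''
my @words  = qw(red green blue);   # whitespace-separated word list

print "$nested\n$interp\n$raw\n@words\n";
```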
87275199
GS
1374
1375Note, however, that this does not always work for quoting Perl code:
1376
db691027 1377 $s = q{ if($x eq "}") ... }; # WRONG
87275199 1378
46f8a5ea 1379is a syntax error. The C<Text::Balanced> module (standard as of v5.8,
c543c01b 1380and from CPAN before then) is able to do this properly.
87275199 1381
19799a22 1382There can be whitespace between the operator and the quoting
fb73857a 1383characters, except when C<#> is being used as the quoting character.
19799a22
GS
1384C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
1385operator C<q> followed by a comment. Its argument will be taken
1386from the next line. This allows you to write:
fb73857a
PP
1387
1388 s {foo} # Replace foo
1389 {bar} # with bar.
1390
c543c01b
TC
1391The following escape sequences are available in constructs that interpolate,
1392and in transliterations:
5691ca5f 1393X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
04341565 1394X<\o{}>
5691ca5f 1395
    Sequence     Note  Description
    \t                  tab               (HT, TAB)
    \n                  newline           (NL)
    \r                  return            (CR)
    \f                  form feed         (FF)
    \b                  backspace         (BS)
    \a                  alarm (bell)      (BEL)
    \e                  escape            (ESC)
    \x{263A}     [1,8]  hex char          (example: SMILEY)
    \x1b         [2,8]  restricted range hex char (example: ESC)
    \N{name}     [3]    named Unicode character or character sequence
    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]    control char      (example: chr(27))
    \o{23072}    [6,8]  octal char        (example: SMILEY)
    \033         [7,8]  restricted range octal char  (example: ESC)
1411
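Several of the sequences in the table denote the same character.  The
following sketch checks a few of them by comparing code points; it
assumes an ASCII platform and a perl recent enough to support C<\o{}>,
and the C<@checks> structure is purely illustrative:

```perl
use strict;
use warnings;

# Each entry: the escape as written, the code point it produced, and the
# code point expected on an ASCII platform.
my @checks = (
    [ '\x1b',       ord("\x1b"),       27     ],
    [ '\033',       ord("\033"),       27     ],
    [ '\e',         ord("\e"),         27     ],
    [ '\c[',        ord("\c["),        27     ],
    [ '\x{263A}',   ord("\x{263A}"),   0x263A ],
    [ '\o{23072}',  ord("\o{23072}"),  0x263A ],
    [ '\N{U+263D}', ord("\N{U+263D}"), 0x263D ],
);

# Count the entries whose actual code point differs from the expected one.
my $mismatches = grep { $_->[1] != $_->[2] } @checks;

printf "%-12s U+%04X\n", $_->[0], $_->[1] for @checks;
print "mismatches: $mismatches\n";
```
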
1412=over 4
1413
1414=item [1]
1415
2c4c1ff2
KW
1416The result is the character specified by the hexadecimal number between
1417the braces. See L</[8]> below for details on which character.
96448467 1418
46f8a5ea 1419Only hexadecimal digits are valid between the braces. If an invalid
96448467
DG
1420character is encountered, a warning will be issued and the invalid
1421character and all subsequent characters (valid or invalid) within the
1422braces will be discarded.
1423
1424If there are no valid digits between the braces, the generated character is
1425the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
c543c01b 1426will not cause a warning (currently).
40687185
KW
1427
1428=item [2]
1429
2c4c1ff2
KW
1430The result is the character specified by the hexadecimal number in the range
14310x00 to 0xFF. See L</[8]> below for details on which character.
96448467
DG
1432
1433Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
2c4c1ff2 1434by fewer than two valid digits, any valid digits will be zero-padded. This
means that C<\x7> will be interpreted as C<\x07>, and a lone C<\x> will be
2c4c1ff2 1436interpreted as C<\x00>. Except at the end of a string, having fewer than
c543c01b 1437two valid digits will result in a warning. Note that although the warning
96448467
DG
1438says the illegal character is ignored, it is only ignored as part of the
1439escape and will still be used as the subsequent character in the string.
1440For example:
1441
 Original    Result    Warns?
 "\x7"       "\x07"    no
 "\x"        "\x00"    no
 "\x7q"      "\x07q"   yes
 "\xq"       "\x00q"   yes
1447
40687185
KW
1448=item [3]
1449
fb121860 1450The result is the Unicode character or character sequence given by I<name>.
2c4c1ff2 1451See L<charnames>.
40687185
KW
1452
1453=item [4]
1454
2c4c1ff2
KW
1455C<\N{U+I<hexadecimal number>}> means the Unicode character whose Unicode code
1456point is I<hexadecimal number>.
40687185
KW
1457
1458=item [5]
1459
5691ca5f
KW
1460The character following C<\c> is mapped to some other character as shown in the
1461table:
1462
 Sequence   Value
   \c@      chr(0)
   \cA      chr(1)
   \ca      chr(1)
   \cB      chr(2)
   \cb      chr(2)
   ...
   \cZ      chr(26)
   \cz      chr(26)
   \c[      chr(27)
   \c]      chr(29)
   \c^      chr(30)
   \c_      chr(31)
   \c?      chr(127) # (on ASCII platforms)
5691ca5f 1477
In other words, it's the character whose code point is that of the
uppercase form of the character following C<\c>, xor'd with 64.
C<\c?> is DELETE on ASCII platforms because
S<C<ord("?") ^ 64>> is 127, and
C<\c@> is NULL because the ord of "@" is 64, so xor'ing 64 with itself
produces 0.
1482
Also, C<\c\I<X>> yields C< chr(28) . "I<X>"> for any I<X>, but cannot come at the
end of a string, because the backslash would be parsed as escaping the end
quote.

On ASCII platforms, the resulting characters from the list above are the
complete set of ASCII controls.  This isn't the case on EBCDIC platforms; see
L<perlebcdic/OPERATOR DIFFERENCES> for a full discussion of the
differences between these for ASCII versus EBCDIC platforms.

Use of any other character following the C<"c"> besides those listed above is
discouraged, and as of Perl v5.20, the only characters actually allowed
are the printable ASCII ones, minus the left brace C<"{">.  For any of the
other allowed characters, the value is derived by xor'ing with the seventh
bit, which is 64, and a warning is raised if enabled.  Using the
non-allowed characters generates a fatal error.

To get platform independent controls, you can use C<\N{...}>.

=item [6]

The result is the character specified by the octal number between the braces.
See L</[8]> below for details on which character.

If a character that isn't an octal digit is encountered, a warning is raised,
and the value is based on the octal digits before it, discarding it and all
following characters up to the closing brace.  It is a fatal error if there are
no octal digits at all.

=item [7]

The result is the character specified by the three-digit octal number in the
range 000 to 777 (but best to not use above 077, see next paragraph).  See
L</[8]> below for details on which character.

Some contexts allow 2 or even 1 digit, but any usage without exactly
three digits, the first being a zero, may give unintended results.  (For
example, in a regular expression it may be confused with a backreference;
see L<perlrebackslash/Octal escapes>.)  Starting in Perl 5.14, you may
use C<\o{}> instead, which avoids all these problems.  Otherwise, it is best to
use this construct only for ordinals C<\077> and below, remembering to pad to
the left with zeros to make three digits.  For larger ordinals, either use
C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
instead.

=item [8]

Several constructs above specify a character by a number.  That number
gives the character's position in the character set encoding (indexed from 0).
This is called synonymously its ordinal, code position, or code point.  Perl
works on platforms that have a native encoding currently of either ASCII/Latin1
or EBCDIC, each of which allows specification of 256 characters.  In general, if
the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
native encoding.  If the number is 256 (0x100, 0400) or above, Perl interprets
it as a Unicode code point and the result is the corresponding Unicode
character.  For example C<\x{50}> and C<\o{120}> both are the number 80 in
decimal, which is less than 256, so the number is interpreted in the native
character set encoding.  In ASCII the character in the 80th position (indexed
from 0) is the letter "P", and in EBCDIC it is the ampersand symbol "&".
C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
as a Unicode code point no matter what the native encoding is.  The name of the
character in the 256th position (indexed from 0) in Unicode is
C<LATIN CAPITAL LETTER A WITH MACRON>.

There are a couple of exceptions to the above rule.  S<C<\N{U+I<hex number>}>> is
always interpreted as a Unicode code point, so that C<\N{U+0050}> is "P" even
on EBCDIC platforms.  And if L<C<S<use encoding>>|encoding> is in effect, the
number is considered to be in that encoding, and is translated from that into
the platform's native encoding if there is a corresponding native character;
otherwise to Unicode.

=back

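As a small illustration of the numeric escapes covered in the notes above,
the following sketch checks a few equivalences.  It assumes an ASCII
platform (the values differ on EBCDIC) and Perl 5.14 or later for C<\o{}>;
the variable names are ours, chosen only for the example.

```perl
use strict;
use warnings;

# All of these name the same character on an ASCII platform.
my @capital_p = ("\x50", "\x{50}", "\o{120}", "\120", "\N{U+0050}");

# \c maps a character to its uppercase code point xor'd with 64.
my $ctrl_a = "\cA";                      # chr(1)
my $ctrl_g = chr(ord(uc "g") ^ 64);      # same as "\cG", chr(7)
my $delete = "\c?";                      # chr(127) on ASCII

# 256 and above is always treated as a Unicode code point.
my $a_macron = "\x{100}";                # same as "\o{400}"

print join(" ", map { ord($_) } @capital_p), "\n";   # 80 80 80 80 80
printf "%d %d %d %d\n",
    ord($ctrl_a), ord($ctrl_g), ord($delete), ord($a_macron);
                                                      # 1 7 127 256
```

Writing the control character through C<ord> and C<chr> is only meant to
show the xor rule; in real code the C<\c> form is shorter.
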
B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
use C<\ck> or C<\x0b>.  (C<\v> does have meaning in regular expression
patterns in Perl, see L<perlre>.)

The following escape sequences are available in constructs that interpolate,
but not in transliterations.
X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>

    \l          lowercase next character only
    \u          titlecase (not uppercase!) next character only
    \L          lowercase all characters till \E or end of string
    \U          uppercase all characters till \E or end of string
    \F          foldcase all characters till \E or end of string
    \Q          quote (disable) pattern metacharacters till \E or
                end of string
    \E          end either case modification or quoted section
                (whichever was last seen)

See L<perlfunc/quotemeta> for the exact definition of characters that
are quoted by C<\Q>.

C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
C<\E> for each.  For example:

 say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?

If C<use locale> is in effect (but not C<use locale ':not_characters'>),
the case map used by C<\l>, C<\L>,
C<\u>, and C<\U> is taken from the current locale.  See L<perllocale>.
If Unicode (for example, C<\N{}> or code points of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u>, and
C<\U> is as defined by Unicode.  That means that case-mapping
a single character can sometimes produce several characters.
Under C<use locale>, C<\F> produces the same results as C<\L>
for all locales but a UTF-8 one, where it instead uses the Unicode
definition.

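The case-modification and quoting escapes can be tried out with a short
sketch like the one below.  It assumes plain ASCII data and no C<use locale>;
the variable names and sample strings are invented for the illustration.

```perl
use strict;
use warnings;

my $name = "jOHN smith";

my $first_up = "\u$name";               # "JOHN smith"
my $tidy     = "\u\L$name\E!";          # "John smith!"
my $shout    = "\U$name\E, hello";      # "JOHN SMITH, hello"
my $quoted   = "\Qa.b*c\E";             # 'a\.b\*c'

# Stacked: the first \E ends \U, the second ends \Q.
my $stacked  = "\Q\Ua.b\E-c\E-d";       # 'A\.B\-c-d'

print "$_\n" for $first_up, $tidy, $shout, $quoted, $stacked;
```

The combination C<\u\L> is a common way to capitalize a word whose original
case is unknown.
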
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline".  There is no such thing as an unvarying, physical
newline character.  It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve.  Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF.  For example,
on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed,
and on systems without a line terminator,
printing C<"\n"> might emit no actual data.  In general, use C<"\n"> when
you mean a "newline" for your system, but use the literal ASCII when you
need an exact character.  For example, most networking protocols expect
and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
and although they often accept just C<"\012">, they seldom tolerate just
C<"\015">.  If you get in the habit of using C<"\n"> for networking,
you may be burned some day.
X<newline> X<line terminator> X<eol> X<end of line>
X<\n> X<\r> X<\r\n>

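One way to follow this advice is to spell out the terminator once and reuse
it.  The sketch below assumes an ASCII platform; the request text and host
name are purely illustrative and are not taken from any particular protocol
library.

```perl
use strict;
use warnings;

# Spell out CR+LF explicitly instead of relying on "\r\n".
my $CRLF = "\015\012";

# A hypothetical HTTP-style request; the host name is illustrative.
my $request = join $CRLF,
    "GET / HTTP/1.1",
    "Host: example.com",
    "", "";

printf "%vd\n", $CRLF;      # 13.10
```

Joining with two trailing empty strings ends the request with a blank line,
that is, with two consecutive CR+LF pairs.
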
For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated.  Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.

Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>.  "Punctuation" arrays such as C<@*> are usually
interpolated only if the name is enclosed in braces C<@{*}>, but the
arrays C<@_>, C<@+>, and C<@-> are interpolated even without braces.

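A brief sketch of these interpolation rules follows.  The data and variable
names are made up for the example.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my $href = { key => ["first", "second"] };

my $element = "got $list[1] and $href->{key}[0]";   # "got 2 and first"
my $spaced  = "items: @list";                       # "items: 1 2 3"

my ($joined, $slice);
{
    local $" = ", ";            # separator used for arrays and slices
    $joined = "items: @list";           # "items: 1, 2, 3"
    $slice  = "tail: @list[1, 2]";      # "tail: 2, 3"
}

print "$_\n" for $element, $spaced, $joined, $slice;
```

Localizing C<$"> inside a block keeps the changed separator from leaking
into the rest of the program.
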
For double-quoted strings, the quoting from C<\Q> is applied after
interpolation and escapes are processed.

    "abc\Qfoo\tbar$s\Exyz"

is equivalent to

    "abc" . quotemeta("foo\tbar$s") . "xyz"

For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
the quoting from C<\Q> is applied after interpolation is processed,
but before escapes are processed.  This allows the pattern to match
literally (except for C<$> and C<@>).  For example, the following matches:

    '\s\t' =~ /\Q\s\t/

Because C<$> or C<@> trigger interpolation, you'll need to use something
like C</\Quser\E\@\Qhost/> to match them literally.

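The difference C<\Q> makes can be seen in a short sketch.  The sample
strings are invented for the illustration.

```perl
use strict;
use warnings;

my $s = "a.b";

# In a double-quoted string, \Q...\E behaves like quotemeta().
my $same = "x\Q$s\Ey" eq "x" . quotemeta($s) . "y";     # true

# In a pattern, \Q makes the interpolated text match literally.
my $literal = "a.b" =~ /\A\Q$s\E\z/;    # true
my $no_wild = "axb" =~ /\A\Q$s\E\z/;    # false: "." is not a wildcard here
my $wild    = "axb" =~ /\A$s\z/;        # true: "." matches any character

# A literal "@" has to stay outside the \Q...\E sections.
my $at_sign = 'user@host' =~ /\Quser\E\@\Qhost/;        # true

print "ok\n" if $same && $literal && !$no_wild && $wild && $at_sign;
```
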
Patterns are subject to an additional level of interpretation as a
regular expression.  This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables.  If this is not what you want, use C<\Q> to
interpolate a variable literally.

Apart from the behavior described above, Perl does not expand
multiple levels of interpolation.  In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.

=head2 Regexp Quote-Like Operators
X<operator, regexp>

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item qr/STRING/msixpodualn
X<qr> X</i> X</m> X</o> X</s> X</x> X</p>

This operator quotes (and possibly compiles) its I<STRING> as a regular
expression.  I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>.  If "'" is used as the delimiter, no interpolation
is done.  Returns a Perl value which may be used instead of the
corresponding C</STRING/msixpodualn> expression.  The returned value is a
normalized version of the original pattern.  It magically differs from
a string containing the same characters: C<ref(qr/x/)> returns "Regexp";
however, dereferencing it is not well defined (you currently get the
normalized version of the original pattern, but this may change).

For example,

    $rex = qr/my.STRING/is;
    print $rex;                 # prints (?^si:my.STRING) on v5.14+
    s/$rex/foo/;

is equivalent to

    s/my.STRING/foo/is;

The result may be used as a subpattern in a match:

    $re = qr/$pattern/;
    $string =~ /foo${re}bar/;   # can be interpolated in other
                                # patterns
    $string =~ $re;             # or used standalone
    $string =~ /$re/;           # or this way

Since Perl may compile the pattern at the moment of execution of the qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }

Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match C</$pat/> is attempted.  (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)

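The following sketch shows a compiled pattern used standalone and embedded
in a larger pattern, where its own C</i> modifier travels with it.  The
strings and variable names are made up for the example.

```perl
use strict;
use warnings;

my $word = qr/hello/i;          # case-insensitive on its own

my $kind     = ref $word;                                   # "Regexp"
my $alone    = "HELLO" =~ $word;                            # true
my $embedded = "say HeLLo twice" =~ /^say $word twice$/;    # true
my $outer    = "SAY hello twice" =~ /^say $word twice$/;    # false

print "$kind\n";                # Regexp
```

The last match fails because the enclosing pattern has no C</i>, so only
the embedded part ignores case.
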
Options (specified by the following modifiers) are:

    m   Treat string as multiple lines.
    s   Treat string as single line.  (Make . match a newline)
    i   Do case-insensitive pattern matching.
    x   Use extended regular expressions.
    p   When matching preserve a copy of the matched string so
        that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
        defined.
    o   Compile pattern only once.
    a   ASCII-restrict: Use ASCII for \d, \s, \w; specifying two
        a's further restricts /i matching so that no ASCII
        character will match a non-ASCII one.
    l   Use the locale.
    u   Use Unicode rules.
    d   Use Unicode or native charset, as in 5.12 and earlier.
    n   Non-capture mode.  Don't let () fill in $1, $2, etc...

If a precompiled pattern is embedded in a larger pattern then the effect
of "msixpluadn" will be propagated appropriately.  The effect that the "o"
modifier has is not propagated, being restricted to those patterns
explicitly using it.

The C</a>, C</l>, C</u>, and C</d> modifiers, added in Perl 5.14,
control the character set rules, but C</a> is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.

See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.  In
particular, all modifiers except the largely obsolete C</o> are further
explained in L<perlre/Modifiers>.  C</o> is described in the next section.

=item m/PATTERN/msixpodualngc
X<m> X<operator, match>
X<regexp, options> X<regexp> X<regex, options> X<regex>
X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>

=item /PATTERN/msixpodualngc

Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails.  If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched.  (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.)  See also L<perlre>.

Options are as described in C<qr//> above; in addition, the following match
process modifiers are available:

    g  Match globally, i.e., find all occurrences.
    c  Do not reset search position on a failed match when /g is
       in effect.

If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
you can use any pair of non-whitespace (ASCII) characters
as delimiters.  This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
the delimiter, then a match-only-once rule applies,
described in C<m?PATTERN?> below.  If "'" (single quote) is the delimiter,
no interpolation is performed on the PATTERN.
When using a character valid in an identifier, whitespace is required
after the C<m>.

PATTERN may contain variables, which will be interpolated
every time the pattern search is evaluated, except
for when the delimiter is a single quote.  (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
Perl will not recompile the pattern unless an interpolated
variable that it contains changes.  You can force Perl to skip the
test and never recompile by adding a C</o> (which stands for "once")
after the trailing delimiter.
Once upon a time, Perl would recompile regular expressions
unnecessarily, and this modifier was useful to tell it not to do so, in the
interests of speed.  But now, the only reasons to use C</o> are:

=over

=item 1

The variables are thousands of characters long and you know that they
don't change, and you need to wring out the last little bit of speed by
having Perl skip testing for that.  (There is a maintenance penalty for
doing this, as mentioning C</o> constitutes a promise that you won't
change the variables in the pattern.  If you do change them, Perl won't
even notice.)

=item 2

You want the pattern to use the initial values of the variables
regardless of whether they change or not.  (But there are saner ways
of accomplishing this than using C</o>.)

=item 3

If the pattern contains embedded code, such as

    use re 'eval';
    $code = 'foo(?{ $x })';
    /$code/

then perl will recompile each time, even though the pattern string hasn't
changed, to ensure that the current value of C<$x> is seen each time.
Use C</o> if you want to avoid this.

=back

The bottom line is that using C</o> is almost never a good idea.

=item The empty pattern //

If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead.  In this
case, only the C<g> and C<c> flags on the empty pattern are honored;
the other flags are taken from the original pattern.  If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).

Note that it's possible to confuse Perl into thinking C<//> (the empty
regex) is really C<//> (the defined-or operator).  Perl is usually pretty
good about this, but some pathological cases might trigger this, such as
C<$x///> (is that C<($x) / (//)> or C<$x // />?) and C<print $fh //>
(C<print $fh(//> or C<print($fh //>?).  In all of these examples, Perl
will assume you meant defined-or.  If you meant the empty regex, just
use parentheses or spaces to disambiguate, or even prefix the empty
regex with an C<m> (so C<//> becomes C<m//>).

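A minimal sketch of the reuse rule, written with an explicit C<m> to avoid
the defined-or ambiguity.  It assumes all three statements run in the same
scope; the sample strings are invented.

```perl
use strict;
use warnings;

"hello" =~ /l+/ or die "setup match failed";

# m// now reuses /l+/ rather than matching the empty string.
my $reused_hit  = ("world" =~ m//) ? 1 : 0;     # 1: "world" contains an "l"
my $reused_miss = ("abc"   =~ m//) ? 1 : 0;     # 0: no "l" in "abc"

print "$reused_hit $reused_miss\n";             # 1 0
```

A genuinely empty pattern would have matched C<"abc"> as well, which is why
this behavior can be surprising when a pattern variable happens to be empty.
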
=item Matching in list context

If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, that is, (C<$1>, C<$2>, C<$3>...)  (Note that here C<$1> etc. are
also set).  When there are no parentheses in the pattern, the return
value is the list C<(1)> for success.
With or without parentheses, an empty list is returned upon failure.

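The three possible return values can be seen side by side in a small
sketch.  The timestamp string and variable names are made up for the
example.

```perl
use strict;
use warnings;

my $stamp = "12:34:56";

# With parentheses: the captures come back as a list.
my ($hour, $min, $sec) = $stamp =~ /^(\d+):(\d+):(\d+)$/;

# Without parentheses: the list (1) on success.
my @plain = $stamp =~ /:/;

# On failure: the empty list, so the count is 0.
my $count = () = $stamp =~ /(x)(y)/;

print "$hour $min $sec | @plain | $count\n";    # 12 34 56 | 1 | 0
```
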
Examples:

    open(TTY, "+</dev/tty")
        || die "can't access /dev/tty: $!";

    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;   # compile only once (no longer needed!)
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc.  The conditional is true if any variables were assigned; that is,
if the pattern matched.

The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string.  How it behaves
depends on the context.  In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression.  If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.

In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the C<pos()>
function; see L<perlfunc/pos>.  A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (for example, C<m//gc>).  Modifying the target
string also resets the search position.

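The context-dependent behavior of C</g> and the effect of C</c> on C<pos()>
can be sketched as follows.  The sample string and variable names are
invented for the illustration.

```perl
use strict;
use warnings;

my $text = "a1b22c333";

# List context: every match at once.
my @numbers = $text =~ /(\d+)/g;        # (1, 22, 333)

# Scalar context: one match per execution, tracked by pos().
my @ends;
while ($text =~ /\d+/g) {
    push @ends, pos($text);             # 2, 5, 9
}
my $after_failure = pos($text);         # undef: the failed match reset it

# With /c, the final failed match leaves pos() alone.
1 while $text =~ /\d+/gc;
my $kept = pos($text);                  # 9

print "@numbers | @ends | $kept\n";     # 1 22 333 | 2 5 9 | 9
```
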
=item \G assertion

You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the
previous C<m//g>, if any, left off.  Without the C</g> modifier, the
C<\G> assertion still anchors at C<pos()> as it was at the start of
the operation (see L<perlfunc/pos>), but the match is of course only
attempted once.  Using C<\G> without C</g> on a target string that has
not previously had a C</g> match applied to it is the same as using
the C<\A> assertion to match the beginning of the string.  Note also
that, currently, C<\G> is only properly supported when anchored at the
very beginning of the pattern.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    local $/ = "";
    while ($paragraph = <>) {
        while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    say $sentences;

Here's another way to check for sentences in a paragraph:

    my $sentence_rx = qr{
        (?: (?<= ^ ) | (?<= \s ) )  # after start-of-string or
                                    # whitespace
        \p{Lu}                      # capital letter
        .*?                         # a bunch of anything
        (?<= \S )                   # that ends in non-
                                    # whitespace
        (?<! \b [DMS]r  )           # but isn't a common abbr.
        (?<! \b Mrs )
        (?<! \b Sra )
        (?<! \b St  )
        [.?!]                       # followed by a sentence
                                    # ender
        (?= $ | \s )                # in front of end-of-string
                                    # or whitespace
    }sx;
    local $/ = "";
    while (my $paragraph = <>) {
        say "NEW PARAGRAPH";
        my $count = 0;
        while ($paragraph =~ /($sentence_rx)/g) {
            printf "\tgot sentence %d: <%s>\n", ++$count, $1;
        }
    }

Here's how to use C<m//gc> with C<\G>:

    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done.  Also note that the final match
did not update C<pos>.  C<pos> is only updated on a C</g> match.  If the
final match did indeed match C<p>, it's a good bet that you're running a
very old (pre-5.6.0) version of Perl.

A useful idiom for C<lex>-like scanners is C</\G.../gc>.  You can
combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched.  Each
regexp tries to match where the previous one leaves off.

 $_ = <<'EOL';
    $url = URI::URL->new( "http://example.com/" );
    die if $url eq "xXx";
 EOL

 LOOP: {
     print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
     print(" lowercase"),    redo LOOP
                                    if /\G\p{Ll}+\b[,.;]?\s*/gc;
     print(" UPPERCASE"),    redo LOOP
                                    if /\G\p{Lu}+\b[,.;]?\s*/gc;
     print(" Capitalized"),  redo LOOP
                              if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
     print(" MiXeD"),        redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
     print(" alphanumeric"), redo LOOP
                            if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
     print(" line-noise"),   redo LOOP if /\G\W+/gc;
     print ". That's all!\n";
 }

Here is the output (split into several lines):

 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
 line-noise lowercase line-noise lowercase line-noise lowercase
 lowercase line-noise lowercase lowercase line-noise lowercase
 lowercase line-noise MiXeD line-noise. That's all!

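The same C</\G.../gc> idiom can collect tokens instead of printing them.
The following is only a sketch: the C<tokenize> function, its token names
and the tiny grammar are hypothetical, invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical token names; the grammar is only an illustration.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        # Each alternative is anchored with \G at the current position;
        # /c keeps pos() intact when an alternative fails.
        if    ($text =~ /\G\s+/gc)             { next }
        elsif ($text =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }
        elsif ($text =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "ID:$1" }
        elsif ($text =~ /\G([-+*\/=])/gc)      { push @tokens, "OP:$1" }
        else  { die "unexpected character at offset " . pos($text) . "\n" }
    }
    return @tokens;
}

print join(" ", tokenize("total = price * 3")), "\n";
# ID:total OP:= ID:price OP:* NUM:3
```

Because every alternative uses C</c>, a failed attempt never rewinds the
scan, so the order of the branches alone decides which token type wins.
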
=item m?PATTERN?msixpodualngc
X<?> X<operator, match-once>

=item ?PATTERN?msixpodualngc

This is just like the C<m/PATTERN/> search, except that it matches
only once between calls to the reset() operator.  This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance.  Only C<m??>
patterns local to the current package are reset.

    while (<>) {
        if (m?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear m?? status for next file
    }

Another example switches the first "latin1" encoding it finds
to "utf8" in a pod file:

    s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;

The match-once behavior is controlled by the match delimiter being
C<?>; with any other delimiter this is the normal C<m//> operator.

In the past, the leading C<m> in C<m?PATTERN?> was optional, but omitting it
would produce a deprecation warning.  As of v5.22.0, omitting it produces a
syntax error.  If you encounter this construct in older code, you can just add
C<m>.

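The match-once rule and the effect of C<reset> can be observed without
reading any files.  In this sketch the C<first_hits> helper and the sample
lines are hypothetical, made up for the illustration.

```perl
use strict;
use warnings;

my @lines = ("alpha", "target one", "beta", "target two");

# The m?? below is a single operator; its "already matched" state
# persists across calls until reset() is called.
sub first_hits {
    my @hits;
    for my $line (@_) {
        push @hits, $line if $line =~ m?target?;
    }
    return @hits;
}

my @once  = first_hits(@lines);     # ("target one")
my @spent = first_hits(@lines);     # (): the pattern has already matched
reset;                              # re-arm m?? patterns in this package
my @again = first_hits(@lines);     # ("target one")

print scalar(@once), scalar(@spent), scalar(@again), "\n";  # 101
```
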
=item s/PATTERN/REPLACEMENT/msixpodualngcer
X<substitute> X<substitution> X<replace> X<regexp, replace>
X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>

Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made.  Otherwise it returns false (specifically, the empty string).

If the C</r> (non-destructive) option is used then it runs the
substitution on a copy of the string and instead of returning the
number of substitutions, it returns the copy whether or not a
substitution occurred.  The original string is never changed when
C</r> is used.  The copy will always be a plain string, even if the
input is an object or a tied variable.

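The different return values can be compared in a short sketch.  It assumes
Perl 5.14 or later for C</r>; the sample text and variable names are
invented.

```perl
use strict;
use warnings;

my $text = "one fish two fish";

my $copy    = $text =~ s/fish/cat/gr;   # "one cat two cat"; $text untouched
my $same    = $text =~ s/bird/cat/r;    # no match: an unchanged copy
my $changes = ($text =~ s/fish/cat/g);  # 2, and $text is now modified
my $none    = ($text =~ s/fish/cat/g);  # "" (false): nothing left to replace

print "$copy | $same | $changes | $text\n";
# one cat two cat | one fish two fish | 2 | one cat two cat
```
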
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified.  Unless the C</r> option is used,
the string specified must be a scalar variable, an array element, a
hash element, or an assignment to one of those; that is, some sort of
scalar lvalue.

If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT.  Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time.  If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option.  If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead.  See L<perlre> for further explanation on these.

Options are as with m// with the addition of the following replacement
specific options:

    e   Evaluate the right side as an expression.
    ee  Evaluate the right side as a string then eval the
        result.
    r   Return substitution and leave the original string
        untouched.

Any non-whitespace delimiter may replace the slashes.  Add space after
the C<s> when using a character allowed in identifiers.  If single quotes
are used, no interpretation is done on the replacement string (the C</e>
modifier overrides this, however).  Note that Perl treats backticks
as normal delimiters; the replacement text is not evaluated as a command.
If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has
its own pair of quotes, which may or may not be bracketing quotes, for example,
C<s(foo)(bar)> or C<< s<foo>/bar/ >>.  A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there.  It is, however, syntax checked at
compile-time.  A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.

Examples:

    s/\bgreen\b/mauve/g;              # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/; # run-time pattern

    ($foo = $bar) =~ s/this/that/;      # copy first, then
                                        # change
    ($foo = "$bar") =~ s/this/that/;    # convert to string,
                                        # copy, then change
    $foo = $bar =~ s/this/that/r;       # Same as above using /r
    $foo = $bar =~ s/this/that/r
                =~ s/that/the other/r;  # Chained substitutes
                                        # using /r
    @foo = map { s/this/that/r } @bar;  # /r is very useful in
                                        # maps

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-cnt

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/pod($1)/ge;       # use function call

    $_ = 'abc123xyz';
    $x = s/abc/def/r;           # $x is 'def123xyz' and
                                # $_ remains 'abc123xyz'.

    # expand variables in $_, but dynamics only, using
    # symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # Add one to the value of any numbers in the string
    s/(\d+)/1 + $1/eg;

    # Titlecase words in the last 30 characters only
    substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;

    # This will expand any embedded scalar variable
    # (including lexicals) in $_ : First $1 is interpolated
    # to the variable name, and then evaluated
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim whitespace in $_,
                                # expensively

    for ($variable) {           # trim whitespace in $variable,
                                # cheap
        s/^\s+//;
        s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields

Note the use of $ instead of \ in the last example.  Unlike
B<sed>, we use the \<I<digit>> form in only the left hand side.
Anywhere else it's $<I<digit>>.

2162Occasionally, you can't use just a C</g> to get all the changes
2163to occur that you might want. Here are two common cases:
2164
2165 # put commas in the right places in an integer
2166 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2167
2168 # expand tabs to 8-column spacing
2169 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
2170
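To see why a single C</g> pass is not enough in the comma case, it can help to compare one pass against the loop. The following is only a sketch; the variable names are illustrative and not part of the original examples.

```perl
use strict;
use warnings;

my $number = "1234567";

# One pass: only the rightmost group of three digits can match,
# because every earlier candidate is still followed by a digit.
(my $one_pass = $number) =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

# Looping until no substitution succeeds picks up the remaining groups.
my $looped = $number;
1 while $looped =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$one_pass\n";    # 1234,567
print "$looped\n";      # 1,234,567
```

Each pass through the loop exposes a new position where the lookahead succeeds, which is why the substitution has to be retried.
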
2171=back
2172
2173=head2 Quote-Like Operators
2174X<operator, quote-like>
2175
01c6f5f4
RGS
2176=over 4
2177
a0d0e21e 2178=item q/STRING/
5d44bfff 2179X<q> X<quote, single> X<'> X<''>
a0d0e21e 2180
5d44bfff 2181=item 'STRING'
a0d0e21e 2182
19799a22 2183A single-quoted, literal string. A backslash represents a backslash
68dc0745
PP
2184unless followed by the delimiter or another backslash, in which case
2185the delimiter or backslash is interpolated.
a0d0e21e
LW
2186
2187 $foo = q!I said, "You said, 'She said it.'"!;
2188 $bar = q('This is it.');
68dc0745 2189 $baz = '\n'; # a two-character string
a0d0e21e
LW
2190
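The backslash rule for single-quoted strings can be checked with a few short strings. This is a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $delim     = q/a\/b/;      # \/ yields the delimiter itself: a/b
my $backslash = q(C:\\dir);   # \\ yields one backslash: C:\dir
my $untouched = q(\n);        # any other backslash is kept: \ followed by n

print length($delim), " ", length($backslash), " ", length($untouched), "\n";
# prints "3 6 2"
```
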
2191=item qq/STRING/
d74e8afc 2192X<qq> X<quote, double> X<"> X<"">
a0d0e21e
LW
2193
2194=item "STRING"
2195
2196A double-quoted, interpolated string.
2197
2198 $_ .= qq
2199 (*** The previous line contains the naughty word "$1".\n)
19799a22 2200 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 2201 $baz = "\n"; # a one-character string
a0d0e21e
LW
2202
2203=item qx/STRING/
d74e8afc 2204X<qx> X<`> X<``> X<backtick>
a0d0e21e
LW
2205
2206=item `STRING`
2207
43dd4d21 2208A string which is (possibly) interpolated and then executed as a
f703fc96 2209system command with F</bin/sh> or its equivalent. Shell wildcards,
43dd4d21
JH
2210pipes, and redirections will be honored. The collected standard
2211output of the command is returned; standard error is unaffected. In
2212scalar context, it comes back as a single (potentially multi-line)
2213string, or undef if the command failed. In list context, returns a
2214list of lines (however you've defined lines with $/ or
2215$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
5a964f20
TC
2216
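The difference between scalar and list context can be seen with any command that prints more than one line. The sketch below assumes a shell that understands double-quoted arguments, and uses C<$^X> (the running perl) as a stand-in for a real command.

```perl
use strict;
use warnings;

# $^X is the path of the running perl, used here as a stand-in command
# that prints "a" and "b" on separate lines.
my $scalar = qx{"$^X" -le "print for qw(a b)"};   # one multi-line string
my @lines  = qx{"$^X" -le "print for qw(a b)"};   # one element per line

print scalar(@lines), " lines\n";
```

With the default C<$/>, each element of C<@lines> keeps its trailing newline, just as lines read from a filehandle would.
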
2217Because backticks do not affect standard error, use shell file descriptor
2218syntax (assuming the shell supports this) if you care to address this.
2219To capture a command's STDERR and STDOUT together:
a0d0e21e 2220
5a964f20
TC
2221 $output = `cmd 2>&1`;
2222
2223To capture a command's STDOUT but discard its STDERR:
2224
2225 $output = `cmd 2>/dev/null`;
2226
2227To capture a command's STDERR but discard its STDOUT (ordering is
2228important here):
2229
2230 $output = `cmd 2>&1 1>/dev/null`;
2231
2232To exchange a command's STDOUT and STDERR in order to capture the STDERR
2233but leave its STDOUT to come out the old STDERR:
2234
2235 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2236
2237To read both a command's STDOUT and its STDERR separately, it's easiest
2359510d
SD
2238to redirect them separately to files, and then read from those files
2239when the program is done:
5a964f20 2240
2359510d 2241 system("program args 1>program.stdout 2>program.stderr");
5a964f20 2242
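The redirections above can be tried with a child process that writes to both streams. This sketch assumes a POSIX-like shell and the existence of F</dev/null>; the child command is a hypothetical stand-in built from C<$^X>.

```perl
use strict;
use warnings;

# The child writes "err" to STDERR and "out" to STDOUT.
my $child = qq{"$^X" -e "print STDERR q(err); print STDOUT q(out)"};

my $both     = qx{$child 2>&1};         # both streams captured
my $only_out = qx{$child 2>/dev/null};  # STDERR discarded

print "captured both: $both\n";
print "captured STDOUT only: $only_out\n";
```

The relative order of "err" and "out" in C<$both> depends on buffering in the child, so it is best not to rely on it.
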
30398227
SP
2243The STDIN filehandle used by the command is inherited from Perl's STDIN.
2244For example:
2245
c543c01b
TC
2246 open(SPLAT, "stuff") || die "can't open stuff: $!";
2247 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
40bbb707 2248 print STDOUT `sort`;
30398227 2249
40bbb707 2250will print the sorted contents of the file named F<"stuff">.
30398227 2251
5a964f20
TC
2252Using single-quote as a delimiter protects the command from Perl's
2253double-quote interpolation, passing it on to the shell instead:
2254
2255 $perl_info = qx(ps $$); # that's Perl's $$
2256 $shell_info = qx'ps $$'; # that's the new shell's $$
2257
19799a22 2258How that string gets evaluated is entirely subject to the command
5a964f20
TC
2259interpreter on your system. On most platforms, you will have to protect
2260shell metacharacters if you want them treated literally. This is in
2261practice difficult to do, as it's unclear how to escape which characters.
2262See L<perlsec> for a clean and safe example of a manual fork() and exec()
2263to emulate backticks safely.
a0d0e21e 2264
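One common way to sidestep shell quoting altogether is the list form of a pipe C<open>, which hands each argument to the program without involving a shell. The sketch below is not the example from L<perlsec>; the command is a hypothetical one built from C<$^X>, and list-form pipe opens are assumed to be available on the platform.

```perl
use strict;
use warnings;

# Hypothetical command: the running perl ($^X) echoing its first argument.
my @cmd = ($^X, '-e', 'print "$ARGV[0]\n"', 'a; echo b');

# List-form pipe open passes each argument directly to the program,
# so no shell ever interprets the ";" in the last argument.
open(my $pipe, '-|', @cmd) or die "can't run $cmd[0]: $!";
my $output = do { local $/; <$pipe> };   # slurp all of the child's STDOUT
close($pipe) or warn "child exited with status $?";

print $output;    # a; echo b
```
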
bb32b41a
GS
2265On some platforms (notably DOS-like ones), the shell may not be
2266capable of dealing with multiline commands, so putting newlines in
2267the string may not get you what you want. You may be able to evaluate
2268multiple commands in a single line by separating them with the command
1ca345ed
TC
2269separator character, if your shell supports that (for example, C<;> on
2270many Unix shells and C<&> on the Windows NT C<cmd> shell).
bb32b41a 2271
3ff8ecf9 2272Perl will attempt to flush all files opened for
0f897271
GS
2273output before starting the child process, but this may not be supported
2274on some platforms (see L<perlport>). To be safe, you may need to set
2275C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
2276C<IO::Handle> on any open handles.
2277
bb32b41a
GS
2278Beware that some command shells may place restrictions on the length
2279of the command line. You must ensure your strings don't exceed this
2280limit after any necessary interpolations. See the platform-specific
2281release notes for more details about your particular environment.
2282
5a964f20
TC
2283Using this operator can lead to programs that are difficult to port,
2284because the shell commands called vary between systems, and may in
2285fact not be present at all. As one example, the C<type> command under
2286the POSIX shell is very different from the C<type> command under DOS.
2287That doesn't mean you should go out of your way to avoid backticks
2288when they're the right way to get something done. Perl was made to be
2289a glue language, and one of the things it glues together is commands.
2290Just understand what you're getting yourself into.
bb32b41a 2291
da87341d 2292See L</"I/O Operators"> for more discussion.
a0d0e21e 2293
945c54fd 2294=item qw/STRING/
d74e8afc 2295X<qw> X<quote, list> X<quote, words>
945c54fd
JH
2296
2297Evaluates to a list of the words extracted out of STRING, using embedded
2298whitespace as the word delimiters. It can be understood as being roughly
2299equivalent to:
2300
c543c01b 2301 split(" ", q/STRING/);
945c54fd 2302
efb1e162
CW
2303the differences being that it generates a real list at compile time, and
2304in scalar context it returns the last element in the list. So
945c54fd
JH
2305this expression:
2306
2307 qw(foo bar baz)
2308
2309is semantically equivalent to the list:
2310
c543c01b 2311 "foo", "bar", "baz"
945c54fd
JH
2312
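The two differences from C<split> can be observed directly. This is a small sketch with illustrative variable names; the C<no warnings 'void'> line is there because C<qw> in scalar context may otherwise trigger a "Useless use of a constant" warning.

```perl
use strict;
use warnings;

my @from_qw    = qw(foo bar baz);
my @from_split = split(" ", q/foo bar baz/);

# In scalar context qw yields its last element, whereas split
# returns the number of fields.
my $last;
{
    no warnings 'void';
    $last = qw(foo bar baz);
}
my $count = split(" ", q/foo bar baz/);

print "@from_qw | @from_split | $last | $count\n";
# prints "foo bar baz | foo bar baz | baz | 3"
```
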
2313Some frequently seen examples:
2314
2315 use POSIX qw( setlocale localeconv )
2316 @EXPORT = qw( foo bar baz );
2317
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string. For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.
2322
8ff32507 2323=item tr/SEARCHLIST/REPLACEMENTLIST/cdsr
d74e8afc 2324X<tr> X<y> X<transliterate> X</c> X</d> X</s>
a0d0e21e 2325
8ff32507 2326=item y/SEARCHLIST/REPLACEMENTLIST/cdsr
a0d0e21e 2327
2c268ad5 2328Transliterates all occurrences of the characters found in the search list
a0d0e21e
LW
2329with the corresponding character in the replacement list. It returns
2330the number of characters replaced or deleted. If no string is
c543c01b
TC
2331specified via the C<=~> or C<!~> operator, the $_ string is transliterated.
2332
2333If the C</r> (non-destructive) option is present, a new copy of the string
2334is made and its characters transliterated, and this copy is returned no
2335matter whether it was modified or not: the original string is always
2336left unchanged. The new copy is always a plain string, even if the input
2337string is an object or a tied variable.
8ada0baa 2338
c543c01b
TC
2339Unless the C</r> option is used, the string specified with C<=~> must be a
2340scalar variable, an array element, a hash element, or an assignment to one
2341of those; in other words, an lvalue.
8ff32507 2342
89d205f2 2343A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
2c268ad5 2344does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121
PP
2345For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
2346SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
c543c01b
TC
2347its own pair of quotes, which may or may not be bracketing quotes;
2348for example, C<tr[aeiouy][yuoiea]> or C<tr(+\-*/)/ABCD/>.
2349
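The range and delimiter rules can be checked with the C</r> option, which leaves the input untouched. This is a minimal sketch; it assumes a perl recent enough to support C</r>, and the input strings are made up.

```perl
use strict;
use warnings;

my $by_range  = "ABCDEFGHIJ" =~ tr/A-J/0-9/r;
my $spelled   = "ABCDEFGHIJ" =~ tr/ACEGIBDFHJ/0246813579/r;
my $bracketed = "1+2-3*4/5"  =~ tr(+\-*/)/ABCD/r;   # two kinds of delimiters

print "$by_range $spelled $bracketed\n";
# prints "0123456789 0123456789 1A2B3C4D5"
```
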
2350Note that C<tr> does B<not> do regular expression character classes such as
2351C<\d> or C<\pL>. The C<tr> operator is not equivalent to the tr(1)
2352utility. If you want to map strings between lower/upper cases, see
2353L<perlfunc/lc> and L<perlfunc/uc>, and in general consider using the C<s>
2354operator if you need regular expressions. The C<\U>, C<\u>, C<\L>, and
2355C<\l> string-interpolation escapes on the right side of a substitution
2356operator will perform correct case-mappings, but C<tr[a-z][A-Z]> will not
2357(except sometimes on legacy 7-bit data).
cc255d5f 2358
8ada0baa
JH
Note also that ranges are rather unportable between character sets,
and even within a character set they may produce results you probably
didn't expect. A sound principle is to use only ranges that begin and
end at letters of the same case (a-e, A-E) or at digits (0-4).
Anything else is unsafe. If in doubt, spell out the character sets in full.
2365
a0d0e21e
LW
2366Options:
2367
2368 c Complement the SEARCHLIST.
2369 d Delete found but unreplaced characters.
2370 s Squash duplicate replaced characters.
8ff32507
FC
2371 r Return the modified string and leave the original string
2372 untouched.
a0d0e21e 2373
19799a22
GS
2374If the C</c> modifier is specified, the SEARCHLIST character set
2375is complemented. If the C</d> modifier is specified, any characters
2376specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
2377(Note that this is slightly more flexible than the behavior of some
2378B<tr> programs, which delete anything they find in the SEARCHLIST,
46f8a5ea 2379period.) If the C</s> modifier is specified, sequences of characters
19799a22
GS
2380that were transliterated to the same character are squashed down
2381to a single instance of the character.
a0d0e21e
LW
2382
2383If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
2384exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
2385than the SEARCHLIST, the final character is replicated till it is long
5a964f20 2386enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
a0d0e21e
LW
2387This latter is useful for counting characters in a class or for
2388squashing character sequences in a class.
2389
2390Examples:
2391
c543c01b 2392 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
a0d0e21e
LW
2393
2394 $cnt = tr/*/*/; # count the stars in $_
2395
2396 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2397
2398 $cnt = tr/0-9//; # count the digits in $_
2399
2400 tr/a-zA-Z//s; # bookkeeper -> bokeper
2401
2402 ($HOST = $host) =~ tr/a-z/A-Z/;
c543c01b 2403 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
8ff32507 2404
c543c01b 2405 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
8ff32507 2406 =~ s/:/ -p/r;
a0d0e21e
LW
2407
2408 tr/a-zA-Z/ /cs; # change non-alphas to single space
2409
8ff32507
FC
2410 @stripped = map tr/a-zA-Z/ /csr, @original;
2411 # /r with map
2412
a0d0e21e 2413 tr [\200-\377]
c543c01b 2414 [\000-\177]; # wickedly delete 8th bit
a0d0e21e 2415
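The interaction between C</d>, C</s>, and a short or empty REPLACEMENTLIST is easier to see side by side. The following sketch uses C</r> so the inputs stay unchanged; the strings are illustrative only.

```perl
use strict;
use warnings;

my $padded   = "aabbcc"     =~ tr/a-c/xy/r;     # c maps to the final "y"
my $deleted  = "aabbcc"     =~ tr/a-c/xy/dr;    # c has no partner, so it goes
my $squashed = "bookkeeper" =~ tr/a-zA-Z//sr;   # runs of one letter collapse

my $text    = "hello world";
my $letters = ($text =~ tr/a-z//);              # empty list: count, no change

print "$padded $deleted $squashed $letters\n";
# prints "xxyyyy xxyy bokeper 10"
```
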
19799a22
GS
2416If multiple transliterations are given for a character, only the
2417first one is used:
748a9306
LW
2418
2419 tr/AAA/XYZ/
2420
2c268ad5 2421will transliterate any A to X.
748a9306 2422
Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation. That means that if you want to use variables, you
must use an eval():
a0d0e21e
LW
2427
2428 eval "tr/$oldlist/$newlist/";
2429 die $@ if $@;
2430
2431 eval "tr/$oldlist/$newlist/, 1" or die $@;
2432
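When the same variable lists are applied many times, one option is to let C<eval> build a subroutine once and then call it repeatedly. This is a sketch only: the name C<$translate> is hypothetical, and the lists are assumed to come from a trusted source, since C<eval> will run whatever they contain.

```perl
use strict;
use warnings;

# Assumed to be trusted values: eval compiles whatever these contain.
my ($oldlist, $newlist) = ('a-c', 'A-C');

# Build the operator once; the resulting sub can be called repeatedly.
my $translate = eval "sub { \$_[0] =~ tr/$oldlist/$newlist/r }"
    or die $@;

print $translate->("cabbage"), "\n";    # CABBAgE
```
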
7e3b091d 2433=item <<EOF
d74e8afc 2434X<here-doc> X<heredoc> X<here-document> X<<< << >>>
7e3b091d
DA
2435
2436A line-oriented form of quoting is based on the shell "here-document"
2437syntax. Following a C<< << >> you specify a string to terminate
2438the quoted material, and all lines following the current line down to
89d205f2
YO
2439the terminating string are the value of the item.
2440
2441The terminating string may be either an identifier (a word), or some
2442quoted text. An unquoted identifier works like double quotes.
2443There may not be a space between the C<< << >> and the identifier,
2444unless the identifier is explicitly quoted. (If you put a space it
2445will be treated as a null identifier, which is valid, and matches the
2446first empty line.) The terminating string must appear by itself
2447(unquoted and with no surrounding whitespace) on the terminating line.
2448
If the terminating string is quoted, the type of quotes used determines
the treatment of the text.
2451
2452=over 4
2453
2454=item Double Quotes
2455
2456Double quotes indicate that the text will be interpolated using exactly
2457the same rules as normal double quoted strings.
7e3b091d
DA
2458
2459 print <<EOF;
2460 The price is $Price.
2461 EOF
2462
2463 print << "EOF"; # same as above
2464 The price is $Price.
2465 EOF
2466
89d205f2
YO
2467
2468=item Single Quotes
2469
2470Single quotes indicate the text is to be treated literally with no
46f8a5ea 2471interpolation of its content. This is similar to single quoted
89d205f2
YO
2472strings except that backslashes have no special meaning, with C<\\>
2473being treated as two backslashes and not one as they would in every
2474other quoting construct.
2475
c543c01b
TC
2476Just as in the shell, a backslashed bareword following the C<<< << >>>
2477means the same thing as a single-quoted string does:
2478
2479 $cost = <<'VISTA'; # hasta la ...
2480 That'll be $10 please, ma'am.
2481 VISTA
2482
2483 $cost = <<\VISTA; # Same thing!
2484 That'll be $10 please, ma'am.
2485 VISTA
2486
89d205f2
YO
2487This is the only form of quoting in perl where there is no need
2488to worry about escaping content, something that code generators
2489can and do make good use of.
2490
2491=item Backticks
2492
2493The content of the here doc is treated just as it would be if the
46f8a5ea 2494string were embedded in backticks. Thus the content is interpolated
89d205f2
YO
2495as though it were double quoted and then executed via the shell, with
2496the results of the execution returned.
2497
2498 print << `EOC`; # execute command and get results
7e3b091d 2499 echo hi there
7e3b091d
DA
2500 EOC
2501
89d205f2
YO
2502=back
2503
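The double-quoted and single-quoted forms described above can be compared directly. In this sketch the variable C<$Price> is given an arbitrary value for illustration.

```perl
use strict;
use warnings;

my $Price = 42;

my $interpolated = <<"EOF";
The price is $Price.
EOF

my $literal = <<'EOF';
The price is $Price, and \\ stays as two backslashes.
EOF

print $interpolated, $literal;
```

The first string contains C<42>; the second contains the text C<$Price> and both backslashes exactly as written.
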
2504It is possible to stack multiple here-docs in a row:
2505
7e3b091d
DA
2506 print <<"foo", <<"bar"; # you can stack them
2507 I said foo.
2508 foo
2509 I said bar.
2510 bar
2511
2512 myfunc(<< "THIS", 23, <<'THAT');
2513 Here's a line
2514 or two.
2515 THIS
2516 and here's another.
2517 THAT
2518
2519Just don't forget that you have to put a semicolon on the end
2520to finish the statement, as Perl doesn't know you're not going to
2521try to do this:
2522
2523 print <<ABC
2524 179231
2525 ABC
2526 + 20;
2527
872d7e53
ST
2528If you want to remove the line terminator from your here-docs,
2529use C<chomp()>.
2530
2531 chomp($string = <<'END');
2532 This is a string.
2533 END
2534
2535If you want your here-docs to be indented with the rest of the code,
2536you'll need to remove leading whitespace from each line manually:
7e3b091d
DA
2537
2538 ($quote = <<'FINIS') =~ s/^\s+//gm;
89d205f2 2539 The Road goes ever on and on,
7e3b091d
DA
2540 down from the door where it began.
2541 FINIS
2542
2543If you use a here-doc within a delimited construct, such as in C<s///eg>,
1bf48760
FC
2544the quoted material must still come on the line following the
2545C<<< <<FOO >>> marker, which means it may be inside the delimited
2546construct:
7e3b091d
DA
2547
2548 s/this/<<E . 'that'
2549 the other
2550 E
2551 . 'more '/eg;
2552
1bf48760
FC
2553It works this way as of Perl 5.18. Historically, it was inconsistent, and
2554you would have to write
7e3b091d 2555
89d205f2
YO
2556 s/this/<<E . 'that'
2557 . 'more '/eg;
2558 the other
2559 E
7e3b091d 2560
1bf48760
FC
2561outside of string evals.
2562
c543c01b 2563Additionally, quoting rules for the end-of-string identifier are
46f8a5ea 2564unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
89d205f2
YO
2565supported in place of C<''> and C<"">, and the only interpolation is for
2566backslashing the quoting character:
7e3b091d
DA
2567
2568 print << "abc\"def";
2569 testing...
2570 abc"def
2571
2572Finally, quoted strings cannot span multiple lines. The general rule is
2573that the identifier must be a string literal. Stick with that, and you
2574should be safe.
2575
a0d0e21e
LW
2576=back
2577
75e14d17 2578=head2 Gory details of parsing quoted constructs
d74e8afc 2579X<quote, gory details>
75e14d17 2580
19799a22
GS
2581When presented with something that might have several different
2582interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
2583principle to pick the most probable interpretation. This strategy
2584is so successful that Perl programmers often do not suspect the
2585ambivalence of what they write. But from time to time, Perl's
2586notions differ substantially from what the author honestly meant.
2587
2588This section hopes to clarify how Perl handles quoted constructs.
2589Although the most common reason to learn this is to unravel labyrinthine
2590regular expressions, because the initial steps of parsing are the
2591same for all quoting operators, they are all discussed together.
2592
2593The most important Perl parsing rule is the first one discussed
2594below: when processing a quoted construct, Perl first finds the end
2595of that construct, then interprets its contents. If you understand
2596this rule, you may skip the rest of this section on the first
2597reading. The other rules are likely to contradict the user's
2598expectations much less frequently than this first one.
2599
2600Some passes discussed below are performed concurrently, but because
2601their results are the same, we consider them individually. For different
2602quoting constructs, Perl performs different numbers of passes, from
6deea57f 2603one to four, but these passes are always performed in the same order.
75e14d17 2604
13a2d996 2605=over 4
75e14d17
IZ
2606
2607=item Finding the end
2608
6deea57f
ST
The first pass is finding the end of the quoted construct, where
the information about the delimiters is used in parsing.
During this search, text between the starting and ending delimiters
is copied to a safe location. The copied text becomes delimiter-independent.
6deea57f
ST
2613
2614If the construct is a here-doc, the ending delimiter is a line
46f8a5ea 2615that has a terminating string as the content. Therefore C<<<EOF> is
6deea57f
ST
2616terminated by C<EOF> immediately followed by C<"\n"> and starting
2617from the first column of the terminating line.
2618When searching for the terminating line of a here-doc, nothing
46f8a5ea 2619is skipped. In other words, lines after the here-doc syntax
6deea57f
ST
2620are compared with the terminating string line by line.
2621
For constructs other than here-docs, single characters are used as starting
and ending delimiters. If the starting delimiter is an opening punctuation
(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>).
If the starting delimiter is an unpaired character like C</> or a closing
punctuation, the ending delimiter is the same as the starting delimiter.
Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
both C<qq[]> and C<qq]]> constructs.
6deea57f
ST
2630
2631When searching for single-character delimiters, escaped delimiters
1ca345ed 2632and C<\\> are skipped. For example, while searching for terminating C</>,
6deea57f
ST
2633combinations of C<\\> and C<\/> are skipped. If the delimiters are
2634bracketing, nested pairs are also skipped. For example, while searching
2635for closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
2636and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
2637However, when backslashes are used as the delimiters (like C<qq\\> and
2638C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters or
other backslashes are removed (strictly speaking, they are not copied to the
safe location).
75e14d17 2642
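The delimiter rules described above can be sketched with C<q>, where nothing but the delimiter handling affects the result. The strings are made up for illustration.

```perl
use strict;
use warnings;

my $nested  = q[a [nested] pair];   # balanced inner [ ] are skipped
my $closing = q]plain text];        # "]" opens, so the next "]" closes
my $escaped = q[odd \] bracket];    # \] is skipped and its backslash dropped
my $slashes = q/a\/b/;              # likewise for an unpaired delimiter

print "$nested|$closing|$escaped|$slashes\n";
# prints "a [nested] pair|plain text|odd ] bracket|a/b"
```
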
19799a22
GS
2643For constructs with three-part delimiters (C<s///>, C<y///>, and
2644C<tr///>), the search is repeated once more.
fc693347 2645If the first delimiter is not an opening punctuation, the three delimiters must
d74605e5
FC
2646be the same, such as C<s!!!> and C<tr)))>,
2647in which case the second delimiter
6deea57f 2648terminates the left part and starts the right part at once.
b6538e4f 2649If the left part is delimited by bracketing punctuation (that is C<()>,
6deea57f 2650C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
b6538e4f 2651delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
fc693347 2652and comments are allowed between the two parts, though the comment must follow
b6538e4f
TC
2653at least one whitespace character; otherwise a character expected as the
2654start of the comment may be regarded as the starting delimiter of the right part.
75e14d17 2655
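The delimiter combinations for three-part constructs can be tried side by side. This sketch uses C</r> so that one input string serves every case; the names are illustrative.

```perl
use strict;
use warnings;

my $text = "foo bar";

my $same  = $text =~ s!foo!baz!r;        # one delimiter used three times
my $pairs = $text =~ s{foo} {baz}r;      # two bracketed parts, space between
my $mixed = $text =~ tr[a-z]/A-Z/r;      # bracketed left, unpaired right
my $noted = $text =~ s{foo}   # a comment after whitespace is allowed here
                      {baz}r;

print "$same|$pairs|$mixed|$noted\n";
# prints "baz bar|baz bar|FOO BAR|baz bar"
```
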
19799a22
GS
2656During this search no attention is paid to the semantics of the construct.
2657Thus:
75e14d17
IZ
2658
2659 "$hash{"$foo/$bar"}"
2660
2a94b7ce 2661or:
75e14d17 2662
89d205f2 2663 m/
2a94b7ce 2664 bar # NOT a comment, this slash / terminated m//!
75e14d17
IZ
2665 /x
2666
19799a22
GS
2667do not form legal quoted expressions. The quoted part ends on the
2668first C<"> and C</>, and the rest happens to be a syntax error.
2669Because the slash that terminated C<m//> was followed by a C<SPACE>,
2670the example above is not C<m//x>, but rather C<m//> with no C</x>
2671modifier. So the embedded C<#> is interpreted as a literal C<#>.
75e14d17 2672
89d205f2 2673Also no attention is paid to C<\c\> (multichar control char syntax) during
46f8a5ea 2674this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
89d205f2 2675of C<\/>, and the following C</> is not recognized as a delimiter.
0d594e51
ST
2676Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
2677
75e14d17 2678=item Interpolation
d74e8afc 2679X<interpolation>
75e14d17 2680
19799a22 2681The next step is interpolation in the text obtained, which is now
89d205f2 2682delimiter-independent. There are multiple cases.
75e14d17 2683
13a2d996 2684=over 4
75e14d17 2685
89d205f2 2686=item C<<<'EOF'>
75e14d17
IZ
2687
2688No interpolation is performed.
6deea57f
ST
2689Note that the combination C<\\> is left intact, since escaped delimiters
2690are not available for here-docs.
75e14d17 2691
6deea57f 2692=item C<m''>, the pattern of C<s'''>
89d205f2 2693
6deea57f
ST
2694No interpolation is performed at this stage.
Any backslashed sequences including C<\\> are handled at the
L</"parsing regular expressions"> stage.
89d205f2 2697
6deea57f 2698=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>
75e14d17 2699
89d205f2 2700The only interpolation is removal of C<\> from pairs of C<\\>.
6deea57f
ST
2701Therefore C<-> in C<tr'''> and C<y'''> is treated literally
2702as a hyphen and no character range is available.
2703C<\1> in the replacement of C<s'''> does not work as C<$1>.
89d205f2
YO
2704
2705=item C<tr///>, C<y///>
2706
6deea57f
ST
2707No variable interpolation occurs. String modifying combinations for
2708case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
2709The other escape sequences such as C<\200> and C<\t> and backslashed
2710characters such as C<\\> and C<\-> are converted to appropriate literals.
89d205f2
YO
2711The character C<-> is treated specially and therefore C<\-> is treated
2712as a literal C<->.
75e14d17 2713
89d205f2 2714=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
75e14d17 2715
628253b8 2716C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
19799a22
GS
2717converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
2718is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
6deea57f
ST
2719The other escape sequences such as C<\200> and C<\t> and backslashed
2720characters such as C<\\> and C<\-> are replaced with appropriate
2721expansions.
2a94b7ce 2722
19799a22
GS
2723Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
2724is interpolated in the usual way. Something like C<"\Q\\E"> has
48cbae4f 2725no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
19799a22
GS
2726result is the same as for C<"\\\\E">. As a general rule, backslashes
2727between C<\Q> and C<\E> may lead to counterintuitive results. So,
2728C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
2729as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2a94b7ce
IZ
2730
2731 $str = '\t';
2732 return "\Q$str";
2733
2734may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
2735
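The three C<\Q> cases discussed above can be checked by looking at the lengths of the resulting strings. This is a minimal sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $str = '\t';                  # two characters: backslash, t

my $no_end   = "\Q\\E";          # \Q, then \\, then a plain E
my $tab      = "\Q\t\E";         # quotemeta of a real TAB
my $from_var = "\Q$str";         # quotemeta of backslash followed by t

print length($no_end), " ", length($tab), " ", length($from_var), "\n";
# prints "3 2 3"
```
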
19799a22 2736Interpolated scalars and arrays are converted internally to the C<join> and
92d29cee 2737C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:
75e14d17 2738
19799a22 2739 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 2740
19799a22 2741All operations above are performed simultaneously, left to right.
75e14d17 2742
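The equivalence with C<join> is easiest to see when C<$"> is set to something other than its default space. The values below are arbitrary and serve only as a sketch.

```perl
use strict;
use warnings;

my $foo = "start";
my @arr = (1, 2, 3);

my ($interpolated, $manual);
{
    local $" = "-";     # list separator used when arrays are interpolated
    $interpolated = "$foo XXX '@arr'";
    $manual       = $foo . " XXX '" . (join $", @arr) . "'";
}

print "$interpolated\n";    # start XXX '1-2-3'
```
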
19799a22
GS
Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.
75e14d17 2748
19799a22 2749Note also that the interpolation code needs to make a decision on
89d205f2 2750where the interpolated scalar ends. For instance, whether
db691027 2751C<< "a $x -> {c}" >> really means:
75e14d17 2752
db691027 2753 "a " . $x . " -> {c}";
75e14d17 2754
2a94b7ce 2755or:
75e14d17 2756
db691027 2757 "a " . $x -> {c};
75e14d17 2758
19799a22
GS
Most of the time, the choice is the longest possible text that does not
include spaces between components and that contains matching braces or
brackets. Because the outcome may be determined by voting based
on heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
75e14d17 2764
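The effect of spaces between components can be seen with a hash reference. In this sketch C<$x> is given an arbitrary value; the address printed for the reference will vary from run to run.

```perl
use strict;
use warnings;

my $x = { c => "value" };

my $tight  = "a $x->{c}";      # the whole expression is interpolated
my $spaced = "a $x -> {c}";    # only $x is; the rest is literal text

print "$tight\n";     # a value
print "$spaced\n";    # a HASH(0x...) -> {c}
```
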
6deea57f 2765=item the replacement of C<s///>
75e14d17 2766
628253b8 2767Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
6deea57f
ST
2768happens as with C<qq//> constructs.
2769
2770It is at this step that C<\1> is begrudgingly converted to C<$1> in
2771the replacement text of C<s///>, in order to correct the incorrigible
2772I<sed> hackers who haven't picked up the saner idiom yet. A warning
2773is emitted if the C<use warnings> pragma or the B<-w> command-line flag
2774(that is, the C<$^W> variable) was set.
2775
=item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
2777
628253b8 2778Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
cc74c5bd
ST
2779and interpolation happens (almost) as with C<qq//> constructs.
2780
5d03b57c
KW
2781Processing of C<\N{...}> is also done here, and compiled into an intermediate
2782form for the regex compiler. (This is because, as mentioned below, the regex
2783compilation may be done at execution time, and C<\N{...}> is a compile-time
2784construct.)
2785
cc74c5bd
ST
2786However any other combinations of C<\> followed by a character
2787are not substituted but only skipped, in order to parse them
2788as regular expressions at the following step.
6deea57f 2789As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
1749ea0d 2790treated as an array symbol (for example C<@foo>),
6deea57f 2791even though the same text in C<qq//> gives interpolation of C<\c@>.
6deea57f 2792
e128ab2c
DM
Code blocks such as C<(?{BLOCK})> are handled by temporarily passing control
back to the Perl parser, in much the same way as an interpolated array
subscript expression such as C<"foo$array[1+f("[xyz")]bar"> would be.
2796
6deea57f 2797Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
19799a22
GS
2798a C<#>-comment in a C<//x>-regular expression, no processing is
2799performed whatsoever. This is the first step at which the presence
2800of the C<//x> modifier is relevant.
2801
1749ea0d
ST
Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are
voted (by several different estimators) to be either an array element
or C<$var> followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.
2811
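Braces can be used to make either reading explicit, so that nothing is left to the estimators. The sketch below uses a hypothetical array C<@arr> and scalar C<$arr> chosen so that both readings are testable.

```perl
use strict;
use warnings;

my @arr = ('a' .. 'j');    # ten elements, so $arr[-9] is 'b'
my $arr = 'x';

# Braces around the whole subscript force the array-element reading.
my $element = ("b" =~ /^${arr[0-9]}$/) ? 1 : 0;

# Braces around the name alone force "scalar followed by character class".
my $class = ("x7" =~ /^${arr}[0-9]$/) ? 1 : 0;

print "$element $class\n";    # 1 1
```
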
19799a22
GS
2812The lack of processing of C<\\> creates specific restrictions on
2813the post-processed text. If the delimiter is C</>, one cannot get
2814the combination C<\/> into the result of this step. C</> will
2815finish the regular expression, C<\/> will be stripped to C</> on
2816the previous step, and C<\\/> will be left as is. Because C</> is
2817equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric character, as in:
2a94b7ce
IZ
2821
2822 m m ^ a \s* b mmx;
2823
19799a22 2824In the RE above, which is intentionally obfuscated for illustration, the
6deea57f 2825delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the
89d205f2 2826RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
19799a22
GS
2827reason you're encouraged to restrict your delimiters to non-alphanumeric,
2828non-whitespace choices.
75e14d17
IZ
2829
2830=back
2831
19799a22 2832This step is the last one for all constructs except regular expressions,
75e14d17
IZ
2833which are processed further.
2834
6deea57f
ST
2835=item parsing regular expressions
2836X<regexp, parse>
75e14d17 2837
The previous steps were performed during the compilation of Perl code,
but this one happens at run time, although it may be optimized to
be calculated at compile time if appropriate. After the preprocessing
described above, and possibly after evaluation if concatenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.
2844
2845Whatever happens in the RE engine might be better discussed in L<perlre>,
2846but for the sake of continuity, we shall do so here.
2847
2848This is another step where the presence of the C<//x> modifier is
2849relevant. The RE engine scans the string from left to right and
2850converts it to a finite automaton.
2851
2852Backslashed characters are either replaced with corresponding
2853literal strings (as with C<\{>), or else they generate special nodes
2854in the finite automaton (as with C<\b>). Characters special to the
2855RE engine (such as C<|>) generate corresponding nodes or groups of
2856nodes. C<(?#...)> comments are ignored. All the rest is either
2857converted to literal strings to match, or else is ignored (as is
2858whitespace and C<#>-style comments if C<//x> is present).
2859
Parsing of the bracketed character class construct, C<[...]>, follows
rules rather different from those used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.
2866
2867The terminator of runtime C<(?{...})> is found by temporarily switching
2868control to the perl parser, which should stop at the point where the
2869logically balancing terminating C<}> is found.
2870
It is possible to inspect both the string given to the RE engine and the
2872resulting finite automaton. See the arguments C<debug>/C<debugcolor>
2873in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
4a4eefd0 2874switch documented in L<perlrun/"Command Switches">.
2875
2876=item Optimization of regular expressions
d74e8afc 2877X<regexp, optimization>
75e14d17 2878
7522fed5 2879This step is listed for completeness only. Since it does not change
75e14d17 2880semantics, details of this step are not documented and are subject
2881to change without notice. This step is performed over the finite
2882automaton that was generated during the previous pass.
2a94b7ce 2883
2884It is at this stage that C<split()> silently optimizes C</^/> to
2885mean C</^/m>.
2886
2887=back
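
The C<split()> special case from the optimization step is easy to observe.
The following is a minimal sketch (the variable names are illustrative only)
showing that a bare C</^/> given to C<split> breaks a string at every line
start, as if C</^/m> had been written:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# split treats /^/ as /^/m, so the string is broken before each line.
my @implicit = split /^/,  $text;
my @explicit = split /^/m, $text;

print scalar(@implicit), "\n";    # prints 3
print $implicit[1];               # prints "two\n"
print "same\n" if "@implicit" eq "@explicit";
```

Outside of C<split>, C</^/> without C</m> still matches only at the start
of the string.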
2888
a0d0e21e 2889=head2 I/O Operators
d74e8afc 2890X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
80a96bfc 2891X<< <> >> X<< <<>> >> X<@ARGV>
a0d0e21e 2892
54310121 2893There are several I/O operators you should know about.
fbad3eb5 2894
7b8d334a 2895A string enclosed by backticks (grave accents) first undergoes
2896double-quote interpolation. It is then interpreted as an external
2897command, and the output of that command is the value of the
2898backtick string, like in a shell. In scalar context, a single string
2899consisting of all output is returned. In list context, a list of
2900values is returned, one per line of output. (You can set C<$/> to use
2901a different line terminator.) The command is executed each time the
2902pseudo-literal is evaluated. The status value of the command is
2903returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
2904Unlike in B<csh>, no translation is done on the return data--newlines
2905remain newlines. Unlike in any of the shells, single quotes do not
2906hide variable names in the command from interpretation. To pass a
2907literal dollar-sign through to the shell you need to hide it with a
2908backslash. The generalized form of backticks is C<qx//>. (Because
2909backticks always undergo shell expansion as well, see L<perlsec> for
2910security concerns.)
d74e8afc 2911X<qx> X<`> X<``> X<backtick> X<glob>
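
As a rough illustration of the scalar and list contexts and of C<$?>, the
sketch below runs the current perl binary (C<$^X>) as a stand-in for an
arbitrary external command. It assumes that the path in C<$^X> contains no
whitespace and that the shell accepts double-quoted arguments; the one-liners
themselves are made up for the example.

```perl
use strict;
use warnings;

# Scalar context: all output as a single string.
my $all = qx{$^X -le "print for 1..3"};

# List context: one element per line of output.
my @lines = qx{$^X -le "print for 1..3"};
print "got ", scalar(@lines), " lines\n";    # prints "got 3 lines"

# The exit status of the command is in the high byte of $?.
my $ignored = qx{$^X -e "exit 3"};
my $exit    = $? >> 8;
print "exit status: $exit\n";                # prints "exit status: 3"
```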
2912
2913In scalar context, evaluating a filehandle in angle brackets yields
2914the next line from that file (the newline, if any, included), or
2915C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
2916(sometimes known as file-slurp mode) and the file is empty, it
2917returns C<''> the first time, followed by C<undef> subsequently.
2918
2919Ordinarily you must assign the returned value to a variable, but
2920there is one situation where an automatic assignment happens. If
2921and only if the input symbol is the only thing inside the conditional
2922of a C<while> statement (even if disguised as a C<for(;;)> loop),
2923the value is automatically assigned to the global variable $_,
2924destroying whatever was there previously. (This may seem like an
2925odd thing to you, but you'll use the construct in almost every Perl
17b829fa 2926script you write.) The $_ variable is not implicitly localized.
2927You'll have to put a C<local $_;> before the loop if you want that
2928to happen.
2929
2930The following lines are equivalent:
a0d0e21e 2931
    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;
2939
2940This also behaves similarly, but assigns to a lexical variable
2941instead of to C<$_>:
7b8d334a 2942
89d205f2 2943 while (my $line = <STDIN>) { print $line }
7b8d334a 2944
2945In these loop constructs, the assigned value (whether assignment
2946is automatic or explicit) is then tested to see whether it is
2947defined. The defined test avoids problems where the line has a string
2948value that would be treated as false by Perl; for example a "" or
2949a "0" with no trailing newline. If you really mean for such values
2950to terminate the loop, they should be tested for explicitly:
2951
2952 while (($_ = <STDIN>) ne '0') { ... }
2953 while (<STDIN>) { last unless $_; ... }
2954
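The effect of the implicit C<defined> test can be seen with a final line
of C<"0"> that has no trailing newline. This sketch uses an in-memory
filehandle in place of a real file; the data and variable names are
invented for the example.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a file whose last line
# is "0" with no trailing newline.
my $data = "first\n0";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # tested as defined(my $line = <$fh>)
    chomp $line;
    push @seen, $line;
}
close $fh;

print join(",", @seen), "\n";  # prints "first,0"
```

Without the implicit test, the loop would stop before processing the
false-valued last line.
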
1ca345ed 2955In other boolean contexts, C<< <FILEHANDLE> >> without an
5ef4d93e 2956explicit C<defined> test or comparison elicits a warning if the
9f1b1f2d 2957C<use warnings> pragma or the B<-w>
19799a22 2958command-line switch (the C<$^W> variable) is in effect.
7b8d334a 2959
5f05dabc 2960The filehandles STDIN, STDOUT, and STDERR are predefined. (The
2961filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
2962in packages, where they would be interpreted as local identifiers
2963rather than global.) Additional filehandles may be created with
2964the open() function, amongst others. See L<perlopentut> and
2965L<perlfunc/open> for details on this.
X<stdin> X<stdout> X<stderr>
a0d0e21e 2967
35f2feb0 2968If a <FILEHANDLE> is used in a context that is looking for
2969a list, a list comprising all input lines is returned, one line per
2970list element. It's easy to grow to a rather large data space this
2971way, so use with care.
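
A small sketch of the two ways to read everything at once, again using an
in-memory filehandle as an assumed stand-in for a real file: list context
yields one element per line, while scalar context with C<$/> undefined
yields the whole content as one string.

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";

# List context: every remaining line becomes one list element.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;
print scalar(@lines), "\n";       # prints 3

# Scalar context with $/ undefined: the whole file as one string.
open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;
print length($whole), "\n";       # prints 6
```

Either form holds the entire input in memory, so the caution above applies
to both.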
a0d0e21e 2972
35f2feb0 2973<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
19799a22 2974See L<perlfunc/readline>.
fbad3eb5 2975
35f2feb0 2976The null filehandle <> is special: it can be used to emulate the
2977behavior of B<sed> and B<awk>, and any other Unix filter program
2978that takes a list of filenames, doing the same to each line
2979of input from all of them. Input from <> comes either from
a0d0e21e 2980standard input, or from each file listed on the command line. Here's
35f2feb0 2981how it works: the first time <> is evaluated, the @ARGV array is
5a964f20 2982checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
2983gives you standard input. The @ARGV array is then processed as a list
2984of filenames. The loop
2985
2986 while (<>) {
2987 ... # code for each line
2988 }
2989
2990is equivalent to the following Perl-like pseudo code:
2991
    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }
2999
3000except that it isn't so cumbersome to say, and will actually work.
3001It really does shift the @ARGV array and put the current filename
3002into the $ARGV variable. It also uses filehandle I<ARGV>
46f8a5ea 3003internally. <> is just a synonym for <ARGV>, which
19799a22 3004is magical. (The pseudo code above doesn't work because it treats
35f2feb0 3005<ARGV> as non-magical.)
a0d0e21e 3006
3007Since the null filehandle uses the two argument form of L<perlfunc/open>
3008it interprets special characters, so if you have a script like this:
3009
3010 while (<>) {
3011 print;
3012 }
3013
3014and call it with C<perl dangerous.pl 'rm -rfv *|'>, it actually opens a
3015pipe, executes the C<rm> command and reads C<rm>'s output from that pipe.
3016If you want all items in C<@ARGV> to be interpreted as file names, you
3017can use the module C<ARGV::readonly> from CPAN, or use the double bracket:
3018
3019 while (<<>>) {
3020 print;
3021 }
3022
3023Using double angle brackets inside of a while causes the open to use the
3024three argument form (with the second argument being C<< < >>), so all
3025arguments in ARGV are treated as literal filenames (including "-").
3026(Note that for convenience, if you use C<< <<>> >> and if @ARGV is
3027empty, it will still read from the standard input.)
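
The following self-contained sketch exercises C<< <<>> >> without relying
on real command-line arguments. It creates a temporary file with
C<File::Temp> and assigns its name to C<@ARGV> by hand; both of those steps
are scaffolding for the example, not part of the operator's behavior.

```perl
use strict;
use warnings;
use v5.22;                         # <<>> was added in Perl 5.22
use File::Temp qw(tempfile);

# Create a throwaway input file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close($tmp_fh) or die "Cannot close $tmp_name: $!";

@ARGV = ($tmp_name);               # pretend this came from the command line

my $count = 0;
while (<<>>) {                     # each name is opened with 3-argument open
    $count++;
}
print "$count\n";                  # prints 2
```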
48ab5743 3028
35f2feb0 3029You can modify @ARGV before the first <> as long as the array ends up
a0d0e21e 3030containing the list of filenames you really want. Line numbers (C<$.>)
3031continue as though the input were one big happy file. See the example
3032in L<perlfunc/eof> for how to reset line numbers on each file.
5a964f20 3033
89d205f2 3034If you want to set @ARGV to your own list of files, go right ahead.
3035This sets @ARGV to all plain text files if no @ARGV was given:
3036
3037 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
a0d0e21e 3038
3039You can even set them to pipe commands. For example, this automatically
3040filters compressed arguments through B<gzip>:
3041
3042 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
3043
3044If you want to pass switches into your script, you can use one of the
3045Getopts modules or put a loop on the front like this:
3046
    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }
3058
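For comparison, here is a sketch of the module-based alternative using
C<Getopt::Long>. The option names mirror the hand-written loop, and the
argument list is a made-up stand-in for C<@ARGV> so that the example does
not depend on how the script is invoked. Unlike the loop above, this
configuration expects the value of C<-D> as a separate argument.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# A stand-in for @ARGV so the sketch does not depend on real arguments.
my @args = ('-v', '-D', 'trace', 'input.txt');

my $debug   = '';
my $verbose = 0;
GetOptionsFromArray(
    \@args,
    'D=s' => \$debug,      # -D takes a string value
    'v+'  => \$verbose,    # -v may be repeated to raise verbosity
) or die "Bad command-line options\n";

print "debug=$debug verbose=$verbose rest=@args\n";
# prints "debug=trace verbose=1 rest=input.txt"
```

The remaining elements of the array are the filenames that C<< <> >> would
then process.
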
3059The <> symbol will return C<undef> for end-of-file only once.
3060If you call it again after this, it will assume you are processing another
19799a22 3061@ARGV list, and if you haven't set @ARGV, will read input from STDIN.
a0d0e21e 3062
1ca345ed 3063If what the angle brackets contain is a simple scalar variable (for example,
35f2feb0 3064<$foo>), then that variable contains the name of the
3065filehandle to input from, or its typeglob, or a reference to the
3066same. For example:
3067
3068 $fh = \*STDIN;
3069 $line = <$fh>;
a0d0e21e 3070
3071If what's within the angle brackets is neither a filehandle nor a simple
3072scalar variable containing a filehandle name, typeglob, or typeglob
3073reference, it is interpreted as a filename pattern to be globbed, and
3074either a list of filenames or the next filename in the list is returned,
19799a22 3075depending on context. This distinction is determined on syntactic
3076grounds alone. That means C<< <$x> >> is always a readline() from
3077an indirect handle, but C<< <$hash{key}> >> is always a glob().
5a964f20 3078That's because $x is a simple scalar variable, but C<$hash{key}> is
3079not--it's a hash element. Even C<< <$x > >> (note the extra space)
3080is treated as C<glob("$x ")>, not C<readline($x)>.
3081
3082One level of double-quote interpretation is done first, but you can't
35f2feb0 3083say C<< <$foo> >> because that's an indirect filehandle as explained
3084in the previous paragraph. (In older versions of Perl, programmers
3085would insert curly brackets to force interpretation as a filename glob:
35f2feb0 3086C<< <${foo}> >>. These days, it's considered cleaner to call the
5a964f20 3087internal function directly as C<glob($foo)>, which is probably the right
19799a22 3088way to have done it in the first place.) For example:
3089
3090 while (<*.c>) {
3091 chmod 0644, $_;
3092 }
3093
3a4b19e4 3094is roughly equivalent to:
3095
    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }
3101
3102except that the globbing is actually done internally using the standard
3103C<File::Glob> extension. Of course, the shortest way to do the above is:
3104
3105 chmod 0644, <*.c>;
3106
3107A (file)glob evaluates its (embedded) argument only when it is
3108starting a new list. All values must be read before it will start
3109over. In list context, this isn't important because you automatically
3110get them all anyway. However, in scalar context the operator returns
069e01df 3111the next value each time it's called, or C<undef> when the list has
3112run out. As with filehandle reads, an automatic C<defined> is
3113generated when the glob occurs in the test part of a C<while>,
3114because legal glob returns (for example,
3115a file called F<0>) would otherwise
3116terminate the loop. Again, C<undef> is returned only once. So if
3117you're expecting a single value from a glob, it is much better to
3118say
3119
3120 ($file) = <blurch*>;
3121
3122than
3123
3124 $file = <blurch*>;
3125
3126because the latter will alternate between returning a filename and
19799a22 3127returning false.
4633a7c4 3128
b159ebd3 3129If you're trying to do variable interpolation, it's definitely better
4633a7c4 3130to use the glob() function, because the older notation can cause people
e37d713d 3131to become confused with the indirect filehandle notation.
3132
3133 @files = glob("$dir/*.[ch]");
3134 @files = glob($files[$i]);
3135
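A self-contained sketch of C<glob()> with an interpolated directory follows.
The temporary directory and file names are invented for the example, and the
code assumes the temporary path contains no whitespace (the default
C<glob> splits its argument on whitespace).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory holding two .c files and one other file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

my @c_files = glob("$dir/*.c");    # list context: all matches at once
my ($first) = glob("$dir/*.c");    # list assignment: keeps only the first match

print scalar(@c_files), "\n";      # prints 2
print "first match ends in a.c\n" if $first =~ /a\.c\z/;
```

Both calls run in list context, so neither leaves a half-consumed iterator
behind.
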
a0d0e21e 3136=head2 Constant Folding
d74e8afc 3137X<constant folding> X<folding>
3138
3139Like C, Perl does a certain amount of expression evaluation at
19799a22 3140compile time whenever it determines that all arguments to an
3141operator are static and have no side effects. In particular, string
3142concatenation happens at compile time between literals that don't do
19799a22 3143variable substitution. Backslash interpolation also happens at
3144compile time. You can say
3145
3146 'Now is the time for all'
3147 . "\n"
3148 . 'good men to come to.'
a0d0e21e 3149
54310121 3150and this all reduces to one string internally. Likewise, if
3151you say
3152
3153 foreach $file (@filenames) {
5a964f20 3154 if (-s $file > 5 + 100 * 2**16) { }
54310121 3155 }
a0d0e21e 3156
1ca345ed 3157the compiler precomputes the number which that expression
19799a22 3158represents so that the interpreter won't have to.
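
One way to look at the result of folding is to deparse a compiled
subroutine with the core C<B::Deparse> module. This is only a sketch: the
anonymous sub is made up for the example, and the exact text that Deparse
prints varies between Perl versions, but the folded constant should appear
as a single literal.

```perl
use strict;
use warnings;
use B::Deparse;

my $check = sub { return $_[0] > 5 + 100 * 2**16 };

# Deparse reconstructs source from the compiled op tree, so any folded
# constant shows up as one literal rather than as an expression.
my $text = B::Deparse->new->coderef2text($check);
print $text, "\n";

print "folded\n" if $text =~ /6553605/;   # 5 + 100 * 65536
```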
a0d0e21e 3159
fd1abbef 3160=head2 No-ops
d74e8afc 3161X<no-op> X<nop>
3162
3163Perl doesn't officially have a no-op operator, but the bare constants
1ca345ed 3164C<0> and C<1> are special-cased not to produce a warning in void
3165context, so you can for example safely do
3166
3167 1 while foo();
3168
2c268ad5 3169=head2 Bitwise String Operators
fb7054ba 3170X<operator, bitwise, string> X<&.> X<|.> X<^.> X<~.>
3171
3172Bitstrings of any size may be manipulated by the bitwise operators
3173(C<~ | & ^>).
3174
3175If the operands to a binary bitwise op are strings of different
3176sizes, B<|> and B<^> ops act as though the shorter operand had
3177additional zero bits on the right, while the B<&> op acts as though
3178the longer operand were truncated to the length of the shorter.
3179The granularity for such extension or truncation is one or more
3180bytes.
2c268ad5 3181
    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n";
    print 'p N$' ^ " E<H\n";            # prints "Perl\n";
3187
19799a22 3188If you are intending to manipulate bitstrings, be certain that
2c268ad5 3189you're supplying bitstrings: If an operand is a number, that will imply
19799a22 3190a B<numeric> bitwise operation. You may explicitly show which type of
3191operation you intend by using C<""> or C<0+>, as in the examples below.
3192
    $foo =  150  |  105;        # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105;        # yields 255
    $foo =  150  | '105';       # yields 255
    $foo = '150' | '105';       # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy
a0d0e21e 3200
3201This somewhat unpredictable behavior can be avoided with the experimental
3202"bitwise" feature, new in Perl 5.22. You can enable it via C<use feature
3203'bitwise'>. By default, it will warn unless the "experimental::bitwise"
3204warnings category has been disabled. (C<use experimental 'bitwise'> will
3205enable the feature and disable the warning.) Under this feature, the four
3206standard bitwise operators (C<~ | & ^>) are always numeric. Adding a dot
3207after each operator (C<~. |. &. ^.>) forces it to treat its operands as
3208strings:
3209
3210 use experimental "bitwise";
3211 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3212 $foo = '150' | 105; # yields 255
3213 $foo = 150 | '105'; # yields 255
3214 $foo = '150' | '105'; # yields 255
3215 $foo = 150 |. 105; # yields string '155' (under ASCII)
3216 $foo = '150' |. 105; # yields string '155'
3217 $foo = 150 |.'105'; # yields string '155'
3218 $foo = '150' |.'105'; # yields string '155'
3219
3220 $baz = $foo & $bar; # both operands numeric
3221 $biz = $foo ^. $bar; # both operands stringy
3222
3223The assignment variants of these operators (C<&= |= ^= &.= |.= ^.=>)
3224behave likewise under the feature.
3225
3226See L<perlfunc/vec> for information on how to manipulate individual bits
3227in a bit vector.
3228
55497cff 3229=head2 Integer Arithmetic
d74e8afc 3230X<integer>
a0d0e21e 3231
19799a22 3232By default, Perl assumes that it must do most of its arithmetic in
3233floating point. But by saying
3234
3235 use integer;
3236
3237you may tell the compiler to use integer operations
3238(see L<integer> for a detailed explanation) from here to the end of
3239the enclosing BLOCK. An inner BLOCK may countermand this by saying
3240
3241 no integer;
3242
19799a22 3243which lasts until the end of that BLOCK. Note that this doesn't
3244mean everything is an integer, merely that Perl will use integer
3245operations for arithmetic, comparison, and bitwise operators. For
3246example, even under C<use integer>, if you take the C<sqrt(2)>, you'll
3247still get C<1.4142135623731> or so.
3248
3249Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
89d205f2 3250and ">>") always produce integral results. (But see also
13a2d996 3251L<Bitwise String Operators>.) However, C<use integer> still has meaning for
3252them. By default, their results are interpreted as unsigned integers, but
3253if C<use integer> is in effect, their results are interpreted
3254as signed integers. For example, C<~0> usually evaluates to a large
0be96356 3255integral value. However, C<use integer; ~0> is C<-1> on two's-complement
19799a22 3256machines.
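
The lexical scope of the pragma makes these differences easy to compare
side by side. In this sketch each C<do> block confines C<use integer> to a
single expression; the variable names are illustrative, and a
two's-complement machine is assumed.

```perl
use strict;
use warnings;

my $unsigned = ~0;                           # large positive value
my $signed   = do { use integer; ~0 };       # -1 on two's-complement machines
my $quotient = do { use integer; 7 / 2 };    # integer division: 3
my $root     = do { use integer; sqrt(2) };  # still floating point

print "$signed $quotient\n";                 # prints "-1 3"
printf "%.4f\n", $root;                      # prints "1.4142"
```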
3257
3258=head2 Floating-point Arithmetic
06ce2fa3 3259
d74e8afc 3260X<floating-point> X<floating point> X<float> X<real>
3261
3262While C<use integer> provides integer-only arithmetic, there is no
3263analogous mechanism to provide automatic rounding or truncation to a
3264certain number of decimal places. For rounding to a certain number
3265of digits, sprintf() or printf() is usually the easiest route.
3266See L<perlfaq4>.
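
A minimal sketch of rounding with C<sprintf()>; the values are arbitrary
examples:

```perl
use strict;
use warnings;

my $pi      = 3.14159265;
my $rounded = sprintf("%.2f", $pi);     # a string: "3.14"
my $third   = sprintf("%.3f", 1 / 3);   # "0.333"

print "$rounded $third\n";              # prints "3.14 0.333"

# The result is a string; adding 0 turns it back into a number.
my $number = $rounded + 0;
```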
68dc0745 3267
3268Floating-point numbers are only approximations to what a mathematician
3269would call real numbers. There are infinitely more reals than floats,
3270so some corners must be cut. For example:
3271
3272 printf "%.20g\n", 123456789123456789;
3273 # produces 123456789123456784
3274
Testing for exact floating-point equality or inequality is not a
good idea.  Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits (which is what the precision of the C<%g> format
specifies).  See Knuth, volume II, for a more robust treatment of
this topic.
3280
3281 sub fp_equal {
3282 my ($X, $Y, $POINTS) = @_;
3283 my ($tX, $tY);
3284 $tX = sprintf("%.${POINTS}g", $X);
3285 $tY = sprintf("%.${POINTS}g", $Y);
3286 return $tX eq $tY;
3287 }
3288
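A short usage sketch for the routine above. The function is repeated in
condensed form so that the block runs on its own; the sample values are
arbitrary. On builds that use IEEE 754 doubles the exact comparison
typically reports a difference, which is why no output is claimed for it.

```perl
use strict;
use warnings;

# Same comparison as fp_equal() above, repeated so the sketch runs alone.
sub fp_equal {
    my ($x, $y, $points) = @_;
    return sprintf("%.${points}g", $x) eq sprintf("%.${points}g", $y);
}

my $sum = 0.1 + 0.2;
print "exact:   ", ($sum == 0.3             ? "equal" : "different"), "\n";
print "rounded: ", (fp_equal($sum, 0.3, 10) ? "equal" : "different"), "\n";
```
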
68dc0745 3289The POSIX module (part of the standard perl distribution) implements
3290ceil(), floor(), and other mathematical and trigonometric functions.
3291The Math::Complex module (part of the standard perl distribution)
3292defines mathematical functions that work on both the reals and the
imaginary numbers.  Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.
3295
3296Rounding in financial applications can have serious implications, and
3297the rounding method used should be specified precisely. In these
3298cases, it probably pays not to trust whichever system rounding is
3299being used by Perl, but to instead implement the rounding function you
3300need yourself.
3301
3302=head2 Bigger Numbers
d74e8afc 3303X<number, arbitrary precision>
5a964f20 3304
c543c01b 3305The standard C<Math::BigInt>, C<Math::BigRat>, and C<Math::BigFloat> modules,
fb1a95c6 3306along with the C<bignum>, C<bigint>, and C<bigrat> pragmas, provide
19799a22 3307variable-precision arithmetic and overloaded operators, although
46f8a5ea 3308they're currently pretty slow. At the cost of some space and
3309considerable speed, they avoid the normal pitfalls associated with
3310limited-precision representations.
5a964f20 3311
3312 use 5.010;
3313 use bigint; # easy interface to Math::BigInt
3314 $x = 123456789123456789;
3315 say $x * $x;
3316 +15241578780673678515622620750190521
3317
3318Or with rationals:
3319
3320 use 5.010;
3321 use bigrat;
3322 $x = 3/22;
3323 $y = 4/6;
3324 say "x/y is ", $x/$y;
3325 say "x*y is ", $x*$y;
3326 x/y is 9/44
3327 x*y is 1/11
3328
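The pragmas are a convenience layered over the modules. As a sketch of the
explicit form, the same big-integer square can be computed by constructing a
C<Math::BigInt> object directly, which keeps ordinary numbers elsewhere in
the file untouched. The variable names are illustrative.

```perl
use strict;
use warnings;
use Math::BigInt;

my $x      = Math::BigInt->new("123456789123456789");
my $square = $x * $x;            # overloaded *, returns a new Math::BigInt
print "$square\n";               # prints 15241578780673678515622620750190521

my $plain = 123456789123456789;     # an ordinary Perl number
printf "%.0f\n", $plain * $plain;   # limited precision: trailing digits are lost
```
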
Several modules let you calculate with unlimited precision (bound only by
memory and CPU time) or with fixed precision.  There are also some
non-standard modules that provide faster implementations via external
C libraries.

Here is a short but incomplete summary:
3335
  Math::String            treat string sequences like numbers
  Math::FixedPrecision    calculate with a fixed precision
  Math::Currency          for currency calculations
  Bit::Vector             manipulate bit vectors fast (uses C)
  Math::BigIntFast        Bit::Vector wrapper for big numbers
  Math::Pari              provides access to the Pari C library
  Math::Cephes            uses the external Cephes C library (no
                          big numbers)
  Math::Cephes::Fraction  fractions via the Cephes library
  Math::GMP               another one using an external C library
  Math::GMPz              an alternative interface to libgmp's big ints
  Math::GMPq              an interface to libgmp's fraction numbers
  Math::GMPf              an interface to libgmp's floating point numbers
3349
3350Choose wisely.
3351
3352=cut