=head1 NAME
X<operator>

perlop - Perl operators and precedence

=head1 DESCRIPTION

In Perl, the operator determines what operation is performed,
independent of the type of the operands. For example S<C<$x + $y>>
is always a numeric addition, and if C<$x> or C<$y> do not contain
numbers, an attempt is made to convert them to numbers first.

This is in contrast to many other dynamic languages, where the
operation is determined by the type of the first argument. It also
means that Perl has two versions of some operators, one for numeric
and one for string comparison. For example S<C<$x == $y>> compares
two numbers for equality, and S<C<$x eq $y>> compares two strings.

There are a few exceptions though: C<x> can be either string
repetition or list repetition, depending on the type of the left
operand, and C<&>, C<|>, C<^> and C<~> can be either string or numeric bit
operations.

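
As a minimal sketch of the point above, the following applies numeric and
string operators to the same operands. The variable names are illustrative
only.

```perl
use strict;
use warnings;

# Both operands are digit strings; the operator alone picks the operation.
my ($x, $y) = ("10", "20");

my $sum    = $x + $y;    # numeric addition
my $joined = $x . $y;    # string concatenation

# "1.0" and "1" are equal as numbers but differ as strings.
my $num_equal = ("1.0" == "1") ? "yes" : "no";
my $str_equal = ("1.0" eq "1") ? "yes" : "no";

print "sum=$sum joined=$joined\n";                    # sum=30 joined=1020
print "numeric: $num_equal, string: $str_equal\n";    # numeric: yes, string: no
```
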
=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in S<C<2 + 4 * 5>>, the multiplication has higher
precedence so S<C<4 * 5>> is evaluated first yielding S<C<2 + 20 ==
22>> and not S<C<6 * 5 == 30>>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first, or the right first. For example, in
S<C<8 - 4 - 2>>, subtraction is left associative so Perl evaluates the
expression left to right. S<C<8 - 4>> is evaluated first making the
expression S<C<4 - 2 == 2>> and not S<C<8 - 2 == 6>>.

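
The two rules can be checked directly. This short sketch evaluates the
expressions discussed above, plus one right-associative case using C<**>;
the variable names are illustrative.

```perl
use strict;
use warnings;

my $prec  = 2 + 4 * 5;      # parsed as 2 + (4 * 5)
my $left  = 8 - 4 - 2;      # parsed as (8 - 4) - 2
my $right = 2 ** 3 ** 2;    # parsed as 2 ** (3 ** 2)

print "$prec $left $right\n";    # 22 2 512

# Parentheses override both precedence and associativity.
my $forced = (2 + 4) * 5;
print "$forced\n";               # 30
```
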
Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in detail, in the
same order in which they appear in the table above.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>

A TERM has the highest precedence in Perl. Terms include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the C<sort> are evaluated before the C<sort>,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of S<C<$foo & 255>>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L</Named Unary Operators> for more discussion of this.

Also parsed as terms are the S<C<do {}>> and S<C<eval {}>> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L</Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.

=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

The dereferencing cases (as opposed to method-calling cases) are
somewhat extended by the C<postderef> feature. For the
details of that feature, consult L<perlref/Postfix Dereference Syntax>.

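
The following sketch exercises each form of the arrow described above. The
C<Counter> class and all variable names are hypothetical and exist only for
this illustration.

```perl
use strict;
use warnings;

# Hypothetical class, defined only so the method-call forms have a target.
{
    package Counter;
    sub new   { my ($class, $start) = @_; return bless { n => $start }, $class }
    sub value { my ($self) = @_; return $self->{n} }
}

my $aref = [10, 20, 30];
my $href = { color => "red" };
my $cref = sub { return $_[0] * 2 };

my $elem   = $aref->[1];         # [...] subscript: array reference on the left
my $color  = $href->{color};     # {...} subscript: hash reference on the left
my $double = $cref->(21);        # (...) subscript: subroutine reference on the left

my $obj     = Counter->new(5);   # class name on the left, method name on the right
my $direct  = $obj->value;       # object on the left
my $method  = "value";
my $by_name = $obj->$method;     # scalar holding the method name
my $code    = Counter->can("value");
my $by_ref  = $obj->$code;       # scalar holding a subroutine reference

print "$elem $color $double $direct $by_name $by_ref\n";    # 20 red 42 5 5 5
```
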
=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->

C<"++"> and C<"--"> work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = "99");      # prints "100"
    print ++($foo = "a0");      # prints "a1"
    print ++($foo = "Az");      # prints "Ba"
    print ++($foo = "zz");      # prints "aaa"

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

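
A small sketch contrasting the magical string increment, the C<undef> case,
and the non-magical decrement. Variable names are illustrative; the
C<numeric> warning is silenced only around the decrement, which would
otherwise warn about the non-numeric string.

```perl
use strict;
use warnings;

my $ver = "a9";
$ver++;                  # string increment with carry

my $caps = "Zz";
$caps++;                 # carry out of the leftmost character adds a new one

my $unset;
my $old = $unset++;      # undef is treated as 0; no warning is issued

my $word = "aa";
{
    no warnings 'numeric';
    $word--;             # no magic: "aa" is converted to 0, then decremented
}

print "$ver $caps $old $unset $word\n";    # b0 AAa 0 1 -1
```
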
=head2 Exponentiation
X<**> X<exponentiation> X<power>

Binary C<"**"> is the exponentiation operator. It binds even more
tightly than unary minus, so C<-2**4> is C<-(2**4)>, not C<(-2)**4>.
(This is implemented using C's C<pow(3)> function, which actually works
on doubles internally.)

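
The binding rule is easy to confirm with a few assignments. This is a
minimal sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $unary_last  = -2 ** 4;      # ** binds tighter than unary minus: -(2 ** 4)
my $unary_first = (-2) ** 4;    # parentheses force the negation first
my $fraction    = 2 ** -1;      # negative exponents are allowed

print "$unary_last $unary_first $fraction\n";    # -16 16 0.5
```
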
Note that certain exponentiation expressions are ill-defined:
these include C<0**0>, C<1**Inf>, and C<Inf**0>. Do not expect
any particular results from these special cases; the results
are platform-dependent.

=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>

Unary C<"!"> performs logical negation, that is, "not". See also
L<C<not>|/Logical Not> for a lower precedence version of this.
X<!>

Unary C<"-"> performs arithmetic negation if the operand is numeric,
including any string that looks like a number. If the operand is
an identifier, a string consisting of a minus sign concatenated
with the identifier is returned. Otherwise, if the string starts
with a plus or minus, a string starting with the opposite sign is
returned. One effect of these rules is that C<-bareword> is equivalent
to the string C<"-bareword">. If, however, the string begins with a
non-alphabetic character (excluding C<"+"> or C<"-">), Perl will attempt
to convert the string to a numeric, and the arithmetic negation is
performed. If the string cannot be cleanly converted to a numeric, Perl
will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>

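
A brief sketch of the unary minus rules applied to strings. The option hash
and its C<-width> key are hypothetical; they only show where the
C<-bareword> form is typically seen.

```perl
use strict;
use warnings;

my $neg_num   = -"10";      # looks like a number: arithmetic negation
my $dash_word = -"foo";     # identifier-like string: minus sign is prepended
my $to_plus   = -"-foo";    # leading minus is replaced by plus
my $to_minus  = -"+foo";    # leading plus is replaced by minus

# A -bareword before => is simply the string "-width".
my %opt = (-width => 80);
my $has_width = exists $opt{"-width"} ? "yes" : "no";

print "$neg_num $dash_word $to_plus $to_minus $has_width\n";
# -10 -foo +foo -foo yes
```
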
Unary C<"~"> performs bitwise negation, that is, 1's complement. For
example, S<C<0666 & ~027>> is 0640. (See also L</Integer Arithmetic> and
L</Bitwise String Operators>.) Note that the width of the result is
platform-dependent: C<~0> is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the C<"&"> operator to mask off the excess bits.
X<~> X<negation, binary>

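
The masking advice can be sketched as follows. The results shown do not
depend on the platform's integer width, because each complement is masked
with C<"&"> before use. Variable names are illustrative.

```perl
use strict;
use warnings;

# Clear the bits of a umask-style value from a permission mode.
my $perm = 0666 & ~027;
printf "%04o\n", $perm;               # 0640

# Mask the complement to a known width so the result is portable.
my $low_byte = ~5 & 0xFF;
my $low_word = ~0 & 0xFFFF;
print "$low_byte $low_word\n";        # 250 65535
```
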
Starting in Perl 5.28, it is a fatal error to try to complement a string
containing a character with an ordinal value above 255.

If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then unary C<"~"> always treats its argument as a number, and an
alternate form of the operator, C<"~.">, always treats its argument as a
string. So C<~0> and C<~"0"> will both give 2**32-1 on 32-bit platforms,
whereas C<~.0> and C<~."0"> will both yield C<"\xcf"> (the complement of
the single byte C<"0">, which is 0x30). This feature
produces a warning unless you use S<C<no warnings 'experimental::bitwise'>>.

Unary C<"+"> has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L</Terms and List Operators (Leftward)>.)
X<+>

Unary C<"\"> creates references. If its operand is a single sigilled
thing, it creates a reference to that object. If its operand is a
parenthesised list, then it creates references to the things mentioned
in the list. Otherwise it puts its operand in list context, and creates
a list of references to the scalars in the list provided by the operand.
See L<perlreftut> and L<perlref>. Do not confuse this behavior with the
behavior of backslash within a string, although both forms do convey the
notion of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>

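
The difference between a reference to one thing and a list of references can
be sketched like this. All variable names are illustrative.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);
my @list    = (10, 20, 30);

my $sref  = \$x;            # single sigilled thing: one scalar reference
my $lref  = \@list;         # single sigilled thing: one array reference
my @pair  = \($x, $y);      # parenthesised list: (\$x, \$y)
my @elems = \(@list);       # parenthesised array: one reference per element

${ $pair[1] } = 5;          # writes through to $y
${ $elems[0] }++;           # writes through to $list[0]

print ref($sref), " ", ref($lref), " ", scalar(@elems), "\n";    # SCALAR ARRAY 3
print "$y $list[0]\n";                                           # 5 11
```
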
=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>

Binary C<"=~"> binds a scalar expression to a pattern match. Certain operations
search or modify the string C<$_> by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
C<$_>. When used in scalar context, the return value generally indicates the
success of the operation. The exceptions are substitution (C<s///>)
and transliteration (C<y///>) with the C</r> (non-destructive) option,
which cause the B<r>eturn value to be the result of the substitution.
Behavior in list context depends on the particular operator.
See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
examples using these operators.

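
A minimal sketch of binding a match, a substitution, and a transliteration
to a named variable instead of C<$_>. The C</r> form assumes Perl 5.14 or
later; variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "Hello, World";

# Match against $line rather than $_.
my $found = ($line =~ /World/) ? "yes" : "no";

# Copy first, then substitute in the copy; $line is left alone.
(my $copy = $line) =~ s/World/Perl/;

# With an empty replacement list, tr counts characters without changing them.
my $ells = ($line =~ tr/l//);

# /r returns the modified string instead of a success flag.
my $new = $line =~ s/Hello/Goodbye/r;

print "$found | $copy | $ells | $new | $line\n";
# yes | Hello, Perl | 3 | Goodbye, World | Hello, World
```
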
If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its
contents will be interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.

Binary C<"!~"> is just like C<"=~"> except the return value is negated in
the logical sense.

Binary C<"!~"> with a non-destructive substitution (C<s///r>) or transliteration
(C<y///r>) is a syntax error.

=head2 Multiplicative Operators
X<operator, multiplicative>

Binary C<"*"> multiplies two numbers.
X<*>

Binary C<"/"> divides two numbers.
X</> X<slash>

Binary C<"%"> is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer
operands C<$m> and C<$n>: If C<$n> is positive, then S<C<$m % $n>> is
C<$m> minus the largest multiple of C<$n> less than or equal to
C<$m>. If C<$n> is negative, then S<C<$m % $n>> is C<$m> minus the
smallest multiple of C<$n> that is not less than C<$m> (that is, the
result will be less than or equal to zero). If the operands
C<$m> and C<$n> are floating point values and the absolute value of
C<$n> (that is C<abs($n)>) is less than S<C<(UV_MAX + 1)>>, only
the integer portion of C<$m> and C<$n> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
If the absolute value of the right operand (C<abs($n)>) is greater than
or equal to S<C<(UV_MAX + 1)>>, C<"%"> computes the floating-point remainder
C<$r> in the equation S<C<($r = $m - $i*$n)>> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
C<$n> (B<not> as the left operand C<$m> like C function C<fmod()>)
and the absolute value less than that of C<$n>.
Note that when S<C<use integer>> is in scope, C<"%"> gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
X<%> X<remainder> X<modulo> X<mod>

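
The sign rule for C<"%"> is easiest to see with all four sign combinations.
The sketch below also shows that floating point operands are truncated
first. Variable names are illustrative.

```perl
use strict;
use warnings;

# The result takes the sign of the right operand.
my @results = (7 % 3, -7 % 3, 7 % -3, -7 % -3);
print "@results\n";      # 1 2 -2 -1

# Only the integer portions of floating point operands are used: 7 % 3.
my $truncated = 7.9 % 3.2;
print "$truncated\n";    # 1
```

Under S<C<use integer>> the result for negative operands comes from the
platform's C compiler instead, so it may differ from the values above.
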
Binary C<x> is the repetition operator. In scalar context, or if the
left operand is neither enclosed in parentheses nor a C<qw//> list,
it performs a string repetition. In that case it supplies scalar
context to the left operand, and returns a string consisting of the
left operand string repeated the number of times specified by the right
operand. If the C<x> is in list context, and the left operand is either
enclosed in parentheses or a C<qw//> list, it performs a list repetition.
In that case it supplies list context to the left operand, and returns
a list consisting of the left operand list repeated the number of times
specified by the right operand.
If the right operand is zero or negative (raising a warning on
negative), it returns an empty string
or an empty list, depending on the context.
X<x>

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

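
As a small sketch of the string-versus-list distinction, the assignments
below differ only in whether the left operand is parenthesised. Variable
names are illustrative.

```perl
use strict;
use warnings;

my $rule   = "=-" x 3;       # string repetition
my @zeros  = (0) x 3;        # list repetition: three elements
my @single = "ab" x 2;       # no parentheses on the left: one string element
my @pairs  = (1, 2) x 2;     # the whole list is repeated
my $empty  = "abc" x 0;      # zero repetitions give the empty string

print "$rule\n";                                     # =-=-=-
print scalar(@zeros), " ", scalar(@single), "\n";    # 3 1
print "@pairs\n";                                    # 1 2 1 2
print "[$empty]\n";                                  # []
```
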
=head2 Additive Operators
X<operator, additive>

Binary C<"+"> returns the sum of two numbers.
X<+>

Binary C<"-"> returns the difference of two numbers.
X<->

Binary C<"."> concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>

=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>

Binary C<<< "<<" >>> returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L</Integer Arithmetic>.)

Binary C<<< ">>" >>> returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L</Integer Arithmetic>.)

If S<C<use integer>> (see L</Integer Arithmetic>) is in force then
signed C integers are used (I<arithmetic shift>), otherwise unsigned C
integers are used (I<logical shift>), even for negative shiftees.
In arithmetic right shift the sign bit is replicated on the left,
in logical shift zero bits come in from the left.

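
The difference between logical and arithmetic right shift can be sketched
with a negative shiftee. The exact logical result depends on the integer
width Perl was built with, so it is only checked for its sign here.
Variable names are illustrative.

```perl
use strict;
use warnings;

my $left  = 1 << 4;
my $right = 256 >> 4;
print "$left $right\n";     # 16 16

# Default: logical shift on an unsigned integer, zero bits come in from the left.
my $logical = -16 >> 2;

# Under "use integer": arithmetic shift, the sign bit is replicated.
my $arith;
{
    use integer;
    $arith = -16 >> 2;
}

print(($logical > 0 ? "positive" : "negative"), " $arith\n");    # positive -4
```
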
Either way, the implementation isn't going to generate results larger
than the size of the integer type Perl was built with (32 bits or 64 bits).

Shifting by a negative number of bits means the reverse shift: left
shift becomes right shift, right shift becomes left shift. This is
unlike in C, where negative shift is undefined.

Shifting by more bits than the size of the integers usually yields
zero (all bits fall off), except that under S<C<use integer>>
right overshifting a negative shiftee results in -1. This is unlike
in C, where shifting by too many bits is undefined. A common C
behavior is "shift by modulo wordbits", so that for example

    1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1  # Common C behavior.

but that is completely accidental.

If you get tired of being subject to your platform's native integers,
the S<C<use bigint>> pragma neatly sidesteps the issue altogether:

    print 20 << 20;  # 20971520
    print 20 << 40;  # 5120 on 32-bit machines,
                     # 21990232555520 on 64-bit machines
    use bigint;
    print 20 << 100; # 25353012004564588029934064107520

=head2 Named Unary Operators
X<operator, named unary>

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than C<||>:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because C<"*"> is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

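
The same rules can be observed with operators whose results are easy to
inspect. This sketch uses C<lc> and C<ref>, which are named unary operators
like C<chdir> and C<rand>; the variable names are illustrative.

```perl
use strict;
use warnings;

my $whole = lc "A" . "B";       # "." binds tighter: lc("A" . "B")
my $part  = lc("A") . "B";      # parenthesis rule: (lc "A") . "B"
my $plus  = lc +("A") . "B";    # unary plus defeats the rule: lc(("A") . "B")

# eq binds less tightly than a named unary operator: (ref $data) eq 'ARRAY'
my $data     = [1, 2];
my $is_array = ref $data eq 'ARRAY' ? "yes" : "no";

print "$whole $part $plus $is_array\n";    # ab aB ab yes
```
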
Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to S<C<-f "$file.bak">>.
X<-X> X<filetest> X<operator, filetest>

See also L</"Terms and List Operators (Leftward)">.

=head2 Relational Operators
X<relational operator> X<operator, relational>

Perl operators that return true or false generally return values
that can be safely used as numbers. For example, the relational
operators in this section and the equality operators in the next
one return C<1> for true and a special version of the defined empty
string, C<"">, which counts as a zero but is exempt from warnings
about improper numeric conversions, just as S<C<"0 but true">> is.

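
A short sketch of what these return values look like in practice, along with
a reminder that the numeric and stringwise operators can disagree on the
same operands. Variable names are illustrative.

```perl
use strict;
use warnings;

my $false = (5 < 3);

my $as_number = $false + 0;       # counts as zero, and no warning is issued
my $as_string = "[$false]";       # the defined empty string

# True results are 1, so comparisons can be summed to count successes.
my $count = (3 < 5) + (2 < 7) + (9 < 1);

# Numeric and stringwise comparison of the same values can differ.
my $numeric    = (10 < 9)      ? "yes" : "no";
my $stringwise = ("10" lt "9") ? "yes" : "no";

print "$as_number $as_string $count $numeric $stringwise\n";    # 0 [] 2 no yes
```
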
Binary C<< "<" >> returns true if the left argument is numerically less than
the right argument.
X<< < >>

Binary C<< ">" >> returns true if the left argument is numerically greater
than the right argument.
X<< > >>

Binary C<< "<=" >> returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>

Binary C<< ">=" >> returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>

Binary C<"lt"> returns true if the left argument is stringwise less than
the right argument.
X<< lt >>

Binary C<"gt"> returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>

Binary C<"le"> returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>

Binary C<"ge"> returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>

=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>

Binary C<< "==" >> returns true if the left argument is numerically equal to
the right argument.
X<==>

Binary C<< "!=" >> returns true if the left argument is numerically not equal
to the right argument.
X<!=>

Binary C<< "<=>" >> returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports C<NaN>'s (not-a-numbers) as numeric
values, using them with C<< "<=>" >> returns undef. C<NaN> is not
C<< "<" >>, C<< "==" >>, C<< ">" >>, C<< "<=" >> or C<< ">=" >> anything
(even C<NaN>), so those 5 return false. S<C<< NaN != NaN >>> returns
true, as does S<C<NaN !=> I<anything else>>. If your platform doesn't
support C<NaN>'s then C<NaN> is just a string with numeric value 0.
X<< <=> >>
X<spaceship>

    $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
    $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'

(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
support C<"NaN">.)

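
The one-liners above can be folded into a script that also inspects what
C<< <=> >> returns. This is a sketch only: the C<numeric> warning is
silenced in case the platform treats C<"NaN"> as a plain string, and no
particular output is assumed because it depends on platform support.

```perl
use strict;
use warnings;

# Numify the string "NaN"; on platforms without NaN support this is just 0.
my $nan = do { no warnings 'numeric'; "NaN" + 0 };

my $supported = ($nan != $nan) ? 1 : 0;

if ($supported) {
    my $cmp = ($nan <=> 1);
    print "NaN support here; <=> returned ",
          (defined $cmp ? $cmp : "undef"), "\n";
}
else {
    print "No NaN support here\n";
}
```
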
Binary C<"eq"> returns true if the left argument is stringwise equal to
the right argument.
X<eq>

Binary C<"ne"> returns true if the left argument is stringwise not equal
to the right argument.
X<ne>

Binary C<"cmp"> returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>

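
Because C<< <=> >> and C<cmp> return -1, 0, or 1, they are commonly used
as C<sort> comparators. This sketch sorts the same list both ways; the
variable names are illustrative.

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @numeric    = sort { $a <=> $b } @values;    # compared as numbers
my @stringwise = sort { $a cmp $b } @values;    # compared as strings

print "@numeric\n";       # 9 10 100
print "@stringwise\n";    # 10 100 9
```
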
Binary C<"~~"> does a smartmatch between its arguments. Smart matching
is described in the next section.
X<~~>

C<"lt">, C<"le">, C<"ge">, C<"gt"> and C<"cmp"> use the collation (sort)
order specified by the current C<LC_COLLATE> locale if a S<C<use
locale>> form that includes collation is in effect. See L<perllocale>.
Do not mix these with Unicode;
use them only with legacy 8-bit locale encodings.
The standard C<L<Unicode::Collate>> and
C<L<Unicode::Collate::Locale>> modules offer much more powerful
solutions to collation issues.

For case-insensitive comparisons, look at the L<perlfunc/fc> case-folding
function, available in Perl v5.16 or later:

    if ( fc($x) eq fc($y) ) { ... }

=head2 Smartmatch Operator

First available in Perl 5.10.1 (the 5.10.0 version behaved differently),
binary C<~~> does a "smartmatch" between its arguments. This is mostly
used implicitly in the C<when> construct described in L<perlsyn>, although
not all C<when> clauses call the smartmatch operator. Unique among all of
Perl's operators, the smartmatch operator can recurse. The smartmatch
operator is L<experimental|perlpolicy/experimental> and its behavior is
subject to change.

It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
those operands to those imposed contexts. In contrast, smartmatch
I<infers> contexts from the actual types of its operands and uses that
type information to select a suitable comparison mechanism.

The C<~~> operator compares its operands "polymorphically", determining how
to compare them according to their actual types (numeric, string, array,
hash, etc.). Like the equality operators with which it shares the same
precedence, C<~~> returns 1 for true and C<""> for false. It is often best
read aloud as "in", "inside of", or "is contained in", because the left
operand is often looked for I<inside> the right operand. That makes the
order of the operands to the smartmatch operator often opposite that of
the regular match operator. In other words, the "smaller" thing is usually
placed in the left operand and the larger one in the right.

The behavior of a smartmatch depends on what type of things its arguments
are, as determined by the following table. The first row of the table
whose types apply determines the smartmatch behavior. Because what
actually happens is mostly determined by the type of the second operand,
the table is sorted on the right operand instead of on the left.

    Left      Right      Description and pseudocode
    ===============================================================
    Any       undef      check whether Any is undefined
                         like: !defined Any

    Any       Object     invoke ~~ overloading on Object, or die

 Right operand is an ARRAY:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY1    ARRAY2     recurse on paired elements of ARRAY1 and ARRAY2[2]
                         like: (ARRAY1[0] ~~ ARRAY2[0])
                                 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
    HASH      ARRAY      any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    ARRAY      any ARRAY elements pattern match Regexp
                         like: grep { /Regexp/ } ARRAY
    undef     ARRAY      undef in ARRAY
                         like: grep { !defined } ARRAY
    Any       ARRAY      smartmatch each ARRAY element[3]
                         like: grep { Any ~~ $_ } ARRAY

 Right operand is a HASH:

    Left      Right      Description and pseudocode
    ===============================================================
    HASH1     HASH2      all same keys in both HASHes
                         like: keys HASH1 ==
                                  grep { exists HASH2->{$_} } keys HASH1
    ARRAY     HASH       any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    HASH       any HASH keys pattern match Regexp
                         like: grep { /Regexp/ } keys HASH
    undef     HASH       always false (undef can't be a key)
                         like: 0 == 1
    Any       HASH       HASH key existence
                         like: exists HASH->{Any}

 Right operand is CODE:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     CODE       sub returns true on all ARRAY elements[1]
                         like: !grep { !CODE->($_) } ARRAY
    HASH      CODE       sub returns true on all HASH keys[1]
                         like: !grep { !CODE->($_) } keys HASH
    Any       CODE       sub passed Any returns true
                         like: CODE->(Any)

 Right operand is a Regexp:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                         like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                         like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                         like: Any =~ /Regexp/

 Other:

    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...

    Any       Num        numeric equality
                         like: Any == Num
    Num       nummy[4]   numeric equality
                         like: Num == nummy
    undef     Any        check whether undefined
                         like: !defined(Any)
    Any       Any        string equality
                         like: Any eq Any

667Notes:
668
669=over
670
671=item 1.
a727cfac 672Empty hashes or arrays match.
1ca345ed
TC
673
674=item 2.
40bec8a5 675That is, each element smartmatches the element of the same index in the other array.[3]
1ca345ed
TC
676
677=item 3.
a727cfac 678If a circular reference is found, fall back to referential equality.
1ca345ed
TC
679
680=item 4.
681Either an actual number, or a string that looks like one.
682
683=back
684
685The smartmatch implicitly dereferences any non-blessed hash or array
686reference, so the C<I<HASH>> and C<I<ARRAY>> entries apply in those cases.
687For blessed references, the C<I<Object>> entries apply. Smartmatches
688involving hashes only consider hash keys, never hash values.
689
690The "like" code entry is not always an exact rendition. For example, the
40bec8a5 691smartmatch operator short-circuits whenever possible, but C<grep> does
1ca345ed
TC
692not. Also, C<grep> in scalar context returns the number of matches, but
693C<~~> returns only true or false.
694
695Unlike most operators, the smartmatch operator knows to treat C<undef>
696specially:
697
698 use v5.10.1;
699 @array = (1, 2, 3, undef, 4, 5);
700 say "some elements undefined" if undef ~~ @array;
701
702Each operand is considered in a modified scalar context, the modification
703being that array and hash variables are passed by reference to the
704operator, which implicitly dereferences them. Both elements
705of each pair are the same:
706
707 use v5.10.1;
708
709 my %hash = (red => 1, blue => 2, green => 3,
710 orange => 4, yellow => 5, purple => 6,
711 black => 7, grey => 8, white => 9);
712
713 my @array = qw(red blue green);
714
715 say "some array elements in hash keys" if @array ~~ %hash;
716 say "some array elements in hash keys" if \@array ~~ \%hash;
717
718 say "red in array" if "red" ~~ @array;
719 say "red in array" if "red" ~~ \@array;
720
721 say "some keys end in e" if /e$/ ~~ %hash;
722 say "some keys end in e" if /e$/ ~~ \%hash;
723
40bec8a5
TC
724Two arrays smartmatch if each element in the first array smartmatches
725(that is, is "in") the corresponding element in the second array,
726recursively.
1ca345ed
TC
727
    use v5.10.1;
    my @little = qw(red blue green);
    my @bigger = ("red", "blue", [ "orange", "green" ] );
    if (@little ~~ @bigger) {  # true!
        say "little is contained in bigger";
    }
734
735Because the smartmatch operator recurses on nested arrays, this
736will still report that "red" is in the array.
737
738 use v5.10.1;
739 my @array = qw(red blue green);
740 my $nested_array = [[[[[[[ @array ]]]]]]];
741 say "red in array" if "red" ~~ $nested_array;
742
If two arrays smartmatch each other, then they are deep
copies of each other's values, as this example reports:
745
    use v5.12.0;
    my @a = (0, 1, 2, [3, [4, 5], 6], 7);
    my @b = (0, 1, 2, [3, [4, 5], 6], 7);

    if (@a ~~ @b && @b ~~ @a) {
        say "a and b are deep copies of each other";
    }
    elsif (@a ~~ @b) {
        say "a smartmatches in b";
    }
    elsif (@b ~~ @a) {
        say "b smartmatches in a";
    }
    else {
        say "a and b don't smartmatch each other at all";
    }
762
763
ba7f043c
KW
764If you were to set S<C<$b[3] = 4>>, then instead of reporting that "a and b
765are deep copies of each other", it now reports that C<"b smartmatches in a">.
766That's because the corresponding position in C<@a> contains an array that
1ca345ed
TC
767(eventually) has a 4 in it.
768
769Smartmatching one hash against another reports whether both contain the
46f8a5ea 770same keys, no more and no less. This could be used to see whether two
1ca345ed
TC
771records have the same field names, without caring what values those fields
772might have. For example:
773
774 use v5.10.1;
775 sub make_dogtag {
776 state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };
777
778 my ($class, $init_fields) = @_;
779
780 die "Must supply (only) name, rank, and serial number"
781 unless $init_fields ~~ $REQUIRED_FIELDS;
782
783 ...
784 }
785
1b590b38
LM
786However, this only does what you mean if C<$init_fields> is indeed a hash
787reference. The condition C<$init_fields ~~ $REQUIRED_FIELDS> also allows the
788strings C<"name">, C<"rank">, C<"serial_num"> as well as any array reference
789that contains C<"name"> or C<"rank"> or C<"serial_num"> anywhere to pass
790through.
1ca345ed
TC
791
792The smartmatch operator is most often used as the implicit operator of a
793C<when> clause. See the section on "Switch Statements" in L<perlsyn>.
794
795=head3 Smartmatching of Objects
796
40bec8a5
TC
797To avoid relying on an object's underlying representation, if the
798smartmatch's right operand is an object that doesn't overload C<~~>,
799it raises the exception "C<Smartmatching a non-overloaded object
46f8a5ea
FC
800breaks encapsulation>". That's because one has no business digging
801around to see whether something is "in" an object. These are all
40bec8a5 802illegal on objects without a C<~~> overload:
1ca345ed
TC
803
804 %hash ~~ $object
805 42 ~~ $object
806 "fred" ~~ $object
807
808However, you can change the way an object is smartmatched by overloading
46f8a5ea
FC
809the C<~~> operator. This is allowed to
810extend the usual smartmatch semantics.
1ca345ed
TC
811For objects that do have an C<~~> overload, see L<overload>.
812
813Using an object as the left operand is allowed, although not very useful.
814Smartmatching rules take precedence over overloading, so even if the
815object in the left operand has smartmatch overloading, this will be
816ignored. A left operand that is a non-overloaded object falls back on a
817string or numeric comparison of whatever the C<ref> operator returns. That
818means that
819
820 $object ~~ X
821
822does I<not> invoke the overload method with C<I<X>> as an argument.
823Instead the above table is consulted as normal, and based on the type of
824C<I<X>>, overloading may or may not be invoked. For simple strings or
ba7f043c 825numbers, "in" becomes equivalent to this:
1ca345ed
TC
826
827 $object ~~ $number ref($object) == $number
a727cfac 828 $object ~~ $string ref($object) eq $string
1ca345ed
TC
829
830For example, this reports that the handle smells IOish
831(but please don't really do this!):
832
833 use IO::Handle;
834 my $fh = IO::Handle->new();
835 if ($fh ~~ /\bIO\b/) {
836 say "handle smells IOish";
a727cfac 837 }
1ca345ed
TC
838
839That's because it treats C<$fh> as a string like
840C<"IO::Handle=GLOB(0x8039e0)">, then pattern matches against that.
a034a98d 841
a0d0e21e 842=head2 Bitwise And
d74e8afc 843X<operator, bitwise, and> X<bitwise and> X<&>
a0d0e21e 844
Binary C<"&"> returns its operands ANDed together bit by bit. Although no
warning is currently raised, the result is not well defined when this operation
is performed on operands that are neither numbers (see
L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String Operators>).
a0d0e21e 849
ba7f043c 850Note that C<"&"> has lower priority than relational operators, so for example
1ca345ed 851the parentheses are essential in a test like
2cdc098b 852
1ca345ed 853 print "Even\n" if ($x & 1) == 0;
2cdc098b 854
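The precedence point above can be checked with a short script. This is only a sketch; the variable names are illustrative and not part of the original text. It compares the parenthesized test with the unparenthesized one, which Perl parses as C<$x & (1 == 0)>.

```perl
use strict;
use warnings;

my $x = 6;

# With parentheses: mask the low bit first, then compare with 0.
my $with_parens = (($x & 1) == 0) ? "even" : "odd";

# Without them, == binds tighter than &, so this is $x & (1 == 0):
# $x ANDed with a false value, which is always 0 (false).
my $without_parens = ($x & 1 == 0) ? "even" : "odd";

print "with parentheses:    $with_parens\n";
print "without parentheses: $without_parens\n";
```

The second form reports "odd" for every value of C<$x>, because the comparison is evaluated before the bitwise AND.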
If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then this operator always treats its operands as numbers. This
feature produces a warning unless you also use S<C<no warnings
'experimental::bitwise'>>.
fb7054ba 859
a0d0e21e 860=head2 Bitwise Or and Exclusive Or
d74e8afc
ITB
861X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
862X<bitwise xor> X<^>
a0d0e21e 863
ba7f043c 864Binary C<"|"> returns its operands ORed together bit by bit.
a0d0e21e 865
ba7f043c 866Binary C<"^"> returns its operands XORed together bit by bit.
c791a246
KW
867
Although no warning is currently raised, the results are not well
defined when these operations are performed on operands that are neither
numbers (see L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String
Operators>).
a0d0e21e 872
ba7f043c
KW
873Note that C<"|"> and C<"^"> have lower priority than relational operators, so
874for example the parentheses are essential in a test like
2cdc098b 875
1ca345ed 876 print "false\n" if (8 | 2) != 10;
2cdc098b 877
If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then these operators always treat their operands as numbers. This
feature produces a warning unless you also use S<C<no warnings
'experimental::bitwise'>>.
fb7054ba 882
a0d0e21e 883=head2 C-style Logical And
d74e8afc 884X<&&> X<logical and> X<operator, logical, and>
a0d0e21e 885
ba7f043c 886Binary C<"&&"> performs a short-circuit logical AND operation. That is,
a0d0e21e
LW
887if the left operand is false, the right operand is not even evaluated.
888Scalar or list context propagates down to the right operand if it
889is evaluated.
890
891=head2 C-style Logical Or
d74e8afc 892X<||> X<operator, logical, or>
a0d0e21e 893
ba7f043c 894Binary C<"||"> performs a short-circuit logical OR operation. That is,
a0d0e21e
LW
895if the left operand is true, the right operand is not even evaluated.
896Scalar or list context propagates down to the right operand if it
897is evaluated.
898
26d9d83b 899=head2 Logical Defined-Or
d74e8afc 900X<//> X<operator, logical, defined-or>
c963b151
BD
901
902Although it has no direct equivalent in C, Perl's C<//> operator is related
ba7f043c 903to its C-style "or". In fact, it's exactly the same as C<||>, except that it
95bee9ba 904tests the left hand side's definedness instead of its truth. Thus,
ba7f043c 905S<C<< EXPR1 // EXPR2 >>> returns the value of C<< EXPR1 >> if it's defined,
46f8a5ea
FC
906otherwise, the value of C<< EXPR2 >> is returned.
907(C<< EXPR1 >> is evaluated in scalar context, C<< EXPR2 >>
908in the context of C<< // >> itself). Usually,
this is the same result as S<C<< defined(EXPR1) ? EXPR1 : EXPR2 >>> (except that
the ternary-operator form can be used as an lvalue, while S<C<< EXPR1 // EXPR2 >>>
cannot). This is very useful for
providing default values for variables. If you actually want to test if
at least one of C<$x> and C<$y> is defined, use S<C<defined($x // $y)>>.
c963b151 914
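As a small illustration of the difference between testing truth and testing definedness, the following sketch supplies defaults for a hypothetical option hash. The hash C<%opt> and its keys are assumptions made up for this example.

```perl
use strict;
use warnings;

# %opt is a hypothetical option hash; "retries" is deliberately 0.
my %opt = (retries => 0);

my $with_or  = $opt{retries} || 3;    # 0 is false, so the default wins
my $with_dor = $opt{retries} // 3;    # 0 is defined, so it is kept
my $timeout  = $opt{timeout} // 30;   # missing key is undef: default used

print "|| gives $with_or, // gives $with_dor, timeout is $timeout\n";
```

Using C<//> here keeps a legitimate value of 0, which C<||> would silently replace.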
d042e63d 915The C<||>, C<//> and C<&&> operators return the last value evaluated
46f8a5ea 916(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
d042e63d 917portable way to find out the home directory might be:
a0d0e21e 918
c543c01b
TC
919 $home = $ENV{HOME}
920 // $ENV{LOGDIR}
921 // (getpwuid($<))[7]
922 // die "You're homeless!\n";
a0d0e21e 923
5a964f20
TC
924In particular, this means that you shouldn't use this
925for selecting between two aggregates for assignment:
926
    @a = @b || @c;            # This doesn't do the right thing
    @a = scalar(@b) || @c;    # because it really means this.
    @a = @b ? @b : @c;        # This works fine, though.
5a964f20 930
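The aggregate pitfall above can be seen directly in a runnable sketch; the array contents are arbitrary values chosen for illustration.

```perl
use strict;
use warnings;

my @b = (10, 20, 30);
my @c = (1);

# The left operand of || is evaluated in scalar context, so a
# non-empty @b yields its element count rather than its elements.
my @wrong = @b || @c;

# The conditional tests the count but returns the array itself.
my @right = @b ? @b : @c;

print "wrong: @wrong\n";
print "right: @right\n";
```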
1ca345ed 931As alternatives to C<&&> and C<||> when used for
f23102e2 932control flow, Perl provides the C<and> and C<or> operators (see below).
ba7f043c
KW
933The short-circuit behavior is identical. The precedence of C<"and">
934and C<"or"> is much lower, however, so that you can safely use them after a
5a964f20 935list operator without the need for parentheses:
a0d0e21e
LW
936
937 unlink "alpha", "beta", "gamma"
938 or gripe(), next LINE;
939
940With the C-style operators that would have been written like this:
941
942 unlink("alpha", "beta", "gamma")
943 || (gripe(), next LINE);
944
1ca345ed
TC
945It would be even more readable to write that this way:
946
947 unless(unlink("alpha", "beta", "gamma")) {
948 gripe();
949 next LINE;
a727cfac 950 }
1ca345ed 951
ba7f043c 952Using C<"or"> for assignment is unlikely to do what you want; see below.
5a964f20
TC
953
954=head2 Range Operators
d74e8afc 955X<operator, range> X<range> X<..> X<...>
a0d0e21e 956
ba7f043c 957Binary C<".."> is the range operator, which is really two different
fb53bbb2 958operators depending on the context. In list context, it returns a
54ae734e 959list of values counting (up by ones) from the left value to the right
2cdbc966 960value. If the left value is greater than the right value then it
fb53bbb2 961returns the empty list. The range operator is useful for writing
ba7f043c 962S<C<foreach (1..10)>> loops and for doing slice operations on arrays. In
2cdbc966
JD
963the current implementation, no temporary array is created when the
964range operator is used as the expression in C<foreach> loops, but older
965versions of Perl might burn a lot of memory when you write something
966like this:
a0d0e21e
LW
967
968 for (1 .. 1_000_000) {
969 # code
54310121 970 }
a0d0e21e 971
8f0f46f8 972The range operator also works on strings, using the magical
973auto-increment, see below.
54ae734e 974
ba7f043c 975In scalar context, C<".."> returns a boolean value. The operator is
8f0f46f8 976bistable, like a flip-flop, and emulates the line-range (comma)
ba7f043c 977operator of B<sed>, B<awk>, and various editors. Each C<".."> operator
8f0f46f8 978maintains its own boolean state, even across calls to a subroutine
46f8a5ea 979that contains it. It is false as long as its left operand is false.
a0d0e21e
LW
980Once the left operand is true, the range operator stays true until the
981right operand is true, I<AFTER> which the range operator becomes false
8f0f46f8 982again. It doesn't become false till the next time the range operator
983is evaluated. It can test the right operand and become false on the
984same evaluation it became true (as in B<awk>), but it still returns
46f8a5ea 985true once. If you don't want it to test the right operand until the
ba7f043c
KW
986next evaluation, as in B<sed>, just use three dots (C<"...">) instead of
987two. In all other regards, C<"..."> behaves just like C<".."> does.
19799a22
GS
988
989The right operand is not evaluated while the operator is in the
990"false" state, and the left operand is not evaluated while the
991operator is in the "true" state. The precedence is a little lower
than C<||> and C<&&>. The value returned is either the empty string for
8f0f46f8 993false, or a sequence number (beginning with 1) for true. The sequence
994number is reset for each range encountered. The final sequence number
ba7f043c 995in a range has the string C<"E0"> appended to it, which doesn't affect
8f0f46f8 996its numeric value, but gives you something to search for if you want
997to exclude the endpoint. You can exclude the beginning point by
998waiting for the sequence number to be greater than 1.
df5f8116 999
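To make the returned values concrete, this sketch records what scalar C<..> returns for each element of a small list. The markers C<BEGIN> and C<END> and the sample data are assumptions chosen for the example.

```perl
use strict;
use warnings;

my @lines = ("a", "BEGIN", "b", "c", "END", "d");
my @states;

for (@lines) {
    # Scalar assignment puts ".." in scalar context (the flip-flop).
    my $state = /BEGIN/ .. /END/;
    push @states, "$_=[$state]";
}

# Outside the range the value is the empty string; inside it is a
# sequence number, and the last one carries the "E0" suffix.
print "$_\n" for @states;
```

A test such as C<$state !~ /E0$/> can then be used to exclude the endpoint, and C<< $state > 1 >> to exclude the beginning.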
ba7f043c 1000If either operand of scalar C<".."> is a constant expression,
df5f8116
CW
1001that operand is considered true if it is equal (C<==>) to the current
1002input line number (the C<$.> variable).
1003
ba7f043c 1004To be pedantic, the comparison is actually S<C<int(EXPR) == int(EXPR)>>,
df5f8116
CW
1005but that is only an issue if you use a floating point expression; when
1006implicitly using C<$.> as described in the previous paragraph, the
ba7f043c 1007comparison is S<C<int(EXPR) == int($.)>> which is only an issue when C<$.>
df5f8116 1008is set to a floating point value and you are not reading from a file.
Furthermore, S<C<"span" .. "spat">> or S<C<2.18 .. 3.14>> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.
1012
1013Examples:
a0d0e21e
LW
1014
1015As a scalar operator:
1016
df5f8116 1017 if (101 .. 200) { print; } # print 2nd hundred lines, short for
950b09ed 1018 # if ($. == 101 .. $. == 200) { print; }
9f10b797
RGS
1019
1020 next LINE if (1 .. /^$/); # skip header lines, short for
f343f960 1021 # next LINE if ($. == 1 .. /^$/);
9f10b797
RGS
1022 # (typically in a loop labeled LINE)
1023
1024 s/^/> / if (/^$/ .. eof()); # quote body
a0d0e21e 1025
    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # do something
        } else { # in body
            # do something else
        }
    } continue {
        close ARGV if eof; # reset $. each file
    }
1038
acf31ca5
SF
1039Here's a simple example to illustrate the difference between
1040the two range operators:
1041
1042 @lines = (" - Foo",
1043 "01 - Bar",
1044 "1 - Baz",
1045 " - Quux");
1046
9f10b797
RGS
1047 foreach (@lines) {
1048 if (/0/ .. /1/) {
acf31ca5
SF
1049 print "$_\n";
1050 }
1051 }
1052
46f8a5ea 1053This program will print only the line containing "Bar". If
9f10b797 1054the range operator is changed to C<...>, it will also print the
acf31ca5
SF
1055"Baz" line.
1056
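The two operators can also be compared side by side. This sketch reuses the sample lines from the example above and collects the matches in two arrays whose names are chosen only for illustration. Each operator keeps its own state, so both can run in the same loop.

```perl
use strict;
use warnings;

my @lines = ("  - Foo", "01 - Bar", "1  - Baz", "  - Quux");
my (@two_dots, @three_dots);

for (@lines) {
    # ".." tests the right operand on the evaluation that turned it on.
    push @two_dots,   $_ if /0/ ..  /1/;
    # "..." waits until the next evaluation to test the right operand.
    push @three_dots, $_ if /0/ ... /1/;
}

print "two dots:   ", join(" | ", @two_dots),   "\n";
print "three dots: ", join(" | ", @three_dots), "\n";
```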
1057And now some examples as a list operator:
a0d0e21e 1058
1ca345ed
TC
1059 for (101 .. 200) { print } # print $_ 100 times
1060 @foo = @foo[0 .. $#foo]; # an expensive no-op
1061 @foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items
a0d0e21e 1062
5a964f20 1063The range operator (in list context) makes use of the magical
5f05dabc 1064auto-increment algorithm if the operands are strings. You
a0d0e21e
LW
1065can say
1066
c543c01b 1067 @alphabet = ("A" .. "Z");
a0d0e21e 1068
54ae734e 1069to get all normal letters of the English alphabet, or
a0d0e21e 1070
c543c01b 1071 $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];
a0d0e21e
LW
1072
1073to get a hexadecimal digit, or
1074
1ca345ed
TC
1075 @z2 = ("01" .. "31");
1076 print $z2[$mday];
a0d0e21e 1077
ea4f5703
YST
1078to get dates with leading zeros.
1079
1080If the final value specified is not in the sequence that the magical
1081increment would produce, the sequence goes until the next value would
1082be longer than the final value specified.
1083
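The following sketch shows a few string ranges, including one whose final value is never produced by the magical increment. The endpoints are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my @letters = ("aa" .. "ad");   # plain magical increment
my @days    = ("01" .. "05");   # leading zeros are preserved
my @carry   = ("x"  .. "ab");   # increment carries from "z" to "aa"

# "B" never appears in the sequence starting at "w", so the range
# stops before the first value longer than "B", which is "aa".
my @stopped = ("w" .. "B");

print "@letters\n@days\n@carry\n@stopped\n";
```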
d6c970c7
AC
1084As of Perl 5.26, the list-context range operator on strings works as expected
1085in the scope of L<< S<C<"use feature 'unicode_strings">>|feature/The
1086'unicode_strings' feature >>. In previous versions, and outside the scope of
1087that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
1088depends on the internal encoding of the range endpoint.
1089
ea4f5703 1090If the initial value specified isn't part of a magical increment
c543c01b 1091sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
ea4f5703
YST
1092only the initial value will be returned. So the following will only
1093return an alpha:
1094
c543c01b 1095 use charnames "greek";
ea4f5703
YST
1096 my @greek_small = ("\N{alpha}" .. "\N{omega}");
1097
c543c01b
TC
1098To get the 25 traditional lowercase Greek letters, including both sigmas,
1099you could use this instead:
ea4f5703 1100
c543c01b 1101 use charnames "greek";
a727cfac 1102 my @greek_small = map { chr } ( ord("\N{alpha}")
1ca345ed 1103 ..
a727cfac 1104 ord("\N{omega}")
1ca345ed 1105 );
c543c01b
TC
1106
1107However, because there are I<many> other lowercase Greek characters than
1108just those, to match lowercase Greek characters in a regular expression,
47c56cc8
KW
1109you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
1110L<experimental feature|perlrecharclass/Extended Bracketed Character
1111Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).
a0d0e21e 1112
ba7f043c 1113Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
df5f8116
CW
1114return two elements in list context.
1115
1116 @list = (2.18 .. 3.14); # same as @list = (2 .. 3);
1117
a0d0e21e 1118=head2 Conditional Operator
d74e8afc 1119X<operator, conditional> X<operator, ternary> X<ternary> X<?:>
a0d0e21e 1120
ba7f043c
KW
1121Ternary C<"?:"> is the conditional operator, just as in C. It works much
1122like an if-then-else. If the argument before the C<?> is true, the
1123argument before the C<:> is returned, otherwise the argument after the
1124C<:> is returned. For example:
cb1a09d0 1125
54310121 1126 printf "I have %d dog%s.\n", $n,
c543c01b 1127 ($n == 1) ? "" : "s";
cb1a09d0
AD
1128
1129Scalar or list context propagates downward into the 2nd
54310121 1130or 3rd argument, whichever is selected.
cb1a09d0 1131
db691027
SF
1132 $x = $ok ? $y : $z; # get a scalar
1133 @x = $ok ? @y : @z; # get an array
1134 $x = $ok ? @y : @z; # oops, that's just a count!
cb1a09d0
AD
1135
1136The operator may be assigned to if both the 2nd and 3rd arguments are
1137legal lvalues (meaning that you can assign to them):
a0d0e21e 1138
db691027 1139 ($x_or_y ? $x : $y) = $z;
a0d0e21e 1140
5a964f20
TC
1141Because this operator produces an assignable result, using assignments
1142without parentheses will get you in trouble. For example, this:
1143
db691027 1144 $x % 2 ? $x += 10 : $x += 2
5a964f20
TC
1145
1146Really means this:
1147
db691027 1148 (($x % 2) ? ($x += 10) : $x) += 2
5a964f20
TC
1149
1150Rather than this:
1151
db691027 1152 ($x % 2) ? ($x += 10) : ($x += 2)
5a964f20 1153
19799a22
GS
1154That should probably be written more simply as:
1155
db691027 1156 $x += ($x % 2) ? 10 : 2;
19799a22 1157
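The parsing described above can be confirmed by running both forms. This is a sketch with made-up variable names; it also exercises assignment to the conditional operator itself.

```perl
use strict;
use warnings;

my $x = 3;
# Parsed as (($x % 2) ? ($x += 10) : $x) += 2, so an odd $x
# gets both additions.
$x % 2 ? $x += 10 : $x += 2;

my $y = 3;
# The simpler spelling adds exactly one of the two values.
$y += ($y % 2) ? 10 : 2;

# Assigning to the conditional operator picks the target at run time.
my ($p, $q, $pick_p) = (0, 0, 1);
($pick_p ? $p : $q) = 5;

print "x=$x y=$y p=$p q=$q\n";
```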
4633a7c4 1158=head2 Assignment Operators
d74e8afc 1159X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
5ac3b81c 1160X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
fb7054ba 1161X<%=> X<^=> X<x=> X<&.=> X<|.=> X<^.=>
a0d0e21e 1162
ba7f043c 1163C<"="> is the ordinary assignment operator.
a0d0e21e
LW
1164
1165Assignment operators work as in C. That is,
1166
db691027 1167 $x += 2;
a0d0e21e
LW
1168
1169is equivalent to
1170
db691027 1171 $x = $x + 2;
a0d0e21e
LW
1172
1173although without duplicating any side effects that dereferencing the lvalue
ba7f043c 1174might trigger, such as from C<tie()>. Other assignment operators work similarly.
54310121 1175The following are recognized:
a0d0e21e 1176
    **=    +=    *=    &=    &.=    <<=    &&=
           -=    /=    |=    |.=    >>=    ||=
           .=    %=    ^=    ^.=           //=
                 x=
a0d0e21e 1181
19799a22 1182Although these are grouped by family, they all have the precedence
82848c10
FC
1183of assignment. These combined assignment operators can only operate on
1184scalars, whereas the ordinary assignment operator can assign to arrays,
1185hashes, lists and even references. (See L<"Context"|perldata/Context>
1186and L<perldata/List value constructors>, and L<perlref/Assigning to
1187References>.)
a0d0e21e 1188
b350dd2f
GS
1189Unlike in C, the scalar assignment operator produces a valid lvalue.
1190Modifying an assignment is equivalent to doing the assignment and
1191then modifying the variable that was assigned to. This is useful
1192for modifying a copy of something, like this:
a0d0e21e 1193
1ca345ed
TC
1194 ($tmp = $global) =~ tr/13579/24680/;
1195
1196Although as of 5.14, that can be also be accomplished this way:
1197
1198 use v5.14;
1199 $tmp = ($global =~ tr/13579/24680/r);
a0d0e21e
LW
1200
1201Likewise,
1202
db691027 1203 ($x += 2) *= 3;
a0d0e21e
LW
1204
1205is equivalent to
1206
db691027
SF
1207 $x += 2;
1208 $x *= 3;
a0d0e21e 1209
b350dd2f
GS
1210Similarly, a list assignment in list context produces the list of
1211lvalues assigned to, and a list assignment in scalar context returns
1212the number of elements produced by the expression on the right hand
1213side of the assignment.
1214
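A short sketch of these lvalue and context rules follows. The variable names and sample strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Modify a copy: the assignment yields $tmp as an lvalue for tr///.
my $global = "13579";
(my $tmp = $global) =~ tr/13579/24680/;

# Chained modification of the assigned variable.
my $x = 1;
($x += 2) *= 3;

# A list assignment in scalar context returns the number of
# elements on its right-hand side, even if fewer are stored.
my $commas = () = "a,b,c,d" =~ /,/g;
my $n      = (my ($first, $second) = (10, 20, 30));

print "tmp=$tmp global=$global x=$x commas=$commas n=$n\n";
```

The C<() = ...> form is a common way to count matches, since only the number of right-hand elements matters.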
ba7f043c 1215The three dotted bitwise assignment operators (C<&.=> C<|.=> C<^.=>) are new in
fb7054ba
FC
1216Perl 5.22 and experimental. See L</Bitwise String Operators>.
1217
748a9306 1218=head2 Comma Operator
d74e8afc 1219X<comma> X<operator, comma> X<,>
a0d0e21e 1220
ba7f043c 1221Binary C<","> is the comma operator. In scalar context it evaluates
a0d0e21e
LW
1222its left argument, throws that value away, then evaluates its right
1223argument and returns that value. This is just like C's comma operator.
1224
5a964f20 1225In list context, it's just the list argument separator, and inserts
ed5c6d31
PJ
1226both its arguments into the list. These arguments are also evaluated
1227from left to right.
a0d0e21e 1228
ba7f043c
KW
1229The C<< => >> operator (sometimes pronounced "fat comma") is a synonym
1230for the comma except that it causes a
4e1988c6 1231word on its left to be interpreted as a string if it begins with a letter
344f2c40
IG
1232or underscore and is composed only of letters, digits and underscores.
1233This includes operands that might otherwise be interpreted as operators,
46f8a5ea 1234constants, single number v-strings or function calls. If in doubt about
c543c01b 1235this behavior, the left operand can be quoted explicitly.
344f2c40
IG
1236
1237Otherwise, the C<< => >> operator behaves exactly as the comma operator
1238or list argument separator, according to context.
1239
1240For example:
a44e5664
MS
1241
1242 use constant FOO => "something";
1243
1244 my %h = ( FOO => 23 );
1245
1246is equivalent to:
1247
1248 my %h = ("FOO", 23);
1249
1250It is I<NOT>:
1251
1252 my %h = ("something", 23);
1253
719b43e8
RGS
1254The C<< => >> operator is helpful in documenting the correspondence
1255between keys and values in hashes, and other paired elements in lists.
748a9306 1256
a12b8f3c
FC
1257 %hash = ( $key => $value );
1258 login( $username => $password );
a44e5664 1259
4e1988c6
FC
1260The special quoting behavior ignores precedence, and hence may apply to
1261I<part> of the left operand:
1262
1263 print time.shift => "bbb";
1264
ba7f043c 1265That example prints something like C<"1314363215shiftbbb">, because the
4e1988c6
FC
1266C<< => >> implicitly quotes the C<shift> immediately on its left, ignoring
1267the fact that C<time.shift> is the entire left operand.
1268
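The quoting behavior with constants, described earlier in this section, can be checked as follows. This is a sketch; the hash names are made up for the example.

```perl
use strict;
use warnings;
use constant FOO => "something";

my %quoted = (FOO   => 23);   # bareword is quoted: the key is "FOO"
my %called = (FOO() => 23);   # explicit call: the key is "something"
my %comma  = (FOO,     23);   # plain comma: the constant is evaluated

print join(",", keys %quoted), " ",
      join(",", keys %called), " ",
      join(",", keys %comma),  "\n";
```

Writing the call with parentheses, or using a plain comma, is one way to get the constant's value as the key.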
a0d0e21e 1269=head2 List Operators (Rightward)
d74e8afc 1270X<operator, list, rightward> X<list operator>
a0d0e21e 1271
c543c01b 1272On the right side of a list operator, the comma has very low precedence,
a0d0e21e
LW
1273such that it controls all comma-separated expressions found there.
1274The only operators with lower precedence are the logical operators
ba7f043c 1275C<"and">, C<"or">, and C<"not">, which may be used to evaluate calls to list
1ca345ed
TC
1276operators without the need for parentheses:
1277
a8980281
P
1278 open HANDLE, "< :encoding(UTF-8)", "filename"
1279 or die "Can't open: $!\n";
1ca345ed
TC
1280
1281However, some people find that code harder to read than writing
1282it with parentheses:
1283
a8980281
P
1284 open(HANDLE, "< :encoding(UTF-8)", "filename")
1285 or die "Can't open: $!\n";
1ca345ed 1286
ba7f043c 1287in which case you might as well just use the more customary C<"||"> operator:
a0d0e21e 1288
a8980281
P
1289 open(HANDLE, "< :encoding(UTF-8)", "filename")
1290 || die "Can't open: $!\n";
a0d0e21e 1291
a95b3d6a 1292See also discussion of list operators in L</Terms and List Operators (Leftward)>.
a0d0e21e
LW
1293
1294=head2 Logical Not
d74e8afc 1295X<operator, logical, not> X<not>
a0d0e21e 1296
ba7f043c
KW
1297Unary C<"not"> returns the logical negation of the expression to its right.
1298It's the equivalent of C<"!"> except for the very low precedence.
a0d0e21e
LW
1299
1300=head2 Logical And
d74e8afc 1301X<operator, logical, and> X<and>
a0d0e21e 1302
ba7f043c 1303Binary C<"and"> returns the logical conjunction of the two surrounding
c543c01b
TC
1304expressions. It's equivalent to C<&&> except for the very low
1305precedence. This means that it short-circuits: the right
a0d0e21e
LW
1306expression is evaluated only if the left expression is true.
1307
59ab9d6e 1308=head2 Logical or and Exclusive Or
f23102e2 1309X<operator, logical, or> X<operator, logical, xor>
59ab9d6e 1310X<operator, logical, exclusive or>
f23102e2 1311X<or> X<xor>
a0d0e21e 1312
ba7f043c 1313Binary C<"or"> returns the logical disjunction of the two surrounding
c543c01b
TC
1314expressions. It's equivalent to C<||> except for the very low precedence.
1315This makes it useful for control flow:
5a964f20
TC
1316
1317 print FH $data or die "Can't write to FH: $!";
1318
c543c01b
TC
1319This means that it short-circuits: the right expression is evaluated
1320only if the left expression is false. Due to its precedence, you must
1321be careful to avoid using it as replacement for the C<||> operator.
1322It usually works out better for flow control than in assignments:
5a964f20 1323
db691027
SF
1324 $x = $y or $z; # bug: this is wrong
1325 ($x = $y) or $z; # really means this
1326 $x = $y || $z; # better written this way
5a964f20 1327
19799a22 1328However, when it's a list-context assignment and you're trying to use
ba7f043c 1329C<||> for control flow, you probably need C<"or"> so that the assignment
5a964f20
TC
1330takes higher precedence.
1331
1332 @info = stat($file) || die; # oops, scalar sense of stat!
1333 @info = stat($file) or die; # better, now @info gets its due
1334
c963b151
BD
1335Then again, you could always use parentheses.
1336
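The precedence differences between C<or> and C<||> discussed above can be observed in a small script. The subs C<fallback> and C<pair> are hypothetical helpers invented for this sketch.

```perl
use strict;
use warnings;

my @ran;
sub fallback { push @ran, shift; return 5 }   # hypothetical helper
sub pair     { return (1, 2) }                # hypothetical list-returning sub

my $y = 0;
my $low  = $y or fallback("or");   # (my $low = $y) or fallback("or")
my $high = $y || fallback("||");   # my $high = ($y || fallback("||"))

# With ||, pair() is called in scalar context and yields only its
# last value; with "or", the list assignment happens first.
my @scalar_sense = pair() || die "unreachable";
my @list_sense   = pair() or die "unreachable";

print "low=$low high=$high\n";
print "scalar sense: @scalar_sense\n";
print "list sense:   @list_sense\n";
```

In both scalar cases the right side runs, but only the C<||> form stores its result in the variable.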
ba7f043c 1337Binary C<"xor"> returns the exclusive-OR of the two surrounding expressions.
c543c01b 1338It cannot short-circuit (of course).
a0d0e21e 1339
59ab9d6e
MB
1340There is no low precedence operator for defined-OR.
1341
a0d0e21e 1342=head2 C Operators Missing From Perl
d74e8afc
ITB
1343X<operator, missing from perl> X<&> X<*>
1344X<typecasting> X<(TYPE)>
a0d0e21e
LW
1345
1346Here is what C has that Perl doesn't:
1347
1348=over 8
1349
1350=item unary &
1351
ba7f043c 1352Address-of operator. (But see the C<"\"> operator for taking a reference.)
a0d0e21e
LW
1353
1354=item unary *
1355
46f8a5ea 1356Dereference-address operator. (Perl's prefix dereferencing
ba7f043c 1357operators are typed: C<$>, C<@>, C<%>, and C<&>.)
a0d0e21e
LW
1358
1359=item (TYPE)
1360
19799a22 1361Type-casting operator.
a0d0e21e
LW
1362
1363=back
1364
5f05dabc 1365=head2 Quote and Quote-like Operators
89d205f2 1366X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
d74e8afc
ITB
1367X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
1368X<escape sequence> X<escape>
1369
a0d0e21e
LW
1370While we usually think of quotes as literal values, in Perl they
1371function as operators, providing various kinds of interpolating and
1372pattern matching capabilities. Perl provides customary quote characters
1373for these behaviors, but also provides a way for you to choose your
1374quote character for any of them. In the following table, a C<{}> represents
9f10b797 1375any pair of delimiters you choose.
a0d0e21e 1376
    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
1390
87275199 1391Non-bracketing delimiters use the same character fore and aft, but the four
c543c01b 1392sorts of ASCII brackets (round, angle, square, curly) all nest, which means
9f10b797 1393that
87275199 1394
c543c01b 1395 q{foo{bar}baz}
35f2feb0 1396
9f10b797 1397is the same as
87275199 1398
c543c01b 1399 'foo{bar}baz'
87275199
GS
1400
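A brief sketch of delimiter nesting follows; the strings are arbitrary examples. Bracketing delimiters nest, while a non-bracketing delimiter simply ends at its next occurrence.

```perl
use strict;
use warnings;

my $nested  = q{foo{bar}baz};             # inner braces are balanced
my $literal = 'foo{bar}baz';
my $angles  = q<a<b>c>;                   # angle brackets nest too
my $pipes   = q|braces {need} no care|;   # same character fore and aft

print "nested and literal agree\n" if $nested eq $literal;
print "$angles\n$pipes\n";
```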
1401Note, however, that this does not always work for quoting Perl code:
1402
db691027 1403 $s = q{ if($x eq "}") ... }; # WRONG
87275199 1404
ba7f043c 1405is a syntax error. The C<L<Text::Balanced>> module (standard as of v5.8,
c543c01b 1406and from CPAN before then) is able to do this properly.
87275199 1407
841bfb48
KW
1408There can (and in some cases, must) be whitespace between the operator
1409and the quoting
fb73857a 1410characters, except when C<#> is being used as the quoting character.
ba7f043c 1411C<q#foo#> is parsed as the string C<foo>, while S<C<q #foo#>> is the
19799a22
GS
1412operator C<q> followed by a comment. Its argument will be taken
1413from the next line. This allows you to write:
fb73857a
PP
1414
1415 s {foo} # Replace foo
1416 {bar} # with bar.
1417
841bfb48
KW
1418The cases where whitespace must be used are when the quoting character
1419is a word character (meaning it matches C</\w/>):
1420
1421 q XfooX # Works: means the string 'foo'
1422 qXfooX # WRONG!
1423
c543c01b
TC
1424The following escape sequences are available in constructs that interpolate,
1425and in transliterations:
5691ca5f 1426X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
04341565 1427X<\o{}>
5691ca5f 1428
    Sequence     Note  Description
    \t                  tab               (HT, TAB)
    \n                  newline           (NL)
    \r                  return            (CR)
    \f                  form feed         (FF)
    \b                  backspace         (BS)
    \a                  alarm (bell)      (BEL)
    \e                  escape            (ESC)
    \x{263A}     [1,8]  hex char          (example: SMILEY)
    \x1b         [2,8]  restricted range hex char (example: ESC)
    \N{name}     [3]    named Unicode character or character sequence
    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]    control char      (example: chr(27))
    \o{23072}    [6,8]  octal char        (example: SMILEY)
    \033         [7,8]  restricted range octal char  (example: ESC)
5691ca5f
KW
1444
1445=over 4
1446
1447=item [1]
1448
2c4c1ff2
KW
1449The result is the character specified by the hexadecimal number between
1450the braces. See L</[8]> below for details on which character.
96448467 1451
46f8a5ea 1452Only hexadecimal digits are valid between the braces. If an invalid
96448467
DG
1453character is encountered, a warning will be issued and the invalid
1454character and all subsequent characters (valid or invalid) within the
1455braces will be discarded.
1456
1457If there are no valid digits between the braces, the generated character is
1458the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
c543c01b 1459will not cause a warning (currently).
40687185
KW
1460
1461=item [2]
1462
2c4c1ff2
KW
1463The result is the character specified by the hexadecimal number in the range
14640x00 to 0xFF. See L</[8]> below for details on which character.
96448467
DG
1465
1466Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
2c4c1ff2 1467by fewer than two valid digits, any valid digits will be zero-padded. This
ba7f043c 1468means that C<\x7> will be interpreted as C<\x07>, and a lone C<"\x"> will be
2c4c1ff2 1469interpreted as C<\x00>. Except at the end of a string, having fewer than
c543c01b 1470two valid digits will result in a warning. Note that although the warning
96448467
DG
1471says the illegal character is ignored, it is only ignored as part of the
1472escape and will still be used as the subsequent character in the string.
1473For example:
1474
    Original    Result    Warns?
    "\x7"       "\x07"    no
    "\x"        "\x00"    no
    "\x7q"      "\x07q"   yes
    "\xq"       "\x00q"   yes
1480
40687185
KW
1481=item [3]
1482
fb121860 1483The result is the Unicode character or character sequence given by I<name>.
2c4c1ff2 1484See L<charnames>.
40687185
KW
1485
1486=item [4]
1487
ba7f043c 1488S<C<\N{U+I<hexadecimal number>}>> means the Unicode character whose Unicode code
2c4c1ff2 1489point is I<hexadecimal number>.
40687185
KW
1490
1491=item [5]
1492
5691ca5f
KW
1493The character following C<\c> is mapped to some other character as shown in the
1494table:
1495
    Sequence   Value
      \c@      chr(0)
      \cA      chr(1)
      \ca      chr(1)
      \cB      chr(2)
      \cb      chr(2)
      ...
      \cZ      chr(26)
      \cz      chr(26)
      \c[      chr(27)
                         # See below for chr(28)
      \c]      chr(29)
      \c^      chr(30)
      \c_      chr(31)
      \c?      chr(127) # (on ASCII platforms; see below for link to
                        #  EBCDIC discussion)
5691ca5f 1512
In other words, the result is the character whose code point is that of the
uppercased character following C<\c>, xor'd with 64. C<\c?> is DELETE on
ASCII platforms because S<C<ord("?") ^ 64>> is 127, and C<\c@> is NULL
because the ord of C<"@"> is 64, so xor'ing it with 64 produces 0.
d813941f 1517
ba7f043c 1518Also, C<\c\I<X>> yields S<C< chr(28) . "I<X>">> for any I<X>, but cannot come at the
5691ca5f
KW
1519end of a string, because the backslash would be parsed as escaping the end
1520quote.
1521
1522On ASCII platforms, the resulting characters from the list above are the
1523complete set of ASCII controls. This isn't the case on EBCDIC platforms; see
c3e9d7a9
KW
1524L<perlebcdic/OPERATOR DIFFERENCES> for a full discussion of the
1525differences between these for ASCII versus EBCDIC platforms.
5691ca5f 1526
c3e9d7a9 1527Use of any other character following the C<"c"> besides those listed above is
63a63d81
KW
1528discouraged, and as of Perl v5.20, the only characters actually allowed
1529are the printable ASCII ones, minus the left brace C<"{">. What happens
1530for any of the allowed other characters is that the value is derived by
1531xor'ing with the seventh bit, which is 64, and a warning raised if
1532enabled. Using the non-allowed characters generates a fatal error.
5691ca5f
KW
1533
1534To get platform independent controls, you can use C<\N{...}>.
1535
40687185
KW
1536=item [6]
1537
2c4c1ff2
KW
1538The result is the character specified by the octal number between the braces.
1539See L</[8]> below for details on which character.
04341565
DG
1540
1541If a character that isn't an octal digit is encountered, a warning is raised,
1542and the value is based on the octal digits before it, discarding it and all
1543following characters up to the closing brace. It is a fatal error if there are
1544no octal digits at all.
1545
1546=item [7]
1547
c543c01b 1548The result is the character specified by the three-digit octal number in the
2c4c1ff2
KW
1549range 000 to 777 (but best to not use above 077, see next paragraph). See
1550L</[8]> below for details on which character.
1551
1552Some contexts allow 2 or even 1 digit, but any usage without exactly
40687185 1553three digits, the first being a zero, may give unintended results. (For
5db3e519
FC
1554example, in a regular expression it may be confused with a backreference;
1555see L<perlrebackslash/Octal escapes>.) Starting in Perl 5.14, you may
c543c01b 1556use C<\o{}> instead, which avoids all these problems. Otherwise, it is best to
04341565
DG
1557use this construct only for ordinals C<\077> and below, remembering to pad to
1558the left with zeros to make three digits. For larger ordinals, either use
ba7f043c
KW
1559C<\o{}>, or convert to something else, such as to hex and use C<\N{U+}>
1560(which is portable between platforms with different character sets) or
1561C<\x{}> instead.
40687185 1562
2c4c1ff2
KW
1563=item [8]
1564
Several constructs above specify a character by a number. That number
gives the character's position in the character set encoding (indexed from 0).
This is called synonymously its ordinal, code position, or code point. Perl
works on platforms that have a native encoding currently of either ASCII/Latin1
or EBCDIC, each of which allows specification of 256 characters. In general, if
the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets
it as a Unicode code point and the result is the corresponding Unicode
character. For example C<\x{50}> and C<\o{120}> both are the number 80 in
decimal, which is less than 256, so the number is interpreted in the native
character set encoding. In ASCII the character in the 80th position (indexed
from 0) is the letter C<"P">, and in EBCDIC it is the ampersand symbol C<"&">.
C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
as a Unicode code point no matter what the native encoding is. The name of the
character in the 256th position (indexed from 0) in Unicode is
C<LATIN CAPITAL LETTER A WITH MACRON>.
1581
2dc9bc84 1582An exception to the above rule is that S<C<\N{U+I<hex number>}>> is
ba7f043c 1583always interpreted as a Unicode code point, so that C<\N{U+0050}> is C<"P"> even
2dc9bc84 1584on EBCDIC platforms.
2c4c1ff2 1585
5691ca5f 1586=back
4c77eaa2 1587
e526e8bb 1588B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
8b312c40 1589the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
ba7f043c 1590use C<\N{VT}>, C<\ck>, C<\N{U+0b}>, or C<\x0b>. (C<\v>
e526e8bb
KW
1591does have meaning in regular expression patterns in Perl, see L<perlre>.)
1592
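The equivalences described in the notes above can be checked with a short
sketch. It assumes an ASCII platform (the C<\c> and small numeric forms are
platform-dependent, as noted in [5] and [8]) and Perl 5.14 or later for
C<\o{}>; the variable names are illustrative only:

```perl
use strict;
use warnings;

# Three spellings of code point 0x263A (the SMILEY of the table above).
my $hex   = "\x{263A}";
my $named = "\N{U+263A}";
my $octal = "\o{23072}";

# Four spellings of ESC; \c[ and the numeric forms assume ASCII.
my @esc = ("\e", "\x1b", "\033", "\c[");

# There is no \v escape in strings; these spell the vertical tab instead
# (\ck again assumes an ASCII platform).
my @vt = ("\x0b", "\N{U+0b}", "\ck");

printf "smiley code point: U+%04X\n", ord $hex;
```

When portability to EBCDIC matters, the C<\N{U+...}> form is the only one of
these whose meaning does not depend on the native character set.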
1593The following escape sequences are available in constructs that interpolate,
904501ec 1594but not in transliterations.
628253b8 1595X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>
904501ec 1596
    \l          lowercase next character only
    \u          titlecase (not uppercase!) next character only
    \L          lowercase all characters till \E or end of string
    \U          uppercase all characters till \E or end of string
    \F          foldcase all characters till \E or end of string
    \Q          quote (disable) pattern metacharacters till \E or
                end of string
    \E          end either case modification or quoted section
                (whichever was last seen)
1606
736fe711
KW
1607See L<perlfunc/quotemeta> for the exact definition of characters that
1608are quoted by C<\Q>.
1609
628253b8 1610C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
c543c01b
TC
1611C<\E> for each. For example:
1612
 say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
a0d0e21e 1615
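A few smaller cases may make the individual escapes easier to see in
isolation. This is only a sketch; C<$name> and the other variables are
made-up illustrations, and the input is plain ASCII so no locale or Unicode
case-mapping subtleties arise:

```perl
use strict;
use warnings;

my $name = "jOHN smith";

my $title  = "\u\L$name\E";   # lowercase everything, titlecase first char
my $shout  = "\U$name\E!";    # uppercase up to \E, "!" is left alone
my $lower1 = "\lABC";         # lowercase the next character only
my $quoted = "\Qa.b*c\E";     # backslash the pattern metacharacters

print "$title / $shout / $lower1 / $quoted\n";
```

The C<"\u\L...\E"> combination is a common way to normalize a word to an
initial capital followed by lowercase letters.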
ba7f043c
KW
1616If a S<C<use locale>> form that includes C<LC_CTYPE> is in effect (see
1617L<perllocale>), the case map used by C<\l>, C<\L>, C<\u>, and C<\U> is
1618taken from the current locale. If Unicode (for example, C<\N{}> or code
1619points of 0x100 or beyond) is being used, the case map used by C<\l>,
1620C<\L>, C<\u>, and C<\U> is as defined by Unicode. That means that
1621case-mapping a single character can sometimes produce a sequence of
1622several characters.
1623Under S<C<use locale>>, C<\F> produces the same results as C<\L>
31f05a37
KW
1624for all locales but a UTF-8 one, where it instead uses the Unicode
1625definition.
a034a98d 1626
5a964f20
TC
1627All systems use the virtual C<"\n"> to represent a line terminator,
1628called a "newline". There is no such thing as an unvarying, physical
19799a22 1629newline character. It is only an illusion that the operating system,
5a964f20
TC
1630device drivers, C libraries, and Perl all conspire to preserve. Not all
1631systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
c543c01b 1632on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed,
ba7f043c 1633and on systems without a line terminator,
c543c01b 1634printing C<"\n"> might emit no actual data. In general, use C<"\n"> when
5a964f20
TC
1635you mean a "newline" for your system, but use the literal ASCII when you
1636need an exact character. For example, most networking protocols expect
2a380090 1637and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
5a964f20
TC
1638and although they often accept just C<"\012">, they seldom tolerate just
1639C<"\015">. If you get in the habit of using C<"\n"> for networking,
1640you may be burned some day.
d74e8afc
ITB
1641X<newline> X<line terminator> X<eol> X<end of line>
1642X<\n> X<\r> X<\r\n>
5a964f20 1643
904501ec
MG
1644For constructs that do interpolate, variables beginning with "C<$>"
1645or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
ad0f383a
A
1646C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
1647But method calls such as C<< $obj->meth >> are not.
af9219ee
MG
1648
1649Interpolating an array or slice interpolates the elements in order,
1650separated by the value of C<$">, so is equivalent to interpolating
ba7f043c 1651S<C<join $", @array>>. "Punctuation" arrays such as C<@*> are usually
c543c01b
TC
1652interpolated only if the name is enclosed in braces C<@{*}>, but the
1653arrays C<@_>, C<@+>, and C<@-> are interpolated even without braces.
af9219ee 1654
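The effect of C<$"> on array and slice interpolation can be sketched as
follows (C<@list> and C<%h> are hypothetical data used only for
illustration):

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my %h    = (a => 'x', b => 'y');

# With the default separator (a single space).
my $default = "@list";

my ($joined, $slice);
{
    # Change the list separator for this block only.
    local $" = ", ";
    $joined = "@list";           # whole array
    $slice  = "@h{'a', 'b'}";    # hash slice, same separator
}

# A subscripted element interpolates as a scalar.
my $elem = "second: $list[1]";

print "$default | $joined | $slice | $elem\n";
```

Using C<local> keeps the change to C<$"> from leaking into unrelated code.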
bc7b91c6
EB
1655For double-quoted strings, the quoting from C<\Q> is applied after
1656interpolation and escapes are processed.
1657
1658 "abc\Qfoo\tbar$s\Exyz"
1659
1660is equivalent to
1661
1662 "abc" . quotemeta("foo\tbar$s") . "xyz"
1663
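That equivalence can be verified directly. In this sketch C<$s> is given an
arbitrary value containing a metacharacter; the names are illustrative:

```perl
use strict;
use warnings;

my $s = "a.b";

# \Q is applied after $s is interpolated and \t has become a real tab.
my $with_Q         = "abc\Qfoo\tbar$s\Exyz";
my $with_quotemeta = "abc" . quotemeta("foo\tbar$s") . "xyz";

print "identical\n" if $with_Q eq $with_quotemeta;
```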
1664For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
1665the quoting from C<\Q> is applied after interpolation is processed,
46f8a5ea
FC
1666but before escapes are processed. This allows the pattern to match
1667literally (except for C<$> and C<@>). For example, the following matches:
bc7b91c6
EB
1668
1669 '\s\t' =~ /\Q\s\t/
1670
1671Because C<$> or C<@> trigger interpolation, you'll need to use something
1672like C</\Quser\E\@\Qhost/> to match them literally.
1d2dff63 1673
a0d0e21e
LW
1674Patterns are subject to an additional level of interpretation as a
1675regular expression. This is done as a second pass, after variables are
1676interpolated, so that regular expressions may be incorporated into the
1677pattern from the variables. If this is not what you want, use C<\Q> to
1678interpolate a variable literally.
1679
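The difference between interpolating a variable as a regular expression and
interpolating it literally can be sketched like this (C<$needle> is a
hypothetical search string):

```perl
use strict;
use warnings;

my $needle = "a.c";

# Interpolated as a regex: "." matches any character, so "abc" matches.
my $as_regex = ("abc" =~ /$needle/) ? 1 : 0;

# Interpolated literally: the "." must be a real dot.
my $as_literal  = ("abc"   =~ /\Q$needle\E/) ? 1 : 0;
my $literal_hit = ("xa.cx" =~ /\Q$needle\E/) ? 1 : 0;

print "regex=$as_regex literal=$as_literal literal_hit=$literal_hit\n";
```

Reaching for C<\Q> is usually the safer default whenever the interpolated
text comes from user input rather than from a pattern you wrote yourself.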
19799a22
GS
1680Apart from the behavior described above, Perl does not expand
1681multiple levels of interpolation. In particular, contrary to the
1682expectations of shell programmers, back-quotes do I<NOT> interpolate
1683within double quotes, nor do single quotes impede evaluation of
1684variables when used within double quotes.
a0d0e21e 1685
5f05dabc 1686=head2 Regexp Quote-Like Operators
d74e8afc 1687X<operator, regexp>
cb1a09d0 1688
5f05dabc 1689Here are the quote-like operators that apply to pattern
cb1a09d0
AD
1690matching and related activities.
1691
a0d0e21e
LW
1692=over 8
1693
ba7f043c 1694=item C<qr/I<STRING>/msixpodualn>
01c6f5f4 1695X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
a0d0e21e 1696
87e95b7f
YO
1697This operator quotes (and possibly compiles) its I<STRING> as a regular
1698expression. I<STRING> is interpolated the same way as I<PATTERN>
6d314683
YO
1699in C<m/I<PATTERN>/>. If C<"'"> is used as the delimiter, no variable
1700interpolation is done. Returns a Perl value which may be used instead of the
ba7f043c 1701corresponding C</I<STRING>/msixpodualn> expression. The returned value is a
46f8a5ea 1702normalized version of the original pattern. It magically differs from
1c8ee595 1703a string containing the same characters: C<ref(qr/x/)> returns "Regexp";
a727cfac 1704however, dereferencing it is not well defined (you currently get the
1c8ee595
CO
1705normalized version of the original pattern, but this may change).
1706
a0d0e21e 1707
87e95b7f
YO
1708For example,
1709
1710 $rex = qr/my.STRING/is;
85dd5c8b 1711 print $rex; # prints (?si-xm:my.STRING)
87e95b7f
YO
1712 s/$rex/foo/;
1713
1714is equivalent to
1715
1716 s/my.STRING/foo/is;
1717
1718The result may be used as a subpattern in a match:
1719
1720 $re = qr/$pattern/;
7188ca43
KW
1721 $string =~ /foo${re}bar/; # can be interpolated in other
1722 # patterns
87e95b7f
YO
1723 $string =~ $re; # or used standalone
1724 $string =~ /$re/; # or this way
1725
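As a small sketch of embedding (with made-up variable names), the following
shows that a C<qr//> object carries its own modifiers into the enclosing
pattern without changing how the rest of that pattern is matched:

```perl
use strict;
use warnings;

my $word = qr/perl/i;            # case-insensitive fragment
my $line = "I like PERL a lot";

# The fragment keeps its /i when embedded in a larger pattern ...
my $embedded = ($line =~ /like ${word} a/) ? 1 : 0;

# ... but the surrounding text is still matched case-sensitively.
my $outer = ($line =~ /LIKE ${word}/) ? 1 : 0;

my $kind = ref $word;            # the class of a qr// object

print "embedded=$embedded outer=$outer kind=$kind\n";
```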
ba7f043c
KW
1726Since Perl may compile the pattern at the moment of execution of the C<qr()>
1727operator, using C<qr()> may have speed advantages in some situations,
1728notably if the result of C<qr()> is used standalone:
87e95b7f
YO
1729
    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }
1741
Precompilation of the pattern into an internal representation at
the moment of C<qr()> avoids the need to recompile the pattern every
time a match C</$pat/> is attempted. (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the C<qr()> operator.)
87e95b7f 1747
765fa144 1748Options (specified by the following modifiers) are:
87e95b7f
YO
1749
    m   Treat string as multiple lines.
    s   Treat string as single line. (Make . match a newline)
    i   Do case-insensitive pattern matching.
    x   Use extended regular expressions; specifying two
        x's means \t and the SPACE character are ignored within
        square-bracketed character classes
    p   When matching preserve a copy of the matched string so
        that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
        defined (ignored starting in v5.20) as these are always
        defined starting in that release
    o   Compile pattern only once.
    a   ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
        character classes; specifying two a's adds the further
        restriction that no ASCII character will match a
        non-ASCII one under /i.
    l   Use the current run-time locale's rules.
    u   Use Unicode rules.
    d   Use Unicode or native charset, as in 5.12 and earlier.
    n   Non-capture mode. Don't let () fill in $1, $2, etc...
1769
1770If a precompiled pattern is embedded in a larger pattern then the effect
ba7f043c
KW
1771of C<"msixpluadn"> will be propagated appropriately. The effect that the
1772C</o> modifier has is not propagated, being restricted to those patterns
87e95b7f
YO
1773explicitly using it.
1774
The C</a>, C</l>, C</u>, and C</d> modifiers, added in Perl 5.14,
control the character set rules, but C</a> is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.
da392a17 1779
ba7f043c 1780See L<perlre> for additional information on valid syntax for I<STRING>, and
5e2aa8f5 1781for a detailed look at the semantics of regular expressions. In
1ca345ed
TC
1782particular, all modifiers except the largely obsolete C</o> are further
1783explained in L<perlre/Modifiers>. C</o> is described in the next section.
a0d0e21e 1784
ba7f043c 1785=item C<m/I<PATTERN>/msixpodualngc>
89d205f2
YO
1786X<m> X<operator, match>
1787X<regexp, options> X<regexp> X<regex, options> X<regex>
01c6f5f4 1788X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
a0d0e21e 1789
ba7f043c 1790=item C</I<PATTERN>/msixpodualngc>
a0d0e21e 1791
5a964f20 1792Searches a string for a pattern match, and in scalar context returns
19799a22 1793true if it succeeds, false if it fails. If no string is specified
ba7f043c 1794via the C<=~> or C<!~> operator, the C<$_> string is searched. (The
19799a22
GS
1795string specified with C<=~> need not be an lvalue--it may be the
1796result of an expression evaluation, but remember the C<=~> binds
006671a6 1797rather tightly.) See also L<perlre>.
a0d0e21e 1798
f6050459 1799Options are as described in C<qr//> above; in addition, the following match
01c6f5f4 1800process modifiers are available:
a0d0e21e 1801
950b09ed 1802 g Match globally, i.e., find all occurrences.
7188ca43
KW
1803 c Do not reset search position on a failed match when /g is
1804 in effect.
a0d0e21e 1805
ba7f043c 1806If C<"/"> is the delimiter then the initial C<m> is optional. With the C<m>
c543c01b 1807you can use any pair of non-whitespace (ASCII) characters
725a61d7 1808as delimiters. This is particularly useful for matching path names
ba7f043c 1809that contain C<"/">, to avoid LTS (leaning toothpick syndrome). If C<"?"> is
725a61d7 1810the delimiter, then a match-only-once rule applies,
ba7f043c 1811described in C<m?I<PATTERN>?> below. If C<"'"> (single quote) is the delimiter,
6d314683 1812no variable interpolation is performed on the I<PATTERN>.
ba7f043c 1813When using a delimiter character valid in an identifier, whitespace is required
ed02a3bf 1814after the C<m>.
a0d0e21e 1815
ba7f043c 1816I<PATTERN> may contain variables, which will be interpolated
532c9e80 1817every time the pattern search is evaluated, except
1f247705
GS
1818for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
1819C<$|> are not interpolated because they look like end-of-string tests.)
532c9e80
KW
1820Perl will not recompile the pattern unless an interpolated
1821variable that it contains changes. You can force Perl to skip the
1822test and never recompile by adding a C</o> (which stands for "once")
1823after the trailing delimiter.
Once upon a time, Perl would recompile regular expressions
unnecessarily, and this modifier was useful to tell it not to do so, in the
interests of speed. But now, the only reasons to use C</o> are:
532c9e80
KW
1827
1828=over
1829
1830=item 1
1831
1832The variables are thousands of characters long and you know that they
1833don't change, and you need to wring out the last little bit of speed by
1834having Perl skip testing for that. (There is a maintenance penalty for
1835doing this, as mentioning C</o> constitutes a promise that you won't
18509dec 1836change the variables in the pattern. If you do change them, Perl won't
532c9e80
KW
1837even notice.)
1838
1839=item 2
1840
You want the pattern to use the initial values of the variables
regardless of whether they change or not. (But there are saner ways
of accomplishing this than using C</o>.)
1844
fa9b8686
DM
1845=item 3
1846
1847If the pattern contains embedded code, such as
1848
1849 use re 'eval';
1850 $code = 'foo(?{ $x })';
1851 /$code/
1852
1853then perl will recompile each time, even though the pattern string hasn't
1854changed, to ensure that the current value of C<$x> is seen each time.
1855Use C</o> if you want to avoid this.
1856
532c9e80 1857=back
a0d0e21e 1858
18509dec
KW
1859The bottom line is that using C</o> is almost never a good idea.
1860
ba7f043c 1861=item The empty pattern C<//>
e9d89077 1862
ba7f043c 1863If the I<PATTERN> evaluates to the empty string, the last
46f8a5ea 1864I<successfully> matched regular expression is used instead. In this
c543c01b 1865case, only the C<g> and C<c> flags on the empty pattern are honored;
46f8a5ea 1866the other flags are taken from the original pattern. If no match has
d65afb4b
HS
1867previously succeeded, this will (silently) act instead as a genuine
1868empty pattern (which will always match).
a0d0e21e 1869
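The reuse of the last successful pattern is easy to trip over, so a small
sketch may help. The strings here are arbitrary, and all of the matches sit
in the same scope so that they share one "last successful pattern":

```perl
use strict;
use warnings;

my $text = "foo bar";
$text =~ /bar/ or die "setup match failed";

# m// is empty, so the last successfully matched pattern (/bar/) is reused.
my $reuse_hit  = ("bar baz" =~ m//) ? 1 : 0;

# A genuinely empty pattern would match "qux"; the reused /bar/ does not.
my $reuse_miss = ("qux" =~ m//) ? 1 : 0;

print "hit=$reuse_hit miss=$reuse_miss\n";
```

Because this behavior depends on what matched earlier, an interpolated
variable that might be empty is usually better wrapped as C</(?:$var)/>, so
that the pattern itself is never empty.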
89d205f2
YO
1870Note that it's possible to confuse Perl into thinking C<//> (the empty
1871regex) is really C<//> (the defined-or operator). Perl is usually pretty
1872good about this, but some pathological cases might trigger this, such as
ba7f043c
KW
1873C<$x///> (is that S<C<($x) / (//)>> or S<C<$x // />>?) and S<C<print $fh //>>
1874(S<C<print $fh(//>> or S<C<print($fh //>>?). In all of these examples, Perl
89d205f2
YO
1875will assume you meant defined-or. If you meant the empty regex, just
1876use parentheses or spaces to disambiguate, or even prefix the empty
c963b151
BD
1877regex with an C<m> (so C<//> becomes C<m//>).
1878
e9d89077
DN
1879=item Matching in list context
1880
19799a22 1881If the C</g> option is not used, C<m//> in list context returns a
a0d0e21e 1882list consisting of the subexpressions matched by the parentheses in the
3ff8ecf9
BF
1883pattern, that is, (C<$1>, C<$2>, C<$3>...) (Note that here C<$1> etc. are
1884also set). When there are no parentheses in the pattern, the return
a727cfac 1885value is the list C<(1)> for success.
3ff8ecf9 1886With or without parentheses, an empty list is returned upon failure.
a0d0e21e
LW
1887
1888Examples:
1889
7188ca43
KW
    open(TTY, "+</dev/tty")
        || die "can't access /dev/tty: $!";

    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o; # compile only once (no longer needed!)
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
a0d0e21e 1906
ba7f043c
KW
1907This last example splits C<$foo> into the first two words and the
1908remainder of the line, and assigns those three fields to C<$F1>, C<$F2>, and
1909C<$Etc>. The conditional is true if any variables were assigned; that is,
c543c01b 1910if the pattern matched.
a0d0e21e 1911
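The three possible list-context results described above (captures, C<(1)>,
and the empty list) can be seen side by side in this sketch, which uses
made-up strings and variable names:

```perl
use strict;
use warnings;

# With capture groups: the captured substrings are returned.
my ($hour, $min) = "12:34" =~ /(\d+):(\d+)/;

# Without capture groups: a successful match returns the list (1).
my @no_parens = "abc" =~ /b/;

# On failure: the empty list, with or without capture groups.
my @failed = "abc" =~ /(z)/;

print "hour=$hour min=$min no_parens=@no_parens failed=", scalar(@failed), "\n";
```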
19799a22 1912The C</g> modifier specifies global pattern matching--that is,
46f8a5ea
FC
1913matching as many times as possible within the string. How it behaves
1914depends on the context. In list context, it returns a list of the
19799a22 1915substrings matched by any capturing parentheses in the regular
46f8a5ea 1916expression. If there are no parentheses, it returns a list of all
19799a22
GS
1917the matched strings, as if there were parentheses around the whole
1918pattern.
a0d0e21e 1919
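A minimal sketch of C</g> in list context, first without and then with
capturing parentheses (the input strings are illustrative only):

```perl
use strict;
use warnings;

# No capture groups: every whole match is returned.
my @numbers = "a1b22c333" =~ /\d+/g;

# Two capture groups: the captures of every match are returned in
# order, which makes it convenient to build a hash from key/value pairs.
my %pairs = "a=1,b=2" =~ /(\w)=(\d)/g;

print "numbers=@numbers a=$pairs{a} b=$pairs{b}\n";
```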
7e86de3e 1920In scalar context, each execution of C<m//g> finds the next match,
19799a22 1921returning true if it matches, and false if there is no further match.
3dd93342 1922The position after the last match can be read or set using the C<pos()>
46f8a5ea 1923function; see L<perlfunc/pos>. A failed match normally resets the
7e86de3e 1924search position to the beginning of the string, but you can avoid that
46f8a5ea 1925by adding the C</c> modifier (for example, C<m//gc>). Modifying the target
7e86de3e 1926string also resets the search position.
c90c0ff4 1927
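The interaction between scalar-context C<m//g>, C<pos()>, and the C</c>
modifier can be sketched as follows (C<$s> and the array names are
hypothetical):

```perl
use strict;
use warnings;

my $s = "aXbXc";

# Without /c: each success advances pos(); the final failure resets it.
my @ends;
push @ends, pos($s) while $s =~ /X/g;
my $after_plain_fail = pos($s);

# With /c: the final failure leaves pos() where the last success put it.
my @ends_c;
push @ends_c, pos($s) while $s =~ /X/gc;
my $after_c_fail = pos($s);

print "ends=@ends ends_c=@ends_c after_c_fail=$after_c_fail\n";
```

Keeping the position after a failure is what lets several C</gc> patterns
take turns on the same string.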
ba7f043c 1928=item C<\G I<assertion>>
e9d89077 1929
c90c0ff4 1930You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
3dd93342 1931zero-width assertion that matches the exact position where the
46f8a5ea 1932previous C<m//g>, if any, left off. Without the C</g> modifier, the
3dd93342 1933C<\G> assertion still anchors at C<pos()> as it was at the start of
1934the operation (see L<perlfunc/pos>), but the match is of course only
46f8a5ea 1935attempted once. Using C<\G> without C</g> on a target string that has
3dd93342 1936not previously had a C</g> match applied to it is the same as using
1937the C<\A> assertion to match the beginning of the string. Note also
1938that, currently, C<\G> is only properly supported when anchored at the
1939very beginning of the pattern.
c90c0ff4
PP
1940
1941Examples:
a0d0e21e
LW
1942
    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    local $/ = "";
    while ($paragraph = <>) {
        while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    say $sentences;
1954
1955Here's another way to check for sentences in a paragraph:
1956
7188ca43
KW
    my $sentence_rx = qr{
        (?: (?<= ^ ) | (?<= \s ) )  # after start-of-string or
                                    # whitespace
        \p{Lu}                      # capital letter
        .*?                         # a bunch of anything
        (?<= \S )                   # that ends in non-
                                    # whitespace
        (?<! \b [DMS]r )            # but isn't a common abbr.
        (?<! \b Mrs )
        (?<! \b Sra )
        (?<! \b St )
        [.?!]                       # followed by a sentence
                                    # ender
        (?= $ | \s )                # in front of end-of-string
                                    # or whitespace
    }sx;
    local $/ = "";
    while (my $paragraph = <>) {
        say "NEW PARAGRAPH";
        my $count = 0;
        while ($paragraph =~ /($sentence_rx)/g) {
            printf "\tgot sentence %d: <%s>\n", ++$count, $1;
        }
    }
c543c01b
TC
1981
1982Here's how to use C<m//gc> with C<\G>:
a0d0e21e 1983
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
44a8e56a
PP
1994
1995The last example should print:
1996
    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8
2004
2005Notice that the final match matched C<q> instead of C<p>, which a match
46f8a5ea
FC
2006without the C<\G> anchor would have done. Also note that the final match
2007did not update C<pos>. C<pos> is only updated on a C</g> match. If the
c543c01b
TC
2008final match did indeed match C<p>, it's a good bet that you're running a
2009very old (pre-5.6.0) version of Perl.
44a8e56a 2010
c90c0ff4 2011A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 2012combine several regexps like this to process a string part-by-part,
c90c0ff4
PP
2013doing different actions depending on which regexp matched. Each
2014regexp tries to match where the previous one leaves off.
e7ea3e70 2015
    $_ = <<'EOL';
    $url = URI::URL->new( "http://example.com/" );
    die if $url eq "xXx";
    EOL

    LOOP: {
        print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
        print(" lowercase"),    redo LOOP
                                        if /\G\p{Ll}+\b[,.;]?\s*/gc;
        print(" UPPERCASE"),    redo LOOP
                                        if /\G\p{Lu}+\b[,.;]?\s*/gc;
        print(" Capitalized"),  redo LOOP
                                  if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
        print(" MiXeD"),        redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
        print(" alphanumeric"), redo LOOP
                                if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
        print(" line-noise"),   redo LOOP if /\G\W+/gc;
        print ". That's all!\n";
    }
e7ea3e70
IZ
2035
2036Here is the output (split into several lines):
2037
7188ca43
KW
2038 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
2039 line-noise lowercase line-noise lowercase line-noise lowercase
2040 lowercase line-noise lowercase lowercase line-noise lowercase
2041 lowercase line-noise MiXeD line-noise. That's all!
44a8e56a 2042
ba7f043c 2043=item C<m?I<PATTERN>?msixpodualngc>
725a61d7 2044X<?> X<operator, match-once>
87e95b7f 2045
ba7f043c
KW
2046This is just like the C<m/I<PATTERN>/> search, except that it matches
2047only once between calls to the C<reset()> operator. This is a useful
87e95b7f 2048optimization when you want to see only the first occurrence of
ceb131e8 2049something in each file of a set of files, for instance. Only C<m??>
87e95b7f
YO
2050patterns local to the current package are reset.
2051
    while (<>) {
        if (m?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear m?? status for next file
    }
2059
c543c01b
TC
Another example switches the first "latin1" encoding it finds
to "utf8" in a pod file:
2062
2063 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
2064
2065The match-once behavior is controlled by the match delimiter being
4932eeca 2066C<?>; with any other delimiter this is the normal C<m//> operator.
725a61d7 2067
ba7f043c 2068In the past, the leading C<m> in C<m?I<PATTERN>?> was optional, but omitting it
0381ecf1
MH
2069would produce a deprecation warning. As of v5.22.0, omitting it produces a
2070syntax error. If you encounter this construct in older code, you can just add
2071C<m>.
87e95b7f 2072
ba7f043c 2073=item C<s/I<PATTERN>/I<REPLACEMENT>/msixpodualngcer>
0a31ee11 2074X<s> X<substitute> X<substitution> X<replace> X<regexp, replace>
4f4d7508 2075X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
87e95b7f
YO
2076
2077Searches a string for a pattern, and if found, replaces that pattern
2078with the replacement text and returns the number of substitutions
e792e8c0
LM
2079made. Otherwise it returns false (a value that is both an empty string (C<"">)
2080and numeric zero (C<0>) as described in L</Relational Operators>).
87e95b7f 2081
c543c01b 2082If the C</r> (non-destructive) option is used then it runs the
679563bb
KW
2083substitution on a copy of the string and instead of returning the
2084number of substitutions, it returns the copy whether or not a
c543c01b
TC
2085substitution occurred. The original string is never changed when
2086C</r> is used. The copy will always be a plain string, even if the
2087input is an object or a tied variable.
4f4d7508 2088
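The difference between the two return conventions can be sketched as
follows.  The variable names and sample strings are illustrative only.

```perl
use strict;
use warnings;

my $original = "red green";

# Without /r the target is modified in place and the count is returned.
my $target = $original;
my $count  = ($target =~ s/e/E/g);          # three substitutions

# A failed substitution returns false (the empty string).
my $failed = ($target =~ s/purple/pink/);

# With /r the original is untouched and the copy is returned,
# whether or not anything was replaced.
my $changed   = $original =~ s/green/mauve/r;
my $unchanged = $original =~ s/purple/pink/r;

print "$count|$target|$changed|$unchanged|$original\n";
```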
87e95b7f 2089If no string is specified via the C<=~> or C<!~> operator, the C<$_>
c543c01b
TC
2090variable is searched and modified. Unless the C</r> option is used,
2091the string specified must be a scalar variable, an array element, a
2092hash element, or an assignment to one of those; that is, some sort of
2093scalar lvalue.
87e95b7f 2094
6d314683 2095If the delimiter chosen is a single quote, no variable interpolation is
ba7f043c
KW
2096done on either the I<PATTERN> or the I<REPLACEMENT>. Otherwise, if the
2097I<PATTERN> contains a C<$> that looks like a variable rather than an
87e95b7f
YO
2098end-of-string test, the variable will be interpolated into the pattern
2099at run-time. If you want the pattern compiled only once the first time
2100the variable is interpolated, use the C</o> option. If the pattern
2101evaluates to the empty string, the last successfully executed regular
2102expression is used instead. See L<perlre> for further explanation on these.
87e95b7f 2103
ba7f043c 2104Options are as with C<m//> with the addition of the following replacement
87e95b7f
YO
2105specific options:
2106
    e   Evaluate the right side as an expression.
    ee  Evaluate the right side as a string then eval the
        result.
    r   Return substitution and leave the original string
        untouched.
87e95b7f 2112
ed02a3bf
DN
2113Any non-whitespace delimiter may replace the slashes. Add space after
2114the C<s> when using a character allowed in identifiers. If single quotes
2115are used, no interpretation is done on the replacement string (the C</e>
3ff8ecf9 2116modifier overrides this, however). Note that Perl treats backticks
ed02a3bf 2117as normal delimiters; the replacement text is not evaluated as a command.
ba7f043c 2118If the I<PATTERN> is delimited by bracketing quotes, the I<REPLACEMENT> has
1ca345ed 2119its own pair of quotes, which may or may not be bracketing quotes, for example,
87e95b7f
YO
2120C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
2121replacement portion to be treated as a full-fledged Perl expression
2122and evaluated right then and there. It is, however, syntax checked at
46f8a5ea 2123compile-time. A second C<e> modifier will cause the replacement portion
87e95b7f
YO
2124to be C<eval>ed before being run as a Perl expression.
2125
2126Examples:
2127
7188ca43 2128 s/\bgreen\b/mauve/g; # don't change wintergreen
87e95b7f
YO
2129
2130 $path =~ s|/usr/bin|/usr/local/bin|;
2131
2132 s/Login: $foo/Login: $bar/; # run-time pattern
2133
7188ca43
KW
    ($foo = $bar) =~ s/this/that/;      # copy first, then
                                        # change
    ($foo = "$bar") =~ s/this/that/;    # convert to string,
                                        # copy, then change
    $foo = $bar =~ s/this/that/r;       # Same as above using /r
    $foo = $bar =~ s/this/that/r
                =~ s/that/the other/r;  # Chained substitutes
                                        # using /r
    @foo = map { s/this/that/r } @bar;  # /r is very useful in
                                        # maps
87e95b7f 2144
7188ca43 2145 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
87e95b7f
YO
2146
    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
2151
2152 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
2153 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
2154 s/^=(\w+)/pod($1)/ge; # use function call
2155
4f4d7508 2156 $_ = 'abc123xyz';
db691027 2157 $x = s/abc/def/r; # $x is 'def123xyz' and
4f4d7508
DC
2158 # $_ remains 'abc123xyz'.
2159
87e95b7f
YO
2160 # expand variables in $_, but dynamics only, using
2161 # symbolic dereferencing
2162 s/\$(\w+)/${$1}/g;
2163
2164 # Add one to the value of any numbers in the string
2165 s/(\d+)/1 + $1/eg;
2166
c543c01b
TC
2167 # Titlecase words in the last 30 characters only
2168 substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
2169
87e95b7f
YO
2170 # This will expand any embedded scalar variable
2171 # (including lexicals) in $_ : First $1 is interpolated
2172 # to the variable name, and then evaluated
2173 s/(\$\w+)/$1/eeg;
2174
2175 # Delete (most) C comments.
2176 $program =~ s {
2177 /\* # Match the opening delimiter.
2178 .*? # Match a minimal number of characters.
2179 \*/ # Match the closing delimiter.
2180 } []gsx;
2181
7188ca43
KW
2182 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
2183 # expensively
87e95b7f 2184
7188ca43
KW
2185 for ($variable) { # trim whitespace in $variable,
2186 # cheap
87e95b7f
YO
2187 s/^\s+//;
2188 s/\s+$//;
2189 }
2190
2191 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
2192
ba7f043c
KW
Note the use of C<$> instead of C<\> in the last example.  Unlike
B<sed>, we use the \<I<digit>> form only in the left-hand side.
Anywhere else it's $<I<digit>>.
2196
2197Occasionally, you can't use just a C</g> to get all the changes
2198to occur that you might want. Here are two common cases:
2199
2200 # put commas in the right places in an integer
2201 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2202
2203 # expand tabs to 8-column spacing
2204 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
2205
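Both loops repeat the substitution until it no longer matches, because
a single C</g> pass cannot revisit text it has already consumed or
rewritten.  As a sketch, the two idioms can be wrapped in helper subs
(the names C<commify> and C<expand_tabs> are hypothetical):

```perl
use strict;
use warnings;

# Insert thousands separators; each pass adds at most one comma
# per run of digits, so the loop runs until nothing changes.
sub commify {
    my ($number) = @_;
    1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $number;
}

# Expand tabs to 8-column stops, one run of tabs per pass.
sub expand_tabs {
    my ($line) = @_;
    1 while $line =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
    return $line;
}

print commify("1234567"), "\n";
print expand_tabs("a\tb"), "\n";
```

For C<"1234567"> the first pass produces C<"1234,567"> and the second
C<"1,234,567">; the third pass finds nothing and ends the loop.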
2206=back
2207
2208=head2 Quote-Like Operators
2209X<operator, quote-like>
2210
01c6f5f4
RGS
2211=over 4
2212
ba7f043c 2213=item C<q/I<STRING>/>
5d44bfff 2214X<q> X<quote, single> X<'> X<''>
a0d0e21e 2215
ba7f043c 2216=item C<'I<STRING>'>
a0d0e21e 2217
19799a22 2218A single-quoted, literal string. A backslash represents a backslash
68dc0745
PP
2219unless followed by the delimiter or another backslash, in which case
2220the delimiter or backslash is interpolated.
a0d0e21e
LW
2221
2222 $foo = q!I said, "You said, 'She said it.'"!;
2223 $bar = q('This is it.');
68dc0745 2224 $baz = '\n'; # a two-character string
a0d0e21e 2225
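The backslash rule is easiest to see by comparing lengths.  A small
sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $plain     = q(a\b);     # backslash kept as-is: a, \, b
my $doubled   = q(a\\b);    # \\ collapses to one backslash: a, \, b
my $delimited = q!it\!s!;   # \! yields the delimiter itself: it!s
my $newline   = '\n';       # two characters: backslash and n

printf "%d %d %s %d\n",
    length($plain), length($doubled), $delimited, length($newline);
```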
ba7f043c 2226=item C<qq/I<STRING>/>
d74e8afc 2227X<qq> X<quote, double> X<"> X<"">
a0d0e21e 2228
=item C<"I<STRING>">
a0d0e21e
LW
2230
2231A double-quoted, interpolated string.
2232
    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /\b(tcl|java|python)\b/i;      # :-)
    $baz = "\n";                # a one-character string
a0d0e21e 2237
ba7f043c 2238=item C<qx/I<STRING>/>
d74e8afc 2239X<qx> X<`> X<``> X<backtick>
a0d0e21e 2240
ba7f043c 2241=item C<`I<STRING>`>
a0d0e21e 2242
43dd4d21 2243A string which is (possibly) interpolated and then executed as a
f703fc96 2244system command with F</bin/sh> or its equivalent. Shell wildcards,
43dd4d21
JH
2245pipes, and redirections will be honored. The collected standard
2246output of the command is returned; standard error is unaffected. In
2247scalar context, it comes back as a single (potentially multi-line)
ba7f043c
KW
2248string, or C<undef> if the command failed. In list context, returns a
2249list of lines (however you've defined lines with C<$/> or
2250C<$INPUT_RECORD_SEPARATOR>), or an empty list if the command failed.
5a964f20
TC
2251
2252Because backticks do not affect standard error, use shell file descriptor
2253syntax (assuming the shell supports this) if you care to address this.
2254To capture a command's STDERR and STDOUT together:
a0d0e21e 2255
5a964f20
TC
2256 $output = `cmd 2>&1`;
2257
2258To capture a command's STDOUT but discard its STDERR:
2259
2260 $output = `cmd 2>/dev/null`;
2261
2262To capture a command's STDERR but discard its STDOUT (ordering is
2263important here):
2264
2265 $output = `cmd 2>&1 1>/dev/null`;
2266
2267To exchange a command's STDOUT and STDERR in order to capture the STDERR
2268but leave its STDOUT to come out the old STDERR:
2269
2270 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2271
2272To read both a command's STDOUT and its STDERR separately, it's easiest
2359510d
SD
2273to redirect them separately to files, and then read from those files
2274when the program is done:
5a964f20 2275
2359510d 2276 system("program args 1>program.stdout 2>program.stderr");
5a964f20 2277
30398227
SP
2278The STDIN filehandle used by the command is inherited from Perl's STDIN.
2279For example:
2280
c543c01b
TC
2281 open(SPLAT, "stuff") || die "can't open stuff: $!";
2282 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
40bbb707 2283 print STDOUT `sort`;
30398227 2284
40bbb707 2285will print the sorted contents of the file named F<"stuff">.
30398227 2286
5a964f20
TC
2287Using single-quote as a delimiter protects the command from Perl's
2288double-quote interpolation, passing it on to the shell instead:
2289
2290 $perl_info = qx(ps $$); # that's Perl's $$
2291 $shell_info = qx'ps $$'; # that's the new shell's $$
2292
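The same distinction can be shown without relying on process IDs.
This sketch assumes a POSIX-like shell and an C<echo> command; the name
C<GR_DEMO> is hypothetical and is used both as a Perl lexical and as an
environment variable so the two expansions can be told apart.

```perl
use strict;
use warnings;

my $GR_DEMO = "from perl";
local $ENV{GR_DEMO} = "from shell";

my $by_perl  = qx(echo $GR_DEMO);   # Perl interpolates before the command runs
my $by_shell = qx'echo $GR_DEMO';   # the shell sees $GR_DEMO and expands it

chomp($by_perl, $by_shell);
print "$by_perl / $by_shell\n";
```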
19799a22 2293How that string gets evaluated is entirely subject to the command
5a964f20
TC
2294interpreter on your system. On most platforms, you will have to protect
2295shell metacharacters if you want them treated literally. This is in
2296practice difficult to do, as it's unclear how to escape which characters.
ba7f043c 2297See L<perlsec> for a clean and safe example of a manual C<fork()> and C<exec()>
5a964f20 2298to emulate backticks safely.
a0d0e21e 2299
bb32b41a
GS
2300On some platforms (notably DOS-like ones), the shell may not be
2301capable of dealing with multiline commands, so putting newlines in
2302the string may not get you what you want. You may be able to evaluate
2303multiple commands in a single line by separating them with the command
a727cfac 2304separator character, if your shell supports that (for example, C<;> on
1ca345ed 2305many Unix shells and C<&> on the Windows NT C<cmd> shell).
bb32b41a 2306
3ff8ecf9 2307Perl will attempt to flush all files opened for
0f897271
GS
2308output before starting the child process, but this may not be supported
2309on some platforms (see L<perlport>). To be safe, you may need to set
ba7f043c
KW
2310C<$|> (C<$AUTOFLUSH> in C<L<English>>) or call the C<autoflush()> method of
2311C<L<IO::Handle>> on any open handles.
0f897271 2312
bb32b41a
GS
2313Beware that some command shells may place restrictions on the length
2314of the command line. You must ensure your strings don't exceed this
2315limit after any necessary interpolations. See the platform-specific
2316release notes for more details about your particular environment.
2317
5a964f20
TC
2318Using this operator can lead to programs that are difficult to port,
2319because the shell commands called vary between systems, and may in
2320fact not be present at all. As one example, the C<type> command under
2321the POSIX shell is very different from the C<type> command under DOS.
2322That doesn't mean you should go out of your way to avoid backticks
2323when they're the right way to get something done. Perl was made to be
2324a glue language, and one of the things it glues together is commands.
2325Just understand what you're getting yourself into.
bb32b41a 2326
7cf4dd3e
DB
2327Like C<system>, backticks put the child process exit code in C<$?>.
2328If you'd like to manually inspect failure, you can check all possible
2329failure modes by inspecting C<$?> like this:
2330
2331 if ($? == -1) {
2332 print "failed to execute: $!\n";
2333 }
2334 elsif ($? & 127) {
2335 printf "child died with signal %d, %s coredump\n",
2336 ($? & 127), ($? & 128) ? 'with' : 'without';
2337 }
2338 else {
2339 printf "child exited with value %d\n", $? >> 8;
2340 }
2341
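The same checks can be packaged so that they return a description
instead of printing one.  This is a sketch only: C<describe_status> is
a hypothetical helper, and the sample command assumes a POSIX C<sh> is
available.

```perl
use strict;
use warnings;

# Decode a $? value; $errno is the caller's copy of $!.
sub describe_status {
    my ($status, $errno) = @_;
    return "failed to execute: $errno" if $status == -1;
    return sprintf("died with signal %d, %s coredump",
                   $status & 127, ($status & 128) ? 'with' : 'without')
        if $status & 127;
    return sprintf("exited with value %d", $status >> 8);
}

my $output = `sh -c 'exit 3'`;          # assumes a POSIX sh
my $report = describe_status($?, $!);
print "$report\n";
```

Capture C<$?> and C<$!> immediately after the backticks, since later
system calls may overwrite them.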
fe43a9cc
TC
2342Use the L<open> pragma to control the I/O layers used when reading the
2343output of the command, for example:
2344
2345 use open IN => ":encoding(UTF-8)";
2346 my $x = `cmd-producing-utf-8`;
2347
da87341d 2348See L</"I/O Operators"> for more discussion.
a0d0e21e 2349
ba7f043c 2350=item C<qw/I<STRING>/>
d74e8afc 2351X<qw> X<quote, list> X<quote, words>
945c54fd 2352
ba7f043c 2353Evaluates to a list of the words extracted out of I<STRING>, using embedded
945c54fd
JH
2354whitespace as the word delimiters. It can be understood as being roughly
2355equivalent to:
2356
c543c01b 2357 split(" ", q/STRING/);
945c54fd 2358
5a9c3bf4
Z
2359the differences being that it only splits on ASCII whitespace,
2360generates a real list at compile time, and
efb1e162 2361in scalar context it returns the last element in the list. So
945c54fd
JH
2362this expression:
2363
2364 qw(foo bar baz)
2365
2366is semantically equivalent to the list:
2367
c543c01b 2368 "foo", "bar", "baz"
945c54fd
JH
2369
2370Some frequently seen examples:
2371
    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );
2374
A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string.  For this reason, the
S<C<use warnings>> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the I<STRING> contains the C<","> or the C<"#"> character.
945c54fd 2379
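A brief sketch of how C<qw> behaves next to its run-time
near-equivalent; the variable names are illustrative.

```perl
use strict;
use warnings;

my @words  = qw( foo   bar
                 baz );                        # any ASCII whitespace separates words
my @split  = split(" ", q/ foo   bar baz /);   # the run-time near-equivalent
my $count  = () = qw(foo bar baz);             # count the elements via list assignment
my ($last) = (qw(foo bar baz))[-1];            # a qw list can be sliced

print "@words | @split | $count | $last\n";
```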
ba7f043c 2380=item C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
d74e8afc 2381X<tr> X<y> X<transliterate> X</c> X</d> X</s>
a0d0e21e 2382
ba7f043c 2383=item C<y/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
a0d0e21e 2384
2c268ad5 2385Transliterates all occurrences of the characters found in the search list
a0d0e21e
LW
2386with the corresponding character in the replacement list. It returns
2387the number of characters replaced or deleted. If no string is
ba7f043c 2388specified via the C<=~> or C<!~> operator, the C<$_> string is transliterated.
c543c01b
TC
2389
2390If the C</r> (non-destructive) option is present, a new copy of the string
2391is made and its characters transliterated, and this copy is returned no
2392matter whether it was modified or not: the original string is always
2393left unchanged. The new copy is always a plain string, even if the input
2394string is an object or a tied variable.
8ada0baa 2395
c543c01b
TC
2396Unless the C</r> option is used, the string specified with C<=~> must be a
2397scalar variable, an array element, a hash element, or an assignment to one
2398of those; in other words, an lvalue.
8ff32507 2399
89d205f2 2400A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
2c268ad5 2401does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 2402For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
af2cbe4d
KW
2403I<SEARCHLIST> is delimited by bracketing quotes, the I<REPLACEMENTLIST>
2404must have its own pair of quotes, which may or may not be bracketing
2405quotes; for example, C<tr[aeiouy][yuoiea]> or C<tr(+\-*/)/ABCD/>.
c543c01b 2406
ba7f043c 2407Characters may be literals or any of the escape sequences accepted in
6d314683
YO
2408double-quoted strings. But there is no variable interpolation, so C<"$">
2409and C<"@"> are treated as literals. A hyphen at the beginning or end, or
ba7f043c
KW
2410preceded by a backslash is considered a literal. Escape sequence
2411details are in L<the table near the beginning of this section|/Quote and
f4240379 2412Quote-like Operators>.
ba7f043c 2413
c543c01b 2414Note that C<tr> does B<not> do regular expression character classes such as
ba7f043c 2415C<\d> or C<\pL>. The C<tr> operator is not equivalent to the C<L<tr(1)>>
af2cbe4d
KW
2416utility. C<tr[a-z][A-Z]> will uppercase the 26 letters "a" through "z",
2417but for case changing not confined to ASCII, use
2418L<C<lc>|perlfunc/lc>, L<C<uc>|perlfunc/uc>,
2419L<C<lcfirst>|perlfunc/lcfirst>, L<C<ucfirst>|perlfunc/ucfirst>
2420(all documented in L<perlfunc>), or the
2421L<substitution operator C<sE<sol>I<PATTERN>E<sol>I<REPLACEMENT>E<sol>>|/sE<sol>PATTERNE<sol>REPLACEMENTE<sol>msixpodualngcer>
2422(with C<\U>, C<\u>, C<\L>, and C<\l> string-interpolation escapes in the
2423I<REPLACEMENT> portion).
cc255d5f 2424
f4240379
KW
2425Most ranges are unportable between character sets, but certain ones
2426signal Perl to do special handling to make them portable. There are two
2427classes of portable ranges. The first are any subsets of the ranges
2428C<A-Z>, C<a-z>, and C<0-9>, when expressed as literal characters.
2429
2430 tr/h-k/H-K/
2431
2432capitalizes the letters C<"h">, C<"i">, C<"j">, and C<"k"> and nothing
2433else, no matter what the platform's character set is. In contrast, all
2434of
2435
2436 tr/\x68-\x6B/\x48-\x4B/
2437 tr/h-\x6B/H-\x4B/
2438 tr/\x68-k/\x48-K/
2439
2440do the same capitalizations as the previous example when run on ASCII
2441platforms, but something completely different on EBCDIC ones.
2442
2443The second class of portable ranges is invoked when one or both of the
2444range's end points are expressed as C<\N{...}>
2445
2446 $string =~ tr/\N{U+20}-\N{U+7E}//d;
2447
2448removes from C<$string> all the platform's characters which are
2449equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This
2450is a portable range, and has the same effect on every platform it is
2451run on. It turns out that in this example, these are the ASCII
2452printable characters. So after this is run, C<$string> has only
2453controls and characters which have no ASCII equivalents.
2454
2455But, even for portable ranges, it is not generally obvious what is
2456included without having to look things up. A sound principle is to use
2457only ranges that begin from and end at either ASCII alphabetics of equal
8df98a27 2458case (C<b-e>, C<B-E>), or digits (C<1-4>). Anything else is unclear
f4240379 2459(and unportable unless C<\N{...}> is used). If in doubt, spell out the
8ada0baa
JH
2460character sets in full.
2461
a0d0e21e
LW
2462Options:
2463
    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.
    r   Return the modified string and leave the original string
        untouched.
a0d0e21e 2469
ba7f043c 2470If the C</c> modifier is specified, the I<SEARCHLIST> character set
19799a22 2471is complemented. If the C</d> modifier is specified, any characters
ba7f043c 2472specified by I<SEARCHLIST> not found in I<REPLACEMENTLIST> are deleted.
19799a22 2473(Note that this is slightly more flexible than the behavior of some
ba7f043c 2474B<tr> programs, which delete anything they find in the I<SEARCHLIST>,
46f8a5ea 2475period.) If the C</s> modifier is specified, sequences of characters
19799a22
GS
2476that were transliterated to the same character are squashed down
2477to a single instance of the character.
a0d0e21e 2478
ba7f043c
KW
2479If the C</d> modifier is used, the I<REPLACEMENTLIST> is always interpreted
2480exactly as specified. Otherwise, if the I<REPLACEMENTLIST> is shorter
2481than the I<SEARCHLIST>, the final character is replicated till it is long
2482enough. If the I<REPLACEMENTLIST> is empty, the I<SEARCHLIST> is replicated.
a0d0e21e
LW
2483This latter is useful for counting characters in a class or for
2484squashing character sequences in a class.
2485
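The interaction between the modifiers and the length of the
I<REPLACEMENTLIST> can be sketched as follows.  The sample strings and
variable names are illustrative only.

```perl
use strict;
use warnings;

# Short REPLACEMENTLIST without /d: the final character is replicated,
# so a-e all map to X.
(my $padded = "hello") =~ tr/a-e/X/;

# With /d the REPLACEMENTLIST is taken as-is: a maps to X, b-e are deleted.
(my $deleted = "hello") =~ tr/a-e/X/d;

# Empty REPLACEMENTLIST: characters map to themselves, so tr only counts.
my $phone  = "call 555-0100";
my $digits = ($phone =~ tr/0-9//);

# /s squashes runs that were transliterated to the same character.
(my $squashed = "bookkeeper") =~ tr/a-zA-Z//s;

# /c complements the SEARCHLIST; with /s each run of non-letters
# becomes a single space.
(my $spaced = "hello, world!!") =~ tr/a-zA-Z/ /cs;

print "$padded $deleted $digits $squashed [$spaced]\n";
```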
2486Examples:
2487
    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case ASCII

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;
    $HOST = $host =~ tr/a-z/A-Z/r;  # same thing

    $HOST = $host =~ tr/a-z/A-Z/r   # chained with s///r
                  =~ s/:/ -p/r;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    @stripped = map tr/a-zA-Z/ /csr, @original;
                                # /r with map

    tr [\200-\377]
       [\000-\177];             # wickedly delete 8th bit
a0d0e21e 2511
19799a22
GS
2512If multiple transliterations are given for a character, only the
2513first one is used:
748a9306
LW
2514
2515 tr/AAA/XYZ/
2516
2c268ad5 2517will transliterate any A to X.
748a9306 2518
Because the transliteration table is built at compile time, neither
the I<SEARCHLIST> nor the I<REPLACEMENTLIST> is subjected to double quote
interpolation.  That means that if you want to use variables, you
must use an C<eval()>:
a0d0e21e
LW
2523
    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;
2528
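The C<eval> idiom can be wrapped in a sub so that it applies to an
arbitrary string.  A minimal sketch, in which C<transliterate> is a
hypothetical helper; because the lists are pasted into source code,
they are assumed to come from a trusted source.

```perl
use strict;
use warnings;

# Transliterate $text using lists that are only known at run time.
sub transliterate {
    my ($text, $oldlist, $newlist) = @_;
    # The string eval compiles a fresh tr/// with the lists filled in;
    # it can see the lexical $text declared above.
    eval "\$text =~ tr/$oldlist/$newlist/; 1" or die $@;
    return $text;
}

my $upper = transliterate("abc-xyz", "a-c", "A-C");
print "$upper\n";
```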
ba7f043c 2529=item C<< <<I<EOF> >>
d74e8afc 2530X<here-doc> X<heredoc> X<here-document> X<<< << >>>
7e3b091d
DA
2531
2532A line-oriented form of quoting is based on the shell "here-document"
2533syntax. Following a C<< << >> you specify a string to terminate
2534the quoted material, and all lines following the current line down to
89d205f2
YO
2535the terminating string are the value of the item.
2536
47eb4411
MH
2537Prefixing the terminating string with a C<~> specifies that you
2538want to use L</Indented Here-docs> (see below).
2539
89d205f2
YO
2540The terminating string may be either an identifier (a word), or some
2541quoted text. An unquoted identifier works like double quotes.
2542There may not be a space between the C<< << >> and the identifier,
2543unless the identifier is explicitly quoted. (If you put a space it
2544will be treated as a null identifier, which is valid, and matches the
2545first empty line.) The terminating string must appear by itself
2546(unquoted and with no surrounding whitespace) on the terminating line.
2547
If the terminating string is quoted, the type of quotes used determines
the treatment of the text.
2550
2551=over 4
2552
2553=item Double Quotes
2554
2555Double quotes indicate that the text will be interpolated using exactly
2556the same rules as normal double quoted strings.
7e3b091d
DA
2557
        print <<EOF;
    The price is $Price.
    EOF

        print << "EOF"; # same as above
    The price is $Price.
    EOF
2565
89d205f2
YO
2566
2567=item Single Quotes
2568
2569Single quotes indicate the text is to be treated literally with no
46f8a5ea 2570interpolation of its content. This is similar to single quoted
89d205f2
YO
2571strings except that backslashes have no special meaning, with C<\\>
2572being treated as two backslashes and not one as they would in every
2573other quoting construct.
2574
c543c01b
TC
2575Just as in the shell, a backslashed bareword following the C<<< << >>>
2576means the same thing as a single-quoted string does:
2577
        $cost = <<'VISTA';  # hasta la ...
    That'll be $10 please, ma'am.
    VISTA

        $cost = <<\VISTA;   # Same thing!
    That'll be $10 please, ma'am.
    VISTA
2585
89d205f2
YO
2586This is the only form of quoting in perl where there is no need
2587to worry about escaping content, something that code generators
2588can and do make good use of.
2589
2590=item Backticks
2591
2592The content of the here doc is treated just as it would be if the
46f8a5ea 2593string were embedded in backticks. Thus the content is interpolated
89d205f2
YO
2594as though it were double quoted and then executed via the shell, with
2595the results of the execution returned.
2596
2597 print << `EOC`; # execute command and get results
7e3b091d 2598 echo hi there
7e3b091d
DA
2599 EOC
2600
89d205f2
YO
2601=back
2602
47eb4411
MH
2603=over 4
2604
2605=item Indented Here-docs
2606
2607The here-doc modifier C<~> allows you to indent your here-docs to make
2608the code more readable:
2609
2610 if ($some_var) {
2611 print <<~EOF;
2612 This is a here-doc
2613 EOF
2614 }
2615
2616This will print...
2617
2618 This is a here-doc
2619
2620...with no leading whitespace.
2621
The delimiter is used to determine the B<exact> whitespace to
remove from the beginning of each line.  All lines B<must> have
at least the same starting whitespace (except lines only
containing a newline) or perl will croak.  Tabs and spaces can
be mixed, but are matched exactly.  One tab will not be equal to
8 spaces!
2628
2629Additional beginning whitespace (beyond what preceded the
2630delimiter) will be preserved:
2631
2632 print <<~EOF;
2633 This text is not indented
2634 This text is indented with two spaces
2635 This text is indented with two tabs
2636 EOF
2637
2638Finally, the modifier may be used with all of the forms
2639mentioned above:
2640
2641 <<~\EOF;
2642 <<~'EOF'
2643 <<~"EOF"
2644 <<~`EOF`
2645
2646And whitespace may be used between the C<~> and quoted delimiters:
2647
2648 <<~ 'EOF'; # ... "EOF", `EOF`
2649
2650=back
2651
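The quoting forms described above can be compared side by side.  This
is a minimal sketch with illustrative variable names; the C<<< <<~ >>>
form assumes Perl 5.26 or later.

```perl
use strict;
use warnings;

my $price = 10;

# Unquoted (or double-quoted) terminator: the body is interpolated.
my $interpolated = <<EOF;
The price is \$$price.
EOF

# Single-quoted terminator: the body is taken literally, backslashes included.
my $literal = <<'EOF';
The price is \$$price.
EOF

# <<~ strips the terminator's indentation from every line.
my $indented = <<~EOF;
    The price is $price.
      (extra indentation is kept)
    EOF

print $interpolated, $literal, $indented;
```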
89d205f2
YO
2652It is possible to stack multiple here-docs in a row:
2653
7e3b091d
DA
2654 print <<"foo", <<"bar"; # you can stack them
2655 I said foo.
2656 foo
2657 I said bar.
2658 bar
2659
2660 myfunc(<< "THIS", 23, <<'THAT');
2661 Here's a line
2662 or two.
2663 THIS
2664 and here's another.
2665 THAT
2666
2667Just don't forget that you have to put a semicolon on the end
2668to finish the statement, as Perl doesn't know you're not going to
2669try to do this:
2670
        print <<ABC
    179231
    ABC
        + 20;
2675
872d7e53
ST
2676If you want to remove the line terminator from your here-docs,
2677use C<chomp()>.
2678
2679 chomp($string = <<'END');
2680 This is a string.
2681 END
2682
2683If you want your here-docs to be indented with the rest of the code,
377a7450 2684use the C<<< <<~FOO >>> construct described under L</Indented Here-docs>:
7e3b091d 2685
377a7450 2686 $quote = <<~'FINIS';
89d205f2 2687 The Road goes ever on and on,
7e3b091d 2688 down from the door where it began.
377a7450 2689 FINIS
7e3b091d
DA
2690
2691If you use a here-doc within a delimited construct, such as in C<s///eg>,
1bf48760
FC
2692the quoted material must still come on the line following the
2693C<<< <<FOO >>> marker, which means it may be inside the delimited
2694construct:
7e3b091d
DA
2695
2696 s/this/<<E . 'that'
2697 the other
2698 E
2699 . 'more '/eg;
2700
1bf48760
FC
2701It works this way as of Perl 5.18. Historically, it was inconsistent, and
2702you would have to write
7e3b091d 2703
89d205f2
YO
2704 s/this/<<E . 'that'
2705 . 'more '/eg;
2706 the other
2707 E
7e3b091d 2708
1bf48760
FC
2709outside of string evals.
2710
c543c01b 2711Additionally, quoting rules for the end-of-string identifier are
46f8a5ea 2712unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
89d205f2
YO
2713supported in place of C<''> and C<"">, and the only interpolation is for
2714backslashing the quoting character:
7e3b091d
DA
2715
2716 print << "abc\"def";
2717 testing...
2718 abc"def
2719
2720Finally, quoted strings cannot span multiple lines. The general rule is
2721that the identifier must be a string literal. Stick with that, and you
2722should be safe.
2723
a0d0e21e
LW
2724=back
2725
75e14d17 2726=head2 Gory details of parsing quoted constructs
d74e8afc 2727X<quote, gory details>
75e14d17 2728
19799a22
GS
When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation.  This strategy
is so successful that Perl programmers often do not suspect the
ambiguity of what they write.  But from time to time, Perl's
notions differ substantially from what the author honestly meant.
2735
2736This section hopes to clarify how Perl handles quoted constructs.
2737Although the most common reason to learn this is to unravel labyrinthine
2738regular expressions, because the initial steps of parsing are the
2739same for all quoting operators, they are all discussed together.
2740
2741The most important Perl parsing rule is the first one discussed
2742below: when processing a quoted construct, Perl first finds the end
2743of that construct, then interprets its contents. If you understand
2744this rule, you may skip the rest of this section on the first
2745reading. The other rules are likely to contradict the user's
2746expectations much less frequently than this first one.
2747
2748Some passes discussed below are performed concurrently, but because
2749their results are the same, we consider them individually. For different
2750quoting constructs, Perl performs different numbers of passes, from
6deea57f 2751one to four, but these passes are always performed in the same order.
75e14d17 2752
13a2d996 2753=over 4
75e14d17
IZ
2754
2755=item Finding the end
2756
ba7f043c
KW
2757The first pass is finding the end of the quoted construct. This results
2758in saving to a safe location a copy of the text (between the starting
2759and ending delimiters), normalized as necessary to avoid needing to know
2760what the original delimiters were.
6deea57f
ST
2761
2762If the construct is a here-doc, the ending delimiter is a line
46f8a5ea 2763that has a terminating string as the content. Therefore C<<<EOF> is
6deea57f
ST
2764terminated by C<EOF> immediately followed by C<"\n"> and starting
2765from the first column of the terminating line.
2766When searching for the terminating line of a here-doc, nothing
46f8a5ea 2767is skipped. In other words, lines after the here-doc syntax
6deea57f
ST
2768are compared with the terminating string line by line.
2769
For constructs other than here-docs, single characters are used as starting
and ending delimiters.  If the starting delimiter is an opening punctuation
character (that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
corresponding closing punctuation character (that is C<)>, C<]>, C<}>, or
C<< > >>).  If the starting delimiter is an unpaired character like C</> or a
closing punctuation character, the ending delimiter is the same as the
starting delimiter.  Therefore a C</> terminates a C<qq//> construct, while
a C<]> terminates both C<qq[]> and C<qq]]> constructs.
6deea57f
ST
2778
2779When searching for single-character delimiters, escaped delimiters
1ca345ed 2780and C<\\> are skipped. For example, while searching for terminating C</>,
6deea57f
ST
2781combinations of C<\\> and C<\/> are skipped. If the delimiters are
2782bracketing, nested pairs are also skipped. For example, while searching
ba7f043c 2783for a closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
6deea57f
ST
2784and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
2785However, when backslashes are used as the delimiters (like C<qq\\> and
2786C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters or
other backslashes are removed (strictly speaking, they are not copied to the
safe location).
75e14d17 2790
19799a22
GS
2791For constructs with three-part delimiters (C<s///>, C<y///>, and
2792C<tr///>), the search is repeated once more.
fc693347 2793If the first delimiter is not an opening punctuation, the three delimiters must
d74605e5
FC
2794be the same, such as C<s!!!> and C<tr)))>,
2795in which case the second delimiter
6deea57f 2796terminates the left part and starts the right part at once.
b6538e4f 2797If the left part is delimited by bracketing punctuation (that is C<()>,
6deea57f 2798C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
b6538e4f 2799delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
ba7f043c 2800and comments are allowed between the two parts, although the comment must follow
a727cfac 2801at least one whitespace character; otherwise a character expected as the
b6538e4f 2802start of the comment may be regarded as the starting delimiter of the right part.
75e14d17 2803
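The delimiter rules above can be exercised with a few literals.  This
is a sketch with illustrative variable names and sample strings, not
an exhaustive test of the tokenizer.

```perl
use strict;
use warnings;

my $nested  = q[a [nested] pair];   # inner [ ] are skipped as a nested pair
my $escaped = q/a\/b/;              # \/ is skipped, and its backslash dropped
my $closing = q]abc];               # a closing bracket pairs with itself

# Bracketed left part: the right part gets its own delimiters, and
# whitespace plus a comment may separate the two parts.
(my $swapped = "foo bar") =~ s{foo}   # pattern
                              {baz};  # replacement

print "$nested | $escaped | $closing | $swapped\n";
```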
19799a22
GS
2804During this search no attention is paid to the semantics of the construct.
2805Thus:
75e14d17
IZ
2806
2807 "$hash{"$foo/$bar"}"
2808
2a94b7ce 2809or:
75e14d17 2810
89d205f2 2811 m/
2a94b7ce 2812 bar # NOT a comment, this slash / terminated m//!
75e14d17
IZ
2813 /x
2814
19799a22
GS
2815do not form legal quoted expressions. The quoted part ends on the
2816first C<"> and C</>, and the rest happens to be a syntax error.
2817Because the slash that terminated C<m//> was followed by a C<SPACE>,
2818the example above is not C<m//x>, but rather C<m//> with no C</x>
2819modifier. So the embedded C<#> is interpreted as a literal C<#>.
75e14d17 2820
89d205f2 2821Also no attention is paid to C<\c\> (multichar control char syntax) during
46f8a5ea 2822this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
89d205f2 2823of C<\/>, and the following C</> is not recognized as a delimiter.
0d594e51
ST
2824Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
2825
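A small sketch of the suggested workaround, using made-up strings: both spellings put the control character with code 28 at the end of the string without involving C<\c\>.

```perl
use strict;
use warnings;

my $octal = qq/abc\034/;   # octal escape for the character with code 28
my $hex   = qq/abc\x1c/;   # the same character, written in hex

printf "%d %d\n", ord(substr($octal, -1)), ord(substr($hex, -1));
```
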
75e14d17 2826=item Interpolation
d74e8afc 2827X<interpolation>
75e14d17 2828
19799a22 2829The next step is interpolation in the text obtained, which is now
89d205f2 2830delimiter-independent. There are multiple cases.
75e14d17 2831
13a2d996 2832=over 4
75e14d17 2833
89d205f2 2834=item C<<<'EOF'>
75e14d17
IZ
2835
2836No interpolation is performed.
6deea57f
ST
2837Note that the combination C<\\> is left intact, since escaped delimiters
2838are not available for here-docs.
75e14d17 2839
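The difference from an ordinary single-quoted string can be seen in a short sketch (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Single-quoted here-doc: nothing is interpolated, so both backslashes stay.
my $heredoc = <<'EOF';
a\\b
EOF

# Single-quoted string: the pair \\ collapses to one backslash.
my $quoted = 'a\\b';

printf "%d %d\n", length($heredoc), length($quoted);
```
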
6deea57f 2840=item C<m''>, the pattern of C<s'''>
89d205f2 2841
6deea57f
ST
2842No interpolation is performed at this stage.
2843Any backslashed sequences, including C<\\>, are handled at the later stage
2844of L</"parsing regular expressions">.
89d205f2 2845
6deea57f 2846=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>
75e14d17 2847
89d205f2 2848The only interpolation is removal of C<\> from pairs of C<\\>.
ba7f043c 2849Therefore C<"-"> in C<tr'''> and C<y'''> is treated literally
6deea57f
ST
2850as a hyphen and no character range is available.
2851C<\1> in the replacement of C<s'''> does not work as C<$1>.
89d205f2
YO
2852
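A brief sketch of both effects, assuming sample strings chosen only for illustration:

```perl
use strict;
use warnings;

# In q//, the only processing is \\ becoming a single backslash.
my $path = q/C:\\temp/;

# With single-quote delimiters, "-" is an ordinary character, so this
# maps "a" to "x", "-" to "y" and "c" to "z"; "b" is left alone.
(my $mapped = "a-c b") =~ tr'a-c'xyz';

print "$path\n$mapped\n";
```
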
2853=item C<tr///>, C<y///>
2854
6deea57f
ST
2855No variable interpolation occurs. String modifying combinations for
2856case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
2857The other escape sequences such as C<\200> and C<\t> and backslashed
2858characters such as C<\\> and C<\-> are converted to appropriate literals.
ba7f043c
KW
2859The character C<"-"> is treated specially and therefore C<\-> is treated
2860as a literal C<"-">.
75e14d17 2861
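For comparison, a sketch of the same idea with ordinary delimiters; the strings are hypothetical examples:

```perl
use strict;
use warnings;

# An unescaped "-" forms a range: a, b and c are uppercased.
(my $ranged = "a-c b") =~ tr/a-c/A-C/;

# An escaped "\-" is a literal hyphen: only a, - and c are mapped.
(my $literal = "a-c b") =~ tr/a\-c/xyz/;

# No variable interpolation: this counts the characters "$" and "x".
my $sample = '$x';
my $count  = ($sample =~ tr/$x//);

print "$ranged\n$literal\n$count\n";
```
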
89d205f2 2862=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
75e14d17 2863
628253b8 2864C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
19799a22 2865converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
ba7f043c 2866is converted to S<C<$foo . (quotemeta("baz" . $bar))>> internally.
6deea57f
ST
2867The other escape sequences such as C<\200> and C<\t> and backslashed
2868characters such as C<\\> and C<\-> are replaced with appropriate
2869expansions.
2a94b7ce 2870
19799a22
GS
2871Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
2872is interpolated in the usual way. Something like C<"\Q\\E"> has
48cbae4f 2873no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
19799a22
GS
2874result is the same as for C<"\\\\E">. As a general rule, backslashes
2875between C<\Q> and C<\E> may lead to counterintuitive results. So,
2876C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
2877as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2a94b7ce
IZ
2878
2879 $str = '\t';
2880 return "\Q$str";
2881
2882may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
2883
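The two readings can be compared directly. In this sketch the variable names are illustrative:

```perl
use strict;
use warnings;

my $str = '\t';                # two characters: backslash and "t"

my $quoted_var = "\Q$str";     # quotemeta of backslash-t
my $quoted_tab = "\Q\t\E";     # quotemeta of a real TAB character

printf "%d %d\n", length($quoted_var), length($quoted_tab);
```

Here C<$quoted_var> holds two backslashes followed by C<t>, while C<$quoted_tab> holds one backslash followed by a TAB.
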
19799a22 2884Interpolated scalars and arrays are converted internally to the C<join> and
ba7f043c 2885C<"."> catenation operations. Thus, S<C<"$foo XXX '@arr'">> becomes:
75e14d17 2886
19799a22 2887 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 2888
19799a22 2889All operations above are performed simultaneously, left to right.
75e14d17 2890
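The equivalence can be checked with a small sketch; the values assigned to C<$foo> and C<@arr> are arbitrary:

```perl
use strict;
use warnings;

my $foo = "x";
my @arr = (1, 2, 3);

my $interpolated = "$foo XXX '@arr'";
my $by_hand      = $foo . " XXX '" . join($", @arr) . "'";

print "$interpolated\n";
```
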
ba7f043c 2891Because the result of S<C<"\Q I<STRING> \E">> has all metacharacters
19799a22 2892quoted, there is no way to insert a literal C<$> or C<@> inside a
ba7f043c 2893C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
19799a22
GS
2894C<"\\\$">; if not, it is interpreted as the start of an interpolated
2895scalar.
75e14d17 2896
19799a22 2897Note also that the interpolation code needs to make a decision on
89d205f2 2898where the interpolated scalar ends. For instance, whether
ba7f043c 2899S<C<< "a $x -> {c}" >>> really means:
75e14d17 2900
db691027 2901 "a " . $x . " -> {c}";
75e14d17 2902
2a94b7ce 2903or:
75e14d17 2904
db691027 2905 "a " . $x -> {c};
75e14d17 2906
19799a22
GS
2907Most of the time, it picks the longest possible text that does not include
2908spaces between components and which contains matching braces or
2909brackets. Because the outcome may be determined by voting based
2910on heuristic estimators, the result is not strictly predictable.
2911Fortunately, it's usually correct for ambiguous cases.
75e14d17 2912
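A sketch of the two readings, assuming a hypothetical hash reference in C<$x>:

```perl
use strict;
use warnings;

my $x = { c => "value" };

# No spaces between components: the whole expression is interpolated.
my $tight = "a $x->{c}";

# Spaces around the arrow: only $x is interpolated, so the stringified
# reference is followed by the literal text " -> {c}".
my $spaced = "a $x -> {c}";

print "$tight\n$spaced\n";
```
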
6deea57f 2913=item the replacement of C<s///>
75e14d17 2914
628253b8 2915Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
6deea57f
ST
2916happens as with C<qq//> constructs.
2917
2918It is at this step that C<\1> is begrudgingly converted to C<$1> in
2919the replacement text of C<s///>, in order to correct the incorrigible
2920I<sed> hackers who haven't picked up the saner idiom yet. A warning
ba7f043c 2921is emitted if the S<C<use warnings>> pragma or the B<-w> command-line flag
6deea57f
ST
2922(that is, the C<$^W> variable) was set.
2923
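The preferred spelling of a backreference in the replacement looks like this; the sample text is made up:

```perl
use strict;
use warnings;

# $1 and $2 in the replacement refer to the captured groups.
(my $swapped = "hello world") =~ s/(\w+) (\w+)/$2 $1/;

print "$swapped\n";
```
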
9c6deb98 2924=item C<RE> in C<m?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
6deea57f 2925
628253b8 2926Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
cc74c5bd
ST
2927and interpolation happens (almost) as with C<qq//> constructs.
2928
5d03b57c
KW
2929Processing of C<\N{...}> is also done here, and compiled into an intermediate
2930form for the regex compiler. (This is because, as mentioned below, the regex
2931compilation may be done at execution time, and C<\N{...}> is a compile-time
2932construct.)
2933
cc74c5bd
ST
2934However any other combinations of C<\> followed by a character
2935are not substituted but only skipped, in order to parse them
2936as regular expressions at the following step.
6deea57f 2937As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
1749ea0d 2938treated as an array symbol (for example C<@foo>),
6deea57f 2939even though the same text in C<qq//> gives interpolation of C<\c@>.
6deea57f 2940
e128ab2c
DM
2941Code blocks such as C<(?{BLOCK})> are handled by temporarily passing control
2942back to the perl parser, in much the same way as an interpolated array
2943subscript expression such as C<"foo$array[1+f("[xyz")]bar"> would be.
2944
ba7f043c
KW
2945Moreover, inside C<(?{BLOCK})>, S<C<(?# comment )>>, and
2946a C<#>-comment in a C</x>-regular expression, no processing is
19799a22 2947performed whatsoever. This is the first step at which the presence
ba7f043c 2948of the C</x> modifier is relevant.
19799a22 2949
1749ea0d
ST
2950Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
2951and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are
2952voted (by several different estimators) to be either an array element
2953or C<$var> followed by an RE alternative. This is where the notation
19799a22
GS
2954C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
2955array element C<-9>, not as a regular expression from the variable
2956C<$arr> followed by a digit, which would be the interpretation of
2957C</$arr[0-9]/>. Since voting among different estimators may occur,
2958the result is not predictable.
2959
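Braces remove the need for any guessing. The following sketch uses a hypothetical scalar and array that share the name C<tag>:

```perl
use strict;
use warnings;

my $tag = "id";
my @tag = ("zero", "one");

# Braces around the name only: scalar $tag followed by a character class.
my $class_match = "id7" =~ /^${tag}[0-9]$/;

# Braces around name and subscript: element 1 of @tag.
my $element_match = "one" =~ /^${tag[1]}$/;

print "class: ", ($class_match ? "yes" : "no"),
      ", element: ", ($element_match ? "yes" : "no"), "\n";
```
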
19799a22
GS
2960The lack of processing of C<\\> creates specific restrictions on
2961the post-processed text. If the delimiter is C</>, one cannot get
2962the combination C<\/> into the result of this step. C</> will
2963finish the regular expression, C<\/> will be stripped to C</> on
2964the previous step, and C<\\/> will be left as is. Because C</> is
2965equivalent to C<\/> inside a regular expression, this does not
2966matter unless the delimiter happens to be character special to the
9c6deb98 2967RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<m?foo?>; or an
19799a22 2968alphanumeric char, as in:
2a94b7ce
IZ
2969
2970 m m ^ a \s* b mmx;
2971
19799a22 2972In the RE above, which is intentionally obfuscated for illustration, the
6deea57f 2973delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the
ba7f043c 2974RE is the same as for S<C<m/ ^ a \s* b /mx>>. There's more than one
19799a22
GS
2975reason you're encouraged to restrict your delimiters to non-alphanumeric,
2976non-whitespace choices.
75e14d17
IZ
2977
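As a sketch of why the delimiter choice matters in everyday code (the path is an arbitrary example), compare escaping every slash with choosing a delimiter that does not occur in the pattern:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";

# With "/" as the delimiter, every slash in the pattern needs a backslash.
my ($escaped) = $path =~ m/^\/usr\/(\w+)/;

# With {} as delimiters, slashes can be written as they are.
my ($braced) = $path =~ m{^/usr/(\w+)};

print "$escaped $braced\n";
```
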
2978=back
2979
19799a22 2980This step is the last one for all constructs except regular expressions,
75e14d17
IZ
2981which are processed further.
2982
6deea57f
ST
2983=item parsing regular expressions
2984X<regexp, parse>
75e14d17 2985
19799a22 2986Previous steps were performed during the compilation of Perl code,
ac036724 2987but this one happens at run time, although it may be optimized to
19799a22 2988be calculated at compile time if appropriate. After preprocessing
6deea57f 2989described above, and possibly after evaluation if concatenation,
19799a22
GS
2990joining, casing translation, or metaquoting are involved, the
2991resulting I<string> is passed to the RE engine for compilation.
2992
2993Whatever happens in the RE engine might be better discussed in L<perlre>,
2994but for the sake of continuity, we shall do so here.
2995
ba7f043c 2996This is another step where the presence of the C</x> modifier is
19799a22 2997relevant. The RE engine scans the string from left to right and
ba7f043c 2998converts it into a finite automaton.
19799a22
GS
2999
3000Backslashed characters are either replaced with corresponding
3001literal strings (as with C<\{>), or else they generate special nodes
3002in the finite automaton (as with C<\b>). Characters special to the
3003RE engine (such as C<|>) generate corresponding nodes or groups of
3004nodes. C<(?#...)> comments are ignored. All the rest is either
3005converted to literal strings to match, or else is ignored (as is
ba7f043c 3006whitespace and C<#>-style comments if C</x> is present).
19799a22
GS
3007
3008Parsing of the bracketed character class construct, C<[...]>, is
3009rather different than the rule used for the rest of the pattern.
3010The terminator of this construct is found using the same rules as
3011for finding the terminator of a C<{}>-delimited construct, the only
3012exception being that C<]> immediately following C<[> is treated as
e128ab2c
DM
3013though preceded by a backslash.
3014
3015The terminator of runtime C<(?{...})> is found by temporarily switching
3016control to the perl parser, which should stop at the point where the
3017logically balancing terminating C<}> is found.
19799a22
GS
3018
3019It is possible to inspect both the string given to RE engine and the
3020resulting finite automaton. See the arguments C<debug>/C<debugcolor>
ba7f043c 3021in the S<C<use L<re>>> pragma, as well as Perl's B<-Dr> command-line
4a4eefd0 3022switch documented in L<perlrun/"Command Switches">.
75e14d17
IZ
3023
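Besides the debugging facilities named above, the C<re> module's C<regexp_pattern> function offers a lighter way to look at the string a compiled pattern was built from. A minimal sketch, with hypothetical variable names:

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $repeat = 'b+';
my $re     = qr/a${repeat}c/i;

# In list context, regexp_pattern returns the pattern text after
# interpolation, followed by the modifiers.
my ($pattern, $mods) = regexp_pattern($re);

print "pattern=$pattern mods=$mods\n";
```
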
3024=item Optimization of regular expressions
d74e8afc 3025X<regexp, optimization>
75e14d17 3026
7522fed5 3027This step is listed for completeness only. Since it does not change
75e14d17 3028semantics, details of this step are not documented and are subject
19799a22
GS
3029to change without notice. This step is performed over the finite
3030automaton that was generated during the previous pass.
2a94b7ce 3031
19799a22
GS
3032It is at this stage that C<split()> silently optimizes C</^/> to
3033mean C</^/m>.
75e14d17
IZ
3034
3035=back
3036
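The effect of that optimization is visible when splitting a multi-line string; the sample string here is arbitrary:

```perl
use strict;
use warnings;

# Without the optimization, /^/ would match only at the very start of
# the string and nothing would be split off.
my @lines = split /^/, "one\ntwo\nthree\n";

print scalar(@lines), " lines\n";
```
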
a0d0e21e 3037=head2 I/O Operators
d74e8afc 3038X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
80a96bfc 3039X<< <> >> X<< <<>> >> X<@ARGV>
a0d0e21e 3040
54310121 3041There are several I/O operators you should know about.
fbad3eb5 3042
7b8d334a 3043A string enclosed by backticks (grave accents) first undergoes
19799a22
GS
3044double-quote interpolation. It is then interpreted as an external
3045command, and the output of that command is the value of the
e9c56f9b
JH
3046backtick string, as in a shell. In scalar context, a single string
3047consisting of all output is returned. In list context, a list of
3048values is returned, one per line of output. (You can set C<$/> to use
3049a different line terminator.) The command is executed each time the
3050pseudo-literal is evaluated. The status value of the command is
3051returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
3052Unlike in B<csh>, no translation is done on the return data--newlines
3053remain newlines. Unlike in any of the shells, single quotes do not
3054hide variable names in the command from interpretation. To pass a
3055literal dollar-sign through to the shell you need to hide it with a
3056backslash. The generalized form of backticks is C<qx//>. (Because
3057backticks always undergo shell expansion as well, see L<perlsec> for
3058security concerns.)
d74e8afc 3059X<qx> X<`> X<``> X<backtick> X<glob>
19799a22
GS
3060
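A sketch of the two contexts, using the running perl (C<$^X>) as the external command so that the example does not depend on any particular system utility. It assumes the shell in use accepts double-quoted arguments.

```perl
use strict;
use warnings;

# Scalar context: all output as one string.
my $all = qx{$^X -le "print for 1..3"};

# List context: one element per line of output.
my @lines = qx{$^X -le "print for 1..3"};

# The exit status of the most recent command is left in $?.
my $exit = $? >> 8;

print scalar(@lines), " lines, exit status $exit\n";
```
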
3061In scalar context, evaluating a filehandle in angle brackets yields
3062the next line from that file (the newline, if any, included), or
3063C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
3064(sometimes known as file-slurp mode) and the file is empty, it
3065returns C<''> the first time, followed by C<undef> subsequently.
3066
3067Ordinarily you must assign the returned value to a variable, but
3068there is one situation where an automatic assignment happens. If
3069and only if the input symbol is the only thing inside the conditional
3070of a C<while> statement (even if disguised as a C<for(;;)> loop),
ba7f043c 3071the value is automatically assigned to the global variable C<$_>,
19799a22
GS
3072destroying whatever was there previously. (This may seem like an
3073odd thing to you, but you'll use the construct in almost every Perl
ba7f043c
KW
3074script you write.) The C<$_> variable is not implicitly localized.
3075You'll have to put a S<C<local $_;>> before the loop if you want that
19799a22
GS
3076to happen.
3077
3078The following lines are equivalent:
a0d0e21e 3079
748a9306 3080 while (defined($_ = <STDIN>)) { print; }
7b8d334a 3081 while ($_ = <STDIN>) { print; }
a0d0e21e
LW
3082 while (<STDIN>) { print; }
3083 for (;<STDIN>;) { print; }
748a9306 3084 print while defined($_ = <STDIN>);
7b8d334a 3085 print while ($_ = <STDIN>);
a0d0e21e
LW
3086 print while <STDIN>;
3087
a727cfac 3088This also behaves similarly, but assigns to a lexical variable
1ca345ed 3089instead of to C<$_>:
7b8d334a 3090
89d205f2 3091 while (my $line = <STDIN>) { print $line }
7b8d334a 3092
19799a22
GS
3093In these loop constructs, the assigned value (whether assignment
3094is automatic or explicit) is then tested to see whether it is
1ca345ed
TC
3095defined. The defined test avoids problems where the line has a string
3096value that would be treated as false by Perl; for example a C<""> or
ba7f043c 3097a C<"0"> with no trailing newline. If you really mean for such values
19799a22 3098to terminate the loop, they should be tested for explicitly:
7b8d334a
GS
3099
3100 while (($_ = <STDIN>) ne '0') { ... }
3101 while (<STDIN>) { last unless $_; ... }
3102
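The difference between the implicit C<defined> test and a plain truth test can be sketched with an in-memory filehandle whose last line is C<"0"> without a trailing newline. The data and variable names are illustrative.

```perl
use strict;
use warnings;

my $data = "first\n0";   # the final line is "0" with no newline

# Implicit defined() test: both lines are read.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $with_defined = 0;
while (my $line = <$fh>) { $with_defined++ }
close $fh;

# Explicit truth test: the loop stops at the false value "0".
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $with_truth = 0;
while (1) {
    my $line = <$fh>;
    last unless $line;
    $with_truth++;
}
close $fh;

print "$with_defined $with_truth\n";
```
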
ba7f043c 3103In other boolean contexts, C<< <I<FILEHANDLE>> >> without an
5ef4d93e 3104explicit C<defined> test or comparison elicits a warning if the
ba7f043c 3105S<C<use warnings>> pragma or the B<-w>
19799a22 3106command-line switch (the C<$^W> variable) is in effect.
7b8d334a 3107
5f05dabc 3108The filehandles STDIN, STDOUT, and STDERR are predefined. (The
19799a22
GS
3109filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
3110in packages, where they would be interpreted as local identifiers
3111rather than global.) Additional filehandles may be created with
ba7f043c 3112the C<open()> function, amongst others. See L<perlopentut> and
19799a22 3113L<perlfunc/open> for details on this.
d74e8afc 3114X<stdin> X<stdout> X<stderr>
a0d0e21e 3115
ba7f043c 3116If a C<< <I<FILEHANDLE>> >> is used in a context that is looking for
19799a22
GS
3117a list, a list comprising all input lines is returned, one line per
3118list element. It's easy to grow to a rather large data space this
3119way, so use with care.
a0d0e21e 3120
ba7f043c 3121C<< <I<FILEHANDLE>> >> may also be spelled C<readline(*I<FILEHANDLE>)>.
19799a22 3122See L<perlfunc/readline>.
fbad3eb5 3123
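A short sketch of scalar and list context on the same handle, using an in-memory file with made-up contents:

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $first = readline($fh);   # scalar context: the next line only
my @rest  = <$fh>;           # list context: all remaining lines

close $fh;
print "first=$first", "rest has ", scalar(@rest), " lines\n";
```
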
ba7f043c 3124The null filehandle C<< <> >> is special: it can be used to emulate the
1ca345ed
TC
3125behavior of B<sed> and B<awk>, and any other Unix filter program
3126that takes a list of filenames, doing the same to each line
ba7f043c 3127of input from all of them. Input from C<< <> >> comes either from
a0d0e21e 3128standard input, or from each file listed on the command line. Here's
ba7f043c
KW
3129how it works: the first time C<< <> >> is evaluated, the C<@ARGV> array is
3130checked, and if it is empty, C<$ARGV[0]> is set to C<"-">, which when opened
3131gives you standard input. The C<@ARGV> array is then processed as a list
a0d0e21e
LW
3132of filenames. The loop
3133
3134 while (<>) {
3135 ... # code for each line
3136 }
3137
3138is equivalent to the following Perl-like pseudo code:
3139
3e3baf6d 3140 unshift(@ARGV, '-') unless @ARGV;
a0d0e21e
LW
3141 while ($ARGV = shift) {
3142 open(ARGV, $ARGV);
3143 while (<ARGV>) {
3144 ... # code for each line
3145 }
3146 }
3147
19799a22 3148except that it isn't so cumbersome to say, and will actually work.
ba7f043c
KW
3149It really does shift the C<@ARGV> array and put the current filename
3150into the C<$ARGV> variable. It also uses filehandle I<ARGV>
3151internally. C<< <> >> is just a synonym for C<< <ARGV> >>, which
19799a22 3152is magical. (The pseudo code above doesn't work because it treats
ba7f043c 3153C<< <ARGV> >> as non-magical.)
a0d0e21e 3154
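The behavior can be observed without a real command line by localizing C<@ARGV>. In this sketch, two temporary files created with C<File::Temp> stand in for filenames passed to the script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files play the role of command-line arguments.
my @names;
for my $content ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close failed: $!";
    push @names, $name;
}

my @seen;
{
    local @ARGV = @names;
    while (<>) {
        chomp;
        push @seen, "$.:$_";   # $. keeps counting across the files
    }
}

print "$_\n" for @seen;
```
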
48ab5743
ML
3155Since the null filehandle uses the two argument form of L<perlfunc/open>
3156it interprets special characters, so if you have a script like this:
3157
3158 while (<>) {
3159 print;
3160 }
3161
ba7f043c 3162and call it with S<C<perl dangerous.pl 'rm -rfv *|'>>, it actually opens a
48ab5743
ML
3163pipe, executes the C<rm> command and reads C<rm>'s output from that pipe.
3164If you want all items in C<@ARGV> to be interpreted as file names, you
1033ba6e
PM
3165can use the module C<ARGV::readonly> from CPAN, or use the double bracket:
3166
3167 while (<<>>) {
3168 print;
3169 }
3170
3171Using double angle brackets inside of a C<while> causes the open to use the
3172three argument form (with the second argument being C<< < >>), so all
ba7f043c
KW
3173arguments in C<@ARGV> are treated as literal filenames (including C<"-">).
3174(Note that for convenience, if you use C<< <<>> >> and if C<@ARGV> is
80a96bfc 3175empty, it will still read from the standard input.)
48ab5743 3176
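The following sketch shows the safer behavior. It assumes Perl 5.22 or later, where the double-bracket form is available, and uses a temporary file plus a pipe-like string as hypothetical arguments. The string that looks like a command is treated as a filename that fails to open, so nothing is executed.

```perl
use 5.022;
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $name) = tempfile(UNLINK => 1);
print {$fh} "safe line\n";
close $fh or die "close failed: $!";

my (@lines, @complaints);
{
    # Collect the "Can't open" warning instead of printing it.
    local $SIG{__WARN__} = sub { push @complaints, @_ };
    local @ARGV = ('echo injected |', $name);
    while (my $line = <<>>) {
        push @lines, $line;
    }
}

print @lines;
```
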
ba7f043c 3177You can modify C<@ARGV> before the first C<< <> >> as long as the array ends up
a0d0e21e 3178containing the list of filenames you really want. Line numbers (C<$.>)
19799a22
GS
3179continue as though the input were one big happy file. See the example
3180in L<perlfunc/eof> for how to reset line numbers on each file.
5a964f20 3181
ba7f043c
KW
3182If you want to set C<@ARGV> to your own list of files, go right ahead.
3183This sets C<@ARGV> to all plain text files if no C<@ARGV> was given:
5a964f20
TC
3184
3185 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
a0d0e21e 3186
5a964f20
TC
3187You can even set them to pipe commands. For example, this automatically
3188filters compressed arguments through B<gzip>:
3189
3190 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
3191
3192If you want to pass switches into your script, you can use one of the
ba7f043c 3193C<Getopt> modules (such as L<Getopt::Long> or L<Getopt::Std>) or put a loop on the front like this:
a0d0e21e
LW
3194
3195 while ($_ = $ARGV[0], /^-/) {
3196 shift;
3197 last if /^--$/;
3198 if (/^-D(.*)/) { $debug = $1 }
3199 if (/^-v/) { $verbose++ }
5a964f20 3200 # ... # other switches
a0d0e21e 3201 }
5a964f20 3202
a0d0e21e 3203 while (<>) {
5a964f20 3204 # ... # code for each line
a0d0e21e
LW
3205 }
3206
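For comparison, here is a sketch of similar switch handling with L<Getopt::Long>. The option names and the sample argument list are assumptions made for the example; C<GetOptionsFromArray> is used so that the sketch does not depend on the real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my @args = ('-v', '-v', '--debug=3', 'input.txt');
my ($verbose, $debug) = (0, 0);

GetOptionsFromArray(
    \@args,
    'verbose|v+' => \$verbose,   # each -v increments the counter
    'debug|D=i'  => \$debug,     # takes an integer value
) or die "bad switches\n";

# Whatever is left in @args would then be read with <> in a real script.
print "verbose=$verbose debug=$debug rest=@args\n";
```
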
ba7f043c 3207The C<< <> >> symbol will return C<undef> for end-of-file only once.
89d205f2 3208If you call it again after this, it will assume you are processing another
ba7f043c 3209C<@ARGV> list, and if you haven't set C<@ARGV>, will read input from STDIN.
a0d0e21e 3210
1ca345ed 3211If what the angle brackets contain is a simple scalar variable (for example,
ba7f043c 3212C<$foo>), then that variable contains the name of the
19799a22
GS
3213filehandle to input from, or its typeglob, or a reference to the
3214same. For example:
cb1a09d0
AD
3215
3216 $fh = \*STDIN;
3217 $line = <$fh>;
a0d0e21e 3218
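The same holds for lexical filehandles. In this sketch, with an in-memory file and hypothetical names, a handle stored in a simple scalar is read with angle brackets, while a handle stored in a hash element is read with C<readline()>, since only a simple scalar variable is recognized inside the angle brackets.

```perl
use strict;
use warnings;

my $data = "first\nsecond\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my %handle_for = (input => $fh);

my $first  = <$fh>;                          # simple scalar: a readline
my $second = readline($handle_for{input});   # hash element: call readline()

close $fh;
print $first, $second;
```
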
5a964f20
TC
3219If what's within the angle brackets is neither a filehandle nor a simple
3220scalar variable containing a filehandle name, typeglob, or typeglob
3221reference, it is interpreted as a filename pattern to be globbed, and
3222either a list of filenames or the next filename in the list is returned,
19799a22 3223depending on context. This distinction is determined on syntactic
ba7f043c
KW
3224grounds alone. That means C<< <$x> >> is always a C<readline()> from
3225an indirect handle, but C<< <$hash{key}> >> is always a C<glob()>.
3226That's because C<$x> is a simple scalar variable, but C<$hash{key}> is
ef191