This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
=head1 NAME
X<operator>

perlop - Perl operators and precedence

=head1 DESCRIPTION

In Perl, the operator determines what operation is performed,
independent of the type of the operands. For example S<C<$x + $y>>
is always a numeric addition, and if C<$x> or C<$y> do not contain
numbers, an attempt is made to convert them to numbers first.

This is in contrast to many other dynamic languages, where the
operation is determined by the type of the first argument. It also
means that Perl has two versions of some operators, one for numeric
and one for string comparison. For example S<C<$x == $y>> compares
two numbers for equality, and S<C<$x eq $y>> compares two strings.

There are a few exceptions though: C<x> can be either string
repetition or list repetition, depending on the type of the left
operand, and C<&>, C<|>, C<^> and C<~> can be either string or numeric bit
operations.

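As a minimal sketch of this rule, the same two string operands below are added, concatenated, and compared both ways. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $x = "10";
my $y = "10.0";

# The operator, not the operand type, selects the operation.
my $sum       = $x + $y;                # numeric addition
my $joined    = $x . $y;                # string concatenation
my $num_equal = ($x == $y) ? 1 : 0;     # 10 and 10.0 are the same number
my $str_equal = ($x eq $y) ? 1 : 0;     # "10" and "10.0" are different strings

print "sum=$sum joined=$joined num_equal=$num_equal str_equal=$str_equal\n";
# prints "sum=20 joined=1010.0 num_equal=1 str_equal=0"
```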
=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in S<C<2 + 4 * 5>>, the multiplication has higher
precedence so S<C<4 * 5>> is evaluated first yielding S<C<2 + 20 ==
22>> and not S<C<6 * 5 == 30>>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first, or the right first. For example, in
S<C<8 - 4 - 2>>, subtraction is left associative so Perl evaluates the
expression left to right. S<C<8 - 4>> is evaluated first making the
expression S<C<4 - 2 == 2>> and not S<C<8 - 2 == 6>>.

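The two worked expressions above can be checked directly. This sketch also adds one right-associative case, C<**>, as a contrast; the variable names are illustrative.

```perl
use strict;
use warnings;

my $precedence  = 2 + 4 * 5;    # multiplication binds tighter: 2 + 20
my $left_assoc  = 8 - 4 - 2;    # left associative: (8 - 4) - 2
my $right_assoc = 2 ** 3 ** 2;  # right associative: 2 ** (3 ** 2)

print "$precedence $left_assoc $right_assoc\n";   # prints "22 2 512"
```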
Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc. goto last next redo dump
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in detail, in the
same order in which they appear in the table above.

Many operators can be overloaded for objects. See L<overload>.

=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>

A TERM has the highest precedence in Perl. Terms include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the C<sort> are evaluated before the C<sort>,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of S<C<$foo & 255>>). Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L</Named Unary Operators> for more discussion of this.

Also parsed as terms are the S<C<do {}>> and S<C<eval {}>> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L</Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.

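The following sketch shows the same parsing rules with C<sort> and C<reverse>, which return values that are easy to inspect. The array names are illustrative; the results are collected into arrays rather than printed one by one.

```perl
use strict;
use warnings;

# Without parentheses, sort takes everything to its right as arguments.
my @ary = (1, 3, sort 4, 2);               # (1, 3, 2, 4)

# A left parenthesis right after the operator makes it a function call,
# so only (1, 2) is reversed and 3 is appended afterwards.
my @as_function = (reverse(1, 2), 3);      # (2, 1, 3)

# With no parentheses, reverse takes all three values.
my @as_list_op = (reverse 1, 2, 3);        # (3, 2, 1)

# Unary plus keeps the parenthesis from starting an argument list.
my @with_plus = (reverse +(1, 2), 3);      # (3, 2, 1)

print "@ary | @as_function | @as_list_op | @with_plus\n";
# prints "1 3 2 4 | 2 1 3 | 3 2 1 | 3 2 1"
```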
=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

The dereferencing cases (as opposed to method-calling cases) are
somewhat extended by the C<postderef> feature. For the
details of that feature, consult L<perlref/Postfix Dereference Syntax>.

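A short sketch of both uses of the arrow follows: the three subscript forms, then method calls. The C<Counter> class is hypothetical and exists only for this illustration; the C<package NAME { ... }> block form assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { lang => "perl" };
my $cref = sub { return $_[0] * 2 };

my $element = $aref->[1];       # array subscript: 20
my $value   = $href->{lang};    # hash subscript: "perl"
my $doubled = $cref->(21);      # subroutine call: 42

# Counter is a hypothetical class used only for illustration.
package Counter {
    sub new  { my ($class) = @_; return bless { count => 0 }, $class }
    sub bump { my ($self)  = @_; return ++$self->{count} }
}

my $counter = Counter->new;             # class name on the left
my $first   = $counter->bump;           # object on the left: 1
my $method  = "bump";
my $second  = $counter->$method();      # method name held in a scalar: 2

print "$element $value $doubled $first $second\n";
# prints "20 perl 42 1 2"
```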
=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->

C<"++"> and C<"--"> work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented. You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behavior.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = "99");      # prints "100"
    print ++($foo = "a0");      # prints "a1"
    print ++($foo = "Az");      # prints "Ba"
    print ++($foo = "zz");      # prints "aaa"

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

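The rules above can be sketched in one place: pre- and post-increment, the string increment with carry, the C<undef> case, and the non-magical decrement. Variable names are illustrative, and the C<no warnings 'numeric'> block is there only to silence the warning that the decrement would otherwise raise.

```perl
use strict;
use warnings;

my $i = 0;
my $post = $i++;        # returns 0, then $i becomes 1
my $pre  = ++$i;        # $i becomes 2, then returns 2

# String increment applies to values matching /^[a-zA-Z]*[0-9]*\z/.
my $label = "a9";  $label++;    # "b0"
my $mixed = "Zz";  $mixed++;    # "AAa"

# undef is treated as 0, so post-increment returns 0 rather than undef.
my $unset;
my $old = $unset++;             # $old is 0, $unset is 1

# Decrement is not magical: the string is converted to a number first.
my $plain = "aa";
{
    no warnings 'numeric';      # silence the "isn't numeric" warning
    $plain--;                   # "aa" counts as 0, so the result is -1
}

print "$post $pre $label $mixed $old $unset $plain\n";
# prints "0 2 b0 AAa 0 1 -1"
```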
=head2 Exponentiation
X<**> X<exponentiation> X<power>

Binary C<"**"> is the exponentiation operator. It binds even more
tightly than unary minus, so C<-2**4> is C<-(2**4)>, not C<(-2)**4>.
(This is implemented using C's C<pow(3)> function, which actually works
on doubles internally.)

Note that certain exponentiation expressions are ill-defined:
these include C<0**0>, C<1**Inf>, and C<Inf**0>. Do not expect
any particular results from these special cases; the results
are platform-dependent.

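A brief sketch of the interaction with unary minus, plus a negative exponent to show that the result is a floating-point value. The ill-defined cases are deliberately left out because their results are platform-dependent.

```perl
use strict;
use warnings;

my $negated = -2 ** 4;      # parsed as -(2 ** 4)
my $grouped = (-2) ** 4;    # parentheses force the other reading
my $inverse = 2 ** -1;      # negative exponents give fractions

print "$negated $grouped $inverse\n";   # prints "-16 16 0.5"
```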
=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>

Unary C<"!"> performs logical negation, that is, "not". See also
L<C<not>|/Logical Not> for a lower precedence version of this.
X<!>

Unary C<"-"> performs arithmetic negation if the operand is numeric,
including any string that looks like a number. If the operand is
an identifier, a string consisting of a minus sign concatenated
with the identifier is returned. Otherwise, if the string starts
with a plus or minus, a string starting with the opposite sign is
returned. One effect of these rules is that C<-bareword> is equivalent
to the string C<"-bareword">. If, however, the string begins with a
non-alphabetic character (excluding C<"+"> or C<"-">), Perl will attempt
to convert the string to a numeric, and the arithmetic negation is
performed. If the string cannot be cleanly converted to a numeric, Perl
will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>

Unary C<"~"> performs bitwise negation, that is, 1's complement. For
example, S<C<0666 & ~027>> is 0640. (See also L</Integer Arithmetic> and
L</Bitwise String Operators>.) Note that the width of the result is
platform-dependent: C<~0> is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the C<"&"> operator to mask off the excess bits.
X<~> X<negation, binary>

Starting in Perl 5.28, it is a fatal error to try to complement a string
containing a character with an ordinal value above 255.

If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then unary C<"~"> always treats its argument as a number, and an
alternate form of the operator, C<"~.">, always treats its argument as a
string. So C<~0> and C<~"0"> will both give 2**32-1 on 32-bit platforms,
whereas C<~.0> and C<~."0"> will both yield C<"\xcf"> (the complement of
the single byte C<"0">, which is 0x30). This feature
produces a warning unless you use S<C<no warnings 'experimental::bitwise'>>.

Unary C<"+"> has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L</Terms and List Operators (Leftward)>.)
X<+>

Unary C<"\"> creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>

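The sketch below exercises C<!>, unary C<->, and C<~> on small values. The names are illustrative, and the mask on C<~0> is there because the unmasked width depends on the platform.

```perl
use strict;
use warnings;

my $not_true  = !1;                 # false: the empty string ""
my $not_false = !0;                 # true: 1

my $negated   = -"10";              # numeric string: -10
my $dashed    = -"foo";             # identifier-like string: "-foo"
my $flipped   = -"-foo";            # leading minus becomes plus: "+foo"

my $perms     = 0666 & ~027;        # 0640 octal, which is 416 decimal
my $low_byte  = ~0 & 0xFF;          # mask to a known width: 255

printf "%s|%s|%s|%s|%s|%o|%d\n",
    $not_true, $not_false, $negated, $dashed, $flipped, $perms, $low_byte;
# prints "|1|-10|-foo|+foo|640|255"
```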
=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>

Binary C<"=~"> binds a scalar expression to a pattern match. Certain operations
search or modify the string C<$_> by default. This operator makes that kind
of operation work on some other string. The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
C<$_>. When used in scalar context, the return value generally indicates the
success of the operation. The exceptions are substitution (C<s///>)
and transliteration (C<y///>) with the C</r> (non-destructive) option,
which cause the B<r>eturn value to be the result of the substitution.
Behavior in list context depends on the particular operator.
See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
examples using these operators.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its contents will be interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.

Binary C<"!~"> is just like C<"=~"> except the return value is negated in
the logical sense.

Binary C<"!~"> with a non-destructive substitution (C<s///r>) or transliteration
(C<y///r>) is a syntax error.

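A minimal sketch of the binding operators, assuming Perl 5.14 or later so that the C</r> option is available. The string and variable names are illustrative.

```perl
use v5.14;          # s///r and tr///r need Perl 5.14 or later
use warnings;

my $text = "Hello, World";

my $found  = $text =~ /World/ ? "yes" : "no";   # match against $text
my $absent = $text !~ /Perl/  ? "yes" : "no";   # negated match

my $changed = $text =~ s/World/Perl/r;          # returns a modified copy
my $upper   = $text =~ tr/a-z/A-Z/r;            # $text itself is unchanged

# An expression on the right is treated as a pattern at run time.
my $pattern = '(W\w+)';
my ($word)  = $text =~ $pattern;                # list context returns captures

say "$found $absent | $changed | $upper | $word | $text";
# prints "yes yes | Hello, Perl | HELLO, WORLD | World | Hello, World"
```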
=head2 Multiplicative Operators
X<operator, multiplicative>

Binary C<"*"> multiplies two numbers.
X<*>

Binary C<"/"> divides two numbers.
X</> X<slash>

Binary C<"%"> is the modulo operator, which computes the division
remainder of its first argument with respect to its second argument.
Given integer
operands C<$m> and C<$n>: If C<$n> is positive, then S<C<$m % $n>> is
C<$m> minus the largest multiple of C<$n> less than or equal to
C<$m>. If C<$n> is negative, then S<C<$m % $n>> is C<$m> minus the
smallest multiple of C<$n> that is not less than C<$m> (that is, the
result will be less than or equal to zero). If the operands
C<$m> and C<$n> are floating point values and the absolute value of
C<$n> (that is C<abs($n)>) is less than S<C<(UV_MAX + 1)>>, only
the integer portion of C<$m> and C<$n> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
If the absolute value of the right operand (C<abs($n)>) is greater than
or equal to S<C<(UV_MAX + 1)>>, C<"%"> computes the floating-point remainder
C<$r> in the equation S<C<($r = $m - $i*$n)>> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
C<$n> (B<not> as the left operand C<$m> like C function C<fmod()>)
and the absolute value less than that of C<$n>.
Note that when S<C<use integer>> is in scope, C<"%"> gives you direct access
to the modulo operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
X<%> X<remainder> X<modulo> X<mod>

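The sign rules for C<%> are easiest to see side by side. This sketch covers the four sign combinations for small integers and one floating-point case; it does not use S<C<use integer>>, so the results follow the rules described above rather than the C compiler's.

```perl
use strict;
use warnings;

my @results = (
     7 %  3,    #  1
    -7 %  3,    #  2: the result takes the sign of the right operand
     7 % -3,    # -2
    -7 % -3,    # -1
    7.9 % 3.9,  #  1: only the integer parts (7 and 3) are used
);

print "@results\n";     # prints "1 2 -2 -1 1"
```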
Binary C<"x"> is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/I<STRING>/>, it repeats the list.
If the right operand is zero or negative (raising a warning on
negative), it returns an empty string
or an empty list, depending on the context.
X<x>

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5

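The difference between string repetition and list repetition depends on the parentheses around the left operand, as this sketch shows. The variable names are illustrative.

```perl
use strict;
use warnings;

my $rule   = "-" x 5;           # string repetition: "-----"
my $empty  = "ab" x 0;          # zero repetitions: ""
my @pairs  = (1, 2) x 2;        # list repetition: (1, 2, 1, 2)
my @single = "ab" x 2;          # no parentheses: one element, "abab"
my @zeros  = (0) x 3;           # (0, 0, 0)
@zeros     = (7) x @zeros;      # right operand is the element count: three 7s

print "$rule|$empty|@pairs|@single|@zeros\n";
# prints "-----||1 2 1 2|abab|7 7 7"
```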
=head2 Additive Operators
X<operator, additive>

Binary C<"+"> returns the sum of two numbers.
X<+>

Binary C<"-"> returns the difference of two numbers.
X<->

Binary C<"."> concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>

=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>

Binary C<<< "<<" >>> returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L</Integer Arithmetic>.)

Binary C<<< ">>" >>> returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L</Integer Arithmetic>.)

If S<C<use integer>> (see L</Integer Arithmetic>) is in force then
signed C integers are used (I<arithmetic shift>), otherwise unsigned C
integers are used (I<logical shift>), even for negative shiftees.
In arithmetic right shift the sign bit is replicated on the left,
in logical shift zero bits come in from the left.

Either way, the implementation isn't going to generate results larger
than the size of the integer type Perl was built with (32 bits or 64 bits).

Shifting by a negative number of bits means the reverse shift: left
shift becomes right shift, right shift becomes left shift. This is
unlike in C, where negative shift is undefined.

Shifting by more bits than the size of the integers usually yields
zero (all bits fall off), except that under S<C<use integer>>
right overshifting a negative shiftee results in -1. This is unlike
in C, where shifting by too many bits is undefined. A common C
behavior is "shift by modulo wordbits", so that for example

    1 >> 64 == 1 >> (64 % 64) == 1 >> 0 == 1  # Common C behavior.

but that is completely accidental.

If you get tired of being subject to your platform's native integers,
the S<C<use bigint>> pragma neatly sidesteps the issue altogether:

    print 20 << 20;  # 20971520
    print 20 << 40;  # 5120 on 32-bit machines,
                     # 21990232555520 on 64-bit machines
    use bigint;
    print 20 << 100; # 25353012004564588029934064107520

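A small sketch of logical versus arithmetic shifts and of a negative shift count. It assumes a Perl recent enough to implement the reverse-shift behavior described above (Perl 5.24 or later is an assumption here, not something this section states); the exact value of the logical shift depends on the integer width, so only its sign is checked.

```perl
use strict;
use warnings;

my $bits  = 2;
my $left  = 1 << 4;             # 16
my $right = 256 >> 4;           # 16
my $back  = 16 << -$bits;       # negative count reverses direction: 4

my $logical = -8 >> 1;          # unsigned shift: a large positive number
my $arithmetic;
{
    use integer;
    $arithmetic = -8 >> 1;      # signed shift keeps the sign: -4
}

print "$left $right $back $arithmetic\n";
print "logical shift is positive\n" if $logical > 0;
```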
=head2 Named Unary Operators
X<operator, named unary>

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (C<print()>, etc.) or any unary operator (C<chdir()>, etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call. For example,
because named unary operators are higher precedence than C<||>:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because C<"*"> is higher precedence than named operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc. are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to S<C<-f "$file.bak">>.
X<-X> X<filetest> X<operator, filetest>

See also L</"Terms and List Operators (Leftward)">.

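Because C<chdir> and C<rand> have side effects or random results, the same parsing rules are sketched here with C<length>, another named unary operator, so the outcomes are deterministic. The variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "ab";

my $or    = length $s || 5;         # (length $s) || 5   gives 2
my $times = length $s x 2;          # length($s x 2)     gives 4
my $call  = length($s) x 2;         # (length $s) x 2    gives "22"
my $plus  = length +($s) x 2;       # length(($s) x 2)   gives 4

print "$or $times $call $plus\n";   # prints "2 4 22 4"
```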
=head2 Relational Operators
X<relational operator> X<operator, relational>

Perl operators that return true or false generally return values
that can be safely used as numbers. For example, the relational
operators in this section and the equality operators in the next
one return C<1> for true and, for false, a special version of the
defined empty string, C<"">, which counts as a zero but is exempt from
warnings about improper numeric conversions, just as S<C<"0 but true">> is.

Binary C<< "<" >> returns true if the left argument is numerically less than
the right argument.
X<< < >>

Binary C<< ">" >> returns true if the left argument is numerically greater
than the right argument.
X<< > >>

Binary C<< "<=" >> returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>

Binary C<< ">=" >> returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>

Binary C<"lt"> returns true if the left argument is stringwise less than
the right argument.
X<< lt >>

Binary C<"gt"> returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>

Binary C<"le"> returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>

Binary C<"ge"> returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>

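This sketch contrasts a numeric and a stringwise comparison of the same digits, then inspects the false value to show that it is a defined empty string that also works as zero. The names are illustrative.

```perl
use strict;
use warnings;

my $numeric = 10 < 9;           # false: 10 is not less than 9
my $string  = "10" lt "9";      # true: "1" sorts before "9"

# False is a defined empty string that also acts as 0 without a warning.
my $is_defined = defined $numeric ? 1 : 0;
my $as_string  = "[$numeric]";
my $as_number  = $numeric + 0;

print "$string $is_defined $as_string $as_number\n";   # prints "1 1 [] 0"
```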
=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>

Binary C<< "==" >> returns true if the left argument is numerically equal to
the right argument.
X<==>

Binary C<< "!=" >> returns true if the left argument is numerically not equal
to the right argument.
X<!=>

Binary C<< "<=>" >> returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports C<NaN>'s (not-a-numbers) as numeric
values, using them with C<< "<=>" >> returns undef. C<NaN> is not
C<< "<" >>, C<< "==" >>, C<< ">" >>, C<< "<=" >> or C<< ">=" >> anything
(even C<NaN>), so those 5 return false. S<C<< NaN != NaN >>> returns
true, as does S<C<NaN !=> I<anything else>>. If your platform doesn't
support C<NaN>'s then C<NaN> is just a string with numeric value 0.
X<< <=> >>
X<spaceship>

    $ perl -le '$x = "NaN"; print "No NaN support here" if $x == $x'
    $ perl -le '$x = "NaN"; print "NaN support here" if $x != $x'

(Note that the L<bigint>, L<bigrat>, and L<bignum> pragmas all
support C<"NaN">.)

Binary C<"eq"> returns true if the left argument is stringwise equal to
the right argument.
X<eq>

Binary C<"ne"> returns true if the left argument is stringwise not equal
to the right argument.
X<ne>

Binary C<"cmp"> returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>

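The three-way comparison operators are most often seen in C<sort> blocks. As a sketch, the same list is sorted numerically with C<< <=> >> and stringwise with C<cmp>; the names are illustrative, and C<NaN> is left out because its handling is platform-dependent.

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @by_number = sort { $a <=> $b } @values;     # 9 10 100
my @by_string = sort { $a cmp $b } @values;     # 10 100 9

my $num_order = 2 <=> 10;       # -1: 2 is numerically less than 10
my $str_order = "2" cmp "10";   #  1: "2" sorts after "1"
my $same      = ("1.0" == 1) && ("1.0" ne "1");   # equal as numbers only

print "@by_number | @by_string | $num_order $str_order $same\n";
# prints "9 10 100 | 10 100 9 | -1 1 1"
```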
Binary C<"~~"> does a smartmatch between its arguments. Smart matching
is described in the next section.
X<~~>

C<"lt">, C<"le">, C<"ge">, C<"gt"> and C<"cmp"> use the collation (sort)
order specified by the current C<LC_COLLATE> locale if a S<C<use
locale>> form that includes collation is in effect. See L<perllocale>.
Do not mix these with Unicode; use them only with legacy 8-bit locale
encodings. The standard C<L<Unicode::Collate>> and
C<L<Unicode::Collate::Locale>> modules offer much more powerful
solutions to collation issues.

For case-insensitive comparisons, look at the L<perlfunc/fc> case-folding
function, available in Perl v5.16 or later:

    if ( fc($x) eq fc($y) ) { ... }

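To illustrate why C<fc> is suggested rather than C<lc>, this sketch compares two spellings of one word. It relies on S<C<use v5.16>> enabling both the C<fc> and C<unicode_strings> features; the sample strings are illustrative.

```perl
use v5.16;          # enables the fc function and unicode_strings
use warnings;

my $x = "Stra\x{DF}e";      # "Strasse" spelled with a sharp s (U+00DF)
my $y = "STRASSE";

# lc leaves the sharp s alone, while fc folds it to "ss".
my $with_lc = lc($x) eq lc($y) ? "equal" : "different";
my $with_fc = fc($x) eq fc($y) ? "equal" : "different";

say "lc: $with_lc, fc: $with_fc";   # prints "lc: different, fc: equal"
```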
=head2 Smartmatch Operator

First available in Perl 5.10.1 (the 5.10.0 version behaved differently),
binary C<~~> does a "smartmatch" between its arguments. This is mostly
used implicitly in the C<when> construct described in L<perlsyn>, although
not all C<when> clauses call the smartmatch operator. Unique among all of
Perl's operators, the smartmatch operator can recurse. The smartmatch
operator is L<experimental|perlpolicy/experimental> and its behavior is
subject to change.

It is also unique in that all other Perl operators impose a context
(usually string or numeric context) on their operands, autoconverting
those operands to those imposed contexts. In contrast, smartmatch
I<infers> contexts from the actual types of its operands and uses that
type information to select a suitable comparison mechanism.

The C<~~> operator compares its operands "polymorphically", determining how
to compare them according to their actual types (numeric, string, array,
hash, etc.). Like the equality operators with which it shares the same
precedence, C<~~> returns 1 for true and C<""> for false. It is often best
read aloud as "in", "inside of", or "is contained in", because the left
operand is often looked for I<inside> the right operand. That makes the
order of the operands to the smartmatch operator often opposite that of
the regular match operator. In other words, the "smaller" thing is usually
placed in the left operand and the larger one in the right.

The behavior of a smartmatch depends on what type of things its arguments
are, as determined by the following table. The first row of the table
whose types apply determines the smartmatch behavior. Because what
actually happens is mostly determined by the type of the second operand,
the table is sorted on the right operand instead of on the left.

    Left      Right      Description and pseudocode
    ===============================================================
    Any       undef      check whether Any is undefined
                         like: !defined Any

    Any       Object     invoke ~~ overloading on Object, or die

 Right operand is an ARRAY:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY1    ARRAY2     recurse on paired elements of ARRAY1 and ARRAY2[2]
                         like: (ARRAY1[0] ~~ ARRAY2[0])
                                 && (ARRAY1[1] ~~ ARRAY2[1]) && ...
    HASH      ARRAY      any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    ARRAY      any ARRAY elements pattern match Regexp
                         like: grep { /Regexp/ } ARRAY
    undef     ARRAY      undef in ARRAY
                         like: grep { !defined } ARRAY
    Any       ARRAY      smartmatch each ARRAY element[3]
                         like: grep { Any ~~ $_ } ARRAY

 Right operand is a HASH:

    Left      Right      Description and pseudocode
    ===============================================================
    HASH1     HASH2      all same keys in both HASHes
                         like: keys HASH1 ==
                                  grep { exists HASH2->{$_} } keys HASH1
    ARRAY     HASH       any ARRAY elements exist as HASH keys
                         like: grep { exists HASH->{$_} } ARRAY
    Regexp    HASH       any HASH keys pattern match Regexp
                         like: grep { /Regexp/ } keys HASH
    undef     HASH       always false (undef can't be a key)
                         like: 0 == 1
    Any       HASH       HASH key existence
                         like: exists HASH->{Any}

 Right operand is CODE:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     CODE       sub returns true on all ARRAY elements[1]
                         like: !grep { !CODE->($_) } ARRAY
    HASH      CODE       sub returns true on all HASH keys[1]
                         like: !grep { !CODE->($_) } keys HASH
    Any       CODE       sub passed Any returns true
                         like: CODE->(Any)

 Right operand is a Regexp:

    Left      Right      Description and pseudocode
    ===============================================================
    ARRAY     Regexp     any ARRAY elements match Regexp
                         like: grep { /Regexp/ } ARRAY
    HASH      Regexp     any HASH keys match Regexp
                         like: grep { /Regexp/ } keys HASH
    Any       Regexp     pattern match
                         like: Any =~ /Regexp/

 Other:

    Left      Right      Description and pseudocode
    ===============================================================
    Object    Any        invoke ~~ overloading on Object,
                         or fall back to...

    Any       Num        numeric equality
                         like: Any == Num
    Num       nummy[4]   numeric equality
                         like: Num == nummy
    undef     Any        check whether undefined
                         like: !defined(Any)
    Any       Any        string equality
                         like: Any eq Any

657Notes:
658
659=over
660
661=item 1.
a727cfac 662Empty hashes or arrays match.
1ca345ed
TC
663
664=item 2.
40bec8a5 665That is, each element smartmatches the element of the same index in the other array.[3]
1ca345ed
TC
666
667=item 3.
a727cfac 668If a circular reference is found, fall back to referential equality.
1ca345ed
TC
669
670=item 4.
671Either an actual number, or a string that looks like one.
672
673=back
674
The smartmatch implicitly dereferences any non-blessed hash or array
reference, so the C<I<HASH>> and C<I<ARRAY>> entries apply in those cases.
For blessed references, the C<I<Object>> entries apply. Smartmatches
involving hashes only consider hash keys, never hash values.

The "like" code entry is not always an exact rendition. For example, the
smartmatch operator short-circuits whenever possible, but C<grep> does
not. Also, C<grep> in scalar context returns the number of matches, but
C<~~> returns only true or false.

Unlike most operators, the smartmatch operator knows to treat C<undef>
specially:

    use v5.10.1;
    @array = (1, 2, 3, undef, 4, 5);
    say "some elements undefined" if undef ~~ @array;

Each operand is considered in a modified scalar context, the modification
being that array and hash variables are passed by reference to the
operator, which implicitly dereferences them. As a result, the two
statements in each of the following pairs are equivalent:

    use v5.10.1;

    my %hash = (red    => 1, blue   => 2, green  => 3,
                orange => 4, yellow => 5, purple => 6,
                black  => 7, grey   => 8, white  => 9);

    my @array = qw(red blue green);

    say "some array elements in hash keys" if  @array ~~  %hash;
    say "some array elements in hash keys" if \@array ~~ \%hash;

    say "red in array" if "red" ~~  @array;
    say "red in array" if "red" ~~ \@array;

    say "some keys end in e" if /e$/ ~~  %hash;
    say "some keys end in e" if /e$/ ~~ \%hash;

Two arrays smartmatch if each element in the first array smartmatches
(that is, is "in") the corresponding element in the second array,
recursively.

    use v5.10.1;
    my @little = qw(red blue green);
    my @bigger = ("red", "blue", [ "orange", "green" ] );
    if (@little ~~ @bigger) {  # true!
        say "little is contained in bigger";
    }

Because the smartmatch operator recurses on nested arrays, this
will still report that "red" is in the array.

    use v5.10.1;
    my @array = qw(red blue green);
    my $nested_array = [[[[[[[ @array ]]]]]]];
    say "red in array" if "red" ~~ $nested_array;

If two arrays smartmatch each other, then they are deep
copies of each other's values, as this example reports:

    use v5.12.0;
    my @a = (0, 1, 2, [3, [4, 5], 6], 7);
    my @b = (0, 1, 2, [3, [4, 5], 6], 7);

    if (@a ~~ @b && @b ~~ @a) {
        say "a and b are deep copies of each other";
    }
    elsif (@a ~~ @b) {
        say "a smartmatches in b";
    }
    elsif (@b ~~ @a) {
        say "b smartmatches in a";
    }
    else {
        say "a and b don't smartmatch each other at all";
    }

If you were to set S<C<$b[3] = 4>>, then instead of reporting that "a and b
are deep copies of each other", it would report that C<"b smartmatches in a">.
That's because the corresponding position in C<@a> contains an array that
(eventually) has a 4 in it.

Smartmatching one hash against another reports whether both contain the
same keys, no more and no less. This could be used to see whether two
records have the same field names, without caring what values those fields
might have. For example:

    use v5.10.1;
    sub make_dogtag {
        state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };

        my ($class, $init_fields) = @_;

        die "Must supply (only) name, rank, and serial number"
            unless $init_fields ~~ $REQUIRED_FIELDS;

        ...
    }

However, this only does what you mean if C<$init_fields> is indeed a hash
reference. The condition C<$init_fields ~~ $REQUIRED_FIELDS> also allows the
strings C<"name">, C<"rank">, C<"serial_num"> as well as any array reference
that contains C<"name"> or C<"rank"> or C<"serial_num"> anywhere to pass
through.

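One way to avoid that pitfall is to check for a hash reference explicitly
before comparing keys. The following is only a sketch, and it deliberately
does not use smartmatch at all; C<has_exactly_keys> is a hypothetical helper
invented for this illustration, not something Perl provides:

```perl
use strict;
use warnings;

# Hypothetical helper: true only when $candidate is a plain (unblessed)
# hash reference whose keys are exactly the keys of %$required.
sub has_exactly_keys {
    my ($candidate, $required) = @_;
    return 0 unless ref($candidate) eq 'HASH';
    return 0 unless keys(%$candidate) == keys(%$required);
    for my $key (keys %$candidate) {
        return 0 unless exists $required->{$key};
    }
    return 1;
}

my $required = { name => 1, rank => 1, serial_num => 1 };

my $ok     = has_exactly_keys({ name => 'x', rank => 'y', serial_num => 7 },
                              $required);
my $string = has_exactly_keys('name', $required);       # a bare string
my $array  = has_exactly_keys([ 'rank' ], $required);   # an array reference
```

Here only C<$ok> is true; the string and the array reference, both of which
would pass the smartmatch test above, are rejected.
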
The smartmatch operator is most often used as the implicit operator of a
C<when> clause. See the section on "Switch Statements" in L<perlsyn>.

=head3 Smartmatching of Objects

To avoid relying on an object's underlying representation, if the
smartmatch's right operand is an object that doesn't overload C<~~>,
it raises the exception "C<Smartmatching a non-overloaded object
breaks encapsulation>". That's because one has no business digging
around to see whether something is "in" an object. These are all
illegal on objects without a C<~~> overload:

    %hash ~~ $object
       42 ~~ $object
   "fred" ~~ $object

However, you can change the way an object is smartmatched by overloading
the C<~~> operator. The overload is allowed to
extend the usual smartmatch semantics.
For objects that do have a C<~~> overload, see L<overload>.

Using an object as the left operand is allowed, although not very useful.
Smartmatching rules take precedence over overloading, so even if the
object in the left operand has smartmatch overloading, this will be
ignored. A left operand that is a non-overloaded object falls back on a
string or numeric comparison of whatever the C<ref> operator returns. That
means that

    $object ~~ X

does I<not> invoke the overload method with C<I<X>> as an argument.
Instead the above table is consulted as normal, and based on the type of
C<I<X>>, overloading may or may not be invoked. For simple strings or
numbers, "in" becomes equivalent to this:

    $object ~~ $number          ref($object) == $number
    $object ~~ $string          ref($object) eq $string

For example, this reports that the handle smells IOish
(but please don't really do this!):

    use IO::Handle;
    my $fh = IO::Handle->new();
    if ($fh ~~ /\bIO\b/) {
        say "handle smells IOish";
    }

That's because it treats C<$fh> as a string like
C<"IO::Handle=GLOB(0x8039e0)">, then pattern matches against that.

=head2 Bitwise And
X<operator, bitwise, and> X<bitwise and> X<&>

Binary C<"&"> returns its operands ANDed together bit by bit. Although no
warning is currently raised, the result is not well defined when this operation
is performed on operands that are neither numbers (see
L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String Operators>).

Note that C<"&"> has lower priority than relational operators, so for example
the parentheses are essential in a test like

    print "Even\n" if ($x & 1) == 0;

If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then this operator always treats its operands as numbers. This
feature produces a warning unless you also use S<C<no warnings
'experimental::bitwise'>>.

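As a rough illustration of the points above, the following sketch shows why
the parentheses matter and how numeric and string operands differ. It
assumes that the "bitwise" feature is I<not> in effect (no C<use feature
'bitwise'> and no C<use v5.28> or later); the variable names are made up for
the example:

```perl
use strict;
use warnings;

my $x = 6;

my $is_even   = (($x & 1) == 0) ? 1 : 0;  # tests the low bit, as intended
my $misparsed = $x & 1 == 0;              # parsed as $x & (1 == 0)

my $numeric = 12 & 10;        # two numbers: numeric AND
my $string  = "12" & "10";    # two strings: AND of the characters' bits
```

With these values C<$is_even> is 1, C<$misparsed> is 0 whatever C<$x> holds,
C<$numeric> is 8, and C<$string> is the string C<"10">.
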
=head2 Bitwise Or and Exclusive Or
X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
X<bitwise xor> X<^>

Binary C<"|"> returns its operands ORed together bit by bit.

Binary C<"^"> returns its operands XORed together bit by bit.

Although no warning is currently raised, the results are not well
defined when these operations are performed on operands that are neither
numbers (see L</Integer Arithmetic>) nor bitstrings (see L</Bitwise String
Operators>).

Note that C<"|"> and C<"^"> have lower priority than relational operators, so
for example the parentheses are essential in a test like

    print "false\n" if (8 | 2) != 10;

If the experimental "bitwise" feature is enabled via S<C<use feature
'bitwise'>>, then these operators always treat their operands as numbers. This
feature produces a warning unless you also use S<C<no warnings
'experimental::bitwise'>>.

=head2 C-style Logical And
X<&&> X<logical and> X<operator, logical, and>

Binary C<"&&"> performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or
X<||> X<operator, logical, or>

Binary C<"||"> performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 Logical Defined-Or
X<//> X<operator, logical, defined-or>

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style "or". In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus,
S<C<< EXPR1 // EXPR2 >>> returns the value of C<< EXPR1 >> if it's defined;
otherwise, the value of C<< EXPR2 >> is returned.
(C<< EXPR1 >> is evaluated in scalar context, C<< EXPR2 >>
in the context of C<< // >> itself). Usually,
this is the same result as S<C<< defined(EXPR1) ? EXPR1 : EXPR2 >>> (except that
the ternary-operator form can be used as an lvalue, while S<C<< EXPR1 // EXPR2 >>>
cannot). This is very useful for
providing default values for variables. If you actually want to test if
at least one of C<$x> and C<$y> is defined, use S<C<defined($x // $y)>>.

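The difference from C<||> shows up when a value is defined but false. The
following is a small sketch; the C<%opt> hash and its keys are assumptions
made up for the example:

```perl
use strict;
use warnings;

my %opt = (retries => 0, name => undef);   # hypothetical option hash

my $retries_or  = $opt{retries} || 3;   # 0 is false, so || replaces it
my $retries_dor = $opt{retries} // 3;   # 0 is defined, so // keeps it
my $name        = $opt{name}    // "anonymous";

my ($x, $y) = (undef, 0);
my $either_defined = defined($x // $y);  # true: $y is defined
```

Here C<$retries_or> ends up as 3, while C<$retries_dor> keeps the
caller's 0.
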
The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home =  $ENV{HOME}
          // $ENV{LOGDIR}
          // (getpwuid($<))[7]
          // die "You're homeless!\n";

In particular, this means that you shouldn't use these operators
for selecting between two aggregates for assignment:

    @a = @b || @c;            # This doesn't do the right thing
    @a = scalar(@b) || @c;    # because it really means this.
    @a = @b ? @b : @c;        # This works fine, though.

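To make the aggregate pitfall concrete, here is a short sketch with made-up
array contents:

```perl
use strict;
use warnings;

my @b = (10, 20, 30);
my @c = (1, 2);
my @empty;

my @wrong    = @b || @c;        # (3): @b is evaluated in scalar context
my @right    = @b ? @b : @c;    # (10, 20, 30)
my @fallback = @empty || @c;    # (1, 2): the right operand keeps list context
```

The last line happens to work only because the left operand is empty;
whenever C<@b> has elements, C<||> yields their count rather than the
elements themselves.
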
As alternatives to C<&&> and C<||> when used for
control flow, Perl provides the C<and> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of C<"and">
and C<"or"> is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

It would be even more readable to write that this way:

    unless(unlink("alpha", "beta", "gamma")) {
        gripe();
        next LINE;
    }

Using C<"or"> for assignment is unlikely to do what you want; see below.

=head2 Range Operators
X<operator, range> X<range> X<..> X<...>

Binary C<".."> is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
S<C<foreach (1..10)>> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, C<".."> returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma)
operator of B<sed>, B<awk>, and various editors. Each C<".."> operator
maintains its own boolean state, even across calls to a subroutine
that contains it. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator
is evaluated. It can test the right operand and become false on the
same evaluation it became true (as in B<awk>), but it still returns
true once. If you don't want it to test the right operand until the
next evaluation, as in B<sed>, just use three dots (C<"...">) instead of
two. In all other regards, C<"..."> behaves just like C<".."> does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than C<||> and C<&&>. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The sequence
number is reset for each range encountered. The final sequence number
in a range has the string C<"E0"> appended to it, which doesn't affect
its numeric value, but gives you something to search for if you want
to exclude the endpoint. You can exclude the beginning point by
waiting for the sequence number to be greater than 1.

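The sequence numbers can be put to work as in the following sketch. The
input lines and the C<BEGIN>/C<END> markers are assumptions chosen for the
example; the operands are patterns rather than constants, so C<$.> is not
involved:

```perl
use strict;
use warnings;

my @lines = qw(intro BEGIN alpha beta END outro);   # hypothetical input

my (@marked, @inner);
for (@lines) {
    my $seq = /BEGIN/ .. /END/;   # "" outside the range, else 1, 2, ... nE0
    next unless $seq;
    push @marked, "$_:$seq";
    # Drop the first line (sequence number 1) and the last (ends in "E0").
    push @inner, $_ if $seq > 1 && $seq !~ /E0\z/;
}
```

After the loop, C<@marked> holds C<BEGIN:1>, C<alpha:2>, C<beta:3> and
C<END:4E0>, while C<@inner> holds only C<alpha> and C<beta>.
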
If either operand of scalar C<".."> is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually S<C<int(EXPR) == int(EXPR)>>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is S<C<int(EXPR) == int($.)>> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, S<C<"span" .. "spat">> or S<C<2.18 .. 3.14>> will not do what
you want in scalar context because each of the operands is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; } # print 2nd hundred lines, short for
                               #  if ($. == 101 .. $. == 200) { print; }

    next LINE if (1 .. /^$/);  # skip header lines, short for
                               #   next LINE if ($. == 1 .. /^$/);
                               # (typically in a loop labeled LINE)

    s/^/> / if (/^$/ .. eof());  # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # do something
        } else { # in body
            # do something else
        }
    } continue {
        close ARGV if eof;             # reset $. each file
    }

Here's a simple example to illustrate the difference between
the two range operators:

    @lines = (" - Foo",
              "01 - Bar",
              "1 - Baz",
              " - Quux");

    foreach (@lines) {
        if (/0/ .. /1/) {
            print "$_\n";
        }
    }

This program will print only the line containing "Bar". If
the range operator is changed to C<...>, it will also print the
"Baz" line.

And now some examples as a list operator:

    for (101 .. 200) { print }      # print $_ 100 times
    @foo = @foo[0 .. $#foo];        # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ("A" .. "Z");

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, "a" .. "f")[$num & 15];

to get a hexadecimal digit, or

    @z2 = ("01" .. "31");
    print $z2[$mday - 1];   # $mday counts from 1, the array from 0

to get dates with leading zeros.

If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
be longer than the final value specified.

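The following sketch exercises the string cases just described; the
endpoints are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my @carry  = ("x"  .. "ab");   # x y z aa ab: the increment carries over
my @padded = ("08" .. "11");   # 08 09 10 11: the leading zero is kept
my @capped = ("a"  .. "E");    # "E" is never reached, so the list stops
                               # before the two-character value "aa"
```

In the last case the result is the 26 letters C<a> through C<z>.
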
As of Perl 5.26, the list-context range operator on strings works as expected
in the scope of L<< S<C<"use feature 'unicode_strings'">>|feature/The
'unicode_strings' feature >>. In previous versions, and outside the scope of
that feature, it exhibits L<perlunicode/The "Unicode Bug">: its behavior
depends on the internal encoding of the range endpoint.

If the initial value specified isn't part of a magical increment
sequence (that is, a non-empty string matching C</^[a-zA-Z]*[0-9]*\z/>),
only the initial value will be returned. So the following will only
return an alpha:

    use charnames "greek";
    my @greek_small =  ("\N{alpha}" .. "\N{omega}");

To get the 25 traditional lowercase Greek letters, including both sigmas,
you could use this instead:

    use charnames "greek";
    my @greek_small =  map { chr } ( ord("\N{alpha}")
                                        ..
                                     ord("\N{omega}")
                                   );

However, because there are I<many> other lowercase Greek characters than
just those, to match lowercase Greek characters in a regular expression,
you could use the pattern C</(?:(?=\p{Greek})\p{Lower})+/> (or the
L<experimental feature|perlrecharclass/Extended Bracketed Character
Classes> C<S</(?[ \p{Greek} & \p{Lower} ])+/>>).

Because each operand is evaluated in integer form, S<C<2.18 .. 3.14>> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);

=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>

Ternary C<"?:"> is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the C<?> is true, the
argument before the C<:> is returned, otherwise the argument after the
C<:> is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? "" : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $x = $ok ? $y : $z;  # get a scalar
    @x = $ok ? @y : @z;  # get an array
    $x = $ok ? @y : @z;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($x_or_y ? $x : $y) = $z;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $x % 2 ? $x += 10 : $x += 2

really means this:

    (($x % 2) ? ($x += 10) : $x) += 2

rather than this:

    ($x % 2) ? ($x += 10) : ($x += 2)

That should probably be written more simply as:

    $x += ($x % 2) ? 10 : 2;

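The behaviors described in this section can be checked with a short sketch;
all variable names and values are made up for the example:

```perl
use strict;
use warnings;

my $ok = 1;
my @y  = (1, 2, 3);
my @z  = (4);
my $count = $ok ? @y : @z;           # scalar context reaches @y: 3

my ($left, $right) = (1, 2);
($ok ? $left : $right) = 99;         # assigns to $left

my $odd = 3;
$odd % 2 ? $odd += 10 : $odd += 2;   # (... ? ($odd += 10) : $odd) += 2

my $fixed = 3;
$fixed += ($fixed % 2) ? 10 : 2;     # adds 10 only
```

C<$odd> ends up as 15 because both additions are applied, whereas C<$fixed>
ends up as 13.
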
=head2 Assignment Operators
X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
X<%=> X<^=> X<x=> X<&.=> X<|.=> X<^.=>

C<"="> is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $x += 2;

is equivalent to

    $x = $x + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from C<tie()>. Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    &.=    <<=    &&=
           -=    /=    |=    |.=    >>=    ||=
           .=    %=    ^=    ^.=           //=
                 x=

Although these are grouped by family, they all have the precedence
of assignment. These combined assignment operators can only operate on
scalars, whereas the ordinary assignment operator can assign to arrays,
hashes, lists and even references. (See L<"Context"|perldata/Context>,
L<perldata/List value constructors>, and L<perlref/Assigning to
References>.)

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to. This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr/13579/24680/;

Although as of 5.14, that can also be accomplished this way:

    use v5.14;
    $tmp = ($global =~ tr/13579/24680/r);

Likewise,

    ($x += 2) *= 3;

is equivalent to

    $x += 2;
    $x *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.

The three dotted bitwise assignment operators (C<&.=> C<|.=> C<^.=>) are new in
Perl 5.22 and experimental. See L</Bitwise String Operators>.

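The lvalue and scalar-context behaviors of assignment described in this
section are easy to verify. The following is a sketch with made-up values;
the C</r> line assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

my $global = "13579";
(my $tmp = $global) =~ tr/13579/24680/;    # modifies the copy only
my $copy = ($global =~ tr/13579/24680/r);  # same result via /r

my $x = 1;
($x += 2) *= 3;                            # 1 + 2, then times 3

# A list assignment in scalar context counts the right-hand elements,
# even those that have nowhere to go.
my $n     = (my ($first, $second) = (10, 20, 30));
my $count = () = "a,b,c" =~ /\w/g;
```

Afterwards C<$tmp> and C<$copy> are both C<"24680">, C<$global> is
unchanged, C<$x> is 9, and both C<$n> and C<$count> are 3.
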
=head2 Comma Operator
X<comma> X<operator, comma> X<,>

Binary C<","> is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list. These arguments are also evaluated
from left to right.

The C<< => >> operator (sometimes pronounced "fat comma") is a synonym
for the comma except that it causes a
word on its left to be interpreted as a string if it begins with a letter
or underscore and is composed only of letters, digits and underscores.
This includes operands that might otherwise be interpreted as operators,
constants, single number v-strings or function calls. If in doubt about
this behavior, the left operand can be quoted explicitly.

Otherwise, the C<< => >> operator behaves exactly as the comma operator
or list argument separator, according to context.

For example:

    use constant FOO => "something";

    my %h = ( FOO => 23 );

is equivalent to:

    my %h = ("FOO", 23);

It is I<NOT>:

    my %h = ("something", 23);

The C<< => >> operator is helpful in documenting the correspondence
between keys and values in hashes, and other paired elements in lists.

    %hash = ( $key => $value );
    login( $username => $password );

The special quoting behavior ignores precedence, and hence may apply to
I<part> of the left operand:

    print time.shift => "bbb";

That example prints something like C<"1314363215shiftbbb">, because the
C<< => >> implicitly quotes the C<shift> immediately on its left, ignoring
the fact that C<time.shift> is the entire left operand.

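If the constant's value is what you want as the key, the word must not stand
bare immediately to the left of C<< => >>. A brief sketch, reusing the
C<FOO> constant from the example above (the hash names are made up):

```perl
use strict;
use warnings;
use constant FOO => "something";

my %quoted = ( FOO   => 23 );   # key is the string "FOO"
my %called = ( FOO() => 23 );   # FOO() is a call, so the key is "something"
my %comma  = ( FOO,     23 );   # a plain comma does no quoting
```

Either of the last two forms yields the key C<"something">.
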
=head2 List Operators (Rightward)
X<operator, list, rightward> X<list operator>

On the right side of a list operator, the comma has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
C<"and">, C<"or">, and C<"not">, which may be used to evaluate calls to list
operators without the need for parentheses:

    open HANDLE, "< :encoding(UTF-8)", "filename"
        or die "Can't open: $!\n";

However, some people find that code harder to read than writing
it with parentheses:

    open(HANDLE, "< :encoding(UTF-8)", "filename")
        or die "Can't open: $!\n";

in which case you might as well just use the more customary C<"||"> operator:

    open(HANDLE, "< :encoding(UTF-8)", "filename")
        || die "Can't open: $!\n";

See also the discussion of list operators in L</Terms and List Operators (Leftward)>.

=head2 Logical Not
X<operator, logical, not> X<not>

Unary C<"not"> returns the logical negation of the expression to its right.
It's the equivalent of C<"!"> except for the very low precedence.

=head2 Logical And
X<operator, logical, and> X<and>

Binary C<"and"> returns the logical conjunction of the two surrounding
expressions. It's equivalent to C<&&> except for the very low
precedence. Like C<&&>, it short-circuits: the right
expression is evaluated only if the left expression is true.

=head2 Logical or and Exclusive Or
X<operator, logical, or> X<operator, logical, xor>
X<operator, logical, exclusive or>
X<or> X<xor>

Binary C<"or"> returns the logical disjunction of the two surrounding
expressions. It's equivalent to C<||> except for the very low precedence.
This makes it useful for control flow:

    print FH $data              or die "Can't write to FH: $!";

Like C<||>, it short-circuits: the right expression is evaluated
only if the left expression is false. Due to its precedence, you must
be careful to avoid using it as a replacement for the C<||> operator.
It usually works out better for flow control than in assignments:

    $x = $y or $z;              # bug: this is wrong
    ($x = $y) or $z;            # really means this
    $x = $y || $z;              # better written this way

However, when it's a list-context assignment and you're trying to use
C<||> for control flow, you probably need C<"or"> so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

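The precedence difference in a scalar assignment can be observed directly.
In this sketch, C<fallback> is a hypothetical helper that counts how often
it is called:

```perl
use strict;
use warnings;

my $calls = 0;
sub fallback { $calls++; return "fallback" }   # hypothetical helper

my $y = 0;

my $low;
$low = $y or fallback();       # parsed as ($low = $y) or fallback();
                               # the call's return value is discarded

my $high = $y || fallback();   # || binds tighter than =
```

C<fallback> runs in both statements, but only C<$high> receives its value;
C<$low> is left holding the 0 copied from C<$y>.
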
Binary C<"xor"> returns the exclusive-OR of the two surrounding expressions.
It cannot short-circuit, because both operands are always needed to
determine the result.

There is no low precedence operator for defined-OR.

=head2 C Operators Missing From Perl
X<operator, missing from perl> X<&> X<*>
X<typecasting> X<(TYPE)>

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the C<"\"> operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: C<$>, C<@>, C<%>, and C<&>.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators
X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
X<escape sequence> X<escape>

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of ASCII brackets (round, angle, square, curly) all nest, which means
that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($x eq "}") ... }; # WRONG

is a syntax error. The C<L<Text::Balanced>> module (standard as of v5.8,
and from CPAN before then) is able to do this properly.

There can (and in some cases, must) be whitespace between the operator
and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while S<C<q #foo#>> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

The cases where whitespace must be used are when the quoting character
is a word character (meaning it matches C</\w/>):

    q XfooX # Works: means the string 'foo'
    qXfooX  # WRONG!

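A few of the delimiter rules above, gathered into one sketch; the variable
names and strings are made up for illustration:

```perl
use strict;
use warnings;

my $name = "world";

my $nested  = q{foo{bar}baz};        # brackets nest: same as 'foo{bar}baz'
my $literal = q(Hello, $name);       # q does not interpolate
my $interp  = qq[Hello, $name];      # qq does
my @words   = qw<alpha beta gamma>;  # qw splits on whitespace
my $spaced  = q XfooX;               # a word-character delimiter needs the space

(my $subst = "foo") =~ s {foo}   # whitespace and comments may separate
                         {bar};  # the two halves of a bracketed s///
```

Any of the four bracket pairs can be used with any of the generic
operators; the choice here is arbitrary.
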
The following escape sequences are available in constructs that interpolate,
and in transliterations:
X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
X<\o{}>

    Sequence     Note  Description
    \t                  tab               (HT, TAB)
    \n                  newline           (NL)
    \r                  return            (CR)
    \f                  form feed         (FF)
    \b                  backspace         (BS)
    \a                  alarm (bell)      (BEL)
    \e                  escape            (ESC)
    \x{263A}     [1,8]  hex char          (example: SMILEY)
    \x1b         [2,8]  restricted range hex char (example: ESC)
    \N{name}     [3]    named Unicode character or character sequence
    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
    \c[          [5]    control char      (example: chr(27))
    \o{23072}    [6,8]  octal char        (example: SMILEY)
    \033         [7,8]  restricted range octal char  (example: ESC)

1435=over 4
1436
1437=item [1]
1438
1439The result is the character specified by the hexadecimal number between
1440the braces. See L</[8]> below for details on which character.
96448467 1441
46f8a5ea 1442Only hexadecimal digits are valid between the braces. If an invalid
1443character is encountered, a warning will be issued and the invalid
1444character and all subsequent characters (valid or invalid) within the
1445braces will be discarded.
1446
1447If there are no valid digits between the braces, the generated character is
1448the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
c543c01b 1449will not cause a warning (currently).
1450
1451=item [2]
1452
1453The result is the character specified by the hexadecimal number in the range
14540x00 to 0xFF. See L</[8]> below for details on which character.
1455
1456Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
2c4c1ff2 1457by fewer than two valid digits, any valid digits will be zero-padded. This
ba7f043c 1458means that C<\x7> will be interpreted as C<\x07>, and a lone C<"\x"> will be
2c4c1ff2 1459interpreted as C<\x00>. Except at the end of a string, having fewer than
c543c01b 1460two valid digits will result in a warning. Note that although the warning
1461says the illegal character is ignored, it is only ignored as part of the
1462escape and will still be used as the subsequent character in the string.
1463For example:
1464
1465 Original Result Warns?
1466 "\x7" "\x07" no
1467 "\x" "\x00" no
1468 "\x7q" "\x07q" yes
1469 "\xq" "\x00q" yes
1470
1471=item [3]
1472
fb121860 1473The result is the Unicode character or character sequence given by I<name>.
2c4c1ff2 1474See L<charnames>.
1475
1476=item [4]
1477
ba7f043c 1478S<C<\N{U+I<hexadecimal number>}>> means the Unicode character whose Unicode code
2c4c1ff2 1479point is I<hexadecimal number>.
1480
1481=item [5]
1482
1483The character following C<\c> is mapped to some other character as shown in the
1484table:
1485
1486 Sequence Value
1487 \c@ chr(0)
1488 \cA chr(1)
1489 \ca chr(1)
1490 \cB chr(2)
1491 \cb chr(2)
1492 ...
1493 \cZ chr(26)
1494 \cz chr(26)
1495 \c[ chr(27)
ba7f043c 1496 # See below for chr(28)
1497 \c] chr(29)
1498 \c^ chr(30)
c3e9d7a9 1499 \c_ chr(31)
1500 \c? chr(127) # (on ASCII platforms; see below for link to
1501 # EBCDIC discussion)
5691ca5f 1502
In other words, it's the character whose code point is that of the
uppercase form of the given character, xor'd with 64.  C<\c?> is DELETE on
ASCII platforms because S<C<ord("?") ^ 64>> is 127, and C<\c@> is NULL
because the ord of C<"@"> is 64, so xor'ing it with 64 produces 0.
d813941f 1507
ba7f043c 1508Also, C<\c\I<X>> yields S<C< chr(28) . "I<X>">> for any I<X>, but cannot come at the
1509end of a string, because the backslash would be parsed as escaping the end
1510quote.
1511
1512On ASCII platforms, the resulting characters from the list above are the
1513complete set of ASCII controls. This isn't the case on EBCDIC platforms; see
1514L<perlebcdic/OPERATOR DIFFERENCES> for a full discussion of the
1515differences between these for ASCII versus EBCDIC platforms.
5691ca5f 1516
c3e9d7a9 1517Use of any other character following the C<"c"> besides those listed above is
1518discouraged, and as of Perl v5.20, the only characters actually allowed
1519are the printable ASCII ones, minus the left brace C<"{">. What happens
1520for any of the allowed other characters is that the value is derived by
1521xor'ing with the seventh bit, which is 64, and a warning raised if
1522enabled. Using the non-allowed characters generates a fatal error.
1523
1524To get platform independent controls, you can use C<\N{...}>.
1525
1526=item [6]
1527
1528The result is the character specified by the octal number between the braces.
1529See L</[8]> below for details on which character.
1530
1531If a character that isn't an octal digit is encountered, a warning is raised,
1532and the value is based on the octal digits before it, discarding it and all
1533following characters up to the closing brace. It is a fatal error if there are
1534no octal digits at all.
1535
1536=item [7]
1537
c543c01b 1538The result is the character specified by the three-digit octal number in the
1539range 000 to 777 (but best to not use above 077, see next paragraph). See
1540L</[8]> below for details on which character.
1541
1542Some contexts allow 2 or even 1 digit, but any usage without exactly
40687185 1543three digits, the first being a zero, may give unintended results. (For
1544example, in a regular expression it may be confused with a backreference;
1545see L<perlrebackslash/Octal escapes>.) Starting in Perl 5.14, you may
c543c01b 1546use C<\o{}> instead, which avoids all these problems. Otherwise, it is best to
1547use this construct only for ordinals C<\077> and below, remembering to pad to
1548the left with zeros to make three digits. For larger ordinals, either use
1549C<\o{}>, or convert to something else, such as to hex and use C<\N{U+}>
1550(which is portable between platforms with different character sets) or
1551C<\x{}> instead.
40687185 1552
1553=item [8]
1554
c543c01b 1555Several constructs above specify a character by a number. That number
2c4c1ff2 1556gives the character's position in the character set encoding (indexed from 0).
c543c01b 1557This is called synonymously its ordinal, code position, or code point. Perl
works on platforms whose native encoding is currently either ASCII/Latin1
or EBCDIC, each of which allows specification of 256 characters.  In general, if
1560the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
1561native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets
c543c01b 1562it as a Unicode code point and the result is the corresponding Unicode
1563character. For example C<\x{50}> and C<\o{120}> both are the number 80 in
1564decimal, which is less than 256, so the number is interpreted in the native
1565character set encoding. In ASCII the character in the 80th position (indexed
ba7f043c 1566from 0) is the letter C<"P">, and in EBCDIC it is the ampersand symbol C<"&">.
1567C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
1568as a Unicode code point no matter what the native encoding is. The name of the
character in the 256th position (indexed from 0) in Unicode is
1570C<LATIN CAPITAL LETTER A WITH MACRON>.
1571
2dc9bc84 1572An exception to the above rule is that S<C<\N{U+I<hex number>}>> is
ba7f043c 1573always interpreted as a Unicode code point, so that C<\N{U+0050}> is C<"P"> even
2dc9bc84 1574on EBCDIC platforms.
2c4c1ff2 1575
5691ca5f 1576=back
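The numeric escapes above can be compared side by side.  This is a small sketch rather than a definitive reference: it assumes an ASCII platform and Perl 5.14 or later (for C<\o{}>), and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Three spellings of the number 80, interpreted in the native encoding.
my $hex   = "\x{50}";
my $octal = "\o{120}";
my $short = "\x50";

# \N{U+...} is always a Unicode code point, whatever the platform.
my $unicode = "\N{U+0050}";

# 256 and above are always treated as Unicode code points.
my $a_macron = "\x{100}";

# \c maps a character to a control; \c[ is ESC on ASCII platforms.
my $esc = "\c[";

printf "%s %s %s %s\n", $hex, $octal, $short, $unicode;
printf "U+%04X\n", ord $a_macron;
printf "%d\n", ord $esc;
```

On an ASCII platform the first four strings are all C<"P">; on EBCDIC only the C<\N{U+0050}> form would be.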
4c77eaa2 1577
e526e8bb 1578B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
8b312c40 1579the vertical tab (VT, which is 11 in both ASCII and EBCDIC), but you may
ba7f043c 1580use C<\N{VT}>, C<\ck>, C<\N{U+0b}>, or C<\x0b>. (C<\v>
1581does have meaning in regular expression patterns in Perl, see L<perlre>.)
1582
1583The following escape sequences are available in constructs that interpolate,
904501ec 1584but not in transliterations.
628253b8 1585X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q> X<\F>
904501ec 1586
1587 \l lowercase next character only
1588 \u titlecase (not uppercase!) next character only
1589 \L lowercase all characters till \E or end of string
1590 \U uppercase all characters till \E or end of string
628253b8 1591 \F foldcase all characters till \E or end of string
1592 \Q quote (disable) pattern metacharacters till \E or
1593 end of string
7e31b643 1594 \E end either case modification or quoted section
1595 (whichever was last seen)
1596
1597See L<perlfunc/quotemeta> for the exact definition of characters that
1598are quoted by C<\Q>.
1599
628253b8 1600C<\L>, C<\U>, C<\F>, and C<\Q> can stack, in which case you need one
1601C<\E> for each. For example:
1602
    say "This \Qquoting \ubusiness \Uhere isn't quite\E done yet,\E is it?";
1604 This quoting\ Business\ HERE\ ISN\'T\ QUITE\ done\ yet\, is it?
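The individual case and quoting escapes can also be tried one at a time.  The following is a minimal sketch; C<$name> and the other variables are made up for the illustration.

```perl
use strict;
use warnings;

my $name = "jOHN smith";

my $lower  = "\L$name\E";      # lowercase everything up to \E
my $upper  = "\U$name\E";      # uppercase everything up to \E
my $first  = "\u\L$name\E";    # lowercase, then titlecase the first char
my $quoted = "\Qa.b*c\E";      # backslash the pattern metacharacters

print "$lower|$upper|$first|$quoted\n";
```

This should print C<john smith|JOHN SMITH|John smith|a\.b\*c>.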
a0d0e21e 1605
1606If a S<C<use locale>> form that includes C<LC_CTYPE> is in effect (see
1607L<perllocale>), the case map used by C<\l>, C<\L>, C<\u>, and C<\U> is
1608taken from the current locale. If Unicode (for example, C<\N{}> or code
1609points of 0x100 or beyond) is being used, the case map used by C<\l>,
1610C<\L>, C<\u>, and C<\U> is as defined by Unicode. That means that
1611case-mapping a single character can sometimes produce a sequence of
1612several characters.
1613Under S<C<use locale>>, C<\F> produces the same results as C<\L>
1614for all locales but a UTF-8 one, where it instead uses the Unicode
1615definition.
a034a98d 1616
1617All systems use the virtual C<"\n"> to represent a line terminator,
1618called a "newline". There is no such thing as an unvarying, physical
19799a22 1619newline character. It is only an illusion that the operating system,
1620device drivers, C libraries, and Perl all conspire to preserve. Not all
1621systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
c543c01b 1622on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed,
ba7f043c 1623and on systems without a line terminator,
c543c01b 1624printing C<"\n"> might emit no actual data. In general, use C<"\n"> when
1625you mean a "newline" for your system, but use the literal ASCII when you
1626need an exact character. For example, most networking protocols expect
2a380090 1627and prefer a CR+LF (C<"\015\012"> or C<"\cM\cJ">) for line terminators,
1628and although they often accept just C<"\012">, they seldom tolerate just
1629C<"\015">. If you get in the habit of using C<"\n"> for networking,
1630you may be burned some day.
1631X<newline> X<line terminator> X<eol> X<end of line>
1632X<\n> X<\r> X<\r\n>
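One way to follow this advice is to name the terminator once and spell it by code point.  This is only a sketch: C<$CRLF> and the request line are hypothetical, and the octal values are the ASCII ones.

```perl
use strict;
use warnings;

# Spell out CR+LF by code point rather than relying on "\r\n".
my $CRLF = "\015\012";

# A hypothetical request, terminated the way most protocols expect.
my $request = "GET / HTTP/1.0" . $CRLF . $CRLF;

# %vd prints the ordinal of each character, joined by dots.
printf "%vd\n", $CRLF;
```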
5a964f20 1633
1634For constructs that do interpolate, variables beginning with "C<$>"
1635or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
1636C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
1637But method calls such as C<< $obj->meth >> are not.
1638
1639Interpolating an array or slice interpolates the elements in order,
1640separated by the value of C<$">, so is equivalent to interpolating
ba7f043c 1641S<C<join $", @array>>. "Punctuation" arrays such as C<@*> are usually
1642interpolated only if the name is enclosed in braces C<@{*}>, but the
1643arrays C<@_>, C<@+>, and C<@-> are interpolated even without braces.
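A short sketch of these interpolation rules follows; the data and variable names are invented for the example.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $href  = { key => ['first'] };

my $default = "@array";              # elements joined by $" (a space)
my $dashed;
{
    local $" = "-";                  # change the separator temporarily
    $dashed = "@array[0, 1] and @array";
}
my $nested = "got $href->{key}[0]";  # subscripts are interpolated

print "$default\n$dashed\n$nested\n";
```

Localizing C<$"> keeps the change from leaking into unrelated strings.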
af9219ee 1644
1645For double-quoted strings, the quoting from C<\Q> is applied after
1646interpolation and escapes are processed.
1647
1648 "abc\Qfoo\tbar$s\Exyz"
1649
1650is equivalent to
1651
1652 "abc" . quotemeta("foo\tbar$s") . "xyz"
1653
1654For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
1655the quoting from C<\Q> is applied after interpolation is processed,
1656but before escapes are processed. This allows the pattern to match
1657literally (except for C<$> and C<@>). For example, the following matches:
1658
1659 '\s\t' =~ /\Q\s\t/
1660
1661Because C<$> or C<@> trigger interpolation, you'll need to use something
1662like C</\Quser\E\@\Qhost/> to match them literally.
1d2dff63 1663
1664Patterns are subject to an additional level of interpretation as a
1665regular expression. This is done as a second pass, after variables are
1666interpolated, so that regular expressions may be incorporated into the
1667pattern from the variables. If this is not what you want, use C<\Q> to
1668interpolate a variable literally.
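The difference between interpolating a variable as a regular expression and interpolating it literally can be seen in a sketch like this one (C<$needle> is a made-up name):

```perl
use strict;
use warnings;

my $needle = "a.c";    # the dot is a regex metacharacter

# Interpolated as a regex, "." matches any character.
my $loose  = "abc" =~ /\A$needle\z/     ? 1 : 0;

# With \Q, the variable's contents are matched literally.
my $strict = "abc" =~ /\A\Q$needle\E\z/ ? 1 : 0;
my $exact  = "a.c" =~ /\A\Q$needle\E\z/ ? 1 : 0;

print "$loose $strict $exact\n";
```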
1669
1670Apart from the behavior described above, Perl does not expand
1671multiple levels of interpolation. In particular, contrary to the
1672expectations of shell programmers, back-quotes do I<NOT> interpolate
1673within double quotes, nor do single quotes impede evaluation of
1674variables when used within double quotes.
a0d0e21e 1675
5f05dabc 1676=head2 Regexp Quote-Like Operators
d74e8afc 1677X<operator, regexp>
cb1a09d0 1678
5f05dabc 1679Here are the quote-like operators that apply to pattern
1680matching and related activities.
1681
1682=over 8
1683
ba7f043c 1684=item C<qr/I<STRING>/msixpodualn>
01c6f5f4 1685X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
a0d0e21e 1686
1687This operator quotes (and possibly compiles) its I<STRING> as a regular
1688expression. I<STRING> is interpolated the same way as I<PATTERN>
1689in C<m/I<PATTERN>/>. If C<"'"> is used as the delimiter, no variable
1690interpolation is done. Returns a Perl value which may be used instead of the
ba7f043c 1691corresponding C</I<STRING>/msixpodualn> expression. The returned value is a
46f8a5ea 1692normalized version of the original pattern. It magically differs from
1c8ee595 1693a string containing the same characters: C<ref(qr/x/)> returns "Regexp";
a727cfac 1694however, dereferencing it is not well defined (you currently get the
1695normalized version of the original pattern, but this may change).
1696
a0d0e21e 1697
1698For example,
1699
1700 $rex = qr/my.STRING/is;
85dd5c8b 1701 print $rex; # prints (?si-xm:my.STRING)
1702 s/$rex/foo/;
1703
1704is equivalent to
1705
1706 s/my.STRING/foo/is;
1707
1708The result may be used as a subpattern in a match:
1709
1710 $re = qr/$pattern/;
1711 $string =~ /foo${re}bar/; # can be interpolated in other
1712 # patterns
1713 $string =~ $re; # or used standalone
1714 $string =~ /$re/; # or this way
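As a self-contained sketch of the same idea (the patterns and the sample string are only illustrative), two C<qr//> objects can be combined into a larger pattern, each keeping its own modifiers:

```perl
use strict;
use warnings;

my $number = qr/\d+/;        # a reusable pattern object
my $kind   = ref $number;    # "Regexp"

# The /i on $word travels with it into the larger pattern.
my $word = qr/[a-z]+/i;
my ($w, $n) = "Item42" =~ /^($word)($number)$/;

print "$kind: $w, $n\n";
```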
1715
1716Since Perl may compile the pattern at the moment of execution of the C<qr()>
1717operator, using C<qr()> may have speed advantages in some situations,
1718notably if the result of C<qr()> is used standalone:
1719
1720 sub match {
1721 my $patterns = shift;
1722 my @compiled = map qr/$_/i, @$patterns;
1723 grep {
1724 my $success = 0;
1725 foreach my $pat (@compiled) {
1726 $success = 1, last if /$pat/;
1727 }
1728 $success;
1729 } @_;
1730 }
1731
87e95b7f 1732Precompilation of the pattern into an internal representation at
ba7f043c 1733the moment of C<qr()> avoids the need to recompile the pattern every
1734time a match C</$pat/> is attempted. (Perl has many other internal
1735optimizations, but none would be triggered in the above example if
we did not use the C<qr()> operator.)
87e95b7f 1737
765fa144 1738Options (specified by the following modifiers) are:
1739
1740 m Treat string as multiple lines.
1741 s Treat string as single line. (Make . match a newline)
1742 i Do case-insensitive pattern matching.
1743 x Use extended regular expressions; specifying two
1744 x's means \t and the SPACE character are ignored within
1745 square-bracketed character classes
87e95b7f 1746 p When matching preserve a copy of the matched string so
7188ca43 1747 that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be
ba7f043c 1748 defined (ignored starting in v5.20) as these are always
1a8aad5a 1749 defined starting in that release
87e95b7f 1750 o Compile pattern only once.
1751 a ASCII-restrict: Use ASCII for \d, \s, \w and [[:posix:]]
1752 character classes; specifying two a's adds the further
1753 restriction that no ASCII character will match a
1754 non-ASCII one under /i.
ba7f043c 1755 l Use the current run-time locale's rules.
1756 u Use Unicode rules.
1757 d Use Unicode or native charset, as in 5.12 and earlier.
33be4c61 1758 n Non-capture mode. Don't let () fill in $1, $2, etc...
1759
1760If a precompiled pattern is embedded in a larger pattern then the effect
1761of C<"msixpluadn"> will be propagated appropriately. The effect that the
1762C</o> modifier has is not propagated, being restricted to those patterns
1763explicitly using it.
1764
The C</a>, C</l>, C</u>, and C</d> modifiers, added in Perl 5.14,
control the character set rules, but C</a> is the only one you are likely
to want to specify explicitly; the other three are selected
automatically by various pragmas.
da392a17 1769
ba7f043c 1770See L<perlre> for additional information on valid syntax for I<STRING>, and
5e2aa8f5 1771for a detailed look at the semantics of regular expressions. In
1772particular, all modifiers except the largely obsolete C</o> are further
1773explained in L<perlre/Modifiers>. C</o> is described in the next section.
a0d0e21e 1774
ba7f043c 1775=item C<m/I<PATTERN>/msixpodualngc>
1776X<m> X<operator, match>
1777X<regexp, options> X<regexp> X<regex, options> X<regex>
01c6f5f4 1778X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
a0d0e21e 1779
ba7f043c 1780=item C</I<PATTERN>/msixpodualngc>
a0d0e21e 1781
5a964f20 1782Searches a string for a pattern match, and in scalar context returns
19799a22 1783true if it succeeds, false if it fails. If no string is specified
ba7f043c 1784via the C<=~> or C<!~> operator, the C<$_> string is searched. (The
1785string specified with C<=~> need not be an lvalue--it may be the
1786result of an expression evaluation, but remember the C<=~> binds
006671a6 1787rather tightly.) See also L<perlre>.
a0d0e21e 1788
f6050459 1789Options are as described in C<qr//> above; in addition, the following match
01c6f5f4 1790process modifiers are available:
a0d0e21e 1791
950b09ed 1792 g Match globally, i.e., find all occurrences.
1793 c Do not reset search position on a failed match when /g is
1794 in effect.
a0d0e21e 1795
ba7f043c 1796If C<"/"> is the delimiter then the initial C<m> is optional. With the C<m>
c543c01b 1797you can use any pair of non-whitespace (ASCII) characters
725a61d7 1798as delimiters. This is particularly useful for matching path names
ba7f043c 1799that contain C<"/">, to avoid LTS (leaning toothpick syndrome). If C<"?"> is
725a61d7 1800the delimiter, then a match-only-once rule applies,
ba7f043c 1801described in C<m?I<PATTERN>?> below. If C<"'"> (single quote) is the delimiter,
6d314683 1802no variable interpolation is performed on the I<PATTERN>.
ba7f043c 1803When using a delimiter character valid in an identifier, whitespace is required
ed02a3bf 1804after the C<m>.
a0d0e21e 1805
ba7f043c 1806I<PATTERN> may contain variables, which will be interpolated
532c9e80 1807every time the pattern search is evaluated, except
1808for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
1809C<$|> are not interpolated because they look like end-of-string tests.)
1810Perl will not recompile the pattern unless an interpolated
1811variable that it contains changes. You can force Perl to skip the
1812test and never recompile by adding a C</o> (which stands for "once")
1813after the trailing delimiter.
1814Once upon a time, Perl would recompile regular expressions
1815unnecessarily, and this modifier was useful to tell it not to do so, in the
interests of speed.  But now, the only reasons to use C</o> are:
1817
1818=over
1819
1820=item 1
1821
1822The variables are thousands of characters long and you know that they
1823don't change, and you need to wring out the last little bit of speed by
1824having Perl skip testing for that. (There is a maintenance penalty for
1825doing this, as mentioning C</o> constitutes a promise that you won't
18509dec 1826change the variables in the pattern. If you do change them, Perl won't
1827even notice.)
1828
1829=item 2
1830
You want the pattern to use the initial values of the variables
1832regardless of whether they change or not. (But there are saner ways
1833of accomplishing this than using C</o>.)
1834
1835=item 3
1836
1837If the pattern contains embedded code, such as
1838
1839 use re 'eval';
1840 $code = 'foo(?{ $x })';
1841 /$code/
1842
1843then perl will recompile each time, even though the pattern string hasn't
1844changed, to ensure that the current value of C<$x> is seen each time.
1845Use C</o> if you want to avoid this.
1846
532c9e80 1847=back
a0d0e21e 1848
1849The bottom line is that using C</o> is almost never a good idea.
1850
ba7f043c 1851=item The empty pattern C<//>
e9d89077 1852
ba7f043c 1853If the I<PATTERN> evaluates to the empty string, the last
46f8a5ea 1854I<successfully> matched regular expression is used instead. In this
c543c01b 1855case, only the C<g> and C<c> flags on the empty pattern are honored;
46f8a5ea 1856the other flags are taken from the original pattern. If no match has
1857previously succeeded, this will (silently) act instead as a genuine
1858empty pattern (which will always match).
a0d0e21e 1859
1860Note that it's possible to confuse Perl into thinking C<//> (the empty
1861regex) is really C<//> (the defined-or operator). Perl is usually pretty
1862good about this, but some pathological cases might trigger this, such as
1863C<$x///> (is that S<C<($x) / (//)>> or S<C<$x // />>?) and S<C<print $fh //>>
1864(S<C<print $fh(//>> or S<C<print($fh //>>?). In all of these examples, Perl
1865will assume you meant defined-or. If you meant the empty regex, just
1866use parentheses or spaces to disambiguate, or even prefix the empty
1867regex with an C<m> (so C<//> becomes C<m//>).
1868
1869=item Matching in list context
1870
19799a22 1871If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, that is, (C<$1>, C<$2>, C<$3>...).  (Note that here C<$1> etc. are
also set.)  When there are no parentheses in the pattern, the return
value is the list C<(1)> for success.
3ff8ecf9 1876With or without parentheses, an empty list is returned upon failure.
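The three possible outcomes can be sketched as follows; the sample string and variable names are only illustrative.

```perl
use strict;
use warnings;

# Captures come back as a list (and $1, $2 are set as well).
my ($hour, $min) = "12:34" =~ /^(\d+):(\d+)$/;

# No parentheses: success yields the list (1).
my @plain = "12:34" =~ /:/;

# Failure yields the empty list, so counting its elements gives 0.
my $failed = () = "12:34" =~ /x/;

print "$hour $min | @plain | $failed\n";
```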
1877
1878Examples:
1879
1880 open(TTY, "+</dev/tty")
1881 || die "can't access /dev/tty: $!";
c543c01b 1882
7188ca43 1883 <TTY> =~ /^y/i && foo(); # do foo if desired
a0d0e21e 1884
7188ca43 1885 if (/Version: *([0-9.]*)/) { $version = $1; }
a0d0e21e 1886
7188ca43 1887 next if m#^/usr/spool/uucp#;
a0d0e21e 1888
1889 # poor man's grep
1890 $arg = shift;
1891 while (<>) {
1892 print if /$arg/o; # compile only once (no longer needed!)
1893 }
a0d0e21e 1894
7188ca43 1895 if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
a0d0e21e 1896
1897This last example splits C<$foo> into the first two words and the
1898remainder of the line, and assigns those three fields to C<$F1>, C<$F2>, and
1899C<$Etc>. The conditional is true if any variables were assigned; that is,
c543c01b 1900if the pattern matched.
a0d0e21e 1901
19799a22 1902The C</g> modifier specifies global pattern matching--that is,
1903matching as many times as possible within the string. How it behaves
1904depends on the context. In list context, it returns a list of the
19799a22 1905substrings matched by any capturing parentheses in the regular
46f8a5ea 1906expression. If there are no parentheses, it returns a list of all
1907the matched strings, as if there were parentheses around the whole
1908pattern.
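A minimal sketch of C</g> in list context, with and without capturing parentheses (the input strings are invented for the example):

```perl
use strict;
use warnings;

# No parentheses: every whole match is returned.
my @numbers = "a1b22c333" =~ /\d+/g;

# With parentheses: the captures of every match, flattened into one list,
# which makes it convenient to fill a hash from key/value pairs.
my %pairs = "x=1,y=2" =~ /(\w)=(\d)/g;

print "@numbers\n";
print join(",", map { "$_=$pairs{$_}" } sort keys %pairs), "\n";
```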
a0d0e21e 1909
7e86de3e 1910In scalar context, each execution of C<m//g> finds the next match,
19799a22 1911returning true if it matches, and false if there is no further match.
3dd93342 1912The position after the last match can be read or set using the C<pos()>
46f8a5ea 1913function; see L<perlfunc/pos>. A failed match normally resets the
7e86de3e 1914search position to the beginning of the string, but you can avoid that
46f8a5ea 1915by adding the C</c> modifier (for example, C<m//gc>). Modifying the target
7e86de3e 1916string also resets the search position.
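The following sketch traces C<pos()> through a scalar-context C<m//g> loop, first without and then with C</c>; the string and variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "aXbXc";
my @ends;

# Each scalar-context /g match resumes where the previous one stopped.
while ($text =~ /X/g) {
    push @ends, pos($text);
}

# The failed match that ended the loop reset the search position.
my $after_fail = defined pos($text) ? "kept" : "reset";

# With /c, a failed match leaves pos() where it was.
1 while $text =~ /X/gc;
my $kept = pos($text);

print "@ends $after_fail $kept\n";
```

This should print C<2 4 reset 4>.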
c90c0ff4 1917
ba7f043c 1918=item C<\G I<assertion>>
e9d89077 1919
c90c0ff4 1920You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
3dd93342 1921zero-width assertion that matches the exact position where the
46f8a5ea 1922previous C<m//g>, if any, left off. Without the C</g> modifier, the
3dd93342 1923C<\G> assertion still anchors at C<pos()> as it was at the start of
1924the operation (see L<perlfunc/pos>), but the match is of course only
46f8a5ea 1925attempted once. Using C<\G> without C</g> on a target string that has
3dd93342 1926not previously had a C</g> match applied to it is the same as using
1927the C<\A> assertion to match the beginning of the string. Note also
1928that, currently, C<\G> is only properly supported when anchored at the
1929very beginning of the pattern.
c90c0ff4 1930
1931Examples:
1932
1933 # list context
1934 ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);
1935
1936 # scalar context
1937 local $/ = "";
1938 while ($paragraph = <>) {
1939 while ($paragraph =~ /\p{Ll}['")]*[.!?]+['")]*\s/g) {
19799a22 1940 $sentences++;
1941 }
1942 }
1943 say $sentences;
1944
1945Here's another way to check for sentences in a paragraph:
1946
1947 my $sentence_rx = qr{
1948 (?: (?<= ^ ) | (?<= \s ) ) # after start-of-string or
1949 # whitespace
1950 \p{Lu} # capital letter
1951 .*? # a bunch of anything
1952 (?<= \S ) # that ends in non-
1953 # whitespace
1954 (?<! \b [DMS]r ) # but isn't a common abbr.
1955 (?<! \b Mrs )
1956 (?<! \b Sra )
1957 (?<! \b St )
1958 [.?!] # followed by a sentence
1959 # ender
1960 (?= $ | \s ) # in front of end-of-string
1961 # or whitespace
1962 }sx;
1963 local $/ = "";
1964 while (my $paragraph = <>) {
1965 say "NEW PARAGRAPH";
1966 my $count = 0;
1967 while ($paragraph =~ /($sentence_rx)/g) {
1968 printf "\tgot sentence %d: <%s>\n", ++$count, $1;
c543c01b 1969 }
7188ca43 1970 }
1971
1972Here's how to use C<m//gc> with C<\G>:
a0d0e21e 1973
137443ea 1974 $_ = "ppooqppqq";
44a8e56a 1975 while ($i++ < 2) {
1976 print "1: '";
c90c0ff4 1977 print $1 while /(o)/gc; print "', pos=", pos, "\n";
44a8e56a 1978 print "2: '";
c90c0ff4 1979 print $1 if /\G(q)/gc; print "', pos=", pos, "\n";
44a8e56a 1980 print "3: '";
c90c0ff4 1981 print $1 while /(p)/gc; print "', pos=", pos, "\n";
44a8e56a 1982 }
5d43e42d 1983 print "Final: '$1', pos=",pos,"\n" if /\G(.)/;
44a8e56a 1984
1985The last example should print:
1986
1987 1: 'oo', pos=4
137443ea 1988 2: 'q', pos=5
44a8e56a 1989 3: 'pp', pos=7
1990 1: '', pos=7
137443ea 1991 2: 'q', pos=8
1992 3: '', pos=8
1993 Final: 'q', pos=8
1994
1995Notice that the final match matched C<q> instead of C<p>, which a match
1996without the C<\G> anchor would have done. Also note that the final match
1997did not update C<pos>. C<pos> is only updated on a C</g> match. If the
1998final match did indeed match C<p>, it's a good bet that you're running a
1999very old (pre-5.6.0) version of Perl.
44a8e56a 2000
c90c0ff4 2001A useful idiom for C<lex>-like scanners is C</\G.../gc>. You can
e7ea3e70 2002combine several regexps like this to process a string part-by-part,
c90c0ff4 2003doing different actions depending on which regexp matched. Each
2004regexp tries to match where the previous one leaves off.
e7ea3e70 2005
3fe9a6f1 2006 $_ = <<'EOL';
2007 $url = URI::URL->new( "http://example.com/" );
2008 die if $url eq "xXx";
3fe9a6f1 2009 EOL
2010
2011 LOOP: {
950b09ed 2012 print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
2013 print(" lowercase"), redo LOOP
2014 if /\G\p{Ll}+\b[,.;]?\s*/gc;
2015 print(" UPPERCASE"), redo LOOP
2016 if /\G\p{Lu}+\b[,.;]?\s*/gc;
2017 print(" Capitalized"), redo LOOP
2018 if /\G\p{Lu}\p{Ll}+\b[,.;]?\s*/gc;
c543c01b 2019 print(" MiXeD"), redo LOOP if /\G\pL+\b[,.;]?\s*/gc;
2020 print(" alphanumeric"), redo LOOP
2021 if /\G[\p{Alpha}\pN]+\b[,.;]?\s*/gc;
c543c01b 2022 print(" line-noise"), redo LOOP if /\G\W+/gc;
950b09ed 2023 print ". That's all!\n";
c543c01b 2024 }
2025
2026Here is the output (split into several lines):
2027
2028 line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
2029 line-noise lowercase line-noise lowercase line-noise lowercase
2030 lowercase line-noise lowercase lowercase line-noise lowercase
2031 lowercase line-noise MiXeD line-noise. That's all!
44a8e56a 2032
ba7f043c 2033=item C<m?I<PATTERN>?msixpodualngc>
725a61d7 2034X<?> X<operator, match-once>
87e95b7f 2035
2036This is just like the C<m/I<PATTERN>/> search, except that it matches
2037only once between calls to the C<reset()> operator. This is a useful
87e95b7f 2038optimization when you want to see only the first occurrence of
ceb131e8 2039something in each file of a set of files, for instance. Only C<m??>
2040patterns local to the current package are reset.
2041
2042 while (<>) {
ceb131e8 2043 if (m?^$?) {
2044 # blank line between header and body
2045 }
2046 } continue {
725a61d7 2047 reset if eof; # clear m?? status for next file
2048 }
2049
Another example switches the first "latin1" encoding it finds
to "utf8" in a pod file:
2052
2053 s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
2054
2055The match-once behavior is controlled by the match delimiter being
4932eeca 2056C<?>; with any other delimiter this is the normal C<m//> operator.
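The match-once rule and the effect of C<reset()> can be observed without any files.  In this sketch the word list stands in for the lines of two hypothetical files, and the code is assumed to run in a single package:

```perl
use strict;
use warnings;

my @hits;
for my $pass (1, 2) {
    for my $line (qw(apple avocado banana)) {
        # m?? succeeds only once until reset() is called.
        push @hits, "$pass:$line" if $line =~ m?^a?;
    }
    reset;    # re-arm the m?? patterns of the current package
}

print "@hits\n";
```

Only C<apple> is recorded on each pass, even though C<avocado> would also match an ordinary C<m/^a/>.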
725a61d7 2057
ba7f043c 2058In the past, the leading C<m> in C<m?I<PATTERN>?> was optional, but omitting it
2059would produce a deprecation warning. As of v5.22.0, omitting it produces a
2060syntax error. If you encounter this construct in older code, you can just add
2061C<m>.
87e95b7f 2062
ba7f043c 2063=item C<s/I<PATTERN>/I<REPLACEMENT>/msixpodualngcer>
0a31ee11 2064X<s> X<substitute> X<substitution> X<replace> X<regexp, replace>
4f4d7508 2065X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
2066
2067Searches a string for a pattern, and if found, replaces that pattern
2068with the replacement text and returns the number of substitutions
2069made. Otherwise it returns false (specifically, the empty string).
2070
c543c01b 2071If the C</r> (non-destructive) option is used then it runs the
2072substitution on a copy of the string and instead of returning the
2073number of substitutions, it returns the copy whether or not a
2074substitution occurred. The original string is never changed when
2075C</r> is used. The copy will always be a plain string, even if the
2076input is an object or a tied variable.
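The two return conventions can be contrasted in a short sketch.  It assumes Perl 5.14 or later, where C</r> is available; the strings and variable names are only illustrative.

```perl
use strict;
use warnings;

my $original = "green grass, green leaves";

# Without /r: modifies the target and returns the substitution count.
my $target = $original;
my $count  = ($target =~ s/green/mauve/g);

# With /r: returns the modified copy and leaves the original alone.
my $copy = $original =~ s/green/mauve/gr;

print "$count\n$copy\n$original\n";
```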
4f4d7508 2077
87e95b7f 2078If no string is specified via the C<=~> or C<!~> operator, the C<$_>
2079variable is searched and modified. Unless the C</r> option is used,
2080the string specified must be a scalar variable, an array element, a
2081hash element, or an assignment to one of those; that is, some sort of
2082scalar lvalue.
87e95b7f 2083
6d314683 2084If the delimiter chosen is a single quote, no variable interpolation is
2085done on either the I<PATTERN> or the I<REPLACEMENT>. Otherwise, if the
2086I<PATTERN> contains a C<$> that looks like a variable rather than an
2087end-of-string test, the variable will be interpolated into the pattern
2088at run-time. If you want the pattern compiled only once the first time
2089the variable is interpolated, use the C</o> option. If the pattern
2090evaluates to the empty string, the last successfully executed regular
2091expression is used instead. See L<perlre> for further explanation on these.
87e95b7f 2092
ba7f043c 2093Options are as with C<m//> with the addition of the following replacement
2094specific options:
2095
2096 e Evaluate the right side as an expression.
2097 ee Evaluate the right side as a string then eval the
2098 result.
2099 r Return substitution and leave the original string
2100 untouched.
87e95b7f 2101
2102Any non-whitespace delimiter may replace the slashes. Add space after
2103the C<s> when using a character allowed in identifiers. If single quotes
2104are used, no interpretation is done on the replacement string (the C</e>
3ff8ecf9 2105modifier overrides this, however). Note that Perl treats backticks
ed02a3bf 2106as normal delimiters; the replacement text is not evaluated as a command.
ba7f043c 2107If the I<PATTERN> is delimited by bracketing quotes, the I<REPLACEMENT> has
1ca345ed 2108its own pair of quotes, which may or may not be bracketing quotes, for example,
2109C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
2110replacement portion to be treated as a full-fledged Perl expression
2111and evaluated right then and there. It is, however, syntax checked at
46f8a5ea 2112compile-time. A second C<e> modifier will cause the replacement portion
2113to be C<eval>ed before being run as a Perl expression.
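
As a rough illustration of the delimiter and C</e> rules above, the
following sketch uses bracketing delimiters together with C</e>, and
single-quote delimiters together with C</r>. The variable names and the
sample string are made up for this example.

```perl
use strict;
use warnings;

my $spec = 'width=10';

# Bracketing delimiters: PATTERN and REPLACEMENT each get their own pair.
# With /e the replacement is evaluated as a Perl expression.
(my $doubled = $spec) =~ s{(\d+)}{$1 * 2}e;

# Single-quote delimiters: no interpolation on either side, so the
# replacement is the literal text '$unit' (no such variable is needed).
my $literal = $spec =~ s'10'$unit'r;

print "$doubled\n";   # width=20
print "$literal\n";   # width=$unit
```

Because C</r> is used in the second substitution, C<$spec> itself is
left unchanged.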
2114
2115Examples:
2116
7188ca43 2117 s/\bgreen\b/mauve/g; # don't change wintergreen
2118
2119 $path =~ s|/usr/bin|/usr/local/bin|;
2120
2121 s/Login: $foo/Login: $bar/; # run-time pattern
2122
2123 ($foo = $bar) =~ s/this/that/; # copy first, then
2124 # change
2125 ($foo = "$bar") =~ s/this/that/; # convert to string,
2126 # copy, then change
    $foo = $bar =~ s/this/that/r;      # Same as above using /r
    $foo = $bar =~ s/this/that/r
                =~ s/that/the other/r; # Chained substitutes
                                       # using /r
    @foo = map { s/this/that/r } @bar; # /r is very useful in
                                       # maps
87e95b7f 2133
7188ca43 2134 $count = ($paragraph =~ s/Mister\b/Mr./g); # get change-cnt
2135
    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'
2140
2141 s/%(.)/$percent{$1}/g; # change percent escapes; no /e
2142 s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
2143 s/^=(\w+)/pod($1)/ge; # use function call
2144
4f4d7508 2145 $_ = 'abc123xyz';
db691027 2146 $x = s/abc/def/r; # $x is 'def123xyz' and
2147 # $_ remains 'abc123xyz'.
2148
2149 # expand variables in $_, but dynamics only, using
2150 # symbolic dereferencing
2151 s/\$(\w+)/${$1}/g;
2152
2153 # Add one to the value of any numbers in the string
2154 s/(\d+)/1 + $1/eg;
2155
2156 # Titlecase words in the last 30 characters only
2157 substr($str, -30) =~ s/\b(\p{Alpha}+)\b/\u\L$1/g;
2158
2159 # This will expand any embedded scalar variable
2160 # (including lexicals) in $_ : First $1 is interpolated
2161 # to the variable name, and then evaluated
2162 s/(\$\w+)/$1/eeg;
2163
2164 # Delete (most) C comments.
2165 $program =~ s {
2166 /\* # Match the opening delimiter.
2167 .*? # Match a minimal number of characters.
2168 \*/ # Match the closing delimiter.
2169 } []gsx;
2170
2171 s/^\s*(.*?)\s*$/$1/; # trim whitespace in $_,
2172 # expensively
87e95b7f 2173
2174 for ($variable) { # trim whitespace in $variable,
2175 # cheap
2176 s/^\s+//;
2177 s/\s+$//;
2178 }
2179
2180 s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
2181
2182Note the use of C<$> instead of C<\> in the last example. Unlike
2183B<sed>, we use the \<I<digit>> form only in the left hand side.
2184Anywhere else it's $<I<digit>>.
2185
2186Occasionally, you can't use just a C</g> to get all the changes
2187to occur that you might want. Here are two common cases:
2188
2189 # put commas in the right places in an integer
2190 1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
2191
2192 # expand tabs to 8-column spacing
2193 1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
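
The comma idiom above can be wrapped in a small subroutine. This is only
a sketch: the name C<commify> is hypothetical, and it assumes the input
is an integer (digits after a decimal point would also be grouped).

```perl
use strict;
use warnings;

# Hypothetical helper: insert thousands separators into runs of digits,
# repeating the substitution until nothing changes any more.
sub commify {
    my ($number) = @_;
    1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $number;
}

my $small = commify(999);
my $large = commify(1234567);
print "$small $large\n";   # 999 1,234,567
```

The loop is needed because each pass can only place the rightmost
missing comma in a given run of digits; the next pass then finds the
group to its left.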
2194
2195=back
2196
2197=head2 Quote-Like Operators
2198X<operator, quote-like>
2199
2200=over 4
2201
ba7f043c 2202=item C<q/I<STRING>/>
5d44bfff 2203X<q> X<quote, single> X<'> X<''>
a0d0e21e 2204
ba7f043c 2205=item C<'I<STRING>'>
a0d0e21e 2206
19799a22 2207A single-quoted, literal string. A backslash represents a backslash
68dc0745 2208unless followed by the delimiter or another backslash, in which case
2209the delimiter or backslash is interpolated.
2210
2211 $foo = q!I said, "You said, 'She said it.'"!;
2212 $bar = q('This is it.');
68dc0745 2213 $baz = '\n'; # a two-character string
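
The backslash rules for single-quoted strings can be checked with a
short sketch like the following (the variable names are illustrative
only):

```perl
use strict;
use warnings;

my $two_chars  = '\n';        # backslash + "n": not a newline
my $one_slash  = 'a\\b';      # \\ collapses to a single backslash
my $apostrophe = 'it\'s';     # \' yields the delimiter itself
my $no_escape  = q(it's);     # another delimiter avoids the backslash

printf "%d %d %s %s\n",
    length($two_chars), length($one_slash), $apostrophe, $no_escape;
# 2 3 it's it's
```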
a0d0e21e 2214
ba7f043c 2215=item C<qq/I<STRING>/>
d74e8afc 2216X<qq> X<quote, double> X<"> X<"">
a0d0e21e 2217
ba7f043c 2218=item "I<STRING>"
2219
2220A double-quoted, interpolated string.
2221
2222 $_ .= qq
2223 (*** The previous line contains the naughty word "$1".\n)
19799a22 2224 if /\b(tcl|java|python)\b/i; # :-)
68dc0745 2225 $baz = "\n"; # a one-character string
a0d0e21e 2226
ba7f043c 2227=item C<qx/I<STRING>/>
d74e8afc 2228X<qx> X<`> X<``> X<backtick>
a0d0e21e 2229
ba7f043c 2230=item C<`I<STRING>`>
a0d0e21e 2231
43dd4d21 2232A string which is (possibly) interpolated and then executed as a
f703fc96 2233system command with F</bin/sh> or its equivalent. Shell wildcards,
2234pipes, and redirections will be honored. The collected standard
2235output of the command is returned; standard error is unaffected. In
2236scalar context, it comes back as a single (potentially multi-line)
2237string, or C<undef> if the command failed. In list context, returns a
2238list of lines (however you've defined lines with C<$/> or
2239C<$INPUT_RECORD_SEPARATOR>), or an empty list if the command failed.
2240
2241Because backticks do not affect standard error, use shell file descriptor
2242syntax (assuming the shell supports this) if you care to address this.
2243To capture a command's STDERR and STDOUT together:
a0d0e21e 2244
2245 $output = `cmd 2>&1`;
2246
2247To capture a command's STDOUT but discard its STDERR:
2248
2249 $output = `cmd 2>/dev/null`;
2250
2251To capture a command's STDERR but discard its STDOUT (ordering is
2252important here):
2253
2254 $output = `cmd 2>&1 1>/dev/null`;
2255
2256To exchange a command's STDOUT and STDERR in order to capture the STDERR
2257but leave its STDOUT to come out the old STDERR:
2258
2259 $output = `cmd 3>&1 1>&2 2>&3 3>&-`;
2260
2261To read both a command's STDOUT and its STDERR separately, it's easiest
2262to redirect them separately to files, and then read from those files
2263when the program is done:
5a964f20 2264
2359510d 2265 system("program args 1>program.stdout 2>program.stderr");
5a964f20 2266
2267The STDIN filehandle used by the command is inherited from Perl's STDIN.
2268For example:
2269
2270 open(SPLAT, "stuff") || die "can't open stuff: $!";
2271 open(STDIN, "<&SPLAT") || die "can't dupe SPLAT: $!";
40bbb707 2272 print STDOUT `sort`;
30398227 2273
40bbb707 2274will print the sorted contents of the file named F<"stuff">.
30398227 2275
2276Using single-quote as a delimiter protects the command from Perl's
2277double-quote interpolation, passing it on to the shell instead:
2278
2279 $perl_info = qx(ps $$); # that's Perl's $$
2280 $shell_info = qx'ps $$'; # that's the new shell's $$
2281
19799a22 2282How that string gets evaluated is entirely subject to the command
2283interpreter on your system. On most platforms, you will have to protect
2284shell metacharacters if you want them treated literally. This is in
practice difficult to do, as it is unclear which characters need escaping and how.
ba7f043c 2286See L<perlsec> for a clean and safe example of a manual C<fork()> and C<exec()>
5a964f20 2287to emulate backticks safely.
a0d0e21e 2288
2289On some platforms (notably DOS-like ones), the shell may not be
2290capable of dealing with multiline commands, so putting newlines in
2291the string may not get you what you want. You may be able to evaluate
2292multiple commands in a single line by separating them with the command
a727cfac 2293separator character, if your shell supports that (for example, C<;> on
1ca345ed 2294many Unix shells and C<&> on the Windows NT C<cmd> shell).
bb32b41a 2295
3ff8ecf9 2296Perl will attempt to flush all files opened for
2297output before starting the child process, but this may not be supported
2298on some platforms (see L<perlport>). To be safe, you may need to set
2299C<$|> (C<$AUTOFLUSH> in C<L<English>>) or call the C<autoflush()> method of
2300C<L<IO::Handle>> on any open handles.
0f897271 2301
2302Beware that some command shells may place restrictions on the length
2303of the command line. You must ensure your strings don't exceed this
2304limit after any necessary interpolations. See the platform-specific
2305release notes for more details about your particular environment.
2306
2307Using this operator can lead to programs that are difficult to port,
2308because the shell commands called vary between systems, and may in
2309fact not be present at all. As one example, the C<type> command under
2310the POSIX shell is very different from the C<type> command under DOS.
2311That doesn't mean you should go out of your way to avoid backticks
2312when they're the right way to get something done. Perl was made to be
2313a glue language, and one of the things it glues together is commands.
2314Just understand what you're getting yourself into.
bb32b41a 2315
2316Like C<system>, backticks put the child process exit code in C<$?>.
2317If you'd like to manually inspect failure, you can check all possible
2318failure modes by inspecting C<$?> like this:
2319
2320 if ($? == -1) {
2321 print "failed to execute: $!\n";
2322 }
2323 elsif ($? & 127) {
2324 printf "child died with signal %d, %s coredump\n",
2325 ($? & 127), ($? & 128) ? 'with' : 'without';
2326 }
2327 else {
2328 printf "child exited with value %d\n", $? >> 8;
2329 }
2330
2331Use the L<open> pragma to control the I/O layers used when reading the
2332output of the command, for example:
2333
2334 use open IN => ":encoding(UTF-8)";
2335 my $x = `cmd-producing-utf-8`;
2336
da87341d 2337See L</"I/O Operators"> for more discussion.
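
To tie the scalar-context, list-context, and C<$?> behavior together,
here is a minimal sketch. It assumes that running the current perl
binary (C<$^X>) as the command is acceptable, which is done only so the
example does not depend on any other program being installed; the
one-liners themselves are made up for illustration.

```perl
use strict;
use warnings;

# $^X is the path of the running perl; using it as the command is an
# assumption of this sketch. (A path containing spaces would need
# extra quoting.)
my $output = qx{$^X -e "print 42"};     # scalar context: one string
my $status = $? >> 8;                   # exit value of the child

# List context: one element per line of the command's standard output.
my @lines = qx{$^X -e "print qq(a\\nb\\n)"};

print "output=$output status=$status lines=", scalar(@lines), "\n";
```

Note that C<$?> is overwritten by every new command, so it should be
inspected or copied right after the C<qx> call it belongs to.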
a0d0e21e 2338
ba7f043c 2339=item C<qw/I<STRING>/>
d74e8afc 2340X<qw> X<quote, list> X<quote, words>
945c54fd 2341
ba7f043c 2342Evaluates to a list of the words extracted out of I<STRING>, using embedded
2343whitespace as the word delimiters. It can be understood as being roughly
2344equivalent to:
2345
c543c01b 2346 split(" ", q/STRING/);
945c54fd 2347
2348the differences being that it generates a real list at compile time, and
2349in scalar context it returns the last element in the list. So
2350this expression:
2351
2352 qw(foo bar baz)
2353
2354is semantically equivalent to the list:
2355
c543c01b 2356 "foo", "bar", "baz"
2357
2358Some frequently seen examples:
2359
    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );
2362
ba7f043c 2363A common mistake is to try to separate the words with commas or to
945c54fd 2364put comments into a multi-line C<qw>-string. For this reason, the
S<C<use warnings>> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the I<STRING> contains the C<","> or the C<"#"> character.
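
The rough equivalence with C<split> described above can be seen in a
small sketch (the variable names are illustrative only):

```perl
use strict;
use warnings;

my @words = qw(foo bar baz);
my @split = split(" ", q/foo bar baz/);   # the rough run-time equivalent

# Whitespace of any kind separates the words, including newlines.
my @multi = qw(
    alpha
    beta   gamma
);

print scalar(@words), " ", scalar(@multi), "\n";   # 3 3
print "same\n" if "@words" eq "@split";            # same
```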
945c54fd 2367
ba7f043c 2368=item C<tr/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
d74e8afc 2369X<tr> X<y> X<transliterate> X</c> X</d> X</s>
a0d0e21e 2370
ba7f043c 2371=item C<y/I<SEARCHLIST>/I<REPLACEMENTLIST>/cdsr>
a0d0e21e 2372
2c268ad5 2373Transliterates all occurrences of the characters found in the search list
2374with the corresponding character in the replacement list. It returns
2375the number of characters replaced or deleted. If no string is
ba7f043c 2376specified via the C<=~> or C<!~> operator, the C<$_> string is transliterated.
2377
2378If the C</r> (non-destructive) option is present, a new copy of the string
2379is made and its characters transliterated, and this copy is returned no
2380matter whether it was modified or not: the original string is always
2381left unchanged. The new copy is always a plain string, even if the input
2382string is an object or a tied variable.
8ada0baa 2383
2384Unless the C</r> option is used, the string specified with C<=~> must be a
2385scalar variable, an array element, a hash element, or an assignment to one
2386of those; in other words, an lvalue.
8ff32507 2387
89d205f2 2388A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
2c268ad5 2389does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
54310121 2390For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
2391I<SEARCHLIST> is delimited by bracketing quotes, the I<REPLACEMENTLIST>
2392must have its own pair of quotes, which may or may not be bracketing
2393quotes; for example, C<tr[aeiouy][yuoiea]> or C<tr(+\-*/)/ABCD/>.
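
A short sketch of the range and delimiter rules just described; the
sample strings and variable names are chosen only for illustration:

```perl
use strict;
use warnings;

my $by_range = 'ACEGIBDFHJ';
my $by_list  = 'ACEGIBDFHJ';

$by_range =~ tr/A-J/0-9/;                   # range form
$by_list  =~ tr/ACEGIBDFHJ/0246813579/;     # spelled-out form

# Bracketing delimiters for SEARCHLIST, slashes for REPLACEMENTLIST.
(my $ops = '1+2*3') =~ tr(+\-*/)/ABCD/;

print "$by_range $by_list $ops\n";   # 0246813579 0246813579 1A2C3
```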
c543c01b 2394
ba7f043c 2395Characters may be literals or any of the escape sequences accepted in
2396double-quoted strings. But there is no variable interpolation, so C<"$">
2397and C<"@"> are treated as literals. A hyphen at the beginning or end, or
2398preceded by a backslash is considered a literal. Escape sequence
2399details are in L<the table near the beginning of this section|/Quote and
f4240379 2400Quote-like Operators>.
ba7f043c 2401
c543c01b 2402Note that C<tr> does B<not> do regular expression character classes such as
ba7f043c 2403C<\d> or C<\pL>. The C<tr> operator is not equivalent to the C<L<tr(1)>>
2404utility. C<tr[a-z][A-Z]> will uppercase the 26 letters "a" through "z",
2405but for case changing not confined to ASCII, use
2406L<C<lc>|perlfunc/lc>, L<C<uc>|perlfunc/uc>,
2407L<C<lcfirst>|perlfunc/lcfirst>, L<C<ucfirst>|perlfunc/ucfirst>
2408(all documented in L<perlfunc>), or the
2409L<substitution operator C<sE<sol>I<PATTERN>E<sol>I<REPLACEMENT>E<sol>>|/sE<sol>PATTERNE<sol>REPLACEMENTE<sol>msixpodualngcer>
2410(with C<\U>, C<\u>, C<\L>, and C<\l> string-interpolation escapes in the
2411I<REPLACEMENT> portion).
cc255d5f 2412
2413Most ranges are unportable between character sets, but certain ones
2414signal Perl to do special handling to make them portable. There are two
2415classes of portable ranges. The first are any subsets of the ranges
2416C<A-Z>, C<a-z>, and C<0-9>, when expressed as literal characters.
2417
2418 tr/h-k/H-K/
2419
2420capitalizes the letters C<"h">, C<"i">, C<"j">, and C<"k"> and nothing
2421else, no matter what the platform's character set is. In contrast, all
2422of
2423
2424 tr/\x68-\x6B/\x48-\x4B/
2425 tr/h-\x6B/H-\x4B/
2426 tr/\x68-k/\x48-K/
2427
2428do the same capitalizations as the previous example when run on ASCII
2429platforms, but something completely different on EBCDIC ones.
2430
2431The second class of portable ranges is invoked when one or both of the
2432range's end points are expressed as C<\N{...}>
2433
2434 $string =~ tr/\N{U+20}-\N{U+7E}//d;
2435
2436removes from C<$string> all the platform's characters which are
2437equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This
2438is a portable range, and has the same effect on every platform it is
2439run on. It turns out that in this example, these are the ASCII
2440printable characters. So after this is run, C<$string> has only
2441controls and characters which have no ASCII equivalents.
2442
2443But, even for portable ranges, it is not generally obvious what is
2444included without having to look things up. A sound principle is to use
2445only ranges that begin from and end at either ASCII alphabetics of equal
8df98a27 2446case (C<b-e>, C<B-E>), or digits (C<1-4>). Anything else is unclear
f4240379 2447(and unportable unless C<\N{...}> is used). If in doubt, spell out the
2448character sets in full.
2449
2450Options:
2451
2452 c Complement the SEARCHLIST.
2453 d Delete found but unreplaced characters.
2454 s Squash duplicate replaced characters.
2455 r Return the modified string and leave the original string
2456 untouched.
a0d0e21e 2457
ba7f043c 2458If the C</c> modifier is specified, the I<SEARCHLIST> character set
19799a22 2459is complemented. If the C</d> modifier is specified, any characters
ba7f043c 2460specified by I<SEARCHLIST> not found in I<REPLACEMENTLIST> are deleted.
19799a22 2461(Note that this is slightly more flexible than the behavior of some
ba7f043c 2462B<tr> programs, which delete anything they find in the I<SEARCHLIST>,
46f8a5ea 2463period.) If the C</s> modifier is specified, sequences of characters
2464that were transliterated to the same character are squashed down
2465to a single instance of the character.
a0d0e21e 2466
2467If the C</d> modifier is used, the I<REPLACEMENTLIST> is always interpreted
2468exactly as specified. Otherwise, if the I<REPLACEMENTLIST> is shorter
2469than the I<SEARCHLIST>, the final character is replicated till it is long
2470enough. If the I<REPLACEMENTLIST> is empty, the I<SEARCHLIST> is replicated.
2471This latter is useful for counting characters in a class or for
2472squashing character sequences in a class.
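
The interaction of the modifiers with a short or empty
I<REPLACEMENTLIST> may be easier to see side by side. The following is a
sketch on a made-up sample string; C</r> is used throughout so the
original is never modified.

```perl
use strict;
use warnings;

my $text = 'bookkeeper 2024';

my $squashed = $text =~ tr/a-z//sr;      # squash runs: 'bokeper 2024'
my $letters  = $text =~ tr/a-z//cdr;     # delete everything NOT in a-z
my $padded   = $text =~ tr/a-e/AB/r;     # 'AB' is extended to 'ABBBB'
my $digits   = ($text =~ tr/0-9//);      # empty list: count only

print "$squashed|$letters|$padded|$digits\n";
# bokeper 2024|bookkeeper|BookkBBpBr 2024|4
```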
2473
2474Examples:
2475
c543c01b 2476 $ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case ASCII
2477
2478 $cnt = tr/*/*/; # count the stars in $_
2479
2480 $cnt = $sky =~ tr/*/*/; # count the stars in $sky
2481
2482 $cnt = tr/0-9//; # count the digits in $_
2483
2484 tr/a-zA-Z//s; # bookkeeper -> bokeper
2485
2486 ($HOST = $host) =~ tr/a-z/A-Z/;
c543c01b 2487 $HOST = $host =~ tr/a-z/A-Z/r; # same thing
8ff32507 2488
c543c01b 2489 $HOST = $host =~ tr/a-z/A-Z/r # chained with s///r
8ff32507 2490 =~ s/:/ -p/r;
2491
2492 tr/a-zA-Z/ /cs; # change non-alphas to single space
2493
2494 @stripped = map tr/a-zA-Z/ /csr, @original;
2495 # /r with map
2496
a0d0e21e 2497 tr [\200-\377]
c543c01b 2498 [\000-\177]; # wickedly delete 8th bit
a0d0e21e 2499
2500If multiple transliterations are given for a character, only the
2501first one is used:
2502
2503 tr/AAA/XYZ/
2504
2c268ad5 2505will transliterate any A to X.
748a9306 2506
Because the transliteration table is built at compile time, neither
the I<SEARCHLIST> nor the I<REPLACEMENTLIST> is subjected to double quote
interpolation. That means that if you want to use variables, you
must use an C<eval()>:
2511
2512 eval "tr/$oldlist/$newlist/";
2513 die $@ if $@;
2514
2515 eval "tr/$oldlist/$newlist/, 1" or die $@;
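
If the same variable-built transliteration is applied many times, one
possible approach is to C<eval> it once into an anonymous subroutine and
reuse that. This is only a sketch: the lists and the name C<$translit>
are hypothetical, and the lists are assumed not to contain the C</>
delimiter.

```perl
use strict;
use warnings;

# Hypothetical lists; they must not contain the "/" delimiter, and
# should never come from untrusted input, since they are eval'ed.
my ($oldlist, $newlist) = ('a-c', 'A-C');

# Compile the transliteration once, as an anonymous sub.
my $translit = eval "sub { (my \$s = shift) =~ tr/$oldlist/$newlist/; \$s }"
    or die $@;

my $result = $translit->('abcabc-d');
print "$result\n";   # ABCABC-d
```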
2516
ba7f043c 2517=item C<< <<I<EOF> >>
d74e8afc 2518X<here-doc> X<heredoc> X<here-document> X<<< << >>>
2519
2520A line-oriented form of quoting is based on the shell "here-document"
2521syntax. Following a C<< << >> you specify a string to terminate
2522the quoted material, and all lines following the current line down to
2523the terminating string are the value of the item.
2524
2525Prefixing the terminating string with a C<~> specifies that you
2526want to use L</Indented Here-docs> (see below).
2527
2528The terminating string may be either an identifier (a word), or some
2529quoted text. An unquoted identifier works like double quotes.
2530There may not be a space between the C<< << >> and the identifier,
2531unless the identifier is explicitly quoted. (If you put a space it
2532will be treated as a null identifier, which is valid, and matches the
2533first empty line.) The terminating string must appear by itself
2534(unquoted and with no surrounding whitespace) on the terminating line.
2535
If the terminating string is quoted, the type of quotes used determines
the treatment of the text.
2538
2539=over 4
2540
2541=item Double Quotes
2542
2543Double quotes indicate that the text will be interpolated using exactly
2544the same rules as normal double quoted strings.
2545
2546 print <<EOF;
2547 The price is $Price.
2548 EOF
2549
2550 print << "EOF"; # same as above
2551 The price is $Price.
2552 EOF
2553
2554
2555=item Single Quotes
2556
2557Single quotes indicate the text is to be treated literally with no
46f8a5ea 2558interpolation of its content. This is similar to single quoted
2559strings except that backslashes have no special meaning, with C<\\>
2560being treated as two backslashes and not one as they would in every
2561other quoting construct.
2562
2563Just as in the shell, a backslashed bareword following the C<<< << >>>
2564means the same thing as a single-quoted string does:
2565
2566 $cost = <<'VISTA'; # hasta la ...
2567 That'll be $10 please, ma'am.
2568 VISTA
2569
2570 $cost = <<\VISTA; # Same thing!
2571 That'll be $10 please, ma'am.
2572 VISTA
2573
This is the only form of quoting in Perl where there is no need
2575to worry about escaping content, something that code generators
2576can and do make good use of.
2577
2578=item Backticks
2579
2580The content of the here doc is treated just as it would be if the
46f8a5ea 2581string were embedded in backticks. Thus the content is interpolated
2582as though it were double quoted and then executed via the shell, with
2583the results of the execution returned.
2584
2585 print << `EOC`; # execute command and get results
7e3b091d 2586 echo hi there
2587 EOC
2588
2589=back
2590
2591=over 4
2592
2593=item Indented Here-docs
2594
2595The here-doc modifier C<~> allows you to indent your here-docs to make
2596the code more readable:
2597
    if ($some_var) {
      print <<~EOF;
        This is a here-doc
        EOF
    }
2603
2604This will print...
2605
2606 This is a here-doc
2607
2608...with no leading whitespace.
2609
2610The delimiter is used to determine the B<exact> whitespace to
2611remove from the beginning of each line. All lines B<must> have
2612at least the same starting whitespace (except lines only
2613containing a newline) or perl will croak. Tabs and spaces can
be mixed, but are matched exactly. One tab will not be equal to
8 spaces!
2616
2617Additional beginning whitespace (beyond what preceded the
2618delimiter) will be preserved:
2619
    print <<~EOF;
      This text is not indented
        This text is indented with two spaces
      		This text is indented with two tabs
      EOF
2625
2626Finally, the modifier may be used with all of the forms
2627mentioned above:
2628
2629 <<~\EOF;
2630 <<~'EOF'
2631 <<~"EOF"
2632 <<~`EOF`
2633
2634And whitespace may be used between the C<~> and quoted delimiters:
2635
2636 <<~ 'EOF'; # ... "EOF", `EOF`
2637
2638=back
2639
2640It is possible to stack multiple here-docs in a row:
2641
2642 print <<"foo", <<"bar"; # you can stack them
2643 I said foo.
2644 foo
2645 I said bar.
2646 bar
2647
2648 myfunc(<< "THIS", 23, <<'THAT');
2649 Here's a line
2650 or two.
2651 THIS
2652 and here's another.
2653 THAT
2654
2655Just don't forget that you have to put a semicolon on the end
2656to finish the statement, as Perl doesn't know you're not going to
2657try to do this:
2658
2659 print <<ABC
2660 179231
2661 ABC
2662 + 20;
2663
2664If you want to remove the line terminator from your here-docs,
2665use C<chomp()>.
2666
2667 chomp($string = <<'END');
2668 This is a string.
2669 END
2670
2671If you want your here-docs to be indented with the rest of the code,
2672you'll need to remove leading whitespace from each line manually:
2673
2674 ($quote = <<'FINIS') =~ s/^\s+//gm;
89d205f2 2675 The Road goes ever on and on,
2676 down from the door where it began.
2677 FINIS
2678
2679If you use a here-doc within a delimited construct, such as in C<s///eg>,
2680the quoted material must still come on the line following the
2681C<<< <<FOO >>> marker, which means it may be inside the delimited
2682construct:
2683
2684 s/this/<<E . 'that'
2685 the other
2686 E
2687 . 'more '/eg;
2688
2689It works this way as of Perl 5.18. Historically, it was inconsistent, and
2690you would have to write
7e3b091d 2691
2692 s/this/<<E . 'that'
2693 . 'more '/eg;
2694 the other
2695 E
7e3b091d 2696
2697outside of string evals.
2698
c543c01b 2699Additionally, quoting rules for the end-of-string identifier are
46f8a5ea 2700unrelated to Perl's quoting rules. C<q()>, C<qq()>, and the like are not
2701supported in place of C<''> and C<"">, and the only interpolation is for
2702backslashing the quoting character:
2703
2704 print << "abc\"def";
2705 testing...
2706 abc"def
2707
2708Finally, quoted strings cannot span multiple lines. The general rule is
2709that the identifier must be a string literal. Stick with that, and you
2710should be safe.
2711
2712=back
2713
75e14d17 2714=head2 Gory details of parsing quoted constructs
d74e8afc 2715X<quote, gory details>
75e14d17 2716
2717When presented with something that might have several different
2718interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
2719principle to pick the most probable interpretation. This strategy
2720is so successful that Perl programmers often do not suspect the
ambiguity of what they write. But from time to time, Perl's
2722notions differ substantially from what the author honestly meant.
2723
2724This section hopes to clarify how Perl handles quoted constructs.
2725Although the most common reason to learn this is to unravel labyrinthine
2726regular expressions, because the initial steps of parsing are the
2727same for all quoting operators, they are all discussed together.
2728
2729The most important Perl parsing rule is the first one discussed
2730below: when processing a quoted construct, Perl first finds the end
2731of that construct, then interprets its contents. If you understand
2732this rule, you may skip the rest of this section on the first
2733reading. The other rules are likely to contradict the user's
2734expectations much less frequently than this first one.
2735
2736Some passes discussed below are performed concurrently, but because
2737their results are the same, we consider them individually. For different
2738quoting constructs, Perl performs different numbers of passes, from
6deea57f 2739one to four, but these passes are always performed in the same order.
75e14d17 2740
13a2d996 2741=over 4
2742
2743=item Finding the end
2744
2745The first pass is finding the end of the quoted construct. This results
2746in saving to a safe location a copy of the text (between the starting
2747and ending delimiters), normalized as necessary to avoid needing to know
2748what the original delimiters were.
2749
2750If the construct is a here-doc, the ending delimiter is a line
46f8a5ea 2751that has a terminating string as the content. Therefore C<<<EOF> is
2752terminated by C<EOF> immediately followed by C<"\n"> and starting
2753from the first column of the terminating line.
2754When searching for the terminating line of a here-doc, nothing
46f8a5ea 2755is skipped. In other words, lines after the here-doc syntax
2756are compared with the terminating string line by line.
2757
For constructs other than here-docs, single characters are used as starting
46f8a5ea 2759and ending delimiters. If the starting delimiter is an opening punctuation
2760(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
2761corresponding closing punctuation (that is C<)>, C<]>, C<}>, or C<< > >>).
2762If the starting delimiter is an unpaired character like C</> or a closing
ba7f043c 2763punctuation, the ending delimiter is the same as the starting delimiter.
6deea57f 2764Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
fc693347 2765both C<qq[]> and C<qq]]> constructs.
2766
2767When searching for single-character delimiters, escaped delimiters
1ca345ed 2768and C<\\> are skipped. For example, while searching for terminating C</>,
2769combinations of C<\\> and C<\/> are skipped. If the delimiters are
2770bracketing, nested pairs are also skipped. For example, while searching
ba7f043c 2771for a closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
2772and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
2773However, when backslashes are used as the delimiters (like C<qq\\> and
2774C<tr\\\>), nothing is skipped.
32581033 2775During the search for the end, backslashes that escape delimiters or
other backslashes are removed (strictly speaking, they are not copied to the
32581033 2777safe location).
75e14d17 2778
2779For constructs with three-part delimiters (C<s///>, C<y///>, and
2780C<tr///>), the search is repeated once more.
fc693347 2781If the first delimiter is not an opening punctuation, the three delimiters must
2782be the same, such as C<s!!!> and C<tr)))>,
2783in which case the second delimiter
6deea57f 2784terminates the left part and starts the right part at once.
b6538e4f 2785If the left part is delimited by bracketing punctuation (that is C<()>,
6deea57f 2786C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
b6538e4f 2787delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
ba7f043c 2788and comments are allowed between the two parts, although the comment must follow
a727cfac 2789at least one whitespace character; otherwise a character expected as the
b6538e4f 2790start of the comment may be regarded as the starting delimiter of the right part.
75e14d17 2791
2792During this search no attention is paid to the semantics of the construct.
2793Thus:
2794
2795 "$hash{"$foo/$bar"}"
2796
2a94b7ce 2797or:
75e14d17 2798
89d205f2 2799 m/
2a94b7ce 2800 bar # NOT a comment, this slash / terminated m//!
2801 /x
2802
2803do not form legal quoted expressions. The quoted part ends on the
2804first C<"> and C</>, and the rest happens to be a syntax error.
2805Because the slash that terminated C<m//> was followed by a C<SPACE>,
2806the example above is not C<m//x>, but rather C<m//> with no C</x>
2807modifier. So the embedded C<#> is interpreted as a literal C<#>.
75e14d17 2808
89d205f2 2809Also no attention is paid to C<\c\> (multichar control char syntax) during
46f8a5ea 2810this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
89d205f2 2811of C<\/>, and the following C</> is not recognized as a delimiter.
2812Instead, use C<\034> or C<\x1c> at the end of quoted constructs.
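
The delimiter-finding rules of this pass can be observed with the
C<q> operator, which performs no interpolation beyond the backslash
removal described above. This is a minimal sketch; the variable names
and sample strings are illustrative only.

```perl
use strict;
use warnings;

my $nested   = q[a[b]c];    # nested [ ] pair is skipped over
my $escaped  = q(a\)b);     # \) is skipped; its backslash is dropped
my $slashes  = q/a\/b/;     # same for an unpaired delimiter
my $closing  = q]x];        # a closing bracket can delimit itself

print "$nested $escaped $slashes $closing\n";   # a[b]c a)b a/b x
```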
2813
75e14d17 2814=item Interpolation
d74e8afc 2815X<interpolation>
75e14d17 2816
19799a22 2817The next step is interpolation in the text obtained, which is now
89d205f2 2818delimiter-independent. There are multiple cases.
75e14d17 2819
13a2d996 2820=over 4
75e14d17 2821
89d205f2 2822=item C<<<'EOF'>
2823
2824No interpolation is performed.
2825Note that the combination C<\\> is left intact, since escaped delimiters
2826are not available for here-docs.
75e14d17 2827
6deea57f 2828=item C<m''>, the pattern of C<s'''>
89d205f2 2829
2830No interpolation is performed at this stage.
Any backslashed sequences including C<\\> are handled at the
L</"parsing regular expressions"> stage.
89d205f2 2833
6deea57f 2834=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>
75e14d17 2835
89d205f2 2836The only interpolation is removal of C<\> from pairs of C<\\>.
ba7f043c 2837Therefore C<"-"> in C<tr'''> and C<y'''> is treated literally
2838as a hyphen and no character range is available.
2839C<\1> in the replacement of C<s'''> does not work as C<$1>.
2840
2841=item C<tr///>, C<y///>
2842
2843No variable interpolation occurs. String modifying combinations for
2844case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
2845The other escape sequences such as C<\200> and C<\t> and backslashed
2846characters such as C<\\> and C<\-> are converted to appropriate literals.
2847The character C<"-"> is treated specially and therefore C<\-> is treated
2848as a literal C<"-">.
75e14d17 2849
89d205f2 2850=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">
75e14d17 2851
628253b8 2852C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> (possibly paired with C<\E>) are
19799a22 2853converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
ba7f043c 2854is converted to S<C<$foo . (quotemeta("baz" . $bar))>> internally.
2855The other escape sequences such as C<\200> and C<\t> and backslashed
2856characters such as C<\\> and C<\-> are replaced with appropriate
2857expansions.
2a94b7ce 2858
2859Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
2860is interpolated in the usual way. Something like C<"\Q\\E"> has
48cbae4f 2861no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
2862result is the same as for C<"\\\\E">. As a general rule, backslashes
2863between C<\Q> and C<\E> may lead to counterintuitive results. So,
2864C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
2865as C<"\\\t"> (since TAB is not alphanumeric). Note also that:
2866
2867 $str = '\t';
2868 return "\Q$str";
2869
2870may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.
2871
19799a22 2872Interpolated scalars and arrays are converted internally to the C<join> and
ba7f043c 2873C<"."> catenation operations. Thus, S<C<"$foo XXX '@arr'">> becomes:
75e14d17 2874
19799a22 2875 $foo . " XXX '" . (join $", @arr) . "'";
75e14d17 2876
19799a22 2877All operations above are performed simultaneously, left to right.
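
As a small sketch of the conversion described above (the variable names
and values are illustrative assumptions, not taken from the Perl source),
the interpolated string and the hand-written concatenation should give
the same result, and a localized C<$"> changes what an interpolated
array looks like:

```perl
use strict;
use warnings;

my $foo = 'head';
my @arr = (1, 2, 3);

# What the programmer writes.
my $interpolated = "$foo XXX '@arr'";

# Roughly what Perl builds from it.
my $expanded = $foo . " XXX '" . (join $", @arr) . "'";

# The list separator $" is consulted when the string is evaluated.
my $dashed = do { local $" = '-'; "@arr" };
```

Here C<$dashed> is built with a hyphen between the elements, while the
first two strings use the default single space.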
75e14d17 2878
ba7f043c 2879Because the result of S<C<"\Q I<STRING> \E">> has all metacharacters
19799a22 2880quoted, there is no way to insert a literal C<$> or C<@> inside a
ba7f043c 2881C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
2882C<"\\\$">; if not, it is interpreted as the start of an interpolated
2883scalar.
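
The following sketch illustrates the C<\Q> behavior discussed above. The
sample strings are assumptions chosen for illustration; the point is
only that C<\Q> acts like C<quotemeta()> on whatever was interpolated or
expanded before it:

```perl
use strict;
use warnings;

my $str = 'a.b*c';

# \Q ... \E applies quotemeta() to the interpolated value.
my $quoted = "\Q$str\E";

# Here \t is expanded to a TAB first, and the TAB is then quoted.
my $quoted_tab = "\Q\t\E";

# Quoting the two literal characters backslash and "t" instead.
my $literal        = '\t';
my $quoted_literal = "\Q$literal\E";
```

The last form is the one to reach for when the text to be quoted comes
from a variable rather than from the source code itself.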
75e14d17 2884
19799a22 2885Note also that the interpolation code needs to make a decision on
89d205f2 2886where the interpolated scalar ends. For instance, whether
ba7f043c 2887S<C<< "a $x -> {c}" >>> really means:
75e14d17 2888
db691027 2889 "a " . $x . " -> {c}";
75e14d17 2890
2a94b7ce 2891or:
75e14d17 2892
db691027 2893 "a " . $x -> {c};
75e14d17 2894
Most of the time, the interpolated expression is the longest possible
text that does not include spaces between components and that contains
matching braces or brackets. Because the outcome may be determined by
voting based on heuristic estimators, the result is not strictly
predictable. Fortunately, it's usually correct for ambiguous cases.
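
When the guess matters, it is possible to avoid relying on it. The
sketch below (the hash reference and its contents are assumptions made
up for illustration) shows one spelling that is always a hash element
lookup and one that always ends the variable name early:

```perl
use strict;
use warnings;

my $x = { c => 'value' };

# No whitespace around the arrow: an unambiguous hash element lookup.
my $element = "a $x->{c}";

# Braces end the variable name, so the rest is literal text.
my $literal = "a ${x} -> {c}";
```

The second string contains the stringified reference followed by the
literal text C< -E<gt> {c}>.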
75e14d17 2900
6deea57f 2901=item the replacement of C<s///>
75e14d17 2902
628253b8 2903Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F> and interpolation
2904happens as with C<qq//> constructs.
2905
2906It is at this step that C<\1> is begrudgingly converted to C<$1> in
2907the replacement text of C<s///>, in order to correct the incorrigible
2908I<sed> hackers who haven't picked up the saner idiom yet. A warning
ba7f043c 2909is emitted if the S<C<use warnings>> pragma or the B<-w> command-line flag
2910(that is, the C<$^W> variable) was set.
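
For comparison, this is a minimal example of the preferred idiom, using
C<$1> and C<$2> rather than C<\1> and C<\2> in the replacement (the
sample text is an assumption for illustration):

```perl
use strict;
use warnings;

my $text = 'hello world';

# Copy, then swap the two words using capture variables.
(my $swapped = $text) =~ s/(\w+) (\w+)/$2 $1/;
```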
2911
=item C<RE> in C<m?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>
6deea57f 2913
628253b8 2914Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\F>, C<\E>,
2915and interpolation happens (almost) as with C<qq//> constructs.
2916
2917Processing of C<\N{...}> is also done here, and compiled into an intermediate
2918form for the regex compiler. (This is because, as mentioned below, the regex
2919compilation may be done at execution time, and C<\N{...}> is a compile-time
2920construct.)
2921
2922However any other combinations of C<\> followed by a character
2923are not substituted but only skipped, in order to parse them
2924as regular expressions at the following step.
6deea57f 2925As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
1749ea0d 2926treated as an array symbol (for example C<@foo>),
6deea57f 2927even though the same text in C<qq//> gives interpolation of C<\c@>.
6deea57f 2928
Code blocks such as C<(?{BLOCK})> are handled by temporarily passing control
back to the perl parser, in much the same way as an interpolated array
subscript expression such as C<"foo$array[1+f("[xyz")]bar"> would be.
2932
2933Moreover, inside C<(?{BLOCK})>, S<C<(?# comment )>>, and
2934a C<#>-comment in a C</x>-regular expression, no processing is
19799a22 2935performed whatsoever. This is the first step at which the presence
ba7f043c 2936of the C</x> modifier is relevant.
19799a22 2937
Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
and C<@-> are not interpolated, and constructs of the form
C<$var[SOMETHING]> are voted (by several different estimators) to be
either an array element or C<$var> followed by an RE alternative. This is
where the notation C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is
interpreted as array element C<-9>, not as a regular expression from the
variable C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.
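
A short sketch of the two unambiguous spellings follows. The variables
C<$arr> and C<@arr> and their contents are assumptions for illustration.
Braces around the name alone force the "scalar followed by a character
class" reading, and braces around the whole subscript force the array
element reading:

```perl
use strict;
use warnings;

my $arr = 'item';            # a scalar ...
my @arr = ('zero', 'one');   # ... and an unrelated array of the same name

# ${arr} ends the variable name, so [0-9] is a character class.
my $class_match = ('item7' =~ /\A${arr}[0-9]\z/) ? 1 : 0;

# ${arr[1]} is always an element of @arr.
my $element_match = ('one' =~ /\A${arr[1]}\z/) ? 1 : 0;
```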
2947
The lack of processing of C<\\> creates specific restrictions on
the post-processed text. If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step. C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is. Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<m?foo?>; or an
alphanumeric char, as in:
2a94b7ce
IZ
2957
2958 m m ^ a \s* b mmx;
2959
19799a22 2960In the RE above, which is intentionally obfuscated for illustration, the
6deea57f 2961delimiter is C<m>, the modifier is C<mx>, and after delimiter-removal the
ba7f043c 2962RE is the same as for S<C<m/ ^ a \s* b /mx>>. There's more than one
19799a22
GS
2963reason you're encouraged to restrict your delimiters to non-alphanumeric,
2964non-whitespace choices.
75e14d17
IZ
2965
2966=back
2967
19799a22 2968This step is the last one for all constructs except regular expressions,
75e14d17
IZ
2969which are processed further.
2970
6deea57f
TS
2971=item parsing regular expressions
2972X<regexp, parse>
75e14d17 2973
19799a22 2974Previous steps were performed during the compilation of Perl code,
ac036724 2975but this one happens at run time, although it may be optimized to
19799a22 2976be calculated at compile time if appropriate. After preprocessing
6deea57f 2977described above, and possibly after evaluation if concatenation,
19799a22
GS
2978joining, casing translation, or metaquoting are involved, the
2979resulting I<string> is passed to the RE engine for compilation.
2980
2981Whatever happens in the RE engine might be better discussed in L<perlre>,
2982but for the sake of continuity, we shall do so here.
2983
ba7f043c 2984This is another step where the presence of the C</x> modifier is
19799a22 2985relevant. The RE engine scans the string from left to right and
ba7f043c 2986converts it into a finite automaton.
19799a22
GS
2987
Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>). Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes. C<(?#...)> comments are ignored. All the rest is either
converted to literal strings to match, or else is ignored (as are
whitespace and C<#>-style comments if C</x> is present).
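
To make the effect of C</x> concrete, here is a small sketch (the
pattern and the test strings are assumptions for illustration). Note
that the comments inside the pattern avoid the delimiter character, for
the reason given in the discussion of finding the end of a construct:

```perl
use strict;
use warnings;

my $re = qr/
    \A \s*      # optional leading whitespace
    ( \d+ )     # capture a run of digits
    \s* \z      # optional trailing whitespace
/x;

my ($digits) = '  42 ' =~ $re;

# Without the x modifier the space in the pattern must match literally.
my $literal_space = ('a b' =~ /a b/)  ? 1 : 0;
my $ignored_space = ('ab'  =~ /a b/x) ? 1 : 0;
```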
19799a22
GS
2995
Parsing of the bracketed character class construct, C<[...]>, is
rather different from the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash.
3002
3003The terminator of runtime C<(?{...})> is found by temporarily switching
3004control to the perl parser, which should stop at the point where the
3005logically balancing terminating C<}> is found.
19799a22
GS
3006
It is possible to inspect both the string given to the RE engine and the
resulting finite automaton. See the arguments C<debug>/C<debugcolor>
in the S<C<use L<re>>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.
75e14d17
IZ
3011
3012=item Optimization of regular expressions
d74e8afc 3013X<regexp, optimization>
75e14d17 3014
7522fed5 3015This step is listed for completeness only. Since it does not change
75e14d17 3016semantics, details of this step are not documented and are subject
19799a22
GS
3017to change without notice. This step is performed over the finite
3018automaton that was generated during the previous pass.
2a94b7ce 3019
19799a22
GS
3020It is at this stage that C<split()> silently optimizes C</^/> to
3021mean C</^/m>.
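
The visible effect of this optimization can be sketched as follows (the
sample text is an assumption for illustration): a plain C</^/> given to
C<split> breaks the string at the start of every line, not just at the
start of the string.

```perl
use strict;
use warnings;

my $text  = "one\ntwo\nthree\n";

# split treats /^/ as if it had been written /^/m.
my @lines = split /^/, $text;
```

Each element of C<@lines> keeps its trailing newline, because the split
happens at the zero-width position after it.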
75e14d17
IZ
3022
3023=back
3024
a0d0e21e 3025=head2 I/O Operators
d74e8afc 3026X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
80a96bfc 3027X<< <> >> X<< <<>> >> X<@ARGV>
a0d0e21e 3028
54310121 3029There are several I/O operators you should know about.
fbad3eb5 3030
7b8d334a 3031A string enclosed by backticks (grave accents) first undergoes
3032double-quote interpolation. It is then interpreted as an external
3033command, and the output of that command is the value of the
3034backtick string, like in a shell. In scalar context, a single string
3035consisting of all output is returned. In list context, a list of
3036values is returned, one per line of output. (You can set C<$/> to use
3037a different line terminator.) The command is executed each time the
3038pseudo-literal is evaluated. The status value of the command is
3039returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
3040Unlike in B<csh>, no translation is done on the return data--newlines
3041remain newlines. Unlike in any of the shells, single quotes do not
3042hide variable names in the command from interpretation. To pass a
3043literal dollar-sign through to the shell you need to hide it with a
3044backslash. The generalized form of backticks is C<qx//>. (Because
3045backticks always undergo shell expansion as well, see L<perlsec> for
3046security concerns.)
d74e8afc 3047X<qx> X<`> X<``> X<backtick> X<glob>
3048
3049In scalar context, evaluating a filehandle in angle brackets yields
3050the next line from that file (the newline, if any, included), or
3051C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
3052(sometimes known as file-slurp mode) and the file is empty, it
3053returns C<''> the first time, followed by C<undef> subsequently.
3054
3055Ordinarily you must assign the returned value to a variable, but
3056there is one situation where an automatic assignment happens. If
3057and only if the input symbol is the only thing inside the conditional
3058of a C<while> statement (even if disguised as a C<for(;;)> loop),
ba7f043c 3059the value is automatically assigned to the global variable C<$_>,
3060destroying whatever was there previously. (This may seem like an
3061odd thing to you, but you'll use the construct in almost every Perl
3062script you write.) The C<$_> variable is not implicitly localized.
3063You'll have to put a S<C<local $_;>> before the loop if you want that
3064to happen.
3065
3066The following lines are equivalent:
a0d0e21e 3067
748a9306 3068 while (defined($_ = <STDIN>)) { print; }
7b8d334a 3069 while ($_ = <STDIN>) { print; }
3070 while (<STDIN>) { print; }
3071 for (;<STDIN>;) { print; }
748a9306 3072 print while defined($_ = <STDIN>);
7b8d334a 3073 print while ($_ = <STDIN>);
3074 print while <STDIN>;
3075
a727cfac 3076This also behaves similarly, but assigns to a lexical variable
1ca345ed 3077instead of to C<$_>:
7b8d334a 3078
89d205f2 3079 while (my $line = <STDIN>) { print $line }
7b8d334a 3080
3081In these loop constructs, the assigned value (whether assignment
3082is automatic or explicit) is then tested to see whether it is
3083defined. The defined test avoids problems where the line has a string
3084value that would be treated as false by Perl; for example a "" or
ba7f043c 3085a C<"0"> with no trailing newline. If you really mean for such values
19799a22 3086to terminate the loop, they should be tested for explicitly:
3087
3088 while (($_ = <STDIN>) ne '0') { ... }
3089 while (<STDIN>) { last unless $_; ... }
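
The implicit C<defined> test can be observed with an in-memory
filehandle. In this sketch the data and variable names are assumptions
for illustration; the last "line" is a bare C<0> with no newline, which
is false as a string but still defined:

```perl
use strict;
use warnings;

my $data = "first\n0";    # final line is "0" with no trailing newline

open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {    # Perl tests defined($line), not its truth
    push @seen, $line;
}
close $fh;
```

Both lines end up in C<@seen>; a plain truth test on C<$line> would have
stopped before the second one.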
3090
ba7f043c 3091In other boolean contexts, C<< <I<FILEHANDLE>> >> without an
5ef4d93e 3092explicit C<defined> test or comparison elicits a warning if the
ba7f043c 3093S<C<use warnings>> pragma or the B<-w>
19799a22 3094command-line switch (the C<$^W> variable) is in effect.
7b8d334a 3095
5f05dabc 3096The filehandles STDIN, STDOUT, and STDERR are predefined. (The
19799a22
GS
3097filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
3098in packages, where they would be interpreted as local identifiers
3099rather than global.) Additional filehandles may be created with
ba7f043c 3100the C<open()> function, amongst others. See L<perlopentut> and
19799a22 3101L<perlfunc/open> for details on this.
X<stdin> X<stdout> X<stderr>
a0d0e21e 3103
ba7f043c 3104If a C<< <I<FILEHANDLE>> >> is used in a context that is looking for
19799a22
GS
3105a list, a list comprising all input lines is returned, one line per
3106list element. It's easy to grow to a rather large data space this
3107way, so use with care.
a0d0e21e 3108
ba7f043c 3109C<< <I<FILEHANDLE>> >> may also be spelled C<readline(*I<FILEHANDLE>)>.
19799a22 3110See L<perlfunc/readline>.
fbad3eb5 3111
ba7f043c 3112The null filehandle C<< <> >> is special: it can be used to emulate the
1ca345ed
TC
3113behavior of B<sed> and B<awk>, and any other Unix filter program
3114that takes a list of filenames, doing the same to each line
ba7f043c 3115of input from all of them. Input from C<< <> >> comes either from
a0d0e21e 3116standard input, or from each file listed on the command line. Here's
ba7f043c
KW
3117how it works: the first time C<< <> >> is evaluated, the C<@ARGV> array is
3118checked, and if it is empty, C<$ARGV[0]> is set to C<"-">, which when opened
3119gives you standard input. The C<@ARGV> array is then processed as a list
a0d0e21e
LW
3120of filenames. The loop
3121
3122 while (<>) {
3123 ... # code for each line
3124 }
3125
3126is equivalent to the following Perl-like pseudo code:
3127
3e3baf6d 3128 unshift(@ARGV, '-') unless @ARGV;
a0d0e21e
LW
3129 while ($ARGV = shift) {
3130 open(ARGV, $ARGV);
3131 while (<ARGV>) {
3132 ... # code for each line
3133 }
3134 }
3135
19799a22 3136except that it isn't so cumbersome to say, and will actually work.
ba7f043c
KW
3137It really does shift the C<@ARGV> array and put the current filename
3138into the C<$ARGV> variable. It also uses filehandle I<ARGV>
3139internally. C<< <> >> is just a synonym for C<< <ARGV> >>, which
19799a22 3140is magical. (The pseudo code above doesn't work because it treats
ba7f043c 3141C<< <ARGV> >> as non-magical.)
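
The behavior can be tried without touching the real command line by
localizing C<@ARGV>. This is only a sketch: the temporary files and
their contents are assumptions created for the example, using the
standard C<File::Temp> module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small input files to stand in for command-line arguments.
my @files;
for my $content ("a1\na2\n", "b1\n") {
    my ($out, $name) = tempfile(UNLINK => 1);
    print {$out} $content;
    close $out or die "cannot close $name: $!";
    push @files, $name;
}

my @lines;
{
    local @ARGV = @files;         # pretend these were given on the command line
    while (my $line = <>) {       # reads each file in turn
        push @lines, $line;
    }
}
```

After the loop, C<@lines> holds the lines of both files in order, as if
they had been one file.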
a0d0e21e 3142
48ab5743
ML
3143Since the null filehandle uses the two argument form of L<perlfunc/open>
3144it interprets special characters, so if you have a script like this:
3145
3146 while (<>) {
3147 print;
3148 }
3149
ba7f043c 3150and call it with S<C<perl dangerous.pl 'rm -rfv *|'>>, it actually opens a
48ab5743
ML
3151pipe, executes the C<rm> command and reads C<rm>'s output from that pipe.
3152If you want all items in C<@ARGV> to be interpreted as file names, you
1033ba6e
PM
3153can use the module C<ARGV::readonly> from CPAN, or use the double bracket:
3154
3155 while (<<>>) {
3156 print;
3157 }
3158
Using double angle brackets inside a C<while> causes the open to use the
three-argument form (with the second argument being C<< < >>), so all
arguments in C<@ARGV> are treated as literal filenames (including C<"-">).
(Note that for convenience, if you use C<< <<>> >> and if C<@ARGV> is
empty, it will still read from the standard input.)
48ab5743 3164
ba7f043c 3165You can modify C<@ARGV> before the first C<< <> >> as long as the array ends up
a0d0e21e 3166containing the list of filenames you really want. Line numbers (C<$.>)
19799a22
GS
3167continue as though the input were one big happy file. See the example
3168in L<perlfunc/eof> for how to reset line numbers on each file.
5a964f20 3169
ba7f043c
KW
3170If you want to set C<@ARGV> to your own list of files, go right ahead.
3171This sets C<@ARGV> to all plain text files if no C<@ARGV> was given:
5a964f20
TC
3172
3173 @ARGV = grep { -f && -T } glob('*') unless @ARGV;
a0d0e21e 3174
5a964f20
TC
3175You can even set them to pipe commands. For example, this automatically
3176filters compressed arguments through B<gzip>:
3177
3178 @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
3179
If you want to pass switches into your script, you can use one of the
C<Getopt> modules (such as C<Getopt::Std> or C<Getopt::Long>) or put a
loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }
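
As an alternative to the hand-written loop, here is a minimal sketch
using the standard C<Getopt::Long> module. The option names and the
sample argument list are assumptions for illustration, and
C<GetOptionsFromArray> is used so that the example does not depend on
the real C<@ARGV>:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# A stand-in for @ARGV.
my @args = ('-v', '-v', '--debug=trace', '--', 'input.txt');

my $verbose = 0;
my $debug   = '';

GetOptionsFromArray(
    \@args,
    'verbose|v+' => \$verbose,    # each -v increments the counter
    'debug|D=s'  => \$debug,      # takes a string value
) or die "bad command-line options\n";

# Whatever remains in @args is the list of filenames.
```

The switches are removed from the array as they are recognized, so the
remaining elements can be handed to C<< <> >> by assigning them to
C<@ARGV>.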
3194
ba7f043c 3195The C<< <> >> symbol will return C<undef> for end-of-file only once.
89d205f2 3196If you call it again after this, it will assume you are processing another
ba7f043c 3197C<@ARGV> list, and if you haven't set C<@ARGV>, will read input from STDIN.
a0d0e21e 3198
1ca345ed 3199If what the angle brackets contain is a simple scalar variable (for example,
ba7f043c 3200C<$foo>), then that variable contains the name of the
19799a22
GS
3201filehandle to input from, or its typeglob, or a reference to the
3202same. For example:
cb1a09d0
AD
3203
3204 $fh = \*STDIN;
3205 $line = <$fh>;
a0d0e21e 3206
5a964f20
TC
3207If what's within the angle brackets is neither a filehandle nor a simple
3208scalar variable containing a filehandle name, typeglob, or typeglob
3209reference, it is interpreted as a filename pattern to be globbed, and
3210either a list of filenames or the next filename in the list is returned,
19799a22 3211depending on context. This distinction is determined on syntactic
ba7f043c
KW
3212grounds alone. That means C<< <$x> >> is always a C<readline()> from
3213an indirect handle, but C<< <$hash{key}> >> is always a C<glob()>.
3214That's because C<$x> is a simple scalar variable, but C<$hash{key}> is
ef191992
YST
3215not--it's a hash element. Even C<< <$x > >> (note the extra space)
3216is treated as C<glob("$x ")>, not C<readline($x)>.
5a964f20
TC
3217
3218One level of double-quote interpretation is done first, but you can't
35f2feb0 3219say C<< <$foo> >> because that's an indirect filehandle as explained
5a964f20
TC
3220in the previous paragraph. (In older versions of Perl, programmers
3221would insert curly brackets to force interpretation as a filename glob:
35f2feb0 3222C<< <${foo}> >>. These days, it's considered cleaner to call the
5a964f20 3223internal function directly as C<glob($foo)>, which is probably the right
19799a22 3224way to have done it in the first place.) For example:
a0d0e21e
LW
3225
3226 while (<*.c>) {
3227 chmod 0644, $_;
3228 }
3229
3a4b19e4 3230is roughly equivalent to:
a0d0e21e
LW
3231
3232 open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
3233 while (<FOO>) {
5b3eff12 3234 chomp;
a0d0e21e
LW
3235 chmod 0644, $_;
3236 }
3237
3a4b19e4 3238except that the globbing is actually done internally using the standard
ba7f043c 3239C<L<File::Glob>> extension. Of course, the shortest way to do the above is:
a0d0e21e
LW
3240
3241 chmod 0644, <*.c>;
3242
19799a22
GS
3243A (file)glob evaluates its (embedded) argument only when it is
3244starting a new list. All values must be read before it will start
3245over. In list context, this isn't important because you automatically
3246get them all anyway. However, in scalar context the operator returns
069e01df 3247the next value each time it's called, or C<undef> when the list has
19799a22
GS
3248run out. As with filehandle reads, an automatic C<defined> is
3249generated when the glob occurs in the test part of a C<while>,
1ca345ed
TC
3250because legal glob returns (for example,
3251a file called F<0>) would otherwise
19799a22
GS
3252terminate the loop. Again, C<undef> is returned only once. So if
3253you're expecting a single value from a glob, it is much better to
3254say
4633a7c4
LW
3255
3256 ($file) = <blurch*>;
3257
3258than
3259
3260 $file = <blurch*>;
3261
3262because the latter will alternate between returning a filename and
19799a22 3263returning false.
4633a7c4 3264
b159ebd3 3265If you're trying to do variable interpolation, it's definitely better
ba7f043c 3266to use the C<glob()> function, because the older notation can cause people
e37d713d 3267to become confused with the indirect filehandle notation.
4633a7c4
LW
3268
3269 @files = glob("$dir/*.[ch]");
3270 @files = glob($files[$i]);
3271
a0d0e21e 3272=head2 Constant Folding
d74e8afc 3273X<constant folding> X<folding>
a0d0e21e
LW
3274
3275Like C, Perl does a certain amount of expression evaluation at
19799a22 3276compile time whenever it determines that all arguments to an
a0d0e21e
LW
3277operator are static and have no side effects. In particular, string
3278concatenation happens at compile time between literals that don't do
19799a22 3279variable substitution. Backslash interpolation also happens at
a0d0e21e
LW
3280compile time. You can say
3281
1ca345ed 3282 'Now is the time for all'
a727cfac 3283 . "\n"
1ca345ed 3284 . 'good men to come to.'
a0d0e21e 3285
54310121 3286and this all reduces to one string internally. Likewise, if
a0d0e21e
LW
3287you say
3288
3289 foreach $file (@filenames) {
5a964f20 3290 if (-s $file > 5 + 100 * 2**16) { }
54310121 3291 }
a0d0e21e 3292
1ca345ed 3293the compiler precomputes the number which that expression
19799a22 3294represents so that the interpreter won't have to.
a0d0e21e 3295
fd1abbef 3296=head2 No-ops
d74e8afc 3297X<no-op> X<nop>
fd1abbef
DN
3298
3299Perl doesn't officially have a no-op operator, but the bare constants
1ca345ed 3300C<0> and C<1> are special-cased not to produce a warning in void
fd1abbef
DN
3301context, so you can for example safely do
3302
3303 1 while foo();
3304
2c268ad5 3305=head2 Bitwise String Operators
fb7054ba 3306X<operator, bitwise, string> X<&.> X<|.> X<^.> X<~.>
2c268ad5
TP
3307
3308Bitstrings of any size may be manipulated by the bitwise operators
3309(C<~ | & ^>).
3310
19799a22
GS
3311If the operands to a binary bitwise op are strings of different
3312sizes, B<|> and B<^> ops act as though the shorter operand had
3313additional zero bits on the right, while the B<&> op acts as though
3314the longer operand were truncated to the length of the shorter.
3315The granularity for such extension or truncation is one or more
3316bytes.
2c268ad5 3317
89d205f2 3318 # ASCII-based examples
2c268ad5
TP
3319 print "j p \n" ^ " a h"; # prints "JAPH\n"
3320 print "JA" | " ph\n"; # prints "japh\n"
3321 print "japh\nJunk" & '_____'; # prints "JAPH\n";
3322 print 'p N$' ^ " E<H\n"; # prints "Perl\n";
3323
19799a22 3324If you are intending to manipulate bitstrings, be certain that
2c268ad5 3325you're supplying bitstrings: If an operand is a number, that will imply
19799a22 3326a B<numeric> bitwise operation. You may explicitly show which type of
2c268ad5
TP
3327operation you intend by using C<""> or C<0+>, as in the examples below.
3328
4358a253
SS
3329 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3330 $foo = '150' | 105; # yields 255
2c268ad5
TP
3331 $foo = 150 | '105'; # yields 255
3332 $foo = '150' | '105'; # yields string '155' (under ASCII)
3333
3334 $baz = 0+$foo & 0+$bar; # both ops explicitly numeric
3335 $biz = "$foo" ^ "$bar"; # both ops explicitly stringy
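
The difference between the two kinds of operation can be checked
directly. This sketch assumes an ASCII platform and that the C<bitwise>
feature is not enabled; the values are the ones used in the examples
above:

```perl
use strict;
use warnings;

my ($foo, $bar) = ('150', '105');

# Forcing numeric context: 150 | 105 is 0x96 | 0x69, that is 0xFF.
my $numeric = (0 + $foo) | (0 + $bar);

# Forcing string context: the OR is applied character by character.
my $stringy = "$foo" | "$bar";
```

In the string case, C<'5' | '0'> is C<'5'> and C<'0' | '5'> is C<'5'>,
which is why the result reads C<155>.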
a0d0e21e 3336
fb7054ba 3337This somewhat unpredictable behavior can be avoided with the experimental
ba7f043c
KW
3338"bitwise" feature, new in Perl 5.22. You can enable it via S<C<use feature
3339'bitwise'>>. By default, it will warn unless the C<"experimental::bitwise">
3340warnings category has been disabled. (S<C<use experimental 'bitwise'>> will
fb7054ba
FC
3341enable the feature and disable the warning.) Under this feature, the four
3342standard bitwise operators (C<~ | & ^>) are always numeric. Adding a dot
3343after each operator (C<~. |. &. ^.>) forces it to treat its operands as
3344strings:
3345
3346 use experimental "bitwise";
3347 $foo = 150 | 105; # yields 255 (0x96 | 0x69 is 0xFF)
3348 $foo = '150' | 105; # yields 255
3349 $foo = 150 | '105'; # yields 255
3350 $foo = '150' | '105'; # yields 255
9f1b8172 3351 $foo = 150 |. 105; # yields string '155'
fb7054ba
FC
3352 $foo = '150' |. 105; # yields string '155'
3353 $foo = 150 |.'105'; # yields string '155'
3354 $foo = '150' |.'105'; # yields string '155'
3355
3356 $baz = $foo & $bar; # both operands numeric
3357 $biz = $foo ^. $bar; # both operands stringy
3358
3359The assignment variants of these operators (C<&= |= ^= &.= |.= ^.=>)
3360behave likewise under the feature.
3361
fac71630
KW
3362It is a fatal error if an operand contains a character whose ordinal
3363value is above 0xFF, and hence not expressible except in UTF-8. The
3364operation is performed on a non-UTF-8 copy for other operands encoded in
3365UTF-8. See L<perlunicode/Byte and Character Semantics>.
737f7534 3366
1ae175c8
GS
3367See L<perlfunc/vec> for information on how to manipulate individual bits
3368in a bit vector.
3369
55497cff 3370=head2 Integer Arithmetic
d74e8afc 3371X<integer>
a0d0e21e 3372
19799a22 3373By default, Perl assumes that it must do most of its arithmetic in
a0d0e21e
LW
3374floating point. But by saying
3375
3376 use integer;
3377
3eab78e3
CW
3378you may tell the compiler to use integer operations
3379(see L<integer> for a detailed explanation) from here to the end of
3380the enclosing BLOCK. An inner BLOCK may countermand this by saying
a0d0e21e
LW
3381
3382 no integer;
3383
19799a22 3384which lasts until the end of that BLOCK. Note that this doesn't
3eab78e3
CW
3385mean everything is an integer, merely that Perl will use integer
3386operations for arithmetic, comparison, and bitwise operators. For
ba7f043c 3387example, even under S<C<use integer>>, if you take the C<sqrt(2)>, you'll
3eab78e3 3388still get C<1.4142135623731> or so.
19799a22 3389
ba7f043c
KW
3390Used on numbers, the bitwise operators (C<&> C<|> C<^> C<~> C<< << >>
3391C<< >> >>) always produce integral results. (But see also
5a0de581 3392L</Bitwise String Operators>.) However, S<C<use integer>> still has meaning for
19799a22 3393them. By default, their results are interpreted as unsigned integers, but
ba7f043c 3394if S<C<use integer>> is in effect, their results are interpreted
19799a22 3395as signed integers. For example, C<~0> usually evaluates to a large
ba7f043c 3396integral value. However, S<C<use integer; ~0>> is C<-1> on two's-complement
19799a22 3397machines.
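
Because the pragma is lexically scoped, its effect can be confined to a
single block. The following sketch (variable names are assumptions for
illustration) contrasts the default behavior with the behavior under
S<C<use integer>>:

```perl
use strict;
use warnings;

my $float_quotient = 10 / 3;                          # floating point
my $int_quotient   = do { use integer; 10 / 3 };      # integer division

my $unsigned_ones  = ~0;                              # large unsigned value
my $signed_ones    = do { use integer; ~0 };          # -1 on two's-complement
```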
68dc0745 3398
3399=head2 Floating-point Arithmetic
06ce2fa3 3400
d74e8afc 3401X<floating-point> X<floating point> X<float> X<real>
68dc0745 3402
ba7f043c 3403While S<C<use integer>> provides integer-only arithmetic, there is no
19799a22
GS
3404analogous mechanism to provide automatic rounding or truncation to a
3405certain number of decimal places. For rounding to a certain number
ba7f043c 3406of digits, C<sprintf()> or C<printf()> is usually the easiest route.
19799a22 3407See L<perlfaq4>.
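
For instance, a minimal sketch of rounding with C<sprintf()> (the values
are assumptions chosen for illustration):

```perl
use strict;
use warnings;

my $pi = 3.14159265;

# Round to two and three digits after the decimal point.
my $two_places   = sprintf('%.2f', $pi);
my $three_places = sprintf('%.3f', 2 / 3);
```

The results are strings; using them in a numeric expression converts
them back to numbers.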
68dc0745 3408
5a964f20
TC
3409Floating-point numbers are only approximations to what a mathematician
3410would call real numbers. There are infinitely more reals than floats,
3411so some corners must be cut. For example:
3412
3413 printf "%.20g\n", 123456789123456789;
3414 # produces 123456789123456784
3415
Testing for exact floating-point equality or inequality is not a
good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits. (The C<%g> conversion used below counts significant
digits, not decimal places.) See Knuth, volume II, for a more robust
treatment of this topic.

    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        my ($tX, $tY);
        $tX = sprintf("%.${POINTS}g", $X);
        $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }
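
A usage sketch follows. The function is repeated so that the example is
self-contained, and the inputs are assumptions for illustration. On
builds that use IEEE-754 doubles, the exact comparison of
S<C<0.1 + 0.2>> with C<0.3> is typically false, while the comparison to
ten digits succeeds:

```perl
use strict;
use warnings;

sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    # Compare the two values after formatting them to $POINTS digits.
    return sprintf("%.${POINTS}g", $X) eq sprintf("%.${POINTS}g", $Y);
}

my $sum = 0.1 + 0.2;

my $exact  = ($sum == 0.3)             ? 1 : 0;   # usually 0 with doubles
my $close  = fp_equal($sum, 0.3, 10)   ? 1 : 0;
my $differ = fp_equal(1.0, 1.1, 3)     ? 1 : 0;
```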
3429
68dc0745 3430The POSIX module (part of the standard perl distribution) implements
ba7f043c
KW
3431C<ceil()>, C<floor()>, and other mathematical and trigonometric functions.
3432The C<L<Math::Complex>> module (part of the standard perl distribution)
19799a22 3433defines mathematical functions that work on both the reals and the
ba7f043c 3434imaginary numbers. C<Math::Complex> is not as efficient as POSIX, but
68dc0745 3435POSIX can't work with complex numbers.
3436
3437Rounding in financial applications can have serious implications, and
3438the rounding method used should be specified precisely. In these
3439cases, it probably pays not to trust whichever system rounding is
3440being used by Perl, but to instead implement the rounding function you
3441need yourself.
5a964f20
TC
3442
3443=head2 Bigger Numbers
d74e8afc 3444X<number, arbitrary precision>
5a964f20 3445
ba7f043c
KW
3446The standard C<L<Math::BigInt>>, C<L<Math::BigRat>>, and
3447C<L<Math::BigFloat>> modules,
fb1a95c6 3448along with the C<bignum>, C<bigint>, and C<bigrat> pragmas, provide
19799a22 3449variable-precision arithmetic and overloaded operators, although
46f8a5ea 3450they're currently pretty slow. At the cost of some space and
19799a22
GS
3451considerable speed, they avoid the normal pitfalls associated with
3452limited-precision representations.
5a964f20 3453
    use 5.010;
    use bigint;  # easy interface to Math::BigInt
    $x = 123456789123456789;
    say $x * $x;
    # prints 15241578780673678515622620750190521

Or with rationals:

    use 5.010;
    use bigrat;
    $x = 3/22;
    $y = 4/6;
    say "x/y is ", $x/$y;    # prints "x/y is 9/44"
    say "x*y is ", $x*$y;    # prints "x*y is 1/11"
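
The same calculations can be written with explicit objects instead of
the pragmas, which keeps ordinary numeric literals elsewhere in the file
unaffected. This is a sketch using the standard C<Math::BigInt> and
C<Math::BigRat> classes; the variable names are assumptions for
illustration:

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigRat;

# Pass large values as strings so no precision is lost on the way in.
my $x      = Math::BigInt->new('123456789123456789');
my $square = $x * $x;

my $ratio  = Math::BigRat->new('3/22') / Math::BigRat->new('4/6');
```

The overloaded operators return new objects, which stringify to the
full-precision result.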
c543c01b 3470
ba7f043c
KW
3471Several modules let you calculate with unlimited or fixed precision
3472(bound only by memory and CPU time). There
46f8a5ea 3473are also some non-standard modules that
c543c01b 3474provide faster implementations via external C libraries.
cd5c4fce
T
3475
Here is a short but incomplete summary:

  Math::String           treat string sequences like numbers
  Math::FixedPrecision   calculate with a fixed precision
  Math::Currency         for currency calculations
  Bit::Vector            manipulate bit vectors fast (uses C)
  Math::BigIntFast       Bit::Vector wrapper for big numbers
  Math::Pari             provides access to the Pari C library
  Math::Cephes           uses the external Cephes C library (no
                         big numbers)
  Math::Cephes::Fraction fractions via the Cephes library
  Math::GMP              another one using an external C library
  Math::GMPz             an alternative interface to libgmp's big ints
  Math::GMPq             an interface to libgmp's fraction numbers
  Math::GMPf             an interface to libgmp's floating point numbers

Choose wisely.

=cut