precedence version of this.
X<!>
-Unary "-" performs arithmetic negation if the operand is numeric. If
-the operand is an identifier, a string consisting of a minus sign
-concatenated with the identifier is returned. Otherwise, if the string
-starts with a plus or minus, a string starting with the opposite sign
-is returned. One effect of these rules is that -bareword is equivalent
+Unary "-" performs arithmetic negation if the operand is numeric,
+including any string that looks like a number. If the operand is
+an identifier, a string consisting of a minus sign concatenated
+with the identifier is returned. Otherwise, if the string starts
+with a plus or minus, a string starting with the opposite sign is
+returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a numeric and the arithmetic negation is performed. If the
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
-success of the operation. Behavior in list context depends on the particular
-operator. See L</"Regexp Quote-Like Operators"> for details and
-L<perlretut> for examples using these operators.
+success of the operation. The exceptions are substitution (s///)
+and transliteration (y///) with the C</r> (non-destructive) option,
+which cause the B<r>eturn value to be the result of the substitution
+or transliteration.
+Behavior in list context depends on the particular operator.
+See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
+examples using these operators.
If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
+Binary "!~" with a non-destructive substitution (s///r) or transliteration
+(y///r) is a syntax error.
+
=head2 Multiplicative Operators
X<operator, multiplicative>
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
-rather than the value of C<defined($a)>) and is exactly equivalent to
-C<defined($a) ? $a : $b>. This is very useful for providing default values
-for variables. If you actually want to test if at least one of C<$a> and
-C<$b> is defined, use C<defined($a // $b)>.
+rather than the value of C<defined($a)>) and yields the same result as
+C<defined($a) ? $a : $b> (except that the ternary-operator form can be
+used as an lvalue, while C<$a // $b> cannot). This is very useful for
+providing default values for variables. If you actually want to test if
+at least one of C<$a> and C<$b> is defined, use C<defined($a // $b)>.
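As a minimal sketch of the difference between C<//> and C<||> when supplying defaults (the C<%opt> hash and its keys are hypothetical, chosen only for illustration):

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

# %opt is a hypothetical option hash used only for illustration.
my %opt = (verbose => 0);

my $verbose = $opt{verbose} // 1;   # 0 is defined, so it is kept
my $level   = $opt{level}   // 3;   # no such key (undef), so the default is used
my $or_form = $opt{verbose} || 1;   # || tests truth, so the 0 is replaced

print "verbose=$verbose level=$level or_form=$or_form\n";
```

The C<||> form silently overrides a legitimate value of 0, which is the usual reason to prefer C<//> for defaults.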
The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
The following escape sequences are available in constructs that interpolate
and in transliterations.
X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
-
- Sequence Note Description
- \t tab (HT, TAB)
- \n newline (NL)
- \r return (CR)
- \f form feed (FF)
- \b backspace (BS)
- \a alarm (bell) (BEL)
- \e escape (ESC)
- \033 octal char (example: ESC)
- \x1b hex char (example: ESC)
- \x{263a} wide hex char (example: SMILEY)
- \c[ [1] control char (example: chr(27))
- \N{name} [2] named Unicode character
- \N{U+263D} [3] Unicode character (example: FIRST QUARTER MOON)
+X<\o{}>
+
+    Sequence     Note  Description
+    \t                  tab               (HT, TAB)
+    \n                  newline           (NL)
+    \r                  return            (CR)
+    \f                  form feed         (FF)
+    \b                  backspace         (BS)
+    \a                  alarm (bell)      (BEL)
+    \e                  escape            (ESC)
+    \x{263a}     [1,8]  hex char          (example: SMILEY)
+    \x1b         [2,8]  restricted range hex char (example: ESC)
+    \N{name}     [3]    named Unicode character or character sequence
+    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
+    \c[          [5]    control char      (example: chr(27))
+    \o{23072}    [6,8]  octal char        (example: SMILEY)
+    \033         [7,8]  restricted range octal char  (example: ESC)
=over 4
=item [1]
+The result is the character specified by the hexadecimal number between
+the braces. See L</[8]> below for details on which character.
+
+Only hexadecimal digits are valid between the braces. If an invalid
+character is encountered, a warning will be issued and the invalid
+character and all subsequent characters (valid or invalid) within the
+braces will be discarded.
+
+If there are no valid digits between the braces, the generated character is
+the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
+will not cause a warning.
+
+=item [2]
+
+The result is the character specified by the hexadecimal number in the range
+0x00 to 0xFF. See L</[8]> below for details on which character.
+
+Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
+by fewer than two valid digits, any valid digits will be zero-padded. This
+means that C<\x7> will be interpreted as C<\x07> and C<\x> alone will be
+interpreted as C<\x00>. Except at the end of a string, having fewer than
+two valid digits will result in a warning. Note that while the warning
+says the illegal character is ignored, it is only ignored as part of the
+escape and will still be used as the subsequent character in the string.
+For example:
+
+  Original    Result    Warns?
+  "\x7"       "\x07"    no
+  "\x"        "\x00"    no
+  "\x7q"      "\x07q"   yes
+  "\xq"       "\x00q"   yes
+
+=item [3]
+
+The result is the Unicode character or character sequence given by I<name>.
+See L<charnames>.
+
+=item [4]
+
+C<\N{U+I<hexadecimal number>}> means the Unicode character whose Unicode code
+point is I<hexadecimal number>.
+
+=item [5]
+
The character following C<\c> is mapped to some other character as shown in the
table:
To get platform independent controls, you can use C<\N{...}>.
-=item [2]
-
-For documentation of C<\N{name}>, see L<charnames>.
-
-=item [3]
-
-C<\N{U+I<wide hex char>}> means the Unicode character whose Unicode ordinal
-number is I<wide hex char>.
+=item [6]
+
+The result is the character specified by the octal number between the braces.
+See L</[8]> below for details on which character.
+
+If a character that isn't an octal digit is encountered, a warning is raised,
+and the value is based on the octal digits before it, discarding it and all
+following characters up to the closing brace. It is a fatal error if there are
+no octal digits at all.
+
+=item [7]
+
+The result is the character specified by the three digit octal number in the
+range 000 to 777 (but best to not use above 077, see next paragraph). See
+L</[8]> below for details on which character.
+
+Some contexts allow 2 or even 1 digit, but any usage without exactly
+three digits, the first being a zero, may give unintended results. (For
+example, see L<perlrebackslash/Octal escapes>.) Starting in Perl 5.14, you may
+use C<\o{}> instead, which avoids all these problems. Otherwise, it is best to
+use this construct only for ordinals C<\077> and below, remembering to pad to
+the left with zeros to make three digits. For larger ordinals, either use
+C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
+instead.
+
+Having fewer than 3 digits may lead to a misleading warning message that says
+that what follows is ignored. For example, C<"\128"> in the ASCII character set
+is equivalent to the two characters C<"\n8">, but the warning C<Illegal octal
+digit '8' ignored> will be thrown. To avoid this warning, make sure to pad
+your octal number with C<0>s: C<"\0128">.
+
+=item [8]
+
+Several of the constructs above specify a character by a number. That number
+gives the character's position in the character set encoding (indexed from 0).
+(This is called synonymously its ordinal, code position, or code point.) Perl
+works on platforms that have a native encoding currently of either ASCII/Latin1
+or EBCDIC, each of which allows specification of 256 characters. In general, if
+the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
+native encoding. If the number is 256 (0x100, 0400) or above, Perl interprets
+it as a Unicode code point and the result is the corresponding Unicode
+character. For example, C<\x{50}> and C<\o{120}> both are the number 80 in
+decimal, which is less than 256, so the number is interpreted in the native
+character set encoding. In ASCII the character in the 80th position (indexed
+from 0) is the letter "P", and in EBCDIC it is the ampersand symbol "&".
+C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
+as a Unicode code point no matter what the native encoding is. The name of the
+character in the 256th position (indexed from 0) in Unicode is
+C<LATIN CAPITAL LETTER A WITH MACRON>.
+
+There are a couple of exceptions to the above rule. C<\N{U+I<hex number>}> is
+always interpreted as a Unicode code point, so that C<\N{U+0050}> is "P" even
+on EBCDIC platforms. And if L<C<S<use encoding>>|encoding> is in effect, the
+number is considered to be in that encoding, and is translated from that into
+the platform's native encoding if there is a corresponding native character;
+otherwise to Unicode.
=back
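The numeric escapes described in the notes above can be compared side by side. This is only a sketch, and the comments assume an ASCII platform; on EBCDIC the first two strings would be "&" rather than "P".

```perl
use strict;
use warnings;
use 5.014;    # \o{} needs Perl 5.14 or later

# Three spellings of code point 80; on an ASCII platform all give "P".
my $hex   = "\x{50}";
my $octal = "\o{120}";
my $named = "\N{U+0050}";    # always Unicode, whatever the platform

# 256 and above is always treated as a Unicode code point.
my $wide = "\x{100}";

printf "%s %s %s U+%04X\n", $hex, $octal, $named, ord($wide);
```

Only the C<\N{U+...}> form is portable across native encodings for values below 256.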
\u uppercase next char
\L lowercase till \E
\U uppercase till \E
- \E end case modification
\Q quote non-word characters till \E
+ \E end either case modification or quoted section
If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
interpolated if the name is enclosed in braces C<@{*}>, but special
arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces.
-You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
-An unescaped C<$> or C<@> interpolates the corresponding variable,
-while escaping will cause the literal string C<\$> to be inserted.
-You'll need to write something like C<m/\Quser\E\@\Qhost/>.
+For double-quoted strings, the quoting from C<\Q> is applied after
+interpolation and escapes are processed.
+
+ "abc\Qfoo\tbar$s\Exyz"
+
+is equivalent to
+
+ "abc" . quotemeta("foo\tbar$s") . "xyz"
+
+For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
+the quoting from C<\Q> is applied after interpolation is processed,
+but before escapes are processed. This allows the pattern to match
+literally (except for C<$> and C<@>). For example, the following matches:
+
+ '\s\t' =~ /\Q\s\t/
+
+Because C<$> or C<@> trigger interpolation, you'll need to use something
+like C</\Quser\E\@\Qhost/> to match them literally.
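A small sketch of both behaviours, assuming a hypothetical variable C<$s> that holds a regex metacharacter:

```perl
use strict;
use warnings;

my $s = 'a.b';    # hypothetical value containing a regex metacharacter

# In a double-quoted string, \Q...\E behaves like quotemeta().
my $quoted = "x\Q$s\Ey";
my $same   = 'x' . quotemeta($s) . 'y';

# In a pattern, \Q makes the interpolated text match literally.
my $literal  = 'a.b' =~ /^\Q$s\E$/ ? 1 : 0;    # the literal text matches
my $wildcard = 'aXb' =~ /^\Q$s\E$/ ? 1 : 0;    # "." is no longer a wildcard

print "$quoted $same $literal $wildcard\n";
```

Without the C<\Q>, the second pattern match would succeed, because the interpolated C<.> would match any character.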
Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
g Match globally, i.e., find all occurrences.
c Do not reset search position on a failed match when /g is in effect.
-If "/" is the delimiter then the initial C<m> is optional. With the C<m>
-you can use any pair of non-whitespace characters
-as delimiters. This is particularly useful for matching path names
-that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
-the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
+If "/" is the delimiter then the initial C<m> is optional. With the
+C<m> you can use any pair of non-whitespace characters as delimiters.
+This is particularly useful for matching path names that contain "/",
+to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter,
+then the match-only-once rule of C<m?PATTERN?> applies (see below).
If "'" is the delimiter, no interpolation is performed on the PATTERN.
When using a character valid in an identifier, whitespace is required
after the C<m>.
the pattern matched.
The C</g> modifier specifies global pattern matching--that is,
-matching as many times as possible within the string. How it behaves
-depends on the context. In list context, it returns a list of the
+matching as many times as possible within the string. How it behaves
+depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
-expression. If there are no parentheses, it returns a list of all
+expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
-The position after the last match can be read or set using the pos()
-function; see L<perlfunc/pos>. A failed match normally resets the
+The position after the last match can be read or set using the C<pos()>
+function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
-by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
+by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
string also resets the search position.
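The interaction between scalar-context C<m//g>, C<pos()> and the C</c> modifier can be sketched as follows; the string and patterns are arbitrary illustrations:

```perl
use strict;
use warnings;

my $str = 'aXbXc';

# Each scalar-context m//g match continues where the previous one ended.
my @ends;
while ($str =~ /X/g) {
    push @ends, pos($str);    # offset just past each match
}

# The loop ended on a failed match, which reset the search position.
my $after_failure = defined(pos($str)) ? 'set' : 'reset';

# With /c, a failed match leaves the position where it was.
$str =~ /X/gc;                # succeeds at the first X
$str =~ /Z/gc;                # fails, but the position is kept
my $kept = pos($str);

print "@ends $after_failure $kept\n";
```

Keeping the position with C</c> is what makes it practical to try several alternative patterns at the same point in a string.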
=item \G assertion
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
-zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
-still anchors at pos(), but the match is of course only attempted once.
-Using C<\G> without C</g> on a target string that has not previously had a
-C</g> match applied to it is the same as using the C<\A> assertion to match
-the beginning of the string. Note also that, currently, C<\G> is only
-properly supported when anchored at the very beginning of the pattern.
+zero-width assertion that matches the exact position where the
+previous C<m//g>, if any, left off. Without the C</g> modifier, the
+C<\G> assertion still anchors at C<pos()> as it was at the start of
+the operation (see L<perlfunc/pos>), but the match is of course only
+attempted once. Using C<\G> without C</g> on a target string that has
+not previously had a C</g> match applied to it is the same as using
+the C<\A> assertion to match the beginning of the string. Note also
+that, currently, C<\G> is only properly supported when anchored at the
+very beginning of the pattern.
Examples:
Here is the output (split into several lines):
- line-noise lowercase line-noise lowercase UPPERCASE line-noise
- UPPERCASE line-noise lowercase line-noise lowercase line-noise
- lowercase lowercase line-noise lowercase lowercase line-noise
- MiXeD line-noise. That's all!
+ line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
+ line-noise lowercase line-noise lowercase line-noise lowercase
+ lowercase line-noise lowercase lowercase line-noise lowercase
+ lowercase line-noise MiXeD line-noise. That's all!
-=item ?PATTERN?
+=item m?PATTERN?
X<?>
-This is just like the C</pattern/> search, except that it matches only
+This is just like the C<m/pattern/> search, except that it matches only
once between calls to the reset() operator. This is a useful
optimization when you want to see only the first occurrence of
-something in each file of a set of files, for instance. Only C<??>
+something in each file of a set of files, for instance. Only C<m??>
patterns local to the current package are reset.
while (<>) {
- if (?^$?) {
+ if (m?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear ?? status for next file
}
-This usage is vaguely deprecated, which means it just might possibly
-be removed in some distant future version of Perl, perhaps somewhere
-around the year 2168.
+The use of C<?PATTERN?> without a leading "m" is vaguely deprecated,
+which means it just might possibly be removed in some distant future
+version of Perl, perhaps somewhere around the year 2168.
-=item s/PATTERN/REPLACEMENT/msixpogce
+=item s/PATTERN/REPLACEMENT/msixpogcer
X<substitute> X<substitution> X<replace> X<regexp, replace>
-X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e>
+X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made. Otherwise it returns false (specifically, the empty string).
+If the C</r> (non-destructive) option is used then it will perform the
+substitution on a copy of the string and return the copy whether or not a
+substitution occurred. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
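A brief sketch of how the return value differs with and without C</r>, including the case where nothing matches (the variable names are illustrative only):

```perl
use strict;
use warnings;
use 5.014;    # the /r option needs Perl 5.14 or later

my $orig = 'abc123xyz';

my $changed   = $orig =~ s/abc/def/r;            # a modified copy
my $unchanged = $orig =~ s/QQQ/def/r;            # no match: an unmodified copy
my $count     = (my $tmp = $orig) =~ s/\d//g;    # without /r: a count

print "$orig $changed $unchanged $count\n";
```

Because C</r> returns a string even when nothing matched, its result cannot be used to test whether a substitution took place.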
+
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. (The string specified with C<=~> must
be scalar variable, an array element, a hash element, or an assignment
specific options:
e Evaluate the right side as an expression.
- ee Evaluate the right side as a string then eval the result
+ ee Evaluate the right side as a string then eval the result.
+ r Return the result of the substitution and leave the original string untouched.
Any non-whitespace delimiter may replace the slashes. Add space after
the C<s> when using a character allowed in identifiers. If single quotes
s/Login: $foo/Login: $bar/; # run-time pattern
($foo = $bar) =~ s/this/that/; # copy first, then change
+ ($foo = "$bar") =~ s/this/that/; # convert to string, copy, then change
+ $foo = $bar =~ s/this/that/r; # Same as above using /r
+ $foo = $bar =~ s/this/that/r
+ =~ s/that/the other/r; # Chained substitutes using /r
+ @foo = map { s/this/that/r } @bar; # /r is very useful in maps
$count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/pod($1)/ge; # use function call
+ $_ = 'abc123xyz';
+ $a = s/abc/def/r; # $a is 'def123xyz' and
+ # $_ remains 'abc123xyz'.
+
# expand variables in $_, but dynamics only, using
# symbolic dereferencing
s/\$(\w+)/${$1}/g;
produces warnings if the STRING contains the "," or the "#" character.
-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsr
X<tr> X<y> X<transliterate> X</c> X</d> X</s>
-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)
+If the C</r> (non-destructive) option is used then it will perform the
+replacement on a copy of the string and return the copy whether or not it
+was modified. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
+
A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
+ r Return the modified string and leave the original string
+ untouched.
If the C</c> modifier is specified, the SEARCHLIST character set
is complemented. If the C</d> modifier is specified, any characters
tr/a-zA-Z//s; # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
+ $HOST = $host =~ tr/a-z/A-Z/r; # same thing
+
+ $HOST = $host =~ tr/a-z/A-Z/r # chained with s///
+ =~ s/:/ -p/r;
tr/a-zA-Z/ /cs; # change non-alphas to single space
+ @stripped = map tr/a-zA-Z/ /csr, @original;
+ # /r with map
+
tr [\200-\377]
[\000-\177]; # delete 8th bit
use integer;
-you may tell the compiler that it's okay to use integer operations
-(if it feels like it) from here to the end of the enclosing BLOCK.
-An inner BLOCK may countermand this by saying
+you may tell the compiler to use integer operations
+(see L<integer> for a detailed explanation) from here to the end of
+the enclosing BLOCK. An inner BLOCK may countermand this by saying
no integer;
which lasts until the end of that BLOCK. Note that this doesn't
-mean everything is only an integer, merely that Perl may use integer
-operations if it is so inclined. For example, even under C<use
-integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
-or so.
+mean everything is an integer, merely that Perl will use integer
+operations for arithmetic, comparison, and bitwise operators. For
+example, even under C<use integer>, if you take the C<sqrt(2)>, you'll
+still get C<1.4142135623731> or so.
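The block scoping of C<use integer> can be seen in a short sketch such as this one (variable names are illustrative):

```perl
use strict;
use warnings;

my ($int_div, $root);
{
    use integer;
    $int_div = 7 / 2;      # the division operator uses integer arithmetic here
    $root    = sqrt(2);    # sqrt is a function, so it is unaffected
}
my $float_div = 7 / 2;     # outside the block, ordinary arithmetic again

print "$int_div $float_div $root\n";
```

Only the operators inside the enclosing block are affected; the same expression outside it yields a fractional result.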
Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results. (But see also