precedence version of this.
X<!>
-Unary "-" performs arithmetic negation if the operand is numeric. If
-the operand is an identifier, a string consisting of a minus sign
-concatenated with the identifier is returned. Otherwise, if the string
-starts with a plus or minus, a string starting with the opposite sign
-is returned. One effect of these rules is that -bareword is equivalent
+Unary "-" performs arithmetic negation if the operand is numeric,
+including any string that looks like a number. If the operand is
+an identifier, a string consisting of a minus sign concatenated
+with the identifier is returned. Otherwise, if the string starts
+with a plus or minus, a string starting with the opposite sign is
+returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a numeric and the arithmetic negation is performed. If the
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
-success of the operation. The exception is substitution with the C</r>
-(non-destructive) option, which causes the return value to be the result of
-the substition. Behavior in list context depends on the particular operator.
+success of the operation. The exceptions are substitution (s///)
+and transliteration (y///) with the C</r> (non-destructive) option,
+which cause the B<r>eturn value to be the result of the substitution
+or transliteration.
+Behavior in list context depends on the particular operator.
See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
examples using these operators.
Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
-Binary "!~" with a non-destructive substitution (s///r) is a syntax error.
+Binary "!~" with a non-destructive substitution (s///r) or transliteration
+(y///r) is a syntax error.
=head2 Multiplicative Operators
X<operator, multiplicative>
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth. Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
-rather than the value of C<defined($a)>) and is exactly equivalent to
-C<defined($a) ? $a : $b>. This is very useful for providing default values
-for variables. If you actually want to test if at least one of C<$a> and
-C<$b> is defined, use C<defined($a // $b)>.
+rather than the value of C<defined($a)>) and yields the same result as
+C<defined($a) ? $a : $b> (except that the ternary-operator form can be
+used as an lvalue, while C<$a // $b> cannot). This is very useful for
+providing default values for variables. If you actually want to test if
+at least one of C<$a> and C<$b> is defined, use C<defined($a // $b)>.
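
As an illustration of the difference between C<||> and C<//> described above, here is a minimal sketch. The C<%opt> hash and its keys are hypothetical, and Perl 5.10 or later is assumed.

```perl
use strict;
use warnings;

my %opt = (verbose => 0);          # hypothetical options hash

# || tests truth, so a defined-but-false 0 is replaced by the default.
my $with_or  = $opt{verbose} || 1;

# // tests definedness, so the 0 survives.
my $with_dor = $opt{verbose} // 1;

# A missing key yields undef, so // supplies the default.
my $level    = $opt{level} // 3;

# defined($a // $b) is true when at least one operand is defined.
my ($x, $y)  = (undef, "set");
my $either   = defined($x // $y) ? "yes" : "no";

print "$with_or $with_dor $level $either\n";   # prints "1 0 3 yes"
```
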
The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
\e escape (ESC)
\x{263a} [1,8] hex char (example: SMILEY)
\x1b [2,8] restricted range hex char (example: ESC)
- \N{name} [3] named Unicode character
+ \N{name} [3] named Unicode character or character sequence
\N{U+263D} [4,8] Unicode character (example: FIRST QUARTER MOON)
\c[ [5] control char (example: chr(27))
\o{23072} [6,8] octal char (example: SMILEY)
=item [3]
-The result is the Unicode character given by I<name>.
+The result is the Unicode character or character sequence given by I<name>.
See L<charnames>.
=item [4]
sequences mean on both ASCII and EBCDIC platforms.
Use of any other character following the "c" besides those listed above is
-discouraged, and may become deprecated or forbidden. What happens for those
+discouraged, and some are deprecated with the intention of removing
+them in Perl 5.16. What happens for any of these
other characters currently though, is that the value is derived by inverting
the 7th bit (0x40).
use C<\o{}> instead which avoids all these problems. Otherwise, it is best to
use this construct only for ordinals C<\077> and below, remembering to pad to
the left with zeros to make three digits. For larger ordinals, either use
-C<\o{}> , or convert to someething else, such as to hex and use C<\x{}>
+C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
instead.
-A backslash followed by a non-octal digit in a bracketed character class
-(C<[\8]> or C<[\9]>) will be interpreted as a NULL character and the digit.
-
Having fewer than 3 digits may lead to a misleading warning message that says
that what follows is ignored. For example, C<"\128"> in the ASCII character set
is equivalent to the two characters C<"\n8">, but the warning C<Illegal octal
digit '8' ignored> will be thrown. To avoid this warning, make sure to pad
-your octal number with C<0>s: C<"\0128">.
+your octal number with C<0>'s: C<"\0128">.
=item [8]
\u uppercase next char
\L lowercase till \E
\U uppercase till \E
- \E end case modification
\Q quote non-word characters till \E
+ \E end either case modification or quoted section
If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
-If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
+If Unicode (for example, C<\N{}> or code points of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
C<\U> is as defined by Unicode.
interpolated if the name is enclosed in braces C<@{*}>, but special
arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces.
-You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
-An unescaped C<$> or C<@> interpolates the corresponding variable,
-while escaping will cause the literal string C<\$> to be inserted.
-You'll need to write something like C<m/\Quser\E\@\Qhost/>.
+For double-quoted strings, the quoting from C<\Q> is applied after
+interpolation and escapes are processed.
+
+ "abc\Qfoo\tbar$s\Exyz"
+
+is equivalent to
+
+ "abc" . quotemeta("foo\tbar$s") . "xyz"
+
+For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
+the quoting from C<\Q> is applied after interpolation is processed,
+but before escapes are processed. This allows the pattern to match
+literally (except for C<$> and C<@>). For example, the following matches:
+
+ '\s\t' =~ /\Q\s\t/
+
+Because C<$> or C<@> trigger interpolation, you'll need to use something
+like C</\Quser\E\@\Qhost/> to match them literally.
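
The two orderings described above can be checked with a short sketch. The value of C<$s> is a hypothetical string chosen because it contains a regex metacharacter.

```perl
use strict;
use warnings;

my $s = "a.b";   # hypothetical value with a regex metacharacter

# Double-quoted string: \t becomes a tab and $s is interpolated first,
# then the whole \Q...\E span is passed through quotemeta.
my $quoted   = "abc\Qfoo\tbar$s\Exyz";
my $explicit = "abc" . quotemeta("foo\tbar$s") . "xyz";
my $same     = $quoted eq $explicit;

# Pattern: \Q is applied before escapes are processed, so \s and \t
# stay literal backslash sequences and match the single-quoted text.
my $literal  = '\s\t' =~ /\Q\s\t/;

# An at sign has to sit outside the quoted span to be taken literally.
my $address  = 'user@host' =~ /\Quser\E\@\Qhost/;

print join(" ", map { $_ ? "yes" : "no" } $same, $literal, $address), "\n";
```
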
Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
=over 8
-=item qr/STRING/msixpo
+=item qr/STRING/msixpodual
X<qr> X</i> X</m> X</o> X</s> X</x> X</p>
This operator quotes (and possibly compiles) its I<STRING> as a regular
expression. I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
is done. Returns a Perl value which may be used instead of the
-corresponding C</STRING/msixpo> expression. The returned value is a
+corresponding C</STRING/msixpodual> expression. The returned value is a
normalized version of the original pattern. It magically differs from
a string containing the same characters: C<ref(qr/x/)> returns "Regexp",
even though dereferencing the result returns undef.
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
-Since Perl may compile the pattern at the moment of execution of qr()
+Since Perl may compile the pattern at the moment of execution of the qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:
optimizations, but none would be triggered in the above example if
we did not use qr() operator.)
-Options are:
+Options (specified by the following modifiers) are:
m Treat string as multiple lines.
s Treat string as single line. (Make . match a newline)
p When matching preserve a copy of the matched string so
that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
o Compile pattern only once.
+ l Use the locale
+ u Use Unicode rules
+ a Use ASCII for \d, \s, \w; specifying two a's further restricts
+ /i matching so that no ASCII character will match a non-ASCII
+ one
+ d Use Unicode or native charset, as in 5.12 and earlier
If a precompiled pattern is embedded in a larger pattern then the effect
-of 'msixp' will be propagated appropriately. The effect of the 'o'
+of 'msixpluad' will be propagated appropriately. The effect the 'o'
modifier has is not propagated, being restricted to those patterns
explicitly using it.
+The last four modifiers listed above, added in Perl 5.14,
+control the character set semantics.
+
See L<perlre> for additional information on valid syntax for STRING, and
-for a detailed look at the semantics of regular expressions.
+for a detailed look at the semantics of regular expressions. In
+particular, all the modifiers except C</o> are further explained in
+L<perlre/Modifiers>. C</o> is described in the next section.
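
A small sketch of how the character set modifiers change matching, assuming Perl 5.14 or later. The two test characters are chosen for illustration only.

```perl
use strict;
use warnings;
use 5.014;

my $arabic_zero = "\x{660}";    # ARABIC-INDIC DIGIT ZERO
my $kelvin      = "\x{212A}";   # KELVIN SIGN, case-folds to "k"

my $u_digit = $arabic_zero =~ /\d/u ? 1 : 0;   # Unicode rules: a digit
my $a_digit = $arabic_zero =~ /\d/a ? 1 : 0;   # /a: \d is only [0-9]

my $u_fold  = $kelvin =~ /k/iu  ? 1 : 0;       # Unicode case folding
my $aa_fold = $kelvin =~ /k/iaa ? 1 : 0;       # /aa: ASCII never matches non-ASCII

# The modifier travels with a precompiled pattern when it is embedded.
my $ascii_digit = qr/\d/a;
my $embedded    = $arabic_zero =~ /^$ascii_digit\z/ ? 1 : 0;

print "$u_digit $a_digit $u_fold $aa_fold $embedded\n";   # prints "1 0 1 0 0"
```
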
-=item m/PATTERN/msixpogc
+=item m/PATTERN/msixpodualgc
X<m> X<operator, match>
X<regexp, options> X<regexp> X<regex, options> X<regex>
X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>
-=item /PATTERN/msixpogc
+=item /PATTERN/msixpodualgc
Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails. If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched. (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
-rather tightly.) See also L<perlre>. See L<perllocale> for
-discussion of additional considerations that apply when C<use locale>
-is in effect.
+rather tightly.) See also L<perlre>.
-Options are as described in C<qr//>; in addition, the following match
+Options are as described in C<qr//> above; in addition, the following match
process modifiers are available:
g Match globally, i.e., find all occurrences.
you can use any pair of non-whitespace characters
as delimiters. This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
-the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
+the delimiter, then a match-only-once rule applies,
+described in C<m?PATTERN?> below.
If "'" is the delimiter, no interpolation is performed on the PATTERN.
When using a character valid in an identifier, whitespace is required
after the C<m>.
-PATTERN may contain variables, which will be interpolated (and the
-pattern recompiled) every time the pattern search is evaluated, except
+PATTERN may contain variables, which will be interpolated
+every time the pattern search is evaluated, except
for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
-If you want such a pattern to be compiled only once, add a C</o> after
-the trailing delimiter. This avoids expensive run-time recompilations,
-and is useful when the value you are interpolating won't change over
-the life of the script. However, mentioning C</o> constitutes a promise
-that you won't change the variables in the pattern. If you change them,
-Perl won't even notice. See also L<"qr/STRING/msixpo">.
+Perl will not recompile the pattern unless an interpolated
+variable that it contains changes. You can force Perl to skip the
+test and never recompile by adding a C</o> (which stands for "once")
+after the trailing delimiter.
+Once upon a time, Perl would recompile regular expressions
+unnecessarily, and this modifier was useful to tell it not to do so, in the
+interests of speed. But now, the only reasons to use C</o> are either:
+
+=over
+
+=item 1
+
+The variables are thousands of characters long and you know that they
+don't change, and you need to wring out the last little bit of speed by
+having Perl skip testing for that. (There is a maintenance penalty for
+doing this, as mentioning C</o> constitutes a promise that you won't
+change the variables in the pattern. If you change them, Perl won't
+even notice.)
+
+=item 2
+
+You want the pattern to use the initial values of the variables
+regardless of whether they change or not. (But there are saner ways
+of accomplishing this than using C</o>.)
+
+=back
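
The "Perl won't even notice" behaviour can be seen in a minimal sketch. The loop and word list are hypothetical.

```perl
use strict;
use warnings;

my (@normal, @once);
for my $word (qw(foo bar)) {
    # Without /o the pattern tracks the current value of $word.
    push @normal, ("bar" =~ /$word/  ? 1 : 0);

    # With /o the pattern is compiled on first use (with "foo") and the
    # later change of $word to "bar" goes unnoticed.
    push @once,   ("bar" =~ /$word/o ? 1 : 0);
}

print "@normal | @once\n";   # prints "0 1 | 0 0"
```

Precompiling with C<qr//> is one of the saner ways to pin a pattern to the initial value of a variable.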
=item The empty pattern //
the pattern matched.
The C</g> modifier specifies global pattern matching--that is,
-matching as many times as possible within the string. How it behaves
-depends on the context. In list context, it returns a list of the
+matching as many times as possible within the string. How it behaves
+depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
-expression. If there are no parentheses, it returns a list of all
+expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.
In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
-The position after the last match can be read or set using the pos()
-function; see L<perlfunc/pos>. A failed match normally resets the
+The position after the last match can be read or set using the C<pos()>
+function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
-by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
+by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
string also resets the search position.
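
The context-dependent behaviour of C</g> and the effect of C</c> on the search position can be sketched as follows. The sample string is hypothetical.

```perl
use strict;
use warnings;

my $text = "a=1, b=22, c=333";

# List context: every capture from every match, flattened.
my @pairs = $text =~ /(\w)=(\d+)/g;

# No parentheses: the matched strings themselves.
my @nums  = $text =~ /\d+/g;

# Scalar context: one match per evaluation; pos() reports where it ended.
my @ends;
while ($text =~ /\d+/g) {
    push @ends, pos($text);
}

# The final failed match reset the position, so pos() is now undef.
my $after_loop = defined pos($text) ? "set" : "reset";

# With /c a failed match leaves the position alone.
$text =~ /\d+/g;       # matches "1", position is now 3
$text =~ /zzz/gc;      # fails, position stays at 3
my $kept = pos($text);

print "@pairs | @nums | @ends | $after_loop | $kept\n";
# prints "a 1 b 22 c 333 | 1 22 333 | 3 9 16 | reset | 3"
```
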
=item \G assertion
You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
-zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
-still anchors at pos(), but the match is of course only attempted once.
-Using C<\G> without C</g> on a target string that has not previously had a
-C</g> match applied to it is the same as using the C<\A> assertion to match
-the beginning of the string. Note also that, currently, C<\G> is only
-properly supported when anchored at the very beginning of the pattern.
+zero-width assertion that matches the exact position where the
+previous C<m//g>, if any, left off. Without the C</g> modifier, the
+C<\G> assertion still anchors at C<pos()> as it was at the start of
+the operation (see L<perlfunc/pos>), but the match is of course only
+attempted once. Using C<\G> without C</g> on a target string that has
+not previously had a C</g> match applied to it is the same as using
+the C<\A> assertion to match the beginning of the string. Note also
+that, currently, C<\G> is only properly supported when anchored at the
+very beginning of the pattern.
Examples:
Here is the output (split into several lines):
- line-noise lowercase line-noise lowercase UPPERCASE line-noise
- UPPERCASE line-noise lowercase line-noise lowercase line-noise
- lowercase lowercase line-noise lowercase lowercase line-noise
- MiXeD line-noise. That's all!
+ line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
+ line-noise lowercase line-noise lowercase line-noise lowercase
+ lowercase line-noise lowercase lowercase line-noise lowercase
+ lowercase line-noise MiXeD line-noise. That's all!
+
+=item m?PATTERN?
+X<?> X<operator, match-once>
=item ?PATTERN?
-X<?>
-This is just like the C</pattern/> search, except that it matches only
-once between calls to the reset() operator. This is a useful
+This is just like the C<m/PATTERN/> search, except that it matches
+only once between calls to the reset() operator. This is a useful
optimization when you want to see only the first occurrence of
-something in each file of a set of files, for instance. Only C<??>
+something in each file of a set of files, for instance. Only C<m??>
patterns local to the current package are reset.
while (<>) {
- if (?^$?) {
+ if (m?^$?) {
# blank line between header and body
}
} continue {
- reset if eof; # clear ?? status for next file
+ reset if eof; # clear m?? status for next file
}
-This usage is vaguely deprecated, which means it just might possibly
-be removed in some distant future version of Perl, perhaps somewhere
-around the year 2168.
+The match-once behaviour is controlled by the match delimiter being
+C<?>; with any other delimiter this is the normal C<m//> operator.
-=item s/PATTERN/REPLACEMENT/msixpogcer
+For historical reasons, the leading C<m> in C<m?PATTERN?> is optional,
+but the resulting C<?PATTERN?> syntax is deprecated, will warn on
+usage and may be removed from a future stable release of Perl without
+further notice.
+
+=item s/PATTERN/REPLACEMENT/msixpodualgcer
X<substitute> X<substitution> X<replace> X<regexp, replace>
X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
made. Otherwise it returns false (specifically, the empty string).
If the C</r> (non-destructive) option is used then it will perform the
-substitution on a copy of the string and return the copy whether or not a
+substitution on a copy of the string and, instead of returning the
+number of substitutions, it returns the copy whether or not a
substitution occurred. The original string will always remain unchanged in
this case. The copy will always be a plain string, even if the input is an
object or a tied variable.
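
A minimal sketch contrasting the two return values, assuming Perl 5.14 or later for C</r>. The variable names are hypothetical.

```perl
use strict;
use warnings;
use 5.014;

my $original = "hello world";

# Plain s/// edits in place and returns the number of substitutions.
my $copy  = $original;
my $count = ($copy =~ s/o/0/g);

# s///r leaves the target alone and returns the edited copy.
my $edited = $original =~ s/o/0/gr;

# When nothing matches, /r still returns a copy (the unchanged string).
my $same   = $original =~ s/xyz/!/r;

print "$count | $copy | $edited | $same | $original\n";
# prints "2 | hell0 w0rld | hell0 w0rld | hello world | hello world"
```
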
the variable is interpolated, use the C</o> option. If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
-See L<perllocale> for discussion of additional considerations that apply
-when C<use locale> is in effect.
Options are as with m// with the addition of the following replacement
specific options:
# expand tabs to 8-column spacing
1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
+C<s///le> is treated as a substitution followed by the C<le> operator, not
+the C</le> flags. This may change in a future version of Perl. It
+produces a warning if warnings are enabled. To disambiguate, use a space
+or change the order of the flags:
+
+ s/foo/bar/ le 5; # "le" infix operator
+ s/foo/bar/el; # "e" and "l" flags
+
=back
=head2 Quote-Like Operators
The STDIN filehandle used by the command is inherited from Perl's STDIN.
For example:
- open BLAM, "blam" || die "Can't open: $!";
- open STDIN, "<&BLAM";
- print `sort`;
+ open SPLAT, "stuff" or die "can't open stuff: $!";
+ open STDIN, "<&SPLAT" or die "can't dupe SPLAT: $!";
+ print STDOUT `sort`;
-will print the sorted contents of the file "blam".
+will print the sorted contents of the file named F<"stuff">.
Using single-quote as a delimiter protects the command from Perl's
double-quote interpolation, passing it on to the shell instead:
produces warnings if the STRING contains the "," or the "#" character.
-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsr
X<tr> X<y> X<transliterate> X</c> X</d> X</s>
-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)
+If the C</r> (non-destructive) option is used then it will perform the
+replacement on a copy of the string and return the copy whether or not it
+was modified. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
+
A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>. If the
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
+ r Return the modified string and leave the original string
+ untouched.
If the C</c> modifier is specified, the SEARCHLIST character set
is complemented. If the C</d> modifier is specified, any characters
tr/a-zA-Z//s; # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
+ $HOST = $host =~ tr/a-z/A-Z/r; # same thing
+
+ $HOST = $host =~ tr/a-z/A-Z/r # chained with s///
+ =~ s/:/ -p/r;
tr/a-zA-Z/ /cs; # change non-alphas to single space
+ @stripped = map tr/a-zA-Z/ /csr, @original;
+ # /r with map
+
tr [\200-\377]
[\000-\177]; # delete 8th bit
If the first delimiter is not an opening punctuation, three delimiters must
be same such as C<s!!!> and C<tr)))>, in which case the second delimiter
terminates the left part and starts the right part at once.
-If the left part is delimited by bracketing punctuations (that is C<()>,
+If the left part is delimited by bracketing punctuation (that is C<()>,
C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
-delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespaces
+delimiters such as C<s(){}> and C<tr[]//>. In these cases, whitespace
and comments are allowed between both parts, though the comment must follow
-at least one whitespace; otherwise a character expected as the start of
-the comment may be regarded as the starting delimiter of the right part.
+at least one whitespace character; otherwise a character expected as the
+start of the comment may be regarded as the starting delimiter of the right part.
During this search no attention is paid to the semantics of the construct.
Thus:
use integer;
-you may tell the compiler that it's okay to use integer operations
-(if it feels like it) from here to the end of the enclosing BLOCK.
-An inner BLOCK may countermand this by saying
+you may tell the compiler to use integer operations
+(see L<integer> for a detailed explanation) from here to the end of
+the enclosing BLOCK. An inner BLOCK may countermand this by saying
no integer;
which lasts until the end of that BLOCK. Note that this doesn't
-mean everything is only an integer, merely that Perl may use integer
-operations if it is so inclined. For example, even under C<use
-integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
-or so.
+mean everything is an integer, merely that Perl will use integer
+operations for arithmetic, comparison, and bitwise operators. For
+example, even under C<use integer>, if you take the C<sqrt(2)>, you'll
+still get C<1.4142135623731> or so.
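
The scoping of the pragma and the C<sqrt> remark can be illustrated with a short sketch. The variable names are hypothetical.

```perl
use strict;
use warnings;

my ($int_div, $float_div, $root);
{
    use integer;
    $int_div = 10 / 3;          # integer division: 3
    $root    = sqrt(2);         # function results are unaffected
    {
        no integer;
        $float_div = 10 / 3;    # floating point again in the inner block
    }
}

printf "%d %.3f %.3f\n", $int_div, $float_div, $root;   # prints "3 3.333 1.414"
```
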
Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results. (But see also