pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
-success of the operation. Behavior in list context depends on the particular
-operator. See L</"Regexp Quote-Like Operators"> for details and
-L<perlretut> for examples using these operators.
+success of the operation. The exception is substitution with the C</r>
+(non-destructive) option, which causes the return value to be the result of
+the substitution. Behavior in list context depends on the particular operator.
+See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
+examples using these operators.
If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
Binary "!~" is just like "=~" except the return value is negated in
the logical sense.
+Binary "!~" with a non-destructive substitution (s///r) is a syntax error.
+
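A minimal sketch of the scalar-context return values described above, assuming Perl 5.14 or later so that C</r> is available. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = 'abc123';

# A match in scalar context is true on success, false on failure.
my $matched = ($text =~ /\d+/) ? 1 : 0;

# !~ negates that result in the logical sense.
my $no_digits = ($text !~ /\d+/) ? 1 : 0;

# An ordinary substitution modifies its target and returns the
# number of substitutions made.
my $scratch = $text;
my $count   = ($scratch =~ s/\d/#/g);

# With /r the modified copy is returned and $text is left alone.
my $result = $text =~ s/abc/def/r;

print "$matched $no_digits $count $scratch $result $text\n";
```

Only the last statement returns a string rather than a success indicator, which is why combining C<!~> with C<s///r> is rejected.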
=head2 Multiplicative Operators
X<operator, multiplicative>
As a scalar operator:
if (101 .. 200) { print; } # print 2nd hundred lines, short for
- # if ($. == 101 .. $. == 200) { print; }
+ # if ($. == 101 .. $. == 200) { print; }
next LINE if (1 .. /^$/); # skip header lines, short for
# next LINE if ($. == 1 .. /^$/);
To get lower-case greek letters, use this instead:
- my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
+ my @greek_small = map { chr } ( ord("\N{alpha}") ..
+ ord("\N{omega}") );
Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.
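As a small illustration of the integer evaluation just described (the array name is arbitrary):

```perl
use strict;
use warnings;

# Both operands are truncated to integers, so this behaves like 2 .. 3.
my @range = (2.18 .. 3.14);

print scalar(@range), " elements: @range\n";
```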
=head2 Yada Yada Operator
X<...> X<... operator> X<yada yada operator>
-The yada yada operator (noted C<...>) is a placeholder for code.
-It parses without error, but when executed it throws an exception
-with the text C<Unimplemented>:
-
- sub foo { ... }
- foo();
-
- Unimplemented at <file> line <line number>.
-
-It takes no argument.
+The yada yada operator (noted C<...>) is a placeholder for code. Perl
+parses it without error, but when you try to execute a yada yada, it
+throws an exception with the text C<Unimplemented>:
+
+ sub unimplemented { ... }
+
+ eval { unimplemented() };
+ if( $@ =~ /^Unimplemented at / ) {
+ print "I found the yada yada!\n";
+ }
+
+You can only use the yada yada to stand in for a complete statement.
+These examples of the yada yada work:
+
+ { ... }
+
+ sub foo { ... }
+
+ ...;
+
+ eval { ... };
+
+ sub foo {
+ my( $self ) = shift;
+
+ ...;
+ }
+
+ do { my $n; ...; print 'Hurrah!' };
+
+The yada yada cannot stand in for an expression that is part of a
+larger statement since the C<...> is also the three-dot version of the
+range operator (see L<Range Operators>). These examples of the yada
+yada are still syntax errors:
+
+ print ...;
+
+ open my($fh), '>', '/dev/passwd' or ...;
+
+ if( $condition && ... ) { print "Hello\n" };
+
+There are some cases where Perl can't immediately tell the difference
+between an expression and a statement. For instance, a block and an
+anonymous hash reference constructor look the same unless there's
+something in the braces that gives Perl a hint. The yada yada
+is a syntax error if Perl doesn't guess that the C<{ ... }> is a
+block. In that case, it doesn't think the C<...> is the yada yada
+because it's expecting an expression instead of a statement:
+
+ my @transformed = map { ... } @input; # syntax error
+
+You can use a C<;> inside your block to denote that the C<{ ... }> is
+a block and not a hash reference constructor. Now the yada yada works:
+
+ my @transformed = map {; ... } @input; # ; disambiguates
+
+ my @transformed = map { ...; } @input; # ; disambiguates
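A minimal sketch of trapping the exception from a stub, assuming Perl 5.12 or later. The sub name C<not_written_yet> is hypothetical. Because the exception text is followed by the usual file and line information, the check uses a pattern match rather than string equality.

```perl
use strict;
use warnings;

sub not_written_yet { ... }    # parses fine, dies only when called

my $error = '';
eval { not_written_yet(); 1 } or $error = $@;

if ( $error =~ /^Unimplemented at / ) {
    print "Stub reached: $error";
}
```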
=head2 List Operators (Rightward)
X<operator, list, rightward> X<list operator>
The following escape sequences are available in constructs that interpolate
and in transliterations.
-X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>
-
- \t tab (HT, TAB)
- \n newline (NL)
- \r return (CR)
- \f form feed (FF)
- \b backspace (BS)
- \a alarm (bell) (BEL)
- \e escape (ESC)
- \033 octal char (example: ESC)
- \x1b hex char (example: ESC)
- \x{263a} wide hex char (example: SMILEY)
- \c[ control char (example: ESC)
- \N{name} named Unicode character
-
-The character following C<\c> is mapped to some other character by
-converting letters to upper case and then (on ASCII systems) by inverting
-the 7th bit (0x40). The most interesting range is from '@' to '_'
-(0x40 through 0x5F), resulting in a control character from 0x00
-through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only
-'@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting
-in 0x00 through 0x1F and 0x7F.
-
-B<NOTE>: Unlike C and other languages, Perl has no \v escape sequence for
-the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.
+X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
+X<\o{}>
+
+ Sequence Note Description
+ \t tab (HT, TAB)
+ \n newline (NL)
+ \r return (CR)
+ \f form feed (FF)
+ \b backspace (BS)
+ \a alarm (bell) (BEL)
+ \e escape (ESC)
+ \x{263a} [1] hex char (example: SMILEY)
+ \x1b [2] restricted range hex char (example: ESC)
+ \N{name} [3] named Unicode character
+ \N{U+263D} [4] Unicode character (example: FIRST QUARTER MOON)
+ \c[ [5] control char (example: chr(27))
+ \o{23072} [6] octal char (example: SMILEY)
+ \033 [7] restricted range octal char (example: ESC)
-The following escape sequences are available in constructs that interpolate
+=over 4
+
+=item [1]
+
+The result is the character whose ordinal is the hexadecimal number between
+the braces. If the ordinal is 0x100 or above, the character will be the
+Unicode character corresponding to the ordinal. If the ordinal is between
+0 and 0xFF, the rules for which character it represents are the same as for
+L<restricted hex chars|/[2]>.
+
+Only hexadecimal digits are valid between the braces. If an invalid
+character is encountered, a warning will be issued and the invalid
+character and all subsequent characters (valid or invalid) within the
+braces will be discarded.
+
+If there are no valid digits between the braces, the generated character is
+the NULL character (C<\x{00}>). However, an explicit empty brace (C<\x{}>)
+will not cause a warning.
+
+=item [2]
+
+The result is a single-byte character whose ordinal is in the range 0x00 to
+0xFF.
+
+Only hexadecimal digits are valid following C<\x>. When C<\x> is followed
+by fewer than two valid digits, any valid digits will be zero-padded. This
+means that C<\x7> will be interpreted as C<\x07> and C<\x> alone will be
+interpreted as C<\x00>. Except at the end of a string, having fewer than
+two valid digits will result in a warning. Note that while the warning
+says the illegal character is ignored, it is only ignored as part of the
+escape and will still be used as the subsequent character in the string.
+For example:
+
+ Original Result Warns?
+ "\x7" "\x07" no
+ "\x" "\x00" no
+ "\x7q" "\x07q" yes
+ "\xq" "\x00q" yes
+
+The B<run-time> interpretation of single-byte characters depends on the
+platform and on pragmata in effect. On EBCDIC platforms the character is
+treated as native to the platform's code page. On other platforms, the
+representation and semantics (sort order and which characters are upper
+case, lower case, digit, non-digit, etc.) depend on the current
+L<S<C<locale>>|perllocale> settings at run-time.
+
+However, when L<C<S<use feature 'unicode_strings'>>|feature> is in effect
+and neither L<C<S<use bytes>>|bytes> nor L<C<S<use locale>>|locale> is,
+characters from 0x80 to 0xff are treated as Unicode code points from
+the Latin-1 Supplement block.
+
+Note that the locale semantics of single-byte characters in a regular
+expression are determined when the regular expression is compiled, not when
+the regular expression is used. When a regular expression is interpolated
+into another regular expression, any prior semantics are ignored and only
+the current locale matters for the resulting regular expression.
+
+=item [3]
+
+For documentation of C<\N{name}>, see L<charnames>.
+
+=item [4]
+
+C<\N{U+I<wide hex char>}> means the Unicode character whose Unicode ordinal
+number is I<wide hex char>.
+
+=item [5]
+
+The character following C<\c> is mapped to some other character as shown in the
+table:
+
+ Sequence Value
+ \c@ chr(0)
+ \cA chr(1)
+ \ca chr(1)
+ \cB chr(2)
+ \cb chr(2)
+ ...
+ \cZ chr(26)
+ \cz chr(26)
+ \c[ chr(27)
+ \c] chr(29)
+ \c^ chr(30)
+ \c? chr(127)
+
+Also, C<\c\I<X>> yields C< chr(28) . "I<X>"> for any I<X>, but cannot come at the
+end of a string, because the backslash would be parsed as escaping the end
+quote.
+
+On ASCII platforms, the resulting characters from the list above are the
+complete set of ASCII controls. This isn't the case on EBCDIC platforms; see
+L<perlebcdic/OPERATOR DIFFERENCES> for the complete list of what these
+sequences mean on both ASCII and EBCDIC platforms.
+
+Use of any other character following the "c" besides those listed above is
+discouraged, and may become deprecated or forbidden. Currently, though, the
+value for those other characters is derived by inverting the 7th bit (0x40).
+
+To get platform independent controls, you can use C<\N{...}>.
+
+=item [6]
+
+The result is the character whose ordinal is the octal number between the
+braces.
+
+If a character that isn't an octal digit is encountered, a warning is raised,
+and the value is based on the octal digits before it, discarding it and all
+following characters up to the closing brace. It is a fatal error if there are
+no octal digits at all.
+
+=item [7]
+
+The result is the character whose ordinal is the given three digit octal
+number. Some contexts allow 2 or even 1 digit, but any usage without exactly
+three digits, the first being a zero, may give unintended results. (For
+example, see L<perlrebackslash/Octal escapes>.) Starting in Perl 5.14, you may
+use C<\o{}> instead which avoids all these problems. Otherwise, it is best to
+use this construct only for ordinals C<\077> and below, remembering to pad to
+the left with zeros to make three digits. For larger ordinals, either use
+C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
+instead.
+
+A backslash followed by a non-octal digit in a bracketed character class
+(C<[\8]> or C<[\9]>) will be interpreted as a NULL character and the digit.
+Having fewer than 3 digits may lead to a misleading warning message that says
+that what follows is ignored. For example, C<"\128"> in the ASCII character set
+is equivalent to the two characters C<"\n8">, but the warning C<Illegal octal
+digit '8' ignored> will be thrown. To avoid this warning, make sure to pad
+your octal number with C<0>s: C<"\0128">.
+
+=back
+
+B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
+the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>. (C<\v>
+does have meaning in regular expression patterns in Perl, see L<perlre>.)
+
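The equivalences in the table and notes above can be checked with a short script. This is a sketch that assumes an ASCII platform and Perl 5.14 or later (for C<\o{}>); the variable names are illustrative.

```perl
use strict;
use warnings;

# Four spellings of ESC; on ASCII platforms each is chr(27).
my @esc     = ("\e", "\033", "\x1b", "\c[");
my $all_esc = !grep { $_ ne chr(27) } @esc;

# Braced hex and braced octal name the same code point (SMILEY).
my $same_smiley = "\x{263a}" eq "\o{23072}";

# There is no \v in strings, so vertical tab is written \ck or \x0b.
my $same_vt = "\ck" eq "\x0b";

printf "%d %d %d\n",
    $all_esc ? 1 : 0, $same_smiley ? 1 : 0, $same_vt ? 1 : 0;
```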
+The following escape sequences are available in constructs that interpolate,
but not in transliterations.
X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
-C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
-see L<charnames>.
+C<\U> is as defined by Unicode.
All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
Options are as described in C<qr//>; in addition, the following match
process modifiers are available:
- g Match globally, i.e., find all occurrences.
- c Do not reset search position on a failed match when /g is in effect.
+ g Match globally, i.e., find all occurrences.
+ c Do not reset search position on a failed match when /g is in effect.
If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-whitespace characters
regexp tries to match where the previous one leaves off.
$_ = <<'EOL';
- $url = URI::URL->new( "http://example.com/" ); die if $url eq "xXx";
+ $url = URI::URL->new( "http://example.com/" ); die if $url eq "xXx";
EOL
LOOP:
{
- print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
- print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
- print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
- print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
- print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
- print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
- print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
- print ". That's all!\n";
+ print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
+ print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
+ print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
+ print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
+ print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
+ print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
+ print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
+ print ". That's all!\n";
}
Here is the output (split into several lines):
be removed in some distant future version of Perl, perhaps somewhere
around the year 2168.
-=item s/PATTERN/REPLACEMENT/msixpogce
+=item s/PATTERN/REPLACEMENT/msixpogcer
X<substitute> X<substitution> X<replace> X<regexp, replace>
-X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e>
+X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made. Otherwise it returns false (specifically, the empty string).
+If the C</r> (non-destructive) option is used then it will perform the
+substitution on a copy of the string and return the copy whether or not a
+substitution occurred. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
+
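A brief sketch of the difference when nothing matches, assuming Perl 5.14 or later. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $original = 'hello world';

# No match: /r still returns a copy of the string, unchanged.
my $no_match = $original =~ s/xyz/abc/r;

# No match without /r: the return value is the empty string.
my $scratch = $original;
my $plain   = ($scratch =~ s/xyz/abc/);

# Because /r returns a string, substitutions can be chained.
my $chained = $original =~ s/hello/goodbye/r =~ s/world/moon/r;

print "no_match=[$no_match] plain=[$plain] chained=[$chained]\n";
```

Since the C</r> form returns a string even on failure, its return value should not be used as a success test.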
If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. (The string specified with C<=~> must
be scalar variable, an array element, a hash element, or an assignment
specific options:
e Evaluate the right side as an expression.
- ee Evaluate the right side as a string then eval the result
+ ee Evaluate the right side as a string then eval the result.
+ r Return the substituted copy and leave the original string untouched.
Any non-whitespace delimiter may replace the slashes. Add space after
the C<s> when using a character allowed in identifiers. If single quotes
s/Login: $foo/Login: $bar/; # run-time pattern
($foo = $bar) =~ s/this/that/; # copy first, then change
+ ($foo = "$bar") =~ s/this/that/; # convert to string, copy, then change
+ $foo = $bar =~ s/this/that/r; # Same as above using /r
+ $foo = $bar =~ s/this/that/r
+ =~ s/that/the other/r; # Chained substitutes using /r
+ @foo = map { s/this/that/r } @bar; # /r is very useful in maps
$count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
s/%(.)/$percent{$1} || $&/ge; # expr now, so /e
s/^=(\w+)/pod($1)/ge; # use function call
+ $_ = 'abc123xyz';
+ $a = s/abc/def/r; # $a is 'def123xyz' and
+ # $_ remains 'abc123xyz'.
+
# expand variables in $_, but dynamics only, using
# symbolic dereferencing
s/\$(\w+)/${$1}/g;
Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>,
and interpolation happens (almost) as with C<qq//> constructs.
+Processing of C<\N{...}> is also done here, and compiled into an intermediate
+form for the regex compiler. (This is because, as mentioned below, the regex
+compilation may be done at execution time, and C<\N{...}> is a compile-time
+construct.)
+
However any other combinations of C<\> followed by a character
are not substituted but only skipped, in order to parse them
as regular expressions at the following step.
printf "%.20g\n", 123456789123456789;
# produces 123456789123456784
-Testing for exact equality of floating-point equality or inequality is
-not a good idea. Here's a (relatively expensive) work-around to compare
+Testing for exact floating-point equality or inequality is not a
+good idea. Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
decimal places. See Knuth, volume II, for a more robust treatment of
this topic.
Here is a short, but incomplete summary:
- Math::Fraction big, unlimited fractions like 9973 / 12967
- Math::String treat string sequences like numbers
- Math::FixedPrecision calculate with a fixed precision
- Math::Currency for currency calculations
- Bit::Vector manipulate bit vectors fast (uses C)
- Math::BigIntFast Bit::Vector wrapper for big numbers
- Math::Pari provides access to the Pari C library
- Math::BigInteger uses an external C library
- Math::Cephes uses external Cephes C library (no big numbers)
- Math::Cephes::Fraction fractions via the Cephes library
- Math::GMP another one using an external C library
+ Math::Fraction big, unlimited fractions like 9973 / 12967
+ Math::String treat string sequences like numbers
+ Math::FixedPrecision calculate with a fixed precision
+ Math::Currency for currency calculations
+ Bit::Vector manipulate bit vectors fast (uses C)
+ Math::BigIntFast Bit::Vector wrapper for big numbers
+ Math::Pari provides access to the Pari C library
+ Math::BigInteger uses an external C library
+ Math::Cephes uses external Cephes C library (no big numbers)
+ Math::Cephes::Fraction fractions via the Cephes library
+ Math::GMP another one using an external C library
Choose wisely.