Clarify m?PATTERN? is ok and only ?PATTERN? is not

diff --git a/pod/perlop.pod b/pod/perlop.pod

index 58c0660..1b258c0 100644
--- a/pod/perlop.pod
+++ b/pod/perlop.pod
@@ -194,11 +194,12 @@ Unary "!" performs logical negation, i.e., "not".  See also C<not> for a lower
  precedence version of this.
  X<!>
  
-Unary "-" performs arithmetic negation if the operand is numeric.  If
-the operand is an identifier, a string consisting of a minus sign
-concatenated with the identifier is returned.  Otherwise, if the string
-starts with a plus or minus, a string starting with the opposite sign
-is returned.  One effect of these rules is that -bareword is equivalent
+Unary "-" performs arithmetic negation if the operand is numeric,
+including any string that looks like a number.  If the operand is
+an identifier, a string consisting of a minus sign concatenated
+with the identifier is returned.  Otherwise, if the string starts
+with a plus or minus, a string starting with the opposite sign is
+returned.  One effect of these rules is that -bareword is equivalent
  to the string "-bareword".  If, however, the string begins with a
  non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
  the string to a numeric and the arithmetic negation is performed. If the
@@ -235,9 +236,12 @@ of operation work on some other string.  The right argument is a search
  pattern, substitution, or transliteration.  The left argument is what is
  supposed to be searched, substituted, or transliterated instead of the default
  $_.  When used in scalar context, the return value generally indicates the
-success of the operation.  Behavior in list context depends on the particular
-operator.  See L</"Regexp Quote-Like Operators"> for details and
-L<perlretut> for examples using these operators.
+success of the operation.  The exceptions are substitution (s///)
+and transliteration (y///) with the C</r> (non-destructive) option,
+which cause the B<r>eturn value to be the result of the substitution.
+Behavior in list context depends on the particular operator.
+See L</"Regexp Quote-Like Operators"> for details and L<perlretut> for
+examples using these operators.
  
  If the right argument is an expression rather than a search pattern,
  substitution, or transliteration, it is interpreted as a search pattern at run
@@ -251,6 +255,9 @@ pattern C<\>, which it will consider a syntax error.
  Binary "!~" is just like "=~" except the return value is negated in
  the logical sense.
  
+Binary "!~" with a non-destructive substitution (s///r) or transliteration
+(y///r) is a syntax error.
+
  =head2 Multiplicative Operators
  X<operator, multiplicative>
  
@@ -503,10 +510,11 @@ Although it has no direct equivalent in C, Perl's C<//> operator is related
  to its C-style or.  In fact, it's exactly the same as C<||>, except that it
  tests the left hand side's definedness instead of its truth.  Thus, C<$a // $b>
  is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
-rather than the value of C<defined($a)>) and is exactly equivalent to
-C<defined($a) ? $a : $b>.  This is very useful for providing default values
-for variables.  If you actually want to test if at least one of C<$a> and
-C<$b> is defined, use C<defined($a // $b)>.
+rather than the value of C<defined($a)>) and yields the same result as
+C<defined($a) ? $a : $b> (except that the ternary-operator form can be
+used as an lvalue, while C<$a // $b> cannot).  This is very useful for
+providing default values for variables.  If you actually want to test if
+at least one of C<$a> and C<$b> is defined, use C<defined($a // $b)>.
  
  The C<||>, C<//> and C<&&> operators return the last value evaluated
  (unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
@@ -1013,26 +1021,72 @@ from the next line.  This allows you to write:
  The following escape sequences are available in constructs that interpolate
  and in transliterations.
  X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N> X<\N{}>
-
-    Sequence    Note  Description
-    \t                tab             (HT, TAB)
-    \n                newline         (NL)
-    \r                return          (CR)
-    \f                form feed       (FF)
-    \b                backspace       (BS)
-    \a                alarm (bell)    (BEL)
-    \e                escape          (ESC)
-    \033              octal char      (example: ESC)
-    \x1b              hex char        (example: ESC)
-    \x{263a}          wide hex char   (example: SMILEY)
-    \c[          [1]  control char    (example: chr(27))
-    \N{name}     [2]  named Unicode character
-    \N{U+263D}   [3]  Unicode character (example: FIRST QUARTER MOON)
+X<\o{}>
+
+    Sequence     Note  Description
+    \t                  tab               (HT, TAB)
+    \n                  newline           (NL)
+    \r                  return            (CR)
+    \f                  form feed         (FF)
+    \b                  backspace         (BS)
+    \a                  alarm (bell)      (BEL)
+    \e                  escape            (ESC)
+    \x{263a}     [1,8]  hex char          (example: SMILEY)
+    \x1b         [2,8]  restricted range hex char (example: ESC)
+    \N{name}     [3]    named Unicode character or character sequence
+    \N{U+263D}   [4,8]  Unicode character (example: FIRST QUARTER MOON)
+    \c[          [5]    control char      (example: chr(27))
+    \o{23072}    [6,8]  octal char        (example: SMILEY)
+    \033         [7,8]  restricted range octal char  (example: ESC)
  
  =over 4
  
  =item [1]
  
+The result is the character specified by the hexadecimal number between
+the braces.  See L</[8]> below for details on which character.
+
+Only hexadecimal digits are valid between the braces. If an invalid
+character is encountered, a warning will be issued and the invalid
+character and all subsequent characters (valid or invalid) within the
+braces will be discarded.
+
+If there are no valid digits between the braces, the generated character is
+the NULL character (C<\x{00}>).  However, an explicit empty brace (C<\x{}>)
+will not cause a warning.
+
+=item [2]
+
+The result is the character specified by the hexadecimal number in the range
+0x00 to 0xFF.  See L</[8]> below for details on which character.
+
+Only hexadecimal digits are valid following C<\x>.  When C<\x> is followed
+by fewer than two valid digits, any valid digits will be zero-padded.  This
+means that C<\x7> will be interpreted as C<\x07> and C<\x> alone will be
+interpreted as C<\x00>.  Except at the end of a string, having fewer than
+two valid digits will result in a warning.  Note that while the warning
+says the illegal character is ignored, it is only ignored as part of the
+escape and will still be used as the subsequent character in the string.
+For example:
+
+  Original    Result    Warns?
+  "\x7"       "\x07"    no
+  "\x"        "\x00"    no
+  "\x7q"      "\x07q"   yes
+  "\xq"       "\x00q"   yes
+
+=item [3]
+
+The result is the Unicode character or character sequence given by I<name>.
+See L<charnames>.
+
+=item [4]
+
+C<\N{U+I<hexadecimal number>}> means the Unicode character whose Unicode code
+point is I<hexadecimal number>.
+
+=item [5]
+
  The character following C<\c> is mapped to some other character as shown in the
  table:
  
@@ -1066,14 +1120,62 @@ the 7th bit (0x40).
  
  To get platform independent controls, you can use C<\N{...}>.
  
-=item [2]
-
-For documentation of C<\N{name}>, see L<charnames>.
-
-=item [3]
-
-C<\N{U+I<wide hex char>}> means the Unicode character whose Unicode ordinal
-number is I<wide hex char>.
+=item [6]
+
+The result is the character specified by the octal number between the braces.
+See L</[8]> below for details on which character.
+
+If a character that isn't an octal digit is encountered, a warning is raised,
+and the value is based on the octal digits before it, discarding it and all
+following characters up to the closing brace.  It is a fatal error if there are
+no octal digits at all.
+
+=item [7]
+
+The result is the character specified by the three digit octal number in the
+range 000 to 777 (but best to not use above 077, see next paragraph).  See
+L</[8]> below for details on which character.
+
+Some contexts allow 2 or even 1 digit, but any usage without exactly
+three digits, the first being a zero, may give unintended results.  (For
+example, see L<perlrebackslash/Octal escapes>.)  Starting in Perl 5.14, you may
+use C<\o{}> instead which avoids all these problems.  Otherwise, it is best to
+use this construct only for ordinals C<\077> and below, remembering to pad to
+the left with zeros to make three digits.  For larger ordinals, either use
+C<\o{}>, or convert to something else, such as to hex and use C<\x{}>
+instead.
+
+Having fewer than 3 digits may lead to a misleading warning message that says
+that what follows is ignored.  For example, C<"\128"> in the ASCII character set
+is equivalent to the two characters C<"\n8">, but the warning C<Illegal octal
+digit '8' ignored> will be thrown.  To avoid this warning, make sure to pad
+your octal number with C<0>s: C<"\0128">.
+
+=item [8]
+
+Several of the constructs above specify a character by a number.  That number
+gives the character's position in the character set encoding (indexed from 0).
+This is called, synonymously, its ordinal, code position, or code point.  Perl
+works on platforms that have a native encoding currently of either ASCII/Latin1
+or EBCDIC, each of which allows specification of 256 characters.  In general, if
+the number is 255 (0xFF, 0377) or below, Perl interprets this in the platform's
+native encoding.  If the number is 256 (0x100, 0400) or above, Perl interprets
+it as a Unicode code point and the result is the corresponding Unicode
+character.  For example C<\x{50}> and C<\o{120}> both are the number 80 in
+decimal, which is less than 256, so the number is interpreted in the native
+character set encoding.  In ASCII the character in the 80th position (indexed
+from 0) is the letter "P", and in EBCDIC it is the ampersand symbol "&".
+C<\x{100}> and C<\o{400}> are both 256 in decimal, so the number is interpreted
+as a Unicode code point no matter what the native encoding is.  The name of the
+character in the 256th position (indexed from 0) in Unicode is
+C<LATIN CAPITAL LETTER A WITH MACRON>.
+
+There are a couple of exceptions to the above rule.  C<\N{U+I<hex number>}> is
+always interpreted as a Unicode code point, so that C<\N{U+0050}> is "P" even
+on EBCDIC platforms.  And if L<C<S<use encoding>>|encoding> is in effect, the
+number is considered to be in that encoding, and is translated from that into
+the platform's native encoding if there is a corresponding native character;
+otherwise to Unicode.
  
  =back
  
@@ -1089,8 +1191,8 @@ X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>
      \u         uppercase next char
      \L         lowercase till \E
      \U         uppercase till \E
-    \E         end case modification
      \Q         quote non-word characters till \E
+    \E         end either case modification or quoted section
  
  If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
  C<\u> and C<\U> is taken from the current locale.  See L<perllocale>.
@@ -1125,10 +1227,24 @@ C<join $", @array>.    "Punctuation" arrays such as C<@*> are only
  interpolated if the name is enclosed in braces C<@{*}>, but special
  arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces.
  
-You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
-An unescaped C<$> or C<@> interpolates the corresponding variable,
-while escaping will cause the literal string C<\$> to be inserted.
-You'll need to write something like C<m/\Quser\E\@\Qhost/>.
+For double-quoted strings, the quoting from C<\Q> is applied after
+interpolation and escapes are processed.
+
+    "abc\Qfoo\tbar$s\Exyz"
+
+is equivalent to
+
+    "abc" . quotemeta("foo\tbar$s") . "xyz"
+
+For the pattern of regex operators (C<qr//>, C<m//> and C<s///>),
+the quoting from C<\Q> is applied after interpolation is processed,
+but before escapes are processed. This allows the pattern to match
+literally (except for C<$> and C<@>). For example, the following matches:
+
+    '\s\t' =~ /\Q\s\t/
+
+Because C<$> or C<@> trigger interpolation, you'll need to use something
+like C</\Quser\E\@\Qhost/> to match them literally.
  
  Patterns are subject to an additional level of interpretation as a
  regular expression.  This is done as a second pass, after variables are
@@ -1241,11 +1357,11 @@ process modifiers are available:
   g  Match globally, i.e., find all occurrences.
   c  Do not reset search position on a failed match when /g is in effect.
  
-If "/" is the delimiter then the initial C<m> is optional.  With the C<m>
-you can use any pair of non-whitespace characters
-as delimiters.  This is particularly useful for matching path names
-that contain "/", to avoid LTS (leaning toothpick syndrome).  If "?" is
-the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
+If "/" is the delimiter then the initial C<m> is optional.  With the
+C<m> you can use any pair of non-whitespace characters as delimiters.
+This is particularly useful for matching path names that contain "/",
+to avoid LTS (leaning toothpick syndrome).  If "?" is the delimiter,
+then the match-only-once rule of C<m?PATTERN?> applies (see below).
  If "'" is the delimiter, no interpolation is performed on the PATTERN.
  When using a character valid in an identifier, whitespace is required
  after the C<m>.
@@ -1312,31 +1428,33 @@ $Etc.  The conditional is true if any variables were assigned, i.e., if
  the pattern matched.
  
  The C</g> modifier specifies global pattern matching--that is,
-matching as many times as possible within the string.  How it behaves
-depends on the context.  In list context, it returns a list of the
+matching as many times as possible within the string. How it behaves
+depends on the context. In list context, it returns a list of the
  substrings matched by any capturing parentheses in the regular
-expression.  If there are no parentheses, it returns a list of all
+expression. If there are no parentheses, it returns a list of all
  the matched strings, as if there were parentheses around the whole
  pattern.
  
  In scalar context, each execution of C<m//g> finds the next match,
  returning true if it matches, and false if there is no further match.
-The position after the last match can be read or set using the pos()
-function; see L<perlfunc/pos>.   A failed match normally resets the
+The position after the last match can be read or set using the C<pos()>
+function; see L<perlfunc/pos>. A failed match normally resets the
  search position to the beginning of the string, but you can avoid that
-by adding the C</c> modifier (e.g. C<m//gc>).  Modifying the target
+by adding the C</c> modifier (e.g. C<m//gc>). Modifying the target
  string also resets the search position.
  
  =item \G assertion
  
  You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
-zero-width assertion that matches the exact position where the previous
-C<m//g>, if any, left off.  Without the C</g> modifier, the C<\G> assertion
-still anchors at pos(), but the match is of course only attempted once.
-Using C<\G> without C</g> on a target string that has not previously had a
-C</g> match applied to it is the same as using the C<\A> assertion to match
-the beginning of the string.  Note also that, currently, C<\G> is only
-properly supported when anchored at the very beginning of the pattern.
+zero-width assertion that matches the exact position where the
+previous C<m//g>, if any, left off. Without the C</g> modifier, the
+C<\G> assertion still anchors at C<pos()> as it was at the start of
+the operation (see L<perlfunc/pos>), but the match is of course only
+attempted once. Using C<\G> without C</g> on a target string that has
+not previously had a C</g> match applied to it is the same as using
+the C<\A> assertion to match the beginning of the string.  Note also
+that, currently, C<\G> is only properly supported when anchored at the
+very beginning of the pattern.
  
  Examples:
  
@@ -1402,40 +1520,46 @@ regexp tries to match where the previous one leaves off.
  
  Here is the output (split into several lines):
  
- line-noise lowercase line-noise lowercase UPPERCASE line-noise
- UPPERCASE line-noise lowercase line-noise lowercase line-noise
- lowercase lowercase line-noise lowercase lowercase line-noise
- MiXeD line-noise. That's all!
+       line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
+       line-noise lowercase line-noise lowercase line-noise lowercase
+       lowercase line-noise lowercase lowercase line-noise lowercase
+       lowercase line-noise MiXeD line-noise. That's all!
  
-=item ?PATTERN?
+=item m?PATTERN?
  X<?>
  
-This is just like the C</pattern/> search, except that it matches only
+This is just like the C<m/pattern/> search, except that it matches only
  once between calls to the reset() operator.  This is a useful
  optimization when you want to see only the first occurrence of
-something in each file of a set of files, for instance.  Only C<??>
+something in each file of a set of files, for instance.  Only C<m??>
  patterns local to the current package are reset.
  
      while (<>) {
-       if (?^$?) {
+       if (m?^$?) {
                             # blank line between header and body
         }
      } continue {
         reset if eof;       # clear ?? status for next file
      }
  
-This usage is vaguely deprecated, which means it just might possibly
-be removed in some distant future version of Perl, perhaps somewhere
-around the year 2168.
+The use of C<?PATTERN?> without a leading "m" is vaguely deprecated,
+which means it just might possibly be removed in some distant future
+version of Perl, perhaps somewhere around the year 2168.
  
-=item s/PATTERN/REPLACEMENT/msixpogce
+=item s/PATTERN/REPLACEMENT/msixpogcer
  X<substitute> X<substitution> X<replace> X<regexp, replace>
-X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e>
+X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e> X</r>
  
  Searches a string for a pattern, and if found, replaces that pattern
  with the replacement text and returns the number of substitutions
  made.  Otherwise it returns false (specifically, the empty string).
  
+If the C</r> (non-destructive) option is used then it will perform the
+substitution on a copy of the string and return the copy whether or not a
+substitution occurred. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
+
  If no string is specified via the C<=~> or C<!~> operator, the C<$_>
  variable is searched and modified.  (The string specified with C<=~> must
  be scalar variable, an array element, a hash element, or an assignment
@@ -1456,7 +1580,8 @@ Options are as with m// with the addition of the following replacement
  specific options:
  
      e  Evaluate the right side as an expression.
-    ee  Evaluate the right side as a string then eval the result
+    ee  Evaluate the right side as a string then eval the result.
+    r   Return substitution and leave the original string untouched.
  
  Any non-whitespace delimiter may replace the slashes.  Add space after
  the C<s> when using a character allowed in identifiers.  If single quotes
@@ -1480,6 +1605,11 @@ Examples:
      s/Login: $foo/Login: $bar/; # run-time pattern
  
      ($foo = $bar) =~ s/this/that/;     # copy first, then change
+    ($foo = "$bar") =~ s/this/that/;   # convert to string, copy, then change
+    $foo = $bar =~ s/this/that/r;      # Same as above using /r
+    $foo = $bar =~ s/this/that/r
+                =~ s/that/the other/r; # Chained substitutes using /r
+    @foo = map { s/this/that/r } @bar; # /r is very useful in maps
  
      $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count
  
@@ -1492,6 +1622,10 @@ Examples:
      s/%(.)/$percent{$1} || $&/ge;      # expr now, so /e
      s/^=(\w+)/pod($1)/ge;      # use function call
  
+    $_ = 'abc123xyz';
+    $a = s/abc/def/r;           # $a is 'def123xyz' and
+                                # $_ remains 'abc123xyz'.
+
      # expand variables in $_, but dynamics only, using
      # symbolic dereferencing
      s/\$(\w+)/${$1}/g;
@@ -1686,10 +1820,10 @@ C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
  produces warnings if the STRING contains the "," or the "#" character.
  
  
-=item tr/SEARCHLIST/REPLACEMENTLIST/cds
+=item tr/SEARCHLIST/REPLACEMENTLIST/cdsr
  X<tr> X<y> X<transliterate> X</c> X</d> X</s>
  
-=item y/SEARCHLIST/REPLACEMENTLIST/cds
+=item y/SEARCHLIST/REPLACEMENTLIST/cdsr
  
  Transliterates all occurrences of the characters found in the search list
  with the corresponding character in the replacement list.  It returns
@@ -1698,6 +1832,12 @@ specified via the =~ or !~ operator, the $_ string is transliterated.  (The
  string specified with =~ must be a scalar variable, an array element, a
  hash element, or an assignment to one of those, i.e., an lvalue.)
  
+If the C</r> (non-destructive) option is used then it will perform the
+replacement on a copy of the string and return the copy whether or not it
+was modified. The original string will always remain unchanged in
+this case. The copy will always be a plain string, even if the input is an
+object or a tied variable.
+
  A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
  does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
  For B<sed> devotees, C<y> is provided as a synonym for C<tr>.  If the
@@ -1723,6 +1863,8 @@ Options:
      c  Complement the SEARCHLIST.
      d  Delete found but unreplaced characters.
      s  Squash duplicate replaced characters.
+    r  Return the modified string and leave the original string
+       untouched.
  
  If the C</c> modifier is specified, the SEARCHLIST character set
  is complemented.  If the C</d> modifier is specified, any characters
@@ -1753,9 +1895,16 @@ Examples:
      tr/a-zA-Z//s;              # bookkeeper -> bokeper
  
      ($HOST = $host) =~ tr/a-z/A-Z/;
+    $HOST = $host =~ tr/a-z/A-Z/r;   # same thing
+
+    $HOST = $host =~ tr/a-z/A-Z/r    # chained with s///
+                  =~ s/:/ -p/r;
  
      tr/a-zA-Z/ /cs;            # change non-alphas to single space
  
+    @stripped = map tr/a-zA-Z/ /csr, @original;
+                               # /r with map
+
      tr [\200-\377]
         [\000-\177];            # delete 8th bit
  
@@ -2521,17 +2670,17 @@ floating point.  But by saying
  
      use integer;
  
-you may tell the compiler that it's okay to use integer operations
-(if it feels like it) from here to the end of the enclosing BLOCK.
-An inner BLOCK may countermand this by saying
+you may tell the compiler to use integer operations
+(see L<integer> for a detailed explanation) from here to the end of
+the enclosing BLOCK.  An inner BLOCK may countermand this by saying
  
      no integer;
  
  which lasts until the end of that BLOCK.  Note that this doesn't
-mean everything is only an integer, merely that Perl may use integer
-operations if it is so inclined.  For example, even under C<use
-integer>, if you take the C<sqrt(2)>, you'll still get C<1.4142135623731>
-or so.
+mean everything is an integer, merely that Perl will use integer
+operations for arithmetic, comparison, and bitwise operators.  For
+example, even under C<use integer>, if you take the C<sqrt(2)>, you'll
+still get C<1.4142135623731> or so.
  
  Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
  and ">>") always produce integral results.  (But see also
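
The behaviors documented by this patch (the non-destructive C</r> option, the C<\o{}> escape, and the C<\Q> rule for patterns) can be checked with a short script. This is a minimal sketch, assuming Perl 5.14 or later; the variable names and sample strings are illustrative and do not come from the patch.

```perl
use strict;
use warnings;
use 5.014;    # s///r, tr///r and \o{} need Perl 5.14 or later

# s///r returns the modified copy; the original is left alone.
my $orig = 'abc123xyz';
my $copy = $orig =~ s/abc/def/r;

# With no match, /r still returns an (unchanged) copy rather than false.
my $same = $orig =~ s/nomatch/x/r;

# tr///r chained with s///r: =~ is left-associative, so the
# transliterated copy is what the substitution operates on.
my $host = 'example:80';
my $HOST = $host =~ tr/a-z/A-Z/r =~ s/:/ -p/r;

# /r inside map leaves the source list untouched.
my @bar = ('this one', 'not this');
my @foo = map { s/this/that/r } @bar;

# \x{50} and \o{120} both name ordinal 80.
my $same_char = "\x{50}" eq "\o{120}";

# In a pattern, \Q is applied before escapes are processed,
# so \Q\s\t matches the four literal characters \s\t.
my $literal = '\s\t' =~ /\Q\s\t/;

print "$copy $orig $same\n";
print "$HOST\n";
print "@foo | @bar\n";
```

Without C</r>, the same C<map> would modify the elements of C<@bar> in place and return substitution counts, which is why the option is convenient there.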